27 Knowledge representation using NMRS and Probability
Bhushan Trivedi
Introduction
We have seen how predicate logic can be used to represent and reason from a set of statements. An important skill of a human expert, and one of the most difficult to replicate in a knowledge representation scheme, is the ability to deal with imprecise, incomplete and uncertain information. Some statements are only partially true rather than completely true. In other cases the truthfulness depends on circumstances and varies over time. Sometimes the information comes from a random domain and the AI problem solver has to reason from it. To handle such information, other techniques have been designed to augment the knowledge representation scheme. One is the NMRS (non-monotonic reasoning system), which can alter the truthfulness of statements and manage the inferences made from them. Another is based on probabilistic reasoning. The last one we will see in this module is based on subjective probability using certainty factors.
Problems with predicate logic
Let us take some statements to understand the problems in representing them using predicate logic.
1. The atmosphere is hot now but will be cooler in the evening
2. It is a very well designed website.
3. The young politician is older than an old student.
4. If the clothes are heavier, rinse more
5. South Indians are generally soft spoken
6. You can safely assume that you can roam around in Ahmedabad even during the night without any trouble.
7. If the patient has high fever and stomach ache, there is suggestive evidence (70%) that he has typhoid
8. More than 80% of road accident victims are pedestrians.
Can you see the problem in representing the above statements? Let us discuss them one after another.
1. The atmosphere is hot is true now but will no longer remain true in the evening. How can one express this in predicate logic? In predicate logic, once something is proven to be true, it will always remain true. There is no provision to alter the truthfulness or falsehood of a statement in predicate logic. We need a system which can do so to represent such statements. In fact, when a true statement becomes false, all other statements resting on this statement's truthfulness also become false. We need a system which can handle this cascading effect.
2. The phrase "very well designed" is a personal perception and is also unclear in its meaning. The word "very" requires some quantification if we want to make some sort of comparison or use the information for some purpose. For example, sentiment analysis demands that such statements be converted to some quantified expression. Such a quantified expression also depends on who is talking: a normal user saying so and an expert in web design saying so are truly different things. We need a mechanism where such information is kept in a form which can be processed accordingly.
3. This statement is actually an extension of the idea presented in statement 2. A young politician being older than an old student looks like a contradiction, and thus a false statement, in predicate logic form. In fact, this is a case where the word "young" covers a different range of ages when talking about politicians and students. A student of age 27 is considered old even in graduate school, but a politician of age 32 is considered young. Thus the perception-based meaning of the word "young" must somehow be expressed in the system, and some type of reasoning must be provided to handle it.
4. This statement is of the same type: the heaviness of the clothes determines the rinse amount. "Heavier" is again a fuzzy word, the quantification of which depends on who is making the statement. In fact, this statement also introduces another point: relating two fuzzy sets. What do you mean by rinsing more? How much? How is the heaviness of the clothes related to the rinsing value?
5. This statement is also difficult to represent. How can we represent a general belief? We do not have any definite truthfulness associated with it; it is usually true, unless we know anything to the contrary. It is not possible for a predicate logic system to represent this statement.
6. This statement is an extension of the previous one. The statement is true most of the time, but there is a possibility of it being false.
7. This is a different type of statement, where the conclusion is not a hundred percent certain. A truthfulness of anything other than 0 or 1 is impossible to represent in predicate logic.
8. How can we represent the percentage of truth of this statement? The statement is true for 80% of victims but not for the others. This seems similar to 7, but there is a difference. Statement 7 indicates a measurement from an expert, which has little to do with statistics, while this statement is derived using statistical methods and has nothing to do with expert opinion.
Non-monotonic Reasoning system
This is a system which can manage the case where some statements change their truthfulness. Let us take an example to understand.
1. It is sunny
2. If it is raining, one must use umbrella
3. If it is raining it is not sunny
4. If it is sunny, you can take goggles
5. Ramji is going out
Looking at the above, if you ask the question "Should Ramji use an umbrella?", the answer is no. Now, a new fact comes in (as it has started raining):
6. It is raining.
That will have an effect on the database, as the statements no longer remain consistent. The NMRS looks at such things and concludes that 6 and 1 contradict each other. That means when 1 is in, 6 is out, and when 6 is in, 1 must go out.
So we will add one more statement called contradiction.
7. Contradiction (1,6)
Such contradiction statements are useful in the sense that whenever one of the nodes comes in, the other node is forced out. That is also logical: when it starts raining, it is no longer sunny, and when it is sunny, it is no longer raining. Thus, once 7 is in place and 6 is asserted, 1 is removed. Not only 1; suppose we also have the following statements:
8. Ramji should wear goggles
9. Ramji should take umbrella
Another important impact on the database is that statement 9 is included and 8 is taken out. Why? When a true statement becomes untrue, all statements based on it must be removed from the database, and when an untrue statement becomes true, its dependent statements make an entry again.
The NMRS looks for the contradiction and resolves it by removing one of the contradicting statements. In this case, it prefers to remove 1, as it is older than 6. The NMRS also removes all statements based on 1 being true (like 8). Now there is no contradicting set of statements in the database, and the system may continue functioning like a conventional predicate logic system.
Interestingly, in an NMRS we do not only have premises, which are always true; we also need other facts which are currently known to be true but may become otherwise at some other point of time. These are called default assumptions.
The basis for non-monotonic reasoning
Predicate logic is said to be monotonic, literally meaning "moving in one direction only". In a monotonic reasoning system the number of facts known to be true only increases and never decreases. Predicate logic provides a very good basis for ascertaining or deriving truth from premises. There is no chance of a contradiction of any sort.
The above example clearly indicates that such a monotonic representation does not really work for even a very simple real-world case. There are a few problems in real-world representations which an NMRS attempts to address.
1. The information is incomplete sometimes. This incomplete information might become complete at some other point of time.
2. Conditions vary continuously, and thus the truthfulness of some statements changes over a period of time.
3. While we do not have complete information, it is sometimes needed to have a good guess, which, if proven otherwise, should be reverted back.
4. Humans always have two types of information in their database: truths (like "Ramji is a man") along with assumptions or temporary truths (like "it is raining").
5. Most of the time, a decision-making system assumes many things. For example, while hiring a car, we assume that the driver knows how to drive and how to reach the destination, that the car is in working condition, that the tyres are not punctured, and so on. These things are assumed to be true unless we have some evidence to the contrary. Though they are a little different from what is described in 4, they require similar treatment. For example, in the raining example we have seen, it is assumed by default that it is sunny. When we learn that it is raining, we insert that fact in the database, find whether there is a contradiction, and make sure that the statements belonging to the contradicting set do not remain in the database at the same time. This is known as circumscription.
6. While we are dealing with such default assumptions, temporary beliefs generated from those assumptions ("Ramji should wear goggles") also become dependent on the default assumption. As soon as the default assumption is found to be untrue, the system must make sure that all facts which depend on it also become untrue. For that, the NMRS must store information about such dependencies in addition to the statements themselves. The opposite kind of dependency also exists: something may be true based on the fact that something else is assumed to be false. For example, we may state that it is sunny unless we know that it is raining. This is a negative dependency; only when something is untrue is this statement true.
7. At any given point of time, we must maintain all statements known to be true, like conventional predicate logic. Additionally, we must also keep all statements which were once true but over a period of time became untrue. In NMRS parlance, these two sets of information are known as the In list and the Out list. The In list contains all information which is currently true, while the Out list contains all information that was once true but is not any more. Such information is kept for two reasons. First, it might serve negative dependencies: whenever a statement is inserted in the In list, the statements depending negatively on it can be removed. Second, at any given point of time, we may need to disprove something; if that information exists in the Out list, we can disprove it very easily. Predicate logic does not contain false statements and thus cannot do this.
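The In list/Out list bookkeeping described above can be sketched in a few lines of Python. This is a simplified illustration using the earlier umbrella example; the class and method names (TMS, assert_fact, retract) are our own invention, not from any standard library:

```python
# Minimal sketch of NMRS-style bookkeeping with an In list and an Out list.
# All names here are illustrative, not a real TMS library.

class TMS:
    def __init__(self):
        self.in_list = set()        # facts currently believed true
        self.out_list = set()       # facts that were true but are no longer
        self.depends_on = {}        # fact -> set of facts it rests on
        self.contradictions = []    # pairs that cannot both be in

    def assert_fact(self, fact, justifications=()):
        self.in_list.add(fact)
        self.out_list.discard(fact)
        self.depends_on[fact] = set(justifications)
        # resolve contradictions: the newer fact wins, the older one goes out
        for a, b in self.contradictions:
            if a in self.in_list and b in self.in_list:
                self.retract(a if fact == b else b)

    def add_contradiction(self, a, b):
        self.contradictions.append((a, b))

    def retract(self, fact):
        if fact not in self.in_list:
            return
        self.in_list.discard(fact)
        self.out_list.add(fact)
        # cascade: everything resting on this fact must also go out
        for f, deps in list(self.depends_on.items()):
            if fact in deps:
                self.retract(f)

tms = TMS()
tms.assert_fact("it is sunny")
tms.assert_fact("wear goggles", justifications=["it is sunny"])
tms.add_contradiction("it is sunny", "it is raining")
tms.assert_fact("it is raining")          # forces "it is sunny" out
tms.assert_fact("take umbrella", justifications=["it is raining"])
print(sorted(tms.in_list))   # ['it is raining', 'take umbrella']
print(sorted(tms.out_list))  # ['it is sunny', 'wear goggles']
```

Note how retracting "it is sunny" automatically cascades to "wear goggles", exactly the dependency-driven removal discussed in point 6.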
NMRS Processing
Let us take an example to understand how NMRS processing takes place. We will describe a scene from the movie "Heena", where a character played by Rishi Kapoor accidentally falls into a river and ends up in Pakistan. When he finds himself on the bank of a river, he needs to find his way back home. Here is a subset of the statements which might be generated by a system designed to solve such a problem. A few statements might look controversial, but it is a contrived example to make things clear.
1. Rishi found himself on the bank of a river
2. The river flows westwards.
3. Rishi found the place surrounded by large hills with a canopy of snow, and also barren land.
4. Rishi belongs to Delhi
5. Rishi is in India
6. The only river flowing westwards amid hilly surroundings and barren land is the Brahmaputra in India
7. Delhi is on the eastern side of the area where the Brahmaputra flows.
Now, based on these facts, Rishi devises the following:
8. He should travel east (based on 4, 1,2,3,5,6,7)
Once he has made this decision and started moving eastwards, he finds signboards written in Urdu. What should he do now? He might have a few more facts to refer to in the database:
9. The predominant language of the area he found himself in is Urdu.
10. Urdu is a predominant language of Pakistan.
11. The Brahmaputra flows through Assam.
12. The predominant language of Assam is Assamese.
Looking at the database, he concludes the following:
13. He is in Pakistan (based on 9,10)
14. The river is not the Brahmaputra but one of the rivers flowing from Kashmir to Pakistan (13, 11, 12)
What will that do? It will push default assumption 5 (Rishi is in India) out, as it contradicts 13. 8 is also removed, as it is based on 5. He might now decide to move westwards. So far so good! We stop now, but you can continue playing this game and help Rishi reach home.
There have been quite a few attempts to implement NMRS, among them justification-based, logic-based and assumption-based truth maintenance systems. An NMRS definitely has more power than a conventional predicate logic representation, but not without a cost. The cost of maintaining and revising the database upon the entry of a new fact or the alteration of the truthfulness of a statement is enormous, as it has a cascading effect on a large database.
Uncertainty and related issues
In many cases we need to represent statements using some numeric value for certainty rather than labelling them as simply true or false. For example: 50% of the population supports the XYZ political party, 90% of Indians are mad about cricket, more than 80% believe in God, and so on.
There are many ways in which uncertainty is introduced into the system. Here are a few cases.
1. The actual domain is too large to be studied. For example, if we want to study the motivational preferences of all IT employees, it is next to impossible. We might take a good sample and generalize our findings. This generalization process introduces uncertainty. For example, we may conclude that more than 70% of young employees prefer to go abroad. The result that we get, 70%, may not exactly match the result we would get if we surveyed the complete population, but we cannot do that. Usually probabilistic reasoning is applied in this case.
2. The expert has some heuristic information about the domain. For example, a doctor might conclude: "There is a 70% chance that a patient having fever in the rainy season is suffering from malaria." If you look at this carefully, you can see that it is not a probability problem. First, the total probability over all diseases must equal 1; whenever we calculate probabilities, we must carefully take statistical measures so that the total always equals 1. The doctor does not calculate his number that way: it is a quantitative measure of the doctor's intuition and judgement. Second, in probability theory only one outcome can be true at any given point of time, while a patient with more than one disease is very common. Third, in probability theory we cannot have new additional outcomes (take the example of drawing a card from a pack of cards: we cannot have an event where a new type of card is introduced), whereas it is quite possible to have new diseases and new variants of old diseases. Such systems are usually modelled using expert systems based on some mechanism which is not pure probability. One such mechanism is based on Certainty Factors, or CFs for short. These measures are sometimes called subjective probabilities, as compared to objective probabilities, which are conventional probabilities based on statistical measurements.
3. Sometimes the data itself introduces uncertainty. For example, in some experiments we might receive data which we cannot rely on 100%. When some kind of chaining of rules is involved, there is more chance of this happening. For example, a security expert might conclude that if a user accesses sensitive system files a few times, there is an 80% likelihood of him being an attacker, and when an attacker is trying to access system files, there is a 70% likelihood that he is attempting some sort of denial-of-service attack. Now if the observer observed a user accessing system files 4 times in a day, what is the probability that the system will experience a denial of service? You can see that the phrase "a few times" introduces uncertainty: whether 4 or 5 or 6 is the value the security expert believes to be correct is not clear. We also cannot fully rely on our monitoring tools in this case, as some of them may be compromised and give us incorrect values.
4. We do not have complete information. We have already seen an example in NMRS.
5. The domain itself is such that we get random inputs. For example, a character recognition system might come across any character at random. Though it is possible to make some predictions based on a priori knowledge, the context in which the character is used, a spell checker's output and so on, the input is still random. Here also, it is possible to use probabilistic measures.
Statistical reasoning and Probability
Probability is one of the most common methods of dealing with uncertainty and has been studied in a very detailed way. There is a solid mathematical basis for the theory, and it is widely used in many systems. Here is some basic information related to probability.
P(E) is the probability of event E occurring in a random experiment, for example the probability of getting an ace when we draw one card from a deck of cards.
The value of P(E) is calculated on the basis of statistical calculation. For example, a standard deck has a total of 52 cards and there are 4 aces, so the probability is calculated as 4/52 (i.e. 1/13). Similarly, if we need the probability of getting an odd value on a die, we can calculate it as 3/6 (3 odd values out of 6 possible). This way, the probability of any event lies between 0 and 1. A probability of 0 indicates an impossible event, while a probability of 1 indicates a certain event. The sum of the probabilities of all possible outcomes must be equal to one.
Probability of two different, independent events occurring together can be calculated using the formula
P(A and B) = P(A) * P(B)
For example, the probability of drawing an ace from a deck of cards is 4/52, i.e. 1/13. Similarly, a random draw will fetch a black card with probability 26/52 (which is basically ½). Now the probability of drawing a black ace is calculated as
P(CardIsAce and CardIsBlack) = P(CardIsAce) * P(CardIsBlack)
= 1/13 * ½
= 1/26
which, if you calculate using the fact that there are exactly two black aces (the ace of spades and the ace of clubs) in a deck of 52 cards, also gives 2/52 = 1/26, thus proving it to be right.
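Using a standard 52-card deck (without jokers), these probabilities can be verified with a few lines of Python by enumerating the deck directly. This is a small sanity check of the product rule, not part of the original example:

```python
from fractions import Fraction
from itertools import product

# Enumerate a standard 52-card deck (no jokers).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

def prob(event):
    """Probability of an event as a fraction of the whole deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_ace = prob(lambda c: c[0] == "A")                    # 4/52 = 1/13
p_black = prob(lambda c: c[1] in ("spades", "clubs"))  # 26/52 = 1/2
p_black_ace = prob(lambda c: c[0] == "A" and c[1] in ("spades", "clubs"))

print(p_ace, p_black, p_black_ace)  # 1/13 1/2 1/26
# Rank and colour are independent, so the product rule holds:
assert p_black_ace == p_ace * p_black
```

The final assertion holds precisely because the rank of a card and its colour are independent events.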
Bayes' formula
Bayes' formula is used to judge how probable a given belief is, given some evidence. Many expert systems use Bayes' formula in their calculations. Suppose, for example, we want to recognize a character which is one of the lowercase letters of the English language; thus there are 26 possible outcomes, a..z. There are a few observations we can make while looking at the image of the character, for example whether a dot is observed or a horizontal or vertical line is present. Bayes' formula gives us the probability of some character when some evidence is present; for example, what is the probability of the character being A when a horizontal line is present? In some cases relations can be drawn directly: the probability of the character being 'i' when a dot is observed is 0.5, as there are two such characters, i and j. What is the probability of a vertical line being present? What is the probability of the dot being observed if the character 'i' is certain? What is the probability of a vertical line being present when i is observed? You can say 1 if we are recognizing printed characters, and something else if we are trying to recognize handwritten characters. In fact, linguists have studied the English language extensively, and their research has revealed the frequency with which each character is used. From this, one can even set the a priori probability of a given character; for example, characters like T and R are used more often than characters like X and Z. Based on all this information, Bayes' formula can be applied to find the probability that a given character is a or b or c or any other letter up to z.
To understand Bayes' formula, let us understand the notation used in it.
Bi is a specific belief (for example, the character is a)
P(Bi) is the probability of belief Bi being true (for example, the probability of the character being a)
P(Bi|E) is the probability of belief Bi being true given some event E (for example, the probability of the character being i when a dot is present, or G when a horizontal line is present)
P(E|Bi) is the probability that event E occurs when belief Bi is true (for example, the probability of a dot being present when the character is i)
K is the total number of beliefs; in our case it is 26 (the total number of outcomes).
Bayes' formula is written as follows:
P(Bi|E) = P(E|Bi) * P(Bi) / (P(E|B1) * P(B1) + P(E|B2) * P(B2) + ... + P(E|BK) * P(BK))
The formula gives the exact probability of some character being present when some evidence is observed. Character recognition is a simple problem compared to other domains such as detecting an attack from packet contents. One of the PhD students of the author is using Bayes' formula to learn which attack is more likely given the features of a packet. The statistical results help segregate the more probable attacks, which are confirmed using other methods in later phases.
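A direct implementation of the formula for the character-recognition setting might look like this. For brevity only three candidate letters are used, and the prior and likelihood numbers are invented purely for illustration:

```python
# Bayes' formula: P(Bi|E) = P(E|Bi)*P(Bi) / sum over k of P(E|Bk)*P(Bk)
# Beliefs: the character is 'i', 'j', or 'l'. Evidence E: a dot is observed.
# All numbers below are invented for illustration only.

priors = {"i": 0.4, "j": 0.1, "l": 0.5}        # P(Bi): a priori letter frequencies
likelihoods = {"i": 0.9, "j": 0.9, "l": 0.05}  # P(E|Bi): chance of observing a dot

def posterior(prior, likelihood):
    """Return P(Bi|E) for every belief, normalised over all beliefs."""
    evidence = sum(likelihood[b] * prior[b] for b in prior)
    return {b: likelihood[b] * prior[b] / evidence for b in prior}

post = posterior(priors, likelihoods)
# The dotted letters 'i' and 'j' now dominate, and the posteriors sum to 1.
assert abs(sum(post.values()) - 1.0) < 1e-9
assert post["i"] > post["l"]
print(post)
```

The denominator plays the role of the sum over all K beliefs in the formula, which is what makes the posteriors sum to one.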
Certainty factors
One of the issues in expert system design is that they cannot use objective probabilistic measures. For example, a doctor cannot determine the probability of malaria in a strict statistical sense, as he has no way of observing all patients and their diseases. Some of the rules of probability do not apply in his case. In probability, one of the outcomes must be true, and only one of the outcomes is true; for example, when we draw a card from a deck of cards, we get one of the cards and only one of them. Unlike that, when a doctor examines a patient, he might conclude that the patient has either no disease or more than one disease.
Another example is drawn from the domain of intrusion detection systems. When an IDS picks up a suspicious packet, the packet might not actually be carrying any attack-related data (no attack), or it might be carrying data for multiple attacks. A system which determines the confidence level of that packet being malicious (and thus of the process which generated it) requires methods which allow this flexibility. One such method is to use certainty factors, or CFs for short. While using CFs, we do not insist that the values sum to one or that only one outcome be certain. CFs were introduced and used in one of the best-known expert systems of all time, MYCIN. In fact, we need such subjective probabilities for one more reason: we often have little idea about the conditional or a priori probabilities. Even if we have some idea about them, the other trouble is that they keep changing. For example, in the rainy season the probability of a disease like typhoid increases, while in winter the probability of a disease like swine flu increases. In the intrusion detection case, typical setups have greater chances of typical attacks; for example, educational institutes see more script-kiddie (quite preliminary) attacks, while a bank website has a higher probability of a targeted attack (which is much more sophisticated and executed by experts). Look at both of the following statements carefully.
- If the patient has fever, weakness and body ache, then there is 70% conclusive evidence that he has malaria
- If the packet contains source and destination addresses from some range, the packet uses an unknown port number, and it has contents which cannot be decrypted, then there is 80% conclusive evidence that the packet is malicious.
CFs are used in many expert systems, and there is much more to discuss about them, but this is not the place. You may refer to the references provided.
Summary
Predicate logic fails to represent default assumptions and incomplete information. The NMRS is designed to hold premises as well as default assumptions, to deal with contradictions, and to make sure that the statements which create a contradiction are removed from the database, thus keeping the database consistently true. The NMRS is therefore designed to deal with a dynamically changing world. Probability-based statistical reasoning helps in reasoning with observed values and information coming from random samples. Bayes' formula is useful in many expert systems for determining the probability of a specific outcome from a set of outcomes, based on information about events occurring in the domain. Experts use subjective probability in many cases when dealing with a real world which does not fit into a probabilistic model.