34 ES Development process-I
Bhushan Trivedi
Introduction
As expert systems are a special type of system, they need special treatment during development. Conventional system design is well studied and documented, so we assume that the reader is familiar with the conventional system development process and the challenges associated with it [1]. In this and the subsequent module we address the additional problems associated with the development of expert systems (ES). The software engineering discipline has addressed many challenges of software and project development; we will look at the differences from, and extensions to, those processes in the development of ES.
SE challenges
Software engineering has presented many methods to develop conventional systems. Waterfall, spiral and agile are a few well-known methods. ES development is an extension of the conventional system development process. It needs to address the typical characteristics of ES along with normal software development requirements. One must take additional care while developing expert systems, irrespective of whether we are developing a complete ES or an ES component of a conventional system. The ES development process is not as standardized as that of conventional systems. Many researchers are continuously working on improving the process, and many alternative methods have been suggested. We will look at some of their common features in due course.
Many things are special needs of ES and must be taken care of in the development process: selecting experts for the domain, introducing rapid prototype development to boost synchronization among stakeholders and quickly bring the project on track, developing an ontology so that terms are interpreted correctly in context, and planning and scheduling the knowledge engineering process.
Thus the ES design process includes many steps not present in conventional system design. Selecting an expert for the domain is one of them. Even if everything else is available, including finance, the whole project comes to a standstill if we do not have the right expert or the expert does not have time. Such a problem does not occur with a conventional system, as most people working in the department are aware of its functioning and can help get the system built to meet their needs. The expert’s availability during the development process is also a critical factor. Conventional system development usually has two parties talking to each other, the customer and the designer; here a third party, the domain expert, is also involved, and thus the synchronization problem is much more severe. Another important addition is the prototype-based method. As the system is more abstract in nature, prototypes help develop a shared vision of the system and help the architecture to be built to the satisfaction of all stakeholders.
Some researchers propose knowledge management as a super-domain that includes the ES development process, others call it an extension of AI, and still others consider it an extension of the conventional system design process. In a true sense it includes all of them. Knowledge management (KM), considered one of the evolving branches of computer science today, is about gathering and preserving the knowledge of experts. Organizations are always wary of losing the knowledge that experts gained while working there when those experts leave. Because of the high turnover in many industries, preserving knowledge about critical systems is a lifeline for those industries. Knowledge management helps these industries retain such knowledge and make it accessible to others, so that whenever the expert is not around, the knowledge management process can guide them through. However, we confine our discussion to the ES development process and do not discuss KM any further in this module.
[1] One good resource for learning about the system development process is Roger Pressman’s classic book on the subject.
Not only the system design process but also testing needs to be extended for ES. Testers of conventional systems are usually ranked lower than developers. They develop test cases and test the system once it is completely developed. In ES, testing is the job of a much higher-ranked person, usually an expert in the domain. Such experts design elaborate test cases with extensive care, covering all possible problems and the entire range of possible errors, including boundary cases. The phase begins with experts who can evaluate the system and certify it. The expert who tests the system should not only have enough knowledge about the domain but must also know about the main expert whose knowledge the ES is built on. His testing must be aligned with the main expert’s knowledge. For example, the main expert might test for typhoid first and then for malaria; a tester who designs a test case that expects the program to test for malaria first will generate an error that is not really warranted.
As the expert is both busy and not much inclined to work with knowledge structures, it is the knowledge engineer (KE) who manages the knowledge. He writes (codes) routines for gathering information from the expert, designs the data structure in which the information is to be stored, and writes code for manipulating that data structure, searching for related information and generating reports. For example, to store intrusion-related information, the KE decides to keep some fields of the packet headers, along with some other information the packet contains, in a suitable form. Once he has designed this structure, he should also design how this information is extracted from the packet being examined and copied into it. Administrators generally require information such as the source address of a malicious packet, which process generated it, on which machine this process is running, which user initiated the process, whether the process is listed as malicious in our database, and so on. To answer such queries, information about all of these has to be extracted and kept. The KE designs this complete process, the data structures and the code accordingly. He also designs routines that can aggregate, summarise and correlate the information for the administrator to look at, and the structure in which the information is presented to the administrator and others, popularly known as reports.
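The kind of structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the dictionary-based packet format and the `blacklist` set are hypothetical and are not taken from any particular intrusion detection system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class IntrusionRecord:
    """One examined packet, reduced to the fields an administrator asks about."""
    source_address: str    # where the packet came from
    process_name: str      # process that generated the packet
    host: str              # machine on which that process runs
    user: str              # user who initiated the process
    known_malicious: bool  # is the process listed as malicious in our database?


def extract_record(packet: Dict[str, str], blacklist: Set[str]) -> IntrusionRecord:
    """Copy the relevant fields of an already parsed packet into the structure."""
    return IntrusionRecord(
        source_address=packet["src"],
        process_name=packet["process"],
        host=packet["host"],
        user=packet["user"],
        known_malicious=packet["process"] in blacklist,
    )


def report_by_source(records: List[IntrusionRecord]) -> Dict[str, int]:
    """Aggregate step: count known-malicious packets per source address."""
    counts = Counter(r.source_address for r in records if r.known_malicious)
    return dict(counts)


# Hypothetical sample data, for illustration only.
packets = [
    {"src": "10.0.0.5", "process": "dropper.exe", "host": "pc-17", "user": "guest"},
    {"src": "10.0.0.5", "process": "dropper.exe", "host": "pc-17", "user": "guest"},
    {"src": "10.0.0.9", "process": "browser", "host": "pc-02", "user": "alice"},
]
records = [extract_record(p, blacklist={"dropper.exe"}) for p in packets]
print(report_by_source(records))
```

The record type answers the administrator's questions directly, while `report_by_source` stands in for the aggregation and reporting routines; a real system would need many more fields and a proper packet parser.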
Errors are likely to creep in because the pattern matching process can produce false negatives and false positives. For example, an innocuous packet may be incorrectly classified as malicious. The KE has to provide means of conflict resolution (a few observations indicate an attack while a few indicate otherwise) and must manage erroneous knowledge (the signature used for a typical attack is wrong or has changed), incorrect rule ordering (the rules are ordered in such a way that malicious packets are allowed in or good packets are disallowed), and so on. Sometimes the rules are applied in such a way that the resulting action is not what the administrator wishes; sometimes errors are introduced by a knowledge mismatch between the expert and the KE. Sometimes an ES is developed for a mission-critical problem, where the design must be more robust than a conventional one (for example, an ES managing a missile launch). Such a system might require many checks, and alternatives for continuing when something goes wrong. Every such thing is to be handled in the expert system development process.
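The effect of rule ordering mentioned above can be shown with a minimal sketch. It assumes a first-match strategy (the first rule whose condition holds decides the action); the two rules and the packet are hypothetical, and real systems may use other conflict resolution strategies.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, str]
# A rule is (name, condition, action); all rules here are made up for illustration.
Rule = Tuple[str, Callable[[Packet], bool], str]


def decide(packet: Packet, rules: List[Rule], default: str = "allow") -> str:
    """Return the action of the first rule whose condition matches the packet."""
    for _name, condition, action in rules:
        if condition(packet):
            return action
    return default


allow_internal: Rule = ("allow-internal", lambda p: p["src"].startswith("10."), "allow")
block_signature: Rule = ("block-signature", lambda p: "evil" in p["payload"], "block")

packet = {"src": "10.0.0.5", "payload": "evil-shellcode"}

# General rule first: the malicious packet is allowed in.
print(decide(packet, [allow_internal, block_signature]))
# Specific rule first: the same packet is blocked.
print(decide(packet, [block_signature, allow_internal]))
```

The same two rules give opposite results depending only on their order, which is why the KE has to treat rule ordering as part of the knowledge rather than as an implementation detail.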
An ES requires much more frequent maintenance because of the abstract nature of the process. Continuous maintenance is critical for correct operation; for example, the weights set for certainty factors (CF) may need to be revised if the results indicate so. If this is not done in time, the ES continues to generate incorrect results. This part is crucial because the ES deals with inexact knowledge and most stakeholders are not completely aware of the system when it is designed. Another issue is that the heuristics, unlike
correctly, it is likely to produce incorrect answers and need to be revised. Researchers have used the conventional waterfall model, the incremental model and the spiral model, have also devised a linear model, and have augmented them for ES requirements.
ES Development steps
Let us briefly describe the steps of an expert system development project. Let us reiterate that we need to follow the complete software development lifecycle and cannot ignore SE issues. As we are focusing only on ES development, we do not discuss those issues in the following; we address only the parts that ES development specifically requires. They are as follows.
1. Feasibility: assess whether the problem really requires an expert system
2. Identification
3. Conceptualization
4. Formalization
5. Implementation
6. Testing and Evaluation
The first step indicates an important requirement: ES is not for all problems, nor is it a panacea. If a problem can be solved using a conventional system, one must do so. Compared with conventional systems, which seldom fail, usually do the job quickly and usually provide precise answers if designed clearly, an ES is slow, imprecise and not guaranteed to succeed. Another issue is to see which part of the problem requires ES attention if the complete system does not demand an ES solution. The remaining part of the problem can still be solved using conventional algorithms and conventional solutions. Whenever we can somehow solve the problem using conventional methods, such solutions are easier to implement, faster to execute and usually more robust and trustworthy.
Let us reiterate that problems that are ill-structured, not well defined and known to be solved by human experts are candidates for ES development. Having said that, not all such problems are really ES problems. If the problem is really tough and requires elaborate expertise, an ES solution may not be possible at all (or rather, no solution may be possible for such a problem). When we encounter a problem that cannot be solved using conventional methods, one of the biggest mistakes is to assume that an ES solution is possible. There is an interesting method to test whether a problem is a right candidate for an ES, called the ‘telephone test’. Experts and users are connected by telephone. Users explain their unstructured problems to the experts over the phone, and the experts respond in the same way. If users are able to solve their problems successfully using this method, it is quite possible that an ES can solve the problem too. On the other hand, if the user fails to describe his problem clearly, or the expert is unable to handle it over the phone and requires a personal visit and personal observation, the domain is harder for ES development, and it is quite possible that an ES cannot solve the problem to the user’s satisfaction.
This simple heuristic is actually quite powerful, as it tests whether experts can solve the problem despite being confined to verbal input, and also whether users are capable of describing their problems precisely; both are needed for building a successful expert system. Problems and domains for which this is not possible are not ES problems.
For the expert system development model to succeed, there are three important requirements.
- Both the normal user (sometimes called the customer) and the expert are involved throughout the development process. This is in contrast with conventional systems, where the user provides specifications and usually does not get involved until the first version is available for testing.
- Rapid prototyping is encouraged. That means frequent demonstrations of the would-be system are provided to help the experts and the user envision how the system is going to function. This catches issues early, as in most cases nobody understands the expert system clearly in the beginning.
- Changes are encouraged in this phase, as it is easiest at this point to modify the system and the parameters associated with it.
Most researchers stress the rapid prototyping approach to development. The actual development process basically consists of presenting a prototype, accepting, rejecting or modifying it based on the user’s input, refining it and presenting it again. The prototype at any stage provides the basis for further development. Rapid prototyping helps the stakeholders learn what the system will look like when completed. As the expert’s time is most precious, this approach makes the best use of it. This is an iterative model of development: a prototype is developed, the user’s input is taken, the prototype is modified, input is taken again, the prototype is modified again, and so on.
Once we have checked that the problem can be solved using the ES model, we go deeper and identify it in more detail.
Identification
The first job is to identify the problem clearly. That includes surveying the problem, looking at alternative methods of solving it based on many factors and, finally, coming up with all the tasks to be completed. The process begins with a problem survey.
This step is also known as problem selection and is basically a requirement analysis process. It is divided into four phases, as depicted in figure 34.1. At the end of this phase, the problem is understood clearly enough for the coding process. The process identifies the part that is to be developed using the expert system development process. This does not mean that the other parts are not developed, but they require other techniques, mostly manual solutions where an expert system solution is not possible. Whatever is left out still has to be done by experts.
The problem survey produces a rough estimate of the work to be done. The output is usually a list of things to be done, each described briefly, usually in one line. This list is the platform on which the rest of the process is performed. The next step involves shortlisting tasks from this list; we call it tentative selection. The tentative selection is based on some basic criteria. Here is a typical list of questions that are posed for every candidate.
- Is it really an expert job? Check whether any traditional solution can be used.
- Is an expert for this job available and ready to contribute?
- What is the value addition if this job is done? If there are many experts in the field and their number is likely to increase, an automated solution might not be attractive. On the contrary, if experts are rare and the knowledge is scarce, there is a huge value addition. It is also important to assess the value addition in the context of the organization itself, that is, whether solving the problem is really worthwhile for the organization. The value addition must also be checked for its longevity. ES development is a tough job and involves a lot of time and effort. It is imperative that the solution remains important at least for the period over which the investment is recovered.
- Look for different parameters like uncertainty in the data, need for judgemental knowledge, need for default assumptions, and the likelihood of dealing with incomplete information and so on to validate the choice as well as feasibility of actually solving it.
- Can the problem be solved using the interactive methods that are common for computer-based systems? The idea is to assess whether an ES can actually do what the experts are expected to do. Where the expert is also required to exhibit physical skills (for example, operating on a patient), the job is hard or almost impossible for an ES to manage.
- Another important criterion is the consequences of a failure of the system. Like human experts, and because it works with incomplete and imprecise information, an ES is likely to come back with incorrect answers or find suboptimal solutions sometimes. One must assess how open the normal user is to accepting this.
- ES development takes more time than it seems at first look. All AI problems are harder than they appear, and ES is no exception. One of the major challenges is feasibility with respect to time. One must ask: when is the expert system likely to be completed? Do we have that much time?
- Do we have a KE who can act as an intermediary? Is the domain expert a legitimate authority? Do both of them have sufficient time for the development of the system, and are they able to work in sync? Will they receive all the administrative support they need for their work? Answers to all these questions must be sought.
- Is there sufficient reference material available for machine learning and for continuing with the project when the expert is unavailable? The more such material there is, and the more amenable it is to machine learning, the more feasible the ES development is. When the experts are available only intermittently, the machine learning process can be scheduled for the times when they are not available.
- Is it possible to synchronize the conventional part of the system and the ES part? Can the system be designed in such a way that the expert’s time is properly utilized? Is it clear how the conventional part and the ES part of the system will communicate? Is it feasible to train the KE and have him start using the system in the minimum possible time? Many systems today are designed using conventional languages and tools just because of this [2].
- Is it possible to provide a user interface that does not discourage normal users from using the system? This critical part is often overlooked, but the best ES can fail if the normal user is dissatisfied with the interface.
- Is it possible to add value using the latest technological developments? For example, can data be stored in the cloud so that it is readily available to users? Can IoT-based devices be used, and can a proper interface be designed for them? For example, implantable medical devices (IMDs) can communicate with a medical ES and provide first-hand information about the patient’s condition.
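The tentative selection driven by the checklist above can be pictured as a simple filter over the candidate tasks. The sketch below is only illustrative: the criterion names are a hypothetical condensation of the questions, the candidate tasks are made up, and in practice the answers are judgements rather than plain yes/no values.

```python
from typing import Dict, List

# Hypothetical yes/no criteria condensed from the checklist; not an official list.
CRITERIA = [
    "needs_expert_reasoning",     # no traditional solution will do
    "expert_available",           # an expert is available and ready to contribute
    "adds_lasting_value",         # value addition outlives the payback period
    "solvable_through_dialogue",  # no physical skills required
    "failure_tolerable",          # users accept occasional wrong answers
    "fits_time_budget",           # can be completed in the time we have
]


def failed_criteria(answers: Dict[str, bool]) -> List[str]:
    """Return the criteria a candidate does not satisfy (missing answers count as 'no')."""
    return [c for c in CRITERIA if not answers.get(c, False)]


def tentative_selection(candidates: Dict[str, Dict[str, bool]]) -> List[str]:
    """Keep only the candidates that satisfy every criterion."""
    return [name for name, answers in candidates.items() if not failed_criteria(answers)]


# Made-up candidate tasks for illustration.
candidates = {
    "triage of IDS alerts": {c: True for c in CRITERIA},
    "surgical assistance": {**{c: True for c in CRITERIA}, "solvable_through_dialogue": False},
}

print(tentative_selection(candidates))
print(failed_criteria(candidates["surgical assistance"]))
```

Recording which criteria failed, rather than only the pass/fail result, keeps the reason for dropping a candidate available for the later analysis.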
This phase is carried out to prune the list of candidates to those that are really feasible and valuable. Once this list is complete, the analysis part starts. The analysis begins with an assessment of applicability.
Assessment of applicability
This step involves a more detailed study of the chosen candidates. One must assess whether the problem really requires scarce expert reasoning. This can be done by gathering a few known facts.
- Is the difference between an expert and a normal practitioner really huge? For example, is a real expert needed for 80% of the jobs, the complex ones, while most other practitioners can handle only the remaining 20%?
- Is the problem-solving process so intricate that only a few experts are aware of it? Is there a dire need to document processes that are inherently not formal or structured?
- Is there a likelihood that the knowledge will be lost if it is not preserved?
- Do the experts spend most of their time assisting other practitioners in solving their problems? In short, are true experts scarce?
- Is it the case that more than one person has to get together to solve the problem, as no one has the complete problem-solving expertise?
- Is the knowledge voluminous, coming from different sources in different forms, and changing? In this case the experts might not have much difficulty with the solution itself, but rather with managing data about the problem and getting the right information or the right analysis at the right time.
[2] Python is also a very popular platform for implementing ES solutions, especially those related to machine learning. We will study machine learning in the 36th
The sixth point above requires some elaboration. Big Data solutions help in this case. Interestingly, another issue pointed out by some researchers is the veracity, or truthfulness, of the information coming in from such sources. When multiple sources indicate different things (for example, opinion polls predicting who is going to win the next election), it becomes harder to assess the truth. Consider finding out why some people have cancer while others with otherwise similar characteristics do not, or guessing what kind of disease a person is likely to have in future. Solving such problems requires huge amounts of data, which is also very complex. New data about newer patients and newer diseases is constantly being added. Similarly, when an IDS is to be designed to consider the latest attacks, one needs huge amounts of data about newer and continuously changing operating systems, databases and even attack methods. The difference between conventional Big Data solutions and ES solutions that require similar services is that an ES also has to deal with imprecise and incomplete data. Data in the medical domain comes from doctors’ handwritten notes, hospital records, lab reports, IMD observations and so on. Information about intrusions comes from the logs of different operating systems, each usually in its own native form, different from the others. Different IDS sensors generate intrusion-related alerts in their own formats, which requires elaborate methods to synchronize them for aggregation and correlation.
The second step is to check whether the problem-solving process involves verbal (cognitive) rather than physical skills (for example drawing or sculpting [3]).
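To make the point about differing alert formats concrete, the following sketch normalises alerts from two sensors into one common record and then correlates them. Both input formats (a dictionary for "sensor A" and a pipe-separated line for "sensor B") are invented for this illustration and do not correspond to any real IDS product.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def from_sensor_a(alert: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical format A: {"ip_src": ..., "sig": ...}."""
    return {"source": alert["ip_src"], "attack": alert["sig"].lower()}


def from_sensor_b(line: str) -> Dict[str, str]:
    """Hypothetical format B: a text line 'ATTACK_NAME|SOURCE_IP'."""
    attack, source = line.split("|")
    return {"source": source.strip(), "attack": attack.strip().lower()}


def correlate(alerts: List[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count how many normalised alerts report the same attack from the same source."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for alert in alerts:
        counts[(alert["source"], alert["attack"])] += 1
    return dict(counts)


alerts = [
    from_sensor_a({"ip_src": "10.0.0.5", "sig": "PortScan"}),
    from_sensor_b("PORTSCAN | 10.0.0.5"),
    from_sensor_b("SQLI|10.0.0.9"),
]
print(correlate(alerts))
```

Once every sensor has its own small adapter into the common record, aggregation and correlation can be written once; two sensors confirming the same attack from the same source then show up as a count of 2.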
Availability of the expert
One can easily understand that a suitable expert is the most critical requirement for an ES to succeed. It is possible to design systems that attempt to solve problems which humans cannot solve (for example launching a space shuttle or managing a space station). Unfortunately, such systems do not have an ES component derived from an expert’s knowledge and are not true ES [4].
Lack of availability of an expert is a major reason for ES failure, and thus having an expert who can devote sufficient time is equally critical. Not only must the chosen expert be an authority in the given subject, he should also be able to articulate his rules of thumb, at least to the KE if not to other system developers. He should have a clear idea about how the domain knowledge can be structured and applied. Another psychological point, which we have stressed earlier, is the insecurity that the expert might feel in providing the knowledge. Expert knowledge is scarce, and thus the expert and the organization he belongs to might be reluctant to reveal too much.
[3] Interestingly, 3D printing has helped the process to a large extent, but it still requires human experts to draw the 3D design.
[4] This does not mean that these systems are easy to design. They address very complicated problems, solved with innovative solutions by great scientists, and are by all means great systems. The point is that, to be classified as an ES, a system must have the ES component. If it does not have one, it is not an ES, however complex it is.
Another point is that the expert chosen to provide the knowledge must be well known and have a good reputation in the field. The more popular and respected the expert is, the more likely it is that others accept the system with less doubt. The credibility of the expert has been found to have a profound impact on the acceptance level of potential users; it thus influences the purchase of the system and the success of its implementation in the organization.
In the previous modules we stated that other experts produce test cases and evaluate the ES. It is quite possible that they do not agree that the primary expert’s approach is the best, but they must agree that it is one of the acceptable methods.
Defining the scope
Determining the scope of an ES project is a non-trivial job. All AI problems are larger, sometimes incredibly larger, than they appear at first sight. There are a few guidelines for determining the scope of the problem.
1. Check how many rules are executed, or knowledge components accessed, to answer a particular query. One needs to build that many rules for a typical query. With a rough estimate of the number of queries, it is possible to estimate the rules or knowledge components to be developed, which helps in determining the scope of the project. Another heuristic is to check how much time the expert takes to answer the query: the more time he takes, the more knowledge chunks are likely to be accessed. One may also check how many records the expert accesses on average for a given case, and how much information is required to take and/or confirm a decision.
2. The ES should be built for a narrow domain. For example, one might develop an ES that checks for a typical set of attacks and not all possible attacks. It is usually better to start with a narrow domain and then extend the ES further. It is easier to decide the scope if the domain is narrow.
3. The ES should deal with a well-bounded problem. That means it must be clear to all stakeholders what the ES is going to do and what it is not going to do. Unless such boundaries are defined, it is impossible to decide the scope of the problem.
4. The ES should be designed along the lines of the training provided in that field, which gives a clear estimate of the scope. The ES should cover what is normally covered in such training and should not cover what is not, at least in the first version.
5. Though the ES may be only a component of the complete system, it is good to design it in a modular form so that it can be used in other systems. The current trend is to build the ES as a collection of classes, so that it can be easily used and extended to build other systems.
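The first guideline amounts to a back-of-the-envelope calculation, sketched below. The function name, the `shared_fraction` parameter (an allowance for rules reused across query types) and the numbers are assumptions made for this illustration, not part of any established estimation method.

```python
def estimate_rule_count(query_types: int, rules_per_query: float,
                        shared_fraction: float = 0.0) -> int:
    """Rough scope estimate: rules needed for the expected query types.

    shared_fraction is a hypothetical allowance for rules that several
    query types have in common (0.0 means no rule is shared).
    """
    if query_types < 0 or rules_per_query < 0:
        raise ValueError("counts must be non-negative")
    if not 0.0 <= shared_fraction < 1.0:
        raise ValueError("shared_fraction must be in [0, 1)")
    return round(query_types * rules_per_query * (1.0 - shared_fraction))


# Made-up figures: 40 kinds of query, about 12 rules fired per query.
print(estimate_rule_count(40, 12))                        # no sharing assumed
print(estimate_rule_count(40, 12, shared_fraction=0.25))  # a quarter of the rules shared
```

Such an estimate is deliberately crude; its value lies in exposing early whether the project needs hundreds or tens of thousands of rules.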
Economic feasibility
The major cost of building an ES is the expert’s and the KE’s time. The major benefits are the additional income generated when the ES makes it possible to address problems that previously remained unsolved, increased user satisfaction (for example Siri), and preservation of expert knowledge that helps the institution sustain itself. In fact, most of the benefits are quite intangible, so it is really hard to assess the exact economic feasibility, but a heuristic is commonly used. One must determine the breakeven point for the ES. For example, if the breakeven point arrives at 30 to 40% of ES efficiency over a period that is half the estimated lifetime of the ES, it is safe to assume that the ES will not lead to a loss. This type of analysis is performed for all candidates, and the comparison is used for further selection.
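One possible reading of the breakeven heuristic can be written down as a small calculation. The sketch assumes that "efficiency" means the fraction of the expected yearly benefit that actually materialises and that the cost is a single up-front amount; both are simplifying assumptions of this illustration, and all figures are invented.

```python
def breakeven_years(cost: float, full_benefit_per_year: float, efficiency: float) -> float:
    """Years needed to recover the cost when only a fraction `efficiency`
    (between 0 and 1) of the expected yearly benefit materialises."""
    if cost < 0 or full_benefit_per_year <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("invalid input")
    return cost / (full_benefit_per_year * efficiency)


def passes_heuristic(cost: float, full_benefit_per_year: float,
                     lifetime_years: float, efficiency: float = 0.3) -> bool:
    """Assumed reading of the heuristic: at 30% efficiency the ES must
    break even within half of its estimated lifetime."""
    return breakeven_years(cost, full_benefit_per_year, efficiency) <= lifetime_years / 2


# Invented figures: cost 300000, expected benefit 400000 per year at full efficiency.
print(breakeven_years(300_000, 400_000, 0.3))              # about 2.5 years
print(passes_heuristic(300_000, 400_000, lifetime_years=8))
print(passes_heuristic(300_000, 400_000, lifetime_years=4))
```

Running the same calculation for every candidate gives comparable numbers, which is what the selection step needs even when the benefit figures themselves are rough guesses.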
Final Selection
The final selection is made on the basis of the candidate analysis. Tasks that are infeasible or not guaranteed to reach the breakeven point may be discarded, and the others are kept. This list is more or less final. This is the last step of the Identification stage. The next stage begins with prototype construction. The idea behind prototype construction is to get more insight into each candidate’s feasibility and implementation-related issues. Sometimes tasks are dropped even after the final selection is made, if prototypes reveal something that was not apparent at the time of candidate analysis. We will discuss prototype construction and the rest of the process in the next module. Let us reiterate that the system is to be developed completely: the part that is a candidate for a conventional system rather than an ES is solved using conventional methods, and the manual part that is too hard for an ES to manage is done manually.
Summary
We began by discussing the challenges that an expert system development process faces from a software engineering outlook. As an ES is different from a conventional system, we need to augment whatever method is used for its development to handle typical issues such as the selection of the expert and how the knowledge engineer processes the information. We listed six steps of ES development and discussed the first two in this module. We have seen that the expert’s availability is a major concern and that the rapid prototyping method is better suited to ES development. The problem is surveyed completely to assess whether one really needs to develop an ES or whether a conventional solution will do. Each candidate is then analysed minutely, on many criteria, to assess its feasibility. The final selection is made once the complete feasibility, including economic feasibility, has been worked out and the solution is confirmed to be cost-effective. We have also seen that a chosen problem should be neither so simple that a conventional system can solve it nor so hard that even an ES cannot.