24 Classification and Indexing
Dr M P Satija
1. Introduction:
Classification has always been viewed as an act of grouping, or as a tool for the shelf arrangement of documents. Indexing has popularly, though erroneously, been associated with the alphabetical arrangement of terms, concepts and names. Indeed, an index is mostly alphabetical, though other kinds, such as numerical, formula and chronological indexes, are also available. If classification is a tool for organization, then indexing is an aid in retrieval. B.C. Vickery seminally defines an index as “Known names in known order”. Classification too has an order, which is systematic. There is therefore something common between the two: not only order, but also the purpose of information organization and retrieval in general. The two seem synonymous, or two sides of the same coin, and both are therefore best known by the common name Controlled Vocabulary. The purpose of this module is to understand the use of classification in designing indexes for information retrieval.
1.1. Problems of Natural Language Indexing
Certainly the natural language approach to indexing can produce relevant and apt results, that is, a good match between the users’ perception of their subject and the manner in which it has been dealt with by the author. This is only possible if both author and user have described it in the same terms and have been on the same wavelength. The average likelihood that any two people will use the same term for a concept or a book, or that a searcher and an information system will use the same term for a concept is in the range of 10-20%, says Marcia Bates. Also the natural language approach takes no account of the synonyms, the homographs and the semantic relationships between concepts, and does not ensure high recall and precision. Therefore a sort of control is needed on natural language when used for knowledge organization and representation.
2. Use of Classification in Indexing
According to Professor B.C. Vickery (1918-2009), a great authority on information retrieval, classification involves one or both of two acts:
(a) grouping and division: putting together entities that are alike in some way, and keeping apart those that are unlike;
(b) arranging the entities of a group in a meaningful sequence, that is arranging the groups in some logical way.
2.1. Modes of using classification
In the left-hand list of index terms below, no classification has been introduced. By the use of inverted headings, five alphabetically arranged groups emerge, as shown in the right-hand column; within each group the further arrangement is alphabetical. The result is an alphabetico-classed index:
Without classification | With inverted headings |
Art paper | Binding, cloth |
Bevelled edges |     flexible |
Cartridge paper |     hand |
Cloth binding | Blocking, gold |
Flexible binding |     silver |
Gold blocking | Boards, paper |
Guillotined edges |     wooden |
Hand binding | Edges, bevelled |
Paper boards |     guillotined |
Rough edges |     rough |
Silver blocking | Paper, art |
Supercalendered paper |     cartridge |
Wooden boards |     supercalendered |
‘Gold blocking’ and ‘silver blocking’ are classed as types of blocking, and similarly with the other entries.
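The effect of inversion on the list above can be sketched in a few lines of Python. This is only an illustration: the function names `invert_heading` and `alphabetico_classed` are hypothetical, and the sketch assumes that the last word of each two-word heading is the class term, which holds for this list but not for headings in general.

```python
from collections import defaultdict

# The thirteen uninverted headings from the left-hand list.
TERMS = [
    "Art paper", "Bevelled edges", "Cartridge paper", "Cloth binding",
    "Flexible binding", "Gold blocking", "Guillotined edges", "Hand binding",
    "Paper boards", "Rough edges", "Silver blocking", "Supercalendered paper",
    "Wooden boards",
]

def invert_heading(term):
    """Turn 'Gold blocking' into ('Blocking', 'gold').

    Assumption: the last word is the class term and leads the entry.
    """
    qualifier, _, head = term.rpartition(" ")
    return head.capitalize(), qualifier.lower()

def alphabetico_classed(terms):
    """Group inverted headings by lead word, alphabetical at both levels."""
    groups = defaultdict(list)
    for term in terms:
        head, qualifier = invert_heading(term)
        groups[head].append(qualifier)
    return {head: sorted(quals) for head, quals in sorted(groups.items())}

INDEX = alphabetico_classed(TERMS)
for head, qualifiers in INDEX.items():
    print(head)
    for qualifier in qualifiers:
        print("    " + qualifier)
```

Running the sketch prints the five groups (Binding, Blocking, Boards, Edges, Paper) with their subheadings in alphabetical order, which is the grouping that the inverted headings produce.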
A comparative list of subject ‘names’ with and without classification is as follows:
Inversion to produce subheadings immediately forms a meaningful group on the subject of books. In all these examples, classificatory grouping is used best in column four. Meaningful sequences of subheadings and sub-subheadings may also on occasion be found, particularly in book indexes, as in the following example:
Sulphur,
    sources of supply
    use in gunpowder
           insulation
           matches
           vulcanization
The final aspect of classification found in alphabetical indexes and vocabularies is that which is implied by cross-references:
Aeronautics see also Aerodynamics; Airplanes; Civil aviation; Flight
Aviation see Aeronautics
The implication is that all the terms so linked form a meaningful group or class.
The three main ways in which classification is introduced into alphabetical indexes and vocabularies are thus (1) inversion of headings, (2) formation of subheadings, and (3) the use of cross-references. Classification has been used both consciously and unconsciously. The inversion of the names of compound subjects leads to the formation of classified groupings. However, inversion is not by itself a classificatory technique, whatever its effect may be. In considering a compound name such as agricultural chemistry, C.A. Cutter (1837-1903) was solely concerned with the way a searcher was most likely to look up a multi-word term.
2.2. The value of classification
Classificatory techniques are visible in inversion rules, categorical analysis, relational analysis and facet hierarchy. Have they any value in information retrieval? Would we not be better served by the mere alphabetical order of terms, asks B.C. Vickery. The advocates and users of classificatory principles in alphabetical indexing would answer on two counts. First, as soon as the ‘name’ of a topic is something more complex than a single word, problems of sequence arise which alphabetical order cannot solve, as there is no longer a universally known order. Second, there are few subjects in which terminology is so settled and so widely accepted that each topic has a single known name.
The first problem, of compound and multi-word names, immediately necessitates some principles of ordering, so as to achieve a consistency in indexing that will facilitate search. It has given rise to inversion rules and to categorical and relational analysis, which in turn create classificatory groupings in the index. The second problem, of standard or unique names, necessitates some principles of alternatives and interrelation, so as to achieve flexibility in the choice of search terms. It has given rise to the whole cross-reference structure and its elaboration, with the creation of classificatory groupings in the index vocabulary. Index entries constructed according to a consistent pattern are particularly important in retrieval systems that rely on visual search.
A sequence of ordered entries, however, produces an alphabetico-classed grouping in which a searcher may more readily browse to pin-point a topic to follow up.
3. Boolean searching and full text databases
The use of Boolean operators in online searching is now very common and widely accepted. AND, OR and NOT are known as Boolean operators, and this type of searching is referred to as Boolean searching. A search for ‘Muslims AND Punjabi’ would locate documents on Punjabi Muslims that had been indexed under both of these terms. ‘Muslims OR Punjabi’ will widen the search to retrieve a mixed bag of items on Punjabis of any religion and on Muslims of any mother tongue. ‘Muslims AND Punjabi NOT Pakistani’ will then exclude items also indexed under ‘Pakistani’, leaving, for example, items on Indian Punjabi Muslims. Each operator thus forms a group of documents, so the classificatory grouping is obvious.
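The grouping performed by the three operators can be illustrated with Python sets. The document numbers and postings below are invented for the illustration; only the three index terms come from the example above.

```python
# Hypothetical postings: index term -> set of document numbers indexed under it.
POSTINGS = {
    "Muslims": {1, 2, 3, 5},
    "Punjabi": {2, 3, 4},
    "Pakistani": {3, 6},
}

# AND: documents indexed under both terms (set intersection).
both = POSTINGS["Muslims"] & POSTINGS["Punjabi"]

# OR: documents indexed under either term (set union).
either = POSTINGS["Muslims"] | POSTINGS["Punjabi"]

# AND ... NOT: remove documents also indexed under the excluded term.
narrowed = both - POSTINGS["Pakistani"]

print(sorted(both))      # [2, 3]
print(sorted(either))    # [1, 2, 3, 4, 5]
print(sorted(narrowed))  # [2]
```

AND narrows the group, OR widens it, and NOT removes a subgroup, so each query amounts to forming a class of documents from the postings.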
3.1. Full text Searching
Today, many online information systems contain elaborate abstracts or the full text of articles, documents, books, etc. It is now possible to search a complete text, even a large work such as an entire encyclopedia, for a single word or term. This is “full text” searching. Because of this, some writers argue that a knowledge of classification is no longer necessary for the information worker. Many computer scientists believe that the best way to search for information on the Web is by keyword searching.
But keyword searching often fails miserably, say librarians. Internet search engines, using full text retrieval techniques, can rapidly search the growing volumes of information created and stored electronically. Though these search engines are capable of handling complex search strategies with Boolean operators, the output contains more chaff than grain. It may be emphasized that even Boolean searching involves elements of classification.
Though the use of Boolean operators in online searching is popular, it is primitive or basic. Boolean search syntax and retrieval techniques are not very effective in terms of recall and relevance ratios, nor are they very usable or efficient search methods for end-users. Serious searchers need a flexible, rich, contextual subject search and browsing mode which offers plenty of options for navigation. Classification provides the context in keyword searching for better precision.
A user interested in the church as a physical entity might enter the more general, broader term ‘Buildings’ or perhaps ‘Architecture’. Clearly, this type of searching makes use of classification and, where a hierarchical relationship such as this is concerned, the diagrammatic representation would be very different from the Boolean diagram shown above. It would appear thus:
This sort of relationship is therefore not conducive to Boolean type searches but requires some more explicit form of hierarchical classificatory facility.
3.2. Truncation method
The computer has given us a wonderfully flexible capability that offers search possibilities which were previously impossible, and we must make full use of all of these facilities. The development of computer searching of word indexes has led, inter alia, to a new form of grouping: bringing together a set of words all of which include the same sequence of letters. This technique is known as truncation. It is a device that allows searching on word stems, e.g. a search for ‘Comput*’ would find ‘Computer’, ‘Computers’, ‘Computing’, ‘Computerization’ and ‘Computerisation’. A search for *CELL* (where the asterisks mean ‘irrespective of what letters appear in the word at this point’) collects together entries containing such words as:
Cell
Cells
Cellular
Cellulose
Cellophane
Celluloid
Cellar
Cello
Cellist
Micelle
Hemicellulose
Unicellular
Cancellation
The aim is to produce a meaningful group, and the art of truncation is to achieve this aim with a minimum of irrelevant matches.
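Truncation searching of this kind can be sketched with the wildcard matcher in Python's standard `fnmatch` module. The function name `truncation_search` and the small vocabulary are assumptions made for the illustration; a real retrieval system would match against its own index file.

```python
from fnmatch import fnmatchcase

# A small sample vocabulary; 'Chancel' is included as a near miss.
VOCABULARY = [
    "Cell", "Cells", "Cellular", "Cellar", "Cello", "Micelle",
    "Unicellular", "Cancellation", "Computer", "Computing", "Compute",
    "Chancel",
]

def truncation_search(pattern, vocabulary):
    """Return the words matching a pattern in which '*' stands for any letters.

    Matching ignores case by lower-casing both the pattern and each word.
    """
    wanted = pattern.lower()
    return [word for word in vocabulary if fnmatchcase(word.lower(), wanted)]

print(truncation_search("Comput*", VOCABULARY))  # right-hand truncation
print(truncation_search("*CELL*", VOCABULARY))   # truncation on both sides
print(truncation_search("cell*", VOCABULARY))    # fewer irrelevant matches
```

Comparing the last two searches shows the trade-off described above: `*CELL*` also gathers ‘Cancellation’ and ‘Micelle’, while `cell*` drops them at the cost of missing ‘Unicellular’.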
3.3. Truncating Class Numbers
Some of the other special facilities, introduced to improve alphabetical term searching in computerized systems, can be adapted for use with classification numbers. The same device can also be used on classification numbers as a means of broadening a search. In the previous example of ‘Churches’ the DDC number for this topic is 726.5. Truncation would allow the search to be widened progressively. Searching for:
726.5 ‘Churches’ (Buildings associated with Christianity)
726 ‘Religious buildings’ [Architecture]
72 ‘Architecture’
7 ‘Arts’
Thus, if a classification scheme is hierarchical and the notation is expressive, the search can obviously be broadened, narrowed or widened in a manner explained above for word truncation.
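Progressive truncation of a class number can be sketched as follows. The functions `broaden` and `search_by_stem` are hypothetical names, and the list of holdings is invented; the sketch assumes a purely decimal, expressive notation such as that of the DDC, where dropping the last digit always yields the containing class.

```python
def broaden(class_number, min_digits=1):
    """List a DDC-style number followed by its successively broader stems."""
    digits = class_number.replace(".", "")
    steps = []
    for n in range(len(digits), min_digits - 1, -1):
        stem = digits[:n]
        # Restore the decimal point after the third digit where needed.
        steps.append(stem[:3] + "." + stem[3:] if n > 3 else stem)
    return steps

def search_by_stem(stem, holdings):
    """Return every class number in holdings that begins with the stem."""
    key = stem.replace(".", "")
    return [num for num in holdings if num.replace(".", "").startswith(key)]

# Hypothetical class numbers assigned to items in a small collection.
HOLDINGS = ["726.5", "726.1", "725.2", "738.1"]

for stem in broaden("726.5"):
    print(stem, search_by_stem(stem, HOLDINGS))
```

Each step of the loop retrieves at least as many items as the one before it, which is the widening effect obtained by truncating 726.5 to 726, 72 and 7.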
4. Lists of Subject Headings
Library of Congress Subject Headings consists of an alphabetical listing of subject headings, cross-references and subdivisions (a total of 280,000 in the 30th edition 2007). The printed edition (known as the ‘red books’) is in five volumes. The list is also available online via ‘Classification Web’. Also available online is Library of Congress Authorities, which enables one to search, browse and view authorities for subject and other headings from the Library of Congress online catalogue. In 1995 the British Library resorted to Library of Congress Subject Headings for its BNB records as the means of controlled language subject access. The printed subject index to the BNB also uses Library of Congress headings and BNB software arranges the entries producing a lead term followed by the strings in which it occurs, for example:
Minorities | Civil rights | 305.488 |
Minorities | Education | 370.117 |
Minorities | Legal status, laws, etc. | 364.089 |
Minorities | United States. Political activity | 327.73 |
It is necessary to consult the classified section to ascertain the context of the various distributed relatives as the following entries illustrate:
Business ethics | |
Business ethics | 174.4 |
Business ethics | 303.34 |
Business ethics | 657.45 |
Business ethics | 658.408 |
The arrangement of the BNB is thus a classified sequence with alphabetical indexes. Such a system is designed to cater for both an alphabetical subject approach and a classified approach. (www.bl.uk/services/bsds/nbs/subject.html)
5. Classified Catalogues
Being in two parts, the traditional, manual, classified catalogue offers the ability to look in an alphabetical subject index for the classification number for a particular subject and then to go to that number in a classified sequence to find relevant items. This is an efficient way to find all the items dealing with a specific subject and also facilitates browsing for related coordinate, subordinate and superordinate subjects. OPACs (Online Public Access Catalogues) are now the norm, giving the user single-window online access to the holdings of the library. However, in the early stages, many researchers concluded that the online catalogue, despite its numerous virtues, had not improved subject access. Now libraries have discovered that the original classified catalogue arrangement offers a useful methodology for the online catalogue. Under the onslaught of the dictionary catalogue and keyword searches, the classified catalogue approach has been almost forgotten by contemporary librarians. Searches by author, title and keywords are now commonplace, with more advanced searches by subject, class number, or other access points possible in many cases. However, the ability to browse ‘up and down’ a classified list of items is rarely offered, although recent developments indicate that this is changing as libraries seek to enhance the capabilities of their online catalogues. A library catalogue must allow such browsing (Eric Hunter).
5.1 British example:
Eric Hunter further writes: The University of Liverpool Library uses the Library of Congress Classification and Library of Congress Subject Headings. Searches can be carried out using either of these tools. When searching by class number, the user enters as much of the class number as is known and the system displays the section of the catalogue around that number. A search for TH6000, for example, will result in a section of the catalogue being displayed as shown below:
TH5667.B63
Stairs / Alan and Sylvia Blanc | 2001 |
TH5667.H11
Stairs design and construction/ Karl J. Habermann | 2003 |
Your entry TH6000 would be here
TH6010.G7.E51
Co-ordination and components in housing : the dimensional framework and component size, addendum | 1970 |
TH6010.H17
Essential building services and equipment / F. Hall etc. | 1995 |
The user can browse through the displayed records and can scroll up and down the classified listing using a ‘Prev / Next’ page facility. This example demonstrates the value of using a classified approach to improve access. Some other libraries have also developed a ‘classification browse’ feature making use of a library classification system. Though the classified approach is not the only way to approach an online catalogue, to ignore classificatory techniques is to ignore one of the most powerful access tools that we possess.
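A ‘classification browse’ of this kind can be sketched with a binary search over a sorted list of class marks. This is not a description of the Liverpool system: the function name `browse` is hypothetical, and plain string comparison is used as a stand-in for proper Library of Congress filing rules, which are more involved.

```python
from bisect import bisect_left

# Class marks from the display above, kept in sorted (shelf) order.
SHELF = sorted([
    "TH5667.B63",
    "TH5667.H11",
    "TH6010.G7.E51",
    "TH6010.H17",
])

def browse(shelf, entry, window=2):
    """Return the class marks just before and just after the entered number.

    Assumption: simple string order approximates the classified sequence.
    """
    position = bisect_left(shelf, entry)
    before = shelf[max(0, position - window):position]
    after = shelf[position:position + window]
    return before, after

before, after = browse(SHELF, "TH6000")
for mark in before:
    print(mark)
print("Your entry TH6000 would be here")
for mark in after:
    print(mark)
```

Because the entered number need not exist in the catalogue, the search finds the point where it would file and shows its neighbours, which is what lets a user enter a partial class number and browse from there.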
5.2. Using class number as a search reference
It is now becoming more widely accepted that a greater use of classification in searching could lead to an improvement in search technique and in system efficiency. The value of classificatory techniques in alphabetically based retrieval systems is established both by logic and by their use in practice. The alphabet is indeed a basic precision tool but it suffers from many problems unless augmented by classificatory support.
Using the Dewey Decimal Classification as an example, a record for a bibliographic item might contain the following fields:
Class number 623.89
Dewey subject Nautical engineering and seamanship —
Navigation — Selection and determination of course
Dewey index Navigation — Technology – Seamanship
Author Taylor
Title The geometrical seaman
The user would be able to conduct a search in various ways:
1 By class number, for example: 623.89
2 By pre-coordinate phrases from the alphabetical subject fields, for example: Nautical engineering
3 By ‘keyword’ from the alphabetical subject fields, for example: Navigation, or Seamanship
The latter facility would permit users to search systematically and receive interactive direction from the system as to areas of the classification where there are items matching the entered keywords. In addition the user could browse backwards and forwards through the index to the system, for example:
Navigation — aquatic sports | 797.1 |
Navigation — maritime transportation | 623.89 |
Navigation — maritime transportation — law | 343.096 6 |
Navigation — space flight | 623.89 |
Navigation aids | 387.155 |
Navigators | 629.045092 |
This would assist in the identification of related topics that might also be of use. Clearly, classification can play an important role in indexing methods and retrieving information from databases.
6. Use of authority lists and thesauri
When terms are taken directly from the text of a document in searching or indexing, a thesaurus or an alphabetical subject heading list may still assist in the search process. Alphabetical lists of subject headings, or thesauri, showing related terms are useful in this respect, and these tools may therefore be used not only for indexing but also for searching. Consider the following entry from the Royal College of Nursing (RCN) Library Thesaurus of Nursing Terms (2007):
FERTILISATION
UF Human fertilization
BT Reproduction
NT Assisted conception
RT Fertility
This entry indicates that ‘Fertilisation’ is used as an indexing term and also that the search may be broadened (via ‘Reproduction’), narrowed (via ‘Assisted conception’), or widened coordinately (via ‘Fertility’).
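The way such an entry supports searching can be sketched with a small dictionary keyed by preferred term. The data structure and the function names `preferred_term` and `suggest` are assumptions made for the illustration; only the entry itself is taken from the example above.

```python
# One thesaurus entry, keyed by preferred term, with the usual relations:
# UF = used for, BT = broader term, NT = narrower term, RT = related term.
THESAURUS = {
    "Fertilisation": {
        "UF": ["Human fertilization"],
        "BT": ["Reproduction"],
        "NT": ["Assisted conception"],
        "RT": ["Fertility"],
    },
}

def preferred_term(term, thesaurus):
    """Map a searcher's term to the indexing term, following UF references."""
    for heading, entry in thesaurus.items():
        if term == heading or term in entry.get("UF", []):
            return heading
    return None

def suggest(term, relation, thesaurus):
    """Offer terms for broadening (BT), narrowing (NT) or widening (RT)."""
    heading = preferred_term(term, thesaurus)
    if heading is None:
        return []
    return thesaurus[heading].get(relation, [])

print(preferred_term("Human fertilization", THESAURUS))  # Fertilisation
print(suggest("Human fertilization", "BT", THESAURUS))   # ['Reproduction']
print(suggest("Fertilisation", "NT", THESAURUS))         # ['Assisted conception']
```

The lookup first controls the vocabulary by replacing a non-preferred term with the indexing term, and only then offers the hierarchical and associative moves.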
6.1. Using a thesaurus for searching online catalogues
Eric Hunter provides another example, the online African Studies Thesaurus, of the use of a thesaurus in online catalogue searching. This is a structured vocabulary of 12,100 (2008) English terms in the field of African studies, developed and maintained by staff at the Library of the African Studies Centre, Leiden, the Netherlands. It is used for indexing and retrieving material and is directly linked to the catalogue. If, for instance, one searches in this thesaurus for ‘dairy industry’, the following entry appears:
dairy industry
Search catalogue
Used for
dairy products industry
Broader terms
food industry
Narrower terms
cheese industry
Related terms
dairy farms
milk
If ‘dairy industry’ satisfies the user’s requirement, then a search of the catalogue can be done immediately but, if one of the other terms is more appropriate, then the examination of the thesaurus can continue. If the term ‘milk’ is more suitable, selecting that term will produce:
Milk
Search catalogue
Broader terms
Beverages
Related terms
Dairy industry
Selecting Search catalogue at this point will reveal a list of items dealing with the subject ‘milk’ which may be of relevance. When the record which describes an item is retrieved, it will include details such as author, title, publisher, date, series (if any), and a class number.
6.2. Thesauri construction
Automation has resulted in an ever-increasing need for authoritative and standardized vocabularies. Thesauri are an attempt to provide guidance on the terms which should be used both in indexing an item and in its retrieval from an information store. The International Standard ISO 2788-1986 describes the distinction between the syntactical (or a posteriori) and thesaural (or a priori) relationships. The thesaural, or a priori, semantic relationship, this standard states, ‘adds a second dimension to an indexing language’, and ‘the effectiveness of a subject index as a means of identifying and retrieving documents’ in any system used to store and manipulate terms, or to identify documents associated with terms, depends upon a well-constructed indexing language (International Organization for Standardization, 1986). Thesauri control the natural language of the enquirer and attempt to overcome the complexities of semantic expression and the existence of synonyms, homonyms, differences in spelling, word forms and so on by conveying to the searcher the terms used by the indexer to describe a concept. Thesaurus construction has much in common with classification. A thesaurus, like a classification scheme, imposes order on a subject and displays the structure and multidimensional relationships within that subject. It normally shows the scope of each term and the context in which it is to be applied. It will also direct users to the preferred terms and provide references from those which are not to be used. The Thesaurus of Engineering and Scientific Terms provides ‘a list of engineering and related scientific terms and their relationships for use as a vocabulary reference in indexing and retrieving technical information’. It directs users from unused terms to the preferred term:
Merchant marine
Use Merchant navy
and then displays synonymous terms which have not been used, related terms, broader terms and narrower terms. Such structuring is very helpful for the searcher in moving up and down hierarchies where necessary. The abbreviation UF (‘used for’) signifies the concepts for which the heading has been used:
Insurance
UF Life Insurance
Bookmarks
UF Book-marks
Precoordination is evident in these phrases and inversions, as it is in the choice of entry form in any multiple term combination. The thesaurus also provides a list of (auxiliary) common subdivisions, such as ‘manufacturers’, ‘textbook’ or ‘research’, which can be used to subdivide any heading, in a manner similar to the use of common isolates or auxiliaries in a classification scheme. Here too, the concept of precoordination is being applied, in that these concepts are deemed secondary in the search for information. If anyone is interested in manufacturers of automobiles, then the thesaurus determines that the subject term which should be sought is ‘automobiles’ and then the subheading ‘manufacturers’. Such choices are made all the time in indexes, as in classification.
6.3. Thesaurofacet
In thesauri there has been a move away from the purely alphabetical, specific, single-heading approach towards some systematic clustering of concepts. A trend has emerged of combining classification and thesaurus, and these developments are probably best exemplified by the Thesaurofacet, which grew out of the English Electric Company’s faceted classification for engineering. Associated chiefly with Jean Aitchison, the work is a faceted classification with a fully structured thesaurus as its index. Such thesauri also provide class numbers for use in subject retrieval and therefore fully combine both approaches. Subject Headings for Engineering (SHE), produced by the publishers of the Engineering Index and of Compendex, its electronic equivalent, provides a list of headings with scope notes and their class numbers. It offers a tool which can be used in pre-coordinate and post-coordinate indexing. The searcher can find from the thesaurus the class number for their subject and then proceed to the classified sequence, where the hierarchy of the subject will be fully revealed. Alternatively, the searcher may approach the index for a specific subject and will there be assisted by the display of broader, narrower and related headings to which they are guided. The index of a thesaurofacet does not reproduce but complements the associations made in the classified sequence. Therefore, to find all material of value, both should be consulted. There are built-in links between the classification and the thesaurus, and this is an added advantage of such dual-purpose tools. Such systems attempt to provide both for shelf arrangement and for subject retrieval. In such schemes the principles of classification are, in many ways, applied to indexing method and vice versa. This is a healthy process, and the thesaurofacet is in fashion now.
6.4 Ontologies
An ontology is defined as a collection of concepts, arranged in a hierarchy of categories, combined with the relationships between those concepts, in order to reflect the vocabulary of an area of knowledge. Ontologies provide a powerful framework in which to make connections between ideas, literature, and disciplines, resulting in opportunities to deliver conceptual bodies of knowledge to people whose curiosity or business needs lead them to ask questions. Ontologies and semantics have the power to transform the future of knowledge management for librarians and the public. By contributing to the creation of ontologies used to aid semantic search, librarians can make the transition from their role as translators between the public and search engines to one where they contribute to the translation taking place inside the search engine, empowering users to ask their own questions in their own words. A dictionary defines concepts. A thesaurus lists words and their synonymous concepts. A taxonomy places concepts into a hierarchy. Ontologies combine elements of all three of these knowledge structures, defining concepts like a dictionary, establishing relationships like a thesaurus, and providing categorization for concepts like a taxonomy. An ontology allows for the exploitation of all the kinds of relationships our brains make automatically. Ontologies are being used to:
- Standardize vocabulary by publishing concepts, relationships and definitions provided by subject experts
- Provide better routes of exploration by organizing the information in a more precise, deep and complex way
- Provide better search results by using ontology-based search techniques and natural language processing. (King and Reinold, 2008)
7. Classificatory Principles underlying Indexing
Classificatory principles are inherent in traditional pre-coordinate and post-coordinate indexes and in thesauri, in order to control the process of subject retrieval and make it more effective. Pre-coordinate indexes replicate the problems associated with classification in general by forming a fixed order of citation of elements. They perform an additional function in supporting the classification and drawing attention to the alternative locations of a topic in the various disciplines as found in the classification. The effectiveness of an IR system is calculated mathematically in terms of recall and precision. Classificatory principles can be applied to enhance the effectiveness of indexing systems. As far as recall is concerned, classification collocates all materials on a specific topic, but it also scatters related aspects of the subject by discipline. It thus aids recall only for the dominant aspect of the subject. Full recall largely depends upon the depth and accuracy of indexing. A topic being sought may be scattered amongst a variety of disciplines. This can be overcome by the relative index. Classification also reveals and represents hierarchy, in that one is guided via the notation to more general or more specific works. The idea of linking more general and more specific terms in an index is therefore a useful one, which has been utilized in all types of indexing. In a list of subject headings we are instructed on the creation of references to be made from a subject to related and subordinate topics. For example:
Accounting
see also Auditing; Inventories, etc.
The headings in such an index should be specific entries, that is, they should correspond precisely to the subject content of the document. Such links can be most useful in guiding the user, particularly the uninitiated, as, for example, when an enquirer is searching for information on acupuncture and some of the most valuable material may be found in works on holistic medicine. Certain indexes adopt this approach and would provide links thus:
Acupuncture
see also Holistic medicine
7.1. Chain Indexing
A ‘chain’ enshrines the classification hierarchy for a subject, going down from the general to the more specific. Such a chain can provide a significant aid in the search process, not only through the use of classification numbers but also by means of alphabetical entries derived from the class number by the ‘chain procedure’ method invented by Ranganathan. This method can be applied not only to hierarchical classification with expressive notation but also to faceted classification and non-expressive notation. For example, using the Universal Decimal Classification, the classification number for the subject ‘Public health aspects of petroleum pollution of sea water’ would be:
614.777(26):665.6
This number clearly is not expressive but a hierarchical ‘chain’ can still be constructed, that is:
Alphabetical subject index entries can be produced from this chain by beginning with the last, or most specific ‘link’ and proceeding step by step back through the chain, qualifying where necessary by a more general term or terms to indicate the subject context:
Petroleum: Sea water pollution: Public health | 614.777(26):665.6 |
Oil: Sea water pollution: Public health | 614.777(26):665.6 |
Sea water pollution: Public health | 614.777(26) |
Water pollution: Public health | 614.777 |
Pollution: Public health | 614.7 |
Public health | 614 |
Medicine | 61 |
Petroleum: Economic geology | 553.982 |
Petroleum: Mining | 622.323 |
Petroleum: Sea water pollution: Public health | 614.777(26):665.6 |
Because of problems relating to terminology and ‘missing’ or ‘false’ links, this method is not purely mechanical but semi-mechanical, in that some adjustment to the chain, as derived from the classification, may be required. Nevertheless it does provide a means of producing a specific alphabetical entry for a subject, based upon the classification schedule used, which will indicate the context in which the subject is treated. Entries for related aspects of the same subject can also be pinpointed. Chain procedure has been applied in a number of information systems, especially in libraries and information services. The prime example of the successful use of chain procedure is probably the British National Bibliography (BNB). Works are arranged by the Dewey Decimal Classification, and name, title and subject indexes to this classified sequence are provided. Chain procedure was used to produce the printed subject index between 1950 and 1970. Let us take another example, from the DDC: 615.892 Acupuncture:
Medicine | 610 |
Therapeutics | 615.5 |
Specific therapies | 615.8 |
Other therapies | 615.89 |
Acupuncture | 615.892 |
At each stage one would consider the significance of the concept introduced as the verbal equivalent of the last digit. Some of the above links have no subject meaning, such as ‘Other therapies’, which would therefore be ignored as unsought headings in constructing the index. Synonyms would be sought for all of the terms, which should be included in the index and the result would be a number of entries, filed alphabetically:
Acupuncture: Therapeutics: Medicine | 615.892 |
Acupuncture: Holistic medicine: Medicine | 615.892 |
Therapeutics: Medicine | 615.5 |
Holistic medicine: Medicine | 615.5 |
Alternative medicine: Medicine | 615.5 (synonymous term) |
The searcher who has sought ‘acupuncture’ as the entry point to the index would therefore be guided by the form of the entry to holistic medicine as a broader class.
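The mechanical part of chain procedure can be sketched in Python using the acupuncture chain. The function name `chain_index` and the sought/unsought flags are assumptions made for the illustration: ‘Other therapies’ is unsought as stated above, and ‘Specific therapies’ is also flagged as unsought here so that the output matches the entries shown. Adding synonyms such as ‘Holistic medicine’ remains an indexer's decision and is not attempted.

```python
# The DDC chain for acupuncture: (class number, term, sought?).
CHAIN = [
    ("610", "Medicine", True),
    ("615.5", "Therapeutics", True),
    ("615.8", "Specific therapies", False),  # assumed unsought
    ("615.89", "Other therapies", False),    # unsought link
    ("615.892", "Acupuncture", True),
]

def chain_index(chain):
    """Derive index entries from a chain, most specific link first.

    Each sought term leads one entry and is qualified by the broader
    sought terms above it; unsought links are skipped.
    """
    sought = [(number, term) for number, term, is_sought in chain if is_sought]
    entries = []
    for i in range(len(sought) - 1, -1, -1):
        number = sought[i][0]
        heading = ": ".join(term for _, term in reversed(sought[: i + 1]))
        entries.append((heading, number))
    return sorted(entries)  # file the entries alphabetically

ENTRIES = chain_index(CHAIN)
for heading, number in ENTRIES:
    print(heading + " | " + number + " |")
```

The sketch produces one entry per sought link, each carrying its broader context, which is why the procedure is described as semi-mechanical: the derivation is automatic once the indexer has judged which links are sought.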
7.2. PRECIS
When the British Library decided to produce the BNB by computerized methods, chain indexing was replaced in 1971 by PRECIS, invented by Derek Austin (1921-2001). PRECIS was independent of any particular classification scheme but, nevertheless, it had its roots in classification research and was founded on classification principles. PRECIS is an acronym for PREserved Context Index System, and this conveys the intention of allowing a user to enter an alphabetical subject index at any one of the significant terms which together make up a compound subject statement, and to establish at that point the full context of the subject which contains the selected term.
The principles of categorical analysis and relational operators were combined and applied to alphabetical indexing. The process starts with a natural language phrase expressing a topic, for example:
(a) Plating the spokes of bicycle wheels
(b) DDT for controlling insect diseases in roses
First, each term is categorized as ‘entity’ or ‘attribute’. An ‘entity’ is comparable to Coates’ ‘thing’, Kaiser’s ‘concrete’ or Ranganathan’s ‘personality’: it is defined as ‘a thing, whether concrete object or mental construction’. Grouping is first of all by entity. Attributes are the properties of entities, their activities, or the properties of activities. Next, among all the entities present in a phrase, it is necessary to decide which is the ‘focal concept’, that is, the idea in the subject which is to be regarded as its key, the direct concern of the author. Other concepts are regarded as non-focal ‘differences’. So in examples (a) and (b) above, the entities BICYCLES and ROSES may be considered as focal concepts.
In example (a), the focus of attention is ‘the spokes of bicycle wheels’. This must be further analysed by means of the ‘possessive’ or ‘thing-to-part’ relation into:
System: bicycles
Subsystem: wheels
Subsubsystem: spokes
and its components assembled in that sequence. The remaining term, PLATING, is an attribute, and the complete headings are:
BICYCLES. WHEELS. SPOKES. PLATING
ROSES. DISEASES. INSECTS. CONTROL. DDT
The net effect of this detailed categorical analysis is to form index headings grouped by the principal systems of focal entities, with sub-grouping determined by relational analysis; it is comparable to facet analysis. In all these principles of alphabetical indexing developed by Coates, Farradane and Austin, a limited classificatory effect is achieved by methods similar to grouping in faceted classification. The last stage in designing alphabetical indexes is to prepare cross-references, which have come to their full flower in thesauri. These days thesaural formats tend to replace the traditional subject heading lists.
Aircraft
Engines. Design 629.1343532
Similar entries would be made under ‘Engines’ and ‘Design’. In addition, the system provided for the automatic production of references to link synonymous or related terms, designed to limit the number of entries necessary and to cater for the preferred approach of the user. The idea of PRECIS is that whichever term a user employs to approach a subject, when the index is consulted the searcher will find that term accompanied by a kind of précis or summary of the context in which the term has been dealt with by the author of the document located.
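How the entries under ‘Aircraft’, ‘Engines’ and ‘Design’ relate to one another can be sketched as follows. The sketch assumes the commonly described two-line PRECIS layout, in which the lead term is followed by a qualifier (the wider context, nearest term first) and a display line (the narrower context); the function names are hypothetical, and role operators, differencing and the automatic references are deliberately left out.

```python
def precis_entries(string):
    """Rotate a context-ordered string so that every term leads once.

    Returns (lead, qualifier, display) triples: the qualifier holds the
    wider context in reverse order, the display the narrower context.
    """
    entries = []
    for i, lead in enumerate(string):
        qualifier = list(reversed(string[:i]))  # wider context, nearest first
        display = string[i + 1:]                # narrower context, in order
        entries.append((lead, qualifier, display))
    return entries

def format_entry(lead, qualifier, display):
    """Lay out one entry: lead and qualifier on line 1, display on line 2."""
    first = ". ".join([lead] + qualifier)
    if not display:
        return first
    return first + "\n  " + ". ".join(display)

for entry in precis_entries(["Aircraft", "Engines", "Design"]):
    print(format_entry(*entry))
```

Whichever term is used as the entry point, the rest of the string travels with it, so the full context is kept at every entry.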
7.3 COMPASS
A simplified and cost-effective system, COMPASS (the Computer Aided Subject System), was developed and introduced to replace PRECIS in 1990. Although simplified, COMPASS entries retained the essential appearance of PRECIS entries. COMPASS was basically a simpler and less labour-intensive method of creating an index. The resultant entries produced a simplified subject description, which was all that was felt necessary for subject searching. COMPASS retained some of the special features of PRECIS, such as role operators, and allowed meaningful access to the records available in BNB. COMPASS was dropped in 1995 in favour of the LCSH.
7.4 POPSI
POPSI, an acronym for Postulate-based Permuted Subject Indexing, was invented by Professor G. Bhattacharyya (1936-2006) of DRTC, Bangalore. It was considered an improvement over Ranganathan’s chain indexing in solving the problem of the vanishing chain. Moreover, it is a system free of any classification scheme, and is based upon the General Theory of Subject Indexing Languages (GT/SIL). There are eight steps involved in working with the system, which make full use of Ranganathan’s theory of classification, especially facet analysis and the principles of facet sequence. POPSI has been used for formulating verbal terms which may be used in subject headings or for other indexing purposes. It can also be used for determining subject index entries for a classified catalogue and for preparing indexes to books. POPSI was described as fully amenable to computerization. It was projected as an all-purpose indexing procedure so far as information retrieval is concerned, though not much further work has been done and there is no record of its practical use anywhere.
Classificatory principles are very much in evidence here, in that the specific entry is shown in its hierarchical relation to the other elements in the subject of the document. PRECIS and POPSI also necessitate the concept analysis of documents in a manner very familiar to the classifier. However, unlike chain indexing, they are not linked to any classification scheme. Role operators are used to indicate meaning by the interpolation of semantic links, such as ‘of’, into strings which would otherwise be capable of misinterpretation.
7.5. SLIC (Selective Listing in Combination) Indexing
SLIC, devised by the British librarian J.R. Sharp, is an attempt to provide for all possible pre-coordinate approaches to the index. Its indexing identifies all the possible subject combinations in a document and produces as headings or entries a selection of these, eliminating those already contained in a larger grouping. It removes all general-to-specific entries. Also removed are entries which are repetitive and unrevealing. Finally, SLIC is designed as an index of combinations rather than permutations, in alphabetical A/Z order, so that composites can easily be found by the user. In such an index a citation order is needed, and it may be some subject-significant citation order such as we are familiar with in classification. In this way a subject such as “welding aluminum cans”, whose three terms could yield as many as 15 permuted entries, will have only the following four entries:
Aluminum: cans: welding
Aluminum: welding
Cans: welding
Welding
It is much more economical and supports the classification on the shelves by bringing together aspects of aluminum. The SLIC index is a form of index which can very readily be generated by a computer.
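The selection rule described above lends itself to a short sketch. The following assumes an alphabetical citation order and interprets "already found in a larger grouping" as "forms the beginning of a longer combination"; the function name `slic_entries` is hypothetical.

```python
from itertools import combinations

def slic_entries(terms):
    """Return the SLIC headings for a set of terms.

    All combinations are formed in one fixed citation order (alphabetical
    here, which is an assumption), and any combination that merely starts
    a longer combination is dropped.
    """
    ordered = sorted(terms, key=str.lower)
    combos = [c for r in range(1, len(ordered) + 1)
              for c in combinations(ordered, r)]
    kept = [c for c in combos
            if not any(len(o) > len(c) and o[:len(c)] == c for o in combos)]
    return sorted(kept)

for entry in slic_entries(["welding", "aluminum", "cans"]):
    print(": ".join(entry).capitalize())
```

For three terms the seven possible combinations reduce to the four entries listed above; in general n terms give 2^(n-1) entries, since every surviving combination ends in the last term of the citation order.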
7.6. Title Indexing
Title indexes using techniques such as KWIC (Keyword in Context) automatically generate entries from the titles of documents. They economize on effort and can be effective when dealing with large numbers of descriptive article titles; production is relatively speedy and largely non-intellectual. The terms used are also likely to represent current usage and terminology, although there will be no consistency overall, and all synonyms will have to be sought individually by the searcher.
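A minimal sketch of KWIC generation is given below. The stop list, the column width and the sample titles are all assumptions made for the illustration; a production system would use a much longer stop list and more careful tokenization.

```python
# Hypothetical stop list; a real system would use a much longer one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def kwic(titles, width=30):
    """Return (keyword, display line) pairs filed by keyword.

    Every significant word of every title becomes an entry point, shown
    in a fixed column with the rest of the title around it as context.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,:;?!").lower()
            if not key or key in STOPWORDS:
                continue
            left = " ".join(words[:i])[-width:]   # context before the keyword
            right = " ".join(words[i:])[:width]   # keyword and what follows
            entries.append((key, f"{left:>{width}}  {right}"))
    return sorted(entries)

for key, line in kwic(["Classification in indexing", "Indexing of titles"]):
    print(line)
```

No vocabulary control is applied: the two titles are brought together under ‘indexing’ only because both happen to use that word, which is exactly the limitation noted above.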
8. Other uses
Classification has other miscellaneous uses in the field of indexing. A formal classification can guide the compiler of an index or thesaurus to the terms to be used, and can assist in displaying the structure, and intra-relation of the subject field. Indexing systems and classifications are both attempts to provide vocabularies for a subject, whether that vocabulary is represented as a term or as a piece of notation, and as such they attempt to control the freedom and unpredictability of natural language.
8.1. Control Devices in Indexing
Three forms of control which may be applied in indexes are role indicators, linking and weighting devices.
8.1.1 Role Indicators
Role indicators in a sense classify a concept by showing the purpose or role for which the term is used. This is particularly necessary when dealing with terms which have a wide variety of different applications such as ‘therapy’, where the process may be found serving different functions such as the clinical, the social, the educational or the occupational. It may also be necessary to distinguish a concept at different stages of its evolution, for example, as a raw material or as an end product of manufacture. Here it is necessary that the index display the role or function of a term, rather than simply qualify the meaning of a term. For example:
Hamlet (Title)
Hamlet (Character)
Hamlet (A Novel)
8.1.2 Linking Devices
Linking devices show the nature of the association between a subject term and another term or terms to which it is related. The concept of the ‘History of philosophy’ is quite different from ‘Philosophy of history’, and an index could use linking devices to convey the difference in approach, a difference conveyed in natural language by semantic construction. If there are no linking devices between terms, then a post-coordinate search for a document “Dog bites a man” will also retrieve “Man bites a dog”. Linking devices seek to increase the relevance ratio: by explaining the nature of the relationship between the terms, they may add to the precision of retrieval.
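The false drop in the dog and man example can be shown with a toy model. Recording the link as an ordered (agent, action, object) triple is a simplification adopted here purely for illustration, and the document identifiers and function names are hypothetical.

```python
# Two documents indexed by the same three terms; the ordered triple
# (agent, action, object) stands in for a linking device (an assumption).
DOCS = {
    "d1": ("dog", "bites", "man"),
    "d2": ("man", "bites", "dog"),
}

def search_unlinked(query_terms):
    """Post-coordinate search on bare terms: term order is ignored."""
    wanted = set(query_terms)
    return sorted(d for d, terms in DOCS.items() if wanted <= set(terms))

def search_linked(agent, action, obj):
    """Search that respects the recorded relationship between terms."""
    return sorted(d for d, link in DOCS.items() if link == (agent, action, obj))

print(search_unlinked(["dog", "bites", "man"]))  # both documents match
print(search_linked("dog", "bites", "man"))      # only the relevant one
```

The unlinked search retrieves one relevant and one irrelevant document, while the linked search retrieves only the relevant one, which is the gain in precision that linking devices aim at.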
8.1.3. Weighting
According to Marcella and Newton (1994) weighting is another control device, central to classificatory principles, which may be applied to indexing. The whole business of classification is about weighting in determining the citation order of concepts in a document, deciding in which discipline a work may fall, deciding what is the main class number for a work and the creation of added class number entries in the classified sequence of a classified catalogue. In all of these operations we are determining where the chief emphasis of the document lies. For example, in Medline, an item which deals comparatively with a drug treatment of insomnia and behavioural therapy produces the following list of index terms, where the major descriptors are identified by an asterisk:
*Behaviour therapy
*Insomnia therapy
*Triazolam — therapeutic use
Adult
Combined modality therapy
Placebos
Treatment outcome
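The effect of such weighting on a search can be sketched as follows. The record is an abridged copy of the descriptor list above, and the function names and the `major_only` switch are assumptions for the illustration, not a description of the Medline interface.

```python
# Abridged copy of the descriptor list above; '*' marks a major descriptor.
RAW = [
    "*Behaviour therapy",
    "*Insomnia therapy",
    "Adult",
    "Placebos",
    "Treatment outcome",
]

def parse_descriptors(raw):
    """Split asterisk-marked descriptors into (term, is_major) pairs."""
    return [(t.lstrip("*"), t.startswith("*")) for t in raw]

def matches(record, term, major_only=False):
    """True if the record carries the term, optionally only as a major one."""
    return any(
        t.lower() == term.lower() and (is_major or not major_only)
        for t, is_major in record
    )

record = parse_descriptors(RAW)
print(matches(record, "Placebos"))                   # found as a minor term
print(matches(record, "Placebos", major_only=True))  # not a chief emphasis
```

Restricting a search to major descriptors trades recall for precision: the item is still found under ‘Insomnia therapy’, but no longer under incidental terms such as ‘Placebos’.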
9. Summary
Earlier, classification and indexing were considered two separate though complementary approaches to knowledge organization and information retrieval. S.R. Ranganathan, with his path-breaking discovery of chain indexing in 1939, brought to the fore the symbiosis between the two. In fact, lists of subject headings have some latent classification in their design, in the form of linking synonyms (grouping) and showing both hierarchical and associative relations through ‘see also’ references. The thesaurus made such relations among subjects in a given field more explicit and visible. Thesaurofacet, invented by Jean Aitchison, went a step further: it has the features of a classification system for shelf arrangement as well as of a thesaurus for indexing and retrieval. Ontologies combine features of a thesaurus and a taxonomy with the power to display multidimensional relations of concepts in a domain. Classification is inherent in other indexing systems such as PRECIS, POPSI and SLIC. Boolean operations involve classification in parsing to make desired groups; so does KWIC indexing, which brings keywords in titles from a large database together by computer manipulation. Vickery mentions that an act of classification is involved when we invert a multi-worded subject name, or bring together distributed relatives as in the relative index of the DDC. To make indexes more effective in performance, an element of classification in the form of control devices such as roles, links and weights is necessary. In fact, at bottom, classification and indexing are the same KO tools. They have common features and are best brought under the umbrella term ‘controlled vocabulary’.
10. Glossary
Boolean operations: Logical or algebraic operations in online post-coordinate searching and retrieval, in which search terms (say A and B) are combined with the operators AND, OR and NOT to retrieve the desired grouping.
Chain: A string of subject terms in order of their successive decreasing extension or increasing specificity.
Cross reference: A direction or reference from one term to another in a controlled vocabulary. These references direct the user either to the equivalent but preferred terms (see), or to related terms (see also), for indexing the area of search.
Dictionary catalogue: Catalogue in which all sorts of entries are filed in a single A/Z dictionary order.
Index: A list of concepts and names in a predictable order for retrieval of information; a tool to navigate the body of its information to locate terms and concepts. The majority of indexes are in alphabetical order.
Indexing: The process of applying an index, or tagging description, to items in the information store with pointed references for full retrieval and recall.
Indexing language: A select vocabulary showing semantic and syntactical relations, used for knowledge organisation and information retrieval. Also known as controlled vocabulary. The DDC, LCSH and the Art and Architecture Thesaurus (AAT) are outstanding examples of controlled vocabulary.
Keyword: A word or phrase, usually for free text searching, which is deemed significant to describe and spot the concepts in the body of the text.
KWIC: Keywords In Context: A sort of free-text but permuted index usually of titles of articles to preserve the context.
KWOC: Keyword Out of Context: Use of keywords as orphans without showing the full context in which these occur.
Natural language: Everyday language as distinguished from controlled vocabulary.
Natural language processing: Used mostly for automated indexing by automatically discerning the meaning of a text based on both linguistics and computer science.
Ontology: A complex knowledge organisation system to describe a knowledge domain, depicting semantically deep and wide inter-relations between concepts and terms.
Post-coordinate indexing: Relating two or more terms together at the time of search, now usually using Boolean operators. Uniterms or a thesaurus are used for post-coordinate indexing.
Pre-coordinate indexing: Two or more terms already ordered before search, to be used as a single whole. A classification number (say, of the DDC), LCSH headings or chain indexing are used for pre-coordinate indexing.
Subject catalogue: Part of the catalogue providing subject access to unknown items.
Taxonomic relationship: A genus-species relationship or relationship between an entity and its kinds.
Taxonomy: Any sort of classification structure.
Thesaurus (IR): A highly structured and controlled subject vocabulary which shows equivalence, hierarchical and other associative relations with other terms in the thesaurus. It is also used to index documents where physical arrangement is not necessary. A literary or desk thesaurus is used by writers to make a better choice of words.
Topic map: A kind of ontology in a standard format for representing complex knowledge organisation systems, comprising topics (concepts), associations (relationships) and occurrences (attributes).
11. References and further Readings
- Bates, M. 1989. “Rethinking Subject Cataloging in the Online Environment.” Library Resources and Technical Services, 33(4): 401-412.
- Broughton, Vanda. 2004. Essential Classification. London: Facet, pp. 83-102.
- Buchanan, Brian. 1979. Theory of Library Classification. London: Bingley, pp. 11-15.
- Chan, Lois Mai. 2007. Cataloguing and Classification: An Introduction. 3rd ed. Lanham, MD: The Scarecrow Press, pp. 539-551.
- Foskett, A.C. 1996. The Subject Approach to Information, 5th ed. London: LA Publishing, pp. 33-146.
- Hedden, Heather. 2010. The Accidental Taxonomist. Medford, N.J.: Information Today, Inc. xxix, 442 p.
- Hunter, Eric J. 2009. Classification Made Simple, 3rd ed. Burlington, VT: Ashgate Publishing, pp. 108-126.
- King, Brandy E and Reinold, Kathy. 2008. Finding the Concept, Not Just the Words: A Librarians’ Guide to Ontologies and Semantics. Oxford: Chandos, pp. 1-14.
- Maltby, Arthur. 1975. Sayers Manual of Classification for Librarians, 5th ed. London: Andre Deutsch, pp. 245-258.
- Marcella, Rita and Newton, Robert. 1994. A New Manual of Classification. London: Gower, pp. 145-155.
- Mills, J. 1962. A Modern Outline of Library Classification. Bombay: Asia, pp. 54-64.
- Palmer, B.I. and Wells, A.J. 1951. The Fundamentals of Library Classification. London: George Allen, pp. 101-105.
- Rowley, Jennifer and Hartley, Richard. 2008. Organizing Knowledge, 4th ed. Burlington, VT: Ashgate, pp. 128-130.
- Satija, M.P. 2004. A Dictionary of Knowledge Organization. Amritsar: Guru Nanak Dev University, 248 p.
- Singh, S.N. 1981. “Developments and trends in subject indexing systems and theories.” Progress of Lib & Inf Sc. (BHU) 2: 45-60.
- Vickery, B.C. 1976. “Classificatory principles in natural language indexing systems.” In: Classification in the 1970s / edited by Arthur Maltby. London: Bingley, pp. 119-141.
Acknowledgement: This module mostly draws on the work of B.C. Vickery, Rita Marcella & Robert Newton and Eric Hunter, who are listed in the bibliography. The professional debt to these luminaries is gratefully acknowledged.
Learn More:
Module LIS/KOP – C/16: Classification and indexing
- Do you know
- Boolean algebra was invented by George Boole (1815-1864) and graphically represented by Venn Diagrams invented by John Venn (1834-1923).
- Thesaurofacet was invented by Jean Aitchison of English Electric Company.
- Citation order is to indexing what facet formula is to a faceted classification.
- Points to remember
- Classification and indexing are synonymous terms in some ways.
- Every act of indexing involves classification in one way or the other.
- Full potential of a thesaurus can only be exploited in online searching.
- The British National Bibliography ultimately adopted LCSH in 1994, after trying chain indexing, PRECIS and COMPASS.