4 The Physical Organisation of Data

Aditya Tripathi

 

I.  Objectives

 

The objective of this module is to:

 

•    Introduce the concept of a file and the related terminology used in information retrieval (IR) and information storage.

•    Introduce the reader to the various file formats and their uses.

•    Introduce the most important factors of file structure that influence the organization of data in a computer file.

•    Present the advantages and disadvantages of sequential file organization and direct file organization of data.

•    Enable readers to learn about the organization of data in an information retrieval system.

 

 

II.   Learning Outcomes 

 

After reading this module:

 

•    The reader will gain a clear knowledge of entities, attributes and the relations between them.

•    The reader will understand the role of the file system in information processing.

•    The reader will gain knowledge of the file structures and file formats that are used for the organization of data.

•    The reader will understand the concept of primary key and its usefulness in information retrieval.

•    The reader will also gain knowledge of sequential file organization and direct file organization, and of their advantages and disadvantages in a database management system.

 

III.   Structure 

 

1.      Introduction

2.      Record Structure

3.      File Structure

3.1.   Order of Record

3.2.   Finding Record

4.      Organizational Method

4.1.   Sequential File Structure

4.1.1.      Advantages of Sequential File Organization

4.1.2.      Disadvantages of Sequential File Organization

4.2.   Direct File Organization

4.2.1.      Index File Structure

4.2.2.      Trees

4.2.3.      Advantages of Direct File Organization

4.2.4.      Disadvantages of Direct File Organization

5.      Summary

6.      References

 

 

 

1.  Introduction 

 

A file is a collection of bytes stored as an individual entity. All data on disk is stored as files, each with an assigned file name that is unique within the directory it resides in. To the computer, a file is nothing more than a series of bytes. The structure of a file is known to the software that manipulates it. For example, database files are made up of a series of records. Word processing files, also called documents, contain a continuous flow of text.

 

A file contains data that is needed for information processing. The data is about entities. An entity is anything about which information can be stored, for example a person, a concept, a physical object or an event. An attribute is a characteristic of an entity. The values of the attributes describe a particular entity. An instance of the entity is represented by a set of specific values, one for each of the attributes.

 

For example, suppose we are collecting details of the books stored in a library. Here, book is the entity. The attributes of the entity (book) could be author, title, subject, publisher, editor, price, etc. The attributes are the same for all the books in the library, but the values of the attributes differ from one instance to another. Thus we have one instance, a book entitled Colon Classification, written by S. R. Ranganathan, published by EssEss Publications, price 230 rupees, and another instance, The Invisible Man, written by H. G. Wells, published by Orient BlackSwan, price 115 rupees. These two cases represent the attribute values of two instances of the entity book.

 

Each attribute of an entity is represented in storage by a data item. For example, there is a data item for title, another data item for author, and so on. A data item is assigned a name in order to refer to it in storage, retrieval, and processing operations. A data item is the elementary unit in data storage. The collection of data items for each instance of an entity is commonly called a record. A collection of related records is called a file.
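
The relationship between entity, attribute, data item, record and file can be illustrated with a minimal Python sketch. The class name Book, the field names and the helper attribute_values are assumptions made for this illustration only; the values are those of the two books mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """One record: the named data items describing one instance of the entity 'book'."""
    title: str          # data item for the attribute 'title'
    author: str         # data item for the attribute 'author'
    publisher: str      # data item for the attribute 'publisher'
    price_rupees: int   # data item for the attribute 'price'


# A file is a collection of related records.
book_file = [
    Book("Colon Classification", "S. R. Ranganathan", "EssEss Publications", 230),
    Book("The Invisible Man", "H. G. Wells", "Orient BlackSwan", 115),
]


def attribute_values(records, name):
    """Return the value of one named data item from every record in the file."""
    return [getattr(record, name) for record in records]


if __name__ == "__main__":
    print(attribute_values(book_file, "author"))
```

Every record has the same data items, because the attributes are the same for all books; only the values differ from record to record.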

 

2.  Record Structure 

 

Record structure is the arrangement of all the elements of a record in an organized manner so as to give structure to the data. A record is the simplest form in which to store the attribute values of an instance; organized meaningfully, these values yield meaningful data. A record structure normally also carries information that helps in processing the data. It may consist of fixed-length records, a fixed number of fields or elements, and a length indicator. It may also use an index to keep track of addresses and place a delimiter at the end of each record.
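
As an illustration of fixed-length records with a fixed number of fields, the following Python sketch packs records into bytes with the standard struct module. The layout (a 6-byte key, a 40-byte title and a 4-byte year) and the function names are assumptions chosen for this example, not a layout prescribed by any particular system.

```python
import struct

# Hypothetical layout: 6-byte key, 40-byte title, 4-byte unsigned year.
# "<" means no padding between fields, so every record is exactly 50 bytes.
RECORD = struct.Struct("<6s40sI")


def pack_record(key, title, year):
    """Encode one record; short strings are padded with zero bytes by struct."""
    return RECORD.pack(key.encode("ascii"), title.encode("ascii"), year)


def unpack_record(raw):
    """Decode one record and strip the zero-byte padding from the text fields."""
    key, title, year = RECORD.unpack(raw)
    return key.rstrip(b"\x00").decode("ascii"), title.rstrip(b"\x00").decode("ascii"), year


def read_nth(data, n):
    """Because every record has the same length, record n starts at n * RECORD.size."""
    start = n * RECORD.size
    return unpack_record(data[start:start + RECORD.size])


if __name__ == "__main__":
    data = pack_record("E 101", "Marxism and history", 1987)
    data += pack_record("E 102", "Management of aquatic ecosystems", 1989)
    print(read_nth(data, 1))
```

With fixed-length records no delimiter or length indicator is needed, since the position of each record can be computed. Variable-length records would need one of those devices instead.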

 

3.  File Structure 

 

A file is a collection of records within a DBMS. File structure deals with the order and arrangement of files by the computer within a database. There are several file formats that are used for the arrangement of data, some of which are used only for specific types of files. For example, PNG is a file format that is used only for the storage of bitmapped images.

 

3.1  Order of Record 

 

The order of records refers to the arrangement of records in a file within a database: how the data is recorded or stored, and the sequence and placement of the data. Ordered records make it easier to access data from a database. For example, information about the employees of a library may be arranged according to their job positions.

 

3.2  Finding Record 

 

Record finding is the process of retrieving particular information held in a record. Usually a key field is assigned and the record is retrieved by that key. For example, a record may contain information about a book such as its title, author, ISBN and publication details, and the title may be assigned as the key.

 

4.  Organizational Method 

 

When data are stored on storage devices, the method of file organization chosen determines how the data can be accessed. The organization of data in a file is influenced by a number of factors, the most important of which is the time required to access, write or modify a record. The key requirement in designing the files of an information system is a physical organization that supports the kind of record access needed and is, at the same time, efficient in terms of access time. Each file organization is more efficient for some operations than for others. The storage device and the operations that are to be performed will influence the choice of data organization.

 

4.1  Sequential File Structure 

 

Sequential file structure is a format for the storage of records in which all records are arranged in a physical order. In sequential file organization, records are stored in some predetermined sequence, one after the other. One field, referred to as the primary key, usually determines their sequence or order. A primary key is a field (or set of fields) whose content is unique to one record and can therefore be used to identify that record. Typical examples of a primary key are a student ID or the title of a book.

 

When a file is organized sequentially by a record key and is accessed in sequence, there is no need to know specifically where any record is stored. The only thing that needs to be known is that the records are stored in order of the sequencing key. A record is located by starting at the beginning of the file, reading each record in turn and comparing its key with the key of the record being sought.

E 101    Marxism and history: a critical introduction    Rigby, S. H.          St. Martin’s Press     1987
E 102    Management of aquatic ecosystems                Agrawal, V. P.        Narendra Pub. House    1989
E 103    Cataloguing: theory and practice                Viswanathan, C. G.    Print House            1983

 

Fig. 2: A Sequential File
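
The search procedure described above can be sketched in Python as follows. The records are those of Fig. 2; the tuple layout and the function name sequential_search are assumptions made for this illustration.

```python
# Records from Fig. 2, stored in ascending order of the key field (first element).
SEQUENTIAL_FILE = [
    ("E 101", "Marxism and history: a critical introduction", "Rigby, S. H.", "St. Martin's Press", 1987),
    ("E 102", "Management of aquatic ecosystems", "Agrawal, V. P.", "Narendra Pub. House", 1989),
    ("E 103", "Cataloguing: theory and practice", "Viswanathan, C. G.", "Print House", 1983),
]


def sequential_search(records, wanted_key):
    """Scan from the start of the file; return (record or None, records read)."""
    records_read = 0
    for record in records:
        records_read += 1
        if record[0] == wanted_key:
            return record, records_read
        if record[0] > wanted_key:
            break  # keys are in order, so the wanted key cannot appear later
    return None, records_read


if __name__ == "__main__":
    print(sequential_search(SEQUENTIAL_FILE, "E 103"))
```

The number of records read grows with the position of the wanted record, which is why a sequential search becomes slow for records far from the beginning of a large file.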

 

4.1.1  Advantages of Sequential File Organization

 

There are many advantages of a sequential form of file organization. They are as follows:

 

•   It is the most efficient form of organization when the entire file, or most of it, must be processed at once, as in batch processing.

 

•   The transaction file and the old master file together act as a back-up, and can be used to re-create the new master file if it is damaged.

 

4.1.2  Disadvantages of Sequential File Organization 

 

There are some disadvantages of the sequential file organization. They include the following:

 

• It is time consuming: the time it takes to access a particular record may be too long for many applications.

 

• The entire file must be processed and a new master file created, even if only one record requires updating.

 

• To reach a single record in the middle of the file, all the records that precede it must be read first.
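
The master file update mentioned in the lists above can be sketched in Python as follows. This is a simplified illustration, assuming that both files are lists of (key, value) pairs sorted by key, that there is at most one transaction per key, and that transactions only replace values; the function name apply_transactions is hypothetical.

```python
def apply_transactions(old_master, transactions):
    """Merge a sorted master file with sorted update transactions into a new master file.

    The old master is not modified, so together with the transaction file
    it can act as a back-up from which the new master can be rebuilt.
    """
    new_master = []
    pending = iter(transactions)
    current = next(pending, None)
    for key, value in old_master:
        # Transactions for keys absent from the master are skipped in this sketch.
        while current is not None and current[0] < key:
            current = next(pending, None)
        if current is not None and current[0] == key:
            value = current[1]            # the update replaces the stored value
            current = next(pending, None)
        new_master.append((key, value))   # every record is copied, changed or not
    return new_master


if __name__ == "__main__":
    old = [("E 101", 1987), ("E 102", 1989), ("E 103", 1983)]
    print(apply_transactions(old, [("E 102", 1990)]))
```

Even though only one record changes in the usage example, all three records are read and rewritten, which illustrates the cost of updating a sequential file.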

 

4.2  Direct File Organization 

 

Database management systems can use direct file organization to help manage the database. This is one of the basic organizations used by the operating system. Direct file organization is designed to provide rapid, direct (random, non-sequential) access to records. Using this organization, records are inserted in what appears to be a random order, not in sequence by key field value. Each record is assigned a relative address on the basis of the value of a field within the record. When a record is to be stored, the system takes the value of the specified field and usually performs some type of calculation to derive a target address for the record. Normally, the record is then stored at the target address. When a record is to be retrieved, the system uses the key value supplied to work out where the record should be stored and goes to that address to access it.
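
The address calculation described above can be illustrated with a small Python sketch. The calculation used here (sum of the key's byte values modulo the number of slots) and the handling of occupied addresses (trying the next slot, known as linear probing) are assumptions chosen for simplicity; real systems use their own calculations and overflow methods. The class name DirectFile is hypothetical.

```python
class DirectFile:
    """Toy direct file: a fixed number of slots addressed by a calculation on the key."""

    def __init__(self, slots=11):
        self.slots = [None] * slots

    def _target_address(self, key):
        # Assumed calculation: sum of the key's byte values modulo the slot count.
        return sum(key.encode("utf-8")) % len(self.slots)

    def store(self, key, record):
        """Store at the target address, or at the next free slot if it is taken."""
        address = self._target_address(key)
        for step in range(len(self.slots)):
            probe = (address + step) % len(self.slots)
            if self.slots[probe] is None or self.slots[probe][0] == key:
                self.slots[probe] = (key, record)
                return probe
        raise RuntimeError("file is full")

    def fetch(self, key):
        """Recompute the target address from the key and look there first."""
        address = self._target_address(key)
        for step in range(len(self.slots)):
            probe = (address + step) % len(self.slots)
            entry = self.slots[probe]
            if entry is None:
                return None          # an empty slot means the key was never stored
            if entry[0] == key:
                return entry[1]
        return None


if __name__ == "__main__":
    f = DirectFile()
    f.store("E 124", "Rao")
    print(f.fetch("E 124"))
```

Retrieval repeats the same calculation that was used for storage, so the system can go straight to the neighbourhood of the record without reading the rest of the file.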

 

From the user’s point of view, the procedures followed in direct file processing are more straightforward than those in sequential file processing. Most applications do not use transaction files, and there is no reason to create a new master file when a single record is updated or when maintenance is required. In a direct processing system, data are input directly into the system through a terminal that is connected to the CPU of the central computer. The system locates the specific record in the master file and then updates it. Direct access systems do not search the entire file; rather, they move directly to the needed record.

 

4.2.1  Index File Structure 

 

An index provides the means to search and access records in a database. An index file is an auxiliary file that makes it more efficient to search for a record in the data file. An index is also known as an access path on the field on which it is defined. The index file is stored on disk and occupies less space than the data file because its entries are much smaller. The following is an example from a library database. A person can easily search for a record by employee ID, name, section, designation or salary.

 

Employee

Emp. ID    Last Name    First Name        Section        Designation    Salary    Contact No.
E 124      Rao          Chandrashekhar    Technical      PA             32000     23456
E 125      Kumar        Anant             Circulation    LA             22000     23444
E 126      Rakesh       Mohan             Stack          LA             22000     23483
E 127      Chaubay      Anil              Periodical     SPA            24000     23473
E 128      Singh        Anand             Technical      AL             60000     23481

 

Fig.3: An Indexed file
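
A minimal Python sketch of an index over the employee file of Fig. 3 is given below. The field names in FIELDS and the functions build_index and lookup are assumptions made for this illustration; an index is modelled simply as a dictionary from a field value to the positions of the matching records.

```python
# Field names are assumed labels for the columns of Fig. 3.
FIELDS = ("emp_id", "last_name", "first_name", "section", "designation", "salary", "contact_no")

EMPLOYEES = [
    ("E 124", "Rao", "Chandrashekhar", "Technical", "PA", 32000, "23456"),
    ("E 125", "Kumar", "Anant", "Circulation", "LA", 22000, "23444"),
    ("E 126", "Rakesh", "Mohan", "Stack", "LA", 22000, "23483"),
    ("E 127", "Chaubay", "Anil", "Periodical", "SPA", 24000, "23473"),
    ("E 128", "Singh", "Anand", "Technical", "AL", 60000, "23481"),
]


def build_index(records, field):
    """Map each value of `field` to the positions of the records that hold it."""
    column = FIELDS.index(field)
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[column], []).append(position)
    return index


def lookup(records, index, value):
    """Use the index to fetch matching records without scanning the data file."""
    return [records[position] for position in index.get(value, [])]


if __name__ == "__main__":
    by_section = build_index(EMPLOYEES, "section")
    print(lookup(EMPLOYEES, by_section, "Technical"))
```

An index on emp_id has one entry per record, while an index on section or designation may map one value to several records. Each index entry holds only a field value and positions, which is why the index is much smaller than the data file.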

 

4.2.2  Trees

 

A tree is a structure that consists of nodes or vertices, each containing node information together with pointers giving access to further nodes of the tree. A tree organization supports operations such as searching for a record, inserting a new record and deleting a record. A tree search is performed by comparing the search key with the key values attached to certain nodes of the tree, starting with the root of the tree. Two common types of tree are the binary search tree and the balanced tree.

Fig.4: Tree Structure
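
The tree search described above can be sketched in Python with a simple binary search tree. This is an illustrative sketch only: it is not balanced, deletion is omitted, and the names Node, insert and search are assumptions made for this example.

```python
class Node:
    """A tree node: key, node information, and pointers to two further nodes."""

    def __init__(self, key, record):
        self.key = key
        self.record = record
        self.left = None    # subtree holding smaller keys
        self.right = None   # subtree holding larger keys


def insert(root, key, record):
    """Insert a record and return the (possibly new) root of the tree."""
    if root is None:
        return Node(key, record)
    if key < root.key:
        root.left = insert(root.left, key, record)
    elif key > root.key:
        root.right = insert(root.right, key, record)
    else:
        root.record = record  # same key: replace the stored record
    return root


def search(root, key):
    """Start at the root and compare keys; return (record or None, nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.record, visited
        node = node.left if key < node.key else node.right
    return None, visited


if __name__ == "__main__":
    root = None
    for k in [126, 124, 128, 125, 127]:
        root = insert(root, k, f"E {k}")
    print(search(root, 127))
```

Each comparison discards one whole subtree, so only the nodes on a single path from the root are examined. A balanced tree keeps that path short regardless of the order in which records are inserted.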

 

4.2.3  Advantages of Direct File Organization

 

There are many advantages of a direct form of file organization. They are as follows:

 

• Data can be accessed directly and quickly.

• Primary and secondary indexes can be used to search for data in many different ways.

• Centrally maintained data can be updated easily.

 

4.2.4  Disadvantages of Direct File Organization 

 

There are some disadvantages of the direct file organization. They include the following:

 

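•   The use of an index lowers the efficiency of the computer system, since the index takes up additional storage and must be updated whenever records are added or removed.

•   Files are updated directly, so there may be no back-up if a file is destroyed. Back-up files must therefore be created regularly.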

 

5.  Summary 

 

The physical organization of data is important, and the way data is organised has a direct impact on system efficiency in terms of searching, access and retrieval. This module has introduced the basic constructs of a database management system, such as file and record structures. Concepts such as records, files and file formats have been outlined. The two main types of data organization, sequential and direct, have been described. Each of these is suitable for certain types of operation, and hence the advantages and disadvantages of sequential and direct file organization have been discussed.

 

6.  References:

  1. Robbins, Robert J., Database Fundamentals, Johns Hopkins University. 1994.
  2. Ramakrishnan, Raghu, Gehrke, Johannes and Derstadt, Jeff [et al.]. Database Management System: Solution Manual; 3rd edition, University of Wisconsin, Madison, WI, USA, Cornell University, Ithaca, NY, USA.
  3. Ramakrishnan, Raghu and Gehrke, Johannes. Database Management System; 2nd edition, University of Wisconsin, Madison, WI, USA, Cornell University, Ithaca, NY, USA.
  4. Silberschatz, Korth and Sudarshan, (1997), Database System Concepts
  5. http://pic.dhe.ibm.com/infocenter/analytic/v2r1m0/index.jsp?topic=%2Fcom.ibm.discovery.es.ta.doc%2Fiiysalgstopwd.htm
  6. http://en.wikipedia.org/wiki/Stop_words
  7. http://www.comp.lancs.ac.uk/computing/research/stemming/general/
  8. Haithcoat, Tim, Relational Database Management Systems: Database Design and GIS, University of Missouri, Columbia.
  9. Healey, R.G., Database Management System
  10. http://en.wikipedia.org/wiki/ACID
  11. http://blog.sqlauthority.com/2007/12/09/sql-server-acid-atomicity-consistency-isolation-durability/
  12. http://www.techterms.com/definition/user_interface
  13. http://veegantechnologies.com/sequential-files/
  14. http://home.iitj.ac.in/~ramana/ch10-storage-2.pdf
  15. http://coronet.iicm.edu/dm/scripts/lesson06.pdf