21 DATA MODELS in GIS

Dr Seema Mehra Parihar


 

 

Learning Objectives:

 

Spatial data observations focus on recording locations. Every object, area and phenomenon has a unique pair of latitude and longitude coordinates. The learning objectives of this chapter are:

 

  1. To identify the two primary types of spatial data in GIS.
  2. To recognize the nature of raster and vector data models.
  3. To understand the difference between raster and vector data in GIS.
  4. To know when to use raster and when to use vector data.

1. Introduction

 

After the introduction to Geographic Information Systems (GIS) in the previous chapter, ‘Introduction to GIS’, the next step is to understand the concept of data models in geography and related spatial fields of study. GIS, as we have seen, is a geospatial tool for capturing or collecting (C), storing (S), retrieving (R), transforming (T), analyzing (A) and displaying (D) spatial data from the real world. The real world around us is not uniform and smooth. It is undulating, irregular and dynamic, and representing it geographically in digital form requires an understanding of data models. Data models provide an organised and structured way of visualizing and studying the world in a digital domain. They are a way of organizing any object, area or phenomenon that we wish to study, at any level. To understand data models, let us first recapitulate what spatial data is.

 

Spatial data is data that has a physical dimension and a geographic location on the earth. The geographic location of every object, area and phenomenon on the earth can be documented using different methods and technologies. Spatial data stored in two-dimensional or three-dimensional maps and models can be further described by its entities, their attached attributes and the relationships among them. An entity is a distinct spatial object such as a tree, river or house. An attribute is a description of the entity, such as the species, name and origin of a tree, or the address, owner and number of family members of a house; attributes may be qualitative or quantitative. A relationship describes the spatial association among entities. For example, a tree is an entity; its species, name, width, length and origin are its attributes; and the connection between two entities is a relationship, that is, the link between two different entities. A map depicts a variety of information, e.g. position, spatial relations, type of feature and measurable quantities. All geographical data must be represented in a simplified form so that it can be stored in a computer. Hence it is represented by three geometric entities: point, line, and area or polygon, as depicted in Figure 1.

Figure 1: Types of Geographical Data
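The entity, attribute and relationship idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Entity` and `Relationship`, the coordinate values and the attribute values are all hypothetical and are not taken from any GIS package.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A spatial entity: a geometry type, its coordinates and its attributes."""
    name: str
    geometry_type: str          # "point", "line" or "polygon"
    coordinates: list           # list of (x, y) tuples
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A named spatial association between two entities."""
    kind: str
    source: Entity
    target: Entity


# Hypothetical sample data: a tree (point) and a river (line).
tree = Entity("tree", "point", [(77.21, 28.61)],
              {"species": "neem", "height_m": 12.0})
river = Entity("river", "line",
               [(77.20, 28.60), (77.22, 28.62), (77.25, 28.63)],
               {"name": "hypothetical stream"})
near = Relationship("is_near", tree, river)

print(near.source.name, near.kind, near.target.name)
```

The geometry says where the entity is, the attribute dictionary says what it is, and the relationship object records the link between two entities.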

 

 

2. WHAT IS A DATA MODEL?

 

A data model is a description, portrayal, explanation and view of the real world. It is a conceptual idea, as opposed to the data structure, which is the way the data is actually stored in the computer. “A data model is mathematical formalism” (Ullman, 1988); it is “a notation for describing data together with a set of operations used to manipulate that data”. Tsichritzis and Lochovsky (1977) define a data model as “a set of guidelines for the representation of the logical organization of the data in a database, consisting of named logical units of data and the relationships between them”. Peuquet (1984) defined a data model as “a general description of specific groups of entities and the relationship between these groups of entities”. A data model, as extended by Peuquet, “is a formal system in which a set of precisely defined objects can be manipulated in accordance with a set of precisely predefined rules, without any regard for the ‘meaning’ or real-world interpretation of those objects or rules”. The real world can only be represented through models. Different models represent different conceptualizations of the world, with different outlooks and perspectives. The real world is translated into a data model at varying, and sometimes fairly high, levels of abstraction, using different operators for simplification and selection.

 

We can therefore understand a geographic model as an abstract and well-defined system of concepts, with its own vocabulary, for describing and reasoning about the real world. The most familiar model of geographic information is the map, both two-dimensional and three-dimensional, in which established conventions and rules are followed. Maps help us comprehend relationships in a spatial framework and understand data models in a GIS framework.

 

We also need to recognize that “the human eye is highly efficient at recognizing shapes and forms, but the computer needs to be instructed exactly how spatial patterns should be handled and displayed” (Burrough, 1986). Computers require descriptive information and instructions to turn spatial entities into graphical form. The construction of spatial models passes through numerous stages of data abstraction.

 

 

       Box 1: Two terms

 

Data formats and data models are not interchangeable terms.

Data models are the broader concept.

Data models are a conceptual way of understanding the world around us.

Data formats are the actual computer structures we use to store and display information digitally.

All data formats have advantages and disadvantages.

Two familiar data formats are .doc and .txt, used to store and display textual information and more.

Spatial data formats store and display spatial data. Some examples are shapefile, .dwg, .gdb and .kml; spatial data is also served through web services such as WMS.

 

 

3. Two types of Data Models

 

Real-world features exist either as objects or as phenomena.

“Objects are discrete and definite, such as trees, buildings and huts; phenomena are distributed over a large area, such as temperature, soil, etc. These two forms lead to two distinct approaches. These are object-based and field-based models (Goodchild 1992, Wang and Howarth 1994)”. Object-based spatial databases (those obtained by field surveying, remote sensing image analysis, photo interpretation, digitization, etc.) are generally represented as coordinate lines and are termed vector data models. When the spatial database is structured on the field-based model, the basic spatial units are different forms of tessellation (regular, as in a DEM, or irregular, as in a TIN), and the result is termed the raster data model. Identifying features from the real world, and then selecting the appropriate spatial data model (raster or vector) and its structure, are the key to the spatial model-building process.

 

Figure 1: Two types of representation of Real World: Vector data model and Raster data model

 

3.1 Raster data model

 

“Raster data model is one of the variants of the field-based model of geographic data representation” (Burrough, 1990). A raster is made up of grid cells, generally regularly spaced and square in shape. It is usually defined by a set of uniform, adjacent pixels, each assigned its own pixel value. Each pixel has a value or values indicating the characteristics of the phenomenon it represents. Rasters are of two kinds: discrete and continuous.

 

Discrete rasters have distinct values, each standing for a distinct category or theme. For example, one grid cell represents a particular land use or land cover class. Each thematic class can be distinguished and allotted a specific code. The classes have a clear beginning and end, and are usually represented by integers listed in an interpretation key. For example, the value 1 may represent vegetation, the value 2 a water body and the value 3 urban areas.
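A discrete raster of this kind can be sketched as a small grid of integer class codes with an interpretation key. The grid values below are made up for illustration; only the three class codes (1, 2 and 3) come from the example in the text.

```python
from collections import Counter

# Interpretation key: integer code -> thematic class (codes from the text).
LEGEND = {1: "vegetation", 2: "water body", 3: "urban area"}

# A hypothetical 3 x 4 land cover raster; each cell holds exactly one code.
land_cover = [
    [1, 1, 2, 2],
    [1, 1, 2, 3],
    [1, 3, 3, 3],
]


def class_counts(grid):
    """Count how many cells carry each class code."""
    return Counter(value for row in grid for value in row)


counts = class_counts(land_cover)
for code, label in LEGEND.items():
    print(code, label, counts[code])
```

Multiplying each count by the ground area of one cell would give the area covered by each class.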

 

Figure 2: Digital images: DEM, DSM and DTM

 

Continuous rasters are non-discrete rasters in which the data changes gradually, as in a slope map, elevation grades, soil distribution, etc. They are generally represented with fixed registration points, and each raster has its own unique registration points. A Digital Elevation Model (DEM), a matrix of equal cells representing elevation, is depicted in Figure 2a, the digital surface model in Figure 2b and the digital terrain model in Figure 2c. Phenomena can also vary continuously across a raster outward from a specific source.

Figure 3: Representation of a Raster model

 

Raster data is also referred to as a lattice or tessellation model. A tessellation is an infinitely repeatable pattern over space (Coxeter, 1961). “Raster data are tessellations (a tessellation is a space filling mesh either with explicit boundaries as a mesh of polygons or with an implicit mesh as defined by a matrix of values in the logical model) and perform a discretization of the geometric area of interest”. A tessellation may be either regular (mesh elements are all of the same size and shape) or irregular. The elements of a regular mesh are generally squares (raster), rectangles or hexagons.

 

 

Location is the focal point of the raster data model. The cell size is also known as the minimum mapping unit (MMU). The minimum mapping unit at 1:50,000 scale is 3 mm × 3 mm on the map, which is 150 m × 150 m on the ground, or 2.25 hectares. The value assigned to each cell is known as the gray value. Typically there are 256 gray values in a raster image, ranging from 0 to 255. Each grid cell is identified by a row number and a column number. A single cell is assigned only one value. Objects that have several attributes are therefore represented by a number of raster layers, one for each attribute.
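Working the minimum mapping unit through as arithmetic: 3 mm on a 1:50,000 map corresponds to 150 m on the ground, so a 3 mm × 3 mm cell covers 150 m × 150 m, or 2.25 hectares. The sketch below checks that figure and also shows the "one value per cell, one layer per attribute" idea. The function names and the layer values are hypothetical.

```python
def ground_length_m(map_length_mm, scale_denominator):
    """Convert a length measured on the map (mm) to ground length (m)."""
    return map_length_mm * scale_denominator / 1000.0


def cell_area_hectares(map_length_mm, scale_denominator):
    """Ground area of a square cell, in hectares (1 ha = 10,000 square metres)."""
    side_m = ground_length_m(map_length_mm, scale_denominator)
    return side_m * side_m / 10_000.0


side = ground_length_m(3, 50_000)       # ground length of one cell side
area = cell_area_hectares(3, 50_000)    # ground area of one cell
print(side, "m;", area, "ha")

# Several attributes of the same area are held as separate layers,
# each cell of each layer storing a single value (made-up values).
layers = {
    "land_cover": [[1, 2], [3, 3]],
    "elevation_m": [[212, 215], [220, 224]],
}


def values_at(layers, row, col):
    """Collect the value of every layer at one row/column position."""
    return {name: grid[row][col] for name, grid in layers.items()}


print(values_at(layers, 1, 0))
```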

 

The raster model provides the simplest way of storing spatial data. Cell area and resolution are inversely related: the larger the area covered by each cell, the coarser the resolution. Topological relations in raster data are weak: only if the row and column of a cell are known can the location of its neighbouring cells be calculated. In other words, the ‘topology’ of a pixel is defined by its position relative to the origin of the matrix, e.g. the top-left corner of the image.
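The point that a cell's location and neighbours follow from its row and column relative to the matrix origin can be sketched as follows. The origin coordinates and the 30 m cell size are assumed values chosen only for the example, and the function names are hypothetical.

```python
def cell_center(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of a cell centre; the origin is the grid's top-left corner."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size   # rows increase downwards
    return x, y


def neighbours(row, col, n_rows, n_cols):
    """Row/column indices of the up-to-eight cells surrounding (row, col)."""
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue                      # skip the cell itself
            r, c = row + dr, col + dc
            if 0 <= r < n_rows and 0 <= c < n_cols:
                result.append((r, c))
    return result


print(cell_center(0, 0, 500_000.0, 3_000_000.0, 30.0))
print(neighbours(0, 0, 3, 3))     # a corner cell has three neighbours
print(len(neighbours(1, 1, 3, 3)))  # an interior cell has eight
```

Nothing beyond the row, the column and the grid origin is stored, which is why neighbourhood in a raster is implicit rather than recorded as explicit topology.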

 

3.2 Vector Data Model

 

A vector GIS is simply “a generic name to describe a class of GIS that use the vector data structure to describe, represent and use spatial objects with a physical quantity that requires both magnitude and direction for its description” (Korte, 1998).

 

Vector data and vector graphics comprise vertices and paths, with three basic symbol types built from those vertices: points, lines and polygons (areas). A point is specified by its location (x, y) in the Cartesian coordinate system; a line by a sequence of connected points known as vertices; and a polygon by a closed area bounded by a polyline that has the same starting and ending point. The geometric feature chosen to model a geographic entity depends, to a large extent, on the scale and size of the map. For example, Uttar Pradesh may be represented as a point on a small-scale map such as a world map, but on a larger-scale map of India it should be represented as a polygon, which may contain a number of points (spot heights, headquarters, etc.), lines (highways such as NH-1 and NH-24, railway lines) and polygons (housing complexes, lakes, industrial areas).

Polygons in a spatial framework are of two types: adjacent polygons and island polygons. Adjacent polygons can be visualized as neighbouring countries on a world map, where the common boundaries of two or more polygons, e.g. plot boundaries or administrative boundaries, are interconnected. In a rural space, adjacent plots with common agricultural field boundaries are adjacent polygons. Island polygons, by contrast, occur where one polygon lies wholly within another, for example an island in a large lake or pasture land within a forest.

 

Figure 5: Geometric Data

 

In addition, we need to remember the significance of attribute data. In the words of Heywood, “spatial data are where the things are and attribute data are what the things are”. That is, attribute data attaches qualities to the spatial data. “Attributes are descriptive information about specified spatial objects and are referred as ‘non-spatial’ or ‘aspatial’ information (Modarres, 1998)”. Because not all information can be represented by geometry alone, it becomes necessary to attach qualities such as a name or number that describe aspects of the spatial data. The difficulty is that attribute data is domain specific: the spatial data for a region remains unchanged across different analyses, whereas the attributes change as per requirement. For example, any change in land use changes the attribute information. A single site may contain a residential complex, place of worship, community centre, shops, health centre, cyber café, crèche, etc. If the crèche becomes a day care centre after two years, the location remains the same but the attribute information changes.

 

Vector data uses the familiar coordinate geometry of specifying a point, line or polygon by two-dimensional Cartesian coordinates, on the assumption that the real world can be divided into clearly defined elements. A straight line has only two pairs of x, y coordinates, whereas a polyline has more than two pairs (Figure 6). An area is a combination of line segments with a common starting and ending point. The vector entities are the units that carry the information. Selecting an appropriate number of points to construct an entity is an important step: too many points cause redundancy, and too few lead to over-generalization of the feature. Paper maps are the typical example of the vector model.

 

Figure 6: Coding Vector Data
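The coding rules described above (one coordinate pair for a point, two for a straight line, more than two for a polyline, and a closed ring for a polygon) can be sketched as plain coordinate lists. The coordinates and the helper `classify` are hypothetical and serve only to make the rules concrete.

```python
# Each vector entity is a string of (x, y) coordinate pairs.
point = (3.0, 4.0)
straight_line = [(0.0, 0.0), (4.0, 3.0)]
polyline = [(0.0, 0.0), (2.0, 1.0), (3.0, 3.0), (5.0, 4.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]


def classify(coords):
    """Name the geometric entity implied by a list of coordinate pairs."""
    if len(coords) == 1:
        return "point"
    # A polygon is a closed ring: first and last pairs coincide.
    if len(coords) >= 4 and coords[0] == coords[-1]:
        return "polygon"
    if len(coords) == 2:
        return "straight line"
    return "polyline"


for coords in ([point], straight_line, polyline, polygon):
    print(len(coords), "pairs ->", classify(coords))
```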

 

 

Since most thematic mapping in Geographic Information Systems is based on polygons, their storage and manipulation have received considerable attention. A polygon is characterized by its unique shape, area and perimeter. Representing a simple polygon with a few points is straightforward, but when more complex entities such as two or more adjoining polygons are stored, the shared boundaries are entered twice. This duplicates the adjacent lines and creates matching problems. Besides, it occupies more space in the computer. In dealing with this and related problems, two models are largely used. The first is the spaghetti data model, also referred to as the non-topological, geometric or path-topological model. As the name suggests, it is a direct line-by-line translation of the vector map. Each entity, i.e. point, line or polygon, becomes a logical record in the digital file and is defined as a string of x, y coordinates. The second is the topological model, in which topological data define the logical connections between points, lines and areas for geographical description and analysis. This is covered in detail in the subsequent chapter.
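The duplication of shared boundaries in the spaghetti model, and the way a topological model records each boundary once, can be illustrated with two adjacent unit squares. This is a simplified sketch, not the structure used by any particular GIS: the arc records with `left` and `right` polygon labels are an assumed, minimal form of arc-based topology, and all names are hypothetical.

```python
def segments(ring):
    """Return the undirected boundary segments of a closed coordinate ring."""
    return {frozenset((ring[i], ring[i + 1])) for i in range(len(ring) - 1)}


# Spaghetti model: every polygon is an independent string of coordinates.
spaghetti = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}
duplicated = segments(spaghetti["A"]) & segments(spaghetti["B"])
print("segments stored twice:", len(duplicated))

# Simplified topological model: each arc is stored once and records the
# polygon on its left and right (None means the outside of the map).
arcs = {
    "shared":  {"coords": [(1, 0), (1, 1)], "left": "A", "right": "B"},
    "a_outer": {"coords": [(1, 1), (0, 1), (0, 0), (1, 0)],
                "left": "A", "right": None},
    "b_outer": {"coords": [(1, 0), (2, 0), (2, 1), (1, 1)],
                "left": "B", "right": None},
}


def adjacent_pairs(arcs):
    """Pairs of polygons that share an arc, read directly from the topology."""
    return {
        (arc["left"], arc["right"])
        for arc in arcs.values()
        if arc["left"] is not None and arc["right"] is not None
    }


print("adjacent polygons:", adjacent_pairs(arcs))
```

In the spaghetti version, adjacency has to be discovered by comparing coordinates; in the topological version it is stored explicitly with the shared arc.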

4. Comparison of Raster and Vector

Having looked at raster and vector data, it is now relevant to compare the two and understand their advantages and disadvantages (Figure 7).

 

Figure 7: Comparison of Raster and Vector Data Models

 

4.1. Raster data

 

The raster data model answers the question, “What geographic phenomenon occurs at this location?” The advantages of raster data are rooted in the basic nature of the data. The primary focus of the raster data model, as we know by now, is location. The geographic location of each cell is implied by its position in the cell matrix, which is referenced to a Cartesian coordinate system and stored as rows and columns of cell values. Because of this storage technique, data analysis is usually easy to program and quick to perform. Raster-based GIS allows satellite data to be readily incorporated. The raster model can also represent gradual transitions between features and surfaces, such as soil classification and elevation. It is well suited to many spatial modelling operations, mathematical modelling and quantitative analysis, such as optimum corridor route selection, modelling of surface storm runoff, forest fire spread and continuous elevation data. Further, grid-cell systems are very compatible with raster-based output devices, e.g. electrostatic plotters and graphic terminals.

 

However, the disadvantages of raster data lie largely with non-continuous entities such as roads and boundaries (lines) or houses (points). Raster data has reduced spatial accuracy, which means the raster model gives a more generalized view. Line and point data in raster will always be less precise than in vector format. As a raster consists of evenly spaced data covering a certain area, data that is evenly spaced in one attribute may in fact be unevenly spaced in another because of distortions. Depending on the cell resolution, it is especially difficult to represent linear features adequately. Accordingly, network linkages are difficult to establish. Further, the cell size determines the resolution at which the data is represented. Raster data sets can become very large because they record a value for every cell in an image. As the resolution increases, the cell size decreases and the number of cells grows. Consequently, processing speed suffers and data storage costs rise.
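The growth in data volume as cell size shrinks can be made concrete with a short calculation. The 10 km × 10 km extent and the three cell sizes are assumed values for illustration, and `cell_count` is a hypothetical helper.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover a rectangular extent."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


# Hypothetical 10 km x 10 km study area at three cell sizes.
for size in (100, 50, 25):
    print(size, "m cells ->", cell_count(10_000, 10_000, size), "cells")
```

Halving the cell size quadruples the number of cells, which is why finer resolution is paid for in storage and processing time.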

 

Moreover, processing the associated attribute data may become cumbersome if large amounts of data exist. Calculating areas and distances is easy, but calculating perimeters is more difficult. Raster maps inherently reflect only one attribute or characteristic for an area. Since most input data is in vector form, it must undergo vector-to-raster conversion. Besides increasing processing requirements, this may raise data integrity concerns because of generalization and the choice of an inappropriate cell size. Most output maps from grid-cell systems do not conform to high-quality cartographic needs. Raster also uses a lot of storage space, so data compression is applied; but the more compressible the raster, the less suited the model is to the information, since a truly variable surface cannot be easily compressed. Chrisman (1994) said that the maxim “raster is faster, but vector is corrector” is true to a certain extent.
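The remark on compression can be illustrated with run-length encoding. The text does not name a compression scheme, so run-length encoding is an assumption here, chosen because it is simple; the two sample rows are made up.

```python
def run_length_encode(row):
    """Compress a row of cell values into (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


# A row from a discrete raster with large uniform areas (made-up codes).
uniform_row = [2, 2, 2, 2, 2, 2, 3, 3]
# A row from a continuously varying surface (made-up elevations in metres).
variable_row = [212, 215, 213, 220, 224, 221, 219, 230]

print(run_length_encode(uniform_row))
print(len(run_length_encode(variable_row)), "runs for", len(variable_row), "cells")
```

The uniform row collapses to two runs, whereas the variable row needs one run per cell, so encoding it saves nothing.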

 

4.2. Vector Data

 

The vector data model, as we have observed above, has many advantages. Its primary focus is the geographic feature, and it is therefore more suited to the question, “What do I know about this geographic feature?” All elements are located by x, y coordinates in a Cartesian coordinate system, and the resulting maps are aesthetically more pleasing. Vectors also represent feature shapes accurately, with well-defined boundaries. Observation units are “end points” and/or variable line or polygon magnitudes. The emphasis is on the relationships and distribution of geographic features. Calculation of area, distance and perimeter is easy and precise. The data conversion effort is also reduced, because most available maps are in hard copy and in vector format. Further, vectors use minimal space for the storage of spatial data. Topology rules help maintain the integrity of the data model, and with efficient encoding of topology, operations that require topological information, including network analysis and proximity analysis, can be processed more efficiently.
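The claim that area and perimeter are easy to calculate from vector coordinates can be sketched with the shoelace formula, a standard method for the area of a simple polygon. The text does not say which method a GIS uses, so this is an illustration only; the plot coordinates are made up.

```python
import math


def polygon_area(ring):
    """Area of a simple closed ring by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(ring):
    """Sum of the lengths of the ring's boundary segments."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


# A hypothetical rectangular plot, 4 units by 3 units, stored as a closed ring.
plot = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

print("area:", polygon_area(plot))
print("perimeter:", polygon_perimeter(plot))
```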

 

The disadvantages of vector data are largely related to the inherent nature of the data, in which the location of each vertex must be stored explicitly. For geo-analytics of all kinds, the vector data must be converted into a topological structure. With static topology, any updating or editing of the vector data requires extensive data cleaning, and the topology usually has to be rebuilt. Further, the algorithms required for analysis with vector data are complex, processing intensive and time consuming. Continuous data of any kind is not effectively represented in vector form, as it would require substantial generalization and complex vector manipulations. Spatial analysis and filtering within polygons is difficult with vector data and, to an extent, not possible.

 

 

4.3. Making a Choice: Raster or Vector

 

The choice between raster and vector is generally driven by the cartographer’s conceptualisation of the features in the map. The key questions to consider when making the selection are the following:

 

What is the required map scale? If scale and time are not an issue and you can work with data at any scale, the preferred choice is vector data. Vector objects can be scaled up to any scale, a flexibility that raster data does not offer.

 

Do you want to work with pixels or coordinates? Vector data works with coordinates, whereas raster data works with pixels.

 

Do you have any file size restriction? Raster data files can be larger than vector data sets covering the same phenomenon or area.

 

Are you restricted to a particular GIS software? In this case you do not have a choice, as the decision is driven by the compatibility of the software. Vector data models are used with ArcInfo coverages, ArcGIS shapefiles, CAD, GBF, TIGER, etc. Raster data models are used with ArcInfo grids, images, Digital Elevation Models (DEM), PNG, JPEG, etc.

 

5. Summary

  • The real world around us is not uniform and smooth. Data models provide an organised and structured way of visualizing and studying the world in a digital domain.
  • A geographic model is an abstract and well-defined system of concepts, with its own vocabulary, for describing and reasoning about the real world.
  • Object-based spatial databases (those obtained by field surveying, remote sensing image analysis, photo interpretation, digitization, etc.) are generally represented in the form of coordinate lines and are termed vector data models. When the spatial database is structured on the field-based model, the basic spatial units are different forms of tessellation (regular, as in a DEM, or irregular, as in a TIN), and the result is termed the raster data model.
  • Images from remote sensing, scanned maps, etc. are all in raster format, whereas census data in tabular form (e.g. the US census) and DLG data from the USGS for streams, roads, etc. are all vector data.
  • Raster data models are best for continuous features such as elevation, temperature, soil type and land use, whereas vector data models are best for features with discrete boundaries such as property lines, transportation networks and political boundaries.

 


 

References

  • Chang, Kang-tsung (2002) Introduction to Geographic Information Systems, University of Idaho, New Delhi: Tata McGraw-Hill Publishing Company Ltd, ISBN 0-07-049552-1.
  • Ganesh, A. (Ed.) (2006) Applications of Geospatial Technology, Serial Publishing House.
  • Goodchild, M.F. (1992) Geographical Data Modeling, Computers and Geosciences, 18: 400-408.
  • Hohl, Pat (Ed.) (1998) GIS Data Conversion: Strategies, Techniques, Management, Onward Press.
  • Korte, G.B. (1997) The GIS Book: Understanding the Value and Implementation of Geographic Information Systems, 4th Edition, Onward Press.
  • Parihar, S.M. (2007) Standardisation in Geo ICT Arena, Geospatial Today, Vol. 5, Issue 5, February 2007, pp. 40-46.
  • Parihar, S.M. (2007) Catching that Fleeting Detail, Geospatial Today, Vol. 6, Issue 6, August 2007, pp. 42-45.
  • Parihar, S.M. (2006) 3-D GIS: Much Awaited Technology, Geospatial Today, Vol. 4, Issue 11, August 2006, pp. 43-46.
  • Raper, Jonathan (Ed.) (1989) Three Dimensional Applications in Geographic Information Systems, Philadelphia, PA: Taylor & Francis, Inc.
  • Siddiqui, M.A. (2011) Concepts and Techniques of Geoinformatics, Allahabad: Sharda Pustak Bhawan.
  • Worboys, M. & M. Duckham (2004) GIS: A Computing Perspective, London: CRC Press.