23 Vector and Raster based Analysis
Dr. Puneeta Pandey
1. Learning Objective
The objective of this module is to understand the basic concepts of vector and raster data. The module gives an overview of GIS and its functions, with an emphasis on data structures and their types, namely vector and raster data.
2. Introduction
A geographic information system (GIS) is an effective tool for implementing and monitoring municipal infrastructure, urban planning, public safety, utility services, transport services, etc. GIS improves the efficiency and effectiveness of projects in which geographical information is of prime importance (Burrough, 1986). A GIS manages many kinds of data in a single electronic file by sorting different spatial features into sub-files. These sub-files are called map layers or themes (soil, water, streets, etc.). Because the map layers are stored and accessed at the same scale, regional planners and administrative bodies can study earth features accurately. A GIS can combine the layers and show the features of every layer; layers can be displayed and overlaid as required.
3. GIS System and Function
3.1 Data Capture: GIS integrates data from various sources into a common format in which they can be compared and analyzed. The main input sources are:
- Manual digitization
- Scanning of aerial photographs
- Existing digital datasets
- Remote sensing satellite images
- GPS
3.2 Data Storage and Manipulation: After the data are collected, they are stored and maintained. Data management includes data security, data integrity, data storage, data retrieval, and maintenance.
3.3 Data Analysis: GIS can interpret and analyze the collected information both quantitatively and qualitatively.
4. Data models of GIS
The information within a GIS consists of two elements: spatial data and attribute data. Spatial data are represented by points (e.g., well locations), lines (e.g., streams, road networks), polygons (e.g., delineations of soil mapping units), or grid cells; attribute data describe the characteristics of these spatial features. The spatial data are referenced to a geographic coordinate system and are stored in either a vector or a raster format (Burrough 1989).
A GIS platform stores and manages geographic data in a number of formats. The three basic data models are vector, raster, and TIN (triangulated irregular network).
4.1 Vector Data Structure
In the vector data structure, attribute information is always associated with points, lines, and polygons, the spatial entities that describe features occurring in the real world (Fig. 1).
Figure 1: Representation of features using Vector data structure
Points are pairs of x,y coordinates. Lines are sets of coordinates that define a linear shape whose width is negligible relative to its length. Polygons are sets of coordinates defining boundaries that enclose areas. Coordinates are most often pairs (x,y) or triplets (x,y,z, where z represents a value such as elevation) (Fig. 2).
Figure 2: Points, lines and polygon representation using coordinates
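As a rough illustration of the coordinate representation described above, the three vector primitives can be written as plain coordinate lists. The Python sketch below is not tied to any GIS package; the feature names (`well`, `stream`, `field`), the sample coordinates, and the two helper functions are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical sample features; coordinates are in arbitrary map units.
well = (3.0, 4.0)                                          # point: one (x, y) pair
stream = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]              # line: ordered vertices
field = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # polygon: ring of vertices


def line_length(vertices):
    """Sum of the straight segment lengths between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area enclosed by a ring of vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(stream))   # 10.0
print(polygon_area(field))   # 12.0
```

Length and area are not stored anywhere; they follow from the coordinates, which is why a vector layer can hold geometry and attributes separately.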
This kind of representation of the world is generically called a vector data model. For example, a point representing a town is associated with its population, number of houses, number of hospitals, and so on. A linear feature such as a river, represented by a line, is associated with its name, mean discharge, etc. A land-use unit represented by a polygon is associated with its past land use, soil type, etc. The vector data structure is categorized into:
4.1.1 Spaghetti data structure
4.1.2 Topological data structure
4.1.1 Spaghetti Data Structure
The spaghetti data model is the simplest data structure. In this model, each entity on a map becomes one logical record in the digital file and is stored as a string of x,y coordinates. Different lines and polygons are stored as independent objects, so shared lines and points are not taken into account: the line between two adjacent polygons must be digitized and stored twice. Every entity is defined only by its coordinates, without any stored spatial relationships. Such relationships have to be obtained through computation, which limits the spatial analysis that can be performed, and for this reason the structure is not optimal.
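The redundancy described above can be seen in a small Python sketch. It assumes two adjacent unit squares with hypothetical names `A` and `B`; the helper functions are illustrative and not part of any GIS library.

```python
# Two adjacent unit squares stored "spaghetti" style: each polygon is an
# independent coordinate string, so the shared edge x = 1 appears twice.
# Feature names and coordinates are hypothetical.
spaghetti = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}


def edges(ring):
    """Return the ring's edges as direction-independent vertex pairs."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}


def shared_edges(a, b):
    """Adjacency is not stored; it must be computed by comparing coordinates."""
    return edges(a) & edges(b)


common = shared_edges(spaghetti["A"], spaghetti["B"])
stored = sum(len(ring) - 1 for ring in spaghetti.values())
unique = len(edges(spaghetti["A"]) | edges(spaghetti["B"]))
print(common)          # the single edge joining (1, 0) and (1, 1)
print(stored, unique)  # 8 7
```

Eight edges are stored for seven distinct edges, and the fact that A and B are neighbours only emerges after a coordinate comparison.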
4.1.2 Topological Data Structure
The topological data structure is the most widely used method of encoding spatial relationships. For example, an area or polygon is defined by the set of lines that make up its boundary, and a line may be the border between two polygons. Each line can also represent part of a path connecting other paths; for example, lines can represent streets and routes. The connectivity of these features is referred to as their topology, and the mathematics of topology is used to define the spatial relationships. The model is also known as the line/arc-node data model. Its advantage is reduced data redundancy, because shared nodes and lines are stored only once. Attributes are linked to each feature. The attribute data are stored in separate relational tables, so more files have to be maintained; a database management system is used to provide efficient access. The model has the following elements:
- Line (Arc): A series of points that starts and ends at a node.
- Node: An intersection point where two or more lines meet. A node can also occur at the end of a dangling line that is not linked to another line.
- Polygon: A closed chain of lines that represents the boundary of an area.
- Point: Encoded as a single x,y coordinate pair. A point is treated as a polygon with no area.
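The elements listed above can be sketched as a pair of small tables. The layout below (a node table plus an arc table with from/to nodes and left/right polygons) is one common way to present the arc-node idea; the identifiers, field names, and coordinates are hypothetical, and real systems differ in detail.

```python
# Hypothetical arc-node tables for two adjacent unit squares, A and B.
# Each node and each arc is stored once.
nodes = {1: (1, 0), 2: (1, 1)}

# arc id -> start node, end node, intermediate vertices, left and right polygon.
# None stands for the area outside the map.
arcs = {
    10: {"from": 1, "to": 2, "vertices": [], "left": "A", "right": "B"},
    11: {"from": 2, "to": 1, "vertices": [(0, 1), (0, 0)], "left": "A", "right": None},
    12: {"from": 1, "to": 2, "vertices": [(2, 0), (2, 1)], "left": "B", "right": None},
}


def arc_coordinates(arc_id):
    """Full coordinate string of an arc: start node, vertices, end node."""
    arc = arcs[arc_id]
    return [nodes[arc["from"]], *arc["vertices"], nodes[arc["to"]]]


def boundary_arcs(polygon):
    """Ids of the arcs that have the polygon on their left or right side."""
    return sorted(a for a, rec in arcs.items() if polygon in (rec["left"], rec["right"]))


def neighbours(polygon):
    """Polygons found on the other side of the polygon's boundary arcs."""
    found = set()
    for rec in arcs.values():
        sides = {rec["left"], rec["right"]}
        if polygon in sides:
            found |= sides - {polygon, None}
    return found


print(arc_coordinates(11))   # [(1, 1), (0, 1), (0, 0), (1, 0)]
print(boundary_arcs("A"))    # [10, 11]
print(neighbours("A"))       # {'B'}
```

The shared boundary (arc 10) is stored once, and the neighbour of A is read directly from the table instead of being computed from coordinates.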
4.1.3 Properties of the Topological data structure
- Connectivity: Indicates which geographic features connect to others or which geographic features intersect each other. For example, line 1 is connected to lines 2, 3 and 4.
- Adjacency: Indicates which geographic features share a common boundary. For example, two polygons that lie on either side of the same line are adjacent.
- Containment: Indicates which geographic features (nodes, arcs and smaller polygons) are contained within a polygon. For example, polygon D is inside polygon B, as shown in the figure.
- Proximity: Indicates which geographic features are near to others. For example, to travel from Node B to Node A the shortest path is Line 3.
- Relative Direction: Indicates the relative position of geographical features with respect to one another. This can be used to study the direction of slope and in watershed management.
Figure 3: Showing Connectivity, Adjacency, Containment, Proximity, Relative Direction
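Two of the properties above, connectivity and proximity, can be sketched on a small line network in Python. The network below is hypothetical: its node labels and lengths are invented so that the results echo the examples in the list (line 1 connects to lines 2, 3 and 4; the shortest route from node B to node A is line 3), and it is not a reconstruction of Figure 3. Proximity is computed here with Dijkstra's algorithm, which is one possible choice.

```python
import heapq

# Hypothetical line network: line id -> (start node, end node, length).
lines = {
    1: ("A", "C", 4.0),
    2: ("C", "B", 3.0),
    3: ("A", "B", 5.0),
    4: ("C", "D", 2.0),
}


def connected_lines(line_id):
    """Connectivity: other lines that share a node with the given line."""
    start, end, _ = lines[line_id]
    return sorted(
        other for other, (s, e, _) in lines.items()
        if other != line_id and {s, e} & {start, end}
    )


def shortest_path(source, target):
    """Proximity: Dijkstra's algorithm over the line network.

    Returns (total length, list of line ids) or None if no route exists.
    """
    graph = {}
    for line_id, (s, e, length) in lines.items():
        graph.setdefault(s, []).append((e, length, line_id))
        graph.setdefault(e, []).append((s, length, line_id))
    queue = [(0.0, source, [])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, length, line_id in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + length, nxt, path + [line_id]))
    return None


print(connected_lines(1))        # [2, 3, 4]
print(shortest_path("B", "A"))   # (5.0, [3])
```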
4.1.4 Advantages and disadvantages of Vector Model
4.2 Raster Data Structure
The raster model divides the entire area into a regular grid of cells stored in a specific sequence, generally row by row from the top-left corner. Each cell contains a single value, and in most cases a value must be assigned to every cell of the grid. Raster data are often encoded in ASCII format. Conceptually and operationally, it is a relatively simple approach to data integration.
A digital elevation model uses the cell-by-cell data structure because neighboring elevation values are rarely the same. Satellite images are also stored this way. An advantage of the raster GIS model is that it interfaces easily with remote sensing images.
Figure 4: The cell-by-cell data structure records each cell value by row and column
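A minimal Python sketch of cell-by-cell storage follows. The grid size, cell size, corner coordinates, and elevation values are all hypothetical; the point is only that a cell is found from its row and column, and that a map coordinate is converted to a row and column by simple arithmetic.

```python
# Hypothetical 3 x 4 elevation grid, stored row by row from the top-left cell.
ncols, nrows = 4, 3
cell_size = 10.0                 # map units per cell side
x_min, y_max = 500.0, 230.0      # map coordinates of the top-left corner
values = [
    12, 14, 15, 17,   # row 0 (top)
    11, 13, 16, 18,   # row 1
    10, 12, 14, 19,   # row 2 (bottom)
]


def cell_value(row, col):
    """Value of the cell at (row, col) in the row-major list."""
    return values[row * ncols + col]


def value_at(x, y):
    """Value of the cell that contains map coordinate (x, y)."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if not (0 <= row < nrows and 0 <= col < ncols):
        raise ValueError("coordinate lies outside the raster")
    return cell_value(row, col)


print(cell_value(1, 2))         # 16
print(value_at(525.0, 205.0))   # 14
```

Rows are counted downward from the top edge, which is why the row index uses `y_max - y`.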
4.2.1 Advantages and disadvantages of Raster data structure
5. Data Analysis
5.1 Vector data Analysis
5.1.1 Selecting features by location (spatial joining and location queries): It is often useful to join attribute information from a polygon map to a line map. You can also select a subset of features based on their location.
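A spatial join can be sketched in a few lines of Python. For brevity the sketch joins polygon attributes to points instead of lines, and it uses the standard ray-casting test for point-in-polygon; the layers (`districts`, `wells`), their attributes, and the coordinates are hypothetical.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the ring of vertices."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical layers: district polygons with an attribute, and well points.
districts = {
    "North": {"ring": [(0, 5), (10, 5), (10, 10), (0, 10)], "soil": "loam"},
    "South": {"ring": [(0, 0), (10, 0), (10, 5), (0, 5)], "soil": "clay"},
}
wells = {"w1": (2, 8), "w2": (7, 1), "w3": (12, 3)}


def spatial_join(points, polygons):
    """Attach to each point the name and soil of the polygon that contains it."""
    joined = {}
    for pid, pt in points.items():
        joined[pid] = None  # stays None when no polygon contains the point
        for name, rec in polygons.items():
            if point_in_polygon(pt, rec["ring"]):
                joined[pid] = (name, rec["soil"])
                break
    return joined


result = spatial_join(wells, districts)
print(result)
# Location query: select only the wells that fall in the South district.
in_south = [pid for pid, hit in result.items() if hit and hit[0] == "South"]
print(in_south)   # ['w2']
```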
5.1.2 Vector data analysis (buffering and overlaying):
Buffering: Selecting features by location only picks a subset of existing objects from different map layers. Buffering, in contrast, analyzes spatial data layers in a way that creates entirely new objects. For example, suppose we want to determine all the areas in India that are more than 50 km from an interstate highway but within 25 km of a river. We can do this by buffering the highways at 50 km and overlaying the area outside that buffer with the area inside a 25 km buffer of the rivers.
Overlaying: Next, we overlay the two buffers to identify the area that lies outside the road buffer and inside the river buffer. The overlay operation serves this purpose.
5.1.3 Feature manipulation for vector analysis (clipping and dissolving)
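The buffer-and-overlay logic of this example can be sketched without constructing buffer polygons, by testing whether a location falls within a given distance of a line. This is a simplification of what a GIS buffer tool does; the highway and river geometries below are hypothetical straight lines in kilometre coordinates.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(p, line, radius):
    """True if p lies inside the buffer of the given radius around the line."""
    return any(distance_to_segment(p, a, b) <= radius for a, b in zip(line, line[1:]))


# Hypothetical features in kilometre coordinates.
highway = [(0, 0), (200, 0)]
river = [(0, 60), (200, 60)]


def suitable(p):
    """Overlay: outside the 50 km highway buffer AND inside the 25 km river buffer."""
    return not within_buffer(p, highway, 50) and within_buffer(p, river, 25)


print(suitable((100, 70)))    # True: 70 km from the highway, 10 km from the river
print(suitable((100, 40)))    # False: only 40 km from the highway
print(suitable((100, 120)))   # False: 60 km from the river
```

The overlay step reduces to a Boolean combination of the two buffer tests, which is the same logic a polygon overlay applies to whole areas.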
- Clipping- Clipping is used to clean up the analysis; in this example, it removes the areas that lie outside India. The operation is known as a 'clip'.
- Dissolving- Dissolving is used when many neighboring polygons all have the same value. Since the polygons share common boundaries, they can be dissolved into one larger polygon. This situation is illustrated below, with many polygons of the same value neighboring one another.
Fig 5: Showing dissolving process
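The grouping step behind a dissolve can be sketched with a union-find structure. The sketch below only decides which polygons merge; it does not rebuild the merged geometry, which a GIS would do by dropping the shared boundaries. The polygon ids, land-use values, and adjacency pairs are hypothetical.

```python
# Hypothetical polygons: id -> land-use value, plus pairs that share a boundary.
land_use = {1: "forest", 2: "forest", 3: "crop", 4: "forest", 5: "crop"}
shared_boundaries = [(1, 2), (2, 3), (3, 5), (4, 5)]


def dissolve(values, boundaries):
    """Group neighbouring polygons that carry the same value (union-find)."""
    parent = {pid: pid for pid in values}

    def find(pid):
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]   # path compression
            pid = parent[pid]
        return pid

    for a, b in boundaries:
        if values[a] == values[b]:
            parent[find(a)] = find(b)

    groups = {}
    for pid in values:
        groups.setdefault(find(pid), []).append(pid)
    return sorted((values[members[0]], sorted(members)) for members in groups.values())


print(dissolve(land_use, shared_boundaries))
# [('crop', [3, 5]), ('forest', [1, 2]), ('forest', [4])]
```

Polygon 4 has the same value as polygons 1 and 2 but shares no boundary with them, so it stays a separate feature.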
5.2 Raster data Analysis
The analysis of raster data includes various steps such as:
1. Display of Digital Elevation Model (DEM)
2. Slope Calculation
3. Calculating Aspect
4. Hillshade
5. Calculating a viewshed
6. Neighborhood statistics
7. Zonal statistics
8. Extract by Mask
9. Distance/Buffer Analysis
10. Reclassification
11. Vector to raster conversion
12. Using the raster Calculator
13. Raster to vector Conversion
- Displaying the Elevation Model- The DEM file contains elevation values for 10 m x 10 m cells. Elevation values are in meters and the coordinate system is UTM Zone 18N, NAD27. The DEM is added to the map view in the same way as other data. Its appearance depends on the symbolization used.
- Slope Calculation- The slope function calculates the maximum rate of change in elevation from each cell to its neighbors. It is calculated over a 3×3 set of cells and can yield slope in degrees (0-90) or in percent, which is the vertical rise divided by the horizontal run, multiplied by 100.
- Calculating Aspect- Aspect identifies the slope direction in compass degrees (0 = north, 180 = south, etc.). As with slope, the calculation is based on a 3×3 grid neighborhood.
- Hillshade- Hillshade determines the illumination of a surface (the DEM in this case) given the direction and angle of a light source (i.e., the sun). The resulting grid contains values ranging from 0 to 255, with 0 representing complete darkness.
- Calculating a Viewshed- This determines which areas of a landscape can be seen from a feature, such as a point location. The calculation is based entirely on elevation and does not account for trees, buildings, etc.
- Neighborhood statistics- This tool performs several different functions on raster data using a user-defined "neighborhood". For example, suppose you want to "smooth" an elevation raster by calculating the average of a 9×9 rectangular neighborhood of elevations. If this average is computed for every cell, it lowers the peaks and raises the low points.
- Zonal statistics- The zonal statistics function lets the user define zones using features from another data layer and summarize the raster values that fall within each zone.
- Extract by Mask- This is the raster equivalent of a clip.
- Distance/Buffer Determination- Spatial Analyst offers an option to create a raster whose cells contain the distance from a set of features, such as points, lines, or areas.
- Reclassify a raster- This is useful when we want to simplify, categorize or rank raster data. For example, we may want to divide the distance range (0-50 km) of the above raster into three categories: 'near' (0-20 km), 'medium' (20-30 km), and 'far' (30-50 km).
- Vector to Raster conversion- This is used in situations where a vector file needs to be converted to a raster map.
- The Raster calculator- The raster calculator is a very useful utility for performing a wide variety of grid-related tasks. One of its most common uses is creating new raster images from cells that meet criteria in other raster images.
- Raster to vector conversion- This is generally used when you want to know the size of each enclosed area that meets a certain condition. One way to get this information is to convert the raster into vector polygons and calculate the area of each polygon.
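Three of the raster operations described above (slope and aspect, neighborhood statistics, and reclassification) can be sketched in plain Python. The sketch rests on several assumptions that the text does not state: slope and aspect use Horn's weighted finite differences over the 3×3 window, the neighborhood mean uses a 3×3 window clipped at the raster edges instead of the 9×9 window of the example, and all grids, cell sizes, and class breaks are hypothetical.

```python
import math

# Hypothetical 3 x 3 DEM (metres), north at the top, 10 m cells.
# The surface rises 10 m per cell towards the north, so it faces south.
dem = [
    [120.0, 120.0, 120.0],
    [110.0, 110.0, 110.0],
    [100.0, 100.0, 100.0],
]


def slope_aspect(win, cell=10.0):
    """Slope (degrees, percent) and aspect (compass degrees) of the centre
    cell of a 3 x 3 window, using Horn's finite differences (an assumption)."""
    (a, b, c), (d, _, f), (g, h, i) = win
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)   # west to east
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell)   # north to south
    rise_run = math.hypot(dz_dx, dz_dy)
    if rise_run == 0:
        return 0.0, 0.0, None            # flat cell: aspect is undefined
    aspect = (90.0 - math.degrees(math.atan2(dz_dy, -dz_dx))) % 360.0
    return math.degrees(math.atan(rise_run)), 100.0 * rise_run, aspect


def focal_mean(grid):
    """Neighbourhood statistic: mean of the 3 x 3 window around each cell
    (the window is clipped at the raster edges)."""
    nrows, ncols = len(grid), len(grid[0])
    out = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(nrows, r + 2))
                for cc in range(max(0, c - 1), min(ncols, c + 2))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def reclassify(grid, breaks):
    """Replace each value by the label of the first class whose upper bound
    it does not exceed (None if it exceeds every bound)."""
    return [
        [next((label for upper, label in breaks if v <= upper), None) for v in row]
        for row in grid
    ]


spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]          # a single peak
distance_km = [[5, 18, 27], [22, 31, 49]]          # hypothetical distance raster
classes = [(20, "near"), (30, "medium"), (50, "far")]

slope_deg, slope_pct, aspect = slope_aspect(dem)
print(round(slope_deg, 1), round(slope_pct, 1), round(aspect, 1))  # 45.0 100.0 180.0
print(focal_mean(spike)[1][1])                     # 1.0
print(reclassify(distance_km, classes))
```

The smoothed peak drops from 9 to 1 while its neighbours rise, which is the levelling effect described for neighborhood statistics.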
6. Summary
Analysis is often considered the 'heart' of GIS. Because a GIS includes both attribute and spatial data, analysis can be conducted on both types of data, and through it new information is gained about spatial features. In the vector data structure, attribute information is always associated with points, lines, and polygons, the spatial entities that describe features occurring in the real world. In raster data, grid cell values are used to retrieve information from the spatial data. These types of analysis help extract useful information from the data and make the data easier to understand by separating the different features present in them.
Suggested Readings:
- Burrough, P. A. (1986). Principles of geographical information systems for land resources assessment.
- Burrough, P. A., Heuvelink, G. B., & Stein, A. (1989). Propagation of errors in spatial modelling with GIS. International Journal of Geographical Information Systems, 3(4), 303-322.
- Campbell, J. B., & Wynne, R. H. (2011). Introduction to remote sensing. Guilford Press.
- Obe, R. O., & Hsu, L. S. (2015). PostGIS in action. Manning Publications Co.
- Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer.
- Lo, C. P., & Yeung, A. K. (2002). Concepts and techniques of geographic information systems (p.532). Upper Saddle River, NJ: Prentice Hall.