27 Point Pattern and Nearest Neighbor Analysis
Dr. Madhushree Das
1. Introduction
In geography, dots are the most commonly used symbol in quantitative mapping; the pattern of settlement, for example, is usually shown with points or dots. Nearest neighbor analysis (NNA) is one form of point pattern analysis applied to the features under study. It is a descriptive statistic that characterizes the pattern of located features by comparing the observed nearest neighbor distance with the distance expected under a random arrangement. That is, it describes a phenomenon on the basis of its distance from another phenomenon in space. Nearest neighbor analysis does not only examine the distances between points; it also identifies the closest point to each one (Fotheringham et al. 1994; Woulder 1999). It can be used to describe the proximity of both human and physical features, for example settlements and vegetation.
The nearest neighbor technique was devised by botanists who wished to provide a quantitative description of patterns of plant distribution, especially the distribution of trees. Its beginnings can be attributed to the pioneering work of P.J. Clark and F.C. Evans in 1954, in their attempt to describe and analyze the pattern and distribution of trees and other plants in the forest. Since geographers are interested in the pattern of distribution of phenomena over space, the technique has since been adapted for geographical studies, where it is used to identify a tendency towards, or calculate the degree of, nucleation (clustering) or dispersion of phenomena in space. Nearest neighbor analysis can be used to analyze the distribution of schools, hospitals, buildings and settlements, and of a myriad of physical features such as wells, springs, mountains and hills on the earth's surface.
Since geography is described as the science of spatial relationships of phenomena, the location of human activities (socio-economic as well as cultural) and their distribution patterns have great importance in geographical studies. No human activity is ubiquitous over space, so the question of why activities locate where they do is answered by studying their distribution pattern. Nearest Neighbour Analysis (NNA) provides a basis for measuring the point pattern of an area or region, which helps in understanding the spatial processes behind the distribution of human activities. The spatial association and distribution of settlements on the surface of the earth is uneven because each settlement has evolved differently, controlled by geographical factors such as physiography, climate, soil and natural vegetation, as well as by socio-cultural factors. If settlements are considered as points over space, then the distribution of points is one of the main dimensions of studies of activity distribution. The present analysis provides a basis for measuring the various arrangements of points or locations of settlements. A set of points in an area can be arranged in a number of ways, but three basic patterns are recognized: regular (or uniform), clustered and random.
A uniform pattern occurs when the interval between points is similar; a clustered arrangement is one where points appear in bunches separated by gaps; a random pattern is one in which the spacing could have been determined by chance or by using a random number table. However, these patterns are not mutually exclusive, as clusters may occur at regular intervals and even spacing may arise at random. When the forms are obvious, the degree of regularity or clustering of points in an area can be judged easily by eye, and it is easy to label such patterns as clustered, regular or random. But it is very difficult to distinguish less obviously clustered or regular patterns from one which could have arisen at random. Nearest neighbor analysis helps us to understand these patterns, which evolve over space because of different geographical factors.
The study of the distribution of points or settlements in order to discern any regularity in spacing, by comparing them with a theoretical random pattern, is called nearest neighbour analysis. It is a method of exploring the pattern of locational data by comparing the observed Nearest Neighbour Distance (NND) with the distance expected in a random distribution. It describes the distribution of points according to their spacing. The analysis is done with the help of an index called the Nearest Neighbour Index (NNI), which was devised by the plant ecologists Clark and Evans in 1954. It was originally developed to measure the pattern of incidence of different tree species and was subsequently applied to the distribution of settlements; much of the pioneering work of this kind in geography was done by King and Dacey. In the study of the spatial distribution of settlements, NNA measures the distance between each point and its nearest neighbour and then compares these distances with the values expected for a set of points with complete spatial randomness. In simple words, it takes the ratio of the mean observed nearest neighbour distance to the mean expected distance to give a randomness index of the point distribution, called Rn. It depends upon two assumptions: 1) all locations in an area are equally likely to be the recipient of an event, and 2) all events are independent of one another.
In measuring the distribution of points over an area, the distance between each point and its nearest neighbour is measured, and the mean of these distances is then calculated by dividing their total by n, the number of points. In a clustered distribution, where points are close to each other, this mean distance will be low; a higher mean distance indicates relatively wide spacing between points, as in a random or regular distribution. To allow comparison between different point patterns, the result is standardized by the overall density of points in the area, giving the Rn value that summarizes the point distribution.
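The measurement step can be sketched in a few lines of Python. This is only an illustration: the function names and the four coordinate pairs are hypothetical and are not taken from the study area, and the distance used is the straight-line (crow-fly) distance.

```python
import math


def nearest_neighbour_distances(points):
    """Return, for each point, the straight-line distance to its nearest neighbour."""
    if len(points) < 2:
        raise ValueError("at least two points are needed")
    distances = []
    for i, (xi, yi) in enumerate(points):
        # Compare point i with every other point and keep the shortest distance.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances


def mean_nearest_neighbour_distance(points):
    """Mean of the n nearest-neighbour distances (one per point)."""
    distances = nearest_neighbour_distances(points)
    return sum(distances) / len(distances)


# Hypothetical coordinates in km, for illustration only.
villages = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (4.0, 2.0)]
print(nearest_neighbour_distances(villages))      # [1.0, 1.0, 2.0, 3.0]
print(mean_nearest_neighbour_distance(villages))  # 1.75
```

Note that the total is divided by the number of points, because every point contributes one nearest-neighbour distance even when two points are each other's nearest neighbour.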
Figure 1 shows three imaginary situations in which points are distributed over an area. One pattern shows loose clusters, another shows regular spacing and the third is a random distribution of points. 'Random' in this context means the outcome of a location process in which any point has the same chance as any other of occurring at a particular place on the map, or in which every place has the same chance of receiving an event. In addition, the location of each point is not influenced by the other points, and the points are fixed over space.
The nearest neighbour index produces a result ranging from a minimum of 0 to a maximum of 2.15, along which the following distribution patterns form a continuum:
Figure 1: Three imaginary location patterns measured as clustered, regular and random
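The three situations can also be imitated numerically. The sketch below generates a random, a regular and a clustered set of 400 points in a unit square and computes Rn for each, assuming the standard Clark and Evans expected distance of 0.5·√(A/n). The point sets, the seed and the function name are illustrative assumptions, and no correction is made for edge effects.

```python
import math
import random


def nearest_neighbour_index(points, area):
    """Rn = observed mean nearest-neighbour distance / expected mean distance."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 * math.sqrt(area / n)  # same as 1 / (2 * sqrt(density))
    return observed / expected


rng = random.Random(0)  # fixed seed so the illustration is repeatable
side = 20
n = side * side

# Random: every location in the unit square is equally likely.
random_pts = [(rng.random(), rng.random()) for _ in range(n)]
# Regular: one point at the centre of each cell of a 20 x 20 square grid.
regular_pts = [((i + 0.5) / side, (j + 0.5) / side)
               for i in range(side) for j in range(side)]
# Clustered: all points packed into a tiny patch in the middle of the square.
clustered_pts = [(0.5 + rng.uniform(-0.01, 0.01), 0.5 + rng.uniform(-0.01, 0.01))
                 for _ in range(n)]

rn_random = nearest_neighbour_index(random_pts, area=1.0)
rn_regular = nearest_neighbour_index(regular_pts, area=1.0)
rn_clustered = nearest_neighbour_index(clustered_pts, area=1.0)
print(rn_clustered, rn_random, rn_regular)  # near 0, near 1, and 2 for the square grid
```

A square grid gives Rn = 2; the theoretical maximum of about 2.15 belongs to a triangular (hexagonal) lattice, which is not generated here.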
The scale for identifying the distribution pattern is given below (Figure 2). If the Rn value is close to 0, the distribution pattern is considered clustered; if it is around 1.00 (0.5 to 1.5), it is random; if Rn approaches 2.0, the pattern is uniform; and if it is very close to 2.15 (the maximum), the pattern is perfectly uniform.
Figure 2: Rn scale
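As a rough aid, the scale can be written as a small lookup function. The text gives only the band 0.5 to 1.5 for "random"; treating everything below 0.5 as clustered and everything above 1.5 as uniform is a simplifying assumption made here, and the function name is hypothetical.

```python
def describe_pattern(rn):
    """Label an Rn value using the scale: 0 clustered, about 1 random, up to 2.15 uniform."""
    if not 0.0 <= rn <= 2.15:
        raise ValueError("Rn is expected to lie between 0 and 2.15")
    if rn < 0.5:      # assumed cut-off: below the 'random' band
        return "clustered"
    if rn <= 1.5:     # band quoted in the text for a random pattern
        return "random"
    return "uniform"  # assumed cut-off: above the 'random' band


print(describe_pattern(0.1))    # clustered
print(describe_pattern(1.337))  # random
print(describe_pattern(2.0))    # uniform
```

Such labels describe only where a value sits on the scale; whether the departure from 1 is statistically meaningful is a separate question.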
The Nearest Neighbor Index (NNI) is a somewhat complicated tool for measuring precisely the spatial distribution of a pattern, to see whether it is regularly dispersed (and so probably planned), randomly dispersed or clustered. It is used in spatial geography (the study of landscapes, human settlements, CBDs, etc.).
NNA Shortcut
- Settlements often appear on maps as dots
- The pattern of these dots is difficult to describe
- Sometimes patterns are obvious:
  - Nucleated
  - Dispersed
- Nearest neighbour analysis helps to determine the pattern
- It allows one region to be compared with another
2. Procedure for Calculation
The procedure is as follows:
2.1 Locate on a map the points or settlements to be analyzed. For example, Figure-3 illustrates the distribution of 27 villages located in a part of Udaipur district of Rajasthan. The point data are collected from Toposheet No————- at R.F. 1:50,000, extending from 24° to 24°5′ N latitude and from 73°35′ to 73°40′ E longitude, with a total area of about 79.55 km².
2.2 Connect each village with its nearest neighbour village and measure the crow-fly (straight-line) distance between them, in order to calculate the average NND for the area under consideration (Table-1).
Figure-3: A part of Udaipur district of Rajasthan
Table 1: Location of the Villages of a part of Udaipur district and their distances with the nearest neighbour villages
2.3 Measure the distance between each village or town and its nearest neighbour. Frequently, only settlements which are close to each other are considered as nearest neighbours, and population may be used as an alternative to functional content in establishing which settlements qualify. If the nearest neighbour of any settlement in the study area lies outside it, such settlements can be included, provided that the necessary information is available for them. In the present case, villages outside the study area are ignored.
2.4 Calculate the mean of the distances recorded in the previous step to give the observed mean distance between villages and their nearest neighbours ($\bar{r}_o$). In the present case $\bar{r}_o$ = 1.15 km per point, since the nearest neighbour distances of the 27 villages total 31.10 km (31.10/27).
2.5 Calculate the density of points in the area (p):
$$p = \frac{n}{A}$$
where n is the number of points and A is the total area.
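2.6 Calculate the expected mean distance between the villages and their nearest neighbours in a random distribution ($\bar{r}_e$). It is shown that:
$$\bar{r}_e = \frac{1}{2\sqrt{p}}$$
where p is the density of points (number of points divided by total area).
2.7 Calculate the nearest neighbour index as the ratio of the observed mean distance ($\bar{r}_o$) to the expected mean distance ($\bar{r}_e$):
$$R_n = \frac{\bar{r}_o}{\bar{r}_e}$$
In the present case $\bar{r}_e$ = 0.86 km, so Rn = 1.15/0.86 = 1.337.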
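The arithmetic of steps 2.4 to 2.6 for the Udaipur example can be checked with the short sketch below. It assumes the standard Clark and Evans expression for the expected distance, 1/(2√p), and takes Rn as the ratio of observed to expected mean distance; the variable names are illustrative.

```python
import math

# Figures quoted in the text for the Udaipur example.
n_villages = 27
area_km2 = 79.55
total_nn_distance_km = 31.10

observed_mean = total_nn_distance_km / n_villages   # observed mean distance, km
density = n_villages / area_km2                      # points per km²
expected_mean = 1 / (2 * math.sqrt(density))         # assumed Clark-Evans expectation, km
rn = observed_mean / expected_mean                   # nearest neighbour index

print(f"{observed_mean:.2f} {density:.2f} {expected_mean:.2f} {rn:.2f}")
# 1.15 0.34 0.86 1.34
```

Carrying unrounded intermediate values gives an index of about 1.34; the value of 1.337 quoted in the text is what results from dividing the rounded distances, 1.15 by 0.86.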
Any calculated value of Rn will fall between 0 and 2.15. The smaller the value, the more clustered the pattern; the higher the value, the more regular the pattern. An Rn value of 0 indicates complete clustering, that is, maximum aggregation of all the points at one location; 1 indicates a random distribution; and 2.15 indicates a perfectly regular pattern. In a village study, Rn = 0 would indicate a compact distribution of buildings, while Rn = 2.15 would indicate a completely dispersed one. In the above example, a value of 1.337 indicates a near-random situation. But the term random describes only the appearance of the pattern and not the factors which produced it. Nearest neighbour analysis is useful for simple objective comparison: the distribution of villages in Udaipur district of Rajasthan may be directly compared with similar patterns in other parts of the country. In practice, Rn values are unlikely to approach either end of the scale very closely, and it is as well to avoid labelling a distribution as simply uniform or clustered.
The value of Rn may fall between 0 and 1 or between 1 and 2.15, indicating a tendency towards clustering or towards uniformity respectively, provided the observed mean distance ($\bar{r}_o$) is significantly different from the expected mean distance ($\bar{r}_e$). Otherwise, the distribution should be considered random, as the difference between observed and expected is attributable to chance alone. This question can be answered by inferential statistical methods. If the set of observations is a sample, or is treated as such, the probability that the pattern could have arisen by chance can be established by a statistical test.
3. Z-statistics
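The deviation between the observed and expected nearest neighbour mean distances is tested using a z-statistic, derived such that:
$$z = \frac{\bar{r}_o - \bar{r}_e}{SE_{\bar{r}_e}}, \qquad SE_{\bar{r}_e} = \frac{0.26136}{\sqrt{n\,p}}$$
where $\bar{r}_o$ and $\bar{r}_e$ are the observed and expected mean nearest neighbour distances, n is the number of points and p is the density of points.
The greater the difference between the observed and expected mean distances, the larger the absolute value of z and the greater the probability that the observed pattern is non-random, and vice versa. However, this test requires a large number of points, not fewer than 100, to test the probability of randomness in a point distribution.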
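A minimal sketch of the test statistic follows. It assumes the standard Clark and Evans standard error, 0.26136/√(n·p), and uses hypothetical figures (100 points in 400 km² with an observed mean distance of 1.05 km) rather than data from either study area; the function name is illustrative.

```python
import math


def nn_z_score(observed_mean, n, area):
    """z = (observed - expected) / standard error of the expected mean distance."""
    density = n / area
    expected_mean = 1 / (2 * math.sqrt(density))
    standard_error = 0.26136 / math.sqrt(n * density)  # assumed Clark-Evans constant
    return (observed_mean - expected_mean) / standard_error


# Hypothetical pattern: expected distance is 1.0 km, standard error 0.052272 km.
z = nn_z_score(observed_mean=1.05, n=100, area=400.0)
print(round(z, 3))  # 0.957
```

A positive z means the points are spaced more widely than expected under randomness, and a negative z means they are closer together; in this hypothetical case the value is well below 1.96, so the pattern would not be distinguished from random at the 5% level.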
4. An Example of Nearest Neighbour Analysis to Show the Distribution of Service Centres
Another real-world example may now give a clearer picture of NNA, applied to the problems involved in the distribution of service centres in Tezpur Sub-division of Sonitpur district of Assam. There are 23 service centres in the central part of the sub-division, and they are influenced by the emerging road network (Figure-4). NNA is used to determine the emerging distributional pattern of these service centres as a basis for the spatial planning of infrastructure development.
Figure-4: Distribution pattern of service centres in Tezpur sub-division
Following the procedure given above for calculating the Rn value for the locations of the service centres, the randomness index for the study area is found to be 1.0786 (Table-2). This is very close to 1.0 but tends slightly towards uniform, so it is to be tested using the Z-test procedure.
Table 2: Determination of Nearest Neighbour Index of Service Centers in Tezpur Sub-division
4.1 Test of Significance
Comparing the Z value with the value from the probability table, it is found that the departure of the distribution of service centres from randomness is not significant at either the 5% or the 1% level of significance. Hence the value of Rn indicates a random distribution.
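Instead of a printed probability table, the comparison can be sketched with the normal distribution directly. The example below assumes a two-tailed test and uses hypothetical z values, since the z value for Tezpur is not reproduced in the text; the function names are illustrative.

```python
import math


def two_tailed_p_value(z):
    """Probability of a standard normal value at least as extreme as z in either tail."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(z, alpha):
    """True when the departure from randomness is significant at level alpha."""
    return two_tailed_p_value(z) < alpha


# Hypothetical z values for illustration.
for z in (0.957, 3.0):
    print(z, is_significant(z, 0.05), is_significant(z, 0.01))
# 0.957 False False
# 3.0 True True
```

The familiar critical values of about 1.96 (5%) and 2.58 (1%) correspond to the points where this two-tailed probability equals the chosen level.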
4.2 Variance Index
The variance index is a point density test. It is used to assess the point density, an important parameter of NNA. It is calculated with the formula,
Thus, from the above analysis, it can be observed that the value of V (the variance index) is higher than the value of $\bar{r}_e$ (the expected mean distance). Therefore, the distribution of service centres in Tezpur sub-division is interpreted as clustered. However, the distribution pattern was determined to be random in the earlier case with the Z-statistic. It shows a random pattern mainly because the northern part of the sub-division is covered by forest. It can finally be concluded that the Z-statistic, which tests the NNI, is the more relevant in the present case, while the variance index tests the validity of the point density, which was used as a parameter in the earlier case.
Nearest Neighbours: Pros and Cons
Pros:
- Simple to implement
- Flexible to feature / distance choices
- Naturally handles multi-class cases
- Can do well in practice with enough representative data
Cons: the nearest neighbour technique is, however, not free from drawbacks
- Large search problem to find nearest neighbours
- Storage of data
- Must know we have a meaningful distance function
In addition, the calculation of the index is a time-consuming process when the number of points in the pattern is large. In such a case, chi-squared (χ²) analysis serves better.