9 Accuracy Assessment
Swati Katiyar
Learning Outcomes
- Students will understand why accuracy assessment is required for satellite-derived data.
- Students will acquire the skills to analyse the errors that can occur at the time of registration and classification of data, and the different methods used to remove them.
- Students will be equipped with the knowledge to study accuracy assessment methods, the kappa coefficient, etc., in greater depth.
Outline:
- Introduction
- Accuracy and Precision
- Sources of Classification Error
- Measurement of Map Accuracy
- Site-Specific and Non-Site-Specific Accuracy
- Error Matrix
- Indices of Accuracy
- Kappa Coefficient
Introduction
No classification created from remote sensing data can be completely accurate, because errors originate from many sources, including the classification algorithm itself. In order to use the products of classification effectively, the user needs to know how accurate they are; this necessitates accuracy assessment of the remote sensing classification process. Accuracy assessment is therefore a mandatory and important step in the classification process: it provides information about the quality of the classification product and furnishes the norms for comparing the performance of different classification methods. Its goal is to determine quantitatively how effectively pixels were grouped into the correct feature classes in the area under investigation.

Accuracy is defined as “correctness”, i.e. it measures the agreement between a standard assumed to be correct and a classified image of unknown quality. An accuracy assessment therefore compares two sources of information: 1) pixels or polygons from a classification map developed from remotely sensed data, and 2) ground reference test information. If the image classification corresponds closely with the standard (the ground reality), it is said to be “accurate”. Accuracy is usually judged against existing maps, large-scale aerial photographs, or field checks. In a statistical context, high accuracy means that bias is low, i.e. that estimated values are consistently close to an accepted reference value. As the United States Geological Survey (USGS) observed in 1990, accuracy assessment or validation is an important step in the processing of remote sensing data: it determines the information value of the resulting data to a user, and productive utilization of geo-data is only possible if the quality of the data is known.
Ground Verification:
Evaluation of classification results is an important part of the classification procedure. Traditionally, accuracy is determined empirically by comparing the classification with corresponding reference or ground data, and the results are tabulated in the form of a square matrix popularly known as the classification error matrix, confusion matrix, or contingency table. Ground truth is factual data that has been observed or measured directly and can be analysed objectively; it is not second-hand. If the data is based on an assumption, is subject to opinion, or is up for discussion, then, by definition, it is not ground truth. If the analysis rests on subjective data and assumptions, its results will very likely be off base and therefore of little value. The ideal situation is represented by a diagonal matrix in which only the principal diagonal elements have non-zero values, i.e. all areas of the image have been correctly classified.
Precision: Precision is a measure of the sharpness or certainty of a measurement, i.e. of how closely repeated measurements agree with each other. The closer each measurement is to the others, the more precise the result, as illustrated in Fig.1.
In remote sensing, precision has two meanings:
a) Categorical specificity of a thematic map and
b) The confidence interval within which estimates of map accuracy or area are likely to be contained.
High precision means the variability of estimates is low.
Fig.1 Accuracy and Precision
Source: https://en.wikipedia.org/wiki/Accuracy_and_precision
Source of Classification Error
In manual interpretation, errors are caused by:
- Misidentification of parcels
- Excessive generalization
- Errors in registration
- Variations in detail of interpretation, and other factors
A very simple landscape composed of large, uniform, distinct categories is likely to be easier to classify accurately than one with small, heterogeneous, indistinct parcels arranged in a complex pattern. Figure 2 shows the different sources of error that occur in classification.
Fig.2: Different sources of error
Source: https://www.researchgate.net/figure/23738314_fig2_Figure-2-Error-sources-and-accumulation-of-error-in-a-typical-remote-sensing-information
Measurement of Map Accuracy: This is the task of comparing two sources of information: one based on analysis of remotely sensed data (the map), and another based on a different source of information that is assumed to be accurate (the reference data). When the information being compared varies over time, the reference data should match the map in date. To assess the accuracy of a map, the map and reference data must be co-registered, must use the same classification system and minimum mapping unit, and must have been classified at comparable levels of detail. For remotely sensed data to be truly useful and effective, an appropriate technique of accuracy assessment needs to be applied. Accuracy assessment can be defined as a comparison of a map produced from remotely sensed data with another map from some other source (Figure 3); a determination is made of how closely the new map matches the source map. Evaluation of the accuracy of a classification of remotely sensed data falls into one of two general categories: non-site-specific assessment or site-specific assessment (Campbell 1987). Of the several approaches to accuracy assessment, the following sections focus on site-specific error analysis of pixel misclassification.
Fig.3 Block Diagram of Accuracy Assessment
Source: http://www.mdpi.com/2072-4292/8/12/980
Non-Site-Specific Accuracy: Non-site-specific assessment is a simplistic approach to assessing the accuracy of the classification of remotely sensed data (Campbell 1987, p 340). In this method, a comparison is made between the “known” or estimated area of each category and the area derived through the discrete classification of the remotely sensed data. For example, suppose the “known” map areas are estimated to be 20 percent grassland, 60 percent forest, and 20 percent water, while the areas derived from the discrete classification of the remotely sensed data are 18 percent grassland, 61 percent forest, and 21 percent water. Assuming that the area estimates of the three categories are correct, a non-site-specific error analysis would not indicate any significant problem with the classification. Conversely, if there were a substantial difference between the total estimated areas and the total classified areas, it would be clear that the classification had not performed well.

The limitations of non-site-specific error assessment thus quickly reveal themselves. It identifies general problems with the resulting classification but provides no information about the locational accuracy of the classification (pixel misclassification), i.e. about how well each pixel was classified. Even where there is close agreement between the estimated areas and the areas derived from the classified map, the classification may still be inaccurate in terms of locational or site-specific errors: the proportions of the categories may be similar in each map while the physical location of each category in the resultant map (Figures 4(a) and 4(b)) does not match the reference map (G = grass, F = forest, W = water). Because the process considers only the aggregate areas of classes rather than the placement of classes on the map, only inventory error is detected. Non-site-specific accuracy assessment therefore has limited utility: it is useful only for detecting gross problems with discrete classifications, and results derived from it may be misleading. Locational accuracy matters whenever the objective is to derive a spatial representation of land cover characteristics from the classification, and site-specific error analysis, described next, is the more rigorous technique (a short code sketch following Figure 4(b) below illustrates the purely areal nature of the non-site-specific comparison).
Fig.4(a) Reference map G=grass, F=forest, W=water
Source: USACERL Technical Report EN-95/04 Research Laboratories April 1995
Fig.4(b) Classified map
Source: USACERL Technical Report EN-95/04 Research Laboratories April 1995
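As a purely areal check, a non-site-specific comparison can be expressed in a few lines of Python. The minimal sketch below uses the hypothetical percentages from the example above; it deliberately shows that matching aggregate proportions say nothing about where the classes fall on the map.

```python
# Minimal sketch of a non-site-specific comparison (hypothetical numbers):
# only aggregate areal proportions are checked, not class placement.
reference = {"grass": 20.0, "forest": 60.0, "water": 20.0}   # "known" percent areas
classified = {"grass": 18.0, "forest": 61.0, "water": 21.0}  # percent areas from the map

for cls in reference:
    diff = classified[cls] - reference[cls]
    print(f"{cls}: reference {reference[cls]}%, classified {classified[cls]}%, "
          f"difference {diff:+.1f}%")
# Small differences here reveal nothing about locational (site-specific) accuracy.
```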
Site-Specific Accuracy: Site-specific assessment is based on a detailed evaluation of the agreement between the map and the reference data at specific locations; the goal is to estimate the accuracy of the image classification using a sample of reference data. In most cases the pixel is the unit of comparison, and the errors revealed are known as classification errors, i.e. the misidentification of pixels. There may also be boundary errors.
Site-specific error analysis takes into account the locational accuracy of the classification. The process makes a pixel-by-pixel comparison between the thematic map derived from remotely sensed data and a “true” map of the area with the same theme. This accuracy assessment approach is still prone to errors attributable to control point location error, boundary line error, and pixel misclassification. Usually, the purpose of classification is to derive a thematic map of some unknown characteristic of the Earth’s surface, or of some characteristic that has changed over time, so it would be unusual for a complete and current reference map to exist. However, the reference map can be represented by a sample of locations within each theme for the area of interest. The selection of sample locations and the sample size are determined by the requirements of the subsequent analysis, which in most cases will include inter-class analysis as well as overall accuracy analysis.
Data requirements, sampling approach, and sample size: The data requirements for performing a classification include remotely sensed data, ground-truthed training data for characterizing the spectral parameters of each class (e.g., “plant community type”), and an independent set of ground-truth data (reference data) for accuracy assessment. Since it is impractical to have a complete pixel-by-pixel “ground truth” map, an adequate subset or sample of points (pixels) is needed for a rigorous accuracy assessment of a classification, and an appropriate sampling technique that meets statistical requirements must be used. Site-specific accuracy can be evaluated for an overall classification or on a per-category basis. The more rigorous and useful approach is to evaluate accuracy on a per-category basis, which provides more insight into classification errors that may be unique to specific categories; category-specific errors are not as readily apparent in an overall assessment. A stratified random method is an appropriate sampling method for accuracy assessment on a per-category basis (Van Genderen and Lock 1977). The Kappa Coefficient of Agreement, a statistical measure of the difference between the observed agreement of two classifications and the agreement due to random chance, is commonly used in both types of assessment and requires a multinomial sampling method; a stratified random sample is a multinomial sampling method and is therefore appropriate for use with the Kappa statistic. With the stratified random approach, points are stratified by map category and simple random sampling is employed within each stratum (Stehman 1992). Once the sampling design has been determined, the number of sample points must be decided. The number of reference pixels required for accuracy assessment depends on the minimum acceptable level of accuracy; equations suitable for determining the minimum number of pixels required for different levels of accuracy are given by Jensen (1986). One approach to determining the total number of reference pixels (observations) needed to assess accuracy at a minimum level uses Equation 1 given below:
N = 4(p)(q′)/E² ……………………………………………….. Eq (1)

Where,
N = total number of points to be sampled
p = expected percent accuracy
q′ = 100 − p
E = allowable error (in percent)
(The coefficient 4 corresponds to Z² with Z = 2, i.e. approximately the 95 percent confidence level.)
The equation above computes the ideal number of pixels to sample as reference points for an overall accuracy assessment of a classification; as the allowable error increases, the number of required sample points decreases. Assuming a stratified random sampling approach, the total number of reference pixels or sample points required at a given expected accuracy and allowable error must be further stratified by thematic category. Van Genderen and Lock (1977) state that a minimum sample size of 20 per class is required to verify an 85 percent classification accuracy, while 30 observations (reference pixels) per class are required for 90 percent accuracy (at the 0.05 confidence level).
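As a quick illustration of Equation 1, the short Python sketch below computes the required sample size for a few hypothetical combinations of expected accuracy and allowable error (the function name and the Z = 2 default are illustrative assumptions, not part of the original text).

```python
def required_sample_size(p, e, z=2.0):
    """Minimum number of reference pixels per Eq (1): N = Z^2 * p * q' / E^2.

    p: expected percent accuracy (e.g. 85)
    e: allowable error in percent (e.g. 5)
    z: standard normal deviate; Z = 2 approximates the 95% confidence level.
    """
    q = 100.0 - p
    return (z ** 2) * p * q / (e ** 2)

# Hypothetical examples: a larger allowable error needs fewer samples.
for p, e in [(85, 5), (85, 10), (90, 5)]:
    print(f"p={p}%, E={e}%: N = {required_sample_size(p, e):.0f}")
```

Running this gives N = 204 for p = 85 percent with E = 5 percent, but only N = 51 when the allowable error is relaxed to 10 percent, confirming the trade-off described above.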
Locating random points: The simplest way to generate a random point is to pick two random numbers, one for the horizontal coordinate and the other for the vertical coordinate. In the UTM coordinate system, one random number would be chosen for the easting and another for the northing. This is simple to do in a GIS; in GRASS, the program random can be used to identify random pixels in a raster map (Westervelt et al. 1987).
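A minimal sketch of this idea in Python is given below. The bounding-box coordinates are hypothetical, and a real stratified random workflow would draw the points within each map category rather than over the whole study area.

```python
import random

# Hypothetical UTM bounding box of the study area (metres).
E_MIN, E_MAX = 500000, 520000    # easting range
N_MIN, N_MAX = 4100000, 4120000  # northing range

random.seed(42)  # reproducible sample locations

# One random number for the easting, another for the northing, per point.
points = [(random.uniform(E_MIN, E_MAX), random.uniform(N_MIN, N_MAX))
          for _ in range(30)]

for easting, northing in points[:5]:
    print(f"E = {easting:.1f}, N = {northing:.1f}")
```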
Error Matrix: An error matrix is the standard means of reporting site-specific error and is useful when evaluating the effectiveness of a discrete classification of remotely sensed data (Campbell 1987). It is derived from a comparison of reference map pixels with classified map pixels and is organized as a two-dimensional matrix, with the columns representing the reference data by category and the rows representing the classification by category. An error matrix is also referred to as a confusion matrix or contingency table, and in many cases the arrangement is transposed, with the classification categories in the columns and the reference data along the rows (Janssen and van der Wel 1994); for consistency and ease of explanation, this document assumes an error matrix arranged according to the original definition, as shown in Table 1.

The error matrix identifies not only the overall errors for each category but also the misclassifications (due to confusion between categories) by category. It consists of an n × n array, where n is the number of categories: the left-hand side (y axis) is labelled with the n categories on the map, and the upper edge (x axis) is labelled with the same n categories from the reference data. The values in the matrix are the numbers of pixels for which the analyst has been able to compare the evaluated and reference images. The diagonal elements are the numbers of samples for which the classification results agree with the reference data. The off-diagonal elements in each row are the samples that have been misclassified by the classifier, i.e. committed to a label when they actually belong to other labels; this misclassification is called commission error. The off-diagonal elements in each column are the samples omitted by the classifier from their true class; this misclassification is called omission error.
Table.1
Source: http://www.emodnet-seabedhabitats.eu/default.aspx?page=1771
In the example of Table 1, a total of 58 samples were classified as row crops; 46 of these were correctly classified, while 5 were actually early succession, 7 pasture or hay, 0 coniferous forest, 0 deciduous forest, and 0 open water. Reading across a row therefore shows what the pixels mapped as a given category actually were when field-verified. The diagonal from upper left to lower right gives the number of correctly classified pixels for each class; row totals give the number of pixels assigned to each class on the thematic map, and column totals give the number of pixels in each class as recorded in the reference data. Misclassification is described by errors of commission and errors of omission. An error of commission (wrongful inclusion) occurs when a sample location is wrongly included in a particular category through misclassification; the errors of commission (EOC) are the off-diagonal values across a row. An error of omission (wrongful exclusion) occurs when a sample location is omitted from the category to which it actually belongs; the errors of omission (EOO) are the off-diagonal values down a column.
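The short sketch below illustrates these ideas on a small hypothetical 3 × 3 error matrix (the class names and counts are invented for illustration), with rows as the classified categories and columns as the reference categories, following this document's convention.

```python
import numpy as np

classes = ["grass", "forest", "water"]          # hypothetical categories
# Rows = classified map, columns = reference data (document convention).
matrix = np.array([[46,  5,  7],
                   [ 4, 50,  2],
                   [ 3,  1, 44]])

diagonal = np.diag(matrix)
row_totals = matrix.sum(axis=1)     # pixels assigned to each class on the map
col_totals = matrix.sum(axis=0)     # pixels in each class in the reference data

commission = row_totals - diagonal  # off-diagonal values across each row (EOC)
omission = col_totals - diagonal    # off-diagonal values down each column (EOO)

for i, name in enumerate(classes):
    print(f"{name}: correct={diagonal[i]}, "
          f"commission={commission[i]}, omission={omission[i]}")
```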
Measures of agreement: From the error matrix, several measures of classification accuracy can be calculated, including percentage of pixels correctly classified, errors of omission, and errors of commission. In addition, statistical measures such as the Kappa Coefficient of Agreement, Kappa variance, and Kappa standard normal deviate can be calculated from the error matrix. The most commonly used measure of agreement is percentage of pixels correctly classified.
The percentage of pixels correctly classified is simply the number of correctly classified pixels in the validation set divided by the total number of reference pixels: it is calculated by dividing the sum of the diagonal entries of the error matrix by the total number of reference pixels, and it therefore provides an overall accuracy assessment of a classification. However, if a minimum classification accuracy is required, it is necessary to verify, with some level of confidence, that the calculated percent correct for the overall classification does indeed exceed the pre-determined minimum. To assure a minimum overall accuracy, a one-tailed lower confidence limit at a specified level of confidence must exceed the minimum accuracy standard (Jensen 1986). For example, the lower confidence limit for a one-tailed binomial distribution at a 95 percent confidence level can be calculated by Equation 2 given below.
P = p′ − 1.645 √(p′q′/n) − 50/n ……………………………………………….. Eq (2)

Where,
P = the lower limit of the 95 percent confidence interval
p′ = the percent correct for the category
q′ = 100 − p′
n = number of observations in the category
(Here 1.645 is the one-tailed standard normal deviate at the 95 percent confidence level, and 50/n is a correction for continuity.)
As with the overall percentage correct, if the lower confidence limit of the percentage correct for an individual category exceeds the minimum required accuracy for that category, then the classification of that category meets or exceeds the minimum accuracy at the chosen level of confidence. In addition to providing the information needed to calculate percentage correct for an overall classification or for individual categories, with their respective confidence limits, an error matrix contains other information useful in assessing the accuracy of a classification. The diagonal that extends from the upper left corner to the lower right corner of the matrix is referred to as “the diagonal”; each diagonal entry represents the number of correctly classified pixels for that specific category. The non-diagonal values in each column represent errors of omission, and the non-diagonal values in each row represent errors of commission.
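A minimal Python sketch of Equation 2 follows; the function name is an assumption for illustration, and the 50/n term is the continuity correction carried over from the equation above.

```python
import math

def lower_confidence_limit(p_correct, n, z=1.645):
    """One-tailed lower confidence limit for percent correct (Eq 2).

    p_correct: percent correctly classified (e.g. 90.0)
    n: number of observations in the category
    z: one-tailed standard normal deviate (1.645 -> 95% confidence)
    """
    q = 100.0 - p_correct
    return p_correct - z * math.sqrt(p_correct * q / n) - 50.0 / n

# Hypothetical check against an 85% minimum accuracy requirement.
limit = lower_confidence_limit(p_correct=90.0, n=204)
print(f"Lower 95% confidence limit: {limit:.1f}%  (meets 85%? {limit >= 85.0})")
```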
There are three indices used to evaluate the accuracy of attribute data.

Overall Accuracy (Percentage Correctly Classified):

Overall accuracy = (Sd / n) × 100

Where,
Sd = sum of the values along the diagonal
n = total number of sample locations
Producer’s Accuracy: Producer’s accuracy is calculated in a similar fashion to user’s accuracy, the only difference being that the total number of correctly classified pixels for a category is divided by the total number of pixels in that category in the reference data (i.e., the column marginal or column total) instead of by the total number of pixels in that category in the classification map (i.e., the row marginal or row total). It is the probability of a sample spatial data unit being correctly classified, and it is a measure of the error of omission (EOO) for the category to which the sample belongs.
Producer’s accuracy = (Ci / Ct) × 100

Where,
Ci = correctly classified sample locations in a column
Ct = total number of sample locations in that column
User’s Accuracy: User’s accuracy, or reliability, is the equivalent of percentage correct for an individual category, calculated as described earlier. It is the probability that a spatial data unit classified as a particular category on the map or image actually represents that category on the ground, and it is a measure of the error of commission (EOC).
User’s accuracy = (Ri / Rt) × 100

Where,
Ri = correctly classified sample locations in a row
Rt = total number of sample locations in that row

EOC = 100 − user’s accuracy
Correspondingly, EOO = 100 − producer’s accuracy. Expressed as proportions rather than percentages, the omission error is 1 − producer’s accuracy and the commission error is 1 − user’s accuracy.
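To make the three indices concrete, the following Python fragment (reusing the hypothetical 3 × 3 matrix from the error-matrix sketch above) computes overall, producer’s, and user’s accuracy; the variable names are illustrative assumptions.

```python
import numpy as np

# Rows = classified map, columns = reference data (hypothetical counts).
matrix = np.array([[46,  5,  7],
                   [ 4, 50,  2],
                   [ 3,  1, 44]])

diagonal = np.diag(matrix)
n = matrix.sum()

overall_accuracy = diagonal.sum() / n * 100               # (Sd / n) * 100
producers_accuracy = diagonal / matrix.sum(axis=0) * 100  # Ci / Ct per column
users_accuracy = diagonal / matrix.sum(axis=1) * 100      # Ri / Rt per row

print(f"Overall accuracy: {overall_accuracy:.1f}%")
print("Producer's accuracy per class:", np.round(producers_accuracy, 1))
print("User's accuracy per class:", np.round(users_accuracy, 1))
```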
Kappa Coefficient: κ (kappa) is a measure of the difference between the observed agreement between two maps, as reported by the diagonal entries of the error matrix, and the agreement that would be expected purely by chance. One of the advantages of this method is that it allows two classification products to be compared statistically: for example, two classification maps made with different algorithms can be verified against the same reference data and their kappa values compared.
K = (Po − Pc) / (1 − Pc)

Where,
Po = Σᵢ Pii = (1/N) Σᵢ nii (the observed proportion of agreement: the diagonal sum divided by the total number of samples N)
Pc = Σᵢ Pi+ P+i = (1/N²) Σᵢ ni+ n+i (the agreement expected by chance, computed from the row marginals ni+ and column marginals n+i; the sums run over the m categories)

For the data of Table 2, Σᵢ ni+ n+i = A + B + C + D + E + F = 269, giving
K = (0.5714 − 0.2196) / (1 − 0.2196) = 0.451
Table.2
Source: self
A = 5 × 3 = 15
B = 9 × 10 = 90
C = 13 × 13 = 169
D = 7 × 16 = 112
E = 14 × 16 = 224
F = 17 × 9 = 153
Pc = 269 / (35 × 35) = 0.2196
Po = (1 + 4 + 2 + 6 + 4 + 8 + 5) / 65 = 30/65 = 0.4615
Table 2 shows the calculation of the kappa coefficient. The kappa coefficient is a discrete multivariate measure that differs from the usual measures of overall accuracy in basically two ways. First, the calculation takes into account all of the elements of the error matrix, not just the diagonal (Foody 1992); this has the effect of taking chance agreement into account. The resulting kappa measure compensates for chance agreement in the classification and indicates how much better the classification performed in comparison with the probability of randomly assigning pixels to their correct categories. Second, the estimated variance of the kappa coefficient can also be calculated. This is most useful for comparing two different approaches to the same classification scheme, since it allows a standard normal deviate, or Z score, to be calculated; the Z score is used to determine whether the differences in accuracy between two classifications with the same classification scheme are significant. The kappa test statistic tests the null hypothesis that two independent classifiers do not agree on the rating or classification of the same physical objects.
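A minimal Python sketch of the kappa computation, reusing the hypothetical matrix from the earlier examples, is given below.

```python
import numpy as np

# Rows = classified map, columns = reference data (hypothetical counts).
matrix = np.array([[46,  5,  7],
                   [ 4, 50,  2],
                   [ 3,  1, 44]])

n = matrix.sum()
p_o = np.trace(matrix) / n                                    # observed agreement Po
p_c = (matrix.sum(axis=1) * matrix.sum(axis=0)).sum() / n**2  # chance agreement Pc

kappa = (p_o - p_c) / (1 - p_c)
print(f"Po = {p_o:.4f}, Pc = {p_c:.4f}, kappa = {kappa:.4f}")
```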
Uses of Kappa:
- Compare two error matrices.
- Weight cells in the error matrix according to the severity of misclassification.
- Provide error bounds on accuracy.