16 Digital Image Processing

Dr. Sandeep Gupta

 

1. Aim of the Module

2. Introduction

3. Digital Image

4. Image Acquisition

5. Image Pre-processing

6. Image Enhancement

7. Image Transformation

8. Image Classification

9. Applications of Digital Image Processing

10. Conclusions

11. References

 

 

1. Aim of the Module

 

This chapter introduces the digital image and gives a short history of developments in the field of digital image processing. Digital values play an important role not only in remote sensing but also in pattern recognition, medical imaging, GIS integration and related fields. After going through this chapter, readers will be able to understand the digital image, its structure and formats, and the various pre-processing and processing steps and techniques that help in extracting information directly or indirectly. These techniques cover image acquisition, image corrections (radiometric and geometric), image enhancement, image transformation, image classification and accuracy assessment. Finally, readers will also learn about applications of digital image processing techniques.

 

2. Introduction

 

“A single portrait is enough to carve the whole story”

 

As learned in the earlier chapter, a digital image is like a grey-coloured two-dimensional matrix made up of a finite set of square cells arranged in rows and columns. In the digital world these cells, called 'picture elements' or pixels, carry an assigned numerical value called the digital number (DN), which in remote sensing relates to the brightness of the cell. A row of pixels is called a scan line.

 

 

Digital image processing has evolved since the inception of digital computers, and the two have developed in parallel. Digital image processing methods were introduced in the early 1920s. In 1921, the Bartlane cable picture transmission system was used to transmit digitized newspaper images over submarine cable lines between London and New York. The images were coded and sent by telegraph at the transmitter end and decoded into images at the receiver using telegraph printers. The images were initially coded with 5 grey levels, but this number was increased to 15 in 1929, thereby enhancing the quality of the reproduced images (BCPTS, 2016).

 

Digital computers powerful enough to perform meaningful image processing tasks appeared in the early 1960s. The use of such computers and algorithms for improving the quality of images of the Moon taken by the Ranger 7 probe started at the Jet Propulsion Laboratory (JPL), NASA, USA in 1964. At JPL, the image processing tasks involved correcting various types of image distortion inherent in the on-board television camera (Gonzalez and Woods, 2002). Around 1970, photographs captured using photographic imaging techniques were transferred to computers for automated computer analysis using raster digitization methods (Konecny, 2003). The 1970s saw a surge in space missions and computing technology. The growing need felt by scientists working in medicine and planetary sciences for remote observation of objects and events led to a multi-dimensional and multifold development of digital image processing techniques. The invention of computerized tomography (CT) for medical diagnosis and the release of satellite images of the Earth and of extra-planetary features by the Landsat missions in the early 1970s were landmark events that shaped the later development of digital image processing techniques. The multispectral imaging characteristics of the Landsat sensors obliged developers to evolve image processing techniques for visual interpretation and digital analysis of the information contained in multispectral satellite images in a meaningful way.

 

It would not be inappropriate to say that this is the digital century: we are surrounded by digital products and technologies, with multimedia information processing at the core. The advancement of digital image processing is therefore also coupled with these products and technologies.

 

Digital image processing is a very vast field. Here, we will try to explain the basic concepts of the different processing techniques applied to digital images for different applications, without dwelling too much on the mathematical description of imaging and processing.

 

3. Digital Image

Any image processing task includes a few essential components. For example, the sensing step requires

 

i) a physical device that is sensitive to the reflected/emitted energy absorbed from the source object, and

ii) a digitizer that converts the energy absorbed by the physical sensor, available for example as an electrical output (voltage), into a digital form (bits or bytes). These components are shown in figure 2.

Figure 2. Essential components of a typical image processing system

 

The field of remote sensing that we experience today evolved over a long period from aerial photography, where images used to be recorded as photographs or photograph-like images. A photographic image is an analog picture in the form of a physical record: pieces of paper or film with chemical coatings that hold a record of the pattern of the image. In analog images, the brightness within the image is analogous, or proportional, to the brightness within the scene. The problem with analog pictures is that the user faces difficulties in storage, transmission, searching and analysis. In contrast, digital images are arrays of pixels, each with a discrete digital number, and they are also discernible when analysed visually. The advantages of digital images are that they can easily be handled, stored, transmitted, retrieved, converted from one format to another and statistically manipulated, which is not possible with analog images. Although our ability to display, examine and analyse modern remotely sensed digital image data has increased, these data are also subject to corruption, damage to disk drives, magnetic fields and deterioration of the physical media. Sometimes the user also faces problems in accessing old data because the hardware necessary to read the digital data has become obsolete.

 

 

In remote sensing, a digital image acquired by a sensor is represented as a matrix of pixels arranged in rows and columns. Each pixel has a unique location, and its value is indexed according to the radiometric resolution; for example, the pixels of an 8-bit image take values between 0 and 255. A subset of a multispectral Landsat 8 image is shown below and is also used for different purposes in later sections.

 

Figure 3. Pixels (8×10) of a part of the Landsat 8 satellite image (Image.tif)

The associated and/or additional information, such as the metadata of the image, is stored either in a separate file or placed in the data file itself as a header (table 1). The image file itself contains only pixel values (table 2).

 

Table 1. Metadata Information of the Satellite Image as presented in Image.tif

 

Table 2. Pixel data (spatial resolution 30 m) of the part of the satellite image (Image.tif) as it appears in 3 bands (Band 1: Near Infra-Red, Band 2: Red and Band 3: Green)

 

(X: X-coordinate; Y: Y-coordinate; and DN: Digital Number or Pixel Value)
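As a minimal sketch of how such a file can be inspected programmatically (assuming the rasterio package is installed and that a GeoTIFF named Image.tif, as referenced above, is available), the metadata header and the DN matrix of a band can be read as follows:

```python
import rasterio  # assumed to be installed; any GeoTIFF reader would serve the same purpose

# Open the multispectral image referenced in the text (hypothetical path)
with rasterio.open("Image.tif") as src:
    print(src.meta)                      # metadata header: driver, dtype, size, CRS, transform
    band1 = src.read(1)                  # band 1 as a 2-D array of digital numbers (DNs)
    print(band1.shape)                   # rows x columns
    print(band1.min(), band1.max())      # DN range, e.g. 0-255 for an 8-bit image
```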

 

Digital image processing is a big domain, and the tools and technology for data acquisition differ between applications. In the forthcoming sections we describe only the major processing stages of digital images that are commonly used in the field of remote sensing.

 

4. Image Acquisition

 

The energy reflected from the Earth's surface is measured by imaging sensors that are able to digitize the collected signal, as in video and digital cameras. The sensors are mounted on aircraft or spacecraft platforms. In earlier days, conventional cameras and analog-to-digital converters were used to acquire an image. A digital image can also be produced from paper documents using either a CCD camera or a scanner. It is important to mention here that in remote sensing an imaging system is a complex chain: energy is reflected or scattered from the Earth's surface and transmitted through the atmosphere to the sensor, and the data are then transmitted from the sensor to a ground station, where, after initial pre-processing such as volt-to-DN conversion, noise removal and resampling, the data are ready in an image format to be used for different purposes such as land use/land cover study, disaster monitoring and environmental pollution study. The data are recorded in the optical as well as the microwave spectral regions through both active and passive sensing, and are hence delivered in distinct spectral bands (visible, near infra-red, short-wave infra-red, microwave) at certain spatial and radiometric resolutions. Generally, digital images are of four kinds: a) binary, b) grayscale [0 to 255; black to white], c) true colour or RGB [0 to 255 per channel] (such as 24-bit colour images), and d) indexed. Further, data are acquired and delivered in raster and vector formats with additional information in the form of metadata.

 

The digital images obtained from multispectral sensors are arranged in different spectral bands and are represented in a matrix form (figure 4).

 

Figure 4. Structure of a Multispectral Image and feature elements

 

For storage purposes, the digital data are organized in the following three formats. An image consisting of, for example, three bands of the same resolution can be visualized as three superimposed images, with the pixels in one band registering exactly to those in the other bands. The data formats are:

 

Band Interleaved by Pixel (BIP)

 

This is one of the earliest formats for digital data. Here the data are organized in sequence: line 1, pixel 1, band 1; then line 1, pixel 1, band 2; then line 1, pixel 1, band 3. Next come the three bands for line 1, pixel 2, and so on. In this way the values of pixel 1 in all three bands are written, followed by the values for the next pixels (pixel 2, pixel 3 and so on). This arrangement is advantageous for many analyses in which the pixel value (DN) vector is queried or required to calculate another quantity. The disadvantage of this format is that it becomes cumbersome when displaying the image.

 

Band Interleaved by Line (BIL)

 

BIL treats each line of the data as a separate unit. In sequence, the data are arranged as line 1 for band 1, line 1 for band 2, line 1 for band 3, line 2 for band 1, line 2 for band 2, line 2 for band 3, and so on. In this way each line is represented in all three bands before the next line is encountered. A common variation of the BIL format is to group lines in sets of, for example, 3 or 7 rather than treating each single line as the unit.

 

Band Sequential (BSQ)

 

In BSQ, all the data for band 1 are written in sequence, followed by all the data for band 2, and so on; each band is treated as a separate unit. This is the most commonly applied image format, since it presents the data in a structure that closely matches the one used for display and analysis.
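The three interleaving schemes only change the order in which the same DN values are written to disk. The following sketch (assuming a small 3-band array held in memory with numpy) illustrates how one image is serialized under each scheme:

```python
import numpy as np

# A toy image: 3 bands x 2 lines x 3 pixels, with bands as the first axis
img = np.arange(3 * 2 * 3).reshape(3, 2, 3)

# BSQ: all of band 1, then all of band 2, then all of band 3
bsq = img.reshape(-1)

# BIL: for each line, band 1 of that line, band 2 of that line, band 3 of that line
bil = np.transpose(img, (1, 0, 2)).reshape(-1)   # (line, band, pixel)

# BIP: for each pixel, its three band values in sequence
bip = np.transpose(img, (1, 2, 0)).reshape(-1)   # (line, pixel, band)

print(bsq, bil, bip, sep="\n")
```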

 

5. Image Pre-processing

 

Image pre-processing consists of operations applied to the raw image in order to make it suitable for further processing. It mainly includes radiometric and geometric corrections.

 

5.1 Geometric Corrections

 

The geometric distortions in remotely sensed images are inherent and depend on the manner in which the images are acquired. The Earth is three-dimensional and curved, so transformations of the remotely sensed image are necessary to map the curved surface onto a two-dimensional plane. Geometric distortions arise for several reasons, which can be broadly classified into two categories: systematic (predictable) errors and nonsystematic (random) errors. The sources of systematic, or internal, errors are:

i) geometric distortion due to terrain effects (elevation differences);

ii) cross-scan geometric distortion due to skew of the ground swath at the time of scanning (by the time the swath is scanned, the ground track has shifted because of the movement of the space-/aircraft);

iii) along-scan geometric distortion due to a changing mirror scan rate during the scan;

iv) panoramic distortion: the ground area imaged is proportional to the tangent of the scan angle rather than to the angle itself, and since data are sampled at regular intervals this produces along-scan distortion, particularly where the instantaneous field of view (IFOV) is large, so that the ground area imaged at the extremities of the scan is laterally larger than the region sensed at nadir, compressing the image data towards its edges;

v) along-track scale distortion, caused when the platform speed changes and alters the ground track covered by successive mirror scans;

vi) distortion caused by the Earth's rotation beneath the platform during scanning, which shifts successive ground swaths and skews the image.

Platform instability during scanning results in nonsystematic, or external, errors. These are caused by changes in platform attitude (roll, yaw and pitch) during forward motion, which lead to image rotation and to along- and across-track image displacement (figure 5a-5c), and by changes in platform altitude, which change the scale at constant angular IFOV and field of view (figure 5d). The geometric distortions mentioned above can be rectified using the methods given below.

Figure 5. Effect of platform attitude errors

 

All remote sensing applications require planimetrically correct versions of remotely captured images so that they will match other imagery and maps and will provide the basis for accurate measurements of distance and area. Two techniques can be used to correct the various types of geometric distortion present in digital image data: one is orbital geometry modelling, and the other, used in most image processing, is a transformation based on ground control points (GCPs) (Sunar and Özkan, 2000). GCPs are spectrally distinct, identifiable features, as small as a few pixels, whose image coordinates can be located precisely; examples are road intersections, distinctive water bodies, edges of land-cover parcels, stream junctions and similar features. To relate the geometrically correct (map) coordinates and the distorted image coordinates, a coordinate transformation is performed by applying a least-squares regression to the set of GCPs and determining the coefficients of the transformation matrix for a linear or nonlinear transformation. Note that as the number of GCPs is increased, the registration error decreases. After calculating the transformation coefficients, resampling methods such as nearest neighbour, bilinear interpolation or cubic convolution are used to determine the pixel values to fill into the corrected output image from the original distorted image.

 

5.1.1 Georegistration and Georeferencing

 

Geometric registration, or georegistration, involves identifying several image coordinates (row and column), i.e. GCPs, in the distorted image and linking them to their true positions in a target map or ground coordinate system, such as latitude-longitude, using a transformation function (parameters) that relates the coordinates of the two systems. The true ground coordinates are typically measured from a target map, either in paper or digital format; the target map may be a topographic or any other map that has already been transformed to the desired map projection. This is called image-to-map registration. Geometric registration may also be performed by registering one (or more) images to another image instead of to geographic coordinates. This is called image-to-image registration and is often done prior to various image transformation procedures that compare images from different sensors or dates. Another important concept related to the geometry of a satellite image is rectification, the process by which the geometry of an image area is made planimetric (Haralick, 1973). The accuracy of the registration at each GCP is measured in terms of the location error, expressed as the root mean square error (RMSE) – the standard deviation of the differences between the actual positions of the GCPs and their newly calculated positions after registration. These differences between measured and transformed GCP coordinates are known as residual errors, or simply residuals. Usually the RMSE is reported in units of image pixels for both the north–south and east–west directions. If analysts wish to assess the overall accuracy of the registration, some of the GCPs should be withheld from the registration procedure and then used to evaluate its success. A related term is geocoding, which is georeferencing with subsequent resampling and involves a two-step process: i) each new raster pixel is projected onto the original image using the transformation function, and ii) a digital number for the new pixel is determined and stored.

 

5.1.2 Transformation

 

Image transformation is an integral part of georeferencing and involves two steps: i) selection of a suitable transformation method and ii) determination of the transformation parameters. The polynomial transformation is a general type of transformation of 1st, 2nd or nth order. The 1st-order polynomial transformation relates map coordinates (x, y) to image coordinates (m, n) in the following manner:

x = a+bm+cn (1)
y = d+em+fn (2)

 

The six transformation parameters (a to f) in the above two equations (1st-order polynomial) are determined from a minimum of three GCPs using a least-squares adjustment for the overall best fit between image and map. For the 2nd- and 3rd-order polynomials the minimum numbers of GCPs are six and ten, respectively. The transformation error is calculated, and the overall transformation accuracy is expressed as variances or RMSE in both the x- and y-direction and as an overall RMSE. It is important to mention here that the RMSE conveys an overall accuracy but does not indicate which parts of the image are accurately transformed and which are not. Note also that the RMSE is valid only for the area bounded by the GCPs; the GCPs should therefore be well distributed and include locations near the edges of the image.
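A minimal sketch of this least-squares fit, assuming the GCP image coordinates (m, n) and map coordinates (x, y) are already available as numpy arrays (the GCP values below are purely hypothetical), is given below; the coefficient names follow equations (1) and (2):

```python
import numpy as np

def fit_first_order(m, n, x, y):
    """Fit x = a + b*m + c*n and y = d + e*m + f*n by least squares and report the RMSE."""
    A = np.column_stack([np.ones_like(m), m, n])    # design matrix [1, m, n] for each GCP
    coef_x, *_ = np.linalg.lstsq(A, x, rcond=None)  # coefficients a, b, c
    coef_y, *_ = np.linalg.lstsq(A, y, rcond=None)  # coefficients d, e, f
    residuals = np.hypot(x - A @ coef_x, y - A @ coef_y)
    rmse = np.sqrt(np.mean(residuals**2))           # overall RMSE over the GCPs
    return coef_x, coef_y, rmse

# Hypothetical GCPs: image coordinates (m, n) and corresponding map coordinates (x, y)
m = np.array([10.0, 200.0, 15.0, 180.0])
n = np.array([12.0, 20.0, 210.0, 190.0])
x = np.array([500300.0, 506000.0, 500450.0, 505400.0])
y = np.array([4100100.0, 4100050.0, 4094200.0, 4094350.0])
print(fit_first_order(m, n, x, y))
```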

 

5.1.3 Resampling

 

Image resampling is an interpolation process used to bring an image into registration with another image or with a planimetrically correct map. Resampling scales, rotates, translates and performs related manipulations as necessary so that the geometry of the uncorrected image matches that of a particular reference image with the desired properties.

 

The computationally most efficient resampling approach is nearest-neighbour resampling, in which each "corrected" pixel is assigned the value of the nearest "uncorrected" pixel. The advantage of this approach is that it preserves the original pixel values of the image.

 

A relatively more complex resampling approach is bilinear interpolation, in which a value is calculated for each output pixel as a weighted average of the four nearest input pixels; a nearer pixel is given greater influence than a more distant one. Because each output value is based on several input values, the output image does not have the unnaturally blocky appearance of some nearest-neighbour results and therefore looks more "natural". However, there are important consequences. First, because bilinear interpolation creates new pixel values, the original brightness values of the input image are not preserved, and the analyst may find that the range of brightness values in the output image differs from that in the input image; such changes to the digital brightness values may be significant in later processing steps. Second, because the resampling averages over areas (blocks of pixels), it decreases spatial resolution through a kind of "smearing" caused by averaging small features with adjacent background pixels.

 

Another relatively complex resampling method is cubic convolution. Cubic convolution uses a weighted average of values within a neighbourhood that extends about two pixels in each direction, usually encompassing 16 adjacent pixels. Typically, the images produced by cubic convolution resampling are much more attractive than those of the other procedures, but the data are altered more than with nearest-neighbour or bilinear interpolation, the computations are more intensive, and the minimum number of GCPs required is larger.
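As an illustrative sketch (assuming scipy is available and using a random array as a stand-in for a real band), the three resampling kernels can be compared by resizing the same band with different interpolation orders; order 0 corresponds to nearest neighbour, order 1 to bilinear and order 3 to cubic:

```python
import numpy as np
from scipy import ndimage

band = np.random.randint(0, 256, size=(100, 100)).astype(float)  # stand-in for one image band

nearest  = ndimage.zoom(band, 1.5, order=0)  # nearest neighbour: original DNs preserved
bilinear = ndimage.zoom(band, 1.5, order=1)  # bilinear: weighted average of 4 neighbours
cubic    = ndimage.zoom(band, 1.5, order=3)  # cubic: ~16-pixel neighbourhood, smoother output

# New values are created by the higher-order methods, so the DN range may change
print(nearest.min(), nearest.max(), bilinear.min(), bilinear.max(), cubic.min(), cubic.max())
```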

 

5.2 Radiometric Corrections

 

Radiometric correction includes correcting for sensor irregularities and for unwanted sensor or atmospheric noise that causes visible errors in the raw data, and converting the data so that they accurately represent the reflected or emitted radiation measured by the sensor. The radiometric problems in the data are mainly of three kinds: periodic line dropouts, line striping, and random noise or spikes.

 

 

5.2.1 Periodic Line Dropouts

 

Here, one of the detectors of the sensor either produces wrong data or stops functioning. For example, the Landsat-7 Enhanced Thematic Mapper (ETM+) has 16 detectors per band (channel) except for the thermal channel; if one detector fails, every sixteenth scan line contains a string of zeros that plots as a black line on the image. The first step in the radiometric correction or restoration process is to calculate the average DN value per scan line for the entire scene. The average DN value of each scan line is then compared with this scene average, and any scan line deviating from the average by more than a designated threshold is identified as defective. In the next step the defective lines are replaced: for each pixel in a defective line, an average DN is calculated from the corresponding pixels on the preceding and succeeding scan lines, and this average is substituted for the defective pixel. The resulting image is a major improvement, although the replaced scan lines consist of artificial data. This restoration process is equally effective for random line dropouts that do not follow a systematic pattern.
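A minimal sketch of this dropout restoration, assuming the image is a 2-D numpy array of DNs and that defective lines are those whose mean deviates strongly from the scene mean (the threshold value here is illustrative only):

```python
import numpy as np

def repair_line_dropouts(img, threshold=50.0):
    """Replace defective scan lines by the average of the neighbouring lines."""
    img = img.astype(float).copy()
    scene_mean = img.mean()
    line_means = img.mean(axis=1)                          # average DN per scan line
    defective = np.abs(line_means - scene_mean) > threshold
    for i in np.where(defective)[0]:
        above = img[i - 1] if i > 0 else img[i + 1]        # preceding line (or next line at the edge)
        below = img[i + 1] if i < img.shape[0] - 1 else img[i - 1]
        img[i] = (above + below) / 2.0                     # substitute the averaged DNs
    return img
```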

 

5.2.2 Line Striping

 

It is possible that, with time, the response of some detectors of a band shifts to a higher or lower level, so that every scan line recorded by that detector is brighter or darker than the other lines. This defect is known as periodic line striping. For example, if every second line (recorded by detector number 2) has this defect – 'second-line striping' – the lines from that detector have a brightness offset; the recorded digital values may be, say, twice those of the other detectors, causing every second scan line to be twice as bright as the scene average.

 

One radiometric correction or restoration method for this problem is to compile a histogram of the DNs recorded by each detector (e.g. 16 histograms for 16 detectors) and compare these histograms with the histogram of the entire scene. For each detector, the mean and standard deviation are then adjusted to match the values of the entire scene. Alternatively, the DNs of detector number 2 can be rescaled by a single factor using a simple arithmetic operation to produce the corrected values from which the restored image is plotted.

 

In another restoration method, a histogram of the DNs of each of the 16 detectors is first plotted. Deviations in the mean or median values of these histograms are then used to recognize detector differences and to determine corrections for them.
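A minimal destriping sketch along these lines, assuming a 2-D DN array in which consecutive scan lines cycle through n_detectors detectors, adjusts each detector's lines to the scene mean and standard deviation:

```python
import numpy as np

def destripe(img, n_detectors=16):
    """Match each detector's line statistics to the whole-scene statistics."""
    img = img.astype(float).copy()
    scene_mean, scene_std = img.mean(), img.std()
    for d in range(n_detectors):
        lines = img[d::n_detectors]                 # every n-th scan line belongs to detector d
        mean_d, std_d = lines.mean(), lines.std()
        if std_d > 0:
            # Linear gain/offset so that this detector's mean and std match the scene
            img[d::n_detectors] = (lines - mean_d) * (scene_std / std_d) + scene_mean
    return img
```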

 

5.2.3 Random noise or spike corrections

 

Line dropouts and line striping are forms of nonrandom noise in the image. Random noise, in contrast, occurs where individual pixels have DNs much higher or lower than those of the surrounding pixels; in the image these pixels appear as bright and dark spots that spoil the image quality. Random noise therefore requires a different restoration method: the spots can be removed by digital filters such as a moving-average filter.

 

 

6. Image Enhancement

 

The purpose of image enhancement is to make the image more interpretable for a particular application and/or for feature extraction. Image enhancement techniques can be classified in many ways. In contrast enhancement, or global enhancement, the raw data are transformed using statistics computed over the entire image; examples are the linear contrast stretch, the histogram-equalized stretch and the piece-wise contrast stretch. Spatial or local enhancement, on the other hand, considers only local conditions, which can vary considerably over an image; examples are image smoothing and sharpening filters.

 

6.1 Contrast Enhancement

 

Contrast enhancement involves changing the original values so that more of the available range is used, thereby increasing the contrast between targets and their backgrounds. Contrast enhancement thus stretches the image histogram over the entire grey range. A histogram is a graphical representation of how the DN values are distributed over the entire image; for example, it may show that the DN value 10 (x-axis) occurs in 10,000 pixels (y-axis, frequency). For an 8-bit image, the DN ranges from 0 to 255 (black to white). The histogram is commonly summarized statistically by its mean, standard deviation, and minimum and maximum (range). A narrow histogram, i.e. one with a small standard deviation, indicates a low-contrast image in which all the DNs lie close together within a few grey values.

 

6.1.1 Linear Contrast Stretch

 

In an 8-bit image, a DN value at the low end of the original histogram is assigned to extreme black (0) and a value at the high end is assigned to extreme white (255); the remaining pixel values are distributed (interpolated) linearly between these extremes. One drawback of the linear stretch is that it assigns as many display levels to rarely occurring DN values as to frequently occurring ones. A linear contrast stretch that simply maps (min, max) to (0, 255) therefore still produces a rather dull image in most cases: even though all grey shades of the display are utilized, the bulk of the pixels are displayed in mid grey. This is caused by the more or less normal distribution of the DNs, with the minimum and maximum values lying in the tails of the distribution. For this reason it is common to cut off the tails of the distribution at the lower and upper ends (usually by defining the size of the tails as a percentage of the total).
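A minimal sketch of a linear stretch with percentage cut-off (2% tails here, an illustrative choice), assuming the band is available as a numpy array of DNs:

```python
import numpy as np

def linear_stretch(band, lower_pct=2, upper_pct=98):
    """Linearly stretch DNs to 0-255, clipping the histogram tails."""
    lo, hi = np.percentile(band, [lower_pct, upper_pct])   # cut-off DNs at the tails
    stretched = (band.astype(float) - lo) / (hi - lo)      # interpolate linearly between cut-offs
    return np.clip(stretched * 255.0, 0, 255).astype(np.uint8)
```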

 

6.1.2 Histogram Equalization Stretch

 

Histogram equalization is a non-linear transformation of the image pixels in which the original histogram is redistributed to create a uniform pixel density along the horizontal grey-value (DN) axis. It involves two steps: i) computing the histogram of the original image and the cumulative frequency (density) percentage, and ii) computing from it the transformation function with which the contrast of the output scene is manipulated. In this method, therefore, both the shape and the extent of the histogram are taken into consideration. The underlying principle is that each histogram class in the displayed image should contain approximately the same number of pixel values, so that the histogram of the displayed values is uniform across the classes; to achieve this, certain adjacent grey values are grouped together. For this reason the number of grey levels in the enhanced image is smaller than in the original image.
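A minimal sketch of histogram equalization using the cumulative distribution, assuming an 8-bit band stored as a numpy integer array:

```python
import numpy as np

def equalize(band):
    """Histogram-equalize an 8-bit band via its cumulative frequency distribution."""
    band = np.asarray(band)                     # assumed integer DNs in the range 0-255
    hist, _ = np.histogram(band, bins=256, range=(0, 256))
    cdf = hist.cumsum() / hist.sum()            # cumulative frequency, 0..1
    lut = np.round(cdf * 255).astype(np.uint8)  # transformation function (look-up table)
    return lut[band]                            # map every original DN to its equalized value
```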

 

6.1.3 Piece-wise Linear Stretch

 

This method is similar to the linear contrast stretch, but the linear interpolation of the output values is applied between user-defined DN values. It is useful for enhancing only a certain land-cover type, for example water. If the data values of this feature lie in the range 5 to 18, then, to discriminate within it as much as possible, it is wise to use all available grey levels for this feature alone. In this way detailed differences within the feature of interest become visible, whereas the remaining features are assigned to a single grey tone.

 

6.2 Spatial Filtering

 

In a spatial filtering operation the image is viewed in terms of its constituent spatial frequencies – the number of changes in brightness value per unit distance in any particular part of the image – and certain spatial frequencies are selectively altered to emphasize some image features. Spatial filters are thus designed to highlight or suppress specific features in an image on the basis of their spatial frequency. Spatial frequency is related to the concept of image texture: "rough" textured areas of an image, where the tonal changes are dramatic over a small area, have high spatial frequencies, while "smooth" areas with few changes in tone over several pixels have low spatial frequencies.

 

The filtering procedure uses a 'moving window' (kernel) of a few pixels in size, such as 3×3 or 5×5 (figure 3), passed over each pixel in the image. A mathematical calculation (convolution) is applied to the pixel values under the window, each pixel in the window being given a weight, and the central pixel is replaced with the new value. The window is moved along both the row and the column dimensions one pixel at a time, and the calculation is repeated until the entire image has been filtered and a "new" image has been generated. By varying the calculation performed and the weights of the individual pixels in the filter window, filters can be designed to enhance or suppress different types of features. Note that, so that the pixels along the border of the image can also act as centre pixels, the border pixels are duplicated temporarily during the convolution. Essentially, every filtering operation computes a 'gain' – the reciprocal of the sum of the kernel values – and multiplies each weighted or assigned kernel value by it:

gain = 1 / (sum of all kernel values) (3)

 

 


 

 

6.2.1 Low-Pass or Low-Frequency Filter

 

Low-pass filters block the high spatial-frequency detail, letting through only those areas where the tonal variation over several pixels is small, i.e. pixels with low spatial frequencies. The low-frequency filter evaluates a particular input pixel together with the pixels surrounding it and outputs a new brightness value that is the mean of this convolution. Low-pass filters using an averaging option therefore 'smooth' the image. For example, applying the averaging filter to the kernel values given in table 3 gives a gain of 0.055. Another way of viewing the smoothing operation is that it blurs the image, mainly at the edges of objects. The size of the kernel also affects the degree of smoothing: the larger the moving window, the more blurred the output image. A contrast stretch is therefore usually applied after a filtering operation to utilize the full dynamic range of grey values.

 

Table 3. Kernel filter pixel values

To reduce the degree of blurring in the image, a larger weight can be assigned to the centre pixel. For example, if we replace the centre value (2) of the kernel with 4 by multiplying it by two, the gain of the resulting kernel becomes 0.05.
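A minimal low-pass filtering sketch (assuming scipy is available; the 3×3 kernel shown is a simple illustrative averaging kernel, not the exact kernel of table 3):

```python
import numpy as np
from scipy import ndimage

band = np.random.randint(0, 256, size=(100, 100)).astype(float)  # stand-in for one band

kernel = np.ones((3, 3))                 # equal weights; the centre weight could be increased
kernel = kernel / kernel.sum()           # normalize by the gain (1 / sum of kernel values)

smoothed = ndimage.convolve(band, kernel, mode="nearest")  # border pixels duplicated at the edge
```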

 

6.2.2 High-Pass or High-Frequency Filter

 

A simple high-pass filter works by subtracting a low-pass filtered image, pixel by pixel, from the unprocessed original image. High-pass filters block the low spatial-frequency detail and pass only those areas where the tonal variation over a few pixels is large, i.e. pixels with high spatial frequencies, thereby emphasizing edges. The high-frequency filter evaluates a particular input pixel together with its adjacent pixels and outputs a new brightness value from this convolution. The difference between the centre pixel value and its neighbouring pixel values often results in edge detection, so a kernel whose centre value is high relative to its neighbours is well suited to this purpose. Alternatively, the edge response can be strengthened by making the neighbouring kernel values negative (table 4); the gain of the resulting kernel would be 0.25. The output high-pass filtered image can be used as an aid (an additional band) during the classification of images. Well-known high-pass filters include the Laplacian edge-enhancement filter and the Sobel edge-detection filters.

 

Table 4. Kernel filter pixel values for edge enhancement

 

Laplacian Edge Enhancement filter

 

The Laplacian filter calculates the difference between the DN value of the central pixel and the average of the DN values of the four adjacent pixels located horizontally and vertically. The sum of all elements within the mask is zero (table 5b).

 

Table 5. Modified Laplacian filtering operation

a. Input Image

b. Laplacian filter

c. Output Image

 

In equation form, the above is written as:

 

y = (x-a2) + (x-a7) + (x-a4) + (x-a5) (4)

 

Sobel Edge Enhancement filter

 

The Sobel filter computes the gradient (slope) horizontally and vertically: two high-frequency filters are calculated in two different directions. In the vertical filter, the kernel values are essentially the horizontal kernel values rotated by 90° (table 6).

 

Table 6. Sobel filtering operation

The output images are:

 

x = the resulting image after applying kx to the input image pixel values in the window

 

y = the resulting image after applying ky to the input image pixel values in the window

 

The resulting x and y images (pixel values) are then squared, and the square root of their sum is calculated to produce the final image: z = √(x² + y²).
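A minimal sketch of these edge filters (assuming scipy is available; the kernels below are the standard Laplacian and Sobel masks, which may differ in sign convention from tables 5 and 6):

```python
import numpy as np
from scipy import ndimage

band = np.random.randint(0, 256, size=(100, 100)).astype(float)  # stand-in for one band

# Laplacian: centre pixel minus its four horizontal/vertical neighbours (mask sums to zero)
laplacian_kernel = np.array([[ 0, -1,  0],
                             [-1,  4, -1],
                             [ 0, -1,  0]])
edges = ndimage.convolve(band, laplacian_kernel, mode="nearest")

# Sobel: horizontal and vertical gradient kernels (one is the other rotated by 90 degrees)
kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
ky = kx.T
gx = ndimage.convolve(band, kx, mode="nearest")
gy = ndimage.convolve(band, ky, mode="nearest")
z = np.sqrt(gx**2 + gy**2)   # gradient magnitude: z = sqrt(x^2 + y^2)
```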

 

7. Image Transformation

 

Essentially, image transformation generates a 'new' image from two or more sources. The source may be a single image involving two or more spectral bands, or the resulting image may be produced from multitemporal, multispectral image data of the same area. The operations range from simple arithmetic to more complex statistical calculations.

 

7.1 Spectral Image Ratioing or Image division

 

Image division is one of the most commonly applied image transformation techniques; it reveals fine variations in the spectral response (DN value) of surface features as observed in different image bands. A good example of ratioing is the assessment of the presence and health of vegetation using the DNs of the near infra-red (NIR, 0.7-1.3 µm) and red (R, 0.6-0.7 µm) bands: reflectance from vegetation is highest in the former, and lowest in the 0.5-0.7 µm range of the visible spectrum, which includes the latter. Mathematically, the ratio function in its general form can be given as:

ORVi,j = DNi,j,a / DNi,j,b

where ORVi,j = output ratio value for the pixel at row i and column j; DNi,j,a and DNi,j,b = pixel (reflectance) values at the same location in band a and band b, respectively.

 

Normalized Difference Vegetation Index

 

Different features on the Earth's surface have different spectral reflectance behaviour, and this is exploited in the Normalized Difference Vegetation Index (NDVI). NDVI is an example of a more complex band-ratioing concept in which sums of and differences between two spectral bands are combined in an index. As stated above, the NIR and R bands are used here as two distinct spectral regions for monitoring vegetation health. Apart from NDVI, there are two simpler indices: the simple difference and the simple ratio of the same two bands. In equation form:

Difference index = NIR − R
Ratio index = NIR / R
NDVI = (NIR − R) / (NIR + R)

 

Note that the computations in the above three equations are carried out pixel by pixel on the corresponding DN values. In the case of NDVI, the difference is 'normalized' by dividing it by the sum of the two DN values (of the two bands). As an example, the NDVI for figure 3 (Image.tif) is computed from the DN values given in table 2 for each pixel of the entire image, and the resulting NDVI values are given in table 7. It is evident from table 7 that the raw NDVI value ranges from -1 to +1, and that the NDVI range is symmetrical around zero (NIR = R).
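A minimal NDVI sketch, assuming the NIR and red bands are already available as numpy arrays of the same shape (e.g. bands 1 and 2 of Image.tif as listed in table 2):

```python
import numpy as np

def ndvi(nir, red):
    """Compute NDVI pixel by pixel, guarding against division by zero."""
    nir = nir.astype(float)
    red = red.astype(float)
    denom = nir + red
    denom[denom == 0] = 1e-10          # avoid division by zero where NIR + R = 0
    return (nir - red) / denom         # values range from -1 to +1
```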

 

7.2 Principal Component Analysis (PCA) or Transformation (PCT)

 

Principal components analysis is a non-parametric, orthogonal linear transformation technique that helps in compressing the dimensionality of the data; for example, it reduces the number of bands (in the case of a remote sensing multispectral image) to fewer bands, also called 'components'. Multispectral image data are usually strongly correlated from one band to another. For example, the visible bands (band 2 and band 3) of Image.tif show a positive correlation (figure 6a), and there is a negative correlation between the NIR and visible bands (figure 6b). This is largely due to the spectral reflectance characteristics (here, pixel value or DN) and the greenness present in the environment.

Figure 6. Correlation between one band to the other of Image.tif: a) Positive and b) Negative

 

The trend lines in the scatter plots denote the axes along which the values cluster. When the pixel values of each band of the Landsat 8 image are plotted against those of the other bands, the correlation (scatter) plots reveal redundancy in the information. Applying PCA reduces this redundancy and thus also compacts the data to be investigated. Through a linear transformation of the correlated variables, corresponding to a rotation (of the axes of the spectral space) and a translation of the original coordinate system (pixel values), PCA creates new, uncorrelated images from the original ones. After rotation, the length and direction of the widest transect of the scatter ellipse are calculated. The transect corresponding to the longest (major) axis of the spectral space or ellipse, which contains the new pixel values, is called the first principal component (PC1). Geometrically, PC1 points in the direction with the largest variance; PC2, being orthogonal (perpendicular) to PC1, points in the direction of the second-largest variance. For an n-dimensional space (n spectral bands), the same pattern is repeated. The direction of variation of PC1 is called the first eigenvector, and its variance (proportional to the length of the axis of variation) is called the first eigenvalue. Algebraically, the eigenvectors and eigenvalues are obtained from the variance-covariance matrix or the correlation matrix of the data: the method, also called eigenvector decomposition (EVD) or spectral decomposition, decomposes the variance-covariance or correlation matrix of the raw data into matrices of eigenvectors and eigenvalues, and the eigenvectors act as weighting coefficients.

The transformation essentially concentrates the information, or variance, present in the original data – say the eleven bands of Landsat 8 – into a smaller number of new components, say three, which may contain over 90 percent of the information present in the original eleven bands. One advantage is that PCA operates on all bands together, thereby avoiding the difficulty of selecting appropriate bands that is associated with band ratioing. The major difference from other transformation techniques is that the new components are ordered by the decreasing amount of variance (eigenvalue) they explain; the later components therefore describe only minor variations, or sometimes just noise. In this way, interpretation and analysis of the data in the new components is simpler and more efficient. Another benefit of PCA is that it permits the identification of a set of coefficients that concentrates maximum information in a single resulting component. The singular value decomposition (SVD) is the preferred matrix decomposition algorithm for PCA because of its numerical accuracy; one of the main differences is that EVD works on the variance-covariance or correlation matrix, whereas SVD operates on the (centred) raw data matrix.

 

A result of applying SVD-based PCA to the 3-band image (Image.tif) is given below.

As indicated above, applying SVD-based PCA yields a PC1 containing over 92% of the information (variance) present in the original three bands, followed by PC2 and PC3 in decreasing order. The interpreter thus retains over 92% of the original information in a much more concise form and avoids replicated and redundant information. This not only reduces the bulk of the data to be analysed but also the time and cost spent on the analysis. Another important point about PCA is that, since each resulting component is a linear combination of the original bands, the interpreter must know how to interpret the meaning of the new components.
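A minimal sketch of SVD-based PCA on a 3-band image, assuming the bands are stacked in a numpy array of shape (bands, rows, cols); the variance explained by each component is reported as a percentage:

```python
import numpy as np

def pca_svd(bands):
    """SVD-based PCA of a multi-band image; returns component images and explained variance (%)."""
    n_bands = bands.shape[0]
    X = bands.reshape(n_bands, -1).T.astype(float)   # one row per pixel, one column per band
    X = X - X.mean(axis=0)                           # centre each band
    U, S, Vt = np.linalg.svd(X, full_matrices=False) # rows of Vt are the eigenvectors (loadings)
    pcs = (X @ Vt.T).T.reshape(bands.shape)          # principal component images (PC1, PC2, ...)
    explained = 100 * S**2 / np.sum(S**2)            # variance explained by each PC
    return pcs, explained
```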

 

PCA is useful in a wide range of applications including data exploration and visualisation of underlying patterns within correlated data sets, decorrelation, detection of outliers, data compression, feature reduction, enhancement of visual interpretability, improvement of statistical discrimination of clusters, ecological ordination etc.

 

8. Image Classification

 

In remote sensing, the information hidden in the pixels of a multispectral image can be extracted in a variety of ways, and image classification is one of them. In classification, the bands are sometimes also called features, the vector of band values of a pixel the feature vector, and the graph or plot showing the feature vectors the feature space. Image classification is essentially about assigning the 'patterns' in the rectangular matrix of pixels to classes, such as land use/land cover (LU/LC) classes; in a multispectral classification the pattern is formed by the pixel values (DNs). Pixel-based classification groups similar pixels into classes. This can be done either by simply comparing pixel values with one another, or by training on samples of pixels of known identity and then classifying the remaining pixels on the basis of the training samples. The resulting classes appear as clusters or regions in map form, each identified by a unique colour or symbol. Spectral, or pixel-based, classification, which is more common, uses the pixel information stored in the different bands of an image, whereas spatial classification uses the spatial relationship of a pixel with its neighbouring pixels, which may involve proximity, size, shape, directionality, texture and so on. It is important to mention that there is no single 'right' approach to image classification: the choice depends on the objective of the classification, the nature of the data to be classified and the available resources. A classification can be divided into four phases: a training phase in which the classes are defined, analysis of the training statistics, an assignment phase in which every pixel is allocated to one of the defined classes, and an output and assessment phase, which may include maps, tables and so on. It is also essential to perform geometric and radiometric correction before classification. Image classification techniques can be broadly categorized into two groups: supervised and unsupervised.

 

 

8.1 Supervised Classification

 

Here the user 'supervises' the classification: the classification method uses algorithms that employ the pixel values to construct a numerical relationship for each class, such as water, agricultural land, built-up area or forest. The first phase is the 'training phase', in which the user 'trains' or guides the classification algorithm by assigning a selected number of representative pixels (training pixels) from a homogeneous area to the particular class they belong to. For this, the user should be familiar with image interpretation techniques, or should at least have prior knowledge with which (s)he can identify the pixels of a particular LU/LC class. The training samples, also called seeds, areas of interest (AOI) or regions of interest (ROI), help in estimating the statistical parameters of the particular classifier used. These parameters are the properties of the probability model of the classifier and are sometimes called the 'signature' of the class for which they have been created. The second phase is the 'classification phase', in which each pixel of the entire image is assigned to one of the defined classes on the basis of the class-specific signatures; each pixel is placed in the class it most probably matches according to the chosen classifier algorithm. The third and final phase is the 'output phase', in which the result is delivered as a thematic map, as tables or as digital data (ASCII, DAT, SHAPE etc.). Depending on the requirement or end use, the result is either used directly or passed to other systems as an input.

 

A number of statistical methods are used as supervised classifiers, each with its own merits and demerits. Commonly applied methods are: the maximum likelihood classifier (MLC) or estimator (MLE), the minimum distance method, the parallelepiped method, the Bayesian method, decision tree classification, fuzzy classification and the artificial neural network (ANN) method. The principles and working algorithms of all these classifiers are outside the scope of this work and can be found in any standard textbook on remote sensing. However, in view of the wide use of MLC in classification, a description of the algorithm is given below.

 

Maximum Likelihood Classifier (MLC) or estimation (MLE)

 

MLC is a supervised statistical approach and is based on two principles:

 

i. MLC considers both the variances and covariances of the class signatures when assigning each pixel to one of the LU/LC classes represented in the signature file.

 

ii. It is assumed that the pixels of each feature-class sample (AOI) are normally (Gaussian) distributed in multidimensional space, which means that a class can be completely described by its mean vector and variance-covariance matrix.

 

Given these two characteristics, the probability of membership of each class is estimated for each cell value using Bayes' theorem, and each pixel is then assigned to the class for which it has the highest probability of being a member.

 

The Bayes’ classification is performed according to

 

x ∈ ωi, if p(ωi|x) > p(ωj|x) for all j ≠ i (10)

 

where ωi, i = 1, …, n, represent the spectral classes (e.g. water, agriculture, forest etc.) and x is the pixel vector of DN (brightness) values of the pixel in multispectral space.

 

The probability p(ωi|x) denotes the likelihood that a pixel vector x belongs to class ωi. According to Bayes' theorem, the pixel at x is therefore assigned to the spectral class ωi for which p(ωi|x) is largest. The probability is computed using the normal probability density function (PDF): an unidentified pixel vector x is classified by computing the probability of x belonging to each class and assigning the pixel to the class for which this probability is maximum. For a single band the normal PDF is calculated as:

p(x) = [1 / (σ √(2π))] exp[−(x − μ)² / (2σ²)]

where

μ = mean of the DN values of the feature class in the training dataset

σ  = standard deviation of DN values of each feature class in the training dataset

x = A single pixel of whole image dataset of one band

 

The result obtained after applying the supervised approach to the satellite image (figure 8a) using the MLC method is shown in figure 8c.
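A minimal sketch of a Gaussian maximum likelihood classifier, assuming the training pixels of each class and the image pixels are available as numpy arrays of band values (scipy is used for the multivariate normal density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mlc_classify(pixels, training):
    """pixels: (n_pixels, n_bands); training: dict of class name -> (n_samples, n_bands)."""
    class_names = list(training)
    scores = []
    for name in class_names:
        samples = training[name]
        mean = samples.mean(axis=0)                       # class mean vector (signature)
        cov = np.cov(samples, rowvar=False)               # class variance-covariance matrix
        scores.append(multivariate_normal.logpdf(pixels, mean=mean, cov=cov))
    scores = np.column_stack(scores)                      # log-likelihood of each class per pixel
    return np.array(class_names)[np.argmax(scores, axis=1)]  # class with maximum likelihood
```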

 

8.2 Unsupervised Classification

 

Unsupervised classifiers, unlike supervised classifiers, do not use training data. Instead, they identify agglomerations, or clusters, of image pixels on the basis of similarity measures; the most commonly applied measure is a distance measure, often the Euclidean distance. The classifier segments the whole image into n classes, where n is specified by the user. Here the major role of the analyst lies in the later half of the process, once the initial classification step is over: the results of the initial classification are examined and corrective measures are taken. The interpreter verifies the output classes by overlaying the map on the image or by means of a ground-truthing report. Where changes are required, the map is modified either by reclassification – merging or splitting classes – or by directly altering the boundaries of the class clusters so that they fall into the appropriate classes. Once these corrective measures are complete, a final unsupervised classified map is produced that is free of errors, or nearly so. As in supervised classification, different algorithms are used for partitioning the pixels, for example k-means, c-means and hierarchical clustering. In view of the wide use of the k-means algorithm in unsupervised classification, a description of it is given below.

 

k-means

 

k-means is a numerical, unsupervised and non-deterministic method. It treats each observation in the input data as an object having a location in space. It is also advantageous to implement because it uses the actual observations of the objects rather than a larger set of dissimilarity measures, unlike the hierarchical clustering approaches, which work on proximities. The objective of the k-means method is to minimise the total intra-cluster variance, i.e. the squared error function: the algorithm minimises the sum of squared Euclidean distances between each point and the centre of its closest cluster, where each centroid is the mean of the points in that cluster. This objective can be expressed in the following equation:

D = Σj=1..k Σxi∈Cj || xi − cj ||²

where there are k clusters Cj, j = 1, …, k; D is the total intra-cluster variance or squared error function; xi is a data point (feature vector); and cj is the mean vector, or centre, of cluster Cj. The minimum computational complexity of the k-means algorithm is O(ndCT), where n is the number of d-dimensional patterns, d is the number of features, C is the number of assigned clusters and T is the number of iterations.
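A minimal sketch of k-means clustering of a multispectral image, assuming the bands are stacked in an array of shape (bands, rows, cols) and scikit-learn is available (the number of classes, here 5, is an illustrative choice):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_classify(bands, n_classes=5):
    """Cluster pixels by their band values and return an unsupervised class map."""
    n_bands, rows, cols = bands.shape
    X = bands.reshape(n_bands, -1).T.astype(float)            # one feature vector (DNs) per pixel
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(X)
    return labels.reshape(rows, cols)                         # class label for every pixel
```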

 

A result obtained after applying the unsupervised approach using the Iterative Self-Organizing Data Analysis (ISODATA) clustering technique is shown in figure 8d.

 

ISODATA clustering is a variant of k-means in which clusters are split and merged when the cluster variance is above a pre-determined threshold. Although computationally more intensive than the k-means algorithm, ISODATA does not require a pre-determined number of clusters and is rather more adaptive, as it tries hard to optimize the cluster solution.

 

8.3 Hybrid Classification

 

A third approach is the hybrid classification strategy, which uses unsupervised spectral class statistics to 'train' the pixels for supervised analysis. Here, too, the user's knowledge of image interpretation, ground truth or other reference data is used to identify the homogeneous spectral classes or clusters. This approach thus includes methods that combine the statistics generated from supervised and unsupervised training, incorporating both thematic and spectral meaning.

 

A result obtained after applying the hybrid classification strategy is shown in figure 8b.

 

8.4 Classification Accuracy

 

Before delivering the final output of the classification process, it is important for the interpreter to know how correct the deliverables are. Correctness here means the level of agreement between the resulting classified image and an image or other information that is known, or assumed, to be correct. An increase in accuracy is expected to lower the bias, i.e. the estimated data or information lies close to an accepted reference value. However, it is also important for a user to know what level of precision is required in the deliverables: an accuracy of 99% in separating built-up area from forest is wasteful when the purpose is to map the distribution of evergreen and deciduous forest stands.

 

Classification accuracy is assessed by checking randomly sampled pixels against manually interpreted imagery or other reference data. Two kinds of accuracy are distinguished: producer accuracy and user accuracy.

 

Producer accuracy, which is related to omission errors, is the proportion of the reference samples of a particular category that are correctly classified in the map.

User accuracy, which is related to commission errors, is the proportion of the samples classified as a particular category in the map that are correctly classified.

 

 

Corresponding to these accuracies there are two kinds of error: omission and commission. An omission error occurs when a reference feature is not linked to (i.e. is missed by) the classification, whereas a commission error, also called a false detection, occurs when a classified (candidate) feature cannot be linked to any reference feature.

 

Similarly, a root mean square error (RMSE) can also be determined. The comparison of classified and reference samples is represented in the form of a matrix, called the error matrix. Misclassification is caused by statistical confusion in the classifier algorithm when assigning the pixel vectors of features to a particular class; this confusion arises from the close match of pixel values in each of the bands (features) for different classes, which is why the error matrix is also called a confusion matrix. The error or confusion matrix is an n×n array, where n is the number of classes. An example of an error matrix is shown below.

 

Table 8. Example of an Error or Confusion Matrix

After the error matrix has been prepared, a parameter called KHAT, or Cohen's Kappa statistic, is used to determine whether one error matrix is significantly different from another. The coefficient measures the difference between the observed accuracy (true agreement) in the error matrix and the chance agreement: KHAT = (observed accuracy − chance agreement) / (1 − chance agreement). The observed accuracy, or true agreement, is the agreement between the remotely sensed classification and the reference data, while the chance agreement is the agreement expected between the reference data and a random classifier.

 

Thus, when the observed accuracy (true agreement) is 1 and the chance agreement is 0, KHAT approaches 1, indicating that the classification result is much better than a random classification. If KHAT is zero, there is no difference between the classification result and a random classification, and if KHAT is less than zero, the classification result is worse than a random classification.
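A minimal sketch of computing producer accuracy, user accuracy, overall accuracy and KHAT from an error matrix, assuming the matrix is arranged with mapped classes in rows and reference classes in columns (the example matrix is purely illustrative):

```python
import numpy as np

def accuracy_report(error_matrix):
    """error_matrix[i, j] = number of samples mapped as class i that belong to reference class j."""
    cm = np.asarray(error_matrix, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    producer = diag / cm.sum(axis=0)                 # per reference class (related to omission)
    user = diag / cm.sum(axis=1)                     # per mapped class (related to commission)
    observed = diag.sum() / total                    # overall (observed) accuracy
    chance = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2   # expected chance agreement
    khat = (observed - chance) / (1 - chance)        # Cohen's Kappa (KHAT)
    return producer, user, observed, khat

# Illustrative 3-class error matrix
cm = [[50, 3, 2],
      [4, 45, 6],
      [1, 2, 47]]
print(accuracy_report(cm))
```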

 

 

9. Applications of Digital Image Processing

 

Digital image processing techniques have a wide range of applications. One of the most important is examining the suitability of images before they are used for a given purpose, digitally correcting and removing errors in the raw image, and removing error or noise before the corrected image is used for its assigned purpose. The techniques of digital image processing help in the detection and extraction of information on: a) physical characteristics and processes such as mineral resources, soils, landforms, the quantity and quality of water bodies, floods, erosion etc.; b) biological conditions such as forests, crops, wild animals etc.; c) cultural factors such as land use, recreational activities, cultural status (life style, health and safety, population density), and man-made facilities and activities (structures, utility networks, waste disposal) etc.; d) ecological relationships such as eutrophication, salinization etc.; and e) modifications of regimes, such as alteration (change of habitats, change in drainage), land transformation, land alteration, resource renewal and extraction, and accidents (oil spills and leaks) etc. Once the results are ready, digital image processing techniques are also used to study the validity (and accuracy) of the results. Apart from the applications mentioned above, the techniques are also widely used in medical image processing and in extra-planetary missions.

 

10. Conclusions

 

Digital image processing is the process of examining images for the purpose of extracting the information hidden in them. Digital image processing can be divided into three levels: low level, medium level and high level. In this scheme, low-level image processing includes image acquisition, image pre-processing and image classification; medium-level processing includes compression and morphological processing of images; and high-level processing includes image reconstruction, object recognition, and image representation and description. The focus of this chapter is to equip readers with the fundamental concepts behind digital images and the different processing steps, and to provide solutions to the different problems that a user may encounter at the start of, or during, the processing chain.

 

Any image acquired digitally or photographically has a purpose. Images in remote sensing are acquired using sensors designed with the finest technologies. At the beginning, the analyst attempts to detect and remove the noise present in the image. Then (s)he detects, identifies, classifies and measures the physical and cultural objects and/or processes in the image, and further evaluates their patterns and the spatial relationships encountered in the image, all in a logical manner. In between these tasks, problems arise in the images: sometimes the data volume is too large and needs compression; sometimes there is a lot of redundancy in the data and the repeated information needs to be removed. The solutions to these and other problems lie in digital image processing, and here an attempt has been made to unfold some of them. They comprise a wide range of image processing techniques that have been developed to aid the interpretation of remotely sensed data and to extract as much information as possible from the images. The choice of specific techniques or algorithms depends on the objectives of the individual task. In this chapter, for example, three statistical approaches to satellite image classification (supervised, unsupervised and hybrid classification) have been examined, and it has also been shown how accuracy assessment of the resulting output should be done instead of blindly reporting the results.

 

11. References

  • Haralick, R. M. (1973). Glossary and index to remotely sensed pattern recognition concepts. Pattern Recognition, 5, 391-403.
  • Congalton, R. G. (1991). Considerations and techniques for assessing the accuracy of remotely sensed data. Remote Sensing of Environment, 37, pp. 35-46.
  • Sunar, F. and Özkan, C. (2000). Rectification of remotely sensed images with artificial neural network. International Archives of Photogrammetry and Remote Sensing, Amsterdam, The Netherlands, Vol. XXXIII, Part B3.
  • Gonzalez, R. C. and Woods, R. E. (2002). Digital Image Processing, Addison-Wesley.
  • Konecny, G. (2003). Geoinformation: remote sensing, photogrammetry and geographic information system, Taylor & Francis, UK.
  • Lillesand, T. M., Kiefer, R. W. and Chipman, J. W. (2008). Remote sensing and image interpretation, 6th ed., John Wiley & Sons, USA.
  • Tempfli, K., Kerle, N., Huurneman, G. C., and Janssen, L. L. F. (Eds.) (2009). Principles of remote sensing, ITC Educational Textbook Series 2, ITC, 4th ed., The Netherlands.
  • Campbell, J. B. and Wynne, R. H. (2011). Introduction to remote sensing, 5th ed., The Guilford Press, USA.
  • Estes, J. E. Aids to and Techniques of Image Interpretation, http://userpages.umbc.edu/~tbenja1/umbc7/santabar/vol1/lec2/2-4.html (accessed on 29 September, 2016).
  • SWAC (Satellites, Weather and Climate) project, www.uvm.edu/~swac/docs/mod4/land_features.ppt (accessed on 29 September, 2016).
  • BCPTS (Bartlane cable picture transmission system), https://en.wikipedia.org/wiki/Bartlane_cable_picture_transmission_system (accessed on 30 October, 2016).
  • Alexandris, N., Gupta, S., and Koutsias, N. (2017). Remote sensing of burned areas via PCA, Part 1; centering, scaling and EVD vs SVD. Open Geospatial Data, Software and Standards, 2:17.
  • Alexandris, N., Koutsias, N., and Gupta, S. (2017). Remote sensing of burned areas via PCA, Part 2; Spectral enhancement of burned areas via PCA. Part 2: SVD-based PCA using MODIS and Landsat data. Open Geospatial Data, Software and Standards, 2:21, pp. 1-26.