23 Factor Analysis- A Tool for Synthesizing Geographical Phenomena

Prof Surendra Singh

   1.0 Introduction:

 

In geographical research, a common problem is to synthesize a large number of attributes into a few variables in such a way that the common characteristics of those attributes help to solve a specific research problem. Factor analysis is a technique for analysing the complex multivariate structure of a data set: it converts a set of possibly correlated attributes into a smaller set of linearly synthesized variables called ‘latent variables’. All attributes of the set are treated as independent variables, that is, none of them is singled out as a dependent variable.

 

The synthesis of geographical attributes is modelled as a linear combination of potential factors plus an error term. Suppose we have a set of n random attributes X1, X2, X3, …, Xn with means µ1, µ2, µ3, …, µn, each with m observations (the number of items in the data series, i.e., the size of the set), some unknown constants lij, and k unobserved random variables Fj, where i = 1, 2, 3, …, n and j = 1, 2, 3, …, k, subject to k < n. Then we have

$$X_i - \mu_i = l_{i1}F_1 + l_{i2}F_2 + \dots + l_{ik}F_k + e_i$$

where ei = error term with zero mean and finite variance.

 

In matrix form, the above equation is written as

$$[X - \mu] = [LF]\,[F] + [e]$$

[LF] is the n × k matrix of factor loadings lij, which shows the common characteristics (communalities) of the observed attributes. For example, suppose there are 10 observed attributes of the social structure of 1000 sample households (m observations) of a particular caste residing in a region, such as the sex, age, educational qualification, occupation and marital status of the head of the household (HH). We wish to analyse the common social structure (common variance) of that caste and the effect of each factor (the latent variables or roots extracted through the LF matrix) on the social structure. The process of extracting latent roots through factor analysis rests on the following bases.

 

  1. Factor analysis is a multivariate technique (the data contain no dependent variable, but the variables are interrelated) and a tool for synthesizing the essence of the variables into a factor on the basis of their communalities.
  2. Factor analysis finds latent variables on the basis of communality, h².
  3. Factor analysis is based on the relationships among variables.
  4. It provides the factors that constitute multi-variable characteristics. The structure of these factors is based on the ‘common variance and systematic interdependencies among the variables of a set’.
  5. Factor analysis reduces a large number of attributes to a few factors and interprets the common characteristics of these attributes, which are called observed variables.
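
The linear factor model described in the introduction can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the loading matrix, means and error variances are hypothetical values, and the factors are assumed to be uncorrelated with zero mean and unit variance (an assumption that the text above does not state). Under those assumptions the covariance of the attributes equals [LF][LF]ᵀ plus the diagonal matrix of error variances, and the communality h² of each attribute is the row sum of its squared loadings.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable

n, k, m = 4, 2, 20000               # attributes, factors (k < n), observations

# Hypothetical loading matrix [LF] (n x k), attribute means and error variances.
loadings = np.array([[0.9, 0.0],
                     [0.8, 0.2],
                     [0.1, 0.7],
                     [0.0, 0.6]])
mu = np.array([10.0, 20.0, 30.0, 40.0])
error_var = np.array([0.19, 0.32, 0.50, 0.64])

# Assumed: unobserved factors F_j are uncorrelated, zero mean, unit variance.
factors = rng.standard_normal((k, m))
# Error terms e_i: zero mean, finite variance, independent of the factors.
errors = rng.standard_normal((n, m)) * np.sqrt(error_var)[:, None]

# X_i = mu_i + sum_j l_ij * F_j + e_i, with one column per observation.
x = mu[:, None] + loadings @ factors + errors

# Covariance implied by the model versus covariance of the simulated data.
implied_cov = loadings @ loadings.T + np.diag(error_var)
sample_cov = np.cov(x)              # rows are treated as variables

# Communality h^2: the part of each attribute's variance due to common factors.
communality = (loadings ** 2).sum(axis=1)
```

With a large m, `sample_cov` should lie close to `implied_cov`, which is why the correlation structure of the observed attributes carries information about the unobserved loadings.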

 

 2.0 Methods of Inferring the Latent Roots in the Factor Analysis

 

2.1  Clusterisation

It is a simple method of inferring the common properties of the variables from the given symmetric correlation matrix. Variables with very high correlation values are selected, and the structure of a common factor is analysed on the basis of these values.

 

2.2  Centroid Method (Thurstone 1931): It is simple enough that latent roots can be extracted even by hand. The procedure has five steps, as given below. To illustrate the procedure, a worked example is given later in this module.

 

 

Step-I:

  1. Generate the raw data of the geographical attributes whose common features are to be analysed, and convert them to standardized form so that they are scale-free.
  2. Prepare the correlation matrix of the attributes, ignoring signs.
  3. Sum each column, including the diagonal cell value of 1.00 (column sum, ∑T).
  4. Take the square root of the sum of the column sums, (∑∑T)^0.5.
  5. Compute the value of common centroid-I (latent roots) for each attribute as ∑T / (∑∑T)^0.5.

 

Step-II: Prepare the cross-products of Factor-I (Q1 matrix).

 

Step-III: Prepare the residual coefficient matrix (R1 matrix) by subtracting Q1 from the correlation matrix.

 

Step-IV: Extract common centroid-II from the R1 matrix.

 

Step-V: Repeat the same procedure until the residual variance is reduced to a minimum.
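
The five steps above can be sketched in a few lines of Python with NumPy. This is a minimal sketch of the simplified procedure as listed here (unit diagonal, signs ignored), not a full implementation of Thurstone's method: it does not restore the signs of the loadings, and the function names and the 3 × 3 correlation matrix are hypothetical, chosen only for illustration.

```python
import numpy as np

def centroid_loadings(matrix):
    """Step-I (3 to 5): column sums divided by the root of the grand total."""
    a = np.abs(np.asarray(matrix, dtype=float))   # signs are ignored
    col_sums = a.sum(axis=0)                      # column sums, diagonal included
    grand_total = col_sums.sum()                  # sum of the column sums
    if grand_total == 0:                          # nothing left to extract
        return np.zeros_like(col_sums)
    return col_sums / np.sqrt(grand_total)

def extract_centroid_factors(corr, n_factors=2):
    """Repeat Steps I to III, returning the loadings and the last signed residual."""
    current = np.abs(np.asarray(corr, dtype=float))
    columns = []
    for _ in range(n_factors):
        f = centroid_loadings(current)
        columns.append(f)
        q = np.outer(f, f)                        # Step-II: cross-product matrix
        residual = current - q                    # Step-III: residual matrix
        current = np.abs(residual)                # Step-IV works on |residual|
    return np.column_stack(columns), residual

# Hypothetical correlation matrix of three attributes.
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.2],
                 [0.3, 0.2, 1.0]])
loadings, residual = extract_centroid_factors(corr, n_factors=1)
```

After one factor has been removed, the signed residual matrix sums to zero, which is why its signs have to be ignored (or reflected) before the next centroid can be extracted.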

 

 

2.3 Principal Component Method (Hotelling 1933; applied in various fields of geographical research by Haggett 1965, Cole and Smith 1965, Barrey and Pal 1968, and Berry et al. 1966).

 

This method is based on a linear combination of the variables: it measures the degree of collinearity among the variables and synthesizes the structure of a common factor in the form p1 = a1X1 + a2X2 + a3X3 + … + anXn, where p1 is principal component-I, X1, …, Xn are the variables and a1, …, an are the coefficients (factor loadings). Because the procedure is lengthy, it is presented in a separate module entitled ‘Principal component Analysis…’.
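
As a brief illustration of the linear combination p1 = a1X1 + … + anXn, the sketch below takes the coefficients of the first principal component from the leading eigenvector of a correlation matrix. This is one common way of obtaining them and is an assumption here, since the procedure itself is left to the separate module; the function name and the 2 × 2 matrix are hypothetical.

```python
import numpy as np

def first_principal_component(corr):
    """Return the largest eigenvalue and its unit-length coefficient vector."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(corr, dtype=float))
    top = np.argmax(eigvals)            # index of the largest eigenvalue
    a = eigvecs[:, top]                 # coefficients a1, ..., an
    if a.sum() < 0:                     # eigenvector sign is arbitrary; fix it
        a = -a
    return eigvals[top], a

# Hypothetical correlation matrix of two standardized variables.
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
eigenvalue, coeffs = first_principal_component(corr)

# Scores of principal component-I for standardized data Z (m x n): p1 = Z @ coeffs
```

Multiplying the coefficients by the square root of the eigenvalue gives the correlations between each variable and the component, which is the scaling often reported as loadings.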

Fig.- 1: Procedure of Factor Analysis

 

3.0 Example:

 

The aim is to interpret the common characteristics of the Total Disabled Population in Seeing (TDPSEE) in India by extracting the latent roots of the common centroids, using the Centroid Method (Thurstone 1931), from State-wise Census 2001 data on 14 attributes of TDPSEE.

 

3.1  Raw Data Generation:

 

A set of 14 attributes (n = 14; operational attributes X1, X2, X3, …, X14) giving age-wise percentage data on the population disabled in seeing has been collected from the Census of India 2001 for the State-wise entities of India (m = 35; observations R1, R2, R3, …, R35), as coded in the following tables.

 

Table-1: Name of 14 Variables of Disabled Population of India, 2001

 

 

 

Table-2: Name of the States Covered for  the Present Analysis

Sl No  State (m)  Observation Code
1 Jammu & Kashmir R1
2  Himachal Pradesh R2
3  Punjab R3
4  Chandigarh R4
5  Uttaranchal R5
6  Haryana R6
7  Delhi R7
8  Rajasthan R8
9  Uttar Pradesh R9
10  Bihar R10
11  Sikkim R11
12  Arunachal Pradesh R12
13  Nagaland R13
14  Manipur R14
15  Mizoram R15
16  Tripura R16
17  Meghalaya R17
18  Assam R18
19  West Bengal R19
20  Jharkhand R20
21  Orissa R21
22  Chhattisgarh R22
23  Madhya Pradesh R23
24  Gujarat R24
25  Daman & Diu R25
26  Dadra & Nagar Haveli R26
27  Maharashtra R27
28  Andhra Pradesh R28
29  Karnataka R29
30  Goa R30
31  Lakshadweep R31
32  Kerala R32
33  Tamil Nadu R33
34  Pondicherry R34
35  Andaman & Nicobar Islands R35

 

Table-3: Raw Data Matrix (mxn size)  (Figures in %)

 

 

3.2  Conversion of Raw Data into Standard Scores (Z-score Matrix):

 

The magnitude of each attribute is transformed into a standard score to obtain a scale-free distribution with zero mean and unit standard deviation. The transformation is Z = (X − µ)/SD, where µ and SD are the mean and standard deviation of the attribute. The transformed values of every attribute are given in the following table.
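
The standardization Z = (X − µ)/SD, and the correlation matrix that follows from it, can be sketched as below. The raw values are hypothetical (they are not taken from Table-3), and the population standard deviation (divisor m) is assumed, since the text does not say which divisor was used.

```python
import numpy as np

def z_scores(raw):
    """Standardize each column (attribute) of an m x n raw data matrix."""
    raw = np.asarray(raw, dtype=float)
    mean = raw.mean(axis=0)             # column means
    sd = raw.std(axis=0)                # population SD (ddof=0), an assumption
    return (raw - mean) / sd            # undefined for a constant column (SD = 0)

def correlation_from_z(z):
    """Correlation matrix (n x n) as the averaged cross-product of z-scores."""
    return z.T @ z / z.shape[0]

# Hypothetical raw data: 4 observations (rows) of 2 attributes (columns), in %.
raw = np.array([[12.0, 30.0],
                [15.0, 28.0],
                [ 9.0, 35.0],
                [14.0, 27.0]])
z = z_scores(raw)
corr = correlation_from_z(z)
```

Because every column of `z` has zero mean and unit standard deviation, `corr` has 1.00 on its diagonal and is symmetric, which is the form required by the centroid method.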

 

 

Table-4: Z score Matrix (mxn size)

 

 

 

Table-5: Symmetric Correlation Matrix (nxn size)

 

 

Table-6: Factor Loading Cross- Product- I (CR-1) (nxn size)

 

 

Table-7: Residual Matrix Ignoring Signs (R1 = R − CR-1) (nxn size)

 

Table-8: Factor Loading Matrix (Two Common Factors)

 

 

 

4.0 Interpretation:

 

The characteristic features of the common factors of the processed data set are explained below.

 

  1. Common factor-I is strong: it accounts for more than two-thirds of the total variation (69.85%) of all 14 attributes considered in the present analysis, while common factor-II accounts for about one-third of the total variation (30.85%).
  2. The aged and very aged disabled population (above 60 years of age) contributes positively, and the disabled population of growing children and teenagers (5-20 years) contributes negatively, to common factor-I. This common factor has therefore been named ‘disability of aged people’, which is prevalent in almost all the states of India.
  3. Common factor-II is negatively related to the share of adults in the population disabled in seeing. It accounts for 30.85% of the total variation. It may therefore be named ‘disability in adult population’, an important feature of the age-wise distribution pattern of the disabled population in India.
  4. The last column of Table-8 shows that attributes X11, X12 and X13 are the most important, as they carry the highest degree of common variance in the present factors.
  5. The scatter of these two common factors shows the emergence of two groups of attributes: the disability of children and teenagers has a positive effect on factor-II and a negative effect on factor-I, while the remaining attributes form the opposite group in the distribution (Fig.-2).

Fig.-2: Scatter of Common Factors I and II for the Disabled Population in India
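
The quantities used in this interpretation, namely the communality of each attribute and the share of variation attributed to each common factor, can be derived from a factor loading matrix as sketched below. The loadings are hypothetical (they are not the values of Table-8), and expressing each factor's share as a percentage of the total extracted common variance is an assumption about how the percentages above were obtained.

```python
import numpy as np

def summarize_loadings(loadings):
    """Return communalities (per attribute) and % share (per factor)."""
    lf = np.asarray(loadings, dtype=float)
    communality = (lf ** 2).sum(axis=1)     # h^2: row sums of squared loadings
    factor_ss = (lf ** 2).sum(axis=0)       # column sums of squared loadings
    share = 100.0 * factor_ss / factor_ss.sum()
    return communality, share

# Hypothetical loadings of three attributes on two common factors.
loadings = np.array([[0.8,  0.1],
                     [0.6, -0.3],
                     [0.5,  0.4]])
communality, share = summarize_loadings(loadings)
```

Attributes with the largest values in `communality` are those most fully represented by the extracted factors, which is how the last column of a loading table is usually read.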

 

5.0 Summary:

 

 Factor analysis highlights the common features of a given set of n attributes for m observations. In the above example, common factor-I accounts for the largest share of the variation in the data set. Two common features of the disabled population of different age groups in India emerge: (i) the aged and very aged disabled population forms a stronger bond in the population distribution, and (ii) the disability of children and teenagers is dependent on their socio-economic status.

 

References

  • Thurstone, L. L. (1931): Multiple Factor Analysis, Psychological Review, vol. 38.
  • Hotelling, H. (1933): Analysis of a Complex of Statistical Variables into Principal Components, Journal of Educational Psychology, vol. 24.
  • Berry, B. J. L. and Marble, D. F. (eds., 1968): Spatial Analysis: A Reader in Statistical Geography, Prentice Hall, New Jersey.
  • Haggett, P. (1965): Locational Analysis in Human Geography, Edward Arnold, London.
  • Berry, B. J. L. et al. (1966): Essays on Commodity Flows and the Spatial Structure of the Indian Economy, University of Chicago, Department of Geography, Research Paper No. 111: 190-203.
  • www.en.wikipedia.org/wiki/factor_Analysis
  • www.ats.ucla.edu/stat/output/factor1.htm