23 Factor Analysis- A Tool for Synthesizing Geographical Phenomena
Prof Surendra Singh
1.0 Introduction:
In geographical research, a common problem is to synthesize a large number of attributes into a few variables in such a way that the common characteristics of those attributes help to solve a specific research problem. Factor analysis is a technique for analysing the complex multivariate structure of a data set: it converts a set of possibly correlated attributes into a smaller set of linearly synthesized variables called ‘latent variables’, with no attribute in the set treated as dependent on the others.
The synthesis of geographical attributes is modelled as a linear combination of potential factors plus an error term. By definition, suppose we have a set of n random attributes X1, X2, X3, …, Xn with means µ1, µ2, µ3, …, µn, each with m observations (the number of items in the data series, i.e. the size of the set), some unknown constants lij, and k unobserved random variables Fj, where i = 1, 2, 3, …, n and j = 1, 2, 3, …, k, subject to k < n. We then have
$$X_i - \mu_i = l_{i1} F_1 + l_{i2} F_2 + \dots + l_{ik} F_k + e_i$$
where ei is the error term with zero mean and finite variance.
In matrix form, the above equation is written as
$$X - \mu = LF + e$$
Here L is the matrix of factor loadings, and the product LF captures the common characteristics (communalities) of the observed attributes. For example, suppose there are 10 observed attributes of the social structure of 1000 sample households (m observations) of a particular caste residing in a region, such as sex, age, educational qualification, occupation and marital status of the head of household (HH). We wish to analyse the common social structure (common variance) of that caste and the effect of each factor (latent variables/roots, extracted through the LF matrix) on that structure. The process of extracting latent roots through factor analysis rests on the following bases.
- Factor analysis is a multivariate technique (the data contain no dependent variable, but the variables are interrelated) and a tool for synthesizing the essence of the variables into a factor on the basis of their communalities.
- Factor analysis finds latent variables on the basis of communality, h².
- Factor analysis is based on the relationships among variables.
- It provides the factors that constitute multi-variable characteristics. The structure of these factors is based on the ‘common variance’ and the systematic interdependencies among the variables of a set.
- Factor analysis reduces a large number of attributes to a few factors and interprets the common characteristics of these attributes, called observed variables.
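The communality h² named in the list can be written out explicitly. The following is a sketch under the usual orthogonal factor model, which is an assumption not stated in the text above: the factors Fj are mutually uncorrelated with unit variance, and the error terms are uncorrelated with the factors and with each other. The symbol ψi for the variance of ei is introduced here for illustration.

```latex
\begin{aligned}
h_i^{2} &= \sum_{j=1}^{k} l_{ij}^{2}, \\
\operatorname{Var}(X_i) &= \sum_{j=1}^{k} l_{ij}^{2} + \psi_i = h_i^{2} + \psi_i, \\
\operatorname{Cov}(X_i, X_{i'}) &= \sum_{j=1}^{k} l_{ij}\, l_{i'j} \qquad (i \neq i').
\end{aligned}
```

Under these assumptions, h² is the part of an attribute's variance shared with the common factors, and for standardized attributes it lies between 0 and 1.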
2.0 Methods of Inferring the Latent Roots in the Factor Analysis
2.1 Clusterisation
It is a simple method of inferring the common properties of the variables from the given symmetric correlation matrix. Variables with very high correlation values are selected, and the structure of a common factor is analysed on the basis of these values.
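The selection of highly correlated variables can be sketched in Python as follows. This is only an illustration: the function name `high_correlation_pairs`, the cut-off of 0.8 and the small demonstration matrix are assumptions introduced here, since the text does not specify what counts as a "very high" correlation.

```python
import numpy as np

def high_correlation_pairs(R, names, threshold=0.8):
    """List variable pairs whose absolute correlation reaches the threshold.

    The threshold is a hypothetical choice; the text gives no cut-off.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):          # upper triangle only (R is symmetric)
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(R[i, j])))
    # Strongest relationships first
    return sorted(pairs, key=lambda p: -abs(p[2]))

# Hypothetical 3 x 3 correlation matrix, for illustration only.
R_demo = np.array([[1.0, 0.9, 0.2],
                   [0.9, 1.0, 0.3],
                   [0.2, 0.3, 1.0]])
print(high_correlation_pairs(R_demo, ["X1", "X2", "X3"]))
```

The variables that appear together in the returned pairs are the candidates for one common factor.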
2.2 Centroid Method (Thurstone 1931): This method is simple, and latent roots can be extracted even manually by following the five steps given below. To illustrate the procedure, an example is given in the last part of this module.
Step-I:
- Generation of raw data on geographical attributes for the analysis of the common features of the structure in the data set, and conversion of the data to standardized form to make them scale free
- Preparation of the correlation matrix of attributes, ignoring signs
- Sum of each column, including the diagonal cell value, i.e., 1.00 (column sum, ∑T)
- Square root of the sum of the column sums, (∑∑T)^0.5
- Value of common centroid-I (latent roots), i.e., ∑T / (∑∑T)^0.5
Step-II: Preparation of the cross products of Factor-I (Q1 matrix)
Step-III: Preparation of the residual coefficient matrix (R1 matrix, obtained by subtracting Q1 from the correlation matrix)
Step-IV: Extraction of common centroid-II from the R1 matrix
Step-V: Repeat the same procedure until the residual variance is reduced to a minimum.
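The five steps can be sketched in Python with NumPy. This is a minimal illustration that follows the steps literally as listed, with signs ignored at every stage. The sign reflection used in fuller treatments of the centroid method is omitted, so all loadings come out non-negative. The function names and the 3 x 3 demonstration matrix are hypothetical.

```python
import numpy as np

def centroid_loadings(matrix):
    """Step I: column sums divided by the square root of the grand total."""
    col_sums = matrix.sum(axis=0)          # column sums, diagonal included
    grand_total = col_sums.sum()           # sum of the column sums
    if grand_total <= 1e-12:               # nothing left to extract
        return None
    return col_sums / np.sqrt(grand_total)

def centroid_method(R, n_factors=2):
    """Extract up to n_factors centroid factors from correlation matrix R."""
    current = np.abs(np.asarray(R, dtype=float))   # ignore signs
    factors = []
    for _ in range(n_factors):
        a = centroid_loadings(current)
        if a is None:
            break
        factors.append(a)
        Q = np.outer(a, a)                 # Step II: cross products of the factor
        current = np.abs(current - Q)      # Step III: residuals, signs ignored
    # Steps IV and V are the repeated passes of the loop above.
    return np.column_stack(factors), current

# Hypothetical 3 x 3 correlation matrix, for illustration only.
R_demo = np.array([[1.0, 0.8, 0.6],
                   [0.8, 1.0, 0.7],
                   [0.6, 0.7, 1.0]])
loadings, residual = centroid_method(R_demo, n_factors=2)
print(np.round(loadings, 3))
```

A useful check on the first factor is that its loadings sum to the square root of the grand total of the correlation matrix, which follows directly from the formula in Step-I.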
2.3 Principal Component Method (Hotelling 1933; Haggett 1965, Cole and Smith 1965, Barrey and Pal 1968 and Berry et al. 1966 in various fields of geographical research).
This method is based on a linear combination of variables: it measures the degree of collinearity among the variables and synthesizes the structure of a common factor in the form p1 = a1X1 + a2X2 + a3X3 + … + anXn, where p1 is principal component-I, X1, …, Xn are the variables and a1, …, an are coefficients that act as factor loadings. Because the process is lengthy, it is covered in a separate module entitled ‘Principal component Analysis…’
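The linear combination p1 = a1X1 + … + anXn can be illustrated with a short Python sketch. It assumes the common approach of taking the coefficients from the leading eigenvector of the correlation matrix. The function name and the random demonstration data are hypothetical, and the separate module may present the procedure differently.

```python
import numpy as np

def first_principal_component(X):
    """Return coefficients a, scores p1 and the leading eigenvalue."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each variable
    R = np.corrcoef(Z, rowvar=False)           # n x n correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    a = eigvecs[:, -1]                         # coefficients a1..an (unit length)
    p1 = Z @ a                                 # p1 = a1*Z1 + ... + an*Zn
    return a, p1, eigvals[-1]

# Random placeholder data: 35 observations of 4 variables (not Census data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(35, 4))
a, p1, top_eigenvalue = first_principal_component(X_demo)
print(np.round(a, 3), round(float(top_eigenvalue), 3))
```

In this sketch the variance of the scores p1 equals the leading eigenvalue, which is the amount of total variance carried by principal component-I.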
Fig.- 1: Procedure of Factor Analysis
3.0 Example:
The aim is to interpret the common characteristics of the total disabled population in seeing (TDPSEE) in India by extracting latent roots of common centroids, using the Centroid Method (Thurstone 1931), from State-wise Census 2001 data on 14 attributes of TDPSEE.
3.1 Raw Data Generation:
A set of 14 attributes (n = 14) giving age-wise percentage data of the disabled population in seeing (operational attributes X1, X2, X3, …, X14) for the State-wise entities of India (m = 35 observations, R1, R2, R3, …, R35) has been collected from the Census of India 2001 and coded as in the following tables.
Table-1: Name of 14 Variables of Disabled Population of India, 2001
Table-2: Name of the States Covered for the Present Analysis
Sl No | State (m) | Observation Code |
1 | Jammu & Kashmir | R1 |
2 | Himachal Pradesh | R2 |
3 | Punjab | R3 |
4 | Chandigarh | R4 |
5 | Uttaranchal | R5 |
6 | Haryana | R6 |
7 | Delhi | R7 |
8 | Rajasthan | R8 |
9 | Uttar Pradesh | R9 |
10 | Bihar | R10 |
11 | Sikkim | R11 |
12 | Arunachal Pradesh | R12 |
13 | Nagaland | R13 |
14 | Manipur | R14 |
15 | Mizoram | R15 |
16 | Tripura | R16 |
17 | Meghalaya | R17 |
18 | Assam | R18 |
19 | West Bengal | R19 |
20 | Jharkhand | R20 |
21 | Orissa | R21 |
22 | Chhattisgarh | R22 |
23 | Madhya Pradesh | R23 |
24 | Gujarat | R24 |
25 | Daman & Diu | R25 |
26 | Dadra & Nagar Haveli | R26 |
27 | Maharashtra | R27 |
28 | Andhra Pradesh | R28 |
29 | Karnataka | R29 |
30 | Goa | R30 |
31 | Lakshadweep | R31 |
32 | Kerala | R32 |
33 | Tamil Nadu | R33 |
34 | Pondicherry | R34 |
35 | Andaman & Nicobar Islands | R35 |
Table-3: Raw Data Matrix (mxn size) (Figures in %)
3.2 Conversion of Raw Data into Standard Scores (Z-score Matrix):
The magnitude of each attribute is transformed into a standard score to give a scale-free distribution with zero mean and unit standard deviation, using Z = (X − µ)/SD, where µ and SD are the mean and standard deviation of that attribute. The transformed values of every attribute are given in Table-4.
Table-4: Z score Matrix (mxn size)
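The transformation Z = (X − µ)/SD, and the correlation matrix built from the standard scores, can be sketched as follows. The data here are random placeholders of the same m x n shape (35 x 14), not the Census values, and the use of the population standard deviation (division by m) is an assumption.

```python
import numpy as np

def z_scores(X):
    """Standardize each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)          # population SD (ddof=0), an assumption
    return (X - mu) / sd

def correlation_matrix(Z):
    """n x n correlation matrix from an m x n matrix of standard scores."""
    m = Z.shape[0]
    return Z.T @ Z / m

# Placeholder raw data matrix: 35 observations x 14 attributes, in percent.
rng = np.random.default_rng(1)
X_raw = rng.uniform(0.0, 20.0, size=(35, 14))

Z = z_scores(X_raw)
R = correlation_matrix(Z)
print(Z.shape, R.shape)
```

Because every column of Z has zero mean and unit standard deviation, the product ZᵀZ/m has ones on its diagonal and matches the ordinary Pearson correlation matrix of the raw data.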
Table-5: Symmetric Correlation Matrix (nxn size)
Table-6: Factor Loading Cross- Product- I (CR-1) (nxn size)
Table-7: Residual Matrix Ignoring Signs (R1 = R − CR-1) (nxn size)
Table-8: Factor Loading Matrix (Two Common Factors)
4.0 Interpretation:
The characteristic features of the common factors of the processed data set are explained below.
- Common factor-I is strong: it accounts for more than two-thirds of the total variation (69.85%) of all 14 attributes considered in the present analysis, while common factor-II accounts for about one-third of the total variation (30.85%).
- The aged and very aged disabled population (above 60 years of age) contributes positively, and the disabled population of growing children and teenagers (5-20 years) contributes negatively, to common factor-I. This common factor has therefore been named ‘Disability of aged people’, which is prevalent in almost all the states of India.
- Common factor-II is negatively related to the share of the adult population that is disabled in seeing. It accounts for 30.85% of the total variation. It may therefore be named ‘disability in adult population’, an important feature of the age-wise distribution pattern of the disabled population in India.
- The last column of Table-8 shows that attributes X11, X12 and X13 carry the highest degree of common variance in the present factors.
- The scatter of these two common factors shows the emergence of two groups of attributes: disability among children and teenagers has a positive effect on factor-II and a negative effect on factor-I, while the remaining attributes form the opposite group in the distribution (Fig.-2).
Fig.-2: Scatter of Common Factors I and II for the Disabled Population in India
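The share of variation attributed to each factor and the common variance of each attribute can both be derived from a factor loading matrix. The sketch below assumes one common convention: the communality of an attribute is the sum of its squared loadings, and the strength of a factor is its column sum of squared loadings. The loadings shown are made-up numbers for four attributes, not the values of Table-8, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical loadings of four attributes on two common factors.
loadings_demo = np.array([[0.90, 0.10],
                          [0.80, -0.30],
                          [-0.60, 0.50],
                          [-0.70, 0.40]])

def summarize_loadings(L):
    """Communalities and factor shares from an n x k loading matrix."""
    L = np.asarray(L, dtype=float)
    communality = (L ** 2).sum(axis=1)        # h² for each attribute
    factor_ss = (L ** 2).sum(axis=0)          # sum of squared loadings per factor
    share_of_common = 100 * factor_ss / factor_ss.sum()   # % of common variance
    share_of_total = 100 * factor_ss / L.shape[0]         # % of total variance
    return communality, share_of_common, share_of_total

h2, common_pct, total_pct = summarize_loadings(loadings_demo)
print(np.round(h2, 2), np.round(common_pct, 2), np.round(total_pct, 2))
```

The shares of common variance always sum to 100%, whereas the shares of total variance sum to less than 100% whenever some variance is left unexplained by the extracted factors.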
5.0 Summary:
Factor analysis highlights the common features of a given set of n attributes for m observations. In the above example, common factor-I accounts for the largest share of the variation in the data set. Two common features of the disabled population of different age groups in India emerge: (i) the aged and very aged disabled population forms a stronger bond in the population distribution, and (ii) disability among children and teenagers is dependent on their socio-economic status.
References
- Thurstone, L. L. (1931): Multiple Factor Analysis, Psychological Review, vol. 38.
- Hotelling, H. (1933): Analysis of a Complex of Statistical Variables into Principal Components, Journal of Educational Psychology, vol. 24.
- Berry, B. J. L. and Marble, D. (eds., 1968): Spatial Analysis: A Reader in Statistical Geography, Prentice Hall, New Jersey.
- Haggett, P. (1965): Locational Analysis in Human Geography, Edward Arnold, London.
- Berry, B.J.L. et al. (1966): Essays on Commodity flows and the spatial structure of Indian Economy, Chicago University, Department of Geography, Paper no. 111: 190-203.
- www.en.wikipedia.org/wiki/factor_Analysis
- www.ats.ucla.edu/stat/output/factor1.htm