36 Association of Attributes
S. Gandhimathi
1. Introduction
Statistics deals with quantitative data only. The characteristics possessed by an individual item may be classified as (1) numerical and (2) descriptive. Characteristics that can be measured quantitatively are termed statistics of variables (numerical classification); for instance, height, weight, wages, length, income and expenditure. Characteristics that cannot be measured quantitatively are termed statistics of attributes (descriptive classification). For these, the observer can only note the presence or absence of an attribute. Phenomena such as blindness, dumbness, deafness, literacy and sickness cannot be measured directly; in such cases the presence or absence of the attribute is studied. For example, classifying items of production as defective or non-defective falls under statistics of attributes. Thus statistics of variables is based on numerical characteristics and statistics of attributes on descriptive characteristics.
Classification
A population can be divided into two classes on the basis of an attribute, say literacy. When one attribute is applied to the whole population, the population is divided into two classes: one in which the attribute is present and the other in which it is absent. For instance, when we study the literacy of a village, the population of the village is divided into two classes: those who are literate and those who are not.
If two attributes are studied, their combinations can be represented by combining the letters that represent the two attributes. Thus, if blindness is represented by A and deafness by B, with α and β denoting their absence, then AB represents blindness and deafness; Aβ represents blindness and absence of deafness; αB represents absence of blindness and presence of deafness; and αβ represents absence of both blindness and deafness.
Here it may be noted that a clear-cut definition of each attribute under study is absolutely essential, so that every item finds a place in only one class, with a definite demarcation between the two classes. For example, suppose the students of a B.Com. class are to be divided into two categories, tall and short. First we have to lay down a standard height; on the basis of this demarcation line, we categorise the students as tall or short: those below the standard height are short and those above it are tall.
Correlation and Association
Correlation can be measured between any two sets of phenomena that are capable of direct measurement. Association of attributes, by contrast, measures the degree of relationship between two phenomena whose sizes cannot be measured directly and which are studied through the presence or absence of a particular attribute. In statistics, two attributes are regarded as associated only if they appear together in a greater number of cases than would be expected if they were independent.
Uses of Terms and Notation
Descriptive characteristics can be classified by the presence or absence of attributes. If only one attribute is studied, the population is divided into two classes: one in which the attribute is present and the other in which it is not. In this way two distinct and mutually exclusive classes are formed, and such classification is termed classification by dichotomy or division by dichotomy. Generally, we may come across more than one classification. If a class is divided into more than two subclasses, such division or classification is known as manifold classification.
Positive and Negative Classes
The attributes may be positive or negative. A class in which the attribute is present is termed a positive class, and its contrary or opposite is known as a negative class. The positive class is denoted by capital letters A, B, C, etc. The negative class, in which the attribute is absent, is denoted by the small Greek letters α (alpha), β (beta), γ (gamma). In place of Greek letters, small letters a, b, c, etc. can also be used. For example, if A represents the attribute of literacy, then α represents the absence of literacy, that is, α = Not A.
If B represents criminality, then β represents absence of criminality, that is, β = Not B.
If C represents blindness, then γ represents absence of blindness, that is, γ = Not C.
N denotes the universe, that is, the number of members without any specification of attributes.
The class frequencies are expressed by putting the class symbols within brackets, for example (A) or (AB); N is not placed in brackets.
Number of Classes: The total number of classes comprising the various attributes is given by 3^n, where n is the number of attributes. If one attribute is studied, there will be 3^1 = 3 classes. Thus, if literacy is studied, with its presence represented by A, its absence by α and the total by N, there will be 3 classes, i.e., (A), (α) and N.
If two attributes are studied, the number of classes will be 3^2 = 9, i.e., (A), (α), (B), (β), (AB), (Aβ), (αB), (αβ) and N.
If three attributes are studied, the number of classes will be 3^3 = 27.
Ultimate Class Frequencies: Those classes which specify the attributes of the highest order are known as the ultimate classes and their frequencies are known as the ultimate class frequencies. The process comes to a stop only when we reach the frequencies of the highest order.
The number of ultimate classes is given by 2^n, where n stands for the number of attributes under study. Thus, for one attribute there will be 2^1 = 2 ultimate classes; for two attributes, 2^2 = 4; for three attributes, 2^3 = 8; and so on. For instance, when two attributes A and B are studied, (A) = (AB) + (Aβ) and (α) = (αB) + (αβ).
The frequencies of the classes can be expressed as a chart: N divides into (A) and (α); (A) in turn divides into (AB) and (Aβ), and (α) into (αB) and (αβ).
The classes which represent the presence of an attribute or attributes are called positive classes. The classes which represent the absence of an attribute or attributes are called negative classes. The classes in which one attribute is present and the other absent are called pairs of contraries. Thus, N, (A), (B), (AB), etc. are positive classes;
(α), (β), (αβ), etc. are negative classes;
(αB), (Aβ), etc. are pairs of contrary classes.
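Order of Classes: The order of a class depends upon the number of attributes specified in it. A class having one attribute is known as a class of the first order, a class having two attributes as a class of the second order, and so on. N, which denotes the number of members without any specification of attributes, is of zero order and is not placed in brackets.
The following table gives the class frequencies of all orders, and the total number of class frequencies, for three attributes A, B and C.
Order | Class frequencies | Number |
0 | N | 1 |
1 | (A), (α), (B), (β), (C), (γ) | 6 |
2 | (AB), (Aβ), (αB), (αβ), (AC), (Aγ), (αC), (αγ), (BC), (Bγ), (βC), (βγ) | 12 |
3 | (ABC), (ABγ), (AβC), (Aβγ), (αBC), (αBγ), (αβC), (αβγ) | 8 |
Total | | 27 |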
The classes of highest order are called the ultimate classes and their frequencies are called the Ultimate class frequencies.
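The counting rules above (3^n classes in all, 2^n ultimate classes) can be checked by enumeration. The following is only an illustrative sketch in Python; the function name `enumerate_classes` and the way symbols are passed in are assumptions made for this example, not part of the module.

```python
from itertools import product

def enumerate_classes(attributes):
    """Group every class by its order for dichotomised attributes.

    `attributes` is a list of (positive, negative) symbol pairs,
    e.g. [("A", "α"), ("B", "β")]  (hypothetical input format).
    """
    by_order = {}
    # For each attribute pick its positive symbol, its negative symbol,
    # or leave it unspecified (empty string): 3 choices per attribute.
    for choice in product(*[(pos, neg, "") for pos, neg in attributes]):
        symbols = "".join(choice)
        order = sum(1 for s in choice if s)  # number of attributes specified
        label = f"({symbols})" if symbols else "N"
        by_order.setdefault(order, []).append(label)
    return by_order

classes = enumerate_classes([("A", "α"), ("B", "β")])
for order in sorted(classes):
    print(order, classes[order])
```

For two attributes this lists one class of order zero (N), four of the first order and four of the second order, nine in all; the classes of the highest order are the ultimate classes.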
Relationship: One can set up various types of relationships between the frequencies of different orders. For example, the frequencies of the first order can be expressed in terms of the frequencies of the second order, which in turn can be expressed in terms of the third order. Again, for example, N can be divided into two classes, (A) and (α):
N = (A) + (α)
(A) = N – (α)
(α) = N – (A)
Similarly, if we take one more attribute into account, so that both A and B are studied, then:
(A) = (AB) + (Aβ)
(α) = (αB) + (αβ)
(B) = (AB) + (αB)
(β) = (Aβ) + (αβ)
N = (A) + (α) or (B) + (β)
Determination of Frequencies: There are certain general rules for determining the frequencies of the various classes. The total number of observations is equal to the sum of the positive and negative frequencies of the same class of the first order; for instance,
N = (A) + (α)
N = (B) + (β)
N = (C) + (γ)
Similarly, a frequency of the first order is the sum of the corresponding frequencies of the second order; for instance,
(B) = (AB) + (αB)
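The frequencies can also be found with the help of a nine-square table. Thus,
Attribute | A | α | Total |
B | (AB) | (αB) | (B) |
β | (Aβ) | (αβ) | (β) |
Total | (A) | (α) | N |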
If known values are substituted for the symbols in the square, then the remaining values can be found out by addition or subtraction.
Thus: (A) – (AB) = (Aβ)
(α) – (αB) = (αβ)
N – (A) = (α); N – (B) = (β); and so on.
The following illustrations will clarify.
Illustration: From the following ultimate class frequencies, find the frequencies of the positive and negative classes and the total number of observations:
(AB) = 9
(Aβ) = 14
(αB) = 4
(αβ) = 37
Solution: It is required to find (A), (B), (α), (β) and N.
(A) = (AB) + (Aβ) = 9 + 14 = 23
(B) = (AB) + (αB) = 9 + 4 = 13
(α) = (αB) + (αβ) = 4 + 37 = 41
(β) = (Aβ) + (αβ) = 14 + 37 = 51
N = (A) + (α) = 23 + 41 = 64
The total number of observations is equal to the sum of one positive and one negative frequency of the same class.
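Alternatively: The missing values of the classes in the above illustration can also be found with the help of the nine-square table; this is perhaps the more convenient method when there are two attributes.
Attribute | A | α | Total |
B | 9 | 4 | 13 |
β | 14 | 37 | 51 |
Total | 23 | 41 | 64 |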
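The additions used in this illustration can be written as a small routine. This is a minimal sketch, assuming the four ultimate class frequencies are known; the function and argument names are chosen for this example only.

```python
def nine_square(ab, a_beta, alpha_b, alpha_beta):
    """Derive the first-order frequencies and N from the four
    ultimate class frequencies (AB), (Aβ), (αB), (αβ)."""
    a = ab + a_beta               # (A) = (AB) + (Aβ)
    alpha = alpha_b + alpha_beta  # (α) = (αB) + (αβ)
    b = ab + alpha_b              # (B) = (AB) + (αB)
    beta = a_beta + alpha_beta    # (β) = (Aβ) + (αβ)
    n = a + alpha                 # N = (A) + (α)
    return {"(A)": a, "(α)": alpha, "(B)": b, "(β)": beta, "N": n}

# Figures from the illustration above.
table = nine_square(ab=9, a_beta=14, alpha_b=4, alpha_beta=37)
print(table)
```

With the figures of the illustration the routine gives (A) = 23, (α) = 41, (B) = 13, (β) = 51 and N = 64.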
Consistency of Data: Statistics of attributes are obtained by counting, and as such no class frequency can be negative. A class frequency can be positive or zero, but never negative. Observed data may be described as consistent if they do not conflict with one another. If any class frequency is negative, the given data are inconsistent. The test is simple: compute the ultimate class frequencies and verify whether any of them is negative. If none is negative, it is concluded that the given data are consistent.
Illustration: Test for consistency, given N = 100, (A) = 75, (B) = 60, (AB) = 15.
Solution:
(Aβ) = (A) – (AB) = 75 – 15 = 60
(αB) = (B) – (AB) = 60 – 15 = 45
(αβ) = N – (B) – (Aβ) = 100 – 60 – 60 = –20
(αβ) is negative; therefore, the given data are inconsistent.
Illustration: Is there any inconsistency in the data given below?
N = 60, (A) = 51, (B) = 32, (AB) = 25
Solution:
(Aβ) = (A) – (AB) = 51 – 25 = 26
(αB) = (B) – (AB) = 32 – 25 = 7
(αβ) = N – (B) – (Aβ) = 60 – 32 – 26 = 2
Thus, (AB) = 25, (Aβ) = 26, (αB) = 7, (αβ) = 2.
Since none of the ultimate class frequencies is negative, it can be concluded that the given data are consistent.
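The consistency test used in the two illustrations can be sketched as follows. The function names are assumptions made for this example; the arithmetic is the same subtraction used in the solutions, with (αβ) written as N – (A) – (B) + (AB), which equals N – (B) – (Aβ).

```python
def ultimate_frequencies(n, a, b, ab):
    """Ultimate class frequencies for two attributes,
    given N, (A), (B) and (AB)."""
    a_beta = a - ab              # (Aβ) = (A) – (AB)
    alpha_b = b - ab             # (αB) = (B) – (AB)
    alpha_beta = n - a - b + ab  # (αβ) = N – (B) – (Aβ)
    return {"(AB)": ab, "(Aβ)": a_beta, "(αB)": alpha_b, "(αβ)": alpha_beta}

def is_consistent(n, a, b, ab):
    """Data are consistent when no ultimate class frequency is negative."""
    return all(f >= 0 for f in ultimate_frequencies(n, a, b, ab).values())

print(is_consistent(100, 75, 60, 15))  # first illustration
print(is_consistent(60, 51, 32, 25))   # second illustration
```

The first call reports inconsistent data because (αβ) works out to –20; the second reports consistent data.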
Type of Association: Association has a technical meaning in Statistics. Two attributes are said to be associated if they appear together in a larger number of cases than would be expected if they were independent. There are three types of association.
Positive Association: When two attributes are present or absent together in the data and the actual frequency is more than the expected frequency, it is called positive association. For example, smoking and cancer, or literacy and employment, i.e., $(AB) > \frac{(A)(B)}{N}$ (Actual > Expected).
Negative Association (Disassociation): When the presence of one attribute goes with the absence of the other, and the actual frequency is less than the expected frequency, it is called negative association. For example, cleanliness and ill health, i.e., $(AB) < \frac{(A)(B)}{N}$ (Actual < Expected).
Independent Association: When there is no association between two attributes, that is, when they have no tendency to be present together and the presence of one attribute does not affect the other, the two attributes are said to be independent. The actual frequency is then equal to the expected frequency, i.e., $(AB) = \frac{(A)(B)}{N}$ (Actual = Expected).
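The three cases can be told apart by comparing the actual frequency (AB) with the expected frequency (A)(B)/N. The sketch below is illustrative only; the function names are assumptions, and exact fractions are used so that the test for equality is not upset by rounding.

```python
from fractions import Fraction

def expected_ab(n, a, b):
    """Expected frequency of (AB) if A and B were independent: (A)(B)/N."""
    return Fraction(a * b, n)

def association_type(n, a, b, ab):
    """Classify the association by comparing actual and expected (AB)."""
    expected = expected_ab(n, a, b)
    if ab > expected:
        return "positive"
    if ab < expected:
        return "negative"
    return "independent"

# Figures from the consistency illustration: N = 60, (A) = 51, (B) = 32, (AB) = 25.
print(float(expected_ab(60, 51, 32)))    # 27.2
print(association_type(60, 51, 32, 25))  # negative
```

Here the actual frequency 25 falls short of the expected 27.2, so the two attributes in that data set are negatively associated.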
Method of determining association: Association can be studied by any one of the following methods.
Comparison of observed and expected frequencies: In this method the actual (observed) frequencies are compared with the expected frequencies. By the rules of probability, the expected frequency of (AB) is $\frac{(A)(B)}{N}$ and that of (αB) is $\frac{(\alpha)(B)}{N}$. The expected frequency can be found by combination also. This will be clear from the following:
Illustration: Can vaccination be regarded as a preventive measure for Small Pox from the data given below?
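(i) Of 2,000 persons in a locality exposed to Small Pox, 450 in all were attacked.
(ii) Of the 2,000 persons, 365 had been vaccinated; of these only 50 were attacked.
Solution:
Let A denote vaccinated and α not vaccinated; let B denote attacked and β not attacked. Then N = 2,000, (A) = 365, (B) = 450 and (AB) = 50.
Expected frequency of (AB) = $\frac{(A)(B)}{N} = \frac{365 \times 450}{2000} = 82.1$ (approximately).
Since the actual frequency (50) is less than the expected frequency (82.1), vaccination and attack by Small Pox are negatively associated; vaccination can therefore be regarded as a preventive measure.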
Therefore, there is no association between sex and success in the examination.
Illustration:
A teacher examined 280 students in Economics and Auditing and found that 160 failed in Economics, 140 failed in Auditing and 80 failed in both the subjects. Is there any association between failure in Economics and Auditing?
Solution:
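Let A denote students who failed in Economics and B denote students who failed in Auditing. By putting the given information in the nine-square table, we can find the other frequencies.
Attribute | A | α | Total |
B | 80 | 60 | 140 |
β | 80 | 60 | 140 |
Total | 160 | 120 | 280 |
$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{(80 \times 60) - (80 \times 60)}{(80 \times 60) + (80 \times 60)} = \frac{4800 - 4800}{4800 + 4800} = 0$$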
Since Yule’s coefficient of association is zero, there is no association between failure in Economics and Auditing.
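The calculation of Yule's coefficient of association for this illustration can be sketched as below. The function name `yules_q` is an assumption made for this example; the formula is the one applied in the solution, Q = [(AB)(αβ) – (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)].

```python
def yules_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association from the four ultimate frequencies."""
    numerator = ab * alpha_beta - a_beta * alpha_b
    denominator = ab * alpha_beta + a_beta * alpha_b
    if denominator == 0:
        raise ValueError("Q is undefined when both products are zero")
    return numerator / denominator

# Economics and Auditing: N = 280, (A) = 160, (B) = 140, (AB) = 80.
n, a, b, ab = 280, 160, 140, 80
a_beta = a - ab              # failed in Economics only
alpha_b = b - ab             # failed in Auditing only
alpha_beta = n - a - b + ab  # failed in neither subject
print(yules_q(ab, a_beta, alpha_b, alpha_beta))  # 0.0
```

Q lies between –1 and +1: it is zero when the attributes are independent, positive for positive association and negative for negative association.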
2. Yule’s Coefficient of Colligation:
This is another method of measuring association, also given by Yule, known as the coefficient of colligation (Y). Q is more popular than Y.
Formula:
$$Y = \frac{1 - \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}{1 + \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}$$
From this coefficient, Yule’s coefficient of association can be obtained:
$$Q = \frac{2Y}{1 + Y^2}$$
Illustration: Do you find any association between the tempers of brothers and sisters from the following data?
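Temper of brothers and sisters | Number of cases |
Good-natured brothers and good-natured sisters | 1040 |
Good-natured brothers and sullen sisters | 160 |
Sullen brothers and good-natured sisters | 180 |
Sullen brothers and sullen sisters | 120 |
Solution: Let A denote good-natured brothers and B good-natured sisters, so that (AB) = 1040, (Aβ) = 160, (αB) = 180 and (αβ) = 120.
$$Q = \frac{(1040 \times 120) - (160 \times 180)}{(1040 \times 120) + (160 \times 180)} = \frac{96000}{153600} = 0.625$$
Since Q is positive, there is positive association between the tempers of brothers and sisters.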
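The coefficient of colligation for the same data, and its link with Q, can be checked numerically. This is a sketch under the usual definition Y = (1 – √k) / (1 + √k) with k = (Aβ)(αB) / [(AB)(αβ)], and Q = 2Y / (1 + Y²); the symbol Y and the function names are assumptions made for this example.

```python
from math import sqrt

def yules_y(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of colligation.

    Assumes (AB) and (αβ) are both non-zero, otherwise the ratio
    under the square root is undefined.
    """
    root = sqrt((a_beta * alpha_b) / (ab * alpha_beta))
    return (1 - root) / (1 + root)

def q_from_y(y):
    """Recover the coefficient of association from the coefficient of colligation."""
    return 2 * y / (1 + y * y)

# Tempers of brothers and sisters.
y = yules_y(ab=1040, a_beta=160, alpha_b=180, alpha_beta=120)
print(round(y, 3), round(q_from_y(y), 3))
```

The value of Q recovered from Y agrees with the 0.625 obtained directly from the frequencies, while Y itself is numerically smaller than Q.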
3. Pearson’s Coefficient of Contingency: We have so far discussed dichotomous classification. Classification of data can be either dichotomous or manifold. When the universe is divided into two groups, say “rich” and “not rich”, that is, “A” and “not A”, the classification is dichotomous. In manifold classification an attribute A is subdivided not into “A” and “not A” but into A1, A2, A3, etc. Similarly, another attribute, say B, can be subdivided into B1, B2, B3, etc. The frequencies falling within the different classes can be arranged in the form of a contingency table.
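Attribute B | Attribute A: A1 | A2 | A3 | Total |
B1 | (A1B1) | (A2B1) | (A3B1) | (B1) |
B2 | (A1B2) | (A2B2) | (A3B2) | (B2) |
B3 | (A1B3) | (A2B3) | (A3B3) | (B3) |
Total | (A1) | (A2) | (A3) | N |
(A1), (A2), (A3), etc. and (B1), (B2), (B3), etc. are the frequencies of the first order, and the frequencies of the various cells are the frequencies of the second order. The total of (A1), (A2), (A3), etc., or the total of (B1), (B2), (B3), etc., gives the grand total, i.e., N.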
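The coefficient of mean square contingency, C, according to Karl Pearson is:
$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}, \qquad \chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency of a cell and E is the frequency expected for that cell if the attributes were independent, i.e., the product of its row and column totals divided by N.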
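A contingency coefficient of this kind is usually computed from the chi-square statistic of the table, C = √(χ² / (N + χ²)), where χ² is the sum of (observed – expected)² / expected over all cells. The sketch below follows that common form; the function name is an assumption, and the 3 × 3 table is hypothetical data invented purely for illustration.

```python
from math import sqrt

def contingency_coefficient(table):
    """Pearson's coefficient of contingency for a table given as a list of rows.

    Assumes every row total and column total is positive, so that
    every expected frequency is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected cell frequency under independence.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return sqrt(chi_square / (n + chi_square))

# Hypothetical 3 x 3 table: rows B1..B3, columns A1..A3.
hypothetical = [
    [30, 20, 10],
    [20, 30, 20],
    [10, 20, 40],
]
print(round(contingency_coefficient(hypothetical), 3))
```

C is zero when the observed frequencies equal the expected ones, and it rises towards (but never reaches) 1 as the association becomes stronger.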
Partial Association
So far we have considered the association of A and B in the universe as a whole, without taking account of the other attributes in the universe. However, it is possible that the association between A and B is not a direct association, but the result of their association with a third attribute, say C. Thus, if A is positively associated with C and B is also positively associated with C, A may be found to be positively associated with B. This association between A and B is not direct; it is the effect of their association with the third attribute C. To find out whether the association between A and B is real and not merely due to their association with C, it is necessary to study the association of A and B in the sub-populations C and γ.
If A and B are associated in both the sub-populations C and γ, it would indicate that A and B are really associated with each other. The associations of A and B in the sub-populations are called partial associations, to distinguish them from the total association in the universe as a whole.
For instance, suppose an association is found between vaccination and freedom from attack by small pox, suggesting that vaccination prevents attack of small pox. This association may in reality be due to a third factor, viz., the economic condition of the people. People who are economically well off live in better conditions: open houses, a rich diet, hygienic surroundings, etc., so the possibility of attack by small pox is less. On the other hand, those who are poor live in filthy conditions, dirty surroundings, dirty houses, etc., and are liable to suffer more from diseases.
If we denote vaccination by A, prevention from small pox by B and economic condition by C, we may find that there is positive association between A and C and also between B and C. Hence, in order to arrive at correct conclusions, it is necessary that the population be divided on the basis of economic condition into two parts, Rich (C) and Poor (γ), and that the association between vaccination (A) and prevention from small pox (B) be ascertained in each sub-population. If this third attribute is ignored, it will give rise to misleading conclusions, technically called illusory association.
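How an illusory association can arise is easiest to see with numbers. The frequencies below are hypothetical, invented only for this sketch: within the rich and within the poor sub-populations vaccination (A) and prevention from small pox (B) are independent, yet in the combined population they appear positively associated because both go with being rich. Yule's Q is used as the measure; the function name is an assumption.

```python
def yules_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association from the four ultimate frequencies."""
    numerator = ab * alpha_beta - a_beta * alpha_b
    denominator = ab * alpha_beta + a_beta * alpha_b
    return numerator / denominator

# Hypothetical ultimate class frequencies within each sub-population.
rich = {"ab": 80, "a_beta": 20, "alpha_b": 40, "alpha_beta": 10}  # sub-population C
poor = {"ab": 10, "a_beta": 40, "alpha_b": 20, "alpha_beta": 80}  # sub-population γ

# Total association: add the two sub-populations cell by cell.
whole = {key: rich[key] + poor[key] for key in rich}

print("Q among the rich:", yules_q(**rich))
print("Q among the poor:", yules_q(**poor))
print("Q in the whole population:", round(yules_q(**whole), 3))
```

Both partial associations are zero while the total association is clearly positive, which is exactly the situation in which ignoring the third attribute misleads.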
Conclusion
To summarize, this module has covered the distinction between statistics of variables and statistics of attributes, the meaning of association of attributes, and the methods of determining it. Association has a technical meaning in Statistics: two attributes are said to be associated if they appear together in a larger number of cases than would be expected if they were independent. There are three types of association: positive association, negative association and independence. The methods of analysing association of attributes are comparison of observed and expected frequencies, Yule’s coefficient of association, Yule’s coefficient of colligation and Pearson’s coefficient of contingency.
Web links
- https://bikramjitgccbachd.files.wordpress.com/2013/…/10-theory-of-association-1.ppt
Reference Books
- Pillai Bhagavathi (1998), Statistics Theory and Practices, Chand Publications, New Delhi, pp. 1-888