2 Data and its Type

Dr. Harmanpreet Singh Kapoor

epgp books

 

 

    Learning Objectives

  • Introduction
  • Types of Data
  • Comparison of qualitative and quantitative data
  • Variable
  • Summary
  • Suggested Readings

    1. Learning Objectives

 

In this module, an attempt has been made to give brief and relevant information about the topic, with examples. The module helps in understanding the various types of data and their segmentation. Self-check questions are included to give an in-depth knowledge of the topic.

 

2. Introduction

 

Data is the plural form of the word ‘datum’. Data are a collection of items, in either qualitative or quantitative form, that carry the relevant information about the objective of a study. The data are analyzed further to extract that information.

 

Data are collected from the sources in which one is interested. In many fields, one may not have direct access to the objects of study because of constraints of time and money, for example in a study of the factors affecting the environment. Such a study requires large instruments and considerable manpower to collect values, but other departments, such as those working in meteorological sciences and remote sensing, also deal with the same objects. Thus one can use the data published by these departments for further analysis. A source of this kind, from which one obtains data already collected by someone else, is called a secondary source. If one has direct access to the sources and the information, the data are said to be collected from a primary source. So data are collected from two sources:

 

(a) Primary Source

(b) Secondary Source

 

Information is thus collected in the form of data from one of the above sources and is then used for analysis. Whether a characteristic of an item is observed in quantitative or in qualitative form depends solely on the nature of that characteristic. For example, the height, weight and age of a person are expressed in numbers. These are examples of quantitative form, and the variables used to quantify such values are called quantitative variables.

 

On the other hand, some characteristics, such as a person's religion, designation, gender or the severity of a disease, are difficult to express as numbers. One can assign numbers to them for recording purposes, but these numbers have no meaning as values. For example, to record the sex of a person in a government survey, one can use ‘1’ for male and ‘2’ for female. Similarly, to record the health status of patients suffering from a particular disease, one can code good as ‘1’, mild as ‘2’ and severe as ‘3’. These are examples of qualitative data, which are further evaluated to draw conclusions.

 

The main point to keep in mind is that the values ‘1’, ‘2’ and ‘3’ assigned to different categories are only representative labels. They do not mean that one value is double or half of another. So one cannot apply arithmetic operations, such as taking an average, to these codes to make a decision about the study.
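To make this point concrete, here is a minimal Python sketch, assuming the coding 1 = good, 2 = mild, 3 = severe from the example above and a hypothetical list of recorded patient codes. Averaging the codes runs without error but produces a number with no interpretation, whereas counting how often each category occurs is a valid summary of qualitative data.

```python
from collections import Counter

# Codes from the text: 1 = good, 2 = mild, 3 = severe.
# They are labels for categories, not measurements.
LABELS = {1: "good", 2: "mild", 3: "severe"}

# Hypothetical health-status codes recorded for six patients.
codes = [1, 3, 2, 3, 1, 3]

# Averaging the codes runs, but the result does not describe any
# patient: "severe" (3) is not three times as much as "good" (1).
meaningless_mean = sum(codes) / len(codes)

# A valid summary of qualitative data: how often each category occurs.
frequencies = Counter(LABELS[c] for c in codes)

print(f"mean of codes (not meaningful): {meaningless_mean:.2f}")
for label in LABELS.values():
    print(f"{label}: {frequencies[label]}")
```

This prints a "mean" of 2.17, which corresponds to no category, alongside the meaningful counts good: 2, mild: 1, severe: 3.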

 

In the coming sections, the structure of data and its types are discussed in detail.

 

Self-Check Exercise

 

Question: Which source is used to study the factors affecting environmental pollution in the country?

 

Answer: From the objective, one can see that the area under study is very vast, so it is difficult for one person to gather the data with limited time and money. However, data related to the study can be found in reports published by many government departments. Hence a secondary source is generally used in such cases.

 

Question: Is it true to say that a secondary source of data is always reliable for a study in environmental sciences?

 

Answer: Several questions must be considered before commenting on this statement. A few of them are: Does the secondary data come from a renowned agency or government institution? Is it the latest data needed for the study? Is it beneficial to use secondary data at all? If the answer to each of these questions is yes, the statement holds for that study, but it is not true in general.

 

3. Types of data

 

In this section, an attempt has been made to explain the types of data. After reading this section, one should have a detailed understanding of them.

 

The following chart explains the different segments of data and their relation to one another.

 

Figure 1

 

From the above chart, one can observe that data are mainly segmented into two forms, and these forms are further divided into various segments. One of the main branches is qualitative data, which is further divided into attributes, nominal data and ordinal data. Similarly, quantitative data include continuous data and discrete data.
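The hierarchy described in the paragraph above can also be written down as a small lookup table. The following Python sketch is only illustrative; the names `DATA_TYPES` and `main_form` are hypothetical and simply encode the two main forms and their sub-types as given in the text.

```python
# Classification of data as described in the text:
# main form -> its sub-types.
DATA_TYPES = {
    "qualitative": ["attribute", "nominal", "ordinal"],
    "quantitative": ["discrete", "continuous"],
}


def main_form(subtype):
    """Return the main form ('qualitative' or 'quantitative') of a sub-type."""
    for form, subtypes in DATA_TYPES.items():
        if subtype in subtypes:
            return form
    raise ValueError(f"unknown data type: {subtype!r}")


print(main_form("ordinal"))     # qualitative
print(main_form("continuous"))  # quantitative
```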

 

3.1 Qualitative Data

 

Qualitative data deal with information about the characteristics or qualities of an object under study that cannot be measured, for example, the color of the skin, or the shape and color of the eyes. Qualitative data are further divided into the following three types:

 

(a) Attributes

(b) Nominal

(c) Ordinal

 

3.1 (a) Attributes

 

Attributes are a type of qualitative data that have only two categories, for example, male/female, yes/no, dead/alive. They are called attribute data because, with only two categories, one can say whether or not an item possesses the attribute (a characteristic related to the objective).

 

For example, in a census, items such as male/female (excluding the third sex), employed/unemployed and educated/illiterate are considered attributes.
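Because an attribute has exactly two categories, it can be recorded as true/false. The minimal Python sketch below uses hypothetical census responses for the attribute employed/unemployed and counts the items that possess the attribute, together with their proportion.

```python
# Hypothetical census responses for the attribute "employed":
# True = employed, False = unemployed. An attribute has exactly two
# categories, so a boolean records it completely.
employed = [True, False, True, True, False, True, False, True]

# sum() treats True as 1 and False as 0, so it counts the items
# that possess the attribute.
count_with_attribute = sum(employed)
proportion = count_with_attribute / len(employed)

print(f"employed: {count_with_attribute} of {len(employed)} ({proportion:.1%})")
```

With these invented responses, the sketch prints `employed: 5 of 8 (62.5%)`.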

 

3.1 (b) Nominal

 

Nominal data are data that have more than two categories, and these categories cannot be ordered.

For example, color of hair (black, brown, blonde etc.), marital status (married, unmarried, divorced, separated) and nature of disaster (fire, theft, accident, earthquake etc.). In these examples, one can observe that there are more than two categories and that these categories are unordered. That is, one cannot rank black hair above or below brown hair. Likewise, the categories of marital status, or those of the nature of disaster, cannot be ranked against one another.

 

3.1 (c) Ordinal

 

Ordinal data consist of observations that can be ordered in terms of their characteristics.

For example, tidiness among students has the categories messy, fairly tidy or very neat; body build has the categories fat, medium-sized or thin; and agreement has the categories strongly agree, agree, neither agree nor disagree, disagree and strongly disagree. Observations on tidiness can be ordered from a low to a high level of tidiness. Similarly, responses on body type and agreement can be ordered based on the available categories.
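The difference between ordinal and nominal data can be illustrated with a minimal Python sketch. It assumes the agreement scale from the example above and a hypothetical set of seven responses. The order of the categories is stated explicitly as a list, so the responses can be sorted and a middle (median) response identified; the gaps between levels are not assumed to be equal.

```python
# The agreement scale from the text, from lowest to highest agreement.
# Only the position in this list matters; no numerical distance between
# levels is implied.
SCALE = ["strongly disagree", "disagree", "neither agree nor disagree",
         "agree", "strongly agree"]

# Hypothetical responses from seven people.
responses = ["agree", "disagree", "strongly agree", "agree",
             "neither agree nor disagree", "strongly disagree", "agree"]

# Ordinal data can be sorted by the rank of each category...
ordered = sorted(responses, key=SCALE.index)

# ...so, with an odd number of responses, the middle one is the median.
median_response = ordered[len(ordered) // 2]

print(ordered)
print("median response:", median_response)
```

For nominal data such as color of hair, no such ordering list exists, so sorting and a median would be meaningless; only frequencies of the categories can be used.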

 

Self-Check Exercise

In the following questions, state the type of data and give a reason.

Question: Which one of the following subjects do you learn here?

(a) Mathematics

(b) Physics

(c) Statistics

 

Answer: This question is an example of qualitative data, and it is further categorized as nominal data, because the subjects cannot be ordered.

 

In the previous question, the subjects are treated as nominal data. However, depending on the question, they can be ordered.

 

Question: Which of the following subjects do you like the most?

(a) Mathematics

(b) Physics

(c) Statistics

Answer: It is an example of qualitative data, specifically ordinal data, as one can order the subjects according to one's liking.

 

Question: How would you rate your learning technique?

Answer: It is an example of qualitative data and specifically ordinal data as the categories are ordered from poor to excellent.

 

Question: Did you study statistics in your college?

Answer: Qualitative data, specifically an attribute. In this case, we simply say whether or not the item has this attribute.

 

Question: How would you rate your learning techniques? (1 = excellent, 5 = poor)

Answer: It is an example of qualitative data, specifically ordinal data, as the grades 1 to 5 are ordered. The numbers are only labels for the ordered categories.

 

Keeping the above examples in mind will help in understanding the concept in depth.

In the next sub-section, quantitative data is explained with examples.

 

3.2 Quantitative Data

 

Quantitative data consist of numbers (e.g. 1, 0.8, -3.7, ¾, …) or quantities (e.g. 1.2 kg, 155 cm, …). Most books refer to numerical data as quantitative data.

 

Quantitative data are further segmented into two types:

(a) Discrete quantitative data

(b) Continuous quantitative data

 

3.2 (a) Discrete quantitative data

 

Discrete quantitative data can take only particular values, such as whole numbers*. For example, in a study of cancer patients, the number of patients is counted in whole numbers, that is 0, 1, 2, 3, …; one cannot count 2.3 or 3.5 persons. Whenever observations can take whole-number values only, they are examples of discrete quantitative data.

 

*Whole numbers are the numbers that start from 0 and continue without end, that is 0, 1, 2, 3, …
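The rule that a count must be a whole number can be expressed as a simple check. The following minimal Python sketch is illustrative only: the function name `is_valid_count` and the candidate values are hypothetical.

```python
def is_valid_count(value):
    """Return True if value is a whole number (0, 1, 2, ...).

    Counts such as 'number of patients' can only take such values.
    bool is excluded explicitly because Python treats True/False as ints.
    """
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# Hypothetical values someone might try to record as "number of patients".
candidates = [0, 3, 12, 2.3, 3.5, -1]

for v in candidates:
    status = "valid count" if is_valid_count(v) else "not a valid count"
    print(v, status)
```

Here 0, 3 and 12 are accepted, while 2.3, 3.5 and -1 are rejected, matching the statement that one cannot count 2.3 or 3.5 persons.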

 

3.2 (b) Continuous quantitative data

 

The other form of quantitative data is continuous data, which can take any real value (not only whole numbers) within a range, including negative values where these are meaningful. For example, the temperature of a room can be negative, positive or zero. The duration of an event such as an earthquake is measured in minutes and seconds, for example 5½ min, 2 min etc. Continuous quantitative data typically arise when measuring quantities such as height (cm) or time (sec).

 

Self-Check Exercise

 

Question: State the type of data in each of the following:

(a) Weight of a student

(b) Place of birth

(c) Number of claims arising from a natural disaster

(d) Nature of loss due to a natural disaster

(e) Age in completed years

(f) Loss incurred due to a flood

 

Answer:

(a) Continuous data (Quantitative data)

(b) Nominal data (Qualitative data)

(c) Discrete data (Quantitative data)

(d) Ordinal data (Qualitative data), as the loss can be small, medium or large

(e) Discrete data (Quantitative data)

(f) Continuous data (Quantitative data)

 

In this section, various examples of data have been discussed so that one can easily understand the differences between the types of data.

 

One now has an understanding of qualitative data, quantitative data and their types. Another important topic related to this module is the variable, which is taken up in Section 5.

 

4. Comparison of qualitative data and quantitative data

 

In this section, an attempt has been made to explain the difference between qualitative data and quantitative data through a real-life situation.

 

Consider a study to quantify the loss caused by a natural disaster in a state. How should one approach the total amount of loss? There are many types of loss, such as financial loss and loss of human life. Financial loss can be quantified with suitable instruments or tools, whereas loss of human life cannot be measured in monetary terms. So one needs a questionnaire or other paperwork to collect information about the loss. The draft questionnaire below shows which data fall under which form:

 

Name:

Age (in years):

Sex:

Occupation:

Type of financial loss:

Estimated financial loss:

Details………………..

No. of family members injured:

No. of family members who died:

 

From the above questionnaire, one can observe that the data take both qualitative and quantitative forms. Considering each item in turn:

Age (in years): Discrete quantitative data

Sex: Attributes (Qualitative data)

Occupation: Nominal qualitative data (Farmer, shopkeeper, labourer)

Type of financial loss: Ordinal qualitative data

Estimated financial loss: Continuous quantitative data

No. of family members injured: Discrete quantitative data

No. of family members who died: Discrete quantitative data
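When designing a questionnaire, it can help to record, for every item, which type of data it will produce. The minimal Python sketch below encodes the classification just given; the item labels are abbreviated and the names `ITEM_TYPES` and `items_of_form` are hypothetical. Listing the items by main form shows that the one questionnaire mixes both forms of data.

```python
# Each questionnaire item (labels abbreviated) and the type of data it
# yields, as classified in the text: (main form, sub-type).
ITEM_TYPES = {
    "Age (in years)": ("quantitative", "discrete"),
    "Sex": ("qualitative", "attribute"),
    "Occupation": ("qualitative", "nominal"),
    "Type of financial loss": ("qualitative", "ordinal"),
    "Estimated financial loss": ("quantitative", "continuous"),
    "Family members injured": ("quantitative", "discrete"),
    "Family members who died": ("quantitative", "discrete"),
}


def items_of_form(form):
    """Return the items whose data are of the given main form."""
    return [item for item, (main, _) in ITEM_TYPES.items() if main == form]


print("Qualitative:", items_of_form("qualitative"))
print("Quantitative:", items_of_form("quantitative"))
```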

 

From the above example, one can see that real-life data consist of different types. Hence, when preparing a questionnaire or designing a study, one must be clear about the type of each item (question). The following points about qualitative data and quantitative data should be kept in mind:

 

From the above comparison, one can observe that the two forms of data are complementary to each other, as the questionnaire above illustrates. Information is collected from the respondents to draw conclusions, and these conclusions involve all the attributes and variables available in the data. For example, from the above questionnaire one can conclude, on the basis of the data, that x persons (male/female) of age y suffered a monetary loss of z on average. This relates the amount of loss to attributes such as sex and to variables such as age.
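A conclusion of the form "x persons of age y suffered a loss of z on average" comes from grouping the quantitative items by a qualitative one. The following minimal Python sketch assumes a handful of hypothetical responses (all values are invented for illustration) and summarizes the number of persons, average age and average loss for each category of the attribute sex.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (sex, age in years, estimated financial loss).
# All values are invented for illustration.
responses = [
    ("male", 34, 12000.0),
    ("female", 41, 8500.0),
    ("male", 52, 20000.0),
    ("female", 29, 6500.0),
    ("male", 45, 16000.0),
]

# Group the records by the attribute "sex".
groups = defaultdict(list)
for sex, age, loss in responses:
    groups[sex].append((age, loss))

# For each group: number of persons, average age, average loss.
summary = {}
for sex, rows in groups.items():
    ages = [age for age, _ in rows]
    losses = [loss for _, loss in rows]
    summary[sex] = (len(rows), mean(ages), mean(losses))
    print(f"{sex}: {len(rows)} persons, average age {mean(ages):.1f}, "
          f"average loss {mean(losses):.2f}")
```

With these invented values, the sketch reports 3 male respondents with an average loss of 16000.00 and 2 female respondents with an average loss of 7500.00.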

 

5. Variable

 

‘Variable’ is another word commonly used when collecting information from observations. Before looking at its definition, let us first understand it through an example.

 

Data are realizations (observed values) of a variable. For example, in a study measuring the average height of students, height is the variable, since the height of each student is different and can take any value within a specified range. Likewise, any other characteristic whose value varies under different conditions is measured through a variable.

 

For example, if there are 3 students whose heights are 78, 81 and 56 inches respectively, then the data consist of three values, collected by measuring the heights of the students.
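In this example, the three heights are realizations of the single variable "height". A minimal Python sketch of the study's aim, the average height, using the values above:

```python
from statistics import mean

# Three realizations of the variable "height (inches)", from the example.
heights = [78, 81, 56]

# The study's objective: the average height of the students.
average_height = mean(heights)

print(f"{len(heights)} observations, average height {average_height:.2f} inches")
```

The average is (78 + 81 + 56) / 3 = 215 / 3, about 71.67 inches.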

 

Definition

 

A variable is defined as a measurable characteristic whose value changes with changes in the conditions of the objective.

 

Quantitative observations are of either discrete or continuous form. Hence, there are two types of quantitative variables:

(1)  Discrete variables

(2) Continuous variables

 

5.1 Discrete Variables

 

A variable is something whose value can change. Discrete variables are those variables that can take only whole-number values, for example, the number of boys in a class, the number of patients in a hospital, the number of accidents in a year and the number of states in a country.

 

Definition

 

A variable that can take only a finite or countably infinite set of values, such as the whole numbers 0, 1, 2, …, is termed a discrete variable. Its value varies from one observation to another.

 

5.2 Continuous Variables

 

In the previous example, the heights of different persons vary and can also take decimal values, so height is considered a continuous variable. Thus continuous variables are those variables that measure the value of an item on the real number line.

 

Definition

 

A variable is a quantity whose value keeps changing; its value can vary from one observation to another. A continuous variable is a variable that can take any possible value on the real line within its range, that is, positive, negative and fractional values.

 

Between any two numbers, such as 1 and 2, there are uncountably many values, e.g. 1.1, 1.01, 1.001, …
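One way to see that a continuous variable has no "next" value is the midpoint argument: between 1 and any value above it there is always another value, their midpoint. The minimal Python sketch below applies this repeatedly using exact fractions; the helper name `values_between` is hypothetical. The procedure never runs out, which shows there are infinitely many values between 1 and 2 (that they are in fact uncountable is a stronger property that no finite program can display).

```python
from fractions import Fraction


def values_between(a, b, n):
    """Return n distinct values strictly between a and b by repeatedly
    taking the midpoint of a and the most recently found value."""
    found = []
    upper = b
    for _ in range(n):
        mid = (a + upper) / 2  # always strictly between a and upper
        found.append(mid)
        upper = mid
    return found


# Exact fractions avoid floating-point rounding.
vals = values_between(Fraction(1), Fraction(2), 5)
print([str(v) for v in vals])
```

This prints `['3/2', '5/4', '9/8', '17/16', '33/32']`, and the loop could continue indefinitely.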

The following examples give more insight into continuous variables:

  • Time taken by a person to complete a task
  • Height of a person
  • Wind speed
  • Concentration of dust particles in the air
  • Cost of a piece of equipment or an object
  • Average speed of a bike
  • Mileage of a car

After understanding the definition of a variable, one can easily understand independent and dependent variables, which are used in most studies.

 

Independent Variable

A variable is said to be an independent variable if its value is not affected by changes in the other variables under study, while changes in its own value can lead to changes in other variables. Hence, a variable whose value is not affected by any other variable is called an independent variable.

 

For example,

Algal density is a variable whose value determines the quantity of chlorophyll-a, which is used as an indicator of lake water quality. A change in chlorophyll-a, however, does not affect the value of algal density. In this case, algal density is considered an independent variable.

 

Dependent Variable

A variable is said to be a dependent variable if its value changes due to a change in another variable. The variable that influences the value of the dependent variable is called the independent variable (from the above definition).

 

From the previous example, one can see that there are quantities such as algal density, water quality and chlorophyll-a, which is used as an indicator of lake water quality. The value of chlorophyll-a depends on algal density and also on the quality of the lake water. So chlorophyll-a is the dependent variable and the others are independent variables.
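The direction of dependence can be pictured as a function: the dependent variable is computed from the independent one, not the other way round. The Python sketch below assumes, purely for illustration, that chlorophyll-a is proportional to algal density; the constant `K` and the density values are hypothetical and do not describe any real lake.

```python
# Hypothetical model: chlorophyll-a is taken to be proportional to algal
# density. The constant K is invented for illustration, not a real value.
K = 0.5


def chlorophyll_a(algal_density):
    """Dependent variable: its value is computed from the independent one."""
    return K * algal_density


# Values of the independent variable chosen or observed in a study...
densities = [10.0, 20.0, 40.0]

# ...and the dependent variable changing in response.
for d in densities:
    print(f"algal density {d:5.1f} -> chlorophyll-a {chlorophyll_a(d):5.1f}")
```

Changing the input changes the output, while nothing in the model lets chlorophyll-a alter algal density, which mirrors the definitions above.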

 

Self-Check Exercise

 

Question: A scientist conducts an experiment to test the theory that a vitamin could extend a person’s life expectancy. Which is the independent variable and which is the dependent variable in this study?

Answer: Here the independent variable is the amount of vitamin given to the subjects in the experiment. The dependent variable is the variable affected by the independent variable, which in this case is life span.

 

Question: A scientist studies the impact of a drug on cancer. What are the independent variables?

Answer: The scientist studies the impact of the drug on cancer, so the cancer is the dependent variable. The independent variables relate to the administration of the drug, such as the dosage and the timing.

 

Question: A scientist studies the impact of withholding affection on rats. Which is the independent variable?

Answer: Here the amount of affection is the independent variable and the dependent variable is the reaction of the rats.

 

Question: A scientific study examines how many days people can eat soup before they get sick. What are the independent and dependent variables?

Answer: Here the number of days of consuming soup is the independent variable and the onset of illness is the dependent variable.

6. Summary

In this module, we have tried to give an in-depth knowledge of the different types of data. The types of data are discussed with examples that help the reader understand the topic easily. In the fourth section, the types of data are compared in terms of their features and properties. Variables and their types are discussed because of the importance of this topic in statistics. The self-check questions will help you check your understanding of the topic.

7. Suggested Readings

Agresti, A. and B. Finlay, Statistical Methods for the Social Sciences, 3rd Edition, Prentice Hall, 1997.

 

Daniel, W. W. and C. L. Cross, Biostatistics: A Foundation for Analysis in the Health Sciences, 10th Edition, John Wiley & Sons, 2013.

 

Hogg, R. V., J. McKean and A. Craig, Introduction to Mathematical Statistics, Macmillan Pub. Co. Inc., 1978.

 

Meyer, P. L., Introductory Probability and Statistical Applications, Oxford & IBH Pub, 1975.

 

Triola, M. F., Elementary Statistics, 13th Edition, Pearson, 2017.

 

Weiss, N. A., Introductory Statistics, 10th Edition, Pearson, 2017.


 

One can refer to the following links for further understanding of statistical terms.

 

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ClinStat/glossary.pdf

 

http://www.stats.gla.ac.uk/steps/glossary/alphabet.html

 

http://www.reading.ac.uk/ssc/resources/Docs/Statistical_Glossary.pdf

 

https://stats.oecd.org/glossary/

 

http://www.statsoft.com/Textbook/Statistics-Glossary

 

https://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm

 

https://stats.oecd.org/glossary/alpha.asp?Let=A