23 Computer Application in Quantitative Data Analysis

Subhasis Bandyopadhyay

  1. Introduction

 

Machine intelligence has revolutionized the arena of social science research in the last quarter of a century. Prolific and wide-ranging use of computer software has given rise to smart and sophisticated accumulation, collation and presentation of dense data structures in social and behavioural research. Analysis and interpretation have achieved new heights as social scientists and researchers now often operationalise their theoretical concepts through tangible and testable empirical evidence. Deprecation of numerical presentation, as a result, has given way to understanding and adaptation of statistical tools and methods. The question ‘how is it’ is getting organically connected with the answer to ‘why is it’.

 

Like a subplot in a complete novel, computer application by and large helps to gather, assemble and interpret data generated during specific kinds of social science research. It is equally true that in the majority of social science research, the inbuilt number-processing software bundled with word processing in a given operating system (OS) like Linux, Windows, or Mac is sufficient to handle preliminary and useful statistical necessities. However, such a reality can never take away the sophistication, ease of use, and substantially extensive range of treatment of raw or semi-processed data made possible by specific numerical packages like SYSTAT, STATA, SAS or the popular IBM software Statistical Package for the Social Sciences (SPSS).

 

If someone is not conversant with computer application, or cannot toy with statistical manoeuvring, there is nothing to worry about. Hundreds of online modules are available on the net, besides the built-in Help menu in most software, to clear your doubts step by step and instil a sense of confidence.

 

Notwithstanding the usefulness of computer application for quantitative data analysis, it has to be understood that, at the end of the day, it is just another tool of research in the social sciences, the performance of which is largely dependent on the training, motivation, research acumen and understanding of the people involved in a specific project, and not the other way round.

 

  2. Learning Outcome

 

This module will allow you to learn different computational techniques used in quantitative research. It will also inform you about the fundamental aspects of quantitative data collection and analysis using computer software.

 

  3. Nature and Scope of Quantitative Data Analysis

 

Quantitative social science research process normally includes a few general (G) and some specific (S) processes:

  • Formulating research questions (G);
  • Constructing a research tool or instrument for data collection (G);
  • Sampling procedure which may be probabilistic or non-probabilistic (S);
  • Measurement techniques like survey, scaling, qualitative, unobtrusive procedures (S);
  • Research design, which may be categorized as exploratory, descriptive, explanatory, experimental and quasi-experimental, the last three being particularly open to quantification (G);
  • Data analysis (S);
  • Idea of validity and reliability of measures (S);
  • Research ethics (G); and
  • Writing the research paper (G).

 

The specific processes (S) mentioned here are more accessible to standardized computational analysis. Recently, however, applications have been developed to use computers for qualitative analysis, and these will be discussed in a separate module.

 

  4. Beginning of Quantification

 

During the formulation of a research design, the social scientist would often try to find a few factors that can account for most of the variations in a given phenomenon. Usually, these factors are independent variables. An idiographic approach to the study would begin with every possible explanation to understand the phenomenon as a whole. In contrast to this, a nomothetic approach would strive to understand the interrelationship between a major (independent) variable and a subsidiary (dependent) variable, one after another. This shows that nomothetic explanation is inherently probabilistic. Application of nomothetic representation is the point that opens up the possibility of quantitative analysis. Nomothetic causality in social research is qualified by three major criteria: one, the variables must be correlated; two, the cause must always precede the effect; and, finally, the relationship between the variables in question should be nonspurious (Babbie 2013: 20).

 

Expanding further we can say: a causal relationship cannot exist until and unless a statistical correlation is found between two variables. This particular point emphasizes the need to orient social science research on actual observations and not on assumptions. A causal relationship can only exist if the cause precedes the effect on a time scale. This second point puts stress on the fact that a salient attribute of an observable fact should come first, before any sequential attribute. For example, if there is an income disparity on the basis of gender, the gender division should always come first, before any discrimination on the basis of income. The third point highlights the exclusiveness of the causal relationship between two variables, which cannot be explained in terms of some third variable. For instance, if the number of doctors in a region shows a positive correlation with the number of crimes in the same region, it may be simply because of the population density of the region; hence the relationship is a spurious one. To put it simply, correlation does not equal causation. When social scientists say there is a causal relationship between, for example, education and secular principles, they mean (1) there is a statistical correlation between the two variables, (2) a person’s educational attainment has occurred before adherence to secular principles, and (3) there is no third variable that can explain away the observed correlation as spurious. Hence, a causal relationship should reflect a necessary and a sufficient cause to be tenable.
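To make the doctors-and-crimes illustration concrete, here is a minimal sketch in base R, with entirely simulated (hypothetical) numbers, showing how a seemingly strong correlation can vanish once the third variable is controlled for through partial correlation:

    # Simulated data: population density drives both variables
    set.seed(1)
    density <- runif(200, 100, 5000)            # persons per sq. km
    doctors <- 0.01 * density + rnorm(200, sd = 5)
    crimes  <- 0.02 * density + rnorm(200, sd = 10)

    cor(doctors, crimes)                        # strong raw correlation

    # Partial correlation: strip density out of both variables first
    r_doc <- resid(lm(doctors ~ density))
    r_cri <- resid(lm(crimes ~ density))
    cor(r_doc, r_cri)                           # near zero: the link was spurious

The raw correlation is large only because both variables track density; once that influence is removed, next to nothing remains.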

 

The nomothetic model of causal analysis lends itself to determining a) the units of analysis, i.e. what or whom to study; b) the research design, i.e. whether the approach to the study takes the shape of exploration, description or explanation; c) conceptualization, i.e. specification of the exact meaning of the concepts to be studied; d) operationalisation, i.e. deciding on the measurement technique(s) to be pursued; and e) hypothesis testing, i.e. a specific statement of prediction involving variables in a given research setup, which in turn opens up the possibility of computer application in quantitative analysis. The quantitative research process, in general, follows a deductive and empirically driven approach while pursuing the research objective.

 

Self-Check Exercise 1

 

Q 1. What is an Operating System (OS)?

 

An operating system is vital software acting as an intermediary between the user and the computer hardware. An OS manages computer hardware and software and provides common services for computer programmes.

 

Q 2. What are the essential features of nomothetic causality?

 

In nomothetic causality, the variables are correlated, the cause precedes the effect, and the variables remain non-spurious.

 

Q 3. Mention two major features of quantitative analysis.

 

The two major features of quantitative analysis are: i) they are deductive in their logical orientation, and ii) they are empirically directed most of the time.

 

  1. Scaling and Coding

 

The data collection process is generally followed by categorization of the information gathered and data editing. This is followed by coding, which entails developing a code book, pre-testing, formal coding, and verification of the coded data. In the beginning, however, the process starts with a decision on the scaling, viz. whether the information generated in the study can be categorized into a nominal, ordinal, interval or ratio scale. Accordingly, tabulation and segregation of data into qualitative and quantitative categories take place.
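As a small illustration of coding, here is a sketch in R (the free statistical environment introduced later in this module) using a hypothetical code book, showing how numeric codes recorded at data entry are mapped back to category labels for tabulation:

    # Hypothetical code book: 1 = Hindu, 2 = Muslim, 3 = Christian
    raw <- c(1, 2, 2, 3, 1, 1)                    # codes from questionnaires
    religion <- factor(raw, levels = c(1, 2, 3),
                       labels = c("Hindu", "Muslim", "Christian"))
    table(religion)                               # tabulation by category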

 

The level of measurement refers to the relationship among the values that are assigned to the attributes of a variable. First, knowing the level of measurement helps you decide how to interpret the data from that variable. When you know that a measure is nominal, you know that the numerical values are just short codes for longer names. Second, knowing the level of measurement helps you decide what statistical analysis is appropriate on the values that were assigned. If a measure is nominal, then you know that you would never average the data values or do a t-test on the data.

 

Four levels of measurement are typically defined:

  • Nominal
  • Ordinal
  • Interval
  • Ratio

 

Table 1: A Summary of Levels of Measurement, Appropriate Statistics and Transformations

 

Scale      Central tendency                       Statistics                                        Transformations
Nominal    Mode                                   Chi-square                                        One to one (equality)
Ordinal    Median                                 Percentile, non-parametric                        Monotonic increase (order)
Interval   Arithmetic mean, standard deviation    Correlation, regression, analysis of variance    Positive linear (affine)
Ratio      Geometric mean, harmonic mean          Coefficient of variation                          Positive similarities (logarithmic)

 

Adapted from Bhattacherjee, A. 2012: 45.
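As a rough illustration of Table 1, the following sketch in R (with made-up values) computes an appropriate measure of central tendency for each level of measurement:

    religion <- factor(c("A", "B", "A", "C", "A"))          # nominal
    names(sort(table(religion), decreasing = TRUE))[1]      # mode: "A"

    grade <- ordered(c("low", "high", "mid", "mid"),
                     levels = c("low", "mid", "high"))      # ordinal
    levels(grade)[median(as.integer(grade))]                # median: "mid"

    temp <- c(21.5, 19.0, 23.2, 20.8)                       # interval
    mean(temp); sd(temp)                                    # mean and SD

    income <- c(120, 300, 80, 150)                          # ratio
    exp(mean(log(income)))                                  # geometric mean
    sd(income) / mean(income)                               # coefficient of variation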

  6. Computer Application in Research Process – A thematic understanding

 

The application of computers today is not limited to the ambit of data analysis and interpretation only. The ecology of social science research is comprehensively guided by computation, be it problem formulation, literature review, selection of sampling technique, or data mining. Evidently, from the research question at the exploration stage to the final stage of the research report, computer applications have a role to play. Nonetheless, the descriptive and inferential tools of computers are used most prolifically during the phase called research execution, viz. pilot testing, data collection and data analysis.

 

 

When social researchers are handling large quantities of data and need to store or arrange them efficiently, a computer-based approach is invaluable. The ability of a computer to make repetitive calculations rapidly and accurately has revolutionized quantitative research, and it would now seem rather out of date to attempt organization and analysis of data by hand alone. There is a large number of computer programmes that can assist with quantitative analysis. Among them the most commonly used are: Excel by Microsoft, SPSS by IBM and Matlab by MathWorks. The fundamentals of these programmes are easily available and explained on the internet and in books and periodicals. Basic versions of a few of these programmes are available over the net. As expected, they offer only limited resources for quantitative computation in comparison to the various paid versions.

 

Here is a brief description of programmes commonly used by social scientists while doing numerical and quantitative studies.

 

  1. SPSS: Short for Statistical Package for the Social Sciences, it is the most popular quantitative analysis software program used in social science research. It is broad in its scope, wide-ranging and flexible enough to be used with more or less any type of file. In its overview of the programme, Wikipedia states: The original SPSS manual (Nie, Bent & Hull 1970) has been described as one of “sociology’s most influential books” for allowing ordinary researchers to do their own statistical analysis. Statistics included in the base software are:
  • Descriptive statistics: Cross Tabulation, Frequencies, Explore, Descriptive Ratio Statistics
  • Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances) and Nonparametric tests
  • Prediction for numerical outcomes: Linear regression
  • Prediction for identifying groups: Factor analysis, Cluster analysis (two-step, K-means, hierarchical), Discriminant.

 

In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the datafile) are features of the base software.

 

It is normally used to generate reports, charts, tabulations and plots of distributions and trends. It is also used to generate descriptive statistics and more complex statistical analyses. SPSS offers a user interface that makes it exceedingly simple and intuitive for all types of users. The organization of the interface into menus and dialogue boxes helps users to perform analyses without writing command syntax, which is required in some other programmes. The platform makes it simple and easy to enter and edit data directly in the program.

 

There are, however, a few drawbacks which might not make it the best program for some researchers. For example, there is a limit on the number of cases you can analyse. It is also difficult to account for weights, strata, and group effects with SPSS.

 

  2. STATA: This is an interactive data analysis programme that runs on a variety of platforms. It can be used for both simple and complex statistical analyses. STATA offers a point-and-click interface as well as command syntax, which makes it flexible to use. STATA has also made it easy to generate graphs and plots of data and results.

 

Analysis in STATA is centred on four windows: the Command window, the Review window, the Results window, and the Variables window. Analysis commands are entered into the Command window and the Review window records those commands. The Variables window lists the variables that are available in the current data set along with the variable labels, and the Results window is where the results appear.

 

  3. SAS: Short for Statistical Analysis System, it is used by a wide array of researchers because, in addition to statistical analysis, it also allows programmers to perform report writing, graphics, business planning, forecasting, quality improvement, project management, and the like.

 

SAS is a pretty useful program for the intermediate and advanced user because it is extremely robust and can be used with very large data sets. Also, it can perform complex and advanced analyses. SAS is good for analyses that require the investigator to take into account weights, strata, or groups. Unlike SPSS and STATA, SAS is run largely by programming syntax rather than point-and-click menus, so some knowledge of the programming language is required.

 

  7. An Interactive Approach to Computer Application in Quantitative Analysis in Social Sciences

 

For ease of access and use, a step-by-step application of computation for quantitative analysis in social science is given here, following popular programmes like SPSS and Excel, because the two are generally compatible and switching between them is possible within limits. We begin with data entry, analysis and graphing. The text in this section draws heavily on two sources: one, the SPSS module (2014) installed in the Department of HSS at the Indian Institute of Engineering Science and Technology, Shibpur, and two, Cunningham and Aldrich (2012).

 

  • Start SPSS. A window will open asking: what would you like to do? Click Type in data and click OK. A data screen appears.
  • Click Variable View.
  • At the top of the screen, enter the variable you want to pursue in the cell. A balloon points to the cell where you have to make an entry.
  • At the bottom, click Data View. Now enter the values of the variable one by one. In case of a mistake, click the cell and re-enter the value.
  • After all entries are made, click File and then click Save As. A window “Save Data As” will open. Give a name to your work/project. Click Save. Your entry is now saved under the given file name.
  • On the SPSS menu at the top, click Analyze, select Descriptive Statistics, and then select Descriptives. A window titled Descriptives will appear.
  • Select the variable you saved earlier and click the right arrow to place it in the box.
  • Now click Options; a new window will appear. Click Mean and Sum. Click Continue. Click OK.

 

A screen titled “Output SPSS Statistics Viewer” will appear with a table Descriptive Statistics showing the results of the analysis with the sum and average of the variables you have entered.

  • On the main menu, click Graphs, select Legacy Dialogs, and then click Bar. A small window will appear. Click Simple, and then click Values of Individual Cases. Click Define and a window opens. Click the variable chosen and drag it to the “Bars Represent” box. Click OK.

 

A graph will appear under the table. Save the Output-Viewer screen by clicking File and then Save As. A window will appear. In the “File name” box, enter the name you have given in all lower case without any space. The “Look in” box indicates the location where the file is saved. Click Save. After saving your work click the small red “x” in the top right corner to make the window go away. Click File and then Exit. To reopen it click File, select Open, and then click Data. Click the file name. Click Open.

  • To open an Output file, click File, select Open, and click Output. Then follow the step for opening a data file. You have completed your first use of SPSS! Simple, isn’t it?
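For readers who do not yet have SPSS at hand, the same enter-describe-graph-save workflow can be sketched in R; the variable name and file name below are hypothetical stand-ins for whatever you entered above:

    score <- c(12, 15, 9, 18, 11)          # values typed in, as in Data View

    sum(score)                             # Sum, as requested under Options
    mean(score)                            # Mean

    barplot(score, names.arg = 1:5,        # bar chart of individual cases
            main = "Scores by case")

    # Save the data, roughly equivalent to File > Save As
    write.csv(data.frame(score), "myproject.csv", row.names = FALSE)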

 

On the Main Menu page, there is a series of menus, viz. File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Add-ons, Window and Help. Click on one at a time to see the options under each menu. If some options are greyed out, it simply means you can use them only after generating adequate data following certain procedures.

 

  8. Importing files from Excel

 

Basically, Excel is an easily available spreadsheet programme whose data files are frequently imported into SPSS.

  • Click File on the Main Menu. Select Open, and then click Data. A window titled “Open Data” will surface.
  • In the “Files of type” box, use the arrow to scroll and then select the file type Excel. Keep in mind that the file you wish to open should be in the “Documents” folder.
  • Click the Excel file name.
  • Click Open. A window depicting “Opening Excel Data Source” will surface.
  • Click OK and the file will open in the SPSS Data View screen.

SPSS and Levels of Measurement

Selecting correct levels of measurement for each of the variables in social science research is extremely important for effective use of SPSS. There are three choices within SPSS for selecting these levels of measurement: nominal, ordinal, and scale. These three levels provide the researcher with different amounts of analytical information. The level of measurement is partly determined by the basic nature of the variable. However, the researcher does enjoy a certain degree of freedom when choosing from among the three levels. In this light one may say that the level of measurement describes how to convert observed variables into measurable information. A few simple illustrations clarify the notion: measurement by a measuring tape gives scale data; ranking positions, such as ninth, tenth and eleventh, give ordinal data; and when the girl students in a classroom are counted, we get nominal data.
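The three levels map naturally onto data types in R as well; a minimal sketch with invented values:

    gender <- factor(c("F", "M", "F"))                  # nominal
    rank   <- ordered(c("3rd", "1st", "2nd"),
                      levels = c("1st", "2nd", "3rd"))  # ordinal
    height <- c(151.2, 170.5, 162.3)                    # scale (numeric)

    str(gender); str(rank); str(height)                 # inspect each level
    # mean(gender) yields only a warning and NA: like SPSS, R ties the
    # legitimate operations to the declared level of measurement.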

     

Now, let’s try to get a measurement from our previously saved work.

     

  • Start SPSS and click Cancel in the Statistics opening window.
  • Click File, select Open, and click Data.
  • Click the name of the saved file, and click Open.
  • Click Variable View. Inspect the Measure column, which shows “scale.”
  • Click a cell under the Measure column. It shows the three choices.

     

SPSS by default selects “scale” for all numerical data that the researcher enters in the Data View screen. This means SPSS does not always make the correct decision when it designates data for measurement at the scale level. It is then up to the researcher to decide whether the attributes of the variables should be placed at the nominal, ordinal, or scale level. This is the point where the researcher should have a clear understanding of the levels of measurement.

     

Self-Check Exercise 2

Q 1. Name the four types of scaling with examples.

The four types of scaling are: i) Nominal (gender); ii) Ordinal (rank in a class); iii) Interval (Celsius scale of temperature); and iv) Ratio (time and age).

Q 2. What is metadata?

Metadata is data that describes other data and its content.

Q 3. Define factor analysis.

Factor analysis is a statistical method of data reduction. It accomplishes this task by seeking underlying latent variables that are reflected in the manifest or observed variables.

     

  9. Validating Data

     

Handling a large data set requires machine validation. There is a Validate Data module in SPSS. However, the base SPSS installation does not include it and you need to purchase it separately. On the main menu click Add-ons, select SPSS Data Validation, and click Validate Data. The browser will take you to the website that explains the procedure. This procedure enables you to apply rules to perform data checks based on each variable’s measurement level. One can also check for invalid cases and receive summaries regarding rule violations and the number of cases affected. Here examples are provided for validation of nominal and ordinal data:

     

  • Click Analyze, select Descriptive Statistics, and click Frequencies.
  • While holding down the Ctrl key, click, say, Evening Class, Student’s Predicted Grade, Self-rated Anxiety Level, Gender, and Instructor Rating (variables measured at the nominal and ordinal levels).
  • Click the right arrow.
  • Click OK.

     

SPSS will immediately produce six tables for the five variables. The additional table shows a summary of the five variables that you selected for this data check operation. With increased complexity in the data structure, you may need to put more emphasis on Analyze, Nonparametric Tests, Compare observed data, Customize Tests, and Pairwise Comparisons, all of which are part of the general command structure of the software.
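The same frequencies-based check can be sketched in R with a hypothetical coding scheme; a frequency table exposes invalid codes at a glance:

    gender <- c(1, 2, 2, 1, 9, 2)      # code book: 1 = male, 2 = female
    table(gender)                      # the stray code 9 shows up at once

    valid <- gender %in% c(1, 2)       # the validation rule
    which(!valid)                      # case number(s) violating the rule
    sum(!valid)                        # how many cases are affected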

     

     

  10. A Note on Integral Approach to Measurement Validation

     

Any comprehensive approach to ensuring validity must include both theoretical and empirical perspectives. As described by Bhattacherjee (2012: 63): the integrated approach begins in the theoretical sphere. The initial step is to conceptualize the constructs of interest. This includes defining each construct and identifying their constituent domains and/or dimensions. Next, the task is to select (or create) items or indicators for each construct based on the conceptualization of these constructs. A literature review may also be helpful in indicator selection. Each item is reframed in a uniform manner using simple and easy-to-understand text. Following this step, a panel of expert judges (academics experienced in research methods and/or a representative set of target respondents) can be employed to examine each indicator and conduct a Q-sort analysis. Ambiguous items that were consistently missed by many judges may be re-examined, reworded, or dropped.

     

Next, Bhattacherjee continues, the validation procedure moves to the empirical context. A research instrument is created comprising all of the refined construct items, and is administered to a pilot test group of representative respondents from the target population. The data collected are tabulated and subjected to correlational analysis or exploratory factor analysis using a software programme such as SAS or SPSS for assessment of convergent and discriminant validity. Items that do not meet the expected norms of factor loading should be dropped at this stage.

     

He then goes on to say that the remaining scales are evaluated for reliability using a measure of internal consistency such as Cronbach’s alpha. Scale dimensionality may also be verified at this stage, depending on whether the targeted constructs were conceptualized as unidimensional or multi-dimensional. Next, evaluate the predictive ability of each construct within a theoretically specified nomological network of constructs using regression analysis or structural equation modelling. If the construct measures satisfy most or all of the requirements of reliability and validity described here, we can be assured that our operationalised measures are reasonably adequate and accurate.
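As a rough sketch of these two empirical steps, the following base R code runs an exploratory factor analysis on simulated Likert-type items and then computes Cronbach's alpha by hand (no extra packages are needed; all data are invented):

    set.seed(2)
    latent <- rnorm(300)                               # one underlying trait
    items  <- sapply(1:5, function(i) latent + rnorm(300, sd = 0.8))

    factanal(items, factors = 1)                       # loadings should all be high

    k <- ncol(items)                                   # Cronbach's alpha:
    alpha <- (k / (k - 1)) *                           # k/(k-1) * (1 - sum of item
             (1 - sum(apply(items, 2, var)) /          # variances over variance of
                  var(rowSums(items)))                 # the summed scale)
    alpha                                              # typically > 0.8 here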

     

The integrated approach to measurement validation discussed here is quite demanding of researcher time and effort. Nonetheless, this elaborate multi-stage process is needed to ensure that the measurement scales used in our research meet the expected norms of scientific research. Because inferences drawn using flawed or compromised scales are meaningless, scale validation and measurement remains one of the most important and involved phases of empirical research.

 

  11. Using the Help Menu

 

There are rich and wide-ranging features in the SPSS Help menu. However, it is not the only available option for understanding and using a particular type of quantitative treatment. For instance, when you request SPSS to compute a statistic or generate a graph, various windows will open. These windows contain options, one of which is a Help option that you can click to obtain assistance concerning the statistic or graph you wish to generate. This shows that the programme is inherently flexible and user-friendly.

 

  12. Beyond SPSS

 

There are many convenient software programmes beyond SPSS which may be put to use in social science research. They have their own originalities of approach, need-based sensitivities, and innovative interfaces. Most importantly, almost all of them are freely available on the net. The single major drawback of these programmes is that you need to know the fundamentals of computer programming to run them effectively. One such software package is called R. It is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and Mac OS. Moreover, such foundational platforms often share their resources in an interactive environment, for example through code sharing. A glimpse at their website states:

 

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

 

Programmes like this help the user to follow the algorithmic choices and redefine the code.
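A small, self-contained taste of the facilities listed above, runnable in any R session:

    m <- matrix(1:6, nrow = 2)             # data handling: a 2 x 3 array
    m %*% t(m)                             # operators on matrices

    average <- function(x) {               # user-defined function with a
      if (length(x) == 0) return(NA)       # conditional and a loop
      total <- 0
      for (v in x) total <- total + v
      total / length(x)
    }
    average(c(4, 8, 15))                   # returns 9

    plot(1:10, (1:10)^2, type = "b")       # on-screen graphical display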

 

  13. Scope of future development

 

Earl Babbie (2013: 381) has shown enthusiasm while describing the possibility of computer simulation (CS) for social indicator research as part of evaluation research. To him, social indicators are measurements that reflect the quality or nature of social life, such as crime rates, infant mortality rates, the number of physicians per million population, and so forth. Social indicators are often monitored to determine the nature of social change in a society at a given period of time. Further, as researchers begin compiling mathematical equations describing the relationships that link social variables to one another, those equations can be stored and linked to one another in the computer. With a sufficient number of adequately accurate equations on tap, researchers will one day be able to test the implications of specific social changes by computer rather than in real life. An early illustration of computer simulation linking social and physical variables can be found in the research of Donella and Dennis Meadows at Dartmouth (Meadows et al. 1972).

 

Dawkins (1986: 74) once commented, “For those, like us, who are not mathematicians, the computer can be a powerful friend to the imagination.” Definitely, when used with probity, these machines enable us to explore the deductive consequences of rules in ways that used to be impossible for many of us. Social dimensions of complex systems such as education and the market economy have been simulated with computers, and there is scope for important theoretical outcomes from such studies. Yet there are simple methods of computer simulation that can be used extensively to improve the decision process in social science research. CS should be used to determine whether proposed data collection strategies will answer the questions that are being asked.
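One simple instance of that last point is a Monte Carlo check of a proposed survey design, sketched below in R with entirely hypothetical numbers: will fifty respondents per group reliably detect a half-point difference on a five-point attitude scale?

    set.seed(3)
    detects <- replicate(1000, {
      a <- rnorm(50, mean = 3.0, sd = 1.0)   # simulated group A responses
      b <- rnorm(50, mean = 3.5, sd = 1.0)   # simulated group B responses
      t.test(a, b)$p.value < 0.05            # did this sample detect it?
    })
    mean(detects)    # approximate power; if too low, enlarge the sample

If the proportion comes out low, the design, not the data, is at fault, and it is far cheaper to discover that in simulation than in the field.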

 

Self-Check Exercise 3

 

Q 1. Give the first command in validating data in SPSS.

 

Click Analyze, select Descriptive Statistics, and click Frequencies.

 

Q 2. When may an item be dropped from a research plan?

 

An item that does not conform to the expected norms of factor loading after correlational or factor analysis may be dropped from the research coverage.

 

Q 3. What is evaluation research?

 

Evaluation research is a set of research methods and associated methodologies with a distinctive purpose: they provide a means to judge actions and activities in terms of values, criteria and standards.

 

  14. Conclusion

 

Philosopher of science Karl Popper (1902-1994) once commented that theories can never be proven, only disproven. Likewise, in quantitative analysis we, the social researchers, use computer applications only to bring out meaningful description and interpretation from the heaps of information collected from a social milieu. Ultimately, nevertheless, these quantifications only lead to some sort of clarification of the original research question and testing of the theory. It is better to remember that tool-based sophistication at any scale is not going to replace the understanding and socially necessary knowledge of an interested mind embedded in the material conditions of a historical moment.

 

  15. References
  • Babbie, E. The Practice of Social Research. California: Wadsworth Cengage Learning, 2013.
  • Bhattacherjee, A. Social Science Research: Principles, Methods, and Practices. University of South Florida: Scholar Commons, 2012. http://scholarcommons.usf.edu/oa_textbooks/3
  • Cunningham, James B. and James O. Aldrich. Using SPSS: An Interactive Hands-On Approach. New Delhi: Sage, 2012.
  • Dawkins, R. The Blind Watchmaker. NY: W.W. Norton & Company Inc., 1986.
  • Meadows, Donella H., Dennis L. Meadows, Jorgen Randers and William W. Behrens. The Limits to Growth: A Report for the Club of Rome’s Project on the Predicament of Mankind. NY: Potomac Associates Books, 1972.
  • Nie, Norman H., Dale H. Bent and C. Hadlai Hull. SPSS: Statistical Package for the Social Sciences. NY: McGraw-Hill, 1970.