35 SPSS: essential tool for demographic data analysis
Nupur Mahajan and Gautam Kshatriya
Contents:
- SPSS: An overview
- History of ownership of SPSS
- Structure of SPSS
- Uses of SPSS in demography
- Applications of SPSS
- Limitations of SPSS
Learning Objectives:
- To give an overview of SPSS
- To give a brief account of the history and development of the software package
- To describe the structure and functions of SPSS
- To familiarize users with the applications and uses of SPSS in demography
- To outline the limitations of the package
Overview
Logo icon of SPSS version 23.
SPSS is an acronym for Statistical Package for the Social Sciences. It is a software package used for the statistical analysis of data. Originally designed for analysis in the social sciences and allied disciplines, it is now used extensively in health sciences, marketing and market research, data mining, education and many other fields (KDnuggets, 2013). The original SPSS manual by Norman H. Nie, Dale H. Bent and C. Hadlai Hull, published in 1970, has been described as one of the most influential books in sociology because it allowed ordinary researchers to carry out statistical analysis on their own. Besides statistical analysis, the base software also offers data management and data documentation.
SPSS can read and write data from ASCII text files, spreadsheets and databases, and can read and write external relational database tables via ODBC and SQL. Statistical output is saved in the proprietary .spv file format, which can be opened with the in-package viewer or with a separately downloadable stand-alone reader. The .spv files can also be exported to text, Word, PDF, Excel and other formats. Using the OMS command, output can be captured as text, PDF, XLS, HTML, XML, an SPSS dataset, or in graphic image formats such as JPEG, PNG, BMP and EMF. SPSS Statistics Server is a version of SPSS Statistics with a client/server architecture.
The first version of the software was developed by Norman H. Nie, Dale H. Bent and C. Hadlai Hull in 1968 as the Statistical Package for the Social Sciences (SPSS). Early versions of SPSS were designed for batch processing on mainframes. These versions, including those for IBM and ICL machines, used punched cards for input. A processing run read a command file of SPSS commands and either a raw input file of fixed-format data with a single record type or a ‘getfile’ of data saved by a previous run. To save precious computer time, an ‘edit’ run could be done to check command syntax without analysing the data. From version 10 of SPSS, launched in 1983, data files could contain multiple record types.
SPSS Statistics versions 16.0 and later run under Windows, Mac, and Linux. The graphical user interface is written in Java. The Mac OS version is provided as a Universal binary, making it fully compatible with both PowerPC and Intel-based Mac hardware.
SPSS Statistics version 13.0 for Mac OS X was not compatible with Intel-based Macintosh computers, due to the Rosetta emulation software causing errors in calculations. SPSS Statistics 15.0 for Windows needed a downloadable hotfix to be installed in order to be compatible with Windows Vista.
Prior to SPSS 16.0, different versions of SPSS were available for Windows, Mac OS X and Unix. The Windows version was updated more frequently and had more features than the versions for other operating systems.
History of ownership of SPSS
SPSS Inc. announced on July 28, 2009 that it was being acquired by IBM for US$1.2 billion (IBM press release, 2009). Because of a dispute about ownership of the name “SPSS”, the product was referred to as PASW (Predictive Analytics SoftWare) between 2009 and 2010 (Sachdev, 2009). As of January 2010, the company became “SPSS: An IBM Company”. The complete transfer of business to IBM was finished by October 1, 2010, and on that date SPSS: An IBM Company ceased to exist. IBM SPSS is now fully integrated into the IBM Corporation and is one of the brands under IBM Software Group’s Business Analytics Portfolio, together with IBM Algorithmics, IBM Cognos and IBM OpenPages.
SPSS logo which was in use before its renaming in 2010.
Release history of versions of SPSS
• SPSS 15.0.1 – November 2006
• SPSS 16.0.2 – April 2008
• SPSS Statistics 17.0.1 – December 2008
• PASW Statistics 17.0.3 – September 2009
• PASW Statistics 18.0 – August 2009
• PASW Statistics 18.0.1 – December 2009
• PASW Statistics 18.0.2 – April 2010
• PASW Statistics 18.0.3 – September 2010
• IBM SPSS Statistics 19.0 – August 2010
• IBM SPSS Statistics 20.0 – August 2011
Structure of SPSS
SPSS for Windows has the same general look and feel as most other Windows programmes. Virtually any statistical procedure can be carried out by pointing and clicking on the menus and interactive dialog boxes.
When the SPSS icon is clicked, a new window appears on the screen. It looks like a standard Windows programme with a spreadsheet-like interface. The menu bar contains a number of menu options relating to statistics, and the toolbar holds shortcut icons that give quick access to frequently used options.
SPSS is organized into two main sections, one for defining and entering data and one for output. While defining and entering data, users can move between the ‘variable’ and ‘data’ views by clicking on the tabs at the bottom of the screen. The ‘output’ section opens in a separate window and displays the results of the statistical analyses. The output is saved as a file separate from the data set. Data can either be entered manually or read from an existing data file. Four types of window can be opened from the [File] menu:
- Data view is the default window; it opens with a blank data sheet ready for data entry and analysis.
- Syntax is where one can write scripts, such as those given in the Howell text, instead of using options from the menus.
- Output shows the results of the procedures performed on the data. The output is directed to a separate window. Multiple [Output] windows can be open at once to organize different analyses, and the results can later be saved and/or printed.
- Script provides the opportunity to write full-blown programmes in a BASIC-like language. These programmes have access to the functions that make up SPSS, so it is possible to write user-defined procedures, not themselves part of SPSS, that take advantage of those functions.
The [File] menu also offers two separate routes for reading data from existing files. The first is the [Open] option. Like other application packages, SPSS has its own format for saving data, with the file extension “*.sav”, so a file might be saved as “data.sav”. This is a binary format and cannot be read with a text editor. Its benefits are that all formatting changes are retained and the file can be read faster. The [Open] option is meant specifically for files saved in the SPSS format. The second option, [Read ASCII Data], reads files saved in ASCII format. It offers two choices, [Freefield] and [Fixed Columns]. Clicking on either produces a dialog box in which a number of parameters must be specified before the file can be read successfully.
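The difference between the [Freefield] and [Fixed Columns] choices can be illustrated outside SPSS. The following Python sketch is not SPSS syntax; the sample records, the variable names (id, age, sex) and the column positions are all hypothetical and chosen only for illustration.

```python
# Hypothetical raw records: case id, age, sex code (1 = male, 2 = female).
FREEFIELD_TEXT = "101 34 1\n102 27 2\n103 45 2\n"
FIXED_TEXT = "101341\n102272\n103452\n"

NAMES = ["id", "age", "sex"]

def read_freefield(text):
    """Freefield: values are separated by blanks, so exact position does not matter."""
    cases = []
    for line in text.splitlines():
        values = [int(v) for v in line.split()]
        cases.append(dict(zip(NAMES, values)))
    return cases

# Fixed columns: each variable occupies known character positions
# (start inclusive, end exclusive, counted from 0).
COLUMNS = {"id": (0, 3), "age": (3, 5), "sex": (5, 6)}

def read_fixed(text):
    """Fixed columns: values are cut out of each line by position."""
    cases = []
    for line in text.splitlines():
        cases.append({name: int(line[a:b]) for name, (a, b) in COLUMNS.items()})
    return cases

freefield_cases = read_freefield(FREEFIELD_TEXT)
fixed_cases = read_fixed(FIXED_TEXT)
print(freefield_cases[0])
print(freefield_cases == fixed_cases)
```

Both readers recover the same cases. The fixed-column layout needs no separators but depends on the column positions being specified correctly, which is why the dialog box asks for these parameters.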
In the ‘variable view’ (Figure 1), users name and define the variables to be included in the data set. Variable names are limited to eight or fewer characters in older versions of SPSS and must begin with a letter. A description of each variable can be added in the label field, and users can refer to it during analysis (Figure 2). When several variables share the same format (variable type, number of characters and labels), a quick way to define them is to copy the attributes of one variable and paste them into the other variable fields of a similar nature.
Once the variables have been named and defined, the user can switch to the ‘data view’ window to enter values for each variable defined in the variable view. The SPSS data view looks similar to a spreadsheet in Microsoft Excel. Variables are organized in columns, and each row represents a single case in the data set, holding the values of the variables for that case. A common practice is to enter data as codes, with labels used to describe the coded values where necessary. For example, codes may be used to record the type of motor vehicle a respondent owns, or the respondent’s educational and socio-economic level. The defined labels appear when the user clicks the drop-down arrow on the right side of a cell that has defined labels, and the user can then select the relevant value (Figure 3).
Figure 1: Variable view in SPSS v.16
Figure 2. Defining variable labels using the Value labels dialog box
This is useful when the number of possible responses is large, because the user may not remember the meaning of every code. Users can choose whether codes or labels are displayed in the data view with the ‘Value Labels’ option listed under ‘View’ in the menu.
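The relationship between stored codes and displayed labels can be sketched in a few lines of Python. This is an illustration of the idea only, not of how SPSS stores labels internally; the education codes and their labels are hypothetical.

```python
# Hypothetical codes for a respondent's educational level.
EDUCATION_LABELS = {1: "No schooling", 2: "Primary", 3: "Secondary", 4: "Graduate"}

# Each row is one case; values are stored as compact numeric codes.
cases = [
    {"id": 1, "education": 3},
    {"id": 2, "education": 1},
    {"id": 3, "education": 4},
]

def display(case, show_labels):
    """Return the education value as a code or, if requested, as its label."""
    code = case["education"]
    return EDUCATION_LABELS.get(code, code) if show_labels else code

codes_view = [display(c, show_labels=False) for c in cases]
labels_view = [display(c, show_labels=True) for c in cases]
print(codes_view)
print(labels_view)
```

The data themselves never change; switching `show_labels` only changes what is displayed, much as toggling between codes and labels does in the data view.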
SPSS has an extensive range of logical and analytical functions, from basic descriptive statistics to advanced general linear modelling. The package also includes functions for transforming variables, which may be needed when preparing for and performing different tests, for example to create logarithmic values or to calculate a scale from a number of variables.
These functions allow researchers to calculate new variables quickly from the values of other variables, to test variations in the grouping schemes used to organize responses to open-ended questions, and to collapse categories where necessary.
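As a rough illustration of such transformations, the Python sketch below derives three new variables from existing ones: a logarithm, a scale score and a grouping variable. The variable names, the sample values and the income cut-off of 500 are hypothetical, and the code stands in for what SPSS does through its transformation functions rather than reproducing them.

```python
import math

# Hypothetical cases: monthly income and three attitude items scored 1 to 5.
cases = [
    {"income": 1000.0, "item1": 4, "item2": 5, "item3": 3},
    {"income": 100.0, "item1": 2, "item2": 1, "item3": 3},
]

for case in cases:
    # Logarithmic transformation of a skewed variable.
    case["log_income"] = math.log10(case["income"])
    # Scale score computed as the mean of several items.
    items = [case["item1"], case["item2"], case["item3"]]
    case["attitude"] = sum(items) / len(items)
    # New grouping variable based on the value of another variable.
    case["income_group"] = "high" if case["income"] >= 500 else "low"

for case in cases:
    print(case["log_income"], case["attitude"], case["income_group"])
```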
Once the data have been entered into SPSS, it is important to check the data set for typographic errors that may influence the results of the statistical analysis. One way to do this is to examine the frequencies of nominal data and the descriptive statistics of numeric (ordinal, interval or scale) data. The analytical functions are accessed through the ‘Analyze’ menu (Figure 3).
Figure 3. Analysis options available in SPSS
When the ‘Frequencies’ option is selected under ‘Descriptive Statistics’ in the ‘Analyze’ menu, the dialog box illustrated in Figure 4 appears. In this dialog box users select the variables for which frequencies are to be calculated and control the types of frequency statistics to be produced. The display format of the results can also be set here.
Figure 4. Frequencies dialogue box
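The idea of using a frequency table to catch keying errors can be sketched in Python as follows. The coding scheme (1 = male, 2 = female) and the data, including the deliberate error, are hypothetical; the output layout is not that of the SPSS Frequencies procedure.

```python
from collections import Counter

# Hypothetical codes for sex: 1 = male, 2 = female. The 22 is a keying error.
sex = [1, 2, 2, 1, 22, 2, 1, 2]
VALID_CODES = {1, 2}

frequencies = Counter(sex)
total = len(sex)
for code in sorted(frequencies):
    count = frequencies[code]
    print(f"{code:>4} {count:>3} {100 * count / total:5.1f}%")

# Values outside the coding scheme point to data-entry errors.
suspect = sorted(code for code in frequencies if code not in VALID_CODES)
print("Codes to check:", suspect)
```

A code that appears in the table but not in the coding scheme stands out immediately, which is the purpose of running frequencies before any further analysis.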
To compute descriptive statistics, users select ‘Descriptive Statistics’ and then ‘Descriptives’ under the ‘Analyze’ menu to open the ‘Descriptives’ dialog box.
Figure 5. ‘Descriptives’ function dialog box
In the ‘Descriptives’ dialog box, the variables to be included in the analysis are chosen from the list on the left and moved to the list on the right, labelled ‘Variables’, using the arrow between the two lists. The descriptive statistics to be calculated are selected by clicking the ‘Options…’ button, which opens the ‘Options’ dialog box for the Descriptives function.
Figure 6: Options dialog box for the ‘Descriptives’ function dialog box
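The kind of summary produced for a numeric variable can be sketched in Python with the standard `statistics` module. The ages below are hypothetical, and the selection of statistics is only an assumption about what a user might choose in the options.

```python
import statistics

# Hypothetical ages of seven respondents.
ages = [23, 31, 27, 45, 38, 29, 31]

descriptives = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "mean": statistics.mean(ages),
    "std_dev": statistics.stdev(ages),  # sample standard deviation (n - 1)
}

for name, value in descriptives.items():
    print(f"{name:8} {value}")
```

Checking the minimum and maximum is a quick way to spot impossible values, such as an age typed with an extra digit.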
Other analytical functions in SPSS include chi-square tests, correlations, regression analyses, principal components and factor analyses, ANOVA, cluster analyses, general linear modelling and many more.
Figure 7. Choosing an appropriate statistical procedure (Source: Corston and Colman, 2000)
Functions such as crosstabs, general tables, multiple response tables and tables of frequencies help to identify shortcomings and weaknesses in the data set at the start of the analysis, so that they do not limit its statistical soundness. Crosstabs are also an efficient way of presenting data in summarized form in research and project reports.
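A cross-tabulation is simply a count of cases in each combination of two categorical variables. The Python sketch below builds one for hypothetical data on sex and marital status; the layout is illustrative and does not reproduce SPSS output.

```python
from collections import Counter

# Hypothetical cases: (sex, marital status).
cases = [
    ("male", "married"), ("female", "married"), ("female", "single"),
    ("male", "single"), ("female", "married"), ("male", "married"),
]

cell_counts = Counter(cases)
rows = sorted({sex for sex, _ in cases})
cols = sorted({status for _, status in cases})

# Print the table with row totals.
print(f"{'':8}" + "".join(f"{c:>9}" for c in cols) + f"{'Total':>9}")
for r in rows:
    counts = [cell_counts[(r, c)] for c in cols]
    print(f"{r:8}" + "".join(f"{n:>9}" for n in counts) + f"{sum(counts):>9}")
```

Empty or very small cells in such a table are an early warning that some categories may be too sparse for later analysis.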
The charting functions in SPSS also provide a number of techniques for the initial examination and presentation of data. Scatter plots can be used to identify quickly the presence and nature of any correlation between variables, while histograms give a graphical representation of the distribution of important variables.
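What a scatter plot and a histogram summarize can be shown numerically. The Python sketch below computes a Pearson correlation coefficient and the bin counts behind a histogram, printed as a simple text chart. The two variables and their values are hypothetical, and the bin width of 5 is an arbitrary choice.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def histogram(values, width):
    """Count values per bin; each bin starts at a multiple of `width`."""
    counts = {}
    for v in values:
        start = (v // width) * width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical data for six respondents.
years_schooling = [0, 5, 8, 10, 12, 15]
age_at_marriage = [16, 18, 19, 21, 22, 25]

r = pearson_r(years_schooling, age_at_marriage)
print(f"r = {r:.2f}")

bins = histogram(age_at_marriage, width=5)
for start, count in bins.items():
    print(f"{start:>2}-{start + 4:<2} {'*' * count}")
```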
Use of SPSS in demography
The descriptive use of demographic data was once perceived as a radical innovation. Bourdieu was the first anthropologist to use demographic statistics systematically in his fieldwork in parts of Algeria, where he recorded the frequencies of different types of marriage. He found that, of all the types, parallel cousin marriage had the lowest frequency. This analysis was of great importance because the areas studied had already been well documented through qualitative methods, and the quantitative representation, supported by diagrams and charts, depicted the marriage norms of these areas much more clearly. His analysis led to major revisions in theories of marriage strategies.
Optimal scaling is a modern version of correspondence analysis. It accommodates demographic data from varied participants and with varying measurement assumptions. The optimal scaling option in SPSS lets the user choose between analyses that treat all variables as having at best nominal measurement properties (multinomial) and analyses that take potential differences in measurement properties into account. Optimal scaling also provides a basis for collapsing response categories with small frequencies. Researchers often collapse categories within variables before entering demographic variables as independent variables (IVs) in univariate or multivariate parametric analyses. This reduces the likelihood of the analysis becoming unstable and unreliable because of one or more cells with fewer than 6-10 respondents or with less than 10% of the total number of participants. Collapsing response categories can therefore be a desirable intermediate step after the preliminary category-level analysis, supported by frequency and contingency table analysis of participant numbers. Collapsing also affects follow-up examinations of the social space revealed by the demographic data, because it reduces the complexity of the data. That reduction may be a loss or a gain, depending on the value of the more detailed categories.
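Collapsing sparse response categories can be sketched in Python as a simple recode. The function name, the threshold of 6 cases and the marital-status data are hypothetical, and merging everything small into a single "Other" category is only one possible scheme; which categories can sensibly be combined remains a substantive decision for the researcher.

```python
from collections import Counter

def collapse_small_categories(values, min_count, other="Other"):
    """Recode categories with fewer than `min_count` cases into `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Hypothetical marital-status responses from 32 participants.
status = ["married"] * 15 + ["single"] * 10 + ["widowed"] * 4 + ["divorced"] * 3

collapsed = collapse_small_categories(status, min_count=6)
print(Counter(status))
print(Counter(collapsed))
```

After the recode, no category has fewer than 6 cases, at the cost of no longer distinguishing widowed from divorced respondents.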
SPSS allows a researcher to describe relationships between two or more variables through graphs, diagrams, charts and other visual tools. To begin, optimal scaling is used to generate a spatial representation by entering the variables as multinomial variables. Multinomial optimal scaling performs homogeneity analysis, which quantifies categorical data by assigning numerical values to the objects and categories. The goal of homogeneity analysis is to illustrate the associations between two or more nominal variables in a low-dimensional space containing the variable categories as well as the objects in those categories. Objects in the same category are plotted close to each other, while objects in different categories are plotted far apart. Homogeneity analysis is also known as multiple correspondence analysis and can be seen as a principal components analysis of nominal data. It is preferable to standard principal components analysis when the relationships between variables are not linear or when the variables are measured at a nominal level.
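Multiple correspondence analysis is commonly described as starting from an indicator (dummy-coded) matrix of the nominal variables. The Python sketch below shows only that coding step, with hypothetical variables and cases; it does not perform the scaling that SPSS carries out to place objects and categories in a low-dimensional space.

```python
# Hypothetical cases with two nominal variables.
cases = [
    {"residence": "rural", "marriage": "cousin"},
    {"residence": "rural", "marriage": "other"},
    {"residence": "urban", "marriage": "other"},
    {"residence": "urban", "marriage": "other"},
]
variables = ["residence", "marriage"]

# One column per (variable, category) pair, in a fixed sorted order.
columns = sorted({(var, case[var]) for case in cases for var in variables})

# Indicator matrix: 1 if the case falls in that category, otherwise 0.
indicator = [[1 if case[var] == cat else 0 for var, cat in columns] for case in cases]

print(columns)
for row in indicator:
    print(row)
```

Cases with identical rows belong to exactly the same categories, which is why they end up at the same position in the plot.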
Applications of SPSS
SPSS has been around since the 1960s, and of the major packages available it seems to be the easiest and most user-friendly for statistical analysis. It can be used either through a Windows point-and-click approach or through syntax, by writing SPSS commands to perform the analysis. Each approach has its own advantages, and the user can switch between them conveniently. Many widely used social science data sets also come with an easy method for translating them into SPSS, which significantly reduces the preliminary work required to explore a new data set.
SPSS is robust statistical software capable of performing complex statistical tests. It is user-friendly, and the “stat coach” option in the online help menu makes it easier to interpret results.
Limitations of SPSS
Although SPSS is widely used by researchers and scientists across the world for its easy, user-friendly interface and functions, it has certain limitations that may hamper its use. Researchers may therefore use alternative software such as STATA, SAS or R alongside SPSS to meet their needs more efficiently. A fast processor is needed when working with charts in SPSS. Because the package offers so many click-and-use options, inexperienced users may apply inadequate or inappropriate methods of data analysis. It lacks some of the major regression analysis techniques. It is difficult to edit output within the output window; for editing, the output needs to be copied into Microsoft Word or another application. Users also face problems importing and exporting data in some specific formats. The software lags and becomes slow at times if the hardware is not up to date. It is difficult to present different figures and charts on the same scale for comparison (Weakness of spss. Available from: https://www.researchgate.net/post/Weakness_of_spss, accessed April 28, 2017).
Summary
This module shows that SPSS is an essential tool for data analysis, used extensively by social scientists as well as by researchers in the natural sciences. The first version, developed in 1968, has been revised over the years, and version 24 was launched very recently by IBM. The purpose of SPSS was to meet researchers’ needs for data analysis, and this objective remains intact. SPSS provides its users with various techniques for the examination and presentation of data. Scatter plots help in understanding the correlation between variables, histograms give a graphical representation of the data, and cross-tabulations present data in summarized form. SPSS is easy to use and user-friendly. It is robust and capable of performing complex statistical tests, and its stat coach option makes results easier to interpret. However, despite these advantages, SPSS has some shortcomings. Users experience lag if many functions are run at one time, so a faster processor is needed for efficient use. It lacks some major regression analysis techniques. Editing in the output window is also somewhat cumbersome and time-consuming.
References
- Corston, R. and Colman, A. 2000. A Crash Course in SPSS for Windows, Blackwell, Oxford.
- IBM Press release. 2009. Available from: http://www.spss.com/ibm-announce/, accessed April 28, 2017.
- KDnuggets. 2013, May. Annual Software Poll: Analytics/Data mining software used? KDnuggets online news. Available from: http://www.kdnuggets.com/polls/2013/analytics-big-data-mining-data-science-software.html, accessed April 28, 2017.
- Sachdev, A. 2009, September 27. IBM’s $1.2 billion bid for SPSS Inc. helps resolve trademark dispute. Chicago Tribune.
- Weakness of spss. Available from: https://www.researchgate.net/post/Weakness_of_spss, accessed April 28, 2017.
Suggested Readings
- Bourdieu, P. 1979. Algeria 1960: The disenchantment of the world, the sense of honour, the Kabyle House or the world reversed. Cambridge: Cambridge University Press.
- Bourdieu, P. 1984. Distinction: A social critique of the judgement of taste. London: Routledge & Kegan Paul.
- SPSS for Windows. 2003. Chicago: SPSS Inc.