25 Sample Size and its Determination
R. Saratha
1. Introduction:
A good research study depends on a number of ingredients, and the sampling technique is one of them. Beyond the technique the researcher adopts in selecting the sample, the sample size is often debated as a vital factor that makes the study meaningful. There is a notion that a large sample makes a study more robust, but the counter-argument is that a large sample alone does not make a study effective and that many other key parameters of research must also be looked into. This chapter deals with sample size considerations for various types of research designs.
2. Learning Objectives:
At the end of this session you will be able to:
- Conceptualize sample size and how it varies with factors such as the variables, the research design and the measurement tools.
- Understand the principles that govern sample size.
- Know the various methods of estimating the sample size.
3. FACTORS SIGNIFICANT IN SAMPLE SELECTION
Before describing these in detail, it is essential to review the concept of sample selection in the light of the vital factors that make research better. These factors are as important in a research study as the consideration of sample size.
A good research study addresses the following key factors:
- Clear Objectives
- Research questions that can be effectively addressed
- Design of the study clearly linked to the objectives and research questions
- Sampling procedures
- Tools used in the study
- Clear definition and classifications of Variables
- Data gathering procedures
- Hypotheses relevant for the study
- Data analysis procedures as per the design and samples
The focus of the objectives has an impact on sample selection too. If the objective of the study is exploratory in nature, selecting a large sample is inevitable, whereas a researcher working on confirmatory research may work with a small sample.
The research questions of the study are linked to the hypotheses, and the divergent nature of the questions may also determine the sample requirement.
The sample selection for the study depends on some of the above factors and therefore, sample size should be viewed in conjunction with all vital parameters of research.
The terms “large sample” and “small sample” are generally used in discussions on sample size, and one may ask whether “large” and “small” can be quantified. In fact, quantifying these terms is difficult because they are research specific.
For example, an experimental research may treat even 10 samples as large for the particular study as working with every individual requires a lot of time and better controls as well. On the other hand a researcher may consider even 50 samples as small if the research is aimed at finding out the opinions of students about objective type questions in examinations. 500 samples may also be considered small if a State-wide survey of teachers is conducted about a revised curriculum. Therefore, the terms “large” and “small” samples used in this chapter and also in general should be viewed in the context of the specific topic under discussion.
4. SAMPLE SIZE AND VARIABLES
Assume that the researcher wants to use four variables, namely locality (rural/urban/suburban), type of management (Government/private), type of school (primary/secondary/higher secondary) and gender (boys/girls), with stratified random sampling. For this purpose, the researcher has to gather data from 3 × 2 × 3 × 2 = 36 sub-groups, as per the following table:
Let us also assume that the researcher wants at least 20 subjects in each sub-group, as he/she is interested in using a sizable sample for applying parametric statistics; this means at least 36 × 20 = 720 subjects in all. A closer look at this design reveals the complications involved in selecting the sample, which are enumerated as follows:
As the researcher is interested in applying parametric statistics, he/she should ensure that the population of each sub-group is normally distributed, that the variances among the sub-groups are equal and that the samples are independent of each other. This implies that the population of each sub-group should be large enough for 20 students to be randomly selected from it. As is evident, the type of statistical analysis proposed also decides the size of the sample.
Besides the statistical procedure, the number of variables and the levels within each variable also decide the sample size. In this research the researcher uses four variables; dropping one variable will decrease the number of sub-groups and may therefore call for fewer subjects. A review of the related literature is of utmost importance for finding out the inter-relations between the variables, so that a few well-chosen variables are used instead of simply adding more variables to the study. Sometimes an analysis of the impact of the variables might reveal that some variables have identical effects, in which case eliminating one of them will not alter the results of the study to a large extent.
The levels within each variable also have an impact on the sample size. For example, the variable locality has three levels, namely rural, urban and suburban. Reducing the three levels to two, by dropping or merging one of them, would have an impact on the sample size. The above design, after leaving out the level “suburban”, may be redesigned with 2 × 2 × 3 × 2 = 24 sub-groups as follows:
The type of school also has three levels, and bringing this down to two levels may reduce the sample further. It should be noted that the number of variables, or the levels within the variables, should not be altered solely for the purpose of reducing the sample size; this can be done only if the results are not going to be affected to a large extent. The design after leaving out one of the levels, “higher secondary”, will be as follows, provided the sample size in each sub-group does not change:
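The sub-group counts in this section follow from multiplying the number of levels of each variable. The short Python sketch below makes that arithmetic explicit. The function name and the dictionary layout are illustrative assumptions, the figure of 20 subjects per sub-group is the one assumed earlier in this section, and the third design assumes that both reductions are applied together.

```python
from itertools import product

def subgroups(design):
    """Return every combination of levels (one sub-group per combination)."""
    return list(product(*design.values()))

# Full design described in the text: 3 x 2 x 3 x 2 levels.
full = {
    "locality": ["rural", "urban", "suburban"],
    "management": ["Government", "private"],
    "school": ["primary", "secondary", "higher secondary"],
    "gender": ["boys", "girls"],
}

# Same design with the "suburban" level left out.
no_suburban = dict(full, locality=["rural", "urban"])

# Additionally leaving out "higher secondary" (assumed to be cumulative).
both_reduced = dict(no_suburban, school=["primary", "secondary"])

PER_SUBGROUP = 20  # minimum subjects per sub-group assumed in the text

for name, design in [
    ("full design", full),
    ("without suburban", no_suburban),
    ("without suburban and higher secondary", both_reduced),
]:
    k = len(subgroups(design))
    print(f"{name}: {k} sub-groups, at least {k * PER_SUBGROUP} subjects")
```

Dropping a single level from one variable removes a third of the sub-groups here, which is why the choice of variables and levels has such a direct bearing on the sample size.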
Having too many sub-groups in the study may be good from the point of view of analysis but gathering data from many such groups involves time and resources too. Therefore, the researcher should decide to what extent the resources and time allow data collection and this has an impact on the sample size too.
In summary, selection of the right kind of variables in the study will definitely have a bearing on the sample size and this demands a thorough knowledge of the research area. A researcher who has an insight about the correlation between potential variables will be able to narrow down to a few but significant variables for the study instead of including too many variables.
5. SAMPLE SIZE AND THE DESIGN OF THE STUDY
The research design too has an impact on the sample size. Assume that the researcher is interested in an in-depth study of behavioral changes in children and uses single-subject research or a time series analysis. In such studies a few subjects are studied over a period of time. Here the methodology should be robust, as it deals with a small sample, and the data gathering should be meticulous, as the researcher may also try to generalize patterns of behavioral change on the basis of a few case studies.
Some researchers may try to do a sample survey on specific issues and for this a reasonable number of participants will be necessary. Survey research is not done with a small sample. Opinions on specific concepts, social issues, etc., can be studied when the sample is fairly large.
Experimental studies do not involve a large sample but better control of the experimental conditions is imperative to make the results effective. Experimental designs generally satisfy the following conditions:
- Two groups, experimental and control, are involved in the experimental research
- The samples in both the experimental and control groups are randomly selected.
- Both groups are administered pre-tests and post-tests
- The experimental group alone gets the treatment
These strict stipulations keep the sample size fairly small, as better controls are necessary. Good experimental studies involving as few as 5 students are reported in the literature; therefore, the researcher may work with fewer sampling units if the characteristics of experimental design are satisfied.
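The random element of such a design can be sketched in a few lines of Python. This is only an illustration of randomly allocating an already selected sample to the two groups. The function name, the roster of ten numbered students and the seed are hypothetical, not part of any study described here.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into experimental and control groups."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical roster of 10 students, identified by number.
students = list(range(1, 11))
experimental, control = assign_groups(students, seed=42)
print("Experimental:", sorted(experimental))
print("Control:     ", sorted(control))
```

Both groups would then take the pre-test and post-test, while only the experimental group receives the treatment.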
6. SAMPLE SIZE AND THE MEASUREMENT TOOL OF THE STUDY
The sample size of a research study may also be linked to the type of measurement tool used by the researcher. For example, the interview method may not involve a huge sample, as it takes time and human resources. Some interviews may be conducted online, but this method too involves significant time, which limits the number of samples that can be covered.
In other research, the investigator might use observation techniques to note behavioral changes, and this type of research too may not involve a large sample. On the other hand, questionnaires, rating scales, inventories, etc., when used as measurement tools, may involve a large sample.
Some statistical analysis procedures which have implications on the size of the sample are briefly mentioned as follows:
- Analysis of Variance: Procedure used to compare the means of more than two groups in a study. Using ANOVA instead of a series of pair-wise comparisons keeps the overall error rate under control. ANOVA uses the F test.
- Analysis of Covariance: Adjusting the post-test scores on the basis of one or more covariates (for example, the pre-test scores) and then comparing the adjusted scores of the experimental group and the control group.
- Canonical Correlation: When the researcher is interested in computing correlation between a set of independent variables and a set of dependent variables, the canonical correlation technique is used.
- Multiple Regression: The process of using more than one predictor in order to explain the variance in the criterion is called multiple regression.
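To show how the sample size enters one of these procedures, the following Python sketch computes the F ratio of a one-way ANOVA by hand. The scores are hypothetical and chosen only to keep the arithmetic easy to follow, and the function name is an assumption. The within-group degrees of freedom equal the total number of subjects minus the number of groups, so they grow directly with the sample.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical scores for three groups (illustrative only).
scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, df_b, df_w = one_way_anova_f(scores)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

The computed F value is then compared with the tabled F value for the same degrees of freedom at the chosen level of significance.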
7. PRINCIPLES OF CALCULATING SAMPLE SIZE:
Consequential research requires an understanding of the statistics that drive sample size decisions. Before you can calculate a sample size, you need to determine a few things about the target population and the sample you need:
- Population Size: First and foremost, a researcher needs to know how many subjects or units fit the demographic under study, since the sample is drawn from this population.
- Margin of Error (Confidence Interval): No sample can be perfectly accurate. Hence the researcher needs to decide how much error can be allowed, that is, how far the sample mean may fall from the population mean. This margin, applied on either side of the sample estimate, gives the confidence interval. A common margin of error is ±5%.
- Confidence Level: This is the degree of confidence that the population mean actually falls within the confidence interval. The usual confidence levels are 90%, 95% and 99%.
- Standard Deviation: This refers to the variability expected in the responses. As it has to be fixed before the survey is conducted, it is always safe to use a standard deviation of 0.5, which is the most forgiving number and ensures that the sample will be large enough.
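One common way of combining these four quantities is Cochran's formula with a finite population correction. The chapter does not name a formula at this point, so the Python sketch below is an assumption about how the inputs are typically used, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def survey_sample_size(confidence=0.95, margin=0.05, sd=0.5, population=None):
    """Estimate the sample size needed for a given margin of error."""
    # Two-sided z-score for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for a large population.
    n = (z ** 2) * (sd ** 2) / (margin ** 2)
    if population is not None:
        # Finite population correction: a small population needs fewer units.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

print(survey_sample_size())                  # large population
print(survey_sample_size(population=1000))   # population of 1000 units
```

With a 95% confidence level, a ±5% margin of error and a standard deviation of 0.5, the sketch gives 385 for a very large population and 278 for a population of 1000. A higher confidence level or a narrower margin of error raises the required sample size.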
8. METHODS OF ESTIMATING SAMPLE SIZE
In order to calculate the sample size, it is required to have some idea of the results expected in a study. In general, the greater the variability in the outcome variable, the larger the sample size required to assess whether an observed effect is a true effect. On the other hand, the more effective (or harmful!) a tested treatment is, the smaller the sample size needed to detect this positive or negative effect. Estimating the sample size for a trial requires four basic components:
8.1 The type I error (alpha). Most research studies are performed on a sample from a population rather than on the whole study population. In research, we test hypotheses to determine whether (the results in) particular samples differ from each other. On the one hand, the null hypothesis (H0) states that the groups of subjects (samples) being compared are not different, that is, they come from the same source population. The alternative hypothesis (H1), on the other hand, states that these groups are different and therefore appear to be drawn from different source populations.
Sample size calculations are needed to define the number of subjects at which it becomes quite unlikely that adding more subjects will change the conclusion. In the process of hypothesis testing, two fundamental errors can occur. These are called type I and type II errors. The type I error (alpha) is the probability of a false-positive conclusion: concluding that the samples differ when H0 is in fact true and the differences found arose by chance in samples from the same source population. Alpha is commonly set at 0.05 or 0.01.
8.2 Power. Instead of a false-positive conclusion, investigators can also draw a false-negative conclusion. In such cases, they conclude that there is no difference between two groups or treatments when in reality there is one; in other words, they falsely accept the H0 that the compared samples come from the same source population. This is called a type II error (beta). Conventionally, beta is set at 0.20, meaning that the researcher accepts a 20% chance of a false-negative conclusion. For the calculation of the sample size, one needs to know the power of the study, which is the complement of beta (1 − beta), that is, 0.80 or 80% when beta is 0.20.
8.3 The smallest effect of interest. The smallest effect of interest is the minimal difference between the studied groups that the investigator wishes to detect and is often referred to as the minimal relevant difference.
For example, if body weight is the outcome of a trial, an investigator could choose a difference of 5 kg as the minimal relevant difference. In a trial with a binary outcome, for example the effect of a nutrient on the development of a child/infant (yes/no), an investigator should estimate a relevant difference between the event rates in both treatment groups and could choose, for instance, a difference of 10% between the treatment group and the control group as minimal relevant difference.
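The way alpha, power and the minimal relevant difference interact can be sketched with the widely used normal-approximation formula for comparing two group means. This formula is not given in the chapter, so it is offered only as an assumption. The standard deviation of 10 kg is a hypothetical value standing in for the variability discussed in 8.4, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(alpha, power, sd, difference):
    """Subjects needed in each of two groups to detect `difference`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)

# 5 kg minimal relevant difference from the text; SD of 10 kg is hypothetical.
print(n_per_group(alpha=0.05, power=0.80, sd=10, difference=5))
# A larger difference of interest needs far fewer subjects.
print(n_per_group(alpha=0.05, power=0.80, sd=10, difference=10))
```

Under these assumptions about 63 subjects per group are needed to detect a 5 kg difference, while doubling the difference of interest cuts the requirement to roughly a quarter.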
8.4 The variability. Finally, the sample size calculation is based on the population variance of a given outcome variable, which, in the case of a continuous outcome, is estimated by means of the standard deviation (SD). Because the variance is usually an unknown quantity, investigators often use an estimate obtained from a pilot study or information from a previously performed study.
Let us take an example of an experimental study and see how the sample size is calculated. As two groups are involved in the research, the test of significance is usually done by applying the “t” test. For applying the “t” test, the level of significance is important. The researcher may use the 0.01 level or the 0.05 level. Let us assume that the researcher in this case takes 0.01 as the significance level. Also assume that the standard deviation of the test administered to the experimental group is 12. The “t” table shows that 2.7 is the value required if the test is administered to about 45 cases at a significance level of 0.01. Assume that the estimated mean difference between the experimental and control groups is 6. Then the sample size (N) can be estimated by using the formula
$$N = \frac{2\,(SD)^2 \times (\text{Expected t-value})^2}{(\text{Estimated mean difference})^2}$$
Using the above formula, we get the estimated sample size as follows:
$$N = \frac{2 \times 12^2 \times 2.7^2}{6^2} = \frac{288 \times 7.29}{36} = 58.32$$
Rounding up, N is 59.
This estimation works when the researcher takes full control of the independent, dependent and extraneous variables explained in the earlier sections of this chapter.
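The formula above is easy to check with a few lines of Python. The function name is an illustrative assumption, and the inputs are the values of the worked example (SD = 12, t = 2.7, mean difference = 6), for which 288 × 7.29 / 36 evaluates to 58.32 and rounds up to 59.

```python
import math

def estimated_sample_size(sd, t_value, mean_difference):
    """N = 2 * SD^2 * t^2 / (mean difference)^2, as in the formula above."""
    return 2 * sd ** 2 * t_value ** 2 / mean_difference ** 2

n = estimated_sample_size(sd=12, t_value=2.7, mean_difference=6)
# Round up, since a fraction of a subject cannot be sampled.
print(round(n, 2), "->", math.ceil(n))
```

Trying other inputs shows how sensitive the estimate is: halving the expected mean difference quadruples N, while halving the standard deviation reduces N to a quarter.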
In summary, sample size is a relative concept which should be viewed in the context of the specific study and of the purposes for which such studies are conducted. As indicated in this chapter, the sample cannot be considered in isolation; its interlinkage with all the ingredients of research should be taken into account in determining the quantity as well as the quality of the sample.
9. SUMMARY
In this lesson, we have discussed the importance of sample size in research, its determination, the principles of calculating sample size and the methods of estimating it. As you understand by now, the sample size for a research study depends on a number of parameters such as the nature of the research, the objectives, the analysis technique, etc. Not all research studies warrant a large sample, whereas an adequate sample is essential when the researcher tries to compare groups with the help of specific statistical procedures. In an experimental study, even a small sample will work if the control factors in the research are handled effectively. In applying parametric statistics, the researcher should ensure that the population of each sub-group is normally distributed, that the variances among the sub-groups are equal and that the samples are independent of each other; these conditions determine the size of the sample. The confidence level used in a research study is also a determining factor of sample size. Stricter confidence limits warrant better control of variables and a larger sample too, as more accuracy is needed for the effective generalisation of the results to the population. We have also discussed formula-based methods of deciding the sample size, but the researcher should not rely on this technique alone and should also take into account factors such as the design, the tools used in the study, the variables used, the type of statistical procedures used, etc., in deciding the sample size.