26 Test and Retest in Anthropological Research

Henry Konjengbam and Sanjenbam Yaiphaba Meitei

epgp books

 

Contents:

 

1.0 Introduction

 

2.0 Test-retest reliability

 

3.0 Assumptions and Considerations

 

4.0 Advantages of test-retest methods

 

5.0 Disadvantages of test-retest methods

 

6.0 Test-retest correlation coefficient

 

7.0 Test-retest in Anthropological Research

 

Summary

 

 

Learning objectives:

 

To discuss in detail:

  •  What do you mean by reliability?
  •  What is the test-retest method for assessment of reliability?
  •  What are the advantages and disadvantages of the test-retest method for assessment of reliability?
  •  What is the test-retest correlation coefficient?
  •  How can we calculate the test-retest correlation coefficient? and
  •  Applications of test-retest reliability in Anthropological Research.

 

1.0 Introduction

 

Reliability is one of the most important elements of experiment or measurement quality in research. It deals with the consistency or reproducibility of a researcher's performance on a test or in data collection. Reliability is the extent to which measurements are repeatable when different researchers perform the same measurement or experiment, at different times and under different conditions. In sum, reliability is consistency of measurement (Bollen, 1989), or stability of measurement over a variety of conditions in which basically the same results should be obtained (Nunnally, 1978). Reliability indicates the extent to which measurement methods and procedures yield consistent results for given units of study in different circumstances. For example, if a researcher records the intelligence level of a selected sample several times, and each time the measurement produces a similar intelligence test score, that intelligence test has high reliability. Reliability is a property of the scores on a test for a particular unit of the research population. When talking about measurement in the context of research, there is an important distinction between being valid and being reliable. Validity refers to whether the measurement is correct, whereas reliability refers to whether the measurement is consistent. Reliability can be investigated directly from the test data; no data external to the measure are required. The basic issues of reliability lend themselves to mathematical analysis, and the amount of deviation can also be stated in mathematical terms. Reliability of a test can be assessed in a variety of ways. The test-retest method is one of the easiest ways to estimate reliability: the same test is given twice, after an interval of time, to the same individuals.
The most commonly used technique to estimate reliability is a measure of association, the correlation coefficient, which is often termed the reliability coefficient (Rosnow and Rosenthal, 1991). The reliability coefficient is the correlation between two or more sets of scores that measure the same thing, and it is discussed in detail in the following sections. Besides the test-retest method, the other methods of reliability assessment are as follows (https://www.socialresearchmethods.net/kb/reltypes.php).

 

  • Internal consistency method: It is used to assess the consistency of results across items within a test.
  • Parallel forms method: It is used to assess the consistency of the results of two tests constructed in the same way from the same content domain.
  • Inter-rater or Inter-observer method: It is used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.

2.0 Test-retest method for assessment of reliability:

 

The test-retest method examines the variation in measurements taken by a single researcher or instrument on the same parameters or experiment, under the same conditions, and within a short period of time. Test-retest reliability is the most common way to estimate the stability of a measure over time. It is conceptually and intuitively the simplest approach and the one that most closely corresponds to the view of reliability as the consistency or repeatability of a measure. That is, if a researcher takes the same measurement or experiment on the same people on more than one occasion (i.e. a first test occasion and a retest occasion), the correlation between the test measurements is the test-retest reliability. Test-retest reliability is limited because scores may improve due to practice or learning effects.

The reliability of a set of scores is the degree to which the scores result from systematic rather than chance or random factors. The test-retest method measures the proportion of the variance among scores that is a result of true differences. True differences, here, refer to the actual differences, not the measured differences.
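The idea of a proportion of variance due to true differences is often written using the classical true-score model. The text above does not name this model, so the symbols below (observed score X, true score T, random error E) and the assumption that T and E are uncorrelated are assumptions of this sketch rather than the authors' own notation.

```latex
\begin{align}
X &= T + E \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
r_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
\end{align}
```

Under these assumptions, the reliability coefficient approaches 1 as the error variance becomes small relative to the total variance of the observed scores.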

  • Test-retest correlation coefficient:

The test-retest reliability of a measure is estimated through the reliability coefficient. The reliability coefficient is often considered a measure of the accuracy of a test or measuring instrument, obtained by measuring the same individuals twice and computing the correlation of the two sets of measures (https://www.merriam-webster.com/dictionary/reliability%20coefficient). A test-retest coefficient assumes that the characteristic being measured by the experiment is stable over time, so any change in scores from one time to another is caused by random error. The error may be caused by the condition of the participants themselves or by the testing conditions. The test-retest coefficient also assumes that there is no practice effect or memory effect. The correlation coefficient obtained by this procedure is called the test-retest reliability correlation coefficient. It is also sometimes referred to as the coefficient of stability.

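The test-retest reliability correlation coefficient is usually calculated with the Pearson correlation coefficient when the measurement or experiment is taken at two time points, whereas measurements or experiments performed on more than two occasions are usually analysed with the intraclass correlation. For two occasions, it is calculated by the formula given below:

r = [N∑XY − (∑X)(∑Y)] / √{[N∑X² − (∑X)²] [N∑Y² − (∑Y)²]}

where X denotes the test scores, Y the retest scores and N the total number of pairs of test and retest scores.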

 

The test-retest reliability correlation coefficient can be classified into various levels of reliability based on the calculated value of ‘r’. Some of the important levels of reliability are given below:

 

  • Perfect reliability: Test-retest reliability is said to be perfect when the correlation value is exactly 1.
  • Excellent reliability: Test-retest reliability is said to be excellent when the correlation value is greater than or equal to 0.9.
  • Good reliability: Test-retest reliability is said to be good when the correlation value is between 0.8 and 0.9.
  • Acceptable reliability: Test-retest reliability is said to be acceptable when the correlation value is between 0.7 and 0.8.
  • Questionable reliability: Test-retest reliability is said to be questionable when the correlation value is between 0.6 and 0.7.
  • Unacceptable reliability: Test-retest reliability is said to be unacceptable when the correlation value is less than or equal to 0.5.
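The levels listed above can be expressed as a small lookup function. This is only a sketch: the function name `reliability_level` is hypothetical, assigning each boundary value (0.9, 0.8, 0.7, 0.6) to the higher level is an assumption, and values between 0.5 and 0.6 are reported as "unclassified" because the list does not assign them a level.

```python
def reliability_level(r):
    """Map a test-retest correlation value to the levels listed in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r == 1.0:
        return "perfect"
    if r >= 0.9:
        return "excellent"
    if r >= 0.8:        # assumption: a boundary value counts as the higher level
        return "good"
    if r >= 0.7:
        return "acceptable"
    if r >= 0.6:
        return "questionable"
    if r <= 0.5:
        return "unacceptable"
    return "unclassified"  # 0.5 < r < 0.6 is not covered by the list


for value in (0.98, 0.76, 0.55, 0.30):
    print(value, reliability_level(value))
```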

Assumptions and considerations for test-retest method:

The applicability of test-retest reliability depends on two primary assumptions:

  • The first assumption is that the participants’ true score is stable across the two testing occasions. That is, the researcher must be confident that respondents’ true scores do not change from the first measurement to the second measurement.
  • The second assumption is that the error variance of the first test is equal to the error variance of the second test.

Condition for Test-Retest Method:

 

According to the Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results (1994), the following conditions need to be fulfilled when applying the test-retest method:

  • The same experimental tools.
  • The same observers.
  • The same measuring instrument, used under the same conditions.
  • The same location.
  • Repetition over a short period of time.
  • The same objectives.

 

 

3.0 Advantages of test-retest method:

 

  • The test-retest method is appropriate for determining the reliability of tests designed to measure attributes that are relatively stable over time and that are not affected by repeated measurement.
  • The test-retest method is appropriate for a test of aptitude, which is a stable characteristic, but not for a test of mood.
  • Good test-retest reliability signifies the internal validity of a test and ensures that the measurements obtained in one sitting are both representative and stable over time.
  • Test-retest reliability ensures consistency of results, as a test with high test-retest reliability will produce similar results every time it is administered.
  • Test-retest reliability is important because if a study is low in it, the results may be due to factors other than the manipulation of the independent variable.
  • Only the test itself is required, unlike other methods of estimating reliability that require more than one form.
  • The sample of items or stimulus situations is held constant, which would seem to minimize the possibility of measuring traits other than those the instrument is designed to measure.

 

4.0 Disadvantages of test-retest method:

  • Memory: When the interval between the first and second test is too short, respondents might remember what was on the first test and their answers on the second test could be affected by memory. As an example, subjects may learn something just from taking a test and thus will react differently on the second taking of the test (adapted from
  • Maturation: When the interval between the first and second test is too long, maturation may occur. Maturation refers to changes in subject factors or in the respondents that occur over time and cause a change from the initial measurements to the later measurements.

Suppose that the researcher is assessing participants’ second language pronunciation abilities. Administering the instrument again two weeks later should produce similar results if the instrument is reliable. However, if there is a month or two between testing sessions, any training on pronunciation may create differences between the two sets of scores that would depress the reliability coefficient (adapted from Tavakoli, 2012).

  • Reactivity: Reactivity can affect the test-retest method. Reactivity refers to the fact that sometimes the very process of measuring a phenomenon can induce change in the phenomenon itself. Thus, in measuring a person’s attitude at time 1, the person can be sensitized to the subject under investigation and demonstrate a change at time 2 that is due solely to the earlier measurement. As an example, if a person is interviewed about the likelihood of voting in an approaching election at time 1, the person might decide to vote (at time 2) and cast a ballot (at time 3) merely because he or she has been sensitized to the election. In this case, the test-retest correlation will be lower than it would be otherwise because of reactivity (adapted from Tavakoli, 2012).

7.0 Test-retest in Anthropological Research

 

The test-retest method of reliability assessment can be applied successfully in various kinds of anthropological research, both qualitative and quantitative. However, it has limitations in qualitative research that is completely descriptive in nature, because such research is difficult to quantify. Only qualitative research that can be quantified is eligible for test-retest reliability. To make the concept of test and retest in anthropological enquiry clearer, it is explained below with examples.

 

Example 1: Test-retest reliability in Quantitative research.

Let us measure the height of 10 students using a standard instrument, i.e. an anthropometer. After the first test, a retest is conducted after 5 days to check the reliability of the measurement and the instrument using the test-retest reliability correlation coefficient. Table no. 2 presents the test and retest scores of the height of the 10 students taken at two different times. Here, “X” represents the height of the students taken at the 1st time point (test scores) and “Y” represents the height of the students taken at the 2nd time point (retest scores). “∑X” represents the sum of all the test scores and “∑Y” the sum of all the retest scores. “∑XY” represents the sum of the products of each pair of test and retest scores. “X²” represents the square of each test score and “Y²” the square of each retest score. “∑X²” represents the sum of the squared test scores and “∑Y²” the sum of the squared retest scores.

 

The Pearson correlation is best suited to calculating the test-retest reliability coefficient. The values of ∑X², ∑Y², ∑X, ∑Y and ∑XY are therefore substituted into the Pearson correlation formula:

r = [N∑XY − (∑X)(∑Y)] / √{[N∑X² − (∑X)²] [N∑Y² − (∑Y)²]}

Where, N = total number of pairs of test and retest scores.

X = test scores, Y = retest scores.

 

The resulting value of the correlation coefficient is r = 0.98, which is close to +1, suggesting that there is an excellent positive correlation between the test and retest scores.

 

Example 2: Test-retest method in Qualitative research

 

Qualitative research items are of considerable importance for socio-cultural anthropology. Questionnaire items with qualitative dichotomies, where answers are either “right” or “wrong”, and items with more than two response options can be assigned numerical values and added up over a set of items to yield a total test score. The reliability of the total scores should then be studied by the techniques used for quantitative variables, but this does not change the fact that the original items were qualitative.

 

Let us take a questionnaire for the test-retest method. Each response on the questionnaire is assigned a rating score, which is necessary for calculating the test-retest reliability correlation coefficient. Let us assume that information on the perceived stress scale is collected from respondents on two different occasions with a gap of 10-15 days using Cohen’s questionnaire (given in Box 1).
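Assigning rating scores to responses and adding them up to a total can be sketched as follows. Because Box 1 is not reproduced here, the response labels, the 0 to 4 rating values, the function name `total_score` and the optional reverse-scored items are all assumptions made for illustration; they should be replaced by the scoring rules of the questionnaire actually used.

```python
# Hypothetical five-point rating scale (assumption; not taken from Box 1).
RATING = {
    "never": 0,
    "almost never": 1,
    "sometimes": 2,
    "fairly often": 3,
    "very often": 4,
}


def total_score(responses, reverse_items=()):
    """Convert one respondent's answers to numbers and sum them.

    responses     -- answers in item order (item 1 first)
    reverse_items -- item numbers whose rating is reversed (hypothetical)
    """
    max_rating = max(RATING.values())
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        score = RATING[answer.strip().lower()]
        if item_number in reverse_items:
            score = max_rating - score  # e.g. 0 becomes 4, 4 becomes 0
        total += score
    return total


# One hypothetical respondent on two occasions; the totals are that
# respondent's test score (X) and retest score (Y).
first_occasion = ["sometimes", "never", "very often", "almost never"]
second_occasion = ["sometimes", "almost never", "fairly often", "almost never"]
x = total_score(first_occasion, reverse_items={2})
y = total_score(second_occasion, reverse_items={2})
print(x, y)
```

Repeating this for every respondent gives the paired X and Y totals to which the Pearson formula is then applied.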

Table no. 3 presents the test and retest scores of the questionnaire, calculated using the rating score assigned to each response. Here, “X” represents the test score of each questionnaire taken at the 1st time point and “Y” represents the retest score of the questionnaire taken at the 2nd time point. “∑X” represents the sum of all the test scores and “∑Y” the sum of all the retest scores. “∑XY” represents the sum of the products of each pair of test and retest scores. “X²” represents the square of each test score and “Y²” the square of each retest score. “∑X²” represents the sum of the squared test scores and “∑Y²” the sum of the squared retest scores.

The resulting value of the correlation coefficient is r = 0.76, which lies between 0.7 and 0.8, suggesting that there is an acceptable positive correlation between the test and retest scores.

From the above two examples, it is clear that the test-retest method for assessment of reliability can be applied in various anthropological works, whether in physical anthropology, social anthropology or prehistoric archaeology. Researchers and students should keep in mind the limitations of the test-retest method while analyzing the reliability of a measurement or test, especially one of a descriptive nature.

Summary:

Reliability is an index that estimates dependability (consistency) of scores.

The test-retest method examines the variation in measurements taken by a single person or instrument on the same item, under the same conditions, and within a short period of time.

Good test-retest reliability signifies the internal validity of a test and ensures that the measurements obtained in one sitting are both representative and stable over time.

There are various advantages and some disadvantages of the test-retest method for assessment of reliability. The disadvantages include memory effects, maturation and reactivity of the participant. These three points should be kept in mind while assessing test-retest reliability.

The applicability of test-retest reliability depends on two primary assumptions:

  •  The first assumption is that the participants’ true score is stable across the two testing occasions. That is, the researcher must be confident that respondents’ true scores do not change from the first measurement to the second measurement.
  •  The second assumption is that the error variance of the first test is equal to the error variance of the second test.

Test-retest reliability is estimated by using the reliability coefficient. The most commonly used formula for assessment of test-retest reliability is Pearson’s correlation coefficient formula.

The test-retest method for assessment of reliability can be applied in various anthropological investigations.
