28 Biostatistics

Dr. Jaspreet Kaur

Contents of the unit

Introduction
Definition-Biostatistics
Definition-Epidemiology
Applications of Epidemiology
Basic Measurements in Epidemiology
Measures of Disease frequency
Measures of Mortality
Dynamics of Disease Transmission
Modes of Transmission
Errors in Epidemiological Research
Limitations of Statistics
Let's Sum-up
Self assessment
Suggested reading

Learning objectives

By the end of this unit, you will be able to:

Define biostatistics and its scope
Explain the basic concepts in epidemiology
Describe the applications of epidemiology
Identify the errors in epidemiological research
Explain the limitations of statistics

Introduction

A human population cannot be studied without compiling data, which are usually in the form of numbers. For these numbers to yield a better understanding of population dynamics, a tool is required. This tool is called statistics: the numerical representation of facts and figures in a way that enables meaningful interpretation on a valid scientific basis. This module explains the meaning and scope of biostatistics, the definition, scope, and applications of epidemiology, disease transmission and its dynamics, and errors in epidemiological research. The last section summarizes the entire module and ends with a self-check exercise.

1.1 Statistics and Biostatistics

The word statistics comes from the Latin word ‘status’, meaning a political state, and in earlier days statistics was mainly used for administering the affairs of the state, for example budgeting, taxation, counting the labourers working in a factory, or estimating food requirements. A.L. Bowley defined statistics as numerical statements of facts in any department of enquiry placed in relation to each other. However, this definition is incomplete, as it does not take into account other aspects of statistics such as analysis and interpretation. A more complete definition was given by Croxton and Cowden, who defined statistics as the science of collection, presentation, analysis, and interpretation of numerical data. This definition clearly shows the four stages:

Collection of data
Presentation of data
Analysis of data
Interpretation of data

Biostatistics, in simple terms, is the branch of statistics dealing with data relating to living organisms. It is the application of statistical methods to solving medical, biological, and public health problems. The term biometry is used synonymously with biostatistics. Statistics is an applied discipline, and though its roots lie in mathematics, its branches touch nutrition, medicine, biology, and public health. The foundation of statistics, as well as biostatistics, is the concept of variation: no two things in this universe are alike. Examples include variation in weight and height across populations and individuals, and in the blood pressure or pulse rate of patients. Biostatistics is also useful in clinical trials, where a scientist can assess the effectiveness of a certain drug on a particular parameter. Thus, biostatistics helps us:

Understand the relationship between a disease and the factors associated with it
Understand and enumerate the occurrence of various diseases
Understand the disease etiology

Statistics is descriptive when the numerical data collected are presented in tabular and pictorial forms and express the nature of the variable under study in terms of frequency, distribution, prediction equations, etc. On the other hand, when inferences are drawn about the entire population or lot on the basis of a sample, it is called inferential statistics. The data collected can be of two forms: quantitative and qualitative. Quantitative data include measurements like height, weight, blood pressure, and pulse rate; qualitative data include characteristics like disease severity, socio-economic class, and the presence or absence of a symptom or disease. The two types of data are subjected to statistical methods specific to each kind. Methods for quantitative data include averages, correlation, regression, tests of significance, and the ‘t’ and ‘F’ statistics, whereas methods for qualitative data include chi-square and order statistics.
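The distinction between methods for quantitative and qualitative data can be illustrated with a minimal sketch. The height values and the 2x2 table of counts below are hypothetical, and the chi-square statistic is computed with the usual shortcut formula for a 2x2 table without continuity correction; this is only one of several ways such data might be analysed.

```python
import statistics

# Quantitative data (hypothetical heights in cm): summarised by descriptive measures.
heights_cm = [150, 152, 155, 158, 160, 162, 165, 170]
mean_height = statistics.mean(heights_cm)
sd_height = statistics.stdev(heights_cm)  # sample standard deviation


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Qualitative data (hypothetical counts): disease present/absent by exposure.
# exposed: 30 diseased, 70 healthy; unexposed: 15 diseased, 85 healthy
chi_square = chi_square_2x2(30, 70, 15, 85)

print(f"Mean height: {mean_height} cm, SD: {sd_height:.2f} cm")
print(f"Chi-square (1 degree of freedom): {chi_square:.2f}")
```

A chi-square value above 3.84 (the 5% critical value for one degree of freedom) would usually be read as evidence of an association between exposure and disease in such a table.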

Some of the tests frequently used in statistical enquiry are shown in the tables below:

The data collected in a study are thus subjected to statistical analysis based on the aims and objectives, and presented in a suitable format for further interpretation. Interpretation is important because the figures obtained are influenced by a number of correlated factors. After the preliminary evaluation of the results, one can plan more detailed statistical analyses, such as ANOVA or tests of significance between different groups or populations.

1.2 Examples in biostatistics

Smoking causes cancer, but not everyone who smokes develops cancer, and some people who never smoke do develop it. There is an element of uncertainty in this cause-and-effect relationship, which has to be appropriately addressed by studies investigating it.
In designing health facilities, planners often need to consider changes in the standards of care, demographic shifts, and the incorporation of new technologies.
In studying the effect or pharmacokinetics of a new drug, one needs to examine closely the changes in the level of the drug in the serum/blood over a defined period of time after giving a fixed dose. Any individual variability, error in theoretical prediction, or measurement error can harm the entire study and prevent it from yielding accurate results.

1.3 Epidemiology

“I keep six honest serving men; they taught me all I know. Their names are what, why, when, how, where and who.”

The word epidemiology is derived from the Greek words: epi “upon”, demos “people”, and logos “study”. It is defined as “the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems” (John M. Last, 1988). Hence, it is the study of the pattern of a particular disease in the population and the factors influencing this pattern. Epidemiologists are not only concerned with death or illness, but also with improving the overall health status, and means to improve health.

Thus, the four objectives of epidemiology are:

Describe the distribution and extent or magnitude of health and disease, and of their determinants, in a given population (descriptive study).
Analyse the major determinants and trace the natural history of a disease, to understand how the disease spreads and its course in individuals and groups (analytic study).
Assess the three E’s, i.e. efficiency, efficacy, and effectiveness, of the methods used to prevent, cure, and alleviate the disease (intervention study).
Evaluate the process and outcome of the services related to the above purposes (health services research).

1.4 Applications of Epidemiology

Describing the health of a particular population group in terms of five major attributes: measuring the disease burden, following trends in incidence and prevalence, studying any changes in the character of the disease, identifying special risk groups, and defining functional ability or impairment.
Describing the natural history of the disease, i.e. the course and outcome of the disease in individuals and groups. This includes studying the complete aetiology of the disease, its predisposing conditions and syndromes, and the prognosis of its clinical progress.
Understanding the determinants of the disease, which may range from genetic and environmental to infectious. This also includes distinguishing between associations and determinants.
Controlling and preventing the spread of the disease. Here epidemiology helps in identifying and removing the primary agent or determinant of the disease, protecting people from the agent, enhancing host resistance (the protective mechanisms), and modifying human behaviour in order to minimize risk and promote healthy actions.
Planning and evaluating health services in order to achieve a better health status for the population or group. Planning includes estimating needs and demand, locating major hazards or hindrances that could be avoided, and determining the supply of various resources.

2. Basic Measurements in Epidemiology

The fundamental practice in epidemiology is the measurement of health and disease, or mortality and morbidity, in human populations. The measurements frequently used by epidemiologists include mortality, morbidity, disability, natality, medical needs, health care facilities, utilization of health services, and other health-related events. The three basic tools of measurement are rates, ratios, and proportions.

Rate: it measures the occurrence of a particular event in a population during a specified period of time. It indicates the change in an event in a population over a given period. It comprises four elements: the numerator, the denominator, the time specification, and the multiplier. The time is usually a calendar year, and the rate is expressed per 1000 or another round figure (10,000 or 100,000) selected to avoid fractions.

Rates can be of three types: crude rates (actual observed rates without any modification, like birth and death rates); specific rates (also actual observed rates, but for specific causes, groups, or time periods, like age-sex groups, annual, or weekly rates); and standardized rates (derived by direct or indirect standardization or adjustment, like age- and sex-standardized rates).

Ratio: it expresses a relation in size between two random quantities. The numerator and denominator need not be measured in the same unit, and the numerator is not a component of the denominator. In simple words, a ratio is the result of dividing one quantity by another. When both quantities are of the same kind, it has no units.
Proportion: it expresses the relationship in magnitude of a part to the whole, i.e. the numerator is always a part of the denominator. It is usually expressed as a percentage.
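The three tools can be sketched in a few lines of code. The function names and all figures below are hypothetical and chosen only for illustration; the multiplier of 1000 follows the convention described above.

```python
def rate(events, population, multiplier=1000):
    """Events in a stated period per `multiplier` persons in the population."""
    return events / population * multiplier


def ratio(x, y):
    """One quantity divided by another; x is not a part of y."""
    return x / y


def proportion(part, whole):
    """Part of a whole, expressed as a percentage; part is included in whole."""
    return part / whole * 100


# Hypothetical figures for one calendar year.
crude_birth_rate = rate(events=1_200, population=60_000)  # live births per 1000 population
sex_ratio_at_birth = ratio(1_050, 1_000)                   # male births per female birth
anaemia_proportion = proportion(180, 600)                  # anaemic women among women examined

print(f"Crude birth rate: {crude_birth_rate:.1f} per 1000")
print(f"Sex ratio at birth: {sex_ratio_at_birth:.2f}")
print(f"Proportion anaemic: {anaemia_proportion:.1f}%")
```

Note that only the proportion has its numerator contained in its denominator; the rate additionally carries a time specification, which here is the calendar year.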

2.1 Measures of disease frequency (Morbidity)

There are several measures of disease frequency, all based on the fundamental concepts of incidence and prevalence. While calculating these measures, the number of people under consideration must be estimated correctly. For example, while measuring the maternal morbidity rate, only women who have undergone pregnancy and delivery are considered, and not all women; when studying rates or ratios related to sexual health, only that part of the population which is sexually active is considered, and not children or people above 60 years of age. The part of the population which is susceptible to a disease is called the population at risk, and is defined by various demographic, geographic, economic, or environmental factors.

Incidence Rate:

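Incidence of a disease is defined as the number of new cases occurring in a specified period of time in a population at risk of developing the disease during that period. The incidence rate is calculated as:

Incidence rate = (Number of new cases of the disease during a specified period / Population at risk during that period) x 1000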

Prevalence Rate:

Prevalence is defined as the number of affected/diseased persons in a population at a specified time divided by the total number of persons in the population at that time.
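The two definitions above can be compared side by side in a short sketch. The counts are hypothetical, and the multipliers (per 1000 for incidence, per 100 for prevalence) are assumptions chosen for illustration; any convenient round figure may be used.

```python
def incidence_rate(new_cases, population_at_risk, multiplier=1000):
    """New cases arising in a specified period per `multiplier` persons at risk."""
    return new_cases / population_at_risk * multiplier


def prevalence_rate(existing_cases, total_population, multiplier=100):
    """All existing cases (old and new) at a specified time per `multiplier` persons."""
    return existing_cases / total_population * multiplier


# Hypothetical community followed for one year.
incidence = incidence_rate(new_cases=50, population_at_risk=10_000)
prevalence = prevalence_rate(existing_cases=300, total_population=12_000)

print(f"Incidence: {incidence:.1f} new cases per 1000 at risk per year")
print(f"Prevalence: {prevalence:.1f}% of the population at the time of survey")
```

Incidence counts only new cases among those at risk, whereas prevalence counts every existing case in the whole population at the time of measurement.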

Relationship between Incidence and Prevalence

If the prevalence of a disease is low and does not vary significantly over time, it can be approximated as:

Prevalence= Incidence x Duration of Disease

However, it must be noted that the relationship between incidence and prevalence is always dynamic. Some advances in medicine may actually increase the prevalence of a disease. For example, medicines for heart problems may help to control the effects of the disease, resulting in fewer deaths, but as a consequence more patients in the population are living with the disease, and hence the prevalence goes up. Studying prevalence rates is therefore helpful in designing policies on the number of clinics needed for a given disease burden, the amount and type of rehabilitation services, and the number of healthcare providers.
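The relationship Prevalence = Incidence x Duration, and the effect of a treatment that prolongs life with the disease, can be sketched as follows. The incidence and duration figures are hypothetical, and the calculation assumes the steady-state, low-prevalence conditions stated above.

```python
def steady_state_prevalence(incidence_per_1000_per_year, mean_duration_years):
    """Approximate prevalence per 1000, assuming low and stable prevalence."""
    return incidence_per_1000_per_year * mean_duration_years


# Hypothetical disease with a constant incidence of 5 new cases per 1000 per year.
before_treatment = steady_state_prevalence(5, mean_duration_years=2)
# A new treatment reduces deaths, so patients live longer with the disease.
after_treatment = steady_state_prevalence(5, mean_duration_years=6)

print(f"Prevalence before treatment: {before_treatment} per 1000")
print(f"Prevalence after treatment:  {after_treatment} per 1000")
```

Incidence is unchanged in both scenarios; prevalence rises only because the average duration of the disease has increased.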

Difference between Incidence and Prevalence

2.2 Measures of Mortality

Measures of mortality are an indicator of the fatal consequences of the disease and reflect upon the efficiency of the health infrastructure in the community. The following are some of the frequently used mortality rates in epidemiology:

In mortality rates such as the crude death rate, the mid-year population is taken as the denominator, because the population is dynamic and keeps changing with time.
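As an illustration of the mid-year denominator, the sketch below computes a crude death rate in its commonly used form (deaths during the year per 1000 mid-year population). All figures are hypothetical, and estimating the mid-year population as the average of the populations at the start and end of the year is an assumption made here for simplicity; in practice it is usually taken as the estimated population on 1 July.

```python
def mid_year_population(population_start, population_end):
    """Assumed simple estimate: average of the start-of-year and end-of-year populations."""
    return (population_start + population_end) / 2


def crude_death_rate(deaths_in_year, mid_year_pop, multiplier=1000):
    """Deaths from all causes during the year per `multiplier` mid-year population."""
    return deaths_in_year / mid_year_pop * multiplier


# Hypothetical town: the population grows during the year.
mid_pop = mid_year_population(98_000, 102_000)
cdr = crude_death_rate(deaths_in_year=800, mid_year_pop=mid_pop)

print(f"Mid-year population: {mid_pop:.0f}")
print(f"Crude death rate: {cdr:.1f} per 1000 per year")
```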

Specific Rates

Specific mortality rates are obtained after applying a filter to the population (for example age, gender, or ethnicity).

We can calculate age-specific mortality rates by counting in the numerator only the deaths in the specific age group for which we want the rate, and relating them to the corresponding population. Examples include the annual mortality rate in children (under 10 years), the infant mortality rate, and the under-5 mortality rate. Disease-specific or cause-specific rates can be calculated similarly.
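A minimal sketch of age-specific mortality rates is given below. The age groups, death counts, and mid-year populations are hypothetical, and each rate is assumed to use the mid-year population of the same age group as its denominator.

```python
def specific_rate(deaths, mid_year_population, multiplier=1000):
    """Deaths in a subgroup per `multiplier` mid-year population of that subgroup."""
    return deaths / mid_year_population * multiplier


# Hypothetical data: age group -> (deaths in the year, mid-year population).
age_groups = {
    "0-4 years": (90, 9_000),
    "5-14 years": (20, 20_000),
    "15-59 years": (240, 60_000),
    "60+ years": (450, 11_000),
}

age_specific_rates = {
    group: specific_rate(deaths, population)
    for group, (deaths, population) in age_groups.items()
}

# The crude rate pools all age groups and hides the differences between them.
total_deaths = sum(deaths for deaths, _ in age_groups.values())
total_population = sum(population for _, population in age_groups.values())
crude_rate = specific_rate(total_deaths, total_population)

for group, value in age_specific_rates.items():
    print(f"{group}: {value:.1f} per 1000")
print(f"All ages (crude): {crude_rate:.1f} per 1000")
```

The same function can be reused for cause-specific rates by counting in the numerator only the deaths from the cause of interest.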

Case Fatality

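Case fatality is used to calculate the death rate from a particular disease. It is the percentage of people diagnosed with a certain disease who die from that disease within a specified period of time. Expressed as a percentage, it is calculated as:

Case fatality (%) = (Number of deaths from the disease in a specified period after diagnosis / Number of persons diagnosed with the disease) x 100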

Clearly, the case fatality rate is a measure of the severity of the disease in that particular population. It is also indicative of the improvement and advancement in the treatment or therapy available for the disease. If treatment is widely and efficiently available, the number of deaths from the disease will fall, thus bringing down the case fatality rate.
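The effect of improved treatment on case fatality can be sketched with hypothetical figures. The function name and the counts are assumptions for illustration only.

```python
def case_fatality(deaths_from_disease, diagnosed_cases):
    """Percentage of diagnosed cases who die from the disease in the specified period."""
    return deaths_from_disease / diagnosed_cases * 100


# Hypothetical disease with 400 diagnosed cases in each period.
before_new_therapy = case_fatality(deaths_from_disease=60, diagnosed_cases=400)
after_new_therapy = case_fatality(deaths_from_disease=24, diagnosed_cases=400)

print(f"Case fatality before new therapy: {before_new_therapy:.1f}%")
print(f"Case fatality after new therapy:  {after_new_therapy:.1f}%")
```

Unlike a mortality rate, the denominator here is the number of diagnosed cases, not the whole population.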

Dynamics of Disease Transmission

A communicable human disease requires certain prerequisites for transmission: the agent, a source of the agent, an exit point from the source, a mode of transmission, an entry point into the new host, and a susceptible host. The source of infection is defined as “the person, animal, object, or substance from which an infectious agent passes or is disseminated to the host”. A reservoir is defined as “any person, animal, arthropod, plant, soil or substance (or combination of these) in which an infectious agent lives and multiplies, on which it depends for survival, and where it reproduces itself in such manner that it can be transmitted to a susceptible host”. Note that the reservoir and the source are not always the same. The interaction of the epidemiologic triad (agent, host, and environment) forms the basis of disease transmission.

The reservoir may be of three types:

Human reservoir: a human can be either a case or a carrier. A case is a person in the population identified as having the particular disease, health disorder, or condition under investigation. A carrier is an infected person or animal that harbours a specific infectious agent in the absence of discernible clinical disease and serves as a potential source of infection for others.
Animal reservoir: sometimes the source of infection is an animal or bird, which can be either a carrier or a case. Diseases transmitted from vertebrate animals to humans are called zoonoses (for example, rabies); birds can also spread some fatal illnesses among humans.
Reservoir in non-living things: these include soil and other inanimate matter that act as reservoirs of infection.

Modes of Transmission

Communicable diseases may be transmitted directly or indirectly. Diseases transmitted by direct contact include sexually transmitted diseases (STDs), AIDS, leprosy, and skin and eye infections. Direct transmission may also be by droplet infection, in which droplets of saliva and nasal secretions expelled during coughing, sneezing, spitting, or speaking infect another person. Examples include pulmonary TB, the common cold, whooping cough, and diphtheria. A third mode of direct transmission is contact with soil. Transmission may also be transplacental (for example, HIV infection) or by inoculation into the skin or mucosa (for example, rabies via a dog bite).

Indirect transmission includes a variety of mechanisms including the traditional five F’s: flies, fingers, fomites, food, and fluid.

Errors in Epidemiology

Epidemiologists rely a great deal on measurements, and one of the objectives of epidemiology is to provide accurate measures of disease occurrence and outcome. There are many possible sources of error in measurement, which must be recognized so that they can be eliminated as far as possible. These errors may be random or systematic.

Random errors occur when a value measured in the sample deviates from the true population value due to chance alone. There are three major sources of random error:

Individual biological variation
Sampling error
Measurement error

Random errors can never be completely eliminated, because we study only a sample of the population rather than the population itself. Sampling error can be reduced by increasing the sample size. Measurement error on individuals can be reduced by following strict protocols: every investigator should undergo a stipulated training period before actually taking measurements, and laboratories should maintain proper documentation and be able to assess the accuracy and precision of their instruments. Increasing the sample size of the study group or population also increases the statistical power to detect differences. Sample size should be carefully worked out using the appropriate statistical formula, which varies with the requirements of the study. In practice, sample size is often limited by logistic and financial constraints.
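The effect of sample size on random error can be sketched as follows. The standard error of a mean (standard deviation divided by the square root of the sample size) shows how sampling error shrinks as the sample grows. The second function uses one commonly quoted formula for estimating a proportion, n = z² p(1 − p) / d², and is only one of many sample size formulas; the values of p, d, and z below are hypothetical choices.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def sample_size_for_proportion(p, margin, z=1.96):
    """Sample size to estimate a proportion p within +/- margin (z = 1.96 for 95% confidence)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical blood pressure measurements with a standard deviation of 12 mmHg.
se_small_sample = standard_error(sd=12, n=36)
se_large_sample = standard_error(sd=12, n=144)

# Hypothetical survey: expected prevalence 50%, desired margin of error 5 percentage points.
required_n = sample_size_for_proportion(p=0.5, margin=0.05)

print(f"Standard error with n=36:  {se_small_sample:.1f} mmHg")
print(f"Standard error with n=144: {se_large_sample:.1f} mmHg")
print(f"Required sample size: {required_n}")
```

Quadrupling the sample size only halves the standard error, which is one reason why logistic and financial constraints weigh so heavily on the final choice of sample size.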

Systematic errors occur when results differ in a systematic manner from the true values. Unlike random errors, systematic errors are not affected by the sample size, and a study with small systematic error is said to have high accuracy. More than 30 types of systematic error have been identified in epidemiology, but the principal ones are selection bias and measurement (or classification) bias.

Selection bias occurs when there is a systematic difference between the characteristics of the people selected for a study and those who are not part of it. It also arises when the disease or factor under study itself makes people unavailable for the study. Such examples are frequently encountered in studying the effect of occupation on health. In such studies, only employees who are healthy enough to perform their duties are available for selection, while individuals who are already ill or cannot perform their duties are usually left out, so the sample is not representative and selection bias is introduced.

Measurement bias, on the other hand, occurs when an individual does not measure accurately what he or she is supposed to measure, whether through negligence or because varying perceptions and understanding affect the way measurements are taken. This error may occur at the level of the investigator or of the respondent. An example in retrospective case-control studies is recall bias, in which an individual finds it difficult to recall some past exposure, thus affecting the results of the entire study. Such biases can result in over-reporting as well as under-reporting of cases. Further, if the investigator is already aware of the exposure status of the individuals in the study, this may lead to observer bias. Such bias is removed by conducting a blind study, in which the investigator does not know how the participants are classified, or a double-blind study, in which neither the investigator nor the participant knows how the participants are classified.
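The statement that systematic errors are not affected by sample size can be illustrated with a small simulation. Everything in it is hypothetical: a true mean blood pressure of 120 mmHg, an instrument that reads 5 mmHg too high (the systematic error), and random variation with a standard deviation of 10 mmHg. A fixed seed is used so that the run is repeatable.

```python
import random
import statistics


def simulate_sample_mean(true_value, bias, sd, n, seed=0):
    """Mean of n simulated readings that carry random error (sd) and a constant bias."""
    rng = random.Random(seed)
    readings = [rng.gauss(true_value + bias, sd) for _ in range(n)]
    return statistics.fmean(readings)


TRUE_MEAN = 120.0  # hypothetical true population mean (mmHg)
BIAS = 5.0         # hypothetical miscalibration of the instrument (mmHg)

mean_small = simulate_sample_mean(TRUE_MEAN, BIAS, sd=10.0, n=50)
mean_large = simulate_sample_mean(TRUE_MEAN, BIAS, sd=10.0, n=20_000)

print(f"Sample mean with n=50:     {mean_small:.1f} mmHg")
print(f"Sample mean with n=20000:  {mean_large:.1f} mmHg")
print(f"True mean:                 {TRUE_MEAN:.1f} mmHg")
```

With the larger sample the estimate settles close to 125 mmHg rather than 120 mmHg: the random scatter has been averaged out, but the 5 mmHg systematic error remains and can only be removed by correcting the measurement itself.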

Limitations of Statistics/Biostatistics

Despite their wide use in a variety of subjects and fields, including public health, statistics and biostatistics have certain limitations:

Statistics is applicable only to an aggregate of data rather than to individual data. It helps interpret results at the group or population level rather than for an individual. For example, it can give the average income of a group or a population, but not the income of any particular individual.
It describes only numerical information. It cannot directly estimate the level of honesty, hatred, love, intelligence, or beauty, as no statistical test can be applied directly to these variables. Statistics can be applied to them only after they have been converted into measurable quantitative traits.
Statistical laws are ever changing, and statistical conclusions are not universally true.
Statistics, if used by amateurs, can prove dangerous, as it may lead to wrong conclusions. Proper, scientific knowledge of statistics is therefore of utmost importance before one uses it to interpret large quantities of data. As King aptly says, ‘statistics is like clay of which one can make a God or Devil, as one pleases’.
Statistics does not provide a complete solution to problems that lie beyond its field of enquiry. For example, a problem with a statistical solution may still be affected by non-quantifiable characteristics, such as the culture, economy, or religious ideals of the country or individual.

Let's Sum-up

Biostatistics is the application of statistics to the solution of medical, biological, and public health problems; the term is used synonymously with biometry.
Epidemiology is the study of the pattern of disease in a population and of what affects or determines this pattern.
There are three basic tools of measurement commonly used by epidemiologists, i.e. rates, ratios, and proportions. Disease frequency is measured by incidence and prevalence, which describe the magnitude of a disease in a given population over a period of time or at a particular point of time.
Epidemiology is readily applied in describing the health of a population, describing the natural history of a disease, finding the determinants of the disease, controlling and preventing disease, and planning and evaluating health services.
A communicable human disease requires certain prerequisites for transmission: the agent, a source of the agent, an exit point from the source, a mode of transmission, an entry point into the new host, and a susceptible host.
Communicable diseases can be transmitted either directly (human to human) or indirectly (via a vector or vehicle).
There are two types of errors in epidemiology, random and systematic. Random errors are caused by individual biological variation, sampling error, and measurement error.
Systematic errors are caused mainly by selection bias and measurement bias.
Despite their wide use in a variety of subjects and fields, including public health, statistics and biostatistics have certain limitations: they are applicable only to aggregates of data, describe only numerical information, yield conclusions that are not universally true, and do not provide complete solutions to problems.
