6 Bradford Distributions: An Overview

B K Sen

epgp books

 

 

 

 

 

I. Objectives

 

After going through this module you will come to know about the following:

•      To study and understand the derivation of equations for the Bradford distribution by various bibliometricians.

•      To discuss the viewpoints of some bibliometricians on the law.

•      To study the ambiguity between verbal and graphical representations of the Bradford distribution.

•      To discuss the Bradford-Zipf distribution.

•      To study the characteristics of bibliometric distributions, etc.

 

II. Learning Outcome

 

After completing this module, you will be knowledgeable about the Bradford distribution and related work. In particular, you will have gained knowledge of various aspects of Bradford's law: the Bradford-Zipf distribution, the ambiguity between verbal and graphical interpretations of Bradford's law, the Leimkuhler distribution, and computational aspects of Bradford's law.

 

III. Module Structure

 

1.  Introduction

2.  Cole’s Formulation

3.  Leimkuhler’s Formulation

4.  Brookes Formulation

5.  Naranan’s Viewpoint

6.  Bookstein’s Viewpoint

7.  Bradford Multiplier

8.  Ambiguity between Verbal and Graphical Statements

9.  Bradford-Zipf Distribution

10.  Characteristics of Bibliometric Distribution

11.  References

 

1. Introduction

 

The law of scattering was propounded by Samuel Clement Bradford (1878-1948), a British librarian, mathematician and documentalist at the Science Museum in London, after a laborious study of the scientific literature in the mid-1930s. After examining the distribution of scientific literature in periodicals and its coverage in abstracting and indexing periodicals, he realized that the distribution of literature follows a particular pattern [1, 2]. He opined that ‘the nucleus of periodicals devoted to the given subject must contain, individually, more articles on that subject than periodicals dealing with related subjects’ [3]. ‘In consequence, it is possible to arrange periodicals in zones of decreasing productivity, in regard to papers on a given subject, and the numbers of periodicals in each zone will increase as their productivity decreases’ [3]. He described the scattering pattern of journals in the areas of applied geophysics and lubrication. He plotted the partial sums of references against the logarithm of the partial sums of the numbers of journals and noticed that the resulting graph, beyond an initial curved portion, is a straight line. On the basis of this observation, he suggested the following linear relation to describe the scattering phenomenon [2]:

F(x) = a + b log x.

 

F(x) is the cumulative number of references contained in the first x most productive journals; a and b are constants. The following figure is a hypothetical, but typical, log-linear curve (as described by Bradford) showing aggregates of articles on a given subject corresponding to the number of journals.

 

This type of curve is usually called a Bradford curve. The X-axis shows the partial sum of journals (on a logarithmic scale); the Y-axis shows the partial sum of articles contained in the X top-most journals (on a linear scale). P1 in the figure is the point at which the straight-line part of the curve begins. Draw Y1P1, Y2P2 and Y3P3 parallel to the X-axis such that OY1 = Y1Y2 = Y2Y3. Draw P1X1, P2X2 and P3X3 parallel to the Y-axis. Since P1P3 is a straight line and Y1Y2 = Y2Y3, the intervals X1X2 and X2X3 are equal, say r units. Let the distance between O and X1 be s units. Thus, if α, β and γ are the positive real numbers corresponding respectively to the logarithmic abscissae OX1, OX2 and OX3, we have log α = s, log β = s + r and log γ = s + 2r.

That is,

$$\alpha = 10^{s}, \qquad \beta = 10^{s+r} = 10^{s}\cdot 10^{r}, \qquad \gamma = 10^{s+2r} = 10^{s}\cdot 10^{2r}.$$

 

Substituting $n = 10^{r}$, we see that α, β and γ are related to each other as $1 : n : n^{2}$. On the basis of this relationship, and since α is the number of periodicals up to P1, i.e. in the nucleus of the subject area, Bradford stated his law as follows:

 

“If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the number of periodicals in the nucleus and succeeding zones will be as 1 : n : n² ….”

 

This is called Law of Scattering or Bradford’s law.
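Dividing a ranked list of periodicals into zones that each hold about the same number of articles, as the verbal statement describes, can be sketched in Python. The boundary rule used here (cut at the rank whose cumulative article count is closest to one third, two thirds, ... of the total) is an assumption of this sketch, since Bradford did not prescribe an algorithm, and other rules give slightly different zones. The data are those of Table 4 (Section 9); `expand` and `bradford_zones` are hypothetical helper names.

```python
from itertools import accumulate


def expand(rows):
    """Expand (number_of_periodicals, articles_each) rows into a ranked list of counts."""
    counts = []
    for periodicals, articles in rows:
        counts.extend([articles] * periodicals)
    return counts


def bradford_zones(counts, zones=3):
    """Split ranked counts into `zones` groups with roughly equal article totals.

    Each boundary is placed at the rank whose cumulative article count is
    closest to k * total / zones (a heuristic assumed for this sketch).
    Returns (periodicals per zone, articles per zone).
    """
    cum = list(accumulate(counts))
    total = cum[-1]
    bounds = [0]
    for k in range(1, zones):
        target = k * total / zones
        # Index i stands for "the first i + 1 periodicals".
        i = min(range(len(cum)), key=lambda j: abs(cum[j] - target))
        bounds.append(i + 1)
    bounds.append(len(counts))
    spans = list(zip(bounds, bounds[1:]))
    return [b - a for a, b in spans], [sum(counts[a:b]) for a, b in spans]


# Table 4 as (number of periodicals, articles per periodical), most productive first
TABLE_4 = [(1, 20), (1, 14), (1, 12), (1, 11), (1, 10), (1, 9), (3, 8),
           (1, 7), (2, 6), (2, 5), (1, 4), (10, 3), (15, 2), (44, 1)]

sizes, articles = bradford_zones(expand(TABLE_4))
print(sizes, articles)  # [6, 17, 61] [76, 81, 80]
```

For Table 4 the three zones hold 6, 17 and 61 periodicals (76, 81 and 80 articles), i.e. roughly 1 : 2.8 : 10.2, so the zone sizes only approximately follow 1 : n : n².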

 

Since then a great deal of work has been done on this law [9, 10, 13, 17]. This module attempts to highlight some of that work.

 

2. Cole’s Formulation

 

Cole [8] also experimented with the law. He named the slope of the curve the reference scattering coefficient and concluded that this coefficient might be characteristic of the subject field. For petroleum literature, Cole obtained the relationship

 

F(x) = 1 + b log₁₀ x   (x > c)

 

where F(x) stands for the cumulative fraction of papers contained in the fraction x of most productive journals, and c represents the fraction of journals in the nuclear zone; with fractions, F(1) = 1, as the constant term 1 requires. For petroleum literature, Cole found the value of b to be 0.43.
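Bradford's F(x) = a + b log x and Cole's relation are both straight lines against log x, so their constants can be estimated by ordinary least squares. Least squares is a technique chosen here for illustration (Bradford and Cole read their lines off graphs), and `fit_log_linear` is a hypothetical helper. The sketch fits the row boundaries of Table 4 (Section 9), once in raw counts and once as fractions of its 84 periodicals and 237 articles; the slope is the quantity Cole called the reference scattering coefficient. All points are used, including the curved nucleus region, so dropping the first few rows would follow Bradford's straight-line portion more closely. The printed values describe Table 4 only and should not be read as a check on Cole's 0.43 for petroleum literature.

```python
import math


def fit_log_linear(x, f):
    """Ordinary least-squares fit of f = a + b * log10(x); returns (a, b)."""
    u = [math.log10(v) for v in x]
    n = len(u)
    mean_u, mean_f = sum(u) / n, sum(f) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uf = sum((ui - mean_u) * (fi - mean_f) for ui, fi in zip(u, f))
    b = s_uf / s_uu
    return mean_f - b * mean_u, b


# Table 4: cumulative periodicals and cumulative articles at each row boundary
X = [1, 2, 3, 4, 5, 6, 9, 10, 12, 14, 15, 25, 40, 84]
F = [20, 34, 46, 57, 67, 76, 100, 107, 119, 129, 133, 163, 193, 237]

a, b = fit_log_linear(X, F)  # Bradford's form in raw counts
# Fractional form: divide by the 84 periodicals and 237 articles of Table 4
a_frac, b_frac = fit_log_linear([x / 84 for x in X], [f / 237 for f in F])
print(f"counts:    a = {a:.1f}, b = {b:.1f}")
print(f"fractions: a = {a_frac:.3f}, b = {b_frac:.3f}")
```

Rescaling x only shifts log x, so the fractional slope is simply the count slope divided by 237.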

 

3. Leimkuhler’s Formulation

 

Ferdinand F. Leimkuhler [13], Professor of Industrial Engineering at Purdue University in the United States, analyzed all the published data on the Bradford distribution and, applying statistical techniques, derived the following equation:

 

$$F(x) = \frac{\ln(1 + \beta x)}{\ln(1 + \beta)}, \qquad 0 \le x \le 1$$

 

In the equation, F(x) stands for the cumulative fraction of the references, x for the corresponding fraction of the most productive journals, and β is a constant related to the document collection. Brookes [5] opined that the equation is but a compromise, since Leimkuhler accepted the empirical data as both complete and exact. Brookes further observed: “Unfortunately, though Leimkuhler’s formulation can be used theoretically without difficulty, it has some disadvantages for the practical documentalist. The numerical evaluation of the key parameter β requires tedious statistical computation and the solving of an implicit equation by approximation methods….In fact it was the exasperation evoked by an attempted practical application of Leimkuhler’s formulae that led the author of this paper to seek a simpler formulation of the Bradford distribution” [5: p. 249].
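The "implicit equation" Brookes mentions can be illustrated with a simplified sketch. This is an assumption made for illustration, not Leimkuhler's statistical procedure: it takes a single observed point (x0, F0) and solves ln(1 + βx0)/ln(1 + β) = F0 for β by bisection. Bisection works because, for fixed 0 < x0 < 1, F rises steadily from x0 (as β → 0) towards 1 (as β → ∞). The example point comes from Table 4 (Section 9), where the top 15 of 84 periodicals carry 133 of 237 articles; `leimkuhler_f` and `solve_beta` are hypothetical names.

```python
import math


def leimkuhler_f(x, beta):
    """Leimkuhler's F(x) = ln(1 + beta*x) / ln(1 + beta) for 0 <= x <= 1."""
    return math.log1p(beta * x) / math.log1p(beta)


def solve_beta(x0, f0, lo=1e-9, hi=1e12, iterations=200):
    """Find beta with leimkuhler_f(x0, beta) == f0 by bisection on log(beta).

    F(x0, beta) increases with beta from x0 towards 1, so a solution
    exists only when 0 < x0 < f0 < 1.
    """
    if not 0 < x0 < f0 < 1:
        raise ValueError("need 0 < x0 < f0 < 1")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint: beta spans many decades
        if leimkuhler_f(x0, mid) < f0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# One point from Table 4: 15 of 84 periodicals carry 133 of 237 articles
beta = solve_beta(15 / 84, 133 / 237)
print(f"beta = {beta:.1f}")
```

Using a single point keeps the sketch short; a real estimate would weigh all the observed points, which is where the tedium Brookes complained of comes in.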

 

4. Brookes Formulation

 

Brookes' [5, 6] formulation of the Bradford distribution is as follows. Suppose R(n) is the cumulative total of relevant papers found in the first n journals when all the journals are ranked in order of decreasing productivity; Bradford's law then requires that

 

The only function that fully satisfies this condition is

R(n) = k log n,        (4)

where k is a constant.

Brookes has provided another model, R(n) = a n^b, where a and b are constants. For the data set under consideration, the values obtained are a = 188 and b = 0.382.

With these values of a and b, we can now estimate the cumulative total of references up to a periodical of any rank.

 

For testing, let us take rank 45.

We have R(n) = a n^b.

Putting in the values, we get R(45) = 188 × 45^0.382

= 188 × 4.281

≈ 804.8

≈ 805, which is quite close to the observed value of 802.
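The arithmetic above, and one possible way of obtaining a and b, can be sketched in Python. The module does not say how a = 188 and b = 0.382 were estimated; fitting a straight line to log R(n) against log n by least squares is an assumption made here, and `brookes_power` and `fit_power` are hypothetical names.

```python
import math


def brookes_power(n, a=188.0, b=0.382):
    """Cumulative papers R(n) = a * n**b in the n most productive journals."""
    return a * n ** b


def fit_power(ranks, cumulative):
    """Least-squares fit of log R = log a + b * log n; returns (a, b)."""
    u = [math.log(n) for n in ranks]
    v = [math.log(r) for r in cumulative]
    k = len(u)
    mean_u, mean_v = sum(u) / k, sum(v) / k
    b = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
         / sum((ui - mean_u) ** 2 for ui in u))
    return math.exp(mean_v - b * mean_u), b


print(round(brookes_power(45), 1))  # 804.8, against the observed 802
```

Given observed pairs of rank and cumulative papers, `fit_power(ranks, totals)` returns estimates of a and b that can be plugged back into `brookes_power`.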

 

Bradford considered the bibliograph to be a straight line, and this has resulted in two different formulations, one verbal and the other graphical. The algebraic expressions given by Brookes for the two formulations are:

R(n) = j log (n/t + 1) for the verbal formulation, and

R(n) = k log (n/s) for the graphical formulation,

where j, t, k and s are constants.
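Brookes' two expressions can be compared by inverting them, i.e. by asking how many journals are needed to accumulate m, 2m and 3m papers. The parameter values below are hypothetical, natural logarithms are assumed (another base only rescales j and k), and the function names are illustrative. The verbal form yields zone sizes in geometric progression, while the graphical form yields cumulative journal counts in geometric progression, which is the distinction Vickery drew (Section 8).

```python
import math

# Hypothetical parameters, for illustration only
J, T = 100.0, 2.0  # verbal:    R(n) = j * ln(n / t + 1)
K, S = 100.0, 2.0  # graphical: R(n) = k * ln(n / s)


def n_verbal(r):
    """Journals needed to accumulate r papers, inverting the verbal form."""
    return T * (math.exp(r / J) - 1)


def n_graphical(r):
    """Journals needed to accumulate r papers, inverting the graphical form."""
    return S * math.exp(r / K)


def ratios(values):
    """Successive ratios values[i + 1] / values[i]."""
    return [b / a for a, b in zip(values, values[1:])]


def zones(n_of_r, m, count=3):
    """Cumulative journal counts at m, 2m, 3m papers and the zone sizes between them."""
    cum = [n_of_r(i * m) for i in range(1, count + 1)]
    sizes = [b - a for a, b in zip([0.0] + cum, cum)]
    return cum, sizes


m = 100.0
v_cum, v_sizes = zones(n_verbal, m)
g_cum, g_sizes = zones(n_graphical, m)
print(ratios(v_sizes))  # both about e = 2.718: zone sizes 1 : e : e^2
print(ratios(g_cum))    # both about e: cumulative counts 1 : e : e^2
print(ratios(g_sizes))  # about 1.718, then 2.718: zone sizes not geometric
```

With j = k and t = s, the two inversions differ by exactly t journals at every level of R, so the graphical curve is the verbal one shifted by t periodicals: the difference is noticeable near the nucleus and negligible for large n.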

 

5. Naranan’s Viewpoint

 

In 1970 Naranan [14] opined that (i) Bradford’s law of bibliography of scientific literature is explainable in terms of an underlying power law distribution of the number of articles in scientific journals; (ii) the law emerges as a natural consequence of exponential growth of scientific literature and journals at comparable rates; and (iii) a model like this predicts a strong correlation between the age of a journal and the number of articles it carries. The author was hopeful that the proposed mechanism might find wider application in many other fields of science.

 

Brookes [7] pointed out that Naranan’s analysis was not valid for Bradford's law; however, his paper provided a plausible model of Lotka’s law, with suitable verbal amendments. Brookes's comments on the paper are reproduced verbatim: “The inverse square law of scientific authorship has hitherto been regarded as an inexplicable and useless scientific oddity. Naranan’s model of it is therefore welcome. And, together with other measures of scientific productivity, Lotka’s law has recently been applied by Dobrov and Korennoi in determining the optimum size of research institutes in USSR”.

 

Hubert [11] was of the view that Naranan's interpretation of the original form of Bradford’s law does not follow from a stochastic argument based on his assumptions.
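The growth mechanism in points (i) and (ii) of Naranan's argument can be simulated. The sketch below is a toy model built on assumptions stated here, not a reproduction of Naranan's derivation: journals are founded at a rate growing like exp(α·t), so the ages of today's journals are exponentially distributed with rate α, and each journal's article count grows like exp(β·age). Then ln(size) = β·age is exponential with rate α/β, so P(S > s) = s^(−α/β), a power law. The parameter values and function names are hypothetical, and whether such a mechanism accounts for Bradford's law is exactly what Brookes and Hubert questioned.

```python
import math
import random


def simulate_journals(count, alpha, beta, seed=1):
    """Toy growth model (an assumption, not Naranan's exact derivation).

    Ages of existing journals are exponential with rate alpha; a journal's
    article count grows as exp(beta * age). Returns (age, articles) pairs.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        age = rng.expovariate(alpha)
        pairs.append((age, math.exp(beta * age)))
    return pairs


def tail_exponent(sizes):
    """Maximum-likelihood estimate of g in P(S > s) = s**(-g), for sizes s >= 1."""
    return len(sizes) / sum(math.log(s) for s in sizes)


pairs = simulate_journals(20000, alpha=0.07, beta=0.05)
print(tail_exponent([s for _, s in pairs]))  # close to alpha / beta = 1.4
```

In this model a journal's output is a strictly increasing function of its age, an extreme form of the age-productivity correlation predicted in point (iii).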

 

6. Bookstein’s Viewpoint

 

In his paper published in 1976, Bookstein [4] analyzed the distributions of Lotka, Zipf, Bradford and Leimkuhler and adopted a point of view from which these distributions can be seen as different versions of a single theoretical distribution. He summarized this generalization in the following words: “All of these distributions are almost equivalent . . . In each case we have a set of entities (for example, chemists, words) producing events (publications, occurrences) over some dimension of extension (time, length of text) and in each case the distribution describes the number of occurrences of events over a fixed interval of that dimension. Under these conditions it is possible to describe the same distribution in at least four distinct ways; these modes of description are represented above by the distributions of Lotka, Zipf, Bradford, and Leimkuhler”.

 

For the last hundred years or so, physicists all over the world have tried to unify the four fundamental forces of nature, i.e. the electromagnetic force, the gravitational force, the weak nuclear force and the strong nuclear force. Bookstein has done something similar for bibliometric distributions: he has shown that all four bibliometric distributions are basically different versions of the same distribution.
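Bookstein's point that one body of data can be described in several ways can be illustrated with Table 4 (Section 9). The sketch below is written for this module as an illustration and is not taken from Bookstein's paper: it stores the data in Lotka's size-frequency form (how many periodicals produced k articles) and derives from it the Zipf-style rank list, Bradford's cumulative R(n) and Leimkuhler's fractional F(x). The function names are hypothetical.

```python
from itertools import accumulate

# Table 4 in Lotka's form: articles per periodical -> number of periodicals
LOTKA = {20: 1, 14: 1, 12: 1, 11: 1, 10: 1, 9: 1, 8: 3,
         7: 1, 6: 2, 5: 2, 4: 1, 3: 10, 2: 15, 1: 44}


def zipf_view(lotka):
    """Rank-frequency list: productivity of each periodical, most productive first."""
    return [k for k in sorted(lotka, reverse=True) for _ in range(lotka[k])]


def bradford_view(lotka):
    """R(n): cumulative articles in the n most productive periodicals."""
    return list(accumulate(zipf_view(lotka)))


def leimkuhler_view(lotka):
    """(x, F(x)) pairs: fraction of periodicals vs. cumulative fraction of articles."""
    r = bradford_view(lotka)
    n, total = len(r), r[-1]
    return [((i + 1) / n, ri / total) for i, ri in enumerate(r)]


print(zipf_view(LOTKA)[:7])        # [20, 14, 12, 11, 10, 9, 8]
print(bradford_view(LOTKA)[:7])    # [20, 34, 46, 57, 67, 76, 84]
print(leimkuhler_view(LOTKA)[-1])  # (1.0, 1.0)
```

No information is gained or lost between the four views; each is a re-arrangement of the same 84 periodicals and 237 articles.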

 

7. Bradford Multiplier

 

The numbers of periodicals in the three zones of a Bradford distribution generally follow the ratio 1 : n : n², where n is the Bradford multiplier. Ravichandra Rao [16] analyzed the Bradford multiplier in a small sample of 12 data sets using the t-test. He also attempted to identify a suitable model to explain the law of scattering; among the various models tried, the lognormal fitted much better than many others, including the log-linear model.
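Two simple ways of estimating the multiplier n from zone sizes can be sketched as follows. Both estimators are choices made for this illustration, not Ravichandra Rao's procedure (which compared multipliers across 12 data sets with a t-test), and the function names are hypothetical. The example zone sizes, 6, 17 and 61 periodicals, are what Table 4 (Section 9) gives when its 237 articles are split into three near-equal parts at the closest rank boundaries.

```python
def zone_ratios(zone_sizes):
    """Successive ratios of periodical counts in adjacent Bradford zones."""
    return [b / a for a, b in zip(zone_sizes, zone_sizes[1:])]


def multiplier_estimates(zone_sizes):
    """Two simple estimates of the Bradford multiplier n.

    - the mean of the successive zone ratios;
    - the geometric estimate (last / first) ** (1 / (zones - 1)), which equals n
      exactly when the zone sizes are in the ratio 1 : n : n**2 : ...
    """
    ratios = zone_ratios(zone_sizes)
    mean_ratio = sum(ratios) / len(ratios)
    geometric = (zone_sizes[-1] / zone_sizes[0]) ** (1 / (len(zone_sizes) - 1))
    return mean_ratio, geometric


print(multiplier_estimates([6, 17, 61]))  # mean about 3.21, geometric about 3.19
```

When the two estimates disagree noticeably, the zone sizes are not a clean geometric progression, which is one symptom of the verbal/graphical ambiguity discussed next.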

 

8. Ambiguity between Verbal and Graphical Statements

 

Bradford’s law can be looked at in two different ways: graphically and verbally. This was first observed by Vickery [18]. Bradford’s verbal formulation of the law is recorded as “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the number of periodicals in the nucleus and succeeding zones will be as 1 : n : n²” [3: p. 154].

In 1948, Brian C. Vickery [18] contributed an important paper on Bradford’s law. He analyzed about 1,600 journal references, compared his results with Bradford’s and found an inconsistency. He remarked: “We can … regard the theoretical distribution of papers on a given subject in scientific periodicals as derived by Bradford, as fully corroborated by the distributions observed in the sample investigations. The rectilinear relation . . . incorrectly assumed by Bradford to be identical with his theoretically derived relation, fits only the upper portion of the observed curve (Figs. 2 and 3). The theoretical relation itself, however, enables us to predict the whole curve”.

Vickery showed that if $n_m$ journals contribute a cumulative total of m papers, and $n_m$ is greater than the nucleus, then the verbal formulation is equivalent to the expression

$$n_m : (n_{2m} - n_m) : (n_{3m} - n_{2m}) : \cdots \;::\; 1 : a_m : a_m^{2} : \cdots$$

whereas the graphical formulation is equivalent to the expression

$$n_m : n_{2m} : n_{3m} : \cdots \;::\; 1 : b_m : b_m^{2} : \cdots$$

Apart from this, when the graph is plotted with the data of the verbal formulation, it takes a different shape from the graph of the complete data set. In the verbal expression, the data in Table 1 will take the following shape and generate a curve as given in Fig. 2.

Table 2 – Distribution of Articles according to Zones

 

 

Comparing the two bibliographs we find the following:

 

i. In the verbal formulation, the entire data set is not available; what we get is practically a summary of it.

ii. The bibliograph in Fig. 2 is incomplete, inasmuch as it does not indicate the starting point of the curve.

iii. The last portion of the graph in both Figs. 2 and 3 is a straight line.

iv. In the graphical presentation of some Bradford distributions, a droop is observed at the end of the graph, which is not seen with the data of verbal formulation.

 

With these points, the distinction between the verbal formulation and the graphical presentation becomes quite clear, and the shortcomings of the verbal formulation become apparent. Brookes has provided equations for both the verbal and the graphical formulations; they are given under Brookes' formulation (Section 4).
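Vickery's two expressions cannot both hold exactly for the same data. A short derivation, assuming only that the graphical form holds with multiplier $b_m$, shows why:

```latex
\begin{aligned}
n_{2m} &= b_m\, n_m, \qquad n_{3m} = b_m^{2}\, n_m \\
\Rightarrow\quad n_m : (n_{2m} - n_m) : (n_{3m} - n_{2m})
  &= 1 : (b_m - 1) : b_m (b_m - 1).
\end{aligned}
```

Matching this with the verbal form 1 : a_m : a_m² would require a_m = b_m − 1 and a_m² = b_m(b_m − 1), i.e. (b_m − 1)² = b_m(b_m − 1), which for b_m ≠ 1 reduces to b_m − 1 = b_m, a contradiction. Under the graphical form the successive zone ratios are b_m − 1 and then b_m rather than a constant, so the two formulations agree only approximately, and the relative mismatch shrinks as b_m grows.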

 

9. Bradford-Zipf Distribution

 

Kendall [12], a statistician by profession, also studied the Bradford distribution using 1,763 references on operational research from 370 journals. For the sake of comparison, ‘1465 references to statistical methodology (covering the period 1925-39)’ were also used. The graph plotted following Bradford’s method produced a curve that was remarkable for its linearity. He also noticed that the law is similar to, but not identical with, Zipf’s law. Let us consider the data given in Table 4, which show a typical Bradford distribution.

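| Rank | No. of periodicals | No. of articles (each) | Cumulative no. of articles |
|---|---|---|---|
| 1 | 1 | 20 | 20 |
| 2 | 1 | 14 | 34 |
| 3 | 1 | 12 | 46 |
| 4 | 1 | 11 | 57 |
| 5 | 1 | 10 | 67 |
| 6 | 1 | 9 | 76 |
| 9 | 3 | 8 | 100 |
| 10 | 1 | 7 | 107 |
| 12 | 2 | 6 | 119 |
| 14 | 2 | 5 | 129 |
| 15 | 1 | 4 | 133 |
| 25 | 10 | 3 | 163 |
| 40 | 15 | 2 | 193 |
| 84 | 44 | 1 | 237 |

Table 4 – A data set following the Bradford distribution.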

Taking columns 1 and 3 of Table 4 in reverse row order and multiplying the two numbers in each row, we get the result shown in Table 5. The number in the second column may be considered as the frequency.
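| Rank | Frequency (no. of articles) | Rank × Frequency |
|---|---|---|
| 84 | 1 | 84 |
| 40 | 2 | 80 |
| 25 | 3 | 75 |
| 15 | 4 | 60 |
| 14 | 5 | 70 |
| 12 | 6 | 72 |
| 10 | 7 | 70 |
| 9 | 8 | 72 |
| 6 | 9 | 54 |
| 5 | 10 | 50 |
| 4 | 11 | 44 |
| 3 | 12 | 36 |
| 2 | 14 | 28 |
| 1 | 20 | 20 |

Table 5 – Partly inverted form of Table 4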

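Apart from the few most productive periodicals at the bottom of the table, the products in the third column stay within a fairly narrow band (roughly 50 to 84), which by and large is the behaviour Zipf’s law predicts (rank × frequency ≈ constant). The two distributions are in fact very close, hence they are often referred to together as the Bradford-Zipf distribution. The linearity of the Bradford bibliograph indicates a true Zipf situation.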
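The construction of Table 5 and a rough check on the Zipf pattern can be sketched in Python. The sketch assumes that the first column of Table 5 is the rank of the last periodical in each row of Table 4. The log-log slope is a diagnostic added here, not part of Kendall's or Brookes' analysis: the classic Zipf form (frequency proportional to 1/rank, so rank × frequency constant) corresponds to a slope of −1. Function names are hypothetical.

```python
import math

# Table 4 rows: (rank of last periodical in the row, articles per periodical)
TABLE_4 = [(1, 20), (2, 14), (3, 12), (4, 11), (5, 10), (6, 9), (9, 8),
           (10, 7), (12, 6), (14, 5), (15, 4), (25, 3), (40, 2), (84, 1)]


def table_5(rows):
    """Read Table 4 bottom-up and form (rank, frequency, rank x frequency)."""
    return [(rank, freq, rank * freq) for rank, freq in reversed(rows)]


def loglog_slope(rows):
    """Least-squares slope of log(frequency) against log(rank).

    The classic Zipf form, frequency proportional to 1 / rank, gives -1.
    """
    u = [math.log(rank) for rank, _ in rows]
    v = [math.log(freq) for _, freq in rows]
    k = len(u)
    mean_u, mean_v = sum(u) / k, sum(v) / k
    return (sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
            / sum((a - mean_u) ** 2 for a in u))


print([p for _, _, p in table_5(TABLE_4)])
# [84, 80, 75, 60, 70, 72, 70, 72, 54, 50, 44, 36, 28, 20]
print(round(loglog_slope(TABLE_4), 2))
```

How far the fitted slope departs from −1 is one rough gauge of how closely a data set follows the classic Zipf form.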
10.  Characteristics of Bibliometric Distribution
  • Bibliometric distributions can generally be expressed through algebraic expressions.
  • On graphical presentation, they form different types of curves.
  • All these distributions have given rise to well-established laws which have found applications in journal selection, ranking of authors, ranking of words for keyword generation, and so on.
  • The classical laws of bibliometrics generally follow power-law distributions.
  • All these laws are basically different versions of a single bibliometric distribution.

11.  References

[1] Bradford, Samuel Clement. Wikipedia <en.wikipedia.org>. Web. 1.5.2013.
[2] Bradford, S. C. “Sources of information on specific subjects”. Engineering 137 (26 January 1934): 85-86.
[3] Bradford, S. C. “The documentary chaos”. In Bradford, S. C. Documentation. London: Crosby Lockwood, 1953. Ch. IX, pp. 144-159.
[4] Bookstein, Abraham. “The bibliometric distributions”. Library Quarterly 46 (1976): 416-423.
[5] Brookes, B. C. “The derivation and application of the Bradford-Zipf distribution”. Journal of Documentation 24 no. 4 (1968): 247-265.
[6] Brookes, B. C. “Bradford’s law and the bibliography of science”. Nature 224, Dec. 6 (1969): 956.
[7] Brookes, B. C. Correspondence: “Scientific bibliography”. Nature 227, Sept. 26 (1970): 1377.
[8] Cole, P. F. “A new look at reference scattering”. Journal of Documentation 18 no. 2 (1962): 58.
[9] Egghe, Leo and Rousseau, Ronald. Introduction to Informetrics: Quantitative Methods in Library, Documentation and Information Science. Amsterdam: Elsevier Science Publishers, 1990.
[10] Hertzel, Dorothy H. “Bibliometrics, history of the development of ideas in statistical bibliography, or bibliometrics”. Encyclopedia of Library and Information Science 42 (1987): 144-219.
[11] Hubert, John J. “On the Naranan interpretation of Bradford’s law”. Journal of the American Society for Information Science 27 no. 5 (1976): 339-3.
[12] Kendall, M. G. “The bibliography of operational research”. Operational Research Quarterly 11 no. 1/2 (1960): 31-36.
[13] Leimkuhler, F. F. “The Bradford distribution”. Journal of Documentation 23 no. 3 (1967): 197-207.
[14] Naranan, S. “Bradford’s law of bibliography of science: an interpretation”. Nature 227, Aug. 8 (1970).
[15] Power law. Wikipedia <https://en.wikipedia.org/wiki/Power_law>.
[16] Ravichandra Rao, I. K. “An analysis of Bradford multipliers and a model to explain the law of scattering”. Scientometrics 41 no. 1 (1998): 93-100.
[17] Ravichandra Rao, I. K. Quantitative Methods for Library and Information Science. New Delhi: Books in my Basket, 2003. xii, 271 p.
[18] Vickery, B. C. “Bradford’s law of scattering”. Journal of Documentation 4 (1948): 198-202.