The metaphor of the knowledge life cycle probably has several independent points of origin within various disciplines. At least two lines of development can be identified, one based on the Kuhnian concept of a paradigm and the other on statistical bibliography or bibliometrics. There is substantial literature on each of these concepts; the next three sections of this article focus on only a few key contributions as background for introducing an alternative approach to conceptualizing the growth of knowledge.

The Paradigm Concept

Citing earlier work on the diffusion of innovations, Crane (1972) stresses the role of scientific communities and the processes of social interaction and influence that underlie the exponential growth of science (pp. 22-40, 172). She identifies four stages of a growth curve. In stage one, new discoveries provide models or paradigms for future work, new scientists are attracted to the area, but little or no social organization has emerged. Stage two is characterized by Kuhnian "normal science" and by substantial interaction among collaborators that leads to exponential growth of membership and publications; an "invisible college" develops. In stage three, major problems addressed within the paradigm are solved, but, in the course of time, anomalies appear. Concomitantly, the researchers become increasingly specialized and increasingly engaged in controversy. In stage four, the major problem-solving activity that sustains the group has become exhausted, a crisis develops, and membership declines. As a result of the crisis, new paradigms and a new cycle of growth may be initiated. The dynamics of the life cycle derive from interaction between cognitive and social factors. Other authors have proposed variations of the stages identified by Crane (Mulkay et al., 1975; Mey, 1982, chap. 9).

An attempt to formalize the idea of a Kuhnian life cycle is presented by Sterman (1985). He develops a dynamic computer model that simulates the kind of behavior that Kuhn has described, tests its internal consistency, and explores the sensitivity of his model to various parameters. Sterman identifies the four stages of a life cycle as: "emergence, normal science, crisis, and revolution" (p. 96). The lifetime growth curve of the paradigm is measured in terms of the fraction of the practitioners that are committed to it, a fraction that depends directly on a variable called "confidence" in the paradigm, which in turn depends on, among other things, the success of the paradigm in solving the "puzzles" of normal science. Sterman introduces the important assumption that the easiest problems are solved first, so that what remains becomes increasingly difficult in the course of time. After an initial period of rapid growth, the community eventually exhausts the supply of easily solved puzzles and engages problems that give rise to possibly profound difficulties or anomalies that lead to a crisis stage. Some anomalies may not be resolved, confidence in the paradigm diminishes, defections occur, and the population declines. Sterman notes that "the entire process may take a few years or a few centuries" (p. 96).

The power of a scientific model can best be understood in terms not of what empirical phenomena or data it seems to account for but of what it forbids. Sterman does not make clear what kind of plausible empirical data concerning population growth in a scientific community would be ruled out by his model. Almost any growth curve, one would assume, either keeps rising, eventually levels off, or levels off and then declines. A suitable adjustment of numeric parameters in the model would seem to be capable of accounting for any of these possibilities and on any time scale from months to centuries. No attempt is made to test the model against empirical data. Sterman's stated objective is to test only the internal consistency of Kuhnian science.

Wittenberg (1992) has recently offered a critique and analysis of Sterman's model and has extended it by taking into account more systematically the role of competing paradigms. The Wittenberg critique, and the ensuing published discussion of it, illuminates the limitations and significance of the model. Of particular interest is the criticism of one author that the model does not at all represent the process of paradigm change during revolutionary science, a period characterized by "sudden, idiosyncratic, discrete, and exogenous processes" altogether different from normal science (Barlas, 1992, p. 45). Additional commentary on paradigms and normal science by Watkins, Toulmin, Popper, Masterman, and others is pertinent to any attempt to model such phenomena (Lakatos & Musgrave, 1970).

Linking the Paradigm Concept to Bibliometrics

In a frequently cited paper, Small (1980) has proposed an operational definition of a paradigm based on cocitation context analysis. The context in which any given citation occurs is presumed to serve as a concept marker. Small had earlier shown, for highly cited chemistry documents, that there is a strong consensus concerning such context among authors who cite the same article. The cociting of two documents (in a restricted enough context) can be interpreted as a linkage of the two corresponding concepts. Thus, through analysis of cocitation clusters and concept combinations, one can build up representations of consensual knowledge structures. Although Small identifies such structures with paradigms, his proposal, by attempting to bridge the gap between citation structures and knowledge structures, merits a life cycle of its own independently of Kuhnian science. Among other things, Small's approach would seem to offer prospects for illuminating the relationship between normal science and revolutionary science. Although he does not explicitly discuss knowledge life cycles, his idea clearly implies a central role for cocitation patterns in any such investigation. To focus on cocitations suggests the possibility of knowledge rejuvenation through combining previously independent specialties that, taken alone, might be in a state of decline.

A second paper that links paradigms to citation patterns reports that the average age of references cited in archaeological papers is less than the average age of references in humanistic papers on the same topic--namely, the Dead Sea Scrolls (Heisey, 1988). The author interprets such data as supporting the idea that archaeologists work within a single paradigm, solving and assimilating puzzles progressively in a way that requires citing only recent material, whereas, for humanists, the absence of such paradigmatic focusing leads to citing a higher percentage of older sources. Perhaps so, but the same data would also seem to support alternative interpretations--e.g., that scientists (in contrast to humanists) tend to disregard the literature, are not acquainted with most of it, and cite only what they have recently heard about at meetings and from colleagues.

The Bibliometric Concept of a Knowledge Life Cycle

In 1935, Wilson and Fred, in the departments of agricultural bacteriology and agricultural chemistry at the University of Wisconsin, conducted what they termed a study of the biological properties (growth and development) of the literature on nitrogen fixation by plants. They stressed the importance of choosing a large but fairly definite problem and plotted a growth curve based on a previously compiled bibliography of over 2,000 references that spanned a seventy year period. They discussed the nature of the research performed in three major periods that they termed, respectively, "the embryonic period," "the period of development," and "approaching maturity." The third period extended to the year of their study so no subsequent phases were identified. They compared their empirical data with a theoretical logistic curve derived in 1931 by Hiroshi Tamiya, who similarly investigated a literature growth curve for Aspergillus. Although a few empirical studies in statistical bibliography preceded this work, Tamiya may have been the first to propose a theory that could account for exponential growth. He suggested that scientific publication not only attracted a readership but attracted also other investigators who thus were stimulated to make similar further contributions. He hypothesized, therefore, that the growth rate of publications was proportional to the total number of prior publications. Introducing a factor that limited growth rate then resulted in a logistic curve. A modification of Tamiya's idea will be explored in the second part of this article.
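The step from proportional growth to a logistic curve can be written out briefly. The sketch below uses the standard logistic equation; the symbols N (cumulative publications), r (growth constant), and M (ceiling on the literature) are introduced here for illustration only and are not taken from Tamiya's paper, whose limiting factor may have had a different form.

```latex
% Growth rate proportional to prior publications gives exponential growth:
\frac{dN}{dt} = rN \quad\Longrightarrow\quad N(t) = N_0\, e^{rt}
% Adding a limiting factor (1 - N/M) gives the standard logistic curve:
\frac{dN}{dt} = rN\left(1 - \frac{N}{M}\right)
\quad\Longrightarrow\quad
N(t) = \frac{M}{1 + \left(\dfrac{M}{N_0} - 1\right) e^{-rt}}
```

While N is small compared with M the curve is nearly exponential; it flattens as N approaches M.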

Important issues of literature use and obsolescence studies were identified in a classic paper by Line and Sandison (1974). They distinguished at the outset the obsolescence of information from the obsolescence of documents containing the information. Information that represents knowledge can become obsolete by becoming incorporated, superseded, or invalidated by later information, or simply by failing to command continued interest. Moreover, obsolescence is neither inevitable nor irreversible. These authors concluded that generalizations concerning the decline of either usage or citations with age are not of much practical use for the purpose of weeding out obsolete materials in individual libraries owing especially to the very high individual variances for different journals and different types of material. Average values thus are not good predictors for individual cases.

Brookes (1975), commenting on the above work, stressed important distinctions between the scientific study of citation patterns and the study of usage in individual libraries; the latter type of study, he argued, is not subject to generalization.

According to the Science Citation Index (1986, p. 28), an "average" paper is cited most heavily in the first few years after publication and receives 80 percent of its eventual total citations in the first thirteen years. Numerous studies have shown that citations to any given article tend to diminish exponentially with time. Cano and Lind (1991) briefly described a dozen or so diachronic (longitudinal) studies in which various citation lifetimes were reported that ranged from one to three decades. But there are many exceptions to this prevailing tendency. These same authors found that citations to one "citation classic," among ten that were examined, showed sustained growth throughout the twenty-five year period studied. Garfield has collected a number of instances of "delayed recognition"; the time scales of delay prior to being cited varied from about ten to forty years (1981, p. 488; 1989, pp. 154, 264). Line (1974, 1984) has shown that citations to frequently cited physics articles decline much more slowly with time than do citations to less frequently cited articles. Thus, to take the average age of a set of articles as a predictor of obsolescence might be most seriously in error for the more important articles.

The Growth and Fragmentation of Knowledge

Defining "Knowledge": Subjective Versus Objective

"Knowledge" is often inextricably associated with the shared intellectual activities, understandings, beliefs, interests, and commitments of a community. It is a subjective and sociological concept of "what people know." The so-called knowledge life cycle in that light can be described in terms of the growth and decline of shared interest in certain techniques, puzzles, and problems.

"What people know" is a common understanding of what is meant by "knowledge," but a second sense of the word, a meaning associated with the products of human intellect as in the phrase "recorded knowledge," may be of greater significance for bibliometrics, scientometrics, information science, and librarianship. Librarians who try to build research collections for future scholars, for example, must decide what recorded knowledge is worth preserving irrespective of the waxing and waning of interests, fashions, fads, and finances--and thus irrespective of who knows what or when they know it. In the rest of this article, "knowledge," unless otherwise indicated, will refer to the products of human intellectual activity rather than to the contents of the human mind. Such products are not identical to the physical media but rather are abstractions. The abstract world of human-created "objective" knowledge underlies an evolutionary epistemology as described by Campbell (1974), Popper (1972), and others. There are elements of such knowledge that do not necessarily have any explicit physical representation--e.g., the implicit logical relationship between two pieces of recorded information. Moreover, the total number of such relationships in all of science might greatly exceed the total number of pieces.

A Model of Per Capita Literature Growth

Isolated populations of living organisms grow because organisms beget organisms. Under ordinary and favorable circumstances, the number of offspring produced each year tends to be some fixed percentage of the parent population, a relationship that leads to an exponential growth curve for the population. Such growth begins slowly and continues to the point of seeming explosive; eventually new and limiting factors supervene and the population may stabilize or decline. The scientific community also can undergo extended periods of fixed percentage exponential growth by "breeding" new doctorates.

Information, on the other hand, does not beget information. At least for recorded textual scholarly or scientific knowledge (i.e., putting aside photography and similar raw data that are direct products of scientific instrumentation), information is created only by people. The average rate per person at which scholarly or scientific articles can be produced probably does not undergo substantial changes in the course of time. In such a context, to speak of exponential growth or its equivalent invites misunderstanding unless such growth is linked to the growth of the population that produces the information. An information explosion can derive only from a population explosion. Moreover, a population of researchers cannot grow explosively without a corresponding growth of the resources needed in order to conduct its research. If both information and the resources needed to produce it are growing at about the same rate, the ratio of one to the other remains almost constant--in which case it is not obvious just what an "information explosion" could mean. If libraries and other mechanisms for coping with information are not getting their proper share of research-supporting resources (as indeed may be the case), that is more a problem of allocation than inundation. It seems worthwhile to try to understand exactly what kind of information, if any, can possibly grow increasingly, unremittingly, and necessarily faster than its producing community.

One obvious approach to the question is to contrast the production of new information per year with the total accumulation. New rather than total information would seem to bear a constant ratio to the producer population and its resources. If, for example, a fixed population of 100 scientists produces 300 articles per year, the ratio of the number of articles to the number of people after one year is three to one. But after 10 years, the ratio of total literature to people is thirty to one and after 100 years, three hundred to one. This illustration is based on simple linear (nonexplosive) growth of literature within a nongrowing population of producers. How much larger are these same ratios for an exponentially growing community and its literature? The answer may at first seem surprising.

Suppose the community in question is growing at the rate of 3 percent per year. In the first year, 300 articles are produced by 100 people and in the second year 309 articles come from 103 people, and so on. In the tenth year, 130 people produce 390 articles. The cumulative number of articles produced in the first ten years is 3,432. At that point, the ratio of literature to people is about twenty-six to one, and hence is smaller than the fixed population or linear case of thirty to one. This difference becomes more pronounced at the 100 year point when the two ratios in question are 95 and 300 for the exponential and linear cases respectively. After 100 years, the total literature to people ratio for the 3 percent growth case hardly changes at all and never exceeds 100. The corresponding ratio for the fixed population case, on the other hand, continues to grow at a constant rate. In short, a fixed population producing literature at a constant rate leads to unremitting (but linear or nonexplosive) growth of the literature-to-people ratio, while the case usually described as an information explosion entails little or no change with time in that ratio.
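The comparison above can be checked with a short year-by-year tally. The following is a minimal sketch, assuming the community compounds once a year at 3 percent and that the ratio is taken against the community size in the final year counted; the function name and its parameters are illustrative and not part of the original analysis. Because it compounds annually rather than continuously, its output should land near, not exactly on, the figures quoted above.

```python
def literature_to_people_ratios(years, people0=100.0, articles_per_person=3.0, growth=0.03):
    """Return (fixed_ratio, growing_ratio) after `years` years (years >= 1).

    fixed_ratio:   cumulative articles per person for a constant community.
    growing_ratio: the same ratio when the community grows by `growth` a year,
                   measured against the community size in the last year counted.
    """
    fixed_total = 0.0
    growing_total = 0.0
    people = people0
    last_people = people0
    for _ in range(years):
        fixed_total += articles_per_person * people0   # constant community
        growing_total += articles_per_person * people  # growing community
        last_people = people                           # size in the year just counted
        people *= 1.0 + growth                         # size for the following year
    return fixed_total / people0, growing_total / last_people


for years in (1, 10, 100):
    fixed, growing = literature_to_people_ratios(years)
    print(f"after {years:3d} years: fixed = {fixed:6.1f}, growing = {growing:6.1f}")
```

The fixed-community ratio climbs without limit, while the growing-community ratio settles close to a ceiling, which is the contrast the paragraph draws.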

To examine this idea in greater depth, consider, as earlier, a growing scientific community in which each person produces a fixed number of articles per year. But now, instead of taking exponential growth of the community as given, assume (following Tamiya's suggestion) that its growth can be attributed to the power of its literature to attract new authors. The rate of production of new articles per year must be proportional to the size of the community, and we now take the rate of growth of the community as itself proportional to the number of articles that have been published during the most recent k years. The two ideas together lead to the following pair of differential equations:

(1a) $dL/dt = nS$
(1b) $dS/dt = a\,[L(t) - L(t-k)]$

where
S = S(t) = size of community (number of people) as a function of time
n = the number of articles per year produced by each person (n = 3, for example)
L = L(t) = total (cumulative) amount of literature produced up to time t (in years)
a = a constant that represents the ability of the literature to attract new scientists.

Let
$S_0$ = the initial population of the community (= 100, for example)
g = the growth rate, assumed fixed and sustained at g = 0.03 or 3 percent per year.

The solution to the pair of equations (1) is given by the following exponential growth equations for the scientific community and its literature:

(2) $S = S_0 e^{gt} = 100\,e^{0.03t}$
(3) [Mathematical Expression Omitted]

It is notable that these two equations do not depend on the value of k. Next let $K(t) = L(t) - L(t-k)$, and, noting that, for $t \ge k$:

(4) [Mathematical Expression Omitted]

It follows from (1b) and (4) that $dS/dt = aK(t) = gS(t)$, provided $a = g^2 / [n(1 - e^{-gk})]$.
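The two omitted expressions can plausibly be recovered from equations 1a and 2. The sketch below is a reconstruction, not the printed form: it assumes that the literature count starts from zero, L(0) = 0, which the surviving text does not state, and it writes the initial population as S_0. It agrees with the constant 59.3 in equation 6 and with the percentages quoted for equation 7.

```latex
\begin{align*}
L(t) &= \int_0^t n\,S(\tau)\,d\tau
      = \frac{n S_0}{g}\left(e^{gt} - 1\right)
      && \text{(candidate for equation 3)} \\
K(t) &= L(t) - L(t-k)
      = \frac{n S_0}{g}\, e^{gt}\left(1 - e^{-gk}\right)
      = \frac{n}{g}\left(1 - e^{-gk}\right) S(t)
      && \text{(candidate for equation 4, } t \ge k\text{)} \\
\frac{dS}{dt} &= a\,K(t) = g\,S(t)
      \quad\Longrightarrow\quad
      a = \frac{g^{2}}{n\left(1 - e^{-gk}\right)}
\end{align*}
```

With n = 3, S_0 = 100, and g = 0.03, the first line would read L(t) = 10,000 (e^{0.03t} - 1).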

In the above model, the scientific community grows because people are attracted to its published products (equation 1b). The last result above shows that, instead of equation 1b, an equivalent and perhaps more common assumption can be used: that the rate of growth of the community is proportional to the total number of its members, a relationship that may have explanations other than the one based on publications. Such resilience of exponential growth to the underlying explanatory account of it would appear to be a special case of a more general resilience property of the exponential and other informetric distributions investigated in depth by Bookstein (1990, p. 377).

Of particular interest now is the ratio of the growth rate of the literature to that of the community. After the first eighty or so years ($e^{gt} \gg 1$), that ratio is essentially constant, as shown in the following equation 5.
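Equation 5 itself does not survive in this text. A candidate form is sketched below, reading the ratio as cumulative literature per member of the community and again assuming L(0) = 0; both readings are assumptions, chosen because they reproduce the ratio of 95 at the 100 year point and the ceiling of 100 mentioned earlier.

```latex
\frac{L(t)}{S(t)}
  = \frac{n}{g}\left(1 - e^{-gt}\right)
  \;\approx\; \frac{n}{g} = 100
  \qquad \text{for } e^{gt} \gg 1
  \qquad \text{(candidate for equation 5)}
```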

Equation 4 shows that the current literature (k most recent years) grows at the same exponential rate, g, as the total literature, and with the shape of the growth curve independent of k (Bookstein, 1990, p. 377; Price, 1963). Pro-rating the current literature on a per person basis gets rid of the time dependence and so yields a constant value:

(6) $K/S = (n/g)(1 - e^{-gk}) = 59.3$

where k has been set equal to 30 years.

The proportion of literature older than k years is given by:

(7) $V(t) = 1 - K/L = 1 - (1 - e^{-gk})/(1 - e^{-gt})$

Thus V(t) is zero at t = k = 30 years, increases to 38 percent at 100 years, and never exceeds 41 percent; thus the per capita number of old articles in the long run does not increase.
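The constants quoted for equations 6 and 7 can be reproduced numerically. This is a minimal sketch, assuming equation 7 is read as V(t) = 1 - (1 - e^{-gk}) / (1 - e^{-gt}) and using the illustrative values n = 3, g = 0.03, and k = 30; the function names are hypothetical.

```python
import math


def current_literature_per_capita(n=3.0, g=0.03, k=30.0):
    """Equation 6: articles from the most recent k years, per person."""
    return (n / g) * (1.0 - math.exp(-g * k))


def fraction_older_than_window(t, g=0.03, k=30.0):
    """Equation 7 (as read above): share of all literature older than k years."""
    if t < k:
        raise ValueError("defined only for t >= k")
    return 1.0 - (1.0 - math.exp(-g * k)) / (1.0 - math.exp(-g * t))


print(f"K/S    = {current_literature_per_capita():.1f}")
for t in (30.0, 100.0, 1000.0):
    print(f"V({t:6.0f}) = {fraction_older_than_window(t):.3f}")
```

The long-run value of V is simply e^{-gk}, so a longer window k or a faster growth rate g both shrink the share of old literature.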

A Connection Explosion

The argument that the ratio of literature to people is an important and relevant parameter merits further discussion. It seems to suggest that a division of labor somehow can be used to cope with the various problems of assimilating the literature. Just how such a coping mechanism could work may not be at all clear, but it is commonly taken for granted within the scientific community. Scientists long ago abandoned the idea that each of them had to read everything. By some obscure spontaneous process, they organized their work into specialties, thus permitting each individual to focus on a small part of the total literature. Specialties that grow too large tend to divide into subspecialties that have their own literatures which, by a process of repeated splitting, maintain a more or less fixed and manageable size. As the total literature grows, the number of specialties, but not, in general, the size of each, increases. The literature to people ratio is kept roughly constant thus enabling everyone to keep up (or at least to think they keep up) with their own share of the total as represented by their specialty. But the unintended consequence of specialization is fragmentation. By dividing up the pie, the potential relationships among its pieces tend of necessity to be neglected (for a perceptive essay on some of the cognitive/sociological dimensions of specialization and fragmentation, see Whitley, 1979).

To examine the fragmentation problem in the context of the above model, let $F(t) = K/A$ = the total number of specialized literatures, or "fragments," each of fixed size A articles, within the current literature, K (choose A = 100, for example). To assume that the fragments are similar in size is not accurate (Sullivan et al., 1977, p. 176), but such an assumption serves as a useful approximation in the following analysis. The total number of possible pairs of fragments, $F(F-1)/2$ or approximately $F^2/2$, provides a basis for estimating the number of potential connections or relationships between different fragments. Some unknown fraction, denoted by q (for example, q = 1 percent), of all possible pairs, represents those that are important, interesting, or meaningful according to criteria to be discussed in the third part of this article. The number of such "interesting" pairs is then given by:

(8) $C(t) = qK^2/(2A^2) = 1760\,(q/A^2)\,S^2$

The squared term leads to a very rapid rate of growth. Even prorating the number of pairs on a per capita basis still yields exponential growth:

(9) $P = C/S = 1760\,(q/A^2)\,S_0 e^{gt} = 0.176\,e^{gt}$

where A is chosen as 100 and q as 0.01 for the purpose of illustration.

The relationships expressed can be visualized with the help of Figures 1 and 2. The solid curve of Figure 1 is actually formed of two curves that represent equations 2 and 3, respectively, for the research community and the cumulative literature it produces. Except for the constant factor of 100, the two curves are not perceptibly different (on the scale chosen) and represent an exponential increase at the rate of 3 percent per year. The literature growth curve taken alone represents also the usual view of an "information explosion." The dashed line shows the steady linear accumulation of information from a fixed community of producers.

Figure 2 shows how literature appears to grow when it is prorated on a per person basis in the producing community (which, for science, one can assume is also the consuming community). For the fixed community (dashed line), the per capita literature growth remains linear as before, for it is simply divided by a constant factor of 100. For the case of a community growing at the rate of 3 percent, the per capita growth is dramatically different from the total growth. Instead of exponential growth of the total, there is essentially no growth per capita at all after the first 80 or 100 years (see equation 5). From this perspective, the information explosion is a myth, but not so the potential for a connection explosion, as shown by the rapidly rising curve that corresponds to P (the number of pairs of fragments per person) in equation 9.
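The growth of P in equation 9 can be traced with a few lines of code. This is a minimal sketch under the illustrative values used in the text (q = 0.01, A = 100, S_0 = 100, n = 3, g = 0.03, k = 30); it uses the approximation F^2/2 for the number of pairs, as equation 8 does, and the function name is hypothetical.

```python
import math


def pairs_per_person(t, q=0.01, fragment_size=100.0, s0=100.0, n=3.0, g=0.03, k=30.0):
    """Equation 9: 'interesting' pairs of fragments per member of the community."""
    community = s0 * math.exp(g * t)                          # equation 2
    current = (n / g) * (1.0 - math.exp(-g * k)) * community  # K, from equation 6
    fragments = current / fragment_size                       # F = K / A
    interesting_pairs = q * fragments ** 2 / 2.0              # equation 8, with F^2/2
    return interesting_pairs / community


for t in (0.0, 50.0, 100.0, 150.0):
    print(f"t = {t:5.0f} years: P = {pairs_per_person(t):8.2f}")
```

Unlike the literature-to-people ratio, this quantity has no ceiling: it multiplies by e^{g} each year for as long as the community keeps growing.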

[TABULAR DATA OMITTED]

No attempt has been made to match the model to actual growth rates or community size. Nonetheless the idealized growth curves shown in Figure 2 lead to certain conclusions that are unlikely to be much different if real data were used. The actual growth rate of the community itself is known to vary greatly over the course of history and to vary greatly from one discipline to another and from one specialty to another. But a growth rate of over 4 percent for science as a whole has been sustained for several hundred years (Price, 1963). Any sustained growth rate less than 3 percent would lead to a curve intermediate between the 3 percent curve shown and the dashed line that represents zero growth of the community. For example, a 2 percent curve (not shown) has almost the same shape as the 3 percent curve, but flattens out more slowly, remaining constant after about 150 years instead of 80 to 100 years. Any growth rate higher than 3 percent would lead to a ratio curve even flatter than the one shown. Within wide and plausible limits, substantially the same general picture emerges. The literature to person ratio grows at a rate slower than linear and eventually not at all. There seems to be no reasonable set of assumptions that could lead to explosive or exponential growth of that ratio. On the other hand, under all reasonable assumptions, the growth rate ratio for pairs (thus for relationships or connections) would be exponential and hence explosive.

[TABULAR DATA OMITTED]

The absolute size of the P-curve was chosen to fit the scale of the graph by taking only 1 percent of the number of possible pairs as representing those that turn out to be of interest, but this percentage is, in fact, altogether unknown. It could be larger or smaller, but the shape of the curve is clear; under any plausible set of assumptions this abstract and invisible but explosive growth must eventually overshadow the growth of the literature itself and overwhelm the associated community of information producers and users.

What is generally taken as the single most important marker of potential obsolescence is the age of the literature. In Figure 2, the distance between the horizontal dotted line for the current thirty years of literature (see equation 6) and the near horizontal total literature line above it (see equation 5) represents the literature per capita older than thirty years. Thus, whatever may be the nature of the obsolescence problem in older literature, it is not characterized by disproportionate growth with respect either to the total literature or the size of the scientific community.

The life cycle of a scientific specialty is characterized by growth and decline but not necessarily by obsolescence. Prior to decline, it may fragment into new subspecialties or develop new relatedness with other older specialties--relatedness that may be unintended and unnoticed. The fragmentation of knowledge inevitably will spawn the most important information problems of the future, problems that also are opportunities to create new knowledge by discovering new relationships. What is now seen as an information explosion will become an opportunity explosion. The next part examines the nature and explicit examples of such opportunities.

COMBINING COMPLEMENTARY BUT DISJOINT (CBD) LITERATURES

A series of papers published during the past six years shows how it is possible to find, within scientific bibliographic databases, unnoticed relationships that represent new solutions to scientific problems. In particular, three "case studies" show how previously unknown solutions can emerge through combining sets of biomedical articles that are logically related but which do not cite or mention each other (Swanson, 1986b, 1988, 1990a). Subsequent to each of the first two such studies, independent clinical and laboratory evidence has been reported that supports the proposed novel solutions; a clinical trial to test the results of the third study is currently underway. The above three studies are summarized and elaborated in Swanson (1990b, 1991); independent reviews and evaluations provide additional perspective (Lesk, 1991, pp. 6, 7; Davies, 1989). The following description of the literature structure that was analyzed in the 1988 case study explains the main point of the project.

That study began by choosing the problem of finding published information on the cause or cure of migraine. Such a quest is presumably futile because it is generally accepted by medical researchers that the cause and cure of migraine are unknown. However, to see how implicit but unknown published information relevant to causes or cures might exist, consider the following six pairs of titles of medical articles, selected from 128 similar articles cited and reviewed in the published outcome of the study. All "a" titles are about migraine; "b" titles are about magnesium.

1a The relation of migraine and epilepsy.
1b The magnesium deficient rat as a model of epilepsy.
2a Role of calcium entry blockers in the prophylaxis of migraine.
2b Magnesium: nature's physiologic calcium blocker.
3a Leao's spreading depression: evidence supporting a role in the [text omitted]
3b [text omitted]
4a [text omitted] in migraine patients.
4b Protective effects of dietary calcium and magnesium on platelet function and atherosclerosis in rabbits fed saturated fat.
5a Serotonin-releasing factors in migrainous patients.
5b The effect of magnesium on the response of smooth muscle to 5-hydroxytryptamine [serotonin].
6a Substance P and enkephalins: a creditable tandem in the [text omitted]
6b [text omitted]

Because of the shared "linkage" terms shown in italics, each of the six pairs of titles raises the question of whether magnesium deficiency might be implicated in migraine. None of these pairs is unique; there are more than a dozen articles (like 1a) that relate epilepsy to migraine and a similar number (like 1b) on the role of magnesium in epilepsy--similarly for the other five pairs. Moreover, there are at least five more linkages besides the above mentioned six that appear to connect magnesium with migraine (Swanson, 1988, 1990b). Altogether, sixty-five articles on migraine and sixty-three on magnesium were identified as forming what may be called two "complementary literatures," a pair in which one literature appears to contain a potential solution to a problem posed in the other. Such unintended complementarity becomes of great interest if it is not apparent within either literature taken alone, a point to be elaborated in the next section.

Disjointness

Remarkably, none of the sixty-five articles on migraine mentions or cites any articles on magnesium and none of the sixty-three articles on magnesium mentions or cites any articles on migraine. Moreover, a MEDLINE search in August 1987 revealed that, among 4,600 migraine records and 38,000 magnesium records, there were only six that contained both "migraine" and "magnesium," either as a descriptor or as an identifier or as a text word in the title or abstract. The six corresponding articles, published over a twenty year time span, were principally on magnesium. They offered little or no substantive discussion of the migraine literature and none had been cited by any migraine researcher, as judged by searching the Science Citation Index. In short, neither online searching nor printed indexes nor reading the text and following citation trails in medical articles turned up evidence that there was, at the time, any substantial scientific interest in the possibility of a physiological relationship between magnesium and migraine.

If migraine and magnesium are biologically unrelated, one might infer that the corresponding words should appear to be distributed independently of one another in scientific articles as well as in titles and abstracts. That is, "migraine" occurs in about one record per thousand in the entire MEDLINE database, so it would be expected to turn up more or less by chance in about thirty-eight of the 38,000 "magnesium" records. Thus it comes as a surprise to many information scientists that there are significantly fewer than thirty-eight such occurrences; so striking a negative correlation instead of a random association perhaps calls for explanation. The work of science is clustered into specialties; migraine researchers write about migraine, citing each other, and magnesium researchers write about magnesium, citing each other; differences in word use between the two groups may cause many negative correlations.
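The arithmetic behind this expectation can be checked with a short sketch. The record counts and the one-per-thousand rate are the figures quoted in the text; the Poisson model used to gauge how unlikely six co-occurrences would be is an assumption introduced here for illustration, not part of the original analysis.

```python
import math

# Figures quoted in the text (MEDLINE search, August 1987).
magnesium_records = 38_000        # records on magnesium
records_per_migraine_hit = 1_000  # "migraine" occurs in about one record per thousand
observed_cooccurrences = 6        # records containing both words

# Expected number of co-occurrences if the two words were independent.
expected_cooccurrences = magnesium_records / records_per_migraine_hit


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson variable (the Poisson model is an assumption)."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


p_at_most_observed = poisson_cdf(observed_cooccurrences, expected_cooccurrences)

print(f"expected under independence: {expected_cooccurrences:.0f}")
print(f"observed: {observed_cooccurrences}")
print(f"P(X <= {observed_cooccurrences}) under the Poisson assumption: {p_at_most_observed:.2e}")
```

Under these assumptions the observed count falls far below the chance expectation, which is the sense in which the association is negative rather than random.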

When the 1988 study was conducted, the two literatures on migraine and on magnesium were essentially disjoint in that they had virtually no articles or authors in common, they did not cite each other, and they were not cited together (cocited) by any third type of article (a few exceptions are discussed in that study; no subsequent publications have yet identified other pre-1988 exceptions, and even the exceptions have not been cited). Moreover, the two sets of articles identified were as nearly complete as can reasonably be expected from a diligent literature search. So far as printed evidence was concerned, it would seem that no one at all was aware of the implicit problem-solution relationship within these complementary but disjoint literatures; people who worked on the migraine problem appear not to have known about the connection with magnesium and vice versa. Such insularity and fragmentation may be a typical, though unintended, consequence of scientific specialization.

Publication of Results as Intervention

Any published paper, in some sense, "intervenes" in the process of building the edifice of knowledge; in this article, "literature intervention" is used in a special sense that requires a definition and explanation. Contributions that fit within the normal framework of scientific specialties are not considered as "interventions." A "literature intervention" is here taken to be a published literature analysis or bibliometric analysis that attempts to introduce new connections and citations that cut across specialties in order to connect disjoint parts of the existing network. The purpose of such an intervention in general is to influence the course of normal research within the individual specialties that are thus connected. A literature intervention is based on the assumption that even a nonspecialist is able to see many clues to relationships that may have escaped the notice of specialists simply because the implicitly related literatures in question have never been brought together, clues such as those in the six pairs of titles on migraine and magnesium quoted earlier. Thus, such interventions are not necessarily based on as great a depth of knowledge as that prevailing within the specialty.

In the last few years a fundamental role for magnesium in establishing the threshold for migraine attacks and involvement in the pathophysiologic mechanisms related to its onset has become evident. (Gallai et al., 1992, p. 132)

Although only eight of the above thirteen publications cite the 1988 literature intervention study, the pattern of citations and the information within all thirteen papers are consistent with the possibility that the 1988 intervention provided the main stimulus for this new work (see especially Schoenen et al., 1991). If it did not, the suddenly awakened interest in a migraine-magnesium connection after 1988 remains unexplained. Whether or not such a connection ultimately becomes established and accepted by medical scientists, it seems clear, in any event, that complementary but disjoint literatures are worth seeking; in principle they hold the potential for stimulating the process of scientific discovery.

Similar Conclusions Reached in Earlier Study

Similar conclusions were reached in the first CBD literature analysis and intervention (Swanson, 1986b). That study identified one set of articles showing that dietary fish oil leads to certain blood and vascular changes, and a second set containing evidence that similar changes might benefit patients with Raynaud's disease. Yet neither literature mentioned the other and no evidence has yet turned up to indicate that the implied potential benefits of dietary fish oil for Raynaud patients had been suggested prior to that study. At that time, about 2,000 papers on Raynaud's syndrome and about 1,000 papers related to dietary fish oil had been published. A MEDLINE search showed zero intersection of those two sets. No instances were found of articles on fish oil citing articles on Raynaud's disease or vice versa.

Two years later, in 1988, the predicted beneficial effect was corroborated by a controlled, double-blind clinical trial reported by B. B. Chang and a team of medical researchers. That report does not cite the 1986 CBD literature intervention study, but the similarity of the two papers is notable. All of the reasons given by Chang et al. (1988) for inferring that fish oils may ameliorate Raynaud's disease are identical to reasons given in the intervention paper, and the complete list of ten references cited by Chang et al. is a subset of the references cited in the intervention paper for the same reasons. The following points are common to the two papers: Fish oil reduces platelet activity, vascular reactivity, and blood viscosity, and these same changes can be expected to ameliorate Raynaud's disease. Fish oils produce prostaglandin PGI3, and the latter is a potent vasodilator that can also suppress platelet aggregation. Fish oils reduce blood viscosity through increasing cell membrane fluidity and through reducing blood triglycerides and chylomicrons. Nifedipine, a calcium blocker commonly used in treating Raynaud's disease, inhibits platelet aggregation. Chang et al.'s (1988) references 1 through 10, respectively, are identical to references 50, 52, 49, 3, 17, 19, 16, 10, 11, and 65 in Swanson (1986b).

The 1988 study by Chang et al. is important, for it represents the first clinical trial of a dietary approach to treating or preventing Raynaud's disease. The purpose of the 1986 literature intervention was not to substitute literature research for clinical research but to show that a novel approach based on a synthesis of CBD literatures can stimulate exactly the kind of controlled clinical study reported by Chang. It is difficult to imagine better evidence of such influence than the text of the article by Chang et al., save perhaps an acknowledgment by the authors themselves.

The common (though mistaken) belief that whatever is published is therefore, by definition, "known," would imply that the work reported in any CBD literature study cannot be "original" research and does not merit recognition as an influence beyond the ideas of others that it merely assembles. An assembly of other people's ideas is just what such work claims to be, but it claims also that such an assembly in principle can yield new knowledge. The proof of the latter claim follows from a simple example that can be taken as paradigmatic: suppose that "X causes Y" is known exclusively to one group of authors and readers and "Y causes Z" is known exclusively to a second group. Then the (unintended) implication that X causes Z might be known to no one at all, but it is discoverable by any third party who assembles the two complementary but disjoint premises. This "disjoint syllogism" shows how "public knowledge" can remain undiscovered by anyone, even by its own authors (Swanson, 1986a).
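The disjoint syllogism can be illustrated with a minimal sketch. The two "literatures" below are hypothetical toy sets of (cause, effect) assertions using the labels X, Y, and Z from the example; the function name `implied_links` is likewise introduced here for illustration and does not describe any procedure from the cited studies.

```python
from collections import defaultdict

# Hypothetical toy assertions, following the X, Y, Z example in the text.
literature_one = {("X", "Y")}  # "X causes Y", known only to the first group
literature_two = {("Y", "Z")}  # "Y causes Z", known only to the second group


def implied_links(first, second):
    """Map each (x, z) to the middle terms y with x -> y in `first` and y -> z in `second`."""
    effects_by_cause = defaultdict(set)
    for cause, effect in second:
        effects_by_cause[cause].add(effect)

    links = defaultdict(list)
    for x, y in first:
        for z in effects_by_cause.get(y, ()):
            links[(x, z)].append(y)
    return dict(links)


candidates = implied_links(literature_one, literature_two)

# Keep only conclusions that neither literature already states explicitly.
already_stated = literature_one | literature_two
undiscovered = {pair: via for pair, via in candidates.items() if pair not in already_stated}

print(undiscovered)
```

The conclusion that X causes Z appears in neither input set; it emerges only when a third party brings the two premises together.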

Clinical Trial Initiated for Third CBD Literature Study

A controlled clinical test with the title "effect of additional arginine administration on serum levels of insulin-like growth factor I" has been initiated by faculty members of the University of Illinois College of Medicine at Urbana (personal communication from Arjun Venkataramani, primary investigator for the project, August 27, 1992). That investigation is apparently the first controlled test, and the first dose-response measurement, of the potential stimulatory effect of arginine infusion on endogenous IGF-1 (Somatomedin C) levels. To establish such a relationship through a clinical test may have important implications for treatment or prevention of catabolic states. Many indirect linkages that would lead one to expect arginine to influence IGF-1 levels were reviewed in the third literature intervention study (Swanson, 1990a), along with corresponding citation-based evidence that the implicit arginine-IGF-1 link had been overlooked or neglected by medical researchers. The University of Illinois trial further strengthens the evidence that identifying and reporting CBD structures within the literature of science can stimulate clinical and laboratory research.

The Quest for a Systematic Process

The migraine study may be taken as a prototype or model to aid in developing a more systematic, repeatable process for conducting other CBD literature studies. Ideally, one would like to fully automate the initial stage of at least bringing potential solutions to the attention of migraine researchers. The prospects for achieving this ideal will be examined briefly, making use of the known outcome in this case (the literature on magnesium) to provide suitable test parameters (the original analysis relied heavily on human judgment in a trial-and-error online search process).

The goal of the migraine study in effect is to retrieve magnesium articles without knowing in advance that magnesium is the object of the search. Fully automatic methods that depend only on word co-occurrence frequencies and probabilistic matching of records at first sight might seem promising because of the shared linkage terms in the migraine and magnesium literatures, as exemplified by the six pairs of titles illustrated earlier. Indeed, the mainstream of information retrieval research is based on similar probabilistic methods (Salton & McGill, 1983). However, a closer look at the statistics of the problem, applied to a multimillion record database, does not encourage boundless optimism (Swanson, 1991, p. 284). The nature of the problem for a fully automated process is illuminated by the Venn diagram of Figure 3, which shows that a very small part of the migraine literature intersects a very large (200,000 article) intermediate literature (corresponding to the eleven connections mentioned previously), and a very small part of the magnesium literature forms the other part of the (potential) linkage (Figure 3 is a post-hoc reconstruction of the known outcome of the analysis; the information it presents could not have been known at the outset). The goal of any automated probabilistic process must be to present to the human searcher a set of records sufficiently "rich" in both migraine and the unknown target (magnesium) so that a complementary relationship, if any exists, could be recognized.

In experiments using only title word co-occurrence data, initial attempts to form a sufficiently rich set were not successful (Swanson, 1991, pp. 285-86). Even though the intermediate 200,000 record literature (Figure 3) was based on strong title-word correlation with migraine, there was no significant correlation of the same intermediate literature with magnesium.

The difficulty with a fully automatic probabilistic approach is easily seen if one attempts to design a test for any specified procedure that is thought to be promising. Most operational databases (such as MEDLINE) are far too large to serve as the basis for any such test. But any artificially constructed "test" database would of course have to contain some number of magnesium articles, and the question of how many such articles are introduced into the database is a crucial parameter of test design. The problem is to determine whether the (fully automatic) method under test is able to achieve the requisite level of discrimination, a level that is known in this case because the criterion for a successful outcome is known. As indicated earlier, the size of the magnesium literature identified as complementary to the literature on migraine was sixty-three articles (no doubt some articles that should have been included were never found, but the sixty-three article result was achieved through reasonably diligent searching of the major available online databases, including MEDLINE). The performance goal for any proposed test should thus approximately duplicate an ability to select on the order of 100 articles out of the 5 million or so in the MEDLINE database at the time of the reported study--a discrimination level of 1 record in 50,000. Such a level of discrimination in a relatively homogeneous database (i.e., as contrasted with a multidisciplinary or encyclopedic database) is probably well beyond what any text-based probabilistic process applied either to full text or to titles and abstracts can reasonably be expected to achieve (if such a test were to be conducted, the above argument suggests that the minimal-size test database would have to be at least a few hundred thousand records). Applying human judgment, using the tools of online searching provided by the major database producers and system vendors, leads to a more promising approach (Swanson, 1991, pp. 286-87).
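The discrimination level quoted above follows directly from the two figures in the text, as the sketch below shows. The second half is one possible reading of the remark about minimal test database size: it assumes that a test database should preserve the same 1-in-50,000 ratio while containing a handful of target articles. The helper `min_test_database_size` and the target counts passed to it are hypothetical.

```python
# Figures quoted in the text.
database_size = 5_000_000  # approximate size of MEDLINE at the time of the study
target_articles = 100      # "on the order of 100 articles"

# Discrimination level: one target record per this many database records.
records_per_target = database_size // target_articles
print(f"discrimination level: 1 record in {records_per_target:,}")


def min_test_database_size(targets_in_test: int) -> int:
    """Size of a test database holding `targets_in_test` targets at the same ratio.

    Assumes the test must preserve the 1-in-50,000 discrimination level.
    """
    return targets_in_test * records_per_target


# Hypothetical numbers of target articles planted in a test database.
for targets_in_test in (2, 5, 10):
    size = min_test_database_size(targets_in_test)
    print(f"{targets_in_test} target articles -> {size:,} records")
```

On this reading, even a test containing only a few target articles would call for a database of a few hundred thousand records.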

To restate the problem now in terms of an online search rather than a hypothetical, purely probabilistic process, one begins with a MEDLINE search for migraine literature but cannot specify immediately what to look for next. Several techniques for conducting a search for an unknown target have been previously discussed (Swanson, 1991), the most successful of which took advantage of both title-word co-occurrence data and of strategic guesses concerning categories of possible target literatures. Broad categories of targets offer a more practical basis for conjecture than do thousands of highly specific targets. Four such general categories--physiological deficiencies, dietary factors, poisons, and toxicity--each divided into two or three levels of differing breadths, have been developed and partially tested with some success; these strategies are presented as DIALOG MEDLINE searches in Swanson (1991, Table 3, pp. 286-87). The effect of using one such strategy is illustrated schematically in Figure 4. A set of over 100,000 articles linked to migraine literature by title-word co-occurrence is narrowed down by means of the following DIALOG search based on the MeSH subheading "deficiency" and diet-related title words as indicated:

    select deficiency/de and (diet or dietary or diets or intake or induced or supplement?)/ti

The set formed by this search consisted of 4,400 records. When combined with the 100,000 record intermediate literature, the resulting intersection contained only 200 records (none of which contained the word "migraine"). Within these 200 records, "magnesium" was the most frequent substantive title word, occurring in thirteen titles, about fifty times the frequency with which such titles occurred in the intermediate literatures. In any visual scan of the 200 titles, magnesium would not be likely to escape the notice of anyone looking for clues to possible physiological influences on migraine. The linkages to these thirteen titles are all sufficiently suggestive to stimulate further exploration.
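The narrowing step and the title scan can be sketched as follows. The counts of thirteen titles in 200 records and the fifty-fold ratio are taken from the text; the baseline share is merely what those two figures imply. The record identifiers, titles, stop list, and the function `title_word_counts` are all hypothetical toy material and do not reproduce the DIALOG procedure itself.

```python
import re
from collections import Counter

# Figures quoted in the text.
magnesium_titles = 13
retrieved_records = 200
enrichment_factor = 50  # "about fifty times the frequency"

share_in_retrieved = magnesium_titles / retrieved_records
implied_baseline_share = share_in_retrieved / enrichment_factor
print(f"share in the 200 records: {share_in_retrieved:.3f}")
print(f"implied share in the intermediate literature: {implied_baseline_share:.4f}")

# Hypothetical toy records standing in for the two retrieved sets.
intermediate_literature = {101, 102, 103, 104, 105}
deficiency_diet_search = {102, 103, 104, 105, 106}
titles = {
    102: "Magnesium deficiency in rats",
    103: "Dietary magnesium intake and blood pressure",
    104: "Zinc deficiency induced by diet",
    105: "Magnesium supplement and vascular tone",
    106: "Iron intake in infants",
}
STOPWORDS = {"a", "and", "by", "in", "of", "on", "the"}  # minimal illustrative stop list


def title_word_counts(title_list):
    """Count the number of titles in which each non-stop word occurs."""
    counts = Counter()
    for title in title_list:
        counts.update(set(re.findall(r"[a-z]+", title.lower())) - STOPWORDS)
    return counts


# Intersect the two sets, then rank title words within the intersection.
narrowed = intermediate_literature & deficiency_diet_search
counts = title_word_counts(titles[record_id] for record_id in narrowed)
print(counts.most_common(3))
```

In the toy data, as in the reported search, the most frequent substantive title word in the narrowed set is the candidate that a human scanner would be expected to notice.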

The last step just mentioned might provide a suitable basis for testing automatic methods. Once a downloaded set is thought to be sufficiently rich in records that represent the unknown target (which in this case turned out to be the thirteen magnesium records out of a set of 200 records), a probabilistic match of source (migraine) and target sets to identify the most likely specific target candidates may hold promise.

The objective of the search for an unknown target is to find complementary passages of text that stimulate conjectures concerning target substances that hold promise for solving the problem initially selected. Such conjectures can then be explored in the further searching of MEDLINE. Citation and cocitation patterns finally must be analyzed to determine whether the discovered complementary literatures are in fact disjoint (Swanson, 1989a, 1989b). Online searching alone can be no more than a prelude to an investigation in depth; ultimately, many full-text articles within the source and target literatures must be analyzed, an unavoidably labor intensive but essential prerequisite to any literature intervention.

CONCLUSION

On a per capita basis, the cumulative amount of published information produced by a long established exponentially growing community of researchers is essentially constant, an easily provable statement that may seem at first surprising in the light of prevailing ideas on information inundation. Moreover, the fraction of the total quantity of information that is older than some given number of years (and hence more susceptible to depreciation or obsolescence) is also essentially constant on a per capita basis. However, the fragmentation that inevitably accompanies the growth of science has created an altogether different set of problems--as well as opportunities. Interrelationships among the fragments, unnoticed because of the insularity of specialties, have been shown to harbor previously unknown solutions to authentic scientific problems, and so to hold a potential for rejuvenating knowledge that might otherwise be considered obsolete. The invisible growth of relatedness probably follows a combinatorial law and so may far exceed even the explosive growth rates that have characterized both the scientific community and the mountains of print it produces.
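The first two statements can be made explicit under simple assumptions that are introduced here for illustration and are not spelled out in the text: a research population N(t) that has long been growing exponentially at rate r, and a constant publication rate c per researcher per year. With P(t) denoting cumulative output and T an age threshold in years, a sketch of the argument is:

```latex
\begin{align*}
N(t) &= N_0 e^{rt}, \\
P(t) &= \int_{-\infty}^{t} c\,N(s)\,ds = \frac{c}{r}\,N_0 e^{rt} = \frac{c}{r}\,N(t), \\
\frac{P(t)}{N(t)} &= \frac{c}{r}, \\
\frac{P(t-T)}{P(t)} &= e^{-rT}, \qquad
\frac{P(t-T)}{N(t)} = \frac{c}{r}\,e^{-rT}.
\end{align*}
```

None of the last three quantities depends on t, so under these assumptions both the cumulative output per capita and the portion of it older than T years remain constant as the community grows.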

ACKNOWLEDGMENT

I am grateful to Abraham Bookstein and David Lewis for helpful discussions.