The Half-Life of Facts
Samuel Arbesman
CHAPTER 2
The Pace of Discovery
WHEN Derek J. de Solla Price arrived1 at Raffles College (now the National University of Singapore) to lecture on applied mathematics in 1947, he did not intend to spearhead an entirely new way of looking at science. But his plans to continue research in physics and mathematics were altered by the construction on the college library. Since Raffles was a small university, the library was actually giving books out to students and faculty to store in their dormitories and apartments while the construction was under way.
Price ended up with a complete set of Philosophical Transactions of the Royal Society of London, a British scientific journal that dates back to 1665. Once home, he stacked the journals chronologically against the walls in his apartment: Each pile was published later than the one before it, and they were all lined up one after another. One day, while idly looking at this large collection of books that the library had foisted upon him, he realized that the heights of the piles of these bound volumes weren’t all the same. But their heights weren’t random either. Instead, he realized, the heights of the volumes fit a specific mathematical shape: an exponential curve. Price’s simple observation was the origin of a sophisticated quantitative theory for how scientific knowledge advances.
• • •
MOST of our everyday lives revolve around linear growth, or changes that can be fit onto a line. When something increases by the same amount each year, when the rate is constant, we get linear growth. When we drive somewhere, and go at the same speed the entire way, a chart showing the distance we’ve traveled over time follows a straight line. And if we have a machine that builds widgets at a constant rate of three per hour, the number of widgets after a given number of hours grows linearly with the number of hours we’re considering.
Due to how easy it is to imagine (and our brains seem particularly well suited to this type of thinking), we often think in terms of linear growth. If the temperature was sixty-five degrees yesterday, and sixty degrees the day before that, it is not surprising if we expect it to be about seventy degrees today.
But there are many examples of change that occur differently. If you were watching when the sun set over the course of a few days early in the summer, it wouldn’t be unreasonable to expect the sunset’s timing to follow a nice linear curve: Each day the sun sets the same number of minutes later than it did the day before. But it turns out that sunsets at a specific location adhere to a sine curve—a wavelike shape that looks like a rope being shaken up and down, a shape that we aren’t particularly intuitive about. During the solstices—the shortest and longest days of the year—we are at the top or bottom of the wave, when the sunset only varies by a small amount each day; during the equinoxes (spring and fall), we are in the steep parts of the wave, and each day the sunset time is many minutes different from the day before. This is far from a curve that we can think about easily.
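The contrast between the flat and steep parts of the wave can be sketched numerically. What follows is a toy model, not an astronomical calculation: it assumes the sunset time follows a pure sine-shaped curve with a hypothetical 90-minute swing around its yearly average, peaking at the June solstice (taken here as day 172 of a 365-day year). The function names and constants are illustrative assumptions.

```python
import math

AMPLITUDE_MIN = 90.0   # assumed swing around the average sunset time, in minutes
SOLSTICE_DAY = 172     # assumed day of the year of the June solstice
YEAR_DAYS = 365

def sunset_offset(day):
    """Minutes later than the average sunset on a given day (toy sine model)."""
    phase = 2 * math.pi * (day - SOLSTICE_DAY) / YEAR_DAYS
    return AMPLITUDE_MIN * math.cos(phase)

def daily_change(day):
    """How many minutes the sunset shifts from this day to the next."""
    return sunset_offset(day + 1) - sunset_offset(day)

# A quarter of a year before the solstice is roughly the March equinox.
equinox_day = SOLSTICE_DAY - YEAR_DAYS // 4

print(abs(daily_change(SOLSTICE_DAY)))  # top of the wave: a tiny fraction of a minute
print(abs(daily_change(equinox_day)))   # steep part of the wave: more than a minute
```

In this model the day-to-day shift near the equinox is roughly a hundred times larger than the shift near the solstice, even though both come from the same smooth curve.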
We are just as ill suited when it comes to noticing the many changes that adhere to exponential growth. When we encounter exponential curves all around us, we often don’t think about them this way at all, because they are harder to picture. Exponential growth is when something increases by the same fraction or percentage, rather than the same amount, each second or minute or hour. If bacteria double every hour, that’s exponential growth, because they’re growing at a constant rate of 100 percent an hour. Compound interest is the same sort of thing: If our money grows by a certain percentage each year, we can describe this growth by an exponential curve.
Figure 1. Linear (black), exponential (gray), and sine (dotted) curves.
As you might have realized, exponential growth is very rapid. Even if we are initially adding only a small amount to some quantity each hour or day, that quantity can become very big very quickly. Imagine we are given a penny and begin doubling it each day. After a week we would be receiving less than $1.50 a day. But give it one more week. Now we’re getting more than $80 a day. Within a month our allowance is more than $5 million a day!
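The penny arithmetic is easy to check directly. The sketch below assumes the first penny arrives on day 1 and the amount doubles every day after that; the function name is made up for illustration.

```python
def daily_allowance(day, start_cents=1):
    """Allowance in dollars on a given day, doubling daily from one penny on day 1."""
    cents = start_cents * 2 ** (day - 1)   # one doubling for each day after the first
    return cents / 100

# One week, two weeks, and a month after the first penny
for day in (8, 15, 30):
    print(f"day {day}: ${daily_allowance(day):,.2f}")
# day 8: $1.28
# day 15: $163.84
# day 30: $5,368,709.12
```

Counting from day 7 or day 14 instead halves the first two figures, which still leaves them under $1.50 and over $80.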
Exponential growth gets its name from the use of an exponent: an exponent signifies how many times to multiply another number, the base, by itself. Many times a special constant is used for the base; in the case of exponential growth it is often e. Also known as Napier’s constant, it is about 2.72. It’s one of those numbers, like π, that crops up in the weirdest situations, from bacteria doubling to infinitely long sums of numbers. The exponent part of the equation includes what is known as its rate of growth. The larger this value, the faster the quantity grows, and the faster it doubles.
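In symbols, this description can be written as follows. The notation is introduced here for illustration and is not the author's: $N_0$ is the starting quantity, $r$ the rate of growth, $t$ the elapsed time, and $T_2$ the doubling time.

```latex
N(t) = N_0\, e^{r t},
\qquad
N(t + T_2) = 2\,N(t)
\;\Longrightarrow\; e^{r T_2} = 2
\;\Longrightarrow\; T_2 = \frac{\ln 2}{r} \approx \frac{0.693}{r}
```

Because $T_2$ is inversely proportional to $r$, a larger rate of growth means a shorter doubling time.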
• • •
THE exponential growth curve was well-known to Price, so when he began to measure the heights of his stacks of journals, he knew immediately what was going on. But maybe he just happened to have gotten the only stack of journals that obeyed this curious pattern. So he began collecting lots of data, a research style that he followed throughout his life.
He measured the number of journal articles in the physics literature in general, as well as for more specialized fields, such as the subfield that deals with linear algebra. And they all seemed to have elements of the exponential curve. Price began to recognize that this could be a new way to think about how science grows and develops. Price published his findings2, under the title “Quantitative Measures of the Development of Science,” in a small French journal in 1951, after presenting this work at a conference the previous year in Amsterdam.
No one was interested.
But Price wasn’t deterred. He returned to Cambridge and continued to pursue his research in this new field, the quantitative study of science, or scientometrics, as it soon became known. This science of science was still quite young, but Price set himself to collecting vast quantities of data to help him understand how science changes.
By the 1960s, he was the foremost authority in this field. He gathered data from all aspects of science and marshaled evidence that enabled him to look at scientific growth as something far from haphazard; this knowledge was subject to regular laws.
Expanding on his initial research on scientific journals, he gathered data for a wide variety of areas that displayed this growth, from chemistry to astronomy. Price calculated the doubling times—how long it takes for something to double, a proportional increase that implies exponential growth—for these components of science and technology, which then can be used as a rough metric for seeing how different types of facts change over time. Here is a selection of these doubling times from his 1963 book Little Science, Big Science:3
| Domain | Doubling Time (in years) |
| --- | --- |
| Number of entries in a dictionary of national biography | 100 |
| Number of universities | 50 |
| Number of important discoveries; number of chemical elements known; accuracy of instruments | 20 |
| Number of scientific journals; number of chemical compounds known; memberships of scientific institutes | 15 |
| Number of asteroids known; number of engineers in the United States | 10 |
The growth of facts was finally beginning to be subjected to the rigors of mathematics.
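A doubling time can be translated into more familiar terms, such as a yearly percentage increase or the total growth over a century. The sketch below assumes perfectly steady exponential growth, which real data only approximate; the function names are illustrative, and the doubling times are the ones Price reports.

```python
def annual_growth_percent(doubling_time_years):
    """Constant yearly percentage increase implied by a given doubling time."""
    return (2 ** (1 / doubling_time_years) - 1) * 100

def growth_factor(doubling_time_years, span_years):
    """How many times larger a quantity becomes over a span of years."""
    return 2 ** (span_years / doubling_time_years)

# Doubling times (in years) taken from Price's figures
domains = [
    ("entries in a dictionary of national biography", 100),
    ("universities", 50),
    ("scientific journals", 15),
    ("asteroids known", 10),
]

for label, years in domains:
    rate = annual_growth_percent(years)
    century = growth_factor(years, 100)
    print(f"{label}: about {rate:.1f}% a year, x{century:,.0f} per century")
```

A ten-year doubling time corresponds to roughly 7 percent growth a year, which compounds to about a thousandfold increase over a century. A hundred-year doubling time yields only a doubling in the same span.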
• • •
PARALLEL to Price’s work in the hard sciences, a similar line of research was proceeding in the social sciences. In 1947, a psychologist named Harvey Lehman published a curious little paper4 in the journal Social Forces. Combing through a wide variety of dictionaries, encyclopedias, and chronologies, Lehman set out to count the number of major contributions made in many areas of study over the years. He looked at everything from genetics and math to the arts, whether new scientific findings, new theorems, or even new operas produced. What he found in all of these were exponential increases in output over time. But this wasn’t only over the previous few decades. Lehman looked at each of these areas over hundreds of years. He examined philosophy over the six hundred years from 1275 to 1875, botany over the three hundred years from 1600 to 1900, and geology over the four hundred years from 1500 to 1900.
Each area was found to have a characteristic rate of increase. Here are doubling times (the number of years it takes for the yearly contributions in these fields to double) from Lehman’s findings, along with a few more recent areas examined:5
| Field | Doubling Time (in years) |
| --- | --- |
| Medicine and hygiene | 87 |
| Philosophy | 77 |
| Mathematics | 63 |
| Geology | 46 |
| Entomology | 39 |
| Chemistry | 35 |
| Genetics | 32 |
| Grand opera | 20 |
Independently, a number of thinkers were coming to the realization that the growth of knowledge was subject to patterns, and was far from random. Similarly, different types of growth fit different types of knowledge creation. For example, opera is a far faster-changing domain than the sciences. Even though science and opera composition are inherently creative, science is limited by what we can determine about nature. Science can develop only as quickly as we can figure out things about the world. Grand opera, however, is not limited by what is true, only by what is beautiful, and should therefore be able to grow more rapidly, since it doesn’t have to be rigorously subjected to experimentation.
In addition, we can see a hint of how more fundamental discoveries grow by comparing them to ones that are more dependent on other areas, which build on work done in other fields. For example, genetics and chemistry, two areas of the basic sciences, proceed at similar rates. On the other hand, medicine and hygiene are much slower, and are also areas that rely on these more basic fields for new discoveries. Perhaps this is a hint that more derivative fields move more slowly compared to the more basic areas of knowledge on which they depend.
Price’s and Lehman’s efforts showed that looking at how knowledge grows in a systematic way was finally possible, and they unleashed a wave of discoveries.
• • •
PRICE’S approach, looking at how science progresses by examining scientific articles and their properties, has proven to be the most successful and fastest-growing area of scientometrics. While scientific progress isn’t necessarily correlated with a single publication—some papers might have multiple discoveries, and others might simply be confirming something we already know—it is often a good unit of study.
Focusing on the scientific paper gives us many pieces of data to measure and study. We can look at the title and text and, using sophisticated algorithms from computational linguistics or text mining, determine the subject area. We can look at the authors themselves and create a web illustrating the interactions between scientists who write papers together. We can examine the affiliations of each of the authors and try to see which collaborations between individuals at different institutions are more effective. And we can comb through the papers’ citations, in order to get a sense of the research a paper is building upon.
Examining science at the level of the publication can give us all manner of exciting results. A group of researchers at Harvard Medical School6 looked at tens of thousands of articles published by its scientists and mapped out the buildings on campus where they worked. Through this, they were able to look at the effect that distance has on collaboration. They found exactly what they had assumed but no one had actually measured: The closer two people are, the higher the impact of the research that results from that collaboration. They found that just being in the same building as your collaborators makes your work better.
We can also understand the impact of papers and the results within them by measuring how many other publications cite them. The more important a work is, the more likely it is to be referenced in many other papers, implying that it has had a certain foundational impact on the work that comes after it. While this is certainly an imperfect measure—you can cite a paper even if you disagree with it—much of the field of scientometrics is devoted to understanding the relationship between citations, scientific impact, and the importance of different scientists.
Using this sort of approach, scientometrics can even determine what types of teams yield research that has the highest impact. For example, a group of researchers at Northwestern University7 found that high-impact results are more likely to come from collaborative teams rather than from a single scientist. In other words, the days of the lone hero scientist, along the lines of an Einstein, are vanishing, and you can measure it.
Citations can also be used as building blocks for other metrics. By examining the average number of times articles in a given journal are cited, we can get what is known as the impact factor. This is widely used and carefully considered: Scientists want their papers to be published in journals with high impact factors, as it is good for their research and influences decisions such as funding and tenure. The journals with the highest impact factors have even penetrated the public consciousness (no doubt due to the highly cited individual papers within them) and include the general science publications such as Nature and Science, as well as high-profile medical journals such as the New England Journal of Medicine.
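The averaging idea behind the impact factor can be sketched in a few lines. This is a simplification under stated assumptions: published impact factors usually restrict the count to a fixed window of recent years, and the citation counts below are invented for a made-up journal.

```python
def average_citations(citation_counts):
    """Mean number of citations per article for one journal."""
    if not citation_counts:
        raise ValueError("need at least one article")
    return sum(citation_counts) / len(citation_counts)

# Hypothetical citation counts for five articles in one made-up journal
journal_articles = [0, 2, 3, 5, 40]

print(average_citations(journal_articles))      # 10.0
print(average_citations(journal_articles[:4]))  # 2.5 without the most cited article
```

A single heavily cited article lifts the average from 2.5 to 10, which is one way a handful of famous papers can raise a whole journal's standing.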
Scientometrics has even given bragging tools to scientists, such as the h-index, which measures the impact of a scientist’s work on other researchers. It was created by Jorge Hirsch8 (and named after himself; notice the h) and essentially counts the number of articles a scientist has published that have been cited at least that many times. If you have an h-index value of 45, it means that you have forty-five articles that have each been cited at least forty-five times (though you have likely published many more articles that have been cited fewer times). It also has the side benefit of meaning that you are statistically more likely to be a fellow of the National Academy of Sciences, a prestigious U.S. scientific organization.
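The counting rule is simple enough to write down. The sketch below is one straightforward way to compute it, not necessarily how any particular database does so, and the citation counts are invented for illustration.

```python
def h_index(citation_counts):
    """Largest h such that h articles have each been cited at least h times."""
    ranked = sorted(citation_counts, reverse=True)   # most cited article first
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank      # the top `rank` articles all have at least `rank` citations
        else:
            break         # every later article has even fewer citations
    return h

# Hypothetical citation counts for two scientists with five articles each
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

The second scientist has the single most cited article but the lower h-index, because the measure rewards a sustained body of cited work over one standout paper.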
It shouldn’t be surprising that the field of scientometrics has simply exploded in the past half century. While Price and his colleagues labored by hand, tabulating citations manually and depending on teams of graduate students to do much of the thankless grunt work, we now have massive databases and computers that can take a difficult analysis project and do it much more easily. For example, the h-index is now calculated automatically by many scientific databases (including Google Scholar), something inconceivable in previous decades. Due to this capability, we now have scientometric results about nearly every aspect of how science is done. As we spend billions of dollars annually on research, and count on science to do such things as cure cancer and master space travel, we have the tools to begin to see what sorts of research actually work.
Scientometrics can demonstrate the relationship between money and research output. The National Science Foundation has examined how much money9 a university spends relative to how many articles its scientists publish. Other studies have looked at how age is related to science. For example, over the past decades, the age at which scientists receive grants from the National Institutes of Health has increased, causing a certain amount of concern among younger scientists.
There’s even research that examines how being a mensch is related to scientific productivity. For example, in the 1960s, Harriet Zuckerman, a sociologist of science (someone who studies the interactions and people underlying the entire scientific venture), decided to study the scientific output of Nobel laureates10 to see if any patterns could be seen in how they work that might distinguish them from their less successful peers. One striking finding was the beneficence of Nobel laureates, or as Zuckerman termed it, noblesse oblige. In general, when a scientific paper is published, the author who did the most is listed first. There are exceptions to this, and this can vary from field to field, but Zuckerman took it as a useful rule of thumb. What she found was that Nobel laureates are first authors of numerous publications early in their careers, but quickly begin to give their junior colleagues first authorship. And this happens far before they receive the Nobel Prize.
As one generous Nobel laureate in chemistry put it: “It helps a young man to be senior author, first author, and doesn’t detract from the credit that I get if my name is farther down the list.” On the other hand, those peers of Nobel laureates who were not as successful tried to maintain first authorship for themselves far more often, garnering more glory for themselves. By their forties, Nobel laureates are first authors on only 26 percent of their papers, as compared to their less accomplished contemporaries, who are first authors 56 percent of the time. Nicer people are indeed more creative, more successful, and even more likely to win Nobel prizes.
These regular patterns of scientists seem evident enough, at least when we look at whole populations of researchers. But what of regularities related to knowledge itself and how it’s created? To understand this, we need to begin thinking about asteroids.