I wrote a short script that searched for each number between 11 and 1000 on the search engine Alta Vista. (For some reason it was not possible to search for the numbers 1-10.) For each number I recorded the number of hits Alta Vista reported, and the results included a few surprises. I got the following diagram.

It was perhaps not so surprising that lower numbers are more popular than higher numbers, or that multiples of five, ten, twenty-five and a hundred were vastly overrepresented. What I had not expected, however, was that at each round hundred the rate of occurrence jumped up and then fell within the hundred, only to jump up again at the next. This effect can be explained by Benford's law, as pointed out to me by Golan Levin.

What also surprised me at first was the distinct difference between the groups of numbers 11-31, 32-60 and 61-94, something that is easier to see in a zoomed-in view of the numbers 11-100. I now believe this is explained by date and time strings, where days are in the range 1-31 and seconds and minutes are within 0-59. (Credit to Mats Wicksell for this suggestion.)

Then there is the issue of individual numbers that are overrepresented. As was seen above, numbers divisible by 5 are in general overrepresented. I therefore filtered all of those out, to make it easier to identify other overrepresented numbers. What I got was the following diagram, where I have annotated some of the numbers that stand out. The spikes in the diagram correspond to the overrepresented numbers.

Some of those I can understand. The powers of two 64, 128, 256 and 512 turn up in most computer-related situations. The number 404 is the code for the ubiquitous error message "Page not found", and 877 and 888 are area codes for toll-free numbers in the US (which also explains why 800 is the most common round hundred after 100). Some have commercial roots, like the CPUs 386 and 486 and Levi's 501 jeans.
Windows 95/98 probably contributes to the large number of hits in the range 95-99 (even more than for 100), but mostly I think those numbers turn up in dates; the 95-99 peak should hence reflect the age and growth of the Internet. (A better view of this is in the second figure above.) Some numbers look funny, like 333 and 999, and are maybe common because of that. (But why not 444 and other similar numbers?) Others are totally puzzling to me. Why, for instance, are 152, 163, 301, 541, 624, 672, 703 and 972 overrepresented? If anyone has an idea, I would be happy to know. I had a hunch that some pop culture numbers like 187 (California police code for homicide), 242 (Front 242, and the UN resolution) and 666 (number of the beast) would be common, but this turned out to be wrong.
Zoomed-in views of this last filtered diagram are here:
|Numbers 11-100|Numbers 101-200|Numbers 201-300|Numbers 301-400|Numbers 401-500|
|Numbers 501-600|Numbers 601-700|Numbers 701-800|Numbers 801-900|Numbers 901-1000|
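The filtering step described above (dropping every number divisible by 5 and then looking for spikes) could be sketched roughly as follows. This is not the script I actually used; the function names, the neighbourhood size `window`, the threshold `factor` and the hit counts are all made up for illustration.

```python
from statistics import median


def drop_multiples_of_five(hits):
    """Return a copy of hits (number -> hit count) without numbers divisible by 5."""
    return {n: h for n, h in hits.items() if n % 5 != 0}


def find_spikes(hits, window=4, factor=1.5):
    """Flag numbers whose hit count exceeds `factor` times the median of
    up to `window` neighbours on each side.

    Both `window` and `factor` are arbitrary, illustrative choices.
    """
    numbers = sorted(hits)
    spikes = []
    for i, n in enumerate(numbers):
        neighbours = numbers[max(0, i - window):i] + numbers[i + 1:i + 1 + window]
        if not neighbours:
            continue  # a single data point has nothing to be compared with
        baseline = median(hits[m] for m in neighbours)
        if hits[n] > factor * baseline:
            spikes.append(n)
    return spikes


# Made-up hit counts, for illustration only (not the Alta Vista data).
example_hits = {n: 1000 for n in range(396, 411)}
example_hits[400] = 9000  # a round hundred, removed by the filter
example_hits[404] = 5000  # stands in for a spike such as 404

filtered = drop_multiples_of_five(example_hits)
print(find_spikes(filtered))  # prints [404]
```

Comparing against the median of the neighbours, rather than their mean, keeps one spike from hiding the spike next to it.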
Finally, the raw data is available here.
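For anyone who wants to look at the Benford's law effect mentioned earlier, one possible check is to add up the hits per leading digit and compare the shares with what Benford's law predicts, namely log10(1 + 1/d) for leading digit d. The sketch below is only an assumption about how such a check might look; the function names are hypothetical, and the hit counts are generated to fall off as 1/n rather than taken from the raw data.

```python
import math
from collections import defaultdict


def benford_share(digit):
    """Expected share of leading digit `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def leading_digit_shares(hits):
    """Share of all hits that belongs to each leading digit of the searched number."""
    totals = defaultdict(int)
    for n, h in hits.items():
        totals[int(str(n)[0])] += h
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {d: totals[d] / grand_total for d in sorted(totals)}


# Made-up hit counts that decay as 1/n, for illustration only.
# The range 100-999 is used so that the leading digit is the hundreds digit.
example_hits = {n: 1_000_000 // n for n in range(100, 1000)}

observed = leading_digit_shares(example_hits)
for d in range(1, 10):
    # leading digit, share in the made-up data, share predicted by Benford's law
    print(d, round(observed[d], 3), round(benford_share(d), 3))
```

With counts that fall off as 1/n, the two columns come out close to each other. Whether the real hit counts behave the same way is something the raw data would have to show.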
The experiments above were carried out in September 2000. I was recently made aware of a very similar but more ambitious website, The Secret Lives of Numbers, by Golan Levin et al., launched in 2002 and based on data collected as early as 1997.