How does one measure the importance of a genome to science? Of course I could give you my subjective list, but I was looking for an objective measurement, one that anyone would have to admit is reasonable. The one I chose – the obvious one, really – is the number of scientific citations that the original genome paper has collected. This measure has a bias towards older papers, because newer papers haven’t yet had time to accumulate as many citations, but all of the papers on the Top 10 Genomes list are at least 6 years old. I will revise this list in the future to accommodate updates in the citation counts.
The other question is how to count citations. After looking at several sources, I chose ISI’s Web of Science citation index. Google Scholar is another option, and I used it as well, but I found that Google is less accurate – it uses a heuristic method to collect citations, and it frequently double-counts references, especially for papers with large numbers of authors. I listed both counts in the Top 10 list, but the ranking follows ISI where there’s a disagreement.
So here they are! The Top 10 Genome Papers include 5 microbes (4 bacteria and 1 archaeon), 3 model organisms, and the two human genome papers right at the top. Not surprisingly, all 10 appear in Nature or Science (5 in each journal). All of the first authors are different, and three of the papers were authored by consortia without a traditional first author. And for those who want to argue about which of the two human papers deserves #1, ISI gives a clear edge to the publicly funded effort, while Google Scholar, curiously, ranks the Celera Genomics effort (which I was part of) well ahead of the public project. My subjective list would have included the malaria genome paper (MJ Gardner et al., Nature 2002), since TB and malaria are among the greatest infectious disease killers of humans, but it came in at #12 by citation count. It's much newer than #9 and #10, though, so I'm betting it will move up in the future. Stay tuned.
[Note that I’ve also created a separate web page for this list.]
Criteria for inclusion: a paper must be the first description of the complete or near-complete genome of a species, and it must describe the DNA sequence as well as the relevant sequencing methods and the biological discoveries revealed by the initial sequencing of the genome. Rankings are based on citation counts, with the ISI Web of Science taking priority over Google Scholar, which is less accurate because it uses heuristic rules to gather citations. Counts from both databases are provided; "Times Cited" is the ISI Web of Science count. Citation counts are current as of December 2008.
1. Initial sequencing and analysis of the human genome
International Human Genome Sequencing Consortium
Nature 409:6822 (15 Feb 2001), 860-921.
Times Cited: 6,416
Google Scholar: 5,542
2. The sequence of the human genome
JC Venter, MD Adams, EW Myers, et al. (274 authors)
Science 291:5507 (16 Feb 2001), 1304-1351.
Times Cited: 4,588
Google Scholar: 6,502 [Note that Google places this paper at #1]
3. The Complete Genome Sequence of Escherichia coli K-12
FR Blattner, G Plunkett, CA Bloch, et al.
Science 277:5331 (5 Sept 1997), 1453-1462.
Times Cited: 3,327
Google Scholar: 3,625
4. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
RD Fleischmann, MD Adams, O White, et al.
Science 269:5223 (28 July 1995), 496-512.
Times Cited: 3,075
Google Scholar: 2,651
5. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence
ST Cole, R Brosch, J Parkhill, et al.
Nature 393:6685 (11 June 1998), 537-544.
Times Cited: 2,858
Google Scholar: 3,163 [Note that Google places this paper at #4]
6. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana
The Arabidopsis Genome Initiative (143 authors)
Nature 408:6814 (14 Dec 2000), 796-815.
Times Cited: 2,689
Google Scholar: 1,728 (Google has real trouble tracking this "group author" name)
7. The genome sequence of Drosophila melanogaster
MD Adams, SE Celniker, RA Holt, et al.
Science 287:5461 (24 Mar 2000), 2185-2195.
Times Cited: 2,632
Google Scholar: 3,002
8. Initial sequencing and comparative analysis of the mouse genome
Mouse Genome Sequencing Consortium
Nature 420:6915 (5 Dec 2002), 520-562.
Times Cited: 2,188
Google Scholar: 1,763
9. The complete genome sequence of the gastric pathogen Helicobacter pylori
JF Tomb, O White, AR Kerlavage, et al.
Nature 388:6642 (7 Aug 1997), 539-547.
Times Cited: 1,960
Google Scholar: 1,325
10. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
CJ Bult, O White, GJ Olsen, et al.
Science 273:5278 (23 Aug 1996), 1058-1073.
Times Cited: 1,811
Google Scholar: 1,425 [Note that Google places this paper at #9]
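For readers who want to check the ISI-versus-Google comparison themselves, here is a minimal Python sketch that re-ranks the ten papers by each citation count and prints every paper whose position differs between the two sources. The counts are copied from the list above (December 2008); the short labels are my own abbreviations, and the ranks are computed only among these ten papers, so they need not match where Google Scholar would place a paper among all genome papers.

```python
# Citation counts from the list above: (label, ISI "Times Cited", Google Scholar).
# Labels are informal abbreviations, not the paper titles.
PAPERS = [
    ("Human (IHGSC)", 6416, 5542),
    ("Human (Venter et al.)", 4588, 6502),
    ("E. coli K-12", 3327, 3625),
    ("H. influenzae Rd", 3075, 2651),
    ("M. tuberculosis", 2858, 3163),
    ("A. thaliana", 2689, 1728),
    ("D. melanogaster", 2632, 3002),
    ("Mouse", 2188, 1763),
    ("H. pylori", 1960, 1325),
    ("M. jannaschii", 1811, 1425),
]


def rank(papers, column):
    """Map each label to its rank (1 = most cited) using the given count column."""
    ordered = sorted(papers, key=lambda p: p[column], reverse=True)
    return {p[0]: i for i, p in enumerate(ordered, start=1)}


isi_rank = rank(PAPERS, 1)     # column 1: ISI Web of Science
google_rank = rank(PAPERS, 2)  # column 2: Google Scholar

# The published list follows ISI; report every place Google Scholar disagrees.
for label, _isi, _google in PAPERS:
    if isi_rank[label] != google_rank[label]:
        print(f"{label}: ISI #{isi_rank[label]}, Google Scholar #{google_rank[label]}")
```

Because the ranking above follows ISI, each printed line is a case where Google Scholar would reorder the list. With these counts, only the E. coli paper holds the same position (#3) in both.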