Human Genome II


The Human Genome Project II - The Private Effort
October 25, 2007


"Craig Venter, simply put, is a self-proclaimed maverick, and never thinks inside the box." Kristen Philipkoski, Wired Magazine, Feb 2001

Craig Venter's FOUR pioneering scientific contributions are listed below. "All were achieved because of his skill in spotting the gains that could be achieved from the new DNA sequencing technologies made possible by Applied Biosystems / PE Biosystems. All three were initially met with strong resistance from the scientific community, but are now considered to be among the fastest, most accurate ways to analyze the genome of any organism" [quote from NYTimes, now archived].



(1) 1984 - 1991: New method for sequencing genes: ESTs
(2) 1992 - 1998: Whole Genome Shotgun Sequencing
(3) 1998 - 2002: The Human Genome
(4) 2002-Present: Metagenomics

 


(1) 1984 - 1991: New method for sequencing genes: ESTs

  • J. Craig Venter began working as a molecular biologist at the NIH's National Institute of Neurological Disorders and Stroke (NINDS) in 1984.

  • In June 1991, "Venter, who by then ran a large sequencing lab, went public with an iconoclastic plan: Why not focus on finding the genes--the 'real goods' that both scientists and companies were clamoring for--and leave tedious sequencing until later?" He had just found a way to quickly identify portions of expressed genes, which he called ESTs, or Expressed Sequence Tags.

  • What Are ESTs and How Are They Made?

    ESTs are made by isolating mRNA from a tissue of interest (each mRNA represents one gene's worth of activity) and making a cDNA copy of each RNA using reverse transcriptase. The cDNAs are partially sequenced (200 to 500 nt) to create a unique 'tag' for each cDNA (one sequencing gel's worth of information). Each EST represents one gene turned on in a cell at a given time, and the approach avoids all the labor-intensive steps of cloning and carefully sequencing a full-length gene. These tags can be used to fish a gene out of a stretch of chromosomal DNA by matching base pairs.

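  • 5' EST = read from the 5' end of the cDNA, which is more likely to include protein-coding sequence; tends to be conserved across species / gene families.
  • 3' EST = read from the 3' end of the cDNA, which usually falls in the non-coding 3' untranslated region (UTR); tends to be less conserved, and therefore more specific to a single gene.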
  • What can they be used to DO? Identify new genes and map genes to human chromosomes!
  • At about this time, Venter's lab became the first test site for Applied Biosystems' first generation of automated DNA sequencers....
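To make the EST idea concrete, here is a toy Python sketch of the steps described above: copy an mRNA into cDNA, then read a short tag from each end. It is a simplification under stated assumptions, not a lab protocol: sequences are error-free strings, the example mRNA is made up, and the function names (`reverse_complement`, `reverse_transcribe`, `make_ests`) are hypothetical. Real ESTs are single-pass reads of 200 to 500 nt.

```python
# Toy model of EST generation (hypothetical helper names, made-up sequence).
# Assumes error-free reads; each EST is just the first `read_len` bases
# read from one end of a full-length cDNA.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA: the DNA copy complementary to the mRNA."""
    as_dna = mrna.upper().replace("U", "T")  # uracil pairs like thymine
    return reverse_complement(as_dna)


def make_ests(mrna: str, read_len: int) -> tuple[str, str]:
    """Return (5' EST, 3' EST) partial reads from one cDNA clone."""
    first_strand = reverse_transcribe(mrna)
    sense_strand = reverse_complement(first_strand)  # same sequence as the mRNA
    est_5prime = sense_strand[:read_len]   # starts at the 5' end (coding side)
    est_3prime = first_strand[:read_len]   # starts at the poly(A) / 3' UTR end
    return est_5prime, est_3prime


if __name__ == "__main__":
    toy_mrna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGAAAAAAAA"
    five_prime, three_prime = make_ests(toy_mrna, read_len=12)
    print("5' EST:", five_prime)
    print("3' EST:", three_prime)
```

In this toy run the 3' tag starts with a run of T's copied from the poly(A) tail, a reminder that 3' ESTs sit at the untranslated end of the message rather than in the coding sequence.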

However, Venter's technique was NOT well received in the scientific community (particularly by James Watson), because it was "cream-skimming" rather than fully sequencing genes, and because no biological function had been assigned to the vast majority of these genes.

 

By 1991, Venter had identified thousands of ESTs / cDNAs (thousands of genes), at a time when the sequences of only a few dozen genes were known. He initiated a pilot project using the new automated DNA sequencers:

 

(1) Complementary DNA sequencing: expressed sequence tags and human genome project. Adams, M.D., J.M. Kelley, J.D. Gocayne, M. Dubnick, M.H. Polymeropoulos, H. Xiao, C. R. Merril, A. Wu, B. Olde, R.F. Moreno, A.R. Kerlavage, W.R. McCombie, and J.C. Venter. 1991. Science 252: 1651–1656. Note that in PubMed, only the first 10 authors are listed! : )

  • Automated partial DNA sequencing of more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs).
  • Of those 600, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor.
  • 46 ESTs were mapped to chromosomes after PCR amplification
  • "This fast approach to cDNA characterization will facilitate the tagging of
    • most human genes
    • in a few years
    • at a fraction of the cost of complete genomic sequencing,
    • provide new genetic markers, and
    • serve as a resource in diverse biological research fields."

When Craig Venter and the NIH, under then-director Bernadine Healy, decided to file patent applications on these ESTs, the scientific outcry was LOUD, particularly from James Watson, then director of the HGP. A Senate hearing was called to investigate the idea of patenting human ESTs.

 

Watson denounced the idea to the US Senate as "sheer lunacy", noting that "virtually any monkey" could do what Venter's group was doing (Science, 11 October 1991, p. 184). Venter's NIH grant proposal using EST technology was rejected. (See "Controversial from the Start" for more great quotes.)

 

From "The Turning Point in Genome Research": "Gradually though, it became apparent to most biologists that ESTs were indeed quite useful":

  • for the discovery of novel mRNAs = new human genes expressed in tissues,
  • for discovering novel members of gene families involved in human disease
  • for the identification of exons in vast 'deserts' of genomic / intronic DNA
  • as a plentiful source of gene-based mapping reagents with which to populate physical maps
  • as a valuable, low-priced, and easily accessible biological reagent"!
Source: Boguski MS. The turning point in genome research. Trends Biochem Sci. 1995 Aug;20(8):295-6.
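The "fish a gene out by matching base pairs" idea, and the exon-finding use listed above, can be sketched as exact-match seeding: index every k-letter word of a genomic stretch, look up words from the EST, and extend each hit as far as the two sequences agree. This is a simplified illustration, not how production tools work (real EST-to-genome aligners tolerate mismatches and use splice-site signals); the function names, the k = 8 default, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict:
    """Map every k-letter word in `genome` to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_exon_blocks(est: str, genome: str, k: int = 8) -> list:
    """Locate an EST in genomic DNA as exact-match blocks.

    Returns (est_start, genome_start, length) tuples. Blocks must appear in
    the same order in both sequences; the genomic stretches skipped between
    consecutive blocks are candidate introns.
    """
    index = build_kmer_index(genome, k)
    blocks = []
    i = 0
    while i <= len(est) - k:
        # Only accept hits downstream of the previous block (collinearity).
        min_pos = blocks[-1][1] + blocks[-1][2] if blocks else 0
        hits = [g for g in index.get(est[i:i + k], []) if g >= min_pos]
        if not hits:
            i += 1
            continue
        g = hits[0]
        length = k
        # Extend the seed base by base while EST and genome still agree.
        while (i + length < len(est) and g + length < len(genome)
               and est[i + length] == genome[g + length]):
            length += 1
        blocks.append((i, g, length))
        i += length
    return blocks


def introns_between(blocks: list) -> list:
    """Genomic (start, end) gaps between consecutive blocks, half-open."""
    return [(a[1] + a[2], b[1]) for a, b in zip(blocks, blocks[1:])]


if __name__ == "__main__":
    exon1 = "ATGGCGTACCTGAAGCA"
    intron = "GTAAGTTCTTTCTTTTCAG"
    exon2 = "CTTGGAGCCACGTATGTAA"
    genome = "TTTTT" + exon1 + intron + exon2 + "AAAAA"
    blocks = find_exon_blocks(exon1 + exon2, genome)
    print(blocks)
    print(introns_between(blocks))
```

The EST is the spliced message, so it matches the genome in two separate blocks; the stretch of genomic DNA skipped between them is the intron, exactly the kind of exon/intron boundary ESTs helped locate.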

ESTs became a standard gene-finding tool, and dbEST (Nature Genetics 4:332-3; 1993) was set up within GenBank to hold the vast and growing volume of EST data generated by sequencing projects for many species, including human.

 

How many ESTs currently exist in dbEST? (dbEST Summary by Organism)
  • October 6, 2006: 38,953,178 public entries
  • October 19, 2007: 46,485,786 public entries
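A quick back-of-the-envelope calculation with the two dbEST counts above shows how fast the database was growing. Only the two counts and their dates come from the text; the variable names are ours.

```python
from datetime import date

# dbEST public entry counts quoted above
entries_2006 = 38_953_178   # October 6, 2006
entries_2007 = 46_485_786   # October 19, 2007

added = entries_2007 - entries_2006
days = (date(2007, 10, 19) - date(2006, 10, 6)).days
growth_pct = 100 * added / entries_2006

print(f"{added:,} new ESTs in {days} days")
print(f"about {added / days:,.0f} ESTs per day ({growth_pct:.1f}% growth)")
```

That works out to roughly 7.5 million new entries in a little over a year, or nearly 20,000 ESTs a day.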


[Images: J. Craig Venter, "The Genome Warrior"; Michael Hunkapiller; Perkin Elmer; Applied Biosystems; Celera Genomics]

Two more amazing papers:

 

(2) Sequence identification of 2,375 human brain genes. Nature. 1992 Feb 13;355(6361):632-4. Last line of the abstract: "These data represent an approximate doubling of the number of human genes identified by DNA sequencing and may represent as many as 5% of the genes in the human genome." Or, as of October 2004, make that 10% of the human genome!
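Why do the same 2,375 genes go from 5% to 10%? Only the denominator changed. In 1992, "5%" implied a total of roughly 47,500 human genes; the analysis of the finished human genome published in October 2004 estimated only 20,000 to 25,000 protein-coding genes. The arithmetic, as a small Python sketch (variable names are ours):

```python
genes_identified = 2_375

# 1992: "as many as 5% of the genes" implies this total gene count
implied_total_1992 = genes_identified * 100 / 5

# October 2004 estimate: 20,000 to 25,000 protein-coding genes
low_estimate, high_estimate = 20_000, 25_000
share_min = 100 * genes_identified / high_estimate  # percent if there are more genes
share_max = 100 * genes_identified / low_estimate   # percent if there are fewer genes

print(f"1992 implied total: about {implied_total_1992:,.0f} genes")
print(f"2004 share: {share_min:.1f}% to {share_max:.1f}%")
```

So "make that 10%" is a fair round number for a share that falls somewhere between about 9.5% and 12%.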


(3) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Adams, M.D., et al. and J.C. Venter. Nature. 1995 Sep 28;377:174.

 

To end this era (largely due to the Great EST Dispute): "Watson also went to war on this issue with his boss, NIH Director Bernadine Healy. The fight cost him his job. In April 1992 he returned to Cold Spring Harbor Laboratory, muttering that no one could work with that woman (Science, 17 April 1992, p. 301). Craig Venter and his wife Claire Fraser also left NIH when he was offered $70 million from a venture capital company to try out his gene identification strategy at a new nonprofit, The Institute for Genomic Research (TIGR)."


(2) 1992 - 1998: The Formation of TIGR and HGSI; New method for sequencing genomes: Whole Genome Shotgun Sequencing

 

Just up Rockville Pike, Venter set up two organizations:

  • TIGR, The Institute for Genomic Research, Rockville, Md. (not-for-profit), with Claire Fraser
  • HGSI, Human Genome Sciences, Inc., a sister company (FOR-profit) with William Haseltine, which would commercialize the work of TIGR
  • Formation of the two organizations was financed by SmithKline Beecham ($125 million) and venture capitalist Wallace Steinberg.

Under the terms of the deal, HGSI promised to fund TIGR with $70 million over ten years in exchange for marketing rights to TIGR's discoveries. The two organizations eventually split after the death of Wallace Steinberg, due to irreconcilable differences in business strategy as well as between the personalities of Venter and Haseltine.

A major milestone in this period: In 1995 Venter, Fraser et al. at TIGR used a new technique developed by Venter and colleagues, Whole Genome Shotgun Sequencing, to sequence the first complete genome of a free-living organism, Haemophilus influenzae (Fleischmann, R.D., et al., including J.C. Venter).

  • "The New York Times said the genome era began with the announcement on May 25, 1995, that TIGR scientists had decoded the first genome of Haemophilus influenzae. It was the first time the entire genome of a free-living organism (in contrast to a virus that must rely on its host to survive) had been deciphered."

  • Whole-genome random sequencing and assembly of Haemophilus influenzae. Science (1995 Jul 28) 269:496 (great article about this!) = first genome of a free-living organism sequenced!

  • PE Biosystems developed an automated sequencing machine just for this technique. Ironically, Venter had applied for an NIH grant to sequence Haemophilus, but the application was rejected a year later by a panel of academic genome scientists who declared the shotgun method "unworkable". One month after the NIH rejection, Venter et al. published the Science paper describing the first complete sequence of a bacterium by whole genome shotgun sequencing...

  • The minimal gene complement of Mycoplasma genitalium Science. 1995 Oct 20;270(5235):397-403: They followed up later in 1995 by completing the genome of Mycoplasma genitalium, which has the smallest genome of any known free-living organism.
  • A new strategy for genome sequencing, 30 May 1996 Nature 381:364-366 [PDF] "Existing approaches to sequencing the human genome are based on the assumption that each region to be sequenced must first be mapped. But there is a simpler strategy in which any number of laboratories can cooperate."

 

Since 1995, TIGR, with Claire Fraser as CEO, has sequenced well over 100 whole genomes of various bacteria and pathogens. Today, a bacterial genome can be sequenced by whole genome shotgun sequencing in about....a DAY.

 

Venter was recently quoted as saying: "While the NIH is not very good at funding new ideas, once an idea is established they are extremely good," noting the profusion of the institutes' money now devoted to decoding other bacterial genomes.

 

To end this era, a press release: PERKIN-ELMER, DR. CRAIG VENTER, AND TIGR ANNOUNCE FORMATION OF NEW GENOMICS COMPANY

-- Plan to Sequence Human Genome Within Three Years -- May 9, 1998

 


(3) 1998 - 2002: The Formation of Celera Genomics; New method for large-scale sequencing projects:


In May 1998, Venter, in collaboration with Michael Hunkapiller at PE Biosystems (aka Perkin Elmer / Applied Biosystems / Applera), formed Celera Genomics. Goal: sequence the entire human genome by December 31, 2001, 2 years before the HGP's planned completion, and for a mere $300 million (but its data release policy would not follow the Bermuda principles).

 

The company is a massive genomics sequencing facility with computing power ranking only slightly below that of the Pentagon and a few other large supercomputer facilities. Venter calls the plan a "mutually rewarding partnership between public and private institutions." (Celera is from the Latin, meaning speed.)

  • The Plan: "Shotgun Sequencing of the Human Genome," Science. 1998 Jun 5;280(5369):1540-2. J. Craig Venter, Mark D. Adams, Granger G. Sutton, Anthony R. Kerlavage, Hamilton O. Smith and Michael Hunkapiller

  • The Response: October 1998: New Goals for the U.S. Human Genome Project: 1998-2003. Science 1998 282: 682-689. Francis S. Collins, Ari Patrinos et al. Generating a "working draft" of the human genome DNA sequence by 2001. This was an ambitious goal, given that at this point (1998) less than 10% of the human genome had been sequenced...PS. "It's NOT a race"

  • May 1999: Before getting started on the human genome, Celera tried out a test run on a higher organism - Drosophila melanogaster, the fruit fly. IF it could be done in fruit flies, then maybe it could be done with humans! Critics predicted that (a) the full sequence could not be deciphered and (b) Celera would not release the sequence to the public; neither proved to be the case. Just 12 weeks later, in September 1999, Celera announced completion of the Drosophila genome sequence (with Gerry Rubin et al. of Berkeley / Howard Hughes Medical Institute), and immediately began sequencing the human genome. Science reports the full scoop.

  • April 6, 2000: "Celera Wins Genome Race". Celera announces completion of the rough draft of the human genome sequence, and that it is now moving on to the mouse genome! Celera agrees to wait a bit to make the official announcement with the HGP. More on the announcement: "Celera Completes Sequencing of Human Genome" (Fool.com); "Celera cracks genetic code" (Money.com).

 

  • 14 March 2000: Code Red for Biotech Stocks: President Clinton and Prime Minister Tony Blair released a joint statement that genome information "should be made freely available to scientists everywhere". While Clinton and Blair went on to reinforce “the intellectual property protection for all gene-based inventions”, the market reacted only to the first part of the statement, putting stock market investors in a panic. Biotech stocks across the board went into a 'screaming nosedive', dragging down the NASDAQ, which on that day suffered its second-biggest point loss ever! By the end of the day, investors in the biotechnology sector had lost over $40 billion. Ouch! (Nature Biotechnology: The #1 biggest Biotech Mistake of the last 10 years)

  • A major milestone in this period:
    June 26, 2000 Love Fest at the White House: "PRESIDENT CLINTON ANNOUNCES THE COMPLETION OF THE FIRST SURVEY OF THE ENTIRE HUMAN GENOME: Hails Public and Private Efforts Leading to This Historic Achievement: President Clinton, Tony Blair, the HGP, and Celera announce the completion of a "working draft" sequence of the human genome. After more than a decade of dreaming, planning and heroic number crunching, both groups simultaneously announced that they had deciphered essentially all the 3.1 billion biochemical "letters" of human DNA, the coded instructions for building and operating a fully functional human. The achievement provides scientists with a road map to the location and sequence of an estimated 90% of genes on every chromosome, with all HGP data freely available on the Internet. Although the draft contains gaps and errors, it provides a high-quality reference genome sequence -- with the final draft expected by 2003 or sooner."
  • December 2000, Time magazine's Scientist of the year... who else but!
  • February 15 and 16, 2001: The simultaneous publication of the historic Genome Issues of Nature (Collins et al., Human Genome Project) and Science (Venter et al., Celera Genomics). Science was originally going to publish the work of both groups, but after it made an agreement to publish Celera's sequence with Restricted Commercial Access ($$), rather than the free GenBank access required of every other scientific researcher and laboratory around the world, the HGP / International Human Genome Sequencing Consortium took their historic publication to Nature. So there.
[Images: July 3, 2000; Feb 15, 2001; Feb 16, 2001]

 


 

Celera Milestones from this period:
(What, only two things done in this whole time?)
2000: The genome sequence of Drosophila melanogaster Science. 2000 Mar 24;287(5461):2185-95
2001: The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304-51


Genome Milestones from TIGR:

1998: Treponema pallidum, Plasmodium falciparum Chr2
1999: Deinococcus radiodurans, Thermotoga maritima
2000: Borrelia burgdorferi, Neisseria meningitidis, Vibrio cholerae
2001: Caulobacter crescentus, Streptococcus pneumoniae
2002: Mycobacterium tuberculosis, Plasmodium falciparum Chr2,10,11,14...


(4) 2002-Present: Sailing the Genome Seas: The Sorcerer II Expedition


New career: circumnavigating the globe on a quest for genomes (Environmental Genome Shotgun Sequencing of the Sargasso Sea, Science, 2 April 2004)

 

23 January 2002: So it WAS his genome...! J. Craig Venter steps down / is ousted as president of Celera Genomics, as Celera decides to reorganize as a pharmaceutical and drug discovery company. Venter announces the creation of the not-for-profit J. Craig Venter Science Foundation, funded with $100 M of his own money. In his spare time, Venter says, he plans to write a book "examining my own genetic code", revealing that he was one of the six supposedly anonymous donors whose DNA was used to generate Celera's genome sequence.

 

15 August 2002: Next-generation genomes. The Institute for Genomic Research (TIGR) announces the formation of two non-profit organizations, plus a sequencing organization, the Joint Technology Center (JTC):

(Where are they now?)

  • Claire Fraser-Liggett: Incredible publication list...and this is just since 2004!
  • Gene Myers: co-author of BLAST; former VP at Celera (assembly algorithms); now at the way cool HHMI Janelia Farm Research Campus
  • Hamilton Smith: Nobel Laureate 1978 for restriction enzymes (PNAS classic); Director of IBEA, Venter Institute

Milestones from TIGR:
2003: Bacillus anthracis, Enterococcus faecalis,...
2004: Desulfovibrio vulgaris, Listeria monocytogenes, Treponema denticola
2005: Campylobacter jejuni, Dehalococcoides ethenogenes

 

Milestones from J. Craig Venter Science Foundation:
2002: The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002 Oct 4;298(5591):129-49

2002: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002 Oct 3;419(6906):498-51

2003: The dog genome: survey sequencing and comparative analysis. Science. 2003 Sep 26;301(5641):1898-903. (OK, so it was his poodle, too)

2003: Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. PNAS 2003 Dec 23;100(26):15440-5

2004: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004 Apr 1;428(6982):493-521.

2004: Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004 Apr 2;304(5667):66-74. 2004 Mar: 1 million new genes from the ocean...the Sorcerer II Expedition is funded by the J. Craig Venter Science Foundation, the Discovery Channel Quest Program, and the U.S. Department of Energy.

  • WGS on samples collected from seawater from the Sargasso Sea near Bermuda.
  • Sequenced "populations en masse": a total of 1.045 billion base pairs.
  • Sequences estimated to derive from at least 1800 genomic species, based on sequence relatedness,
  • Including 148 previously unknown bacterial phylotypes.
  • Identified over 1.2 million previously unknown genes represented in these samples,
  • Including more than 782 new rhodopsin-like photoreceptors!
  • WOO baby! Sign me up for the next cruise!
  • [And on a side note: Here's one of the latest from Claire and TIGR: Gill, S.R., Pop, M., Deboy, R.T., Eckburg, P.B., Turnbaugh, P.J., Samuel, B.S., Gordon, J.I., Relman, D.A., Fraser-Liggett, C.M., Nelson, K.E.  (2006) Metagenomic analysis of the human distal gut microbiome. Science 312 (5778):1355-1359. COOL article!]

Very recent...

2006 (January 10): Essential genes of a minimal bacterium
John I. Glass, Nacyra Assad-Garcia, Nina Alperovich, Shibu Yooseph, Matthew R. Lewis, Mahir Maruf, Clyde A. Hutchison, III, Hamilton O. Smith, and J. Craig Venter. PNAS (free full text) = Identification of a Minimal Gene Set for Microbial Life....

 

2006 (October): The new J. Craig Venter Institute was formed in October 2006 "through the merger of several affiliated and legacy organizations--one large multidisciplinary genomic-focused organization."

  • The Institute for Genomic Research (TIGR) and
  • The Center for the Advancement of Genomics (TCAG),
  • The J. Craig Venter Science Foundation,
  • The Joint Technology Center (JTC)
  • The Institute for Biological Energy Alternatives (IBEA).

2007 (March): The PLoS Ocean Metagenomics Issue...waay cool. PLEASE let me go on the next expedition...


2007 (April): "Renowned microbiologist Fraser-Liggett to head new U.Md. Genomics Institute"

2007 (4 September): The Diploid Genome Sequence of an Individual Human Check out the manuscript in PLoS Biology! "We have generated an independently assembled diploid human genomic DNA sequence from both chromosomes of a single individual (J. Craig Venter)..."

2007 (Sept 8): Sorcerer II for sale!

2007: Another one of the Time 100 Scientists and Thinkers

 

2007: Put this on your Amazon Wish List : A Life Decoded: My Genome: My Life (2007) by J. Craig Venter

2007 (18 October): Just found this! - a brand new article and news focus: The TWO bad Boys of DNA: Interview: DNA's messengers

 


II. Sequencing the Human Genome - The Technology:

Sequencing comparison - public vs. private: Bac-to-Bac vs. Whole Genome Shotgun Sequencing (handout)

 

Shotgun Sequencing:
1) Genomic DNA of an organism (human, Drosophila) is shredded into small fragments (averaging 2,000 and 10,000 bp).

  • DNA used to make the Drosophila melanogaster libraries was provided by the Berkeley Drosophila Genome Project (BDGP).
  • DNA for Celera's human genome was provided by 6 donors of different ethnic background and gender.
    (We know now that ONE of them was none other than...)
  • DNA is blunt-end cloned into M13 plasmid libraries to allow access to sequencing primers. Both the 2,000 and the 10,000 bp plasmid libraries are sequenced, 500 bp from EACH end, generating millions of sequences. Sequencing both ends of each insert is critical for assembling the entire chromosome, because the two end reads lie a known distance apart and can link neighboring stretches of sequence in the right order and orientation.
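Step 1 can be mimicked in a few lines of Python: cut random clones of about the library insert size out of a genome, then "sequence" a fixed number of bases from each end, the second read coming off the opposite strand. This is a toy simulation under simplifying assumptions (error-free reads, insert sizes drawn from a normal distribution with a hypothetical 10% spread, a random made-up genome); `shotgun_library` is a hypothetical name, not Celera's software.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(dna))


def shotgun_library(genome: str, insert_size: int, n_clones: int,
                    read_len: int, seed: int = 0) -> list:
    """Simulate paired-end reads from randomly sheared, cloned fragments.

    Returns (left_read, right_read, insert_start, insert_len) tuples. The
    right read is reported as sequenced, i.e. from the opposite strand.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_clones):
        # Insert length ~ Normal(insert_size, 10%), clamped to a sane range.
        length = int(rng.gauss(insert_size, 0.1 * insert_size))
        length = max(2 * read_len, min(length, len(genome)))
        start = rng.randrange(len(genome) - length + 1)
        insert = genome[start:start + length]
        left = insert[:read_len]                        # read from one end
        right = reverse_complement(insert[-read_len:])  # read from the other end
        pairs.append((left, right, start, length))
    return pairs


if __name__ == "__main__":
    rng = random.Random(1)
    toy_genome = "".join(rng.choice("ACGT") for _ in range(50_000))
    for library in (2_000, 10_000):   # the two library sizes described above
        pairs = shotgun_library(toy_genome, library, n_clones=5, read_len=500)
        print(library, [(start, length) for _, _, start, length in pairs])
```

Because the two reads of each pair sit roughly one library-length apart, they can later tie together pieces of sequence that do not overlap each other at all.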

2) Speed Matters: High-speed sequencing machines from PE / Applied Biosystems sequence these fragments. Using ~300 ABI PRISM® 3700 Automatic DNA sequencers (at $300,000 apiece, that's $90 M!), Celera can sequence over 130 million bases (130 Mbp) every 24 hours.
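The throughput numbers in step 2 are easier to appreciate as a rough time budget. This sketch assumes (our assumption, not the text's) that the whole fleet runs around the clock at the stated 130 Mbp per day, every read is usable, and the target is the ~3.1 billion-base human genome. "Nx coverage" here means sequencing N times as many bases as the genome contains, which shotgun assembly needs because reads land at random.

```python
machines = 300
price_per_machine = 300_000        # USD, as quoted above
bases_per_day = 130_000_000        # 130 Mbp every 24 hours (whole fleet)
genome_size = 3_100_000_000        # ~3.1 billion bases

fleet_cost = machines * price_per_machine
days_per_1x = genome_size / bases_per_day   # days to sequence one genome's worth

print(f"fleet cost: ${fleet_cost:,}")
print(f"1x coverage: about {days_per_1x:.1f} days")
for coverage in (5, 10):
    print(f"{coverage}x coverage: about {coverage * days_per_1x:.0f} days")
```

In other words, even at this pace a several-fold-coverage shotgun of the human genome takes months of machine time, not days.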

 

3) Assembly: Sequenced fragments are reassembled in the proper order by computers running algorithms developed at Celera by Gene Myers and others. In partnership with Compaq Computer Corporation, Celera created a data center for sequence assembly and database access, using over 200 AlphaServer workstations running continuous assembly algorithms to assemble the human genome sequence and to analyze it for gene structure and function.
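The core idea of step 3, merging reads whose ends overlap, can be shown with a toy greedy assembler: repeatedly join the two sequences with the longest suffix/prefix overlap until nothing overlaps. This is a textbook simplification, not the algorithm Myers' group built for Celera (a real whole-genome assembler must cope with sequencing errors, both DNA strands, and repeats, and uses the paired-end information); the function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def drop_contained(seqs: list) -> list:
    """Remove duplicates and any sequence lying entirely inside another."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Greedily merge the best-overlapping pair until no overlap >= min_len."""
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # leftover contigs cannot be joined: these are the gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
    print(greedy_assemble(reads))
```

When no pair of sequences overlaps, the function simply returns several contigs; those breaks are the gaps that finishing has to close. Greedy merging also goes wrong when a repeat is longer than the reads, one reason real assemblers lean on mate pairs.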

 

4) Finishing: closing the gaps that invariably remain, for example where DNA clones inefficiently into bacterial vectors.
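Random sampling by itself also leaves gaps, even with perfect cloning. The classic Lander-Waterman model (reads placed uniformly at random, any overlap detectable) predicts that with N reads of length L on a genome of size G, the coverage is c = NL/G, the expected fraction of bases never sequenced is about e^(-c), and the expected number of contigs is about N·e^(-c). The Python sketch below compares the formula with a quick simulation; the function names and parameters are hypothetical, and the model ignores the minimum overlap needed to detect a join.

```python
import math
import random


def lander_waterman(n_reads: int, read_len: int, genome_len: int) -> dict:
    """Idealized Lander-Waterman expectations (ignores minimum overlap)."""
    c = n_reads * read_len / genome_len
    return {
        "coverage": c,
        "fraction_uncovered": math.exp(-c),
        "expected_contigs": n_reads * math.exp(-c),
    }


def simulated_uncovered_fraction(n_reads: int, read_len: int,
                                 genome_len: int, seed: int = 0) -> float:
    """Drop reads at random positions and measure the never-covered fraction."""
    rng = random.Random(seed)
    covered = bytearray(genome_len)  # one flag per base
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        covered[start:start + read_len] = b"\x01" * read_len
    return 1 - sum(covered) / genome_len


if __name__ == "__main__":
    genome_len, read_len = 1_000_000, 500
    for coverage in (1, 3, 5, 8):
        n = coverage * genome_len // read_len
        model = lander_waterman(n, read_len, genome_len)
        sim = simulated_uncovered_fraction(n, read_len, genome_len)
        print(f"{coverage}x: model {model['fraction_uncovered']:.4f}, "
              f"simulated {sim:.4f}, ~{model['expected_contigs']:.0f} contigs")
```

At 5x coverage, e^(-5) is about 0.7% of the genome; for a 3.1 billion-base genome that is on the order of 20 million bases scattered across many gaps, which is why finishing is a separate, laborious step.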

5) Analysis and annotation of the genome

Now, Genome Boys, let's not ARGUE: How much of Celera's assembly came from the Human Genome Project?
"Free and unfettered access" revisited...


Objectives: HGP Part 2
1. Explain the relationship between TIGR, HGSI, and Celera.
2. Describe ESTs: what are they, how are they made, what do they represent, and why are they useful in genome analysis?
3. Describe Whole Genome Shotgun Sequencing and give 2 advantages and 2 disadvantages compared with Bac-to-Bac sequencing.
4. Describe the effect the formation of Celera Genomics had on the Human Genome Project.
5. How do the Bermuda Principles relate to Celera's completion of the Human Genome sequence?
6. Describe some of the Major Milestones at each 'phase' of Venter's career
7. Describe the Sorcerer II Expedition, and its major findings.

 

Some possible readings for next week (stay tuned):

Nutrigenomics, General Genetics and Health

Nutrigenomics, Genetic Testing

DNA Sequence Worth More Than A Thousand Dollars
Some DNA Tests Banned in South Korea
The Value of Nutrigenomic DNA Testing
Nutrigenomics Under Scrutiny in the UK
Commercially Available DNA Tests

Reliability of at-home tests

Nutritional Genetics = Nutritional Genomics = Nutrigenomics
Commercially Available DNA Tests
The Value of Nutrigenomic DNA Testing
Reliability of Home DNA Tests
Free Sciona Seminar: Genes, Nutrition, and You

 
