Science in Society Archive

E. coli O157:H7 and Genetic Engineering

The food-borne pathogen E. coli O157:H7 has been sequenced. Dr. Mae-Wan Ho asks whether genetic engineering might have contributed to its emergence.

E. coli O157:H7 is a food-borne pathogenic strain of bacteria that emerged in the United States in the 1980s and is now responsible for some 75 000 cases of infection annually in that country. It has since also caused major outbreaks in Scotland, Japan and elsewhere.

The first outbreak, in 1982, was associated with contaminated hamburgers. The strain responsible, EDL933, isolated from ground beef in Michigan, has been studied as a reference strain. The complete sequence of its genome has recently been determined (1,2), and its closest relative turns out to be the laboratory strain K-12 MG1655. E. coli O157 has acquired shiga toxin genes (from the bacterium Shigella) and plasmids containing virulence factors by horizontal gene transfer.

The two strains, O157 and K12, share a common backbone with almost identical gene order. The 4.1 million base pairs of this backbone can be lined up side by side along their length, except for one segment where the O157 sequence is reversed. Inversions around the starting point of replication are common in bacterial genome evolution.

Scattered roughly evenly through each genome are hundreds of sections of DNA that are unique to one strain or the other: the O islands, 1.34 megabases coding for 1 387 genes in the O strain; and the K islands, 0.53 megabases coding for 528 genes in the K strain. Much of the DNA in the O and K islands has been acquired by horizontal gene transfer.
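
As a rough consistency check on these figures, the sketch below adds each strain's islands to the shared backbone and works out the gene density within the islands. It assumes that the 4.1 million base pairs quoted above refer to the shared backbone alone, and that backbone plus islands account for each whole genome. The variable and function names are illustrative only.

```python
# Figures quoted in the text, in megabases (Mb) and gene counts.
BACKBONE_MB = 4.1  # assumption: this figure is the shared backbone only

ISLANDS = {
    "O157": {"island_mb": 1.34, "island_genes": 1387},
    "K12": {"island_mb": 0.53, "island_genes": 528},
}


def summarize(strain):
    """Return (approximate genome size in Mb, island genes per Mb) for a strain."""
    data = ISLANDS[strain]
    genome_mb = BACKBONE_MB + data["island_mb"]
    genes_per_mb = data["island_genes"] / data["island_mb"]
    return genome_mb, genes_per_mb


for strain in ISLANDS:
    genome_mb, genes_per_mb = summarize(strain)
    print(f"{strain}: about {genome_mb:.2f} Mb in total, "
          f"{genes_per_mb:.0f} island genes per Mb")
```

On these assumptions the O157 genome comes out roughly 0.8 Mb larger than that of K12, and both sets of islands carry about one gene per kilobase.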

At 106 locations in the backbone, an O island and a K island occupy the same position. Only a subset of islands is associated with elements likely to be autonomously mobile. Most islands are horizontal transfers of relatively recent origin from a donor species with a different intrinsic base composition.

Of the 1 387 acquired genes in O157, 40% (561) can be assigned a function. Another 338 genes of unknown function lie within clusters that are probably remnants of phage (bacterial virus) genomes. About 33% (59/177) of the O islands contain only genes of unknown function. Many classified proteins are related to proteins from other E. coli strains or related enterobacteria that are known to be associated with virulence; the classified genes also include alternative metabolic capacities, prophages (integrated genomes of bacterial viruses) and other new functions.
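
The proportions in this breakdown can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted above. The one assumption, flagged in the code, is that genes falling into neither named category are unclassified genes lying outside the phage-like clusters.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


ACQUIRED_GENES = 1387      # genes in the O islands
WITH_FUNCTION = 561        # genes that can be assigned a function
PHAGE_UNKNOWN = 338        # unknown-function genes in phage-like clusters
O_ISLANDS = 177            # total number of O islands
UNKNOWN_ONLY_ISLANDS = 59  # islands holding only genes of unknown function

print(f"assigned a function:  {percent(WITH_FUNCTION, ACQUIRED_GENES):.1f}%")
print(f"phage-like, unknown:  {percent(PHAGE_UNKNOWN, ACQUIRED_GENES):.1f}%")
print(f"unknown-only islands: {percent(UNKNOWN_ONLY_ISLANDS, O_ISLANDS):.1f}%")

# Assumption: genes in neither category above are unclassified genes
# lying outside the phage-like clusters.
remaining = ACQUIRED_GENES - WITH_FUNCTION - PHAGE_UNKNOWN
print(f"remaining genes:      {remaining} "
      f"({percent(remaining, ACQUIRED_GENES):.1f}%)")
```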

There are 3 574 protein-coding regions in the backbone, and the average nucleotide identity between O157 and K12 is high: 98.5%. Of these regions, 89% are of equal length and 25% encode identical proteins. Some chromosomal regions differ more than the average (they are hypervariable), but they encode a comparable set of proteins at the same relative chromosomal positions. In the most extreme case (YadC), the proteins from the two strains exhibit only 34% identity. Four such loci encode known or putative biosynthesis operons of fimbrial proteins used in attachment to host cells. Another codes for a restriction/modification system that breaks down foreign DNA.

From the extent of genetic differences between the strains, the authors estimate that E. coli O157:H7 and K12 shared a common ancestor about 4.5 million years ago. This estimate is highly questionable, however, as are all similar estimates.

Such estimates are based on the so-called molecular clock hypothesis, which assumes a steady, neutral (nonadaptive) random accumulation of genetic difference per unit time. One commonly assumed rate is one percent per million years. But this is notoriously unreliable, as we now know that mutational changes accumulate in direct proportion to the number of DNA replication cycles. So organisms with short life-cycles accumulate changes faster than those with long life-cycles. There are also many fluid genome processes that can rapidly change genomes. These include hypermutation (mutation rates up to a million times higher than usual), recombination, and horizontal gene transfer. Horizontal gene transfer is well documented in all bacteria including E. coli, as is clear from the genome sequence data. Recombination, too, appears to be an important mechanism in the evolution of the enterobacteria to which E. coli belongs. And hypermutation has been identified in several regions of the E. coli chromosome.
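
To make the dependence on the assumed rate concrete, here is a minimal sketch of a strict molecular clock calculation. It is not the calculation made by the authors of the genome paper; it simply divides a percentage sequence difference by an assumed rate of divergence between the two lineages. The 1.5% difference is taken from the 98.5% average backbone identity quoted earlier, and every rate other than one percent per million years is a hypothetical value chosen for illustration.

```python
def divergence_time_myr(percent_difference, percent_per_myr):
    """Divergence time in millions of years under a strict molecular clock.

    Assumes differences accumulate at a constant rate, which is exactly
    the assumption questioned in the text.
    """
    if percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    return percent_difference / percent_per_myr


# Percentage difference implied by the 98.5% average nucleotide identity.
backbone_difference = 100.0 - 98.5

# 1.0 is the rate mentioned in the text; 0.1 and 10.0 are hypothetical.
for rate in (0.1, 1.0, 10.0):
    estimate = divergence_time_myr(backbone_difference, rate)
    print(f"rate {rate:>4}% per million years -> {estimate:.2f} million years")
```

Because the estimate is inversely proportional to the rate, a tenfold error in the assumed rate gives a tenfold error in the divergence time.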

Another factor that would give an overestimate of divergence time is artificial genetic engineering. Artificial genetic engineering involves rampant recombination and transfer of genes across divergent species barriers. Now that sequence data are becoming widely available, one ought to be asking the serious question of whether genetic engineering might have contributed to the emergence of E. coli O157 some twenty years ago (3).


  1. Perna NT et al. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 2001: 409: 529-33.
  2. Eisen JA. Gastrogenomics. Nature 2001: 409: 462-3.
  3. See Ho MW, Traavik T, Olsvik R, Tappeser B, Howard V, von Weizsacker C and McGavin G. Gene Technology and Gene Ecology of Infectious Diseases. Microbial Ecology in Health and Disease 1998: 10: 33-59; also 'Genetic engineering superviruses' by Mae-Wan Ho, I-SIS Report, March 2001.

Article first published 21/03/01
