Escherichia coli
Transcription Factor Binding Sites

This site presents transcription factor binding site predictions in the E. coli genome made by cross-species comparison (i.e. phylogenetic footprinting) using a Gibbs sampling algorithm for motif finding. Predictions were made upstream of 2086 E. coli genes; that is, all genes for which: 1) there was at least 50 bp of upstream intergenic sequence, and 2) a probable ortholog was identified among the species used for comparison. The gene names and annotations used are those from the E. coli genome GenBank entry (U00096). Where available, correspondence of our predictions with experimentally verified binding sites or known repeats is presented. Regulons have been predicted by clustering these phylogenetically footprinted motifs.
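
As a rough illustration of the two inclusion criteria above, the following Python sketch filters a small gene list. The `Gene` record, its field names, and the example entries are hypothetical and are not taken from our data set; only the 50 bp threshold and the ortholog requirement come from the description above.

```python
from dataclasses import dataclass

MIN_UPSTREAM_BP = 50  # threshold stated in the text above


@dataclass
class Gene:
    # Hypothetical record layout, for illustration only.
    name: str
    upstream_intergenic_bp: int   # length of upstream intergenic sequence
    ortholog_species: tuple       # species with a probable ortholog


def eligible(gene):
    """Return True if the gene meets both inclusion criteria."""
    has_upstream = gene.upstream_intergenic_bp >= MIN_UPSTREAM_BP
    has_ortholog = len(gene.ortholog_species) > 0
    return has_upstream and has_ortholog


# Made-up example entries (not real annotations).
genes = [
    Gene("geneA", 120, ("Salmonella typhi",)),
    Gene("geneB", 30, ("Yersinia pestis",)),   # too little upstream sequence
    Gene("geneC", 200, ()),                    # no probable ortholog
]
selected = [g.name for g in genes if eligible(g)]
```

How intergenic length and orthology were actually determined is described in the papers cited below; the sketch only shows how the two conditions combine.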

This page presents predictions described in: L.A. McCue, W. Thompson, C.S. Carmack, and C.E. Lawrence. (2002) Factors influencing the identification of transcription factor binding sites by cross-species comparison. Genome Research, 12: 1523-1532 (Abstract).

Click here to access predictions described in: L.A. McCue, W. Thompson, C.S. Carmack, M.P. Ryan, J.S. Liu, V. Derbyshire, and C.E. Lawrence. (2001) Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nuc. Acids Res. 29:774-782.

Transcription factor binding site predictions for each of the 2086 E. coli genes are organized into a separate page for each gene. The search function below allows you to search by gene name (only the names used in the GenBank genome entry). Alternatively, the genes are listed alphabetically, in sections by the first letter of the gene name. A full list of all 2086 genes is also available, but be aware that it takes some time to load.

The prediction page for each gene contains the following information.
1) The genome coordinates, orientation, gene type (CDS or RNA), gene name, EcoGene link, and annotation for the target gene, and also the genes immediately flanking the target gene.
2) The genome coordinates, source of information, transcription factor name, and annotation for known TF-binding sites that regulate the target gene.
3) A listing of the species for which a probable ortholog of the target gene was identified.
4) The predictions made for the target gene, separated into those predictions that were above our cutoff for statistical significance (p < 0.05), and those that were below the cutoff. For each prediction the following information is given: the Average MAP (maximum a posteriori probability) value, the genome coordinates (an "R" indicates that the sequence given is the reverse complement of that in the GenBank entry at those coordinates), the site sequence (plus 5 bp flanking), the transcription factor site type and overlap in bp (if the site overlaps a known transcription factor binding site), the repeat name (if the site overlaps a known repeat), a sequence logo of the motif model, a sequence logo of the aligned sites, and a sequence alignment of the sites. For genes that are divergently transcribed, predictions were made for each gene separately, and overlap with known sites is reported for all regulatory sites in that promoter region (sites that regulate the target gene and sites that regulate the divergently transcribed gene).
5) The critical MAP values for the target gene calculated for p = 0.01, p = 0.05, p = 0.1, and p = 0.2.
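
To make the "R" notation in item 4 concrete, here is a minimal Python sketch of how a site sequence with 5 bp of flanking sequence could be recovered from genome coordinates. It assumes 1-based, inclusive coordinates (the GenBank convention) and treats the genome as a linear string, clipping flanks at the ends; the function names and the toy sequence are hypothetical and are not part of this site's pipeline.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_with_flanks(genome, start, end, reverse=False, flank=5):
    """Extract the site at 1-based inclusive [start, end] plus flanking bases.

    If reverse is True (the "R" flag), the extracted window is returned
    as its reverse complement. Flanks are clipped at the sequence ends;
    wrapping around a circular genome is not handled in this sketch.
    """
    lo = max(start - 1 - flank, 0)       # convert to 0-based, extend left
    hi = min(end + flank, len(genome))   # end is inclusive, extend right
    window = genome[lo:hi]
    return reverse_complement(window) if reverse else window


# Toy sequence for illustration only (not from U00096).
toy = "AACCGGTTAGCTAGGATCCA"
forward = site_with_flanks(toy, 7, 10)
reverse = site_with_flanks(toy, 7, 10, reverse=True)
```

Taking the reverse complement of an "R" sequence gives back the forward-strand sequence found in the GenBank entry at the same coordinates.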

The Study set: a detailed study of 166 genes with experimentally verified transcription factor binding sites was performed and described in the Genome Research paper. The "Study set genes" link displays a table listing these genes, their known sites, and links to the predictions for each gene. The "Study set sorted by known TF-binding sites" link displays a table that is sorted according to the 48 transcription factors that regulate the 166 study set genes.

Gene Index

[ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z | Full Listing ]

Study set genes
Study set sorted by known TF-binding sites

Links to orthologous gene lists

Haemophilus influenzae
Pseudomonas aeruginosa
Shewanella putrefaciens
Salmonella typhi
Vibrio cholerae
Yersinia pestis

The E. coli genome GenBank entry used for this analysis is available at our ftp site.
If you would like an electronic copy of our entire data set,
please send an email request to

Bayesian Bioinformatics Program at the Biometrics Laboratory of Wadsworth Center, 2002.