Transcriptional Regulation
This page contains information and links that support our efforts in the Discovery and Prediction of Archaeal Transcriptional Regulation.
Introduction
We have developed a discovery science-based approach to characterizing the molecular mechanisms of transcriptional regulation that underlie a cell's response to a particular stress. This approach, named STRES (Survey of Transcriptional Response to Environmental Stress), integrates several existing methodologies. It begins with DNA microarray expression profiling (transcriptomics) and enables the discovery of new regulatory transcription factors, their binding sites (operators), their effectors, and the genome-wide set of genes they regulate. Hence our subtitle, "from microarrays to molecular mechanisms". This page summarizes our approach; an unpublished manuscript with a more detailed description can be found here.
STRES has been applied to prokaryotic microorganisms (both bacteria and archaea), but most of our work focuses on the model archaeon Pyrococcus furiosus (Pf). The figure above summarizes the premise of our discovery approach: the most common transcriptional regulation mechanisms in prokaryotes involve cis repression or activation. That is, a protein acts as a regulatory transcription factor (rTF) by binding to a cognate DNA sequence (an operator) near the promoter [the TFB-recognition element (BRE) and TATA element, near −26 bp relative to the transcription start site (see numbering in figure)]. Once bound, it either activates transcription, by helping to recruit the transcription complex [TATA-element binding protein (TBP), transcription factor B (TFB), and RNA polymerase (the brown "football" in the figure)] or by accelerating promoter clearance, or it represses transcription, by blocking formation of the transcription complex or by blocking RNA polymerase from transcribing the gene template. Note that in the summary below, we use the generic term UOR (Upstream of ORF Region) for the DNA upstream of an ORF (Open Reading Frame), whether or not it contains a known promoter or operator.
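Because a UOR is defined only by its position relative to an ORF, it can be cut directly from a genome sequence once the ORF coordinates and strand are known. The sketch below illustrates this under stated assumptions: 1-based inclusive ORF coordinates (GenBank style) and a fixed, hypothetical window `length` of 300 bases. It does not reproduce how UORdb itself chooses UOR boundaries (for example, whether it stops at a neighboring gene), which this page does not specify; `extract_uor` is an illustrative name.

```python
# A minimal sketch of UOR extraction. Assumptions (not UORdb's documented
# behavior): 1-based inclusive ORF coordinates and a fixed window length.

# Complement table for unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_uor(genome: str, start: int, end: int, strand: str,
                length: int = 300) -> str:
    """Return up to `length` bases immediately upstream of an ORF.

    `start` and `end` are 1-based, inclusive forward-strand coordinates
    (start <= end). The result reads 5'->3' on the ORF's own strand, so
    its last base sits next to the start codon.
    """
    if strand == "+":
        # Upstream of a forward-strand ORF lies to the left of `start`.
        left = max(0, start - 1 - length)
        return genome[left:start - 1]
    if strand == "-":
        # Upstream of a reverse-strand ORF lies to the right of `end`;
        # reverse-complement it so it reads 5'->3' on the minus strand.
        right = min(len(genome), end + length)
        return reverse_complement(genome[end:right])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


# Toy usage with made-up sequences (ATG...TAG is the ORF in each case):
print(extract_uor("TTTTTATGAAATAG", 6, 14, "+", length=3))  # TTT
print(extract_uor("CTATTTCATGACTT", 1, 9, "-", length=3))   # GTC
```

Windows are truncated at the ends of the sequence rather than wrapping around; a circular genome would need extra handling.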
The flowchart above is a brief guide to how existing methodologies are integrated in our STRES approach. Red font indicates computational activities; yellow boxes are entry points into the flowchart. For a given prokaryotic organism, a stress or metabolite of interest is first chosen (1), defining growth conditions A (standard) and B (stress). These conditions are used to generate cell extracts, from which RNA pools are analyzed by microarray expression profiling (2). The chosen stress can be a condition such as temperature, redox level, or pH; the presence or absence of a selected nutrient; the presence of a toxic compound; or many other conditions of interest. Highly up- or down-regulated genes are selected (3), and their upstream DNA (UORs containing promoter/operator regions) is amplified and used to capture sequence-specific DNA-binding proteins as prospective rTFs by a DNA affinity pull-down technique (4). Comparative one-dimensional (1D) gels identify prospective rTFs that bind differentially from cell extract A compared with B; these proteins are in-gel digested and identified by mass spectrometry (5). Other prospective rTFs can be predicted using de novo bioinformatics tools (9); both sets are prioritized (6) for further study. In parallel, both protein biophysics (7, 8, 12) and genetics (10) can be used to validate rTFs. Cloning, expression, and purification of target rTFs (7) enable biophysical identification of the recognition sequence (operator) by EMSA (Electrophoretic Mobility Shift Assay), footprinting, and SELEX (Systematic Evolution of Ligands by EXponential enrichment) (8). For predicted rTFs (9), no promoter DNA is available, but recognition sequences can be localized genome-wide using DIP-chip, protein-binding microarray, or genomic SELEX techniques (12), which can also be applied to experimentally discovered rTFs. Other predicted recognition sequences can be verified by footprinting (8).
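Step (3), selecting highly up- or down-regulated genes, is one of the computational activities in the flowchart. The sketch below shows one simple way to do it: compute log2(B/A) for each gene and keep genes beyond a fold-change cutoff. The threshold (1.0, i.e. two-fold), the gene IDs, and the function name `select_regulated` are hypothetical; this page does not state the selection criteria used, and real microarray analyses typically also use replicate arrays and statistical tests rather than a single ratio.

```python
import math


def select_regulated(signals, min_log2_ratio=1.0):
    """Split genes into up- and down-regulated lists by log2(B/A).

    `signals` maps gene ID -> (signal under condition A, signal under B),
    both positive. A threshold of 1.0 corresponds to a two-fold change.
    """
    up, down = [], []
    for gene, (a, b) in signals.items():
        if a <= 0 or b <= 0:
            raise ValueError(f"signals must be positive for {gene}")
        ratio = math.log2(b / a)
        if ratio >= min_log2_ratio:
            up.append((ratio, gene))
        elif ratio <= -min_log2_ratio:
            down.append((ratio, gene))
    # Strongest changes first in each list.
    up.sort(reverse=True)
    down.sort()
    return [g for _, g in up], [g for _, g in down]


# Hypothetical gene IDs and signals, for illustration only.
signals = {
    "gene1": (100.0, 800.0),  # log2 ratio = 3
    "gene2": (500.0, 50.0),   # log2 ratio ~ -3.32
    "gene3": (200.0, 260.0),  # log2 ratio ~ 0.38 (not selected)
    "gene4": (50.0, 200.0),   # log2 ratio = 2
}
up, down = select_regulated(signals)
print(up)    # ['gene1', 'gene4']
print(down)  # ['gene2']
```

The selected genes' UORs would then be the inputs to the DNA affinity pull-down in step (4).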
If the protein is validated as an rTF, the recognition sequences it binds are its prospective operators. If genetic tools are available for the organism, any prospective rTF can be functionally characterized by deletion mutagenesis (deleting the rTF gene from the organism's genome), followed by comparing the transcriptome of the mutant with that of the wild type, using the same DNA microarray approach as above (10). In other words, how does the stress-related expression profile change when the mutant organism no longer produces this rTF? Whether or not genetics is possible, in vitro transcription can be used to verify the regulatory function of a prospective rTF for a given gene/promoter (11). Any discovered operator (8, 12) can be used in a bioinformatics search for related sequences elsewhere in the genome (14). Predicted related sequences that occur in UORs can then be verified by footprinting (8) and in vitro transcription (11). All the information about rTFs, operators, and gene regulation is assembled into a dynamic transcriptional regulatory network (13) describing the organism's response to that particular stress or nutrient condition. Eventually, studying many different A/B condition pairs should allow the identification and characterization of all regulatory transcription factors genome-wide.
DNA Pull-down Protocol for Discovery Project
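The operator-based genome search in step (14) is not tied to a specific method on this page. One common choice is a position weight matrix (PWM) scan, sketched below as an assumption rather than as the method actually used: a log-odds matrix is built from aligned operator sites (with a pseudocount and a uniform 0.25 background, both assumed), and every window on both strands of a sequence is scored. The operator sites, the score cutoff, and the names `build_pwm` and `scan` are hypothetical.

```python
import math

BASES = "ACGT"
_RC = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5):
    """Log-odds (bits) PWM from equal-length, aligned A/C/G/T sites,
    scored against a uniform 0.25 background (an assumption)."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to equal length")
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return pwm


def scan(pwm, seq, min_score):
    """Return (start, strand, score) for every window on either strand
    scoring >= min_score; `start` is a 0-based forward-strand coordinate."""
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", seq.translate(_RC)[::-1])):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(c not in BASES for c in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[c] for col, c in zip(pwm, window))
            if score >= min_score:
                # Map minus-strand windows back to forward-strand coordinates.
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, round(score, 3)))
    return hits


# Made-up operator sites and a toy sequence:
pwm = build_pwm(["GGTACA", "GGTACA", "GGTACT"])
print(scan(pwm, "CCCGGTACACCC", min_score=5.0))  # [(3, '+', 8.427)]
```

In practice the scan would be run over the UORs of a genome, and hits would remain predictions until verified by footprinting (8) and in vitro transcription (11).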
Database of Discovered/Predicted Transcriptional Regulators and Operators (UORdb)
UORdb is a dynamically generated database for searching putative transcription factor binding site (TFBS) motifs on a genome-wide scale. For a selected genome, it aggregates gene predictions from the RefSeq, GenBank, and TIGR/CMR databases and then extracts the regions immediately upstream of predicted ORFs to serve as the search space for motifs. These upstream-of-ORF regions, dubbed UORs, should contain essentially all promoter elements in a genome and are therefore the regions most likely to contain TFBS motifs. On the Search by Sequence with UORdb page, users can enter a specific motif written in the IUPAC nucleotide code, which is matched against all UORs in the genome by a regular expression-based method. Alternatively, on the Search for Conserved Motifs with UORdb page, users can run the motif search methods CUBIC, MEME, and BioProspector on UORs of their choosing. The database also incorporates operon predictions to highlight UORs that precede the first ORF of an operon; these UORs are more likely to contain regulatory TFBSs that control transcription of all genes in the operon. Results are reported as tables that can be sorted by column for features including motif location, gene annotation, and operon prediction for the ORF immediately downstream of the motif site. Results can also be downloaded in comma-separated values (CSV) format for use in spreadsheet programs.
This is a growing collection of important literature references and reviews that provide background on transcription initiation and regulation.
scott@chem.uga.edu
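The Search by Sequence feature described above matches an IUPAC motif against UORs with regular expressions. The sketch below shows how such a search could work; it is not UORdb's code. Assumptions: each UOR is stored 5'->3' and ends next to the start codon (so position -1 is the base just upstream of the ORF), both strands are searched by also matching the motif's IUPAC reverse complement, overlapping hits are reported via a regex lookahead, and results are exported with Python's `csv` module. Gene IDs and function names are hypothetical.

```python
import csv
import io
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(motif):
    """Compile an IUPAC motif into a regex that also finds overlapping hits."""
    try:
        body = "".join(IUPAC[c] for c in motif.upper())
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]!r}") from None
    # A zero-width lookahead lets finditer report overlapping matches.
    return re.compile(f"(?=({body}))")


def search_uors(motif, uors):
    """Search both strands of every UOR for an IUPAC motif.

    Returns (gene_id, strand, position, match) rows, where `position` is the
    match start relative to the start codon and `match` is the text as it
    appears in the stored UOR sequence.
    """
    motif = motif.upper()
    rc_motif = motif.translate(IUPAC_COMPLEMENT)[::-1]
    patterns = [("+", iupac_to_regex(motif))]
    if rc_motif != motif:  # palindromic motifs would otherwise be counted twice
        patterns.append(("-", iupac_to_regex(rc_motif)))
    rows = []
    for gene_id, seq in uors.items():
        seq = seq.upper()
        for strand, pattern in patterns:
            for m in pattern.finditer(seq):
                rows.append((gene_id, strand, m.start() - len(seq), m.group(1)))
    return rows


def rows_to_csv(rows):
    """Render result rows as CSV text suitable for a spreadsheet program."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["gene_id", "strand", "position", "match"])
    writer.writerows(rows)
    return buf.getvalue()


# Toy UORs with hypothetical gene IDs:
uors = {"geneA": "TTTTGGTACATTTT", "geneB": "AAATGTACCAAA"}
rows = search_uors("GGTACW", uors)
print(rows)  # [('geneA', '+', -10, 'GGTACA'), ('geneB', '-', -9, 'TGTACC')]
print(rows_to_csv(rows))
```

Flagging UORs that precede the first ORF of a predicted operon would be one more column joined onto these rows from an operon table.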