Each gene in our DNA has a start point and an end point. Defining these extremities correctly is critical for producing a functional protein. Much research has gone into what determines where on the DNA, and when, a gene “starts.” Where a gene ends is a different matter: selection of the transcription termination site has been thought to be influenced by downstream elements and external factors.
In their most recent work, published in the journal Cell, researchers from the Max Planck Institute of Immunobiology and Epigenetics discovered that, for the majority of our genes, the site where transcription starts affects the site where it ends. This process, which predetermines mRNA end sites at the very beginning of transcription and is highly conserved across species, is essential for cell identity and functionality.
An organism’s cells all have the same DNA sequence. The assortment of genes that will be activated in a certain location at a specific time dictates the identity and function of individual cells and tissues. The messenger RNA (mRNA) molecules produced by these active genes, which are transcribed from the DNA template, will encode the proteins required for cellular activity.
Complex molecular machinery begins transcribing DNA sequences into mRNA at certain locations known as promoters. Surprisingly, most genes have several locations where transcription can begin or finish. This means that the mRNAs for each gene can differ based on the start or termination point. The expression of a single gene in several variations greatly increases the genome’s diversity and functioning. At the same time, it adds another degree of complexity to genome research.
RNA snapshots from start to finish
Researchers at the Max Planck Institute of Immunobiology and Epigenetics (MPI-IE) in Freiburg wanted to discover how many alternative start and end sites each gene uses, in what combinations, and whether those combinations differ depending on the circumstances. “The technical challenge in answering this question is that we must read every mRNA molecule from every gene from start to finish. This is a massive task that has never been attempted before,” explains Valérie Hilgers, research group leader at the MPI-IE.
The researchers read each individual mRNA using modified next-generation sequencing technology. In traditional short-read sequencing, each mRNA is first divided into smaller fragments, which are amplified and sequenced to produce short reads. A continuous sequence is then reconstructed by piecing the reads together with bioinformatic algorithms.
The Hilgers lab collaborated with the MPI’s Deep Sequencing Facility to refine specialized long-read sequencing technologies and obtain full-length mRNA information from the whole genome in many Drosophila tissues, including the brain. “Long-read sequencing allows for the retrieval of much longer sequencing reads than standard sequencing, which is widely used. However, we had to refine this technique and extend the typical read length by several orders of magnitude to collect full-length mRNA information in our various model systems,” explains Carlos Alfonso-Gonzalez, the publication’s first author.
In addition to Drosophila, the Hilgers Lab used a human nervous system model in their research: cerebral organoids, which are “mini-brains” produced in a dish from induced pluripotent stem cells.
Transcription endpoints are selected prior to the start of transcription
The data collected at the full-molecule scale for each mRNA provide unique insight into the transcription of specific genes. “We discovered that rather than start sites (TSSs) and end sites (TESs) being randomly combined with one another, sites of transcription start are frequently linked to distinct sites of transcription end,” Hilgers explains.
This relationship is genuinely causal: for example, in the ovaries, artificially activating a TSS that is normally used only in the brain overrides the normal TES and induces the use of the brain TES. This demonstrates the importance of the TSS in generating the RNA landscape unique to each tissue, and hence in shaping tissue identity.
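The difference between start and end sites that combine at random and sites that are specifically linked can be illustrated with a toy calculation. The sketch below is not the researchers' analysis pipeline; the function name `coupling_table`, the site labels, and the read counts are all made up for illustration. It assumes each full-length read has already been reduced to a (TSS, TES) pair for one gene, and compares how often each pair is observed with how often it would be expected if the two sites were chosen independently.

```python
from collections import Counter

def coupling_table(reads):
    """Map each (TSS, TES) pair to (observed, expected) frequency.

    `reads` is a list of (tss, tes) labels, one per full-length read
    (hypothetical input format). The expected frequency assumes the
    start and end sites combine at random, i.e. independently.
    """
    n = len(reads)
    pair_counts = Counter(reads)
    tss_counts = Counter(tss for tss, _ in reads)
    tes_counts = Counter(tes for _, tes in reads)
    table = {}
    for tss in tss_counts:
        for tes in tes_counts:
            observed = pair_counts[(tss, tes)] / n
            expected = (tss_counts[tss] / n) * (tes_counts[tes] / n)
            table[(tss, tes)] = (observed, expected)
    return table

# Invented read counts for a single gene with two TSSs and two TESs.
reads = (
    [("TSS1", "TES1")] * 40
    + [("TSS2", "TES2")] * 40
    + [("TSS1", "TES2")] * 10
    + [("TSS2", "TES1")] * 10
)

table = coupling_table(reads)
for pair, (observed, expected) in sorted(table.items()):
    print(pair, f"observed={observed:.2f}", f"expected={expected:.2f}")
```

In this invented example, the pairs TSS1/TES1 and TSS2/TES2 occur more often than random combination would predict, which is the kind of pattern a coupling between start and end sites would produce. A real analysis would also need a statistical test and many reads per gene.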
Promoter dominance is the driving force behind RNA diversity, gene function, and tissue identity
One observation stood out, however. “Some TSSs exhibit unexpected dominance behavior. They override traditional signals to terminate transcription, outcompete other TSSs, and result in the selection of unique TESs. As a result, we dubbed them dominant promoters,” Alfonso-Gonzalez explains.
Furthermore, the researchers discovered that interactions between these dominant promoters and their linked gene ends were influenced by distinct epigenetic fingerprints. Importantly, the findings in Drosophila brain cells could be reproduced in human brain organoids, demonstrating that promoter dominance is a conserved, possibly universal, mechanism for regulating the synthesis of functional proteins and the functionality of cells.
What is the physiological significance of this newly described mechanism? The Freiburg researchers discovered that TSSs and TESs co-evolve: over millions of years of evolution across species, individual nucleotide changes at the gene start in dominant promoters were accompanied by changes at the corresponding gene end.
“We interpret this observation as a push through evolution to sustain the interaction between both extremities of the gene, implying that these couplings are important for animal fitness,” says Valérie Hilgers.