Of course, the best way to understand what the hullabaloo is all about is to read the papers yourself. As a consequence, we decided to dedicate a lab meeting to dipping our toes into the flood of papers coming out from the project. After strolling around the pretty threads interface of the ENCODE explorer, my personal choice fell on Djebali et al.'s Landscape of transcription in human cells. Admittedly, my own lack of appreciation of molecular cell biology made me a tad sceptical of its entertainment value. However, after reading the abstract, which promised a "re-definition of the concept of a gene", I found my enthusiasm growing.
At the heart of the authors' approach is the sequencing of RNA from different sub-cellular locations (nucleus, cytosol, etc.) in 15 different cell lines. This approach resulted in a genome-wide catalogue of the identity and character of RNAs. They report several observations, of which I think four were particularly interesting.
First, it has long been known that a given gene may produce several different forms of the same protein and that there are more transcripts than genes. Isoforms, as these different protein forms are called, may be due to SNP differences or variation in start locations or splicing. Here, the authors show that the number of isoforms that a gene has is not linearly correlated with the number that are expressed. Instead, the correlation plateaus around 10-12 expressed isoforms in a given cell.
Second, they revisit the question of RNA editing, that is, the extent to which a transcript can change after transcription. This apparently made a bit of a splash last year when Li et al. published a paper in Science arguing that this was very common in humans. Djebali et al. end up siding with the researchers who attributed Li et al.'s higher number to a failure to apply a decent false discovery rate.
Finally, they show that 74.7% of the human genome is transcribed as at least a primary transcript (62.1% as processed transcript). A high number indeed (probably higher than what I would have guessed), but even more interesting is that no type of cell expressed more than 56.7% of all possible transcripts. In other words, expression is highly cell-specific. Moreover, they also found that the intergenic regions often overlap and that this overlap often includes loci that traditionally would have been considered to be distinct genes.
The last piece on what constitutes a gene is particularly interesting for those of us engaged in population genomics. Our annotations, and hence our inferences, depend on our definition. However, our theoretical framework was established decades before the double helix. Moreover, many conceptually influential evolutionary biologists, such as George Williams and Richard Dawkins, adopted a rather liberal definition of a "gene" that more molecularly inclined workers found unsatisfactory. To what extent changing the definition of a gene changes our thinking remains to be seen.
I think you meant that "no type of cell expressed more than 56.7% of [all possible] transcript[s]".
Also, you accidentally a sentence there at the end.
Fixed!