Sunday, 10 February 2013

Decoding gene promoters via DNA patterns. What do the Second World War and DNA sequence analysis have in common?

Fig 1. DNA patterns for each promoter class.

DNA has most often been analyzed in linear form. For nearly 25 years, and especially in the last 10, the most widely used method for analyzing DNA sequences has been sequence alignment.
This method is used in particular for phylogenetic analysis (the relationship between species at the DNA sequence level). In the last 5-6 years it has become increasingly clear that alignment algorithms have limits and may not be enough on their own to advance genetics or computational biology.

The scientific community's interest in gene promoters is rooted in the role promoters play in controlling gene expression.

A few months ago, an interesting article on this topic appeared in the journal BMC Genomics. The paper presents a fundamental study describing a total of 10 promoter classes in eukaryotes (Fig 1) and their proportions in human tissues.


What is even more interesting is that the method is unprecedented in computational biology and genetics: it revives a cryptographic measure from the 1920s, the index of coincidence, which was used to decode enemy messages in the Second World War. This suggests that older cryptographic algorithms deserve a second look, because they may hold pleasant surprises for other types of analysis.

Using this cryptographic approach, Gagniuc P. and Ionescu-Tirgoviste C. (2012) established an informational homology between gene promoters, which led them to detect 10 separate classes. This means that although some sequences differ in their nucleotide arrangement, they can carry the same genetic information (this is often seen in HIV (human immunodeficiency virus) sequences).
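To make the index of coincidence a little more concrete, here is a minimal sketch of the classic cryptographic formula applied to a DNA string. It is not taken from the paper, whose exact variant and parameters may differ; the function name `index_of_coincidence` and the two example sequences are made up for illustration.

```python
from collections import Counter


def index_of_coincidence(seq: str) -> float:
    """Classic index of coincidence: the probability that two symbols
    drawn at random (without replacement) from seq are identical.

    Assumption: this is the textbook cryptographic definition, not
    necessarily the exact measure used in the BMC Genomics paper.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two symbols")
    counts = Counter(seq)
    # Number of identical pairs divided by the number of all possible pairs.
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Two hypothetical sequences: same nucleotide composition, different order.
seq_a = "ATGCGCGCTA"
seq_b = "GCGCATATGC"

print(index_of_coincidence(seq_a))
print(index_of_coincidence(seq_b))
```

Both calls print the same value, because the classic formula depends only on how often each nucleotide occurs, not on where it occurs. That is a much cruder notion of "same information" than the one in the paper, but it shows how two sequences that an alignment would treat as different can look identical to a statistical measure.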

They further describe how the shape of a promoter pattern (judged by the angle of the pattern and the density of its points) appears to indicate the maximum frequency (a maximum scale) of mRNA synthesis conferred by the DNA sequence (Fig 2).


Fig 2. Distribution across promoters of orthologous genes.

They found that different tissues employ distinct classes of promoters (e.g. one class is present in all tissues except the spleen).


Maxwell P. Lee et al. (2005) published a paper describing a new class of promoters called the "ATG deserts" class. The new study by Gagniuc P. and Ionescu-Tirgoviste C. confirms the existence of this promoter class as one of the 10 classes found, renaming it the "CG based" class (Fig 1A).


1. Gagniuc P, Ionescu-Tirgoviste C. Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters. BMC Genomics. 2012, 13:512.

2. Lee MP, et al. ATG deserts define a novel core promoter subclass. Genome Res. 2005, 15(9):1189–1197.