Selected ETH Polymer Physics publications


1 selected entry
Article   A.N. Gorban, A.Y. Zinovyev, T.G. Popova
Seven clusters in genomic triplet distributions
In Silico Biology 3 (2003) 0039
Several recent papers have proposed gene-detection algorithms that find protein-coding regions without requiring a learning dataset of already known genes. That unsupervised gene detection is possible at all is closely connected to the existence of a cluster structure in oligomer frequency distributions. In this paper we study the cluster structure of several genomes in the space of their triplet frequencies, using a pure data exploration strategy. Several complete genomic sequences were analyzed by visualizing tables of triplet frequencies computed in a sliding window. The distribution of the resulting 64-dimensional vectors of triplet frequencies displays a clearly detectable cluster structure. The structure was found to consist of seven clusters: six correspond to protein-coding information in the three possible phases on each of the two complementary strands, and one corresponds to the non-coding regions. The clusters separate these classes with high accuracy (higher than 90% at the nucleotide level). Visualizing and understanding the structure makes it possible to analyze the performance of different gene-prediction tools effectively. Since the method does not require extraction of ORFs, it can be applied even to unassembled genomes.
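
The sliding-window construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the window size of 300, the step of 10, and the choice to count non-overlapping triplets from the start of each window are all assumptions made for illustration. Each window is mapped to a 64-dimensional vector of triplet frequencies, which could then be passed to any visualization or clustering tool.

```python
from itertools import product

BASES = "ACGT"
# All 64 possible triplets in a fixed order, with a lookup from triplet to index.
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window):
    """Return the 64-dimensional frequency vector of one window.

    Assumption: triplets are non-overlapping and read from the first
    position of the window. Triplets containing symbols other than
    A, C, G, T are skipped.
    """
    counts = [0] * 64
    total = 0
    for i in range(0, len(window) - 2, 3):
        idx = INDEX.get(window[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * 64
    return [c / total for c in counts]


def sliding_window_vectors(sequence, window_size=300, step=10):
    """Slide a window along the sequence and collect frequency vectors.

    window_size and step are hypothetical defaults, not values taken
    from the paper.
    """
    sequence = sequence.upper()
    vectors = []
    for start in range(0, len(sequence) - window_size + 1, step):
        vectors.append(triplet_frequencies(sequence[start:start + window_size]))
    return vectors
```

For example, `sliding_window_vectors(genome, window_size=300, step=10)` yields one vector per window position. Because triplets are counted from the window start, windows lying inside a gene see the coding frame in one of three phases depending on where they begin, which is the kind of phase dependence the abstract attributes to the clusters.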


for LaTeX users
@article{ANGorban2003-3,
 author = {A. N. Gorban and A. Y. Zinovyev and T. G. Popova},
 title = {Seven clusters in genomic triplet distributions},
 journal = {In Silico Biology},
 volume = {3},
 pages = {0039},
 year = {2003}
}

\bibitem{ANGorban2003-3} A.N. Gorban, A.Y. Zinovyev, T.G. Popova,
Seven clusters in genomic triplet distributions,
In Silico Biology {\bf 3} (2003) 0039.



© 03 May 2024 mk@mat.ethz.ch