Selected ETH Polymer Physics publications

1 selected entry
Article A.N. Gorban, A.Y. Zinovyev, T.G. Popova
Seven clusters in genomic triplet distributions
In Silico Biology 3 (2003) 0039

Several recent papers proposed gene-detection algorithms that find protein-coding regions without requiring a learning dataset of already known genes. That unsupervised gene detection is possible is closely connected to the existence of a cluster structure in oligomer frequency distributions. In this paper we study the cluster structure of several genomes in the space of their triplet frequencies, using a pure data exploration strategy. Several complete genomic sequences were analyzed by visualizing tables of triplet frequencies computed in a sliding window. The distribution of 64-dimensional vectors of triplet frequencies displays a well-detectable cluster structure. The structure was found to consist of seven clusters: six correspond to protein-coding information in each of the three possible phases on each of the two complementary strands, and one corresponds to the non-coding regions. The clusters separate these classes with high accuracy (higher than 90% at the nucleotide level). Visualizing and understanding the structure makes it possible to analyze effectively the performance of different gene-prediction tools. Since the method does not require extraction of ORFs, it can be applied even to unassembled genomes.
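
The sliding-window step described in the abstract can be illustrated with a minimal Python sketch. It turns a nucleotide sequence into a list of 64-dimensional triplet-frequency vectors, one per window position. The abstract does not give the window width, the step, or the counting convention, so the values `window=300` and `step=3`, the use of non-overlapping triplets read from the start of each window, and the function names are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# All 64 nucleotide triplets in a fixed order (AAA, AAC, ..., TTT).
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(fragment):
    """Return a 64-dimensional vector of triplet frequencies.

    Assumption: triplets are non-overlapping and read from the first
    base of the fragment.
    """
    counts = [0] * 64
    for i in range(0, len(fragment) - 2, 3):
        idx = INDEX.get(fragment[i:i + 3])
        if idx is not None:  # skip triplets with ambiguous bases such as N
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 64


def sliding_window_vectors(sequence, window=300, step=3):
    """Return one frequency vector per window position.

    The defaults for `window` and `step` are hypothetical values.
    """
    sequence = sequence.upper()
    return [
        triplet_frequencies(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]
```

The resulting vectors could then be passed to any projection or clustering method to look for the cluster structure; which method to use is left open here.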

for LaTeX users
@article{ANGorban2003-3,
author = {A. N. Gorban and A. Y. Zinovyev and T. G. Popova},
title = {Seven clusters in genomic triplet distributions},
journal = {In Silico Biology},
volume = {3},
pages = {0039},
year = {2003}
}
\bibitem{ANGorban2003-3} A.N. Gorban, A.Y. Zinovyev, T.G. Popova,
Seven clusters in genomic triplet distributions,
In Silico Biology {\bf 3} (2003) 0039.
