Public Library of Science
Figure_10.tif (347.92 kB)

Frequent Triplets – Theory and simulation.

posted on 2013-11-21, 03:09 authored by Erez Persi, David Horn

Expected values of Frequent Triplets (FTs) in random proteins as a function of sequence length. The length range extends to 35,000 amino acids, approximately the length of the longest proteins found among the proteomes of the 94 species studied (TITIN in human, and a beta-helical protein in Chlorobium). A) The blue curve is the theoretical expected value given by the Bernoulli probability for n = 5. Dark circles are the corresponding results of a numerical search for triplets, which match the theoretical estimate. Red circles are the numerical results for restrictive FTs, defined by n = 5 and M = 2000. Inset: the same data shown up to L = 8000 for clarity. Additional black curves show the theoretical estimates for n = 4–6. B) P-value for FT misidentification as a function of length, on a log scale. C) Length distribution of human proteins, showing log-normal characteristics. The length distribution of CO proteins is right-shifted (see also Text S1, section 3, figure S6d). Further analysis based on a human “unigram” reference model is provided in Text S1, sections 1 and 2, where the few very long proteins are analyzed in detail.
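The comparison in panel A between a Bernoulli-based expectation and a numerical search can be illustrated with a minimal sketch. It rests on assumptions that the caption does not state: an FT is taken here to be an amino-acid triplet occurring at least n times (overlapping occurrences included) in a single sequence, and random proteins are drawn uniformly from a 20-letter alphabet. The function names (`expected_fts`, `count_fts`, `simulate`) are hypothetical, the restrictive definition with M is not modelled, and the authors' actual procedure may differ.

```python
import math
import random
from collections import Counter

# Assumption: uniform 20-letter amino-acid alphabet (not stated in the caption).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def expected_fts(length, n=5, alphabet_size=20):
    """Expected number of distinct triplets seen at least n times.

    Treats each of the (length - 2) triplet positions as an independent
    Bernoulli trial with success probability 1 / alphabet_size**3, so the
    count of a given triplet is Binomial(length - 2, p). Correlations
    between overlapping positions are ignored.
    """
    positions = length - 2
    if positions < n:
        return 0.0
    p = 1.0 / alphabet_size**3
    # P(X < n) for X ~ Binomial(positions, p)
    below = sum(
        math.comb(positions, k) * p**k * (1.0 - p) ** (positions - k)
        for k in range(n)
    )
    return alphabet_size**3 * (1.0 - below)


def count_fts(seq, n=5):
    """Count distinct triplets occurring at least n times in seq."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return sum(1 for c in counts.values() if c >= n)


def simulate(length, n=5, trials=20, seed=0):
    """Mean FT count over random uniform sequences of the given length."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choices(ALPHABET, k=length))
        total += count_fts(seq, n)
    return total / trials


if __name__ == "__main__":
    for length in (2000, 8000, 35000):
        print(length, expected_fts(length), simulate(length, trials=5))
```

Under these assumptions the theoretical and simulated values should track each other across the length range, in the same spirit as the blue curve and dark circles in panel A. A non-uniform amino-acid background, such as the "unigram" model mentioned in the caption, would require per-triplet probabilities instead of the single p used here.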


    PLOS Computational Biology
