Frequent Triplets – Theory and simulation. Erez Persi David Horn 10.1371/journal.pcbi.1003346.g010 https://plos.figshare.com/articles/figure/_Frequent_Triplets_8211_Theory_and_simulation_/859129 <p>Expected values of Frequent Triplets (FTs) in random proteins as a function of sequence length. The length range extends to 35,000 amino acids, approximately the length of the longest proteins found among the proteomes of the 94 species studied (TITIN in human and a beta-helical protein in <i>Chlorobium</i>). A) The blue curve is the theoretical expected value given by the Bernoulli probability for <i>n = 5</i>. Dark circles are the corresponding results of a numerical search of triplets, showing a perfect match to the theoretical estimate. Red circles are the numerical results for restrictive FTs defined by <i>n = 5</i> and <i>M = 2000</i>. Inset: the same data are shown up to <i>L = 8000</i> for clarity. Additional black curves represent the theoretical estimates for <i>n = 4–6</i>. B) <i>P</i>-value for FT misidentification as a function of length, on a log scale. C) Length distribution of human proteins, showing log-normal characteristics. The length of CO proteins is right-shifted (see also <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s026" target="_blank">Text S1</a>, section 3, and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s006" target="_blank">figure S6d</a>). Further analysis based on a human “unigram” reference model is provided in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s026" target="_blank">Text S1</a>, sections 1 and 2, where the few very long proteins are analyzed in detail.</p> 2013-11-21 03:09:57 triplets
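
The comparison in panel A between a Bernoulli-based expectation and a numerical triplet search can be illustrated with a minimal sketch. It is not the authors' code and rests on assumptions that the caption does not state: an FT is taken to be a distinct triplet occurring at least n times in one sequence, the 20 amino acids are drawn uniformly and independently, and each triplet's count over the L - 2 overlapping positions is approximated as binomial. The restrictive variant with M is not modeled, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (uniform usage is an assumption)


def expected_frequent_triplets(length, n=5, alphabet_size=20):
    """Approximate expected number of distinct triplets seen at least n times.

    Assumption: each triplet's count is Binomial(length - 2, 1 / alphabet_size**3).
    """
    positions = length - 2
    if positions < n:
        return 0.0
    n_triplets = alphabet_size ** 3
    p = 1.0 / n_triplets
    # Probability that a given triplet occurs fewer than n times.
    below = sum(
        math.comb(positions, k) * p ** k * (1.0 - p) ** (positions - k)
        for k in range(n)
    )
    return n_triplets * (1.0 - below)


def simulated_frequent_triplets(length, n=5, trials=20, seed=0):
    """Average number of triplets occurring at least n times in random sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choices(ALPHABET, k=length))
        # Count overlapping triplets at every position.
        counts = Counter(seq[i:i + 3] for i in range(length - 2))
        total += sum(1 for c in counts.values() if c >= n)
    return total / trials


if __name__ == "__main__":
    for L in (2000, 8000, 35000):
        print(L, round(expected_frequent_triplets(L), 2))
```

Under these assumptions the expected count rises steeply with length, which is why a fixed threshold n admits many chance FTs in very long random sequences.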