Selection on Target k-mer Sets

2008-01-11T00:01:03Z (GMT) by Daniela Raijman, Ron Shamir, Amos Tanay
<div><p>(A) Selection on single-nucleotide substitutions. Our model predicts that substitutions between k-mers in the target k-mer set (inner circle) and boundary k-mers (ring) are under negative selection, whereas substitutions between boundary k-mers and background k-mers (outside the ring), or between two background k-mers, are not under selection. The right-hand box shows the observed and expected substitution counts for one example (CACGTG in the <i>cerevisiae-mikatae</i> alignment), together with the ratio of observed to expected counts.</p><p>(B) Ratio of observed to expected conservation. The plot shows the cumulative distribution of the ratio between the observed number of conserved motif appearances and the number expected under a neutral model. Distributions are shown for three sets of motifs: target k-mer set (red), boundary (green), and background (blue). Full concordance with the neutral model would have produced a perfect lognormal distribution. The background distribution is the closest to lognormal but still shows a bias toward increased conservation, due to the clustering of mutations in yeast promoters (unpublished results). Target k-mers are conserved more than expected. This is evident both from the observed–expected ratio distribution itself and from its comparison with the corresponding distribution for background k-mers (Kolmogorov-Smirnov test, KS = 0.42 for the difference from background k-mers, <i>p</i> < 3.8e-43). The same is true for boundary k-mers, but to a lesser degree (KS = 0.09 for the difference from background k-mers, <i>p</i> < 3.4e-36).</p><p>(C) Ratio of observed to expected substitutions. The plot shows the cumulative distribution of the ratio between the observed number of substitutions between motifs and the number expected under a neutral model. 
Shown are plots for substitutions between target k-mers and boundary k-mers (red), between boundary and background k-mers (green), and between background k-mers (blue). Substitutions between target k-mers and boundary k-mers occur less often than expected. Again, this is evident both from the ratio distribution itself (more than 60% of the data points have a ratio < 1) and from its difference from the distribution for substitutions between background k-mers (KS = 0.14, <i>p</i> < 2.3e-13). Substitutions between boundary k-mers and background k-mers also occur less often than expected, but to a somewhat lesser extent (KS = 0.1, <i>p</i> < 3.8e-169). As with the conservation data, the background distribution here is not lognormal, owing to the nonuniform distribution of mutations.</p></div>
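
The quantities in panels (A) to (C) can be illustrated with a minimal Python sketch. It is not the authors' code and rests on assumptions that the caption does not state: boundary k-mers are taken to be the k-mers exactly one substitution away from a target k-mer, and the KS value is taken to be the standard two-sample statistic (the maximum distance between two empirical cumulative distributions). All function names are hypothetical, and the ratio values in the usage lines are invented toy numbers, not data from the figure.

```python
from bisect import bisect_right


def boundary_kmers(targets, alphabet="ACGT"):
    """Return k-mers one substitution away from any target k-mer.

    ASSUMPTION: this Hamming-distance-1 definition of the "ring" is
    illustrative; the caption does not define the boundary set.
    """
    targets = set(targets)
    ring = set()
    for kmer in targets:
        for i, base in enumerate(kmer):
            for alt in alphabet:
                if alt != base:
                    neighbor = kmer[:i] + alt + kmer[i + 1:]
                    if neighbor not in targets:
                        ring.add(neighbor)
    return ring


def kmer_class(kmer, targets, ring):
    """Label a k-mer as 'target', 'boundary', or 'background'."""
    if kmer in targets:
        return "target"
    if kmer in ring:
        return "boundary"
    return "background"


def obs_exp_ratio(observed, expected):
    """Ratio of an observed count to the count expected under neutrality."""
    return observed / expected


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:  # the maximum gap is always reached at a data point
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Toy usage with the example motif from panel (A).
targets = {"CACGTG"}
ring = boundary_kmers(targets)
print(len(ring))                              # 6 positions x 3 alternatives = 18
print(kmer_class("CACGTT", targets, ring))    # one substitution from CACGTG
print(kmer_class("AAAAAA", targets, ring))

# Invented observed/expected ratios, for illustration only.
target_ratios = [1.0, 2.0, 3.0, 4.0]
background_ratios = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(target_ratios, background_ratios))
```

A significance level such as the <i>p</i> values quoted in the caption would require an additional step (for example a library routine for the two-sample KS test), which this sketch leaves out.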

License

CC BY 4.0