Selection on Target k-mer Sets
(A) Selection on single-nucleotide substitutions. Our model predicts that substitutions between k-mers of the target k-mer set (inner circle) and boundary k-mers (ring) are under negative selection, whereas substitutions between boundary k-mers and background k-mers (outside the ring), or between two background k-mers, are not under selection. The right-hand box shows the observed and expected substitution counts for one example (CACGTG in the S. cerevisiae–S. mikatae alignment), together with the ratio of observed to expected counts.
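The classification of substitutions in panel A can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes (this legend does not say so) that the boundary set consists of every k-mer exactly one substitution away from a target k-mer that is not itself a target k-mer. Pairs outside the three categories named in panel A, such as boundary-to-boundary substitutions, are labeled "other".

```python
BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length k-mers."""
    if len(a) != len(b):
        raise ValueError("k-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def boundary_set(target: set[str]) -> set[str]:
    """ASSUMPTION: boundary = all one-substitution neighbors of target k-mers,
    excluding k-mers that are themselves in the target set."""
    boundary = set()
    for kmer in target:
        for i, base in enumerate(kmer):
            for alt in BASES:
                if alt != base:
                    boundary.add(kmer[:i] + alt + kmer[i + 1:])
    return boundary - target


def kmer_class(kmer: str, target: set[str], boundary: set[str]) -> str:
    """Assign a k-mer to the inner circle, the ring, or the outside."""
    if kmer in target:
        return "target"
    if kmer in boundary:
        return "boundary"
    return "background"


def classify_substitution(a: str, b: str, target: set[str], boundary: set[str]) -> str:
    """Label a single-nucleotide substitution a -> b by the classes it connects."""
    if hamming(a, b) != 1:
        raise ValueError("not a single-nucleotide substitution")
    pair = frozenset({kmer_class(a, target, boundary), kmer_class(b, target, boundary)})
    labels = {
        frozenset({"target", "boundary"}): "target-boundary",        # predicted: negative selection
        frozenset({"boundary", "background"}): "boundary-background",  # predicted: neutral
        frozenset({"background"}): "background-background",          # predicted: neutral
    }
    return labels.get(pair, "other")
```

For example, with the target set {"CACGTG"}, the change CACGTG to CACGTA is a target-boundary substitution, while CACGTA to CACGAA crosses from the ring to the outside.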
(B) Ratio of observed to expected conservation. The plot shows the cumulative distribution of the ratio between the observed number of conserved motif occurrences and the number expected under a neutral model, for three sets of motifs: target k-mer set (red), boundary (green), and background (blue). Full concordance with the neutral model would have produced a perfectly lognormal distribution. The background distribution is the closest to lognormal, but it still shows a bias toward increased conservation due to the clustering of mutations in yeast promoters (unpublished results). Target k-mers are conserved more than expected. This is evident both from their observed-to-expected ratio distribution itself and from its comparison with the distribution for background k-mers (Kolmogorov-Smirnov [KS] statistic = 0.42, p < 3.8e-43). The same holds for boundary k-mers, but to a lesser degree (KS = 0.09 versus background k-mers, p < 3.4e-36).
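The two quantities behind panel B, per-motif observed-to-expected ratios and the two-sample KS statistic comparing two ratio distributions, can be sketched in pure Python as below. This is a minimal illustration of the standard KS statistic (the largest vertical gap between two empirical cumulative distribution functions), not the authors' code; the function names are hypothetical, and motifs with an expected count of zero are simply skipped, which is an assumption. For p-values, SciPy's `scipy.stats.ks_2samp` returns both the statistic and a p-value. Because the KS statistic depends only on the ordering of the values, computing it on log ratios, as one would when checking for lognormality, gives the same result.

```python
from bisect import bisect_right


def obs_exp_ratios(observed, expected):
    """Per-motif observed/expected ratios; motifs with expected == 0 are skipped."""
    return [o / e for o, e in zip(observed, expected) if e > 0]


def ecdf(sample):
    """Return the empirical CDF F(x) = fraction of sample values <= x."""
    s = sorted(sample)
    n = len(s)
    return lambda x: bisect_right(s, x) / n


def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed values.

    Both ECDFs are step functions that only jump at data points, so the
    supremum is attained at some value present in a or b.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```

A statistic near 0 means the two ratio distributions almost coincide; a value of 1 means they do not overlap at all.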
(C) Ratio of observed to expected substitutions. The plot shows the cumulative distribution of the ratio between the observed number of substitutions between motifs and the number expected under a neutral model, for substitutions between target and boundary k-mers (red), between boundary and background k-mers (green), and between two background k-mers (blue). Substitutions between target and boundary k-mers occur less often than expected. Again, this is evident both from the ratio distribution itself (more than 60% of the data points have a ratio < 1) and from its difference from the distribution for substitutions between background k-mers (KS = 0.14, p < 2.3e-13). Substitutions between boundary and background k-mers also occur less often than expected, but to a somewhat lesser extent (KS = 0.1, p < 3.8e-169). As with the conservation data, the background distribution here is not lognormal, owing to the nonuniform distribution of mutations.
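The summary used in panel C, the share of data points whose observed-to-expected substitution ratio falls below 1, can be computed per substitution category as sketched below. This is an illustration only: the record layout `(category, observed, expected)` and the function names are hypothetical, data points with an expected count of zero are dropped (an assumption), and the usage numbers are toy values, not data from this study.

```python
from collections import defaultdict


def ratios_by_category(records):
    """Group per-data-point observed/expected ratios by substitution category.

    records: iterable of (category, observed, expected) tuples (hypothetical layout).
    """
    ratios = defaultdict(list)
    for category, observed, expected in records:
        if expected > 0:  # skip data points with no neutral expectation
            ratios[category].append(observed / expected)
    return dict(ratios)


def fraction_below_one(ratios):
    """Fraction of ratios < 1 in each category, i.e. fewer substitutions than expected."""
    return {cat: sum(r < 1 for r in rs) / len(rs) for cat, rs in ratios.items() if rs}


# Toy usage (illustrative numbers only):
toy = [
    ("target-boundary", 2, 4.0),
    ("target-boundary", 3, 2.0),
    ("target-boundary", 1, 5.0),
    ("background-background", 5, 5.0),
]
summary = fraction_below_one(ratios_by_category(toy))
```

The per-category ratio lists returned by `ratios_by_category` are also the inputs one would pass to a two-sample KS comparison against the background-background category.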