Between-host divergence across prevalent species of gut bacteria.
(A) Schematic illustration. For a given pair of hosts (h1, h2), core-genome nucleotide divergence (d) is computed for each species (s1, s2, etc.) that is quasi-phaseable (QP) in both hosts. (B) Distribution of d across all pairs of unrelated hosts for a panel of prevalent species. Species are sorted according to their phylogenetic distances, with the number of QP hosts indicated in parentheses; species were included only if they had at least 33 QP hosts (>500 QP pairs). Symbols denote the median (dash), 1st percentile (small circle), and 0.1 percentile (large circle) of each distribution and are connected by a red line for visualization; for distributions with <10³ data points, the 0.1 percentile is estimated by the second-lowest value. The shaded region denotes our ad hoc definition of "closely related" divergence, d ≤ 2×10⁻⁴. (C) The distribution of the number of species with closely related strains in distinct hosts residing on the same or different continents. The null distribution is obtained by randomly permuting hosts within each species. Although the observed values differ significantly from the null (P < 10⁻⁴), the large contribution from different continents shows that closely related strains are not solely a product of geographic separation. (D) The distribution of the number of species with closely related strains for each pair of hosts. The null distribution is obtained by randomly permuting hosts independently within each species (n = 10³ permutations, P ≈ 0.9). Thus, the same pairs of hosts do not tend to share more closely related strains than expected under this null distribution.
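The per-species summaries in (B) and the host-permutation null in (D) can be sketched as follows. This is a minimal illustration, not the authors' pipeline, under several assumptions: divergences are stored per species as a dict keyed by `frozenset({h1, h2})`; percentiles use linear interpolation; all function names are hypothetical; and because the caption does not name the test statistic for (D), the number of host pairs sharing closely related strains in at least two species is used here as a hypothetical stand-in.

```python
import random
from collections import Counter

CLOSE_D = 2e-4  # ad hoc "closely related" cutoff from the caption: d <= 2x10^-4


def percentile(values, p):
    """Linear-interpolation percentile (assumed convention, as in numpy's default)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(d_values):
    """Median, 1st and 0.1 percentile of one species' between-host d (needs >= 2 values)."""
    xs = sorted(d_values)
    # With fewer than 10^3 points the 0.1 percentile is replaced by the second-lowest value.
    p01 = xs[1] if len(xs) < 1000 else percentile(xs, 0.1)
    return {"median": percentile(xs, 50), "p1": percentile(xs, 1), "p0.1": p01}


def close_pairs(d_by_species):
    """Host pairs with d <= CLOSE_D, per species (hypothetical data layout)."""
    return {sp: {pair for pair, d in dmap.items() if d <= CLOSE_D}
            for sp, dmap in d_by_species.items()}


def shared_statistic(close_by_species):
    """Hypothetical statistic: host pairs with close strains in >= 2 species."""
    counts = Counter(pair for pairs in close_by_species.values() for pair in pairs)
    return sum(1 for c in counts.values() if c >= 2)


def permute_hosts(qp_hosts, close_by_species, rng):
    """Relabel hosts with an independent random permutation within each species."""
    permuted = {}
    for sp, pairs in close_by_species.items():
        hosts = sorted(qp_hosts[sp])
        shuffled = hosts[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(hosts, shuffled))
        permuted[sp] = {frozenset(relabel[h] for h in pair) for pair in pairs}
    return permuted


def permutation_p_value(qp_hosts, close_by_species, n_perm=1000, seed=0):
    """One-sided permutation P value with the usual +1 correction."""
    rng = random.Random(seed)
    observed = shared_statistic(close_by_species)
    hits = sum(
        shared_statistic(permute_hosts(qp_hosts, close_by_species, rng)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy example: two species, each QP in the same three hosts.
    qp = {"s1": ["h1", "h2", "h3"], "s2": ["h1", "h2", "h3"]}
    d = {
        "s1": {frozenset({"h1", "h2"}): 1e-4, frozenset({"h1", "h3"}): 1e-2,
               frozenset({"h2", "h3"}): 5e-3},
        "s2": {frozenset({"h1", "h2"}): 3e-5, frozenset({"h1", "h3"}): 2e-2,
               frozenset({"h2", "h3"}): 1e-2},
    }
    close = close_pairs(d)
    print(shared_statistic(close), permutation_p_value(qp, close))
```

In the toy example both species have closely related strains only in the pair (h1, h2), so the observed statistic is 1; under permutation the two close pairs coincide about one time in three, giving a P value near 0.33. Permuting within each species preserves how many QP hosts and closely related pairs every species has, so the null only scrambles which hosts they fall in.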