Recombination between strains across hosts.

(A) Phylogenetic inconsistency between individual single nucleotide variants (SNVs) and core-genome-wide divergence for each of the species in Fig 2. The fraction of inconsistent SNVs is plotted for all 4-fold degenerate synonymous SNVs in the core genome with estimated age ≤d (S1E Text, part i). Singleton SNVs are excluded because inconsistency can only be assessed for SNVs with ≥2 minor alleles. (B, inset) Linkage disequilibrium (LD) as a function of distance between pairs of 4-fold degenerate synonymous sites in the same core gene (S1F Text). Individual data points are shown for distances <100 bp, while the solid line shows the average in sliding windows of 0.2 log units. The gray line indicates the values obtained without controlling for population structure, while the blue line is restricted to the largest top-level clade (S2 Table, S1E Text, part ii). The solid black line denotes the neutral prediction from S1F Text; the only free parameters in this model are vertical and horizontal scaling factors, which have been shifted to enhance visibility. For comparison, the core-genome-wide estimate for SNVs in different genes is depicted by the dashed line and circle. (B) Summary of LD in the largest top-level clade for all species with ≥10 quasi-phaseable hosts. Species are sorted phylogenetically as in Fig 2B. For each species, the three dashes denote the value of LD for intragenic distances of , 99, and 2,001 bp, respectively, while the core-genome-wide values are depicted by circles. Points belonging to the same species are connected by vertical lines for visualization.
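
The sliding-window average mentioned for the LD curve can be illustrated with a minimal sketch. This is not the authors' code: the function name `sliding_log_average`, the base-10 logarithm, the `step` between window centers, and the use of a plain unweighted mean within each window are all assumptions made for illustration. The estimator actually used is described in S1F Text.

```python
import math

def sliding_log_average(distances, values, window=0.2, step=0.1):
    """Average `values` in sliding windows of width `window` log10 units.

    Hypothetical helper: log base, step size, and the unweighted mean are
    illustrative assumptions, not taken from the source.
    """
    # Work in log10(distance); non-positive distances cannot be log-transformed.
    pairs = [(math.log10(d), v) for d, v in zip(distances, values) if d > 0]
    if not pairs:
        return []
    lo = min(x for x, _ in pairs)
    hi = max(x for x, _ in pairs)
    n_steps = int(math.floor((hi - lo) / step + 1e-9)) + 1
    result = []
    for i in range(n_steps):
        center = lo + i * step
        # Keep points whose log-distance lies within half a window of the center.
        in_window = [v for x, v in pairs if abs(x - center) <= window / 2 + 1e-12]
        if in_window:
            result.append((10 ** center, sum(in_window) / len(in_window)))
    return result

# Example with made-up values: each tuple is (window center in bp, mean value).
curve = sliding_log_average([1, 3, 10, 30, 100], [0.5, 0.4, 0.3, 0.2, 0.1])
```

Windows that contain no site pairs are skipped rather than interpolated, so the returned curve can have gaps where data are sparse.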