Public Library of Science
pcbi.1007527.g006.tif (138.47 kB)

Reads from two samples mapped to a computational pan-genome sequence with regions of zero coverage.

figure
posted on 2019-12-09, 18:31 authored by Christine Jandrasits, Stefan Kröger, Walter Haas, Bernhard Y. Renard

Regions with no coverage, such as A and C, are assumed to contain as many differences between the samples, per base, as regions with sufficient coverage. To account for these regions with insufficient coverage, the total expected difference between two samples is calculated from two quantities: the SNP difference per base, derived from regions covered in both samples, and the set of common reference genome sites. This set comprises all sites of the genome except regions such as B and D. These regions have low coverage in both samples and overlap with gaps in the whole genome alignment (blue, yellow and green) of the strains used to build the computational pan-genome (purple). This indicates that both samples are related to similar strains that lack this specific genomic region, so the region should not be considered when calculating the expected number of differences for the whole computational pan-genome.
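The calculation described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, its parameters, and the representation of regions as sets of 0-based positions are all assumptions made for illustration, and how "sufficient" or "low" coverage is decided is left outside the sketch.

```python
def expected_snp_difference(snp_positions, covered_a, covered_b,
                            alignment_gap_positions, genome_length):
    """Extrapolate the SNP difference between two samples to the whole
    pan-genome (hypothetical helper, illustration only).

    All position arguments are sets of 0-based pan-genome coordinates:
    snp_positions            sites where the two samples differ
    covered_a, covered_b     sites with sufficient coverage in each sample
    alignment_gap_positions  sites that fall into gaps of the whole genome
                             alignment used to build the pan-genome
    """
    covered_both = covered_a & covered_b
    if not covered_both:
        raise ValueError("no sites are covered in both samples")

    # SNP difference per base, using only regions covered in both samples.
    observed = len(snp_positions & covered_both)
    rate_per_base = observed / len(covered_both)

    # Regions such as B and D: low coverage in both samples AND overlapping
    # an alignment gap. These are removed from the common site set.
    all_sites = set(range(genome_length))
    low_in_both = all_sites - (covered_a | covered_b)
    excluded = low_in_both & alignment_gap_positions

    # Common reference genome sites: everything except the excluded regions.
    common_sites = genome_length - len(excluded)

    # Uncovered regions such as A and C are assumed to differ at the same
    # per-base rate as the covered regions.
    return rate_per_base * common_sites


# Toy example with made-up numbers (not data from the paper).
expected = expected_snp_difference(
    snp_positions={25, 30, 70},
    covered_a=set(range(0, 60)),
    covered_b=set(range(20, 80)),
    alignment_gap_positions=set(range(90, 100)),
    genome_length=100,
)
print(expected)
```

In the toy example, only the two SNPs inside the jointly covered stretch contribute to the per-base rate, and the ten sites that are uncovered in both samples and lie in an alignment gap are left out of the common site set before extrapolating.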
