Assembly and binning for metagenomic samples with high intra-population diversity

Binning clusters contigs assembled from reads according to their coverage and sequence composition (e.g., k-mer composition). Each resulting bin, also called a metagenome-assembled genome (MAG), serves as a representative of the genomes that make up the true species population.
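
To make the two binning signals concrete, here is a minimal Python sketch of how a contig could be turned into a feature vector of normalized k-mer frequencies plus coverage. The function names `kmer_profile` and `feature_vector` are hypothetical, chosen for illustration only, and this is not the method of any particular binning tool. Real binners typically refine this, for example by merging reverse-complement k-mers and by using coverage from several samples.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies of seq over the ACGT alphabet.

    The output has 4**k entries, ordered lexicographically (AAAA, AAAC, ...).
    k-mers containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_vector(seq, coverage, k=4):
    """Concatenate the k-mer profile with the contig's mean coverage.

    Assumption: coverage is a single number per contig; with several
    samples one would append one coverage value per sample instead.
    """
    return kmer_profile(seq, k) + [float(coverage)]
```

Contigs whose feature vectors lie close together would then be candidates for the same bin; the clustering step itself is left out of this sketch.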

  • Assembly and binning for samples with high inter-population diversity. For many environmental samples, especially highly heterogeneous ones such as soil, genome coverage is a major problem: when the sequencing effort is fixed and the sample contains many species that are not closely related, each species population receives only a small number of reads. This follows directly from the Lander-Waterman equation (1). Assembly of these low-coverage population genomes will therefore be less efficient. We use soil and activated sludge as examples below, since both are well known for high inter-population diversity.
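
The coverage argument above can be sketched numerically. The snippet uses the standard Lander-Waterman quantities: coverage c = N·L/G for N reads of length L on a genome of size G, and an expected covered fraction of 1 − e^(−c). The helper `per_population_coverage` is a hypothetical simplification that assumes every population has the same abundance and genome size, which real communities do not.

```python
import math


def coverage(n_reads, read_len, genome_size):
    """Lander-Waterman coverage c = N * L / G."""
    return n_reads * read_len / genome_size


def fraction_covered(c):
    """Expected fraction of the genome covered by at least one read."""
    return 1.0 - math.exp(-c)


def per_population_coverage(total_reads, read_len, genome_size, n_populations):
    """Coverage per population when a fixed read budget is split evenly.

    Assumption (for illustration only): equal abundance and equal
    genome size for all populations.
    """
    reads_each = total_reads / n_populations
    return coverage(reads_each, read_len, genome_size)


if __name__ == "__main__":
    # Hypothetical numbers: 10 million reads of 150 bp, 5 Mbp genomes.
    for n in (10, 1_000, 10_000):
        c = per_population_coverage(10_000_000, 150, 5_000_000, n)
        print(f"{n} populations: coverage {c:.2f}, "
              f"covered fraction {fraction_covered(c):.3f}")
```

With the same read budget, moving from 10 to 10,000 equally abundant populations lowers per-population coverage by a factor of 1,000, which is why assembly suffers in highly diverse samples.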
Written on December 20, 2023