Viral Metagenomics 2021: Glossary

Key Points

  • Metagenomics is the culture-independent study of the collection of genomes from different microorganisms present in a complex sample.

  • Sequences that don’t match any known sequence in the databases are called dark matter.

The dataset
  • FASTA format does not contain sequencing quality information.

  • Next Generation Sequencing data consists of short sequences, called reads.

  • With sequence assembly we get longer, more meaningful genomic fragments from short sequencing reads.

  • In a cross-assembly, reads coming from the same species in different samples are merged into the same scaffold.

  • By mapping reads back to the cross-assembly, we can determine which scaffolds/species are present in each sample.
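
To illustrate the point about FASTA and quality information, here is a minimal sketch of two parsers. It assumes simple, well-formed input with four-line FASTQ records. The function names (`read_fasta`, `read_fastq`) and the example records are hypothetical, not part of the course material. A FASTA record yields only a header and a sequence, while a FASTQ record also carries a quality string.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line is skipped
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


# Made-up example records (assumption, for illustration only).
fasta_text = ">scaffold_1\nACGTAC\nGTTA\n>scaffold_2\nGGCC\n"
fastq_text = "@read_1\nACGT\n+\nIIII\n"

fasta_records = list(read_fasta(StringIO(fasta_text)))
fastq_records = list(read_fastq(StringIO(fastq_text)))

print(fasta_records)  # two fields per record: no quality information
print(fastq_records)  # three fields per record: quality string included
```

For real data, an established parser such as the one in Biopython is a safer choice than a hand-written one.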

Abundance profiles
  • Generally, the higher the number of reads aligned to a scaffold, the higher its abundance in a sample.
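
The word "generally" matters here: a long scaffold attracts more reads than a short one at the same abundance, and samples differ in sequencing depth. The sketch below shows one common way to account for both, by scaling read counts by scaffold length and by the total number of mapped reads (an RPKM-style value). This normalization is an assumption added for illustration and is not necessarily the one used in the course. The function name and the numbers are hypothetical.

```python
def normalized_abundance(read_counts, scaffold_lengths):
    """Return reads per kilobase of scaffold per million mapped reads.

    read_counts: dict of scaffold name -> reads aligned in one sample
    scaffold_lengths: dict of scaffold name -> length in base pairs
    """
    total_reads = sum(read_counts.values())
    profile = {}
    for scaffold, count in read_counts.items():
        length = scaffold_lengths[scaffold]
        # count / (length / 1e3) / (total_reads / 1e6), written as one division
        profile[scaffold] = count * 1_000_000_000 / (length * total_reads)
    return profile


# Made-up counts for one sample (assumption, for illustration only).
counts = {"scaffold_1": 500, "scaffold_2": 500}
lengths = {"scaffold_1": 1000, "scaffold_2": 5000}

abundance = normalized_abundance(counts, lengths)
print(abundance)
```

Both scaffolds receive the same number of reads, but the shorter one ends up with a normalized abundance five times higher.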

Profiles correlation
  • Adding more samples with similar species diversity but different abundances increases the binning resolution.

  • If you got this far, you are a pro.
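
As a rough illustration of how profile correlation supports binning, the sketch below computes the Pearson correlation between scaffold abundance profiles across samples and reports pairs that vary together. The profiles, the 0.9 threshold, and the function names are hypothetical. Real binning tools use more elaborate models. Each extra sample adds one more point to every profile, which makes it harder for unrelated scaffolds to correlate by chance.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(profiles, threshold=0.9):
    """Return scaffold pairs whose profiles correlate above the threshold."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) > threshold:
            pairs.append((a, b))
    return pairs


# Made-up abundances of three scaffolds in four samples (assumption).
profiles = {
    "scaffold_1": [10, 40, 5, 80],
    "scaffold_2": [12, 38, 6, 79],
    "scaffold_3": [90, 5, 60, 2],
}

candidates = correlated_pairs(profiles)
print(candidates)
```

Here scaffold_1 and scaffold_2 rise and fall together across the samples, so they are candidates for the same bin, while scaffold_3 follows the opposite pattern.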
