Viral Metagenomics 2021: Glossary

Key Points

Introduction
  • Metagenomics is the culture-independent study of the collection of genomes from different microorganisms present in a complex sample.

  • Sequences that do not match any known sequence in the databases are called dark matter.

The dataset
  • FASTA format does not contain sequencing quality information.

  • Next-generation sequencing data consists of short sequences (reads).
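
To make the first point concrete, here is a minimal sketch of a FASTA reader. The function name read_fasta and the two toy records are hypothetical and are not part of the lesson data. Each record yields only a header and a sequence, so there is no field where per-base quality scores could be stored.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical toy input: two short reads, the first wrapped over two lines.
example = StringIO(">read_1\nACGTAC\nGT\n>read_2\nTTGACA\n")
records = list(read_fasta(example))

for name, seq in records:
    print(name, len(seq), seq)
```

For real datasets an established parser, such as the one in Biopython, is likely a better choice than a hand-written reader.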

Cross-assembly
  • With sequence assembly we get longer, more meaningful genomic fragments from short sequencing reads.

  • In a cross-assembly, reads coming from the same species in different samples are merged into the same scaffold.

Mapping
  • By mapping reads back to the cross-assembly, we can determine which scaffolds/species are present in each sample.

Abundance profiles
  • Generally, the higher the number of reads aligned to a scaffold, the higher its abundance in a sample.
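
The word "generally" matters here: a long scaffold attracts more reads than a short one at the same abundance, and a deeply sequenced sample yields more reads overall. The sketch below shows one common way to account for both, assuming a reads-per-kilobase-per-million normalization. That choice, the function name normalized_abundance, and the counts and lengths are illustrative assumptions, not values or methods taken from the lesson.

```python
def normalized_abundance(read_counts, scaffold_lengths):
    """Return reads per kilobase of scaffold per million mapped reads.

    read_counts: dict mapping scaffold name to number of aligned reads
    scaffold_lengths: dict mapping scaffold name to length in base pairs
    """
    total_mapped = sum(read_counts.values())
    if total_mapped == 0:
        return {scaffold: 0.0 for scaffold in read_counts}
    return {
        # count / (length / 1e3) / (total / 1e6), written as one division
        scaffold: count * 1e9 / (scaffold_lengths[scaffold] * total_mapped)
        for scaffold, count in read_counts.items()
    }


# Hypothetical sample: both scaffolds recruit the same number of reads,
# but scaffold_2 is five times longer than scaffold_1.
counts = {"scaffold_1": 500, "scaffold_2": 500}
lengths = {"scaffold_1": 10_000, "scaffold_2": 50_000}

abundance = normalized_abundance(counts, lengths)
for scaffold, value in abundance.items():
    print(scaffold, round(value, 1))
```

With equal raw counts, the shorter scaffold ends up with the higher normalized abundance, which is why raw read counts alone can be misleading when scaffolds differ in length.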

Profiles correlation
  • Adding more samples with similar species diversity but different abundances increases the binning resolution.
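
A small sketch can show why the number of samples matters. It assumes that scaffolds are compared by the Pearson correlation of their abundance profiles, which is one possible measure and not necessarily the one used in the lesson. The scaffold names and abundance values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical abundance profiles over four samples.
# scaffold_B follows scaffold_A closely (same genome, in this toy setup);
# scaffold_C varies independently.
profiles = {
    "scaffold_A": [10.0, 40.0, 5.0, 20.0],
    "scaffold_B": [21.0, 79.0, 11.0, 41.0],
    "scaffold_C": [30.0, 35.0, 50.0, 2.0],
}

a, b, c = (profiles[k] for k in ("scaffold_A", "scaffold_B", "scaffold_C"))

# With all four samples, A and B correlate strongly while A and C do not.
r_ab = pearson(a, b)
r_ac = pearson(a, c)

# With only the first two samples, any two non-constant profiles give
# a correlation of exactly +1 or -1, so A and C cannot be told apart.
r_ac_two_samples = pearson(a[:2], c[:2])

print(round(r_ab, 3), round(r_ac, 3), round(r_ac_two_samples, 3))
```

In this toy setup, two samples make scaffold_A and scaffold_C look perfectly correlated, while four samples separate them. This is the sense in which additional samples with differing abundances sharpen the binning.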

Re-assembly
  • If you got this far, you are a pro.

Glossary

FIXME