Welcome
|
|
Listen to Assembly lecture
|
|
The dataset
|
Metagenomics is the culture-independent study of the collection of genomes from different microorganisms present in a complex sample.
Sequences that do not match any known sequence in the databases are called dark matter.
FASTA format does not contain sequencing quality information.
Next Generation Sequencing data is made of short sequences.
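The difference between FASTA and FASTQ can be seen by parsing both. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `read_fastq` and the toy records are made up for illustration, and the FASTQ reader assumes the common four-lines-per-record layout.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (header, sequence, quality) triples, assuming four lines per record."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


# Toy records, invented for illustration only.
fasta_text = ">read1\nACGTACGT\n>read2\nGGCC\nTTAA\n"
fastq_text = "@read1\nACGTACGT\n+\nIIIIHHHH\n"

fasta_records = list(read_fasta(StringIO(fasta_text)))
fastq_records = list(read_fastq(StringIO(fastq_text)))
```

Each FASTA record carries only a header and a sequence, while each FASTQ record carries an additional quality string with one character per base.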
|
Metavirome assembly
|
With sequence assembly we get longer, more meaningful genomic fragments from short sequencing reads.
In a cross-assembly, reads coming from the same species in different samples are merged into the same contig.
|
Lunch break
|
|
Visualizing the assembly graph
|
|
Assessing assembly quality
|
|
Binning contigs
|
|
Setup and run DeepVirFinder
|
Different tools have different environments. Keeping them in separate environments makes runs reproducible and prevents a variety of problems.
We are running DeepVirFinder during the lecture, because the run takes ~50 minutes.
|
Listen to Virus Detection lecture
|
|
Setup and run PPR-Meta
|
|
Setup and Run VirFinder
|
|
Comparing Virus Identification Tools
|
Despite our data being almost exclusively viral, the tools identify at most two thirds of the sequences as viral.
Making the decision boundary less strict will include more sequences, which might seem like an advantage in this case. However, if we were working with a mixed metagenomic dataset, this would mean that we would falsely annotate microbial sequences as viral.
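The trade-off behind the decision boundary can be illustrated with a small sketch. The contig names, scores, and ground-truth labels below are invented, and the score is assumed to be a generic "probability of being viral" between 0 and 1; this does not reproduce the output format of any of the tools used here.

```python
def classify(scores, threshold):
    """Return the names of contigs whose score is at or above the threshold."""
    return {name for name, score in scores.items() if score >= threshold}


def sensitivity_and_false_positives(predicted, truly_viral):
    """Return (fraction of viral contigs recovered, number of non-viral contigs called viral)."""
    true_positives = len(predicted & truly_viral)
    false_positives = len(predicted - truly_viral)
    return true_positives / len(truly_viral), false_positives


# Hypothetical scores for a mixed dataset: c1-c3 are viral, c4-c5 are microbial.
scores = {"c1": 0.95, "c2": 0.80, "c3": 0.60, "c4": 0.55, "c5": 0.30}
truly_viral = {"c1", "c2", "c3"}

strict = sensitivity_and_false_positives(classify(scores, 0.9), truly_viral)
lenient = sensitivity_and_false_positives(classify(scores, 0.5), truly_viral)
```

In this toy case the strict threshold recovers one of three viral contigs with no false positives, while the lenient threshold recovers all three but also calls one microbial contig viral.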
|
Setup and run VirSorter
|
VirSorter is a homology-based tool.
Because VirSorter has to compare each sequence to a database, it is slower than many other tools.
|
Lunch Break
|
Nutrition is important
Guten Appetit!
|
Compare Results for four tools
|
The different tools often agree, but they also frequently disagree, on whether a contig is viral. The level of agreement depends to some extent on the length of the contig.
Even for current state-of-the-art tools, getting a high sensitivity is hard.
Some tools make more similar predictions than others.
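One way to quantify how similar the predictions of two tools are is the Jaccard index of the sets of contigs each tool calls viral. The sketch below uses hypothetical tool names and contig sets; it is not derived from the actual results of this lesson.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical sets of contigs predicted as viral by three tools.
predictions = {
    "tool_a": {"c1", "c2", "c3", "c4"},
    "tool_b": {"c1", "c2", "c3"},
    "tool_c": {"c3", "c5"},
}

# Pairwise similarity for every combination of two tools.
pairwise = {
    (x, y): jaccard(predictions[x], predictions[y])
    for x, y in combinations(sorted(predictions), 2)
}
```

A value close to 1 means two tools call almost the same contigs viral, and a value close to 0 means they barely overlap.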
|
Prophage Prediction
|
Features such as changes in GC content or a sudden enrichment in viral genes indicate the presence of a prophage in a contig or genome.
The results of a tool are sometimes distributed across multiple folders. Make sure to check all output files so that you get the most out of your experiment.
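The GC content signal mentioned above can be made visible with a sliding window. This is only an illustrative sketch on an artificial sequence; the window and step sizes are arbitrary assumptions, and real prophage predictors combine such signals with gene content rather than relying on GC content alone.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window, step):
    """Return (start position, GC fraction) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Artificial 60-base sequence: AT-rich flanks around a GC-rich insert.
toy_sequence = "AT" * 10 + "GC" * 10 + "AT" * 10
profile = sliding_gc(toy_sequence, window=20, step=20)
```

In the toy profile the middle window stands out from its neighbours, which is the kind of abrupt shift that can hint at an inserted region.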
|
Listen to Benchmarking lecture
|
|
Introduction and setting up
|
|
Gene prediction
|
|
Prodigal modes
|
|
Functional annotation
|
|
Lunch break
|
|
Clustering proteins
|
|
Integrating annotations
|
|
Inspecting the MSAs
|
|
Install R package
|
|
Clustering and taxonomic classification of uncultivated viral genomes
|
|
Required data
|
|
Homology based search
|
Sometimes you might find a good hit, for example for the crAssphage bins, or for many of the well-described viruses infecting humans. In other cases, we need more sophisticated search strategies to assign a given viral sequence to a previously described taxon.
|
Clustering viral sequences based on shared proteins
|
|
Installing and running vConTACT2
|
|
Lunch break
|
|
Check vConTACT2
|
|
Phylogeny based on marker genes
|
|
Gene sharing networks with vConTACT2
|
|
Assessing viral contig completeness and contamination
|
|
Track alpha and beta diversity dynamics of viral/microbial communities
|
|
RStudio setup from a Conda environment, package installation, and data download
|
|
Exploring data
|
Plot absolute abundance, relative abundance, and centered log-ratio (CLR) abundance to see how the different abundance measures differ.
Picking thresholds for filtering can be tricky. Play with the thresholds to filter data based on your questions and your data.
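The abundance measures mentioned above can be computed by hand for a single sample. The lesson itself works in R; the sketch below uses plain Python purely for illustration, with invented counts and a pseudocount of 1 as an assumption to avoid taking the logarithm of zero.

```python
import math


def relative_abundance(counts):
    """Divide each count by the sample total so the values sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean of the logs.

    The pseudocount is an assumption made here so that zero counts can be logged.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Invented counts for four taxa in one sample.
sample = [90, 9, 1, 0]
rel = relative_abundance(sample)
clr_values = clr(sample)
```

Relative abundances always sum to 1, while CLR values are centered around 0, so taxa above the geometric mean get positive values and taxa below it get negative values.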
|
Break
|
|
Alpha diversity
|
Different alpha diversity indices emphasize different aspects of alpha diversity. Make choices based on your questions and interpret the results based on the methods you chose.
Hill numbers are linear and intuitive, while the original alpha diversity index values are not.
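The linearity of Hill numbers can be checked on two perfectly even toy communities. This sketch assumes the standard definition of the Hill number of order q and uses invented counts; the function names are illustrative.

```python
import math


def shannon_entropy(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def hill_number(counts, q):
    """Hill number of order q (effective number of taxa)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    if q == 1:
        # The order-1 Hill number is the exponential of the Shannon index.
        return math.exp(shannon_entropy(counts))
    return sum(p ** q for p in proportions) ** (1 / (1 - q))


# Two perfectly even communities: the second has twice as many taxa.
four_taxa = [25, 25, 25, 25]
eight_taxa = [10] * 8

entropy_ratio = shannon_entropy(eight_taxa) / shannon_entropy(four_taxa)
hill_ratio = hill_number(eight_taxa, 1) / hill_number(four_taxa, 1)
```

Doubling the number of equally abundant taxa doubles the Hill number, whereas the Shannon index only increases by a factor of 1.5 in this example.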
|
Beta diversity
|
Understand the similarities and dissimilarities of different beta diversity/distance matrices.
Aitchison distance is the distance between samples or features within simplex space, computed as the Euclidean distance after a centered log-ratio transformation. We use Aitchison distance in compositional data analysis.
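As a minimal sketch, the Aitchison distance between two samples can be computed as the Euclidean distance between their centered log-ratio (CLR) vectors. The sample values are invented, and the code assumes strictly positive values (zeros would need a pseudocount or another replacement strategy first).

```python
import math


def clr(composition):
    """Centered log-ratio transform; assumes all values are strictly positive."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the CLR-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Invented samples over the same three taxa.
sample_a = [10, 20, 70]
sample_b = [20, 40, 140]  # same proportions as sample_a, twice the depth
sample_c = [70, 20, 10]   # different proportions

same_composition = aitchison_distance(sample_a, sample_b)
different_composition = aitchison_distance(sample_a, sample_c)
```

Because only ratios between taxa matter, two samples with identical proportions but different sequencing depth end up at (numerically) zero distance, which is the property that makes this distance suitable for compositional data.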
|
Differential abundance
|
|
Bia Introduction
|
|