Assessing viral contig completeness and contamination

Overview

Teaching: 0 min
Exercises: 30 min
Questions
  • Which signals can we use to assess how complete and/or contaminated our viral contigs are?

Objectives
# Install CheckV in the active conda environment
$ conda install -c conda-forge -c bioconda checkv

# Download the CheckV database into a new folder
$ mkdir checkv_database
$ checkv download_database checkv_database

Run checkv end_to_end on each bin with a Bash for loop. Start from the folder that holds your bins in FASTA format (bin_00.fasta, bin_01.fasta, …):

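# Run checkv for each bin, sequentially.
# Replace <PATH_TO_DATABASE> with the path to the downloaded database before running;
# typed literally, the shell would treat < and > as redirections.
$ mkdir -p checkv_results
$ for bin in b*.fasta; do checkv end_to_end -d <PATH_TO_DATABASE> -t 7 "$bin" "checkv_results/${bin%.fasta}" ; done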
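
If the loop is interrupted, rerunning it processes every bin again from the start. A minimal sketch of one way to list only the bins that still need to run is shown below. It assumes that a non-empty quality_summary.tsv in a bin's output folder means that bin has finished; pending_bins is a hypothetical helper written for this lesson, not part of CheckV, and CHECKV_DB is an assumed variable name for the database path.

```shell
# pending_bins: hypothetical helper, not part of CheckV.
# Prints each bin FASTA in the current folder whose output folder
# does not yet contain a non-empty quality_summary.tsv.
pending_bins() {
    local results_dir="$1"
    local bin
    for bin in b*.fasta; do
        # Skip when the glob matched nothing (no FASTA files here)
        [ -e "$bin" ] || continue
        if [ ! -s "$results_dir/${bin%.fasta}/quality_summary.tsv" ]; then
            printf '%s\n' "$bin"
        fi
    done
}

# Example use (assumes CHECKV_DB holds the path to the downloaded database):
# pending_bins checkv_results | while read -r bin; do
#     checkv end_to_end -d "$CHECKV_DB" -t 7 "$bin" "checkv_results/${bin%.fasta}"
# done
```

Listing the pending bins first also makes it easy to check what will be run before starting a long job.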

While it is running, have a look at the quality_summary.tsv files of the bins that have already finished. Each bin's file is written to its own folder inside checkv_results.
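
Opening one file per bin gets tedious as more bins finish. The sketch below merges the per-bin tables into a single one, adding a leading column with the bin name. It assumes the layout produced by the loop (checkv_results/<bin>/quality_summary.tsv) and that every file starts with the same header line; merge_quality_summaries is a hypothetical helper, not a CheckV command.

```shell
# merge_quality_summaries: hypothetical helper, not part of CheckV.
# Concatenates <results_dir>/<bin>/quality_summary.tsv files into one
# tab-separated table, prefixing each row with the bin name.
merge_quality_summaries() {
    local results_dir="$1"
    local header_done=0
    local summary bin_name
    for summary in "$results_dir"/*/quality_summary.tsv; do
        # Skip when the glob matched nothing (no bin has finished yet)
        [ -e "$summary" ] || continue
        bin_name=$(basename "$(dirname "$summary")")
        awk -v bin="$bin_name" -v header_done="$header_done" '
            BEGIN { OFS = "\t" }
            # Print the header only for the first file, with a "bin" column
            NR == 1 { if (!header_done) print "bin", $0; next }
            # Prefix every data row with the bin name
            { print bin, $0 }
        ' "$summary"
        header_done=1
    done
}

# Example use:
# merge_quality_summaries checkv_results > all_quality_summary.tsv
```

In the merged table, columns such as checkv_quality, completeness and contamination can then be compared across bins in one place.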

Key Points