Required data
Overview
Teaching: 0 min
Exercises: 10 min
Questions
Objectives
Prepare the data for today
For today we will need the following datasets from the previous days:
- binned sequences (day 1)
- predicted protein sequences for these bins (day 3)
- protein annotations (day 3)
Please check that you can locate these files. If you didn’t get that far on the previous days, we also provide the files for download as a backup:
# binned sequences
$ wget https://github.com/MGXlab/Viromics-Workshop-MGX/raw/gh-pages/data/day_4/bins_fasta.zip
# predicted protein sequences
$ wget https://raw.githubusercontent.com/MGXlab/Viromics-Workshop-MGX/gh-pages/data/day_4/proteins_bins.faa
# protein annotations
$ wget https://raw.githubusercontent.com/MGXlab/Viromics-Workshop-MGX/gh-pages/data/day_4/proteins_bins_annotation.txt
# MMseqs2 clusters
$ wget https://raw.githubusercontent.com/MGXlab/Viromics-Workshop-MGX/gh-pages/data/refseq_clusters.tsv
# vConTACT2 output
$ wget https://raw.githubusercontent.com/MGXlab/Viromics-Workshop-MGX/gh-pages/data/day_4/c1.ntw
$ wget https://raw.githubusercontent.com/MGXlab/Viromics-Workshop-MGX/gh-pages/data/day_4/genome_by_genome_overview.csv
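To check that everything is in place, a small shell sketch like the one below could be used. It assumes the files sit in the current working directory under the same names as in the download URLs above; `check_files` is a hypothetical helper written for this illustration, not part of the workshop material. If you are using your own files from the previous days, the names may differ and the list would need adjusting.

```shell
# Hypothetical helper: report which of the given files are present
# (and non-empty) in the current directory.
# Returns the number of missing files as its exit status.
check_files() {
    missing=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# File names assumed from the download URLs above.
check_files bins_fasta.zip proteins_bins.faa proteins_bins_annotation.txt \
    refseq_clusters.tsv c1.ntw genome_by_genome_overview.csv \
    || echo "Some files are missing; download them with the wget commands above."
```

Note that `bins_fasta.zip` is an archive and would still need to be unpacked before use, for example with `unzip bins_fasta.zip`.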
Key Points