Course introduction
Overview
Teaching: 60 min
Exercises: 0 min
Objectives
Understand course scope and evaluation criteria
Start and plan an effective lab book
Log in to Draco
Viromics
Metagenomics of viruses, often referred to as viromics, is the study of viral communities in a sample or environment through the direct sequencing and analysis of their genetic material. Because the viruses do not need to be isolated or cultured, this approach avoids the bias associated with isolation and allows the complete viral community to be analyzed, including the identification and characterization of both known and novel viruses within a given ecosystem. In this module, you will learn how to analyse viromics data with computational approaches.
Data
We will be using viromics sequencing data from Dr. Xiu Jia, a Postdoc in the VEO Group. This data was generated by sequencing samples from the Unterwarnow, an estuary near Rostock. The samples were filtered through a 0.22 µm filter (commonly used to filter viral particles) and then sequenced on an Oxford Nanopore Technologies Flow Cell (long-read sequencing).
Theory
The theoretical parts of the course are covered in the mornings. This includes reading relevant papers, watching video lectures, and discussion of the concepts and tools. Please write down any questions and discussion points about the theory in your Lab Notebook (see below).
Hands-on
The practical parts of the course are covered in the afternoons. You will use different bioinformatics tools, compare and interpret the results, and draw conclusions about the sampled viruses. We will be available to help and guide you during this time. Please document how you performed the analyses, and write down any questions and discussion points in your Lab Notebook (see below). Solutions for each step are provided. If necessary, you can use them to move on to the next steps.
Homework
For some of the tutorials, we will assign some homework (in lieu of the missed day on Friday 20 September). This is usually a visualization exercise. It will be up to you to complete this in your own time and to include it in your Lab Notebook and presentations.
Presentation
You will give a final presentation on Friday 27 September. This is where you can show off your understanding of the material and share what interesting things you have identified in the data.
Final evaluation
Your final grade is based on an evaluation of three factors.
- Your professional performance (active participation, asking questions, helping others): 30%
- Your Lab Notebook (approach to address hands-on questions/exercises, tidiness/reproducibility, notes/questions/discussion points): 30%
- The presentation of your final project: 40%
Lab Notebook
Documenting your work is crucial in Computational Biology/Bioinformatics. It lets you make sure your work is reproducible, record your commands so that you can later retrace what you did, prepare a first draft of text for other documents (e.g. a Methods section in a report or paper), and collect information that you can send to colleagues. Make sure that your notes are tidy, commented, and clearly written, so that others, including your future self, can easily understand them.
You are required to write a Lab Notebook in markdown for this module, which will count as part of your evaluation. We recommend that you start a GitHub repository for the course and write your lab book there.
More details on your lab notebook
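As a starting point, the commands below sketch one possible way to set up a local Git repository with a dated Markdown entry. The repository name `viromics-labbook`, the `notebook/` folder, and the section headings are illustrative assumptions, not course requirements. Adapt them freely.

```shell
# Hypothetical repository name; pick whatever suits you.
repo="viromics-labbook"

# Create the repository and a folder for the daily entries.
git init -q "$repo"
mkdir -p "$repo/notebook"

# One Markdown file per day, named by date (e.g. notebook/2024-09-16.md).
entry="notebook/$(date +%F).md"

# Write an empty skeleton; the headings are only a suggestion.
cat > "$repo/$entry" <<'EOF'
# Course introduction

## Goals

## Commands
(record every command you ran, with a short note on why)

## Results and interpretation

## Questions and discussion points
EOF

# Stage the new entry so it is ready to be committed.
git -C "$repo" add "$entry"
```

After committing, the repository can be connected to GitHub with `git remote add` and `git push`, using the URL of a repository you create there.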
Access to Draco
Draco is a high-performance cluster created and maintained by the Universitätsrechenzentrum. It is available to members of Thuringian universities. To log in, you can use ssh (replace <fsuid> with your FSU login):
ssh <fsuid>@login1.draco.uni-jena.de
Terminal or ssh client
More details on access to Draco
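If you log in often, an entry in your SSH client configuration can save some typing. The following is a minimal sketch, assuming an OpenSSH client. The alias `draco` and the placeholder `your_fsuid` are hypothetical; only the host name comes from the command above.

```shell
# Placeholder: replace with your own FSU login before running.
FSUID="your_fsuid"

# Make sure the SSH directory exists with restrictive permissions.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Append a host alias (run this only once, or you will get duplicate entries).
cat >> "$HOME/.ssh/config" <<EOF
Host draco
    HostName login1.draco.uni-jena.de
    User $FSUID
EOF
chmod 600 "$HOME/.ssh/config"
```

With this entry in place, `ssh draco` should be equivalent to the full command shown above.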
Submitting jobs on Draco
When you log in to Draco, you are on the “login node”, which is not the place to run programs or heavier scripts. You MUST be on a “compute node” to run scripts and programs.
You can either request resources from a compute node and run programs interactively, or submit a job to the job scheduler, which then sends it to a compute node to complete.
See here for what the Slurm architecture looks like.
In practice, this means either requesting an “interactive” shell on a compute node or submitting a script via sbatch.
To request resources for an interactive shell, see here.
To submit a job, you have to make a script, my_slurm_script.sh, and then submit it with sbatch my_slurm_script.sh. Detailed information on creating these scripts, including descriptions, can be found in the “extras” here.
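To make the batch workflow concrete, here is a minimal sketch of what such a script could look like. The job name, log file name, and resource values are illustrative assumptions, and Draco may require further options (for example a partition), so check the linked documentation before relying on it.

```shell
# Write a small example job script (the quoted 'EOF' keeps $-expressions unexpanded).
cat > my_slurm_script.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello_draco        # name shown in the queue (illustrative)
#SBATCH --output=hello_draco_%j.log   # log file; %j is replaced by the job ID
#SBATCH --cpus-per-task=1             # number of CPU cores
#SBATCH --mem=1G                      # memory for the job
#SBATCH --time=00:05:00               # wall-time limit (hh:mm:ss)

# Everything below runs on the compute node that Slurm assigns.
echo "Running on host: $(hostname)"
echo "Job ID: ${SLURM_JOB_ID:-not running under Slurm}"
EOF

# On Draco, you would then submit the script and check the queue with:
#   sbatch my_slurm_script.sh
#   squeue -u "$USER"
```

The `#SBATCH` lines are ordinary comments to bash, so the script also runs outside Slurm, which is handy for a quick syntax check. For an interactive shell on a compute node, Slurm clusters typically accept something like `srun --pty bash`, possibly with additional resource options depending on the site configuration.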
Key Points
In this course, you will analyze viral sequence data
You need to keep a lab notebook
You need to be able to access the HPC Draco