Course introduction

Overview

Teaching: 60 min
Exercises: 0 min
Objectives
  • Understand course scope and evaluation criteria

  • Start and plan an effective lab book

  • Log in to Draco

Viromics

Metagenomics of viruses, often referred to as viromics, is the study of viral communities in a sample or environment through direct sequencing and analysis of their genetic material. Because no virus has to be isolated or cultured first, this approach avoids the bias associated with isolation and allows the complete viral community to be analyzed, so that both known and novel viruses within a given ecosystem can be identified and characterized. In this module, you will learn how to analyze viromics data with computational approaches.

Data

We will be using viromics sequencing data from Dr. Xiu Jia, a Postdoc in the VEO Group. This data was generated by sequencing samples from the Unterwarnow, an estuary near Rostock. The samples were passed through a 0.22 µm filter (a pore size commonly used to separate viral particles from bacteria and larger cells) and then sequenced on an Oxford Nanopore Technologies flow cell (long-read sequencing).

Theory

The theoretical parts of the course are covered in the mornings. This includes reading relevant papers, watching video lectures, and discussion of the concepts and tools. Please write down any questions and discussion points about the theory in your Lab Notebook (see below).

Hands-on

The practical parts of the course are covered in the afternoons. You will use different bioinformatics tools, compare and interpret the results, and draw conclusions about the sampled viruses. We will be available to help and guide you during this time. Please document how you performed the analyses, and write down any questions and discussion points in your Lab Notebook (see below). Solutions for each step are provided; if you get stuck, you can use them to move on to the next step.

Homework

For some of the tutorials, we will assign homework (in lieu of the missed day on Friday 20 September). This is usually a visualization exercise. It is up to you to complete it in your own time and to include it in your Lab Notebook and presentation.

Presentation

You will give a final presentation on Friday 27 September. This is where you can show off your understanding of the material and share what interesting things you have identified in the data.

Final evaluation

Your final grade is based on an evaluation of three factors.

Lab Notebook

Documenting your work is crucial in Computational Biology/Bioinformatics. It lets you make sure your work is reproducible, look up later which commands you ran, prepare a first draft of text for other documents (e.g. a Methods section in a report or paper), and collect information that you can send to colleagues. Keep your notebook tidy, commented, and clearly written, so that others, including your future self, can easily understand it.

You are required to write a Lab Notebook in Markdown for this module, and it will count as part of your evaluation. We recommend that you start a GitHub repository for the course and write your Lab Notebook there.

More details on your lab notebook

Access to Draco

Draco is a high-performance cluster created and maintained by the Universitätsrechenzentrum. It is available to members of Thuringian universities. To log in, you can use ssh (replace <fsuid> with your FSU login):

ssh <fsuid>@login1.draco.uni-jena.de

Terminal or ssh client

More details on access to Draco

Submitting jobs on Draco

When you log in to Draco, you are on the “login node”, which is not the place to run programs or heavier scripts. You MUST be on a “compute node” to run scripts and programs.

You can either request resources on a compute node and run programs interactively, or submit a job to the job scheduler, which then sends it to a compute node to complete.

See here for what the Slurm architecture looks like

In practice, this means either requesting resources from a node for an “interactive” shell or submitting a script via sbatch.

To get resources - see here
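
As a rough illustration of an interactive request, the sketch below assembles an srun command and prints it for inspection. The resource values (2 CPUs, 4G of memory, one hour) are hypothetical placeholders, not Draco recommendations, and Draco may require further options (for example a partition) that are not shown here.

```shell
#!/bin/sh
# Sketch: assemble an interactive-session request for a Slurm cluster.
# All resource values below are assumed placeholders; adjust them to your needs.
cpus=2
mem=4G
walltime=01:00:00

# srun with --pty attaches your terminal to a shell on the allocated compute node.
request="srun --cpus-per-task=${cpus} --mem=${mem} --time=${walltime} --pty bash"

# Print the command so you can check it before running it on the login node.
echo "$request"
```

Running the printed command on the login node should open a shell on a compute node once the scheduler grants the resources; typing exit ends the session and releases them.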

To submit a job, you have to write a script, e.g. my_slurm_script.sh, and then submit it with sbatch my_slurm_script.sh. Detailed information on creating these scripts, including descriptions of the options, can be found in the “extras” here

Example script here
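
For orientation, a minimal batch script might look like the sketch below. The job name, resource values, and output file name are assumptions chosen for illustration, not settings taken from the course material, and the final echo stands in for a real analysis command.

```shell
#!/bin/bash
# Hypothetical job name and placeholder resource requests; adjust as needed.
#SBATCH --job-name=viromics_test
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --time=00:30:00
# %j in the output file name is replaced by the job ID.
#SBATCH --output=viromics_test_%j.out

# Stop at the first failing command or unset variable.
set -eu

# Slurm sets SLURM_JOB_ID inside a job; fall back to "none" when run outside Slurm.
job_id="${SLURM_JOB_ID:-none}"
summary="job ${job_id} running on $(uname -n)"
echo "$summary"

# Replace the line below with the actual analysis command.
echo "analysis would start here"
```

Saved as my_slurm_script.sh, this would be submitted with sbatch my_slurm_script.sh. The #SBATCH lines are ordinary comments to the shell, so the script can also be run directly for a quick check, but that should only be done with trivial commands like these when you are on the login node.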

Key Points

  • In this course, you will analyze viral sequence data

  • You need to keep a lab notebook

  • You need to be able to access the HPC Draco