Visualizing viral taxonomy
Overview
Teaching: 0 min
Exercises: 90 min
Objectives
Plot the network of closely related viruses created by vConTACT3
This section is suggested as homework.
Today, you taxonomically annotated your phage contigs with vConTACT3.
The tool applies a gene-sharing network approach by computing the number of homologous genes between two viral sequences.
Based on this information, it integrates unseen contigs into a reference network of taxonomically classified phages, and then uses information about related viruses in the network to infer a taxonomic classification.
The resulting network, which contains both the reference phages and our assembled contigs, can be found in the output file graph.cyjs in your vConTACT3 results folder.
This file contains the network encoded in JSON format, which can be parsed with the json module from the Python standard library.
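Before building a graph, it can help to look at how the JSON is organized. The following is a minimal sketch that uses a tiny hand-written stand-in for graph.cyjs; the node names and the edge attribute "weight" are made up for illustration, and the real file may carry different or additional attributes.

```python
import json

# A tiny stand-in for graph.cyjs (hypothetical nodes and attributes)
cyjs_text = """
{
  "elements": {
    "nodes": [
      {"data": {"id": "contig_1"}},
      {"data": {"id": "phage_A"}}
    ],
    "edges": [
      {"data": {"source": "contig_1", "target": "phage_A", "weight": 12}}
    ]
  }
}
"""

# json.loads parses a string; json.load does the same for an open file
graph_json = json.loads(cyjs_text)

nodes = graph_json["elements"]["nodes"]
edges = graph_json["elements"]["edges"]
print(f"{len(nodes)} nodes, {len(edges)} edges")

# List the attribute names stored on the first node and the first edge
print("node attributes:", sorted(nodes[0]["data"]))
print("edge attributes:", sorted(edges[0]["data"]))
```

Running the same two print statements on the real file shows which node attributes hold the taxonomic information and which edge attribute holds the number of shared genes.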
Python also provides several tools for handling and analyzing networks. In this section, you will use the NetworkX package to visualize the network of phages. The goal is to get an idea of the taxonomic diversity of the contigs in our dataset, and how it compares to reference phages in the database. You can either focus on a single contig and its closely related phages (close network neighborhood) or plot the whole network to get an overview of the full virosphere (at least the viruses in the reference database).
NetworkX provides several functions for drawing the network. Our network only consists of nodes (phages) and weighted edges (number of shared genes) between them, without any layout information (where to position the nodes). There are many algorithms for finding a visually pleasing layout for a given network. The Kamada-Kawai layout and the spring layout both employ a physical force-based simulation that optimizes the distance or attraction between node pairs. In our network, the number of shared genes between two nodes serves as a measure of attraction. The spring layout is computationally less demanding and should be used for the visualization of the complete network.
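The two layouts can be compared on a small toy graph. This is a sketch under assumptions: the node names are invented, and the edge attribute "weight" is assumed to hold the number of shared genes. Note that spring_layout treats a larger weight as stronger attraction, whereas kamada_kawai_layout uses the weight as a path length, so the sketch stores the inverse of the shared-gene count in a separate attribute (here called "distance", a hypothetical name) for the latter. kamada_kawai_layout additionally requires SciPy.

```python
import networkx as nx

# Toy network: edge weights stand for the number of shared genes (made up)
g = nx.Graph()
g.add_edge("contig_1", "phage_A", weight=12)
g.add_edge("contig_1", "phage_B", weight=3)
g.add_edge("phage_A", "phage_B", weight=7)
g.add_edge("phage_B", "phage_C", weight=1)

# Spring layout: a larger weight pulls two nodes closer together.
# The seed makes the otherwise random starting positions reproducible.
pos_spring = nx.spring_layout(g, weight="weight", seed=42)

# Kamada-Kawai layout: the weight is interpreted as a distance,
# so many shared genes should translate into a short distance.
for u, v, data in g.edges(data=True):
    data["distance"] = 1.0 / data["weight"]
pos_kk = nx.kamada_kawai_layout(g, weight="distance")

# Both results are dictionaries mapping each node to an (x, y) position
for node in g.nodes:
    print(node, pos_spring[node], pos_kk[node])
```

Either dictionary can be passed as the pos argument to the NetworkX drawing functions.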
Making these plots involves a fair amount of programming and testing. This homework is less about figuring out the code than about introducing you to NetworkX as a tool for network analysis and visualization, and about creating a visualization that conveys the relational character of the data better than a table could. You can either try to work out the code yourself or start with one of the solutions and adjust or improve it. The goal is to end up with a picture of the phage network that conveys the idea behind the algorithm that vConTACT3 implements.
# you have to add networkx to the virtual environment first
import networkx as nx
# the json package is part of the standard library
import os, sys, json

# path to the vConTACT3 network file; adjust it to your results folder
graph_filename = "graph.cyjs"

# parse the json file and create a networkx graph from it
with open(graph_filename) as json_file:
    graph_json = json.load(json_file)

# the following lines are necessary for compatibility with networkx
graph_json["data"] = []
graph_json["directed"] = False
for node in graph_json["elements"]["nodes"]:
    # the 'value' attribute will be used for naming the nodes
    node["data"]["value"] = node["data"]["id"]

# create a networkx graph from the graph_json dictionary
g = nx.cytoscape_graph(graph_json)

# use draw_networkx_nodes(...) and draw_networkx_edges(...) to draw the graph
# use g.subgraph(...) or g.edge_subgraph(...) to select a subgraph of interest
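One possible way to select the close neighborhood of a single contig is sketched below on a toy graph. The node names are hypothetical; in your own script, g would be the graph created from graph.cyjs and the contig name would be one of your assembled contigs.

```python
import networkx as nx

# Toy network standing in for the vConTACT3 graph (hypothetical names)
g = nx.Graph()
g.add_edges_from([
    ("contig_1", "phage_A"),
    ("contig_1", "phage_B"),
    ("phage_B", "phage_C"),
    ("phage_C", "phage_D"),
])

contig = "contig_1"

# Option 1: collect the contig and its direct neighbors by hand
nodes_of_interest = [contig] + list(g.neighbors(contig))
neighborhood = g.subgraph(nodes_of_interest)

# Option 2: ego_graph does the same and can reach further out via radius
neighborhood_r1 = nx.ego_graph(g, contig, radius=1)
neighborhood_r2 = nx.ego_graph(g, contig, radius=2)

print(sorted(neighborhood.nodes))
print(sorted(neighborhood_r2.nodes))
```

A radius of 1 keeps only phages that share genes directly with the contig, while a larger radius also pulls in their neighbors, which can show whether the contig sits inside a larger cluster.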
Python script for plotting the local neighborhood of your selected contig
Python script for plotting the complete network of phages
sbatch script for submitting the plotting script
Plot the network of closely related phages
You can choose to plot the whole network of all reference phages and contigs, or zoom in and plot a single contig and its neighborhood.
- Use the taxonomic information encoded in the network to color the nodes.
- Use draw_networkx_nodes and draw_networkx_edges to better access the plot attributes.
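The two hints above could be combined roughly as follows. This is a sketch, not the reference solution: the node attribute "family", the taxon labels, and the output file name are hypothetical, so check the node attributes in your own graph.cyjs for the actual names. The Agg backend is selected so that the script also runs without a display, for example inside a batch job.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt
import networkx as nx

# Toy network; "family" is a hypothetical taxonomy attribute
g = nx.Graph()
g.add_node("contig_1", family="unclassified")
g.add_node("phage_A", family="Family A")
g.add_node("phage_B", family="Family A")
g.add_node("phage_C", family="Family B")
g.add_edge("contig_1", "phage_A", weight=12)
g.add_edge("contig_1", "phage_B", weight=3)
g.add_edge("phage_B", "phage_C", weight=1)

# Assign one color per taxon, then build a color list in node order
families = sorted({data["family"] for _, data in g.nodes(data=True)})
cmap = plt.get_cmap("tab10")
family_color = {fam: cmap(i % 10) for i, fam in enumerate(families)}
node_colors = [family_color[g.nodes[n]["family"]] for n in g.nodes]

# Compute positions once and reuse them for edges and nodes
pos = nx.spring_layout(g, weight="weight", seed=1)

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw_networkx_edges(g, pos, ax=ax, alpha=0.4)
nx.draw_networkx_nodes(g, pos, ax=ax, node_color=node_colors, node_size=80)
ax.set_axis_off()
fig.savefig("phage_network.png", dpi=150)
```

Drawing edges and nodes in separate calls makes it possible to style them independently, for instance with faint edges and small nodes when plotting the complete network.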
Key Points
The Python package NetworkX can be used to work with networks
The visualization of graphs usually requires the computation of positions for the nodes
Many of our contigs are close to Caudoviricetes, but we also have several connected groups of unclassified sequences