From Array Suite Wiki

View viral sequence counts in Land RNA-seq Data

[Figure: Virus View plot of HPV16 read counts]


As part of the Land RNA-seq pipeline, OmicSoft has quantified the number of reads mapping to thousands of viruses in each sample. These data can be visualized in several Lands with the Virus View.

[Figure: Virus View menu]

Viral genome sequences are from the NCBI Reference Sequence (RefSeq) collection. We aligned the unmapped reads from each sample's RNA-seq BAM file to these viral genomes and counted the total number of reads aligned to each viral sequence. Next, we normalized each read count by the total number of mapped reads in the sample. Rapid Identification of Non-human Sequences (RINS) was used to identify the viral sequences.
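The counting step can be pictured with the short Python sketch below. It is a minimal illustration, not OmicSoft's pipeline: it assumes the alignments of unmapped reads have already been reduced to hypothetical `(read_id, virus)` pairs, and it assumes a read is counted at most once per viral sequence, which the description above does not specify.

```python
from collections import defaultdict


def count_viral_reads(hits):
    """Count distinct reads aligned to each viral sequence.

    `hits` is an iterable of (read_id, virus) pairs (hypothetical input format).
    Assumption: a read with several alignments to the same virus is counted
    once; a read aligning to two different viruses is counted for both.
    """
    reads_per_virus = defaultdict(set)
    for read_id, virus in hits:
        reads_per_virus[virus].add(read_id)
    return {virus: len(reads) for virus, reads in reads_per_virus.items()}


# Toy alignments; the virus labels stand in for RefSeq accessions.
hits = [
    ("read1", "HPV16"),
    ("read1", "HPV16"),  # second alignment of the same read
    ("read2", "HPV16"),
    ("read3", "HBV"),
]
print(count_viral_reads(hits))  # {'HPV16': 2, 'HBV': 1}
```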

Land Data Requirements

The Virus View is available for several Lands in OncoLand, including TCGA, CCLE, and GTEx. Only samples with RNA-seq data are included.

Features of the Virus Variable View

The Virus View generates one plot per viral sequence. In each plot, samples are grouped by the selected metadata column (for example, Tumor Type), and the normalized number of RNA-seq reads matching the virus is plotted for each sample.

Tip: Viral reads are normalized in each sample as 1,000,000 × (viral read count) / (reads mapping to the human genome), that is, viral reads per million human-mapped reads.
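As a quick check of the normalization in the tip above, the sketch below applies the formula to one sample. The function name is illustrative only and is not part of Array Suite.

```python
def normalize_viral_count(viral_reads, human_mapped_reads):
    """Viral reads per million reads mapping to the human genome."""
    if human_mapped_reads <= 0:
        raise ValueError("human_mapped_reads must be positive")
    return 1_000_000 * viral_reads / human_mapped_reads


# 150 viral reads in a sample with 50 million human-mapped reads
print(normalize_viral_count(150, 50_000_000))  # 3.0
```

Because the value is scaled per million human-mapped reads, samples sequenced to different depths can be compared on the same axis.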

Samples are colored by a secondary grouping, such as Sample Type.
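The primary grouping (plot categories) and secondary grouping (colors) can be thought of as a two-level nesting of the per-sample values. The sketch below shows that structure for one virus; the record layout and the toy values are hypothetical and do not reflect how the Land stores its data.

```python
from collections import defaultdict


def group_samples(samples, primary, secondary, virus):
    """Return {primary value: {secondary value: [normalized counts]}} for one virus.

    Samples with no reads for the virus contribute 0.0.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for sample in samples:
        value = sample["counts"].get(virus, 0.0)
        groups[sample[primary]][sample[secondary]].append(value)
    return {key: dict(sub) for key, sub in groups.items()}


# Toy sample records (not real data).
samples = [
    {"id": "S1", "Tumor Type": "LIHC", "Sample Type": "Tumor", "counts": {"HBV": 12.5}},
    {"id": "S2", "Tumor Type": "LIHC", "Sample Type": "Normal", "counts": {"HBV": 0.4}},
    {"id": "S3", "Tumor Type": "CESC", "Sample Type": "Tumor", "counts": {"HPV16": 30.1}},
]
print(group_samples(samples, "Tumor Type", "Sample Type", "HBV"))
# {'LIHC': {'Tumor': [12.5], 'Normal': [0.4]}, 'CESC': {'Tumor': [0.0]}}
```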

In addition to filtering by sample metadata, users can filter the displayed viral sequences under the Virus filter tab:

[Figure: Virus filter tab]

The user can also view a list of the viral sequences passing the filter settings by clicking View Filtered Table:

[Figure: Table of viral sequences passing the filter]
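Conceptually, the filtered table keeps only the viral sequences that satisfy every filter setting. The sketch below assumes two hypothetical criteria, a case-insensitive name match and a minimum value for the highest normalized count across samples (`max_count`); the actual filter options on the Virus tab may differ.

```python
def filter_viruses(table, name_contains="", min_max_count=0.0):
    """Keep viral sequences whose name matches and whose maximum count passes.

    Both criteria are hypothetical stand-ins for the Virus filter tab settings.
    """
    needle = name_contains.lower()
    return [
        row for row in table
        if needle in row["name"].lower() and row["max_count"] >= min_max_count
    ]


# Toy table rows (not real data).
table = [
    {"name": "Hepatitis B virus", "max_count": 850.0},
    {"name": "Hepatitis C virus", "max_count": 2.0},
    {"name": "Human papillomavirus type 16", "max_count": 1200.0},
]
print([row["name"] for row in filter_viruses(table, "hepatitis", 10.0)])
# ['Hepatitis B virus']
```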

Use Case

Researchers may be interested in how gene expression is related to viral load. For example, how are PD1 (PDCD1) levels affected by hepatitis in liver cancers? This section describes a basic workflow to integrate expression and viral counts:

Query gene expression (OmicData query)

Users can quickly query a gene and return its expression value in each sample:

[Figure: OmicData query for PD1 (PDCD1) expression]

Apply Filters

Users can filter the view to the virus of interest. Continuing the example above, here we have filtered to hepatitis, and we see that liver cancers have a high incidence of hepatitis virus reads:
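One way to put a number on "high incidence" is the fraction of samples in each tumor type whose normalized count exceeds a cutoff. The sketch below is an illustration only: the threshold of 1.0 and the record layout are assumptions, not values used by OmicSoft.

```python
from collections import defaultdict


def positive_fraction_by_group(samples, group_key, virus, threshold):
    """Fraction of samples per group with a normalized count above `threshold`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for sample in samples:
        group = sample[group_key]
        totals[group] += 1
        if sample["counts"].get(virus, 0.0) > threshold:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Toy sample records (not real data).
samples = [
    {"Tumor Type": "LIHC", "counts": {"HBV": 40.0}},
    {"Tumor Type": "LIHC", "counts": {"HBV": 0.0}},
    {"Tumor Type": "STAD", "counts": {}},
    {"Tumor Type": "STAD", "counts": {"HBV": 0.2}},
]
print(positive_fraction_by_group(samples, "Tumor Type", "HBV", threshold=1.0))
# {'LIHC': 0.5, 'STAD': 0.0}
```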


Plot sample viral counts by gene expression

The custom query generated by the OmicData query can be used in the Virus View to regroup the samples on the y-axis by gene expression (after filtering to Tumor Type LIHC). Here, PDCD1 expression does not appear to depend on hepatitis B viral counts:
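Outside the GUI, the same idea can be sketched by joining the per-sample expression values with the per-sample viral counts, splitting samples into expression bins, and comparing the viral counts in each bin. The function name, the equal-size binning, and the toy values below are all hypothetical; this is not how the Virus View computes its regrouping.

```python
from statistics import median


def viral_counts_by_expression_bin(expression, viral_counts, n_bins=2):
    """Sort shared samples by expression, split them into n_bins equal-size
    bins, and return the viral counts in each bin (bin1 = lowest expression)."""
    shared = sorted(set(expression) & set(viral_counts),
                    key=lambda s: (expression[s], s))  # ties broken by sample ID
    size = len(shared)
    bins = {}
    for i in range(n_bins):
        members = shared[i * size // n_bins:(i + 1) * size // n_bins]
        bins[f"bin{i + 1}"] = [viral_counts[s] for s in members]
    return bins


# Toy per-sample values for LIHC samples (not real data).
pdcd1 = {"S1": 0.5, "S2": 3.2, "S3": 1.1, "S4": 7.8}
hbv = {"S1": 20.0, "S2": 0.0, "S3": 15.0, "S4": 18.0}
bins = viral_counts_by_expression_bin(pdcd1, hbv)
print({name: median(counts) for name, counts in bins.items() if counts})
# {'bin1': 17.5, 'bin2': 9.0}
```

Similar medians across bins would be consistent with the observation that expression does not track viral counts; a formal test would still be needed.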


Create Sample Sets of high vs low viral counts

Alternatively, users may want to take an unbiased approach to identify gene expression changes that correlate with high viral counts. One way to do this is to perform a Sample Grouping to Expression analysis. Start by selecting the samples with high viral counts and saving them as a custom sample set:

[Figure: Selecting samples with high viral counts for grouping]

The new sample set includes a metadata column corresponding to this selection:

[Figure: New sample set with the selection metadata column]
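The selection step amounts to labeling each sample by whether its viral count passes a cutoff. In the sketch below, the cutoff, the column name `Selection`, and the `Selected`/`Unselected` labels are hypothetical; the column that Array Suite actually writes may be named and valued differently.

```python
def add_selection_column(samples, virus, threshold, column="Selection"):
    """Return copies of sample records with a column marking high-count samples."""
    labeled = []
    for sample in samples:
        record = dict(sample)  # shallow copy so the input records are not modified
        is_high = sample["counts"].get(virus, 0.0) >= threshold
        record[column] = "Selected" if is_high else "Unselected"
        labeled.append(record)
    return labeled


# Toy sample records (not real data).
samples = [
    {"id": "S1", "counts": {"HBV": 40.0}},
    {"id": "S2", "counts": {"HBV": 0.5}},
    {"id": "S3", "counts": {"HBV": 12.0}},
]
for record in add_selection_column(samples, "HBV", threshold=10.0):
    print(record["id"], record["Selection"])
```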

Users can then run a Sample Grouping to Expression analysis on this grouping, which performs a Kruskal-Wallis test to find genes differentially expressed between the groups defined by the selection, that is, genes whose expression is associated with viral counts.
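To show what the test computes, here is a from-scratch sketch of the Kruskal-Wallis H statistic (with the standard tie correction) for the two-group case, applied gene by gene. It is not Array Suite's implementation. It is limited to two groups so the p-value can use the chi-square distribution with one degree of freedom, whose survival function equals `erfc(sqrt(H / 2))`. The helper names and toy data are hypothetical.

```python
import math
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Tie-corrected Kruskal-Wallis H and its p-value for two groups (df = 1)."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sample")
    values = list(chain(group_a, group_b))
    n = len(values)
    ranks = average_ranks(values)
    sum_a = sum(ranks[:len(group_a)])
    sum_b = sum(ranks[len(group_a):])
    h = 12.0 / (n * (n + 1)) * (sum_a ** 2 / len(group_a) + sum_b ** 2 / len(group_b)) - 3 * (n + 1)
    tie_sizes = {}
    for v in values:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    return h, math.erfc(math.sqrt(h / 2))


def run_kruskal_per_gene(expression, selected, unselected):
    """expression maps gene -> {sample ID: value}; returns gene -> (H, p)."""
    results = {}
    for gene, by_sample in expression.items():
        a = [by_sample[s] for s in selected]
        b = [by_sample[s] for s in unselected]
        results[gene] = kruskal_wallis_two_groups(a, b)
    return results


# Toy expression values (not real data).
expression = {"GENE_A": {"S1": 5.0, "S2": 6.0, "S3": 7.0, "S4": 1.0, "S5": 2.0, "S6": 3.0}}
print(run_kruskal_per_gene(expression, ["S1", "S2", "S3"], ["S4", "S5", "S6"]))
```

With thousands of genes tested, the resulting p-values would normally be adjusted for multiple testing before interpreting them.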

Related Articles