ServerDataMatrix.Virus.Variable
View viral sequence counts in Land RNA-seq Data
Overview
As part of the Land RNA-seq pipeline, OmicSoft has quantified the number of reads mapping to thousands of viruses in each sample. These data can be visualized in several Lands with the Virus View.
Virus genomes were taken from NCBI reference sequences. Unmapped reads from the RNA-Seq BAM files were aligned to these virus genomes, and the total number of reads aligned to each virus sequence was counted. The read counts were then normalized by the total mapped reads of each sample. Rapid Identification of Non-human Sequences (RINS) was used to identify the viral sequences.
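The normalization step described above can be sketched in a few lines of Python. This is only an illustration, not the OmicSoft pipeline code: the function name, the virus labels, the read counts, and the per-million scaling factor are all assumptions (the text states only that counts are divided by the total mapped reads of the sample).

```python
def normalize_virus_counts(virus_counts, total_mapped_reads, scale=1_000_000):
    """Divide each virus read count by the sample's total mapped reads.

    `scale` is an assumed convenience factor (reads per million mapped reads);
    the pipeline's actual scaling is not specified in this page.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {
        virus: count * scale / total_mapped_reads
        for virus, count in virus_counts.items()
    }


# Hypothetical raw counts for one sample (illustrative values only).
sample_counts = {"HBV": 150, "HPV16": 0}
normalized = normalize_virus_counts(sample_counts, total_mapped_reads=50_000_000)
print(normalized)
```

Normalizing by sequencing depth in this way makes counts comparable between samples that were sequenced to different depths.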
Land Data Requirements
The Virus View is available for several Lands in OncoLand, including TCGA, CCLE, and GTEx. Only samples with RNA-seq data will be included.
Features of the Virus Variable View
For each viral sequence, one plot will be generated. In each plot, samples will be grouped by the selected metadata column, such as Tumor Type, and the number of RNA-seq reads matching the virus will be plotted.

Samples will be colored by secondary grouping, such as Sample Type.
In addition to filtering by sample metadata, the user may filter the displayed viral species under the Virus filter tab:
The user can also view a list of the viral sequences passing the filter settings by clicking View Filtered Table:
Use Case
Researchers may be interested in how gene expression is related to viral load. For example, how are PD1 (PDCD1) levels affected by hepatitis in liver cancers? This section describes a basic workflow to integrate expression and viral counts:
Query gene expression (OmicData query)
Users can quickly query the expression of a gene and return the expression value for each sample:
Apply Filters
Users can filter the view to the desired virus. Continuing the example above, here the view is filtered to hepatitis, and we see that liver cancers have a high incidence of hepatitis:
Plot sample viral counts by gene expression
The custom query generated by the OmicData query can be used in the Virus View to regroup the samples on the y-axis (after filtering to Tumor Type LIHC). Here we see that PDCD1 expression does not appear to depend on hepatitis B viral counts:
Create Sample Sets of high vs low viral counts
Alternatively, users may want to take an unbiased approach to identify gene expression changes that correlate with high viral counts. One way to do this is to perform a Sample Grouping to Expression analysis. First, select the samples with high virus counts and create a custom sample set from the selection:
The new sample set will include a metadata column corresponding to this selection:
With this sample set, users can run a Sample Grouping to Expression analysis, which applies a Kruskal-Wallis test to find genes that are differentially expressed between the sample groups and are therefore associated with virus counts.
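To make the statistics behind this workflow concrete, the following is a minimal pure-Python sketch of a Kruskal-Wallis test for a single gene, comparing samples with high and low viral counts. It is not the implementation used by the Land software: the sample IDs, counts, expression values, and the threshold of 1.0 are hypothetical, and the p-value uses the chi-square approximation, which in this closed form holds only for two groups.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical normalized viral counts and expression of one gene per sample.
virus_counts = {"S1": 0.0, "S2": 0.1, "S3": 12.5, "S4": 30.2, "S5": 0.0, "S6": 18.9}
expression = {"S1": 2.1, "S2": 1.8, "S3": 5.4, "S4": 6.0, "S5": 2.5, "S6": 4.9}

threshold = 1.0  # assumed cutoff separating high from low viral counts
high = [expression[s] for s, c in virus_counts.items() if c >= threshold]
low = [expression[s] for s, c in virus_counts.items() if c < threshold]

h = kruskal_wallis_h([high, low])
# With two groups, H is approximately chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(h / 2))
print(f"H = {h:.3f}, p = {p_value:.3f}")
```

In a real analysis the test would be repeated for every gene and the p-values adjusted for multiple testing; with only three samples per group, as in this toy example, the chi-square approximation is rough.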