Sum of RPKM_Transcript values for a gene. Used to stabilize values by gene in the calculation of RPKM.
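The per-gene sum can be sketched as below. This assumes the transcript-level values are held in a dict keyed by transcript ID, alongside a transcript-to-gene mapping. The function name and IDs are hypothetical and are not part of the Lands/OncoLand tooling.

```python
from collections import defaultdict

def gene_rpkm_sums(transcript_rpkm, transcript_to_gene):
    """Sum RPKM_Transcript values per gene (hypothetical helper).

    transcript_rpkm: dict mapping transcript ID -> RPKM_Transcript value
    transcript_to_gene: dict mapping transcript ID -> gene ID
    """
    sums = defaultdict(float)
    for tx, value in transcript_rpkm.items():
        # Accumulate each transcript's value under its parent gene
        sums[transcript_to_gene[tx]] += value
    return dict(sums)

tx_rpkm = {"TX1": 2.5, "TX2": 1.5, "TX3": 4.0}
tx_gene = {"TX1": "GENE_A", "TX2": "GENE_A", "TX3": "GENE_B"}
print(gene_rpkm_sums(tx_rpkm, tx_gene))  # {'GENE_A': 4.0, 'GENE_B': 4.0}
```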
Calculated RPKM value for each exon or exon junction for a particular sample
This option DOES use the EM algorithm.
RPKM/FPKM unit of transcript expression
Reads Per Kilobase of transcript, per Million mapped reads (RPKM) is a normalized unit of transcript expression. It scales by transcript length to compensate for the fact that most RNA-seq protocols generate more sequencing reads from longer RNA molecules. Fragments Per Kilobase of transcript, per Million mapped reads (FPKM) is used with paired-end RNA-seq data. In paired-end sequencing, the two reads of a pair come from a single cDNA fragment, so FPKM counts that fragment once instead of counting each read separately.
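The difference between the RPKM and FPKM numerators can be shown with a toy example. The alignment tuples and function names are hypothetical. Real pipelines must also deal with mates that align to different transcripts and with multi-mapping reads, which this sketch ignores.

```python
# Hypothetical paired-end alignments: (read-pair name, mate number, transcript ID)
alignments = [
    ("frag1", 1, "TX1"), ("frag1", 2, "TX1"),
    ("frag2", 1, "TX1"), ("frag2", 2, "TX1"),
    ("frag3", 1, "TX2"),  # only one mate of frag3 aligned
]

def read_count(alns, tx):
    """RPKM-style numerator: every aligned mate counts as one read."""
    return sum(1 for _, _, t in alns if t == tx)

def fragment_count(alns, tx):
    """FPKM-style numerator: both mates of a pair count as one fragment."""
    return len({name for name, _, t in alns if t == tx})

print(read_count(alignments, "TX1"), fragment_count(alignments, "TX1"))  # 4 2
print(read_count(alignments, "TX2"), fragment_count(alignments, "TX2"))  # 1 1
```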
In contrast to Transcripts Per Million (TPM), RPKM/FPKM values within a sample do not always sum to 1,000,000. As a result, a transcript's or gene's RPKM/FPKM expression level can be affected by the average length of the transcripts expressed in that sample. To compensate for this, RPKM values can be either rescaled to TPM or quantile-scaled.
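One common way to rescale RPKM to TPM is to divide each value by the sample's total RPKM and multiply by 1,000,000. This is a sketch of that standard conversion, not necessarily the exact procedure used by this platform. The function name is hypothetical.

```python
def rpkm_to_tpm(rpkm):
    """Rescale one sample's RPKM/FPKM values so they sum to 1,000,000 (TPM)."""
    total = sum(rpkm.values())
    # Multiply before dividing to keep the toy arithmetic exact
    return {tx: value * 1_000_000 / total for tx, value in rpkm.items()}

sample = {"TX1": 10.0, "TX2": 30.0, "TX3": 60.0}
print(rpkm_to_tpm(sample))  # {'TX1': 100000.0, 'TX2': 300000.0, 'TX3': 600000.0}
```

Because every sample's TPM values sum to the same constant, the relative proportions become comparable between samples regardless of the average transcript length each sample expresses.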
RNAseq QC Coverage Metrics
RPKM = number of reads in the gene × (1000 / exonic gene length in bp) × (1,000,000 / Effective Alignment Count)
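The formula can be written directly as a function. The parameter names are illustrative. The Effective Alignment Count is taken as an input because its exact definition is not given here.

```python
def rpkm(gene_reads, exonic_length_bp, effective_alignment_count):
    """RPKM = reads * (1000 / exonic gene length) * (1,000,000 / Effective Alignment Count)."""
    return gene_reads * (1_000 / exonic_length_bp) * (1_000_000 / effective_alignment_count)

# 500 reads on a 2,500 bp exonic model in a library with 20 million effective alignments
print(rpkm(500, 2_500, 20_000_000))  # 10.0
```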
RPKM data in Lands
Our RNA-seq processing pipeline performs normalization only within each individual sample (as you mentioned, by taking sample library size into account through the totalCount and mappedReadCount parameters). We do not normalize across samples, because doing so would prevent you from comparing samples across consortia (e.g. breast cancer samples from TCGA and OncoGEO, or TCGA samples and your own internal samples processed with our sample processing pipelines). This is of course a trade-off, as there are almost certainly batch-like effects within each consortium (although for a consortium like TCGA, samples were generated at multiple sequencing centers over several years, so there may not be a clear batch effect). However, we placed the highest value on every sample in OncoLand being uniformly processed.