Genome Toolbox: RNA-seq: RPKM, FPKM, Formulas, and Scripts

Wednesday, July 16, 2014

RNA-seq: RPKM, FPKM, Formulas, and Scripts

Reads per kilobase per million mapped reads (RPKM) is a common metric for quantifying the expression of a gene transcript in sequencing data from RNA-seq experiments. As a measure of read density, RPKM reflects the molar concentration of a transcript in the starting sample by normalizing for RNA length and for the total number of reads. This normalization facilitates transparent comparison of transcript levels both within and between samples. The formula for RPKM is as follows:

$$ RPKM = \frac{10^9 \times ER}{EL \times MR} $$

where ER is the number of reads mapped to the gene's exons, EL is the summed length of those exons in base pairs, and MR is the total number of mapped reads in the sample.
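As a quick illustration, the calculation can be sketched in Python, assuming the usual form RPKM = 10^9 × ER / (EL × MR). The function name and the example gene (500 reads, 2,000 bp of exon, 10 million mapped reads) are made up for demonstration and do not come from any particular tool.

```python
def rpkm(exon_reads, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    exon_reads          -- ER, reads mapped to the gene's exons
    exon_length_bp      -- EL, summed exon length in base pairs
    total_mapped_reads  -- MR, total mapped reads in the sample
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 10^9 = 10^3 (bases per kilobase) * 10^6 (reads per million)
    return 1e9 * exon_reads / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads over 2,000 bp of exon, 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)
```

The factor of 10^9 simply combines the per-kilobase and per-million scalings, so the same gene in a library twice as deep, with twice the reads, gives the same value.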

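The number of transcript copies (TC) can be derived from RPKM as well. Essentially, the fraction of mapped reads that fall in a gene equals the fraction of transcriptome bases contributed by that gene's transcripts:

$$ \frac{ER}{MR} = \frac{TC \times EL}{TL} $$

where TL is the length of the transcriptome in base pairs. This can then be rearranged and RPKM can be substituted in as follows:

$$ TC = \frac{ER \times TL}{EL \times MR} = \frac{RPKM \times TL}{10^9} $$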

The difficult part is getting an estimate of TL. TL can be estimated from spike-in data, or it can be derived from the starting amount of mRNA if you are willing to assume 100% efficiency of cDNA synthesis. A great paper on RPKM is Mortazavi et al. (Nature Methods 2008).
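A minimal sketch of the copy-number conversion, assuming reads are sampled in proportion to transcript copies times length, so that TC = RPKM × TL / 10^9. The function name and the transcriptome length used here (4 × 10^8 bp) are purely illustrative; a real TL would have to come from spike-ins or an mRNA mass estimate as described above.

```python
def transcript_copies(rpkm_value, transcriptome_length_bp):
    """Estimate transcript copies (TC) from RPKM and transcriptome length (TL).

    Assumes TC = RPKM * TL / 10^9, i.e. reads are drawn uniformly
    from all transcript bases in the sample.
    """
    if transcriptome_length_bp <= 0:
        raise ValueError("transcriptome length must be positive")
    return rpkm_value * transcriptome_length_bp / 1e9


# Hypothetical values: RPKM of 25 and an assumed TL of 4e8 bp.
copies = transcript_copies(25.0, 4e8)
print(copies)
```

Because TC scales linearly with TL, any error in the TL estimate carries over proportionally to the copy number.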

Fragments per kilobase per million mapped fragments (FPKM) is analogous to RPKM. The only difference is that the abundance of gene transcripts is estimated from the number of fragments observed rather than the number of reads. In paired-end RNA-seq experiments (e.g., Illumina sequencing), each fragment is sequenced from both ends, providing two reads per fragment. RPKM therefore counts individual reads (single-end data), while FPKM counts fragments, each represented by two reads (paired-end data). A common misconception is that RPKM values are twice FPKM values. That is untrue, since FPKM is fragments per kilobase per million mapped fragments, not fragments per kilobase per million mapped reads: both the count for the gene and the library size are expressed in fragments. RPKM is approximately equal to FPKM.
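The point about the factor of two can be checked with a small Python sketch. It assumes an idealized paired-end library in which every fragment yields exactly two mapped reads; the counts and the helper name are invented for illustration.

```python
def per_kilobase_per_million(count, length_bp, total_count):
    """Shared formula: 10^9 * count / (length * total).

    With fragment counts this gives FPKM; with read counts, RPKM.
    """
    return 1e9 * count / (length_bp * total_count)


# Hypothetical paired-end library, counted in fragments.
gene_fragments = 500
total_fragments = 10_000_000
exon_length_bp = 2000

fpkm = per_kilobase_per_million(gene_fragments, exon_length_bp, total_fragments)

# Idealized assumption: each fragment contributes two mapped reads,
# so both the gene count and the library size double.
rpkm_from_reads = per_kilobase_per_million(
    2 * gene_fragments, exon_length_bp, 2 * total_fragments
)

print(fpkm, rpkm_from_reads)
```

The doubling appears in both the numerator and the denominator and cancels. In real data the two values differ slightly because some fragments have only one mapped mate.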

Here are links to some programs and scripts that are useful:

cufflinks

rpkmforgenes.py
