Using a pool of 1,062 human cDNA plasmids transcribed in vitro (IVT), a group from the University of Pennsylvania sought to characterize coverage biases in RNA-seq experiments. Their paper, titled "IVT-seq reveals extreme bias in RNA-sequencing", was published last week.
The authors cleverly use a carefully controlled set of IVT cDNA clones whose base composition and expression levels are known. After mixing the IVT set with mouse total RNA, they found greater than 2-fold differences in coverage for 50% of their transcripts, with 10% showing up to 10-fold differences. When the IVT clones are sequenced alone, in the absence of a complex genomic milieu, biases still arise from random priming, adapter ligation, and amplification. The authors acknowledge these, but identify polyA selection and rRNA depletion as the main causes of RNA coverage bias. In their analysis, they consider hexamer entropy, GC content, and sequence similarity to rRNA as sequence features, and they use coverage variability as an indicator of coverage bias, alongside depth of coverage as measured by FPKM. They show that transcripts more similar to rRNA differ significantly more in coverage between libraries that undergo rRNA depletion and those that do not.
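To make the metrics above concrete, here is a minimal Python sketch of how each could be computed for a single transcript. It assumes common textbook definitions, which may differ from the ones used in the paper: GC content as the fraction of G and C bases, hexamer entropy as the Shannon entropy of the 6-mer frequency distribution, coverage variability as the coefficient of variation of per-base depth, and FPKM as fragments per kilobase of transcript per million mapped fragments. All function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def hexamer_entropy(seq: str, k: int = 6) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution.

    Assumed definition: low values mean a few hexamers dominate the
    sequence, high values mean hexamer usage is diverse.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def coverage_cv(depths: list[float]) -> float:
    """Coefficient of variation (std / mean) of per-base read depth.

    Used here as a stand-in for "coverage variability": 0 means perfectly
    even coverage, larger values mean more uneven coverage.
    """
    if not depths:
        return 0.0
    mean = sum(depths) / len(depths)
    if mean == 0:
        return 0.0
    variance = sum((d - mean) ** 2 for d in depths) / len(depths)
    return math.sqrt(variance) / mean


def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    transcript = "ACGTACGGCCTTAGGCATCGATCGGATCCA"
    per_base_depth = [12, 15, 40, 38, 9, 3, 22, 25]
    print("GC content:", gc_content(transcript))
    print("Hexamer entropy (bits):", hexamer_entropy(transcript))
    print("Coverage CV:", coverage_cv(per_base_depth))
    print("FPKM:", fpkm(fragments=100, transcript_length_bp=1000,
                        total_mapped_fragments=1_000_000))
```

With a known input pool such as an IVT set, a high coverage CV for a transcript cannot come from biological variation in expression, which is what makes it a usable indicator of library preparation bias.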
Overall, their method demonstrates that library preparation introduces significant biases into RNA-seq data, and that carefully controlled synthetic test transcripts allow users to measure this bias accurately. Developing such controlled sets will allow further refinement of current library preparation practices.