rRNA Depletion / Poly-A Selection Responsible for Coverage Bias in RNA-seq

Using RNAs in vitro transcribed (IVT) from a pool of 1,062 human cDNA plasmids, a group from the University of Pennsylvania sought to characterize coverage biases in RNA-seq experiments. Their paper, "IVT-seq reveals extreme bias in RNA-sequencing," was published last week.

The authors use a carefully controlled set of IVT cDNA clones whose base composition and expression levels are known. Mixing the IVT set with mouse total RNA, they found greater than 2-fold differences in coverage in 50% of their transcripts, with 10% showing up to 10-fold differences. From sequencing the IVT transcripts alone, without the complex background of total RNA, the authors acknowledge biases that arise from random priming, adapter ligation, and amplification, but they identify poly-A selection and rRNA depletion as the main causes of coverage bias. For each transcript they consider hexamer entropy, GC content, and sequence similarity to rRNA, and they use coverage variability as an indicator of coverage bias alongside depth of coverage measured as FPKM. They demonstrate a significant correlation between a transcript's similarity to rRNA and the size of the coverage difference between libraries that undergo rRNA depletion and those that do not.
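To make the metrics mentioned above concrete, here is a minimal Python sketch of how each could be computed for a single transcript. These are textbook definitions, not necessarily the exact formulas used in the paper: the function names are hypothetical, hexamer entropy is assumed to mean the Shannon entropy of the hexamer frequency distribution, and coverage variability is assumed to be the coefficient of variation of per-base depth.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def hexamer_entropy(seq, k=6):
    """Shannon entropy (bits) of overlapping k-mer frequencies (assumed definition)."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def coverage_cv(depths):
    """Coefficient of variation of per-base depth: a simple variability measure."""
    mean = sum(depths) / len(depths)
    if mean == 0:
        return 0.0
    variance = sum((d - mean) ** 2 for d in depths) / len(depths)
    return math.sqrt(variance) / mean


def fold_difference(depths):
    """Ratio of highest to lowest per-base depth within one transcript."""
    lowest, highest = min(depths), max(depths)
    return highest / lowest if lowest > 0 else float("inf")


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
```

A perfectly uniform transcript would give `coverage_cv` of 0 and `fold_difference` of 1; the 2-fold and 10-fold figures quoted above correspond to `fold_difference` values of 2 and 10 under this assumed definition.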

Overall, their method demonstrates that library preparation introduces significant biases into RNA-seq data, and that carefully controlled synthetic test transcripts allow users to measure this bias accurately. Developing more of these controlled sets will allow further refinement of current library preparation practices.

 

4 thoughts on “rRNA Depletion / Poly-A Selection Responsible for Coverage Bias in RNA-seq”

  1. So people have been doing RNA-Seq for years, but nobody has developed anything like a proper control for representational biases until now?

  2. Pingback: Interesting paper on RNA-seq bias | Genome Science Core at Penn State Hershey

  3. This is really interesting (and not surprising) data. For stranded RNA-Seq applications using total RNA, we have been using the rRNA depletion method used in NuGEN’s kits. This workflow differs from Ribo-Zero in that the RNA is not touched, but rather rRNA depletion takes place by targeting the adaptor-ligated library. We like the method because it is highly reproducible, and it can also be used to deplete other transcripts in addition to rRNA, such as globin, housekeeping genes, and structural protein transcripts. The method is flexible, and we have applied it to other species and transcript types. This was a big step forward towards standardization for us when dealing with this exact issue.

  4. Hi Andrew,

    Regarding your point: “This workflow differs from Ribo-Zero in that the RNA is not touched, but rather rRNA depletion takes place by targeting the adaptor ligated library.”

    You are not saying that the NuGEN kit gets rid of the bias described in the paper, right?

    You would still expect the extreme bias from the ‘depleting’ step seen when purifying RNA to also occur when depleting at the adapter-ligated library stage? Your point is that the NuGEN kit is better in that you can design what you want to pull out (as opposed to Ribo-Zero) and it is reproducible, right?

    Adding the treatment you describe, where selection takes place on the ligated library, would be a nice addition to the paper to clarify whether it makes a difference. Meaning, could the different secondary structures of RNA vs DNA make a difference in the sequencing bias?

    Did you ever test the RiboMinus kit? BTW, I am using Ribo-Zero for plants now. Any other suggestions are welcome 🙂
