Library preparation for next-generation sequencing involves a coordinated series of enzymatic reactions that produce a random, representative collection of adapter-modified DNA fragments within a specific size range. The success of next-generation sequencing is contingent on proper processing of the DNA or RNA sample. However, as you go from isolating nucleic acid to finally interpreting your sequencing data, assumptions and bias-prone steps can quickly reduce the value of your work. This is especially true during library preparation.
Where are biases introduced?
The first step in most library preparation protocols is shearing your DNA or RNA into fragments compatible with today’s short-read NGS instruments. Whether you fragment your nucleic acid using a high-divalent-cation buffer, a nuclease, a transposase, or an acoustic or mechanical method, you are invariably cutting at “preferred” positions, resulting in fragments with non-random ends.

In DNA-Seq and ChIP-Seq library prep procedures, the next step is end repair, in which polymerases (or polymerase fragments) fill in 5’ overhangs and remove 3’ overhangs to produce blunt ends. If you’re starting with variably fragmented ends, this process won’t occur with the same efficiency for all of your fragments, and neither will adenylation of those ends (commonly performed after end repair).

The next step is typically ligation of an adapter compatible with the instrument you plan to use. Unfortunately, few studies have examined bias in DNA adapter ligation. However, if you’re keeping up to date with papers coming out on RNA adapter ligation bias, there is reason to be concerned.

To complete your library you typically have to amplify it, which can favor particular fragments and lead to uneven fragment representation in your data. And that’s just polymerase bias in PCR; we haven’t even mentioned the polymerase biases on board the sequencing platform.
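One simple way to check for the non-random fragment ends described above is to tabulate base composition at the first few positions of your reads: random fragmentation should give roughly equal A/C/G/T frequencies at every position, while sequence-preferred cut sites show up as strong skews near position one. Below is a minimal sketch of that check in plain Python, assuming reads are available as simple strings (the function name `end_base_composition` is ours, not from any particular tool; real QC software such as FastQC reports a similar per-base-content plot).

```python
from collections import Counter

def end_base_composition(reads, n_positions=5):
    """Per-position base frequencies over the first n_positions of each read.

    Truly random fragmentation gives ~25% A/C/G/T at every position;
    a strong skew at positions 1-2 suggests preferred cut sites.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero on empty input
        freqs.append({b: c.get(b, 0) / total for b in "ACGT"})
    return freqs

# Toy example: every read starts with 'A', a red flag for biased cutting.
profile = end_base_composition(["ACGT", "AGGT", "ATGT"], n_positions=4)
```

In practice you would feed this millions of reads parsed from a FASTQ file and compare the observed profile against the genome-wide base composition.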
All RNA-Seq library prep procedures rely on reverse transcriptases, which often have trouble reading through regions of RNA with secondary structure or particular base compositions. The reverse transcription primer you choose can also bias your results. Add second-strand synthesis and all of the steps described above for DNA-Seq, and you have what is potentially the most bias-prone library application of all.
A recent review nicely highlights some of these biases in more detail: “Library preparation methods for next-generation sequencing: Tone down the bias”.
It’s not all doom and gloom: there are several ways to reduce or measure these biases, for example by using randomized bases in small RNA library preparations, eliminating PCR, or using a polymerase that handles GC- and AT-rich regions more equally. Eliminating or reducing these biases will be the subject of a subsequent blog post.
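Measuring amplification bias can be as simple as profiling your fragments by GC content and comparing the distribution before and after PCR (or against the expected genome-wide distribution). The sketch below, in plain Python with fragments as strings, shows the idea; the function name `gc_bias_profile` and the choice of five bins are ours, for illustration only.

```python
def gc_bias_profile(fragments, n_bins=5):
    """Count fragments per GC-fraction bin (bin 0 = most AT-rich).

    A library amplified with a GC-intolerant polymerase will show
    depletion in the high-GC bins relative to the input material.
    """
    bins = [0] * n_bins
    for frag in fragments:
        if not frag:
            continue  # skip empty sequences
        gc = sum(frag.count(b) for b in "GC") / len(frag)
        idx = min(int(gc * n_bins), n_bins - 1)  # clamp gc == 1.0 into last bin
        bins[idx] += 1
    return bins

# Toy example: one AT-rich, one balanced, and one GC-rich fragment.
profile = gc_bias_profile(["AAAA", "ATGC", "GGGG"], n_bins=5)
```

A flat difference between pre- and post-amplification profiles suggests even representation; systematic depletion at the extremes points to polymerase preference.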
Consistency in library prep method
For many applications, as long as all your samples are treated identically, biases introduced during library preparation affect every sample equally and may be less of a concern. If you’re in the middle of a long project, we highly recommend using the same library prep method or kit for your whole study. Differences in method, enzymes, and reaction times between two different library prep kits can give you significantly different results. To understand differences in library preparation methodology, we have an NGS Library Prep Kit Guide that describes the kits providers use on Genohub.
If you’d like help navigating issues in library preparation bias, or would like to use our complimentary consultation service, contact us!