6 QC methods post library construction for NGS

After nucleic acid extraction and sample QC, the next step in the NGS workflow is library preparation. NGS libraries are prepared to meet the platform requirements with respect to size, purity, concentration and efficient ligation of adaptors. Assessing the quality of a sequencing library before committing it to a full-scale sequencing run ensures maximum sequencing efficiency, leading to accurate sequencing data with more even coverage.

In this blog post, we list the various ways to QC libraries in order of most stringent to least stringent.

1. qPCR

qPCR is a method of quantifying DNA based on PCR. qPCR tracks target concentration as a function of PCR cycle number to derive a quantitative estimate of the initial template concentration in a sample. As with conventional PCR, it uses a polymerase, dNTPs, and two primers designed to match sequences within a template. For the QC protocol, the primers match sequences within the adapters flanking a sequencing library.

Therefore, qPCR is an ideal method for measuring libraries in advance of generating clusters, because it measures only templates carrying adaptor sequences on both ends, which are the molecules that will subsequently form clusters on a flow cell. In addition, qPCR is a very sensitive method of measuring DNA, so dilute libraries with concentrations below the detection threshold of conventional spectrophotometric methods can still be quantified.

KAPA Biosystems' SYBR FAST Library Quantification Kit for Illumina Sequencing Platforms is commonly used for qPCR-based library QC. This kit measures the absolute number of molecules containing the Illumina adapter sequences, thus providing a highly accurate measurement of the amplifiable molecules available for cluster generation.
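Library concentrations are commonly reported in nM for cluster generation. As a minimal illustration (not part of any specific kit's protocol; the function name is our own), the standard conversion from a mass concentration to molarity for a dsDNA library is:

```python
def library_molarity_nM(conc_ng_per_ul, avg_size_bp, mw_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    660 g/mol is the average molecular weight of one double-stranded
    base pair; avg_size_bp comes from a separate electrophoretic
    sizing step (e.g., Bioanalyzer or TapeStation).
    """
    # ng/uL equals mg/L; dividing by g/mol gives mmol/L, i.e. 1e6 nM
    return conc_ng_per_ul * 1e6 / (mw_per_bp * avg_size_bp)
```

For example, a 350 bp library at 10 ng/µL works out to roughly 43 nM, which is then diluted to the loading concentration the sequencer requires.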

2. MiSeq

The MiSeq system uses the same library prep methods and proven sequencing-by-synthesis chemistry as the HiSeq system, making it ideal for analyzing prepared libraries prior to high-throughput sequencing. Performing library quality control (QC) on the MiSeq system before committing a library to a full-scale HiSeq run can save time and money while leading to better sequencing results.

Data generated by the MiSeq system is comparable to other Illumina next-generation sequencing platforms, ensuring a smooth transition from one instrument to another. Based on the individual experimental requirements, metrics obtained from performing simple QC can be used to streamline and improve your sequencing projects.

Using a single library prep method and taking only a single day, the MiSeq system can generate detailed QC parameters, including cluster density, library complexity, percent duplication, GC bias, and index representation. The MiSeq system supports paired-end (PE) sequencing for accurately assessing insert size. Library cluster density can also be determined and used to predict HiSeq cluster density, maximizing yield and reducing rework.

3. Fluorometric method

Quantifying DNA libraries using a fluorometric method, in which intercalating dyes bind specifically to DNA or RNA, is highly useful. This method is very precise because DNA dyes do not bind to RNA and vice versa.

The Invitrogen™ Qubit™ Fluorometer is a popular fluorometer that accurately measures DNA, RNA, and protein using the highly sensitive Invitrogen™ Qubit™ quantitation assays. The concentration of the target molecule in the sample is reported by a fluorescent dye that emits a signal only when bound to the target, which minimizes the effects of contaminants—including degraded DNA or RNA—on the result.

4. Automated electrophoresis

Several automated electrophoretic instruments are useful for estimating the size of NGS libraries. The Agilent 2100 Bioanalyzer system provides sizing, quantitation, and purity assessments for DNA, RNA, and protein samples. The Agilent 2200 TapeStation system is a reliable tape-based electrophoresis platform for accurate size selection of generated libraries. The PerkinElmer LabChip GX can be used for DNA and RNA quantitation and sizing using automated capillary electrophoresis separation. The Qiagen QIAxcel Advanced system fully automates sensitive, high-resolution capillary electrophoresis of up to 96 samples per run and can be used for library QC as well. All these instruments are accompanied by convenient analysis and data documentation software that makes the library QC step faster and easier.

5. UV-Visible Spectroscopy

A UV-Vis spectrophotometer can be used to measure nucleic acid libraries by their spectral absorbance and can differentiate between DNA, RNA and other absorbing contaminants. However, this method is not very accurate and should be paired with one of the other QC methods to ensure high-quality libraries. Several UV-Vis spectrophotometers are currently available, such as the Thermo Scientific™ NanoDrop™ UV-Vis spectrophotometer, the Qiagen QIAxpert system, and the Shimadzu BioSpec-nano.

6. Bead normalization

Bead normalization is the preferred QC method when fewer than 12 libraries are to be QCed; when library yields are below 15 nM, highly variable, or unpredictable; or when users are working with uncharacterized genomes and are inexperienced with the Nextera XT DNA Library Prep Kit protocol.

During bead-based normalization, DNA is bound to normalization beads and eluted off the beads at approximately the same concentration for each sample. Bead-based normalization enables scientists to bypass time-consuming library quantitation measurements and manual pipetting steps before loading libraries onto the sequencer. Bead-based normalization can provide significant cost and time savings for researchers processing many samples, or for researchers without access to any of the QC instruments listed in the above methods.


Top 3 Sample QC steps prior to library preparation for NGS

Before beginning library preparation for next-generation sequencing, it is highly recommended to perform sample quality control (QC) to check the nucleic acid quantity, purity and integrity. The starting material for NGS library construction might be any type of nucleic acid that is or can be converted into double-stranded DNA (dsDNA). These materials, often gDNA, RNA, PCR amplicons, and ChIP samples, must have high purity and integrity and sufficient concentration for the sequencing reaction.

1. Nucleic Acid Quantification

Measuring the concentration of nucleic acid samples is a key QC step to determine the suitability and amount of nucleic acid available for further processing.

  • Absorbance Method:

A UV-Vis spectrophotometer can be used to analyze spectral absorbance to measure the whole nucleic acid profile and can differentiate between DNA, RNA and other absorbing contaminants. Different molecules such as nucleic acids, proteins, and chemical contaminants each absorb light in their own pattern. By measuring the amount of light absorbed at a defined wavelength, the concentration of the molecules of interest can be calculated. Most laboratories are equipped with a UV-Vis spectrophotometer to quantify nucleic acids or proteins for their day-to-day experiments. Customers can choose from several spectrophotometers currently available, such as the Thermo Scientific™ NanoDrop™ UV-Vis spectrophotometer, the Qiagen QIAxpert system, and the Shimadzu BioSpec-nano.
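As a rough sketch of the underlying arithmetic, the concentration is obtained from the A260 reading with the widely used per-unit conversion factors (the function and its defaults are illustrative, not any instrument's firmware):

```python
# Widely used conversion factors: one A260 unit at a 1 cm path length
# corresponds to ~50 ng/uL dsDNA, ~40 ng/uL RNA, ~33 ng/uL ssDNA.
A260_FACTORS = {"dsDNA": 50.0, "RNA": 40.0, "ssDNA": 33.0}

def concentration_ng_per_ul(a260, sample_type="dsDNA", dilution_factor=1.0):
    """Estimate nucleic acid concentration from absorbance at 260 nm."""
    return a260 * A260_FACTORS[sample_type] * dilution_factor
```

So a dsDNA sample reading A260 = 0.5 at a 1 cm path corresponds to about 25 ng/µL before any dilution correction.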

  • Fluorescence Method:

Fluorescence methods are more sensitive than absorbance, particularly for low-concentration samples, and the use of DNA-binding dyes allows more specific measurement of DNA than spectrophotometric methods. Fluorescence measurements are set at excitation and emission values that vary depending on the dye chosen (Hoechst bis-benzimidazole dyes, PicoGreen® or QuantiFluor™ dsDNA dyes). The concentration of unknown samples is calculated based on comparison to a standard curve generated from samples of known DNA concentration.

The availability of single-tube and microplate fluorometers gives flexibility for reading samples in PCR tubes, cuvettes or multiwell plates and makes fluorescence measurement a convenient modern alternative to the more traditional absorbance methods. The Invitrogen™ Qubit™ Fluorometer is one of the most commonly used fluorometers and accurately measures low-concentration DNA, RNA, and protein samples.
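Fluorometers of this kind are calibrated against a small number of standards. Below is a minimal sketch of a two-point calibration, assuming fluorescence is linear in concentration over the assay range; the function names and the linearity assumption are ours, not the instrument's exact algorithm:

```python
def concentration_from_fluorescence(sample_rfu, blank_rfu, std_rfu,
                                    std_conc, dilution_factor=1.0):
    """Two-point standard curve: a blank (zero) standard and one
    standard of known concentration define a line; the sample's
    fluorescence reading (RFU) is interpolated along it.

    dilution_factor scales the in-assay concentration back to the
    concentration of the undiluted stock sample.
    """
    slope = (std_rfu - blank_rfu) / std_conc  # RFU per concentration unit
    assay_conc = (sample_rfu - blank_rfu) / slope
    return assay_conc * dilution_factor
```

For example, with a blank reading 100 RFU and a 10 ng/µL standard reading 1100 RFU, a sample at 600 RFU interpolates to 5 ng/µL in the assay tube.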


2. Nucleic Acid Purity

Nucleic acid samples can become contaminated by other molecules with which they were co-extracted and eluted during the purification process or by chemicals from upstream applications. Purification methods involving phenol extraction, ethanol precipitation or salting-out may not completely remove all contaminants or chemicals from the final eluates. The resulting impurities can significantly decrease the sensitivity and efficiency of your downstream enzymatic reactions.

  • UV spectrophotometry measurements enable calculation of nucleic acid concentrations based on the sample’s absorbance at 260 nm. The absorbance at 280 nm and 230 nm can be used to assess the level of contaminating proteins or chemicals, respectively. The absorbance ratio of nucleic acids to contaminants provides an estimation of the sample purity, and this number can be used as acceptance criteria for inclusion or exclusion of samples in downstream applications.
  • Contaminants such as RNA, proteins or chemicals can interfere with library preparation and the sequencing reactions. When sequencing DNA, an RNA removal step is highly recommended, and when sequencing RNA, a gDNA removal step is recommended. Sample purity can be assessed following nucleic acid extraction and throughout the library preparation workflow using UV/Vis spectrophotometry. For DNA and RNA samples, the relative abundance of proteins in the sample can be assessed by determining the A260/A280 ratio, which should be between 1.8 and 2.0. Contamination by organic compounds can be assessed using the A260/A230 ratio, which should be higher than 2.0 for DNA and higher than 1.5 for RNA. Next-generation spectrophotometry with the Qiagen QIAxpert system enables spectral content profiling, which can discriminate DNA and RNA from sample contaminants without using a dye.
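The acceptance criteria above can be encoded directly. This sketch simply applies the thresholds quoted in the text; the function name and argument layout are our own:

```python
def passes_purity_qc(a230, a260, a280, sample_type="DNA"):
    """Check absorbance ratios against common acceptance criteria:
    A260/A280 between 1.8 and 2.0 (protein contamination), and
    A260/A230 above 2.0 for DNA or above 1.5 for RNA (organic
    compounds such as phenol or guanidine)."""
    protein_ok = 1.8 <= a260 / a280 <= 2.0
    min_230_ratio = 2.0 if sample_type == "DNA" else 1.5
    organics_ok = a260 / a230 > min_230_ratio
    return protein_ok and organics_ok
```

A batch of extractions can be screened this way before library prep, excluding samples that fail either ratio.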


  • qPCR:

Quantitative PCR, or real-time PCR (qPCR), uses the linearity of DNA amplification to determine absolute or relative quantities of a known sequence in a sample. By using a fluorescent reporter in the reaction, it is possible to measure DNA generation in the qPCR assay as amplification proceeds. In qPCR, DNA amplification is monitored at each cycle of PCR. When the DNA is in the log-linear phase of amplification, the amount of fluorescence increases above the background. The point at which the fluorescence becomes measurable is called the threshold cycle (CT) or crossing point. By using multiple dilutions of a known amount of standard DNA, a standard curve of log concentration against CT can be generated. The amount of DNA or cDNA in an unknown sample can then be calculated from its CT value.
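The standard-curve arithmetic described above can be sketched as follows. This is a generic least-squares implementation, not any vendor's software; names and tolerances are illustrative:

```python
import math

def fit_standard_curve(concs, cts):
    """Least-squares fit of CT = slope * log10(conc) + intercept from
    serial dilutions of a known standard. A slope near -3.32 indicates
    ~100% amplification efficiency (perfect doubling per cycle)."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 means 100% efficient
    return slope, intercept, efficiency

def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to quantify an unknown sample."""
    return 10 ** ((ct - intercept) / slope)
```

In practice, the fitted efficiency is also a QC metric in its own right: curves far from 90–110% efficiency suggest inhibition or pipetting error in the dilution series.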

qPCR-based assays can accurately qualify and quantify amplifiable DNA in challenging samples. For example, DNA derived from formalin-fixed, paraffin-embedded (FFPE) tissue samples is often highly fragmented, cross-linked with protein, and has a high proportion of single-stranded DNA, making library preparation challenging. For FFPE samples, the Agilent NGS FFPE QC kit enables functional quality assessment of input DNA.

3. Nucleic Acid Integrity (Size distribution)

Along with quantity and purity, size distribution is a critical QC parameter that provides valuable insight into sample quality. Analyzing nucleic acid size informs you about your sample’s integrity and indicates whether the samples are fragmented or contaminated by other DNA or RNA products. Various electrophoretic methods can be used to assess the size distribution of your sample.

  • Agarose Gel Electrophoresis

In this method, a horizontal gel electrophoresis tank with an external power supply, analytical-grade agarose, an appropriate running buffer (e.g., 1X TAE) and an intercalating DNA dye along with appropriately sized DNA standards are required. A sample of the isolated DNA is loaded into a well of the agarose gel and then exposed to an electric field. The negatively charged DNA backbone migrates toward the anode. Since small DNA fragments migrate faster, the DNA is separated by size. The percentage of agarose in the gel will determine what size range of DNA will be resolved with the greatest clarity. Any RNA, nucleotides, and protein in the sample migrate at different rates compared to the DNA so the band(s) containing the DNA will be distinct.
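Band sizes are read off the gel by comparison with the ladder. As a hedged sketch of that calculation: over a modest size range, migration distance is roughly linear in log10(fragment size), so an unknown band can be interpolated between the two flanking ladder bands. The function and its interface are illustrative:

```python
import math

def estimate_band_size(distance, ladder):
    """Estimate an unknown band's size (bp) from its migration distance.

    ladder: list of (migration_distance, size_bp) pairs for the DNA
    standard, sorted by increasing distance (i.e., decreasing size).
    Interpolates log10(size) linearly between the two flanking bands.
    """
    for (d1, s1), (d2, s2) in zip(ladder, ladder[1:]):
        if d1 <= distance <= d2:
            frac = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the ladder range")
```

Gel documentation software performs essentially this interpolation automatically when you define the ladder lane.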


Analyzing PCR amplicons or RFLP fragments confirms the presence of fragments of the expected size and alerts you to the presence of any non-specific amplicons. Electrophoresis also helps you assess the ligation yield in plasmid cloning procedures as well as the efficiency of removal of primer–dimers or other non-specific fragments during sample cleanup.

For complex samples such as genomic DNA (gDNA) or total RNA, the shape and position of the smear from electrophoresis analysis directly correlates with the integrity of the samples. Nucleic acid species of larger size tend to be degraded first and provide degradation products of lower molecular weight. Samples of poor integrity generally have a higher abundance of shorter fragments, while high-quality samples contain intact nucleic acid molecules with higher molecular size.

Eukaryotic RNA samples have unique electrophoretic signatures, which consist of a smear with major fragments corresponding to 28S, 18S and 5S ribosomal RNA (rRNA). These electrophoretic patterns correlate with the integrity of the RNA samples. The RNA integrity can either be assessed manually or with automation that employs a dedicated algorithm such as the RNA Integrity Score (RIS) that gives an objective integrity grade to RNA samples ranging from 1–10. RNA samples of highest quality usually have a score of 8 or above.

  • Capillary Electrophoresis

In this method, charged DNA or RNA molecules are injected into a capillary and are resolved during migration through a gel-like matrix. Nucleic acids are detected as they pass a detector that captures signals of specific absorbance. Results are presented in the form of an electropherogram, which is a plot of signal intensity against migration time. The fragment sizes are precisely determined using a size marker consisting of fragments of known size. This method provides highly resolving and sensitive nucleic acid analysis that is faster and safer than traditional slab gel electrophoresis.


Hybrid Read Sequencing: Applications and Tools

Next-generation sequencing (Illumina) and long-read sequencing (PacBio/Oxford Nanopore) platforms each have their own strengths and weaknesses. Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long but inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short-read technologies produce accurate but fragmented assemblies. The combination of these techniques has led to a new, improved approach known as hybrid sequencing.

Hybrid sequencing methods utilize the high-throughput, high-accuracy short-read data to correct errors in the long reads. This approach reduces the required amount of costlier long-read sequence data and results in more complete assemblies, including repetitive regions. Moreover, PacBio long reads can provide reliable alignments, scaffolds, and rough detection of genomic variants, while short reads refine the alignments, assemblies, and detections to single-nucleotide resolution. The high coverage of short-read sequencing data can also be utilized in downstream quantitative analysis1.

Applications

De novo sequencing

As alternatives to using PacBio sequencing alone for eukaryotic de novo assemblies, error correction strategies using hybrid sequencing have also been developed.

  • Koren et al. developed the PacBio corrected Reads (PBcR) approach for using short reads to correct the errors in long reads2. PBcR has been applied to reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including that of the previously unsequenced parrot (Melopsittacus undulatus). The long-read correction approach achieved >99.9% base-call accuracy, leading to substantially better assemblies than non-hybrid sequencing strategies.
  • Also, Bashir et al. used hybrid sequencing data to assemble the two-chromosome genome of a Haitian cholera outbreak strain at >99.9% accuracy in two nearly finished contigs, completely resolving complex regions with clinically relevant structures3.
  • More recently, Goodwin et al. developed an open-source error correction algorithm, Nanocorr, specifically for hybrid error correction of Oxford Nanopore reads. They used this error correction method with complementary MiSeq data to produce a highly contiguous and accurate de novo assembly of the Saccharomyces cerevisiae genome. The contig N50 length was more than ten times greater than that of an Illumina-only assembly, with >99.88% consensus identity when compared to the reference. Additionally, this assembly offered a complete representation of the features of the genome, with correctly assembled gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly4.

Transcript structure and Gene isoform identification

Besides genome assembly, hybrid sequencing can also be applied to the error correction of PacBio long reads of transcripts. Moreover, it could improve gene isoform identification and abundance estimation.

  • Along with genome assembly, Koren et al. used the PBcR method to identify and confirm full-length transcripts and gene isoforms. Because the length of single-molecule PacBio reads from RNA-Seq experiments falls within the size distribution of most transcripts, many PacBio reads represent near full-length transcripts. These long reads can therefore greatly reduce the need for transcript assembly, which requires complex algorithms for short reads, and can confidently detect alternatively spliced isoforms. However, the predominance of indel errors makes analysis of the raw reads challenging. Both sets of PacBio reads (before and after error correction) were aligned to the reference genome to determine which ones matched the exon structure over the entire length of the annotated transcripts. Before correction, only 41 (0.1%) of the PacBio reads exactly matched the annotated exon structure; after correction, this rose to 12,065 (24.1%).
  • Au et al. developed a computational tool called LSC for the correction of raw PacBio reads by short reads5. Applying this tool to 100,000 human brain cerebellum PacBio subreads and 64 million 75-bp Illumina short reads, they reduced the error rate of the long reads by more than 3-fold. In order to identify and quantify full-length gene isoforms, they also developed an Isoform Detection and Prediction tool (IDP), which makes use of TGS long reads and SGS short reads6. Applying LSC and IDP to PacBio long reads and Illumina short reads of the human embryonic stem cell transcriptome, they detected several thousand RefSeq-annotated gene isoforms at full-length. IDP-fusion has also been released for the identification of fusion genes, fusion sites, and fusion gene isoforms from cancer transcriptomes7.
  • Ning et al. developed an analysis method HySeMaFi to decipher gene splicing and estimate the gene isoforms abundance8. Firstly, the method establishes the mapping relationship between the error-corrected long reads and the longest assembled contig in every corresponding gene. According to the mapping data, the true splicing pattern of the genes is detected, followed by quantification of the isoforms.

Personal transcriptomes

Personal transcriptomes are expected to have applications in understanding individual biology and disease, but short read sequencing has been shown to be insufficiently accurate for the identification and quantification of an individual’s genetic variants and gene isoforms9.

  • Using a hybrid sequencing strategy combining PacBio long reads and Illumina short reads, Tilgner et al. sequenced the lymphoblastoid transcriptomes of three family members in order to produce and quantify an enhanced personalized genome annotation. Around 711,000 CCS reads were used to identify novel isoforms, and ∼100 million Illumina paired-end reads were used to quantify the personalized annotation, which cannot be accomplished by the relatively small number of long reads alone. This method produced reads representing all splice sites of a transcript for most sufficiently expressed genes shorter than 3 kb. It provides a de novo approach for determining single-nucleotide variations, which could be used to improve RNA haplotype inference10.

Epigenetics research

  • Beckmann et al. demonstrated the ability of PacBio sequencing to recover previously discovered epigenetic motifs with m6A and m4C modifications in both low-coverage and high-contamination scenarios11. They were also able to recover many motifs from three mixed strains (E. coli, G. metallireducens, and C. salexigens), even when the motif sequences of the genomes of interest overlap substantially, suggesting that PacBio sequencing is applicable to metagenomics. Their studies suggest that hybrid sequencing would be more cost-effective than using PacBio sequencing alone to detect and accurately define k-mers for low-proportion genomes.

Hybrid assembly tools

Several algorithms have been developed that can help in the single molecule de novo assembly of genomes along with hybrid error correction using the short, high-fidelity sequences.

  • Jabba is a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. It uses a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds12. The tool is available here: https://github.com/biointec/jabba.
  • HALC is a high-throughput algorithm for long read error correction. HALC aligns the long reads to short-read contigs from the same species with a relatively low identity requirement and constructs a contig graph. The tool was applied to E. coli, A. thaliana and Maylandia zebra data sets and was shown to achieve up to 41% higher throughput than other existing algorithms while maintaining comparable accuracy13. HALC can be downloaded here: https://github.com/lanl001/halc.
  • The hybridSPAdes algorithm was developed for assembling short and long reads and was benchmarked on several bacterial assembly projects. hybridSPAdes generated accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. This method was used to demonstrate the first complete circular chromosome assembly of a genome from single cells of Candidate Phylum TM6 using SMRT reads14. The tool is publicly available on this page: http://bioinf.spbau.ru/en/spades.

Due to the constant development of new long read error correction tools, La et al. have recently published an open-source pipeline that evaluates the accuracy of these different algorithms15. LRCstats analyzed the accuracy of four hybrid correction methods for PacBio long reads over three data sets and can be downloaded here: https://github.com/cchauve/lrcstats.

Sović et al. evaluated the different non-hybrid and hybrid assembly methods for de novo assembly using nanopore reads16. They benchmarked five non-hybrid assembly pipelines and two hybrid assemblers that use nanopore sequencing data to scaffold Illumina assemblies. Their results showed that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data and performed relatively well on lower nanopore coverages. The implementation of this DNA Assembly benchmark is available here: https://github.com/kkrizanovic/NanoMark.

References:

  1. Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics, Proteomics Bioinforma. 13, 278–289 (2015).
  2. Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotech 30, 693–700 (2012).
  3. Bashir, A. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol 30, (2012).
  4. Goodwin, S. et al. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res 25, (2015).
  5. Au, K. F., Underwood, J. G., Lee, L. & Wong, W. H. Improving PacBio Long Read Accuracy by Short Read Alignment. PLoS One 7, e46679 (2012).
  6. Au, K. F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl. Acad. Sci. 110, E4821–E4830 (2013).
  7. Weirather, J. L. et al. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Res. 43, e116 (2015).
  8. Ning, G. et al. Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome. Sci. Rep. 7, 43793 (2017).
  9. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods 10, 1177 (2013).
  10. Tilgner, H., Grubert, F., Sharon, D. & Snyder, M. P. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc. Natl. Acad. Sci. 111, 9869–9874 (2014).
  11. Beckmann, N. D., Karri, S., Fang, G. & Bashir, A. Detecting epigenetic motifs in low coverage and metagenomics settings. BMC Bioinformatics 15, S16 (2014).
  12. Miclotte, G. et al. Jabba: hybrid error correction for long sequencing reads. Algorithms Mol. Biol. 11, 10 (2016).
  13. Bao, E. & Lan, L. HALC: High throughput algorithm for long read error correction. BMC Bioinformatics 18, 204 (2017).
  14. Antipov, D., Korobeynikov, A., McLean, J. S. & Pevzner, P. A. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics 32, 1009–1015 (2016).
  15. La, S., Haghshenas, E. & Chauve, C. LRCstats, a tool for evaluating long reads correction methods. Bioinformatics (2017). doi:10.1093/bioinformatics/btx489
  16. Sović, I., Križanović, K., Skala, K. & Šikić, M. Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. Bioinformatics 32, 2582–2589 (2016).

 

6 Methods to Fragment Your DNA / RNA for Next-Gen Sequencing

The preparation of a high quality sequencing library plays an important role in next-generation sequencing (NGS). The first main step in preparing nucleic acid for NGS is fragmentation. In the next series of blog posts we will present important challenges and things to consider as you isolate nucleic acid samples and prepare your own libraries.

Next-generation sequencing will give you a plethora of reads, but they will be short. Illumina and Ion Torrent read lengths are currently under 600 bases, Roche 454 outputs reads of less than 1 kb, and PacBio reads are less than 9 kb in length. This makes sizing your input DNA or RNA important prior to library construction. There are three main ways to shorten long nucleic acid material into something compatible with next-gen sequencing: 1) physical, 2) enzymatic and 3) chemical shearing.

Physical Fragmentation

1) Acoustic shearing

2) Sonication

3) Hydrodynamic shear

Acoustic shearing and sonication are the main physical methods used to shear DNA. The Covaris® instrument (Woburn, MA) is an acoustic device for breaking DNA into fragments of 100 bp–5 kb. Covaris also manufactures gTubes, which process samples in the 6–20 kb range for mate-pair libraries. The Bioruptor® (Denville, NJ) is a sonication device for shearing chromatin and DNA and for disrupting tissues; small volumes of DNA can be sheared to 150 bp–1 kb in length. The HydroShear from Digilab (Marlborough, MA) uses hydrodynamic forces to shear DNA. Nebulizers (Life Tech, Grand Island, NY) can also be used to atomize liquid with compressed air, shearing DNA into 100 bp–3 kb fragments in seconds. While nebulization is low cost and doesn't require the purchase of an instrument, it is not recommended if you have limited starting material: you can lose up to 30% of your DNA with a nebulizer. The sonication and acoustic shearing devices described above are better designed for smaller volumes and retain your DNA more efficiently.

Enzymatic Methods

4) DNase I or other restriction endonuclease, non-specific nuclease

5) Transposase

Enzymatic methods to shear DNA into small pieces include DNase I, a combination of maltose-binding protein (MBP)-T7 Endo I with the non-specific Vibrio vulnificus nuclease (Vvn), NEB's (Ipswich, MA) Fragmentase, and Nextera tagmentation technology (Illumina, San Diego, CA). The non-specific nuclease and T7 Endo work synergistically to produce non-specific nicks and counter-nicks, generating fragments that disassociate within 8 nucleotides or less of the nick site. Tagmentation uses a transposase to simultaneously fragment dsDNA and insert adapters. Enzymatic fragmentation has generally been shown to be consistent, but it compares unfavorably with physical shearing methods with respect to bias and the detection of insertions and deletions (indels) (Knierim et al., 2011). Depending on your specific application (de novo genome sequencing vs. small genome re-sequencing), the biases associated with enzymatic fragmentation may not be as important.

RNase III is an endonuclease that cleaves RNA into small fragments with 5′-phosphate and 3′-hydroxyl groups. While these end groups are needed for RNA ligation, making the assay convenient, RNase III cleavage has sequence preference, which biases the cleavage. The heat/chemical methods described below, although they leave 3′-phosphate and 5′-hydroxyl ends, show less sequence bias and are generally the preferred methods in library preparation.

Chemical Fragmentation

6) Heat and divalent metal cation

Chemical shearing is typically reserved for breaking up long RNA fragments. It is usually performed by heat digestion of the RNA with a divalent metal cation (magnesium or zinc). The length of the resulting RNA fragments (typically 115–350 nt) can be adjusted by increasing or decreasing the incubation time.

The size of your DNA or RNA insert is a key factor for library construction and sequencing. You’ll need to choose an instrument and read length that are compatible with your insert length. You can do this by entering project parameters on the Shop by Project page and filtering according to read length (estimated insert length). If you’re not sure, we can help: send us a request through our consultation form.

Reference:

Knierim, E., Lucke, B., Schwarz, J. M., Schuelke, M. & Seelow, D. Systematic comparison of three methods for fragmentation of long-range PCR products for next generation sequencing. PLoS ONE 6, e28240 (2011).


Considerations for Sequencing microRNA


We’ve put together a new small RNA (microRNA) sequencing guide describing considerations all new users should make before undertaking a small RNA sequencing project. One of the first considerations is determining the number of reads you need. This usually depends on whether you’re interested in differential small RNA expression or if you’re trying to discover new microRNAs. Once you know the number of reads you need per sample, consider the following factors before and after library preparation:

  1. Should you start with total RNA or isolated small RNA?
  2. How much material should you start with?
  3. What’s the minimum quality of total RNA acceptable for microRNA library preparation and sequencing?
  4. How will small RNA ligation bias affect my results?
  5. How can I minimize adapter dimers to improve read mapping and general usability of my sequencing reads?
  6. How many samples can I multiplex or pool together in a single sequencing lane?
  7. What sequencing read length should I choose for microRNA or small RNA sequencing studies?

The guide also includes recommendations for getting accurate per sample pricing and turnaround times.

Small RNAs play a big role in regulating the translation of target RNAs through RNA–RNA interactions and have shown potential as biomarkers in diagnostic applications. Sequencing promises to be a useful tool for unraveling the roles of these short non-coding RNAs. We look forward to working with you on your next microRNA project.

New Short, Long and High Throughput Sequencing Reads in 2016

An exciting wave of newly released DNA sequencing instruments and technology will soon be available to researchers. From DNA sequencers the size of a cell phone to platforms that turn short reads into long-range information, these new sequencing technologies will be available on Genohub as services that can be ordered. Below is a summary of the technology you can expect in Q1 of 2016:

10X Genomics GemCode Platform

The GemCode platform from 10X Genomics partitions long DNA fragments of up to 100 kb with a pool of ~750K molecular barcodes, indexing the genome during library construction. Within each partition, all barcoded fragments share the same barcode. After several cycling and pooling steps, >100K barcode-containing partitions are created. GemCode software then maps short Illumina read pairs back to the original long DNA molecules using the barcodes added during library preparation. With this long-range information, haplotype phasing and improved structural variant detection become possible, and gene fusions, deletions and duplications can be detected from exome data.
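The core linked-read idea can be sketched in a few lines: reads carrying the same partition barcode are grouped, and the span of their alignments approximates the original long molecule. This is a toy illustration only; the tuple format and function name are our assumptions, not the GemCode software’s actual data model:

```python
from collections import defaultdict

def infer_molecule_spans(reads):
    """Group aligned reads by (barcode, chromosome) and report the span
    min..max of their positions as a proxy for the original long
    DNA molecule. `reads` is a list of (barcode, chrom, pos) tuples.
    """
    groups = defaultdict(list)
    for barcode, chrom, pos in reads:
        groups[(barcode, chrom)].append(pos)
    return {key: (min(positions), max(positions))
            for key, positions in groups.items()}

reads = [
    ("ACGT", "chr1", 1_000),    # two reads sharing barcode ACGT...
    ("ACGT", "chr1", 95_000),   # ...imply one ~94 kb source molecule
    ("TTGA", "chr1", 5_000),    # a different partition/molecule
]
spans = infer_molecule_spans(reads)
# spans[("ACGT", "chr1")] == (1000, 95000)
```

Real linked-read software additionally splits far-apart reads with the same barcode into separate molecules and uses the reconstructed spans for phasing, but the grouping step above is the essence of how short reads recover long-range information.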

Ion Torrent S5, S5 XL

Ion developed the S5 system to focus on the clinical amplicon-seq market. While the wait for delivery of Proton PII chips continues, Ion has delivered a machine with chip configurations very similar to past PGM and Proton chips. The 520/530 chips offer 200–400 bp runs with 80M reads and 2–4 hour run times. Using Ion’s fixed amplicon panels, data analysis can be completed within 5 hours. The Ion Chef is required to reduce hands-on library prep time; otherwise, library prep and chip loading must be performed manually. Ion appears to have positioned its platform toward clinical applications. Given stiff competition from Illumina and Ion’s inability to deliver similar read lengths and throughput, this is a smart decision. Focusing the platform on a particular application, however, likely means future development (longer reads and higher throughput) has been paused indefinitely.

Pacific Biosciences Sequel System

Announced in September 2015, the Sequel System uses the same Single Molecule, Real-Time (SMRT) technology as the RS II, but boasts several technical advancements. At around one third the cost of an RS II, the Sequel offers 7x more reads, with 1M zero-mode waveguides (ZMWs) per SMRT Cell versus the previous standard of 150K. The application of Iso-Seq, or full-length transcript sequencing, is especially promising, as 1M reads crosses into the range where discovery and quantitation of transcripts become interesting. By providing full-length transcript isoforms, it’s no longer necessary to reconstruct transcripts or infer isoforms from short-read information. Of course, the Sequel is also well suited to generating whole-genome de novo assemblies. We’ll follow how Oxford Nanopore’s MinION competes with the Sequel system in 2016.

Oxford Nanopore’s (ONT) MinION

In 2014, Oxford Nanopore started its MinION Access Program (MAP), delivering over 1,000 MinIONs to users who wanted to test the technology. These users have gone on to publish whole E. coli and yeast genome assemblies. Accuracy of the device is up to 85% per raw base, and there are difficulties with high G+C content sequences, so a lot of work remains before widespread adoption. The workflow is simple and uses the typical library construction steps of end repair and ligation. Once the sample is added to the flow cell, users can generate long reads (>100 kb) and analyze data in real time; median read lengths are currently 1–2 kb. Publications have shown that, combined with MiSeq reads, MinION output can enhance the contiguity of de novo assemblies. The lower error rates of two-direction (2D) reads produced with the recently updated MinION chemistry give cause for optimism that greatly reduced error rates can be achieved in the near future. This, along with a low unit cost and the ability to deploy the USB-sized device in the field, makes it a very exciting technology.

Illumina HiSeq X

While HiSeq X services have been available on Genohub for over a year, Illumina’s announcement of its expansion to non-human whole genomes was well received. However, several questions remain unanswered. Illumina states:

The updated rights of use will allow for market expansion and population-scale sequencing of non-human species in a variety of markets, including plants and livestock in agricultural research and model organisms in pharmaceutical research. Previously, it has been cost prohibitive to sequence non-human genomes at high coverage.

You can now sequence mouse, rat and other relatively large genomes economically on the HiSeq X. This makes the most sense for high-coverage applications, e.g. 30x or above. While small and medium-sized genomes can also be sequenced on a HiSeq X, the limited barcoding options and the high coverage you’d obtain make these applications less attractive. According to Illumina, as of 12/20/2015, metagenomic whole genome sequencing was not a compatible application on the HiSeq X; the instrument is still restricted to WGS only, so RNA-Seq, exome-seq and ChIP-Seq applications will have to wait. Perhaps by the time the HiSeq X One is released, access will be opened to these non-WGS applications.
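The economics above rest on a standard mean-coverage calculation: total sequenced bases divided by genome size. The figures below are illustrative assumptions (per-lane yield varies by run), not Illumina specifications:

```python
def mean_coverage(n_reads, read_len_bp, genome_size_bp):
    """Mean sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_len_bp / genome_size_bp

# Assuming ~375M read pairs (750M reads) of 150 bp from one HiSeq X
# lane, run against the ~2.7 Gb mouse genome:
cov = mean_coverage(750_000_000, 150, 2_700_000_000)   # ~41.7x
```

The same arithmetic shows why small genomes are a poor fit: a 100 Mb genome on a full lane would yield >1000x coverage, far more than most applications need, with limited barcoding available to split the lane across samples.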

While these new instruments make their way onto Genohub’s Shop by Project page, you can make inquiries and even order services by placing a request on our consultation page.

Key Considerations for Whole Exome Sequencing

Whole exome sequencing is a powerful technique for sequencing protein coding genes in the genome (known as the exome). It’s a useful tool for applications where detecting variants is important, including population genetics, association and linkage, and oncology studies.

As Genohub is the main hub for searching and ordering next generation sequencing services, most researchers about to embark on an exome sequencing project start their search on Genohub.com. It’s our responsibility to make sure each researcher is informed and prepared before placing an order for an exome sequencing service.

Working toward achieving this goal, we’ve established a series of guides for anyone about to start a whole exome sequencing project. We’ve described each of these guides here.

  1. Should I choose Whole Genome Sequencing or Whole Exome Sequencing?

This guide describes what you can get with WGS that you won’t with WES and compares pricing on a per sample basis. It also provides an overview of sequence coverage, coverage uniformity, off-target effects and bias due to PCR amplification.

  2. How to choose an exome sequencing kit for capture and sequencing

This guide breaks down each commercial exome capture kit, comparing Agilent SureSelect, Nimblegen SeqCap and Illumina Nextera Rapid Capture. Numbers of probes used for capture, DNA input required, adapter addition strategy, probe length and design, hybridization time and cost per capture are all compared. This comparison is followed by a description of each kit’s protocol.

  3. How to calculate the number of sequencing reads needed for exome sequencing

In the same guide that compares library preparation kits (above), we go through an example on how to determine the amount of sequencing and read length required for your exome study. This is especially important when you start comparing the cost for exome sequencing services (see the next guide).
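The read-count calculation the guide walks through can be sketched as follows. The on-target and duplicate rates below are illustrative placeholders (real values depend on the capture kit and library prep), and the function name is ours:

```python
def reads_needed(target_bp, coverage, read_len_bp,
                 on_target=0.7, duplicate_rate=0.1):
    """Reads to order for a capture-based exome experiment.

    Raw coverage is inflated to account for off-target capture and
    PCR duplicates, since only on-target, non-duplicate bases count
    toward usable depth.
    """
    usable_fraction = on_target * (1 - duplicate_rate)
    return int(target_bp * coverage / (read_len_bp * usable_fraction))

# ~50 Mb exome target at 100x mean depth with 100 bp reads:
n = reads_needed(50_000_000, 100, 100)   # ~79M reads
```

Note how sensitive the answer is to the on-target rate: the same experiment at 50% on-target would require ~111M reads, which is why kit comparisons and service pricing go hand in hand.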

  4. How to choose an exome sequencing and library preparation service

Are you looking for 100x sequencing coverage, what many in the industry call standard exome sequencing, or for 200x coverage, considered ‘high depth’? Or are you interested in a CLIA-grade, clinical whole exome sequencing service? This exome guide breaks each down into searches that can be performed on Genohub. The search buttons allow real-time comparison of available exome services, their prices, turnaround times and the kits being used. Once you’ve identified a service that looks like a good match, you can send questions to the provider or immediately order the exome-seq service.

  5. Find a service provider to perform exome-seq data analysis only

Do you already have an exome-seq dataset? Do you need a bioinformatician to perform variant calling or SNP identification? Are you interested in studying somatic or germline mutations? Use this guide to identify providers with experienced bioinformaticians on staff who regularly perform this type of data analysis service. Simply click a contact button to send a message or question to a provider. If you’re looking for a quote, they will respond within the same or next business day.

If you still need help, feel free to take advantage of Genohub’s complimentary consultation services. We’re happy to help make recommendations for your whole exome sequencing project.