Top 3 Sample QC steps prior to Library Preparation for NGS

Before beginning library preparation for next-generation sequencing, it is highly recommended to perform sample quality control (QC) to check the nucleic acid quantity, purity and integrity. The starting material for NGS library construction might be any type of nucleic acid that is or can be converted into double-stranded DNA (dsDNA). These materials, often gDNA, RNA, PCR amplicons, and ChIP samples, must have high purity and integrity and sufficient concentration for the sequencing reaction.

1. Nucleic Acid Quantification

Measuring the concentration of nucleic acid samples is a key QC step to determine whether enough nucleic acid is available for further processing.

  • Absorbance Method:

A UV-Vis spectrophotometer measures spectral absorbance across a range of wavelengths, giving a profile of the whole sample that can help differentiate nucleic acids from other absorbing contaminants. Different molecules such as nucleic acids, proteins, and chemical contaminants each absorb light in their own characteristic pattern. By measuring the amount of light absorbed at a defined wavelength, the concentration of the molecules of interest can be calculated. Most laboratories are equipped with a UV-Vis spectrophotometer to quantify nucleic acids or proteins in day-to-day experiments. Researchers can choose from several spectrophotometers currently available, such as the Thermo Scientific™ NanoDrop™ UV-Vis spectrophotometer, the Qiagen QIAxpert System and the Shimadzu BioSpec-nano.
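The absorbance-to-concentration calculation can be sketched as follows. This is a minimal example assuming the commonly used A260 conversion factors (an A260 of 1.0 over a 1 cm path corresponds to roughly 50 µg/mL dsDNA, 40 µg/mL RNA and 33 µg/mL ssDNA). Instrument software may apply its own factors and path-length corrections, and the function name is illustrative.

```python
# Conventional A260 conversion factors: µg/mL per 1.0 absorbance unit over a 1 cm path.
CONVERSION_FACTORS = {
    "dsDNA": 50.0,
    "ssDNA": 33.0,
    "RNA": 40.0,
}


def concentration_ug_per_ml(a260, sample_type="dsDNA", dilution_factor=1.0, path_length_cm=1.0):
    """Estimate nucleic acid concentration from A260 using the Beer-Lambert law.

    Absorbance scales linearly with path length, so readings taken over a
    shorter path are normalised to the 1 cm reference path before applying
    the conversion factor. The dilution factor restores the stock concentration.
    """
    factor = CONVERSION_FACTORS[sample_type]
    return a260 / path_length_cm * factor * dilution_factor


# Example: dsDNA diluted 1:10 reading A260 = 0.25 in a 1 cm cuvette.
print(concentration_ug_per_ml(0.25, "dsDNA", dilution_factor=10))  # 125.0
```

Note that µg/mL and ng/µL are numerically identical, so the result can be compared directly with fluorometer readouts.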

  • Fluorescence Method:

Fluorescence methods are more sensitive than absorbance, particularly for low-concentration samples, and the use of DNA-binding dyes allows more specific measurement of DNA than spectrophotometric methods. Fluorescence measurements are set at excitation and emission values that vary depending on the dye chosen (Hoechst bis-benzimidazole dyes, PicoGreen® or QuantiFluor™ dsDNA dyes). The concentration of unknown samples is calculated based on comparison to a standard curve generated from samples of known DNA concentration.

The availability of single-tube and microplate fluorometers gives flexibility for reading samples in PCR tubes, cuvettes or multiwell plates and makes fluorescence measurement a convenient modern alternative to the more traditional absorbance methods. The Thermo Scientific (Invitrogen) Qubit™ Fluorometer is one of the most commonly used fluorometers and accurately measures low concentrations of DNA, RNA, and protein.
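Converting a fluorescence reading into a concentration via the standard curve described above can be sketched as below. This is a minimal illustration assuming a linear dye response over the range of the standards; the standard concentrations and readings are hypothetical, and commercial fluorometers perform this calibration internally.

```python
def fit_standard_curve(concentrations, signals):
    """Ordinary least-squares fit of signal = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept, signals):
    """Interpolate an unknown's concentration; refuse to extrapolate beyond the standards."""
    if not min(signals) <= signal <= max(signals):
        raise ValueError("signal outside the range of the standard curve; dilute or re-measure")
    return (signal - intercept) / slope


# Hypothetical dsDNA standards (ng/mL) and their fluorescence readings (arbitrary units).
standards = [0, 25, 50, 100, 200]
readings = [100, 600, 1100, 2100, 4100]

slope, intercept = fit_standard_curve(standards, readings)
print(concentration_from_signal(1500, slope, intercept, readings))  # 70.0
```

Refusing to extrapolate reflects the fact that dye fluorescence is only linear within a limited concentration range; samples outside it should be diluted or re-measured.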


2. Nucleic Acid Purity

Nucleic acid samples can become contaminated by other molecules with which they were co-extracted and eluted during the purification process or by chemicals from upstream applications. Purification methods involving phenol extraction, ethanol precipitation or salting-out may not completely remove all contaminants or chemicals from the final eluates. The resulting impurities can significantly decrease the sensitivity and efficiency of your downstream enzymatic reactions.

  • UV spectrophotometry measurements enable calculation of nucleic acid concentrations based on the sample’s absorbance at 260 nm. The absorbance at 280 nm and 230 nm can be used to assess the level of contaminating proteins or chemicals, respectively. The absorbance ratio of nucleic acids to contaminants provides an estimation of the sample purity, and this number can be used as acceptance criteria for inclusion or exclusion of samples in downstream applications.
  • Contaminants such as RNA, proteins or chemicals can interfere with library preparation and the sequencing reactions. When sequencing DNA, an RNA removal step is highly recommended, and when sequencing RNA, a gDNA removal step is recommended. Sample purity can be assessed following nucleic acid extraction and throughout the library preparation workflow using UV-Vis spectrophotometry. For DNA and RNA samples, the relative abundance of proteins can be assessed from the A260/A280 ratio, which should be between 1.8 and 2.0. Contamination by organic compounds can be assessed using the A260/A230 ratio, which should be higher than 2.0 for DNA and higher than 1.5 for RNA. Next-generation spectrophotometry with the Qiagen QIAxpert system enables spectral content profiling, which can discriminate DNA and RNA from sample contaminants without using a dye.
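The acceptance criteria described above can be applied programmatically, for example when screening many samples. This is a minimal sketch: the function name and sample readings are hypothetical, and the cutoffs are taken directly from the ratios quoted in this section. Individual labs and kit manufacturers may use different thresholds.

```python
def assess_purity(a260, a280, a230, sample_type="DNA"):
    """Flag samples whose absorbance ratios fall outside the acceptance ranges.

    Cutoffs follow the ratios quoted in the text: A260/A280 between 1.8 and 2.0,
    and A260/A230 above 2.0 for DNA or above 1.5 for RNA.
    """
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    min_ratio_230 = 2.0 if sample_type == "DNA" else 1.5
    issues = []
    if not 1.8 <= ratio_280 <= 2.0:
        issues.append(f"A260/A280 = {ratio_280:.2f} (expected 1.8-2.0; check for protein or other contaminants)")
    if ratio_230 <= min_ratio_230:
        issues.append(f"A260/A230 = {ratio_230:.2f} (expected > {min_ratio_230}; possible chemical carryover)")
    return ratio_280, ratio_230, issues


# Hypothetical (A260, A280, A230) readings for two DNA samples.
for name, values in {"sample_A": (1.00, 0.53, 0.45), "sample_B": (1.00, 0.53, 0.80)}.items():
    _, _, problems = assess_purity(*values)
    print(name, "PASS" if not problems else problems)
```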


  • qPCR:

Quantitative PCR (qPCR), also known as real-time PCR, uses the linearity of DNA amplification to determine absolute or relative quantities of a known sequence in a sample. By including a fluorescent reporter in the reaction, it is possible to measure DNA generation as the assay proceeds. In qPCR, DNA amplification is monitored at each cycle of PCR. When the DNA is in the log-linear phase of amplification, the fluorescence increases above the background. The cycle at which the fluorescence becomes measurable above a set threshold is called the threshold cycle (CT) or crossing point. By using multiple dilutions of a known amount of standard DNA, a standard curve of CT against log concentration can be generated. The amount of DNA or cDNA in an unknown sample can then be calculated from its CT value.

qPCR-based assays can accurately qualify and quantify amplifiable DNA in challenging samples. For example, DNA derived from formalin-fixed paraffin-embedded (FFPE) tissue samples is often highly fragmented, cross-linked with protein and has a high proportion of single-stranded DNA, making library preparation challenging. For FFPE samples, the Agilent NGS FFPE QC kit enables functional quality assessment of input DNA.
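The standard-curve calculation described above can be sketched as follows. This is a minimal illustration assuming CT values have already been exported from the instrument; the dilution series and CT values are hypothetical, and the efficiency formula E = 10^(-1/slope) - 1 is the commonly used one rather than anything specific to a particular kit.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def qpcr_standard_curve(quantities, ct_values):
    """Fit CT against log10(quantity) and derive the amplification efficiency."""
    log_q = [math.log10(q) for q in quantities]
    slope, intercept = fit_line(log_q, ct_values)
    # A slope of about -3.32 corresponds to perfect doubling each cycle (efficiency = 1.0, i.e. 100%).
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve CT = slope * log10(q) + intercept."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series of a standard (copies per reaction) and measured CT values.
standards = [1e6, 1e5, 1e4, 1e3, 1e2]
cts = [16.08, 19.40, 22.72, 26.04, 29.36]

slope, intercept, eff = qpcr_standard_curve(standards, cts)
print(f"slope={slope:.2f}, efficiency={eff:.1%}")
print(f"unknown: {quantity_from_ct(24.38, slope, intercept):.0f} copies")
```

Efficiencies of roughly 90-110% are commonly treated as acceptable; values outside that range usually point to inhibitors (a purity problem) or poor assay design.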

3. Nucleic Acid Integrity (Size distribution)

Along with quantity and purity, size distribution is a critical QC parameter that provides valuable insight into sample quality. Analyzing nucleic acid size informs you about your sample’s integrity and indicates whether the samples are fragmented or contaminated by other DNA or RNA products. Various electrophoretic methods can be used to assess the size distribution of your sample.

  • Agarose Gel Electrophoresis

In this method, a horizontal gel electrophoresis tank with an external power supply, analytical-grade agarose, an appropriate running buffer (e.g., 1X TAE) and an intercalating DNA dye along with appropriately sized DNA standards are required. A sample of the isolated DNA is loaded into a well of the agarose gel and then exposed to an electric field. The negatively charged DNA backbone migrates toward the anode. Since small DNA fragments migrate faster, the DNA is separated by size. The percentage of agarose in the gel will determine what size range of DNA will be resolved with the greatest clarity. Any RNA, nucleotides, and protein in the sample migrate at different rates compared to the DNA so the band(s) containing the DNA will be distinct.
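Because migration distance is approximately linear in the logarithm of fragment size within the gel's resolving range, the size of an unknown band can be estimated from the DNA standards. The sketch below assumes migration distances have been measured from the well for each ladder band; the ladder values are hypothetical and chosen for round numbers, and real ladders deviate from log-linearity near the ends of the resolving range.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_size_estimator(ladder_sizes_bp, ladder_distances_mm):
    """Fit log10(size) against migration distance using the DNA ladder."""
    slope, intercept = fit_line(ladder_distances_mm, [math.log10(s) for s in ladder_sizes_bp])
    lo, hi = min(ladder_distances_mm), max(ladder_distances_mm)

    def estimate(distance_mm):
        # Only interpolate between ladder bands; extrapolation is unreliable.
        if not lo <= distance_mm <= hi:
            raise ValueError("band lies outside the ladder; estimate would be an extrapolation")
        return 10 ** (slope * distance_mm + intercept)

    return estimate


# Hypothetical ladder bands (bp) and the distance each migrated from the well (mm).
estimate_size = make_size_estimator([10000, 1000, 100], [10.0, 30.0, 50.0])
print(f"{estimate_size(40.0):.0f} bp")
```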


Analyzing PCR amplicons or RFLP fragments confirms the presence of fragments of the expected size and alerts you to any non-specific amplicons. Electrophoresis also helps you assess ligation efficiency in plasmid cloning procedures, as well as how efficiently primer-dimers or other non-specific fragments were removed during sample cleanup.

For complex samples such as genomic DNA (gDNA) or total RNA, the shape and position of the smear from electrophoresis analysis directly correlates with the integrity of the samples. Nucleic acid species of larger size tend to be degraded first and provide degradation products of lower molecular weight. Samples of poor integrity generally have a higher abundance of shorter fragments, while high-quality samples contain intact nucleic acid molecules with higher molecular size.

Eukaryotic RNA samples have unique electrophoretic signatures, which consist of a smear with major fragments corresponding to 28S, 18S and 5S ribosomal RNA (rRNA). These electrophoretic patterns correlate with the integrity of the RNA samples. The RNA integrity can either be assessed manually or with automation that employs a dedicated algorithm such as the RNA Integrity Score (RIS) that gives an objective integrity grade to RNA samples ranging from 1–10. RNA samples of highest quality usually have a score of 8 or above.

  • Capillary Electrophoresis

In this method, charged DNA or RNA molecules are injected into a capillary and resolved as they migrate through a gel-like matrix. Nucleic acids are detected as they pass a detector that records their signal (absorbance or fluorescence, depending on the instrument). Results are presented as an electropherogram, a plot of signal intensity against migration time. Fragment sizes are precisely determined using a size marker consisting of fragments of known size. Compared with agarose gel electrophoresis, this method provides higher-resolution, more sensitive nucleic acid analysis that is faster and safer.
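The fragment size read from an electropherogram is often combined with a fluorometric concentration to express a library in molar units, which is how sequencing loading concentrations are commonly specified. A minimal sketch, assuming an average mass of about 660 g/mol per base pair of dsDNA; the function name and example values are hypothetical.

```python
AVG_MASS_PER_BP = 660.0  # approximate average molecular weight of one dsDNA base pair (g/mol)


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration into molarity for a dsDNA library.

    nM = (ng/uL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_fragment_bp)


# Hypothetical library: 2.0 ng/uL by fluorometry, 400 bp mean size from the electropherogram.
print(f"{library_molarity_nm(2.0, 400):.2f} nM")  # 7.58 nM
```

Because molarity depends inversely on fragment length, an inaccurate size estimate (for example, one skewed by a primer-dimer peak) propagates directly into the loading concentration.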



Hybrid Read Sequencing: Applications and Tools

Short-read next-generation sequencing (Illumina) and long-read sequencing (PacBio/Oxford Nanopore) platforms each have their own strengths and weaknesses. Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long but inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, inexpensive short-read technologies produce accurate but fragmented assemblies. The combination of these techniques has therefore led to an improved approach known as hybrid sequencing.

Hybrid sequencing methods use the high-throughput, high-accuracy short-read data to correct errors in the long reads. This approach reduces the amount of costlier long-read data required and results in more complete assemblies, including repetitive regions. Moreover, PacBio long reads can provide reliable alignments, scaffolds, and rough detection of genomic variants, while short reads refine the alignments, assemblies, and variant calls to single-nucleotide resolution. The high coverage of short-read sequencing data can also be used in downstream quantitative analysis1.
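The core idea of using accurate short reads to correct a noisy long read can be illustrated with a simple pileup majority vote. This toy sketch assumes the short reads have already been aligned to the long read without gaps at known offsets, and it only fixes substitutions. Real hybrid correctors, such as the tools described below, must also handle the insertion and deletion errors that dominate long reads, so this is not how any specific tool works.

```python
from collections import Counter


def correct_long_read(long_read, aligned_short_reads, min_depth=3):
    """Replace each long-read base with the majority base from covering short reads.

    aligned_short_reads: list of (offset, sequence) pairs, where offset is the
    0-based position in the long read at which the short read starts (gap-free).
    Positions covered by fewer than min_depth short reads are left unchanged.
    """
    pileup = [Counter() for _ in long_read]
    for offset, seq in aligned_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                pileup[pos][base] += 1

    corrected = []
    for original_base, counts in zip(long_read, pileup):
        if sum(counts.values()) >= min_depth:
            corrected.append(counts.most_common(1)[0][0])
        else:
            corrected.append(original_base)
    return "".join(corrected)


# Hypothetical data: the true sequence is ACGTACGTAC; the long read carries two substitution errors.
long_read = "ACGTTCGAAC"
short_reads = [(0, "ACGTAC"), (1, "CGTACG"), (2, "GTACGT"), (3, "TACGTA"), (4, "ACGTAC")]

corrected = correct_long_read(long_read, short_reads)
print(corrected)  # ACGTACGTAC
```

The `min_depth` guard mirrors a general point about hybrid correction: regions with too little short-read coverage cannot be corrected and keep the long read's original bases.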


De novo sequencing

As alternatives to using PacBio sequencing alone for eukaryotic de novo assemblies, error correction strategies using hybrid sequencing have also been developed.

  • Koren et al. developed the PacBio corrected Reads (PBcR) approach, which uses short reads to correct errors in long reads2. PBcR has been applied to reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus. The long-read correction approach achieved >99.9% base-call accuracy, leading to substantially better assemblies than non-hybrid sequencing strategies.
  • Also, Bashir et al. used hybrid sequencing data to assemble the two-chromosome genome of a Haitian cholera outbreak strain at >99.9% accuracy in two nearly finished contigs, completely resolving complex regions with clinically relevant structures3.
  • More recently, Goodwin et al. developed Nanocorr, an open-source algorithm specifically for hybrid error correction of Oxford Nanopore reads. They used it with complementary MiSeq data to produce a highly contiguous and accurate de novo assembly of the Saccharomyces cerevisiae genome. The contig N50 length was more than ten times greater than that of an Illumina-only assembly, with >99.88% consensus identity when compared to the reference. Additionally, this assembly offered a complete representation of the features of the genome, with correctly assembled gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly4.
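The contig N50 used to compare assemblies is the length L such that contigs of length L or longer contain at least half of the total assembled bases. A minimal sketch of the calculation, with hypothetical contig lengths:

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total,
    summed from the longest contig downward, first reaches half the assembly size."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical assemblies of the same 480 kb genome (contig lengths in kb).
fragmented = [120, 95, 80, 60, 45, 30, 20, 15, 10, 5]
contiguous = [300, 120, 60]
print(n50(fragmented), n50(contiguous))
```

Both hypothetical assemblies contain the same total sequence, which is why N50 rather than total length is used to express how much more contiguous one assembly is than another.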

Transcript structure and Gene isoform identification

Besides genome assembly, hybrid sequencing can also be applied to the error correction of PacBio long reads of transcripts. Moreover, it could improve gene isoform identification and abundance estimation.

  • Along with genome assembly, Koren et al. used the PBcR method to identify and confirm full-length transcripts and gene isoforms. Because the length of single-molecule PacBio reads from RNA-Seq experiments falls within the size distribution of most transcripts, many PacBio reads represent near full-length transcripts. These long reads can therefore greatly reduce the need for transcript assembly, which requires complex algorithms for short reads, and can confidently detect alternatively spliced isoforms. However, the predominance of indel errors makes analysis of the raw reads challenging. Both sets of PacBio reads (before and after error correction) were aligned to the reference genome to identify the reads that matched the exon structure over the entire length of the annotated transcripts. Before correction, only 41 (0.1%) of the PacBio reads exactly matched the annotated exon structure; after correction, this number rose to 12,065 (24.1%).
  • Au et al. developed a computational tool called LSC for correcting raw PacBio reads using short reads5. Applying this tool to 100,000 human brain cerebellum PacBio subreads and 64 million 75-bp Illumina short reads, they reduced the error rate of the long reads by more than 3-fold. To identify and quantify full-length gene isoforms, they also developed an Isoform Detection and Prediction tool (IDP), which makes use of third-generation sequencing (TGS) long reads and second-generation sequencing (SGS) short reads6. Applying LSC and IDP to PacBio long reads and Illumina short reads of the human embryonic stem cell transcriptome, they detected several thousand RefSeq-annotated gene isoforms at full length. IDP-fusion has also been released for the identification of fusion genes, fusion sites, and fusion gene isoforms from cancer transcriptomes7.
  • Ning et al. developed an analysis method, HySeMaFi, to decipher gene splicing and estimate gene isoform abundance8. First, the method establishes the mapping relationship between the error-corrected long reads and the longest assembled contig for each corresponding gene. Based on this mapping, the true splicing pattern of each gene is detected, followed by quantification of its isoforms.
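Checking whether a read reproduces an annotated exon structure, as in the PBcR transcript analysis described above, essentially comes down to comparing splice junctions. The sketch below is a simplified illustration, not the procedure used in any of the cited studies: it assumes reads have already been spliced-aligned to the genome and represented as lists of exon coordinates, and it compares only the intron chain, so a read may start or end inside its first or last exon. All isoform and read names are hypothetical.

```python
def intron_chain(exons):
    """Return the introns implied by a list of (start, end) exon coordinates."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def matching_isoform(read_exons, annotation):
    """Return the annotated isoform whose intron chain the read matches exactly, or None."""
    chain = intron_chain(read_exons)
    for name, exons in annotation.items():
        if chain == intron_chain(exons):
            return name
    return None


# Hypothetical gene with two annotated isoforms (half-open genomic coordinates).
annotation = {
    "isoform_full": [(100, 200), (300, 400), (500, 600)],
    "isoform_skip": [(100, 200), (500, 600)],  # skips the middle exon
}
reads = {
    "read1": [(120, 200), (300, 400), (500, 580)],
    "read2": [(150, 200), (500, 590)],
    "read3": [(110, 198), (300, 400), (500, 560)],  # junction shifted by an uncorrected indel
}
for read_id, exons in reads.items():
    print(read_id, matching_isoform(exons, annotation))
matched = sum(matching_isoform(e, annotation) is not None for e in reads.values())
print(f"{matched}/{len(reads)} reads match an annotated exon structure")
```

The third read shows why raw long reads match annotations so rarely: a single indel near a splice site shifts the inferred junction and breaks an exact match, which error correction is meant to prevent.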

Personal transcriptomes

Personal transcriptomes are expected to have applications in understanding individual biology and disease, but short read sequencing has been shown to be insufficiently accurate for the identification and quantification of an individual’s genetic variants and gene isoforms9.

  • Using a hybrid sequencing strategy combining PacBio long reads and Illumina short reads, Tilgner et al. sequenced the lymphoblastoid transcriptomes of three family members in order to produce and quantify an enhanced personalized genome annotation. Around 711,000 CCS reads were used to identify novel isoforms, and ∼100 million Illumina paired-end reads were used to quantify the personalized annotation, which cannot be accomplished by the relatively small number of long reads alone. This method produced reads representing all splice sites of a transcript for most sufficiently expressed genes shorter than 3 kb. It provides a de novo approach for determining single-nucleotide variations, which could be used to improve RNA haplotype inference10.

Epigenetics research

  • Beckmann et al. demonstrated the ability of PacBio sequencing to recover previously discovered epigenetic motifs with m6A and m4C modifications in both low-coverage and high-contamination scenarios11. They were also able to recover many motifs from three mixed strains (E. coli, G. metallireducens, and C. salexigens), even when the motif sequences of the genomes of interest overlapped substantially, suggesting that PacBio sequencing is applicable to metagenomics. Their results also suggest that hybrid sequencing would be more cost-effective than PacBio sequencing alone for detecting and accurately defining k-mers in low-proportion genomes.

Hybrid assembly tools

Several algorithms have been developed that can help in the single molecule de novo assembly of genomes along with hybrid error correction using the short, high-fidelity sequences.

  • Jabba is a hybrid method that corrects long third-generation reads by mapping them to a corrected de Bruijn graph constructed from second-generation data. It uses a pseudo-alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds12. The tool is available here:
  • HALC is a high-throughput algorithm for long-read error correction. HALC aligns the long reads to short-read contigs from the same species with a relatively low identity requirement and constructs a contig graph. This tool was applied to E. coli, A. thaliana and Maylandia zebra data sets and has been shown to achieve up to 41% higher throughput than other existing algorithms while maintaining comparable accuracy13. HALC can be downloaded here:
  • The HYBRIDSPADES algorithm was developed for assembling short and long reads and benchmarked on several bacterial assembly projects. HYBRIDSPADES generated accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. This method was used to demonstrate the first complete circular chromosome assembly of a genome from single cells of Candidate Phylum TM6 using SMRT reads14. The tool is publicly available on this page:

Due to the constant development of new long read error correction tools, La et al. have recently published an open-source pipeline that evaluates the accuracy of these different algorithms15. LRCstats analyzed the accuracy of four hybrid correction methods for PacBio long reads over three data sets and can be downloaded here:

Sović et al. evaluated different non-hybrid and hybrid methods for de novo assembly using nanopore reads16. They benchmarked five non-hybrid assembly pipelines and two hybrid assemblers that use nanopore sequencing data to scaffold Illumina assemblies. Their results showed that hybrid methods are highly dependent on the quality of the NGS data but much less on the quality and coverage of the nanopore data, and that they performed relatively well at lower nanopore coverage. The implementation of this DNA Assembly benchmark is available here:


  1. Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics, Proteomics Bioinforma. 13, 278–289 (2015).
  2. Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotech 30, 693–700 (2012).
  3. Bashir, A. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol 30, (2012).
  4. Goodwin, S. et al. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res 25, (2015).
  5. Au, K. F., Underwood, J. G., Lee, L. & Wong, W. H. Improving PacBio Long Read Accuracy by Short Read Alignment. PLoS One 7, e46679 (2012).
  6. Au, K. F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl. Acad. Sci. 110, E4821–E4830 (2013).
  7. Weirather, J. L. et al. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Res. 43, e116 (2015).
  8. Ning, G. et al. Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome. 7, 43793 (2017).
  9. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods 10, 1177 (2013).
  10. Tilgner, H., Grubert, F., Sharon, D. & Snyder, M. P. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc. Natl. Acad. Sci. 111, 9869–9874 (2014).
  11. Beckmann, N. D., Karri, S., Fang, G. & Bashir, A. Detecting epigenetic motifs in low coverage and metagenomics settings. BMC Bioinformatics 15, S16 (2014).
  12. Miclotte, G. et al. Jabba: hybrid error correction for long sequencing reads. Algorithms Mol. Biol. 11, 10 (2016).
  13. Bao, E. & Lan, L. HALC: High throughput algorithm for long read error correction. BMC Bioinformatics 18, 204 (2017).
  14. Antipov, D., Korobeynikov, A., McLean, J. S. & Pevzner, P. A. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics 32, 1009–1015 (2016).
  15. La, S., Haghshenas, E. & Chauve, C. LRCstats, a tool for evaluating long reads correction methods. Bioinformatics (2017). doi:10.1093/bioinformatics/btx489
  16. Sović, I., Križanović, K., Skala, K. & Šikić, M. Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. Bioinformatics 32, 2582–2589 (2016).


PacBio vs. Oxford Nanopore sequencing

Long-read sequencing technologies developed by Pacific Biosciences and Oxford Nanopore overcome many of the limitations researchers face with short reads. Long reads improve de novo assembly and transcriptome analysis (gene isoform identification), and play an important role in metagenomics. Longer reads are also useful when assembling genomes that include large stretches of repetitive sequence.

Currently, there are two long-read sequencing platforms. To help researchers choose the platform with greater utility for their application, we compare the overall instrument specifications offered by PacBio and Oxford Nanopore, as well as published applications in the next-generation sequencing space.

[Table: comparison of PacBio and Oxford Nanopore instrument specifications]

(a) Oxford Nanopore charges an access fee that gives users one MinION/PromethION instrument, a starter pack of consumables, certain data services, and community-based support.

* Insufficient data

Although both PacBio and Oxford Nanopore generate longer reads than short-read Illumina or Ion sequencing, the higher error rate of both platforms remains an issue that needs addressing. Whereas PacBio reads a molecule multiple times to generate high-quality consensus data, Oxford Nanopore can only sequence a molecule twice. As a result, PacBio generates data with lower error rates than Oxford Nanopore, and has slightly better overall performance for applications such as the discovery of transcriptome complexity and sensitive identification of isoforms. On the other hand, MinION provides higher throughput, as nanopores can sequence multiple molecules simultaneously, so it is best suited for applications that require a larger amount of data10.
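Why reading the same molecule several times lowers the error rate can be illustrated with a deliberately simplified model: assume each pass miscalls a given base independently with probability p, and the consensus is wrong whenever at least half the passes are wrong. Real errors are not fully independent, long-read errors are mostly insertions and deletions, and actual consensus algorithms are alignment-based, so this sketch is illustrative only. The 13% per-pass error rate is an assumed value.

```python
from math import comb


def consensus_error(p, passes):
    """Probability that a majority-vote consensus is wrong under the toy model.

    Each pass is wrong independently with probability p; the consensus is
    counted as wrong when at least half of the passes are wrong (ties count
    as failures, which makes the estimate conservative).
    """
    k_min = (passes + 1) // 2
    return sum(comb(passes, k) * p**k * (1 - p) ** (passes - k) for k in range(k_min, passes + 1))


per_pass_error = 0.13  # assumed raw per-pass error rate, for illustration only
for n in (1, 3, 5, 9, 15):
    print(f"{n:2d} passes: {consensus_error(per_pass_error, n):.2e}")
```

Even under these crude assumptions, the consensus error falls rapidly with the number of passes, whereas a platform limited to two reads of each molecule cannot benefit in the same way.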

As long reads can provide large scaffolds, de novo assembly is one of the main applications of PacBio sequencing6. Though the error rate of PacBio data is higher than that of short-read Illumina or Ion sequencing, increased coverage or hybrid sequencing can greatly improve the accuracy of genome assembly. PacBio sequencing has been successfully used to finish the 100-contig draft genome of Clostridium autoethanogenum DSM 10061, a Class III genome, the most complex genome classification in terms of repeat content and repeat type. The genome has a 31.1% GC content and contains repeats, prophage, and nine copies of rRNA gene operons. Using a single PacBio library sequenced on two SMRT cells, the entire genome was assembled de novo into a single contig. When short-read Illumina or Ion sequencing was used alone on the same genome, >22 contigs were needed and each assembly contained at least four collapsed repeat regions, whereas the PacBio assemblies had none11.

PacBio sequencing has also been used to assemble the chloroplast genome of Potentilla micrantha12, as well as the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster, using fewer contigs and less CPU time for assembly compared to assemblies using Illumina sequencers13.

PacBio sequencing of PCR products can be used to improve the quality of current draft genomes by closing gaps and sequencing through hairpin structures and areas of high GC content more efficiently than Sanger sequencing14.

Pacific Biosciences has developed a protocol, Iso-Seq, for transcript sequencing that covers library construction, size selection, sequencing data collection, and data processing. Iso-Seq allows direct sequencing of transcripts up to 10 kb without the use of a reference genome. Iso-Seq has been used to characterize alternative splicing events involved in the formation of blood cellular components15. Characterizing such events is essential for interpreting the effects of mutations leading to inherited disorders and blood cancers, and can inform strategies to advance transplantation and regenerative medicine.

Another major application of PacBio sequencing is in epigenetics research. Recent studies demonstrate that the direct detection of modifications in single molecules by PacBio sequencing facilitates investigation of intercellular heterogeneity in previously undetectable genomic DNA modifications (such as m6A and m4C)16.

Compared to PacBio, the Oxford Nanopore MinION is small (the size of a USB thumb drive), affordable, uses a simple library prep and is field portable17. This is useful in situations such as a virus outbreak, where a mobile diagnostic laboratory can be set up using MinIONs. In remote regions such as parts of Brazil and Africa, where shipping samples for sequencing poses logistical problems, MinION can provide immediate, real-time data to investigators. The most notable clinical use of MinION has been the on-site analysis of Ebola samples during the viral outbreak in West Africa18,19.

The low sequencing cost and portability of the MinION sequencer also make it a useful teaching tool. It has been used to give students hands-on experience, most recently at Columbia University and the University of California, Santa Cruz, where every student performed their own MinION sequencing20.

Perhaps the most ambitious MinION application is its potential to detect and identify bacteria and viruses on manned space flights. In a proof-of-concept experiment, Castro-Wallace et al. demonstrated successful sequencing and de novo assembly of a lambda phage genome, an E. coli genome, and a mouse mitochondrial genome. They observed no significant difference in the quality of sequence data generated on the International Space Station and in control experiments performed in parallel on Earth21.

Recently, Oxford Nanopore developed a bench-top instrument, PromethION, that provides high-throughput sequencing and is modular in design. It holds 48 flow cells that can be run individually or in parallel. Each PromethION flow cell contains 3,000 channels and produces up to 40 Gb of data.
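The throughput figures above translate into sequencing depth with simple arithmetic: maximum yield is the number of flow cells times the yield per flow cell, and mean coverage is yield divided by genome size. A minimal sketch, assuming a human genome size of about 3.1 Gb and that every flow cell reaches the quoted maximum, which real runs often will not:

```python
def total_yield_gb(flow_cells, gb_per_flow_cell):
    """Upper-bound yield when all flow cells are run and each reaches its maximum."""
    return flow_cells * gb_per_flow_cell


def mean_coverage(yield_gb, genome_size_gb):
    """Average sequencing depth (fold coverage) = bases sequenced / genome size."""
    return yield_gb / genome_size_gb


HUMAN_GENOME_GB = 3.1  # assumed approximate haploid human genome size

print(total_yield_gb(48, 40))  # 1920
print(f"{mean_coverage(40, HUMAN_GENOME_GB):.1f}x per flow cell")  # 12.9x per flow cell
```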



  1. Pacific Biosciences – AllSeq. Available at:
  2. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
  3. Lu, H., Giordano, F. & Ning, Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics Proteomics Bioinformatics 14, 265–279 (2016).
  4. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. bioRxiv (2017).
  5. Jain, M. et al. MinION Analysis and Reference Consortium: Phase 2 data release and analysis of R9.0 chemistry [version 1; referees: awaiting peer review]. F1000Research 6, (2017).
  6. Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics, Proteomics Bioinforma. 13, 278–289 (2015).
  7. MinION. Available at:
  8. PromethION Early Access Programme. Available at:
  9. Oxford Nanopore in 2016. Available at:
  10. Weirather, J. L. et al. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Research 6, 100 (2017).
  11. Brown, S. D. et al. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia. Biotechnol. Biofuels 7, 40 (2014).
  12. Ferrarini, M. et al. An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics 14, 670 (2013).
  13. Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotech 33, 623–630 (2015).
  14. Zhang, X. et al. Improving genome assemblies by sequencing PCR products with PacBio. Biotechniques 53, 61–62 (2012).
  15. Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, (2014).
  16. Feng, Z., Li, J., Zhang, J.-R. & Zhang, X. qDNAmod: a statistical model-based tool to reveal intercellular heterogeneity of DNA modification from SMRT sequencing data. Nucleic Acids Res. 42, 13488–13499 (2014).
  17. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. Erratum to: The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 256 (2016).
  18. Quick, J. et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228–232 (2016).
  19. Hoenen, T. et al. Nanopore sequencing as a rapidly deployable Ebola outbreak tool. Emerg. Infect. Dis. 22, 331–334 (2016).
  20. Citizen Sequencers: Taking Oxford Nanopore’s MinION to the Classroom and Beyond – Bio-IT World. Available at:
  21. Castro-Wallace, S. L. et al. Nanopore DNA Sequencing and Genome Assembly on the International Space Station. bioRxiv (2016).

AGBT 2014 – Summary of Day 1


The first day of the Advances in Genome Biology & Technology (AGBT) meeting kicked off with an introduction by Eric Green, Director of the National Human Genome Research Institute. He announced that this 15th annual meeting was the largest ever, with 850 people expected to attend. The opening plenary session certainly did not look like it had 850 people in attendance: Winter Storm Pax wreaked havoc on flights coming in from Atlanta and other cities, resulting in several speaker and general attendee cancellations.

The plenary session began with scheduled talks by Aviv Regev, Jeanne Lawrence, Wendy Winckler and Valerie Schneider. Jeanne Lawrence couldn’t make it, which was a shame, particularly since she gave a brilliant talk at ASHG on using a single gene, XIST, to shut down the extra copy of chromosome 21 in Down syndrome. This work was nicely summarized in a publication that came out last summer titled “Translating dosage compensation to trisomy 21”.

Aviv Regev’s and Wendy Winckler’s talks were subject to a blog/tweet embargo (it was unclear whether Regev’s talk was entirely under embargo or only its second half; we’re playing it safe and not discussing it here), leaving Valerie Schneider’s presentation as the only one that was tweeted or written about. This instantly created great angst among those attending the lectures, those stuck in airports en route to AGBT and those at home waiting for in-depth coverage.

Single-cell sequencing, named “Method of the Year” by Nature Methods, was the basis of the opening lecture. Aviv Regev offered an excellent view of the dendritic cell network based on cyclical perturbations and variations between single cells. The first half of her presentation, titled “Harnessing Variation Between Single Cells to Decipher Intra and Intercellular Circuits in Immune Cells”, was largely covered by her publication in April, “Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells”.

The second talk, by Wendy Winckler, could not be discussed or tweeted, according to Winckler, courtesy of Novartis’s communications department. The title of her presentation, “Next Generation Diagnostics for Precision Cancer Medicine”, wasn’t revealing either. To get an idea of what she’s working on and the direction of her lecture, you can read these recent publications.

The final talk by Valerie Schneider, titled “Taking advantage of GRCh38”, began with an analogy to an unwanted pair of socks one receives for Christmas that ends up being used and eventually really liked: “It was time for an update….whether or not it was on your wish list”. We were reminded that centromeres are specialized chromatin structures important for cell division, but because of their repetitive sequence, they have traditionally not been represented in reference assemblies. Previous versions of the human reference assembly represented centromeres with a 3 Mb gap. The latest assembly, GRCh38, incorporates centromere models generated from whole genome shotgun reads from the Venter sequencing project. Since there are two copies of each centromere for each autosome, these centromere models represent an average of the two copies. She concluded her presentation by urging users to switch now.

After a short break from the talks, the closing reception sponsored by Roche began outside. Halfway through, a brief but sudden Florida thundershower sent the entire AGBT community scurrying indoors for shelter. That was okay, though, because the conversations just continued indoors. We’re looking forward to tomorrow morning’s lectures, which include several of the ones we’ve highlighted.


3 Top Factors Researchers Consider When Selecting an NGS Provider

At Genohub, not only do we seek feedback from researchers, but our development methodology is almost entirely based on that feedback. We receive it through website forms as well as routine one-on-one conversations with some of the top researchers using next generation sequencing for their projects. Through this data and interaction, certain trends have emerged that may be useful to an NGS provider seeking additional projects. This list is not based on a controlled experiment; however, countless conversations indicate that these factors are extremely important:

  1. Turnaround time – this one is a toss-up with price, but we typically find turnaround time to be among the leading factors in a researcher’s decision to select an NGS provider. We have heard quite a few stories of researchers facing turnaround times of several months or more for library prep and sequencing.
  2. Price – while this is one of the biggest factors for researchers, it must be weighed against established trust, which is the next major factor.
  3. Trust – this one is a biggie for many researchers and is often a non-starter if not established. The main reason is that researchers are hesitant to ship their precious samples (e.g., human brain tissue) to an NGS provider for often costly sequencing if they are not confident in its abilities. Researchers have told us some of the things they look for that help build their confidence:
    • Referrals & Reviews – researchers seek out colleagues who have done similar projects and look for recommendations. Word of mouth is one of the biggest methods researchers rely on to select an NGS provider.
    • Publications – providers who are listed in publications involving similar projects.
    • QC – what kind of QC will be run on the sample.
    • Overall experience indicators such as time in business and volume of samples regularly handled.
    • Data and sample security.
    • Location – this factor is especially important if trust has not previously been established. Some researchers have absolutely no problem shipping samples across the globe, while others might physically drive their samples to a local provider to ensure sample integrity.

We would love to hear your feedback on this topic, whether you are an NGS provider or a researcher actively using next generation sequencing. What other decision-driving criteria have you found as a provider, or what other factors are important to you as a researcher?

In a Nutshell: Life Tech Exome Certified Service Provider Program

Life Technologies announced yesterday that it has launched the Ion AmpliSeq Exome Certified Service Provider Program.

What the program is in a nutshell:

  • Goals: Offer a network of next gen sequencing providers able to help researchers get a high quality exome sequence at a reduced cost with fast turnaround times and low amounts of input material
  • Exome sequencing inputs: as little as 50ng of customer DNA
  • Library kit used: Ion AmpliSeq Exome kit
  • NGS Instrument used: Ion Proton
  • Exome sequencing outputs: high quality data, which of course can be used with Ion Reporter Software for mutation validation, annotation, and reporting

The Service Provider Program is intended to meet exome sequencing demand that, Life Tech argues, has been underserved: exome sequencing currently costs $1,000+, has turnaround times of up to 8 weeks, and requires up to 3 µg of DNA. Dr. Candace Johnson, Deputy Director and the Wallace Chair of Translational Research at Roswell Park Cancer Institute, states, “Exome sequencing will be central to discoveries made in clinical research”. If the Exome CSP delivers as promised, it could have a major impact in accelerating discoveries made in clinical research.

For more information on the Life Tech Provider Program, please see the full press release.

Targeted Resequencing (TPS/WES) Tops Next Gen Sequencing Survey

Oxford Gene Technology (an NGS provider currently listed on Genohub) recently presented the results of its next gen sequencing survey, which identified targeted resequencing as the top use for next generation sequencing. The results are based on a survey of 596 researchers who responded regarding their current and expected use of NGS services. Compared with whole genome sequencing, the popularity of targeted resequencing may be due mostly to its lower cost. This infographic depicts the results:

OGT NGS Survey Results

Other interesting results point to a general data problem, with 38% of respondents saying they lack trust in bioinformatics data. Bioinformatics also topped the list when researchers were asked about the biggest barrier to NGS usage (see below).

Barriers to NGS Usage

Undoubtedly, this presents an immense opportunity for the bioinformatics sector to increase confidence in data accuracy and interpretation, which could have a positive impact on the use of next gen sequencing as a whole.

You can find many more interesting survey results in the excellent infographic titled Oxford Gene Technology – NGS Survey 2013.