Hybrid Read Sequencing: Applications and Tools

Next-generation sequencing (Illumina) and long-read sequencing (PacBio/Oxford Nanopore) platforms each have their own strengths and weaknesses. Recent advances in single-molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long but inaccurate reads. However, these approaches require high long-read coverage and remain expensive. Inexpensive short-read technologies, on the other hand, produce accurate but fragmented assemblies. Combining the two led to a new, improved approach known as hybrid sequencing.

Hybrid sequencing methods use high-throughput, high-accuracy short-read data to correct errors in long reads. This approach reduces the amount of costlier long-read data required and yields more complete assemblies, including repetitive regions. Moreover, PacBio long reads can provide reliable alignments, scaffolds, and rough detection of genomic variants, while short reads refine the alignments, assemblies, and variant calls to single-nucleotide resolution. The high coverage of short-read data can also be used in downstream quantitative analysis1.

Applications

De novo sequencing

As alternatives to using PacBio sequencing alone for eukaryotic de novo assemblies, error correction strategies using hybrid sequencing have also been developed.

  • Koren et al. developed the PacBio corrected Reads (PBcR) approach, which uses short reads to correct errors in long reads2. PBcR has been applied to reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced parrot (Melopsittacus undulatus). The long-read correction approach achieved >99.9% base-call accuracy, leading to substantially better assemblies than non-hybrid sequencing strategies.
  • Bashir et al. used hybrid sequencing data to assemble the two-chromosome genome of a Haitian cholera outbreak strain at >99.9% accuracy in two nearly finished contigs, completely resolving complex regions with clinically relevant structures3.
  • More recently, Goodwin et al. developed Nanocorr, an open-source algorithm designed specifically for hybrid error correction of Oxford Nanopore reads. Combining it with complementary MiSeq data, they produced a highly contiguous and accurate de novo assembly of the Saccharomyces cerevisiae genome. The contig N50 was more than ten times greater than that of an Illumina-only assembly, with >99.88% consensus identity to the reference. Additionally, the assembly gave a complete representation of the genome, with correctly assembled gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent from the Illumina-only assembly4.
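Contig N50, the statistic used above to compare the Nanocorr-based and Illumina-only assemblies, is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal Python sketch (the contig lengths in the usage line are made up for illustration):

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("assembly has no contigs")
    half_total = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half_total:  # this contig pushes coverage past 50%
            return length
    return lengths[-1]  # not reached: the loop always returns


# Hypothetical contig lengths (bp): the total is 1,500, so N50 is 400.
print(n50([100, 200, 300, 400, 500]))  # 400
```

N50 only measures contiguity; the consensus identity quoted above is a separate measure of correctness.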

Transcript structure and gene isoform identification

Besides genome assembly, hybrid sequencing can also be applied to the error correction of PacBio long reads from transcripts, which can improve gene isoform identification and abundance estimation.

  • Along with genome assembly, Koren et al. used the PBcR method to identify and confirm full-length transcripts and gene isoforms. Because the length of single-molecule PacBio reads from RNA-Seq experiments falls within the size distribution of most transcripts, many PacBio reads represent near full-length transcripts. These long reads can therefore greatly reduce the need for transcript assembly, which requires complex algorithms for short reads, and can confidently detect alternatively spliced isoforms. However, the predominance of indel errors makes analysis of the raw reads challenging. Both sets of PacBio reads (before and after error correction) were aligned to the reference genome to identify those that matched the exon structure over the entire length of the annotated transcripts. Before correction, only 41 (0.1%) of the PacBio reads exactly matched the annotated exon structure; after correction, this rose to 12,065 (24.1%).
  • Au et al. developed LSC, a computational tool for correcting raw PacBio reads with short reads5. Applying it to 100,000 human brain cerebellum PacBio subreads and 64 million 75-bp Illumina short reads, they reduced the error rate of the long reads by more than 3-fold. To identify and quantify full-length gene isoforms, they also developed the Isoform Detection and Prediction tool (IDP), which makes use of third-generation sequencing (TGS) long reads and second-generation sequencing (SGS) short reads6. Applying LSC and IDP to PacBio long reads and Illumina short reads of the human embryonic stem cell transcriptome, they detected several thousand RefSeq-annotated gene isoforms at full length. IDP-fusion has also been released for identifying fusion genes, fusion sites, and fusion gene isoforms in cancer transcriptomes7.
  • Ning et al. developed HySeMaFi, an analysis method for deciphering gene splicing and estimating gene isoform abundance8. The method first establishes the mapping relationship between the error-corrected long reads and the longest assembled contig of each corresponding gene. From these mappings, the true splicing pattern of each gene is detected, and the isoforms are then quantified.
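The exact-match test in the PBcR transcript analysis can be thought of as a comparison of splice junctions. Below is a minimal sketch, assuming each aligned read and each annotated transcript has already been reduced to a list of exon blocks in genomic coordinates. The helper names and the matching criterion (identical intron chains, plus overlap for single-exon transcripts) are hypothetical and are not necessarily the exact rule Koren et al. used.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def intron_chain(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Introns implied by consecutive exon blocks of one alignment or transcript."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def matches_exon_structure(read_exons: List[Exon], annotated_exons: List[Exon]) -> bool:
    """Hypothetical criterion: the read has exactly the annotated splice junctions
    and overlaps the annotated transcript span (matters only for single-exon genes)."""
    if intron_chain(read_exons) != intron_chain(annotated_exons):
        return False
    read_start, read_end = min(s for s, _ in read_exons), max(e for _, e in read_exons)
    ann_start, ann_end = min(s for s, _ in annotated_exons), max(e for _, e in annotated_exons)
    return read_start <= ann_end and read_end >= ann_start


def fraction_matching(reads: List[List[Exon]], annotated_exons: List[Exon]) -> float:
    """Share of aligned reads whose exon structure matches the annotation."""
    hits = sum(matches_exon_structure(r, annotated_exons) for r in reads)
    return hits / len(reads)


annotated = [(100, 200), (300, 400), (500, 600)]
corrected = [(120, 200), (300, 400), (500, 580)]       # same junctions, shorter ends
raw_with_indel = [(120, 200), (300, 399), (502, 580)]  # indels shift two junctions
skipped_exon = [(120, 200), (500, 580)]                # a different isoform
print(round(fraction_matching([corrected, raw_with_indel, skipped_exon], annotated), 2))  # 0.33
```

A single uncorrected indel near a splice site is enough to shift a junction and fail the test, which is consistent with how few raw reads matched before correction.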

Personal transcriptomes

Personal transcriptomes are expected to have applications in understanding individual biology and disease, but short read sequencing has been shown to be insufficiently accurate for the identification and quantification of an individual’s genetic variants and gene isoforms9.

  • Using a hybrid sequencing strategy combining PacBio long reads and Illumina short reads, Tilgner et al. sequenced the lymphoblastoid transcriptomes of three family members to produce and quantify an enhanced, personalized genome annotation. Around 711,000 circular consensus sequence (CCS) reads were used to identify novel isoforms, and ∼100 million Illumina paired-end reads were used to quantify the personalized annotation, a task the relatively small number of long reads could not accomplish alone. For most sufficiently expressed genes shorter than 3 kb, this method produced reads representing all splice sites of a transcript. It also provides a de novo approach for determining single-nucleotide variants, which could be used to improve RNA haplotype inference10.

Epigenetics research

  • Beckmann et al. demonstrated the ability of PacBio sequencing to recover previously discovered epigenetic motifs with m6A and m4C modifications in both low-coverage and high-contamination scenarios11. They were also able to recover many motifs from a mixture of three strains (E. coli, G. metallireducens, and C. salexigens), even when the motif sequences of the genomes of interest overlapped substantially, suggesting that PacBio sequencing is applicable to metagenomics. Their results suggest that hybrid sequencing would be more cost-effective than PacBio sequencing alone for detecting and accurately defining k-mers in genomes present at low proportions.

Hybrid assembly tools

Several algorithms have been developed to support single-molecule de novo genome assembly combined with hybrid error correction using short, high-fidelity sequences.

  • Jabba is a hybrid method that corrects long third-generation reads by mapping them onto a corrected de Bruijn graph constructed from second-generation data. It uses a pseudo-alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds12. The tool is available here: https://github.com/biointec/jabba.
  • HALC is a high-throughput algorithm for long-read error correction. HALC aligns long reads to short-read contigs from the same species with a relatively low identity requirement and constructs a contig graph. The tool was applied to E. coli, A. thaliana and Maylandia zebra data sets and has been shown to achieve up to 41% higher throughput than other existing algorithms while maintaining comparable accuracy13. HALC can be downloaded here: https://github.com/lanl001/halc.
  • The hybridSPAdes algorithm was developed for assembling short and long reads and was benchmarked on several bacterial assembly projects. hybridSPAdes generated accurate assemblies (even in projects with relatively low long-read coverage), thus reducing the overall cost of genome sequencing. It was used to produce the first complete circular chromosome assembly of a genome from single cells of Candidate Phylum TM6 using SMRT reads14. The tool is publicly available here: http://bioinf.spbau.ru/en/spades.
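Jabba and HALC both start from a graph or contigs built from short reads and then anchor long reads to it. The toy Python sketch below illustrates only those two ideas: it builds a small de Bruijn graph from short reads, dropping k-mers seen fewer than a hypothetical `min_count` times as likely errors (a crude stand-in for proper graph correction), and reports stretches of a long read that spell a graph path exactly. These stretches approximate seeds; they are not true MEMs, and nothing here reproduces Jabba's or HALC's actual algorithms.

```python
from collections import Counter
from typing import Dict, Iterator, List, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping k-mers of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_dbg(short_reads: List[str], k: int, min_count: int = 2) -> Dict[str, Set[str]]:
    """Toy de Bruijn graph: nodes are k-mers seen at least min_count times in the
    short reads (rarer k-mers are treated as sequencing errors and dropped);
    edges join solid k-mers that overlap by k-1 bases."""
    counts = Counter(km for read in short_reads for km in kmers(read, k))
    solid = {km for km, c in counts.items() if c >= min_count}
    return {km: {km[1:] + b for b in "ACGT" if km[1:] + b in solid} for km in solid}


def exact_match_seeds(long_read: str, graph: Dict[str, Set[str]], k: int) -> List[Tuple[int, int]]:
    """(start, end) intervals of the long read (end exclusive) that spell a path
    through the graph exactly, i.e. maximal runs of linked graph k-mers."""
    seeds: List[Tuple[int, int]] = []
    run_start = None  # start of the current run, or None outside a run
    prev = ""
    for i, km in enumerate(kmers(long_read, k)):
        if km not in graph:
            if run_start is not None:
                seeds.append((run_start, i - 1 + k))  # run ended at k-mer i-1
            run_start = None
        elif run_start is None or km not in graph[prev]:
            if run_start is not None:
                seeds.append((run_start, i - 1 + k))
            run_start = i  # start a new run at this k-mer
        prev = km
    if run_start is not None:
        seeds.append((run_start, len(long_read)))
    return seeds


genome = "ATGGCGTACGTTAGCCATGA"
short_reads = [genome[0:12], genome[4:16], genome[8:20]] * 2  # each region read twice
graph = build_dbg(short_reads, k=5)
long_read = genome[:10] + "C" + genome[11:]  # one substitution error at position 10
print(exact_match_seeds(long_read, graph, k=5))  # [(0, 10), (11, 20)]
```

The error at position 10 breaks the run of shared k-mers and splits the read into two exact-match seeds; a seed-and-extend corrector would then bridge the gap between them using the graph.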

Because new long-read error correction tools are constantly being developed, La et al. recently published LRCstats, an open-source pipeline that evaluates the accuracy of these algorithms15. LRCstats was used to analyze the accuracy of four hybrid correction methods for PacBio long reads over three data sets and can be downloaded here: https://github.com/cchauve/lrcstats.
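Evaluating a correction method boils down to comparing each corrected read with the sequence it came from, which is known exactly when reads are simulated. A minimal sketch, assuming the true sequence is available: the per-read error rate is taken as the Levenshtein (edit) distance divided by the true length. This is one common convention, not LRCstats's implementation, and the quadratic dynamic program below is only practical for short toy sequences.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def error_rate(read: str, truth: str) -> float:
    """Edit distance between a read and its true sequence, per true base."""
    return edit_distance(read, truth) / len(truth)


truth = "ACGTACGTAC"
raw = "ACCTACGAC"        # one substitution and one deletion
corrected = "ACGTACGAC"  # one deletion left after correction
print(error_rate(raw, truth), error_rate(corrected, truth))  # 0.2 0.1
```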

Sović et al. evaluated non-hybrid and hybrid methods for de novo assembly of nanopore reads16. They benchmarked five non-hybrid assembly pipelines and two hybrid assemblers that use nanopore data to scaffold Illumina assemblies. Their results showed that the hybrid methods depend heavily on the quality of the NGS data but much less on the quality and coverage of the nanopore data, and that they performed relatively well at lower nanopore coverage. The implementation of this DNA assembly benchmark is available here: https://github.com/kkrizanovic/NanoMark.

References:

  1. Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics, Proteomics Bioinforma. 13, 278–289 (2015).
  2. Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotech 30, 693–700 (2012).
  3. Bashir, A. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol 30, (2012).
  4. Goodwin, S. et al. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res 25, (2015).
  5. Au, K. F., Underwood, J. G., Lee, L. & Wong, W. H. Improving PacBio Long Read Accuracy by Short Read Alignment. PLoS One 7, e46679 (2012).
  6. Au, K. F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl. Acad. Sci. 110, E4821–E4830 (2013).
  7. Weirather, J. L. et al. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Res. 43, e116 (2015).
  8. Ning, G. et al. Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome. 7, 43793 (2017).
  9. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat. Methods 10, 1177 (2013).
  10. Tilgner, H., Grubert, F., Sharon, D. & Snyder, M. P. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc. Natl. Acad. Sci. 111, 9869–9874 (2014).
  11. Beckmann, N. D., Karri, S., Fang, G. & Bashir, A. Detecting epigenetic motifs in low coverage and metagenomics settings. BMC Bioinformatics 15, S16 (2014).
  12. Miclotte, G. et al. Jabba: hybrid error correction for long sequencing reads. Algorithms Mol. Biol. 11, 10 (2016).
  13. Bao, E. & Lan, L. HALC: High throughput algorithm for long read error correction. BMC Bioinformatics 18, 204 (2017).
  14. Antipov, D., Korobeynikov, A., McLean, J. S. & Pevzner, P. A. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics 32, 1009–1015 (2016).
  15. La, S., Haghshenas, E. & Chauve, C. LRCstats, a tool for evaluating long reads correction methods. Bioinformatics (2017). doi:10.1093/bioinformatics/btx489
  16. Sović, I., Križanović, K., Skala, K. & Šikić, M. Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. Bioinformatics 32, 2582–2589 (2016).

 

PacBio vs. Oxford Nanopore sequencing

Long-read sequencing technologies developed by Pacific Biosciences and Oxford Nanopore overcome many of the limitations researchers face with short reads. Long reads improve de novo assembly and transcriptome analysis (gene isoform identification), and they play an important role in metagenomics. They are also useful when assembling genomes that include large stretches of repetitive sequence.

Currently, there are two long-read sequencing platforms. To help researchers decide which platform better suits their application, we compare the overall instrument specifications of PacBio and Oxford Nanopore, along with their published applications in the next-generation sequencing space.

[Table image: PacBio vs. Oxford Nanopore instrument specifications]

a Oxford Nanopore charges an access fee that gives users one MinION/PromethION instrument, a starter pack of consumables, certain data services, and community-based support.

* Insufficient data

Although both PacBio and Oxford Nanopore generate longer reads than short-read Illumina or Ion sequencing, their higher error rates remain an issue that needs addressing. Whereas PacBio reads a molecule multiple times to generate high-quality consensus data, Oxford Nanopore can only sequence a molecule twice. As a result, PacBio generates data with lower error rates than Oxford Nanopore and performs slightly better overall in applications such as discovering transcriptome complexity and sensitively identifying isoforms. On the other hand, the MinION provides higher throughput because its nanopores sequence multiple molecules simultaneously, so it is best suited for applications that require larger amounts of data9.
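To see why reading the same molecule several times lowers the error rate, consider an idealized model: each pass calls a given base wrongly with an independent probability e, and the consensus is a simple majority vote. This is a deliberate simplification (real PacBio and nanopore errors are mostly indels and are not fully independent), and the 15% per-pass error below is a hypothetical value. The sketch only handles odd pass counts, since with an even count a simple vote can tie.

```python
from math import comb


def majority_consensus_error(per_pass_error: float, passes: int) -> float:
    """Probability that a simple majority vote over `passes` independent reads
    of the same base is wrong, assuming each pass is wrong with probability
    `per_pass_error` (idealized: independent, substitution-only errors)."""
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd number of passes so the vote cannot tie")
    wrong_needed = passes // 2 + 1  # minimum wrong calls to outvote the correct base
    return sum(
        comb(passes, j) * per_pass_error**j * (1 - per_pass_error) ** (passes - j)
        for j in range(wrong_needed, passes + 1)
    )


# Hypothetical 15% per-pass error rate: consensus error falls quickly with passes.
for n in (1, 3, 5, 9):
    print(n, majority_consensus_error(0.15, n))
```

The model counts any majority of wrong calls as a consensus error, so it is pessimistic when the wrong calls disagree with each other.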

Because long reads can produce large scaffolds, de novo assembly is one of the main applications of PacBio sequencing5. Although the error rate of PacBio data is higher than that of short-read Illumina or Ion sequencing, increased coverage or hybrid sequencing can greatly improve assembly accuracy. PacBio sequencing has been used successfully to finish the 100-contig draft genome of Clostridium autoethanogenum DSM 10061, a Class III genome, the most complex classification in terms of repeat content and repeat type. The genome has 31.1% GC content and contains repeats, prophage, and nine copies of rRNA gene operons. Using a single PacBio library sequenced on two SMRT cells, the entire genome could be assembled de novo into a single contig. When short-read Illumina or Ion sequencing was used alone on the same genome, >22 contigs were needed and each assembly contained at least four collapsed repeat regions, whereas the PacBio assemblies had none10.

PacBio sequencing has also been used to assemble the chloroplast genome of Potentilla micrantha11, as well as the Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster genomes, with fewer contigs and less CPU time than assemblies based on Illumina sequencing12.

PacBio sequencing of PCR products can be used to improve the quality of current draft genomes by closing gaps and sequencing through hairpin structures and areas of high GC content more efficiently than Sanger sequencing13.

Pacific Biosciences has developed Iso-Seq, a transcript sequencing protocol that covers library construction, size selection, sequencing data collection, and data processing. Iso-Seq allows direct sequencing of transcripts up to 10 kb without a reference genome. It has been used to characterize alternative splicing events involved in the formation of blood cellular components14. Such characterization is essential for interpreting the effects of mutations that lead to inherited disorders and blood cancers, and it can inform strategies to advance transplantation and regenerative medicine.

Another major application of PacBio sequencing is epigenetics research. Recent studies demonstrate that the direct detection of modifications in single molecules by PacBio sequencing facilitates the investigation of intercellular heterogeneity in previously undetectable genomic DNA modifications (such as m6A and m4C)15.

Compared to PacBio, the Oxford Nanopore MinION is small (the size of a USB thumb drive), affordable, uses a simple library prep and is field portable16. This is useful in situations such as a virus outbreak, where a mobile diagnostic laboratory can be set up using MinIONs. In remote regions such as parts of Brazil and Africa, where shipping samples for sequencing is logistically difficult, the MinION can provide immediate, real-time data to investigators. The most notable clinical use of the MinION has been the on-site analysis of Ebola samples during the viral outbreak in West Africa17,18.

The low sequencing cost and portability of the MinION also make it a useful teaching tool. It has been used to give students hands-on experience, most recently at Columbia University and the University of California Santa Cruz, where every student performed their own MinION sequencing19.

Perhaps the most ambitious MinION application is its potential to detect and identify bacteria and viruses on manned space flights. In a proof-of-concept experiment, Castro-Wallace et al. demonstrated successful sequencing and de novo assembly of a lambda phage genome, an E. coli genome, and a mouse mitochondrial genome. They observed no significant difference in quality between the sequence data generated on the International Space Station and those from control experiments performed in parallel on Earth22.

Recently, Oxford Nanopore developed PromethION, a modular bench-top instrument that provides high-throughput sequencing. It holds 48 flow cells that can be run individually or in parallel. PromethION flow cells contain 3,000 channels each and produce up to 40 Gb of data.

 

References:

  1. Pacific Biosciences – AllSeq. Available at: http://allseq.com/knowledge-bank/sequencing-platforms/pacific-biosciences/.
  2. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
  3. Lu, H., Giordano, F. & Ning, Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics Proteomics Bioinformatics 14, 265–279 (2016).
  4. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. bioRxiv (2017).
  5. Jain, M. et al. MinION Analysis and Reference Consortium: Phase 2 data release and analysis of R9.0 chemistry [version 1; referees: awaiting peer review]. F1000Research 6, (2017).
  6. Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics, Proteomics Bioinforma. 13, 278–289 (2015).
  7. MinION. Available at: https://nanoporetech.com/products/minion.
  8. PromethION Early Access Programme. Available at: https://nanoporetech.com/community/promethion-early-access-programme.
  9. Oxford Nanopore in 2016. Available at: http://blog.booleanbiotech.com/nanopore_2016.html.
  10. Weirather, J. L. et al. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Research 6, 100 (2017).
  11. Brown, S. D. et al. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia. Biotechnol. Biofuels 7, 40 (2014).
  12. Ferrarini, M. et al. An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics 14, 670 (2013).
  13. Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotech 33, 623–630 (2015).
  14. Zhang, X. et al. Improving genome assemblies by sequencing PCR products with PacBio. Biotechniques 53, 61–62 (2012).
  15. Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, (2014).
  16. Feng, Z., Li, J., Zhang, J.-R. & Zhang, X. qDNAmod: a statistical model-based tool to reveal intercellular heterogeneity of DNA modification from SMRT sequencing data. Nucleic Acids Res. 42, 13488–13499 (2014).
  17. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. Erratum to: The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 256 (2016).
  18. Quick, J. et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228–232 (2016).
  19. Hoenen, T. et al. Nanopore sequencing as a rapidly deployable Ebola outbreak tool. Emerg. Infect. Dis. 22, 331–334 (2016).
  20. Citizen Sequencers: Taking Oxford Nanopore’s MinION to the Classroom and Beyond – Bio-IT World. Available at: http://www.bio-itworld.com/2015/12/9/citizen-sequencers-taking-oxford-nanopores-minion-classroom-beyond.html.
  21. Castro-Wallace, S. L. et al. Nanopore DNA Sequencing and Genome Assembly on the International Space Station. bioRxiv (2016).

6 Methods to Fragment Your DNA / RNA for Next-Gen Sequencing

The preparation of a high quality sequencing library plays an important role in next-generation sequencing (NGS). The first main step in preparing nucleic acid for NGS is fragmentation. In the next series of blog posts we will present important challenges and things to consider as you isolate nucleic acid samples and prepare your own libraries.

Next-generation sequencing will give you a plethora of reads, but they will be short. Illumina and Ion read lengths are currently under 600 bases, Roche 454 reads are under 1 kb, and PacBio reads are under 9 kb. This makes sizing your input DNA or RNA important before library construction. There are three main ways to break long nucleic acid material into fragments compatible with next-gen sequencing: 1) physical, 2) enzymatic and 3) chemical shearing.

Physical Fragmentation

1) Acoustic shearing

2) Sonication

3) Hydrodynamic shear

Acoustic shearing and sonication are the main physical methods used to shear DNA. The Covaris® instrument (Woburn, MA) is an acoustic device that breaks DNA into fragments of 100 bp to 5 kb. Covaris also manufactures tubes (g-TUBEs) that process samples in the 6-20 kb range for mate-pair libraries. The Bioruptor® (Denville, NJ) is a sonication device used for shearing chromatin and DNA and for disrupting tissues; small volumes of DNA can be sheared into 150 bp to 1 kb fragments. Hydroshear from Digilab (Marlborough, MA) uses hydrodynamic forces to shear DNA. Nebulizers (Life Tech, Grand Island, NY) can also be used: they atomize liquid with compressed air, shearing DNA into 100 bp to 3 kb fragments in seconds. While nebulization is low cost and doesn’t require purchasing an instrument, it is not recommended if you have limited starting material, because you can lose up to 30% of your DNA. The sonication and acoustic shearing devices described above are better suited to small volumes and retain more of your DNA.

Enzymatic Methods

4) DNase I, restriction endonucleases or non-specific nucleases

5) Transposase

Enzymatic methods for shearing DNA into small pieces include DNase I; NEB’s (Ipswich, MA) Fragmentase, which combines maltose binding protein (MBP)-T7 Endo I with a non-specific nuclease from Vibrio vulnificus (Vvn); and Nextera tagmentation technology (Illumina, San Diego, CA). The non-specific nuclease and T7 Endo I work synergistically to produce random nicks and counter-nicks within 8 nucleotides or less of each other, so that the DNA dissociates into double-stranded fragments. Tagmentation uses a transposase to simultaneously fragment dsDNA and insert adapters. Enzymatic fragmentation has generally been shown to be consistent, but worse than physical shearing in terms of bias and the detection of insertions and deletions (indels) (Knierim et al., 2011). Depending on your application (e.g., de novo genome sequencing vs. small genome re-sequencing), biases associated with enzymatic fragmentation may be less important.

RNase III is an endonuclease that cleaves RNA into small fragments with 5’ phosphate and 3’ hydroxyl groups. These end groups are needed for RNA ligation, which makes the assay convenient, but RNase III has a sequence preference that biases cleavage. The heat/chemical methods described below leave 3’ phosphate and 5’ hydroxyl ends, but they show less sequence bias and are generally preferred in library preparation.

Chemical Fragmentation

6) Heat and divalent metal cation

Chemical shearing is typically reserved for breaking up long RNA fragments. It is usually performed by heat digestion of RNA with a divalent metal cation (magnesium or zinc). The fragment length (115-350 nt) can be adjusted by increasing or decreasing the incubation time.

The size of your DNA or RNA insert is a key factor in library construction and sequencing: you’ll need to choose an instrument and read length compatible with your insert length. You can do this by entering project parameters on the Shop by Project page and filtering by read length (estimated insert length). If you’re not sure, we can help; send us a request through our consultation form.
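As a rough guide to matching read length to insert length, the sketch below classifies the three situations that arise with paired-end reads. The function name and the example insert sizes are hypothetical; adapter read-through and mate overlap are general properties of adapter-ligated libraries, not rules specific to any Genohub filter.

```python
def describe_layout(insert_len: int, read_len: int, paired: bool = True) -> str:
    """Classify how reads of read_len bases sit on an insert of insert_len bases."""
    if insert_len < read_len:
        return "adapter read-through"  # the read runs past the insert into the adapter
    if not paired:
        return "read within insert"
    overlap = 2 * read_len - insert_len  # bases read by both mates
    if overlap > 0:
        return f"mates overlap by {overlap} bp"
    return f"unsequenced inner gap of {-overlap} bp"


for insert in (120, 250, 500):  # hypothetical insert lengths with 2 x 150 bp reads
    print(insert, describe_layout(insert, 150))
```

Read-through means adapter sequence must be trimmed before alignment; overlapping mates can be merged into a single longer read.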

Reference:

Systematic Comparison of Three Methods for Fragmentation of Long-Range PCR Products for Next Generation Sequencing

Ellen Knierim, Barbara Lucke, Jana Marie Schwarz, Markus Schuelke, Dominik Seelow

 

 

Considerations for Sequencing microRNA


We’ve put together a new small RNA (microRNA) sequencing guide describing the considerations every new user should weigh before undertaking a small RNA sequencing project. One of the first is the number of reads you need, which usually depends on whether you’re interested in differential small RNA expression or in discovering new microRNAs. Once you know how many reads you need per sample, consider the following factors before and after library preparation:

  1. Should you start with total RNA or isolated small RNA?
  2. How much material should you start with?
  3. What’s the minimum quality of total RNA acceptable for microRNA library preparation and sequencing?
  4. How will small RNA ligation bias affect your results?
  5. How can you minimize adapter dimers to improve read mapping and the general usability of your sequencing reads?
  6. How many samples can you multiplex or pool together in a single sequencing lane?
  7. What sequencing read length should you choose for microRNA or small RNA sequencing studies?
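Several of these questions feed one calculation: adapter dimers consume reads, which reduces how many samples can share a lane at a given read depth. A back-of-the-envelope sketch; the lane yield, the per-sample read target and the usable fraction are hypothetical placeholders, not figures from the guide.

```python
import math


def max_samples_per_lane(lane_reads: int, reads_per_sample: int, usable_fraction: float = 1.0) -> int:
    """How many libraries can share a lane if each needs `reads_per_sample`
    usable reads. `usable_fraction` is the share of reads left after removing
    adapter dimers and other unusable reads (an assumption you would estimate
    from a pilot run or a similar past project)."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.floor(lane_reads * usable_fraction / reads_per_sample)


# Hypothetical lane yielding 200M reads, with 10M usable reads needed per sample:
print(max_samples_per_lane(200_000_000, 10_000_000))       # 20
print(max_samples_per_lane(200_000_000, 10_000_000, 0.8))  # 16 if 20% of reads are adapter dimers
```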

The guide also includes recommendations for getting accurate per sample pricing and turnaround times.

Small RNAs play a major role in regulating the translation of target RNAs through RNA-RNA interactions and have shown potential as biomarkers in diagnostic applications. Sequencing promises to be a useful tool for unraveling the roles of these short non-coding RNAs. We look forward to working with you on your next microRNA project.

 

 

New Short, Long and High Throughput Sequencing Reads in 2016

 


An exciting wave of newly released DNA sequencing instruments and technologies will soon be available to researchers. From DNA sequencers the size of a cell phone to platforms that turn short reads into long-range information, these new technologies will be available as orderable services on Genohub. Below is a summary of the technology you can expect in Q1 2016:

10X Genomics GemCode Platform

The GemCode platform from 10X Genomics partitions long DNA fragments of up to 100 kb with a pool of ~750K molecular barcodes, indexing the genome during library construction. Within each partition, the DNA fragments are barcoded so that they all share the same barcode. After several cycling and pooling steps, >100K barcode-containing partitions are created. GemCode software then maps short Illumina read pairs back to the original long DNA molecules using the barcodes added during library preparation. This long-range information makes haplotype phasing and improved structural variant detection possible, and gene fusions, deletions and duplications can be detected from exome data.
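Because a partition holds only a few long molecules, reads that share a barcode and map close together most likely came from the same molecule. The toy sketch below groups barcoded alignments into putative molecules on that basis. The input format and the 50 kb `max_gap` threshold are hypothetical, and this is not the GemCode software's algorithm.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Alignment = Tuple[str, str, int]           # (barcode, chromosome, mapped position)
Molecule = Tuple[str, str, int, int, int]  # (barcode, chromosome, start, end, n_reads)


def infer_molecules(alignments: Iterable[Alignment], max_gap: int = 50_000) -> List[Molecule]:
    """Group barcoded read alignments into putative long molecules: reads with the
    same barcode on the same chromosome are chained together until the distance to
    the next read exceeds max_gap (a hypothetical threshold)."""
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in alignments:
        positions_by_key[(barcode, chrom)].append(pos)

    molecules: List[Molecule] = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        n_reads = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:  # too far apart: close this molecule
                molecules.append((barcode, chrom, start, prev, n_reads))
                start, n_reads = pos, 0
            n_reads += 1
            prev = pos
        molecules.append((barcode, chrom, start, prev, n_reads))
    return molecules


reads = [("AAA", "chr1", 1_000), ("AAA", "chr1", 20_000), ("AAA", "chr1", 60_000),
         ("AAA", "chr1", 5_000_000), ("CCC", "chr1", 1_500)]
for molecule in infer_molecules(reads):
    print(molecule)
```

Barcode "AAA" reappears about 5 Mb away, so those reads are split off as a second molecule; several long fragments can land in the same partition and therefore share a barcode.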

Ion Torrent S5, S5 XL

The S5 system was developed by Ion to target the clinical amplicon-seq market. While the wait for Proton PII chips continues, Ion has delivered a machine with chip configurations very similar to those of past PGM and Proton chips. The 520/530 chips offer 200-400 bp runs with 80M reads and 2-4 hour run times. With Ion’s fixed amplicon panels, data analysis can be completed within 5 hours. The Ion Chef is required to reduce hands-on library prep time; otherwise, library preparation and chip loading must be performed manually. Ion appears to have positioned its platform toward clinical applications. Given stiff competition from Illumina and Ion’s inability to deliver similar read lengths and throughput, this is a smart decision. Focusing the platform on a particular application likely means that future development (longer and higher-throughput reads) has been paused indefinitely.

Pacific Biosciences Sequel System

Announced in September 2015, the Sequel System uses the same Single Molecule, Real-Time (SMRT) technology as the RS II but boasts several technical advancements. At around one third the cost of an RS II, the Sequel offers 7x more reads, with 1M zero-mode waveguides (ZMWs) per SMRT cell versus the previous standard of 150K. Iso-Seq, or full-length transcript sequencing, is especially promising because 1M reads crosses the threshold at which discovery and quantitation of transcripts become interesting. With full-length transcript isoforms, it is no longer necessary to reconstruct transcripts or infer isoforms from short-read information. The Sequel is, of course, also ideal for generating whole-genome de novo assemblies. We’ll follow how Oxford Nanopore’s MinION competes with the Sequel System in 2016.

Oxford Nanopore’s (ONT) MinION

In 2014, Oxford Nanopore started its MinION Access Program (MAP), delivering over 1,000 MinIONs to users who wanted to test the technology. These users have gone on to publish whole E. coli and yeast genome assemblies. Accuracy of the device is up to 85% per raw base, and high G+C content sequences remain difficult to handle. A lot of work remains to improve the technology before widespread adoption. The workflow is simple and uses typical library construction steps of end repair and ligation. Once the sample is added to the flow cell, users can generate long reads of >100 kb and analyze data in real time. Median read lengths are currently 1-2 kb. Publications have shown that MinION output, combined with MiSeq reads, can enhance the contiguity of de novo assemblies. The lower error rates of two-direction (2D) reads produced with the recently updated MinION chemistry give cause for optimism that greatly reduced error rates can be achieved in the near future. This, along with a low unit cost and the ability to deploy the USB-sized device in the field, makes this a very exciting technology.
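To put "up to 85% per raw base" in context, per-base accuracy can be converted to a Phred quality score, Q = -10 log10(P_error), and to an expected number of errors per read. A minimal sketch; it assumes errors are spread uniformly along the read, which is a simplification.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Phred quality Q = -10 * log10(P_error), with P_error = 1 - accuracy."""
    return -10 * math.log10(1.0 - accuracy)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a read, assuming a uniform per-base error rate."""
    return read_length * (1.0 - accuracy)


print(round(phred_from_accuracy(0.85), 1))  # 8.2, versus Q30 for 99.9% accuracy
for length in (1_000, 2_000):               # the 1-2 kb median read lengths above
    print(length, round(expected_errors(length, 0.85)))  # 150, then 300 errors
```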

Illumina HiSeq X

While HiSeq X services have been available on Genohub for over a year, Illumina’s announcement that the platform’s use is expanding to non-human whole genomes was well received. However, several questions remain unanswered. Illumina states:

The updated rights of use will allow for market expansion and population-scale sequencing of non-human species in a variety of markets, including plants and livestock in agricultural research and model organisms in pharmaceutical research. Previously, it has been cost prohibitive to sequence non-human genomes at high coverage.

You can now sequence mouse, rat and other relatively large genomes economically on the HiSeq X. This makes the most sense for high-coverage applications, e.g. 30x or above. While small and medium-sized genomes can be sequenced on a HiSeq X, the low level of barcoding and the resulting very high coverage make these applications less attractive. According to Illumina, as of 12/20/2015, metagenomic whole genome sequencing was not a compatible application on the HiSeq X; the instrument is still restricted to WGS only. RNA-Seq, exome-seq and ChIP-Seq applications will have to wait. Perhaps by the time the HiSeq X One is released, access will be opened to these non-WGS applications.
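The coverage argument above follows from mean depth = bases sequenced / genome size. A short sketch with a hypothetical 100 Gb of output per lane (a placeholder, not an Illumina specification) and approximate genome sizes shows why large genomes fit the HiSeq X well while small genomes end up with far more coverage than needed unless many samples can be barcoded together.

```python
def coverage_per_sample(lane_output_gb: float, genome_size_mb: float, samples_per_lane: int) -> float:
    """Mean depth C = (bases sequenced per sample) / (genome size), assuming the
    lane output is split evenly across the pooled samples."""
    bases_per_sample = lane_output_gb * 1e9 / samples_per_lane
    return bases_per_sample / (genome_size_mb * 1e6)


def samples_at_target(lane_output_gb: float, genome_size_mb: float, target_coverage: float) -> int:
    """Number of samples that could share the lane at `target_coverage`,
    ignoring how many barcodes are actually available."""
    return int(lane_output_gb * 1e9 // (target_coverage * genome_size_mb * 1e6))


# Hypothetical lane output of 100 Gb:
print(round(coverage_per_sample(100, 2_700, 1)))  # ~37x for one roughly mouse-sized genome
print(round(coverage_per_sample(100, 5, 8)))      # 2500x each for eight 5 Mb genomes
print(samples_at_target(100, 5, 30))              # 666 samples of 5 Mb to bring each down to 30x
```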

While these new instruments make their way onto Genohub’s Shop by Project page, you can make inquiries and even order services by placing a request on our consultation page.

Key Considerations for Whole Exome Sequencing

Figure: Exome sequencing and library preparation (labels: exome, UTR, non-coding regions, CDS)

Whole exome sequencing is a powerful technique for sequencing the protein-coding genes in the genome (known as the exome). It’s a useful tool for applications where detecting variants is important, including population genetics, association and linkage, and oncology studies.

Because Genohub.com is the main hub for searching for and ordering next-generation sequencing services, most researchers about to embark on an exome sequencing project start their search here. It’s our responsibility to make sure researchers are informed and prepared before placing an order for an exome sequencing service.

Working toward this goal, we’ve established a series of guides for anyone about to start a whole exome sequencing project. Each guide is described below.

  1. Should I choose whole genome sequencing or whole exome sequencing?

This guide describes what you can get with WGS that you won’t get with WES, and compares pricing on a per-sample basis. It also provides an overview of sequence coverage, coverage uniformity, off-target effects and bias due to PCR amplification.

  2. How to choose an exome sequencing kit for capture and sequencing

This guide breaks down each commercial exome capture kit, comparing Agilent SureSelect, NimbleGen SeqCap and Illumina Nextera Rapid Capture. The number of capture probes, required DNA input, adapter addition strategy, probe length and design, hybridization time and cost per capture are all compared, followed by a description of each kit’s protocol.

  3. How to calculate the number of sequencing reads needed for exome sequencing

In the same guide that compares library preparation kits (above), we work through an example of how to determine the amount of sequencing and the read length required for your exome study. This is especially important when you start comparing the cost of exome sequencing services (see the next guide).

  4. How to choose an exome sequencing and library preparation service

Are you looking for 100x sequencing coverage, which many in the industry call standard exome sequencing, or 200x coverage, considered ‘high depth’? Or are you interested in a CLIA-grade, clinical whole exome sequencing service? This exome guide breaks each of these down into searches that can be performed on Genohub. The search buttons allow real-time comparison of available exome services, their prices, turnaround times and the kits used. Once you’ve identified a service that looks like a good match, you can send questions to the provider or order the exome-seq service immediately.

  5. Find a service provider to perform exome-seq data analysis only

Do you already have an exome-seq dataset? Do you need a bioinformatician to perform variant calling or SNP identification? Are you interested in studying somatic or germline mutations? Use this guide to identify providers with experienced bioinformaticians on staff who regularly perform this type of data analysis. Simply click a contact button to send a message or question to a provider immediately. If you’re looking for a quote, they will respond within the same or next business day.
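The calculation of how many reads an exome study needs can be sketched with the usual back-of-the-envelope formula: reads = target size × desired mean coverage / (read length × on-target fraction × fraction of reads that are not duplicates). All parameter values below (a 50 Mb capture target, 100 bp reads, 70% on-target, 10% duplicates) are hypothetical placeholders rather than figures for any particular kit; the 100x and 200x targets are the standard and high-depth levels mentioned above. Each mate of a paired-end read counts as one read here.

```python
import math


def exome_reads_needed(
    target_size_bp: int,
    mean_coverage: float,
    read_length_bp: int,
    on_target_fraction: float,
    duplicate_fraction: float,
) -> int:
    """Reads required so that usable, on-target bases reach the desired mean coverage.

    Only reads that land on target and are not PCR duplicates contribute to coverage,
    so the raw read count is inflated by both losses.
    """
    usable_bases_per_read = read_length_bp * on_target_fraction * (1.0 - duplicate_fraction)
    return math.ceil(target_size_bp * mean_coverage / usable_bases_per_read)


# Hypothetical parameters, for illustration only.
TARGET_BP = 50_000_000
READ_LEN = 100
ON_TARGET = 0.70
DUPLICATES = 0.10

for depth in (100, 200):  # "standard" vs "high depth" exome coverage
    reads = exome_reads_needed(TARGET_BP, depth, READ_LEN, ON_TARGET, DUPLICATES)
    print(f"{depth}x: about {reads / 1e6:.0f}M reads")
```

Because the read count scales linearly with target depth, the 100x versus 200x choice roughly doubles the sequencing required, which matters when comparing service prices.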

If you still need help, feel free to take advantage of Genohub’s complimentary consultation services. We’re happy to help make recommendations for your whole exome sequencing project.

Benchmarking Differential Gene Expression Tools

In a recent study, Schurch et al. (2015) closely examine nine differential gene expression (DGE) tools (baySeq, cuffdiff, DESeq, edgeR, limma, NOISeq, PoissonSeq, SAMSeq, DEGSeq) and rate their performance as a function of the number of replicates in an RNA-Seq experiment. The group highlights edgeR and DESeq as the most widely used tools in the field and concludes that these, along with limma, perform best in studies with both high and low numbers of biological replicates. The study goes further, specifically recommending DESeq for experiments with more than 12 replicates and edgeR for those with fewer than 12. As for the number of replicates needed, Schurch et al. recommend at least 6 replicates per condition in an RNA-seq experiment, and up to 12 in studies where identifying the majority of differentially expressed genes is critical.
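The replicate and tool recommendations summarized above can be encoded as a simple design check. This is only a sketch of the guidance as stated in this post, not code from the study; the function name is hypothetical, and because the summary gives no recommendation for exactly 12 replicates, that case is returned as undecided.

```python
from __future__ import annotations


def check_rnaseq_design(
    replicates_per_condition: int, need_most_de_genes: bool = False
) -> tuple[str, list[str]]:
    """Return a suggested DGE tool and any warnings, per the recommendations summarized above."""
    warnings: list[str] = []

    # At least 6 replicates per condition; up to 12 when finding most DE genes is critical.
    recommended_minimum = 12 if need_most_de_genes else 6
    if replicates_per_condition < recommended_minimum:
        warnings.append(
            f"{replicates_per_condition} replicates/condition is below the "
            f"recommended {recommended_minimum}"
        )

    # Tool choice: edgeR below 12 replicates, DESeq above 12.
    if replicates_per_condition < 12:
        tool = "edgeR"
    elif replicates_per_condition > 12:
        tool = "DESeq"
    else:
        tool = "edgeR or DESeq (exactly 12 is not covered by the summary)"
    return tool, warnings


tool, warnings = check_rnaseq_design(4)
print(tool, warnings)
```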

With each technical replicate having only 0.8-2.8M reads, this paper and others (Rapaport et al., 2013) continue to suggest that adding replicates to an RNA-seq experiment is preferable to simply increasing the number of sequencing reads. Several other sources, including the differential expression profiling recommendations in our Sequencing Coverage Guide, recommend at least 10M reads per sample but make no recommendation on the number of replicates needed. The disparity in recommended reads per sample reflects the relatively small, well-annotated S. cerevisiae genome used in this study, compared with the more complex transcriptomes of mammalian tissue, with their multiple transcript isoforms. By highlighting studies that carefully examine how many replicates should be used, we hope to improve RNA-seq experimental design on Genohub.

So why don’t researchers use an adequate number of replicates? Two likely reasons are 1) sequencing cost and 2) inexperience in differential gene expression analysis. We compare the costs of 6 and 12 replicates in human and yeast RNA-Seq experiments, using 10M and 1M reads/sample respectively, to show that in many cases adding more replicates to an experiment can be affordable.

| Organism (sequencing depth) | 6 replicates | 12 replicates |
|---|---|---|
| Human (10M reads/sample) | $2,660 | $4,470 |
| Yeast (1M reads/sample) | $2,810 | $4,470 |

*Prices are in USD and include both sequencing and library prep costs.

The table shows that most of the price difference comes from library preparation costs. At the listed sequencing depths, sequencing on the Illumina MiSeq or HiSeq plays a less significant role in cost because of the high output of those instruments.
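The arithmetic behind the table can be checked directly. The sketch below takes the prices from the table and recomputes how much going from 6 to 12 replicates adds, in total, per extra replicate and as a percentage; it does not model how the individual quotes were built.

```python
from __future__ import annotations

# Prices from the table (USD, library prep plus sequencing), keyed by replicate count.
PRICES = {
    "Human (10M reads/sample)": {6: 2660, 12: 4470},
    "Yeast (1M reads/sample)": {6: 2810, 12: 4470},
}


def extra_cost(prices: dict[int, int], low: int = 6, high: int = 12) -> int:
    """Additional cost of moving from `low` to `high` replicates."""
    return prices[high] - prices[low]


for label, prices in PRICES.items():
    added = extra_cost(prices)
    per_replicate = added / (12 - 6)
    pct = 100 * added / prices[6]
    print(f"{label}: +${added:,} total, ~${per_replicate:,.0f} per extra replicate, +{pct:.0f}%")
```

Doubling the replicate count raises the project price by roughly 60-70% rather than 100%, since each extra replicate costs about $280-$300 at these quotes.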

To determine the sequencing output required for your RNA-seq study, simply change the number of reads/sample on our interactive Project Page.
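The underlying calculation is simple: total reads = number of libraries × reads per library, and the number of lanes or runs is that total divided by the output of one lane or run, rounded up. The per-unit outputs below are assumed round numbers for illustration, not instrument specifications, and the function name is hypothetical.

```python
import math


def units_needed(libraries: int, reads_per_library: int, reads_per_unit: int) -> int:
    """Lanes or runs needed to give every library its target read count (rounded up)."""
    total_reads = libraries * reads_per_library
    return math.ceil(total_reads / reads_per_unit)


# Assumed outputs per lane/run, for illustration only.
ASSUMED_LANE_READS = 200_000_000
ASSUMED_SMALL_RUN_READS = 25_000_000

# Example: two conditions x 12 replicates = 24 libraries.
print("Human, 24 x 10M reads:", units_needed(24, 10_000_000, ASSUMED_LANE_READS), "lane(s)")
print("Yeast, 24 x 1M reads:", units_needed(24, 1_000_000, ASSUMED_SMALL_RUN_READS), "run(s)")
```

Under these assumptions both scenarios fit in one or two units of sequencing, so per-library preparation costs grow faster than sequencing costs as replicates are added.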


References:

Nicholas J. Schurch, Pieta Schofield, Marek Gierliński, Christian Cole, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon G. Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton (2015). Evaluation of tools for differential gene expression analysis by RNA-seq on a 48 biological replicate experiment.

Franck Rapaport, Raya Khanin, Yupu Liang, Mono Pirun, Azra Krek, Paul Zumbo, Christopher E. Mason, Nicholas D. Socci, Doron Betel (2013). Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.