Biological sample shipment information for various countries

Many scientific researchers prefer to outsource their next generation sequencing projects to commercial service providers to get access to the latest instruments and scientific expertise.

However, there are some countries in the world that do not allow the export of biological samples (tissue samples, DNA, RNA etc.) or require several formal agreements and multi-level clearance.

In this post, we’ll highlight some general information about shipping samples out of several major countries, primarily to the US. Some of this is based on our experience working with many international researchers who use Genohub to outsource their sequencing.

China

China, for example, does not allow the import or export of biological samples, as confirmed by multiple courier service agents1. Major Chinese service providers require biological samples to be shipped to their Hong Kong address to avoid delay or loss of samples2,3.

In a rare situation, a Chinese group of researchers was able to ship DNA samples to the US using FedEx. They have also detailed their experience and have some advice regarding sample shipment that can be potentially useful to other groups willing to do the same4.

Brazil

To export biological material from Brazil, several documents, such as a Material Transfer Agreement and an institutional invoice of the specimens exported, are required for customs clearance. A detailed cover letter in both Portuguese and English, which helps customs officials in Brazil (IBAMA) and the USA (USFWS) properly assess the authorization to export and import specimens, is also required5. It can take several weeks to obtain these documents, so researchers need to plan their work in advance.

India

Until 2016, the Indian Council of Medical Research made decisions on shipment of biological samples on a case-by-case basis6. However, these regulations were lifted in August 2016, and researchers now have to follow several guidelines for biological materials to qualify for transport to foreign countries for research purposes7.

According to a FedEx India employee, a non-infectious certificate from an authentic laboratory and a detailed description of the included biological samples are sufficient for customs clearance from India. Pathogenic material is not allowed to be shipped internationally.

Europe

We haven’t come across any issues shipping samples from European countries and generally, a properly declared biological shipment can be exported without any hassles.

The current Universal Postal Union regulations for shipping biological material have been comprehensively summarized in an official document. This document also lists the countries that allow or ban the import/export of biological substances8.

Please consult our shipping guide for more details on how to prepare samples for shipment to the USA: https://genohub.com/dna-rna-shipping-for-ngs/#USA.

If you know of any countries that require a lot of formal paperwork for export of biological substances for research or sequencing purposes, feel free to comment below. I’ll update the blog with this information.

References:

(1)     China Country Snapshot https://smallbusiness.fedex.com/international/country-snapshots/china.html.

(2)     Sample Preparation; Shipping – Novogene https://en.novogene.com/support/sample-preparation/.

(3)     Sample submission guidelines – BGI http://www.bgisample.com/yangbenjianyi/BGI-TS-03-12-01-001 Suggestions for Sample Delivery(NGS) B0.pdf.

(4)     Community/ZJU-China Letter about Shipping DNA – 2015.igem.org http://2015.igem.org/Community/ZJU-China_Letter_about_Shipping_DNA.

(5)     Shipping and Customs http://symbiont.ansp.org/ixingu/shipping/index.html.

(6)    Centre removes ICMR approval for import/export of human biological samples http://www.dnaindia.com/india/report-centre-removes-icmr-approval-for-importexport-of-human-biological-samples-2245910.

(7)     Indian Council of Medical Research http://icmr.nic.in/ihd/ihd.htm.

(8)     WFCC Regulations http://www.wfcc.info/pdf/wfcc_regulations.pdf

PacBio vs. Oxford Nanopore sequencing

Long-read sequencing developed by Pacific Biosciences and Oxford Nanopore overcomes many of the limitations researchers face with short reads. Long reads improve de novo assembly and transcriptome analysis (gene isoform identification), and play an important role in the field of metagenomics. Longer reads are also useful when assembling genomes that include large stretches of repetitive regions.

Currently, there are two long read sequencing platforms. To help a researcher choose between which platform has greater utility for their application, we compare overall instrument specifications offered by PacBio and Oxford Nanopore, and published applications in the next-generation sequencing space.

[Table image: comparison of PacBio and Oxford Nanopore instrument specifications]

a. Oxford Nanopore charges an access fee that gives users one MinION/PromethION instrument, a starter pack of consumables, certain data services, and community-based support.

* Insufficient data

Although both PacBio and Oxford Nanopore generate longer reads compared to short read Illumina or Ion sequencing, the higher error rate of both the PacBio and Oxford Nanopore sequencers remains an issue that needs addressing. Whereas PacBio reads a molecule multiple times to generate high-quality consensus data, Oxford Nanopore can only sequence a molecule twice. As a result, PacBio generates data with lower error rates compared to Oxford Nanopore. PacBio has a slightly better overall performance for applications such as the discovery of transcriptome complexity and sensitive identification of isoforms. On the other hand, MinION provides higher throughput, as nanopores can sequence multiple molecules simultaneously. Hence, it is best suited for applications that require a larger amount of data9.
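To see why reading the same molecule several times lowers the consensus error rate, here is a minimal sketch under simplifying assumptions: each pass errs independently at a given base with the same probability, and the consensus is wrong only when more than half of the passes are wrong. The 15% per-pass error rate and the function name are illustrative assumptions, not platform specifications, and real consensus callers use more sophisticated models than a majority vote.

```python
from math import comb


def majority_error(per_pass_error, n_passes):
    """Probability that more than half of n independent passes are wrong
    at a single base position (ties are not counted as errors)."""
    threshold = n_passes // 2 + 1
    return sum(
        comb(n_passes, k)
        * per_pass_error ** k
        * (1 - per_pass_error) ** (n_passes - k)
        for k in range(threshold, n_passes + 1)
    )


if __name__ == "__main__":
    # Hypothetical 15% raw error rate, chosen only for illustration.
    for passes in (1, 3, 5, 9):
        print(passes, "passes ->", round(majority_error(0.15, passes), 5))
```

Under this toy model the consensus error falls quickly as the number of passes grows, which is the intuition behind multi-pass consensus reads.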

As long reads can provide large scaffolds, de novo assembly is one of the main applications of PacBio sequencing5. Though the error rate of PacBio data is higher than that of short read Illumina or Ion sequencing, increased coverage or hybrid sequencing can greatly improve the accuracy of genome assembly. PacBio sequencing has been successfully used to finish the 100-contig draft genome of Clostridium autoethanogenum DSM 10061, a Class III genome, the most complex classification in terms of repeat content and repeat type. It has a 31.1% GC content and contains repeats, prophage, and nine copies of rRNA gene operons. Using a single PacBio library sequenced on two SMRT cells, the entire genome could be assembled de novo into a single contig. When short read Illumina or Ion sequencing was used alone on the same genome, >22 contigs were needed, and each of the assemblies contained at least four collapsed repeat regions, whereas the PacBio assemblies had none10.

PacBio sequencing has also been used to assemble the chloroplast genome of Potentilla micrantha11, and the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster, using fewer contigs and less CPU time than assemblies based on Illumina sequencers12.

PacBio sequencing of PCR products can be used to improve the quality of current draft genomes by closing gaps and sequencing through hairpin structures and areas of high GC content more efficiently than Sanger sequencing13.

Pacific Biosciences has developed a protocol, Iso-Seq, for transcript sequencing. This includes library construction, size selection, sequencing data collection, and data processing. Iso-Seq allows direct sequencing of transcripts up to 10 kb without the use of a reference genome. Iso-Seq has been used to characterize alternative splicing events involved in the formation of blood cellular components14. This is essential for interpreting the effects of mutations leading to inherited disorders and blood cancers, and can be applied to design strategies to advance transplantation and regenerative medicine.

Another major application of PacBio sequencing is in epigenetics research. Recent studies demonstrate that investigation of intercellular heterogeneity in previously undetectable genome DNA modifications (such as m6A and m4C) is facilitated by the direct detection of modifications in single molecules by PacBio sequencing15.

Compared to PacBio, the Oxford Nanopore MinION is small (the size of a USB thumb drive), affordable, uses a simple library prep and is field portable16. This is useful in situations such as a virus outbreak, where a mobile diagnostic laboratory can be set up using MinIONs. In remote regions such as parts of Brazil and Africa, where there are logistical issues associated with shipping samples for sequencing, MinION can provide immediate, real-time data to scientific investigators. The most notable clinical use of MinION has been the on-site analysis of Ebola samples during the viral outbreak in West Africa17,18.

The low cost of sequencing and portability of the MinION sequencer also make it a useful tool for teaching. It has been used to provide hands-on experience to students, most recently at Columbia University and the University of California Santa Cruz, where every student performed their own MinION sequencing19.

Perhaps the most ambitious MinION application is its potential to detect and identify bacteria and viruses on manned space flights. In a proof-of-concept experiment, Castro-Wallace et al. demonstrated successful sequencing and de novo assembly of a lambda phage genome, an E. coli genome, and a mouse mitochondrial genome. They observed that there was no significant difference in the quality of sequence data generated on the International Space Station and in control experiments that were performed in parallel on Earth22.

Recently, Oxford Nanopore developed a bench-top instrument, PromethION, that provides high-throughput sequencing and is modular in design. It contains 48 flow cells that can be run individually or in parallel. The PromethION flow cells contain 3000 channels each, and produce up to 40 Gb of data.

 

References:

  1. Pacific Biosciences – AllSeq. Available at: http://allseq.com/knowledge-bank/sequencing-platforms/pacific-biosciences/.
  2. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
  3. Lu, H., Giordano, F. & Ning, Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics. Proteomics Bioinformatics 14, 265–279 (2016).
  4. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. bioRxiv (2017).
  5. Jain, M. et al. MinION Analysis and Reference Consortium: Phase 2 data release and analysis of R9.0 chemistry [version 1; referees: awaiting peer review]. F1000Research 6, (2017).
  6. Rhoads, A. & Au, K. F. PacBio Sequencing and Its Applications. Genomics, Proteomics Bioinforma. 13, 278–289 (2015).
  7. MinION. Available at: https://nanoporetech.com/products/minion.
  8. PromethION Early Access Programme. Available at: https://nanoporetech.com/community/promethion-early-access-programme.
  9. Oxford Nanopore in 2016. Available at: http://blog.booleanbiotech.com/nanopore_2016.html.
  10. Weirather, J. L. et al. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Research 6, 100 (2017).
  11. Brown, S. D. et al. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia. Biotechnol. Biofuels 7, 40 (2014).
  12. Ferrarini, M. et al. An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics 14, 670 (2013).
  13. Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotech 33, 623–630 (2015).
  14. Zhang, X. et al. Improving genome assemblies by sequencing PCR products with PacBio. Biotechniques 53, 61–62 (2012).
  15. Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science (80-. ). 345, (2014).
  16. Feng, Z., Li, J., Zhang, J.-R. & Zhang, X. qDNAmod: a statistical model-based tool to reveal intercellular heterogeneity of DNA modification from SMRT sequencing data. Nucleic Acids Res. 42, 13488–13499 (2014).
  17. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. Erratum to: The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 256 (2016).
  18. Quick, J. et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228–232 (2016).
  19. Hoenen, T. et al. Nanopore sequencing as a rapidly deployable Ebola outbreak tool. Emerg. Infect. Dis. 22, 331–334 (2016).
  20. Citizen Sequencers: Taking Oxford Nanopore’s MinION to the Classroom and Beyond – Bio-IT World. Available at: http://www.bio-itworld.com/2015/12/9/citizen-sequencers-taking-oxford-nanopores-minion-classroom-beyond.html.
  21. Castro-Wallace, S. L. et al. Nanopore DNA Sequencing and Genome Assembly on the International Space Station. bioRxiv (2016).

Sequencing trends in early 2017

Every month, ~5,000 unique queries for sequencing are submitted using Genohub’s NGS project matching engine: https://genohub.com/ngs/. Briefly, a user chooses the NGS application they are interested in (e.g. exome, RNA-Seq), the number of reads or coverage they’d like to achieve and the number of samples they plan on sequencing. Genohub’s matching engine takes this input, calculates the sequencing output required to meet the desired coverage and recommends services, filterable by sequencing instrument, read length, and library preparation kit. Results can be sorted by price or turnaround time and selected for immediate ordering.
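As an illustration of the kind of calculation involved, the sketch below uses the standard relationship coverage = (number of reads × read length) / genome size to estimate the output needed for a target coverage. It is not Genohub's actual implementation; the function names and the 3.1 Gb genome size in the example are assumptions made for this sketch.

```python
import math


def bases_required(genome_size_bp, coverage):
    """Total sequenced bases needed to reach the target mean coverage."""
    return genome_size_bp * coverage


def reads_required(genome_size_bp, coverage, read_length_bp):
    """Number of individual reads of a given length needed for the target
    coverage (for paired-end runs, each mate counts as one read)."""
    return math.ceil(bases_required(genome_size_bp, coverage) / read_length_bp)


if __name__ == "__main__":
    genome = 3_100_000_000  # assumed genome size in bp, for illustration only
    print("Output needed (Gb):", bases_required(genome, 30) / 1e9)
    print("150 bp reads needed:", reads_required(genome, 30, 150))
```

Dividing the read count by two gives the number of read pairs (fragments) for a paired-end run.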

Every query that’s submitted is recorded giving us a unique perspective into what types of NGS services researchers are actually interested in.

DNA-Seq

First, it’s important to note that DNA-seq is our default option in the matching engine: https://genohub.com/ngs/. Due to this bias, you can’t really compare it to other services being ordered, so it’s a good idea to just throw away this data point. DNA-seq services that are actually ordered break down into whole human genome sequencing, re-sequencing, and metagenomics sequencing. The most frequently used instruments for this service are currently the HiSeq X, HiSeq 3000/4000 and NextSeq. With PacBio’s release of the Sequel, PacBio service requests have increased significantly this quarter compared to the last 4 quarters. We expect this trend to continue through 2017.

RNA-Seq

The pie chart above breaks down the types of RNA-seq services requested in the first three months of 2017. Total RNA-seq represents all applications where rRNA is depleted prior to library preparation, whereas mRNA-seq represents all applications where mRNA is enriched. In 2016, the number of Total RNA-seq projects was half that of this year. We attribute this to a growing interest in non-coding RNA and the availability of higher throughput sequencing runs. As sequencing costs drop and rRNA depletion becomes more affordable, researchers are asking for more biological information. Today, the NextSeq and HiSeq 3000/4000 are the most commonly used instruments for any RNA-seq application. Counting applications continue to dominate, although requests for de novo transcriptome assemblies have risen steadily over the previous year. Whereas in the past 1×50 and 1×75 were the most frequently requested read lengths for RNA counting applications, around 2x more researchers are requesting paired-end sequencing versus last year.

Methylation analysis

Compared to last year, there is increased interest in WGBS relative to RRBS and MeDIP. With the advent of the HiSeq X and its compatibility with WGBS applications, more researchers are finding whole genome based applications easier and more informative than reduced representation bisulfite sequencing.

Instrument trends

By far the biggest trend this year was the number of long read requests on the PacBio Sequel. Whereas mate-pair library prep was more popular in the past, we’re starting to see this service decline and long read sequencing ordered more frequently. Hybrid Illumina/PacBio reads are also being ordered more frequently to improve the quality of assemblies. Long reads are being requested to detect functional elements in human genomes that are missed by short-read sequencing. We should add that requests for 10X Genomics services have started to increase, although the numbers are too small right now to make any meaningful comments. We currently don’t have providers offering Oxford Nanopore services on Genohub, so we can’t comment here either.

This month NovaSeq services are expected to be available on Genohub. We expect there to be a lag phase as kinks are worked out, before this becomes a popular instrument request.

The future

Having spent the last 4 years receiving sequencing requests and performing consultation, it’s clear to us that new technology does influence behavior. With reduced sequencing costs, we see clients not only including more controls and replicates, but also looking at RNA-seq from a more global perspective and becoming more interested in long reads. Clients that previously only performed exome-seq are now turning to whole genome sequencing on the HiSeq X. Researchers that normally only look at coding RNAs are starting to show interest in long non-coding and small RNAs. Overall, faster and cheaper sequencing does tend to promote better science. Gone are the n=1 days of sequencing.

Beginner’s Handbook to High Throughput Sequencing


As sequencing becomes more ubiquitous, we find researchers struggling with concepts like ‘paired-end’ reads, custom sequencing primer design, cluster density, and technical library prep details, such as why small RNA and mRNA can’t both be prepared in the same library and sequenced. This is partially the fault of industry (e.g. are 100M ‘paired-end reads’ comprised of 200M, 100M or 50M single reads? We like to denote this as 100M paired-end reads, 50M reads in each direction), and partially due to all the moving parts: new sequencing and library prep chemistries, technology jargon and complexities in data analysis.
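To make the paired-end convention above concrete, here is a small sketch that splits a quoted paired-end read count into reads per direction, fragments and total bases. The function name and the 150 bp read length in the example are illustrative assumptions.

```python
def paired_end_breakdown(total_reads, read_length_bp):
    """Interpret a paired-end read count using the convention
    '100M paired-end reads = 50M reads in each direction'."""
    if total_reads % 2 != 0:
        raise ValueError("a paired-end read count should be even")
    per_direction = total_reads // 2
    return {
        "reads_per_direction": per_direction,  # forward reads == reverse reads
        "fragments": per_direction,            # one fragment yields one read pair
        "total_bases": total_reads * read_length_bp,
    }


if __name__ == "__main__":
    print(paired_end_breakdown(100_000_000, 150))
```

Stating both the total and the per-direction count when requesting a quote avoids a twofold misunderstanding about how much data will be delivered.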

Seeing first time researchers struggle (on hundreds of sequencing projects), we sought to put together a guide to help the sequencing novice get a strong foothold on starting a sequencing project. This guide is called our Beginner’s Handbook to Next Generation Sequencing.

The guide is broken up into four main sections:

  1. Sequencing instruments and design of a sequencing project
  2. Library prep
  3. Sample isolation
  4. Providers we recommend you contact for analyzing your data

Whether you are new to NGS or an experienced NGS user, we recommend you check it out and ask questions. We’ll be updating the guide on a regular basis, so if you have recommendations, please post them here. Thanks!

 

 

RNA-Seq considerations when working with nucleic acid derived from FFPE


Millions of formalin-fixed paraffin-embedded (FFPE) tissue sections are stored in oncology tissue banks and pathology laboratories around the world. Formalin fixation followed by embedding in paraffin has historically been a popular preservation method in histological studies, as morphological features of the original tissue remain intact. However, for RNA-seq or other gene expression methods, formalin fixation and paraffin embedding can degrade and modify RNA, complicating retrospective analysis of samples archived this way.

During the fixation and embedding process RNA is affected in the following ways:

  1. Degradation of RNA to short ~100 base fragments as a result of sample treatment during fixation or long term storage in paraffin.
  2. Formaldehyde modification of RNA. Formaldehyde modification can block base pairing and can cause cross-linking to other macromolecules. These RNA modifications include hydroxymethyl and methylene bridge cross-links on amine moieties of adenine bases.
  3. High variability in the degree of RNA degradation and modification in FFPE samples precludes transcriptomic similarity and gene expression correlation studies, or simply forces researchers to exclude certain samples.
  4. Oligo-dT approaches are not recommended when amplifying RNA, as most RNA fragments derived from FFPE no longer contain a poly(A) tail, making rRNA depletion a necessary first step prior to RNA-seq.

If formalin fixation and paraffin embedding can’t be avoided, Ahlfen et al. nicely summarize best practices for improving RNA quality and yield from FFPE samples. These include:

  1. Starting fixation promptly and cutting samples into thin pieces to avoid tissue autolysis.
  2. Reduction of fixation time (< 24 hours) to reduce irreversible cross-linking and RNA fragmentation during storage of FFPE blocks.
  3. Utilizing a method to reverse cross-linking during RNA isolation, such as heating RNA to remove some formaldehyde cross-links. Reactions of formaldehyde with amino groups in bases and proteins are largely irreversible and inhibit cDNA synthesis.
  4. Use of an rRNA depletion step and random priming as opposed to oligo-dT based reverse transcription.
  5. RNA QC methods such as a measurement of RNA integrity or one of several RT-PCR based kits to qualify a sample prior to RNA-seq.

Despite these challenges, FFPE samples are frequently used in transcriptomic studies and in many cases correlate nicely with fresh frozen samples (Hedegaard et al., 2014; Li et al., 2014; Zhao et al., 2014). The study of somatic mutations continues to remain a challenge in FFPE tissue due to fragmentation and the presence of artifacts. Nevertheless, RNA molecules from FFPE are being used regularly for investigating both non-coding and coding parts of the genome.

If you have FFPE blocks or total RNA and would like to perform gene expression analysis by RNA-Seq, we recommend you start with a NGS service provider who has specific experience with FFPE RNA isolation, QC, library preparation, sequencing and data analysis. Providers with this experience can be found using this search on Genohub: https://genohub.com/ngs/?r=mt3789#q=4c5f2d036f.

 

Accurate measurement of error rate and base quality in Illumina sequencing runs

With new instrumentation, cluster chemistries, software updates and continuously updated library preparation reagents, accurately monitoring sequencing run quality has become increasingly difficult. In a recent paper, Manley et al. (2016) developed an open source tool called the Percent Perfect Reads (PPR) plot to monitor base quality.

PPR uses PhiX alignment and calculates percent of reads with 0–4 mismatches.  A PPR plot contains a cycle-by-cycle representation of the percentage of reads with mismatches. PPR was originally introduced with the original Genome Analyzer and retired in 2014.
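The idea can be sketched in a few lines. The example below is not the published PPR program (which is written in Perl and R); it is a simplified illustration that assumes each PhiX-aligned read is represented by the list of cycles at which it mismatches the reference, and that mismatches are counted cumulatively up to each cycle. The function name and input format are hypothetical.

```python
from bisect import bisect_right


def ppr_table(mismatch_cycles_per_read, n_cycles, max_mismatches=4):
    """For each cycle, return the percentage of reads with exactly
    0..max_mismatches mismatches accumulated up to and including that cycle.

    mismatch_cycles_per_read: one list per aligned read, holding the
    1-based cycle numbers where the read disagrees with the reference.
    """
    n_reads = len(mismatch_cycles_per_read)
    if n_reads == 0:
        raise ValueError("need at least one aligned read")
    sorted_reads = [sorted(cycles) for cycles in mismatch_cycles_per_read]
    table = {k: [] for k in range(max_mismatches + 1)}
    for cycle in range(1, n_cycles + 1):
        counts = [0] * (max_mismatches + 1)
        for cycles in sorted_reads:
            seen = bisect_right(cycles, cycle)  # mismatches at or before this cycle
            if seen <= max_mismatches:
                counts[seen] += 1
        for k in range(max_mismatches + 1):
            table[k].append(100.0 * counts[k] / n_reads)
    return table


if __name__ == "__main__":
    # Four toy reads over five cycles: two perfect, one with a mismatch
    # at cycle 3, one with mismatches at cycles 2 and 5.
    reads = [[], [], [3], [2, 5]]
    for k, series in ppr_table(reads, n_cycles=5).items():
        print(k, "mismatches:", series)
```

Plotting the zero-mismatch series against cycle number gives a curve that drops wherever errors accumulate, which is what makes cycle-specific problems visible.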

PPR was developed as an alternative to the Phred-like Q score for determining run quality and has the following advantages:

  1. PPR is independently calculated, unlike Illumina’s Q score, which is calculated with instrument dependent variables (these vary by instrument, chemistry and software)
  2. PPR is a direct measure of error, unlike Q scores, which rely on a table of data generated under ideal sequencing circumstances
  3. Q scores tend to overestimate quality
  4. Unlike with Q scores, PPR allows the user to identify the source of sequencing error
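For context, a Phred-scaled Q score encodes a predicted error probability as Q = -10 × log10(P). The sketch below converts between the two and computes an empirical Q value from observed mismatches against a known reference such as PhiX, which is one way to check whether reported Q scores are optimistic. The function names and the example counts are illustrative assumptions.

```python
import math


def phred_to_error_probability(q):
    """Predicted per-base error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score for a per-base error probability."""
    return -10 * math.log10(p)


def empirical_q(mismatched_bases, aligned_bases):
    """Q value implied by the observed mismatch rate against a reference."""
    if mismatched_bases == 0:
        return math.inf  # no errors observed
    return error_probability_to_phred(mismatched_bases / aligned_bases)


if __name__ == "__main__":
    print("Q30 predicts an error rate of", phred_to_error_probability(30))
    # Hypothetical counts: 150 mismatches in 100,000 aligned bases.
    print("Observed Q:", round(empirical_q(150, 100_000), 2))
```

If the empirical value is consistently lower than the reported Q score for the same bases, the reported scores are overestimating quality.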

By examining a PPR profile, the following issues are distinguishable:

  1. Adapter read through (sequencing cycles are longer than the library insert and the run reads through the adapter sequence)
  2. Repetitive or low diversity sequences
  3. Imaging problems
  4. Over/under clustering
  5. Chemistry problems (cluster reagents are not working properly)

The PPR plot program is compatible with HiSeq 2000/2500, NextSeq 500, and MiSeq instruments. It’s written in Perl and R, and accepts FASTQ files as input. The PPR software package is available at http://openwetware.org/wiki/BioMicroCenter:PPR_Program (BioMicro Center, Massachusetts Institute of Technology, Cambridge, MA, USA).

 

Illumina unveils NovaSeq 5000 and 6000


Today, at the annual J.P. Morgan Healthcare Conference, Illumina announced the release of a new series of instruments called NovaSeq. Continuing the use of ExAmp cluster amplification and patterned nanowells that form the basis of HiSeq 3000/4000, HiSeq X Ten and HiSeq X Five flow cell technology, Illumina further reduced the spacing between nanowells to increase cluster density and data output. In the end, this promises to produce ~2-3x more reads than a single 8 lane HiSeq X flow cell.

Here are the specs available on day 1 of launch:

Number of instruments being launched: 2; NovaSeq 5000 and 6000

Non-technical application based restrictions: No. Unlike the HiSeq X Ten or HiSeq X Five, these instruments will not have application based restrictions. Illumina plans to continue restricting HiSeq X instruments to WGS applications (1).

Potential technical based restrictions: Notable is the absence of Nextera based DNA or Nextera Exome in the list of compatible library preparation kits. Mate-pair based Nextera kits are however listed as compatible (2). This may indicate there are template (library) size restrictions on this instrument (similar to HiSeq 3000/4000 and HiSeq X).

Instrument availability: NovaSeq 6000 will begin shipping in March 2017 and NovaSeq 5000 will begin shipping mid-2017.

Anticipated availability on Genohub: In April 2017, researchers will be able to order NovaSeq based sequencing. This hinges on on-time instrument delivery to our partnering service providers.

Instrument cost: NovaSeq 5000 and 6000 Systems are priced at $850,000 and $985,000 respectively

Target Market: Research labs that cannot afford the capital cost of a HiSeq X Five or HiSeq X Ten System and don’t want to deal with the restrictions. HiSeq X Five and Ten systems are restricted from running RNA-seq or exome based libraries.

Other updates: RFID added to make sure loading is done properly, reduction in the number of steps in a sequencing workflow (from 38 to 8) (1) and flow cell loading is automated.

Cbot or onboard clustering: onboard

Tunable output: 4 flow cells are available. NovaSeq S1 and S2 flow cells are compatible with both NovaSeq 5000 and 6000 systems, while NovaSeq S3 and S4 are exclusive to NovaSeq 6000 instruments.

Two color or Four color chemistry: Two color, like the NextSeq 500

Number of lanes: S1 and S2 have two lanes whereas S3 and S4 have four lanes

Available read lengths: 2×50, 2×100 and 2×150

Run times: < 19, 29 and 40 hours for 2×50, 2×100 and 2×150 bp read lengths respectively

Output: 

Instrument and flow cell    Reads per flow cell (billion)*    Output from 2×150 bp run (Gb)*
NovaSeq 5000/6000 S1        1.6                               500
NovaSeq 5000/6000 S2        3.3                               1000
NovaSeq 6000 S3             6.6                               2000
NovaSeq 6000 S4             10                                3000

*Output and read numbers based on a single flow cell
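The output figures in the table above follow from simple arithmetic if we assume that the listed read counts are clusters, each sequenced from both ends in a 2×150 bp run. The sketch below reproduces the calculation; the dictionary and function names are illustrative, and the small differences from the table (for example 480 Gb versus 500 Gb for S1) are consistent with the published figures being rounded.

```python
# Reads per flow cell as listed in the table above.
READS_PER_FLOW_CELL = {
    "S1": 1.6e9,
    "S2": 3.3e9,
    "S3": 6.6e9,
    "S4": 10e9,
}


def paired_end_output_gb(clusters, read_length_bp):
    """Output in gigabases, assuming every cluster is read from both ends."""
    return clusters * 2 * read_length_bp / 1e9


if __name__ == "__main__":
    for flow_cell, clusters in READS_PER_FLOW_CELL.items():
        print(flow_cell, round(paired_end_output_gb(clusters, 150)), "Gb")
```

The same function can be used to estimate output for the 2×50 and 2×100 read lengths by changing the second argument.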

Number of flow cells that can be run at once: 1 or 2 flow cells can be run on either the NovaSeq 5000 or 6000

So what does this mean for the sequencing industry? Clearly the NovaSeq was launched to target research labs that can’t afford the capital costs of the HiSeq X series but want to upgrade from their current HiSeq instruments. NovaSeq S3 and S4 flow cells promise to produce 2-3x more reads than a single 8 lane HiSeq X flow cell (2.6-3 billion reads). Of course, if a NovaSeq flow cell is priced at 2-3x the cost of a HiSeq X flow cell, the cost of sequencing a genome will be the same. This will become clearer when reagent pricing is available.

2016 was a tough year for Illumina, as it lost one third of its value. As Illumina launches another instrument geared for the research market, much continues to hinge on federally funded research grants to fuel growth. A focus on developing clinical applications and insurance reimbursable tests, along with a global shift toward diagnostics, is going to be required for sustained growth. ‘Market generation’ activities, such as the Helix and Grail initiatives, are steps in this direction.