Clustering densities for standard and non-standard library preparation applications

Illumina sequencing follows three very simple steps:

  1. Libraries are prepared from DNA or RNA samples
  2. Single molecular DNA templates are bridge amplified to form clonal clusters inside a flow cell
  3. Clusters are sequenced in a massively parallel fashion by sequencing by synthesis

Template molecules are immobilized on a flow cell surface and amplified by isothermal bridge amplification to create individual, dense clonal clusters containing ~2,000 molecules each (see figure above).

The exact density of these clusters can influence:

  1. Run quality
  2. Reads passing filter
  3. Q30 scores
  4. Total number of reads

This makes proper loading of an Illumina flow cell crucial to the success of a sequencing run.
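Illumina loading recommendations are typically expressed as molar concentrations (pM or nM), while fluorometric library quantification usually reports a mass concentration (ng/µL). The short Python sketch below shows the usual conversion, assuming an average mass of 660 g/mol per base pair of double-stranded DNA and a known mean fragment length (for example, from a fragment analyzer trace). The function names are hypothetical, and the sketch leaves out the instrument-specific denaturation steps, so follow your instrument's loading guide for those.

```python
# Approximate average molar mass of one double-stranded DNA base pair (g/mol).
# 660 g/mol/bp is a common rule of thumb; exact values depend on sequence.
BP_MASS_G_PER_MOL = 660.0


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM (hypothetical helper)."""
    grams_per_liter = conc_ng_per_ul * 1e-3                # 1 ng/uL == 1 mg/L == 1e-3 g/L
    grams_per_mol = BP_MASS_G_PER_MOL * mean_fragment_bp   # molar mass of one fragment
    return grams_per_liter / grams_per_mol * 1e9           # mol/L -> nmol/L


def dilution_volumes(stock_nm: float, target_pm: float, final_ul: float) -> tuple[float, float]:
    """Return (stock uL, diluent uL) reaching target_pm in final_ul, using C1*V1 = C2*V2."""
    stock_ul = final_ul * (target_pm / 1000.0) / stock_nm  # convert the pM target to nM
    return stock_ul, final_ul - stock_ul


if __name__ == "__main__":
    lib_nm = ng_per_ul_to_nm(2.0, 400)  # e.g. a 2 ng/uL library with a 400 bp mean fragment
    print(f"{lib_nm:.2f} nM")           # prints "7.58 nM"
    print(dilution_volumes(4.0, 20.0, 1000.0))
```

The 20 pM target in the usage lines is only a placeholder; the appropriate loading concentration depends on the instrument and, as discussed below, on library diversity.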

In a recent guide, we review recommended loading concentrations and cluster densities for each Illumina instrument. See a summary in Table 1 below:

Illumina flow cell loading recommendations by instrument

While this table includes recommendations for standard library applications where libraries are sufficiently diverse, researchers shouldn’t follow these recommendations for libraries that have poor diversity. Sequence diversity refers to the balance of nucleotides (A, T, G, C) at each position of a template library. Applications where you should load a library at a concentration below Illumina’s standard recommendations include:

  1. Any amplicon-based library where primers are included in the read insert
  2. GBS or RAD-seq libraries that start with the same restriction site
  3. 16S or 18S libraries that start with the same primer or variable domain sequence
  4. MeDIP or other low-diversity targeting approaches
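One way to check whether a library falls into these categories is to look at base composition per sequencing cycle. The Python sketch below computes the fraction of A, C, G and T at each position across a set of reads and flags positions where a single base dominates, as happens when every read starts with the same primer or restriction site. The 0.8 cutoff and the function names are illustrative assumptions, not Illumina thresholds.

```python
from collections import Counter


def per_position_composition(reads: list[str], n_positions: int) -> list[dict[str, float]]:
    """Fraction of A, C, G and T at each of the first n_positions cycles."""
    profile = []
    for i in range(n_positions):
        # Count bases at cycle i, skipping reads that are too short and non-ACGT calls (e.g. N).
        counts = Counter(r[i] for r in reads if len(r) > i and r[i] in "ACGT")
        total = sum(counts.values())
        profile.append({b: (counts[b] / total if total else 0.0) for b in "ACGT"})
    return profile


def low_diversity_positions(profile: list[dict[str, float]],
                            max_base_fraction: float = 0.8) -> list[int]:
    """0-based cycles where a single base makes up at least max_base_fraction of calls."""
    return [i for i, comp in enumerate(profile) if max(comp.values()) >= max_base_fraction]


if __name__ == "__main__":
    # Toy reads sharing a 4 nt prefix, as with a common amplicon primer.
    reads = ["ACGTAA", "ACGTCG", "ACGTGT", "ACGTTC"]
    print(low_diversity_positions(per_position_composition(reads, 6)))  # prints [0, 1, 2, 3]
```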

If you’re working with a non-standard library preparation application or one where libraries have poor sequence diversity, submit a request here: genohub.com/ngs and a scientist will recommend flow cell loading concentrations.
Assessing CLIA / CAP Certified Next Generation Sequencing Facilities

According to the Centers for Medicare and Medicaid Services (CMS), Clinical Laboratory Improvement Amendments (CLIA) registration is required for entities that perform a single test on, “materials derived from the human body for the purpose of providing information for the diagnosis, prevention or treatment of any disease or impairment of, or the assessment of the health of, human beings”.

To date, only two next generation sequencing (NGS) instruments/tests have been approved or cleared by the FDA. All other NGS-based tests are developed in house as laboratory developed tests (LDTs) and are regulated under CLIA. Under CLIA, the validity of a test must be established by measuring:

  1. Accuracy
  2. Precision
  3. Analytical sensitivity and specificity
  4. Reportable range and reference interval

For next generation sequencing tests this means several sequencing based metrics are required:

Assessment test | NGS specification | Sample material
Accuracy | Coverage and quality (Phred) scores | Known variants (SNPs, indels) in the targeted region
Precision | Sequence replication and coverage distribution between different operators and instruments | Reference with known variants
Specificity | False positive rate: how often a false variant is identified at a specific coverage threshold | Several samples with well-characterized targets
Sensitivity | Likelihood that the test detects a known variant | Several samples with well-characterized targets
Reportable range | Intron buffer and exon regions of one or more genes | Target material with repeat regions, indels and allele dropouts
Reference interval | Measurement of background sequence variation | Derived from an unaffected population (same as patient)
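The accuracy, sensitivity and specificity rows above come down to comparing variant calls against a reference set with known variants. Below is a minimal Python sketch, assuming variants are represented as (chromosome, position, ref, alt) tuples, at most one true variant per site, and that `assessed_sites` is the number of positions in the reportable range that were evaluated. The `min_depth` filter stands in for the coverage threshold mentioned in the specificity row; all names and values are hypothetical.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def concordance_metrics(calls: dict[Variant, int], truth: set[Variant],
                        assessed_sites: int, min_depth: int = 20) -> dict[str, float]:
    """Compare depth-filtered variant calls with a truth set of known variants."""
    # Keep only calls supported by at least min_depth reads.
    called = {v for v, depth in calls.items() if depth >= min_depth}
    tp = len(called & truth)                 # known variants that were detected
    fp = len(called - truth)                 # calls absent from the truth set
    fn = len(truth - called)                 # known variants that were missed
    negatives = assessed_sites - len(truth)  # sites with no known variant
    tn = negatives - fp                      # assumes each false call sits at its own site
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / negatives,
        "false_positive_rate": fp / negatives,
        "ppv": tp / (tp + fp),
    }


if __name__ == "__main__":
    truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
    calls = {("chr1", 100, "A", "G"): 120, ("chr1", 200, "C", "T"): 85,
             ("chr2", 50, "G", "A"): 60, ("chr3", 10, "A", "T"): 40,
             ("chr3", 20, "G", "C"): 8}  # the 8x call is removed by min_depth
    print(concordance_metrics(calls, truth, assessed_sites=1000))
```

Raising `min_depth` typically trades sensitivity for specificity, which is why the coverage threshold has to be fixed as part of validation.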

In addition to CLIA, the College of American Pathologists (CAP) has several specific guidelines for NGS labs. These include considerations for validated sample extraction, library preparation, barcoding, pooling and target enrichment. Each protocol has specific quality metrics associated with it. Beyond the wet lab, bioinformatics pipelines must be validated and tested for the precision and sensitivity of variant calling.

Clinical regulation of NGS-based tests is undergoing rapid change as new NGS tests enter the clinic and older ones are improved. As these changes happen, both CAP and CLIA requirements for NGS are updated on a yearly basis.

The most common NGS based assays or tests performed in a CLIA/CAP setting today include:

  1. Exome sequencing
  2. NGS gene panel sequencing
  3. Whole genome sequencing
  4. Cell free DNA sequencing
  5. Metagenomic sequencing

Genohub has existing relationships with 7 service providers offering nucleic acid extraction, library preparation, sequencing and data analysis under CLIA and CAP. To obtain NGS services under CLIA/CAP accreditation, submit a request here: https://genohub.com/ngs.

Isolation of cell free / circulating tumor DNA from plasma

Biomarkers that indicate the presence of disease are highly sought after, and non-invasive methods to measure those biomarkers are even more valuable. By extracting and measuring cell-free DNA, scientists can satisfy both.

Cell-free DNA (cfDNA) consists of degraded DNA fragments released into the plasma. Elevated levels of cfDNA are found in cancer, making it possible to assess somatic genomic alterations from tumors by sequencing. Cell-free fetal DNA (cffDNA) can be found as early as 7 weeks of gestation, and analysis of cffDNA is already used in non-invasive prenatal diagnostics. Cell-free DNA in blood was first described by Mandel and Metais in 1948 [1], but only recently has it been recognized as useful for prenatal testing and for disease diagnosis and monitoring.

Unlike mutations that are passed from parent to child and are present in every cell of the body, somatic mutations arise during a person’s lifetime. These somatic mutations are present in tumor cell DNA and make excellent biomarkers if they can be measured and monitored.

Acquiring tumor DNA often requires a biopsy, a potentially risky and invasive procedure. In many cases, the presence of a tumor is not yet known or a biopsy is simply not an option for the patient. During tumor turnover and progression, apoptotic and necrotic cells release small pieces of their DNA (cfDNA) into the bloodstream. The amount of cfDNA in the bloodstream is influenced by clearance and filtering of the blood and lymphatic circulation.

Detecting cfDNA in plasma is called a ‘liquid biopsy’ and is already a popular method for obtaining clinical samples for prenatal testing, disease diagnostics and monitoring. One of the challenges of liquid biopsies is standardizing the isolation procedure while maintaining uniform specificity and sensitivity. Extraction of cfDNA can be carried out using magnetic beads or silica matrices along with chaotropic salts, such as guanidine thiocyanate. While several commercial approaches exist (Table 1), none has undergone rigorous, large-scale patient studies. As more information becomes available, universal standardization should allow greater clinical utility.

Commercial kits for cfDNA extraction need to recover a uniform number of DNA copies from varying plasma volumes. Scalability and adaptability to both cell-free fetal DNA and ctDNA are important considerations. Below we highlight kits currently available on the market. In a future blog post we’ll discuss the isolation and sequencing standardization required for broader use of cfDNA liquid biopsies.

Table 1.

Kit | Company | Method | Digestion | Prep time (min) | Plasma volume (mL) | Elution (µL) | DNA sizes (bp)
NextPrep-Mag | Bioo Scientific | Mag beads | Proteinase K (optional) | 30 | 1 – >5 | 12 | >50
Chemagic cfNA | Chemagen | Mag beads | Proteinase K | 120 | 2 – 10 | 60 | >100
MagMAX Cell Free DNA Kit | Thermo Fisher | Mag beads | Proteinase K (optional) | 40 | 1 – >5 | 15 | >50
QIAamp | Qiagen | Column | Proteinase K | 120 | 1 – 5 | 20 | >70
Quick-cfDNA | Zymo Research | Column | Proteinase K | 60 | 3 – 10 | 35 | >100
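Because the kits in Table 1 start from different plasma volumes and elute into anywhere from 12 to 60 µL, eluate concentration alone is not a fair basis for comparing recovery. A minimal Python sketch, assuming you have measured the eluate concentration, normalizes an extraction to total yield and haploid genome equivalents (GE) per mL of plasma, using the common approximation of about 3.3 pg of DNA per haploid human genome. The function name is hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # approximate DNA mass of one haploid human genome (pg)


def cfdna_recovery(eluate_ng_per_ul: float, elution_ul: float,
                   plasma_ml: float) -> dict[str, float]:
    """Normalize a cfDNA extraction to yield and genome equivalents per mL of plasma."""
    total_ng = eluate_ng_per_ul * elution_ul                    # total DNA recovered
    genome_equivalents = total_ng * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg, then per genome
    return {
        "total_ng": total_ng,
        "ng_per_ml_plasma": total_ng / plasma_ml,
        "ge_per_ml_plasma": genome_equivalents / plasma_ml,
    }


if __name__ == "__main__":
    # Same total yield eluted into 12 uL vs 60 uL: 5-fold different concentrations.
    print(cfdna_recovery(10.0 / 12, 12, 4))
    print(cfdna_recovery(10.0 / 60, 60, 4))
```

Normalizing this way makes kits with small elution volumes (which give higher concentrations) comparable to kits with larger elution volumes.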

Targeted gene panels vs. whole exome sequencing

One frequent question we hear on Genohub is, ‘should I make a custom panel for this gene set, or not bother and do whole exome sequencing?’ While whole genome sequencing can capture all possible mutations, whole exome and targeted gene panel sequencing are cost-effective approaches for capturing phenotype-altering mutations. We cover the advantages of WGS vs. WES in an earlier blog post. A remaining question, however, is which targeting approach is best. We attempt to address this here:

Advantages of targeting all exons – whole exome sequencing (WES)

If your study is discovery based, in other words you don’t know what genes you need to target, WES is the obvious choice.

  • Better for discovery based applications where you’re not sure what genes you should be targeting.
  • Exome panels are commercially available, they don’t need to be customized or designed.
  • Exome sequencing services are fairly standardized; costs range from $550-800 for 100-150x mean on-target coverage.

Advantages of targeted gene panels (amplicon-seq or targeted hybridization methods)

Targeted gene panels are ideal for analyzing specific mutations or genes that have suspected associations with disease.

  • Focusing on individual genes or gene regions allows you to sequence at a much higher depth than exome-seq, e.g. 2,000-10,000x as opposed to 200x which is typical with exome-seq.
  • High-depth sequencing enables the identification of rare variants.
  • Can be customized for different sample types, e.g. FFPE, cf/ctDNA, degraded samples.
  • Lower input amounts can be used with targeted gene panels (1 ng vs. 100 ng with whole exome sequencing).
  • Gene panels can be customized to only include genomic regions of interest. Why sequence everything when you don’t need that extra information?
  • Panels can be easily designed for non-human species. Designing a non-human exome is much more laborious.
  • Gene panel workflows are a lot simpler and time to results is often as little as 1-2 days.
  • You can process thousands of samples on a single sequencing run. Targeted gene panels can be run at a higher throughput and are often more cost-effective than whole exome sequencing.

By focusing on genes likely to be involved with disease, you can reduce expense and focus sequencing resources on your targeted region. However, if you only have a few samples that you need to sequence at a low depth of coverage, consider whether it’s worth designing a panel vs. performing whole exome sequencing using an existing commercial panel.
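Whether a panel pays off usually comes down to how many reads each sample needs and how many samples fit on a run. Here is a rough Python sketch, assuming reads are evenly distributed across samples, a fixed on-target fraction and no PCR duplicates. The target sizes, on-target fraction and run output in the example are placeholders, not instrument specifications.

```python
import math


def reads_needed(target_bp: int, mean_depth: float, read_len: int,
                 on_target_fraction: float) -> int:
    """Reads per sample for a mean on-target depth (ignores duplicates and uneven coverage)."""
    usable_bases_per_read = read_len * on_target_fraction
    return math.ceil(target_bp * mean_depth / usable_bases_per_read)


def samples_per_run(run_reads: int, reads_per_sample: int) -> int:
    """Whole samples that fit into one run's read output."""
    return run_reads // reads_per_sample


if __name__ == "__main__":
    run_reads = 400_000_000                          # hypothetical run output
    exome = reads_needed(50_000_000, 100, 150, 0.8)  # ~50 Mb exome target at 100x (assumed)
    panel = reads_needed(500_000, 2_000, 150, 0.8)   # 500 kb panel at 2,000x (assumed)
    print("exome:", exome, "reads/sample,", samples_per_run(run_reads, exome), "samples/run")
    print("panel:", panel, "reads/sample,", samples_per_run(run_reads, panel), "samples/run")
```

For paired-end runs, `read_len` is the length of each mate and every mate counts as a read. Shrinking the target raises the number of samples per run proportionally, which is what makes small panels so high-throughput.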

If you’re interested in designing a custom gene panel or already have an existing panel you’d like to sequence, submit a request describing your project or view several of the existing commercially available panels here.

Guides to improve your next generation sequencing project

If you’re new to next generation sequencing or if you’re simply looking for tips to improve your next project, we recommend you take some time to look at the guides available on Genohub. As researchers order sequencing services it’s completely normal for there to be numerous questions related to nucleic acid extraction, library prep and best practices for loading a sequencing instrument. Over the years, we’ve curated these questions and published guides to help those embarking on their next NGS project. Topics covered include: library prep applications, batch effects, optimal cluster densities, read lengths and instrument output.

Next generation sequencing is a tool that can be applied to answer any number of questions related to the genome, transcriptome or epigenome. Regardless of the organism being sequenced or the library method used to prepare its nucleic acid, the fundamentals of how a sequencing platform works are similar across all samples. There are currently four main sequencing platforms that researchers regularly use: Illumina, Ion Torrent, PacBio and Oxford Nanopore. The guides below tend to be Illumina-focused because that’s the platform most people use today. Despite that, we review the read throughput of each available instrument and discuss hybrid methodologies where short and long reads from two different instruments are combined to improve assemblies.

Guides for sequencing

  1. Designing a sequencing project
  2. Recommended coverage by library preparation application
  3. Comparison of instrument read lengths and read outputs

Guides for preparing your samples

  1. Best practices for shipping tissue and nucleic acid 
  2. Library preparation kits and tips

Guides by application

  1. Transcriptome and mRNA-Seq
  2. Genome sequencing and re-sequencing
  3. Exome
  4. Metagenomics 
  5. Small RNA (microRNA)
  6. WGS vs. WES

Tips and considerations for commonly used sequencing instruments

  1. HiSeq X
  2. HiSeq 3000/4000
  3. NextSeq and low diversity

These are evolving guides, meaning our goal is to continuously improve them. If you have any feedback or would like to contribute please send us a message. We hope these guides will be helpful in designing your next NGS run. If you have technical questions related to an upcoming NGS project, feel free to submit them on our consultation page.

6 Methods to Fragment Your DNA / RNA for Next-Gen Sequencing

The preparation of a high quality sequencing library plays an important role in next-generation sequencing (NGS). The first main step in preparing nucleic acid for NGS is fragmentation. In the next series of blog posts we will present important challenges and things to consider as you isolate nucleic acid samples and prepare your own libraries.

Next Generation Sequencing will give you a plethora of reads, but they will be short. Illumina and Ion Torrent read lengths are currently under 600 bases. Roche 454 outputs reads of less than 1 kb and PacBio less than 9 kb. This makes sizing your input DNA or RNA important prior to library construction. There are three main ways to shorten long nucleic acid molecules into fragments compatible with next-gen sequencing: 1) physical, 2) enzymatic and 3) chemical shearing.

Physical Fragmentation

1) Acoustic shearing

2) Sonication

3) Hydrodynamic shear

Acoustic shearing and sonication are the main physical methods used to shear DNA. The Covaris® instrument (Woburn, MA) is an acoustic device for breaking DNA into fragments of 100 bp to 5 kb. Covaris also manufactures tubes (gTubes) that process samples in the 6-20 kb range for mate-pair libraries. The Bioruptor® (Denville, NJ) is a sonication device used for shearing chromatin and DNA and for disrupting tissues; small volumes of DNA can be sheared to 150 bp to 1 kb in length. HydroShear from Digilab (Marlborough, MA) uses hydrodynamic forces to shear DNA. Nebulizers (Life Tech, Grand Island, NY) can also be used to atomize liquid using compressed air, shearing DNA into 100 bp to 3 kb fragments in seconds. While nebulization is low cost and doesn’t require the purchase of an instrument, it is not recommended if you have limited starting material, because you can lose up to 30% of your DNA with a nebulizer. The sonication and acoustic shearing devices described above are better designed for smaller volumes and retain more of your DNA.

Enzymatic Methods

4) DNase I, non-specific nucleases or restriction endonucleases

5) Transposase

Enzymatic methods to shear DNA into small pieces include DNase I, a combination of maltose binding protein (MBP)-T7 Endo I and the non-specific nuclease from Vibrio vulnificus (Vvn), NEB’s (Ipswich, MA) Fragmentase and Nextera tagmentation technology (Illumina, San Diego, CA). The non-specific nuclease and T7 Endo work synergistically to produce non-specific nicks and counter-nicks, generating fragments that dissociate 8 nucleotides or less from the nick site. Tagmentation uses a transposase to simultaneously fragment dsDNA and insert adapters. Enzymatic fragmentation has generally been shown to be consistent, but it performs worse than physical shearing methods with respect to bias and the detection of insertions and deletions (indels) (Knierim et al., 2011). Depending on your specific application (for example, de novo genome sequencing vs. small genome re-sequencing), biases associated with enzymatic fragmentation may not be as important.

RNase III is an endonuclease that cleaves RNA into small fragments with 5’ phosphate and 3’ hydroxyl groups. These end groups are needed for RNA ligation, which makes the assay convenient; however, RNase III has a sequence preference that biases cleavage. The heat/chemical methods described below leave 3’ phosphate and 5’ hydroxyl ends but show less sequence bias, and they are generally preferred in library preparation.

Chemical Fragmentation    

6) Heat and divalent metal cation

Chemical shearing is typically reserved for breaking up long RNA molecules. It is usually performed by heat digestion of RNA in the presence of a divalent metal cation (magnesium or zinc). The length of your RNA fragments (115-350 nt) can be adjusted by increasing or decreasing the incubation time.

The size of your DNA or RNA insert is a key factor in library construction and sequencing. You’ll need to choose an instrument and read length that are compatible with your insert length. You can do this by entering project parameters on the Shop by Project page and filtering by read length (estimated insert length). If you’re not sure, we can help: send us a request through our consultation form.
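A quick way to sanity-check compatibility is to compare read length with insert length. In this Python sketch (the function name is hypothetical), paired-end reads either overlap in the middle of the insert, leave an unsequenced gap between mates, or, when the insert is shorter than a single read, run through into the adapter, which then has to be trimmed. It assumes a single fixed insert length, whereas real libraries have a size distribution.

```python
def mate_layout(insert_bp: int, read_len: int) -> tuple[str, int]:
    """Classify paired-end reads of length read_len over an insert of insert_bp."""
    if read_len > insert_bp:
        # Each mate reads past the end of the insert into the adapter sequence.
        return ("adapter_read_through", read_len - insert_bp)
    overlap = 2 * read_len - insert_bp
    if overlap > 0:
        # The middle of the insert is covered by both mates (reads can be merged).
        return ("overlap", overlap)
    # Mates do not meet; this many bases in the middle of the insert go unsequenced.
    return ("gap", -overlap)


if __name__ == "__main__":
    for insert in (120, 200, 300, 400):
        print(insert, "bp insert, 2x150:", mate_layout(insert, 150))
```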

Reference:

Systematic Comparison of Three Methods for Fragmentation of Long-Range PCR Products for Next Generation Sequencing. Ellen Knierim, Barbara Lucke, Jana Marie Schwarz, Markus Schuelke, Dominik Seelow

Considerations for Sequencing microRNA

We’ve put together a new small RNA (microRNA) sequencing guide describing considerations all new users should make before undertaking a small RNA sequencing project. One of the first considerations is determining the number of reads you need. This usually depends on whether you’re interested in differential small RNA expression or if you’re trying to discover new microRNAs. Once you know the number of reads you need per sample, consider the following factors before and after library preparation:

  1. Should you start with total RNA or isolated small RNA?
  2. How much material should you start with?
  3. What’s the minimum quality of total RNA acceptable for microRNA library preparation and sequencing?
  4. How will small RNA ligation bias affect my results?
  5. How can I minimize adapter dimers to improve read mapping and general usability of my sequencing reads?
  6. How many samples can I multiplex or pool together in a single sequencing lane?
  7. What sequencing read length should I choose for microRNA or small RNA sequencing studies?
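Several of these questions (adapter dimers, read length, usable reads) become concrete once reads are trimmed. Below is a simplified Python sketch that removes a 3’ adapter (a full match, or a partial match at the read end) and bins reads by insert length. The adapter shown is the Illumina TruSeq Small RNA 3’ adapter (RA3); substitute the sequence from your own kit. The length bins and the 6 nt minimum overlap are illustrative assumptions, and dedicated trimmers such as cutadapt also tolerate sequencing errors, which this exact-match sketch does not.

```python
from collections import Counter

ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"  # TruSeq Small RNA 3' adapter (RA3); replace for other kits


def trim_3p_adapter(read: str, adapter: str = ADAPTER_3P, min_overlap: int = 6) -> str:
    """Remove an exact 3' adapter match; also trims a partial adapter at the read end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]  # full adapter found: keep only the insert before it
    # Partial adapter: longest read suffix equal to an adapter prefix (>= min_overlap nt).
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adapter detected


def classify_insert(length: int) -> str:
    """Bin an insert by length (illustrative cutoffs)."""
    if length == 0:
        return "adapter_dimer"
    if length < 18:
        return "too_short"
    if length <= 30:
        return "mirna_sized"
    return "longer_rna"


def summarize(reads: list[str]) -> Counter:
    """Count reads per insert category after trimming."""
    return Counter(classify_insert(len(trim_3p_adapter(r))) for r in reads)


if __name__ == "__main__":
    mir21 = "TAGCTTATCAGACTGATGTTGA"  # 22 nt miR-21 (DNA alphabet)
    print(summarize([mir21 + "TGGAATTCTCGG", ADAPTER_3P + "ATCTCGTATGCC"]))
```

Reads in the adapter_dimer bin consume sequencing capacity without yielding information, which is why minimizing adapter dimers raises the number of usable reads per sample.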

The guide also includes recommendations for getting accurate per sample pricing and turnaround times.

Small RNAs play an important role in regulating the translation of target RNAs through RNA-RNA interactions and have shown potential as biomarkers in diagnostic applications. Sequencing promises to be a useful tool in unraveling the roles of these short non-coding RNAs. We look forward to working with you on your next microRNA project.

Key Considerations for Whole Exome Sequencing

Whole exome sequencing is a powerful technique for sequencing protein coding genes in the genome (known as the exome). It’s a useful tool for applications where detecting variants is important, including population genetics, association and linkage, and oncology studies.

Because Genohub is the main hub for searching and ordering next generation sequencing services, most researchers about to embark on an exome sequencing project start their search on Genohub.com. It’s our responsibility to make sure the researcher is informed and prepared before placing an order for an exome sequencing service.

Working toward achieving this goal, we’ve established a series of guides for anyone about to start a whole exome sequencing project. We’ve described each of these guides here.

  1. Should I choose Whole Genome Sequencing or Whole Exome Sequencing?

This guide describes what you can get with WGS that you won’t with WES and compares pricing on a per sample basis. It also provides an overview of sequence coverage, coverage uniformity, off-target effects and bias due to PCR amplification.

  2. How to choose an Exome Sequencing Kit for capture and sequencing

This guide breaks down each commercial exome capture kit, comparing Agilent SureSelect, NimbleGen SeqCap and Illumina Nextera Rapid Capture. The number of probes used for capture, DNA input required, adapter addition strategy, probe length and design, hybridization time and cost per capture are all compared. This comparison is followed by a description of each kit’s protocol.

  3. How to calculate the number of sequencing reads needed for exome sequencing

In the same guide that compares library preparation kits (above), we go through an example on how to determine the amount of sequencing and read length required for your exome study. This is especially important when you start comparing the cost for exome sequencing services (see the next guide).

  4. How to choose an exome sequencing and library preparation service

Are you looking for 100x sequencing coverage, what many in the industry call standard exome sequencing, or 200x coverage, considered ‘high depth’? Or are you interested in a CLIA-grade clinical whole exome sequencing service? This exome guide breaks each option down into searches that can be performed on Genohub. The search buttons allow real-time comparison of available exome services, their prices, turnaround times and the kits being used. Once you’ve identified a service that looks like a good match, you can send questions to the provider or immediately order the exome-seq service.

  5. Find a service provider to perform exome-seq data analysis only

Do you already have an exome-seq dataset? Do you need a bioinformatician to perform variant calling or SNP ID? Are you interested in studying somatic or germline mutations? Use this guide to identify providers who have experienced bioinformaticians on staff that regularly perform this type of data analysis service. Simply click on a contact button to immediately send a message or question to a provider. If you’re looking for a quote, they will respond within the same or next business day.

If you still need help, feel free to take advantage of Genohub’s complimentary consultation services. We’re happy to help make recommendations for your whole exome sequencing project.

Standard Quality Policy for Next Generation Sequencing Services

Whenever you order a scientific service or outsource research, there is a degree of trust naturally built into the relationship. As a researcher you expect the service provider to take care of your precious samples and complete a service to your expectations. The service provider expects to work with researchers who have complied with sample requirements and will be reasonable when it comes to unexpected situations. This is especially the case for the relationship between a researcher and a next generation sequencing service provider. Having worked with both researchers and sequencing service providers on hundreds of real projects, we’ve developed a standard quality policy for all services that happen through Genohub.com. We’ve developed this policy to ensure a trusted and efficient environment for clients and providers to work together. By setting expectations for both the researcher and provider we’re improving the way sequencing services are being performed.

Benchmarking Differential Gene Expression Tools

In a recent study, Schurch et al., 2015 closely examine 9 differential gene expression (DGE) tools (baySeq, cuffdiff, DESeq, edgeR, limma, NOISeq, PoissonSeq, SAMSeq, DEGSeq) and rate their performance as a function of the number of replicates in an RNA-Seq experiment. The group highlights edgeR and DESeq as the most widely used tools in the field and concludes that these two, along with limma, perform best in studies with both high and low numbers of biological replicates. The study goes further, specifically recommending that experiments with more than 12 replicates use DESeq, while those with fewer than 12 replicates use edgeR. As for the number of replicates needed, Schurch et al. recommend at least 6 replicates per condition in an RNA-seq experiment, and up to 12 in studies where identifying the majority of differentially expressed genes is critical.

With each technical replicate having only 0.8-2.8M reads, this paper and others (Rapaport et al., 2013) continue to suggest that adding replicates to an RNA-seq experiment is preferable to simply increasing the number of sequencing reads. Several other sources, including the differential expression profiling recommendations in our Sequencing Coverage Guide, recommend at least 10M reads per sample but do not make recommendations on the number of replicates needed. The disparity in reads per sample reflects the relatively small, well-annotated S. cerevisiae genome used in this study compared with the more complex, multi-isoform transcriptomes of mammalian tissue. By highlighting studies that carefully examine the number of replicates that should be used, we hope to improve RNA-seq experimental design on Genohub.

So why don’t researchers use an adequate number of replicates? Two common reasons are 1) sequencing cost and 2) inexperience with differential gene expression analysis. We compare the costs of 6 and 12 replicates in yeast and human RNA-Seq experiments, using 1M and 10M reads/sample respectively, to show that in many cases adding more replicates to an experiment is affordable.

Experiment | 6 replicates | 12 replicates
Human (10M reads/sample) | $2,660 | $4,470
Yeast (1M reads/sample) | $2,810 | $4,470

*Prices are in USD and include both sequencing and library prep costs.

The table shows that the main factor in the price difference is library preparation cost. At the listed sequencing depths, sequencing on the Illumina MiSeq or HiSeq does not play as significant a role in cost because of the sequencing capacity of those instruments.
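This reasoning can be reproduced with a simple cost model: library preparation scales with the number of samples, while sequencing cost scales with the number of lanes (or runs) needed to hold all reads. The Python sketch below uses made-up placeholder prices and lane output, not Genohub quotes, purely to show why adding replicates at low read depth mostly adds library cost.

```python
import math


def project_cost(n_samples: int, reads_per_sample: int, library_cost: float,
                 lane_reads: int, lane_price: float) -> dict[str, float]:
    """Split a project's cost into library prep and sequencing (whole lanes only)."""
    lanes = math.ceil(n_samples * reads_per_sample / lane_reads)  # partial lanes round up
    library_total = n_samples * library_cost
    sequencing_total = lanes * lane_price
    return {"lanes": lanes, "library": library_total,
            "sequencing": sequencing_total, "total": library_total + sequencing_total}


if __name__ == "__main__":
    # Hypothetical inputs: $150 per library, 300M reads per lane at $1,500 per lane.
    for n in (12, 24):
        print(n, "samples:", project_cost(n, 10_000_000, 150, 300_000_000, 1500))
```

With these placeholder numbers, doubling the sample count still fits in one lane, so the entire increase comes from library preparation; the sequencing share only grows once total reads exceed a lane’s capacity.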

To accurately determine the sequencing output required for your RNA-seq study, simply change the number of reads/sample in our interactive Project Page.

References:

Evaluation of tools for differential gene expression analysis by RNA-seq on a 48 biological replicate experiment. Nicholas J. Schurch, Pieta Schofield, Marek Gierliński, Christian Cole, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon G. Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton

Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Franck Rapaport, Raya Khanin, Yupu Liang, Mono Pirun, Azra Krek, Paul Zumbo, Christopher E Mason, Nicholas D Socci and Doron Betel