6 Methods to Fragment Your DNA / RNA for Next-Gen Sequencing

The preparation of a high quality sequencing library plays an important role in next-generation sequencing (NGS). The first main step in preparing nucleic acid for NGS is fragmentation. In the next series of blog posts we will present important challenges and things to consider as you isolate nucleic acid samples and prepare your own libraries.

Next generation sequencing will give you a plethora of reads, but they will be short. Illumina and Ion read lengths are currently under 600 bases. Roche 454 outputs reads at less than 1 kb and PacBio less than 9 kb in length. This makes sizing your input DNA or RNA important prior to library construction. There are three main ways to shorten your long nucleic acid material into something compatible with next-gen sequencing: 1) physical, 2) enzymatic and 3) chemical shearing.

Physical Fragmentation

1) Acoustic shearing

2) Sonication

3) Hydrodynamic shear

Acoustic shearing and sonication are the main physical methods used to shear DNA. The Covaris® instrument (Woburn, MA) is an acoustic device for breaking DNA into fragments of 100 bp to 5 kb. Covaris also manufactures tubes (gTubes) which will process samples in the 6-20 kb range for Mate-Pair libraries. The Bioruptor® (Denville, NJ) is a sonication device utilized for shearing chromatin and DNA and for disrupting tissues. Small volumes of DNA can be sheared to 150 bp to 1 kb in length. Hydroshear from Digilab (Marlborough, MA) utilizes hydrodynamic forces to shear DNA. Nebulizers (Life Tech, Grand Island, NY) can also be used to atomize liquid using compressed air, shearing DNA into 100 bp to 3 kb fragments in seconds. While nebulization is low cost and doesn’t require the purchase of an instrument, it is not recommended if you have limited starting material. You can lose up to 30% of your DNA with a nebulizer. The sonication and acoustic shearing devices described above are better designed for smaller volumes and retain more of your DNA.

Enzymatic Methods

4) DNase I, restriction endonuclease or non-specific nuclease

5) Transposase

Enzymatic methods to shear DNA into small pieces include DNase I, a combination of maltose binding protein (MBP)-T7 Endo I and the non-specific nuclease from Vibrio vulnificus (Vvn), NEB’s (Ipswich, MA) Fragmentase and Nextera tagmentation technology (Illumina, San Diego, CA). The non-specific nuclease and T7 Endo work synergistically to produce non-specific nicks and counter nicks, generating fragments that disassociate 8 nucleotides or less from the nick site. Tagmentation uses a transposase to simultaneously fragment dsDNA and insert adapters. Generally, enzymatic fragmentation has been shown to be consistent, but it performs worse than physical shearing methods when it comes to bias and detecting insertions and deletions (indels) (Knierim et al., 2011). Depending on your specific application (de novo genome sequencing vs. small genome re-sequencing, for example), biases associated with enzymatic fragmentation may not be as important.

RNase III is an endonuclease that cleaves RNA into small fragments with 5’ phosphate and 3’ hydroxyl groups. While these end groups are needed for RNA ligation, making the assay convenient, RNase III cleavage does have a sequence preference, which makes the cleavage biased. Heat / chemical methods described below, while they leave 3’ phosphate and 5’ hydroxyl ends, show less sequence bias and are generally the preferred methods in library preparation.

Chemical Fragmentation    

6) Heat and divalent metal cation

Chemical shear is typically reserved for the breakup of long RNA fragments. This is usually performed through the heat digestion of RNA with a divalent metal cation (magnesium or zinc). The length of your RNA (115 nt – 350 nt) can be adjusted by increasing or decreasing the time of incubation.

The size of your DNA or RNA insert is a key factor for library construction and sequencing. You’ll need to choose an instrument and read length that is compatible with your insert length. You can choose this by entering project parameters in the Shop by Project page and filtering according to read length (estimated insert length). If you’re not sure, we can help. Send us a request through our consultation form.


Systematic Comparison of Three Methods for Fragmentation of Long-Range PCR Products for Next Generation Sequencing

Ellen Knierim, Barbara Lucke, Jana Marie Schwarz, Markus Schuelke, Dominik Seelow



Key Considerations for Whole Exome Sequencing

exome sequencing and library preparation

Exome, UTR, non-coding regions, CDS

Whole exome sequencing is a powerful technique for sequencing protein coding genes in the genome (known as the exome). It’s a useful tool for applications where detecting variants is important, including population genetics, association and linkage, and oncology studies.

Because Genohub.com is the main hub for searching and ordering next generation sequencing services, most researchers about to embark on an exome sequencing project start their search there. It’s our responsibility to make sure the researcher is informed and prepared before placing an order for an exome sequencing service.

Working toward achieving this goal, we’ve established a series of guides for anyone about to start a whole exome sequencing project. We’ve described each of these guides here.

  1. Should I choose Whole Genome Sequencing or Whole Exome Sequencing

This guide describes what you can get with WGS that you won’t with WES and compares pricing on a per sample basis. It also provides an overview of sequence coverage, coverage uniformity, off-target effects and bias due to PCR amplification.

  2. How to choose an Exome Sequencing Kit for capture and sequencing

This guide breaks down each commercial exome capture kit, comparing Agilent SureSelect, Nimblegen SeqCap and Illumina Nextera Rapid Capture. Numbers of probes used for capture, DNA input required, adapter addition strategy, probe length and design, hybridization time and cost per capture are all compared. This comparison is followed by a description of each kit’s protocol.

  3. How to calculate the number of sequencing reads needed for exome sequencing

In the same guide that compares library preparation kits (above), we go through an example on how to determine the amount of sequencing and read length required for your exome study. This is especially important when you start comparing the cost for exome sequencing services (see the next guide).

  4. How to choose an exome sequencing and library preparation service

Are you looking for 100x sequencing coverage, what many in the industry call standard exome sequencing, or 200x coverage, considered ‘high depth’? Or are you interested in a CLIA grade, clinical whole exome sequencing service? This exome guide breaks each down into searches that can be performed on Genohub. The search buttons allow for real time comparison of available exome services, their prices, turnaround times and kits being used. Once you’ve identified a service that looks like a good match, you can send questions to the provider or immediately order the exome-seq service.

  5. Find a service provider to perform exome-seq data analysis only

Do you already have an exome-seq dataset? Do you need a bioinformatician to perform variant calling or SNP ID? Are you interested in studying somatic or germline mutations? Use this guide to identify providers who have experienced bioinformaticians on staff that regularly perform this type of data analysis service. Simply click on a contact button to immediately send a message or question to a provider. If you’re looking for a quote, they will respond within the same or next business day.

If you still need help, feel free to take advantage of Genohub’s complimentary consultation services. We’re happy to help make recommendations for your whole exome sequencing project.

10 Sequencing Based Approaches for Interrogating the Epigenome

cytosine methylation

DNA methylation occurs when DNA methyltransferase transfers a methyl group from S-adenosyl-methionine to cytosine in CpG dinucleotides. The methylation of cytosine to 5-methylcytosine (5mC) is an important epigenetic change that regulates gene activity and impacts several cellular processes including differentiation, transcriptional control and chromatin remodeling.

Genome wide analysis of 5mC, histone modifications and DNA accessibility is possible with next generation sequencing approaches and provides unique insight into complex phenotypes where primary genomic sequence is not sufficient.

Methods for methyl DNA sequencing can be broken down into three global approaches:

1) bisulfite sequencing

2) restriction enzyme based sequencing

3) targeted enrichment of methyl sites

We’ve outlined several library preparation techniques under each category.

1) Bisulfite Sequencing

Bisulfite-seq (1, 2) is a well-established protocol that provides single base resolution of methylated cytosine in the genome. Genomic DNA is bisulfite treated, deaminating un-methylated cytosines to uracils, which are read as thymines after amplification and sequencing. Methylated cytosines are protected from deamination, allowing researchers to identify methylation sites by comparing the sequence of bisulfite and non-bisulfite treated samples.

1-Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning

2-Highly integrated single-base resolution maps of the epigenome in Arabidopsis
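The conversion logic described above can be sketched in a few lines of Python. This is only a toy illustration, assuming a single strand, error-free reads and complete conversion; the function names `bisulfite_convert` and `call_methylation` and the example sequence are hypothetical and not part of any published pipeline.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate how one strand reads after bisulfite treatment.

    Unmethylated C is deaminated to U and read as T after amplification;
    methylated C is protected and is still read as C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted):
    """Compare the untreated reference with the converted read.

    Returns {position: True/False} for every reference C, where True
    means the C survived conversion (called methylated).
    """
    calls = {}
    for i, (ref_base, obs_base) in enumerate(zip(reference.upper(), converted.upper())):
        if ref_base == "C" and obs_base in ("C", "T"):
            calls[i] = obs_base == "C"
    return calls


# Hypothetical toy input: the Cs at index 1 and index 9 are methylated.
reference = "ACGTCCGATCGA"
converted = bisulfite_convert(reference, methylated_positions=[1, 9])
calls = call_methylation(reference, converted)
print(converted)
print(calls)
```

Real data also needs handling of both strands, incomplete conversion and sequencing errors, which is why dedicated bisulfite aligners are normally used instead of a direct comparison like this.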

2) Post Bisulfite Adapter Tagging (PBAT)

To avoid loss of adapter-tagged template during bisulfite treatment, with PBAT (3) the bisulfite treatment comes first, and adapter ligation (tagging) follows through two rounds of random primer extension.

3- Amplification-free whole-genome bisulfite sequencing by post-bisulfite adaptor tagging.

3) Reduced Representation Bisulfite Sequencing (RRBS)

RRBS (4) is a method aimed at targeting sequencing coverage toward CpG islands or regions of the genome with dense CpG methylation. The sample is digested with one or more restriction enzymes and is then treated with bisulfite prior to sequencing. This method offers single nucleotide methylation resolution.

4- Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis

4) Oxidative Bisulfite Sequencing (oxBS-Seq)

5-hydroxymethylcytosine (5hmC), an intermediate in the demethylation of 5-methylcytosine (5mC) to cytosine, cannot be distinguished from 5mC using the bisulfite-seq approach. With oxBS-Seq (5), 5hmC is oxidized, which leads to its deamination to uracil during bisulfite treatment, while 5mC is left intact. Sequencing of both oxidized and non-oxidized bisulfite treated samples allows for single base resolution of 5hmC and 5mC modifications.

5-Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution
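The comparison of two libraries can be sketched as simple arithmetic. The sketch below assumes that the standard bisulfite library reports 5mC + 5hmC as C, that the oxidized library reports only 5mC as C, and that per-site C and T read counts are already available; the function names and counts are hypothetical.

```python
def methylation_fraction(c_reads, t_reads):
    """Fraction of reads at a cytosine position that still read as C."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this position")
    return c_reads / total


def estimate_5mc_5hmc(bs_c, bs_t, oxbs_c, oxbs_t):
    """Estimate 5mC and 5hmC levels at one site from two libraries.

    Assumption: bisulfite-only signal = 5mC + 5hmC, oxidized + bisulfite
    signal = 5mC only, so 5hmC is the difference between the two.
    Negative differences (sampling noise) are clipped to zero.
    """
    bs_level = methylation_fraction(bs_c, bs_t)
    oxbs_level = methylation_fraction(oxbs_c, oxbs_t)
    hmc_level = max(bs_level - oxbs_level, 0.0)
    return oxbs_level, hmc_level


# Hypothetical read counts at a single CpG site.
mc, hmc = estimate_5mc_5hmc(bs_c=80, bs_t=20, oxbs_c=60, oxbs_t=40)
print(f"5mC ~ {mc:.2f}, 5hmC ~ {hmc:.2f}")
```

Because the estimate is a difference of two noisy fractions, both libraries need good depth at a site before a small 5hmC value can be trusted.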

5) TET-Assisted Bisulfite Sequencing (TAB-Seq)

TAB-Seq (6) attaches glucose moieties to 5hmC, protecting it from TET protein oxidation. 5mC is oxidized by TET and, like non-methylated cytosine, is then deaminated to uracil during bisulfite treatment and sequenced as thymidine, allowing for the specific identification of 5hmC.

6- Base-resolution analysis of 5-hydroxymethylcytosine in the Mammalian genome

6) Methylation Sensitive Restriction Enzyme Sequencing (MRE-Seq)

MRE-Seq (7) utilizes a combination of methyl sensitive and insensitive restriction enzymes to determine the CpG methylation status of genomic regions.

7- Genome-scale DNA methylation analysis

7) HpaII tiny fragment-Enrichment by Ligation-mediated PCR (HELP-Seq)

HELP-Seq (8) allows for intragenomic profiling and intergenomic comparisons of cytosine methylation by using HpaII and its methylation insensitive isoschizomer MspI.

8- Comparative isoschizomer profiling of cytosine methylation: the HELP assay

8) Methylated DNA Immunoprecipitation Sequencing (MeDIP)

MeDIP (9) is a technique based on affinity enrichment of methylated DNA using either antibodies or other proteins capable of binding methylated DNA. This technique pulls down heavily methylated regions of the genome, such as CpG islands. It does not offer single nucleotide resolution.

9- Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells

9) Methyl Binding Domain Capture (MBD-CAP)

MBD-CAP (10) uses the methyl DNA binding proteins MeCP2, MBD1-2 and MBD3L1 to pull down methylated DNA. Similar to MeDIP, this approach enriches regions that are heavily methylated and does not offer single nucleotide methylation resolution.

10- High-resolution mapping of DNA hypermethylation and hypomethylation in lung cancer

10) Probe Based Targeted Enrichment

Methyl-Seq targeted enrichment involves the use of synthetic, biotinylated oligonucleotides designed against CpG islands, shores, gene promoters and differentially methylated regions (DMRs). Kits are commercially available through Agilent and Roche Nimblegen.

Finally, it’s worth mentioning Single Molecule Real Time (SMRT) DNA Sequencing. SMRT sequencing by Pacific Biosciences uses the kinetics of base incorporation to allow for direct detection of methylated cytosines. Unlike any of the protocols mentioned, this does not require the use of restriction enzymes or bisulfite reagent.

Several service providers on Genohub offer targeted bisulfite-seq, reduced representation bisulfite-seq (RRBS), methylated DNA immunoprecipitation seq (MeDIP) and whole genome bisulfite-seq (WGBS) library preparation and sequencing services. Simply click on one of these application types to get started. 

AGBT 2015 – Summary of Day 1

AGBT 2015

Highlights of Day 1 at AGBT

‘Welcome to paradise’ were the first words by Rick Wilson, kicking off the annual Advances in Genome Biology and Technology (AGBT) meeting. The plenary session began with David Goldstein from Columbia University presenting “Toward Precision Medicine in Neurological Disease”. David’s talk began with a discussion of clinical sequencing for neurological diseases, specifically large scale gene discoveries in epileptic encephalopathies. In epilepsy, 12% of patients are ‘genetically explained’ by a causal de novo mutation, which allows for the application of precision medicine. He discussed how a K+ channel plays a key role in at least two different epilepsies and how quinidine, which has never been used for epilepsy, was being used as a targeted treatment. He cautioned that in the literature there are too many correlation studies that don’t really amount to much and that, as we use genetics to target diseases, it’s critical to perform proper genetics driven precision medicine and not put patients on the wrong treatment plans. To better characterize the effects of mutations, he emphasized the need for solid model systems. He also mentioned that he believes truly complex diseases can be tackled with enough patients: numbers matter. To illustrate his point, he described the sequencing of over 3,000 ALS patients to get a clear picture of which genes/proteins have therapeutic importance. At the end of his talk he was asked the old whole genome sequencing (WGS) vs. whole exome sequencing (WES) question and replied that WES was sufficient, as WGS added little due to lack of interpretability. This touched off some debate in the audience and on Twitter with regard to exome-seq and WGS. We’ve highlighted the advantages of each approach here: https://blog.genohub.com/whole-genome-sequencing-wgs-vs-whole-exome-sequencing-wes/.

The second talk in the plenary session was by Richard Lifton from Yale, titled “Genes, Genomes and the Future of Medicine”. Richard cautioned the audience, describing the rush to sequence whole genomes as more industry driven than good science, essentially reiterating the point that WGS is hard to interpret. This began a side discussion on Twitter between those who agree and those who disagree with this sentiment. Most notably, Gholson Lyon referenced two recent papers that demonstrated new ways to make processing of WGS data easier and showed that the accuracy of INDEL detection was greater in WGS compared to WES, even in targeted regions. On the cost front, WGS at 35x coverage currently costs $1,750: https://genohub.com/shop-by-next-gen-sequencing-project/#query=ef95a222ca23fc310eedf6de661e4b22 or $3,500 for 70x coverage, while whole exome sequencing at 100x coverage costs around $1,314: https://genohub.com/shop-by-next-gen-sequencing-project/#query=0d4231a1d12425085f4e284373605acd. Richard remarked at the end of his talk that not much had been found in non-coding regions; several in the audience challenged him on this assessment.

The third talk in the plenary session was by Yaniv Erlich, titled “Dissecting the Genetic Architecture of Longevity Using Massive-Scale Crowd Sourced Genealogy”. We’ve had the pleasure of hearing Yaniv give several lectures in the past. All have been engaging, and this was no different. His talk was on using social media to dissect the genetic architecture of complex traits, specifically whether the underlying variants work independently (additive) or together (epistatic). Predictions of epistasis suggest an exponential increase with added risk alleles. He used geni.com to dissect complex traits in large family trees and validated publicly submitted trees using genetic markers. He encoded birthplace as GPS coordinates using Yahoo Maps and showed migration from the Middle Ages through the early 20th century. The video he played was amazing; check it out: https://www.youtube.com/watch?v=fNY_oZaH3Yo#t=19. His take home message was that longevity is an additive trait, which is good for personalized medicine. The Geni data he described is open to the public for use.

The fourth and final talk of the night was by Steven McCarroll, titled “A Common Pre-Malignant State, Detectable by Sequencing Blood DNA”. He started by posing two questions: what happens in the years before a disease becomes apparent? And since cancer genomes are usually studied when there are enough mutations to drive malignancy, do those mutations happen in a particular order? He examined 12,000 exomes for somatic variants at low allele frequency and uncovered 3,111 mutations. Mutations in blood derived DNA clustered in four genes: DNMT3A, TET2, ASXL1 and PPM1D, all disruptive. He postulates that driver mutations give cells an advantage and that over several years clonal progeny take over, creating pre-cancerous cells. Clonal hematopoiesis with somatic mutations can therefore be readily detected by DNA sequencing and should become more common as we age. Patients with clonal mutations have a 12-fold higher rate of blood cancer, meaning there is a window for early detection, possibly 3 years. This work was recently published in the New England Journal of Medicine: Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence.

Today’s sessions were impressive and set expectations high for tomorrow’s talks. McCarroll’s last comment nicely captured this sentiment, setting the tone for the rest of the meeting, “Medicine thinks of health and illness, there is a lot in between that is ascertainable via genome sequencing”. 

Whole Genome Sequencing (WGS) vs. Whole Exome Sequencing (WES)

Gene, exon, intron, sequencing


“Should I choose whole genome sequencing (WGS) or whole exome sequencing (WES) for my project?” is such a frequently posed question during consultation on Genohub that we thought it would be useful to address it here. With unlimited resources and time, WGS is a clear winner as it allows you to interrogate single-nucleotide variants (SNVs), indels, structural variants (SVs) and copy number variants (CNVs) in both the ~1% of the genome that encodes protein sequences and the remaining ~99% of non-coding sequence. WES still costs a lot less than WGS, allowing researchers to increase sample number, an important factor for large population studies. WES does, however, have its limitations. Below we’ve highlighted the advantages of WGS vs. WES and described a real case example of someone ordering these services using Genohub.

Advantages of Whole Genome Sequencing

  1. Allows examination of SNVs, indels, SV and CNVs in coding and non-coding regions of the genome. WES omits regulatory regions such as promoters and enhancers.
  2. WGS has more reliable sequence coverage. Differences in the hybridization efficiency of WES capture probes can result in regions of the genome with little or no coverage.
  3. Coverage uniformity with WGS is superior to WES. Regions of the genome with low sequence complexity restrict the ability to design useful WES capture baits, resulting in off target capture effects.
  4. PCR amplification isn’t required during library preparation, reducing the potential for GC bias. WES frequently requires PCR amplification, as the bulk input amount needed for capture is generally ~1 µg of DNA.
  5. Sequencing read length isn’t a limitation with WGS. Most target probes for exome-seq are designed to be less than 120 nt long, making it meaningless to sequence using a greater read length.
  6. A lower average read depth is required to achieve the same breadth of coverage as WES.
  7. WGS doesn’t suffer from reference bias. WES capture probes tend to preferentially enrich reference alleles at heterozygous sites, producing false negative SNV calls.
  8. WGS is more universal. If you’re sequencing a species other than human your choices for exome sequencing are pretty limited.

Advantages of Whole Exome Sequencing

  1. WES is targeted to protein coding regions, so reads represent less than 2% of the genome. This reduces the cost to sequence a targeted region at a high depth and reduces storage and analysis costs.
  2. Reduced costs make it feasible to increase the number of samples to be sequenced, enabling large population based comparisons.

Most functional, disease related variants can be detected at a depth of between 100-120x (1), which definitely makes the cost case for exome sequencing. Today on Genohub, if you want to perform whole human genome sequencing at a depth of ~35x, the cost is roughly $1700/sample. If you were to request human exome-sequencing services with 100x coverage, using a 62 Mb target region, your cost would be $550/sample. Both of these prices include library preparation. So in terms of producing data, WES is still significantly cheaper than WGS. It’s important to note that this doesn’t include your data storage and analysis costs, which can also be quite a bit higher with whole genome sequencing.

It’s also important to remember that depth isn’t everything. The better your uniformity of reads and breadth of coverage, the higher the likelihood you’ll actually find de novo mutations and call them. And that’s the main goal: if you can’t call SNPs or INDELs with high sensitivity and accuracy, then even the highest depth sequencing runs are worthless.

To conclude, whole genome sequencing typically offers better uniformity and balanced allele ratio calls. While greater exome-seq depth can match this, sufficient mapped depth or variant detection in specific regions may never reach the quality of WGS due to probe design failures or protocol shortcomings. These are important considerations when examining tissues like primary tumors where copy number changes and heterogeneity are confounding factors.

If you’re ready to start an exome project, spend a few minutes determining the coverage you’ll need for your experiment. We have an exome-seq guide with examples to help you determine the number of sequencing reads you need to achieve a certain coverage of your exome. If you’re planning to embark on whole genome sequencing, use our NGS Matching Engine which automatically calculates the amount of sequencing capacity on various platforms to meet the coverage requirements for your project.
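As a rough illustration of the coverage arithmetic, the sketch below estimates how many reads (or read pairs) are needed for a given depth. It is a back-of-the-envelope calculation, not the method used by the NGS Matching Engine; the function name, the `on_target` parameter and the ~3.1 Gb human genome size are our assumptions for the example, and duplicates and uneven coverage are ignored.

```python
import math


def reads_needed(target_bp, coverage, read_length, paired=True, on_target=1.0):
    """Estimate the number of reads (read pairs if paired=True) needed.

    target_bp   : size of the region to cover, in bases
    coverage    : desired average depth (e.g. 100 for 100x)
    read_length : length of one read, in bases
    on_target   : assumed fraction of sequenced bases that land on target
                  (1.0 for WGS; lower for capture-based exome sequencing)
    """
    if not 0 < on_target <= 1:
        raise ValueError("on_target must be in (0, 1]")
    bases_needed = target_bp * coverage / on_target
    bases_per_unit = read_length * (2 if paired else 1)
    return math.ceil(bases_needed / bases_per_unit)


# 62 Mb exome target at 100x with 2x100 reads, assuming every base is on target.
exome_pairs = reads_needed(62_000_000, 100, 100, paired=True, on_target=1.0)

# Human genome (assumed ~3.1 Gb) at 35x with 2x150 reads.
wgs_pairs = reads_needed(3_100_000_000, 35, 150, paired=True)

print(f"Exome: {exome_pairs:,} read pairs")
print(f"WGS:   {wgs_pairs:,} read pairs")
```

In practice the on-target fraction for exome capture is well below 1, so the exome figure should be scaled up accordingly by passing a kit-specific `on_target` value.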


1) Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants

TCR-Repertoire Sequencing Services

TCR sequencing

The immune repertoire reflects the sum total of diverse B and T-cells in the circulatory system. The adaptive immune system drives its immune response through the hypervariable receptors on these cells. The antigen specificity of each T-cell receptor (TCR) is determined by complementarity-determining region 3 (CDR3) of the beta receptor chain, formed by V, D and J gene regions. Examination of TCR diversity is important for understanding adaptive immunity and its function in disease. Next generation sequencing has become a powerful tool for measuring TCR diversity. Before samples can be sequenced, a unique library preparation method must be performed to allow for reproducible and reliable results.

Girihlet, a newly formed biotech company in Brooklyn, NY, is one of the first companies to offer TCR repertoire sequencing services and is the first to offer it on Genohub.com. We got in touch with Girihlet to learn more about this service offering and have posted our conversation with one of its co-founders, Dr. Ravi Sachidanandam. Ravi also holds a position as Assistant Professor on the faculty of the Icahn School of Medicine at Mount Sinai, Department of Oncological Sciences. He has published over 85 papers in the latest and most interesting areas of genomics, including small RNA, mRNA splicing, methylation and virology.

Genohub: Hi Ravi, we’re excited that you’ve joined Genohub.com and listed your services. We’re particularly interested in the TCR-repertoire sequencing services you have on Genohub.com. Not many service providers currently offer this service, how come?

Ravi: There are very few companies that offer this currently, and this is mostly because it’s a very challenging problem both experimentally and computationally. It may be easier to count all the dollar bills in circulation than to profile the diversity of the T Cell Receptors.

Genohub: Can you comment briefly on the ‘library prep’ approach to TCR profiling?

Ravi: Our library prep method is very unique, it is based on quantifying RNA, and in particular just the CDR3 regions while most of the other companies quantify DNA. This allows us to only quantify functional rearranged TCR locus. We also use universal primers for amplification and do not depend on previously known TCR regions, thereby accelerating discovery.  We have also compared our data to flow results and demonstrated good concordance.

Genohub: Inefficiencies during library prep and sequencing can lead to severe bias, generating artificial TCR diversity. Does your approach address this?

Ravi: The beauty of our approach is we use common primers to amplify the T cell receptor regions.  This ensures there is no bias during PCR, allowing for accurate sequencing. And since the accuracy and enrichment for the TCR mRNAs is >98%, we need very little total RNA and less sequencing depth reducing the overall cost of sequencing.

Genohub: How many sequencing reads or TCR sequences do you recommend for a single human sample? Our readers can use your recommendation directly on our project search page: https://genohub.com/shop-by-next-gen-sequencing-project/.

Ravi: Currently 10 million sequences of 150 bp PE reads is enough to accurately and quantitatively capture most of the TCR diversity.

Genohub: How do you handle under-expression?

Ravi: We keep track of low-expressed TCR transcripts as they are needed to understand the statistics of the distribution of the TCR repertoire. We provide these to the researchers, in case they might need to look for rare transcripts.

Genohub: Why is diversity of the immune repertoire important for health?

Ravi: The diversity is the key to the effectiveness of the TCR-repertoire.  The diversity reflects the ability of the immune system to fight infections.  

Genohub: Any comments on its use for vaccine development, autoimmune study, biomarker detection?

Ravi: We believe the TCR sequence can be easily monitored over time, thereby serving as a powerful biomarker to study the effects of vaccination, to determine if the vaccination was effective. It will also be useful in understanding the underlying cause of autoimmune reactions.

Genohub: Thanks for taking the time to discuss this exciting new method. Is there anything that you’d like to add?

Ravi: Girihlet is very excited to take this approach to the rest of scientific community and make a significant difference on how the TCR is sequenced currently and eventually have an impact on the practice of “precision medicine”. 

NextSeq, HiSeq or MiSeq for Low Diversity Sequencing?

Low diversity libraries, such as those from amplicons and those generated by restriction digest can suffer from Illumina focusing issues, a problem not found with random fragment libraries (genomic DNA). Illumina’s real time analysis software uses images from the first 4 cycles to determine cluster positions (X,Y coordinates for each cluster on a tile). With low diversity samples, color intensity is not evenly distributed causing a phasing problem. This tends to result in a high phasing number that deteriorates quickly.
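To get a feel for what ‘low diversity’ means, you can look at the base composition of the first few cycles of your reads. The sketch below is only an illustration: the function names and the 60% cut-off for flagging a cycle are arbitrary assumptions of ours, not an Illumina specification, and the reads are made-up examples.

```python
from collections import Counter


def per_cycle_composition(reads, cycles=4):
    """Return the fraction of A, C, G and T at each of the first `cycles` cycles."""
    composition = []
    for cycle in range(cycles):
        counts = Counter(read[cycle].upper() for read in reads if len(read) > cycle)
        total = sum(counts[base] for base in "ACGT")
        composition.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return composition


def low_diversity_cycles(composition, max_fraction=0.6):
    """List 1-based cycles where a single base exceeds `max_fraction` (assumed cut-off)."""
    return [
        i + 1
        for i, fractions in enumerate(composition)
        if max(fractions.values()) > max_fraction
    ]


# Hypothetical amplicon reads that all start with the same primer sequence.
amplicon_reads = ["GTGCCAGC", "GTGCCAGC", "GTGCTAGC", "GTGCCAGA"]
# Hypothetical reads with balanced bases in the first four cycles.
balanced_reads = ["ACGT", "CGTA", "GTAC", "TACG"]

amplicon_flags = low_diversity_cycles(per_cycle_composition(amplicon_reads))
balanced_flags = low_diversity_cycles(per_cycle_composition(balanced_reads))
print("Amplicon low diversity cycles:", amplicon_flags)
print("Balanced low diversity cycles:", balanced_flags)
```

An amplicon library flags every one of the first four cycles here, which are exactly the cycles used for cluster detection.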

Traditionally this problem is solved in two ways:

1)      ‘Spiking in’ a higher diversity sample such as PhiX (a small viral genome used to enable quick alignment and estimation of error rates) into your library. This increases the diversity at the beginning of your read and takes care of intensity distribution across all four channels. Many groups spike in as much as 50% PhiX in order to achieve a more diverse sample. The disadvantage of this is that you lose 50% of your reads to a sample you were never interested in sequencing.

2)      Other groups have designed amplicon primers with a series of random ‘N’ (25% A, 25% T, 25% G, 25% C) bases upstream of their gene target. This, in combination with a PhiX spike-in, also helps to increase color diversity. The disadvantage is that these extra bases cut into your desired read length and can be problematic when you are trying to conserve cycles to sequence a 16S variable domain: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=3b4a64a17f1396f34c6cc9ec4aa4c938
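The cost of both workarounds is easy to put into numbers. The sketch below is simple arithmetic under the assumption that PhiX reads and ‘N’ spacer bases are pure overhead; the function name, run size and spacer length are hypothetical values chosen for illustration.

```python
def usable_output(total_reads, phix_fraction, read_length, n_spacer=0):
    """Return (reads left for your library, read length left for your target).

    phix_fraction : fraction of the run given to the PhiX spike-in (0 to <1)
    n_spacer      : number of random 'N' bases placed before the target
    """
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    if not 0 <= n_spacer <= read_length:
        raise ValueError("n_spacer must be between 0 and read_length")
    library_reads = int(round(total_reads * (1 - phix_fraction)))
    return library_reads, read_length - n_spacer


# Hypothetical run of 20 million reads at 250 bp.
heavy_spike = usable_output(20_000_000, phix_fraction=0.50, read_length=250)
light_spike = usable_output(20_000_000, phix_fraction=0.05, read_length=250, n_spacer=6)
print("50% PhiX:", heavy_spike)
print("5% PhiX with a 6 base spacer:", light_spike)
```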

Last year, Illumina released a new version of their control program that included updated MiSeq Real Time Analysis (RTA) software that significantly improves the data quality of low diversity samples. This included 1) improved template generation and higher sensitivity template detection of optically dense and dark images, 2) a new color matrix calculation that is performed at the beginning of read 1, 3) using 11 cycles to increase diversity, and 4) new optimizations to phasing and pre-phasing corrections for each cycle and tile to maximize intensity data. Now, with a software update and as little as 5% PhiX spike-in, you can sequence low diversity libraries and expect significantly better MiSeq data quality.

Other instruments, including the HiSeq and GAIIx, still require at least 20-50% PhiX and are less suited for low diversity samples. If you must use a HiSeq for your low diversity amplicon libraries, take the following steps:

1)      Reduce your cluster density by 50-80% to reduce overlapping clusters

2)      Use a high amount of PhiX spike in (up to 50%) of the total library

3)      Use custom primers with a random sequence to increase diversity. Alternatively, intentionally concatemerize your amplicons and fragment them to increase base diversity at the start of your reads.

The NextSeq 500, released in March of 2014, uses a two channel SBS sequencing process, likely making it even less suited for low diversity amplicons. As of 4/2014, Illumina has not performed significant validation or testing using low diversity samples on the NextSeq 500. It is not expected the NextSeq 500 instrument will perform better than the HiSeq for these sample types.

So, in conclusion, the MiSeq is currently still the best Illumina instrument for sequencing samples of low diversity: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=c814746ad739c57b9a69e449d179c27c