Sequencing Suggests the Ebola Virus Genome is Changing

Genome of the Ebola Virus is Changing Rapidly

Using high throughput sequencing, researchers from MIT, Harvard and the Sierra Leone Ministry of Health and Sanitation have recently reported rapid changes in the Ebola virus's genetic code. The Ebola virus genome, a single-stranded RNA of ~19,000 nucleotides, encodes several structural proteins: the RNA polymerase, nucleoprotein, polymerase co-factors and transcription activators. The researchers used Illumina HiSeq 2500 platforms to achieve 2000x coverage of the Ebola genome. Using Genohub, we estimate the cost to sequence 100 such genomes at 2000x to be under $1,500: https://genohub.com/shop-by-next-gen-sequencing-project/#query=0929767cd66b8ec8a9fb209c99d75b27.
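As a back-of-the-envelope illustration of what 2000x coverage of such a small genome means in practice (the 100 bp read length below is an assumption, not the study's exact configuration):

```python
# Illustrative arithmetic only; the read length is an assumed value.
GENOME_SIZE_BP = 19_000      # approximate Ebola virus genome length
TARGET_COVERAGE = 2000       # fold coverage reported in the study
READ_LENGTH_BP = 100         # assumed read length

reads_needed = GENOME_SIZE_BP * TARGET_COVERAGE / READ_LENGTH_BP
print(f"~{reads_needed:,.0f} reads per genome")  # ~380,000 reads
```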

Sequencing 99 Ebola genomes from 78 patients, they found more than 300 genetic changes that distinguish the genomes from the current outbreak from those of previous outbreaks. In fact, they found that the substitution rate in this year's outbreak was twice as high as in previous Ebola virus outbreaks. They also determined that mutations during this year's outbreak were frequently nonsynonymous (mutations that alter the amino acid sequence of a protein). Fifty mutational events and 29 new viral lineages were observed in this outbreak alone, suggesting potential for viral adaptation. Determining whether Ebola is evolving away from our defenses against it, or whether it could become more contagious and spread faster, will require functional analysis. For their part, Gire et al. have published the full-length Ebola genomes in the NCBI database. Tragically, the authors note that five co-authors died from the disease before the manuscript could be published. Last week The New Yorker published "The Ebola Wars," an excellent in-depth story of the work involved in actually sequencing the Ebola genome and tracking its mutations.
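To make the synonymous/nonsynonymous distinction concrete, here is a minimal sketch (it assumes Biopython is installed and that you already know the reference and mutated codon) that classifies a codon change by comparing the encoded amino acids:

```python
# Minimal sketch; assumes Biopython is available and both codons are known.
from Bio.Seq import Seq

def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid."""
    ref_aa = str(Seq(ref_codon).translate())
    alt_aa = str(Seq(alt_codon).translate())
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"

print(classify_substitution("GGA", "GGG"))  # Gly -> Gly: synonymous
print(classify_substitution("GGA", "AGA"))  # Gly -> Arg: nonsynonymous
```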

While basic PCR tests are sufficient to give a yes/no answer about infection, this new study highlights the important role of sequencing in characterizing patterns of viral transmission and mutation during an epidemic. We expect sequencing to play a greater role in the development of diagnostics and treatments for this and other viral outbreaks.

Illumina HiSeq v4 Sequencing Services Yielding 1Tb Data / Week Now Available on Genohub

Read output per Illumina lane (latest Illumina chemistry, reads/lane)

In line with our efforts to democratize the latest high throughput sequencing technology, we're pleased to announce the availability of HiSeq sequencing services with v4 chemistry. Any researcher, anywhere in the world, can now order this sequencing service in a matter of minutes using Genohub.com.

The new Illumina HiSeq v4 chemistry allows for sequencing runs with 25% greater read length (2×125 for high-output runs) and 33% more clusters. Running two full flow cells, users can expect to generate up to 1 Tb of data per week, or 167 Gb per day. At an output of 250M reads per lane, users will need at least 2 lanes to achieve ~35x coverage of the human genome. While this isn't the most efficient technology for sequencing whole human genomes (the HiSeq X Ten requires only a single lane), it certainly is for exome, transcriptome and re-sequencing applications. Earlier this year we announced the availability of NextSeq 500 and HiSeq X Ten services on Genohub.com.
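The lane arithmetic behind that statement is straightforward; here is a short sketch using assumed round numbers for the human genome and HiSeq v4 high-output specs:

```python
import math

# Assumed round numbers; actual yields vary by run.
GENOME_SIZE_BP = 3.2e9          # human genome
TARGET_COVERAGE = 35
CLUSTERS_PER_LANE = 250e6       # ~250M clusters (read pairs) per HiSeq v4 lane
BASES_PER_CLUSTER = 2 * 125     # paired-end 2x125

bases_needed = GENOME_SIZE_BP * TARGET_COVERAGE          # ~112 Gb
bases_per_lane = CLUSTERS_PER_LANE * BASES_PER_CLUSTER   # ~62.5 Gb
print(math.ceil(bases_needed / bases_per_lane))          # 2 lanes
```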

The new HiSeq v4 chemistry not only improves output but also reduces the time it takes for a sequencing run to complete. With run times of only 6 days, we expect several of our HiSeq Rapid Run users to switch over and take advantage of 250M reads/lane in one week instead of 150M reads/lane in a single day. It's also worth noting that HiSeq Rapid read lengths have increased to 2×250.

If you're not sure which platform, chemistry or read length is most efficient for your application, use our Shop by Project page and enter the number of reads or coverage you need. We'll display all your options!

Illumina’s Next Big Pivot

President of Illumina

In a recent article in MIT Technology Review, Francis de Souza, president of Illumina, is quoted as saying 228,000 human genomes will be sequenced this year (2014). He further estimates that this number will double every 12 months, reaching 1.6 million genomes by 2017. In a March blog post we extrapolated 400,000 genomes in 2015 by estimating the throughput of Illumina instruments on the market, HiSeq X Ten projects initiated on Genohub and large population sequencing projects starting in the UK and other countries. That estimate is pretty close to de Souza's latest numbers.

80% of the genomes sequenced this year will be part of scientific research projects, which makes one wonder when 'clinical genomes' will be ready. To get there we're going to need greater throughput, higher coverage or lower costs. However, instead of focusing on reducing costs, Illumina is betting on simplified, targeted sequencing. According to de Souza, "It's not clear you can get another order of magnitude out of this…people are saying the price is not the issue." Rather than focusing on selling complex instruments, Illumina wants to become an everyday brand in hospitals. Illumina is in the process of simplifying its instruments and developing clinically relevant, targeted panels to be sold as FDA-approved kits.

While targeted panels for research purposes are available today, most are not regulated. Illumina believes regulation is a necessary step the FDA will have to take for targeted sequencing to become more widely used in the clinic. A fast track to get there is to work with pharmaceutical companies, who are in the business of getting approval from the FDA. Last month, Illumina said it was developing a universal NGS-based oncology test with AstraZeneca, Janssen Biotech, and Sanofi for use as a companion diagnostic on its MiSeqDx platform. Today, Thermo Fisher announced plans to develop NGS-based tests for solid tumors on its Ion PGM Dx platform with Pfizer and GSK. At least in the near term, it looks like targeted re-sequencing will be a mainstay in the clinic while research-based WGS will guide targeted panel design.

Next Generation Sequencing Applications in the Clinic Today

clinical sequencing carcinoma

Every year an increasing number of next-generation sequencing based diagnostic assays are validated and enter clinical practice. Hundreds more are developed for pre-clinical research purposes. NGS applications being used in the clinic today include pre-implantation genetic screening (PGS) for in vitro fertilization, chromosomal aneuploidy detection, mutation analysis of patient samples, and sequence-driven selection of chemotherapeutics.

Comprehensive assays that identify single-base substitutions and fusion events are commonly performed to establish a diagnosis or to help decide which drug treatment is best. While routine implementation of clinical NGS in oncology is still in its infancy, several assays are currently being performed in CLIA-certified environments: amplicon-based gene panels (1, 2), targeted capture-based gene panels (3, 4), full exome and transcriptome-seq (5, 6), and whole genome and RNA-seq (7-9). Some patients are even being treated with drugs for off-label indications based on NGS tests (of all chemotherapeutic prescriptions, 33-47% are off-label (10)). As assays are standardized and confirmed on orthogonal platforms, we expect to see an increase in the number of these clinical applications.

NGS-based pre-implantation genetic screening has significantly changed prenatal testing and screening. Currently only ~25% of in vitro fertilization procedures are successful. This low success rate is largely due to increasing maternal age and chromosomal aneuploidy. PGS is performed to select chromosomally balanced embryos during the IVF process, ensuring that only euploid embryos are implanted. This NGS-based assay has been shown to improve implantation success rates and is being used in clinics today.

Non-invasive prenatal testing using cell-free fetal DNA circulating in maternal blood allows for the detection of genetic diseases and common chromosomal aneuploidies such as trisomies 13, 18, and 21. Fetal DNA, which comprises about 10% of the DNA in maternal circulation, becomes detectable between 5 and 10 weeks after conception. The method allows for an early assessment of aneuploidy without the risk of harming the fetus. Sequenom, Verinata Health, Ariosa Diagnostics and Natera each offer CAP- and CLIA-certified tests that are available at OB/GYN offices.
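Conceptually, many of these tests count sequencing reads per chromosome and ask whether the fraction mapping to, say, chromosome 21 is elevated relative to euploid reference samples. The sketch below is a deliberately simplified, hypothetical version of that counting approach, not any vendor's actual algorithm:

```python
import statistics

# Toy example only. sample_counts holds reads binned per chromosome.
def chr21_zscore(sample_counts: dict, euploid_chr21_fractions: list) -> float:
    """Compare the sample's chr21 read fraction to euploid reference samples."""
    frac21 = sample_counts["chr21"] / sum(sample_counts.values())
    mu = statistics.mean(euploid_chr21_fractions)
    sd = statistics.stdev(euploid_chr21_fractions)
    return (frac21 - mu) / sd

# A z-score above ~3 is commonly treated as evidence for trisomy 21.
```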

In November 2013, the FDA cleared the Illumina MiSeqDx platform as a class II device along with a cystic fibrosis carrier screening assay. The assay detects 139 variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene.

These are highlights of assays one could expect to receive if visiting a clinic today. We’d like to hear from you about others in development or those currently in practice.

 

1)    Committee on a Framework for Developing a New Taxonomy of Disease & the National Research Council. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease (National Academies Press, 2011).

2)    Beadling, C. et al. Combining highly multiplexed PCR with semiconductor-based sequencing for rapid cancer genotyping. J. Mol. Diagn. 15, 171–176 (2013).

3)    Dagher, R. et al. Approval summary: imatinib mesylate in the treatment of metastatic and/or unresectable malignant gastrointestinal stromal tumors. Clin. Cancer Res. 8, 3034–3038 (2002).

4)    Davies, H. et al. Mutations of the BRAF gene in human cancer. Nature 417, 949–954 (2002).

5)    Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotech. 27, 182–189 (2009).

6)    Roychowdhury, S. et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Sci. Transl. Med. 3, 111ra121 (2011).

7)    Wagle, N. et al. High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov. 2, 82–93 (2012).

8)    Matulonis, U. A. et al. High throughput interrogation of somatic mutations in high grade serous cancer of the ovary. PLoS ONE 6, e24433 (2011).

9)    Weiss, G. J. et al. Paired tumor and normal whole genome sequencing of metastatic olfactory neuroblastoma. PLoS ONE 7, e37029 (2012).

10) Conti, R.M. et al. Prevalence of Off-Label Use and Spending in 2010 Among Patent-Protected Chemotherapies in a Population-Based Cohort of Medical Oncologists. J. Clin. Oncol. 31, 1134–1139 (2013).

Mycoplasma Contamination in your Sequencing Data

mycoplasma contamination

Mycoplasma, the bane of any cell culture lab's existence, is a genus of bacteria characterized by the lack of a cell wall. With a relatively small genome, mycoplasma have limited biosynthetic capabilities and require a host to replicate efficiently. Inspired by a bout of mycoplasma contamination in their own lab, Anthony O. Olarerin-George and John B. Hogenesch from the University of Pennsylvania recently set out to determine how widespread mycoplasma contamination is in other labs by screening RNA-seq data deposited in the NCBI Sequence Read Archive (1). Their study estimates that ~11% of NCBI's Gene Expression Omnibus (GEO) projects between 2012 and 2013 contain ≥100 reads per million reads mapping to mycoplasma's small 0.6 Mb genome. They also reference a recent study (2) which suggests that 7% of the samples from the 1000 Genomes Project are contaminated. This is bad news if you've recently completed a large study and are wondering why you have so many unmapped reads. While most of these are likely from regions of the genome that haven't been sequenced, reads mapping to mycoplasma should be taken seriously, as contamination can affect the expression of thousands of genes and slow cellular growth.

Preventing contamination in the first place, along with routine monitoring, is essential, but if you've already completed the sequencing end of your project you can start by aligning your data to the several completed mycoplasma genomes; a minimal version of that check is sketched below.
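Here is what such a check can look like (it assumes pysam is installed, that reads were aligned to a combined host-plus-mycoplasma reference, and that the mycoplasma contigs share a recognizable name prefix); it reports mycoplasma reads per million, the metric used in the study:

```python
import pysam  # assumed to be installed

def mycoplasma_reads_per_million(bam_path: str, myco_prefix: str = "myco") -> float:
    """Reads per million whose primary alignment hits a mycoplasma contig.

    Assumes the BAM was aligned against a combined host + mycoplasma reference
    and that mycoplasma contig names share the given prefix.
    """
    total = 0
    myco = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_secondary or read.is_supplementary:
                continue
            total += 1
            if not read.is_unmapped and read.reference_name.startswith(myco_prefix):
                myco += 1
    return 1e6 * myco / total if total else 0.0

# Samples above ~100 reads per million would be flagged, following the paper's threshold.
```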

With recent drops in cost, routine sequencing of cell culture samples has become more prevalent. If you're interested in testing your cultures, start by searching for sequencing services and providers on Genohub.

1) Olarerin-George, A.O. & Hogenesch, J.B. Assessing the prevalence of mycoplasma contamination in cell culture via a survey of NCBI's RNA-seq archive. doi: http://dx.doi.org/10.1101/007054

2) Langdon, W.B. Mycoplasma contamination in the 1000 Genomes Project.

Beginner’s Guide to Exome Sequencing

Exome Capture Kit Comparison

With decreasing costs to sequence whole human genomes (currently $1,550 for 35x coverage), we frequently hear researchers ask, "Why should I sequence only protein-coding genes?"

First, WGS of entire populations is still quite expensive. These types of projects are currently being performed only by large centers or government entities, like Genomics England, a company owned by the UK's Department of Health, which announced that it will sequence 100,000 whole genomes by 2017. At Genohub's rate of $1,550/genome, 100,000 genomes would cost $155 million USD. This figure includes only sequencing costs and does not take into account labor, data storage and analysis, which are likely severalfold greater.

Second, the exome, or all ~180,000 exons, comprises less than 2% of the sequence in the human genome but contains 85-90% of all known disease-causing variants. A more focused dataset makes interpretation and analysis a lot easier.

Let's assume you've decided to proceed with exome sequencing. The next step is to either find a service provider to perform your exome capture, sequencing and analysis, or to do it yourself. Genohub makes it easy to find and directly order sequencing services from providers around the world. Several of our providers offer exome library prep and sequencing services. If you're only looking for help with your data analysis, you can contact one of our providers offering exome bioinformatics services. Whether you decide to send your samples to a provider or make libraries yourself, you'll need to decide which capture technology to use, the number of reads you'll need and which read length is most appropriate for your exome-seq project.

There are currently three main capture technologies available: Agilent SureSelect, Illumina Nextera Rapid Capture, and Roche NimbleGen SeqCap EZ Exome. All three are in-solution based and utilize biotinylated DNA or RNA probes (baits) that are complementary to exons. These probes are added to genomic fragment libraries and, after a period of hybridization, magnetic streptavidin beads are used to pull down and enrich for the targeted fragments. The three exome capture technologies are compared in a detailed table: https://genohub.com/exome-sequencing-library-preparation/. Each kit has a different number of probes, probe length, target region, input DNA requirement and hybridization time. Researchers planning on exome sequencing should first determine whether the technology they're considering covers their regions of interest: only 26.2 Mb of targeted bases are common to all three, and small portions of the CCDS exome are uniquely covered by each technology (Chilamakuri, 2014).

Our Exome Guide breaks down the steps you'll need to take to determine how much sequencing and what read length are appropriate for your exome capture sequencing project.
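As a rough illustration of that kind of planning arithmetic, the sketch below estimates required read pairs from an assumed 50 Mb target, a desired mean coverage, and placeholder on-target and duplicate rates; your actual numbers will depend on the capture kit and the lab:

```python
# All numbers below are assumed, typical planning values; adjust for your kit and lab.
TARGET_SIZE_BP = 50e6       # capture design target (~50 Mb)
MEAN_COVERAGE = 100         # desired mean on-target coverage
BASES_PER_READ_PAIR = 2 * 100
ON_TARGET_RATE = 0.70       # fraction of sequenced bases landing on target
DUPLICATE_RATE = 0.10       # PCR/optical duplicate fraction

on_target_bases = TARGET_SIZE_BP * MEAN_COVERAGE
raw_bases = on_target_bases / (ON_TARGET_RATE * (1 - DUPLICATE_RATE))
read_pairs = raw_bases / BASES_PER_READ_PAIR
print(f"~{read_pairs/1e6:.0f} million read pairs")  # ~40 million with these assumptions
```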

Sequencing, Finishing, Analysis in the Future – 2014 – Day 1 Meeting Highlights

SFAF 2014

Sequencing, Finishing and Analysis in the Future Meeting 2014

Arguably one of the top genome conferences, the annual SFAF meeting began this year in Santa Fe with a great lineup of speakers from genome centers, academia and industry. Frankly, what's amazing is that the meeting is completely supported by outside funding and there is no registration fee (we hope that last comment doesn't spoil the intimate, small size of the meeting next year).

Rick Wilson kicked off SFAF with his keynote, 'Recent Advances in Cancer Genomics'. He discussed a few clinical cases where the combination of whole genome sequencing, exome-seq and RNA-seq was used to help diagnose and guide targeted cancer drug therapy. He emphasized that this combined sequencing approach is required to identify actionable genes and that WGS or exome-seq alone isn't enough.

Jonathan Bingham from Google announced the release of a simple web-based API to import, process, store, and collaborate on genomic data: https://gabrowse.appspot.com/. He mentioned that Google thinks of computing in terms of data centers and where capacity is available: at any given time, their idle computers pooled together amount to more than any single data center. His new genomics team is looking to harness this capacity for genome analysis. He made the comparison that a million genomes add up to more than 100 petabytes, on the scale of Google's web search index.

Steve Turner from Pacific Biosciences discussed platform advances that have led to higher quality assemblies rivaling pre-second-generation, clone-by-clone sequencing. He made an analogy to the current state of transcriptome assembly: it's like putting a bunch of magazines in the shredder, then gluing the pieces back together. He described a method that is now available for constructing full-length transcripts: cDNA SMRTbell™ libraries for single molecule sequencing. Finally, he announced that there are >100 PacBio instruments installed in the field. At Genohub, we already have several listed, with service available for purchase: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=f64db717ac261dad127c20124a9e1d85.

Kelly Hoon from Illumina was next. She described a series of new updates, the most notable being the planned submission of the HiSeq 2500 for FDA approval by the end of the year. Other points included updates to BaseSpace, the 1T upgrade (1 Tb of data in 6 days), NeoPrep (coming this summer, allowing 1 ng of input), new RNA capture kits and a review of NextSeq optics.

Thermo Fisher's presentation followed Illumina's. Most of the discussion was about Ion Torrent's new Hi-Q system, designed to improve accuracy, read length and error rates.

Right after the platform talks was a panel discussion with PacBio, Illumina, Roche and Thermo Fisher. The main points from that discussion were:

  • Steve Turner from PacBio declined to discuss or entertain discussion of a benchtop platform, which was met with lots of audience laughter.
  • Illumina had no response for Oxford Nanopore (ONT) except to say they're not going to respond to ONT until after they launch…ouch.
  • PacBio said that right now read length is limited not by on-board chemistry but by the quality of input DNA.
  • Roche is phasing out the 454 but looking to compete on 4-5 other possibilities (very interesting news).

Ruth Timme from the FDA discussed the implementation of an international NGS network of public health labs to collect and submit draft genomes of food pathogens to a reference database. Data coming in from these sites provides the FDA with actionable leads in outbreak investigations. Currently, GenomeTrakr consists of six state health labs and a series of FDA labs.

Sterling Thomas discussed Noblis' Center for Applied High Performance Computing (CAHPC) suite of high-speed algorithms called BioVelocity. BioVelocity performs reference-based multiple sequence alignment (MSA) and variant detection on raw human reads. High-speed variant finding in adenocarcinoma using whole genome sequencing was presented as an example.

Sean Conlan from NHGRI discussed sequence analysis of plasmid diversity among hospital-associated carbapenem-resistant Enterobacteriaceae. Using finished genome sequences of isolates from patients and the hospital, he was able to better understand the transmission of bacterial strains and of plasmids encoding antibiotic resistance.

David Trees examined the use of WGS to determine the molecular mechanisms responsible for decreased susceptibility and resistance to azithromycin in Neisseria gonorrhoeae. Predominant causes of resistance included mutations in the promoter region or structural gene of mtrR and mutations in 23S rRNA alleles located on the gonococcal chromosome.

Darren Grafham from Sheffield Diagnostic Genetics Services emphasized the importance of consensus in the choice of an analytical pipeline, alongside Sanger confirmation of variants, for diagnostics. He described a pipeline that is currently being used in a clinical diagnostic lab for routine screening of inherited, pathogenic variants. He stated that 30x coverage is the point at which false positives are eliminated with >99.9% confidence.

Other talks during the first day (which we likely missed while enjoying the beautiful Santa Fe weather):

Heike Sichtig: Enabling Sequence Based Technologies for Clinical Diagnostic: FDA Division of Microbiology Devices Perspective

Christian Buhay: The BCM-HGSC Clinical Exome: from concept to implementation

Dinwiddie: WGS of Respiratory Viruses from Clinical Nasopharyngeal Swabs

Karina Yusim: Analyzing TB Drug Resistance

Colman: Universal Tail Amplicon Sequencing

Roby Bhattacharyya: Transcriptional signatures in microbial diagnostics

Eija Trees: NGS as a surveillance tool

Helen Cui: Genomics Capability Development and Cooperative Research with Global Engagement

Raphael Lihana: HIV-1 Subtype Surveillance in Kenya: the Puzzle of Emerging Drug Resistance and Implications on Continuing Care

Gvantsa Chanturia: NGS Capability at NCDC

The night ended with a poster and networking session. The entire agenda is posted here: http://www.lanl.gov/conferences/sequencing-finishing-analysis-future/agenda.php

Follow us on Twitter and #SFAF2014 for the latest updates!

Ask a Bioinformatician

In the last 2 years, next-gen sequencing instrument output has increased significantly; labs are now sequencing more samples at greater depth than ever before. Demand for the analysis of next generation sequencing data is growing at an arguably even higher rate. To help accommodate this demand, Genohub.com now allows researchers to quickly find and connect directly with service providers who have specific data analysis expertise: https://genohub.com/bioinformatics-services-and-providers/

Whether it's a simple question about gluing pieces together in a pipeline or a request to have your transcriptome annotated, researchers can quickly choose a bioinformatics provider based on their expertise and post queries and project requests. Services that bioinformaticians offer on Genohub are broken down into primary, secondary and tertiary data analysis:

Primary – Involves the quality analysis of raw sequence data from a sequencing platform. Primary analysis solutions are typically provided by the platform after the sequencing phase is complete. The result is usually a FASTQ file, which combines sequence data with a Phred quality score for each base.
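To make the "sequence plus Phred score" pairing concrete, here is a minimal sketch (the read and quality string are made up) that decodes a FASTQ quality line using the standard Phred+33 offset:

```python
def phred_scores(quality_line: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string using the standard Phred+33 (Sanger) offset."""
    return [ord(ch) - offset for ch in quality_line]

record = ["@read1", "ACGTACGT", "+", "IIIIFFFB"]  # made-up four-line FASTQ record
print(phred_scores(record[3]))  # [40, 40, 40, 40, 37, 37, 37, 33]
```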

Secondary – Encompasses sequence alignment, assembly and variant calling on aligned reads. This analysis is usually resource intensive, requiring significant data handling and compute resources. It typically relies on a set of algorithms that can be automated into a pipeline. While the simplest pipelines can be a matter of gluing together publicly available tools, a certain level of expertise is required to maintain and optimize the analysis flow for a particular project.
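As a toy illustration of such a pipeline, the sketch below glues two widely used tools together with subprocess calls; it assumes bwa and a recent samtools (1.x) are on the PATH, and all file names are placeholders:

```python
import subprocess

# Placeholder inputs; not a production pipeline.
ref = "reference.fa"
fq1, fq2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"

# Align reads to the reference, then coordinate-sort and index the alignments.
with open("sample.sam", "w") as sam:
    subprocess.run(["bwa", "mem", ref, fq1, fq2], stdout=sam, check=True)
subprocess.run(["samtools", "sort", "-o", "sample.sorted.bam", "sample.sam"], check=True)
subprocess.run(["samtools", "index", "sample.sorted.bam"], check=True)
# Variant calling (e.g. with bcftools or GATK) would be the next step in the pipeline.
```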

Tertiary – Annotation, variant call validation, data aggregation and sample- or population-based statistical analysis are all components of tertiary data analysis. This type of analysis is typically performed to answer a specific, biologically relevant question or to generate a series of new hypotheses that need testing.

Researchers who need library prep, sequencing and data analysis services can still search for, find and begin projects as before using our Shop by Project page. What's new is that researchers who only need data analysis services can now directly search for and contact a bioinformatics service provider to request a quote: https://genohub.com/bioinformatics-services-and-providers/

Whether you plan on performing a portion of your sequencing data analysis yourself or intend to take on the challenge of putting together your own pipeline, consultation with a seasoned expert saves time and ensures you're on the way to successfully completing your project. By adding this new service, we're making it easier to search for and identify the right provider for your analysis requirements.

If you're a service provider and would like your services to be listed on Genohub, you can sign up for a Service Provider Account or contact us to discuss the screening and approval process.

 

PEG Precipitation of DNA Libraries – How Ampure or SPRIselect works

Polyethylene glycol

One question we've been asked, and one that our NGS providers are frequently asked, is how PEG precipitates DNA during next generation sequencing library preparation cleanup. We usually hear the question presented as: how do Agencourt's Ampure XP or SPRIselect beads precipitate DNA? The answer has to do with the chemical properties of DNA, polyethylene glycol (PEG), the beads being used and water. Polystyrene-magnetite beads (Ampure) are coated with a layer of negatively charged carboxyl groups. DNA's highly charged phosphate backbone makes it polar, allowing it to readily dissolve in water (also polar). When PEG [H-(O-CH2-CH2)n-OH] is added to a DNA solution at saturating concentrations, DNA forms large random coils. Adding this hydrophilic molecule with the right concentration of salt (Na+) causes DNA to aggregate and precipitate out of solution from lack of solvation (1, 2). Too much salt and you'll have a lot of salty DNA; too little will result in poor recovery. The Na+ ions shield the negative phosphate backbones, causing DNA molecules to stick to each other and to anything else in the near vicinity (including the carboxylated beads). Once you're ready to elute your DNA and put it back into solution (after you've done your size selection or removed enzymes, nucleotides, etc.), an aqueous solution (TE or water) is added back, fully hydrating the DNA and moving it from an aggregated state back into solution. The negative charge of the carboxyl beads now repels the DNA, allowing the user to recover it in the supernatant. Changing the amount of PEG and the salt concentration can be used to size select DNA (2). This is a common step in NGS library preparation when the user is interested in selecting fragments of a particular size, and it's often used to replace gel steps in NGS library prep. There is already a wealth of literature on conditions for size selecting DNA; a simple Google search will turn it up. The first article we found that describes this selection is referenced below (3).
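For the size selection use case, the practical knob is the bead-to-sample volume ratio. Below is a small, hypothetical helper sketching the volume arithmetic for a double-sided selection; the example ratios (0.6x then 0.8x) are placeholders, since the actual ratio-to-fragment-size mapping is empirical and must be calibrated for your bead lot and kit.

```python
def double_sided_selection(sample_ul: float, first_ratio: float, second_ratio: float):
    """Bead volumes for a two-step (double-sided) size selection.

    Ratios are relative to the ORIGINAL sample volume. The first, lower ratio
    binds fragments above the upper size cutoff (those beads are discarded);
    bringing the supernatant up to the second, higher ratio binds the window you keep.
    """
    first_add = first_ratio * sample_ul
    second_add = (second_ratio - first_ratio) * sample_ul
    return first_add, second_add

print(double_sided_selection(50, 0.6, 0.8))  # (30.0, 10.0) microliters with these example ratios
```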

Updated 7/18/2016

Since this post was published on May 7th, 2014, several more commercial, Ampure-like size selection beads have come on the market:

  • MagJet – ThermoFisher
  • Mag-Bind – Omega Biotek
  • Promega Beads – Promega
  • Kapa Pure Beads – Kapa Biosystems

While we haven't explored each of these yet, we suspect the chemistry behind precipitation and selection is very similar. If you'd like to share information about these beads, please leave us a comment or send us an email at support@genohub.com.

If you'd like help constructing your NGS library, contact us and we'd be happy to consult with you on your sequencing project: https://genohub.com/ngs-consultation/

If you're looking for an NGS service provider, check out our NGS Service Matching Engine: https://genohub.com/.

(1)     A Transition to a Compact Form of DNA in Polymer Solutions: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC389314/pdf/pnas00083-0227.pdf

(2)    DNA Condensation by Multivalent Cations: https://www.biophysics.org/Portals/1/PDFs/Education/bloomfield.pdf

(3)    Size fractionation of double-stranded DNA by precipitation with polyethylene glycol: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC342844/pdf/nar00500-0080.pdf

 

NextSeq, HiSeq or MiSeq for Low Diversity Sequencing?

Low diversity libraries, such as those from amplicons or those generated by restriction digest, can suffer from Illumina focusing issues, a problem not found with random fragment libraries (genomic DNA). Illumina's real time analysis software uses images from the first 4 cycles to determine cluster positions (X,Y coordinates for each cluster on a tile). With low diversity samples, color intensity is not evenly distributed across channels, causing a phasing problem. This tends to result in a high phasing number that deteriorates quickly. A quick way to gauge how unbalanced a library is before sequencing it is sketched below.
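The sketch below is a minimal, assumed implementation of that diagnostic: it reads an uncompressed FASTQ (for example, from a MiSeq pilot run) and reports the A/C/G/T fractions for the first cycles, where a balanced library sits near 25% per base.

```python
from collections import Counter

def per_cycle_base_fractions(fastq_path: str, n_cycles: int = 10):
    """A/C/G/T fractions for the first n_cycles of an uncompressed FASTQ file."""
    counts = [Counter() for _ in range(n_cycles)]
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # sequence lines
                for cycle, base in enumerate(line.strip()[:n_cycles]):
                    counts[cycle][base] += 1
    return [{b: c[b] / max(sum(c.values()), 1) for b in "ACGT"} for c in counts]

# A balanced library shows ~0.25 per base at every cycle; a 16S amplicon library will not.
```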

Traditionally, this problem has been solved in two ways:

1)      'Spiking in' a higher diversity sample such as PhiX (a small viral genome used to enable quick alignment and estimation of error rates) into your library. This increases the diversity at the beginning of your read and evens out the intensity distribution across all four channels. Many groups spike in as much as 50% PhiX to achieve a more diverse sample. The disadvantage of this is that you lose 50% of your reads to a sample you were never interested in sequencing.

2)      Other groups have designed amplicon primers with a series of random 'N' (25% A, 25% T, 25% G, 25% C) bases upstream of their gene target. This, in combination with a PhiX spike-in, also helps increase color diversity. The disadvantage is that these extra bases cut into your desired read length and can be problematic when you're trying to conserve cycles to sequence a 16S variable domain.

Last year, Illumina released a new version of their control software with an updated MiSeq Real Time Analysis (RTA) package that significantly improves the data quality of low diversity samples. The improvements include 1) improved template generation and higher-sensitivity template detection of optically dense and dark images, 2) a new color matrix calculation performed at the beginning of read 1, 3) the use of 11 cycles to increase diversity, and 4) new optimizations to the phasing and pre-phasing corrections applied to each cycle and tile to maximize intensity data. Now, with a software update and as little as 5% PhiX spike-in, you can sequence low diversity libraries and expect significantly better MiSeq data quality.

Other instruments, including the HiSeq and GAIIx, still require at least 20-50% PhiX and are less suited for low diversity samples. If you must use a HiSeq for your amplicon libraries, take the following steps with low diversity libraries:

1)      Reduce your cluster density by 50-80% to reduce overlapping clusters

2)      Use a high PhiX spike-in (up to 50% of the total library)

3)      Use custom primers with a random sequence to increase diversity. Alternatively, intentionally concatemerize your amplicons and fragment them to increase base diversity at the start of your reads.

The NextSeq 500, released in March 2014, uses a two-channel SBS sequencing process, likely making it even less suited for low diversity amplicons. As of April 2014, Illumina has not performed significant validation or testing of low diversity samples on the NextSeq 500, and the instrument is not expected to perform better than the HiSeq for these sample types.

So, in conclusion, the MiSeq is currently the best Illumina instrument for sequencing samples of low diversity: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=c814746ad739c57b9a69e449d179c27c