Beginner’s Guide to Exome Sequencing

Exome Capture Kit Comparison

With decreasing costs to sequence whole human genomes (currently $1,550 for 35X coverage), we frequently hear researchers ask, “Why should I only sequence protein-coding genes?”

First, WGS of entire populations is still quite expensive. These types of projects are currently only being performed by large centers or government entities, like Genomics England, a company owned by the UK’s Department of Health, which has announced that it will sequence 100,000 whole genomes by 2017. At Genohub’s rate of $1,550/genome, 100,000 genomes would cost $155 million USD. This figure only includes sequencing costs and does not take into account labor, data storage and analysis, which are likely several-fold greater.

Second, the exome, the set of all ~180,000 exons, comprises less than 2% of the sequence in the human genome but contains 85-90% of all known disease-causing variants. A more focused dataset makes interpretation and analysis a lot easier.

Let’s assume you’ve decided to proceed with exome sequencing. The next step is either to find a service provider to perform your exome capture, sequencing and analysis or to do it yourself. Genohub has made it easy to find and directly order sequencing services from providers around the world. Several of our providers offer exome library prep and sequencing services. If you’re only looking for someone to help with your data analysis, you can contact one of our providers offering exome bioinformatics services. Whether you decide to send your samples to a provider or make libraries yourself, you’ll need to decide on a capture technology, the number of reads you need and the read length most appropriate for your exome-seq project.

There are currently three main capture technologies available: Agilent SureSelect, Illumina Nextera Rapid Capture and Roche NimbleGen SeqCap EZ Exome. All three are in-solution methods that use biotinylated DNA or RNA probes (baits) complementary to exons. The probes are added to genomic fragment libraries and, after a period of hybridization, streptavidin-coated magnetic beads are used to pull down and enrich for the fragmented exons. We compare the three exome capture technologies in a detailed table: https://genohub.com/exome-sequencing-library-preparation/. The kits differ in number of probes, probe length, target region, input DNA requirement and hybridization time. Researchers planning exome sequencing should first confirm that the technology they’re considering covers their regions of interest: only 26.2 Mb of targeted bases are common to all three kits, and each technology uniquely covers only a small portion of the CCDS exome (Chilamakuri et al., 2014).

Our Exome Guide breaks down the steps for determining how much sequencing and what read length are appropriate for your exome capture sequencing project.
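
As a rough illustration of the read-budget math the guide walks through, here is a minimal sketch (in Python) that estimates the read pairs needed from target size, desired mean coverage, read length and an assumed on-target rate. The default on-target and duplicate rates are placeholder assumptions, not kit specifications.

    def required_read_pairs(target_mb, mean_coverage, read_length=100,
                            on_target_rate=0.6, duplicate_rate=0.1):
        """Back-of-the-envelope estimate of read pairs needed for exome capture.

        target_mb       -- size of the captured target region in megabases
        mean_coverage   -- desired mean depth over the target (e.g. 100)
        read_length     -- length of each read in bases (paired-end assumed)
        on_target_rate  -- assumed fraction of bases that map to the target
        duplicate_rate  -- assumed fraction of reads lost to duplicates
        """
        bases_needed = target_mb * 1e6 * mean_coverage
        usable_bases_per_pair = 2 * read_length * on_target_rate * (1 - duplicate_rate)
        return bases_needed / usable_bases_per_pair

    # Example: a 50 Mb target at 100x mean coverage with 2x100 reads,
    # assuming ~60% on-target reads and ~10% duplicates -> ~46 million pairs
    print(f"{required_read_pairs(50, 100):,.0f} read pairs")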

Sequencing, Finishing, Analysis in the Future – 2014 – Day 1 Meeting Highlights


Arguably one of the top genome conferences, the annual SFAF meeting kicked off this year in Santa Fe with a great lineup of speakers from genome centers, academia and industry. Frankly, what’s amazing is that the meeting is completely supported by outside funding and there is no registration fee (we hope that last comment doesn’t spoil the intimate, small size of the meeting next year).

Rick Wilson kicked off SFAF with his keynote, ‘Recent Advances in Cancer Genomics’. He discussed a few clinical cases where the combination of whole genome sequencing, exome-seq and RNA-seq was used to help diagnose and guide targeted cancer drug therapy. He emphasized that this combined sequencing approach is required to identify actionable genes and that WGS or exome-seq alone is not enough.

Jonathan Bingham from Google announced the release of a simple web-based API to import, process, store and collaborate on genomic data: https://gabrowse.appspot.com/. He mentioned that Google thinks of computing in terms of data centers and available capacity: at any given time, the company’s pooled idle computers amount to more compute than any single data center. His new genomics team is looking to harness this capacity for genome analysis. He noted that a million genomes would add up to more than 100 petabytes, on the scale of Google’s web search index.

Steve Turner from Pacific Biosciences discussed platform advances that have led to higher quality assemblies rivaling pre-second-generation, clone-by-clone sequencing. He compared the current state of transcriptome assembly to putting a stack of magazines through a shredder and then gluing the pieces back together. He described a method now available for constructing full-length transcripts: cDNA SMRTbell™ libraries for single-molecule sequencing. Finally, he announced that there are now more than 100 PacBio instruments installed in the field. At Genohub, we already have several listed, with service available for purchase: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=f64db717ac261dad127c20124a9e1d85.

Kelly Hoon from Illumina was next up. She described a series of new updates, the most notable being the planned submission of the HiSeq 2500 for FDA approval by the end of the year. Other points included updates to BaseSpace, the 1T upgrade (1 Tb of data in 6 days), NeoPrep (coming this summer, accepting as little as 1 ng of input), new RNA capture kits and a review of NextSeq optics.

Thermo Fisher’s presentation came immediately after Illumina’s. Most of the discussion was on Ion Torrent’s new Hi-Q system, designed to improve accuracy and read length and to reduce error rates.

Right after the platform talks was a panel discussion with PacBio, Illumina, Roche and Thermo Fisher. The main points from that discussion were:

  • Steve Turner from PacBio declined to discuss, or entertain discussion of, a benchtop platform. This was met with lots of audience laughter.
  • Illumina had no response regarding Oxford Nanopore (ONT) except to say they’re not going to respond to ONT until after they launch…ouch.
  • PacBio said that read length is currently limited not by the on-board chemistry but by the quality of input DNA.
  • Roche is phasing out the 454 platform but is looking to compete on 4-5 other possibilities (very interesting news).

Ruth Timme from the FDA discussed the implementation of an international NGS network of public health labs that collect and submit draft genomes of food pathogens to a reference database. Data coming in from these sites provides the FDA with actionable leads in outbreak investigations. Currently, GenomeTrakr consists of six state health labs and a series of FDA labs.

Sterling Thomas discussed BioVelocity, a suite of high-speed algorithms from Noblis’ Center for Applied High Performance Computing (CAHPC). BioVelocity performs reference-based multiple sequence alignment (MSA) and variant detection on raw human reads. High-speed variant finding in adenocarcinoma using whole genome sequencing was presented as an example.

Sean Conlan from NHGRI discussed sequence analysis of plasmid diversity among hospital-associated carbapenem-resistant Enterobacteriaceae. Using finished genome sequences of isolates from patients and the hospital, he was able to better understand transmission of bacterial strains and of the plasmids encoding antibiotic resistance.

David Trees examined the use of WGS to determine the molecular mechanisms responsible for decreased susceptibility and resistance to azithromycin in Neisseria gonorrhoeae. The predominant causes of resistance were mutations in the promoter region or structural gene of mtrR and mutations in the 23S rRNA alleles located on the gonococcal chromosome.

Darren Grafham from Sheffield Diagnostic Genetics Services emphasized the importance of consensus in the choice of an analytical pipeline, alongside Sanger confirmation of variants, for diagnostics. He described a pipeline currently in use in a clinical diagnostic lab for routine screening of inherited, pathogenic variants. He stated that 30x coverage is the point at which false positives are eliminated with >99.9% confidence.

Other talks during the first day (which we likely missed while enjoying the beautiful Santa Fe weather):

Heike Sichtig: Enabling Sequence Based Technologies for Clinical Diagnostic: FDA Division of Microbiology Devices Perspective

Christian Buhay: The BCM-HGSC Clinical Exome: from concept to implementation

Dinwiddie: WGS of Respiratory Viruses from Clinical Nasopharyngeal Swabs

Karina Yusim: Analyzing TB Drug Resistance

Colman: Universal Tail Amplicon Sequencing

Roby Bhattacharyya: Transcriptional signatures in microbial diagnostics

Eija Trees: NGS as a surveillance tool

Helen Cui: Genomics Capability Development and Cooperative Research with Global Engagement

Raphael Lihana: HIV-1 Subtype Surveillance in Kenya: the Puzzle of Emerging Drug Resistance and Implications on Continuing Care

Gvantsa Chanturia: NGS Capability at NCDC

The night ended with a poster and networking session. The entire agenda is posted here: http://www.lanl.gov/conferences/sequencing-finishing-analysis-future/agenda.php

Follow us on Twitter and #SFAF2014 for the latest updates!

Ask a Bioinformatician

In the last two years, next-gen sequencing instrument output has increased significantly; labs are now sequencing more samples at greater depth than ever before. Demand for the analysis of next generation sequencing data is growing at an arguably even higher rate. To help accommodate this demand, Genohub.com now allows researchers to quickly find and connect directly with service providers who have specific data analysis expertise: https://genohub.com/bioinformatics-services-and-providers/

Whether it’s a simple question about gluing pieces together in a pipeline or a request to have your transcriptome annotated, researchers can quickly choose a bioinformatics provider based on their expertise and post queries and project requests. The services that bioinformaticians offer on Genohub are broken down into primary, secondary and tertiary data analysis:

Primary – Involves the quality analysis of raw sequence data coming off a sequencing platform. Primary analysis solutions are typically provided with the platform and run after the sequencing phase is complete. The result is usually a FASTQ file, which combines the sequence data with a Phred quality score for each base.
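
For readers new to the format, here is a minimal sketch of a single FASTQ record and how its ASCII quality string decodes to Phred scores, assuming the standard Phred+33 encoding used by current Illumina pipelines; the record itself is made up.

    # One FASTQ record: header, sequence, separator, quality string
    record = [
        "@SEQ_ID_001",              # read identifier (hypothetical)
        "GATTTGGGGTTCAAAGCAGT",     # base calls
        "+",                        # separator line
        "IIIIIIIIIIIIIIIIIIII",     # ASCII-encoded qualities, one per base
    ]

    def phred_scores(quality_string, offset=33):
        """Convert an ASCII quality string to per-base Phred scores (Phred+33)."""
        return [ord(ch) - offset for ch in quality_string]

    def error_probability(q):
        """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
        return 10 ** (-q / 10)

    quals = phred_scores(record[3])
    print(quals[:5])                    # [40, 40, 40, 40, 40]
    print(error_probability(quals[0]))  # 0.0001, i.e. ~1 error per 10,000 bases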

Secondary – Encompasses sequence alignment, assembly and variant calling on aligned reads. This analysis is usually resource intensive, requiring significant amounts of data storage and compute. It typically relies on a set of algorithms that can be automated into a pipeline. While the simplest pipelines can be a matter of gluing together publicly available tools, a certain level of expertise is required to maintain and optimize the analysis flow for a particular project.
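
As an illustration of what “gluing together publicly available tools” can look like, the sketch below chains one common open-source path (BWA-MEM for alignment, samtools for sorting and indexing, bcftools for variant calling). The file names are hypothetical and the tool choices are just one reasonable option among many; a production pipeline would add QC, duplicate marking, logging and more robust error handling.

    import subprocess

    # Hypothetical inputs; substitute your own files
    reference = "GRCh38.fa"
    fastq_1, fastq_2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"
    bam, vcf = "sample.sorted.bam", "sample.vcf.gz"

    def run(cmd):
        """Run a shell command and stop the pipeline if any step fails."""
        print(f"[pipeline] {cmd}")
        subprocess.run(cmd, shell=True, check=True)

    # 1. Align paired-end reads and sort the output by coordinate
    run(f"bwa mem -t 8 {reference} {fastq_1} {fastq_2} | samtools sort -o {bam} -")
    run(f"samtools index {bam}")

    # 2. Call variants (mpileup + call) and index the compressed VCF
    run(f"bcftools mpileup -f {reference} {bam} | bcftools call -mv -Oz -o {vcf}")
    run(f"bcftools index {vcf}")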

Tertiary – Annotation, variant call validation, data aggregation and sample- or population-based statistical analysis are all components of tertiary data analysis. This type of analysis is typically performed to answer a specific biologically relevant question or to generate a series of new hypotheses that need testing.
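
As a toy example of the aggregation step, the sketch below tallies how often each variant appears across a set of per-sample call sets and reports its frequency in the cohort; the variants and sample names are invented for illustration.

    from collections import Counter

    # Hypothetical per-sample variant calls: (chrom, pos, ref, alt)
    calls = {
        "sample_1": {("chr1", 1014143, "C", "T"), ("chr7", 55249071, "C", "T")},
        "sample_2": {("chr1", 1014143, "C", "T")},
        "sample_3": {("chr1", 1014143, "C", "T"), ("chr7", 55249071, "C", "T")},
    }

    # How many samples carry each variant?
    counts = Counter(v for variants in calls.values() for v in variants)

    for variant, n in counts.most_common():
        print(variant, f"seen in {n}/{len(calls)} samples ({n / len(calls):.0%})")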

Researchers who need library prep, sequencing and data analysis services can still search, find and begin projects as before using our Shop by Project page. What’s new is that researchers who only need data analysis services can now directly search for and contact a bioinformatics service provider to request a quote: https://genohub.com/bioinformatics-services-and-providers/

Whether you plan to perform a portion of your sequencing data analysis yourself or intend to take on the challenge of putting together your own pipeline, consultation with a seasoned expert saves time and ensures you’re on the way to successfully completing your project. By adding this new service, we’re trying to make it easier to search for and identify the right provider for your analysis requirements.

If you’re a service provider and would like your services to be listed on Genohub, you can sign up for a Service Provider Account or contact us to discuss the screening and approval process.

NextSeq, HiSeq or MiSeq for Low Diversity Sequencing?

Low diversity libraries, such as those made from amplicons or generated by restriction digest, can suffer from Illumina focusing issues, a problem not seen with random fragment libraries (e.g. genomic DNA). Illumina’s Real Time Analysis software uses images from the first 4 cycles to determine cluster positions (the X,Y coordinates of each cluster on a tile). With low diversity samples, color intensity is not evenly distributed across the channels, causing a phasing problem: phasing numbers run high and data quality deteriorates quickly.

Traditionally, this problem has been solved in two ways:

1)      ‘Spiking in’ a higher diversity sample such as PhiX (a small viral genome used to enable quick alignment and estimation of error rates) into your library. This increases the diversity at the beginning of your reads and evens out the intensity distribution across all four channels. Many groups spike in as much as 50% PhiX in order to achieve a more diverse sample. The disadvantage is that you lose 50% of your reads to a sample you were never interested in sequencing.

2)      Other groups have designed amplicon primers with a series of random ‘N’ (25% A, 25% T, 25% G, 25% C) bases upstream of their gene target. This, combined with a PhiX spike-in, also helps to increase color diversity. The disadvantage is that these extra bases cut into your desired read length, which can be problematic when you are trying to conserve cycles to sequence a 16S variable domain. (The sketch below shows a quick way to check how balanced your first cycles actually are.)
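
Before committing an amplicon pool to a run, it can also help to check how balanced the first cycles actually are. This minimal sketch tallies per-cycle base composition over the first 12 cycles of a FASTQ file; heavily skewed early cycles are the ones that cause the registration and phasing problems described above. The file name and the 12-cycle window are arbitrary choices.

    import gzip
    from collections import Counter

    fastq_path = "amplicon_library.fastq.gz"   # hypothetical file name
    n_cycles = 12                              # inspect the first 12 cycles

    # Tally base composition at each of the first n_cycles positions
    composition = [Counter() for _ in range(n_cycles)]

    with gzip.open(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:                     # sequence lines only
                for cycle, base in enumerate(line.strip()[:n_cycles]):
                    composition[cycle][base] += 1

    for cycle, counts in enumerate(composition, start=1):
        total = sum(counts.values()) or 1
        summary = "  ".join(f"{b} {counts.get(b, 0) / total:.0%}" for b in "ACGT")
        print(f"cycle {cycle:2d}: {summary}")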

Last year, Illumina released a new version of their control software with updated MiSeq Real Time Analysis (RTA) software that significantly improves data quality for low diversity samples. The update included 1) improved template generation and higher-sensitivity template detection of optically dense and dark images, 2) a new color matrix calculation performed at the beginning of read 1, 3) the use of 11 cycles to increase diversity, and 4) new phasing and pre-phasing corrections applied to each cycle and tile to maximize intensity data. Now, with a software update and as little as a 5% PhiX spike-in, you can sequence low diversity libraries and expect significantly better MiSeq data quality.

Other instruments, including the HiSeq and GAIIx, still require at least 20-50% PhiX and are less suited to low diversity samples. If you must use a HiSeq for your amplicon libraries, take the following steps:

1)      Reduce your cluster density by 50-80% to minimize overlapping clusters

2)      Spike in a high amount of PhiX (up to 50% of the total library); the sketch after this list shows the impact on usable yield

3)      Use custom primers with a random sequence to increase diversity. Alternatively, intentionally concatemerize your amplicons and fragment them to increase base diversity at the start of your reads.
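
One practical consequence of steps 1 and 2 is the hit to usable yield; here is a minimal sketch of that arithmetic, using an illustrative lane yield rather than an instrument specification.

    def usable_amplicon_reads(raw_reads, phix_fraction=0.5, density_factor=0.5):
        """Rough estimate of the reads left for your amplicons after diluting the run.

        raw_reads      -- reads the lane would normally yield at standard density
        phix_fraction  -- assumed fraction of the pool that is PhiX spike-in
        density_factor -- assumed fraction of normal cluster density loaded
        """
        return raw_reads * density_factor * (1 - phix_fraction)

    # Example: a lane that would normally give ~180M reads, loaded at half
    # density with a 50% PhiX spike-in, leaves only ~45M reads for the amplicons.
    print(f"{usable_amplicon_reads(180e6):,.0f} usable reads")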

The NextSeq 500, released in March 2014, uses two-channel SBS chemistry, likely making it even less suited to low diversity amplicons. As of April 2014, Illumina had not performed significant validation or testing of low diversity samples on the NextSeq 500, and the instrument is not expected to perform better than the HiSeq for these sample types.

So, in conclusion, the MiSeq is currently still the best Illumina instrument for sequencing samples of low diversity: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=c814746ad739c57b9a69e449d179c27c

100 Gb of Data per Day – NextSeq 500 Sequencing Services Now Available on Genohub

Find NextSeq 500 service providers on Genohub.com

Access to the NextSeq 500, Illumina’s first high throughput desktop sequencing instrument, is now available on Genohub.com. While not the highest throughput instrument on the market, it is one of the fastest, with up to a 6x increase in bases read per hour compared to the HiSeq. The instrument is ideally suited for those who need a moderate amount of sequencing data (more than a MiSeq run, less than a HiSeq) in a short amount of time. We expect the highest interest to center around targeted sequencing (exome or custom regions) and fast RNA profiling. For exome studies, you can run between 1-12 samples in a single run and get back 4 Gb at 2x75 or 5 Gb at 2x100 read lengths. If you’re interested in RNA profiling at 10M reads per sample, you can multiplex between 12-36 samples in a single run. A 1x75 run takes as little as 11 hours to complete, and 2x150 runs take ~29 hours.
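
A quick way to sanity-check a multiplexing plan is to divide the expected run output by the per-sample read requirement. In the sketch below, the run yields and the 10% overhead reserved for index imbalance are illustrative assumptions, not instrument specifications.

    import math

    def max_samples_per_run(run_reads, reads_per_sample, overhead=0.1):
        """How many samples can be multiplexed in one run.

        run_reads        -- total reads the run is expected to deliver
        reads_per_sample -- reads required per sample (e.g. 10e6 for RNA profiling)
        overhead         -- assumed fraction reserved for index imbalance/losses
        """
        return math.floor(run_reads * (1 - overhead) / reads_per_sample)

    # Illustrative only: a ~130M-read run vs a ~400M-read run,
    # at 10M reads per RNA profiling sample
    print(max_samples_per_run(130e6, 10e6))   # 11 samples
    print(max_samples_per_run(400e6, 10e6))   # 36 samples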

You can order NextSeq 500 sequencing services today and expect to receive data back in 3-4 days! Prices for one lane start at $2,250. Start your search here and use our helpful filters to narrow down your choices: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=e4c84df6f5ddd963cc48c30d3f93d505

After you’ve identified the service you need, communicate your questions directly to the service provider. We’ll make sure you get a fast response. Genohub also takes care of billing and invoicing, making domestic & international ordering a breeze. We also have an easy to use project management interface to keep communication and sample specification data in one place.

If you’re not familiar with NextSeq technology or how best to apply this instrument to your samples, take advantage of our complimentary consultation service: https://genohub.com/ngs-consultation/. We can help with your sequencing project design and make recommendations as to which sequencing service would be best suited to your experiment.

Last month we announced the availability of HiSeq X Ten services on Genohub: https://blog.genohub.com/the-1k-30x-whole-human-genome-is-now-available-for-1400/

As an efficient online marketplace for NGS services, Genohub increases your access to the latest instrumentation and technology. You don’t have to shell out $250K for a NextSeq or $10M for a HiSeq X Ten when access to professional services is right at your fingertips!

The “$1K”, 30X Whole Human Genome is now available for $1,400

HiSeq X Ten Sequencing Services now Available on Genohub

You can now order whole human genome sequencing (~30x coverage) on Genohub.com for $1,400 / sample ($1,550 with library prep). The Kinghorn Centre for Clinical Genomics is accepting orders for their HiSeq X Ten service through Genohub.com.  In fact, you can order this service today: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=5a4399a2a2cab432b240d2426c708472

Designed for population-scale human genome sequencing, a single HiSeq X Ten instrument can output between 1.6-1.8 Tb on a dual flow cell in less than 3 days (600 Gb/day). When 10 instruments run in parallel, tens of thousands of genomes can be sequenced in a single year. While Illumina currently limits the HiSeq X Ten to human samples, we expect this to change in 2015.

A single lane of the HiSeq X Ten gives you 750M paired-end 2x150 reads, for a total output of 112.5 Gb per lane. Kinghorn guarantees 100 Gb of raw data per lane, with >75% of bases above Q30 at 2x150 bp. With a haploid human genome size of 3.2 Gb, that’s equivalent to 30-35x coverage per lane (the arithmetic is sketched after the list below). The $10 million price tag means that not all institutes have access to such sequencing power. Genohub solves this problem by making it easy for researchers interested in next generation sequencing services to access the latest sequencing technology. We also:

  1. Ensure your project with the provider goes smoothly
  2. Take care of billing and invoicing, making domestic & international ordering a breeze
  3. Have an easy to use project management interface to keep communication and information in one place
  4. Offer NGS project design and consultation
  5. Have competitive pricing and turnaround times
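
For reference, the per-lane coverage quoted above follows directly from lane output and genome size; a minimal sketch of that arithmetic:

    def mean_coverage(total_bases_gb, genome_size_gb=3.2):
        """Mean depth of coverage = total sequenced bases / genome size."""
        return total_bases_gb / genome_size_gb

    # 750M reads at 150 bp = 112.5 Gb of raw output per lane
    raw_gb = 750e6 * 150 / 1e9
    print(f"{mean_coverage(raw_gb):.1f}x from raw output")         # ~35x
    print(f"{mean_coverage(100):.1f}x from the 100 Gb guarantee")  # ~31x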

Start your population study on Genohub.com today!

Press Release: Genohub Launches Next Generation Sequencing Marketplace

You can view the original press release on our official launch at PRWeb.com. Thanks to the folks at RNA-Seq Blog for quickly picking up the story.

Genohub announced the launch of their online market for next generation sequencing services today.

Austin, TX (PRWEB) August 16, 2013

Genohub.com announced the launch of their online market for next generation sequencing services today. The online service is positioned to completely change the way high throughput sequencing services are ordered, accelerating genomic research by improving access to sequencing services. Genohub’s intelligent sequencing matching engine instantly matches researchers with service providers based on specific project criteria. Genohub facilitates the management of sequencing projects throughout the sequencing lifecycle, from selecting orderable sequencing packages to communication, payments and delivery of data.

For Researchers

Genohub’s online service transforms the way researchers go about ordering next generation sequencing (NGS) and reinforces the critical researcher-provider communication cycle involved in every project. Genohub’s model eliminates the need for researchers to call multiple service providers to compare service details and prices. Researchers use the smart NGS matching engine to gain immediate access to up-to-date service listings from reputable providers. The transparent pricing model, with exact service prices, reduces the time needed to compare services and makes ordering significantly faster, more informed and more accurate than ordering manually by email or phone. Researchers using the service are also able to take advantage of one-time deals and other offers not normally available through a provider’s website or pricing sheet. Clear maximum turnaround times for each service reduce the unpredictability associated with project completion dates. Researchers using the service are able to track the status of their orders, upload data or project-specific information and post messages to providers performing the work.

Genohub’s shopping interface is designed to accommodate both researchers with prior experience with the latest sequencing technology and the increasing number of life science researchers who are not necessarily familiar with the latest sequencer specs or perhaps have no prior sequencing experience at all. Experienced users can search by selecting specific instruments and run types, while researchers new to sequencing can shop for services by their project requirements, e.g., read number and coverage. Researchers who need help selecting the right sequencing service can also take advantage of free consultation by Genohub’s PhD-trained staff.

For Service Providers

Genohub has also invested significantly in promoting NGS service providers, allowing them to advertise services and extend their reach to places from which they would normally not receive orders. Providers sign up and list their services in a structured format, allowing Genohub’s matching engine to automatically offer services to customers based on their experimental needs. The online service facilitates customer communication via a centralized messaging interface, which allows providers to request data, convey unforeseen handling or quality issues and relay project status to the researcher.

Genohub also automatically generates accurate quotes based on pricing information that the provider enters only once. This significantly reduces the amount of time providers would normally spend creating and communicating quotes.

“While high-throughput sequencing holds enormous potential for unlocking new discoveries, the high cost and complexity of sequencing projects necessitate a professional marketplace like Genohub to improve access and facilitate collaboration between researchers and service providers across universities, companies, as well as other private and public research organizations around the globe,” said Pouya Razavi, Genohub’s CEO and co-founder.

Media Contact:
Estevan McCalley, Head of Customer Development
Genohub
512-436-0111
info@genohub.com
https://genohub.com/