Whole Genome Sequencing (WGS) vs. Whole Exome Sequencing (WES)

Gene, exon, intron, sequencing

 

“Should I choose whole genome sequencing (WGS) or whole exome sequencing (WES) for my project?” is asked so often during consultations on Genohub that we thought it would be useful to address it here. With unlimited resources and time, WGS is the clear winner: it allows you to interrogate single-nucleotide variants (SNVs), indels, structural variants (SVs) and copy number variants (CNVs) in both the ~1% of the genome that encodes protein sequences and the remaining ~99% of non-coding sequence. WES, however, still costs far less than WGS, which lets researchers increase sample number, an important factor for large population studies. WES does have its limitations, though. Below we’ve highlighted the advantages of WGS vs. WES and described a real example of ordering these services through Genohub.

Advantages of Whole Genome Sequencing

  1. Allows examination of SNVs, indels, SVs and CNVs in both coding and non-coding regions of the genome. WES omits regulatory regions such as promoters and enhancers.
  2. WGS has more reliable sequence coverage. Differences in the hybridization efficiency of WES capture probes can result in regions of the genome with little or no coverage.
  3. Coverage uniformity with WGS is superior to WES. Regions of the genome with low sequence complexity restrict the ability to design useful WES capture baits, resulting in off-target capture effects.
  4. PCR amplification isn’t required during library preparation, reducing the potential for GC bias. WES frequently requires PCR amplification because the bulk input needed for capture is generally ~1 µg of DNA.
  5. Sequencing read length isn’t a limitation with WGS. Most target probes for exome-seq are designed to be less than 120 nt long, so there is little to gain from sequencing with longer reads.
  6. A lower average read depth is required to achieve the same breadth of coverage as WES.
  7. WGS doesn’t suffer from reference bias. WES capture probes tend to preferentially enrich reference alleles at heterozygous sites, producing false-negative SNV calls.
  8. WGS is more universal. If you’re sequencing a species other than human, your choices for exome sequencing are pretty limited.

Advantages of Whole Exome Sequencing

  1. WES is targeted to protein coding regions, so reads represent less than 2% of the genome. This reduces the cost to sequence a targeted region at a high depth and reduces storage and analysis costs.
  2. Reduced costs make it feasible to sequence more samples, enabling large population-based comparisons.

Most functionally relevant disease variants can be detected at a depth of 100-120x (1), which makes the cost case for exome sequencing. Today on Genohub, whole human genome sequencing at a depth of ~35x costs roughly $1,700/sample. Human exome sequencing at 100x coverage, using a 62 Mb target region, costs about $550/sample. Both prices include library preparation. So in terms of producing data, WES is still significantly cheaper than WGS. Note that these prices don’t include data storage and analysis costs, which can also be considerably higher with whole genome sequencing.
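To make the comparison concrete, here is a rough Python sketch of the arithmetic behind these numbers. The prices, depths and 62 Mb target come from the paragraph above; the ~3.1 Gb human genome size, the $50,000 budget and the function names are our own illustrative assumptions, not Genohub quotes.

```python
# Rough cost/data comparison for WGS vs. WES.
# Prices, depths and the 62 Mb target are the figures quoted in the post.
# GENOME_SIZE_GB and the example budget are assumptions for illustration.

GENOME_SIZE_GB = 3.1     # assumed approximate human genome size, in gigabases
EXOME_TARGET_GB = 0.062  # 62 Mb exome target region

WGS_PRICE_USD = 1700     # ~35x whole genome, per sample
WES_PRICE_USD = 550      # 100x exome, per sample


def gigabases_needed(target_gb, mean_depth):
    """Sequence (in Gb) needed to cover a target at a given mean depth."""
    return target_gb * mean_depth


def samples_within_budget(budget_usd, price_per_sample_usd):
    """Whole number of samples a budget covers at a fixed per-sample price."""
    return int(budget_usd // price_per_sample_usd)


if __name__ == "__main__":
    budget = 50_000  # hypothetical project budget in USD
    print(f"WGS data per sample: {gigabases_needed(GENOME_SIZE_GB, 35):.1f} Gb")
    print(f"WES data per sample: {gigabases_needed(EXOME_TARGET_GB, 100):.1f} Gb")
    print("WGS samples in budget:", samples_within_budget(budget, WGS_PRICE_USD))
    print("WES samples in budget:", samples_within_budget(budget, WES_PRICE_USD))
```

Under these assumptions, the same hypothetical budget covers 90 exomes but only 29 genomes, which is the sample-number argument for WES in a nutshell.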

It’s also important to remember that depth isn’t everything. The better your read uniformity and breadth of coverage, the more likely you are to actually find and call de novo mutations. That’s the main goal: if you can’t call SNPs or indels with high sensitivity and accuracy, even the deepest sequencing runs are worthless.

To conclude, whole genome sequencing typically offers better uniformity and balanced allele ratio calls. While greater exome-seq depth can match this, sufficient mapped depth or variant detection in specific regions may never reach the quality of WGS due to probe design failures or protocol shortcomings. These are important considerations when examining tissues like primary tumors where copy number changes and heterogeneity are confounding factors.

If you’re ready to start an exome project, spend a few minutes determining the coverage you’ll need for your experiment. We have an exome-seq guide with examples to help you determine the number of sequencing reads you need to achieve a certain coverage of your exome. If you’re planning to embark on whole genome sequencing, use our NGS Matching Engine which automatically calculates the amount of sequencing capacity on various platforms to meet the coverage requirements for your project.
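As a back-of-the-envelope illustration of that kind of coverage planning, the sketch below estimates how many paired-end read pairs are needed to reach a mean depth over a target. It is not the calculation used by our exome-seq guide or the NGS Matching Engine; the function name, the 2×100 read length and the `on_target_fraction` parameter are hypothetical, and it ignores duplicates and uneven coverage.

```python
import math


def read_pairs_needed(target_bp, mean_depth, read_length, on_target_fraction=1.0):
    """Estimate paired-end read pairs needed for a mean depth over a target.

    Based on: depth = (pairs * 2 * read_length * on_target_fraction) / target_bp.
    on_target_fraction is an assumed capture efficiency (use 1.0 for WGS).
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    usable_bases_per_pair = 2 * read_length * on_target_fraction
    return math.ceil(target_bp * mean_depth / usable_bases_per_pair)


if __name__ == "__main__":
    exome_bp = 62_000_000  # 62 Mb target region
    # Ideal case: every sequenced base lands on target.
    print(read_pairs_needed(exome_bp, 100, 100))
    # Assumed case: only half of the sequenced bases land on target.
    print(read_pairs_needed(exome_bp, 100, 100, on_target_fraction=0.5))
```

The lower the on-target fraction of your capture kit, the more reads you need for the same depth, so it is worth asking your provider what fraction they typically see.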

Reference:

1) Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants

Illumina’s Latest Release: HiSeq 3000, 4000, NextSeq 550 and HiSeq X5

HiSeq 3000, HiSeq 4000, HiSeq X Five, HiSeq X Ten

Illumina’s latest instrument release essentially comes down to more data per day. Using the same patterned flow cell technology already in use on the HiSeq X Ten, the HiSeq 3000 has an output of 750 Gb, or 2.5B PE150 reads, in 3.5 days. The HiSeq 4000 has two flow cells, so twice the output: 1.5 Tb, or 5B PE150 reads, in 3.5 days. The NextSeq 550 combines the current NextSeq 500 with a microarray scanning system that fits right into the flow cell holder. The HiSeq X Five is less exciting: it is simply half the number of instruments in a HiSeq X Ten.

If you don’t have the $10M budget for a HiSeq X Ten, you can purchase a HiSeq X Five and later scale to the X Ten at a lower price per instrument: $1M/unit.

Instrument    Price  Price/unit  $/Genome*  Consumables  $/Gb
HiSeq X Five  $6M    $1.2M       $1,425     $1,200       $10.6
HiSeq X Ten   $10M   $1M         $1,000     $800         $7

*Price per 30X human genome according to Illumina. We’re not aware of any sequencing facility currently offering 30X human genomes for $1,000. On Genohub today, you can order a single whole human genome at 35X for $1,750.

Both the HiSeq X Five and Ten are still only “licensed” for human whole genomes [Update: Since this post was published in January 2015, Illumina now allows the sequencing of other large species on the HiSeq X Ten. For an up-to-date status on what is and isn’t allowed on a HiSeq X, follow our HiSeq X Guide Page]. That basically means that while they can technically be used on non-human samples or transcriptomes, Illumina wants them focused on the WGS market (probably with an eye on the BGI / Complete Genomics WGS instrument release this year). It also gives them an excuse to release patterned flow cells on more models, hence the HiSeq 3000/4000. Interestingly, Illumina is going to start bundling the TruSeq PCR-free and TruSeq Nano library prep kits (the only chemistries currently compatible with the X Five and X Ten) with X Five/Ten cluster reagents. At least for now, they don’t intend to do this with the HiSeq series. Other news from this release:

The HiSeq 3000/4000 do not have a rapid mode; they are high-output only. However, PE150 runs take only 3.5 days.

You can’t upgrade from a HiSeq 2500 (non-patterned flow cell) to a HiSeq 3000 or 4000 (patterned flow cells).

You can upgrade from the single flow cell HiSeq 3000 to the dual flow cell HiSeq 4000.

HiSeq 3000 yields >200 Gb/day, a 28% increase vs. HiSeq 2500 v4, yet the cost to purchase a HiSeq 3000 is the same as a HiSeq 2500. With two flow cells, the HiSeq 4000 yields twice as much data.
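The throughput figures above are easy to sanity-check. The short Python sketch below assumes that “2.5B PE150 reads” means 2.5 billion clusters each read from both ends (our interpretation, since it reproduces the quoted 750 Gb), and it takes the 167 Gb/day figure for the HiSeq 2500 v4 from our HiSeq v4 post. Function names are illustrative only.

```python
def run_output_gb(clusters, read_length, paired=True):
    """Gigabases per run, treating each cluster as one read or one read pair."""
    reads_per_cluster = 2 if paired else 1
    return clusters * read_length * reads_per_cluster / 1e9


def daily_yield_gb(output_gb, run_days):
    """Average gigabases produced per day of run time."""
    return output_gb / run_days


if __name__ == "__main__":
    hiseq_3000_gb = run_output_gb(2.5e9, 150)  # one flow cell
    hiseq_4000_gb = run_output_gb(5e9, 150)    # two flow cells
    per_day = daily_yield_gb(hiseq_3000_gb, 3.5)

    hiseq_2500_v4_per_day = 167  # Gb/day, figure quoted in our HiSeq v4 post
    increase = per_day / hiseq_2500_v4_per_day - 1

    print(f"HiSeq 3000: {hiseq_3000_gb:.0f} Gb per run, {per_day:.0f} Gb/day")
    print(f"HiSeq 4000: {hiseq_4000_gb:.0f} Gb per run")
    print(f"Increase vs. HiSeq 2500 v4: {increase:.0%}")
```

With these inputs the per-day yield comes out above 200 Gb and the increase lands at roughly 28%, consistent with the numbers quoted above.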

Sequencing Applications and Turnaround Time 

Instrument  Exomes               Transcriptomes       30X Genomes
HiSeq 3000  90 (2×75, <2 days)   50 (2×75, <2 days)   6 (2×150, 3.5 days)
HiSeq 4000  180 (2×75, <2 days)  100 (2×75, <2 days)  12 (2×150, 3.5 days)

So in the end, assuming sequencing facilities aren’t fed up with this breakneck upgrade cycle and actually purchase these instruments, researchers can expect more data with faster turnaround times. We’ve already spoken to a few of our service providers who are considering upgrades to their HiSeq 1500/2000 instruments. As soon as these new instruments are available on Genohub, we’ll make an announcement [Update: they are all available, use our NGS Matching Engine for access to the latest Illumina instruments]. If you’d like to be the first to know, send us an email at support@genohub.com. In the meantime, our providers offer services on the HiSeq 2500 v4, HiSeq X Ten, NextSeq 500 and HiSeq instruments (amongst many others). You can order these services immediately and expect data delivery within the listed guaranteed turnaround times. If you’re not sure which technology or instrument is right for you, just enter the number of reads or coverage you need and let our NGS Matching Engine identify the best service for you. So what’s next? A little bird has told us to expect patterned flow cells on the MiSeq!

TCR-Repertoire Sequencing Services

TCR sequencing

The immune repertoire reflects the sum total of the diverse B and T cells in the circulatory system. The adaptive immune system mounts its response through the hypervariable receptor molecules on these cells. The antigen specificity of each T-cell receptor (TCR) is determined by complementarity-determining region 3 (CDR3) of the beta chain, formed by the V, D and J gene regions. Examining TCR diversity is important for understanding adaptive immunity and its function in disease. Next generation sequencing has become a powerful tool for measuring TCR diversity. Before samples can be sequenced, a dedicated library preparation method must be performed to allow for reproducible and reliable results.

Girihlet, a newly formed biotech company in Brooklyn, NY, is one of the first companies to offer TCR repertoire sequencing services and is the first to offer it on Genohub.com. We got in touch with Girihlet to learn more about this service offering and have posted our conversation with one of its co-founders, Dr. Ravi Sachidanandam. Ravi is also an Assistant Professor in the Department of Oncological Sciences at the Icahn School of Medicine at Mount Sinai. He has published over 85 papers in the latest and most interesting areas of genomics, including small RNA, mRNA splicing, methylation and virology.

Genohub: Hi Ravi, we’re excited that you’ve joined Genohub.com and listed your services. We’re particularly interested in the TCR-repertoire sequencing services you have on Genohub.com. Not many service providers currently offer this service, how come?

Ravi: There are very few companies that offer this currently, and this is mostly because it’s a very challenging problem both experimentally and computationally. It may be easier to count all the dollar bills in circulation than to profile the diversity of the T Cell Receptors.

Genohub: Can you comment briefly on the ‘library prep’ approach to TCR profiling?

Ravi: Our library prep method is unique: it is based on quantifying RNA, and in particular just the CDR3 regions, while most other companies quantify DNA. This allows us to quantify only functional, rearranged TCR loci. We also use universal primers for amplification and do not depend on previously known TCR regions, thereby accelerating discovery. We have also compared our data to flow results and demonstrated good concordance.

Genohub: Inefficiencies during library prep and sequencing can lead to severe bias, generating artificial TCR diversity. Does your approach address this?

Ravi: The beauty of our approach is that we use common primers to amplify the T cell receptor regions. This ensures there is no bias during PCR, allowing for accurate sequencing. And since the accuracy and enrichment for the TCR mRNAs is >98%, we need very little total RNA and less sequencing depth, reducing the overall cost of sequencing.

Genohub: How many sequencing reads or TCR sequences do you recommend for a single human sample? Our readers can use your recommendation directly on our project search page: https://genohub.com/shop-by-next-gen-sequencing-project/.

Ravi: Currently, 10 million 150 bp PE reads are enough to accurately and quantitatively capture most of the TCR diversity.

Genohub: How do you handle under-expression?

Ravi: We keep track of low-expressed TCR transcripts, as they are needed to understand the statistics of the distribution of the TCR repertoire. We provide these to the researchers in case they need to look for rare transcripts.

Genohub: Why is diversity of the immune repertoire important for health?

Ravi: The diversity is the key to the effectiveness of the TCR-repertoire.  The diversity reflects the ability of the immune system to fight infections.  

Genohub: Any comments on its use for vaccine development, autoimmune study, biomarker detection?

Ravi: We believe the TCR sequence can be easily monitored over time, thereby serving as a powerful biomarker to study the effects of vaccination, to determine if the vaccination was effective. It will also be useful in understanding the underlying cause of autoimmune reactions.

Genohub: Thanks for taking the time to discuss this exciting new method. Is there anything that you’d like to add?

Ravi: Girihlet is very excited to take this approach to the rest of the scientific community, make a significant difference in how the TCR is currently sequenced, and eventually have an impact on the practice of “precision medicine”.

Genohub Projects Now Support Multiple Collaborators

Most researchers using Genohub typically work in a team with other investigators and administrators. However, so far every Genohub user has been able to view and manage only the projects they directly started on Genohub.

We’re pleased to announce that effective immediately, you can add one or more collaborators to any of your Genohub projects, all the way from the project request stage until after your project is complete and the results are ready.

By quickly adding a collaborator to your project you can allow another member of your team to view the quotes and detailed project information. You may also give them permission to manage the project (e.g. post messages, accept quotes, attach files, etc.).

In addition to setting per-member permissions, you can also choose whether each member receives email notifications when there is activity on your project.

To add a collaborator, all you need is their email address:

Genohub Collaboration Tool

For instance, you may want to share the instant Genohub quote on a particular service with the principal investigator on your team in order to get their approval. You may also want to give someone in your purchasing department access to the detailed pricing information on your project. Or you may want a colleague to manage the project and handle communication with the service provider while you’re on vacation. This can all be done by simply adding these individuals as collaborators.

Please give it a try and feel free to reach out to us at support@genohub.com if you have any questions or feedback.

 

Genohub Opens Access to Latest Project Management Tool

We’re pleased to announce the launch of a new project management tool called PIP (Provider Initiated Projects) on Genohub! Up to now, Genohub has been a successful marketplace for connecting researchers with next-generation sequencing service providers. Service providers on Genohub have used Genohub’s project management tools to manage hundreds of incoming researcher requests. We’re instituting two new BIG changes:

  1. We’ve opened up our project management tools to all service providers and CROs. These tools are no longer limited to providers offering sequencing services. If you offer any type of scientific service, you can now use Genohub software to initiate projects, write quotes and manage back and forth project communication at no charge.
  2. In the past, a researcher would have to inquire about your services before you could start a project. Providers can now start projects and quotes for researchers anywhere in the world.

Here are examples of how this service could be useful to you:

Example 1 

You’re a service provider who handles internal service projects from researchers within your University. You’re tired of using email threads to manage these projects and are looking for a way to send quotes, have researchers upload project specs and have all back and forth communication saved as part of a unique project. Genohub’s PIP feature handles all of that and is available free of charge! 

Example 2

A new researcher or someone you’ve worked with in the past contacts you to start a service project. Using PIP you can initiate a project, write a quote and manage communication with anyone in the world. We’ll only charge you if you elect Genohub to handle invoicing and billing, otherwise it’s free. 

 

To get started, use the blue ‘Start New Project and Create Quote’ button on your Project Dashboard to initiate a project. 

 

Genohub Project Dashboard

Sequencing Suggests the Ebola Virus Genome is Changing

Genome of the Ebola Virus is Changing Rapidly

Using high throughput sequencing, researchers from MIT, Harvard and the Sierra Leone Ministry of Health and Sanitation have recently reported rapid changes in the Ebola virus’s genetic code. The Ebola virus genome, a single-stranded RNA of ~19,000 nucleotides, encodes several structural proteins: RNA polymerase, nucleoprotein, polymerase co-factors and transcription activators. The researchers used Illumina HiSeq 2500 platforms to achieve 2000x coverage of the Ebola genome. Using Genohub, we estimate the cost to sequence 100 such genomes at 2000x to be under $1,500: https://genohub.com/shop-by-next-gen-sequencing-project/#query=0929767cd66b8ec8a9fb209c99d75b27.
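To see why the estimate is so low, here is a quick Python sketch of the data volume involved. The genome length, depth and sample count are the figures above; the 2×100 read length is a hypothetical choice, and the calculation assumes every read maps to the viral genome, which real patient samples (full of host RNA) will not achieve.

```python
import math

GENOME_NT = 19_000  # approximate Ebola genome length
DEPTH = 2000        # target mean coverage per genome
N_GENOMES = 100     # number of genomes to sequence


def total_bases(genome_nt, depth, n_genomes):
    """Total sequenced bases needed across all genomes at the target depth."""
    return genome_nt * depth * n_genomes


def read_pairs(bases, read_length):
    """Paired-end read pairs needed to produce the given number of bases."""
    return math.ceil(bases / (2 * read_length))


if __name__ == "__main__":
    bases = total_bases(GENOME_NT, DEPTH, N_GENOMES)
    print(f"Total data: {bases / 1e9:.1f} Gb")
    # Assumed 2x100 reads; swap in your own read length.
    print(f"Read pairs at 2x100: {read_pairs(bases, 100):,}")
```

That works out to only a few gigabases in total, a small fraction of a single HiSeq lane, even before allowing for reads lost to host background.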

Sequencing 99 Ebola genomes from 78 patients, they found more than 300 genetic changes that make the genomes sequenced from the current outbreak distinct from those of previous outbreaks. In fact, they found that the substitution rate was twice as high in this year’s outbreak as in all other Ebola virus outbreaks. They also determined that mutations during this year’s outbreak were frequently nonsynonymous (mutations that alter the amino acid sequence of a protein). 50 mutational events and 29 new viral lineages were observed in this outbreak alone, suggesting potential for viral adaptation. Determining whether Ebola could be evolving away from defenses against it, or whether it could become more contagious and spread faster, will require functional analysis. For their part, Gire et al. have published the full-length Ebola genomes in the NCBI database. Tragically, the authors note that 5 co-authors died from the disease before the manuscript could be published. Last week, The New Yorker published The Ebola Wars, an excellent in-depth story of the work involved in actually sequencing the Ebola genome and tracking its mutations.

While basic PCR tests are sufficient for a yes/no answer about infection, this new study highlights the important role of sequencing in characterizing patterns of viral transmission and mutation in an epidemic. We expect sequencing to play a greater role in the development of diagnostics and treatments for this and other viral outbreaks.

Illumina HiSeq v4 Sequencing Services Yielding 1Tb Data / Week Now Available on Genohub

Read output per Illumina Lane

Latest Illumina Chemistry – Reads/Lane

In line with our efforts to democratize the latest high throughput sequencing technology, we’re pleased to announce the availability of HiSeq sequencing services with v4 chemistry. Any researcher, anywhere in the world can now order this sequencing service in a matter of minutes using Genohub.com.

The new Illumina HiSeq version 4 chemistry allows for sequencing runs with 25% greater read length (2×125 for high-output runs) and 33% more clusters. Running two full flow cells, users can expect to generate up to 1 Tb of data per run (roughly a week), or about 167 Gb per day over a 6-day run. At an output of 250M reads per lane, users will need at least 2 lanes to achieve ~35x coverage of the human genome. While this isn’t the most efficient technology for sequencing whole human genomes (the HiSeq X Ten only requires a single lane), it certainly is for exome, transcriptome and re-sequencing applications. Earlier this year we announced the availability of NextSeq 500 and HiSeq X Ten services on Genohub.com.
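The lane math here can be reproduced with a few lines of Python. This is only a sketch: it assumes the 250M figure counts clusters that are each read from both ends at 2×125 (the reading that matches the “2 lanes” and “1 Tb” numbers above), a ~3.1 Gb human genome, and 8 lanes per high-output flow cell. The function names are our own.

```python
import math

GENOME_BP = 3_100_000_000  # assumed approximate human genome size
LANES_PER_FLOW_CELL = 8    # assumed for a HiSeq high-output flow cell


def lane_output_gb(clusters_per_lane, read_length):
    """Gigabases per lane for paired-end reads (two reads per cluster)."""
    return clusters_per_lane * 2 * read_length / 1e9


def lanes_for_coverage(target_depth, clusters_per_lane, read_length,
                       genome_bp=GENOME_BP):
    """Whole lanes needed to reach a mean depth over the genome."""
    bases_per_lane = clusters_per_lane * 2 * read_length
    return math.ceil(target_depth * genome_bp / bases_per_lane)


if __name__ == "__main__":
    per_lane = lane_output_gb(250_000_000, 125)
    run_total = per_lane * LANES_PER_FLOW_CELL * 2  # two flow cells per run
    print(f"Per lane: {per_lane} Gb, per run: {run_total} Gb")
    print("Lanes for 35x:", lanes_for_coverage(35, 250_000_000, 125))
```

One lane falls short of 35x under these assumptions, so the answer rounds up to two, and sixteen lanes across two flow cells add up to the quoted 1 Tb.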

The new HiSeq v4 chemistry not only improves output, but reduces the time it takes for a sequencing run to complete. With run times taking only 6 days, we expect several of our HiSeq Rapid Run users to begin switching over to take advantage of outputs of 250M reads / lane in one week instead of 150M reads / lane in a single day. As we point this out, it’s important to note that HiSeq Rapid read lengths have also increased to 2×250. 

If you’re not sure exactly which platform, chemistry or read length is the most efficient for your application, use our Shop by Project page and enter the number of reads or coverage you need. We’ll display all your options!