New Short, Long and High Throughput Sequencing Reads in 2016

 

Nanopore sequencing

An exciting wave of newly released DNA sequencing instruments and technology will soon be available to researchers. From DNA sequencers the size of a cell phone to platforms that turn short reads into long-range information, these new sequencing technologies will be available on Genohub as services that can be ordered. Below is a summary of the technology you can expect in Q1 of 2016:

10X Genomics GemCode Platform

The GemCode platform from 10X Genomics partitions long DNA fragments of up to 100 kb and indexes the genome during library construction using a pool of ~750K molecular barcodes. Barcoded DNA fragments are generated such that all fragments from the same partition share the same barcode. After several cycling and pooling steps, >100K barcode-containing partitions are created. GemCode software then maps short Illumina read pairs back to the original long DNA molecules using the barcodes added during library preparation. With this long-range information, haplotype phasing and improved structural variant detection become possible, and gene fusions, deletions and duplications can be detected from exome data.
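
To make the linked-read idea concrete, here is a minimal sketch (ours, not 10X’s software; the reads and barcodes are illustrative) of how grouping short reads by a shared barcode recovers the span of the original long molecule:

```python
from collections import defaultdict

# Toy linked-read data: (barcode, chromosome, aligned position).
# Real GemCode data carries the barcode in the read itself, and the
# production software does far more (error correction, phasing, etc.).
reads = [
    ("ACGT-1", "chr1", 10_500), ("ACGT-1", "chr1", 55_200),
    ("ACGT-1", "chr1", 98_700), ("TTAG-7", "chr2", 4_300),
]

molecules = defaultdict(list)
for barcode, chrom, pos in reads:
    molecules[barcode].append((chrom, pos))

# Reads sharing a barcode likely derive from the same long (up to ~100 kb)
# fragment, so the span of their alignments approximates the molecule.
for barcode, hits in molecules.items():
    positions = [p for _, p in hits]
    print(barcode, hits[0][0], "span ~", max(positions) - min(positions), "bp")
```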

Ion Torrent S5, S5 XL

The S5 system was developed by Ion to focus on the clinical amplicon-seq market. While the wait for delivery of Proton PII chips continues, Ion has delivered a machine with chip configurations very similar to past PGM and Proton chips: 520/530 chips offer 200-400 bp runs with 80M reads and 2-4 hour run times. Using Ion’s fixed amplicon panels, data analysis can be completed within 5 hours. The Ion Chef is required to reduce hands-on library prep time; otherwise, library preparation and chip loading need to be performed manually. Ion looks to have positioned the platform toward clinical applications. With stiff competition from Illumina and an inability to deliver similar read lengths and throughput, this is a smart decision by Ion. Focusing the platform on a particular application, however, likely means future development (longer and higher-throughput reads) has been paused indefinitely.

Pacific Biosciences Sequel System

Announced in September 2015, the Sequel System uses the same Single Molecule, Real-Time (SMRT) technology as the RS II, but boasts several technical advancements. At around one third the cost of an RS II, the Sequel offers 7x more reads, with 1M zero-mode waveguides (ZMWs) per SMRT cell versus the previous standard of 150K. The application to Iso-Seq, or full-length transcript sequencing, is especially promising, as 1M reads crosses into the threshold where discovery and quantitation of transcripts become interesting. By providing full-length transcript isoforms, it’s no longer necessary to reconstruct transcripts or infer isoforms from short-read information. Of course, the Sequel is also ideal for generating whole-genome de novo assemblies. We’ll follow how Oxford Nanopore’s MinION competes with the Sequel system in 2016.

Oxford Nanopore’s (ONT) MinION

In 2014, Oxford Nanopore started its MinION Access Program (MAP), delivering over 1,000 MinIONs to users who wanted to test the technology. These users have gone on to publish whole E. coli and yeast genome assemblies. Accuracy of the device is up to 85% per raw base, and there are difficulties in dealing with high-G+C-content sequences, so a lot of work remains before widespread adoption. The workflow is simple and uses typical library construction steps of end-repair and ligation. Once the sample is added to the flow cell, users can generate long reads (>100 kb) and analyze data in real time; median read lengths are currently 1-2 kb. Publications have shown that MinION output, combined with MiSeq reads, can enhance the contiguity of de novo assemblies. The lower error rates of two-direction (2D) reads produced with the recently updated MinION chemistry give cause for optimism that greatly reduced error rates can be achieved in the near future. This, along with a low unit cost and the ability to deploy the USB-sized device in the field, makes this a very exciting technology.

Illumina HiSeq X

While HiSeq X services have been available on Genohub for over a year, Illumina’s announcement of the platform’s expansion to non-human whole genomes was well received. However, there are still several unanswered questions. Illumina states:

The updated rights of use will allow for market expansion and population-scale sequencing of non-human species in a variety of markets, including plants and livestock in agricultural research and model organisms in pharmaceutical research. Previously, it has been cost prohibitive to sequence non-human genomes at high coverage.

You can now sequence mouse, rat and other relatively large genomes economically on the HiSeq X. This makes the most sense for high-coverage applications, e.g. 30x or above. While small and medium-sized genomes can be sequenced on a HiSeq X, the low level of barcoding and the excess coverage you’d obtain make these applications less attractive. According to Illumina, as of 12/20/2015, metagenomic whole genome sequencing was not a compatible application on the HiSeq X; the instrument is still restricted to WGS only, so RNA-Seq, exome-seq and ChIP-Seq applications will have to wait. Perhaps by the time the HiSeq X One is released, access will be opened to these non-WGS applications.

While these new instruments make their way onto Genohub’s Shop by Project page, you can make inquiries and even order services by placing a request on our consultation page.

AGBT 2015 Summary of Day 3

Advances in Genome Biology and Technology Conference 2015

Day 3 of the Advances in Genome Biology and Technology meeting in Marco Island began with an announcement that next year the meeting would be held in Orlando due to hotel renovations, eliciting a groan from the audience. The meeting will come back to Marco Island in 2017.

Today’s plenary session speakers all presented work with a clinical focus, an acknowledgement by the conference organizers of the direction of genome sequencing. The first speaker, Gail Jarvik, head of medical genetics at the University of Washington Medical Center, presented lessons learned from the Clinical Sequencing Exploratory Research (CSER) Consortium, marketed as ‘Hail CSER’. CSER is a national consortium of projects aimed at sharing innovations and best practices in the integration of genomic sequencing into clinical care. CSER has established a list of 112 actionable genes, some overlapping with the American College of Medical Genetics (ACMG) list. The CSER group annotated pathogenic and novel variants in the Exome Variant Server (EVS) to estimate their rates in individuals of European and African ancestry.

The next talk was by Euan Ashley on moving toward clinical-grade whole genome sequencing. He started by describing the genome as complex, full of repeats, duplications and paralogous sequences, giving him ‘a cold sweat at night’. He gave an example of a study in which 12 adult participants underwent WGS, and described how clinical-grade sequencing demands consistency in reporting. Most variants annotated as pathogenic were downgraded after manual review, but this takes a lot of time: reviewing the roughly 1,000 variants flagged across the 12 individuals took around 1 hour per variant. In this case, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detectable genetic variation and uncertainty about clinically reportable findings. He commented that new algorithms will be needed to address these problems and that ‘we’re at the beginning of genomics medicine’. Parts of his talk can be seen in his presentation at PMWC last month.

The last presentation before the break was by Levi Garraway, who discussed the goal of cancer precision medicine: developing new therapeutics and combinations against molecularly defined tumors. He mentioned that there are many discovery opportunities in clinical cancer genomics, especially in terms of response and resistance to new therapies. Garraway sequenced the genomes of 57 prostate tumors and matched normal tissues to study somatic alterations. His model suggests that chromoplexy induces considerable genomic derangement over relatively few events in prostate cancer, supporting a model of punctuated cancer evolution. He introduced a 10X Genomics approach for phasing large (~100 kb) regions with exonic baits to obtain rearrangement information for chromoplexy. In the end, he emphasized that RNA-Seq profiling in conjunction with DNA sequencing is essential for translational medicine to be relevant.

After the break, Stephen Kingsmore gave a presentation on rapid genome sequencing for genetic disease diagnostics in neonatal intensive care units. Kingsmore began by describing how newborn screening (NBS) and early diagnosis reduce morbidity and mortality; NBS of 60 genetic diseases identifies ~5,000 affected newborns each year. He described how rapid genome sequencing (RGS) has the potential to extend NBS to most genetic diseases in newborns admitted to level II-IV NICUs, and mentioned an ‘ultra-rapid’ sequencing pipeline he developed along with Illumina that takes 28 hours from sample to variant annotation (not publicly available). He also discussed NSIGHT, an NIH-sponsored consortium for newborn sequencing aimed at understanding the role of genome sequencing. More details can be found on the NHGRI page.

The last two plenary talks were by Christian Matranga and Malachi Griffith. Matranga described clinical sequencing of viral genomes as important for understanding the evolution and transmission of pathogens, and for informing surveillance and therapeutic development. His group developed a sequencing approach that combines RNase H-based depletion of rRNA with randomly primed cDNA RNA-seq to detect and assemble genomes from divergent lineages. They sequenced ~300 Lassa (LASV) and ~100 Ebola (EBOV) genomes. We described some of their efforts in an earlier post, Sequencing Suggests the Ebola Virus Genome is Changing. Be sure to read the New Yorker reference, it’s compelling!

Griffith’s talk was on optimizing genome sequencing and analysis. He made the point that while most tumors are sequenced by exome sequencing at 75-100x mean coverage or by whole genome sequencing (WGS) at 30-50x mean coverage, detection of low-frequency mutations requires greater depth. He performed deep sequencing of an acute myeloid leukemia (AML) case: WGS up to 350x, whole exome to 300x, and a capture panel of ~260 recurrently mutated AML genes to ~10,000x coverage. He found that deeper sequencing revealed more driver variants and improved the assignment of variants to clonal clusters. Check out his animation of WGS depth down-sampling.
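
To see why depth matters so much for low-frequency mutations, here is a simple binomial back-of-the-envelope model (our illustration, not Griffith’s actual analysis): the probability of sampling at least a handful of variant-supporting reads rises sharply with depth.

```python
from math import comb

def detection_probability(depth, vaf, min_alt_reads=3):
    """P(>= min_alt_reads variant-supporting reads) under a binomial model
    where each read carries the variant with probability vaf (the variant
    allele frequency). Ignores sequencing error and mapping bias."""
    miss = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
               for k in range(min_alt_reads))
    return 1 - miss

# A 5% subclone is roughly a coin flip at 50x but near-certain at 300x and up.
for depth in (50, 300, 1000, 10000):
    print(f"{depth}x -> {detection_probability(depth, vaf=0.05):.3f}")
```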

After lunch began the ‘Bronze sponsor workshops’, essentially the talks you pay >$40K to give. The most interesting was the last, by 10X Genomics, mainly because, as @bioinformer put it, “10X Genomics is the new princess of the AGBT ball”. First, check out the video that received a round of applause from the AGBT crowd: Changing the Definition of Sequencing. They announced their instrument would be available in Q2 of this year, cost ~$75K, and run at ~$500/sample. This raises the question of whether 10X Genomics’ microfluidic platform offers greater potential than Moleculo, and what the implications are for Illumina and PacBio. To learn more, check out Keith Robison’s insightful post detailing all that is currently known about 10X Genomics.

After dinner began concurrent sessions on technology, genomic medicine and transcriptomics. Hopefully someone else will post details about the genomic medicine and transcriptomics sessions. The technology session began with Iain Macaulay describing G&T-seq: separation and parallel sequencing of genomes and transcriptomes from single cells. This was the first talk this year at AGBT with an embargo, so no tweets were allowed; rather than go into details, we’ll point to this lecture we found online. The next talk was by Alexandre Melnikov on MITE-Seq, an approach to site-directed mutagenesis referred to as Mutagenesis by Integrated TiLEs. MITE facilitates structure-function studies of proteins at higher resolution than typical site-directed approaches. To read more, check out their paper published last year in Nucleic Acids Research. Andrea Kohn then described single-cell methylome profiling of Aplysia neurons. Using methyl-dip and bisulfite sequencing, she achieved >20x coverage for each neuron, then added RNA-seq, providing the first methylome and transcriptome from a single neuron. Next up was Sara Goodwin, who gave an in-depth analysis of the Oxford Nanopore MinION device for de novo and cDNA sequencing. She sequenced the yeast strain W303 to over 120x coverage and achieved up to 80% aligned reads. She mentioned that identifying the right aligner is still a work in progress, but overall found promise in the technology for long-read sequencing, de novo assembly and splice site identification.

Tomorrow’s plenary talks are the second installment of genomics, ‘Genomics II’, with presentations by Michael Fischbach, Rob Knight, Chris Mason and Gene Myers, an excellent lineup to close the final day of AGBT. Check out our earlier posts if you’ve missed day 1 or day 2.

AGBT 2015 – Summary of Day 1

AGBT 2015

Highlights of Day 1 at AGBT

‘Welcome to paradise’ were the first words from Rick Wilson, kicking off the annual Advances in Genome Biology and Technology (AGBT) meeting. The plenary session began with David Goldstein from Columbia University presenting “Toward Precision Medicine in Neurological Disease”. David’s talk began with a discussion of clinical sequencing for neurological diseases, specifically large-scale gene discoveries in epileptic encephalopathies. In epilepsy, 12% of patients are ‘genetically explained’ by a causal de novo mutation, which allows for the application of precision medicine. He discussed how a K+ channel plays a key role in at least two different epilepsies, and how quinidine, which had never been used for epilepsy, was being used as a targeted treatment. He cautioned that the literature contains too many correlation studies that don’t really amount to much, and that as we use genetics to target diseases, it’s critical to perform proper genetics-driven precision medicine and not put patients on the wrong treatment plans. To better characterize the effects of mutations, he emphasized the need for solid model systems. He also said he believes truly complex diseases can be tackled with enough patients: numbers matter. To illustrate the point, he described the sequencing of over 3,000 ALS patients to get a clear picture of which genes/proteins have therapeutic importance. At the end of his talk he was asked the old whole genome sequencing (WGS) vs. whole exome sequencing (WES) question and replied that WES was sufficient, as WGS added little due to lack of interpretability. This touched off some debate in the audience and on Twitter regarding exome-seq vs. WGS. We’ve highlighted the advantages of each approach here: https://blog.genohub.com/whole-genome-sequencing-wgs-vs-whole-exome-sequencing-wes/.

The second talk in the plenary session, by Richard Lifton from Yale, was titled “Genes, Genomes and the Future of Medicine”. Richard cautioned the audience, describing the rush to sequence whole genomes as more industry-driven than good science, essentially reiterating the point that WGS is hard to interpret. This began a side discussion on Twitter among those who agree and disagree with this sentiment. Most notably, Gholson Lyon referenced two recent papers that demonstrated new ways to make processing of WGS data easier, and showed that the accuracy of INDEL detection is greater in WGS than in WES, even in targeted regions. On the cost front, WGS at 35x coverage currently costs $1,750 (https://genohub.com/shop-by-next-gen-sequencing-project/#query=ef95a222ca23fc310eedf6de661e4b22) or $3,500 for 70x coverage, while whole exome sequencing at 100x coverage runs around $1,314 (https://genohub.com/shop-by-next-gen-sequencing-project/#query=0d4231a1d12425085f4e284373605acd). Richard remarked at the end of his talk that not much had been found in non-coding regions; several in the audience challenged him on this assessment.

The third talk in the plenary session was by Yaniv Erlich, titled “Dissecting the Genetic Architecture of Longevity Using Massive-Scale Crowd Sourced Genealogy”. We’ve had the pleasure of hearing Yaniv give several lectures in the past; all have been engaging, and this was no different. His talk was on using social media to dissect the genetic architecture of complex traits, specifically whether risk alleles work independently (additively) or together (epistatically); predictions of epistasis suggest an exponential increase in risk with added risk alleles. He used geni.com to dissect complex traits in large family trees and validated publicly submitted trees using genetic markers. He encoded birthplace as GPS coordinates using Yahoo Maps and showed migration from the Middle Ages through the early 20th century. The video he played was amazing, check it out: https://www.youtube.com/watch?v=fNY_oZaH3Yo#t=19. His take-home message was that longevity is an additive trait, which is good for personalized medicine. The Geni data he described is open to the public for use.

The fourth and final talk of the night was by Steven McCarroll, titled “A Common Pre-Malignant State, Detectable by Sequencing Blood DNA”. He started by posing the questions: what happens in the years before a disease becomes apparent, and, since cancer genomes are usually studied once there are enough mutations to drive malignancy, do those mutations happen in a particular order? He examined 12,000 exomes for somatic variants at low allele frequency and uncovered 3,111 mutations. The blood-derived somatic mutations clustered in four genes, DNMT3A, TET2, ASXL1 and PPM1D, all disruptive. He postulates that driver mutations give cells an advantage, and over several years the clonal progeny takes over, creating pre-cancerous cells. Clonal hematopoiesis with somatic mutations can therefore be readily detected by DNA sequencing and should become more common as we age. Patients with clonal mutations have a 12-fold higher rate of blood cancer, meaning there is a window for early detection, possibly 3 years. This work was recently published in the New England Journal of Medicine: Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence.

Today’s sessions were impressive and set expectations high for tomorrow’s talks. McCarroll’s last comment nicely captured this sentiment, setting the tone for the rest of the meeting, “Medicine thinks of health and illness, there is a lot in between that is ascertainable via genome sequencing”. 

Whole Genome Sequencing (WGS) vs. Whole Exome Sequencing (WES)

Gene, exon, intron, sequencing

 

“Should I choose whole genome sequencing (WGS) or whole exome sequencing (WES) for my project?” is such a frequently posed question during consultations on Genohub that we thought it would be useful to address it here. With unlimited resources and time, WGS is the clear winner: it allows you to interrogate single-nucleotide variants (SNVs), indels, structural variants (SVs) and copy number variants (CNVs) in both the ~1% of the genome that encodes protein sequences and the ~99% of remaining non-coding sequence. WES, however, still costs a lot less than WGS, allowing researchers to increase sample numbers, an important factor for large population studies. WES does have its limitations. Below we’ve highlighted the advantages of WGS vs. WES and described a real case example of ordering these services on Genohub.

Advantages of Whole Genome Sequencing

  1. Allows examination of SNVs, indels, SVs and CNVs in both coding and non-coding regions of the genome. WES omits regulatory regions such as promoters and enhancers.
  2. WGS has more reliable sequence coverage. Differences in the hybridization efficiency of WES capture probes can result in regions of the genome with little or no coverage.
  3. Coverage uniformity with WGS is superior to WES. Regions of the genome with low sequence complexity restrict the ability to design useful WES capture baits, resulting in off-target capture effects.
  4. PCR amplification isn’t required during library preparation, reducing the potential for GC bias. WES frequently requires PCR amplification, as the bulk input amount needed for capture is generally ~1 µg of DNA.
  5. Sequencing read length isn’t a limitation with WGS. Most target probes for exome-seq are designed to be less than 120 nt long, making it pointless to sequence with a greater read length.
  6. A lower average read depth is required to achieve the same breadth of coverage as WES (see the sketch after this list).
  7. WGS doesn’t suffer from reference bias. WES capture probes tend to preferentially enrich reference alleles at heterozygous sites, producing false-negative SNV calls.
  8. WGS is more universal. If you’re sequencing a species other than human, your choices for exome sequencing are pretty limited.
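
The depth-versus-breadth relationship behind point 6 is often approximated with the Lander-Waterman model: at a mean coverage of c, the expected fraction of bases covered at least once is 1 - e^(-c). A minimal sketch, assuming reads land uniformly at random (an assumption WGS satisfies far better than capture-based WES):

```python
from math import exp

def expected_breadth(mean_coverage):
    """Lander-Waterman approximation: fraction of bases covered >= 1x,
    assuming reads land uniformly at random across the genome."""
    return 1 - exp(-mean_coverage)

for c in (1, 5, 10, 30):
    print(f"{c}x mean coverage -> ~{expected_breadth(c):.4f} breadth")
```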

Advantages of Whole Exome Sequencing

  1. WES is targeted to protein-coding regions, so reads represent less than 2% of the genome. This reduces the cost of sequencing a targeted region at high depth and lowers storage and analysis costs.
  2. Reduced costs make it feasible to increase the number of samples sequenced, enabling large population-based comparisons.

Most functionally relevant disease variants can be detected at a depth of 100-120x (1), which makes a strong cost case for exome sequencing. Today on Genohub, whole human genome sequencing at a depth of ~35x costs roughly $1,700/sample, while human exome sequencing at 100x coverage over a 62 Mb target region costs about $550/sample. Both of these prices include library preparation. In terms of producing data, then, WES is still significantly cheaper than WGS. Note that this doesn’t include data storage and analysis costs, which can also be quite a bit higher with whole genome sequencing.
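
A rough way to compare those per-sample quotes on a per-gigabase basis (the ~60% on-target capture rate below is our assumption, not a figure from either quote):

```python
# Rough per-base economics from the prices quoted above (2015 figures).
GENOME_SIZE = 3.1e9   # human genome, bases
EXOME_TARGET = 62e6   # capture target from the example above, bases

wgs_bases = GENOME_SIZE * 35        # 35x whole genome
wes_on_target = EXOME_TARGET * 100  # 100x over the 62 Mb target
# Capture is imperfect; assume ~60% of reads land on target (our assumption).
wes_raw = wes_on_target / 0.6

print(f"WGS: {wgs_bases / 1e9:.0f} Gb for $1,700 -> ${1700 / (wgs_bases / 1e9):.1f}/Gb")
print(f"WES: {wes_raw / 1e9:.1f} Gb raw for $550 -> ${550 / (wes_raw / 1e9):.1f}/Gb")
```

Per gigabase, WES actually costs more; the per-sample savings come entirely from needing far fewer bases to cover the target.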

It’s also important to remember that depth isn’t everything. The better your uniformity of reads and breadth of coverage, the higher the likelihood you’ll actually find de novo mutations and call them. And that’s the main goal: if you can’t call SNPs or INDELs with high sensitivity and accuracy, then even the highest-depth sequencing runs are worthless.

To conclude, whole genome sequencing typically offers better uniformity and more balanced allele-ratio calls. While greater exome-seq depth can match this, sufficient mapped depth or variant detection in specific regions may never reach the quality of WGS due to probe design failures or protocol shortcomings. These are important considerations when examining tissues like primary tumors, where copy number changes and heterogeneity are confounding factors.

If you’re ready to start an exome project, spend a few minutes determining the coverage you’ll need for your experiment. We have an exome-seq guide with examples to help you determine the number of sequencing reads you need to achieve a certain coverage of your exome. If you’re planning to embark on whole genome sequencing, use our NGS Matching Engine which automatically calculates the amount of sequencing capacity on various platforms to meet the coverage requirements for your project.

Reference:

1) Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants

Illumina’s Latest Release: HiSeq 3000, 4000, NextSeq 550 and HiSeq X5

HiSeq 3000, HiSeq 4000, HiSeq X Five, HiSeq X Ten

Illumina’s latest instrument release essentially comes down to more data per day. Using the same patterned flow cell technology already in use on the HiSeq X Ten, the HiSeq 3000 has an output of 750 Gb, or 2.5B PE150 reads, in 3.5 days. The HiSeq 4000 has two flow cells, so twice the output: 1.5 Tb, or 5B PE150 reads, in 3.5 days. The NextSeq 550 combines the current NextSeq 500 with a microarray scanning system that fits right into the flow cell holder. The HiSeq X Five is less exciting: just half the number of instruments of the HiSeq X Ten.
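
Those output figures are internally consistent if “2.5B PE150 reads” is read as 2.5 billion read pairs; a quick sanity check:

```python
# Output (Gb) = read pairs (billions) x 2 ends x read length, for PE150 runs.
def output_gb(read_pairs_billion, read_length=150):
    return read_pairs_billion * 2 * read_length  # 1e9 pairs x bp / 1e9 = Gb

print(output_gb(2.5))  # HiSeq 3000: 750.0 Gb per 3.5-day run
print(output_gb(5.0))  # HiSeq 4000: 1500.0 Gb (1.5 Tb), dual flow cell
```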

If you don’t have the $10M budget for a HiSeq X Ten, you can purchase a HiSeq X Five and later scale to the X Ten at a lower price per instrument ($1M/unit):

               Price   Price/unit   $/Genome*   Consumables   $/Gb
HiSeq X Five   $6M     $1.2M        $1,425      $1,200        $10.6
HiSeq X Ten    $10M    $1M          $1,000      $800          $7

*Price per 30X human genome according to Illumina. We’re not aware of any sequencing facility currently offering 30 human genomes for $1,000. On Genohub today, you can order a single whole human genome at 35X for $1,750.

Both the HiSeq X Five and Ten are still only “licensed” for human whole genomes [Update: since this post was published in January 2015, Illumina now allows the sequencing of other large species on the HiSeq X Ten. For an up-to-date status on what is and isn’t allowed on a HiSeq X, follow our HiSeq X Guide Page]. That basically means that while they can technically be used on non-human samples or transcriptomes, Illumina wants them focused on the WGS market (probably with this year’s BGI / Complete Genomics WGS instrument release in mind). Plus, it gives Illumina an excuse to release patterned flow cells on more models, hence the HiSeq 3000/4000. Interestingly, Illumina is going to start bundling the TruSeq PCR-free and TruSeq Nano library prep kits (the only chemistry currently compatible with the X Five and X Ten) with X Five/Ten cluster reagents. At least for now, they don’t intend to do this with the HiSeq series. Other news from this release:

The HiSeq 3000/4000 do not have a rapid mode, only a high-output mode; however, PE150 reads take just 3.5 days

You can’t upgrade from a HiSeq 2500 (non-patterned flow cell) to a HiSeq 3000 or 4000 (patterned flow cells)

You can upgrade from the single flow cell HiSeq 3000 to dual flow cell HiSeq 4000

The HiSeq 3000 yields >200 Gb/day, a 28% increase over the HiSeq 2500 v4, yet costs the same to purchase as a HiSeq 2500. With two flow cells, the HiSeq 4000 yields twice as much data.

Sequencing Applications and Turnaround Time 

             Exomes                Transcriptomes        30X Genomes
HiSeq 3000   90 (2×75, <2 days)    50 (2×75, <2 days)    6 (2×150, 3.5 days)
HiSeq 4000   180 (2×75, <2 days)   100 (2×75, <2 days)   12 (2×150, 3.5 days)

So in the end, assuming sequencing facilities aren’t fed up with this breakneck upgrade cycle and actually purchase these instruments, researchers can expect more data with faster turnaround times. We’ve already spoken to a few of our service providers who are considering upgrades to their HiSeq 1500/2000 instruments. As soon as these new instruments are available on Genohub, we’ll make an announcement [Update: they are all available, use our NGS Matching Engine for access to the latest Illumina instruments]. If you’d like to be the first to know, send us an email at support@genohub.com. In the meantime, our providers offer services on the HiSeq 2500 v4, HiSeq X Ten, NextSeq 500 and HiSeq instruments (amongst many others). You can order these services immediately and expect data delivery within the listed guaranteed turnaround times. If you’re not sure which technology or instrument is right for you, just enter the number of reads or coverage you need and let our NGS Matching Engine identify the best service for you. So what’s next? A little bird has told us: patterned flow cells on the MiSeq!

Genohub Projects Now Support Multiple Collaborators

Most researchers using Genohub work in a team with other investigators and administrators. Until now, however, every Genohub user has been able to view and manage only the projects they directly started on Genohub.

We’re pleased to announce that effective immediately, you can add one or more collaborators to any of your Genohub projects, all the way from the project request stage until after your project is complete and the results are ready.

By quickly adding a collaborator to your project you can allow another member of your team to view the quotes and detailed project information. You may also give them permission to manage the project (e.g. post messages, accept quotes, attach files, etc.).

In addition to setting per-member permissions, you can also choose whether each member receives email notifications when there is activity on your project.

To add a collaborator, all you need is their email address:

Genohub Collaboration Tool

For instance you may want to share the instant Genohub quote on a particular service with the primary investigator on your team in order to get their approval. You may also want to give someone at your purchasing department access to the detailed pricing information on your project. Or you may want a colleague to manage the project and handle the communication with the service provider while you’re on vacation. This can all be done by simply adding these individuals as collaborators.

Please give it a try and feel free to reach out to us at support@genohub.com if you have any questions or feedback.

 

Genohub Opens Access to Latest Project Management Tool

We’re pleased to announce the launch of a new project management tool on Genohub called PIP (Provider Initiated Projects)! Until now, Genohub has been a successful marketplace for connecting researchers with next-generation sequencing service providers, and service providers on Genohub have used our project management tools to manage hundreds of incoming researcher requests. We’re instituting two BIG changes:

  1. We’ve opened up our project management tools to all service providers and CROs. These tools are no longer limited to providers offering sequencing services. If you offer any type of scientific service, you can now use Genohub software to initiate projects, write quotes and manage back and forth project communication at no charge.
  2. In the past, a researcher would have to inquire about your services before you could start a project. Providers can now start projects and quotes for researchers anywhere in the world.

Here are examples of how this service could be useful to you:

Example 1 

You’re a service provider who handles internal service projects from researchers within your University. You’re tired of using email threads to manage these projects and are looking for a way to send quotes, have researchers upload project specs and have all back and forth communication saved as part of a unique project. Genohub’s PIP feature handles all of that and is available free of charge! 

Example 2

A new researcher or someone you’ve worked with in the past contacts you to start a service project. Using PIP you can initiate a project, write a quote and manage communication with anyone in the world. We’ll only charge you if you elect Genohub to handle invoicing and billing, otherwise it’s free. 

 

To get started, use the blue ‘Start New Project and Create Quote’ button on your Project Dashboard to initiate a project. 

 

Genohub Project Dashboard

Beginner’s Guide to Exome Sequencing

Exome Capture Kit Comparison

With decreasing costs to sequence whole human genomes (currently $1,550 for 35X coverage), we frequently hear researchers ask, “Why should I only sequence protein-coding genes?”

First, WGS of entire populations is still quite expensive. These types of projects are currently only being performed by large centers or government entities, like Genomics England, a company owned by the UK’s Department of Health, which announced that it would sequence 100,000 whole genomes by 2017. At Genohub’s rate of $1,550/genome, 100,000 genomes would cost $155 million USD. That figure only includes sequencing costs; labor, data storage and analysis are likely several-fold greater.

Second, the exome, all ~180,000 exons, comprises less than 2% of the sequence in the human genome but contains 85-90% of all known disease-causing variants. A more focused dataset makes interpretation and analysis a lot easier.

Let’s assume you’ve decided to proceed with exome sequencing. The next step is to either find a service provider to perform your exome capture, sequencing and analysis or do it yourself. Genohub has made it easy to find and directly order sequencing services from providers around the world. Several of our providers offer exome library prep and sequencing services. If you’re only looking for someone to help with your data analysis, you can contact one of our providers offering exome bioinformatics services. Whether you decide to send your samples to a provider or make libraries yourself, you’ll need to decide on what capture technology to use, the number of reads you’ll need and what type of read length is most appropriate for your exome-seq project.

There are currently three main capture technologies available: Agilent SureSelect, Illumina Nextera Rapid Capture and Roche NimbleGen SeqCap EZ Exome. All three are in-solution based and utilize biotinylated DNA or RNA probes (baits) complementary to exons. These probes are added to genomic fragment libraries, and after a period of hybridization, magnetic streptavidin beads are used to pull down and enrich for the fragmented exons. The three capture technologies are compared in a detailed table here: https://genohub.com/exome-sequencing-library-preparation/. Each kit differs in number of probes, probe length, target region, input DNA requirements and hybridization time. Researchers planning exome sequencing should first determine whether the technology they’re considering covers their regions of interest: only 26.2 Mb of targeted bases are common to all three kits, and small portions of the CCDS exome are uniquely covered by each technology (Chilamakuri, 2014).
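
Checking whether a kit covers your regions of interest boils down to interval overlap; in practice you would intersect the vendor’s target BED file with your gene coordinates (e.g. with bedtools), but the core computation is just this (the coordinates below are illustrative, not from any real kit):

```python
def overlap_bp(region, targets):
    """Base pairs of region (chrom, start, end) covered by a kit's target
    intervals. Assumes the target intervals don't overlap each other."""
    chrom, start, end = region
    return sum(max(0, min(end, t_end) - max(start, t_start))
               for t_chrom, t_start, t_end in targets
               if t_chrom == chrom)

# Hypothetical capture targets and a region of interest.
kit_targets = [("chr17", 41_196_000, 41_197_500),
               ("chr17", 41_199_000, 41_201_000)]
gene = ("chr17", 41_196_312, 41_200_500)

covered = overlap_bp(gene, kit_targets)
total = gene[2] - gene[1]
print(f"{covered} of {total} bp covered ({100 * covered / total:.0f}%)")
```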

Our Exome Guide breaks down the steps needed to determine how much sequencing and what read length are appropriate for your exome capture sequencing project.
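
The core calculation in that guide is short enough to sketch here; the on-target and duplication rates below are placeholder assumptions you should replace with your kit’s measured values:

```python
def read_pairs_needed(target_bp, coverage, read_length=100,
                      on_target=0.7, duplication=0.1):
    """Paired-end read pairs required for a given mean on-target coverage.
    on_target and duplication are kit- and protocol-dependent assumptions."""
    usable_per_pair = 2 * read_length * on_target * (1 - duplication)
    return target_bp * coverage / usable_per_pair

# e.g. 100x over a 62 Mb target with 2x100 reads
print(f"~{read_pairs_needed(62e6, 100) / 1e6:.0f}M read pairs")
```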

Sequencing, Finishing, Analysis in the Future – 2014 – Day 1 Meeting Highlights

SFAF 2014

Sequencing Finishing and Analysis in the Future Meeting 2014

Arguably one of the top genome conferences, the annual SFAF meeting kicked off this year in Santa Fe with a great lineup of speakers from genome centers, academia and industry. Frankly, what’s amazing is that the meeting is completely supported by outside funding; there is no registration fee (we hope that last comment doesn’t spoil the intimate, small size of the meeting next year).

Rick Wilson kicked off SFAF with his keynote, ‘Recent Advances in Cancer Genomics’. He discussed a few clinical cases where the combination of whole genome sequencing, exome-seq and RNA-seq was used to help diagnose and guide targeted cancer drug therapy. He emphasized that this combination-based sequencing approach is required to identify actionable genes, and that WGS or exome-seq alone isn’t enough.

Jonathan Bingham from Google announced the release of a simple web-based API to import, process, store and collaborate on genomic data: https://gabrowse.appspot.com/. He mentioned that Google thinks of computing in terms of data centers and available capacity: at any given time, their idle computers pooled together are larger than any single data center, and his new genomics team is looking to harness this for genome analysis. He made the comparison that a million genomes would add up to more than 100 petabytes, on the scale of Google’s web search index.

Steve Turner from Pacific Biosciences discussed platform advances that have led to higher-quality assemblies, rivaling pre-second-generation clone-by-clone sequencing. He made an analogy to the current state of transcriptome assembly: like putting a bunch of magazines in a shredder, then gluing the pieces back together. He described a method now available for construction of full-length transcripts: cDNA SMRTbell™ libraries for single-molecule sequencing. Finally, he announced that there are >100 PacBio instruments installed in the field. At Genohub, we already have several listed, with service available for purchase: https://genohub.com/shop-by-next-gen-sequencing-technology/#query=f64db717ac261dad127c20124a9e1d85.

Kelly Hoon from Illumina was up next. She described a series of new updates, the most notable being the submission of the HiSeq 2500 for FDA approval by the end of the year. Other points included updates to BaseSpace, the 1T upgrade (1 Tb of data in 6 days), NeoPrep (coming this summer, allowing 1 ng of input), new RNA capture kits and a review of the NextSeq optics.

Thermo Fisher’s presentation came immediately after Illumina’s. Most of the discussion was on Ion Torrent’s new Hi-Q system, designed to improve accuracy, read length and error rates.

Right after the platform talks was a panel discussion with PacBio, Illumina, Roche and Thermo Fisher. The main points from that discussion were:

  • Steve Turner from PacBio declined to discuss or entertain discussion of a benchtop platform, which was met with lots of audience laughter.
  • Illumina had no response for ONT, except to say they’re not going to respond to ONT until after they launch… ouch.
  • PacBio said that right now read length is limited not by the on-board chemistry but by the quality of input DNA.
  • Roche is phasing out the 454 but looking to compete on 4-5 other possibilities (very interesting news).

Ruth Timme from the FDA discussed the implementation of an international NGS network of public health labs that collect and submit draft genomes of food pathogens to a reference database. Data coming in from these sites provides the FDA with actionable leads in outbreak investigations. Currently, GenomeTrakr consists of six state health labs and a series of FDA labs.

Sterling Thomas discussed BioVelocity, a suite of high-speed algorithms from Noblis’ Center for Applied High Performance Computing (CAHPC). BioVelocity performs reference-based multiple sequence alignment (MSA) and variant detection on raw human reads. High-speed variant finding in adenocarcinoma using whole genome sequencing was given as an example.

Sean Conlan from NHGRI discussed sequence analysis of plasmid diversity among hospital-associated carbapenem-resistant Enterobacteriaceae. Using finished genome sequences of isolates from patients and the hospital, he was able to better understand the transmission of bacterial strains and of plasmids encoding antibiotic resistance.

David Trees examined the use of WGS to determine the molecular mechanisms responsible for decreased susceptibility and resistance to azithromycin in Neisseria gonorrhoeae. Predominant causes of resistance included mutations in the promoter region or structural gene of mtrR and mutations in the 23S rRNA alleles located on the gonococcal chromosome.

Darren Grafham from Sheffield Diagnostic Genetics Services emphasized the importance of consensus in the choice of an analytical pipeline, alongside Sanger confirmation of variants, for diagnostics. He described a pipeline currently in use in a clinical diagnostic lab for routine screening of inherited pathogenic variants, and stated that 30x coverage is the point at which false positives are eliminated with >99.9% confidence.

Other talks during the first day (that we likely missed while enjoying the beautiful Santa Fe weather):

Heike Sichtig: Enabling Sequence Based Technologies for Clinical Diagnostic: FDA Division of Microbiology Devices Perspective

Christian Buhay: The BCM-HGSC Clinical Exome: from concept to implementation

Dinwiddie: WGS of Respiratory Viruses from Clinical Nasopharyngeal Swabs

Karina Yusim: Analyzing TB Drug Resistance

Colman: Universal Tail Amplicon Sequencing

Roby Bhattacharyya: Transcriptional signatures in microbial diagnostics

Eija Trees: NGS as a surveillance tool

Helen Cui: Genomics Capability Development and Cooperative Research with Global Engagement

Raphael Lihana: HIV-1 Subtype Surveillance in Kenya: the Puzzle of Emerging Drug Resistance and Implications on Continuing Care

Gvantsa Chanturia: NGS Capability at NCDC

The night ended with a poster and networking session. The entire agenda is posted here: http://www.lanl.gov/conferences/sequencing-finishing-analysis-future/agenda.php

Follow us on Twitter and #SFAF2014 for the latest updates!


Ask a Bioinformatician

In the last two years, next-gen sequencing instrument output has increased significantly; labs are now sequencing more samples at greater depth than ever before. Demand for analysis of next-generation sequencing data is growing at an arguably even higher rate. To help accommodate this demand, Genohub.com now allows researchers to quickly find and connect directly with service providers who have specific data analysis expertise: https://genohub.com/bioinformatics-services-and-providers/

Whether it’s a simple question about gluing pieces of a pipeline together or a request to have your transcriptome annotated, researchers can quickly choose a bioinformatics provider based on their expertise and post queries and project requests. The services bioinformaticians offer on Genohub are broken down into primary, secondary and tertiary data analysis:

Primary – Involves the quality analysis of raw sequence data from the sequencing platform. Primary analysis solutions are typically provided by the platform itself after the sequencing phase is complete, and often produce a FASTQ file, which combines the sequence data with a Phred quality score for each base.
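
For reference, a FASTQ record is four lines, and each quality character encodes a Phred score as its ASCII value minus 33 (the Sanger/Illumina 1.8+ convention); a score of Q corresponds to an error probability of 10^(-Q/10). A minimal parser, assuming that encoding:

```python
def read_fastq(path):
    """Yield (read_id, sequence, phred_scores) from a FASTQ file.
    Assumes 4-line records and Phred+33 quality encoding."""
    with open(path) as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:          # end of file
                return
            seq = fh.readline().rstrip()
            fh.readline()           # '+' separator line
            quals = [ord(c) - 33 for c in fh.readline().rstrip()]
            yield header[1:], seq, quals

# Q20 = 1% chance the base call is wrong, Q30 = 0.1%.
```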

Secondary – Encompasses sequence alignment, assembly and variant calling on aligned reads. This analysis is usually resource-intensive, requiring significant data and compute resources, and typically relies on a set of algorithms that can be automated into a pipeline. While the simplest pipelines can be a matter of gluing together publicly available tools, a certain level of expertise is required to maintain and optimize the analysis flow for a particular project.
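
To illustrate what “gluing together publicly available tools” can look like, a bare-bones secondary pipeline is often just a handful of shelled-out commands (bwa, samtools and bcftools here; file names are placeholders, and real pipelines add read groups, duplicate marking and QC):

```python
import subprocess

ref, r1, r2, sample = "ref.fa", "reads_1.fastq", "reads_2.fastq", "sample"

cmds = [
    # Align paired-end reads and coordinate-sort the output.
    f"bwa mem {ref} {r1} {r2} | samtools sort -o {sample}.bam -",
    f"samtools index {sample}.bam",
    # Pile up and call SNVs/indels into a VCF.
    f"bcftools mpileup -f {ref} {sample}.bam | bcftools call -mv -Ov -o {sample}.vcf",
]
for cmd in cmds:
    subprocess.run(cmd, shell=True, check=True)
```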

Tertiary – Annotation, variant call validation, data aggregation and sample- or population-based statistical analysis are all components of tertiary data analysis. This type of analysis is typically performed to answer a specific biologically relevant question or to generate a series of new hypotheses that need testing.

Researchers who need library prep, sequencing and data analysis services can still search for, find and begin projects as before using our Shop by Project page. What’s new is that researchers who only need data analysis can now directly search for and contact a bioinformatics service provider to request a quote: https://genohub.com/bioinformatics-services-and-providers/

Whether you plan on performing a portion of your sequencing data analysis yourself or intend to take on the challenge of putting together your own pipeline, consultation with a seasoned expert saves time and ensures you’re on the way to successfully completing your project. By adding this new service, we’re trying to make it easier to search for and identify the right provider for your analysis requirements.

If you’re a service provider and would like your services listed on Genohub, you can sign up for a Service Provider Account or contact us to discuss the screening and approval process.