Matchmaking for the Life Scientist

Seq-ing NGS data analyst for long walks on the beach (Marco Island). Experience with SNP calling and genome mapping a plus. Must be willing to deal with large data sets and my species diversity. Willing to pay hourly for a short term (one-time) relationship. Also offering to include you as an author on my paper. 

So maybe you haven’t seen a request exactly like this one, but Genohub has! There are typically two main types of people using Genohub.

First, there is the researcher who is completely new to next-gen sequencing. He or she may know nothing about sequencing read types or how to analyze data, and needs help getting started. The researcher may have some resources, but needs help with sequencing and/or data analysis. These researchers can begin their foray into NGS using our complimentary consultation form. They will talk with our scientific staff and get matched with the sequencing service provider or bioinformatician best suited to handle their project.

Then there’s the experienced NGS researcher who has ordered library prep, sequencing or data analysis services before and knows exactly what he or she wants. Often this researcher needs to discuss a few things with a service provider before moving forward. This type of researcher can find service providers based on the NGS instrument or library prep application they need using Genohub’s technology search page, or enter the number of reads or coverage they need using our project-specific search engine. Using either search tool, the researcher can find a service provider, submit a project and talk directly with that provider.

So the next time you’re ‘seqing’ a match for your next-gen sequencing project, start with

next generation sequencing consultation


Consider Bias in NGS Library Preparation

Library preparation for next-generation sequencing involves a coordinated series of enzymatic reactions to produce a random, representative collection of adapter-modified DNA fragments within a specific range of fragment sizes. The success of next-generation sequencing is contingent on proper processing of the DNA or RNA sample. However, as you go from isolating nucleic acid to finally interpreting your sequencing data, assumptions and bias-prone steps can quickly reduce the value of your work. This is especially the case during library preparation.

Where are biases introduced?

The first step in most library preparation is shearing your DNA or RNA into fragments that are compatible with today’s short-read NGS instruments. Whether you fragment your nucleic acid material using a high divalent cation buffer, nuclease, transposase, acoustic or mechanical method, you are invariably cutting at “preferred” positions, resulting in fragments with non-random ends. In DNA-Seq/ChIP-Seq library prep procedures the next step is end repair: filling in 5’ overhangs and removing 3’ overhangs using a couple of polymerases (or fragments of a polymerase). If you’re starting with variably fragmented ends, this process isn’t going to occur at the same efficiency for all your fragments, and neither will adenylation of these ends (commonly performed after end repair). The next step is typically ligation of an adapter that is compatible with the instrument you plan to use. Unfortunately, there aren’t many studies that have examined bias in DNA adapter ligation. However, if you’re keeping up to date with papers coming out on RNA adapter ligation bias, there is reason to be concerned. To complete your library you typically have to amplify it, which can lead to preferences for particular fragments and uneven fragment representation in your data. And that is just polymerase bias in PCR; we haven’t even mentioned the polymerase biases on board the sequencing platform.

All RNA-Seq library prep procedures utilize reverse transcriptases, which often have trouble reading through regions of RNA that have secondary structure or particular base compositions. The reverse transcriptase primer you choose can also bias your results. Follow that with second-strand synthesis and all the steps described above for DNA-Seq, and you now have potentially the most compromised library application on hand.

A recent review nicely highlights some of these biases in more detail: Library preparation methods for next-generation sequencing: Tone down the bias

It’s not all doom and gloom: there are several ways you can reduce or measure these biases. Examples include using randomized bases in small RNA library preparations, eliminating PCR, or using a polymerase that handles GC-rich and AT-rich regions fairly equally. Eliminating or reducing these biases will be the subject of a subsequent blog post.

Consistency in library prep method

For many applications, as long as all your samples are treated equally, biases introduced during library preparation affect all your samples in the same way and may be less of a concern. If you’re in the middle of a long project, we highly recommend using the same library prep method or kit for your whole study. Differences in method, enzymes and reaction times between two library prep kits can give you significantly different results. To help you understand differences in library preparation methodology, we have an NGS Library Prep Kit Guide that describes the kits providers use on Genohub.

If you’d like help navigating issues in library preparation bias or would like to use our complimentary consultation service, contact us!


NextSeq 500 and HiSeq X Ten: New Tech Lowering Cost per Mbp

Jay Flatley’s announcement yesterday certainly changes calculations for whole genome sequencing. Newer, cheaper optics, fluidics and reagent chemistry have lowered the cost of sequencing and enabled a 300-cycle, 125 Gb run in 30 hours with the NextSeq 500. The HiSeq X Ten (pronounced “ex ten”, not “ten ten”), consisting of 10 instruments daisy-chained together, will generate 18 Tb in 72 hours. The new optical technology uses a 2-dye system: cytosine and thymine are each labeled with a single dye, adenine carries both dyes, and guanine is identified by the absence of dye. This allows Illumina to use lower-resolution cameras and capture half the number of images. The new patterned flow cells with larger clusters use nanowells and are scanned bi-directionally, making optical scanning 6 times faster than on a HiSeq 2500. New reagent chemistry allows reactions to occur at room temperature, eliminating the need for a bulky chiller and reducing the instrument’s size to that of a MiSeq, hence the commonly quoted phrase: “HiSeq in a MiSeq”.
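
As a rough illustration of how a two-dye scheme can encode all four bases, here is a minimal Python sketch. It assumes the channel assignment Illumina describes for its two-channel chemistry (C in the red channel only, T in the green channel only, A in both, G in neither). The threshold value and the function names are hypothetical, and real base callers model cluster intensities in far more detail than a fixed cutoff.

```python
# Assumed (red_on, green_on) -> base assignment for two-channel chemistry.
TWO_CHANNEL_CODE = {
    (True, True): "A",    # signal in both channels
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # no signal (unlabeled)
}


def call_base(red, green, threshold=0.5):
    """Call one base from normalized red/green intensities.

    The 0.5 threshold is an illustrative assumption, not a vendor value.
    """
    return TWO_CHANNEL_CODE[(red >= threshold, green >= threshold)]


def call_read(intensities, threshold=0.5):
    """Call a read from a list of (red, green) pairs, one pair per cycle."""
    return "".join(call_base(r, g, threshold) for r, g in intensities)


# Four cycles of made-up intensities: both, red only, neither, green only.
signals = [(0.9, 0.8), (0.9, 0.1), (0.05, 0.02), (0.1, 0.95)]
read = call_read(signals)
print(read)
```

Two images per cycle are enough here because two on/off channels give four combinations, one per base. That is where the halving of the image count comes from.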

What’s the cost?

The NextSeq 500 will cost $250,000, and the HiSeq X Ten must be purchased in sets of 10 at $10 million for a full set. According to Illumina, the HiSeq X Ten will yield whole human genome sequences for $1,000 each and will be able to generate around 15,000-20,000 genomes per year. The NextSeq 500 will be able to generate 120 Gb per run, or roughly one whole human genome at 30x coverage, for ~$4,000.
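
The figures quoted above can be turned into rough per-Mbp costs. The sketch below is back-of-the-envelope arithmetic only. It assumes a haploid human genome of about 3.1 Gb (an assumption, not a figure from this post), uses the 18 Tb in 72 hours output quoted earlier for the HiSeq X Ten, and assumes continuous instrument operation for the yearly estimate, so that number is an upper bound.

```python
HUMAN_GENOME_GB = 3.1  # assumed haploid human genome size, in Gb


def coverage(yield_gb, genome_gb=HUMAN_GENOME_GB):
    """Average fold coverage obtained from a given yield."""
    return yield_gb / genome_gb


def cost_per_mbp(cost_usd, yield_gb):
    """Cost in USD per megabase pair of sequence (1 Gb = 1,000 Mb)."""
    return cost_usd / (yield_gb * 1000)


# NextSeq 500: 120 Gb per run for ~$4,000.
nextseq_coverage = coverage(120)
nextseq_cost = cost_per_mbp(4000, 120)

# HiSeq X Ten: $1,000 per 30x human genome.
gb_per_30x_genome = 30 * HUMAN_GENOME_GB
hiseqx_cost = cost_per_mbp(1000, gb_per_30x_genome)

# HiSeq X Ten: 18 Tb (18,000 Gb) every 72 hours, running non-stop for a year.
runs_per_year = 365 * 24 / 72
max_genomes_per_year = runs_per_year * 18000 / gb_per_30x_genome

print(f"NextSeq 500: {nextseq_coverage:.1f}x of one human genome per run")
print(f"NextSeq 500: ${nextseq_cost:.4f} per Mbp")
print(f"HiSeq X Ten: ${hiseqx_cost:.4f} per Mbp")
print(f"HiSeq X Ten: up to {max_genomes_per_year:.0f} genomes per year")
```

Under these assumptions the theoretical ceiling for the HiSeq X Ten lands somewhat above the 15,000-20,000 genomes per year quoted by Illumina, which is what you would expect once downtime and run setup are accounted for.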

Excess capacity

So what will providers be doing with all this excess capacity? Enter Genohub. Genohub’s intelligent sequencing matching engine instantly matches researchers with service providers based on specific project criteria. Genohub facilitates the management of sequencing projects throughout the sequencing lifecycle, from selecting orderable sequencing packages to communication, payments and delivery of data. In March, NGS service facilities are going to need to recoup operational costs and convince their institutions they made the right choice dropping $250K for a NextSeq 500 or $10M for a HiSeq X Ten cluster. We estimate that toward the middle of 2014 there will be a lot of available NextSeq 500 flow cells needing filling and a much higher number of whole human genomes needed for the HiSeq X Ten. Regulatory issues, data analysis bottlenecks and operational logistics will most likely keep the 5 HiSeq X Tens fairly quiet in 2014 (Illumina has promised 5 in 2014; 3 have already been purchased). Genohub is uniquely positioned to distribute this excess capacity to researchers around the world. Your local institution, or even your country, no longer needs to have one of these instruments on hand (see our post on reasons to outsource NGS services). By using Genohub, you have access to sequencing capacity and instruments located throughout the world.

Looking to use the NextSeq 500? After discussions with our current service providers, we expect NextSeq 500 sequencing services to be available on Genohub in 3 months. We’ve already spoken to one of the announced HiSeq X Ten customers and hope to have that service available on Genohub shortly after delivery. So today, we’re happy to announce that Genohub is taking NextSeq 500 pre-delivery service requests! Send your request through our consultation form. Check back with us in March for regular access to these platforms using our intelligent sequencing search engine.




Top Next Generation Sequencing Applications

A common question we’re asked is which library preparation applications researchers are most interested in. Providers starting their own core facility, bioinformaticians writing software for a particular pipeline and others trying to gauge demand for NGS applications are most interested in the answer. We looked at the number of projects initiated on Genohub in the last three months that included library preparation. Projects are initiated on Genohub through our Shop by Project or Shop by Technology interfaces. Users enter project information like coverage or the number of required reads and can specify if they prefer one platform over another. Genohub’s intelligent project matching engine takes this data and displays packages that consist of provider services matching the user’s request. Users who select a package and begin direct communication with the provider are considered to have initiated a project. A summary of the library preparation applications those users chose in projects started between 10/2013 and 12/2013 is plotted in Figure 1 (data from projects using our complimentary consultation service was also included in this graph).

[Figure 1: Projects started on Genohub by library preparation application, 10/2013 to 12/2013]

RNA-Seq projects encompass all those starting with total RNA, ribosomal-depleted RNA and poly-A-selected RNA. These applications were the most popular, followed by projects involving whole genome sequencing. RNA-Seq’s growing versatility as both an expression analysis and de novo assembly/construction tool is likely the reason it accounts for the greatest number of projects on Genohub. Targeted DNA applications were also frequently performed: Exome, 16S V4 and other Amplicon-Seq projects were the 3rd, 4th and 5th most commonly started projects on Genohub. While not illustrated in Figure 1, specialized applications related to Methyl-Seq and ChIP-Seq were some of the fastest growing.

As Genohub has only recently launched, we expect these numbers to grow significantly. We’ll keep the community updated with our latest data. If you’re a researcher or service provider with a unique NGS application, we’d like to hear about it! For inquiries or suggestions please contact us at