4 Approaches to HLA Typing Using Next Generation Sequencing

Human leucocyte antigen (HLA) genes are among the most polymorphic in the entire genome. They are located on the short arm of chromosome 6 within the major histocompatibility complex (MHC) and play an important role in organ and hematopoietic stem cell transplantation. Donor-recipient matching prior to transplantation is performed by examining 6 genes (A, B, C, DP, DQ, DR) that encode HLAs, to reduce the risk of transplant rejection or graft-versus-host disease (GVHD).

While the current “gold standard” for HLA typing is Sanger sequencing, defining the phase of sequence motifs is difficult and genotype ambiguity increases with each database release. Next generation sequencing (NGS) offers several advantages: it provides nucleotide-specific reads across the entire genome, clonal amplification provides phase information, and larger genomic regions (including introns) can be evaluated thoroughly. Unresolved technical issues, however, have limited the implementation of NGS in accredited HLA laboratories. These include long library preparation and processing times and the more complex bioinformatics analysis needed to generate an easily readable HLA typing report.

 The main approaches to prepare DNA template for HLA typing on NGS platforms include:

  1. Multiplex PCR to amplify target regions, including exons or introns. Advantage: After PCR, fragmentation and ligation are no longer required because sequencing primers are designed as part of the initial targeting primers. Disadvantage: Primer design is difficult due to multiple polymorphisms, necessitating primer pools.
  2. Long range PCR of individual loci followed by fragmentation and ligation of sequencing adapters. Advantage: Longer regions can be targeted. Disadvantage: Fragmentation and subsequent ligation of sequencing adapters can be difficult.
  3. Sequence capture using oligo based hybridization to target regions >20 Mb. Advantage: This technique is similar to exome capture and well characterized. Disadvantage: Hybrid capture of shorter HLA regions or the MHC is less effective than capture of larger segments.
  4. Sequencing of the entire genome (whole genome sequencing). Advantage: This is the least biased way to examine HLA regions of interest. Disadvantage: Data analysis, including the need to extract the sequences of interest from the entire genome, is difficult and not ready for routine studies.

Generating an unambiguous HLA genotype is important, and software customized to each of these 4 approaches now exists. Several commercial companies, including bioinformatics providers on Genohub, have software that accepts data in FASTA or FASTQ format and outputs accurate HLA genotype results.

Whether you’re just beginning your HLA typing work and need library prep, sequencing and an analysis solution, or you’ve already generated your data and are looking for the right analysis to produce an unambiguous report, Genohub offers complimentary HLA consultation and can match you with the right service provider. Service providers on Genohub have experience using the four template preparation methods described above and have the pipelines in place for your analysis. To get started, fill out our NGS project consultation form and we’ll contact you with our recommendations.


Choosing the Right NGS Instrument for Your Research

If you’re about to embark on a high throughput sequencing project, choosing the right sequencing instrument is an important consideration. Perhaps you’re replicating a published study or repeating an experiment from previous work, and the instrument you plan to use is already known. If not, the choice of instrument should be based on the sequencing goal you are trying to achieve. Instrument features to take into consideration include: number of reads per run, read length, read type (paired or single end), error type, turnaround time and price. Using Genohub’s Shop by Project page, you can enter the number of reads or the coverage you need and instantly compare instruments, filtering by read length and sorting by turnaround time and price. To give a better idea of the differences between NGS instruments, we’ve generated the following comparison: Table 1.
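If you know the coverage you want but not the number of reads, the two can be related with a simple back-of-the-envelope estimate: mean coverage is roughly the total number of sequenced bases divided by the genome (or target) size. The sketch below is only an illustration under that assumption. The function name and parameters are ours, and it ignores duplicates, unmapped reads and uneven coverage, so treat the result as a lower bound.

```python
import math


def reads_needed(genome_size_bp, target_coverage, read_length_bp, paired_end=False):
    """Estimate the reads (or read pairs) needed to reach a mean coverage.

    Assumes coverage = total sequenced bases / genome size, with every
    base usable. Real projects need extra reads to absorb duplicates,
    low-quality reads and uneven coverage.
    """
    if genome_size_bp <= 0 or target_coverage <= 0 or read_length_bp <= 0:
        raise ValueError("all inputs must be positive")
    # A paired-end fragment contributes two reads' worth of bases.
    bases_per_fragment = read_length_bp * (2 if paired_end else 1)
    return math.ceil(genome_size_bp * target_coverage / bases_per_fragment)


if __name__ == "__main__":
    # Hypothetical example: a 5 Mb bacterial genome at 50x with 2 x 150 bp reads.
    print(reads_needed(5_000_000, 50, 150, paired_end=True))
```

Comparing the estimate with the reads per run of each instrument gives a rough idea of whether a project needs a full run, a single lane or only a fraction of one.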

Certain instruments are ideally suited to specific applications. Illumina instruments are versatile and ideal for a variety of sequencing applications, including: de novo assembly, resequencing, transcriptome, SNP detection and metagenomic studies. The HiSeq and GAIIx instruments are both suited for analyzing large animal or plant genomes. High level multiplexing of samples is possible when analyzing species with a smaller genome size. While the Illumina MiSeq outputs significantly fewer reads (Table 1), its read lengths are significantly longer, making it ideal for small genomes, sequencing long variable domains or targeted regions within a genome. The only real limitation of the Illumina platform is its relatively short reads compared to other platforms (Roche 454 and PacBio).

The Ion PGM (Ion Torrent) is ideal for amplicons, small genomes or targeting of small regions within a genome. Its low throughput makes it best suited to smaller studies. The Ion Proton, however, is capable of generating significantly larger outputs (Table 1), making sequencing of transcriptomes, exomes and medium sized genomes possible.

The PacBio RS/RS II breaks the mold of other short-read, high throughput sequencing instruments by focusing on length. Its reads, averaging ~4.6 kb, are significantly longer than those of other sequencing platforms, making it ideal for sequencing small genomes such as those of bacteria or viruses. Other advantages include its ability to sequence regions of high G/C content and determine the status of modified bases (methylation, hydroxymethylation) without requiring chemical conversion during library preparation. The instrument’s low output of reads prevents it from being useful for assembly of medium to large genomes.

The Roche 454 FLX+ is typically used in studies where read length is critical. These include de novo assemblies of microbial genomes, BACs and plastids. Its long read length has made it a favorite of those examining 16S variable regions and other targeted amplicon sequences. The lower output of the FLX and FLX+ instruments makes them less cost-effective for transcriptome or larger genome studies. Roche has announced that it will stop producing the 454 in 2015 and end servicing in mid-2016.

The SOLiD series of instruments are high throughput, generating a large number of short reads. De novo sequencing, differential transcript expression and resequencing are all viable applications of the SOLiD platform. The weakness of the platform is its short reads, which make assembly very difficult.

If you’re still not sure which NGS instrument to choose for your work, feel free to contact us for our complimentary sequencing project consultation.

5 Reasons to Consider Outsourcing Your Sequencing Project

According to the US News & World Report [1], there are 257 universities with biological sciences graduate programs in the US alone. Of these, fewer than a hundred [2] have genomics core facilities. If you’re a researcher at an institution without a sequencing core facility, you’ll have to outsource your sequencing projects. We’ve also found that many researchers with access to their own local core facilities often choose to get their sequencing done elsewhere, be it at other core facilities or commercial NGS providers. Here are some of the reasons why:


1. Shorter turnaround times

Depending on the volume of projects a core lab can handle at any given time, researchers are often faced with long queues before capacity opens up for their project. Even if a facility has plenty of capacity, high-throughput instruments run multiple libraries simultaneously, which can introduce its own delay. For example, the popular Illumina HiSeq 2000 sequencer has flow cells with 8 lanes each. Reagent costs are largely independent of how many lanes are filled, so it makes sense for the facility manager to wait until there are enough projects to fill all lanes before starting a new run. These are some of the factors that can lead to turnaround times as long as several weeks or even a few months. Especially when faced with tight publication deadlines, it often makes sense for researchers to consider getting their sequencing done at a facility that can offer a shorter turnaround time for their project.
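To see why a facility manager prefers to wait for a full flow cell, it helps to work through the arithmetic. The sketch below is a simplified illustration: it assumes the reagent cost of a run is completely fixed regardless of how many lanes are used, and the run cost in the example is a made-up figure, not a real price.

```python
def cost_per_filled_lane(run_reagent_cost, lanes_filled, lanes_per_flow_cell=8):
    """Reagent cost carried by each filled lane when the run cost is fixed.

    Simplifying assumption: the whole run costs the same whether one
    lane or all lanes of the flow cell are loaded.
    """
    if not 1 <= lanes_filled <= lanes_per_flow_cell:
        raise ValueError("lanes_filled must be between 1 and lanes_per_flow_cell")
    return run_reagent_cost / lanes_filled


if __name__ == "__main__":
    hypothetical_run_cost = 8000  # made-up figure, for illustration only
    for lanes in range(1, 9):
        per_lane = cost_per_filled_lane(hypothetical_run_cost, lanes)
        print(f"{lanes} lane(s) filled: {per_lane:.2f} per lane")
```

Under this assumption a run with two filled lanes costs four times as much per lane as a full 8-lane run, which is why partially filled runs are usually held back until more projects arrive.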


2. Access to instruments not available locally

Not all sequencing instruments are created equal. Typically the relevant parameters are total output per run, number of reads per run, read length, price per run and run duration.  Each sequencer is designed to strike a different balance between these parameters and there is no single sequencer that would be the optimal choice for all projects. Sometimes the right instrument for a project is simply not available at the local facility.

For example, applications such as de novo assemblies of novel genomes benefit from longer read lengths which will span more repeats and missing bases, closing gaps and simplifying finishing.  Going with an instrument like the PacBio RS whose average read length is currently ~4600 bp with some reads up to ~20,000 bp has become routine for those looking to completely assemble bacterial genomes. Long read lengths also allow you to uncover complex structural variations and identify where copy number variations have occurred relative to a reference genome. For transcriptome analysis, they help resolve RNA splicing patterns as long reads allow for a greater chance that the entire transcript is read, eliminating the need to infer isoforms.


3. Bioinformatics capabilities and expertise

When it comes to analysis of sequencing data, there are typically three things to consider:

  1. Expertise to design or select the right analysis pipeline for the given project
  2. Access to and familiarity with the right software tools
  3. Access to sufficient hardware resources, e.g. compute clusters.

It’s fairly common for researchers not to have all three available locally. Most NGS facilities can assist with one or more of these, but the vast majority of facilities have their own areas of expertise and specialization. For example, alignment of bisulfite-seq data for methylation analysis differs from alignment of regular DNA sequence in that the conversion of unmethylated cytosines to uracil (read as thymine after amplification) decreases the total amount of information available for aligning sequenced reads against a reference genome. Some approaches align sequenced cytosines to reference genome cytosines and sequenced thymines to either cytosines or thymines in the reference genome. An alternative method removes the bias toward the alignment of methylated reads at the cost of degrading the total information available for alignment. You will need to work with a provider who has experience with methylation analysis to determine which is the most appropriate for your study.
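The difference between the two bisulfite alignment strategies is easier to see on a toy example. The sketch below is our own simplified illustration, not the algorithm of any particular aligner: it compares equal-length forward-strand sequences only, and the function names are hypothetical. The first function treats a sequenced T as compatible with a reference C or T; the second converts every C to T in both read and reference before comparing, which is one common way to remove the bias described above.

```python
def wildcard_match(read, ref):
    """A sequenced T may match a reference C or T; other bases must be identical."""
    if len(read) != len(ref):
        return False
    return all(r == g or (r == "T" and g == "C") for r, g in zip(read, ref))


def three_letter(seq):
    """Collapse the alphabet by converting every C to T (in silico conversion)."""
    return seq.upper().replace("C", "T")


def three_letter_match(read, ref):
    """Compare read and reference after converting both to the reduced alphabet."""
    return three_letter(read) == three_letter(ref)


if __name__ == "__main__":
    methylated_read = "ACGT"    # the C was methylated, so it survived conversion
    unmethylated_read = "ATGT"  # the C was unmethylated and is read as T
    candidate_sites = ["ACGT", "ATGT"]  # two similar reference locations

    for read in (methylated_read, unmethylated_read):
        wild = [site for site in candidate_sites if wildcard_match(read, site)]
        reduced = [site for site in candidate_sites if three_letter_match(read, site)]
        print(read, "wildcard hits:", wild, "three-letter hits:", reduced)
```

With wildcard matching the methylated read fits only one of the two sites while the unmethylated read fits both, so methylated reads are more likely to align uniquely. After in silico conversion both reads fit both sites: the bias is gone, but so is the information carried by the retained cytosine.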

On the hardware resource front, while the E. coli genome, for example, can be assembled in 15 minutes on a desktop computer equipped with 32 GB of RAM, what about an organism that hasn’t been sequenced before? Will you be waiting on your core to set up a de novo pipeline? What if the analysis you need doesn’t fit a cookie-cutter pipeline? Does your core facility have the capacity to perform custom analysis services? These are all questions to think about when deciding where to get your NGS project done.


4. Library preparation expertise

Properly designed libraries are probably the single most important part of a successful sequencing experiment. Library preparation is a relatively manual process, so experience matters and it will be reflected in your sequencing results. It is important to use facilities that have experience in the library preparation application you are looking for.

Each sequencing service facility usually has a core set of 3-4 library prep applications in which it is expert. While almost every service facility you encounter has DNA-Seq and RNA-Seq on its “expert list”, if you’re looking for amplicon or targeted capture, it is important that the facility has performed this type of work before. Outside a core’s de facto 3-4 library prep applications, we recommend spending the time to look for a facility experienced with your application. Otherwise, you may be just as successful purchasing a kit and making your own libraries.

As a specific example, if you outsource chromatin immunoprecipitated or microRNA samples to a facility that hasn’t performed ChIP-Seq or small RNA-Seq library prep, chances are you won’t be getting back anything that’s sequenceable. A weakness of current small RNA library methodologies is the adapter dimers that form during preparation. An experienced provider will know to look for these and ensure that 50% of your reads are not wasted on meaningless sequence. ChIP-Seq, like any low input library preparation methodology, is prone to high duplication rates. Again, an experienced operator will know to perform QC checks and limit PCR cycles so that you obtain reads with high diversity.
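Both problems can be spotted with very simple checks on the reads you get back. The sketch below is a rough illustration under stated assumptions: it counts duplicates by exact sequence identity (production tools typically use alignment positions instead), it treats a read that begins directly with the 3' adapter as an adapter dimer, and the adapter string in the example is a placeholder that should be replaced with the sequence from your own kit.

```python
def duplication_rate(reads):
    """Fraction of reads that are exact copies of another read in the set."""
    reads = list(reads)
    if not reads:
        return 0.0
    return 1 - len(set(reads)) / len(reads)


def adapter_dimer_fraction(reads, adapter):
    """Fraction of reads with no insert, i.e. starting directly with the adapter."""
    reads = list(reads)
    if not reads:
        return 0.0
    dimers = sum(1 for read in reads if read.startswith(adapter))
    return dimers / len(reads)


if __name__ == "__main__":
    # Placeholder adapter prefix: substitute the 3' adapter of your library kit.
    example_adapter = "AGATCGGAAGAGC"
    example_reads = [
        "AGATCGGAAGAGCACACG",      # adapter dimer: no insert before the adapter
        "TGAGGTAGTAGGTTGTATAGTT",  # read with a real insert
        "TGAGGTAGTAGGTTGTATAGTT",  # PCR duplicate of the read above
        "AGATCGGAAGAGCACACG",      # second adapter dimer
    ]
    print("duplication rate:", duplication_rate(example_reads))
    print("adapter dimer fraction:", adapter_dimer_fraction(example_reads, example_adapter))
```

Running checks like these on a small subsample of reads is usually enough to tell whether a library is worth sequencing more deeply.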


5. Cost

Needless to say, making the most efficient use of research funds is important. This is especially true given the recent tightening of government funding of life science research, at least in the US and Europe. Everything else being equal, it’s often wise to at least shop around to see whether there are facilities that can carry out your sequencing project at substantially lower prices. This used to be a very time consuming process, but using Genohub you can now easily make apples-to-apples comparisons of NGS services with just a few clicks.


We’re here to help

Our job here at Genohub is to make it easy for researchers to find NGS service providers and to facilitate such collaborations. You can use our project matching page to compare accurate information about NGS services from a wide variety of service providers and easily submit your project. We’d also love to hear from researchers directly and help identify the right sequencing solution and service provider. Just fill out our free consultation form or email us at support@genohub.com.



[1] US News & World Report Biological Sciences Graduate Programs

[2] http://omicsmaps.com/