New Short, Long and High Throughput Sequencing Reads in 2016


Nanopore sequencing


An exciting wave of newly released DNA sequencing instruments and technology will soon be available to researchers. From DNA sequencers the size of a cell phone to platforms that turn short reads into long-range information, these new sequencing technologies will be available on Genohub as services that can be ordered. Below is a summary of the technology you can expect in Q1 of 2016:

10X Genomics GemCode Platform

The GemCode platform from 10X Genomics partitions long DNA fragments of up to 100 kb with a pool of ~750K molecular barcodes, indexing the genome during library construction. Within each partition, fragments are barcoded such that they all share the same barcode. After several cycling and pooling steps, >100K barcode-containing partitions are created. GemCode software then maps short Illumina read pairs back to the original long DNA molecules using the barcodes added during library preparation. With long-range information, haplotype phasing and improved structural variant detection become possible, and gene fusions, deletions and duplications can be detected from exome data.
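As a rough sketch of the core idea (not 10X's actual software), short reads sharing a barcode can be grouped to approximate the span of the original long molecule. All names and coordinates below are illustrative:

```python
from collections import defaultdict

def group_reads_by_barcode(read_alignments):
    """Group short-read alignments by their partition barcode.

    read_alignments: iterable of (barcode, chromosome, position) tuples.
    Returns a dict mapping each barcode to the span of positions its
    reads cover on each chromosome -- a rough proxy for the original
    long DNA molecule.
    """
    by_barcode = defaultdict(lambda: defaultdict(list))
    for barcode, chrom, pos in read_alignments:
        by_barcode[barcode][chrom].append(pos)

    molecules = {}
    for barcode, chroms in by_barcode.items():
        molecules[barcode] = {
            chrom: (min(positions), max(positions))
            for chrom, positions in chroms.items()
        }
    return molecules

# Toy example: two barcodes, each tagging reads from one long fragment
alignments = [
    ("ACGT", "chr1", 10_000), ("ACGT", "chr1", 55_000), ("ACGT", "chr1", 92_000),
    ("TTGC", "chr2", 500),    ("TTGC", "chr2", 48_000),
]
spans = group_reads_by_barcode(alignments)
print(spans["ACGT"]["chr1"])  # (10000, 92000): an inferred ~82 kb molecule
```

In practice, phasing and structural variant calling require statistical models on top of this grouping, but the barcode-to-molecule mapping is the foundation.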

Ion Torrent S5, S5 XL

The S5 system was developed by Ion to focus on the clinical amplicon-seq market. While the wait for delivery of Proton PII chips continues, Ion delivered a machine with chip configurations similar to past PGM and Proton chips. The 520/530 chips offer 200-400 bp runs with up to 80M reads and 2-4 hour run times. Using Ion's fixed amplicon panels, data analysis can be completed within 5 hours. The Ion Chef is required to reduce hands-on library prep time; otherwise, library prep and chip loading need to be performed manually. Ion looks to have positioned its platform toward clinical applications. With stiff competition from Illumina and Ion's inability to deliver similar read lengths and throughput, this is a smart decision. Focusing the platform on a particular application likely means future development (longer and higher throughput reads) has been paused indefinitely.

Pacific Biosciences Sequel System

Announced in September 2015, the Sequel System uses the same Single Molecule, Real Time (SMRT) technology as the RS II, but boasts several technical advancements. At around one third the cost of an RS II, the Sequel offers 7x more reads, with 1M zero-mode waveguides (ZMWs) per SMRT cell versus the previous standard of 150K. The application of Iso-Seq, or full-length transcript sequencing, is especially promising, as 1M reads crosses into the threshold where discovery and quantitation of transcripts become interesting. With full-length transcript isoforms, it's no longer necessary to reconstruct transcripts or infer isoforms from short read information. The Sequel is also ideal for generating whole genome de novo assemblies. We'll follow how Oxford Nanopore's MinION competes with the Sequel system in 2016.

Oxford Nanopore’s (ONT) MinIon

In 2014, Oxford Nanopore started its MinION Access Program (MAP), delivering over 1,000 MinIONs to users who wanted to test the technology. These users have gone on to publish whole E. coli and yeast genome assemblies. Accuracy of the device is up to 85% per raw base, and there are difficulties in dealing with high G+C content sequences, so a lot of work remains before widespread adoption. The workflow is simple and uses typical library construction steps of end repair and ligation. Once the sample is added to the flow cell, users can generate long reads >100 kb and analyze data in real time; median read lengths are currently 1-2 kb. Publications have shown that MinION output, combined with MiSeq reads, can enhance the contiguity of de novo assemblies. The lower error rates of two-direction (2D) reads produced with the recently updated MinION chemistry give cause for optimism that greatly reduced error rates can be achieved in the near future. This, along with a low unit cost and the ability to deploy the USB-sized device in the field, makes this a very exciting technology.

Illumina HiSeq X

While HiSeq X services have been available on Genohub for over a year, Illumina's announcement of its expansion to non-human whole genomes was well received. However, there are still several unanswered questions. Illumina states,

The updated rights of use will allow for market expansion and population-scale sequencing of non-human species in a variety of markets, including plants and livestock in agricultural research and model organisms in pharmaceutical research. Previously, it has been cost prohibitive to sequence non-human genomes at high coverage.

You can now sequence mouse, rat and other relatively large genomes economically on the HiSeq X. This makes the most sense for high coverage applications, e.g. 30x or above. While small and medium sized genomes can be sequenced on a HiSeq X, the low level of barcoding and high coverage you'd obtain make these applications less attractive. According to Illumina, as of 12/20/2015, metagenomic whole genome sequencing was not a compatible application on the HiSeq X. The instrument is still restricted to WGS only; RNA-Seq, exome-seq and ChIP-Seq applications will have to wait. Perhaps by the time the HiSeq X One is released, access will be opened to these non-WGS applications.
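A quick way to see why large genomes at high coverage suit the HiSeq X is to estimate the lanes a project consumes. This is a back-of-envelope sketch; the ~100 Gb per-lane yield is an assumed round figure, not an Illumina specification, so check your provider's numbers:

```python
import math

def lanes_needed(genome_size_gb, coverage, samples, gb_per_lane=100.0):
    """Estimate sequencing lanes needed for a WGS project.

    gb_per_lane is an assumed per-lane yield (~100 Gb is a rough
    placeholder for a HiSeq X lane; verify against real specs).
    """
    total_gb = genome_size_gb * coverage * samples
    return math.ceil(total_gb / gb_per_lane)

# A single ~2.7 Gb mouse genome at 30x fits within one assumed lane:
print(lanes_needed(genome_size_gb=2.7, coverage=30, samples=1))  # 1
```

For a small microbial genome the same math shows most of a lane going unused unless many samples are barcoded together, which is exactly why low-plex, low-coverage projects are a poor fit.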

While these new instruments make their way onto Genohub’s Shop by Project page, you can make inquiries and even order services by placing a request on our consultation page.

Key Considerations for Whole Exome Sequencing

exome sequencing and library preparation

Exome, UTR, non-coding regions, CDS

Whole exome sequencing is a powerful technique for sequencing protein coding genes in the genome (known as the exome). It’s a useful tool for applications where detecting variants is important, including population genetics, association and linkage, and oncology studies.

Genohub is the main hub for searching and ordering next generation sequencing services, and most researchers about to embark on an exome sequencing project start their search here. It's our responsibility to make sure the researcher is informed and prepared before placing an order for an exome sequencing service.

Working toward achieving this goal, we’ve established a series of guides for anyone about to start a whole exome sequencing project. We’ve described each of these guides here.

  1. Should I choose Whole Genome Sequencing or Whole Exome Sequencing?

This guide describes what you can get with WGS that you won’t with WES and compares pricing on a per sample basis. It also provides an overview of sequence coverage, coverage uniformity, off-target effects and bias due to PCR amplification.

  2. How to choose an Exome Sequencing Kit for capture and sequencing

This guide breaks down each commercial exome capture kit, comparing Agilent SureSelect, Nimblegen SeqCap and Illumina Nextera Rapid Capture. Numbers of probes used for capture, DNA input required, adapter addition strategy, probe length and design, hybridization time and cost per capture are all compared. This comparison is followed by a description of each kit’s protocol.

  3. How to calculate the number of sequencing reads needed for exome sequencing

In the same guide that compares library preparation kits (above), we go through an example on how to determine the amount of sequencing and read length required for your exome study. This is especially important when you start comparing the cost for exome sequencing services (see the next guide).
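The arithmetic behind that guide can be sketched in a few lines. The on-target and duplicate rates below are placeholder efficiencies for illustration (real values depend on the capture kit and library quality), and the function name is ours, not Genohub's:

```python
def exome_reads_required(target_bp, mean_coverage, read_length_bp,
                         on_target_rate=0.7, duplicate_rate=0.1):
    """Back-of-envelope read count for an exome capture experiment.

    Raw bases needed = target size * desired coverage, inflated by
    the fraction of reads lost off-target or to duplicates.
    """
    usable_fraction = on_target_rate * (1 - duplicate_rate)
    raw_bases_needed = target_bp * mean_coverage / usable_fraction
    return int(raw_bases_needed / read_length_bp)

# A ~50 Mb capture target at 100x coverage with 100 bp reads:
reads = exome_reads_required(50_000_000, 100, 100)
print(f"{reads / 1e6:.0f}M reads")  # 79M reads
```

Note how sensitive the answer is to the efficiency terms: a kit with poor on-target rates can nearly double the sequencing you must order.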

  4. How to choose an exome sequencing and library preparation service

Are you looking for 100x sequencing coverage (what many in the industry call standard exome sequencing), 200x coverage (considered 'high depth'), or a CLIA grade, clinical whole exome sequencing service? This exome guide breaks each down into searches that can be performed on Genohub. The search buttons allow for real time comparison of available exome services, their prices, turnaround times and the kits being used. Once you've identified a service that looks like a good match, you can send questions to the provider or immediately order the exome-seq service.

  5. Find a service provider to perform exome-seq data analysis only

Do you already have an exome-seq dataset? Do you need a bioinformatician to perform variant calling or SNP ID? Are you interested in studying somatic or germline mutations? Use this guide to identify providers who have experienced bioinformaticians on staff that regularly perform this type of data analysis service. Simply click on a contact button to immediately send a message or question to a provider. If you’re looking for a quote, they will respond within the same or next business day.

If you still need help, feel free to take advantage of Genohub’s complimentary consultation services. We’re happy to help make recommendations for your whole exome sequencing project.

Standard Quality Policy for Next Generation Sequencing Services

Whenever you order a scientific service or outsource research, a degree of trust is naturally built into the relationship. As a researcher, you expect the service provider to take care of your precious samples and complete a service to your expectations. The service provider expects to work with researchers who have complied with sample requirements and will be reasonable when it comes to unexpected situations. This is especially true of the relationship between a researcher and a next generation sequencing service provider. Having worked with both researchers and sequencing service providers on hundreds of real projects, we've developed a standard quality policy for all services that happen through Genohub. This policy ensures a trusted and efficient environment for clients and providers to work together. By setting expectations for both the researcher and the provider, we're improving the way sequencing services are performed.

Benchmarking Differential Gene Expression Tools

In a recent study, Schurch et al., 2015 closely examine 9 differential gene expression (DGE) tools (baySeq, cuffdiff, DESeq, edgeR, limma, NOISeq, PoissonSeq, SAMSeq, DEGSeq) and rate their performance as a function of the number of replicates in an RNA-Seq experiment. The group highlights edgeR and DESeq as the most widely used tools in the field and concludes that they, along with limma, perform best in studies with both high and low numbers of biological replicates. The study goes further, making the specific recommendation that experiments with more than 12 replicates should use DESeq, while those with fewer than 12 replicates should use edgeR. As for the number of replicates needed, Schurch et al. recommend at least 6 replicates per condition in an RNA-seq experiment, and up to 12 in studies where identifying the majority of differentially expressed genes is critical.

With each technical replicate having only 0.8-2.8M reads, this paper and others (Rapaport et al., 2013) continue to suggest that more replicates in an RNA-seq experiment are preferable to simply increasing the number of sequencing reads. Several other sources, including the differential expression profiling recommendations in our Sequencing Coverage Guide, recommend at least 10M reads per sample but make no recommendation on the number of replicates needed. The read-per-sample disparity reflects the relatively small, well annotated S. cerevisiae genome used in this study versus the more complex, multi-isoform transcriptomes of mammalian tissue. By highlighting studies that carefully examine the number of replicates that should be used, we hope to improve RNA-seq experimental design on Genohub.

So why don't researchers use an adequate number of replicates? Two common reasons are 1) sequencing cost and 2) inexperience with differential gene expression analysis. Below we compare the costs of 6 versus 12 replicates in yeast and human RNA-Seq experiments, using 1M and 10M reads/sample respectively, to show that in many cases adding more replicates to an experiment is affordable.
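To see where the money actually goes, it helps to split a project's cost into a per-sample library prep component and a per-read sequencing component. The two unit costs below are placeholders for illustration, not Genohub prices:

```python
def project_cost(replicates, reads_per_sample_m,
                 lib_prep_per_sample=150.0, seq_cost_per_million=5.0):
    """Rough project cost split into library prep and sequencing.

    lib_prep_per_sample and seq_cost_per_million are assumed
    placeholder rates, not real quotes.
    """
    lib = replicates * lib_prep_per_sample
    seq = replicates * reads_per_sample_m * seq_cost_per_million
    return lib, seq

# A yeast-scale experiment (1M reads/sample) at 12 replicates:
lib, seq = project_cost(12, 1)
print(lib, seq)  # library prep (1800.0) dwarfs sequencing (60.0)
```

Under these assumptions, doubling replicates in a low-depth experiment roughly doubles library prep cost while sequencing cost stays almost negligible, which is the pattern the table below illustrates.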


                            6 replicates    12 replicates
Human (10M reads/sample)
Yeast (1M reads/sample)

*Prices are in USD and are inclusive of both sequencing and library prep cost. Click on prices in the table to see more project specific detail.

The table shows that the main driver of the price difference is library preparation cost. Sequencing on the Illumina MiSeq or HiSeq at the listed depths does not play as significant a role, given the sequencing capacity of those instruments.

To accurately determine the sequencing output required for your RNA-seq study, simply change the number of reads/sample on our interactive Project Page.



Evaluation of tools for differential gene expression analysis by RNA-seq on a 48 biological replicate experiment. Nicholas J. Schurch, Pieta Schofield, Marek Gierliński, Christian Cole, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon G. Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton

Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Franck Rapaport, Raya Khanin, Yupu Liang, Mono Pirun, Azra Krek, Paul Zumbo, Christopher E Mason, Nicholas D Socci and Doron Betel

How Many Replicates are Sufficient for Differential Gene Expression?

In a convincing two-condition, 48-replicate RNA-Seq experiment, researchers from the University of Dundee aimed to answer a question frequently asked in the field and on Genohub: 'How many replicates are necessary for differential gene expression (DGE)?' In their study they examined three statistical models to see which best represented the read-count distribution of genes from commonly used DGE tools.

Using the statistical power of 48 replicates, they determined that inter-lane variability does not play a large role in DGE results. Assuming even loading and amplification, their results showed a Poisson distribution of counts from individual genes across lanes. The authors also determined that read count distribution across replicates was consistent with a negative binomial model, an assumption in widely used tools such as edgeR, DESeq, cuffdiff and baySeq. Performing goodness-of-fit tests for log-normal, negative binomial and normal distributions, the authors demonstrated that inclusion of 'bad replicates' made results inconsistent with the statistical models they tested, complicating the interpretation of differential expression results. A bad replicate was defined as 1) one that correlates poorly with other replicates, 2) one with atypical read counts, or 3) one having a non-uniform read depth profile.
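The Poisson-versus-negative-binomial distinction comes down to a mean-variance check: Poisson counts satisfy variance = mean, while a negative binomial allows variance = mean + alpha * mean^2 (overdispersion). This is a minimal sketch of that check, not the authors' actual goodness-of-fit procedure; the counts and tolerance are illustrative:

```python
def mean_var(counts):
    """Sample mean and unbiased variance of per-replicate read counts."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    return mu, var

def overdispersed(counts, tolerance=1.05):
    """True if variance clearly exceeds the mean -- the signature that
    makes a negative binomial (var = mu + alpha * mu**2) fit better
    than a Poisson (var = mu)."""
    mu, var = mean_var(counts)
    return var > tolerance * mu

# A gene whose counts across replicates vary far more than Poisson predicts:
replicate_counts = [210, 180, 350, 95, 400, 260]
print(overdispersed(replicate_counts))  # True
```

Real DGE tools estimate the dispersion parameter alpha per gene with shrinkage across genes, but the overdispersion signal itself is this simple.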

So how many replicates are sufficient for differential gene expression? The authors sequenced 96 mRNA samples in seven 1×50 HiSeq lanes. The cost for this on Genohub today is $28K USD. When the authors removed 6-8 bad replicates from their pool of 48 samples, their data became consistent with a negative binomial distribution. Assuming experimental variability similar to the authors', this indicates that at least 6 replicates in a DGE experiment is good practice. The cost of preparing and sequencing 6 RNA-seq libraries is $2,500 USD.

From a literature search and client behavior on Genohub, we estimate that ~80% of those studying DGE use 3 replicates in their experiments, which, given dropouts and variation, is unsatisfactory. A final point: the authors used ~11M 1×50 reads/sample, which goes to show that in DGE studies replicates can be more important than read depth. This is further discussed in our Coverage and Read Depth Guide.

At Genohub, we help consult and design sequencing experiments with users. Determining the replicates needed for a study is a common question that needs to be answered. Unfortunately, too few studies examine these fundamental elements in sequencing design. We hope this article gets the recognition it deserves. 


Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment. Marek Gierliński, Christian Cole, Pietà Schofield, Nicholas J. Schurch, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton


Coverage Recommendations by Sequencing Application – A Starting Point

Before starting a sequencing run, you need to know the depth of sequencing you want to achieve. Researchers typically determine the amount of coverage needed based on experimental requirements, genes being examined and their expression levels, the reference genome and published literature.

We’ve recently published a guide to serve as a starting point for those trying to determine how deep they should sequence their samples:

Coverage and Read Depth Recommendations by Application

Sequencing Coverage and Read Depth Guidelines

Much of the data in this table comes from published coverage saturation experiments where depth was compared to another specific metric, e.g. differential expression.

It should be noted that increasing sequencing depth is not always the best solution. Several studies have demonstrated that more replicates in an RNA-seq experiment are preferable to increasing the number of sequence reads (1, 2). This is nicely illustrated by Liu et al., 2014 in the figure below: an increase in biological replicates from 2 to 7 significantly increases the number of identified differentially expressed genes, whereas increasing sequencing reads past 10 million has diminishing returns.

Replicates versus sequencing depth (figure from Liu et al., 2014)
Once you’ve determined the coverage you need, calculate the number of sequencing reads required to achieve that coverage using Genohub’s coverage and read calculator. Send us a consultation request if you need help trying to determine the coverage required in your experiment. 
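The coverage-to-reads conversion behind such a calculator is the Lander-Waterman relation C = L * N / G (coverage = read length * read count / genome size). This is a sketch of that arithmetic, not Genohub's actual calculator:

```python
def coverage(read_count, read_length_bp, genome_size_bp):
    """Lander-Waterman expected coverage: C = L * N / G."""
    return read_count * read_length_bp / genome_size_bp

def reads_for_coverage(target_coverage, read_length_bp, genome_size_bp):
    """Invert the formula to get the read count needed: N = C * G / L."""
    return int(target_coverage * genome_size_bp / read_length_bp)

# 30x over a ~3.1 Gb human genome with 150 bp reads:
n = reads_for_coverage(30, 150, 3_100_000_000)
print(f"{n / 1e6:.0f}M reads")  # 620M reads
```

Remember this gives the mean coverage; duplicates, unmapped reads and uneven coverage mean you should order more than the formula's minimum.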



1. Liu Y, Zhou J, White KP. RNA-seq differential expression studies: more sequence or more replication? Bioinformatics. 2014 Feb 1;30(3):301-4.

2. Rapaport F, Khanin R, Liang Y, Krek A, Zumbo P, Mason CE, Socci ND, Betel D. Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biology 2013, 14:R95.

Genohub Registered with SAM – System for Award Management

Genohub is now registered with SAM (System for Award Management), making it even easier for organizations within the federal government to get quotes and order sequencing services through Genohub. SAM is the official government system that collects data from suppliers, then validates, stores and disseminates this information to government acquisition agencies, including the FDA, NIH, USGS, DOD and USDA. The registration for Genohub, Inc. / 079294466 / 7ARM5 is now active in the U.S. federal government's System for Award Management (SAM).

To get a quote for sequencing, library prep or data analysis services, start by entering your requirements using the Shop by Project or Shop by Technology page. You’ll get an instant quote and be able to immediately place your order. 

10 Sequencing Based Approaches for Interrogating the Epigenome

cytosine methylation

DNA methylation occurs when DNA methyltransferase transfers a methyl group from S-adenosyl-methionine to cytosine, typically in CpG dinucleotides. The resulting 5-methylcytosine (5mC) is an important epigenetic mark that regulates gene activity and impacts several cellular processes, including differentiation, transcriptional control and chromatin remodeling.

Genome wide analysis of 5mC, histone modifications and DNA accessibility is possible with next generation sequencing approaches, providing unique insight into complex phenotypes where the primary genomic sequence is not sufficient.

Methods for methyl DNA sequencing can be broken down into three global approaches:

1) bisulfite sequencing

2) restriction enzyme based sequencing

3) targeted enrichment of methyl sites

We’ve outlined several library preparation techniques under each category.

1) Bisulfite Sequencing

Bisulfite-seq (1, 2) is a well-established protocol that provides single-base resolution of methylated cytosines in the genome. Genomic DNA is bisulfite treated, deaminating unmethylated cytosines to uracils, which are read as thymines after amplification. Methylated cytosines are protected from deamination, allowing researchers to identify methylation sites by comparing the sequences of bisulfite-treated and untreated samples.
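The comparison logic can be sketched in a few lines. This is a deliberately simplified illustration (perfect alignment, no indels, no strand handling), not a real methylation caller such as those used in production pipelines:

```python
def call_methylation(reference, bisulfite_read):
    """Classify each reference cytosine by comparing it to the
    bisulfite-converted sequence.

    Unmethylated C reads as T after conversion; methylated C is
    protected and still reads as C. Sequences are assumed to be
    aligned with no indels (a simplification).
    """
    calls = []
    for i, (ref_base, bs_base) in enumerate(zip(reference, bisulfite_read)):
        if ref_base == "C":
            status = "methylated" if bs_base == "C" else "unmethylated"
            calls.append((i, status))
    return calls

ref = "ACGTCCGA"
bs  = "ATGTCTGA"  # C at 1 -> T (unmethylated), C at 4 -> C (methylated), C at 5 -> T
print(call_methylation(ref, bs))
```

Real callers work from aligned reads, tally converted versus unconverted bases per position, and report a methylation fraction rather than a binary call.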

1-Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning

2-Highly integrated single-base resolution maps of the epigenome in Arabidopsis

2) Post Bisulfite Adapter Tagging (PBAT)

PBAT (3) avoids loss of template during bisulfite treatment by reversing the usual order of steps: bisulfite treatment comes first, and adapters are then attached (tagging) via two rounds of random primer extension.

3- Amplification-free whole-genome bisulfite sequencing by post-bisulfite adaptor tagging.

3) Reduced Representation Bisulfite Sequencing (RRBS)

RRBS (4) is a method aimed at targeting sequencing coverage toward CpG islands or regions of the genome with dense CpG methylation. The sample is digested with one or more restriction enzymes and then treated with bisulfite prior to sequencing. This method offers single-nucleotide methylation resolution.

4- Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis

4) Oxidative Bisulfite Sequencing (oxBS-Seq)

5-hydroxymethylcytosine (5hmC), an intermediate in the demethylation of 5-methylcytosine (5mC) to cytosine, cannot be distinguished from 5mC using the standard bisulfite-seq approach. With oxBS-Seq (5), 5hmC is specifically oxidized and subsequently deaminated to uracil by bisulfite treatment, while 5mC is left intact. Sequencing of both treated and untreated samples allows for single-base resolution of 5hmC and 5mC modifications.

5-Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution

5) TET-Assisted Bisulfite Sequencing (TAB-Seq)

TAB-Seq (6) uses glucose moieties to protect 5hmC from TET-mediated oxidation. 5mC and non-methylated cytosines are deaminated to uracil and sequenced as thymines, allowing for the specific identification of 5hmC.

6- Base-resolution analysis of 5-hydroxymethylcytosine in the Mammalian genome

6) Methylation Sensitive Restriction Enzyme Sequencing (MRE-Seq)

MRE-Seq (7) utilizes a combination of methyl sensitive and insensitive restriction enzymes to identify regions of CpG methylation status.

7- Genome-scale DNA methylation analysis

7) HpaII tiny fragment-Enrichment by Ligation-mediated PCR (HELP-Seq)

HELP-Seq (8) allows for intragenomic profiling and intergenomic comparisons of cytosine methylation by using HpaII and its methylation-insensitive isoschizomer MspI.

8- Comparative isoschizomer profiling of cytosine methylation: the HELP assay

8) Methylated DNA Immunoprecipitation Sequencing (MeDIP)

MeDIP (9) is a technique based on affinity enrichment of methylated DNA using either antibodies or other proteins capable of binding methylated DNA. It pulls down heavily methylated regions of the genome, such as CpG islands, and does not offer single-nucleotide resolution.

9- Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells

9) Methyl Binding Domain Capture (MBD-CAP)

MBD-CAP (10) uses methyl DNA binding proteins MeCP2, MBD1-2 and MBD3LI to immunoprecipitate methylated DNA. Similar to MeDIP, this approach pulls down regions that are heavily methylated and does not offer single nucleotide methylation resolution.

10- High-resolution mapping of DNA hypermethylation and hypomethylation in lung cancer

10) Probe Based Targeted Enrichment

Targeted enrichment (Methyl-Seq) involves the use of synthetic, biotinylated oligonucleotides designed against CpG islands, shores, gene promoters and differentially methylated regions (DMRs). Kits are commercially available from Agilent and Roche Nimblegen.

Finally, it's worth mentioning Single Molecule Real Time (SMRT) DNA sequencing. SMRT sequencing by Pacific Biosciences uses the kinetics of base incorporation to allow for direct detection of methylated cytosines. Unlike the protocols mentioned above, it requires neither restriction enzymes nor bisulfite reagent.

Several service providers on Genohub offer targeted bisulfite-seq, reduced representation bisulfite-seq (RRBS), methylated DNA immunoprecipitation seq (MeDIP) and whole genome bisulfite-seq (WGBS) library preparation and sequencing services. Simply click on one of these application types to get started. 

AGBT 2015 Summary of Day 3

Advances in Genome Biology and Technology Conference 2015

Day 3 of the Advances in Genome Biology and Technology meeting in Marco Island began with an announcement that next year the meeting would be held in Orlando due to hotel renovations, eliciting a groan from the audience. The meeting will come back to Marco Island in 2017.

Today's plenary session speakers all presented work with a clinical focus, an acknowledgement by the conference organizers of the direction of genome sequencing. The first speaker, Gail Jarvik, head of medical genetics at the University of Washington Medical Center, presented lessons learned from the Clinical Sequencing Exploratory Research (CSER) Consortium, marketed as 'Hail CSER'. CSER is a national consortium of projects aimed at sharing innovations and best practices in the integration of genomic sequencing into clinical care. CSER has established a list of 112 actionable genes, some overlapping with the American College of Medical Genetics (ACMG) list. The CSER group annotated pathogenic and novel variants in the Exome Variant Server (EVS) to estimate rates in individuals of European and African ancestry.

The next talk, by Euan Ashley, was on moving toward clinical grade whole genome sequencing. He started by describing the genome as complex, full of repeats, duplications and paralogous sequences, giving him 'a cold sweat at night'. He gave the example of a study in which 12 adult participants underwent WGS, and described how clinical grade sequencing demands consistency in reporting. Most variants annotated as pathogenic were downgraded after manual review, but this takes a lot of time: for the 12 individuals, each with roughly 1,000 variants, review took around 1 hour per variant. In this case the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detectable genetic variation and uncertainty about clinically reportable findings. He commented that new algorithms will be needed to address these problems and that 'we're at the beginning of genomics medicine'. Parts of his talk can be seen in his presentation at PMWC last month.

The last presentation before the break was by Levi Garraway, who discussed the goal of cancer precision medicine: developing new therapeutics and combinations against molecularly defined tumors. He mentioned that there are many discovery opportunities in clinical cancer genomics, especially in terms of response and resistance to new therapies. Garraway sequenced the genomes of 57 prostate tumors and matched normal tissues to study somatic alterations. His model suggests that chromoplexy induces considerable genomic derangement over a relatively small number of events in prostate cancer, supporting a model of punctuated cancer evolution. He introduced a 10X Genomics approach for phasing large (~100 kb) regions with exonic baits to obtain rearrangement information for chromoplexy. In the end he emphasized the importance of RNA-Seq profiling in conjunction with DNA sequencing for translational medicine to be relevant.

After the break, Stephen Kingsmore gave a presentation on rapid genome sequencing for genetic disease diagnostics in neonatal intensive care units. Kingsmore began by describing how newborn screening (NBS) and early diagnosis reduce morbidity and mortality; NBS of 60 genetic diseases identifies ~5,000 affected newborns each year. He described how rapid genome sequencing (RGS) has the potential to extend NBS to most genetic diseases in newborns admitted to level II-IV NICUs, and mentioned an 'ultra-rapid' sequencing pipeline he developed along with Illumina that takes 28 hours to go from sample to variant annotation (not publicly available). He also discussed NSIGHT, an NIH-sponsored consortium for newborn sequencing aimed at understanding the role of genome sequencing. More details can be found on the NHGRI page.

The last two plenary talks were by Christian Matranga and Malachi Griffith. Matranga described clinical sequencing of viral genomes as important for understanding pathogen evolution and transmission and for informing surveillance and therapeutic development. His group developed a sequencing approach that combines RNase H based depletion of rRNA with random-primed cDNA RNA-seq to detect and assemble genomes from divergent lineages. They sequenced ~300 Lassa (LASV) and ~100 Ebola (EBOV) genomes. We described some of their efforts in an earlier post, Sequencing Suggests the Ebola Virus Genome is Changing. Be sure to read the New Yorker reference, it's compelling!

Griffith's talk was on optimizing genome sequencing and analysis. He made the point that while most tumors are sequenced by exome sequencing at 75-100x mean coverage or by whole genome sequencing (WGS) at 30-50x mean coverage, detection of low frequency mutations requires greater depth. He performed deep sequencing of an acute myeloid leukemia (AML) genome: WGS up to 350x, whole exome to 300x, and a capture panel of ~260 recurrently mutated AML genes to ~10,000x coverage. He found that deeper sequencing revealed more driver variants and improved the assignment of variants to clonal clusters. Check out his animation of WGS depth down-sampling.

After lunch began the 'Bronze sponsor workshops', essentially the talks you pay >$40K to give. The most interesting was the last, by 10X Genomics, mainly because, as @bioinformer put it, "10X Genomics is the new princess of the AGBT ball". First, check out the video that received a round of applause from the AGBT crowd: Changing the Definition of Sequencing. They announced their instrument would be available in Q2 this year, costing ~$75K plus ~$500/sample. This raises the question of whether 10X Genomics' microfluidic platform offers greater potential than Moleculo, and what the implications are for Illumina or PacBio. To learn more, check out Keith Robison's insightful post detailing all that is currently known about 10X Genomics.

After dinner began concurrent sessions on technology, genomic medicine and transcriptomics. Hopefully someone else will post details about the genomic medicine and transcriptomics sessions. The technology session began with Iain Macaulay describing G&T-seq: separation and parallel sequencing of genomes and transcriptomes of single cells. This was the first talk this year at AGBT with an embargo, so no tweets were allowed; rather than go into details, we found this lecture online. The next talk was by Alexandre Melnikov on MITE-Seq, an approach to site-directed mutagenesis referred to as Mutagenesis by Integrated TiLEs. MITE facilitates structure-function studies of proteins at higher resolution than typical site-directed approaches. To read more, check out their paper published last year in Nucleic Acids Research. Andrea Kohn then described single-cell methylome profiling of Aplysia neurons. Using MeDIP and bisulfite sequencing she achieved >20x coverage for each neuron, then added RNA-seq, providing the first methylome and transcriptome from a single neuron. Next up was Sara Goodwin, who gave an in-depth analysis of the Oxford Nanopore MinION device for de novo and cDNA sequencing. She sequenced the yeast strain W303 to over 120x coverage and was able to achieve up to 80% aligned reads. She mentioned that identifying the right aligner is still a work in progress, but overall found promise in the technology for long read sequencing, de novo assembly and splice site identification.

Tomorrow’s plenary talks are the second installment of genomics, ‘Genomics II’, with presentations by Michael Fischbach, Rob Knight, Chris Mason and Gene Myers, an excellent lineup to close the final day of AGBT. Check out our earlier posts if you’ve missed day 1 or day 2.

AGBT 2015 Summary of Day 2

Advances in Genome Biology and Technology Conference


Day 2 of the Advances in Genome Biology and Technology Conference (AGBT 2015) began with high expectations, as the plenary speakers on day 1 had set the bar high with engaging lectures. The first talk of the morning was by Evan Eichler, who described human genetic variation by single-molecule sequencing. He generated 40x sequence coverage of a haploid human genome with average read lengths of 9 kbp. His method allows the detection of indels and structural variants from several bases up to 20 kbp. Comparing his single haplotype to the human reference, he resolved ~26K indels and SVs at the base-pair level. His analysis found complex variations such as mobile element insertions as well as inversions. His results suggest that the systematic bias against longer, complex repetitive regions can now be overcome. According to Eichler, ‘we need to capture SVs to make precision medicine, precision’.

Elaine Mardis spoke next, giving a superb lecture comprising several very interesting vignettes on advances in cancer genomics: markers of late-relapse estrogen receptor-positive disease, translating AML, and translating cancer genomes into clinical care. She spoke of the Database of Curated Mutations (DoCM), a curated database of known disease-causing mutations that provides explorable variant lists and direct links to source citations. She also mentioned the opening of Clinical Interpretations of Variants in Cancer (CIViC) for crowdsourcing later this year.

Next, Nuria Lopez-Bigas spoke about analyzing thousands of tumor genomes to identify cancer drivers and targeted therapeutic opportunities. She claimed that 90% of tumors have at least one driver mutation, with differences between tumor types: some, like melanoma, have many drug opportunities, while others, like renal carcinoma, have few. 36% of patients could benefit from targeted drugs against more than one driver. She described the release of a beta version of IntOGen (Integrative OncoGenomics), which draws on several sources: tumor mutation data from ICGC, driver identification methods such as OncodriveCLUST, and mutation mapping by Ensembl VEP.

Substituting for Meredith Carpenter, Carlos Bustamante talked about PhenoCap, a targeted capture panel for comprehensive phenotyping of forensic DNA samples. PhenoCap allows the prediction of phenotypic traits ranging from autosomal ancestry to facial morphology. Bustamante’s group aims to make PhenoCap analogous to a forensic exome, providing a comprehensive profile for all sample types. The general utility of forensic DNA is hampered by heavily degraded and contaminated samples, which make traditional STR and PCR approaches unreliable; PhenoCap is designed to address this. Another part of his talk focused on the Indian Ocean slave trade, specifically the sequencing of DNA from a slave cemetery in Mauritius and the broad diversity of ancestry discovered there.

The last talk before lunch was by Eric Green from NHGRI on the new ‘US Precision Medicine Initiative’ (PMI). He thanked the organizers for keeping a place for him despite his reluctance to give a title or description of what he wanted to talk about; at the time, PMI had not yet been announced in the State of the Union address. He started by emphasizing President Obama’s longstanding interest in genomic medicine: as a senator, Obama tried to pass a bill called the Genomics and Personalized Medicine Act in 2006. What was announced is $200M for FY16: $70M for near-term cancer studies and $130M for long-term cohort work, the sequencing of 1 million volunteers. What doesn’t add up is the cost of sequencing 1M people against the budget allocated: at $1,750 per 30x whole genome, 1M people comes to ~$1.75B, so $200M falls far short. Will there be more funds for this initiative? Check out Francis Collins (leftmost, pointing at the TV) watching the State of the Union at his home. He wasn’t sure exactly what President Obama would say.

Francis Collins watching the State of the Union speech

Sarah Tishkoff gave the next talk, on adaptive traits in Africa. A nice summary of this work can be found here: Genetic Variation and Adaptation in Africa: Implications for Human Evolution and Disease. Her findings on GC-biased gene conversion and its importance in population genetics are described in a recent paper: Biased Gene Conversion Skews Allele Frequencies in Human Populations, Increasing the Disease Burden of Recessive Alleles.

David Page next spoke on what single-haplotype iterative mapping and sequencing (SHIMS) tells us about sex chromosomes. He started by describing the genes involved in sperm production as palindromic copies on the Y chromosome, the longest palindrome spanning 3 Mbp. The male-specific region of the Y chromosome (MSY) sequence reveals lineage-specific convergent acquisition and amplification of X-Y gene families, potentially fueled by a fight between acquired X-Y homologs. He also spoke of the goal of making reference-grade sequencing of structurally complex regions faster and more affordable. His work, Sequencing the Mouse Y Chromosome Reveals Convergent Gene Acquisition and Amplification on Both Sex Chromosomes, was published in Cell late last year.

The next plenary talk, in evolutionary genomics, was by Erich Jarvis on the challenges of sequencing representative genomes for an entire vertebrate lineage. As part of the Avian Phylogenomics Consortium (>200 researchers, >120 institutions), he collected and sequenced genomes of 48 avian species representing nearly all orders. He pinpointed the rise of modern birds to the aftermath of mass extinctions and described the convergent expression of 55 genes in avian and human brain regions related to vocal learning. Much of this work is described in eight articles in a special issue of Science called A Flock of Genomes.

The last two talks of the plenary session were by Jessica Alfoldi and James Bradner. Jessica’s presentation, Evolution Two Ways: Natural and Artificial Selection, showed that changes in allele frequencies of non-coding ancestral variation allow for bursts of speciation in cichlids and behavioral changes in rabbits, both on short time scales. Bradner’s talk was on the disruption of ‘super-enhancers’.

The concurrent sessions began in the evening with topics in biology, informatics and cancer. Hopefully someone else will write about the informatics and cancer sessions. The biology session began with Hie Lim Kim describing WGS of five Namibian Khoisan individuals and a population analysis using a 420K SNP dataset, showing that two of the genomes contained exclusively Khoisan ancestry. Climate data along with the sequencing indicates an ancient split in modern humans. Karyn Meltz Steinberg’s talk was on exome sequencing of 20,000 Finns. The ability to detect genetic associations in Finns is improved thanks to a population bottleneck that enriched for low-frequency, deleterious variants. Using three different analysis pipelines, they generated a consensus genotype call set and identified several potentially novel associations with plasma adiponectin and plasma CRP levels. Charlotte Lindqvist presented a genome-scale study of population dynamics and speciation in polar and brown bears, finding evidence for functional speciation related to cellular respiration, hibernation and pigmentation. Beth Shapiro presented her work on brown and polar bear interspecies hybridization, Matthew Blow talked about sequencing-based approaches for genome-scale functional annotation, and Max Seibold presented on integrated RNA-seq of the host and microbe in the nasal airway of childhood asthmatics.

Tomorrow the talks will be clinically oriented, with concurrent sessions in technology, genomic medicine and transcriptomics. If you missed the first day at AGBT, check out our summary of day 1.