Sequencing, Finishing and Analysis in the Future Meeting (SFAF 2013)

Day 1 at SFAF, and already a whole lot to be excited about. Richard Gibbs, Director of the Human Genome Sequencing Center, launched the conference with a talk about genomic futurism and predicted widespread access to DNA testing. This was followed by a series of tech talks from Jim Knight (Roche), Haley Fiske (Illumina), Kelly Hoon (Life Technologies), and Stephen Turner (Pacific Biosciences).

We learned that Roche 454 is working on longer read lengths and higher accuracy. Life Technologies will be releasing a Proton III chip that will allow 600 M reads. While the Proton chip can currently only handle 200 bp reads, they are working on new enzymes to allow for 400 bp reads, which are already available on the Ion PGM. PacBio is working on polymerase enzymes with protection groups to prevent photo-damage. These new enzymes should allow for a 10-fold increase in sequencing throughput. Illumina is pushing the envelope on the MiSeq, with 2x300 bp reads expected to be available before the end of the year.

As usual, the much-hyped “NGS Technology Panel Discussion” delivered. Essentially, Knight, Fiske, Hoon and Turner were seated next to each other, and the audience was goaded to ask critical questions that would “hopefully put them against each other”. Richard Gibbs framed the discussion by saying that while throughput was making everyone bleary eyed, there was still a long way to go in making NGS more accurate: “we need better data”. He also suggested an HLA (human leukocyte antigen) sequencing and assembly bake-off between all the platforms. Ion and PacBio are already doing HLA sequencing internally, while Illumina and Roche are going to look into participating in the comparison. Gibbs offered to mediate.

Fiske made an interesting statement during the panel discussion: if you develop new and innovative enzymes and library prep technologies, “one of the four major platforms will purchase you”. Responding to a statement from the crowd that sequencing costs were not going down, Fiske mentioned that prices might decrease by the time of ASHG or AGBT. This resulted in a question from the conference organizers to the audience: “What is the most important area you’d like to see improvement?” By a show of hands, the 250 person crowd responded in the following order: 1) read length & accuracy, 2) speed, 3) price, 4) throughput. Johar Ali, an organizer from the OICR, responded to the vote by saying that the improvements needed are specific to each platform: Roche 454 needs to reduce costs, PacBio needs to improve throughput, and Illumina and Ion need tighter sizing and better accuracy.

This led to Fiske asking the crowd what they think Illumina should develop: smaller and faster sequencing (think sequencing on smart phones), or supercomputers with tons of output. One person in the crowd said, “think emerging markets, smart phone!”, and several others retorted, “both”.

All four mentioned that their companies were working on nanopore technology. Fiske said that Illumina has licensed technology from Oxford Nanopore, but they are not sharing news or updates: “mum is the word”. Turner said that if nanopore technology becomes a reality, PacBio would be best positioned to take advantage of it. Knight mentioned that Roche invested heavily with IBM in silicon nanopore technology, but has pulled back recently. He reflected personally that he saw nanopore technology becoming a reality; it was just a matter of time. He also reflected that sequencing had a future in diagnostics and that Roche was going to be in the sequencing business for the long term. While the panel discussion ended without any head-to-head conflict, we did learn a lot of new things that we didn’t hear in their rehearsed talks:

  • All four platforms are actively trying to increase their read lengths (although we didn’t see significant guidance beyond Ion’s 400 bp reads)
  • Fiske mentioned that Illumina’s latest acquisition, Moleculo technology, might one day replace current mate-pair library prep (if prices can be lowered).
  • MiSeq DX was pulled back because Illumina has been able to increase MiSeq output. A clinical instrument will require a locked down platform.
  • Turner said that he expects PacBio reads of 30,000 base pairs in 3 years. Shearing technologies and polymerases would have to be improved for better read lengths.
  • PGM can produce >2G aligned Q20 bases with even coverage (E. coli) using their 318, 400 bp chip.
  • Fiske said that Illumina spends 25% of revenue on R&D, while Life Tech and Thermo spend 5% and 3%, respectively.
  • Illumina needs help with better and higher fidelity polymerases. Fiske suggested that Illumina would consider purchasing companies that have breakthrough technologies related to polymerases.

The day concluded with talks on:

  • Genome mapping in nanochannel arrays for assembly
  • 1000 cancer gene panels for clinical NGS
  • Library construction from low-input and FFPE samples
  • Direct selection of microbiome DNA
  • Quantitative RNA-Seq using molecular indices
  • Q60 PacBio long reads using Quiver

The meeting schedule and talks should be posted here shortly. Follow #SFAF2013 for the latest news. If you’d like to meet us at the meeting, tweet @genohub or send us an email. Looking forward to day 2!

Sequencing Design Part I: Replication, Randomization and Multiplexing

Replicates

Replicates are essential in any biological experiment, and the same goes for high throughput sequencing. Samples are subject to variation, which makes biological replicates important both for statistical significance and for identifying sources of variation. Despite the desire to cut back on replicates to reduce cost, it’s important to remember that there are many factors which may cause a sequencing run or sample to fail. If you don’t have sufficient replicates, you may have to repeat your sequencing run. In general we recommend at least 4 biological replicates for every experiment.

Randomization

Randomization is the process of assigning biological samples at random to the different groups within an experiment. This reduces bias by equalizing independent variables that have not been accounted for in the experimental design. Randomization reduces instrument effects, systemic bias, and the occurrence and impact of confounding factors (operational, procedural and personnel-related). The two main sources of variation that contribute to confounding factors are 1) library effects that occur due to reverse transcription and amplification and 2) subunit effects (sequencing lanes [Illumina and SOLiD], chips [Ion], plates [Roche 454]) such as poor base calling or bad sequencing cycles. We recommend randomizing your samples by making sure each sequencing subunit contains samples from both control and experimental groups. This can be done by barcoding or indexing your samples to allow for multiplexing.
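As a rough illustration of that recommendation, here is a minimal Python sketch that shuffles the samples within each group and then deals them out across subunits, so every lane, chip or plate receives samples from both control and experimental groups. The function name, sample IDs and two-lane layout are hypothetical, chosen only for the example; a real design may need to balance additional variables.

```python
import random

def randomize_to_subunits(groups, n_subunits, seed=None):
    """Randomly assign samples to subunits while spreading each group evenly.

    groups: dict mapping a group name to a list of sample IDs.
    Returns a list of subunits, each a list of (group, sample) tuples.
    """
    rng = random.Random(seed)
    subunits = [[] for _ in range(n_subunits)]
    for group, samples in groups.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)  # random order within the group
        offset = rng.randrange(n_subunits)  # random starting subunit
        for i, sample in enumerate(shuffled):
            subunits[(i + offset) % n_subunits].append((group, sample))
    return subunits

# Hypothetical experiment: 4 biological replicates per group, 2 lanes.
groups = {
    "control": ["C1", "C2", "C3", "C4"],
    "treated": ["T1", "T2", "T3", "T4"],
}
layout = randomize_to_subunits(groups, n_subunits=2, seed=1)
for lane_number, lane in enumerate(layout, start=1):
    print(f"lane {lane_number}: {lane}")
```

Passing a seed makes the assignment reproducible, which is handy if the layout needs to be recorded alongside the sample sheet.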

Multiplexing

DNA (or cDNA fragments made from RNA) can be labelled with sample-specific sequences, or barcodes, that allow multiple samples to be included in the same sequencing reaction. The barcodes allow each read to be assigned back to its sample after the sequencing run is complete. Multiplexing can be used to create balanced, pooled experimental designs. If you have 8 samples that together require the sequencing output of several Illumina lanes, subunit effects can be eliminated by multiplexing all 8 samples and loading the same 8-sample pool into every one of those lanes. All subunit (lane) effects will then be the same for each sample. Higher levels of multiplexing also have the advantage of avoiding the index read problems seen with low-multiplex pools. A low-multiplex pool can result in no signal in one of the color channels during a cycle of the index read. Image registration might then fail, and no base will be called for that cycle. If a base isn’t called, the samples cannot be demultiplexed.
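The two ideas above, assigning reads back to samples and checking a pool for missing color channels, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the barcodes and sample names are made up for the example, matching is exact (real demultiplexers usually tolerate mismatches), and the channel mapping assumes four-channel Illumina chemistry in which A and C share one laser and G and T share the other.

```python
from collections import defaultdict

# Assumption: four-channel Illumina chemistry, where A and C are imaged with
# one laser and G and T with the other.
CHANNEL = {"A": "red", "C": "red", "G": "green", "T": "green"}

def demultiplex(reads, barcode_to_sample):
    """Group (index_read, insert_sequence) pairs by sample, exact match only."""
    by_sample = defaultdict(list)
    for index_read, sequence in reads:
        sample = barcode_to_sample.get(index_read, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)

def cycles_missing_a_channel(barcodes):
    """Return the 1-based index cycles where the pool uses only one channel."""
    bad_cycles = []
    for cycle, bases in enumerate(zip(*barcodes), start=1):
        if len({CHANNEL[base] for base in bases}) < 2:
            bad_cycles.append(cycle)
    return bad_cycles

# Illustrative two-sample (low-multiplex) pool.
barcode_to_sample = {"ATCACG": "control_1", "CGATGT": "treated_1"}
reads = [("ATCACG", "GGCTA"), ("CGATGT", "TTAGC"), ("NNNNNN", "ACGTA")]

binned = demultiplex(reads, barcode_to_sample)
unbalanced = cycles_missing_a_channel(list(barcode_to_sample))
print(binned)
print("index cycles with a single color channel:", unbalanced)
```

In this two-barcode pool, several index cycles have signal in only one channel, which is the situation the paragraph above warns about. Adding more barcodes to the pool, or choosing barcodes that complement each other at every cycle, makes the returned list shrink toward empty.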

To conclude, the best way to ensure reproducibility is to include independent biological replicates that are randomly assigned to a sequencing subunit (flow cell lane, chip or plate). This can be done by multiplexing your samples using sample indices or barcodes. Multiplexing by adding a barcode during the ligation step of library prep will eliminate the two confounding factors described above: 1) library (amplification) effects and 2) subunit effects.

If you’re new to high throughput sequencing and have questions about how you should design your sequencing run, email us to take advantage of our free consultation. At Genohub we’re always happy to discuss your sequencing project, regardless of whether you use our service.

Biology of Genomes Meeting 2013

We’re looking forward to the annual Biology of Genomes meeting in Cold Spring Harbor this year. In addition to the keynote lectures by Andrew Fire and Eric Lander, we’ll be paying close attention to the following talks:

  • Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals – Battle, A
  • Pulling out the 1%—Whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries – Carpenter, M.L.
  • Highly accurate determination of indels (and SNPs) from human resequencing data with an assembly-based approach – Jaffe, D.B.
  • A general approach to account for technical noise in single-cell RNA-seq experiments – Marioni, J.
  • Integrative genomic analysis of RNA and ChIP-seq data across multiple cell types identifies “stretch enhancers” associated with type 2 diabetes – Parker, S.C.
  • A genome-wide estimate of meiotic gene conversion rate in humans – Williams, A.
  • Comprehensive analysis of relapsed acute lymphoblastic leukemia reveals intronic mutations that drive FLT3 over-expression – Wilson, R.K.

Our interests are not only in high-throughput genomics, but also in functional and computational genomics, hence the diversity in the highlighted talks. If you’re not able to attend in person (it was oversubscribed even before the abstract deadline), you can listen to the lectures in real time by registering with CSHL’s Leading Strand. If you’re attending, drop us a line!