Sequencing, Finishing, Analysis in the Future – 2014 – Day 1 Meeting Highlights

Arguably one of the top genome conferences, the annual SFAF meeting began this year in Santa Fe with a great lineup of speakers from genome centers, academia and industry. Frankly, what's amazing is that the meeting is completely supported by outside funding, so there is no registration fee (we hope that last comment doesn't spoil the intimate, small size of the meeting next year).

Rick Wilson kicked off SFAF with his keynote titled, 'Recent Advances in Cancer Genomics'. He discussed a few clinical cases where the combination of whole genome sequencing, exome-seq and RNA-seq was used to help diagnose and guide targeted cancer drug therapy. He emphasized that this combination-based sequencing approach is required to identify actionable genes, and that WGS or exome-seq alone isn't enough.

Jonathan Bingham from Google announced the release of a simple web-based API to import, process, store, and collaborate on genomic data. He mentioned that Google thinks of computing in terms of data centers and where capacity is available: at any given time, their idle computers pooled together are larger than any single data center. His new genomics team is looking to harness this capacity for genome analysis. He compared a million genomes, adding up to more than 100 petabytes, to the scale of Google's web search index.

Steve Turner from Pacific Biosciences discussed platform advances that have led to higher-quality assemblies rivaling pre-second-generation, clone-by-clone sequencing. He compared the current state of transcriptome assembly to putting a bunch of magazines in the shredder, then gluing the pieces back together. He described a method now available for construction of full-length transcripts: cDNA SMRTbell™ libraries for single-molecule sequencing. Finally, he announced that there are >100 PacBio instruments installed in the field. At Genohub, we already have several listed, with service available for purchase.

Kelly Hoon from Illumina was next up. She described a series of new updates, the most notable being the submission of the HiSeq 2500 for FDA approval by the end of the year. Other points included updates to BaseSpace, the 1T upgrade (1 Tb of data in 6 days), NeoPrep (coming this summer, allowing 1 ng of input), new RNA capture kits, and a review of NextSeq optics.

Thermo Fisher's presentation was immediately after Illumina. Most of the discussion was on Ion Torrent's new Hi-Q system, designed to improve accuracy and read length while reducing error rates.

Right after the platform talks was a panel discussion with PacBio, Illumina, Roche and Thermo Fisher. Main points from that discussion were:

  • Steve Turner from PacBio declined to discuss or entertain discussion of a benchtop platform. This was met with lots of audience laughter
  • Illumina had no response to Oxford Nanopore (ONT) except to say they're not going to respond to ONT until after they launch…ouch.
  • PacBio said that right now read length is not limited by on-board chemistry but rather by the quality of input DNA.
  • Roche is phasing out the 454 platform but looking to compete on 4-5 other possibilities (very interesting news)

Ruth Timme from the FDA discussed implementation of an international NGS network of public health labs to collect and submit draft genomes of food pathogens to a reference database. Data coming in from these sites provides the FDA with actionable leads in outbreak investigations. Currently, GenomeTrakr consists of six state health labs and a series of FDA labs.

Sterling Thomas discussed Noblis' Center for Applied High Performance Computing (CAHPC) suite of high-speed algorithms called BioVelocity. BioVelocity performs reference-based multiple sequence alignment (MSA) and variant detection on raw human reads. High-speed variant finding in adenocarcinoma using whole genome sequencing was presented as an example.

Sean Conlan from NHGRI discussed sequence analysis of plasmid diversity amongst hospital-associated carbapenem-resistant Enterobacteriaceae. Using finished genome sequences of isolates from patients and the hospital, he was able to better understand transmission of bacterial strains and plasmids encoding antibiotic resistance.

David Trees examined the use of WGS to determine the molecular mechanisms responsible for decreased susceptibility and resistance to azithromycin in Neisseria gonorrhoeae. Predominant causes of resistance included mutations in the promoter region or structural gene of mtrR and mutations in 23S rRNA alleles located on the gonococcal chromosome.

Darren Grafham from Sheffield Diagnostic Genetics Services emphasized the importance of consensus in the choice of an analytical pipeline, alongside Sanger confirmation of variants, for diagnostics. He described his pipeline that is currently being used in a clinical diagnostic lab for regular screening of inherited, pathogenic variants. He stated that 30x coverage is the point at which false positives are eliminated with >99.9% confidence.

Other talks during the first day (that we likely missed while enjoying the beautiful Santa Fe weather):

Heike Sichtig: Enabling Sequence Based Technologies for Clinical Diagnostic: FDA Division of Microbiology Devices Perspective

Christian Buhay: The BCM-HGSC Clinical Exome: from concept to implementation

Dinwiddie: WGS of Respiratory Viruses from Clinical Nasopharyngeal Swabs

Karina Yusim: Analyzing TB Drug Resistance

Colman: Universal Tail Amplicon Sequencing

Roby Bhattacharyya: Transcriptional signatures in microbial diagnostics

Eija Trees: NGS as a surveillance tool

Helen Cui: Genomics Capability Development and Cooperative Research with Global Engagement

Raphael Lihana: HIV-1 Subtype Surveillance in Kenya: the Puzzle of Emerging Drug Resistance and Implications on Continuing Care

Gvantsa Chanturia: NGS Capability at NCDC

The night ended with a poster and networking session. The entire agenda is posted here:

Follow us on Twitter and #SFAF2014 for the latest updates!


Ask a Bioinformatician

In the last 2 years, next-gen sequencing instrument output has significantly increased; labs are now sequencing more samples at greater depth than ever before. Demand for the analysis of next generation sequencing data is growing at an arguably even higher rate. To help accommodate this demand, Genohub now allows researchers to quickly find and connect directly with service providers who have specific data analysis expertise:

Whether it's a simple question about gluing pieces together in a pipeline or a request to have your transcriptome annotated, researchers can quickly choose a bioinformatics provider based on their expertise and post queries and project requests. Services that bioinformaticians offer on Genohub are broken down into primary, secondary and tertiary data analysis:

Primary – Involves the quality analysis of raw sequence data from a sequencing platform. Primary analysis solutions are typically provided by the platform vendor after the sequencing phase is complete. The result is usually a FASTQ file, which combines the sequence data with a Phred quality score for each base.
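To make the FASTQ description concrete, here is a minimal sketch of parsing one record and decoding its per-base Phred scores, assuming the common Phred+33 ASCII encoding (the exact encoding depends on the platform and pipeline version):

```python
def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record into (read id, sequence, Phred scores)."""
    header, seq, _, qual = lines
    assert header.startswith("@"), "FASTQ headers begin with '@'"
    # Phred+33 encoding: quality score = ASCII code of the character minus 33
    scores = [ord(c) - 33 for c in qual]
    return header[1:], seq, scores

# 'I' is ASCII 73, so each base here has Phred quality 40,
# i.e. an error probability of 10^(-40/10) = 0.0001
record = ["@read1", "GATTACA", "+", "IIIIIII"]
name, seq, scores = parse_fastq_record(record)
```

A real parser would stream records from a (often gzipped) file four lines at a time; this sketch only shows the record structure and score decoding.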

Secondary – Encompasses sequence alignment, assembly and variant calling of aligned reads. Analysis is usually resource intensive, requiring a significant amount of data and compute resources. This type of analysis often requires a set of algorithms that can be automated into a pipeline. While the simplest pipelines can be a matter of gluing together publicly available tools, a certain level of expertise is required to maintain and optimize the analysis flow for a particular project.
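As a sketch of what "gluing together publicly available tools" can look like, the function below assembles (but does not run) shell commands for a typical align → sort → call workflow. The tool names and flags (bwa, samtools, bcftools) illustrate one common combination, not a prescription; file names are placeholders:

```python
def build_variant_pipeline(ref, fastq, out_prefix):
    """Assemble shell commands for a simple align -> sort -> call pipeline.

    Tools and flags (bwa/samtools/bcftools) are illustrative; a production
    pipeline adds read-group tags, duplicate marking, QC, and error handling.
    """
    bam = f"{out_prefix}.sorted.bam"
    return [
        # Align reads and pipe straight into coordinate sorting
        f"bwa mem {ref} {fastq} | samtools sort -o {bam} -",
        # Index the sorted BAM for random access
        f"samtools index {bam}",
        # Pile up bases and call variants into a VCF
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -o {out_prefix}.vcf",
    ]

cmds = build_variant_pipeline("hg19.fa", "sample.fastq", "sample")
for cmd in cmds:
    print(cmd)
```

Keeping the pipeline as an explicit list of commands makes it easy to log, dry-run, or hand off to a scheduler; this is the "gluing" that still needs expert tuning per project.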

Tertiary – Annotation, variant call validation, data aggregation and sample- or population-based statistical analysis are all components of tertiary data analysis. This type of analysis is typically performed to answer a specific biologically relevant question or to generate a series of new hypotheses that need testing.

Researchers who still need library prep, sequencing and data analysis services can search for and begin projects as before using our Shop by Project page. What's new is that researchers who only need data analysis services can now directly search for and contact a bioinformatics service provider to request a quote:

Whether you plan on performing a portion of your sequencing data analysis yourself or intend to take on the challenge of putting together your own pipeline, consultation with a seasoned expert saves time and ensures you're on the way to successfully completing your project. By adding this new service, we're trying to make it easier to search for and identify the right provider for your analysis requirements.

If you're a service provider and would like your services to be listed on Genohub, you can sign up for a Service Provider Account or contact us to discuss the screening and approval process.


PEG Precipitation of DNA Libraries – How Ampure or SPRIselect works


One question we've been asked, and one that our NGS providers are frequently asked, is how, in principle, PEG precipitates DNA during next generation sequencing library preparation cleanup. We usually hear the question presented as: how do Agencourt's Ampure XP or SPRIselect beads precipitate DNA? The answer has to do with the chemical properties of DNA, polyethylene glycol (PEG), the beads being used, and water.

Polystyrene-magnetite beads (Ampure) are coated with a layer of negatively charged carboxyl groups. DNA's highly charged phosphate backbone makes it polar, allowing it to readily dissolve in water (also polar). When PEG [H-(O-CH2-CH2)n-OH] is added to a DNA solution under saturating conditions, DNA forms large random coils. Adding this hydrophilic molecule with the right concentration of salt (Na+) causes DNA to aggregate and precipitate out of solution from lack of solvation (1, 2). Too much salt and you'll have a lot of salty DNA; too little will result in poor recovery. The Na+ ions shield the negative phosphate backbones, causing DNA to stick together and to anything else in the near vicinity (including carboxylated beads).

Once you're ready to elute your DNA and put it back into solution (after you've done your size selection or removal of enzymes, nucleotides, etc.), an aqueous solution (TE or water) is added back, fully hydrating the DNA and moving it from an aggregated state back into solution. The negative charge of the carboxyl beads now repels DNA, allowing the user to extract it in the supernatant.

Changing the amount of PEG and the salt concentration can aid in size selecting DNA (2). This is a common method in NGS library preparation where the user is interested in selecting fragments of a particular size, and it is often used to replace gel steps in NGS library prep. There is already a wealth of literature on conditions for size selecting DNA; a simple Google search will turn it up.
The first article we’ve found that describes this selection is referenced below (3).
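In practice, the bead:sample volume ratio is the knob that sets the size cutoff: lower ratios bind only large fragments, and higher ratios retain progressively smaller ones. The sketch below computes bead volumes for a hypothetical double-sided selection; the ratios in the example are illustrative assumptions only and must be calibrated against your kit's documentation and your fragment range:

```python
def spri_volumes(sample_ul, upper_ratio, lower_ratio):
    """Compute bead volumes (µL) for a double-sided SPRI size selection.

    Ratios are bead:sample (v/v). Illustrative logic only; actual
    ratio-to-fragment-size cutoffs vary by bead lot and buffer.

    Step 1: add beads at upper_ratio; large fragments bind. Discard the
            beads and keep the supernatant.
    Step 2: add more beads to the supernatant to bring the cumulative
            ratio up to lower_ratio; the target window now binds.
            Keep the beads, wash, and elute.
    """
    if not 0 < upper_ratio < lower_ratio:
        raise ValueError("upper_ratio must be positive and below lower_ratio")
    first_add = round(sample_ul * upper_ratio, 2)
    second_add = round(sample_ul * (lower_ratio - upper_ratio), 2)
    return first_add, second_add

# Example: 50 µL library, 0.6x then up to 0.8x total (ratios are assumptions)
first, second = spri_volumes(50.0, 0.6, 0.8)
```

Here `first` is 30 µL (0.6x) and `second` is 10 µL (the additional 0.2x); counterintuitively, the *first* beads are discarded and the *second* beads carry your library.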

Updated 7/18/2016

Since this post was published on May 7th, 2014, several more commercial, Ampure-like size selection beads have come on the market:

  • MagJet – ThermoFisher
  • Mag-Bind – Omega Biotek
  • Promega Beads – Promega
  • Kapa Pure Beads – Kapa Biosystems

While we haven’t explored each one of these yet, we suspect the chemistry behind precipitation and selection is very similar. If you’d like to share information about these beads, please leave us a comment or send us an email at

If you’d like help in constructing your NGS library contact us, and we’d be happy to consult with you on your sequencing project:

If you're looking for an NGS service provider, check out our NGS Service Matching Engine:

(1) A Transition to a Compact Form of DNA in Polymer Solutions

(2) DNA Condensation by Multivalent Cations

(3) Size fractionation of double-stranded DNA by precipitation with polyethylene glycol