Sequencing Genomic Content from a Single Cell

Most high throughput sequencing is performed on DNA derived from a large population of cells. A consensus sequence is obtained by assembling or aligning many short reads, and the most frequent nucleotide determines the identity of the base at each position. While this is fine for most applications, it makes measuring genomic heterogeneity between single cells impossible. Important differences, including single nucleotide variations (SNVs), copy number variations (CNVs) (genetic variation that can lead to gene malfunction or disease) and transcriptome variation based on alternative splicing, can all be lost with population averaging.
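To make the averaging effect concrete, here is a minimal sketch of a majority-vote consensus call. It assumes the reads are already aligned and padded to equal length, with '-' marking uncovered positions; the function name and the toy reads are illustrative only and do not come from any particular pipeline.

```python
from collections import Counter

def consensus(aligned_reads):
    """Return the majority-vote consensus of equal-length, pre-aligned reads.

    '-' marks a position a read does not cover and is ignored in the vote.
    """
    length = len(aligned_reads[0])
    calls = []
    for pos in range(length):
        column = [read[pos] for read in aligned_reads if read[pos] != "-"]
        if not column:
            calls.append("N")  # no read covers this position
            continue
        base, _count = Counter(column).most_common(1)[0]
        calls.append(base)
    return "".join(calls)

# Toy pileup: three reads carry A at the fourth position, one carries G.
reads = [
    "ACGATC",
    "ACGATC",
    "ACGGTC",
    "ACGATC",
]
print(consensus(reads))
```

The single read carrying G is outvoted, so the variant never appears in the consensus. That is the population averaging problem in miniature.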

While sequencing the genomic content of a single cell is important, achieving this has its challenges. First, you need to isolate a single cell. Typical methods include cell sorting, laser capture microdissection and old fashioned dilution. Once you’ve isolated a single cell you need to deal with its contents. You can’t just use a typical phenol based extraction procedure. The membranes of the cell need to be carefully removed with a mild lysis buffer to release compartmentalized DNA. Incomplete recovery caused by chromosomal breaks or DNA damage (something that’s very common with single cells) can result in the loss of genomic regions and in uneven amplification across the genome, leaving some regions with little or no representation. Finally, you have to deal with the minute amount of DNA present in a cell. A diploid human cell contains only about 6 to 7 picograms of DNA. To make libraries from single cells, you can rely on traditional amplification or several newer primer based techniques:

– Whole genome amplification (WGA)

– SMARTer (SMART-Seq)

– Multiple Annealing and Looping-Based Amplification Cycles (MALBAC)

There are many forms of WGA, including multiple displacement amplification (MDA), primer extension preamplification (PEP), and degenerate oligonucleotide primed PCR (DOP). The PCR based WGA methods, DOP and PEP, use Taq based PCR primed with degenerate or random oligonucleotides, respectively. MDA is an isothermal method: random hexamers bind to denatured DNA, and Phi 29 polymerase extends them by strand displacement at a constant temperature. Priming events on each denatured strand lead to a network of DNA structures. While WGA methods have been available since the early 90s and have been thoroughly tested by many researchers, they can be prone to amplification bias and result in low genome coverage. PCR based WGA can introduce sequence dependent bias and errors because it uses a low fidelity Taq polymerase, and certain regions become overrepresented because primers bind them preferentially. MDA, which uses the strand displacing Phi 29 polymerase, provides certain improvements but still exhibits considerable bias due to non-linear amplification, random priming that amplifies both target and contaminating DNA, and genomic rearrangements or chimeras that complicate genome assembly by linking non-contiguous chromosomal regions.
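A rough way to see why non-linear amplification produces bias is to compound a small per-cycle difference in efficiency. The sketch below is a deterministic toy model, not a description of any specific WGA chemistry; the efficiencies (0.90 and 0.80) and the cycle count are hypothetical values chosen for illustration.

```python
def copies_after(cycles, efficiency, start=1.0):
    """Copies after `cycles` rounds when each round adds `efficiency` new copies per template."""
    return start * (1.0 + efficiency) ** cycles

cycles = 25  # hypothetical number of amplification rounds
well_primed = copies_after(cycles, efficiency=0.90)    # region primers bind readily
poorly_primed = copies_after(cycles, efficiency=0.80)  # region primers bind less often
ratio = well_primed / poorly_primed
print(f"fold difference after {cycles} cycles: {ratio:.1f}x")
```

Under these assumed numbers, two regions that start from the same single copy end up differing close to fourfold in representation. Real amplification is also stochastic, which makes the spread wider than this simple model suggests.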

SMARTer is an approach for full length cDNA construction from picograms of total RNA, using the template switching activity of Moloney murine leukemia virus (MMLV) reverse transcriptase (Chenchik et al.). Briefly, upon reaching the 5’ end of an RNA template, the terminal transferase activity of the transcriptase adds 3-5 nucleotides to the 3’ end of the first strand cDNA. A primer anneals to this overhang and serves as an additional template for the transcriptase. Template switching from the RNA molecule to the primer generates a complete cDNA copy. The SMARTer technique offered by Clontech is an option available through Genohub.
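The template switching step can be sketched as simple string operations. This is an illustration under stated assumptions, not the SMARTer protocol itself: the added tail is assumed to be three cytosines (the composition usually described for MMLV-based template switching), and the RNA, primer and switch oligo sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def reverse_complement_dna(seq):
    """Reverse complement as DNA; U in an RNA input is read as pairing with A."""
    return seq.translate(COMPLEMENT)[::-1]

def first_strand_with_template_switch(mrna, rt_primer, switch_oligo, tail="CCC"):
    """Build the first strand cDNA (5'->3') produced by template switching.

    mrna         : RNA body 5'->3', without the poly(A) tail
    rt_primer    : oligo(dT) primer that starts reverse transcription
    switch_oligo : DNA oligo 5'->3' whose 3' end pairs with `tail`
    tail         : non-templated nucleotides added by the transcriptase (assumed)
    """
    # The oligo can only anneal if its 3' end is complementary to the added tail.
    if not switch_oligo.endswith(reverse_complement_dna(tail)):
        raise ValueError("switch oligo does not pair with the added tail")
    cdna = rt_primer + reverse_complement_dna(mrna)  # copy of the RNA template
    cdna += tail                                     # terminal transferase step
    # After the switch, the enzyme copies the rest of the oligo onto the cDNA 3' end.
    oligo_body = switch_oligo[: len(switch_oligo) - len(tail)]
    cdna += reverse_complement_dna(oligo_body)
    return cdna

# Hypothetical sequences, for illustration only.
cdna = first_strand_with_template_switch(
    mrna="GGAUCCAUGC", rt_primer="TTTTTT", switch_oligo="ACGTACGGG"
)
print(cdna)
```

The resulting cDNA carries a known sequence at both ends (the primer at the 5' end, the complement of the switch oligo at the 3' end), which is what allows the full length molecule to be amplified afterwards.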

Another new low input method, developed by Professor Xiaoliang Sunney Xie’s group at Harvard University and reported in the December 21, 2012, issue of Science (Zong et al.), uses multiple annealing and looping-based amplification cycles (MALBAC). Amplification begins with a pool of random primers, each containing a common 27 mer sequence and 8 random nucleotides, that anneal evenly to the template. Increasing the temperature to 65ºC generates variable length amplicons, which are then amplified to full length amplicons with complementary ends. The temperature of the reaction is lowered to 58ºC to allow the full length amplicons to loop, which prevents their further amplification or cross hybridization. PCR is then performed using the common 27 mer sequence as the primer, generating micrograms of DNA from as little as picograms.
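The primer layout and the looping idea can be illustrated with a short sketch. The 27 mer below is a placeholder, not the published MALBAC sequence, and the helper names are hypothetical; the point is only that an amplicon tagged at both ends has complementary termini, while one tagged at a single end does not.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

# Placeholder 27 mer: NOT the published MALBAC sequence, just a stand-in.
COMMON_27MER = "GATTACAGATTACAGATTACAGATTAC"

def make_primer(rng):
    """Common 27 mer followed by 8 random nucleotides (35 nt in total)."""
    return COMMON_27MER + "".join(rng.choice("ACGT") for _ in range(8))

def full_amplicon(insert):
    """Full amplicon: common 27 mer at the 5' end, its reverse complement at the 3' end."""
    return COMMON_27MER + insert + reverse_complement(COMMON_27MER)

def can_loop(amplicon, tag_len=27):
    """True when the 5' and 3' ends of the molecule can base-pair with each other."""
    return amplicon[:tag_len] == reverse_complement(amplicon[-tag_len:])

rng = random.Random(0)           # seeded so the example is repeatable
primer = make_primer(rng)
insert = "ACGT" * 10             # stand-in for a stretch of genomic sequence
partial = COMMON_27MER + insert  # tagged at the 5' end only
full = full_amplicon(insert)     # tagged at both ends

print(len(primer), can_loop(partial), can_loop(full))
```

Only the molecule tagged at both ends can fold back on itself, which is the property the method uses to take full length amplicons out of the pool of available templates.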

Techniques to elucidate the genomic contents of a single cell are just beginning to be developed. Their uses range from tracking single circulating tumor cells to mapping chromosomal segregation to understanding the human microbiota. While the need for these techniques is clear, new protocols will have to be developed and tested to ensure accurate and complete representation of the single cell genome.

We’re interested to hear about your single cell protocols. Send them to us at protocols@genohub.com.

References:

– Chenchik, A., Zhu, Y., Diatchenko, L., Li, R., Hill, J. & Siebert, P. (1998) Generation and use of high-quality cDNA from small amounts of total RNA by SMART PCR. In RT-PCR Methods for Gene Cloning and Analysis. Eds. Siebert, P. & Larrick, J. (BioTechniques Books, MA), pp. 305–319.

– Zong C, Lu S, Chapman AR, Xie XS. Genome-Wide Detection of Single-Nucleotide and Copy-Number Variations of a Single Human Cell. Science, 338(6114):1622-6. 2012.

Accurate de novo assembly from 1 long insert library

Long repeats, extreme GC or AT content and palindromic regions are some of the reasons why there are gaps in draft genome assemblies. In an article published in the June issue of Nature Methods, Chin and colleagues (1) used a non-hybrid pre-assembly method that corrects errors in long reads to demonstrate accurate assembly of microbial genomes using data from just one long read SMRT shotgun library.

Typically, in hybrid strategies, short reads from one instrument (HiSeq or SOLiD) are used to correct errors in long reads from another (PacBio, Roche 454). This approach relies on the long reads mapping to regions that short read technology leaves uncovered (owing to AT/GC content or the other reasons described above). It also requires the construction of at least two different libraries and several sequencing runs on different platforms. Chin and colleagues’ new method, the “hierarchical genome assembly process” (HGAP), consists of:

  • Choosing the longest sequence reads as a seeding data set
  • Recruiting shorter reads and pre-assembling them using a consensus method
  • Assembly of the pre-assembled reads
  • Refinement using the initial read data to generate a final consensus

Details of each step are described in the Online Methods of the paper.
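As a rough illustration of the first step only, the sketch below picks the longest reads as seeds until they add up to a chosen depth. The `target_seed_coverage` parameter and the stopping rule are assumptions made for this example; the criterion actually used by the authors is the one given in their Online Methods.

```python
def choose_seed_reads(read_lengths, genome_size, target_seed_coverage):
    """Pick the longest reads as seeds until they reach the target coverage.

    Returns (seed_lengths, shorter_lengths). The shorter reads would then be
    aligned to the seeds to build a consensus, which is not implemented here.
    """
    ordered = sorted(read_lengths, reverse=True)
    needed_bases = genome_size * target_seed_coverage
    seeds, total = [], 0
    for length in ordered:
        if total >= needed_bases:
            break
        seeds.append(length)
        total += length
    return seeds, ordered[len(seeds):]

# Toy numbers: a 1 kb "genome" and a hypothetical 20x seed coverage target.
lengths = [9000, 1200, 7000, 3000, 6500, 800, 2500]
seeds, shorter = choose_seed_reads(lengths, genome_size=1000, target_seed_coverage=20)
print(seeds, shorter)
```

With these toy numbers the three longest reads become seeds and the remaining four are left to be recruited against them.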

Briefly, they examined the length and accuracy of seed reads by aligning each read longer than 6 kb to a reference sequence, E. coli K-12 MG1655. Seed reads were converted into pre-assembled reads with a mean length of 5,777 bp. These were passed to the Celera Assembler, which yielded a single 4.6 million bp contig. The same technique was also applied to Meiothermus ruber DSM1279 and a bacterial artificial chromosome. Using the HGAP approach, the group was able to generate de novo assemblies from large insert template libraries at 80-100x coverage that were comparable to Sanger sequencing or hybrid approaches. While we still expect the hybrid approach to remain in use for finishing high quality genome assemblies, HGAP combined with SMRT sequencing sounds like a promising method for unfinished and potentially eukaryotic genomes.
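To give a feel for what 80-100x coverage means in raw data, here is a back-of-the-envelope calculation using the 4.6 million bp contig size quoted above. The mean raw read length of 3,000 bp is a hypothetical value picked for illustration, not a figure from the paper.

```python
import math

def bases_for_coverage(genome_size_bp, coverage):
    """Total sequenced bases needed for a given average depth."""
    return genome_size_bp * coverage

def reads_for_coverage(genome_size_bp, coverage, mean_read_length_bp):
    """Number of reads needed, rounded up, for a given average depth."""
    return math.ceil(genome_size_bp * coverage / mean_read_length_bp)

genome = 4_600_000        # size of the single E. coli contig quoted above
mean_read_length = 3_000  # hypothetical mean raw read length

for cov in (80, 100):
    total = bases_for_coverage(genome, cov)
    n_reads = reads_for_coverage(genome, cov, mean_read_length)
    print(f"{cov}x: {total:,} bases, about {n_reads:,} reads")
```

For a genome of this size, 80x to 100x works out to 368 to 460 million bases of raw sequence.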

Reference:

  • Chen-Shan Chin, David H Alexander, Patrick Marks, Aaron A Klammer, James Drake, Cheryl Heiner, Alicia Clum, Alex Copeland, John Huddleston, Evan E Eichler, Stephen W Turner & Jonas Korlach. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods, 10, 563-569 (2013).