How Much Sequencing is Needed For ChIP-Seq?

ChIP-Seq Peaks

Adapted from: ChIP–seq: advantages and challenges of a maturing technology
Peter J. Park
Nature Reviews Genetics 10, 669-680 (October 2009)

One of the most common sequencing applications searched for and ordered on Genohub is ChIP-Seq. A question we’re frequently asked is: how much sequencing do I need for my ChIP-Seq experiment?

ChIP-Seq is the most widely used technique for measuring protein-DNA interactions on a genome-wide scale. Before starting a ChIP-Seq experiment, it’s important to have some information about specificity. Assume that you’ve tested the specificity of your protein (a nucleosome, histone or chaperone, for example) and it enriches 10x over background. If you’re fragmenting a human genome and there are ~3,000 places in the genome that your protein binds, you’ll need approximately one sample (read) per fragment:

Your background (human) is: 3 Gb / 300 bp, or 1×10^7 fragments.

Signal enrichment is 10x × 3,000 locations = 3×10^4

So you need 1×10^7 + 3×10^4 ≈ 1×10^7 sample hits for a 10x signal, with 10-fold enrichment.
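The arithmetic above can be reproduced with a short Python sketch. The function name and parameters are hypothetical, chosen only for illustration, and the model is the same back-of-the-envelope one used in the text: one hit per background fragment plus enrichment-weighted hits at the binding sites. It does not account for duplicates, mappability or uneven fragmentation.

```python
def chip_seq_hit_estimate(genome_size_bp, fragment_size_bp, binding_sites, enrichment):
    """Rough estimate of sample hits needed, following the text's simple model.

    All names here are illustrative assumptions, not part of any library.
    """
    # Background: number of fragments the genome breaks into
    background = genome_size_bp // fragment_size_bp
    # Signal: enrichment-weighted hits at the bound locations
    signal = enrichment * binding_sites
    return background, signal, background + signal


# Values from the text: 3 Gb human genome, 300 bp fragments,
# ~3,000 binding sites, 10x enrichment over background
background, signal, total = chip_seq_hit_estimate(3_000_000_000, 300, 3_000, 10)
print(f"background fragments: {background:.1e}")
print(f"signal hits:          {signal:.1e}")
print(f"total hits needed:    {total:.2e}")
```

The signal term is small next to the background term, which is why the total rounds to about 1×10^7.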

Here are some services on Genohub that would meet these ChIP-Seq metrics:

ChIP Signal Strength

The relationship between ChIP signal strength and regulatory activity is an area of active research. Some very active transcriptional enhancers display only moderate ChIP signal. As a result, it can be difficult to set a threshold for ChIP signal strength that will be inclusive of all functional sites. A rough guide is ~20M uniquely mapped reads per mammalian sample.

ChIP-Seq Control

Designing a good control is essential for every ChIP-Seq experiment. A separate control should be run for every sample, cell type, condition or treatment. For a useful control, perform ChIP with an antibody that reacts with an unrelated antigen. Make sure you’re able to make a control library that’s as complex as your experimental samples. We typically recommend that users dedicate at least as many reads to their control as to their actual samples, if not more.

Making highly ‘complex’ libraries is important for ChIP-Seq. We’ve outlined several library prep kit options here:

Finally, if you’re new to ChIP-Seq and need more project advice, contact us for a complimentary project consultation.

Sequencing Suggests the Ebola Virus Genome is Changing

Genome of the Ebola Virus is Changing Rapidly

Using high-throughput sequencing, researchers from MIT, Harvard and the Sierra Leone Ministry of Health and Sanitation have recently reported rapid changes in the genetic code of the Ebola virus. The Ebola virus genome, a single-stranded RNA of ~19,000 nucleotides, encodes several structural proteins: RNA polymerase, nucleoprotein, polymerase co-factors and transcription activators. The researchers used Illumina HiSeq 2500 platforms to achieve 2000x coverage of the Ebola genome. Using Genohub, we estimate the cost to sequence 100 such genomes at 2000x to be under $1,500.
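To get a feel for the sequencing output such a project implies, here is a minimal Python sketch of the coverage arithmetic. The genome length, coverage and genome count come from the paragraph above; the 100 nt read length is an assumption made purely for illustration, and the function names are hypothetical. The estimate ignores host contamination, duplicate reads and uneven coverage, all of which increase the real requirement.

```python
import math


def required_bases(genome_length_nt, coverage, n_genomes):
    # Total sequenced bases = genome length x fold coverage x number of genomes
    return genome_length_nt * coverage * n_genomes


def required_reads(total_bases, read_length_nt):
    # Number of reads of a given length needed to supply that many bases
    return math.ceil(total_bases / read_length_nt)


# ~19,000 nt genome, 2000x coverage, 100 genomes (values from the text)
total_bases = required_bases(19_000, 2_000, 100)
# Read length of 100 nt is an assumed value, not taken from the study
reads_needed = required_reads(total_bases, 100)

print(f"total bases:  {total_bases / 1e9:.1f} Gb")
print(f"reads needed: {reads_needed / 1e6:.0f} M (at an assumed 100 nt per read)")
```

At roughly 3.8 Gb of viral sequence in total, the requirement is small because the genome itself is so short.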

Sequencing 99 Ebola genomes from 78 patients, they found more than 300 genetic changes that make the genomes sequenced from the current outbreak distinct from those of previous outbreaks. In fact, they found that the substitution rate in this year’s outbreak was twice as high as in all other Ebola virus outbreaks. They also determined that mutations during this year’s outbreak were frequently nonsynonymous (mutations that alter the amino acid sequence of a protein). 50 mutational events and 29 new viral lineages were observed in this outbreak alone, suggesting potential for viral adaptation. Determining whether Ebola could be evolving away from defenses against it, or whether it could become more contagious and spread faster, will require functional analysis. For their part, Gire et al. have published the full-length Ebola genomes in the NCBI database. Tragically, the authors note that 5 co-authors died from the disease before the manuscript could be published. Last week, The New Yorker published The Ebola Wars, an excellent in-depth story of the work involved in actually sequencing the Ebola genome and tracking its mutations.

While basic PCR tests are sufficient for giving you a yes/no answer about infection, this new study highlights the important role of sequencing in characterizing patterns of viral transmission and mutations in an epidemic. We expect sequencing to play a greater role in development of diagnostics and treatments for this and other viral outbreaks.