How Many Replicates are Sufficient for Differential Gene Expression?

In a nicely done, convincing two-condition, 48-replicate RNA-Seq experiment, researchers from the University of Dundee set out to answer a question frequently asked in the field: ‘How many replicates are necessary for differential gene expression (DGE)?’ In their study, they examined three statistical models to see which best represented the read-count distribution of genes, an assumption built into commonly used DGE tools.

Using the statistical power of 48 replicates, they determined that inter-lane variability does not play a large role in DGE results. Assuming even loading and amplification, their results showed a Poisson distribution of counts for individual genes across lanes. The authors also determined that the read-count distribution across replicates was consistent with a negative binomial model, an assumption in widely used tools such as edgeR, DESeq, Cuffdiff and baySeq. Performing goodness-of-fit tests for log-normal, negative binomial and normal distributions, the authors demonstrated that including ‘bad replicates’ made results inconsistent with the statistical models they tested, complicating the interpretation of differential expression results. A bad replicate was defined as one that 1) correlated poorly with other replicates, 2) had atypical read counts, or 3) had a non-uniform read depth profile.
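The first ‘bad replicate’ criterion, poor correlation with the other replicates, is easy to check on your own count matrix. Below is a minimal Python sketch, not the authors' actual screening procedure: the function name `flag_poorly_correlated`, the log2(count + 1) transform, the 0.9 cutoff and the simulated counts are all our own assumptions for illustration.

```python
import numpy as np

def flag_poorly_correlated(counts, threshold=0.9):
    """Flag replicates whose median correlation with the others is low.

    counts: genes x replicates array of raw read counts for ONE condition.
    threshold: hypothetical cutoff, chosen here only for illustration.
    """
    log_counts = np.log2(np.asarray(counts, dtype=float) + 1.0)
    # Pearson correlation between replicates (columns)
    corr = np.corrcoef(log_counts, rowvar=False)
    n = corr.shape[0]
    # Drop the diagonal so each row holds correlations with the OTHER replicates
    off_diag = corr[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    median_corr = np.median(off_diag, axis=1)
    flagged = np.flatnonzero(median_corr < threshold)
    return median_corr, flagged

# Simulated example (not real data): 2000 genes, 6 replicates
rng = np.random.default_rng(0)
gene_means = rng.lognormal(mean=5.0, sigma=1.5, size=2000)
counts = rng.poisson(np.tile(gene_means[:, None], (1, 6)))
# Corrupt replicate 3 by shuffling its genes, which destroys its correlation
counts[:, 3] = rng.permutation(counts[:, 3])

median_corr, flagged = flag_poorly_correlated(counts)
print("median correlation per replicate:", np.round(median_corr, 3))
print("flagged replicate indices:", flagged)
```

Using the median rather than the mean keeps one bad replicate from dragging down the score of the good ones. The other two criteria (atypical read counts, non-uniform read depth profiles) need separate checks that this sketch does not cover.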

So how many replicates are sufficient for differential gene expression? The authors sequenced 96 mRNA samples in seven 1×50 HiSeq lanes. The cost for this on Genohub today is $28k USD. When the authors removed 6-8 bad replicates from their pool of 48 samples, their data became consistent with a negative binomial distribution. Assuming experimental variability similar to the authors', this indicates that at least 6 replicates in a DGE experiment is good practice. The cost for the preparation of 6 RNA-seq libraries and sequencing is $2,500 USD.
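To make the negative binomial assumption concrete: under a Poisson model the variance of a gene's counts equals its mean, while a negative binomial model allows extra variance, var = mean + φ·mean², where φ is the dispersion. The Python sketch below estimates φ per gene by the method of moments on simulated counts. It is only an illustration under our own assumptions (the function name `moment_dispersion`, the simulated means and the value φ = 0.1 are made up), and it is not the goodness-of-fit test used in the paper.

```python
import numpy as np

def moment_dispersion(counts):
    """Method-of-moments dispersion per gene: phi = (var - mean) / mean^2.

    counts: genes x replicates array of read counts for one condition.
    phi near 0 suggests Poisson-like counts; phi > 0 suggests overdispersion.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # unbiased sample variance across replicates
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = np.where(mean > 0, (var - mean) / mean**2, np.nan)
    return mean, var, phi

# Simulated example (not real data): 2000 genes, 48 replicates
rng = np.random.default_rng(1)
n_genes, n_reps = 2000, 48
mu = rng.lognormal(mean=5.0, sigma=1.0, size=n_genes)

# Poisson counts: variance equals the mean
poisson_counts = rng.poisson(mu[:, None], size=(n_genes, n_reps))

# Negative binomial counts with an assumed dispersion of 0.1.
# NumPy's (n, p) parameterisation gives mean mu and variance mu + mu^2 / n.
true_phi = 0.1
n = 1.0 / true_phi
p = n / (n + mu)
nb_counts = rng.negative_binomial(n, p[:, None], size=(n_genes, n_reps))

median_phi_poisson = np.nanmedian(moment_dispersion(poisson_counts)[2])
median_phi_nb = np.nanmedian(moment_dispersion(nb_counts)[2])
print("median dispersion, Poisson data:", median_phi_poisson)
print("median dispersion, negative binomial data:", median_phi_nb)
```

With only 3 replicates the per-gene variance estimate is very noisy, which is one practical reason more replicates help: the dispersion that DGE tools rely on is simply better estimated.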

Based on a literature search and on client behavior on Genohub, we estimate that ~80% of those studying DGE use 3 replicates in their experiments, which, given dropouts and variation, is unsatisfactory. A final point we’d like to make: the authors used ~11M 1×50 reads per sample, which goes to show that with DGE, replicates can be more important than read depth. This is discussed further in our Coverage and Read Depth Guide.

At Genohub, we consult with users and help them design sequencing experiments. How many replicates a study needs is a question that comes up often. Unfortunately, too few studies examine these fundamental elements of sequencing design. We hope this article gets the recognition it deserves.


Reference: Marek Gierliński, Christian Cole, Pietà Schofield, Nicholas J. Schurch, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.


2 thoughts on “How Many Replicates are Sufficient for Differential Gene Expression?”

  1. Sequencing has never been the big problem, and nowadays neither is its cost. However, the price behind the sample is not even taken into account here. In the cited paper, they compared a mutant yeast against its wild type. The cost of culture media, the yeast strain or any other material used can’t be compared to that of experiments using other organisms.

    Suppose that you or your collaborator are working with mice. What would you say if the advice is to perform 6 replicates for 2 conditions? Despite the low cost of sequencing and library prep, the cost of an experimental design that includes 12 mice, and of maintaining them, will be considerable. Now, add more conditions…

    I think the paper is interesting and I understand the need for guidelines for DGE analysis. Probably we can extend the question to “How many biological replicates do I need for this model/organism?”. Also, “For my organism, what can I consider a biological replicate?”. What about human samples? Cancer samples?



