Sequencing Small RNA and MicroRNA

Small RNAs are RNA molecules shorter than 200 nucleotides that are generally non-coding. Even so, they play important roles in biological systems and in many types of disease. Small RNAs include microRNAs (miRNAs), which are about 22 nucleotides long and well conserved across species.

MicroRNAs are widely studied because, although they are non-coding RNA molecules, they are involved in post-transcriptional regulation of gene expression through both mRNA degradation and translational repression [1]. In cancer research specifically, miRNAs are of interest because they have been shown to regulate cancer-specific gene expression and because they are present and stable in various biofluids [2]. Determining specific miRNA profiles may therefore be a useful, non-invasive method for cancer diagnosis and prognosis.

Why choose sequencing for small RNA?

Although reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is the gold standard for targeted miRNA analysis, small RNA sequencing is the better choice when researchers want to study more than a few miRNAs, because it is non-targeted. This allows simultaneous detection of novel miRNAs as well as other types of small RNA (e.g., small interfering RNA).

The main disadvantage of small RNA sequencing is the bias that can be introduced in the extension and PCR steps. In the extension step, the very short cDNA (reverse transcribed from the original small RNA) is extended by ligation or polyadenylation; in the PCR step, these cDNA molecules are amplified. Bias can occur during extension because the two adaptors used may have different affinities for target molecules. Bias can occur during PCR because amplification efficiency differs for molecules of different lengths. Both can confound the results when trying to determine the true levels of small RNAs.
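
To make the PCR part of this concrete, here is a minimal sketch of an idealized amplification model in which every cycle multiplies the copy number by (1 + efficiency). The efficiencies, starting copy numbers, and cycle count are hypothetical values chosen only for illustration; they are not measurements from any kit.

```python
def amplified_copies(start_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized PCR: each cycle multiplies the copy number by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


# Two small RNAs that are equally abundant before PCR (hypothetical numbers).
short_molecule = amplified_copies(start_copies=1000, efficiency=0.95, cycles=15)
long_molecule = amplified_copies(start_copies=1000, efficiency=0.90, cycles=15)

# The true ratio is 1.0, but the small efficiency gap compounds over the cycles.
observed_ratio = short_molecule / long_molecule
print(f"Observed ratio after PCR: {observed_ratio:.2f} (true ratio: 1.00)")
```

Even a small difference in per-cycle efficiency compounds over many cycles, which is why two molecules that start at equal abundance can appear noticeably different after amplification.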

There has recently been substantial effort to minimize these disadvantages, resulting in several different types of library prep kits released to market that have managed to reduce extension and PCR bias [1]. These kits reduce bias through different methods, and depending on your sample type and project goals, a specific kit may have an advantage over the others. Our small RNA-Seq partnering providers can help recommend the correct kit for you based on their experience and your own project goals.

What are the library prep and sequencing options?

There are multiple types of library preparation for small RNA sequencing, including two-adaptor ligation, randomized adaptor ligation, single adaptor ligation and circularization, unique molecular identifiers (UMIs), and polyadenylation and template switching [1]. Excluding the kits that use the original two-adaptor ligation method, each kit has its own method to reduce extension and PCR bias. Some commercial kits are also more targeted towards sequencing miRNAs instead of all small RNAs. Choosing the best kit will depend on your research goals.

For sequencing, the most common and cost-effective option is Illumina short-read sequencing. The vast majority of commercial library prep kits are compatible with Illumina platforms.

How can Genohub help?

Genohub’s small RNA sequencing partners are experts in every step of the small RNA and miRNA sequencing process, including extraction, library preparation, sequencing, and data analysis.

Our partnering providers can recommend the appropriate library prep kit and sequencing depth based on your specific project goals. They also have experience extracting from many different types of biological samples, such as human plasma, but they can work just as well with total or exosomal RNA samples that you extract yourself.

We know that each research project is unique, so we have partners who are also open to working with your custom samples and analysis needs! Get started today by letting us know about your small RNA sequencing project here: https://genohub.com/ngs/ .

References

  1. Benesova S, Kubista M, Valihrach L. Small RNA-Sequencing: Approaches and Considerations for miRNA Analysis. Diagnostics (Basel). 2021 May 27;11(6):964. doi: 10.3390/diagnostics11060964. PMID: 34071824; PMCID: PMC8229417.
  2. Gisela Storz, An Expanding Universe of Noncoding RNAs. Science 296, 1260-1263 (2002). DOI: 10.1126/science.1072249

Fungal Sequencing – ITS vs. 18S

Studying the kingdom Fungi is important because fungi fill many different ecological roles, including decomposers, symbionts, and parasites. There are also more than 1 million species of fungi, so researchers need high-throughput methods to explore this diversity [1]. One such method is next-generation sequencing.

In this blog, we’ll go over why and how researchers sequence fungi, what the ITS region and the 18S gene are, how to choose between them, and how Genohub can help with your fungal sequencing project.

Why perform sequencing for fungal community analysis?

Fungal sequencing can be used to discover novel fungal species, quantify known fungi, explore the structure of fungal communities, and determine the roles of fungi in nature. In addition, it’s important to study these communities for human health, as there are some fungi that are resistant to antifungal drugs and others that are involved in plant diseases [2]. Thus, sequencing for fungi is relevant for multiple fields, including environmental conservation, agriculture, and microbiology.

Both ITS and 18S sequencing are well-established methods for studying fungal communities, as focusing on these genes is a simple way to identify fungi within complex microbiomes or environments that would otherwise be difficult to study [3]. For example, this type of specific amplicon sequencing enables the analysis of the fungal community within very mixed environmental samples, such as soil or water.

What are ITS and 18S?

The internal transcribed spacer (ITS) region and the 18S ribosomal RNA gene are used as biomarkers to classify fungi.

Figure 1. Picture of the ITS region as spacers between the ribosomal subunit sequences.

As seen in Figure 1, the ITS region includes ITS1 and ITS2, the spacer regions located between the small-subunit rRNA and large-subunit rRNA genes. Generally, the ITS1/ITS4 primers are used for amplification of the ITS region, although they can be substituted with the universal primers ITS2, ITS3, and ITS5 [4].

The 18S ribosomal RNA (18S rRNA) gene codes for a component of the small (40S) eukaryotic ribosomal subunit and has both conserved and variable regions. The conserved regions can reveal the family relationships among species, whereas the variable regions show the differences between their sequences. The 18S rRNA gene has nine variable regions in total, V1-V9. The regions V2, V4 and V9 together are useful for identifying samples at both the family and order levels, while V9 seems to have a higher resolution at the genus level [5].

How to choose between ITS and 18S?

Although both ITS and 18S rRNA have proven useful for assessing fungal diversity in environmental samples, there are enough differences between them that researchers may choose to focus on only one, although sequencing for both is an option as well.

There was relatively low evolutionary pressure for the ITS1 and ITS2 sequences to remain conserved, so the ITS region tends to be hypervariable between fungal species while remaining relatively unchanged among individuals of the same species. It is therefore very well suited as a marker for species identification in the classification of fungi and is often used to study the relative abundance of fungi as well [2]. This can be useful if you need to survey genetic diversity at the species level or even within a species.

On the other hand, there was significant evolutionary pressure for the 18S rRNA gene to remain highly conserved as a component of the small eukaryotic 40S ribosomal subunit, an essential part of all eukaryotic cells. Due to this pressure, 18S is considered a potential biomarker for fungi classification above the species level and is often used in wide phylogenetic analyses and environmental biodiversity screenings [5].

In summary, the ITS region is mainly used for species-level identification and fungal diversity studies, while the 18S rRNA gene is mainly used for classification above the species level and for broad phylogenetic studies of fungi.

How can Genohub help?

Genohub’s ITS and 18S sequencing partners are experts in every step of the amplicon sequencing process, including extraction, PCR amplification and library preparation using validated primers based on the literature, and data analysis, including taxonomic assignment, diversity and richness analysis, comparative analysis, and evolutionary analysis. Our partners have experience extracting from many different types of environmental and biological samples, including soil, water, sludge, feces, and plant and animal tissue, but they can work just as well with DNA samples that you extract yourself.

We know that each research project is unique, so we have partners who are also open to working with your custom primers or your custom analysis needs! Get started today by letting us know about your ITS or 18S sequencing project here: https://genohub.com/ngs/ .

PacBio Revio: High-Quality Long Reads at an Affordable Price

Long-read whole genome sequencing, available for years, has driven numerous advancements, including fewer gaps in genome assemblies, more accurate assessment of gene duplications, clearer determination of gene network interactions across chromosomes, and the correction of numerous errors in previous genome assemblies based only on short-read sequencing [1].

PacBio’s high fidelity (HiFi) platforms hold particular value due to their generation of long reads (around 10 kb) with very high accuracy. However, the significant cost compared to other long-read sequencing platforms limited accessibility for many researchers. This barrier has been overcome by the PacBio Revio, the latest HiFi platform that delivers significantly more data at a lower, more affordable price point.

What is the PacBio Revio and how does it compare?

The Revio is the latest platform from Pacific Biosciences (PacBio) and represents a significant upgrade from the previous Sequel II. Compared with the Sequel II, the Revio offers a 15-fold increase in HiFi read output, with 90% of bases ≥Q30 and a median read accuracy ≥Q30. These HiFi reads typically span 15 to 18 kb. Additionally, the Revio requires 50% fewer consumables and has a simpler and more cost-effective workflow [2].

Although the Revio instrument is more expensive at $779,000 [3], compared with $495,000 for the Sequel II [4], the cost savings come from lower reagent costs and higher output. For example, one of our top PacBio partner facilities charges ~$2,045.00 for Revio library prep and sequencing for an expected output of 90 Gb, roughly equivalent to sequencing a human genome at 30X coverage. This same facility charges ~$2,895.50 for Sequel II library prep and sequencing for an expected output of 30 Gb. To get the same output as 1 Revio SMRT cell, 3 Sequel II SMRT cells would be needed, so the actual Sequel II cost for 90 Gb would be closer to $8,686.50.
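
The comparison above can be reproduced with a few lines of arithmetic. The sketch below uses only the prices and expected outputs quoted in this paragraph (from one partner facility, so they are not a general price list); the 3.1 Gb human genome size used for the coverage estimate is an approximate value assumed for illustration.

```python
import math

# Per-SMRT-cell price and expected output quoted in the text above.
REVIO = {"price_usd": 2045.00, "output_gb": 90}
SEQUEL_II = {"price_usd": 2895.50, "output_gb": 30}


def cells_needed(target_gb: float, output_per_cell_gb: float) -> int:
    """Number of whole SMRT cells required to reach the target output."""
    return math.ceil(target_gb / output_per_cell_gb)


def total_cost(platform: dict, target_gb: float) -> float:
    """Total library prep and sequencing cost for the target output."""
    return cells_needed(target_gb, platform["output_gb"]) * platform["price_usd"]


def cost_per_gb(platform: dict) -> float:
    """Cost per gigabase of expected output."""
    return platform["price_usd"] / platform["output_gb"]


def approx_coverage(output_gb: float, genome_size_gb: float = 3.1) -> float:
    """Approximate fold coverage; 3.1 Gb is an assumed human genome size."""
    return output_gb / genome_size_gb


target_gb = 90
for name, platform in (("Revio", REVIO), ("Sequel II", SEQUEL_II)):
    print(
        f"{name}: {cells_needed(target_gb, platform['output_gb'])} SMRT cell(s), "
        f"${total_cost(platform, target_gb):,.2f} total, "
        f"${cost_per_gb(platform):.2f} per Gb"
    )
print(f"Approximate human coverage at {target_gb} Gb: {approx_coverage(target_gb):.0f}X")
```

With these figures, the per-gigabase cost on the Revio is roughly a quarter of the Sequel II cost.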

However, there are potential downsides to consider when choosing the Revio over the Sequel II. The Revio demands a larger amount of high-quality, high molecular weight (HMW) DNA, requiring at least 3 μg for whole genome sequencing, while the Sequel II excels in low-input studies, generating data with as little as 5 ng of DNA using its ultra-low-input workflow [5]. Furthermore, the Revio requires library molecules longer than 3 kb. Therefore, shorter amplicons, Iso-Seq (RNA) libraries, and 10X Genomics single-cell RNA-Seq libraries require a specific MAS-seq protocol with dedicated kits that concatenate these shorter molecules for compatibility with the Revio [6].

Overall, when compared to other PacBio long-read instruments, the Revio represents a clear advancement in nearly every aspect, provided sufficient high-quality input material is available. This upgrade translates to substantial cost savings and accessibility for researchers who previously lacked the resources for HiFi long-read sequencing.

When should the PacBio Revio be considered?

PacBio Revio sequencing should be considered if you want to:

  • Detect complex structural variants, such as large inversions, deletions, or translocations
  • Accurately map highly repetitive genome regions
  • Polish short-read de novo genome assemblies
  • Perform full-length amplicon sequencing, like 16S
  • Directly detect methylation [7]

How can Genohub help?

At Genohub, we know that each next-generation sequencing research project is unique, so we will take the time to understand your specific project goals and help define precise project specifications. There is no charge to you for this initial consultation or any of our services. Once we have a set of well-defined project specifications, we can get you quick and accurate quotes from our NGS partners around the globe, compare the different quotes for you if needed, and then connect you directly with the partner with the best quote, all using our easy-to-use online platform. We review all quotes to ensure that they meet your project needs and that they specify measurable quality guarantees. If and when you decide to move forward with the project, we will actively supervise and manage it to make sure that all quality and turnaround guarantees are met.

Genohub’s PacBio sequencing partners are experts in every step of the PacBio Revio long-read sequencing process, including high-quality high-molecular weight (HMW) DNA extraction, library preparation, sequencing, and data analysis.

Our partnering providers can recommend the appropriate PacBio platform and sequencing depth based on your specific project goals and sample amount. They also have experience extracting from many different types of biological samples, and have optimized protocols to obtain the highest quality HMW DNA.

We know that each research project is unique, so we have partners who are also open to working with your custom samples and analysis needs! Get started today by letting us know about your long-read sequencing project here: https://genohub.com/ngs/ .

References

  1. Marx, V. Method of the year: long-read sequencing. Nat Methods 20, 6–11 (2023). https://doi.org/10.1038/s41592-022-01730-w
  2. PacBio. (2024). REVIO SYSTEM: Reveal more with accurate long-read sequencing at scale. https://www.pacb.com/revio/
  3. Pacific Biosciences of California, Inc. (2023, January 9). PacBio Announces Record Order, Including Orders for 76 Revio Systems Received in the Fourth Quarter of 2022. https://www.pacb.com/press_releases/pacbio-announces-record-orders-including-orders-for-76-revio-systems-received-in-the-fourth-quarter-of-2022/#:~:text=The%20Revio%20System%20has%20a,for%20the%20300%2Dcycle%20kit
  4. Han, AP. (2020, October 8). Pacific Biosciences’ New Sequel IIe System Puts Focus on High Read Accuracy. https://www.genomeweb.com/sequencing/pacific-biosciences-new-sequel-iie-system-puts-focus-high-read-accuracy
  5. PacBio. (2022). Considerations for using the low and ultra-low DNA input workflows for whole genome sequencing. https://www.pacb.com/wp-content/uploads/Application-Note-Considerations-for-Using-the-Low-and-Ultra-Low-DNA-Input-Workflows-for-Whole-Genome-Sequencing.pdf
  6. Pacific Biosciences of California, Inc. (2023, February 7). PacBio to Expand MAS-Seq Technology to 16S rRNA and Bulk RNA-Seq Solutions. https://www.pacb.com/press_releases/pacbio-to-expand-mas-seq-technology-to-16s-rrna-and-bulk-rna-seq-solutions/
  7. Illumina, Inc. (2024). Deeper insights into the complex regions of the genome: Long-read sequencing helps resolve challenging regions of the genome. https://www.illumina.com/science/technology/next-generation-sequencing/long-read-sequencing.html

Small RNA Sequencing: Understanding and Preventing Biases

Small RNAs are RNA molecules less than 200 nucleotides in size that are important in regulating numerous biological processes. Small RNA sequencing (small RNA-Seq) allows for genome-wide profiling to discover new variants of small and microRNAs and analyze the levels of known small RNAs. Due to the high sensitivity of small RNA-Seq, it can be used to analyze low-input, liquid biopsy samples for disease diagnosis and prognosis. However, certain steps in the small RNA-Seq process, particularly during library preparation, are susceptible to bias. To accurately capture true small RNA profiles, these biases should be minimized as much as possible.

Sources of Bias

High-quality data in small RNA sequencing is dependent on multiple factors, including the optimization of RNA extraction. However, since researchers can often perform extraction in their own laboratories, this discussion will focus on biases introduced during library preparation. 

After successful RNA extraction, library preparation converts the small RNA into complementary DNA (cDNA) fragments. In the two-adapter ligation method, oligonucleotide adapters are attached to both the 3’ and 5’ ends of the small RNA molecules. This elongates the small RNA and provides annealing sites for the reverse transcription (RT) primer and the subsequent PCR and sequencing primers [1].

Significant bias can be introduced during the PCR amplification step, where molecules of different lengths may be amplified with varying efficiencies, leading to skewed representation in analysis. However, the ligation step is the primary source of bias, as different affinities of adapters to target molecules can artificially influence results. This can lead to inaccurate readings of small RNA levels, and adapter dimers may form, consuming sequencing space in place of more relevant fragments [1].

Strategies to Mitigate Biases

PCR bias can be mitigated by incorporating unique molecular identifiers (UMIs). These are indices added to cDNA fragments before PCR amplification, which make it possible to identify PCR duplicates bioinformatically. The QIAseq miRNA Library Kit is an example of a library preparation kit that uses the two-adapter method with 12 bp UMIs. While effective in correcting PCR bias, this kit still uses two-adapter ligation and is susceptible to ligation bias [1].
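
As a rough illustration of how UMIs let PCR duplicates be identified bioinformatically, here is a minimal sketch that counts distinct UMIs per small RNA sequence. It assumes the UMI and insert have already been extracted from each read, uses exact matching only, and runs on invented 12-nt UMIs and toy insert sequences; it is not the algorithm of any particular kit or pipeline, and real tools typically also tolerate sequencing errors in the UMI.

```python
from collections import Counter, defaultdict

# Toy insert sequences (hypothetical, for illustration only).
INSERT_A = "TGAGGTAGTAGGTTGTATAGTT"
INSERT_B = "TAGCTTATCAGACTGATGTTGA"

# (umi, insert) pairs; reads sharing both UMI and insert are treated as PCR duplicates.
reads = [
    ("AACCGGTTAACC", INSERT_A),
    ("AACCGGTTAACC", INSERT_A),  # PCR duplicate of the first read
    ("AACCGGTTAACC", INSERT_A),  # PCR duplicate of the first read
    ("GGTTAACCGGTT", INSERT_A),  # a second original molecule
    ("TTGGCCAATTGG", INSERT_B),
]


def count_unique_molecules(read_pairs):
    """Return {insert: number of distinct UMIs observed with that insert}."""
    umis_per_insert = defaultdict(set)
    for umi, insert in read_pairs:
        umis_per_insert[insert].add(umi)
    return {insert: len(umis) for insert, umis in umis_per_insert.items()}


raw_counts = Counter(insert for _, insert in reads)
deduplicated_counts = count_unique_molecules(reads)

for insert in raw_counts:
    print(f"{insert}: raw reads = {raw_counts[insert]}, unique molecules = {deduplicated_counts[insert]}")
```

In this toy data the first sequence looks four times as abundant as the second by raw read count, but only twice as abundant once duplicates are collapsed.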

The most common ways to address ligation bias involve either improving the original two-adapter ligation method or opting for a ligation-free approach:

Improved versions of the original two-adapter ligation

Ligation of the two adapters with randomized nucleotides: The ligases that attach the adapters to small RNAs have affinities for certain sequences, so they join adapters to some RNA sequences more readily than to others, leading to some small RNAs being overrepresented and others underrepresented. Researchers noticed that adding randomized nucleotides to the adapters reduced this bias significantly, giving a more even representation of small RNA sequences. The NEXTflex Small RNA-Seq Kit v3 is an example of a library prep kit using adapter randomization [2].

Ligation of one adapter followed by circularization: This method involves ligation at the 3’ end, followed by circularization of the small RNA molecule once the 3’ adapter is attached. The adapter for the 3’ end is blocked by a phosphate group, which prevents self-circularization. Once the 3’ adapter is attached, the phosphate group is removed, the whole small RNA-adapter molecule is circularized, and reverse transcription is performed to obtain cDNA. This approach effectively mitigates 5’ end bias but retains some 3’ end ligation bias [1]. The RealSeq-Biofluids kit employs this single adapter and circularization method [3].

Ligation-free approaches 

Polyadenylation extension with template switching: This approach substitutes 3’ end adapter ligation with polyadenylation, followed by a specific type of reverse transcription to cDNA, where non-template cytidines are added at the 5’ end. Then a specific oligo is annealed to this stretch of nucleotides and serves as a template for the necessary adapters to be added at the 3’ and 5’ ends. Having no ligation step is a significant advantage, as it helps ensure more accurate small RNA quantification. However, a drawback is the lack of control over the number of adenosines added during polyadenylation, which makes it challenging to distinguish the original adenosines from the added ones [1]. Takara’s SMARTer smRNA-Seq Kit is an example of a library prep kit using this polyadenylation and template switching approach [4].
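
The adenosine ambiguity mentioned above is easy to see in a toy example. The sketch below uses a deliberately naive trimming rule (remove every trailing A) on invented sequences; it is an assumption made for illustration, not how any specific kit or analysis pipeline handles poly(A) tails.

```python
def trim_poly_a(read: str) -> str:
    """Naive trimming: strip every trailing 'A' from the read."""
    return read.rstrip("A")


# Hypothetical small RNAs (written as DNA), each followed by an added poly(A) tail.
ends_in_c = "ACGTTGCC"
ends_in_a = "ACGTTGCA"  # the final A is part of the original molecule
added_tail = "A" * 10

print(trim_poly_a(ends_in_c + added_tail))  # original sequence is fully recovered
print(trim_poly_a(ends_in_a + added_tail))  # the native 3' A is lost with the tail
```

Because the added tail and a native 3’ adenosine look identical in the read, the trimmed sequence alone cannot show whether the original molecule ended in A.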

Probe-based techniques: These hybridization-based techniques work directly on small RNAs or miRNAs within total RNA samples, without the need for PCR amplification or reverse transcription to cDNA. In general, oligonucleotide tags designed to target specific small RNAs are allowed to attach to the small RNAs of interest, unattached tags are removed, and the attached tags are then quantified using a particular quantification system. The quantification step does not necessarily involve sequencing, especially if the tags (and the attached small RNAs) can be quantified based on their characteristics. For example, NanoString’s nCounter miRNA assay uses color-coded barcodes as tags [5], and the FirePlex miRNA assay uses barcodes with different fluorescent intensities. These protocols avoid both adapter ligation bias and PCR bias. However, these assays are limited to known miRNAs and may not be suitable for discovering novel or rare small RNAs [1].

How Can Genohub Help?

At Genohub, we know that each next-generation sequencing research project is unique, so we will take the time to understand your specific project goals and help define precise project specifications if you’re uncertain which specifications are relevant. There is no charge to you for this initial consultation or any of our services. Once we have a set of well-defined project specifications, we can get you quick and accurate quotes from our NGS partners around the globe, compare the different quotes for you if needed, and then connect you directly with the partner with the best quote, all using our easy-to-use online platform. We review all quotes to ensure that they meet your project needs and that they specify measurable quality guarantees. If and when you decide to move forward with the project, we will actively supervise and manage it to make sure that all quality and turnaround guarantees are met.

Our small RNA sequencing partners are experts in every step of the NGS process, including extraction, library preparation, and data analysis. They are also well aware of the biases that can occur during small RNA sequencing and can recommend the best library kit for your particular samples and project goals.

Our partners also have experience extracting from many different types of samples, but they can work just as well with total RNA or small RNA samples that you have extracted yourself.

Get started today by letting us know about your small RNA sequencing project here: https://genohub.com/ngs/ .

References

  1. Benesova S, Kubista M, Valihrach L. Small RNA-Sequencing: Approaches and Considerations for miRNA Analysis. Diagnostics (Basel). 2021 May 27;11(6):964. doi: 10.3390/diagnostics11060964. PMID: 34071824; PMCID: PMC8229417.
  2. Bioo Scientific Corporation. (2016). NEXTflex Small RNA-Seq Kit v3 https://perkinelmer-appliedgenomics.com/wp-content/uploads/marketing/NEXTFLEX/miRNA/5132-05-NEXTflex-Small-RNA-Seq-v3-18-07.pdf
  3. Real Seq Biosciences. (2023) RealSeq-Biofluids. Realseqbiosciences. https://www.realseqbiosciences.com/library-prep-products/realseq-biofluids
  4. Takara Bio Inc. (2024). A SMARTer approach to small RNA sequencing. Takarabio. https://www.takarabio.com/learning-centers/next-generation-sequencing/technical-notes/epigenetic-sequencing/full-length-small-rna-libraries
  5. NanoString Technologies, Inc. (2022). nCounter miRNA Expression Assay Kit. https://nanostring.com/wp-content/uploads/2023/03/PB_MK3354_miRNA_r18.pdf

3’Tag RNA-Seq: A Cost-Effective Alternative to Standard RNA-Seq

Although standard RNA-Seq has helped researchers obtain useful data from full transcripts, in the vast majority of cases scientists only need to perform differential gene expression (DGE) analysis, which does not require information from the entire transcript. The 3’Tag RNA-Seq protocol was developed to lower costs and to help researchers focus on DGE data. This method generates DGE data with only one library molecule per transcript, which is complementary to the 3′-end sequence. Because of this selectivity, much less sequencing depth is required for 3’Tag RNA-Seq than for standard RNA-Seq [1].

3’Tag RNA-Seq is commonly used for the analysis of gene expression in a variety of applications, including transcriptomics, epigenomics, and gene regulation studies.

3’Tag RNA-Seq vs. Standard RNA-Seq

3’Tag RNA-Seq is a protocol developed to obtain gene expression profiling data with a high signal-to-noise ratio at a low cost, and it differs from the classic RNA-Seq (polyA-selection) technique in a few key ways, even though both are based on targeting the eukaryotic messenger RNA (mRNA) molecules that have poly-A tails at their 3′ end.

In the standard RNA-Seq method, the extracted mRNAs are sheared into fragments and then reverse transcribed into complementary DNA (cDNA). Then this cDNA is sequenced. This fragmentation can sometimes introduce bias into the results, because the number of reads corresponding to each transcript is actually proportional to the number of cDNA fragments rather than the real number of transcripts. Longer transcripts can be sheared into more fragments, which means there may be more reads that correspond to them than the shorter transcripts. The results may falsely show that the genes with longer transcripts were expressed at higher levels than the genes with the shorter transcripts [2].

In 3’Tag RNA-Seq, the extracted mRNAs are actually not sheared into fragments at the beginning of the protocol. Instead, the cDNAs are only reverse transcribed from the 3′ end and only one copy of cDNA is generated for each transcript. Thus, when these cDNAs are sequenced, the number of reads directly reflects the number of transcripts of a gene, and there should be no bias for longer transcripts [2].
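
The difference between the two protocols can be illustrated with a simplified counting model. In the sketch below, standard RNA-Seq reads are assumed to scale with the number of fragments per transcript, while 3’Tag RNA-Seq yields one read per transcript. The gene names, transcript counts, transcript lengths, and the 200-nt fragment size are hypothetical values chosen only for illustration.

```python
def standard_rnaseq_reads(n_transcripts: int, length_nt: int, fragment_nt: int = 200) -> int:
    """Simplified model: each transcript is sheared into length // fragment size reads."""
    return n_transcripts * (length_nt // fragment_nt)


def tag_rnaseq_reads(n_transcripts: int) -> int:
    """Simplified model: one 3'-end library molecule, and so one read, per transcript."""
    return n_transcripts


# Two hypothetical genes expressed at the same level but with different transcript lengths.
genes = {
    "short_gene": {"n_transcripts": 100, "length_nt": 1000},
    "long_gene": {"n_transcripts": 100, "length_nt": 5000},
}

for name, gene in genes.items():
    standard = standard_rnaseq_reads(gene["n_transcripts"], gene["length_nt"])
    tag = tag_rnaseq_reads(gene["n_transcripts"])
    print(f"{name}: standard RNA-Seq reads = {standard}, 3'Tag RNA-Seq reads = {tag}")
```

Under this model the longer gene receives five times as many standard RNA-Seq reads despite identical expression, whereas the 3’Tag counts are equal. In practice, standard RNA-Seq analyses usually correct for this with length normalization.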

When is 3’Tag RNA-Seq recommended?

3’Tag RNA-Seq is recommended when you have eukaryotic samples and are only interested in DGE analysis for mRNA with polyA-tails. With this protocol, you will receive a very high-signal, low-noise gene expression profile at a much lower cost than standard RNA-Seq, and your samples don’t even have to be as high quality as many standard polyA-selection kits require. However, if you need any transcript-splicing information, have prokaryotic samples, or are interested in any types of RNA outside of mRNA with polyA-tails, then 3’Tag RNA-Seq is not recommended [3].

Here are some examples of articles where researchers used 3’Tag RNA-Seq:

  • In this study [4], 3’Tag RNA-Seq was used to determine if the researchers’ treatments enhanced Natural Killer (NK) cell activation and expansion in dogs with cancer. The NK cells they targeted are cytotoxic immune cells capable of recognizing heterogeneous cancer targets, which makes them very promising candidates for use in cellular immunotherapy.
  • The researchers here [5] performed 3’Tag RNA-Seq to identify the genes that were differentially expressed between healthy and injured rat tissue in order to determine which genes changed in response to injury.
  • In this paper [6], 3’Tag RNA-Seq was used to determine the differential gene expression between bovine ovarian cortex cells treated with human-recombinant FSH or vehicle (control). The researchers wanted to study the effects of this folliculogenesis-promoting factor in preantral follicles, which serve as a reservoir of female gametes that can be used in assisted reproduction in humans and other animals.

How can Genohub help you?

Genohub and our 3’Tag RNA-Seq partners are knowledgeable in every step of the process, including extraction, library preparation, sequencing and data analysis. Regardless of your next-generation sequencing (NGS) experience, we will help define your project, present you with the best quote from our network, connect you directly to our NGS partner, manage the project while it’s progressing, see to it that the results are delivered to you in a secure data bucket, and then make sure all your needs were met before considering the project complete. All these steps will occur on the Genohub platform, and we will be here to support you every step of the way. 

Specifically, our 3’Tag RNA-Seq partners have experience extracting from many different types of tissues and cell types, but they can work just as well with your RNA samples or libraries that you have prepared yourself. Our partners can also sequence at their most cost-effective configuration and then trim the reads down to what you need for your specific 3’Tag RNA-Seq analysis. 

We know that each research project is unique, so we are open to working with your custom analysis needs! Get started today by letting us know about your 3’Tag RNA sequencing project here: https://genohub.com/ngs/ .

References

  1. UC Davis DNA Technologies Core. “When do you recommend 3′-Tag RNA-seq?” DNA Tech Genome Center UC Davis, 24 August 2023, https://dnatech.genomecenter.ucdavis.edu/faqs/when-do-you-recommend-3-tag-rna-seq/.
  2. Ma, F., Fuqua, B.K., Hasin, Y. et al. A comparison between whole transcript and 3’ RNA sequencing methods using Kapa and Lexogen library preparation methods. BMC Genomics 20, 9 (2019). https://doi.org/10.1186/s12864-018-5393-3
  3. UC Davis DNA Technologies Core. “Gene Expression Profiling with 3′ Tag-Seq.” DNA Tech Genome Center UC Davis, 24 August 2023, https://dnatech.genomecenter.ucdavis.edu/tag-seq-gene-expression-profiling/.
  4. Razmara A, Farley L, Harris R, et al. 272 Pre-clinical evaluation and first-in-dog clinical trials of intravenous infusion of PBMC-expanded adoptive NK cell therapy in dogs with cancer. Journal for ImmunoTherapy of Cancer 2022;10. doi: 10.1136/jitc-2022-SITC2022.0272
  5. Danielle Steffen, Michael J. Mienaltowski, Keith Baar, Scleraxis and collagen I expression increase following pilot isometric loading experiments in a rodent model of patellar tendinopathy, Matrix Biology, Volume 109, 2022, Pages 34-48, ISSN 0945-053X, https://doi.org/10.1016/j.matbio.2022.03.006.
  6. Candelaria J., Rabaglino B., Denicol A. (2020) 125 Transcriptomic changes in bovine ovarian cortex in response to FSH signaling. Reproduction, Fertility and Development 32, 189-189. https://doi.org/10.1071/RDv32n2Ab125

Amplicon Sequencing – Short vs. Long Reads

Amplicon sequencing is a type of targeted sequencing that can be used for various purposes. Some common types of amplicon sequencing are 16S and ITS sequencing, which are used in phylogeny and taxonomy studies for the identification of bacteria and fungi, respectively. When there is a need to explore the genome more generally, amplicon sequencing can be used to discover rare somatic mutations, detect and characterize variants, and identify germline single nucleotide polymorphisms (SNPs), insertions/deletions (INDELs), and known fusions [1, 2]. Targeted gene sequencing panel projects are another example of amplicon sequencing, where these panels include genes that are often associated with a certain disease or phenotype-of-interest [3].

In this article, we will go over what amplicon sequencing is, describe the advantages and disadvantages of short- and long-read sequencing, and then explain how Genohub can help support your project.

Amplicon Sequencing

Amplicon sequencing is targeted sequencing that involves specific primer design in order to achieve high on-target rates. It’s called amplicon sequencing, because a crucial step of the process is polymerase chain reaction (PCR), which is a method that amplifies specific DNA sequences based on the primers used. Primers are small DNA oligos that are specifically designed to target only the genes/regions-of-interest. When the amplification part of PCR occurs, only these specific genes are multiplied. The final products of PCR are called amplicons, hence amplicon sequencing [1].
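
The way primers define an amplicon can be sketched as a small "in-silico PCR" on a string. The template and primer sequences below are invented for illustration, and the matching is exact; real primer design also has to account for mismatches, melting temperature, and off-target binding, none of which is modeled here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bounded by the two primers on the template, or None."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the template
    # strand its binding site appears as the primer's reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template: flanking sequence, region of interest, flanking sequence.
template = "TTTT" + "ACGTAC" + "GGGCCCAAATTT" + "CTGACT" + "AAAA"
forward_primer = "ACGTAC"
reverse_primer = "AGTCAG"  # reverse complement of CTGACT

print(in_silico_amplicon(template, forward_primer, reverse_primer))
```

Only the stretch between (and including) the two primer sites is returned, mirroring how PCR multiplies just the targeted region while the flanking sequence is left out.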

It’s important to think about what type of sequencing (short vs. long read) needs to be done for your specific project, because in order to sequence amplicon samples, the appropriate adapters need to be added to help them adhere to sequencing flow cells [2]. These adapters will differ depending on the flow cell, and in some cases, it may even be more cost-effective to send DNA samples and have one of our NGS partners perform all the library prep themselves.

Short read sequencing (Illumina)

Short-read amplicon sequencing is done with Illumina platforms, often the MiSeq, and has been the standard for 16S, ITS and other microbial profiling projects for many years. Being the standard for so long has advantages, as there are many targeted gene panels created and validated already for use with Illumina sequencing, which can make the workflow much easier on researchers who are new to targeted sequencing. There is also an abundance of literature with Illumina sequencing, so it’s easy for researchers to compare their findings to those of other groups. The biggest advantage is that researchers can sequence hundreds of genes in a single run, which lowers sequencing costs and turnaround time, especially if the researcher is interested in many different genes [1].

A disadvantage with short-read sequencing is that the sequencing resolution may not be as high as long-read sequencing. A comparison of short-read to long-read 16S amplicon sequencing showed that only long-read sequencing could provide strain-level community resolution and insight into novel taxa. Then for the metagenomics portion, a greater number of and more complete bacterial metagenome-assembled genomes (MAGs) were recovered from the data generated from long reads [4].

Long read sequencing (PacBio and Nanopore)

Long-read amplicon sequencing is done with either the PacBio or Oxford Nanopore platforms. They both offer complete, contiguous, uniform, and non-biased coverage across long amplicons up to 10 kb. An advantage of this type of long-read amplicon sequencing is that it can be more efficient, accurate and sensitive than short-read sequencing.

PacBio sequencing can obtain up to 99.999% single-molecule base calling accuracy and has been used to sequence full-length 16S and ITS sequences with very high accuracy as well [3].

Nanopore sequencing can provide accurate variant calling as well as robust coverage of larger targeted regions, which can help enhance the analysis of repetitive regions and improve taxonomic assignment [5]. Nanopore sequencing also tends to allow a bit more flexibility than PacBio sequencing when it comes to scaling amplicon projects at a cost-effective price [6].

The disadvantage of using long-read sequencing for amplicon projects is that it tends to be much more expensive and time-consuming than short-read sequencing, and long reads may not even be needed if the targeted amplicons themselves are already very short.

How can Genohub help you?

Genohub’s amplicon sequencing partners are experts in every step of the amplicon sequencing process, including extraction, PCR amplification, adapter ligation, library prep and data analysis. Our partners have experience extracting from many different types of environmental and biological samples, but they can work just as well with your DNA or amplicons if you prefer to extract and/or perform PCR in your own lab. From our experience, it’s more cost-effective to send DNA samples rather than amplicons, unless you can attach Illumina adapters yourself.

We know that each research project is unique, so we have partners who are also open to working with your custom primers, custom gene panels and custom bioinformatics needs! Get started today by letting us know about your amplicon sequencing project here: https://genohub.com/ngs/ .

Ribosome Profiling (Ribo-Seq): A High-Precision Tool to Quantify mRNA Translation

RNA-Seq has been used consistently for years as a way to determine gene expression by correlating mRNA levels to protein levels. However, the actual translation process in vivo cannot be completely captured by this method. This is because each mRNA molecule isn’t necessarily translated into protein by ribosomes. Ribosome Profiling was developed to help complete this picture.

In this blog, we’ll go over what Ribosome Profiling is, some real-world applications, a typical workflow, and how Genohub can help you with your Ribo-Seq project.

What is Ribosome Profiling?

In order to synthesize proteins, cells transcribe mRNA from DNA and then translate proteins from mRNA. Many researchers who want to study this gene expression process have used RNA-Seq, which provides data on the relative levels of mRNA within a cell. While the levels of specific mRNA often do correlate with the levels of particular proteins, standard RNA-Seq cannot provide actual data regarding gene regulation at the translational level. This is where Ribosome Profiling (Ribo-Seq) comes in.

Ribo-Seq is a sequencing method that uses specific ribosome-protected mRNA fragments (RPFs) to determine the mRNAs that are actively being translated in vivo. This snapshot can then be compared to parallel RNA-Seq done for the transcriptome to reveal the positions and amounts of ribosomes on any specific mRNA.

What are the applications of Ribo-Seq?

Ribo-Seq can help identify alternative mRNA translation start sites, confirm annotated open reading frames (ORFs) and upstream ORFs that may be involved in translation regulation, map the distribution of ribosomes on an mRNA, and estimate the rate at which ribosomes decode codons. As Ribo-Seq can provide data about gene expression, protein synthesis and protein abundance, it can be useful in almost every type of research, including research on cancer, autoimmune disease, heart disease, neurological disorders, and psychiatric disorders.

The following are examples where Ribo-Seq was used in different types of research.

  • Scheckel et al. used Ribo-Seq in combination with another technique to discover that aberrant translation within the glia only may be enough to cause severe neurological symptoms and may be a primary driver of prion disease.
  • In this paper, the authors summarize multiple studies where Ribo-Seq was used to identify novel genes within plants that could be useful to increase yield through biotic and abiotic stress tolerance if manipulated.
  • In this article, Ribo-Seq was used to reveal translated sequences within long noncoding RNAs and to identify other micropeptides within two herpesviruses, human cytomegalovirus and Kaposi’s sarcoma-associated herpesvirus. Understanding viral gene regulation and other aspects of the proteome are important for understanding their life cycle and identifying epitopes they may present for immune surveillance.

What is the typical Ribo-Seq workflow?

The typical Ribo-Seq workflow begins with collecting and preparing the lysate. First, the cells or tissue samples are harvested and flash-frozen to halt translation. Then, the samples are resuspended in a lysis buffer that includes a salt to stabilize the ribosomes, a detergent to disrupt the cell membrane, a deoxyribonuclease to degrade genomic DNA, a translation-inhibiting drug to halt the ribosome, and a reducing agent to stop oxidative compounds from interfering with RNA. After lysis, ribonucleases are added to digest the RNA that is not protected inside of the ribosomes. The fragments that remain are called ribosome-protected fragments (RPFs). Then size selection is performed to isolate the ~28 nucleotide RPFs on a gel, and RNA extraction is performed. Any contaminating rRNA is removed, and the RPFs are reverse-transcribed to cDNA, amplified by PCR and then made into libraries that are sequenced.

The data analysis done will ultimately depend on the researcher’s personal aim, but in general, ribosome profiling mapping would include data QC, demultiplexing and then removal of adapter sequences and any remaining rRNA contaminants. The samples would then be aligned to an annotated genome/transcriptome and then counts of the number of reads aligned to each gene would be obtained. These mapped RPFs can then be visualized and compared with what other researchers have done. More specific analysis can include uORF detection, differential gene expression, global translation rates, ribosome stalling, and codon decoding rates.
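
The adapter removal step, followed by a computational size filter that mirrors the ~28 nucleotide gel selection, can be sketched as below. This is only an illustration: the adapter string is a placeholder, the 26 to 32 nucleotide window is an assumed example range, the function names are hypothetical, and production pipelines use dedicated trimming tools that tolerate sequencing errors rather than exact string matching.

```python
def trim_adapter(read: str, adapter: str) -> str:
    """Cut the read at the first exact occurrence of the 3' adapter, if present."""
    index = read.find(adapter)
    return read if index == -1 else read[:index]


def select_footprints(reads, adapter, min_len=26, max_len=32):
    """Trim each read and keep only inserts whose length falls in the footprint window."""
    footprints = []
    for read in reads:
        insert = trim_adapter(read, adapter)
        if min_len <= len(insert) <= max_len:
            footprints.append(insert)
    return footprints


adapter = "AGATCGGAAGAGC"  # placeholder adapter sequence for illustration only

reads = [
    "ACGT" * 7 + adapter,          # 28 nt insert: kept
    "ACGTACGTACGTACG" + adapter,   # 15 nt insert: too short, dropped
    "T" * 40,                      # no adapter found and too long: dropped
]

kept = select_footprints(reads, adapter)
print(len(kept), kept)
```

The retained inserts are what would then be aligned to the annotated genome or transcriptome and counted per gene.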

Where can I get help with my Ribo-Seq project?

As of now the Illumina kit for Ribo-Seq, TruSeq Ribo Profile or ART-Seq, has been discontinued. There is a commercially available all-inclusive library preparation kit, called LACESeq by IMMAGINA Biotechnology. However, Ribo-Seq sample and library preparation is so complex and sample-specific that many labs have their own protocols optimized for their specific samples and then use their favorite commercial small RNA-Seq kit for the last part of library prep. For labs that don’t focus on this type of work, optimizing such a protocol can be very time-intensive and expensive.

Genohub’s Ribo-Seq partners are experts in every step of the Ribo-Seq process, from lysis to custom data analysis, including preparing and running RNA-Seq libraries in parallel, allowing for the measurement of translation efficiency. Our in-network partners also have experience in isolating ribosome-bound mRNA from many different types of samples, including bacteria and eukaryotic cells, and animal and plant tissue. Their proprietary optimized Ribo-Seq protocols mean they routinely produce high-quality libraries efficiently and effectively. All you would have to do is provide your frozen cell or tissue samples and let us do the rest. We will be with you every step of the way, from extraction to data analysis! Get started today by letting us know about your Ribo-Seq project here: https://genohub.com/ngs/ .

Illumina Unveils NextSeq 1000 & NextSeq 2000

Last week at the J.P. Morgan Healthcare Conference, Illumina presented their new sequencers, the NextSeq 1000 and NextSeq 2000. 

Strengths: The NextSeq 1000 and 2000 use patterned flow cells similar to the NovaSeq 6000 System that offer the highest cluster density flow cell of any on-market NGS system. To take full advantage of these higher density flow cells, they feature a novel super resolution optics system that is optimized to increase cluster brightness, reduce channel cross-talk, and improve signal-to-noise ratio. This should increase the output and reduce the cost per run compared to the previous NextSeq model (1). The system uses fluors, which both excite and emit with blue and green wavelengths. 

The major difference between the NextSeq 1000 and 2000 capacities is that only the 2000 will be able to handle the larger P3 flowcell. To compare the P2 and P3 flowcells at the 2×150 read length, the P2 flowcell will yield a similar number of clusters to the NextSeq 550 High Output kit for a similar runtime. The P3 flowcell will yield a number of clusters that is between the NovaSeq’s SP and S1 flowcells, although the run time is longer, which is likely due to the new super resolution technology. According to Illumina, the NextSeq 2000 will have a $20 per Gb cost, and the NextSeq 1000 will have a $30 per Gb cost (2).

Regarding downstream data analysis, these new sequencers also come with the DRAGEN system, which is both on-board and cloud-based. The DRAGEN (Dynamic Read Analysis for GENomics) Bio-IT Platform will enable our providers to automate a variety of genomic analyses, including BCL conversion, mapping, alignment, sorting, duplicate marking, and variant calling. According to Illumina, results can be generated in as little as 2 hours (1).

On the wet bench side of things, the NextSeq 1000 and 2000 reagents will also reduce the volume of the sequencing reactions. This volume reduction should decrease waste and minimize physical storage requirements. For example, one cartridge includes all reagents, fluidics and the waste holder (1), which will simplify library loading and instrument use. This should increase efficiency, reduce the chance of user error, lower the sequencing costs, improve recyclability and minimize waste volume. Ideally, these cost savings will then be passed on to our clients. 

Applications: According to Illumina, the new applications available on the NextSeq 1000 and 2000 are small whole-genome sequencing, whole exome sequencing and single-cell RNA-Seq (1), applications which are useful for research in oncology, genetic disease, reproductive health, agrigenomics, etc. 

As some analysis examples, the new DRAGEN Enrichment Pipeline can be applied to whole exome sequencing and targeted resequencing with alignment, small variant calling, somatic variant calling, SV/CNV calling and custom manifest files. The DRAGEN RNA Pipeline can be applied to whole transcriptome gene expression and gene fusion detection with alignment, fusion detection and gene expression. Other standardized DRAGEN pipelines include DRAGEN-GATK, DNA/RNA targeted panels and single-cell sequencing. A more complete list is available here.

Release Date: The NextSeq 2000 is available for order now, but both the NextSeq 2000 and 1000 will only be available for shipment in Q4 2020. The NextSeq 1000 has a list price of $210,000 and the NextSeq 2000 has a list price of $335,000 (2). We have already added the instrument specifications to our database, so providers can start listing their NextSeq 1000 and 2000 services as soon as they are ready.  

Overall, the new NextSeq 1000 and 2000 seem like solid desktop upgrades and also good testing ground for the new super resolution technology. If it goes well, there may be an upgraded version of the NovaSeq unveiling in the future.

10X Genomics: Combining new and old techniques to unlock new insights

Illumina sequencing is by far the most common next-generation sequencing technique used today, as it is extremely accurate and allows for massively parallel data generation, which is incredibly important when you’re working with a dataset the size of a human genome!

That said, there are inherent shortcomings in the typical Illumina sequencing workflow. Illumina uses a very high number of short sequencing reads (usually about 150 bp for whole genome sequencing) that are then assembled to cover the entirety of the genome. The fact that traditional Illumina sequencing can’t be used to identify long-range interactions can cause issues in some cases, such as samples with large structural variants or when phasing haplotypes.

However, a revolutionary new library preparation method designed by 10x Genomics can effectively solve these types of issues. The 10X Genomics GemCode technology is a unique reagent delivery system that allows for long-range information to be gathered from short-read sequencing. It does this by using a high-efficiency microfluidic device that releases gel beads containing unique barcodes and enzymes for library preparation. It then takes high molecular weight DNA and partitions it into segments that are about 1M bp long; from here, these segments of DNA are combined with the gel beads. This means that every read that comes from a given segment of DNA carries that segment’s unique barcode, which gives us knowledge about long-range interactions from traditional short reads.
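
The idea of recovering long-range information from barcoded short reads can be sketched as follows. This is a toy illustration with invented barcodes and alignment positions and hypothetical function names. It assumes the reads have already been aligned and that each barcode corresponds to a single DNA segment, which real linked-read software does not take for granted.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Group aligned read positions by barcode and sort them within each group."""
    groups = defaultdict(list)
    for barcode, position in reads:
        groups[barcode].append(position)
    return {barcode: sorted(positions) for barcode, positions in groups.items()}


def barcode_spans(reads):
    """Return the (first, last) aligned position covered by each barcode's reads."""
    return {
        barcode: (positions[0], positions[-1])
        for barcode, positions in group_by_barcode(reads).items()
    }


# Invented (barcode, aligned position) pairs for short reads on one chromosome.
reads = [
    ("AAAC", 1_000),
    ("GGTT", 52_000),
    ("AAAC", 48_500),
    ("AAAC", 91_200),
    ("GGTT", 60_300),
]

spans = barcode_spans(reads)
for barcode, (first, last) in sorted(spans.items()):
    print(barcode, first, last, last - first)
```

Although each read is short, reads that share a barcode can be linked across tens of kilobases, which is what makes phasing and structural variant detection possible with this approach.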

Figure 1: Projections of 20,000 brain cells where each cell is represented as a dot. (A) Shading highlights major clusters identified in the dataset. (B) Cells were colored based on their best match to the average expression profile of reference transcriptomes [2].

Another application of 10X Genomics GemCode comes in the form of single-cell sequencing, which also uses the microfluidics device, but combines individual cells with the gel beads instead of DNA fragments. This allows for sequencing and barcoding of individual cells from a larger heterogeneous sample. This can work for DNA or RNA sequencing. 10x Genomics recently published an application of this technology using the Chromium Single Cell 3’ Solution on a mouse brain. Cells from embryonic mouse brains were sequenced and profiled using this technique; principal component analysis and clustering were then performed on the resulting data to separate out the distinct cell types, identifying 7 major classes of cell types, as seen in Figure 1 [1].

Traditional Illumina sequencing will still likely reign supreme for run-of-the-mill applications, since at this point it is still more cost effective. However, the 10x system is gaining in popularity for specialized applications where understanding structural variants or single cell sequencing is important to the goals of the project. We’ve certainly noticed an uptick in projects that require 10x technology recently, and we look forward to seeing the advances that can be made with this amazing technology.

If you’re interested in projects using 10x Genomics sequencing tech, please contact us at projects@genohub.com for more information!

16S sequencing vs. Shotgun metagenomics: Which one to use when it comes to microbiome studies

While a lot of attention has been paid in recent years to the advances made in sequencing the human genome, next-generation sequencing has also led to an explosion of sequencing used to study microbiomes. There are two common methods of sequencing performed to study the microbiome: 16S rDNA sequencing and shotgun metagenomics.

What is 16S sequencing?

The 16S ribosomal gene is thought to exist in all bacteria, but still has regions that are highly variable between species. Because of this, primers have been created to amplify conserved regions that surround variable regions, allowing researchers to target the areas of the genes that are similar to observe the areas that are distinct. Because this approach allows us to observe very specific regions of the genome, we can drop the sequencing needed per sample dramatically, only needing around 50,000 – 100,000 reads to identify different bacterial species in a sample.
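
To see what this read requirement means in practice, the short calculation below estimates how many samples could share one sequencing run. The run output of 20 million reads is an assumed round number for illustration, not a specification of any instrument, and the function name is hypothetical.

```python
def samples_per_run(reads_per_run: int, reads_per_sample: int) -> int:
    """Return how many samples fit in one run at the requested depth per sample."""
    return reads_per_run // reads_per_sample


run_reads = 20_000_000  # assumed run output, for illustration only

for depth in (50_000, 100_000):
    print(depth, samples_per_run(run_reads, depth))
```

Under these assumptions, a few hundred samples could be multiplexed in a single run, which is why 16S studies scale well to large sample sets.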

The main drawback of this technique is that it can only identify bacteria, and does not identify viruses or fungi.

What is shotgun metagenomics?

Shotgun metagenomics surveys the entire genomes of all the organisms present in the sample, as opposed to only the 16S sequences. One of the main advantages of this over 16S sequencing is that it can capture sequences from all the organisms, including viruses and fungi, which cannot be captured with 16S sequencing. Additionally, it’s less susceptible to the biases that are inherent in targeted gene amplification.

Perhaps most interestingly, it can also provide direct information about the presence or absence of specific functional pathways in a sample, also known as the ‘hologenome’. This can provide potentially important information about the capabilities of the organisms in the community. Furthermore, shotgun metagenomics can be used to identify rare or novel organisms in the community, which 16S cannot do.

So which one should I use?

Like anything else, it really depends. 16S studies can be incredibly useful for comparison across different samples (like different environments, or different time points). And some studies have found that 16S sequencing is superior in these types of studies for identifying a higher number of phyla in a particular sample [1], while other studies have of course found the exact opposite [2].

When it comes down to it, it’s really important to evaluate your project needs carefully depending on what you’re trying to accomplish. For example, a large-scale project that looks to examine hundreds of samples in order to evaluate the differences in microbiota across different environments might very well prefer to use 16S sequencing, since it is so much more cost-efficient than metagenomics sequencing. On the other hand, a project that is looking to deeply investigate a smaller number of samples might be a better candidate for metagenomics sequencing, which would allow them to uncover all the organisms that are present in a particular sample (including viruses and fungi), as well as identify the most dominant gene pathways that are present in that particular sample.