Single Cell Experiments: How to Detect Impurities

I shared it here first.

A few months ago, we received an excellent question from a few colleagues. Here are some resources that might help you understand some of the ideas behind single-cell experimental designs.

Barnyard metrics

A multi-species mixing experiment, also known as a barnyard experiment, is used to test how pure single-cell encapsulation is. You can think of it as a quality check for single-cell encapsulation (and cell integrity). Usually, cells from two different species, human and mouse, are chosen.

When do you design a barnyard experiment?

When you invent or develop a droplet encapsulation technology (e.g., Drop-seq, 10X Chromium, PIP-seq), you usually need this kind of validation. Moreover, if you modify an established protocol, and especially if you worry that the change might cause cell breakage or lysis, you might want to run the standard barnyard metrics as well.

How exactly does it work?

Cells from the two species are mixed at a 1:1 ratio before single-cell encapsulation. Later, the reads are aligned to a hybrid reference genome (e.g., human and mouse combined). A droplet that captured a single cell should yield reads from almost exclusively one species, so cell barcodes with a substantial share of reads from both species easily reveal impurities, crosstalk, doublets, or multiplets.
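To make the idea concrete, here is a minimal sketch of how cell barcodes could be called from per-species UMI counts after hybrid alignment. The function name, the 90% purity cutoff, the 100-UMI minimum, and the example counts are all my own assumptions for illustration; real pipelines choose their thresholds differently.

```python
from collections import Counter

def classify_barcode(human_umis, mouse_umis, purity=0.9, min_umis=100):
    """Call a barcode from its species-specific UMI counts.

    purity and min_umis are hypothetical cutoffs chosen for illustration.
    """
    total = human_umis + mouse_umis
    if total < min_umis:
        return "low_count"  # too few UMIs to make a call
    human_fraction = human_umis / total
    if human_fraction >= purity:
        return "human"
    if human_fraction <= 1 - purity:
        return "mouse"
    return "mixed"  # likely a cross-species doublet/multiplet

# Made-up (human, mouse) UMI counts per cell barcode
counts = {
    "AAAC": (950, 20),
    "AAAG": (15, 1200),
    "AACT": (600, 550),
    "AAGT": (30, 10),
}

calls = {bc: classify_barcode(h, m) for bc, (h, m) in counts.items()}
summary = Counter(calls.values())
print(calls)
print(summary)
```

Plotting human versus mouse UMIs per barcode and coloring the points by these calls gives the classic barnyard plot.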

Simplified barnyard plot (not real) to exclude the doublets

However, RNA leakage before encapsulation (so-called ambient RNA, which can come from stressed or less viable cells) or a heterogeneous cell population (e.g., smaller cells captured together with bigger cells) might mislead such an analysis.
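One rough way to get a feeling for ambient RNA in a barnyard experiment is to look at the share of "wrong-species" UMIs in barcodes that otherwise look like single cells. The sketch below is only an illustration under my own assumptions: the function name and the counts are made up, and the number it gives is a lower bound at best, since ambient RNA from the same species is invisible here.

```python
from statistics import median

def cross_species_fraction(human_umis, mouse_umis):
    """Fraction of UMIs that come from the minority species."""
    total = human_umis + mouse_umis
    if total == 0:
        return 0.0
    return min(human_umis, mouse_umis) / total

# Made-up (human, mouse) UMI counts for barcodes that look like single cells
singlets = [(950, 20), (15, 1200), (2000, 40), (8, 792)]

fractions = [cross_species_fraction(h, m) for h, m in singlets]
median_contamination = median(fractions)
print(f"median cross-species fraction: {median_contamination:.3f}")
```

If this value creeps up after you change a protocol step, it might hint at more lysis or leakage before encapsulation.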

Read length, sequencing depth, and library complexity also matter: short reads or shallow sequencing might not provide enough resolution for multi-species RNA-seq alignment (please check out the further reading). Please make sure that you really need such an experiment. Apart from this, there are tools that estimate doublets based on the unique genes/reads detected per cell barcode.

(A little self-criticism) As molecular biologists, we sometimes adopt established technologies/methods over newer approaches just to follow the methods of previous papers, which can make us miss alternative options.

So, what might be the alternative?

Cell Hashing

Yup, there are groups of smart people who keep coming up with excellent ideas that push the field forward. It is like magic (good science is hard to distinguish from magic, huh?).

A simplified depiction of how cell hashing (using cell surface targeting antibodies with barcodes) works

Basically, you label each of your samples with hashtag oligos (HTOs) before mixing the populations. Instead of optimizing the workflow for multiple species and a downstream hybrid alignment, you hash the cells beforehand so that each sample already carries its own unique barcode. Then you demultiplex: each droplet that supposedly contains a single cell is assigned to a sample by its hashtag, and droplets carrying more than one hashtag are flagged as doublets or multiplets. It might also be useful for distinguishing low-quality cells from ambient RNA.
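The demultiplexing logic can be sketched in a few lines. This is a deliberately simplified illustration, not how any particular tool does it: the function name, the hashtag names, the counts, and the fixed threshold of 50 are my assumptions, whereas real demultiplexing tools estimate a threshold per hashtag from the data.

```python
def demultiplex(hto_counts, threshold=50):
    """Assign a cell barcode to a sample from its hashtag (HTO) counts.

    threshold is a hypothetical fixed cutoff used only for illustration.
    """
    positive = [tag for tag, n in hto_counts.items() if n >= threshold]
    if not positive:
        return "negative"  # no hashtag detected: empty or low-quality droplet
    if len(positive) == 1:
        return positive[0]  # singlet from one sample
    return "multiplet:" + "+".join(sorted(positive))

# Made-up HTO counts per cell barcode
cells = {
    "AAAC": {"HTO_A": 800, "HTO_B": 5, "HTO_C": 3},
    "AAAG": {"HTO_A": 10, "HTO_B": 640, "HTO_C": 590},
    "AACT": {"HTO_A": 4, "HTO_B": 7, "HTO_C": 2},
}

assignments = {bc: demultiplex(c) for bc, c in cells.items()}
print(assignments)
```

Note that a doublet made of two cells from the same sample carries only one hashtag, so it still looks like a singlet here.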

Want to learn more? This approach is also used to reduce cost, because you can combine multiple conditions/cell types/samples in one 10X load (let’s say 3k cells for three different conditions, unless you are modifying the reaction mixture for each separately). Furthermore, CITE-seq, the method that cell hashing builds on, was originally developed to detect cell surface markers with antibody-derived tags (ADTs) and relate them to gene expression in a multimodal experimental setup.
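Pooling more samples also changes how many doublets you can actually see. If two cells end up in a droplet at random, the doublet is only detectable when the two cells carry different labels, which happens with probability 1 minus the sum of the squared sample proportions. The small sketch below works this out; the function name is mine, and it assumes random pairing and ignores higher-order multiplets.

```python
def detectable_doublet_fraction(proportions):
    """Fraction of random doublets whose two cells carry different labels."""
    total = sum(proportions)
    normalized = [p / total for p in proportions]
    return 1 - sum(p * p for p in normalized)

barnyard = detectable_doublet_fraction([1, 1])        # two species, 1:1
three_samples = detectable_doublet_fraction([1, 1, 1])  # three hashed samples
eight_samples = detectable_doublet_fraction([1] * 8)    # eight hashed samples

print(f"1:1 barnyard:  {barnyard:.3f}")
print(f"3 hashtags:    {three_samples:.3f}")
print(f"8 hashtags:    {eight_samples:.3f}")
```

So a 1:1 barnyard reveals only about half of the doublets, while hashing with more samples reveals a larger share.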

I want to expand this post to cover total RNA-seq discussions as well (I don't know when yet, but stay tuned!).

Further Reading/References:


Ortaya Karışık (Fatma Betul Dincaslan)

Written by Ortaya Karışık (Fatma Betul Dincaslan)

FeBe/ Molecular Biologist and Geneticist / Bioinformatician/ Single Cell Assayist / Socially developed nerd