Tissue Specificity Metrics

Published here first. I have recently been reading about cell and tissue specificity, particularly in the single-cell context. Before sharing related content here, I would like to introduce the concept of tissue specificity, drawing on my master’s thesis.

As human beings, we make sense of the world through pattern recognition and classification. Interestingly, gene expression has patterns too.

Some genes are expressed at roughly constant levels across conditions, cells, and tissues, whereas the expression of most genes changes. Genes with stable expression are usually called “housekeeping” genes. Other genes vary between tissues: those expressed only in one tissue, or much more highly in it than elsewhere, are called “tissue-specific”.

So, how do we decide whether a gene is tissue-specific or not?

Defining (Tissue) Specificity

Several factors deserve attention when defining tissue specificity (TS). On the biological side:

  1. The accuracy of tissue location and definition
  2. The sex and age of the patient/model organism, or the passage number of cell lines
  3. The dynamic nature of gene expression, particularly in single-cell level analysis
  4. Previous treatments, therapeutic interventions, or underlying conditions, if any, and the time elapsed since stimulation or treatment

And on the analysis side:

  1. How the data is pre-processed and normalized
  2. Which metrics are chosen
  3. How the threshold is determined

If you have read about the Human Protein Atlas, the Human Cell Atlas, or Tabula Muris/Sapiens, you might have an idea of how tissue identity is validated; tissue staining is usually used.

Tissue expression patterns change with age. How age is defined also matters, since chronological age does not track the biological aging process equally in every individual.

Some tissues might have sex-specific gene expression patterns.

To reduce confounding factors and, ideally, provide a valid reference point, it is good to have “healthy” tissues (although the definition of healthy varies, since not every organ or patient is healthy at the time of the biopsy). Usually, tissue adjacent to the disease site (i.e., non-tumor tissue) is collected for this purpose.

Library preparation, data pre-processing, normalization, and transformation methods strongly affect the results. When combining tissues from multiple databases, it is important to select data produced with similar pipelines; ideally, data pre-processed in the same way are combined and analyzed together.

I will briefly discuss the metrics and how the cutoffs were chosen in the next section.
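
To make the normalization point concrete, here is a minimal Python sketch of one common approach: sequencing-depth scaling to counts per million (CPM) followed by a log2 transform. The function names, the pseudocount of 1, and the toy matrix are my own assumptions for illustration; this is not necessarily the pipeline used in the thesis.

```python
import numpy as np

def counts_per_million(counts):
    """Scale each sample (column) so that its counts sum to one million.

    Assumes rows are genes, columns are samples, and no sample has zero reads.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0, keepdims=True)
    return counts / library_sizes * 1e6

def log2_transform(values, pseudocount=1.0):
    """Log2 transform with a pseudocount so that zeros stay finite."""
    return np.log2(np.asarray(values, dtype=float) + pseudocount)

# Hypothetical toy matrix: 3 genes (rows) x 3 samples (columns).
raw = np.array([
    [10, 200, 0],
    [90, 800, 50],
    [0, 0, 50],
])

cpm = counts_per_million(raw)
log_cpm = log2_transform(cpm)
print(cpm[:, 0])  # first sample after depth scaling
```

Two datasets normalized with different choices here (for example, CPM versus a length-aware measure) are not directly comparable, which is why mixing pipelines can distort TS scores.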

TS Metrics

I used and compared these metrics in the context of the tissue-specific transcriptome of zebrafish and its subsequent application to acetylcholinesterase mutant embryos.

The summary table from my thesis.

As you will notice, some TS metrics, such as Tau (be careful with the denominator), depend on the number of tissues available for comparison. They are therefore more sensitive to the addition of new tissue data; however, they might be useful for genes that are ubiquitous but highly variable. Some metrics, such as Counts, are so strict that a gene is considered TS only if it shows distinctly high expression in certain tissues or is expressed in a single tissue. Other metrics, such as Hg, do not use the absolute gene expression but rather the relative distribution of a gene's expression across tissues.
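
As an illustration of the metrics mentioned above, here is a minimal Python sketch using their commonly cited definitions: Tau as the sum of (1 - x_i / max(x)) over tissues divided by (n - 1), and Hg as the Shannon entropy of a gene's relative expression across tissues. The function names, the handling of unexpressed genes, and the expression threshold in the counting helper are my assumptions, and the thesis may define these details differently.

```python
import numpy as np

def tau(expr):
    """Tau for one gene: sum(1 - x_i / max(x)) / (n - 1).

    Assumes non-negative expression values. Note the denominator is
    n - 1 (number of tissues minus one), not n.
    """
    x = np.asarray(expr, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("Tau needs at least two tissues")
    x_max = x.max()
    if x_max <= 0:
        return float("nan")  # gene not expressed anywhere: Tau is undefined
    return float(np.sum(1.0 - x / x_max) / (n - 1))

def hg(expr):
    """Shannon entropy (in bits) of the gene's relative expression.

    Only the proportions p_i = x_i / sum(x) matter, not absolute levels.
    Low entropy means specific; log2(n) means perfectly uniform.
    """
    x = np.asarray(expr, dtype=float)
    total = x.sum()
    if total <= 0:
        return float("nan")
    p = x / total
    p = p[p > 0]  # treat 0 * log2(0) as 0
    return float(-np.sum(p * np.log2(p)))

def expressed_tissue_count(expr, threshold=1.0):
    """Number of tissues where expression exceeds a (hypothetical) threshold."""
    return int(np.sum(np.asarray(expr, dtype=float) > threshold))

print(tau([5, 5, 5, 5]))   # uniform expression -> 0.0
print(tau([0, 0, 10, 0]))  # single-tissue expression -> 1.0
print(tau([10, 5]))        # -> 0.5
print(tau([10, 5, 0]))     # same gene, one more tissue -> 0.75
print(hg([5, 5, 5, 5]))    # uniform over 4 tissues -> 2.0
```

The last two Tau calls show the sensitivity to tissue number: adding a single tissue where the gene is silent moves the score from 0.5 to 0.75 without any change in the original measurements.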

The distribution of TS scores for the sequencing-depth-normalized data I used in the thesis.

For most TS metrics, the scores follow a U-shaped distribution because values near 0 and 1 are more likely than any single intermediate value. Still, many genes have intermediate tissue specificity scores. The TS cutoff is therefore somewhat arbitrary (e.g., ≥ 0.8 was chosen for Tau in most cases).

Fun fact: The Human Protein Atlas uses Tau.

Most TS web tools use different metrics to define TS. There is more to discuss regarding TS; however, I will stop here to keep this intro lightly detailed. Definitions and classifications have been evolving as well. Stay tuned for the next post to learn how these considerations and values might be useful.
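
Applying a cutoff and checking the shape of the score distribution can be sketched in a few lines of Python. The scores below are made-up values chosen to mimic a U shape, and the helper names and the number of bins are assumptions for illustration only; the 0.8 cutoff is the one mentioned above.

```python
import numpy as np

TAU_CUTOFF = 0.8  # cutoff mentioned in the text; treat it as adjustable

def call_tissue_specific(scores, cutoff=TAU_CUTOFF):
    """Boolean mask of genes whose TS score is at or above the cutoff."""
    return np.asarray(scores, dtype=float) >= cutoff

def score_histogram(scores, n_bins=5):
    """Bin TS scores on [0, 1] to inspect the shape of the distribution."""
    counts, edges = np.histogram(
        np.asarray(scores, dtype=float), bins=n_bins, range=(0.0, 1.0)
    )
    return counts, edges

# Hypothetical Tau scores for nine genes (not real data).
scores = np.array([0.02, 0.05, 0.10, 0.45, 0.55, 0.81, 0.95, 0.98, 1.0])

is_ts = call_tissue_specific(scores)
counts, edges = score_histogram(scores)

print(int(is_ts.sum()))  # genes called tissue-specific -> 4
print(counts.tolist())   # -> [3, 0, 2, 0, 4]
```

Rerunning the call with a different `cutoff` (say 0.7 or 0.9) is a quick way to see how sensitive the TS gene list is to this choice.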

Ortaya Karışık (Fatma Betul Dincaslan)

Written by Ortaya Karışık (Fatma Betul Dincaslan)

FeBe/ Molecular Biologist and Geneticist / Bioinformatician/ Single Cell Assayist / Socially developed nerd