Turrial: How to use geneGATer#

geneGATer is a Python package designed to facilitate the identification and ranking of potential communication genes from spatial transcriptomics data. The package integrates an adapted metagene construction method by Hue et. al. [2022] and a Graph Attention Network (GAT) to enable the exploration of cell-to-cell communication without relying on explicit cell-type annotations.

First we need to download the latest build of the package from the geneGATer github repo, if we haven’t done so yet:

!python -m pip install git+https://github.com/dertrotl/geneGATer.git@main

import geneGATer as gt
import squidpy as sq
import numpy as np
import pandas as pd
import scanpy as sc

We demonstrate the geneGATer pipeline on the squidpy coronal mouse brain section Visium dataset. The turorial is split in two parts. In part one, we apply the gt.tl.getComGenes() function, to extract the potential communication gene candidates. In part two, we show how we can use geneGATer to learn a graph attention model (GAT) to extract the most “influential” genes by ranking them. We start by loading the dataset and preprocessing it.

adata = sq.datasets.visium_hne_adata()
adata.raw = adata
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.neighbors(adata)

Part I#

Next we apply a standard pre-clustering. The “cluster-key” parameter specifies where to store the clusterin in the adata.obs section.

new_cluster = gt.pp.pre_clustering(adata, cluster_key = "new_cluster", resolution = 1)

sq.pl.spatial_scatter(adata, color="new_cluster")

../_images/c7308ee91efff2a052df9c3ab6fde5557a2462378bd3ce902752d5b688d99a2d.png

We see, that we clustered our spots in 12 different regions. Next, we want to extract potential communication candidates by building metagenes for each cluster in our pre-clustering. For this, we have to input our adata object and the pre-clustering from the first steps. This will take some minutes.

result = gt.tl.getComGenes(adata, new_cluster, tresh=0.0, groupby="new_cluster", verbosse = False)

plot = gt.pl.roc_auc_classification(adata=adata, all_metas = result[2], raw_cluster=new_cluster)

../_images/81f3529fe85560dd51e8a7542afb88705bf376f5aa4aedeb49e6d4e934f373b8.png

“result” saves multiple outputs, indexing “result[2]” gives us the metagene expression of each cluster for each spot. To validate the results of our metagenes, we plotted the above matrix, which highlights the roc-auc scores of the spot clustering by metagenes.

Part II#

Next, we want to train a GAT model to rank the our communication candidates by their importance, according to our model and validate these results. We do this by simply running the gt.tl.learn_model() function and setting a model type and loss function (nn negative binomial, nn poisson or mse loss). For this we need a list of genes, which are ideally the genes we extracted in part I (stored in “result[4]”).

np.save('squid_genes.npy', result[4]) 

gene_list = result[4]

model, data = gt.tl.learn_model(adata, gene_list = gene_list, model_type = "GAT", loss = "mse", epochs = 100)

Tracking run with wandb version 0.15.12

Run data is saved locally in /Users/benjaminweinert/gnn_test/wandb/run-20231024_165811-s03qgv1q

Syncing run elated-fog-26 to Weights & Biases (docs)

View project at https://wandb.ai/thesis_weinert/my_model

View run at https://wandb.ai/thesis_weinert/my_model/runs/s03qgv1q

GAT(
  (conv1): GATv2Conv(293, 293, heads=1)
  (conv2): GATv2Conv(293, 18078, heads=1)
)
Epoch:  10.0  Loss:  9.472657203674316
Epoch:  20.0  Loss:  1.3256076574325562
Epoch:  30.0  Loss:  0.6214395761489868
Epoch:  40.0  Loss:  0.47567155957221985
Epoch:  50.0  Loss:  0.14159516990184784
Epoch:  60.0  Loss:  0.10813288390636444
Epoch:  70.0  Loss:  0.07258807122707367
Epoch:  80.0  Loss:  0.0632665753364563
Epoch:  90.0  Loss:  0.06014275178313255

Waiting for W&B process to finish... (success).

Run history:

Test Eval Error	█▂▂▁▁▁▁▁▁▁
Test R2	▁▇▇███████
Test R2_lin	▁▁▂▄▄▆▇███
Train Eval Error	█▂▂▁▁▁▁▁▁▁
Train R2	▁▇▇███████
Train R2_lin	▁▁▂▄▄▆▇███
Validation Eval Error	█▂▂▁▁▁▁▁▁▁
Validation R2	▁▆▇███████
Validation R2_lin	▁▁▂▄▄▆▇███
epoch	▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
loss	▁█▂▂▃▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:

Test Eval Error	0.05811
Test R2	-0.11428
Test R2_lin	0.6006
Train Eval Error	0.0591
Train R2	-0.1429
Train R2_lin	0.59605
Validation Eval Error	0.05957
Validation R2	-0.11087
Validation R2_lin	0.59305
epoch	100
loss	0.05902

View run elated-fog-26 at: https://wandb.ai/thesis_weinert/my_model/runs/s03qgv1q
View job at https://wandb.ai/thesis_weinert/my_model/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEwOTUyNDA2Mw==/version_details/v0
Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)

Find logs at: ./wandb/run-20231024_165811-s03qgv1q/logs

For monitoring the learning performance, we use “weights & biases” (wandb). The results and the progress is directly uploaded to a wandb instance. Furthermore, we can analyse our results by using some of the plotting functions of this package. (The results indicated here can be significantly improved by learning up to 1000 epochs.)

gt.pl.attention_umap(model, data, adata, n_components=700, cluster_key="new_cluster", umap_model="umap") #tsne works as well

(<Figure size 1000x1000 with 1 Axes>,
 array([[5.2995124, 4.820972 ],
        [4.659695 , 4.6405478],
        [4.6481504, 4.5228744],
        ...,
        [2.0355809, 4.9695773],
        [6.297827 , 5.3710666],
        [6.3939495, 1.6471777]], dtype=float32))

../_images/f06701f632b244589267a5fe0c19e094b28e9195eee0b52ea4b22f774efb84d0.png

We can get the top ranked genes by two different ranking methods:

Using classical saliancy scores using the gt.pl.get_top_k_genes_saliency() function.
Using receiving and sending weight signals using the gt.pl.get_top_k_genes() function.

Furthermore we can compare the top ranked genes by a list of marker genes. In this tutorial, we want to compare our results with found ligand and receptor interactions using the CellphoneDB implementation in Squidpy. Genes which are also present in the marker gene list, are marked with an asterisk (*).

# Extracting L&R interactions using CellphoneDB and Squidpy

adata.obs["new_cluster"] = new_cluster
res = sq.gr.ligrec(adata, "new_cluster",
            fdr_method=None,
                   copy=True,
            interactions_params={"resources": "CellPhoneDB"},
            threshold=0.1, seed=0, n_perms=10000, n_jobs=1, use_raw=False) 

df = res["pvalues"]
print("Number of CellPhoneDB interactions:", len(df))

ligand_receptor_space = []

for tuples in range(0,res["metadata"].index.values.shape[0]):
    for idx in range(0,2):
        ligand_receptor_space.append(res["metadata"].index.values[tuples][idx])
ligand_receptor_space = list( set(ligand_receptor_space) )
print("Number of unique Ligand / Receptor Genes:", len(ligand_receptor_space))


gene_list_upper = [x.upper() for x in gene_list]

Number of CellPhoneDB interactions: 654
Number of unique Ligand / Receptor Genes: 493

gt.pl.get_top_k_genes_saliency(model, data, gene_list_upper, 10, marker_genes = ligand_receptor_space) # Plotting ranked genes by ranking method 1

(<Figure size 1000x600 with 1 Axes>,
   all_genes
   PVALB
     NOV
   TYRO3
   CABP7
   RAB3C
 ST6GAL2
  SH3GL3
 CAMK2N1
   SEPT8
    RPS8)

../_images/a79f793a9d4caaf351d6ba7b7a18528b46cdf4cb9ddae81190ca99932e307899.png

res = gt.pl.get_top_k_genes(model, gene_list_upper, 10, marker_genes = ligand_receptor_space) # Plotting ranked genes by ranking method 2

../_images/0a8ed60750294a2a46c68ad7acff3067f013f573fe332867c90beec016463c92.png

../_images/464e9bb8700ea0f0f5ce7fb20a221e2fa2b4a67ec9a8f3a14be9d9a7b2ece142.png

More plotting functions can be found in the documentation.

Turrial: How to use geneGATer

Contents

Turrial: How to use geneGATer#

Part I#

Part II#

Run history:

Run summary: