cellmapper.CellMapper#

class cellmapper.CellMapper(query, reference=None)#

Mapping of labels, embeddings, and expression values between reference and query datasets.

Attributes table#

mapping_operator

Get the mapping operator for applying matrix powers.

query_imputed

Get the imputed query data.

Methods table#

compute_fast_cca([n_comps, key_added, ...])

Compute a joint embedding using fast CCA on the reference and query datasets.

compute_joint_pca([n_comps, key_added])

Compute a joint PCA on the normalized .X matrices of query and reference, using only overlapping genes.

compute_mapping_matrix([kernel_method, ...])

Compute the mapping matrix for label transfer.

compute_neighbors([n_neighbors, use_rep, ...])

Compute nearest neighbors between reference and query datasets.

compute_presence_score([groupby, key_added, ...])

Estimate raw presence scores for each reference cell based on query-to-reference connectivities.

evaluate_expression_transfer([layer_key, ...])

Evaluate the agreement between imputed and original expression in the query dataset, optionally per group.

evaluate_label_transfer(label_key[, ...])

Evaluate label transfer using a k-NN classifier or externally computed predictions.

load_precomputed_distances([distances_key, ...])

Load precomputed distances from the AnnData object.

map([obs_keys, obsm_keys, layer_key, t, ...])

Map data from reference to query datasets.

map_layers(key[, t, diffusion_method, ...])

Map expression values with optional multi-step diffusion and library size adjustment.

map_obs(key[, t, diffusion_method, ...])

Map observation data from reference dataset to query dataset.

map_obsm(key[, t, diffusion_method, ...])

Map embeddings with optional multi-step diffusion.

plot_confusion_matrix(pred_key, *[, ...])

Plot a confusion matrix heatmap comparing true vs predicted labels.

process_presence_scores(scores[, log, ...])

Post-process presence scores with log1p, percentile clipping, and min-max normalization.

register_external_predictions(label_key[, ...])

Register externally computed predictions for evaluation.

Attributes#

CellMapper.mapping_operator#

Get the mapping operator for applying matrix powers.

The mapping operator encapsulates the mapping matrix and provides methods for applying matrix powers M^t for t-step diffusion processes.

Returns:

MappingOperator The mapping operator containing the validated and normalized mapping matrix

Raises:

ValueError – If the mapping matrix has not been computed yet
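The idea of applying matrix powers M^t can be illustrated with a small NumPy sketch. It is not the library's implementation: the helper names (`row_normalize`, `power_iterative`, `power_spectral`) are hypothetical, and the spectral variant is shown on a symmetric matrix only, which is an assumption made here so that `numpy.linalg.eigh` applies.

```python
import numpy as np

def row_normalize(weights):
    """Scale each row of a non-negative matrix so that it sums to 1."""
    return weights / weights.sum(axis=1, keepdims=True)

def power_iterative(M, t):
    """Compute M^t by repeated matrix multiplication."""
    result = np.eye(M.shape[0])
    for _ in range(t):
        result = result @ M
    return result

def power_spectral(S, t):
    """Compute S^t for a symmetric matrix S from its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(S)
    # S = V diag(w) V^T, hence S^t = V diag(w^t) V^T.
    return (eigvecs * eigvals**t) @ eigvecs.T

# Toy symmetric affinities between 4 cells (self-mapping, hypothetical values).
W = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
M = row_normalize(W)               # row-stochastic mapping matrix
M3 = power_iterative(M, 3)         # 3-step diffusion operator
W3_spectral = power_spectral(W, 3)
```

Powers of a row-stochastic matrix stay row-stochastic, so each row of `M3` can still be read as a set of averaging weights.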

CellMapper.query_imputed#

Get the imputed query data.

Returns:

AnnData or None The imputed query data as an AnnData object, or None if not set.

Methods#

CellMapper.compute_fast_cca(n_comps=None, key_added='X_cca', layer=None, mask_var=None, zero_center=True, scale_with_singular=False, l2_scale=True, random_state=0, implicit=True)#

Compute a joint embedding using fast CCA on the reference and query datasets.

This method computes the singular value decomposition (SVD) of the cross-covariance matrix between the reference and query datasets using an efficient implementation that doesn’t materialize the full cross-covariance matrix. It then constructs a joint embedding based on the SVD components.

Parameters:
  • n_comps (int | None (default: None)) – Number of components to keep in the embedding.

  • key_added (str (default: 'X_cca')) – Key under which to store the joint embedding in .obsm of both query and reference AnnData objects.

  • layer (str | None (default: None)) – Layer to use for the computation. If None, use .X.

  • mask_var (ndarray | str | None (default: None)) – Boolean mask or string mask identifying a subset of variables/genes to use for computation. If string, should be a key in .var for both query and reference that contains a boolean mask. If None, uses the intersection of all variables from both datasets.

  • zero_center (bool (default: True)) – If True, center the data (implicitly for sparse matrices).

  • scale_with_singular (bool (default: False)) – If True, scale the singular vectors by the square root of their singular values. If False, return the raw singular vectors.

  • l2_scale (bool (default: True)) – If True, scale the matrices of singular vectors to have l2 norm of 1 per observation.

  • random_state (int (default: 0)) – Random seed for reproducibility.

  • implicit (bool (default: True)) – Whether to use implicit mean centering and covariance computation.

Return type:

None

Returns:

None Embeddings are stored in the .obsm attribute of both query and reference AnnData objects.

Notes

This is a fast implementation of Canonical Correlation Analysis (CCA) that computes the SVD of the cross-covariance matrix between the query and reference datasets. It is designed to be memory-efficient and fast, especially for large datasets. The method uses an efficient SVD implementation that avoids explicitly constructing the cross-covariance matrix, which can be very large for high-dimensional data. The method is particularly useful for integrating datasets with potentially different gene expression profiles, as it focuses on the shared variance between the two datasets.

This method is a fast re-implementation of Seurat’s CCA approach.
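As a rough illustration of the idea described in these notes, the sketch below builds a joint embedding from the SVD of the cross-covariance between two toy datasets. It forms the cross-covariance matrix explicitly, which the method above avoids, so it is only suitable for small inputs. The function names and the per-gene standardization step are assumptions for this example, not the library's code.

```python
import numpy as np

def standardize(X):
    """Center each gene and scale it to unit variance (cells x genes)."""
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
    return X / std

def cca_embedding(ref, query, n_comps):
    """Joint embedding from the SVD of the cell-by-cell cross-covariance."""
    R, Q = standardize(ref), standardize(query)
    # Explicit (n_ref x n_query) matrix: acceptable for a toy example only.
    cross = R @ Q.T
    U, s, Vt = np.linalg.svd(cross, full_matrices=False)
    ref_emb, query_emb = U[:, :n_comps], Vt[:n_comps].T
    # L2-normalize each cell, mirroring what the l2_scale option describes.
    ref_emb = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    query_emb = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    return ref_emb, query_emb, s[:n_comps]

rng = np.random.default_rng(0)
ref = rng.normal(size=(40, 12))    # 40 reference cells, 12 shared genes
query = rng.normal(size=(25, 12))  # 25 query cells, same genes
ref_emb, query_emb, singular_values = cca_embedding(ref, query, n_comps=5)
```

Left singular vectors give coordinates for reference cells and right singular vectors give coordinates for query cells, so both datasets end up in the same low-dimensional space.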

CellMapper.compute_joint_pca(n_comps=None, key_added='X_pca', **kwargs)#

Compute a joint PCA on the normalized .X matrices of query and reference, using only overlapping genes.

Parameters:
  • n_comps (int | None (default: None)) – Number of principal components to compute.

  • key_added (str (default: 'X_pca')) – Key under which to store the joint PCA embeddings in .obsm of both query and reference AnnData objects.

  • **kwargs – Additional keyword arguments to pass to scanpy’s pp.pca function.

Return type:

None

Notes

This method performs an inner join on genes (variables) between the query and reference AnnData objects, concatenates the normalized expression matrices, and computes a joint PCA using Scanpy. The resulting PCA embeddings are stored in .obsm[key_added] for both objects. Consider using compute_fast_cca for improved cross-dataset integration.
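The steps in these notes (inner join on genes, concatenation, joint PCA) can be sketched with plain NumPy. The method itself delegates the PCA to Scanpy; the version below uses an SVD instead, and the function and variable names are hypothetical.

```python
import numpy as np

def joint_pca(query, query_genes, ref, ref_genes, n_comps):
    """PCA on the concatenated data, restricted to genes present in both."""
    ref_pos = {gene: i for i, gene in enumerate(ref_genes)}
    # Inner join on gene names, keeping the query's gene order.
    shared = [g for g in query_genes if g in ref_pos]
    q = query[:, [query_genes.index(g) for g in shared]]
    r = ref[:, [ref_pos[g] for g in shared]]
    stacked = np.vstack([q, r])
    centered = stacked - stacked.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_comps] * s[:n_comps]  # principal component scores
    # Split the joint embedding back into query and reference parts.
    return scores[: len(q)], scores[len(q):], shared

rng = np.random.default_rng(0)
query = rng.normal(size=(6, 4))
ref = rng.normal(size=(5, 3))
query_pca, ref_pca, shared_genes = joint_pca(
    query, ["A", "B", "C", "D"], ref, ["C", "B", "E"], n_comps=2
)
```

Because both datasets are projected with the same components, distances between a query cell and a reference cell are meaningful in the resulting space.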

CellMapper.compute_mapping_matrix(kernel_method=None, symmetrize=None, self_edges=None, n_eigenvectors=50, eigen_solver='partial')#

Compute the mapping matrix for label transfer.

Parameters:
  • kernel_method (Optional[Literal['jaccard', 'gauss', 'scarches', 'inverse_distance', 'random', 'hnoca', 'equal', 'umap']] (default: None)) –

    Method to use for computing the mapping matrix. Options include:

    • “jaccard”: Jaccard similarity. Inspired by GLUE [CG22]

    • “gauss”: Gaussian kernel with (global) bandwidth equal to the mean distance.

    • “scarches”: scArches kernel. Inspired by scArches [LNL+22]

    • “inverse_distance”: Inverse distance kernel.

    • “random”: Random kernel, useful for testing.

    • “hnoca”: HNOCA kernel. Inspired by HNOCA-tools [HDF+24]

    • “equal”: All neighbors are equally weighted (1/n_neighbors).

    • “umap”: UMAP fuzzy simplicial set connectivities. Only available for self-mapping with true k-NN graphs.

  • symmetrize (bool | None (default: None)) – If True, create a symmetric connectivity matrix. Only valid for square matrices (self-mapping). If None (default), uses True for self-mapping and False for cross-mapping.

  • self_edges (bool | None (default: None)) – Control self-edges (diagonal entries) for square matrices (self-mapping). If None (default), uses False for self-mapping (scanpy style) and None for cross-mapping. This controls whether the kernel used to compute the connectivities is supplied with self-edges. It does not determine whether the final connectivity matrix has self-edges. For example, the umap kernel expects self-edges but does not produce them in the final connectivity matrix.

  • n_eigenvectors (int (default: 50)) – Number of eigenvectors to compute for spectral decomposition. Only relevant when using spectral methods for matrix powers.

  • eigen_solver (Literal['partial', 'complete'] (default: 'partial')) –

    Eigendecomposition method for the spectral approach:

    • “partial”: Sparse eigendecomposition; faster (default).

    • “complete”: Complete eigendecomposition; exact, intended for testing.

Return type:

None

Returns:

None

Notes

Updates the following attributes:

  • mapping_operator: Mapping operator to transfer labels, embeddings, or expression values.
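To make two of the kernels listed above concrete, the following sketch turns k-NN distances into a row-normalized mapping matrix using a Gaussian kernel with a global bandwidth and an equal-weight kernel. The exact functional form used by the library is not stated here, so the Gaussian expression, the dense matrix, and all names are assumptions for illustration.

```python
import numpy as np

def gauss_kernel(distances):
    """Gaussian weights with one global bandwidth equal to the mean distance."""
    sigma = distances.mean()
    return np.exp(-(distances**2) / (2 * sigma**2))

def equal_kernel(distances):
    """Give every neighbor the same weight before normalization."""
    return np.ones_like(distances)

def build_mapping_matrix(indices, weights, n_reference):
    """Scatter neighbor weights into a row-normalized (n_query x n_reference) matrix."""
    n_query = indices.shape[0]
    M = np.zeros((n_query, n_reference))
    rows = np.arange(n_query)[:, None]
    M[rows, indices] = weights
    return M / M.sum(axis=1, keepdims=True)

# Hypothetical k-NN result: 3 query cells, 2 neighbors each among 4 reference cells.
indices = np.array([[0, 1], [2, 3], [1, 2]])
distances = np.array([[0.5, 1.5], [1.0, 1.0], [0.2, 2.0]])

M_gauss = build_mapping_matrix(indices, gauss_kernel(distances), n_reference=4)
M_equal = build_mapping_matrix(indices, equal_kernel(distances), n_reference=4)
```

With the equal kernel each neighbor ends up with weight 1/n_neighbors, while the Gaussian kernel gives closer neighbors more influence.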

CellMapper.compute_neighbors(n_neighbors=30, use_rep=None, n_comps=None, knn_method='sklearn', knn_dist_metric='euclidean', random_state=0, only_yx=False, neighbors_kwargs=None, fallback_representation='fast_cca', fallback_kwargs=None)#

Compute nearest neighbors between reference and query datasets.

The method computes k-nearest neighbor graphs to enable mapping between datasets. If no representation is provided (use_rep=None), a fallback representation is computed automatically using either fast CCA (inspired by Seurat v3 [SBH+19]) or joint PCA. In self-mapping mode, a simple PCA is computed on the query dataset.

Parameters:
  • n_neighbors (int (default: 30)) – Number of nearest neighbors. This parameter controls the sparsity of the connectivity matrix.

  • use_rep (str | None (default: None)) – Data representation based on which to find nearest neighbors. If None, a fallback representation will be computed automatically.

  • n_comps (int | None (default: None)) – Number of components to use. If a pre-computed representation is provided via use_rep, the number of components from that representation is used. Otherwise, if use_rep=None, the given number of components is computed using the fallback representation method.

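  • knn_method (Literal['sklearn', 'pynndescent', 'rapids', 'faiss-cpu', 'faiss-gpu'] (default: 'sklearn')) –

    Method for computing k-nearest neighbors. Options include:

    • “sklearn”: Scikit-learn’s NearestNeighbors. See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html

    • “pynndescent”: Pynndescent’s approximate nearest neighbors. See https://pynndescent.readthedocs.io/en/latest/

    • “rapids”: RAPIDS cuML’s NearestNeighbors (GPU). See https://docs.rapids.ai/api/cuml/stable/api.html#cuml.neighbors.NearestNeighbors

    • “faiss-cpu”: Facebook AI Similarity Search (FAISS) on CPU. See https://faiss.ai/

    • “faiss-gpu”: Facebook AI Similarity Search (FAISS) on GPU. See https://faiss.ai/

    All methods return exactly n_neighbors neighbors, including the reference cell itself (in self-mapping mode).

  • knn_dist_metric (default: 'euclidean') – Distance metric to use for nearest neighbors. See the documentation of the chosen k-NN backend for details.

  • random_state (default: 0) – Random seed for reproducibility. Only used by the “pynndescent” method.

  • only_yx (default: False) – If True, only compute the yx neighbors. In self-mapping mode, this is automatically set to True for efficiency since all neighbor matrices contain the same information. This is faster, but not suitable for Jaccard or HNOCA methods in cross-mapping mode.

  • neighbors_kwargs (default: None) – Additional keyword arguments to pass to the neighbors computation method.

  • fallback_representation (default: 'fast_cca') –

    Method to use for computing a cross-dataset representation when use_rep=None. Options:

    • “fast_cca”: Fast canonical correlation analysis, inspired by Seurat v3 [SBH+19] and SLAT [XCTG23].

    • “joint_pca”: Principal component analysis on concatenated datasets.

  • fallback_kwargs (default: None) – Additional keyword arguments to pass to the fallback representation method. For “fast_cca”: see compute_fast_cca(). For “joint_pca”: see compute_joint_pca().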

Return type:

None

Returns:

None

Notes

Updates the following attributes:

  • knn: Nearest neighbors object.

  • n_neighbors: Number of nearest neighbors.

  • only_yx: Whether only yx neighbors were computed.
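For intuition, a query-to-reference neighbor search in a shared representation can be written as a brute-force NumPy routine. This is a stand-in for the k-NN backends listed above, not how any of them work internally; the function name and the toy coordinates are hypothetical.

```python
import numpy as np

def knn_brute_force(query_rep, ref_rep, n_neighbors):
    """Return indices and Euclidean distances of the nearest reference cells."""
    # Pairwise differences via broadcasting: (n_query, n_ref, n_dims).
    diff = query_rep[:, None, :] - ref_rep[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    return idx, np.take_along_axis(dist, idx, axis=1)

# Toy 2-D representation (for example, the first two joint components).
ref_rep = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
query_rep = np.array([[0.1, 0.0], [4.0, 4.5]])

indices, distances = knn_brute_force(query_rep, ref_rep, n_neighbors=2)
```

Each row of `indices` lists the reference cells closest to one query cell, sorted by increasing distance; a kernel then converts the matching `distances` into weights.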

CellMapper.compute_presence_score(groupby=None, key_added='presence_score', log=False, percentile=(1, 99), minmax=True)#

Estimate raw presence scores for each reference cell based on query-to-reference connectivities.

Adapted from the HNOCA-tools package [HDF+24].

Parameters:
  • groupby (str | None (default: None)) – Column in self.query.obs to group query cells by (e.g., cell type, batch). If None, computes a single score for all query cells.

  • key_added (str (default: 'presence_score')) – Key to store the presence score: always writes the score across all query cells to self.reference.obs[key_added]. If groupby is not None, also writes per-group scores as a DataFrame to self.reference.obsm[key_added].

  • log (bool (default: False)) – Whether to apply log1p transformation to the scores.

  • percentile (tuple[float, float] (default: (1, 99))) – Tuple of (low, high) percentiles for clipping scores before normalization.

  • minmax (bool (default: True)) – Whether to apply min-max normalization to the scores.
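One simple way to picture a raw presence score is as the total weight each reference cell receives from the query cells, overall and per query group. The sketch below assumes exactly that (column sums of a query-by-reference connectivity matrix); the library's actual definition may differ, and the names are hypothetical.

```python
import numpy as np

def raw_presence(mapping, groups=None):
    """Sum query-to-reference weights per reference cell, overall and per group."""
    scores = {"all": mapping.sum(axis=0)}
    if groups is not None:
        groups = np.asarray(groups)
        for g in np.unique(groups):
            scores[str(g)] = mapping[groups == g].sum(axis=0)
    return scores

# 3 query cells (rows) mapped onto 3 reference cells (columns).
mapping = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.25, 0.75],
])
scores = raw_presence(mapping, groups=["T", "T", "B"])
```

Reference cells that many query cells map to receive a high score, while reference cells without query neighbors score zero.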

CellMapper.evaluate_expression_transfer(layer_key='X', comparison_method='pearson', groupby=None, test_var_key=None)#

Evaluate the agreement between imputed and original expression in the query dataset, optionally per group.

These metrics are inspired by [LZG+22].

Parameters:
  • layer_key (str (default: 'X')) – Key in self.query.layers to use as the original expression. Use “X” to use self.query.X.

  • comparison_method (Literal['pearson', 'spearman', 'js', 'rmse'] (default: 'pearson')) –

    Method to use for comparing the mapping results. Options include:

    • ”pearson”: Pearson correlation coefficient.

    • ”spearman”: Spearman rank correlation coefficient.

    • ”js”: Jensen-Shannon divergence.

    • ”rmse”: Root Mean Square Error.

  • groupby (str | None (default: None)) – Column in self.query.obs to group query cells by (e.g., cell type, batch). If None, computes a single score for all query cells.

  • test_var_key (str | None (default: None)) – Optional key in self.query.var where True marks test genes. If provided, average metrics are computed only over test genes.

Return type:

None

Returns:

None

Notes

Updates the following attributes:

  • expression_transfer_metrics: Dictionary containing the average metric and number of genes used for the evaluation.

  • query.var[metric_name]: Per-gene metric values (overall, across all cells).

  • query.varm[metric_name]: Per-gene, per-group metric values (if groupby is provided).
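The per-gene metrics can be illustrated with a short NumPy sketch for two of the options, Pearson correlation and RMSE, followed by an average restricted to test genes. The function names and the toy matrices are hypothetical, and genes with zero variance are not handled here.

```python
import numpy as np

def per_gene_pearson(original, imputed):
    """Pearson correlation across cells, computed separately for each gene."""
    o = original - original.mean(axis=0)
    i = imputed - imputed.mean(axis=0)
    return (o * i).sum(axis=0) / np.sqrt((o**2).sum(axis=0) * (i**2).sum(axis=0))

def per_gene_rmse(original, imputed):
    """Root mean square error across cells, computed separately for each gene."""
    return np.sqrt(((original - imputed) ** 2).mean(axis=0))

# Rows are query cells, columns are genes.
original = np.array([[1.0, 0.0, 1.0], [2.0, 1.0, 2.0], [3.0, 0.0, 3.0], [4.0, 1.0, 4.0]])
imputed = np.array([[1.1, 0.2, 4.0], [2.1, 0.8, 3.0], [3.1, 0.1, 2.0], [4.1, 0.9, 1.0]])

pearson = per_gene_pearson(original, imputed)
rmse = per_gene_rmse(original, imputed)

# Average only over genes flagged as test genes (analogous to test_var_key).
test_genes = np.array([True, True, False])
mean_test_pearson = pearson[test_genes].mean()
```

The first gene is shifted by a constant, so its correlation is perfect even though its RMSE is non-zero, which is one reason to look at more than one metric.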

CellMapper.evaluate_label_transfer(label_key, prediction_postfix=None, confidence_postfix=None, confidence_cutoff=0.0, zero_division=0)#

Evaluate label transfer using a k-NN classifier or externally computed predictions.

Parameters:
  • label_key (str) – Key in .obs storing ground-truth cell type annotations.

  • prediction_postfix (str | None (default: None)) – Postfix for prediction column in .obs. If None, uses self.prediction_postfix.

  • confidence_postfix (str | None (default: None)) – Postfix for confidence column in .obs. If None, uses self.confidence_postfix.

  • confidence_cutoff (float (default: 0.0)) – Minimum confidence score required to include a cell in the evaluation.

  • zero_division (Union[int, Literal['warn']] (default: 0)) – How to handle zero divisions in sklearn metrics computation.

Return type:

None

Returns:

None

Notes

Updates the following attributes:

  • label_transfer_metrics: Dictionary containing accuracy, precision, recall, F1 scores, and excluded fraction.
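The roles of confidence_cutoff and zero_division can be seen in a small sketch that computes accuracy, the excluded fraction, and a per-class precision by hand. It assumes that cells with confidence greater than or equal to the cutoff are kept; the function names and data are hypothetical, and the method above relies on scikit-learn for the actual metrics.

```python
import numpy as np

def evaluate(true, pred, conf, confidence_cutoff=0.0):
    """Accuracy over sufficiently confident cells, plus the fraction left out."""
    true, pred, conf = np.asarray(true), np.asarray(pred), np.asarray(conf)
    keep = conf >= confidence_cutoff
    return {
        "accuracy": float((true[keep] == pred[keep]).mean()),
        "excluded_fraction": float(1.0 - keep.mean()),
    }

def precision(true, pred, label, zero_division=0):
    """Precision for one class; fall back to zero_division if never predicted."""
    true, pred = np.asarray(true), np.asarray(pred)
    predicted = pred == label
    if predicted.sum() == 0:
        return float(zero_division)
    return float((true[predicted] == label).mean())

true = ["T", "T", "B", "B", "NK"]
pred = ["T", "B", "B", "B", "T"]
conf = [0.9, 0.4, 0.8, 0.7, 0.3]

all_cells = evaluate(true, pred, conf)
confident = evaluate(true, pred, conf, confidence_cutoff=0.5)
precision_nk = precision(true, pred, "NK")
precision_b = precision(true, pred, "B")
```

Raising the cutoff trades coverage for accuracy: here the two misclassified cells are also the least confident ones, so excluding them lifts accuracy while increasing the excluded fraction.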

CellMapper.load_precomputed_distances(distances_key='distances', remove_last_neighbor=False)#

Load precomputed distances from the AnnData object.

This method is only available in self-mapping mode.

Parameters:
  • distances_key (str (default: 'distances')) – Key in adata.obsp where the precomputed distances are stored.

  • remove_last_neighbor (bool (default: False)) – If True, removes the last neighbor from the distances matrix. This is useful for direct comparisons with scanpy.

Return type:

None

Returns:

None

Notes

Updates the following attributes:

  • knn: Neighbors object constructed from the precomputed distances.

For symmetrization of connectivity matrices, use the symmetrize parameter in compute_mapping_matrix() after loading the distances.

CellMapper.map(obs_keys=None, obsm_keys=None, layer_key=None, t=None, diffusion_method='iterative', target_libsize=None, n_neighbors=30, use_rep=None, knn_method='sklearn', knn_dist_metric='euclidean', only_yx=False, neighbors_kwargs=None, kernel_method=None, symmetrize=None, self_edges=None, prediction_postfix='_pred', subset_categories=None)#

Map data from reference to query datasets.

Parameters:
  • obs_keys (str | list[str] | None (default: None)) – One or more keys in reference.obs to be mapped into query.obs.

  • obsm_keys (str | list[str] | None (default: None)) – One or more keys in reference.obsm storing the embeddings to be mapped.

  • layer_key (str | None (default: None)) – Key in reference.layers to be mapped. Use “X” to map reference.X.

  • t (int | None (default: None)) – Number of diffusion time steps. This parameter controls the degree of smoothing applied by the diffusion operator. Larger values lead to more smoothing.

  • diffusion_method (Literal['iterative', 'spectral'] (default: 'iterative')) – Method for computing powers of the mapping matrix (only valid in self-mapping mode). Options are “iterative” for iterative matrix multiplication (inspired by MAGIC [VDSN+18]) and “spectral” for eigendecomposition-based approach.

  • target_libsize (str | ndarray | None (default: None)) –

    Strategy for adjusting library sizes after mapping:
    • str: Layer key in query AnnData to use for computing target library sizes (e.g., “counts”, “X”)

    • np.ndarray: Use the provided array as target library sizes (one per query cell)

    • None: No library size adjustment.

  • n_neighbors (int (default: 30)) – Number of nearest neighbors. This parameter controls the sparsity of the connectivity matrix.

  • use_rep (str | None (default: None)) – Data representation based on which to find nearest neighbors. If None, a fallback representation will be computed automatically.

  • knn_method (Literal['sklearn', 'pynndescent', 'rapids'] (default: 'sklearn')) –

    Method for computing k-nearest neighbors. Options include:

    • “sklearn”: Scikit-learn’s NearestNeighbors. See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html

    • “pynndescent”: Pynndescent’s approximate nearest neighbors. See https://pynndescent.readthedocs.io/en/latest/

    • “rapids”: RAPIDS cuML’s NearestNeighbors (GPU). See https://docs.rapids.ai/api/cuml/stable/api.html#cuml.neighbors.NearestNeighbors

    • “faiss-cpu”: Facebook AI Similarity Search (FAISS) on CPU. See https://faiss.ai/

    • “faiss-gpu”: Facebook AI Similarity Search (FAISS) on GPU. See https://faiss.ai/

    All methods return exactly n_neighbors neighbors, including the reference cell itself (in self-mapping mode).

  • knn_dist_metric (default: 'euclidean') – Distance metric to use for nearest neighbors. See the documentation of the chosen k-NN backend for details.

  • only_yx (default: False) – If True, only compute the yx neighbors. In self-mapping mode, this is automatically set to True for efficiency since all neighbor matrices contain the same information. This is faster, but not suitable for Jaccard or HNOCA methods in cross-mapping mode.

  • neighbors_kwargs (default: None) – Additional keyword arguments to pass to the neighbors computation method. For the rapids backend, you can pass batch_size to process queries in batches and avoid GPU out-of-memory errors (e.g., neighbors_kwargs={"batch_size": 50000}).

  • kernel_method (default: None) –

    Method to use for computing the mapping matrix. Options include:

    • “jaccard”: Jaccard similarity. Inspired by GLUE [CG22]

    • “gauss”: Gaussian kernel with (global) bandwidth equal to the mean distance.

    • “scarches”: scArches kernel. Inspired by scArches [LNL+22]

    • “inverse_distance”: Inverse distance kernel.

    • “random”: Random kernel, useful for testing.

    • “hnoca”: HNOCA kernel. Inspired by HNOCA-tools [HDF+24]

    • “equal”: All neighbors are equally weighted (1/n_neighbors).

    • “umap”: UMAP fuzzy simplicial set connectivities. Only available for self-mapping with true k-NN graphs.

  • symmetrize (default: None) – If True, create a symmetric connectivity matrix. Only valid for square matrices (self-mapping). If None (default), uses True for self-mapping and False for cross-mapping.

  • self_edges (default: None) – Control self-edges (diagonal entries) for square matrices (self-mapping). If None (default), uses False for self-mapping (scanpy style) and None for cross-mapping. This controls whether the kernel used to compute the connectivities is supplied with self-edges. It does not determine whether the final connectivity matrix has self-edges. For example, the umap kernel expects self-edges but does not produce them in the final connectivity matrix.

  • prediction_postfix (default: '_pred') – Postfix to append to mapped variable names (including any separator, e.g. “_pred”). Use “” for no postfix.

  • subset_categories (default: None) – For categorical data, optionally specify a subset of categories to include in the mapping. If None (default), all categories are included. If specified, only the listed categories will be mapped, and others will be ignored. For numerical data, this parameter is ignored with a warning. Can be a single category string or a list of category strings.

Return type:

CellMapper

CellMapper.map_layers(key, t=None, diffusion_method='iterative', target_libsize=None)#

Map expression values with optional multi-step diffusion and library size adjustment.

Transfers expression values (e.g., .X or entries from .layers) from reference dataset to a new imputed query AnnData object using matrix multiplication. For t > 1, applies matrix powers representing t-step diffusion processes (only supported in self-mapping mode).

Parameters:
  • key (str) – Key in reference.layers to be transferred. Use “X” to transfer reference.X

  • t (int | None (default: None)) – Number of diffusion time steps. This parameter controls the degree of smoothing applied by the diffusion operator. Larger values lead to more smoothing.

  • diffusion_method (Literal['iterative', 'spectral'] (default: 'iterative')) – Method for computing powers of the mapping matrix (only valid in self-mapping mode). Options are “iterative” for iterative matrix multiplication (inspired by MAGIC [VDSN+18]) and “spectral” for eigendecomposition-based approach.

  • target_libsize (str | ndarray | None (default: None)) –

    Strategy for adjusting library sizes after mapping:
    • str: Layer key in query AnnData to use for computing target library sizes (e.g., “counts”, “X”)

    • np.ndarray: Use the provided array as target library sizes (one per query cell)

    • None: No library size adjustment.

Return type:

None

Returns:

None

Notes

Creates self.query_imputed with the transferred data in .X. The new AnnData object will have the same cells as the query, but the features (genes) of the reference. If target_libsize is specified, the library sizes will be adjusted after mapping.
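The transfer described here amounts to a matrix product followed by an optional per-cell rescaling. The sketch below shows that arithmetic on toy arrays; the function name is hypothetical, and the rescaling rule (scale each imputed cell so its total matches the target library size) is an assumption about what library size adjustment means in this context.

```python
import numpy as np

def map_expression(mapping, ref_expr, target_libsize=None):
    """Impute query expression as a weighted average of reference cells."""
    imputed = mapping @ ref_expr  # (n_query x n_ref) @ (n_ref x n_genes)
    if target_libsize is not None:
        current = imputed.sum(axis=1, keepdims=True)
        current[current == 0] = 1.0  # avoid dividing by zero for empty cells
        imputed = imputed * (np.asarray(target_libsize)[:, None] / current)
    return imputed

mapping = np.array([[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])   # 2 query x 3 reference
ref_expr = np.array([[2.0, 0.0], [4.0, 2.0], [1.0, 3.0]])  # 3 reference x 2 genes

imputed = map_expression(mapping, ref_expr)
rescaled = map_expression(mapping, ref_expr, target_libsize=[100.0, 8.0])
```

The result has one row per query cell and one column per reference gene, matching the shape of the imputed object described above.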

CellMapper.map_obs(key, t=None, diffusion_method='iterative', prediction_postfix='_pred', confidence_postfix='_conf', return_probabilities=False, subset_categories=None)#

Map observation data from reference dataset to query dataset.

Automatically detects whether the data is categorical or numerical and applies the appropriate mapping strategy. For categorical data, uses one-hot encoding followed by matrix multiplication and argmax. For numerical data, uses direct matrix multiplication.

Parameters:
  • key (str) – Key in reference.obs to be transferred into query.obs

  • t (int | None (default: None)) – Number of diffusion time steps. This parameter controls the degree of smoothing applied by the diffusion operator. Larger values lead to more smoothing.

  • diffusion_method (Literal['iterative', 'spectral'] (default: 'iterative')) – Method for computing powers of the mapping matrix (only valid in self-mapping mode). Options are “iterative” for iterative matrix multiplication (inspired by MAGIC [VDSN+18]) and “spectral” for eigendecomposition-based approach.

  • prediction_postfix (str (default: '_pred')) – Postfix to append to mapped variable names (including any separator, e.g. “_pred”). Use “” for no postfix.

  • confidence_postfix (str (default: '_conf')) – Postfix added to create new keys in query.obs for confidence scores (only applicable for categorical data)

  • return_probabilities (bool (default: False)) – If True, return a sparse pandas DataFrame of probabilities for categorical data (columns are category names). Only applicable for categorical data.

  • subset_categories (None | list[str] | str (default: None)) – For categorical data, optionally specify a subset of categories to include in the mapping. If None (default), all categories are included. If specified, only the listed categories will be mapped, and others will be ignored. For numerical data, this parameter is ignored with a warning. Can be a single category string or a list of category strings.

Return type:

DataFrame | None

Returns:

pd.DataFrame or None For categorical data with return_probabilities=True: a pandas DataFrame with sparse columns (SparseDtype), shape (n_query_cells, n_categories), indexed by query cell names and with columns as category names. For numerical data or when return_probabilities=False: None.

Notes

Updates the following attributes:

  • query.obs: Contains the transferred data and confidence scores (for categorical data).
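The two strategies described above (one-hot encoding with argmax for categorical data, direct multiplication for numerical data) can be sketched as follows. The names are hypothetical, and taking the maximum class probability as the confidence score is an assumption made for this example.

```python
import numpy as np

def transfer_labels(mapping, ref_labels):
    """Categorical transfer: one-hot encode, multiply, then take the argmax."""
    categories = sorted(set(ref_labels))
    one_hot = np.array(
        [[label == cat for cat in categories] for label in ref_labels], dtype=float
    )
    probs = mapping @ one_hot  # (n_query x n_categories)
    pred = [categories[i] for i in probs.argmax(axis=1)]
    conf = probs.max(axis=1)   # assumed confidence: highest class probability
    return pred, conf, probs

def transfer_numeric(mapping, ref_values):
    """Numerical transfer: weighted average of the reference values."""
    return mapping @ np.asarray(ref_values, dtype=float)

mapping = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])  # rows sum to 1
pred, conf, probs = transfer_labels(mapping, ["B", "B", "T"])
age = transfer_numeric(mapping, [10, 20, 30])
```

Because each row of the mapping matrix sums to 1, the rows of `probs` are valid probability vectors over categories.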

CellMapper.map_obsm(key, t=None, diffusion_method='iterative', prediction_postfix='_pred')#

Map embeddings with optional multi-step diffusion.

Uses matrix multiplication to transfer embeddings from the reference dataset to the query dataset. For t > 1, applies matrix powers representing t-step diffusion processes (only supported in self-mapping mode).

When the reference embeddings are stored as a pandas DataFrame, this method preserves the DataFrame structure by reconstructing it with the query cell index and the original column names after mapping.

Parameters:
  • key (str) – Key in reference.obsm storing the embeddings to be transferred

  • t (int | None (default: None)) – Number of diffusion time steps. This parameter controls the degree of smoothing applied by the diffusion operator. Larger values lead to more smoothing.

  • diffusion_method (Literal['iterative', 'spectral'] (default: 'iterative')) – Method for computing powers of the mapping matrix (only valid in self-mapping mode). Options are “iterative” for iterative matrix multiplication (inspired by MAGIC [VDSN+18]) and “spectral” for eigendecomposition-based approach.

  • prediction_postfix (str (default: '_pred')) – Postfix to append to mapped variable names (including any separator, e.g. “_pred”). Use “” for no postfix.

Return type:

None

Returns:

None

Notes

Updates the following attributes:

  • query.obsm: Contains the transferred embeddings. If the reference embeddings were a pandas DataFrame, the transferred embeddings will also be a DataFrame with the same column names and the query cell names as the index.
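The smoothing effect of the t parameter can be seen by applying a row-stochastic self-mapping matrix to an embedding several times. This is a minimal sketch with hypothetical names and values; it multiplies the embedding directly rather than forming the matrix power first.

```python
import numpy as np

def diffuse(mapping, values, t=1):
    """Apply the mapping matrix t times to the values (t-step diffusion)."""
    out = values
    for _ in range(t):
        out = mapping @ out
    return out

# Self-mapping between 3 cells arranged in a chain; rows sum to 1.
M = np.array([
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
])
embedding = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 6.0]])

one_step = diffuse(M, embedding, t=1)
two_steps = diffuse(M, embedding, t=2)

# Spread of each embedding dimension before and after diffusion.
spread_before = embedding.max(axis=0) - embedding.min(axis=0)
spread_after = two_steps.max(axis=0) - two_steps.min(axis=0)
```

Each step replaces a cell's coordinates with a weighted average of its neighbors' coordinates, so the spread of the embedding can only shrink as t grows.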

CellMapper.plot_confusion_matrix(pred_key, *, true_key=None, subset=None, figsize=(10, 8), cmap='viridis', save=None, ax=None, show_annotation_colors=True, xlabel_position='bottom', show_grid=True, min_cells_true=None, min_cells_pred=None, show_yticklabels=True, show_xticklabels=True, normalize=None, include_values=True, values_format='.2f', values_fontsize=8, colorbar=True, vmin=None, vmax=None, title='Confusion Matrix')#

Plot a confusion matrix heatmap comparing true vs predicted labels.

Parameters:
  • pred_key (str) – Key in .obs identifying the mapped labels (from map_obs). The column f"{pred_key}{prediction_postfix}" is used as the x-axis (predicted).

  • true_key (str | None (default: None)) – Key in .obs to use for the y-axis (true labels). If None, uses pred_key. This allows comparing arbitrary columns, e.g., source_time vs mapped_time.

  • subset (ndarray | Series | None (default: None)) – Boolean mask to select a subset of cells for the confusion matrix. Must have the same length as query.obs or be a pandas Series indexed by obs_names.

  • figsize (tuple[int, int] (default: (10, 8))) – Size of the figure (width, height). Only used if ax is None.

  • cmap (str (default: 'viridis')) – Colormap to use for the heatmap.

  • save (str | Path | None (default: None)) – Path to save the figure. If None, the figure is not saved.

  • ax (Axes | None (default: None)) – Matplotlib axes to plot on. If None, a new figure and axes are created.

  • show_annotation_colors (bool (default: True)) – Whether to show colored bars along axes corresponding to category colors from adata.uns[f"{label_key}_colors"].

  • xlabel_position (Literal['bottom', 'top'] (default: 'bottom')) – Position of x-axis tick labels (“bottom” or “top”).

  • show_grid (bool (default: True)) – Whether to show gridlines on the heatmap.

  • min_cells_true (int | None (default: None)) – Minimum number of cells required for a true category to be included. If None, all true categories are shown.

  • min_cells_pred (int | None (default: None)) – Minimum number of cells required for a predicted category to be included. If None, all predicted categories are shown.

  • show_yticklabels (bool (default: True)) – Whether to show y-axis tick labels.

  • show_xticklabels (bool (default: True)) – Whether to show x-axis tick labels.

  • normalize (Optional[Literal['true', 'pred', 'all']] (default: None)) – Normalization mode: “true” (row), “pred” (column), “all” (total), or None.

  • include_values (bool (default: True)) – Whether to annotate cells with their values.

  • values_format (str (default: '.2f')) – Format string for cell values (e.g., “.2f”, “.0f”, “.1%”).

  • values_fontsize (float (default: 8)) – Font size for cell value annotations.

  • colorbar (bool (default: True)) – Whether to show a colorbar.

  • vmin (float | None (default: None)) – Minimum value for colormap normalization.

  • vmax (float | None (default: None)) – Maximum value for colormap normalization.

  • title (str | None (default: 'Confusion Matrix')) – Title for the plot. Set to None to hide.

Return type:

Axes

Returns:

Matplotlib axes with the confusion matrix plot.
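The three normalize modes differ only in which totals the counts are divided by. The sketch below computes the underlying matrix without any plotting; the function name and the labels are hypothetical.

```python
import numpy as np

def confusion_matrix(true, pred, normalize=None):
    """Count (true, predicted) pairs; rows are true labels, columns predictions."""
    true_cats = sorted(set(true))
    pred_cats = sorted(set(pred))
    counts = np.zeros((len(true_cats), len(pred_cats)))
    for t, p in zip(true, pred):
        counts[true_cats.index(t), pred_cats.index(p)] += 1
    if normalize == "true":    # each row sums to 1
        counts = counts / counts.sum(axis=1, keepdims=True)
    elif normalize == "pred":  # each column sums to 1
        counts = counts / counts.sum(axis=0, keepdims=True)
    elif normalize == "all":   # the whole matrix sums to 1
        counts = counts / counts.sum()
    return counts, true_cats, pred_cats

true = ["B", "B", "T", "T", "T"]
pred = ["B", "T", "T", "T", "B"]

raw, true_cats, pred_cats = confusion_matrix(true, pred)
by_true, _, _ = confusion_matrix(true, pred, normalize="true")
by_pred, _, _ = confusion_matrix(true, pred, normalize="pred")
overall, _, _ = confusion_matrix(true, pred, normalize="all")
```

Row normalization answers "where do cells of this true type end up?", while column normalization answers "what are cells with this predicted label made of?".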

static CellMapper.process_presence_scores(scores, log=False, percentile=(1, 99), minmax=True)#

Post-process presence scores with log1p, percentile clipping, and min-max normalization.

Parameters:
  • scores (DataFrame) – DataFrame of raw presence scores (rows: reference cells, columns: groups or ‘all’).

  • log (bool (default: False)) – Whether to apply log1p transformation to the scores.

  • percentile (tuple[float, float] (default: (1, 99))) – Tuple of (low, high) percentiles for clipping scores before normalization.

  • minmax (bool (default: True)) – Whether to apply min-max normalization to the scores.

Return type:

DataFrame

Returns:

pd.DataFrame Post-processed presence scores, same shape as input.
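The three post-processing steps can be sketched for a single column of scores as shown below. The order (log1p, then percentile clipping, then min-max scaling) follows the summary above; the function name, the use of a plain array instead of a DataFrame, and the handling of constant input are assumptions for this example.

```python
import numpy as np

def process_scores(scores, log=False, percentile=(1, 99), minmax=True):
    """Optionally log-transform, clip to percentiles, and rescale to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    if log:
        scores = np.log1p(scores)
    low, high = np.percentile(scores, percentile)
    scores = np.clip(scores, low, high)
    if minmax:
        span = scores.max() - scores.min()
        # Constant input has no range to rescale; return zeros in that case.
        scores = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    return scores

raw = [0.0, 1.0, 2.0, 3.0, 100.0]  # one extreme outlier

clipped = process_scores(raw, percentile=(0, 75))
logged = process_scores(raw, log=True)
```

Clipping keeps a single extreme value from compressing all other scores toward zero after min-max scaling.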

CellMapper.register_external_predictions(label_key, prediction_postfix='_pred', confidence_postfix='_conf')#

Register externally computed predictions for evaluation.

Parameters:
  • label_key (str) – Base key in .obs for the label (e.g., ‘cell_type’).

  • prediction_postfix (str (default: '_pred')) – Postfix for the prediction column in .obs, including any separator (e.g., ‘_pred’). The full column name should be f"{label_key}{prediction_postfix}".

  • confidence_postfix (str (default: '_conf')) – Postfix for the confidence column in .obs, including any separator (e.g., ‘_conf’). The full column name should be f"{label_key}{confidence_postfix}".

Return type:

None

Returns:

None

Notes

Updates the following attributes:

  • prediction_postfix: Postfix for prediction column.

  • confidence_postfix: Postfix for confidence column.