MNIST_Celldega
Linked deck.gl MNIST viewer + Celldega clustergram of Leiden clusters of handwritten digits.
What is this?
The MNIST 70k handwritten-digit corpus, embedded with PCA, neighbor-graphed, and Leiden-clustered with Scanpy into ~100 image clusters. Each cluster is summarized by its mean ink-per-pixel image; the matrix of (top-N pixels × clusters) is rendered as a Celldega clustergram, paired with a deck.gl viewer that draws the cluster averages and individual example digits.
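The aggregation step can be sketched with NumPy alone. This is a minimal illustration, not the package's implementation: the function name `cluster_mean_matrix`, the choice of variance across clusters as the top-N ranking criterion, and the toy data are all assumptions made for the example.

```python
import numpy as np

def cluster_mean_matrix(images, labels, top_n):
    """Return (pixel_ids, matrix), where matrix is (top_n pixels x clusters).

    images: (n_images, n_pixels) array of ink values
    labels: (n_images,) array of integer cluster labels
    Assumption: the "top-N" pixels are those whose mean ink varies most
    across clusters; the real package may rank pixels differently.
    """
    cluster_ids = np.unique(labels)
    # One mean image per cluster, stacked as columns: (n_pixels, n_clusters)
    means = np.stack(
        [images[labels == c].mean(axis=0) for c in cluster_ids], axis=1
    )
    # Rank pixels by variance across the cluster means, highest first
    variance = means.var(axis=1)
    pixel_ids = np.argsort(variance)[::-1][:top_n]
    return pixel_ids, means[pixel_ids]

# Toy data: six 4-pixel "images" in two clusters
images = np.array([
    [0, 0, 1, 0.5],
    [0, 0, 1, 0.5],
    [0, 0, 1, 0.5],
    [1, 0, 0, 0.5],
    [1, 0, 0, 0.5],
    [1, 0, 0, 0.5],
], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

pixel_ids, mat = cluster_mean_matrix(images, labels, top_n=2)
```

On the toy data, pixels 0 and 2 are the only ones that differ between the two clusters, so they are the two rows kept in `mat`.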
The two views are linked in the browser: clicking a column label
(cluster) in the clustergram zooms the viewer in on that cluster's average and
examples. Clicking a row label (pixel) marks that pixel's position on every average.
Clicking a column-axis dendrogram averages the selected clusters into one
merged image; clicking a row-axis dendrogram tints each cluster's tile by its
mean ink across the selected pixels (so “which clusters paint the center
column?” or “which paint the bottom-left curl?” pop visually).
Clicking a Majority-digit color on the column strip filters the viewer
to clusters of that digit.
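The two dendrogram interactions reduce to simple array operations. The sketch below is hypothetical: the names `merge_clusters` and `tint_by_pixels`, the size-weighted averaging, and the rescaling of tints to [0, 1] are assumptions, and the actual widget does this work in the browser rather than in Python.

```python
import numpy as np

def merge_clusters(mean_images, sizes, selected):
    """Average the selected clusters' mean images into one merged image.

    Assumption: the average is weighted by cluster size, which makes the
    result equal to the mean over all member images.
    """
    sel = np.asarray(selected)
    return np.average(mean_images[sel], axis=0, weights=sizes[sel])

def tint_by_pixels(mean_images, pixel_ids):
    """Mean ink per cluster over the selected pixels, rescaled to [0, 1]."""
    ink = mean_images[:, pixel_ids].mean(axis=1)
    span = ink.max() - ink.min()
    if span == 0:
        # All clusters have the same ink on these pixels: no tint contrast
        return np.zeros_like(ink)
    return (ink - ink.min()) / span

# Toy data: three clusters, four pixels each
mean_images = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
sizes = np.array([100, 300, 50])

merged = merge_clusters(mean_images, sizes, [0, 1])  # column-dendrogram click
tint = tint_by_pixels(mean_images, [2, 3])           # row-dendrogram click
```

Here only the third cluster has ink on pixels 2 and 3, so it receives the full tint while the other two receive none.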
The row strip shows a value-based Center attribute (each pixel's radial
proximity to the image center). Reordering rows by Center surfaces simple
structural patterns — for example, Zero and Seven have very
little ink near the image center compared to other digits, so reordering by
Center stratifies their cluster columns from the rest.
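One way to compute a Center-style attribute is shown below. This is a sketch under stated assumptions: the function name `center_proximity` and the exact normalization (1.0 at the geometric center, 0.0 at the corners) are illustrative and may differ from what the package uses.

```python
import math

def center_proximity(size=28):
    """Per-pixel radial proximity to the image center for a size x size grid.

    Assumption: proximity is 1 minus the Euclidean distance to the geometric
    center, divided by the center-to-corner distance.
    """
    c = (size - 1) / 2          # geometric center, 13.5 for a 28x28 image
    max_dist = math.hypot(c, c)  # distance from the center to a corner
    return [
        [1 - math.hypot(row - c, col - c) / max_dist for col in range(size)]
        for row in range(size)
    ]

center = center_proximity()
```

Sorting the clustergram rows by this value places center pixels at one end and border pixels at the other, which is what lets low-center-ink digits stand apart.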
Inspired by the original Clustergrammer MNIST_heatmaps notebook, but rebuilt as a linked anywidget pair following the bike_network_traffic pattern.
Live, in your browser
Pick a variant. The widget below is the linked deck.gl viewer + Celldega Clustergram,
with no kernel attached — all interactions (hover, click, dendrogram select,
color-strip filter) run purely in the browser via jsdlink.
The dropdown swaps between the unbiased combined clustering and a separate clustering for each digit.
Each option loads a self-contained HTML built by build_widget_htmls.ipynb
via nbconvert.HTMLExporter(exclude_input=True).
How the API works
The Python package loads MNIST via sklearn.datasets.fetch_openml (cached
on disk), Leiden-clusters with Scanpy, then aggregates into Celldega-ready DataFrames.
The notebook is four lines of setup:
from mnist_celldega import (
    get_mnist_data, cluster_mnist,
    make_mnist_clustergram, make_mnist_viewer_widget, link_viewer_to_clustergram,
)
from ipywidgets import HBox
ds = get_mnist_data()
clusters = cluster_mnist(ds, mode='all', n_clusters=100)
mat, cgm = make_mnist_clustergram(clusters)
viewer = make_mnist_viewer_widget(clusters)
link_viewer_to_clustergram(viewer, cgm)
HBox([viewer, cgm])
Per-digit variants reuse the same pipeline with mode='digit', digit=N:
clusters = cluster_mnist(ds, mode='digit', digit=7, n_clusters=10)
# ... rest is identical
Get it running locally
git clone https://github.com/cornhundred/MNIST_Celldega.git
cd MNIST_Celldega
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[notebooks]"
(cd js && npm install && npm run build)
jupyter lab
Open MNIST.ipynb and run all cells, or run build_widget_htmls.ipynb to regenerate every variant under ./.