MNIST_Celldega

Linked deck.gl MNIST viewer + Celldega clustergram of Leiden clusters of handwritten digits.

Python · Jupyter · anywidget · deck.gl · WebGL · Celldega clustergram · Scanpy / Leiden clustering · MNIST 70k

What is this?

This project takes the MNIST 70k handwritten-digit corpus, embeds it with PCA, builds a neighbor graph, and Leiden-clusters it with Scanpy into ~100 image clusters. Each cluster is summarized by its mean image (mean ink per pixel); the resulting matrix of top-N pixels × clusters is rendered as a Celldega clustergram, paired with a deck.gl viewer that draws the cluster averages and individual example digits.
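The aggregation step described above can be sketched with NumPy and pandas alone. This is a minimal illustration, not the package's implementation: the function name cluster_pixel_matrix, the row and column naming scheme, and the choice to rank pixels by variance across cluster means are all assumptions, since the source does not say how the top-N pixels are selected.

```python
import numpy as np
import pandas as pd

def cluster_pixel_matrix(images, labels, top_n=200):
    """Build a (top-N pixels x clusters) DataFrame of mean ink per pixel.

    images: (n_images, n_pixels) array of flattened square images.
    labels: (n_images,) array of cluster ids (e.g. Leiden labels).
    """
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)

    # One mean image per cluster -> shape (n_clusters, n_pixels).
    means = np.stack([images[labels == c].mean(axis=0) for c in cluster_ids])

    # Assumption: keep the pixels whose mean ink varies most across clusters.
    variance = means.var(axis=0)
    top = np.sort(np.argsort(variance)[::-1][:top_n])

    side = int(np.sqrt(images.shape[1]))
    row_names = [f"pix_{i // side}_{i % side}" for i in top]  # hypothetical naming
    col_names = [f"cluster_{c}" for c in cluster_ids]         # hypothetical naming
    return pd.DataFrame(means[:, top].T, index=row_names, columns=col_names)

# Synthetic stand-in for MNIST: 300 random "images", 5 random cluster labels.
rng = np.random.default_rng(0)
images = rng.random((300, 784))
labels = rng.integers(0, 5, size=300)
mat = cluster_pixel_matrix(images, labels, top_n=50)
print(mat.shape)
```

Rows are pixels and columns are clusters, which matches the orientation the clustergram interactions assume (row labels are pixels, column labels are clusters).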

The two views are linked in the browser: clicking a column label (cluster) in the clustergram zooms the viewer in on that cluster's average and examples. Clicking a row label (pixel) marks that pixel's position on every average. Clicking a column-axis dendrogram averages the selected clusters into one merged image; clicking a row-axis dendrogram tints each cluster's tile by its mean ink across the selected pixels, so that questions like “which clusters paint the center column?” or “which paint the bottom-left curl?” can be answered at a glance. Clicking a Majority-digit color on the column strip filters the viewer to clusters of that digit.
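The arithmetic behind the dendrogram and color-strip interactions is simple enough to sketch in NumPy. In the project these run in the browser; the Python below only illustrates the calculations. The function names are hypothetical, and the unweighted average and the min-max rescaling of the tint are assumptions.

```python
import numpy as np

def merge_clusters(means, selected):
    """Column-dendrogram click: average the selected clusters' mean images.
    Assumption: an unweighted mean, ignoring cluster sizes."""
    return means[list(selected)].mean(axis=0)

def tint_by_pixels(means, pixel_idx):
    """Row-dendrogram click: mean ink per cluster over the selected pixels,
    rescaled to [0, 1] so it can drive a tint intensity."""
    ink = means[:, list(pixel_idx)].mean(axis=1)
    span = ink.max() - ink.min()
    return (ink - ink.min()) / span if span > 0 else np.zeros_like(ink)

def clusters_for_digit(majority_digit, digit):
    """Color-strip click: indices of clusters whose majority digit matches."""
    return np.flatnonzero(np.asarray(majority_digit) == digit)

# Toy data: 3 clusters, 4 pixels each.
means = np.array([
    [0.0, 0.2, 0.4, 0.6],
    [1.0, 0.8, 0.6, 0.4],
    [0.5, 0.5, 0.5, 0.5],
])
merged = merge_clusters(means, [0, 1])   # average of clusters 0 and 1
tint = tint_by_pixels(means, [0, 1])     # ink over the first two pixels
keep = clusters_for_digit([7, 0, 7], 7)  # clusters labelled as digit 7
```

Here merged comes out flat at 0.5 for every pixel, the tint ranks cluster 1 highest and cluster 0 lowest, and keep selects clusters 0 and 2.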

The row strip shows a value-based Center attribute (each pixel's radial proximity to the image center). Reordering rows by Center surfaces simple structural patterns. For example, Zero and Seven have very little ink near the image center compared to other digits, so reordering by Center separates their cluster columns from the rest.
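One plausible way to compute a Center value for every pixel is shown below. The exact formula used by the project is not given in the source, so the normalization (1 at the image center, 0 at the corners) and the function name center_proximity are assumptions.

```python
import numpy as np

def center_proximity(side=28):
    """Radial proximity of each pixel to the image center, in [0, 1].
    Assumption: 1 - (distance to center / largest such distance)."""
    rows, cols = np.indices((side, side))
    c = (side - 1) / 2.0                 # geometric center of the pixel grid
    dist = np.hypot(rows - c, cols - c)  # Euclidean distance to the center
    return 1.0 - dist / dist.max()

center = center_proximity()

# Flattened pixel indices ordered from most central to least central,
# analogous to reordering clustergram rows by the Center attribute.
order = np.argsort(-center.ravel(), kind="stable")
```

With a 28 × 28 grid the center falls between pixels, so the four innermost pixels tie for the highest value and the four corners score exactly 0.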

Inspired by the original Clustergrammer MNIST_heatmaps notebook, but rebuilt as a linked anywidget pair following the bike_network_traffic pattern.

Live, in your browser

Pick a variant. The widget below is the linked deck.gl viewer + Celldega Clustergram, with no kernel attached: all interactions (hover, click, dendrogram select, color-strip filter) run purely in the browser via jsdlink.

The dropdown swaps between the unbiased combined clustering and a separate clustering for each digit. Each option loads a self-contained HTML file built by build_widget_htmls.ipynb via nbconvert.HTMLExporter(exclude_input=True).
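As a rough illustration of the export step, the sketch below converts a tiny in-memory notebook to HTML with its code inputs hidden. It is not the contents of build_widget_htmls.ipynb: the notebook, the marker strings, and the variable names are made up for the example, and saving widget state for real widgets is not covered.

```python
from nbformat.v4 import new_notebook, new_code_cell, new_output
from nbconvert import HTMLExporter

# A stand-in notebook with one already-executed code cell (hypothetical content).
cell = new_code_cell(
    "print('hi')  # HIDDEN_SOURCE_MARKER",
    outputs=[new_output("stream", name="stdout", text="VISIBLE_OUTPUT_MARKER\n")],
    execution_count=1,
)
nb = new_notebook(cells=[cell])

# exclude_input=True drops code-cell sources and keeps only their outputs.
exporter = HTMLExporter(exclude_input=True)
html, _resources = exporter.from_notebook_node(nb)

# To write a standalone page (the file name here is an assumption):
# with open("variant.html", "w", encoding="utf-8") as fh:
#     fh.write(html)
```

The resulting HTML contains the cell's output but not its source, which is what lets each variant page show only the widgets.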

How the API works

The Python package loads MNIST via sklearn.datasets.fetch_openml (cached on disk), Leiden-clusters it with Scanpy, then aggregates the result into Celldega-ready DataFrames. The notebook needs only a few lines of setup:

from mnist_celldega import (
    get_mnist_data, cluster_mnist,
    make_mnist_clustergram, make_mnist_viewer_widget, link_viewer_to_clustergram,
)
from ipywidgets import HBox

ds = get_mnist_data()
clusters = cluster_mnist(ds, mode='all', n_clusters=100)
mat, cgm = make_mnist_clustergram(clusters)
viewer = make_mnist_viewer_widget(clusters)
link_viewer_to_clustergram(viewer, cgm)
HBox([viewer, cgm])

Per-digit variants reuse the same pipeline with mode='digit', digit=N:

clusters = cluster_mnist(ds, mode='digit', digit=7, n_clusters=10)
# ... rest is identical

Get it running locally

git clone https://github.com/cornhundred/MNIST_Celldega.git
cd MNIST_Celldega

uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[notebooks]"
(cd js && npm install && npm run build)

jupyter lab

Open MNIST.ipynb and run all cells, or run build_widget_htmls.ipynb to regenerate every variant in the current directory (./).