bike_network_traffic

Linked deck.gl + Celldega visualizations of bike-share station-to-station traffic.

Python · Jupyter · anywidget · deck.gl · WebGL · Celldega clustergram · UMAP · Engineering notes →

What is this?

Four US public bike-share programs (Citi Bike NYC, Bluebikes Boston, Capital Bikeshare DC, Divvy Chicago) each publish monthly trip CSVs to a public S3 bucket. This project turns one or more months of trips into a destination-probability matrix (for every origin station, the probability distribution over its destinations) and visualizes it as a Celldega clustergram linked to a deck.gl map of the stations.
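As a rough illustration of what a destination-probability matrix is, the sketch below counts trips per origin-destination pair and divides each row by its total. The toy data and the column names start_station and end_station are assumptions made for this example; they are not the package's actual schema or implementation.

```python
import pandas as pd

# Toy trips table; the column names are illustrative, not the project's schema.
trips = pd.DataFrame(
    {
        "start_station": ["A", "A", "A", "B", "B", "C"],
        "end_station": ["B", "B", "C", "A", "C", "A"],
    }
)

# Count trips for every (origin, destination) pair: rows are origins.
counts = pd.crosstab(trips["start_station"], trips["end_station"])

# Divide each row by its total so that it becomes a probability distribution.
transition_prob = counts.div(counts.sum(axis=1), axis=0)

print(transition_prob.round(3))
```

Each row sums to 1, so the row for station A reads as "given a trip starts at A, where does it end?".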

The two views are linked in the browser: hovering, selecting, or slicing on the Clustergram drives the map, so focused stations, top-trip lines, and matching neighborhoods light up while the rest dim. A slider morphs the map between geographic coordinates and a UMAP embedding of the destination-probability (transition) matrix, so behavioral neighborhoods (commute corridors, leisure routes) emerge from the geographic backdrop.
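One plausible way to picture the slider is as a linear blend between two per-station layouts. The sketch below is an assumption about the general technique, not the widget's actual code: morph_positions is a hypothetical helper, the embedding is hard-coded rather than computed with UMAP, and both layouts are assumed to be already scaled to comparable coordinate ranges.

```python
import numpy as np


def morph_positions(geo_xy, umap_xy, t):
    """Linearly blend two layouts; t=0 is geographic, t=1 is the embedding."""
    geo_xy = np.asarray(geo_xy, dtype=float)
    umap_xy = np.asarray(umap_xy, dtype=float)
    if geo_xy.shape != umap_xy.shape:
        raise ValueError("both layouts need one (x, y) row per station")
    # Clamp the slider value so positions never overshoot either layout.
    t = float(np.clip(t, 0.0, 1.0))
    return (1.0 - t) * geo_xy + t * umap_xy


# Three stations: made-up geographic positions and a made-up 2-D embedding.
geo = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
emb = np.array([[1.0, 1.0], [1.0, 3.0], [4.0, 2.0]])

halfway = morph_positions(geo, emb, 0.5)
print(halfway)
```

At t=0.5 every station sits at the midpoint between its map position and its embedding position.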

For a tour of how the two widgets actually talk to each other — the observable store that drives the map, the alpha-shape neighborhoods, and the rest of the plumbing — see the engineering notes →

Live, in your browser

Pick a city. The widget below is the linked deck.gl map + Celldega Clustergram, with no kernel attached — all interactions (hover, click, slice, the spatial-vs-UMAP slider) run purely in the browser via jsdlink.

open standalone ↗  ·  view full notebook ↗

The dropdown swaps between the four cities. Each option loads a self-contained HTML built by build_widget_htmls.ipynb via save_minimal_html(HBox([flow, cgm]), …).

All four cities

How the API works

The Python package fetches data straight from each city's S3 bucket and returns Celldega-ready DataFrames. A whole notebook is now four function calls plus a display line:

from bike_network_traffic import (
    get_bike_data, make_station_clustergram, make_flow_widget, link_flow_to_clustergram,
)
from ipywidgets import HBox

stations, transition_prob = get_bike_data("nyc", year=2026, month=3)
mat, cgm, cluster_map = make_station_clustergram(transition_prob, n_clusters=150)
flow = make_flow_widget(stations, transition_prob, cluster_map)
link_flow_to_clustergram(flow, cgm)
HBox([flow, cgm])

get_bike_data handles the S3 listing, monthly archive download, on-disk cache (~/.cache/bike_network_traffic/), zip extraction, schema normalization (modern + legacy column names), and the destination-probability matrix construction. Single month, list of months, or a whole year all work:

get_bike_data("nyc",     year=2026, month=3)        # one month
get_bike_data("boston",  year=2025, month=[6, 7, 8]) # three months
get_bike_data("chicago", year=2025)                  # whole year
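The three call shapes above suggest that the month argument is normalized to a list before any download happens. The sketch below shows one way that could look. normalize_months and cache_path are hypothetical helpers written for this example, and the file layout under the cache directory is an assumption; only the cache root ~/.cache/bike_network_traffic/ comes from the description above.

```python
from pathlib import Path

# Cache root as described above; everything beneath it is an assumed layout.
CACHE_DIR = Path.home() / ".cache" / "bike_network_traffic"


def normalize_months(month=None):
    """Return a sorted list of month numbers for None, an int, or an iterable."""
    if month is None:
        months = list(range(1, 13))  # no month given: the whole year
    elif isinstance(month, int):
        months = [month]  # a single month
    else:
        months = sorted(set(month))  # several months, de-duplicated
    for m in months:
        if not 1 <= m <= 12:
            raise ValueError(f"month must be in 1..12, got {m}")
    return months


def cache_path(city, year, month):
    """Hypothetical on-disk location for one city-month archive."""
    return CACHE_DIR / city / f"{year}-{month:02d}.zip"


for m in normalize_months([8, 6, 7]):
    print(cache_path("boston", 2025, m))
```

Keying the cache by city, year, and month would let a whole-year request reuse archives already fetched by earlier single-month calls.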

Get it running locally

git clone https://github.com/broadinstitute/celldega.git
cd celldega/...

uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[notebooks]"

jupyter lab

Open NYC.ipynb (or any of the four city notebooks) and run all cells.