bike_network_traffic
Linked deck.gl + Celldega visualizations of bike-share station-to-station traffic.
What is this?
Several public bike-share programs in the US (Citi Bike NYC, Bluebikes Boston, Capital Bikeshare DC, Divvy Chicago) each publish monthly trip CSVs to a public S3 bucket. This project turns one month of trips into a destination-probability matrix (for every origin station, the probability distribution over its destinations) and visualizes it as a Celldega Clustergram linked to a deck.gl map of the stations.
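To make the destination-probability matrix concrete, here is a minimal pure-Python sketch of the row normalization it implies. It is not the package's implementation (which returns DataFrames); the function name `destination_probabilities` and the toy station names are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def destination_probabilities(trips):
    """Row-normalize origin -> destination trip counts.

    `trips` is an iterable of (origin_station, destination_station) pairs.
    Returns {origin: {destination: probability}}; each inner dict sums to 1.
    """
    counts = defaultdict(Counter)
    for origin, destination in trips:
        counts[origin][destination] += 1

    probs = {}
    for origin, dest_counts in counts.items():
        total = sum(dest_counts.values())
        probs[origin] = {dest: n / total for dest, n in dest_counts.items()}
    return probs


# Toy data with made-up station names (not real trip records).
toy_trips = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A")]
matrix = destination_probabilities(toy_trips)
```

Each row of the result describes one origin station, so two stations with similar rows send riders to similar places regardless of where they sit on the map.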
The two views are linked in the browser: hovering, selecting, or slicing on the Clustergram drives the map — focused stations, top-trip lines, and matching neighborhoods light up while the rest dim. A slider morphs the map between geographic coordinates and a UMAP embedding of the transition matrix, so behavioral neighborhoods (commute corridors, leisure routes) emerge from the geographic backdrop.
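One plausible way to picture the slider is as a blend between two sets of station positions. The sketch below assumes plain linear interpolation, which may differ from what the widget actually does; `morph_positions` is a hypothetical helper, and the coordinates are toy values that assume both layouts have already been scaled to a comparable range.

```python
def morph_positions(geo_xy, umap_xy, t):
    """Linearly blend two layouts; t=0 gives geographic, t=1 gives UMAP.

    `geo_xy` and `umap_xy` are equal-length lists of (x, y) tuples,
    one entry per station, in the same station order.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    if len(geo_xy) != len(umap_xy):
        raise ValueError("both layouts must cover the same stations")
    return [
        ((1 - t) * gx + t * ux, (1 - t) * gy + t * uy)
        for (gx, gy), (ux, uy) in zip(geo_xy, umap_xy)
    ]


# Toy coordinates for two stations (illustrative only).
geo = [(0.0, 0.0), (2.0, 4.0)]
umap = [(1.0, 1.0), (0.0, 0.0)]
halfway = morph_positions(geo, umap, 0.5)
```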
For a tour of how the two widgets actually talk to each other (the observable store that drives the map, the alpha-shape neighborhoods, and the rest of the plumbing), see the engineering notes.
Live, in your browser
Pick a city. The widget below is the linked deck.gl map + Celldega Clustergram, with no
kernel attached — all interactions (hover, click, slice, the spatial-vs-UMAP slider) run
purely in the browser via jsdlink.
The dropdown swaps between the four cities. Each option loads a self-contained
HTML built by build_widget_htmls.ipynb via
save_minimal_html(HBox([flow, cgm]), …).
All four cities
How the API works
The Python package fetches data straight from each city's S3 bucket and returns Celldega-ready DataFrames. A whole notebook is now four lines of setup:
from bike_network_traffic import (
get_bike_data, make_station_clustergram, make_flow_widget, link_flow_to_clustergram,
)
from ipywidgets import HBox
stations, transition_prob = get_bike_data("nyc", year=2026, month=3)
mat, cgm, cluster_map = make_station_clustergram(transition_prob, n_clusters=150)
flow = make_flow_widget(stations, transition_prob, cluster_map)
link_flow_to_clustergram(flow, cgm)
HBox([flow, cgm])
get_bike_data handles the S3 listing, monthly archive download, on-disk
cache (~/.cache/bike_network_traffic/), zip extraction, schema
normalization (modern + legacy column names), and the destination-probability matrix
construction. Single month, list of months, or a whole year all work:
get_bike_data("nyc", year=2026, month=3) # one month
get_bike_data("boston", year=2025, month=[6, 7, 8]) # three months
get_bike_data("chicago", year=2025) # whole year
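Two of the steps listed above, schema normalization and the flexible `month` argument, can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: `LEGACY_TO_MODERN`, `normalize_columns`, and `expand_months` are hypothetical names, and the column mapping covers only a handful of Citi Bike style headers.

```python
# Hypothetical mapping from older space-separated headers to the newer
# snake_case ones; a real mapping would likely cover more columns.
LEGACY_TO_MODERN = {
    "starttime": "started_at",
    "stoptime": "ended_at",
    "start station id": "start_station_id",
    "end station id": "end_station_id",
    "start station name": "start_station_name",
    "end station name": "end_station_name",
}


def normalize_columns(columns):
    """Lower-case and trim headers, then rename any known legacy ones."""
    cleaned = [c.strip().lower() for c in columns]
    return [LEGACY_TO_MODERN.get(c, c) for c in cleaned]


def expand_months(month=None):
    """Turn `month` (None, an int, or an iterable of ints) into a list."""
    if month is None:
        months = list(range(1, 13))  # whole year
    elif isinstance(month, int):
        months = [month]
    else:
        months = list(month)
    for m in months:
        if not 1 <= m <= 12:
            raise ValueError(f"invalid month: {m}")
    return months
```

Normalizing headers before anything else means the downstream matrix construction only ever has to deal with one set of column names.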
Get it running locally
git clone https://github.com/broadinstitute/celldega.git
cd celldega/...
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[notebooks]"
jupyter lab
Open NYC.ipynb (or any of the four city notebooks) and run all cells.