CitiBike NYC: Demand, Risk, and Net Flow Analysis (2023–2025)

1. Overview

1.1 Background

CitiBike is New York City's largest bikeshare system, used for millions of short trips each month. Despite its scale and its role as a public-transport substitute, riders currently receive no accident or liability insurance. Use of the system requires accepting a broad liability waiver, meaning riders effectively self-insure against injury, medical costs, third-party liability, and damage to the bike.

CitiBike itself is also not fully insulated: waivers do not cover incidents involving maintenance issues, equipment failure, or negligence. This leaves residual legal exposure, reputational risk, and potential settlement pressure.

Other micromobility providers have taken a different approach. For example, Lime partnered with Allianz to provide automatic accident and liability insurance for riders. A similar model may be feasible for CitiBike but requires a high-resolution, data-driven understanding of location- and time-specific risk.

Key idea: We combine CitiBike trip data with NYPD crash data to construct a transparent, interpretable risk-per-trip measure by station, by time of day, and by their interaction. This enables user communication (e.g., in-app warnings), pricing of insurance products, and targeted safety interventions.

1.2 Data & Time Frame

CitiBike trip data: January 2023 - October 2025 (station, timestamps, bike type, user type, start and end location).
NYPD collision data: January 2023 - October 2025 (crash records including severity, injuries, fatalities, and location).

1.3 Structure

Section 2: How CitiBike is used (demand, net flow, usage patterns).
Section 3: Construction and interpretation of the risk measure.
Section 4: Predicting net flow imbalances to support rebalancing operations.
Section 5: Strategic implications for CitiBike and a potential insurance partner.

1.4 Key Findings

Section 2 — Data Analysis
- Demand & maturity: Strong seasonal peaks; growth from 2023 to 2024 but clear stagnation in 2025 -> CitiBike has entered a mature phase.
- Usage structure: Extremely stable weekday/weekend split and hourly commuter peaks; trips are short.
- Imbalance: A small set of stations act as persistent sources/sinks and drive most rebalancing needs.
- Operational timing: Weekends (especially Sundays) are best for major rebalancing and repairs; daily rebalancing is most effective before/after the morning and early-evening peaks, and repairs fit best overnight.
- Actionable insights: Prioritize e-bike availability at peak times; target winter-demand stimulation; focus rebalancing on the structural source/sink stations.
Section 3 — Risk Analysis
- Spatial risk: Most stations are low-risk, but a small number form persistent medium- or high-risk clusters.
- Temporal risk: Time of day matters far more than day of week: late evening and night show roughly 1.5x to 4x daytime risk, while weekday patterns are nearly identical.
- Station x time-of-day: Some stations that are safe at midday become high-risk at night -> aggregate station averages conceal meaningful temporal spikes.
- Actionable insights: Warn riders at high-risk stations/times; use station x time-of-day risk for insurance pricing; target nighttime location hotspots for safety improvements.
Section 4 — Net Flow Prediction
- Imbalance patterns: Most stations are balanced; extreme imbalances are predictable and recur at the same locations.
- Prediction approach: Classify stations into under-supply / balanced / over-supply using lagged net flow, calendar structure, and spatial features.
- Actionable insights: Enable proactive rebalancing before stations empty/fill, reducing service failures and operational costs.

2. Data Analysis: CitiBike (2023–2025)

This section summarizes CitiBike demand based strictly on the exploratory data analysis. The analysis investigates: (i) overall daily demand, (ii) net flow and structural imbalances, (iii) usage patterns across bike type, membership, weekday/weekend, and hour of day, and (iv) trip duration and distance distributions.

2.1 Demand & System Maturity

[Figure: Daily usage and average daily demand per station (30-day rolling mean)]

Daily usage shows strong seasonal variation, with high ridership every summer and clear slowdowns in winter.
Demand increases from 2023 to 2024, but the increase from 2024 to 2025 is much smaller, indicating a clear slowdown in growth.
Peak summer levels in 2025 are only slightly above 2024, and winter lows nearly overlap, implying growth stagnation.
Per-station usage (bottom panel) shows the same seasonal pattern and similarly minimal growth from 2024 to 2025.

Actionable insights:

Winter demand stimulation: promotional campaigns, discounted rides, or seasonal memberships.
Acquire non-users: insurance partnerships can reduce perceived safety risks and help expand the user base.

2.2 Net Flow & Imbalance

Net flow for station \(j\) and day \(t\) is defined as:

\[ \text{NetFlow}_{j,t} = A_{j,t} - D_{j,t}, \]

where \(A_{j,t}\) is the number of arrivals and \(D_{j,t}\) the number of departures. Negative values indicate stations that tend to empty, positive values stations that tend to fill.
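As a minimal sketch of this definition, the snippet below counts arrivals and departures per station and day from a list of trips and takes their difference. The tuple layout (start station, end station, day), the station names, and the simplification that each trip belongs to a single day are illustrative assumptions, not the actual CitiBike schema.

```python
from collections import Counter

def daily_net_flow(trips):
    """Return {(station, day): arrivals - departures}.

    Each trip is (start_station, end_station, day); this layout is an
    assumption made for illustration.
    """
    arrivals = Counter()
    departures = Counter()
    for start, end, day in trips:
        departures[(start, day)] += 1
        arrivals[(end, day)] += 1
    keys = set(arrivals) | set(departures)
    # Counter returns 0 for missing keys, so one-sided stations work too.
    return {k: arrivals[k] - departures[k] for k in keys}

# Synthetic trips, for illustration only.
trips = [
    ("A", "B", "2024-07-01"),
    ("A", "B", "2024-07-01"),
    ("B", "A", "2024-07-01"),
    ("A", "C", "2024-07-01"),
]
net_flow = daily_net_flow(trips)
```

In this toy example station A empties (net flow of -2) while B and C fill. Every trip adds one departure and one arrival, so net flows sum to zero across the system on any given day.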

[Figure: System-wide average absolute net flow + top persistent source/sink stations]

The system-wide average absolute net flow rises above 5 bikes/station in summer and falls below ~3.5 bikes/station in winter.
This seasonal imbalance pattern is highly consistent across 2023, 2024, and 2025.
Short-term imbalance spikes also occur, potentially due to weather events, usage surges, or outages.
A small set of stations exhibit large persistent net losses or gains. These stations form structural sources and sinks.

Actionable insights:

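Net flow prediction is needed: the imbalances recur seasonally and at the same stations, so they can be anticipated rather than only observed (Section 4).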
Use forecasts for targeted redistribution to prevent empty or full stations.

2.3 Usage Patterns: Who, When, and How People Ride

Bike Type

[Figure: Bike type shares]

E-bike usage increases steadily from 2023 to 2025, but the rate of increase is slowing.
Classic bike usage remains stable: a consistent baseline user group.
The composition is stable across seasons: users do not switch bike types systematically in winter.

Membership vs Casual

[Figure: Membership shares]

Members consistently account for the majority of trips; casual use rises sharply in summer (22-24%) and drops in winter (10-12%).
The member/casual ratio shows no year-to-year change: composition is structurally stable.

Actionable insight:

Convert casual users to members: members ride more consistently in winter -> targeted winter promotions may increase low-season demand.

Usage by Weekday

[Figure: Weekday vs Weekend usage]

Weekday usage generally exceeds weekend usage -> demand is primarily commuter-driven.
The weekday/weekend split is stable over time.

Usage by Hour

[Figure: Hourly usage]

Clear commuting peaks: morning (7-10) and evening (17-19).
Seasonal variation in evening and nighttime share is modest.
Hourly usage patterns are stable across years.

Actionable insights:

Prioritize peak-hour availability: ensure bikes in residential areas in the morning and docks in business areas in the evening.
Use overnight hours for maintenance: very low demand makes this the ideal repair window.
Stable hourly patterns enable precise rebalancing schedules.

Trip Duration & Distance

[Figure: Trip duration and distance]

Trip durations and distances are tightly concentrated -> short urban trips dominate.

2.4 Summary

CitiBike demand is strongly seasonal and growth has slowed since 2024.
Net flow imbalances are structural and predictable, concentrated in a small set of stations.
Usage patterns (bike type, membership, weekday/weekend, hourly) are stable across years.
Trips are short and consistent in duration and distance.
These regularities support risk estimation (Section 3) and net-flow prediction (Section 4).

3. Risk Analysis

We combine CitiBike trips with NYPD collision data to estimate an exposure-adjusted, Empirical-Bayes-smoothed risk per trip by station, time of day, and station x time-of-day.

3.1 Assigning Crashes to Stations

Crashes are assigned to the nearest station using a BallTree with Haversine distance, subject to a strict 300 m cutoff so that only crashes plausibly related to CitiBike trips are included.
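The assignment rule can be sketched as follows. To stay dependency-free, this sketch swaps the BallTree for a brute-force scan over all stations. It applies the same rule (nearest station by Haversine distance, dropped beyond 300 m) but is much slower on the full data set. The coordinates and station ids are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
CUTOFF_M = 300              # cutoff stated in the text

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def assign_crash(crash, stations, cutoff_m=CUTOFF_M):
    """Return the id of the nearest station within cutoff_m, else None."""
    best_id, best_dist = None, float("inf")
    for station_id, (lat, lon) in stations.items():
        d = haversine_m(crash[0], crash[1], lat, lon)
        if d < best_dist:
            best_id, best_dist = station_id, d
    return best_id if best_dist <= cutoff_m else None

# Hypothetical stations and crash locations.
stations = {"S1": (40.7500, -73.9900), "S2": (40.7600, -73.9800)}
near = assign_crash((40.7505, -73.9900), stations)  # roughly 56 m from S1
far = assign_crash((40.8000, -73.9000), stations)   # kilometres from both
```

With scikit-learn's BallTree and the `haversine` metric, coordinates are passed in radians and the returned distances are in radians on the unit sphere, so they must be multiplied by the Earth radius before applying the 300 m cutoff.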

3.2 Risk per Trip

For station \(j\) and time bucket \(b\):

Hazard \(H_{j,b}\): sum of crash severities \(S_i\) (see Section 3.3) over all crashes \(i\) assigned to station \(j\) in time bucket \(b\)
Exposure \(E_{j,b}\): number of CitiBike departures

Raw risk: \[ R_{j,b} = \frac{H_{j,b}}{E_{j,b} + \epsilon}. \] To stabilize noisy ratios, we apply Empirical Bayes smoothing:

\[ R_{j,b}^{\mathrm{EB}} = \lambda_{j,b} R_{j,b} + (1 - \lambda_{j,b}) \mu_b, \qquad \lambda_{j,b} = \frac{E_{j,b}}{E_{j,b} + C}, \]

where \(\mu_b\) is a time-of-day-specific mean risk and \(C\) is a credibility constant that controls the strength of shrinkage toward that mean. Hence, stations with low exposure (few trips) have their risk estimates shrunk more strongly toward the time-of-day mean \(\mu_b\).
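A small numeric sketch shows the effect of the shrinkage. Both stations below have the same raw risk of 0.02, but very different exposure. The values of the bucket mean `MU_B` and the credibility constant `C` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def eb_risk(hazard, exposure, mu, credibility, eps=1e-9):
    """Shrink the raw ratio hazard/exposure toward mu with weight E / (E + C)."""
    raw = hazard / (exposure + eps)
    lam = exposure / (exposure + credibility)
    return lam * raw + (1 - lam) * mu

MU_B = 0.002  # hypothetical time-of-day mean risk
C = 1_000     # hypothetical credibility constant

# Same raw risk (0.02), very different exposure.
quiet = eb_risk(hazard=1.0, exposure=50, mu=MU_B, credibility=C)
busy = eb_risk(hazard=1_000.0, exposure=50_000, mu=MU_B, credibility=C)
```

The low-exposure station ends up near the bucket mean (about 0.0029), while the high-exposure station stays close to its raw ratio (about 0.0196). A single severe crash at a rarely used station therefore cannot dominate the ranking.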

3.3 Crash Severity

Crash severity \(S_i\) weights injuries and fatalities heavily, and assigns much higher severity to cyclist-involved crashes. The specification ensures that:

non-cyclist crashes still contribute to hazard,
cyclist crashes are strongly up-weighted,
injuries and fatalities dominate the severity score.
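One way such a score could be specified is sketched below. The exact weights are not stated here, so every constant in the snippet is a hypothetical placeholder chosen only to satisfy the three properties above. It is not the specification used in the analysis.

```python
# All weights are hypothetical placeholders, not the values used in the analysis.
BASE = 1.0                # every crash contributes something
W_INJURY = 5.0            # per injured person
W_FATALITY = 50.0         # per fatality
CYCLIST_MULTIPLIER = 3.0  # up-weighting for cyclist-involved crashes

def crash_severity(injured, killed, cyclist_involved):
    """Severity score of a single crash under the placeholder weights."""
    score = BASE + W_INJURY * injured + W_FATALITY * killed
    return score * CYCLIST_MULTIPLIER if cyclist_involved else score

minor_car_crash = crash_severity(injured=0, killed=0, cyclist_involved=False)
cyclist_injury = crash_severity(injured=1, killed=0, cyclist_involved=True)
fatal_crash = crash_severity(injured=0, killed=1, cyclist_involved=False)
```

Under these placeholder weights a property-damage-only crash scores 1, a cyclist injury 18, and a fatal non-cyclist crash 51, so hazard sums are driven by the serious events.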

3.4 Station-Level Risk

Hazard and exposure are aggregated to obtain EB-smoothed station-level risk: \[ R_j^{\mathrm{EB}} = \lambda_j R_j + (1-\lambda_j)\mu. \]

[Figure: Station-level EB risk map]

Key findings from the analysis:

Most stations have low to medium per-trip risk.
A small set of stations exhibit high or very high risk, driven by repeated cyclist crashes, injuries/fatalities, or low exposure.
Station-level risk tends to cluster geographically, indicating local environmental or infrastructural factors.

Interpretation: Station-level risk can inform location-based safety warnings and insurance pricing.

3.5 Time-of-Day & Weekday Risk

Hazard and exposure are aggregated to obtain EB-smoothed time-level risk: \[ R_b^{\mathrm{EB}} = \lambda_b R_b + (1-\lambda_b)\mu. \]

[Figure: EB risk by time of day]

Findings:

Time of day dominates risk.
Morning, midday, and early evening have similarly low risk.
Late evening risk is approx. 1.5x midday risk.
Night risk is 3-4x daytime risk, the largest single effect.
Weekday differences are minimal; weekends slightly higher.

Interpretation: System-wide risk is not commuter-driven; it is strongly shaped by visibility and nighttime traffic conditions.

3.6 Station x Time-of-Day Risk

[Figure: Station x time-of-day risk grid]

Findings:

Some stations are safe during the day but high-risk at night.
Some stations have evening-specific spikes linked to commuting patterns.
Some stations remain high-risk at all times, consistent with structural design issues.

3.7 Use for CitiBike & Insurers

User safety alerts: highlight high-risk stations and nighttime conditions.
Insurance pricing: use the risk measure for risk-based premiums.
Operational targeting: identify night-time or location-specific safety improvement opportunities.

4. Net Flow Prediction

Net flow determines where CitiBike needs to rebalance. For station \(j\) and day \(t\):

\[ \text{NetFlow}_{j,t} = A_{j,t} - D_{j,t}, \]

where negative values indicate emptying stations and positive values indicate filling stations.

4.1 From Distribution to Prediction Target

[Figure: Distribution of station-day net flows]

Most station-days lie near zero (balanced).
Operational concern lies in the tails: large negative or positive net flows.
Source–sink patterns are persistent and seasonal → structurally predictable.

We therefore predict imbalance classes rather than exact values.

4.2 Ternary Imbalance Classification

For each station-day \((j,t)\):

\[ y_{j,t} = \begin{cases} -1, & \text{if } \text{NetFlow}_{j,t} < -5, \\ 0, & \text{if } |\text{NetFlow}_{j,t}| \le 5, \\ +1, & \text{if } \text{NetFlow}_{j,t} > 5. \end{cases} \]

-1: likely under-supply (emptying station),
0: balanced,
+1: likely over-supply (filling station).
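The labelling rule translates directly into code. This is a minimal sketch of the definition above; only the function and constant names are our own.

```python
THRESHOLD = 5  # bikes per station-day, as in the definition above

def imbalance_class(net_flow, threshold=THRESHOLD):
    """Map a station-day net flow to -1 (emptying), 0 (balanced) or +1 (filling)."""
    if net_flow < -threshold:
        return -1
    if net_flow > threshold:
        return 1
    return 0

labels = [imbalance_class(x) for x in (-12, -5, 0, 5, 6)]
```

Note that the boundaries are inclusive for the balanced class: a net flow of exactly -5 or +5 is still labelled 0.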

4.3 Prediction Setup and Data Split

We predict tomorrow's imbalance class \(y_{j,t+1}\) for each station using only information available up to day \(t\).

Horizon: one-day-ahead prediction of \(y_{j,t+1} \in \{-1,0,+1\}\).
Features available at \(t\): lagged net flow, lagged weather, calendar variables, and station location.
Split: train on 2023-2024, validate on early 2025, test on late 2025.
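A time-consistent split of this kind can be sketched as a simple date rule. The text only says "early" and "late" 2025, so the 1 July boundary between validation and test is an assumption made for illustration.

```python
from datetime import date

VALID_START = date(2025, 1, 1)  # training covers 2023-2024, per the text
TEST_START = date(2025, 7, 1)   # hypothetical "early" vs "late" 2025 boundary

def split_of(day):
    """Assign a target day to train, valid or test purely by calendar date."""
    if day < VALID_START:
        return "train"
    if day < TEST_START:
        return "valid"
    return "test"

assignments = [split_of(d) for d in (date(2024, 12, 31),
                                     date(2025, 3, 15),
                                     date(2025, 9, 1))]
```

Because the rule depends only on the date, no observation from the validation or test period can leak into training.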

Time-series cross-validation would be ideal for hyperparameter tuning but is computationally expensive for the full data set. A time-consistent train-validation-test split provides an efficient and clean alternative.

4.4 Baseline Models

Two simple baselines provide reference points for evaluating model improvements.

Baseline 0 — Always balanced: predict class 0 for all station-days.
Baseline 1 — Persistence: predict \(y_{j,t+1} = y_{j,t}\).
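Both baselines can be written in a few lines for a single station. The class sequence below is synthetic and serves only to show the mechanics.

```python
def baseline_predictions(classes):
    """Return (targets, always_balanced, persistence) for one station.

    classes[t] is the observed class on day t; the target paired with
    day t is classes[t + 1].
    """
    targets = classes[1:]
    always_balanced = [0] * len(targets)
    persistence = classes[:-1]  # tomorrow's class := today's class
    return targets, always_balanced, persistence

def minority_hits(targets, predictions):
    """Count correctly predicted under- or over-supply days."""
    return sum(t != 0 and t == p for t, p in zip(targets, predictions))

y = [0, 0, -1, -1, 0, 1, 1, 0]  # synthetic class sequence
targets, b0, b1 = baseline_predictions(y)
hits_b0 = minority_hits(targets, b0)
hits_b1 = minority_hits(targets, b1)
```

In this toy sequence Baseline 0 detects none of the four imbalanced days, while persistence catches the second day of each two-day run and misses the first.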

Because under- and over-supply cases are rare, accuracy is uninformative. We instead report Macro-F1 (balanced performance across classes) and per-class recall (ability to detect rare but critical events).

Evaluation metrics.

We report Macro-F1 and per-class recall, which are appropriate for this imbalanced three-class setting. The F1-score for a class is the harmonic mean of precision and recall, balancing how often the model’s positive predictions are correct (precision) with how many true cases it successfully detects (recall). Macro-F1 computes the F1-score separately for each class (−1, 0, +1) and then averages them with equal weight, preventing the majority class from dominating performance. Per-class recall measures the share of actual instances of each class that the model identifies correctly; high recall for −1 and +1 is operationally crucial because missing an emptying or filling station is far more costly than misclassifying a balanced one. These metrics therefore capture performance on the rare but important imbalance events that CitiBike cares about.
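The metrics can be computed from scratch as sketched below. The toy labels are synthetic (80% balanced), and setting precision, recall, or F1 to 0 when a denominator is zero is a convention we assume here.

```python
def per_class_metrics(y_true, y_pred, classes=(-1, 0, 1)):
    """Precision, recall and F1 for each class (0.0 when undefined)."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

def macro_f1(y_true, y_pred, classes=(-1, 0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    m = per_class_metrics(y_true, y_pred, classes)
    return sum(m[c]["f1"] for c in classes) / len(classes)

# Synthetic example: an "always balanced" predictor on 80% balanced data.
y_true = [0] * 8 + [-1, 1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
recalls = {c: m["recall"] for c, m in per_class_metrics(y_true, y_pred).items()}
```

Here accuracy is 0.8 but Macro-F1 is only about 0.30, because the two minority classes each contribute an F1 of zero. This is the pattern that makes accuracy misleading in this setting.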

4.5 Predictive Models and Features

Each station-day is characterized by lagged imbalance signals, calendar structure, station location, and lagged weather — all observable at prediction time.

Net-flow dynamics: lag-1, lag-7, rolling mean.
Calendar: weekday/weekend, month.
Station: latitude, longitude.
Weather (lagged): temperature, rain, snow.
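The net-flow features can be sketched for one station as below. We assume lags are counted back from the target day and that the rolling mean covers the seven days before it. The window length is not stated in the text, so it is an assumption.

```python
def lag_features(series, target, window=7):
    """Net-flow features for predicting day `target`, using only earlier days.

    series[t] is the station's net flow on day t. Returns None when
    there is not enough history.
    """
    if target < max(window, 7):
        return None
    past = series[target - window:target]  # excludes the target day itself
    return {
        "lag_1": series[target - 1],
        "lag_7": series[target - 7],
        "rolling_mean": sum(past) / window,
    }

net_flow = [1, -2, 3, 0, 4, -6, 7, 9]  # synthetic daily net flows
features = lag_features(net_flow, target=7)
too_early = lag_features(net_flow, target=3)
```

The slice stops before the target index, so the value being predicted (9 in this example) never enters its own features.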

We estimate two models:

Multinomial logistic regression: simple, interpretable linear benchmark.
Gradient-boosted trees: flexible non-linear model capturing spatial-temporal interactions.

Class weights compensate for imbalance during training. These make the model pay more attention to the rare −1 and +1 cases by penalizing their misclassification more strongly, preventing the model from defaulting to the majority class. All tuning follows the time-consistent train-validation setup.
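One common weighting scheme is inverse class frequency, sketched below. The text does not say which scheme was used, so this particular formula (total count divided by number of classes times class count) is an assumption. The label distribution is synthetic.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n / (k * n_c) for each class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Synthetic labels: 80% balanced, 10% each minority class.
labels = [0] * 80 + [-1] * 10 + [1] * 10
weights = balanced_class_weights(labels)
```

With this distribution a misclassified minority station-day weighs eight times as much as a misclassified balanced one (about 3.33 versus 0.42).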

4.6 Results

Models are evaluated on the test set using Macro-F1 and per-class recall. Accuracy is not emphasized because the data are strongly imbalanced.

Baseline Models

The two baseline models provide reference points. Baseline 0 (always predicting “balanced”) achieves high accuracy but detects none of the exporter (class −1, emptying) or importer (class +1, filling) cases (Macro-F1 ≈ 0.28). Baseline 1 (persistence), which predicts tomorrow’s class as today’s, performs better (Macro-F1 ≈ 0.38) by exploiting simple day-to-day persistence, but still identifies only about 16–19% of exporter/importer station-days.

Multinomial Logistic Regression

The multinomial logistic regression with class weights improves minority detection substantially. It identifies roughly 50–56% of exporter and importer station-days and reaches a Macro-F1 ≈ 0.40. This confirms that lagged net flow, calendar structure, and weather features contain useful predictive signal, but the linear decision boundary limits performance on more complex spatial–temporal patterns.

Gradient-Boosted Trees (CatBoost)

The CatBoost gradient-boosting model is the best-performing approach. On the test set it achieves:

Exporter recall: 0.60
Importer recall: 0.60
Balanced recall: 0.57
Macro-F1: 0.51

Compared to Baseline 0, CatBoost improves Macro-F1 by about 0.23 (0.28 → 0.51) and raises minority-class recall from 0% to 60%. Relative to the persistence baseline, minority recall improves by more than a factor of three (≈0.16–0.19 → 0.60). Compared to logistic regression, CatBoost increases Macro-F1 from roughly 0.40 to 0.51 and improves both exporter and importer recall (from ≈0.50–0.56 to 0.60), reflecting its ability to capture nonlinear spatial–temporal and weather interactions.

Trade-offs and Interpretation

CatBoost attains this gain by predicting minority classes more often, which lowers recall for the balanced class (≈0.57 vs. ≈0.78 for the persistence baseline). This trade-off is appropriate for the task: misclassifying some balanced days as imbalanced has low operational cost, whereas missing exporter or importer days leads directly to empty or full stations. Overall, CatBoost makes the correct trade-off for proactive rebalancing and is the preferred model for net-flow imbalance prediction.

Future Model Refinement

Although CatBoost performs best, several refinements are possible. Additional features, such as improved weather representations or station-level clustering, may strengthen predictions. More targeted hyperparameter tuning and probabilistic calibration could further improve minority-class detection. Exploring alternative gradient-boosting models may also offer performance gains.

4.7 Operational Value

Proactive rebalancing: identify tomorrow's problematic stations.
Fewer outages: reduce empty/full stations and lost rides.
Cost efficiency: target redistribution to where imbalance is predicted, not just observed.

5. Conclusion & Strategic Takeaways

5.1 Main Findings

Mature but seasonal demand: strong, repeatable summer peaks and winter lows; clear stagnation in per-station growth after 2024.
Highly structured usage: stable weekday dominance, time-of-day commuter peaks, and short trip lengths.
Structural net-flow imbalance: a small set of persistent source/sink stations drive most rebalancing needs.
Granular crash risk: most stations are low-risk, but a small number form medium- to very-high-risk clusters; night-time risk is 3-4x daytime risk.
Predictability: both risk and imbalance have strong, stable spatial and temporal structure, enabling reliable forecasting and targeted interventions.

5.2 Proposed Actions

Implement a CitiBike ride-insurance offering: leverage the risk-per-trip measure to design per-ride or membership-based insurance that reduces perceived personal and third-party risk, helping attract risk-averse non-users and increase overall demand.
Stimulate winter demand: use seasonal pricing, promotions, or member campaigns to lift structurally low winter ridership.
Prioritize rebalancing on structural source/sink stations: concentrate redistribution where persistent extreme net flows occur.
Exploit temporal structure for operations: schedule major rebalancing and repairs on weekends (especially Sundays), and use overnight hours for maintenance when demand is minimal.
Targeted safety communication: provide simple in-app warnings for high-risk stations and nighttime periods.

5.3 Value for CitiBike

Higher demand and user acquisition: insurance reduces perceived risk and can convert hesitant non-users, benefiting a maturing system with slowed natural growth.
Greater service reliability: net-flow predictions allow proactive rebalancing and fewer empty/full stations.
More efficient operations: stable temporal patterns support smarter scheduling of rebalancing, repairs, and routing.
Data-driven safety management: station- and time-specific risk tiers highlight where interventions or advocacy can be most effective.

5.4 Value for an Insurance Partner

Granular, interpretable rating factors: EB-smoothed risk-per-trip measures by station and time-of-day support fair, data-driven pricing.
Portfolio insight and segmentation: clear identification of low-risk areas versus concentrated high-risk pockets.
Focused mitigation opportunities: nighttime and hotspot risks offer concrete entry points for joint safety initiatives that improve long-run loss performance.

5.5 Next Steps

Refine the risk measure (severity weights, exposure definitions, potential covariates).
Co-develop an insurance prototype (pricing concepts, coverage design, simple user-facing interface).
Build an interactive dashboard for jointly exploring demand, risk, and imbalance predictions.