Jesper Armouti-Hansen

Selected work

Analysis, modeling, and research code

Each project lists the question it addresses, the method, and the result.

511 contributions in the last year

Languages
  • Python 79%
  • TeX 12%
  • TypeScript 8%
  • JavaScript 1%
  • Stata <1%

Applied analysis & modeling

Data work aimed at a decision: choosing the right measure, building the analysis, and being clear about what it does and doesn't support — plus the search, ranking, and tool-calling code that goes with it.

01 Case study · Operational analysis

CitiBike Demand, Risk, and Net Flow

Station- and trip-level analysis of demand, net flow, and collision-adjusted risk across New York's bike-share network.

Problem
What can CitiBike trip data and NYPD crash records, together, tell an operator and an insurer?
Approach
Three analyses on 2023–2025 trips: demand and usage patterns; a per-trip crash-risk measure by station and time (NYPD crashes over trip exposure, empirical-Bayes smoothed); and a net-flow imbalance classifier for rebalancing.
Result
An interpretable per-trip risk measure an insurer could use as a rating input, demand patterns showing seasonality and per-station stagnation, and stable net-flow patterns a classifier predicts to guide rebalancing.

Python · pandas · feature engineering · risk analysis · prediction

Station-level crash risk (scale: lower to higher)

Each dot is a station; amber flags the highest exposure-adjusted per-trip risk.
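The per-trip risk measure described in this case study (crashes over trip exposure, empirical-Bayes smoothed) can be sketched as a shrinkage estimator. This is a minimal illustration, not the project's code: the function name `eb_smoothed_risk`, the dict-based input layout, and the prior strength `prior_trips` are assumptions, and the toy counts are invented. The prior mean is estimated from the pooled data, which is the empirical-Bayes part; the prior strength is simply fixed here rather than estimated.

```python
def eb_smoothed_risk(crashes, trips, prior_trips=10_000.0):
    """Shrink each station's crashes-per-trip rate toward the pooled rate.

    crashes, trips: dicts keyed by station id (hypothetical input layout).
    prior_trips: assumed prior strength in pseudo-trips (illustrative value).
    """
    total_trips = sum(trips.values())
    if total_trips <= 0:
        raise ValueError("need at least one trip to estimate the pooled rate")

    # Prior mean estimated from the data: the network-wide crash rate.
    pooled_rate = sum(crashes.get(s, 0) for s in trips) / total_trips

    smoothed = {}
    for station, n_trips in trips.items():
        n_crashes = crashes.get(station, 0)
        # Posterior-mean form: observed counts plus prior pseudo-counts.
        smoothed[station] = (n_crashes + pooled_rate * prior_trips) / (
            n_trips + prior_trips
        )
    return smoothed


# Toy counts, invented for illustration only.
crashes = {"A": 2, "B": 10}
trips = {"A": 100, "B": 100_000}
risk = eb_smoothed_risk(crashes, trips)
```

A station with little exposure (A) is pulled strongly toward the pooled rate, while a high-exposure station (B) stays close to its raw rate. That is the point of smoothing before flagging the highest-risk stations.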

02 GitHub project · Retrieval system

RAG Search Engine

A hybrid movie-search engine combining lexical and semantic retrieval, reranking, image search, and RAG answers.

Problem
Plain search fails when a query is vague, visual, or more description than title.
Approach
Combined BM25, embeddings, reciprocal rank fusion, CLIP, reranking, and caching, with a debug trace through the pipeline.
Result
An evaluated pipeline: RRF search scored on precision, recall, and F1 against a golden set, with a debug mode that traces a query through each stage.

Python · RAG · BM25 · embeddings · CLIP · evaluation

Search pipeline

Hybrid retrieval, fused and reranked, then answered with citations.
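The fusion and evaluation steps named in this project can be sketched in a few lines. This is an illustrative version under stated assumptions, not the repository's code: the function names, the constant `k=60` (a commonly used default for reciprocal rank fusion), and the toy document ids are all hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def precision_recall_f1(retrieved, relevant):
    """Score a retrieved list against a golden set of relevant ids."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    return precision, recall, f1


# Toy rankings standing in for a lexical (BM25) and a semantic result list.
lexical = ["a", "b", "c"]
semantic = ["b", "c", "d"]
fused = reciprocal_rank_fusion([lexical, semantic])
metrics = precision_recall_f1(fused[:2], relevant={"b", "d"})
```

Because the fusion uses only ranks, it needs no score normalization between BM25 and embedding similarity, which is a common reason for choosing it in hybrid search.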

03 GitHub project · AI tooling

Minimal Coding Agent

A small Gemini-powered coding agent that reads files, runs scripts, and edits code through explicit tools.

Problem
How much can a coding agent do with only a few explicit tools and no framework?
Approach
Implemented file inspection, script execution, and edits behind explicit, inspectable tool calls.
Result
A working agent that reads, runs, and edits code in a loop, with every tool call explicit.

Python · tool-calling · Gemini API · CLI

Agent run

An example run — the agent finds a failing test and fixes it.
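The idea of explicit, inspectable tool calls can be illustrated with a small registry and dispatcher. This sketch assumes the model's requested call has already been parsed into a plain dict; it makes no Gemini API calls, and the tool names, the `dispatch` function, and the dict layout are hypothetical rather than taken from the project.

```python
import subprocess
import sys
from pathlib import Path


def read_file(path: str) -> str:
    """Return the text content of a file."""
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    """Overwrite a file and report what was written."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def run_python(path: str) -> str:
    """Run a script with the current interpreter and return its output."""
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr


# The full set of actions the agent may take: nothing outside this dict runs.
TOOLS = {"read_file": read_file, "write_file": write_file, "run_python": run_python}


def dispatch(call: dict) -> str:
    """Execute one tool call of the form {"name": ..., "args": {...}}."""
    name = call.get("name")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**call.get("args", {}))
    except Exception as exc:  # report failures back as text for the next turn
        return f"error: {exc}"
```

An agent loop would send each `dispatch` result back to the model as the next message. Keeping the registry a plain dict means the set of allowed actions can be audited at a glance.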

04 In development · Python package

choicekit

A scikit-learn-compatible package for discrete choice modeling on wide-form data, in development.

Problem
Discrete-choice packages expect long-form data and their own fit loops, so they sit outside scikit-learn's cross-validation, pipelines, and tuning.
Approach
Estimators inherit from scikit-learn's BaseEstimator and read wide-form X, y — one row per choice situation, {alt}_{feature} columns — so they drop straight into GridSearchCV, cross-validation, and pipelines.
Result
Intended interface: a ConditionalLogitClassifier you tune with GridSearchCV like any sklearn model. Early-stage and not yet released.

Python · scikit-learn · discrete choice · conditional logit

sklearn-native usage

Intended interface: a choicekit estimator tuned with GridSearchCV on wide-form X, y — one row per choice situation.
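The wide-form layout can be made concrete with a small conditional-logit calculation. This is a sketch of the data convention and the standard choice-probability formula only, not choicekit's implementation or API: the function name, the example alternatives, and the coefficient values are hypothetical.

```python
import math


def choice_probabilities(row, alternatives, coefs):
    """Conditional-logit probabilities for one wide-form choice situation.

    row: dict with one value per "{alt}_{feature}" column.
    coefs: dict mapping each feature name to a coefficient shared across
    alternatives.
    """
    utilities = {
        alt: sum(beta * row[f"{alt}_{feature}"] for feature, beta in coefs.items())
        for alt in alternatives
    }
    # Subtract the maximum utility before exponentiating for numerical stability.
    top = max(utilities.values())
    weights = {alt: math.exp(u - top) for alt, u in utilities.items()}
    total = sum(weights.values())
    return {alt: w / total for alt, w in weights.items()}


# One choice situation in wide form; values invented for illustration.
row = {"bus_cost": 2.0, "bus_time": 30.0, "car_cost": 5.0, "car_time": 15.0}
probs = choice_probabilities(
    row, alternatives=["bus", "car"], coefs={"cost": -0.2, "time": -0.05}
)
```

Because each choice situation is a single row, a train/test split or cross-validation fold never separates the alternatives of one decision. That property is what lets wide-form data work with standard scikit-learn splitting.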

Research & replication

Research code: published replication packages and methods work in economics, tested and reproducible.

01 Published · GEB 2024 · Replication package

Efficiency Wages with Motivated Agents

Replication data and code for a published paper on wages, motivation, and effort.

Problem
Do wage incentives and prosocial motivation reinforce each other, or work through separate channels?
Approach
Built reproducible analyses around experimental data and documented the paper's online appendix.
Result
A peer-reviewed empirical project with a complete replication package.

Python · Stata · experimental data · replication

Chosen effort by wage

Mean chosen effort by offered wage, with 95% confidence bands.
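A figure of this kind can be computed along the following lines. This is a minimal sketch assuming a normal-approximation interval (mean ± 1.96 standard errors) on independent observations; the paper's actual interval method is not stated here, and the function name and input layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def mean_effort_by_wage(observations, z=1.96):
    """Return {wage: (mean, lower, upper)} from (wage, effort) pairs.

    Uses a normal-approximation band: mean +/- z * standard error.
    """
    by_wage = defaultdict(list)
    for wage, effort in observations:
        by_wage[wage].append(effort)

    bands = {}
    for wage, efforts in sorted(by_wage.items()):
        center = mean(efforts)
        if len(efforts) > 1:
            half_width = z * stdev(efforts) / sqrt(len(efforts))
        else:
            half_width = float("nan")  # no spread estimate from one observation
        bands[wage] = (center, center - half_width, center + half_width)
    return bands
```

With repeated choices per participant, standard errors clustered at the participant level would usually be more appropriate than this independent-observation version.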

02 Manuscript + replication package · Inference & simulation

The Informativeness of Frequency-Report Scoring Rules

Manuscript, simulations, and replication code for belief elicitation from frequency reports.

Problem
A count report is observable; the belief behind it is not. How much does the report actually pin down?
Approach
Characterized the identified set of beliefs behind each report under three scoring rules, then checked the bounds with simulation.
Result
No rule dominates: squared-distance gives the sharpest bounds when beliefs concentrate on a few categories, frequency-guessing when they're spread out, and Manhattan distance rarely wins but holds up across regimes.

Python · simulation · pytest · scoring rules

Sharpest-bound win share

Share of cases where each scoring rule gives the sharpest belief bounds, by belief concentration α.

03 GitHub project · Model comparison

Economic Theories and Machine Learning

Code comparing economic theories against machine-learning benchmarks on behavioral data.

Problem
When a theory predicts behavior, how much of the predictable variation does it actually capture?
Approach
Benchmarked theory-based specifications against machine-learning models and examined the remaining predictive gap.
Result
Theory specifications scored on out-of-sample prediction and compared against the machine-learning benchmark.

Python · machine learning · model evaluation

Completeness by model heterogeneity

Completeness climbs as the model captures more individual heterogeneity — a single type is far from enough, and even a handful of latent types don't exhaust it.
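One common way to define completeness is the share of the achievable error reduction that the theory delivers, measured between a naive baseline and the machine-learning benchmark. The sketch below assumes that definition; the project's exact formula is not given here, and the function and argument names are hypothetical.

```python
def completeness(naive_error, theory_error, benchmark_error):
    """Share of the predictable variation captured by a theory.

    naive_error: out-of-sample error of an uninformed baseline.
    theory_error: out-of-sample error of the theory-based specification.
    benchmark_error: out-of-sample error of the machine-learning benchmark.
    """
    reducible = naive_error - benchmark_error
    if reducible <= 0:
        raise ValueError("benchmark must improve on the naive baseline")
    return (naive_error - theory_error) / reducible
```

A value of 0 means the theory does no better than the baseline, and 1 means it matches the benchmark. For example, errors of 1.0 (naive), 0.7 (theory), and 0.5 (benchmark) give a completeness of about 0.6.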

Also on GitHub

Smaller builds from learning new tools — a notes site, a static-site generator, and a few games and CLIs.

  • Personal Knowledge System
  • Static Site Generator
  • BookBot
  • Asteroids
  • Maze Solver