UC San Diego undergraduate research project

Regime Interpretability

Can sparse autoencoders make learned market regime embeddings easier to interpret?

This exploratory project turns rolling multi-asset return windows into dense autoencoder embeddings, then trains a sparse autoencoder over those embeddings. The goal is not to build a trading strategy, but to test whether sparse features can be checked more directly against market indicators, heuristic regime labels, simple baselines, and a sanity-check backtest.

Research Question

Making regime embeddings inspectable

Market regime models can compress useful structure while hiding the reasons behind each representation. This project asks whether an overcomplete sparse dictionary can expose a smaller set of active features that line up with familiar market stress, momentum, rate, and cross-asset behavior.

What is learned?

A dense autoencoder compresses flattened 20-day windows of standardized returns into 32-dimensional latent embeddings.
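A minimal PyTorch sketch of this stage, assuming a simple two-layer encoder and decoder with a hidden width of 128; those architectural details are illustrative, while the 20-day window, 15-asset universe, and 32-dimensional latent come from the project configuration.

```python
import torch
import torch.nn as nn

WINDOW = 20      # trading days per window
N_ASSETS = 15    # configured universe size
LATENT = 32      # dense embedding dimension

class DenseAutoencoder(nn.Module):
    """MLP autoencoder over flattened, standardized return windows."""
    def __init__(self, in_dim: int = WINDOW * N_ASSETS, latent: int = LATENT):
        super().__init__()
        # Hidden width of 128 is an assumption, not the repo's exact setting.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 128), nn.ReLU(),
            nn.Linear(128, in_dim),
        )

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)
        return self.decoder(z), z

# One reconstruction-loss training step on a random stand-in batch.
model = DenseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randn(64, WINDOW * N_ASSETS)
recon, z = model(batch)
loss = nn.functional.mse_loss(recon, batch)
loss.backward()
opt.step()
print(z.shape)  # torch.Size([64, 32])
```

After training, the encoder output `z` is frozen and saved as the embedding that the sparse stage consumes.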

What is interpreted?

A sparse autoencoder reconstructs those embeddings using a 128-feature dictionary with TopK activation.
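The TopK mechanism can be sketched as below. The linear encoder/decoder layout is an assumption, while the 128-feature dictionary and 8 active features match the stated configuration; because the ReLU can zero some of the top entries, "at most 8 nonzero" is the precise guarantee.

```python
import torch
import torch.nn as nn

LATENT = 32  # dense embedding dimension
DICT = 128   # overcomplete dictionary size
K = 8        # active features kept per sample

class TopKSAE(nn.Module):
    """Sparse autoencoder keeping only the K largest activations."""
    def __init__(self, d_in: int = LATENT, d_dict: int = DICT, k: int = K):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_in, d_dict)
        self.dec = nn.Linear(d_dict, d_in, bias=False)  # dictionary weights

    def forward(self, x: torch.Tensor):
        acts = torch.relu(self.enc(x))
        # Zero everything except the top-k activations in each sample.
        topk = torch.topk(acts, self.k, dim=-1)
        sparse = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
        return self.dec(sparse), sparse

sae = TopKSAE()
emb = torch.randn(4, LATENT)          # stand-in dense embeddings
recon, codes = sae(emb)
print((codes != 0).sum(dim=-1))       # at most 8 nonzero features per sample
```

The nonzero entries of `codes` are the per-sample feature activations that the interpretability checks inspect.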

What is checked?

Sparse features are compared with VIX, yield spread, SPY momentum, cross-asset correlation, HMM/K-means labels, ablations, and a simple backtest.

Why This Matters

Interpretability before trading claims

A regime representation is more useful for research review if a human can inspect what it responds to. The repo treats economic interpretation as a sanity-check layer around learned features, not as evidence of alpha generation or production trading performance.

Research value

  • Separate learned representation quality from economic storytelling.
  • Compare sparse feature activity against external market indicators.
  • Use simple baselines so the sparse model is not evaluated in isolation.

Scope control

  • Backtest outputs are downstream sanity checks, not a deployable strategy.
  • Feature labels are heuristic and correlation-based.
  • Generated results should be reproduced by rerunning the pipeline before they are quoted.

Methodology

Two-stage representation learning

The implementation keeps the modeling stack intentionally small: first learn a compact dense embedding of market windows, then decompose those embeddings into sparse activations that are easier to inspect feature-by-feature.

1. Window construction

Download daily market prices with yfinance, compute log returns, standardize rolling windows, and split chronologically into train, validation, and test sets.
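A runnable sketch of this step, with synthetic prices standing in for the yfinance download so it works offline. The 70/15/15 split fractions, the three-ticker panel, and the whole-sample z-scoring are illustrative assumptions; the key property preserved is the strictly chronological split.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded adjusted-price panel.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-10-01", periods=300)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, (300, 3)), axis=0)),
    index=dates, columns=["SPY", "TLT", "GLD"],
)

# Daily log returns, then per-column standardization.
log_ret = np.log(prices / prices.shift(1)).dropna()
std_ret = (log_ret - log_ret.mean()) / log_ret.std()

# Slice into flattened 20-day rolling windows.
WINDOW = 20
windows = np.stack([
    std_ret.values[i:i + WINDOW].ravel()
    for i in range(len(std_ret) - WINDOW + 1)
])

# Chronological split: no shuffling, so test data stays strictly later.
n = len(windows)
train, val, test = np.split(windows, [int(0.7 * n), int(0.85 * n)])
print(windows.shape, train.shape[0], val.shape[0], test.shape[0])
```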

2. Dense autoencoder

Train an MLP autoencoder on flattened windows and save frozen latent embeddings for the full sample and each split.

3. Sparse autoencoder

Train a TopK sparse autoencoder on the dense embeddings, then inspect feature activations, dominant features, and dictionary outputs.

Model Pipeline

From returns to interpretable checks

The project flow is designed to keep artifacts inspectable at each stage, from processed tensors to feature-label tables and generated figures.

1. Market data: ETF and indicator prices from 2015-10-01 to 2025-09-30.
2. Rolling windows: standardized 20-day log-return windows across the configured universe.
3. Dense AE: an MLP autoencoder compresses each window into a 32-dimensional embedding.
4. Sparse AE: an overcomplete 128-feature dictionary with TopK 8 active features.
5. Interpretation: correlations, event heatmaps, UMAP views, and heuristic labels.
6. Comparisons: HMM, K-means, ablations, and a backtest sanity check.

Dataset / Market Universe

Multi-asset context, small enough to audit

The configured universe mixes equity, sector, bond, credit, commodity, dollar, emerging-market, and volatility-linked series. External indicators are used for interpretation checks rather than as ground-truth labels.

Configured tickers

These are the 15 symbols currently listed in config.yaml.

SPY, QQQ, IWM, XLF, XLE, XLK, TLT, IEF, HYG, GLD, SLV, DBA, UUP, ^VIX, EEM
Sample range: 2015-10-01 through 2025-09-30 in the current config.
Window size: 20 trading days per flattened return window.
Dense latent size: 32 dimensions after the dense autoencoder encoder.
Sparse dictionary: 128 features with TopK 8 active features per sample.
External checks: VIX level, VIX 5-day change, yield spread proxy, SPY 20-day momentum, and rolling cross-asset correlation.

Interpretability Checks

Feature meaning is treated as evidence to inspect

The analysis scripts produce feature-level tables and plots that make it easier to ask whether a sparse feature corresponds to a recognizable market condition.

Indicator correlations

Correlate sparse feature activations with external indicators, then assign heuristic labels only when the correlation passes the configured threshold.
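This thresholded labeling rule can be sketched as follows. The 0.4 threshold, the label format, and the random stand-in data are hypothetical; the repo reads its actual threshold from config.yaml.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N_FEATURES = 500, 128
activations = np.abs(rng.normal(size=(T, N_FEATURES)))  # stand-in feature activity
vix = rng.normal(size=T)                                # stand-in indicator series

THRESHOLD = 0.4  # illustrative; the configured value lives in config.yaml

def heuristic_label(feature, indicator, name, threshold=THRESHOLD):
    """Label a feature after an indicator only if |corr| clears the threshold."""
    r = np.corrcoef(feature, indicator)[0, 1]
    if abs(r) >= threshold:
        sign = "pro" if r > 0 else "anti"
        return f"{sign}-{name} (r={r:.2f})"
    return None  # below threshold: leave the feature unlabeled

labels = {
    i: lab
    for i in range(N_FEATURES)
    if (lab := heuristic_label(activations[:, i], vix, "VIX")) is not None
}
# With random data, most (often all) features stay unlabeled.
print(len(labels))
```

Returning `None` below the threshold is what keeps the labels "prompts for inspection" rather than forced economic names.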

Event heatmaps

Aggregate feature activation across selected stress windows such as the COVID crash, 2022 hiking cycle, and detected drawdown periods.
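The aggregation amounts to a per-event mean over a date slice, sketched here with random activations; the event names and date ranges are illustrative, not the repo's exact spans.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
dates = pd.bdate_range("2019-01-01", periods=600)
acts = pd.DataFrame(rng.random((600, 5)), index=dates,
                    columns=[f"feat_{i}" for i in range(5)])

# Named stress windows (dates are illustrative placeholders).
events = {
    "covid_crash": ("2020-02-19", "2020-03-23"),
    "calm_2019": ("2019-06-01", "2019-08-31"),
}

# One heatmap row per event: mean activation of each feature in the window.
heatmap = pd.DataFrame({
    name: acts.loc[start:end].mean()
    for name, (start, end) in events.items()
}).T
print(heatmap.shape)  # (2, 5)
```

Rows that light up only inside stress windows are the candidates worth inspecting by hand.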

Baseline comparisons

Compare sparse dominant-feature transitions with Gaussian HMM and PCA plus K-means regime labels.
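Only the PCA-plus-K-means half of the baseline pair is sketched below (the Gaussian HMM side is omitted); the component count, cluster count, and random stand-in embeddings are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
embeddings = rng.normal(size=(400, 32))  # stand-in dense embeddings

# Reduce to a few components, then cluster into candidate regimes.
pca = PCA(n_components=5, random_state=0)
reduced = pca.fit_transform(embeddings)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
regimes = kmeans.fit_predict(reduced)

# Regime transition count, comparable to sparse dominant-feature switches.
transitions = int((np.diff(regimes) != 0).sum())
print(sorted(set(regimes)), transitions)
```

Comparing where these label sequences switch against where the sparse model's dominant feature switches is the lightweight check the repo describes.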

Outputs / Artifacts

Files generated by the pipeline

Generated artifacts are intentionally left out of the repository except for the empty results/ placeholder. Running the scripts locally populates the directory.

Processed data

data/processed/window_data.pt, split tensors, price panels, returns, and external indicator panels.

Model artifacts

Autoencoder checkpoints, dense embeddings, sparse checkpoints, dictionary weights, and sparse outputs.

Interpretability tables

Feature activations, feature-indicator correlations, heuristic labels, event heatmaps, and summary JSON files.

Figures and sanity checks

Training curves, UMAP views, feature heatmaps, baseline summaries, ablation tables, and equity-curve plots.

Limitations

Honest caveats

The project is best read as an exploratory interpretability study. It is not a claim that sparse autoencoders discover durable trading signals.

Data dependence

The pipeline depends on Yahoo Finance availability, symbol coverage, and adjusted-price consistency through yfinance.

Heuristic labels

Feature labels come from correlations with selected indicators, so they are useful prompts for inspection rather than verified economic definitions.

Simple baselines

HMM and K-means comparisons are intentionally lightweight and not heavily tuned.

Backtest scope

The backtest is a sanity check with simplifying assumptions, not a production trading strategy or alpha claim.

How to Reproduce

Run the full local pipeline

The scripts are command-line friendly and use config.yaml for the main experimental settings.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python data/download.py --config config.yaml
python train/train_ae.py --config config.yaml
python train/train_sparse.py --config config.yaml
python analysis/interpretability.py --config config.yaml
python analysis/baselines.py --config config.yaml
python analysis/backtest.py --config config.yaml
python analysis/ablation.py --config config.yaml

The generated backtest metrics should be interpreted as a downstream plausibility check. They are not evidence of a deployable trading system.