UC San Diego undergraduate research project

Can sparse autoencoders make learned market regime embeddings easier to interpret?

This exploratory project turns rolling multi-asset return windows into dense autoencoder embeddings, then trains a sparse autoencoder over those embeddings. The goal is not to build a trading strategy, but to test whether sparse features are easier to relate to market indicators, heuristic regime labels, and simple baselines, and to sanity-check with a lightweight backtest.

Research Question

Market regime models can compress useful structure while hiding the reasons behind each representation. This project asks whether an overcomplete sparse dictionary can expose a smaller set of active features that line up with familiar market stress, momentum, rate, and cross-asset behavior.

What is learned?

A dense autoencoder compresses flattened 20-day windows of standardized returns into 32-dimensional latent embeddings.
A sparse autoencoder reconstructs those embeddings using a 128-feature dictionary with TopK activation.
Sparse features are compared with VIX, yield spread, SPY momentum, cross-asset correlation, HMM/K-means labels, ablations, and a simple backtest.
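The two learned stages above can be sketched in PyTorch roughly as follows. This is a minimal illustration of a TopK sparse autoencoder over 32-dimensional dense embeddings, not the repo's actual code; class and variable names are assumptions.

```python
import torch
import torch.nn as nn

class TopKSparseAutoencoder(nn.Module):
    """Reconstructs 32-dim dense embeddings from a 128-feature dictionary,
    keeping only the top-k activations per sample (illustrative sketch)."""
    def __init__(self, embed_dim=32, dict_size=128, k=8):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(embed_dim, dict_size)
        self.decoder = nn.Linear(dict_size, embed_dim, bias=False)

    def forward(self, z):
        acts = torch.relu(self.encoder(z))       # (batch, 128) candidate features
        topk = torch.topk(acts, self.k, dim=-1)  # keep the 8 largest per sample
        sparse = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
        return self.decoder(sparse), sparse

sae = TopKSparseAutoencoder()
recon, feats = sae(torch.randn(4, 32))  # at most 8 nonzero features per row
```

The hard TopK mask enforces sparsity directly, so no L1 penalty term is needed during training.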
Why This Matters
A regime representation is more useful for research review if a human can inspect what it responds to. The repo treats economic interpretation as a sanity-check layer around learned features, not as evidence of alpha generation or production trading performance.
Methodology
The implementation keeps the modeling stack intentionally small: first learn a compact dense embedding of market windows, then decompose those embeddings into sparse activations that are easier to inspect feature-by-feature.
Download daily market prices with yfinance, compute log returns, standardize rolling windows, and split chronologically into train, validation, and test sets.
Train an MLP autoencoder on flattened windows and save frozen latent embeddings for the full sample and each split.
Train a TopK sparse autoencoder on the dense embeddings, then inspect feature activations, dominant features, and dictionary outputs.
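The data-preparation step can be sketched as below. This is a hedged NumPy illustration, assuming per-window standardization and a fixed 70/15/15 chronological split; the repo's actual standardization scheme and split ratios live in config.yaml.

```python
import numpy as np

def make_windows(returns, window=20):
    """Slice a (T, n_assets) log-return matrix into standardized, flattened
    20-day windows. Per-window z-scoring here is an assumption."""
    T, n = returns.shape
    windows = np.stack([returns[t:t + window] for t in range(T - window + 1)])
    mu = windows.mean(axis=(1, 2), keepdims=True)
    sd = windows.std(axis=(1, 2), keepdims=True) + 1e-8
    return ((windows - mu) / sd).reshape(len(windows), -1)  # (N, window * n)

def chrono_split(X, train=0.7, val=0.15):
    """Chronological (no-shuffle) split to avoid look-ahead leakage."""
    i = int(len(X) * train)
    j = int(len(X) * (train + val))
    return X[:i], X[i:j], X[j:]

X = make_windows(np.random.randn(500, 15))  # 500 days, 15 symbols
tr, va, te = chrono_split(X)
```

Splitting chronologically rather than randomly matters here: shuffled splits would let overlapping windows leak test-period information into training.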
Model Pipeline
The project flow is designed to keep artifacts inspectable at each stage, from processed tensors to feature-label tables and generated figures.
Dataset / Market Universe
The configured universe mixes equity, sector, bond, credit, commodity, dollar, emerging-market, and volatility-linked series. External indicators are used for interpretation checks rather than as ground-truth labels.
The universe comprises the 15 symbols currently listed in config.yaml.
| Setting | Value |
|---|---|
| Sample range | 2015-10-01 through 2025-09-30 in the current config. |
| Window size | 20 trading days per flattened return window. |
| Dense latent size | 32 dimensions after the dense autoencoder encoder. |
| Sparse dictionary | 128 features with TopK 8 active features per sample. |
| External checks | VIX level, VIX 5-day change, yield spread proxy, SPY 20-day momentum, and rolling cross-asset correlation. |
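The settings above map onto a config.yaml of roughly this shape. The key names and nesting here are illustrative guesses, not the repo's actual schema; only the values come from the table.

```yaml
data:
  start: "2015-10-01"
  end: "2025-09-30"
  window: 20            # trading days per flattened window
model:
  latent_dim: 32        # dense autoencoder bottleneck
  dict_size: 128        # sparse dictionary features
  topk: 8               # active features per sample
indicators:             # external interpretation checks
  - vix_level
  - vix_change_5d
  - yield_spread
  - spy_momentum_20d
  - cross_asset_corr
```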
Interpretability Checks
The analysis scripts produce feature-level tables and plots that make it easier to ask whether a sparse feature corresponds to a recognizable market condition.
Correlate sparse feature activations with external indicators, then assign heuristic labels only when the correlation passes the configured threshold.
Aggregate feature activation across selected stress windows such as the COVID crash, 2022 hiking cycle, and detected drawdown periods.
Compare sparse dominant-feature transitions with Gaussian HMM and PCA plus K-means regime labels.
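The first check, correlation-gated labeling, can be sketched in pandas as follows. The 0.4 threshold, column names, and toy data are assumptions for illustration, not the configured values.

```python
import numpy as np
import pandas as pd

def label_features(activations, indicators, threshold=0.4):
    """For each sparse feature, find its most-correlated external indicator
    and assign that indicator's name as a heuristic label only when |corr|
    clears the threshold. (Threshold and names are illustrative.)"""
    labels = {}
    for feat in activations.columns:
        corr = indicators.corrwith(activations[feat]).abs()
        best = corr.idxmax()
        labels[feat] = best if corr[best] >= threshold else "unlabeled"
    return pd.Series(labels)

# Toy example: f0 tracks a VIX-like series, f1 is pure noise.
rng = np.random.default_rng(0)
ind = pd.DataFrame(rng.normal(size=(250, 2)),
                   columns=["vix_level", "spy_momentum"])
acts = pd.DataFrame({"f0": ind["vix_level"] + 0.1 * rng.normal(size=250),
                     "f1": rng.normal(size=250)})
labels = label_features(acts, ind)
```

The gating step is what keeps labels honest: features that correlate with nothing stay "unlabeled" rather than receiving a forced economic name.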
Outputs / Artifacts
Generated artifacts are intentionally left out of the repository except for the empty results/ placeholder. Running the scripts locally populates the directory.
data/processed/window_data.pt, split tensors, price panels, returns, and external indicator panels.
Autoencoder checkpoints, dense embeddings, sparse checkpoints, dictionary weights, and sparse outputs.
Feature activations, feature-indicator correlations, heuristic labels, event heatmaps, and summary JSON files.
Training curves, UMAP views, feature heatmaps, baseline summaries, ablation tables, and equity-curve plots.
Limitations
The project is best read as an exploratory interpretability study. It is not a claim that sparse autoencoders discover durable trading signals.
The pipeline depends on Yahoo Finance availability, symbol coverage, and adjusted-price consistency through yfinance.
Feature labels come from correlations with selected indicators, so they are useful prompts for inspection rather than verified economic definitions.
HMM and K-means comparisons are intentionally lightweight and not heavily tuned.
The backtest is a sanity check with simplifying assumptions, not a production trading strategy or alpha claim.
How to Reproduce
The scripts are command-line friendly and use config.yaml for the main experimental settings.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python data/download.py --config config.yaml
python train/train_ae.py --config config.yaml
python train/train_sparse.py --config config.yaml
python analysis/interpretability.py --config config.yaml
python analysis/baselines.py --config config.yaml
python analysis/backtest.py --config config.yaml
python analysis/ablation.py --config config.yaml
The generated backtest metrics should be interpreted as a downstream plausibility check. They are not evidence of a deployable trading system.