Undergraduate Research Project

Regime Interpretability

Can sparse autoencoders make learned market regime embeddings easier to interpret?

Andrew Stewart · UC San Diego · exploratory ML interpretability project

Motivation

Market regimes are useful, but often hard to explain.

Dense embeddings

Autoencoders can compress rolling market windows, but dense latent dimensions are not naturally human-readable.

Regime labels

Cluster or HMM states can be convenient, but a state number alone does not explain what changed in the market.

Research goal

Test whether sparse features give a cleaner inspection layer over learned regime embeddings.

Scope

This is interpretability research, not a claim of production trading performance.

Data Setup

Rolling multi-asset return windows

Universe

SPY QQQ IWM XLF XLE XLK TLT IEF HYG GLD SLV DBA UUP ^VIX EEM

Windowing

Daily prices are converted to log returns, then sliced into standardized 20-day rolling windows.
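A minimal sketch of the windowing step, using NumPy on synthetic prices. The function name `make_windows` is hypothetical, and the standardization scheme (z-scoring each asset within its own window) is an assumption; the project may instead standardize with training-set statistics.

```python
import numpy as np

def make_windows(prices, window=20):
    """Turn a (T, n_assets) price array into flattened, standardized return windows."""
    log_ret = np.diff(np.log(prices), axis=0)            # (T - 1, n_assets)
    n = log_ret.shape[0] - window + 1
    # Stack overlapping windows: (n, window, n_assets)
    windows = np.stack([log_ret[i:i + window] for i in range(n)])
    # ASSUMPTION: z-score each asset within each window.
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True) + 1e-8       # avoid division by zero
    return ((windows - mu) / sd).reshape(n, -1)          # (n, window * n_assets)

# Synthetic random-walk prices standing in for the 15-ticker universe.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(60, 15)), axis=0))
X = make_windows(prices)
print(X.shape)  # (40, 300)
```

With 20 days and 15 tickers, each flattened window has 20 x 15 = 300 values.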

Sample

Current config covers 2015-10-01 through 2025-09-30.

External checks

VIX, VIX change, yield spread proxy, SPY momentum, and cross-asset correlation.

Dense Autoencoder

A compact representation of each market window

Input

Flattened 20-day return windows across 15 tickers, matching the configured 300-dimensional input.

Encoder

MLP hidden layers compress each window into a 32-dimensional latent embedding.

Training signal

Reconstruct the original standardized window using mean squared error.
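To make the shapes concrete, here is a forward-pass-only sketch of a 300 -> 32 -> 300 MLP autoencoder with an MSE reconstruction loss. It uses NumPy rather than a deep-learning framework so it runs anywhere; the hidden width of 128, the ReLU activations, and the helper names `init_mlp` and `mlp` are all assumptions, and training is omitted.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights for a stack of dense layers with the given widths."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply dense layers, with ReLU on every layer except the last."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
encoder = init_mlp([300, 128, 32], rng)   # ASSUMPTION: one hidden layer of width 128
decoder = init_mlp([32, 128, 300], rng)

x = rng.normal(size=(4, 300))             # four synthetic standardized windows
z = mlp(x, encoder)                       # 32-dimensional latent embeddings
x_hat = mlp(z, decoder)                   # reconstructed windows
mse = float(np.mean((x_hat - x) ** 2))    # reconstruction loss to minimize
print(z.shape, x_hat.shape)
```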

Output artifact

results/embeddings.pt stores full, train, validation, and test embeddings.

Sparse Autoencoder

TopK sparse features over dense embeddings

Input

Frozen 32-dimensional dense embeddings.

Dictionary

128 sparse features, an overcomplete representation.

Activation

TopK keeps 8 active features per sample.

dense embedding -> sparse code (TopK active features) -> reconstructed embedding
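The dense embedding -> sparse code -> reconstruction path can be sketched as a single forward pass. This is an illustration under assumptions, not the project's code: the function name `topk_sae_forward`, the ReLU applied to the kept values, and the random weights are all hypothetical. Because of the ReLU, each sample has at most 8 nonzero features.

```python
import numpy as np

def topk_sae_forward(z, W_enc, b_enc, W_dec, b_dec, k=8):
    """Encode embeddings, keep the k largest pre-activations per row, decode."""
    pre = z @ W_enc + b_enc                              # (n, n_features)
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]       # indices of the k largest
    rows = np.arange(pre.shape[0])[:, None]
    code = np.zeros_like(pre)
    # ASSUMPTION: ReLU on the kept values, so active features are nonnegative.
    code[rows, idx] = np.maximum(pre[rows, idx], 0.0)
    recon = code @ W_dec + b_dec                         # (n, d_latent)
    return code, recon

rng = np.random.default_rng(0)
d_latent, n_features, k = 32, 128, 8
W_enc = rng.normal(0.0, 1.0 / np.sqrt(d_latent), (d_latent, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(0.0, 1.0 / np.sqrt(n_features), (n_features, d_latent))
b_dec = np.zeros(d_latent)

z = rng.normal(size=(5, d_latent))                       # stand-in frozen embeddings
code, recon = topk_sae_forward(z, W_enc, b_enc, W_dec, b_dec, k=k)
print(np.count_nonzero(code, axis=1))                    # active features per sample
```

Training would minimize the squared error between `recon` and `z` while the dense encoder stays frozen.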

Pipeline

End-to-end artifact flow

1. Market data: yfinance prices and indicators

2. Windows: 20-day standardized returns

3. Dense AE: 32-dimensional embeddings

4. Sparse AE: 128-feature TopK dictionary

5. Interpret: correlations, labels, heatmaps

6. Check: baselines, ablations, backtest

Interpretability Checks

Feature meaning is tested, not assumed

Indicator correlations

Compare feature activations against market indicators and assign heuristic labels only when correlation is strong enough.
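One way to sketch the thresholded labeling: compute the Pearson correlation between every sparse feature and every indicator, then label a feature only if its strongest correlation clears a cutoff. The function name `label_features`, the indicator names, and the 0.3 threshold are hypothetical choices for illustration; the deck does not state the actual cutoff.

```python
import numpy as np

def label_features(acts, indicators, names, threshold=0.3):
    """Label each feature with its most correlated indicator, if strong enough."""
    n = acts.shape[0]
    a = acts - acts.mean(axis=0)
    b = indicators - indicators.mean(axis=0)
    denom = n * np.outer(a.std(axis=0), b.std(axis=0))
    safe = np.where(denom > 0, denom, np.inf)    # dead features get correlation 0
    corr = (a.T @ b) / safe                      # (n_features, n_indicators)
    labels = {}
    for j in range(corr.shape[0]):
        best = int(np.abs(corr[j]).argmax())
        if abs(corr[j, best]) >= threshold:      # hypothetical cutoff
            sign = "+" if corr[j, best] > 0 else "-"
            labels[j] = sign + names[best]
    return labels, corr

# Synthetic check: feature 0 tracks the first indicator, feature 2 never fires.
rng = np.random.default_rng(0)
indicators = rng.normal(size=(500, 2))
acts = np.zeros((500, 3))
acts[:, 0] = indicators[:, 0] + 0.1 * rng.normal(size=500)
acts[:, 1] = rng.normal(size=500)
labels, corr = label_features(acts, indicators, ["vix", "spy_momentum"])
print(labels)
```

Features that never clear the threshold stay unlabeled, which keeps weak associations from being read as meaning.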

Event heatmaps

Inspect average feature activity across selected market stress windows.

UMAP views

Visualize dense embeddings colored by VIX tercile, calendar year, or dominant sparse feature.

Backtest sanity check

Use regime-conditioned SPY timing only as a downstream plausibility check.

Baselines and Ablations

Keep the sparse model in context

HMM baseline

Gaussian hidden Markov models with several state counts.

K-means baseline

PCA reduction followed by K-means regime assignments.
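A self-contained sketch of this baseline, written in NumPy on synthetic data. The helper names, the number of components, and the deterministic farthest-first initialization are assumptions made for reproducibility; the project may well use a library implementation with a different initialization.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centered data onto its top principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt: directions
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:                 # keep old center if cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated synthetic "regimes" in a 6-dimensional feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 6)), rng.normal(8.0, 1.0, (50, 6))])
Z = pca_reduce(X, n_components=2)
labels, centers = kmeans(Z, k=2)
print(np.bincount(labels))
```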

Ablations

Sweeps over latent dimension, TopK k, dictionary size, window size, and sparsity penalty.

Transition timing

Compare detected regime changes around VIX spike events.
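A possible way to quantify transition timing is the signed lag between each event day and the nearest detected regime change. The function names and the sign convention below are hypothetical, and how VIX spike days are defined is left open, so the event days are passed in directly.

```python
import numpy as np

def regime_change_days(states):
    """Indices where the regime label differs from the previous day."""
    states = np.asarray(states)
    return np.flatnonzero(states[1:] != states[:-1]) + 1

def lags_to_nearest_change(event_days, change_days):
    """Signed lag (change day minus event day) to the nearest regime change.

    Negative means the regime changed before the event; positive means after.
    """
    change_days = np.asarray(change_days)
    if change_days.size == 0:
        raise ValueError("no regime changes detected")
    lags = []
    for day in event_days:
        diffs = change_days - day
        lags.append(int(diffs[np.abs(diffs).argmin()]))
    return lags

states = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]      # toy regime sequence
changes = regime_change_days(states)          # changes on days 3 and 7
lags = lags_to_nearest_change([4, 6], changes)
print(changes.tolist(), lags)                 # [3, 7] [-1, 1]
```

Comparing the lag distributions of the sparse model, the HMM, and K-means would show which method tends to switch regimes earlier around stress events.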

Outputs

Figures appear after running the pipeline

UMAP / feature plot
Generated as results/umap_*.png

Event feature heatmap
Generated as results/event_feature_heatmap.png

Training curves
Generated as results/*_training_curves.png

Equity curve sanity check
Generated as results/equity_curve_*.png

Limitations and Future Work

Useful as a research probe, not a trading product

Limitations

Yahoo Finance data dependence.
Heuristic feature labels.
Lightly tuned baselines.
Backtest is a sanity check.

Future work

Add pinned sample outputs.
Improve experiment tracking.
Test simpler PCA/factor baselines.
Use calendar-aware event definitions.