reg·change·engine
Vol. 1, No. 24 · Source: FR API
Issue 2026-04-24 · Pipeline live · Sample data shown below

What changed in U.S. federal regulations today.

A daily, word-level diff of every paragraph published in the Federal Register, classified by regulatory domain. Compliance work usually waits for someone to read 200 pages of XML; this reads them for you and tells you what is actually new.

Built by Andrew Stewart · Python, FastAPI, PostgreSQL, DistilRoBERTa

Today, in figures
2,311
Paragraphs read from today's bulk XML.
328 / 14.2%
Differed from the previous business day.
93.7%
Mean classifier confidence across the changed paragraphs.
The method

Three steps, one daily output.

The Federal Register is the U.S. government's daily journal of every new and proposed rule. Each business day it publishes thousands of paragraphs of regulatory text. This pipeline does three things with that text.

First, it downloads today's bulk XML and yesterday's. Second, it diffs them at the word level with difflib.SequenceMatcher, surfacing the exact tokens added, removed, or modified inside each paragraph. Third, it classifies every changed paragraph into one of eleven regulatory domains via a zero-shot DistilRoBERTa NLI model, so the feed can be filtered to whatever a given user actually cares about.
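As a rough illustration of the second step, the sketch below runs difflib.SequenceMatcher over whitespace-split words and keeps only the non-equal opcodes. The function name word_diff, the whitespace tokenisation, and the autojunk=False setting are assumptions made for this example, not details confirmed by the pipeline.

```python
from difflib import SequenceMatcher


def word_diff(old: str, new: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-equal opcodes between two paragraphs, word by word."""
    # Assumption: a "word" is any whitespace-delimited token.
    old_words, new_words = old.split(), new.split()
    # autojunk=False turns off difflib's popularity heuristic, which can
    # ignore very frequent tokens in sequences of 200 or more items.
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete', or 'insert'
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


yesterday = "Reports are due within 30 days of the incident."
today = "Reports are due within 45 days of the material incident."
for change in word_diff(yesterday, today):
    print(change)
# ('replace', ['30'], ['45'])
# ('insert', [], ['material'])
```

Each tuple carries the opcode tag plus the old and new word slices, which is enough to highlight the exact tokens that moved.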

Records land in PostgreSQL with full provenance (document number, agency, publication date, paragraph hash) and are served through a small FastAPI surface. Most regulatory monitors tell you that a document changed. This one tells you which words.
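A minimal sketch of what such a record and an idempotent write could look like. SQLite's in-memory database stands in for PostgreSQL so the example runs without a server; the ChangeRecord class, the column names, the whitespace normalisation before hashing, and the (doc_number, para_hash) conflict key are all assumptions for illustration. The ON CONFLICT ... DO UPDATE clause has the same shape in PostgreSQL, though a driver such as psycopg would use %s placeholders rather than ?.

```python
import hashlib
import sqlite3
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ChangeRecord:  # hypothetical record shape
    doc_number: str
    agency: str
    pub_date: str   # ISO date, e.g. "2026-04-24"
    para_hash: str  # SHA-256 hex digest of the paragraph text
    diff_type: str  # "added", "removed", or "modified"


def paragraph_hash(text: str) -> str:
    # Assumption: collapse runs of whitespace so reflowed XML hashes the same.
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


UPSERT = """
INSERT INTO changed_paragraphs (doc_number, agency, pub_date, para_hash, diff_type)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (doc_number, para_hash) DO UPDATE SET
    agency = excluded.agency,
    pub_date = excluded.pub_date,
    diff_type = excluded.diff_type
"""

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.execute("""
    CREATE TABLE changed_paragraphs (
        doc_number TEXT NOT NULL,
        agency     TEXT NOT NULL,
        pub_date   TEXT NOT NULL,
        para_hash  TEXT NOT NULL,
        diff_type  TEXT NOT NULL,
        PRIMARY KEY (doc_number, para_hash)
    )
""")
conn.execute("CREATE INDEX idx_changed_pub_date ON changed_paragraphs (pub_date)")

record = ChangeRecord(
    "2026-08874", "SEC", "2026-04-24",
    paragraph_hash("Registrants must disclose material cybersecurity incidents."),
    "added",
)
conn.execute(UPSERT, astuple(record))
conn.execute(UPSERT, astuple(record))  # writing the same record twice keeps one row
row_count = conn.execute("SELECT COUNT(*) FROM changed_paragraphs").fetchone()[0]
print(row_count)  # 1
```

Keying the upsert on the document number and paragraph hash means a re-run of the same day overwrites rather than duplicates, which is one plausible reason to store the hash at all.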

Today's changes

Eight of 328 paragraphs that differ from yesterday.

Each row is one changed paragraph. Type records whether it was added, removed, or modified in place. Domain is the predicted regulatory area; Conf. is the classifier's softmax score for that domain, on a 0 to 1 scale.

Doc | Agency | Type | Domain | Conf. | Excerpt
2026-08901 | EPA | modified | environmental | 0.981 | PM2.5 standard reduced from 35 μg/m³ to 25 μg/m³ under revised NAAQS attainment criteria.
2026-08874 | SEC | added | financial | 0.976 | Registrants must disclose material cybersecurity incidents within four business days of determining materiality, pursuant to Item 1.05.
2026-08812 | CMS | modified | healthcare | 0.963 | Readmission reduction program extended to skilled nursing facilities; measurement window extended from 30 to 45 days.
2026-08770 | USDA | removed | agriculture | 0.948 | Interim waiver of origin labeling requirements for processed beef imports under USMCA expires 2026-06-30.
2026-08741 | FMCSA | added | transportation | 0.957 | AV commercial vehicles operating above SAE Level 3 require real-time telemetry logging and FMCSA safety certification.
2026-08712 | FERC | modified | energy | 0.944 | Transmission planning updated to include climate scenario analysis under Order 896; 20-year planning horizon required (previously 10-year).
2026-08688 | OCC | added | financial | 0.969 | National banks engaging in crypto-asset custody must maintain segregated ledger accounts with monthly attestation.
2026-08651 | DOL | modified | labor | 0.938 | Fiduciary duty standard expanded from ERISA plans only to all tax-advantaged retirement accounts.
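One way the Type column could be derived is to hash each paragraph, align the two days' hash sequences, and read the label off the alignment. This is a sketch under stated assumptions: the function name classify_changes, the use of SequenceMatcher for paragraph alignment, and the positional pairing inside a replaced block are illustrative choices, not the pipeline's confirmed logic.

```python
import hashlib
from difflib import SequenceMatcher


def classify_changes(old_paras: list[str], new_paras: list[str]) -> list[tuple[str, str]]:
    """Label each changed paragraph as 'added', 'removed', or 'modified'."""

    def digest(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    old_hashes = [digest(p) for p in old_paras]
    new_hashes = [digest(p) for p in new_paras]
    matcher = SequenceMatcher(a=old_hashes, b=new_hashes, autojunk=False)

    changes: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            changes += [("removed", p) for p in old_paras[i1:i2]]
        elif tag == "insert":
            changes += [("added", p) for p in new_paras[j1:j2]]
        elif tag == "replace":
            # Assumption: pair replaced paragraphs by position; any surplus
            # on either side counts as a plain removal or addition.
            paired = min(i2 - i1, j2 - j1)
            changes += [("modified", p) for p in new_paras[j1:j1 + paired]]
            changes += [("removed", p) for p in old_paras[i1 + paired:i2]]
            changes += [("added", p) for p in new_paras[j1 + paired:j2]]
    return changes


old = ["Scope.", "Reports are due within 30 days.", "Penalties apply."]
new = ["Scope.", "Reports are due within 45 days.", "Penalties apply.", "Effective 2026-06-30."]
print(classify_changes(old, new))
# [('modified', 'Reports are due within 45 days.'), ('added', 'Effective 2026-06-30.')]
```

Unchanged paragraphs produce identical hashes and fall into equal blocks, so only the paragraphs between those anchors are reported.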
Where the changes were

Financial and environmental rules dominate, as usual.

Two views of the same 328 changes: the share each domain holds, and how those changes split across additions, modifications, and removals.

By domain
Share of the 328 changed paragraphs that fall into each regulatory area.
By diff type
For the seven busiest domains, how many paragraphs were added, modified, or removed.
A sample diff

What a token-level change actually looks like.

The EPA's PM2.5 rule, paragraph 3 of 12 in document 2026-08901. On the page, removed text is struck through in red and inserted text is solid green; in plain text, removals appear as [-removed-] and insertions as {+inserted+}. A 4-token edit that tightens an air-quality limit, easy to miss inside a 40-page rule.

differ.py · SequenceMatcher · diff_type: modified
The primary national ambient air quality standard for fine particulate matter (PM2.5) shall not exceed [-thirty-five (35)-] {+twenty-five (25)+} micrograms per cubic meter (μg/m³) as a 24-hour average. The annual standard is [-12-] {+9+} μg/m³ for all Class I and Class II attainment areas.
4 deleted · 4 inserted · 46 equal
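A summary line of deleted, inserted, and equal counts can be tallied from the same opcodes that drive the highlighting. The sketch below renders removals as [-...-] and insertions as {+...+} (a common plain-text diff notation) and counts tokens per category. The function name render_word_diff and the whitespace tokenisation are assumptions, and the sentence used is a shortened stand-in rather than the rule text above.

```python
from difflib import SequenceMatcher


def render_word_diff(old: str, new: str) -> tuple[str, dict[str, int]]:
    """Return an inline-marked diff string and per-category token counts."""
    a, b = old.split(), new.split()
    counts = {"deleted": 0, "inserted": 0, "equal": 0}
    parts = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(a[i1:i2]))
            counts["equal"] += i2 - i1
            continue
        if i2 > i1:  # 'delete' or 'replace': tokens leaving the old text
            parts.append("[-" + " ".join(a[i1:i2]) + "-]")
            counts["deleted"] += i2 - i1
        if j2 > j1:  # 'insert' or 'replace': tokens entering the new text
            parts.append("{+" + " ".join(b[j1:j2]) + "+}")
            counts["inserted"] += j2 - j1
    return " ".join(parts), counts


old = "The limit shall not exceed thirty-five (35) micrograms per cubic meter."
new = "The limit shall not exceed twenty-five (25) micrograms per cubic meter."
text, counts = render_word_diff(old, new)
print(text)
# The limit shall not exceed [-thirty-five (35)-] {+twenty-five (25)+} micrograms per cubic meter.
print(counts)
# {'deleted': 2, 'inserted': 2, 'equal': 9}
```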
The last fortnight

Daily change volume, fourteen business days.

Federal Register volume is naturally bumpy. Quiet days hover near 250 changed paragraphs; busier rule-issuing days push past 320.

Model performance

How sure the classifier is, by domain.

Average confidence over the last 30 days (n = 4,872 paragraphs). Domains with distinctive vocabulary (environmental, financial) score high. The "other" bucket is the deliberate fallback when no candidate label exceeds the min_confidence threshold, so its mean is lower by construction.
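The fallback rule can be sketched as a small function over the per-label scores a zero-shot model returns. The name pick_domain, the default threshold of 0.5, the strict less-than comparison at the boundary, and the choice to keep the top score on an "other" prediction are all assumptions made for illustration.

```python
def pick_domain(scores: dict[str, float], min_confidence: float = 0.5) -> tuple[str, float]:
    """Return (domain, score), falling back to "other" below the threshold."""
    # Take the highest-scoring candidate label.
    label, score = max(scores.items(), key=lambda item: item[1])
    if score < min_confidence:
        # Assumption: the fallback record keeps the (low) top score, which is
        # why the "other" bucket would average lower than the named domains.
        return "other", score
    return label, score


print(pick_domain({"financial": 0.91, "energy": 0.06, "labor": 0.03}))
# ('financial', 0.91)
print(pick_domain({"financial": 0.41, "energy": 0.37, "labor": 0.22}))
# ('other', 0.41)
```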

The API

How you would consume this feed.

Once records are in Postgres, FastAPI serves them. Filter by pub_date and domain; paginate with limit and offset. Full OpenAPI spec at /docs.

GET /changes?pub_date=2026-04-24&domain=financial
// 200 OK, paginated change records
{
  "total": 42,
  "limit": 50,
  "offset": 0,
  "items": [
    {
      "id": 18421,
      "pub_date": "2026-04-24",
      "diff_type": "added",
      "domain": "financial",
      "domain_score": 0.976,
      "agency": "SEC"
    }
  ]
}
GET /stats
// 200 OK, totals + avg confidence by domain
[
  { "domain": "financial",
    "total_changes": 8421,
    "avg_score": 0.94 },
  { "domain": "environmental",
    "total_changes": 6188,
    "avg_score": 0.95 },
  { "domain": "healthcare",
    "total_changes": 4972,
    "avg_score": 0.92 }
]
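A client could walk the paginated /changes endpoint roughly as follows, using only the standard library. The query parameters and the total and items fields come from the response shown above; the helper names, the base URL in the usage comment, and the stopping rule are assumptions. The fetch function is injectable so the paging logic can be exercised without a live server.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def changes_url(base_url: str, pub_date: str, domain: str,
                limit: int = 50, offset: int = 0) -> str:
    """Build a /changes URL with filter and pagination parameters."""
    query = urlencode({"pub_date": pub_date, "domain": domain,
                       "limit": limit, "offset": offset})
    return f"{base_url}/changes?{query}"


def fetch_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_changes(base_url: str, pub_date: str, domain: str, limit: int = 50,
                 fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every change record, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch(changes_url(base_url, pub_date, domain, limit, offset))
        yield from page["items"]
        offset += limit
        # Stop on an empty page or once the reported total has been covered.
        if not page["items"] or offset >= page["total"]:
            break


# Usage against a hypothetical local deployment:
# for record in iter_changes("http://localhost:8000", "2026-04-24", "financial"):
#     print(record["agency"], record["diff_type"], record["domain_score"])
```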
How a day flows

Five stages, end to end in about three minutes.

Each stage is its own module so any of them can be tested or swapped. Total runtime on a laptop is dominated by classifier inference, which is batched across the whole day's changed paragraphs.

Ingest
Pull bulk XML from federalregister.gov with retry and on-disk cache.
Diff
Word-level SequenceMatcher over each document, paragraphs hashed with SHA-256.
Classify
Zero-shot DistilRoBERTa NLI across 11 candidate labels, batched.
Persist
Postgres upsert into changed_paragraphs, indexed by pub_date.
Serve
FastAPI surface at /changes, /domains, /stats.
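The ingest stage's retry and on-disk cache might look something like the sketch below. The function names, the three-attempt limit, the exponential backoff, and the placeholder URL are assumptions; the real bulk XML location is not given here, so the demo uses a fake fetcher that fails once before succeeding.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def download(url: str) -> bytes:
    """Default fetcher: read the whole response body."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def fetch_cached(url: str, cache_path: Path,
                 fetch: Callable[[str], bytes] = download,
                 attempts: int = 3, backoff_seconds: float = 1.0) -> bytes:
    """Return cached bytes if present; otherwise fetch with retries and cache."""
    if cache_path.exists():
        return cache_path.read_bytes()
    for attempt in range(attempts):
        try:
            payload = fetch(url)
        except OSError:  # urllib's URLError and socket timeouts subclass OSError
            if attempt == attempts - 1:
                raise
            time.sleep(backoff_seconds * 2 ** attempt)  # exponential backoff
        else:
            cache_path.parent.mkdir(parents=True, exist_ok=True)
            cache_path.write_bytes(payload)
            return payload
    raise ValueError("attempts must be at least 1")


# Demo with a fake fetcher that fails once, then succeeds.
calls = []


def flaky_fetch(url: str) -> bytes:
    calls.append(url)
    if len(calls) < 2:
        raise OSError("temporary failure")
    return b"<FEDREG/>"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "2026-04-24.xml"
    url = "https://example.invalid/bulk.xml"  # placeholder, not the real endpoint
    first = fetch_cached(url, path, fetch=flaky_fetch, backoff_seconds=0)
    second = fetch_cached(url, path, fetch=flaky_fetch)  # served from the cache

print(len(calls))  # 2
```

Checking the cache before any network call means re-running a day after a downstream failure costs no extra downloads.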
Glossary

Terms used on this page.

Federal Register
The official daily journal of the U.S. government, where every federal agency publishes new and proposed regulations.
Bulk XML
The full text of a single day's Federal Register, served as one downloadable XML file. Whole-document atomicity is what makes day-over-day diffing reliable.
diff_type
How a paragraph changed: added, removed, or modified in place.
Token-level diff
A comparison at the word level rather than the paragraph or document level, so the exact words that changed inside a sentence can be highlighted.
Zero-shot NLI
A classifier that maps text to candidate labels using natural-language entailment, without any task-specific labelled training data. Useful here because there is no public corpus of regulatory paragraphs labelled by domain.
domain_score
The model's confidence in its top-predicted domain, between 0 and 1. Predictions below the configurable min_confidence are returned as "other", by design.