Fourteen sources. One normalized feed.

Federal Register, agency RSS, regulator-specific PDFs — we handle the shape differences so you don't have to.

Why this matters.

Different regulators publish in completely different ways. The Federal Register offers a structured JSON API. HUD publishes mortgagee letters as PDFs behind an HTML index. Fannie Mae lender letters are PDFs behind an Akamai CDN that blocks plain HTTP requests. Freddie Mac bulletins live in a JavaScript-rendered SPA. USDA Rural Development procedure notices sit on a host that blocks requests from some cloud IP ranges entirely. Each source is its own ingest problem.

Every team that tries to aggregate these sources eventually hits the same wall: the easy sources are quick to get working, but the hard ones (the Fannie Mae PDFs, the Freddie Mac SPA, the USDA IP block) demand specific tooling that doesn't generalize. We've built that tooling. Fourteen sources, none of them hand-waved.

What this looks like.

reglith.com/admin/ingest-health
Federal Register   1h ago   OK
CFPB RSS           3h ago   OK
Fannie Mae LL      2d ago   OK
HUD ML             9d ago   WARN
VA Circulars       6d ago   OK
Freddie Bulletin   4d ago   OK

How we do it.

01

Per-source fetchers

JSON API for the Federal Register. Plain fetch for RSS and static HTML indexes. Firecrawl for sources that serve behind bot protection or a JavaScript SPA. Gemini PDF extraction for regulators that publish body content only as PDFs. Each source gets the right tool, not the easiest tool.
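
As a rough illustration of "the right tool per source", the sketch below maps each source to a fetch strategy and derives its ingest steps. Everything here is hypothetical: the `SourceConfig` type, the strategy names, and the step labels are placeholders for illustration, not our production code, and the step names stand in for the real fetch, Firecrawl, and Gemini calls.

```python
from dataclasses import dataclass
from enum import Enum


class FetchStrategy(Enum):
    JSON_API = "json_api"        # structured API, e.g. the Federal Register
    PLAIN_FETCH = "plain_fetch"  # RSS and static HTML indexes
    RENDERED = "rendered"        # bot-protected or JavaScript-rendered pages


@dataclass(frozen=True)
class SourceConfig:
    name: str
    strategy: FetchStrategy
    pdf_body: bool = False  # True when the body text exists only as a PDF


# Hypothetical registry entries mirroring a few of the covered sources.
SOURCES = [
    SourceConfig("Federal Register", FetchStrategy.JSON_API),
    SourceConfig("CFPB newsroom + blog", FetchStrategy.PLAIN_FETCH),
    SourceConfig("HUD Mortgagee Letters", FetchStrategy.PLAIN_FETCH, pdf_body=True),
    SourceConfig("Fannie Mae LL / SEL / SVC", FetchStrategy.RENDERED, pdf_body=True),
    SourceConfig("Freddie Mac Bulletins", FetchStrategy.RENDERED),
]


def plan_pipeline(source: SourceConfig) -> list[str]:
    """Return the ordered ingest steps for one source (labels are placeholders)."""
    base_steps = {
        FetchStrategy.JSON_API: ["call_json_api"],
        FetchStrategy.PLAIN_FETCH: ["http_get", "parse_index"],
        FetchStrategy.RENDERED: ["render_via_crawler", "parse_index"],
    }
    steps = list(base_steps[source.strategy])
    if source.pdf_body:
        steps.append("extract_pdf_text")  # only for PDF-only bodies
    steps.append("normalize")             # every source ends in the same feed shape
    return steps


if __name__ == "__main__":
    for src in SOURCES:
        print(f"{src.name}: {' -> '.join(plan_pipeline(src))}")
```

The point of the design is that adding a source means adding one registry entry, while the choice of fetcher stays explicit per source rather than defaulting to the easiest tool.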

02

Cadence-aware scheduling

Every source has a known publication cadence — daily, weekly, monthly, or as-issued. Ingest runs match that cadence. We don't poll hourly for sources that publish monthly, and we don't wait a week for sources that publish daily.
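
A minimal sketch of what cadence-matched scheduling can look like, assuming each source carries a cadence label and a timestamp of its last ingest run. The polling intervals below are illustrative assumptions, not the values we run in production, and `is_due` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed polling interval per cadence label (illustrative values only).
POLL_INTERVAL = {
    "daily": timedelta(hours=6),
    "weekly": timedelta(days=1),
    "monthly": timedelta(days=7),
    "as_issued": timedelta(hours=12),
}


def is_due(cadence: str, last_run: Optional[datetime], now: datetime) -> bool:
    """Return True when a source with this cadence should be ingested again."""
    if last_run is None:  # never ingested, so run immediately
        return True
    return now - last_run >= POLL_INTERVAL[cadence]


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    two_days_ago = now - timedelta(days=2)
    print(is_due("daily", two_days_ago, now))    # a daily source is overdue
    print(is_due("monthly", two_days_ago, now))  # a monthly source can wait
```

A scheduler tick would call `is_due` for every source and enqueue only those that return True, so a monthly source is never polled on a daily source's schedule.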

03

Drift detection

Regulators quietly restructure their pages. When that happens, our parser can start returning nothing without throwing an error. We run a daily health check against every source: if a source hasn't produced new items in longer than expected, or if an ingest run has failed repeatedly, we get a Slack alert. You can see the current status of every source on the admin health page.
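
The health check described above can be sketched roughly as follows. Because a broken parser may return nothing without raising, the check looks at time since the last new item as well as at failure counts. The `SourceHealth` fields, the `MAX_FAILURES` threshold, and the per-source quiet windows are assumptions for illustration; the Slack notification itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SourceHealth:
    name: str
    last_new_item: datetime        # when the source last yielded a new item
    max_quiet: timedelta           # longest expected gap between items (assumed per source)
    consecutive_failures: int = 0  # failed ingest runs in a row


MAX_FAILURES = 3  # assumed threshold for "failed repeatedly"


def check(source: SourceHealth, now: datetime) -> str:
    """Classify one source as OK or WARN."""
    if source.consecutive_failures >= MAX_FAILURES:
        return "WARN"
    if now - source.last_new_item > source.max_quiet:
        return "WARN"  # silent drift: runs succeed but nothing new arrives
    return "OK"


def sources_to_alert(sources: list[SourceHealth], now: datetime) -> list[str]:
    """Names of sources that should trigger an alert."""
    return [s.name for s in sources if check(s, now) == "WARN"]


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    fleet = [
        SourceHealth("Federal Register", now - timedelta(hours=1), timedelta(days=2)),
        SourceHealth("HUD ML", now - timedelta(days=9), timedelta(days=7)),
    ]
    print(sources_to_alert(fleet, now))
```

Keeping the quiet window per source matters: nine days of silence is normal for some as-issued publications and a clear warning sign for a daily feed.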

What's in the product today.

SOURCE                          CADENCE    NOTES
Federal Register (7 agencies)   Daily      CFPB, Fed, OCC, FDIC, FinCEN, HUD, FHFA via JSON API
CFPB newsroom + blog            Daily      RSS feed, ~25 items per refresh
HUD Mortgagee Letters           As issued  HTML index + Gemini PDF extraction
Ginnie Mae APMs                 As issued  SharePoint index, body inline (no PDF dependency)
VA Circulars                    As issued  Static HTML index + Gemini PDF
USDA RD Procedure Notices       As issued  Firecrawl (IP-blocked host) + Gemini PDF
MPF Program Announcements       As issued  Static HTML index + PDF body via Gemini
Fannie Mae LL / SEL / SVC       As issued  Firecrawl (Akamai-protected) + Gemini PDF
Freddie Mac Bulletins           As issued  Firecrawl (JS-rendered SPA), body on detail page
CFPB Enforcement                As issued  Plain fetch, mortgage-filtered
HUD MRB Enforcement             As issued  HTML index + Gemini PDF, 100% mortgage-relevant
OCC Enforcement                 Monthly    Monthly news release walker + Gemini PDF
FDIC Enforcement                As issued  Two-stage: press release detect + Firecrawl per-action
CSBS State Licenses             Monthly    877 licenses across 54 jurisdictions, versioned

All fourteen sources in one feed, starting day one.