Daily RNS share-transaction monitoring with a self-improving extraction pipeline
Runs daily against the London Stock Exchange Regulatory News Service, pulling every UK Investment Trust share-transaction announcement of the day — buybacks, issuances, and tender offers (~80-90 typical, filtered at the source by sector and headline-type codes). Each announcement is parsed in parallel by a Claude CLI reviewer and a deterministic regex layer. Claude is the source of truth on ambiguous, multi-class, or non-standard announcements; regex is the fast path for boilerplate formats. When the two agree on a ticker for three consecutive runs, the pattern Claude used is promoted into the regex library — so the system gets faster and cheaper over time without code changes. Output lands in a formatted Excel workbook with a full run log of every extraction decision.
Hits the LSE News Explorer each morning, filtering at the source to UK Investment Trust share-transaction announcements only — buybacks, issuances, tender offers — via sector and headline-type codes. Typical day: ~80-90 announcements.
Regex and Claude CLI run in parallel. Claude is the primary extractor and the source of truth on ambiguous, multi-class, or non-standard announcements. Regex handles boilerplate formats fast. Conservative flagging escalates borderline cases for human review rather than guessing.
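The parallel run and reconciliation described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `Extraction`, `Decision`, `reconcile`, and `extract_both` are hypothetical names, and the exact flagging policy (flag on disagreement, or when only regex produced an answer) is assumed rather than taken from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Extraction:
    ticker: str
    shares: Optional[int]  # None means no share count was found


@dataclass(frozen=True)
class Decision:
    result: Extraction
    source: str    # "agreed", "claude", or "regex"
    flagged: bool  # True means a human should confirm this row


# Hypothetical extractor signature: announcement text in, extraction (or None) out.
Extractor = Callable[[str], Optional[Extraction]]


def reconcile(regex: Optional[Extraction],
              claude: Optional[Extraction]) -> Optional[Decision]:
    """Combine both answers; Claude wins on conflict (assumed policy)."""
    if claude is None:
        if regex is None:
            return None  # nothing extracted by either path
        # Only the regex path answered: keep it, but ask for review.
        return Decision(regex, "regex", flagged=True)
    if regex is None:
        # No boilerplate pattern matched; accept Claude's reading.
        return Decision(claude, "claude", flagged=False)
    if regex == claude:
        return Decision(claude, "agreed", flagged=False)
    # The two disagree: Claude is the source of truth, escalate for review.
    return Decision(claude, "claude", flagged=True)


def extract_both(text: str, regex_extractor: Extractor,
                 claude_extractor: Extractor) -> Optional[Decision]:
    """Run both extractors concurrently, then reconcile their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        regex_future = pool.submit(regex_extractor, text)
        claude_future = pool.submit(claude_extractor, text)
        return reconcile(regex_future.result(), claude_future.result())
```

Threads suit this shape because the Claude call is I/O-bound; the regex path finishes almost immediately and simply waits for the reviewer.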
When regex and Claude agree on a ticker for three consecutive runs, the pattern Claude used is promoted into the regex library. Any disagreement resets the counter. Over time, more tickers run through the fast deterministic path and fewer require Claude calls.
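A minimal sketch of the agreement counter described above, assuming a simple in-memory tracker. `PromotionTracker` and `record_run` are hypothetical names; how the real system persists streaks between daily runs, and whether a later disagreement demotes an already promoted ticker, are not stated in the source and are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

PROMOTION_STREAK = 3  # three consecutive agreeing runs, per the description


@dataclass
class PromotionTracker:
    streaks: Dict[str, int] = field(default_factory=dict)
    promoted: Set[str] = field(default_factory=set)

    def record_run(self, ticker: str, agreed: bool) -> bool:
        """Record one run for a ticker; return True if it triggers promotion."""
        if not agreed:
            # Any disagreement resets the counter.
            self.streaks[ticker] = 0
            return False
        self.streaks[ticker] = self.streaks.get(ticker, 0) + 1
        if (self.streaks[ticker] >= PROMOTION_STREAK
                and ticker not in self.promoted):
            self.promoted.add(ticker)
            return True
        return False


tracker = PromotionTracker()
runs = (True, True, False, True, True, True)
results = [tracker.record_run("HAN", agreed) for agreed in runs]
# results == [False, False, False, False, False, True]
```

The disagreement on the third run resets the streak, so promotion only fires on the sixth run, after three clean agreements in a row.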
Built as a regulatory-grade instrument, not a best-effort scraper. Its output fed the daily ownership-percentage calculations used to determine whether TR1 disclosures had to be filed with the FCA within the 48-hour notification window. A missed buyback or a misread share count could translate directly into a missed regulatory filing.
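To illustrate why a single share count matters, here is a worked sketch of the kind of ownership-percentage check described above. The function names, the holding figures, and the threshold list are all illustrative assumptions; the real calculation and the thresholds that apply to a given holder are not specified in the source. The sketch also assumes that bought-back shares stop carrying voting rights.

```python
from fractions import Fraction
from typing import Iterable, List


def ownership_pct(holding: int, voting_rights_in_issue: int) -> Fraction:
    """Holding as an exact percentage of total voting rights."""
    if voting_rights_in_issue <= 0:
        raise ValueError("voting_rights_in_issue must be positive")
    return Fraction(holding * 100, voting_rights_in_issue)


def thresholds_crossed(before: Fraction, after: Fraction,
                       thresholds: Iterable[int]) -> List[int]:
    """Thresholds reached or exceeded going up, or fallen below going down."""
    low, high = sorted((before, after))
    return [t for t in sorted(thresholds) if low < t <= high]


# Hypothetical position: 4,950,000 shares out of 100,000,000 voting rights.
before = ownership_pct(4_950_000, 100_000_000)  # 4.95%
# The trust buys back 1,500,000 shares; the holder's own position is unchanged.
after = ownership_pct(4_950_000, 100_000_000 - 1_500_000)  # about 5.03%

crossed = thresholds_crossed(before, after, thresholds=[3, 4, 5, 6])
# crossed == [5]
```

The holder did not trade, yet the buyback alone moves the position through a threshold. Using `Fraction` keeps the comparison exact near a boundary, where floating-point rounding could hide or invent a crossing.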
Reconciled manually against Bloomberg announcement-by-announcement during production use. Straight-through extractions were essentially always correct; AI-flagged items, on review, were almost always already correct as well — flagging is conservative by design, raising anything ambiguous for human confirmation rather than guessing. The system handles the hard cases by name: duplicate filings, multi-announcement days, dual share-class structures (HAN/HANA, BHMG/BHMU, CMPI/CMPG), and per-ticker currency and voting-rights conversions.
Replaces ~2 hours per day of manual announcement entry and review — a daily task that previously consumed an analyst's morning before they could start the rest of their work.
Replay of an actual production run. 54 announcements processed, Claude CLI reviewer active. The results table populates in real time as each ticker is processed.
Python, Selenium, openpyxl. 163-test pytest suite covering extractor edge cases, ticker-specific conversions, share-class disambiguation, output formatting, stale-page retry, and provider routing — runs offline with the LLM and browser mocked. AI reviewer is provider-agnostic (Claude CLI, Anthropic API, OpenAI API, Ollama, or regex-only) so the system can run anywhere without a single vendor dependency.
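The provider-agnostic reviewer and offline tests mentioned above might look something like the sketch below. Everything here is hypothetical: the `Reviewer` protocol, the provider strings, the `make_reviewer` factory, and the example regex are assumptions for illustration, and no real vendor SDK is called. The LLM side is injected as a plain callable, which is also what lets a test substitute a mock.

```python
import re
from typing import Callable, Optional, Protocol

# Hypothetical provider identifiers, mirroring the options listed above.
LLM_PROVIDERS = {"claude-cli", "anthropic-api", "openai-api", "ollama"}


class Reviewer(Protocol):
    def review(self, announcement: str) -> Optional[int]:
        """Return the share count found in the announcement, or None."""
        ...


class RegexOnlyReviewer:
    # Illustrative pattern only; real announcement wording varies by issuer.
    _PATTERN = re.compile(r"purchased\s+([\d,]+)\s+ordinary shares",
                          re.IGNORECASE)

    def review(self, announcement: str) -> Optional[int]:
        match = self._PATTERN.search(announcement)
        return int(match.group(1).replace(",", "")) if match else None


class CallableReviewer:
    """Wraps any function that sends text to an LLM and returns a count."""

    def __init__(self, call: Callable[[str], Optional[int]]) -> None:
        self._call = call

    def review(self, announcement: str) -> Optional[int]:
        return self._call(announcement)


def make_reviewer(provider: str,
                  llm_call: Optional[Callable[[str], Optional[int]]] = None
                  ) -> Reviewer:
    """Route a provider name to a reviewer implementation."""
    if provider == "regex-only":
        return RegexOnlyReviewer()
    if provider in LLM_PROVIDERS:
        if llm_call is None:
            raise ValueError(f"provider {provider!r} needs an llm_call")
        return CallableReviewer(llm_call)
    raise ValueError(f"unknown provider: {provider!r}")


# Offline use: the LLM call is replaced by a stub, so no network is needed.
mocked = make_reviewer("ollama", llm_call=lambda text: 25_000)
offline = make_reviewer("regex-only")
```

Because every reviewer exposes the same `review` method, the rest of the pipeline does not need to know which provider is behind it.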