LSE Buyback Scraper

Daily RNS share-transaction monitoring with a self-improving extraction pipeline

Runs daily against the London Stock Exchange Regulatory News Service, pulling every UK Investment Trust share-transaction announcement of the day — buybacks, issuances, and tender offers (~80-90 typical, filtered at the source by sector and headline-type codes). Each announcement is parsed in parallel by a Claude CLI reviewer and a deterministic regex layer. Claude is the source of truth on ambiguous, multi-class, or non-standard announcements; regex is the fast path for boilerplate formats. When the two agree on a ticker for three consecutive runs, the pattern Claude used is promoted into the regex library — so the system gets faster and cheaper over time without code changes. Output lands in a formatted Excel workbook with a full run log of every extraction decision.

Python · Claude CLI · Selenium · LSE RNS · openpyxl · Auto-learn

How it works

01 Pull & filter at source

Hits the LSE News Explorer each morning, filtering at the source to UK Investment Trust share-transaction announcements only — buybacks, issuances, tender offers — via sector and headline-type codes. Typical day: ~80-90 announcements.
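As a rough illustration of filtering at the source, the sketch below builds a query URL that carries the sector and headline-type codes, so that only in-scope announcements are requested in the first place. The base URL, the parameter names (`sectors`, `headlinetypes`) and the code values are all placeholders invented for this example; the real News Explorer parameters are not given here.

```python
from urllib.parse import urlencode

# Placeholder values, for illustration only. The real endpoint, parameter
# names and code values used by the scraper are not stated in this write-up.
BASE_URL = "https://example.invalid/news-explorer"
SECTOR_CODES = ["SECTOR_INVESTMENT_TRUST"]
HEADLINE_CODES = ["HEADLINE_BUYBACK", "HEADLINE_ISSUANCE", "HEADLINE_TENDER"]


def build_filtered_url(base_url, sector_codes, headline_codes):
    """Return a URL whose query string restricts results by sector and headline type."""
    query = urlencode({
        "sectors": ",".join(sector_codes),
        "headlinetypes": ",".join(headline_codes),
    })
    return f"{base_url}?{query}"


url = build_filtered_url(BASE_URL, SECTOR_CODES, HEADLINE_CODES)
```

Pushing the filter into the request keeps the downstream parser from ever seeing out-of-scope announcements, which is cheaper than fetching everything and discarding most of it.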

02 Dual-path extraction

Regex and Claude CLI run in parallel. Claude is the primary extractor and the source of truth on ambiguous, multi-class, or non-standard announcements. Regex handles boilerplate formats fast. Conservative flagging escalates borderline cases for human review rather than guessing.
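A minimal sketch of the dual-path idea, under stated assumptions: the regex pattern, the `Decision` record and the reconciliation rules are hypothetical, and the AI reviewer is passed in as a plain callable rather than a real Claude CLI call. It shows both paths running concurrently, the reviewer's value winning on disagreement, and anything uncertain being flagged instead of guessed.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical boilerplate pattern; real announcement wording varies by issuer.
SHARES_RE = re.compile(r"purchased\s+([\d,]+)\s+ordinary shares", re.IGNORECASE)


@dataclass
class Decision:
    shares: Optional[int]
    source: str          # "both", "ai", "regex" or "none"
    needs_review: bool   # True means escalate to a human


def regex_extract(text: str) -> Optional[int]:
    """Fast path: return the share count if the boilerplate pattern matches."""
    match = SHARES_RE.search(text)
    return int(match.group(1).replace(",", "")) if match else None


def extract(text: str, ai_extract: Callable[[str], Optional[int]]) -> Decision:
    """Run both extractors in parallel and reconcile (assumed policy)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        regex_future = pool.submit(regex_extract, text)
        ai_future = pool.submit(ai_extract, text)
        regex_value, ai_value = regex_future.result(), ai_future.result()

    if ai_value is None:
        # Reviewer could not decide: keep any regex value as a candidate, but flag it.
        source = "regex" if regex_value is not None else "none"
        return Decision(regex_value, source, needs_review=True)
    if regex_value == ai_value:
        return Decision(ai_value, "both", needs_review=False)
    if regex_value is None:
        # Regex had no pattern for this format: the reviewer fills the gap.
        return Decision(ai_value, "ai", needs_review=False)
    # Disagreement: the reviewer's value is used, and the item is flagged.
    return Decision(ai_value, "ai", needs_review=True)
```

Passing the reviewer in as a callable also makes it easy to swap in a stub when no LLM is available.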

03 Self-improving pattern library

When regex and Claude agree on a ticker for three consecutive runs, the pattern Claude used is promoted into the regex library. Any disagreement resets the counter. Over time, more tickers run through the fast deterministic path and fewer require Claude calls.
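The promotion rule can be sketched as a per-ticker streak counter. The class name, method name and the way patterns are stored are assumptions made for this example; only the threshold of three consecutive agreeing runs and the reset on disagreement come from the description above.

```python
from dataclasses import dataclass, field

PROMOTION_THRESHOLD = 3  # consecutive agreeing runs required, per the description


@dataclass
class PatternLearner:
    # ticker -> number of consecutive runs where regex and the reviewer agreed
    streaks: dict = field(default_factory=dict)
    # ticker -> pattern promoted into the regex library (storage is hypothetical)
    promoted: dict = field(default_factory=dict)

    def record_run(self, ticker, regex_value, ai_value, candidate_pattern) -> bool:
        """Update the streak for one run; return True if a pattern was promoted."""
        if regex_value is None or regex_value != ai_value:
            self.streaks[ticker] = 0  # any disagreement resets the counter
            return False
        self.streaks[ticker] = self.streaks.get(ticker, 0) + 1
        if self.streaks[ticker] >= PROMOTION_THRESHOLD and ticker not in self.promoted:
            self.promoted[ticker] = candidate_pattern
            return True
        return False


learner = PatternLearner()
pattern = r"purchased\s+([\d,]+)\s+ordinary shares"  # illustrative pattern only
results = [learner.record_run("ABC", 1000, 1000, pattern) for _ in range(3)]
```

In this sketch the third agreeing run is the one that triggers promotion, and a single disagreement in between would restart the count from zero.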

Reliability

Built as a regulatory-grade instrument, not a best-effort scraper. Its output fed the daily ownership-percentage calculations used to determine whether TR1 disclosures had to be filed with the FCA within the 48-hour notification window, so a missed buyback or a misread share count could translate directly into a missed regulatory filing.
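To show why a single buyback matters, here is a small worked example with made-up numbers: a holder whose position is unchanged can still cross a disclosure threshold because the buyback shrinks the denominator. The share counts are invented, and the threshold set (whole percentages from 3% upward) is an assumption for illustration; the thresholds that actually apply depend on the holder and the rules in force.

```python
from fractions import Fraction

# Assumed thresholds for illustration: every whole percent from 3% to 100%.
ASSUMED_THRESHOLDS = range(3, 101)


def ownership_pct(shares_held: int, voting_rights: int) -> Fraction:
    """Holding as an exact percentage of total voting rights."""
    if voting_rights <= 0:
        raise ValueError("voting_rights must be positive")
    return Fraction(shares_held * 100, voting_rights)


def thresholds_crossed(before, after, thresholds=ASSUMED_THRESHOLDS):
    """Thresholds reached or passed when moving from `before` to `after`, in either direction."""
    low, high = sorted((before, after))
    return [t for t in thresholds if low < t <= high]


# Invented figures: the holder owns 2,990,000 shares throughout.
held = 2_990_000
before = ownership_pct(held, 100_000_000)           # 2.99% before the buyback
after = ownership_pct(held, 100_000_000 - 500_000)  # about 3.005% after 500,000 shares are bought back
crossed = thresholds_crossed(before, after)
```

Using `Fraction` keeps the comparison exact, which matters when a holding sits within a few basis points of a threshold.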

Reconciled manually against Bloomberg announcement-by-announcement during production use. Straight-through extractions were essentially always correct; AI-flagged items, on review, were almost always already correct as well — flagging is conservative by design, raising anything ambiguous for human confirmation rather than guessing. The system handles the hard cases by name: duplicate filings, multi-announcement days, dual share-class structures (HAN/HANA, BHMG/BHMU, CMPI/CMPG), and per-ticker currency and voting-rights conversions.

Replaces ~2 hours per day of manual announcement entry and review — a daily task that previously consumed an analyst's morning before they could start the rest of their work.

Demo — 07 May 2026 run

real output, replayed

Replay of an actual production run. 54 announcements processed, Claude CLI reviewer active. The results table populates in real time as each ticker is processed.

python scraper.py --demo --ai-provider claude_cli

From this run

Announcements: 54
Regex extracted: 44
AI gap-filled: 6

Engineering

Built with Python, Selenium, and openpyxl. A 163-test pytest suite covers extractor edge cases, ticker-specific conversions, share-class disambiguation, output formatting, stale-page retry, and provider routing, and runs offline with the LLM and browser mocked. The AI reviewer is provider-agnostic (Claude CLI, Anthropic API, OpenAI API, Ollama, or regex-only), so the system can run anywhere without depending on a single vendor.
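One way provider routing of this kind could look is a small registry keyed by provider name. This is a sketch under assumptions: the registry, the decorator and the `regex_only` key are hypothetical, and the `claude_cli` entry is a stand-in mock rather than a real call to the CLI, similar in spirit to running with the LLM mocked.

```python
from typing import Callable, Dict, Optional

# A reviewer takes announcement text and returns extracted fields, or None.
Reviewer = Callable[[str], Optional[dict]]

REVIEWERS: Dict[str, Reviewer] = {}


def register(name: str):
    """Decorator that adds a reviewer to the registry under `name`."""
    def decorator(fn: Reviewer) -> Reviewer:
        REVIEWERS[name] = fn
        return fn
    return decorator


@register("regex_only")
def no_review(text: str) -> Optional[dict]:
    """Regex-only mode: no AI review is performed."""
    return None


@register("claude_cli")
def mock_claude_cli(text: str) -> Optional[dict]:
    """Stand-in for the real reviewer; returns a fixed, made-up result."""
    return {"shares": 1000, "flagged": False}


def get_reviewer(name: str) -> Reviewer:
    """Look up a reviewer by provider name, failing loudly on unknown names."""
    try:
        return REVIEWERS[name]
    except KeyError:
        raise ValueError(
            f"unknown provider {name!r}; choose from {sorted(REVIEWERS)}"
        ) from None


reviewer = get_reviewer("claude_cli")
```

Because callers only ever see the `Reviewer` signature, adding another provider would mean registering one more function rather than touching the extraction logic.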