How this works
1. We collect public data from 15+ federal sources: Congress.gov, Regulations.gov, FEC, Senate LDA, GovInfo, USAspending, and more.
2. We normalize names, IDs, and organizations across datasets. A senator's bioguide ID links their votes, trades, speeches, and campaign donors.
3. We run cross-reference queries that surface patterns no single dataset reveals on its own. Every query links to the live data so you can verify.
Influence & Testimony
Who testifies AND lobbies?
Organizations testify before Congress as expert witnesses while simultaneously paying lobbyists
to influence the same committees. Both activities are legal and publicly disclosed —
but the connection between them is almost never surfaced.
Hearings × Lobbying
Hearing Witnesses Who Also Lobby Congress
Organizations that testified as expert witnesses at congressional hearings
while also paying lobbyists. They appear in the public record as
neutral experts, but the lobbying record tells a different story.
hearing_witnesses ↔ lobbying_activities
2,500+ organizations overlap
Data Sources
- Hearing witnesses: GovInfo MODS XML metadata for congressional hearings. Each hearing lists witnesses with name and organization.
- Lobbying clients: Senate Lobbying Disclosure Act (LDA) filings via the lda.senate.gov API. Each quarterly LD-2 filing lists the client organization.
Matching Method
Exact case-insensitive string match: UPPER(TRIM(witness_organization)) = UPPER(TRIM(client_name)). This is a pre-computed materialized table (witness_lobby_overlap) rebuilt during each database update.
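A minimal sketch of this kind of exact match, run against an in-memory SQLite database. The column names witness_organization and client_name come from the expression above; the table layouts and sample rows are made up for illustration and are not the production schema.

```python
import sqlite3

# In-memory database with hypothetical, simplified tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hearing_witnesses (witness_organization TEXT);
CREATE TABLE lobbying_activities (client_name TEXT);

INSERT INTO hearing_witnesses VALUES
    ('  American Petroleum Institute '),
    ('Example Widgets Association'),
    ('API');
INSERT INTO lobbying_activities VALUES
    ('AMERICAN PETROLEUM INSTITUTE'),
    ('Example Widgets Assoc.');
""")

# Exact match after trimming whitespace and upper-casing both sides.
overlap = conn.execute("""
    SELECT DISTINCT UPPER(TRIM(w.witness_organization)) AS org
    FROM hearing_witnesses AS w
    JOIN lobbying_activities AS l
      ON UPPER(TRIM(w.witness_organization)) = UPPER(TRIM(l.client_name))
    ORDER BY org
""").fetchall()

print(overlap)
```

Only the padded, mixed-case spelling of the full name matches. The abbreviation "API" and the "Assoc." variant do not, which is the precision-over-recall trade-off in practice.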
Limitations
Exact name matching means we miss cases where the same organization uses slightly different names across filings (e.g., "American Petroleum Institute" vs "API"). Fuzzy matching would increase coverage but also increase false positives. We chose precision over recall.
Comments × Lobbying
Organizations That Comment on Rules AND Lobby
When a company submits a public comment on an EPA rule while also paying
a lobbyist on the same issue, that's dual-channel influence on the
same regulatory process. Both are public record.
comment_details ↔ lobbying_activities
288+ organizations overlap (growing)
Data Sources
- Comment organizations: Regulations.gov API comment detail records. The organization field is self-reported by the commenter.
- Lobbying clients: Senate LDA quarterly activity reports (LD-2 filings) with client_name.
Matching Method
Exact case-insensitive match: UPPER(TRIM(organization)) = UPPER(TRIM(client_name)). Materialized as commenter_lobby_overlap.
Limitations
The overlap is currently based on ~41K comment detail records (about 1% of the 3.7M total comments). As we expand full comment text collection, this overlap count will grow significantly. Organization names are self-reported and unstandardized, so matches depend on commenters using their organization's exact legal name.
Money & Committees
Follow the money to the committee room
Committee assignments determine which industries a member of Congress oversees.
Financial disclosures reveal what they trade and who funds them.
We connect the two.
Committees × Stock Trades
Stock Trading by Committee Members
Members of Congress actively trading stocks while serving on committees
that oversee related industries. Not necessarily illegal, but a pattern
worth watching.
committee_memberships → congress_members → stock_trades
95,621 trades by sitting members
Data Sources
- Committee assignments: congress-legislators GitHub repository (unitedstates project). Maps bioguide_id to committee_id.
- Stock trades: House Financial Disclosure PTR PDFs parsed via pdftotext, plus Senate eFD periodic transaction reports scraped from efdsearch.senate.gov. Both are government sources.
Matching Method
Members are linked via bioguide_id, the universal congressional identifier. This is deterministic — no fuzzy matching. The query joins committee_memberships to stock_trades through congress_members.
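As a rough illustration, the deterministic join could look like the following. The three table names and bioguide_id are from the description above; every other column, the sample IDs, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE congress_members (bioguide_id TEXT PRIMARY KEY, full_name TEXT);
CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT, transaction_date TEXT);

INSERT INTO congress_members VALUES ('X000001', 'Example Member');
INSERT INTO committee_memberships VALUES ('X000001', 'HSXX');
INSERT INTO stock_trades VALUES
    ('X000001', 'ABC', '2024-03-01'),
    (NULL, 'XYZ', '2024-03-02');  -- filer with no matched bioguide_id
""")

# committee_memberships -> congress_members -> stock_trades, keyed on bioguide_id.
rows = conn.execute("""
    SELECT cm.committee_id, m.full_name, t.ticker, t.transaction_date
    FROM committee_memberships AS cm
    JOIN congress_members AS m ON m.bioguide_id = cm.bioguide_id
    JOIN stock_trades AS t ON t.bioguide_id = m.bioguide_id
""").fetchall()

print(rows)
```

Trades without a bioguide_id drop out of the inner join, so unmatched filers never appear in committee-level results.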
Limitations
85.5% of stock trades have a matched bioguide_id. The unmatched 14.5% are mostly candidates who filed disclosures but never took office. Committee assignments reflect current membership only; historical rotations are not tracked.
Committees × Campaign Finance
Top Donors to Committee Members
Which PACs and organizations fund the members of specific committees?
Cross-references FEC contribution records with current committee
assignments and bioguide IDs.
committee_memberships → fec_crosswalk → fec_contributions
4.4M contribution records linked
Data Sources
- Contributions: FEC bulk data from the fec.gov S3 bucket. Committee-to-candidate contributions across all election cycles.
- Member linkage: fec_candidate_crosswalk maps FEC cand_id to congressional bioguide_id (1,711 matched members).
- Committees: congress-legislators current membership data.
Matching Method
Four-table join: committee_memberships → fec_candidate_crosswalk (via bioguide_id) → fec_contributions (via cand_id) → fec_committees (via cmte_id). Pre-computed as committee_donor_summary (threshold: $10,000+ total).
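The join and threshold can be sketched as below. Table names and the key columns (bioguide_id, cand_id, cmte_id) follow the description above; the amount and cmte_name columns, the sample IDs, and the choice to apply the $10,000 threshold per donor per congressional committee are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
CREATE TABLE fec_candidate_crosswalk (bioguide_id TEXT, cand_id TEXT);
CREATE TABLE fec_contributions (cand_id TEXT, cmte_id TEXT, amount INTEGER);
CREATE TABLE fec_committees (cmte_id TEXT, cmte_name TEXT);

INSERT INTO committee_memberships VALUES ('X000001', 'HSXX');
INSERT INTO fec_candidate_crosswalk VALUES ('X000001', 'H0XX00001');
INSERT INTO fec_committees VALUES
    ('C00000001', 'Example PAC A'),
    ('C00000002', 'Example PAC B');
INSERT INTO fec_contributions VALUES
    ('H0XX00001', 'C00000001', 6000),
    ('H0XX00001', 'C00000001', 5000),
    ('H0XX00001', 'C00000002', 2500);
""")

# Sum each donor's contributions per congressional committee,
# keeping only totals of $10,000 or more.
summary = conn.execute("""
    SELECT cm.committee_id, d.cmte_name, SUM(c.amount) AS total
    FROM committee_memberships AS cm
    JOIN fec_candidate_crosswalk AS x ON x.bioguide_id = cm.bioguide_id
    JOIN fec_contributions AS c ON c.cand_id = x.cand_id
    JOIN fec_committees AS d ON d.cmte_id = c.cmte_id
    GROUP BY cm.committee_id, d.cmte_id, d.cmte_name
    HAVING SUM(c.amount) >= 10000
""").fetchall()

print(summary)
```

Here Example PAC A clears the threshold with two contributions totaling $11,000, while Example PAC B's single $2,500 contribution is filtered out.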
Limitations
The FEC-to-bioguide crosswalk covers 1,711 members. Contributions are PAC-to-candidate only (not individual donors). Committee assignments cover the current session only; we don't track historical rotations, so a member's past committee work won't show here.
Legislative Activity & Trading
Trading around legislation
Members of Congress trade stocks and also give floor speeches, sponsor bills,
and vote on legislation that can move markets. We connect the timing.
Floor Speeches × Stock Trades
Floor Speeches Within 7 Days of Trades
Members who gave floor speeches within a week of making stock trades.
This doesn't prove anything — but the temporal proximity is a pattern
that researchers and journalists need to be able to see.
stock_trades ↔ crec_speakers → congressional_record
7 days trade-to-speech window
Data Sources
- Stock trades: House PTR PDFs + Senate eFD reports. Transaction dates, tickers, amounts.
- Floor speeches: GovInfo Congressional Record (CREC) daily packages, 1994–present. MODS XML provides speaker bioguide IDs for 99.6% of entries.
Matching Method
Join stock_trades to crec_speakers via bioguide_id, then filter where ABS(julianday(transaction_date) - julianday(speech_date)) <= 7. Pre-computed as speeches_near_trades.
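A small sketch of the window filter using SQLite's julianday(), as in the expression above. The columns bioguide_id, transaction_date, and speech_date are named in the text; the ticker column, the derived day_gap column, and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT, transaction_date TEXT);
CREATE TABLE crec_speakers (bioguide_id TEXT, speech_date TEXT);

INSERT INTO stock_trades VALUES ('X000001', 'ABC', '2024-03-10');
INSERT INTO crec_speakers VALUES
    ('X000001', '2024-03-04'),   -- 6 days before the trade
    ('X000001', '2024-03-17'),   -- 7 days after the trade
    ('X000001', '2024-03-18'),   -- 8 days after: outside the window
    ('Y000002', '2024-03-10');   -- a different member
""")

# Same member, speech within 7 days of the trade in either direction.
near = conn.execute("""
    SELECT t.ticker, s.speech_date,
           CAST(julianday(s.speech_date) - julianday(t.transaction_date) AS INTEGER) AS day_gap
    FROM stock_trades AS t
    JOIN crec_speakers AS s ON s.bioguide_id = t.bioguide_id
    WHERE ABS(julianday(t.transaction_date) - julianday(s.speech_date)) <= 7
    ORDER BY s.speech_date
""").fetchall()

print(near)
```

The window is inclusive on both ends, so a speech exactly 7 days after a trade counts and one 8 days after does not. A negative day_gap means the speech came before the trade.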
Limitations
Temporal proximity does not imply causation. Members give many speeches and make many trades — some overlap is expected by chance alone. We chose 7 days as a meaningful window, but this is an editorial choice. The speech content is not automatically matched to the traded stock's sector; users must evaluate relevance manually.
Committee Jurisdiction × Trades
Trading in the Sectors They Regulate
Members of Congress sit on committees with jurisdiction over specific industries.
Using SEC EDGAR's SIC classifications for 2,027 traded tickers, we check whether
members are trading stocks in sectors their committees regulate.
committee_memberships → committee_jurisdiction (SIC ranges) → ticker_sic (SEC EDGAR) → stock_trades
2,027 tickers classified by SIC code via SEC EDGAR
Data Sources
- Committee jurisdiction mapping: A curated reference table (view it) that maps 27 congressional committees to the SIC code ranges under their primary jurisdiction.
- Ticker SIC codes: Downloaded from SEC EDGAR. Each company's CIK is looked up via data.sec.gov/submissions/ to get its SIC industry classification. 2,027 traded tickers are classified across 333 unique SIC codes.
- Stock trades: Same House PTR + Senate eFD sources as above.
- Committee assignments: congress-legislators GitHub data.
Matching Method
For each committee member, we look up their stock trades, classify each ticker by its SEC-assigned SIC code, then check whether that SIC code falls within any of the committee's jurisdiction SIC ranges. The join is: CAST(ticker_sic.sic_code AS INTEGER) BETWEEN committee_sic_ranges.sic_start AND committee_sic_ranges.sic_end.
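The range check can be sketched as follows. The tables ticker_sic and committee_sic_ranges and the columns sic_code, sic_start, and sic_end appear in the join expression above; the committee ID, the tickers, and the mapping of this committee to SIC 1300-1399 are invented for the example and do not reflect the curated reference table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
CREATE TABLE committee_sic_ranges (committee_id TEXT, sic_start INTEGER, sic_end INTEGER);
CREATE TABLE ticker_sic (ticker TEXT, sic_code TEXT);
CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT);

INSERT INTO committee_memberships VALUES ('X000001', 'HSXX');
INSERT INTO committee_sic_ranges VALUES ('HSXX', 1300, 1399);
INSERT INTO ticker_sic VALUES ('OILX', '1311'), ('SOFT', '7372');
INSERT INTO stock_trades VALUES
    ('X000001', 'OILX'), ('X000001', 'SOFT'), ('X000001', 'FUND');
""")

# Keep trades whose ticker's SIC code falls inside a range
# assigned to one of the member's committees.
in_jurisdiction = conn.execute("""
    SELECT cm.bioguide_id, cm.committee_id, t.ticker, s.sic_code
    FROM committee_memberships AS cm
    JOIN stock_trades AS t ON t.bioguide_id = cm.bioguide_id
    JOIN ticker_sic AS s ON s.ticker = t.ticker
    JOIN committee_sic_ranges AS r
      ON r.committee_id = cm.committee_id
     AND CAST(s.sic_code AS INTEGER) BETWEEN r.sic_start AND r.sic_end
""").fetchall()

print(in_jurisdiction)
```

The ticker with no SIC row ("FUND" here, standing in for a fund or ETF) is dropped by the inner join, and the software ticker falls outside the committee's range, so only the oil-and-gas trade is returned.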
Limitations
SIC codes cover publicly traded companies only — mutual funds, ETFs, and ADRs don't have SIC classifications and are excluded. The committee-to-SIC mapping currently covers primary jurisdiction only; some committees (Appropriations, Budget, Foreign Affairs) are omitted because their jurisdiction is cross-cutting rather than sector-specific. The mapping reflects editorial judgment about which SIC ranges fall under each committee.
Lobbying ↔ Legislation
Which Bills Get Lobbied the Most?
We parse bill numbers (H.R., S., etc.) from the "specific issues" text in 2.7M
lobbying activity reports, then match them to legislation records. See which
bills attract the most lobbying and from how many different clients.
lobbying_activities → lobbying_bills → legislation
2.7M lobbying reports text-mined
Data Sources
- Lobbying activities: Senate LDA quarterly LD-2 filings. Each filing has a specific_issues free-text field describing lobbying activities.
- Legislation: Congress.gov BILLSTATUS bulk XML. 167K+ bills across Congresses 93–119.
Matching Method
Regex extraction of bill references from specific_issues text. Patterns: H.R., S., H.J.Res., S.J.Res., H.Con.Res., S.Con.Res., H.Res., S.Res. followed by a number. Congress number inferred from filing year using (year - 1789) // 2 + 1. Matched to legislation table via constructed bill_id.
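A hedged sketch of the extraction step. The bill-type prefixes and the Congress formula are taken from the description above; the exact regular expression, the lookbehind that skips "U.S.", and the bill_id format ("118-hr-1234") are assumptions for illustration, since the real pattern and ID layout are not shown here.

```python
import re

# Longer prefixes come first so "H.J.Res." is not cut short; the lookbehind
# keeps the "S." inside "U.S." from being read as a Senate bill prefix.
BILL_PATTERN = re.compile(
    r"(?<![A-Za-z.])"
    r"(H\.J\.Res\.|S\.J\.Res\.|H\.Con\.Res\.|S\.Con\.Res\.|H\.Res\.|S\.Res\.|H\.R\.|S\.)"
    r"\s*(\d+)"
)


def congress_for_year(year: int) -> int:
    """Infer the Congress number from a filing year: (year - 1789) // 2 + 1."""
    return (year - 1789) // 2 + 1


def extract_bill_ids(specific_issues: str, filing_year: int) -> list[str]:
    """Return hypothetical bill_id strings such as '118-hr-1234'."""
    congress = congress_for_year(filing_year)
    bill_ids = []
    for match in BILL_PATTERN.finditer(specific_issues):
        bill_type = match.group(1).replace(".", "").lower()  # "H.R." -> "hr"
        bill_ids.append(f"{congress}-{bill_type}-{match.group(2)}")
    return bill_ids


text = "Lobbied on H.R. 1234 and S.567; also H.J.Res. 12 regarding U.S. 2030 targets."
print(extract_bill_ids(text, 2023))
```

Both years of a two-year Congress map to the same number (2023 and 2024 both give 118), which is why a filing made in the first days of January of an odd year can be assigned to the wrong Congress.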
Limitations
Not all lobbying filings reference specific bills — many describe issues generally. Congress number inference from filing year may be off for filings near session boundaries. The regex doesn't catch informal references like "the infrastructure bill" or bill names without numbers.
Coming Soon
More connections in progress
We're actively building more cross-references. As our comment detail coverage expands
and new datasets come online (hearings, nominations, GAO reports), new patterns will emerge.
Lobbying ↔ Congress
The Revolving Door: Former Members Who Lobby
Lobbying filings include "covered positions" — when a lobbyist previously
held a government role. Cross-referencing with congress_members reveals
which former legislators now lobby their former colleagues.
lobbying_lobbyists → covered_position filter → congress_members
~199 former members matched to lobbying filings
Data Sources
- Lobbying disclosures: Senate LDA filings include a covered_position field where lobbyists disclose former government roles. 1.85M records have this field populated.
- Congress members: 12,700+ historical and current members with bioguide IDs.
Matching Method
We filter lobbying records where the covered_position text indicates the lobbyist was a former member of Congress (matching patterns like "U.S. Senator", "U.S. Representative", "Member of Congress", "Former Member"). Then we join on UPPER(full_name) = lobbyist_name. For ambiguous names (e.g., multiple "Thomas Davis" in history), we pick the most recent member.
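One way to sketch the filter, join, and tie-break. The position patterns and the UPPER(full_name) = lobbyist_name condition come from the description above; the last_year column used to pick the most recent member, the bioguide IDs, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lobbying_lobbyists (lobbyist_name TEXT, covered_position TEXT);
CREATE TABLE congress_members (bioguide_id TEXT, full_name TEXT, last_year INTEGER);

INSERT INTO lobbying_lobbyists VALUES
    ('THOMAS DAVIS', 'Member of Congress'),
    ('JANE STAFFER', 'Legislative Assistant'),
    ('BILLY EXAMPLE', 'Former Member, U.S. House');
INSERT INTO congress_members VALUES
    ('D900001', 'Thomas Davis', 1950),
    ('D900002', 'Thomas Davis', 2008),
    ('E900001', 'William Example', 2004),
    ('S900001', 'Jane Staffer', 1990);
""")

# Filter on covered_position text, join on the upper-cased name, and for
# duplicate names keep the member with the latest (hypothetical) last_year.
matches = conn.execute("""
    SELECT DISTINCT l.lobbyist_name, m.bioguide_id
    FROM lobbying_lobbyists AS l
    JOIN congress_members AS m ON UPPER(m.full_name) = l.lobbyist_name
    WHERE (l.covered_position LIKE '%U.S. Senator%'
        OR l.covered_position LIKE '%U.S. Representative%'
        OR l.covered_position LIKE '%Member of Congress%'
        OR l.covered_position LIKE '%Former Member%')
      AND m.last_year = (
          SELECT MAX(m2.last_year)
          FROM congress_members AS m2
          WHERE UPPER(m2.full_name) = l.lobbyist_name)
    ORDER BY l.lobbyist_name
""").fetchall()

print(matches)
```

In this toy data the two "Thomas Davis" rows resolve to the more recent one, the staffer is excluded by the position filter despite a name collision, and "Billy Example" passes the position filter but finds no member under that name, mirroring the nickname gap.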
Limitations
Name matching misses ~50–80 members who go by nicknames in lobbying filings (e.g., "Billy Tauzin" vs. "William Tauzin", "Dick Gephardt" vs. "Richard Gephardt"). We plan to add a manual nickname mapping table to capture these. The position text is free-form and inconsistent, so some staffers who list their boss's title may be incorrectly included or excluded.
Coming Soon
GAO Oversight of Specific Laws
GAO reports reference specific public laws and U.S. Code sections.
Matching these to legislation reveals which laws have been subject
to the most government accountability scrutiny.
gao_reports ↔ legislation
Coming Soon
Foreign Agents Testifying at Hearings
FARA registrants are organizations representing foreign governments.
Do any of them also appear as witnesses at congressional hearings?
The data to answer that is in two tables.
fara_registrants ↔ hearing_witnesses