Counting GitHub issues each month that contain an em-dash (—) or the literal phrase
“em dash”. What was once a niche typographic flourish has, since mid-2025,
become a near-unmistakable fingerprint of LLM-generated text flooding the world's largest code host.
Linear: blue bars on left axis (0–600K), green line on right axis (0–150). Two scales because the broad and tighter signals differ by ~4 orders of magnitude.
Log: blue bars on left axis (10–1M), green line on right axis (1–1K). Both axes logarithmic.
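As a quick sanity check on why two scales are needed, the gap between the two linear axis maxima can be expressed in orders of magnitude. This is only a sketch: the variable names are illustrative, and the inputs are the axis limits quoted above, not the underlying data.

```python
import math

# Axis maxima taken from the linear chart description (not raw data).
BROAD_AXIS_MAX = 600_000   # blue bars, left axis
TIGHT_AXIS_MAX = 150       # green line, right axis

ratio = BROAD_AXIS_MAX / TIGHT_AXIS_MAX
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:,.0f}x, about {orders_of_magnitude:.1f} orders of magnitude")
```

On these axis limits the gap is a factor of 4,000, a little under four orders of magnitude, which is why a shared linear axis would flatten the green line to zero.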
Two signals, one story
The blue bars count any issue matched by the broad query: an em-dash character in the body, or
the phrase “em dash” in the title or body. That's the flood. The green line counts the much
narrower set of issues that talk about em-dashes by name (“em dash” or “em-dash”): bug reports,
feature requests, and complaints filed because of em-dashes. From a 2021 baseline of ~25/month,
the about-em-dash signal roughly doubles by 2025 and keeps climbing in late 2025, peaking at 141
in 2025-09, more than five times the baseline.
Em-dashes are no longer just appearing; they're causing enough friction to be noticed.
What happened?
From 2021 through early 2024 the baseline hovers around 3,000–8,000 issues per month, with
occasional spikes from automated bots and template-text dumps. In mid-2025 the curve breaks loose:
July hits six figures, August clears 430,000, and by April 2026 a single month logs over half a
million issues containing em-dashes — roughly a hundredfold rise from the pre-LLM baseline.
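The "roughly a hundredfold" figure can be checked against the numbers quoted above. This is a minimal sketch, assuming a peak of exactly 500,000 (the text says only "over half a million") and using the low end, midpoint, and high end of the 3,000–8,000 baseline range. The helper name `fold_rise` is made up for illustration.

```python
# Figures quoted in the text; PEAK is a lower bound ("over half a million").
BASELINE_LOW = 3_000
BASELINE_HIGH = 8_000
PEAK = 500_000


def fold_rise(peak: float, baseline: float) -> float:
    """Return how many times larger the peak is than the baseline."""
    return peak / baseline


midpoint = (BASELINE_LOW + BASELINE_HIGH) / 2
for label, base in [("low", BASELINE_LOW), ("midpoint", midpoint), ("high", BASELINE_HIGH)]:
    print(f"{label:>8} baseline {base:>7,.0f}: {fold_rise(PEAK, base):.0f}x")
```

Depending on which end of the baseline range is used, the rise lands between roughly 60x and 170x, so "hundredfold" is a fair order-of-magnitude summary rather than a precise multiplier.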
The query
Run against the public githubarchive BigQuery dataset. It counts IssuesEvent events with
action opened whose body contains an em-dash character, or whose title or body contains the
literal phrase “em dash” (case-insensitive).
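SELECT
  FORMAT_DATE('%Y-%m', DATE(created_at)) AS month,
  COUNT(*) AS issue_count
FROM
  `githubarchive.month.*`
WHERE
  type = 'IssuesEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'opened'
  AND (
    -- literal phrase, case-insensitive, in title or body
    LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.title')) LIKE '%em dash%'
    OR LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.body')) LIKE '%em dash%'
    -- em-dash character (U+2014), matched in the body only
    OR JSON_EXTRACT_SCALAR(payload, '$.issue.body') LIKE '%—%'
  )
  -- Check the suffix format against the dataset's table names before running
  -- (monthly tables may be named YYYYMM, e.g. 202001).
  AND _TABLE_SUFFIX BETWEEN '2020_01' AND '2026_04'
GROUP BY month
ORDER BY month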
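To make the WHERE clause easier to reason about, here is a minimal Python sketch of the same predicate applied to a single issue. The function name `matches_broad` and the sample issues are hypothetical. `None` stands in for a missing JSON field and is treated as "no match", which approximates how a SQL NULL behaves inside an OR chain.

```python
from typing import Optional

EM_DASH = "\u2014"  # the em-dash character, U+2014


def matches_broad(title: Optional[str], body: Optional[str]) -> bool:
    """Mirror the broad query: phrase in title or body, or em-dash character in body."""
    title_l = (title or "").lower()
    body_l = (body or "").lower()
    return (
        "em dash" in title_l
        or "em dash" in body_l
        or EM_DASH in (body or "")
    )


# Invented sample issues: (title, body)
examples = [
    ("Crash on startup", "It fails \u2014 every time."),
    ("Strip EM DASH from output", None),
    ("Title with \u2014 only", "plain body"),
    ("Fix em-dash handling", "hyphenated spelling only"),
]
for title, body in examples:
    print(matches_broad(title, body), "|", title)
```

The last two samples do not match. The query as written does not look for the em-dash character in titles, and the broad phrase match covers only the spaced spelling "em dash", not the hyphenated one.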
The tighter query (green line)
Drops the em-dash-character match entirely and counts only issues whose text literally contains
the phrase “em dash” or “em-dash” (case-insensitive). These are issues filed because of
em-dashes, not just ones that happen to contain one.
SELECT
  FORMAT_DATE('%Y-%m', DATE(created_at)) AS month,
  COUNT(*) AS issue_count
FROM
  `githubarchive.month.*`
WHERE
  type = 'IssuesEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'opened'
  AND (
    -- phrase match only, case-insensitive, hyphenated or spaced, in title or body
    LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.title')) LIKE '%em-dash%'
    OR LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.title')) LIKE '%em dash%'
    OR LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.body')) LIKE '%em-dash%'
    OR LOWER(JSON_EXTRACT_SCALAR(payload, '$.issue.body')) LIKE '%em dash%'
  )
  -- Check the suffix format against the dataset's table names before running
  -- (monthly tables may be named YYYYMM, e.g. 202001).
  AND _TABLE_SUFFIX BETWEEN '2020_01' AND '2026_04'
GROUP BY month
ORDER BY month
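The tighter predicate and the GROUP BY month step can be sketched the same way in Python. The names `matches_tight` and `monthly_counts` and the sample events are invented for illustration. Real GH Archive events carry many more fields, and this is not the pipeline used to produce the chart.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

PHRASES = ("em dash", "em-dash")

# (created_at, title, body); a simplified stand-in for an IssuesEvent
Event = Tuple[datetime, Optional[str], Optional[str]]


def matches_tight(title: Optional[str], body: Optional[str]) -> bool:
    """True when the title or body names the em-dash (case-insensitive)."""
    text = f"{title or ''}\n{body or ''}".lower()
    return any(phrase in text for phrase in PHRASES)


def monthly_counts(events: Iterable[Event]) -> Counter:
    """Count matching issues per 'YYYY-MM' bucket, like GROUP BY month."""
    counts: Counter = Counter()
    for created_at, title, body in events:
        if matches_tight(title, body):
            counts[created_at.strftime("%Y-%m")] += 1
    return counts


sample = [
    (datetime(2025, 9, 3), "Remove em-dashes from generated docs", None),
    (datetime(2025, 9, 17), "Parser chokes on Em Dash", "see title"),
    (datetime(2025, 10, 1), "Typo \u2014 fix readme", "contains the character only"),
]
for month, n in sorted(monthly_counts(sample).items()):
    print(month, n)
```

The third sample contains the character but never names it, so it counts toward the broad signal only. That difference is what separates the green line from the blue bars.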