What Actually Earns LLM Citations (And What It Means for How We Build Content Strategy)
A new study reframes the comprehensive content playbook. Here’s how I’m thinking about it.
I’ve been telling clients to stop building pillar pages for a while now. Not because the underlying logic is wrong. After all, one large, authoritative resource covering every angle of a topic sounds like exactly what search engines should reward. It sounds rigorous. It sounds like the kind of investment that compounds over time.
The problem is that it doesn’t perform the way we expect it to. A study published today by AirOps and Kevin Indig gives me the clearest data yet for why, this time specifically in the context of LLM citations.
The study examined what kinds of content ChatGPT cites when it answers user queries, across a range of signals: SERP position, content length, readability, structural elements, heading similarity, and topic depth. A few findings confirmed what I’ve been advising. One caught me off guard.
The LLM citation finding I got wrong
My long-standing assumption about writing for digital audiences: write plainly. Clear, simple, accessible. The kind of prose that doesn’t require effort to read. That instinct has held up across B2B SaaS, e-commerce, and editorial clients alike.
For ChatGPT citation rates, it’s wrong.

Source: AirOps
The study found that college-level readability correlates with higher citation likelihood. Not impenetrable, but substantive. Writing that assumes the reader can follow a real argument. (My instinct transferred cleanly to humans. It doesn’t transfer to LLMs in the same way, and I’m updating how I brief writers because of it.)
I flag this because it has direct implications for content briefs. “Write simply” remains good advice for clarity and accessibility. It may not be the right primary directive for content that needs to perform in AI-assisted search. Those are two different briefs now.
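If you want a rough way to check a draft against that finding before it goes to a writer, here is a minimal sketch. It assumes the Flesch-Kincaid grade level as the readability measure, which is my assumption and not necessarily the metric the study used. The syllable counter is a crude heuristic and the function names are hypothetical, so treat the output as a directional signal and not a score.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, then drop a trailing
    silent 'e' (but not an '-le' ending). Heuristic only, not a dictionary lookup."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


plain = "The cat sat on the mat. The dog ran."
dense = (
    "Exhaustive coverage signals generalist content that addresses "
    "numerous topics without demonstrating substantive expertise."
)
print(round(flesch_kincaid_grade(plain), 1))
print(round(flesch_kincaid_grade(dense), 1))
```

A grade of roughly 13 or higher is conventionally read as college level. I would use it to compare drafts against each other, not to chase a specific number.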
The finding that validates what I’ve been arguing
The most strategically significant result is about content depth. LLM citation rates peaked for pages covering 26–50% of the fan-out queries around a topic, meaning the related sub-queries a model generates when it expands a single prompt.
Not 80%, not full coverage. Pages that tried to address everything significantly underperformed.

Source: AirOps
The study’s explanation: exhaustive coverage signals generalist content that addresses too many topics without depth. Moderate coverage, paired with strong relevance to the primary query, signals focused expertise.
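To make the coverage number concrete, here is a small sketch of the arithmetic. It assumes you already have a list of fan-out queries for a topic and have hand-labeled which ones a page actually answers. How the study matched pages to queries is not described here, so the labeling step and the function names are hypothetical.

```python
def fan_out_coverage(answered: set[str], fan_out_queries: set[str]) -> float:
    """Share of a topic's fan-out queries that a page answers (0.0 to 1.0)."""
    if not fan_out_queries:
        raise ValueError("fan_out_queries must not be empty")
    return len(answered & fan_out_queries) / len(fan_out_queries)


def in_peak_band(coverage: float) -> bool:
    """True when coverage falls in the 26-50% range the study reports as the peak."""
    return 0.26 <= coverage <= 0.50


# Illustrative data only: eight fan-out queries for one topic.
queries = {f"q{i}" for i in range(1, 9)}

focused_page = {"q1", "q2", "q3"}            # answers 3 of 8
pillar_page = {f"q{i}" for i in range(1, 8)}  # answers 7 of 8

for name, answered in [("focused", focused_page), ("pillar", pillar_page)]:
    coverage = fan_out_coverage(answered, queries)
    print(name, f"{coverage:.0%}", in_peak_band(coverage))
```

In this made-up example the focused page lands at 38% and sits inside the band, while the pillar page lands at 88% and falls outside it.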
This is the argument I’ve been making about pillar pages in client conversations for the past couple of years. Now it shows up in citation data. A page designed to be the definitive resource on a broad topic reads, to both humans and language models, as a page without a clear point of view. It surfaces often. It doesn’t win.
What this means for content architecture
The hub and spoke model holds up. But this study shifts where the strategic investment belongs.
The hub (the broad, comprehensive content anchor) is the part most at risk. The spokes (focused pieces that each own a narrow slice of the topic) are the ones ChatGPT is more likely to reach for. The implication for resource allocation is direct: build fewer hubs and more focused, well-executed spokes. Each spoke should own its corner of the query landscape and answer one thing definitively.
A few other structural findings worth flagging for strategy and technical teams:
- SERP position still drives citation rates significantly: rank zero is four times more likely to be cited than rank 10. Domain authority and backlinks, however, don’t appear to influence citation rates on their own.
- Schema markup does improve citation likelihood.
- NLP-style question headings help, but citation rates peak at around two subheading matches, and more than that may indicate diluted content focus.
- Length matters too: 500–1,999 words is the citation sweet spot, and pages over 5,000 words underperform pages under 500.

Source: AirOps
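For technical teams, the structural findings above translate into a simple page-level checklist. This is a sketch, not a scoring model: the thresholds come straight from the figures quoted in this post, while the `Page` fields and the `audit_flags` function are hypothetical names, and it assumes you can already export word count, matched question headings, and schema presence from your own crawl.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    word_count: int
    heading_matches: int  # subheadings that match a fan-out question
    has_schema: bool


def audit_flags(page: Page) -> list[str]:
    """Return the structural findings a page falls outside of."""
    flags = []
    if not 500 <= page.word_count <= 1999:
        flags.append("word count outside 500-1,999")
    if page.heading_matches > 2:
        flags.append("more than two subheading matches")
    if not page.has_schema:
        flags.append("no schema markup")
    return flags


# Illustrative pages only; the URLs are placeholders.
pages = [
    Page("/guide/everything", word_count=6200, heading_matches=5, has_schema=False),
    Page("/guide/one-question", word_count=1200, heading_matches=2, has_schema=True),
]

for page in pages:
    print(page.url, audit_flags(page) or "no flags")
```

A flag is a prompt to look at the page, not a verdict. SERP position is left out on purpose because it is an outcome to track, not something you set in a brief.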
One honest caveat: this data reflects ChatGPT’s behavior across a broad sample. Verticals with strong recency signals, like news, finance, and legal, may see the page-age and depth findings behave differently. Apply the spoke-emphasis recommendation with that in mind.
The strategic assumption this changes
The comprehensive content playbook, “cover everything, build authority, earn links, rank broadly,” made sense under the logic that more surface area meant more visibility. The study describes this approach plainly: broad, authoritative resources that get surfaced often but don’t reliably win.
They show up. They just don’t get cited.
The pages that win are narrowly focused resources that surface for a small number of queries and win every time they appear. That’s a different brief, a different structure, and a different definition of quality than the one most content teams have worked from.
Quality still wins. The metrics just changed.
What would it look like to audit your existing content library against that standard: not for coverage, but for the ability to definitively answer one specific thing?

