AI Visibility Benchmark Framework

This is the method behind the AudFlo AI Visibility Benchmark, not the report. It defines how scans are selected, what each signal means, how we read the numbers, and the point at which results are worth publishing.

The benchmark aggregates real AudFlo scans to describe the state of AI visibility across many sites. It is distinct from the AI Visibility Score methodology, which scores one site at a time. This page exists so the benchmark can be trusted: every rule below is fixed in advance, and no figure is published until it is computed from real, completed scans.

How scans are selected

The sample is built from real scans, with four rules that keep it honest.

Completed scans only
Only scans with a status of complete are counted. In-progress, failed, and insufficient scans are excluded.
Latest scan per domain
If a domain was scanned more than once, only its most recent completed scan is counted, so no single site is double-counted.
Failed scans excluded
A scan that could not reach or render a site is not data, so it is left out entirely.
Deduplicated by domain
Domains are lowercased and the www prefix is stripped before counting, so one site is one data point.
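
The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not AudFlo's actual pipeline: the Scan record, its field names, and the status string "complete" are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Scan:
    # Hypothetical record shape; the real scan schema is not described here.
    domain: str
    status: str
    finished_at: datetime


def normalize_domain(domain: str) -> str:
    """Lowercase the domain and strip a leading www prefix."""
    d = domain.strip().lower()
    if d.startswith("www."):
        d = d[4:]
    return d


def select_sample(scans):
    """Keep the most recent completed scan for each normalized domain."""
    latest = {}
    for scan in scans:
        # In-progress, failed, and insufficient scans are all skipped.
        if scan.status != "complete":
            continue
        key = normalize_domain(scan.domain)
        current = latest.get(key)
        if current is None or scan.finished_at > current.finished_at:
            latest[key] = scan
    return latest
```

With this sketch, a scan of WWW.Example.com and a later scan of example.com collapse into one data point, and the later scan is the one kept.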

When a benchmark is published, the exact date window and sample size are disclosed alongside it.

Definitions: what counts as what

A benchmark is only as trustworthy as its definitions. Each signal below is counted only when the underlying condition is actually detected on the crawled site. Nothing is assumed.

Founder page

A page that names a real person behind the product, with a role and ideally external links such as LinkedIn or X. An anonymous "About us" with no named person does not count, because an AI cannot verify it.

Category clarity

Whether the site states its product category and audience in plain words where an AI reads first: the H1, the title tag, the meta description, and the opening copy. A clear category line counts. A vague slogan does not.

FAQ schema

FAQPage structured data in JSON-LD, with question and answer pairs an AI can extract. A visually styled FAQ with no markup does not count as FAQ schema.
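
As a rough illustration of this definition, the check below looks for a JSON-LD block whose @type is FAQPage and that holds at least one question with an answer. It is a sketch using only the Python standard library. The function name has_faq_schema and the exact acceptance rule (a mainEntity item with a name and an acceptedAnswer text) are assumptions, not the detector AudFlo runs.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _nodes(obj):
    """Yield every dict nested anywhere inside parsed JSON."""
    if isinstance(obj, dict):
        yield obj
        for value in obj.values():
            yield from _nodes(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from _nodes(item)


def has_faq_schema(html: str) -> bool:
    """True if the page has FAQPage JSON-LD with an extractable Q&A pair."""
    collector = _JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Malformed markup is not counted.
        for node in _nodes(data):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "FAQPage" not in types:
                continue
            entities = node.get("mainEntity") or []
            if isinstance(entities, dict):
                entities = [entities]
            for question in entities:
                if not isinstance(question, dict):
                    continue
                answer = question.get("acceptedAnswer")
                if question.get("name") and isinstance(answer, dict) and answer.get("text"):
                    return True
    return False
```

A page with a styled FAQ section but no JSON-LD returns False here, which matches the rule that visual FAQs without markup do not count.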

Testimonial

A quote attributed to a named person, ideally with a company and a link. Named testimonials count. Anonymous "happy customer" quotes do not, because an AI cannot confirm the source.

Comparison page

A page that compares the product against named alternatives, structured so an AI can extract the differences. Generic "why us" copy with no named alternative does not count.

llms.txt

A plain-text file at the site root (/llms.txt) that states what the product is and points to key pages, written for AI crawlers. Counted as present when /llms.txt is reachable.
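
One way to read "reachable" is sketched below. Treating only an HTTP 200 response as reachable, using HTTPS, and the helper names llms_txt_url, http_status, and llms_txt_present are all assumptions for the example. The page above does not specify how reachability is tested.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def llms_txt_url(domain: str) -> str:
    """Build the root-level llms.txt URL for a bare domain (HTTPS assumed)."""
    return f"https://{domain}/llms.txt"


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # The server answered, but with an error status.
    except OSError:
        return None  # DNS failure, refused connection, timeout, and so on.


def llms_txt_present(
    domain: str,
    fetch_status: Callable[[str], Optional[int]] = http_status,
) -> bool:
    """Count llms.txt as present only when the file answers with 200 (assumed rule)."""
    return fetch_status(llms_txt_url(domain)) == 200
```

The fetch_status parameter lets the rule be exercised without network access, for example llms_txt_present("example.com", fetch_status=lambda url: 404) evaluates to False.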

The full definitions for the wider vocabulary live in the AI Visibility Glossary.

Confidence rules

Correlation is not causation
The benchmark reports frequencies and distributions. When it notes that sites with a signal tend to score higher, that is a correlation, not proof that the signal caused the score.
No invented findings
Every figure is computed from real completed scans. If a number cannot be computed, it is shown as pending. It is never estimated or filled in.
Minimum sample thresholds
Numbers are not published as a benchmark below a set sample size, because a small sample is noisy and one or two outliers can move it.
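
The "pending, never estimated" rule can be made concrete with a small sketch. The function names and the choice to represent an undetermined signal as None are assumptions for illustration. The point is only that a figure with no underlying data yields "pending" rather than a guessed value.

```python
from typing import Optional, Sequence


def signal_share(flags: Sequence[Optional[bool]]) -> Optional[float]:
    """Share of sites where a signal was detected, or None if nothing is known.

    Each flag is True (detected), False (not detected), or None
    (could not be determined). None entries are left out of the
    denominator; that choice is an assumption of this sketch.
    """
    known = [flag for flag in flags if flag is not None]
    if not known:
        return None
    return sum(known) / len(known)


def display(share: Optional[float]) -> str:
    """Render a share as a whole percentage, or 'pending' when uncomputable."""
    return "pending" if share is None else f"{share:.0%}"
```

For example, display(signal_share([])) gives "pending", while display(signal_share([True, False])) gives "50%".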

Publication thresholds

The name we give the data depends on how many unique domains it covers. The bigger and more stable the sample, the stronger the claim we are willing to make.

Early Scan Insights: under 100 domains

Directional only. Read it as a hint, not a benchmark.

AI Visibility Benchmark Report: 100 to 499 domains

A publishable benchmark with a usable sample.

High Confidence Benchmark: 500 or more domains

A large, stable sample worth citing.
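
The three tiers map directly onto a lookup by unique-domain count. The sketch below uses the tier names and boundaries stated above. The function name publication_label is hypothetical.

```python
def publication_label(unique_domains: int) -> str:
    """Return the name given to a data set of the stated size."""
    if unique_domains < 0:
        raise ValueError("unique_domains cannot be negative")
    if unique_domains >= 500:
        return "High Confidence Benchmark"
    if unique_domains >= 100:
        return "AI Visibility Benchmark Report"
    return "Early Scan Insights"
```

The boundaries are inclusive at the lower end of each tier: 99 domains is still Early Scan Insights, and 100 is the first count that qualifies as a benchmark report.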

Until the sample reaches the first threshold of 100 unique domains and every figure is computed from real scans, no benchmark is published. The framework is fixed now so that when the numbers arrive, they can be trusted on sight.

Related reading: the Evidence Ladder for how evidence is graded, the AI Visibility Playground for how each engine decides, the methodology for how one site is scored, and the glossary for the definitions.