Flintmere · State of Shopify catalogs · v1 · May 2026
100% of 408 Shopify catalogs grade D or F.
what we measured
The state of Shopify catalogs, measured against AI shopping agents.
FlintmereBot scans Shopify stores against the seven checks AI shopping agents run — drawn from published Shopify, GS1 UK, and Google Merchant Center specs. We publish the aggregate numbers (score, grade distribution, per-vertical gaps) and nothing else. No individual store is ever named. This is v1: 408 stores in the cohort, refreshed periodically.
Scores cluster inside a narrow band (47–50). The difference between the median catalog and the top decile is not sophistication; it is how many structured fields are populated.
overall median
/ 100 · grade F
Most Shopify catalogs fail half the checks an AI shopping agent runs before it recommends a store.
Across 408 scanned stores, the median Shopify catalog earns a grade F: strong on visible surfaces (titles, imagery), weak on the structured fields agents depend on (barcodes, attribute metafields, category mapping).
grade distribution
How 408 Shopify stores are spread across grades A to F.
[Chart: share of the 408 stores at each grade, A through F]
by vertical
The gap between verticals is bigger than the gap between good and bad stores inside any one vertical.

Apparel
median · grade F · 140 stores
Size, colour, material, gender — the four fields apparel catalogs most often leave unstructured.

Beauty
median · grade F · 128 stores
Ingredients, shade, volume, claims — beauty agents filter on all four, and most catalogs ship none of them structured.

Food & drink
median · grade F · 131 stores
Allergens, nutrition, provenance, certifications — the regulatory fields food agents depend on to answer any query safely.
methodology
Scanned by FlintmereBot.
Aggregate-only. Refreshed periodically.
FlintmereBot identifies itself as FlintmereBot/1.0 (+audit.flintmere.com/bot) and rate-limits to one request per two seconds per host. Each scan fetches robots.txt, sitemap.xml, llms.txt, products.json, and a small sample of product pages. Scores are computed by the same rule-based engine that powers the public scanner.
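The crawl policy above (a fixed user agent, one request per two seconds per host, a short list of files per store) can be sketched roughly as follows. This is an illustrative sketch, not FlintmereBot's actual code: the `PerHostThrottle` class, the `scan_urls` helper, and the `https://` scheme are assumptions made for the example, and product-page sampling is left out.

```python
import time
from urllib.parse import urlsplit

# User-agent string and interval are taken from the text above.
USER_AGENT = "FlintmereBot/1.0 (+audit.flintmere.com/bot)"
MIN_INTERVAL_S = 2.0  # one request per two seconds per host

# Files fetched per scan, per the text; product-page sampling is omitted.
SCAN_PATHS = ["/robots.txt", "/sitemap.xml", "/llms.txt", "/products.json"]


class PerHostThrottle:
    """Hypothetical helper: tracks when each host may next be requested."""

    def __init__(self, min_interval=MIN_INTERVAL_S,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the logic can be tested
        self.sleep = sleep
        self._next_allowed = {}  # host -> earliest time of next request

    def wait(self, url):
        """Block until the URL's host is due; return the seconds waited."""
        host = urlsplit(url).netloc.lower()
        now = self.clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay:
            self.sleep(delay)
        # Reserve the next slot for this host only; other hosts are unaffected.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay


def scan_urls(store_domain):
    """Build the fixed list of URLs for one store (https is assumed)."""
    return [f"https://{store_domain}{path}" for path in SCAN_PATHS]
```

A caller would invoke `throttle.wait(url)` before each request and send `USER_AGENT` in the request headers. Keying the delay by host means a slow schedule for one store never holds up requests to another.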
We publish medians, means, and grade distributions. We never publish the domain of any individual store. Merchants who want to be excluded can add FlintmereBot to their robots.txt and the next scan will honour it. The underlying dataset is never shared or sold.
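As a rough illustration of the opt-out, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser` and checks whether FlintmereBot's user agent may fetch a URL. The rules in `ROBOTS_TXT`, the `is_allowed` helper, and the `shop.example` domain are assumptions for the example; how FlintmereBot itself matches the `User-agent` line is not stated in the text.

```python
from urllib.robotparser import RobotFileParser

# Example rules a merchant might add (an assumption, not an official snippet):
# block FlintmereBot everywhere, leave other crawlers unaffected.
ROBOTS_TXT = """\
User-agent: FlintmereBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    bot = "FlintmereBot/1.0 (+audit.flintmere.com/bot)"
    print(is_allowed(ROBOTS_TXT, bot, "https://shop.example/products.json"))
```

The standard-library parser matches on the product token before the slash, so the full user-agent string is treated as `FlintmereBot` here.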
a note on what we could reach
The v1 cohort is the stores FlintmereBot could read politely. A meaningful share of the Shopify market — mostly the larger catalogs sitting behind enterprise bot-management — returns a block before any product page loads. Those same blocks apply to ChatGPT, Perplexity, and every other AI shopping agent that comes knocking. So if a store isn’t in this sample, the agent reading its catalog today is getting the same answer: nothing. That’s the gap this research measures from both sides.
the next edition
Run a free scan. Your score feeds into the next refresh.
Scans initiated by store owners are tagged separately from FlintmereBot crawls and contribute to next month’s aggregates. You keep your report; we keep the anonymised score. The more stores in the dataset, the tighter the benchmark becomes for everyone.