AI Code and EAA Accessibility — The AIMAC Findings

AI coding tools have become a standard part of front-end development. GitHub Copilot, Claude, ChatGPT, Gemini — these tools generate significant portions of the HTML that ends up in production. The assumption among most teams is that the output is broadly correct. For accessibility, that assumption needs examining.

The 2026 WebAIM Million report — the largest annual study of web accessibility, covering one million home pages — found that errors per page jumped 10.1% this year to an average of 56.1 errors per page, reversing six years of gradual improvement. Page complexity increased 22.5% in the same period. 95.9% of pages now fail basic accessibility checks. WebAIM linked the reversal directly to increased reliance on AI-assisted coding practices. The AIMAC benchmark exists to understand why.

The AIMAC (AI Model Accessibility Checker) benchmark is a systematic evaluation of 36 AI coding models, developed by the GAAD Foundation in partnership with ServiceNow. It tests what each model produces by default — without explicit accessibility prompting — when asked to generate HTML. The benchmark measures whether the output meets accessibility criteria including WCAG conformance. It is an expert evaluation of model defaults, not a replacement for a full accessibility audit or user testing with disabled people.

The headline finding: most models produce non-compliant HTML by default.

What AIMAC measures and what it does not

The test is specifically about default output. What does the model produce when a developer asks it to write a form, a navigation component, a modal, a data table — without saying “make it accessible”? The benchmark captures the baseline behaviour of the tool as most developers use it most of the time.

AIMAC does not measure overall code quality. A model that scores poorly on the AIMAC accessibility benchmark may produce excellent code in other respects. The benchmark is narrow and specific: accessibility output by default. That narrowness is what makes it useful. It isolates exactly the question that matters for EAA compliance: what does AI-generated code look like before anyone has thought about accessibility?

These findings are reinforced by research presented at Microsoft Build in June 2026 by Aaron Gustafson (Microsoft), which found that out of the box, most AI coding agents pass only around 8–25% of automatable accessibility checks. Even with instruction files, the pass rate climbs only to 37–60%. It is not until models are given deterministic accessibility tests to run against their own output that results improve significantly — and even then, automated checks only cover around half of what genuine accessibility requires. The AIMAC benchmark and the Microsoft Build research are measuring different things, but they point to the same conclusion.
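
To make "deterministic accessibility tests" concrete, here is a minimal sketch of the kind of check that could be run against generated HTML. It is an illustration only, not the AIMAC methodology or the tooling used in the Microsoft Build research: the rule names, the `check_html` function, and the two sample snippets are all hypothetical, and it covers just two machine-checkable rules (images without an `alt` attribute, form controls without an accessible name). Real test suites cover far more.

```python
from html.parser import HTMLParser

# Input types that do not need a visible text label for this sketch.
UNLABELLED_OK = {"hidden", "submit", "button", "reset", "image"}


class BasicA11yChecker(HTMLParser):
    """Collects two machine-checkable problems from an HTML string."""

    def __init__(self):
        super().__init__()
        self.violations = []
        self.label_targets = set()  # ids referenced by <label for="...">
        self.controls = []          # (id, already_named) per form control
        self.open_labels = 0        # depth of enclosing <label> elements

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and "alt" not in a:
            self.violations.append("img-missing-alt")
        elif tag == "label":
            self.open_labels += 1
            if a.get("for"):
                self.label_targets.add(a["for"])
        elif tag in ("input", "select", "textarea"):
            if a.get("type") in UNLABELLED_OK:
                return
            named = bool(a.get("aria-label") or a.get("aria-labelledby"))
            named = named or self.open_labels > 0  # wrapped in a <label>
            self.controls.append((a.get("id"), named))

    def handle_endtag(self, tag):
        if tag == "label" and self.open_labels:
            self.open_labels -= 1


def check_html(html):
    """Return a list of rule names violated by the given HTML."""
    checker = BasicA11yChecker()
    checker.feed(html)
    checker.close()
    violations = list(checker.violations)
    # Resolve <label for> only after parsing, since a label may follow its control.
    for control_id, named in checker.controls:
        if not named and control_id not in checker.label_targets:
            violations.append("control-missing-label")
    return violations


# Hypothetical samples: typical unprompted output versus a labelled version.
DEFAULT_STYLE = '<form><img src="logo.png"><input type="text" placeholder="Email"></form>'
ACCESSIBLE = (
    '<form><img src="logo.png" alt="Acme">'
    '<label for="email">Email</label><input id="email" type="email"></form>'
)

print(check_html(DEFAULT_STYLE))  # ['img-missing-alt', 'control-missing-label']
print(check_html(ACCESSIBLE))     # []
```

A check like this gives a model or a build pipeline a pass/fail signal it cannot argue with, which is the property the research points to. It still says nothing about whether the alt text is meaningful or the form is usable with a screen reader, which is why automated checks cover only part of the picture.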

The current rankings (updated June 2026) put OpenAI at the top: GPT 5.4 Mini ranks first, with a median accessibility debt score of zero across all 28 categories, and OpenAI holds all five top positions. At the other end of the 36-model leaderboard, the worst performers produce significantly more accessibility violations by default. The full rankings are published at aimac.ai and updated as models change, so it is worth checking where the tools your team uses sit.

What the data shows

Across the 36 models tested, the majority produce HTML that fails accessibility requirements without specific prompting. The range is wide — some models perform significantly better than others — but the central finding holds: default output from most AI coding tools is not EAA-compliant.

The default problem: a developer who does not know to prompt for accessibility gets non-compliant code by default. Most developers are not accessibility specialists. Most AI coding prompts do not include accessibility requirements. The result is a systematic gap between what is generated and what EAA compliance requires — at scale, across every team using AI-assisted development.

Why this matters for EAA compliance

The EAA does not distinguish products by how their code was written. An organisation is legally responsible for the accessibility of its digital products regardless of whether those products were built by hand, generated by AI, or assembled from third-party components. “Our AI wrote it” is not a compliance defence. The obligation attaches to the output, not the process.

This has direct implications for organisations using AI-assisted development in e-commerce, financial services, and any other sector covered by the EAA. If AI tools are generating front-end code and accessibility is not being reviewed before that code goes into production, non-compliant features are being shipped continuously. Each release is a potential regression.

The enforcement consequences are real. Regulators are not currently asking whether AI was involved in building a product. They are asking whether the product is accessible. The mechanism of non-compliance is irrelevant to the penalty.

What prompting can do and what it cannot

Prompting AI tools to produce accessible output improves results. Asking for ARIA labels, semantic HTML, sufficient colour contrast, keyboard navigability — these prompts produce better output than the default. The AIMAC data supports this: models respond to accessibility-specific prompting.

But prompting is not sufficient on its own for two reasons. First, it requires developers to know what to ask for — which requires accessibility knowledge that most developers do not have. Second, even accessibility-prompted output needs review. AI tools can produce output that appears accessible but fails in specific contexts, for specific assistive technologies, or under conditions the prompt did not anticipate.

Prompting is a useful first step. It is not the same as a compliance process.

Research published in June 2026 by the University of Southampton, drawing on findings from 48 accessibility leaders, researchers and practitioners, found that verification demands from AI-generated content can double rather than reduce accessibility workloads. The same research found that the assumption that automation improves accessibility is not supported by evidence, and that ableist bias in AI is structural, rooted in training data, and resistant to a technical fix. These findings reinforce why prompting is a starting point, not a solution: the tools can be guided toward better output, but the underlying defaults reflect training choices that accessibility prompts do not fundamentally alter.

The governance argument

This is why governance matters beyond the one-off audit. A product that passes an accessibility check today can regress with the next feature release if no accessibility review is built into the development process. AI-assisted development raises that regression risk: more code is being generated more quickly, and if accessibility is not part of the review process, non-compliant features accumulate between audits.

The organisations that will maintain EAA compliance over time are the ones that treat accessibility as part of their release process, not as a periodic exercise. That means accessibility criteria in acceptance testing, a named owner who reviews AI-generated components, and a documented process that survives team changes. Not because the regulator will ask for it — though they will — but because it is the only way the work holds.
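
One way to put accessibility criteria into acceptance testing is a release gate that fails whenever a component has more violations than its agreed baseline. The sketch below assumes some checker has already produced a violation count per component. The function names, component names and counts are hypothetical, and the baseline-comparison approach is one option among several, not a prescribed process.

```python
def find_regressions(baseline, current):
    """Return {component: (allowed, found)} where violations exceed the baseline.

    Components missing from the baseline are treated as allowing zero
    violations, so newly generated components cannot ship with known issues.
    """
    regressions = {}
    for component, found in current.items():
        allowed = baseline.get(component, 0)
        if found > allowed:
            regressions[component] = (allowed, found)
    return regressions


def gate(baseline, current):
    """Print any regressions and return a process exit code (0 = pass, 1 = fail)."""
    regressions = find_regressions(baseline, current)
    for component, (allowed, found) in sorted(regressions.items()):
        print(f"FAIL {component}: {found} violations, baseline allows {allowed}")
    return 1 if regressions else 0


# Hypothetical counts: the last audited state versus the current release candidate.
baseline = {"checkout-form": 0, "nav": 2}
current = {"checkout-form": 1, "nav": 2, "promo-modal": 3}

exit_code = gate(baseline, current)
# FAIL checkout-form: 1 violations, baseline allows 0
# FAIL promo-modal: 3 violations, baseline allows 0
```

Keeping the baseline in version control gives the named owner a documented record of what was accepted and when, and any increase has to be approved as an explicit change rather than slipping through between audits.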

The practical question: does your development process include an accessibility review before AI-generated front-end code goes into production? If not, the AIMAC data suggests that non-compliant features are being shipped in every release cycle. That is the gap a governance process closes.

Find out where your product stands

Our free initial assessment covers your current accessibility position, including whether AI-assisted development is creating ongoing compliance risk and what a proportionate governance process looks like for your team.

Book your free assessment
