Software testing


Test planning & management

The process of planning, estimating, monitoring, and controlling test activities, documented in a (risk‑based) test plan, strategy or policy, to achieve defined quality objectives within the project’s constraints of scope, time, and resources.

Maturity Levels

Level 0: Non-Existent
Description: There is no AI assistance, automation, or data integration of any kind. Corporate guidelines or governance for AI-enabled workflows and tool usage are absent. Test policy, strategy and plan creation, estimation, progress tracking, and reporting are fully manual; no AI evaluates readability, maintainability, or explainability of artefacts.
Technology: None
Example tools: None

Level 1: One-Off Assist
Description: Test managers/coordinators occasionally ask an LLM for draft strategy text, workload estimates, or risk heat-maps and paste the results into documents. Nothing is version-controlled; results vary widely between individuals and are difficult to reproduce or scale. (A sketch of this kind of one-off assist follows below.)
Technology: Natural-language draft generation with off-the-shelf LLMs and prompt engineering
Example tools:
  • LLM chatbots / answer engines
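
The sketch below shows such a one-off request: a single ad-hoc prompt for a draft test strategy outline that is then pasted into a document by hand. It is a minimal sketch, assuming the OpenAI Python SDK and an API key in the environment; the model name, prompt wording, and project details are illustrative placeholders, not part of the maturity model.

  # One-off assist: ask an LLM for a draft test strategy outline.
  # Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
  # model name and prompt are illustrative placeholders.
  from openai import OpenAI

  client = OpenAI()
  prompt = (
      "Draft a one-page, risk-based test strategy outline for a payment API release. "
      "Cover scope, prioritisation, entry/exit criteria, and a rough effort estimate."
  )
  response = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": prompt}],
  )
  print(response.choices[0].message.content)  # pasted into the plan by hand at Level 1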

Level 2: Integrated Assist
Description: AI is embedded in the QA tooling and process, providing suggestions for test-policy clauses, strategy sections, resource/timeline forecasts, and risk heat-maps. Artefacts are version-controlled alongside project deliverables, and the AI flags readability, maintainability, and explainability issues.
Technology: AI Agents / Autonomous Agents + LLMOps (prompt / template management, deployment, guardrails)

Level 3: AI-Human Collaboration
Description: AI agents act as junior test managers, digesting code, requirements, and trends to suggest strategies, scope, and team updates. Every recommendation is traceable, explainable, and subject to human review. (See the orchestration sketch after this list.)
Technology:
  • Agentic frameworks
  • Advanced RAG
  • Orchestration
  • LLMOps (prompt / template management, deployment, guardrails)
  • Deep research
Example tools:
  • Agentic frameworks
    • LangGraph as orchestrator, vector DBs as knowledge, prompts and a model for interaction
  • Deep research
    • Deep research on ChatGPT/Gemini/Claude using a reasoning model, Perplexity Labs, Manus, …
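
To make the agentic-framework entry concrete, the sketch below wires a two-step draft-and-review flow with LangGraph, the orchestrator named in the example tools. It is a minimal sketch under stated assumptions: the langgraph package is installed, its StateGraph API matches this form (the API changes between versions), and the node functions are placeholders where real LLM calls and vector-DB retrieval would sit.

  # Minimal LangGraph sketch: a draft -> review flow for a test strategy.
  # Node bodies are placeholders for real LLM calls and vector-DB retrieval.
  from typing import TypedDict
  from langgraph.graph import StateGraph, START, END

  class PlanState(TypedDict):
      requirements: str
      draft_strategy: str
      review_notes: str

  def draft_strategy(state: PlanState) -> dict:
      # Placeholder: an LLM call that drafts a strategy from the requirements.
      return {"draft_strategy": f"Risk-based strategy for: {state['requirements']}"}

  def review_strategy(state: PlanState) -> dict:
      # Placeholder: a reviewer agent (or human gate) that annotates the draft.
      return {"review_notes": "Verify risk coverage, entry/exit criteria, and effort estimate."}

  graph = StateGraph(PlanState)
  graph.add_node("draft", draft_strategy)
  graph.add_node("review", review_strategy)
  graph.add_edge(START, "draft")
  graph.add_edge("draft", "review")
  graph.add_edge("review", END)

  app = graph.compile()
  result = app.invoke({"requirements": "Payment API v2", "draft_strategy": "", "review_notes": ""})
  print(result["draft_strategy"], "|", result["review_notes"])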

Level 4: Full Autonomy
Description: Autonomous agents/AI systems create and update test policy, strategies, and plans from live data. Projects are managed dynamically, with the AI handling scope, milestones, and KPIs. Human involvement is confined to strategic governance; on demand, the autonomous AI must supply a transparent, traceable explanation of its actions, input data, and decision rationale.
Technology: Autonomous agents, causal-inference models, continual learning, LLMOps pipelines
Example tools: End-to-end QA orchestrators (no tools known to operate at this level)
  AI Maturity Level: Indicates the level the technology vendors claim to have reached in deploying AI solutions that actually work in real-world applications

Test analysis & design

The process of analysing the test basis and transforming it into test conditions, test cases, and test data using appropriate test design techniques to achieve required coverage and mitigate quality risks.
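
As a concrete example of applying a test design technique to the test basis, the sketch below derives boundary values for a numeric input and pairs each value with its expected outcome. It is an illustrative Python sketch only; the field name and range are invented for the example and do not come from the source.

  # Illustrative boundary value analysis for a numeric field.
  # The field ("loan_amount", valid range 1..50000) is an invented example.
  def boundary_values(minimum: int, maximum: int) -> list[int]:
      # Two-value BVA: each boundary plus its invalid neighbour.
      return [minimum - 1, minimum, maximum, maximum + 1]

  for value in boundary_values(1, 50_000):
      expected = "accepted" if 1 <= value <= 50_000 else "rejected"
      print(f"loan_amount={value} -> expected {expected}")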

Maturity Levels

Level 0: Non-Existent
Description: All analysis and design tasks are fully manual. No AI, automation, or review for quality attributes like readability or explainability.
Technology: None
Example tools: None

Level 1: One-Off Assist
Description: Test engineers use LLMs ad hoc to draft test cases or choose techniques. Prompts vary by user, with no standards, reuse, or traceability. Results are inconsistent and unscalable.
Technology: Off-the-shelf LLM & prompt engineering


Level 2: Integrated Assist
Description: AI assists with static oracle checks, structured test case generation, and artefact review for quality. Work follows prompt standards and a feedback loop, giving nearly full task support and near end-to-end coverage with minimal manual effort. The AI system can review human-created test artefacts for correctness, completeness, readability, maintainability, and explainability, flagging gaps or duplicates before peer review (a simple duplicate-flagging sketch follows this list).
Technology:
  • AI Agents / Autonomous Agents
  • LLMOps (prompt / template management, deployment, guardrails)
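
The duplicate flagging mentioned above can be approximated very simply: compare test case titles and flag near-matches for review. The sketch below uses only Python's standard library as a deliberately simplistic stand-in for what integrated AI review would do (real tooling would compare steps and use semantic embeddings); the titles and threshold are invented for the example.

  # Naive duplicate flagging over test case titles, standard library only.
  # Titles and threshold are invented; real tooling would use semantic embeddings.
  from difflib import SequenceMatcher
  from itertools import combinations

  test_cases = [
      "Login with valid credentials",
      "Login using valid credentials",
      "Reset password with expired token",
      "Login with invalid password",
  ]

  THRESHOLD = 0.85  # similarity ratio above which two cases are flagged
  for first, second in combinations(test_cases, 2):
      ratio = SequenceMatcher(None, first.lower(), second.lower()).ratio()
      if ratio >= THRESHOLD:
          print(f"Possible duplicate ({ratio:.2f}): '{first}' <-> '{second}'")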

Level 3: AI-Human Collaboration
Description: AI acts as a junior test analyst: analysing multimodal input and past defects to refine test oracles, recommend techniques, and generate test artefacts, including a transparent explanation of its reasoning, while a human overseer guides and refines its output.
Technology:
  • Agentic frameworks
  • Advanced RAG
  • Orchestration
  • LLMOps (prompt / template management, deployment, guardrails)
Example tools:
  • Agentic frameworks
  • Code & UI embeddings

Level 4: Full Autonomy
Description: Autonomous AI designs and maintains test suites, detects oracle issues, and regenerates impacted assets when inputs change. Human involvement is confined to strategic governance, and all actions are explainable.
Technology: Autonomous agents, model-based testing, continual learning pipelines
Example tools: End-to-end test-design orchestrators (no tools known to operate at this level)

*** Although many tools claim to operate at AI Maturity Level 3, these claims are often exaggerated: the tools typically require significant manual effort, lack true context awareness, and rely heavily on marketing buzzwords such as “self-healing tests,” “autonomous agents,” “AI-driven quality,” “zero-touch automation,” “intelligent test orchestration,” and “continuous risk-based optimization.” In practice, most of these tools operate more like Level 2: they assist people but do not genuinely work alongside them, and they still need detailed prompts, human guidance, and corrections to produce good results. That said, some tools are starting to explore real Level 3 features, and early versions show potential. Progress is slow but steady, with better context awareness and more independence pushing things forward.

  AI Maturity Level: Indicates the level the technology vendors claim to have reached in deploying AI solutions that actually work in real-world applications

Test implementation, automation & test data generation

The activity of finalising testware by developing, maintaining, and automating executable test scripts, harnesses, and representative test data to enable efficient, repeatable, and scalable test execution.

Maturity Levels

Level 0: Non-Existent
Description: All test scripts and data are created and maintained manually; no AI assistance is used.
Technology: None
Example tools: None

Level 1: One-Off Assist
Description: Engineers prompt an LLM to generate a skeleton script, a SQL dataset, or a simple page object and then refine it manually. There are no corporate guidelines, shared prompt libraries, or optimisation practices, so results vary widely between individuals and are difficult to reproduce or scale. (A sketch of such a hand-refined page object follows this list.)
Technology:
  • Off-the-shelf LLM & prompt engineering
  • Code completers in IDEs
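
To illustrate what such an LLM-drafted, hand-refined page object typically looks like, the sketch below uses Playwright's Python sync API. It is a minimal sketch, assuming Playwright is installed; the URL, selectors, and class name are invented for the example and would need adjusting to a real application.

  # Skeleton page object of the kind an LLM might draft at Level 1 for manual refinement.
  # Assumes Playwright's Python sync API; URL, selectors, and names are invented examples.
  from playwright.sync_api import Page

  class LoginPage:
      URL = "https://example.test/login"  # placeholder URL

      def __init__(self, page: Page) -> None:
          self.page = page

      def open(self) -> None:
          self.page.goto(self.URL)

      def login(self, username: str, password: str) -> None:
          self.page.fill("#username", username)
          self.page.fill("#password", password)
          self.page.click("button[type='submit']")

      def error_message(self) -> str:
          return self.page.inner_text(".error-banner")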

Level 2: Integrated Assist
Description: AI in IDEs/frameworks generates maintainable code and test data and converts test cases into scripts based on human input. Integrated prompt standards and feedback loops ensure consistent, scalable results. (A minimal test-data synthesis sketch follows this list.)
Technology:
  • LLM + code embeddings, test-data synthesis libraries
  • MCP
  • AI Agents / Autonomous Agents
  • LLMOps (prompt / template management, deployment, guardrails)
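
As an example of test-data synthesis at this level, the sketch below generates representative but fictitious customer records with the Faker library. It is a minimal sketch, assuming Faker is installed; the field names and record count are illustrative, and real privacy-masked data generation would be driven by the actual schema and data-protection rules.

  # Minimal synthetic test data generation with Faker; schema and count are illustrative.
  # Real tooling would derive the schema from the system under test.
  from faker import Faker

  fake = Faker()
  Faker.seed(42)  # reproducible data sets for repeatable test runs

  customers = [
      {
          "name": fake.name(),
          "email": fake.email(),
          "iban": fake.iban(),
          "signup_date": fake.date_between(start_date="-2y", end_date="today").isoformat(),
      }
      for _ in range(5)
  ]

  for row in customers:
      print(row)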

Level 3: AI-Human Collaboration
Description: AI agent(s)/system(s) act as an entry-level test automator. The AI system is fully context-aware of the project: it implements tests for new requirements, refactors and optimises the automation suite, synthesises sophisticated test data (synthetic or privacy-masked), and flags redundant scripts, always with human experts supervising and validating its output.
Technology:
  • Agentic frameworks
  • Advanced RAG
  • Orchestration
  • LLMOps (prompt / template management, deployment, guardrails)
  • MCP
Example tools:
  • Agentic frameworks
    • LangGraph as orchestrator, vector DBs as knowledge, prompts and a model for interaction
  • IDEs with AI capabilities
  • CLI code assistants
  • MCPs
  • Off-the-shelf tools***
    • Cypress cy.prompt, coTestPilot for testers, coTestPilot for developers, TestZeus-Hercules, Magic Inspector, Wopee, Katalon, Applitools, UIPath, Testers.ai, …

Level 4: Full Autonomy
Description: Autonomous AI maintains scripts and data, migrates frameworks, manages test infrastructure, generates mocks/stubs, and runs test sets unsupervised. Human involvement is confined to strategic governance; on demand, the autonomous AI must supply a transparent, traceable explanation of its actions, input data, and decision rationale.
Technology: Autonomous agents, self-healing AI, continual learning pipelines
Example tools:
  • End-to-end automation orchestrators (no tools known to operate at this level)

*** Although many tools claim to operate at AI Maturity Level 3, these claims are often exaggerated: the tools typically require significant manual effort, lack true context awareness, and rely heavily on marketing buzzwords such as “self-healing tests,” “autonomous agents,” “AI-driven quality,” “zero-touch automation,” “intelligent test orchestration,” and “continuous risk-based optimization.” In practice, most of these tools operate more like Level 2: they assist people but do not genuinely work alongside them, and they still need detailed prompts, human guidance, and corrections to produce good results. That said, some tools are starting to explore real Level 3 features, and early versions show potential. Progress is slow but steady, with better context awareness and more independence pushing things forward.

  AI Maturity Level: Indicates the level the technology vendors claim to have reached in deploying AI solutions that actually work in real-world applications

Test execution

The activity of running test suites (groups/folders/sets of test cases/scenarios/scripts), comparing actual and expected outcomes, logging incidents, and collecting metrics in the designated environment.

Maturity Levels

Level 0: Non-Existent
Description: All tests are executed manually or through conventional automation with no AI support or AI-enhanced automation; results are logged by hand.
Technology: None
Example tools: None

Level 1: One-Off Assist
Description: Testers occasionally use an LLM to auto-generate a command line or interpret a log snippet to speed up manual execution. Results vary widely between individuals and are difficult to reproduce or scale. (A sketch of such a throwaway helper follows this list.)
Technology:
  • Off-the-shelf LLM & prompt engineering
  • MCPs on off-the-shelf LLMs
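
The sketch below shows the kind of throwaway helper an engineer might ask an LLM to draft at this level: read a JUnit XML report and print a pytest command that re-runs only the failed tests. The report path is a placeholder, and matching failed tests by name via a pytest -k expression is a simplification; both are assumptions for the example.

  # Throwaway helper: build a selective re-run command from a JUnit XML report.
  # The report path is a placeholder; matching failed tests via `pytest -k` by
  # test name is a simplification used for illustration.
  import xml.etree.ElementTree as ET

  report = ET.parse("reports/junit.xml")  # placeholder path
  failed_names = sorted(
      {
          case.get("name")
          for case in report.iter("testcase")
          if case.find("failure") is not None or case.find("error") is not None
      }
  )

  if failed_names:
      print('pytest -k "' + " or ".join(failed_names) + '"')
  else:
      print("No failed tests found in the report.")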

Level 2: Integrated Assist
Description: AI is built into execution frameworks or processes to schedule suites, classify failures in real-time dashboards, and execute tests based on high-level natural-language descriptions.
Technology:
  • LLM + code embeddings, test-data synthesis libraries
  • MCP
  • AI Agents / Autonomous Agents
  • LLMOps (prompt / template management, deployment, guardrails)

Level 3: AI-Human Collaboration
Description: AI agent(s)/system(s) act as an entry-level tester that starts and monitors live runs, predicts remaining duration, suggests selective re-runs, applies self-healing, and surfaces likely root causes for failed steps, while a human overseer guides and refines its output. It can also execute exploratory test flows based on high-level natural-language quality requests, delivering summarised findings for human validation. (A naive selective re-run sketch follows this list.)
Technology:
  • Agentic frameworks
  • Orchestration
  • LLM + code embeddings, test-data synthesis libraries
  • MCP
  • AI Agents / Autonomous Agents
  • LLMOps (prompt / template management, deployment, guardrails)
Example tools:
  • Agentic frameworks
    • LangGraph as orchestrator, vector DBs as knowledge, prompts and a model for interaction
  • IDEs with AI capabilities
  • CLI code assistants
  • MCPs
  • Off-the-shelf tools***
    • coTestPilot for testers, coTestPilot for developers, TestZeus-Hercules, Magic Inspector, Wopee, Katalon, Applitools, UIPath, Testim, testers.ai, …
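
The selective re-run idea can be illustrated with a deliberately naive sketch: map changed source modules to test modules by naming convention and propose only those tests for execution. A real context-aware agent would rely on code embeddings and execution history instead; the file names and the convention below are invented for the example.

  # Naive change-impact test selection by naming convention.
  # File names and the mapping convention are invented; real agents would use
  # code embeddings and execution history.
  from pathlib import Path

  changed_files = ["src/payments/refund.py", "src/accounts/login.py"]  # e.g. from a git diff
  test_dir = Path("tests")

  selected = []
  for changed in changed_files:
      module = Path(changed).stem                # "refund", "login"
      candidate = test_dir / f"test_{module}.py"
      if candidate.exists():
          selected.append(str(candidate))

  if selected:
      print("Suggested re-run: pytest " + " ".join(selected))
  else:
      print("No mapping found; suggest running the full suite.")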

Level 4: Full Autonomy
Description: Autonomous execution agents provision environments, orchestrate parallelisation, self-heal UI/API/… tests, and run canary, chaos, and other unsupervised experiments, continually optimising coverage, cost, and risk without hands-on support. They also execute tests on the items under test and can autonomously detect the need for, and execute, functional and non-functional exploratory test flows from high-level natural-language quality objectives. Human interaction is limited to high-level goal setting and periodic governance reviews, though the system remains available for on-demand unsupervised natural-language test runs. On demand, the autonomous AI must supply a transparent, traceable explanation of its actions, input data, and decision rationale.
Technology: Autonomous agents, reinforcement scheduling, chaos-engineering AI
Example tools:
  • End-to-end execution orchestrators (no tools known to operate at this level)

*** Although many tools claim to operate at AI Maturity Level 3, these claims are often exaggerated: the tools typically require significant manual effort, lack true context awareness, and rely heavily on marketing buzzwords such as “self-healing tests,” “autonomous agents,” “AI-driven quality,” “zero-touch automation,” “intelligent test orchestration,” and “continuous risk-based optimization.” In practice, most of these tools operate more like Level 2: they assist people but do not genuinely work alongside them, and they still need detailed prompts, human guidance, and corrections to produce good results. That said, some tools are starting to explore real Level 3 features, and early versions show potential. Progress is slow but steady, with better context awareness and more independence pushing things forward.

  AI Maturity Level: Indicates the level the technology vendors claim to have reached in deploying AI solutions that actually work in real-world applications

Evaluating exit criteria & reporting

The activity of comparing actual test results and coverage to predefined exit criteria and producing concise, meaningful reports for stakeholders on product quality and residual risk.

Maturity Levels

Level 0: Non-Existent
Description: Exit criteria are evaluated manually. Reports are crafted by hand with no AI support. No quality analysis assistance is present.
Technology: None
Example tools: None

Level 1: One-Off Assist
Description: Test analysts prompt an LLM with natural-language summary generation capabilities to summarise results into prose or create a simple chart for a one-time release note. Prompts are improvised, with no standards or reusability.
Technology:
  • Off-the-shelf LLM & prompt engineering
  • MCPs on off-the-shelf LLMs
  • Code completers in IDEs

Level 2: Integrated Assist
Description: Standardised AI tasks are available, and AI is integrated into tools to assist with KPI aggregation, script documentation, case-to-script mapping, and readiness scoring. Outputs are consistent and versioned. (A minimal readiness-scoring sketch follows this list.)
Technology:
  • LLM + code embeddings, test-data synthesis libraries
  • MCP
  • AI Agents / Autonomous Agents
  • LLMOps (prompt / template management, deployment, guardrails)
  • BI tooling, vector DB
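
In its simplest form, the readiness scoring mentioned here reduces to comparing measured results against predefined exit criteria, which is what the sketch below does. The metric names and threshold values are invented examples, not prescribed figures.

  # Minimal exit-criteria check: compare measured results to predefined thresholds.
  # Metric names and threshold values are invented examples.
  exit_criteria = {
      "pass_rate_min": 0.95,             # at least 95 % of executed tests passed
      "requirement_coverage_min": 0.90,  # at least 90 % of requirements covered
      "open_blockers_max": 0,            # no open blocking defects
  }

  measured = {
      "pass_rate": 0.97,
      "requirement_coverage": 0.88,
      "open_blockers": 1,
  }

  failures = []
  if measured["pass_rate"] < exit_criteria["pass_rate_min"]:
      failures.append("pass rate below threshold")
  if measured["requirement_coverage"] < exit_criteria["requirement_coverage_min"]:
      failures.append("requirement coverage below threshold")
  if measured["open_blockers"] > exit_criteria["open_blockers_max"]:
      failures.append("open blocker defects remain")

  print("Release ready" if not failures else "Not ready: " + "; ".join(failures))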

Level 3: AI-Human Collaboration
Description: AI agent(s)/system(s) act as an entry-level functional reviewer, performing context-aware end-to-end technical or functional reviews of an item under test while a human overseer guides and refines its output. It can also generate stakeholder-specific narrative reports and offer interactive Q&A on QA.
Technology:
  • Agentic dashboards
  • Causal analytics
  • Agentic frameworks
  • Orchestration
  • LLM + code embeddings, test-data synthesis libraries
  • MCP
  • AI Agents / Autonomous Agents
  • LLMOps (prompt / template management, deployment, guardrails)
Example tools:
  • Agentic frameworks
    • LangGraph as orchestrator, vector DBs as knowledge, prompts and a model for interaction
  • IDEs with AI capabilities
  • CLI code assistants
  • Off-the-shelf tools***
    • Katalon, Applitools, UIPath, Testim, testers.ai, …

Level 4: Full Autonomy
Description: An autonomous quality-governance agent continuously evaluates exit criteria and reports via live data, launching extra tests when needed. Human involvement is confined to strategic governance; on demand, the autonomous AI must supply a transparent, traceable explanation of its actions, input data, and decision rationale.
Technology: Autonomous agents, MLOps/CD integration, real-time data streams
Example tools:
  • End-to-end quality governance platforms (no tools known to operate at this level)

*** Although many tools claim to operate at AI Maturity Level 3, these claims are often exaggerated: the tools typically require significant manual effort, lack true context awareness, and rely heavily on marketing buzzwords such as “self-healing tests,” “autonomous agents,” “AI-driven quality,” “zero-touch automation,” “intelligent test orchestration,” and “continuous risk-based optimization.” In practice, most of these tools operate more like Level 2: they assist people but do not genuinely work alongside them, and they still need detailed prompts, human guidance, and corrections to produce good results. That said, some tools are starting to explore real Level 3 features, and early versions show potential. Progress is slow but steady, with better context awareness and more independence pushing things forward.

  AI Maturity Level: Indicates the level the technology vendors claim to have reached in deploying AI solutions that actually work in real-world applications

Test control

The activity, as defined by ISTQB, of comparing actual progress with planned progress, analysing variances, and taking corrective actions to meet test objectives.

AI Maturity Levels

Level 0: Non-Existent
Description: All test control is manual, with no AI or automation. Variances are tracked by hand, with no predictive insights or decision support.
Technology: None
Example tools: None

Level 1: One-Off Assist
Description: Test coordinators sporadically prompt an LLM to estimate remaining effort and surface process gaps or test debt.
Technology:
  • Off-the-shelf LLM & prompt engineering
  • MCPs on off-the-shelf LLMs
  • Code completers in IDEs

Level 2: Integrated Assist
Description: Specifically for this subdomain, AI is embedded in the test process to predict schedule slippage and KPI drift and to recommend minor scope changes or scope re-balancing. Tasks are standardised and outputs are consistent. (A minimal slippage-forecast sketch follows this list.)
Technology:
  • LLM + code embeddings, test-data synthesis libraries
  • MCP
  • AI Agents / Autonomous Agents
  • LLMOps (prompt / template management, deployment, guardrails)
  • Vector DBs, BI tooling
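
Schedule slippage prediction can be illustrated with a simple burn-rate extrapolation: derive a forecast completion date from the observed execution rate and compare it with the planned end date. All figures and dates below are invented; integrated tooling would pull them from the test management system and use richer predictive models.

  # Naive schedule-slippage forecast from the observed test execution burn rate.
  # All figures and dates are invented examples.
  from datetime import date, timedelta

  total_cases = 400
  executed_cases = 220
  start_date = date(2024, 5, 1)
  planned_end = date(2024, 6, 15)
  today = date(2024, 5, 28)

  elapsed_days = (today - start_date).days
  burn_rate = executed_cases / elapsed_days                    # cases executed per day so far
  remaining_days = (total_cases - executed_cases) / burn_rate  # days needed at current pace
  forecast_end = today + timedelta(days=round(remaining_days))

  slippage = (forecast_end - planned_end).days
  print(f"Forecast completion: {forecast_end} ({slippage:+d} days vs. plan)")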

Level 3: AI-Human Collaboration
Description: AI agent(s)/system(s) act as an entry-level test coordinator that performs continuous what-if analysis, correlates business impact, and proposes corrective test actions with explanations. It answers complex test control queries, while humans oversee and validate.
Technology:
  • Causal inference models
  • Simulation
  • Agentic planners
  • LLM + code embeddings, test-data synthesis libraries
  • LLMOps (prompt / template management, deployment, guardrails)
  • MCP
  • Vector DBs, BI tooling
Example tools:
  • Agentic frameworks
    • LangGraph as orchestrator, vector DBs as knowledge, prompts and a model for interaction, …

Level 4: Full Autonomy
Description: An autonomous agent simulates, forecasts, adjusts scope/staffing, and answers analytic queries. Human input is strategic only, with explainable outputs.
Technology: Autonomous agents, reinforcement scheduling, closed-loop governance
Example tools: End-to-end adaptive QA orchestrators (no tools known to operate at this level)
  AI Maturity Level: Indicates the level the technology vendors claim to have reached in deploying AI solutions that actually work in real-world applications