Data Architecture for AI That Consumes Data Differently

This article is part of a 27-article series on the AI Business Transformation Methodology. This piece addresses the data architecture that the CIO and CDO’s organization must build at Level 4 to feed the AI systems deployed into redesigned workflows, and why that architecture is structurally different from anything the enterprise has built before.

[Figure: Plaster Group Five-Level AI Business Transformation Methodology: Strategy, Transformation Imperatives, Workflow Transformation, AI Enablement, and Continuous Transformation, with a feedback loop from Level 5 back to Level 1.]

Article 13 established a principle that the rest of the Methodology depends on: data constraints are discovered during workflow redesign, not assumed or solved beforehand. The workflow redesign teams at Level 3 did exactly what Article 12 prescribed. For each step in the redesigned workflow, they identified what data the AI needs, where it comes from, whether it is available and clean enough, and who owns it. Where the data did not exist or was not accessible, they flagged it immediately. Article 13 gave them four options for handling what they found: scope the design to current data realities, include a data readiness workstream, escalate to the Level 1 triad, or use AI to accelerate readiness.

That discovery and documentation work produced something most organizations attempting AI at scale do not have: a precise specification of what data the redesigned workflows require, from which systems, at what quality level, with identified gaps and resolution plans. Article 13 was explicit about the handoff: Level 3 produces the specification. Level 4 builds the infrastructure that makes the data available. Articles 17 through 20 addressed the CIO's side of that infrastructure. This article addresses the CDO's side.

This article is about what the CDO’s organization builds from that specification.

The urgency is not theoretical. McKinsey’s research across enterprises scaling AI found that while nearly two-thirds of enterprises have experimented with AI agents, fewer than 10% have scaled them to tangible value, and eight in ten companies cite data limitations as the roadblock.¹ Deloitte’s survey of 3,235 senior leaders found that only four in ten believe their data management is ready for AI, down from 43% the previous year, a decline that suggests organizations are discovering their infrastructure’s shortcomings as they actually attempt to deploy.² The Stanford Digital Economy Lab’s study of 51 successful enterprise AI deployments found that only 6% had data that was fully ready for AI. Yet 91% of those deployments successfully processed unstructured data that would have been unusable two years ago, and in 88% of cases, the AI itself unlocked data that was previously inaccessible.³ The research tells two stories simultaneously: data architecture is one of the most consequential investments in the entire transformation, and the organization does not need perfect data to begin. It needs the right architecture to make imperfect data usable while improving it continuously.

The CIO's organization is doing parallel data work at Level 4 that this article does not duplicate. Article 20's seven-environment topology includes a dedicated Data environment where masking, synthesis, and alignment of structured, unstructured, and semi-structured test data happens to feed the build, test, training, and staging environments where AI configuration and validation run. That environment is a build-and-test discipline owned by the CIO's organization. This article addresses the production data architecture the CDO's organization builds at Level 4 to support AI consumption of data at enterprise scale: unstructured data integration pipelines, semantic layer, continuous quality monitoring, governance embedded in pipelines, and the AI-specific access controls and audit trail. The two articles describe complementary work that the CIO and CDO organizations execute in parallel during waves 1 and 2, and that operates as steady-state engineering capability from wave 3 onward.

Why does AI break traditional enterprise data architecture?

AI breaks traditional data architecture because that architecture was built to serve structured data in predefined schemas, processed in overnight batches and read by humans — none of which matches what AI systems need. The CDO’s organization knows the traditional three-tier enterprise data warehouse well: source systems feed data through ETL pipelines into a central repository, where it is organized into OLAP cubes and dimensional models, and served to dashboards, reports, and business intelligence tools that humans read to make decisions. Tables have fixed columns designed before the data arrives. Processing happens in batches, typically overnight, with latency measured in hours or days. Governance is applied at system boundaries. Master data management is a periodic exercise in cleansing, deduplication, and standardization. The entire architecture produces a single source of truth that all reports reference.

This architecture was built for a specific purpose: serve structured data to humans who provide their own context when they interpret it. A finance director reading a quarterly revenue dashboard brings decades of domain knowledge to the numbers. They know which product lines are seasonal, which customers are strategic, which variances are meaningful. The data architecture never needed to encode that context because the human consumer supplied it.

AI consumes data in fundamentally different ways, and the differences are structural, not incremental. AI needs unstructured data alongside structured data. It needs data delivered with business context because it cannot infer that context independently. It needs real-time access rather than batch processing. Its governance cannot sit at system boundaries because data is continuously ingested, transformed, and recombined as it flows into models. And AI generates new data that must be governed as rigorously as any other enterprise data. Each of these differences requires architectural capabilities that the traditional data infrastructure does not have. The sections that follow address each one prescriptively.

How do you make unstructured data usable for AI?

Making unstructured data usable is the largest new data capability AI requires, because 90% of enterprise data is unstructured — contracts, emails, support transcripts, technical manuals, clinical notes, presentations, images, video — and less than 1% of it is being used in AI today.⁴ Yet IBM’s internal testing found that connecting AI to properly curated unstructured data produces 40% more accurate outputs than conventional retrieval approaches.⁴ The workflow designs from Level 3 identified which unstructured data each AI-enabled step needs. The CDO’s organization now builds the pipelines that make it accessible.

Traditional data architecture was never designed to handle unstructured content at enterprise scale, which is why the CDO’s organization has to build this capability new. The discipline is called unstructured data integration, and it reimagines the traditional ETL process for content that does not fit into tables and schemas. IBM describes the end-to-end workflow: connect to raw unstructured data sources; enhance data quality by structuring, enriching, and cleansing the content; remove sensitive information such as personally identifiable data; and deliver the refined output to systems ready for use, whether that is a vector database, a language model, or an analytics engine.⁴ The pipeline automates text chunking, embedding generation, and vectorization, turning documents into representations that AI can search by meaning rather than keywords. It applies quality filtering to remove irrelevant content, language detection to handle multilingual environments, and confidence scoring to flag content whose quality may be insufficient for production use.
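The pipeline stages described above can be sketched in a few lines of Python. This is a minimal illustration, not IBM's implementation: the function names, the regex that masks only email addresses, the word-window chunking, and the alphanumeric-ratio confidence score are all assumptions chosen to keep the example self-contained. A production pipeline would substitute a real PII detector and an embedding model.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: masks email addresses only. Real PII detection covers far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Chunk:
    source: str
    text: str
    confidence: float  # illustrative heuristic in [0, 1]


def mask_pii(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL.sub("[EMAIL]", text)


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list:
    """Split text into overlapping windows of at most `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def confidence(text: str) -> float:
    """Share of non-space characters that are alphanumeric (a crude noise signal)."""
    chars = [c for c in text if not c.isspace()]
    return sum(c.isalnum() for c in chars) / len(chars) if chars else 0.0


def process(source: str, raw: str, min_conf: float = 0.6):
    """Return (accepted, flagged) chunks; flagged chunks fall below min_conf."""
    accepted, flagged = [], []
    for piece in chunk_words(mask_pii(raw)):
        chunk = Chunk(source, piece, confidence(piece))
        (accepted if chunk.confidence >= min_conf else flagged).append(chunk)
    return accepted, flagged
```

Masking runs before chunking so that no downstream stage, including the flagged-content queue, ever sees the unmasked text.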

The Stanford research provides the empowering counterpoint to the scale of this challenge. In the 51 successful deployments studied, LLMs were not just consuming data. They were fixing it. AI processed voice transcripts, scanned documents, legacy code, and scattered knowledge bases that no prior technology could handle at the accuracy and scale required. A semiconductor manufacturer reduced data gathering time more than tenfold by deploying a multi-agent framework that pulled information from five or six different repositories automatically. A telecom company enabled field technicians to photograph equipment and receive instant AI-generated repair instructions by connecting visual data to technical documentation. The data was technically available but practically inaccessible until AI made it usable.³

The practical lesson for the CDO’s organization is that these pipelines must be reusable and repeatable, not one-off manual efforts. Each workflow design from Level 3 identified specific unstructured data sources. The CDO’s team builds pipelines that can serve multiple workflows across multiple domains, creating data assets that compound in value as the transformation expands. A construction company case study in the Stanford research demonstrated the progressive approach: extract what you can, cleanse with AI, match against reference data despite imperfections, and reserve human review for exceptions. Each pipeline stage adds value even with imperfect input from the previous stage. The CDO does not need perfect unstructured data before deployment. They need the architecture to make it progressively better.³
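The progressive pattern of cleansing, matching against reference data despite imperfections, and reserving human review for exceptions can be illustrated with a short Python sketch. The reference catalog, the function names, and the 0.6 similarity cutoff are hypothetical, and the standard-library fuzzy matcher stands in for whatever AI-assisted matching a real pipeline would use. The extraction stage is assumed to have already produced the input lines.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical reference data; in practice this would come from a master catalog.
REFERENCE = ["Portland Cement Type I", "Rebar #4 Grade 60", "Plywood 3/4 in CDX"]
_LOOKUP = {name.lower(): name for name in REFERENCE}


def cleanse(raw: str) -> str:
    """Collapse whitespace and trim stray punctuation."""
    return " ".join(raw.split()).strip(" .;,")


def match(item: str, cutoff: float = 0.6) -> Optional[str]:
    """Fuzzy-match against reference data, tolerating typos."""
    hits = get_close_matches(item.lower(), list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[hits[0]] if hits else None


def run(extracted_lines):
    """Cleanse, match, and route unmatched items to human review."""
    matched, needs_review = {}, []
    for raw in extracted_lines:
        item = cleanse(raw)
        if not item:
            continue
        reference = match(item)
        if reference:
            matched[item] = reference
        else:
            needs_review.append(item)  # humans see only the exceptions
    return matched, needs_review
```

A misspelled line such as "portland cemnt type 1" still resolves to its catalog entry, while an unrecognizable line lands in the review queue instead of blocking the run.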

Why must AI governance travel with the data?

AI governance must travel with the data because the data no longer sits still between checkpoints the way traditional architecture assumes — this is the most consequential conceptual shift for the CDO’s organization. In traditional data architecture, governance is a gate. Access controls are enforced when data enters the warehouse. Permissions are checked when users query reports. Quality is validated during periodic cleansing exercises. The data sits still between these checkpoints, and the governance model assumes it will. AI moves data continuously, so governance has to move with it.

AI data does not sit still. McKinsey’s research on data architecture for AI is direct: unstructured data is continuously ingested, transformed, and recombined as it flows into models, and governance must travel with it. Data quality checks, security controls, and lineage tracking must be automated and embedded directly into the pipelines, not handled as one-time reviews.¹ The governance framework from Article 7, operationalized into the workflow designs at Level 3, specified risk classifications, accountability structures, and oversight requirements for each AI-enabled step. Those specifications now become technical requirements for how the data architecture handles data in motion.

The CDO’s organization must make four governance decisions that have no precedent in traditional data architecture.

First, establish AI training data standards. Every dataset that enters an AI pipeline, whether for training, fine-tuning, or providing context through retrieval, must meet documented standards for provenance (where it came from, how it was collected, what consent framework applies), quality (measured completeness, accuracy, and freshness against defined thresholds), bias assessment (profiled for representation across protected categories), and usage rights (licensing for AI use, IP implications, regulatory constraints). This is the single most impactful governance action the CDO can take. No dataset enters an AI pipeline without passing this review.
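One way to make this review enforceable is to express it as an admission gate that a pipeline calls before accepting a dataset. The sketch below is illustrative only: the `DatasetRecord` fields, the 95% completeness threshold, and the one-year freshness limit are assumptions, and a real review would draw these values from the organization's documented standards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    name: str
    provenance: Optional[str]   # source system and consent framework, if documented
    completeness: float         # measured share of required fields populated
    age_days: int               # days since last refresh
    bias_assessed: bool         # representation profile on file
    ai_usage_licensed: bool     # usage rights cover AI training or retrieval


def review(ds: DatasetRecord, min_completeness: float = 0.95, max_age_days: int = 365) -> list:
    """Return the failed checks; an empty list means the dataset may enter the pipeline."""
    failures = []
    if not ds.provenance:
        failures.append("provenance undocumented")
    if ds.completeness < min_completeness:
        failures.append("completeness below threshold")
    if ds.age_days > max_age_days:
        failures.append("data too stale")
    if not ds.bias_assessed:
        failures.append("bias assessment missing")
    if not ds.ai_usage_licensed:
        failures.append("usage rights not cleared for AI")
    return failures
```

Returning the full list of failures, rather than stopping at the first one, gives the dataset owner a complete remediation list in a single pass.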

Second, implement data-centric monitoring for AI systems. The CIO’s team monitors model performance. The CDO must own the data side, which has four parts. Input data quality monitoring tracks the quality of data feeding production AI systems in real time, because when input quality degrades, output quality follows. Data drift detection raises automated alerts when the statistical properties of the data shift beyond normal bounds due to seasonality, market changes, or evolving customer behavior. Ground truth validation regularly samples AI outputs and compares them against human-verified correct answers. And feedback loop management monitors for quality degradation when AI outputs generate new data that feeds back into enterprise systems.
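Drift detection of the kind described above is often implemented with a distribution-comparison statistic. The article does not name a method, so the sketch below uses the Population Stability Index (PSI) as one common choice. The bin count, the epsilon floor for empty bins, and the frequently quoted 0.25 alert level are assumptions to tune per data flow, not prescriptions.

```python
import math


def psi(expected, actual, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline sample and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drifted(expected, actual, threshold: float = 0.25) -> bool:
    """True when PSI exceeds the alert threshold (0.25 is a rule of thumb)."""
    return psi(expected, actual) > threshold
```

Bins are derived from the baseline sample only, so current values outside the baseline range accumulate in the edge bins and push the index up.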

Third, define access governance specifically for AI consumers. AI systems are voracious data consumers, and without access governance they become the largest data exposure risk in the organization. The principle of least data means AI systems receive only the data fields necessary for their function. Purpose limitation prevents data collected for one purpose from being used in AI training without explicit governance approval. Temporal controls define retention periods for AI training datasets because historical data older than three to five years may introduce bias by reflecting outdated patterns. And cross-border considerations map data flows across jurisdictions because AI training data moves across cloud regions and API calls to model providers.
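The least-data, purpose-limitation, and temporal controls above can be combined in a single access function that sits between the data store and the AI consumer. The following is a minimal sketch under stated assumptions: the agent name, the policy structure, the field names, and the five-year window are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical policy for one AI consumer.
POLICIES = {
    "invoice_matching_agent": {
        "fields": {"invoice_id", "amount", "vendor_id"},
        "purposes": {"invoice_matching"},
        "max_age_days": 5 * 365,
    }
}


def fetch_for_agent(agent: str, purpose: str, records: list, today: date) -> list:
    """Return only permitted fields from records inside the retention window."""
    policy = POLICIES.get(agent)
    if policy is None:
        raise PermissionError(f"no policy registered for {agent}")
    if purpose not in policy["purposes"]:
        raise PermissionError(f"{agent} is not approved for purpose {purpose!r}")
    cutoff = today - timedelta(days=policy["max_age_days"])
    return [
        {k: v for k, v in rec.items() if k in policy["fields"]}
        for rec in records
        if rec["created"] >= cutoff
    ]
```

Because the policy is an allow-list, a field added to the source system later is withheld from the agent until someone explicitly approves it.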

Fourth, build the AI data audit trail. For every AI system classified as high-risk under the governance framework from Article 7, the CDO needs an auditable record: which datasets and which versions were used to train each model version, what data entered the system and when and from which source, what the AI produced and what action was taken on it, and for AI-assisted decisions, clear attribution of which data points influenced the output. The EU AI Act requires this documentation for high-risk AI systems. The enforcement timeline has moved: the Digital Omnibus package, endorsed by the European Parliament on June 16, 2026 and given final Council approval on June 29, 2026, postponed the Annex III high-risk obligations that were due in August 2026 to December 2, 2027, and the Annex I obligations for AI embedded in regulated products to August 2, 2028, with publication in the Official Journal expected before the original August 2026 date.⁵ The requirements themselves did not change, the GPAI model obligations in force since August 2025 are unaffected, and the legal commentary is consistent: use the extra time to build the documentation capability rather than defer it. The readiness data explains why. Deloitte finds that only 21% of organizations report a mature agentic AI governance model, and warns that skipping guardrails up front makes retrofitting oversight slower and costlier.⁵ The CDO who builds training data provenance into the standard data management process prevents compliance exposure later, and arrives at the December 2027 deadline holding an asset rather than a backlog.
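An auditable record is easier to defend when entries cannot be silently altered after the fact. The sketch below shows one possible design, an append-only log in which each entry carries a hash of the previous one. Hash chaining is an assumption introduced here for illustration; neither the article nor the EU AI Act prescribes it, and the event fields are hypothetical.

```python
import hashlib
import json


class AuditTrail:
    """Append-only log where each entry carries the hash of the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        """Record an event and return its chained SHA-256 digest."""
        prev = self.entries[-1]["hash"] if self.entries else ""
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every digest; any edited or reordered entry breaks the chain."""
        prev = ""
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

Training, input, output, and decision events can share the same trail, which keeps the link between a model version and the data it consumed in one verifiable sequence.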

IBM’s approach to this challenge makes governance inherited rather than imposed. Access controls from source document systems flow through to AI retrieval, with PII annotation preventing sensitive information from surfacing in AI outputs. The governance does not require a separate enforcement layer because it is embedded in the data pipeline itself.⁴ McKinsey’s research describes this as building trust into the platform by default: security, access controls, privacy, and AI governance should be automatic, not added later or managed manually.¹

What is the semantic layer in AI data architecture?

The semantic layer is an entirely new architectural capability that teaches data to describe its own meaning — something traditional data architecture does not have because it never needed it. It is the mechanism by which AI understands what the data means, not just what it contains.

When a finance director reads a revenue dashboard, they bring business context to the numbers. They know that "customer" in the CRM means something different from "customer" in the billing system. They know which product categories map to which business units. They understand the relationships between entities that no schema explicitly encodes. The data architecture never needed to make this context machine-readable because the human consumer provided it.

AI cannot provide that context independently. MIT Technology Review’s research, produced in partnership with SAP, is direct: the real risk for agentic AI is not lack of data, but lack of grounding. High-value data for AI agents is defined less by format and more by business context.⁶ Without a layer that encodes business meaning into machine-readable form, AI agents may act on incomplete or conflicting interpretations of the same data, increasing error rates and operational risk as scale grows.

McKinsey describes the semantic layer as the component that turns data into knowledge. It sits between raw data and AI applications and codifies the business meaning of data. Ontologies define how attributes and relationships add up to business reality. Knowledge graphs operationalize this vocabulary by linking real-world data across systems into a connected network of entities.¹ In practice, this means the CDO’s organization builds a machine-readable map of the enterprise’s data: what business entities exist, how they relate to each other, and what rules govern them. When an AI agent encounters "customer" in two different systems, the semantic layer tells it whether these are the same entity, how they relate, and which business rules apply.
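The "customer" example can be made concrete with a toy semantic layer. The sketch below is a minimal illustration, assuming a hypothetical two-concept ontology, a term map from system vocabulary to canonical concepts, and a single knowledge-graph edge. Real implementations typically use a graph database or an ontology language rather than Python dictionaries.

```python
# Hypothetical ontology: canonical business concepts and their definitions.
CONCEPTS = {
    "Account": "Organization with an active commercial relationship",
    "BillingParty": "Legal entity that receives and pays invoices",
}

# How each system's local vocabulary maps onto the canonical concepts.
TERM_MAP = {
    ("crm", "customer"): "Account",
    ("billing", "customer"): "BillingParty",
}

# Knowledge-graph edges as (subject, predicate, object) triples.
EDGES = [("Account", "is_invoiced_through", "BillingParty")]


def resolve(system: str, term: str):
    """Return (canonical concept, definition) for a system-local term."""
    concept = TERM_MAP[(system, term.lower())]
    return concept, CONCEPTS[concept]


def same_entity(a, b) -> bool:
    """True when two (system, term) pairs denote the same canonical concept."""
    return resolve(*a)[0] == resolve(*b)[0]


def relations(a, b) -> list:
    """Predicates linking the concept behind `a` to the concept behind `b`."""
    ca, cb = resolve(*a)[0], resolve(*b)[0]
    return [p for s, p, o in EDGES if (s, o) == (ca, cb)]
```

An agent that asks whether the CRM "customer" and the billing "customer" are the same thing gets an explicit no, plus the relationship that connects them, instead of guessing from a shared column name.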

Deloitte frames the challenge as a paradigm shift from traditional data pipelines to enterprise search and indexing, similar to how Google made the World Wide Web discoverable. This approach contextualizes enterprise data through knowledge graphs, making information discoverable without requiring the CDO to centralize or restructure all the underlying data.² The Stanford researchers reinforced this: 59% of successful implementations had data scattered across multiple systems, but success required access, not centralization. A telecom company built different knowledge bases for different equipment types, indexed them, and gave AI agents access through the Model Context Protocol without ever centralizing the underlying data.³

The semantic layer is not a one-time project. It evolves as the organization’s understanding of its data deepens and as new workflow designs introduce new data relationships. The CDO’s organization should start with the data domains the highest-priority workflow designs require and expand as subsequent domains enter Level 4.

Why does AI require continuous data quality instead of periodic cleanup?

AI requires continuous data quality because periodic cleanup projects decay: the data drifts back toward entropy within months because the processes that created the quality problems are still running. Traditional data quality is a project. The organization runs a cleansing initiative, deduplicates records, standardizes formats, validates against reference data, and declares the data clean. Six months later, the data has drifted back toward entropy. AI-enabled workflows cannot tolerate that decay, so quality has to become a continuous, automated discipline.

AI data quality is an operational discipline. McKinsey’s research prescribes the shift: organizations must move from periodic data cleanup to continuous, real-time quality management, supported by AI-enabled automated validation, anomaly detection, and enrichment pipelines that prevent issues from propagating across workflows.¹ This is not an incremental improvement on traditional data quality practices. It is a fundamentally different operating model where quality monitoring runs continuously against every data pipeline that feeds an AI system.

The workflow designs from Article 12 provide the quality specifications. Each governance classification tells the CDO what quality level each data flow must achieve. A workflow step classified as high-risk under the governance framework demands higher data quality thresholds than a workflow step where human oversight catches exceptions. The CDO’s organization builds continuous monitoring against these specifications: automated alerts when data quality drops below the threshold a specific workflow step requires, automated validation that catches inconsistencies before they reach the AI system, and anomaly detection that identifies when data patterns are shifting in ways that could degrade AI performance.
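The link between governance classification and quality threshold can be expressed directly in the monitoring code. The sketch below is illustrative: the three classification labels, their completeness thresholds, and the choice of completeness as the only measured dimension are assumptions, and a real monitor would track accuracy and freshness as well.

```python
# Hypothetical completeness thresholds per governance classification.
THRESHOLDS = {"high": 0.99, "limited": 0.95, "minimal": 0.90}


def completeness(rows: list, required: list) -> float:
    """Share of rows in which every required field is populated."""
    if not rows:
        return 0.0
    ok = sum(all(row.get(f) not in (None, "") for f in required) for row in rows)
    return ok / len(rows)


def check_step(step: str, risk: str, rows: list, required: list):
    """Return an alert string when quality falls below the step's threshold, else None."""
    score = completeness(rows, required)
    threshold = THRESHOLDS[risk]
    if score < threshold:
        return f"ALERT {step}: completeness {score:.2f} below {threshold:.2f} for {risk}-risk step"
    return None
```

The same batch of data can pass for one workflow step and trigger an alert for another, which is the point: the threshold belongs to the step's classification, not to the dataset.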

The Stanford research provides the realistic anchor for how to approach this discipline. Only 6% of successful implementations had fully ready data. The organizations that succeeded did not achieve perfection before deployment. They built architectures that improved data quality continuously as the AI used it. The construction company case study demonstrated this with a four-stage progressive pipeline where each stage added value even with imperfect input from the previous stage. The practical advice was to design for “good enough” rather than perfection, with the understanding that “good enough” improves over time through continuous monitoring and iteration.³

One dimension of continuous quality that traditional data architecture never anticipated: AI-generated data must meet the same governance and quality standards as any other enterprise data. When AI agents make decisions, create summaries, or generate content within the redesigned workflows, those outputs become enterprise data that may feed into other systems or inform other AI agents. McKinsey’s research is explicit: organizations must apply the same quality, lineage, and reconciliation standards to agent-generated outputs, including data retrieved or written through agent-invoked tools and APIs.¹ Without this discipline, errors compound across automated workflows in ways that are difficult to trace after the fact.

Why is proprietary data a competitive advantage in AI?

Proprietary data is a competitive advantage because every frontier AI lab is training on the same public data, so organizations cannot compete there — but every company has proprietary data that no AI lab has ever seen or is permitted to see, and that data is their edge. The public-data axis is a race no enterprise can win. The proprietary-data axis is the one where competitive differentiation is actually possible.

The Stanford research found that 75% of successful AI implementations mentioned proprietary data as a key factor in their strategy, and 47% explicitly described their accumulated data as a competitive moat. The pattern was consistent across industries: the organizations generating the most value from AI were those that had been storing data, even imperfect data, long before they knew how they would use it.³ An HR technology company built a knowledge graph of over 20 billion data points accumulated over 13 years. A technology company’s CTO described the competitive dynamic directly: differentiation requires what others cannot replicate quickly, and proprietary data is the hardest asset for competitors to reproduce.

Accenture’s research reinforces the strategic dimension: organizations creating enterprise-level value from AI are 2.9 times more likely to have a comprehensive data strategy supporting their efforts.⁷ The methodology gives the CDO a specific advantage here. The workflow designs from Article 12 identify which proprietary data gives the organization’s AI its competitive edge, because the designs specify what data each AI-enabled step needs to perform capabilities that differentiate the business. The CDO’s job is to ensure that data is accessible, governed, and continuously enriched.

The practical implication is straightforward: preserve everything that governance permits. The cost of storing data is negligible compared to the cost of not having it when the right capability arrives. The temporal controls and PII protections from the governance section apply to this preservation rather than override it: bias-introducing historical data is excluded from training datasets, sensitive content is masked or excluded as the governance framework specifies, and retention policies operate as designed. Within those boundaries, organizations that preserve their proprietary data, however imperfect, are building a competitive advantage that compounds over time. As open-source models close the performance gap with proprietary ones, the differentiator shifts from which model you use to what data you feed it.³

How do Level 3 deliverables become data architecture requirements?

The Level 3 workflow designs become the specific, quality-gated requirements for exactly what data the AI needs, which are the requirements most organizations attempting AI data architecture lack. McKinsey’s research identifies this as the fundamental gap: eight in ten companies cite data limitations as a roadblock to scaling AI, yet most are building generic data platforms without knowing what their AI systems will specifically require.¹ They are building the warehouse before they know what will be stored in it. This is the methodology’s most concrete advantage for data architecture, and it parallels the equivalent sections in Articles 18 and 19.

The methodology ensures the CDO’s organization knows exactly what to build because the Level 3 deliverables provide the specifications.

Each workflow design from Article 12 specifies what data each AI-enabled step needs, from which enterprise systems, in what format, at what freshness level, and under what governance classification. Article 13’s data readiness assessment documented which of those data requirements are currently met, which have gaps, and what resolution path was selected for each gap. The cross-department interface specifications from Article 14 identify where AI-enabled workflows in one domain depend on data from systems owned by another domain.

These specifications translate directly into data architecture priorities. The data sources that appear across the highest-priority workflows with the most demanding requirements should be addressed first. Workflows whose governance classifications demand the highest data quality should receive the most rigorous continuous monitoring. Unstructured data sources that multiple workflow designs identified should get reusable pipelines that serve all of them. The semantic layer should start with the data domains the first-wave workflow designs require.
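The prioritization logic above can be approximated by scoring each data source against the workflow designs that depend on it. The sketch below is one possible scoring scheme, not the methodology's own: the numeric workflow priorities, the risk ordering, and the sample workflow and source names are hypothetical.

```python
from collections import defaultdict

# Hypothetical ordering of governance classifications, strictest last.
RISK_RANK = {"minimal": 1, "limited": 2, "high": 3}


def rank_sources(requirements) -> list:
    """Order data sources by summed workflow priority, then by strictest risk class.

    `requirements` is an iterable of (workflow, workflow_priority, source, risk).
    """
    demand = defaultdict(lambda: {"workflows": set(), "priority": 0, "risk": 0})
    for workflow, priority, source, risk in requirements:
        entry = demand[source]
        entry["workflows"].add(workflow)
        entry["priority"] += priority
        entry["risk"] = max(entry["risk"], RISK_RANK[risk])
    return sorted(demand, key=lambda s: (-demand[s]["priority"], -demand[s]["risk"], s))
```

A source that several high-priority workflows share rises to the top, which is also the signal that it deserves a reusable pipeline rather than a workflow-specific one.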

The CDO’s organization should also validate the Level 3 specifications against what this article describes. The workflow redesign teams documented data requirements through the lens of business process design. The CDO’s team validates those requirements through the lens of data architecture: do the specifications adequately capture unstructured data needs alongside structured ones? Are the latency requirements specific enough to determine whether batch processing is sufficient or real-time access is needed? Are the governance classifications detailed enough to determine what level of access control, audit trail, and monitoring each data flow requires? Where the Level 3 specifications need more detail, the CDO’s team works with the domain owner to refine them before building.

This validation is not a rework of Level 3. It is the natural deepening that happens when the data architecture team applies their expertise to specifications the business team produced. The business team identified what data the workflow needs and what is missing. The CDO’s team determines how to provide it at the quality, freshness, governance, and scale the deployed AI systems require.

What Comes Next

Data architecture is the foundation that every deployed AI system depends on. Article 22 covers the training methodology that develops the judgment-based competence the workforce needs to operate within AI-enabled workflows. Article 23 establishes how Level 4 is measured, where the organization assesses whether the deployed AI systems are producing the business outcomes the transformation was designed to achieve.

The research consistently identifies data as the primary barrier to AI scaling. But the research also shows that the barrier is not bad data. It is the wrong architecture for how AI consumes data. Traditional data architecture serves humans reading reports. AI data architecture serves systems that need unstructured data alongside structured, business context encoded in machine-readable form, real-time access rather than batch processing, governance embedded in the pipeline rather than applied at boundaries, and continuous quality monitoring rather than periodic cleanup. The CDO’s organization builds these capabilities at Level 4, guided by the specific data requirements the workflow designs produced at Level 3. That specificity is what separates the organizations in the research that scale AI successfully from the eight in ten that cite data as the reason they stalled.


Sources

1. McKinsey, “Building the Foundations for Agentic AI at Scale,” April 2026. Eight in ten companies cite data limitations as roadblock; seven data architecture principles for scale; four-step methodology for data readiness; semantic layer turns data into knowledge through ontologies and knowledge graphs; governance must travel with data; continuous real-time quality monitoring replaces periodic cleanup; agent-generated data must meet same governance standards. https://www.mckinsey.com/capabilities/mckinsey-technology/our-insights/building-the-foundations-for-agentic-ai-at-scale
2. Deloitte, “Agentic AI Strategy,” December 2025 (2025 Emerging Technology Trends study, 3,235 leaders). Only 4 in 10 believe data management ready for AI, down from 43% previous year; paradigm shift from traditional data pipelines to enterprise search and indexing; 48% cite searchability, 47% cite reusability as challenges; legacy data architectures cannot power real-time autonomous AI. https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2026/agentic-ai-strategy.html
3. Stanford Digital Economy Lab, “The Enterprise AI Playbook: Lessons from 51 Successful Deployments,” March 2026 (Pereira, Graylin, Brynjolfsson). Only 6% had fully ready data; 91% processed unstructured data successfully; 88% unlocked previously inaccessible data; 59% had scattered data but success required access not centralization; 75% cited proprietary data as key factor; LLMs fixed data problems they were expected to struggle with. https://digitaleconomy.stanford.edu/app/uploads/2026/03/EnterpriseAIPlaybook_PereiraGraylinBrynjolfsson.pdf
4. IBM, “Enabling AI at Scale with Unstructured Data Integration and Governance,” November 2025. 90% of enterprise data is unstructured, less than 1% used in AI (IDC); 40% more accurate AI outputs with properly curated unstructured data; unstructured data integration reimagines ETL for content that does not fit tables; governance must be applied to unstructured data with same rigor as structured data. https://www.ibm.com/think/insights/enable-ai-unstructured-data-integration-governance
5. Gibson Dunn, “EU AI Act Omnibus Agreement: Postponed High-Risk Deadlines and Other Key Changes,” June 2026. https://www.gibsondunn.com/eu-ai-act-omnibus-agreement-postponed-high-risk-deadlines-and-other-key-changes/
6. MIT Technology Review, “Building a Strong Data Infrastructure for AI Agent Success,” March 2026 (in partnership with SAP). Real risk for agentic AI is not lack of data but lack of grounding; high-value data defined by business context not format; semantic layer encodes business rules and relationships; legacy architectures cannot power autonomous AI systems; two-thirds of business leaders do not fully trust their data. https://www.technologyreview.com/2026/03/10/1134083/building-a-strong-data-infrastructure-for-ai-agent-success/
7. Accenture, “Making Reinvention Real with Gen AI,” March 2025 (3,000+ C-suite executives, 2,000+ gen AI projects). Organizations creating enterprise-level value are 2.9x more likely to have comprehensive data strategy; proprietary data sources refined into core data products are critical for differentiation. https://www.accenture.com/us-en/insights/consulting/making-reinvention-real-with-gen-ai

Frequently Asked Questions

Our data is scattered across dozens of systems. Do we need to centralize it before deploying AI?

No. The Stanford research found that 59% of successful AI implementations had data scattered across multiple systems owned by different teams, and only 16% had fully centralized data. Success did not require centralization. It required access. Organizations that built integration layers to connect scattered data, whether through APIs, retrieval-augmented generation architectures, or multi-agent frameworks, performed as well as those with centralized stores. The CDO’s organization should build access and retrieval infrastructure that connects the AI to data where it lives, governed by the access controls and quality standards the workflow designs specify. Centralization is one option for some data domains, but it is not a prerequisite for deployment.
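For readers who want to picture "access, not centralization," the following is a minimal Python sketch rather than a reference design. It assumes each source system can be wrapped in a small fetch function, and every name in it (AccessLayer, register, grant, query, the "crm" and "billing" sources) is hypothetical. The data stays where it lives; the layer only decides which AI consumer may reach which source.

```python
from typing import Callable, Dict, List, Set


class AccessLayer:
    """Hypothetical integration layer: routes queries to source systems in place."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[str], List[dict]]] = {}
        self._grants: Dict[str, Set[str]] = {}

    def register(self, name: str, fetch: Callable[[str], List[dict]]) -> None:
        # Each source keeps its own storage; only a fetch function is registered.
        self._sources[name] = fetch

    def grant(self, consumer: str, source: str) -> None:
        # Access is granted per AI consumer and per source, as a workflow design might specify.
        self._grants.setdefault(consumer, set()).add(source)

    def query(self, consumer: str, customer_id: str) -> Dict[str, List[dict]]:
        allowed = self._grants.get(consumer, set())
        return {
            name: fetch(customer_id)
            for name, fetch in self._sources.items()
            if name in allowed
        }


# Stand-ins for two separately owned systems (illustrative data only).
crm_rows = {"C-1001": [{"name": "Acme", "segment": "enterprise"}]}
billing_rows = {"C-1001": [{"invoice": "INV-7", "amount": 250}]}

layer = AccessLayer()
layer.register("crm", lambda cid: crm_rows.get(cid, []))
layer.register("billing", lambda cid: billing_rows.get(cid, []))
layer.grant("support_agent", "crm")

result = layer.query("support_agent", "C-1001")
```

In this sketch the support agent sees CRM data but not billing data, because no grant exists for billing. A production layer would sit on APIs or retrieval infrastructure rather than in-memory dictionaries.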

How do we govern unstructured data when we have never governed it before?

Start with the unstructured data sources the highest-priority workflow designs identified. Enterprise-wide unstructured data governance is a multi-year journey, but the deployed AI systems do not need all unstructured data governed. They need the specific sources the workflow designs require. Build the governance into the unstructured data pipelines themselves: classification, PII detection and masking, lineage tracking, and access controls applied as the data flows through the pipeline rather than enforced at a separate gate. The CDO’s organization does not need to solve unstructured data governance for the entire enterprise before deploying AI. It needs to solve it for the data domains the first-wave workflows require and expand from there.
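As an illustration of governance applied inside the pipeline, here is a minimal Python sketch under stated assumptions. The two regular expressions are deliberately simplistic stand-ins for real PII detection, the two-level classification scheme is invented for the example, and all names (Document, classify, mask_pii, run_pipeline, CLEARANCE) are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Simplistic, illustrative patterns; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical two-level classification scheme.
CLEARANCE = {"internal": 1, "restricted": 2}


@dataclass
class Document:
    source: str
    text: str
    classification: str = "unclassified"
    lineage: List[str] = field(default_factory=list)


def classify(doc: Document) -> Document:
    # Classify before masking, while the sensitive content is still detectable.
    has_pii = bool(EMAIL.search(doc.text) or SSN.search(doc.text))
    doc.classification = "restricted" if has_pii else "internal"
    doc.lineage.append(f"classify:{doc.classification}")
    return doc


def mask_pii(doc: Document) -> Document:
    masked, n_email = EMAIL.subn("[EMAIL]", doc.text)
    masked, n_ssn = SSN.subn("[SSN]", masked)
    doc.text = masked
    doc.lineage.append(f"mask_pii:emails={n_email},ssns={n_ssn}")
    return doc


def run_pipeline(doc: Document, consumer_clearance: str) -> Tuple[Document, bool]:
    # Every step records itself in the lineage, so governance travels with the data.
    doc = mask_pii(classify(doc))
    allowed = CLEARANCE[consumer_clearance] >= CLEARANCE[doc.classification]
    doc.lineage.append(f"access_check:{consumer_clearance}:{'allow' if allowed else 'deny'}")
    return doc, allowed


note = Document("crm_notes", "Contact jane.doe@example.com, SSN 123-45-6789.")
processed, allowed = run_pipeline(note, "internal")
```

Classification, masking, and the access decision all happen as the document moves through the pipeline, and each leaves a lineage entry behind. Nothing depends on a separate review gate downstream.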

What does the CDO’s organization need to build that it does not have today?

Four architectural capabilities that traditional data architecture does not provide, with two governance sub-disciplines inside one of them. Unstructured data integration pipelines that ingest, transform, enrich, and govern content that does not fit into tables and schemas. Pipeline-embedded governance that travels with the data rather than sitting at system boundaries, including the AI data audit trail that documents training data provenance, input logging, output logging, and decision attribution, and the AI-specific access governance designed for AI consumers rather than human users. A semantic layer that encodes business meaning into machine-readable form through ontologies and knowledge graphs. Continuous quality monitoring infrastructure that runs against every AI data pipeline in real time. None of these existed in the traditional data architecture because no prior technology consumed data the way AI does.
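To make the AI data audit trail concrete, the following Python sketch shows one possible shape for a single audit record covering the four elements named above. It is an assumption-laden illustration: the field names, the record_decision helper, and the example values are all hypothetical, and a real trail would write to durable, tamper-resistant storage rather than a list.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class AuditRecord:
    workflow_step: str
    model_version: str
    training_data_ref: str        # training data provenance
    input_refs: Tuple[str, ...]   # input logging: which records the AI consumed
    output_summary: str           # output logging
    decided_by: str               # decision attribution: "ai" or a reviewer identifier
    timestamp: str


def record_decision(log: List[str], **fields) -> AuditRecord:
    # Stamp the record in UTC and append it as one JSON line.
    rec = AuditRecord(timestamp=datetime.now(timezone.utc).isoformat(), **fields)
    log.append(json.dumps(asdict(rec), sort_keys=True))
    return rec


audit_log: List[str] = []
rec = record_decision(
    audit_log,
    workflow_step="claims_triage",
    model_version="triage-model-v3",
    training_data_ref="dataset:claims-2024-q4",
    input_refs=("claim:88231", "policy:P-17"),
    output_summary="routed to standard queue",
    decided_by="ai",
)
```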

How do we know what data quality level is “good enough” for AI?

The workflow designs from Level 3 provide the answer. Each workflow step carries a governance classification that specifies the level of human oversight and the acceptable quality threshold. A workflow step where AI makes recommendations that a human reviews before acting can tolerate lower data quality than a workflow step where AI takes autonomous action. The Stanford research documented organizations that deployed with imperfect data and succeeded because they designed human oversight to catch the errors that data quality gaps would produce. The acceptance threshold is not a single standard applied uniformly. It is a judgment made against each workflow step’s specific requirements, governance classification, and human oversight architecture. The domain owner makes that judgment, informed by the CDO’s assessment of actual data quality.
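The idea that the acceptance threshold varies with the level of oversight can be sketched in a few lines of Python. The threshold values, the oversight labels, and the completeness metric below are assumptions chosen for illustration; in practice the domain owner sets the thresholds and quality has more dimensions than completeness.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds; the domain owner sets the real ones per workflow step.
MIN_QUALITY: Dict[str, float] = {
    "human_reviews_before_action": 0.85,
    "autonomous_action": 0.98,
}


@dataclass
class WorkflowStep:
    name: str
    oversight: str  # a key of MIN_QUALITY


def completeness(records: List[dict], required_fields: List[str]) -> float:
    """Fraction of records in which every required field is populated."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    return complete / len(records)


def is_good_enough(step: WorkflowStep, score: float) -> bool:
    return score >= MIN_QUALITY[step.oversight]


# Nine complete records and one with a missing email.
records = [{"id": i, "email": "a@example.com"} for i in range(9)]
records.append({"id": 9, "email": ""})
score = completeness(records, ["id", "email"])

recommend = WorkflowStep("draft_reply", "human_reviews_before_action")
act = WorkflowStep("issue_refund", "autonomous_action")
```

With these illustrative numbers, the same data set scores 0.9, which clears the bar for the step a human reviews and fails it for the step where the AI acts on its own.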

What is the semantic layer and why does AI need it when our BI systems never did?

The semantic layer encodes the business meaning of data into a form that machines can interpret. It defines what business entities exist (customers, products, transactions, employees), how they relate to each other, and what rules govern them. Your BI systems never needed this layer because humans provided the context: a finance director reading a report knows that the same customer may appear under different identifiers in different systems. AI cannot make that inference without the semantic layer telling it so. When AI agents retrieve data from multiple enterprise systems simultaneously, the semantic layer ensures they interpret the data consistently, understand which entities are the same across systems, and apply the correct business rules. Without it, AI agents acting on data from different systems may reach conflicting conclusions about the same business question.
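A very small Python sketch can show the entity-resolution part of that idea. It assumes the semantic layer can be reduced, for illustration, to a lookup from system-specific identifiers to one canonical entity; real semantic layers use ontologies and knowledge graphs, and the systems, identifiers, and function names here are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical "same entity" mappings: (system, local identifier) -> canonical entity.
SAME_AS: Dict[Tuple[str, str], str] = {
    ("crm", "C-1001"): "customer:acme",
    ("billing", "88231"): "customer:acme",
    ("crm", "C-2002"): "customer:globex",
}


def canonical(system: str, local_id: str) -> Optional[str]:
    return SAME_AS.get((system, local_id))


def total_by_customer(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum amounts per canonical customer across systems."""
    totals: Dict[str, int] = {}
    for system, local_id, amount in rows:
        entity = canonical(system, local_id)
        if entity is None:
            # Refuse to guess: an unmapped identifier is a gap to flag, not to infer.
            raise KeyError(f"no canonical entity for {system}:{local_id}")
        totals[entity] = totals.get(entity, 0) + amount
    return totals


rows = [("crm", "C-1001", 100), ("billing", "88231", 250), ("crm", "C-2002", 40)]
totals = total_by_customer(rows)
```

Without the mapping, the CRM and billing rows would look like two different customers. With it, both roll up to the same entity, which is the inference the finance director makes without thinking about it.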

How do we prevent AI from amplifying bad data into bad decisions at scale?

Three disciplines working together. First, continuous quality monitoring that detects when input data quality degrades before the AI produces outputs based on degraded data. Second, governance embedded in the pipeline that enforces quality thresholds, access controls, and audit trails as data moves through the architecture rather than relying on periodic reviews. Third, the human oversight architecture from the workflow designs: the governance classifications from Article 7, operationalized at Level 3, specified where human review is required precisely because the risk of error warrants it. When all three disciplines function together, degraded data is detected by monitoring, kept out of high-risk workflows by governance controls, and reviewed by humans where the governance classifications require oversight. The risk is not eliminated, but it is managed to the level the governance framework specifies.
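The first of those disciplines, detecting degradation before the AI acts on it, can be sketched as follows. This is a minimal Python illustration that assumes null rate is the quality signal and that a fixed tolerance over a rolling baseline is an acceptable trigger; the class name, window size, and tolerance are hypothetical choices, not recommendations.

```python
from collections import deque
from typing import Deque, List


class NullRateMonitor:
    """Flags a batch whose null rate exceeds the rolling baseline by a tolerance."""

    def __init__(self, window: int = 5, tolerance: float = 0.05) -> None:
        self.history: Deque[float] = deque(maxlen=window)
        self.tolerance = tolerance

    def check(self, batch: List[dict], field_name: str) -> bool:
        if not batch:
            raise ValueError("cannot assess an empty batch")
        rate = sum(1 for r in batch if r.get(field_name) is None) / len(batch)
        # With no history yet, the first batch defines the baseline.
        baseline = sum(self.history) / len(self.history) if self.history else rate
        degraded = rate > baseline + self.tolerance
        if not degraded:
            # Only healthy batches update the baseline, so drift cannot normalize itself.
            self.history.append(rate)
        return degraded


monitor = NullRateMonitor()
healthy = [{"amount": 10}, {"amount": 12}]
broken = [{"amount": None}, {"amount": 5}]

first = monitor.check(healthy, "amount")
second = monitor.check(broken, "amount")
```

A flagged batch would then be held back or routed to human review according to the governance classification of the workflow step it feeds, which is where the second and third disciplines take over.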

This series addresses “what” to do, not “how” to do it. If you are a business executive and would like help thinking through the “how,” please feel comfortable reaching out.

About the author

Shawn Plaster

Founder & CEO, Plaster Group

Shawn is the author of Plaster Group's five-level AI Business Transformation Methodology and its 27-article Insights series, and leads the firm's enterprise AI transformation work.
