Available Columns
timestamp, role, org_size, industry, team_focus, storage_environment, orchestration, ai_usage_frequency, ai_helps_with, ai_adoption, modeling_approach, modeling_pain_points, architecture_trend, biggest_bottleneck, team_growth_2026, education_topic, industry_wish, region

Explore the Data Yourself

This survey explorer includes interactive tools to dive deeper into the findings:

  • Charts — Filter and visualize responses by role, region, org size, and more
  • Crosstab — Cross-tabulate any two dimensions to discover patterns
  • Responses — Browse individual survey responses with open-ended answers
  • SQL Query — Run custom SQL queries directly against the dataset using DuckDB

Use the tabs above to explore, or read the full report below.
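
As a rough illustration of the kind of query the SQL Query tab supports, the sketch below counts responses per role and computes each role's share. Treat it as a stand-in under stated assumptions: it uses Python's built-in sqlite3 module rather than DuckDB, the table name `responses` is hypothetical, and the four rows are made-up toy data, not survey responses. Only the column name `role` comes from the explorer's Available Columns list.

```python
import sqlite3

# Made-up toy rows for illustration only; these are NOT survey responses.
TOY_ROWS = [
    ("Data Engineer",),
    ("Data Engineer",),
    ("Data Engineer",),
    ("Analytics Engineer",),
]

# Hypothetical table name; SQLite stands in for DuckDB here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (role TEXT)")
conn.executemany("INSERT INTO responses (role) VALUES (?)", TOY_ROWS)

# Count responses per role and express each count as a share of all rows.
QUERY = """
SELECT role,
       COUNT(*) AS n,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM responses), 1) AS pct
FROM responses
GROUP BY role
ORDER BY n DESC, role
"""

role_shares = conn.execute(QUERY).fetchall()
for role, n, pct in role_shares:
    print(f"{role}: {n} ({pct}%)")
```

The query uses only standard SQL (GROUP BY plus a scalar subquery for the total), so the same statement should carry over to other SQL engines with little or no change.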

Welcome to the 2026 State of Data Engineering Survey.

In late 2025, I asked this community to share how you actually work: what tools you use, what slows you down, what keeps you up at night. 1,101 of you responded (thanks to the last person who tipped it past 1,100). Here are the real-world, no-BS results, scar tissue and all.

The findings may or may not surprise anyone who's been in the trenches. I found myself surprised by some of the results.

It won't surprise you that AI tools are everywhere now: 82% of you use them daily. What surprised me is how organizational AI adoption lags way behind. Also, data modeling is a mess. Most say the pressure to "move fast" is their biggest pain point, and only a small number say modeling is going well. And the bottlenecks that matter most aren't technical. Lack of leadership direction and poor requirements rank nearly as high as legacy systems and tech debt.

This isn't a vendor-sponsored report with an agenda. It's our community talking to itself.

I built an interactive explorer so you can slice the data however you want — by role, org size, industry, region. Run your own queries. Download the raw CSV. I'm certain there are patterns in here I didn't see.

Find them. Tell me what you learn.

Thanks and enjoy,

Joe Reis

Practical Data Community

THE 2026 STATE OF DATA ENGINEERING SURVEY REPORT

A comprehensive survey of 1,101 data professionals on tools, trends, challenges, and the future of the field

February 2026

Executive Summary

The 2026 State of Data Engineering Survey provides an in-depth look at the current landscape of data engineering, based on responses from 1,101 data professionals across six continents. The survey was conducted from December 2025 to early January 2026 via the Practical Data community and LinkedIn, capturing insights from practitioners, managers, and architects across industries, including technology, healthcare, finance, and manufacturing.

Key Findings at a Glance

  • 82% of data professionals use AI tools daily or more frequently
  • 25% cite legacy systems and technical debt as their biggest bottleneck
  • 42% expect their data teams to grow in 2026
  • 44% use cloud data warehouses as their primary storage/processing environment

The survey reveals a field in transition. While cloud data warehouses remain dominant, lakehouse architectures are gaining ground, particularly in Europe and Latin America. AI tools have become ubiquitous, with only 3.7% of respondents finding them unhelpful. However, organizational challenges, including poor leadership direction and unclear requirements, continue to outweigh technical obstacles as the primary impediments to success.

Perhaps most notably, data modeling has emerged as a critical pain point. Nearly 90% of respondents report challenges with their modeling approach, with pressure to move fast and a lack of clear ownership topping the list. This finding aligns with strong demand for data modeling education, which ranks second among requested training topics, behind AI/LLM integration.

Methodology

Survey Design and Distribution

The survey consisted of 17 questions covering role demographics, technology stack, AI adoption, organizational challenges, and future outlook. Questions included single-select, multi-select, and open-text formats to capture both quantitative trends and qualitative insights.

The survey was distributed through two primary channels: the Practical Data community (Substack and newsletter subscribers) and LinkedIn. Data collection occurred over a two-week period in late 2025.

Response Demographics

A total of 1,101 complete responses were received. The respondent pool skews toward experienced practitioners, with a significant representation of managers and directors, reflecting the distribution channels' reach.

| Respondent Role | Count | Percentage |
| --- | --- | --- |
| Data Engineer | 423 | 38.4% |
| Manager / Director / VP | 226 | 20.5% |
| Analytics Engineer | 153 | 13.9% |
| Data Architect | 131 | 11.9% |
| Software Engineer (data focus) | 37 | 3.4% |
| Platform Engineer | 18 | 1.6% |
| ML Engineer / MLOps | 13 | 1.2% |
| AI Engineer | 13 | 1.2% |
| Other | 87 | 7.9% |
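
The percentages in the role table follow directly from the counts. A minimal sketch of that arithmetic, assuming each share is the count divided by the sum of all role counts and rounded to one decimal place (the helper name `shares` is illustrative):

```python
# Counts copied from the role table in this report.
ROLE_COUNTS = {
    "Data Engineer": 423,
    "Manager / Director / VP": 226,
    "Analytics Engineer": 153,
    "Data Architect": 131,
    "Software Engineer (data focus)": 37,
    "Platform Engineer": 18,
    "ML Engineer / MLOps": 13,
    "AI Engineer": 13,
    "Other": 87,
}


def shares(counts):
    """Return each category's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_responses = sum(ROLE_COUNTS.values())
role_pct = shares(ROLE_COUNTS)

print(total_responses)
for role, pct in role_pct.items():
    print(f"{role}: {pct}%")
```

The role counts sum to the 1,101 complete responses, so the same helper can be reused for any single-select question where every respondent answered.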

Geographic Distribution

Responses came from six geographic regions, with North America and Europe comprising the majority of the sample.

| Region | Count | Percentage |
| --- | --- | --- |
| United States / Canada | 436 | 39.6% |
| Europe (EU / UK) | 432 | 39.2% |
| Asia-Pacific | 94 | 8.5% |
| Latin America | 52 | 4.7% |
| Australia / New Zealand | 52 | 4.7% |
| Middle East / Africa | 28 | 2.5% |

Organization Size

The sample represents a balanced distribution across organization sizes, from startups to large enterprises.

| Organization Size | Count | Percentage |
| --- | --- | --- |
| 10,000+ employees | 229 | 20.8% |
| 1,000-10,000 employees | 311 | 28.2% |
| 200-999 employees | 231 | 21.0% |
| 50-199 employees | 156 | 14.2% |
| Under 50 employees | 174 | 15.8% |

Infrastructure and Architecture

Primary Storage and Processing Environment

Cloud data warehouses remain the dominant paradigm, used by 44% of respondents. However, lakehouse architectures have established a significant foothold at 27%, reflecting the maturation of technologies like Databricks, Apache Iceberg, Hudi, and Delta Lake.

| Environment | Percentage |
| --- | --- |
| Cloud Data Warehouse (Snowflake, BigQuery, Redshift) | 43.8% |
| Lakehouse (Databricks, Iceberg/Hudi/Delta) | 26.8% |
| Mixed/Hybrid | 11.7% |
| On-premises Data Warehouse | 9.4% |
| Cloud PostgreSQL/MySQL | 4.3% |
| Other | 4.0% |

Regional variations are notable. North American organizations show stronger cloud data warehouse adoption (50%), while European respondents report more balanced adoption between warehouses (40%) and lakehouses (33%). Latin America shows the highest lakehouse adoption at 40%.

Orchestration Approaches

Orchestration remains fragmented, with Airflow (in various forms) leading but far from universal. A concerning 20.5% of respondents report having no orchestration or relying on ad-hoc approaches.

| Orchestration Approach | Percentage |
| --- | --- |
| Cloud-native (Composer, MWAA, etc.) | 24.4% |
| Self-managed Airflow | 22.9% |
| No orchestration / Ad-hoc | 20.5% |
| Dagster | 6.2% |
| Prefect | 1.3% |
| Other (Databricks Jobs, dbt, SSIS, etc.) | 24.7% |

Dagster shows notably higher adoption in smaller organizations (11% in sub-50-employee companies) compared to enterprises (3% in 10,000+ employee organizations), suggesting it may be gaining traction as a modern alternative in greenfield environments.

Architectural Trends

When asked which architectural trend they are most aligned with, respondents most often chose the centralized warehouse (40.1%), followed by the lakehouse (34.6%), though preferences varied significantly by organization size.

| Architecture Trend | Overall | <50 Employees | 10,000+ Employees |
| --- | --- | --- | --- |
| Centralized Warehouse | 40.1% | 43% | 29% |
| Lakehouse | 34.6% | 29% | 38% |
| Data Mesh / Federated | 16.2% | 10% | 27% |
| Event-driven Architecture | 6.8% | 15% | 4% |
Data mesh adoption nearly triples from startups (10%) to large enterprises (27%), reflecting the organizational complexity that drives federated ownership models. Conversely, smaller organizations favor centralized warehouses, likely due to simpler organizational structures and smaller team sizes.
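
The size comparison above can be checked with a few lines of arithmetic. This is a minimal sketch using the percentages from the architecture trend table; the segment keys `under_50` and `over_10000` and the helper `leader` are illustrative names, not part of the survey data.

```python
# Percentages copied from the architecture trend table (two size segments).
TRENDS = {
    "Centralized Warehouse": {"under_50": 43, "over_10000": 29},
    "Lakehouse": {"under_50": 29, "over_10000": 38},
    "Data Mesh / Federated": {"under_50": 10, "over_10000": 27},
    "Event-driven Architecture": {"under_50": 15, "over_10000": 4},
}


def leader(segment):
    """Return the trend with the highest share in the given size segment."""
    return max(TRENDS, key=lambda trend: TRENDS[trend][segment])


# How many times more common data mesh alignment is in large enterprises.
mesh = TRENDS["Data Mesh / Federated"]
mesh_ratio = mesh["over_10000"] / mesh["under_50"]

print(f"Data mesh ratio, 10,000+ vs under 50: {mesh_ratio:.1f}x")
print("Leader, under 50 employees:", leader("under_50"))
print("Leader, 10,000+ employees:", leader("over_10000"))
```

Run against the table values, the ratio comes out at 2.7, and the leading trend shifts from the centralized warehouse in the smallest organizations to the lakehouse in the largest.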

AI Tools and Adoption

Personal AI Tool Usage

AI tools have achieved near-universal adoption among data professionals. A remarkable 82% of respondents use AI tools (such as ChatGPT, Claude, Cursor, or GitHub Copilot) daily or more frequently.

| Usage Frequency | Percentage |
| --- | --- |
| Multiple times per day | 54.0% |
| Daily | 28.2% |
| Weekly | 10.7% |
| Rarely | 6.1% |
| Never | 1.0% |

AI Engineers and ML Engineers show the highest adoption rates (92%+ daily usage), but even traditionally less technical roles like Data Architects report 79% daily usage. Only 3.7% of respondents reported not finding AI helpful for their work.

How AI Helps Most

Respondents were asked to select up to two areas where AI provides the most value. Code generation dominates, followed by documentation and pipeline debugging.

| AI Use Case | % Selected |
| --- | --- |
| Writing Code (SQL, Python, etc.) | ~82% |
| Documentation / Data Discovery | ~56% |
| Pipeline Debugging | ~29% |
| Architecture Design | ~21% |
| Data Modeling | ~13% |
| Governance / Quality Checks | ~11% |

Organizational AI Adoption

While individual AI tool usage is high, organizational AI adoption shows a different picture. Most organizations are still in the early stages of systematic AI integration.

| Adoption Stage | Percentage |
| --- | --- |
| Using AI for tactical tasks | 33.8% |
| Experimenting | 30.5% |
| Building internal AI platforms | 13.6% |
| No meaningful adoption yet | 12.2% |
| AI embedded in most workflows | 9.9% |

Tech companies lead in advanced AI adoption, with 31% either building AI platforms or embedding AI into workflows, compared to just 12% in the public sector. Organizations with higher AI adoption also show more optimistic team growth projections: 50% of those with embedded AI expect growth, versus 32% of those with no adoption.

Data Modeling Practices

Current Modeling Approaches

Data modeling approaches remain diverse, with no single methodology dominating. The Mixed approach, where modeling style depends on use case, is the most common response.

| Modeling Approach | Percentage |
| --- | --- |
| Mixed (depends on use case) | 36.8% |
| Kimball-style dimensional modeling | 27.8% |
| Ad-hoc / tables added as needed | 17.4% |
| Canonical/semantic models | 5.4% |
| One Big Table | 3.8% |
| Event-driven modeling | 3.3% |
| Data Vault | 3.3% |

Modeling approaches correlate with architectural choices. Organizations aligned with centralized warehouses show higher Kimball adoption (34%), while those pursuing data mesh favor mixed approaches (44%) and show lower ad-hoc modeling (11%).
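
Figures like these come from cross-tabulating two survey dimensions and reading the row-wise percentages. The sketch below shows one way to do that in plain Python. The rows are made-up toy data with shortened labels, not survey responses, and the helper `crosstab_row_pct` is a hypothetical name; the two dimensions correspond to the `architecture_trend` and `modeling_approach` columns in the explorer's column list.

```python
from collections import Counter, defaultdict

# Made-up toy rows (architecture_trend, modeling_approach); NOT survey data.
TOY_RESPONSES = [
    ("Centralized Warehouse", "Kimball"),
    ("Centralized Warehouse", "Kimball"),
    ("Centralized Warehouse", "Mixed"),
    ("Data Mesh", "Mixed"),
    ("Data Mesh", "Mixed"),
    ("Data Mesh", "Ad-hoc"),
    ("Data Mesh", "Kimball"),
]


def crosstab_row_pct(pairs):
    """Cross-tabulate (row, column) pairs and return row-wise percentages."""
    counts = defaultdict(Counter)
    for row_key, col_key in pairs:
        counts[row_key][col_key] += 1

    table = {}
    for row_key, row in counts.items():
        row_total = sum(row.values())
        table[row_key] = {
            col: round(100 * n / row_total, 1) for col, n in row.items()
        }
    return table


table = crosstab_row_pct(TOY_RESPONSES)
for trend, row in table.items():
    print(trend, row)
```

Row-wise percentages answer "among respondents aligned with this architecture, what share use each modeling approach", which is the reading used in the paragraph above.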

Data Modeling Pain Points

Nearly 90% of respondents report at least one data modeling pain point, revealing this as a critical area of industry-wide struggle.

| Pain Point | % Selected |
| --- | --- |
| Pressure to move fast | 59.3% |
| Lack of clear ownership | 50.7% |
| Hard to maintain over time | 39.2% |
| Tools do not support good modeling | 18.7% |
| None / modeling is going well | 11.3% |
| AI tools produce inconsistent schemas | 4.3% |

The correlation between the modeling approach and operational health is striking. Organizations using ad-hoc modeling report the highest rates of firefighting (38%), while those with canonical/semantic models report the lowest (19%). This suggests that investment in thoughtful modeling approaches pays dividends in reduced operational burden.

Organizational Challenges

Biggest Bottlenecks

Respondents were asked to identify the single biggest bottleneck in their data organization. The results reveal that organizational and process issues outweigh purely technical challenges.

| Bottleneck | Percentage |
| --- | --- |
| Legacy systems / technical debt | 25.4% |
| Lack of leadership direction | 21.3% |
| Poor requirements / upstream issues | 18.8% |
| Talent / hiring challenges | 11.4% |
| Data quality issues | 10.1% |
| Compute costs | 5.2% |
| Tool complexity | 2.7% |

While legacy systems top the list at 25.4%, the combined weight of organizational challenges (leadership direction, requirements, and talent, together 51.5%) is roughly double that of technical debt. This finding aligns with themes in the open-text responses, in which respondents frequently emphasized that data engineering success is a people problem as much as a technology one.
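
The comparison between organizational and technical bottlenecks is simple addition over the bottleneck table. A minimal sketch, assuming the organizational group consists of the three categories named in the paragraph above (the grouping constant `ORGANIZATIONAL` is an illustrative name):

```python
# Percentages copied from the bottleneck table.
BOTTLENECKS = {
    "Legacy systems / technical debt": 25.4,
    "Lack of leadership direction": 21.3,
    "Poor requirements / upstream issues": 18.8,
    "Talent / hiring challenges": 11.4,
    "Data quality issues": 10.1,
    "Compute costs": 5.2,
    "Tool complexity": 2.7,
}

# Grouping taken from the paragraph above: leadership, requirements, talent.
ORGANIZATIONAL = (
    "Lack of leadership direction",
    "Poor requirements / upstream issues",
    "Talent / hiring challenges",
)

organizational_total = round(sum(BOTTLENECKS[name] for name in ORGANIZATIONAL), 1)
technical_debt = BOTTLENECKS["Legacy systems / technical debt"]

print(f"Organizational bottlenecks combined: {organizational_total}%")
print(f"Legacy systems / technical debt: {technical_debt}%")
```

Because the question was single-select, the three shares can be added without double counting respondents.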

Where Teams Spend Their Time

Respondents selected up to two areas where their teams spend the most time, revealing priorities and potential inefficiencies.

| Activity | % Selected |
| --- | --- |
| Data modeling / transformation | 55.4% |
| Ingestion / pipelines | 48.1% |
| Analytics / BI | 34.2% |
| Data quality / reliability | 34.0% |
| Fighting fires | 26.2% |
| Infrastructure / platform work | 25.1% |
| ML / AI | 10.8% |

More than one in four teams (26.2%) report that fighting fires consumes significant time, representing substantial lost productivity across the industry.

Team Outlook and Industry Sentiment

Team Growth Expectations for 2026

The outlook for data teams is cautiously optimistic, with more respondents expecting growth than contraction.

| Expectation | Percentage |
| --- | --- |
| Stay the same | 43.7% |
| Grow | 42.0% |
| Not sure | 7.3% |
| Shrink | 7.1% |

Growth expectations correlate with organizational context. Teams whose primary bottleneck is talent/hiring are most bullish (59% expect growth), while those struggling with leadership direction are most pessimistic (35% expect growth, 10% expect shrinkage).

Education and Training Priorities

Respondents were asked what topic they most want education or training on in the coming year.

| Topic | Count |
| --- | --- |
| AI/LLM integration | 235 |
| Data modeling | 211 |
| Semantics / ontologies / knowledge graphs | 209 |
| Architecture patterns | 180 |
| Streaming / event-driven systems | 94 |
| Career growth / leadership | 80 |
| Reliability engineering | 66 |

The strong demand for semantics, ontologies, and knowledge graphs, combined with the earlier finding that only 5.4% currently use canonical/semantic models, suggests an emerging area of interest with significant room for adoption.

Industry Voices: What Practitioners Wish Others Understood

Respondents were asked: What is one thing you wish the wider industry understood about data engineering? The open-text responses reveal several recurring themes.

It Is a People Problem

Multiple respondents emphasized that data engineering challenges are fundamentally organizational rather than technical: "Data is a team sport; it requires sponsorship and alignment across both business and technical users to be fully successful." Another simply noted: "It is all a people problem."

Foundations Matter More Than Tools

A consistent theme emerged around the importance of fundamentals over tooling: "Data Engineering is not about the tools you use, but most jobs seem to require practice with tools." Respondents expressed frustration with tool-focused discourse: "I would like to see more people talking about foundations; more articles and talks about timeless tools."

Data Engineering Is Not Just Old Software Engineering

Several respondents pushed back on the perception that data engineering is simply software engineering with different data: "It is not just software engineering from ten years ago but has its own challenges that software engineers do not need to worry about."

Quality and Governance Cannot Be Afterthoughts

Data quality emerged as a persistent concern: "Data quality cannot be fixed by one person, team, or department, no matter how hard you shout." Respondents emphasized the need for shift-left approaches: "Data quality starts left."

It Takes Time

Finally, practitioners expressed frustration with unrealistic expectations about timelines. As one respondent put it succinctly: "It is not a project. It is a program that needs to be treated as a capital investment."

Conclusions and Implications

Key Takeaways

  1. AI adoption is no longer optional. With 82% of practitioners using AI tools daily, organizations that do not enable AI-assisted development are putting their teams at a competitive disadvantage.
  2. Organizational challenges outweigh technical ones. Leadership direction, clear requirements, and proper ownership are cited as bigger obstacles than tool complexity or compute costs.
  3. Data modeling is in crisis. Nearly 90% of respondents report modeling pain points, with pressure to move fast and lack of ownership leading the list. Organizations with disciplined modeling approaches spend less time firefighting.
  4. Architecture is converging around warehouse and lakehouse. Together, these paradigms represent over 70% of primary environments, with data mesh gaining traction primarily in large enterprises.
  5. Team growth outlook is cautiously positive. With 42% expecting growth and only 7% expecting shrinkage, the field remains healthy despite macroeconomic uncertainties.

Looking Ahead

The data engineering field in 2026 faces a dual challenge: rapidly integrating AI capabilities while addressing longstanding organizational and methodological gaps. The strong demand for education in data modeling, semantics, and architecture patterns suggests that practitioners recognize these gaps and are seeking to address them.

Organizations that invest in foundational practices, including thoughtful data modeling, clear ownership structures, and leadership alignment, will be better positioned to capitalize on AI capabilities. Those who prioritize speed over sustainability may find themselves trapped in cycles of technical debt and firefighting.

The message from the community is clear: data engineering success requires treating data as a strategic asset worthy of sustained investment, not a tactical problem to be solved with the next tool or platform migration.

About This Report

This survey was conducted and published on behalf of the Practical Data Community.
No vendors or other commercial interests influenced this survey. This is a grassroots initiative.

For questions or media inquiries, contact joe@joereismedia.com