A quiet shift is changing the face of enterprise data engineering. Python developers are building production data pipelines in minutes, using tools that would have required specialized teams only a few months ago.
The catalyst is dlt, an open-source Python library that automates complex data engineering tasks. The library has passed 3 million monthly downloads and powers data workflows at more than 500 companies in regulated industries such as manufacturing, healthcare, and finance. The technology received a strong endorsement this week as dltHub, the Berlin-based company behind the library, raised $8 million in seed funding led by Bessemer Venture Partners.
What matters isn’t just the adoption numbers. It’s how developers are using the library in combination with AI coding assistants to handle tasks that previously required infrastructure experts, DevOps specialists, and on-call personnel.
The company is building a cloud-hosted platform that extends its open-source library into an end-to-end solution. The platform will let developers deploy pipelines, transformations, and notebooks in a single operation, without worrying about infrastructure. It marks a major shift: data engineering that once required specialized teams is now within reach of any Python developer.
“Any Python developer should be able to bring their clients closer to accurate, up-to-date, and reliable data,” Matthaus Krzykowski, co-founder and CEO of dltHub, told VentureBeat in an exclusive interview. “Our goal is to make data engineering as easy and collaborative as writing Python itself.”
From SQL to Python-based data engineering
The problem the company set out to solve grew out of real-world frustrations.
Chief among them is a fundamental divide in how different generations of software developers work with data. Krzykowski described developers of his own generation as rooted in SQL and relational database technology, while a newer generation is building AI agents in Python.
The divide reflects deeper technical challenges. SQL-based data engineering locks teams into specific platforms and demands heavy infrastructure expertise. Python engineers working on AI need lightweight, platform-agnostic tools that work in notebooks and integrate with large language model (LLM) coding assistants.
The dlt library changes that equation by automating the hard parts of data engineering behind simple Python code.
“If you know what a function is in Python, what a list is, and what a source and a resource are, then you can write this very explicit, very simple code,” Krzykowski explained.
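A minimal sketch of what that looks like in practice, assuming dlt is installed with the DuckDB extra (`pip install "dlt[duckdb]"`); the resource name and sample rows are illustrative:

```python
import dlt

@dlt.resource(table_name="users", write_disposition="append")
def users():
    # A real resource would typically yield pages from an API;
    # static rows keep the sketch self-contained.
    yield [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
    ]

# A pipeline needs only a name, a destination, and a dataset.
pipeline = dlt.pipeline(
    pipeline_name="quickstart",
    destination="duckdb",
    dataset_name="raw",
)
load_info = pipeline.run(users())
print(load_info)  # summarizes what was loaded and where
```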
The most significant technical breakthrough is automatic schema evolution. When data sources change their output format, conventional pipelines break.
“dlt has mechanisms to automatically resolve these issues,” Thierry Jean, founding engineer at dltHub, told VentureBeat. “So it will push data, and you can tell it, ‘Alert me if things change upstream,’ or just make it flexible enough to change the data and the destination to accommodate.”
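To make that behavior concrete, here is a hedged sketch: the second run adds a field the first run never emitted, and dlt evolves the destination table rather than failing. The table and field names are invented; the `schema_contract` settings follow dlt’s documented contract modes.

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="events_demo",
    destination="duckdb",
    dataset_name="raw",
)

# First load: the source emits two fields.
pipeline.run([{"id": 1, "status": "ok"}], table_name="events")

# Later load: the source starts emitting a new "region" field.
# Rather than breaking, dlt adds the column to the destination table.
pipeline.run([{"id": 2, "status": "ok", "region": "eu"}], table_name="events")

# Teams that prefer alerts over silent evolution can tighten the
# schema contract so future unexpected columns raise an error.
pipeline.run(
    [{"id": 3, "status": "ok"}],
    table_name="events",
    schema_contract={"tables": "evolve", "columns": "freeze"},
)
```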
Real-world developer experience
Hoyt Emerson, a data consultant and content creator at The Full Data Stack, recently adopted the library to move data from Google Cloud Storage to multiple destinations, including Amazon S3 and a data warehouse. Traditional approaches would require platform-specific knowledge for each destination. Emerson told VentureBeat he wanted a simple, platform-agnostic way to move data from one place to another.
Working from the library’s documentation, he had the entire pipeline running in five minutes, with no setup hassle.
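A hedged sketch of that kind of move, using the filesystem source bundled with recent dlt releases; the bucket URL is hypothetical, and credentials would come from dlt’s `secrets.toml` or environment variables:

```python
import dlt
from dlt.sources.filesystem import filesystem, read_csv

# List CSV files in a GCS bucket (bucket URL is illustrative).
files = filesystem(bucket_url="gs://example-source-bucket/exports", file_glob="*.csv")

pipeline = dlt.pipeline(
    pipeline_name="gcs_to_warehouse",
    destination="duckdb",  # swap for a warehouse or S3-backed destination
    dataset_name="landing",
)
# Piping the file listing into read_csv parses each file into rows.
pipeline.run(files | read_csv(), table_name="exports")
```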
The process gets even faster when combined with AI coding assistants. Emerson, who was already using AI agents in his coding workflow, realized that dlt’s documentation could be passed to an LLM as context to speed up and simplify his data work. Using the documentation that way, he built reusable templates for future projects and had AI assistants generate deployment configurations.
“It’s extremely LLM-friendly, as it’s well-documented,” he said.
The LLM-native development pattern
The combination of well-documented software and AI assistance represents a new development pattern. The company has observed what it calls “YOLO-style” development, in which developers copy error messages and paste them into AI coding assistants.
“A majority of these users just copy and paste error messages into code editors and AI assistants to work out the cause,” Krzykowski said. The company takes this behavior seriously enough that it now fixes issues specifically for AI-assisted workflows.
The results show the approach is working. In September alone, users created more than 50,000 custom connectors with the library, a 20x increase from January, driven largely by LLM-assisted development.
A technical architecture built for enterprise scale
dlt’s design philosophy emphasizes interoperability over platform lock-in. The tool runs anywhere from AWS Lambda to existing enterprise data stacks, and it works with platforms such as Snowflake while retaining the flexibility to pull from any source.
“We have always believed that dlt must be modular and interoperable,” Krzykowski explained. “It can be deployed anywhere. It can run on Lambda. It is often part of other data infrastructures.”
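A hedged sketch of that Lambda pattern; the handler and event shape are illustrative, and a real deployment would point at a cloud destination rather than a local DuckDB file:

```python
import dlt

def handler(event, context):
    """AWS Lambda entry point that runs a dlt pipeline per invocation."""
    rows = event.get("records", [])  # stand-in for whatever the trigger delivers
    if not rows:
        return {"status": "no data"}
    # DuckDB keeps the sketch self-contained; production would use
    # a warehouse or object-store destination instead.
    pipeline = dlt.pipeline(
        pipeline_name="lambda_ingest",
        destination="duckdb",
        dataset_name="raw",
    )
    load_info = pipeline.run(rows, table_name="events")
    return {"status": "ok", "load_info": str(load_info)}
```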
Key technical capabilities include:
- Automatic schema evolution handles upstream data changes without breaking pipelines or requiring manual intervention.
- Incremental loading processes only new or changed records, reducing compute work and associated costs (see the sketch after this list).
- Platform-agnostic deployment runs on any cloud provider or on-premises infrastructure without modification.
- LLM-optimized documentation, written specifically with AI assistants in mind, enables rapid troubleshooting and template generation.
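As referenced in the list above, incremental loading in dlt is driven by a cursor field. A hedged sketch follows; the API endpoint and `since` parameter are invented, but `dlt.sources.incremental` is the library’s documented cursor mechanism:

```python
import dlt
import requests

@dlt.resource(table_name="issues", write_disposition="merge", primary_key="id")
def issues(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z"),
):
    # dlt persists the cursor between runs, so each run requests only
    # records changed since the last successful load.
    response = requests.get(
        "https://api.example.com/issues",  # hypothetical endpoint
        params={"since": updated_at.last_value},
    )
    yield response.json()

pipeline = dlt.pipeline(
    pipeline_name="incremental_demo",
    destination="duckdb",
    dataset_name="raw",
)
pipeline.run(issues())
```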
The library currently supports more than 4,600 REST API data sources, a catalog that keeps expanding through user-created connectors.
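Many of those connectors follow dlt’s declarative REST API pattern, in which a configuration dictionary maps endpoints to tables. A hedged sketch with an invented base URL and endpoint names:

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Each entry in "resources" becomes a table in the destination.
source = rest_api_source({
    "client": {"base_url": "https://api.example.com/v1/"},
    "resources": ["issues", "comments"],
})

pipeline = dlt.pipeline(
    pipeline_name="rest_api_demo",
    destination="duckdb",
    dataset_name="api_data",
)
pipeline.run(source)
```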
Competing with ETL giants through a code-first approach
The data engineering landscape splits into distinct camps, each serving different enterprise needs and developer preferences.
Traditional ETL platforms such as Informatica and Talend dominate in enterprises that favor GUI-based tools, which require specialized training but deliver comprehensive governance features.
Newer SaaS platforms such as Fivetran have gained traction by emphasizing pre-built connectors and managed infrastructure, reducing operational cost but creating vendor dependency.
The open-source dlt library occupies a different position entirely: an LLM-native, code-first tool that developers can extend and modify.
This reflects the broader shift toward what’s known as the composable data stack, in which enterprises assemble infrastructure from interoperable components rather than monolithic platforms.
Most important, the interplay with AI is creating new market dynamics. “LLMs don’t replace data engineers,” Krzykowski said. “But they significantly increase their productivity and reach.”
What this means for enterprise data managers
For enterprises looking to lead in AI-driven operations, this development presents an opportunity to fundamentally rethink data engineering strategy.
The immediate tactical benefits are clear. Companies can leverage their existing Python developers instead of hiring specialized data engineering teams. Organizations that adapt their tooling and hiring strategies to this shift could gain meaningful cost and agility advantages over competitors still relying on traditional team-based data engineering.
