Datafold Partners With dbt Labs, Launches Integration to Enable Analytics Engineers to Deliver Trusted Data Faster
Datafold, a data quality platform that automates the most tedious parts of data engineering workflows, today announced a partnership with dbt Labs, the pioneer in analytics engineering, along with a new integration to deliver trusted data faster. Datafold’s automated test coverage for analytics engineers can now be added to a company’s CI/CD workflow in one click with dbt Cloud, or with a Python SDK for dbt Core.
“As a data engineer at Lyft, I always struggled with data testing. We wrote hundreds of SQL tests but never got to significant test coverage. Data quality issues would inevitably get into production and affect the business,” said Datafold founder and CEO Gleb Mezhanskiy. “I built Datafold to fully automate data testing so that analytics engineers could see the impact of every pull request on data models and applications before merging to prevent any issues from getting into production.”
Analytics engineers must regularly update models within large, complex schemas without introducing errors. However, they don’t have time to write the thousands of tests needed for complete test coverage across their schemas and pipelines. As a result, many model updates are merged without full confidence in how the changed dbt code will affect the data.
Datafold automates writing these thousands of regression tests, so engineers know exactly what will happen to the data before they merge their update. Datafold embeds a summary of these automated tests directly in GitHub and GitLab, so engineers can see the impact in every pull request.
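As a rough illustration of the kind of comparison a data diff performs, the sketch below compares two versions of the same table, for example the production build and the build produced by a pull request, and reports added, removed, and changed rows. It is a minimal in-memory example, not Datafold's implementation or API; the function name `data_diff`, the `order_id` key, and the sample rows are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def data_diff(before: List[Row], after: List[Row], key: str) -> Dict[str, Any]:
    """Compare two versions of a table, matching rows on a primary key.

    Hypothetical helper for illustration only; not part of any Datafold or dbt API.
    """
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}

    # Keys present in only one version are added or removed rows.
    added = sorted(after_by_key.keys() - before_by_key.keys())
    removed = sorted(before_by_key.keys() - after_by_key.keys())

    # For keys present in both versions, record each column whose value differs
    # as a (before, after) pair.
    changed: Dict[Any, Dict[str, tuple]] = {}
    for k in before_by_key.keys() & after_by_key.keys():
        old, new = before_by_key[k], after_by_key[k]
        columns = {
            col: (old.get(col), new.get(col))
            for col in old.keys() | new.keys()
            if old.get(col) != new.get(col)
        }
        if columns:
            changed[k] = columns

    return {"added": added, "removed": removed, "changed": changed}


# Made-up sample data: the table as built in production vs. by a pull request.
prod = [
    {"order_id": 1, "amount": 10.0, "status": "paid"},
    {"order_id": 2, "amount": 25.0, "status": "paid"},
    {"order_id": 3, "amount": 7.5, "status": "refunded"},
]
dev = [
    {"order_id": 1, "amount": 10.0, "status": "paid"},
    {"order_id": 2, "amount": 20.0, "status": "paid"},
    {"order_id": 4, "amount": 3.0, "status": "paid"},
]

report = data_diff(prod, dev, key="order_id")
print(report)
```

A summary like `report` is the sort of information that could be surfaced in a pull request so reviewers see the data impact before merging. A production tool would run the comparison inside the warehouse rather than in memory.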
“Improved data quality is one of the primary benefits of standardizing on dbt. Datafold’s data diff in continuous integration checks and fine-grained column-level lineage on top of dbt models augments this experience for analytics engineers,” said Julia Schottenstein, product manager at dbt Labs. “We’re excited to further our partnership with Datafold and help customers gain confidence in their data.”
dbt enabled the data community to build useful models easily in data warehouses, creating a strong foundation for building on top of the warehouse. Within the past few years, companies have gone from building only dashboards to building notebooks, apps, ML/AI, and reverse ETL on the warehouse. Because so much more now depends on the warehouse, data quality has become a focus.
Datafold built column-level lineage at scale, which it uses to give analytics engineers complete visibility into how their work impacts their pipelines. This allows analytics engineers to fix data quality issues before they reach production. Working together, dbt and Datafold deliver trusted data faster.
“The integration between dbt and Datafold is a game-changer,” said Josh Devlin, senior analytics engineer, Brooklyn Data Co. “There is so much value in actually understanding the effect of your pull request. It’s easy to set up, and it gives me the confidence that my dbt code does what I expect it to do.”