Dremio’s new product capabilities and expanding ecosystem combine data warehouse functionality and performance with the scale and cost advantages of a data lake
Dremio, the easy and open data lakehouse, announced key features for writing and updating data, enhanced support for semi-structured data, and expanded BI and data ecosystem integrations that mark a turning point in data lakehouse evolution. On the heels of its inclusion on CNBC’s unranked list of the 25 Top Startups for the Enterprise and its general availability (GA) support for the milestone Apache Iceberg 1.0 release, Dremio is ensuring easy, fast self-service analytics, with data warehouse functionality and data lake flexibility, across all of its customers’ data.
“Dremio’s Apache Arrow-based query engine and patent-pending query acceleration technology, Data Reflections, have enabled companies to achieve sub-second performance and 1/10th the cost of cloud data warehouses when querying data,” said Tomer Shiran, co-founder and CPO at Dremio. “With Dremio’s recently released comprehensive support for Apache Iceberg 1.0, these benefits now span the full spectrum of data warehousing use cases, enabling companies to shift their strategy from a proprietary and expensive data warehouse to a flexible and open data lakehouse.”
Mature product functionality is powering increased data lakehouse adoption
With game-changing SQL improvements and new capabilities for performance, usability, security, and ecosystem connectivity, Dremio’s open data lakehouse is now even better positioned to be adopted as a core part of modern analytics architectures. Highlights include:
DML and time travel – GA support for DML operations (INSERT, UPDATE, DELETE) on Apache Iceberg tables, along with time travel for querying historical data in place, means that Dremio has established a key pillar of data lakehouse operation, disrupted data lake innovation, and provided functionality previously found only in database and data warehouse technologies.
Additional new SQL functionality – Dremio software and Dremio Cloud now support a semi-structured MAP data type for querying map data in Apache Parquet files, Apache Iceberg tables, and Delta Lake tables. Other updates include improvements to the MERGE statement and the FROM clause, as well as to scalar SQL UDFs, tabular UDFs, LISTAGG, the QUALIFY clause, and LIKE ANY/ALL/SOME predicates.
Security enhancements – These include row- and column-level, policy-defined access control for users, new RBAC privileges for admin operations, and encryption of the project store (S3 buckets) with customer-managed keys.
Performance improvements – Dremio is adding AWS Graviton2 support as a new option for customers on AWS, as well as spillable hash joins, which allow a join operator to spill to disk when the build side of the join does not fit in memory.
Usability updates – A functions list provides users with a searchable list of supported SQL functions and the syntax and description of each. Function syntax from this component can be added to the SQL runner with one click.
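The spillable hash join mentioned under performance improvements is a general database technique. As an illustration only, here is a minimal Python sketch of the textbook idea (a grace-style hash join), not Dremio’s implementation. If the build side exceeds a memory budget, both inputs are hash-partitioned into temporary files, and each pair of matching partitions is then joined in memory. All function names are hypothetical, rows are assumed to be tuples, and `max_build_rows` is a hypothetical stand-in for a real memory limit.

```python
import itertools
import os
import pickle
import tempfile


def _partition_to_disk(rows, key, num_partitions, directory, prefix):
    """Stream rows into num_partitions spill files, chosen by hash(key(row))."""
    paths = [os.path.join(directory, f"{prefix}_{i}.pkl") for i in range(num_partitions)]
    files = [open(path, "wb") for path in paths]
    try:
        for row in rows:
            # Equal keys hash identically, so matching rows from both inputs
            # always land in partitions with the same index.
            pickle.dump(row, files[hash(key(row)) % num_partitions])
    finally:
        for f in files:
            f.close()
    return paths


def _read_spill(path):
    """Yield rows back from a spill file one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def _in_memory_join(build, probe, build_key, probe_key):
    """Classic hash join: hash the build side, then stream the probe side."""
    table = {}
    for row in build:
        table.setdefault(build_key(row), []).append(row)
    for row in probe:
        for match in table.get(probe_key(row), ()):
            yield match + row  # rows are tuples, so + concatenates columns


def spillable_hash_join(build, probe, build_key, probe_key,
                        max_build_rows, num_partitions=8):
    """Join in memory if the build side fits; otherwise spill to disk first.

    max_build_rows is a hypothetical stand-in for a real memory budget.
    """
    build = iter(build)
    buffered = list(itertools.islice(build, max_build_rows + 1))
    if len(buffered) <= max_build_rows:
        # Build side fits within the budget: no spilling needed.
        yield from _in_memory_join(buffered, probe, build_key, probe_key)
        return
    # Build side is too large: partition both inputs to temporary files,
    # then join each pair of matching partitions in memory.
    with tempfile.TemporaryDirectory() as tmp:
        build_paths = _partition_to_disk(itertools.chain(buffered, build),
                                         build_key, num_partitions, tmp, "build")
        probe_paths = _partition_to_disk(probe, probe_key, num_partitions, tmp, "probe")
        for build_path, probe_path in zip(build_paths, probe_paths):
            yield from _in_memory_join(_read_spill(build_path),
                                       _read_spill(probe_path),
                                       build_key, probe_key)
```

The sketch assumes each spilled partition fits in memory; production engines typically repartition recursively or switch strategies when a partition is still too large. Output order also differs between the in-memory and spilled paths, so callers should not rely on it.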
Dremio continues to expand its ecosystem with technology partners, adding capabilities that enhance the open data lakehouse experience. These include Single Sign-On (SSO) with the leading data visualization platforms Tableau (Salesforce) and Power BI (Microsoft), which delivers granular access control and visibility into data consumption.
A new partner-validated connector for dbt enables data teams to quickly and easily build production-grade data pipelines using SQL. Dremio has also added a native Snowflake connector for querying data in Snowflake, along with native connectors for MongoDB, DB2, OpenSearch, and Azure Data Explorer. Other enhancements include Arrow Flight ODBC and JDBC drivers.
Commitment to open source and community remains uncompromised
Dremio has always been deeply involved in the open source projects that power data independence and use, and it incorporates all of the latest functionality from Apache Iceberg, Apache Parquet, Apache Calcite, Apache Arrow, Apache Arrow Flight, and Gandiva. Dremio continues to be a key contributor to these projects.
Dremio co-created Apache Arrow, which has become the industry-standard in-memory columnar format for analytical systems. Arrow is downloaded over 60 million times each month and is embedded in some of the world’s largest analytical projects.
Dremio also offers a fully managed service in which Dremio handles all administration (including updates, security, and uptime) as well as autoscaling to meet workload demands. Dremio’s open data lakehouse can be deployed anywhere, including on public cloud providers and on-premises infrastructure.
In addition to 2022’s major product and ecosystem accomplishments, Dremio started the year with its $160M Series E round, which doubled the company’s valuation to $2B. It then made Dremio Cloud generally available in March and brought it to the AWS Marketplace at the end of the summer.