MapR Advances Support for Flexible and High Performance Analytics on JSON and S3 Data with Apache Drill
Drill 1.15 Delivers Enhanced S3 Connectivity, Expanded SQL Capabilities on Nested Data and Improved Performance in Multi-Tenant Enterprise Environments
MapR Technologies, Inc., provider of the industry’s next-generation data platform for AI and Analytics, announced support for Apache Drill 1.15. The new release adds enhancements for running powerful queries on highly complex nested data structures in files, MapR JSON database tables and cloud data sources, specifically S3.
“The latest Drill release is aimed at further improving intuitive access to different data types across on-premises and cloud data sources as well as enhancing performance and usability,” said Neeraja Rentachintala, vice president of product management, MapR. “We evolved Drill by closely listening to our customers, and it is exciting to see our customers achieve true self-service data exploration without compromising on analytic flexibility and performance.”
Drill 1.15 expands ANSI SQL compliance and improves query performance for both Parquet files and MapR-DB JSON tables. The new release also makes it easier to deploy Drill in multi-tenant environments alongside other analytic frameworks such as Hive and Spark while maintaining predictable SLAs, so teams can run interactive analytics at any scale.
With today’s release, Drill 1.15 introduces the following features:
- S3 Plug-In Support. Customers can now access data in S3 through Drill and join it with other supported data sources such as Parquet, Hive and JSON, all in a single query. Drill can also write to S3 buckets by creating tables in them. With read and write access to S3 buckets, MapR continues to integrate with cloud applications and extends its existing object tiering offering.
- Expanded Spill to Disk Capability. Spill to disk for memory-intensive queries now covers all SQL operations that rely on memory, such as GROUP BY, JOIN, ORDER BY and DISTINCT. Memory controls can now be set so that large, memory-intensive queries spill to disk once they exceed a defined threshold.
- Spin Up Multiple Drill Clusters and Set Resource Controls. Customers can now spin up multiple Drill clusters within a single shared MapR cluster to support multi-tenancy. Each Drill cluster can cater to a different user persona, isolating Drill compute workloads with guaranteed minimum resources, and CPU resource limits can be set through cgroups.
- Leverage MapR Document Database Secondary Indexes for Complex Types. Secondary indexes can now be created on complex nested types such as MAPs and ARRAYs. Drill can now leverage indexes on an entire array, individual array elements, an entire MAP or individual MAP elements, whether those elements are primitive or complex.
- Deeper Integration with Parquet. Filters on strings can now be pushed down to the underlying Parquet API, so scanning Parquet files returns only the rows that match the query predicates. Pushdown also applies to a broader range of queries, such as joins with predicates on only one of the joined tables and predicates on complex nested types.