New O’Reilly Report Explores Tools and Best Practices for Advanced Analytics and Artificial Intelligence

Global Survey Sheds Light on Evolving Data Infrastructure as Enterprise Organizations Get Serious about Machine Learning and AI

O’Reilly, the premier source for insight-driven learning on technology and business, announced the results of its “Evolving Data Infrastructure” survey, which explores the tools companies are using for their advanced analytics and Artificial Intelligence (AI) projects – and the best practices they have acquired along the way.

The research, which will be released in full at O’Reilly’s upcoming Strata Data Conference in San Francisco, found that more than half (58 percent) of companies surveyed are either building or evaluating data science platforms, which are essential for companies keen on growing their data science teams and machine learning capabilities. It also found that 85 percent of companies already have data infrastructure in the cloud.

Other key findings from the research include:

  • Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI. These include data integration and Extract, Transform and Load (ETL) (60 percent), data preparation and cleaning (52 percent), data governance (31 percent), metadata analysis and management (28 percent) and data lineage management (21 percent).
  • Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure with at least one of the seven top cloud providers, with nearly two-thirds (63 percent) using Amazon Web Services (AWS). The results also showed that users of AWS, Microsoft Azure or Google Cloud Platform (GCP) tended to use multiple cloud providers.
  • The use of durable cloud storage is prevalent. Sixty-two percent of all respondents indicated they used at least one of the following: Amazon S3 or Glacier, Azure Storage, or Google Cloud Storage.
  • Data scientists and data engineers are in demand. When asked what skills their teams needed to strengthen, 44 percent said data science and 41 percent said data engineering.
  • Respondents used a variety of streaming and data processing technologies. About half of the respondents (49 percent) used either Apache Spark or Spark Streaming, while other popular tools included open source projects (Apache Kafka, Apache Hadoop) and their related managed services in the cloud (Amazon Elastic MapReduce, Amazon Kinesis).
  • Business intelligence uses a mix of open source and managed services. When it comes to SQL, respondents favored open source tools (Spark SQL, Apache Hive) and managed services in the cloud (Amazon Redshift, Google BigQuery).
  • Although a majority (60 percent) aren’t using serverless technologies, nearly one-third (30 percent) are already using AWS Lambda. Overall, 38 percent indicated that they were using at least one serverless technology – a pattern that remained consistent across geographic regions.

“It is clear that in 2019 companies are planning to invest in implementing analytics, AI and automation tools,” said Ben Lorica, O’Reilly’s chief data scientist and chair of the Strata Data Conference. “However, in order to do so successfully, initial investments must be made in the foundational technologies and infrastructure needed to sustain success. Our research shows that a majority of companies understand this and are already building – or at the very least evaluating – platform solutions and tools to make this possible.”