The People-Based Data Lake – A More Nimble Marketing Data Solution
People-based marketing is only as good as the data that feeds the process. This has been true since the early days of database marketing, and it will be true for the most sophisticated artificial intelligence (AI) solutions of the future.
Traditional Marketing Databases
Historically, marketing data repositories have been built using data warehousing tools and techniques. The approach relies on a carefully designed data model intended to serve both marketing execution, via connected campaign management tools, and analytic activities that leverage business intelligence and statistical modeling tools. Central to this approach was the belief that a single, well-designed database could support these different user communities and use cases. Data geeks, feel free to argue amongst yourselves as to whether that dream has ever been fully realized. Looking forward, though, we’ll need a nimbler approach.
Why Marketing Databases Struggle in the Digital Age
The traditional marketing database approach struggles in the fast-paced digital age for two broad reasons:
1. They’re too slow to meet the needs of marketers who need to incorporate new data streams in days rather than weeks or even months.
Ten years ago, it was rational to assume that the set of data sources for a marketing database would remain relatively static over time. That assumption is no longer valid. If you have read anything about marketing data over the past few years, then you have already seen the grounding for this point of view: new devices and marketing channels; the explosion of digital data sources and providers; and IoT – just to name a few. Even in the direct mail world – and yes, certain industries still rely on direct mail – marketers are always looking for an edge, and that means constantly testing the effectiveness of new third-party demographic data and lists.
2. The marketing software tools landscape has evolved in a way that makes the “one database” approach impractical, if not impossible.
Marketing software suites such as Adobe Marketing Cloud and Salesforce Marketing Cloud offer the ability to react in near real time to consumers’ activities, especially online. However, these modern tools also require a dedicated database, including at least some proprietary data layouts.
Marketing business intelligence (BI) is most effective when supported by data that has been organized for BI use cases. Those optimal data structures will be different than those leveraged by the marketing clouds.
Repurposing the Marketing Database
Organizations have attempted to deal with reason #2 above by repurposing marketing databases as data hubs from which data can be distributed to various tool-specific environments and marts. While this is a rational approach for dealing with the requirements for multiple tool-specific data stores, it does little to address reason #1, the need for significantly faster time to value for new or changing data sources.
People-Based Data Lake
A better solution to the traditional database challenge is to deploy a people-based data lake (PBDL). A PBDL is a foundational data repository and a set of associated processes for enabling all the activities of people-based marketing. It is based on the principles of “early linking” and “late structuring.”
The “lake” aspect of the PBDL is based on the idea that there is tremendous value in simply housing all your data in the same place. Data lakes are part of the hype associated with the advent of big data platforms, typically based on Hadoop. A PBDL does not require a Hadoop-based solution, but it does borrow the idea that half the battle is getting all the data in one place so that analytical people can make sense of it.
Data lakes are different from data warehouses in that input data is not conformed to a centrally planned data model. In fact, input data is generally not transformed or conformed in any way and can be made available to users in hours or days, rather than months.
Late structuring holds that transformation (structuring) of data should take place on the way out of the data lake rather than on the way in. Because those transformations are built to the clear requirements of the consuming application (e.g., a marketing software suite or a BI tool), the time required to analyze and design the target data structure is virtually eliminated, relative to the time required to build and then continuously extend a centrally managed data model.
With this methodology, data professionals are no longer burdened with the impossible task of anticipating and designing for all possible future use cases for data being stored in the core repository.
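To make late structuring concrete, here is a minimal Python sketch. Everything in it is hypothetical: the RAW_LAKE records, their field names, and the two output shapes are invented for illustration and do not reflect any particular product's layout. The point is only that the records are stored as they arrived, and each consumer gets its own small transformation on the way out.

```python
from collections import Counter

# Hypothetical raw records, landed in the lake exactly as received.
# Field names differ by source because nothing was conformed on the way in.
RAW_LAKE = [
    {"source": "web", "email": "ann@example.com", "page": "/pricing", "ts": "2024-01-05"},
    {"source": "web", "email": "bob@example.com", "page": "/home", "ts": "2024-01-06"},
    {"source": "store", "EmailAddr": "ann@example.com", "amount": 42.50, "date": "2024-01-07"},
]


def to_campaign_rows(lake):
    """Shape data for a hypothetical campaign tool: one row per web visit."""
    return [
        {"recipient": r["email"], "last_page": r["page"]}
        for r in lake
        if r["source"] == "web"
    ]


def to_bi_summary(lake):
    """Shape the same data for a hypothetical BI report: record counts per source."""
    return dict(Counter(r["source"] for r in lake))


campaign_rows = to_campaign_rows(RAW_LAKE)
bi_summary = to_bi_summary(RAW_LAKE)
```

Adding a new consumer in this sketch means writing one more outbound function; the landed records and the existing functions are left alone.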
If late structuring represents the data lake aspect of PBDL, then early linking is the people-based part. Input data to a people-based marketing repository, as you might expect, is about people. While the generic data lake approach says to incorporate data in its native form, for a people-based data lake, there is huge value in one important exception to that rule – identity linking. All person-level datasets in the PBDL must share common person identifiers (keys).
With these common person keys in place, users of the data can easily connect data about individuals even when that data comes from widely disparate sources. Early linking is the assignment of these keys.
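As a rough illustration of early linking, the Python sketch below stamps a shared person_key onto every record as it lands, leaving the rest of each record in its native form. The match rule here (trimmed, lowercased email) is a deliberate oversimplification chosen for this sketch, and the function names, field names, and key format are all hypothetical. Real identity resolution is far more involved than an exact email match.

```python
import itertools


def normalize_email(value):
    """Naive match rule (an assumption for this sketch): trim and lowercase."""
    return value.strip().lower()


def assign_person_keys(datasets, email_fields):
    """Add a shared 'person_key' to every record across all datasets.

    datasets: dict of dataset name -> list of record dicts
    email_fields: dict of dataset name -> name of that dataset's email field
    """
    key_by_email = {}
    counter = itertools.count(1)
    linked = {}
    for name, records in datasets.items():
        field = email_fields[name]
        linked[name] = []
        for record in records:
            email = normalize_email(record[field])
            if email not in key_by_email:
                # First time this person is seen: mint a new key.
                key_by_email[email] = f"P{next(counter):06d}"
            # Keep the record in its native form and only add the key.
            linked[name].append({**record, "person_key": key_by_email[email]})
    return linked


# Hypothetical input: two sources with different layouts for the same person.
datasets = {
    "web": [{"email": "Ann@Example.com", "page": "/pricing"}],
    "store": [
        {"EmailAddr": " ann@example.com", "amount": 42.5},
        {"EmailAddr": "bob@example.com", "amount": 10.0},
    ],
}
linked = assign_person_keys(datasets, {"web": "email", "store": "EmailAddr"})
```

After this step, a consumer can join the web and store records on person_key without knowing that one source calls the field email and the other EmailAddr.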
Justification for Early Linking
Accurate matching of person data across datasets is inherently complex. It requires sophisticated, purpose-built algorithms and tooling that often rely on external reference bases to augment the matching process. This inherent complexity – and the fact that linkage at the person level is essential to building the rich individual profiles needed for effective people-based marketing – justifies this one exception to the “just land the data” part of the data lake philosophy.
The emergence of the data lake pattern was driven by the belief that data scientists and data analysts could innovate much more quickly and effectively, if only we could get all the data in one place. For them, the value of traditional data warehouses has always been data co-location, not the centrally-planned data model.
Likewise, in the marketing data space, rapid data co-location is a significant accelerator. The goal is to get all the data together quickly and let the innovators start innovating. The people-based data lake embraces this philosophy of data co-location with minimal central planning. Additionally, the PBDL provides data consumers with one all-important value-add: a common person key across all datasets. This early linking makes downstream data usage both faster and more consistent by eliminating the complex and error-prone task of manually connecting persons across disparate data sources.
In short, unlike traditional marketing databases, PBDLs are designed to keep pace with the rapidly evolving and expanding data demands of the modern marketer.