As organizations acquire more and more data, there’s a greater need to control it. In the data game, that control is known as governance – and proper governance is quickly becoming a priority in many organizations. The reasons for that include the increasingly lengthy, laborious process of searching for and utilizing data in reports, presentations, etc.; the increased demands of regulators, who require companies to quickly come up with information to prove that they are following the rules; and a need to be more versatile and flexible in order to quickly act on business opportunities and solve problems.
Increasingly, companies are coming to understand that proper governance involves implementing solutions that ensure metadata is accurate across the entire span of their data storage systems. Order in the metadata (the data that allows companies to quickly find the data they need) means order in the organization, companies are discovering.
Indeed, a recent study by Gartner shows that more companies are implementing metadata solutions, especially automated solutions, to help them with data governance. In fact, 50% of those who told Gartner they were implementing those solutions said their foremost use case was data governance.
The Gartner study also said that for many companies, the idea of acquiring and implementing metadata solutions was “in its infancy.” “As a result, ‘product functionality and performance’ followed by ‘product roadmap and future vision’ were the two leading factors that drove product selection among survey respondents,” the report said.
Thus, companies seeking to improve their metadata “game” will want to find solutions that lay out best practices, work effectively, allow for expansion as data loads increase, and automate the process. But before jumping ahead into a solution, companies need to understand what specifically they are going to be solving – and thus need to be clear on what they can expect metadata solutions to fix.
An important piece of this is ensuring the integrity of Data Dictionaries, Data Catalogs, and Business Glossaries. Data is utilized and categorized in different ways within an organization – a situation that complicates data governance. In order to find something, you have to know what you are looking for – and if different departments or units use different definitions or terms in their metadata categorization, finding that data is going to be far more challenging. Everyone has to be on the “same page” data-wise, and in order to avoid wasted time, money, and resources, each of the definition systems for metadata needs to be implemented properly.
Data Catalog: This term describes the metadata in a single database, defining things such as base tables, views, synonyms, and indexes. The SQL standard provides guidance on how to address this. Properly done, a data catalog will use the same terms (schema) in each database so that each category uses a common term within the database. Metadata solutions implemented by companies would need to ensure that data catalogs hold to single standards across all databases. To have the greatest impact, those solutions should be automated, as an automated system would be much more efficient at understanding which terms are currently being used, and which ones need to be adjusted to comply with the standards.
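To make the idea of a data catalog concrete, here is a minimal sketch in Python that reads the catalog of a single database. It uses SQLite only because the standard-library sqlite3 module needs no setup; SQLite exposes its catalog through the sqlite_master table rather than through the INFORMATION_SCHEMA views defined by the SQL standard, so the query would differ on other systems. The table, index, and view names are hypothetical.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a hypothetical table, index, and view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, dob TEXT)")
    conn.execute("CREATE INDEX idx_customers_dob ON customers (dob)")
    conn.execute("CREATE VIEW customer_dobs AS SELECT dob FROM customers")
    return conn


def list_catalog(conn):
    """Return (object type, object name) pairs recorded in SQLite's catalog table."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"
    ).fetchall()
    return [(row[0], row[1]) for row in rows]


catalog = list_catalog(build_demo_database())
for object_type, name in catalog:
    print(object_type, name)
```

An automated metadata solution would presumably run a query of this kind against every database and compare the names it finds with the agreed standard terms.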
Data Dictionary: A data dictionary is a centralized repository of information about data (meaning, relationships to other data, origin, usage, and format), including all technical metadata from database sources, ETL systems, and data warehouses (DWH). Properly done, a data dictionary will provide IT frameworks by defining where a data term fits into the overall structure and what values it may contain, and it will enable quick searches of all databases using any of the terms or attributes the organization utilizes. Here, too, a metadata solution will need to ensure that common terms are used in databases across the organization and, as mentioned, automated systems will be much more effective at discovering all uses of terms.
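As a rough illustration of what a data dictionary holds, the sketch below models one entry as a Python dataclass and looks entries up by the standard term or by any alias still in use. The field names, the sample entries, and the find_entries helper are assumptions made for this example, not the structure of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    term: str          # the organization-wide standard term
    meaning: str       # what the data represents
    origin: str        # system the data comes from
    data_format: str   # how values are stored
    aliases: list = field(default_factory=list)  # other names found in databases


def find_entries(entries, keyword):
    """Return entries whose standard term or any alias equals keyword (case-insensitive)."""
    needle = keyword.strip().lower()
    matches = []
    for entry in entries:
        names = [entry.term] + entry.aliases
        if any(needle == name.lower() for name in names):
            matches.append(entry)
    return matches


# Hypothetical sample entries
entries = [
    DictionaryEntry(
        term="age",
        meaning="Age of the customer, derived from the date of birth",
        origin="CRM database",
        data_format="ISO 8601 date of birth",
        aliases=["date of birth", "DOB"],
    ),
    DictionaryEntry(
        term="customer_id",
        meaning="Unique identifier of a customer",
        origin="CRM database",
        data_format="integer",
    ),
]

for entry in find_entries(entries, "dob"):
    print(entry.term, "<-", entry.aliases)
```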
Business Glossary: A business glossary defines terms across a business domain, providing an authoritative source for all business operations, and puts the terms used in the data dictionary into context. It includes all metadata from data warehouses as well as from all reports and is not limited to BI (Business Intelligence). The terms used in a data dictionary allow for quick and accurate searches of data throughout all data storage repositories; a business glossary – a collection of business terms that should be common to all reports, queries, publications, etc. – lets users put the relevant data into context. Here especially, an automated metadata solution will be useful in ensuring a unified approach throughout the organization to defining terms and ensuring a single source of truth for the business.
Thus, if a report were to include how many “adults” (a business glossary term) are using the company’s products, an employee would put together a search query for company databases that includes an equation relating to “age,” the term used by the data dictionary to describe age – and the data catalog in each database would “understand” that the metadata categories of date of birth, or DOB, or mm/dd/yy, etc. are all associated with “age.”
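The chain in this example, from business glossary term to data dictionary term to the column each database actually uses, can be sketched in Python as follows. Everything here is hypothetical: the glossary rule that “adults” means an age of at least 18, the database and column names, and the assumption that birth dates are stored as ISO 8601 text so that they compare correctly as strings.

```python
from datetime import date

# Business glossary: business term -> rule stated in data dictionary terms (hypothetical)
BUSINESS_GLOSSARY = {"adults": {"dictionary_term": "age", "min_value": 18}}

# Per-database catalog: dictionary term -> column name used in that database (hypothetical)
CATALOG_COLUMNS = {
    "crm": {"age": "date_of_birth"},
    "billing": {"age": "DOB"},
}


def birth_date_cutoff(min_age, today):
    """Latest birth date that makes a person at least min_age years old on today."""
    try:
        return today.replace(year=today.year - min_age)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - min_age, day=28)


def build_query(glossary_term, database, table, today):
    """Translate a glossary term into a parameterized count query for one database."""
    rule = BUSINESS_GLOSSARY[glossary_term]
    column = CATALOG_COLUMNS[database][rule["dictionary_term"]]
    cutoff = birth_date_cutoff(rule["min_value"], today)
    # Identifiers come from the trusted mappings above, never from user input.
    sql = f'SELECT COUNT(*) FROM {table} WHERE "{column}" <= ?'
    return sql, (cutoff.isoformat(),)


sql, params = build_query("adults", "billing", "customers", date(2024, 6, 15))
print(sql, params)
```

The report author works only with the glossary term; the two mappings absorb the differences between databases, and keeping mappings like these accurate is the kind of work an automated metadata solution is meant to take on.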
Automated metadata solutions can be the key to sorting all this out, and any metadata solution should include these features. Such solutions can indeed be the key to data governance success, says Gartner. Companies should “start developing a metadata management practice in support of [their] data and analytics strategy. Do not embark on a process of selection without identifying how the solution will deliver business value and who will be using it and benefiting from its implementation.” It sounds like very sage advice.