There are two broad categories of metadata: technical and business. Technical metadata is the description of the data needed by various tools to store, manipulate, or move data. These tools include relational databases, application development tools, database query tools, data modeling tools, data extraction tools, online analytical processing (OLAP) tools, and data mining tools.
Business metadata is the description of the data needed by business users to understand the business context and meaning of the data. Although the business community has been slow to accept metadata, business metadata is as important as technical metadata. In simple terms, business metadata allows business users to use the data more effectively. The questions are where to begin collecting business metadata and how to integrate it with technical metadata. In this multi-part article we will focus on various areas of business metadata and how it integrates with technical metadata.
Business metadata can include, but is not limited to, information about corporate business elements, facts, processes, sub-processes or activities, data quality metrics, business rules, legal aspects of business elements, business events, associations of the business elements to various roles, unstructured information (captured in emails, Word documents, etc.), and much more. As a starting point, let’s discuss the most basic kind of business metadata, the kind surrounding business element definitions and facts, and how it integrates with the technical metadata.
What is meant by business elements?
Business elements are key business attributes as identified by the business processes and compiled in various subject areas of logical data models. These elements are compiled from extensive experience of various business architects/analysts.
What is a good source of these elements?
Large organizations have multiple logical data models and business process models; business elements are the collection of all the unique logical data elements pulled out of these models. Ideally, these elements should have consistent definitions and characteristics and should be defined once at an enterprise level. Some organizations maintain business information dictionaries or business definition dictionaries to document these. The definitions can also be extracted from industry standards wherever available. For example, in the secondary mortgage business, the Mortgage Industry Standards Maintenance Organization’s (MISMO) mission is to develop, promote, and maintain voluntary electronic commerce standards for the mortgage industry.
For a business element, additional characteristics should be captured, such as:
Business Element Name(s) short & long
Well-defined short and long business names.
Business definition of an element as used by the organization. This definition could be extracted from an industry standard (if available). In cases where the industry standard definition is not available, the definition should be carefully drafted. There are several good articles that have documented guidelines on how to write business element definitions. If the internal definition differs from the industry definition, both definitions should be made available in the central repository with the appropriate source identified.
Business element aliases as used by different applications or systems. These aliases could be Element Screen Labels, XML Tags, Reporting Labels, etc.
Most commonly used acronym (if any) for the business element.
Business element synonyms as used by different divisions or departments.
Certain business elements are critical to the organization in terms of financial reporting or Key Performance Indicators (KPIs). These elements should be flagged critical for many reasons including addressing compliance issues. There are many more characteristics that can be tracked in terms of criticality, e.g. source of critical element, reason for criticality, effective date, expiration date, etc.
Frequency of use
How often is the element reported or used in various associated processes?
Business element attribute used to measure the organization’s faith in a specific data field. Usually expressed in a three-level scale (poor, moderate, high).
Which business area or division identified the need for this business element?
Is the element classified as “secure?” Furthermore, is the classification internal or external to the organization? This classification helps determine what level of security is required for the data element and any transactions pertaining to it. Example values are: 1 = Highly Confidential (should be restricted for both internal and external viewing); 2 = Moderately Confidential (should be highly restricted for external parties and not as restricted for internal parties); 3 = Public (anyone can view or read the data).
Promoted Physical Name & Characteristics
Promoted physical name and characteristics that should be used by the data architects to ensure all physical models are consistent.
Element Creation Date
Date when the element was first introduced in the organization’s vocabulary of business elements.
Element categorization with buckets defined as “Technical Attribute,” “Risk Management Attribute,” “Operational Attribute,” “Legal Attribute,” etc.
A common problem in large organizations is an inconsistent set of valid values relating to enumerated elements. Several values and codes are defined across departments or divisions, which causes confusion. For example, the list of product(s) or product codes should be defined in the central repository once. Any new data model should extract these values from the central repository. Sometimes it is difficult to define a common set of values, as there might be a requirement that the same code might need to be used in different departments or divisions with different characteristics. In these cases, a source name (e.g., department or division) should be tracked in the central repository.
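The idea of defining valid values once, while still tracking a source name for department-specific variants, can be sketched as follows. This is only an illustration under stated assumptions: the class names, the `ENTERPRISE` marker, and the fallback rule (use the department's variant if one exists, otherwise the enterprise-wide value) are hypothetical and are not prescribed by the article.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical marker for values that apply to the whole organization.
ENTERPRISE = "ENTERPRISE"


@dataclass(frozen=True)
class ValidValue:
    element: str               # business element name, e.g. "Product Code"
    code: str                  # the enumerated code
    meaning: str               # business description of the code
    source: str = ENTERPRISE   # department or division that owns this variant


class ValidValueRegistry:
    """Central list of valid values, keyed by element, code, and source."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str, str], ValidValue] = {}

    def register(self, value: ValidValue) -> None:
        key = (value.element, value.code, value.source)
        if key in self._values:
            # Each (element, code, source) combination is defined only once.
            raise ValueError(f"{key} is already defined")
        self._values[key] = value

    def lookup(self, element: str, code: str,
               source: str = ENTERPRISE) -> Optional[ValidValue]:
        # Assumed rule: prefer the source-specific variant, then fall back
        # to the enterprise-wide definition.
        specific = self._values.get((element, code, source))
        if specific is not None:
            return specific
        return self._values.get((element, code, ENTERPRISE))
```

A new data model would then call `lookup` with its own department name rather than maintaining a private code list.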
There can be multiple potential issues associated with an element at any given time. For example:
- The definition of a particular element has been changed in the industry, but the company still uses the old definition, or
- A required data element has a high percentage of “nulls” in a database which was identified as the system of record for that element.
If there are any issues, the elements should be flagged to alert users to be careful while using these elements.
Element-level risks can be tracked and categorized as high, medium, or low, or as level 1, level 2, or level 3, depending on the industry. The meta-metadata about the categories should also be tracked in the central metadata repository (CMDR). If we say a particular element is high-risk, what does that mean to a technical or business user?
Roles, including element owner(s), advocate, steward, and Subject Matter Expert(s) (SME), should be tracked at an element level. These roles are critical for maintaining the accuracy of what these elements mean and how they are used in the organization.
An owner is one who actually creates the data pertaining to the elements he or she is responsible for, and has the right to change the data.
A steward is one who is called upon to exercise responsible care over possessions entrusted to him or her. He or she does not own these possessions. Stewards are responsible for ensuring the definitions are complete and accurate. Ideally there should be only one steward per element. Stewards can also be identified at the subject area level, in which case they are responsible for all the business elements in that subject area.
An SME or subject matter expert has the greatest knowledge as to how the elements are being, or should be, used in various applications.
Element Physical Implementations
In an ideal world, all physical data models should be generated from logical data models, and all logical models should be in sync in terms of logical names and characteristics. However, in the real world, some business elements are implemented in multiple logical data models with different names and characteristics. Similarly, various physical data models define some business elements with different physical names and physical characteristics. Some data architects take a shortcut and skip the logical models altogether. All of these scenarios mandate a need to associate the business elements to all of the physical instances and implementations in the central repository. This can be a challenging task, but it is also a key integration point between business metadata and technical metadata. Assuming that all the physical data models were imported into the central repository, associations need to be made between each business element and all of its possible physical implementations across all physical data models (or databases, if you load database catalogs in the repository). Some experts refer to this as “stitching” and argue that this is an easy step. Depending on how bad the situation is in your organization, this can be the most daunting task. It is essential that these associations are accurate so that the repository can provide proper impact analysis reports.
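To make the "stitching" idea concrete, here is a minimal sketch of an index that associates a business element with its physical implementations and answers a simple impact analysis question. The names `PhysicalColumn` and `StitchingIndex` are hypothetical; a real repository would store these associations in its own schema and populate the physical side from imported models or database catalogs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalColumn:
    model: str      # physical data model or database catalog name
    table: str
    column: str
    data_type: str


class StitchingIndex:
    """Associates business elements with their physical implementations."""

    def __init__(self) -> None:
        self._by_element: Dict[str, List[PhysicalColumn]] = defaultdict(list)

    def associate(self, element: str, column: PhysicalColumn) -> None:
        # Ignore repeated associations so reloading a model is harmless.
        if column not in self._by_element[element]:
            self._by_element[element].append(column)

    def impact_of(self, element: str) -> List[PhysicalColumn]:
        """Physical columns to review if the business element changes."""
        columns = self._by_element.get(element, [])
        return sorted(columns, key=lambda c: (c.model, c.table, c.column))

    def elements_for(self, model: str, table: str, column: str) -> List[str]:
        """Reverse lookup: business elements implemented by one column."""
        return sorted(
            name
            for name, columns in self._by_element.items()
            if any((c.model, c.table, c.column) == (model, table, column)
                   for c in columns)
        )
```

The accuracy concern raised above shows up directly here: `impact_of` can only be as complete as the associations that were recorded.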
The above-mentioned list is not a complete list, but does cover some important aspects of what needs to be tracked for business elements.
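Several of the characteristics listed above can be pictured as fields on a single record. The sketch below is one possible shape, not a prescribed schema: the field names and the `BusinessElement` class are hypothetical, while the three security levels and the three-level confidence scale follow the example values given in the text.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum, IntEnum
from typing import List, Optional


class SecurityClass(IntEnum):
    HIGHLY_CONFIDENTIAL = 1      # restricted internally and externally
    MODERATELY_CONFIDENTIAL = 2  # highly restricted for external parties
    PUBLIC = 3                   # anyone can view or read the data


class Confidence(Enum):
    POOR = "poor"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass
class BusinessElement:
    short_name: str
    long_name: str
    definition: str
    definition_source: str               # e.g. "internal" or an industry standard
    security: SecurityClass
    created_on: date
    aliases: List[str] = field(default_factory=list)    # screen labels, XML tags
    synonyms: List[str] = field(default_factory=list)   # per division or department
    acronym: Optional[str] = None
    critical: bool = False               # financial reporting or KPI element
    confidence: Optional[Confidence] = None
    promoted_physical_name: Optional[str] = None
    category: Optional[str] = None       # e.g. "Legal Attribute"
    steward: Optional[str] = None
    open_issues: List[str] = field(default_factory=list)

    def needs_caution(self) -> bool:
        """True when the element has open issues users should be alerted to."""
        return bool(self.open_issues)
```

Keeping `open_issues` on the record means a reporting layer can flag an element without a separate lookup.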
At a high level, this is where we are at this time:
Figure-1: Logical Design Central Meta-data Repository
As depicted in the figure above, both technical and business metadata enter our repository through the sourcing layer. Business metadata might exist in various forms, such as Excel spreadsheets and Access databases (maintained by shadow IT organizations). It may also be necessary to input the data into the repository directly using a Graphical User Interface (GUI). The CMDR integrates both types of metadata, and the client layer is then used for reporting and metadata dissemination. Again, a maintenance interface might need to be provided for maintaining the business metadata. This interface should be strictly for business metadata, as changes to technical metadata should be captured in an automated way. An obvious concern is who has what privilege to change or maintain business metadata, since it is critical to the organization. Let’s just say there is a role-based security layer that controls access to the interface. This layer could be integrated with your existing corporate security architecture, such as one based on Lightweight Directory Access Protocol (LDAP).
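A role-based check for the maintenance interface might look like the sketch below. Everything in it is an assumption for illustration: the role names, the action names, and the permission table are hypothetical, and in practice the user's roles would come from the corporate directory rather than being passed in by hand. The one rule taken from the text is that technical metadata is never edited through this interface.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class MetadataKind(Enum):
    BUSINESS = "business"
    TECHNICAL = "technical"


# Hypothetical permission table: which role may perform which action.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "steward": {"edit_definition", "edit_valid_values", "flag_issue"},
    "owner": {"edit_valid_values", "flag_issue"},
    "sme": {"flag_issue"},
}


def can_change(roles: Iterable[str], kind: MetadataKind, action: str) -> bool:
    """Return True if any of the user's roles permits the action."""
    if kind is MetadataKind.TECHNICAL:
        # Technical metadata changes are captured automatically,
        # so the manual interface refuses them for every role.
        return False
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

For example, `can_change(["sme"], MetadataKind.BUSINESS, "edit_definition")` is refused under this table, while the same request from a steward is allowed.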
Integrating business metadata in the central repository comes with its own set of problems. It is relatively easy to scan, load, and maintain technical metadata, but keeping business metadata up-to-date and synchronized can be a big challenge, especially if manual interfaces are being used to input and maintain it. Make sure you have “buy-in” from senior management (both IT and Business) to keep the business metadata accurate and complete. Ensure a “Change Management” process is in place and followed to document all changes made to the business metadata.
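One way to support such a change management process is to record every change to business metadata together with who made it and why. The following is a minimal in-memory sketch; the `ChangeRecord` and `ChangeLog` names and the rule that a reason is mandatory are assumptions for illustration, and a real repository would persist these records.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ChangeRecord:
    element: str
    field_name: str
    old_value: str
    new_value: str
    changed_by: str
    reason: str
    changed_at: datetime


class ChangeLog:
    """Append-only log of changes to business metadata."""

    def __init__(self) -> None:
        self._records: List[ChangeRecord] = []

    def record(self, element: str, field_name: str, old_value: str,
               new_value: str, changed_by: str, reason: str) -> ChangeRecord:
        if not reason.strip():
            # Assumed policy: undocumented changes are rejected.
            raise ValueError("a reason is required for every change")
        entry = ChangeRecord(element, field_name, old_value, new_value,
                             changed_by, reason, datetime.now(timezone.utc))
        self._records.append(entry)
        return entry

    def history(self, element: str) -> List[ChangeRecord]:
        """All recorded changes for one element, oldest first."""
        return [r for r in self._records if r.element == element]
```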
Remember, a half-baked metadata solution can quickly turn some users away. When they stop using metadata, they stop populating the metadata repository. An incomplete metadata solution would drive away more users and would result in a vicious circle. Fewer metadata users result in obsolete metadata. This erodes accuracy and completeness, which, in turn, drives more users away.