When is Different Not Different?
An MDM solution must be able to tell the difference between how data is represented versus what it represents. This is just another of the complications that MDM architects must deal with when designing a solution.
By Martin Dunn
What do an MDM Hub and a game of golf have in common? Both have participants who are likely to be separated by several decades.
This generation gap leads to some interesting MDM situations (and some interesting golfing situations, but those are beyond the scope of this article). Consider this example of a legacy mainframe record for the University of Wisconsin compared with the data stored in the MDM Hub.
There are many differences between the records, but only one represents a real difference in business information. The key is to separate the differences that matter from those that don't.
Most legacy systems literally shout I'M OLD MAINFRAME DATA by storing everything in uppercase. Modern systems prefer the more pleasing Title Case. Case differences don't represent a difference in business data, so we treat the Street and City values as identical.
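A minimal sketch of that case-insensitive comparison in Python, assuming each field arrives as a plain string. The helper name `same_ignoring_case` and the sample values are hypothetical, not taken from the actual records.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Return True when two field values differ only in letter case.

    str.casefold() is used rather than lower() because it applies
    more aggressive case mappings for some non-ASCII characters.
    """
    return a.casefold() == b.casefold()


# Hypothetical sample values, not the article's actual record.
mainframe = {"Street": "N CHARTER ST", "City": "MADISON"}
hub = {"Street": "N Charter St", "City": "Madison"}

for field in ("Street", "City"):
    print(field, same_ignoring_case(mainframe[field], hub[field]))
# Street True
# City True
```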
The Name1 field is clearly different but means the same thing. The mainframe system holds a contracted version of the university name to squeeze it into 30 bytes, which is typical of mainframe data, where every byte was valuable. To compare the source record with the Hub, we first need to transform the mainframe data by expanding the contractions. The expanded Name1 can then be considered equivalent to the Hub data, but not identical.
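Expanding contractions before the comparison might look like the following Python sketch. The contraction table, the function names, and the sample name `UNIV OF WISCONSIN` are all hypothetical; a real table would be built from the abbreviations actually used in the source system.

```python
import re

# Hypothetical contraction table (keys in uppercase).
CONTRACTIONS = {
    "UNIV": "UNIVERSITY",
    "CORP": "CORPORATION",
    "INTL": "INTERNATIONAL",
}


def expand_contractions(name: str) -> str:
    """Replace each whole word found in CONTRACTIONS with its expansion."""
    def replace(match):
        word = match.group(0)
        return CONTRACTIONS.get(word.upper(), word)
    return re.sub(r"[A-Za-z]+", replace, name)


def names_equivalent(source_name: str, hub_name: str) -> bool:
    """Equivalent, not identical: equal only after expansion and case folding."""
    return expand_contractions(source_name).casefold() == hub_name.casefold()


print(expand_contractions("UNIV OF WISCONSIN"))  # UNIVERSITY OF WISCONSIN
print(names_equivalent("UNIV OF WISCONSIN", "University of Wisconsin"))  # True
```

Keeping `names_equivalent` separate from a plain equality check preserves the distinction the article draws: the raw values are not identical, yet they carry the same business meaning.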
Street and City
Street and City are identical in the two systems (ignoring case). The equivalent Name1 together with the identical Street and City is enough for us to consider these two records a good match pair, even though the complete mainframe address has a serious defect.
There is a Madison in California, but there is no N Charter St in Madison, California, and it's certainly not where the University of Wisconsin is located. It is no surprise that the mainframe data has no Zip code, as the address has clearly never been validated. Within the MDM Hub, we apply USPS address validation, which uses the Street to work out which Madison is correct and also looks up the Zip+4 code for this address.
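The match decision in this section can be sketched by classifying each field comparison as identical, equivalent, or different and then applying a rule. The rule below is a hypothetical simplification that mirrors this one example; it is not a general-purpose MDM match rule, and the names `FieldResult` and `is_match_pair` are illustrative.

```python
from enum import Enum


class FieldResult(Enum):
    IDENTICAL = "identical"    # same value once case is ignored
    EQUIVALENT = "equivalent"  # same meaning after a transformation (e.g. expansion)
    DIFFERENT = "different"    # a genuine difference in business data


def is_match_pair(results: dict) -> bool:
    """Hypothetical rule mirroring the example: Name1 may be identical or
    equivalent, while Street and City must be identical (ignoring case)."""
    name_ok = results["Name1"] in (FieldResult.IDENTICAL, FieldResult.EQUIVALENT)
    address_ok = all(results[f] is FieldResult.IDENTICAL for f in ("Street", "City"))
    return name_ok and address_ok


# State compares as DIFFERENT because of the defective mainframe address,
# but this rule does not look at State, so the pair still matches.
example = {
    "Name1": FieldResult.EQUIVALENT,
    "Street": FieldResult.IDENTICAL,
    "City": FieldResult.IDENTICAL,
    "State": FieldResult.DIFFERENT,
}
print(is_match_pair(example))  # True
```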
Comparing the Records
So, an MDM comparison of the two records yields:
- There is no difference in the business meaning of Name1. As the mainframe system is unable to hold the expanded Name1, there is no need to communicate a difference back to the mainframe.
- The mainframe has an incorrect address and requires an update to State and Zip.
- The mainframe only holds a 5-digit Zip code and therefore receives only the first 5 digits of the Zip+4. The mainframe Zip is now considered the same as the master, even though the master holds the expanded Zip+4 form.
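One way to sketch this step in Python: project each master value into the form the mainframe can store, then send only the fields whose projected value still differs. The function names, the `EQUIVALENT_ONLY` set, and all sample values (including the placeholder Zip `12345-6789`) are hypothetical.

```python
# Hypothetical: fields the target cannot hold in the master's expanded form.
# They were already judged equivalent during matching, so no update is sent.
EQUIVALENT_ONLY = {"Name1"}


def to_target_form(field: str, master_value: str, zip_digits: int = 5) -> str:
    """Project a master value into the representation the target can store."""
    if field == "Zip":
        # A Zip+4 such as "12345-6789" becomes the 5-digit "12345".
        return master_value.split("-")[0][:zip_digits]
    return master_value


def updates_for(target: dict, master: dict) -> dict:
    """Return only the fields whose business meaning still differs after
    projecting each master value into the target's form (case ignored)."""
    changes = {}
    for field, master_value in master.items():
        if field in EQUIVALENT_ONLY:
            continue
        projected = to_target_form(field, master_value)
        if projected.casefold() != target.get(field, "").casefold():
            changes[field] = projected
    return changes


# Hypothetical placeholder values, not the article's actual record.
mainframe = {"Name1": "UNIV OF WISCONSIN", "Street": "N CHARTER ST",
             "City": "MADISON", "State": "CA", "Zip": ""}
master = {"Name1": "University of Wisconsin", "Street": "N Charter St",
          "City": "Madison", "State": "WI", "Zip": "12345-6789"}

changes = updates_for(mainframe, master)
print(changes)  # {'State': 'WI', 'Zip': '12345'}

mainframe.update(changes)
print(updates_for(mainframe, master))  # {}
```

Running `updates_for` again after applying the changes returns an empty result, which is the sense in which the 5-digit Zip is treated as the same as the master's Zip+4.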
When the mainframe is updated with the changes it is able to consume, we end up with the following.
The MDM Hub will treat these records as equivalent representations of the same business information.
The MDM process must consider the technical restrictions of contributing systems when making comparisons between information. The MDM Hub must be able to distinguish between how data is stored and what it represents.
Martin Dunn was the co-founder of Delos Technology, which developed the MDM technology marketed under the Siperian brand. The Delos MDM technology introduced many MDM concepts that are now widespread within the MDM discipline, including a data steward console to adjudicate match results, opt-in synchronization, cell-level delta detection and the concept of measuring trust.
Martin is now a partner with Gaine Solutions and continues to advance the techniques by which enterprise Master Data is managed.