Consolidate and harmonize disparate data sources into a single, accurate, reliable master record, enabling more effective decision-making, improved operational efficiency, and enhanced data governance across the organization.
Deterministic Matching Engine
Exact Matching
Deterministic rules are based on the exact matching of specific data attributes, such as unique identifiers (e.g., customer IDs, product SKUs), names, or addresses. These rules ensure that identical records from different sources are recognized and merged with high precision.
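As a rough sketch, exact matching can be pictured as keying records on a unique identifier; the field and source names below are illustrative, not the tool's schema:

```python
# Minimal sketch: group records by an exact key (here a hypothetical
# "customer_id" field) so identical records from different sources align.
from collections import defaultdict

def group_by_exact_key(records, key="customer_id"):
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return groups  # each group holds records deemed identical by the rule

sources = [
    {"customer_id": "C-100", "name": "John Doe", "source": "CRM"},
    {"customer_id": "C-100", "name": "John Doe", "source": "Billing"},
    {"customer_id": "C-200", "name": "Ann Li",   "source": "CRM"},
]
print(group_by_exact_key(sources))  # C-100 is matched across both sources
```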
Attribute Prioritization
The tool allows users to prioritize certain data attributes when applying deterministic rules. For instance, when merging records, a rule might prioritize the most recent update or the most trusted data source, ensuring the best data is retained in the master record.
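A minimal sketch of that prioritization, assuming illustrative source names, trust rankings, and timestamps:

```python
from datetime import date

# Hypothetical trust ranking for contributing systems; higher = more trusted.
SOURCE_TRUST = {"CRM": 2, "Billing": 1}

def pick_value(candidates, prefer="recency"):
    """candidates: (value, source, updated_at) tuples for one attribute."""
    if prefer == "recency":
        return max(candidates, key=lambda c: c[2])[0]
    return max(candidates, key=lambda c: SOURCE_TRUST.get(c[1], 0))[0]

emails = [
    ("j.doe@crm.example",  "CRM",     date(2022, 1, 5)),
    ("j.doe@bill.example", "Billing", date(2024, 3, 9)),
]
print(pick_value(emails))                  # j.doe@bill.example (most recent)
print(pick_value(emails, prefer="trust"))  # j.doe@crm.example (most trusted)
```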
Conflict Resolution
In cases where deterministic rules identify conflicts between data sources, the tool provides options for resolving these conflicts based on predefined rules or user-defined preferences, ensuring that the final master record is both accurate and reliable.
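One way to picture predefined resolution rules is a per-attribute strategy map; the strategy names, sources, and fields below are hypothetical, not the tool's API:

```python
# Minimal sketch: resolve conflicting values with per-attribute strategies.
RESOLUTION_RULES = {
    "email":   "prefer_source",  # trust one system for contact data
    "address": "most_recent",    # take the latest known address
}

def resolve(attribute, candidates):
    rule = RESOLUTION_RULES.get(attribute, "most_recent")
    if rule == "prefer_source":
        order = {"CRM": 0, "Billing": 1}  # lower = preferred
        return min(candidates, key=lambda c: order.get(c["source"], 99))
    return max(candidates, key=lambda c: c["updated_at"])  # ISO dates sort lexically

winner = resolve("email", [
    {"value": "a@x.example", "source": "Billing", "updated_at": "2024-01-01"},
    {"value": "b@x.example", "source": "CRM",     "updated_at": "2023-06-01"},
])
print(winner["value"])  # b@x.example — CRM is preferred despite being older
```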
Predefined Conditions
Users can define precise conditions that must be met for records to be considered the same. For example, a rule might state that two customer records should be unified only if their names, birthdates, and addresses match exactly. This helps prevent false positives and ensures only truly identical records are merged.
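A minimal sketch of such a condition, with illustrative field names:

```python
# Minimal sketch: unify two customer records only when name, birthdate,
# and address all match exactly.
def is_same_customer(a, b):
    return all(a[field] == b[field] for field in ("name", "birthdate", "address"))

r1 = {"name": "John Doe", "birthdate": "1980-04-12", "address": "123 Main St."}
r2 = {"name": "John Doe", "birthdate": "1980-04-12", "address": "123 Main St."}
r3 = {"name": "John Doe", "birthdate": "1980-04-12", "address": "99 Oak Ave."}
print(is_same_customer(r1, r2))  # True  — all three attributes match
print(is_same_customer(r1, r3))  # False — address differs, so no merge
```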
Automated Rule Execution
Deterministic rules are automatically applied during the data unification process, ensuring consistent and repeatable results. This automation reduces the need for manual intervention and speeds up the MDM process.
Auditing and Traceability
The tool tracks the application of deterministic rules, providing a clear audit trail. Users can review how and why specific records were merged, split, or modified, ensuring transparency and accountability in the MDM process.
Probabilistic Matching Engine
Fuzzy Matching
Probabilistic rules use fuzzy matching techniques to compare data attributes that may not exactly match but are likely to represent the same entity. For example, slight variations in names (“John Doe” vs. “Jon Doe”) or address formats (“123 Main St.” vs. “123 Main Street”) are recognized as potential matches.
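A minimal sketch of the idea using only the Python standard library; a production engine would use dedicated string-distance algorithms (see Similarity Comparison below), but the principle is the same:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Ratio in [0, 1]; higher means the strings are more alike."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(similarity("John Doe", "Jon Doe"))              # ~0.93, likely a match
print(similarity("123 Main St.", "123 Main Street"))  # ~0.81, likely the same address
print(similarity("John Doe", "Mary Smith"))           # low, different entity
```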
Weighted Attributes
Probabilistic rules allow users to assign different weights to various data attributes based on their importance in identifying a match. For example, an exact match on a social security number might be weighted more heavily than a partial match on a last name, increasing the likelihood of correct unification.
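A minimal sketch of weighted scoring; the weights and attributes are illustrative:

```python
# Combine per-attribute similarities into one weighted score. An exact SSN
# match dominates a fuzzy last-name match, per the weighting above.
WEIGHTS = {"ssn": 0.6, "last_name": 0.25, "zip": 0.15}

def weighted_score(attr_scores):
    """attr_scores: dict of attribute -> similarity in [0, 1]."""
    return sum(WEIGHTS[a] * s for a, s in attr_scores.items())

score = weighted_score({"ssn": 1.0, "last_name": 0.7, "zip": 1.0})
print(round(score, 3))  # 0.925 — strong match despite an imperfect last name
```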
Handling Ambiguities
In cases where the probability of a match falls within a certain range, the tool can be configured to either automatically merge records or flag them for manual review. This flexibility helps manage the uncertainty inherent in probabilistic matching while ensuring data quality.
Similarity Scoring
The tool calculates similarity scores for data attributes, assigning a probability that two records refer to the same entity. Users can define thresholds for these scores to determine when records should be merged or flagged for review, balancing accuracy with coverage.
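A minimal sketch of this thresholding, combined with the review band described under Handling Ambiguities; the threshold values are illustrative:

```python
# Map an overall similarity score to an action. Scores in the middle band
# are ambiguous and go to manual review rather than auto-merging.
AUTO_MERGE_AT = 0.90
REVIEW_AT = 0.70

def decide(score):
    if score >= AUTO_MERGE_AT:
        return "auto-merge"
    if score >= REVIEW_AT:
        return "manual-review"  # ambiguous: flag for a data steward
    return "no-match"

for s in (0.95, 0.80, 0.40):
    print(s, "->", decide(s))   # auto-merge, manual-review, no-match
```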
Self-Healing
Machine learning models refine probabilistic matching over time. By analyzing historical data and learning from steward actions, the tool identifies patterns and improves the accuracy of its probabilistic rules, making smarter decisions about when to merge or separate records.
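The idea can be sketched with an off-the-shelf classifier trained on steward-labeled pairs; the features, labels, and data below are synthetic stand-ins, not the tool's actual models:

```python
# Train on pairs stewards already decided (features = per-attribute
# similarities, label = 1 if merged), then score new candidate pairs.
from sklearn.linear_model import LogisticRegression

# Each row: [name_similarity, address_similarity, dob_similarity]
X = [[0.95, 0.90, 1.0],   # steward merged
     [0.97, 0.85, 1.0],   # steward merged
     [0.40, 0.30, 0.0],   # steward rejected
     [0.55, 0.20, 0.0]]   # steward rejected
y = [1, 1, 0, 0]

model = LogisticRegression().fit(X, y)
new_pair = [[0.92, 0.80, 1.0]]
print(model.predict_proba(new_pair)[0][1])  # learned probability of a match
```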
Audit and Review Capabilities
The tool tracks all decisions made by probabilistic rules, providing a clear audit trail. Users can review how probabilities were calculated, why certain records were merged, and adjust rules or thresholds as needed to improve outcomes.
Identity Resolution
Blocking
This initial stage uses standard blocking techniques to group records into blocks based on shared characteristics, reducing the comparison space. Token-based blocking further refines this process, improving efficiency by focusing on specific tokens or attributes within the data.
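A minimal sketch of token-based blocking, with illustrative fields:

```python
# Index records by tokens of a blocking key so that only records sharing
# a token are ever compared against each other.
from collections import defaultdict

def build_blocks(records, key="name"):
    blocks = defaultdict(set)
    for i, record in enumerate(records):
        for token in record[key].lower().split():
            blocks[token].add(i)
    return blocks

records = [{"name": "John Doe"}, {"name": "Jon Doe"}, {"name": "Mary Smith"}]
print(dict(build_blocks(records)))
# {'john': {0}, 'doe': {0, 1}, 'jon': {1}, 'mary': {2}, 'smith': {2}}
# Only records 0 and 1 share a block; 0-vs-2 and 1-vs-2 are never compared.
```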
Similarity Comparison
Various algorithms, such as Jaro-Winkler, edit distance, and Soundex, are employed to compare records within blocks. Each attribute of the records is scored on its similarity to help identify potential matches.
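A minimal sketch of those comparisons using the jellyfish library (one common Python implementation; function names follow recent versions):

```python
import jellyfish

a, b = "John Doe", "Jon Doe"
print(jellyfish.jaro_winkler_similarity(a, b))  # close to 1.0; higher = more similar
print(jellyfish.levenshtein_distance(a, b))     # 1 — a single edit apart
print(jellyfish.soundex("John"), jellyfish.soundex("Jon"))  # J500 J500: same phonetic code
```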
Block Processing
Within each block, records are processed to prepare them for detailed comparisons. This might involve further data cleansing or deduplication within the block itself.
Classification/Clustering
Records within each block are compared pairwise (for example, using Python's itertools) and linked using network graph techniques to detect clusters of records that likely represent the same entity.
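A minimal sketch of this pairwise-plus-graph approach using itertools and networkx; the match rule here is a toy stand-in:

```python
# Compare pairs within a block, add an edge for each match, then take
# connected components of the match graph as entity clusters.
import itertools
import networkx as nx

def cluster(records, matches):
    """matches(a, b) -> bool; records is a list of dicts."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(records)))
    for i, j in itertools.combinations(range(len(records)), 2):
        if matches(records[i], records[j]):
            graph.add_edge(i, j)
    return list(nx.connected_components(graph))

records = [{"name": "John Doe"}, {"name": "Jon Doe"}, {"name": "Mary Smith"}]
def same(a, b):  # toy rule: same last name counts as a match
    return a["name"].split()[-1] == b["name"].split()[-1]
print(cluster(records, same))  # [{0, 1}, {2}] — two entity clusters
```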
Evaluation
After clustering, the system uses predefined match-merge rules to determine each record’s status: auto-merge, possible-merge, or no-match. Records identified as auto-merge are further processed based on trust score configurations to create or update the golden record, ensuring it accurately represents the entity.
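A minimal sketch of the evaluation step; the thresholds, trust scores, and fields are illustrative assumptions:

```python
# Classify each candidate, then build the golden record from an auto-merge
# cluster by taking each attribute from the most trusted contributing source.
TRUST = {"CRM": 0.9, "Billing": 0.6}

def evaluate(score):
    if score >= 0.90:
        return "auto-merge"
    if score >= 0.70:
        return "possible-merge"  # routed to a steward
    return "no-match"

def golden_record(records):
    def best(field):
        contributors = [r for r in records if field in r]
        return max(contributors, key=lambda r: TRUST.get(r["source"], 0))[field]
    fields = {key for r in records for key in r if key != "source"}
    return {field: best(field) for field in fields}

cluster = [
    {"source": "Billing", "name": "Jon Doe", "phone": "555-0100"},
    {"source": "CRM",     "name": "John Doe"},
]
print(evaluate(0.93))          # auto-merge
print(golden_record(cluster))  # name from CRM (higher trust), phone from Billing
```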
Data Stewardship
Survivorship Rules
Survivorship rules determine which attribute values from a set of matched records are carried forward into the golden record. Criteria such as source trust scores, recency of update, and completeness can be applied per attribute, so the surviving record combines the best available data from every contributing source.
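A minimal sketch of one such rule ("most recent non-empty value wins"); the fields and data are illustrative:

```python
# Per-attribute survivorship: prefer the latest value, but never let an
# empty value displace a populated one (completeness before recency).
def survive(candidates):
    """candidates: list of {"value", "updated_at"} dicts for one attribute."""
    non_empty = [c for c in candidates if c["value"]]
    if not non_empty:
        return None
    return max(non_empty, key=lambda c: c["updated_at"])["value"]

phone_values = [
    {"value": "",         "updated_at": "2024-05-01"},
    {"value": "555-0100", "updated_at": "2023-02-11"},
    {"value": "555-0199", "updated_at": "2024-01-20"},
]
print(survive(phone_values))  # 555-0199 — latest non-empty value survives
```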