Duplicate Prevention

Purpose

With Duplicate Prevention, you can check for duplicates when you create master data objects (such as business partners, customers, suppliers, or products). Currently, duplicate prevention is available only for business partners.

When a user saves a new master data record, the system performs the duplicate check and informs the user about possible duplicates. The user then decides whether the new record is a duplicate. If it is not, the user saves it; otherwise, the user cancels the creation.
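The check-on-save flow described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not SAP's implementation. The `Record` type, the function `save_with_duplicate_check`, and the two callbacks (a duplicate finder and a dialog that asks the user) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    record_id: str
    name: str


# Hypothetical callbacks: a duplicate finder and a dialog that asks the user.
Finder = Callable[[Record], List[Record]]
Dialog = Callable[[Record, List[Record]], bool]  # True means "not a duplicate, save it"


def save_with_duplicate_check(new: Record, find: Finder, ask_user: Dialog,
                              database: List[Record]) -> bool:
    """Run the duplicate check on save; return True if the record was saved."""
    candidates = find(new)
    if candidates and not ask_user(new, candidates):
        # The user judged the new record to be a duplicate: creation is cancelled.
        return False
    database.append(new)
    return True


if __name__ == "__main__":
    db = [Record("1", "Meier")]
    exact_find = lambda r: [x for x in db if x.name.lower() == r.name.lower()]
    user_says_duplicate = lambda r, cands: False
    # False: a possible duplicate is found and the user cancels
    print(save_with_duplicate_check(Record("2", "meier"), exact_find, user_says_duplicate, db))
    # True: no possible duplicates, so the record is saved
    print(save_with_duplicate_check(Record("3", "Schulz"), exact_find, user_says_duplicate, db))
```

The finder is deliberately naive here (exact, case-insensitive match). The rest of this section describes how the real search is made error-tolerant.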

Duplicates are master data records that represent the same object in a system. Their field contents need not be identical: values may only be similar, or may be missing altogether. Typical examples are different spellings of a name such as Meier, and synonyms such as Bob and Robert. Compound names may be swapped, and company names may be written out in full or given in their usual abbreviated form. The following personal and company names, for example, could be considered similar:

• Meier, Maier, Meyer, Mayer, Mayr, Mair, Meir, Meyr
• Hans-Otto, Otto-Hans, Hans, Otto
• Aron, Aronn, Aaron
• Bob, Rob, Bobby, Robby, Robert
• Systems, Applications and Products in Data Processing, SAP AG, S.A.P.
• John Wiley & Sons, Wiley Ltd
• John Thomson (Person), Thomson Ltd. (Organization)

The last example shows that persons and organizations can also be similar. Similar examples could also be given for other fields, such as the names of towns or streets.
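The Python sketch below shows how some of these variations can be neutralized. It reduces a name to a comparison key using a synonym list, an abbreviation list, removal of legal-form suffixes, and order-insensitive token sorting. The lists and the function `normalize_name` are hypothetical and cover only the examples above. Spelling variants such as Meier and Meyer still produce different keys and need phonetic or error-tolerant methods.

```python
import re

# Hypothetical synonym list: variant -> canonical first name.
SYNONYMS = {"bob": "robert", "rob": "robert", "bobby": "robert", "robby": "robert"}
# Hypothetical abbreviation list: full company name -> usual short form.
ABBREVIATIONS = {"systems, applications and products in data processing": "sap"}
# Hypothetical legal-form suffixes that carry no identifying information.
LEGAL_FORMS = {"ag", "ltd", "gmbh", "inc"}


def normalize_name(name: str) -> str:
    """Reduce a name to a key that ignores case, dots, word order,
    synonyms, known abbreviations and legal forms (illustrative rules only)."""
    text = name.strip().lower()
    text = ABBREVIATIONS.get(text, text)
    text = text.replace(".", "")                            # "S.A.P." -> "sap"
    tokens = [t for t in re.split(r"[\s\-]+", text) if t]   # split on blanks and hyphens
    tokens = [SYNONYMS.get(t, t) for t in tokens if t not in LEGAL_FORMS]
    return " ".join(sorted(tokens))                         # "Otto-Hans" -> "hans otto"


if __name__ == "__main__":
    for a, b in [("Hans-Otto", "Otto-Hans"), ("Bob", "Robert"),
                 ("S.A.P.", "SAP AG"), ("Meier", "Meyer")]:
        print(a, "/", b, "->", normalize_name(a) == normalize_name(b))
```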

Duplicates cause problems and unnecessary costs, so preventing or eliminating them is an important goal. You can either detect and eliminate duplicates in an existing dataset or prevent them from being created.

Duplicate Prevention is aimed at preventing duplicates from being created.

Integration

The following components are used for duplicate prevention:

• Business Address Services (BAS)

Master data objects for which duplicate prevention is to be used must be connected to BAS (for example, the business partner in CRM).

• Search and Classification (TREX)

TREX is used as the search engine.

Features

Services

Duplicate prevention is carried out within a framework and implemented by a series of services. The following services are involved:

• Normalization Service
• Field Grouping Service
• Indexing Service
• Search Service
• Fine Matching Service

For more information about the individual services, see the relevant descriptions.

Interaction of the Services during Duplicate Prevention

Duplicate prevention assumes that all data in which duplicates are to be found is indexed in the search engine (TREX). This does not mean that all master data of an object has to be indexed; only the fields required to detect duplicates need to be indexed. Usually, normalized forms of the field values, such as phonetic keys, are indexed instead of the original values. The Normalization Service creates the normalized field values according to the defined configuration. Normalization also determines which object data is relevant for detecting duplicates and filters out the rest. The configuration of the Normalization Service offers several options for defining normalization rules. Examples include phonetic algorithms (Soundex, Metaphone, Cologne Phonetics …), string manipulations using regular expressions, character replacement lists, and synonym lists.
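The following Python sketch shows what such normalization rules might look like. It implements the classic American Soundex algorithm as one example of a phonetic algorithm. It also adds a character replacement list, a regular-expression rule, and a selection of relevant fields. The replacement list, the field names (`last_name`, `city`), and the functions are assumptions for illustration. The sketch does not reproduce the rule syntax of the Normalization Service.

```python
import re

# Hypothetical character replacement list, applied before phonetic coding.
REPLACEMENTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
# Hypothetical selection of the fields that are relevant for duplicate detection.
RELEVANT_FIELDS = ("last_name", "city")

# American Soundex digit classes; vowels, y, h and w have no digit.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Classic American Soundex key: first letter plus three digits."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key, prev = [letters[0].upper()], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":            # h and w do not separate letters with the same code
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        prev = code               # a vowel resets prev, so equal codes around it repeat
    return ("".join(key) + "000")[:4]


def normalize_value(value: str) -> str:
    text = value.lower()
    for src, dst in REPLACEMENTS.items():     # character replacement list
        text = text.replace(src, dst)
    text = re.sub(r"[^a-z]+", " ", text)      # regex rule: non-letters become separators
    return " ".join(soundex(t) for t in text.split())


def normalize_record(record: dict) -> dict:
    """Keep only the relevant fields and replace their values with phonetic keys."""
    return {f: normalize_value(record[f]) for f in RELEVANT_FIELDS if f in record}


if __name__ == "__main__":
    print({soundex(n) for n in ["Meier", "Maier", "Meyer", "Mayer",
                                "Mayr", "Mair", "Meir", "Meyr"]})   # {'M600'}
    print(normalize_record({"last_name": "Meier", "city": "Köln", "phone": "0221 123"}))
```

Soundex maps all eight spellings of Meier listed earlier to the single key M600, and Aron, Aronn and Aaron to A650. Indexing such keys instead of the raw values lets the later search treat these spellings as equal. The phone field is dropped because it is not among the relevant fields.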

After the master data has been entered, the system automatically searches for possible duplicates. For performance reasons, the new master data record cannot be compared with every master data record in the database, so a two-step method is used. In the first step, the Search Service searches the search engine (TREX) for similar master data records, using any of the error-tolerant search options that TREX provides. This quickly narrows down the set of records that may contain duplicates. In the second step, the Fine Matching Service analyzes this reduced set precisely. It compares each record found with the record to be created according to the defined error-tolerant strategy and assigns it a similarity value. The list of evaluated possible duplicates is then displayed.
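The two-step method can be sketched as follows. TREX's error-tolerant search is not reproduced here. As a stand-in, step 1 uses an in-memory inverted index of character trigrams, and step 2 scores the remaining candidates with `difflib.SequenceMatcher.ratio()` from the Python standard library. The class and function names and the thresholds `min_shared` and `threshold` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def trigrams(text: str) -> set:
    """Character trigrams of a padded, lower-cased string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Stand-in for the search engine: an inverted index of character trigrams."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)   # trigram -> ids of records containing it
        self.records = {}                  # id -> indexed name

    def add(self, record_id: str, name: str) -> None:
        self.records[record_id] = name
        for gram in trigrams(name):
            self.postings[gram].add(record_id)

    def search(self, name: str, min_shared: int = 2) -> list:
        """Step 1: cheap, error-tolerant retrieval of candidate ids."""
        shared = defaultdict(int)
        for gram in trigrams(name):
            for record_id in self.postings.get(gram, ()):
                shared[record_id] += 1
        return [rid for rid, count in shared.items() if count >= min_shared]


def fine_match(name: str, index: TrigramIndex, candidates: list,
               threshold: float = 0.7) -> list:
    """Step 2: score only the candidates; return (id, similarity) pairs, best first."""
    scored = [(rid, SequenceMatcher(None, name.lower(), index.records[rid].lower()).ratio())
              for rid in candidates]
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    index = TrigramIndex()
    for rid, stored in [("1", "Meier"), ("2", "Meyer"), ("3", "Schulz"), ("4", "Mayr")]:
        index.add(rid, stored)
    candidates = index.search("Maier")
    print(fine_match("Maier", index, candidates, threshold=0.6))
```

When searching for Maier, step 1 discards Schulz without any detailed comparison. Step 2 computes similarity values only for the three remaining candidates. This division of work is what keeps the check fast on a large database.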

The search strategy used in the Search Service must be consistent with the matching strategy used in the Fine Matching Service, because the Fine Matching Service examines in detail only the objects that the Search Service finds. The search strategy must therefore return all potential duplicates. For performance reasons, however, make sure that it does not return too many objects that the Fine Matching Service then rejects.
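One way to check that a search strategy and a fine matching strategy agree is to measure two things on a sample. The first is recall: how many of the pairs the fine matcher would accept are actually returned by the search. The second is how many candidates the fine matcher has to examine. The sketch below does this for a trigram-based stand-in search. The functions `search`, `fine_match`, and `evaluate` and their thresholds are hypothetical. The sketch compares every sample name directly, which is acceptable only for offline tuning on a small sample, not in the live check.

```python
from difflib import SequenceMatcher


def trigrams(text: str) -> set:
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def search(query: str, names: list, min_shared: int) -> list:
    """Hypothetical coarse search strategy: names sharing >= min_shared trigrams."""
    q = trigrams(query)
    return [n for n in names if len(q & trigrams(n)) >= min_shared]


def fine_match(query: str, name: str, threshold: float = 0.6) -> bool:
    """Hypothetical fine matching strategy: similarity ratio above a threshold."""
    return SequenceMatcher(None, query.lower(), name.lower()).ratio() >= threshold


def evaluate(queries: list, names: list, min_shared: int) -> tuple:
    """Return (recall, candidates_examined) of the search strategy, where recall is
    the share of fine-matching pairs that the search actually returns."""
    found = missed = examined = 0
    for q in queries:
        candidates = search(q, names, min_shared)
        examined += len(candidates)
        for n in names:               # brute force: acceptable only for offline tuning
            if fine_match(q, n):
                if n in candidates:
                    found += 1
                else:
                    missed += 1
    total = found + missed
    return (found / total if total else 1.0), examined


if __name__ == "__main__":
    sample = ["Meier", "Meyer", "Mayr", "Maier", "Schulz", "Schultz", "Wiley"]
    for k in range(1, 6):
        print(k, evaluate(["Meir", "Schulze"], sample, k))
```

Raising `min_shared` reduces the number of candidates but can never increase recall. A suitable value is therefore the largest one that still reaches full recall on a representative sample.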