Golden record and data deduplication

New Data Dimension with AlgoMaps – part 2 – Golden Record (Location Intelligence series)

When discussing data standardization and validation, it’s hard not to mention the savings (time and money) that a properly cleaned database, such as a CRM, brings.

Therefore, in this article, we will focus on a fast and effective way to improve the quality of the records you already have, on your own and with minimal time investment. Let’s clean up some data with AlgoMaps!

Benefits for your business, or WHY to clean and standardize your data?

As we mentioned in the previous material on geocoding, address data (and contact details in particular) is basic information describing not only customers, contractors, and the senders and recipients of shipments, but also the locations of our business or the members of the distribution network. This data is what makes sales or behavioral analysis possible, which in turn increases the effectiveness of activities while optimizing the budget. It is therefore necessary to take care of this data by cleaning and standardizing it, i.e., removing duplicates and correcting records that were entered incorrectly or in a non-standard form. Why are more and more companies interested in this type of process? Properly conducted cleaning and standardization brings benefits that support business development:

  1. Improving data quality in CRM systems and databases means that your data will be up-to-date, complete, and reliable – providing the basis for analysis and effective strategies;
  2. Standardized customer address data allows you to effectively automate operational processes (without manual data processing);
  3. Standardized data brings more than just financial benefits – it saves, among other things, the time needed to standardize data before each use;
  4. Fewer duplicate or misaddressed shipments reduce the costs of, among other things, marketing campaigns;
  5. Elimination of error propagation in ongoing operational processes becomes fast and efficient;
  6. Consolidation of multiple databases or CRM systems into a common, standardized structure is possible in just a few clicks.

The challenge, however, may be the manual process of improving existing data, especially when two (or more) databases are being combined. How can you easily make ‘data quality’ more than just a corporate phrase?

Address vs address: 3 pillars of Location Intelligence

Although the word ‘address’ seems to be an obvious concept, the optimal way to record it in a database, e.g., in a CRM, can be quite a challenge. So can maintaining a database of high-quality address data, where quality consists of such elements as completeness (did the customer provide the full address or use abbreviations?), timeliness (has the street name been changed recently?), and reliability (is the given address inhabited or is it, e.g., a vacant building?). The functionality available in AlgoMaps covers all these areas.

Let’s break down an address into parts!

A standard address consists of the following elements:

  • town name;
  • postal code;
  • street name;
  • house number;
  • apartment number.

However, two of the above elements are optional: the street name (in Poland, there are towns with no division into streets) and the apartment number (in the case of single-family houses). These address components can be stored in a database in different ways. Algolytics experts have come across many forms; some of them are presented below.
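As a minimal sketch, the five elements above, with the two optional ones, could be modelled as follows. The class name, the field names, and the one-line rendering are illustrative assumptions, not the AlgoMaps output format, and the sample values in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    """One address split into the five elements listed above (hypothetical model)."""
    town: str
    postal_code: str
    house_number: str
    street: Optional[str] = None     # optional: some towns have no street names
    apartment: Optional[str] = None  # optional: single-family houses have none

    def one_line(self) -> str:
        """Render the address as a single line, skipping missing parts."""
        number = self.house_number
        if self.apartment:
            number = f"{number}/{self.apartment}"
        # Assumption: without a street, the town name takes the street's place.
        first = f"{self.street} {number}" if self.street else f"{self.town} {number}"
        return f"{first}, {self.postal_code} {self.town}"
```

For example, `Address(town="Warsaw", postal_code="00-001", house_number="10", street="Prosta", apartment="5").one_line()` renders the full form, while leaving out `street` and `apartment` still produces a valid line.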

Examples of different ways of storing address data in databases.

Problems start when you have two or more databases in which addresses are stored in different ways and you want to merge them into one. What is more, the collected data may be incomplete (e.g., no postal code, no city/town), contain spelling errors (typos, missing Polish characters, abbreviated names, etc.), or include additional information that is not part of the address (e.g., a floor number or company name). If your database contains hundreds, thousands, or millions of such records, the scale of the problem goes beyond what can be handled manually.

This is when AlgoMaps and artificial intelligence algorithms come into action to standardize, validate, and verify the correctness of address data. Within the scope of these functionalities, the following processes are implemented:

  • Unification of address data storage – here it does not matter in which form the input data is provided. Regardless of whether the address is in the so-called loose text or partially broken down into elements, AlgoMaps always returns results in the same form – in accordance with good practices for storing addresses in databases.
  • Removing errors in recording street and place names – standardized values of street and place names will be returned (according to registers that collect such information). Names that are written with abbreviations (e.g., “JP2” instead of “John Paul II”) or in a colloquial way (“waw” instead of “Warsaw”) will be replaced with standardized values.
  • Adding missing address elements – for addresses where the city name or postal code is missing, the correct standardized values for those elements will be added.
  • Updating postal codes, street names, and town names – in the case of older databases, this information will be updated.
  • Verification of address existence and correctness – AlgoMaps holds almost all addresses in Poland in its reference databases, which are used to report whether an address provided by, e.g., a client or contractor exists and is correct.
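To make two of the steps above concrete (replacing abbreviated or colloquial names and filling in a missing town), here is a toy sketch based on plain dictionary lookups. It is not how AlgoMaps works internally: the lookup tables, the function names, and the sample postal code entry are all hypothetical, and a real service would rely on official registers and far more robust matching.

```python
import re

# Hypothetical lookup tables; a real service would use official registers.
STREET_ALIASES = {"jp2": "John Paul II"}
TOWN_ALIASES = {"waw": "Warsaw"}
TOWN_BY_POSTAL_CODE = {"00-001": "Warsaw"}  # made-up sample entry


def clean(text: str) -> str:
    """Trim the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def standardize(record: dict) -> dict:
    """Return a new record with aliases replaced and a missing town filled in."""
    street = clean(record.get("street") or "")
    town = clean(record.get("town") or "")
    postal_code = clean(record.get("postal_code") or "")

    # Replace abbreviations and colloquial names with standardized values.
    street = STREET_ALIASES.get(street.lower(), street)
    town = TOWN_ALIASES.get(town.lower(), town)

    # Add the missing town when the postal code is known.
    if not town:
        town = TOWN_BY_POSTAL_CODE.get(postal_code, "")
    return {"street": street, "town": town, "postal_code": postal_code}
```

Whatever shape the input takes, the function always returns the same three keys, which mirrors the idea of a unified output form.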

An example of input data and data standardized using AlgoMaps is presented in the table below.

Example of address data standardized using AlgoMaps

You can test how AlgoMaps standardizes address data on your own examples using the demo.

Data deduplication – how to create a golden record?

A slightly different data quality issue is maintaining the uniqueness of records stored in the database. It is undesirable for a database or CRM to hold, for example, the data of the same customer written in several different ways, and thus in separate records. The process that eliminates such cases and creates the so-called ‘golden record’ (the best possible set of features describing, e.g., a customer) is called deduplication. It runs in two stages: first the data is standardized, and then records are compared for similarity and those recognized as duplicates are merged. An example of such a process performed with AlgoMaps is shown in the figure below.

Example of the deduplication process and creation of the so-called “golden record” using AlgoMaps
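The two stages described above can be sketched in a few lines. This is only an illustration under simple assumptions: normalization is reduced to lowercasing and whitespace cleanup, similarity is computed with the standard library's `difflib.SequenceMatcher`, the 0.8 threshold is arbitrary, and the golden record keeps the longest value of each field. The function names are hypothetical, every record is assumed to carry the same fields, and AlgoMaps' actual matching logic is not described in this article.

```python
from difflib import SequenceMatcher


def normalize(record: dict) -> dict:
    """Stage 1: lowercase and collapse whitespace so records are comparable."""
    return {k: " ".join(str(v or "").lower().split()) for k, v in record.items()}


def similarity(a: dict, b: dict) -> float:
    """Similarity in [0, 1] between two normalized records."""
    key_a = " ".join(a[k] for k in sorted(a))
    key_b = " ".join(b[k] for k in sorted(b))
    return SequenceMatcher(None, key_a, key_b).ratio()


def golden_records(records: list[dict], threshold: float = 0.8) -> list[dict]:
    """Stage 2: group similar records, then merge each group into one record."""
    clusters: list[list[dict]] = []
    for rec in map(normalize, records):
        for cluster in clusters:
            # Compare against the first record of each existing cluster.
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    # Assumption: the longest value of a field is the most complete one.
    return [
        {k: max((r.get(k, "") for r in cluster), key=len) for k in cluster[0]}
        for cluster in clusters
    ]
```

Comparing every record only with the first member of each cluster keeps the sketch short; on large databases a real system would need blocking or indexing to avoid comparing all pairs.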

How to standardize data with AlgoMaps?

As in the case of the previously described geocoding function, data standardization can also be done in two ways:

  1. Integration via WebService/API

  2. Online application in the cloud

Both operations, geocoding and standardization, can be done in one go. You send data to the service once and get results that are both geocoded and standardized!

If you want real-time standardization on a request-response basis, use the API. This is particularly useful if you collect address data written as, e.g., loose text and want to save it in the database or CRM in a standardized and error-free way. After setting up a free account and reading the documentation, implementing the service takes just a few hours!

If you want to clean up your database or CRM by standardizing many records at one time, it is worth using the AlgoMaps online application. Standardization consists of three simple steps – preparing the data as a CSV or XLSX file, defining the task in the application, and downloading the resulting data. The whole process is described in detail in the documentation, and you can access the application after creating a free account.
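For the first step, preparing the file, a short sketch using Python's standard `csv` module is shown below. The column names `id` and `address` are placeholders chosen for illustration; the layout the application actually expects is described in its documentation.

```python
import csv
import io

# Hypothetical column names; check the documentation for the expected layout.
COLUMNS = ["id", "address"]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line, quoting where needed."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Keep only the known columns and fill missing ones with empty text.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand matters for addresses in particular, because loose-text addresses often contain commas that must be quoted.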

It is worth knowing that in AlgoMaps the first 5000 processed records are always free!

Now that you can clean your data yourself quickly and effectively, it is time to rise to another challenge. In the next article, you’ll learn how AlgoMaps helps you improve the user experience of your website, online form, or application.

Stay tuned!