When discussing data standardization and validation, it’s hard not to mention the savings in time and money that a properly cleaned database, such as a CRM, brings.
Therefore, in this article, we will focus on the fastest and most effective way to improve the quality of the records you already have, on your own and with minimal time investment. Let’s clean up some data with AlgoMaps!
As we mentioned in the previous material on geocoding, address data (and contact details in particular) is basic information describing not only customers, contractors, and the senders and recipients of shipments, but also the locations of our business or members of the distribution network. It is thanks to this data that sales or behavioral analysis is possible, which increases the effectiveness of activities while optimizing the budget. It is therefore necessary to take care of this data by cleaning and standardizing it, i.e., removing duplicates and correcting incorrectly entered or non-standardized records. Why are more and more companies interested in this type of process? Properly conducted cleaning and standardization bring benefits that are promising for business development:
The challenge, however, may be the manual process of improving existing data, especially when we are talking about a combination of two (or more) databases. How can you easily make ‘data quality’ more than just a corporate phrase?
Although the word ‘address’ seems to be an obvious concept, finding the optimal way to record it in a database, e.g., in a CRM, can be quite a challenge. So can maintaining a database of high-quality address data, where quality consists of such elements as completeness (did the customer provide the full address, or did they use abbreviations?), timeliness (has the street name been changed recently?), and reliability (is the given address inhabited, or is it, e.g., a vacant building?). The functionality available in AlgoMaps covers all these areas.
A standard address consists of the following elements:
However, two of the above elements are optional: the street name (in Poland, there are towns with no division into streets) and the apartment number (in the case of single-family houses). The above address components can be stored in a database in different ways. Algolytics experts have come across many forms; some of them are presented below.
Examples of different ways of storing address data in databases.
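To make the optional parts concrete, here is a minimal Python sketch of an address stored as separate fields. The field names, the sample values, and the one-line output format are our assumptions for illustration only, not the AlgoMaps data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    """One address split into separate fields (hypothetical field names)."""

    postal_code: str
    city: str
    building_number: str
    street: Optional[str] = None     # optional: some towns have no streets
    apartment: Optional[str] = None  # optional: single-family houses have none

    def single_line(self) -> str:
        """Render the address as one line, skipping the missing parts."""
        number = self.building_number
        if self.apartment:
            number = f"{number}/{self.apartment}"
        # Assumption: without a street, the locality name takes its place.
        first_part = f"{self.street or self.city} {number}"
        return f"{first_part}, {self.postal_code} {self.city}"


# Made-up sample records: one full address, one without street and apartment.
flat = Address("00-001", "Warszawa", "5", street="Prosta", apartment="12")
house = Address("11-111", "Wioska", "7")

print(flat.single_line())   # Prosta 5/12, 00-001 Warszawa
print(house.single_line())  # Wioska 7, 11-111 Wioska
```

Keeping the components in separate fields makes it easy to produce a single-line form when needed, while the reverse direction (splitting free text into fields) is exactly where standardization becomes hard.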
Problems start when you have two or more databases in which addresses are stored in different ways and you want to create one database from them. What is more, if the collected data is incomplete (e.g., no postal code, no city/town), contains spelling errors (typos, missing Polish characters, abbreviations in names, etc.) or includes additional information that is not part of the address (e.g., a floor number, company name, etc.), and your database contains several hundred, thousand, or million records, the scale of the problem goes beyond human capabilities.
This is when AlgoMaps and artificial intelligence algorithms come into play to standardize, validate, and verify the correctness of address data. Within the scope of these functionalities, the following processes are implemented:
An example of input data and data standardized using AlgoMaps is presented in the table below.
Example of address data standardized using AlgoMaps
You can test how AlgoMaps standardizes address data on your own examples using the demo.
A slightly different data quality issue is maintaining the uniqueness of records stored in the database. It is undesirable for a database or CRM to hold, for example, the data of the same customer stored in several different ways, and thus in separate records. The process that eliminates such cases and creates the so-called ‘golden record’, i.e., the best possible set of features describing, e.g., a customer, is called deduplication. Deduplication runs in two stages: first the data is standardized, and then duplicates are identified based on an assessment of the similarity between records. An example of such a process performed with AlgoMaps is shown in the figure below.
Example of the deduplication process and creation of the so-called “golden record” using AlgoMaps
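The two stages can be illustrated with a short Python sketch. It does not show how AlgoMaps works internally: the normalization rules, the `difflib` similarity measure, the 0.85 threshold, and picking the longest entry as a stand-in for the golden record are all simplifying assumptions, and the sample records are made up.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Stage 1: lowercase, strip diacritics and punctuation, collapse spaces."""
    text = text.lower().replace("ł", "l")  # "ł" has no Unicode decomposition
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalnum() else " " for ch in text)
    return " ".join(text.split())


def deduplicate(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Stage 2: greedily group records whose normalized forms are similar."""
    groups: list[tuple[str, list[str]]] = []
    for record in records:
        key = normalize(record)
        for group_key, members in groups:
            if SequenceMatcher(None, key, group_key).ratio() >= threshold:
                members.append(record)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append((key, [record]))
    return [members for _, members in groups]


def golden_record(members: list[str]) -> str:
    """Crude stand-in: keep the longest (assumed most complete) entry."""
    return max(members, key=len)


records = [
    "Jan Kowalski, ul. Żółta 5, Łódź",
    "jan kowalski ul zolta 5 lodz",
    "Anna Nowak, Prosta 7, Kraków",
]
for group in deduplicate(records):
    print(golden_record(group), "<-", group)
```

The first two entries normalize to the same string and end up in one group, while the third stays on its own. A production process would compare individual fields rather than whole strings and would avoid the pairwise comparison that this greedy loop performs.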
As in the case of the previously described geocoding function, data standardization can also be done in two ways:
Both operations, geocoding and standardization, can be done in one go. You send data to the service once and get geocoded and standardized results back!
If you want real-time standardization on a request-response basis, use the API. This is particularly useful if you collect address data written as, e.g., free text, and you want to save it in the database or CRM in a standardized and error-free way. After setting up a free account and reading the documentation, implementing the service takes just a few hours!
If you want to clean up your database or CRM by standardizing many records at one time, it is worth using the AlgoMaps online application. Standardization consists of three simple steps – preparing the data as a CSV or XLSX file, defining the task in the application, and downloading the resulting data. The whole process is described in detail in the documentation, and you can access the application after creating a free account.
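As an illustration of the first step, the sketch below writes records to a CSV file with Python’s standard `csv` module. The column names `id` and `address`, the file name, and the sample rows are hypothetical; check the documentation for the layout the application actually expects.

```python
import csv
import io

# Hypothetical column layout: a record identifier plus the raw address text.
COLUMNS = ["id", "address"]


def write_batch(rows, stream) -> None:
    """Write the header and all rows as CSV to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"id": 1, "address": "ul. Prosta 5, Warszawa"},
    {"id": 2, "address": "Kraków Długa 7/3"},
]

# Preview in memory; values containing commas are quoted automatically.
preview = io.StringIO()
write_batch(rows, preview)
print(preview.getvalue())

# To create the actual file, open it with newline="" as the csv module expects:
# with open("addresses.csv", "w", newline="", encoding="utf-8") as f:
#     write_batch(rows, f)
```

Writing through `csv.DictWriter` rather than joining strings by hand keeps addresses that contain commas or quotes intact in the output file.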
It is worth knowing that in AlgoMaps the first 1000 processed records are always free!
Now that you can clean your data yourself quickly and effectively, it is time to rise to another challenge. In the next article, you’ll learn how AlgoMaps helps you improve the user experience of your website, online form, or application.
Stay tuned!