- data profiling,
- data cleaning (including parsing, standardization, de-duplication),
- conducting statistical analyses,
- data enrichment,
- cyclic data quality control,
- geocoding and data visualisations.
AdvancedMiner DQ can be used to detect, monitor and troubleshoot data quality problems in data sources. The application can also be used to clean new data automatically.
- scalability – the application has been proven in projects involving more than 60 million records of demographic data;
- customisability – rules and algorithms used in data cleaning can be freely modified and defined by the user;
- data import/export – AdvancedMiner allows for importing data from relational database systems, text files (using CSV, XML standards) and from spreadsheets;
- reporting – the application can create regular reports, including reports based on user-modified templates. The application is integrated with the MS Office suite.
AdvancedMiner DQ is Algolytics’ proprietary software. This makes it possible to adjust the offered solution to individual customer needs.
Updates to dictionaries
AdvancedMiner DQ is supplied with the dictionaries necessary for its proper functioning. These dictionaries are updated regularly to reflect changes in phone and address databases (changes of street names, numbering, etc.).
Implementation of automated data quality processes
The application's functionality makes it possible to achieve and maintain a certain level of data quality over a long period of time. This is accomplished by automating certain data quality processes and running them regularly.
The application supports data exploration to detect the issues that cause poor data quality, and allows data to be verified for business and technical accuracy.
Data cleaning, which covers:
- parsing – breaking down a complex field into a number of fields based on the meaning of the data and its context (for example, first and last name, code and city, etc.). Additionally, it is possible to determine gender (for persons) or legal form (for companies);
- standardization – replacing different representations of the same value with a single form. For example, "New York" and "NY" will be identified as the same value;
- deduplication – detecting duplicate records and their consolidation. It is also possible to search for multiple entries of the same customer even if his/her data is partially different (e.g. in case of address change).
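The three cleaning steps above can be illustrated with a minimal Python sketch. It is not AdvancedMiner DQ code: the function names, the `CITY_ALIASES` lookup table and the record layout are assumptions made for this example, and a real deployment would rely on maintained dictionaries and far more robust parsing rules.

```python
import re

# Hypothetical alias table, invented for this sketch only.
CITY_ALIASES = {"ny": "New York", "nyc": "New York", "new york": "New York"}


def parse_full_name(full_name):
    """Parsing: split one 'First Last' field into two fields (naive split)."""
    parts = full_name.strip().split()
    return {
        "first_name": " ".join(parts[:-1]),
        "last_name": parts[-1] if parts else "",
    }


def standardize_city(raw):
    """Standardization: map known variants of a city name to one value."""
    key = re.sub(r"[^a-z ]", "", raw.lower()).strip()
    return CITY_ALIASES.get(key, raw.strip())


def deduplicate(records):
    """Deduplication: keep the first record for each (name, city) key."""
    seen = {}
    for rec in records:
        name = parse_full_name(rec["name"])
        key = (
            name["first_name"].lower(),
            name["last_name"].lower(),
            standardize_city(rec["city"]),
        )
        seen.setdefault(key, rec)
    return list(seen.values())


customers = [
    {"name": "John Smith", "city": "NY"},
    {"name": "john smith", "city": "New York"},
    {"name": "Ann Lee", "city": "Boston"},
]
unique_customers = deduplicate(customers)
```

Here the two "John Smith" rows collapse into one because their names and cities agree after parsing and standardization. Matching customers whose data differs only partially would need fuzzy comparison, which this sketch does not attempt.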
- Combining external sources – matching data from different sources (databases). For example, it allows for linking a person from two sources:
| Source 1 | John Smith, born 1975/01/27 | 10451 | Park Ave | – |
| Source 2 | Smith J. | New York | Park Avenue | January 27, 1975 |
- Adding new information to the data using dictionaries.
- Detection of households – determining the relationships between customers, for example, identifying households or businesses based on customer information.
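As a rough illustration of matching the same person across two sources, the sketch below normalizes the name, birth date and street from the example above before comparing them. It is a simplified assumption of how such matching could work, not the product's algorithm: the helper names, the date formats and the `STREET_ABBREVIATIONS` table are invented for this example, and parsing month names with `%B` assumes an English locale.

```python
from datetime import datetime

# Hypothetical formats and abbreviations chosen to fit the example rows.
DATE_FORMATS = ("%Y/%m/%d", "%B %d, %Y")
STREET_ABBREVIATIONS = {"ave": "avenue", "st": "street", "rd": "road"}


def normalize_date(text):
    """Return a date object for any supported format, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalize_street(text):
    """Lowercase the street name and expand known abbreviations."""
    words = text.lower().replace(".", "").split()
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words)


def name_key(text):
    """Reduce a name to (surname, first initial)."""
    tokens = [t.strip(".").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return ("", "")
    # A one-letter token is treated as a first-name initial ("Smith J.").
    initials = [t for t in tokens if len(t) == 1]
    if initials:
        surname = next((t for t in tokens if len(t) > 1), "")
        return (surname, initials[0])
    # Otherwise assume "First Last" order ("John Smith").
    return (tokens[-1], tokens[0][0])


def same_person(a, b):
    """True when name key, birth date and street all agree."""
    born_a, born_b = normalize_date(a["born"]), normalize_date(b["born"])
    return (
        born_a is not None
        and born_a == born_b
        and name_key(a["name"]) == name_key(b["name"])
        and normalize_street(a["street"]) == normalize_street(b["street"])
    )


source_1 = {"name": "John Smith", "born": "1975/01/27", "street": "Park Ave"}
source_2 = {"name": "Smith J.", "born": "January 27, 1975", "street": "Park Avenue"}
linked = same_person(source_1, source_2)
```

Requiring exact agreement on all three normalized fields keeps the sketch simple. A production matcher would more likely score partial agreement, so that records still link when one field is missing or has changed.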
Geocoding and data visualisations
The application assigns the geographical coordinates of the building, and the corresponding area, to each identified address. Combined with the vector data supplied with the application, this makes it possible to visualize the data on maps in popular programs such as MapInfo, ArcGIS or QGIS.