
Dino or Biedronka? How ML Models Based on Spatial Data Support Business Decisions


The key components of Location Intelligence (LI) include geocoding address data (assigning geographic coordinates to addresses) as well as visualizing and analyzing data on maps—this enables users to uncover spatial relationships that are not visible in tabular data. Another important element is the ability to detect non-obvious correlations between spatial information and customer characteristics or preferences—this is where Machine Learning techniques come in, allowing for automatic knowledge extraction from data.

As part of the Algolytics platform, we provide a Location Intelligence module — a comprehensive solution that enables businesses to harness the full potential of geographic analytics. The package includes: a module for address data standardization and geocoding, an extensive spatial database with location-specific features at the single-building level, the Automatic Business Modeler AutoML tool for automatically generating predictive models, and the Scoring.one MLOps platform for deploying these models and making their results available in production environments. This integrated toolkit allows organizations to seamlessly move from raw address data, through enrichment with location context, all the way to building AI models and applying them in everyday business processes.

General architecture of the solution

The benefits of implementing Location Intelligence are particularly tangible for companies operating in the physical world—such as retail chains, banks, insurers, and logistics enterprises. By adding “location intelligence” to their decision-making processes, these organizations can gain a better understanding of the context surrounding their customers and branches, leading to more accurate and informed decisions. For example, a retailer can optimize the selection of new store locations based on population density and the presence of nearby competitors. A bank or insurance company can more precisely assess the risks associated with a client’s location (e.g. default or damage risk) using data on the characteristics of a given area. Geographic analysis also facilitates customer segmentation and geographically targeted marketing, as well as streamlining logistics—through route planning and optimal warehouse placement that takes distance, infrastructure, and terrain features into account. As a result, implementing the Algolytics LI suite translates into tangible business value: from increasing sales (through better customer reach and smarter location choices) to reducing operational costs (through more efficient resource allocation and mitigation of spatial risks).

ML + Location Intelligence System Architecture

The Algolytics Location Intelligence (LI) suite is built in layers, encompassing both data components and analytical and deployment tools. Let’s walk through the core elements of the system:

  • Spatial Data Layer – This is the foundation of the solution. Algolytics provides an extensive address database covering all buildings and apartments in Poland, along with precise geographic coordinates. Each address point is enriched with a set of several hundred features describing the location—from basic administrative data (voivodeship, county, municipality, postal code, etc.) and physical characteristics of the building, to environmental and external context. For example, each building has attributes such as the number of floors, year of construction, or building type. For residential locations, data includes the number of inhabitants, demographics, income levels, and the number of businesses operating at the address. The database also includes information about the surroundings, such as how many and what types of POIs (Points of Interest) exist within a given radius (e.g., shops, restaurants, schools), land characteristics (urban vs. rural, type of agglomeration), location potential indicators, and real estate market metrics. Additional layers cover safety (crime, accidents, fires), natural hazards (e.g., proximity to flood zones or landslide areas), competitor presence (locations and density of competing businesses nearby), quality of life (access to green areas, schools, EV charging stations, retail chains), and even internet availability (number and names of providers, maximum bandwidth available at the address). This wide range of data provides a complete contextual profile of every address point.
  • Address Standardization & Geocoding – Before spatial data can be utilized, the quality of the address data must be ensured. The Algolytics suite includes a DataQuality module that recognizes and standardizes addresses using unified formats and dictionaries (e.g., TERYT), and then geocodes them—assigning precise latitude and longitude coordinates. This ensures that even inconsistent address data from various sources is transformed into a unified, mappable format. Geocoding is a crucial step—only with accurate coordinates can we link an address to the described location features and perform spatial analysis. The Algolytics geocoding engine uses rich reference dictionaries and contextual matching techniques to ensure high precision—even when the input data is incomplete or ambiguous. The output is a standardized and validated address with geographic coordinates and associated identifiers (e.g., building ID in the reference database), ready to be enriched with spatial features.
  • ML Modeling Layer (Automatic Business Modeler) – Once the input data is enriched with spatial features, it can be analyzed and modeled. The Automatic Business Modeler (ABM) is an AutoML tool that automates the entire machine learning model-building process. The user (typically a data analyst) only needs to define the training dataset and the modeling objective—ABM takes care of the rest, from variable selection and algorithm testing to parameter tuning and model validation. ABM enables predictive model development within minutes—without programming or deep statistical knowledge. The tool extracts maximum value from available data, generating an efficient model ready for deployment. The result is a ready-to-use algorithm that can perform predictions (scoring) on new data—both in batch and real-time modes.
  • Deployment Layer (Scoring.one) – The final piece of the system is the environment that enables the deployment of built models at scale in real-world business scenarios. Scoring.one is an integrated MLOps platform used for deploying, managing, and monitoring AI/ML models. It allows for embedding the generated model into a decision-making process, defining input/output data flows, and exposing the scoring pipeline as an API service or batch application. Scoring.one includes data orchestration mechanisms, along with built-in functions for data standardization and enrichment—enabling the entire process from data ingestion, transformation, model scoring, and response delivery to be handled within a single platform. The solution is designed for performance and scalability—enabling model deployment in a matter of days (instead of weeks or months) and processing thousands of scoring requests per second across dozens of models simultaneously. In short, Scoring.one serves as a high-throughput engine capable of supporting production-grade AI workloads, while offering model versioning, quality monitoring, and lifecycle management. As a result, integrating Location Intelligence models into business processes (e.g., CRM systems, credit decision engines, or analytical dashboards) is both fast and secure—and long-term maintenance is simple and efficient.
Solution Components and Workflow for Building a Location Potential Scoring Model

The architecture described above—from the spatial data layer, through geocoding and modeling, to scoring—forms a cohesive ecosystem for executing Location Intelligence projects. All components are fully integrated, which means that, for example, providing an address to the system can automatically return a complete set of location attributes along with the model prediction generated based on them.
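To make the address-in, prediction-out flow concrete, here is a minimal Python sketch that chains the four stages together. Everything in it is hypothetical: the reference dictionary, the feature table, the function names, and the scoring weights are invented stand-ins for the DataQuality, spatial database, ABM, and Scoring.one components, whose actual APIs the article does not describe.

```python
import math
from dataclasses import dataclass

# Hypothetical stand-ins for the address reference dictionary and the
# building-level feature database (real sources hold millions of records).
REFERENCE_ADDRESSES = {
    "ul. Przykładowa 1, 00-001 Warszawa": {"building_id": "B1", "lat": 52.2297, "lon": 21.0122},
}
BUILDING_FEATURES = {
    "B1": {"population_500m": 4200, "grocery_stores_1km": 7, "is_urban": 1},
}


@dataclass
class ScoredAddress:
    address: str
    lat: float
    lon: float
    features: dict
    score: float


def standardize_address(raw: str) -> str:
    """Toy standardization: collapse whitespace and abbreviate 'ulica' to 'ul.'."""
    cleaned = " ".join(raw.split())
    if cleaned.lower().startswith("ulica "):
        cleaned = "ul. " + cleaned[len("ulica "):]
    return cleaned


def geocode(address: str) -> dict:
    """Exact-match lookup; a real geocoder also does fuzzy, contextual matching."""
    return REFERENCE_ADDRESSES[address]


def enrich(building_id: str) -> dict:
    """Attach the building's spatial features."""
    return BUILDING_FEATURES[building_id]


def score(features: dict) -> float:
    """Hypothetical logistic model with made-up weights."""
    z = (-3.0 + 0.0005 * features["population_500m"]
         + 1.2 * features["is_urban"] - 0.05 * features["grocery_stores_1km"])
    return 1.0 / (1.0 + math.exp(-z))


def process_address(raw: str) -> ScoredAddress:
    """Standardize, geocode, enrich, and score one raw address."""
    address = standardize_address(raw)
    geo = geocode(address)
    features = enrich(geo["building_id"])
    return ScoredAddress(address, geo["lat"], geo["lon"], features, score(features))
```

For example, process_address("  ulica Przykładowa 1,   00-001 Warszawa ") returns the standardized address together with its coordinates, its features, and a score between 0 and 1. Keeping each stage as a separate function mirrors the layered design: any stage can be swapped out without touching the others.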

In performance tests, Algolytics demonstrated that this automated pipeline can attach up to ~1,000 spatial features to a single address and simultaneously run hundreds of predictive models to generate recommendation results. As a result, the LI platform is capable of supporting highly complex analytical scenarios in a fraction of the time normally required when using separate tools and manually integrating data.

Use Case: Modeling Potential Store Locations / Uncovering Retail Chain Strategies

To better illustrate the capabilities of the Algolytics Location Intelligence suite, let’s look at a sample case study that combines both business and technical perspectives. The scenario focuses on using spatial data to model optimal new store locations for a retail chain. This type of challenge is particularly relevant for retail companies planning expansion or analyzing their competition—we aim to identify high-potential sites, i.e., those that resemble the profile of locations where existing stores of a given chain are already successful. This type of task is commonly referred to as look-alike modeling, as it involves finding “twin” locations based on the pattern of top-performing existing outlets.

As an example, let’s take two popular grocery store chains in Poland—Biedronka and Dino. Let’s assume we want to predict the likelihood that a store from either chain exists—or could successfully be placed—at any given address in the country. In practice, this modeling exercise is used to identify new, untapped locations that share characteristics with areas already occupied by Biedronka or Dino—potentially ideal sites for the next store of the respective brand.

Store Locations of the Dino and Biedronka Chains

The goal of our modeling exercise is to identify look-alike locations and attempt to infer each chain’s store placement strategy.

In our case, we’ll build a model with a three-class target variable: one value indicating a location where a Biedronka store is present, another for Dino, and a third for locations without a store from either chain. The model will return a scoring output—a vector of values between 0 and 1 representing the probability that a given address matches the profile of chain X. The higher the score, the more closely the address aligns with the characteristics of a store location from that chain.
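The article does not say how ABM turns raw model outputs into this probability vector. A common approach for multiclass models, assumed here, is a softmax over one raw score per class. The sketch below shows the three-class label encoding and that conversion; CLASSES, encode_target, and score_vector are illustrative names.

```python
import math

# Class order for the three-class target (illustrative naming).
CLASSES = ("biedronka", "dino", "none")


def encode_target(chain):
    """Map a store label, or None for an address without either chain, to a class index."""
    return CLASSES.index("none" if chain is None else chain)


def softmax(logits):
    """Convert raw per-class scores into probabilities in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_vector(logits):
    """Return {class: probability} for a single address."""
    return dict(zip(CLASSES, softmax(logits)))
```

For instance, score_vector([2.0, 0.5, -1.0]) assigns the largest probability to "biedronka", and the three values always sum to 1.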

Data Sources and Features Used

To build the model, we need two main types of data:

  • Store location data (target) – a set of POIs (Points of Interest) containing the addresses of all Biedronka and Dino stores across the country. Each of these addresses is labeled as a positive example (Biedronka or Dino), while addresses without a store are treated as negative examples. These POI databases are part of Algolytics’ location data layer.
  • Algolytics spatial data (features) – for each address point, we retrieve the full set of available geographic features from the Algolytics DataQuality tool, describing the location and its surroundings. These explanatory variables enable the model to differentiate between places that resemble existing store locations and those that do not. There are hundreds of features, ranging from the building’s characteristics to neighborhood demographics and the availability of nearby amenities and services.

    Here are a few examples of particularly useful variables in the context of grocery store site selection:
    • Building type and characteristics – e.g., is the address a retail facility, apartment block, or single-family home? Stores are more likely to be located near dense housing (e.g., ground floors of blocks or retail pavilions) rather than in remote or industrial zones.
    • Population density and resident profile – the number of people living nearby, population density in the census area, demographic (age) and economic (income) structures. A good store location is one with many potential customers within a few minutes’ reach.
    • Proximity to city centers and housing estates – the distance of a building from the nearest city center or large population cluster. Features such as whether the address is in a small or large town, rural area, downtown, or housing estate affect the likelihood of a store being located there.
    • Competition and surrounding POIs – key information includes nearby stores and services. For example, the number of grocery stores within 1 km of a given address—models learn what level of competition is typical for each chain. Other POIs can also be considered: schools and preschools (high local traffic), service points, ATMs, gas stations, etc.—all of which build the attractiveness and profile of the location.
    • Transport access and infrastructure – e.g., distance to a main road, availability of parking, public transit. (Some of these may be indirectly reflected in other features or quality-of-life indicators.) A store located on a busy street or near a major transit hub is likely to see more customer traffic.
    • Socio-economic indicators – such as unemployment rate and average income in the area.
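Features such as "number of grocery stores within 1 km" come precomputed in the Algolytics database, but it helps to see what they measure. This minimal sketch assumes POIs are given as dicts with category, lat, and lon keys (a hypothetical layout). It computes great-circle distances with the haversine formula and counts matching POIs within the radius.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_pois_within(lat, lon, pois, radius_km=1.0, category="grocery"):
    """Count POIs of one category within radius_km of the given point."""
    return sum(
        1
        for p in pois
        if p["category"] == category
        and haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km
    )
```

This brute-force scan checks every POI for every address. That is fine for a demo, but not for 9 million addresses. At that scale, a spatial index (for example, an R-tree or a geohash grid) would first narrow the candidates down.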

Ultimately, each address in the training dataset is represented by a vector of several hundred such variables. The data is then standardized (e.g., missing values are filled with the median and feature scales are normalized), a process handled automatically by ABM. The result is a feature matrix in which rows represent addresses (both with and without stores) and columns contain the values of individual geospatial features.
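ABM performs this preparation automatically. The sketch below only illustrates the two operations the text names, on a list-of-dicts dataset: median imputation and scale normalization. For the normalization it uses z-score standardization, which is one of several possible choices. The function names are illustrative.

```python
from statistics import mean, median, pstdev


def impute_and_scale(column):
    """Fill None with the column median, then rescale to zero mean and unit variance.

    Raises statistics.StatisticsError if every value in the column is missing.
    """
    observed = [v for v in column if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in column]
    mu, sigma = mean(filled), pstdev(filled)
    if sigma == 0:  # constant column carries no information after scaling
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]


def build_feature_matrix(rows, feature_names):
    """Rows are addresses, columns are the prepared feature values."""
    columns = {name: impute_and_scale([r.get(name) for r in rows]) for name in feature_names}
    return [[columns[name][i] for name in feature_names] for i in range(len(rows))]
```

In a production pipeline, the medians, means, and standard deviations would be computed on the training data and stored. New addresses would then be transformed at scoring time with those same parameters.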

Modeling Process Using Automatic Business Modeler

With the training dataset prepared, we can proceed to build the predictive model. We use the Automatic Business Modeler (ABM) module for this, which significantly simplifies and accelerates the entire process. The workflow is as follows:

  1. Experiment configuration: selecting the target variable and defining evaluation criteria for model quality.
  2. Model training (AutoML): Once launched, ABM automatically tests various modeling approaches. Each candidate model is validated on a portion of the dataset. After several iterations, ABM selects the model with the highest performance according to the chosen metric (e.g., maximizing ROC AUC if that metric was selected).
  3. Model outcome: ABM presents the best trained model along with reports, including accuracy, sensitivity, the ROC curve, the estimated AUC, etc. Importantly, it also provides a ranked list of the most influential variables (feature importance).
  4. Model export: ABM can automatically generate scoring code for the model (for example, as an SQL script), or the model can be deployed directly via Scoring.one. In our case, we export the model to the Scoring.one platform to perform batch scoring of all addresses of interest.
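The article does not show what ABM's generated SQL looks like. As a rough illustration of the idea only, the sketch below renders a binary logistic-regression model as a single SQL query. The model's intercept and per-feature coefficients are hypothetical values, and the table name addresses and column address_id are placeholders. A three-class model would need one such expression per class.

```python
def logistic_to_sql(intercept, coefficients, table="addresses", id_column="address_id"):
    """Render a logistic-regression scoring query as SQL text.

    Column and table names are inserted verbatim, so they must come from a
    trusted source (the model definition), never from user input.
    """
    terms = [repr(float(intercept))]
    terms += [f"({float(coef)!r} * {name})" for name, coef in coefficients.items()]
    linear = " + ".join(terms)
    return (
        f"SELECT {id_column},\n"
        f"       1.0 / (1.0 + EXP(-({linear}))) AS score\n"
        f"FROM {table};"
    )
```

EXP is available in mainstream SQL dialects such as PostgreSQL, MySQL, SQL Server, and Oracle. A query of this kind can therefore score addresses inside the database without moving the data.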

Both models achieved high discriminatory power – with an AUC of 0.96 for Biedronka and 0.97 for Dino, indicating a very strong ability to distinguish between store and non-store locations.
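AUC has a direct interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The article does not say whether the two figures come from two separate binary models or from a per-class view of the three-class model described earlier. For the latter, a standard approach is one-vs-rest AUC, which scores each chain's probability against all other addresses. Below is a minimal sketch of both ideas. It uses quadratic pairwise counting, so it suits only small samples.

```python
def auc(scores, labels):
    """ROC AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_vectors, true_classes, cls):
    """AUC of one class's probability against all other classes combined."""
    scores = [pv[cls] for pv in prob_vectors]
    labels = [tc == cls for tc in true_classes]
    return auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to perfect separation. Values of 0.96 and 0.97 therefore mean store addresses are almost always ranked above non-store addresses.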

List of Key Model Factors

Scoring All Addresses and Aggregating Results (Census Areas)

Once the models have been built, we move on to applying them at scale. The business goal is to identify new high-potential store locations, so we need to evaluate all address points in the selected area (e.g., across Poland or within specific regions where the company operates). This means processing a very large dataset—over 9 million addresses.

Aggregation by census areas: Scoring individual addresses generates a large volume of data that can be difficult to analyze efficiently. Therefore, it is often more practical to aggregate the results spatially—e.g., at the district or microregion level. In this case, we aggregate by census enumeration areas (the smallest statistical units, typically covering a few hundred people). For each area, we calculate a cumulative location potential index for the respective retail network.
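The article does not define how the cumulative location potential index is calculated. The sketch below computes two plausible candidates per census area and chain. The sum matches "cumulative" literally, and the mean is often easier to compare across areas of different size. The input layout of (area_id, scores) pairs is an assumption.

```python
from collections import defaultdict


def aggregate_by_area(scored_addresses):
    """Aggregate address-level scores to census areas.

    scored_addresses: iterable of (area_id, {chain: score}) pairs, where every
    address carries a score for every chain.
    Returns {area_id: {chain: {"sum": ..., "mean": ...}}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for area_id, scores in scored_addresses:
        counts[area_id] += 1
        for chain, score in scores.items():
            sums[area_id][chain] += score
    return {
        area_id: {
            chain: {"sum": total, "mean": total / counts[area_id]}
            for chain, total in chains.items()
        }
        for area_id, chains in sums.items()
    }
```

The sum favors areas with many addresses, while the mean highlights areas where the typical address fits the chain's profile. Which one suits a heatmap depends on whether the goal is total customer potential or profile fit.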

Once aggregated, the results can be visualized on a map—in the form of a heatmap where each census area is color-coded based on its predicted potential. This type of visualization helps to identify territorial clusters with high scores. Areas with characteristics similar to Biedronka store locations are marked in red, while Dino-type areas are marked in green. As the map shows, Biedronka tends to locate stores in highly urbanized zones, while Dino focuses on rural areas.


The results of the scoring and subsequent aggregation at the census area level allow us to visualize areas with the highest concentration of locations aligned with each network’s strategy. Red areas indicate a strong resemblance to Biedronka’s location profile, while green areas reflect Dino-like characteristics.
Map panels: Kraków, Warszawa, Obrzycko (location potential for new Biedronka and Dino stores)

Ranking of Top Locations with the Highest Potential

With scoring results available for all addresses, we can proceed to identify specific candidates for new store locations. To do this, we filter out addresses that already host one of the retail chains (i.e., those present in the training set as positive cases) and focus only on unoccupied addresses. These are then sorted in descending order based on their predicted score.
The result is a ranked list—from the most promising look-alike locations to the least likely matches—for each network.
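A minimal sketch of this filter-and-rank step follows, assuming scores are held in a dict keyed by a hypothetical address ID. heapq.nlargest keeps only the top N items in a heap, which avoids fully sorting millions of candidates when only the best few are needed.

```python
import heapq


def top_candidates(scores, occupied, chain, n=100):
    """Return the n highest-scoring unoccupied addresses for one chain.

    scores:   {address_id: {chain: probability}}
    occupied: address_ids that already host a Biedronka or Dino store
    """
    candidates = (
        (address_id, probs[chain])
        for address_id, probs in scores.items()
        if address_id not in occupied
    )
    # Highest score first; only n items are kept in memory at a time.
    return heapq.nlargest(n, candidates, key=lambda item: item[1])
```

Running it once per chain with n=100 yields the two Top 100 lists.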


Top 100 Locations for Biedronka (marked in red) and Dino (marked in green)


For a retail chain, such a ranking is highly valuable: it enables early-stage identification of potential sites for further feasibility studies and strategic decision-making.

Key Features Influencing the Model (Feature Importance)

Alongside identifying the most promising locations, we conducted an analysis of which features had the strongest impact on the model’s predictions. The Automatic Business Modeler provides insight into the most important variables influencing its decisions—based on feature importance scores. Understanding these features helps define the ideal location profile for each retail chain and verify whether it aligns with real-world expectations. Here’s what we found:

Biedronka’s Location Strategy:

  • Focus on cities and metropolitan areas: Stores are mainly located in urban areas, often in densely populated districts.
  • Network densification: Biedronka frequently opens new stores near existing ones to dominate the local market and limit competition.
  • High pedestrian accessibility: Stores are often placed near residential neighborhoods, with easy access for customers without cars.
  • Well-connected locations: Strong emphasis on positioning along major roads, near public transport stops, and with available parking.

Dino’s Location Strategy:

  • Expansion into smaller towns and rural areas: Dino targets towns with around 3,000–10,000 residents, often where strong competitors are absent.
  • New developments: Dino typically builds new stores from scratch in low-density, less urbanized regions.
  • Car-oriented access: Locations are designed with motorized customers in mind, offering large parking areas.
  • Low cannibalization: Expansion focuses on market white spots, where there is no existing presence of major retail chains.
Relationship between selected variables from the model's top 10 predictors and the store locations of each retail chain: degree of urbanization of the area, level of competition saturation, and road access.
Building characteristics – a higher standardization of buildings in the Dino chain is clearly visible.
Dino stores are located in areas with lower incomes, while both chains place their stores in areas with a higher risk of financial default among residents.
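The article does not specify how ABM computes its feature importance scores. One widely used, model-agnostic technique is permutation importance. It is shown here as a substitute, not as ABM's method: shuffle one feature column, re-score, and measure how much a quality metric drops. The toy model and data in the test are invented.

```python
import random


def accuracy(preds, labels):
    """Share of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def permutation_importance(predict, rows, labels, feature_names, metric,
                           n_repeats=5, seed=0):
    """Average drop in metric when one feature column is randomly shuffled.

    predict maps one row (a dict of feature values) to a prediction.
    Returns {feature: importance}, sorted from most to least important.
    """
    rng = random.Random(seed)
    baseline = metric([predict(r) for r in rows], labels)
    importances = {}
    for name in feature_names:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Replace only this feature's values; all other columns stay intact.
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            drops.append(baseline - metric([predict(r) for r in permuted], labels))
        importances[name] = sum(drops) / n_repeats
    return dict(sorted(importances.items(), key=lambda kv: kv[1], reverse=True))
```

A feature the model ignores scores exactly zero, because shuffling it cannot change any prediction. A feature the model relies on heavily produces a large drop.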

Summary and Other Applications

The above example of store location modeling demonstrates how Algolytics’ integrated Location Intelligence environment can generate real business value through geographic data. Within a single ecosystem, we standardized and geocoded the data, enriched it with hundreds of features describing spatial context, built predictive models using AutoML methods, and then deployed these models for large-scale scoring—resulting in actionable business recommendations. The entire process was fast and smooth because all tools (from Data Quality to Scoring.one) are fully integrated. This kind of platform-based approach lowers the entry barrier for advanced analytics—companies don’t need to piece together multiple tools or build a large team of specialists for every stage, as Algolytics provides a complete solution. From a business perspective, this means significantly faster time-to-market (reduced from months to days) and greater flexibility.

It’s worth noting that the potential of the Location Intelligence platform goes far beyond the use case described. Other possible applications of such an integrated environment in an organization include:

  • Location risk scoring – using spatial data to assess the risk associated with a place of residence or business activity. For example, credit or insurance risk assessments can be improved by incorporating neighborhood features (e.g., wealth of the area, crime rate, propensity for natural damages). Banks and insurers can better differentiate offers—e.g., higher premiums for properties in flood zones or lower credit limits for areas with high default rates. Similarly, telecom and utility providers can assess the risk of connecting customers (fraud, illegal connections) based on location characteristics.
  • Optimization of service point networks – supporting decisions on where to locate branches, warehouses, logistics centers, or customer service points in order to maximize reach and accessibility while minimizing costs. Site selection analyses using LI consider customer distribution, road infrastructure, travel times, availability of public transport, land costs, and even investment barriers (e.g., protected areas, zoning plans). The Algolytics platform can generate location scenarios—for example, identifying the 5 best sites for a new warehouse that meet criteria such as being within X km of 90% of clients and costing under amount Y. In logistics, spatial data analysis supports route and delivery planning (route optimization)—here LI provides both data (e.g., road maps, travel times) and models (e.g., delay prediction based on location).
  • Competition and market analysis – LI tools enable monitoring of competitor activity and identification of market niches. For example, a retail company can create a regional market share map by combining sales data with demographic data and competitor store locations. This reveals areas where competition is strong and where “white spots” (untapped customer potential) remain. Look-alike models can be applied not only to a company’s own stores but also to predict where competitors might expand—supporting preventive strategies. Another example is cannibalization analysis: when planning a new store, the company can estimate how many customers it might take from nearby branches (based on distance maps and customer density), optimizing the total network profit rather than just the new location.
  • Geomarketing and offer personalization – combining geographic customer data with transactional data enables new segmentation and targeting strategies. LI data also helps identify spatial trends in behavior—for example, whether residents of new developments prefer online or offline shopping, how far they travel to the nearest store, etc. This informs marketing budget allocation across regions. Additionally, Location Intelligence analyses can support outdoor campaign planning (billboards, out-of-home advertising) by highlighting where target audiences are concentrated geographically.
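The warehouse criteria mentioned above (within X km of 90% of clients, cost under Y, best 5 sites) can be expressed as a simple filter-and-rank over candidate sites. The sketch below makes three assumptions: each site and client has coordinates, each site has a cost, and every site is judged on its own. Choosing several facilities jointly is a harder covering problem, usually solved with optimization methods. All names and data are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coverage(site, clients, max_km):
    """Fraction of clients within max_km of the site."""
    near = sum(1 for c in clients
               if haversine_km(site["lat"], site["lon"], c["lat"], c["lon"]) <= max_km)
    return near / len(clients)


def best_sites(sites, clients, max_km, min_coverage=0.9, max_cost=float("inf"), k=5):
    """Keep sites meeting the coverage and cost limits; rank by coverage, then cost."""
    feasible = []
    for s in sites:
        cov = coverage(s, clients, max_km)
        if cov >= min_coverage and s["cost"] < max_cost:
            feasible.append((s["name"], cov, s["cost"]))
    feasible.sort(key=lambda t: (-t[1], t[2]))  # higher coverage first, then cheaper
    return feasible[:k]
```

Straight-line distance is a simplification. A production analysis would more likely use road-network travel times, which the text lists among the inputs for site selection.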

To sum up, Location Intelligence is becoming an increasingly essential element of data-driven strategy across industries. The Algolytics LI suite provides everything needed to implement that strategy—from data and analytics to deployment tools. It enables companies to fully leverage the richness of spatial information—making better location decisions, gaining deeper customer and market insights, managing risk more effectively, and uncovering competitive advantages. All while maintaining high operational efficiency, as the integrated environment simplifies and automates many previously complex tasks. In the Algolytics approach, Location Intelligence is not just data or maps—it is a complete business solution that translates geographic patterns into concrete actions that boost business performance.

Ready to grow your business with Machine Learning & AI?

Start leveraging the potential of machine learning and artificial intelligence in your business to achieve measurable benefits – increased sales, reduced costs, and operational efficiency. Contact us, and together we'll develop a modern strategy for managing business processes in your company.
