Algolytics Technologies Documentation

DataQuality [web app]

What is Algolytics Data Quality?

Algolytics Data Quality is a tool for standardizing and enriching customer data. It runs in batch mode and is used to detect, monitor and fix data quality problems. The DataQuality.pl application provides a web browser interface that lets you load data into the application, define a standardization task and download the results.

Algolytics Data Quality Functionalities

  • data profiling

  • data cleansing (including parsing, standardization, deduplication)

  • conducting statistical analyses

  • data enrichment – matching data from different databases, adding new information about data

  • geocoding and data visualization

  • dictionary data retrieval

Work with Algolytics Data Quality

Using the app comes down to two essential steps: adding a file to be cleansed and downloading the result. The detailed steps are described below.

  1. Load the file with the database to be cleaned.

  2. Check the information about the uploaded file.

  3. Select the roles for the columns from the file.

  4. Add a task.

  5. Once the task is complete, download the data and generate the report.

Main Menu View

Fig. 1 Main application window

Description of the individual items of the main menu:

Start

Home page

Tasks

If you want to view the tasks performed so far or continue work you have started

New task

If you want to perform an operation on new data

Data dictionaries

If you want to download dictionary data

My account

If you want to edit your account details

Documentation

If you want to read a detailed description of the individual functions

Application

User Mail

Click it to go to the My Account or Change Password screen, or to log out

Version

New task

Selecting New task from the main menu opens the wizard for defining a new task.

Task and file name selection

First, specify the task name and load the data file: click the Feed button, select the file, and then click the Load button. You can change the task name in the Task name field.

The following data formats are supported: CSV and XLSX.

Fig. 2 The first step of defining a new task

File Information

Once the file is loaded, an area with information about the file appears.

Fig. 3 Information about the loaded file when defining a new task

It presents the following information:

Name

File name

Size

File size; the maximum size is 2 GB

Number of records

Number of records in the loaded file

Status

The state of the file after the initial verification. Possible values:

OK - nothing prevents you from continuing

Note - the account may not have enough funds; the following message is displayed below the file information table: Processed file may be blocked if the fee exceeds the funds on the account

Error - the task cannot be created; a simplified error message is displayed below the file information table

Actions

A button to delete the file; when you delete the file, the file information area is hidden

Separator detected

The data separator detected in the loaded file

Choose a different separator

A field in which you can specify a different separator; changing it reloads the preview of the first lines of the file

Text Qualifier

The character that encloses text values; separator characters inside such text are not treated as separators

File encoding

The detected encoding of the uploaded file

This area also displays a preview of the first 5 lines of the uploaded file. The preview is updated when you change the separator or the text qualifier.

After checking that the file has been read correctly, click Next.
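The separator and text qualifier detection shown in this step can be approximated locally before uploading, for example to catch a wrong separator early. The sketch below is hypothetical and does not reproduce the application's own detection: it uses Python's standard `csv.Sniffer`, assumes the candidate separators `;`, `,`, tab and `|`, reads "2 GB" as 2 * 1024**3 bytes, and previews CSV text only (XLSX files are checked just for extension and size). The names `precheck` and `preview` are illustrative.

```python
from __future__ import annotations

import csv
import io
import os
from itertools import islice

# Assumptions, not taken from the DataQuality.pl documentation: the candidate
# separators, the comma fallback and "2 GB" read as 2 * 1024**3 bytes.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx"}
MAX_FILE_SIZE = 2 * 1024**3
CANDIDATE_SEPARATORS = ";,\t|"


def precheck(path: str) -> list[str]:
    """Return the problems that would probably stop the upload."""
    problems = []
    if os.path.splitext(path)[1].lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format (use CSV or XLSX)")
    if os.path.getsize(path) > MAX_FILE_SIZE:
        problems.append("file larger than 2 GB")
    return problems


def preview(text: str, delimiter: str | None = None,
            quotechar: str | None = None, n_rows: int = 5):
    """Guess the separator and text qualifier, then parse the first rows.

    `text` should be the first few kilobytes of the file, not the whole file.
    """
    try:
        dialect = csv.Sniffer().sniff(text, delimiters=CANDIDATE_SEPARATORS)
        sep, quote = dialect.delimiter, dialect.quotechar
    except csv.Error:
        sep, quote = ",", '"'
    # A value chosen by the user overrides the detected one, as in the wizard.
    sep = delimiter or sep
    quote = quotechar or quote
    reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar=quote)
    return sep, quote, list(islice(reader, n_rows))


sample = 'id;name;city\n1;"Kowalski; Jan";Warszawa\n2;Nowak;Krakow\n'
print(preview(sample))
```

Passing `delimiter` or `quotechar` mimics the Choose a different separator field: the preview is parsed again with the user's choice instead of the detected value.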

Defining variables

In the next step, assign roles to the data in the file. The screen displays a table in which you set a role for each column found in the file.

Fig. 4 Defining the task of standardization and data enrichment

The table on this screen contains the following information:

  • Field in file – column name

  • Column Type

  • Role for column

You can choose from the following types and roles for columns:

Address variable

KOD_POCZTOWY, CITY, ULICA_NUMER_DOMU_I_MIESZKANIA, STREET, NUMER_DOMU, NUMER_MIESZKANIA, NUMER_DOMU_I_MIESZKANIA, VOIVODESHIP, COUNTY, MUNICIPALITY

Name variable

NAME, SURNAME, NAZWA_PODMIOTU, IMIE_I_NAZWISKO

Contact variable

EMAIL1, EMAIL2, TELEFON1, TELEFON2

Variable of people/companies

PESEL, NIP, REGON

Date variable

DATA_URODZENIA, CZAS_AKTUALIZACJI

Unspecified variable

DANE_OGOLNE - the column is analyzed for all types of information it may contain

Identifier

ID_REKORDU

Neutral variable

REWRITE – copies the variable to the output file

SKIP – does not copy the variable to the output file

For the parameters to validate correctly, each variable must be assigned a type and a role. Variable roles cannot be repeated.

When cleansing address data, the CITY or DANE_OGOLNE role must be assigned to a column.
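A role mapping can be checked locally against the rules above before it is entered in the wizard. This is a minimal sketch, not the application's validator: the type-to-role table is copied from the list above, while three points are assumptions, namely that REWRITE and SKIP may be given to several columns, that address cleansing requires a column with the CITY or DANE_OGOLNE role, and that deduplication (described below) needs both ID_REKORDU and CZAS_AKTUALIZACJI. The function name `validate` and the column names in the example are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Column types and their roles, as listed in the table above.
ROLES_BY_TYPE = {
    "Address variable": {"KOD_POCZTOWY", "CITY", "ULICA_NUMER_DOMU_I_MIESZKANIA",
                         "STREET", "NUMER_DOMU", "NUMER_MIESZKANIA",
                         "NUMER_DOMU_I_MIESZKANIA", "VOIVODESHIP", "COUNTY",
                         "MUNICIPALITY"},
    "Name variable": {"NAME", "SURNAME", "NAZWA_PODMIOTU", "IMIE_I_NAZWISKO"},
    "Contact variable": {"EMAIL1", "EMAIL2", "TELEFON1", "TELEFON2"},
    "Variable of people/companies": {"PESEL", "NIP", "REGON"},
    "Date variable": {"DATA_URODZENIA", "CZAS_AKTUALIZACJI"},
    "Unspecified variable": {"DANE_OGOLNE"},
    "Identifier": {"ID_REKORDU"},
    "Neutral variable": {"REWRITE", "SKIP"},
}
# Assumption: neutral roles may be assigned to any number of columns.
REPEATABLE_ROLES = {"REWRITE", "SKIP"}


def validate(mapping: dict[str, tuple[str, str]], *, address_cleansing: bool = True,
             deduplicate: bool = False) -> list[str]:
    """Check a {column: (type, role)} mapping and return a list of errors."""
    errors = []
    for column, (col_type, role) in mapping.items():
        if role not in ROLES_BY_TYPE.get(col_type, set()):
            errors.append(f"{column}: role {role} is not valid for type {col_type}")
    counts = Counter(role for _, role in mapping.values())
    for role, n in counts.items():
        if n > 1 and role not in REPEATABLE_ROLES:
            errors.append(f"role {role} assigned to {n} columns")
    roles = set(counts)
    # Assumption: address cleansing needs a CITY or DANE_OGOLNE column.
    if address_cleansing and not roles & {"CITY", "DANE_OGOLNE"}:
        errors.append("address cleansing needs a CITY or DANE_OGOLNE column")
    if deduplicate and not {"ID_REKORDU", "CZAS_AKTUALIZACJI"} <= roles:
        errors.append("deduplication needs ID_REKORDU and CZAS_AKTUALIZACJI")
    return errors


mapping = {
    "kod": ("Address variable", "KOD_POCZTOWY"),
    "miasto": ("Address variable", "CITY"),
    "uwagi": ("Neutral variable", "SKIP"),
}
print(validate(mapping))                    # []
print(validate(mapping, deduplicate=True))  # ['deduplication needs ID_REKORDU and CZAS_AKTUALIZACJI']
```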

Below the table of column types and roles, you can select additional data to include in the output file.

Possible options are:

  • Information about the building - number of units, type, population and demographic structure

  • Building surroundings – the number of POIs (points of interest) and other objects, e.g. roads and built-up areas, within a 1 km buffer around the geocoded address

  • TERYT IDs

  • Financial risk (requires the TERYT identifiers option) – default risk, fraud risk and the average income of the inhabitants

  • Diagnostic information – address match and geocoding status

  • Geocoding – geographical coordinates of the building

  • Deduplication

  • Incremental deduplication

Deduplication requires input columns with the ID_REKORDU and CZAS_AKTUALIZACJI roles. Incremental deduplication stores customer data and compares each new batch of data with the data from previous runs.
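Incremental deduplication can be pictured as a store that survives between calls: each new chunk is matched against every record seen before, and CZAS_AKTUALIZACJI decides which version is the most recent. The sketch below only illustrates that idea. The matching rule in `match_key` (normalized name plus postal code) and the record fields `name` and `postal_code` are hypothetical, since this guide does not describe how the service actually matches records.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def match_key(record: dict) -> str:
    # Hypothetical matching rule: normalized name + postal code digits.
    # The service's real matching logic is not described in this guide.
    name = " ".join(record["name"].lower().split())
    return name + "|" + record["postal_code"].replace("-", "")


@dataclass
class IncrementalDeduplicator:
    """Keeps state between calls, so each chunk is compared with all earlier ones."""
    cluster_of_key: dict[str, str] = field(default_factory=dict)
    # cluster ID -> (ID_REKORDU, CZAS_AKTUALIZACJI) of the most recent record
    latest: dict[str, tuple[str, str]] = field(default_factory=dict)

    def process_chunk(self, chunk: list[dict]) -> dict[str, str]:
        """Map each ID_REKORDU in the chunk to its cluster ID."""
        result = {}
        for rec in chunk:
            rec_id, updated = rec["ID_REKORDU"], rec["CZAS_AKTUALIZACJI"]
            # The first record seen for a key gives the cluster its ID.
            cluster = self.cluster_of_key.setdefault(match_key(rec), rec_id)
            best = self.latest.get(cluster)
            # ISO 8601 timestamps (YYYY-MM-DD...) compare correctly as strings.
            if best is None or updated > best[1]:
                self.latest[cluster] = (rec_id, updated)
            result[rec_id] = cluster
        return result


dedup = IncrementalDeduplicator()
dedup.process_chunk([{"ID_REKORDU": "1", "CZAS_AKTUALIZACJI": "2024-01-05",
                      "name": "Jan  Kowalski", "postal_code": "00-001"}])
second = dedup.process_chunk([{"ID_REKORDU": "7", "CZAS_AKTUALIZACJI": "2024-03-01",
                               "name": "jan kowalski", "postal_code": "00001"}])
print(second)  # {'7': '1'}: record 7 joins the cluster created by record 1
```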

At the very bottom of the page, you will find the maximum price of the task and two buttons: Previous step, which returns you to the file information screen, and Add task, which adds the task to be performed.

Clicking Add task adds the task and takes you to the Tasks screen. To download the results file, click the Download button; to view the task report, click View.

The first 5,000 rows processed by each user are free. Once this limit is used up, the system requires you to fill in the invoicing data on the My Account screen and top up your account.
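The free allowance is simple arithmetic. A minimal sketch, assuming the 5,000 rows are counted cumulatively per user and that a task crossing the limit is billed only for the rows above it; the actual billing may count only successfully cleansed records (see Records to be billed in the job report), and no per-row price is assumed here. `billable_rows` is a hypothetical helper.

```python
FREE_ROWS = 5000  # free allowance per user, from the text above


def billable_rows(rows_in_task: int, rows_processed_before: int) -> int:
    """Rows of a new task that exceed the remaining free allowance.

    Assumption: the allowance is cumulative per user, and a task that
    crosses the limit is billed only for the rows above it.
    """
    free_left = max(0, FREE_ROWS - rows_processed_before)
    return max(0, rows_in_task - free_left)


print(billable_rows(3000, 4000))  # 1,000 free rows left, so 2000 are billable
```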

Tasks

On the Tasks screen, you have access to all tasks that have been completed, are in progress, or are waiting to be performed. The table also includes dictionary datasets retrieved by the user. You can sort the table by clicking the column headers; by default, it is sorted by the End date column.

You can filter the table by typing or selecting values in the fields below the column headers, and remove the filters with the Clear filters button above the table. You can also choose how many tasks the table displays.

Below the table, the New task button takes you to the new task wizard.

Fig. 5 Tasks screen view

Description of the Tasks table columns:

Task name

The name defined by the user when creating the task; for dictionary data downloads, the name is based on the data range selected by the user

Task status

Available job statuses: New, In Queue, In Progress, Completed

End date

The date and time when the system finished executing the task; for tasks that have not finished yet, the current status is displayed instead: New, Pending, Running

Number of records

Number of records in the file

Fee

The fee charged for the task

Results

A link to the file processed by the system or to the corresponding dictionary data. If the file's 30-day storage period has ended, the Results column shows Expired. If the file is blocked due to insufficient funds, the column shows either Blocked (for users settling with the credit method) or a Pay button that redirects to the My Account screen (for users settling with the PrePaid method)

Report

For completed tasks, a link to the Task report page, which provides statistics on the quality of standardization

Actions

There are two actions available:

Undo, which removes the task from the list and from the processing queue. The selected file is deleted and the data is not processed. This action is available only for tasks that have not started.

Delete, which removes the task from the user's list and deletes the data file from the server. This action is available only for completed tasks.

Note: The system automatically deletes files older than 30 days. For such tasks, the Results column shows Expired and no download link is available.
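To anticipate when results stop being downloadable, the 30-day rule can be written as a date calculation. This sketch assumes the period is counted from the task's End date and that a file becomes Expired only once more than 30 days have passed; neither detail is stated explicitly above. `expires_on` and `results_state` are hypothetical names.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # files are kept for 30 days


def expires_on(end_date: datetime) -> datetime:
    """Assumption: the retention period starts at the task's End date."""
    return end_date + RETENTION


def results_state(end_date: datetime, now: datetime) -> str:
    """Expired once more than 30 days have passed (assumed boundary)."""
    return "Expired" if now - end_date > RETENTION else "Available"


finished = datetime(2024, 3, 1, 14, 30)
print(expires_on(finished))                           # 2024-03-31 14:30:00
print(results_state(finished, datetime(2024, 4, 5)))  # Expired
```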

Data Scrubbing Results

You can view the results file by clicking the Download button on the Tasks screen. A detailed description of all columns in the output file can be found in Appendix 1.

The file is a CSV table that contains the following columns, in addition to the neutral variables assigned the REWRITE role:

out_miejscowosc

Town name

out_czesc_miejsc

name of the part of the town

out_ulica

Street name built by Algolytics from the GUS street name fields by removing repetitions and the street type (e.g. ul.)

out_ulica_cecha

Street type according to GUS: ul., al., pl., etc.

out_ulica_nazwa_1

Main part of the GUS street name (e.g. the patron's surname)

out_ulica_nazwa_2

The first, less important part of the street name, if it exists (e.g. the patron's first name)

out_kod

Zip Code

out_nr_domu

house (building) number

out_nr_miesz

apartment number

out_gmina

Name of the municipality

out_powiat

County name

out_wojewodztwo

name of the voivodeship

out_mieszkania

number of apartments in the building

out_osoby_prawne

number of legal persons that have their registered office in the building

out_adr_id

address identifier, time-invariant

If you select Information about the building, the following columns are added to enrich the input data:

out_zamieszkane

number of inhabited dwellings according to the Central Statistical Office (GUS)

out_mieszkancy

number of inhabitants according to the Central Statistical Office

out_popul_miesz

number of inhabited dwellings according to PESEL

out_popul_os

number of inhabitants according to PESEL

out_popul_kob

number of women according to PESEL

out_popul_mez

number of men according to PESEL

out_popul_

A group of columns with data on the building's residents based on the PESEL register: kob – women, mez – men, 25_29k – women aged 25-29, 30_34m – men aged 30-34, etc.

out_urzad_s

Name of the tax office for the address

If you select Diagnostic information, an out_status column is added with information about the quality of address matching and geocoding. A detailed description of the possible statuses can be found in Appendix 2.
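Because out_status packs several entries into one string, it is convenient to split it before filtering records. Appendix 1 describes the format as a sequence of `<operation_or_information: result>` or `<information>` entries, for example `<match: building><geocoding: matched building>`. The parser below is a sketch based only on that description; `parse_status` is a hypothetical name, and entry texts other than the documented example are made up.

```python
from __future__ import annotations

import re

ENTRY = re.compile(r"<([^<>]*)>")  # one <...> entry


def parse_status(status: str) -> tuple[dict[str, str], list[str]]:
    """Split out_status into {operation: result} pairs and bare information entries."""
    results: dict[str, str] = {}
    notes: list[str] = []
    for entry in ENTRY.findall(status):
        operation, colon, result = entry.partition(":")
        if colon:
            results[operation.strip()] = result.strip()
        else:
            notes.append(entry.strip())
    return results, notes


results, notes = parse_status("<match: building><geocoding: matched building>")
print(results)  # {'match': 'building', 'geocoding': 'matched building'}
```

A filter such as `results.get("match") == "building"` then selects the records matched down to the building level.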

If you select Geocoding, the following columns are added:

  • out_wsp_x – longitude

  • out_wsp_y – latitude

If you select the TERYT identifiers option, the following columns are added:

out_sym_msc

identifier of the primary town in the SIMC GUS system

out_sym_cz_msc

Identifier of the basic town part in the SIMC GUS system

out_sym_ul

Street identifier in the ULIC GUS system

out_gmina06

Municipality identifier in the TERC GUS system

out_rodz_gmi

Identifier of the municipality type in the TERC GUS system

out_rodzaj_gminy

Name of the municipality type

out_rejon13

Statistical region code from the BREC register of statistical regions and census tracts

out_obwod14

Census tract code from the BREC register of statistical regions and census tracts

A census tract is a spatial unit delimited for censuses and other statistical surveys based on the number of dwellings and inhabitants. A statistical region is a spatial unit for aggregating statistical data that consists of several (no more than nine) census tracts.
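The TERYT codes can be split into their components using the position layout given in Appendix 1: out_gmina06 holds the voivodeship (positions 1-2), county (3-4) and municipality (5-6) codes, position 7 of out_rejon13 and out_obwod14 is the municipality type, positions 8-13 are the statistical region and position 14 is the census tract. A single character for the tract is consistent with a region containing at most nine tracts. This is a sketch; `decode_obwod14` and the example code are made up for illustration.

```python
from __future__ import annotations

MUNICIPALITY_TYPES = {
    "1": "urban municipality",
    "2": "rural municipality",
    "4": "urban part of an urban-rural municipality",
    "5": "rural part of an urban-rural municipality",
}


def decode_obwod14(code: str) -> dict[str, str]:
    """Split an out_obwod14 value using the positions listed in Appendix 1."""
    if len(code) != 14:
        raise ValueError(f"expected 14 characters, got {code!r}")
    return {
        "voivodeship": code[0:2],            # positions 1-2
        "county": code[2:4],                 # positions 3-4
        "municipality": code[4:6],           # positions 5-6
        "gmina06": code[0:6],                # same value as out_gmina06
        "municipality_type": MUNICIPALITY_TYPES.get(code[6], "unknown"),  # position 7
        "rejon13": code[0:13],               # same value as out_rejon13
        "statistical_region": code[7:13],    # positions 8-13
        "census_tract": code[13],            # position 14
    }


print(decode_obwod14("14650110001234"))  # made-up example code
```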

If you select Financial risk, the following columns are added:

out_geoscore_prv_pd_level_I

Probability of default (PD) determined from the location of the building, for individuals

out_geoscore_prv_pd_level_II

Probability of default (PD) determined from the location of the building and its features, for individuals

out_geoscore_bus_pd_level_I

Probability of default (PD) determined from the location of the building, for businesses with a PESEL number, e.g. sole proprietorships (JDG) and civil partnerships

out_geoscore_bus_pd_level_II

Probability of default (PD) determined from the location of the building and its features, for businesses with a PESEL number, e.g. sole proprietorships (JDG) and civil partnerships

out_geoscore_bus2_pd_level_I

Probability of default (PD) determined from the location of the building, for companies without a PESEL number, e.g. limited liability companies

out_geoscore_bus2_pd_level_II

Probability of default (PD) determined from the location of the building and its features, for companies without a PESEL number, e.g. limited liability companies

out_geoscore_prv_pf_level_I

Probability of fraud (PF) determined from the location of the building, for individuals

out_geoscore_prv_pf_level_II

Probability of fraud (PF) determined from the location of the building and its features, for individuals

out_geoscore_bus_pf_level_I

Probability of fraud (PF) determined from the location of the building, for businesses with a PESEL number, e.g. sole proprietorships (JDG) and civil partnerships

out_geoscore_bus_pf_level_II

Probability of fraud (PF) determined from the location of the building and its features, for businesses with a PESEL number, e.g. sole proprietorships (JDG) and civil partnerships

out_geoscore_bus2_pf_level_I

Probability of fraud (PF) determined from the location of the building, for companies without a PESEL number, e.g. limited liability companies

out_geoscore_bus2_pf_level_II

Probability of fraud (PF) determined from the location of the building and its features, for companies without a PESEL number, e.g. limited liability companies

out_avg_income

Index of the average estimated income of people living in the micro-market

out_q5_income

5th percentile of the estimated income of people living in the micro-market

out_q25_income

25th percentile of the estimated income of people living in the micro-market

out_q50_income

Median (50th percentile) of the estimated income of people living in the micro-market

out_q75_income

75th percentile of the estimated income of people living in the micro-market

out_q95_income

95th percentile of the estimated income of people living in the micro-market

If you select Building surroundings, the following columns are added:

out_buildings_a_

Number of buildings of each type within 1 km of the building; examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.

out_buildings_a_area_

Area of buildings of each type within 1 km of the building; examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.

out_landuse_a_

Number of land cover areas of each type within 1 km; examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.

out_landuse_a_area_

Area of each type of land cover within 1 km; examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.

out_natural_a_

Number of objects of natural origin of each type

out_natural_a_area_

Area of objects of natural origin of each type

out_pois_

Number of POIs (points of interest) within 1 km of the building; examples of POIs: shop, park, school, kindergarten, café, bakery, cinema, theatre, pharmacy, petrol station, supermarket, hairdresser, etc.

out_railways_

Number of rail objects of each type within 1 km; examples of types: railway, tram, metro

out_railways_length_

Length of rail objects of each type within 1 km; examples of types: railway, tram, metro

out_roads_

Number of roads of each type within 1 km of the building; examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.

out_roads_length_

Length of roads of each type within 1 km of the building; examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.

out_traffic_, out_transport_, out_traffic_a_, out_transport_a_

Number of transport-related facilities within 1 km of the building; examples of facility types: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.

out_traffic_a_area_, out_transport_a_area_

Area of transport-related facilities within 1 km of the building; examples of facility types: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.

out_water_a_area_

Area of surface water of each type within 1 km of the building; examples of types: lake, sea, river, swamp, etc.

out_waterways_length_

Length of surface water objects of each type within 1 km of the building; examples of types: lake, sea, river, swamp, etc.
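The 1 km surroundings can be approximated from the geocoding columns. The sketch assumes that out_wsp_x and out_wsp_y are longitude and latitude in decimal degrees (WGS84), measures great-circle distance with the haversine formula and counts point objects only; the service's own buffer analysis, which also reports areas and lengths, is not documented here. `count_within` and the object list are hypothetical.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle (haversine) distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(building: tuple[float, float],
                 objects: list[tuple[str, float, float]],
                 radius_km: float = 1.0) -> dict[str, int]:
    """building: (out_wsp_x, out_wsp_y); objects: (category, lon, lat)."""
    counts: dict[str, int] = {}
    for category, lon, lat in objects:
        if distance_km(building[0], building[1], lon, lat) <= radius_km:
            counts[category] = counts.get(category, 0) + 1
    return counts


objects = [("bus_stop", 21.0, 52.005), ("school", 21.0, 52.02), ("bus_stop", 21.001, 52.0)]
print(count_within((21.0, 52.0), objects))  # the school is about 2.2 km away
```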

Job report

For each completed task, you can view a report on the quality of standardization and geocoding by clicking the View button on the Tasks screen.

Fig. 6 Task report with a description of the results of standardization

The report includes the following information:

Results

Status

Status of the file cleansing

Error description

Short description of the error

Input Records

Number of records in the input file

Processed records

Number of records processed

Records to be billed

Number of records that will be billed

Records Skipped

Number of records skipped

City level

Number of records standardized to the city level

Street level

Number of records standardized to the street level

Building level

Number of records standardized to the building level

Apartment level

Number of records standardized to the apartment level

Extracted data

Correctly extracted first names

Number of company name records for which the business owner's first name has been extracted

Correctly extracted surnames

Number of company name records for which the business owner's surname has been extracted

Correctly extracted company names

Number of company name records for which the company's legal form has been identified

Unextracted first names

Number of records for which the first name could not be extracted

Unextracted surnames

Number of records for which the surname could not be extracted

Retrieving dictionary data

You can download data from the building address database by selecting Data dictionaries from the main menu.

There are two ways to specify the scope of data to be downloaded:

  1. Download for an entire voivodeship, county or municipality

    • Select a voivodeship or the entire country

    • After selecting a voivodeship, select a county or the entire voivodeship

    • After selecting a county, select a municipality or the entire county

  2. Download by postal code

    • Enter the full postal code; both XX-XXX and XXXXX formats are accepted

    • Enter a code pattern XX* (e.g. 03*) to download data for all postal codes starting with those two digits
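The two ways of specifying postal codes can be mirrored when filtering an already downloaded dictionary file. A minimal sketch, assuming the XX* pattern always uses exactly two digits, as in the description above; `normalize_code` and `in_scope` are hypothetical helpers, not part of the application.

```python
import re

FULL_CODE = re.compile(r"(\d{2})-?(\d{3})")  # XX-XXX or XXXXX
PREFIX = re.compile(r"(\d{2})\*")            # XX*, e.g. 03*


def normalize_code(code: str) -> str:
    """Return a postal code in the XX-XXX form."""
    m = FULL_CODE.fullmatch(code.strip())
    if m is None:
        raise ValueError(f"not a postal code: {code!r}")
    return f"{m.group(1)}-{m.group(2)}"


def in_scope(query: str, postal_code: str) -> bool:
    """True if postal_code falls within a query given as XX-XXX, XXXXX or XX*."""
    query = query.strip()
    prefix = PREFIX.fullmatch(query)
    if prefix:
        return normalize_code(postal_code).startswith(prefix.group(1))
    return normalize_code(query) == normalize_code(postal_code)


print(in_scope("03*", "03-982"), in_scope("00-950", "00950"))  # True True
```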

After specifying the area, you can calculate the fee for downloading dictionary data for the selected area with the Calculate fee button.

Fig. 7 Dictionary data to download

To download dictionary data, click the Download Data button. The app switches to the Tasks screen, where the task list shows an entry for the dictionary data along with its status. When the data is available, you can download it by clicking the Download button.

Fig. 8 Dictionary data ready for download on the task list

Note: After you click Download Data, and before the data extraction starts, the system calculates the fee. If the fee exceeds the funds in your account, the following message is displayed: The fee for dictionary data is greater than the amount of funds in the account. Top up your account. Click the Top up button to top up your account and download the dictionary data you selected.

The resulting table is a CSV file that contains the following columns:

id

Building ID

sym_msc

GUS TERYT symbol of the town

Location

Town name

sym_ul

GUS TERYT symbol of the street

feature

Street type according to GUS: ul., al., pl., etc.

nazwa_1

Main part of the GUS street name (e.g. the patron's surname)

nazwa_2

The first, less important part of the street name, if it exists (e.g. the patron's first name)

nr_calk

Numerical part of the building number

street

Street name built by Algolytics from the GUS street name fields by removing repetitions and the street type (e.g. ul.)

nr_domu

Building number

x

longitude

y

latitude

wojewodztwo

name of the voivodeship

county

County name

municipality

Name of the municipality

Municipality06

TERYT identifier of the municipality

Area13

statistical region identifier

Circuit14

census tract ID

Apartment

number of apartments in the building

Inhabited

number of inhabited dwellings according to the Central Statistical Office (GUS)

Residents

number of inhabitants according to the Central Statistical Office

Status

Geocoding status: exact, adjacent building or census tract centre

code

Zip Code

Detailed descriptions of the columns can be found in Appendix 1.

My account

The My Account screen displays information about billing, API access, user data, and invoicing data.

Fig. 9 My Account screen view

The screen is divided into four areas: Billing, JSON API Access, Customer data and Invoicing data.

The Billing area contains the following information:

Current credit / Available funds

For users settling with the credit method, the sum of fees charged on credit; for users settling with the PrePaid method, the currently available funds, topped up via the online payment system

Fees charged

The sum of all fees charged so far

Receivables – last month

The sum of unpaid fees charged for the previous billing period (only for users settling with the credit method)

Processed records

The sum of all records in all files loaded by the user

Including cleared records

The sum of all successfully cleansed records that generated fees

Last top-up

The date of the last top-up for users settling with the PrePaid method; empty for users settling with the credit method

Last task

The date of the user's last task performed by the system

Payment of the amount due / Top-up of the account with funds

For the credit method, the amount due and the Pay button; for the PrePaid method, a field for entering the top-up amount and the Top-up account button. Both buttons redirect you to the PayU system, where you make an online payment

You can also use dataquality.pl through its API. The JSON API Access area includes the following information:

  • API access key – a 40-character key used to authenticate and identify the user when performing tasks through the API

  • Generate new key – generates a new API key; the previous key is deactivated and no longer available in the system

  • API documentation – a link to the dataquality.pl API documentation

The Customer data area contains the following information:

  • Gender – Ms/Mr

  • Name

  • Surname

  • Company

  • Phone number

  • Email

The Invoicing area contains the following information:

  • Name and surname/Name of the entity

  • TIN

  • Street

  • House number

  • Apartment number

  • Zip code

  • City

All data in the two areas above (except the email address) can be edited and saved by clicking the Save data button.

Changing your password

You can change your password by clicking the email address button in the top menu and then selecting Change Password.

Fig. 10 Change Password Screen View

To change your password, enter your current password, enter the new password twice, and then click Change Password.

If the password has not been changed for 30 days, the system forces a password change: after logging in, you are taken to the Change Password screen, where you must enter and confirm a new password. The new password cannot be the same as any of the previous three passwords. Only after you change the password and log in again does the system redirect you to the main screen as usual.

Appendix 1. Standard columns in the output file

Below are the column names and their definitions. The parentheses give each column's type and, for text columns, the suggested maximum number of characters (set with a margin where it may be useful). The list covers the columns of the Address Standardization output file that are included by default or added by selecting an enrichment option available to all Algolytics DataQuality users.

  • out_id (integer) – the number of the record in the set;

  • out_sym_msc_pod (text, 7) – GUS TERYT symbol of the basic locality;

  • out_miejscowosc (text, 75) – the name of the (basic) locality;

  • out_sym_cz_msc (text, 7) – GUS TERYT symbol of a part of the locality; if no part of the locality distinct from the basic locality has been identified, this field contains an empty string;

  • out_czesc_miejsc (text, 75) – the name of the part of the locality identified by GUS, subject to the same condition as above;

  • out_sym_ul (text, 5) – GUS TERYT symbol of the street name;

  • out_ulica (text, 150) – street name built by Algolytics from the GUS street name fields by removing repetitions and the street type (e.g. ul.);

  • out_ulica_cecha (text, 15) – street type according to GUS: ul., al., pl., etc.;

  • out_ulica_nazwa_1 (text, 150) – the main part of the GUS street name (e.g. the patron's surname);

  • out_ulica_nazwa_2 (text, 100) – the first, less important part of the street name, if it exists (e.g. the patron's first name);

  • out_kod (text, 6) – postal code in the format: two digits, dash, three digits;

  • out_nr_domu (text, 10) – house (building) number;

  • out_nr_miesz (text, 15) – apartment number;

  • out_adr_id (text, 35) – an identifier of an address, invariant over time;

  • out_status (text, 500) – a sequence of entries in the form <operation_or_information: result> or <information>, for example: '<match: building><geocoding: matched building>'; see Appendix 2 for more about address standardization statuses;

  • out_gmina (text, 50) – name of the municipality;

  • out_powiat (text, 50) – the name of the county;

  • out_wojewodztwo (text, 50) – name of the voivodeship;

  • out_gmina06 (text, 6) – a six-character municipality code consisting of the voivodeship code (positions 1-2), the county code within the voivodeship (positions 3-4) and the municipality code within the county (positions 5-6);

  • out_rodz_gmi (text, 1) and out_rodzaj_gminy (text, 50) – numerical and verbal designation of the municipality type, respectively: 1 – urban municipality, 2 – rural municipality, 4 – urban part of an urban-rural municipality, 5 – rural part of an urban-rural municipality;

  • out_rejon13 (text, 13) – the municipality code together with its type (positions 1-7) and the statistical region (positions 8-13), according to GUS TERYT;

  • out_obwod14 (text, 14) – the municipality code together with its type (positions 1-7), the statistical region (positions 8-13) and the census tract within this region (position 14), according to GUS TERYT;

  • out_wsp_x (floating point number) – east longitude;

  • out_wsp_y (floating point number) – north latitude;

  • out_mieszkania (integer), out_zamieszkane (integer), out_mieszkancy (integer) – respectively, the number of dwellings, the number of inhabited dwellings and the number of inhabitants of the building according to GUS; these and other building-level fields are filled in only if the input address has been successfully matched to the building level;

  • out_osoby_prawne (integer) – the number of legal persons and organizational units entered in the REGON register, which have their registered office in a given building;

  • out_popul_ (integers) – a group of columns with data on the population of buildings based on the PESEL register made available by the Ministry of Digital Affairs: miesz – inhabited dwellings, os – people, kob – women, mez – men, 25_29k – women aged 25-29, 30_34m – men aged 30-34, etc.

  • out_buildings_a_ (integer) – a group of columns with the number of different types of buildings at a distance of 1 km from the building/address, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.

  • out_buildings_a_area_ (floating point number) – a group of columns with the area of buildings of various types at a distance of 1 km from the building/address, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.

  • out_landuse_a_ (integer) – a group of columns with the number of different types of land cover at a distance of 1 km from the building/address, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.

  • out_landuse_a_area_ (floating point number) – a group of columns with the area of various types of land cover at a distance of 1 km from the building/address, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.

  • out_natural_a_ (integer) – a group of columns with the number of different types of objects of natural origin, e.g. beach, cliff, etc.

  • out_natural_a_area_ (floating point number) – a group of columns with the area of objects of natural origin of different types, e.g. beach, cliff, etc.

  • out_pois_, out_pois_a_ (integer) – groups of columns with the number of POIs (points of interest) within 1 km of the building/address; examples of POIs: shop, park, school, kindergarten, café, bakery, cinema, theatre, pharmacy, gas station, supermarket, hairdresser, etc.

  • out_railways_ (integer) – a group of columns with the number of rail features within 1 km of the building/address; examples of types: railway, tram, metro.

  • out_railways_length_ (floating point number) – a group of columns with the length of rail lines within 1 km of the building/address; examples of types: railway, tram, metro.

  • out_roads_ (integer) – a group of columns with the number of roads of different types within 1 km of the building/address; examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.

  • out_roads_length_ (floating point number) – a group of columns with the length of roads of different types within 1 km of the building/address; examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.

  • out_traffic_, out_transport_, out_traffic_a_, out_transport_a_ (integer) – groups of columns with the number of transport-related objects within 1 km of the building/address; examples of object types: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.

  • out_traffic_a_area_, out_transport_a_area_ (floating point number) – groups of columns with the area of transport-related objects within 1 km of the building/address; examples of object types: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.

  • out_water_a_area_ (floating point number) – a group of columns with the area of surface water bodies of different types within 1 km of the building/address; examples of object types: lake, sea, river, swamp, etc.

  • out_waterways_length_ (floating point number) – a group of columns with the length of surface waters of different types within 1 km of the building/address; examples of object types: lake, sea, river, swamp, etc.

  • out_geoscore_ (floating point number) – a group of columns related to financial risk: default level and fraud level for individuals, sole proprietorships (JDG) and companies.

  • out_average_income (floating point number) – an index of the estimated income of people living in the micro-market.
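Because several column groups share a prefix (for example, out_buildings_a_ is also the beginning of out_buildings_a_area_, and out_traffic_ of out_traffic_a_area_), selecting a group by prefix alone can mix counts with areas. The Python sketch below is a minimal illustration, not part of the DataQuality service: it assumes the header row of the output file has already been read into a list of column names, the concrete column names in the example (such as out_buildings_a_school) are hypothetical, and it assumes the age/sex columns of the out_popul_ group are named out_popul_ followed by a suffix such as 25_29k.

```python
import re

# Sketch only: the example column names below are hypothetical; real
# names come from the header row of your DataQuality output file.


def columns_in_group(header, prefix, exclude_prefixes=()):
    """Return the columns starting with `prefix` but with none of
    `exclude_prefixes`. Needed because e.g. out_buildings_a_ is also a
    prefix of out_buildings_a_area_."""
    return [
        col
        for col in header
        if col.startswith(prefix)
        and not any(col.startswith(ex) for ex in exclude_prefixes)
    ]


# Assumed form of the age/sex suffixes in the out_popul_ group,
# e.g. "25_29k" (women aged 25-29) or "30_34m" (men aged 30-34).
_AGE_SEX = re.compile(r"^(\d+)_(\d+)([km])$")


def decode_popul_column(column):
    """Decode an out_popul_ age/sex column into (min_age, max_age, sex).
    Returns None for other columns (e.g. out_popul_os)."""
    prefix = "out_popul_"
    if not column.startswith(prefix):
        return None
    match = _AGE_SEX.match(column[len(prefix):])
    if match is None:
        return None
    low, high, sex = match.groups()
    return int(low), int(high), "women" if sex == "k" else "men"


header = [
    "id",
    "out_buildings_a_school",       # hypothetical count column
    "out_buildings_a_area_school",  # hypothetical area column
    "out_popul_os",
    "out_popul_25_29k",
]
counts = columns_in_group(header, "out_buildings_a_", ("out_buildings_a_area_",))
areas = columns_in_group(header, "out_buildings_a_area_")
```

With this header, counts contains only out_buildings_a_school and areas only out_buildings_a_area_school; decode_popul_column("out_popul_25_29k") returns (25, 29, "women").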

Appendix 2. Statuses of the results of the standardisation of address data and geocoding

The status field of the result table contains a sequence of combined entries of the form <category: result> or <category>. For the standardization of address data, the categories (first level of the list) and results (second level of the list) are as follows:

  • ambiguous assignment – the input matches several candidates to a very similar extent; possible results:

    • one of the similarly matched streets in one town was selected – the data of one of the streets is assigned;

    • one of the similarly matched localities in one municipality has been selected – data from one of the localities is assigned;

    • similarly matched localities in different municipalities – the minimum accuracy of the assignment has not been achieved, no data is assigned to a given record.

  • change of place name – the name of the locality has changed since the record was entered into the processed set; the output contains the new name of the locality.

  • street name change – the street name has changed since the record was entered into the processed set; the output contains the new street name.

  • Matching – possible outcomes:

    • apartment – an apartment with a given address has been identified;

    • building with no dwellings – the building with the given address has been identified; it contains no dwellings, so the highest possible match level has been achieved;

    • building in which there are dwellings – the building with the given address has been identified; it contains dwellings, but the apartment number has not been recognized;

    • building with the same integer number – no building with the given address has been identified, but in the given town and street there is a building with the same integer number; e.g. the processed set contains the number 18b, which is not in the dictionary, but there is a building numbered 18. Note: the service returns the building number provided in the query. The table below shows examples of DQ responses for this status:

| Existing buildings in the dictionary | Request to DQ | DQ response |
|---|---|---|
| 1, 1A, 1B | 1C | 1C |
| 1A, 1B | 1C | 1C |
| 1A, 1B, 1C, 1D, 1E, 1F, 1G | 1Z | 1Z |
| 2B, 2D, 2E, 2F/1, 2K | 2 | 2 |

    • neighbouring building – no building with the given address has been identified, but in the given town and street there is a building whose integer number differs by no more than 4 from the integer part of the building number in the processed data;

    • street – the street has been identified, but no building with the given number or a neighbouring number was found;

    • a town that has no streets – the town has been identified and has no streets; no building with the given number or a neighbouring number was found;

    • locality – the locality has been identified, no street or building assigned directly to the locality with a given number or neighbouring number has been found (in some villages there are mixed addresses – some addresses are assigned to the level of the locality, and others to the street);

    • none – no match was obtained, the locality was not identified;
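The integer-part rules behind the building with the same integer number and neighbouring building results can be illustrated with a short Python sketch. It is a simplified model, not the DQ matching algorithm: it compares a requested number only with the numbers known on the same street, the labels exact building and no building are hypothetical, and it ignores dwellings and the probability variants. It does reproduce the four example rows of the table above, where the number from the query is returned unchanged.

```python
import re

_LEADING_INT = re.compile(r"^\s*(\d+)")


def integer_part(number):
    """Integer part of a building number: '18b' -> 18, '2F/1' -> 2."""
    match = _LEADING_INT.match(number)
    return int(match.group(1)) if match else None


def classify_building(requested, dictionary_numbers, max_distance=4):
    """Illustrative (hypothetical) classification of a requested building
    number against the numbers known on the same street.
    Returns (status, number reported back or None)."""
    known = {n.strip().upper() for n in dictionary_numbers}
    if requested.strip().upper() in known:
        return "exact building", requested
    wanted = integer_part(requested)
    if wanted is None:
        return "no building", None
    parts = {p for p in map(integer_part, dictionary_numbers) if p is not None}
    if wanted in parts:
        # As documented: the building number from the query is returned.
        return "same integer number", requested
    # Neighbouring building: integer parts differ by no more than 4.
    if any(abs(p - wanted) <= max_distance for p in parts):
        return "neighbouring building", None
    return "no building", None
```

For example, classify_building("2", ["2B", "2D", "2E", "2F/1", "2K"]) returns ("same integer number", "2"), matching the last row of the table, while a request for 18 on a street with buildings 14 and 25 is classified as a neighbouring building.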

In addition, there are variants of the results above for cases in which the input matches a given candidate (town or street) more closely than in the none case but less closely than for the other results; these variants indicate that the input matches the candidate with high probability, but with a moderate risk of error:

  • an apartment on a probable street;

  • an apartment in a probable locality;

  • a building with no dwellings, on a probable street;

  • a building with no dwellings, in a probable locality;

  • a building in which there are dwellings, on a probable street;

  • a building in which there are dwellings, in a probable locality;

  • probable street;

  • probable locality that has no streets;

  • probable locality.

  • Geocoding – possible results:

    • matched building – the coordinates of the building defined by the match status (building with a given address, neighboring building, etc.) have been assigned;

    • neighbouring building – the coordinates of a building in the given town and street (if the street appears in the address) with a different number, whose integer part differs from the integer part of the given building number by no more than 4;

    • center of the street within the postal code (<category of the number of buildings in the group>) – for a street with more than one postal code, when the postal code is known, the coordinates of the center of the group of buildings on that street with that postal code are assigned; the category of the number of buildings in this group is given in brackets:

      • less than 20 buildings;

      • 20-49 buildings;

      • 50 and more buildings;

    • the centre of a street in a part of a locality (<the category of the number of buildings in a group>) – in the case of localities that have separate parts, and a given street is located in more than one of them and it was possible to determine which part it is, the coordinates of the centre of the group of buildings located on a given street and in a specific part of the locality were assigned (e.g. the centre of Puławska Street in a part of Mokotów); in brackets is the category of the number of buildings constituting such a defined group, as described above;

    • center of the street (<category of the number of buildings in the group>) – the coordinates of the center of the group of buildings located on a given street are assigned; the category of the number of buildings constituting such a defined group, as described above, is given in brackets;

    • center of the district – the coordinates of the center of the Central Statistical Office census tract have been assigned; the territory of Poland is divided into over 180 thousand such tracts based on the distribution of population, and a typical tract has about 200 inhabitants;

    • center of the region – the coordinates of the center of the Central Statistical Office statistical region have been assigned; the territory of Poland is divided into more than 30 thousand such regions based on the distribution of population;

    • town center – the coordinates of the center of the locality have been assigned.
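When post-processing the result file, the status field can be split into its <category: result> and <category> entries. The Python sketch below is a hedged illustration: it assumes that entries are separated by a semicolon (the separator is not documented in this section, so check it against a real output file) and that the building-count category of a geocoding result appears in trailing brackets, as in the descriptions above. The sample status string in the usage note is invented.

```python
import re

# Assumption: separator between status entries; verify on real output.
DEFAULT_SEPARATOR = ";"
_TRAILING_BRACKETS = re.compile(r"\(([^()]*)\)\s*$")


def parse_status(status, separator=DEFAULT_SEPARATOR):
    """Split a status value into (category, result) pairs; result is
    None for bare <category> entries."""
    entries = []
    for raw in status.split(separator):
        entry = raw.strip()
        if not entry:
            continue
        category, colon, result = entry.partition(":")
        entries.append((category.strip(), result.strip() if colon else None))
    return entries


def bracket_category(result):
    """Return the text in trailing brackets, e.g. the building-count
    category of a 'center of the street (...)' geocoding result."""
    if result is None:
        return None
    match = _TRAILING_BRACKETS.search(result)
    return match.group(1).strip() if match else None


def building_count_category(count):
    """Map a group size to the documented building-count categories."""
    if count < 20:
        return "less than 20 buildings"
    if count < 50:
        return "20-49 buildings"
    return "50 and more buildings"
```

For the invented value "Matching: street; Geocoding: center of the street (20-49 buildings); Street name change", parse_status returns three entries, the last with result None, and bracket_category applied to the geocoding result gives "20-49 buildings".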


Geographic coordinates are calculated in the WGS 84 reference system.

More information about the TERC, BREC, SIMC and ULIC systems can be found on the website of the Central Statistical Office:

https://pl.wikipedia.org/wiki/System_odniesienia_WGS_84
http://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/ogolna_charakterystyka_systemow_rejestru/ogolna_charakterystyka_systemow_rejestru.aspx?contrast=default