DataQuality [web app]
What is Algolytics Data Quality?
Algolytics Data Quality is a tool for standardizing and enriching customer data. Running in batch mode, it is used to detect, monitor, and fix data quality problems. The DataQuality.pl application provides a web browser interface for loading data, defining a standardization task, and downloading the results.
Algolytics Data Quality Functionalities
data profiling
data cleansing (including parsing, standardization, deduplication)
conducting statistical analyses
data enrichment – matching data from different databases, adding new information about data
geocoding and data visualization
dictionary data retrieval
Work with Algolytics Data Quality
Using the app comes down to two essential steps: adding a file to clean and downloading the result. The detailed steps are listed below.
Load the file with the database to be cleaned.
Check the information about the uploaded file.
Select the roles for the columns from the file.
Add a task.
Once the task is complete, download the data and generate the report.
Main Menu View
Fig. 1 Main application window
Description of the individual items of the main menu:
Start
Home page
Tasks
if you want to view the operations carried out so far or continue the work you have started
New task
if you want to perform an operation on new data
Data dictionaries
If you want to download dictionary data
My account
if you want to edit your account details
Documentation
If you want to read a detailed description of the individual functions
Application
User Mail
after clicking on it, you can go to the My Account, Change Password or Log Out screen
Version
New task
When you select New Task from the main menu of the application, a wizard for defining a new task opens.
Task and file name selection
The first thing you need to do is to specify the name of the task and load the data file. To do this, click the Feed button and select the file you want. Once selected, click the Load button. You can change the name of the task in the Task name field.
The following data formats are supported: CSV and XLSX.
Fig. 2 The first step of defining a new task
File Information
Once the file is loaded, an area will appear with information about the loaded file.
Fig. 3 Information about the loaded file when defining a new task
It presents the following information:
Name
filename
Size
file size, maximum possible size is 2 GB
Number of records
Number of records in the loaded file
Status
The state of the file after the initial verification. Possible values:
OK - nothing prevents the task from continuing
Note - indicates a potential lack of funds on the account; a message is displayed below the file information table: Processed file may be blocked if the fee exceeds the funds on the account
Error - prevents task creation from continuing. A simplified error message is displayed below the file information table
Actions
a button to delete the file; when you delete a file, the file information area is hidden
Separator detected
detected data separator in the loaded file
Choose a different separator
field in which the user can point to another separator, which will reload the preview of the first lines of the loaded file
Text Qualifier
character that encloses text in which the separator character is not treated as a separator
File encoding
information about the detected encoding of the uploaded file
This area also displays a preview of the first 5 lines of the uploaded file. If you change the separator or the text qualifier, the preview is refreshed.
After verifying the correctness of the uploaded file, click Next.
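Before uploading, the same checks can be approximated locally. The following is a minimal sketch, not part of the application: it assumes the file is UTF-8 encoded, reads "2 GB" as 2 GiB, and uses a hypothetical set of candidate separators. The function name precheck is illustrative.

```python
import csv
from pathlib import Path

MAX_SIZE_BYTES = 2 * 1024**3            # assumption: "2 GB" read as 2 GiB
SUPPORTED_EXTENSIONS = {".csv", ".xlsx"}


def precheck(path, preview_lines=5, candidates=";,\t|"):
    """Return (separator, preview_rows) for a CSV file, or raise ValueError."""
    path = Path(path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {path.suffix}")
    if path.stat().st_size > MAX_SIZE_BYTES:
        raise ValueError("file is larger than 2 GB")
    if path.suffix.lower() != ".csv":
        return None, []                 # XLSX is binary; no text preview here

    # Read only the first few lines, like the preview on the screen.
    with path.open(newline="", encoding="utf-8") as handle:
        lines = [line for _, line in zip(range(preview_lines), handle)]

    # Guess the separator from the sample (hypothetical candidate list).
    dialect = csv.Sniffer().sniff("".join(lines), delimiters=candidates)
    preview = list(csv.reader(lines, delimiter=dialect.delimiter, quotechar='"'))
    return dialect.delimiter, preview
```

If the guessed separator differs from the one shown in the Separator detected field, the value chosen in the application takes precedence.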
Defining variables
In the next step, you need to indicate the roles for the data contained in the file. The screen will display a table where you need to set a role for each column found in the file.
Fig. 4 Defining the task of standardization and data enrichment
The table on this screen contains the following information:
Field in file – column name
Column Type
Role for column
You can choose from the following types and roles for columns:
Address variable
KOD_POCZTOWY, CITY, ULICA_NUMER_DOMU_I_MIESZKANIA, STREET, NUMER_DOMU, NUMER_MIESZKANIA, NUMER_DOMU_I_MIESZKANIA, VOIVODESHIP, COUNTY, MUNICIPALITY
Name variable
NAME, SURNAME, NAZWA_PODMIOTU, IMIE_I_NAZWISKO
Contact variable
EMAIL1, EMAIL2, TELEFON1, TELEFON2
Variable of people/companies
PESEL, NIP, REGON
Date variable
DATA_URODZENIA, CZAS_AKTUALIZACJI
Unspecified variable
DANE_OGOLNE - the variable will be analyzed for all possible types of information
Identifier
ID_REKORDU
Neutral variable
REWRITE – copies the variable to the output file
SKIP – does not copy the variable to the output file
For the parameters to validate correctly, each variable must be assigned a type and a role. Variable roles cannot be repeated.
When cleaning address data, the CITY or DANE_OGOLNE role is required.
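The role rules can be checked before a task is added. This is a minimal sketch under stated assumptions: the address-data rule is read as "CITY or DANE_OGOLNE must be assigned", and the neutral roles REWRITE and SKIP are assumed to be exempt from the no-repetition rule. The function name check_roles is hypothetical.

```python
from collections import Counter

# Assumption: neutral roles may be assigned to any number of columns.
REPEATABLE_ROLES = {"REWRITE", "SKIP"}
ADDRESS_ROLES = {
    "KOD_POCZTOWY", "CITY", "ULICA_NUMER_DOMU_I_MIESZKANIA", "STREET",
    "NUMER_DOMU", "NUMER_MIESZKANIA", "NUMER_DOMU_I_MIESZKANIA",
    "VOIVODESHIP", "COUNTY", "MUNICIPALITY",
}


def check_roles(column_roles):
    """column_roles maps column name -> role; returns a list of problems."""
    problems = []
    counts = Counter(column_roles.values())
    for role, count in sorted(counts.items()):
        if count > 1 and role not in REPEATABLE_ROLES:
            problems.append(f"role {role} is assigned to {count} columns")
    roles = set(counts)
    # Address roles are present, but neither CITY nor DANE_OGOLNE is.
    if roles & ADDRESS_ROLES and not roles & {"CITY", "DANE_OGOLNE"}:
        problems.append("address cleaning needs CITY or DANE_OGOLNE")
    return problems
```

An empty list means that the assignment passes both checks; the application performs its own validation regardless.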
Below the table with the selection of types and roles for columns, it is possible to select additional data to include in the output file.
Possible options are:
Information about the building - number of units, type, population and demographic structure
Building surroundings – the number of POIs (points of interest) or other objects, e.g. roads, built-up areas, etc. in a buffer of 1 km from the geocoded address data
TERYT IDs
Financial risk (requires the TERYT identifiers option) – default risk, fraud risk and the average income of the inhabitants
Diagnostic information – address match and geocoding status
Geocoding – geographical coordinates of the building
Deduplication
Incremental deduplication
Deduplication requires ID_REKORDU and CZAS_AKTUALIZACJI among the input variables. Incremental deduplication accumulates customer data and compares each new batch of data with the data from previous calls.
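The dependencies between the additional options can be expressed as a short check. This is a sketch only: the option names are taken from the list above, the function name check_options is hypothetical, and it is an assumption that Incremental deduplication needs the same two roles as Deduplication.

```python
def check_options(options, roles):
    """options: set of selected option names; roles: set of assigned roles."""
    problems = []
    if "Financial risk" in options and "TERYT IDs" not in options:
        problems.append("Financial risk requires the TERYT IDs option")
    # Assumption: both deduplication variants need the same input roles.
    if set(options) & {"Deduplication", "Incremental deduplication"}:
        missing = {"ID_REKORDU", "CZAS_AKTUALIZACJI"} - set(roles)
        if missing:
            problems.append(
                "deduplication requires roles: " + ", ".join(sorted(missing))
            )
    return problems
```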
At the very bottom of the page, you will find information about the maximum price of the task and 2 buttons: Previous step, which you can use to return to the file information screen, and Add task, which you can use to add a task to be performed.
Clicking Add Task adds a task to be performed and takes you to the task screen. To download the results file, click the Download button. To view the task report, click View.
The first 5000 rows processed by the User are free. Once this limit is used up, the system requires you to fill in the invoicing data on the My Account screen and top up your account.
Tasks
On the Tasks screen, you have access to all tasks that have been completed, are in progress, or are waiting to be performed. The table also includes dictionary datasets, if they were retrieved by the user. The table can be sorted using headers; by default, it is sorted by the End Date column.
You can filter the table by typing or selecting the appropriate values from the fields below the table headers. You can remove filters with the Clear filters button above the table. There is also an option to select the number of tasks displayed in the table.
Below the table, there is a New task button, which takes you to the screen of creating a new task.
Fig. 5 Tasks screen view
Description of the Tasks table columns:
Task name
name defined by the user when creating a new task; when dictionary data is downloaded, the name is based on the data range selected by the user
Task status
Available task statuses: New, In Queue, In Progress, Completed
End date
the date and time when the system finished executing the task; for tasks that are still being processed, the current status is displayed instead: New, In Queue, In Progress
Number of records
Number of records in the file
Fee
Fee for the performance of the task
Results
A link to the file processed by the system or to the corresponding dictionary data. Once the file's storage period on the system (30 days) has passed, the Results column shows the word Expired. If the file is blocked due to lack of funds, either the text Blocked is displayed (for the credit settlement method) or the Pay button, which redirects to the My Account screen (for the PrePaid settlement method).
Report
for completed tasks, the field contains a link to the Task report page, which provides statistics on the quality of standardization
Actions
There are two actions available:
Undo, which removes the task from the list and from the processing queue. The selected file will be deleted and the data will not be processed. This action is available only for tasks that have not started.
Delete, which will remove the task from the user's list and delete the data file from the server. This action is only available for completed tasks.
Note: The system automatically deletes files that are older than 30 days. Such a task will receive the status Expired in the Results column (no link to download the results is available).
Data Scrubbing Results
You can view the results file by clicking the Download button on the Tasks screen. A detailed description of all columns in the output file can be found in Appendix 1.
The file is a data table (.csv) that contains the following columns (in addition to the neutral variables assigned the REWRITE role):
out_miejscowosc
Town name
out_czesc_miejsc
name of the part of the town
out_ulica
street name built by Algolytics from the GUS (Central Statistical Office) street name fields by removing repetitions and the feature "ul."
out_ulica_cecha
street name feature according to GUS: ul., al., pl., etc.
out_ulica_nazwa_1
main part of the GUS street name (e.g. the patron's surname)
out_ulica_nazwa_2
first, less important part of the street name, if it exists (e.g. the patron's first name)
out_kod
Zip Code
out_nr_domu
house (building) number
out_nr_miesz
apartment number
out_gmina
Name of the municipality
out_powiat
County name
out_wojewodztwo
name of the voivodeship
out_mieszkania
number of apartments in the building
out_osoby_prawne
number of legal persons that have their registered office in the building
out_adr_id
address identifier, time-invariant
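When processing the results file in a script, it is often useful to separate the columns added by the application from the columns copied from the input. The sketch below assumes that every added column starts with the out_ prefix, as in the list above, and that copied columns keep their original names; the separator of the output file is not stated here, so it is a parameter. The function name split_columns is hypothetical.

```python
import csv
import io


def split_columns(csv_text, delimiter=","):
    """Return (output_columns, passthrough_columns, rows) for a results file."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = list(reader)
    names = reader.fieldnames or []
    # Columns produced by the standardization carry the out_ prefix.
    output = [name for name in names if name.startswith("out_")]
    passthrough = [name for name in names if not name.startswith("out_")]
    return output, passthrough, rows
```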
If you select Building Information, columns are added to enrich the input data with the following information:
out_zamieszkane
number of inhabited dwellings according to the Central Statistical Office (GUS)
out_mieszkancy
number of inhabitants according to the Central Statistical Office
out_popul_miesz
number of inhabited dwellings according to PESEL
out_popul_os
number of inhabitants according to PESEL
out_popul_kob
number of women according to PESEL
out_popul_mez
number of men according to PESEL
out_popul_
a group of columns with data on the population of the building's residents based on the PESEL register: kob – women, mez – men, 25_29k – women aged 25-29, 30_34m – men aged 30-34, etc.
out_urzad_s
Name of the tax office for the address
If you select Diagnostic information, an out_status column is added that provides information about the match and geocoding quality of the address data. A detailed description of the possible statuses can be found in Appendix 2.
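The out_status value is a sequence of entries in angle brackets, in the form given in Appendix 1, for example '<match: building><geocoding: matched building>'. A minimal sketch of splitting such a value into pairs follows; the function name parse_status is hypothetical and the regular expression assumes that entries contain no nested angle brackets.

```python
import re

# One entry: <category> or <category: result>
ENTRY = re.compile(r"<([^<>:]+)(?::\s*([^<>]*))?>")


def parse_status(status):
    """Split an out_status value into (category, result) pairs."""
    pairs = []
    for match in ENTRY.finditer(status):
        category, result = match.group(1), match.group(2)
        # result is None for entries without a colon, e.g. <information>
        pairs.append(
            (category.strip(), result.strip() if result is not None else None)
        )
    return pairs
```

For the example value above, the function returns the pairs ("match", "building") and ("geocoding", "matched building").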
If you select Geocoding, the following columns are added:
out_wsp_x – longitude
out_wsp_y – latitude
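Because out_wsp_x holds the longitude and out_wsp_y the latitude, the pair has to be swapped for tools that expect (latitude, longitude). A small sketch, assuming the row is a dictionary of strings as produced by a CSV reader; accepting a decimal comma is an assumption, and the function name to_lat_lon is hypothetical.

```python
def to_lat_lon(row):
    """Return (latitude, longitude) as floats, or None if not geocoded."""
    x = (row.get("out_wsp_x") or "").strip()
    y = (row.get("out_wsp_y") or "").strip()
    if not x or not y:
        return None
    # Assumption: a decimal comma may appear in exported files.
    lon = float(x.replace(",", "."))
    lat = float(y.replace(",", "."))
    return lat, lon
```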
If you select the TERYT identifiers option, the following columns are added:
out_sym_msc
identifier of the primary town in the SIMC GUS system
out_sym_cz_msc
identifier of the town part in the SIMC GUS system
out_sym_ul
street identifier in the ULIC GUS system
out_gmina06
Municipal identifier in the TERC GUS system
out_rodz_gmi
identifier of the type of municipality in the TERC GUS system
out_rodzaj_gminy
Name of the type of municipality
out_rejon13
statistical region code from the BREC system of statistical districts and census districts
out_obwod14
census tract code from the BREC system of statistical districts and census districts
A census tract is a spatial unit delineated for censuses and other statistical surveys according to the number of dwellings and inhabitants. A statistical region is a spatial unit for aggregating statistical data that consists of several (no more than nine) census tracts.
If you select Financial risk, the following columns are added:
out_geoscore_prv_pd_level_I
PD (default) probability determined based on the location of the building for individuals
out_geoscore_prv_pd_level_II
PD (default) probability determined based on the location of the building and its characteristics for individuals
out_geoscore_bus_pd_level_I
PD probability (default) determined based on the location of the building for companies with PESEL number, e.g. JDG, civil partnerships
out_geoscore_bus_pd_level_II
PD probability (default) determined based on the location of the building and its characteristics for companies with PESEL, e.g. JDG, civil partnerships
out_geoscore_bus2_pd_level_I
PD probability (default) determined based on the location of the building for companies that do not have a PESEL number, e.g. a limited liability company
out_geoscore_bus2_pd_level_II
PD probability (default) determined based on the location of the building and its characteristics for companies that do not have a PESEL number, e.g. a limited liability company
out_geoscore_prv_pf_level_I
PF (fraud) probability determined based on the location of the building for individuals
out_geoscore_prv_pf_level_II
PF (fraud) probability determined based on the location of the building and its characteristics for individuals
out_geoscore_bus_pf_level_I
PF (fraud) probability determined based on the location of the building for companies with PESEL number, e.g. JDG, civil partnerships
out_geoscore_bus_pf_level_II
PF (fraud) probability determined based on the location of the building and its characteristics for companies with PESEL number, e.g. JDG, civil partnerships
out_geoscore_bus2_pf_level_I
PF (fraud) probability determined based on the location of the building for companies that do not have a PESEL number, e.g. a limited liability company
out_geoscore_bus2_pf_level_II
PF (fraud) probability determined based on the location of the building and its characteristics for companies without PESEL, e.g. limited liability company
out_avg_income
income index: the estimated average income of people living in the micro-market
out_q5_income
5th percentile of the estimated income of people living in the micro-market
out_q25_income
25th percentile of the estimated income of people living in the micro-market
out_q50_income
50th percentile (median) of the estimated income of people living in the micro-market
out_q75_income
75th percentile of the estimated income of people living in the micro-market
out_q95_income
95th percentile of the estimated income of people living in the micro-market
If you select Building surroundings, the following columns are added:
out_buildings_a_
Number of different types of buildings within 1 km of the building, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.
out_buildings_a_area_
Area of buildings of various types within 1 km of the building, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.
out_landuse_a_
Number of different types of land cover within 1 km, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.
out_landuse_a_area_
Area of various types of land cover within 1 km, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.
out_natural_a_
Number of different types of objects of natural origin
out_natural_a_area_
Surface area of various types of objects of natural origin
out_pois_
Number of POIs (points of interest) within 1 km of the building, examples of POIs: shop, park, school, kindergarten, café, bakery, cinema, theatre, pharmacy, petrol station, supermarket, hairdresser, etc.
out_railways_
Number of rail objects within 1 km, examples of types: railway, tram, metro
out_railways_length_
Length of rail objects within 1 km, examples of types: railway, tram, metro
out_roads_
Number of roads of different types within 1 km of the building, examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.
out_roads_length_
Length of roads of various types within 1 km of the building, examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.
out_traffic_, out_transport_, out_traffic_a_, out_transport_a_
Number of transport-related facilities within 1 km of the building, examples of object types: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.
out_traffic_a_area_, out_transport_a_area_
Area of transport-related facilities within 1 km of the building, examples of types of facilities: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.
out_water_a_area_
Area of surface water of various types within 1 km of the building, examples of types of objects: lake, sea, river, swamp, etc.
out_waterways_length_
The length of surface water of various types at a distance of 1 km from the building, examples of types of objects: lake, sea, river, swamp, etc.
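The names above that end with an underscore denote groups of columns sharing a prefix. A sketch of aggregating one group in a results row follows. The concrete column names in the usage note (for example out_roads_local) are hypothetical, since the full list of suffixes is not given here. Note that the out_roads_ prefix also matches out_roads_length_ columns, so the sketch lets you exclude longer prefixes.

```python
def sum_group(row, prefix, exclude_prefixes=()):
    """Sum all non-empty numeric columns whose name starts with prefix."""
    total = 0.0
    for name, value in row.items():
        if not name.startswith(prefix):
            continue
        # Skip columns that belong to a more specific group.
        if any(name.startswith(other) for other in exclude_prefixes):
            continue
        if value not in (None, ""):
            total += float(value)
    return total
```

For example, sum_group(row, "out_roads_", exclude_prefixes=("out_roads_length_",)) counts the roads of all types without adding their lengths.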
Task report
For each completed task, you can view a report on the quality of standardization and geocoding by clicking the View button on the Tasks screen.
Fig. 6 Task report with a description of the results of standardization
The report includes the following information:
Results
Status
File Cleanup Status
Error description
Short description of the error
Input Records
Number of records in the input file
Processed records
Number of records processed
Records to be billed
Number of records to bill
Records Skipped
Number of records skipped
City level
Number of records cleaned to city level
Street level
Number of records cleaned to street level
Building level
Number of records cleaned to building level
Apartment level
Number of records cleaned to apartment level
Extracted data
Correctly extracted first names
number of company name records from which the first name of the business owner has been extracted
Correctly extracted surnames
number of company name records from which the surname of the business owner has been extracted
Correctly extracted company names
number of company name records in which the legal form of the company has been identified
First names not extracted
number of records for which the first name could not be confirmed
Surnames not extracted
number of records for which the surname could not be confirmed
Retrieving dictionary data
The user can download data from the building address database by selecting Data dictionaries from the main menu.
There are two ways to specify the scope of data to be downloaded:
Downloading for the entire area of a voivodeship, county, or municipality
Select a voivodeship or the entire country
After selecting a voivodeship, select a county or the entire voivodeship
After selecting a county, select a municipality or the entire county
Download by zip code
Enter the entire postal code - XX-XXX or XXXXX formats are acceptable
Enter the code pattern: XX* (e.g. 03*) – then data for all postal codes starting with XX digits are downloaded
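The accepted inputs can be recognized with two regular expressions. This is a sketch that follows the formats listed above literally: a full code as XX-XXX or XXXXX, or a pattern of exactly two digits followed by an asterisk. Whether the application accepts longer prefixes is not stated, so that case is rejected here; the function name parse_postal_query is hypothetical.

```python
import re

FULL_CODE = re.compile(r"^([0-9]{2})-?([0-9]{3})$")
PREFIX_PATTERN = re.compile(r"^([0-9]{2})\*$")   # assumption: exactly two digits


def parse_postal_query(text):
    """Return ("code", "XX-XXX") or ("prefix", "XX"); raise ValueError otherwise."""
    text = text.strip()
    match = FULL_CODE.match(text)
    if match:
        # Normalize both accepted spellings to the dashed form.
        return "code", f"{match.group(1)}-{match.group(2)}"
    match = PREFIX_PATTERN.match(text)
    if match:
        return "prefix", match.group(1)
    raise ValueError(f"not a postal code or pattern: {text!r}")
```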
After specifying the area, the User can calculate the price of downloading dictionary data from the selected area by using the Calculate fee button.
Fig. 7 Dictionary data to download
To download dictionary data, click the Download Data button. The app automatically changes to the Tasks screen. The task list will show an entry for dictionary data along with its status. If the data is available, it is possible to download it by clicking on the Download button.
Fig. 8 Dictionary data ready for download on the task list
Note: After you click Download Data and before the actual data extraction starts, the system calculates the fee. If the fee is greater than the funds in the User's account, the following message is displayed: The fee for dictionary data is greater than the amount of funds in the account. Top up your account. Click the Top up button to top up your account so that you can download the selected dictionary data.
The resulting table is a .csv file that contains the following columns:
id
Building ID
sym_msc
GUS TERYT symbol of the town
Location
Town name
sym_ul
GUS TERYT symbol of the street
feature
street name feature according to GUS: ul., al., pl., etc.
nazwa_1
main part of the GUS street name (e.g. the patron's surname)
nazwa_2
first, less important part of the street name, if it exists (e.g. the patron's first name)
nr_calk
numerical part of the building number
street
street name built by Algolytics from the GUS (Central Statistical Office) street name fields by removing repetitions and the feature "ul."
nr_domu
Building No.
x
longitude
y
latitude
wojewodztwo
name of the voivodeship
county
County name
municipality
Name of the municipality
Municipality06
TERYT identifier of the municipality
Area13
statistical region identifier
Circuit14
census tract ID
Apartment
number of apartments in the building
Inhabited
number of inhabited dwellings according to the Central Statistical Office (GUS)
Residents
number of inhabitants according to the Central Statistical Office
Status
Geocoding Status: Exact, Adjacent Building, Perimeter Center
code
Zip Code
Detailed descriptions of the columns can be found in Appendix 1.
My account
The My Account screen displays information about billing, API access, user data, and invoicing data.
Fig. 9 My Account screen view
The screen is divided into 4 areas: Billing, JSON API Access, Customer Data, and Invoicing Data.
The Billing area contains the following information:
Current Credit/ Available Funds
the sum of fees charged on credit, if the User settles using the credit method / the currently available funds, topped up via the online payment system, if the User settles using the PrePaid method
Fees charged
the sum of all fees charged so far
Receivables – last month
the sum of unpaid fees charged to the User for the previous billing period, only for Users settling using the credit method
Processed records
sum of all records from all files loaded by the User
Including cleaned records
sum of all successfully cleaned records that generated fees
Last top-up
date of the last top-up if the User settles using the PrePaid method; for the credit method, this field is empty
Last task
date of the last User task performed by the system
Payment of the amount due/ Top-up of the account with funds
the amount due and the Pay button / a field for entering the top-up amount and the Top-up account button. The buttons take the User to the PayU system, where the online payment is made
You can also use the dataquality.pl application through the API service. The JSON API Access area includes the following information:
API access key – a 40-character key used with the API to authorize and identify the User performing tasks
Generate new key - generates a new API key; the previous key is deactivated and is no longer available in the system
API documentation – a link to the dataquality.pl API documentation
Customer data area contains the following information:
Gender – Pani/Pan (Ms/Mr)
Name
Surname
Company
Phone number
Email
The Invoicing area contains the following information:
Name and surname/Name of the entity
NIP (tax identification number)
Street
House number
Apartment number
Zip code
City
Every field in the two areas above (except the e-mail address) can be changed; confirm the changes by clicking the Save data button.
Changing your password
You can change your password by clicking the email address button in the top menu and then selecting Change Password.
Fig. 10 Change Password Screen View
To change your password, enter your current password, type your new password, and type your new password again, and then click Change Password.
If the password has not been changed for 30 days, the system forces the User to change it. After logging in, the User is taken to the Change Password screen and must enter a new password and confirm it. The new password cannot be the same as any of the previous three passwords. Only after the password has been changed and the User has logged in does the system redirect to the main screen as usual.
Appendix 1. Standard columns in the output file
Below are the column names and their definitions; the parentheses give the types, with the suggested maximum number of characters for text variables (set with a margin where this may be useful). The list covers the columns of the Address Standardization output file, both those included by default and those added by selecting an enrichment option, that are available to all Algolytics Data Quality users.
out_id (integer) – the number of the record in the set;
out_sym_msc_pod (text, 7) – GUS TERYT symbol of the basic locality;
out_miejscowosc (text, 75) – name of the (basic) locality;
out_sym_cz_msc (text, 7) – GUS TERYT symbol of the part of the locality; if no part distinct from the basic locality has been distinguished, this field contains empty text;
out_czesc_miejsc (text, 75) – name of the part of the locality distinguished by GUS, with the same reservation as above;
out_sym_ul (text, 5) – GUS TERYT symbol of the street name;
out_ulica (text, 150) – street name built by Algolytics from the GUS street name fields by removing repetitions and the feature "ul.";
out_ulica_cecha (text, 15) – street name feature according to GUS: ul., al., pl., etc.;
out_ulica_nazwa_1 (text, 150) – main part of the GUS street name (e.g. the patron's surname);
out_ulica_nazwa_2 (text, 100) – first, less important part of the street name, if it exists (e.g. the patron's first name);
out_kod (text, 6) – postal code in the format: two digits, dash, three digits;
out_nr_domu (text, 10) – house (building) number;
out_nr_miesz (text, 15) – apartment number;
out_adr_id (text, 35) – an identifier of an address, invariant over time;
out_status (text, 500) – a sequence of entries in the form <operation_or_information: result> or <information>, for example: '<match: building><geocoding: matched building>'; for more about address standardization statuses, see Appendix 2;
out_gmina (text, 50) – name of the municipality;
out_powiat (text, 50) – the name of the county;
out_wojewodztwo (text, 50) – name of the voivodeship;
out_gmina06 (text, 6) – six-character code of the municipality, consisting of the codes of: the voivodeship (positions 1-2), the county within the voivodeship (positions 3-4), and the municipality within the county (positions 5-6);
out_rodz_gmi (text, 1) and out_rodzaj_gminy (text, 50) – numerical and verbal designation of the type of municipality, respectively: 1 – urban municipality, 2 – rural municipality, 4 – urban part of an urban-rural municipality, 5 – rural part of an urban-rural municipality;
out_rejon13 (text, 13) – GUS TERYT code of the municipality together with its type (positions 1-7) and of the statistical region (positions 8-13);
out_obwod14 (text, 14) – GUS TERYT code of the municipality together with its type (positions 1-7), of the statistical region (positions 8-13), and of the census tract within this region (position 14);
out_wsp_x (floating point number) – east longitude;
out_wsp_y (floating point number) – north latitude;
out_mieszkania (integer), out_zamieszkane (integer), out_mieszkancy (integer) – these fields represent, respectively: the number of dwellings, the number of inhabited dwellings and the number of inhabitants of the building according to the Central Statistical Office; these and other fields concerning the building are filled in only if the input address has been successfully matched to the building level;
out_osoby_prawne (integer) – the number of legal persons and organizational units entered in the REGON register, which have their registered office in a given building;
out_popul_ (integers) – a group of columns with data on the population of buildings based on the PESEL register made available by the Ministry of Digital Affairs: miesz – inhabited dwellings, os – people, kob – women, mez – men, 25_29k – women aged 25-29, 30_34m – men aged 30-34, etc.
out_buildings_a_ (integer) – a group of columns with the number of different types of buildings at a distance of 1 km from the building/address, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.
out_buildings_a_area_ (floating point number) – a group of columns with the area of buildings of various types at a distance of 1 km from the building/address, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.
out_landuse_a_ (integer) – a group of columns with the number of different types of land cover at a distance of 1 km from the building/address, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.
out_landuse_a_area_ (floating point number) – a group of columns with the area of various types of land cover at a distance of 1 km from the building/address, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.
out_natural_a_ (integer) – a group of columns with the number of different types of objects of natural origin, e.g. beach, cliff, etc.
out_natural_a_area_ (floating point number) – a group of columns with the area of various types of objects of natural origin, e.g. a beach, a cliff, etc.
out_pois_, out_pois_a_ (integer) – groups of columns with the number of POIs (points of interest) at a distance of 1 km from the building/address, examples of POIs: shop, park, school, kindergarten, café, bakery, cinema, theatre, pharmacy, gas station, supermarket, hairdresser, etc.
out_railways_ (integer) – a group of columns with the number of rail objects at a distance of 1 km from the building/address, examples of types: railway, tram, metro.
out_railways_length_ (floating point number) – a group of columns with the length of rail objects at a distance of 1 km from the building/address, examples of types: railway, tram, metro.
out_roads_ (integer) – a group of columns with the number of roads of different types at a distance of 1 km from the building/address, examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.
out_roads_length_ (floating point number) – a group of columns with the length of roads of different types at a distance of 1 km from the building/address, examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.
out_traffic_, out_transport_, out_traffic_a_, out_transport_a_ (integer) – groups of columns with the number of transport-related objects at a distance of 1 km from the building/address, examples of types of objects: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.
out_traffic_a_area_, out_transport_a_area_ (floating point number) – groups of columns with the area of transport-related objects at a distance of 1 km from the building/address, examples of types of objects: tram stop, bus stop, railway station, metro station, taxi stand, bus station, parking lot, etc.
out_water_a_area_ (floating point number) – a group of columns with the area of surface water of various types at a distance of 1 km from a building/address, examples of types of objects: lake, sea, river, swamp, etc.
out_waterways_length_ (floating point number) – a group of columns with the length of surface water of various types at a distance of 1 km from a building/address, examples of types of objects: lake, sea, river, swamp, etc.
out_geoscore_ (floating point number) – a group of columns related to financial risk: default level and fraud level for individuals, sole proprietorships (JDG) and companies.
out_average_income (floating point number) – an estimated income index of people living in the micro-market.
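Several of the column groups above share the beginning of their prefix (for example out_roads_ and out_roads_length_), so a plain prefix match can mix two groups. The following is a minimal Python sketch of selecting one group from a result row. The column suffixes (motorway, local, school) and the sample values are hypothetical and used only for illustration; the actual column names should be taken from the downloaded result file.

```python
# Longer group prefixes that start with a shorter group prefix,
# taken from the column list above.
NESTED_PREFIXES = {
    "out_pois_": ("out_pois_a_",),
    "out_railways_": ("out_railways_length_",),
    "out_roads_": ("out_roads_length_",),
    "out_traffic_": ("out_traffic_a_",),
    "out_transport_": ("out_transport_a_",),
    "out_traffic_a_": ("out_traffic_a_area_",),
    "out_transport_a_": ("out_transport_a_area_",),
}


def group_columns(row, prefix):
    """Return the columns of one group, skipping longer-prefixed groups."""
    excluded = NESTED_PREFIXES.get(prefix, ())
    return {
        name: value
        for name, value in row.items()
        if name.startswith(prefix) and not name.startswith(excluded)
    }


# Hypothetical result row; the suffixes and values are assumptions.
row = {
    "out_roads_motorway": 1,
    "out_roads_local": 12,
    "out_roads_length_motorway": 850.5,
    "out_roads_length_local": 4300.0,
    "out_pois_school": 3,
}

road_counts = group_columns(row, "out_roads_")
total_roads = sum(road_counts.values())
```

With this sample row, road_counts contains only the two out_roads_ count columns, and the out_roads_length_ columns are left for a separate call with their own prefix.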
Appendix 2. Statuses of the results of the standardisation of address data and geocoding
The status field of the result table contains a sequence of combined entries in the form <category: result> or <category>. For the standardization of address data, the categories (bold, first level of the list) and results (italics, second level of the list) include the following items:
ambiguous assignment – the entry matches several different candidates to a very similar extent – possible results:
one of the similarly matched streets in one town was selected – data is assigned one of the streets;
one of the similarly matched localities in one municipality has been selected – data from one of the localities is assigned;
similarly matched localities in different municipalities – the minimum accuracy of the assignment has not been achieved, no data is assigned to a given record.
change of place name – the name of the place has changed since it was entered into the processed set, the new place name is given at the output.
Street name change – the street name has changed since it was entered into the processed set, the new street name is given at the output.
Matching – possible outcomes:
apartment – an apartment with a given address has been identified;
building with no dwellings – the building with the given address has been identified, and this building has no dwellings, so the maximum possible match has been achieved;
building in which there are dwellings – the building with the given address has been identified; this building does have apartments, but the apartment number has not been recognized;
building with the same integer number – no building with a given address has been identified, but in a given town and on a given street there is a building with the same integer number – e.g. in the processed set there is the number 18b, which is not in the dictionary, but there is a building with the number 18; Note - the service will return the building number provided in the query. The table below presents examples of situations with DQ responses for the status in question:
Existing buildings in the dictionary | Request to DQ | DQ response
1, 1A, 1B | 1C | 1C
1A, 1B | 1C | 1C
1A, 1B, 1C, 1D, 1E, 1F, 1G | 1Z | 1Z
2B, 2D, 2E, 2F/1, 2K | 2 | 2
neighbouring building – no building with a given address has been identified, but in a given town and on a given street there is a building whose integer number differs by no more than 4 from the integer part of the building number in the processed data;
street – the street was identified, the building with the given number or the adjacent number was not found;
a town that has no streets – the town has been identified, the town has no streets, the building with a given number or neighbouring number has not been found;
locality – the locality has been identified, no street or building assigned directly to the locality with a given number or neighbouring number has been found (in some villages there are mixed addresses – some addresses are assigned to the level of the locality, and others to the street);
none – no match was obtained, the locality was not identified;
In addition, there are variants of the above results for cases in which the entry matches a given candidate (locality or street) more closely than for the result none, but less closely than for the other results. These variants indicate that the entry matches the candidate with a high degree of probability, but with a moderate risk of error:
an apartment on a probable street;
an apartment in a probable locality;
a building in which there are no apartments, on a probable street;
a building in which there are no apartments, in a probable locality;
a building in which there are apartments, on a probable street;
a building in which there are dwellings in a probable locality;
probable street;
probable locality that has no streets;
probable locality.
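The building-level results described above (same integer number, neighbouring building, street) can be illustrated with a short Python sketch. It is not the algorithm used by the service, which is not described here. The function names, the case-insensitive comparison and the "exact building" label (standing in for the apartment and building results) are assumptions made for illustration.

```python
import re


def integer_part(number):
    """Return the leading integer of a number such as '18b' or '2F/1', or None."""
    match = re.match(r"\s*(\d+)", number)
    return int(match.group(1)) if match else None


def building_match(requested, dictionary_numbers, max_distance=4):
    """Classify a requested building number against the numbers on one street."""
    normalized = {n.strip().upper() for n in dictionary_numbers}
    if requested.strip().upper() in normalized:
        return "exact building"  # hypothetical label, see the lead-in
    wanted = integer_part(requested)
    if wanted is None:
        return "street"
    known = {integer_part(n) for n in dictionary_numbers} - {None}
    if wanted in known:
        return "building with the same integer number"
    # Integer parts differing by no more than 4 count as neighbouring.
    if any(abs(wanted - k) <= max_distance for k in known):
        return "neighbouring building"
    return "street"


# Rows of the table above: each request shares its integer part
# with at least one building in the dictionary.
first_row = building_match("1C", ["1", "1A", "1B"])
last_row = building_match("2", ["2B", "2D", "2E", "2F/1", "2K"])
```

In this sketch both table rows evaluate to "building with the same integer number", a request for 21 on a street that only has building 18 gives "neighbouring building", and a request for 30 on the same street falls back to "street".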
Geocoding – possible results:
matched building – the coordinates of the building defined by the match status (building with a given address, neighboring building, etc.) have been assigned;
neighbouring building – the coordinates of a building in a given town and on a given street (if the street appears in the address) with a different number have been assigned; the integer part of that number differs from the integer part of the given building's number by no more than 4;
center of the street within the postal code (<category of the number of buildings in the group>) – in the case of a street for which there is more than one postal code and the postal code is known, the coordinates of the center of the group of buildings located on a specific street and having a given code are assigned; in brackets the category of the number of buildings constituting such a defined group is given:
less than 20 buildings;
20-49 buildings;
50 and more buildings;
the centre of a street in a part of a locality (<the category of the number of buildings in a group>) – in the case of localities that have separate parts, and a given street is located in more than one of them and it was possible to determine which part it is, the coordinates of the centre of the group of buildings located on a given street and in a specific part of the locality were assigned (e.g. the centre of Puławska Street in a part of Mokotów); in brackets is the category of the number of buildings constituting such a defined group, as described above;
center of the street (<category of the number of buildings in the group>) – the coordinates of the center of the group of buildings located on a given street are assigned; the category of the number of buildings constituting such a defined group, as described above, is given in brackets;
center of the district – coordinates of the center of the census tract of the Central Statistical Office have been assigned – the territory of Poland is divided into over 180 thousand such districts based on the distribution of population, a typical district is inhabited by about 200 people;
center of the region – coordinates of the center of the statistical region of the Central Statistical Office have been assigned – the territory of Poland is divided into more than 30 thousand such regions based on the distribution of population;
town center – coordinates of the center of the locality have been assigned.
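The building-count categories used in the street-center geocoding results can be expressed as a small Python function, which makes the boundaries at 20 and 50 explicit. This is only a sketch: the function names are hypothetical, and the exact wording of the status text returned by the service may differ from the strings used here.

```python
def building_count_category(count):
    """Map the number of buildings in a group to its category."""
    if count < 0:
        raise ValueError("count must not be negative")
    if count < 20:
        return "less than 20 buildings"
    if count < 50:
        return "20-49 buildings"
    return "50 and more buildings"


def street_center_status(count):
    """Build a 'center of the street' label; the exact wording is an assumption."""
    return f"center of the street ({building_count_category(count)})"


example = street_center_status(35)
```

For a group of 35 buildings the sketch produces "center of the street (20-49 buildings)".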