# DataQuality \[web app]

## What is Algolytics Data Quality? <a href="#bookmark0" id="bookmark0"></a>

Algolytics Data Quality is a tool for standardizing and enriching customer data. It runs in batch mode and is used to detect, monitor, and fix data quality problems. The DataQuality.pl application provides a web browser interface that lets you load data into the application, define a standardization task, and download the results.

### Algolytics Data Quality Functionalities <a href="#bookmark1" id="bookmark1"></a>

* data profiling
* data cleansing (including parsing, standardization, deduplication)
* conducting statistical analyses
* data enrichment – matching data from different databases, adding new information about data
* geocoding and data visualization
* dictionary data retrieval

### Work with Algolytics Data Quality

Using the app comes down to two essential steps: adding a file to clean and downloading the result. The detailed steps are listed below.

1. Load the file with the database to be cleaned.
2. Check the information about the uploaded file.
3. Select the roles for the columns from the file.
4. Add a task.
5. Once the task is complete, download the data and generate the report.

### Main Menu View <a href="#bookmark4" id="bookmark4"></a>

![](/files/1edJwJZzx6QYcixQ07tY)

*Fig. 1 Main application window*

Description of the individual items of the main menu:

| Start             | Home page                                                                                       |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| Tasks             | if you want to view the operations carried out so far or continue the work you have started     |
| New task          | if you want to perform an operation on new data                                                 |
| Data dictionaries | if you want to download dictionary data                                                         |
| My account        | if you want to edit your account details                                                        |
| Documentation     | if you want to read a detailed description of the individual functions of the application       |
| User Mail         | after clicking on it, you can go to the My Account, Change Password or Log Out screen           |
| Version           |                                                                                                 |

## New task

When you select *New task* from the main menu of the application, the wizard for defining a new task opens.

### Task and file name selection <a href="#bookmark5" id="bookmark5"></a>

The first thing you need to do is specify the name of the task and load the data file. To do this, click the *Feed* button and select the file you want. Once it is selected, click the *Load* button. You can change the name of the task in the *Task name* field.

**The following data formats are supported: CSV and XLSX.**

![](/files/DH7J1Bcmy1qaeCMs9RWT)

*Fig. 2 The first step of defining a new task*

### File Information <a href="#bookmark6" id="bookmark6"></a>

Once the file is loaded, an area will appear with information about the loaded file.

![](/files/lf8nTpuEok5bmIV933OD)

*Fig. 3 Information about the loaded file when defining a new task*

It presents the following information:

| Name                         | filename                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Size                         | file size, maximum possible size is 2 GB                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| Number of records            | Number of records in the loaded file                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| Status                       | <p>The state of the file after the initial verification. Possible values:</p><p>OK – nothing prevents you from continuing</p><p>Note – indicates a potential lack of funds on the account; the following message is displayed below the file information table: Processed file may be blocked if the fee exceeds the funds on the account</p><p>Error – prevents the task from being created; a simplified error message is displayed below the file information table</p> |
| Actions                      | a button to delete the file; when you delete a file, the file information area is hidden |
| Separator detected           | the data separator detected in the loaded file |
| Choose a different separator | field in which you can specify a different separator; the preview of the first lines of the loaded file is then reloaded |
| Text Qualifier               | qualifier (character) that marks text in which the separator character is not treated as a separator |
| File encoding                | information about the detected encoding of the uploaded file                                                                                                                                                                                                                                                                                                                                                                                                                                                          |

This area also displays a preview of the first 5 lines of the uploaded file. If you change the separator or the text qualifier, the preview is updated.
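
Before uploading, it can help to run a similar check locally. The sketch below is not the application's own detection logic; it only uses Python's standard `csv.Sniffer` to guess the separator and print the first rows. The function name `preview_csv`, the candidate separator list, and the sample data are assumptions made for illustration.

```python
import csv
import io


def preview_csv(text, qualifier='"', max_rows=5):
    """Guess the separator of CSV text and return it with the first rows."""
    # csv.Sniffer inspects a sample and picks a delimiter from the candidates.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=";,\t|")
    reader = csv.reader(
        io.StringIO(text), delimiter=dialect.delimiter, quotechar=qualifier
    )
    rows = []
    for row in reader:
        rows.append(row)
        if len(rows) == max_rows:
            break
    return dialect.delimiter, rows


# Hypothetical sample: the quoted field contains the separator character.
sample = 'id;name;city\n1;"Kowalski; Jan";Warszawa\n2;Nowak;Krakow\n'
separator, rows = preview_csv(sample)
print("separator:", repr(separator))
for row in rows:
    print(row)
```

The text qualifier is what keeps `Kowalski; Jan` in a single field even though it contains the separator.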

After verifying the correctness of the uploaded file, click *Next*.

### Defining variables <a href="#bookmark7" id="bookmark7"></a>

In the next step, you need to indicate the roles for the data contained in the file. The screen will display a table where you need to set a role for each column found in the file.

![](/files/Cqf3wMNs42s6a5ae9S2K)

*Fig. 4 Defining the task of standardization and data enrichment*

The table on this screen contains the following information:

* *Field in file –* column name
* *Column Type*
* *Role for column*

You can choose from the following types and roles for columns:

| Address variable             | <p>KOD\_POCZTOWY</p><p>CITY</p><p>ULICA\_NUMER\_DOMU\_I\_MIESZKANIA</p><p>STREET</p><p>NUMER\_DOMU</p><p>NUMER\_MIESZKANIA</p><p>NUMER\_DOMU\_I\_MIESZKANIA</p><p>VOIVODESHIP</p><p>COUNTY</p><p>MUNICIPALITY</p> |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| Name variable                | <p>NAME</p><p>SURNAME</p><p>NAZWA\_PODMIOTU</p><p>IMIE\_I\_NAZWISKO</p>                                            |
| Contact variable             | <p>EMAIL1</p><p>EMAIL2</p><p>TELEFON1</p><p>TELEFON2</p>                                                           |
| Variable of people/companies | <p>PESEL</p><p>NIP</p><p>REGON</p>                                                                                 |
| Date variable                | <p>DATA\_URODZENIA</p><p>CZAS\_AKTUALIZACJI</p>                                                                    |
| Unspecified variable         | DANE\_OGOLNE – the variable will be analyzed for all possible information                                          |
| Identifier                   | ID\_REKORDU                                                                                                        |
| Neutral variable             | <p>REWRITE – copies the variable to the output file</p><p>SKIP – does not copy the variable to the output file</p> |

For the parameters to validate correctly, a type and a role must be assigned to every variable. Variable roles cannot be repeated.

When cleaning address data, the CITY or DANE\_OGOLNE role must be assigned.
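
The two validation rules above (every column has a role, and roles are not repeated) can be sketched as a local pre-check. This is only an illustration, not the application's validation code. The function name `validate_roles` is hypothetical, and allowing the neutral roles REWRITE and SKIP to repeat is an assumption that the manual does not state explicitly.

```python
from collections import Counter

# Assumption: neutral roles may be assigned to several columns.
REPEATABLE_ROLES = {"REWRITE", "SKIP"}


def validate_roles(column_roles):
    """Return a list of problems found in a {column name: role} mapping."""
    problems = []
    # Rule 1: every column needs a role.
    for column, role in column_roles.items():
        if not role:
            problems.append(f"column '{column}' has no role")
    # Rule 2: a role may be used only once (except the neutral ones).
    counts = Counter(role for role in column_roles.values() if role)
    for role, count in counts.items():
        if count > 1 and role not in REPEATABLE_ROLES:
            problems.append(f"role {role} is assigned {count} times")
    return problems


print(validate_roles({"miasto": "CITY", "miasto2": "CITY", "uwagi": "SKIP"}))
```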

Below the table with the selection of types and roles for columns, it is possible to select additional data to include in the output file.

Possible options are:

* Information about the building – number of units, type, population and demographic structure
* Building surroundings – the number of POIs (points of interest) or other objects, e.g. roads, built-up areas, etc. in a buffer of 1 km from the geocoded address data
* TERYT identifiers
* Financial risk (requires the TERYT identifiers option) – default risk, fraud risk and the average income of the inhabitants
* Diagnostic information – address match and geocoding status
* Geocoding – geographical coordinates of the building
* Deduplication
* Incremental deduplication

Deduplication requires the ID\_REKORDU and CZAS\_AKTUALIZACJI roles among the input variables. Incremental deduplication accumulates customer data and compares each new batch of data with the data collected in previous calls.
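
The dependencies between options and roles described in this section can be checked before a task is added. The sketch below is a hypothetical helper, not part of the application; the function name `check_options` and the option labels used as strings are assumptions. It covers only the two dependencies stated in the text.

```python
def check_options(selected_options, assigned_roles):
    """Return a list of unmet requirements for the selected output options."""
    problems = []
    # Financial risk is documented as requiring the TERYT identifiers option.
    if "Financial risk" in selected_options and "TERYT identifiers" not in selected_options:
        problems.append("Financial risk requires the TERYT identifiers option")
    # Deduplication is documented as requiring these two input roles.
    if "Deduplication" in selected_options:
        missing = {"ID_REKORDU", "CZAS_AKTUALIZACJI"} - set(assigned_roles)
        for role in sorted(missing):
            problems.append(f"Deduplication requires the {role} role")
    return problems


print(check_options({"Deduplication", "Geocoding"}, {"CITY", "STREET", "ID_REKORDU"}))
```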

At the very bottom of the page, you will find information about the maximum price of the task and 2 buttons: *Previous step*, which you can use to return to the file information screen, and *Add task*, which you can use to add a task to be performed.

Clicking *Add task* adds the task for execution and takes you to the *Tasks* screen. To download the results file, click the *Download* button. To view the task report, click *View*.

**The first 5000 rows processed by the user are free.** Once this limit is used up, the system requires you to fill in the invoicing data on the *My Account* screen and top up your account.

## Tasks <a href="#bookmark8" id="bookmark8"></a>

On the *Tasks* screen, you have access to all tasks that have been completed, are in progress, or are waiting to be performed. The table also includes dictionary datasets, if they were retrieved by the user. The table can be sorted using headers; by default, it is sorted by the *End date* column.

You can filter the table by typing or selecting the appropriate values from the fields below the table headers. You can remove filters with the *Clear filters* button above the table. There is also an option to select the number of tasks displayed in the table.

Below the table, there is a *New task* button, which takes you to the screen for creating a new task.

![](/files/8osjYwco0nbJL8hX7hmK)

*Fig. 5 Tasks screen view*

Description of the Tasks table columns:

| Task name         | name defined by the user when creating a new task; when dictionary data is downloaded, the name is based on the data range selected by the user |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| Task status       | available task statuses: New, In Queue, In Progress, Completed |
| End date          | the date and time when the system finished executing the task; for tasks that have not finished yet, the current status is displayed instead: New, In Queue, In Progress |
| Number of records | number of records in the file |
| Fee               | fee for performing the task |
| Results           | <p>A link to the file processed by the system or to the corresponding dictionary data.</p><p>If the file has been stored on the system longer than the retention period (30 days), the Results column shows the word Expired.</p><p>If the file is blocked due to lack of funds, either the text Blocked (for credit settlement) or the Pay button, which redirects to the My Account screen (for the PrePaid settlement method), is displayed.</p> |
| Report            | for completed tasks, the field contains a link to the task report page, which provides statistics on the quality of standardization |
| Actions           | <p>There are two actions available:</p><p><em>Undo</em>, which removes the task from the list and from the processing queue. The selected file will be deleted and the data will not be processed. This action is available only for tasks that have not started.</p><p><em>Delete</em>, which removes the task from the user's list and deletes the data file from the server. This action is available only for completed tasks.</p> |

**Note:** The system automatically deletes files that are older than 30 days. Such a task shows the status *Expired* in the Results column (no link to download the results is available).

#### Data Scrubbing Results

You can get the results file by clicking the *Download* button on the *Tasks* screen. A detailed description of all columns in the output file can be found in Appendix 1.

The file is a data table (.csv) that contains the following columns (in addition to the neutral variables that were assigned the REWRITE role):

| out\_miejscowosc     | town name                                                                                                                       |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| out\_czesc\_miejsc   | name of the part of the town                                                                                                    |
| out\_ulica           | street name built by Algolytics from the street name fields of the Central Statistical Office (GUS) by removing repetitions and the street type prefix (e.g. ul.) |
| out\_ulica\_cecha    | street type prefix according to GUS: ul., al., pl., etc.                                                                        |
| out\_ulica\_nazwa\_1 | the main part of the GUS street name (e.g. the surname of the patron)                                                           |
| out\_ulica\_nazwa\_2 | the first, less important part of the street name, if it exists (e.g. the first name of the patron)                             |
| out\_kod             | postal code                                                                                                                     |
| out\_nr\_domu        | house (building) number                                                                                                         |
| out\_nr\_miesz       | apartment number                                                                                                                |
| out\_gmina           | name of the municipality                                                                                                        |
| out\_powiat          | name of the county                                                                                                              |
| out\_wojewodztwo     | name of the voivodeship                                                                                                         |
| out\_mieszkania      | number of apartments in the building                                                                                            |
| out\_osoby\_prawne   | number of legal persons that have their registered office in the building                                                       |
| out\_adr\_id         | address identifier, time-invariant                                                                                              |
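
As an illustration of how these columns fit together, the sketch below reads a results file and builds a one-line address from the standardized fields. It is a minimal example under stated assumptions: the `;` separator of the output file, the helper names `read_results` and `format_address`, and the sample row are all hypothetical and should be adjusted to the file you actually download.

```python
import csv
import io


def read_results(text, delimiter=";"):
    """Parse results CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def format_address(row):
    """Build a one-line address from the standardized output columns."""
    # out_ulica has the street type prefix removed, so out_ulica_cecha is prepended.
    street = " ".join(p for p in (row.get("out_ulica_cecha"), row.get("out_ulica")) if p)
    number = row.get("out_nr_domu") or ""
    if row.get("out_nr_miesz"):
        number += "/" + row["out_nr_miesz"]
    first = " ".join(p for p in (street, number) if p)
    second = " ".join(p for p in (row.get("out_kod"), row.get("out_miejscowosc")) if p)
    return ", ".join(p for p in (first, second) if p)


# Hypothetical sample of a results file limited to a few columns.
sample = (
    "out_miejscowosc;out_ulica;out_ulica_cecha;out_kod;out_nr_domu;out_nr_miesz\n"
    "Warszawa;Marszalkowska;ul.;00-001;10;5\n"
    "Krakow;Dluga;ul.;31-147;3;\n"
)
for row in read_results(sample):
    print(format_address(row))
```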

If you select *Building Information*, columns are added to enrich the input data with the following information:

| out\_zamieszkane  | number of inhabited dwellings according to the Central Statistical Office (GUS) |
| ----------------- | ------------------------------------------------------------------------------- |
| out\_mieszkancy   | number of inhabitants according to GUS                                          |
| out\_popul\_miesz | number of inhabited dwellings according to PESEL                                |
| out\_popul\_os    | number of inhabitants according to PESEL                                        |
| out\_popul\_kob   | number of women according to PESEL                                              |
| out\_popul\_mez   | number of men according to PESEL                                                |
| out\_popul\_      | a group of columns with data on the population of the building's residents based on the PESEL register: kob – women, mez – men, 25\_29k – women aged 25-29, 30\_34m – men aged 30-34, etc. |
| out\_urzad\_s     | name of the tax office for the address                                          |
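
The age-and-sex columns in the out\_popul\_ group follow the naming pattern shown in the table (for example 25\_29k and 30\_34m). The sketch below decodes such names. It assumes that the full column name is the out\_popul\_ prefix followed by the suffix and that only closed ranges of the form low\_high occur; any other naming in the real file would need its own handling. The function name is hypothetical.

```python
import re

# Pattern for names such as out_popul_25_29k (k = women, m = men).
AGE_SEX_COLUMN = re.compile(r"^out_popul_(\d+)_(\d+)([km])$")


def parse_population_column(name):
    """Return (age_from, age_to, sex) for an age-group column, else None."""
    match = AGE_SEX_COLUMN.match(name)
    if match is None:
        return None
    low, high, sex = match.groups()
    return int(low), int(high), "women" if sex == "k" else "men"


for column in ("out_popul_25_29k", "out_popul_30_34m", "out_popul_kob"):
    print(column, "->", parse_population_column(column))
```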

If you select *Diagnostic information*, an out\_status column is added that provides information about the match and geocoding quality of the address data. A detailed description of the possible statuses can be found in Appendix 2.

If you select *Geocoding,* the following columns are added:

* out\_wsp\_x – longitude
* out\_wsp\_y – latitude

Geographic coordinates are calculated in the WGS 84 reference system (<https://pl.wikipedia.org/wiki/System_odniesienia_WGS_84>).
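
Because the coordinates are plain WGS 84 longitude and latitude, the distance between two geocoded records can be estimated with the haversine formula. The sketch below is only an illustration: it treats the Earth as a sphere with a mean radius of 6371 km, and it assumes that the values in out\_wsp\_x and out\_wsp\_y parse as decimal numbers with a dot. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_between_rows(row_a, row_b):
    """Distance between two result rows; out_wsp_x is longitude, out_wsp_y latitude."""
    return haversine_km(
        float(row_a["out_wsp_x"]), float(row_a["out_wsp_y"]),
        float(row_b["out_wsp_x"]), float(row_b["out_wsp_y"]),
    )


# Hypothetical coordinates, one degree of latitude apart.
a = {"out_wsp_x": "21.0", "out_wsp_y": "52.0"}
b = {"out_wsp_x": "21.0", "out_wsp_y": "53.0"}
print(round(distance_between_rows(a, b), 2), "km")
```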

If you select the *TERYT identifiers* option, the following columns are added:

| out\_sym\_msc      | identifier of the primary town in the SIMC GUS system                                    |
| ------------------ | ---------------------------------------------------------------------------------------- |
| out\_sym\_cz\_msc  | identifier of the part of the primary town in the SIMC GUS system                        |
| out\_sym\_ul       | street identifier in the ULIC GUS system                                                 |
| out\_gmina06       | municipality identifier in the TERC GUS system                                           |
| out\_rodz\_gmi     | identifier of the municipality type in the TERC GUS system                               |
| out\_rodzaj\_gminy | name of the municipality type                                                            |
| out\_rejon13       | statistical region code from the BREC system of statistical regions and census tracts    |
| out\_obwod14       | census tract code from the BREC system of statistical regions and census tracts          |

A census tract is a spatial unit delimited for censuses and other statistical surveys according to the number of dwellings and inhabitants. A statistical region is a spatial unit for aggregating statistical data that consists of several (no more than nine) census tracts.

More information about the TERC, BREC, SIMC and ULIC systems can be found on the website of the Central Statistical Office: <http://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/ogolna_charakterystyka_systemow_rejestru/ogolna_charakterystyka_systemow_rejestru.aspx?contrast=default>

If you select *Financial risk*, the following columns are added:

| out\_geoscore\_prv\_pd\_level\_I   | probability of default (PD) determined based on the location of the building, for individuals |
| ---------------------------------- | --------------------------------------------------------------------------------------------- |
| out\_geoscore\_prv\_pd\_level\_II  | probability of default (PD) determined based on the location of the building and its characteristics, for individuals |
| out\_geoscore\_bus\_pd\_level\_I   | probability of default (PD) determined based on the location of the building, for companies with a PESEL number, e.g. sole proprietorships (JDG), civil partnerships |
| out\_geoscore\_bus\_pd\_level\_II  | probability of default (PD) determined based on the location of the building and its characteristics, for companies with a PESEL number, e.g. sole proprietorships (JDG), civil partnerships |
| out\_geoscore\_bus2\_pd\_level\_I  | probability of default (PD) determined based on the location of the building, for companies without a PESEL number, e.g. limited liability companies |
| out\_geoscore\_bus2\_pd\_level\_II | probability of default (PD) determined based on the location of the building and its characteristics, for companies without a PESEL number, e.g. limited liability companies |
| out\_geoscore\_prv\_pf\_level\_I   | probability of fraud (PF) determined based on the location of the building, for individuals |
| out\_geoscore\_prv\_pf\_level\_II  | probability of fraud (PF) determined based on the location of the building and its characteristics, for individuals |
| out\_geoscore\_bus\_pf\_level\_I   | probability of fraud (PF) determined based on the location of the building, for companies with a PESEL number, e.g. sole proprietorships (JDG), civil partnerships |
| out\_geoscore\_bus\_pf\_level\_II  | probability of fraud (PF) determined based on the location of the building and its characteristics, for companies with a PESEL number, e.g. sole proprietorships (JDG), civil partnerships |
| out\_geoscore\_bus2\_pf\_level\_I  | probability of fraud (PF) determined based on the location of the building, for companies without a PESEL number, e.g. limited liability companies |
| out\_geoscore\_bus2\_pf\_level\_II | probability of fraud (PF) determined based on the location of the building and its characteristics, for companies without a PESEL number, e.g. limited liability companies |
| out\_avg\_income                   | income index: the average estimated income of people living in the micro-market |
| out\_q5\_income                    | 5th percentile of the estimated income of people living in the micro-market |
| out\_q25\_income                   | 25th percentile of the estimated income of people living in the micro-market |
| out\_q50\_income                   | 50th percentile (median) of the estimated income of people living in the micro-market |
| out\_q75\_income                   | 75th percentile of the estimated income of people living in the micro-market |
| out\_q95\_income                   | 95th percentile of the estimated income of people living in the micro-market |

If you select *Building surroundings*, the following columns are added:

| out\_buildings\_a\_       | <p>Number of different types of buildings within 1 km of the building, examples of building types: residential, service, public,</p><p>supermarket, swimming pool, school, university, hotel, etc.</p>                  |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| out\_buildings\_a\_area\_ | <p>Area of buildings of various types at a distance of 1 km from the building, examples of building types: residential, service,</p><p>public, supermarket, swimming pool, school, university, hotel, etc.</p>          |
| out\_landuse\_a\_         | <p>Number of different types of land cover at a distance of 1 km, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas,</p><p>parks, parking lots, etc.</p> |
| out\_landuse\_a\_area\_   | <p>Area of various types of land cover at a distance of 1 km, examples of types of land cover: forest, natural areas, industrial areas, residential areas, commercial areas</p><p>parks, parking lots, etc.</p>         |
| out\_natural\_a\_         | Number of different types of objects of natural origin |
| out\_natural\_a\_area\_   | Area of various types of objects of natural origin |
| out\_pois\_               | Number of POIs (points of interest) within 1 km of the building, examples of POIs: shop, park, school, kindergarten, café, bakery, cinema, theatre, pharmacy, petrol station, supermarket, hairdresser, etc. |
| out\_railways\_           | Number of rail objects within 1 km, examples of types: railway, tram, metro |
| out\_railways\_length\_   | Length of rail objects within 1 km, examples of types: railway, tram, metro |
| out\_roads\_                                                             | <p>Number of roads of different types within 1 km of the building, examples of road types: motorway, expressway, main road, secondary road, local road,</p><p>housing estate road, transport hub, bicycle path, pavement</p><p>etc.</p>           |
| out\_roads\_length\_                                                     | <p>Length of roads of various types at a distance of 1 km from the building, examples of road types: motorway, expressway, main road, secondary road, local road,</p><p>housing estate road, transport hub, bicycle path, pavement</p><p>etc.</p> |
| out\_traffic\_, out\_transport\_, out\_traffic\_a\_, out\_transport\_a\_ | Number of transport-related facilities within 1 km of the building, examples of object types: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.                                                      |
| out\_traffic\_a\_area\_, out\_transport\_a\_area\_                       | <p>Area of transport-related facilities at a distance of 1 km from the building, examples of types of facilities: tram stop, bus stop, railway station, metro station, taxi stand,</p><p>bus station, parking lot, etc.</p>                       |
| out\_water\_a\_area\_                                                    | <p>Area of surface water of various types within 1 km of the building, examples of types of objects: lake, sea,</p><p>river, swamp, etc.</p>                                                                                       |
| out\_waterways\_length\_                                                 | The length of surface water of various types at a distance of 1 km from the building, examples of types of objects: lake, sea, river, swamp, etc.                                                                                                 |

## Job report <a href="#bookmark9" id="bookmark9"></a>

For each completed task, you can view a report on the quality of standardization and geocoding by clicking the *View* button on the *Tasks* screen.

![http://dataquality.pl/wp-content/uploads/2016/11/raport\_1.png](/files/6R5UA5WJyvoMfz1GCylZ)

*Fig. 6 Task report with a description of the results of standardization*

The report includes the following information:

| **Results**          |                                                 |
| -------------------- | ----------------------------------------------- |
| Status               | File cleaning status                             |
| Error description    | Short description of the error                   |
| Input Records        | Number of records in the input file              |
| Processed records    | Number of records processed                      |
| Records to be billed | Number of records to bill                        |
| Records Skipped      | Number of records skipped                        |
| City level           | Number of records cleaned to the city level      |
| Street level         | Number of records cleaned to the street level    |
| Building level       | Number of records cleaned to the building level  |
| Apartment level      | Number of records cleaned to the apartment level |
| **Extracted data**   |                                                  |
| Correctly extracted first names   | Number of company name records from which the owner's first name has been extracted       |
| Correctly extracted surnames      | Number of company name records from which the owner's surname has been extracted          |
| Correctly extracted company names | Number of company name records in which the legal form of the company has been identified |
| Unextracted first names           | Number of records for which the first name could not be confirmed                         |
| Unextracted surnames              | Number of records for which the surname could not be confirmed                            |

## Retrieving dictionary data <a href="#bookmark10" id="bookmark10"></a>

The User can download data from the address database of buildings after selecting *Data dictionaries* from the main menu.

There are two ways to specify the scope of data to be downloaded:

1. Downloading for the entire area of a province, county or municipality
   * Select a province or the area of the entire country
   * After selecting a province, select a county or the area of the entire province
   * After selecting a county, select a municipality or the area of the entire county
2. Downloading by postal code
   * Enter the full postal code; the formats XX-XXX and XXXXX are accepted
   * Enter a code pattern: XX\* (e.g. 03\*); data is then downloaded for all postal codes starting with the digits XX
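
The accepted postal code inputs can be illustrated with a short sketch. This is not the application's own validation code: the function name `parse_postal_query` and its return values are hypothetical, and the sketch assumes that a pattern consists of exactly two digits followed by an asterisk, as in the example `03*`.

```python
import re

# Full code: five digits, optionally with a dash after the second digit.
FULL_CODE = re.compile(r"^(\d{2})-?(\d{3})$")
# Pattern: two digits followed by an asterisk (assumed from the "03*" example).
PREFIX_PATTERN = re.compile(r"^(\d{2})\*$")


def parse_postal_query(text):
    """Classify user input as an exact code, a prefix pattern, or invalid (None)."""
    value = text.strip()
    match = FULL_CODE.match(value)
    if match:
        # Normalize both accepted spellings to the XX-XXX form.
        return ("exact", f"{match.group(1)}-{match.group(2)}")
    match = PREFIX_PATTERN.match(value)
    if match:
        return ("prefix", match.group(1))
    return None
```

Under these assumptions, `03-123` and `03123` are treated as the same exact code, while `03*` selects every code that starts with `03`.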

After specifying the area, the User can calculate the price of downloading the dictionary data for the selected area by clicking the *Calculate fee* button.

![](/files/wqL84dnZPBaIUPO5wkOT)

*Fig. 7 Dictionary data to download*

To download dictionary data, click the *Download data* button. The app automatically switches to the *Tasks* screen. The task list shows an entry for the dictionary data along with its status. Once the data is available, you can download it by clicking the *Download* button.

![http://dataquality.pl/wp-content/uploads/2016/11/dane\_slownikowe\_2.png](/files/RAmotC1pV8CKbup15YNS)

*Fig. 8 Dictionary data ready for download on the task list*

**Note**: After you click *Download data* and before the actual data extraction starts, the system calculates the fee. If the fee is greater than the funds in the User's account, the following message is displayed: *The fee for dictionary data is greater than the amount of funds in the account. Top up your account*. Click the *Top up* button to top up your account so that you can download the selected dictionary data.

The resulting table is a CSV file that contains the following columns:

| id             | Building ID                                                                                                    |
| -------------- | -------------------------------------------------------------------------------------------------------------- |
| sym\_msc       | GUS TERYT symbol of the locality                                                                               |
| Location       | name of the locality                                                                                           |
| sym\_ul        | GUS TERYT symbol of the street                                                                                 |
| feature        | street type prefix according to GUS: ul., al., pl., etc.                                                       |
| nazwa\_1       | the main part of the GUS street name (e.g. the patron's surname)                                               |
| nazwa\_2       | the first, less important part of the street name, if it exists (e.g. the patron's first name)                 |
| nr\_calk       | integer part of the building number                                                                            |
| street         | street name built by Algolytics from the GUS street name fields, with repetitions and the *ul.* prefix removed |
| nr\_domu       | building number                                                                                                |
| x              | longitude                                                                                                      |
| y              | latitude                                                                                                       |
| wojewodztwo    | name of the voivodeship                                                                                        |
| county         | name of the county                                                                                             |
| municipality   | name of the municipality                                                                                       |
| Municipality06 | TERYT identifier of the municipality                                                                           |
| Area13         | statistical region identifier                                                                                  |
| Circuit14      | census tract identifier                                                                                        |
| Apartment      | number of apartments in the building                                                                           |
| Inhabited      | number of inhabited dwellings according to the Central Statistical Office (GUS)                                |
| Residents      | number of inhabitants according to the Central Statistical Office (GUS)                                        |
| Status         | geocoding status: exact, neighbouring building, census tract centre                                            |
| code           | postal code                                                                                                    |

Detailed descriptions of the columns can be found in Appendix 1.

## My account <a href="#bookmark11" id="bookmark11"></a>

The *My account* screen displays information about billing, API access, user data, and invoicing data.

![http://dataquality.pl/wp-content/uploads/2016/11/moje\_konto\_1.png](/files/nlpZyOF32w8kBZ9NthaO)

*Fig. 9 My Account screen view*

The screen is divided into 4 areas: Billing, JSON API Access, Customer data, Invoicing data.

The *Billing* area contains the following information:

| Current Credit/ Available Funds                             | for Users settling by the credit method: the sum of fees charged on credit / for Users settling by the PrePaid method: the currently available funds, topped up via the online payment system                           |
| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Fees charged                                                | the sum of all fees charged so far                                                                                                                                                                                       |
| Receivables – last month                                    | the sum of unpaid fees charged to the User for the previous billing period; only for Users settling by the credit method                                                                                                 |
| Processed records                                           | the sum of all records from all files loaded by the User                                                                                                                                                                 |
| Of which cleaned records                                    | the sum of all successfully cleaned records that generated fees                                                                                                                                                          |
| Last top-up                                                 | date of the last top-up for a User settling by the PrePaid method; empty for a User settling by the credit method                                                                                                        |
| Last task                                                   | date of the last User task performed by the system                                                                                                                                                                       |
| Payment of the amount due/ Top-up of the account with funds | the amount due and the *Pay* button / a field for entering the top-up amount and the *Top up account* button. The buttons take the User to the PayU system, where the online payment is made                             |

You can also use the dataquality.pl application through the API service. The *JSON API Access* area includes the following information:

* API access key – a 40-character key used in API calls to authorize and identify the User performing tasks
* Generate new key – generates a new API key; the previous key is deactivated and is no longer available in the system
* API documentation – a link to the dataquality.pl API documentation

The *Customer data* area contains the following information:

* Gender – Mr/Ms (*Pan/Pani*)
* Name
* Surname
* Company
* Phone number
* Email

The *Invoicing data* area contains the following information:

* Name and surname/Name of the entity
* TIN
* Street
* House number
* Apartment number
* Zip code
* City

All data in the two areas above (except for the e-mail address) can be changed and confirmed by clicking the *Save data* button.

## Changing your password <a href="#bookmark12" id="bookmark12"></a>

You can change your password by clicking the email address button in the top menu and then selecting *Change Password*.

![http://dataquality.pl/wp-content/uploads/2016/11/zmien\_haslo\_1.png](/files/WJAjAjuGPqtk9xwLtlH9)

*Fig. 10 Change Password Screen View*

To change your password, enter your current password, type your new password twice, and then click *Change Password*.

If the password has not been changed for 30 days, the system forces the User to change it. After logging in, the User is taken to the *Change Password* screen, where a new password must be entered and confirmed. The new password cannot be the same as any of the previous three passwords. Only after changing the password and logging in does the system redirect the User to the main screen as usual.
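
The two rules above (a forced change after 30 days and no reuse of the last three passwords) can be sketched as follows. This only illustrates the rules and is not the system's implementation: the function names are hypothetical, the exact 30-day boundary is an assumption, and plain strings are compared for brevity where a real system would compare password hashes.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=30)  # from the rule above; boundary handling is assumed
HISTORY_DEPTH = 3                      # the new password must differ from the previous three


def must_change_password(last_changed, today):
    """True when the password is at least 30 days old."""
    return today - last_changed >= MAX_PASSWORD_AGE


def is_acceptable_new_password(candidate, previous):
    """`previous` lists earlier passwords, oldest first; only the last three are checked."""
    return candidate not in previous[-HISTORY_DEPTH:]
```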

## Appendix 1. Standard columns in the output file <a href="#bookmark13" id="bookmark13"></a>

Below are the column names and their definitions; the parentheses give the type and, for text variables, the suggested maximum number of characters (set with a margin where it may be useful). The list covers the columns of the output file of the Address Standardization process, both those included by default and those added by selecting the appropriate enrichment option, that are available to all Algolytics DataQuality users.

* ***out\_id*** (integer) *–* the number of the record in the set;
* ***out\_sym\_msc\_pod*** (text, 7) *–* GUS TERYT symbol of the basic locality;
* ***out\_miejscowosc*** (text, 75) *–* the name of the (basic) locality;
* ***out\_sym\_cz\_msc*** (text, 7) *–* GUS TERYT symbol of the part of the locality; if no part of the locality distinct from the basic locality has been distinguished, this field contains an empty text;
* ***out\_czesc\_miejsc*** (text, 75) *–* the name of the part of the locality distinguished by GUS, with the same reservation as above;
* ***out\_sym\_ul*** (text, 5) *–* GUS TERYT symbol of the street name;
* ***out\_ulica*** (text, 150) *–* street name built by Algolytics from the GUS street name fields, with repetitions and the *ul.* prefix removed;
* ***out\_ulica\_cecha*** (text, 15) *–* street type prefix according to GUS: *ul.*, *al.*, *pl.*, etc.;
* ***out\_ulica\_nazwa\_1*** (text, 150) *–* the main part of the GUS street name (e.g. the patron's surname);
* ***out\_ulica\_nazwa\_2*** (text, 100) *–* the first, less important part of the street name, if it exists (e.g. the patron's first name);
* ***out\_kod*** (text, 6) *–* postal code in the format: two digits, dash, three digits;
* ***out\_nr\_domu*** (text, 10) *–* house or building number;
* ***out\_nr\_miesz*** (text, 15) *–* apartment number;
* ***out\_adr\_id*** (text, 35) *–* an address identifier, invariant over time;
* ***out\_status*** (text, 500) *–* a sequence of entries in the form *\<operation\_or\_information: result>* or *\<information>*, for example: *'\<match: building>\<geocoding: matched building>'*; see Appendix 2 for more about address standardization statuses;
* ***out\_gmina*** (text, 50) *–* the name of the municipality;
* ***out\_powiat*** (text, 50) *–* the name of the county;
* ***out\_wojewodztwo*** (text, 50) *–* name of the voivodeship;
* ***out\_gmina06*** (text, 6) *–* a six-character municipality code consisting of the codes of the voivodeship (positions 1-2), the county within the voivodeship (positions 3-4) and the municipality within the county (positions 5-6);
* ***out\_rodz\_gmi*** (text, 1) and ***out\_rodzaj\_gminy*** (text, 50) *–* numerical and verbal designation of the municipality (gmina) type, respectively: *1 – urban gmina, 2 – rural gmina, 4 – urban part of urban-rural gmina, 5 – rural part of urban-rural gmina*;
* ***out\_rejon13*** (text, 13) *–* GUS TERYT code of the municipality together with its type (positions 1-7) and the statistical region (positions 8-13);
* ***out\_obwod14*** (text, 14) *–* GUS TERYT code of the municipality together with its type (positions 1-7), the statistical region (positions 8-13) and the census tract within this region (position 14);
* ***out\_wsp\_x*** (floating point number) *–* east longitude;
* ***out\_wsp\_y*** (floating point number) *–* north latitude;
* ***out\_mieszkania*** (integer), ***out\_zamieszkane*** (integer), ***out\_mieszkancy*** (integer) *–* these fields represent, respectively: the number of dwellings, the number of inhabited dwellings and the number of inhabitants of the building according to the Central Statistical Office (GUS); these and other fields concerning the building are filled in only if the input address has been successfully matched to the building level;
* ***out\_osoby\_prawne*** (integer) *–* the number of legal persons and organizational units entered in the REGON register, which have their registered office in a given building;
* ***out\_popul\_*** (integers) *–* a group of columns with data on the population of buildings based on the PESEL register made available by the Ministry of Digital Affairs: *miesz –* inhabited dwellings, *os –* people, *kob –* women, *mez –* men, *25\_29k –* women aged 25-29, *30\_34m –* men aged 30-34, etc.
* ***out\_buildings\_a\_*** (integer) – a group of columns with the number of different types of buildings at a distance of 1 km from the building/address, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.
* ***out\_buildings\_a\_area\_*** (floating point number) – a group of columns with the area of buildings of various types at a distance of 1 km from the building/address, examples of building types: residential, service, public, supermarket, swimming pool, school, university, hotel, etc.
* ***out\_landuse\_a\_*** (integer) – a group of columns with the number of different types of land cover at a distance of 1 km from the building/address, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.
* ***out\_landuse\_a\_area\_*** (floating point number) – a group of columns with the area of various types of land cover at a distance of 1 km from the building/address, examples of land cover: forest, natural areas, industrial areas, residential areas, commercial and service areas, parks, parking lots, etc.
* ***out\_natural\_a\_*** (integer) – a group of columns with the number of different types of objects of natural origin, e.g. beach, cliff, etc.
* ***out\_natural\_a\_area\_*** (floating point number) – a group of columns with the area of various types of objects of natural origin, e.g. a beach, a cliff, etc.
* ***out\_pois\_, out\_pois\_a\_*** (integer) – groups of columns with the number of POIs (points of interest) at a distance of 1 km from the building/address, examples of POIs: shop, park, school, kindergarten, café, bakery, cinema, theatre, pharmacy, gas station, supermarket, hairdresser, etc.
* ***out\_railways\_*** (integer) – a group of columns with the number of rail objects at a distance of 1 km from the building/address, examples of types: railway, tram, metro.
* ***out\_railways\_length\_*** (floating point number) – a group of columns with the length of rail objects at a distance of 1 km from the building/address, examples of types: railway, tram, metro.
* ***out\_roads\_*** (integer) – a group of columns with the number of roads of different types at a distance of 1 km from the building/address, examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.
* ***out\_roads\_length\_*** (floating point number) – a group of columns with the length of roads of various types within 1 km of the building/address, examples of road types: motorway, expressway, main road, secondary road, local road, housing estate road, transport hub, bicycle path, pavement, etc.
* ***out\_traffic\_, out\_transport\_, out\_traffic\_a\_, out\_transport\_a\_*** (integer) – groups of columns with the number of transport-related objects at a distance of 1 km from the building/address, examples of types of objects: tram stop, bus stop, railway station, metro station, taxi rank, bus station, parking lot, etc.
* ***out\_traffic\_a\_area\_, out\_transport\_a\_area\_*** (floating point number) – groups of columns with the area of transport-related objects at a distance of 1 km from the building/address, examples of types of objects: tram stop, bus stop, railway station, metro station, taxi stand, bus station, parking lot, etc.
* ***out\_water\_a\_area\_*** (floating point number) – a group of columns with the area of surface water of various types at a distance of 1 km from a building/address, examples of types of objects: lake, sea, river, swamp, etc.
* ***out\_waterways\_length\_*** (floating point number) – a group of columns with the length of surface water of various types at a distance of 1 km from a building/address, examples of types of objects: lake, sea, river, swamp, etc.
* ***out\_geoscore\_*** (floating point number) – a group of columns related to financial risk: default level and fraud level for individuals, sole proprietorships (JDG) and companies
* ***out\_average\_income*** (floating point number) – an estimated income index of people living in the micro-market
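
The positional layout of ***out\_obwod14*** described above can be unpacked with simple string slicing. The sketch below is illustrative only: the `Obwod14` type and the `split_obwod14` function are hypothetical names, and placing the municipality type at position 7 is an assumption derived from the six-character ***out\_gmina06*** code plus the one-character ***out\_rodz\_gmi*** field.

```python
from typing import NamedTuple


class Obwod14(NamedTuple):
    voivodeship: str         # positions 1-2
    county: str              # positions 3-4
    municipality: str        # positions 5-6
    municipality_type: str   # position 7 (assumed, see the note above)
    statistical_region: str  # positions 8-13
    census_tract: str        # position 14


def split_obwod14(code):
    """Split a 14-character out_obwod14 value into its positional parts."""
    if len(code) != 14:
        raise ValueError(f"expected 14 characters, got {len(code)}")
    return Obwod14(code[0:2], code[2:4], code[4:6], code[6], code[7:13], code[13])
```

The first 13 characters of the same value correspond to ***out\_rejon13***, and the first six to ***out\_gmina06***.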

## Appendix 2. Statuses of the results of the standardisation of address data and geocoding <a href="#bookmark14" id="bookmark14"></a>

The ***status*** field of the result table contains a sequence of combined entries in the form *\<category: result>* or *\<category>*; for the standardization of address data, the categories (***bold italics***, first level of the list) and results (*italics*, second level of the list) include the following items:

* ***ambiguous assignment*** – the entry matches several candidates to a very similar extent; possible results:
  * *one of the similarly matched streets in one town was selected –* the data of one of the streets is assigned;
  * *one of the similarly matched localities in one municipality has been selected –* the data of one of the localities is assigned;
  * *similarly matched localities in different municipalities –* the minimum accuracy of the assignment has not been achieved, so no data is assigned to the record.
* ***change of place name*** – the name of the locality has changed since it was entered into the processed set; the output contains the new name of the locality.
* ***street name change*** – the street name has changed since it was entered into the processed set; the output contains the new street name.
* ***match*** – possible results:
  * *apartment* – an apartment with the given address has been identified;
  * *building with no dwellings* – the building with the given address has been identified; this building has no dwellings, so the maximum possible match has been achieved;
  * *building in which there are dwellings* – the building with the given address has been identified; this building does have dwellings, but the apartment number has not been recognized;
  * *building with the same integer number* – no building with the given address has been identified, but in the given locality and on the given street there is a building with the same integer number, e.g. the processed set contains the number *18b*, which is not in the dictionary, but there is a building with the number *18*. Note: the service returns the building number provided in the query. The table below shows example DQ responses for this status:

| **Existing buildings in the dictionary** | **Request to DQ** | **DQ response** |
| ---------------------------------------- | ----------------- | ---------- |
| 1, 1A, 1B                                | 1C                | 1C         |
| 1A, 1B                                   | 1C                | 1C         |
| 1A, 1B, 1C, 1D, 1E, 1F, 1G               | 1Z                | 1Z         |
| 2B, 2D, 2E, 2F/1, 2K                     | 2                 | 2          |

  * *neighbouring building* – no building with the given address has been identified, but in the given locality and on the given street there is a building whose integer number differs by no more than 4 from the integer part of the building number in the processed data;
  * *street* – the street has been identified; no building with the given or a neighbouring number has been found;
  * *a town that has no streets* – the locality has been identified and has no streets; no building with the given or a neighbouring number has been found;
  * *locality* – the locality has been identified; no street or building assigned directly to the locality with the given or a neighbouring number has been found (some villages have mixed addressing, where some addresses are assigned at the locality level and others at the street level);
  * *none* – no match was obtained; the locality was not identified.
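
The building-level rules above (same integer number, neighbouring number within 4) can be illustrated with a simplified sketch. It is not the matching algorithm used by the service: the function names are hypothetical, "integer part" is taken to mean the leading digits of the number, and the real statuses additionally distinguish buildings with and without dwellings.

```python
import re
from typing import Iterable, Optional


def integer_part(number: str) -> Optional[int]:
    """Leading digits of a building number such as '18b' or '2F/1', or None."""
    found = re.match(r"\d+", number.strip())
    return int(found.group()) if found else None


def match_level(requested: str, known: Iterable[str], max_gap: int = 4) -> str:
    """Classify a requested number against the numbers known on one street."""
    known = list(known)
    if requested in known:
        return "building"
    wanted = integer_part(requested)
    parts = [p for p in map(integer_part, known) if p is not None]
    if wanted is None or not parts:
        return "street"
    if wanted in parts:
        return "building with the same integer number"
    if any(abs(wanted - p) <= max_gap for p in parts):
        return "neighbouring building"
    return "street"
```

With the dictionary entries `1A, 1B` and the request `1C`, the sketch returns *building with the same integer number*, which agrees with the second row of the table above.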

In addition, there are variants of the above results for cases in which the entry matches a given candidate (locality or street) more closely than in the case of *none* but less closely than for the other results. These results indicate that the entry matches the candidate with high probability, but with a moderate risk of error:

* *an apartment on a probable street;*
* *an apartment in a probable locality;*
* *a building in which there are no apartments, on a probable street;*
* *a building in which there are no apartments, in a probable locality;*
* *a building in which there are apartments, on a probable street;*
* *a building in which there are apartments, in a probable locality;*
* *probable street;*
* *probable locality that has no streets;*
* *probable locality.*
* ***Geocoding*** – possible results:
  * *matched building* – the coordinates of the building defined by the match status (building with a given address, neighboring building, etc.) have been assigned;
  * *neighbouring building* – coordinates of a building in the given locality and on the given street (if a street appears in the address) with a different number, whose integer part differs from the integer part of the given building's number by no more than 4, have been assigned;
  * *center of the street within the postal code (\<category of the number of buildings in the group>)* – in the case of a street for which there is more than one postal code and the postal code is known, the coordinates of the center of the group of buildings located on a specific street and having a given code are assigned; in brackets the category of the number of buildings constituting such a defined group is given:
    * *less than 20 buildings*;
    * *20-49 buildings*;
    * *50 and more buildings;*
  * *the centre of a street in a part of a locality (\<the category of the number of buildings in a group>)* – for localities that have separate parts, when a given street lies in more than one of them and it was possible to determine which part is meant, the coordinates of the centre of the group of buildings located on the given street and in the specific part of the locality are assigned (e.g. the centre of *Puławska* Street in the *Mokotów* part); the category of the number of buildings in this group, as described above, is given in brackets;
  * *center of the street (\<category of the number of buildings in the group>)* – the coordinates of the center of the group of buildings located on a given street are assigned; the category of the number of buildings constituting such a defined group, as described above, is given in brackets;
  * *center of the district* – coordinates of the center of the census tract of the Central Statistical Office have been assigned – the territory of Poland is divided into over 180 thousand such districts based on the distribution of population, a typical district is inhabited by about 200 people;
  * *center of the region* – coordinates of the center of the statistical region of the Central Statistical Office have been assigned; the territory of Poland is divided into more than 30 thousand such regions based on the distribution of population;
  * *Town center* – coordinates of the center of the locality have been assigned.
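
A status value made up of such entries can be split into categories and results with a few lines of code. The sketch below assumes that entries look exactly like the example *\<match: building>\<geocoding: matched building>* and that the category and result texts themselves contain no angle brackets; `parse_status` is a hypothetical helper, not part of the service.

```python
import re

# One entry is the text between a pair of angle brackets.
ENTRY = re.compile(r"<([^<>]+)>")


def parse_status(status):
    """Map each category to its result; entries without a result map to None."""
    parsed = {}
    for entry in ENTRY.findall(status):
        category, separator, result = entry.partition(":")
        parsed[category.strip()] = result.strip() if separator else None
    return parsed
```

For the example above this yields `{"match": "building", "geocoding": "matched building"}`; an entry such as `<change of place name>` is stored with the value `None`.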

