# Event Engine \[administrator]

**Event engine – functional specification of the solution**

### Purpose of the solution

* Efficient real-time event stream processing and on-line scoring based on analytical models using information from these event streams
* The solution is scalable and configurable, so it can be used in various industries (including gaming, recommendations, Web analytics, and IoT, e.g. processing of event streams from device sensors)

### Scope of the solution

It is a complete system for processing event streams on-line (event aggregation and real-time scoring) and off-line (event aggregation to automatically build analytical models). The scope of the solution includes:

* Efficient handling of event streams from multiple clients at the same time
* Writing detailed (individual) events to a repository for off-line aggregation and creating analytic tables for modelling
* Storing the state of off-line users
* Aggregate counting – a module used for counting aggregates in off-line (building models) and on-line (on-line scoring), connected to the processing path in the selected environment
* Automatic creation of analytical models (using ABM)
* Automatic deployment of new models for use for on-line scoring (via metadata)
* On-line scoring – checking the conditions that trigger scoring with individual models, triggering scoring and returning a response to the customer

### Solution assumptions

The client application sends messages (events) as JSON documents over an HTTP connection (REST API). Events are fed into the event engine via a Kafka queue. Each event is written to the event repository to enable off-line processing.

Then:

In the on-line version:

* The event is converted into variables (definition in metadata)
* The user's aggregate values using these variables are refreshed
* The scoring conditions for each model are checked (conditions that trigger scoring and conditions that check whether a given user should be scored with a given model)
* For each model whose conditions are met, a row of data is prepared for scoring (based on the model description in the metadata)
* Scoring is triggered
* The response is returned to the client

In the off-line version (triggered every set period of time, automated process):

* For each customer and model, aggregates are counted based on messages stored in a text file
* For each user, 1 row can be created in the resulting analytic table, containing the computed aggregates and the value of the target variable. For some users, the row will not be created, because:

  * The scoring condition will not be met
  * The conditions for calculating the target window will not be met (e.g. the target window will be exactly 3 days, and there are only 2 days of history in the data)

  **Note**: in the programmatic case, multiple rows can be created for each user, because an input JSON (bid request) can actually contain several bid requests for different impressions. In that case, as many rows are created for the user as there are impression ids.
* A separate analytical table is created for each model
* The analytical table is the input for [ABM processes](#user-content-fn-1)[^1] that build the models
* Selected models are automatically deployed (scoring code and information about the variables used are saved to metadata)

Scheme of operation of the system

<figure><img src="/files/PL0PUl0qS3VAW6SqYrbb" alt=""><figcaption></figcaption></figure>

### Messages

1. The client application sends messages (events) as JSON documents over an HTTP connection
   * Messages are sent over the HTTP connection in packets (a packet may contain a single message)
   * Data encryption: SSL (can be disabled by setting it in the http server configuration file)
2. Messages can come from multiple sources (e.g., game servers, users)
   * Differentiation by client\_id
   * The application is configurable for specific customers by defining dedicated metadata (variables, aggregates, models)
3. The order in which messages are processed, based on the time the event arrives in the system, is maintained
4. Event json format
   * Generic formats, for which processing is optimized

{% code overflow="wrap" %}

```json
{"arrival_ts":1232344,
 "client_id":1,
 "user_id":5,
 "event_id":23,
 "eventType1":
  {"eventA":{"value":10, "name":"AAA"},
  "value":"value1"
  }
}
```

{% endcode %}

* Variable example: `$.['eventType1'].['eventA'].['value']` returns `10`
* Variable example: `$.['eventType1'].['value']` returns "`value1`"
* In addition, queries that allow the insertion of conditions that check for equality "`==`" and are combined with "`&&`" are optimized:\
  `$.['eventType1'].['eventA'].[?(@.['value'] == 10 && @.['name'] == "AAA")].value` returns `[10]`\
  or:\
  `$.['eventType1'].[?(@.['eventA'].['value'] == 10 && @.['eventA'].['name'] == "AAA")].value` returns `["value1"]`\
  Filters using "`==`" and "`&&`" can appear at different levels of the query
* Lists of values of the "category" type, e.g. \["A1", "A2", "A3"], are also optimized. This covers the case where a category variable that is a list is extracted in order to build aggregates that count the number of events with a given value in the list, e.g. A1\_cnt\_all, A2\_cnt\_all, A3\_cnt\_all

```json
{"arrival_ts":1232344,
 "client_id":1,
 "user_id":5,
 "event_id":23,
 "category":["A1", "A2", "A3"]
}
```
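
The per-category counting described above can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not part of the Event Engine code; only the aggregate naming pattern (`A1_cnt_all`) follows the text.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the Event Engine implementation:
// counts how many events contained each category value.
final class CategoryCounter {
    private final Map<String, Long> counts = new LinkedHashMap<>();

    // Registers one event whose "category" field holds the given list.
    void accept(List<String> categories) {
        for (String category : categories) {
            // Aggregate name follows the pattern from the text, e.g. A1_cnt_all
            counts.merge(category + "_cnt_all", 1L, Long::sum);
        }
    }

    // Returns the current count, or 0 if the category was never seen.
    long get(String aggregateName) {
        return counts.getOrDefault(aggregateName, 0L);
    }
}
```

Each incoming event only increments the counters of the categories it contains, so the aggregates can be maintained incrementally.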

* Arbitrary formats, compliant with JSONPath (slower processing) – the analyst adds rules for converting them to variables to the metadata (variables table, point [#definition-of-target](#definition-of-target "mention")):

```json
{"arrival_ts":1232344,
 "client_id":1,
 "user_id":5,
 "event_id":23,
 "name":"eventType1",
 "type":
  [
    {"event":"eventA",
     "value":10
    },
    {"event":"eventB",
     "value":45
    }
  ]
}
```

* Example using a regular expression:

`$.[? (@.name =~ /.*eve.*/i)].['type'].[1].['value']` returns `[45]`

5. Response to the customer's message:

* Score is returned directly in the query response
* The results include the following fields:
  * userId – The user's ID
  * scores – a list of models and scores; empty if no scoring has occurred
  * modelId – the model's ID
  * score – the value of the score for the model

```json
{"userId":5,
 "scores":
  [
    {"modelId":1, "score":0.16596808075912742},
    {"modelId":2, "score":0.56665555575912789},
    {"modelId":3, "score":0.78954308075912573}
  ]
}
```

6. Response to the client message in the case of programmatic:

* Score is returned directly in response to the bid request
* The results include a list of suggested bidding prices for each impression:
  * impid – impression id
  * scores – a list of models and scores. The scores object is empty if no scoring has occurred or there is no active deployed model. Otherwise, it contains elements where the key is the model id and the value is the suggested bidding price

```json
[
    {
        "impid": "424c6a8db16d4fa8ab853c5cd7b04ac7",
        "scores": {
            "1314": 0.6930373068263596,
            "1315": 5.24522720421126,
            "1316": 44.87740531791001
            }
        },
    {
        "impid": "e5d26e91ac2441f997e3b1bee6168562",
        "scores": {
            "1314": 0.6930373068263596,
            "1315": 5.24522720421126,
            "1316": 44.87740531791001
            }
        }
]
```

* The price is calculated according to the formula:\
  `min(10 * value, score * value * weight)`
  * score – the score returned by the model
  * value – the value of the variable from the models table (CPC)
  * weight – weight from the models table (by default 1000, because the prices are bid in the CPM rate)
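
As a worked illustration of the formula above, here is a minimal sketch. The class and method names are hypothetical; only the formula itself comes from this document.

```java
// Hypothetical helper illustrating the price formula from the text.
final class BidPrice {
    // Suggested price = min(10 * value, score * value * weight).
    // The first term caps the price at ten times the model's value (CPC).
    static double suggestedPrice(double score, double value, double weight) {
        return Math.min(10 * value, score * value * weight);
    }
}
```

For example, with `value = 2.0` and `weight = 1000`, a score of `0.5` would give `1000.0` before capping, so the cap `10 * value = 20.0` applies.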

### Aggregate counting module

1. Counting aggregates in the on-line and off-line versions (in both versions, the aggregates are counted with the same code)
2. Off-line version
   * Processing is parallelized by user id
   * Aggregates are computed from the detailed events written to the event repository
   * The launch of the off-line version can be scheduled (offline scheduling, point [#scheduling-offline-processing](#scheduling-offline-processing "mention"))
3. On-line version
   * Processing is parallelized by user id
   * On-line aggregates are computed and stored in memory (for on-line users)
   * After a set period of user inactivity defined in the configuration file (no messages about a given user), the aggregates are saved in the MongoDB database (the user logs out)
4. Types of aggregates
   * Incremental, computed over the whole data
   * Computed in a window (time-based or with a specific number of messages)
   * Sliding windows (time-based or with a specific number of messages)
   * Target windows – used for target calculation, only in the off-line version
   * A single message can belong to multiple windows
   * The list of counted aggregates is defined in the metadata
5. Aggregate List
   * Number of occurrences, sum, last value, flag if the event occurred, min, max, current value (from the currently processed json)
   * Derived aggregates defined in the form of expressions in Java (e.g. aggr1 + aggr2)
   * Aggregates resulting from defined dictionaries (described in the section on dictionaries)
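
The basic aggregates listed above can all be maintained incrementally, one message at a time. The sketch below is only an illustration under that assumption; the class is hypothetical and does not reflect the actual module, and `exists` is modelled as a flag as described in the list.

```java
// Hypothetical sketch of incrementally maintained aggregates for one variable.
final class RunningAggregates {
    private long count;
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double last = Double.NaN; // NaN until the first value arrives

    // Updates every aggregate with the value extracted from one message.
    void update(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
        last = value;
    }

    long count() { return count; }
    double sum() { return sum; }
    double min() { return min; }
    double max() { return max; }
    double lastValue() { return last; }
    boolean exists() { return count > 0; }

    // A derived aggregate in the spirit of "aggr1 + aggr2": here, the mean.
    double mean() { return count == 0 ? Double.NaN : sum / count; }
}
```

Because each update touches only a fixed amount of state, the same code path can serve both on-line and off-line processing.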

### How messages are stored

1. All messages are written to the message repository. If necessary (e.g. customer requirement), a backup can be created
2. Writing messages to the repository does not block further processing
3. Repository for storing messages: txt files

### Structure and storage of metadata

1. Metadata is created and stored in the [GDBase database](#user-content-fn-2)[^2]
2. Creating tables:

```sql
CREATE TABLE clients(id INTEGER PRIMARY KEY /*, ...*/);
CREATE TABLE external_data(id INTEGER PRIMARY KEY, client_id INTEGER /*, ...*/);
CREATE TABLE variables( id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, event_id INTEGER NOT NULL CHECK(event_id >= -1), definition TEXT NOT NULL, definition_type TEXT NOT NULL CHECK(definition_type = 'JSON_PATH' OR definition_type = 'TRANSFORMED'), input_id INTEGER CHECK((input_id IS NULL AND definition_type != 'TRANSFORMED') OR (input_id IS NOT NULL AND definition_type = 'TRANSFORMED')), category TEXT CHECK(category IS NULL OR definition_type = 'JSON_PATH'), type INTEGER NOT NULL CHECK(type = 1 OR type = 2), default_value TEXT);
CREATE INDEX ON variables(client_id);
CREATE INDEX ON variables(input_id);
CREATE TABLE aggregates( id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, variable_id INTEGER, aggregate_type INTEGER, window_type INTEGER, window_size/*count|time*/ INTEGER, window_shift/*count|time*/ INTEGER, definition TEXT, return_type INTEGER CHECK((definition IS NULL AND return_type IS NULL) OR (definition IS NOT NULL AND return_type IS NOT NULL AND (return_type = 1 OR return_type = 2))), dictionary_id INTEGER, external_data_id INTEGER, external_data_name TEXT, name TEXT NOT NULL CHECK(name REGEXP '[a-zA-Z_$][a-zA-Z_$0-9]*'));
CREATE UNIQUE INDEX ON aggregates(client_id, name);
CREATE TABLE derived_aggregates( derived_aggregate_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL
);
CREATE INDEX ON derived_aggregates(derived_aggregate_id);
CREATE TABLE trigger_aggregates( trigger_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL
);
CREATE INDEX ON trigger_aggregates(trigger_id, aggregate_id);
CREATE TABLE triggers( id INTEGER PRIMARY KEY, definition TEXT NOT NULL, needs_change INTEGER/*boolean*/ NOT NULL CHECK(needs_change = 0 OR needs_change = 1), `group` INTEGER NOT NULL
);
CREATE TABLE model_triggers( model_id INTEGER NOT NULL, trigger_id INTEGER NOT NULL
);
CREATE INDEX ON model_triggers(model_id, trigger_id);
CREATE TABLE model_aggregates( model_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL, used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR used = 1)
);
CREATE INDEX ON model_aggregates(model_id, aggregate_id);
CREATE TABLE trigger_validator_aggregates( model_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL
);
CREATE INDEX ON trigger_validator_aggregates(model_id, aggregate_id);
CREATE TABLE models( id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, active INTEGER/*boolean*/ NOT NULL CHECK(active = 0 OR active = 1), used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR (active = 1 AND used = 1)), target_aggregate_id INTEGER NOT NULL, target_type TEXT NOT NULL CHECK((target_type = 'GENERAL' AND trigger_validator IS NULL) OR ((target_type = 'CLICK' OR target_type = 'CLICK_AD') AND target_start = 0 AND target_length = 0 AND target_length_exact = 0)), target_start INTEGER NOT NULL CHECK(target_start >= 0), target_length INTEGER NOT NULL CHECK(target_length >= 0), target_length_exact INTEGER/*boolean*/ NOT NULL CHECK(target_length_exact = 0 OR target_length_exact = 1), trigger_validator TEXT, scoring_code TEXT CHECK(NOT used OR scoring_code IS NOT NULL), saved_state_time INTEGER CHECK(NOT used OR saved_state_time IS NOT NULL), first_model_time INTEGER CHECK(NOT used OR first_model_time IS NOT NULL), saved_model_time INTEGER CHECK(NOT used OR saved_model_time IS NOT NULL), value FLOAT NOT NULL CHECK(value > 0), client_value FLOAT, positive_target_cnt INTEGER, weight FLOAT NOT NULL CHECK(weight > 0 AND weight <= 1000), positive_target_ratio FLOAT NOT NULL CHECK(positive_target_ratio >= 0 AND positive_target_ratio <= 1), category_id INTEGER CHECK(NOT use_category OR category_id IS NOT NULL), use_category INTEGER/*boolean*/ NOT NULL CHECK(use_category = 0 OR use_category = 1), end_date INTEGER
);
CREATE INDEX ON models(client_id);
/* When changing CHECK constraints in the models table, remember to make the same changes to the corresponding columns in the default_model table and vice versa (the trigger_validator column in models corresponds to the trigger_validator_definition column in default_model) */
CREATE TABLE default_model( id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, target_definition TEXT NOT NULL, target_aggregates TEXT NOT NULL, trigger_definition TEXT NOT NULL, trigger_aggregates TEXT NOT NULL, trigger_validator_definition TEXT NOT NULL, trigger_validator_aggregates TEXT NOT NULL, target_type TEXT NOT NULL CHECK((target_type = 'GENERAL' AND trigger_validator_definition IS NULL) OR ((target_type = 'CLICK' OR target_type = 'CLICK_AD') AND target_start = 0 AND target_length = 0 AND target_length_exact = 0)), target_start INTEGER NOT NULL CHECK(target_start >= 0), target_length INTEGER NOT NULL CHECK(target_length >= 0), target_length_exact INTEGER/*boolean*/ NOT NULL CHECK(target_length_exact = 0 OR target_length_exact = 1), target_aggregate_name TEXT NOT NULL, weight FLOAT NOT NULL CHECK(weight > 0 AND weight <= 1000));
CREATE TABLE dictionary(
id INTEGER, categorical INTEGER/*boolean*/ NOT NULL CHECK(categorical = 0 OR categorical = 1), value TEXT CHECK(categorical OR value IS NULL), start FLOAT CHECK(NOT categorical OR start IS NULL), start_inclusive FLOAT CHECK((start IS NULL AND start_inclusive IS NULL) OR (start IS NOT NULL AND start_inclusive IS NOT NULL AND (start_inclusive = 0 OR start_inclusive = 1))), end FLOAT CHECK((NOT categorical OR end IS NULL) AND (start IS NULL OR end IS NULL OR end >= start)), end_inclusive FLOAT CHECK((end IS NULL AND end_inclusive IS NULL) OR (end IS NOT NULL AND end_inclusive IS NOT NULL AND (end_inclusive = 0 OR end_inclusive = 1))), mapped_value TEXT NOT NULL
);
CREATE INDEX ON dictionary(id);
CREATE TABLE model_urls( model_id INTEGER NOT NULL, client_id INTEGER NOT NULL, url TEXT NOT NULL, included INTEGER/*boolean*/ NOT NULL CHECK(included = 0 OR included = 1));
CREATE INDEX ON model_urls(model_id, client_id);
```

3. Tables structure:

**`clients`**`:` (not used yet)

* `id INTEGER PRIMARY KEY`
* authorization data
* other customer data, e.g. payments, access restrictions

**`external_data`**`:` (not used yet)

* `id INTEGER PRIMARY KEY`
* `client_id INTEGER`
* Data pointing to an external data source

**`variables`**`:` (stores definitions of variables obtained from json)

* `id INTEGER PRIMARY KEY`
* `event_id INTEGER` - corresponds to the eventId field passed in json
* `definition_type` – definition type: JSON\_PATH or TRANSFORMED
* `definition TEXT` - definition of how an event is converted to a variable
  * if definition\_type = JSON\_PATH then definition contains a JsonPath expression. It should return a numeric or text value. The syntax for expressions is described in the JsonPath\_README.md file. For performance reasons, it's best to use only expressions like $.x.y.z. In addition, filters of the form \[?( @.a == 'x' && @.b == 3 && @.c == @.d && ...)] are optimized
  * if definition\_type = TRANSFORMED then definition contains the full definition of the Java class, which must inherit from DoubleTransformation or StringTransformation (there should be no import of this class). There should be no specified package in the class being defined
* `input_id` - for a TRANSFORMED variable, the id of the input variable passed to the class defined in definition. This id must refer to a JSON\_PATH variable
* `category` - if non-null then definition should return a list and a variable will be created if category is present in this list
* `type` (numerical, categorical) - variable type defined in DBConstants (VARIABLE\_...)
* `default_value` – the value of the variable used if definition returns null

**`aggregates`**`:` (stores the aggregate definitions used by models and triggers)

* `id INTEGER PRIMARY KEY`
* `variable_id INTEGER` - points to the variables table. Null if the aggregate is not produced from the variables table
* `aggregate_type INTEGER` - type of aggregate defined in DBConstants (AGGREGATE\_...). The list of possible types is hard-coded in Java (a dictionary table could be added)
* `window_type INTEGER` - a type of window defined in DBConstants (WINDOW\_...). All aggregates used by the target must have the window type WINDOW\_TARGET
* `window_size /*count|time*/ INTEGER` - window size as number of event occurrences or time in ms
* `window_shift /*count|time*/ INTEGER` - for windows with offset - number of event occurrences or time in ms
* `definition TEXT` - a java expression that defines an aggregate based on the values of other aggregates. The derived\_aggregates table must list all aggregates used in the definition. Null if the aggregate is not a derived aggregate
* `return_type` - type of aggregate defined by definition (DBConstants.VARIABLE\_...)
* `dictionary_id` - if null then the aggregate does not use the dictionary. If non-null then points to rows from the dictionary table
* `external_data_id INTEGER` - not used yet
* `external_data_name TEXT` - not used yet - name of the variable in the external data source
* `name TEXT` - the name of the variable. This name is used in java expressions, including the definition in this table (i.e., it should not be a java keyword), unique to client\_id

**`derived_aggregates`**`:` (defines arguments for derived variables)

* `derived_aggregate_id INTEGER` – identifier of the aggregate from the aggregates table for which arguments are defined
* `aggregate_id INTEGER` - aggregate id from the aggregates table being an argument (there can be many for a given derived\_aggregate\_id)

**`triggers`**`:` (define the moment of scoring, which is also the moment at which a training row with the target is created; this table also holds definitions of groups of users scored with the same model)

* `id INTEGER PRIMARY KEY`
* `definition TEXT` - definition of a given trigger, i.e. a Java expression that uses input aggregates to calculate the trigger value. The expression returns true or false. It can also define the user's membership in the model. The aggregates used must be listed in the trigger\_aggregates table
* `needs_change INTEGER/*boolean*/` - scoring will occur only if the definition expression returned false on the previous trigger run
* `group INTEGER` - the group to which the trigger belongs. If the model defines triggers from several groups and there are several triggers in each group, then scoring will occur if at least 1 trigger for each group returns that scoring should be performed
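
The group rule described above (at least one positive trigger in every group) can be sketched as follows. This is only an illustration: the class name, the array-based signature, and the choice to return false when a model has no triggers are assumptions, not the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of combining trigger results across groups.
final class TriggerGroups {
    // groups[i] is the group of the i-th trigger, results[i] its evaluated value.
    static boolean shouldScore(int[] groups, boolean[] results) {
        if (groups.length != results.length) {
            throw new IllegalArgumentException("groups and results must have equal length");
        }
        // OR within a group: a group fires if any of its triggers fired.
        Map<Integer, Boolean> firedByGroup = new HashMap<>();
        for (int i = 0; i < groups.length; i++) {
            firedByGroup.merge(groups[i], results[i], Boolean::logicalOr);
        }
        // AND across groups: every group must have fired.
        // Assumption: with no triggers at all, no scoring takes place.
        return !firedByGroup.isEmpty() && !firedByGroup.containsValue(false);
    }
}
```

In other words, triggers are combined with OR inside a group and with AND between groups.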

**`models`**`:`

* `id INTEGER PRIMARY KEY`
* `active` - if false then the model is inactive, it cannot be used for modeling or scoring. If a new model is added, but there is no scoring code yet, then active should be true
* `used` - if false, the model is not used for scoring (it does not affect the calculation of the table to be modeled)
* `target_aggregate_id` - indicates the target. It is used in the construction of the model
* `target_type:`
  * if the value is GENERAL then the target\_start, target\_length, target\_length\_exact described below will be taken into account
  * if the value is CLICK then:
    * the trigger\_validator described below is taken into account
    * When starting offline, it is necessary to enter the positive and negative target values
    * A row is added to the table for the last positive trigger before the positive target occurred, or for the last trigger in the data if there was no positive target
    * To compute the aggregates of the target and validator, all messages after the trigger occurrence are taken into account. For each new occurrence of the trigger (if there was no positive target before), the aggregates are computed anew (previously computed values are discarded)
  * If the value is CLICK\_AD then the handling is specific to ClickAd, including:
    * When starting offline, it is necessary to enter the positive and negative target values
    * Each message that corresponds to the display of the ad to the user is saved in the table (multiple rows may appear for one user)
    * A specific JSON format is assumed. Among other things, there are fields from com.algolytics.streaming.clickad.Constants.
    * Matching the target to the trigger is based on specific fields in JSON. For a trigger, if userId is null, then target will still be found based on other fields
* `target_start` - the time in ms after which the target window starts
* `target_length` – window length in ms
* `target_length_exact` - 0 (false) means that the target window is counted until the end of the data (but cannot be shorter than target\_length)
* `trigger_validator` - an expression in Java that must return true after the trigger occurs in order to write the row to the training table. If equal to null, then there is no additional restriction on the written rows. The Validator can use aggregates with any type of window (WINDOW\_TARGET and WINDOW\_GLOBAL are treated the same)
* `saved_state_time` - the time in ms up to which the state of users was computed after building the model. Set by the application
* `first_model_time` - time in ms of the first saving of the scoring code
* `saved_model_time` - time in ms of the last time the scoring code was saved
* `value` - a value associated with the model, e.g. CPC (Cost per Click) in the case of RTB. If we return a score in response to a request and not a value, then value should be 1
* `client_value` - value related to the model, e.g. client CPC in the case of RTB; null if not needed
* `positive_target_cnt` - in the case of RTB, the ordered number of positive targets to be generated in the campaign (e.g. clicks); null if not needed
* `weight` - weight, in the case of RTB the bid price is: value \* score \* weight
* `positive_target_ratio` - the ratio of the number of rows with a positive target to all rows in the table used for modeling; until it has been calculated, it should be set to 0
* `category_id` - campaign category
* `use_category` - 1 - model built per campaign category, 0 - model built per campaign
* `end_date` - the time in ms of the end of the campaign (in the case of RTB, the time by which the ordered number of positive targets, positive\_target\_cnt, is to be reached)

**`model_aggregates`**`:` (table needed as each model can use multiple aggregates and each aggregate can be used in multiple models)

* `model_id INTEGER` - id of the model from models table
* `aggregate_id INTEGER` - the id of the aggregate from the aggregates table (there can be many for a given model\_id)
* `used` - 1 means that the variable is passed to the scoring code, 0 means that the variable will be used to build the model

**`trigger_aggregates`**`:` (defines the arguments of triggers; the table is needed because a given trigger can use multiple aggregates, and each aggregate can be used by multiple triggers)

* `trigger_id INTEGER` - id of the trigger from the triggers table
* `aggregate_id INTEGER` - the aggregate id from the aggregates table, which is an argument (there can be many for a given trigger\_id)

**`model_triggers`**`:` (defines triggers for models; the table is needed because a given trigger can be used by multiple models, and each model can use multiple triggers)

* `model_id INTEGER` - id of the model from models table
* `trigger_id INTEGER` - trigger id from the triggers table (there can be many for a given model\_id)

**`dictionary`**`:` (dictionary that maps values from JSON to values passed to the aggregate)

* `id` - the id of a group of values (they should not be repeated for the same id) or ranges (they should be disjoint for the same id)
* `categorical` - if true, the value is taken as input. If false, start, start\_inclusive, end, end\_inclusive are taken as input
* `value` - a specific value from JSON
* `start` - specifies the beginning of the numeric value interval
* `start_inclusive` - specifies whether the start of the interval is open or closed
* `end` - specifies the end of the range of numeric values
* `end_inclusive` - determines whether the end of the interval is open or closed
* `mapped_value` - value transferred to the aggregate
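
For numeric dictionaries (categorical = 0), the lookup amounts to finding the interval that contains the input value while respecting the inclusive flags. The sketch below is a hypothetical illustration; treating a null `start` or `end` as an unbounded side and returning null when no interval matches are assumptions not stated in this document.

```java
import java.util.List;

// Hypothetical sketch of a lookup over numeric rows of the dictionary table.
final class DictionaryLookup {
    // One row with categorical = 0.
    static final class Range {
        final Double start;           // null = no lower bound (assumption)
        final boolean startInclusive;
        final Double end;             // null = no upper bound (assumption)
        final boolean endInclusive;
        final String mappedValue;

        Range(Double start, boolean startInclusive,
              Double end, boolean endInclusive, String mappedValue) {
            this.start = start;
            this.startInclusive = startInclusive;
            this.end = end;
            this.endInclusive = endInclusive;
            this.mappedValue = mappedValue;
        }

        boolean contains(double x) {
            boolean aboveStart = start == null || (startInclusive ? x >= start : x > start);
            boolean belowEnd = end == null || (endInclusive ? x <= end : x < end);
            return aboveStart && belowEnd;
        }
    }

    // Returns the mapped value of the first matching range, or null if none matches.
    static String map(List<Range> rows, double x) {
        for (Range row : rows) {
            if (row.contains(x)) {
                return row.mappedValue;
            }
        }
        return null;
    }
}
```

Since ranges sharing the same dictionary id should be disjoint, at most one row can match a given value.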

**`default_model:`** default parameters used when automatically adding new models from the API level

* `client_id` - id of a client
* `target_definition` - definition of the target derived aggregate (when loading and creating a new model, the string ${model\_id} will be replaced with the model\_id of the new model)
* `target_aggregates` - aggregates needed to count the target (aggregate names separated by commas, e.g.: 'agg1, agg2, agg3')
* `trigger_definition` - trigger definition (when loading and creating a new model, the ${model\_id} string will be replaced with the model\_id of the new model)
* `trigger_aggregates` - aggregates needed to count the trigger (aggregate names separated by commas, e.g.: 'agg1, agg2, agg3')
* `trigger_validator_definition` - definition of trigger validator (when loading and creating a new model, the string ${model\_id} will be replaced with the model\_id of the new model)
* `trigger_validator_aggregates` - aggregates needed to count trigger validator (aggregate names separated by commas, e.g.: 'agg1, agg2, agg3')
* `target_type` - corresponds to the variable target\_type in the models table
* `target_start` - corresponds to the variable target\_start in the models table
* `target_length` - corresponds to target\_length variable in models table
* `target_length_exact` - corresponds to target\_length\_exact variable in models table
* `target_aggregate_name` - name of the target aggregate (when loading and creating a new model, the string ${model\_id} will be replaced with the model\_id of the new model)
* `weight` - corresponds to the weight variable in the models table

**`model_urls:`** URLs to be included/excluded when building the model

* `model_id` - id of a model
* `client_id` - id of a client
* `url` - URL of the website (domain)
* `included` - if 1 then the page should be included in the modeling, if 0 then excluded

4. The types of variables, aggregates, and types of available aggregation windows are defined in the DBConstants file:

**Types of variables:**

```java
public final static int VARIABLE_TYPE_DOUBLE = 1;
public final static int VARIABLE_TYPE_TEXT = 2;
```

**Types of windows:**

```java
public final static int WINDOW_TARGET = 0;
public final static int WINDOW_GLOBAL = 1;
public final static int WINDOW_TIME = 2;
public final static int WINDOW_TIME_SLIDE = 3;
public final static int WINDOW_COUNT = 4;
public final static int WINDOW_COUNT_SLIDE = 5;
public final static int WINDOW_CURRENT_TIME = 6;
```

* **WINDOW\_TARGET** – a specific window type to define a target variable for the predictive model training process
* **WINDOW\_GLOBAL** – The window includes all the data history saved in the tool
* **WINDOW\_TIME** – The window aggregates the data in a window specified by time (given in ms). The length of the window is given in window\_size. The window is of the "tumbling window" type
* **WINDOW\_TIME\_SLIDE** - The window aggregates data in a window defined by time (given in ms) and offset by the time specified in the parameter window\_shift (in ms). The length of the window is given in window\_size. Sliding window
* **WINDOW\_COUNT** – window defined as an aggregate from window\_size events
* **WINDOW\_COUNT\_SLIDE** - window defined as an aggregate of window\_size events moved back by window\_shift events. Sliding window
* **WINDOW\_CURRENT\_TIME** – a window of length window\_size aggregated in real time

For the above window types, the window\_lag parameter allows the window to be moved back by window\_lag from the current moment.

* **Tumbling window** - events are summarized in fixed-length, non-overlapping time windows. The value changes when the window is closed

<figure><img src="/files/vO1VEXIkNCLoh74hsYJV" alt=""><figcaption></figcaption></figure>

* **Sliding window** - events are summarized in fixed-length time windows, but in this case the windows overlap each other, so the value is updated more frequently than for the "tumbling window"

<figure><img src="/files/0k2eANFWNZcLzeNgpliM" alt=""><figcaption></figcaption></figure>

* **Real time window** – aggregate values are calculated in real time

<figure><img src="/files/osijumFGQyLHVkogCT3x" alt=""><figcaption></figcaption></figure>

Please note that real-time window calculation is computationally expensive, so avoid using it where it is not necessary (only if the application requires a real-time aggregate).
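
To make the difference between tumbling and sliding windows concrete, the sketch below computes which windows a given event timestamp falls into. It is an illustration only: the class is hypothetical, windows are assumed to be aligned to time 0, and window\_shift is interpreted as the distance between the starts of consecutive sliding windows, which this document does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of assigning an event to time windows.
final class WindowAssignment {
    // Start of the single tumbling window [start, start + size) containing ts.
    static long tumblingStart(long ts, long size) {
        return (ts / size) * size;
    }

    // Starts of all sliding windows [start, start + size) containing ts,
    // where window starts are multiples of shift (assumption). Newest first.
    static List<Long> slidingStarts(long ts, long size, long shift) {
        List<Long> starts = new ArrayList<>();
        long latest = (ts / shift) * shift; // latest window start not after ts
        for (long start = latest; start > ts - size && start >= 0; start -= shift) {
            starts.add(start);
        }
        return starts;
    }
}
```

With `shift` equal to `size`, the sliding case degenerates to the tumbling one: every event belongs to exactly one window. With a smaller shift, a single message belongs to several windows.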

**Types of aggregates:**

```java
public final static int AGGREGATE_COUNT     	= 1; 
public final static int AGGREGATE_SUM       	= 2;
public final static int AGGREGATE_LASTVALUE 	= 3; //last value based on the order defined by the timestamp field
public final static int AGGREGATE_EXISTS    	= 4; //counts number of events with non null value of a field
public final static int AGGREGATE_MIN       	= 5;
public final static int AGGREGATE_MAX       	= 6; 
public final static int AGGREGATE_CURRENT_VALUE = 7; //last value according to the time of event consumption by EventEngine

```

### High-level events

1. Events that trigger scoring / events that trigger target counting
2. Defined in metadata in the form of expressions in Java (triggers table)
3. Example: "`(aggr1 == 5 && aggr2 == 8) || (aggr1 < 4 && aggr2 == 1)`"
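
The example expression can be read as an ordinary Java boolean expression over aggregate values. The sketch below wraps it in a method and also illustrates the needs\_change flag from the triggers table. The class and method names are hypothetical, and how the engine actually compiles and evaluates such expressions is not shown here.

```java
// Hypothetical sketch of a trigger definition and the needs_change rule.
final class TriggerExample {
    // The example expression from the text, with aggregates passed as arguments.
    static boolean definition(double aggr1, double aggr2) {
        return (aggr1 == 5 && aggr2 == 8) || (aggr1 < 4 && aggr2 == 1);
    }

    // With needs_change set, the trigger fires only if the expression
    // was false on the previous run and is true now.
    static boolean fires(boolean previous, boolean current, boolean needsChange) {
        return current && (!needsChange || !previous);
    }
}
```

With needs\_change = 1, a condition that stays true across consecutive events leads to scoring only once, when it first becomes true.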

### Definition of target

1. Target is defined as an aggregate in the aggregates table with a special window type window\_type = 0. Then the aggregate id should be entered in the models table
2. An example of the aggregates table (a count aggregate as the target): columns `id`, `variable_id`, `aggregate_type`, `window_type`, `name`, … with values `1`, `1`, `1`, `0`, `D_exists_all`
3. An example of the models table: columns `id`, `used`, `target_aggregate_id`, `target_start`, `target_length`, `target_length_exact`, … with values `1`, `1`, `1`, `0`, `0`, `0`
4. In the example above, the target window is counted from the next event after the event setting the trigger to true (target\_start = 0) and is counted until the end of the data (target\_length = 0 and target\_length\_exact = 0)
5. Example:
   * The target is the aggregate D\_exists\_all taking the value 1 if the event "D" occurred in the given target window and 0 if it did not occur
   * If target\_start = 0, then the target window starts at the next event after the event that sets the trigger to true. Otherwise, the target window starts target\_start milliseconds after the trigger event occurs
   * The target window has a length of target\_length in milliseconds. If target\_length = 0, then all messages from the trigger occurrence to the end of the data are taken into account when calculating the target value
   * If target\_length > 0 and target\_length\_exact = 1, then, if the data does not cover the entire length of the window, the target is not calculated and no row for the given user appears in the resulting aggregates table
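
The window rules above can be sketched in Java as follows. This is a simplified reading, not the engine's code: the `Event` class and the method name are hypothetical, the window end is treated as exclusive, "the next event after the trigger" is approximated by a strictly later timestamp, and "data covers the whole window" is taken to mean that the last stored event is not earlier than the window end.

```java
import java.util.List;
import java.util.Optional;

final class TargetWindowSketch {
    // Hypothetical minimal event: its type and its timestamp in ms.
    static final class Event {
        final String type;
        final long timeMs;
        Event(String type, long timeMs) { this.type = type; this.timeMs = timeMs; }
    }

    /**
     * Returns 1 if an event of targetType falls into the target window, 0 if not,
     * and empty when target_length_exact = 1 and the data ends before the window does.
     * The events list must be sorted by time.
     */
    static Optional<Integer> existsTarget(List<Event> events, long triggerTimeMs, String targetType,
                                          long targetStart, long targetLength, boolean targetLengthExact) {
        long windowStart = triggerTimeMs + targetStart;
        // target_length = 0: the window runs to the end of the data.
        long windowEnd = targetLength == 0 ? Long.MAX_VALUE : windowStart + targetLength;
        long lastEventTime = events.isEmpty() ? Long.MIN_VALUE : events.get(events.size() - 1).timeMs;
        if (targetLength > 0 && targetLengthExact && lastEventTime < windowEnd) {
            return Optional.empty(); // assumption: no row is produced for this user
        }
        for (Event e : events) {
            // target_start = 0: only events after the trigger event count.
            boolean afterStart = targetStart == 0 ? e.timeMs > triggerTimeMs : e.timeMs >= windowStart;
            if (afterStart && e.timeMs < windowEnd && e.type.equals(targetType)) {
                return Optional.of(1);
            }
        }
        return Optional.of(0);
    }
}
```

Returning an empty `Optional` rather than 0 keeps "target not calculated" distinct from "target calculated and negative", which is the distinction target\_length\_exact introduces.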

### Converting Events to Variables

1. Defined in the metadata (variables table) as JSONPath rules. JSONPath is a notation for selecting values from json, evaluated by a library
2. Example in point [#messages](#messages "mention")
3. Derived variables\
   It is possible to define derived variables. Such a variable is defined by a Java class.
   * In the variables table, define a variable with definition\_type = TRANSFORMED
   * In the definition field, enter the full source of the Java class, which must inherit from DoubleTransformation or StringTransformation (this base class should not be imported).\
     The class must not declare a package.
   * For a variable of type TRANSFORMED, enter in the input\_id field the id of the input variable whose value is passed to the class given in the definition field. This id must refer to a variable of type JSON\_PATH.
   * Example1 – a variable returning a domain from a url:
     * Input variable:
       * id = 1
       * definition\_type = JSON\_PATH
       * definition = `$.['url']`
     * Derived variable
       * id = 2
       * input\_id = 1
       * definition\_type = TRANSFORMED
       * definition:

```
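public class Domain extends StringTransformation {
    // Returns the domain part of a url, e.g. "http://example.com/path" -> "example.com".
    public Object transform(String s) {
        int schemeInd = s.indexOf("//");
        // Skip the "//" separator if the url contains one.
        int startInd = schemeInd > -1 ? schemeInd + 2 : 0;
        // The domain ends at the first "/" after the start, or at the end of the string.
        int endInd = s.indexOf("/", startInd);
        return s.substring(startInd, endInd > -1 ? endInd : s.length());
    }
}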
```

* Example2 – a variable returning the day of the week from a time in ms, by default taken from the time field in json. The name of the json field that holds the time can be set in the configuration (jsonTimeName field):
  * Input variable:
    * id = 1
    * definition\_type = JSON\_PATH
    * definition = `$.['time']`
  * Derived variable
    * id = 2
    * input\_id = 1
    * definition\_type = TRANSFORMED
    * definition:

```
import java.util.Calendar;

public class DayOfWeek extends DoubleTransformation {
    Calendar calendar;

    // Signals that a Calendar should be supplied through setCalendar.
    public boolean needsCalendar() {
        return true;
    }

    public void setCalendar(Calendar calendar) {
        this.calendar = calendar;
    }

    // d is the event time in ms; the result is Calendar.DAY_OF_WEEK (1 = Sunday ... 7 = Saturday).
    public Object transform(double d) {
        calendar.setTimeInMillis((long) d);
        return calendar.get(Calendar.DAY_OF_WEEK);
    }
}
```
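
To try a derived variable outside the engine, a small harness along the following lines may help. It is only a sketch: the `DoubleTransformation` base class below is a hypothetical stand-in written for this example (the real base class is not shown in this document), and `TransformedVariableDemo.apply` is an invented helper that mimics how a TRANSFORMED variable might be fed with the value of its input variable.

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical stand-in for the engine's base class.
abstract class DoubleTransformation {
    public boolean needsCalendar() { return false; }
    public void setCalendar(Calendar calendar) { }
    public abstract Object transform(double d);
}

// Same logic as the DayOfWeek definition above.
class DayOfWeek extends DoubleTransformation {
    Calendar calendar;
    @Override public boolean needsCalendar() { return true; }
    @Override public void setCalendar(Calendar calendar) { this.calendar = calendar; }
    @Override public Object transform(double d) {
        calendar.setTimeInMillis((long) d);
        return calendar.get(Calendar.DAY_OF_WEEK);
    }
}

final class TransformedVariableDemo {
    // Applies a derived-variable class to the value of its input variable.
    static Object apply(DoubleTransformation t, double inputValue, TimeZone zone) {
        if (t.needsCalendar()) {
            t.setCalendar(Calendar.getInstance(zone));
        }
        return t.transform(inputValue);
    }

    public static void main(String[] args) {
        // 1463988845701 ms is the "time" value from the sample event; it falls on a Monday in UTC.
        Object day = apply(new DayOfWeek(), 1463988845701d, TimeZone.getTimeZone("UTC"));
        System.out.println(day); // 2, i.e. Calendar.MONDAY
    }
}
```

The time zone is passed in explicitly here because the day of the week depends on it; which zone the engine's own Calendar uses is not stated in this document.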

### State storage for off-line users

1. Aggregate values are stored for users who are currently off-line.
2. Database used for storing the state: MongoDB

### Off-line event processing and modeling

1. The off-line aggregate counting and modeling process can be repeated at pre-set intervals (triggered automatically by offline scheduling or manually on demand)
2. For each model, an analytical table is created containing 1 row for each user for whom the trigger (the condition that triggers scoring with the given model) has been met and the target has been calculated\
   \
   **Note**: in the programmatic case, multiple rows can be created for each user, because an input json (bid request) can actually contain several bid requests for different impressions. In that case, one row is created for the user per impression id.
3. For each analytical table, the ABM calculation is run, and the finished models (scoring code) and information about the variables used (model signature) are saved to the engine metadata
4. The method that invokes off-line processing calls the ABM API method that allows you to build the model in a single query:\
   <http://e-abm.com/api_documentation.html#resources-models-create-models-in-one-request>\
   or invokes a local ABM script that is executed by the Advanced Miner tool
5. The aggregate table is the input for the ABM process, which automatically selects variables and calculates the optimal model
6. The method that invokes off-line processing allows the table with aggregates to be saved for further analysis, or manual modeling by an analyst. In the call, specify the target alias for gdbase and the name under which the table should be saved

### Models

1. Model information is stored in the metadata in the models table
2. Models used for on-line scoring have both the used = 1 and the active = 1 flags set. A model that is active but has not been built yet has only the active = 1 flag set
3. The models table stores the scoring code as a string
4. Deployment of a new model
   * After recalculation, the new model is deployed automatically: it overwrites the old model with the same id
   * If the new model uses different variables than the previous one, all the necessary aggregates have to be recalculated from the stored history to bring them up to date. Aggregates are caught up iteratively to keep the latency of processing the first message with the new model as low as possible. First, the necessary aggregates are calculated on the stored retail messages; new messages may arrive in the meantime, so the next iteration updates the aggregate values with them. The process is repeated several times

### Scoring

1. Scoring is triggered by the occurrence of a high-level event. Events are defined in the metadata (triggers table)
2. Different events can trigger scoring with different models (high-level events are assigned to models in the metadata)
3. Different groups of users can be scored with different models. User group definitions are expressions defined in the same way as scoring triggers (also in the triggers table)
4. A given user can be scored by multiple models at once (i.e. one event triggering scoring can be assigned to multiple models: table model\_triggers)
5. The scoring code is stored as a string in the models table
6. In the programmatic case, the modeling table is built at the level of the unique user id and the impression id, so scoring is also done at the level of impressions. Events that trigger scoring (bid requests) can actually contain several bid requests (a list of impressions). In this case, at the beginning of processing, the event is split into several events (one for each impression) and only those events are scored
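
The splitting of a multi-impression event described in the last point could look roughly like this. The sketch assumes the event has already been parsed into a `Map`, and the field name `imp` for the list of impressions is an assumption made for this example; the actual field name and the engine's internal event representation are not given in this document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ImpressionSplitSketch {
    // "imp" is a hypothetical name for the field holding the list of impressions.
    static List<Map<String, Object>> splitByImpression(Map<String, Object> event) {
        List<Map<String, Object>> result = new ArrayList<>();
        Object imps = event.get("imp");
        if (!(imps instanceof List)) {
            result.add(event); // no impression list: keep the event as it is
            return result;
        }
        for (Object imp : (List<?>) imps) {
            // Copy the shared fields (user id, time, ...) and keep exactly one impression.
            Map<String, Object> single = new HashMap<>(event);
            single.put("imp", List.of(imp));
            result.add(single);
        }
        return result;
    }
}
```

Each resulting event can then be scored on its own, which matches the user id plus impression id level of the modeling table.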

### Dictionaries

In order to define aggregates more easily, dictionaries can be used. Dictionaries can be defined in the metadata in the dictionary table. Then, when defining the aggregate, the dictionary\_id of the appropriate entry in the dictionary should be provided.

Example of use:

The variable variable1 takes the values 1, 2, 3, 4, 5 and has id 1 in the variables table.

Dictionary:

```
variable1(string/number), cat1 , cat2
1 , D , AA
2 , F , AA
3 , D , BB
4 , F , BB
5 , D , AA
```

We want to count the following aggregates:

```
variable1_D_count_all
variable1_F_count_all
variable1_AA_count_all
variable1_BB_count_all

variable1_D_AA_count_all
variable1_D_BB_count_all
variable1_F_AA_count_all
variable1_F_BB_count_all
variable1_F_BB_sum_all
```

Form of the dictionary table:

```
id, categorical, value, start, start_inclusive, end, end_inclusive, mapped_value
1, 1, 1, null, null, null, null, D
1, 1, 3, null, null, null, null, D
1, 1, 5, null, null, null, null, D
2, 1, 2, null, null, null, null, F
2, 1, 4, null, null, null, null, F
3, 1, 1, null, null, null, null, AA
3, 1, 2, null, null, null, null, AA
3, 1, 5, null, null, null, null, AA
4, 1, 3, null, null, null, null, BB
4, 1, 4, null, null, null, null, BB
5, 1, 1, null, null, null, null, D_AA
5, 1, 5, null, null, null, null, D_AA
6, 1, 3, null, null, null, null, D_BB
7, 1, 2, null, null, null, null, F_AA
8, 1, 4, null, null, null, null, F_BB
```

Form of the aggregates table (selected columns):

```
variable_id, aggregate_type, dictionary_id, name, …
1, 1, 1, D_count_all
1, 1, 2, F_count_all
1, 1, 3, AA_count_all
1, 1, 4, BB_count_all
1, 1, 5, D_AA_count_all
1, 1, 6, D_BB_count_all
1, 1, 7, F_AA_count_all
1, 1, 8, F_BB_count_all
1, 2, 8, F_BB_sum_all
```
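
As an illustration of how dictionary entries group values, the sketch below counts events per dictionary entry for the single-category entries D, F, AA and BB from the example. The class, the in-memory map and the `count` method are hypothetical; the engine reads dictionaries from the metadata, and only categorical entries are covered here.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DictionaryCountSketch {
    // dictionary id -> values of variable1 that belong to the entry.
    static final Map<Integer, Set<Integer>> DICTIONARY = Map.of(
        1, Set.of(1, 3, 5),   // D
        2, Set.of(2, 4),      // F
        3, Set.of(1, 2, 5),   // AA
        4, Set.of(3, 4));     // BB

    // Counts the events whose value is covered by the given dictionary entry
    // (the behaviour assumed here for the *_count_all aggregates).
    static long count(int dictionaryId, List<Integer> values) {
        Set<Integer> members = DICTIONARY.getOrDefault(dictionaryId, Set.of());
        return values.stream().filter(members::contains).count();
    }

    public static void main(String[] args) {
        List<Integer> observed = List.of(1, 2, 3, 3, 5); // values of variable1 in five events
        System.out.println("D_count_all  = " + count(1, observed));
        System.out.println("BB_count_all = " + count(4, observed));
    }
}
```

For the five observed values, entry D matches 1, 3, 3 and 5, while entry BB matches only the two events with value 3.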

### App Description

1. Components:
   * Core - processes messages and returns a score. It requires the JDK (the JRE is not enough). Multiple Core processes can be run, which makes it possible to handle higher traffic volumes.\
     Running multiple Core processes also keeps garbage collector pauses short: a single process takes up less memory, so a garbage collection run is faster.\
     With many Core processes, each one handles a certain pool of users determined by the partitions created in Kafka. Messages are partitioned by the hash of the user's id.
   * HTTPServer - receives requests, forwards them to Core, receives the result and returns it to the user. A detailed description of the supported requests can be found in `17` and `18`.
   * Metadata - requires a gdbase server with a user created and with the tables created and populated. Table definitions are in the metadata.sql file (in the tool's sources).
   * MongoDB - a MongoDB database is required.
   * Kafka - requires Kafka with 2 topics created (the topic names are set in the configuration files under the keys kafka\_request\_topic and kafka\_response\_topic).\
     Authorization must be disabled in Kafka.\
     If you run multiple Core processes, create the kafka\_request\_topic with a number of partitions equal to the number of processes (e.g.: `bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 4 --topic request`) and set kafka\_request\_topic\_partitions to this value before starting them.
   * InfluxDB and Grafana – used to collect and visualize statistics on-line. InfluxDB collects information about incoming events, e.g. the number of events of various types, the number of serrations, processing times, etc.

2. Configuration:\
   The application is configured through config.properties files (separate ones for Core and for HTTPServer). The file is loaded from the current directory. Keys starting with kafka\_consumer\_ are passed to the Kafka consumer. The key passed to Kafka is the part of the key in config.properties that remains after removing the kafka\_consumer\_ prefix.\
   In the same way, keys starting with kafka\_producer\_ are passed to the Kafka producer. With these prefixes you can set any Kafka keys, not just those present in the provided config.properties.example.
3. Starting the components\
   Running the install.sh file (from the deploy directory) makes all components start at boot. It also makes commands such as `service gdbase restart` available.\
   All components must be installed first. If the components are installed in directories other than those used by the \*.service files, you need to change the paths in these files. Default paths:

```
AM: /usr/AdvancedMiner
Kafka: /usr/Kafka_2.11-0.9.0.1
streaming: /usr/streaming
```

4. Starting the model build:
   * Model building (i.e. off-line processing) should be run from the directory that contains streaming\_core.jar
   * Start it with the appropriate parameters: `nice -n 19 ionice -c 3 java -cp streaming_core.jar com.algolytics.streaming.Offline ...`\
     The list of all parameters, including the required ones, is printed on the console or in the logs (default: Log/offline.log).\
     Running under nice and ionice ensures that off-line processing does not burden on-line processing.

* If ABM is run locally (by specifying the abmScript option), then:
  * before running, set LD\_LIBRARY\_PATH / LIBPATH / PATH in the same way as the AdvancedMiner launcher scripts do
  * configure AdvancedMiner, e.g. it is worth setting a higher value of MAX\_SCRIPT\_EXECUTOR\_HEAP\_SIZE
* Offline call parameters:
  * modelIds – list of model ids
  * startTime – time in ms from which to calculate the table to be modeled
  * endTime – time in ms up to which to calculate the table to be modeled
  * startDelay – offset in ms from which to calculate (current time – startDelay)
  * endDelay – offset in ms up to which to calculate (current time – endDelay)
  * stateStartTime – time in ms from which to calculate the state
  * stateStartDelay – time in ms until when to calculate the state
  * copyURL – alias of the gdbase; if given, the resulting table is copied there
  * copyTablePrefix – prefix for the table name if it is to be copied to gdbase
  * copyUser – gdbase user
  * copyPassword – gdbase password
  * processMethod – method type (approximation, gold, quick, advanced)
  * positiveTargetValue – value denoting a positive target
  * negativeTargetValue – a value denoting a negative target
  * qualityMeasureName – quality measure (as in ABM)
  * cutoff – cutoff threshold for score (as in ABM)
  * samplingMode – type of sampling (as in ABM)
  * samplingSize – sample size (as in ABM)
  * samplingStratificationMode – stratification type (as in ABM)
  * samplingPositiveTargetCategoryRatio – percentage of positive target at stratification (as in ABM)
  * classificationThreshold – threshold (as in ABM)
  * classificationThresholdType – threshold type (as in ABM)
  * profitMatrixOptimized
  * profitMatrixCurrency
  * profitMatrixTruePositive
  * profitMatrixFalseNegative
  * profitMatrixFalsePositive
  * profitMatrixTrueNegative
  * useTestData
  * threadCount – the number of threads
  * abmScript – path to the ABM script
  * abmAuthToken – token (when calculating ABM web)
  * userType – user type
  * stateUserType
  * minROCArea – minimum ROC area for the model to be deployed
  * minimumPositiveTargets – the minimum number of positive targets required to build a model (programmatic)
  * useModelUrls – if true, filter urls based on model\_urls (programmatic)
  * triggerEventInTarget – if true, include the event that fired the trigger when calculating the target window

5. Detailed recommendations for the installation and configuration of individual application components can be found in the sources (deploy directory).
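
The prefix rule for Kafka keys described under Configuration can be sketched as follows. This is a minimal illustration rather than the application's own code: the class and method names are hypothetical, and `bootstrap.servers` and `acks` are ordinary Kafka settings chosen only as examples.

```java
import java.util.Properties;

final class KafkaPrefixSketch {
    // Copies every key that starts with the prefix into a new Properties object, with the prefix removed.
    static Properties stripPrefix(Properties config, String prefix) {
        Properties result = new Properties();
        for (String key : config.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.setProperty(key.substring(prefix.length()), config.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("kafka_consumer_bootstrap.servers", "localhost:9092");
        config.setProperty("kafka_producer_acks", "1");
        config.setProperty("kafka_request_topic", "request"); // not prefixed, so not passed on
        System.out.println(stripPrefix(config, "kafka_consumer_"));
        System.out.println(stripPrefix(config, "kafka_producer_"));
    }
}
```

Keys such as kafka\_request\_topic share the `kafka_` stem but match neither prefix, so in this sketch they stay with the application and are not handed to the Kafka client.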

### API

1. Send json with event\
   Example file with a batch of messages from the client (test.json):

```
[
{"userId":1,"eventId":1,"time":1463988845701,"zm1":5,"zm2":8},
{"userId":1,"eventId":1,"time":1463988845709,"zm1":55,"zm2":55}
]
```

Sample client request:

```
wget -q -O - --post-file=test.json "https://localhost:8321/event?appid=3697"
```

Sample response:

```
[
{"userId":1,"scores":[{"modelId":1,"score":1.0}]},
{"userId":1,"scores":[{"modelId":1,"score":0.6}]}
]
```

Parameters:

| **Name** | **Description**                                                                             | **Required**           |
| -------- | ------------------------------------------------------------------------------------------- | ---------------------- |
| appid    | Customer ID                                                                                 | YES                    |
| time     | Time in ms (field name can be changed in the configuration file, field: jsonTimeName)       | YES                    |
| eventId  | Event type (the field name can be changed in the configuration file, jsonEventIdName field) | YES                    |
| userId   | User ID (the field name can be changed in the configuration file, jsonUserIdName field)     | YES                    |

Error codes:

| **Name**      | **Output JSON**                                                                | **Cause**                            |
| ------------- | ------------------------------------------------------------------------------ | ------------------------------------ |
| 403 Forbidden |                                                                                | No appid                             |
| 200           | <p>JSON with an "error" field, e.g.:</p><p>{"error": "Error during json parsing"}</p> | Processing error (e.g. when parsing the json) |

2. Query for the current profile (list of aggregates) of the selected user

Sample client request:

```
wget -q -O - "https://localhost:8323/profile?appid=3697&userid=AB123"
```

Sample response:

```
{"agg1_last_value_all":5.0,"agg2_last_value_all":"AAA","agg3_cnt_all":3.0,"agg3_sum_all":2330.0}
```

Parameters:

| **Name** | **Description** | **Required**           |
| -------- | --------------- | ---------------------- |
| appid    | Customer ID     | YES                    |
| userid   | User ID         | YES                    |

Error codes:

| **Name**        | **Output JSON** | **Cause**     |
| --------------- | --------------- | ------------- |
| 403 Forbidden   |                 | No appid      |
| 400 Bad request |                 | No userid     |
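
The two requests above can also be issued from Java. The sketch below only builds the requests with the standard `java.net.http` API (Java 11 or later). The class and method names are hypothetical, and the `Content-Type` header is an assumption, since this document does not say which content type the server expects.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class EventEngineRequests {
    // POST <baseUrl>/event?appid=... with a json array of events as the body.
    static HttpRequest eventRequest(String baseUrl, String appId, String eventsJson) {
        URI uri = URI.create(baseUrl + "/event?appid=" + encode(appId));
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json") // assumption, see the note above
                .POST(HttpRequest.BodyPublishers.ofString(eventsJson))
                .build();
    }

    // GET <baseUrl>/profile?appid=...&userid=...
    static HttpRequest profileRequest(String baseUrl, String appId, String userId) {
        URI uri = URI.create(baseUrl + "/profile?appid=" + encode(appId) + "&userid=" + encode(userId));
        return HttpRequest.newBuilder(uri).GET().build();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Because some failures are reported with status 200 and an "error" field in the body, a caller should inspect the body as well as the status code.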

### API for adding models

1. The API for adding models is activated by setting `enableModelsApi = true` in the configuration file (for the Core application)
2. Adding a model

Sample client request:

```
wget -qO- --post-data='{"use_category":0, "value":0.6, "category_id":0.6, "client_value":0.6, "positive_target_cnt":0.6, "excluded":["www.wp.pl", "www.onet.pl"]}' "http://localhost:8323/models?appid=3697&modelid=123"
```

Sample response:

`{}` – no error

Parameters:

| **Name**              | **Description**                                                                                                                                                 | **Required**             |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ |
| appid                 | Customer ID                                                                                                                                                     | YES                      |
| modelid               | Model ID                                                                                                                                                        | YES                      |
| value                 | In programmatic: CPC                                                                                                                                            | YES                      |
| client\_value         | In programmatic: customer CPC                                                                                                                                   | NO                       |
| use\_category         | If 1 – the model will be built on categories (then category\_id must be provided), if 0 – the model on campaigns                                                | YES                      |
| category\_id          | Model category ID (in the case of programmatic, this is the campaign category)                                                                                  | YES if use\_category = 1 |
| positive\_target\_cnt | In programmatic – the number of clicks ordered                                                                                                                  | NO                       |
| excluded or included  | A list of urls to exclude from modeling (if excluded) or to take into account in modeling (if included). There can be either an included or an excluded field. | NO                       |

Error codes:

| **Name**        | **Output JSON**                                                                  | **Cause**                                           |
| --------------- | -------------------------------------------------------------------------------- | --------------------------------------------------- |
| 403 Forbidden   |                                                                                  | No appid                                            |
| 404 Not Found   |                                                                                  | No modelid                                          |
| 400 Bad request |                                                                                  | Wrong modelid format                                |
| 200             | {"error": "Parameter use\_category must be set and must be an integer (0 or 1)"} | Not set or wrong format use\_category               |
| 200             | {"error": "Parameter value must be set and must be numeric"}                     | Not set or wrong value format                       |
| 200             | {"error": "Parameter category\_id must be provided if use\_category = 1"}        | Not set category\_id and use\_category = 1          |
| 200             | {"error": "Exactly one parameter must be set (either included or excluded)"}     | None or both fields provided: excluded and included |

3. Modifying Model Parameters

The same query as when adding a model, but in addition to the mandatory parameters, only the modified ones should be specified.

Sample client request:

```
wget -qO- --post-data='{"use_category":0, "value":1.6}' "http://localhost:8323/models?appid=3697&modelid=123"
```

Sample response:

`{}` – no error

Parameters:

| **Name**              | **Description**                                                                                                                                                 | **Required**             |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ |
| appid                 | Customer ID                                                                                                                                                     | YES                      |
| modelid               | Model ID                                                                                                                                                        | YES                      |
| value                 | In programmatic: CPC                                                                                                                                            | YES                      |
| client\_value         | In programmatic: customer CPC                                                                                                                                   | NO                       |
| use\_category         | If 1 – the model will be built on categories (then category\_id must be provided), if 0 – the model on campaigns                                                | YES                      |
| category\_id          | Model category ID (in the case of programmatic, this is the campaign category)                                                                                  | YES if use\_category = 1 |
| positive\_target\_cnt | In programmatic – the number of clicks ordered                                                                                                                  | NO                       |
| excluded or included  | A list of urls to exclude from modeling (if excluded) or to take into account in modeling (if included). There can be either an included or an excluded field. | NO                       |

Error codes:

| **Name**        | **Output JSON**                                                                  | **Cause**                                           |
| --------------- | -------------------------------------------------------------------------------- | --------------------------------------------------- |
| 403 Forbidden   |                                                                                  | No appid                                            |
| 404 Not Found   |                                                                                  | No modelid                                          |
| 400 Bad request |                                                                                  | Wrong modelid format                                |
| 200             | {"error": "Parameter use\_category must be set and must be an integer (0 or 1)"} | Not set or wrong format use\_category               |
| 200             | {"error": "Parameter value must be set and must be numeric"}                     | Not set or wrong value format                       |
| 200             | {"error": "Parameter category\_id must be provided if use\_category = 1"}        | Not set category\_id and use\_category = 1          |
| 200             | {"error": "Exactly one parameter must be set (either included or excluded)"}     | None or both fields provided: excluded and included |

4. Deactivate a model

Sample client request:

```
wget -qO- --method=DELETE "http://localhost:8323/models?appid=3697&modelid=123"
```

Sample response:

`{}` – no error

Parameters:

| **Name** | **Description** | **Required**           |
| -------- | --------------- | ---------------------- |
| appid    | Customer ID     | YES                    |
| modelid  | Model ID        | YES                    |

Error codes:

| **Name**        | **Output JSON**                        | **Cause**                           |
| --------------- | -------------------------------------- | ----------------------------------- |
| 403 Forbidden   |                                        | No appid                            |
| 404 Not Found   |                                        | No modelid                          |
| 400 Bad request |                                        | Wrong modelid format                |
| 200             | {"error": "model 111 does not exist."} | There is no model with the given id |

5. Adding Urls for excluding/including during modeling (for programmatic)

Sample client request:

```
wget -qO- --post-data='{"excluded":["www.wp.pl", "www.onet.pl"]}' "http://localhost:8323/urls?appid=3697&modelid=123"
```

Sample response:

`{}` – no error

Parameters:

| **Name**              | **Description**                                                                                                                                                 | **Required**           |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------- |
| appid                 | Customer ID                                                                                                                                                     | YES                    |
| modelid               | Model ID                                                                                                                                                        | YES                    |
| excluded or included  | A list of urls to exclude from modeling (if excluded) or to take into account in modeling (if included). There can be either an included or an excluded field. | YES                    |

Error codes:

| **Name**        | **Output JSON**                                                              | **Cause**                                           |
| --------------- | ---------------------------------------------------------------------------- | --------------------------------------------------- |
| 403 Forbidden   |                                                                              | No appid                                            |
| 404 Not Found   |                                                                              | No modelid                                          |
| 400 Bad request |                                                                              | Wrong modelid format                                |
| 200             | {"error": "Exactly one parameter must be set (either included or excluded)"} | None or both fields provided: excluded and included |

6. Retrieve information about the selected model

Sample client request:

```
wget -qO- "http://localhost:8323/models?appid=3697&modelid=115"
```

Sample response:

```
{
"model_id": 115,
"active": 1,
"used": 0,
"first_model_time": null,
"saved_model_time": null,
"value": 6,
"client_value": 0.3,
"positive_target_cnt": 10000,
"category_id": 9,
"use_category": 0,
"end_date": "1970-01-01 00:00:00"
}
```

Parameters:

| **Name** | **Description** | **Required**                             |
| -------- | --------------- | ---------------------------------------- |
| appid    | Customer ID     | YES                                      |
| modelid  | Model ID        | NO (if omitted, all models are returned) |

Error codes:

| **Name**        | **Output JSON** | **Cause**              |
| --------------- | --------------- | ---------------------- |
| 403 Forbidden   |                 | No appid               |
| 400 Bad request |                 | Wrong modelid format   |
| 200             | {}              | There is no such model |

7. Retrieve information about all models

Sample client request:

```
wget -qO- "http://localhost:8323/models?appid=3697"
```

Sample response:

```
[
{
"model_id": 111,
"active": 1,
"used": 0,
"first_model_time": null,
"saved_model_time": null,
"value": 6,
"client_value": 0.2,
"positive_target_cnt": 10000,
"category_id": 5,
"use_category": 1,
"end_date": null
},
{
"model_id": 112,
"active": 1,
"used": 0,
"first_model_time": null,
"saved_model_time": null,
"value": 6,
"client_value": 0.2,
"positive_target_cnt": 10000,
"category_id": 0,
"use_category": 0,
"end_date": "2018-02-28 04:00:02"
}
]
```

Parameters:

| **Name** | **Description** | **Required**           |
| -------- | --------------- | ---------------------- |
| appid    | Customer ID     | YES                    |

Error codes:

| **Name** | **Output JSON** | **Cause**                             |
| -------- | --------------- | ------------------------------------- |
| 200      | \[]             | There are no models for such an appid |

8. Retrieve the list of active models, either used or unused

Sample client request:

```
wget -qO- " "
```

Sample response:

```
[1018,1312,1314,1315,1319,1355,1378,1379,1391,1398]
```

Parameters:

| **Name** | **Description**                                                                                                                                                                 | **Required**           |
| -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------- |
| appid    | Customer ID                                                                                                                                                                     | YES                    |
| used     | <p>1 – list of models used in on-line scoring (active=1 and used=1 in meta data)</p><p>0 – list of active models, but with used = 0 (e.g. the model has not been built yet)</p> |                        |

Error codes:

| **Name** | **Output JSON** | **Cause**                                                   |
| -------- | --------------- | ----------------------------------------------------------- |
| 200      | \[]             | There are no models for such an appid and set value of used |

### Scheduling offline processing

1. It is possible to schedule the process of building models
2. Offline scheduling is activated by setting `enableOfflineScheduler = true` in the configuration file (for the Core application)
3. Scheduling is done by calling the API request to add models (point [#api-for-adding-models](#api-for-adding-models "mention")).
4. The first time you add a model, the build job is scheduled for the next day. If the build is successful and the model is deployed (a sufficient number of positive targets, sufficient model quality based on ROC), the next build is scheduled for a week later.
5. Currently, offline processes run serially (this is set in the code in the configuration of the Quartz library used for scheduling)
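
The scheduling rule in point 4 can be sketched as follows. This is an illustration only, not the Quartz configuration used by the Core application: the function name is hypothetical, "next day" and "a week" are assumed to mean exactly 24 hours and 7 days, and the text does not say what happens after an unsuccessful build, so the sketch returns `None` in that case.

```python
from datetime import datetime, timedelta


def next_build_time(now, is_first_add, last_build_deployed=False):
    """Return when the next model build should run, or None if unspecified.

    - first time a model is added: the next day
    - build succeeded and the model was deployed: one week later
    - any other outcome is not described in the text, so None is returned
    """
    if is_first_add:
        return now + timedelta(days=1)
    if last_build_deployed:
        return now + timedelta(weeks=1)
    return None


added = datetime(2017, 1, 27, 10, 0)
first_run = next_build_time(added, is_first_add=True)
second_run = next_build_time(first_run, is_first_add=False, last_build_deployed=True)
```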

### Reloading metadata while the app is running

* While the application is running, you can manually reload the metadata by calling the following command from the Core directory:

```
java -cp streaming_core.jar com.algolytics.streaming.ReloadMetadata clientId
```

* You can also modify the parameters of a single model (including the weight parameter that affects the calculated bid price in the programmatic case):

```
java -cp streaming_core.jar com.algolytics.streaming.ReloadModel clientId modelId value weight
```
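
If these reloads are triggered from a script, the two commands above could be assembled as argument lists. This is a sketch under assumptions: the helper names and sample argument values are hypothetical, and the meaning of the `value` argument is not explained here, so it is passed through unchanged.

```python
import shlex

JAR = "streaming_core.jar"
PACKAGE = "com.algolytics.streaming"


def reload_metadata_cmd(client_id):
    """Argument list for reloading all metadata of one client."""
    return ["java", "-cp", JAR, f"{PACKAGE}.ReloadMetadata", str(client_id)]


def reload_model_cmd(client_id, model_id, value, weight):
    """Argument list for changing the parameters of a single model."""
    return ["java", "-cp", JAR, f"{PACKAGE}.ReloadModel",
            str(client_id), str(model_id), str(value), str(weight)]


cmd = reload_model_cmd("client1", 1018, 1, 0.5)  # sample values only
printable = shlex.join(cmd)
# To actually run it, execute from the Core directory, for example:
# subprocess.run(cmd, cwd="/path/to/Core", check=True)  # path is a placeholder
```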

### Visualization - EVE Metrics

1. Requirements\
   It needs InfluxDB and Grafana, or Power BI, to work.
2. How it works:\
   Metrics are sent by the application (Core) to the InfluxDB database or to Power BI, and then visualized by Grafana (or Power BI). Statistics are computed for all requests processed by the application within a given period of time. In Grafana, you need to define InfluxDB as the data source from which the metrics are retrieved. The app doesn't connect directly to Grafana.
3. Configuration:\
   The InfluxDB parameters are defined in the configuration file (for the Core application) along with other parameters for metrics:
   * metrics\_destination - INFLUX\_DB if statistics are to be sent to InfluxDB, POWER\_BI if to Power BI, NONE if statistics counting is to be disabled
   * influx\_db\_url - InfluxDB URL, e.g. [http://127.0.0.1:8086](http://127.0.0.1:8086/)
   * influx\_db\_user - username
   * influx\_db\_password - password
   * influx\_db\_database - database name
   * influx\_db\_retention\_policy - how long to keep records for metrics (<https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/#retention-policy-management>)
   * custom\_request\_fields - fields of the incoming event by which metric values are to be aggregated
   * metric\_processed\_times - a threshold in milliseconds; the number of messages handled within that many milliseconds is counted
   * metric\_time\_window - the interval, in seconds, at which the metrics are recalculated for the collected events
   * aggregate\_time\_window - used by MeanScoreMetric, which calculates the average score value in a time window. Defined in seconds; it should not be less than metric\_time\_window.
   * max\_metric\_calculation\_threads - the maximum number of threads used to compute metrics. The number of threads is determined by the number of metrics defined, but it cannot exceed this maximum.
   * event\_request\_metrics - metric names for events of the EVENT type, e.g. ScoreMetric; ProcessedRequestsMetric; WinPrcMetric; BidPrcMetric
   * profile\_request\_metrics - metric names for events of the PROFILE type
4. Available metrics:
   * ProcessedRequestsMetric\
     Presents the number of processed requests in a given ***metric\_time\_window***, along with the number of incorrect requests and the number of requests for which scoring was performed. The collected statistics are aggregated per clientId and per the user-defined fields in the custom\_request\_fields configuration.\
     Fields in JSON sent to Grafana (influx name: *processed\_requests*):
     * processed (number)
     * processed\_time\_\[number from config] (number)
     * scored (number)
     * errors (number)
     * min\_time (number)
     * max\_time (number)
     * mean\_time (number)
     * sum\_time (number)

![C:\gg\ClickAd\obrazki\image2017-1-27 10\_40\_6.png](/files/C49wI585ru93S8kDN5j9)
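
To illustrate how the ProcessedRequestsMetric fields relate to each other, here is a minimal sketch of the aggregation for one metric\_time\_window. It is not the Core implementation: the request keys (`time_ms`, `scored`, `error`) are hypothetical, and it assumes that `processed` counts every request (including incorrect ones) and that `processed_time_N` counts requests handled within N milliseconds.

```python
from collections import defaultdict


def processed_requests(requests, processed_times, custom_fields=()):
    """Aggregate one time window of requests per clientId + custom fields.

    Each request is a dict with hypothetical keys: clientId, time_ms,
    scored (bool), error (bool), plus any custom fields.
    """
    groups = defaultdict(list)
    for r in requests:
        key = (r["clientId"],) + tuple(r.get(f) for f in custom_fields)
        groups[key].append(r)

    result = {}
    for key, items in groups.items():
        times = [r["time_ms"] for r in items]
        row = {
            "processed": len(items),
            "scored": sum(1 for r in items if r["scored"]),
            "errors": sum(1 for r in items if r["error"]),
            "min_time": min(times),
            "max_time": max(times),
            "mean_time": sum(times) / len(times),
            "sum_time": sum(times),
        }
        for limit in processed_times:
            # Assumption: count requests handled within `limit` milliseconds.
            row[f"processed_time_{limit}"] = sum(1 for t in times if t <= limit)
        result[key] = row
    return result


window = [
    {"clientId": "c1", "time_ms": 4, "scored": True, "error": False},
    {"clientId": "c1", "time_ms": 12, "scored": False, "error": True},
    {"clientId": "c1", "time_ms": 8, "scored": True, "error": False},
]
stats = processed_requests(window, processed_times=[10])
```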

* MeanScoreMetric\
  The metric averages the score for each clientId and modelId, and for the user-defined fields in the custom\_request\_fields configuration. The time window size is set by aggregate\_time\_window, and the window is shifted every metric\_time\_window.\
  In the programmatic version, the suggested bidding price is returned instead of the score.\
  Fields in JSON sent to Grafana (influx name: *score*):
  * min\_score (number)
  * max\_score (number)
  * mean\_score (number)
  * sum\_score (number)
  * scores\_count (number)
  * modelId (text)

![C:\gg\ClickAd\obrazki\image2017-1-27 10\_51\_9.png](/files/nJ3ClkMqBnVnAiUjCbm8)
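
The interplay of aggregate\_time\_window and metric\_time\_window in MeanScoreMetric can be sketched as a sliding window. This is an illustration under assumptions, not the actual implementation: the function name, the `window_end` field and the half-open window boundaries are hypothetical, and the scores are for a single clientId/modelId pair.

```python
def mean_score_windows(scores, aggregate_time_window, metric_time_window, end_time):
    """scores: list of (timestamp_seconds, score) for one clientId/modelId.

    Every metric_time_window seconds, the scores from the last
    aggregate_time_window seconds are summarised.
    """
    if aggregate_time_window < metric_time_window:
        raise ValueError("aggregate_time_window should not be less than metric_time_window")
    out = []
    t = metric_time_window
    while t <= end_time:
        start = t - aggregate_time_window
        vals = [s for ts, s in scores if start < ts <= t]
        if vals:
            out.append({
                "window_end": t,
                "min_score": min(vals),
                "max_score": max(vals),
                "mean_score": sum(vals) / len(vals),
                "sum_score": sum(vals),
                "scores_count": len(vals),
            })
        t += metric_time_window
    return out


scores = [(5, 0.2), (15, 0.4), (25, 0.9)]
windows = mean_score_windows(scores, aggregate_time_window=20,
                             metric_time_window=10, end_time=30)
```

Because the aggregation window is longer than the step, consecutive windows overlap, and a single score can contribute to more than one reported average.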

* WinPrcMetric\
  Metric used only in the programmatic version, regarding the price per won view\
  Fields in json sent to Grafana (influx name: *win\_prc*):
  * min\_win\_prc (number)
  * max\_win\_prc (number)
  * mean\_win\_prc (number)
  * sum\_win\_prc (number)
  * count\_win\_prc (number)
* BidPrcMetric\
  Metric used only in the programmatic version, concerning the bid price (taken from bid response)\
  Fields in json sent to Grafana (influx name: *win\_prc*):
  * min\_bid\_prc (number)
  * max\_bid\_prc (number)
  * mean\_bid\_prc (number)
  * sum\_bid\_prc (number)
  * count\_bid\_prc (number)
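
The constraints stated for the metric configuration parameters can be checked with a short sketch like the one below. It assumes the configuration has already been read into a dictionary of strings (the file format is not described here), uses hypothetical sample values, and interprets "the number of threads is determined by the number of metrics defined" as one thread per metric in event\_request\_metrics, capped by max\_metric\_calculation\_threads.

```python
def metric_settings(config):
    """Derive metric settings from a dict of configuration values (strings)."""
    metric_window = int(config["metric_time_window"])
    aggregate_window = int(config["aggregate_time_window"])
    if aggregate_window < metric_window:
        raise ValueError("aggregate_time_window should not be less than metric_time_window")

    # Metric names are separated by semicolons, as in the event_request_metrics example.
    names = [n.strip() for n in config["event_request_metrics"].split(";") if n.strip()]

    # Assumption: one thread per defined metric, capped by the configured maximum.
    threads = min(len(names), int(config["max_metric_calculation_threads"]))
    return {
        "metrics": names,
        "threads": threads,
        "metric_time_window": metric_window,
        "aggregate_time_window": aggregate_window,
    }


# Hypothetical sample values, for illustration only.
sample = {
    "metrics_destination": "INFLUX_DB",
    "metric_time_window": "10",
    "aggregate_time_window": "60",
    "max_metric_calculation_threads": "2",
    "event_request_metrics": "ScoreMetric; ProcessedRequestsMetric; WinPrcMetric; BidPrcMetric",
}
settings = metric_settings(sample)
```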

[^1]: e-abm.com or local ABM script call in Advanced Miner

[^2]: Relational database as part of the Algolytics analytics platform (<http://algolytics.pl/wp-content/uploads/docs\\_en/bk01pt05.html>)

