Event Engine [administrator]

Event engine – functional specification of the solution

Purpose of the solution

  • Efficient real-time event stream processing and on-line scoring based on analytical models using information from these event streams

  • The solution is scalable and configurable, which allows it to be used in various business domains (including gaming, recommendations, web analytics, and IoT, e.g. processing event streams from device sensors)

Scope of the solution

It is a complete system for processing event streams on-line (event aggregation and real-time scoring) and off-line (event aggregation to automatically build analytical models). The scope of the solution includes:

  • Efficient handling of event streams from multiple clients at the same time

  • Writing individual (raw) events to a repository for off-line aggregation and creating analytic tables for modelling

  • Storing the state of off-line users

  • Aggregate counting – a module used for counting aggregates off-line (for building models) and on-line (for on-line scoring), connected to the processing path in the selected environment

  • Automatic creation of analytical models (using ABM)

  • Automatic deployment of new models for on-line scoring (via metadata)

  • On-line scoring – checking the conditions that trigger scoring with individual models, triggering scoring and returning a response to the customer

Solution assumptions

The client application sends messages (events) as JSONs over an HTTP connection (REST API). Events are fed into the event engine via a Kafka queue. Each event is written to the event repository to enable off-line processing.

Then:

In the on-line version:

  • The event is converted into variables (definition in metadata)

  • The user's aggregate values using these variables are refreshed

  • The scoring conditions for each model are checked (conditions that trigger scoring and conditions that check whether a given user should be scored with a given model)

  • For each model for which the conditions are met, a row of data is prepared for recalculation (based on the model description in the metadata)

  • Scoring is triggered

  • The response is returned to the client

In the off-line version (triggered every set period of time, automated process):

  • For each customer and model, aggregates are counted based on messages stored in a text file

  • For each user, 1 row can be created in the resulting analytic table containing the counted aggregates and the value of the target variable. For some users, the row will not be created, because:

    • The scoring condition will not be met

    • The conditions for calculating the target window will not be met (e.g. the target window will be exactly 3 days, and there are only 2 days of history in the data)

    Note: in the case of programmatic, multiple rows can be created for each user, because input JSONs (bid requests) can actually contain several bid requests for different impressions. In that case, as many rows are created for the user as there are impression ids.

  • A separate analytical table is created for each model

  • The analytical table is the input for the process that calculates the models

  • Selected models are automatically deployed (scoring code and information about the variables used are saved to metadata)

Scheme of operation of the system

Messages

  1. The client application sends messages (events) in the form of jsons over an http connection

    • The http connection sends messages in batches (in particular, a batch can contain a single message)

    • Data encryption: SSL (can be disabled by setting it in the http server configuration file)

  2. Messages can come from multiple sources (e.g., game servers, users)

    • Differentiation by client_id

    • The application is configurable for specific customers by defining dedicated metadata (variables, aggregates, models)

  3. The order in which messages are processed is maintained, based on the time the event arrives in the system

  4. Event json format

    • Generic formats – for these we provide efficient processing

{"arrival_ts":1232344,
 "client_id":1,
 "user_id":5,
 "event_id":23,
 "eventType1":
  {"eventA":{"value":10, "name":"AAA"},
  "value":"value1"
  }
}
  • Variable example: $.['eventType1'].['eventA'].['value'] returns 10

  • Variable example: $.['eventType1'].['value'] returns "value1"

  • In addition, queries containing conditions that check for equality ("==") combined with "&&" are optimized: $.['eventType1'].['eventA'].[?(@.['value'] == 10 && @.['name'] == "AAA")].value returns [10], or: $.['eventType1'].[?(@.['eventA'].['value'] == 10 && @.['eventA'].['name'] == "AAA")].value returns ["value1"]. Filters using "==" and "&&" can appear at different levels of the query

  • Lists of values of the "category" type are also optimized, e.g. ["A1", "A2", "A3"]: we pull out a category variable that is a list, but ultimately want to build aggregates that count the number of events with a given value in the list, e.g. A1_cnt_all, A2_cnt_all, A3_cnt_all:

{"arrival_ts":1232344,
 "client_id":1,
 "user_id":5,
 "event_id":23,
 "category":["A1", "A2", "A3"]
}
  • Arbitrary formats, compliant with JSONPath (slower processing) – the analyst adds rules for converting an event to a variable to the metadata (variables table; see Definition of target):

{"arrival_ts":1232344,
 "client_id":1,
 "user_id":5,
 "event_id":23,
 "name":"eventType1",
 "type":
  [
    {"event":"eventA",
     "value":10
    },
    {"event":"eventB",
     "value":45
    }
  ]
}
  • Example using a regular expression:

$.[? (@.name =~ /.*eve.*/i)].['type'].[1].['value'] returns [45]
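
To illustrate how such definitions evaluate against an event, here is a minimal sketch assuming the Jayway JsonPath library; the engine ships its own parser (documented in JsonPath_README.md), so filter semantics may differ, and the bracket notation below is equivalent to the $.['x'] form used above:

import com.jayway.jsonpath.JsonPath;

public class JsonPathDemo {
    public static void main(String[] args) {
        String event = "{\"arrival_ts\":1232344,\"client_id\":1,\"user_id\":5,"
                + "\"event_id\":23,\"eventType1\":{\"eventA\":{\"value\":10,\"name\":\"AAA\"},"
                + "\"value\":\"value1\"}}";

        // $.['eventType1'].['eventA'].['value'] -> 10
        Integer inner = JsonPath.read(event, "$['eventType1']['eventA']['value']");

        // $.['eventType1'].['value'] -> "value1"
        String outer = JsonPath.read(event, "$['eventType1']['value']");

        System.out.println(inner + " / " + outer); // 10 / value1
    }
}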

  5. Response to the customer's message:

  • Score is returned directly in the query response

  • The results include the following fields:

    • userId – The user's ID

    • scores – a list of models and scores; empty if no scoring has occurred

    • modelId – the model's ID

    • score – the value of the score for the model

{"userId":5,
 "scores":
  [
    {"modelId":1, "score":0.16596808075912742},
    {"modelId":2, "score":0.56665555575912789},
    {"modelId":3, "score":0.78954308075912573}
  ]
}
  6. Response to the client message in the case of programmatic:

  • Score is returned directly in response to the bid request

  • The results include a list of suggested bidding prices for each impression:

    • impid – impression id

    • scores – a list of models and scores. The scores object is empty if no scoring has occurred or there is no active deployed model. Otherwise, it contains elements where the key is the model id and the value is the suggested bidding price

[
    {
        "impid": "424c6a8db16d4fa8ab853c5cd7b04ac7",
        "scores": {
            "1314": 0.6930373068263596,
            "1315": 5.24522720421126,
            "1316": 44.87740531791001
            }
        },
    {
        "impid": "e5d26e91ac2441f997e3b1bee6168562",
        "scores": {
            "1314": 0.6930373068263596,
            "1315": 5.24522720421126,
            "1316": 44.87740531791001
            }
        }
]
  • The price is calculated according to the formula: min(10 * value, score * value * weight)

    • score – the score returned by the model

    • value – the value of the variable from the models table (CPC)

    • weight – the weight from the models table (1000 by default, because prices are bid as CPM)
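
A minimal sketch of the pricing formula above; the class and method names are illustrative, not part of the engine's API:

public class BidPriceExample {
    // min(10 * value, score * value * weight), as defined above
    static double bidPrice(double score, double value, double weight) {
        return Math.min(10 * value, score * value * weight);
    }

    public static void main(String[] args) {
        // e.g. a score from the sample response above, CPC value 1.0, default weight 1000
        System.out.println(bidPrice(0.6930373068263596, 1.0, 1000)); // 10.0 (capped at 10 * value)
    }
}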

Aggregate counting module

  1. Counting aggregates in the on-line and off-line versions (in both versions, the aggregates are counted with the same code)

  2. Off-line version

    • Processing is parallelized by user id

    • Aggregates are counted based on the raw events written to the event repository

    • The launch of the off-line version can be scheduled (see Scheduling offline processing)

  3. On-line version

    • Processing is parallelized by user id

    • On-line aggregates are counted and stored in memory (for on-line users)

    • After a set time of user inactivity defined in the configuration file (no messages about a given user), the aggregates are saved to the MongoDB database (the user is logged out)

  4. Types of aggregates

    • Incremental – counted over the whole data history

    • Counted in a window (time-based or with a specific number of messages)

    • Sliding windows (time-based or with a specific number of messages); see the sketch after this list

    • Target windows – used for target calculation, only in the off-line version

    • A single message can belong to multiple windows

    • The list of counted aggregates is defined in the metadata

  5. Aggregate List

    • Number of occurrences, sum, last value, flag if the event occurred, min, max, current value (from the currently processed json)

    • Derivative aggregates defined in the form of expressions in Java (e.g. aggr1 + aggr2)

    • Aggregates resulting from defined dictionaries (described in the section on dictionaries)
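
To make the window-based aggregates concrete, here is a minimal, illustrative sketch of a time-window count aggregate; the class is hypothetical and does not reflect the engine's internal implementation:

import java.util.ArrayDeque;
import java.util.Deque;

public class TimeWindowCount {
    private final long windowSizeMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public TimeWindowCount(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Registers an event and returns the number of events seen in the last windowSizeMs.
    public int onEvent(long eventTimeMs) {
        timestamps.addLast(eventTimeMs);
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= eventTimeMs - windowSizeMs) {
            timestamps.pollFirst();
        }
        return timestamps.size();
    }
}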

How messages are stored

  1. All messages are written to the message repository. If necessary (e.g. customer requirement), a backup can be created

  2. Writing messages to the repository does not block further processing

  3. Repository for storing messages: txt files

Structure and storage of metadata

  1. Metadata is created and stored in a gdbase database (see the Metadata component in the App Description section)

  2. Creating tables:

CREATE TABLE clients(id INTEGER PRIMARY KEY /*, ...*/);

CREATE TABLE external_data(id INTEGER PRIMARY KEY, client_id INTEGER /*, ...*/);

CREATE TABLE variables(
  id INTEGER PRIMARY KEY,
  client_id INTEGER NOT NULL,
  event_id INTEGER NOT NULL CHECK(event_id >= -1),
  definition TEXT NOT NULL,
  definition_type TEXT NOT NULL CHECK(definition_type = 'JSON_PATH' OR definition_type = 'TRANSFORMED'),
  input_id INTEGER CHECK((input_id IS NULL AND definition_type != 'TRANSFORMED') OR (input_id IS NOT NULL AND definition_type = 'TRANSFORMED')),
  category TEXT CHECK(category IS NULL OR definition_type = 'JSON_PATH'),
  type INTEGER NOT NULL CHECK(type = 1 OR type = 2),
  default_value TEXT);
CREATE INDEX ON variables(client_id);
CREATE INDEX ON variables(input_id);

CREATE TABLE aggregates(
  id INTEGER PRIMARY KEY,
  client_id INTEGER NOT NULL,
  variable_id INTEGER,
  aggregate_type INTEGER,
  window_type INTEGER,
  window_size/*count|time*/ INTEGER,
  window_shift/*count|time*/ INTEGER,
  definition TEXT,
  return_type INTEGER CHECK((definition IS NULL AND return_type IS NULL) OR (definition IS NOT NULL AND return_type IS NOT NULL AND (return_type = 1 OR return_type = 2))),
  dictionary_id INTEGER,
  external_data_id INTEGER,
  external_data_name TEXT,
  name TEXT NOT NULL CHECK(name REGEXP '[a-zA-Z_$][a-zA-Z_$0-9]*'));
CREATE UNIQUE INDEX ON aggregates(client_id, name);

CREATE TABLE derived_aggregates(
  derived_aggregate_id INTEGER NOT NULL,
  aggregate_id INTEGER NOT NULL);
CREATE INDEX ON derived_aggregates(derived_aggregate_id);

CREATE TABLE trigger_aggregates(
  trigger_id INTEGER NOT NULL,
  aggregate_id INTEGER NOT NULL);
CREATE INDEX ON trigger_aggregates(trigger_id, aggregate_id);

CREATE TABLE triggers(
  id INTEGER PRIMARY KEY,
  definition TEXT NOT NULL,
  needs_change INTEGER/*boolean*/ NOT NULL CHECK(needs_change = 0 OR needs_change = 1),
  `group` INTEGER NOT NULL);

CREATE TABLE model_triggers(
  model_id INTEGER NOT NULL,
  trigger_id INTEGER NOT NULL);
CREATE INDEX ON model_triggers(model_id, trigger_id);

CREATE TABLE model_aggregates(
  model_id INTEGER NOT NULL,
  aggregate_id INTEGER NOT NULL,
  used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR used = 1));
CREATE INDEX ON model_aggregates(model_id, aggregate_id);

CREATE TABLE trigger_validator_aggregates(
  model_id INTEGER NOT NULL,
  aggregate_id INTEGER NOT NULL);
CREATE INDEX ON trigger_validator_aggregates(model_id, aggregate_id);

CREATE TABLE models(
  id INTEGER PRIMARY KEY,
  client_id INTEGER NOT NULL,
  active INTEGER/*boolean*/ NOT NULL CHECK(active = 0 OR active = 1),
  used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR (active = 1 AND used = 1)),
  target_aggregate_id INTEGER NOT NULL,
  target_type TEXT NOT NULL CHECK((target_type = 'GENERAL' AND trigger_validator IS NULL) OR ((target_type = 'CLICK' OR target_type = 'CLICK_AD') AND target_start = 0 AND target_length = 0 AND target_length_exact = 0)),
  target_start INTEGER NOT NULL CHECK(target_start >= 0),
  target_length INTEGER NOT NULL CHECK(target_length >= 0),
  target_length_exact INTEGER/*boolean*/ NOT NULL CHECK(target_length_exact = 0 OR target_length_exact = 1),
  trigger_validator TEXT,
  scoring_code TEXT CHECK(NOT used OR scoring_code IS NOT NULL),
  saved_state_time INTEGER CHECK(NOT used OR saved_state_time IS NOT NULL),
  first_model_time INTEGER CHECK(NOT used OR first_model_time IS NOT NULL),
  saved_model_time INTEGER CHECK(NOT used OR saved_model_time IS NOT NULL),
  value FLOAT NOT NULL CHECK(value > 0),
  client_value FLOAT,
  positive_target_cnt INTEGER,
  weight FLOAT NOT NULL CHECK(weight > 0 AND weight <= 1000),
  positive_target_ratio FLOAT NOT NULL CHECK(positive_target_ratio >= 0 AND positive_target_ratio <= 1),
  category_id INTEGER CHECK(NOT use_category OR category_id IS NOT NULL),
  use_category INTEGER/*boolean*/ NOT NULL CHECK(use_category = 0 OR use_category = 1),
  end_date INTEGER);
CREATE INDEX ON models(client_id);
-- When changing the CHECK constraints in the models table, remember to make the same changes
-- in the corresponding columns of the default_model table and vice versa (the trigger_validator
-- column in models corresponds to trigger_validator_definition in default_model).

CREATE TABLE default_model(
  id INTEGER PRIMARY KEY,
  client_id INTEGER NOT NULL,
  target_definition TEXT NOT NULL,
  target_aggregates TEXT NOT NULL,
  trigger_definition TEXT NOT NULL,
  trigger_aggregates TEXT NOT NULL,
  trigger_validator_definition TEXT NOT NULL,
  trigger_validator_aggregates TEXT NOT NULL,
  target_type TEXT NOT NULL CHECK((target_type = 'GENERAL' AND trigger_validator_definition IS NULL) OR ((target_type = 'CLICK' OR target_type = 'CLICK_AD') AND target_start = 0 AND target_length = 0 AND target_length_exact = 0)),
  target_start INTEGER NOT NULL CHECK(target_start >= 0),
  target_length INTEGER NOT NULL CHECK(target_length >= 0),
  target_length_exact INTEGER/*boolean*/ NOT NULL CHECK(target_length_exact = 0 OR target_length_exact = 1),
  target_aggregate_name TEXT NOT NULL,
  weight FLOAT NOT NULL CHECK(weight > 0 AND weight <= 1000));

CREATE TABLE dictionary(
  id INTEGER,
  categorical INTEGER/*boolean*/ NOT NULL CHECK(categorical = 0 OR categorical = 1),
  value TEXT CHECK(categorical OR value IS NULL),
  start FLOAT CHECK(NOT categorical OR start IS NULL),
  start_inclusive FLOAT CHECK((start IS NULL AND start_inclusive IS NULL) OR (start IS NOT NULL AND start_inclusive IS NOT NULL AND (start_inclusive = 0 OR start_inclusive = 1))),
  end FLOAT CHECK((NOT categorical OR end IS NULL) AND (start IS NULL OR end IS NULL OR end >= start)),
  end_inclusive FLOAT CHECK((end IS NULL AND end_inclusive IS NULL) OR (end IS NOT NULL AND end_inclusive IS NOT NULL AND (end_inclusive = 0 OR end_inclusive = 1))),
  mapped_value TEXT NOT NULL);
CREATE INDEX ON dictionary(id);

CREATE TABLE model_urls(
  model_id INTEGER NOT NULL,
  client_id INTEGER NOT NULL,
  url TEXT NOT NULL,
  included INTEGER/*boolean*/ NOT NULL CHECK(included = 0 OR included = 1));
CREATE INDEX ON model_urls(model_id, client_id);
  3. Tables structure:

clients: (not used yet)

  • id INTEGER PRIMARY KEY

  • authorization data

  • other customer data, e.g. payments, access restrictions

external_data: (not used yet)

  • id INTEGER PRIMARY KEY

  • client_id INTEGER

  • Data pointing to an external data source

variables: (stores definitions of variables obtained from json)

  • id INTEGER PRIMARY KEY

  • event_id INTEGER - corresponds to the eventId field passed in json

  • definition_type – definition type: JSON_PATH or TRANSFORMED

  • definition TEXT - definition of converting an event to a variable (JSONPath)

    • if definition_type = JSON_PATH then definition contains a JsonPath expression. It should return a numeric or text value. The syntax for expressions is described in the JsonPath_README.md file. For performance reasons, it is best to use only expressions like $.x.y.z. In addition, filters of the form [?(@.a == 'x' && @.b == 3 && @.c == @.d && ...)] are optimized

    • if definition_type = TRANSFORMED then definition contains the full definition of the Java class, which must inherit from DoubleTransformation or StringTransformation (there should be no import of this class). There should be no specified package in the class being defined

  • input_id - for a TRANSFORMED variable, the id of the input variable passed to the class defined in definition. This must be the id of a JSON_PATH variable

  • category - if non-null, then definition should return a list, and the variable will be created if category is present in this list

  • type (numerical, categorical) - variable type defined in DBConstants (VARIABLE_...)

  • default_value – the value of the variable used if definition returns null

aggregates: (stores the aggregate definitions used by models and triggers)

  • id INTEGER PRIMARY KEY

  • variable_id INTEGER - points to the variables table. Null if the aggregate is not produced from the variables table

  • aggregate_type INTEGER - type of aggregate defined in DBConstants (AGGREGATE_...); the list of possible types is hard-coded in Java

  • window_type INTEGER - a type of window defined in DBConstants (WINDOW_...). All variables used by the target must have window_type set to WINDOW_TARGET

  • window_size /*count|time*/ INTEGER - window size as number of event occurrences or time in ms

  • window_shift /*count|time*/ INTEGER - for windows with offset - number of event occurrences or time in ms

  • definition TEXT - a java expression that defines an aggregate based on the values of other aggregates. The derived_aggregates table must list all aggregates used in the definition. Null if the aggregate is not a derived aggregate

  • return_type - type of aggregate defined by definition (DBConstants.VARIABLE_...)

  • dictionary_id - if null then the aggregate does not use the dictionary. If non-null then points to rows from the dictionary table

  • external_data_id INTEGER - not used yet

  • external_data_name TEXT - not used yet - name of the variable in the external data source

  • name TEXT - the name of the aggregate. This name is used in Java expressions, including the definition column in this table (i.e., it should not be a Java keyword); unique per client_id

derived_aggregates: (defines arguments for derived aggregates)

  • derived_aggregate_id INTEGER – identifier of the aggregate from the aggregates table for which arguments are defined

  • aggregate_id INTEGER - aggregate id from the aggregates table being an argument (there can be many for a given derived_aggregate_id)

triggers: (defines the moment of scoring, which is also the moment the training row with the target is created; this table also contains definitions of groups of users scored with the same model)

  • id INTEGER PRIMARY KEY

  • definition TEXT - definition of the trigger variable: a Java expression that uses input variables and returns true or false; it can also be a definition of the user's membership in the model. These variables must be listed in the trigger_aggregates table

  • needs_change INTEGER/*boolean*/ - scoring will occur only if the definition expression returned false on the previous trigger run

  • group INTEGER - the group to which the trigger belongs. If the model defines triggers from several groups and there are several triggers in each group, then scoring will occur if at least 1 trigger in each group indicates that scoring should be performed

models:

  • id INTEGER PRIMARY KEY

  • active - if false then the model is inactive, it cannot be used for modeling or scoring. If a new model is added, but there is no scoring code yet, then active should be true

  • used - if false, the model is not used for scoring (it does not affect the calculation of the table to be modeled)

  • target_aggregate_id - indicates the target. It is used in the construction of the model

  • target_type:

    • if the value is GENERAL then the target_start, target_length, target_length_exact described below will be taken into account

    • if the value is CLICK then:

      • will be taken into account trigger_validator described below

      • When starting offline, it is necessary to enter the positive and negative target values

      • A row is written to the table for the last positive trigger before the positive target occurred, or for the last trigger in the data if there was no positive target

      • To count the aggregates of the target and validator, all messages after the trigger occurrence are taken into account. For each new occurrence of the trigger (if no positive target occurred before it), the aggregates are counted anew (previously counted values are forgotten)

    • If the value is CLICK_AD then the handling is specific to ClickAd, including:

      • When starting offline, it is necessary to enter the positive and negative target values

      • Each message that corresponds to the display of the ad to the user is saved in the table (multiple rows may appear for one user)

      • A specific JSON format is assumed. Among other things, there are fields from com.algolytics.streaming.clickad.Constants.

      • Matching the target to the trigger is based on specific fields in JSON. For a trigger, if userId is null, then target will still be found based on other fields

  • target_start - the time in ms after which the target window starts

  • target_length – window length in ms

  • target_length_exact - a false (0) value means that the target window is counted until the end of the data (but cannot be shorter than target_length)

  • trigger_validator - an expression in Java that must return true after the trigger occurs in order to write the row to the training table. If equal to null, then there is no additional restriction on the written rows. The Validator can use aggregates with any type of window (WINDOW_TARGET and WINDOW_GLOBAL are treated the same)

  • saved_state_time - the time in ms to which the state of users after building the model was counted. Set by apps

  • first_model_time - time in ms of the first saving of the scoring code

  • saved_model_time - time in ms of the last time the scoring code was saved

  • value - a value associated with the model, e.g. CPC (cost per click) in the case of RTB. If we return a score in response to a request, and not a value, then value should be 1

  • client_value - a value related to the model, e.g. the client's CPC in the case of RTB; null if not needed

  • positive_target_cnt - in the case of RTB, the ordered number of positive targets (e.g. clicks) to be generated in the campaign; null if not needed

  • weight - weight; in the case of RTB the bid price is: value * score * weight

  • positive_target_ratio - the ratio of the number of rows with a positive target to all rows in the table used for modeling; until it is calculated, it should be set to 0

  • category_id - campaign category

  • use_category - 1: model built per campaign category, 0: model built per campaign

  • end_date - the time in ms of the end of the campaign (in the case of RTB, the time by which the ordered number of positive targets, positive_target_cnt, is to be achieved)

model_aggregates: (table needed because each model can use multiple aggregates and each aggregate can be used in multiple models)

  • model_id INTEGER - id of the model from models table

  • aggregate_id INTEGER - the id of the aggregate from the aggregates table (there can be many for a given model_id)

  • used - a true (1) value means that the variable is passed to the scoring code; a false (0) value means that the variable will be used to build the model

trigger_aggregates: (defines the trigger arguments; the table is needed because a given trigger variable can use multiple aggregates, and each aggregate can be used by multiple trigger variables)

  • trigger_id INTEGER - id of the trigger from the triggers table

  • aggregate_id INTEGER - the aggregate id from the aggregates table, which is an argument (there can be many for a given trigger_id)

model_triggers: (defines triggers for models; the table is needed because a given trigger variable can be used by multiple models, and each model can use multiple triggers)

  • model_id INTEGER - id of the model from models table

  • trigger_id INTEGER - trigger id from the triggers table (there can be many for a given model_id)

dictionary: (dictionary that maps values from JSON to values passed to the aggregate)

  • id - the id of a group of values (they should not be repeated for the same id) or ranges (they should be disjoint for the same id)

  • categorical - if true, the value is taken as input. If false, start, start_inclusive, end, end_inclusive are taken as input

  • value - a specific value from JSON

  • start - specifies the beginning of the numeric value interval

  • start_inclusive - specifies whether the start of the interval is open or closed

  • end - specifies the end of the range of numeric values

  • end_inclusive - determines whether the end of the interval is open or closed

  • mapped_value - value transferred to the aggregate

default_model: default parameters used when automatically adding new models from the API level

  • client_id - id of a client

  • target_definition - definition of the target derived aggregate (when loading and creating a new model, the string ${model_id} will be replaced with the model_id of the new model)

  • target_aggregates - aggregates needed to count the target (aggregate names separated by commas, e.g.: 'agg1, agg2, agg3')

  • trigger_definition - trigger definition (when loading and creating a new model, the ${model_id} string will be replaced with the model_id of the new model)

  • trigger_aggregates - aggregates needed to count the trigger (aggregate names separated by commas, e.g.: 'agg1, agg2, agg3')

  • trigger_validator_definition - definition of trigger validator (when loading and creating a new model, the string ${model_id} will be replaced with the model_id of the new model)

  • trigger_validator_aggregates - aggregates needed to count trigger validator (aggregate names separated by commas, e.g.: 'agg1, agg2, agg3')

  • target_type - corresponds to the variable target_type in the models table

  • target_start - corresponds to the variable target_start in the models table

  • target_length - corresponds to target_length variable in models table

  • target_length_exact - corresponds to target_length_exact variable in models table

  • target_aggregate_name - name of the target aggregate (when loading and creating a new model, the string ${model_id} will be replaced with the model_id of the new model)

  • weight - corresponds to the weight variable in the models table

model_urls: URLs to be included/excluded when building the model

  • model_id - id of a model

  • client_id - id of a client

  • url - URL of the website (domain)

  • included - if 1 then the page should be included in the modeling, if 0 then excluded

  4. The types of variables, aggregates, and types of available aggregation windows are defined in the DBConstants file:

Types of variables:

public final static int VARIABLE_TYPE_DOUBLE = 1;
public final static int VARIABLE_TYPE_TEXT = 2;

Types of windows:

public final static int WINDOW_TARGET = 0;
public final static int WINDOW_GLOBAL = 1;
public final static int WINDOW_TIME = 2;
public final static int WINDOW_TIME_SLIDE = 3;
public final static int WINDOW_COUNT = 4;
public final static int WINDOW_COUNT_SLIDE = 5;
public final static int WINDOW_CURRENT_TIME = 6;
  • WINDOW_TARGET – a specific window type to define a target variable for the predictive model training process

  • WINDOW_GLOBAL – The window includes all the data history saved in the tool

  • WINDOW_TIME – The window aggregates the data in a window specified by time (given in ms). The length of the window is given in window_size. The window is of the "tumbling window" type

  • WINDOW_TIME_SLIDE - The window aggregates data in a window defined by time (given in ms) and offset by the time specified in the parameter window_shift (in ms). The length of the window is given in window_size. Sliding window

  • WINDOW_COUNT – window defined as an aggregate from window_size events

  • WINDOW_COUNT_SLIDE - window defined as an aggregate of window_size events moved back by window_shift events. Sliding window

  • WINDOW_CURRENT_TIME – a window of length window_size, aggregated in real time

For the above window units, the window_lag parameter allows the window to be moved away by window_lag from the current moment.

  • Tumbling window - events are summarized in fixed-length, non-overlapping time windows. The value changes when the window is closed

  • Sliding window - events are summarized in fixed-length time windows, but in this case the windows overlap each other, so the value is updated more frequently than for the tumbling window

  • Real time window – aggregate values are calculated in real time

Please note that real-time window calculation is computationally expensive, so avoid using it where it is not necessary (only if the application requires a real-time aggregate).

Types of aggregates:

public final static int AGGREGATE_COUNT     	= 1; 
public final static int AGGREGATE_SUM       	= 2;
public final static int AGGREGATE_LASTVALUE 	= 3; //last value based on the order defined by the timestamp field
public final static int AGGREGATE_EXISTS    	= 4; //counts number of events with non null value of a field
public final static int AGGREGATE_MIN       	= 5;
public final static int AGGREGATE_MAX       	= 6; 
public final static int AGGREGATE_CURRENT_VALUE = 7; //last value according to the time of event consumption by EventEngine

High-level events

  1. Events that trigger scoring / events that trigger target counting

  2. Defined in metadata in the form of expressions in Java (triggers table)

  3. Example: "(aggr1 == 5 && aggr2 == 8) || (aggr1 < 4 && aggr2 == 1)"
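
Conceptually, such a definition is evaluated as a boolean Java expression over the current aggregate values; a minimal illustrative sketch (only the expression itself comes from the example above, the surrounding class is hypothetical):

public class TriggerSketch {
    // The trigger expression from the example, over two aggregates
    static boolean trigger(double aggr1, double aggr2) {
        return (aggr1 == 5 && aggr2 == 8) || (aggr1 < 4 && aggr2 == 1);
    }

    public static void main(String[] args) {
        System.out.println(trigger(5, 8)); // true  -> scoring / target counting is triggered
        System.out.println(trigger(4, 1)); // false -> nothing happens
    }
}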

Definition of target

  1. Target is defined as an aggregate in the aggregates table with a special window type window_type = 0. Then the aggregate id should be entered in the models table

  2. An example of the aggregates table – a count aggregate as target:

id, variable_id, aggregate_type, window_type, name, …
1, 1, 1, 0, D_exists_all

  3. Example form of the models table:

id, used, target_aggregate_id, target_start, target_length, target_length_exact, …
1, 1, 1, 0, 0, 0

  4. In the example above, the target window is counted from the next event after the event setting the trigger to true (target_start = 0) and is counted until the end of the data (target_length = 0 and target_length_exact = 0)

  5. Example:

    • The target is the aggregate D_exists_all taking the value 1 if the event "D" occurred in the given target window and 0 if it did not occur

    • If target_start = 0, then the target window is counted from the next event after the trigger event is true. Otherwise, the target window starts counting target_start milliseconds after the trigger event occurs

    • The target window has a length of target_length in milliseconds. If target_length = 0, then all messages from the trigger occurrence to the end of the data are taken into account to calculate the target value

    • If target_length > 0 and target_length_exact = 1, it means that if there is no data for the entire window length period, the target will not be counted and the row for the given user will not appear in the resulting table with aggregates
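
The window arithmetic can be summarized in a small sketch; this is illustrative only (a GENERAL target type is assumed, and the class and method are hypothetical):

public class TargetWindowSketch {
    // Checks whether an event falls into the target window of a trigger
    // that fired at triggerTimeMs.
    static boolean inTargetWindow(long eventTimeMs, long triggerTimeMs,
                                  long targetStart, long targetLength) {
        long windowStart = triggerTimeMs + targetStart;
        // targetLength = 0 means the window extends to the end of the data
        long windowEnd = (targetLength == 0) ? Long.MAX_VALUE : windowStart + targetLength;
        return eventTimeMs > windowStart && eventTimeMs <= windowEnd;
    }
}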

Converting Events to Variables

  1. Defined in metadata in the form of JSONPath rules; JSONPath is a library that allows searching in JSON (variables table)

  2. See the example in the Messages section

  3. Derived variables: it is possible to define derived variables. The definition of such a variable takes the form of a Java class:

    • In the variables table, define a variable of the type: definition_type = TRANSFORMED

    • In the definition field, enter the full definition of the Java class, which must inherit from DoubleTransformation or StringTransformation (there should be no import of this class). There should be no specified package in the class being defined.

    • For a variable of type TRANSFORMED, type the id of the input variable passed to the class defined in the definition_type in the input_id field. This id must be a variable of type JSON_PATH.

    • Example 1 – a variable returning the domain from a url:

      • Input variable:

        • id = 1

        • definition_type = JSON_PATH

        • definition = $.['url']

      • Derived variable:

        • id = 2

        • input_id = 1

        • definition_type = TRANSFORMED

        • definition:

public class Domain extends StringTransformation {
    public Object transform(String s) {
        // Start of the domain: just after "//" if the URL has a scheme, otherwise index 0
        int startInd = s.indexOf("//");
        // End of the domain: the first "/" after the scheme separator, otherwise end of string
        int endInd = s.indexOf("/", startInd + 2);
        return s.substring(startInd > -1 ? startInd + 2 : 0,
                endInd > -1 ? endInd : s.length());
    }
}
    • Example 2 – a variable returning the day of the week based on the time in ms, by default the time field in json. The name of the field in json denoting time can be configured in config (jsonTimeName field):

      • Input variable:

        • id = 1

        • definition_type = JSON_PATH

        • definition = $.['time']

      • Derived variable:

        • id = 2

        • input_id = 1

        • definition_type = TRANSFORMED

        • definition:

import java.util.Calendar;

public class DayOfWeek extends DoubleTransformation {
    Calendar calendar;

    // Tells the engine that this transformation needs a Calendar instance
    public boolean needsCalendar() {
        return true;
    }

    public void setCalendar(Calendar calendar) {
        this.calendar = calendar;
    }

    public Object transform(double d) {
        calendar.setTimeInMillis((long) d);
        return calendar.get(Calendar.DAY_OF_WEEK);
    }
}

State storage for off-line users

  1. For users who are currently off-line, aggregate values are stored.

  2. Database for storing state: MongoDB

Off-line event processing and modeling

  1. The off-line aggregate counting and modeling process can be repeated at pre-set intervals (triggered automatically by offline scheduling or manually on demand)

  2. For each model, an analytical table is created containing 1 row for each user for whom the trigger (the condition that triggers the scoring of a given model) has been met and the target has been calculated. Note: in the case of programmatic, multiple rows can be created for each user, because input JSONs (bid requests) can actually contain several bid requests for different impressions. In that case, as many rows are created for the user as there are impression ids.

  3. For each analytical table, the ABM process is run, and the finished models (scoring code) and information about the variables used (model signature) are saved to the engine metadata

  4. The aggregate table is the input for the ABM process, which automatically selects variables and calculates the optimal model. The method that invokes off-line processing either calls the ABM API method that builds the model in a single request (http://e-abm.com/api_documentation.html#resources-models-create-models-in-one-request) or invokes a local ABM script that is executed by the Advanced Miner tool

  5. The method that invokes off-line processing also allows the table with aggregates to be saved for further analysis or manual modeling by an analyst. In the call, specify the target gdbase alias and the name under which the table should be saved

Models

  1. Model information is stored in the metadata in the models table

  2. Models used for on-line scoring have the used = 1 flag and the active = 1 flag set. If the model has not yet been built, but is active, then it only has the active = 1 flag

  3. The models table stores the scoring code as a string

  4. Deployment of the new model

    • The new model, after recalculation, is deployed automatically – the new model overwrites the old model with the same id

    • If the model uses different variables than the previous one, all the necessary aggregates have to be recalculated backwards so that their state is current. Aggregates are added iteratively to ensure the lowest possible latency of processing the first message with the new model: first, the necessary aggregates are counted on the stored raw messages; in the meantime new messages may have arrived, so the next iteration updates the aggregate values with those additional messages. The process is repeated several times (see the sketch below)
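
A minimal, illustrative sketch of this iterative backfill loop; the repository interface and names are hypothetical, not the engine's actual classes:

public class AggregateBackfill {
    interface EventRepository {
        // Recomputes the new model's aggregates from raw events in [fromMs, toMs)
        void recomputeAggregates(long fromMs, long toMs);
    }

    static void backfill(EventRepository repo, int iterations) {
        long from = 0;
        for (int i = 0; i < iterations; i++) {
            long to = System.currentTimeMillis();
            repo.recomputeAggregates(from, to); // catch up on events stored so far
            from = to; // the next pass covers events that arrived in the meantime
        }
    }
}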

Scoring

  1. Scoring is triggered by the occurrence of a high-level event. Events are defined in the metadata (triggers table)

  2. Different events can trigger scoring with different models (assigning high-level events to models in the meta)

  3. Different groups of users can be scored with different models. User group definitions are in the form of expressions defined in the same way as scoring triggers (also in the triggers table)

  4. A given user can be scored by multiple models at once (i.e. one event triggering scoring can be assigned to multiple models: table model_triggers)

  5. The scoring code is stored as a string in the models table

  6. In the case of programmatic, the modeling table is built at the level of the unique user id and the impression id, so scoring is also at the level of impressions. Events that trigger scoring (bid requests) can actually contain several bid requests (a list of impressions). In this case, at the beginning of processing, the event is divided into several events (one for each impression), and those events are scored

Dictionaries

In order to define aggregates more easily, dictionaries can be used. Dictionaries can be defined in the metadata in the dictionary table. Then, when defining the aggregate, the dictionary_id of the appropriate entry in the dictionary should be provided.

Example of use:

Variable: variable1 takes the values 1, 2, 3, 4, 5 and has id 1 in the variables table

Dictionary:

variable1 (string/number), cat1, cat2
1, D, AA
2, F, AA
3, D, BB
4, F, BB
5, D, AA

We want to count the following aggregates:

variable1_D_count_all
variable1_F_count_all
variable1_AA_count_all
variable1_BB_count_all

variable1_D_AA_count_all
variable1_D_BB_count_all
variable1_F_AA_count_all
variable1_F_BB_count_all
variable1_F_BB_sum_all

Form of the dictionary table:

id, categorical, value, start, start_inclusive, end, end_inclusive, mapped_value
1, 1, 1, null, null, null, null, D
1, 1, 3, null, null, null, null, D
1, 1, 5, null, null, null, null, D
2, 1, 2, null, null, null, null, F
2, 1, 4, null, null, null, null, F
3, 1, 1, null, null, null, null, AA
3, 1, 2, null, null, null, null, AA
3, 1, 5, null, null, null, null, AA
4, 1, 3, null, null, null, null, BB
4, 1, 4, null, null, null, null, BB
5, 1, 1, null, null, null, null, D_AA
5, 1, 5, null, null, null, null, D_AA
6, 1, 3, null, null, null, null, D_BB
7, 1, 4, null, null, null, null, F_AA
8, 1, 5, null, null, null, null, F_BB

Form of the aggregates table (selected columns):

variable_id, aggregate_type, dictionary_id, name, …
1, 1, 1, D_count_all
1, 1, 2, F_count_all
1, 1, 3, AA_count_all
1, 1, 4, BB_count_all
1, 1, 5, D_AA_count_all
1, 1, 6, D_BB_count_all
1, 1, 7, F_AA_count_all
1, 1, 8, F_BB_count_all
1, 2, 8, F_BB_sum_all
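
A minimal, illustrative sketch of how a categorical dictionary maps raw values before a count aggregate is applied; the map and class are hypothetical, using dictionary id 1 from the table above:

import java.util.Arrays;
import java.util.Map;

public class DictionaryDemo {
    // Dictionary id 1 from the table above: values 1, 3 and 5 map to "D"
    static final Map<Integer, String> DICT_1 = Map.of(1, "D", 3, "D", 5, "D");

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5}; // observed values of variable1
        long dCountAll = Arrays.stream(values)
                .filter(v -> "D".equals(DICT_1.get(v)))
                .count();
        System.out.println("D_count_all = " + dCountAll); // 3
    }
}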

App Description

  1. Components:

    • Core - processes messages and returns a score. It must use the JDK (the JRE is not enough). The application allows you to run multiple Core processes, which allows you to handle higher traffic volumes. Running multiple Core processes also prevents the garbage collector from stopping the application for a long time: a single process takes up less memory, so a garbage collector run is faster. With multiple Core processes, each processes a certain pool of users resulting from the partitions created on Kafka. Partitioning is by the hash of the user id.

  • HTTPServer - receives queries, sends them to the Core server, receives the result and returns it to the user. A detailed description of the supported queries can be found in the API and API for adding models sections.

  • Metadata - a gdbase server is needed, with the metadata tables created and populated. Table definitions are in the metadata.sql file (in the tool's sources).

  • MongoDB - a MongoDB database is needed.

  • Kafka - Kafka is needed, with 2 topics created (the topic names are set in the configuration files, keys kafka_request_topic and kafka_response_topic). Authorization must be disabled in Kafka. If you run multiple Core processes, you need to create the kafka_request_topic with a number of partitions equal to the number of processes (e.g.: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 4 --topic request) and set kafka_request_topic_partitions to this value before running them.

  • InfluxDB and Grafana – used to collect and visualize statistics on-line. In influx, information about incoming events is collected, e.g. the number of events of various types, the number of scorings, processing times, etc.

  2. Configuration: the application is configured with settings in the config.properties files (separate for Core and for HTTPServer). The file is loaded from the current directory. Keys starting with kafka_consumer_ are passed to the Kafka consumer; the key passed to Kafka is the part of the key in the config.properties file after removing the kafka_consumer_ prefix. In the same way, keys starting with kafka_producer_ are passed to the Kafka producer. By using these prefixes, you can set any keys for Kafka, not just those that are in the provided config.properties.example.

  3. Component startup: running the install.sh file (from the deploy directory) will cause all components to start at boot. Commands such as service gdbase restart will also be available. All components must be installed first. If the components are installed in directories other than those used by the *.service files, you need to change the paths in these files. Default paths:

AM: /usr/AdvancedMiner
Kafka: /usr/Kafka_2.11-0.9.0.1
streaming: /usr/streaming
  4. Starting the model build:

    • Building a model (i.e. off-line processing) should be run from the directory in which the streaming_core.jar is

    • It should be started with the appropriate parameters: nice -n 19 ionice -c 3 java -cp streaming_core.jar com.algolytics.streaming.Offline ... A list of all parameters, including the required ones, is displayed on the console or in the logs (default: Log/offline.log). The nice + ionice settings ensure that off-line processing does not burden on-line processing.

  • If ABM is run locally (by specifying the abmScript option), then:

    • before running you need to set LD_LIBRARY_PATH/LIBPATH/PATH analogously to what is done in the AdvancedMiner launcher scripts

    • you need to configure AdvancedMiner, e.g. it is worth setting a higher MAX_SCRIPT_EXECUTOR_HEAP_SIZE value

  • Offline call parameters:

    • modelIds – list of model ids

    • startTime – time in ms from when to count the table to be modeled

    • endTime – time in ms until when to count the table to be modeled

    • startDelay – offset in ms from when to count (current time – startDelay)

    • endDelay – offset in ms until when to count (current time – endDelay)

    • stateStartTime – time in ms from when to calculate the state

    • stateStartDelay – offset in ms from when to calculate the state (current time – stateStartDelay)

    • copyURL – alias to gdbase, if given then the resulting table will be copied there

    • copyTablePrefix – prefix for the table name if it is to be copied to gdbase

    • copyUser – gdbase user

    • copyPassword – gdbase password

    • processMethod – method type (approximation, gold, quick, advanced)

    • positiveTargetValue – value denoting a positive target

    • negativeTargetValue – a value denoting a negative target

    • qualityMeasureName – quality measure (as in ABM)

    • cutoff – cutoff threshold for score (as in ABM)

    • samplingMode – type of sampling (as in ABM)

    • samplingSize – sample size (as in ABM)

    • samplingStratificationMode – stratification type (as in ABM)

    • samplingPositiveTargetCategoryRatio – percentage of positive target at stratification (as in ABM)

    • classificationThreshold – threshold (as in ABM)

    • classificationThresholdType – threshold type (as in ABM)

    • profitMatrixOptimized

    • profitMatrixCurrency

    • profitMatrixTruePositive

    • profitMatrixFalseNegative

    • profitMatrixFalsePositive

    • profitMatrixTrueNegative

    • useTestData

    • threadCount – the number of threads

    • abmScript – path to the ABM script

    • abmAuthToken – token (when calculating ABM web)

    • userType – user type

    • stateUserType

    • minROCArea – min ROC for the model to be implemented

    • minimumPositiveTargets – the minimum number of positive targets to build a model (programmatic)

    • useModelUrls – if true, filter URLs based on model_urls (programmatic)

    • triggerEventInTarget – if true, include the event that fired the trigger when counting the target window

  5. Detailed recommendations for the installation and configuration of individual application components can be found in the sources (deploy directory).

API

  1. Sending a JSON with events. Example file with a batch of messages from the client (test.json):

[
{"userId":1,"eventId":1,"time":1463988845701,"zm1":5,"zm2":8},
{"userId":1,"eventId":1,"time":1463988845709,"zm1":55,"zm2":55}
]

Sample client request:

wget -q -O - --post-file=test.json "https://localhost:8321/event?appid=3697"

Sample response:

[
{"userId":1,"scores":[{"modelId":1,"score":1.0}]},
{"userId":1,"scores":[{"modelId":1,"score":0.6}]}
]

Parameters:

  • appid – Customer ID (mandatory: YES)

  • time – time in ms (the field name can be changed in the configuration file, jsonTimeName field) (mandatory: YES)

  • eventId – event type (the field name can be changed in the configuration file, jsonEventIdName field) (mandatory: YES)

  • userId – user ID (the field name can be changed in the configuration file, jsonUserIdName field) (mandatory: YES)

Error codes:

  • 403 Forbidden – no appid

  • 200 with a JSON containing an "error" field, e.g. {"error": "Error during json parsing"} – processing error (e.g. during JSON parsing)

  2. Query for the current profile (list of aggregates) of the selected user

Sample client request:

wget -q -O - "https://localhost:8323/profile?appid=3697&userid=AB123"

Sample response:

{"agg1_last_value_all":5.0,"agg2_last_value_all":"AAA","agg3_cnt_all":3.0," agg3_sum_all":2330.0}

Parameters:

  • appid – Customer ID (mandatory: YES)

  • userid – User ID (mandatory: YES)

Error codes:

  • 403 Forbidden – no appid

  • 400 Bad request – no userid

API for adding models

  1. Activating the API for adding models is done by setting the variable enableModelsApi = true in the configuration file (for Core applications)

  2. Adding a model

Sample client request:

wget -qO- --post-data='{"use_category":0, "value":0.6, "category_id":0.6, "client_value":0.6, "positive_target_cnt":0.6, "excluded":["
www.wp.pl
", "
www.onet.pl
"]}' "http:// localhost:8323/models?appid=3697&modelid=123"

Sample response:

{} – no error

Parameters:

  • appid – Customer ID (mandatory: YES)

  • modelid – Model ID (mandatory: YES)

  • value – in programmatic, the CPC (mandatory: YES)

  • client_value – in programmatic, the customer's CPC (mandatory: NO)

  • use_category – if 1, the model will be built on categories (then category_id must be provided); if 0, the model is built on campaigns (mandatory: YES)

  • category_id – model category ID (in the case of programmatic, this is the campaign category) (mandatory: YES if use_category = 1)

  • positive_target_cnt – in programmatic, the number of clicks ordered (mandatory: NO)

  • excluded or included – a list of urls to exclude from modeling (if excluded) or to take into account in modeling (if included); there can be either an included or an excluded field, not both (mandatory: NO)

Error codes:

  • 403 Forbidden – no appid

  • 404 Not Found – no modelid

  • 400 Bad request – wrong modelid format

  • 200 with {"error": "Parameter use_category must be set and must be an integer (0 or 1)"} – use_category not set or in the wrong format

  • 200 with {"error": "Parameter value must be set and must be numeric"} – value not set or in the wrong format

  • 200 with {"error": "Parameter category_id must be provided if use_category = 1"} – category_id not set while use_category = 1

  • 200 with {"error": "Exactly one parameter must be set (either included or excluded)"} – none or both of the excluded and included fields provided

  3. Modifying model parameters

The same query as when adding a model, but in addition to the mandatory parameters, only the modified ones should be specified.

Sample client request:

wget -qO- --post-data='{"use_category":0, "value":1.6}' "http:// localhost:8323/models?appid=3697&modelid=123"

Sample response:

{} – no error

Parameters and error codes: the same as for adding a model.

  4. Deactivating a model

Sample client request:

wget -qO- --method=DELETE "http://localhost:8323/models?appid=3697&modelid=123"

Sample response:

{} – no error

Parameters:

  • appid – Customer ID (mandatory: YES)

  • modelid – Model ID (mandatory: YES)

Error codes:

  • 403 Forbidden – no appid

  • 404 Not Found – no modelid

  • 400 Bad request – wrong modelid format

  • 200 with {"error": "model 111 does not exist."} – there is no model with the given id

  5. Adding URLs to exclude/include during modeling (for programmatic)

Sample client request:

wget -qO- --post-data='{"excluded":["
www.wp.pl
", "
www.onet.pl
"]}' "http:// localhost:8323/urls?appid=3697&modelid=123"

Sample response:

{} – no error

Parameters:

  • appid – Customer ID (mandatory: YES)

  • modelid – Model ID (mandatory: YES)

  • excluded or included – a list of urls to exclude from modeling (if excluded) or to take into account in modeling (if included); there can be either an included or an excluded field, not both (mandatory: YES)

Error codes:

  • 403 Forbidden – no appid

  • 404 Not Found – no modelid

  • 400 Bad request – wrong modelid format

  • 200 with {"error": "Exactly one parameter must be set (either included or excluded)"} – none or both of the excluded and included fields provided

  6. Retrieving information about the selected model

Sample client request:

wget -qO- --get "http:// localhost:8323/models?appid=3697&modelid=115"

Sample response:

{
"model_id": 115,
"active": 1,
"used": 0,
"first_model_time": null,
"saved_model_time": null,
"value": 6,
"client_value": 0.3,
"positive_target_cnt": 10000,
"category_id": 9,
"use_category": 0,
"end_date": "1970-01-01 00:00:00"
}

Parameters:

  • appid – Customer ID (mandatory: YES)

  • modelid – Model ID (mandatory: NO – if omitted, all models will be returned)

Error codes:

  • 403 Forbidden – no appid

  • 400 Bad request – wrong modelid format

  • 200 with {} – there is no such model

  7. Retrieving information about all models

Sample client request:

wget -qO- --get "http:// localhost:8323/models?appid=3697"

Sample response:

[
{
"model_id": 111,
"active": 1,
"used": 0,
"first_model_time": null,
"saved_model_time": null,
"value": 6,
"client_value": 0.2,
"positive_target_cnt": 10000,
"category_id": 5,
"use_category": 1,
"end_date": null
},
{
"model_id": 112,
"active": 1,
"used": 0,
"first_model_time": null,
"saved_model_time": null,
"value": 6,
"client_value": 0.2,
"positive_target_cnt": 10000,
"category_id": 0,
"use_category": 0,
"end_date": "2018-02-28 04:00:02"
}
]

Parameters:

  • appid – Customer ID (mandatory: YES)

Error codes:

  • 200 with [] – there are no models for such an appid

  8. Retrieving the list of used/unused active models

Sample client request:

wget -qO- " "

Sample response:

[1018,1312,1314,1315,1319,1355,1378,1379,1391,1398]

Parameters:

  • appid – Customer ID (mandatory: YES)

  • used – 1: list of models used in on-line scoring (active=1 and used=1 in the metadata); 0: list of active models with used = 0 (e.g. the model has not been built yet)

Error codes:

  • 200 with [] – there are no models for such an appid and the given value of used

Scheduling offline processing

  1. It is possible to schedule the process of building models

  2. Offline scheduling is activated by setting the enableOfflineScheduler = true variable in the configuration file (for the Core application)

  3. Scheduling is done by calling the API request to add models (point API for adding models).

  4. The first time you add a model, the build job is scheduled for the next day. If the build is successful and the model is deployed (a sufficient number of positive targets, adequate model quality based on ROC), the next build is scheduled one week later.

  5. Currently, offline processes run serially (this is set in the code in the configuration of the Quartz library used for scheduling)

Metadata reload while the app is running

  • While the application is running, you can manually reload the metadata by calling the following command from the Core directory:

java -cp streaming_core.jar com.algolytics.streaming.ReloadMetadata clientId
  • You can also modify the parameters of a single model (including the weight parameter that affects the calculated bid price in the programmatic case):

java -cp streaming_core.jar com.algolytics.streaming.ReloadModel clientId modelId value weight

Visualization - EVE Metrics

  1. Requirements: it needs InfluxDB and Grafana, or Power BI, to work.

  2. How it works: metrics are sent by the application (Core) to the InfluxDB database or to Power BI, and then visualized by Grafana (or Power BI). Statistics are counted for all requests that are processed by the application within a certain period of time. In Grafana, you need to define the influx as the DataSource from which the metrics will be retrieved. The app doesn't connect directly to Grafana.

  3. Configuration: The influx parameters are defined in the configuration file (for the Core application) along with other parameters for metrics:

    • metrics_destination - INFLUX_DB if statistics are to be sent to influx, POWER_BI if to Power BI, NONE if statistics counting is to be disabled

    • influx_db_url - influx URL, e.g. http://127.0.0.1:8086

    • influx_db_user - username

    • influx_db_password - password

    • influx_db_database - database name

    • influx_db_retention_policy - how long to keep records for metrics (see https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/#retention-policy-management)

    • custom_request_fields - fields from the incoming event to the engine by which metric values are to be aggregated

    • metric_processed_times - millisecond thresholds; for each configured threshold, the number of messages processed within that many milliseconds is counted (reported as the processed_time_[number] fields)

    • metric_time_window - every so many seconds, the metrics are recalculated for the collected events

    • aggregate_time_window - used by MeanScoreMetric, calculates the average score value in a time window. Defined in seconds, it should not be less than metric_time_window.

    • max_metric_calculation_threads - The maximum number of threads to compute metrics. The number of threads is determined by the number of metrics defined, but it cannot be greater than the maximum number of threads.

    • event_request_metrics - metric names for events of the EVENT type, e.g. ScoreMetric; ProcessedRequestsMetric; WinPrcMetric; BidPrcMetric

    • profile_request_metrics - metric names for events of the PROFILE type

  4. Available metrics:

    • ProcessedRequestsMetric - presents the number of processed requests in a given metric_time_window, along with the number of incorrect requests and those for which scoring was performed. The collected statistics are aggregated per clientId and the user-defined fields in the custom_request_fields configuration. Fields in the JSON sent to Grafana (influx name: processed_requests):

      • processed (number)

      • processed_time_[number from config] (number)

      • scored (number)

      • errors (number)

      • min_time (number)

      • max_time (number)

      • mean_time (number)

      • sum_time (number)

  • MeanScoreMetric - averages the score for each clientId and modelId, and the user-defined fields in the custom_request_fields configuration. The time window size is configured by aggregate_time_window, with an offset every metric_time_window. In the programmatic version, the suggested bidding price is used instead of the score. Fields in the JSON sent to Grafana (influx name: score):

    • min_score (number)

    • max_score (number)

    • mean_score (number)

    • sum_score (number)

    • scores_count (number)

    • modelId (text)

  • WinPrcMetric - metric used only in the programmatic version, regarding the price paid per won impression. Fields in the JSON sent to Grafana (influx name: win_prc):

    • min_win_prc (number)

    • max_win_prc (number)

    • mean_win_prc (number)

    • sum_win_prc (number)

    • count_win_prc (number)

  • BidPrcMetric - metric used only in the programmatic version, concerning the bid price (taken from the bid response). Fields in the JSON sent to Grafana (influx name: bid_prc):

    • min_bid_prc (number)

    • max_bid_prc (number)

    • mean_bid_prc (number)

    • sum_bid_prc (number)

    • count_bid_prc (number)
