Metadata
Metadata is created and stored in GDBase, a relational database that is part of the Algolytics analytical platform.
Tables with metadata are created with the following SQL code:
Table structure
clients
id - id, a primary key
authorization data
other data about a client
external data
id - id, a primary key
client_id
data pointing to an external data source
variables (stores definitions of variables obtained from JSON)
id INTEGER PRIMARY KEY
event_id INTEGER - corresponds with the eventId field in JSON
definition_type - a type of definition: JSON_PATH or TRANSFORMED
definition TEXT - definition of how an event is turned into a variable (JSONPath)
if definition_type = JSON_PATH then definition contains a JsonPath expression. It should return a numerical or text value. The syntax of expressions is described in JsonPath_README.md. For efficiency reasons it is best to use expressions of the form $.x.y.z. Moreover, an optimization has been added for filters of the form
[?(@.a == 'x' && @.b == 3 && @.c == @.d && ...)]
if definition_type = TRANSFORMED then definition contains the full definition of a Java class, which must inherit from DoubleTransformation or StringTransformation (without the import of this class). The defined class should not specify a package
input_id - for a TRANSFORMED variable, the id of the input variable passed to the class given in definition. The id must refer to a JSON_PATH variable
category - if not null, definition should return a list; the variable is created if this list contains category
type (numerical, categorical) - type of the variable defined in DBConstants (VARIABLE_...)
default_value - default value of the variable, used if the definition returns a null value
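The interaction between a plain $.x.y.z definition and default_value can be sketched as follows. This is only an illustration in Python, not the platform's JsonPath engine: the helper names and the event fields other than eventId are hypothetical, and filters are not handled.

```python
import json

def resolve_simple_path(event, path):
    """Resolve a plain $.x.y.z path against a parsed JSON event.

    Returns None when any key along the path is missing.
    Hypothetical helper: filters such as [?(...)] are not supported here.
    """
    if not path.startswith("$."):
        raise ValueError("only plain $.x.y.z paths are handled in this sketch")
    node = event
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def variable_value(event, definition, default_value=None):
    """Apply a JSON_PATH definition, falling back to default_value on null."""
    value = resolve_simple_path(event, definition)
    return default_value if value is None else value

# Example event; the "user" structure is made up for illustration.
event = json.loads('{"eventId": 7, "user": {"device": {"os": "android"}}}')
os_name = variable_value(event, "$.user.device.os", default_value="unknown")
browser = variable_value(event, "$.user.device.browser", default_value="unknown")
```

Here os_name is taken from the event, while browser falls back to the default because the path does not exist.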
aggregates
id - id, a primary key
variable_id - points to an entry in the variables table. If the aggregate is not created from a variable, the value is null
window_size (count/time) - window size, defined as a number of occurrences or as time in milliseconds
window_shift (count/time) - window shift (for windows with shift), defined as a number of occurrences or as time in milliseconds
definition - a Java expression defining the aggregate as a derivative of other aggregates. Every aggregate used in the definition must be listed in the derived_aggregates table. If the aggregate is not a derived aggregate, the value is null
return_type - type of the aggregate defined by the definition column (DBConstants.VARIABLE_...)
dictionary_id - a dictionary from the dictionary table. If null, the aggregate does not use a dictionary
external_data_id - id of external data from external data table
external_data_name - name of variable in external data
name - variable name, used in Java expressions, unique for client_id
derived_aggregates
derived_aggregate_id - id of derived aggregate from aggregates table
aggregate_id - ids of aggregates used in creating the derived aggregate (may be multiple ids for given derived_aggregate_id)
triggers (definition of the scoring moment, i.e. the moment at which a training row with a target is created)
id - id, a primary key
definition - definition of the given trigger variable: a Java expression that uses input variables to calculate the value of the trigger variable. The expression returns true or false; it may also define a user's affiliation with a model. Input variables used in the Java expression must be listed in the trigger_aggregates table
needs_change - scoring occurs only if the definition returned false during the previous evaluation of the trigger
group - the group to which the trigger belongs. If the model defines triggers from different groups, and each group contains multiple triggers, scoring occurs only if at least 1 trigger in every group indicates that scoring must occur
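The combination of needs_change and group described above can be sketched as below. This is a minimal Python illustration under stated assumptions, not the platform's code: the dictionary keys current and previous (the result of the current and the previous evaluation of the definition) are hypothetical names, and a trigger that has never been evaluated is assumed to count as previously false.

```python
from collections import defaultdict

def trigger_fires(current, previous, needs_change):
    """A trigger fires when its definition is true now and, if needs_change
    is set, the previous evaluation returned false (or did not exist)."""
    if not current:
        return False
    return (not previous) if needs_change else True

def should_score(triggers):
    """Scoring occurs only if every group has at least one firing trigger.

    triggers: iterable of dicts with keys group, current, previous, needs_change.
    """
    fired_by_group = defaultdict(bool)
    for t in triggers:
        fired = trigger_fires(t["current"], t["previous"], t["needs_change"])
        fired_by_group[t["group"]] = fired_by_group[t["group"]] or fired
    return bool(fired_by_group) and all(fired_by_group.values())

triggers = [
    {"group": 1, "current": True, "previous": False, "needs_change": True},
    {"group": 1, "current": False, "previous": False, "needs_change": False},
    {"group": 2, "current": True, "previous": True, "needs_change": False},
]
score_now = should_score(triggers)

# Same data, but the only trigger in group 2 now requires a change
# from false to true, so group 2 no longer fires.
stricter = [dict(t) for t in triggers]
stricter[2]["needs_change"] = True
score_stricter = should_score(stricter)
```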
models
id INTEGER PRIMARY KEY
active - if false, the model is inactive and cannot be used for modeling or scoring. If a new model has been added but has no code yet, active should be true
used - if false, the model is not used for scoring (this does not affect the calculation of tables for modeling).
target_aggregate_id - indicates target, used while building a model
target_type:
if it has the GENERAL value, target_start, target_length, target_length_exact (described below) are taken into account
if it has the CLICK value:
trigger_validator (described below) is taken into account
at the offline start, the negative and positive target values must be given
the row of the last positive trigger before a positive target, or of the last target in the data if there is no positive target value, is recorded to a table
every announcement after the trigger is included when calculating the target and validator aggregates. For every new trigger (if there was no positive target before), the aggregates are calculated again (previously calculated aggregates are forgotten)
if it has the CLICK_AD value, the usage is specific to clickad, among other things:
at the offline start, the negative and positive target values must be given
every announcement that corresponds to an advertisement being displayed to a user (there can be many rows for each user) is recorded to a table
matching of target and trigger is based on specific fields in JSON. For a trigger, even if userId is null, the target is found based on other fields
target_start - time in milliseconds. The target window starts after this time
target_length - length of the target window in milliseconds
target_length_exact - a negative value means that the target window is counted until the end of the data (cannot be smaller than target_length)
trigger_validator - a Java expression which must return true after the trigger for the row to be recorded to a learning table. If it is null, there is no additional restriction on recording rows. The validator can use aggregates with any type of window (WINDOW_TARGET and WINDOW_GLOBAL are treated the same)
saved_state_time - the time up to which the state of users was calculated after building a model. The value is set by the application
value - value associated with the model, for example CPC (cost per click) in the RTB case. If the response to a request returns the score (not the value), then value should be 1
client_value - value associated with the model, for example the client's CPC in the RTB case; null if not needed
positive_target_cnt - in the RTB case: the ordered number of targets that should be generated in a campaign (e.g. clicks); null if not needed
weight - weight; in the RTB case the bidding price is
value * score * weight
positive_target_ratio - ratio of the number of rows with a positive target to the total number of rows in the table used for modeling. Until it has been calculated, set the value to 0
category_id - campaign category
use_category - 1 - model building per campaign category; 0 - model building per campaign
end_date - time in milliseconds to the end of the campaign (in the RTB case: the time by which the ordered number of positive targets (positive_target_cnt) should be reached)
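As a worked illustration of two of the columns above, the sketch below computes the RTB bidding price and checks whether an event falls into the target window. It is written in Python under explicit assumptions: target_start is taken to be measured from the trigger time, the window is treated as half-open, and target_length_exact is not modeled. The function names are hypothetical.

```python
def bidding_price(value, score, weight):
    """Bidding price in the RTB case: value * score * weight."""
    return value * score * weight

def in_target_window(trigger_time_ms, event_time_ms, target_start, target_length):
    """Check whether an event lies in the target window.

    Assumption: the window is [trigger + target_start,
    trigger + target_start + target_length), all in milliseconds.
    """
    window_start = trigger_time_ms + target_start
    window_end = window_start + target_length
    return window_start <= event_time_ms < window_end

# CPC of 0.5, model score of 0.2, weight of 2.0
price = bidding_price(0.5, 0.2, 2.0)

# Trigger at t=1000 ms, window covers 1200 ms up to (not including) 1800 ms
inside = in_target_window(1000, 1500, target_start=200, target_length=600)
too_early = in_target_window(1000, 1100, target_start=200, target_length=600)
```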
model_aggregates (defining models' arguments, every model can use various aggregates, and every aggregate can be used in various models)
model_id - id of the model from models table
aggregate_id - id of the variable from the aggregates table (there may be many for a given model_id)
used - if positive, the variable is passed to scoring code, if negative, the variable is used in model building
trigger_aggregates (defining triggers' arguments, every trigger can use various aggregates, and every aggregate can be used in various triggers)
trigger_id - id of the trigger from the triggers table
aggregate_id - id of the variable from the aggregates table (there may be many for a given trigger_id)
model_triggers (defining models' triggers, every model can use various triggers, and every trigger can be used in various models)
model_id - id of the model from the models table
trigger_id - id of the trigger from the triggers table (there may be many for a given model_id)
dictionary (a dictionary mapping values from JSON to variables)
id - id of group or interval of values
categorical - if true, the value column of this table is used as the input. If false, start, start_inclusive, end and end_inclusive are used as the input
value - a specific value from JSON
start - start of an interval
start_inclusive - defines whether the start of the interval is open or closed
end - end of an interval
end_inclusive - defines whether the end of the interval is open or closed
mapped_value - value passed to an aggregate
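The lookup that the dictionary columns describe can be sketched like this. It is a Python illustration only: rows are represented as plain dicts keyed by the column names above, the example contents are made up, and returning a default when nothing matches is an assumption rather than documented behaviour.

```python
def matches(row, value):
    """Check one dictionary row against an input value."""
    if row["categorical"]:
        return value == row["value"]
    # Interval row: honour the open/closed flag on each end.
    above = value >= row["start"] if row["start_inclusive"] else value > row["start"]
    below = value <= row["end"] if row["end_inclusive"] else value < row["end"]
    return above and below

def map_value(rows, value, default=None):
    """Return mapped_value of the first matching row, else default."""
    for row in rows:
        if matches(row, value):
            return row["mapped_value"]
    return default

# Hypothetical interval dictionary: [0, 18) -> "minor", [18, 65] -> "adult"
age_rows = [
    {"categorical": False, "start": 0, "start_inclusive": True,
     "end": 18, "end_inclusive": False, "mapped_value": "minor"},
    {"categorical": False, "start": 18, "start_inclusive": True,
     "end": 65, "end_inclusive": True, "mapped_value": "adult"},
]
# Hypothetical categorical dictionary
country_rows = [
    {"categorical": True, "value": "PL", "mapped_value": "poland"},
]

age_group = map_value(age_rows, 18)
country = map_value(country_rows, "PL")
```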
default_model (default parameters used when new models are added automatically from the API level)
client_id - id of a client
target_definition - definition of the derived aggregate that is the target (when a new model is uploaded and created, the string ${model_id} is replaced with the model_id of the new model)
target_aggregates - aggregates needed to calculate the target (names of aggregates separated by commas, e.g. 'agg1,agg2,agg3')
trigger_definition - definition of the trigger (when a new model is uploaded and created, the string ${model_id} is replaced with the model_id of the new model)
trigger_aggregates - aggregates needed to calculate the trigger (names of aggregates separated by commas, e.g. 'agg1,agg2,agg3')
trigger_validator_definition - definition of the trigger validator (when a new model is uploaded and created, the string ${model_id} is replaced with the model_id of the new model)
trigger_validator_aggregates - aggregates needed to calculate the trigger validator (names of aggregates separated by commas, e.g. 'agg1,agg2,agg3')
target_type - corresponds to target_type variable in models table
target_start - corresponds to target_start variable in models table
target_length - corresponds to target_length variable in models table
target_length_exact - corresponds to target_length_exact variable in models table
target_aggregate_name - name of the aggregate that is the target (when a new model is uploaded and created, the string ${model_id} is replaced with the model_id of the new model)
weight - corresponds to weight variable in models table
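How a default_model row might be turned into concrete definitions for a new model can be sketched as follows. This is a Python illustration under assumptions: the helper names and the example definition text are hypothetical, and only the ${model_id} substitution and the comma-separated name format come from the description above.

```python
def instantiate_definition(template, model_id):
    """Replace the ${model_id} placeholder with the id of the new model."""
    return template.replace("${model_id}", str(model_id))

def parse_aggregate_names(names):
    """Split a comma-separated list such as 'agg1,agg2,agg3' into names."""
    return [name.strip() for name in names.split(",") if name.strip()]

# Hypothetical default_model values
target_definition = "clicks_${model_id} > 0"
target_aggregates = "agg1,agg2,agg3"

new_target_definition = instantiate_definition(target_definition, 42)
new_target_aggregates = parse_aggregate_names(target_aggregates)
```

Plain string replacement is used here instead of a templating library so that any other characters in the Java expression are left untouched.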
model_urls (URL addresses that should be included/excluded when building a model)
model_id - id of a model
client_id - id of a client
url - URL address of the website (domain)
included - if 1, the website should be included in modeling; if 0, the website should be excluded
Window types
Window definitions:
WINDOW_TARGET – a specific window type to define a target variable for the predictive model training process
WINDOW_GLOBAL – The window includes all the data history saved in the tool
WINDOW_TIME – The window aggregates the data in a window specified by time (given in ms). The length of the window is given in window_size. The window is of the "tumbling window" type
WINDOW_TIME_SLIDE - The window aggregates data in a window defined by time (given in ms) and offset by the time specified in the parameter window_shift (in ms). The length of the window is given in window_size. Sliding window
WINDOW_COUNT – window defined as an aggregate from window_size events
WINDOW_COUNT_SLIDE - window defined as an aggregate of window_size events moved back by window_shift events. Sliding window
WINDOW_CURRENT_TIME – window of length window_size aggregated in real time
For the above window types, the window_lag parameter allows the window to be moved away from the current moment by window_lag.
Tumbling window - events are summarized in fixed-length, non-overlapping time windows. The value changes when the window is closed
Sliding window - events are summarized in fixed-length time windows, but in this case the windows overlap each other, so the value is updated more frequently than for the "tumbling window"
Real time window – aggregate values are calculated in real time
Please note that real-time window calculation is computationally expensive, so avoid using it where it is not necessary (only if the application requires a real-time aggregate).
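The difference between tumbling and sliding time windows can be illustrated with a small Python sketch. It is not the platform's implementation: the sum aggregate, the alignment of windows to timestamp 0 and the half-open window bounds are assumptions made for the example, and the function names are hypothetical.

```python
def tumbling_sums(events, window_size):
    """Sum values in consecutive, non-overlapping windows of window_size ms.

    events: list of (timestamp_ms, value). Keys of the result are window starts.
    """
    sums = {}
    for ts, value in events:
        start = (ts // window_size) * window_size
        sums[start] = sums.get(start, 0) + value
    return sums

def sliding_sums(events, window_size, window_shift):
    """Sum values in windows of window_size ms that start every window_shift ms.

    With window_shift < window_size the windows overlap, so an event can
    contribute to several windows.
    """
    if not events:
        return {}
    last_ts = max(ts for ts, _ in events)
    sums = {}
    start = 0
    while start <= last_ts:
        end = start + window_size
        sums[start] = sum(v for ts, v in events if start <= ts < end)
        start += window_shift
    return sums

events = [(100, 1), (900, 2), (1100, 3), (1900, 4), (2500, 5)]
tumbling = tumbling_sums(events, window_size=1000)
sliding = sliding_sums(events, window_size=1000, window_shift=500)
```

With these events the tumbling windows produce three values, while the sliding windows with a 500 ms shift produce a value twice as often over the same data.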
Aggregate types