> For the complete documentation index, see [llms.txt](https://algolytics-technologies.gitbook.io/algolytics/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://algolytics-technologies.gitbook.io/algolytics/event-engine-user/metadata.md).

# Metadata

Metadata are created and stored in GDBase - a relational database, part of Algolytics analytical platform.

Tables with metadata are created with following SQL code:

```
CREATE TABLE clients(id INTEGER PRIMARY KEY /*, ...*/)
CREATE TABLE external_data(id INTEGER PRIMARY KEY, client_id INTEGER /*, ...*/)
CREATE TABLE variables(id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, event_id INTEGER NOT NULL, definition TEXT NOT NULL, type INTEGER NOT NULL CHECK(type = 1 OR type = 2), default_value TEXT)
CREATE INDEX ON variables(client_id)
CREATE TABLE aggregates(id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, variable_id INTEGER, aggregate_type INTEGER, window_type INTEGER, window_size/*count|time*/ INTEGER, window_shift/*count|time*/ INTEGER, definition TEXT, return_type INTEGER CHECK((definition IS NULL AND return_type IS NULL) OR (definition IS NOT NULL AND return_type IS NOT NULL AND (return_type = 1 OR return_type = 2))), dictionary_id INTEGER, external_data_id INTEGER, external_data_name TEXT, name TEXT NOT NULL CHECK(name REGEXP '[a-zA-Z_$][a-zA-Z_$0-9]*'))
CREATE UNIQUE INDEX ON aggregates(client_id, name)
CREATE TABLE derived_aggregates(derived_aggregate_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL)
CREATE INDEX ON derived_aggregates(derived_aggregate_id)
CREATE TABLE trigger_aggregates(trigger_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL)
CREATE INDEX ON trigger_aggregates(trigger_id, aggregate_id)
CREATE TABLE triggers(id INTEGER PRIMARY KEY, definition TEXT NOT NULL, needs_change INTEGER/*boolean*/ NOT NULL CHECK(needs_change = 0 OR needs_change = 1), `group` INTEGER NOT NULL)
CREATE TABLE model_triggers(model_id INTEGER NOT NULL, trigger_id INTEGER NOT NULL)
CREATE INDEX ON model_triggers(model_id, trigger_id)
CREATE TABLE model_aggregates(model_id INTEGER NOT NULL, aggregate_id INTEGER NOT NULL, used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR used = 1))
CREATE INDEX ON model_aggregates(model_id, aggregate_id)
CREATE TABLE models(id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, used INTEGER/*boolean*/ NOT NULL CHECK(used = 0 OR used = 1), target_aggregate_id INTEGER NOT NULL, target_start INTEGER NOT NULL CHECK(target_start >= 0), target_length INTEGER NOT NULL CHECK(target_length >= 0), target_length_exact INTEGER/*boolean*/ NOT NULL CHECK(target_length_exact = 0 OR target_length_exact = 1), scoring_code TEXT CHECK(NOT used OR scoring_code IS NOT NULL), saved_state_time INTEGER CHECK(NOT used OR saved_state_time IS NOT NULL))
CREATE INDEX ON models(client_id)
CREATE TABLE dictionary(id INTEGER, categorical INTEGER/*boolean*/ NOT NULL CHECK(categorical = 0 OR categorical = 1), value TEXT CHECK(categorical OR value IS NULL), start FLOAT CHECK(NOT categorical OR start IS NULL), start_inclusive FLOAT CHECK((start IS NULL AND start_inclusive IS NULL) OR (start IS NOT NULL AND start_inclusive IS NOT NULL AND (start_inclusive = 0 OR start_inclusive = 1))), end FLOAT CHECK((NOT categorical OR end IS NULL) AND (start IS NULL OR end IS NULL OR end >= start)), end_inclusive FLOAT CHECK((end IS NULL AND end_inclusive IS NULL) OR (end IS NOT NULL AND end_inclusive IS NOT NULL AND (end_inclusive = 0 OR end_inclusive = 1))), mapped_value TEXT NOT NULL)
CREATE INDEX ON dictionary(id)
```

## Tables structure <a href="#tables-structure" id="tables-structure"></a>

* **clients**
  * id - id, a primary key
  * authorization data
  * other data about a client
* **external data**
  * id - id, a primary key
  * client\_id
  * data pointing at external source of data
* **variables** (stores definitions of variables obtained from JSON)
  * id INTEGER PRIMARY KEY
  * event\_id INTEGER  - corresponds with the eventId field in JSON
  * definition\_type - a type of definition: JSON\_PATH or TRANSFORMED
  * definition TEXT - definition of changing event into variable (JSONPath)
    * if definition\_type = JSON\_PATH then definition contains an expression JsonPath. It should returns numerical or text value. Syntax of expressions is described in JsonPath\_README.md. For efficiency reasons it is the best to use $.x.y.z expressions types. Moreover, optimalization for `[?(@.a == 'x' && @.b == 3 && @.c == @.d && ...)]` filters is added
    * if definition\_type = TRANSFORMED then definition contains full definition of java class, which must inherit from DoubleTransformation or StringTransformation (without the import of this class). In defined class should be no specified package
  * input\_id - for TRANSFORMED variable indicates id of transferred to the class, definied in definition\_type input variable. Id has to be JSON\_PATH variable
  * category - if not null, definition should return a list, if this list contains category, variable will be created
  * type (numerical, categorical) - type of the variable definied in DBConstants (VARIABLE\_...)
  * default\_value - default value of the variable, used if the definition returns null value
* **aggregates**
  * id - id, a primary key
  * variable\_id - indicates a variables table. If the aggregate is not created from a variables table, the value is null
  * aggregate\_type - type of the aggregate defined in DBConstants (AGGREGATE\_...) (see [Aggregate types](/algolytics/event-engine-user/metadata.md#aggregate-types) section)
  * window\_type - type of the window defined in DBConstants (WINDOW\_...) (see [Window types](/algolytics/event-engine-user/metadata.md#window-types) section). Each variable used by target must have window type set to WINDOW\_TARGET
  * window\_size (count/time) - window size, defined as number of occurrences or time in miliseconds
  * window\_shift (count/time) - window shift (for windows with shift), defined as number of occurrences or time in miliseconds
  * definition - a Java expression defining aggregate as a derivative of other aggregates. Every aggregate used in the definition must be listed in derived\_aggregates table. If the aggregate is not a derivative aggregate, the value is null
  * return\_type - type of aggregate defined by definition column (DBConstants.VARIABLE\_...)
  * dictionary\_id - a dictionary from dictionary table. If null then the aggregate do not use a dictionary
  * external\_data\_id - id of external data from external data table
  * external\_data\_name - name of variable in external data
  * name - variable name, used in Java expressions, unique for client\_id
* **derived\_aggregates**
  * derived\_aggregate\_id - id of derived aggregate from aggregates table
  * aggregate\_id - ids of aggregates used in creating the derived aggregate (may be multiple ids for given derived\_aggregate\_id)
* **triggers** (definition of scoring moment, i.e. definition of a moment of creating a training line with target)
  * id - id, a primary key
  * definition - definition of given trigger variable, which is an expression in Java using input variables to calculate value of trigger variable. The expression returns true or false, it may be also a definition of user's affiliance to a model. Input variables from the Java expression must be listed in the trigger\_aggregates table
  * needs\_change - scoring occurs only, if during the last launch of trigger the definition returned false
  * group - a group to which the trigger belongs. If the model defines triggers from different groups, and each group contain multiple triggers, scoring occurs only, if at least 1 trigger in every group returns, that scoring must occur
* **models**
  * id INTEGER PRIMARY KEY
  * active - if false, model is inactive, cannot be used for modeling or scoring. If there is new model added, but it is without code yet, active shuold have true value
  * used - if false, the model is not used for scoring (it does not affect counting tables for modeling).
  * target\_aggregate\_id - indicates target, used while building a model
  * target\_type:
    * if has GENERAL value, target\_start, target\length, target\_length\_exact (described below) will be included
    * if has CLICK value:
      * trigger\_validator (described below) will be included
      * at the offline start, giving negative and positive target value will be needed
      * row of last positive trigger before positive target or last target in data if there is no positive target value is recorded to a table
      * every annoucement after trigger is included in counting aggregates of target and validator. For every new trigger (if there was no positive target before) aggregates are counted again (aggregates counted before are forgotten)
    * if has CLICK\_AD value, usage is special for clickad, inter alia:
      * at the offline start, giving negative and positive target value is needed
      * every annoucement, that responds to the display of advertisement to a user (for each user many rows can be shown) is recorded to a table
      * fitting target and trigger is based on specific fields in JSON. For trigger, despite userId is null, target will be found basing on different fields
  * target\_start - time in miliseconds. The target window starts after this time
  * target\_length - length of the target window in miliseconds
  * target\_length\_exact - negative value means, that target window is counted till the end of the data (cannot be smaller than target\_length)
  * trigger\_validator - expression in Java, which, after trigger, has to return true. Thereby row can be recorded to a learning table. If it has null value, there is no additional restriction on recording rows. validator can use aggregates with any type of window (WINDOW\_TARGET and WINDOW\_GLOBAL are treated the same)
  * saved\_state\_time - to what time the state of users was calculated after building a model. Value is set by application
  * value - value connected with model, for example CPC (cost per click) in RTB case. If in response for request there is score returned (not value), then value should be 1
  * client\_value - value connected with model, for example CPC of a client in RTB case, if not needed then null
  * positive\_target\_cnt - in RTB case: ordered targets' quantity, which should be generated in a campaign (ex. clicks); if not needed, then null
  * weight - weight; in RTB case bidding price is `value * score * weight`
  * positive\_target\_ratio - ratio of number of rows with positive target to the total number of rowsin table used for modeling. Until it is not counted, set value to 0
  * category\_id - campaign category
  * use\_category - 1 - model building per campaign category; 0 - model building per campaign
  * end\_date - time in miliseconds to the end of campaign (in RTB case: time to the end of which ordered number of positive targets (positive\_target\_cnt) should be made
* **model\_aggregates** (defining models' arguments, every model can use various aggregates, and every aggregate can be used in various models)
  * model\_id - id of the model from models table
  * aggregate\_id - id of the variable from aggregates table (may by many for given model\_id)
  * used - if positive, the variable is passed to scoring code, if negative, the variable is used in model building
* **trigger\_aggregates** (defining triggers' arguments, every trigger can use various aggregates, and every aggregate can be used in various triggers)
  * trigger\_id - id of the trigger from trigger table
  * aggregate\_id - id of the variable from aggregates table (may by many for given trigger\_id)
* **model\_triggers** (defining models' triggers, every model can use various triggers, and every trigger can be used in various models)
  * model\_id - id of the model from models table
  * trigger\_id - id of the trigger from triggers table (may by many for given model\_id)
* **dictionary** (a dictionary mapping values from JSON to variables)
  * id - id of group or interval of values
  * categorical - if true, then value variable from this table is used as an input. If false, then start, start\_inclusive, end, end\_inclusive are used as an input
  * value - a specific value from JSON
  * start - start of an interval
  * start\_inclusive - defines, whether the start of the interval is open or closed
  * end - end of an interval
  * end\_inclusive - defines, whether the end of the interval is open or closed
  * mapped\_value - value passed to an aggregate
* **default\_model** (default parameters used when adding new models from api level automatically)
  * client\_id - id of a client
  * target\_definition - definition of derivative aggregate being target (at uploading and creating new model string ${model\_id} will be modified to model\_id of new model)
  * target\_aggregates - aggregates needed to count the target (names of aggregates separated by comas, ex.:'agg1,agg2,agg3')
  * trigger\_definition - definition of trigger (at uploading and creating new model string ${model\_id} will be modified to model\_id of new model)
  * trigger\_aggregates - aggregates needed to count the trigger (names of aggregates separated by comas, ex.:'agg1,agg2,agg3')
  * trigger\_validator\_definition - definition of trigger validator (at uploading and creating new model string ${model\_id} will be modified to model\_id of new model)
  * trigger\_validator\_aggregates - aggregates needed to count the trigger validator (names of aggregates separated by comas, ex.:'agg1,agg2,agg3')
  * target\_type - corresponds to target\_type variable in models table
  * target\_start - corresponds to target\_start variable in models table
  * target\_length - corresponds to target\_length variable in models table
  * target\_length\_exact - corresponds to target\_length\_exact variable in models table
  * target\_aggregate\_name - name of the aggregate being a target (at uploading and creating new model string ${model\_id} will be modified to model\_id of new model)
  * weight - corresponds to weight variable in models table
* **model\_urls** (url adresses that should be included/excluded when building a model)
  * model\_id - id of a model
  * client\_id - id of a client
  * url - url adress of www website (domena)
  * included - if 1, website should be included in modeling, if 0 website should be excluded

## Window types <a href="#window-types" id="window-types"></a>

```
public final static int WINDOW_TARGET       = 0;
public final static int WINDOW_GLOBAL       = 1;
public final static int WINDOW_TIME         = 2;
public final static int WINDOW_TIME_SLIDE   = 3;
public final static int WINDOW_COUNT        = 4;
public final static int WINDOW_COUNT_SLIDE  = 5;
public final static int WINDOW_CURRENT_TIME = 6;
```

#### Windows definition:

* **WINDOW\_TARGET** – a specific window type to define a target variable for the predictive model training process
* **WINDOW\_GLOBAL** – The window includes all the data history saved in the tool
* **WINDOW\_TIME** – The window aggregates the data in a window specified by time (given in ms). The length of the window is given in window\_size. The window is of the "tumbling window" type
* **WINDOW\_TIME\_SLIDE** - The window aggregates data in a window defined by time (given in ms) and offset by the time specified in the parameter window\_shift (in ms). The length of the window is given in window\_size. Sliding window
* **WINDOW\_COUNT** – window defined as an aggregate from window\_size events
* **WINDOW\_COUNT\_SLIDE** - window defined as an aggregate of window\_size events moved back by window\_shift events. Sliding window
* **WINDOW\_CURRENT\_TIME –** window length window\_size aggregated in real time

For the above window units, the window\_lag parameter allows the window to be moved away by window\_lag from the current moment.

* **Tumbling window** - events are summarized in fixed time-fixed windows. The value changes when the window is closed

<figure><img src="/files/vO1VEXIkNCLoh74hsYJV" alt=""><figcaption></figcaption></figure>

* **Sliding window** - Events are summarized in fixed time-fixed windows but in this case the windows overlap each other, thanks to which we get more frequent updates of the value than for the "tumbling window"

<figure><img src="/files/0k2eANFWNZcLzeNgpliM" alt=""><figcaption></figcaption></figure>

* **Real time window** – aggregate values are calculated in real time

<figure><img src="/files/osijumFGQyLHVkogCT3x" alt=""><figcaption></figcaption></figure>

Please note that real-time window calculation is computationally expensive, so avoid using it where it is not necessary (only if the application requires a real-time aggregate).

## Aggregate types <a href="#aggregate-types" id="aggregate-types"></a>

```
public final static int AGGREGATE_COUNT     	= 1; 
public final static int AGGREGATE_SUM       	= 2;
public final static int AGGREGATE_LASTVALUE 	= 3; //last value based on the order defined by the timestamp field
public final static int AGGREGATE_EXISTS    	= 4; //counts number of events with non null value of a field
public final static int AGGREGATE_MIN       	= 5;
public final static int AGGREGATE_MAX       	= 6; 
public final static int AGGREGATE_CURRENT_VALUE = 7; //last value according to the time of event consumption by EventEngine
```


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://algolytics-technologies.gitbook.io/algolytics/event-engine-user/metadata.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
