# UDAF

> The functions for aggregate processors.

## Table of Contents

- [Collect List](#collect-list)
- [Collect Set](#collect-set)
- [First Value](#first-value)
- [Last Value](#last-value)
- [Long Count](#long-count)
- [Mean](#mean)
- [Number SUM](#number-sum)
- [HLLD](#hlld)
- [Approx Count Distinct HLLD](#approx-count-distinct-hlld)
- [HDR Histogram](#hdr-histogram)
- [Approx Quantile HDR](#approx-quantile-hdr)
- [Approx Quantiles HDR](#approx-quantiles-hdr)

## Description

A UDAF (User Defined Aggregate Function) extends the functionality of the aggregate processor. It is part of the processing pipeline and can be used in the pre-processing, processing, and post-processing pipelines. Each processor can assemble UDAFs into a pipeline. Within the pipeline, events are processed by each function in order, from top to bottom.

The differences between a UDF and a UDAF are:

- A UDF processes each event individually, and its output is an event. A UDAF processes a group of events, and its output is a single event.
- A UDF is designed to perform a transformation or calculation on a single event. A UDAF is designed to perform an aggregation over a group of events, such as summing values, calculating an average, or finding a maximum. It consumes multiple input events and produces a single aggregated result.

## UDAF Definition

A UDAF has the same basic properties as a UDF, such as `name`, `event`, and `context`. See [UDF](udf.md) for details. In addition, a UDAF used by the aggregate processor implements the following methods to process data:

- `void add()`: Add a new event to the aggregation.
- `void getResult()`: Get the result of the aggregation.

## Functions

### Collect List

COLLECT_LIST collects the values of a field across the group of events.

`COLLECT_LIST(filter, lookup_fields, output_fields)`

- filter: optional.
- lookup_fields: required. Currently only one field is supported.
- output_fields: optional. If not set, the output field name defaults to the lookup field name.
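The following Java sketch illustrates how the `add()`/`getResult()` contract could apply to COLLECT_LIST by modelling the function as an accumulator. It assumes a Java implementation and a `Map<String, Object>` event representation. The `Aggregator` interface, the `CollectList` class, and the rule that events missing the lookup field are skipped are all hypothetical. They are not the processor's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class CollectListSketch {

    // Hypothetical accumulator contract modelled on add()/getResult().
    // Unlike the `void getResult()` listed above, getResult() here returns
    // the aggregated value so the sketch can be checked directly.
    interface Aggregator<R> {
        void add(Map<String, Object> event);

        R getResult();
    }

    // Hypothetical COLLECT_LIST: keeps every lookup field value, in arrival
    // order and including duplicates, from events that pass the filter.
    static final class CollectList implements Aggregator<List<Object>> {
        private final String lookupField;
        private final Predicate<Map<String, Object>> filter;
        private final List<Object> values = new ArrayList<>();

        CollectList(String lookupField, Predicate<Map<String, Object>> filter) {
            this.lookupField = lookupField;
            // No filter configured: accept every event.
            if (filter == null) {
                this.filter = event -> true;
            } else {
                this.filter = filter;
            }
        }

        @Override
        public void add(Map<String, Object> event) {
            // Assumption: events without the lookup field are skipped.
            if (filter.test(event) && event.containsKey(lookupField)) {
                values.add(event.get(lookupField));
            }
        }

        @Override
        public List<Object> getResult() {
            // Return a copy so callers cannot mutate the accumulator state.
            return new ArrayList<>(values);
        }
    }

    public static void main(String[] args) {
        CollectList agg = new CollectList("client_ip", null);
        agg.add(Map.of("client_ip", "10.0.0.1"));
        agg.add(Map.of("client_ip", "10.0.0.2"));
        agg.add(Map.of("client_ip", "10.0.0.1"));
        // Emit one output event holding the collected list.
        Map<String, Object> output = Map.of("client_ip_list", agg.getResult());
        System.out.println(output); // {client_ip_list=[10.0.0.1, 10.0.0.2, 10.0.0.1]}
    }
}
```

In this sketch, a COLLECT_SET variant would differ only in storing values in a set, such as a `LinkedHashSet`, so that duplicates are dropped.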

### Example

```yaml
- function: COLLECT_LIST
  lookup_fields: [client_ip]
  output_fields: [client_ip_list]
```

### Collect Set

COLLECT_SET collects the unique values of a field across the group of events.

`COLLECT_SET(filter, lookup_fields, output_fields)`

- filter: optional.
- lookup_fields: required. Currently only one field is supported.
- output_fields: optional. If not set, the output field name defaults to the lookup field name.

### Example

```yaml
- function: COLLECT_SET
  lookup_fields: [client_ip]
  output_fields: [client_ip_set]
```

### First Value

FIRST_VALUE returns the first value of a field in the group of events.

`FIRST_VALUE(filter, lookup_fields, output_fields)`

- filter: optional.
- lookup_fields: required. Currently only one field is supported.
- output_fields: optional. If not set, the output field name defaults to the lookup field name.

### Example

```yaml
- function: FIRST_VALUE
  lookup_fields: [client_ip]
  output_fields: [first_client_ip]
```

### Last Value

LAST_VALUE returns the last value of a field in the group of events.

`LAST_VALUE(filter, lookup_fields, output_fields)`

- filter: optional.
- lookup_fields: required. Currently only one field is supported.
- output_fields: optional. If not set, the output field name defaults to the lookup field name.

### Example

```yaml
- function: LAST_VALUE
  lookup_fields: [client_ip]
  output_fields: [last_client_ip]
```

### Long Count

LONG_COUNT counts the number of events in the group.

`LONG_COUNT(filter, lookup_fields, output_fields)`

- filter: optional.
- lookup_fields: optional.
- output_fields: required.

### Example

```yaml
- function: LONG_COUNT
  output_fields: [sessions]
```

### Mean

MEAN calculates the mean value of a field across the group of events. The lookup field value must be numeric.

`MEAN(filter, lookup_fields, output_fields[, parameters])`

- filter: optional.
- lookup_fields: required. Currently only one field is supported.
- output_fields: optional.
  If not set, the output field name defaults to the lookup field name.
- parameters: optional.
  - precision: optional. The precision of the mean value. Default is 2.

### Example

```yaml
- function: MEAN
  lookup_fields: [received_bytes]
  output_fields: [received_bytes_mean]
```

### Number SUM

NUMBER_SUM sums the values of a field across the group of events. The lookup field value must be numeric.

`NUMBER_SUM(filter, lookup_fields, output_fields)`

- filter: optional.
- lookup_fields: required. Currently only one field is supported.
- output_fields: optional. If not set, the output field name defaults to the lookup field name.

### Example

```yaml
- function: NUMBER_SUM
  lookup_fields: [received_bytes]
  output_fields: [received_bytes_sum]
```

### HLLD

HLLD merges the lookup field values in the group of events into a HyperLogLog data structure. For background, hlld is a high-performance C server that exposes HyperLogLog sets and operations over them to networked clients. More details can be found in [hlld](https://github.com/armon/hlld).

`HLLD(filter, lookup_fields, output_fields[, parameters])`

- filter: optional.
- lookup_fields: required.
- output_fields: required.
- parameters: optional.
  - input_type: optional. The input field type, either `regular` or `sketch`. Default is `sketch`. A regular field holds a plain value such as a `string`, `int`, `long`, `float`, or `double`.
  - precision: optional. The precision of the HyperLogLog data structure. Default is 12.
  - output_format: optional. The output format, either `base64` (Base64-encoded string) or `binary` (`byte[]`). Default is `base64`.

### Example

Merge the values of a string field into a HyperLogLog data structure:
```yaml
- function: HLLD
  lookup_fields: [client_ip]
  output_fields: [client_ip_hlld]
  parameters:
    input_type: regular
```

Merge multiple `unique_count` metric fields, which are already HyperLogLog sketches, into a single HyperLogLog data structure:

```yaml
- function: HLLD
  lookup_fields: [client_ip_hlld]
  output_fields: [client_ip_hlld]
  parameters:
    input_type: sketch
```

### Approx Count Distinct HLLD

APPROX_COUNT_DISTINCT_HLLD counts the approximate number of distinct values in the group of events.

`APPROX_COUNT_DISTINCT_HLLD(filter, lookup_fields, output_fields[, parameters])`

- filter: optional.
- lookup_fields: required.
- output_fields: required.
- parameters: optional.
  - input_type: optional. See the `HLLD` function.
  - precision: optional. See the `HLLD` function.

### Example

```yaml
- function: APPROX_COUNT_DISTINCT_HLLD
  lookup_fields: [client_ip]
  output_fields: [unique_client_ip]
  parameters:
    input_type: regular
```

```yaml
- function: APPROX_COUNT_DISTINCT_HLLD
  lookup_fields: [client_ip_hlld]
  output_fields: [unique_client_ip]
  parameters:
    input_type: sketch
```

### HDR Histogram

HDR_HISTOGRAM aggregates the lookup field values into a High Dynamic Range (HDR) Histogram. More details can be found in [HDR Histogram](https://github.com/HdrHistogram/HdrHistogram).

`HDR_HISTOGRAM(filter, lookup_fields, output_fields[, parameters])`

- filter: optional.
- lookup_fields: required.
- output_fields: required.
- parameters: optional.
  - input_type: optional. The input field type, either `regular` or `sketch`. Default is `sketch`. A regular field holds a number.
  - lowestDiscernibleValue: optional. The lowest value that can be distinguished from 0. Default is 1.
  - highestTrackableValue: optional. The highest trackable value. Default is 2.
  - numberOfSignificantValueDigits: optional. The number of significant value digits. Default is 1. The range is 1 to 5.
  - autoResize: optional. If true, highestTrackableValue is resized automatically. Default is true.
  - output_format: optional. The output format, either `base64` (Base64-encoded string) or `binary` (`byte[]`).
    The default is `base64`.

### Example

```yaml
- function: HDR_HISTOGRAM
  lookup_fields: [latency_ms]
  output_fields: [latency_ms_histogram]
  parameters:
    input_type: regular
    lowestDiscernibleValue: 1
    highestTrackableValue: 3600000
    numberOfSignificantValueDigits: 3
```

```yaml
- function: HDR_HISTOGRAM
  lookup_fields: [latency_ms_histogram]
  output_fields: [latency_ms_histogram]
  parameters:
    input_type: sketch
```

### Approx Quantile HDR

APPROX_QUANTILE_HDR calculates an approximate quantile of a field across the group of events.

`APPROX_QUANTILE_HDR(filter, lookup_fields, output_fields, quantile[, parameters])`

- filter: optional.
- lookup_fields: required.
- output_fields: required.
- parameters: optional.
  - input_type: optional. See the `HDR_HISTOGRAM` function.
  - lowestDiscernibleValue: optional. See the `HDR_HISTOGRAM` function.
  - highestTrackableValue: required. See the `HDR_HISTOGRAM` function.
  - numberOfSignificantValueDigits: optional. See the `HDR_HISTOGRAM` function.
  - autoResize: optional. See the `HDR_HISTOGRAM` function.
  - probability: optional. The probability of the quantile. Default is 0.5 (the median).

### Example

```yaml
- function: APPROX_QUANTILE_HDR
  lookup_fields: [latency_ms]
  output_fields: [latency_ms_p95]
  parameters:
    input_type: regular
    probability: 0.95
```

```yaml
- function: APPROX_QUANTILE_HDR
  lookup_fields: [latency_ms_HDR]
  output_fields: [latency_ms_p95]
  parameters:
    input_type: sketch
    probability: 0.95
```

### Approx Quantiles HDR

APPROX_QUANTILES_HDR calculates multiple approximate quantiles of a field across the group of events.

`APPROX_QUANTILES_HDR(filter, lookup_fields, output_fields, quantiles[, parameters])`

- filter: optional.
- lookup_fields: required.
- output_fields: required.
- parameters: optional.
  - input_type: optional. See the `HDR_HISTOGRAM` function.
  - lowestDiscernibleValue: optional. See the `HDR_HISTOGRAM` function.
  - highestTrackableValue: required.
    See the `HDR_HISTOGRAM` function.
  - numberOfSignificantValueDigits: optional. See the `HDR_HISTOGRAM` function.
  - autoResize: optional. See the `HDR_HISTOGRAM` function.
  - probabilities: required. The list of quantile probabilities, each in the range 0 to 1.

### Example

```yaml
- function: APPROX_QUANTILES_HDR
  lookup_fields: [latency_ms]
  output_fields: [latency_ms_quantiles]
  parameters:
    input_type: regular
    probabilities: [0.5, 0.95, 0.99]
```

```yaml
- function: APPROX_QUANTILES_HDR
  lookup_fields: [latency_ms_HDR]
  output_fields: [latency_ms_quantiles]
  parameters:
    input_type: sketch
    probabilities: [0.5, 0.95, 0.99]
```
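The HDR-based quantile functions return approximations. When you check APPROX_QUANTILE_HDR or APPROX_QUANTILES_HDR output on a small test group, exact values can serve as a reference. The Java sketch below computes exact quantiles for a list of probabilities in the range 0 to 1. The class and method names are hypothetical. The sketch uses the nearest-rank definition, which is an assumption and is not necessarily the rank rule that HDR Histogram applies internally.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;

public class ExactQuantiles {

    // Exact nearest-rank quantile over a sorted array: the value at
    // 1-based rank ceil(probability * n).
    static long nearestRank(long[] sorted, double probability) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("empty group");
        }
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        // BigDecimal avoids binary rounding noise, e.g. 0.07 * 100 evaluating
        // to 7.000000000000001 and pushing the ceiling up by one rank.
        int rank = BigDecimal.valueOf(probability)
                .multiply(BigDecimal.valueOf(sorted.length))
                .setScale(0, RoundingMode.CEILING)
                .intValueExact();
        // Probability 0 maps to rank 0; clamp so it returns the minimum.
        return sorted[Math.max(rank, 1) - 1];
    }

    // One exact quantile per probability, mirroring the `probabilities` list.
    static long[] quantiles(long[] values, List<Double> probabilities) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        long[] result = new long[probabilities.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = nearestRank(sorted, probabilities.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        long[] latencyMs = new long[100];
        for (int i = 0; i < latencyMs.length; i++) {
            latencyMs[i] = i + 1; // 1 ms .. 100 ms
        }
        long[] q = quantiles(latencyMs, List.of(0.5, 0.95, 0.99));
        System.out.println(Arrays.toString(q)); // [50, 95, 99]
    }
}
```

The approximate results may differ slightly from these exact values. The size of the difference depends on `numberOfSignificantValueDigits` and on how each implementation rounds ranks.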