# Pipeline aggregations

With pipeline aggregations, you can chain aggregations by piping the results of one aggregation as an input to another for a more nuanced output.

You can use pipeline aggregations to compute complex statistical and mathematical measures like derivatives, moving averages, cumulative sums, and so on.

## Pipeline aggregation syntax

A pipeline aggregation uses the `buckets_path` property to access the results of other aggregations. The `buckets_path` property has a specific syntax:

`buckets_path = <AGG_NAME>[<AGG_SEPARATOR>,<AGG_NAME>]*[<METRIC_SEPARATOR>, <METRIC>];`

where:

- `AGG_NAME` is the name of the aggregation.
- `AGG_SEPARATOR` separates aggregations. It's represented as `>`.
- `METRIC_SEPARATOR` separates an aggregation from its metrics. It's represented as `.`.
- `METRIC` is the name of the metric, in case of multi-value metric aggregations.

For example, `my_sum.sum` selects the `sum` metric of an aggregation called `my_sum`. `popular_tags>my_sum.sum` nests `my_sum.sum` into the `popular_tags` aggregation.
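To make the path syntax concrete, here is a minimal Python sketch that splits a `buckets_path` string into its aggregation names and optional metric. The function name `parse_buckets_path` is hypothetical, and the sketch covers only the simple `>` and `.` form described above, not every path variant a cluster may accept.

```python
from typing import List, Optional, Tuple


def parse_buckets_path(path: str) -> Tuple[List[str], Optional[str]]:
    """Split a buckets_path into (aggregation names, metric or None).

    Illustrative helper only: it handles the AGG_NAME>AGG_NAME.METRIC form.
    """
    # '>' separates nested aggregations.
    *parents, last = path.split(">")
    # '.' separates the final aggregation from one of its metrics.
    name, separator, metric = last.partition(".")
    return parents + [name], (metric if separator else None)


print(parse_buckets_path("my_sum.sum"))
print(parse_buckets_path("popular_tags>my_sum.sum"))
print(parse_buckets_path("number_of_bytes>sum_total_memory"))
```

The first call yields the `my_sum` aggregation with the `sum` metric; the second adds `popular_tags` as the enclosing aggregation.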

You can also specify the following additional parameters:

- `gap_policy`: Real-world data can contain gaps or null values. You can specify the policy to deal with such missing data with the `gap_policy` property. Set the `gap_policy` property to `skip` to skip the missing data and continue from the next available value, or to `insert_zeros` to replace the missing values with zero and continue running.
- `format`: The type of format for the output value. For example, `yyyy-MM-dd` for a date value.
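The difference between the two gap policies is easiest to see on an average. The following Python sketch is a simplified model of the behavior described above, not the server's implementation; the function name `avg_over_buckets` and the sample values are hypothetical.

```python
from typing import List, Optional


def avg_over_buckets(values: List[Optional[float]], gap_policy: str) -> Optional[float]:
    """Average bucket values, treating None as a gap (simplified model)."""
    if gap_policy == "skip":
        # Gaps are ignored entirely and do not count toward the average.
        kept = [v for v in values if v is not None]
    elif gap_policy == "insert_zeros":
        # Gaps are replaced with zero and still count as a bucket.
        kept = [0.0 if v is None else v for v in values]
    else:
        raise ValueError(f"unknown gap_policy: {gap_policy}")
    return sum(kept) / len(kept) if kept else None


gappy = [10.0, None, 30.0]
print(avg_over_buckets(gappy, "skip"))          # 20.0
print(avg_over_buckets(gappy, "insert_zeros"))  # about 13.33
```

With `skip`, the gap does not affect the result; with `insert_zeros`, it pulls the average down.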

## Quick example

To sum all the buckets returned by the `sum_total_memory` aggregation:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "number_of_bytes": {
      "histogram": {
        "field": "bytes",
        "interval": 10000
      },
      "aggs": {
        "sum_total_memory": {
          "sum": {
            "field": "phpmemory"
          }
        }
      }
    },
    "sum_copies": {
      "sum_bucket": {
        "buckets_path": "number_of_bytes>sum_total_memory"
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "number_of_bytes" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 13372,
          "sum_total_memory" : {
            "value" : 9.12664E7
          }
        },
        {
          "key" : 10000.0,
          "doc_count" : 702,
          "sum_total_memory" : {
            "value" : 0.0
          }
        }
      ]
    },
    "sum_copies" : {
      "value" : 9.12664E7
    }
  }
}
```

## Types of pipeline aggregations

Pipeline aggregations are of two types: parent and sibling.

### Parent aggregations

Parent aggregations take the output of an outer aggregation and produce new buckets or new aggregations at the same level as the existing buckets.

For each parent pipeline aggregation, you have to define the metric for which the aggregation is calculated. That could be one of your existing metrics or a new one. You can also nest aggregations of this type (for example, to produce a 3rd order derivative).

- The **Derivative** aggregation calculates the derivative of specific metrics.
- The **Cumulative Sum** aggregation calculates the cumulative sum of a specified metric in a parent histogram.
- The **Moving Average** aggregation slides a window across the data and emits the average value of that window.
- **Serial Diff** *(differencing)* is a technique where values in a time series are subtracted from themselves at different time lags or periods.

Parent aggregations must have `min_doc_count` set to 0 (default for `histogram` aggregations) and the specified metric must be a numeric value. If `min_doc_count` is greater than `0`, some buckets are omitted, which might lead to incorrect results.

`derivative` and `cumulative_sum` are common parent aggregations.

### Sibling aggregations

Sibling aggregations take the output of a nested aggregation and produce new buckets or new aggregations at the same level as the nested buckets.

Just like with parent pipeline aggregations, you need to provide a metric for which to calculate the sibling aggregation. On top of that, you also need to provide a bucket aggregation which will define the buckets on which the sibling aggregation will run.

The *buckets* aggregations determine what information is being retrieved from your data set.

- The **Average Bucket** calculates the (mean) average value of a specified metric in a sibling aggregation.
- The **Sum Bucket** calculates the sum of values of a specified metric in a sibling aggregation.
- The **Min Bucket** calculates the minimum value of a specified metric in a sibling aggregation.
- The **Max Bucket** calculates the maximum value of a specified metric in a sibling aggregation.
- A **Date Histogram** is built from a numeric field and organized by date. You can specify a time frame for the intervals in seconds, minutes, hours, days, weeks, months, or years.
- A **Histogram** is built from a numeric field. Specify an integer interval for this field.
- With a **Range** aggregation, you can specify ranges of values for a numeric field.
- A **Date Range** aggregation reports values that are within a range of dates that you specify.
- The **IPv4 Range** aggregation enables you to specify ranges of IPv4 addresses.
- A **Terms** aggregation enables you to specify the top or bottom n elements of a given field to display, ordered by count or a custom metric.
- **Significant Terms** displays the results of the experimental significant terms aggregation.

The aggregation that a sibling aggregation operates on must be a multi-bucket aggregation (one that has multiple grouped values for a certain field), and the metric must be a numeric value.

`min_bucket`, `max_bucket`, `sum_bucket`, and `avg_bucket` are common sibling aggregations.

## `avg_bucket`, `sum_bucket`, `min_bucket`, and `max_bucket`

The `avg_bucket`, `sum_bucket`, `min_bucket`, and `max_bucket` aggregations are sibling aggregations that calculate the average, sum, minimum, and maximum values of a metric across the buckets of a previous aggregation.

The following example creates a date histogram with a one-month interval. The `sum` sub-aggregation calculates the sum of all bytes for each month. Finally, the `avg_bucket` aggregation uses this sum to calculate the average number of bytes per month:

```json
POST opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "visits_per_month": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month"
      },
      "aggs": {
        "sum_of_bytes": {
          "sum": {
            "field": "bytes"
          }
        }
      }
    },
    "avg_monthly_bytes": {
      "avg_bucket": {
        "buckets_path": "visits_per_month>sum_of_bytes"
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "visits_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "sum_of_bytes" : {
            "value" : 3.8880434E7
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "sum_of_bytes" : {
            "value" : 3.1445055E7
          }
        }
      ]
    },
    "avg_monthly_bytes" : {
      "value" : 2.6575229666666668E7
    }
  }
}
```

In a similar fashion, you can calculate the `sum_bucket`, `min_bucket`, and `max_bucket` values for the bytes per month.
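As a rough check of what these four sibling aggregations compute, the following Python sketch applies the same reductions to the three monthly `sum_of_bytes` values from the sample response. The variable names are illustrative only; the arithmetic runs locally and does not query a cluster.

```python
# Monthly sum_of_bytes values copied from the sample response above.
monthly_sums = [9400200.0, 3.8880434e7, 3.1445055e7]

# Each sibling aggregation reduces the per-bucket metric to a single value.
results = {
    "avg_bucket": sum(monthly_sums) / len(monthly_sums),
    "sum_bucket": sum(monthly_sums),
    "min_bucket": min(monthly_sums),
    "max_bucket": max(monthly_sums),
}

for name, value in results.items():
    print(f"{name}: {value}")
```

The `avg_bucket` result agrees with the `avg_monthly_bytes` value in the sample response.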

## `stats_bucket` and `extended_stats_bucket`

The `stats_bucket` aggregation is a sibling aggregation that returns a variety of stats (`count`, `min`, `max`, `avg`, and `sum`) for the buckets of a previous aggregation.

The following example returns the basic stats for the buckets returned by the `sum_of_bytes` aggregation nested into the `visits_per_month` aggregation:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "visits_per_month": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month"
      },
      "aggs": {
        "sum_of_bytes": {
          "sum": {
            "field": "bytes"
          }
        }
      }
    },
    "stats_monthly_bytes": {
      "stats_bucket": {
        "buckets_path": "visits_per_month>sum_of_bytes"
      }
    }
  }
}
```

**Sample response**

```json
...
    "stats_monthly_bytes" : {
      "count" : 3,
      "min" : 9400200.0,
      "max" : 3.8880434E7,
      "avg" : 2.6575229666666668E7,
      "sum" : 7.9725689E7
    }
  }
}
```

The `extended_stats_bucket` aggregation is an extended version of the `stats_bucket` aggregation. Apart from including basic stats, `extended_stats_bucket` also provides stats such as `sum_of_squares`, `variance`, and `std_deviation`.

**Sample response**

```json
...
    "stats_monthly_visits" : {
      "count" : 3,
      "min" : 9400200.0,
      "max" : 3.8880434E7,
      "avg" : 2.6575229666666668E7,
      "sum" : 7.9725689E7,
      "sum_of_squares" : 2.588843392021381E15,
      "variance" : 1.5670496550438025E14,
      "variance_population" : 1.5670496550438025E14,
      "variance_sampling" : 2.3505744825657038E14,
      "std_deviation" : 1.251818539183616E7,
      "std_deviation_population" : 1.251818539183616E7,
      "std_deviation_sampling" : 1.5331583357780447E7,
      "std_deviation_bounds" : {
        "upper" : 5.161160045033899E7,
        "lower" : 1538858.8829943463,
        "upper_population" : 5.161160045033899E7,
        "lower_population" : 1538858.8829943463,
        "upper_sampling" : 5.723839638222756E7,
        "lower_sampling" : -4087937.0488942266
      }
    }
  }
}
```
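The extended statistics can be reproduced approximately from the three monthly sums. The Python sketch below uses the standard `statistics` module; the bounds of mean plus or minus two standard deviations are an assumption that happens to be consistent with the numbers in the sample response, and all variable names are illustrative.

```python
import math
import statistics

# Monthly sum_of_bytes values from the earlier sample response.
monthly_sums = [9400200.0, 3.8880434e7, 3.1445055e7]

mean = statistics.fmean(monthly_sums)
sum_of_squares = sum(v * v for v in monthly_sums)

# Population variance divides by n; sampling variance divides by n - 1.
variance_population = statistics.pvariance(monthly_sums)
variance_sampling = statistics.variance(monthly_sums)
std_population = math.sqrt(variance_population)
std_sampling = math.sqrt(variance_sampling)

# Assumption: bounds are mean +/- 2 standard deviations.
sigma = 2
bounds = {
    "upper_population": mean + sigma * std_population,
    "lower_population": mean - sigma * std_population,
    "upper_sampling": mean + sigma * std_sampling,
    "lower_sampling": mean - sigma * std_sampling,
}

print(variance_population, variance_sampling)
print(bounds)
```

Small differences in the last digits are possible because floating-point results depend on the order of operations.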

## `bucket_script` and `bucket_selector`

The `bucket_script` aggregation is a parent aggregation that executes a script to perform per-bucket calculations of a previous aggregation. Make sure the metrics are of numeric type and the returned values are also numeric.

The `buckets_path` property consists of multiple entries. Each entry is a key and a value. The key is the variable name that the script uses (through `params`) to read the value, and the value is the path to the metric.

The basic syntax is:

```json
{
  "bucket_script": {
    "buckets_path": {
      "my_var1": "the_sum",
      "my_var2": "the_value_count"
    },
    "script": "params.my_var1 / params.my_var2"
  }
}
```

The following example uses the `sum` aggregation on the buckets generated by a histogram of the `bytes` field with an interval of 10,000. For each bucket, it calculates the share of the total RAM that belongs to documents with the `zip` extension:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "histogram": {
        "field": "bytes",
        "interval": "10000"
      },
      "aggs": {
        "total_ram": {
          "sum": {
            "field": "machine.ram"
          }
        },
        "ext-type": {
          "filter": {
            "term": {
              "extension.keyword": "zip"
            }
          },
          "aggs": {
            "total_ram": {
              "sum": {
                "field": "machine.ram"
              }
            }
          }
        },
        "ram-percentage": {
          "bucket_script": {
            "buckets_path": {
              "machineRam": "ext-type>total_ram",
              "totalRam": "total_ram"
            },
            "script": "params.machineRam / params.totalRam"
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 13372,
          "ext-type" : {
            "doc_count" : 1558,
            "total_ram" : {
              "value" : 2.0090783268864E13
            }
          },
          "total_ram" : {
            "value" : 1.7214228922368E14
          },
          "ram-percentage" : {
            "value" : 0.11671032934131736
          }
        },
        {
          "key" : 10000.0,
          "doc_count" : 702,
          "ext-type" : {
            "doc_count" : 116,
            "total_ram" : {
              "value" : 1.622423896064E12
            }
          },
          "total_ram" : {
            "value" : 9.015136354304E12
          },
          "ram-percentage" : {
            "value" : 0.17996665078608862
          }
        }
      ]
    }
  }
}
```

The RAM percentage is calculated and appended at the end of each bucket.
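To illustrate the per-bucket calculation, the following Python sketch mimics the script `params.machineRam / params.totalRam` using the values from the sample response. The dictionary layout and the helper name `ram_ratio` are hypothetical; only the numbers come from the response.

```python
# Per-bucket inputs copied from the sample response above.
buckets = [
    {"key": 0.0, "zip_total_ram": 2.0090783268864e13, "total_ram": 1.7214228922368e14},
    {"key": 10000.0, "zip_total_ram": 1.622423896064e12, "total_ram": 9.015136354304e12},
]


def ram_ratio(params: dict) -> float:
    """Python equivalent of the script: params.machineRam / params.totalRam."""
    return params["machineRam"] / params["totalRam"]


for bucket in buckets:
    # buckets_path maps script variable names to metric values in this bucket.
    params = {"machineRam": bucket["zip_total_ram"], "totalRam": bucket["total_ram"]}
    bucket["ram-percentage"] = ram_ratio(params)
    print(bucket["key"], bucket["ram-percentage"])
```

Note that the result is a ratio between 0 and 1; multiply by 100 in the script if a true percentage is needed.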

The `bucket_selector` aggregation is a script-based aggregation that selects buckets returned by a `histogram` (or `date_histogram`) aggregation. Use it in scenarios where you don't want certain buckets in the output based on conditions supplied by you.

The `bucket_selector` aggregation executes a script to decide if a bucket stays in the parent multi-bucket aggregation.

The basic syntax is:

```json
{
  "bucket_selector": {
    "buckets_path": {
      "my_var1": "the_sum",
      "my_var2": "the_value_count"
    },
    "script": "params.my_var1 > params.my_var2"
  }
}
```

The following example calculates the sum of bytes and then evaluates if this sum is greater than 20,000. If true, then the bucket is retained in the bucket list. Otherwise, it’s deleted from the final output.

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "bytes_per_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "bytes_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalBytes": "total_bytes"
            },
            "script": "params.totalBytes > 20000"
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "bytes_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "total_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "total_bytes" : {
            "value" : 3.8880434E7
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "total_bytes" : {
            "value" : 3.1445055E7
          }
        }
      ]
    }
  }
}
```
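Conceptually, the selector is a filter over buckets driven by a Boolean script. The Python sketch below applies the same condition to the three monthly totals; `select_buckets` is a hypothetical helper, and the second threshold of 20,000,000 is invented purely to show a bucket being dropped.

```python
from typing import Callable, Dict, List

# Bucket keys and total_bytes values from the sample response above.
buckets = [
    {"key_as_string": "2020-10-01", "total_bytes": 9400200.0},
    {"key_as_string": "2020-11-01", "total_bytes": 3.8880434e7},
    {"key_as_string": "2020-12-01", "total_bytes": 3.1445055e7},
]


def select_buckets(items: List[Dict], script: Callable[[Dict], bool]) -> List[Dict]:
    """Keep only the buckets for which the script returns True."""
    return [b for b in items if script({"totalBytes": b["total_bytes"]})]


# Same condition as the example: every bucket exceeds 20,000, so all remain.
kept = select_buckets(buckets, lambda params: params["totalBytes"] > 20000)
print([b["key_as_string"] for b in kept])

# Hypothetical stricter threshold: the October bucket is removed.
kept_strict = select_buckets(buckets, lambda params: params["totalBytes"] > 20_000_000)
print([b["key_as_string"] for b in kept_strict])
```

This also explains why the sample response still lists all three months: each monthly total is far above 20,000.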

## `bucket_sort`

The `bucket_sort` aggregation is a parent aggregation that sorts buckets of a previous aggregation.

You can specify several sort fields together with the corresponding sort order. Additionally, you can sort each bucket based on its key, count, or its sub-aggregations. You can also truncate the buckets by setting the `from` and `size` parameters.

The basic syntax is:

```json
{
  "bucket_sort": {
    "sort": [
      { "sort_field_1": { "order": "asc" } },
      { "sort_field_2": { "order": "desc" } },
      "sort_field_3"
    ],
    "from": 1,
    "size": 3
  }
}
```

The following example sorts the buckets of a `date_histogram` aggregation based on the computed `total_bytes` values. We sort the buckets in descending order so that the buckets with the highest number of bytes are returned first.

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "bytes_bucket_sort": {
          "bucket_sort": {
            "sort": [
              { "total_bytes": { "order": "desc" } }
            ],
            "size": 3
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "total_bytes" : {
            "value" : 3.8880434E7
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "total_bytes" : {
            "value" : 3.1445055E7
          }
        },
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "total_bytes" : {
            "value" : 9400200.0
          }
        }
      ]
    }
  }
}
```

You can also use this aggregation to truncate the resulting buckets without sorting. For this, just use the `from` and/or `size` parameters without `sort`.
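The interplay of `sort`, `from`, and `size` can be sketched in a few lines of Python. The function `bucket_sort` below is a hypothetical stand-in that sorts on a single field only; it is meant to show the order of operations (sort first, then truncate), not to replicate the server's behavior in full.

```python
from typing import Dict, List, Optional

# Bucket keys and total_bytes values from the sample response above.
buckets = [
    {"key_as_string": "2020-10-01", "total_bytes": 9400200.0},
    {"key_as_string": "2020-11-01", "total_bytes": 3.8880434e7},
    {"key_as_string": "2020-12-01", "total_bytes": 3.1445055e7},
]


def bucket_sort(items: List[Dict], sort_field: Optional[str] = None,
                order: str = "asc", from_: int = 0,
                size: Optional[int] = None) -> List[Dict]:
    """Sort buckets on one field (if given), then truncate with from/size."""
    ordered = list(items)
    if sort_field is not None:
        ordered.sort(key=lambda b: b[sort_field], reverse=(order == "desc"))
    end = None if size is None else from_ + size
    return ordered[from_:end]


# Same settings as the example: descending by total_bytes, at most 3 buckets.
top = bucket_sort(buckets, sort_field="total_bytes", order="desc", size=3)
print([b["key_as_string"] for b in top])

# Truncation without sorting: skip the first bucket, keep the next one.
window = bucket_sort(buckets, from_=1, size=1)
print([b["key_as_string"] for b in window])
```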

## `cumulative_sum`

The `cumulative_sum` aggregation is a parent aggregation that calculates the cumulative sum of each bucket of a previous aggregation.

A cumulative sum is a sequence of partial sums of a given sequence. For example, the cumulative sums of the sequence `{a,b,c,…}` are `a`, `a+b`, `a+b+c`, and so on. You can use the cumulative sum to visualize the rate of change of a field over time.

The following example calculates the cumulative number of bytes on a monthly basis:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "no-of-bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "cumulative_bytes": {
          "cumulative_sum": {
            "buckets_path": "no-of-bytes"
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "no-of-bytes" : {
            "value" : 9400200.0
          },
          "cumulative_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "no-of-bytes" : {
            "value" : 3.8880434E7
          },
          "cumulative_bytes" : {
            "value" : 4.8280634E7
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "no-of-bytes" : {
            "value" : 3.1445055E7
          },
          "cumulative_bytes" : {
            "value" : 7.9725689E7
          }
        }
      ]
    }
  }
}
```
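The running totals in the response can be verified with a short Python sketch built on `itertools.accumulate`. The variable names are illustrative; the input values are the monthly `no-of-bytes` sums from the response.

```python
from itertools import accumulate

# Monthly no-of-bytes values from the sample response above.
monthly_bytes = [9400200.0, 3.8880434e7, 3.1445055e7]

# accumulate yields the partial sums a, a+b, a+b+c, ...
cumulative_bytes = list(accumulate(monthly_bytes))
print(cumulative_bytes)  # [9400200.0, 48280634.0, 79725689.0]
```

Each entry matches the `cumulative_bytes` value of the corresponding bucket.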

## `derivative`

The `derivative` aggregation is a parent aggregation that calculates 1st order and 2nd order derivatives of each bucket of a previous aggregation.

In mathematics, the derivative of a function measures its sensitivity to change. In other words, a derivative evaluates the rate of change in some function with respect to some variable. To learn more about derivatives, see Wikipedia.

You can use derivatives to calculate the rate of change of numeric values compared to their previous time periods.

The 1st order derivative indicates whether a metric is increasing or decreasing, and by how much it's increasing or decreasing.

The following example calculates the 1st order derivative for the sum of bytes per month. The 1st order derivative is the difference between the number of bytes in the current month and the previous month:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "number_of_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "bytes_deriv": {
          "derivative": {
            "buckets_path": "number_of_bytes"
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "number_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "number_of_bytes" : {
            "value" : 3.8880434E7
          },
          "bytes_deriv" : {
            "value" : 2.9480234E7
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "number_of_bytes" : {
            "value" : 3.1445055E7
          },
          "bytes_deriv" : {
            "value" : -7435379.0
          }
        }
      ]
    }
  }
}
```

The 2nd order derivative is a double derivative or a derivative of the derivative. It indicates how the rate of change of a quantity is itself changing. It’s the difference between the 1st order derivatives of adjacent buckets.

To calculate a 2nd order derivative, chain one derivative aggregation to another:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "number_of_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "bytes_deriv": {
          "derivative": {
            "buckets_path": "number_of_bytes"
          }
        },
        "bytes_2nd_deriv": {
          "derivative": {
            "buckets_path": "bytes_deriv"
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "number_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "number_of_bytes" : {
            "value" : 3.8880434E7
          },
          "bytes_deriv" : {
            "value" : 2.9480234E7
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "number_of_bytes" : {
            "value" : 3.1445055E7
          },
          "bytes_deriv" : {
            "value" : -7435379.0
          },
          "bytes_2nd_deriv" : {
            "value" : -3.6915613E7
          }
        }
      ]
    }
  }
}
```

The first bucket doesn't have a 1st order derivative because a derivative needs at least two points for comparison. The first and second buckets don't have a 2nd order derivative because a 2nd order derivative needs at least two data points from the 1st order derivative.

The 1st order derivative for the "2020-11-01" bucket is 2.9480234E7 and for the "2020-12-01" bucket is -7435379. So, the 2nd order derivative of the "2020-12-01" bucket is -3.6915613E7 (-7435379 - 2.9480234E7).

Theoretically, you could continue chaining derivative aggregations to calculate the third, the fourth, and even higher-order derivatives. That would, however, provide little to no value for most datasets.
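The chaining described above amounts to taking differences of differences. The following Python sketch, with a hypothetical `derivative` helper, reproduces the 1st and 2nd order values from the sample responses and uses `None` where a bucket has no value.

```python
from typing import List, Optional


def derivative(values: List[Optional[float]]) -> List[Optional[float]]:
    """Difference between each value and the previous one (None if unavailable)."""
    if not values:
        return []
    result: List[Optional[float]] = [None]  # the first bucket has no predecessor
    for previous, current in zip(values, values[1:]):
        if previous is None or current is None:
            result.append(None)
        else:
            result.append(current - previous)
    return result


# Monthly number_of_bytes values from the sample response above.
number_of_bytes = [9400200.0, 3.8880434e7, 3.1445055e7]

bytes_deriv = derivative(number_of_bytes)
bytes_2nd_deriv = derivative(bytes_deriv)  # chain one derivative onto another

print(bytes_deriv)      # [None, 29480234.0, -7435379.0]
print(bytes_2nd_deriv)  # [None, None, -36915613.0]
```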

## `moving_avg`

A `moving_avg` aggregation is a parent aggregation that calculates the moving average metric.

The `moving_avg` aggregation finds the series of averages of different windows (subsets) of a dataset. A window's size represents the number of data points covered by the window on each iteration (specified by the `window` property and set to 5 by default). On each iteration, the algorithm calculates the average for all data points that fit into the window and then slides forward by excluding the first member of the previous window and including the first member from the next window.

For example, given the data `[1, 5, 8, 23, 34, 28, 7, 23, 20, 19]`, you can calculate a simple moving average with a window's size of 5 as follows:

```
(1 + 5 + 8 + 23 + 34) / 5 = 14.2
(5 + 8 + 23 + 34 + 28) / 5 = 19.6
(8 + 23 + 34 + 28 + 7) / 5 = 20
and so on...
```

For more information, see Wikipedia.
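The worked example above translates directly into a few lines of Python. The function name `simple_moving_average` is illustrative; it computes one average per full window over the example data.

```python
from typing import List


def simple_moving_average(data: List[float], window: int = 5) -> List[float]:
    """Average of every full window of `window` consecutive data points."""
    return [
        sum(data[start:start + window]) / window
        for start in range(len(data) - window + 1)
    ]


data = [1, 5, 8, 23, 34, 28, 7, 23, 20, 19]
print(simple_moving_average(data))  # [14.2, 19.6, 20.0, 23.0, 22.4, 19.4]
```

The first three results correspond to the three calculations shown above; the remaining ones continue the same pattern.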

You can use the `moving_avg` aggregation to either smooth out short-term fluctuations or to highlight longer-term trends or cycles in your time-series data.

Specify a small window size (for example, `window: 10`) that closely follows the data to smooth out small-scale fluctuations. Alternatively, specify a larger window size (for example, `window: 100`) that lags behind the actual data by a substantial amount to smooth out all higher-frequency fluctuations or random noise, making lower-frequency trends more visible.

The following example nests a `moving_avg` aggregation into a `date_histogram` aggregation:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "my_date_histogram": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_of_bytes": {
          "sum": { "field": "bytes" }
        },
        "moving_avg_of_sum_of_bytes": {
          "moving_avg": { "buckets_path": "sum_of_bytes" }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "my_date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "sum_of_bytes" : {
            "value" : 3.8880434E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "sum_of_bytes" : {
            "value" : 3.1445055E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.4140317E7
          }
        }
      ]
    }
  }
}
```

You can also use the `moving_avg` aggregation to predict future buckets. Just add the `predict` property and set it to the number of predictions that you want to see.

The following example adds five predictions to the preceding query:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "my_date_histogram": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_of_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "moving_avg_of_sum_of_bytes": {
          "moving_avg": {
            "buckets_path": "sum_of_bytes",
            "predict": 5
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "my_date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "sum_of_bytes" : {
            "value" : 3.8880434E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "sum_of_bytes" : {
            "value" : 3.1445055E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.4140317E7
          }
        },
        {
          "key_as_string" : "2021-01-01T00:00:00.000Z",
          "key" : 1609459200000,
          "doc_count" : 0,
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.6575229666666668E7
          }
        },
        {
          "key_as_string" : "2021-02-01T00:00:00.000Z",
          "key" : 1612137600000,
          "doc_count" : 0,
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.6575229666666668E7
          }
        },
        {
          "key_as_string" : "2021-03-01T00:00:00.000Z",
          "key" : 1614556800000,
          "doc_count" : 0,
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.6575229666666668E7
          }
        },
        {
          "key_as_string" : "2021-04-01T00:00:00.000Z",
          "key" : 1617235200000,
          "doc_count" : 0,
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.6575229666666668E7
          }
        },
        {
          "key_as_string" : "2021-05-01T00:00:00.000Z",
          "key" : 1619827200000,
          "doc_count" : 0,
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.6575229666666668E7
          }
        }
      ]
    }
  }
}
```
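The numbers in this response can be reproduced with a small Python sketch. It rests on an assumption inferred from the sample output rather than from the implementation: each bucket's moving average is the mean of up to `window` preceding bucket values (the current bucket is excluded), and every prediction repeats the mean of the last `window` observed values. The function name is hypothetical.

```python
from typing import List, Optional


def moving_avg_with_predictions(values: List[float], window: int = 5,
                                predict: int = 0) -> List[Optional[float]]:
    """Simple-model moving average as inferred from the sample response."""
    averages: List[Optional[float]] = []
    for index in range(len(values)):
        # Assumption: only values from earlier buckets feed the average.
        prior = values[max(0, index - window):index]
        averages.append(sum(prior) / len(prior) if prior else None)
    # Assumption: each predicted bucket repeats the mean of the last window.
    tail = values[-window:]
    forecast = sum(tail) / len(tail) if tail else None
    return averages + [forecast] * predict


# Monthly sum_of_bytes values from the sample response above.
sum_of_bytes = [9400200.0, 3.8880434e7, 3.1445055e7]
series = moving_avg_with_predictions(sum_of_bytes, window=5, predict=5)
print(series)
```

Under these assumptions, the first bucket has no value, the next two are 9400200.0 and 24140317.0, and the five predicted buckets all carry the overall mean of roughly 2.66E7.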

The `moving_avg` aggregation supports five models: `simple`, `linear`, exponentially weighted (`ewma`), Holt linear (`holt`), and Holt-Winters (`holt_winters`). These models differ in how the values of the window are weighted. As data points become "older" (that is, the window slides away from them), they might be weighted differently. You can specify a model of your choice by setting the `model` property. The `model` property holds the name of the model, and the `settings` object lets you provide model properties. For more information on these models, see Wikipedia.

A `simple` model first calculates the sum of all data points in the window, and then divides that sum by the size of the window. In other words, a `simple` model calculates a simple arithmetic mean for each window in your dataset.

The following example uses a simple model with a window size of 30:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "my_date_histogram": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_of_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "moving_avg_of_sum_of_bytes": {
          "moving_avg": {
            "buckets_path": "sum_of_bytes",
            "window": 30,
            "model": "simple"
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "my_date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "sum_of_bytes" : {
            "value" : 3.8880434E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "sum_of_bytes" : {
            "value" : 3.1445055E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.4140317E7
          }
        }
      ]
    }
  }
}
```

The following example uses a `holt` model. You can set the speed at which the importance decays with the `alpha` and `beta` settings. The default value of `alpha` is 0.3 and of `beta` is 0.1. You can specify any float value between 0 and 1, inclusive.

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "my_date_histogram": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_of_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "moving_avg_of_sum_of_bytes": {
          "moving_avg": {
            "buckets_path": "sum_of_bytes",
            "model": "holt",
            "settings": {
              "alpha": 0.6,
              "beta": 0.4
            }
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "my_date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "sum_of_bytes" : {
            "value" : 3.8880434E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "sum_of_bytes" : {
            "value" : 3.1445055E7
          },
          "moving_avg_of_sum_of_bytes" : {
            "value" : 2.70883404E7
          }
        }
      ]
    }
  }
}
```

## `serial_diff`

The `serial_diff` aggregation is a parent pipeline aggregation that computes a series of value differences between a time lag of the buckets from previous aggregations.

You can use the `serial_diff` aggregation to find the data changes between time periods instead of finding the whole value.

With the `lag` parameter (a positive, non-zero integer value), you can tell which previous bucket to subtract from the current one. If you don't specify the `lag` parameter, OpenSearch sets it to 1.

Let's say that the population of a city grows with time. If you use the serial differencing aggregation with the period of one day, you can see the daily growth. For example, you can compute a series of differences of the weekly average changes of a total price.

The following example uses a `lag` of 30 on monthly buckets. Because the sample data spans only three months, no bucket has a counterpart 30 buckets earlier, so the sample response contains no `thirtieth_difference` values:

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "my_date_histogram": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "the_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "thirtieth_difference": {
          "serial_diff": {
            "buckets_path": "the_sum",
            "lag" : 30
          }
        }
      }
    }
  }
}
```

**Sample response**

```json
...
  "aggregations" : {
    "my_date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2020-10-01T00:00:00.000Z",
          "key" : 1601510400000,
          "doc_count" : 1635,
          "the_sum" : {
            "value" : 9400200.0
          }
        },
        {
          "key_as_string" : "2020-11-01T00:00:00.000Z",
          "key" : 1604188800000,
          "doc_count" : 6844,
          "the_sum" : {
            "value" : 3.8880434E7
          }
        },
        {
          "key_as_string" : "2020-12-01T00:00:00.000Z",
          "key" : 1606780800000,
          "doc_count" : 5595,
          "the_sum" : {
            "value" : 3.1445055E7
          }
        }
      ]
    }
  }
}
```
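Serial differencing subtracts the value `lag` buckets back from the current value. The Python sketch below, with a hypothetical `serial_diff` helper, shows why a lag of 30 yields nothing for three monthly buckets and what a lag of 1 would return for the same data.

```python
from typing import List, Optional


def serial_diff(values: List[float], lag: int = 1) -> List[Optional[float]]:
    """Subtract the value `lag` buckets earlier from each value."""
    if lag < 1:
        raise ValueError("lag must be a positive, non-zero integer")
    return [
        None if index < lag else values[index] - values[index - lag]
        for index in range(len(values))
    ]


# Monthly the_sum values from the sample response above.
the_sum = [9400200.0, 3.8880434e7, 3.1445055e7]

print(serial_diff(the_sum, lag=30))  # [None, None, None]
print(serial_diff(the_sum, lag=1))   # [None, 29480234.0, -7435379.0]
```

With a lag of 1, the result coincides with the 1st order derivative of the same series.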