
Analysing some log files using AWS CloudWatch Insights, I can plot a count aggregated in time bins with:

| stats count(*) by bin(1h)

This produces a graph, as expected, aggregating all logs in each time bin.

I want to split this data by a 'group' field, with values A and B.

| stats count(*) by group, bin(1h)

This returns log counts across time bins as expected, but the visualisation tab says 'No visualisation available.' I would like it to return a time series plot with a series for group A and a series for group B.

Where am I going wrong, or is this simply not possible?

Alright, I found a very janky way to solve this. It appears that no, you cannot do

| stats count(*) by group, bin(1h)

However, you can use parse to artificially create new variables, like so:

parse "[E*]" as @error
| parse "[I*]" as @info
| parse "[W*]" as @warning
| filter ispresent(@warning) or ispresent(@error) or ispresent(@info)
| stats count(@error) as error, count(@info) as info, count(@warning) as warning by bin(15m)

Here, I'm trying to look at the distribution of log types over time. I have three types of log messages with formats: "[ERROR]", "[INFO]" and "[WARNING]"
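The shape this workaround produces (one column per log level, one row per time bin) is what makes the chart possible. A minimal Python sketch of the same pivot, using hypothetical log lines rather than the Insights engine:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log records: (timestamp, message)
logs = [
    (datetime(2019, 9, 26, 12, 1), "[ERROR] disk full"),
    (datetime(2019, 9, 26, 12, 5), "[INFO] started"),
    (datetime(2019, 9, 26, 12, 20), "[WARNING] slow query"),
    (datetime(2019, 9, 26, 12, 22), "[ERROR] timeout"),
]

def bin_15m(ts):
    # Truncate a timestamp to its 15-minute bin, like bin(15m)
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

# One Counter per bin, one key per level -- the wide shape a chart needs
counts = defaultdict(Counter)
for ts, msg in logs:
    for level in ("ERROR", "INFO", "WARNING"):
        if msg.startswith(f"[{level}]"):
            counts[bin_15m(ts)][level] += 1

for b in sorted(counts):
    c = counts[b]
    print(b.isoformat(), c["ERROR"], c["INFO"], c["WARNING"])
```

Each printed row corresponds to one time bin with a fixed column per level, which is exactly the table layout the visualisation tab can turn into one series per column.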

This will not give the exact answer Tom is looking for, I think. I am also facing the same issue, but the problem is that the group field has multiple values, so the column headers should be those values. For the query parse "group" as @group | filter ispresent(@group) | stats count(@group) as group by bin(15m), the output should look like this:

time                           A  B
2019-09-26T12:00:00.000+05:30  3  2
2019-09-26T12:30:00.000+05:30  1  2

But it prints like this:

time                           Group
2019-09-26T12:00:00.000+05:30  5
2019-09-26T12:30:00.000+05:30  13

– viren Sep 30, 2019 at 6:01

This allowed me to visualize my two different fields. They just couldn't make it easy, could they. – DataDino Oct 2, 2019 at 1:11

Expanding on @smth's sample, I usually do it a little differently.

With this query I follow the trend of status codes, aggregated over time, on a standard nginx access log:

fields @timestamp, @message
| parse @message '* - * [*] "* * *" * * "-" "*"' as host, identity, dateTimeString, httpVerb, url, protocol, status, bytes, useragent
| stats count(*) as all, sum(status <= 299) as c_s200, sum(status >= 300 and status <= 399) as c_s300, sum(status >= 400 and status <= 499) as c_s400, sum(status >= 499) as c_s500 by bin(1m)

The trick here is that an expression like "status > 499" returns 0 if false and 1 if true, so summing it within each time bucket simulates a 'count if [condition]'.
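This boolean-to-integer trick works the same way in most languages. A minimal Python sketch with made-up status codes:

```python
# Hypothetical status codes pulled from an access log
statuses = [200, 201, 301, 404, 404, 500, 503, 200]

# In Python, True/False act as 1/0 under sum(), just like the boolean
# expressions inside sum() in the Insights query: each matching row
# contributes 1, each non-matching row contributes 0.
c_2xx = sum(200 <= s < 300 for s in statuses)
c_3xx = sum(300 <= s < 400 for s in statuses)
c_4xx = sum(400 <= s < 500 for s in statuses)
c_5xx = sum(s >= 500 for s in statuses)

print(c_2xx, c_3xx, c_4xx, c_5xx)  # 3 1 2 2
```

Summing an indicator over each time bucket is what turns a single count(*) into one column per condition, i.e. one chartable series each.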

This is what the resulting graph looks like on the visualisation tab (screenshot omitted).

How can we further partition this by verb? In Kibana, I can have a stacked bar chart like you have, but also stacked per count per verb. – baumannalexj Oct 14, 2021 at 19:02

As far as I know, you can't do that using CloudWatch visualization, as of yet at least. The visualization capabilities are pretty basic. Definitely something that could see some improvements. – 6ugr3 Oct 15, 2021 at 20:13

This is great! I'd instinctively done my query like a pivot table (one row per status+bin combo). By instead doing sums like this to create multiple columns, with one row per bin, I was finally able to get the line and stacked area graphs I wanted. – tbernard Nov 26, 2021 at 16:14

Here is another variation based on the answers of @smth et al. Not being able to build a chart automatically from a field's values is very annoying, so you have to mention each value explicitly.

fields @timestamp, @message
| filter namespace like "audit-log"
| fields coalesce(`audit.client-version`, "legacy") as version
| parse version "legacy" as v0
| parse version "886ac066*" as v1
| parse version "93e021e2*" as v2
| stats count(*), count(v0), count(v1), count(v2) by bin(5m)

Result: (chart screenshot omitted)

Is it still not possible to do that automatically as of 2021? I have over 30 different values and there may be more in the future, so this is not an option for me, unfortunately :( – DataGreed Aug 23, 2022 at 2:45

There is also an extension to the workaround described by @smth that can support more complicated statistics than count(). Here's an example that graphs the CPU usage across different instances over time:

| fields (instance_id like "i-instance_1") as is_instance_1, (instance_id like "i-instance_2") as is_instance_2
| stats sum(cpu * is_instance_1) as cpu_1, sum(cpu * is_instance_2) as cpu_2 by bin(5m)
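The indicator multiplication generalizes the count trick to other aggregates: multiplying a value by a 0/1 condition zeroes out every row that doesn't belong to the series. A minimal Python sketch with made-up CPU samples:

```python
# Hypothetical (instance_id, cpu) samples falling inside one time bin
samples = [("i-instance_1", 10.0), ("i-instance_2", 40.0),
           ("i-instance_1", 30.0), ("i-instance_2", 20.0)]

# (iid == target) evaluates to True/False, i.e. 1/0, so multiplying by it
# keeps only the matching instance's contribution -- one pass, two series
cpu_1 = sum(cpu * (iid == "i-instance_1") for iid, cpu in samples)
cpu_2 = sum(cpu * (iid == "i-instance_2") for iid, cpu in samples)

print(cpu_1, cpu_2)  # 40.0 60.0
```

The same pattern would extend to averages by also summing the indicator alone and dividing, though each series still has to be written out by hand in the query.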
This solution was incredibly helpful... I needed to essentially create an additional dimension. Since I knew the "like" values ahead of running the query, I was able to make it work! – jerome Oct 23, 2020 at 19:04
