Apologies in advance if I ask ignorant questions, or don't provide all relevant info in my post. I'm not an expert sysadmin, or network admin, or security admin. I've inherited Elastic from a departed co-worker, and have only a minimal understanding of how it works, based on our previous architecture of individually installed Beats agents. Now I am trying to figure out how to use Fleet and the unified Elastic Agent.
I'm working with Elastic v7.17.
I've set up multiple Elasticsearch hosts in AWS on RHEL 8 EC2 instances, plus I've installed Kibana on another RHEL 8 host. All of the hosts have certificates generated by a certificate authority that we established using AWS ACM.
Before my co-worker departed, we got as far as successfully installing the agent on these hosts to make them Fleet Servers. They all show up under Fleet > Agents in Kibana, and on each host elastic-agent status reports Healthy. However, no data is showing up in Data Streams.
The agent policy applied to the hosts includes the following integrations: fleet-server, auditd, system, linux, Endpoint Security.
When I look at the logs on the server, specifically
/opt/Elastic/Agent/data/elastic-agent-*/logs/default/metricbeat-json.log and filebeat-json.log, I find this message repeating over and over:
{"log.level":"error","@timestamp":"2022-03-11T21:46:58.160Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": EOF","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-11T21:46:58.160Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 1 reconnect attempt(s)","service.name":"metricbeat","ecs.version":"1.6.0"}
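In case it helps anyone reading along, here's a minimal sketch of how I've been skimming these logs: each line is one JSON object, so a short pipeline can pull out just the level, timestamp, and message. The sample line below is trimmed from the excerpt above; in practice the input would be the *-json.log files themselves.

```shell
# Sample line trimmed from the metricbeat-json.log excerpt above; the real
# input would be:
#   cat /opt/Elastic/Agent/data/elastic-agent-*/logs/default/*-json.log
sample='{"log.level":"error","@timestamp":"2022-03-11T21:46:58.160Z","message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": EOF"}'
printf '%s\n' "$sample" | python3 -c '
import sys, json
for line in sys.stdin:
    rec = json.loads(line)          # each log line is a self-contained JSON object
    print(rec["log.level"], rec["@timestamp"], rec["message"])
'
```

With the real files, this makes it much easier to see that the same connection error is repeating rather than several different ones.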
I'm sure I haven't provided enough information to diagnose the issue yet. Again I apologize for being a total newbie, and hope somebody will take pity on me.
I am now getting Endpoint Security data in data streams. The solution to that problem was to fix a typo in the ssl.certificate_authorities setting in Fleet Settings.
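For anyone who hits the same thing: the fix was in the Elasticsearch output configuration YAML box under Fleet > Settings in Kibana. A sketch of what the corrected setting looks like (the path here is made up; ours points at the CA bundle we generated with ACM):

```yaml
# Hypothetical path -- substitute the CA certificate that signed your
# Elasticsearch host certificates.
ssl.certificate_authorities: ["/etc/pki/tls/certs/our-ca.crt"]
```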
I am still experiencing the problem where no data is coming through from the auditd, system, and linux integrations.
Reiterating my scenario...
I'm using Elastic 7.17, self-managed.
I've set up three Elasticsearch nodes on RHEL 8 in AWS EC2. Each of these has additionally been set up as a Fleet Server.
I've set up a fourth RHEL 8 EC2 host for Kibana.
All four hosts have certificates signed by a certificate authority we set up using AWS Certificate Manager (ACM).
We are using an AWS NLB for managing traffic, so the Fleet Settings are:
Fleet Server Hosts: https://:8220
Elasticsearch Hosts: https://:9200
On the NLB we have set up listeners for the two ports above. Each one is forwarding to a target group that is comprised of the three Elasticsearch nodes.
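One sanity check I've been running from the agent hosts to confirm that both listeners actually answer through the NLB (hostnames redacted here just as in the settings above; substitute your NLB DNS name and your CA file):

```shell
# <nlb-dns-name> and the CA path are placeholders.
# Elasticsearch through the NLB listener on 9200:
curl --cacert /path/to/ca.crt https://<nlb-dns-name>:9200
# Fleet Server through the NLB listener on 8220:
curl --cacert /path/to/ca.crt https://<nlb-dns-name>:8220/api/status
```

If either of these fails or reports a certificate error, the agents would presumably fail the same way.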
On the Elasticsearch EC2 instances a security group has been assigned with inbound rules for ports 8220, 9200, and 9300 all allowing TCP traffic from the VPC CIDR.
On the kibana EC2 instance a security group has been assigned with inbound rules for ports 5601 and 443 allowing https traffic from our application load balancer.
On the Kibana instance, in the Agent/data/elastic-agent-*/logs/default/filebeat-json.log file, I see the following messages repeating:
{"log.level":"error","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": dial tcp [::1]:9200: connect: connection refused","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 94 reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":219},"message":"retryer: send unwait signal to consumer","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"publisher","log.origin":{"file.name":"pipeline/retry.go","file.line":223},"message":" done","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-03-15T17:03:41.086Z","log.logger":"esclientleg","log.origin":{"file.name":"transport/logging.go","file.line":37},"message":"Error dialing dial tcp [::1]:9200: connect: connection refused","service.name":"filebeat","network":"tcp","address":"localhost:9200","ecs.version":"1.6.0"}
On the Elasticsearch nodes, in the same file, I see these messages repeating:
{"log.level":"error","@timestamp":"2022-03-15T17:11:58.650Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get \"http://localhost:9200\": EOF","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-03-15T17:11:58.650Z","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":145},"message":"Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 113 reconnect attempt(s)","service.name":"filebeat","ecs.version":"1.6.0"}
Metricbeat and Filebeat are in a perpetual state of "configuring"; I've never seen this change to "healthy".
elastic-agent status
Status: HEALTHY
Message: (no message)
Applications:
* metricbeat_monitoring (CONFIGURING)
Updating configuration
* endpoint-security (HEALTHY)
Protecting with policy {bd328999-4957-44fd-9e57-75aad67d7302}
* filebeat (CONFIGURING)
Updating configuration
* fleet-server (HEALTHY)
Running on policy with Fleet Server integration: 499b5aa7-d214-5b5d-838b-3cd76469844e
* metricbeat (CONFIGURING)
Updating configuration
* filebeat_monitoring (CONFIGURING)
Updating configuration
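Since the Beats are clearly still pointed at http://localhost:9200, one thing I've been using is elastic-agent inspect, which (in 7.x, run as root on the agent host) prints the configuration the agent actually renders for its child Beats:

```shell
# Show the output section the agent generates for the default output;
# this is where I'd expect to see whether the Fleet Settings hosts made it
# into the rendered config, or whether it still says localhost:9200.
elastic-agent inspect output --output default
```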
This is the fleet.yml file on one of the Elasticsearch nodes: