Is healthy: false
Failure reason: The node XXXXX is not replicating
Severity: CRITICAL
Additional links: []


Environment

Data Center instances with more than one node in the cluster.

Diagnosis

  • Review atlassian-jira.log on the node that shows the “Cluster Cache Replication” health check warning. Traces like the following appear in the log:
2023-06-01 11:58:15,055+0200 main WARN      [c.a.jira.util.JiraUtils] IP/Hostname address cannot be calculated for this host. Please fix this.
2023-06-01 11:58:15,180+0200 main ERROR      [n.sf.ehcache.Cache] Unable to set localhost. This prevents creation of a GUID. Cause was: XXXXX: XXXXX: Name or service not known
java.net.UnknownHostException: XXXXX: XXXXX: Name or service not known
2023-06-01 11:58:15,971+0200 main WARN      [n.sf.ehcache.CacheManager] Cache com.atlassian.jira.task.TaskManagerImpl.taskMaprequested bootstrap but a CacheException occured. Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
net.sf.ehcache.distribution.RemoteCacheException: Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
    at net.sf.ehcache.distribution.RMIBootstrapCacheLoader.doLoad(RMIBootstrapCacheLoader.java:176)
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is: 
    java.net.ConnectException: Connection refused (Connection refused)
    at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
    at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
    at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
    at java.rmi/sun.rmi.server.UnicastRef.invoke(Unknown Source)
    at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invoke(Unknown Source)
    at com.sun.proxy.$Proxy40.getKeys(Unknown Source)
    ... 64 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
  • Verify the /etc/hosts entries on the affected node to confirm whether an entry like the one below exists:
2021-06-04 18:26:25,830+0000 localq-reader-12 ERROR      [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat', action=PUT, key=node2, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1622831185825} from cache replication queue: [queueId=queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put, queuePath=/var/atlassian/application-data/jira-home/localq/queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.MarshalException: error marshalling arguments; nested exception is:
        java.net.SocketException: Broken pipe (Write failed)
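
The /etc/hosts check above can also be scripted. The following is a minimal Python sketch, not an Atlassian tool: it scans hosts-file text and reports every active (uncommented) line whose address is a loopback address. The function name loopback_entries, the SAMPLE text and the hostnames in it are hypothetical and used only for illustration.

```python
import ipaddress


def loopback_entries(hosts_text):
    """Return (address, names) pairs for active lines that use a loopback address."""
    found = []
    for raw in hosts_text.splitlines():
        # Drop trailing comments and skip blank or fully commented lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            # Not a parsable IP address; ignore the line.
            continue
        if addr.is_loopback:
            found.append((str(addr), fields[1:]))
    return found


# Hypothetical hosts-file content, NOT taken from an affected node.
SAMPLE = """\
127.0.0.1   localhost
127.0.1.1   node1.example.internal node1
# 127.0.1.1 old-entry
10.0.0.5    node2.example.internal
"""

if __name__ == "__main__":
    for addr, names in loopback_entries(SAMPLE):
        print(addr, names)
```

On a real node, the text could be read with open("/etc/hosts").read() and passed to the same function. Any line that maps the node's own hostname to a loopback address is worth a closer look.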

Cause

These errors suggest that the /etc/hosts entries are misconfigured. The node XXXXX is mapped to 127.0.1.1, but connections to this address do not reach the node itself, which results in the Connection refused error.
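
To see which address a node's own hostname resolves to, a short check such as the one below may help. It is a sketch under assumptions: it uses Python's system resolver calls, which usually consult /etc/hosts but are not guaranteed to behave exactly like the JVM's lookup, and the helper names resolve_local_hostname and is_loopback are hypothetical.

```python
import ipaddress
import socket


def resolve_local_hostname():
    """Return (hostname, address); address is None if the name cannot be resolved."""
    name = socket.gethostname()
    try:
        addr = socket.gethostbyname(name)
    except OSError:
        # Covers socket.gaierror, e.g. "Name or service not known".
        return name, None
    return name, addr


def is_loopback(addr):
    """True if addr is an IP address in a loopback range such as 127.0.0.0/8."""
    return addr is not None and ipaddress.ip_address(addr).is_loopback


if __name__ == "__main__":
    name, addr = resolve_local_hostname()
    if addr is None:
        print(f"{name}: hostname does not resolve")
    elif is_loopback(addr):
        print(f"{name} -> {addr} (loopback; other nodes cannot reach this node here)")
    else:
        print(f"{name} -> {addr}")
```

A loopback result is consistent with the "Connection refused to host: 127.0.1.1" trace, and an unresolved hostname is consistent with the UnknownHostException trace.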

Solution

  • Comment out the entries below in the /etc/hosts file (add a '#' at the start of each line).
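
As an illustration of the commenting-out step, the Python sketch below prefixes '#' to every active line that starts with a given address. It assumes the address in question is 127.0.1.1, taken from the Cause section; the function name comment_out and the sample hostnames are hypothetical. The function only returns the modified text, so the result can be reviewed before anything is written back to /etc/hosts.

```python
def comment_out(hosts_text, address):
    """Return hosts_text with every active line for `address` commented out."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Lines that are already commented start with '#', so they never match.
        if fields and fields[0] == address:
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts-file content used only for demonstration.
    sample = "127.0.0.1 localhost\n127.0.1.1 node1.example.internal node1\n"
    print(comment_out(sample, "127.0.1.1"), end="")
```

Taking a backup copy of /etc/hosts before changing it is a sensible precaution.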