Is healthy: false
Failure reason: The node XXXXX is not replicating
Severity: CRITICAL
Additional links: []
Environment
Data Center instances with more than one node in the cluster.
Diagnosis
- Review atlassian-jira.log on the affected node that shows the “Cluster Cache Replication” health check warning. The following traces appear in the logs:
2023-06-01 11:58:15,055+0200 main WARN [c.a.jira.util.JiraUtils] IP/Hostname address cannot be calculated for this host. Please fix this.
2023-06-01 11:58:15,180+0200 main ERROR [n.sf.ehcache.Cache] Unable to set localhost. This prevents creation of a GUID. Cause was: XXXXX: XXXXX: Name or service not known
java.net.UnknownHostException: XXXXX: XXXXX: Name or service not known
2023-06-01 11:58:15,971+0200 main WARN [n.sf.ehcache.CacheManager] Cache com.atlassian.jira.task.TaskManagerImpl.taskMaprequested bootstrap but a CacheException occured. Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
net.sf.ehcache.distribution.RemoteCacheException: Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
at net.sf.ehcache.distribution.RMIBootstrapCacheLoader.doLoad(RMIBootstrapCacheLoader.java:176)
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is:
java.net.ConnectException: Connection refused (Connection refused)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
at java.rmi/sun.rmi.server.UnicastRef.invoke(Unknown Source)
at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invoke(Unknown Source)
at com.sun.proxy.$Proxy40.getKeys(Unknown Source)
... 64 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
- Verify the /etc/hosts entries on the affected node to confirm whether an entry like the one below exists:
2021-06-04 18:26:25,830+0000 localq-reader-12 ERROR [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat', action=PUT, key=node2, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1622831185825} from cache replication queue: [queueId=queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put, queuePath=/var/atlassian/application-data/jira-home/localq/queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.MarshalException: error marshalling arguments; nested exception is:
java.net.SocketException: Broken pipe (Write failed)
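The /etc/hosts check described above can be sketched as a small script. This is a minimal illustration, not an Atlassian tool: the function name `loopback_aliases` is hypothetical, and it assumes the entries of interest are those that map the node's hostname to a loopback address such as 127.0.1.1.

```python
import ipaddress
import socket


def loopback_aliases(hosts_text, hostname):
    """Return (line_number, line) pairs that map `hostname` to a loopback address."""
    matches = []
    for number, raw in enumerate(hosts_text.splitlines(), start=1):
        # Ignore comments and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # not a plain IP address, skip the line
        if ip.is_loopback and hostname in names:
            matches.append((number, raw))
    return matches


if __name__ == "__main__":
    # Assumption: the file lives at the standard Linux location.
    with open("/etc/hosts") as fh:
        for number, line in loopback_aliases(fh.read(), socket.gethostname()):
            print(f"/etc/hosts:{number}: {line}")
```

Any line printed by the script is a candidate for the misconfiguration discussed in the Cause section.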
Cause
These errors suggest that the /etc/hosts entries are misconfigured. The hostname of node XXXXX points to 127.0.1.1, but this IP address does not resolve to the node itself, which results in the Connection refused error.
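To see what the node's hostname actually resolves to, a quick check along the following lines may help. It is a sketch under assumptions: the helper names `is_loopback` and `check_hostname` are hypothetical, and the script only reports what the operating system resolver returns, which may differ from what the JVM uses in some setups.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if `address` is in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def check_hostname(hostname=None):
    """Describe how `hostname` (default: this machine's hostname) resolves."""
    hostname = hostname or socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except OSError as exc:  # covers socket.gaierror and socket.herror
        return f"{hostname}: cannot be resolved ({exc})"
    if is_loopback(address):
        return f"{hostname} -> {address}: loopback address, not reachable from other nodes"
    return f"{hostname} -> {address}: not a loopback address"


if __name__ == "__main__":
    print(check_hostname())
```

A loopback result such as 127.0.1.1 matches the address in the `Connection refused to host` trace, while a resolution failure corresponds to the `UnknownHostException` in the logs.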
Solution
- Please comment out (add a '#' in front of the lines) the entries below in the /etc/hosts file.
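The commenting-out step can be sketched as a text transformation. This is only an illustration: the function name `comment_out_loopback_entries` is hypothetical, and it assumes the entries to disable are those that map the node's hostname to a loopback address. The sketch returns the modified text rather than writing the file, so the result can be reviewed before it replaces /etc/hosts.

```python
import ipaddress


def comment_out_loopback_entries(hosts_text, hostname):
    """Return hosts_text with '#' added in front of loopback lines naming `hostname`."""
    result = []
    for raw in hosts_text.splitlines():
        # Only the part before any existing comment is inspected.
        fields = raw.split("#", 1)[0].split()
        disable = False
        if len(fields) >= 2 and hostname in fields[1:]:
            try:
                disable = ipaddress.ip_address(fields[0]).is_loopback
            except ValueError:
                disable = False  # first field is not an IP address
        result.append("#" + raw if disable else raw)
    return "\n".join(result) + "\n"
```

Note that a line listing both `localhost` and the node's hostname would be commented out as a whole, so such lines are better edited by hand. Back up /etc/hosts before applying any change.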