Clustering, Performance

Issue

Cause

Diagnosis

Resolution

Cluster node failure

This error is thrown when the primary node cannot connect to the secondary node. Possible causes:

  • The date and time on the servers in the cluster are different

  • The heartbeat period is too low

  • A large number of jobs is causing the secondary node to go down

One of the nodes of our clustered instance keeps going down with the following error in the kernel logs:

"Error while detecting node "192.168.193.145" from database :: Last time stamp value not updated by cluster node since last "45" seconds
com.adeptia.indigo.cluster.failure.detection.ClusterDBFailureDetection.run(ClusterDBFailureDetection.java:45) "
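
To see how often this happens and which nodes are affected, the kernel log can be scanned for the message quoted above. The following is a minimal sketch, assuming the message wording matches the quote exactly; the function name and the sample text are illustrative only and are not part of the product.

```python
import re

# Pattern based on the message quoted above; other versions may word it differently.
HEARTBEAT_ERROR = re.compile(
    r'Error while detecting node "(?P<node>[^"]+)" from database :: '
    r'Last time stamp value not updated by cluster node since last "(?P<seconds>\d+)" seconds'
)

def find_heartbeat_failures(log_text):
    """Return (node, seconds) pairs for each heartbeat failure found in log_text."""
    return [(m.group("node"), int(m.group("seconds")))
            for m in HEARTBEAT_ERROR.finditer(log_text)]

# Sample line copied from the error shown above.
sample = ('"Error while detecting node "192.168.193.145" from database :: '
          'Last time stamp value not updated by cluster node since last "45" seconds')
print(find_heartbeat_failures(sample))  # prints [('192.168.193.145', 45)]
```

Feeding the full text of a kernel log to find_heartbeat_failures gives one entry per occurrence, which makes it easier to tell whether a single node is repeatedly missing its heartbeat.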

  1. Ensure the system time is the same on all nodes in the cluster.

  2. Increase the heartbeat period of the cluster. This is the "abpm.node.heartbeat.period" property in the server-configure.properties file (ServerKernel/etc).

  3. Turn on the Queue Processor and set a limit on how many process flows the server can execute concurrently. Please refer to the developer's guide for more information on the Record Queue Processor.
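
Before changing the heartbeat period, it can help to confirm the value currently set on each node. The sketch below reads a property from Java-style properties text. It is a simplified reader that handles only "key = value" lines and comments. The sample content and the value 30 are placeholders, not recommended settings.

```python
def read_property(properties_text, key):
    """Return the value of key from Java-style .properties text, or None if absent."""
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ('#' or '!' start a comment in .properties files).
        if not line or line[0] in "#!":
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None

# Hypothetical file content; the value 30 is a placeholder, not a recommended setting.
example = """
# cluster settings
abpm.node.heartbeat.period = 30
"""
print(read_property(example, "abpm.node.heartbeat.period"))  # prints 30
```

On a real node, the text would come from reading server-configure.properties in ServerKernel/etc. Comparing the result across nodes shows whether they are configured consistently.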

Web Service Provider call fails on cluster

From the behavior of the system and the error message generated, this issue is caused by node1 being unable to communicate with node2 through ports 21000 and 1098 (the default RMI ports).

We have a Web Service Provider published on an Adeptia cluster. When both nodes of the cluster are up, a call to the service fails (the SOAP address uses the IP of the load balancer).

The error returned is shown below:

<?xml version='1.0' encoding='UTF-8'?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <S:Body>
    <S:Fault xmlns="" xmlns:ns3="http://www.w3.org/2003/05/soap-envelope">
      <faultcode>S:Server</faultcode>
      <faultstring>java.lang.RuntimeException: Published transaction did not produce output.</faultstring>
    </S:Fault>
  </S:Body>
</S:Envelope>

When node B is down, a call to the service succeeds (the service was deployed on node A using the migration utility, and the SOAP address still uses the IP of the load balancer).

In the Webrunner.log file, we see this error:

2016-04-28 13:43:59,025 ERROR [qtp2142659404-22] webservice com.adeptia.indigo.services.webservice.metro.WsTransactionImlMetro.invoke(WsTransactionImlMetro.java:466) - ||||null|||||null|Error while executing transaction through web service provider :: Error in creating process flow.: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: node1; nested exception is:
2016-04-28 13:43:59 java.net.ConnectException: Connection refused: connect][Error in creating process flow.: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: node1; nested exception is:
2016-04-28 13:43:59 java.net.ConnectException: Connection refused: connect]]|apses1639|
2016-04-28 13:43:59 java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: node1; nested exception is:
2016-04-28 13:43:59 java.net.ConnectException: Connection refused: connect]

Enable connectivity between node1 and node2 through ports 21000 and 1098.
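
Connectivity on these ports can be checked from one node with a plain TCP connection attempt. The following is a minimal sketch: the function names are illustrative, and the host name passed in is whatever address the other node is reachable at. A successful TCP connection only shows that the port is open, not that the RMI service behind it is healthy.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def check_ports(host, ports=(21000, 1098), timeout=3.0):
    """Return a dict mapping each port to True (reachable) or False (not reachable)."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, running check_ports("node2") on node1, and check_ports("node1") on node2, shows which of the two ports is blocked in each direction. Here "node1" and "node2" stand in for the actual host names or IP addresses.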

Clustering: unable to connect to FTP target using SSH

It seems that the Key Manager was created from node1. Alternate process flow (PF) runs fail because the ppk file cannot be found on the second node.

https://support.adeptia.com/hc/en-us/article_attachments/360016909911/309689870898613495d5a5be5471db8a7f082267292bd999bc56473e6a48ed79.png

We face the following scenario with our FTP target using SSH: the process flow executes successfully, then aborts on the next run.

https://support.adeptia.com/hc/en-us/article_attachments/360016909871/960759292154d57dd40cc83cd16060d59deadbb0d804d6a3150770de0576c676.png

Error:
https://support.adeptia.com/hc/en-us/article_attachments/360016909891/2c9ee16a2a183d30fd478882310e5f877290995e53cd3c631785270b4bfee666.png

Create the Keystore activity from the other node as well, so that the ppk file is also available on the second node.
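
To confirm that both nodes end up with the same key file, a small check can be run on each node and the results compared. This is only a sketch under assumptions: it assumes the ppk file is stored on the node's file system at a path you already know from the Key Manager configuration. The function name is illustrative and no particular path is implied.

```python
import hashlib
from pathlib import Path

def describe_key_file(path):
    """Return (exists, sha256_hex) for the file at path; sha256_hex is None if missing."""
    p = Path(path)
    if not p.is_file():
        return (False, None)
    # The digest identifies the file content without printing the key itself.
    return (True, hashlib.sha256(p.read_bytes()).hexdigest())
```

Calling describe_key_file with the ppk path on each node should return True and the same digest on both. A result of (False, None) on one node matches the failure described above.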