Almost a year ago I ran into the CPUSE timeout issue during the ‘Saving File Permissions’ step. Last week I ran into a similar problem when creating a snapshot of the same VSX gateway. Over that year the gateway received several newer Jumbo Hotfixes, and at this point Jumbo Hotfix Take 286 (GA) was installed. I never received a hotfix from TAC for the CPUSE issue back then. We only had the workaround of manually deleting the temporary files in /opt/CPshrd-R77/CTX/CTX00001/tmp/. So could the snapshot and CPUSE issues be caused by the same problem?
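Before deleting anything by hand, it can help to see what has actually piled up in that directory. The following is only a minimal sketch, not a Check Point tool: it lists regular files with their size and age and deletes nothing. The function name `list_tmp_files` and the `min_age_seconds` parameter are made up for illustration, and which files are safe to remove is not something this sketch can decide.

```python
import time
from pathlib import Path

# Path taken from the post; it is specific to that R77 VSX gateway.
TMP_DIR = Path("/opt/CPshrd-R77/CTX/CTX00001/tmp")


def list_tmp_files(directory, min_age_seconds=0, now=None):
    """Return (path, size_bytes, age_seconds) for regular files, largest first.

    Hypothetical helper for review purposes only: it never deletes anything.
    """
    now = time.time() if now is None else now
    directory = Path(directory)
    if not directory.is_dir():
        return []
    entries = []
    for path in directory.iterdir():
        if not path.is_file():
            continue  # skip subdirectories and other non-regular entries
        info = path.stat()
        age = now - info.st_mtime
        if age >= min_age_seconds:
            entries.append((path, info.st_size, age))
    # Largest files first, since those are the most likely to matter.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    for path, size, age in list_tmp_files(TMP_DIR):
        print(f"{size:>12} bytes  {age / 3600:8.1f} h  {path}")
```

Running it before and after a failed snapshot or CPUSE run would show whether the same leftovers appear in both cases.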
It seems that the Gaia Deployment Agent, also known as CPUSE, is affected by a memory leak. When monitoring devices we saw memory usage climb rapidly to almost unhealthy levels. Most of the time the process was cleaned up and memory usage returned to normal. But today, unfortunately, a customer experienced problems with clients running Identity Awareness agents and with some site-to-site VPNs when DAService crashed and a core dump was created. Although the gateway was part of a cluster, no failover was initiated…
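To give an idea of how such growth could be tracked per process, here is a rough sketch, assuming a Linux-style /proc filesystem where /proc/<pid>/status contains a VmRSS line. The function names, the sample values, and the threshold are hypothetical; how the PID of DAService is obtained is left out, and this is not how our monitoring actually works.

```python
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None  # kernel threads and exited processes have no VmRSS line


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


def is_growing(samples_kb, min_growth_kb):
    """True if the samples never decrease and grew by at least min_growth_kb.

    A crude heuristic: steady growth can hint at a leak, it does not prove one.
    """
    if len(samples_kb) < 2:
        return False
    never_decreases = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_decreases and samples_kb[-1] - samples_kb[0] >= min_growth_kb
```

Collecting `read_rss_kb(pid)` at a fixed interval and feeding the list to `is_growing` would flag a process whose memory only ever goes up between samples.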
More than a month ago I published this article about routed crashing on two 1100 appliances when simply entering the command ‘show route’ in CLISH.
It took a very long time to get an answer from R&D, and apparently they could not reproduce the whole thing. They were able to crash routed, but they could not reproduce the failover. After I sent some more debug information and R&D investigated further, I was asked to enter two commands in CLISH.
A few weeks ago I was asked to take a look at a Check Point 1100 cluster running Gaia Embedded R77.20.31 that had one member in Down status. This is not a production cluster, so when a reboot solved the problem we did not spend more time investigating the issue. Up to that point it had happened only once.
But yesterday it happened again…
Recently I had to install a Jumbo Hotfix on an R77.30 VSX cluster for a customer. I made sure CPUSE was updated to the latest version.
While the Jumbo Hotfix installation was in progress I noticed that the status ‘Saving File Permissions’ did not finish. It looked as if the process was hanging. We waited a long time, but in the end the installation of the Jumbo Hotfix was simply cancelled without any warning or message being displayed.