CPUSE timeout during ‘Saving File Permissions’
Recently I had to install a Jumbo Hotfix on an R77.30 VSX cluster for a customer. I made sure CPUSE was updated to the latest version.
While the Jumbo Hotfix installation was in progress, I noticed that the status ‘Saving File Permissions’ did not finish. It looked as if the process was hanging. We waited a long time, but in the end the installation of the Jumbo Hotfix was simply cancelled without any warning or message being displayed.
When investigating the installation logs in /opt/CPInstLog I found an entry for ‘Saving File Permissions’. This status means a shell script is created containing the inventoried files. In this particular case the file /ramdisk/ping/tmp/HFtemp/rraqp/prodPrems.sh was being created. The ‘rraqp’ part seems to be random, as I have seen other combinations of letters in other installations.
This was the key to the problem. The status ‘Saving File Permissions’ was stalling because prodPrems.sh has to list every inventoried file, with its location, whose permissions need to be saved.
There were millions of zero-byte files in /opt/CPshrd-R77/CTX/CTX00001/tmp/, all starting with ‘file’ followed by a combination of numbers and letters. All these files had to be entered into prodPrems.sh, so you can imagine this would take a lot of time.
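But how did I find out there were millions of files? A simple ‘ls -al’ shows the names and sizes of the files.
These commands, however, did not work, because the shell expands ‘file*’ into millions of arguments, which is more than the system allows for a single command: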
[Expert@FWL-EXAMPLE-001:0]# ls -al file*
bash: /bin/ls: Argument list too long
[Expert@FWL-EXAMPLE-001:0]# rm file*
bash: /bin/rm: Argument list too long
The command that did work:
[Expert@FWL-EXAMPLE-001:0]# find /opt/CPshrd-R77/CTX/CTX00001/tmp/ -type f -name 'file*' | wc -l
3583413
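The reason find works where ls and rm fail is that the quoted pattern is matched by find itself, so the shell never has to build one huge argument list. As a minimal sketch (the function name count_matching_files is my own and not part of any Check Point tooling), the count above can be wrapped like this:

```shell
#!/bin/bash
# Hypothetical helper: count regular files whose name matches a pattern.
# The pattern is quoted, so the shell passes it to find unexpanded and the
# "Argument list too long" limit never comes into play.
count_matching_files() {
    local dir="$1" pattern="$2"
    # tr strips the padding that some wc implementations add around the number
    find "$dir" -type f -name "$pattern" | wc -l | tr -d ' '
}

# Example call, using the directory from this post:
#   count_matching_files /opt/CPshrd-R77/CTX/CTX00001/tmp/ 'file*'
```

If you are curious about the limit itself, ‘getconf ARG_MAX’ prints the maximum size in bytes of the arguments and environment for a new process on your system.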
3,583,413 zero-byte files! I contacted TAC via chat and they found that these were temporary files related to the enabled Threat Emulation blade. They said, however, that these files should have been removed automatically. Most of the files were very old, so we knew that whatever process is supposed to remove them was not working.
The only quick way to delete these files with a one-liner was:
[Expert@FWL-EXAMPLE-001:0]# find /opt/CPshrd-R77/CTX/CTX00001/tmp/ -type f -name 'file*' -delete
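If you would rather be more careful than the one-liner above, a variation is to preview the matches first and to delete only files that are both empty and older than a chosen age. This is just a sketch: the function names and the 7-day default are my own assumptions, not something TAC recommended, and -empty and -delete are GNU find options.

```shell
#!/bin/bash
# Hypothetical helper: show up to $3 (default 10) empty files matching a pattern.
preview_empty_files() {
    local dir="$1" pattern="$2" limit="${3:-10}"
    find "$dir" -type f -name "$pattern" -empty -print | head -n "$limit"
}

# Hypothetical helper: delete empty files matching a pattern that were last
# modified more than $3 days ago (default 7, an arbitrary choice).
purge_old_empty_files() {
    local dir="$1" pattern="$2" min_age_days="${3:-7}"
    find "$dir" -type f -name "$pattern" -empty -mtime "+$min_age_days" -delete
}

# Example calls, using the directory from this post:
#   preview_empty_files /opt/CPshrd-R77/CTX/CTX00001/tmp/ 'file*'
#   purge_old_empty_files /opt/CPshrd-R77/CTX/CTX00001/tmp/ 'file*' 7
```

The age filter leaves recently created files alone in case a running process still expects them; the -empty test makes sure nothing with actual content is removed.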
As this was a VSX gateway we checked the other directories (CTX00001…CTX00004) and found even more zero-byte files for the other Virtual Systems.
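Checking every Virtual System by hand gets tedious, so the check can be scripted. The sketch below assumes each context has a tmp directory directly under a CTX* directory, as in the paths in this post; the function name report_ctx_tmp_counts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: for every CTX*/tmp directory under a base directory,
# print the directory and the number of empty files matching a pattern.
report_ctx_tmp_counts() {
    local base="$1" pattern="$2" dir count
    for dir in "$base"/CTX*/tmp; do
        # Skip the unexpanded glob (no CTX directories) and non-directories
        [ -d "$dir" ] || continue
        count=$(find "$dir" -type f -name "$pattern" -empty | wc -l | tr -d ' ')
        printf '%s %s\n' "$dir" "$count"
    done
}

# Example call, using the base directory from this post:
#   report_ctx_tmp_counts /opt/CPshrd-R77/CTX 'file*'
```

The glob here only expands to the handful of CTX directories, so it does not run into the argument-list limit; the millions of files are still counted by find.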
It took a few hours to delete all those temporary zero-byte files. When it was finished we started the Jumbo Hotfix installation again, and this time the status ‘Saving File Permissions’ did not take long to finish. That was just as expected, since the millions of zero-byte files were gone.
At this moment an SR is under investigation at TAC. Three things need to be answered:
- Why aren’t these temporary zero-byte files cleaned up automatically?
- Why does the CPUSE installer fail without any error message?
- Is this a known error? Is a hotfix available?
As soon as we know more I will update this story.
[UPDATE January 31, 2017]
On another VSX gateway a colleague had a similar problem. This time he found a lot of zero-byte files in /opt/CPcvpn-R77/CTX/CTX00001/tmp, but here the file names had the form sess_xxxxxxxx (example: sess_fee6a224e4df20a9a753a06333a3b4e). This particular gateway uses Capsule Workspace, which seems to cause this enormous number of files. Again, this causes Jumbo Hotfix installations to get stuck at ‘Saving File Permissions’. This is also under investigation at TAC.