More than a month ago I published this article about routed crashing on two 1100 appliances when simply entering the command ‘show route’ in CLISH.
It took a very long time to get an answer from R&D, and apparently they could not reproduce the whole thing. They were able to crash routed, but they could not reproduce the failover. After I sent more debug information and R&D investigated further, I was asked to enter two commands in CLISH:
set ospf area 1 on
set ospf area 1 off
This cluster is not using dynamic routing so it’s kind of strange to enter these commands. But hey…I’ve been waiting so long for R&D to come up with something…so why not give it a try?
Funny thing is…this actually solved the issue! When entering ‘show route’ the correct output is now displayed, no pnotes are registered, no member goes down and no failovers occur anymore. After the member was rebooted the problem did not return, so this looks like a permanent fix.
I’ve asked R&D for more background information. My assumption is that those commands overwrote some potentially corrupted routed configuration files or database entries, and that this fixed the routed issue. When the 1100 appliance is rebooted now, we also see routed startup messages in /var/log/messages which weren’t there before we applied this “fix”.
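For anyone who wants to check their own appliance for those messages after a reboot, something like the following could be run from a shell on the appliance. This is only a minimal sketch: the function name routed_log_lines is made up for illustration, and since I don't know the exact wording of the routed startup messages it simply matches the string "routed" (case-insensitive) and shows the last 20 hits.

```shell
# Hypothetical helper (the name is an assumption, not a Check Point tool).
# Prints the last 20 lines mentioning "routed" from the given log file,
# defaulting to /var/log/messages when no file is passed.
routed_log_lines() {
    grep -i 'routed' "${1:-/var/log/messages}" | tail -n 20
}
```

Calling routed_log_lines without arguments reads /var/log/messages; passing a path lets you check a copied or rotated log file instead. No output means no line in that file mentions routed.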
Again, waiting on R&D for the final explanation. To be continued…
UPDATE 7 March 2017:
R&D did not explain the background but informed me this was just a workaround. A fix will be implemented in one of the next HFAs for Gaia Embedded R77.20. They could not promise that the fix would be incorporated in the very next HFA, as this depends on test results.
It seems R77.20.51 was released on February 21st but, as expected, the fix is not included in it. I guess we have to wait for the next HFA.