A few days ago we ran into trouble on one of our QFX switches: the jdhcpd daemon would consume 100% CPU and “crash”, so users no longer got IP addresses.
While TAC is still investigating, I put together a quick workaround for this: the DHCP-Sheriff 😉
#!/bin/bash
# CPU share of jdhcpd as reported by top (column 10, e.g. "0.00%")
current=$(top | grep jdhcpd | awk '{ print $10 }')
# Threshold above which the service is restarted
desired="1.00%"
# Restart if the integer part is larger, or if it is equal and the fractional part is larger
if [ "${current%.*}" -eq "${desired%.*}" ] && [ "${current#*.}" \> "${desired#*.}" ] || [ "${current%.*}" -gt "${desired%.*}" ]; then
    echo "$(date)" >> /var/log/dhcp-sheriff.log
    echo "The current load of dhcp-service is above desired value - restarting the service" >> /var/log/dhcp-sheriff.log
    echo "${current} > ${desired}" >> /var/log/dhcp-sheriff.log
    cli restart dhcp-service
    echo "" >> /var/log/dhcp-sheriff.log
else
    echo "$(date)" >> /var/log/dhcp-sheriff.log
    echo "The current load of dhcp-service is below desired value - no action needed" >> /var/log/dhcp-sheriff.log
    echo "${current} <= ${desired}" >> /var/log/dhcp-sheriff.log
    echo "" >> /var/log/dhcp-sheriff.log
fi
This script restarts the service if its CPU load is above 1% (adjustable via the desired variable). It can easily be adapted to other services and thresholds.
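If you adapt the check to other services or thresholds, the comparison can be pulled out into a small function. This is only a minimal sketch and not part of the script above: the function name load_exceeds is hypothetical, and it assumes that awk is available and that both values look like "1.00%".

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the current CPU share
# is strictly greater than the threshold. Both arguments look like "1.00%".
load_exceeds() {
    current=${1%\%}     # strip the trailing percent sign
    desired=${2%\%}
    # An empty reading (process not listed by top) never counts as "above".
    [ -n "$current" ] || return 1
    # awk compares the two values as numbers, so no string tricks are needed.
    awk -v c="$current" -v d="$desired" 'BEGIN { exit !(c + 0 > d + 0) }'
}

# Example calls with made-up readings
load_exceeds "12.50%" "1.00%" && echo "above threshold"
load_exceeds "0.00%" "1.00%" || echo "below threshold"
```

Comparing the values numerically in awk avoids splitting them into integer and fractional parts, and it also copes with readings that have a different number of decimal places.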
1.) Log in as root and, in the shell, type: vi /var/tmp/dhcp-sheriff.sh
2.) Press “i” and paste the lines above, then press [Esc]. Save and quit with :wq
3.) Make the script executable:
chmod +x /var/tmp/dhcp-sheriff.sh
4.) Add a cron job that runs the script every 8 hours:
crontab -e
0 */8 * * * sh /var/tmp/dhcp-sheriff.sh
5.) Verify that the cron entry is in place:
crontab -l
0 */8 * * * sh /var/tmp/dhcp-sheriff.sh
6.) After the job has run, check the result in the CLI with: show log dhcp-sheriff.log
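To see at a glance how often the sheriff has actually fired, the log can also be summarised from the shell. The following is a rough sketch that assumes the log messages written by the script above. The function name count_restarts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count how many log entries record a restart.
# It relies on the phrase "restarting the service" that the script logs.
count_restarts() {
    logfile=${1:-/var/log/dhcp-sheriff.log}
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "restarting the service" "$logfile"
}

# Only query the default log if it already exists on this box.
if [ -f /var/log/dhcp-sheriff.log ]; then
    count_restarts
fi
```

A count that keeps growing between checks suggests that jdhcpd is still running hot and that the workaround is doing real work.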
Feel free to use this to your advantage. Hopefully it will serve as a workaround in urgent situations until a fix is released.
This is only a workaround: do not rely on it in production for long, and use it at your own risk.
