Attack of the undead /USR/SBIN/CRON
I think I may just about (inch’Allah!) have gotten rid of (“solved” would probably be too strong a word for such a haphazard process; I am definitely unsure…) a very vexing problem that kept me wondering for weeks.
On an otherwise very healthy Debian Linux host, I saw idle /USR/SBIN/CRON processes accumulate by the hundreds, a few every few minutes, eventually inducing significant load, although some of them did die on their own. Killing them was only a temporary remedy, as they kept reappearing. I could not link their appearance to specific cron jobs, nor to a specific command. Hours of sifting through forums and mailing lists yielded nothing conclusive: /USR/SBIN/CRON processes that never terminate are not unheard of, but their causes seemed varied and most often quite mysterious.
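To track an accumulation like this over time, one could periodically count the upper-case /USR/SBIN/CRON children that have been alive suspiciously long. Below is a minimal Perl sketch under stated assumptions: the helper name `undead_cron_pids` and the one-hour threshold are illustrative, and the `etimes` field (elapsed seconds) assumes a procps version recent enough to support it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given lines of `ps -C cron -o pid=,etimes=,args=`
# output, return the PIDs of /USR/SBIN/CRON children that have been alive
# for at least $min_age seconds. The lower-case /usr/sbin/cron daemon
# never matches because the pattern is case-sensitive.
sub undead_cron_pids {
    my ($min_age, @lines) = @_;
    my @pids;
    for my $line (@lines) {
        # Fields: PID, elapsed seconds, then the full command line.
        next unless $line =~ m{^\s*(\d+)\s+(\d+)\s+/USR/SBIN/CRON\b};
        push @pids, $1 if $2 >= $min_age;
    }
    return @pids;
}

# Print a timestamp and the number of children older than one hour
# (an arbitrary threshold); run it periodically to watch the trend.
my @ps = qx(ps -C cron -o pid=,etimes=,args= 2>/dev/null);
my @undead = undead_cron_pids(3600, @ps);
printf "%s %d\n", scalar(localtime), scalar(@undead);
```

Logging this line every few minutes should make it easy to tell whether the count grows steadily or comes in bursts.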
Liberal use of strace, with various combinations of ‘-p’, ‘-f’, ‘-F’ and ‘-ff’ to attach to the running cron daemon and follow its vforks, showed that the undead processes were left listening on an open connection. I also observed that an attached strace inhibited the /USR/SBIN/CRON spawning: with strace present, the children did receive their missing SIGSTOP. And sometimes days went by without any sign of the dreaded processes, but as soon as I thought the problem was solved, they began to reappear…
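For reference, here is a hedged Perl sketch of one such strace invocation. It assumes the cron daemon's PID is passed as the first argument (for example from `pgrep -o -x cron`, which picks the oldest process named exactly cron, normally the daemon itself) and writes traces under the hypothetical prefix /tmp/cron-trace. `-F` is left out because recent strace releases document it as obsolete and equivalent to `-f`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an strace command line that attaches to a
# running process (-p PID) and, with -ff plus -o PREFIX, follows every
# child it forks, writing each process's trace to its own PREFIX.PID file.
sub strace_cmd {
    my ($pid, $prefix) = @_;
    die "missing PID\n" unless defined $pid;
    die "not a PID: $pid\n" unless $pid =~ /^\d+$/;
    return ('strace', '-ff', '-o', $prefix, '-p', $pid);
}

# Only act when a PID is given on the command line.
if (@ARGV) {
    exec(strace_cmd($ARGV[0], '/tmp/cron-trace'))
        or die "cannot run strace: $!\n";
}
```

Run as root, e.g. `perl trace-cron.pl "$(pgrep -o -x cron)"` (the script name is hypothetical); each forked /USR/SBIN/CRON child then gets its own trace file, which makes it easier to spot the one that never exits.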
Anyway, finding that the undead processes were left listening on an open connection was the scent I had been looking for. ‘netstat -p | grep tcp | grep CRON’ soon showed me that each of them had an open connection to the local LDAP server. Then ‘lsof | grep cron | grep ldap’ hinted that it was not cron itself that connected to the LDAP server, but an underlying library used by our PAM LDAP user management system.
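The netstat step can also be scripted. This is a minimal Perl sketch, assuming the LDAP server listens on the standard TCP port 389 (connections over ldaps or a local ldapi socket would not show up here) and that netstat runs as root so the PID/Program column is filled in. It uses `netstat -tnp` rather than `-p` alone so that only TCP sockets are listed and ports stay numeric; `cron_ldap_pids` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the LDAP server listens on the standard TCP port 389.
my $LDAP_PORT = 389;

# Hypothetical helper: given lines of `netstat -tnp` output (columns:
# Proto Recv-Q Send-Q Local Foreign State PID/Program), return the PIDs of
# CRON processes whose foreign address is on $port. The upper-case match
# skips the lower-case cron daemon itself.
sub cron_ldap_pids {
    my ($port, @lines) = @_;
    my @pids;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f >= 7 && $f[0] =~ /^tcp/;    # skip header lines
        my ($foreign, $prog) = @f[4, 6];
        next unless $foreign =~ /:(\d+)$/ && $1 == $port;
        push @pids, $1 if $prog =~ m{^(\d+)/\S*CRON$};
    }
    return @pids;
}

# netstat needs root to fill in the PID/Program column for other users.
my @pids = cron_ldap_pids($LDAP_PORT, qx(netstat -tnp 2>/dev/null));
print scalar(@pids), " CRON process(es) connected to LDAP: @pids\n";
```

If this count tracks the number of stuck /USR/SBIN/CRON children, that supports the LDAP connection as the common factor.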
Armed with these new results, I went hunting for more data and found a discussion between Robert Rakowicz and Jerome Reinert about a somewhat similar problem. However, the maintenance operations Jerome Reinert suggested on slapd’s Berkeley DB database did not solve the problem.
I have since read another post mentioning that version mismatches and assorted maintenance issues in slapd’s Berkeley DB database can cause a similar problem. I can’t find the address anymore, but if I do, I’ll post it here. We found that a simple slapd restart rid us of the undead /USR/SBIN/CRON. It has been a few days and I have not seen one since… We keep our fingers crossed; maybe an upgrade silently fixed the problem…
Meanwhile, I posted this to debian-user, just in case someone there recognizes the problem…
Since then, the problem has appeared again, and restarting slapd fixed it temporarily. I am using slapd 2.2.26 from Debian. Maybe I should upgrade to 2.3.23: although it is available through Debian Unstable, it was released upstream a year ago, so maybe I can trust it…
8 responses to “Attack of the undead /USR/SBIN/CRON”
I face a similar issue with SLES 8 and RHEL 3.
It shows up only on machines that have crontab entries for user XYZ and for root (even if root’s crontab is empty).
The parent of every “/USR/SBIN/CRON” is “/usr/sbin/cron”; it seems to be the same program, only with a lower-case name.
The only solution I found is:
Stop the cron daemon.
Kill all cron processes.
Start the cron daemon.
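Those three steps could be wrapped in a small Perl script. This is only a sketch: it assumes a SysV init script at /etc/init.d/cron (on Red Hat the service is called crond) and uses `pkill -x cron`, which matches the process name exactly and therefore also hits the /USR/SBIN/CRON children left behind once the daemon is stopped. By default it only prints the commands; `--really` is a hypothetical switch to execute them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: SysV init script location; adjust for your distribution.
my $INIT = '/etc/init.d/cron';

# Return the three steps described above as command lists.
sub cron_reset_steps {
    my ($init) = @_;
    return (
        [$init, 'stop'],                 # stop the cron daemon
        ['pkill', '-9', '-x', 'cron'],   # kill all remaining cron processes
        [$init, 'start'],                # start the cron daemon again
    );
}

# Dry run by default: print the commands. Pass --really to execute them.
my $really = @ARGV && $ARGV[0] eq '--really';
for my $step (cron_reset_steps($INIT)) {
    if ($really) {
        # pkill exits with status 1 when nothing matched; that is harmless.
        system(@$step);
    } else {
        print "@$step\n";
    }
}
```

Keeping the dry run as the default makes it easy to check the exact commands on a new host before letting the script kill anything.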
For now, I preemptively restart slapd from cron a few times a day, which takes care of the problem when it still occurs, about once every few weeks. It does the job, but I don’t like relying on such an unclean hack for a problem I don’t understand…
Wow, there are references to this problem going back to 2001, and still few if any common explanations.
In my case, I’ve been running cron jobs without incident for about 5 years (Red Hat, and several versions of Ubuntu). Recently, the spurious process has begun to appear in between scheduled cron jobs, either the user’s or root’s, at a rate of twice per hour. Once it is present, any regularly scheduled cron job adds further such zombie/sleeping/uninterruptible processes. Bizarre.
Anyway, I don’t run slapd on my system, so I end up killing the spurious processes directly from another cron job like this:
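use strict;
use warnings;
# List every process whose command name is "cron", showing its full
# command line and PID (the "=" suppresses the PID column header).
my @lines = qx(ps -C cron -o args,pid=);
foreach my $line (@lines) {
    # Only the forked children are shown as upper-case /USR/SBIN/CRON;
    # the lower-case /usr/sbin/cron daemon never matches. Beware: this
    # also kills children that are still running legitimate jobs.
    if ($line =~ m/CRON.*?([0-9]+)\s*$/) {
        system('sudo', '-S', 'kill', '-9', $1);
    }
}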
I am not a programmer: the ps command comes from an example in the man page, and the sudo -S idea comes from the perl.beginners list on perl.org. Though it is a hack, as noted above, I still appreciate how concisely such a messy hack job can be implemented in Perl/Unix. It doesn’t seem necessary to stop and start the daemon (which, as noted, is the lower-case process, at least on Ubuntu).
[…] reinstallation, which did not bring any improvement this time. The first Google tips regarding PAM and SLAP didn’t help me much, and neither lsof nor the syslog showed anything useful. Even strace […]
Hi, I had exactly the same problem on one of my servers, an old Debian 3: every cron job would spawn a /USR/SBIN/CRON fork that (almost) never died! And I’ve got munin-node running every 5 minutes… It was driving me (and the load average) crazy until I found a solution: an apt-get update and upgrade! I don’t know if it will fix it for everyone, but it worked for me.
Dude, the apt-get update worked
… I recommend this to all of you!
I’m glad it worked ;)
Let’s see if someone else finds that useful as well!!