mdadm: Rebuild20 event detected on md device
Looking at server logs in search of clues about a recent filesystem corruption incident, I stumbled upon the following messages:
Aug 5 01:06:01 kivu mdadm: RebuildStarted event detected on md device /dev/md0
Aug 5 01:43:01 kivu mdadm: Rebuild20 event detected on md device /dev/md0
Aug 5 02:15:01 kivu mdadm: Rebuild40 event detected on md device /dev/md0
Aug 5 02:59:02 kivu mdadm: Rebuild60 event detected on md device /dev/md0
Aug 5 04:33:02 kivu mdadm: Rebuild80 event detected on md device /dev/md0
Aug 5 05:24:33 kivu mdadm: RebuildFinished event detected on md device /dev/md0
We never asked for a manual rebuild of that RAID array, so I started thinking I was on to something interesting. But, ever suspicious of easy leads, I went checking for automated actions. Indeed, it was a false alarm: I found that a Debian cron script packaged with mdadm at /etc/cron.d/mdadm contained the following:
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
# By default, run at 01:06 on every Sunday, but do nothing unless
# the day of the month is less than or equal to 7. Thus, only run on
# the first Sunday of each month. crontab(5) sucks, unfortunately,
# in this regard; therefore this hack (see #380425).
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
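For the curious, the same check can be started or stopped by hand through the script that cron job calls; a quick sketch (run as root; --all and --cancel are options of the Debian checkarray script):

# Start the redundancy check on all arrays right now:
/usr/share/mdadm/checkarray --all

# Cancel a check currently running on a given array:
/usr/share/mdadm/checkarray --cancel /dev/md0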
So there, Google fodder for the poor souls who, like me, will at some point wonder why their RAID array spontaneously rebuilds…
Now why does the periodic redundancy check show up as a rebuild? A more explicit log message would be nice there.
57 responses to “mdadm: Rebuild20 event detected on md device”
Additional information from /usr/share/doc/mdadm/README.checkarray:
checkarray will run parity checks across all your redundant arrays. By default, it is configured to run on the first Sunday of each month, at 01:06 in the morning. This is realised by asking cron to wake up every Sunday with /etc/cron.d/mdadm, but then only running the script when the day of the month is less than or equal to 7. See #380425.
‘check’ is a read-only operation, even though the kernel logs may suggest otherwise (e.g. /proc/mdstat and several kernel messages will mention “resync”).
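Under the hood, checkarray essentially pokes the kernel’s md sysfs interface; a rough sketch of the raw equivalent, assuming an array named md0 (as root):

# Start a read-only consistency check:
echo check > /sys/block/md0/md/sync_action

# Watch progress (older kernels label it “resync” here):
cat /proc/mdstat

# Abort the check:
echo idle > /sys/block/md0/md/sync_action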
Additional information from /usr/share/doc/mdadm/FAQ.gz:
21. Why does the kernel speak of ‘resync’ when using checkarray
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please see README.checkarray and http://www.mail-archive.com/linux-raid@vger.kernel.org/msg04835.html.
In short: it’s a bug. checkarray is actually not a resync, but the kernel does not distinguish between them.
I noticed the very same “error” message on one of my servers (the others use hardware RAID arrays). I thought that one of my drives was near its end of life! Until I found this post! And guess what… It’s the first Sunday of the month :) So thanks for sharing this info!
Same reaction as for Hugo. Thanx for the info.
Haha, thank god I found this page, otherwise I would have missed church while I pulled my hair out, haha.
Thanks
Thanks for the post, and kudos on your foresight and compassion for googlers.
This Google query “checkarray RebuildStarted event detected on md device” worked nicely.
Omg… First Sunday of the month again :D
I can only join the others in thanking you for publishing this. I too thought something was seriously wrong with a 5.6T RAID5 array, and finding this means I can once again enjoy my Sunday. Thanks!
Note to self: Get hot-spare for /home
:) Yep. /me preemptively waves to Sunday, May 4th, 2008 viewers.
Thanks. Luckily for me, your post was the first Google hit I got, so I can keep the hair on my head.
Ah. I see. *That’s* what it’s about. Hello from Sunday, May 4th, 2008 ;-)
Indeed, thank you on this Sunday, June 1st, 2008.
I was just playing around with the webserver config when I noticed a bit of slowdown. “top” revealed an “md2_resync” process, confirmed by “RebuildStarted event detected” in syslog.
This is a brand-new server and my first RAID, so I almost started to panic. Luckily, I found this page, restoring my faith. :)
/metoo
This wasn’t the first time either that the incessant crunching seek noises from the heads woke me up, my “server” being on a shelf not 3 meters from my bed. Thanks for getting me at least a couple of hours of untroubled sleep!
We should all meet in Jean-Marc’s hometown on the one year anniversary of the original post :-)
Front: “I too woke up Sunday to find my soft RAID rebuilding and all I got was this lousy t-shirt!”
Back: “Thanks Jean-Marc for your Serendipitous Altruism.”
(incredibly fitting name for the blog, BTW. Too funny.)
Thanks! Same as above :)
Long live this thread…
Hello at the 1st Sunday in July!
Thanks very much, Google and Jean-Marc. Just received the system event logcheck mail:
Jul 6 01:34:03 DHC001 mdadm: Rebuild20 event detected on md device /dev/md2
Thanks to this post, my heart rate is back to normal.
PS: I want this t-shirt too! ;)
Greetings from the first Sunday of July 2008, may there be many Sundays to come! Only one more month till this will be a one-year-old post ^_^
I’ve got 5x1TB in RAID5, so seeing a rebuild is not funny at all. Thank god I can go to bed without worry. My data will (should) still be there in the morning.
Thank you, excellent “Google fodder”, solved my query at once. My 500GB 5-disk array (yeah, nice and old) broke recently, and I foolishly set it rebuilding, not realising it would take the better part of a weekend to fix itself.
I think this page gets hits on the first Sunday of each month; check your traffic logs, it would be interesting to graph them :-)… It’s the first Sunday of the month, and I’m here, and guess what, my RAID array has just rebuilt!!
thanks!!
Hello from 2009! :D
Nothing more to add…
Hehe, had a rebuild recently here as well.
Regards,
Jørgen.
A question about this issue:
We are running 2 servers with software RAID 1, and during the check the load gets very high and services seem to fail (e.g. ssh).
The hardware is new and identical: dual Xeon with 2GB RAM, two WD1000FYPS drives. I don’t know why, but can’t the check safely be turned off? I don’t know how important it really is, but it bogs down my system…
Thanks!
@Just-a-question: This check is done for your peace of mind. I myself love the fact that my array is being checked at least once a month. Either move it to a different time when your server isn’t busy, or remove the /etc/cron.d/mdadm script altogether.
Though I would check with hdparm to see whether you have a bottleneck somewhere. Maybe something is sharing IRQs with your controller? Or maybe your disks are set to legacy IDE instead of native SATA?
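For example, to keep the check but move it to a quieter window, edit the schedule fields in /etc/cron.d/mdadm; a sketch (04:30 on the first Saturday is purely an example), plus a quick throughput sanity check with hdparm:

# /etc/cron.d/mdadm -- moved from 01:06 Sunday to 04:30 Saturday
30 4 * * 6 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet

# Raw read throughput of the member disks (buffered and cached):
hdparm -tT /dev/sda /dev/sdb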
Was just a victim of that hack…
Hello from european timezone !
Good night ! :)
Ubuntu Hardy, Intrepid and Jaunty will come up with a more descriptive dmesg message:
[34887.347576] md: data-check of RAID array md0
Also, /proc/mdstat now mentions ‘check’ instead of ‘resync’ :)
[===========>.........] check = 57.7% (141069696/244198400) finish=110.4min speed=15558K/sec
FWIW Ubuntu (Intrepid) /var/log/syslog still refers to it as a “Rebuild20”, even though dmesg now gets it right.
Hello from April 5th 2009, and thanks :-)
Just go into /etc/cron.d/mdadm, comment out the checkarray run line, then chattr +i the file so it never gets changed back (but still keeps dpkg happy).
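Spelled out, that recipe looks something like this, run as root (the sed pattern assumes the stock Debian cron line quoted at the top of this post):

# Comment out the checkarray line in the stock cron file:
sed -i 's|^6 1 \* \* 0|#&|' /etc/cron.d/mdadm

# Make the file immutable so upgrades cannot quietly restore it:
chattr +i /etc/cron.d/mdadm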
It’s insane to do a thing like this on a production server when there is no indication of RAID issues; that’s what SMART monitoring is for. The first time I ran across this I was furious… it takes days to “check” multi-terabyte RAID5 and RAID6 arrays, and it _significantly_ impacts the production workload. The “resync” message threw me off and made me believe a drive or the controller had suffered some significant error event… anyway, long story short, somebody owes me a couple of days of my life back.
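If turning the check off entirely feels too drastic, the kernel’s md rebuild speed limits can at least keep a scrub from starving production I/O; a sketch (values are in KiB/s per device, and 10000 is just an example cap):

# Cap the background check/resync speed system-wide:
sysctl -w dev.raid.speed_limit_max=10000

# The matching floor the kernel always tries to maintain (default 1000):
sysctl -w dev.raid.speed_limit_min=200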
In the spirit of getting woken up in the middle of the night by overly eager monitoring software: here’s a tip of the hat from Sunday, 7th June ’09. :-)
I have just had this happen on my Fedora 11 box. That box does not have the mdadm cron job however, or the /usr/share/mdadm/checkarray script.
All symptoms are the same:
md1 : active raid1 sdb3[0] sda3[1]
974446592 blocks [2/2] [UU]
[==================>..] check = 92.9% (905802048/974446592) finish=45.0min speed=25419K/sec
It is Thursday morning, not Sunday, and this machine has been on since Sunday.
Anybody know what could be the situation in my case ? Feel free to mail me at thomas (at) apestaart (dot) org
Came across this thread while looking for the meaning of Rebuild20 and friends (20, 40, 60, 80). Turns out (as you can see from the sequence) they are progress indicators.
I did want to address one poster above who says running checkarray is not necessary. It’s a REALLY good idea to do it. It’s what’s known as a RAID scrub. Bits on the disk can go bad and you’ll never know until you attempt to read the bad bit. Running a scrub gives both your disk a chance to reallocate bad sectors and mdadm a chance to restore the data should a read error occur. So to prevent silent bit rot, you need to occasionally read all of the data on your disk(s).
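To see whether a completed scrub actually found anything, the kernel keeps a per-array counter; a sketch, again assuming an array named md0:

# Sectors found inconsistent by the last check:
cat /sys/block/md0/md/mismatch_cnt

# On a redundant array, a repair pass rewrites the inconsistent stripes:
echo repair > /sys/block/md0/md/sync_action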
Just got bit by this myself. In my opinion it’s not suitable for production systems to do this automatically. At least ask the user whether they want it turned on.
My main database runs on an Areca hardware card that autoscrubs a 16-disk RAID-10 but doesn’t get in the way. I have several replication slaves that have just a pair of 1TB 7200RPM SATA drives, because they are read-only and don’t need massive I/O performance. They work just fine. Until the first Sunday of the month. I don’t care about bit rot; the data is all reproducible from the master anytime I need it. I can monitor the drives with SMART and replace them as needed, easily and cheaply.
However, a batch job that takes 6 hours ran for 12 hours last night, and every customer I had was bitching about how slow the system was. In general there’s a lot of smart stuff done in mdadm. This is NOT one of those things.
Thanks for this post! You saved me a lot of time!
Another Sunday (well, in UTC), another thank you for this post =)
Damn… first Sunday :D Cheers!
[x] Add me.
Greetings from 4th of July 2010 (Sunday) ;)
Hi from 1st August 2010 :)
Greetings from 5th Sept 2010 (Sunday) ;)
Now I know why Nagios was whining about high server load. Surprised I hadn’t noticed this before now.
Thanks for the post!!!
Greetings from October 3rd, 2010!
Saved me my morning! Thanx!
So it’s the 7th of November, and today I was surprised to find my message has changed! (It actually changed before last month’s scan too, but I didn’t notice!) I’m running Debian unstable and I now get the message:
“checkarray: I: selecting idle I/O scheduling class for resync”
Hopefully now people getting this new message will still be able to find our little club once the Google bots have had a read!
-Haz
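For the curious: that message means checkarray now drops the kernel’s resync thread into the idle I/O scheduling class, roughly equivalent to doing this by hand (the md0_resync thread name is just an example for array md0):

# Give the resync thread idle I/O priority so normal I/O wins:
ionice -c3 -p "$(pgrep -x md0_resync)"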
Same thing happened to me! I decided to run some hdparm tests to see how fast a new 2TBx8 array was; it had just finished resyncing the day prior, and when I found it was kind of slow, I checked the status and found it resyncing! This was a good post; now I know my new drives aren’t bad (yet!)
Thanks again!
I wonder how many hours of sleep have been lost to this deceptive mdadm process. SABnzbd is idle but showing a raised load average. Hmm…
mdadm --detail /dev/mdX
State : clean, recovering
Rebuild status : 92% complete
Urgh. Is a six-month-old WD RE3 on the way out? :(
Check the syslog, google ‘mdadm Rebuild20 event’…
Ahh! Merci mille fois, Jean-Marc!
Waves hello from Sunday May 1st 2011!
Now I can go back to watching Doctor Who in peace. :)
And thank you & hello from 5th Jun 2011 (obviously 1st Sunday of month). I got worried after seeing “Rebuild68 event detected on md device /dev/md/1” flying among other syslog messages while monitoring my systems.
The OP could publish some statistics: how many people are bitten by this incorrect syslog message every 1st Sunday? This really sucks. It has been several years and nobody has changed the message to “Scrub (68% complete) event detected on md device /dev/md/1”. I got bitten by Ubuntu 11.04 (maverick).
Three years and still going. More thanks from me on 5th June 2011.
Good post, thank you for the information. As a major Linux noob I was worried that I had accidentally forked my 4TB RAID 5 array. Great ‘Google fodder’ indeed.
Funny how many of the replies were posted on a Sunday :)
Sunday here too and logwatch tells me
mdadm[1604]: Rebuild40 event detected on md device /dev/md0
mdadm[1604]: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 768
Thanks.
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.10
DISTRIB_CODENAME=maverick
DISTRIB_DESCRIPTION="Ubuntu 10.10"
Goddamn! I’m running a 30TB RAID-6 JFS array (12 x 3TB), and I thought I was going crazy… It froze every weekend!
After finally poring through syslog, I found that the last message before the hard reset was “Rebuild42 event detected” and traced it back to this web page…
I totally agree that running a check every Sunday (as it states in my /etc/cron.d/mdadm) is just stupid! 30TB? Come on! It took two days just to build the array…
Hopefully this will resolve the stability issue. Thanks guys!
Thanks September 4th 2011
still an issue… 10.06.12
I just rebuilt this workstation from a VERY OLD Ubuntu to Linux Mint 15, and was hacking around at 1am, wondering why my CPU load was so high. Poking around my logs… PANIC! Search. Find this page. Unpanic.
Hello from December 1st, 2013! One year into the mayacalypse and still paranoid.
And another thanks. Have run an array for quite some time, and this was the first time I noticed this behaviour!
Another thanks.
07.06.2015
Thanks! You saved me some time!
Another 1st Sunday thanks.
And another 1st Sunday :-)
Big thanks!