mdadm: Rebuild20 event detected on md device
Looking at server logs in search of clues about a recent filesystem corruption incident, I stumbled upon the following messages:
Aug 5 01:06:01 kivu mdadm: RebuildStarted event detected on md device /dev/md0
Aug 5 01:43:01 kivu mdadm: Rebuild20 event detected on md device /dev/md0
Aug 5 02:15:01 kivu mdadm: Rebuild40 event detected on md device /dev/md0
Aug 5 02:59:02 kivu mdadm: Rebuild60 event detected on md device /dev/md0
Aug 5 04:33:02 kivu mdadm: Rebuild80 event detected on md device /dev/md0
Aug 5 05:24:33 kivu mdadm: RebuildFinished event detected on md device /dev/md0
We never asked for a manual rebuild of that RAID array, so I started thinking I was on to something interesting. But, ever suspicious of easy leads, I went checking for automated actions. Indeed, it was a false alarm: I found that a Debian cron script packaged with mdadm at /etc/cron.d/mdadm contained the following:
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
# By default, run at 01:06 on every Sunday, but do nothing unless
# the day of the month is less than or equal to 7. Thus, only run on
# the first Sunday of each month. crontab(5) sucks, unfortunately,
# in this regard; therefore this hack (see #380425).
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
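For the curious, the same check can be started or stopped by hand through the script that cron job calls; a quick sketch (run as root; --all and --cancel are options of the Debian checkarray script):

# Start the redundancy check on all arrays right now:
/usr/share/mdadm/checkarray --all

# Cancel a check currently running on a given array:
/usr/share/mdadm/checkarray --cancel /dev/md0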
So there, Google fodder for the poor souls who, like me, will at some point wonder why their RAID array spontaneously rebuilds…
Now why does the periodic redundancy check show up as a rebuild? A more explicit log message would be nice there.
57 responses to “mdadm: Rebuild20 event detected on md device”
Additional information from /usr/share/doc/mdadm/README.checkarray:
checkarray will run parity checks across all your redundant arrays. By default, it is configured to run on the first Sunday of each month, at 01:06 in the morning. This is realised by asking cron to wake up every Sunday with /etc/cron.d/mdadm, but then only running the script when the day of the month is less than or equal to 7. See #380425.
‘check’ is a read-only operation, even though the kernel logs may suggest otherwise (e.g. /proc/mdstat and several kernel messages will mention “resync”).
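Under the hood, checkarray essentially pokes the kernel’s md sysfs interface; a rough sketch of the raw equivalent, assuming an array named md0 (as root):

# Start a read-only consistency check:
echo check > /sys/block/md0/md/sync_action

# Watch progress (older kernels label it “resync” here):
cat /proc/mdstat

# Abort the check:
echo idle > /sys/block/md0/md/sync_action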
Additional information from /usr/share/doc/mdadm/FAQ.gz:
21. Why does the kernel speak of ‘resync’ when using checkarray
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please see README.checkarray and http://www.mail-archive.com/linux-raid@vger.kernel.org/msg04835.html.
In short: it’s a bug. checkarray is actually not a resync, but the kernel does not distinguish between them.
I noticed the very same “error” message on one of my servers (the others use hardware RAID arrays). I thought that one of my drives was near its end of life! Until I found this post! And guess what… It’s the first Sunday of the month :) So thanks for sharing this info!
Same reaction as for Hugo. Thanx for the info.
Haha, thank god I found this page, otherwise I would have missed church while I pulled my hair out, haha.
Thanks
Thanks for the post, and kudos on your foresight and compassion for googlers.
This Google query “checkarray RebuildStarted event detected on md device” worked nicely.
Omg… First Sunday of the month again :D
I can only join the others in thanking you for publishing this. I too thought something was seriously wrong with a 5.6T RAID5 array, and finding this means I can once again enjoy my Sunday. Thanks!
Note to self: Get hot-spare for /home
:) Yep. /me preemptively waves to Sunday, May 4th, 2008 viewers.
Thanks. Luckily for me, your post was the first Google hit I got, so I can keep the hair on my head.
Ah. I see. *That’s* what it’s about. Hello from Sunday, May 4th, 2008 ;-)
Indeed, thank you on this Sunday, June 1st, 2008.
I was just playing around with the webserver config when I noticed a bit of slowdown. “top” revealed an “md2_resync” process, confirmed by “RebuildStarted event detected” in syslog.
This is a brand-new server and my first RAID, so I almost started to panic. Luckily, I found this page, restoring my faith. :)
/metoo
This wasn’t the first time either that the incessant crunching seek noises from the heads woke me up, my “server” being on a shelf not 3 meters from my bed. Thanks for getting me at least a couple of hours of untroubled sleep!
We should all meet in Jean-Marc’s hometown on the one year anniversary of the original post :-)
Front: “I too woke up Sunday to find my soft RAID rebuilding and all I got was this lousy t-shirt!”
Back: “Thanks Jean-Marc for your Serendipitous Altruism.”
(incredibly fitting name for the blog, BTW. Too funny.)
Thanks! Same as above :)
Long live this thread…
Hello at the 1st Sunday in July!
Thanks very much, Google and Jean-Marc. Just received the system event logcheck mail:
Jul 6 01:34:03 DHC001 mdadm: Rebuild20 event detected on md device /dev/md2
Thanks to this post, my heart rate is back to normal.
PS: I want this t-shirt too! ;)
Greetings from the first Sunday of July 2008, may there be many Sundays to come! Only one more month till this will be a one-year-old post ^_^
I’ve got 5x1TB in RAID5, so seeing a rebuild is not funny at all. Thank god I can go to bed without worry. My data will (should) still be there in the morning.
Thank you, excellent “Google fodder”, solved my query at once. My 500GB 5-disk array (yeah, nice and old) broke recently, and I foolishly set it rebuilding, not realising it would take the better part of a weekend to fix itself.
I think this page gets hits on the first Sunday of each month; check your traffic logs, it would be interesting to graph them :-)… It’s the first Sunday of the month, and I’m here, and guess what, my RAID array has just rebuilt!!
thanks!!
Hello from 2009! :D
Nothing more to add…
Hehe, had a rebuild recently here as well.
Regards,
Jørgen.
A question about this issue:
We are running 2 servers with software RAID 1, and during the check the load gets very high and services seem to fail (e.g. ssh).
The hardware is new and identical: dual Xeon with 2GB RAM, two WD1000FYPS drives. I don’t know why, but can’t the check safely be turned off? I don’t know how important it really is, but it bogs down my system…
Thanks!
@Just-a-question: This check is done for your peace of mind. I myself love the fact that my array is being checked at least once a month. Either move it to a different time when your server isn’t busy, or remove the /etc/cron.d/mdadm script altogether.
Though I would check with hdparm to see whether you have a bottleneck somewhere. Maybe something is sharing IRQs with your controller? Or maybe your disks are set to legacy IDE instead of native SATA?
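For example, to keep the check but move it to a quieter window, edit the schedule fields in /etc/cron.d/mdadm; a sketch (04:30 on the first Saturday is purely an example), plus a quick throughput sanity check with hdparm:

# /etc/cron.d/mdadm -- moved from 01:06 Sunday to 04:30 Saturday
30 4 * * 6 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet

# Raw read throughput of the member disks (buffered and cached):
hdparm -tT /dev/sda /dev/sdb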
Was just a victim of that hack…
Hello from european timezone !
Good night ! :)
Ubuntu Hardy, Intrepid and Jaunty will come up with a more descriptive dmesg message:
[34887.347576] md: data-check of RAID array md0
Also, /proc/mdstat now mentions ‘check’ instead of ‘resync’ :)
[===========>.........] check = 57.7% (141069696/244198400) finish=110.4min speed=15558K/sec
FWIW Ubuntu (Intrepid) /var/log/syslog still refers to it as a “Rebuild20”, even though dmesg now gets it right.
Hello from April 5th 2009, and thanks :-)
Just go into /etc/cron.d/mdadm, comment out the checkarray run line, then chattr +i the file so it never gets changed back (but still keeps dpkg happy).
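Spelled out, that recipe looks something like this, run as root (the sed pattern assumes the stock Debian cron line quoted at the top of this post):

# Comment out the checkarray line in the stock cron file:
sed -i 's|^6 1 \* \* 0|#&|' /etc/cron.d/mdadm

# Make the file immutable so upgrades cannot quietly restore it:
chattr +i /etc/cron.d/mdadm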
It’s insane to do a thing like this on a production server when there is no indication of RAID issues; that’s what SMART monitoring is for. The first time I ran across this I was furious… it takes days to “check” multi-terabyte RAID5 and RAID6 arrays, and it _significantly_ impacts the production workload. The “resync” message threw me off and made me believe a drive or the controller had suffered some significant error event… anyway, long story short, somebody owes me a couple of days of my life back.
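If turning the check off entirely feels too drastic, the kernel’s md rebuild speed limits can at least keep a scrub from starving production I/O; a sketch (values are in KiB/s per device, and 10000 is just an example cap):

# Cap the background check/resync speed system-wide:
sysctl -w dev.raid.speed_limit_max=10000

# The matching floor the kernel always tries to maintain (default 1000):
sysctl -w dev.raid.speed_limit_min=200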
In the spirit of getting woken up in the middle of the night by overly eager monitoring software: here’s a tip of the hat from Sunday, 7th June ’09. :-)
I have just had this happen on my Fedora 11 box. That box does not have the mdadm cron job however, or the /usr/share/mdadm/checkarray script.
All symptoms are the same:
md1 : active raid1 sdb3[0] sda3[1]
974446592 blocks [2/2] [UU]
[==================>..] check = 92.9% (905802048/974446592) finish=45.0min speed=25419K/sec
It is Thursday morning, not Sunday, and this machine has been on since Sunday.
Anybody know what could be the situation in my case ? Feel free to mail me at thomas (at) apestaart (dot) org
Came across this thread while looking for the meaning of Rebuild20 and friends (20, 40, 60, 80). Turns out (as you can see from the sequence) they are progress indicators.
I did want to address one poster above who says running checkarray is not necessary. It’s a REALLY good idea to do it. It’s what’s known as a RAID scrub. Bits on the disk can go bad and you’ll never know until you attempt to read the bad bit. Running a scrub gives both your disk a chance to reallocate bad sectors and mdadm a chance to restore the data should a read error occur. So to prevent silent bit rot, you need to occasionally read all of the data on your disk(s).
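To see whether a completed scrub actually found anything, the kernel keeps a per-array counter; a sketch, again assuming an array named md0:

# Sectors found inconsistent by the last check:
cat /sys/block/md0/md/mismatch_cnt

# On a redundant array, a repair pass rewrites the inconsistent stripes:
echo repair > /sys/block/md0/md/sync_action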
Just got bit by this myself. In my opinion it’s not suitable for production systems to do this automatically. At least ask the user whether they want it turned on.
My main database runs on an Areca hardware card that autoscrubs a 16-disk RAID-10 but doesn’t get in the way. I have several replication slaves that have just a pair of 1TB 7200RPM SATA drives, because they are read-only and don’t need massive I/O performance. They work just fine. Until the first Sunday of the month. I don’t care about bit rot; the data is all reproducible from the master anytime I need it. I can monitor the drives with SMART and replace them as needed, easily and cheaply.
However, a batch job that takes 6 hours ran for 12 hours last night, and every customer I had was bitching about how slow the system was. In general there’s a lot of smart stuff done in mdadm. This is NOT one of those things.
Thanks for this post! You saved me a lot of time!
Another Sunday (well, in UTC), another thank you for this post =)
Damn… first Sunday :D Cheers!
[x] Add me.
Greetings from 4th of July 2010 (Sunday) ;)
Hi from 1st August 2010 :)
Greetings from 5th Sept 2010 (Sunday) ;)
Now I know why Nagios was whining about high server load. Surprised I hadn’t noticed this before now.
Thanks for the post!!!
Greetings from October 3rd, 2010!
Saved me my morning! Thanx!
So it’s the 7th of November, and today I was surprised to find my message has changed! (It actually changed before last month’s scan too, but I didn’t notice!) I’m running Debian unstable and I now get the message:
“checkarray: I: selecting idle I/O scheduling class for resync”
Hopefully now people getting this new message will still be able to find our little club once the Google bots have had a read!
-Haz
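For the curious: that message means checkarray now drops the kernel’s resync thread into the idle I/O scheduling class, roughly equivalent to doing this by hand (the md0_resync thread name is just an example for array md0):

# Give the resync thread idle I/O priority so normal I/O wins:
ionice -c3 -p "$(pgrep -x md0_resync)"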
Same thing happened to me! I decided to run some hdparm tests to see how fast a new 2TBx8 array was; it had just finished resyncing the day prior, and when I found it was kind of slow, I checked the status and found it resyncing! This was a good post; now I know my new drives aren’t bad (yet!)
Thanks again!
I wonder how many hours of sleep have been lost to this deceptive mdadm process. SABnzbd is idle but showing a raised load average. Hmm…
mdadm --detail /dev/mdX
State : clean, recovering
Rebuild status : 92% complete
Urgh. Is a six-month-old WD RE3 on the way out? :(
Check the syslog, google ‘mdadm Rebuild20 event’…
Ahh! Merci mille fois, Jean-Marc!
Waves hello from Sunday May 1st 2011!
Now I can go back to watching Doctor Who in peace. :)
And thank you & hello from 5th Jun 2011 (obviously 1st Sunday of month). I got worried after seeing “Rebuild68 event detected on md device /dev/md/1” flying among other syslog messages while monitoring my systems.
The OP could publish some statistics: how many people are bitten by this incorrect syslog message every 1st Sunday? This really sucks. It has been several years and nobody has changed the message to “Scrub (68% complete) event detected on md device /dev/md/1”. I got bitten by Ubuntu 11.04 (maverick).
Three years and still going. More thanks from me on 5th June 2011.
Good post, thank you for the information. As a major Linux noob I was worried that I had accidentally forked my 4TB RAID 5 array. Great ‘Google fodder’ indeed.
Funny how many of the replies were posted on a Sunday :)
Sunday here too and logwatch tells me
mdadm[1604]: Rebuild40 event detected on md device /dev/md0
mdadm[1604]: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 768
Thanks.
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.10
DISTRIB_CODENAME=maverick
DISTRIB_DESCRIPTION="Ubuntu 10.10"
Goddamn! I’m running a 30TB RAID-6 JFS array (12 x 3TB), and I thought I was going crazy… It froze every weekend!
After finally poring through syslog, I found that the last message before the hard reset was “Rebuild42 event detected” and traced it back to this web page…
I totally agree that running a check every Sunday (as it states in my /etc/cron.d/mdadm) is just stupid! 30TB? Come on! It took two days just to build the array…
Hopefully this will resolve the stability issue. Thanks guys!
Thanks September 4th 2011
still an issue… 10.06.12
I just rebuilt this workstation from a VERY OLD Ubuntu to Linux Mint 15, and was hacking around at 1am, wondering why my CPU load was so high. Poking around my logs… PANIC! Search. Find this page. Unpanic.
Hello from December 1st, 2013! One year into the mayacalypse and still paranoid.
And another thanks. Have run an array for quite some time, and this was the first time I noticed this behaviour!
Another thanks.
07.06.2015
Thanks! You saved me some time!
Another 1st Sunday thanks.
And another 1st Sunday :-)
Big thanks!