The Great Defrag Debate: Defrag Domino or Not?
Last week, a Windows server admin came to me wanting to defrag the hard drive on my main Domino mail server. I told him to hold off because I didn’t think it was a good idea. For one thing, early in my Notes career (over a decade ago) I thought I remembered that defragmenting a Domino server was fraught with peril: horror stories of database corruption and the like. So basically, I’ve always avoided it. In fact, I have never in my career bothered to defragment a Domino server.
So that raises the question: am I right not to defrag?
I brought this up on Twitter the other night, and the response I got was almost universally NOT to defrag, but one person I trust said it was okay to do so. I had planned on blogging the question, but with my outage I forgot. Well, today in the hot blogs on PlanetLotus, Adam had some info on defragging Domino databases themselves, for free no less.
So, I’m going to throw the question out there, and I would really like as much feedback from folks as possible. Should we defrag Windows servers running Domino? Give me your good stories and your horror stories; I want to know what you all think. Can it cause corruption? Does it even help, since Domino will create fragmentation again almost immediately anyway? Should you even bother defragging a drive that uses RAID5? I’ve seen heated arguments between Windows admins over whether or not you should defrag RAID5, so who knows? And if you should defrag, what software should you use? And does anyone have any benchmark numbers? Did you see a 10% increase in speed after the defrag?
Come on admin folks, give me your best justifications and let’s put this to rest once and for all!
Adam Osborne
June 2, 2008 @ 6:46 pm
I think it absolutely makes a difference.
If I have a database with 50,000 fragments, then no matter how the file is physically stored (think RAID5, SAN, etc.) the operating system still has to deal with those fragments, at least at a logical level. This all takes time, and I believe it impacts performance.
If a thread is blocked waiting on an I/O (even a logical one), then time is being consumed.
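To put a rough number on that, here is a back-of-envelope Python sketch. The drive timings are assumptions for a typical 7200 RPM disk, not measurements:

    # Back-of-envelope cost of fragmentation on a single spindle.
    # Both timing figures below are assumptions, not measurements.
    avg_seek_ms = 8.5          # assumed average seek time
    half_rotation_ms = 4.2     # assumed rotational latency at 7200 RPM
    fragments = 50000          # the fragment count from the example above
    overhead_sec = fragments * (avg_seek_ms + half_rotation_ms) / 1000.0
    print("Extra head positioning: about %.0f minutes" % (overhead_sec / 60))
    # Roughly 10-11 minutes of pure head movement just to read one file.

Caches and RAID striping will absorb some of that, but the logical bookkeeping for 50,000 extents never goes away.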
I too would love to hear what other people think (or know).
Regards,
Adam
Keith Brooks
June 2, 2008 @ 7:03 pm
John,
Think I fell off Twitter at the time.
My thoughts are this:
Defrag is a real issue, um, depending on who you ask.
Early on, Microsoft and IBM (for OS/2) recommended defragging to speed up performance.
And well, old habits die hard.
Is it really a bad thing? Not that I have ever noticed in 15 years or so of Lotus networks.
Should you do it offline or online? That is a better question.
If you have a RAID array, it is probably a good idea, just to smooth out the I/O disk access.
Is it really such an issue, though? I don’t usually find my servers that fragmented, from a percentage standpoint.
I will check various servers and see whether the version of Windows and NTFS (2000 or 2003, for example) makes a difference in how much fragmentation they show.
Good question.
Chris Whisonant
June 2, 2008 @ 7:27 pm
I posted some info on this back in February: { Link } I still have a lot of second thoughts about it, though.
Defragging can only have positive effects, no matter the file system. But with NSFs we’re usually seeing the larger databases with (tens of) thousands of fragments. This is NOT good! The drive arms have to be constantly moving to access all of the data. I haven’t heard of any recent issues with defragging, to be honest. The only problem is that it’s very time-consuming to do with the system down.
And while I like the look of the tool Adam posted (I may have to try it), I’m really hesitant to try it with Domino. Even though it works at the NTFS API level, it’s still not certified at the Domino API level (that I’m aware of…). So if it doesn’t get the proper handle on the NSF, it could pose problems. This is the same reason SAN/NAS vendors don’t yet provide snapshotting for NSFs (even though they’re working on it).
But even after you have all of the NSFs pretty and defragged, turn on Domino’s router and it’s all out the door, probably with the first email. There’s nowhere for the file system to put the data in a contiguous block, so it dumps it at the end. And voilà! Fragmentation starts anew.
This is the main reason I think there should be a “load compact +10” style option: compact the DB leaving 10% (or whatever) white space at the end, and then defrag that. Then Domino has some built-in room for growth.
Jim Casale
June 2, 2008 @ 8:02 pm
@3 Chris
I read your posting back in February, and I got to thinking about what you said. I agree that a mail file needs free space to grow without fragmenting. If there is some sort of retention policy like we have here, then it could work (if you compact with the -b switch). Since there is a constant amount of incoming mail, and mail being purged, once the mail file is defragmented there is a good chance it could stay that way. Basically, if the amount of purging equals the amount of new mail, it should even out and not fragment (at least in theory).
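To make that theory concrete, here is a toy Python simulation. The daily volumes and the 10% regrow rule are made-up assumptions, purely illustrative:

    import random
    random.seed(1)
    capacity = 1000     # allocation units inside an already-defragmented NSF
    used = 800
    growth_events = 0
    for day in range(365):
        incoming = random.randint(20, 40)   # assumed daily mail volume
        purged = random.randint(20, 40)     # retention policy removes about as much
        used += incoming - purged
        if used > capacity:
            # Only now must the OS extend the file on disk, which is
            # where new fragments would come from.
            growth_events += 1
            capacity = int(used * 1.1)      # regrow with ~10% headroom
    print("Times the file had to extend this year: %d" % growth_events)

As long as purging keeps pace with incoming mail, the file rarely (if ever) has to extend, so the on-disk extent map stays put.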
For what it’s worth I have never been able to get a straight answer from IBM software or IBM hardware as to the need to defrag a Domino server that is on a SAN.
Chris Whisonant
June 2, 2008 @ 8:54 pm
@4 – Jim, That’s exactly what I mean! 🙂 If indeed there is a retention policy in place, it should generally offset the growth from incoming mail. And then, as long as you don’t recover the white space, it may work out fairly well in that regard. I think it’s interesting that the hardware folks can’t get you an answer on SAN fragmentation, since IBM actually makes SAN hardware and has a close relationship with NetApp as well.
Anthony Holmes
June 2, 2008 @ 8:58 pm
Disk throughput – through every means possible – is increasingly important as servers scale and mail files get larger.
There has been an increasing amount of information flowing from IBM about configuring SANs to support the throughput demanded by a large number of users on a Domino server.
But at the end of the day, whether it’s a SAN or local disk, if the data is scattered all over the place in a way that slows down reads and writes, then Domino performance will decrease. And, critically, if Domino is forced to wait too long to write that data, it may cause reliability issues.
Backups
Many customers also find that it takes excessively long to perform Domino backups. Some customers have seen significant improvements in backup speeds once they defragment drives.
Server Up or Down?
Ideally, Domino will be down when you run the defrag; it will run much faster that way. Diskeeper claims that its product can operate on a running Domino server, but that is not something IBM has an official view on. IBM has assessed that standard defrags on running servers have caused RRV bucket errors.
Here’s the official IBM statement recommending defragmentation:
Lotus Software Knowledge Base Document
Title: Slow response from a Domino server due to fragmentation of the data drive
Doc #: 1229817
URL: { Link }
Here’s a technote implicating Defrag and a running Domino server with RRV Bucket errors:
Lotus Software Knowledge Base Document
Title: Error: ‘RRVBucket is corrupt’ when opening mail database
Doc #: 1084594
URL: { Link }
Greyhawk68
June 2, 2008 @ 10:04 pm
Well, keeping a Domino server up WHILE defragging could be an issue then, as I don’t know of any enterprise server that can be down for the amount of time it takes to defrag (unless, of course, you are clustering).
Now, on a logical level I agree that fragmentation is a problem for reads and writes, but with your data spread over a RAID5 at the physical level, are you really getting anything back when you DO defrag? It’s not contiguous anyway…
Maybe keeping a good amount of white space in the databases is the way to go, I dunno. I would love to see some actual benchmarks to prove things one way or the other.
Now, how many of you folks defrag while Domino is running, and what software do you use? I know it USED to cause problems; is that a thing of the past?
Does anyone have any good success stories? Any horror stories?
-Grey
Adam Osborne
June 2, 2008 @ 10:38 pm
(7) I think you will find that decent RAID hardware will shield you from a lot of the physical I/Os (big non-volatile caches, for example). The problem exists within the logical I/Os. The more of them you eliminate, the closer you get to experiencing the performance of the hardware you paid for.
The Unknown Student
June 4, 2008 @ 4:03 am
If it’s of any help, this document outlines the benefits of defragging RAIDs. It’s not application-specific, just a short general overview.
{ Link }
I don’t deal with servers (I’m just a part-time lab assistant), so I don’t know what the admins at my university use on those. But the lab workstations with RAID run Diskeeper, and it works fine.
Bob Nolan
June 17, 2008 @ 9:27 am
First, a caveat: I am the CEO of a company that develops enterprise disk defragmentation tools.
All disk defragmentation occurs at the logical disk level. The MFT record for a file contains one entry for every fragment, indicating its starting LCN address and its length. If a file is in 500 fragments, there are 500 entries. The more fragments there are, the longer it takes to read the file.
Ideally, defragmentation software consolidates the file so it has one LCN entry in the MFT. Defragmentation software is hardware-independent; all it needs to know is the file system (NTFS or FAT) and the disk size. Once the software defrags a file, the result is reported to the disk controller, which maps the LCNs to the physical clusters on the drive. In RAID, for example, this might mean striping a file across multiple platters. In that situation the file is read (logically) by the file system in a single logical I/O, and it gets the fastest possible access from the physical disk according to the controller software. This is a win-win. If the file were logically fragmented, the system would need multiple logical I/Os, and the physical mapping would also be less efficient.
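If you want to see that extent map for yourself, here is a rough Python sketch (an illustration with error handling trimmed, not our product) that counts a file’s extents using the documented NTFS FSCTL_GET_RETRIEVAL_POINTERS control code. Sysinternals’ contig -a reports much the same thing without any code:

    import ctypes, struct, sys
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.restype = wintypes.HANDLE
    kernel32.CreateFileW.argtypes = [wintypes.LPCWSTR, wintypes.DWORD,
        wintypes.DWORD, wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
        wintypes.HANDLE]
    kernel32.DeviceIoControl.argtypes = [wintypes.HANDLE, wintypes.DWORD,
        wintypes.LPVOID, wintypes.DWORD, wintypes.LPVOID, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

    FSCTL_GET_RETRIEVAL_POINTERS = 0x00090073
    GENERIC_READ, OPEN_EXISTING, ERROR_MORE_DATA = 0x80000000, 3, 234
    SHARE_ALL = 0x7    # share read | write | delete, so open files still work

    def count_extents(path):
        h = kernel32.CreateFileW(path, GENERIC_READ, SHARE_ALL,
                                 None, OPEN_EXISTING, 0, None)
        if h == wintypes.HANDLE(-1).value:
            raise ctypes.WinError(ctypes.get_last_error())
        extents, start_vcn = 0, 0
        out = ctypes.create_string_buffer(64 * 1024)
        returned = wintypes.DWORD(0)
        try:
            while True:
                # STARTING_VCN_INPUT_BUFFER: resume from this virtual cluster.
                inbuf = struct.pack("<q", start_vcn)
                ok = kernel32.DeviceIoControl(
                    h, FSCTL_GET_RETRIEVAL_POINTERS, inbuf, len(inbuf),
                    out, len(out), ctypes.byref(returned), None)
                err = ctypes.get_last_error()
                count = struct.unpack_from("<I", out, 0)[0]   # ExtentCount
                if count == 0:
                    break                 # resident or empty file: no extents
                extents += count
                # Extent records ({NextVcn, Lcn}, 16 bytes each) begin at
                # offset 16; resume after the last NextVcn if more remain.
                start_vcn = struct.unpack_from("<q", out, 16 + (count - 1) * 16)[0]
                if ok:
                    break                 # whole map retrieved
                if err != ERROR_MORE_DATA:
                    raise ctypes.WinError(err)
        finally:
            kernel32.CloseHandle(h)
        return extents

    if __name__ == "__main__":
        print("%s: %d extents" % (sys.argv[1], count_extents(sys.argv[1])))

Run it against a copy of a large mail file: tens of thousands of extents on a big NSF is exactly the pattern Chris described above.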
With respect to a Domino server, the Notes database is just one big file to the defrag software. The Microsoft API for moving files lets us move open files, including databases, safely. We have a number of customers who routinely defrag Domino servers with no problem. We recently had a Notes workstation customer defrag a Domino server, and the backup time went from 2 hours to 45 minutes. A good defrag product will have CPU and I/O throttling to minimize the impact on users if you are defragmenting an active server.
Ironically, as a defrag vendor we see most of the Notes problems not on servers but on workstations. The replication feature shatters local Notes databases, with an adverse effect on Notes launch time and application performance. Defrag software that consolidates the free space into the largest contiguous piece will slow re-fragmentation of the Notes database and mitigate the fragmentation caused by replication.
Bob Nolan,
Raxco Software