
Fragmentation


Time to collect some links and quotes on disk fragmentation / defrag

Wikipedia explains fragmentation. For a real-world example, think of a book with 100 pages. Normally, you read a book sequentially, starting at page one and ending on page 100. Now imagine that an insane editor had randomly moved all the text around, then put a set of instructions at the beginning of the book that said "first read page 17, then page 98, then page 42, then page 1 ..." and so on, for 100 randomly ordered pages. This would be a highly fragmented book, and you would spend a lot of time seeking from page to page, which would greatly increase the time it took to read the book.

Computers fragment disks because, when they start writing any individual file, they do not know how big it will eventually become - and they write thousands or millions of files to disk. The computer is good at maintaining and using the sort of fragmented table of contents mentioned above. It does not become frustrated the way a human would, but it still takes measurable (though small) amounts of time to sort through all the fragments and assemble a cohesive whole. As a rough illustration: a 100 MB file read in one piece from a drive sustaining 100 MB/s takes about a second, but split it into 200 fragments at roughly 10 ms of seek time apiece and you have added about two more seconds. Those many small delays add up, slowing our overall experience of using the computer - sometimes by a tiny unnoticeable amount, but other times the slowdown is enough that we notice it.

In all my research, I found only one set of real benchmark data showing the effects of fragmentation on performance - and it was quite old. Fragmentation seems to be a very subjective and theoretical issue: people love arguing about it, but that love doesn't seem to extend to running repeatable benchmark tests.

Take-away points

  • NTFS is better at handling fragmentation than FAT (all versions)
  • Linux filesystems seem to beat NTFS in the fragmentation race.
  • Fragmentation is a problem because it causes seeks, which are time consuming. One way to define a seek is: basically any time the system wants data and the disk read head isn't delivering it.
  • Fragmentation is pretty much always a problem when partitions are nearly full.
    • These days we have larger disks, hence larger partitions, hence fewer performance problems due to the high fragmentation of a nearly-full partition.
    • Additionally, disks spin faster and read heads move radially faster, cutting down seek times and reducing the performance penalty of fragmentation.
    • Finally, we have larger caches and better caching algorithms, which further mitigate the effects of fragmentation.
    • However, the disk is still the slowest component in the system, so whatever we can do to speed it up speeds up the system as a whole.
  • Don't obsess over fragmentation on NTFS. Generally you are fine if you have both <30% file fragmentation and >30% disk free. In many cases you'll be OK with only 10% of disk free, but only if you've been staying defragged. (See the command just after this list for a quick way to check where you stand.)
  • Defrag early and often.
  • Since XP, the builtin defrag tool has been able to defrag the MFT.
  • There is much general opinion to be found on the subject of filesystems and fragmentation, but little in the way of factual, provable, repeatable 'here is the data, presented by an expert'.
  • When we move from hard drives to solid-state drives, we will stop caring about fragmentation, since the 'seek time' of an SSD is so small as to be effectively nothing.
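A quick way to check where you stand is the analysis-only mode of the builtin defrag tool (the same command used in the Experiments section further down):

defrag c: -a -v

The -a switch analyzes without actually defragging, and -v prints the verbose report, which includes percent file fragmentation, free space, and the number of MFT fragments.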

Best Practices

  1. Don't obsess over NTFS fragmentation. Generally you are fine if you have both <30% file fragmentation and >30% disk free.
  2. Schedule a job to run defrag {driveletter} weekly or monthly. The job can include all locally mounted driveletters (for hard drives only!) as shown below. (Note: starting with Vista, client versions of Windows already have a scheduled weekly defrag.)
  3. Consider PageDefrag with the -e option, so that it will defrag paging files, registry hives, eventlog files, and hibernation files at every boot. This may be more relevant for Windows 2000 systems.
  4. Consider using Contig if you need to defrag one specific file that gets fragmented a lot (see the example just after this list).
  5. chkdsk /f will clean out malformed security descriptors, in some cases making the MFT smaller. This shouldn't be needed very often - say, yearly? If used on your boot partition it will have to run at the next reboot, and depending on several factors (mainly the speed of disk I/O, the number of files/folders, and the number of errors found), it could keep your system out of commission for hours.
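For the Contig suggestion in item 4, usage is a one-liner. The PST path below is only a placeholder for whatever file keeps fragmenting on your system:

contig -a d:\data\outlook.pst
contig -v d:\data\outlook.pst

The first command (-a) only analyzes, reporting how many fragments the file is in; the second defragments that single file in place, with -v showing what it did.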

Defrag script

On Windows servers (2003 and below), I usually create a folder called c:\batch. In it I create a simple weeklydefrag.cmd file, containing:

@echo off
setlocal
::Weekly defrag script; will defrag all locally attached hard disk partitions
set wmiccommand=wmic logicaldisk where "Description='Local Fixed Disk'" get caption
for /f "skip=1" %%a in ('%wmiccommand%') do call :DRIVECOMMANDS %%a
endlocal
goto :eof
:DRIVECOMMANDS
::each command here will be run against every local fixed
::disk (hard drive letter, C:, E:, etc) on the system
::CDroms, floppies, removable disks, network drives will NOT be included

::the driveletter is represented as %1
defrag %1
goto :eof

A nice thing about the script is that you can easily add to the DRIVECOMMANDS section to perform other per-driveletter tasks if you need to.
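For instance - a purely illustrative sketch, where the log path and the extra command are just examples rather than part of my standard script - the subroutine could be extended to log each drive's results and capture NTFS statistics alongside them:

:DRIVECOMMANDS
::the driveletter is represented as %1
defrag %1 >> c:\batch\defraglog.txt
::hypothetical extra per-drive task: append NTFS/MFT statistics to the same log
fsutil fsinfo ntfsinfo %1 >> c:\batch\defraglog.txt
goto :eof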

Then using schtasks (the commandline version of Scheduled Tasks; oldtimers use at.exe) I run:

schtasks /create /tn WeeklyDefrag /tr c:\batch\weeklydefrag.cmd /sc WEEKLY /mo 1 /d SAT /st 03:30:00 /ru "System"

This will run the weeklydefrag.cmd file once a week, on Saturday at 3:30 AM, as the SYSTEM account. To test that everything is working, I look in the Scheduled Tasks folder to see that the new task is there, right-click it, and choose Run from the context menu. It should run without problem.
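If you prefer to stay at the command line, the same sanity check can be done with schtasks itself:

schtasks /query | findstr WeeklyDefrag
schtasks /run /tn WeeklyDefrag

The first line confirms the task was created; the second kicks it off immediately instead of waiting for Saturday morning.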

References and Quotes - NTFS

The next three links explain a lot of the changes to defrag that have happened in Vista and Win7. They are crucial to understanding modern defragmentation:

Disk Defragmentation – Background and Engineering the Windows 7 Improvements.

Vista Defrag FAQ and a few more Vista defrag infonuggets, from the MS storage team.


Larry Osterman notes that the OS can't find a right-sized block of free space to put a file into if it doesn't know the size of the file!

---- Everything below here is more relevant to XP, 2003, and older versions of Windows. ----

WindowsITPro does a study of The Impact of Disk Fragmentation

IDC study of fragmentation on NT/2000 (note the old system specs) has NSTL results showing up to 80% performance degradation

DisKeeper documents How NTFS Reads a File. However, they fail to note that much of the MFT is often cached in memory.

DisKeeper's Book on Disk Fragmentation

NTFS Optimization at NTFS.com

Some interesting notes on the defragger in Vista, like: why no more progress bar? Why does it seem less efficient?

JKdefrag uses an interesting strategy, and is free. I have not yet tried it. Mind the 'Known Problems' section near bottom of page!

NSTL Whitepaper (NT, 1999) showed an Excel performance degradation (operations took more than 2x as long!) at only 13% file fragmentation.

Michael Kessler of MS notes that defrag.exe cannot defrag "the MFT, the Paging File, FAT directories, or files open for exclusive use—for example, Windows registry." This is corroborated in KB227350 (for Windows 2000 only), which also gives a workaround method for page file defragging, though PageDefrag is easier, and can defrag paging files, registry hives, eventlog files, and hibernation files. However, KB227463 documents "In Windows 2000, it does not defragment NTFS metadata files, such as the Master File Table (MFT), or the metadata that describes a directory's contents. This limitation has been removed in Windows XP and later. It cannot defragment encrypted files in Windows 2000. This limitation has been removed in Windows XP and later." 

I had some difficulty reconciling KB227463 with How Disk Defragmenter Works - specifically, it was difficult to determine whether MFT defragmentation appeared in W2003 or in XP. Luckily I found Windows XP: Kernel Improvements Create a More Robust, Powerful, and Scalable OS (scroll down to the section on Defragmentation Improvements), which confirms that this change appeared in XP.

Maintaining Windows 2000 Peak Performance Through Defragmentation - explains how the pre-Vista defrag tool works.

Experiments

I did a few experiments:

  • On an XP system, defrag c: -a -v reported that the MFT was in 3 fragments. After defrag, still 3 fragments.
  • On a W2003R2 system, defrag c: -a -v reported that the MFT was in 2 fragments. After a defrag, the MFT was still in 2 fragments.
  • On a Vista system, defrag c: -a -v reported that the MFT was in 27 fragments! After a defrag of the volume, there were only 3 fragments.

From this I surmise that 3 fragments is OK and the defragger won't take action, while 27 is far too many and will be fixed. So the threshold at which the defragger decides to act lies somewhere between 4 and 27 fragments.

Serdar Yegulalp at Techtarget says: "I would argue that fragmented free space really becomes critical only when free space on a hard disk drive becomes extremely low—i.e., when the only space available is badly fragmented free space, and the system is forced to create new files in a highly fragmented fashion. But on a large enough drive, where the free space isn't allowed to go below 30%, this should almost never be an issue. There may still be fragmentation of free space, but large enough blocks of free space will almost certainly always exist somewhere on the drive to ensure that files can be moved or newly allocated without trouble. "

Mark Patton (formerly of DisKeeper) says "Larger and faster drives have minimized the impact of fragmentation. The Windows file system tends to fragment files all on its own to a small degree, but fragmentation starts for real when the drive starts to get full—as in over 70%. As the drive fills up, the larger areas of free space become scarce and the file system has no choice but to splatter large files around the disk. As the drive gets really full (over 90%), the file system then starts to fragment the MFT and the Pagefile. Now you've got a full drive, with lots of fragmented files, making the job of the defragmenter nearly impossible because it also needs space to do its job. It is my opinion that a drive that is more than 80% full is not defragmentable. You can see that effect with huge hard disk drives, since they generally use smaller percentages of the drive's total free space. A side-effect is that the overall fragmentation is reduced, and the fact that these drives have faster seek times makes the effect even less noticeable. "

Mike Kronenberg, who wrote a defragger for OnTrack, says "I challenge any defrag company to prove that, on a modern 2006 large drive about 50% full, defragmenting files will increase performance in any way that will be sensed by a user. Basically, nowadays defragmenting files will only provide a moderate performance boost when a drive is relatively full. Modern computers come with 250GB and larger drives that most people will never fill up." Additionally he noted that users are unable to tell the difference between Word loading in 6 seconds or 6.2 seconds.

A Diskeeper whitepaper on Windows2003/XP fragmentation states: "The XP/2003 NTFS file system driver maintains a list of the largest free spaces on the volume." ... "When a file gets created, it gets created in the free space that most closely matches the size of data available to write, in other words a "best fit". Additionally, a presumption is made that a newly created file will end up larger than the size that is currently available for the operating system to write, and extra free space, an “overallocation”, is reserved for the file so as to prevent the file from fragmenting (see Microsoft Knowledge Base article Q228198). That presumption is that the file will be 2, 4, 8 or 16 times larger than the currently known data size, depending on how much data is currently available for writing to the file in the operating system’s file cache."

In MFT Fragmentation, Lance Jensen points out the obvious: "The Master File Table (MFT) is the heart of the NTFS file system. It is essentially an index to all of the files on an NTFS volume, containing the file name, a list of the file attributes, and pointers to the fragments." ... "Fragmentation of the MFT can be a problem on NTFS partitions. This is because the MFT is used for every disk I/O. While much of the MFT can be cached so that an actual disk I/O does not have to be performed every time, it is still true that on most systems the MFT is accessed more than any other file. This means that MFT fragmentation is likely to have more impact on the system than fragmentation of any other single file."

Placed in context with The Four Stages of NTFS File Growth, the above statement gets more interesting!

Linux

Occasionally an argument about fragmentation on Linux filesystems vs NTFS will break out. A few linux-isms:

  • ext2/ext3
    • The general consensus is that fragmentation doesn't become an issue until the disk is nearly full.
    • There really isn't a good tool for diagnosing fragmentation on, or defragging, ext2/ext3 filesystems. Although you can get a percent-fragmentation number with fsck or e2fsck, you must unmount the filesystem to do so (correction: you can use e2fsck -n without unmounting - thanks Kevin!). Honestly I don't know which is more true:
      • No one measures fragmentation on ext2/ext3 because fragmentation isn't a problem, or
      • Fragmentation on ext2/ext3 isn't considered a problem because no one ever measures it!
    • ext2, according to Daniel Robbins, Chief Architect of the Gentoo Project, "does not get fragmented easily," but "fragmentation is a one-way, cumulative process. That is, while ext2 fragments slowly, it cannot defragment itself. In other words, any often-modified ext2 filesystem will gradually get more and more fragmented, and thus slower. Even worse, there are no production-quality ext2 filesystem defragmenting programs currently available."
    • ext2's main defense against fragmentation is preallocation - when writing a new file, it allocates extra blocks so the file will have room to grow. A good explanation of it can be found here, from an author who believes defragmentation on ext2/ext3 is silly.
    • Generally speaking, the way to defrag an ext2/ext3 system is to back it up, destroy the original filesystem, then restore from the backup.
  • ReiserFS
    • Uses the ext2 preallocation strategy to resist fragmentation (according to Hans Reiser in 2001)
    • No tools to measure fragmentation, or to defragment.
  • XFS
    • Uses delayed allocation, space preallocation, and coalescing of space on deletion to avoid fragmentation (according to Nathan Scott in 2001)
    • Does have a defrag tool, xfs_fsr
  • JFS
    • Uses extents (basically another form of preallocation) to stave off fragmentation (according to Steve Best in 2001)
    • Does have a defrag tool, defragfs. Can measure fragmentation with defragfs -q/-r

Constantin Loizides did some experiments to judge fragmentation among multiple filesystems. See the link, but basically after giving several filesystems a workout that should fragment them quite a bit, he observed the following fragmentation levels (although it is worthwhile to read his definition of fragmentation before trying to interpret this table):

FS type         # files    # Bytes        Int. Frag.
Reiser          252738     4192538 KB      6 %
Reiser notail   227190     3791069 KB     14 %
XFS             225394     3757207 KB     15 %
JFS             219376     3667703 KB     17 %
Ext3            208314     3482720 KB     21 %

More recently, Jarmo Ilonen put together a script to more closely report on fragmentation levels in ext3. He found files fragmented into hundreds of noncontiguous blocks - one with over a thousand separate and noncontiguous pieces! This was on a very full filesystem, but he says: "So in this case the partition was heavily fragmented, but that is not very surprising because the partition is a bit too small and has therefore been always nearly full. Other filesystems I measured were not nearly as bad, but most files were still in quite small pieces, say, around 100kB per contiguous block on average."
