Thrilling tales of NTFS compression

Publication date: 6 August 2008.
Last modified 03-Dec-2011.

 

Man, do I feel like a rube. I'd never in my life fiddled with NTFS compression.

Talk about uncool, huh!

And now I've spent long enough playing with it that I thought I'd better write something.

[Image: NTFS compression]

To be perfectly honest, NTFS compression is not actually that amazing a thing. In these modern days of fifty-terabyte hard drives that you can purchase for two washers and some pocket fluff, disk compression of any kind is almost useless for most PC users.

But it's still interesting - well, I think so, anyway. And fiddling with it took me right back to geeking around with new filesystems on my Amiga.

"Wow, this lets me fit another 31 kilobytes on a floppy! Of course, no other computer can READ the floppy unless it's got the right driver in the L directory, but that's a small price to pay...!"

Talk about "disk compression" and anybody who's been using personal computers since the days when only rich people had hard drives will be unlikely to get excited. Back in the day, on-the-fly compression - of disk contents, of individual executables, of anything else you could compress - was often a simple necessity, because storage was severely limited.

Disk compression was still necessary for a lot of users well into the hard-disk era. DOS 6.0 came out in 1993, and came with DoubleSpace. A 100Mb hard drive then would cost you about as much as 931 gigabytes does today.

Nowadays, outrageous amounts of RAM and disk are cheap, and (relative) old-timers have banished from their memory the days when you had to use something like Stacker if you didn't want to swap floppies every time you booted your customised startup on your XT, Amiga 500, or even Macintosh.

NTFS compression does not require that sort of fancy footwork. In Windows, all it takes is a few clicks, and then the filesystem starts automagically compressing data right down at the allocation unit level, perfectly transparently for everything else on the system.

There are apparently some technical reasons for filesystem wonks to detest the way NTFS compression works, but you want to be careful how far you go down that rabbit hole, dude. From most people's point of view, NTFS compression has none of the drawbacks of disk compression back in days of yore, but all of the advantages.

Most people still don't need to know about it, but it's surprisingly useful for certain tasks.

OK, first the basics: You can compress whole NTFS volumes, or directories, or individual files. You can do this in Windows XP using the "Properties" dialog ("Properties" -> "Advanced..." -> "Compress contents to save disk space"), or with the command-line utility "compact".
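If you'd rather script it than click, here's a minimal sketch that drives "compact" from Python. The /C (compress), /U (uncompress) and /S (recurse into a directory) switches are compact's own; the helper names and the example path are made up for illustration, and actually running it only makes sense on Windows with an NTFS volume.

```python
import subprocess


def compact_command(path, compress=True, recurse=True):
    """Build the argument list for Windows' compact.exe (hypothetical helper)."""
    args = ["compact", "/C" if compress else "/U"]
    if recurse:
        # /S:<dir> applies the operation to <dir> and all of its subdirectories.
        args.append("/S:" + path)
    else:
        # Without /S, compact just works on the named file or directory.
        args.append(path)
    return args


def run_compact(path, compress=True, recurse=True):
    """Run compact.exe; raises CalledProcessError if it reports failure."""
    return subprocess.run(compact_command(path, compress, recurse), check=True)
```

Something like run_compact(r"C:\Logs") would then compress that (hypothetical) folder and everything under it.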

Once you turn compression on, the computer will chug away compressing whatever data you indicated, and the difference will be viewable in the Properties page for those files.

[Image: NTFS compression]

The "Size" number in Properties tells you the actual bytesize of the data. The "Size on disk" number, though, tells you how much space that data is taking up on the drive. Normally, Size-on-disk is larger than Size, because it includes the "slack space" between the ends of the data in the files and the ends of the disk clusters they're using.
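The slack-space arithmetic is simple enough to sketch. This assumes a 4096-byte cluster, which is a common NTFS default but not something measured here, and it ignores any special handling the filesystem may give to very tiny files.

```python
def allocated_bytes(size_bytes, cluster_bytes=4096):
    """Round a file's size up to a whole number of clusters."""
    clusters = -(-size_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


def slack_bytes(size_bytes, cluster_bytes=4096):
    """Bytes wasted between the end of the data and the end of the last cluster."""
    return allocated_bytes(size_bytes, cluster_bytes) - size_bytes
```

The slack for any one file is anywhere from zero to just under one cluster, which is where the "half a cluster per file, on average" rule of thumb comes from.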

Turn on NTFS compression, though, and Size-on-disk will, with any luck, end up smaller than Size. The worst case scenario, for very incompressible data, is that Size-on-disk and Size will end up exactly the same, a fact which I think indicates that the data is being stored in the Master File Table rather than in normal disk clusters. I think it's this part that makes filesystem experts angry.
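You can also ask for the on-disk size programmatically rather than squinting at Properties. A rough sketch: on Windows it calls the Win32 function GetCompressedFileSizeW through ctypes, and elsewhere it falls back to the block count from os.stat. The function name size_on_disk_of is my own. Note that for files that aren't compressed or sparse, the Win32 call just reports the plain file size, so it won't show slack space the way the Properties dialog does.

```python
import ctypes
import os
import sys


def size_on_disk_of(path):
    """Return the number of bytes the file occupies on disk, as the OS reports it."""
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        get_size = kernel32.GetCompressedFileSizeW
        get_size.argtypes = [ctypes.c_wchar_p, ctypes.POINTER(ctypes.c_ulong)]
        get_size.restype = ctypes.c_ulong
        high = ctypes.c_ulong(0)
        ctypes.set_last_error(0)
        low = get_size(path, ctypes.byref(high))
        # 0xFFFFFFFF is only an error if the last-error code is also set.
        if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return (high.value << 32) | low
    # On POSIX systems, st_blocks counts 512-byte units allocated to the file.
    return os.stat(path).st_blocks * 512
```

Compare the result with os.path.getsize(path) to see what compression bought you.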

Any computer that can read NTFS should be able to read compressed NTFS. That includes all NTFS-capable flavours of Windows (yes, you can just plug a compressed drive into a new Windows box and access it like any other pre-formatted drive), various Linuxes, and also Mac OS back to v10.3. Mac OS (and some other operating systems, including some Linux distributions) can only read from NTFS volumes out of the box, not write to them. But that's the case for all NTFS volumes on a Mac, not just compressed ones. Read/write NTFS drivers for Mac and Linux are available, but not standard equipment.

NTFS encryption is famous for, in effect, destroying people's data, when their computer dies and they find they now have no way to access the encrypted data on the perfectly-intact hard drive, because they didn't back up the encryption key.

NTFS compression is not like that.

Any NTFS-capable computer can read compressed volumes. And NTFS compression and encryption are actually mutually exclusive - you can have either, or neither, or even encryption for some files on a drive and compression for some other files, but you can't have both at once for the same files.

It is possible to screw up your computer by compressing things that the computer needs direct memory-map-ish access to, like the swap file or the hibernate file. Windows will try to stop you from doing this, but if you really try you can manage it, and if you do, it's your own lookout.

It's not at all easy to do this by accident. So don't worry about it, but also don't think you're the first guy to realise that hiberfil.sys, so obviously exactly the same size as your system memory, must be full of direct memory-map data and so must be very highly compressible. You're right, it is, but there's A Good Reason why Windows doesn't want you to compress it.

If you copy a compressed folder from one NTFS drive to another one, it'll be uncompressed at the destination unless it's being copied into a drive or folder that's also got compression turned on. You can't turn on compression for non-NTFS volumes, which is what removable USB drives and similar devices usually are when you buy them.

Windows systems by default don't let you format removable devices as NTFS, but if you go to the device's Properties, then Hardware, then Policies, and change it from "Optimize for quick removal" to "Optimize for performance", you'll be able to format the device as NTFS.

("Optimize for performance" turns on write caching. This means you should use Safely Remove Hardware to stop the device before you unplug it, no matter what filesystem it's using, lest pending writes be lost.)

And, as always with Windows, there's an exception to this rule: If your removable device is bigger than 32Gb - a hard drive in a USB box, say - the Windows graphical interface (as opposed to the command-line FORMAT command) will only let you format it as NTFS. This is because the time a FAT32 volume takes to do some common operations scales directly with the volume's size. Just figuring out how much space is free on a 700Gb FAT drive can take a lot longer than it would if the drive were formatted NTFS.

(As I write this, 32Gb is the most capacity you can get on a USB key or CompactFlash card, but soon even bigger ones will be available. The NTFS-only restriction may cause... confusion.)

NTFS compression and decompression use CPU time, but modern CPUs are so fast that the impact, for desktop computer tasks, is likely to be negligible. If you're running a file server that has writes and reads happening all over the place all the time, then turning on compression is likely to have a clear negative effect on performance; otherwise, you're probably going to need a stopwatch to tell the difference.

If you're accessing a very slow device (say, a drive plugged in via USB 1.0, or some drive at the other end of an Internet VPN connection), then NTFS compression can actually make access quite a lot faster, if your data is compressible enough.

The question, of course, is whether your data is compressible enough. NTFS compression is tuned for speed, not efficiency; if your data is relatively incompressible, so that you can't make it more than, say, 20% smaller by Zipping it, then NTFS compression is unlikely to do anything worthwhile.
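A quick way to apply that 20% rule of thumb without firing up an archiver is to run a sample of the data through Python's zlib module, which uses the same Deflate scheme as ordinary Zip files. This is only a sketch: zlib is not the algorithm NTFS uses, and the function names and the 0.80 threshold are just the rule above written down.

```python
import zlib


def zip_style_ratio(data, level=6):
    """Compressed size divided by original size, using Deflate."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


def worth_trying_ntfs(data, threshold=0.80):
    """True if Deflate makes the data more than 20% smaller."""
    return zip_style_ratio(data) < threshold
```

Feed it the first few megabytes of a representative file, read in binary mode. If even Deflate can't manage a 20% saving, NTFS compression is unlikely to do better.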

It can still massively reduce the size of some particularly pathological kinds of data, though. And some of those kinds of data can be found in nature.

Herewith, some reasonably common situations in which NTFS compression can be helpful.

Laptops

Running out of space on your Windows laptop's hard drive? Not crazy about the idea of buying a new larger drive, hooking both drives up to one computer one way or another and cloning the old drive onto the new one? Just need a little breathing space, only a gigabyte or three on an 80Gb volume? Not crazy about picking your way through the Add or Remove Programs list, getting back ten megabytes here, five megabytes there?

Try compressing the Program Files folder.

Program Files contains a lot of stuff that's read moderately often but not written to very much, which is a good access profile for compression to have no perceptible speed impact at all, after the half-hour or whatever you'll be waiting for everything in a typical large-ish Program Files, on a laptop with a slow-ish CPU, to be compressed.

(While it's happening, by the way, you can keep accessing the contents of the folder as normal.)

You can't expect a terribly large amount of compression from this, but when I tried it, I turned a 35.4Gb Program Files into a 27.6Gb one - 78% of its previous size, and a perfectly worthwhile 7.8Gb saved. For free.

You can do this with desktop computers as well, of course, but most PCs have room and spare controller ports for at least one more whole new drive, which is obviously a better solution. Stationary computers can also have extra drives hung off the back of them via USB, FireWire or eSATA.

But if none of this is an option for you, NTFS stands ready to at least somewhat delay the time when you're going to have to upgrade.

Giant text files

Are you running a Web server, or some other piece of software that produces really big log files?

Do you not very much want to set up some scheduled archiving or auto-deleting system to stop those logs from filling the drive?

Just turn on NTFS compression for the log file directory! Suddenly, all that airy repetitive plain text will take up a lot less space. This is the sort of situation where even very basic compression can do a very good job.

Exactly how much space you'll get back is, of course, uncertain. Log files are usually highly compressible, but how highly depends on exactly what sort of log it is.

I tried NTFS compression out on a few different kinds of text.

First, an optimal case: I made a giant 23.6Mb text file containing nothing but the repeated letter "a".

NTFS-compressing the folder got the file's size-on-disk down to only 1.48Mb - less than 6.3% of its previous size, or about one-sixteenth. That's a much better result than I expected, because a lot of speedy compression systems have an efficiency limit; they may, for instance, never make anything smaller than half of its original size, no matter how compressible the data is. NTFS isn't like that - or, if it is, the limit's pretty small.
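That one-sixteenth figure may not be a coincidence. Here's a minimal sketch of the best case, assuming NTFS compresses data in units of 16 clusters, stores each compressed unit in whole clusters, and uses 4096-byte clusters. Those are assumptions about the format, not something the measurement above proves.

```python
def best_case_size_on_disk(size_bytes, cluster_bytes=4096, unit_clusters=16):
    """Smallest on-disk size if every compression unit shrinks to one cluster."""
    unit_bytes = cluster_bytes * unit_clusters
    units = -(-size_bytes // unit_bytes)  # ceiling division
    return units * cluster_bytes
```

Under those assumptions, a 23.6Mb file can't get smaller than about 1.48Mb on disk, which matches the result above.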

For comparison, compressing this same stupid file with 7-Zip's "fastest" Zip mode reduced the file to only 29 kilobytes. In "ultra" 7z mode, it shrank to only four kilobytes. But when it was NTFS compressed, I could load it straight into a text editor as normal.

Next, I made a dummy extra-compressible server log, about as compressible as any real log is likely to be. It was composed of about a dozen lines from an actual log copied and pasted over and over until it ballooned out to almost 84 megabytes. NTFS compression got it down to 15.75Mb, 18.75% of its previous size.

Next, I tried an actual system log, in this case from an OS X Macintosh. It started out at 2013 kilobytes; compression got size-on-disk down to 452 kilobytes, 22.5% of what it had been. Not bad at all, for a real-world file.

Next, I tested it on a directory full of non-log text, various documentation and funny stuff from back in the bulletin-board days. It was about a floppy disk worth of data - 46 files, 1.44 megabytes. The original size-on-disk was actually 1.55 megabytes, because you get an average of half a cluster worth of slack space for every file. NTFS compression got it down to 1.05Mb on disk; 73% of the data bytesize, and 68% of the previous size-on-disk.

Finally, I tried it out on a single large, extremely un-log-like text file - the Project Gutenberg version of "Moby Dick". That was 1.19 megabytes of text, and NTFS compression got it down to 920 kilobytes, 75% of its previous size.

So for normal sorts of plain-text documents, there's not much to be gained from NTFS compression. And it's not as if you're likely to be filling a significant amount of a whole modern drive with plain text anyway. But if you're trying to fit a bunch of e-books or something onto your old 16Mb USB key, NTFS compression could still help. And if you've got a Web server or something that spits out miles and miles of repetitive logfile text every day, compression could make a huge difference.

Compression will also work well on source code, HTML files and so on, but most people are unlikely to have enough of those sitting around to make compression necessary. The HTML files for the whole of Dan's Data, for instance, add up to 38.9Mb; it took about half a minute to NTFS-compress them all to a perfectly acceptable 22.3Mb.

But the HTML files are insignificant compared with the size of the images directory, which is more than 400 megabytes all by itself, and is full of incompressible JPGs and PNGs and GIFs. If you happen to have a hard drive that's actually full of HTML or source-code files, then NTFS compression could be very handy. But I bet you don't.

The standard e-mail mbox format is plain text too, so you'd think NTFS compression would be useful there, as well. The catch is that people with a big local e-mail folder probably have a lot of stuff in an "Attachments" folder as well, and much of the data in there will be incompressible.

I still gave NTFS compression a go on my venerable Eudora directory, though; its total bytesize is a very imposing 1,323 megabytes. After compression, it was down to 1,043Mb, 79% of what it was before. Not a very inspiring result overall, but the actual .mbx files compressed to about 63% of their previous size, which is more worthwhile. If you've got a lot of not-too-heavily-accessed MBXes, compression might still get you out of a jam.

Other airy data

The easiest way for a program to save data is to just take that data from memory and write it directly into a file.

Memory data is likely to be very "airy", though. Take, for instance, a screenshot of a normal PC desktop with some windows and icons scattered around. In memory, the size of that image is very simple to calculate - take the number of bytes per pixel, and multiply it by the number of pixels. So a 24-bit (3 bytes per pixel), 1600 by 1200 image takes up precisely 5,760,000 bytes - that's 5,625 kilobytes - of memory, no matter what its content is.

But repeated pixels in an image can be compressed, just like repeated letters in a text file. All proper image formats have one or another way to encode "this pixel is green, and the next 200 pixels are the same, then there's a grey, a black, another grey, and then 300 whites..." instead of just saying "green green green green green..."
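The simplest version of that idea is run-length encoding. This sketch is only an illustration of the principle; real image formats (and NTFS compression itself) use more sophisticated schemes, and the function names are mine.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse runs of identical values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out
```

A row of 201 green pixels turns into the single pair ("green", 201). A row where every pixel differs from its neighbours gets no smaller at all, which is why photos are so much harder to shrink than screenshots.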

Which brings us to BMP.

BMP is the default Windows bitmap image format, and it can be compressed. But it usually isn't, because if a program's willing to go to the effort of saving compressed BMPs, it might as well use a better format instead. Programs that output BMPs do so because it's just about the easiest way, on Windows, to write a generally-legible image file.

And an uncompressed BMP is basically just the raw RAM data for the image, plus a bit of framing. You can tell this from the size of the file. 1600 by 1200 pixels of 24-bit colour take up five million, seven hundred and sixty thousand bytes. A 1600-by-1200 24-bit BMP, whether it contains nothing but solid white or a photo of the Mona Lisa, will be precisely five million, seven hundred and sixty thousand and fifty-six bytes in size. That's right - the extra BMP-ifying data takes up only 56 bytes.
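The arithmetic, as a sketch. BMP pads each row of pixels to a multiple of four bytes, which makes no difference for widths like 1600 but matters for odd ones. The 56-byte overhead is simply the figure observed above, kept as a parameter because other programs may write a slightly different amount.

```python
def bmp_pixel_bytes(width, height, bits_per_pixel=24):
    """Bytes of pixel data in an uncompressed BMP, including row padding."""
    row_bytes = (width * bits_per_pixel + 31) // 32 * 4  # rows padded to 4 bytes
    return row_bytes * height


def bmp_file_bytes(width, height, bits_per_pixel=24, overhead_bytes=56):
    """Total file size: pixel data plus the header overhead seen in the text."""
    return bmp_pixel_bytes(width, height, bits_per_pixel) + overhead_bytes
```

The content of the image never enters into it, which is the whole point.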

There are a few other zero-compression image formats out there, like uncompressed TGA and Windows DIB, which is sort of like an even more raw version of BMP, and shouldn't actually be used as a final image format. Sometimes, people just have to use programs that make these files, because nothing else will work with their chintzy webcam. Or because it's all their billion-dollar enterprise facial recognition system can produce. Uncompressed BMP is also a popular basic format for screen-capture programs, because it requires almost no processing and thus is very unlikely to interfere with the game you're playing, or whatever.

You're also likely to find almost raw RAM dumps - or very similarly airy data - in things like game save files. If saving a game takes a while and the save file is outrageously large, this is probably the explanation.

I use the command-line version of Ken Silverman's excellent PNGOUT to PNG-ify images, because it'll make them a few per cent smaller than Photoshop will, don't you know. Yes, I am seeking counselling for this problem of mine.

Whenever I PNGOUT a BMP screenshot, it's likely to compress the file so much that the final file size is described as "0% of original".

PNGOUT takes a long time to compress an image, though. NTFS compression can do it on the fly, making the save operation only imperceptibly slower. So I gave it a shot on a few nice airy images.

First, another optimal case: A 1024 by 768 blank white uncompressed BMP. 2,305 kilobytes size-on-disk, 2,304 kilobytes of actual data. NTFS compression packed it down to only 148k, 6.4% of its original size.

PNGOUT got it down to one kilobyte, mind you. But NTFS compression didn't suck.

Next, a much more difficult image - a picture of cats, scaled and cropped to 1024 by 768 and also saved as uncompressed BMP, thus making it the same 2,304 kilobytes of data and 2,305k size-on-disk.

NTFS compression couldn't do anything with this at all. It got the size-on-disk down to the same as the bytesize, but it's been a while since a one-kilobyte saving has been important to computer users.

PNGOUT, in contrast, shrank the BMP down to a 1,283-kilobyte file, 56% of its previous size.

OK, now for a real-world super-compressible image, a screenshot.

Because my desktop currently includes one giant 2560-by-1600 monitor plus one cheap 1280-by-1024 seventeen-incher off to the side, a screenshot of the whole thing is a gargantuan 3840 by 1600 pixels.

The actual desktop shape is different from that...

[Image: Funny-shaped screenshot.]

...since the smaller monitor has fewer vertical pixels than the bigger one; I talk about the sort of weirdness this can cause in this article. But images have to be rectangular, so 3840 by 1600 it is. This is obviously on the large side, as screenshots go, but smaller images with similar content will compress by about the same ratios.

An uncompressed 3840-by-1600 24-bit BMP will always have a bytesize of marginally more than 18,000 kilobytes - 18,432,056 bytes, to be exact. I populated the screenshot (visible in illegibly-small form above) with plenty of detail, not leaving vast expanses of empty desktop. I didn't use a background image either, though; if you've got a picture on your desktop, it's likely to make "empty desktop" the hardest-to-compress part of a screenshot.

Then I hit the picture with NTFS compression. The result was a size-on-disk of only 2,564,096 bytes, 14% of the previous size. That's a very worthwhile reduction, on a real-world image.

PNGOUT, after chewing on the image for several minutes, managed to get it down to 368,644 bytes - almost exactly two per cent of its original size. But that really did take ages.

One final example - the little illustration from the top of this page:

[Image: NTFS compression]

The original image - the unscaled one you get if you click the above scaled-down one - is 555 by 641 pixels, which makes it 1,045 kilobytes as an uncompressed BMP.

PNGOUT squeezed it down to only 13 kilobytes (1.2% of the original size!), but NTFS compression managed a credible 140 kilobytes - 13%.

The 320-pixel-wide scaled-down image, by the way, is 428 kilobytes as an uncompressed BMP, because it's 320 by 456 pixels. Because it's scaled down, it has more colours and fewer hard transitions than the original screenshot, which makes it harder to compress. PNGOUT got it down to 56 kilobytes (13%); NTFS compression got the size-on-disk down to 112 kilobytes (26%), which ain't bad at all.

A lot of software for cheap webcams wants to save video in some minimally-compressed format, for reasons analogous to the BMP obsession of half-baked imaging apps. The result is very big files - low-compression 30-frame-per-second video is thirty low-compression image files hitting your disk every second. You can often only make the video smaller by re-encoding it with some other codec, which reduces image quality if the original codec wasn't completely raw RGB.

Unfortunately, it doesn't look as if NTFS compression can help you a lot there. The original version of the terrifying journey into my own ear from my ETime Home Endoscope review is only sixteen seconds of none-too-sharp video, but it takes up 57.2 megabytes. NTFS compression only got it down to 47.9Mb, 84% of its previous size; the system load that the high data rates of uncompressed video can create means that if this is all the compression you're going to get, there's probably no point in using NTFS compression for this.

The ETime software saves video in MS-YUV format, which is slightly compressed in the first place; if there was no compression at all then NTFS would have done a better job. There are 489 320-by-240 frames in the video, which with zero compression and at 24 bits per pixel would be more than 107 megabytes. NTFS compression would probably do a pretty good job with that video, but it'd hurt performance if the video were higher resolution, and even very unlucky people are probably not so spectacularly unlucky as to have hundreds of gigabytes of completely uncompressed RGB video to deal with.
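The zero-compression figure is just frames times pixels times bytes per pixel. A sketch, with a function name of my own choosing and ignoring any container overhead:

```python
def raw_video_bytes(frames, width, height, bytes_per_pixel=3):
    """Size of completely uncompressed video: every pixel of every frame stored."""
    return frames * width * height * bytes_per_pixel


def to_megabytes(size_bytes):
    """Convert bytes to megabytes of 1024 * 1024 bytes."""
    return size_bytes / (1024 * 1024)
```

Plugging in 489 frames of 320 by 240 gives 112,665,600 bytes, a bit over 107 megabytes.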

Bad ideas

I tried NTFS compression out on some other stuff, where I thought it'd probably turn out to be useless. And lo, it was.

First, a directory full of video files. Seven files, 2.22 gigabytes. Compression finished in a few minutes, and reduced the size of the files by... 0.4%.

Next, my directory full of ISO image files. I figured that some of those might have quite compressible data in them.

They didn't. Compressed version 97.5% of the original size.

My big folder full of drivers and utilities and installers and suchlike. It was 20.2Gb to start with, took the better part of an hour to compress, and ended up... 97.2% of the original size.

Conclusion

Compression algorithms that're carefully tuned for a particular task will always do a better job at that task than will general-purpose compression schemes. Especially if the general-purpose compression is very computationally fast, which NTFS compression is.

It's hardly surprising that PNG, a format specially designed to losslessly compress image data by the greatest amount possible, greatly outperforms NTFS compression for that task. Heck, even quite speedy PNG (as opposed to the super-intensive form used by PNGOUT) takes several thousand times as long to compress an image as NTFS. You'd bleeding well want it to work well.

Likewise, ZIP and RAR and 7z and pretty much all of the umpteen other compressed archive formats also outperform NTFS compression, at the cost of far higher computational demands and non-transparent operation.

If you're running Windows and, for one reason or another, you have to deal with a lot of airy data on the fly, or if you just need to claw back some free space on an NTFS storage device that you can't trade for a bigger one, NTFS compression is safe, simple, and only a few clicks away.

And yes, fellow old-school tinkerers: You can even use it on a floppy.


