Sunday, November 24, 2013

Solution to Linux NTFS performance woes: bad performance / 100% CPU usage when using VirtualBox / VMware and in general

Hey everybody!

On my dual-boot system, apart from the system partition for Windows and the root and swap partitions for Linux, I have one large partition of about 400 GB dedicated to my data: my shared Thunderbird email folders, my documents, my videos, my music and what not. This partition is formatted with NTFS so that both Windows and Linux can read and write it (BTW, if you really wanna know, I use Windows 8.1 Pro and Kubuntu 13.10, Kubuntu being my primary OS as of now :-) ). I was quite happy with my setup except for two very annoying problems (which also used to happen in Linux Mint, which I was using previously):

1. Thunderbird used to lag a lot on Linux, and sometimes on Windows too, while starting up, reading multiple mails or accessing folders.

2. While using any VirtualBox or VMware VM under Linux, I was getting pathetic performance, and my host system used to hang a lot while reading or writing files inside the VMs. In the system monitor I could see a lot of processor time going to the process mount.ntfs. Also, in VirtualBox, whenever I tried merging a snapshot with the base image, the process used to hang and never complete.

High CPU usage by mount.ntfs

The Thunderbird problem was not that severe, so my attention was solely on the VM problem, and for some time I thought this might be an issue with VMware and VirtualBox. But even after upgrading to the newest versions of both several times, the problem never went away. Besides, on giving it some well-deserved thought, I realized that the process mount.ntfs is not specific to VirtualBox or VMware.

So this looked like an issue with the file system driver itself, namely NTFS-3G. I searched a lot on the net for a solution but didn't find any; there were only the same questions that I was asking. Frustrated, I decided to look into the official specifications and FAQ section of the driver developer's website (www.tuxera.com) and voilà! I found the answers to all my issues with the NTFS file system in Linux. Below are the exact steps you can use to get great performance on NTFS drives under Linux:

1. Keep it Uncompressed! Period.

NTFS is a closed-source file system, and the NTFS-3G driver was created using some very sophisticated reverse-engineering techniques. The code revisions over the past few years have made it fast and reliable, but there are still some grey areas where it cannot compete with the native driver as far as performance is concerned (and it is not expected to; remember, NTFS is not a preferred FS under Linux, it's there for compatibility with the Windows world).

Transparent compression is one such feature. Under Windows, you can compress a particular folder or even a whole drive using the native file system compression feature, and it works great. The files are compressed and decompressed on the fly when you use them and you don't notice a thing; Microsoft has also optimized it to work seamlessly. But in Linux, all is not so hunky-dory. While compressing and decompressing files, the NTFS-3G driver takes way too much CPU power, and since every read and write on the drive goes through that one mount.ntfs process, it hogs the system like a monster, uninterrupted for the most part. So the basic thing you can do (in my case it made things roughly 10 times faster) is to decompress the drives that you share between Windows and Linux. To do that, right-click the drive in Windows Explorer, select Properties, uncheck the option “Compress this drive to save disk space” and click Apply. In the next dialog that appears, choose “Apply changes to <drive letter>:\, subfolders and files” and click OK.

Remove drive NTFS compression under Windows

If you have lots of files on the drive, this process can take some time, so have some tea and snacks. This procedure will decompress the whole drive. If you don't want that, then at least decompress the performance-critical folders on your drive, like the ones where you keep your VM virtual hard disks (VDI, VMDK or VHD files). You can do that by right-clicking the folder, choosing Properties, clicking the Advanced... button and unchecking the checkbox that says “Compress contents to save disk space”. This improves performance a lot, and you will notice the difference as soon as you boot into Linux.


Remove NTFS folder compression under Windows
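If you would rather check from the Linux side which files are still compressed, here is a rough Python sketch. It is not something from the Tuxera FAQ. It assumes your NTFS-3G build exposes the Windows attribute flags through the extended attribute system.ntfs_attrib as a 4-byte integer in the CPU's byte order, and the mount point in the usage note is only a placeholder.

```python
import os
import sys

# Windows FILE_ATTRIBUTE_COMPRESSED bit.
FILE_ATTRIBUTE_COMPRESSED = 0x800


def is_compressed(raw_attrib: bytes) -> bool:
    """Return True if a raw NTFS attribute value has the compressed bit set."""
    # Assumption: the value is a 4-byte integer in the CPU's native byte order.
    flags = int.from_bytes(raw_attrib[:4], sys.byteorder)
    return bool(flags & FILE_ATTRIBUTE_COMPRESSED)


def find_compressed(root: str):
    """Yield paths under root whose NTFS attributes mark them as compressed."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # Assumption: NTFS-3G provides this extended attribute (Linux only).
                raw = os.getxattr(path, "system.ntfs_attrib", follow_symlinks=False)
            except OSError:
                continue  # not on NTFS-3G, or the attribute is not available
            if is_compressed(raw):
                yield path
```

For example, `for p in find_compressed("/media/data"): print(p)` would list what is still compressed under a hypothetical mount point /media/data. If it prints nothing for your VM folder, you are good to go.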

2. Enable Big Write Mode and Disable Last Access Timestamp Updates:

The NTFS-3G driver supports an option called big_writes, which you can set while configuring your file system in /etc/fstab or while mounting with the mount command. It stops write buffers from being split into small 4 KB chunks, so data reaches the driver in much larger blocks instead of as a flood of tiny write requests. This helps a lot with throughput while writing, copying or moving large files, and in general does no harm for small files either.

Similarly, NTFS records the last access time of a file, and this is updated every time the file is accessed, which adds to the total time it takes to read from or write to the file. This can be safely turned off with the noatime option without causing any harm to the data.

To configure these options, below are the settings I use in my /etc/fstab file. You can use the same flags as in the screenshot; the other details will vary from system to system depending on how many drives you have and how you configure them. Basically, the highlighted items (big_writes and noatime) are the ones you want to change in your config.

Configure big_writes and noatime mode in /etc/fstab
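In case the screenshot is hard to read, here is a small Python sketch of the same change. It takes one /etc/fstab line and adds big_writes and noatime to the options of an NTFS entry. The function name, the UUID and the mount point are made up for illustration; your own entry will look different.

```python
REQUIRED_OPTIONS = ("big_writes", "noatime")
NTFS_TYPES = ("ntfs", "ntfs-3g")


def add_ntfs_options(fstab_line: str) -> str:
    """Return the fstab line with big_writes and noatime added for NTFS entries."""
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line  # blank line or comment: leave untouched
    fields = stripped.split()
    if len(fields) < 4 or fields[2] not in NTFS_TYPES:
        return fstab_line  # not an NTFS entry, or no options field
    options = fields[3].split(",")
    for opt in REQUIRED_OPTIONS:
        if opt not in options:
            options.append(opt)
    fields[3] = ",".join(options)
    return " ".join(fields)


# Hypothetical entry: the UUID and the mount point are placeholders.
before = "UUID=0123456789ABCDEF /media/data ntfs-3g defaults 0 0"
print(add_ntfs_options(before))
# UUID=0123456789ABCDEF /media/data ntfs-3g defaults,big_writes,noatime 0 0
```

Keep a backup of /etc/fstab before editing it, then remount the drive (or reboot) for the new options to take effect.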

3. Disable mlocate/locate indexing of NTFS drives

mlocate (or locate) is a standard program under Linux which can be used to search the file system quickly for a file or directory. It uses a high-performance index of the file system, generated and updated every day by the updatedb command. On most systems this is a scheduled job by default.

The updatedb utility has some issues with the NTFS file system: even if a single file or folder is changed, it seems to consider all the files and folders as changed and re-indexes everything on the drive. This obviously takes CPU resources, and if the drive is compressed, the situation gets worse because of the high CPU usage of the compression/decompression routines. This doesn't seem to happen as much nowadays, probably due to updated versions of these two commands, but changing one small configuration option can still give you much better results.

The trick is to disable index generation on NTFS file systems altogether. Usually, indexing is not required on NTFS, and you can always search for items using your GUI file manager if you need to. To disable it, edit the file /etc/updatedb.conf and add the entries ntfs and ntfs-3g to the "PRUNEFS=" line, like in the screenshot below. I am not sure whether ntfs-3g is needed or not, but there is no harm in adding it, so I add it nevertheless.
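The same edit can be sketched in a few lines of Python. This is only an illustration: the function name and the sample file content are made up, and your real PRUNEFS line will list different file systems.

```python
import re


def add_prunefs(conf_text: str, extra=("ntfs", "ntfs-3g")) -> str:
    """Append missing file system names to the PRUNEFS="..." line."""

    def patch(match):
        names = match.group(2).split()
        # Compare without regard to case so NTFS and ntfs are not both listed.
        present = {name.lower() for name in names}
        names += [fs for fs in extra if fs.lower() not in present]
        return match.group(1) + '"' + " ".join(names) + '"'

    pattern = r'^([ \t]*PRUNEFS[ \t]*=[ \t]*)"([^"]*)"'
    return re.sub(pattern, patch, conf_text, flags=re.MULTILINE)


# Made-up sample content, not a real system's file.
sample = 'PRUNE_BIND_MOUNTS="yes"\nPRUNEFS="NFS nfs proc tmpfs"\n'
print(add_prunefs(sample), end="")
# PRUNE_BIND_MOUNTS="yes"
# PRUNEFS="NFS nfs proc tmpfs ntfs ntfs-3g"
```

Running it twice changes nothing the second time, so it is safe to apply to a file that already has the entries.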



After applying all these tricks, my system has become so fast and responsive that I can finally use it without a hitch as my production machine for all purposes. Try them and let me know in the comments how it worked out for you.


 Cheers!