Memory leak in Nautilus? (3/12/2011)

Yesterday when moving relatively huge amounts of data between hard drives on my system (using the GUI - drag-and-drop) I discovered that after some time I had consumed all available memory and all available swap.

Desktop configuration:
Running the "Clear" (?) Desktop theme (the "Windows" looking desktop)
I have not tried this using any other desktop configuration.

Possibly relevant?
During a moment of absolutely culpable stupidity, I inadvertently deleted all the non-dot folders and files from my home directory. Not knowing any clever way to rebuild my home directory, I rebooted into the 10.04 live CD (64 bit), copied the "Ubuntu" user's home directory (the non-dot folders) back to my home directory, changed permissions and owner to match my user, and rebooted. Prior to this copy, I verified that none of the hidden (dot) files or folders had been touched. I do not believe that this has any bearing on the problem, but I mention it just in case.

Scenario:

My 4T RAID-5 file store was becoming fuller than I could back up, so I decided to redistribute the files by moving certain directories to Public2 and replacing them (on Public) with symlinks.

The magnitude of the move was on the order of almost two terabytes of data (approximately 1.5 T or so), to distribute the data more evenly between the two drives.

About half the data was moved using "cp" on the command line. Parts of the data had non-standard filenames (containing spaces, etc.) that made moving them with cp difficult, and some of the directories had varying permissions set, so for those I used the "file browser" as root (gksu "nautilus. . .").

The process was as follows:
1. Drag the directory from source to destination.
2. Wait for the copy to finish successfully.
3. Delete the original directory.
4. Create a symlink (drag with Ctrl-Shift) from the new location back to the original location.
This was done because only the Public directory is shared via Samba, and I wanted to keep it that way.
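For the record, the same move-then-symlink sequence can be scripted. This is a minimal sketch run in a throwaway sandbox (the real paths were under my RAID mounts; the names here are invented), with quoting so the "non-standard" filenames survive:

```shell
#!/bin/sh
# Demo in a throwaway sandbox; in practice SRC and DST would live
# under the real mount points (Public and Public2, names assumed).
ROOT=$(mktemp -d)
mkdir -p "$ROOT/Public/My Files" "$ROOT/Public2"
echo data > "$ROOT/Public/My Files/file with spaces.iso"

SRC="$ROOT/Public/My Files"
DST="$ROOT/Public2/My Files"

# 1-2. Copy, preserving permissions/owners/timestamps (-a);
#      quoting keeps filenames with spaces intact.
cp -a -- "$SRC" "$DST" || exit 1

# 3. Delete the original only after the copy succeeded.
rm -rf -- "$SRC"

# 4. Symlink from the new location back to the original path, so a
#    Samba share rooted at Public still sees the data.
ln -s -- "$DST" "$SRC"

ls -l "$SRC"
```

The `-- ` guard keeps filenames that begin with a dash from being parsed as options, which is the other common way odd filenames trip up cp.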

In some cases I moved multiple directories at the same time - some very small (in the megabytes range) and some relatively large (in the multiple-gigabytes range).

This went on for several hours.

Eventually my system virtually ground to a halt. Attempting to launch any new GUI-based process failed with something like "unable to fork - could not allocate memory".

I eventually got "top" running in a terminal, and discovered that all available RAM and all available swap had been consumed.

I was eventually able to get the computer to shut down, pause to catch its breath, and restart.

Performing some directory copies after the reboot, with System Monitor running on the desktop, disclosed the following:

1. Under normal operating conditions, the system uses appx. 450 (plus or minus) megs of RAM and zero swap.

2. Copying of any amount of data using the command-line within a terminal does not affect the system memory used.
There are variations in memory used, but the magnitudes are less than about 50 megs or so - it never gets as high as 500 megs used. Once the process is finished, the memory used returns to very nearly the base value, dropping the rest of the way after a moment or two.

3. If I copy data using the GUI, dragging and dropping between different file-manager windows, the used memory rises to about 570 (plus or minus) megs, and when the process finishes it releases only about 20 or 30 megs - if that much. Additional directory moves cause the memory usage to continue climbing by about 100 megs per copy. Note that the majority of the copies were multi-gigabyte copies that could take from 10 minutes to an hour to complete.

Rebooting the machine always restores memory - as we would expect.

Top shows Nautilus consuming increasing amounts of memory every copy.
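To put numbers on that growth, something like the following sketch samples a process's resident set size a few times ("nautilus" is the default name only because it's the suspect here; `-C` and the header-suppressing `pid=,rss=` forms are standard procps options):

```shell
#!/bin/sh
# Sample a named process's resident set size (RSS) a few times.
NAME=${1:-nautilus}
for i in 1 2 3; do
    # pid=,rss= suppresses the header; rss is reported in KiB
    ps -C "$NAME" -o pid=,rss= || echo "no $NAME process running"
    sleep 1
done
```

Logging those samples across a series of copies would show whether Nautilus's own RSS is climbing, or whether the "used" memory lives somewhere else entirely.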

I looked all over the web checking for a solution, and I even tried the "apt-get remove appmenu-gtk" fix - apt-get reported that I didn't have the package installed.

I used system monitor to look at running processes, and nothing looked fishy to me. (i.e. I didn't have 30 instances of nautilus running, or other strange things like that.)

I did not try logging off and back on again.

Due to the somewhat "mission critical" nature of this file-store, my ability to experiment with it to try various things may be severely limited.

Before I continue with all of your excellent suggestions, I am investigating the possibility that this may be a problem with the latest *kernel* and not Nautilus.

I have subsequently seen some interesting things that may actually implicate the kernel, so I have rebooted using an older kernel and I am repeating some of my huge file moves, both via command-line and via Nautilus.

If, for the sake of argument, I exonerate Nautilus - is it possible to "back-out" a bug in launchpad? (and possibly reference a new bug as the reason for backing it out?)

I don't want to pull the trigger just yet, but I do want to do some more research before I say one thing or another.

Could you add the system-resource monitoring applet (I don't know the exact English name) to the desktop's top panel, and then check which type of RAM is being used: User, Shared, Buffer, Cached, or Free? Since the kernel uses a lot of extra memory as I/O cache in a 4 GB configuration, it is not uncommon today to have all the remaining RAM used as I/O buffers. The buffered I/O is flushed when more memory is required, which sometimes causes big latency.
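The same breakdown is visible from the command line without any panel applet. A sketch (the "-/+ buffers/cache" row is how the procps of that era formats `free` output):

```shell
# In the first row, "used" includes kernel buffers and page cache;
# the "-/+ buffers/cache" row subtracts them, leaving what
# applications actually hold.
free -m

# The raw counters behind those numbers:
grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo
```

If the second row's "used" figure stays near the ~450 meg baseline while the first row fills up, the memory is reclaimable cache rather than a leak.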

Actually, I am becoming more confused. There are times when it appears that the kernel is hogging memory; at other times it seems that Nautilus has grabbed it all - regardless of which kernel is in use.

On another topic: If the kernel has the nasty habit of grabbing "all remaining RAM" to be used as "I/O buffer" - I would say that this is a serious bug in the kernel! The kernel's memory management should not be so stupid as that.

If the kernel is holding all the physical RAM, that's OK. If it is using a lot of swap and never releasing it, that is not OK. Using all the RAM is not a bad habit, since RAM left unused is a wasted resource. RAM is fast and should be used as much as possible.

Grabbing "all remaining RAM" to use as I/O buffers avoids hard-disk accesses, which are slow, and repeated flash-memory writes, which reduce its life expectancy. So it's not so stupid, except when the kernel has to dump all this memory at once, or when a crash loses all the buffered data. But Nautilus should not be using this memory, since it is I/O buffer space managed by the kernel. I can't tell you more.

I know that it makes sense for the kernel to grab RAM as needed to substitute for slow disk-accesses. This is normal and expected behavior.

However, when the kernel, or Nautilus, grabs RAM and fails to either re-use it or relinquish it, that's a problem.

Example: I do a 1T file copy and the kernel/nautilus grabs 1.45 gigs of RAM out of 2.9 gigs. Once the file copy ends, the memory is not relinquished into the common pool for reuse.

I then do another 1T file copy and the kernel/nautilus grabs the rest of RAM (1.45 gigs) or a significant portion of it, to perform that file copy.

Common sense tells me that if some process grabbed 1.5 gigs of RAM for a file transfer, the next file transfer should be able to use that same space once the first one is clearly and obviously finished (which should be easy to determine - the kernel, if not Nautilus, knows which processes are currently active and running).

If, for whatever reason, the RAM requirement for the next process or transfer is greater than 1.5 gigs, it should consume only the additional RAM needed to bring the total up to that requirement (i.e. if the next process/copy needs 1.7 gigs of RAM, the additional 0.2 gigs should be added to what is already there, instead of grabbing an *additional* 1.7 gigs).

Common sense also tells me that at the end of the file copy, absent some other process that needs this large block of RAM, the excess should be relinquished to the common memory pool (global heap) so that another process can use it if needed.

It is also common sense that "thrashing swap" is extremely expensive as far as system performance is concerned, so the kernel should try to manage memory in such a way as to avoid using swap if at all possible (i.e. reducing copy-buffer size before dedicating swap as buffer space).
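The kernel does expose a knob for exactly that trade-off, vm.swappiness. A sketch of checking it and (hypothetically) lowering it:

```shell
# Current policy, on a 0-100 scale; Ubuntu's default is 60.
cat /proc/sys/vm/swappiness

# To bias the kernel toward reclaiming page cache instead of swapping
# application pages out (takes effect immediately, lost on reboot):
#   sudo sysctl vm.swappiness=10
# Persist it by adding "vm.swappiness=10" to /etc/sysctl.conf.
```

Lower values make the kernel prefer shrinking its caches over pushing application pages to swap, which is the behavior being argued for above.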

When I start the system - GUI running - and it sits there in a quiescent state, it consumes somewhere between 350 and 400 megs of RAM, steady-state. "It is intuitively obvious. . . . ." that the kernel does not need to grab 4 gigs of RAM just because it is available.

By comparison, running PerfMon, (a feature in Task Manager), in Windows 7, or XP - shows a certain (relatively small) amount of RAM consumed when quiescent. If I start a process that requires larger amounts of RAM, Windows allocates the RAM to the process and once that process terminates the excess RAM is (almost) immediately relinquished back to the common memory pool.

*VISTA* had a problem where the kernel consumed all available RAM, forcing any additional processes to run from swap. M$ fixed that damn fast in an update - which helped but did not completely resolve the issues. Windows 7 was released with a much more efficient memory management subsystem. (I am not sure, but I believe that Vista SP1 helped to resolve that issue with a more efficient memory manager - perhaps borrowed from Win-7.)

I am continuing to research this - and as a result I am fighting with the (ahem!) "choices" that Ubuntu is giving me about the running-state of my system. Ultimately what I would like to do is reboot into run-level 3 and try these tasks again to see who the guilty party REALLY is.

Unfortunately, Ubuntu seems to have deprecated run-level 3, and getting that mode to work appears to be a non-trivial undertaking. (Ref: Question #149356 )

Since getting into a "pure" runlevel-3 is darn near impossible - and I really don't want to thrash on this. . . . .

Also, since I get the same effect from a terminal window (as root) as I do from a drag-and-drop. . . . . .

Is it safe to conclude that maybe the problem is NOT nautilus, but might be a kernel (memory management) bug?
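One way to separate the two suspects without runlevel 3, assuming root access: reclaimable page cache can be dropped on demand, while memory a leaking process holds cannot. A sketch using the drop_caches knob (present in 2.6.16+ kernels; it discards only clean cached data, so it is safe, though it costs you a warm cache):

```shell
#!/bin/sh
# Before/after comparison: if the "used" memory is page cache, it
# vanishes when dropped; if a process is leaking, it stays put.
free -m                               # "before" snapshot
sync                                  # flush dirty pages to disk first
if [ "$(id -u)" -eq 0 ]; then
    # 3 = drop page cache plus dentries and inodes (clean data only)
    echo 3 > /proc/sys/vm/drop_caches
else
    echo "re-run as root to actually drop the caches"
fi
free -m                               # "after" snapshot
```

If the "after" snapshot falls back to the ~450 meg baseline, the kernel's cache was the culprit and is behaving normally; if it doesn't, the finger points back at a user-space process such as Nautilus.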

As an additional data-point, I was reviewing some old e-mails, and I noticed that on Feb 22, 2010, I experienced the same issue, which brought my file-server to its knees - it basically froze solid and needed a hard reboot from the frozen state to recover.

Earlier this year (around the mid-February timeframe), I experienced a catastrophic file-server crash. Though I cannot prove it because of the disastrous nature of the crash, the way things locked up leads me to believe that this bug may have cost me my server on several occasions.

It is only by the grace of Almighty God - and my backups - that I wasn't totally screwed to the wall.

[NB: If your system pukes its guts up all over your RAID-5 array, "RAID-5" won't save you. I was running RAID-*10*, belt AND suspenders, and it still got clobbered. . . .]

It is not uncommon for me to do multi-meg / multi-T transfers of data, as I move very large file images, ISOs, and such like around on a regular basis. (I am often called upon to receive, build, test, and/or transmit large/huge file "globs", DVD-sized ISOs, and such like.)

I have a sneaking suspicion that - until this gets solved - I'm going to have to schedule a nightly cron job that does a restart (shutdown -r now) at about 02:00, to make sure memory gets flushed.
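If it comes to that, a cron.d fragment along these lines would do it (the filename and message are my own invention; the sixth field in /etc/cron.d files names the user the command runs as):

```shell
# /etc/cron.d/nightly-reboot   (hypothetical filename)
# m  h  dom mon dow user command
0    2  *   *   *   root /sbin/shutdown -r now "nightly reboot to reclaim memory"
```

Note that entries in /etc/cron.d, unlike a personal crontab, require that extra user field; a plain `crontab -e` entry as root would omit "root" from the line.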

Any comments?

Jim
