Oracle frequently recommends vm.swappiness = 0 to get well-behaved RAC nodes. Otherwise the kernel starts paging out things you don't usually want paged out in favor of a larger filesystem cache.

There is also a vm parameter that controls the minimum size of the free list (vm.min_free_kbytes); you might want to increase that a bit.

Also, look into hosting your JVM heap on huge pages. They can't be paged out and will help the JVM perform better too.
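Before changing any of these, it helps to see what the node is currently running with. What follows is only a minimal sketch: it assumes a Linux box that exposes the settings under /proc/sys/vm, and the function name read_vm_settings is made up for illustration. It reads the values and changes nothing.

```python
from pathlib import Path

# The three knobs discussed above, as file names under /proc/sys/vm.
VM_SETTINGS = ("swappiness", "min_free_kbytes", "nr_hugepages")


def read_vm_settings(root="/proc/sys/vm", names=VM_SETTINGS):
    """Return {name: int value} for every setting that can be read under root."""
    values = {}
    for name in names:
        path = Path(root) / name
        try:
            values[name] = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            # Setting is missing or unreadable on this kernel; skip it.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"vm.{name} = {value}")
```

Recording the output before and after a change makes it easier to tell which setting, if any, made the difference.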

On Dec 8, 2012, at 6:09 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:

> Has anyone experienced a TaskTracker/DataNode behaving like the attached image?
>
> This was during a MR job (which runs often). Note the extremely high System CPU time. Upon investigating I saw that out of 64GB ram the system had allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is roughly where the graph goes back to normal (much lower System, much higher User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by setting it to 30 (which it was at when the graph was collected) and now to 10. I am not sure that helps.
>
> Any ideas? Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
>
> <cpu-use.png>
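The "45GB to cache" observation in the quoted message can be reproduced from /proc/meminfo. The sketch below is one possible way to do that, assuming the usual "Key:   value kB" layout; the helper names parse_meminfo and cache_summary are invented for this example and are not part of any Hadoop tooling.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def cache_summary(info):
    """Summarise page cache plus buffers against total memory (input in kB)."""
    total_kb = info["MemTotal"]
    cache_kb = info.get("Cached", 0) + info.get("Buffers", 0)
    kb_per_gb = 1024 ** 2
    return {
        "total_gb": total_kb / kb_per_gb,
        "cache_gb": cache_kb / kb_per_gb,
        "cache_pct": 100.0 * cache_kb / total_kb,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(cache_summary(parse_meminfo(f.read())))
```

A large cache figure on its own is not a fault, since the kernel gives cache pages back under memory pressure. It becomes interesting when it coincides with the high system CPU time shown in the graph.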

The reason is that Linux moves memory pages to swap space when they have not been accessed for some time (swapping). The Java virtual machine (JVM) does not cope well with being swapped out, and that causes trouble for MapReduce (and HBase and ZooKeeper). So I would suggest setting vm.swappiness = 0.

Thanks
ac

On 9 Dec 2012, at 12:58 PM, seth wrote:

> Oracle frequently recommends vm.swappiness = 0 to get well behaved RAC nodes. Otherwise you start paging out things you don't usually want paged out in favor of a larger filesystem cache.
>
> There is also a vm parameter that controls the minimum size of the free chain, might want to increase that a bit.
>
> Also, look into hosting your JVM heap on huge pages, they can't be paged out and will help the JVM perform better too.
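Whether the JVMs are actually being swapped can be checked directly rather than assumed. The sketch below reads the VmSwap field that Linux reports in /proc/<pid>/status; the function names are made up for illustration, and finding the PIDs of the DataNode, TaskTracker and RegionServer (for example with jps) is left to the reader.

```python
def vm_swap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return None


def swapped_kb_for_pid(pid):
    """Read VmSwap for a live process; pid is supplied by the caller."""
    with open(f"/proc/{pid}/status") as f:
        return vm_swap_kb(f.read())
```

If the value stays at or near zero for the Java processes while the problem is happening, swapping of the JVM heaps is probably not the cause.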

Are you sure that 24 map slots is a good number for this machine?
Remember that you have three services (DN, TT and HRegionServer) with a combined 12 GB of heap.
Try a lower number of map slots (12, for example) and launch your MR job again.
Can you share your logs on pastebin?

Yes, but even with an MR job running, it is only 36GB of heap total out of 64GB of RAM. This leaves plenty for the OS and caching.

The problem seems to be the OS preferring to cache over giving space to the applications. Once I drop the caches and rerun the MR job again several times, it runs perfectly fine.

On Dec 8, 2012 7:06 PM, "Marcos Ortiz" <[EMAIL PROTECTED]> wrote:

> Are you sure that 24 map slots is a good number for this machine?
> Remember that you have three services (DN, TT and HRegionServer) with
> a 12 GB for Heap.
> Try to use a lower number of map slots (12 for example) and launch your
> MR job again.
> Can you share your logs in pastebin?
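The 12 GB and 36 GB figures in this exchange follow from the numbers in the original post. A short worked check, using only those numbers (the function name heap_budget is made up for this sketch):

```python
# Heap sizes in GB, taken from the original post.
DAEMON_HEAPS_GB = {
    "DataNode": 2,
    "TaskTracker": 2,
    "HBase RegionServer": 8,
}
MAP_SLOTS = 24
MAP_SLOT_HEAP_GB = 1
TOTAL_RAM_GB = 64


def heap_budget(daemons=DAEMON_HEAPS_GB, slots=MAP_SLOTS,
                slot_heap_gb=MAP_SLOT_HEAP_GB, ram_gb=TOTAL_RAM_GB):
    """Add up configured heap and report what is left of physical RAM."""
    daemon_heap = sum(daemons.values())
    task_heap = slots * slot_heap_gb
    total_heap = daemon_heap + task_heap
    return {
        "daemon_heap_gb": daemon_heap,
        "task_heap_gb": task_heap,
        "total_heap_gb": total_heap,
        "remaining_gb": ram_gb - total_heap,
    }


if __name__ == "__main__":
    print(heap_budget())
```

Note that this counts configured heap only. Each JVM also uses memory outside the heap (thread stacks, code cache, native buffers), so the real footprint is somewhat higher than the heap total.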

Are you seeing any performance impact with this cache increase? It is normal for a Linux system to use a large amount of memory for cache.

-Bharath

________________________________
> From: Andy Isaacson <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Monday, December 10, 2012 11:23 AM
> Subject: Re: Strange machine behavior
>
> What kernel did you see this on? Was there significant swap traffic
> (si/so in vmstat output) during the high-system-time period?
>
> BTW, you don't need to nor do you want to run sync(1) when
> manipulating drop_caches, it just causes additional noise and
> slowdown. drop_caches doesn't have any impact on correctness; it won't
> cause data loss (by dropping a dirty page or whatever). I've had sync
> calls take 10 minutes to complete, so the unnecessary impact can be
> significant.
>
> -andy
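The swap-traffic question can also be answered without vmstat, by sampling the kernel's cumulative pswpin/pswpout counters in /proc/vmstat twice and taking the difference. This is a rough sketch under that assumption; the function names and the 5 second default interval are arbitrary choices for illustration.

```python
import time

SWAP_KEYS = ("pswpin", "pswpout")


def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in SWAP_KEYS:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two counter snapshots."""
    return {k: after[k] - before[k]
            for k in SWAP_KEYS if k in before and k in after}


def sample_swap_activity(interval=5.0, path="/proc/vmstat"):
    """Take two snapshots `interval` seconds apart and return the difference."""
    with open(path) as f:
        before = parse_swap_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_swap_counters(f.read())
    return swap_delta(before, after)
```

Calling sample_swap_activity() while the system CPU time is high and getting zeros for both counters would suggest the time is going somewhere other than swapping.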


Bharath Mundlapudi 2012-12-18, 08:42
