Ganglia Monitoring System » FYI
http://ganglia.info
Monitoring clusters and Grids since the year 2000
Sat, 25 Oct 2014 19:32:04 +0000

Upcoming Ganglia Web features
http://ganglia.info/?p=464
Fri, 25 Nov 2011 02:59:41 +0000 by vuksan

We have been working hard on new Ganglia Web features that will be part of Ganglia Web 2.2.0. These are the highlights:

Compare Hosts

Allows you to compare hosts across all the matching metrics (this can mean hundreds of graphs). You supply a regular expression that matches a set of hosts, and Ganglia will aggregate all hosts for each metric. This is useful when you are trying to find out why a particular host or set of hosts is performing differently than another set.

Built-in Nagios integration

Check heartbeat – as you may know, gmond daemons send a periodic heartbeat (every 20 seconds by default). If the heartbeat is missing, it is fair to assume the host is down. This should save you from having to use things like check_ping, and it should alert you to potential downtime much more quickly.

Check multiple metrics – allows you to use a single check to check multiple metrics on the same host, i.e. check that disk free on / is more than 30%, on /tmp more than 10%, etc.

Check single metric across multiple hosts (not yet implemented) – use a single check to check low disk space on a set of hosts defined by a regular expression, e.g. instead of having separate disk checks for every host you would have a single check that would give you a breakdown of the hosts that were not OK.
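
The first two checks above can be sketched roughly as follows. This is only an illustration of the logic, not the code that ships with Ganglia Web: the function names, the metric names and the 60-second heartbeat threshold (three missed 20-second heartbeats) are all assumptions made for the example. The return codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_heartbeat(seconds_since_heartbeat, max_age=60):
    """Report a host as down when its last gmond heartbeat is too old.

    max_age is a hypothetical threshold: three missed 20-second heartbeats.
    """
    if seconds_since_heartbeat > max_age:
        return CRITICAL, f"CRITICAL - last heartbeat {seconds_since_heartbeat}s ago"
    return OK, f"OK - last heartbeat {seconds_since_heartbeat}s ago"


def check_metrics(values, minimums):
    """Check several metrics on one host; each value must exceed its minimum."""
    failed = [
        f"{name}={values[name]} (must exceed {minimum})"
        for name, minimum in minimums.items()
        if values[name] <= minimum
    ]
    if failed:
        return CRITICAL, "CRITICAL - " + ", ".join(failed)
    return OK, "OK - all metrics within limits"


# Hypothetical metric names standing in for "disk free on /" and "on /tmp".
print(check_heartbeat(15))
print(check_metrics(
    {"disk_free_pct_root": 42.0, "disk_free_pct_tmp": 8.0},
    {"disk_free_pct_root": 30.0, "disk_free_pct_tmp": 10.0},
))
```

A single multi-metric check like this reports every failing metric in one message, which is what lets one Nagios service definition stand in for several.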

Aggregate graphs decomposition

While viewing aggregate graphs with more than 6-7 items, colors start to blend together and it can be hard to tell which line on the graph is which. This feature allows you to decompose a graph by taking every item on the aggregate graph and putting it on a separate graph, so a single crowded aggregate graph becomes one graph per item.

Flot client side rendering

We have been using Flot, a JavaScript graphing library, for a while now. In this release we are planning to make it even more interactive, i.e. take items off a graph dynamically, etc.

Utilization heatmaps

In this release we are turning on utilization heatmaps in place of the old-style pie charts.

Most of the features have already been implemented. We are still polishing the release and writing documentation. We could always use more help with testing and documentation, so if you are up for it, please join us in the #ganglia channel on Freenode.

If you’d like to test drive some of these changes please visit our demo site.

Easy graph aggregation
http://ganglia.info/?p=359
Thu, 10 Mar 2011 21:02:17 +0000 by vuksan

We have just introduced an experimental new feature in our GWeb 2.0 UI that we are very excited about. The feature is called easy graph aggregation, as it allows you to graph the same metric across a number of hosts. This is often useful when you are proactively looking for problems within your infrastructure. We have made the feature even more powerful by allowing you to specify a regular expression that matches multiple hosts, so if all your database servers are named db-something you can simply use db as your regular expression, or db-0[1-5]. This feature is experimental, so if you match too many hosts you may end up with a broken image; however, we have decided to put it out as a preview of where we are going. Obligatory screenshots:

Line graph

Stacked graph
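
To get a feel for which hosts a given expression would pull into an aggregate graph, here is a small sketch. The host list and the matching_hosts helper are made up for the example, and it assumes the expression may match anywhere in the host name (an unanchored search); the actual matching rules in GWeb may differ.

```python
import re


def matching_hosts(pattern, hosts):
    """Return the hosts whose name matches the regular expression anywhere."""
    regex = re.compile(pattern)
    return [host for host in hosts if regex.search(host)]


# Hypothetical host names.
hosts = ["db-01", "db-02", "db-07", "web-01", "mydb-backup"]

print(matching_hosts("db", hosts))         # broad: any name containing "db"
print(matching_hosts("db-0[1-5]", hosts))  # narrow: only db-01 through db-05
print(matching_hosts("^db-", hosts))       # anchored: names starting with "db-"
```

Under that assumption a short expression such as db can match more hosts than intended, which matters here because matching too many hosts may produce a broken image. A narrower or anchored expression keeps the graph manageable.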

Next steps

We need to add more error checking and fix bugs. We also want a better composer UI and the ability to add aggregate graphs to views. Stay tuned.

Gweb 2.0
http://ganglia.info/?p=343
Tue, 08 Mar 2011 20:17:09 +0000 by Bernard Li

Ganglia has been around for over 10 years, but it is surprising even to me that our Web Frontend has seen very few cosmetic changes over the years.

Back in October 2010, I started an email thread on the ganglia-developers mailing list to kickstart a “re-write” of the frontend code. The idea is to make use of JavaScript libraries to improve the user experience and allow customizations that cater to individual needs. We also wanted to tackle issues like visualizing large amounts of data, which large sites managing tens of thousands of computers are increasingly facing. These sites also tend to track hundreds of metrics or more per host, bringing the total number of metrics monitored into the millions.

These are indeed challenging and interesting times for the project as we see a shift in the user base from the traditional High Performance Computing and Grid sites to large web 2.0 companies and companies in the Cloud space where hosts are dynamically provisioned.

We value your suggestions and feedback, so please don’t be shy: tweet about it @gangliainfo, ping us on IRC in #ganglia at irc.freenode.net, or start an email thread on the ganglia-developers mailing list!

We would like to release this code soon to the public, but we need your help to implement additional features, test the code, etc. So if you are interested, please let us know!

Got Tweets?
http://ganglia.info/?p=298
Tue, 26 Oct 2010 06:39:54 +0000 by Bernard Li

Did you know that we are on Twitter? Follow us @gangliainfo here: http://twitter.com/gangliainfo. If you have something interesting to say about Ganglia, use the hashtag #Ganglia (be nice, we are sharing this with the Biology folks) and we just might re-tweet it! The Twitter feed is also available on the right-hand side of this web page (although re-tweets are hidden in this view).

Happy Twittering/Tweeting!

Monitoring Hadoop Clusters with Ganglia
http://ganglia.info/?p=88
Wed, 22 Apr 2009 23:09:36 +0000 by Matt Massie

Apache Hadoop is an open-source implementation of MapReduce. Hadoop users will be happy to know that Hadoop has built-in support for publishing run-time metrics using Ganglia. For more details, visit the GangliaContext page on the Hadoop Wiki or Philip Zeyliger’s blog post on the Cloudera blog. Cloudera offers an Apache 2.0 licensed distribution to make managing Hadoop clusters easier.

Slides from ‘Capacity Planning for LAMP’ talk at MySQL Conf 2007
http://ganglia.info/?p=54
Sun, 29 Apr 2007 03:10:39 +0000 by Matt Massie

John Allspaw, Engineering Manager at Flickr (Yahoo!), gave a talk on how Flickr uses Ganglia to help with capacity planning. The talk covers a lot of the subtleties and challenges facing hugely successful web services like Flickr.

Building on AIX using the native compiler
http://ganglia.info/?p=51
Tue, 18 Apr 2006 16:49:21 +0000 by knobi

Hi,

this is basically the README.AIX file that will be in 3.0.4. It now has a better recipe for building with the native XLC compiler. It also describes what is needed to build “gmetad”. I thought it would be useful to publish this now.

Using Ganglia on AIX
~~~~~~~~~~~~~~~~~~~~

This version has been tested on AIX 5.1, 5.2 and 5.3. AIX 4.3 might work as well,
but it has not been tested yet.

Installation
~~~~~~~~~~~~

You still need some "tricks" to use ganglia on an AIX system:

1. The AIX version should not be compiled with shared libraries.
You must add the "--disable-shared" and "--enable-static" configure
flags if you are running on AIX:

./configure --disable-shared --enable-static

2. You should use "gcc". xlc does not work out of the box. If you only have
"xlc", the following might work. Run configure first!

(--) cpu_nice, cpu_intr and cpu_sintr:
There is no way to include these metrics, because AIX
does not know anything about them.

(-) mem_buffers and mem_shared: libperfstat does not report
this information, but maybe somebody knows another way.

(+) part_max_used and cpu_aidle: it's quite easy to implement these
metrics as well using libperfstat, but nobody has written
the code so far.

What is Ganglia?
http://ganglia.info/?p=45
Wed, 09 Nov 2005 07:24:23 +0000 by Matt Massie

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.

Ganglia is part of OSCAR 4.0
http://ganglia.info/?p=39
Tue, 11 Jan 2005 22:54:08 +0000 by Matt Massie

The Open Cluster Group is pleased to announce the release of OSCAR
version 4.0.

Linux POSIX Threads
http://ganglia.info/?p=35
Thu, 16 Sep 2004 07:03:08 +0000 by Matt Massie

People who use gexec and pcp on the latest Linux kernels will find that they hang when executed. The problem is that Linux 2.4.x doesn’t
implement the full set of POSIX cancellation points (e.g., sem_wait,
sigwait, etc. are not implemented). This, it turns out, is the
fundamental cause of GEXEC and PCP hanging on these systems. Also,
terminal-related signals (e.g., SIGTTIN) don’t appear to be handled
correctly. I’m told that in 2.6.x kernels, some of these problems
have been fixed. But in the meantime, set your LD_ASSUME_KERNEL environment variable before you start gexec daemons or clients.

export LD_ASSUME_KERNEL="2.4.10"

In the future most (if not all) ganglia components will not rely on POSIX threads at all given the chaotic nature of threads on Linux.