LinuxCzar

Rule #4 states that failure will happen, therefore you should plan for that eventual reality. The Linux workstations I build and use (if I have any say about it) include at least two hard drives in a mirrored or otherwise redundant configuration. My current pattern is to build workstations with a small (120 GB or thereabouts) SSD as the boot drive that contains my OS install and swap space. /home, scratch, and possibly other areas are mounted from a two-disk mirrored array of spinning rust.

If a job in Linux System Administration / Operations can teach you one
thing, it’s how to keep up with the ever-changing landscape that Open Source
is. I’ve been working with Linux for 20+ years, and with that comes,
hopefully, some wisdom of experience. Linux distributions and Open Source
are divergent in terms of change: the more things change, the more things
there are to change.

I originally wrote this post in October of 2008. Where did those 9 years go? I think it’s time for an update.
As the Linux Czar, I’m regularly asked to interview folks applying to various jobs that require some Linux skills. Interviewing isn’t really my strong point, and I always struggle to come up with good questions that lead candidates to talk about themselves and their skills in a helpful way.

I’m a big fan of using histograms for metrics and visibility. Compared to a StatsD-like approach that offers a series of summary metrics, histograms give us the ability to:

- Actually visualize the distribution. You can see if your distribution is multimodal, for example. This is done with a heatmap.
- Aggregate. You can aggregate histograms (with the same bucket boundaries) together and produce summary metrics for an entire service. Remember, if you generate percentiles for each application instance, you cannot aggregate those to get a global percentile for the entire service without the raw data.
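As a sketch of that aggregation property: when bucket boundaries match, combining per-instance histograms is just element-wise addition of bucket counts. All bucket bounds and counts below are invented for illustration.

```python
# Aggregating per-instance histograms that share bucket boundaries.
# Bucket bounds and counts are made up for illustration.
BUCKET_BOUNDS = [0.01, 0.05, 0.1, 0.5, 1.0]  # upper bounds, in seconds

# Per-bucket observation counts from two hypothetical app instances.
instance_a = [120, 300, 80, 15, 2]
instance_b = [200, 250, 95, 30, 5]

# Because the boundaries match, service-wide aggregation is just
# element-wise addition -- something you cannot do with percentiles.
service = [a + b for a, b in zip(instance_a, instance_b)]
print(service)  # [320, 550, 175, 45, 7]
```

Contrast this with per-instance p99s: there is no arithmetic that combines them into a correct service-wide p99 without the underlying distribution.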

I was able to attend Monitorama PDX 2017 this past week and had a blast.
If you are interested in monitoring, metrics, related data analysis, alerting,
and of course logs then this is the conference for you. It struck me at
Monitorama that many of us came of age in the pre-microservices (Service
Oriented Architecture or SOA) world. But services in a SOA environment are
different and should be monitored differently than what we may be used to.

Here’s my take on the basic tenets of monitoring infrastructure and best
practices for Service Oriented Architectures. I’m an Operations Engineer
doing visibility work for a fairly large client, so this comes from the
viewpoint of the caretaker of monitoring services. If you are a developer
and don’t agree, let me know!

One of the killer app features of Prometheus is its native support for
histograms. The move toward supporting and using histograms in the metrics
and data-based monitoring communities has been, frankly, revolutionary. I
know I don’t want to look back. If you are still relying on somehow
aggregating means and percentiles (say, from StatsD) to visualize information
like latencies about a service, you are making decisions based on lies.

I wanted to dig into Prometheus’ use of histograms. I found some good, some
bad, and some ugly, along with some potential best practices that will help
you achieve better accuracy in quantile estimation.
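To illustrate where that estimation error comes from, here is a rough Python sketch of the kind of linear interpolation Prometheus’ histogram_quantile() performs over cumulative bucket counts. The bucket bounds and counts are invented, and this is a simplification for intuition, not the actual Prometheus implementation.

```python
# Sketch: estimating a quantile from cumulative histogram buckets, in the
# spirit of Prometheus' histogram_quantile(). Data is invented.
def estimate_quantile(q, bounds, cumulative):
    """bounds: ascending bucket upper bounds; cumulative: cumulative counts."""
    total = cumulative[-1]
    rank = q * total
    for i, count in enumerate(cumulative):
        if count >= rank:
            lower = bounds[i - 1] if i > 0 else 0.0
            prev = cumulative[i - 1] if i > 0 else 0
            # Linear interpolation assumes observations are spread evenly
            # within the bucket -- this assumption is the source of error,
            # which is why bucket boundary choice matters so much.
            return lower + (bounds[i] - lower) * (rank - prev) / (count - prev)
    return bounds[-1]

bounds = [0.1, 0.5, 1.0]        # seconds
cumulative = [50, 90, 100]      # 50 obs <= 0.1s, 90 <= 0.5s, 100 <= 1.0s
print(estimate_quantile(0.5, bounds, cumulative))  # 0.1
print(estimate_quantile(0.9, bounds, cumulative))  # 0.5
```

Note that every answer the estimator can give is pinned to the bucket layout: quantiles that fall on bucket edges are exact, while anything inside a wide bucket can be badly off.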

There are many factors that limit the available bandwidth of a network link from point A to point B. Knowing and expecting something reasonably close to the theoretical maximum bandwidth is one thing; however, the latency of the link can vastly affect the achievable throughput. This relationship is captured by the Bandwidth Delay Product, which can be thought of as the “memory” of the link (that memory being the send/receive buffers).
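As a quick illustration of the arithmetic (with made-up link numbers, not figures from the post): the Bandwidth Delay Product is simply bandwidth multiplied by round-trip time, giving the amount of data that must be in flight to keep the link full.

```python
# Bandwidth Delay Product: how much data is "in flight" on the link.
# Link numbers below are illustrative assumptions.
bandwidth_bps = 100_000_000   # 100 Mbit/s link
rtt_s = 0.05                  # 50 ms round-trip time

bdp_bits = bandwidth_bps * rtt_s
bdp_bytes = bdp_bits / 8
print(bdp_bytes)  # 625000.0 -> ~625 KB of send/receive buffer to fill the pipe
```

If the send/receive buffers are smaller than this product, the sender stalls waiting for acknowledgements and the link never reaches its theoretical maximum.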

I’ve been experimenting with Cyanite to make my Graphite cluster more reliable. The main problem I face is that when a data node goes down, the Graphite web app more or less stops responding to requests. Cyanite is a daemon written in Clojure that runs on the JVM. The daemon is stateless and stores timeseries data in Cassandra.
I found the documentation a bit lacking, so here’s how to set up Cyanite to build a scalable Graphite storage backend.