Planet Arch Linux

The zita-resampler 1.6.0-1 package was missing a library symlink that has been readded in 1.6.0-2. If you installed 1.6.0-1, ldconfig would have created this symlink at install time, and it will conflict with the one included in 1.6.0-2. In that case, remove /usr/lib/libzita-resampler.so.1 manually before updating.

Arch Linux @ FOSDEM

Arch Linux Trusted Users, Developers and members of the Security team have been at FOSDEM.
Next year there will be more stickers hopefully and maybe a talk, but it was great to meet
some Arch users in real life, discuss and even hack on the Security Tracker.

Arch Linux @ 34C3

Arch Linux Trusted Users, Developers and members of the Security team have been at 34C3 and even
held a small meetup. There was also an #archlinux.de assembly where people from the irc channel
could meet each other. Seeing how much interest there was this year, it might be worth it to host a
self organized session or assembly with more stickers \o/

Fosdem 2018

Arch Linux Trusted Users and Developers will be at Fosdem 2018 in February. We don’t have a booth or
developer room but you can probably find us by looking for Arch stickers or hoodies :-)

2017 Repository cleanup

The repository’s will be cleaned of orphan
packages, which will be
moved to the AUR, where they can be picked up and taken care of.

AUR 4.6.0 Release

A new version of aurweb has
been released on December third. It brings markdown support for comments and more Trusted User
specific changes.

Writing text in Unity isn't that easy, at least if you want to generate text with single gameobjects to be displayed in 3D and not only as a flat UI text. Every single letter must be dragged and dropped to its place to form a word.

Personally I was frustrated and didn't find a solution on the internet, so I decided to write this small script on my own which generates text in the editor while you are typing as you can see in the animation. For this example I used the letters from the Unity Asset Store package "Simple Icons - Cartoon Assets" (https://www.assetstore.unity3d.com/en/#!/content/59925). Maybe you will find this helpful or has more ideas to improve it, if so then please let me know.😉

More or less one year ago I have started with developing in VR and tried several plugins and tools in Unity to implement all the basic stuff like teleporting, grabbing, triggering and so on.

The first thing you will probably try or see in Unity is the official SteamVR plugin from Valve if you have a Vive like me. Basically this SDK has everything you will need to realize your project, but it's really a very hard start if you try several things with the SteamVR SDK directly. There is no much documentation or example scenes where you can have a look at. For me it was more frustrating than helpful, so searching the Unity Asset Store and also the internet you will maybe find the VRTK (Virtual Reality Toolkit) from Harvey Ball (aka TheStoneFox).

It's the best toolkit you can get if you are try to do something in Unity for VR. You get tons of documentations for nearly all use cases you can imagine. Furthermore he has created lots of examples where you find nearly all that stuff sorted and splitted in different scenes to show you how to use them. Every single script he has written is licensed under the MIT license and can be studied directly in the SDK or in his GitHub repository for the toolkit.

But why do I tell you this? Because Harvey Ball has done an absolute astonishing job on this toolkit. You don't have to fiddle around with each single SDK for each vendor, you don't have to think about grabbing an object, teleporting around, using an object, realize a button, realizing an usable door or anything you can think of. You can such use the VRTK and use nearly all available VR headsets out there directly and start to code your idea or project right away.

And the best thing, he has decided right from the beginning that he give this away all for free! This is even more astonishing if you see all the effort which is behind such a toolkit. These are all reasons to give something back to Harvey. How? You can decide to become a patreon on his patreon page, start contributing directly in the toolkit if you are a coder, contact Harvey and ask how you can help. Right now he is really looking for some donations to go on with the development of VRTK. It would be a shame if VRTK is dying just because of the lack of support. So give and show him a little bit of love and help for this really useful and totally necessary toolkit for VR development in Unity!

Look at all the links I have posted here and decide on your own how important this project is.

The reproducible build initiative has been started a long time ago by Debian and has been grown to include more projects. Arch is now also
in the process of getting reproducible build support, thanks to the of hard work of Anthraxx, Sangy, and many more
volunteers. In pacman git patches where landed to support reproducible builds which will be included in a hopefully soon next stable
release! Meanwhile with help of the reproducible-builds.org rebuild infrastructure rebuilds have been started!

Currently 77% of the 17% tested packages are reproducible as can be found
here. This page is fed by the work done by two Jenkins
builders, which currently build the whole Arch repository.
The builder builds the package twice in different environments and then uses diffoscope to find differences in
packages. Usually the differences are due to timestamps :-). Now that we have some results of rebuilds, we can start fixing our packages. The work I did so far:

Fixing 404 sources of our packages, some of the source failures where due to ftp://kernel.org being used and not https://www.kernel.org.
This has been fixed in SVN. Also old pypi links needed to be fixed

One package’s .install file contained a killall statement, I’m not sure why but it shouldn’t be required so it was eradicated

Integrity mismatch, so upstream did a ninja re-release, annoying but fixed

Imagemagick’s convert sets some metadata in the resized png’s which makes reproducible builds
fail. Since it does not adhere to
SOURCE_DATE_EPOCH.

Missing checkdepends on pytest-runner, which is automatically downloaded by the build tools but that failed in the reproducible build.
Some simply adding the depdency to checkdepends fixed
it.

As you can see, only one of the bullet points was really an reproducible build issue the others where packaging issues. So I can conclude
that reproducible builds will increase the packaging quality in the Arch repository. Having the packages in our repository always build-able
will also help the Arch Linux 32 project.

The Arch reproducible project still needs a lot of work, to make it possible to verify a package build as a user against the repository
package.

P.S.: If you are at 34C3 this year and interested, visit the reproducible build assembly.

Maybe you have heard about the great WRLD project, which provides a great way to display real world map data in your project. Furthermore they provide several different SDKs to access these datas.

For a small project I needed some map data to visualize them in a 3D scene inside Unity. Using the Unity SDK from WRLD you can easily access those data and they will be displayed in your scene. Sadly they render the map all over your scene and their is no restriction in size. At least I haven't found any, even if you use their script attached to a GameObject with a specific size, the map will be displayed all over the scene.

After some fails and searching the web I stumbled upon a video showing the usage of WRLD in a AR environment. There they do exactly what I needed. Luckily the video was made by WRLD and they also provided two very goodblog posts where they explained how they have done it. With the help of these blog posts I have implemented it without all the AR stuff and came up with the proof of concept you can see in the animation.

The displayed cube is used as a stencil mask for the map and if you move the cube or the map, only the part of the map which is inside the cube will be rendered. Also new tiles of the map are loaded dynamically depending on the main camera in the scene. I have published the Unity project on GitHub to provide the solution ready to use for your project and also to archive it for myself. You will need a valid API key from WRLD, just register at their website and generate one for your needs. Then insert your API key at the WRLD Map GameObject:

The project includes the WRLD Unity SDK which you also find in the Asset Store of Unity. But be careful if you replace the included one with the official one from the store, because I have done some changes they have mentioned in their blog posts. So make sure to apply the code changes if you replace the integrated WRLD SDK.

Hope you will find it useful. If you find a bug or have useful hints then let me know, because I'm quite new in Unity and thankful for anything related to it.

This is the second edition of Arch monthly, mostly due to the lack of time to
work on Arch weekly. So let’s start with the roundup of last month.

New TU David Runge

David Runge applied to become a Trusted User and was accepted! He mentioned to have a huge interest in pro-audio, so hopefully there will be improvements made in that area!

Farewell 32 bit

After nine months of deprecation period, 32 bit is now unsupported on Arch Linux. For people with 32 bit hardware there is the Arch Linux 32 project which intends to keep 32 bit support going.

AUR Changes Affecting Your Privacy

The next aurweb release, which will be released on 2017-12-03, includes a public interface to obtain a list of user names of all registered users. This means that, starting on 2017-12-03, your user name will be visible to the general public. The user name is the account name you specified when registering, and it is the only information included in this list. See this link for more information.

#archlinux-testing irc channel

An irc channel has been created for coordination between Arch Linux testers. See more about becoming an official tester here.

Following 9 months of deprecation period, support for the i686
architecture effectively ends today. By the end of November, i686
packages will be removed from our mirrors and later from the packages
archive. The [multilib] repository is not affected.

For users unable to upgrade their hardware to x86_64, an alternative is
a community maintained fork named Arch Linux 32. See their website
for details on migrating existing installations.

test-kitchen was originally written as a way to test chef cookbooks. But the provisioners and drivers are pluggable, kitchen-salt enables salt to be the provisioner instead of chef.

The goal of this kitchen-salt is to make it easy to test salt states or formulas independently of a production environment. It allows for doing quick checks of states and to make sure that upstream changes in packages will not affect deployments. By using platforms, users can run checks on their states against the environment they are running in production as well as checking future releases of distributions before doing major upgrades. It is also possible to test states against multiple versions of salt to make sure there are no major regressions.

Example formula

This article will be using my wordpress-formula to demo the major usage points of kitchen-salt.

Installing Kitchen

Most distributions provide a bundler gem in the repositories, but there are some that have a version of ruby that is too old to use kitchen. The easiest way to use kitchen on each system is to use a ruby version manager like rvm or rbenv. rbenv is very similar to pyenv.

Once ruby bundler is installed, it can be used to install localized versions of the ruby packages for each repository, using the bundle install command.

Because I am also testing opensuse, right now the git version of kitchen-docker is required.

Using kitchen

$ bundle exec kitchen help
Commands:
kitchen console # Kitchen Console!
kitchen converge [INSTANCE|REGEXP|all]# Change instance state to converge. Use a provisioner to configure one or more instances
kitchen create [INSTANCE|REGEXP|all]# Change instance state to create. Start one or more instances
kitchen destroy [INSTANCE|REGEXP|all]# Change instance state to destroy. Delete all information for one or more instances
kitchen diagnose [INSTANCE|REGEXP|all]# Show computed diagnostic configuration
kitchen driver # Driver subcommands
kitchen driver create [NAME]# Create a new Kitchen Driver gem project
kitchen driver discover # Discover Test Kitchen drivers published on RubyGems
kitchen driver help[COMMAND]# Describe subcommands or one specific subcommand
kitchen exec INSTANCE|REGEXP -c REMOTE_COMMAND # Execute command on one or more instance
kitchen help[COMMAND]# Describe available commands or one specific command
kitchen init # Adds some configuration to your cookbook so Kitchen can rock
kitchen list [INSTANCE|REGEXP|all]# Lists one or more instances
kitchen login INSTANCE|REGEXP # Log in to one instance
kitchen package INSTANCE|REGEXP # package an instance
kitchen setup [INSTANCE|REGEXP|all]# Change instance state to setup. Prepare to run automated tests. Install busser and related gems on one or more instances
kitchen test[INSTANCE|REGEXP|all]# Test (destroy, create, converge, setup, verify and destroy) one or more instances
kitchen verify [INSTANCE|REGEXP|all]# Change instance state to verify. Run automated tests on one or more instances
kitchen version # Print Kitchen's version information

The kitchen commands I use the most are:
- list: show the current state of each configured environment
- create: create the test environment with ssh or winrm.
- converge: run the provision command, in this case, salt_solo and the specified states
- verify: run the verifier.
- login: login to created environment
- destroy: remove the created environment
- test: run create, converge, verify, and then destroy if it all succeeds

For triaging github issues, I regularly use bundle exec kitchen create <setup> and then salt bootstrap to install the salt version we are testing.

Then for running tests, to setup the environment I want to run the tests in I run bundle exec kitchen converge <setup>

Configuring test-kitchen

There are 6 major parts of the test-kitchen configuration file. This is .kitchen.yml and should be in the directory inside of which the kitchen command is going to be run.

driver: This specifies the configuration of how the driver requirements. Drivers are how the virtual machine is created. kitchen drivers (I prefer docker)

verifier: The command to run for tests to check that the converge ran successfully.

platforms: The different platforms/distributions to run on

transport: The transport layer to use to talk to the vm. This defaults to ssh, but winrm is also available.

suites: sets of different test runs.

provisioner: The plugin for provisioning the vm for the verifier to run against. This is where kitchen-salt comes in.

For the driver on the wordpress-fomula, the following is set:

driver:name:dockeruse_sudo:falseprivileged:trueforward:-80

This is using the kitchen-docker driver. If the user running kitchen does not have the correct privileges to run docker, then use_sudo: true should be set. All of the containers that are being used here are using systemd as the exec command, so privileged: true needs to be set. And then port 80 is forwarded to the host so that the verifier can run commands against it to check that wordpress has been setup

For the platforms, the following are setup to run systemd on the container start.

All of these distributions except for opensuse have sshd.service enabled when the package is installed, so we only have to have one provision command to enable sshd for opensuse. The rest have a command to configure the driver run_command to the correct systemd binary for that distribution.

For suites, there is only one suite.

suites:
- name: wordpress

If multiple sets of pillars or different versions of salt were needed to be tested, they would be configured here.

remote_exec will cause the command to be run inside of the container. The kitchen command uses the installed salt to lookup if py3 was used or not, so that the correct python executable is used to run the test suite. Then if the TEST environment variable is set, that test is run, otherwise the full test suite is run.

salt_install: This defaults to bootstrap which installs using the salt bootstrap. Other options are apt and yum which use the repo.saltstack.com repository. ppa allows for specifying a ppa from which to install salt. And distrib which just uses whatever version of salt is provided by the distribution repositories.

salt_bootstrap_options: These are the bootstrap options that are passed to the bootstrap script. -X can be passed here to not start the salt services, because salt_solo runs salt-call and doesn't use the salt-minion process.

is_file_root: This is used to say just copy everything from the current directory to the tmp fileserver in the kitchen container. If there were not a custom module and state for this formula, kitchen could be set to have formula: wordpress to copy the wordpress-formula to the kitchen environment.

salt_copy_filter: This is a list of files to not copy to the kitchen environment.

dependencies: This is the fun part. If the formula depends on other formulas, they can be configured here. The following types are supported:

state_top: This is the top file that will be used to run at the end of the provisioner

pillars: This is a set of custom pillars for configuring the instance. There are a couple other ways to provide pillars that are also useful.

Running test kitchen on pull requests.

Any of the major testing platforms should be usable. If there are complicated setups needed, Jenkins is probably the best, unfortunately I do not know jenkins very well, so I have provided examples for the three I know how to use.

My personal favorite is Drone. You can setup each one of the tests suites to run with a mysql container if you did not have states that need mysql-server installed on the instance. Also, for each job runner for Drone, you just need to setup another drone-agent on a server running docker, and then hook it into the drone-server, then each drone-agent can pick up a job and run it.

The perl package now uses a versioned path for compiled modules. This means
that modules built for a non-matching perl version will not be loaded any more
and must be rebuilt.

A pacman hook warns about affected modules during the upgrade by showing output
like this:

WARNING: '/usr/lib/perl5/vendor_perl' contains data from at least 143 packages which will NOT be used by the installed perl interpreter.
-> Run the following command to get a list of affected packages: pacman -Qqo '/usr/lib/perl5/vendor_perl'

You must rebuild all affected packages against the new perl package before you
can use them again. The change also affects modules installed directly via
CPAN. Rebuilding will also be necessary again with future major perl updates
like 5.28 and 5.30.

Please note that rebuilding was already required for major updates prior to
this change, however now perl will no longer try to load the modules and then fail in strange ways.

If the build system of some software does not detect the change automatically,
you can use perl -V:vendorarch in your PKGBUILD to query perl for the
correct path. There is also sitearch for software that is not packaged with
pacman.

for the last few days I had been working on a more mobile friendlyview of our Wikis and Forums. These have just been deployed. Here iswhat changed:

While it's not perfect on small screens it should at least be way morereadable on your mobile phones. Let me know of any issues though.

Wiki:* Updated to MediaWiki 1.29.1* Removed our fork of the MonoBook skin* Introduce a new "ArchLinux" extension which injects some styles andour navigation bar independent of the skin. This was quite a lot ofwork to figure out, but future updates should be way easier now.* The default skin is Vector; MonoBook is still available and can beenabled in your personal settings* The MobileFrontend extension has been removed (So we have a brandedview for mobile as well)* PR see https://github.com/archlinux/archwiki/pull/9/files

In addition to this I have been working on a re-implementation ofhttps://www.archlinux.de. Part of this is a new more mobile friendlydesign. Especially the navigation which moves the menu entries into aso called Hamburger menu on smaller screens is still missing from theimplementation mentioned above.

I plan to extract these "somehow" so we can use a common navigation inall our websites. At least a generated snippet we can copy into ourprojects.

Finally we have released our first pre-alpha version of our newly project Till Dawn. Till Dawn is a VR zombie survival shooter. Tested with the HTC Vive, but the Oculus Rift is also supported, but untested.

You can download the game on the game page at itch.io or at gamejolt and you will also find all information about the game on these pages. At the moment, it's for free but you can donate as much as you want to support us.

So grab a free copy, connect your HTC Vive, play it and let us know what you think about it. Feedback is always welcome, so if you have some ideas for the game then let us know. But remember it's under development right now, so make sure to read the description and the known issues.

Here are some screenshots of the game, but to get a better impression you have to play it.

It's only available for Windows right now, because SteamVR and Unity3D under Linux is a little bit more complicated. Sorry to all Linux gamers out there, but as soon as it is supported without errors, we will release it for Linux, too.

If you like the game then share the information, tweet about it (don't forget to mention @devplayrepeat), make a blog post or just tell your friends. To follow the development make sure that you regularly check the itch.io page of the game because we will post news there.

New Trusted User foxxx0

Discussion about improving the overall experience of contributors

Bartłomiej has started a discussion on arch-dev-public about
improving and getting more external contributors involved in Arch Linux. Not only could existing Arch projects such as pyalpm, archweb and namcap use more contributors for development of new features and fixing bugs. Arch could also use more contributors for new projects and ideas such as rebuild automation and the maintenance of our infrastructure. For those wondering what the infrastructure is about, Arch has a few dedicated servers for the forums, building packages, etc. all these servers are managed with ansible with the playbooks in git

This is the first edition of Arch weekly, a small weekly post about the news in
the Arch Linux community. Hopefully this will be a recurring weekly blog post!

linux-hardened appears in [community]

After the disappearance of linux-grsec from the repos due to the Grsecurity
project not providing the required patches. Daniel Micay provides an alternative linux-hardened in [community]. The package is based on the following Linux fork which contains more security patches than in the Linux mainline kernel and enables more security configuration options by default such as SLAB_FREELIST_RANDOM.
More information can be found on the
wiki of the project.

Arch-boxes project

An effort has been made by
Shibumi to provide
official Arch Linux docker, vagrant (and maybe ec2) images. Currently there is a
virtualbox and qemu/libvirt option. View the project here.

Qt 4 now depends on OpenSSL 1.1

Even after the enormous OpenSSL 1.1 rebuild, not every package in the repository
uses OpenSSL 1.1 yet. Qt 4 currently in [extra] uses OpenSSL 1.1 with 27
packages left in the repository which depend on openssl-1.0. Other OpenSSL 1.0 depending packages are now being rebuilt to stay compatible with Debian Stable and non-free software. See this bug report for more information.

Due to high maintenance cost of scripts related to the Arch Build
System, we have decided to deprecate the abs tool and thus rsync
as a way of obtaining PKGBUILDs.

The asp tool, available in [extra], provides similar functionality to
abs. asp export pkgname can be used as direct alternative; more
information about its usage can be found in the documentation.
Additionally Subversion sparse checkouts, as described here, can
be used to achieve a similar effect. For fetching all PKGBUILDs, the
best way is cloning the svntogit mirrors.

While the extra/abs package has been already dropped, the rsync
endpoint (rsync://rsync.archlinux.org/abs) will be disabled by the end
of the month.

As many car insurances companies do, my car insurance company provides a
satellite device that can be put inside your car to provide its location at any
time in any place.

By installing such device in your car, the car insurance profiles your conduct,
of course, but it could also help the police in finding your car if it gets
stolen and you will probably get a nice discount over the insurance price (even
up to 40%!). Long story short: I got one.

Often such companies also provide an “App” for smartphones to easily track your
car when you are away or to monitor your partner…mine (the company!) does.

Then I downloaded my company’s application for Android, but unluckily it needs
the Google Play Services to run. I am a FLOSS evangelist and, as such, I try to
use FLOSS apps only and without gapps.

Luckily I’m also a developer and, as such, I try to develop the applications I
need most; using mitmproxy, I started to analyze the
APIs used by the App to write my own client.

Authentication

As soon as the App starts you need to authenticate yourself to enable the
buttons that allow you to track your car. Fair enough.

The authentication form first asks for your taxpayer’s code; I put mine and
under the hood it performs the following request:

Wait. What do we already see here? Yes, besides the ugliest formatting ever and
the fact the request uses plain HTTP, it takes only 3 arguments to get a cell
phone number? And guess what? The first one and the latter are two constants.
In fact, if we put an inexistent taxpayer’s code, by keeping the same values,
we get:

-1§<international_calling_code>§§-100%

…otherwise we get a cell phone number for the given taxpayer’s code!

I hit my head and I continued the authentication flow.

After that, the App asks me to confirm the cell phone number it got is still
valid, but it also wants the password I got via mail when subscribing the car
insurance; OK let’s proceed:

Yes, your assumption is right: you just need a car license and you get its
last 20 positions. And what’s that another_code? I just write it down for
the moment.

It couldn’t be real, I first thought (hoped) they stored my IP somewhere so I’m
authorized to get this data now, so let’s try from a VPN…oh damn, it
worked.

Then I tried with an inexistent car license and I got:

-2§TARGA NON ASSOCIATA%

which means: “that car license is not in our database”.

So what we could get here with the help of crunch?
Easy enough: a list of car licenses that are covered by this company and
last 20 positions for each one.

I couldn’t stop now.

The Web client

This car insurance company also provides a Web client which permits more
operations, so I logged into to analyze its requests and while it’s hosted on a
different domain, and it also uses a cookie for almost any request, it performs
one single request to the domain I previously used. Which isn’t authenticated
and got my attention:

It includes an iframe (sigh!), but that’s the interesting part!!! Look:

From that page you get:

the full name of the person that has subscribed the insurance;

the car model and brand;

the total amount of kilometers made by the car;

the total amount of travels (meant as “car is moving”) made by the car;

access to months travels details (how many travels);

access to day travels details (latitude, longitude, date and time);

access to months statistics (how often you use your car).

There are a lot of informations here and these statistics are available
since the installation of the satellite device.

The request isn’t authenticated so I just have to understand the parameters to
fill in. Often not all parameters are required and then I tried by removing
someone to find out which are really needed. It turns out that I can simplify
that as:

But there’s still a another_code there…mmm, wait it looks like the
number I took down previously! And yes, it’s!

So, http://<domain>/<company>/(S(<uuid>))/NewRicerca.aspx is the page that
really shows all the informations, but how do I generate that uuid thing?

I tried by removing it first and then I got an empty page. Sure, makes sense,
how that page will ever know which data I’m looking for?

Then it must be the NewRemoteAuthentication.aspx page that does something; I
tried again by removing the uuid from that url and to my full surprise it
redirected me to the same url, but it also filled the uuid part as path
parameter! Now I can finally invoke the NewRicerca.aspx using that uuid and
read all the data!

Conclusion

You just need a car license which is covered by this company to get all the
travels made by that car, the full name of the person owning it and its
position in real time.

I reported this privacy flaw to the CERT Nazionale
which wrote to the company.

The company fixed the leak 3 weeks later by providing new Web services
endpoints that use authenticated calls. The company mailed its users saying
them to update their App as soon as possible. The old Web services have been
shutdown after 1 month and half since my first contact with the CERT Nazionale.

I could be wrong, but I suspect the privacy flaw has been around for 3 years
because the first Android version of the App uses the same APIs.

The TalkingArch team is pleased to present the latest version of TalkingArch, available from the usual location. This version features all the latest software, including Linux kernel 4.10.6.

The most important feature of this live image is the new x86_64 only compatibility, removing the i686 compatibility that was present in previous images. This makes the latest version much smaller, but it will no longer work on older i686 machines.

This version is the only one that will be listed on the download page, as it always only includes the latest version. However, anyone needing an image that works on i686 may still download the last dual architecture image either via http or BitTorrent, until i686 is completely dropped from the Arch official repositories later this year. The TalkingArch team is also following the latest information on an i686 secondary port, and if there is enough of a need, and if the build process is as straightforward as the x86_64 version currently available, i686 images may be provided once the port is complete and fully working.

mesa-17.0.0-3 can now be installed side-by-side with nvidia-378.13 driver without any libgl/libglx hacks, and with the help of Fedora and upstream xorg-server patches.

First step was to remove the libglx symlinks with xorg-server-1.19.1-3 and associated mesa/nvidia drivers through the removal of various libgl packages. It was a tough moment because it was breaking optimus system, xorg-server configuration needs manual updating.

The second step is now here, with an updated 10-nvidia-drm-outputclass.conf file that should help to have an "out-of-the-box" working xorg-server experience with optimus system.

How docker works now.

When you build a docker container using only docker tools, what you are actually doing is building a bunch of layers. Great. Layers is a good idea. You get to build a bunch of docker images that have a lot of similar layers, so you only have to build the changes when you update containers. But what you end up with is this really hard to read ugly Dockerfile that is hard to leave because it tries to put a bunch of commands on the same line.

Here is a Dockerfile that is used to build a mysqld container. The RUN command in the middle is just really convuluted and I just can't imagine trying to write a container like this. But what if you could use salt to configure your docker container to do the same thing.

Using Salt States

NOTE: This is all to be added in the Nitrogen release of salt, but you should be able to drop-in the dockerng state and module from develop once this PR is merged.

It is worth mentioning that this is a contrived example, because one of the requirements to use dockerng.call is to have python installed in the docker container. So for the salt example you will need to build a slightly modified parent container using the following command.

Now, this shouldn't be a problem when building images. This just allows for managing the layers. If I were to do this, I would take the debian image, and use salt states to setup apache and the base stuff for building the modules and things and then run the following state.

And we are done. In my honest opinion, this is significantly easier to read. First we enable the rewrite module. Then we install the packages for compiling the different php extensions. Then we use the built in docker-php-ext-* to build the different php modules. And we put the opcache recommended plugins in place. Lastly we download and extract the drupal tarball and put it in the correct place.

There is one caveat, right now we do not have the ability to build in the WORKDIR and ENV variables, so those will have to be provided when the container is started.

Long time no post on my blog, but this time I have to. Finally I was able to release a first alpha version of my Android app "23 viewer". It's in an early alpha stage, but it can be used to browse the subscriptions of your contacts in the 23 photo community, to favor a photo, to comment on photos and to see the EXIF data of a photo. You can find the apk package for Android on the github release page here: https://github.com/isenmann/23viewer/releasesTo give you some short impression, here are some screenshots of the app:

You will need an account on 23 to use the app, otherwise you are not able to do anything in the app or to see something. If you using 23, give me some feedback via github or email. Thanks and have fun with the app.

Due to the decreasing popularity of i686 among the developers and the
community, we have decided to phase out the support of this architecture.

The decision means that February ISO will be the last that allows to
install 32 bit Arch Linux. The next 9 months are deprecation period,
during which i686 will be still receiving upgraded packages. Starting
from November 2017, packaging and repository tools will no longer
require that from maintainers, effectively making i686 unsupported.

However, as there is still some interest in keeping i686 alive, we would
like to encourage the community to make it happen with our guidance. The
arch-ports mailing list and #archlinux-ports IRC channel on Freenode
will be used for further coordination.

The TalkingArch Iso

The TalkingArch team is proud to present the first TalkingArch iso of 2017. This one is the second to fix a problem with Portaudio that was causing Espeak to crash. It comes with the latest packages, including Linux kernel 4.8.13. Find it in the usual place.

Social Media

TalkingArch is now on Twitter. Follow us, tweet @ us, or even retweet release announcements, which will post there as well as here. All other ways of contacting TalkingArch still work; nothing is going away.

The Fall of i686

There has been significant talk over the past 2 to 3 weeks on the Arch Linux e-mail list about dropping support for the i686. architecture. The upshot of the discussion is that by the end of this year, no further i686 packages will be uploaded to the official repositories. To that end, Arch will most likely be producing only x86_64 isos starting in February, ending the dual architecture builds. Because TalkingArch stays as close to Arch as is possible, if Arch does stop producing dual architecture builds in February, TalkingArch will do the same. However, if an i686 version is needed, as long as packages are still being uploaded to the official repositories, an i686 build of TalkingArch can be made available upon request. That said, it will still be possible for some months to use this dual architecture build to install Arch Linux until such time as they remove i686 packages from the repositories, and that link will continue to work as long as it is usable. Thanks for the support over the years, and we look forward to serving you the latest and greatest TalkingArch for many years to come.

The upgrade to OpenVPN 2.4.0 makes changes that are incompatible with
previous configurations. Take special care if you depend on VPN
connectivity for remote access! Administrative interaction is required:

Configuration is expected in sub directories now. Move your files
from /etc/openvpn/ to /etc/openvpn/server/ or /etc/openvpn/client/.

If you’re interested at all in tackling non-trivial timeseries alerting use cases (e.g. working with seasonal or trending data) this video should be useful to you.

It’s basically me trying to convey in a concrete way why I think the big-data and math-centered algorithmic approaches come with a variety of problems making them unrealistic and unfit,
whereas the real breakthroughs happen when tools recognize the symbiotic relationship between operators and software, and focus on supporting a collaborative, iterative process to managing alerting over time. There should be a harmonious relationship between operator and monitoring tool, leveraging the strengths of both sides, with minimal factors harming the interaction.
From what I can tell, bosun is pioneering this concept of a modern alerting IDE and is far ahead of other alerting tools in terms of providing high alignment between alerting configuration, the infrastructure being monitored, and individual team members, which are all moving targets, often even fast moving. In my experience this results in high signal/noise alerts and a happy team.
(according to Kyle, the bosun project leader, my take is a useful one)

That said, figuring out the tool and using it properly has been, and remains, rather hard. I know many who rather not fight the learning curve. Recently the bosun team has been making strides at making it easier for newcomers -
e.g. reloadable configuration and Grafana integration - but there is lots more to do.
Part of the reason is that some of the UI tabs aren’t implemented for non-opentsdb databases and integrating Graphite for example into the tag-focused system that is bosun, is bound to be a bit weird. (that’s on me)

For an interesting juxtaposition, we released Grafana v4 with alerting functionality which approaches the problem from the complete other side: simplicity and a unified dashboard/alerting workflow first, more advanced alerting methods later. I’m doing what I can to make the ideas of both projects converge, or at least make the projects take inspiration from each other and combine the good parts. (just as I hope to bring the ideas behind graph-explorer into Grafana, eventually…)

Note:
One thing that somebody correctly pointed out to me, is that I’ve been inaccurate with my terminology.
Basically, machine learning and anomaly detection can be as simple or complex as you want to make it. In particular, what we’re doing with our alerting software (e.g. bosun) can rightfully also be considered machine learning, since we construct models that learn from data and make predictions. It may not be what we think of at first, and indeed, even a simple linear regression is a machine learning model. So most of my critique was more about the big data approach to machine learning, rather than machine learning itself. As it turns out then the key to applying machine learning successfully is tooling that assists the human operator in every possible way, which is what IDE’s like bosun do and how I should have phrased it, rather than presenting it as an alternative to machine learning.

For some time I’ve wanted to play Spotify music on my stereo installation, except it doesn’t have bluetooth. I do own a nice aarch64 amlogic S905X based media center which runs LibreELEC, except libspotify which I normally use in combination with mopdiy. Libspotify (a binary blob from Spotify(tm)) however does not support aarc64.
So I decided to use one of my spare ARM boards, the nanopi NEO it has USB for the DAC and ethernet for streaming music and it’s supported in mailine (ethernet however requires patches) and librespot. Librespot is a service which enables my phone (or any other official spotify-client running device) to play music on the ARM board running librespot. All what was required was compiling librespot (rust program) for ARMv7.

Arch Linux ARM does not offer rust as binary package, so you’d have to use rustup and run the nightly channel because the default rust channel segfaults.

Compiling on an ARM board with 512 MB ram and no swap is currently not do-able with rust even with ‘cargo build -j1’. So I switched to a beefier board (orange pi pc) with 1GB ram which is luckily enough to compile libreelec.

pacman -S protobuf portaudiocargo build --release-j1

After it was compiled, install the cargo binary with ‘cargo install’ and copied the systemd unit from the AUR package. So far librespot works as expected and hasn’t crashed while running for a week.

Later I intend to integrate the nanopi and dac in a nice case with a power button to poweroff/poweron the nanopi.

I like cleaning git history, in feature branches, at least.
The goal is a set of logical commits without other cruft, that can be cleanly merged into master. This can be easily achieved with git rebase and force pushing to the feature branch on GitHub.

This meant that I not only overwrote the branch I wanted to update, but also by accident a feature branch (called httpRefactor in this case) to which a colleague had been force pushing various improvements which I did not have on my computer. And my colleague is on the other side of the world so I didn’t want to wait until he wakes up. (if you can talk to someone who has the commits just have him/her re-force-push, that’s quite a bit easier than this)
It looked something like so:

Oops!
So I wanted to reset the branch on GitHub to what it should be, and also it would be nice to update the local copy on my computer while we’re at it.
Note that the commit (or rather the abbreviated hash) on the left refers to the commit that was the latest version in GitHub, i.e. the one I did not have on my computer.
A little strange if you’re to accustomed to git diff and git log output showing hashes you have in your local repository.

Normally in a git repository, the objects dangle around until git gc is run, which clears any commits except those reachable by any branches or tags.
I figured the commit is probably still in the GitHub repo (either cause it’s dangling, or perhaps there’s a reference to it that’s not public such as a remote branch), I just need a way to attach a regular branch to it (either on GitHub, or fetch it somehow to my computer, attach the branch there and re-force-push), so step one is finding it on GitHub.

The first obstacle is that GitHub wouldn’t recognize this abbreviated hash anymore: going to
https://github.com/raintank/metrictank/commit/92a817d resulted in a 404 commit not found.

Now, we use CircleCI, so I could see what had been the full commit hash in the CI build log.
Once I had it, I could see that https://github.com/raintank/metrictank/commit/92a817d2ba0b38d3f18b19457f5fe0a706c77370 showed it.
An alternative way of opening a view of the dangling commit we need, is using the reflog syntax.
Git reflog is a pretty sweet tool that often comes in handy when you made a bit too much of a mess on your local repository,
but also on GitHub it works: if you navigate to https://github.com/raintank/metrictank/tree/httpRefactor@{1} you will be presented with the commit
that the branch head was at before the last change, i.e. the missing commit, 92a817d in my case.

Then follows the problem of re-attaching a branch to it.
Running on my laptop git fetch --all doesn’t seem to fetch dangling objects, so I couldn’t bring the object in.

Then I tried to create a tag for the non-existant object. I figured, the tag may not reference an object in my repo, but it will on GitHub, so if only I can create the tag, manually if needed (it seems to be just a file containing a commit hash), and push it, I should be good.
So:

So this approach won’t work. I can create the tag, but not push it, even though the object exists on the remote.

So I was looking for a way to attach a tag or branch to the commit on GitHub, and then I found a way.
While having the view of the needed commit open, click the branch dropdown, which you typically use to switch the view to another branch or tag.
If you type any word in there that does not match any existing branch, it will let you create a branch with that name. So I created recover.

From then on, it’s easy.. on my computer I went into httpRefactor, backed my version up as httpRefactor-old (so I could diff against my colleague’s recent work), deleted httpRefactor, and set it to
the same commit as what origin/recover is pointing to, pushed it out again, and removed the recover branch on GitHub:

How I came to work on SaltStack

I was working at Rackspace doing Linux support in the Hybrid segment. I did a lot of work with supporting Rackconnect v2/v3 and Rackspace Public cloud as well as the dedicated part of the house. I started down the road to server automation and orchestration the way I think a lot of people do. At some point, I started to think... there has to be a better way.

I began with learning chef. Rackspace's devops offering had just begun and there was a lot of people using chef and poo pooing puppet in places that I was looking. I had never used ruby before, so I did some ruby practice on codecademy and I learned the basics of different blocks in ruby. I then setup wordpress. As can be seen from that repository, it has been a very long time since I did chef. I played with chef for about 6 months, and then I decided to try using Ansible and see what that was all about. I liked the idea of pushing instead of pulling and the easy deployment method was nice. But after about a month of using Ansible, Joseph Hall came to Rackspace right after the first SaltConf in 2014 and gave a 3 day class on salt. And I was in love. I loved the extensibility of salt, the reactor, the api, salt-cloud being built in. It was all just perfect for me. And by the end of the second day of the class, I had submitted my first pull request to saltstack.

My favorite thing I think about contributing to salt is how open it is to the community and how hard we all try to be welcoming to anyone new. We kind of have a No Jerks Allowed Rule, and try to be as polite and welcoming as possible.

Anyway, lets get started. This is going to be just as much as I can think of on how to go about contributing to salt.

Getting setup

I probably run my testing setup a little different than everyone else. Anyway you can get salt running to do testing is good. If it works for you, do it.

I create a server in VMWare Fusion using CentOS. Then I install epel-release and then python-pip and I do a pip install -e git://github.com/gtmanfred/salt.git@<branch>#egg=salt. This will give me everything I need to install to get salt running. Since it is also -e the git install is editable, so the changes take effect immediately and I can edit right there in ./src/salt. From here for all my changes, I can just commit right there to save for later.

Recently I have been trying to switch to using atom as my editor. I really like it. What I have been using is the remote ftp plugin. This allows for the remote directory to be setup to ~/src/salt, and then I just have that in the .ftpconfig and once connected, there is a second project window that shows the remote ftp location with all the files, and I can treat it as if it was the local file. Then once all the files are done, I can sync from the remote down to the local and make my pull request.

Either way, get a working environment going.

Here is the salt document on getting started with the development. You can ignore parts in there about M2Crypto and swig. There are no currently supported salt versions that use M2Crypto.

Another thing you could do if you were so inclined, would be to copy the module you are going to be modifying to /srv/salt/_modules or whatever dynamic directory where it belongs. You will then need to run salt-call saltutil.sync_all to sync modules to the minion or salt-run saltutil.sync_all for the master.

Writing a ... template

The first thing that I do any time I make a new file for a salt module is to add the following template.

Each file requires a docstring at the to to list any depends and then basic configuration for usage such as s3 credentials. It is also good to use the :depends: key if there are any required packages that need to be installed for the module to be used.

I pretty much always import absolute_import. This is just useful to have and will cause less weird issues later. Plus it is the default behavior in python3, so there is nothing bad that could come from it.

Then We have the two import options. Anything that gets import from salt, like salt.utils gets put under the Import Salt libraries, and all other imports get put under python libraries.

Then we have the __virtual__ functions which we will go over later when we talk about the anatomy of a module.

Execution Modules

Now lets move to writing a module. I am going to demo with a contrived example of a redis module, and then go over every line.

There, that is a moderately simple example where we can talk about every thing going on.

You will notice the coding line at the top like in the template

Next we have the docstring.

There is a brief description

a versionadded string. Please include these if you make new modules, so that when referencing back we can see when the module was added. Also, if it is an untagged release, use the codename, otherwise use the point release where it was added. We update the code names on all versonadded added and versionchanged strings when we tag them with a release date.

A depends string, to let the user know that the redis python module is required.

An example configuration if you one is possible to be used.

Then we have the imports. We catch the import error on redis, and set HAS_REDIS as False if it can't be imported so that we can reference it in the __virtual__ function and know if the module should be available or not.

__virtualname__ is used to change the name the module should be loaded under. If __virtualname__ isn't set and returned by the __virtual__ function then the module would be called using redismod.set.

The __virtual__ function is used to decide if the module can be used or not.

If it can be used and it has a __virtualname__ variable, return that variable. Otherwise if it is to be named after the name of the file, just return True.

If this function can't be used, return a two entry tuple where the first index is False and the second is a string with the reason it could not be loaded so that the user does not have to go code diving.

Now the connect function.

If you include something like this, please be sure to also include the ability to connect to the module by passing arguments from the command-line and not only having to modify configuration files.

It is important to note, while python allows for any "private" functions to be importable and used, salt does not. The _connect function is not usable from the command-line, or from the __salt__ dictionary

There are a lot of includes that salt provides into different portions of salt. These are usually called dunder dictionaries.

Using config.get lets the configuration be put in the minion config, grains, or pillars. There is a heirarchy.

The lastly we have __context__. This is a really usefull for connections, because you only have to setup the connection one time, and then you can continually just return it and use it every time the module is used, instead of having to reinitialize the connection.

Lastly we have the functions that are available.

You want a doc string that has a description, then a code example. The code example is required. This is the doc string that gets showed when you run salt-call sys.doc <module.function>

Then just all the logic.

If you have stuff that is being used a lot in multiple functions. Maybe split it out into another function for everything else to use, and if that function shouldn't be used from the command-line, be sure to prefix it with an underscore.

And that is your basic anatomy of a salt execution module.

State Modules

Now lets move on to writing state modules. State modules are where all the idempotence, configuration, and statefullness comes in. I am going to use the above module in order to make sure that certain keys are present or absent in the redis server.

Here is my simplified salt/states/redismod.py

# -*- coding: utf-8 -*-'''Management of Redis servers============================.. versionadded:: Nitrogen:depends: redis:configuration: see :py:mod:`salt.modules.redis` for setup instructionsExample States.. code-block:: yaml set redis key: redis.present: - name: key - value: value set redis key with host args: redis.absent: - name: key - host: 127.0.0.1 - port: 1234 - database: 3 - password: somepass'''from__future__importabsolute_import__virtualname__='redis'def__virtual__():if'redis.set'in__salt__:return__virtualname__return(False,'The redis execution module failed to load: redis python module is not available')defpresent(name,value,host=None,port=None,database=None,password=None):''' Ensure key and value pair exists name Key to ensure it exists value Value the key should be set to host Host to use for connection port Port to use for connection database Database key should be in password Password to use for connection '''ret={'name':name,'changes':{},'result':False,'comment':'Failed to set key {key} to value {value}'.format(key=name,value=value)}connection={'host':host,'port':port,'database':database,'password':password}current=__salt__['redis.get'](name,**connection)ifcurrent==value:ret['result']=Trueret['comment']='Key {key} is already value correct'.format(key=name)returnretif__opts__['test']isTrue:ret['result']=Noneret['changes']={'old':{name:current},'new':{name:value},}ret['pchanges']=ret['changes']ret['comment']='Key {key} will be updated.'.format(key=name)returnret__salt__['redis.set'](name,value,**connection)current,old=__salt__['redis.get'](name,**connection),currentifcurrent==value:ret['result']=Trueret['comment']='Key {key} was updated.'.format(key=name)ret['changes']={'old':{name:old},'new':{name:current},}returnretreturnretdefabsent(name,host=None,port=None,database=None,password=None):''' Ensure key is not set. name Key to ensure it does not exist host Host to use for connection port Port to use for connection database Database key should be in password Password to use for connection '''ret={'name':name,'changes':{},'result':False,'comment':'Failed to delete key {key}'.format(key=name,value=value)}connection={'host':host,'port':port,'database':database,'password':password}current=__salt__['redis.get'](name,**connection)ifcurrentisNone:ret['result']=Trueret['comment']='Key {key} is already absent'.format(key=name)returnretif__opts__['test']isTrue:ret['result']=Noneret['changes']={'old':{name:current},'new':{name:None},}ret['pchanges']=ret['changes']ret['comment']='Key {key} will be deleted.'.format(key=name)returnret__salt__['redis.delete'](name,value,**connection)current,old=__salt__['redis.get'](name,**connection),currentifcurrentisNone:ret['result']=Trueret['comment']='Key {key} was deleted.'.format(key=name)ret['changes']={'old':{name:old},'new':{name:current},}returnretreturnret

And lets review, this will mostly be the same as the execution module with one major difference that we use the execution module in the state.

Same coding line

Include depends and configuration information. If the configuration is stored with the module, you can link to the module using py mod link like i did above.

Include any complex information about the state in the top doc string. It is important to also include an example state up here. But if you have more complicated states, it would be good to include examples in each function to show how they should be used.

Changes to see if the redis.set_key module is loaded in the __salt__ dunder. If it is not loaded, we know we can't do any work in this state, and we should return False.

pchanges: a dictionary of potential changes that is used if test=True is passed

result: True, False, None

comment: a string describing what happened in the state.

I always start with a default ret variable that describes what happens when the state fails, so I can just return it on failure at the end.

Then the first thing to do is check if the state is already as it should be. In the case of present we check if the key is already set to the desired value. For absent we check if the key is set to None which indicates that it is a null value which is what redis considers deleted. If it is already set, we set results to True, and set the comment to reflect that it is True, and return the dictionary.

There is also a testing run portion of the state we should check if __opts__['test'] is True which would signify that test=True was passed on the command-line. Then we should only set changes to reflect what is going to change, and return with result set to None to signify that it should be a successful change, but changes are required.

Last, we make the change, then check if the change took affect. If it did, result should be True, and we return with the correct stuff in changes and an updated comment.

Otherwise we return with our False dictionary we setup at the beginning.

One other thing to remember about is the mod_init and mod_watch functions. These can be used to change the way the module behaves when initially called. The mod_watch is the part that is actually called when you watch or listen to a state in your requisites.

Running pylint on your changes

We run pylint on every change, so it is a good thing to know about because you can start adjusting yourself to write more inline with what pylint wants. The only big thing that I will say you should know is that our line limit is actually 119 instead of 80.

Now, to run pylint, you are going to need a few things. You should install all the dependencies for salt, and you should install the stuff for dev_python27.txt. But then you also need to update to the newest version of SaltPylint and SaltTesting.

Then they will get compiled, then we have to add the following references to the correct index files and just add redis to doc/ref/modules/all/index.rst and doc/ref/states/all/index.rst so that they will be visible in the index pages for all execution modules and all state modules.

Creating a pull request

We love pull requests. Just look at the github repository, there have been 23,296 pull request, almost all of which I would bet are accepted, and almost none have been closed with saying we can't accept that. There have been 1642 contributors as of this writing!

Here are things to remember when opening a pull request.

If it is a new feature, add it to develop. We include a very easy way to take the changes and import them into a running system, we don't want to break other peoples deploys by adding new features into point releases.

If it is a bug fix, go back to the latest supported release, and add it there. Right now, unless it is a CVE change, the oldest supported release for commits is 2016.3, everything else is in phase 3 or extended life support. (We are working very hard to get Carbon out the door right now.)

Please fill out the form! As much of the form in the pull request that makes sense, provide us with as much information about the change you are making. I am bad about it to, sometimes I just think Mike Place or Nicole Thomas are mind readers and can just get what I mean, but the definitely can't. So let them know in detail what you are actually changing.

I will cover this in a later part, but please! provide unittests if at all possible! (though not required)

End of Part 1

This was a lot longer than I thought it was going to be. I am going to try and continue next week and talk about beacons and engines and some specifics to look for there. Hopefully this will be helpful for some reason. It basically just became a link dump to a lot of useful information in our documentation since it can be sometimes hard to find.

Leave any comments if you have any thing that you would like to be covered

ttf-dejavu 2.37 will change the way fontconfig configuration is installed. In previous versions the configuration was symlinked from post_install/post_upgrade, the new version will place the files inside the package like it is done in fontconfig now.

Modules from Nitrogen

And the thing that makes it all possible the new webhook engine needs to be put in /srv/salt/_engines: Webhook engine

Once these are all in place, run salt-call saltutil.sync_all to make sure they get put in the extmods directory and are usable.

Configuration

My configurations are located here but I will highlight some of the specifics here.

First, we want to make the minion a masterless minion, and to never query the master for anything. So to /etc/salt/minion.d/local.conf add

local:Truefile_client:localmaster_type:disable

Any one of the settings could be used, I like to use all three just to make certain.

Second, we need to setup the ssl keys so that we can have a secure connection. To do this, you can run the following command to create a generic ssl certificate, if you want to have verification in there, you can make a nice one for the domain and everything, but we just want to have the traffice encrypted, so use salt-call --local tls.create_self_signed_cert. Now that we have an ssl certificate pair, we can setup the webhook engine. I put the following in /etc/salt/minion.d/engines.conf.

This will enable the webhook on all ips on port 5000 with the listed ssl certificate. It will also enable the reactor to be able to act upon the one tag in the event stream, which we will get to later.

Now we need to setup the github webhook so we can see the events in the event stream. Go to your blog's github repository, and go to the settings. Then select webhooks, and create a new one.

For the "Payload URL" you are going to set https and then the ip address/domain and port to access, followed by the uri which should match what your are going to trigger on for the reactor. As you can see in the picture above, I have /update/blog/gtmanfred.com as my URI, and this matches what follows the prefix salt/engines/hook in the reactor config above. Be sure to add a secret! And don't forget it! We will be verifying that in a later step. Then customize which events you would like to trigger on and save. I am going to rebuild the blog on each push, so I am only sending push events.

BE SURE TO DISABLE SSL VERIFICATION IF YOU DON'T USE A SIGNED KEY!

Before you forget that secret key, we should save it somewhere. I use sdb in salt so that I can save my states and reactors in public github, but hide the secret key in sdb. Create /etc/salt/minion.d/sdb.conf with the following.

Now run salt-call --local sdb.set sdb://secrets/github_secret <secretkey> to save the key.

Now the last step, creating the reactor file in the salt fileserver. Mine is in /srv/salt/reactor/blog.gtmanfred.com.sls, so I just have to reference it with salt://reactor/blog.gtmanfred.com.sls (and can also use the reactor files from gitfs).

Lets walk through this. We are going to take the data['body'] from the github post, and our secret, and the X-Hub-Signature and run it through the github_signature function to verify if the signature is the result of signing the body with your secret key. If True comes back, we can be sure this came from github, and then our minion runs a highstate on itself. If it is False, nothing is rendered and no event is run.

At work we use Git with https auth, which sadly means I can’t use ssh keys. Since I don’t want to enter my password every time I pull or push changes to the server, I wanted to use my password manager to handle this for me.
Git has pluggable credential helper support for gnome-keyring and netrc, adding pass support turned to be quite easy.
Create a script called “pass-git.sh” and put the following contents in it where ‘gitpassword’ is your password entry.

#!/bin/bash

echo"password="$(pass show gitpassword)

In your git directory which uses https auth execute the following command to setup the script as a credential helper.

I’ve found this cheap 5 euro USB logic analyzer via cnx.com and bought it from aliexpress.
It turned out to be quite easy to get the analyzer working on Arch Linux, the following packages need to be installed:

To be able to use the logic analyzer in pulseview without running it as root, requires an udev rule to be setup. Since libsigrok does not provide this udev rule, which might be considered as a packaging bug of Arch Linux.

To test the logic analyzer I soldered a simple board which ‘taps’ the serial communication from the nanopi neo to my laptop. After connecting the ground and RX, TX from the board to the logic analyzer I started pulseview. Then select the saleae logic (or a different name) as device and press run while making some noise in the program connected to the tty device (for example screen).

In pulseview some peaks should show up, it might help to increase the sample size (for example to 1M).
Configuring pulseview to decode UART data is as easy as selecting the UART option from the GUI.

Select the connected RX/TX corresponding to how you hooked up them up to the logic analyzer, the baud rate, data bits etc. might be different in your setup.

And voila, pulseview decodes the UART data into a readable ascii representation.

Summary

This is the first logic analyzer I’ve ever used and so far it has exceeded my expectations. It was easy to get pulseview working, all the software which is required is fully open source. The analyzer should be able to analyze I2C, SPI and UART, limited up to 24 MHz. So far worth the 5 euro :-)

arch-audit output is very verbose when it’s started without any argument, but two options --quiet (or -q or -qq) and --format (or -f) allows to change the output for your use case.
There’s also a third option --upgradable to display only packages that have already been fixed in the Arch Linux repositories.

In fact, I added a systemd timer that executes arch-audit -uq everyday and saves its output to a temporary file that is configured as banner for SSH.
Then, every time I log into my server, I get notified about packages that have vulnerabilities, but that already have been fixed. Time to do a system update!

The NanoPI NEO is a little 8 dollar ARM device with an interesting form factor and specifications.

512/256 MB ram (single slot)

Cortex-A7 Quad-Core

USB 2.0

100 Mbps ethernet

40 x 40 mm board size

SD card slot

Mainline support

FriendlyARM provides an UbuntuCore image, but of course I want to run Arch Linux ARM on it.

To use Arch Linux ARM on the NanoPI you will have to compile your own kernel and u-boot, I’ve used the armv7 tarball from archlinuxarm.org. The mainline kernel does not support the board yet, so a posted DTS file is required on top of the mainline kernel. This will provide a kernel without ethernet support, ethernet support can be added by compiling this Linux tree which hopefully lands in 4.9 or later. The DTS file has to be edited to add ethernet support, just append the following underneath the usbphy node.

The first beta version of PHP 7.1 has been released and its time to have a look at the next iteration of the PHP 7 series. You will find a set of packages in my repository:

[php]
Server = https://repo.pierre-schmitz.com/$repo/os/$arch

Insert these lines on top of the other repository definitions in your /etc/pacman.conf. A copy of the PKGBUILDs I used to create these packages are available in my git repository.

Packaging

I intend to update these with beta versions and release candidates till the final release of PHP 7.1.0 later this year. Even though I will try to provide a smooth update path, please be prepared to encounter problems.

Despite of a having a new module API, third party modules seem to work fine after a simple rebuild, in contrast to our first contact with PHP 7. All these modules are available in my repository as well.

New features

With the new minor release we will get more improvements of the scalar and return type declarations introduced in PHP 7.0. My favorite new features are:

Nullable Types
Being able to declare a type to be either specific or null was a missing feature in version 7.0 which lead people to not declare any type at all.

Void Return Type
You are now able to declare a function to never return anything; another missing piece of the new return type declarations.

Iterable type
We are finally able to declare a type that matches an array but also classes that implement the Traversable interface. In short it is anything you can use with foreach(). This means we no longer need to put primitive arrays into traversable objects if we like to use type hinting.

Class constant visibility modifiers
Using constants internally is less awkward as we are now able to declare them with private visibility. People no longer need to abuse private properties to document that certain constants should not be used from a foreign context.

A complete list of changes can be found in the PHP 7.1 NEWS file. Also see the continuously updated UPGRADING file.

Testing and benchmarking

While PHP 7.1 is still under development the packages I provide are configured with production settings. Optimizations are turned on, all debugging functions and information are disabled and stripped from the binaries. This means you may use these to test and benchmark your applications and server setups.

Let me know of any issues and share your experiences with the first minor update of PHP 7.

Inspired by discussions on the arch-general mailing list, test-sec-flags was created by pid1 (with help from anthraxx, strcat, sangy, and rgacogne) to test the performance impact of various security-oriented compilation and linking flags. The goal is to determine if these flags can be the new default for all Arch Linux packages. Preliminary results suggest that the performance impact is almost nonexistent compare to the compilation flags we already use, but we would like to collect and compare more results before moving forwards.

Download the source here and see the README for installation and usage instructions. The results
subdirectory contains instructions on how to pull out the relevant
statistics from the result files.

We are collecting results in the test-sec-flags wiki on Github. Please
add your results there. In particular, we would very much like i686
results, as all of the previous contributors have been on x86_64
devices.

The TalkingArch Release

The TalkingArch team is pleased to bring you the newest TalkingArch release. This version includes version 4.5.4 of the Linux kernel, the ndisc6 package with some IPv6 networking utilities and uefi-tools for booting x86_64 systems with UEFI. Find it in the usual spot.

Torrent Changes

Some users who downloaded Talkingarch using Bittorrent may have noticed that the TalkingArch iso is now being downloaded into a directory, along with its signature files. This is being done for convenience, so that downloading the one torrent file gets everything, rather than forcing users who want to verify the integrity of the iso to go back to the website to download the signatures. This change was introduced with the previous release, but that release wasn’t posted here. In any case, enjoy the new format.

TalkingArch and IPv6

The TalkingArch team is very pleased to announce full IPv6 connectivity for all TalkingArch sites and services. Anyone with IPv6 connectivity will immediately notice the new addresses for the connections they make, including connections to the NetwIRC where the #talkingarch IRC channel resides. TalkingArch has now joined the future of the internet,. Thanks to all who answered questions and made the transition run smoothly. Have a virtual beer, on the house.

Thanks to the folks over at Scaleway, TalkingArch, its blog, its torrent tracker and its support e-mail are all now hosted on real ARM hardware in France. This server provides more disk space and processing power than the previous virtualized environment at a lower monthly cost. The transfer speed is also faster, and the connection is unmetered. RAM is the same as on the previous virtual server, although it is dedicated now, so won’t be slowed down by massive Mysql queries that other users on the same host machine may be running. We are looking into the possibility of migrating the git repository to this server as well, as our repository at NotABug has been refusing push requests since February. In spite of the difficulties, a release for the month of April is imminent, so keep checking this space for the April 2016 iso.

The release of pacman-5.0 brought support for transactional hooks. These will allow us to (e.g.) run font cache updates a single time during an update rather than after each font package installation. This will both speed up the update process, but also reduce packaging burden for the Developers and Trusted Users.

In order for the use of hooks to be started, we require all users to have updated to at least pacman-5.0.1 before 2016-04-23. Pacman-5.0.1 was released on 2016-02-23, so this will have given everyone two months to update their system.

This is a crosspost of an article I wrote on the raintank.io blog
For several years I’ve worked with Graphite, Grafana and statsd on a daily basis and have been participating in the community. All three are fantastic tools and solve very real problems. Hence my continued use and recommendation. However, between colleagues, random folks on irc, and personal experience, I’ve seen a plethora of often subtle issues, gotchas and insights, which today I’d like to share.

I hope this will prove useful to users while we, open source monitoring developers, work on ironing out these kinks. At raintank we’re addressing a bunch of these as well but we have a long road ahead of us.
Before we begin, when trying to debug what’s going on with your metrics (whether they’re going in or out of these tools), don’t be afraid to dive into the network or the whisper files. They can often be invaluable in understanding what’s up.

Getting the json output from Graphite (just append &format=json) can be very helpful as well. Many dashboards, including Grafana already do this, so you can use the browser network inspector to analyze requests. For the whisper files, Graphite comes with various useful utilities such as whisper-info, whisper-dump, whisper-fetch, etc.

1) OMG! Did all our traffic just drop?
Probably the most common gotcha, and one i run into periodically. Graphite will return data up until the current time, according to your data schema.

Example: if your data is stored at 10s resolution, then at 10:02:35, it will show
data up until 10:02:30. Once the clock hits 10:02:40, it will also include that point (10:02:40) in its response.

It typically takes some time for your services to send data for that timestamp, and for it to be processed by Graphite, so Graphite will typically return a null here for that timestamp. Depending on how you visualize (see for example the “null as null/zero” option in grafana), or if you use a function such as sumSeries (see below) it may look like a drop in your graph, and cause panic.

You can work around this with “null as null” in Grafana, transformNull() or keepLastValue() in Graphite, or plotting until a few seconds ago instead of now. See below for some other related issues.

2) Null handling in math functions.
Graphite functions don’t exactly follow the rules of logic (by design). When you request something like sumSeries(diskspace.server_*.bytes_free) Graphite returns the sum of all bytes_free’s for all your servers, at each point in time. If all of the servers have a null for a given timestamp, the result will be null as well for that time. However, if only some - but not all - of the terms are null, they are counted as 0.

Example: The sum for a given point in time, 100 + 150 + null + null = 250.

So when some of the series have the
occasional null, it may look like a drop in the summed series. This especially commonly happens for the last point at each point in time, when your servers don’t submit their metrics at the exact same time.

Now let’s say you work for a major video sharing website that runs many upload and transcoding clusters around the world. And let’s also say you build this really cool traffic routing system that sends user video uploads to the most appropriate cluster, by taking into account network speeds, transcoding queue wait times, storage cluster capacity (summed over all servers), infrastructure cost and so forth. Building this on top of Graphite would make a lot of sense, and in all testing everything works fine, but in production the decisions made by the algorithm seem nonsensical, because the storage capacity seems to fluctuate all over the place. Let’s just say that’s how a certain Belgian engineer became intimately familiar with Graphite’s implementation of functions such as sumSeries. Lesson learned. No videos were hurt.

Graphite could be changed to strictly follow the rules of logic, meaning as soon as there’s a null in an equation, emit a null as output. But a low number of nulls in a large sum is usually not a big deal and having a result with the nulls counted as 0 can be more useful than no result at all. Especially when averaging, since those points can typically be safely excluded without impact on the output. Graphite has the xFilesFactor option in storage-aggregation.conf, which lets you configure the minimum fraction of non-null values, which it uses during storage aggregation (rollups) to determine whether the input is sufficiently known for output to be known, otherwise the output value will be null.

It would be nice if these on-demand math requests would take an xFilesFactor argument to do the same thing. But for now, just be aware of it, not using the last point is usually all you need if you’re only looking at the most recent data. Or get the timing of your agents tighter and only request data from Graphite until now=-5s or so.

INTERMEZZO:
At this point we should describe the many forms of consolidation; a good understanding of this is required for the next points.

storage consolidation aka aggregation, aka rollups: data aggregated to optimize the cost of historical archives and make it faster to return large timeframes of data. Driven by storage-aggregation.conf

runtime consolidation: when you want to render a graph but there are more datapoints than pixels for the graph, or more than the maxDataPoints setting. Graphite will reduce the data in the http response. Averages by default, but can be overridden through consolidateBy()

Grafana consolidation: Grafana can visualize stats such as min/max/avg which it computes on the data returned by your timeseries system.

statsd consolidation: statsd is also commonly used to consolidate usually very high rates of irregular incoming data in properly spaced timeframes, before it goes into Graphite.

Unlike all above mechanisms which work per series,
the sumSeries(), saverageSeries(), groupByNode() Graphite functions work
across series. They merge multiple series into a single or fewer series by aggregating
points from different series at each point in time.

And then there's carbon-aggregator and carbon-relay-ng which operate on your metrics stream and can aggregate on per-series level, as well as across series, as well as a combination thereof.

3) Null handling during runtime consolidation
Similar to above, when Graphite performs runtime consolidation it simply ignores null values. Imagine a series that measures throughput with points 10, 12, 11, null, null, 10, 11, 12, 10. Let’s say it needs to aggregate every 3 points with sum. this would return 33, 10, 33. This will visually look like a drop in throughput, even though there probably was none.

For some functions like avg, a missing value amongst several valid values is usually not a big deal, but the likeliness of it becoming a big deal increases with the amount of nulls, especially with sums. For runtime consolidation, Graphite needs something similar to the xFilesFactor setting for rollups.

Never send multiple values for the same interval. However, carbon-aggregator or carbon-relay-ng may be of help (see above)

5) Limited storage aggregation options.
In storage-aggregation.conf you configure which function to use for historical rollups. It can be limiting that you can only choose one. Often you may want to retain both the max values as well as the averages for example.(very useful for throughput). This feature exists in RRDtool.

Another issue with this is that often these functions will be misconfigured. Make sure to think about all the possible metrics you’ll be sending (e.g. min and max values through statsd) and set up the right configuration for them.

If Graphite performs any runtime consolidation it will always use average unless told otherwise through consolidateBy. This means it’s easy to run into cases where data is rolled up (in whisper) using, say, max or count, but then accidentally with average while creating your visualization, resulting in incorrect information and nonsensical charts. Beware!

It would be nice if the configured roll-up function would also apply here (and also the xFilesFactor, as above),
For now, just be careful :)

8) Aggregating percentiles.
The pet peeve of many, it has been written about a lot: If you have percentiles, such as those collected by statsd, (e.g. 95th percentile response time for each server) there is in theory no proper way to aggregate those numbers.

This issue appears when rolling up the data in the storage layer, as well when doing runtime consolidation, or when trying to combine the data for all servers (multiple series) together. There is real math behind this, and I’m not going into it because in practice it actually doesn’t seem to matter much. If you’re trying to spot issues or alert on outliers you’ll get quite far with averaging the data together or taking the max value seen. This is especially true when the amount of requests, represented by each latency measurement, is in the same order of magnitude. E.g. You’re averaging per-server latency percentiles and your servers get a similar load or you’re averaging multiple points in time for one server, but there was a similar load at each point. You can always set up a separate alerting rule for unbalanced servers or drops in throughput. As we’ll see in some of the other points, taking sensible shortcuts instead of obsessing over math is often the better way to accomplish your job of operating your infrastructure or software.

Graphite’s derivative is not a derivative. A derivative divides difference in value by difference in time. Graphite’s derivative just returns the value deltas. Similar for nonNegativeDerivative(). If you want an actual derivative, use the somewhat awkwardly named perSecond() function.

Maybe for a future stable Graphite release this can be reworked. for now, just something to be aware of.

10) Graphite quantization.
We already saw in the consolidation paragraph that for multiple points per interval, last write wins. But you should also know that any data point submitted gets the timestamp rounded down.

Example: You record points every 10 seconds but submit a point with timestamp at 10:02:59, in Graphite this will be stored at 10:02:50. To be more precise if you submit points at 10:02:52, 10:02:55 and 10:02:59, and have 10s resolution, Graphite will pick the point from 10:02:59 but store it at 10:02:50. So it’s important that you make sure to submit points at consistent intervals, aligned to your Graphite retention intervals (e.g. if every 10s, submit on timestamps divisible by 10)

Example: If you start statsd at 10:02:00 with a flushInterval of 10, then it
will emit values with timestamps at 10:02:10, 10:02:20, etc (this is what you want), but if you happened to start it at 10:02:09 then it will submit values with timestamps 10:02:19, 10:02:29, etc. So typically, the values sent by statsd are subject to Graphite’s quantization and are moved into the past. In the latter example, by 9 seconds. This can make troubleshooting harder, especially when comparing statsd metrics to metrics from a different service.

Note: vimeo/statsdaemon (and possibly other servers as well) will always submit values at quantized intervals so that it’s guaranteed to map exactly to Graphite’s timestamps as long as the interval is correct.

12) Improperly time-attributed metrics.
You don’t tell statsd the timestamps of when things happened. Statsd applies its own timestamp when it flushes the data. So this is prone to various (mostly network) delays. This could result in a metric being generated in a certain interval only arriving in statsd after the next interval has started. But it can get worse. Let’s say you measure how long it takes to execute a query on a database server. Typically it takes 100ms but let’s say now many queries are taking 60s, and you have a flushInterval of 10s. Note that the metric is only sent after the full query has completed. So during the full minute where queries were slow, there are no metrics (or only some metrics that look good, they came through cause they were part of a group of queries that managed to execute timely), and only after a minute do you get the stats that reflect queries spawned a minute ago. The higher a timing value, the higher the attribution error, the more into the past the values it represents and the longer the issue will go undetected or invisible.

Keep in mind that other things, such as garbage collection cycles or paused goroutines under cpu saturation may also delay your metrics reporting. Also watch out for the queries aborting all together, causing the metrics never to be sent and these faults to be invisble! Make sure you properly monitor throughput and the functioning (timeouts, errors, etc) of the service from the client perspective, to get a more accurate picture.

Example: With points every minute, and a spike at 10:17:00, does it mean the spike happened in the timeframe between 10:16 and 10:17, or between 10:17 and 10:18?

I don’t know if a terminology for this exists already, but I’m going to call the former premarking (data described by a timestamp that precedes the observation period) and the latter postmarking (data described by a timestamp at the end). As we already saw statsd postmarks, and many tools seem to do this, but some, including Graphite, premark.

We saw above that any data received by Graphite for a point in between an interval is adjusted to get the timestamp at the beginning of the interval. Furthermore, during aggregation (say, aggregating sets of 10 minutely points into 10min points), each 10 minutes taken together get assigned the timestamp that precedes those 10 intervals. (i.e. the timestamp of the first point)

So essentially, Graphite likes to show metric values before they actually happened, especially after aggregation, whereas other tools rather use a timestamp in the future of the event than in the past. As a monitoring community, we should probably standardize on an approach. I personally favor postmarking because measurements being late is fairly intuitive, predicting the future not so much.

14) The statsd timing type is not only for timings.
The naming is a bit confusing, but anything you want to compute summary statistics (min, max, mean, percentiles, etc) for (for example message or packet sizes) can be submitted as a timing metric. You’ll get your summary stats just fine. The type is just named “timing” because that was the original (and still most common) use case. Note that if you want to time an operation that happens at consistent intervals, you may just as well simply use a statsd gauge for it. (or write directly to Graphite)

15) The choice of metric keys depends on how you deployed statsd.
It’s common to deploy a statsd server per machine or per cluster. This is a convenient way to assure you have a lot of processing capacity. However, if multiple statsd servers receive the same metric, they will do their own independent computations and emit the same output metric, overwriting each other. So if you run a statsd server per host, you should include the host in the metrics you’re sending into statsd, or into the prefixStats (or similar option) for your statsd server, so that statsd itself differentiates the metrics.

Note: there’s some other statsd options to diversify metrics but the global
prefix is the most simple and common one used.

“[UDP is] fast — you don’t want to slow your application down in order to track its performance — but also sending a UDP packet is fire-and-forget.”

UDP sends do have an overhead that can slow your application down. I think a more accurate description is that the overhead is insignificant either if you’re lucky, or if you’ve spent a lot of attention to the details. In specific, it depends on the following factors:

Your network stack. UDP sends are not asynchronous, which is a common assumption. (i.e. they are not “non-blocking”) Udp write calls from userspace will block until the kernel has moved the data through the networking stack, firewall rules, through the network driver, onto the network.
I found this out the hard way once when an application I was testing on my laptop ran much slower than expected due to my consumer grade WiFi adapter, which was holding back my program by hanging on statsd calls. Using this simple go program I found that I could do 115 k/s UDP sends to localhost (where the kernel can bypass some of the network stack), but only 2k/s to a remote host (where it needs to go through all layers, including your NIC).

other code running, and its performance. If you have a PHP code base with some statsd calls amongst slow / data-intensive code to generate complex pages, there won’t be much difference. This is a common, and the original setting, for statsd instrumentation calls. But I have conducted cpu profiling of high-performance Golang
applications, both at Vimeo and raintank, and when statsd was used, the apps were often spending most of their time in UDP writes caused by statsd invocations. Typically they were using a simple statsd client, where each statsd call corresponds to a UDP write. There are some statsd clients that buffer messages and send batches of data to statsd, reducing the UDP writes, which is a lot more efficient. (see below). You can also use sampling to lower the volume. This does come with some problems however, see the gotcha below.

The performance of client code itself can wildly vary as well. Besides how it maps metric updates to UDP writes (see above) the invocations themselves can carry an overhead, which as you can see through these benchmarks of all Go statsd clients I’m aware of, vary anywhere between 15 microseconds to 0.4 microseconds.

When I saw this, Alex Cesaro’s statsd client promptly became my favorite statsd library for Go. It has zero-allocation logic and various performance tweaks, such as the much needed client side buffering. Another interesting one is quipo’s, which offloads some of the computations into the client to reduce traffic and workload for the server.

Finally, it should be noted that sending to non-listening destinations may
make your application slower.
This is because destination hosts receiving UDP traffic for a closed socket, will return ICMP rejects to the sender, which the sender then has to process. I’ve not seen this with my own eyes, but the people on the #go-nuts channel who were helping me out definitely seemed to know what they’re talking about. Sometimes “disabling” statsd sends is implemented by pointing to a random host or port that’s not listening (been there done that) and is hence not the best idea!

Takeaway: If you use statsd in high-performance situations, use optimized client libraries . Or use client libraries such as go-metrics, but be prepared to pay a processing tax on all of your servers. Be aware that slow NICs and sending to non-listening destinations will slow your app down.

17) statsd sampling.
The sampling feature was part of the original statsd, so it can be seen in pretty much every client and server alternative. The idea sounds simple enough: If a certain statsd invocation causes too much statsd network traffic or overloads the server, sample down the messages. However, there’s several gotchas with this. A common invocation with a typical statsd client looks something like:

statsd.Increment("requests.$backend.$http_response")

In this code, we have 1 statement, and it increments a counter to track http response codes for different backends. It works for different values of backend and http_response, tracking the counts for each. However, let’s say the backend “default” and http_response 200 are causing too much statsd traffic. It’s easy to just sample down and change the line to:

statsd.Increment("requests.$backend.$http_response", 0.1)

There’s 3 problems here though:

How do you select a good sampling ratio? It’s usually based on a gut feeling or some very quick math at best to come up with a reasonable number, without any attention to the impact on statistical significance, so you may be hurting the quality of your data.

While you mainly wanted to target the highest volume metric (something like “requests.default.200”), your sampling setting affects the metrics for all values of backend and http_response. Those variable combinations that are less frequent (for a backend that’s rarely used or a response code not commonly seen) will be too drastically sampled down, resulting in an inaccurate view of those cases. Worst case, you won’t see any data over the course over a longer time period, even though there were points at every interval. You can add some code to use different invocations with different sample rates based on the values of the variables, but that can get kludgy.

Volumes of metric calls tend to constantly evolve over time. It’s simply not feasible to be constantly manually adjusting sample rates. And given the above, it’s not a good solution anyway.

One of the ideas I wanted to implement in https://github.com/vimeo/statsdaemon was a feedback mechanism where the server would maintain counts of received messages for each key. Based on performance criteria and standard deviation measurements, it would periodically update “recommended sampling rates” for every single key, which then could be fed back to the clients (by compiling into a PHP config file for example, or over a TCP connection). Ultimately I decided it was just too complicated and deployed a statsdaemon server on every single machine so I could sample as gently as possible.

Takeaway: Sampling can be used to lower load, but be aware of the trade offs.

18) As long as my network doesn’t saturate, statsd graphs should be accurate.
We all know that graphs from statsd are not guaranteed to be accurate, because it uses UDP, so is prone to data loss and inaccurate graphs. And we’re OK with that trade off, because the convenience is worth the loss in accuracy. However, it’s commonly believed that the only, or main reason of inaccurate graphs is UDP data loss on the network. In my experience, I’ve seen a lot more message loss due to the UDP buffer overflowing on the statsd server.

A typical statsd server is constantly receiving a flood of metrics. For a network socket, the Linux kernel maintains a buffer in which it stores newly received network packets, while the listening application (statsd) reads them out of the buffer. But if for any reason the application can’t read from the buffer fast enough and it fills up, the kernel has to drop incoming traffic. The most common case are the statsd flushes, which compute all summary statistics and submit them each flushInterval. Typically this operation takes between 50ms and a few seconds, depending mostly on how many timing metrics were received, because those are the most computationally expensive. Usually statsd’s cpu is maxed out during this operation, but you won’t see it on your cpu graphs because your cpu measurements are averages taken at intervals that typically don’t exactly coincide with the statsd flush-induced spikes. During this time, it temporarily cannot read from the UDP buffer, and it is very common for the UDP buffer to rapidly fill up with new packets, at which point the Linux kernel is forced to drop incoming network packets for the UDP socket. In my experience this is very common and often even goes undetected, because only a portion of the metrics are lost, and the graphs are not obviously wrong. Needless to say, there’s also plenty of other scenarios for the buffer to fill up and cause traffic drops other than statsd flushing.

There’s two things to be done here:

Luckily we can tune the size of the incoming UDP buffer with the net.core.rmem_max and net.core.rmem_default sysctl’s. I recommend setting it to the max values

The kernel increments a counter every time it drops a UDP packet. You can see UDP packet drops with netstat -s –udp on the command line. I highly encourage you to monitor this with something like collectd, visualize it over time, and make sure to set an alert on it.

Note also that the larger of a buffer you need, the more delay between receiving a metric and including it in the output data, possibly skewing data into the future.

INTERMEZZO:
There's a lot of good statsd servers out there with various feature sets. Most of them provide improved performance (upstream statsd is single threaded JavaScript), but some come with additional interesting features. My favorites include statsite, brubeck and of course vimeo's statsdaemon version

19) Incrementing/decrementing gauges.
Statsd supports a syntax to increment and decrement gauges. However, as we’ve seen, any message can be lost. If you do this and a message gets dropped, your gauge value will be incorrect forever (or until you set it explicitly again). Also by sending increment/decrement values you can confuse a statsd server if it was just started. For this reason, I highly recommend not using this particular feature, and always setting gauge values explicitly. In fact, some statsd servers don’t support this syntax for this reason.

The metrics will only be created after the values have been sent. Sounds obvious, but this can get surprisingly annoying. Let’s say you want to make a graph of http 500’s, but they haven’t happened yet. The metrics won’t exist and with many dashboarding tools, you won’t be able to create the graph yet, or at least need extra work.

I see this very commonly with all kinds of error metrics. I think with Grafana it will become easier to deal with this over time, but for now I write code to just send each metric once at startup. For counts you can just send a 0, for gauges and timers it’s of course harder/weirder to come up with a fake value but it’s still a reasonable approach.

21) Don’t let the data fool you: nulls and deleteIdleStats options.
statsd has the deleteIdleStats, deleteGauges, and similar options. By default they are disabled, meaning statsd will keep sending data for metrics it’s no longer seeing. So for example, it’ll keep sending the last gauge value even if it didn’t get an update. I have three issues with this:

Your graphs are misleading. Your service may be dead but you can’t tell from
the gauge graphs cause they will still “look good”. If you have something like counters for requests handled, you can’t tell the difference between your service being dead or it being up but not getting incoming requests. I rather have no data show up if there is none, combined with something else (alerting, different graph) to deduce whether the service is up or not

Any alerting rules defined on the metrics have the same fate as humans and
can’t reliably do their work

Tougher to phase out metrics. If you decommission servers or services, it’s nice that the metrics stop updating so you can clean them up later

My recommendation is, enable deleteIdleStats. I went as far as hard coding that behavior into vimeo/statsdaemon and not making it configurable.

To visualize it, make sure in Grafana to set the “null as null” option, though “null as zero” can make sense for counters. You can also use the transformNull() or drawNullAsZero options in Graphite for those cases where you want to explicitly get 0’s instead of nulls, which I typically do in some of my bosun alerting rules, so I can treat null counts as no traffic received, while having a separate alerting rule to make sure my service is up and running, based on a different metric, while using null as null for visualization.

22) keepLastValue works… almost always.
Another Graphite function to change the semantics of nulls is keepLastValue, which causes null values to be represented by the last known value that precedes them. However, that known value must be included in the requested time range. This is probably a rare case that you may never encounter, but if you have scripts that infrequently update a metric and you use this function, it may result in a graph sometimes showing no data at all if the last known value becomes too old. Especially confusing to newcomers, if the graph does “work” at other times.

23) statsd counters are not counters in the traditional sense.
You may know counters such as switch traffic/packet counters, that just keep increasing over time. You can see their rates per second by deriving the data. Statsd counters however, are typically either stored as the number of hits per flushInterval, or per second, or both (perhaps because of Graphite’s derivative issue?). This in itself is not a huge problem, however this is more vulnerable to loss of data. If your statsd server has trouble flushing some data to Graphite and some data gets lost. If it were using a traditional counter, you could still derive the data and average out the gap across the nulls. In this case however this is not possible and you have no idea what the numbers were. In practice this doesn’t happen often though. Many statsd implementations can buffer a writequeue or use something like carbon-relay-ng as a write queue.
Other disadvantages of this approach is that you need to know the flushInterval
value to make sense of the count values, and the rounding that happens when
computing the rates per second, which is also slightly lossy.

24) What can I send as input?
Neither Graphite, nor statsd does a great job specifying exactly what they allow as input, which can be frustrating. Graphite timestamps are 32bit unix timestamp integers, while values for both Graphite and statsd can be integers or floats, up to float 64bit precision. For statsd, see the statsd_spec project for more details.

As for what characters can be included in the metric keys. Generally, Graphite is somewhat forgiving and may alter your metric keys: it converts slashes to dots (which can be confusing), subsequent dots become single dots, prefix dots get removed, postfix dots will get it confused a bit though and create an extra hierarchy with an empty node at the end if you’re using whisper. Non-alphanumeric characters such as parenthesis, colons, or equals signs often don’t work well or at all.

Graphite as a policy does not go far in validating incoming data, citing performance in large-throughput systems as the major reason. It’s up to the senders to send data in proper form, with non-empty nodes (words between dots) separated by single dots. You can use alphanumeric characters, hyphens and underscores, but straying from that will probably not work well. You may want to use carbon-relay-ng which provides metric validation. See also this issue which aims to formalize what is, and isn’t allowed.

25) Hostnames and ip addresses in metric keys.
The last gotcha relates to the previous one but is so common it deserves its own spot. Hostnames, especially FQDN’s, and IP addresses (due to their dots), when included in metric keys, will be interpreted by Graphite as separate pieces. Typically, people replace all dots in hostnames and IP addresses with hyphens or underscores to combat this.

Closing thoughts:

Graphite, Grafana and statsd are great tools, but there’s some things to watch out for. When setting them up, make sure you configure Graphite and statsd to play well together. Make sure to set your roll-up functions and data retentions properly, and whichever statsd version you decide to use, make sure it flushes at the same interval as your Graphite schema configuration. More tips if you use the nodejs version. Furthermore, I hope this is an extensive list of gotchas and will serve to improve our practices at first, and our tools themselves, down to road. We are definitely working on making things better and have some announcements coming up…

After many broken and failed attempts, the TalkingArch team is please to bring you the latest TalkingArch, which is the first release of 2016. This version includes linux kernel 4.4.1 and all the latest base packages to give you a smooth experience when running and installing TalkingArch on your system. The new release can be found at the usual location.

Just a heads-up, gpg has changed the verification process for signatures. From now on, you should download both linked signature files and verify them individually with a command like
gpg --verify <sig-file> <iso-file>
If one signature is good, both should be, but it’s always good to double check. The download page has been changed to reflect the new verification procedure. As always, have fun with the new release, and do report any problems you find.

As is becoming tradition, I need to make a blog post to accompany a pacman release! This is a big release with a long awaited feature so it needed a major version bump (and, most importantly, we now are back ahead of the Linux kernel in version numbers). I have reclaimed the title as most prolific committer, but that just means Andrew had more patches to point out mistakes in… Here are the top 10 committers:

As always, more regular contributors would be helpful. Just have a talk to us first before running ahead implementing a new feature (it is not nice to have to reject a patchset that obviously took a lot of work because it is already being handled in a different way…).

On to the more important stuff… What is new this release?

Hooks: This has been one of the most requested features for a long time, and Andrew gets all the credit for the implementation. So, what exactly are hooks? Hooks are scripts that are run at the beginning and end of a transaction. They can be triggered by either a file or a package name. This will allow us to (e.g.) update the desktop MIME type cache at the end of a transaction (if needed) and only do it one rather than after every package. Andrew has a git repo with some examples. Lets look at the desktop MIME cache one:

It should be fairly obvious what that does… See the alpm-hooks(5) man page for more information on the hook format.

Files database operations: pacman can now search repository file lists like pkgfile, but slower and probably less flexible. Sort of like Falcon to Captain America… still a super-hero! File bug reports for improvement requests. A separate files database is used so everyday package operations do not require downloading a much larger database. After updating the database (with “pacman -Fy“, you can do things like:

libmakepkg:makepkg in the pacman-4.2 release was a 3838 line shell script. A bit daunting, hard to test and not reusable… I have started the process of splitting this into a library containing scripts that are a more reasonable size. makepkg is still 2395 lines, so a lot of work remains (help!). One outcome of this splitting is we can drop in extra checks into the PKGBUILD and package checking steps, and even extra passes to (e.g.) optimize svg files. I have started rewriting namcap using this feature (see my github repo). This also provides tools for extracting variables from PKGBUILDs without sourcing the PKGBUILD itself, which does require ensuring that variables that should be arrays are actually arrays and those that are not are not.

There were a bunch of other small changes throughout the code base. Check out the NEWS file for more details.