That was great, but the original ASAP code uses the "numpy" library. Numpy is a great thing, but sometimes it is too heavy to include in your product, and the original Graphite had no support for custom functions (unlike graphite-api).
Nowadays it does, so I adapted Bo's code a bit. You can check it out here: https://github.com/deniszh/graphite_asap

This requires Graphite-web version 1.1.1 or newer and an installed "numpy". To install, just copy the asap.py and functions.py files to the /opt/graphite/webapp/graphite/functions/custom directory and restart Graphite-web. Then check the output of "http:///functions?pretty=1": the function "asap()" should be present in it.
Check Graphite's Function Plugin documentation for details.
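The check above can also be scripted. Below is a minimal sketch, assuming that the /functions endpoint returns a JSON object keyed by function name; the base URL passed to fetch_functions() is a placeholder for your own Graphite-web address, and both helper names are made up for this example.

```python
import json
from urllib.request import urlopen


def fetch_functions(base_url):
    """Download the raw JSON text from <base_url>/functions.

    base_url is a placeholder for your own Graphite-web address.
    """
    with urlopen(base_url.rstrip("/") + "/functions") as response:
        return response.read().decode("utf-8")


def function_is_registered(functions_json, name="asap"):
    """Return True if `name` is a top-level key of the /functions JSON.

    Assumes the endpoint returns an object keyed by function name.
    """
    return name in json.loads(functions_json)


# Usage (needs a reachable Graphite-web instance):
#   function_is_registered(fetch_functions("http://graphite.example.com"))
```

If this returns False after a restart, the files are probably not in the custom functions directory that Graphite-web actually scans.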

I'm totally fine with people who want to develop something, and I'm not going to say that Grisha does not understand what he's doing: he's an experienced developer and Tgres looks very impressive. But to be honest, the whole rationale behind Tgres really puzzles me. You can read it at the link above (you can go straight to the part named "Avoid Solving the Storage Problem", but the whole article is worth reading).
Grisha says: "Someone once said that “anything is possible when you don’t know what you’re talking about”, and nowhere is it more evident than in data storage. File systems and relational databases trace their origin back to the late 1960s and over half a century later I doubt that any field experts would say “the storage problem is solved”. And so it seems almost foolish to suppose that by throwing together a key-value store and a consensus algorithm or some such it is possible to come up with something better? Instead of re-inventing storage, why not focus on how to structure the data in a way that is compatible with a storage implementation that we know works and scales reliably?"
With all due respect, I think that's the wrong direction. Yes, filesystems and databases have been in development since the 1960s, and what result do we have? The storage problem is not solved, indeed, but saying "OK, screw it, let's create something on top of a weak foundation and hope that it'll be fine" is also wrong.
I think that the storage engine is the most important part of any database: it both enables and limits any DB, whether relational, columnar or time-series. Whisper is a good example. It has its weak points (e.g. no subsecond resolution, IO intensity, 12 bytes per point, only local storage) and its good points (quite good speed, built-in rollups). Most Graphite users know its limitations very well. On the one hand, these limitations restrict how Graphite can be used; on the other hand, they gave rise to the whole new generation of TSDBs and monitoring solutions that have been flourishing lately.
In the same way, Tgres inherits all the scalability flaws that PostgreSQL (like any relational database) has, e.g. good vertical scalability but quite weak horizontal scalability. Yes, the author mentions clustering for Tgres, but it's the same approach we already saw in Whisper: the clustering is external, not built into the storage.

Another PostgreSQL-based database, named TimescaleDB, looks a bit better: it is still based on Postgres, although it uses its own storage engine with built-in clustering and sharding. You can check their paper, it's quite interesting. Right now it looks like early InfluxDB, but the authors say that their approach is better because you can use the full power of real SQL across all your time series.
Let's see. TimescaleDB is quite young, less than 6 months in development, so maybe we'll get something useful out of it. They have a good and stable foundation; let's see how it fits into the TSDB world.

I still have a strong opinion that in the database world the storage engine is king, and horizontal scalability is a must for any modern data software.

Monday, September 19, 2016

Hello, fellow readers!
Issue 21 of "Semi-irregular Sysadmin Ninja's Github Digest" is here. The last issue was very dry, so this time I will add more of my thoughts and funny pictures. :)
Let's go!

teeproxy
"A reverse HTTP proxy that duplicates requests."
"You may have production servers running, but you need to upgrade to a new system. You want to run A/B test on both old and new systems to confirm the new system can handle the production load, and want to see whether the new system can run in shadow mode continuously without any issue."
https://github.com/chrislusf/teeproxy

WOW. Just W-O-W. Your eyes are not lying, it's an open-source spacecraft. "Our goal is to dramatically lower the cost of spaceflight, making it easy enough and affordable enough for anyone to explore space. We can do this by shrinking the size and mass of the spacecraft, allowing many to be launched together."

web2web
"P2P web powered by torrents and blockchain."
Rejoice, my paranoid brothers and sisters! The new Internet is here! Put your foil hats on!
It's a combination of webtorrent and blockchain to make an unseizable internet!
"When you open index.html in the browser (live demo), here's what happens:
Bitcoin address 1DhDyqB4xgDWjZzfbYGeutqdqBhSF7tGt4 is searched for the latest outgoing transaction containing OP_RETURN script. Inside the script there is a torrent infohash of webpage.html. webpage.html is downloaded from torrent via webtorrent and displayed."
https://github.com/elendirx/web2web

ironssh
"IronSSH - End-to-end secure file transfer"
"While sftp and scp use ssh to keep files secure while they are being transferred over the network, once those files hit the remote server, they are no longer protected. The ironsftp executable provides additional security. When you put a file on the server using ironsftp, the file is encrypted before it is uploaded, and it stays that way on the server. When you get a file from the server, it is downloaded then decrypted. So the file remains secure until it is at the place you want to use it - on your local machine."
https://github.com/ironcorelabs/ironssh

quinedb
"QuineDB is a quine that is also a key/value store. If your database can't print its own source code, can you really trust it?"
A very interesting and funny project! It's a simple K/V store written in bash 4, but it's also a quine!
"When you run it, the (possibly modified) source code of quinedb is printed to STDOUT, and the results of the specific command run are printed to STDERR."
https://github.com/gfredericks/quinedb

logtrail
"LogTrail is a plugin for Kibana to view, analyze, search and tail log events from multiple hosts in realtime with devops friendly interface inspired by Papertrail."
Like "tail -f", but for ELK!
https://github.com/sivasamyk/logtrail

cog
"Bringing the power of the command line to chat http://operable.io"
"Cog is an open chatops platform that gives you a secure, collaborative command line right in your chat window. It is designed to be secure, highly available, chat provider agnostic, and to be extensible using your favorite programming language."
https://github.com/operable/cog

pyinfra
"⚡ Deploy stuff by diff-ing the state you want against the remote server"
An interesting deploy tool. It looks nice, but IMO it's better to use a real configuration management tool in this case, e.g. Salt or Ansible.
https://github.com/Fizzadar/pyinfra

srez
Image super-resolution through deep learning
https://github.com/david-gpu/srez
"From left to right, the first column is the 16x16 input image, the second one is what you would get from a standard bicubic interpolation, the third is the output generated by the neural net, and on the right is the ground truth."
Looks like magic.

PADDLE
PArallel Distributed Deep LEarning
http://www.paddlepaddle.org/
https://github.com/baidu/Paddle
"PaddlePaddle (PArallel Distributed Deep LEarning) is an easy-to-use, efficient, flexible and scalable deep learning platform, which is originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu."

I was slacking for a long time, I know. Sorry for that. I'll push two issues in a row now; this is the first one. I will try to make the digest more regular and will include other sources too.

1. The Crystal Programming Language
http://crystal-lang.org
https://github.com/manastech/crystal
A new programming language named Crystal. "We love Ruby’s efficiency for writing code. We love C’s efficiency for running code. We want the best of both worlds." Programs look like Ruby but compile to efficient native code, and errors are caught at compile time, as in Rust. Worth checking out if you're a PL freak like me. :)

2. chef-koans
An experimental, test-driven way to learn about Chef.
https://github.com/leftathome/chef-koans
"An experimental, test-driven way to learn about Chef. Takes some inspiration from Ruby Koans and from other things that are awesome and simple." Unfortunately, only lesson number 0 is ready now, but you're welcome to contribute, of course!
Also, if you haven't read Vim koans or Git koans, please give them a try, they're quite fun.

4. streem
Prototype of a stream-based programming language
https://github.com/matz/streem
A prototype of a new PL from the author of Ruby, Yukihiro "matz" Matsumoto. It's at a very early stage of development.

Why?

Disk performance is crucial for most modern server applications, especially databases such as MySQL; check out these slides from the Percona Live conference.
Although collectd provides disk statistics out of the box, graphing the metrics as shown by iostat was found to be more useful and descriptive, because iostat reports usage of block devices, partitions, multipath devices and LVM volumes.
Also, this plugin was rewritten in Python because it's the preferred language for siteops tools at my current job, and collectd-python was chosen over collectd-exec for performance and stability reasons.

How?

Collectd-iostat-python works by calling iostat at predefined intervals and pushing that data to collectd using the collectd-python plugin.
Collectd can then be configured to write the collected data in any of the output formats supported by its write plugins, such as Graphite, which was the primary use case for this plugin.
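The "call iostat and keep the latest report" step can be sketched roughly as follows. This is not the plugin's actual code: the function names are made up for illustration, the iostat path is an assumed default, and the parser assumes numbers with a dot as the decimal separator.

```python
import subprocess


def build_iostat_command(path="/usr/bin/iostat", interval=2, count=2):
    """Build an iostat call: -N resolves LVM names, -x extended stats, -k kilobytes."""
    return [path, "-N", "-x", "-k", str(interval), str(count)]


def run_iostat(command):
    """Run iostat and return its text output (requires sysstat to be installed)."""
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


def parse_last_report(output):
    """Return {device: {column: value}} for the last device table in the output."""
    reports = []
    current = None
    columns = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            current = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            columns = fields[1:]  # header row starts a new device table
            current = {}
            reports.append(current)
        elif current is not None:
            current[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return reports[-1] if reports else {}


# Abbreviated, made-up sample; real iostat output has more columns.
SAMPLE = """\
Device:  r/s  w/s  rkB/s  wkB/s  %util
sda  1.00  2.00  10.00  20.00  0.50

Device:  r/s  w/s  rkB/s  wkB/s  %util
sda  3.00  4.00  30.00  40.00  1.50
"""
```

Inside collectd, the resulting values would then be dispatched through the collectd-python API; that part is left out here because the collectd module only exists inside a running collectd process.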

Setup

Deploy the collectd python plugin into a suitable plugin directory for your collectd instance.
Configure collectd's python plugin to execute the iostat plugin using a stanza similar to the following:
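A stanza along these lines should work. It is only a sketch: the ModulePath is an assumed plugin directory, the module name is taken from the collectd_iostat_python namespace mentioned in this post, and the Path option for the iostat binary is an assumption, so check the plugin's README for the exact option names.

```
<LoadPlugin python>
    Globals true
</LoadPlugin>

<Plugin python>
    # Assumed location of collectd_iostat_python.py; adjust to your plugin directory.
    ModulePath "/usr/lib/collectd/plugins/python"
    Import "collectd_iostat_python"

    <Module collectd_iostat_python>
        # Assumed option name for the iostat binary location.
        Path "/usr/bin/iostat"
        Interval 2
        Count 2
    </Module>
</Plugin>
```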

Once functioning, the iostat data should then be visible via your various output plugins.

In the case of Graphite, collectd should be writing data to Graphite in the hostname_domain_tld.collectd_iostat_python.DEVICE.column-name style namespace. Symbols like '/', '-' and '%' in metric names (but not in device names) are automatically replaced by underscores (i.e. '_').
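The renaming rule for metric names is easy to reproduce. Here is a minimal sketch of it; the function name is made up for this example and the plugin's own implementation may differ.

```python
import re

# The three symbols named above; the hyphen is last so it is taken literally.
_UNSAFE_SYMBOLS = re.compile(r"[/%-]")


def sanitize_metric_name(name):
    """Replace '/', '-' and '%' in an iostat column name with underscores."""
    return _UNSAFE_SYMBOLS.sub("_", name)
```

So a column such as rkB/s ends up as rkB_s, and %util as _util.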
Please note that the plugin takes only the last line of iostat output, so large Count values make no sense, but Count needs to be greater than 1 to get current rather than historical data. Also, please make sure that Interval * Count << Collectd.INTERVAL (20 seconds by default). I found that e.g. Count=2 and Interval=2 works quite well for me.
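The timing rule is simple arithmetic, and it can be checked before deploying a config. In the sketch below, the function name and the max_fraction threshold are my own assumptions: "<<" has no exact definition, so "at most half of the collectd interval" is just one way to read it.

```python
def sampling_fits(interval, count, collectd_interval, max_fraction=0.5):
    """Check that one iostat run (interval * count seconds) is much shorter
    than the collectd interval.

    max_fraction is an assumed reading of "<<": the run may use at most
    this share of the collectd interval.
    """
    return interval * count <= collectd_interval * max_fraction
```

With Interval=2, Count=2 and a 20-second collectd interval, iostat runs for about 4 seconds, well inside the limit; Interval=5 with Count=3 would take 15 seconds and fail the check.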

Technical notes

For parsing iostat output I'm using jakamkon's python-iostat Python module, but as an internal part of the script instead of a separate module, because of a couple of fixes: using kilobytes instead of blocks, adding -N to iostat for LVM endpoint resolution, migrating to the subprocess module as a replacement for the deprecated popen3, objectification, etc.

TODO

Maybe some data aggregation is needed, e.g. we could use max / avg aggregation of data across intervals instead of picking the last line of iostat output.
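One possible shape for such aggregation is sketched below. Nothing like this exists in the plugin yet; the function name and the data layout (a list of per-report dicts, oldest first) are assumptions made for the example. The first report is skipped because, as noted above, it holds historical rather than current data.

```python
from statistics import fmean


def aggregate_reports(reports, how="avg", skip_first=True):
    """Aggregate iostat reports shaped like [{device: {column: value}}, ...].

    how is "avg" or "max"; skip_first drops the first (historical) report.
    """
    reducer = {"avg": fmean, "max": max}[how]
    samples = reports[1:] if skip_first else reports
    collected = {}
    for report in samples:
        for device, columns in report.items():
            for column, value in columns.items():
                collected.setdefault(device, {}).setdefault(column, []).append(value)
    return {
        device: {column: reducer(values) for column, values in columns.items()}
        for device, columns in collected.items()
    }
```

With aggregation in place a larger Count would start to make sense, as long as Interval * Count still stays well below the collectd interval.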