Making Life Easier: One Byte at a time
http://rlc.vlinder.ca

Git demystification
Mon, 20 Jul 2015

There are a few misconceptions I hear about Git that I find should be cleared up a bit, so here goes:

“To use Git you need to use the command-line”

Not necessarily: there are various graphical tools that come with Git, including git-citool, gitk and git-gui, all of which are based on Tk — which turns out to make them both very light-weight and portable. Almost anything you might want to do on a daily basis can be done with these tools.
Aside from that, for our Windows-using friends, there is Tortoise Git, which integrates Git right into Explorer.
A quick search on Google reveals a number of other graphical user interfaces for Git, so no: you can do without the command-line if you feel more comfortable with a mouse.

“There is no central repository”

This is technically true, as Git is a distributed version control system so the repository you’re using is on your hard disk. That doesn’t mean, however, that you cannot have a central repository: I would recommend having a “canonical” central repository with code that has been properly curated and into which only certain people can push, and one or more “experimental” repositories, which you might see as you’d see branches in a central repository.
Still, if you want a central repository into which everyone can push, that is fairly simple to set up with Git. It’s not what Git was designed to do, but because it’s so flexible, there’s no reason why it wouldn’t work.
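Setting up such a central repository is a matter of a few commands. A minimal sketch with local paths (in practice the bare repository would live on a server, reached over SSH or HTTPS; all names below are made up for the example):

```shell
# A "central" repository is just a bare repository: no working tree,
# so nobody edits it in place -- everybody pushes to and pulls from it.
git init --bare canonical.git

# A developer clones it, commits locally, and pushes back.
git clone canonical.git work
cd work
git config user.name "Example Dev"       # identity, so the commit works anywhere
git config user.email "dev@example.com"
echo 'hello' > README
git add README
git commit -m "Add README"
git push origin HEAD                     # push the current branch
cd ..
```

For the "canonical" repository with restricted push access, the same setup applies; only the server-side permissions differ.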

“You need SSH”

No, you don’t. You can push and pull over a Windows share (Samba) if you want to. SSH happens to be a great way of connecting to a server securely, using a pre-shared public key for authentication, and Git happens to come with git-shell, something that pretends to be a shell but really isn’t — so users don’t get shell access to your server. But you don’t have to use any of that. You can push and pull over HTTPS and WebDAV if you feel more comfortable with that and, due to the modular nature of Git and its extensibility, it’s actually fairly easy to create a Git-over-potato-launcher extension (except for building a potato launcher that can effectively and efficiently talk to a server). Check the documentation for git-remote-helpers if you want more information about that.
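For illustration, the transport is nothing more than part of the remote URL (the hosts and paths below are made up):

```shell
git init protocol-demo
cd protocol-demo
# The transport is just the scheme of the remote URL:
git remote add share file:////fileserver/projects/repo.git        # Windows share (UNC path)
git remote add web   https://git.example.com/projects/repo.git    # smart HTTP(S)
git remote add ssh   ssh://git@git.example.com/projects/repo.git  # SSH
git remote -v
```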

“Git doesn’t integrate with my IDE”

Really? What IDE are you using? There are Git plug-ins for Visual Studio, Eclipse, IntelliJ IDEA, NetBeans, …

“Git is difficult to understand”

Under the hood, Git is a content-addressable filesystem.
If that doesn’t tell you anything, then understand that Git is optimized for the general use-case (checkout, modify, stage, commit). The staging step puts your changes in an index, which means you get to prepare exactly what goes into your commit, and you even get to edit it if you want to, before you commit it (using the interactive git-add).
Basically, when you commit something to the repository, you’re adding a single object to it that says “I started with this version and now I have this”. It does not say “this is what I changed”. Staging lets you prepare that commit object. That means that if you have two modifications in a single file but only want to commit one of them, you can.
Now, it’s true that some things in Git can be a bit hard to catch if you’re used to systems like CVS, SVN or TFS, but as soon as you start wrapping your head around the idea that Git history is basically a linked list of commits, each of which provides a complete description of the then-current version and its complete history (through the link with the previous commit), it becomes clearer.
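That linked-list structure is easy to see for yourself: git cat-file prints a commit object, showing the tree (the complete snapshot) and the link to the parent commit. A quick sketch:

```shell
git init commit-demo
cd commit-demo
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit --allow-empty -m "first"
git commit --allow-empty -m "second"
# A commit is a snapshot plus a link to the previous commit:
git cat-file -p HEAD
```

The output shows a `tree` line (the complete state of the project) and a `parent` line (the link to the previous commit) — not a list of changes.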

“Using Git you have to rebase a lot, losing history”

I’ve been using Git pretty much since its inception and I have only rebased a branch three or four times so no, you don’t have to do that.
Git is really good at merging — it’s why it’s such a good versioning system (creating branches is easy, merging them is a much bigger challenge). I know some workflows prefer rebasing over merging, but I frankly think those are misguided.
Rebasing doesn’t lose history, though: it re-writes it on top of a different version of the parent branch.
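A quick way to see that nothing is lost: after a rebase the same change is still present, but as a new commit object with a new parent (the branch names below are made up):

```shell
git init rebase-demo && cd rebase-demo
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit --allow-empty -m "base"
git branch -m trunk                    # give the initial branch a known name
git checkout -b topic
echo work > topic.txt
git add topic.txt
git commit -m "topic work"
git checkout trunk
echo more > trunk.txt
git add trunk.txt
git commit -m "trunk work"
git checkout topic
before=$(git rev-parse HEAD)
git rebase trunk                       # re-write "topic work" on top of trunk
after=$(git rev-parse HEAD)
test "$before" != "$after"             # same change, different commit object
```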

“Git discourages continuous integration”

No, it doesn’t. While you certainly can work in isolation, continuous integration is not a question of everybody hacking on the master/current/root/main branch and inflicting their code on everyone else: it’s a question of integrating changes continuously, which frequent pulling and good support for merging make fairly easy.

Just give it a try, eh?

Three ideas you should steal from Continuous Integration
Thu, 16 Jul 2015

I like Continuous Integration — a lot. Small incremental changes, continuous testing, continuous builds: these are Good Things. They provide statistics, things you can measure your progress with. But Continuous Integration requires an investment on the part of the development team, the testers, and so on. There are, however, a few things you can adopt right now, so here is a list of practices I think you should steal.

Fail Fast

Keep the cycle between coding and knowing whether your code works as short as possible. Make it as easy as possible to know whether code works — ideally with excellent coverage (both functional and code) in unit tests that can run during a compile cycle.

If something doesn’t work, you want to know it before your developer has time to get up and fetch a cup of coffee. If that’s impossible, he should know by the time he gets back to his desk.

It should also be clear how the failure is related to the code. That means:

Self-documenting code

Self-documenting test cases

Test cases that test one thing, and one thing only

Keep the build fast

I’ve worked on projects where hitting F7 (build) meant going home for the night. This is not a good thing: it gets developers to write a lot of code without ever building, let alone testing. That, in turn, slows down the development cycle, which has all kinds of nefarious effects.

Automate testing

Note: not just tests: testing. You want tests running without anyone having to think of starting them. You want to automate feedback to your developers. You want to tell your developer (politely) that he’s made a mistake and should repair it before he inflicts the code on others.
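One low-tech way to get there with Git is a hook: the tests then run without anyone having to think of starting them. A sketch, assuming a Makefile with a test target (adapt to your own build system):

```shell
#!/bin/sh
# .git/hooks/pre-commit -- runs before every commit; a non-zero exit aborts it.
if ! make -s test; then
    echo "tests failed -- fix them before inflicting the code on others" >&2
    exit 1
fi
```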

Eliminating waste as a way to optimize
Fri, 03 Jul 2015

I recently had a chance to work on an implementation of an Arachnida-based web server that had started using a lot of memory as new features were being added.

Arachnida itself is pretty lean and comes with a number of tools to help build web services in industrial devices, but it is not an “app in a box”: some assembly is required and you have to make some of the parts yourself.

Structure of a typical Arachnida-based web server

In most cases, the resulting web server looks a lot like one of the examples: there’s a Server class that contains the Listener instance, an HTTPRequestHandler, and a bunch of objects that implement services using a Service interface. Each service implements a part of the web application and is responsible for responding to its requests. To dispatch between different services, usually, some part of the request URI is used.

This scheme works very well: it allows you to separate the responsibilities of each service neatly into classes. It sometimes comes with a bit of a trade-off, though: you often end up duplicating information in different services, unless you go ahead and implement a full-fledged MVP, which most people don’t.

The code I was looking at used the standard std::string in many places, and hooked into APIs that I couldn’t change for the purposes of this optimization and that used std::strings as well as raw char const* pointers. Arachnida comes with Acari, and uses Acari extensively itself — which helps to keep it lean. Acari comes with a very aggressively optimized string class for this kind of situation, where strings get copied around a lot (which, in parsers, is pretty common). The best option I had, therefore, was to find out, for each string, which copy I should keep and whether I could count on its longevity. Those that I could count on to stay alive would then be referenced by instances of a non-copying String class from Acari, or Vlinder::Lite::String (an even lighter version of Acari’s String class) in some places.

This took care of a large part of the problem, but not all of it, so I had to dig a little deeper. Here’s some of the tricks I applied:

Instrumenting the code:

At key places (start of main, before and after the initialization of singletons, before and after the creation of several key objects, before and after reading configuration, etc. etc.) write a debug trace indicating how much memory is being used.

This is basically a poor man’s profiler, but it allows you to easily find chunks of code that use inordinate amounts of memory. Sometimes, though, the system may play tricks on you: the allocator may try to help out by reserving more memory for your application than you asked for — so you need to look out for false positives.

Go for the low-hanging fruit:

When memory usage becomes problematic enough to devote several hours to it, there’s likely to be a lot of low-hanging fruit — things that take a lot of memory and shouldn’t. Determine a threshold (e.g. 1 meg) and don’t bother with anything below it, at least until you’ve run out of candidates at or above that threshold.

Reduce caching:

The server used what effectively came down to an internal mirror of a cache, while it had access to the cache itself — removing the cache’s mirror removed a few megabytes of memory footprint.

RAII:

Using RAII consistently greatly reduces your chances of having memory leaks — or even of letting objects stay alive longer than necessary. Replacing calls to malloc with std::vector instances ended up shaving another megabyte off the application’s footprint.
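The pattern, in a nutshell (illustrative code, not the actual application’s):

```cpp
#include <cstring>
#include <vector>

// Before: manual management -- every early return or exception leaks.
//     char *buffer = static_cast<char *>(malloc(size));
//     /* ... use buffer ... */
//     free(buffer);

// After: the vector owns the storage and releases it on every path out of
// the function, including exceptions.
std::size_t countedCopy(const char *source)
{
    std::vector<char> buffer(std::strlen(source) + 1);
    std::memcpy(buffer.data(), source, buffer.size());
    return buffer.size();   // buffer freed automatically here
}
```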

Find big static variables using objdump or dumpbin:

Both binutils and the Microsoft SDK come with a tool to dump the headers of generated binary files. You can use those to find big static variables and, then, use the code to evaluate whether those variables really need to be that big. On the target platform, we could reduce the footprint of static variables by over 80% using this approach.

Conclusion

This particular instance of an Arachnida server was unusual in that the smallest of the family of devices it runs on was still rather large (128 MB of working memory) — but that’s becoming more and more common. While Arachnida itself was not part of the problem, one of its component parts was part of the solution, and this gave me a chance to work with one of its instances for the first time in a rather long time: I usually only get to answer a few questions or implement the occasional feature request, and I don’t get to play with the end-product’s code much, so I mostly let those questions guide where development should go next. Suffice it to say I’m happy to know my web server is being put to good use in non-trivial projects — and that people are getting their money’s worth (which isn’t surprising: they wouldn’t come back otherwise).

Technical documentation
Tue, 30 Jun 2015

Developers tend to have a very low opinion of technical documentation: it is often wrong, partial, unclear and not worth the trouble of reading. This is, in part, a self-fulfilling prophecy: such low opinions of technical documentation result in it not being read, and not being invested in.
I have no easy solution for this: technical documentation is an art and a science, and not everyone is good at it. It’s all about communication and, while communication is the first thing we start learning, as soon as — and perhaps before — we are born, it is also something we never stop learning and refining.

The solution I do have is not easy: good documentation requires a consistent, considerable, continuous investment both pecuniary and of effort. Good documentation requires time, tools and tenacity — all of which cost money.

The same is true for good code, of course, but the difference between good code and good documentation from the software/firmware engineer’s perspective is that good code not only has a direct effect on the productivity of the engineer: it also has a direct effect on the quality of the product and it is a clear responsibility of the engineer (which means that if you don’t get it right, you either lose your job or get to do it over and over until you do get it right).

The argument for investment in tools and time is also easier to make for code than for documentation: customers have come to expect glossy marketing paired with low-quality technical documentation, because the two types of documentation don’t target the same audiences. Once they make a purchase decision based on the glossy marketing, they’ll expect the product to work, but the decision-makers will not be looking at the technical documentation for that. Hence, the only people who end up looking at the technical documentation are the ones who typically have the least influence over purchase decisions: the standards for documentation are lower because the perceived impact is lower (though the emphasis on perceived is there for a reason). The expectations are also lower, due to the often-dismal state of existing technical documentation.

Quality is the measure by which a product meets or exceeds the needs, requirements and expectations of the customer. Requirements are functional or non-functional in nature and are often ill-defined: Ford’s customers wanted a faster horse; Ford came up with a car. However, one ever-recurring non-functional requirement is usability. This is the requirement the documentation addresses. Good documentation can increase the usability of the product it documents tremendously. Bad documentation at best has no negative effect, and at worst can make a product border on uselessness. On the other hand, expectations are low, so it’s easy to meet or exceed them.

The story of “Depends”
Mon, 29 Jun 2015

Today, I announced on behalf of my company, Vlinder Software, that we would no longer be supporting “Depends”, the dependency tracker. I think it may be worthwhile to tell you a bit about the history of Depends, how it became a product of Vlinder Software, and why it no longer is one.
Depends was first written as part of Jail, an experiment I was working on in 2007. Some of the code from the Jail project was never made public, but the parts that were made public were often interesting: there’s an implementation of Maged M. Michael’s Safe Memory Reclamation algorithm (SMR)1, for example, that has some useless sorting added to it but is otherwise interesting to look at. I was playing a lot with lock-free code back then — it’s gotten a bit more serious since — and tried out several algorithms, of which SMR is probably the most elegant.

I also wrote a dependency tracker, based on the idea of an annotated directed acyclic graph that was serializable and accessible as an STL-style associative container. I still very much like the idea of using familiar interfaces (assuming competent C++ programmers are familiar with the STL) and hiding nifty algorithms behind them, so the “thing” you’re working with just “magically” does what you want it to do.

At about the same time, I also wrote an article in Dr. Dobb’s about the Adapter pattern and a particularly interesting implementation of it, which I had needed at the time to integrate a new piece of software with a much older one — the older one having suffered from years of maintenance, which had completely shattered any remnants of encapsulation. To resist the Borg-like assimilation of my code, I needed to abstract the code I was to interface with. The solution, though complex, was really quite nifty.

Depends was not born out of necessity, but out of curiosity: as I stated in its documentation:

As professional software developers we use programs that include dependency trackers nearly every day: we basically can’t do our work without them, unless we start tracking dependencies by hand.
The trackers we use on a daily basis are integrated into such fine tools as GNU Make, Microsoft Visual Studio, etc.: dependency trackers are the behind-the-scenes magic that make tools like these work. They help us track the dependencies between our source files to determine the order in which they need to be compiled and which files need compiling. They make our jobs a whole lot easier, if not just plainly possible.

Dependency trackers further help in such diverse applications as banking (inside the calculation engine of one of France’s most wide-spread fiscal applications is a dependency tracker that tracks the dependencies of the calculation engine’s modules); OS kernels (using a dependency tracker to know which modules to load and in what order); etc.

Of course I knew about the calculation engine thing because I wrote it. The other bits are obvious. Still, dependency tracking was one of those problems for which the solution, though recurring, seemed to be re-invented every time. I wanted to create a generic solution to the problem that would work in each of the aforementioned cases and still be efficient.

Depends is an elegant solution and is extremely well-documented and, with one tweak in one place — which most users tend to find fairly quickly — is actually very efficient.

For commercial support, the business model really came down to “for $100 I’ll tell you what the tweak is” but, the tweak being fairly obvious (I’ve only had to point it out once) and the most prevalent use-case not requiring disclosure of the code (i.e. the fact that it’s licensed under GPLv2 was not a problem for most users, and only one commercial license has ever been sold), the cost of maintaining Depends in our build and test environment simply wasn’t worth the trouble.

The “tweak”, by the way, is that for the vast majority of use-cases, you need to know either the prerequisites of a node or its dependants — not both. By default, the Depends class calculates both, using two DAGs. You can remove one of the DAGs and still have all the features you need.

1. A patent application was filed under the title “Method for efficient implementation of dynamic lock-free data structures with safe memory reclamation” in 2002, but was never granted, so AFAICT (but IANAL) the algorithm is in the public domain. I’ve posted a question about it here.

Bayes’ theorem in non-functional requirements analysis: an example

I am not a mathematician, but I do like Bayes’ theorem for non-functional requirements analysis — and I’d like to present an example of its application.1
Recently, a question was brought to the DNP technical committee about the application of a part of section 13 of IEEE Standard 1815-2012 (the standard that defines DNP). Section 13 explains how to use DNP over TCP/IP, as the protocol was originally designed to be used over serial links. It basically says “pretend it’s a serial link”, and “here’s how you do TCP networking”.

Network diagram of the use-case

The use-case in question involved a master device talking to several outstation devices over a single TCP connection. The TCP connection really went to a port server, which transformed it into several serial connections.

The standard tells the master and the outstation to periodically check whether the link is still alive, and to close the TCP connection if it isn’t. This works fairly well in the specific (but most popular) case where the master connects to a single outstation using a TCP connection. In that case (here comes Bayes’ theorem):

P(TCP down | DNP down) = P(DNP down | TCP down) × P(TCP down) / P(DNP down)

I.e. the probability that the TCP connection is broken given that the DNP link is broken is equal to the probability that the DNP link is broken given that the TCP connection is broken, times the probability that the TCP connection is broken (at any time), divided by the probability that the DNP link is broken (at any time). As the probability that the DNP link is broken given that the TCP connection is broken is 1 (the DNP link is wholly dependent on the TCP connection), this is really the probability that the TCP connection is broken divided by the probability that the DNP link is broken. The probability that a DNP link breaks, given a production-quality stack, should be very low, and about equal to (but strictly higher than) the probability that the TCP connection breaks, so one may safely assume that if the DNP link is broken, the TCP connection is also broken.2

For the remainder of this article, we will assume that the devices on the other end of the TCP connection are much more likely to fail than the TCP connection itself. While this was not our assumption before, and is not an assumption I would expect the authors of the standard to have made, applying it to a case where there is only one device at the other end has very little effect on availability: closing the TCP connection does not render any other devices unavailable, while a disconnect/reconnect may fix the problem. The near-zero negative effects of a false positive far outweigh the positive effect in case it’s not a false positive: even if you have a 90% chance that a disconnect/reconnect doesn’t work, it can’t hurt. This is obviously not the case in our use-case, where such a false-positive rate greatly diminishes the availability of other devices on the same connection. Concretely, we will assume five-nines (99.999%) uptime for the TCP connection and four-nines (99.99%) uptime for the DNP3 devices.

The use-case in the standard — one master, one outstation, one connection — is not the use-case the member came up with, which involved one TCP connection but several DNP links; the other DNP links were still working, as far as we could tell.

The probability that all DNP links go awry at the same time is very small indeed — to be precise, P(all links down) = P(TCP down) + P(all links down while the TCP connection is still alive) — but still strictly greater than P(TCP down), so our equation now becomes:

P(TCP down | all links down) = P(TCP down) / P(all links down) ≈ 1

The probability that the TCP connection is broken given that only one DNP link is broken, however, is very small, namely:

P(TCP down | one link down) = P(TCP down) / P(one link down)

Note: it is impossible for a DNP link that is wholly dependent on the TCP connection to be available while the TCP connection is not. Hence, as long as one DNP link is still available, the TCP connection is necessarily still alive. This means that deciding to break the TCP connection on the assumption that it was already broken while some DNP links are still communicating has the clear effect of reducing availability (the opposite of the intent).

So, if you don’t decide to cut the TCP connection as soon as you see a DNP link going down, when do you decide to cut the connection?

The issue with this question is that, while we can think in terms of “good” and “bad” links as long as there is only one link per connection, as soon as we have more than one link we have to add the notion of an “unknown” state and a “device failure” state.

Flow chart indicating what is done when a message is received re: the link and connection statuses

If any message is received from any device whatsoever, it is clear that the TCP connection is still alive and that any device link that is down at the moment is due to a device failure.

That means that in any assessment of the likely state of the TCP connection, any devices that were previously marked as having a “bad” link status are no longer relevant: they most likely failed because of a device failure.

Link status request time-out

So, when a link status request times out, we really only know that the link status of the device for which it timed out is “bad”, and that we can no longer assume that, for the devices for which it was “good”, it still is “good”. This is the moment where we should assess whether the TCP connection is at fault — in which case it should be closed — or whether something else is wrong. What we need to know is P(TCP down | link down).

As shown above, P(TCP down | link down) = P(TCP down) / P(link down).3 Now, if we have five-nines uptime for TCP and four-nines uptime for DNP3, P(TCP down | link down) ≈ 0.00001 / 0.00011 ≈ 10% — hence the “90% chance that a disconnect/reconnect doesn’t work” I mentioned earlier.

If, however, we find that there are two DNP links down, we need to know P(TCP down | two links down). This is somewhat more difficult to calculate correctly: while it would be tempting to say that P(two links down) = P(link down)², that is clearly not accurate, as we know, due to the complete dependency of the DNP links on the TCP connection, that all links go down together whenever the TCP connection goes down. P(two links down) is really P(TCP down) + P(both devices down while the TCP connection is up), which, in our case, assuming five-nines for TCP and four-nines for the DNP3 links, means P(TCP down | two links down) ≈ 0.00001 / (0.00001 + 0.0001²) ≈ 0.999.

So, while with only one system’s link down the probability of the TCP connection being the problem is only about 10%, when the second link goes down — absent knowledge of any link being OK between the first and the second failure — the probability of the TCP connection being the problem shoots up to nearly 100%. This means that there is no need, at that point, to probe the other devices on the connection.

Note that this holds regardless of how many devices there are on the other end of the connection: as soon as two devices have failed to respond to a link status request and no devices have communicated between those two failures, it is almost certain that the TCP connection is down.

1. I was actually going to give a theoretical example of availability requirements, but then a real example popped up…

2. I should note that this implies that P(DNP link down while the TCP connection is still up) is very small indeed, which, given that we’re talking about link status requests, which are implemented in the link layer and in most implementations don’t require the involvement of much more than that, is a fairly safe assumption.

3. Because P(DNP link down | TCP connection down) = 1.

Globe and Mail: Canada lacks law that defines, protects trade secrets
Sun, 24 May 2015

According to the Globe and Mail (Iain Marlow, 20 May 2015), the 32-count indictment against six Chinese nationals, who allegedly used their positions to obtain intellectual property from universities and businesses in the U.S. and then take that knowledge home to China, would not be possible here: “Canadian observers say the 32 count indictment, which was unsealed late on Monday, highlights the prevalence and severity of industrial espionage in North America, and underscores the need for Canada to adopt more stringent laws. Canada has no dedicated act on trade secrets and economic espionage and has not successfully prosecuted a similar case, experts say.”
While it may be true that Canada lacks legislation around trade secrets, the same article recounts an anecdote that, I think, exemplifies a more important problem: “In 2012, a former Nortel Networks Corp. employee named Brian Shields told The Globe and Mail that Chinese hackers had been attacking the company for years, and may have contributed to the storied firm’s downfall. Mr. Shields, who worked at the Canadian telecom firm for 19 years and was a senior systems security adviser, said Nortel approached the RCMP in 2004 and turned all its evidence over, but received no help. Before that, however, a former senior member of Canada’s spy agency said it had approached Nortel with evidence of Chinese activity around the company, but was “brushed off” at a time management seemed preoccupied with booming telecom growth in China.” (emphasis mine)

Technology companies tend to not take their own intellectual property seriously enough.

I recently sat in on a presentation given by an FBI agent who specializes in IP theft. His point, throughout the presentation, was very clear: IP theft is not seen as a criminal activity by the countries that perpetrate it — China, Russia, Iran, … — but as a full-time job for professionals whose goals fit into five-year plans to forward the economies of their countries. Look at China’s plans for agriculture (increase production, decrease water usage and land surface used) and look where they go fish for their IP (recently, Potash). For telecommunications, they need to improve their networks, decrease noise, etc. It stands to reason they’d be interested in the kind of technology these six individuals allegedly stole.

It may be true that Canada lacks the necessary legislation, but Canadian businesses, like American businesses, lack the proper mind-set to counter IP theft: we’re not dealing with script kiddies in their moms’ basements, but with professionals who know the craft, know the tools, know how to get around the defenses, and do not believe they’re doing anything wrong.