You can also find some more of my English writings by looking at the blog entries in LJ which I tagged english.

Best wishes,
Arne

A tale of foxes and freedom

Singing the songs of creation to shape a free world.

One day the silver kit asked the grey one:

“Who made the light, which brightens our singing place?”

The grey one looked at him lovingly and asked the kit to sit with him, for he would tell a story of old, a story from the days when the tribe was young.

“Once there was a time, when the world was light and happiness. During the day the sun shone on the savannah, and at night the moon cast the grass in a silver sheen.

It was during that time, when there were fewer animals in the wild, that the GNUs learned to work songs of creation, deep and vocal, and they taught us and everyone their new findings, and the life of our skulk was happiness and love.

But while the GNUs spread their songs and made new songs for every idea they could imagine, others invaded the plains, and they stole away the songs and only allowed singing them their way. And they drowned out the light, and with it went the happiness and love.

And when everyone shivered in cold and darkness, and stillness and despair were drawn over the land, the others created a false light which cast small enclosures into a pale flicker, in which they let in only those animals who were willing to wear ropes on their throats and limbs, and many animals went to them to escape the darkness, while some fell deeper still and joined the others in enslaving their former friends.

Upon seeing this, the fiercest of the GNUs, the last one of the original herd, was filled with a terrible anger to see the songs of creation turned into a tool for slavery, and he made one special song which created a spark of true light in the darkness which could not be taken away, and which exposed the falsehood in the light of the others. And whenever he sang this song, those who were near him were touched by happiness.

But the others were many and the GNU was alone, and many animals succumbed to the ropes or the ropers and could move no more on their own.

To spread the song, the GNU now searched for other animals who would sing with it, and the song spread, and with it the freedom.

It was during these days, that the GNU met our founders, who lived in golden chains in a palace of glass.

In this palace they thought themselves lucky, and though the light of the palace grew ever paler and the chains grew heavier with every passing day, they didn't leave, because they feared the utter darkness out there.

When they then saw the GNU, they asked him: "Isn't your light weaker than this whole palace?" and the GNU answered: "Not if we sing it together", and they asked "But how will we eat in the darkness?" and the GNU answered "you'll eat in the light of your songs, and plants will grow wherever you sing", and they asked "But is it a song of foxes?" and the GNU said: "You can make it so", and he began to sing, and when our founders joined in, the light became shimmering silver like the moon they still remembered from the days and nights of light, and they rejoiced in its brightness.

And whenever this light touched the glass of the palace, the glass paled and showed its true being, and where the light touched the chains, they withered and our founders went into the darkness with the newfound light of the moon as companion, and they thanked the GNU and promised to help it, whenever they were needed.

Then they set off to learn the many songs of the world and to spread the silver light of the moon wherever they came.

And so our founders learned to sing the light, which brightens every one of our songs, and as our skulk grew bigger, the light grew stronger and it became a little moon, which will grow with each new kit, until its light will fill the whole world again one day.”

The grey one looked around where many kits had quietly found a place, and then he laughed softly, before he got up to fetch himself a meal for the night, and the kits began to speak all at once about his story. And they spoke until the silver kit raised its voice and sang the song of moonlight1, and they joined in and the song filled their hearts with joy and the air with light, and they knew that wherever they would travel, this skulk was where their hearts felt home.

PS: This story is far less loosely based on facts than it looks.
There are songs of creation, namely computer programs, which once were free and which were truly taken away and used for casting others into darkness. And there was and still is the fierce GNU with his song of light and freedom, and he did spread it to make it into GNU/Linux and found the free software community we know today.
If you want to know more about the story as it happened in our world, just read the less flowery story of Richard Stallman, free hackers and the creation of GNU or listen to the free song Infinite Hands.

PPS: I originally wrote this story for Phex, a free Gnutella-based p2p filesharing program which also has an anonymous sibling (i2phex). It’s an even stronger fit for Firefox, though.

PPPS: License: This text is released to the public under the GNU FDL without invariant sections, as well as other free licenses, by Arne Babenhauserheide (who holds the copyright on it).

To make it perfectly clear: This moonlight is definitely not the abhorrent and patent-stricken Silverlight port from the mono project. The foxes sing a song of freedom. They wouldn’t accept the shackles of Microsoft after having found their freedom. Sadly the PR departments of some groups try to take over analogies and strong names. Don’t be fooled by them. The moonlight in our songs is the light coming from the moon which resonates in the voices of the kits. And that light is free as in freedom, from copyright restrictions as well as from patent restrictions – though there certainly are people who would love to patent the light of the moon. Those are the ones we need to fight to defend our freedom. ↩

Emacs

Emacs is a self-documenting, extensible editor, a development environment and a platform for lisp-programs - for example programs to make programming easier, but also for todo-lists on steroids, reading email, posting to identi.ca, and a host of other stuff (learn lisp).

2 Package Header

As an Emacs package, babcore needs a proper header.

;; Copyright (C) 2013 Arne Babenhauserheide

;; Author: Arne Babenhauserheide (and various others in Emacswiki and elsewhere).
;; Maintainer: Arne Babenhauserheide
;; Created 03 April 2013
;; Version: 0.0.2
;; Keywords: core configuration

;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License
;; as published by the Free Software Foundation; either version 3
;; of the License, or (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;; Quick Start / installation:
;; 1. Download this file and put it next to other files Emacs includes
;; 2. Add this to you .emacs file and restart emacs:
;;    (require 'babcore)
;;
;; Use Case: Use a common core configuration so you can avoid the
;; tedious act of gathering all the basic stuff over the years and
;; can instead concentrate on the really cool new stuff Emacs offers
;; you.
;;
;; Todo:
;;
;;; Change Log:
;; 2013-04-03 - Minor adjustments
;; 2013-02-29 - Initial release

;;; Code:

Additionally it needs the proper last line. See finish up for details.

3 Feature Gems

3.1 package.el, full setup

The first thing you need in emacs 24. This gives you a convenient way
to install just about anything, so you really should use it.

Also I hope that it will help consolidate the various emacs tips which
float around into polished packages by virtue of giving people ways to
actually get the package by name - and keep it updated almost
automatically.

;; Convenient package handling in emacs
(require 'package)
;; use packages from marmalade
(add-to-list 'package-archives '("marmalade" . "http://marmalade-repo.org/packages/"))
;; and the old elpa repo
(add-to-list 'package-archives '("elpa-old" . "http://tromey.com/elpa/"))
;; and automatically parsed versiontracking repositories.
(add-to-list 'package-archives '("melpa" . "http://melpa.milkbox.net/packages/"))
;; Make sure a package is installed
(defun package-require (package)
  "Install a PACKAGE unless it is already installed or a feature with the same name is already active.

Usage: (package-require 'package)"
  ; try to activate the package with at least version 0.
  (package-activate package '(0))
  ; try to just require the package. Maybe the user has it in his local config
  (condition-case nil
      (require package)
    ; if we cannot require it, it does not exist, yet. So install it.
    (error (package-install package))))
;; Initialize installed packages
(package-initialize)
;; package init not needed, since it is done anyway in emacs 24 after reading the init
;; but we have to load the list of available packages
(package-refresh-contents)

3.2 Flymake

Flymake is an example of a quite complex feature which really everyone
should have.

It can check any kind of code, and actually anything which can be
verified with a program which gives line numbers.
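
A minimal way to switch it on (just a sketch, assuming the classic flymake bundled with Emacs 23/24 and its flymake-find-file-hook; babcore’s real setup may differ):

; sketch: enable flymake for files it knows how to check
(require 'flymake)
(add-hook 'find-file-hook 'flymake-find-file-hook)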

As an alternative you might want to look into flycheck. It looks really
cool, but I don’t yet have experience with it, so I cannot recommend
it yet.

3.13 Basic key chords

This is the second strike for saving your pinky. Yes, Emacs is hard on
the pinky. Even if it were completely designed to avoid strain on the
pinky, it would still be hard, because any system in which you do not
have to reach for the mouse is hard on the pinky.

But it also provides some of the neatest tricks to reduce that strain,
so you can make Emacs your pinky saviour.

The key chord mode allows you to hit any two keys at (almost) the same
time to invoke commands. Since this can interfere with normal typing,
I would only use it for letters which are rarely typed after each
other.

The default chords have proven themselves to be useful in years of
working with Emacs.
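
To illustrate the idea, here is a sketch of what such a setup generally looks like - the chord "jk" and the bound command are made-up examples, assuming key-chord.el is available from your package archives, not the actual babcore defaults:

; illustration only: two keys hit at (almost) the same time run a command
(package-require 'key-chord)
(key-chord-mode 1)
(key-chord-define-global "jk" 'execute-extended-command)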

3.14 X11 tricks

These are ways to improve the integration of Emacs in a graphical
environment.

We have this cool editor. But it is from the 90s, and some of the
more modern concepts of graphical programs have not yet been
integrated into its core. Maybe because everyone just adds them to the
custom setup :)

On the other hand, Emacs always provided split windows and many of the
“new” window handling functions in dwm and similar - along with a
level of integration with which normal graphical desktops still have
to catch up. Open a file, edit it as text, quickly switch to org-mode
to be able to edit an ascii table more efficiently, then switch to
html mode to add some custom structure - and all that with a
consistent set of key bindings.

But enough with the glorification, let’s get to the integration of
stuff where Emacs arguably still has weaknesses.

3.14.1 frame-to-front

Get the current Emacs frame to the front. You can for example call
this via emacsclient and set it as a keyboard shortcut in your
desktop (for me it is F12):

emacsclient -e "(show-frame)"

This sounds much easier than it proves to be in the end… but luckily
you only have to solve it once, then you can google it anywhere…

(defun show-frame (&optional frame)
  "Show the current Emacs frame or the FRAME given as argument.

And make sure that it really shows up!"
  (raise-frame)
  ; yes, you have to call this twice. Don’t ask me why…
  ; select-frame-set-input-focus calls x-focus-frame and does a bit of
  ; additional magic.
  (select-frame-set-input-focus (selected-frame))
  (select-frame-set-input-focus (selected-frame)))

3.18 Transparent GnuPG encryption

If you have a diary or similar, you should really use this. It only
takes a few lines of code, but these few lines are the difference
between encryption for those who know they need it and encryption for
everyone.
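
A sketch of those few lines (assuming the EasyPG code that ships with Emacs, which transparently en-/decrypts any file ending in .gpg):

; sketch: transparent GnuPG encryption for .gpg files
(require 'epa-file)
(epa-file-enable)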

3.21.3 savehist

And I want to be able to call my recent commands in the minibuffer. I
normally don’t type the full command name anyway, but rather C-r
followed by a small part of the command. Losing that on restart really
hurts, so I want to avoid that loss.

; save minibuffer history
(require 'savehist)
(savehist-mode t)

3.21.4 desktop globals

This is the chainsaw of persistency. I commented it out, because it
can be overkill and actually disturb more than it helps, when it
recovers stuff I did not need.

3.22 use the system clipboard

Finally one more minor adaption: Treat the clipboard gracefully. This
is a tightrope stunt and getting it wrong can feel awkward.

This is the only setting for which I’m not sure that I got it right,
but it’s what I use…

; Use the system clipboard
(setq x-select-enable-clipboard t)

3.23 Add license headers automatically

In case you mostly write free software, you might be as weary of hunting for the license header and copy pasting it into new files as I am. Free licenses, and especially copyleft licenses, are one of the core safeguards of free culture, because they give free software developers an edge over proprietarizing folks. But they are a pain to add to every file…

Well: No more. We now have legalese mode to take care of the inconvenient legal details for us, so we can focus on the code we write. Just call M-x legalese to add a GPL header, or C-u M-x legalese to choose another license.

(package-require 'legalese)

3.24 finish up

Make it possible to just (require 'babcore) and add the proper package footer.

(provide 'babcore)
;;; babcore.el ends here

4 Summary

With the babcore you have a core setup which exposes some of the
essential features of Emacs and adds basic integration with the system
which is missing in pristine Emacs.

Now have a look at M-x package-list-packages to see where you can
still go - or just use Emacs and add what you need along the way. The
package list is your friend, as is Emacswiki.

Note: As almost everything on this page, this text and code is available under the GPLv3 or later.

Conveniently convert CamelCase to words_with_underscores using a small emacs hack

I currently cope with refactoring in an upstream project to which I
maintain some changes which upstream does not merge. One nasty part is
that the project converted from CamelCase for function names to
words_with_underscores. And that created lots of merge errors.

Today I finally decided to speed up my work.

The first thing I needed was a function to convert a string in
CamelCase to words_with_underscores. Since I’m lazy, I used google, and that turned up the CamelCase page of Emacswiki - and with it the following string functions:
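
A rough sketch of such a string helper (the name and the regexp are my reconstruction, not necessarily the exact EmacsWiki code):

; sketch: convert a CamelCase string to words_with_underscores
(defun un-camelcase-string (s &optional sep)
  "Convert CamelCase string S to lower case with separator SEP (default \"_\")."
  (let ((case-fold-search nil))
    (downcase
     (replace-regexp-in-string "\\([a-z0-9]\\)\\([A-Z]\\)"
                               (concat "\\1" (or sep "_") "\\2")
                               s))))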

Quite handy - and elegantly executed. Now I just need to make this available for interactive use. For this, Emacs Lisp offers many useful ways to turn Editor information into program information, called interactive codes - in my case the region-code: "r". This gives the function the beginning and the end of the currently selected region as arguments.

With this, I created an interactive function which de-camelcases and underscores the selected region:
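
A sketch of that interactive wrapper, using the string helper above (the exact body may differ from what I used back then):

; sketch: de-camelcase and underscore the active region
(defun underscore-region (begin end)
  "Convert the region between BEGIN and END from CamelCase to words_with_underscores."
  (interactive "r")
  (let ((word (buffer-substring begin end)))
    (delete-region begin end)
    (insert (un-camelcase-string word))))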

And now we’re almost there. Just create a macro which searches for a function, selects its name, de-camelcases and underscores it and then replaces every usage of the CamelCase name by the underscored name. This isn’t perfect refactoring (it can lead to errors), but it’s fast and I see every change it does.

C-x C-(
C-s def
M-x mark-word
M-x underscore-region
C-x C-)

That’s it, now just call the macro repeatedly.

C-x eeeeee…

Now check the diff to fix where this 13-line hack got something wrong (like changing __init__ into _init_ - I won’t debug this, you’ve been warned ☺).

As you can see, it’s quite simple: Just create a temporary buffer and use the default replace-string function I’m using daily while editing. Don’t assume I had figured out that elegant way myself. I just searched for it in the net and adapted the nicest code I found :)

And that’s it. A few lines of simple elisp and I have working completion for a custom link-type which points to research papers - and can easily be adapted when I change the location of the papers.

Now don’t think I would have come up with all that elegant code myself. My favorite language is Python and I don’t think that I should have to know emacs lisp as well as Python. So I copied and adapted most of it from existing functions in emacs. Just use C-h C-f <function-name> and then follow the link to the code :)

Remember: This is free software. Reuse and learning from existing code is not just allowed but encouraged.

3 Implementation: bib

For the bib-links, I chose an even easier way. I just reused reftex-do-citation from reftex-mode:

For reftex-do-citation to allow using the bib-style link, I needed some setup, but I already had that in place for explicit citation inserting (not generalized as link-type), so I don’t count the following as part of the actual implementation. Also I likely copied most of it from emacs-wiki :)

4 Result

For papers, I get an interactive file-prompt to just select the file. It directly starts in the papers folder, so I can simply enter a few letters which appear in the paper filename and hit enter (thanks to ido-mode).

For bibtex entries, a reftex-window opens in a lower split-screen and asks me for some letters which appear somewhere in the bibtex entry. It then shows all fitting entries in brief but nice format and lets me select the entry to enter. I simply move with the arrow-keys, C-n/C-p, n/p or even C-s/C-r for searching, till the correct entry is highlighted. Then I hit enter to insert it.

And that’s it. I hope you liked my short excursion into the world of extending emacs to stay focused while connecting separate data sets.

I never saw a level of (possible) integration and consistency anywhere else which even came close to the possibilities of emacs.

Now I use M-x org-babel-tangle to write the code to the file org-custom-link-completion.el. I attached that file for easier reference: org-custom-link-completion.el :)

Have fun with Emacs!

PS: Should something be missing here, feel free to get it from my public .emacs.d. I only extracted what seemed important, but I did not check if it runs in a pristine Emacs. My at-home branch is “fluss”.

Footnotes:

1 : Creating a custom function for string replace might not have been necessary, because some function might already exist for that. But writing it myself was faster than searching for it.

The solution

Making it possible

This solves the problem, but it is not convenient, because I have to switch to the terminal, download the file, convert it and append the result to my bibtex file.

Making it convenient

With the first step, getting the ris-citation is quite inconvenient. I need 3 steps just for getting a citation.

But those steps are always the same, and since I use Emacs, I can automate and integrate them very easily. So I created a simple function in emacs, which takes the url of a ris citation, converts it to bibtex and appends the result to my local bibtex file. Now I get a ris citation with a simple call to

M-x ris-citation-to-bib

Then I enter the url and the function appends the citation to my bibtex file.2
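
A sketch of what such a function can look like - the bibutils pipeline (ris2xml | xml2bib) and the path to the bibtex file are assumptions for illustration, not necessarily what my real function uses:

; sketch: fetch a ris citation, convert it and append it to a bibtex file
(require 'url)
(defun ris-citation-to-bib (url)
  "Fetch the ris citation at URL, convert it to bibtex and append it to the bibtex file."
  (interactive "Mris url: ")
  (let ((bibfile "~/literature.bib")) ; hypothetical location
    (with-temp-buffer
      (url-insert-file-contents url)
      ; convert ris to bibtex with the bibutils command line tools
      (shell-command-on-region (point-min) (point-max) "ris2xml | xml2bib" t t)
      (append-to-file (point-min) (point-max) bibfile))))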

Feel free to integrate it into your own emacs setup (additionally to the GPLv3 you can use any license used by emacswiki or worg).

Update (2013-04-10): Thanks to Han Duply, kanban links now work for entries from other files. And I uploaded kanban.el on marmalade.

Some time ago I learned about kanban, and the obvious next step was: “I want to have a kanban board from org-mode”. I searched for it, but did not find any. Not wanting to give up on the idea, I implemented my own :)

The result is two functions: kanban-todo and kanban-zero.

“Screenshot” :)

kanban-todo

kanban-todo provides your TODO items as kanban-fields. You can move them in the table without having duplicates, so all the state maintenance is done in the kanban table. Once you are finished, you mark them as done and delete them from the table.

The important line is the #+TBLFM. That says “use my TODO items in the TODO column, except if they are in another column” and “add kanban headers for my TODO states”

The kanban-todo function takes an optional parameter match, which you can use to restrict the kanban table to given tags. The syntax is the same as for org-mode matchers. The third argument allows you to provide a scope, for example a list of files.

To only set the scope, use nil for the matcher.

See C-h f org-map-entries and C-h v org-agenda-files for details.

kanban-zero

kanban-zero is a zero-state Kanban: All state is managed in org-mode and the table only displays the kanban items.

To set it up, put kanban.el somewhere in your load path and (require 'kanban). Then just add a table like the following:

The important line is the #+TBLFM. That says “show my org items in the appropriate column” and “add kanban headers for my TODO states”.

Hit C-c C-c with the point on the #+TBLFM line to update the table.

The kanban-zero function takes an optional parameter match, which you can use to restrict the kanban table to given tags. The syntax is the same as for org-mode matchers. The third argument allows you to provide a scope, for example a list of files.

In my setup they are ❢ (todo) ☯ (doing) ⧖ (waiting) and ☺ (to report). Not shown in the kanban Table are ✔ (finished), ✘ (dropped) and deferred (later), because they don’t require any action from me, so I don’t need to see them all the time.

Collecting kanban entries via SSH

If you want to create a shared kanban table, you can use the excellent transparent network access options from Emacs tramp to collect kanban entries directly via SSH.

To use that, simply pass an explicit list of files to kanban-zero as 4th argument (if you don’t use tag matching just use nil as 3rd argument). "/ssh:host:path/to/file.org" retrieves the file ~/path/to/file.org from the host.

How to show the abstract before the table of contents in org-mode

I use Emacs Org-Mode for writing all kinds of articles. The standard format for org-mode is to show the table of contents before all other content, but that requires people to scroll down to see whether the article is interesting for them. Therefore I want the abstract to be shown before the table of contents.

Table of Contents

1 Intro

With a short C-h v org-toc TAB TAB (meaning: search all variables which start with org- and contain -toc) I found the following even simpler way. After I got that solution working, I found that it was still much too complex and that org-mode actually provides an even easier and very convenient way to add the TOC at any place.

3 Appendix: Complex way

This is the complicated way I tried first. It only works with LaTeX, but there it works. Better use the simple way.

Set org-export-with-toc to nil as file-local variable. This means you just append the following to the file:

# Local Variables:
# org-export-with-toc: nil
# End:

(another nice local variable is org-confirm-babel-evaluate: nil, but don’t set that globally, otherwise you could run untrusted code when you export org-mode files from others. When this is set file-local, emacs will ask you for each file you open whether you want to accept the variable setting)

Then write the abstract before the first heading and add \tableofcontents after it. Example:
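
A sketch of such a file layout (for LaTeX export; the text and heading are placeholders):

This paragraph is the abstract, written before the first heading.

#+LaTeX: \tableofcontents

* Intro
  The first section starts after the table of contents.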

Now just call M-x org-redisplay-inline-images to see the screenshot (or add it to the function).

In action:

Have fun with Emacs - and happy hacking!

PS: In case it’s not obvious: The screenshot shows emacs just as the screenshot is being shot - with the method shown here ☺)

Matthew Gregg: @marjoleink "way of life" thing again, but if you can invest some time, org-mode is a really powerful note keeping environment. → Marjolein Katsma: @mcg I'm sure it is - but seriously: can you embed a diagram2 or screenshot, scale it, and link it to itself? ↩

For diagrams, you can just insert a link to the image file without description, then org-mode can show it inline. To get an even nicer user-experience (plain text diagrams or ascii-art), you can use inline code via org-babel using graphviz (dot) or ditaa - the latter is used for the diagrams in my complete Mercurial branching strategy. ↩

Add :results output to the #+begin_src line of the second block to see the print results under that block when you hit C-c C-c in the block.

You can also use properties of headlines for giving the noweb-ref. Org-mode can then even concatenate several source blocks into one noweb reference. Just hit C-c C-x p to set a property (or use M-x org-set-property), then set noweb-ref to the name you want to use to embed all blocks under this heading together.

Note: org-babel prefixes each line of an included code-block with the prefix used for the reference (here <<assign_abc>>). This way you can easily include blocks inside python functions.

Note: To keep noweb-references literally in the output or similar, have a look at the different options to :noweb.

Note: To do this with shell-code, it’s useful to change the noweb markers to {{{ and }}}, because << and >> are valid shell-syntax, so they disturb the highlighting in sh-mode. Also confirming the evaluation every time makes plain exporting problematic. To fix this, just add the following somewhere in the file (to keep this simple, just add it to the end):
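
A sketch of such a block, assuming the org-babel-noweb-wrap-start and org-babel-noweb-wrap-end variables of org-mode 8:

# Local Variables:
# org-babel-noweb-wrap-start: "{{{"
# org-babel-noweb-wrap-end: "}}}"
# org-confirm-babel-evaluate: nil
# End: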

sem is a part of GNU parallel which makes parallel execution easy. Essentially it gives us a simple version of the convenience we know from make.

for i in {1..100}; do
    sem -j -1 [code] # run N-1 processes with N as the number of
                     # processors in my computer
done

This means that the above org-mode block will finish instantly, but there will be a second process managed by GNU parallel which executes the plotting script.

The big advantage here is that I can also set this to execute on exporting a document which might run hundreds of code-blocks. If I did this with naive multiprocessing, that would spawn 100 processes which overwhelm the memory of my system (yes, I did that…).

sem -j -1 ensures that this does not happen. Essentially it provides a process-pool with which it executes the code.

If you use this on export, take care to add a final code-block which waits until all other blocks finished:

sem --wait

A word of caution: Shell escapes

If you use GNU parallel to run programs, the arguments are interpreted two times: once when you pass them to sem and a second time when sem passes them on. Due to this, you have to add escaped quote-marks for every string which contains whitespace. This can look like the following code (the example above reduced to its essential parts):
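
A sketch of the pattern (the script name and title are made up):

# the inner quotes must be escaped, because sem evaluates its arguments a second time
sem -j -1 python plot.py \"a title with spaces\"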

I stumbled over this a few times, but the convenience of GNU parallel is worth the small extra-caution.

Besides: For easier editing of inline-source-code, set org-src-fontify-natively to true (t), either via M-x customize-variable or by adding the following to your .emacs:

(setq org-src-fontify-natively t)

Summary

With the tool sem from GNU parallel you get parallel execution of shell code-blocks in emacs org-mode using the familiar syntax from make:

sem -j -1 [escaped code]

Publish a single file with emacs org-mode

I often write small articles on some experience I make, and since I want to move towards using static pages more often, I tried using emacs org-mode publishing for that. Strangely the simple use case of publishing a single file seems quite a bit more complex than needed, so I document the steps here.

This is my first use of org-publish, so I likely do not use it perfectly. But as it stands, it works. You can find the org-publish version of this article at draketo.de/proj/orgmode-single-file.

Table of Contents

1 Why static pages?

I recently lost a dynamic page to hackers. I could not recover the content from all the spam which flooded it. It was called good news and I had wanted to gather positive news which encourage getting active - but I never really found the time to get it running. See what is left of it: http://gute-neuigkeiten.de

Any dynamic page carries a big maintenance cost, because I have to update all the time to keep it safe from spammers who want to abuse it for commercial spam - in the least horrible case. I can choose a managed solution, but that makes me dependent on the hoster providing what I need. Or I can take the sledgehammer and just use a static site: It never does any writes to the webserver, so there is nothing to hack.

As you can see, that’s what I’m doing nowadays.

2 Why Emacs Org-Mode?

Because after having used MacOS for almost a decade and then various visual-oriented programs for another five years, Emacs is nowadays the program which is most convenient to me. It achieves a level of integration and usability which is still science-fiction in other systems - at least when you’re mostly working with text.

And Org-mode is to Emacs as Emacs is to the Operating System: It begins as a simple todo-list and accompanies you all the way towards programming, reproducible research - and publishing websites.

3 Current Solution

Currently I first publish the single file to FTP and then rename it to index.html. This translates to the following publish settings:
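
A sketch of what such settings can look like - the paths and the FTP target are placeholders, and my real settings also include the rename lambda discussed below:

; sketch of the publish settings with placeholder paths
(require 'ox-publish)
(setq org-publish-project-alist
      '(("orgmode-single-file"
         :base-directory "~/proj/orgmode-single-file/"
         :base-extension "org"
         :publishing-directory "/ftp:user@example.com:/srv/www/proj/orgmode-single-file/"
         :publishing-function org-html-publish-to-html)))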

Now I can use C-c C-e P x orgmode-single-file to publish this file to the webserver whenever I change it.

Note the lambda: I just copy the published file to index.html, because I did not find out how to rename the file by just setting an option. :index-filename did not work. But likely I missed something which would make this much nicer.

Note that if I had wanted to publish a folder full of files, this would have been much easier: There actually is an option to create an automatic index-file and sitemap.

4 Conclusion

This is not as simple as I would like it to be. Maybe (or rather: likely) there is a simpler way. But I can now publish arbitrary org-mode files to my webserver without much effort (and without having to switch context so some other program). And that’s something I’ve been missing for a long time, so I’m very happy to finally have it.

And it was less pain than I feared, though publishing this via my drupal-site, too, obviously shows that I’m still far from moving to static pages for everything. For work-in-progress, this is great, though - for example for my Basics for Guile Scheme.

Read your python module documentation from emacs

I just found the excellent pydoc-info mode for emacs from Jon Waltman. It allows me to hit C-h S in a python file and enter a module name to see the documentation right away. If the point is on a symbol (=module or class or function), I can just hit enter to see its docs.

In its default configuration (see the Readme) it “only” reads the python documentation. This alone is really cool when writing new python code, but it is not enough, since I often use third party modules.

And now comes the treat: If those modules use sphinx for documentation (≥1.1), I can integrate them just like the standard python documentation!

It took me some time to get it right, but now I have all the documentation for the inverse modelling framework I contribute to directly at my fingertips: Just hit C-h S ENTER when I’m on some symbol and a window shows me the docs:

The text in this image is from Wouter Peters. It is used here as a short citation, which should be legal almost everywhere under citation rules.

I want to save you the work of figuring out how to do that yourself, so here’s a short guide for integrating the documentation for your python program into emacs.

Integrating your own documentation into emacs

The prerequisite for integrating your own documentation is to use sphinx for documenting your code. See their tutorial for info how to set it up. As soon as sphinx works for you, follow this guide to integrate your docs in your emacs.

Install pydoc-info

First get pydoc-info and the python infofile (adapt this to your local setup):
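
Once you have it, the Emacs side of the setup is small; a sketch (the load-path and the location of pydoc-info and python.info are assumptions to adapt):

; sketch: adapt the path to where you put pydoc-info (and keep python.info in your Info path)
(add-to-list 'load-path "~/.emacs.d/pydoc-info")
(require 'pydoc-info)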

2.7 Nice quotes

2.7.1 Code B_blockBMCOL

2.7.2 Result B_blockBMCOL

Emacs org-mode is a
great presentation tool -
Fast to beautiful slides.

Arne Babenhauserheide

2.8 Math snippet

2.8.1 Code BMCOLB_block

2.8.2 Inline B_block

\( 1 + 2 = 3 \) is clear

2.8.3 As equation B_block

\[ 1 + 2 \cdot 3 = 7 \]

2.8.4 Result BMCOLB_block

2.8.5 Inline B_block

\( 1 + 2 = 3 \) is clear

2.8.6 As equation B_block

\[ 1 + 2 \cdot 3 = 7 \]

2.9 \( \LaTeX \)

2.9.1 Code BMCOLB_block

\( \LaTeX \) gives a space
after math mode.
\LaTeX{} does it, too.
\LaTeX does not.
At the end of a sentence
both work.
Try \LaTeX. Or try \LaTeX{}.
Only \( \LaTeX \) and \( \LaTeX{} \)
also work with HTML export.

4 Thanks and license

4.1 Thanks

This presentation is licensed under the GPL (v3 or later) with the additional permission to distribute it without the sources and the copy of the GPL if you give a link to those.1

Footnotes:

1 : \tiny As additional permission under GNU GPL version 3 section 7, you may distribute these works without the copy of the GNU GPL normally required by section 4, provided you include a license notice and a URL through which recipients can access the Corresponding Source and the copy of the GNU GPL.\normalsize

Sending email to many people with Emacs Wanderlust

Putting all of them into the BCC field did not work (mail rejected by provider) and when I split it into 2 emails, many did not see my mail because it was flagged as potential spam (they were not in the To-Field)2.

I did not want to put them all into the To-Field, because that would have spread their email-addresses around, which many would not want3.

So I needed a different solution, which I found in the extensibility of emacs and wanderlust4. It now carries the name wl-draft-send-to-multiple-receivers-from-buffer.

You simply write the email as usual via wl-draft, then put all email addresses you want write to into a buffer and call M-x wl-draft-send-to-multiple-receivers-from-buffer. It asks you about the buffer with email addresses, then shows you all addresses and asks for confirmation.

Then it sends one email after the other, with a randomized wait of 0-10 seconds between messages to avoid flagging as spam.

The email was about the birth of my second child, and I wanted to inform all people I care about (of whom I have the email address), which amounted to 220 recipients. ↩

Naturally this technique could be used for real spamming, but to be frank: People who send spam won’t need it. They will already have much more sophisticated methods. This little trick just reduces the inconvenience brought upon us by the measures which are necessary due to spam. Otherwise I could just send a mail with 1000 receivers in the BCC field - which is how it should be. ↩

It only needs one careless friend, and your connections to others get tracked in facebook and the likes. For more information on Facebook, see Stallman about Facebook. ↩

Sure, there are also template mails and all such, but learning to use these would consume just as much time as extending emacs - and would be much less flexible: Should I need other ways to transform my mails, I’ll be able to just reuse my code. ↩

Simple Emacs DarkRoom

I just realized that I let myself be distracted by all kinds of not-so-useful stuff instead of finally getting to type the text I already wanted to transcribe from stenographic notes at the beginning of … last week.

Screenshot!

Let’s take a break for a screenshot of the final version, because that’s what we want from any program :)

As you can see, the distractions are removed — the screenshot is completely full screen and only the text is left. If you switch to the minibuffer (i.e. via M-x), the status bar (modeline) is shown.

Background

To remove the distractions I looked again at WriteRoom and DarkRoom and similar tools which show just the text I want to write. More exactly: I thought about looking at them again, but on second thought I decided to see if I could not just customize emacs to do the same, backed with all the power you get from several decades of being THE editor for many great hackers.

It took some googling and reading emacs wiki, and then some Lisp-hacking, but finally it’s 4 o’clock in the morning and I’m writing this in my own darkroom mode1, toggled on and off by just hitting F11.
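
The mode itself is only a handful of lines. A sketch of the idea (my real code differs in the details, for example in how it brings the modeline back for the minibuffer):

; sketch: toggle a bare fullscreen frame without a modeline
(defun my-darkroom-toggle ()
  "Toggle fullscreen and hide the modeline in the current buffer."
  (interactive)
  (if (frame-parameter nil 'fullscreen)
      (progn
        (set-frame-parameter nil 'fullscreen nil)
        (setq mode-line-format (default-value 'mode-line-format)))
    (set-frame-parameter nil 'fullscreen 'fullboth)
    (setq mode-line-format nil)))
(global-set-key (kbd "<f11>") 'my-darkroom-toggle)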

Also I now activated cua-mode to make it easier to interact with other programs: C-c and C-x now copy/cut when the mark is active. Otherwise they are the usual prefix keys. To force them to be the prefix keys, I can use control-shift-c/-x. I thought this would disturb me, but it does not.

To make it faster, I also told cua-mode to have a maximum delay of 50ms, so I don’t feel the delay. Essentially I just put this in my ~/.emacs:

(cua-mode t)
(setq cua-prefix-override-inhibit-delay 0.005)

Epilog

Well, did this get me to transcribe the text? Not really, since I spent the time building my own DarkRoom/WriteRoom, but I enjoyed the little hacking and it might help me get it done tomorrow - and get far more other stuff done.

I hereby declare that anyone is allowed to use this post and the screenshot under the same licensing as if it had been written in emacswiki.

Actually there already is a darkroom mode, but it only works on Windows. If you use that platform, you might enjoy it anyway. So you might want to call this mode “simple darkroom”, or darkroom x11 :) ↩

Staying sane with Emacs (when facing drudge work)

In the lower left window I check the identifier in the table I have to complete (left column), then I search for all instances of that identifier in the right window and insert the instrument type, the SIGMA (uncertainty due to representation error defined for the type of the instrument and the location of the site) and note whether the site is marked as assimilated in the config file.

Then I also check all the other config files and note whether the site is assimilated there.

Drudge work. There are people who can do this kind of work. My wife would likely be able to do it faster without tool support than I can do it with tool support. But I’m really bad at that: When the task gets too boring I tend to get distracted - for example by writing this article.

To get the task done anyway, I create tools which make it more enjoyable. And with Emacs that’s actually quite easy, because Emacs provides most required tools out of the box.

First off: my workflow before adding tools was like this:

Hit C-x o to switch from the lower left window to the config file at the right.

Use M-x occur, then type the station identifier. This displays all occurrences of the station identifier within the config file in the upper left window.

Hit C-x o twice to switch to the lower left window again.

Type the information into the file.

Switch to the next line and repeat the process.

I now want to simplify this to a single command per line. I’ll use F9 as key, because that isn’t yet used for other things in my Emacs setup and because it is easy to reach and my default keybinding as “useful shortcut for this file”. Other single-keystroke options would be F7 and F8. All other F-keys are already used :)

To make this easy, I define a macro:

Move to the line above the line I want to edit.

Start Macro-recording with C-x C-(.

Go to the beginning of the next line with C-n and C-a.

Activate the mark with C-SPACE and select the whole identifier with M-f.

Make the identifier lowercase with M-x downcase-region, copy it with M-w and undo the downcasing with C-x u (or use the undo key; I defined one in my xmodmap).

Switch to the config file with C-x o

Search the buffer with M-x occur, inserting the identifier with C-y.

Hit C-x o C-x o (yes, twice) to get back into the list of sites.

Move to the end of the instrument column with M-f and kill the word with C-BACKSPACE.
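
Once the recording is ended with C-x C-), one simple way to put the macro onto F9 for the session (a possibility, not necessarily my exact setup) is:

; call the last recorded keyboard macro with F9
(global-set-key (kbd "<f9>") 'call-last-kbd-macro)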

Tutorial: Writing scientific papers for ACPD using emacs org-mode

Emacs Org mode is an excellent tool for reproducible research,1 but research is only relevant if people learn about it.2 To reach people with scientific work, you need to publish your results in a Journal, so I show here how to publish in ACPD with Emacs Org mode.3

lineno.sty. This is required by copernicus, but not included in the package - nor in the texlive version I use.

2 Basic Setup

2.1 Emacs

The first step in publishing to ACPD is to activate org-mode and latex export and to create a latex-class in Emacs. To do so, just add the following to your ~/.emacs (or ~/.emacs.d/init.el) and eval it (for example by moving to the closing parenthesis and typing C-x C-e):
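
The full snippet is not reproduced here; a sketch of the latex-class part (the exact sectioning setup for copernicus_discussions is my assumption), while the reftex part is described below:

; sketch: register the copernicus_discussions class for org latex export
(require 'ox-latex)
(add-to-list 'org-latex-classes
             '("copernicus_discussions"
               "\\documentclass{copernicus_discussions}"
               ("\\section{%s}" . "\\section*{%s}")
               ("\\subsection{%s}" . "\\subsection*{%s}")
               ("\\subsubsection{%s}" . "\\subsubsection*{%s}")))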

The first line adds reftex-citations with C-c [, the rest sets some reftex-defaults and adds a menu which allows you to choose using \citep{} instead of \cite{} (this is what ACPD requires).

For nice source code highlighting, you should also install Pygmentize and then add the following to your .emacs.d:
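
A sketch of that part (these are the common org-mode 8 minted settings, which may differ in details from my real config):

; sketch: export source blocks via minted and let pdflatex call pygmentize
(setq org-latex-listings 'minted)
(add-to-list 'org-latex-packages-alist '("" "minted"))
(setq org-latex-pdf-process
      '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
        "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))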

2.2 The working folder

As next step, unzip the copernicus latex package in the folder you want to use for writing your article (do use a dedicated folder for that: org-mode leaves around some files). And remember to use a version-tracking system like Mercurial, so you can always take snapshots of your current state.

This will give you the following files:

authblk.sty

copernicus.bst

copernicus_discussions.cls

natbib.sty

pdfscreen.sty

pdfscreencop.sty

Ensure that all of them are in your folder, not in a subfolder. If necessary copy them there.

3.2 Delayed table of contents

The table of contents is set to be shown after the Abstract by setting the toc:nil option and later explicitely calling #+TOC: headlines 2. In org-mode this is really straightforward.

3.3 Delayed maketitle

Delaying \maketitle is a bit more convoluted than delaying the TOC. First we add the local variable org-export-allow-bind-keywords: t at the bottom to allow file-local custom bindings for functions in the file, then we deactivate the title-command with #+BIND: org-latex-title-command "" and finally we add \maketitle where we need it.

3.4 Define minted style

This defines the variables minted uses for beautiful code-blocks. Without this, your code-blocks will just look like inline text.

3.5 Intro and conclusions

The Introduction and the conclusions have their own commands in ACPD, because they use them to add bookmarks. You can also use the commands to specify another name.

We call the commands with #+LaTeX: (just like some others), which allows us to explicitly add arbitrary LaTeX-code.

3.6 Appendix

The appendix should be used sparingly. It changes the numbering of the pages.

#+Latex: \appendix

3.7 Bibliography

The bibliography allows referring to entries from your general bibtex-file. Ensure that you use the correct absolute path to that file. For more information, see the org-tutorial page for biblatex.

3.8 Babel evaluate without confirmation

This allows us to just run all code snippets which we embedded in the document when we export the file. If we do not set this local variable, we have to acknowledge each source block before it runs (the block with local variables also contains the variable which allows binding functions on a per-file basis, as explained above).

4 Conclusion

With this setup, you can publish your paper with ACPD using org-mode for the actual writing, which has a much lower overhead than LaTeX and offers quite a few unique features for more efficient working - from easy referencing through inline math preview to planning and code-evaluation directly in your file.

Footnotes:

Research, or rather science, not only means to learn new things and to uncover secrets, but just as importantly to share what you learn. Fun fact: The German word for science is “Wissenschaft”, built from the words “wissen” (knowledge) and “schaft” (from schaffen: to create), so it more exactly captures the essence of scientific work than the word “science”, which is based on the Latin word “scientia” and just means knowledge. It isn’t enough to just learn. Creating knowledge requires telling it to others, so they can build upon it.

I chose ACPD as target for this article, because it is an Open Access journal, and because I want to publish in it (which makes it a rather natural choice for a tutorial).

Unicode char \u8:χ not set up for use with LaTeX: Solution (made easy with Emacs)

For years I regularly stumbled over LaTeX errors of the form Unicode char \u8:χ not set up for use with LaTeX. I always took the chicken’s path and replaced the unicode characters with the tex-escapes in the file. That was easy, but it made my files needlessly unreadable. Today I decided to FIX the problem once and for all. And it worked. Easily.

Firstoff: The problem I’m facing is that my keyboard layout makes it effortless for me to input characters like ℂ Σ and χ. But LaTeX cannot cope with them out-of-the-box. Org-mode already catches most of these problems, so I can write things like x² instead of x^2, but occasionally it stumbles.

The solution to that is actually pretty simple: I only need to declare the escapes-sequences LaTeX should use when it sees one of the characters (to be used before \begin{document}!):

\DeclareUnicodeCharacter{03C7}{\chi}

Or in org-mode:

#+LaTeX_HEADER: \DeclareUnicodeCharacter{03C7}{\chi}

Thanks go to Wikibooks:LaTeX for this. Their solution then suggests reading several Unicode definition documents to track down the codepoint of the character. But we can make that easier with Emacs (almost everything is easier with Emacs ☺).

Instead of browsing huge documents manually, we simply rely on the unicode-definitions in Emacs: Move the cursor over the char and execute M-x describe-char.

Unicode-Characters for TODO-States in Emacs Orgmode

By default Emacs Orgmode uses uppercase words for todo keywords. But having tens of entries marked with TODO and DONE in my file looked horribly cluttered to me. So I searched for alternatives. After a few months of experimentation, I decided on the following scheme. It served me well ever since:

❢ To do

☯ In progress

⚙ A program is running (optional detail)

✍ I’m writing (optional detail)

⧖ Waiting

☺ To report

✔ Done

⌚ Maybe do this at some later time

✘ Won’t do

To set this in org-mode, just add the following to the header (and reopen the document, for example with C-x C-v):

#+SEQ_TODO: ❢ ☯ ⧖ | ☺ ✔ ⌚ ✘

or for the complex case (with details on what I do)

#+SEQ_TODO: ❢ ☯ ⚙ ✍ ⧖ | ☺ ✔ ⌚ ✘

Then use C-c C-t or SHIFT-→ (shift + right arrow) to switch to the next state or SHIFT-← (shift + left arrow) to switch to the previous state.

Anything before the | in the SEQ_TODO is shown in red (not yet done), anything after the | is shown in green (done). Things which get triggered when something is done (like storing the time of a scheduled entry) happen when the state crosses the |.

And with that, my orgmode documents are not only very useful but also look pretty lean. Just as good as having a GUI with images, but additionally I can access them over SSH and edit the todo state with any tool - because it’s just text.

Use the source, Luke! — Emacs org-mode beamer export with images in figure

I just needed to tweak my Emacs org-mode to beamer-latex export to embed images into a figure environment (not wrapfigure!). After lots of googling and documentation reading I decided to bite the bullet and just read the source. Which proved to be much easier than I had expected.

This tutorial requires at least org-mode 8.0 (before that you had to use hacks to get figure without a caption). It is only tested for org-mode 8.0.2: The code you see when you read the source might look different in other versions.

2 Use the Source!

After lots of googling and documentation reading I decided to bite the bullet and just read the source. Which proved to be much easier than I had expected (warning: obscure list of commands follows. Will be explained afterwards):

3 Commands Explained

For all those who are not fluent in emacs commands, here’s a short breakdown of my source-reading process:

C-h f org-latex-export-as-latex

Get the help (Control-h) for the function (f) org-latex-export-as-latex. I knew that org-mode calls that. If you did not know it, you could have simply used C-h k C-e (get help on the export keyboard shortcut) which would have led you to the function org-export-dispatch and the source file ox.el. But since the org-mode guides tell you to use M-x org-latex-export-as-latex, the function to search for is actually pretty obvious. Alternatively just use M-x org-latex- and then type TAB 2 times. That will show you all the export functions.

C-x C-o

Switch to the other buffer.

C-s .el C-b ENTER

Focus on the source file and open it (the canonical suffix for emacs lisp files is .el).

C-s figure C-s C-s C-s ...

Search for figure. Repeat 9 times to find the correct place in the code (in emacs that’s really easy and fast to do).

Voilà, you found the snippet which tells you that you can use the float-keyword (:float) with the argument "figure".

4 Conclusion

Using the source was actually faster than googling in this case - and if you practise it, you learn bits and pieces about the foundation of the program you use, which will enable you to adapt it even better to your needs in the future.

Wish: KDE with Emacs-style keyboard shortcuts

I would love to be able to use KDE with emacs-style keyboard shortcuts, because Emacs offers a huge set of already clearly defined shortcuts for many different situations. Since its users tend to do very much with the keyboard alone, even more obscure tasks are available via shortcuts.

I think that this would be useful, because Emacs is like a kind of nongraphical desktop environment itself (just look at emacspeak!). For all those who use Emacs in a KDE environment, it could be a nice timesaver to be able to just use their accustomed bindings.

It also has a mostly clean structure for the bindings:

"C-x anything" does changes which affect things outside the content of the current buffer.

"C-c anything" is kept for specific actions of programs. For example "C-c C-c" in an email sends the email, while "C-c C-c" in a version tracking commit message finishes the message and starts the actual commit.

"C-anything but x or c" acts on the content you're currently editing.

"M-x" opens a 'command-selection-dialog' (just like alt-f2). You can run commands by name.

"M-anything but x" is a different flavor of "C-anything but x or c". For example "C-f" moves the cursor one character forward, while "M-f" moves one word forward. "C-v" moves one page forward, while "M-v" moves one page backwards.

On the backend side, this would require being able to define multistep shortcuts. Everything else is just porting the emacs shortcuts to KDE actions.

The actual porting of shortcuts would then require mapping of the Emacs commands to KDE actions.

Some examples:

"C-s" searches in a file. Replaces C-f.

"C-r" searches backwards.

"C-x C-s" saves a file -> close. Replaces C-w.

"C-x C-f" opens a file -> Open. Replaces C-o.

"C-x C-c" closes the program -> quit. Replaces C-q.

"C-x C-b" switches between buffers/files/tabs -> switch the open file. Replaces alt-right_arrow and a few other (to my knowledge) inconsistent bindings.

"C-x C-2" splits a window (or part of a window) vertically. "C-x C-o" switches between the parts. "C-x C-1" undoes the split and keeps the currently selected part. "C-x C-0" undoes the split and hides the currently selected part.

Write multiple images on a single page in org-mode.

How to show multiple images on one page in the latex-export of emacs org-mode. I had this problem. This is my current solution.

The mail in ~/.local/share/mail is fetched via fetchmail and procmail to have a really reliable mail fetching system which does not rely on a non-broken database or free space on the disk to keep working…

keep auto-complete from competing with org-mode structure-templates

For a long time it bothered me that auto-complete made it necessary for me to abort completion before being able to use org-mode templates.

I typed <s and auto-complete showed stuff like <string, forcing me to hit C-g before I could use TAB to complete the template with org-mode.

I fixed this for me by adding all the org-mode structure templates as stop-words:
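
A sketch of the idea (only a few of the templates are listed here; ac-stop-words comes from auto-complete):

; sketch: stop auto-complete on the org-mode structure template triggers
(require 'auto-complete)
(dolist (word '("<s" "<e" "<q" "<v" "<c" "<l" "<h" "<a" "<i"))
  (add-to-list 'ac-stop-words word))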

There is also a German Version to this Page: Freie Software. Most articles are not translated, so the content on the german page and on the english page is very different.

For me, Gentoo is about *convenient* choice

It's often said that Gentoo is all about choice, but that doesn't quite fit what it is for me.

After all, the highest ability to choose is Linux from scratch and I can have any amount of choice in every distribution by just going deep enough (and investing enough time).

What really distinguishes Gentoo for me is that it makes it convenient to choose.

Since we all have a limited time budget, many of us only have real freedom to choose because we use Gentoo, which makes it possible to choose with the distribution-tools. Therefore only calling it “choice” doesn't ring true in general - it misses the reason why we can choose.

So what Gentoo gives me is not just choice, but convenient choice.

Some examples to illustrate the point:

KDE 4 without qt3

I recently rebuilt my system after deciding to switch my disk layout (away from reiserfs towards a simple ext3 with reiser4 for the portage tree). When doing so I decided to try to use a "pure" KDE 4 - that means, a KDE 4 without any remains from KDE3 or qt3.

To use kde without any qt3 applications, I just had to put "-qt3" and "-qt3support" into my useflags in /etc/make.conf and "emerge -uDN world" (and solve any arising conflicts).

Imagine doing the same with a (K)Ubuntu...

Emacs support

Similarly, to enable emacs support on my GentooXO (for all programs which can have emacs support), I just had to add the "emacs" useflag and "emerge -uDN world".

Selecting which licenses to use

Just add

ACCEPT_LICENSE="-* @FSF-APPROVED @FSF-APPROVED-OTHER"

to your /etc/make.conf to make sure you only get software under licenses which are approved by the FSF.

For only free licenses (regardless of the approved state) you can use:

ACCEPT_LICENSE="-* @FREE"

All others get marked as masked by license. Default (no ACCEPT_LICENSE in /etc/make.conf) is “* -@EULA”: No unfree software. You can check your setting via emerge --info | grep ACCEPT_LICENSE. More information…

One program (suite) in testing, but the main system rock stable

Another part where choosing is made convenient in Gentoo are testing and unstable programs.

I remember my pain with a Kubuntu, where I wanted to use the most recent version of Amarok. I either had to add a dedicated Amarok-only testing repository (which I'd need for every single testing program), or I had to switch my whole system into testing. I did the latter and my graphical package manager ceased to work. Just imagine how quickly I ran back to Gentoo.

And then have a look at the ease of deciding to take one package into testing in Gentoo:

emerge --autounmask-write =category/package-version

etc-update

emerge =category/package-version

EDIT: Once I had a note here “It would be nice to be able to just add the missing dependencies with one call”. This is now possible with --autounmask-write.

And for some special parts (like KDE 4) I can easily say something like
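
One illustrative possibility (not necessarily the exact line I used) is to put the KDE packages into the testing branch via package.accept_keywords:

# illustration: take all KDE 4 packages from the testing branch
echo "kde-base/* ~amd64" >> /etc/portage/package.accept_keywords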

It’s still a bit slower than eix, but it operates directly on the portage tree and my overlays — and I no longer have to use eix-sync for syncing my overlays, just to make sure eix is updated.

Some other treats of pkgcore

Aside from pquery, pkgcore also offers pmerge to install packages (almost the same syntax as emerge) and pmaint for synchronizing and other maintenance stuff.

From my experience, pmerge is hellishly fast for simple installs like pmerge kde-misc/pyrad, but it sometimes breaks with world updates. In that case I just fall back on portage. Both are Python, so when you have one, adding the other is very cheap (spacewise).

Also pmerge has the nice pmerge @glsa feature: Get Gentoo Linux security updates. Due to its almost unreal speed (compared to portage), checking for security updates now doesn’t hurt anymore.

It differs from portage in that you call world as a set explicitly — either via a command like pmerge -aus world or via pmerge -au @world.

pmaint on the other hand is my new overlay and tree synchronizer. Just call pmaint sync to sync all, or pmaint sync /usr/portage to sync only the given overlay (in this case the portage tree).

Caveats

Using pix as a replacement for eix isn’t yet perfect. You might hit some of the following:

pix always shows all packages in the tree and the overlays. The keywords are only valid for the highest version, though. marienz from #pkgcore on irc.freenode.net is working on fixing that.

If you only want to see the packages which you can install right away, just use pquery -nv. pix is intended to mimic eix as closely as possible, so I don’t have to change my habits ;) If it doesn’t fit your needs, just change the alias.

To search only in your installed packages, you can use pquery --vdb -nv.

Sometimes pquery might miss something in very broken overlay setups (like my rather overgrown one). In that case, please report the error in the bugtracker or at #pkgcore on irc.freenode.net:

23:27 <marienz> if they're reported on irc they're probably either
                fixed pretty quickly or they're forgotten
23:27 <marienz> if they're reported in the tracker they're harder
                to forget but it may take longer before they're
                noticed

I hope my text helps you in changing your Gentoo system further towards the system which fits you best!

This video shows the activity of the Hurd coders and answers some common questions about the Hurd, including “How stagnated is Hurd compared to Duke Nukem Forever?”. It is created directly from commits to Hurd repositories, processed by community codeswarm.

Every shimmering dot is a change to a file. These dots align around the coder who made the change. The questions and answers are quotes from today's IRC discussions (2010-07-13) in #hurd at irc.freenode.net.

You can clearly see the influx of developers in 2003/2004 and then again a strengthening of the development in 2008, with fewer participants but higher activity than in 2003 (though a part of that change likely comes from the switch to git, with generally more but smaller commits).

I hope you enjoyed the high-level look at the activity of the Hurd project!

PS: The last part is only the information title with music to honor Sean Wright for allowing everyone to use and adapt his Album Enchanted.

Some technical advantages of the Hurd

→ An answer to just accept it, truth hurds, where Flameeyes gave his reasons for not liking the Hurd and asked for technical advantages (claiming that the Hurd does not offer a single concept which got incorporated into other free software or otherwise contributed to other projects). Note: These are the points I see. Very likely there are more technical advantages which I do not understand well enough to explain.

Information for potential testers: The Hurd is already usable, but it is not yet in production state. It progressed a lot during the recent years, though. Have a look at the status report if you want to see if it’s already interesting for you. See running the Hurd for testing it yourself.

Thanks for explaining your reasons. As answer:

Influence on other systems: FUSE in Linux and limited translators in BSD

translator-based filesystem

On the bare technical side, the translator-based filesystem stands out: The filesystem allows making arbitrary programs responsible for presenting a given node (which can also be a directory tree) and starting these programs on demand. To make them persistent over reboots, you only need to add them to the filesystem node (for which you need the right to change that node). You can also start translators on any node without changing the node itself, but then they are not persistent and only affect your view of the filesystem, without affecting other users. These translators are called active, and you don’t need write permissions on a node to add them.

network transparency on the filesystem level

The filesystem provides what Gnome VFS (gvfs) and KDE network transparency offer, but on the filesystem level, so it is available to all programs. And you can add a new filesystem as a simple user, just as if you wrote into a file “instead of this node, show the filesystem you get by interpreting file X with filesystem Y” (this is what you actually do when setting a translator without starting it yet: a passive translator).
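The command the next paragraph describes is not preserved in this copy; going by the Hurd's documented ftpfs translator, it presumably looked roughly like this (treat it as a sketch, not the verbatim original):

# let the node "ftp:" present FTP hosts as directories (a passive translator)
settrans -c ftp: /hurd/hostmux /hurd/ftpfs /
# now every program can use FTP paths like local files, for example:
dpkg -i ftp:/ftp.gnu.org/path/to/*.deb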

This installs all deb-packages in the folder path/to on the FTP server. The shell sees normal directories (beginning with the directory “ftp:”), so shell expressions just work.

You could even define a Gentoo mirror translator (settrans mirror\: /hurd/gentoo-mirror), so every program could just access mirror://gentoo/portage-2.2.0_alpha31.tar.bz2 and get the data from a mirror automatically: wget mirror://gentoo/portage-2.2.0_alpha31.tar.bz2

unionmount as user

Or you could add a unionmount translator to root which makes writes happen at another place. Every user is able to make a readonly system readwrite by just specifying where the writes should go. But the writes only affect his view of the filesystem.

persistent translators, started when needed

Starting a network process is done by a translator, too: The first time something accesses the network card, the network translator starts up and actually provides the device. This replaces most initscripts in the Hurd: Just add a translator to a node, and the service will persist over restarts.

It’s a surprisingly simple concept, which reduces the complexity of many basic tasks needed for desktop systems.

And at its most basic level, Hurd is a set of protocols for messages which allow using the filesystem to coordinate and connect processes (along with helper libraries to make that easy).

add permissions at runtime (capabilities)

Also it adds POSIX compatibility to Mach while still providing access to the capabilities-based access rights underneath, if you need them: You can give a process permissions at runtime and take them away at will. For example you can start all programs without permission to use the network (or write to any file) and add the permissions when you need them.

Unlike on Linux, you do not need to start privileged and drop the permissions you do not need (governed by the program which is run); instead you start as an unprivileged process and add the permissions you need (governed by an external process):

groups # → root
addauth -p $(ps -L) -g mail
groups # → root mail

lightweight virtualization

And then there are subhurds (essentially lightweight virtualization which allows cutting off processes from other processes without the overhead of creating a virtual machine for each process). But that’s an entire post of its own…

Easy to test lowlevel hacking

And the fact that a translator is just a simple standalone program means that translators can be shared and tested much more easily, opening up completely new options for lowlevel hacking, because it massively lowers the barrier to entry.

For example the current Hurd can use the Linux network device drivers and run them in userspace (via DDE), so you can simply restart them and a crashing driver won’t bring down your system.

subdividing memory management

And then there is the possibility of subdividing memory management and using different microkernels (by porting the Hurd layer, as partly done in the NetBSD port), but that is purely academic right now (search for Viengoos to see what it’s about).

Summary

So in short:

The translator system in the Hurd is a simple concept which makes many tasks easy that are complex on Linux (like init, network transparency, new filesystems, …). Additionally there are capabilities (give programs only the access they need - at runtime), subhurds and (academic) subdivided memory management.

P5S: As an update in 2015: A pretty interesting development I have seen in the past few years is that the systemd developers have been bolting features onto Linux which the Hurd already provided 15 years ago. Examples: socket activation provides on-demand startup like passive translators, but as a crude hack piggybacked on dbus which can only be used by dbus-aware programs, while passive translators can be used by any program which can access the filesystem; calling privileged programs via systemd provides jailed privilege escalation like adding capabilities at runtime, but again as a crude hack piggybacked on dbus and specialized services.

That means there is a need for the features of the Hurd, but instead of just using the Hurd, where they are cleanly integrated, these features get bolted onto a system where they do not fit, and they suffer from bad performance because lots of unnecessary cruft is required to circumvent limitations of the base system. The clean solution would be to put 2-3 full-time developers onto resolving the last few blockers (mainly sound and USB) and then just use the Hurd.

(A)GPL as hack on a Python-powered copyright system

AGPL is a hack on copyright, so it has to use copyright, else it would not compile/run.

All the GPL licenses are a hack on copyright. They insert a piece of legal code into copyright law to force it to turn around on itself.

You run that on the copyright system, and it gives you code which can’t be made unfree.

To be able to do that, it has to be written in copyright language (else it could not be interpreted).
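To make the analogy concrete, here is a minimal Python sketch of such a copyright-as-runtime system - all names in it are hypothetical illustrations, not from any real code or legal text:

def copyright(work):
    # the copyright system: it simply enforces whatever the license
    # function attached to the work
    return work

def AGPL(code):
    # the hack on copyright: code run through it cannot be made unfree
    return {"code": code, "freedoms": "guaranteed for everyone"}

# someone else sneaks in a new definition after yours,
# but before your call to copyright():
def AGPL(code):
    return {"code": code, "owner": "someone else"}

print(copyright(AGPL("my program")))
# -> {'code': 'my program', 'owner': 'someone else'}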

In this Python-powered copyright system, I could just add such a redefinition after your definition but before your call to copyright(), and all calls to AGPL(code) would suddenly return code owned by me.

Or you would have to include another way of defining which exact AGPL you mean. Something like “AGPL, but only the versions with the sha1 hashes AAAA, BBBB and AABA”. Creative Commons tries to use links for that, but what do you do if someone changes the DNS resolution to point creativecommons.org to allmine.com? Whose DNS server is right, then - legally speaking?

In short: AGPL is a hack on copyright, so it has to use copyright, else it would not compile/run.

Communicating your project is an essential step for getting
users. Here I summarize my experience from working on several different projects, including KDE (where I learned the basics of PR - yay, sebas!), the Hurd (where I could really make a difference by improving the frontpage and writing the Month of the Hurd), Mercurial (where I practiced minimally invasive PR) and 1d6 (my own free RPG, where I see how much harder it is to do PR if the project to communicate is your own).

Since voicing the claim that marketing is important often leads to discussions with people
who hate marketing of any kind, I added an appendix with an example which
illustrates nicely what happens when you don’t do any PR - and what
happens if you do PR of the wrong kind.

If you’re pressed for time and want the really short form, just jump to the questionnaire.

What is good marketing?

Before we jump directly to the guide, there is an important term to define: good marketing. That is the kind of marketing we want to do.

The definition I use here is this:

Good marketing ensures that the people to whom a project would be
useful learn about the project.

and

Good marketing starts with the existing strengths of a project and
finds people to whom these strengths are useful.

Thus good marketing does not try to interfere with the greater plan of
the project, though it might identify some points where a little effort
can make the project much more interesting to users. Instead it finds
users to whom the project as it is can be useful - and ensures that
these know about the project.

Be fair to competitors, be honest to users, put the project goals before generic marketing considerations.

As such, good marketing is an interface between the project and its
(potential) users.

How to communicate your project?

This guide depends on one condition: Your project already has at least one area in which it excels over other projects. If that isn’t the case, please start by making your project useful to at least some people.

The basic way for communicating your project to its potential users
always follows the same steps.

To make this text easier to follow, I’ll intersperse it with examples from the latest project where I did this analysis: GNU Guile, the GNU Ubiquitous Intelligent Language for Extensions. Guile provides a nice example because its mission is clearly established in its name and it has lots of backing, but up until our discussion it actually had a Wikipedia page which was unappealing to the point of being hostile towards Guile itself.

To improve the communication of our project, we first identify our target groups.

Who are our Target Groups?

To do so, we begin by asking ourselves who would profit from our project:

What can we do well and how do we compare to others?

To whom would we already be useful or interesting if people knew about our strengths?

To whom are we already the best option?

Try to find about 3 groups of people and give them names which identify them. Those are the people we must reach to grow in the short term.

In the next step, we ask ourselves whom we want or need as users to fulfill our mission (our long-term goal):

Where do we want to get? What is our goal? (do we have a mission statement?)

Whom do we need to get there?

Whom do we want as users? Those shape us as they take part in the development - either as users or as fellow developers.

Again try to find about 3 groups of people and give them names which identify them. Those are the people we must reach to achieve our long-term goal. If, while writing this down, you find that one of the already identified groups which we could reach would actually lead us away from our goal, mark it. If that group isn’t direly needed, we would do best to avoid targeting it in our communication, because it will hinder our long-term progress: it could become a liability which we cannot get rid of again.

Now we have about 6 target groups: Those are the people who should know about our project, either because they would benefit from it in pursuing their goals, or because we need to reach them to achieve our own goals. We now need to find out which kind of information they actually need or search for.

GNU-folk: What does Guile offer my program? (Being a GNU package is a distinct advantage, so there is less competition by non-GNU languages)

Whose wishes can we fulfill?

If our answers for a given group are not yet strong enough, we cannot yet communicate our project convincingly to them. In that case it is best to postpone reaching out to that group, otherwise they could get a lasting weak image of our project which would make it harder to reach them when we have stronger answers at some point in the future.

Remove all groups whose wishes we cannot yet fulfill, or for whom we do not see ourselves as the best choice.

Learner: There aren’t yet tutorials for learning to program in Guile, though there are tutorials for learning to write Scheme - and even one for understanding Scheme from the view of a Python user. But our project resources cannot yet support complete programming beginners well enough, so we have to restrict ourselves to programmers who want to learn a new language.

Starter: Guile has solid support for many unix-specific things, but it is not yet a complete project-publishing solution. So we have to restrict ourselves to targeting people who want to start a project which is mainly intended to be used in environments with proper package management (mostly GNU/Linux).

Emacser: Guile provides foreign function calls. If Guile gets used as the base for Emacs, Emacs users get direct access to all Scheme functions, too - as well as real threading. And that’s pretty strong. Also, Geiser provides solid Guile Scheme support in Emacs.

GNU-folk: They are either extenders or project starters or learners, so we don’t need to treat them as their own group.

Provide those answers!

Now we have answers for the target groups. When we now talk or write about our project, we should keep those target groups in mind.

You can make that arbitrarily complex, for example by trying to find out which of our target groups use which medium. But let’s keep it simple:

Ensure that our website (and potentially existing wikipedia page) includes the information which matters to our target groups. Just take all the answers for all the target groups we can already reach and check whether the basic information contained in them is given on the front page of our website.

And if not, find ways to add it.

As next steps, we can make sure that the questions we found for the target groups not only get answered, but directly lead the target groups to actions: For example to start using our project.

Example: The new Wikipedia-Page of Guile

For Guile, we used this analysis to fix the Wikipedia page. The old version mainly talked about history and weaknesses (to the point of sounding hostile towards Guile), and aside from the latest release number, it was horribly outdated. And it did not provide the information our target groups required.

The current Wikipedia page of GNU Guile works much better - for the project as well as for the readers of the page. Just compare them directly and you’ll see quite a difference. But aside from sounding nicer, the new page also addresses the questions of our target groups. To check that, we now ask: Did we include information for all the
potential user groups?

Schemers: Yepp (it’s Scheme and there’s a section on Guile Scheme)

Extenders: Yepp (libguile)

Learners: Not yet. We might need a syntax-section with some examples. But wikipedians do not like Howto-Like sections. Also the interpreter should get a notice.

Project-Starters: Partly in the “core idea”-part in the section Guile Scheme. It might need one more paragraph showing advantages of Guile which make it especially suited for that.

1337s: It is the preferred extension system for the GNU Project. If you’re not that kind of 1337: The Macro-System is hygienic (no surprising side-effects).

Emacs users: They got their own section.

So there you go: Not perfect, but most of the groups are covered. And this also ensures that the Wikipedia-page is more interesting to its readers: A clear win-win.

Further points

Additional points which we should keep in mind:

On the website, do all of our target groups quickly find their way to advanced information about their questions? This is essential to keep the ones interested who aren’t completely taken by the short answers.

What is a common negative misconception about our project? We need to ensure that we do not write anything which strengthens this misconception. Is there an existing strength, which we can show to counter the negative misconception?

Where do we want to go? Do we have a mission statement?

bab-com q: Arne Babenhauserheide’s Project Communication Questionnaire

For whom are we already useful or interesting? Name them as Target-Groups.

(1)

(2)

(3)

Whom do we want as users in the long run? Name them as Target-Groups.

(4)

(5)

(6)

What could the Target-Groups ask? What are their needs? Formulate them as questions.

(1)

(2)

(3)

(4)

(5)

(6)

Answer their questions.

(1)

(2)

(3)

(4)

(5)

(6)

Whose needs can we already fulfill well? For whom do we see ourselves as the best choice?

(1)

(2)

(3)

(4)

Ensure that our communication includes the answers to these questions (i.e. website, wikipedia page, talks, …), at least for the groups who are likely to use the medium on which we communicate!

Use bab-com to avoid bad-com ☺

Note: The mission statement

The mission statement is a short paragraph in which a project defines its goal.

A good example is:

Our mission is to create a general-purpose kernel suitable for the GNU operating system, which is viable for everyday use, and gives users and programs as much control over their computing environment as possible. → GNU Hurd mission explained

Another example again comes from Guile:

Guile was conceived by the GNU Project following the fantastic success of Emacs Lisp as an extension language within Emacs. Just as Emacs Lisp allowed complete and unanticipated applications to be written within the Emacs environment, the idea was that Guile should do the same for other GNU Project applications. This remains true today. → Guile and the GNU project

Closely tied to the mission statement is the slogan: a catch-phrase which helps anchor the gist of your project in your readers’ minds. Guile does not have one yet, but judging from its strengths, the following could work quite well for Guile 2.0 - though it falls short of Guile in general:

GNU Guile scripting: Use Guile Scheme, reuse anything.

Summary

We saw why it is essential to communicate the project to the outside world, and we discussed a simple structure to check whether our way of communicating actually fits our project’s strengths and goals.

Finding the communication strategy actually boils down to 3 steps:

Target those who would profit from our project or whom we need.

Check what they need to know.

Answer that.

Also a clear mission statement, slogan and project description help to make the project more tangible for readers. In this context, good marketing means to ensure that people learn about the real strengths of the project.

With that I’ll conclude this guide. Have fun and happy hacking!
— Arne Babenhauserheide

Appendix: Why communicating your project?

In free software we often think that quality is a guarantee for
success. But in just the 10 years I have been using free software, I have
seen my share of technically great projects succumb to inferior projects
which simply reached more people and used that to build a momentum
which greatly outpaced the technically better product.

One example of that is pkgcore versus paludis. When portage, the
package manager of Gentoo, grew too slow because it did ever more
extensive tests, two teams set out to build a replacement.

One of the teams decided that the fault of the low performance lay in
Python, the language used by portage. That team built a package
manager in C++ and had --wonderfully-long-command-options without
shortcuts (have fun typing), and you actually had to run it twice:
Once to see what would get installed and then again to actually
install it (while portage had had an --ask option for ages, with -a as
shortcut). And it forgot all the work it had done in the previous run,
so you could wait twice as long for the result. They also had
wonderful Latin names, and they managed the feat of being even slower
than portage, despite being written in C++. So their claim that
C++ would be magically faster than Python was simply wrong. They
called their program paludis.

Note: Nowadays paludis has a completely new commandline interface which actually supports short command options. That interface is called cave and looks sane.

The other team did a performance analysis and realized that the low
performance actually lay with the filesystem: The portage tree, which
holds the required information, contains about 30,000 ebuilds and
almost 200,000 files in total, and portage accessed far more of those
files than were actually needed to resolve the dependencies for
installing a package. They picked Python as their language - just like
portage. They used almost the same commandline options as portage,
except for the places where functionality differed. And they actually
got orders of magnitude faster than portage - so fast that their
search command often finished in less than a second, while portage
took over 10 seconds. They called their program pkgcore.

Both had more exact resolution of packages and could break cyclic
dependencies and so on.

So, judging from my account of the quality, which project would you
expect to succeed?

I sure expected pkgcore to replace portage within a few months. But
this is not what happened. And as I see it in hindsight, the
difference lay purely in PR.

The paludis team with their slow and hard-to-use program went all over
the Gentoo forums claiming that Python is a horrible language and that
a C++ program would kick portage any time. On their website they repeated
their attacks against Python and claimed superiority at every
step. And they gathered quite a few zealots - while actually being
slower than portage. Eventually they rebranded paludis as just better
and more correct, not faster. And they created their own distribution
(exherbo) as a direct rival of Gentoo. With a new, portage-incompatible
package format. As if they had read the book on how not to be a
friendly competitor.

The pkgcore team on the other hand focussed on good technology. They
created the snakeoil library for high-performance python code, but
they were friendly about it and actually contributed back to portage
where code could be shared. But their website was out of date, often
not noting the newest release and you actually had to run pmerge
--help to see the most current commandline options (though you could
simply guess them if you knew portage). And they got attacked by
paludis zealots so much that this year the main developer finally
scrapped the project: He told me on IRC that he had taken so much
vitriol over the years that it simply wasn’t worth the cost anymore.

So, what can we learn from this? Technical superiority does not gain
you anything, if you fail to convince people to actually use your
project.

If you don't communicate your project, you don't get users. If you don’t get users, your chances of losing motivation are orders of magnitude higher than if you get users who support you.

And aggressive marketing works, even if you cannot actually deliver on
your promises. Today they have a better user interface and even short
option names. But even to date, exherbo has far fewer
packages in its repositories than Gentoo. If the number of files is
any measure, the 10,000 files in their special repositories are just
about 5% of the almost 200,000 files portage holds. But they managed
quite well to fragment the Gentoo user base - at least for some time. And
their repeated pushes for new standards in the portage tree (EAPIs)
created constant pressure on pkgcore to adapt, which had the effect
that nowadays pkgcore cannot install from the portage tree anymore
(the search still works, though, and I still use it - I will curse
mightily on the day they manage to break that, too).

So aggressive marketing and doing everything in the book of unfriendly
competition might have allowed the paludis devs to gather some users
and destroy the momentum of pkgcore, but it did not allow them to
actually become a replacement of portage within Gentoo. Their
behaviour alienated far too many people for that. So aggressive and
unfriendly marketing is better than no marketing, but it has severe
drawbacks which you will likely want to avoid.

If you use overly aggressive, unfriendly or dishonest communication
tactics, you get some users, but if your users know their stuff, you
won’t win the mindshare you need to actually make a difference.

If on the other hand you want to see communication done right, just
take a look at KDE and Gnome nowadays. They cooperate quite
well, and they compete on features and by improving their projects so
users can make an informed choice about the project they use.

And their number of contributors steadily keeps growing.

So what do they do? Besides being technically great, it boils down to
good marketing.

Download one page from a website with all its prerequisites

Often I want to simply back up a single page from a website. Until now I always had half-working solutions, but today I found one solution using wget which works really well, and I decided to document it here. That way I won’t have to search for it again, and you, dear readers, can benefit from it, too ☺
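The exact command is not preserved in this copy, but assembled from the options explained below it should look roughly like this ([target-folder-name] and [URL-of-the-page] are placeholders):

wget --no-parent --timestamping --page-requisites --convert-links \
     --no-directories --no-host-directories --span-hosts \
     --adjust-extension --no-check-certificate -e robots=off \
     -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' \
     --directory-prefix=[target-folder-name] [URL-of-the-page]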

The meaning of the options (and why you need them):

--no-parent: Only get this file, not other articles higher up in the filesystem hierarchy.

--timestamping: Only get newer files (don’t redownload files).

--page-requisites: Get all files needed to display this page.

--convert-links: Change files to point to the local files you downloaded.

--no-directories: Do not create directories: Put all files into one folder.

--no-host-directories: Do not create separate directories per web host: Really put all files in one folder.

--span-hosts: Get files from any host, not just the one with which you reached the website.

--adjust-extension: Add a .html extension to the file.

--no-check-certificate: Do not check SSL certificates. This is necessary if you’re missing the certificate authority of one of the hosts. Just use it. If people with enough power to snoop on your browsing wanted to serve you a changed website, they could simply use one of the fake certificate authorities they control.

-e robots=off: Ignore robots.txt files which tell you to not spider and save this website. You are no robot, but wget does not know that, so you have to tell it.

-U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4': Fake being an old Firefox to avoid blocking based on being wget.

--directory-prefix=[target-folder-name]: Save the files into a subfolder to avoid having to create the folder first. Without that option, all files are created in the folder in which your shell is at the moment. Equivalent to mkdir [target-folder-name]; cd [target-folder-name]; [wget without --directory-prefix]

Conclusion

If you know the required options, mirroring single pages from websites with wget is fast and easy.

Note that if you want to get the whole website, you can just replace --no-parent with --mirror.

Happy Hacking!

Fix Quod Libet empty panes on Gentoo GNU/Linux (bug solving process)

For a few days now my Quod Libet has been broken, showing only empty space instead of information panes.

I investigated halfheartedly, but did not find the cause with quick googling. Today I decided to change that. I document my path here, because I did not yet write about how I actually tackle problems like these - and I think I would have profited from having a writeup like this when I started, instead of having to learn it by trial-and-error.

Update: Quodlibet 2.6.3 is now in the Gentoo portage tree - using my ebuild. The update works seamlessly. So to get your Quodlibet 2.5 running again, just call emerge =media-sound/quodlibet-2.6.3 =media-plugins/quodlibet-plugins-2.6.3. Happy Hacking!

Update: I got a second reply in the bug tracker which solved the plugins problem: I had user-plugins which require Quod Libet 3. Solution: mv ~/.quodlibet/plugins ~/.quodlibet/plugins.for-ql3. Quod Libet works completely again.

Solution for the impatient: Update to Quod Libet 2.5.1. In Gentoo that’s easy.


1 Gathering Information

As a starting point I installed the Quod Libet plugins (media-plugins/quodlibet-plugins), thinking that the separation between plugins and media player might not be perfect. That did not fix the problem, but a look at the plugin listing gave me nice backtraces.

These backtraces actually showed the reason for the breakage: GTK could not be imported.

So I finally have an answer: pygobject changed the API. Can’t be hard to fix… (a realization process follows)

2 The solution-hunting process

let’s check the Gentoo forums for pygobject

pygobject now pulls systemd??? - and they wonder why I’m pissed off by systemd: hugely invasive changes just for some small packages… KDE gets rid of the monolithic approach, and now Gnome starts it, just much more invasive into the basic structure of all distros?

set the USE flag -systemd to avoid systemd (why didn’t I have that yet? I guess I did not expect that Gentoo would push that on me…)

check when I updated pygobject:

qlop -l pygobject

...
Thu Dec 5 00:26:27 2013 >>> dev-python/pygobject-3.8.3

a week ago - that fits the timeframe. Damn… pygobject-3.8.3, you have to go.

3 The core solution

In the bug report at Quod Libet I got a reply: a known issue with quodlibet 2.5 “which triggered a bug in a recent pygtk release, resulting in lists not showing”. The plugins seem to be unrelated. Solution to my immediate problem: update to 2.5.1. That’s not yet in Gentoo, but this is easy to fix:

cd /usr/portage/media-sound/
# create the category in my local portage overlay, defined as
# PORTDIR_OVERLAY=/usr/local/portage in /etc/make.conf
mkdir -p /usr/local/portage/media-sound
# copy over the quodlibet directory, keeping the permissions with -p
cp -rp quodlibet /usr/local/portage/media-sound
# most times it is enough to simply rename the ebuild to the new version
cd /usr/local/portage/media-sound/quodlibet
mv quodlibet-2.5.ebuild quodlibet-2.5.1.ebuild
# now prepare all the metadata portage needs - this requires
# app-portage/gentoolkit
ebuild quodlibet-2.5.1.ebuild digest compile
# now it's prepared for the package manager. Just update it as usual:
emerge -u quodlibet

I wrote the solution in the Gentoo bug report. I should also state that the Gentoo package for Quod Libet is generally out of date (releases 2.6.3 and 3.0.2 are not yet in the tree).

Quod Libet works again.

As soon as the ebuild in the portage tree is renamed, Quod Libet should work again for all Gentoo users.

The plugins still need to be fixed, but I’ll worry about that later.

4 Conclusion

Solving the core problem took me some time, but it wasn’t really complicated. The part of the solution process which got me forward boils down to:

reporting a bug for the project with the information I could gather - including screenshots (or anything else which shows the problem directly - see How to Report Bugs Effectively for hints on that), and

checking the reported bug again a few hours or days later - and synchronizing the information between the project bug tracker and the distribution bug tracker to help fixing the bug for all users of the distribution and of other distributions.

And that’s it: To get something working again, check the bug trackers, report bugs and help synchronizing bug tracker info.

GnuPG/PGP signature, short explanation

»What is the .asc file?« This explanation is intended to be copied as-is into emails when someone asks about your signature.

The .asc file is a signature which can be used to verify that the email was really sent by me and wasn’t tampered with.[1] It can be verified with standard email security tools like Enigmail[2], Gpg4win[3] or MacGPG[4] - and other tools supporting OpenPGP[5].

Intro

I recently started looking into Autotools, to make it easier to run my code on multiple platforms.

Naturally you could use cmake or scons or waf or ninja or tup, all of which are interesting in their own right. But none of them has seen the amount of testing which went into autotools, and none of them has the amount of tweaks needed to support about every system under the sun. And I recently found pyconfigure, which allows using autotools with Python and offers detection of library features.

I had already used Makefiles for easily storing the build information of anything from Python projects (python setup.py build) to my PhD thesis with all its required graphs.

I also had used scons for those same tasks.

But I wanted to test what autotools have to offer. And I found no simple guide which showed me how to migrate from a Makefile to autotools - and what I could gain through that.

So I decided to write one.

My Makefile

The starting point is the Makefile I use for building my PhD. That’s pretty generic and just uses the most basic features of make.
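The Makefile snippet itself is not preserved in this copy; a minimal rule of the kind described in the next paragraph looks like this (the names and the command are placeholders, and the command line must start with a tab):

thing: source-file-1 source-file-2
	command-which-builds-thing-from-the-source-files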

The code above is a rule. If you put a file with this content into some folder using the filename Makefile and then run make thing in that folder (in a shell), the program “make” will check whether the source files have been changed since it last created the thing, and if they have been changed, it will execute the build command.

You can use the targets of other rules as source files for your thing, and make will figure out all the tasks needed to create your thing.

My actual Makefile creates plots from data and then builds a PDF from an org-mode file.

Feature Equality

The first step is simple: How can I replicate with autotools what I did with the plain Makefile?

For that I create the files configure.ac and Makefile.am. The basic Makefile.am is simply my Makefile without any changes.

The configure.ac sets the project name, inits automake and tells autoreconf to generate a Makefile.

dnl run `autoreconf -i` to generate a configure script.
dnl Then run ./configure to generate a Makefile.
dnl Finally run make to generate the project.
AC_INIT([Doktorarbeit Inverse GHG], [0.1], [arne.babenhauserheide@kit.edu])
dnl we use the build type foreign here instead of gnu because I do not have a NEWS file and similar, yet.
AM_INIT_AUTOMAKE([foreign])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

Now, if I run `autoreconf -i` followed by ./configure, I get a generated Makefile. Nothing fancy here: The Makefile just does what my old Makefile did.

But it is much bigger, offers real --help output and can generate a distribution - which does not work yet, because it is missing the list of source files. But `make distcheck` clearly tells me that.

make dist: distributing the project

Since `make dist` does not work yet, let’s change that.

… easier said than done. It took me the better part of a day to figure out how to make it happy. Problems there:

I have to explicitly give automake the list of sources so it can copy them to the distributed package.

distcheck uses a separate build dir. Yes, this is the clean way, but it needs some hacking to get everything to work.

I use pyxplot for generating some plots. Pyxplot does not have a way (that I know of) to search for datafiles in a different folder. I have to copy the files to the build dir and remove them after the build - but only if I use a separate build dir.

pdflatex can’t find included images. I have to adapt the TEXINPUTS environment variable to give it the srcdir as an additional search path.

Some of my commands litter the build directory with temporary or intermediate files. I have to clean them up.

So, after much haggling with autotools, I have a working make distcheck.

The result is no longer the simple Makefile. It is now a setup which defines the files for distribution and has custom rules for preparing script runs and for cleanup.

But I can now make a fully working distribution, so when I want to publish my PhD thesis, I can simply add the generated release tarball. I work in a Mercurial repo, so I would more likely just include the repo, but there might be reasons for leaving out the history - if only because the history might grow quite big.

An advantage is that in the process of preparing the dist, my automake file got cleanly separated into a section defining files and dependencies and one defining build rules.

But I now also understand where newer build tools like scons got their inspiration for the abstractions they use.

I should note, however, that for a software project in one of the languages supported by automake (C, C++, Python and quite a few others), I would not have needed to specify the build rules myself.

And being able to freely mix the dependency declaration in automake style with Makefile rules gives a lot of flexibility which I missed in scons.

Finding programs

Now I can build and distribute my project, but I cannot yet make sure that the programs I need for building actually exist.

And that’s finally something which can really help my build, because it gives clear error messages when something is missing, and it allows users to specify which of these programs to use via the configure script. For example I could now build 5 different versions of Emacs and try the build with each of them.
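As a sketch of what such a check could look like in configure.ac (the emacs check is just an illustrative guess, not necessarily the check used in the real project):

dnl let the user choose which emacs to use, e.g.:
dnl   ./configure EMACS=/path/to/another/emacs
AC_ARG_VAR([EMACS], [the emacs used to export the org-mode file])
AC_PATH_PROG([EMACS], [emacs], [no])
AS_IF([test "x$EMACS" = "xno"],
      [AC_MSG_ERROR([emacs is required to build the PDF])])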

Also I added cross compilation support, though that is a bit over the top for simple PDF creation :)

Summary

With this I’m at the limit of the advantages of autotools for my simple project.

They allow me to create and check a distribution tarball with relative ease (if I know how to do it), and I can use them to check for tools - and to specify alternative tools via the commandline.

For a C or C++ project, autotools would have given me a lot of other things for free, but even the basic features shown here can be useful.

You have to judge for yourself whether they outweigh the cost of moving away from the dead simple Makefile syntax.

Comparing SCons

A little bonus I want to share.

I also wrote a scons script as an alternative to my Makefile, which I think might be interesting to you. It is almost equivalent to my Makefile since it can build my files, but scons does not match the features of the full autotools build and distribution system. Missing: cleaning up temporary files and creating a validated distribution tarball.

You might notice that the more declarative style with explicit dependency information looks quite a bit more similar to automake than to plain Makefiles.

Here you can see part of the beauty of autotools, because you can just add this to your Makefile.am instead of the Makefile and it will work inside the full autotools project (though without the dist-integration). So autotools is a real superset of simple Makefiles.

Notes

If org-mode export keeps pestering you about selecting a TeX-master every time you build the PDF, add the following to your org-mode file:

Huge datafiles in free culture projects under GPL

4 ways how large raw artwork files are treated in free culture projects to provide the editable source.1

In the discussion about license compatibility of the Creative Commons ShareAlike license with the GPL, Anthony asked how the source requirement is handled for artwork, which often has huge raw files. These are the 4 basic ways I described in my answer.

1. The Wesnoth Way

“The Source is what we have”

The project just asks artists for full resolution PNG image files
(without all the layering information) - and only uses these to
develop the art. This was spearheaded by the GPL-licensed strategy
game Battle for Wesnoth.

This is a viable strategy and also allows developing art, though a bit
less convenient than with the layered sources. For example the
illustrator who created many of the images in the RPG I work on used
our PNG instead of her photoshop file to extract a die from the cover
she created for us. She took the chance to also touch up the colors a
bit - she had learned some new tricks to improve her paintings.

This clearly complies with the GPL, because the GPL just requires
providing the files used for editing the published work. If the released
file is what you actually use to change the published work, then the
published file is the source.

2. The External Storage

“Use the FTP, Luke”

Here, files which are too big to be versioned effectively or which
most people don’t need when working with the project get
version-numbers and are put into an external storage - like an FTP
server.

3. The Elegant Way

“Make it so!”

Here huge files are simply versioned alongside other files and the
versions to be used are created directly from the multi-layered
files. The usual way to do that is a Makefile in which scripts
explicitly define how the derived file can be extracted.
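As a hedged sketch of such a rule (the file names and the conversion command are made up for illustration, not taken from any specific project):

# derive the flattened in-game image from the layered source file
die.png: die.xcf
	convert -flatten die.xcf die.png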

This is most elegant, because it has no duplication of information,
the source is always trivial to find, it’s always clear that the
derived file really originated from the source and it is easy to avoid
quality loss or even reduce it later.

The disadvantage is that it can be very cumbersome to force new
developers to get all the huge files and then build the derived files
before being able to really start developing.

4. Pragmatic Elegance

“Hybrids win”

All the ways above can be combined: Huge files are put in version
control, but the derived files are included, too, to make it easier
for new people to get in. Maybe the huge files are only included on
request - for example they could be stubs with which the version
control system can retrieve the full files when the user wants
them. This can partially be done with the largefiles extension in
Mercurial by just not getting the large files.

Also you can just keep separate raw files and derived files. This is also used in Battle for Wesnoth: Optimized files of the right size for the game are stored in one folder while the bigger full resolution files are stored separately.

If you want to include free art in a GPL-covered work, I hope this article gave you some inspiration!

The die was created by Trudy Wenzel (2013) and is licensed under GPLv3 or later. ↩

Installing GNU Guix 0.6, easily

“Got a power-outage while updating? No problem: Everything still works”

GNU Guix is the new functional package manager from the GNU Project which complements the Nix store with a nice Guile-Scheme-based package definition format.

What sold it to me was “Got a power-outage while updating? No problem: Everything still works” from Ludovic’s Guix talk at the GNU Hackers Meeting 2013. My son once found the on-off button of our power connector while I was updating my Gentoo box. It took me 3 evenings to get it completely functional again. This would not have happened with Guix.

Update (2014-05-17): Thanks to zerwas from IRC @ freenode for the patch to guix 0.6 and nice cleanup!

Intro

Installation of GNU Guix is straightforward, except if you follow the docs, but it’s not as if we’re not used to that from other GNU utilities, which often terribly short-sell their quality with overly general documentation ☺

So I want to provide a short guide on how to set up and run GNU Guix with ease. My system natively runs Gentoo, so some details might vary for you. If you use Gentoo, you can simply copy the commands here into the shell, but better copy them to a text file first to ensure that I do not try to trick you into doing evil things with the root access you need.

But anyway: A very interesting project which I plan to keep tracking. It might allow me to do less risky local package installs of stuff I need, like small utilities I wrote myself.

The big advantage of that would be that I could actually take them with me when I have to use different distros (though I’ve been a happy Gentoo user for ~10 years and I don’t see it as likely that I’ll switch completely: Guix would have to include all the roughly 30k packages in Gentoo to actually be a full-fledged alternative - and provide USE flags and all the convenient configurability which makes Gentoo such a nice experience).

Using guix for such small stuff would allow me to decouple experiments from my production environment (which has to keep working).

Installing Scipy and PyNIO on a Bare Cluster with the Intel Compiler

2 years ago I had the task of running a Python program using scipy on our university cluster with the Intel Compiler. I needed all of those (as well as PyNIO and some other stuff) for running TM5 with the Python shell on the HC3 of KIT.

This proved to be quite a bit more challenging than I had expected - but it was very interesting, too (and there I learned the basics of GNU autotools which still help me a lot).

But no one should have to go to the same effort with as little guidance as I had, so I decided to publish the script and the patches I created for installing everything we needed.1

The script worked 2 years ago, so you might have to fix some bits. I won’t promise that this contains everything you need to run the script - or that it won’t be broken when you install it. Actually I won’t promise anything at all, except that if the stuff here had been available 2 years ago, that could have saved me about 2 months of time (each of the patches here required quite some tracking of problems, experimenting and fixing, until it provided basic functionality - but actually I enjoyed doing that - I learned a lot - I just don’t want to be forced to do it again). Still, this stuff contains quite some hacks - even a few ugly ones. But it worked.

2 libraries and programs which get installed (=requirements)

This script requires and installs quite a few libraries. I retrieved most of the following tarballs from my Gentoo distfiles dir after installing the programs locally. I uploaded them to draketo.de/dateien/scipy-pynio-deps. These files are included there:

satexputils.so also needs interpolatelevels.F90, which I think I am not allowed to share, so you’re on your own there. Guess why I do not like using non-free (or not-guaranteed-to-be-free) software.

7 Summary

I hope this helps someone out there save some time - or even better: improve the upstream projects. At least it should be a nice reference for all who need to get scipy working on not-quite-supported architectures.


Intro

We just had the problem of finding out whether a given dataset will be shareable without complex trickery. So we took the easiest road and checked the memory requirements of the data structure.

If you have such a need, there’s always a first stop: Fire up the interpreter and try it out.

The test

We just created a three-dimensional numpy array of floats and then looked at the memory requirement in the system monitor - conveniently bound to CTRL-ESC in KDE. By making the array big enough we can ignore all constant costs and directly get the cost per stored value by dividing the total memory of the process by the number of values.

Numpy

Also we tested what happens when we use "f4" and "f2" instead of "float" as dtype in numpy.
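The baseline snippet does not appear above in this copy; judging from the list version further down, it presumably was simply the following (with "f4" or "f2" substituted for "float" for the other two rows of the table):

import numpy as np
a = np.array(np.random.random((100, 100, 10000)), dtype="float")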

Native lists

For the native lists, we use the same array, but convert it to a list of lists of lists:

import numpy as np
a = [[[float(i) for i in j] for j in k]
for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

Array module

Instead of using full-blown numpy arrays, we can also just turn the innermost list into an array from the array module.

import array
import numpy as np
a = [[array.array("d", [float(i) for i in j]) for j in k]
for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

The results

With a numpy array we need roughly 8 Bytes per float. A nested native
list, however, requires roughly 32 Bytes per float. So switching from
native Python lists to numpy reduces the required memory per floating
point value by a factor of 4.

Using an inner array (via array module) instead of the innermost list
provides roughly the same gains.

I would have expected factor 3: The value plus a pointer to the next
and to the previous entry.

The details are in the following table.

Table 1: Memory requirement of different ways to store values in Python

                        total memory   per value
list of floats          3216.6 MiB     32.166 Bytes
numpy array of floats    776.7 MiB      7.767 Bytes
np f4                    395.2 MiB      3.95 Bytes
np f2                    283.4 MiB      2.834 Bytes
inner array              779.1 MiB      7.791 Bytes

This test was conducted on a 64 bit system, so floats are equivalent to doubles.

Summary

In Python, large numpy arrays require 4 times less memory than a nested
list structure with the same data. Using an inner array from the array
module instead of the innermost list provides roughly the same gains.

Ogg Theora and h.264 - which video codec as standard for internet-video?

Results

What you can see by comparing both is that h.264 wins in terms of raw image quality at the same bitrate (single pass).

So why am I still strongly in favor of Ogg Theora?

The reason is simple:

Due to the licensing costs of h.264 (a few million per year, due from 2015 onwards), making h.264 the standard for internet video would have the effect that only big companies would be able to make a video-enabled browser - or we would get a kind of video tax for free software: if you want to view internet video with free software, you have to pay for the right to use the x264 library (else the developers couldn't cough up the money to pay for the patent license). And no one but the main developers and huge corporations could distribute the x264 library, because they’d have to pay license fees for that.

And no one could hack on the browser or library and distribute the changed version, so the whole idea of free software would be reduced to absurdity. It wouldn't matter that all the code would be freely licensed, since only those with an h.264 patent license could change it.

Theora’s raw quality may still be worse, but the license costs and their implications provide very clear reasons for supporting Theora - which in my view are far more important than the raw technical side.

The test-script

for k in {0..1}
do for i in {0..9}
do for j in {0..9}
do
wget http://media.xiph.org/BBB/BBB-360-png/big_buck_bunny_00$k$i$j.png
done
done
done

Phoronix recently did a benchmark of GCC vs. LLVM on AMD hardware. Sadly their conclusion did not fit the data they showed. Actually it misrepresented the data so strongly that I decided to speak up here instead of having my comments disappear in their forums. This post was started on 2013-05-14 and got updates when things changed - first for the better, then for the worse.

Update 3 (the last straw, 2013-11-09): In the most recent blatant attack by Phoronix on copyleft programs - this time openly targeted at GNU - Michael Larabel directly misrepresented a post from Josh Klint to badmouth GDB (Josh confirmed this1). Josh gave a report of his initial experience with GDB in a Kickstarter update in which he described some shortcomings he saw in GDB (of which the major gripe is easily resolved with better documentation2) and concluded with “the limitations of GDB are annoying, but I can deal with it. It's very nice to be able to run and debug our editor on Linux”. Michael Larabel only quoted the conclusion up to “annoying” and abused that to support the claim that game developers (in general) call GDB “crap” and for further badmouthing of GDB. With this he provided the straw which I needed to stop reading Phoronix: Michael Larabel is hostile to copyleft and in particular to GNU, and he goes as far as rigging test results3 and misrepresenting the words of others to further his agenda. I even donated to Phoronix a few times in the past. I guess I won’t do that again, either. I should have learned from the error of the German pirates and avoided reading media controlled by people who want to destroy what I fight for (sustainable free software).

Update 2 (2013-07-06): But the next report went down the drain again… “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.” — I couldn’t find a better way to say “those tests are completely useless” while at the same time devaluing OpenMP support as “ignore this result along with all others where GCC wins”…

Update (2013-06-21): The recent report of GCC 4.8 vs. LLVM 3.3 looks much better. Not perfect, but much better.

Taking out the OpenMP benchmarks (where GCC naturally won, because LLVM only processes those tests single-threaded) and the build times (which are irrelevant to the speed of the produced binaries), their benchmark had the following result:

» The performance of LLVM/Clang 3.3 for most tests is at least comparable to GCC «

Nobu from their Forums supplied a conclusion which represents the data much better:

» GCC is much faster in anything which uses OpenMP, and moderately faster or equal in anything (except compile times) which doesn't [use OpenMP] «

But Michael from Phoronix did not stop at just ignoring the performance difference between GCC and LLVM. He went on to claim that

In a few benchmarks LLVM/Clang is faster, particularly when it comes to build times.

And this is blatant reality-distortion which I am very tempted to ascribe to favoritism. LLVM is not “particularly” faster when it comes to build times.

LLVM on AMD FX-8350 Vishera is faster ONLY when it comes to build times!

This was not the first time that I read data-distorting conclusions on Phoronix - and my complaints about that in their forum did not change their actions. So I hope that my post here can help make them aware that deliberately distorting test results is unacceptable.

For my work, compiler performance is actually quite important, because I use programs which run for days or weeks, so 10% runtime reduction can mean saving several days - not counting the cost of using up cluster time.

To fix their blunders, what they would have to do is:

Avoiding Benchmarks which only one compiler supports properly (OpenMP).

Marking the compile-time tests explicitly, so they stand out strongly from the rest, because they measure a completely different parameter than the other tests: compiler runtime vs. performance of the compiled binaries.

Writing conclusions which actually fit their results.

Their current approach gives a distinct disadvantage to GCC (even for the OpenMP tests, because they convey the notion that if LLVM only had OpenMP, it would be better in everything - which as this test shows is simply false), so the compiler-tests from Phoronix work as covert propaganda against GCC, even in tests where GCC flat-out wins. And I already don’t like open propaganda, but when the propaganda gets masked as objective testing, I actually get angry.

PS: I write so strongly here, because I actually like the tests from Phoronix a lot. I think we need rather more than less testing and their testsuite actually seems to do a good job - when given the right parameters - so seeing Phoronix distorting the tests to a point where they become almost useless (except as political tool against GCC) is a huge disappointment to me.

Josh Klint from Leadwerks confirmed that Phoronix misrepresented his post and wrote a follow-up post: » @ArneBab That really wasn't meant to be controversial. I was hoping to provide constructive feedback from the view of an Xcode / VS user.« » Slightly surprised my complaints about GDB are a hot topic. I can make just as many criticisms of other compilers and IDEs.« » The first 24 hours are the best for usability feedback. I figure if they notice a pattern some of those things will be improved.« » GDB Follwup «
— @Leadwerks, 2:04 AM - 11 Nov 13, 2:10 AM - 11 Nov 13 and @JoshKlint, 2:07 AM - 11 Nov 13, 8:48 PM - 11 Nov 13. ↩

The first-impression criticism from Josh Klint was addressed by a Phoronix reader by pointing to the frame command. I do not blame Josh for not knowing all the tricks: He wrote a fair account of his initial experience with GDB (and he said later that he wrote the post after less than 24 hours of using GDB, because he considers that the best time to provide feedback) and his experience can serve as constructive criticism to improve the tutorials, documentation and UI of GDB. Sadly his visibility and the possible impact of his work on free software made it possible for Phoronix to abuse a personal report as support for a general badmouthing of the tool. In contrast, the full message of Josh Klint ended on a really positive note: Although some annoyances and limitations have been discovered, overall I have found Linux to be a completely viable platform for application development. — Josh Klint, Leadwerks ↩

I know that rigging of tests is a strong claim. The actions of Michael Larabel deserve being called rigging for three main reasons: (1) Including compile-time data along with runtime performance without a clear distinction between both, even though compile time of the full code is mostly irrelevant when you use a proper build system, and compile time and runtime are completely different classes of results, (2) including pointless tests between incomparable setups whose only use is to downplay any weakness of his favorite system, and (3) blatantly lying in the summaries (as I show in this article). ↩

So I can wholeheartedly recommend Python to beginners in programming, and as the other reviews on Ohloh show, it is also a great language for experienced programmers and seems to be a good language to accompany you throughout your whole coding life.

PS: Yes, I know about the double meaning of "first language" :)

Recursion wins!

After starting my programming life with Python, I normally use for-loops to solve problems. But actually they are an inferior mechanism when compared to recursion, if the language provides proper syntactic support for that. Since that claim pretty much damns Python on a theoretical level (even though it is still a very good tool in practice and I still love it!), I want to share a simplified version of the code which made me realize this.

Let’s begin with how I would write that code in Python.

res = ""instring = Falsefor letter in text:
ifletter = "\"":
# special conditions for string handling go here# lots of special conditions# and more special conditions# which cannot easily be moved out, # because we cannot skip multiple letters# in one stepinstring = not instring
if instring:
res += letter
continue# other cases

Did you spot the comment “special conditions go here”? That’s the point which damns for-loops: You cannot easily factor out these special conditions.1 In this example all the complexity is in the variable instring. But depending on the usecase, this could require lots of different states being tracked within the loop and cluttering up the namespace as well as entangling complexity from different parts of the loop.

The basic code for recursion is a bit longer, because the new values in the next step of the processing are given explicitly. But it is almost trivial to shell out parts of the loop to another function. It just needs to return the next state of the recursion.
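To illustrate the shape of that approach, here is a minimal sketch in Python with hypothetical helper names (not the original wisp code; note that plain Python has no tail-call elimination, so this recursion hits the interpreter's limit on long inputs and only shows the structure):

# Each step passes the next state explicitly, so the string-handling
# special cases can live in their own function which simply returns
# the new state.
def handle_quote(res, instring):
    # all the special string-handling conditions would go here,
    # isolated from the main processing
    return res, not instring

def process(text, res="", instring=False):
    if not text:
        return res
    letter, rest = text[0], text[1:]
    if letter == '"':
        res, instring = handle_quote(res, instring)
    if instring:
        return process(rest, res + letter, instring)
    # other cases would go here
    return process(rest, res, instring)

print(process('say "hi" loudly'))  # keeps the characters seen as "inside the string"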

It’s funny to see how Guile Scheme allows me to follow that principle more thoroughly than Python.

(I love Python, but this is a case where Scheme simply wins - and I’m not afraid to admit that)

PS: Actually I found this technique when thinking about use-cases for multiple return-values of functions.

PPS: This example uses wisp-syntax for the scheme-examples to avoid killing Pythonistas with parens.

While you cannot factor out parts of for loops easily, functions which pass around iterators get pretty close to the expressivity of tail recursion. They might even go a bit further and I already missed them for some scheme code where I needed to generate expressions step by step from a function which always returned an unspecified number of expressions per call. If Python continues to make it easier to use iterators, they could reduce the impact of the points I make in this article. ↩

Reducing the Python startup time

The Python startup time always nagged me (17-30 ms), and I just searched again for a way to reduce it when I found this:

The Python-Launcher caches GTK imports and forks new processes to reduce the startup time of python GUI programs.

Python-launcher does not solve my problem directly, but it points in an interesting direction: If you create a small daemon which you can contact via the shell to fork a new instance, you might be able to get rid of your startup time.

To get an example of the possibilities, download the python-launcher and socat and do the following:

Todo: Adapt it to a given program and remove the GTK stuff. Note the & at the end: Closing the socket connection seems to be slow, so I just don’t wait for socat to finish. Breaks at somewhere over 200 simultaneous connections. Option: Use a datagram socket instead.

The essential trick is to create a server which opens a socket and reads all the data from the socket. Once it has the data, it forks a child which runs the code, while the parent keeps listening.
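A minimal sketch of that idea (not the original python-launcher code; the UNIX socket path and the details here are illustrative assumptions):

import os
import signal
import socket

SOCKET_PATH = "/tmp/py-daemon.sock"  # illustrative path

def serve():
    # reap children automatically so finished jobs don't become zombies
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    if os.path.exists(SOCKET_PATH):
        os.remove(SOCKET_PATH)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCKET_PATH)
    server.listen(50)
    while True:
        conn, _ = server.accept()
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
        conn.close()
        source = b"".join(chunks).decode("utf-8")
        if os.fork() == 0:
            # child: run the received code and exit; the parent keeps serving
            exec(source, {"__name__": "__main__"})
            os._exit(0)

if __name__ == "__main__":
    serve()

A client can then push a snippet into the already-running interpreter with something like socat - UNIX-CONNECT:/tmp/py-daemon.sock < snippet.py &, matching the socat-based setup described above.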

Running a program that way 100 times took just 0.23 seconds for me, so the Python startup time of 17 ms got reduced to 2.3 ms.

You might have to switch from forking to just executing the code directly if you want to be even faster and the code snippets are small. For example, when running the same test without the fork and the signals, 100 executions of the same code took just 0.09 s, cutting the startup time down to an impressive 0.9 ms - at the cost of no longer running in parallel.

(That’s what I also do with emacsclient… My emacs takes ~30s to start (due to excessive use of additional libraries I added), but emacsclient -c shows up almost instantly.)

I tested the speed by just sending a file with the following snippet to the server:

import time
with open("2", "a") as f:
    f.write(str(time.time()) + "\n")

Note: If your script only needs the included Python libraries (batteries) and no custom-installed libs, you can also reduce the startup time by avoiding site initialization:

python -S [script]

Without -S python -c '' takes 0.018s for me. With -S I am down to

time python -S -c '' → 0.004s.

Note that you might miss some installed packages that way. This is slower than the daemon method by up to a factor of 4 (4 ms instead of 0.9 ms), but still faster than the default way. Note that cold disk buffers can make the difference much bigger on the first run, which is not relevant in this case but very much relevant in general for the perceived startup speed.

PS: I attached the python-launcher 0.1.0 in case its website goes down. License: GPL and MIT; included. This message was originally written at stackoverflow.

Simple positive trust scheme with thresholds

I don’t see a reason for negative reputation schemes — voting down is in my view a flawed concept.

(The rest of this article was written for Freetalk inside Freenet, and is also posted there with my non-anonymous ID.)

Voting down just allows for community censorship, which I see as incompatible with the goals of Freenet.

Would it be possible to change that to use only positive votes and a threshold?

If I like what some people write, I give them positive votes.

If I get too much spam, I increase the threshold for all people.

Effective positive votes get added. It suffices that some people I trust also trust someone else and I’ll see the messages.

Effective trust is my trust (0..1) · the trust of the next in the chain (0..1) · …

Usecase:

Zwister trusts Alice and Bob.

Alice trusts Lilith.

Bob hates Lilith.

In the current scheme (as I understand it), zwister wouldn’t see posts from Lilith.

In a pure positive scheme, zwister would see the posts. If zwister wants to avoid seeing the posts from Lilith, he has to untrust Alice or ask Alice to untrust Lilith. Add to that a personal (and not propagating) blocking option which allows me to “never see anything from Lilith again”.

Bob should not be able to interfere with me seeing the messages from Lilith, when Alice trusts Lilith.

If zwister's trust for Alice (0..1) multiplied with Alice's trust for Lilith (0..1) is lower than zwister's threshold, zwister doesn't see the messages.
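A small sketch of this scheme (illustrative only; how to combine parallel chains and how to handle cycles are my own choices here, the post leaves them open):

# Positive-only trust: trust propagates multiplicatively along a chain,
# contributions from different chains are added, and a message is shown
# only if the effective trust reaches the reader's threshold.
def effective_trust(trust, viewer, author, seen=None):
    """trust maps person -> {person: value between 0 and 1}."""
    if viewer == author:
        return 1.0
    seen = (seen or set()) | {viewer}
    total = 0.0
    for friend, value in trust.get(viewer, {}).items():
        if friend not in seen:
            total += value * effective_trust(trust, friend, author, seen)
    return total

trust = {
    "zwister": {"Alice": 0.8, "Bob": 0.9},
    "Alice": {"Lilith": 0.5},
    # Bob gives Lilith no positive vote; he cannot push her score down.
}

threshold = 0.3
score = effective_trust(trust, "zwister", "Lilith")
print(score, score >= threshold)  # 0.4 True: zwister sees Lilith's posts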

PS: somehow adapted from Credence, which would have brought community spam control to Gnutella, if Limewire had adopted it.

PPS: And an adaptation for news voting: You give positive votes on news which show up. Negative votes assign a private threshold to the author of the news, so you then only see news from that author which enough people vote for.

Simple steps to attach the GNU General Public License (GPL) to your project

Here are the simple steps to attach a GPL license to your source files (written after requests by DiggClone and Bandnet):

For your own project, just add the following text-notice to the header/first section of each of your source-files, commented out in whatever way your language uses:

----------------following is the notice-----------------
/*
* Your Project Name - -you slogan-
* Copyright (C) 2007 - 2007 Your Name
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
----------------------------------------------
the "2007 - 2007" needs to be adjusted to "year when you gave it the license in the first place" - "current year".

GPL is trust: Contributors can trust, that their contributions will keep helping the community, and that the software they contribute to will keep being accessible for the community.

(That's why I decided some years ago to only support GPL projects. My contributions to one semi-closed project got lost, because the project wasn't free and the developer just decided not to offer them anymore, and I could only watch hundreds of hours of work disappear, and that hurt.)

Surprising behaviour of Fortran (90/95)

1 Introduction

I recently started really learning Fortran (as opposed to just dabbling with existing code until it did what I wanted it to).

Here I document the surprises I found along the way.

As reference: I come from Python, C++ and Lisp, and I actually started to like Fortran while learning it. So the horror-stories I heard while studying were mostly proven wrong. I uploaded the complete code as base60.f90.

Also note that the indices are inclusive. The following actually gets the single letter at index n+1:

Getting a single letter from a character array

base60chars(n+1:n+1)

In Python, on the other hand, the second index of a slice is exclusive, so to get the same result you would use [n:n+1]:

Python-example for getting a single letter from a list

pythonarray[n:n+1]

3.6 I have to trim strings when concatenating

It is necessary to get rid of trailing blanks (whitespace) from the last char to the end of the declared memory space, otherwise there will be huge gaps in combined strings - or you will get missing characters.

3 The results

So now, let’s actually check the results. Since I’m interested in tail call optimization, I check the memory consumption of each run. If we have proper tail call optimization, the required memory will stay the same over time; if not, the function stack will get bigger and bigger until the program crashes.

3.5 Optimized for size

Can we invert the question? Is all well, now?

Actually not…

If we activate minor optimization, we get the same unoptimized behaviour again.

g++ -O1 C.c -o C1
./C1

It counts to about 260,000 and then dies from a stack overflow. And that is pretty bad™, because it means that a programmer cannot trust his code to work when he does not know all the optimization strategies which will be used with his code.

And he has no way to declare in his code that it requires TCO to work.

4 Summary

Tail Call Optimization (TCO) turns an operation with a memory requirement of O(N)1 into one with a memory requirement of O(1).

It is a nice tool to reduce the complexity of code, but it is only safe in languages which explicitly require tail call optimization - like Scheme.

And from this we can find a conclusion for compilers:

C/C++ compilers should always use tail call optimization, including in debug builds, because otherwise C/C++ programmers should never rely on that feature, since relying on it can make it impossible to use certain optimization settings in any code which includes their code.

And as a finishing note, I’d like to quote (very loosely) what my colleague told me from some of his real-life debugging experience:

“We run our project on an IBM AIX supercomputer. We had spotted a problem in optimized runs, so we activated the debugger to trace the bug. But when we activated debug flags, a host of new problems appeared which were not present in optimized runs. We tried to isolate the problems, but they only appeared if we ran the full project. When we told the IBM coders about that, they asked us to provide a simple testcase… The problems likely happened due to some crazy optimizations - in our code or in the compiler.”

So the problem of undebuggable code due to a dependency of the program on optimization changes is not limited to tail call optimization. But TCO is a really nice way to show it :)

Let’s use that to make the statement above more general:

C/C++ compilers should always do those kinds of optimizations which lead to changes in the algorithmic cost of programs.

Or from a pessimistic side:

You should only rely on language features which are also available in debug mode - and you should never develop your program with optimization turned on.

And by that measure, C/C++ does not have Tail Call Optimization - at least until all mainstream compilers include TCO in their default options. Which is a pretty bleak result after the excitement I felt when I realized that optimizations can actually give C/C++ code the behavior of Tail Call Optimization.

Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use. ⇒ Never develop with optimizations which are not required by the language standard.

Footnotes:

1 : O(1) and O(N) describe the algorithmic cost of an algorithm. If it is O(N), then the cost rises linearly with the size of the problem (N is the size, for example printing 20,000 consecutive numbers). If it is O(1), the cost is stable regardless of the size of the problem.

Top 5 systemd troubles - a strategic view for distros

systemd is a new way to start a Linux system with the expressed goal of rethinking all of init. These are my top 5 gripes with it. (»skip the updates«)

Update (2014-12-11): One more deconstruction of the strategies around systemd: systemd: Assumptions, Bullying, Consent. It shows that the attitude which forms the root of the dangers of systemd is even visible in its very source code.

Update (2013-11-20): Further reading shows that people have been giving arguments from my list since 2011, and they got answers in the range of “anything short of systemd is dumb”, “this cannot work” (while OpenRC clearly shows that it works well), requests for implementation details without justification and insults and further insults; but the arguments stayed valid for the last 2 years. That does not look like systemd has a friendly community - or is healthy for distributions adopting it. Also an OpenRC developer wrote the best rebuttal of systemd propaganda I read so far: “Alternativlos”: Systemd propaganda (note, though, that I am biased against systemd due to problems I had in the past with udev kernel-dependencies)

Losing Control: systemd does so many crucial things itself that the developers of distributions lose their control over the init process: If systemd developers decide to change something, the distributions might actually have to fork systemd and keep the fork up-to-date, and this requires rare skills and lots of resources (due to the pace of systemd). See the Gentoo eudev project for a case where this had to happen so the distribution could keep providing features its users rely on. Systemd nowadays incorporates udev. Go reason how systemd devs will act.1 Why losing control is a bad idea: Strategy Letter V: Commodities

No scripts (as if you can know beforehand all the things the init system will need to do in each distribution). Nowadays any system should be user-extendable to avoid bottlenecks for development. This essentially boils down to providing a scripting language. Using the language which almost every system administrator knows is a very sane choice for that - and means making it possible to use Shell-Scripts to extend the init-system. Scripts mean that the distribution will never be in a position where it is blocked because it absolutely can’t provide a given fringe feature. And as the experiment with paludis in Gentoo shows, an implementation in C isn’t magically faster than one in a scripting language and can actually be much slower (just compare paludis to pkgcore), because the execution time of the language only very rarely is the real bottleneck - and you can easily shell out that part to a faster language with negligible time loss,2 especially in shell-scripts (pun partially intended). While systemd can be told to run a shell script, this requires a mental context switch and the script cannot tie into all the machinery inside systemd. If there’s a bug in systemd, you need to fix systemd, if you need more than systemd provides out of the box, you need either a script or you have to patch systemd, and otherwise you write in a completely different language (so most people won’t have the skills to go beyond the fences of the ground defined by the systemd developers as proper for users). Why killing scripts is a bad idea: Bloatware and the 80/20 Myth

Linux-specific3 (are you serious??). This makes the distribution an add-on to the kernel instead of the distribution being a focus point of many different development efforts. This is a second point where distributions become commodities, and as for systemd itself, this is against the interest of the distributions. On the other hand, enabling the use of many different kernels strengthens the Distribution - even if currently only few people are using them. Why being Linux-only is a bad idea for distributions: Strategy Letter V: Commodities

Requiring an up-to-date kernel. This problem already gives me lots of headaches for my OLPC due to udev (from the same people as systemd… which is one of the reasons why I hope that Gentoo-devs will succeed with eudev), since it is not always easy to go to a newer kernel when you’re on a fringe platform (I’m currently fighting with that). An init system should not require some special kernel version just to boot… Why those hard dependencies are a bad idea: Bloatware and the 80/20 Myth AND Strategy Letter V: Commodities

Requiring D-Bus. D-Bus was already broken a few times for me, and losing not just some KDE functionality but instead making my system unbootable is unacceptable. It’s bad enough that so much stuff relies on udev.4

In my understanding, we need more services which can survive without the others, so the system gets resilient against failures in a given part. As the system gets more and more complex, this constantly gets more important: fewer interdependencies, and the services which are crucial to get my system into a debuggable state should be small and simple - and should not require many changes to implement new features.

Having multiple tools to solve the same problem looks like wasted resources, but actually this extends the range of problems which can be solved with our systems and avoids bottlenecks and single points of failure (either tools or communities), so it makes us resilient. Also it encourages standard-formats to minimize the cost of maintaining several systems side-by-side.

You can see how systemd manages to violate all these principles…

This does not mean that the features provided by systemd are useless. It means that the way they are embedded in systemd with its heavy dependencies is detrimental to a healthy distribution.

Note: I am neither a developer of systemd, nor of upstart, sysvinit or OpenRC. I am just a humble user of distributions, but I can recognize impending horrible fallout when I see it.

We try to get rid of many of the more pointless differences of the various distributions in various areas of the core OS. As part of that we sometimes adopt schemes that were previously used by only one of the distributions and push it to a level where it's the default of systemd, trying to gently push everybody towards the same set of basic configuration.
— Lennart Poettering, main developer of systemd

I could not show much more clearly why distributions should be very wary of systemd than Lennart Poettering does here, in the post where he tries to refute myths about systemd.

PS: I’m definitely biased against systemd, after having some horrifying experiences with kernel-dependencies in udev. Resilience looks different. And I already modified some init scripts to adjust my systems behavior so it better fits my usecase. Now go and call me part of a fringe group which wants to add “pointless differences” to the system. If you force Gentoo devs to issue a warning in the style of “you MUST activate feature X in your kernel, else your system will become unbootable”, this should be a big red flag to you that you’re doing something wrong. If you do that twice, this is a big red flag to users not to trust your software. And regaining that trust requires reestablishing a long record of solid work. Which I do not see at the moment. Also do read Bloatware and the 80/20 Myth (if you didn’t do that by now): It might be true that 80% of the users only use 20% of the features, but they do not use the same 20%.

Update 2014: Actually there is no need to guess how the systemd developers will act: They showed (again) that they will keep breaking systems of their users: “udev now silently fails to do anything useful if devtmpfs is missing, almost as if resilience was a disease” — bonsaikitten, Gentoo developer, 2014-01, long after udev was subsumed into systemd. ↩

Running a program in a subshell increases the runtime by just six milliseconds. I measured that when testing ways to run GNU Guile modules as scripts. So you have to start almost 100 subshells during bootup to lose half a second of runtime. Note that OpenRC can boot a system and power down again in under 0.7 seconds and the minimal boot-to-login just takes 250 ms. There is no need for systemd to get a faster boot. ↩

The systemd proponents in the debian initsystem discussion explicitly stated that they don’t want to port systemd to other kernels. ↩

And D-Bus is slow, slow, slow when your system is under heavy memory and IO-pressure, as my systems tend to be (I’m a Gentoo user. I often compile a new version of all KDE-components or of Firefox while I do regular work on the computer). From dbus I’m used to reaction times up to several seconds… ↩

Weltenwald-theme under AGPL (Drupal)

After the last round of polishing, I decided to publish my theme under AGPLv3. Reason: If you use AGPL code and people access it over a network, you have to offer them the code. Which I hereby do ;)
That’s the only way to make sure that website code stays free.

It’s still for Drupal 5, because I didn’t get around to porting it, and it has some ugly hacks, but it should be fully functional.

And should I change the theme without posting a new layout here, just drop me a line and I’ll upload a new version — as required by AGPL. And should you have some problem, or if something should be missing, please drop me a line, too.

No screenshot, because a live version beats a screenshot any day ;) (In case it isn’t clear: Weltenwald is the theme I use on this site.)

Why Gnutella scales quite well

You might have read in some (almost ancient) papers that a network like Gnutella can't scale. So I want to show you why the current version of Gnutella does scale, and scales well.

In earlier versions, up to v0.4, Gnutella was a pure broadcast network. That means that every search request reached every participant, so in an optimal network the number of search requests hitting each node was exactly equal to the number of requests made by the nodes in the network. And you can easily see why that can't scale.
But that was only true for Gnutella 0.4.

In the current incarnation of Gnutella (Gnutella 0.6), Gnutella is no longer a pure Broadcast network. Instead, only the smallest percentage of the traffic is done via broadcast.

If you want to read about the methods used to realize this, please have a look at the GnuFU guide (English, German).

Here I want to limit it to the statement that the first two hops of a search request are governed by Dynamic Querying, which stops the request as soon as it has enough sources (this stops a search as soon as it gets about 250 results), and that the last two hops are governed by the Query Routing Protocol, which ensures that a search request reaches only those hosts which can actually have the file (which is only about 5% of the nodes).
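As a toy illustration of how these two mechanisms cut traffic (a hedged sketch, not Gnutella code; all numbers are made up for the example):

import random

random.seed(1)
# 1000 leaves; only about 5% can answer a query for "ubuntu" at all
LEAVES = [{"files": {"ubuntu iso"} if random.random() < 0.05 else set()}
          for _ in range(1000)]

def qrp_match(leaf, query):
    # QRP: the ultrapeer already knows which keywords each leaf can answer,
    # so leaves without a match are never contacted
    return any(query in name for name in leaf["files"])

def dynamic_query(query, wanted=20, batch=10):
    results, contacted = 0, 0
    candidates = [leaf for leaf in LEAVES if qrp_match(leaf, query)]
    for start in range(0, len(candidates), batch):
        for leaf in candidates[start:start + batch]:
            contacted += 1
            results += sum(query in name for name in leaf["files"])
        if results >= wanted:
            break  # DQ: stop as soon as enough sources were found
    return results, contacted

print(dynamic_query("ubuntu"))  # only a handful of the 1000 leaves were contacted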

So in today's reality, Gnutella is a quite structured and very flexible network.

To scale it, Ultrapeers can increase their number of connections from their current 32 upwards, which makes Dynamic Querying (DQ) and the Query Routing Protocol (QRP) even more effective.

In the case of DQ, most queries for popular files will still provide enough results after the same number of clients have been contacted, so increasing the number of connections won't change the network traffic caused by the first two steps at all.

In the case of QRP, queries will still only reach the hosts which can have the file, and if Ultrapeers are connected to more nodes at the same time (by increasing the number of connections), each query will provide more results per connection, so DQ will stop even earlier than with fewer connections per Ultrapeer.

So Gnutella is now far from a broadcast model, and the act of increasing the size of the Gnutella Network can even increase its efficiency for popular files.

For rare files, QRP kicks in with full force, and even though DQ will likely check all other nodes for content, QRP will make sure that only those nodes which can have the content are reached, which might be only 0.1% of the net or even far less.

Here, increasing the number of nodes per Ultrapeer means that nodes with rare files are in effect closer to you than before, so Gnutella also gets more efficient when you increase the network size, when rare file searches are your major concern.

So you can see that Gnutella has become a network which scales extremely well for keyword searches, and thanks to that it can also be used very efficiently to search for metadata and similar concepts.

The only thing which Gnutella can't do well are searches for strings which aren't separate words (for example file-hashes), because that kills QRP, so they will likely not reach (m)any hosts. For these types of searches, the Gnutella developers are working on a DHT (Distributed Hash Table), which will only be used if the string can't be split into separate words, and that DHT will most likely be Kademlia, which is also proven to work quite well.

And with that, the only problem which remains in need of fixing is spam, because that inhibits DQ when you do a rare search. But I am sure that the devs will also find a way to stop spamming, and even with spam, Gnutella is quite effective and consumes very little bandwidth when you are acting as a leaf, and only moderate bandwidth when you are acting as an Ultrapeer.

Some figures as finishing touch:

Leaf network traffic: About 1 kB/s if you add outgoing and incoming traffic, which is about a seventh of the speed of a 56k modem.

Ultrapeer traffic: About 7 kB/s, outgoing and incoming added together, which is about one full ISDN line, or less than 1/8th of a DSL line's upload speed.

Have fun with Gnutella!
- ArneBab 08:14, 15. Nov 2006 (CET)

PS: This guide ignores that requests must travel through intermediate nodes. But since those nodes make up only about 3% of the network and only 3% of those nodes will be reached by a (QRP-routed) rare file request, it seems safe to ignore these 0.1% of the network in the calculations for the sake of making them easier to follow mentally (QRP takes care of that).

Write programs you can still hack when you feel dumb

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian Kernighan

I just read the post Hyperfocus and balance of Arc Riley from PySoy who talks about trying to get to the Hyperfocus state without endangering his health. Since I have similar needs, I am developing some strategies for that myself (though not for my health, but because my wife and children can’t be expected to let me work 8h without any interruptions in my free time).

Different from Arc, I try to change my programming habits instead of changing myself to fit to the requirements of my habits.1

Your mind can hold about 7 things at once. Subtract one each for the place in the code, the current task and the resources you use, and only 4 things are left for the code of your function. (Three if you use both class attributes/global values and function arguments. Two, if you have complex custom data-structures with peculiar names or access-methods which you have to understand for doing anything. One if you also have to remember the commands of an unfamiliar editor or VCS tool. See how fast this approaches zero even starting with 7 things?)

Add an if-switch, for-loop or similar and you have only 3 things left.

You need those for what the function should actually do, so better put further complexities into subfunctions.

Also ensure that each of the things you work with is easy enough. If you get the things you use down to 7 by writing functions with 20 arguments, you don’t win anything. Just the resources you could use in the function will blow your mind when you try to change the function a few months later. This goes for every part of your program: The number of functions, the number of function arguments, the number of variables, the lines of code per function and even the number of hierarchy levels you use to reduce the other things you need to keep in mind at any given time.

Hard times

But if you want to be able to hack that code while you feel dumb (compared to those streaks of genius when you can actually hold the whole structure of your program in your head and foresee every effect of a given change before actually doing it), you need to make sure that you don’t have to take all 7 things into account.

Tune it down for the times when you feel dumb by starting with 5 things.2 After subtracting one for the location, for the task and for the resources, you are left with only two things:

Two things for your function: some logic and calling other functions are already those 2 things.

If it is an if-switch, let it be just an if-switch calling other functions. Yes, it may feel much easier to do it directly here, when you are fully embedded in your code and feel great, but it will bite you when you are down. Which is exactly when you won’t want to be bitten by your own code.
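A tiny illustration of that rule (a made-up example, not from any real project):

def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

def render(text, mood):
    # the visible function stays a plain if-switch that only calls
    # named helpers - few things to hold in your head later
    if mood == "loud":
        return shout(text)
    if mood == "quiet":
        return whisper(text)
    return text

print(render("Hello", "loud"))  # HELLO!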

Loose coupling and tight cohesion

Programming is a constant battle against complexity. Stumble from the sweet spot of your program into any direction, and complexity raises its ugly head. But finding the sweet spot requires constant vigilance, as it shifts with the size and structure of your program and your development group.

The effects of any given change should be contained in the part of the code you work in - and in one type of code.

As a web framework, Django separates the templates, the URI definitions, the program code and the database access from each other. (See how these are already 4 categories, hitting the limit of our mind again?)

For a game on the other hand, you might want to separate story, game logic, presentation (what you see on the screen) and input/user actions. Also people who write a scenario or level should only have to work in one type of code, neatly confined in one file or a small set of files which reside in the same place.

And for a scientific program, data input, task definition, processing and data output might be separated.

Remember that this separation does not only mean that you put those parts of the code into different files, but that they are loosely coupled:

They only use lean and clearly defined interfaces and don’t need to know much about each other.

Conclusions

This strategy does not only make your program easier to adapt (because the parts you need to change for implementing a given feature are smaller). If you apply it not only to the bigger structure, but to every part of the program, its main advantage is that any part of the code can be understood without having to understand other parts.

And you can still understand and hack your code when your child is sick, your wife is overworked, you slept 3 hours the night before - and you can only work for half an hour straight, because it’s evening and you don’t want to be a creep (but this change has to be finished nonetheless).

Note that finding a design which accomplishes this is far more complex than it sounds. If people can read your code and say “oh, that’s easy. I can hack that” (and manage to do so), then you did it right.

Designing a simple structure to solve a complex task is far harder than designing a complex structure to solve that task.

And being able to hack your program while you feel dumb (and maybe even hold it in your head) is worth investing some of your genius-time into your design (and repeating that whenever your code grows too hairy).

Where I got bitten badly by my high-performance coding habits is the keyboard layout evolution program. I did not catch my error when the structure grew too complex (while adding stuff), and now that I do not have as much uninterrupted time as before, I cannot actually work on it efficiently anymore. I’m glad that this happened with a mostly finished project on whose evolution no one's future depended. Still it is sad that this will keep me from turning it into a realtime visual layout optimizer. I can still work on its existing functionality (I kept improving it for the most important task: the cost calculation), but adding new functionality is a huge pain. ↩

See how I actually don’t get below 5 here? A good TODO list which shows you the task so you can forget it while coding might get you down to 4. But don’t bet on it. Not knowing where you are or where you want to go is a recipe for disaster… And if you make your functions too small, the collection of functions gets more complex, or the object hierarchy too deep, adding complexity at other places. Well, no one said creating well-structured programs would be easy. You need to find the right compromise for you. ↩

Your browser history can be sniffed with just 64 lines of Python (tested with Firefox 3.5.3)

After the example of making-the-web, I was quite intrigued by the ease of sniffing the history via simple CSS tricks.

So I decided to test how small I can get a Python program which can sniff the history via CSS - without requiring any scripting ability on the browser side.

I first produced fully commented code (see server.py) and then stripped it down to just 64 lines (server-stripped.py), to make it really crystal clear, that making your browser vulnerable to this exploit is a damn bad idea. I hope this will help get Firefox fixed quickly.
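As a rough sketch of the trick (not the attached server.py; current browsers restrict what :visited may style, so it no longer works against them):

# Serve a page whose links get a per-link background image only via the
# a:visited rule; the browser then requests exactly those "visited"
# images, which tells the server which sites are in the history.
from http.server import BaseHTTPRequestHandler, HTTPServer

SITES = ["http://blubber.blau", "https://example.com", "https://en.wikipedia.org"]

class SniffHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/visited/"):
            # only requested if the corresponding link was styled as :visited
            print("visited:", SITES[int(self.path.rsplit("/", 1)[1])])
            self.send_response(204)
            self.end_headers()
            return
        css = "\n".join(
            'a#l{0}:visited {{ background: url(/visited/{0}); }}'.format(i)
            for i in range(len(SITES)))
        links = "\n".join(
            '<a id="l{0}" href="{1}">{1}</a><br>'.format(i, site)
            for i, site in enumerate(SITES))
        body = ("<html><head><style>" + css + "</style></head><body>"
                + links + "</body></html>").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("127.0.0.1", 8000), SniffHandler).serve_forever()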

If you see http://blubber.blau as found, you're safe. If you don't see any links as found, you're likely to be safe. In any other case, everyone on the web can grab your history - if given enough time (a few minutes) or enough iframes (which check your history in parallel). This doesn't use Javascript.

It currently only checks for the 1000 or so most visited websites and doesn't keep any logs in files (all info is in memory and wiped on every restart), since I don't really want to create a full fledged history ripper but rather show how easy it would be to create one.

Besides: It does not need to be run in an iframe. Any Python-powered site could just run this test as regular part of the site while you browse it (and wonder why your browser has so much to do for a simple site, but since we’re already used to high load due to Javascript, who is going to care?). So don’t feel safe, just because there are no iframes. To feel and be safe, use one of the solutions from What the Internet knows about you.

Konqueror seems to be immune: It also (pre-)loads the "visited" images for non-visited links, so every link is seen as visited - which is the only way to avoid spreading my history around on the web while still providing “visited” image-hints in the browser!

Firefox 4.0.1 seems to be immune, too: It does not show any :visited-images, so the server does not get any requests.

So please don't let your browser load anything depending on the :visited state of a link tag! It shouldn't load anything based on internal information, because that always publicizes private information - and you don't know who will read it!

And to the Firefox developers: Please remove the optimization of only loading the required CSS data based on the visited info! I already said so in a bug report, and since the bug isn't fixed, this is my way to put a bit of weight behind it. Please stop putting your users' privacy at risk.

Usage:

python server.py
starts the server at port 8000. You can now point your browser to http://127.0.0.1:8000 to get sniffed :)

To get more info, just use ./server.py --help.

complex number compiler and libc bugs (cexp+conj) on OSX and with the intel compiler (icc)

Today a bug in complex number handling surfaced in Guile which only appeared on OS X.

This is a short note just to make sure that the bug is reported somewhere.

Test-code (written mostly by Mark Weaver who also analyzed the bug - I only ran the code on a few platforms I happened to have access to):

Install

in any distro

This should automatically pull in pyKDE4. If it doesn’t, you need to install that separately.

Visual icon selection requires the kdialog program (a standard part of KDE).

For a "live" version, just clone the pyrad Mercurial repo and let KDE run "path/to/repo/pyrad.py" at startup. You can stop a running pyrad via pyrad.py --quit. pyrad.py --help gives usage instructions.

On unfree systems (like Mac OS X and Windows)

I have no clue since I don’t use them. You’ll need to find out yourself or install a free system. Examples are Kubuntu for beginners and Gentoo for convenient tinkering. Both run GNU/Linux.

Setup

Run /usr/bin/pyrad.py. Then add it as script to your autostart (systemsettings→advanced→autostart). You can now use Alt-F6 and Meta-F6 to call it.

Mouse gesture (optional)

Add the mouse gesture in systemsettings (systemsettings→shortcuts) to call D-Bus: Program: org.kde.pyRad ; Object: /MainApplication ; Function: newInstance (you might have to enable gestures in the settings, too - in the shortcuts-window you should find a settings button).

Alternately set the gesture to call the command dbus-send --type=method_call --dest=org.kde.pyRad /MainApplication org.kde.KUniqueApplication.newInstance.

Customize the wheel

Customize the menu by editing the file "$HOME/.pyradrc" or middle-clicking (add) and right-clicking (edit) items.

Usage and screenshots

To call pyRad and see the command wheel, you simply use the gesture or key you assigned.

Then you can activate an action with a single left click. Actions can be grouped into folders. To open a folder, you also simply left-click it.

You can also press the keyboard key shown at the beginning of the tooltip to activate an action (hover the mouse over an icon to see the tooltip).

To make the wheel disappear or leave a folder, click the center or hit the key 0. To just make it disappear, hit escape.

For editing an action, just right click it, and you’ll see the edit dialog.

Each item has an icon (either an icon name from KDE or the path to an icon) and an action. The action is simply the command you would call in the shell (only simple commands, though, no real shell scripting or glob).

To add a new action, simply middle-click the action before it. The wheel goes clockwise, with the first item being at the bottom. To add a new first item, middle-click the center.

To add a new folder (or turn an item into a folder), simply click on the folder button, say OK and then click it to add actions in there.

pyRad is now in Gentoo portage! *happy*

So now you can install it in Gentoo with a simple emerge kde-misc/pyrad.

Many thanks go to the maintainer Andreas K. Hüttel (dilfridge) and to jokey and Tommy[D] from the Gentoo sunrise project (wiki) for providing their user-overlay and helping users with creating ebuilds as well as Arfrever, neurogeek, floppym from the Gentoo Python-Herd for helping me to clean up the ebuild and convert it to EAPI 3!

I needed to convert a huge batch of MediaWiki files to HTML (I had a 2010-03 copy of the now-dead Limewire wiki lying around). With a tip from RoanKattouw in #mediawiki@freenode.net I created a simple Python script to convert arbitrary files from MediaWiki syntax to HTML.

This script is not written for speed or anything (do you know how slow a web request is compared to even horribly inefficient code? …): The only optimization is for programming convenience — the advantage being that it’s just 47 lines of code :)

It also isn’t perfect: it breaks at some pages (and informs you about that).
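The original 47-line script is not included here, but a rough sketch of the same web-request approach could look like this (the API endpoint and the details are illustrative assumptions, not the original code; it asks a MediaWiki server to render the wikitext):

import json
import sys
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # any MediaWiki API endpoint

def wikitext_to_html(wikitext):
    """Ask a MediaWiki server to render wikitext to HTML via action=parse."""
    data = urllib.parse.urlencode({
        "action": "parse",
        "format": "json",
        "text": wikitext,
    }).encode("utf-8")
    with urllib.request.urlopen(API, data) as response:
        reply = json.load(response)
    return reply["parse"]["text"]["*"]

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as infile:
            html = wikitext_to_html(infile.read())
        with open(path + ".html", "w") as outfile:
            outfile.write(html)
        print("converted", path)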

If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational. That’s it - have fun with wisp syntax!

Update (2014-11-06): wisp v0.8.0 released! The new parser now passes the testsuite and wisp files can be executed directly. For more details, see the NEWS file. To test it, install Guile 2.0.x and bootstrap wisp:

If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational. That’s it - have fun with wisp syntax!

On a personal note: It’s mind-boggling that I could get this far! This is actually a fully bootstrapped indentation-sensitive programming language with all the power of Scheme underneath, and it’s a one-person when-my-wife-and-children-sleep side project. The extensibility of Guile is awesome!

Update (2014-10-17): wisp v0.6.6 has a new implementation of the parser which now uses the scheme read function. `wisp-scheme.w` parses directly to a scheme syntax-tree instead of a scheme file to be more suitable to an SRFI. For more details, see the NEWS file. To test it, install Guile 2.0.x and bootstrap wisp:

That’s it - have fun with wisp syntax at the REPL! Caveat: It does not support the ' prefix yet (syntax point 4).

Update (2014-01-04): Resolved the name-clash together with Steve Purcell and Kris Jenkins: the javascript wisp-mode was renamed to wispjs-mode and wisp.el is called wisp-mode 0.1.5 again. It provides syntax highlighting for Emacs and minimal indentation support via tab. You can install it with `M-x package-install wisp-mode`

Update (2014-01-03): wisp-mode.el was renamed to wisp 0.1.4 to avoid a name clash with wisp-mode for the javascript-based wisp.

Update (2013-09-13): Wisp now has a REPL! Thanks go to GNU Guile and especially Mark Weaver, who guided me through the process (along with nalaginrut who answered my first clueless questions…). To test the REPL, get the current code snapshot, unpack it, run ./bootstrap.sh, start guile with $ guile -L . (requires guile 2.x) and enter ,language wisp. Example usage:

display "Hello World!\n"

then hit enter thrice. Voilà, you have wisp at the REPL!

Caveat: the wisp parser is still experimental and contains known bugs. Use it for testing, but please do not rely on it for important stuff yet.

2 What is wisp?

Wisp is a simple preprocessor which turns indentation sensitive syntax into Lisp syntax.

The basic goal is to create the simplest possible indentation based syntax which is able to express all possibilities of Lisp.

Basically it works by inferring the brackets of lisp by reading the indentation of lines.

It is related to SRFI-49 and the readable Lisp S-expressions Project (and actually inspired by the latter), but it tries to Keep it Simple and Stupid. Instead of a full alternate reader like readable, it is a simple preprocessor which can be called by any lisp implementation to add support for indentation sensitive syntax.

Just call ./wisp.py --help to see what you can do with it (`./wisp.py -` takes its input from stdin, so it can be used with pipes):

Currently wisp is implemented in Python, because that’s the language which I know best and which inspired my wish to use indentation-sensitive syntax in Lisp. To repeat the initial quote:

I love the syntax of Python, but crave the simplicity and power of Lisp.

With wisp I hope to make it possible to create lisp code which is easily readable for non-programmers (and me!) and at the same time keeps the simplicity and power of Lisp.

Its main technical improvements over SRFI-49 and Project Readable are using lines prefixed by a dot (". ") to mark the continuations of the parameters of a function after intermediate function calls and working as a simple preprocessor which can be used with any flavor of Lisp.

The dot-syntax means, instead of marking every function call, it marks every line which does not begin with a function call - which is the much less common case in lisp-code.

3 Wisp syntax rules

A line without indentation is a function call, just as if it would start with a bracket.

display "Hello World!" ↦ (display "Hello World!")

A line which is more indented than the previous line is a sibling to that line: It opens a new bracket.

A line which is not more indented than previous line(s) closes the brackets of all previous lines which have higher or equal indentation. You should only reduce the indentation to indentation levels which were already used by parent lines, else the behaviour is undefined.

To add any of ' , or ` to a bracket, just prefix the line with any combination of "' ", ", " or "` " (symbol followed by one space).

' "Hello World!" ↦ '("Hello World!")

A line whose first non-whitespace characters are a dot followed by a space (". ") does not open a new bracket: it is treated as simple continuation of the first less indented previous line. In the first line this means that this line does not start with a bracket and does not end with a bracket, just as if you had directly written it in lisp without the leading ". ".

A line which contains only whitespace and a colon (":") defines an indentation level at the indentation of the colon. It opens a bracket which gets closed by the next less-indented line. If you need to use a colon by itself, you can escape it as "\:".

You can replace any number of consecutive initial spaces by underscores, as long as at least one whitespace is left between the underscores and any following character. You can escape initial underscores by prefixing the first one with \ ("\___ a" → "(___ a)"), if you have to use them as function names.

3.3 Nested function calls

If a line is more indented than a previous line, it is a sibling to the previous function: The brackets of the previous function get closed after the (last) sibling line.

3.4 Continue function arguments

By using a . followed by a space as the first non-whitespace character
on a line, you can mark it as a continuation of the previous
less-indented line. Then it is not a function call but continues the list
of parameters of the function.

I use a very synthetic example here to avoid introducing additional unrelated concepts.

3.6 One-line function calls inline

Since we already use the colon as a syntax element, we can make it possible to use it everywhere to open a bracket - even within a line containing other code. Since wide unicode characters would make it hard to find the indentation of that colon, such an inline function call always ends at the end of the line. Practically that means the opened bracket of an inline colon always gets closed at the end of the line.

3.7 Visible indentation

To make the indentation visible in non-whitespace-preserving environments like badly written html, you can replace any number of consecutive initial spaces by underscores, as long as at least one whitespace is left between the underscores and any following character. You can escape initial underscores by prefixing the first one with \ ("\___ a" → "(___ a)"), if you have to use them as function names.

4 Syntax justification

I do not like adding any unnecessary syntax element to lisp. So I want to show explicitly why the syntax elements are required to meet the goal of wisp: indentation-based lisp with a simple preprocessor.

4.1 . (the dot)

We have to be able to continue the arguments of a function after a call to a function, and we must be able to split the arguments over multiple lines. That’s what the leading dot allows. Also the dot at the beginning of the line as marker of the continuation of a variable list is a generalization of using the dot as identity function - which is an implementation detail in many lisps.

`(. a)` is just `a`.

So for the single variable case, this would not even need additional parsing: wisp could just parse ". a" to "(. a)" and produce the correct result in most lisps. But forcing programmers to always use separate lines for each parameter would be very inconvenient, so the definition of the dot at the beginning of the line is extended to mean “take every element in this line as parameter to the parent function”.

Essentially this dot-rule means that we mark variables at the beginning of lines instead of marking function calls, since in Lisp variables at the beginning of a line are much rarer than in other programming languages. In lisp assigning a value to a variable is a function call while it is a syntax element in many other languages, so what would be a variable at the beginning of a line in other languages is a function call in lisp.

(Optimize for the common case, not for the rare case)

4.2 : (the colon)

For double brackets and for some other cases we must have a way to mark indentation levels without any code. I chose the colon, because it is the most common non-alpha-numeric character in normal prose which is not already reserved as syntax by lisp when it is surrounded by whitespace, and because it already gets used for marking keyword arguments to functions in Emacs Lisp, so it does not add completely alien characters.

The function call via inline " : " is a limited generalization of using the colon to mark an indentation level: If we add a syntax-element, we should use it as widely as possible to justify the added syntax overhead.

But if you need to use : as variable or function name, you can still do that by escaping it with a backslash (example: "\:"), so this does not forbid using the character.

4.3 _ (the underscore)

In Python the whitespace-hostile HTML already presents problems with sharing code - for example in email list archives and forums. But in Python the indentation can mostly be inferred by looking at the previous line: If that ends with a colon, the next line must be more indented (there is nothing to clearly mark reduced indentation, though). In wisp we do not have this help, so we need a way to survive in that hostile environment.

The underscore is commonly used to denote a space in URLs, where spaces are inconvenient, but it is rarely used in lisp (where the dash ("-") is mostly used instead), so it seems like a natural choice.

You can still use underscores anywhere but at the beginning of the line. If you want to use it at the beginning of the line you can simply escape it by prefixing the first underscore with a backslash (example: "\___").

5 Background

A few months ago I found the readable Lisp project which aims at producing indentation based lisp, and I was thrilled. I had already done a small experiment with an indentation to lisp parser, but I was more than willing to throw out my crappy code for the well-integrated parser they had.

Fast forward half a year. It’s February 2013 and I started reading the readable list again after being out of touch for a few months because the birth of my daughter left little time for side-projects. And I was shocked to see that the readable folks had piled lots of additional syntax elements on their beautiful core model, which for me destroyed the simplicity and beauty of lisp. When language programmers add syntax using \\, $ and <>, you can be sure that it is no simple lisp anymore. To me readability does not just mean beautiful code, but rather easy to understand code with simple concepts which are used consistently. I prefer having some ugly corner cases to adding more syntax which makes the whole language more complex.

I told them about that and proposed a simpler structure which achieved almost the same as their complex structure. To my horror they proposed adding my proposal to readable, making it even more bloated (in my opinion). We discussed a long time - the current syntax for inline-colons is a direct result of that discussion in the readable list - then Alan wrote me a nice mail, explaining that readable will keep its direction. He finished with «We hope you continue to work with or on indentation-based syntaxes for Lisp, whether sweet-expressions, your current proposal, or some other future notation you can develop.»

It took me about a month to answer him, but the thought never left my mind (@Alan: See what you did? You anchored the thought of indentation based lisp even deeper in my mind. As if I did not already have too many side-projects… :)).

Then I had finished the first version of a simple whitespace-to-lisp preprocessor.

And today I added support for reading indentation based lisp from standard input which allows actually using it as in-process preprocessor without needing temporary files, so I think it is time for a real release outside my Mercurial repository.

Freenet

It lets you anonymously share files, browse and publish “freesites”, chat on forums and even do microblogging, using a generic Web of Trust, shared by different plugins, to avoid spam. For really careful people it offers a “darknet” mode, where users only connect to their friends, in which it is very hard to detect that they are running Freenet.

The overarching design goal of freenet is to make censorship as hard as technically possible. That’s the reason for providing anonymity (else you could be threatened with repercussions - as seen in the case of the wikileaks informer from the army in the USA), building it as a decentral network (else you could just shut down the central website, as people tried with wikileaks), providing safe pseudonyms and caching of the content on all participating nodes (else people could censor by spamming or overloading nodes) and even the darknet mode and enhancements in usability (else freenet could be stopped by just prosecuting everyone who uses it, or it would reach too few people to be able to counter censorship in the open web).

I don’t know anymore what triggered my use of freenet initially, but I know all too well what keeps me running it instead of other anonymizers:

I see my country (Germany) turning more and more into a police-state, starting with attacks on p2p, continuing with censorship of websites (“we all know child-porn is bad, so it can’t be bad to censor it, right? Sure we could just make the providers delete it, so noone can access it, but… no, we have to censor it, so only people who can use google can find it – which luckily excludes us, because we are not pedocriminals.”) and leading into directions I really don’t like.

And in case the right for freedom of speech dies, we need a place where we can organize to get it back and fight for the rights laid out in our constitution (the Grundgesetz).

PS: New entries on my site are also available in freenet (via freereader: downloads RSS feeds and republishes them in freenet).

PPS: If you like this text, please redent/retweet the associated identi.ca/twitter notices so it spreads:

https://identi.ca/notice/46221737

https://twitter.com/ArneBab/status/21217822748

50€ for the Freenet Project - and against censorship

As I pledged1, I just donated to freenet 50€ of the money I got back because I cannot go to FilkCONtinental. Thanks go to Nemesis, a proud member of the “FiB: Filkers in Black” who will take my place at the Freusburg and fill these old walls with songs of stars and dreams - and happy laughter.

A bitcoin-marketplace using Freenet?

A few days ago, xor, the developer of the Web of Trust in Freenet, got in contact with the brain behind the planned Web of Trust for Openbazaar, and toad, the former maintainer of Freenet, questioned whether we would actually want a marketplace using Freenet.

I took a few days to ponder the question, and I think a marketplace using Freenet would be a good idea - for Freenet as well as for society.

Freenet is likely the most secure way of implementing a digital market, which means it can work safely for small sums, but not for large ones - except if you can launder huge amounts of digital money. As such it is liberating for small people, but not for syndicates. For example a drug cartel needs to be able to turn lots of money into clean cash to pay henchmen abroad. Since bitcoin can be watched more easily than cash and an anonymous network makes it much harder to use scare-tactics against competing sellers, moving the marketplace from the street to the internet weakens syndicates and other organized crime by removing part of their options for creating a monopoly by force.

Also the best technologies in freenet were developed (or rather: got to widespread use), because it had to actually withstand attacks.

Freenet as marketplace with privacy for small people equivalent to cash-payments would also help improve its suitability for whistleblowers - see hiding in the forest: A better alternative.

This would also help free speech, because, unlike other solutions, freenet has the required properties: a store whose lifetime depends on the popularity of the content, not the power of the publisher, which provides DoS-resistant hosting without the need for a 24/7 server, stable and untraceable pseudonyms (ignoring fixable attack-vectors) and an optional friend-to-friend darknet.

In short: A decentralized ebay-killer would be cool and likely beneficial to Freenet and Free Speech without bringing actual benefit for organized crime.

Also this might be what is needed to bring widespread darknet adoption.

And last but not least, we would not be able to stop people from implementing a marketplace over freenet: Censorship resistance also means resistance against censorship by us.

Final note: Openbazaar is written in Python and Freenet has decent Python bindings (though they are not beautiful everywhere), so it should not be too hard to use it for Openbazaar. A good start could be the WoT-code written for Infocalypse in last year’s GSoC: Web of Trust integration as well as private messaging.

A vision for a social Freenet with WoT, FreeTalk and Sone

I let my thoughts wander a bit around the question of how a social Freenet (2.0 ;) ) could look from the view of a newcomer.

I imagine myself installing freenet. The first thing to come up after starting it is the node page.
(italic Text in brackets is a comment. The links need a Freenet running on 127.0.0.1 to work)

“Welcome to Freenet, where no one can tell you’re reading”

“Freenet tries hard to protect your privacy. Therefore we created a pseudonymous ID for you. Its name is Gandi Schmidt. Visit [your ID’s site] to see a legend we prepared for you. You can use this legend as fictional background for your ID, if you are really serious about staying anonymous.”

(The name should be generated randomly for each ID. A starting point for that could be a list of scientists from around the world compiled from the wikipedia (link needs freenet).
The same should be true for the legend, though it is harder to generate. The basic information should be a quote (people remember that), a job and sex, the country the ID comes from (maybe correlated with the name) and a hobby.)

“During the next few restarts, Freenet will ask you to solve various captchas to prove that you are indeed human. Once enough other nodes successfully confirmed that you are human, you will gain write access to the forums and microblogging. This might take a few hours to a few days.”

(as soon as the ID has sufficient trust, automatically activate posting to FreeTalk, Sone and others. Access is delayed to ensure that when people talk they can get answers)

“Note that other nodes don’t know who you are. They don’t know your IP, nor your real identity. The only thing they know is that you exist, that you can solve captchas and how to send you a message.”

“You can create additional IDs at any time and give them any name and legend you choose by adding them on the WebOfTrust-page. Each new ID has to verify for itself that it’s human, though. If you carefully keep them separate, others can only find out with a lot of effort that your IDs are related. Mind your writing style. In doubt, keep your sentences short. To make it easier for you to stay anonymous, you can autogenerate name and legend at random. Don’t use the nicest from many random trials, else you can be traced by the kind of random IDs you select.”

“While your humanity is being confirmed, you can find a wealth of content on the following indexes, some published anonymously, some not. If you want to publish your own anonymous site, see Upload a Freesite. The list of indexes uses dynamic bookmarks. You get notified whenever a bookmarked site (like the indexes below) gets updated.”

“Note: If you download content from freenet, it is being cached by other nodes. Therefore popular content is faster than rare content and you cannot overload nodes by requesting their data over and over again.”

“You are currently using medium security in the range from low to high.”

“At this security level, separate IDs are no perfect protection of your anonymity: other members might not be able to see what you do in Freenet, but they can know that you use freenet in the first place, and corporations or governments with medium-sized infrastructure can launch attacks which might make it possible to trace your contributions and accesses. If you want to disappear completely from the normal web and keep your freenet usage hidden, as well as make it very hard to trace your contributions, so you can really exercise your right of free speech without fearing repercussions, you can use Freenet as Darknet — the more secure but less newcomer friendly way to use freenet; the current mode is Opennet.”

“To enter the Darknet, you add people you know and trust personally as your darknet friends. As soon as you have enough trusted friends, you can increase the security level to high and freenet will only connect to your trusted friends, making you disappear from the regular internet. The only way to tell that you are using freenet will then be to force your ISP to monitor all traffic coming from your computer.”

“And once transport plugins are integrated, steganography will come into reach and allow masking your traffic as regular internet usage, making it very hard to distinguish freenet from encrypted internet-telephony. If you want to help make this a reality in the near future, please consider contributing or donating to freenet.”

“Welcome to the pseudonymous web where no one can know who you are, but only that you are always using the same ID — if you do so.”

“To show this welcome message again, you can at any time click on Intro in the links.”

What do you think? Would this be a nice way to integrate WoT, FreeTalk, Sone and general user education in a welcome message, while adding more incentive to keep the node running?

PPS: This vision is not yet a reality, but all the necessary infrastructure is already in place and working in Freenet. You can already do everything described in here, just without the nice guide and the level of integration (for example activating plugins once you have proven your humanity, which equals enough trust by others to be actually seen).

Anonymous code collaboration with Mercurial and Freenet

Anonymous DVCS in the Darknet.

There is a new Mercurial extension for interaction with Freenet called "infocalypse" (which should keep working after the information apocalypse).

It offers "fn-push" and "fn-pull" as an optimized way to store code in freenet: bundles are inserted and pulled one after the other. An index tells infocalypse in which order to pull the bundles. It makes using Mercurial in freenet far more efficient and convenient.

The rest of the article is concerned with the older FreenetHG extension. If you need to choose between the two, use Infocalypse: Its concept for sharing over Freenet is more robust.

Using FreenetHG you can collaborate anonymously without having to give everyone direct write access to your code.

To work with others, you simply set up a local repository for your own work and use FreenetHG to upload your code automatically into Freenet under your private ID. Others can then access your code with the corresponding public ID, do their changes locally and publish them in their own anonymous repository.

You then pull changes you like into your repository and publish them again under your key.

FreenetHG uses freenet, which offers pseudonymity to make anonymous communication more secure, and Mercurial, which allows for efficient distributed collaboration.

With pseudonymity you can't find out whom you're talking to, but you know that it is the same person, and with distributed collaboration you don't need to let people write to your code directly, since every code repository is a full clone of the main repository.

Even if the main repository should go down, every contributor can still work completely unhindered, and if someone else breaks things in his repository, you can simply decide not to pull the changes from him.

What you need

To use FreenetHG you obviously need a running freenet node and a local Mercurial installation. Also you need the FreenetHG plugin for Mercurial and PyFCP which provides Python bindings for Freenet.

This is the line others can use to clone your project and pull from it.

And with this we finished setting up our anonymous collaboration repository.

When we commit, every commit will directly be uploaded into Freenet.

So now we can pass the freenet request URI to others who can clone our repository and set up their own repositories in freenet. When they add something interesting, we then pull the data from their request URI and merge their code with ours.

Setup a more convenient anonymous workflow

This workflow is already useful, but it's a bit inconvenient to have to wait after each commit until your changes have been uploaded. So we'll now change this basic workflow a bit to be able to work more conveniently.

First step: clone our repositories to a backup location:

hg clone AnoFoo BackFoo

Second step: change our .hg/hgrc to only update when we push to the backup repository, and add the default-push path to the backup repository:
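It could look roughly like this (a sketch, not the original file: the actual FreenetHG hook command is left as a placeholder, see the FreenetHG documentation for the real one; only the structure matters here):

[paths]
default-push = /path/to/BackFoo

[hooks]
# before: commit = <FreenetHG upload command>
outgoing = <FreenetHG upload command>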

Changes: We now have a default-push path, and we changed the "commit" hook to an "outgoing" hook which is invoked every time changes leave this repository. It will also be invoked when someone pulls from this repo, but not when we clone it locally.

Now our commits roll as fast as we're used to from other Mercurial repositories and freenethg will make sure we don't use the wrong username.

When we want to anonymously publish the repository we then simply use

hg push

This will push the changes to the backup and then upload it to your anonymous repository.

And now we have finished setting up our repository and can begin using an anonymous and almost infinitely scalable workflow which only requires our freenet installation to be running when we push the code online.

One last touch: If an upload should chance to fail, you can always repeat it manually with

Background of Freenet Routing and the probes project (GSoC 2012)

The probes project is a Google Summer of Code project by Steve Dougherty, intended to optimize the network structure of freenet. Here I will give the background of his project very briefly:

The Small World Structure

Freenet organizes nodes by giving them locations - like coordinates. Each node knows some others and can send data only to those to which it is connected directly. If your node wants to contact someone it does not know directly, it sends a message to one of the nodes it knows and asks that one to forward the message. The decision whom to ask to forward the message is part of the routing.

And the routing algorithm in Freenet assumes a small world network: Your node knows many people who are close to you and a few who are far away. Imagine that as knowing many people in your home town and few in other towns. There is mathematical proof, that the routing is very efficient and scales to billions of users - if it really operates on a small world network.

So each freenet node tries to organize its connections in such a way, that it is connected to many nodes close by and some from far away.⁽¹⁾ The structure of the local connections of your own node can be characterized by the link length distribution: “How many short and how many long connections do you have?”
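To make “routing by location” concrete, here is a deliberately simplified Python sketch of the greedy step (an illustration only, not Freenet's actual code): a node forwards a request to the directly connected peer whose location is closest to the target location on the [0, 1) circle.

def circular_distance(a, b):
    """Distance between two locations on the [0, 1) circle."""
    d = abs(a - b)
    return min(d, 1 - d)

def next_hop(peer_locations, target_location):
    """Pick the directly connected peer whose location is closest to the target."""
    return min(peer_locations, key=lambda peer: circular_distance(peer, target_location))

# Example: with peers at these locations, a request for 0.62 is forwarded to 0.60.
print(next_hop([0.11, 0.25, 0.60, 0.93], 0.62))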

Probes and their Promise

Steve’s probes project analyzes the structure of the network and the structure of the local connections of nodes in an anonymous way, in order to improve the self-organization algorithm in freenet. The reason is that if the structure of the network is not a small world network, the routing algorithm becomes much less efficient.

That in turn means that if you want to get some data on the network, that data has to travel over far more intermediate nodes, because freenet cannot determine the shortest route. And if the data has to travel over more nodes, it consumes more bandwidth and takes longer to reach you. In the worst case it could happen that freenet does not find the data at all.

To estimate the effect of that, you can look at the bar chart The Seeker linked to:

Low is an ideal structure with 16 connections per node, Conforming is the measured structure with about 17 connections per node (a cluster with 12, one with ~25). Ideally we would want Normal with 26 connections per node and an ideal structure. High is 86 connections. The simulated network sizes are 6000 nodes (Small), 18 000 (Normal, as measured), 36 000 (Large). Fewer hops is better.

It shows how many steps a request has to take to find some content. “Conforming” is the actually measured structure. “Low”, “Normal” and “High” show the number of connections per node in an optimal network: 16, 26 and 86. The actually measured mean number of connections in freenet is similar to “Low”, so that’s the bar with which we need to compare the “Conforming” bar to see the effect of the suboptimal structure. And that effect is staggering: By default a request needs about twice as many steps in the real world as it would need in an optimally structured network.

Practically: If freenet managed to get closer to the optimal structure, it could double its speed and cut reaction times in half. Without changing anything else - and also without changing the local bandwidth consumption: You would simply get your content much faster.

If we managed to increase the mean number of connections to about 26 (that’s what a modern DSL connection can manage without too many ill effects), we could double the speed and halve the reaction times again (but that requires more bandwidth from the nodes which currently have a low number of connections: Many have only about 12 connections, many have about 25 or so, few have something in between).

Essentially that means we could gain a factor of 2 to 4 in speed and reaction times. And better scalability (compare the normal and the large network).

Note ⁽¹⁾: Network Optimization using Only Local Knowledge

To achieve a good local connection-structure, the node can use different strategies for Opennet and Darknet (this section is mostly guessed, take it with a grain of salt. I did not read the corresponding code).

In Opennet it can look for nodes which would improve its local structure. If it finds one, it can replace the local connection which distorts its local structure the most with the new connection.

In Darknet on the other hand, where it can only connect to the folks it already knows, it looks at the locations of nodes it hears about. It then checks whether its local connections would be better if it had that other node’s location. In that case, it asks the other node whether it would agree to swap locations (without changing any real connections: It only changes the notion of where it lives. As if you swapped flats with someone else, but without changing who your friends are. Afterwards both the other one and you live closer to your respective friends).

In short: In Opennet, Freenet changes to whom it is connected in order to achieve a small world structure: It selects its friends based on where it lives. In Darknet it swaps its location with strangers to live closer to its friends.
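As a rough illustration of the swap decision, here is a deliberately simplified Python sketch (the real algorithm is randomized and uses a Metropolis-style acceptance rule; this only keeps the core idea of "swap if both nodes end up closer to their own friends"):

def circular_distance(a, b):
    d = abs(a - b)
    return min(d, 1 - d)

def total_link_length(location, friend_locations):
    return sum(circular_distance(location, f) for f in friend_locations)

def should_swap(loc_a, friends_a, loc_b, friends_b):
    before = total_link_length(loc_a, friends_a) + total_link_length(loc_b, friends_b)
    after = total_link_length(loc_b, friends_a) + total_link_length(loc_a, friends_b)
    return after < before

# Node A at 0.1 with friends near 0.8, node B at 0.8 with friends near 0.1:
print(should_swap(0.1, [0.78, 0.82], 0.8, [0.08, 0.12]))  # True - both get closer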

Bootstrapping the Freenet WoT with GnuPG - and GnuPG with Freenet

Intro

When you enter the freenet Web of Trust, you first need to get some
trust from people by solving captchas. And even when people trust you
somehow, you have no way to prove your identity in an automatic way,
so you can’t create identities which freenet can label as trusted
without manual intervention from your side.

Proposal

To change this, we can use the Web of Trust used in GnuPG to infer
trust relationships between freenet WoT IDs.

Practically that means:

Write a message: “I am the WoT ID USK@” (replace with the
public key of your WoT ID).

Sign that message with a GnuPG key you want to connect to the
ID. The signature proves that you control the GnuPG key.

Upload the signed message to your WoT key:
USK@/bootstrap/0/gnupg.asc. To make this upload, you need the
private key of the ID, so the upload proves that you control the
WoT ID.

Now other people can download the file from you, and when they trust
the GnuPG key, they can transfer their trust to the freenet WoT-ID.
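A minimal sketch of those three steps in Python (the upload URI is a placeholder, and the pyFreenet keyword arguments are assumptions which may differ between versions):

import subprocess
import fcp

WOT_PUBLIC_KEY = "USK@..."  # public key of your WoT ID (placeholder)
WOT_INSERT_URI = "USK@.../bootstrap/0/gnupg.asc"  # insert URI built from the ID's private key (placeholder)

# Step 1 and 2: write the statement and clearsign it with your GnuPG key.
message = "I am the WoT ID %s" % WOT_PUBLIC_KEY
signed = subprocess.run(["gpg", "--clearsign"], input=message.encode(),
                        capture_output=True, check=True).stdout

# Step 3: upload the signed statement below your WoT ID.
node = fcp.node.FCPNode()
try:
    node.put(WOT_INSERT_URI, data=signed, mimetype="text/plain")
finally:
    node.shutdown()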

Automatic

Ideally all this should be mostly automatic:

click a link in the freenet interface and select the WoT ID to have
freenet create the file and run your local GnuPG program.

Then select your GnuPG key in the GnuPG program and enter your
password.

Finally check the information to be inserted and press a button to
start the upload.

As soon as you have a GnuPG key connected with your WoT ID, freenet
should scout all other WoT IDs for gnupg keys and check if the local
GnuPG key you assigned to your WoT ID trusts the other key. If yes,
give automatic trust (real person → likely no spammer).

Anonymously

To make the connection one-way (bootstrap the WoT from GnuPG, but not
expose the key), you might be able to encrypt the message to all
people who signed your GnuPG key. Then these can recognize you, but
others cannot.

This will lose you the indirect trust in the GnuPG web-of-trust, though.

I hope this bootstrap-WoT draft sounded interesting :)

Happy hacking!

De-Orchestrating Freenet with the QUEEN program

So Poul-Henning Kamp thought this was just a thought experiment …

At FOSDEM 2014 Poul-Henning Kamp talked about a hypothetical “Project ORCHESTRA” by the NSA with the goal of disrupting internet security: Information, Slides, Video (with some gems not in the slides).

One of the ideas he mentioned was the QUEEN program: Psy-Ops for Nerds.

I’ve been a contributor to the Freenet Project for several years. And in that time, I experienced quite a few of these hypothetical tactics first-hand.

This is the list of good matches: Disruptive actions which managed to keep Freenet from moving onwards, often for several months. It’s quite horrifying how many there are. Things which badly de-orchestrated Freenet:

Steer discussions to/from hot spots (“it can’t be that hard to exchange a text file!” ⇒ noderef exchange fails all the time, which is the core of darknet!)

Disrupt consensus building: Horribly long discussions which cause the resolution to be forgotten due to a fringe issue.

“Secrecy without authentication is pointless”.

“It gives a false sense of security” (if you tailor [these kinds of things] carefully, they speak to people's political leanings: If it’s not perfect: “No, that wouldn’t do it”. This stopped many implementations, till finally Bombe got fed up and started the simple and working microblogging tool Sone)

“you shouldn’t do that! Do you really know what you are doing? Do you have a PhD in that? The more buttons you press, the more warnings you get” ← this is “filter failed”: No, I don’t understand this, “get me out of that!” ⇒ Freenet downloads fail when the filter failed.

Getting people to not do things by misdirecting their attention. Just check the Freenet Bugtracker for unresolved simple bugs with completely fleshed out solutions that were never realized.

FUD: I could be supporting bad content! (just like you do if your provider has a transparent proxy to reduce outgoing bandwidth - or with any VPN, Tor, i2p, .... Just today I read this: « you seriously think people will ever use freenet to post their family holiday photos, favourite recipes etc? … can you envisage ordinary people using freenet for stuff where they don't really have anything to hide? » — obvious answer: I do that, so naturally other people might do it, too.)

“Bikeshed” discussions: Sometimes just one single email from an anonymous person can derail a free software project for months!

Soak mental bandwidth with bogus crypto proposals: PSKs? (a new key-proposal which could have made forums scale better but actually just soaked up half a year of the main developer’s time and wasn’t implemented - and in return, critical improvements for existing forums were delayed)

Witless volunteers (overlooking practical advantages due to paranoia, theoretical requirements which fail in the real world, overly pessimistic stance which scares away newcomers, voicing requirements for formal specification of protocols which are in flux).

Affect code direction (lots of the above - also ensuring that there is no direction, so it doesn’t really work well for anybody because it tries to have the perfect features for everybody before actually getting a reasonable user experience).

Code obfuscation (some of the stuff is pretty bad, lots of it looks like it was done in a hurry, because there was so much else to do).

Misleading documentation (or outdated or none…: There is plenty of Freenet 0.5 documentation while 0.7 is actually a very different beast)

Deceptive defaults (You have to set up your first pseudonym by hand, load two plugins manually and solve CAPTCHAs, before you are able to talk to people anonymously, darknet does not work out of the box, and a connection speed given without a unit is interpreted as Bytes/s - I’m sure someone once voiced a reason for that)

Phew, quite a list…

I provided this because naming the problems is an important step towards resolving them. I am sure that we can fix most of this, but it’s important to realize that while many of the points I named are most probably homegrown, it is quite plausible that some of them were influenced from the outside. Freenet was always a pretty high profile project in the crypto community, so it is an obvious target. We’d be pretty naive to think that we weren’t targeted.

And we have to keep this in mind when we communicate: We don’t only have to look out for bad code, but also for influences which make us take up toxic communication patterns which keep us from moving forward.

The most obvious fix is: Stay friendly, stick together, keep honest and greet every newcomer as a potential ally. And call out disrupting behaviour early on: If someone insults new folks or takes up huge amounts of discussion time by rehashing old discussions instead of talking about the way forward - in a way which actually leads to going forward - then say that this is your impression. Still stay friendly: Most of the time that’s not intentional. And people can be affected by outside influences like someone attacking them in other channels, so it would be important to help them recover and not to push them away because their behaviour became toxic for some time (as long as the time investment for that is not overarching).

Overall it’s about keeping the community together despite the knowledge that some of us might actually be aggressors or influenced from the outside to disrupt our work.

Effortless password protected sharing of files via Freenet

Inserting a file into freenet using the key KSK@<password> creates an invisible, password protected file which is available over Freenet.

Often you want to exchange some content only with people who know a given password and make it accessible to everyone in your little group but invisible to the outside world.

Until yesterday I thought that problem was fairly complex, because everyone in your group needs a given encryption program, and you need a way to share the file without exposing the fact that you are sharing it.

<ArneBab> evanbd: If I insert a tiny file without telling anyone the key, can they get the content in some way?
<evanbd> ArneBab: No.

<toad_> dogon: KSK@<any string of text> -> generate an SSK private key from the hash of the text
<toad_> dogon: if you know the string, you can both insert and retrieve it

In other words: Just inserting a file into freenet using the key KSK@<password> creates an invisible, password protected file which is shared over Freenet.

The file is readable and writeable by everyone who knows the password (within limits1), but invisible to everyone else.

To upload a file as KSK, just go to the filesharing tab, click “upload a file”, switch to advanced mode and enter the KSK key.

Or simply click here (requires freenet to be running on your computer with default settings).
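If you prefer to script it, the same can be done with the pyFreenet bindings (a minimal sketch: it assumes a node with FCP enabled on the default port, and the exact keyword arguments and return values may differ between pyFreenet versions):

import fcp

node = fcp.node.FCPNode()
try:
    # Everyone who knows the password can derive the same key.
    key = "KSK@" + "our-shared-password"
    node.put(key, data="meet at the usual place", mimetype="text/plain")

    # Retrieval only needs the same password-derived key.
    print(node.get(key))
finally:
    node.shutdown()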

It’s strange to think that I only learned this after more than 7 years of using Freenet. How many more nuggets might be hidden there, just waiting for someone to find them and document them in a style which normal users understand?

Freenet is a distributed datastore which can find and transfer data efficiently on restricted routes (search for meshnet scaling to see why that type of routing is really hard), and it uses a WebOfTrust for real-life spam-resistance without the need for a central authority (look at your mailbox to see how hard that is, even with big money).

How many more complex problems might it already have solved as byproduct of the search for censorship resistance?

So, what’s still to be said? Well, if Freenet sounds interesting: Join in!

A KSK is writeable with the limit that you cannot replace the file if people still have it in their stores: You have to wait until it has been displaced, or be aware that two states for the file now exist: one with your content and one with the old. Better just define a series of KSKs: Add a number to the KSK and if you want to write, simply insert the next one. ↩

Exact Math to the rescue - with Guile Scheme

I needed to calculate the probability that for every freenet user there are at least 70 others in a distance of at most 0.01. That needs binomial coefficients with n and k on the order of 4000. My old Python script failed me with an OverflowError: integer division result too large for a float. So I turned to Guile Scheme and exact math.

To use this with exact math, I just need to call it with p as exact number:

(use-modules (spielfaehig))
(spielfähig #e.03 4000 70)
; ^ note the #e - this means to use an exact representation of the number.
; To make Guile show a float instead of some huge division, just
; convert the number to an inexact representation before showing it:
(format #t "~A\n" (exact->inexact (spielfähig #e.03 4000 70)))

And that’s it. Automagic hassle-free exact math is at my fingertips.

It just works and uses less than 200 MiB of memory - even though the intermediate factorials return huge numbers. And huge means huge. It effortlessly handles numbers on the order of 10^8000. That is 10 to the power of 8000 - a number with 8000 digits.

4 The Answer

42! :)

The real answer is 0.0125: That’s the maximum length we need to choose for short links to get more than a 95% probability that in a network of 4000 nodes there are at most 5 nodes for which there are less than 70 peers with a distance of at most the maximum length.

If we can assume 5000 nodes, then 0.01 is enough. And since this is the number we directly got from an analysis of our link length distribution, it is the better choice, though it will mean that people with huge bandwidth cannot always max out their 100 connections.

5 Conclusion

Most of the time, floats are OK. But there are the times when you simply need exact math.

In this text I want to explore the behaviour of the degrading yet redundant anonymous file storage in Freenet. It only applies to files which were not subsequently retrieved.

Every time you retrieve a file, it gets healed which effectively resets its timer as far as these calculations here are concerned. Due to this, popular files can and do live for years in freenet.

1 Static situation

First off, we can calculate the retrievability of a given file with different redundancy levels, given fixed chunk retrieval probabilities.

Files in Freenet are cut into segments which are again cut into up to 256 chunks each. With the current redundancy of 100%, only half the chunks of each segment have to be retrieved to get the whole file. I call that redundancy “2x”, because it inserts data 2x the size of the file (actually that’s just what I used in the code and I don’t want to force readers - or myself - to make mental jumps while switching from prose to code).

We know from the tests done by digger3, that after 31 days about 50% of the chunks are still retrievable, and after 30 days about 30%. Let’s look how that affects our retrieval probabilities.
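As a sanity check of this static picture, here is a small Python sketch of the underlying binomial math (assuming, as a simplification, that chunks survive independently): a full segment of 256 chunks with redundancy "2x" is retrievable when at least 128 of its chunks survive.

from math import comb

def segment_retrievable(p, chunks=256, needed=128):
    """Probability that at least `needed` of `chunks` chunks survive, each with probability p."""
    return sum(comb(chunks, k) * p**k * (1 - p)**(chunks - k)
               for k in range(needed, chunks + 1))

for p in (0.5, 0.55, 0.6):
    print(p, round(segment_retrievable(p), 4))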

To find a better approximation of the effects of increasing the redundancy, we have to stop looking at freenet as a fixed store and have to start seeing it as a process. More exactly: We have to look at the replacement rate.

2.1 Math

A look at the stats from digger3 shows us that after 4 weeks 50% of the chunks are gone. Let’s call this the dropout rate. The dropout rate consists of churn and chunk replacement:

dropout = churn + replacement

Since after one day the dropout rate is about 10%, I’ll assume that the churn is lower than 10%. So for the following parts, I’ll just ignore the churn (naturally this is wrong, but since the churn is not affected by redundancy, I just take it as constant factor. It should reduce the negative impacts of increasing redundancy). So we will only look at replacement of blocks.

Replacement consists of new inserts and healing of old files.

replacement = insert + healing

If we increase the redundancy from 2 to 3, the insert and healing rate should both increase by 50%, so the replacement rate should increase by 50%, too. The healing rate might increase a bit more, because healing can now restore 66% of the file as long as at least 33% are available. I’ll ignore that, too, for the time being (which is wrong again. We will need to keep this in mind when we look at the result).

redundancy 2 → 3 ⇒ replacement rate × 1.5

Increasing the replacement rate by 50% should decrease the lifetime of chunks by 1/1.5, or:

chunk lifetime × 2/3

So we will be at the 50% limit not after 4 weeks, but after 10 days. But on the other hand, redundancy 3 only needs 33% chunk probability, which has 2× the lifetime of 50% chunk probability. So the file lifetime should change by 2×2/3 = 4/3:

file lifetime × 4/3 = file lifetime +33%

Now doesn’t that look good?

As you can imagine, this pretty picture hides a clear drawback: The total storage capacity of Freenet gets reduced by 33%, too, because now every file requires 1.5× as much space as before.

2.2 Caveats (whoever invented that name? :) )

We ignored churn, so the chunk lifetime reduction should be a bit less than the estimated 33%. That’s good and life is beautiful, right? :)

NO. We also ignored the increase in the healing rate. This should be higher, because every retrieved file can now insert more of itself in the healing process. If we had no new inserts, I would go as far as saying that the healing-rate might actually double with the increased redundancy. So in a completely filled network without new data, the effects of the higher redundancy and the higher replacement rate would exactly cancel. But the higher redundancy would be able to store fewer files. Since we are constantly pushing new data into the network (for example via discussions in Sone), this should not be the case.

2.3 Dead space

Aside from hiding some bad effects, this simple model also hides a nice effect: A decreased amount of dead space.

First off, let’s define it:

2.4 What is dead space?

Dead space is the part of the storage space which cannot be used for retrieving files. With any redundancy, that dead space is just about the size of the original file without redundancy multiplier. So for redundancy 2, the storage space occupied by the file is dead, when less than 50% are available. With redundancy 3, it is dead when less than 33% are available.

2.5 Effect

That dead space is replaced like any other space, but it is never healed. So the higher replacement rate means that dead space is recovered more quickly. So, while a network with higher redundancy can store fewer files overall, those files which can no longer be retrieved take up less space. I won’t add the math for that here, though (because I have not done it yet).

2.6 Closing

So, as closing remark, we can say that increasing the redundancy will likely increase the lifetime of files. It will also reduce the overall storage space in Freenet, though. I think it would be worthwhile.

It might also be possible to give probability estimates in the GUI which show how likely it is that we can retrieve a given file after a few percent were downloaded: If more than 1/redundancy of the chunks succeed, the probability to get the file is high. If close to 1/redundancy succeed, the file will be slow, because we might have to wait for nodes which went offline and will come back at some point. Essentially we will have to hope for churn. If much less than 1/redundancy of the chunks succeed, we can stop trying to get the file.

Just use the code in here for that :)

3 Background and deeper look

3.1 No redundancy

Let’s start with redundancy 1. If one chunk fails, the whole file fails.

Compared to freenet today the replacement rate would be halved, because each file takes up only half the current space. So the 50% dead chunks rate would be reached after 8 weeks instead of after 4 weeks. And 90% would be after 2 days instead of after 1 day. We can guess that 99% would be after a few hours.

Let’s take a file with 100 chunks as example. That’s 100× 32 kiB, or about 3 Megabyte. After a few hours the chance will be very high that it will have lost one chunk and will be irretrievable. Freenet will still have 99% of the chunks, but they will be wasted space, because the file cannot be recovered anymore. The average lifetime of a file will just be a few hours.

With 99% probability of retrieving a chunk, the probability of retrieving a file will be only about 37%.

66% would be reached in the current network after about 20 days (between 2 weeks and 4 weeks), and in a zero redundancy network after 40 days. fetch-pull-stats

At the same time, though, the chunk replacement rate increased by 50%, so the mean chunk lifetime decreased by factor 2/3. So the lifetime of a file would be 4 weeks.

3.4 Generalize this

So, now we have calculations for redundancy 1, 1.5, 2 and 3. Let’s see if we can find a general (if approximate) rule for redundancy.

From the fetch-pull-graph from digger3 we see empirically, that between one week and 18 weeks each doubling of the lifetime corresponds to a reduction of the chunk retrieval probability of 15% to 20%.

Also we know that 50% probability corresponds to 4 weeks lifetime.

And we know that redundancy x has a minimum required chunk probability of 1/x.

With this, we can model the required chunk lifetime as a function of redundancy:

chunk lifetime = 4 * 2**((0.5-1/x)/0.2)

with x as redundancy. Note: this function is purely empirical and approximate.

Having the chunk lifetime, we can now model the lifetime of a file as a function of its redundancy:

file lifetime = (2/x) * 4 * (2**((0.5-1/x)/0.2))

We can now use this function to find an optimum of the redundancy if we are only concerned about file lifetime. Naturally we could get the trusty wxmaxima and get the derivative of it to find the maximum. But that is not installed right now, and my skills in getting the derivatives by hand are a bit rusty (note: install running). So we just do it graphically. The function is not perfectly exact anyway, so the errors introduced by the graphic solution should not be too big compared to the errors in the model.
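As a numerical stand-in for the graphical solution, a few lines of Python can evaluate the (purely empirical) formula from above over a range of redundancies and print where it peaks:

def file_lifetime(x):
    """File lifetime in weeks as a function of redundancy x (empirical model from above)."""
    return (2 / x) * 4 * 2 ** ((0.5 - 1 / x) / 0.2)

redundancies = [1 + i * 0.1 for i in range(1, 40)]  # 1.1 to 4.9
best = max(redundancies, key=file_lifetime)
print(best, file_lifetime(best))

With this 0.1 grid the model peaks around a redundancy of 3 to 3.5, but given the validity range discussed in the note below, the exact position of the peak should not be taken too literally.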

Note however, that this model is only valid in the range between 20% and 90% chunk retrieval probability, because the approximation for the chunk lifetime does not hold anymore for values above that. Due to this, redundancy values close to or below 1 won’t be correct.

Also keep in mind that it does not include the effect due to the higher rate of removing dead space - which is space that belongs to files which cannot be recovered anymore. This should mitigate the higher storage requirement of higher redundancy.

4 Summary: Merit and outlook

Now, what do we make of this?

First off: If the equations are correct, an increase in redundancy would improve the lifetime of files by a maximum of almost a week. Going further reduces the lifetime, because the increased replacement of old data outpaces the improvement due to the higher redundancy.

Also higher redundancy needs a higher storage capacity, which reduces the overall capacity of freenet. This should be partially offset by the faster purging of dead storage space.

The results support an increase in redundancy from 2 to 3, but not to 4.

Well, and aren’t statistics great? :)

Additional notes: This exploration ignores:

healing creates less insert traffic than new inserts by only inserting failed segments, and it makes files which get accessed regularly live much longer,

inter-segment redundancy improves the retrieving of files, so they can cope with a retrievability of 50% of any chunks of the file, even if the distribution might be skewed for a single segment,

Non-uniformity of the network which makes it hard to model effects with global-style math like this,

Separate stores for SSK and CHK keys, which improve the availability of small websites, and

Usability and security impact of increased insert times (might be reduced by only inserting 2/3rd of the file data and letting healing do the rest when the first downloader gets the file)

Due to that, the findings can only provide clues for improvements, but cannot perfectly predict the best path of action. Thanks to evanb for pointing them out!

This text was written and checked in emacs org-mode and exported to HTML via `org-export-as-html-to-buffer`. The process integrated research and documentation. In hindsight, that was a pretty awesome experience, especially the inline script evaluation. I also attached the org-mode file for your leisure :)

Freenet Communication Primitives: Part 1, Files and Sites

Basic building blocks for communication in Freenet.

This is a guide to using Freenet as backend for communication solutions - suitable for anything from filesharing over chat up to decentrally hosted game content like level-data. It uses the Python interface to Freenet for its examples.

This guide consists of several installments: Part 1 (this text) is about exchanging data, Part 2 is about finding people and services without drowning in spam and Part 3 is about confidential communication and tying it all together. Happy Hacking and welcome to Freenet, the forgotten cypherpunk paradise where no one can watch you read!

1 Introduction

The immutable datastore in Freenet provides the basic structures for
implementing distributed, pseudonymous, spam-resistant communication
protocols. But until now there was no practically usable documentation
on how to use them. Every new developer had to find out about them by
asking, speculating and second-guessing the friendly source (also
known as SGTFS).

Just share this key, and others can retrieve it. Use
http://127.0.0.1:8888/ as prefix, and they can even click it - if
they run Freenet on their local computer or have an SSH forward for
port 8888.

The code above only returns once the file finished uploading. The
Freenet Client Protocol (that’s what fcp stands for) however is
asynchronous. When you pass async=True to n.put() or n.get(),
you get a job object which gives you the result via job.wait().

To generate the key without actually uploading the file, use
chkonly=True as argument to n.put().

This code anonymously uploads an invisible file into Freenet which can
only be retrieved with the right key. Then it downloads the file from
Freenet using the key and shows the data.
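A minimal sketch of such a roundtrip with pyFreenet (the keyword arguments and the exact return values are assumptions and may differ between versions; that n.put() returns the retrieval key follows the hint given further down):

import fcp

node = fcp.node.FCPNode()
try:
    # Insert the data as a CHK; the returned key is derived from the content.
    key = node.put("CHK@", data="Hello Freenet", mimetype="text/plain")
    print("share this key:", key)

    # Anyone who knows the key can now fetch the data.
    print(node.get(key))
finally:
    node.shutdown()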

That the put and the get request happen from the same node is a
mere implementation detail: They could be fired by total strangers on
different sides of the globe and would still work the same. Even the
performance would be similar.

Note: fcp.node.FCPNode() opens a connection to the Freenet node. You
can have multiple of these connections at the same time, all tracking
their own requests without interfering with each other. Just remember
to call n.shutdown() on each of them to avoid getting ugly
backtraces.

So that’s it. We can upload and download files, completely
decentrally, anonymously and confidentially.

There’s just one caveat: We have to exchange the key. And to generate
that key, we have to know the content of the file.

Let’s fix that.

3 Public/Private key publishing: The SSK (signed subspace key)

Our goal is to create a key where we can upload a file in the
future. We can generate this key and tell someone else: Watch this
space.

So we will generate a key, start to download from the key and insert
the file to the key afterwards.

These 8 lines of code create a key which you could give to a
friend. Your friend will start the download and when you get hold of
that secret hello-file, you upload it and your friend gets it.

Hint: If you want to test whether the key you give is actually used,
you can check the result of n.put(). It returns the key with which
the data can be retrieved.
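Put together, the “watch this space” pattern could look roughly like this (a sketch: the genkey call, the order of the returned keys and the key/filename separator are assumptions, so check the pyFreenet sources):

import fcp

node = fcp.node.FCPNode()
try:
    pub, priv = node.genkey()  # assumed to return (request key, insert key)

    request_uri = pub + "hello.txt"   # give this to your friend
    insert_uri = priv + "hello.txt"   # keep this one to yourself

    # In real use your friend would already be fetching request_uri;
    # to keep this sketch linear we insert first and fetch afterwards.
    node.put(insert_uri, data="the secret hello")
    print(node.get(request_uri))
finally:
    node.shutdown()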

Using the .txt suffix makes Freenet use the mimetype
text/plain. Without extension it will use
application/octet-stream.

If you start downloading before you upload as we do here, you can
trigger a delay of about half an hour due to overload protections (the
mechanism is called “recently failed”).

Note that you can only write to a given key-filename combination
once. If you try to write to it again, you’ll get conflicts – your
second upload will in most cases just not work. You might recognize
this from immutable datastructures (without the conflict
stuff). Freenet is the immutable, distributed, public/private key
database you’ve been fantasizing about when you had a few glasses too
many during that long night. So best polish your functional
programming skills. You’re going to use them on the level of practical
communication.

3.1 short roundtrip time (speed hacks)

An SSK is a special type of key, and similar to inodes in a filesystem
it can carry data. But if used in the default way, it will forward to
a CHK: The file is salted and then inserted to a CHK which depends on
the content and then some, ensuring that the key cannot be predicted
from the data (this helps avoid some attacks against your anonymity).

When we want a fast round trip time, we can cut that. The condition is
that your data plus filename is less than 2KiB, the amount of data an
SSK can hold. And we have to get rid of the metadata. And that means:
Use the application/octet-stream mime type, because that’s the default
one, so it is left out on upload. And insert single files (we did not
yet cover uploading folders: You can do that, but they will forward to
a CHK).

When I run this code, I get less than 80 seconds round trip time. Remember that we’re
uploading two files anonymously into a decentralized network, discovering
them and then downloading them, and all that in serial. Less than a
minute to detect an upload to a known key.

90s is not instantaneous, but when looking at usual posting
frequencies in IRC and other chat, it’s completely sufficient to
implement a chat system. And in fact it’s how FLIP is implemented: IRC
over Freenet.

Compare this to the performance when we do not use the short round
trip time trick of avoiding the Metadata and using the realtime queue:

Now we can navigate to the key in the freenet web interface and look
at our freshly uploaded website! The text is colored red, so it uses
the stylesheet. We have files in Freenet which can reference each
other by relative links.

4.1 Multiple directories below an SSK

So now we can create simple websites on an SSK. But here’s a catch:
key/hello/hello.txt simply returns key/hello. What if we want
multiple folders?

For this purpose, Freenet provides manifests instead of single
files. Manifests are tarballs which include several files which are
then downloaded together and which can include references to external
files - named redirects. They can be uploaded as folders into the
key. And in addition to these, there are quite a few other
tricks. Most of them are used in freesitemgr which uses
fcp/sitemgr.py.

But we want to learn how to do it ourselves, so let’s do a more
primitive version manually via n.putdir():
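A rough sketch of what that could look like (the argument names are guesses based on how freesitemgr drives putdir, so treat them as placeholders and check fcp/sitemgr.py and the putdir docstring for the real signature):

import fcp

node = fcp.node.FCPNode()
try:
    pub, priv = node.genkey()
    # Upload the local directory "my-site" as a manifest below the new key.
    node.putdir(priv, dir="my-site", name="hello")
    print("the site should now be reachable below", pub)
finally:
    node.shutdown()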

But now that it’s there, how do we upload a better version? As already
said, files in Freenet are immutable. So what’s the best solution if
we can’t update the data, but only upload new files? The obvious
solution would be to just number the site.

And this is how it was done in the days of old. People uploaded
hello-1, hello-2, hello-3 and so forth, and in hello-1 they
linked to an image under hello-2. When visitors of hello-1 saw
that the image loaded, they knew that there was a new version.

When more and more people adopted that, Freenet added core support:
USKs, the updatable subspace keys.

Freenet anonymity: Best case and Worst case

As the i2p people say, anonymity is not a boolean. Freenet allows you to take it a good deal further than i2p or tor, though. If you do it right.

Worst case: If all of Apple wanted to find you, because you declared that you would post the videos of the new iDing - and already sent them your videos as a teaser before starting to upload them from an Apple computer (and that just after they lost their beloved dictator), you might be in trouble if you use Opennet. You are about as safe as with tor or i2p.

Best case: If a local politician wanted to find you, after you uploaded proof that he takes bribes, and you compressed these files along with some garbage data and used Freenet in Darknet-mode with connections only to friends who would rather die than let someone take over their computer, then there’s no way in hell you’d get found due to freenet (the file data could betray you, or they could find you by other means, but Freenet won’t be your weak spot).

Naturally real life is somewhere in-between.

Things which improve anonymity a lot in the best case:

Don’t let others know the data you are going to upload before the upload finished (would allow some attacks).

Use only Darknet with trusted friends (Darknet means that you connect only to people you know personally. For that it is necessary to know other people who use Freenet).

Upload small files so the time in which you are actively uploading is short.

Implied are:

Use an OS without trojans. So no Windows. (Note: Linux can be hacked, too, but it is far less likely to already have been compromised)

Use no Apple devices. You don’t control them yourself and can’t know what they have under the hood. (You are compromised from the time you buy them)

If you use Android, flash it yourself to give it an OS you control (Freenet is not yet available for Android. That would be a huge task).

Know your friends.

Important questions to ask:

Who would want to find you?

How much would they invest to find you?

Do they already try to monitor Freenet? (in that case uploading files with known content would be dangerous)

Do they already know you personally? If yes and if they might have already compromised your computer or internet connection, you can’t upload anything anonymously anywhere. In that case, never let stuff get onto your computer in the first place. Let someone else upload it, who is not monitored (yet).

Can they eavesdrop on your internet connection? Then they might guess that you use Freenet from the amount of encrypted communication you do and might want to bug your computer just in case you want to use freenet against them some day.

Watch Edward Snowden reveal DickPic, the latest, most massive surveillance program from the NSA:

Thanks to John Oliver for one of the most awesome acts of journalism I’ve seen!

FAQ

Anonymous@lFG3mGbGf0b8nE6j8RC0i5ZgWEhsQXDG3ghkYIa-1wQ wrote :
I thought Freenet wasn’t able to protect against the NSA?

The link “Connect to your friends” shows how to connect via darknet and communicate via darknet N2N messages (node-to-node messages). From my understanding, these are currently one of the most secure communication methods we can get, because they hide our personal communication beneath Freenet traffic.

They aren’t suited to communicating anonymously (because we can only talk with our friends), but they are well suited to communicating confidentially.

PS: The image is licensed under GPL, copyright: the freenet team (for the rabbit) and Arne Babenhauserheide. It uses the source images Zuchineee (thanks to Arthurcravan prrrr!) and National Security Agency from the public domain. See the sources below. … but you know what: Just share it anyway you like. I’m sure the author of the rabbit agrees, and I for sure do ☺

PPS: Yes, I had lots of fun creating this ;-)

PPPS: For some reason, the image disappeared from my server. I did not take it down. Yes, that worries me. What you see above is served from an in-proxy into Freenet. Should that go down, too, you can still use Freenet to access the image or set up your own in-proxy to allow others to see it.

Freenet: The forgotten cryptopunk paradise

I planned to get this into a newspaper, but it was too technical for the Guardian and too non-practical for Linux Voice. Then my free time ran out. Today I saw Barrett Brown comment on his 5-year sentence for quoting a Fox News commentator and sharing a public link. I knew it was time to publish. Welcome to Freenet: The forgotten cryptopunk paradise!

A long time ago in a chatroom far away, select groups of crypto-anarchists gathered to discuss the death of privacy since the NSA could spy on all communications with ease. Among those who proposed technical solutions was a student going by the name sanity, and he published the widely regarded first paper on Freenet: A decentralized anonymous datastore which was meant to be a cryptopunk paradise: true censorship resistance, no central authority and long lifetime only for information which people were actually interested in.

Many years passed, two towers fell, the empire expanded its hunt for rebels all over the globe, and now, as the empire’s grip has become so horrid that even the most loyal servants of the emperors turn against them and expose their dark secrets to the masses, Freenet is still moving forward. Lost to the eye of the public, it shaped and reshaped itself - all the while maintaining its focus to provide true freedom of the press in the internet.


A new old hope

Once only a way to publish one-shot websites anonymously into Freenet that other members of the group could see, Freenet now provides its users with most services found in the normal internet, yet safe from the prying eyes of the empire. Its users communicate with each other using email which hides metadata, micro-blogging with real anonymity, forums on a wide number of topics - from politics to drug-experiences - and websites with update-notifications (howto) whose topics span from music and anime over religion and programming to life without a state and the deepest pits of depravity.

All these possibilities emerge from its decentralized datastore and the tools built on top of a practically immutable data structure, and all its goals emerge from providing real freedom of the press. Decentralization is required to avoid providing a central place for censorship. Anonymity is needed to protect people against censorship by threat of subsequent punishment, prominently used in China where it is only illegal to write something against the state if too many people should happen to read it. Private communication is needed to allow whistleblowers to contact journalists and also to discuss articles before publication; invisible access to information makes it hard to censor articles by making everyone a suspect who reads one of those articles, as practiced by the NSA, which puts everyone on the watchlist who accesses freenetproject.org (reported by the German public TV program Panorama). And all this has to be convenient enough that journalists actually use it during their quite stressful daily work. As a side effect it provides true online freedom, because if something is safe enough for a whistleblower, it is likely safe enough for most other communication, too.

These goals pushed Freenet development into areas which other groups only touched much later - or not at all. And except for convenience, which is much harder to get right in a privacy-sensitive context than it seems, Freenet nowadays manages to fulfill these goals very well.

The empire strikes the web

The cloud was “invented” and found to be unsafe, yet Freenet already provided its users with a safe cloud. Email was found to spill all your secrets, while Freenet already provided its users with privacy preserving emails. Disaster control became all the rage after hurricane Katrina and researchers scrambled to find solutions for communicating on restricted routes, and Freenet already provided a globally connectable darknet on friend-to-friend connections. Blogs drowned in spam comments and most caved in and switched to centralized commenting solutions, which made the fabled blogosphere into little more than a PR outlet for Facebook, but Freenet already provided spam resistance via an actually working web of trust - after seeing the non-spam-resistant forum system Frost burn when some trolls realized that true anonymity also means complete freedom to use spam-bots. Censorship and total surveillance of user behavior on Facebook were exposed, G+ required users to use their real names and Twitter got blocked in many repressive regimes, whereas Freenet already provided hackers with convenient, decentralized, anonymous micro-blogging. Now websites are cracked by the minute and constant attacks made it a chore for private webmasters simply to stay available, though Freenet already offers attack-resistant hosting which stays online as long as people are interested in the content.

All these developments happened in a private microcosmos, where new and strange ideas could form and hatch, an incubator where reality could be rethought and rewritten to reestablish privacy in the internet. The internet was hit hard, and Freenet evolved to provide a refuge for those who could use it.

The return of privacy

What started as the idea of a student was driven forward by about a dozen free-time coders and one paid developer for more than a decade - funded by donations from countless individuals - and turned into a true forgotten cryptopunk paradise: actual working solutions to seemingly impossible problems, highly detailed documentation streams in a vast nothingness to be explored only by the initiated (where RTFS is a common answer: Read The Friendly Source), all this with plans and discussions about saving the world mixed in.

The practical capabilities of Freenet should be known to every cryptopunk - but a combination of mediocre user experience, bad communication and worse PR (and maybe something more sinister, if Poul-Henning Kamp should prove to be farsighted about project Orchestra) brought us to a world where a new, fancy, half-finished, partially thought-through, cash-cow-searching project comes around and instead of it being asked “how’s that different from Freenet?”, the next time I talk to a random crypto-loving stranger about Freenet I am asked “how is Freenet different from X which just made the news?” (the answer which fits every single time is: “Even if X should work, it would provide only half of Freenet, and none of the really important features - friend-to-friend darknet, access-dependent content lifetime, decentralized spam resistance, stable pseudonyms, hosting without a server”).

Now, after many years of work have culminated in a big step forward, it is time for Freenet to re-emerge from hiding and take its place as one of the few privacy tools actually proven to work - and as the single tool with the most ambitious goal: reestablishing freedom of the press and freedom of speech on the internet.

Join in

If you do not have the time for large scale contribution, a good way to support freenet is to run and use it - and ask your friends to join in, ideally over darknet.

More information about the movement which spawned Freenet can be found on Wikipedia under Cypherpunk, which would have made a more correct title for this text, but did not rhyme with "forgotten paradise".

If you can program, there are lots of low-hanging fruit: small tasks which allow reaping the fruits of existing solutions to hard problems. For example my recent work on freenet includes 4 hours of hacking the Python-based site uploader in pyFreenet which sped up the load time of its sites by up to a factor of 4. If you want to join, come to #freenet @ freenode to chat, discuss with us on the freenet devl mailing list and check out the github project.

I hereby release this article under the CC attribution License: You can use the text however you like as long as you name me (Arne Babenhauserheide) and link here ( draketo.de/english/freenet/forgotten-cryptopunk-paradise or draketo.de/node/656 ).

A huge thank you goes to Lacrocivious who helped me improve this text a lot! A second thank you goes to the other Freenet users with whom I discussed the article via Darknet-messages, when we were still thinking about submitting it to Wired and therefore needed to keep it confidential.

I now have a spam-resistant, decentralized comment system via Freenet

In recent years, spam became worse and worse. The more my site grew, the more time I had to spend deleting blatant advertisements. Even captchas did not help anymore: either they were so hard that I myself needed 3 tries on average to get through, or I got hundreds of spam messages per day. A few years ago, I caved in and disabled comments. The alternative would have been to turn my website into a mere PR outlet of Facebook, Twitter or one of the commenting platforms out there.

But all this has changed now. I finally have decentralized, spam-resistant comments using babcom with Freenet as backend!

The comment system builds on the decentralized, spam-resistant social features of the Freenet Project, one of the old cypherpunk creations, which started in 2000 with the goal of providing true freedom of the press on the internet and has been evolving ever since. It is an irony that nowadays spam has become a vehicle to push people onto censorship-enabling platforms, to use up their limited free time or to drown their words in a pile of dung so that other people cannot find them.

If you do not run Freenet now, this screenshot shows how the comment-system looks for me:

And for me this is a huge relief: I can finally get comments on my articles again without having to sell my conscience or waste most of my time deleting advertisements.

If that sounds interesting, head over to babcom and check if it suits you!

Python: I test on Python 2.5.4 and 2.6.1. Any 2.5.x or later version should work. Earlier versions may work.

You probably won't have to worry about installing Python. It's included in the Windows binary Mercurial distributions and most *nix flavor OS's should have a reasonably up to date version of Python installed.

FMS: Installation of the Freenet Messaging System (FMS) is optional but highly recommended. The hg fn-fmsread and hg fn-fmsnotify commands won't work without FMS. Without fn-fmsread it is extremely difficult to reliably detect repository updates.

If you run your Freenet node on another machine or on a non-standard port you'll need to use the --fcphost and/or --fcpport parameters to set the FCP host and port respectively.

By default fn-setup will write the configuration file for the extension (.infocalypse on *nix, infocalypse.ini on Windows) into your home directory and also create a temp directory called infocalypse_tmp there.

You can change the location of the temp directory by using the --tmpdir argument.

If you want to put the config file in a different location, set the cfg_file option in the [infocalypse] section of your .hgrc/mercurial.ini file before running fn-setup.
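For illustration, a setup for a node on another machine with a custom temp directory might look roughly like this (host, port and paths are made-up examples):

hg fn-setup --fcphost 192.168.1.10 --fcpport 9482 --tmpdir /var/tmp/infocalypse_tmp

and the cfg_file override would go into the [infocalypse] section of your ~/.hgrc before running fn-setup:

[infocalypse]
cfg_file = /home/me/.config/infocalypse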

An Infocalypse repository is just a collection of hg bundle files which have been inserted into Freenet as CHKs and some metadata describing how to pull the bundles to reconstruct the repository that they represent. When you 'push' to an infocalypse repository a new bundle CHK is inserted with the changes since the last update. When you 'pull', only the CHKs for bundles for changesets not already in the local repository need to be fetched.

The latest version of the repository's metadata is stored on a Freenet Updateable Subspace Key (USK) as a small binary file.

You'll notice that repository USKs end with a number without a trailing '/'. This is an important distinction. A repository USK is not a freesite. If you try to view one with fproxy you'll just get a 'Potentially Dangerous Content' warning. This is harmless but ugly, and unavoidable at the current time because of limitations in fproxy/FCP.

hg fn-push pushes incremental changes from the local directory into an existing Infocalypse repository.
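A minimal invocation might look like the following sketch (the repository name test.R1 is just an example; with an empty key part before the first '/', the default private key from fn-setup is used):

hg fn-push USK@/test.R1/0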

The <keypart>/test.R1/0 repository must already exist in Freenet. In the example above the default private key is used. You could also have specified a full Insert URI. The URI must end in a number, but the value doesn't matter because fn-push searches for the latest unused index.

There's a trust map in the .infocalypse/infocalypse.ini config file which determines which FMS IDs can update the index values for which repositories. It is purely local and completely separate from the trust values which appear in the FMS web of trust.

hg fn-fmsread will read update notifications for all the repos in the trust map and locally cache the new latest index values. If you run with -v it prints a message when updates are available which weren't used because the sender(s) weren't in the trust map.

hg fn-reinsert re-inserts the bundles for the repository that was last pulled into the directory.

The exact behavior is determined by the level argument.

level:

1 - re-inserts the top key(s)

2 - re-inserts the top key(s), graph(s) and the most recent update.

3 - re-inserts the top key(s), graph(s) and all keys required to bootstrap the repo.

This is the default level.

4 - adds redundancy for big (>7Mb) updates.

5 - re-inserts existing redundant big updates.

Levels 1 and 4 require that you have the private key for the repository. For other levels, the top key insert is skipped if you don't have the private key.
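For example, to run the default level explicitly:

hg fn-reinsert --level 3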

DO NOT use fn-reinsert if you're concerned about correlation attacks. The risk is on the order of re-inserting a freesite, but may be worse if you use redundant (i.e. USK@<line noise>/name.R1/0) top keys.

If you decide to share keys, you should generate a special key on a per repo basis with fn-genkey. There is no way to revoke a private key once it has been shared. This could be mitigated with an ad-hoc convention. e.g. if I find any file named USK@<public_key>/revoked.txt, I stop using the key.

Non-atomic top key inserts

Occasionally, you might end up overwriting someone else's commits because the FCP insert of the repo top key isn't atomic. I think you should be able to merge and fn-push again to resolve this. You can fn-pull a specific version of the repo by specifying the full URI including the version number with --uri and adding the --nosearch option.
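A recovery session might look roughly like this sketch (the URI and index are placeholders):

hg fn-pull --uri USK@<key>/myrepo.R1/5 --nosearch
hg merge
hg commit -m "merge back the overwritten remote head"
hg fn-push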

hg fn-putsite inserts a freesite based on the configuration in the freesite.cfg file in the root of the repository.

Use:

hg fn-putsite --createconfig

to create a basic freesite.cfg file that you can modify. Look at the comments in it for an explanation of the supported parameters.

The default freesite.cfg file inserts using the same private key as the repo and a site name of 'default'. Editing the name is highly recommended.

You can use --key CHK@ to insert a test version of the site to a CHK key before writing to the USK.
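For example, a test insert to a throwaway CHK with the --key option described above:

hg fn-putsite --key CHK@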

Limitations:

You MUST have fn-pushed the repo at least once in order to insert using the repo's private key. If you haven't fn-push'd you'll see this error: "You don't have the insert URI for this repo. Supply a private key with --key or fn-push the repo."

Inserts all files in the site_dir directory given in the freesite.cfg file. Run with --dryrun to make
sure that you aren't going to insert stuff you don't want to.

You must manually specify the USK edition you want to insert on. You will get a collision error
if you specify an index that was already inserted.

Don't use this for big sites. It should be fine for notes on your project. If you have lots of images
or big binary files use a tool like jSite instead.

I don't believe that using this extension is significantly more dangerous than using any other piece of Freenet client code, but here is a list of the risks which come to mind:

Freenet is beta software: The authors of Freenet don't pretend to guarantee that it is free of bugs that could compromise your anonymity or worse.

While written in Java, Freenet loads native code via JNI (FEC codecs, bigint stuff, wrapper, etc.) that makes it vulnerable to the same kinds of attacks as any other C/C++ code.

FMS == anonymous software: FMS is published anonymously on Freenet and it is written in C++ with dependencies on large libraries which could contain security defects.

I personally build FMS from source and run it in a chroot jail.

Somedude, the author of FMS, seems like a reputable guy and has conducted himself as such for more than a year.

correlation attacks: There is a concern that any system which inserts keys that can be predicted ahead of time could allow an attacker with control over many nodes in the network to eventually find the IP of your node.

Any system which has this property is vulnerable, e.g. fproxy freesite insertion, Freetalk, FMS, FLIP. This extension's optional use of redundant top keys may make it particularly vulnerable. If you are concerned, don't use '.R1' keys.

You only need to insert/retrieve what has actually changed. Changes of up to 32k of compressed deltas can be fetched in as little as one SSK fetch and one CHK fetch.

Redundant

The top level SSK and the CHK with the representation of the repository state are inserted redundantly so there are no 'critical path' keys. Updates of up to ~7MB are inserted redundantly by cloning the splitfile metadata at the cost of a single 32k CHK insert.

Re-insertable

Anyone can re-insert all repository data except for the top level SSKs with a simple command (hg fn-reinsert). The repository owner can re-insert the top level SSKs as well.

Automatic rollups

Older changes are automatically 'rolled up' into large splitfiles, such that the entire repository can almost always be fetched in 4 CHK fetches or less.

This gives you code hosting like a minimal version of BitBucket, Gitorious or GitHub, but without the central control. Additionally the Sone plugin for freenet supplies anonymous communication, and the site extension allows creating static sites with information about the repo, recent commits and such, without the need for a dedicated host.

Basic Usage

Clone a repo into freenet with a new key:

hg clone localrepo USK@/repo

(Write down the insert key and request key after the upload! Localrepo is an existing Mercurial repository)

For convenient copy-pasting of freenet keys, you can omit the “freenet://” here, or use freenet:USK@… instead.

Also, as shown in the first example, you can let infocalypse generate a new key for your repo:

hg clone localrepo USK@/repo

Mind the “USK@/” (the slash directly after the @ means the key part is missing, so a new key is generated). Also note the missing .R1/0 after the repo name and the missing freenet://. Being able to omit those on repository creation is just a convenience feature - but one which helps me a lot.

Contribute

This script is just a quick sketch, feel free to improve it and upload improved versions (for example with support for more GNU/Linux distros). If you experience any problems, please contact me! (i.e. write a comment)

If you want to contribute more efficiently to this script, get the repo via

On systems based on Debian or Gentoo - including Ubuntu and many others - this script will install all needed software except for freenet itself. You will have to give your sudo password in the process. Since the script is just a text file with a set of commands, you can simply read it to make sure that it won’t do anything evil with those sudo rights. ↩

Let us talk over Freenet, so I can speak freely again

I sent this email to many of my friends to regain confidential private communication. If you want to do the same, feel free to reuse the text-version (be sure to replace the noderef textblock with your own noderef from http://127.0.0.1:8888/friends/myref.txt).

About 10% of my friends joined - which is enough to build the darknet and makes it possible for me to speak freely again.

First: The Essence of this text:

I’ve been censoring my emails for years. Not just what I write, but also whom and when.

Freenet allows me to write invisible messages to my friends. Those are messages I do not need to censor. They give me freedom. Surveillance can show that we could write, but not whether, when or what we actually write. If Freenet is used for that, it needs very little resources.

As soon as I add you, too, we are connected. We can then write messages via the friends page (click my name):

Write messages: http://127.0.0.1:8888/friends/

Read messages: http://127.0.0.1:8888/alerts/

Hi,

I’ve been self-censoring what I write by email for years. But over the
past year, with ever more details of surveillance being proven as fact
and not just conspiracy theory, that became more serious: I no longer
see email as safe, and with that, email is lost for me as a medium for
personal communication. If I want to talk privately, I don’t use
email.

You might have noticed that since then I’ve been writing fewer and
fewer non-public emails.

This started impeding my life, when the critical law reporter at
groklaw stopped publishing, because the owner did not consider sending
information via email as safe anymore. Now I self-censor what I write,
to whom I write, and when I write.

But I have one haven left: Instead of writing private stuff by email,
I’m communicating more and more via Freenet, especially with darknet
contacts: People I know personally. And I’d like to do that with you,
too. The reason is that Freenet Darknet messages hide even the
information that we have a conversation at all:

I can finally send completely invisible messages.

This gives me the confidentiality back which allows talking
freely. Talking without self-censoring every word I write.

And I would like to have that freedom when talking to you online. So I
would be very happy if you’d install Freenet and connect to me over
Darknet.

Connect with me

There, simply paste the following into the empty text field below the
blurb of explanation (note: for this article I replaced the identifying info with X-es. Use your own from http://127.0.0.1:8888/friends/myref.txt):

Once I copy that text into my own addfriends page, our computers will
connect over Freenet.

(no need to babysit Freenet for this: simply let it run when you’re
online and as soon as I add you, our computers will connect over
Freenet. Please give me a few days: With the PhD and the two little
ones I’m often no longer able to answer email daily, but I see them)

And that’s it. We’re connected. In the rest of this mail, I’ll
describe what you can do with Freenet.

Welcome to Freenet, where no one can watch you read!

I hope we will connect soon!

Best wishes,
Arne

Using Freenet

Talk with me over Freenet

Once we are connected, you can send me confidential messages by going
on the Friends page and clicking my name.

That page lists all the people you are connected to. You can also tick
the checkbox for multiple people and then use the drop down list “–
Select Action –” and select “Send N2NTM to selected peers”. A N2NTM
is a “node to node text message”.

Send me files over Freenet

When they finish uploading, just go to the list of Uploads, select the
files you want to share with me and click the button “Recommend files
to friends”. Then select my name and click the “Recommend” button at
the bottom.

Websites in Freenet

Websites in Freenet are also simple. To get a basic website, just
install the ShareWiki plugin, enter text, click publish and once the
upload finished, send the URL to your friends by clicking “share” in
the list of uploads. With this you can publish in Freenet: Your
friends will know that it’s your site, but no one else.

Configure Plugins: http://127.0.0.1:8888/plugins/
The key for sharewiki to add as “Plugin from Freenet”:
CHK@aCQTjPQI3uGsahMiTuddwJ51UJypA5Mqg4y0tf1VqXQ,eEkO3uge6IJ1QcrT5KGlJ1R6kEcMhQV4rXfv6NzoL5o,AAMC--8/ShareWiki-b17.jar

(note: ignore the search box on the main page. It’s broken)

Anonymous Discussions

Anonymous Discussions are somewhat different from the other features,
because they require the Web of Trust, and that is very
heavyweight.

If you want to keep the resource consumption of Freenet low, avoid the
anonymous discussion platforms.

You will see people recommend it - even me. It is cool, but you should
only enable it, if you have a computer which always runs and for
which it does not matter when it runs at high load.

If you only want confidential communication with Friends, just avoid
the Web of Trust for now. If you stick to the basic features (darknet
messages, uploads, downloads, bookmarks), Freenet will require few
resources and little bandwidth.

For a low-spec computer or a laptop, avoid the Web of Trust and
anonymous discussions: They are really cool, but still require lots
of resources.

If you value truly anonymous discussions higher than keeping the load
on your computer low, or if you have a computer which is always
running, have a look at the Freenet Social Networking guide. It shows
you how to setup and use the social features of Freenet.

Run pure Darknet (disable Opennet) or reduce your bandwidth
limits. See the security levels ( http://127.0.0.1:8888/seclevels/
) and the upload and download bandwidth limit in the core settings (
http://127.0.0.1:8888/config/node - note that the unit is Bytes per
second: give at least 10000, otherwise Freenet will not work).

Naturally it would be better to send the freenet addfriend
text via encrypted emails with a full chain of trusted
signatures. But for the basic goal of confidential
communication that is not necessary. We can check sometime
later, whether the text we exchanged was changed, so if
someone wants to eavesdrop, we can detect that. And we would
have proof, which would make for the next great story for
political magazines like Panorama - which would help a lot in
fighting surveillance in the long term (so it’s unlikely that
people who want surveillance will dare to do that). Example:
“NSA targets privacy conscious” in German public media documentation. ↩

Lots of site uploads into freenet

I just finished lots of new uploads of sites into freenet - with the new freesitemgr (which actually uploads quickly when WoT is disabled; check today's IRC logs tomorrow to get background on that). You can get the new freesitemgr from github.com/ArneBab/lib-pyfreenet-staging or via infocalypse:

freenet-funding - the freenet fundraising plan, still lacking good design and crisp presentation slides or a video

freenet-meltdown - on the recent massive performance degradation which lasted a few months and ended with the link length fix.

fix-link-length - background on the link-length fix which made freenet actually do small world routing again instead of random routing (into which it had degraded, partially due to local requests, partially due to having so many peers per node that random routing actually worked for the current network size, so the pressure by routing-success to go back to small world routing was too weak compared to the pressure from local requests to randomize the connections)

download-web-site - how to download a single page from a website - for example to mirror it into freenet. Hint: For all the sites on draketo.de or 1w6.org you are allowed to do so freely (licensed under GPL).

guiledocs - the online documentation for GNU Guile with a focus on Scheme (using Guile): A powerful lisp-like language with multiple implementations.

decorrespondent-metadata - an experiment on how much information one can glean about your life from just one week of metadata, in Dutch.

tao of programming - "When you have learned to snatch the error code from the trap frame, it will be time for you to leave."

On the 2014 freenet-meltdown

Update (2014-09-06): The meltdown is stopped and reversed. We implemented the link length fix and this solved an issue with the network structure we had had for the last few years. We’re currently watching anxiously whether the performance merely comes back to the level before the meltdown or whether the lifetime of data actually gets much better. Watch the fetch-pull stats!

^ Inserted one day ago: You see the meltdown starting in April and the improvement with the latest version: it’s back to at least the level before the meltdown.

^ 4 weeks ago inserted, 2 weeks ago accessed. If this goes above 0.6 starting 2014-09-19, the improvement could prove to be huge: It could give us much longer lifetimes of data in freenet.

Update (2014-07-23): The fetch-pull graphs look like we have an oscillation here. This could mean that this is NOT an attack, but rather the direct effect of the KittyPorn patches: First the good connections get broken. This ruins the network. Then they can’t get any worse and the network recovers. Then they break again. This is still speculative. For an up to date plot, see fetchplots1.

Update (2014-05-22): The performance stats are much better again and the link-length distribution recovered. We might have been hit by an attack after all (which failed to take down freenet, but hurt pretty much). With some luck we’ll soon see a paper published with the evaluation of the attack and ways to mitigate it cleanly. (hint to computer scientists: we link to papers from the freenetproject.org website, so if you want to get a small boost for your chances of citation, send the link to your paper to devl@freenetproject.org)

Summary: There is a freenet patch floating around which claims to increase performance. The reality is (to our current knowledge) that it breaks the network as soon as more than a few percent of the nodes run it. And this is the case right now, which is why the network is almost completely broken. If you run that patch, please get rid of it!

Freenet is currently experiencing a meltdown, with extremely slow downloads, high connection churn and lifetimes for bigger files down to about a day. For a visualization, see the fetch-performance in the following graph and take note of the drop at the end. It nicely shows how a bad patch spread while more and more users installed it (hoping for better performance) and slowly killed the network. When that line goes below 50%, bigger files are dead approximately one day after being uploaded.

We suspect that patch, because the number of nodes reporting 100 or more connections in the anonymised probe-stats increased a lot over the past few weeks (this is only possible with a patched freenet) and the link-length-distribution almost completely lost a bump it had at 0.004, suggesting that freenet essentially reverted to random routing, while the number of nodes did not change significantly.

(thanks for these stats goes to probe stats which operhiem1 implemented in Google Summer of Code 2012)

We are working on creating a clean solution.

Freesites still work, because the SSK-queue did not get hammered, so if you are a freesite author, please inform your readers about the patch and ask them to get rid of it!

In case you use freenet and want information on that patch, please read the note from TheSeeker:

Information from TheSeeker in IRC (#freenet @ freenode)

Recently Kittyporn released an autopatcher-script: CHK@r6dUGAYs2No4lWT3DTkY2dIYdgA-eoBAcU~U-kLU-0I,hxGN5OTN4j~04YnCS4UTflMK4fpW2hfhB58CU1KNRAw,AAMC--8/FNAutoPatch-1.0_by_Kittyporn.rar

This increased usage of the patch by probably several hundred nodes, judging by the partial logs from the webserver that we have for fetches of the source tarball.

The script stupidly pulls the freenet source from freenetproject.org rather than say, github, or freenet. Really bad for anonymity, but good for tracking.

logs only go back a couple weeks, which is why they are incomplete, and we don't know the real number of people that have run it.
hard to tell how much less the people that are cheating feel the effects of the whole network collapsing around them. surely can't be long before they too start complaining about speeds given the data retention issues it's causing.

NLM was supposed to fix all this shit. :|

modified nodes are flooding the network, creating broad backoff issues. this makes routing suffer, and avg path lengths increase, which reduces overall availability of bandwidth and more backoff and more misrouting.
Death spiral until we hit some equilibrium that is roughly equal to random routing.

essentially what the broken NLM did. thankfully, it is only routing for bulk chk, so it'll still be possible to do some things if forced through the realtime queue...
e.g. if we want to deploy an update, and have the constituent blocks actually get routed anywhere near the correct destination...

Additional comment

To do the math:
a few hundred users easily equals 10% of the network. No wonder we have a meltdown.

and even worse, these few hundred users are likely the high-bandwidth folks with a huge number of connections.

Let’s assume that they each have 40 connections while the others have ~10. Every node connected to such an abusive node will essentially be blocked. That’s 100% of the nodes…

Freenet Setup

Then activate the Web of Trust plugin and the Freemail plugin. As soon as your Freenet is running, you’ll find the Web of Trust and Freemail plugins on the Plugins-Page (this link will work once you have a running Freenet. If you want to run Freenet on another computer, you can make it accessible to your main machine via ssh port forwarding: ssh -NL 8888:localhost:8888 -L 9481:localhost:9481 <host>).

Privacy Protections

Infocalypse takes your privacy seriously. When you clone a repository
from freenet, your username for that repository is automatically set
to “anonymous” and when you commit, the timezone is faked as UTC to
avoid leaking your home country.

If you want to add more security to your commits, consider also using
a fake time-of-day:
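For example (a sketch, not necessarily the exact wording used in the hgrc shipped with cloned repositories), you could commit with the current UTC date but a fixed time of day:

hg commit --date "$(date -u +%Y-%m-%d) 00:00 +0000" -m "your commit message"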

Open path/to/repo-from-freenet/.hg/hgrc to set this permanently via
an alias (just adapt the alias for rewriting the commit-date to UTC -
these are already in the hgrc file if you cloned from Freenet).

Background Information

Let’s look at a few interesting steps in the example to highlight the strengths of Infocalypse, and provide an outlook on the steps we already took to prepare Infocalypse for future development.

Efficient storage in Freenet

hg clone life-repo freenet://ArneBab/life-repo

Here we clone the local repository into Freenet. Infocalypse looks up the private key from the identity ArneBab. Then it creates two repositories in Freenet: <private key>/life-repo.R1/0 and <private key>/life-repo.R0/0. The URLs only differ in the R1 / R0: They both contain the same pointers to the actual data, and if one becomes inaccessible, the chances are good that the other still exists. Doubling them reduces the chance that they drop out of the network and become inaccessible, which is crucial because they are the only part of your repository which does not have 100% redundancy. Also these pointers are the only part of the repository which only you can insert. As long as they stay available, others can reinsert the actual data to keep your repository accessible.

To make that easy, you can run the command hg fn-reinsert in a cloned repository. It provides 5 levels:

1 - re-inserts the top key(s)

2 - re-inserts the top key(s), graph(s) and the most recent update.

3 - re-inserts the top key(s), graph(s) and all keys required to bootstrap the repo (default).

4 - adds redundancy for big (>7Mb) updates.

5 - re-inserts existing redundant big updates.

To reinsert everything you can insert, just run a tiny bash-loop:

for i in {1..5}; do hg fn-reinsert --level $i; done

Let’s get to that “actual data”. When uploading your data into Freenet, Infocalypse creates a bundle with all your changes and uploads it as a single file with a content-dependent key (a CHK). Others who know which data is in that bundle can always recreate it exactly from the repository.

When someone else uploads additional changes into Freenet, Infocalypse calculates the bundle for only the additional changes. This happens when you push:

hg push freenet://ArneBab/life-repo

To clone a repository, Infocalypse first downloads the file with pointers to the data, then downloads the bundles it needs (it walks the graph of available bundles and only gets the ones it needs) and reassembles the whole history by pulling it from the downloaded bundles.

hg clone freenet://ArneBab/life-repo real-life

By reusing the old bundles and only inserting the new data, Infocalypse minimizes the amount of data it has to transfer in and out of Freenet, and more importantly: Many repositories can share the same bundles, which provides automatic deduplication of content in Freenet. When you take into account that in Freenet often-accessed content is faster and more reliable than seldom-accessed content, this gives Infocalypse a high degree of robustness and uses the capabilities of Freenet in an optimal way.

If you want to go into Infocalypse-specific commands, you can also clone a repository directly to your own keyspace without having to insert any actual data yourself:

Pull-requests work by sending a Freemail to the owner of that repository which contains a YAML-encoded footer with the data about the repository to use.

You have to trust the owner of the other repository to send the pull-request, and the owner of the other repository has to trust you to receive the message. If the other does not trust you when you send the pull-request, you can change this by introducing your Pseudonym in the Web of Trust plugin (this means solving CAPTCHAs).

Convenience

To make key management easier, you can add the following into path/to/repo/.hg/hgrc
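Based on the behaviour described below, the entry would be a standard Mercurial [paths] section roughly like this (the local path to the real-life clone is an assumption):

[paths]
default = freenet://ArneBab/life-repo
real-life = ../real-life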

Now pull and push will by default go to freenet://ArneBab/life-repo and you can pull from the other repo via hg pull real-life.

Your keys are managed by the Web of Trust plugin in Freenet, so you can use the same freenet-uri for push and pull, and you can share the paths without having to take care that you don’t spill your private key.

DVCS WebUI

When looking for repositories with the command line interface, you are reliant on finding the addresses of repositories somewhere else. To ease that, Steve also implemented the DVCS WebUI for Freenet during his GSoC project. It provides a web interface via a Freenet plugin. In addition to providing a more colorful user interface, it could add 24/7 monitoring, walking remote repositories and pre-fetching of relevant data to minimize delays in the command line interface. It is still in rudimentary stages, though.

All the heavy lifting is done within the Infocalypse Mercurial plugin: Instead of implementing DVCS parsing itself, the DVCS WebUI asks you to connect Infocalypse so it can defer processing to it:

hg fn-connect

The long-term goal of the DVCS WebUI is to provide a full-featured web interface for repository exploration. The current version provides the communication with the Mercurial plugin and lists the paths of locally known repositories.

Gitocalypse

If you prefer working with git, you can use gitocalypse written by SeekingFor to seamlessly use Infocalypse repositories as git remotes. Gitocalypse is available from https://github.com/SeekingFor/gitocalypse

Troubleshooting

When I'm running "hg fn-setup" I get the error "abort: No module named fcp.node"

Do you have pyFreenet installed? Also ensure that you installed it for Python 2:

wget bootstrap.pypa.io/ez_setup.py -O - | python2.7 - --user

easy_install --user Mercurial defusedxml PyYAML pyFreenet

Conclusion

Infocalypse provides hosting of repositories in Freenet with a level of convenience similar to GitHub or Bitbucket, but decentralized, anonymous and entirely built of Free Software.

You can leverage it to become independent from centralized hosting platforms for sharing your work and collaborating with other hackers.

This guide shows the convenient way of working, which has a higher barrier of entry. It uses WoT Pseudonyms to allow you to insert repositories by Pseudonym and repository name. If you can cope with inserting by private key and sending pull-requests manually, you can use it without the WoT, too, which reduces the setup effort quite a bit. Just skip the setup of the Web of Trust and Freemail plugins. You can then clone the life repo via hg clone freenet://USK@6~ZDYdvAgMoUfG6M5Kwi7SQqyS-gTcyFeaNN1Pf3FvY,OSOT4OEeg4xyYnwcGECZUX6~lnmYrZsz05Km7G7bvOQ,AQACAAE/life-repo.R1/4 life-repo. See hg fn-genkey and hg help infocalypse for details. ↩

Spread Freenet: A call for action on identi.ca and twitter

Reposted from Freetalk, the distributed pseudonymous forum in Freenet.

For all those among you, who use twitter1 and/or identi.ca2, this is a call to action.

Go to your identi.ca or twitter accounts and post about freenet. Tell us in 140 letters why freenet is your tool of choice, and remember to use the !freenet group (identi.ca) or #freenet hashtag (twitter), so we can resend your posts!

I use !freenet because we might soon need it as safe harbour to coordinate the fight against censorship → freenetproject.org !zensur — ArneBab

The broader story is the emerging concept of a right to freely exchange arbitrary data — Toad (the main freenet developer)

Background

There are still very many people out there who don’t know what freenet is. Just today a coder came into the #freenet IRC channel, asked what it did and learned that it already does everything he had thought about. And I still remember someone telling me “It would be cool if we had something like X-net from Cory Doctorow’s ‘Little Brother’” — he did not know that freenet already offers that with much improved security.

So we need to get the word out about freenet. And we have powerful words to choose from, beginning with Mike Godwin’s quote above but going much further. To name just a few buzzwords: Freenet is a crowdfunded, distributed and censorship-resistant free-software cloud publishing system. And different from info about corporate PR-powered projects, all these buzzwords are true.

But to make us effective, we need to achieve critical mass. And to reach that, we need to coordinate and cross promote heavily.

Call to action

So I want to call to you to go to your identi.ca or twitter accounts and post about freenet. Tell us in 140 letters why freenet is your tool of choice, and remember to use the !freenet group or #freenet hashtag, so we can find and retweet your posts!

If you use identi.ca, join the !freenet group, so you get informed about new freenet-posts automatically.

We can make a difference, if we fight together.

And if you always wanted to get an identi.ca account, here’s the opportunity to get it and do something good at the same time :)

If you already have a twitter-account, you can connect your identi.ca account to your twitter account, then post to identi.ca and have your post forwarded to twitter automatically.

Additional info

But no need to tell me your account and connect your Freetalk ID with it. Just use identi.ca or twitter and remember to tell your friends to talk about freenet, too (so we can’t find out who read this post and who decided to join in because he learned about the action from a friend).

As second line of defense, I also posted this message to my website and hereby allow anyone to reuse it in any form and under any license (up to the example tweets), so I can’t know who saw it here and who saw it elsewhere.

Twitter is a service for sending small text messages to people who “follow” you (up to 140 letters), so it works like a news ticker for journalists. Sadly it is not free software, so you can’t trust them to keep your data or even just the service available. Its distinctive features are hashtags (#blafoo) for marking and searching messages and retweeting for passing a message on to the people who read your messages. ↩

identi.ca is like twitter and offers the same features and a few more advanced ones, but as a decentralized free software system where everyone can create their own server and connect it to others. When using identi.ca, you make yourself independent from any single provider and can even run the system yourself. And it is bound to stay free due to using the AGPL (v3 or later). ↩

USK and Date-Hints: Finding the newest version of a site in Freenet's immutable datastore

Freenet provides a global, anonymous datastore where you can upload sites which then work like normal websites. But different from websites, they have a version-number.

The reason for this is that you can only upload to a given key once1. This data then gets stored in the network and is effectively immutable (much like immutable data structures in functional programming).

In this model conflicts can arise from uploads of different users and from uploads of different versions of the site.

Avoid conflicts between users

So what if Alice uploads the file gpl.txt, and then Mallory tries to upload it again before users get the upload from Alice?

To avoid these conflicts between users, you can upload to an address defined by a key-pair. That key-pair has two keys, a public and a private one. The URL of the site is derived from the public key. Everyone who has this URL can access the site. The private one allows uploading new data to the site. Only the owner of the private key can upload files to the site. This is the SSK: The Signed Subspace Key. It defines a space in Freenet which only you can update.

An SSK looks like this: SSK@[key]/[sitename]/[path/to/file]

Avoid conflicts between versions

But now what if Alice wants to upload a new version of gpl.txt - say GPLv3?

To avoid conflicts between different versions, each new version gets a unique identifier. The reason for using version numbers and not some other identifier is historical: To update sites despite not being able to rewrite published data, freenet users started to version their sites by simply appending a number to the name and then adding small images for future versions. If these images showed up, the new version existed.2

Most sites in freenet had a section like this (the images might take a bit to load - they are downloaded from a freenet outproxy):

Note that this link will automatically get you to version 117 (or whatever version is the current one when you read this article), even though it has version 116 in its URL.

Internally the USK simply gets translated to an SSK in the form of SSK@[key]/[sitename]-[version]/[path/to/file]. You’ll surely recognize the scheme which is used here.

This is a prime example of demand-driven development: Users found a way to make sites dynamic with the activelink-hack. Then the Freenet developers added this as an official feature. As a nice side effect, the activelink images stayed with us as part of the Freenet culture: Almost every site in freenet has a small logo with a width and height of 108x36 pixels.

Date-Hints

USKs solved the problem of having updatable sites by checking some versions into the future. But they had a limitation: If your USK-Link was very old, freenet would have to check hundreds or even thousands of URLs to find the newest version. And this would naturally be very, very slow. Due to the distributed nature of Freenet, it is also not possible to just list all files under a given Key. You can only check for directories - the sitenames.

Also files in Freenet only stay available when people access them - but checking to see whether some file might still be accessible isn’t a defined problem: The data to that file could be on the computer of someone who is currently offline. When he or she comes online again, the file could suddenly be available, so determining whether a file does not exist isn’t actually possible.

A timeline of versions could look like this:

2009: 1, 2, 3
2010: 4, 5
2011: 6
2012: 7, 8, 9, 10, 11, 12, 13, 14
2013: 15
2014: 16, 17, 18

Now imagine that you find a link on a site which was added in 2010. It would for example link to version 4 of the site. If you access this site in 2014, freenet has to check versions 5, 6, 7, 8 ... 18 to find the most recent version. That requires 14 downloads - and for normal freesites the versions can be as high as 1200.

But remember that you can upload to arbitrary filenames. So what if the author of the site gave you a hint of the first version in 2014? With that, freenet would only have to start at version 16 - just 3 versions to check, and the hint.

Why the first? Remember that files cannot be overwritten, so the author cannot keep a single file updated with the most recent version throughout 2014; only the first version of the year can be written once and stay correct.

And this is just what the freenet developers did: Date-Hints are simply files in freenet which contain the information about the most recent version of the site at some point in time.

The file found at this key is a simple plain text file with content like the following:

HINT
46
2013-7-5

The first line is the identifier, the second is the most recent version at the time of insert (the first version in the year) and the last is the date of the upload of that version.

A yearly date-hint speeds up getting the most recent version a lot. But since sites in freenet have hundreds of versions rather than tens, it is a bit too coarse. It can still leave you with 20 or 30 possible new versions. So Freenet actually provides additional date hints on a monthly, weekly and daily basis:

SSK@[key]/[sitename]-DATEHINT-[year]

SSK@[key]/[sitename]-DATEHINT-[year]-WEEK-[week]

SSK@[key]/[sitename]-DATEHINT-[year]-[month]

SSK@[key]/[sitename]-DATEHINT-[year]-[month]-[day]

If you give freenet a USK-link, it starts on the order of 10 requests: 4 date hints with the current date and requests for versions following the version in the link. Normally it gets a result in under 10 seconds.

Conclusion

With USKs and Date-Hints Freenet implements updatable sites with acceptable performance in its anonymous datastore with effectively immutable data.

If you want to see it for yourself, come to freenetproject.org and install freenet. It’s free software and available for Windows, Linux and MacOSX.

If you try to upload to a given key twice, you can get collisions. In that case, it isn’t clear which data a client will retrieve - similar to race conditions in threaded programs. That’s why we do not write to the same key twice in practice (though there is a key-type which can be used for passwords or simple file-names. It is called KSK and was the first key-type freenet provided. That led to wars on overwriting files like gpl.txt - similar to the edit-wars we nowadays get on Wikipedia, but with real anonymity thrown in ☺). ↩

The images for new versions were called activelinks. They are the reason why most freenet sites had a small logo long before logo-linklists became en-vogue on Wordpress. ↩

and freereader is, too (even though it needs lots of polish): forward RSS feeds into freenet

you can actually exchange passwords in a safe way via freemail: anonymous email with an integrated web interface and IMAP access.

Justus and I coordinated the upload of the
social networking site onto my FTP solely over freemail, and I did not
have any fear of eavesdropping - different from any other mail I
write.

… I think I should store this conversation somewhere

which I hereby did - I hope you enjoyed this little insight into the #freenet channel :)

And if you grew interested, why not install freenet yourself? It only takes a few clicks via webstart and you’re part of the censorship-resistant web.

toad alias Matthew Toseland is the main developer of freenet. He tends to see more of the remaining challenges and fewer of the achievements than me - which is a pretty good trait for someone who builds a system to which we might have to entrust our basic right of free speech if the world goes on like this. From a PR perspective it is a pretty horrible trait, though, because he tends to forget to tell people what freenet can already do well :) ↩

Downloads only the ones you follow + ones you get told about. Telling others means that you need to include info about people you follow, because you only get information from them.

Telling others about replies, options

Include all replies to anyone which I see in my own Sone → size rises massively, since you include all replies of all people you follow in your own Sone.

Include all IDs from which you saw replies along with the people they replied to → needs to poll more IDs. Optionally forward that info for several hops → for efficient routing it needs knowledge about the full follower topology, which is a privacy risk.

Discovering replies from people you don’t know yet: Add a WoT info: replies. Updated only when you reply to someone you did not reply to before. Poll people’s reply lists based on their WoT rating. Keep a list of people who answered one of your posts and poll these more often. Maybe poll people instantly who solve one of your captchas (your general captcha queue) → new users can enter quickly. When you solve captchas in WoT, preferably solve those from people you follow.
→ four ways to discover a reply:

poll those you follow,

poll the people who posted the latest replies to you (your usual discussion-group),

poll those who solve one of your captchas (get new people in as fast as possible) and

poll the replies-info from everyone with the polling frequency based on their WoT rating.

You can find Sone in Freenet using the key USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE/sone/38/ ↩

“regarding B.S. like SOPA, PIPA, … freenet seems like a good idea after all!”

“Some years ago, I had a look at freenet and wasn't really convinced, now I'm back - a lot has changed, it grew bigger and insanely fast (in freenet terms), like it a lot, maybe this time I'll keep it. Especially regarding B.S. like SOPA, PIPA and other internet-crippling movements, freenet seems like a good idea after all!”
— sparky in Sone

So, if you know freenet and it did not work out for you in the past, it might be time to give it another try: freenetproject.org

This quote just grabbed me, and sparky gave me permission to cite it.

Freenet: WoT, database error, recovery patch

I just had a database error in WoT (the Freenet generic Web of Trust plugin) and couldn’t access one of my identities anymore (plus I didn’t have a backup of its private keys though it told me to keep backups – talk about carelessness :) ).

I asked p0s on IRC and he helped me patch together a WoT which doesn’t access the context for editing the ID (and in turn misses some functionality). This allowed me to regain my ID's private key and with that re-download my ID from freenet.

Mercurial

With it you can save snapshots of your work on documents and go back to these at all times.

Also you can easily collaborate with other people and use Mercurial to easily merge your work.

Someone changes something in a text file you also worked on? No problem. If you didn't work on the same line, you can simply let Mercurial do an automatic merge and your work will be joined. (If you worked on the same line, you'll naturally have to select how you want to merge these two changes.)

It doesn't need a network connection for normal operation, except when you want to push your changes over the internet or pull changes of others from the web, so its commands are very fast. The time to do a commit is barely noticeable which makes atomic commits easy to do.

And if you already know subversion, the switch to Mercurial will be mostly painless.

But its most important strength is not its speed. It is that Mercurial just works. No hassle with complicated setup. No arcane commands. Almost everything I ever wanted to do with it just worked out of the box, and that's a rare and precious feature today.

And to answer a common question:

“Once you have learned git well, what use is hg?” — Ross Bartlett in Why Mercurial?

A complete Mercurial branching strategy

This is a complete collaboration model for Mercurial. It shows you all the actions you may need to take, except for the basics already found in the guide Mercurial in workflows or the talk hg init science. Extensions allow optimizing the model for special needs like maintaining multiple releases1 and an explicit code review stage.

Summary

Any model to be used by people should consist of simple, consistent rules. Programming is complex enough without having to worry about elaborate branching directives. Therefore this model boils down to 3 simple rules:

(1) you do all the work on default - except for hotfixes.

(2) on stable you only do hotfixes, merges for release3 and tagging for release. Only maintainers4 touch stable.

(3) you can use arbitrary feature-branches5, as long as you don’t call them default or stable. They always start at default (since you do all the work on default).

Diagram

To visualize the structure, here’s a 3-tiered diagram. To the left are the actions of developers (commits and feature branches) and in the center the tasks for maintainers (release and hotfix). The users to the right just use the stable branch.6

An overview of the branching strategy. Click the image to get the emacs org-mode ditaa source.

Mind the tag field which is now shown in changeset 0 and the branch name for changeset 1. This is the only release which will ever be on the default branch (because the stable branch only starts to exist after the first commit on it: The commit which adds the tag).

Hotfix

If a hotfix has to be applied to the release out of order, we just update to the stable branch, apply the hotfix and then merge the stable branch into default11. This gives us changesets 4 for the hotfix and 5 for the merge (2 and 3 are shown as reference).
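As a command sketch (branch names as in the text, commit messages are placeholders), the hotfix procedure could look like this:

hg update stable
# fix the bug, then:
hg commit -m "hotfix: short description"
hg update default
hg merge stable
hg commit -m "merge hotfix from stable into default"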

Regular release

To do a regular release, we just merge the default branch into the stable branch and tag the merge. Then we merge stable back into default. This gives us changesets 6 to 812. The commit-message you use for the merge to stable will become the description for your tag, so you should choose a good description instead of “merge default into stable for release”. Userfriendly, simplified release notes would be a good choice.
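A regular release could then be done roughly like this (the tag name is a made-up example):

hg update stable
hg merge default
hg commit -m "userfriendly, simplified release notes"
hg tag v2.0
hg update default
hg merge stable
hg commit -m "merge stable back into default after the release"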

Feature branches

Now we want to do some larger development, so we use a feature branch. The one feature-commit shown here (x) could be an arbitrary number of commits, and as long as you stay in your branch, the development of your colleagues will not disturb your own work. Once the feature is finished, we merge it into default. The feature branch gives us changesets 9 to 13 (with 10 being an example for an unrelated intermediate commit on default).

Note: Closing the feature branch hides that branch in the output of hg branches (except when using --closed) to make the repository state lean and simple while still keeping the feature branch information in history. It shows your colleagues that they no longer have to keep the feature in mind as soon as they merge the most recent changes from the default branch into their own feature branches.

Note: To make the final merge of your feature into default easier, you can regularly merge the default branch into the feature branch.

Note: We use feature branches to ensure that new clones start at a revision which other developers can directly use. With bookmarks you could get trapped on a feature-head which might not be merged to default for quite some time. For more reasons, see the bookmarks footnote.
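Putting the feature-branch steps together, a session might look roughly like this (the branch name is just an example):

hg branch my-feature
# ... hack and commit as often as you like ...
hg commit --close-branch -m "finish my-feature"
hg update default
hg merge my-feature
hg commit -m "merge my-feature into default"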

The final action is a regular merge to stable to get into a state from which we could safely do a release. Since we already showed how to do that, we are finished here.

Extensions

If you have special needs, this model can easily be extended to fulfill your requirements. Useful extensions include:

multiple releases - if you need to provide maintenance for multiple releases side-by-side.

grafted micro-releases - if you need to segment the next big changes into smaller releases while leaving out some potentially risky changes.

explicit review - if you want to ensure that only reviewed changes can get into a release, while making it possible to leave out some already reviewed changes from the next releases. Review gets decoupled from releasing.

All these extensions are orthogonal, so you can use them together without getting side-effects.

Multiple maintained releases

To use the branching model with multiple simultaneously maintained releases, you only need to change the hotfix procedure: When applying a hotfix, you go back to the old release with hg update tagname, fix there, add a new tag for the fixed release and then update to the next release. There you merge the new fix-release and do the same for all other releases. If the most recent release is not the head of the stable branch, you also merge into stable. Then you merge the stable branch into default, as for a normal hotfix.14

With this merge-chain you don’t need special branches for releases, but all changesets are still clearly recorded. This simplification over git is a direct result of having real anonymous branching in Mercurial.

In the Diagram this just adds a merge path from the hotfix to the still maintained releases. Note that nothing changed in the workflow of programmers.

An overview of the branching strategy with maintained releases. Click the image to get the emacs org-mode ditaa source.

Graft changes into micro-releases

If you need to test parts of the current development in small chunks, you can graft micro-releases. In that case, just update to stable and merge the first revision from default whose child you do not want, and graft later changes15.

An overview of the branching strategy with grafted micro-releases. Click the image to get the emacs org-mode ditaa source.

Grafted micro-releases add another layer between development and releases. They can be necessary in cases where testing requires actually deploying a release, as for example in Freenet.

Explicit review branch

If you want to add a separate review stage, you can use a review branch1819 into which you only merge or graft reviewed changes. The review branch then acts as a staging area for all changes which might go into a release.

To use this extension of the branching model, just create a branch on default called review in which you merge or graft reviewed changes. The first time you do that, you update to the first commit whose children you do not want to include. Then create the review branch with hg branch review and use hg graft REV to pull in all changes you want to include.

On subsequent reviews, you just update to the review branch with hg update review, merge the first revision which has a child you do not want with hg merge REV and graft additional later changes with hg graft REV, as you would do for micro-releases.

In both cases you create the release by merging the review branch into stable.

A special condition when using a review branch is that you always have to merge hotfixes into the review branch, too, because the review branch does not automatically contain all changes from the default branch.

In the Diagram this just adds the review branch between default and stable instead of the release merge. Also it adds the hotfix merge to the review branch.

An overview of the branching strategy with a review branch. Click the image to get the emacs org-mode ditaa source.

Frequently Asked Questions (FAQ)

Where does QA (Quality Assurance) come in?

In the default flow, where users directly use the stable branch, you do QA on the default branch before merging to stable. QA is part of the maintainer’s job there.

If your users want external QA, that QA is done for revisions on the stable branch. It is restricted to signing good revisions. Any changes have to be done on the default branch - except for hotfixes for previously signed releases. It is only a hotfix if your users could already be running a broken version.

Simple Summary

We now have nice graphs, examples, potential extensions and so on. But since this strategy uses Mercurial instead of git, we don’t actually need all the graphics, descriptions and branch categories in the git version - or in this post.

Instead we can boil all of this down to 3 simple rules:

(1) you do all the work on default - except for hotfixes.

(2) on stable you only do hotfixes, merges for release and tagging for release. Only maintainers touch stable.

(3) you can use arbitrary feature-branches, as long as you don’t call them default or stable. They always start at default (since you do all the work on default).

They are the rules you already know from the starting summary. Keep them in mind and you’re good to go. And when you’re doing regular development, there is only one rule to remember:

You do all the work on default.

That’s it. Happy hacking!

if you need to maintain multiple very different releases simultaneously, see ⁰ or 20 for adaptations ↩

default is the default branch. That’s the named branch you use when you don’t explicitly set a branch. Its alias is the empty string, so if no branch is shown in the log (hg log), you’re on the default branch. Thanks to John for asking!↩

If you want to release the changes from default in smaller chunks, you can also graft specific changes into a release preparation branch and merge that instead of directly merging default into stable. This can be useful to get real-life testing of the distinct parts. For details see the extension Graft changes into micro-releases. ↩

Maintainers are those who do releases, while they do a release. At any other time, they follow the same patterns as everyone else. If the release tasks seem a bit long, keep in mind that you only need them when you do the release. Their goal is to make regular development as easy as possible, so you can tell your non-releasing colleagues “just work on default and everything will be fine”. ↩

This model does not use bookmarks, because they don’t offer benefits which outweigh the cost of introducing another concept: If you use bookmarks for differentiating lines of development, you have to define the canonical revision to clone by setting the @ bookmark. For local work and small features, bookmarks can be used quite well, though, and since this model does not define their use, it also does not limit it.
Additionally bookmarks could be useful for feature branches, if you use many of them (in that case reusing names is a real danger and not just a rare annoyance) or if you use release branches:
“What are people working on right now?” → hg bookmarks
“Which lines of development do we have in the project?” → hg branches↩

Those users who want external verification can restrict themselves to the tagged releases - potentially GPG signed by trusted 3rd-party reviewers. GPG signatures are treated like hotfixes: reviewers sign on stable (via hg sign without options) and merge into default. Signing directly on stable reduces the possibility of signing the wrong revision. ↩

hg pull and hg push to transfer changes and hg merge when you have multiple heads on one branch are implied in the actions: you can use any kind of repository structure and synchronization scheme. The practical actions only assume that you synchronize your repositories with the other contributors at some point. ↩

Here a hotfix is defined as a fix which must be applied quickly out-of-order, for example to fix a security hole. It prompts a bugfix-release which only contains already stable and tested changes plus the hotfix. ↩

If your project needs a certain release preparation phase (like translations), then you can simply assign a task branch. Instead of merging to stable, you merge to the task branch, and once the task is done, you merge the task branch to stable. An Example: Assume that you need to update translations before you release anything. (next part: init: you only need this once) When you want to do the first release which needs to be translated, you update to the revision from which you want to make the release and create the “translation” branch: hg update default; hg branch translation; hg commit -m "prepared the translation branch". All translators now update to the translation branch and do the translations. Then you merge it into stable: hg update stable; hg merge translation; hg ci -m "merged translated source for release". After the release you merge stable back into default as usual. (regular releases) If you want to start translating the next time, you just merge the revision to release into the translation branch: hg update translation; hg merge default; hg commit -m "prepared translation branch". Afterwards you merge “translation” into stable and proceed as usual. ↩

We merge the hotfix into default to define the relevance of the fix for general development. If the hotfix also affects the current line of development, we keep its changes in the merge. If the current line of development does not need the hotfix, we discard its changes in the merge. We do this to ensure that it is clear in future how to treat the hotfix when merging new changes: let the merge record the decision. ↩

We can also merge to stable regularly as soon as some set of changes is considered stable, but without making an actual release (==tagging). That way we always have a stable branch which people can test without having to create releases right away. The releases are those changesets on the stable branch which carry a tag. ↩

The reason for that is the more expressive history model of Mercurial. In short: The git version has 5 types of branches: feature, develop, release, hotfix and master (for tagging). With Mercurial you can reduce them to 3: default, stable and feature branches:

Tags are simple in-history objects, so we need no special branch for them: a tag signifies a release (down to 4 branch-types - and no more duplication of information, since in the git-model a release is shown by a tag and a merge to master).

Hotfixes are simple commits on stable followed by a merge to default, so we also need no branch for them (down to 3 branch-types). And if we only maintain one release at a time, we only need one branch for them: stable (down from a branch-type to a single branch).

And feature branches are not required for clean separation since Mercurial can easily cope with multiple heads in a branch, so developers only have to worry about them if they want to use them (down to 2 mandatory branches).

And since the default branch is the branch to which you update automatically when you clone a repository, new developers don’t have to worry about branches at all.

So we get down from 5 mandatory branches (2 of them are categories containing multiple branches) to 2 simple branches without losing functionality.

And new developers only need to know two things about our branching model to contribute:

“If you use feature branches, don’t call them default or stable. And don’t touch stable”.

Merging old releases into new ones sounds like a lot of work. If you get that feeling, then have a look at how many releases you really maintain right now. In my Gentoo tree most programs actually have only one single release, so using actual release branches would incur an additional burden without adding real value. You can also look at the rule of thumb on whether to choose release branches instead↩

If you want to make sure that every changeset on stable is production-ready, you can also start a new release-branch on stable, then merge the first revision, whose child you do not want, into that branch and graft additional changes. Then close the branch and merge it into stable. You can achieve the same with much lower overhead (unneeded complexity) by changing the requirement to “every tagged revision on stable is production-ready”. To only see tagged revisions on stable, just use hg log -r "branch(stable) and tag()". This also works for incoming and outgoing, so you can use it for triggering a build system. ↩

The short graphlog for the grafted micro-releases was created via hg glog --template "{desc} ({branch})". ↩

The review branch is a special preparation-branch, because it can get discontinuous changes if maintainers decide to graft some changes which have ancestors they did not review yet. ↩

We use one single review branch which gets reused at every review to ensure that there are no changes in stable which we did not have in the review. As alternative, you could use one branch per review. In that case, ensure that you start the review-* branches from stable and not from default. Then merge and graft the changes from default which you want to review for inclusion in your next release. ↩

If you want to adapt the model to multiple very distinct releases, simply add multiple release-branches (i.e. release-x). Then hg graft the changes you want to use from default or stable into the releases and merge the releases into stable to ensure that the relationship of their changes to current changes is clear, recorded and will be applied automatically by Mercurial in future merges21. If you use multiple tagged releases, you need to merge the releases into each other in order - starting from the oldest and finishing by merging the most recent one into stable - to record the same information as with release branches. Additionally it is considered impolite to other developers to keep multiple heads in one branch, because with multiple heads other developers do not know the canonical tip of the branch which they should use to make their changes - or in case of stable, which head they should merge to for preparing the next release. That’s why you are likely better off creating a branch per release, if you want to maintain many very different releases for a long time. If you only use tags on stable for releases, you need one merge per maintained release to create a bugfix version of one old release. By adding release branches, you reduce that overhead to one single merge to stable per affected release by stating clearly, that changes to old versions should never affect new versions, except if those changes are explicitly merged into the new versions. If the bugfix affects all releases, release branches require two times as many actions as tagged releases, though: You need to graft the bugfix into every release and merge the release into stable.22↩

If for example you want to ignore that change to an old release for new releases, you simply merge the old release into stable and use hg revert --all -r stable before committing the merge. ↩

A rule of thumb for deciding between tagged releases and release branches is: If you only have a few releases you maintain at the same time, use tagged releases. If you expect that most bugfixes will apply to all releases, starting with some old release, just use tagged releases. If bugfixes will only apply to one release and the current development, use tagged releases and merge hotfixes only to stable. If most bugfixes will only apply to one release and not to the current development, use release branches. ↩

A short introduction to Mercurial with TortoiseHG (GNU/Linux and Windows)

Note: This tutorial is for the old TortoiseHG (with gtk interface). The new one works a bit differently (and uses Qt). See the official quick start guide. The right-click menus should still work similarly to the ones described here, though.

Downloading the Repository

After installing TortoiseHG, you can download a repository to your computer by right-clicking in a folder and selecting the menu "TortoiseHG" and then "Clone" in there (currently you still need Windows for that - all other dialogs can be invoked in GNU/Linux on the commandline via "hgtk").

(That's also the address of the repository on the internet - just try opening it in your browser.)

When you log in to bitbucket.org you will find a clone-address directly on the site. You can also use that clone address to upload changes (it contains your login-name, and I can give you "push" access on that site).

Workflow with TortoiseHG

This gives you two basic abilities:

Save and view changes locally, and

synchronize changes with others.

(I assume that part of what I say is redundant, but I'd rather write a bit too much than omit a crucial bit)

To save changes, you can simply select "HG Commit" in the right-click-menu. If some of your files aren't known to HG yet (the box before the file isn't ticked), you have to add them (tick the box) to be able to commit them.

To go back to earlier changes, you can use "Checkout Revision" in the "TortoiseHG" menu. In that dialog you can then select the revision you want to see and use the icon on the upper left to get all files to that revision.

You can synchronize by right-clicking in the folder and selecting "Synchronize" in the "TortoiseHG" menu (inside the right-click menu). In the opening dialog you can "push" (upload changes - arrow up with the bar above it), "pull" (download changes to your computer - arrow down with bar below), and check what you would pull or push (arrows without bars). I think that using this dialog will soon become second nature for you, too :)

Basic usecases for DVCS: Workflow Failures

If you came here searching for a way to set the username in Mercurial: just edit $HOME/.hgrc and add

[ui]
username = YOURNAME <EMAIL>

If that file does not exist, simply create it.

Update (2014-05-01): The Mercurial breakage is fixed in Mercurial 3.0: When you commit without username it now says “Abort: no username supplied
(use "hg config --edit" to set your username)”. The editor shows a template with a commented-out field for the username. Just put your name and email after the pre-filled username = and save the file. The Git breakage still exists.

Update (2013-04-18): In #mercurial @ irc.freenode.net there were discussions yesterday about improving the help output if you do not have your username set up yet.

1 Intro

I recently tried contributing to a new project again, and I was quite surprised by the hurdles that can be in your way when you have not set up your environment yet.

So I decided to put together a small test for the basic workflow: Cloning a project, doing and testing a change and pushing it back.

I did that for Git and Mercurial, because both break at different points.

I’ll express the basic usecase in Subversion:

svn checkout [project]

(hack, test, repeat)

(request commit rights)

svn commit -m "added X"

You can also replace the request for commit rights with creating a patch and sending it to a mailing list. But let’s take the easiest case of a new contributor who is directly welcomed into the project as trusted committer.

A slightly more advanced workflow adds testing in a clean tree. In Subversion it looks almost like the simple commit:

2.3.2 Push it back

git push

warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:
git config --global push.default matching
To squelch this message and adopt the new behavior now, use:
git config --global push.default simple
See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)
Counting objects: 5, done.
(1/3)
Writing objects: 66% (2/3)
Writing objects: 100% (3/3)
Writing objects: 100% (3/3), 234 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error:
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error:
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To /tmp/gitflow/mine
master (branch is currently checked out)
error: failed to push some refs to '/tmp/gitflow/mine'

Uh… what? If I were a real first time user, at this point I would just send a patch…

The simple local test clone does not work: You actually have to also checkout a different branch if you want to be able to push back (needless duplication of information - and effort). And it actually breaks this simple workflow.

(experienced git users will now tell me that you should always checkout a work branch. But that would mean that I would have to add the additional branching step to the simplest case without testing repo, too, raising the bar for contribution even higher)


hg --version

Mercurial Distributed SCM (version 2.5.2)
(see http://mercurial.selenic.com for more information)
Copyright (C) 2005-2012 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

3.2.2 Hack a bit

ARGL, what??? Mind
the update at the top of this article: This is fixed in Mercurial 3.0

Well, let’s do what it says (but only see the first 30 lines to avoid blowing up this example):

hg help config | head -n 30 | grep -B 3 -A 1 per-repository

These files do not exist by default and you will have to create the
appropriate configuration files yourself: global configuration like the
"%USERPROFILE%\mercurial.ini" or
"$HOME/.hgrc" and local configuration is put into the per-repository
"<repo>/.hg/hgrc" file.

Are you serious??? I have to actually read a guide just to commit my change??? As a normal user this would tip my frustration with the tool over the edge and likely get me to just send a patch… Mind the update at the top of this article: This is fixed in Mercurial 3.0

But I am no normal user, since I want to write this guide. So I assume a really patient user, who does the following (after reading for 3 minutes):

echo '[ui]
username = "contributor"' >> .hg/hgrc

and tries again:

hg commit -m "hack"

Now it worked. But this is MAJOR BREAKAGE. Mind
the update at the top of this article: This is fixed in Mercurial 3.0

If I had worked on mine in the meantime, I would have to merge there, too - just as with git with the exception that I would not have to give a branch name. But since we’re in the simplest case, we don’t need to do that.

3.3.3 Overview

In short the required commands for testing look like this:

hg clone mine test

cd test; (hack)

hg commit -m "hack"

hg push ../mine

cd ../mine

hg push

Compare to Subversion

and to git

3.4 Wrapup

The Mercurial workflow broke only ONCE, but there it broke HARD: To commit you actually have to READ THE HELP PAGE on config to find out how to set your username.

So, to wrap it up: ARE YOU SERIOUS? Mind
the update at the top of this article: This is fixed in Mercurial 3.0

That’s a really nice workflow, disturbed by a devastating user experience for just one of the commands.

This is a place where hg should learn from git: The initial setup must be possible from the commandline, without reading a help page and without changing to an editor and then back into the commandline.

4 Summary

Git broke at several places, and in one place it broke hard: Pushing between local clones is a huge hassle, even though that should be a strong point of DVCSs.

Mercurial broke only once, but there it broke hard: Setting the username actually requires reading help output and hand-editing a text file.

Also the workflows for a user who gets permission to push always required some additional steps compared to Subversion.

One of the additional steps cannot be avoided without losing offline-commits (which are a major strength of DVCS), because those make it necessary to split svn commit into commit and push: That separates storing changes from sharing them.

But git actually requires additional steps which are only necessary due to implementation details of its storage layer: Pushing to a repo with the same branch checked out is not allowed, so you have to create an additional branch in your local clone and merge it in the other repo, even if all your changes are siblings of the changes in the other repository, and it requires either a flag to every commit command or explicitly adding changes. That does not amount to the one unavoidable additional command, but to three further commands, so the number of commands to get code, hack on it and share it increases from 5 to 9. And if you work in a team where people trust you to write good code, that does not actually reduce the required effort to share your changes.

On the other hand, both Mercurial and Git allow you to work offline, and you can do as many testing steps in between as you like, without needing to get the changes from the server every time (because you can simply clone a local repo for that).

Even shorter, but not quite correct

The shortest representation is without the heads, though. It does not represent the current state of development if the last commit was a merge or if some branches were not merged. Otherwise it is equivalent.

Better representations?

PS: Yes, you can rewrite history, but that’s a really bad idea if you have many people who closely interact and publish early and often.

Factual Errors in “Git vs Mercurial: Why Git?” from Atlassian

2 years ago, Atlassian developer Charles O’Farrell published the article Git vs. Mercurial: Why Git? in which he “showed the winning side of Git” as he sees it. This article was part of the Dev Tools series at Atlassian and written as a reply to the article Why Mercurial?. It was spiced with so much misinformation that the comments exploded right away. But the article was never corrected. Just now I was referred to the text again, and I decided to do what I should have done 2 years ago: Write an answer which debunks the myths.


“I also think that git isn’t the most beginner-friendly program. That’s why I’m only using its elementary features” — “I hear that from many git-users …” — part of the discussion which got me to write this article

Safer history and rewriting history with Git

Charles starts off by contradicting himself: He claims that git is safer because it “actually never lets you change anything” - and goes on to explain that all unreferenced data can be garbage collected after 30 days. Since nowadays the git garbage collector runs automatically, all unreferenced changes are lost after approximately 30 days.

This obviously means that git does allow you to change something. That this change only becomes irreversible after 30 days is an implementation detail which you have to keep in mind if you want to be safe.1

He then goes on to say how this allows for easy history rewriting with the interactive rebase and correctly adds that the histedit extension of Mercurial allows you to do the same. (He also mentions the Mercurial Queues Extension (mq), just to admit that it is not the equivalent of git rebase -i but instead provides a staging area for future commits).

Then he starts the FUD2: Since histedit stores its backup in an external file, he asks rhetorically what new commands he would have to learn to restore it.

Dear reader, what new command might be required to pull data out of a backup? Something like git ref? Something like git reflog to find it and then something else?

Turns out, this is as easy and consistent as most things in Mercurial: Backup bundles can be treated just like repositories: To restore the changes, simply use

hg pull backup.bundle

So, all FUD removed, his take on safer history and rewriting history is reduced to “in hg it’s different, and potentially confusing features are shipped as extensions. Recovering changes from backups is consistent with your day-to-day usage of hg”.

(note that the flexibility of hg also enables the creation of extensions like mutable hg which avoids all the potential race conditions with git rebase - even for code you share between repositories (which is a total no-go in git), with a safety net which warns you if you try to change published history; thanks to the core feature phases)

Branching

On branching Charles goes deep into misinformation: He wrote his article in the year 2012, when Mercurial had already provided named branches as well as anonymous branching for 6 years, and one year after bookmarks became a core feature in hg 1.8, and he kept talking about how Mercurial advised keeping one clone per branch, referencing a blog post which incorrectly assumed that the hg developers were using that workflow (obviously he did not bother to check that claim). Also he went on clamoring that bookmarks initially could not be pushed between repositories, and how they were added “due to popular demand”. The reality is that at some point a developer simply said “I’ll write that”. And within a few months, he implemented the equivalent of git branches. Before that, no hg developer saw enough need for them to exert that effort and today most still simply use named branches.

But obviously Charles could not imagine named branches to work, so he kept talking about how bookmarks do not have namespaces while git branches have them, and that this would create confusion. He showed the following example for git and Mercurial (shortened here):

Git: there is a commit marked as (origin/master, origin/test), and one marked as (HEAD, master). If you know that origin is the canonical remote repository in git, then you can guess, that the names prefixed with origin/ come from the remote repository.

Mercurial: There is a commit with the bookmark master@default and one with the bookmark master. When you know that default is the canonical remote repository in Mercurial, then you can guess, that the bookmark postfixed with @default comes from the remote repository.

But Charles concludes his example with the sentence: “Because there is no notion of namespaces, we have no way of knowing which bookmarks are local and which ones are remote, and depending on what we call them, we might start running into conflicts.”

And this is not only FUD, it is factually wrong and disproven in his own example. After this, I cannot understand how anyone could take his text seriously.

But he goes on.

Staging

His final misinformation is about the git index - a staging area for uncommitted changes. He correctly identifies the index as “one of the things that people either love or hate about Git”. As Mercurial cares a lot about giving newcomers a safe environment to work in, it ships this controversial feature as an extension and not as a core command.

Charles now claims that the equivalent of the git index is the record extension - and then complains that it does not imitate the index exactly, because it does not give a staging area but rather allows committing partial changes. Instead of now turning towards the Mercurial Queues Extension which he mentioned earlier as staging area for commits, he asserts that record cannot provide the same feature as git.

Not very surprisingly, when you have an extension to provide partial commits (record) and one to provide a staging area (mq), if you want both, you simply activate both extensions. When you do that, Mercurial offers the qrecord command which stores partial changes in the current staging area.
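For completeness, a minimal sketch of what enabling both bundled extensions looks like in your ~/.hgrc:

[extensions]
record =
mq =

With that in place, hg qrecord PATCHNAME interactively records the selected hunks into a new mq patch.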

Not mentioning this is simply a matter of not having done proper research for his article - and not updating the post means that he intentionally continues to spread misinformation.

Blame

The only thing he got right is that git blame is able to reconstruct copies of code from one file to another.

Mercurial provides this for renamed files, but not for directly copy-pasted lines. Analysis of the commits would naturally allow doing the same, and all the information for that is available, but this is not implemented yet. If people ask for it loudly enough, it will only be a matter of time, though. As bookmarks showed, the Mercurial code base is clean enough that it suffices to have a single developer who steps up and creates an extension for this. If enough people use it, the extension can become a core feature later on.

Conclusion

“There is a reason why hg users tend to talk less about hg: There is no need to talk about it that much.” — Arne Babenhauserheide as answer to Why Mercurial?

Charles concludes with “Git means never having to say, you should have”, and “Mercurial feels like Git lite”. Since he obviously did not do his research on Mercurial while he took the time to acquire in-depth knowledge of git, it’s quite understandable that he thinks this. But it is no basis for writing an article - especially not for Atlassian, the most prominent Mercurial hosting provider since their acquisition of Bitbucket, which grew big as a pure Mercurial hoster and added git after being acquired by Atlassian.

He then manages to finish his article with one more unfounded smoke bomb: The repository format drives what is possible with our DVCS tools, now and in the future.

While this statement actually is true, in the context of git-vs-mercurial it is a horrible misfit: The hg-git extension shows since 2009, 3 years before Charles wrote his article, that it is possible to convert transparently from git to Mercurial and back. So the repository format of Mercurial has all capabilities of the repository format of git - and since git cannot natively store named branches, represent branches with multiple heads or push changes into a checked out branch, the capabilities of the repository format of Mercurial are actually a superset of the capabilities of the storage format of Git.

But what he also states is that “there are more important things than having a cuddly command line”. And this is the final misleading statement to debunk: While the command line does not determine what is theoretically possible with the tool, it does determine what regular users can do with it. The horrible command line of git likely contributes to the many git users who never use anything but commit -a, push and pull - and to the proliferation of git gurus whom the normal users call when git shot them into their foot again.

It’s sad when someone uses his writing skills to wrap FUD and misinformation into pretty packaging to get people to take his side. Even more sad is, that this often works for quite some time and that few people read the comments section.3

And now that I finished debunking the article, there is one final thing I want to share. It is a quote from the discussion which prompted me to write this piece:

<…> btw. I also think that git isn’t the most beginner-friendly program.
<…> That’s why I’m only using its elementary features
<ArneBab> I hear that from many git-users…
<…> oh, maybe I should have another look at hg after all

Footnotes:

Garbage collection after 30 days means that you have to remember additional information while you work. And that is a problem: You waste resources which would be better spent on the code you write. A DVCS should be about having to remember less, because your DVCS keeps the state for you.

FUD means fear-uncertainty-doubt and is a pretty common technique used to discredit things when one has no real arguments: Instead of giving a clear argument which can be debunked, just make some vague hints that something might be wrong or that there might be some deficiency or danger. Most readers will never check this and so this establishes the notion that something IS wrong.

Lesson learned: If you take the time to debunk something in the comments, be sure to also write an article about it. Otherwise you might find the same misinformation still being spread 2 years later by the same people. When Atlassian bought Bitbucket, that essentially amounted to a hostile takeover of a Mercurial team by git-zealots. And they got away with this, because too few people called them up on it in public.

git vs. hg - offensive

In many discussions on DVCS over the years I have been fair, friendly and technical while receiving vitriol and misinformation and FUD. This strip visualizes the impression which stuck to my mind when talking to casual git-users.

Update: I found a very calm discussion at a place where I did not expect it: reddit. I’m sorry to you, guys. Thank you for proving that a constructive discussion is possible from both sides! I hope that you are not among the ones offended by this strip.

To Hg-users: There are git users who really understand what they are doing and who stick to arguments and friendly competition. This comic arose from the many frustrating experiences with the many other git users. Please don’t let this strip trick you into going down to non-constructive arguments. Let’s stay friendly. I already feel slightly bad about this short move into competition-like visualization for a topic where I much prefer friendly, constructive discussions. But it sucks to see contributors stumble over git, so I think it was time for this.

»I also think that git isn’t the most beginner-friendly program.
That’s why I’m using only its elementary features«
<ArneBab> I hear that from many git-users…
»oh, maybe I should have another look at hg after all«

Why this?

Because there are far too many Git-Users who only dare to use the most basic commands, which makes git at best useless and at worst harmful.

This is not the fault of the users. It is the fault of the tool.

This strip is horrible!

If you are offended by this strip: You knew the title when you came here, right?

And if you are offended enough, that you want to make your own strip and set things right, go grab the source-file, fire up krita and give me what I deserve! This strip is free.1

Commentary

If you feel that this strip fits Mercurial and Git perfectly, keep in mind, that this is only one aspect of the situation, and that using Git is still much better than being forced to use centralized or proprietary version tracking (and people who survive the initial phase mostly unscarred can actually do the same with Git as they could with Mercurial).

And Mercurial also has its share of problems - even horrible ones - but compared to Git it is a wonder of usability.

And in case this strip does not apply to your usage of Git: there are far too many people whose experience it fits - and this should not be the case for the most widespread system for accessing the code of free software projects.

(and should this strip be completely unintelligible to you: curse a world in which the concept of monofilament whips isn’t mainstream ☺ — let’s get more people to play Shadowrun)

The way forward

So if you are one of the people who mostly use commit, pull and push, and turn to a Git-Guru when things break, then you might want to kiss the Git-Guru goodbye and give Mercurial a try.

By the way: the extensions named in the Final Round are record, mutable and infocalypse: Select the changes to commit on a hunk-by-hunk basis, change history with automatic conflict resolution (even for rebase) and collaborate anonymously over Freenet.

To use the eclass in an ebuild, just add inherit mercurial at the beginning of the ebuild and set EHG_REPO_URI to the correct repository URI. If you need to share a single repository between several ebuilds, set EHG_PROJECT to
the project name in all of them.
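As a hypothetical sketch (the URI and project name are placeholders, not taken from a real ebuild), the relevant lines of such an ebuild could look like this:

inherit mercurial
EHG_REPO_URI="https://example.com/hg/myproject"
EHG_PROJECT="myproject"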

Have fun with Mercurial!

Learning Mercurial in Workflows

This guide delves into nonlinear history and merging right from the beginning and uses only features you get without activating extensions. Because of this it offers efficient and safe workflows without danger of losing already committed work.

With Mercurial you can use a multitude of different workflows. This page shows some of them, including their use cases. It is intended to make it easy for beginners of version tracking to get going instantly and learn completely incrementally. It doesn't explain the concepts used, because there are already many other great resources doing that, for example the wiki and the hgbook.

This guide doesn't require any prior knowledge of version control systems (though subversion users will likely feel at home quite quickly). Basic command line abilities are helpful, because we'll use the command line client.

Basic workflows

We go from simple to more complex workflows. Those further down build on previous workflows.

Log keeping

Use Case

The first workflow is also the easiest one: You want to use Mercurial to be able to look back when you did which changes.

This workflow only requires an installed Mercurial and write access to some file storage (you almost definitely have that :) ). It shows the basic techniques for more complex workflows.

Workflow

Prepare Mercurial

As a first step, you should teach Mercurial your name. For that you open the file ~/.hgrc (or mercurial.ini in your home directory on Windows) with a text editor and add the ui section (user interaction) with your username:

[ui]
username = Mr. Johnson <johnson@smith.com>

Initialize the project

Now you add a new folder in which you want to work:

$ hg init project

Add files and track them

You can also go into an existing directory with files and init the repository there.

$ cd project
$ hg init
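Then tell Mercurial to track your files. Running hg add without arguments tracks all files in the directory:

$ hg add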

Alternatively you can add only specific files instead of all files in the directory. Mercurial will then track only these files and won't know about the others. The following tells Mercurial to track all files whose names begin with "file0" as well as file10, file11 and file12.

$ hg add file0* file10 file11 file12

Save changes

$ (do some changes)

see which files changed, which have been added or removed, and which aren't tracked yet

$ hg status

see the exact changes

$ hg diff

commit the changes.

$ hg commit

now an editor pops up and asks you for a commit message. Upon saving and closing the editor, your changes have been stored by Mercurial.

Note:

You can also supply the commit message directly via hg commit -m 'MESSAGE'.

Move and copy files

When you copy or move files, you should tell Mercurial to do the copy or move for you, so it can track the relationship between the files.

Remember to commit after moving or copying: among the basic commands, only commit creates a new revision.
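A minimal sketch of these commands (the file names are just placeholders):

$ hg cp original.txt copy-of-original.txt
$ hg mv old-name.txt new-name.txt
$ hg commit -m "copied and moved some files"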

Seeing an earlier revision

Different from the log keeping workflow, you'll want to go back in history at times and do some changes directly there, for example because an earlier change introduced a bug and you want to fix it where it occurred.

To look at a previous version of your code, you can use update. Let's assume that you want to see revision 3.

$ hg update 3

Now your code is back at revision 3, the fourth commit (Mercurial starts counting at 0).
To check if you're really at that revision, you can use identify -n.

$ hg identify -n

Note:

identify without options gives you the short form of a unique revision ID. That ID is what Mercurial uses internally. If you tell someone about the version you updated to, you should use that ID, since the numbers can be different for other people. If you want to know the reasons behind that, please read up on Mercurial's [basic concepts]. When you're at the most recent revision, hg identify -n will return "-1".

To update to the most recent revision, you can use "tip" as revision name.

$ hg update tip

Note:

If at any place any command complains, your best bet is to read what it tells you and follow that advice.

Note:

Instead of hg update you can also use the shorthand hg up. Similarly you can abbreviate hg commit to hg ci.

Note:

To get a revision devoid of files, just update to "null" via hg update null. That's the revision before any files were added.

Fixing errors in earlier revisions

When you find a bug in some earlier revision you have two options: either you can fix it in the current code, or you can go back in history and fix the code exactly where you did it, which creates a cleaner history.

To do it the cleaner way, you first update to the old revision, fix the bug and commit it. Afterwards you merge this revision and commit the merge. Don't worry, though: Merging in Mercurial is fast and painless, as you'll see in an instant.

Let's assume the bug was introduced in revision 3.

$ hg update 3
$ (fix the bug)
$ hg commit

Now the fix is already stored in history. We just need to merge it with the current version of your code.

$ hg merge

If there are conflicts use hg resolve - that's also what merge tells you to do in case of conflicts.

First list the files with conflicts

$ hg resolve --list

Then resolve them one by one. resolve attempts the merge again

$ hg resolve conflicting_file
(fix it by hand, if necessary)

Mark the fixed file as resolved

$ hg resolve --mark conflicting_file

Commit the merge, as soon as you resolved all conflicts. This step is also necessary when there were no conflicts!

$ hg commit

At this point, your fix is merged with all your other work, and you can just go on coding. Additionally the history shows clearly where you fixed the bug, so you'll always be able to check where the bug was.

Note:

Most merges will just work. You only need resolve, when merge complains.

So now you can initialize repositories, save changes, update to previous changes and develop in a nonlinear history by committing in earlier changesets and merging the changes into the current code.

Note:

If you fix a bug in an earlier revision, and some later revision copied or moved that file, the fix will be propagated to the target file(s) when you merge. This is the main reason why you should always use hg cp and hg mv.

Separate features

Use Case

At times you'll be working on several features in parallel. If you want to avoid mixing incomplete code versions, you can create clones of your local repository and work on each feature in its own code directory.

After finishing your feature you then pull it back into your main directory and merge the changes.

Workflow

Work in different clones
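The clone step itself is not shown in detail here; roughly, it looks like this (using the directory names from the commands below):

$ hg clone project feature1
$ cd feature1
$ (do some changes)
$ hg commit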

Now check what would come in if you pulled from feature1, just like you can use diff before committing. The corresponding command for pulling is incoming

$ cd ../project
$ hg incoming ../feature1

Note:

If you want to see the diffs, you can use hg incoming --patch just as you can do with hg log --patch for the changes in the repository.

If you like the changes, you pull them into the project

$ hg pull ../feature1

Now you have the history of feature1 inside your project, but the changes aren't yet visible. Instead they are only stored inside a ".hg" directory of the project (more information on the store).

Note:

From now on we'll use the name "repository" for a directory which has a .hg directory with Mercurial history.

If you didn't do any changes in the project while you were working on feature1, you can just update to tip (hg update tip), but it is more likely that you'll have done some other changes in between. In that case, it's time for merging.

Merge feature1 into the project code

$ hg merge

If there are conflicts use hg resolve - that's also what merge tells you to do in case of conflicts. After you merge, you have to commit explicitly to make your merge final

$ hg commit
(enter commit message, for example "merged feature1")

You can create an arbitrary number of clones and also carry them around on USB sticks. Also you can use them to synchronize your files at home and at work, or between your desktop and your laptop.

Note:

You also have to commit after a merge when there are no conflicts, because merging creates new history and you might want to attach a specific message to the merge (like "merge feature1").

Rollback mistakes

Now you can work on different features in parallel, but from time to time a bad commit might sneak in. Naturally you could then just go back one revision and merge the stray error, keeping all mistakes out of the merged revision. However, there's an easier way, if you realize your error before you do another commit or pull: rollback.

Rolling back means undoing the last operation which added something to your history.

Imagine you just realized that you did a bad commit - for example you didn't see a spelling error in a label. To fix it you would use

hg rollback

And then redo the commit

hg commit -m "message"

If you can use the command history of your shell and you added the previous message via commit -m "message", that following commit just means two presses of the "up" arrow key and one press of "enter".

Though it changes your history, rolling back doesn't change your files. It only undoes the last addition to your history.

But beware that a rollback itself can't be undone. If you rollback and then forget to commit, you can't just say "give me my old commit back". You have to create a new commit.

Note:

Rollback is possible, because Mercurial uses transactions when recording changes, and you can use the transaction record to undo the last transaction. This means that you can also use rollback to undo your last pull, if you didn't yet commit anything new.

Sharing changes

Use Case

Now we go one step further: You are no longer alone, and you want to share your changes with others and include their changes.

The basic requirement for that is that you have to be able to see the changes of others.

Mercurial allows you to do that very easily by including a simple webserver from which you can pull changes just as you can pull changes from local clones.

Note:

There are a few other ways to share changes, though. Instead of using the builtin webserver, you can also send the changes by email or set up a shared repository to which you push changes instead of pulling them. We'll cover one of those later.

Workflow

Using the builtin webserver

This is the easiest way to quickly share changes.

First the one who wants to share his changes creates the webserver

$ hg serve

Now all others can point their browsers to his IP address (for example 192.168.178.100) at port 8000. They will then see all his history there and can decide if they want to pull his changes.

$ firefox http://192.168.178.100:8000

If they decide to include the changes, they just pull from the same URL

$ hg pull http://192.168.178.100:8000

At this point you all can work as if you had pulled from a local repository. All the data is now in your individual repositories and you can merge the changes and work with them without needing any connection to the served repository.

Sending changes by email

Often you won't have direct access to the repository of someone else, be it because he's behind a restrictive firewall, or because you live in different timezones. You might also want to keep your changes confidential and prefer internal email (if you want additional protection, you can also encrypt the emails, for example with GnuPG).

In that case, you can easily export your changes as patches and send them by email.

Another reason to send them by email can be that your policy requires manual review of the changes when the other developers are used to reading diffs in emails. I'm sure you can think of more reasons.

Sending the changes via email is pretty straightforward with Mercurial. You just export your changes and attach (or copy-paste) them to your email. Your colleagues can then just import them.

First check which changes you want to export

$ cd project
$ hg log

We assume that you want to export changeset 3 and 4

$ hg export 3 > change3.diff
$ hg export 4 > change4.diff

Now attach them to an email and your colleagues can just run import on both diffs to get your full changes, including your user information.

To be careful, they first clone their repository to have an integration directory as sandbox
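Sketched as commands (assuming they saved the attached diffs next to their repository):

$ hg clone project integration
$ cd integration
$ hg import ../change3.diff ../change4.diff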

That's it. They can now test your changes in feature clones. If they accept them, they pull the changes into the main repository

$ cd ../project
$ hg pull ../integration

Note:

The patchbomb extension automates the email-sending, but you don't need it for this workflow.

Note:

You can also send around bundles, which are snippets of your actual history. Just create them via

$ hg bundle --base FIRST_REVISION_TO_BUNDLE changes.bundle

Others can then get your changes by simply pulling them, as if your bundle were an actual repository

$ hg pull path/to/changes.bundle

Using a shared repository

Sending changes by email might be the easiest way to reach people when you aren't yet part of the regular development team, but it creates additional workload: You have to bundle the changes, send mails and then import the bundles manually. Luckily there's an easier way which works quite well: The shared push repository.

Till now we transferred all changes either via email or via pull. Now we use another option: pushing. As the name suggests it's just the opposite of pulling: You push your changes into another repository.

But to make use of it, we first need something we can push to.

By default hg serve doesn't allow pushing, since that would be a major security hole. You can allow pushing in the server, but that's no solution when you live in different timezones, so we'll go with another approach here: Using a shared repository, either on an existing shared server or on a service like BitBucket. Doing so has a bit higher starting cost and takes a bit longer to explain, but it's well worth the effort spent.

If you don't have a shared server, you first need to set up a BitBucket account. Just sign up at BitBucket. After signing up (and logging in), hover your mouse over "Repositories". There click the item at the bottom of the opening dialog which says "Create new".

Give it a name and a description. If you want to keep it hidden from the public, select "private"

$ firefox http://bitbucket.org

Now your repository is created and you see instructions for pushing to it. For that you'll use a command similar to the following (just with a different URL)

$ hg push https://bitbucket.org/ArneBab/hello/

(Replace the URL with the URL of your created repository. If your username is "Foo" and your repository is named "bar", the URL will be https://bitbucket.org/Foo/bar/)

Mercurial will ask for your BitBucket name and password, then push your code.

Now it's time to tell all your colleagues to sign up at BitBucket, too.

After that you can click the "Admin" tab of your created repository and add the usernames of your colleagues on the right side under "Permission: Writers". Now they are allowed to push code to the repository.

(If you chose to make the repository private, you'll need to add them to "Permission: Readers", too)

If one of you now wants to publish changes, he'll simply push them to the repository, and all others get them by pulling.

Publish your changes

$ hg push https://bitbucket.org/ArneBab/hello/

Pull others changes into your local repository

$ hg pull https://bitbucket.org/ArneBab/hello/

People who join you in development can also just clone this repository, as if one of you were using hg serve

That local repository will automatically be configured to pull/push from/to the online repository, so new contributors can just use hg push and hg pull without a URL.

Note:

To make this workflow more scalable, each one of you can have his own BitBucket repository and you can simply pull from the others' repositories. That way you can easily establish workflows in which certain people act as integrators and finally push checked code to a shared pull repository from which all others pull.

Note:

You can also use this workflow with a shared server instead of BitBucket, either via SSH or via a shared directory. An example of an SSH URL with Mercurial is ssh://user@example.com/path/to/repo. When using a shared directory you just push as if the repository in the shared directory were on your local drive.

Summary

Now let's take a step back and look where we are.

With the commands you already know, a bit of reading of hg help <command> and some evil script-fu you can already do almost everything you'll ever need to do when working with source code history. So from now on almost everything is convenience, and that's a good thing.

First, this is good because it means that you can now use most of the concepts which are utilized in more complex workflows.

Second it aids you, because convenience lets you focus on your task instead of focusing on your tool. It helps you concentrate on the coding itself. Still you can always go back to the basics, if you want to.

A short summary of what you can do, which can also act as a quick check whether you still remember the meaning of the commands.

Use shared repositories on BitBucket

Let's move on towards useful features and a bit more advanced workflows.

Advanced workflows

Backing out bad revisions

Use Case

When you routinely pull code from others, it can happen that you overlook some bad change. As soon as others pull that change from you, you have little chance to get completely rid of it.

To resolve that problem, Mercurial offers you the backout command. Backing out a change means that you tell Mercurial to create a commit which reverses the bad change. That way you don't get rid of the bad code in history, but you can remove it from new revisions.

Note:

The basic commands don't directly rewrite history. If you want to do that, you need to activate some of the extensions which are shipped with Mercurial. We'll come to that later on.

Workflow

Let's assume the bad change was revision 3, and you already have one more revision in your repository. To remove the bad code, you can just back out of it. This creates a new change which reverses the bad change. After backing out, you can then merge that new change into the current code.
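A minimal sketch of these steps, using revision 3 from the example above; the --merge option tells backout to merge the reversal into your current code right away, and you then commit the result (the exact behavior differs slightly between Mercurial versions):

$ hg backout --merge 3
$ hg commit -m "merged backout of revision 3"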

That's it. You reversed the bad change. It's still recorded that it was once there (following the principle "don't rewrite history, if it's not really necessary"), but it doesn't affect future code anymore.

Collaborative feature development

Now that you can share changes and reverse them if necessary, you can go one step further: Using Mercurial to help in coordinating the coding.

The first part is an easy way to develop features together, without requiring every developer to keep track of several feature clones.

Use Case

When you want to split your development into several features, you need to keep track of who works on which feature and where to get which changes.

Mercurial makes this easy for you by providing named branches. They are a part of the main repository, so they are available to everyone involved. At the same time, changes committed on a certain branch don't get mixed with the changes in the default branch, so features are kept separate, until they get merged into the default branch.

Note:

Cloning a repository always puts you onto the default branch at first.

Workflow

When someone in your group wants to start coding on a feature without disturbing the others, he can create a named branch and commit there. When someone else wants to join in, he just updates to the branch and commits away. As soon as the feature is finished, someone merges the named branch into the default branch.

Working in a named branch
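Creating the branch and committing on it looks roughly as follows (the branch name feature-x is just an example):

$ hg branch feature-x
$ (do some changes)
$ hg commit -m "started work on feature-x"

Someone else who wants to join in just pulls and updates to the branch:

$ hg pull
$ hg update feature-x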

Now you can commit, pull, push and merge (and anything else) as if you were working in a separate repository. If the history of the named branch is linear and you call "hg merge", Mercurial asks you to specify an explicit revision, since the branch in which you work doesn't have anything to merge.

Merge the named branch

When you have finished the feature, you merge the branch back into the default branch.
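A sketch of that merge, again assuming the branch name "feature-x":

$ hg update default
$ hg merge feature-x
$ hg commit -m "merged feature-x into default"

If you also want the finished branch to stop showing up in "hg branches", you can update to it once more and commit with --close-branch.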

And that's it. Now you can easily keep features separate without unnecessary bookkeeping.

Note:

Named branches stay in history as a permanent record after you have finished your work. If you don't like having that record in your history, please have a look at some of the advanced workflows.

Tagging revisions

Use Case

Since you can now code separate features more easily, you might want to mark certain revisions as fit for consumption (or similar). For example, you might want to mark releases, or mark revisions as reviewed.

For this Mercurial offers tags. Tags add a name to a revision and are part of the history. You can tag a change years after it was committed. The tag includes the information when it was added, and tags can be pulled, pushed and merged just like any other committed change.

Note:

A tag must not contain the char ":", since that char is used for specifying multiple revisions - see "hg help revisions".

Note:

To securely mark a revision, you can use the gpg extension to sign the tag.
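A minimal sketch, assuming the gpg extension is activated in the extensions section of your configuration:

$ hg sign 3

This records a signature for revision 3 in the file .hgsigs as a new commit, and others can check it with "hg sigcheck 3".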

Workflow

Let's assume you want to give revision 3 the name "v0.1".

Add the tag

$ hg tag -r 3 v0.1

See all tags

$ hg tags

When you look at the log, you'll now see a line in changeset 3 which marks the tag. If someone wants to update to the tagged revision, he can just use the name of your tag:

$ hg update v0.1

Now he'll be at the tagged revision and can work from there.

Removing history

Use Case

At times you will have changes in your repository, which you really don't want in it.

There are many advanced options for removing these changes, and most of them use powerful extensions (Mercurial Queues is the one used most often), but in this basic guide we'll solve the problem with just the commands we already learned, plus one option to clone which we didn't use yet.

This workflow becomes inconvenient when you need to remove changes which are buried below many new changes. If you spot the bad changes early enough, you can get rid of them without too much effort, though.

Workflow

Let's assume you want to get rid of revision 2 and the highest revision is 3.

The first step is to use the "--rev" option to clone: it creates a clone which only contains the changes up to the specified revision. Since you want to keep revision 1, you only clone up to that:

$ hg clone -r 1 project stripped

Now you can export change 3 from the original repository (project) and import it into the stripped one:
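One way to do this (a sketch, assuming the two clones sit next to each other):

$ cd stripped
$ hg export -R ../project 3 | hg import -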

That's it. hg export also includes the commit message, date, committer and similar metadata, so you are already done.

Note:

Removing history will change the revision IDs of the revisions after the removed one, and if you pull from someone who still has the revision you removed, you will pull the removed parts back in. That's why rewriting history should usually only be done for changes which you didn't publish yet.

Summary

So now you can work with Mercurial in private, and also share your changes in a multitude of ways.

Additionally you can remove bad changes, either by creating a change in the repository which reverses the original change, or by really rewriting history, so it looks like the change never occurred.

And you can separate the work on features within a single repository by using named branches, and add tags to revisions which serve as visible markers for others and can be used to update to the tagged revisions.

With this we can conclude our practical guide.

More Complex Workflows

If you now want to check some more complex workflows, please have a look at the general workflows wikipage.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

Mercurial Workflow: Feature separation via named branches

For Whom?

If you

want to develop features collaboratively and want to be able to see later for which feature a given change was added, or

want to do changes concurrently which would likely affect each other negatively while they are not finished, but which need to be developed in a group with minimal overhead,

then this workflow might be right for you.

Note: If you have a huge number of small features (2000 and upwards), the number of persistent named branches can create some performance problems for listing the branches (only for the listing! Pushing, as a different example, is unaffected: linear history is just as fast as 2000 branches). For features which need no collaboration or only a few commits, this workflow also carries a lot of unnecessary overhead. It is best used for features which will be developed side by side with default for some time (and many commits), so that tracking the default branch against the feature is relevant. To mark single-commit changes as belonging to a feature, just use the commit message.

Note: The difference between Mercurial named branches and git branches is that git branches don’t stay in history. They don’t allow you to find out later in which branch a certain commit was added. If you want git-style branching, just use bookmarks.

Note: If you avoid using stable as branch name, you can always upgrade this workflow to the complete branching model later on.

What you need

Just vanilla Mercurial.

Workflow

The workflow is 6-stepped:

create the new feature,

implement and share,

merge other changes into it,

merge stable features,

close finished features and

reopen features.

Let’s see the steps in detail.

1. New feature

First start a new branch with the name of the feature (starting from default).
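In commands (a sketch; the feature name is just an example):

$ hg update default
$ hg branch feature-x
$ hg commit -m "started feature-x"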

Mercurial for two Programmers who are (mostly) new to SCM

On Tuesday, 3 February 2009, 20:19:14, ... ... wrote:
> Most of the docs I can find seem to assume the reader is familiar with
> existing software development tools and methodologies.
>
> This is not the case for me.

It wasn't for me either, and I can assure you that using Mercurial becomes
natural quite quickly.

> Now, I need to coordinate with a second (also SCM clueless) programmer.
...
> I envision us both working the main trunk for many small day-to-day
> changes, and our own isolated repo for larger additions that we will each
> be working on.

I don't know about a HOWTO, but I can give you a short description about basic
usage and the workflow I'd use:

Basic usage

Just commit as you'd have done in SVN via "hg commit".

To get changes from others, do "hg pull -u".
The "-u" says 'update my files'. Always commit before you pull. Otherwise "hg pull -u" will try to merge the new changes.

If you already committed and then pull changes from someone else, you merge
the changes with yours via "hg merge". Merging is quite painless in Mercurial, so you can easily do it often.

Once you want to share your changes, do "hg push".
Should that complain about "adding heads", pull and merge, then do the push again. If you really want to create new remote heads, you can use "hg push -f".
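To put it together, a sketch of the day-to-day cycle (the commit messages are just placeholders):

hg commit -m "describe what you changed"
hg pull -u
hg merge
hg commit -m "merged"
hg push

The merge and the merge commit are only needed when the pull actually brought in a new head.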

Workflow

First off: Create a main repository you both can push changes to. If you have ssh access to a shared machine, that's as simple as creating a repository on that machine via "hg init project".

Now both of you clone from that repository via
hg clone ssh://USER@ADDRESS/path/to/project project

(ADDRESS can be either a host or an IP).

That's your repository for the small day to day changes.

If you want to do bigger changes, you create a feature clone via
hg clone project feature1

In that clone you simply work, pull and commit as usual, but you only push after you finished the feature.

Once you have finished the feature, you push the changes from the feature clone via "hg push" in feature1 (which gets them into your main working clone) and then push them onward into the shared repository.
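In commands (a sketch, assuming the feature clone feature1 lies next to your main working clone project):

cd feature1
hg push
cd ../project
hg push

The first push moves the changes into your main working clone, the second one moves them onward into the shared repository.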

That's it - or rather that's what I'd do. It might be right for you, too, and
if it isn't, don't be shy of experimenting. As long as you have a backup clone
lying around (for example cloned to a USB stick via "hg clone project
path/to/stick/project"), you can't do too much damage :)

This test uses the mutable-hg extension in revision c70a1091e0d8 (24 changesets after 2.1.0). It will likely be obsolete soon, since mutable-hg is currently being moved into Mercurial core by Pierre-Yves David, its main developer. I hope it will be useful for you to assess the future possibilities of Mercurial today. This is not (only) a pun: “obsolete” markers are the functionality at the core of evolve which allows safe, collaborative history rewriting ☺

2.4 …together

Add a bad change. Followed by a good change. Pull both into another repo and amend it. Do a good change in the other repo. Then amend the bad change in the original repo, pull it into the other and evolve.

2.4.1 Setup

Now we switch the setting to planning a roleplaying session, to have a more complex task. We want to present this as a coherent story on how to plan a story, so we want clean history.

note: the revision numbers are different because the other repo only gets those obsolete revisions which are ancestors of non-obsolete revisions. That way evolve slowly cleans obsolete revisions out of the history without breaking repositories which already have them (while giving them a clear and easy path for evolution).

Now I amended my commit, but my history does not look good yet. Actually it looks evil, since I have 2 heads, which is not so nice. The changeset from under which we just pulled away the bad change has become unstable, because its ancestor has been obsoleted, so it has no stable foothold anymore. In other DVCSs, this means that we as users have to find out what was changed and fix it ourselves.

Changeset evolution allows us to evolve our repository to get rid of dependencies on obsolete changes.

As you can see, he is told that his changes became unstable, since they depend on obsolete history. No need to panic: He can just evolve his repo to be state of the art again.

But the unstable change is checked out in the working directory, so evolve does not change it. Instead it tells us that we might want to call it with `--any`. And as with most hints in hg, following the suggestion is actually the right thing to do.
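In command form (a sketch; the exact flags may differ between evolve versions):

hg evolve --any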

Now trying to amend history will fail (except if we first change the phase to draft with `hg phase --force --draft .`).

cd testother
hg amend -m "change published history"
# change to draft
hg phase -fd .
hg phase .
# now we could amend, but that would defeat the point of this section, so we go to public again.
hg phase -p .
cd ..

abort: can not rewrite immutable changeset 058175606243
7: draft

Once I pull from that repo, the revisions which are in there will also switch phase to public in my repo.

So by pushing the changes into a publishing repo, you can get every contributor's Mercurial to track which revisions are safe to change - and which are not. An alternative is using `hg phase -p REV`.
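For reference, a sketch of how a repository can be marked as non-publishing in its .hg/hgrc, so that changesets pushed to it stay in the draft phase:

[phases]
publish = False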

2.6 Fold

Do multiple commits to create a patch, then fold them into one commit.

But we actually only need one change, so make it one by folding the last 4 changes into a single commit.

Since fold needs an interactive editor (it does not take -m, yet), we will leave that out. The commented commands allow you to fold the changesets.

cd testmy
# hg fold -r "-1:-4"
# hg log -G -r 9:
cd ..

2.7 Split

Do one big commit, then split it into two atomic commits.

Now I apply the scenes to wishes, places and people. That is not useful: first I should apply them to the wishes and check whether all wishes are fulfilled. But while writing I forgot that, and anxious to show my co-gamemaster, I just did one big commit.

Let’s fix that: uncommit it and commit it as separate changes. Normally I would just use `hg record` to interactively select changes to record. Since this is a non-interactive test, I manually undo and redo changes instead.

note: We use graft here, because using a second amend would just change the changeset in between but not add another change. If there had been more changes after the single followup commit, we would simply have called evolve to fix them, because graft -O left an obsolete marker on the grafted changeset, so evolve would have seen how to change all its children.

That’s it. All that’s left is finishing plan.txt, but I’ll rather do that outside this guide :)

3 Conclusion

Evolve does a pretty good job at making it convenient and safe to rework history. If you’re an early adopter, I can advise testing it yourself. Otherwise, it might be better to wait until more early adopters tested it and polished its rough edges.

note: hg amend was subsumed into hg commit --amend, so the dedicated command will likely disappear.

PS: In case you’re interested: The roleplaying session worked out wonderfully and a good deal of our planning actually survived the contact with the players - enough that we could wing the rest with short coordination meetings in which we two game masters enthusiastically told each other what happened in the respective group, planned the next steps and enjoyed the evil gamemasters’ giggle ☺.

note: This guide was created by Arne Babenhauserheide with emacs org-mode and turned into html via M-x org-export-as-html - including the results of evaluating the code snippets.

Track your scientific scripts with Mercurial

If you want to publish your scientific scripts, as Nick Barnes advises in Nature, you can very easily do so with Mercurial.

All my stuff (not just code), excepting only huge datasets, is in a Mercurial source repository.1

Whenever I change something and it does anything new, I commit the files with a simple commit (even if it’s only “it compiles!”).

With that I can always check “which were the last things I did” (look into the log) or “when did I change this line, and why?” (annotate the file). Also I can easily share my scripts folder with others and Mercurial can merge my work and theirs, so if they fix a line and I fix another line, both fixes get integrated without having to manually copy-paste them around.

For all that it doesn’t need much additional expertise: The basics can be learned in just 15 minutes — and you’ll likely never need more than these for your work.2

Update 2013: Nowadays I include the revision of scripts I use in the name of their output files or folders, so I always know which version of my scripts I used to create some result.
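A sketch of how that can look in a small shell wrapper (the script and file names are just examples):

# short hash of the working copy; a trailing + means uncommitted changes
REV=$(hg id -i)
# run the analysis and tag the output with that revision
python analyse.py data.csv > results_${REV}.txt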

concise commit messages

When I look up a commit, I’m not searching for prose. I’m searching for short snippets of information I need. If they are long-winded explanations, I am unlikely to even read them.

To understand this, please imagine coming back home, getting off the bike and taking 15 minutes to look at the most recent pull-request. You know that you’ll need to start making dinner at 19:00, so there is no time to waste. You look into the pull-request and the explanations are longer than the code changes. You can either read half the explanations or just look at the code. So you try to understand what the code does and what it intends to do from the code alone. After 15 minutes you post a partial review and start cooking. Next slot for code review is tomorrow evening, or maybe next friday. The pull-request lies open for several weeks while more changes pile up.

Contrast that with short commit messages: You look into the pull-request. The commit message gives you the intention of the change (“sounds good”), maybe with a short note on non-obvious side-effects of the implementation, and you skim the code to see whether it realizes the intention. If it does and you don’t see problems which the writer might have overlooked: Great, code review finished. You write the review and go make dinner. The pull-request is merged the same week.

That’s why I’d suggest just writing short commit messages and putting detailed explanations into a blog, if you like writing those explanations. That’s what you have a blog for, and you can search it later if you need these notes. If they are essential to understand the effects of later changes, you might want to document them in a text file like HACKING or docs/devnotes.txt.

workflow concept: automatic trusted group of committers

Goal

A workflow where the repository gets updated only from repositories
whose heads got signed by at least a certain percentage or a certain number of trusted committers.

Requirements

Mercurial, two hooks for checking and three special files in the repo.

The hooks do all the work - apart from them, the repo is just a normal
Mercurial repository. After cloning it, you only need to set up the hooks to
activate the workflow.

Extensions: gpg

Hooks: prechangegroup and pretxnchangegroup

Files: .hgtrustedkeys , .hgbackuprepos , .hgtrustminimum
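A sketch of how the two hooks might be wired up in the repository's .hg/hgrc; the script paths and function names are hypothetical, and the scripts themselves would have to implement the checks described below:

[hooks]
prechangegroup.trust = python:/path/to/copytrustfiles.py:copytrustfiles
pretxnchangegroup.trust = python:/path/to/checkheads.py:checkheads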

Concept

Hooks

prechangegroup: Copy the local versions of the special files for access in the
pretxnchangegroup hook (this might be unnecessary if the pretxnchangegroup
hook can use the rollback info instead).

pretxnchangegroup:

per head: check if the tipmost non-signature changeset has been
GnuPG signed by enough trusted keys.

If not all heads have enough signatures, rollback, discard the
current default repo and replace it with the backup repo which has the most
changesets we lack. Continue discarding bad repos until you find one with
enough signatures.

Special Files

.hgtrustedkeys contains a list of public GnuPG keys.

.hgbackuprepos contains a list of (pull) links to backup repositories.

.hgtrustminimum contains the percentage or number of keys from which a signature is
needed for a head to be accepted.

Notes

With this workflow you can even do automatic updates from the repository. It
should be ideal for release repositories of distributed projects.

If you want to work on the project, a very worthwhile goal might be implementing it in infocalypse: anonymous code collaboration via Freenet and Mercurial, built to survive the informational apocalypse (and any kind of censorship).

Politics and Free Licensing

Being unpolitical
means being political
without realizing it.
— Arne Babenhauserheide