Almar's bloghttp://almarklein.org/2017-06-21T00:00:00+02:00The three language problem, and how Web Assembly will help solve it2017-06-21T00:00:00+02:002017-06-21T00:00:00+02:00Almartag:almarklein.org,2017-06-21:/n_language_problem.html<p>Historically, languages are either easy to use <em>or</em> fast. Julia has
shown us that we can have both. I argue that in this day and age, we
may <em>also</em> aspire to a language that runs on the web and on mobile devices. I'll
explain how I think that Web Assembly will bring us closer to this goal.</p>
<p><img alt="tortoise and rabbit" src="images/turtoise_and_rabit.jpg">
<br /><small><i>
Image from "St. Nicholas" (no known restrictions), via Flickr
</i></small></p>
<hr>
<h2>The two language problem</h2>
<p>The programmer John Ousterhout <a href="https://en.wikipedia.org/wiki/Ousterhout%27s_dichotomy">stated in 1998</a>
that high-level programming languages tend to fall into two groups: one fast
but hard to write, and one easy but slow.</p>
<p>Ousterhout meant this as a way of saying "use the right tool for the job", but
this idea has also become known as <em>the two language problem</em>. The point is that
programmers, especially in science and engineering, write their initial code
in an "easy language", because they <a href="why_dynamic.html">need its dynamic nature</a>
and abstractions; the problems they work on are hard enough already.
When such code is to be used in real applications, it will often have to be re-written
in a language that is faster (and perhaps more robust).</p>
<h2>Consequences</h2>
<p>Having to rewrite code in another (harder to use) language complicates things a lot,
because the people who wrote the original code are often not capable of writing
in another language, and the people who are, do not always sufficiently understand
the problem that the code solves.</p>
<p>I believe that this effect is a significant inhibiting factor in turning
scientific innovation into real world solutions. At university I've
seen countless times how research is done, the (PhD) student finishes,
and the research code is never touched again.</p>
<p>This problem has been one of the reasons why I moved from Matlab to Python. In this
case it was not so much about speed, but about the fact that Matlab is terrible
for building GUI applications and equally bad at distributing them (caused in part
by its proprietary license model). In Python I can write code, build a
GUI around it, freeze the app, and put it online for anyone to use.</p>
<h2>The three language problem</h2>
<p>A decade ago, the internet played a role in <em>distributing</em> apps, but (except for
a few annoying applets and Flash) apps were all desktop apps. With JavaScript
becoming much faster and standards improving (e.g. HTML5), things have changed a lot,
and the browser is now a common platform for <em>running</em> apps. For scientists,
this provides an opportunity to <a href="future_vis.html">publish results</a> and ideas in
interactive ways. If you can write JavaScript, that is. (Or CoffeeScript,
or TypeScript, or Elm; JavaScript kinda sucks, and many things have
been made to make it suck a bit less.)</p>
<p>We're at a point where many people use their phone more than their
laptop/desktop. Mobile apps are typically written in Java or C++, or
Swift for iOS. You could thus argue that we have a four or five language
problem. On the other hand, mobile apps can be created using web
technology too. So for the sake of argument, we'll stick to "the three language problem".</p>
<p>Summarizing, the three language problem states that we generally need one language for
each of the following tasks:</p>
<ul>
<li>Writing code easily, allowing an easy start for newcomers, quick development, and solving complex problems;</li>
<li>Writing code that runs fast;</li>
<li>Writing code that runs in the browser and mobile devices (i.e. is "safe").</li>
</ul>
<p>Granted, when Python was just being
created, it was hard to anticipate that web browsers would become such
an important platform to run apps on, nor that mobile devices would
become so ubiquitous. Similarly, we cannot anticipate what the digital world
will look like 20 years from now, and what N language
problems we will have then. But let's focus on solutions for today's problems.</p>
<h2>Partial solutions</h2>
<p>The <a href="https://julialang.org/">Julia language</a> has shown that a dynamic language can
also be fast (on par with C). However, I share the view of <a href="https://software-carpentry.org/blog/2015/06/why-i-am-not-excited-about-julia.html">Greg
Wilson</a>
that Julia is not as easy as it could be. I'm also sceptical about Julia's ability to run
in the browser. Julia leans quite heavily on C++, and its (scientific-oriented) standard library
is pretty big, depending on several Fortran libraries, including code that embeds
machine instructions. This means that bringing Julia to the browser is not trivial and
likely results in large (i.e. slow to load) libraries. There are
<a href="https://github.com/JuliaLang/julia/issues/5155">discussions</a> in the
Github issue tracker though, about defining a smaller base library and/or
a "Julia lite".</p>
<p>In Python, many solutions are available for writing faster code, most notably
<a href="http://cython.org/">Cython</a> and <a href="http://numba.pydata.org/">Numba</a>. These projects
are amazing, allowing Python developers (including myself) to write code in a familiar style and
have it run fast too. However, in the greater scheme of things, these
feel like patches to overcome a fundamental limitation of Python.</p>
<p>I'm also <a href="http://flexx.readthedocs.io/">trying</a>
to make it easier to write code that targets the web in Python. And although we've
been building great things with this approach, it’s far from perfect.</p>
<hr>
<h2>Enter Web Assembly</h2>
<p>Lately, I've been <a href="https://github.com/almarklein/pywasm">looking into</a> Web Assembly (WASM).</p>
<p><a href="http://webassembly.org">WASM</a> is a new open standard developed by representatives from all major browsers. It is a low level binary format designed to be compact and run at native speed, while being memory-safe. WASM is primarily intended to run code in browsers, but will also run in <a href="http://webassembly.org/docs/non-web/">other environments</a> like desktop, mobile and more.</p>
<p>WASM's features make it an attractive intermediate
representation (or "platform"), also beyond the browser. For instance,
it's designed to be inherently "safe" (code being run is guaranteed to
not do things that it’s not supposed to), which is also great for mobile
apps. There are already solutions for running WASM on desktop, and I'd
not be surprised if someday there'll be a microprocessor that consumes
WASM directly.</p>
<h2>How WASM will help</h2>
<p>WASM can help relieve the three language problem in various ways. For instance, Python libraries can use it as a tool to make certain code run fast, similar to how Numba uses LLVM. I’m not sure how WASM compares to LLVM for this use-case, but I can imagine that WASM might be easier to target.</p>
<p>Similarly, certain Python tools could produce WASM to run specific code on the web. Python interpreters intended for the browser (like <a href="http://www.skulpt.org/">Skulpt</a> and <a href="https://www.brython.info/">Brython</a>) could make use of WASM to become (much) faster and more compact.</p>
<p>And, of course, there is what WASM is primarily intended for: compiling C/C++/Rust programs to a format that can run in the browser. A similar approach can be used for Julia, although (as mentioned above) this is not trivial.</p>
<h2>Dreaming out loud</h2>
<p>The approaches mentioned above will provide many opportunities, but what if WASM is targeted more directly?
People have been dreaming of "one language to rule them all". Some say
<a href="https://www.wired.com/2014/02/julia/">Julia is it</a>. Others say
<a href="https://medium.com/@teerasej/javascript-one-programming-language-to-rule-them-all-5d10079068da">JavaScript</a>, a thought that gives me the shivers.</p>
<p>I don't believe that there will ever be a language that is <em>the best</em> in
all fields.
But I can't help thinking ... what if there was a language that is designed to be easy to use,
dynamic and fast like Julia, <em>and</em> run anywhere?
Such a language might be <em>great</em> for almost every field. And in my view, the best
chance of such a language right now, is one that directly targets WASM.</p>
<h2>Disclaimer</h2>
<p>Don't take any of the above as critique on any of the mentioned
projects. This post is written from a high-level (idealistic, perhaps
a wee bit naive) perspective. I know that creating a new language is
much more than inventing syntax and writing a compiler; communities and
package ecosystems take many years to build. But a man can dream!</p>Write Python 3, while supporting Python 2.72016-02-12T00:00:00+01:002016-02-12T00:00:00+01:00Almartag:almarklein.org,2016-02-12:/legacy_python.html<p>In this post I discuss an approach for writing code in Python 3,
and still support Python 2.7. I've recently used this approach in one
of my own projects. Most projects should get by with only minor
modifications and an automatic translation step during package-build.
However, there are some pitfalls (bytes/str) that might need special
attention.</p>
<p>I've always been a proponent of Python 3, and I have been bashing Python 2
(a.k.a. legacy Python) at many opportunities. To be fair, Python 3 was
<a href="https://www.youtube.com/watch?v=sUm876SoUPM">announced</a> just a few months
after I started using Python, so I did not have much existing code that
was holding me back. In fact I started writing Python 3 wherever the
dependencies allowed it, and indeed <a href="http://pyzo.org">the Pyzo IDE</a>
was written in Python 3 (almost) from the start. For most other
projects, though, I have written code that runs on both Python 3 and
Python 2.</p>
<p><img alt="Silly walks graffiti" src="images/sillywalksgrafity.jpg">
<br /><small><i>
Image by southtyrolean (CC BY 2.0)
</i></small></p>
<h2>Supporting both Python 3 and Python 2</h2>
<p>Initially, most projects started supporting Python 3 by making use of the
<code>2to3</code> tool, which translates the code (to make it compatible with
Python 3) during the installation. In other words, developers were
still writing in Python 2, but put some effort to support "early
adopters" of Python 3.</p>
<p>An approach that is increasingly popular is to write code that runs on both
Python 2 and 3 without the need for a translation step. Tools to help with this
are (amongst others) <a href="https://pypi.python.org/pypi/six">six.py</a> and
<a href="http://python-future.org/">python-future</a>.</p>
<p>I have been following this approach for several years. It's not even
that hard if you know what to watch out for. However, I am increasingly
annoyed by this approach, and I feel a great sense of satisfaction when
I write code for Python 3 only, when I can just write <code>super()</code> and
don't have to worry about <code>isinstance(x, basestring)</code>, etc.</p>
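<p>To make this concrete, here is a hedged sketch of the same class written in the 2+3-compatible style versus in Python 3 only (the <code>string_types</code> alias is a stand-in for what helpers like <code>six.py</code> provide):</p>

```python
import sys

# Compatibility alias for string checks: only the chosen branch of the
# conditional expression is evaluated, so this is safe on Python 3 too.
string_types = str if sys.version_info[0] >= 3 else (str, unicode)  # noqa

# Python 2/3 compatible style: explicit base class, explicit super() args.
class Reader(object):
    def __init__(self, name):
        super(Reader, self).__init__()
        assert isinstance(name, string_types)
        self.name = name

# Python 3 only: the same class, without the compatibility noise.
class Reader3:
    def __init__(self, name):
        super().__init__()
        assert isinstance(name, str)
        self.name = name
```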
<p>Therefore, I think we should move to an approach of writing Python 3,
and support legacy Python (i.e. Python 2.7) where needed by making use
of an automated translation step at build time. It does not seem like
there are currently many projects that follow this approach, but my
guess is that we will be seeing this more, because Python 2 is going to
be deprecated by 2020.</p>
<h2>Why support Python 2</h2>
<p>Python 3 is obviously the future, and as a community we should try to
get rid of Python 2. By supporting legacy Python, we are holding back
progress, because we cannot always make use of certain features (e.g.
<code>super()</code> or the matrix multiplication operator).</p>
<p>Making new (and useful) packages available only for Python 3 can be an
effective incentive for users to move to Python 3, thereby speeding
up the transition of the scientific Python community.</p>
<p>At the same time, however, not supporting legacy Python can hold back the
adoption of a project, and there may be social reasons, e.g. an important
client still relies on Python 2. I don't think I would care to work on
supporting legacy Python in my unpaid time, though...</p>
<h2>My experience</h2>
<p>I have been writing the <a href="http://flexx.readthedocs.org">Flexx project</a>
in Python 3 from the start. The project contains a few quite interesting
pieces, such as PyScript, a Python to JavaScript transpiler. We have
been adopting this transpiler in the <a href="http://bokeh.pydata.org">Bokeh project</a>
to allow users to write client-side callbacks in Python. The initial
approach (that I've been advocating) was to only support this functionality
on Python 3, and look at Python 2 in case there was demand. As Bryan
Van de Ven anticipated, one of the first user questions was when this
functionality would be available in Python 2.</p>
<p>We agreed to make Flexx available in Python 2.7, but I did not want to
sprinkle the code with references to <code>basestring</code> or snippets from
<code>six.py</code>, so I set out to look at the approach to translate the code
at build time.</p>
<p>Some googling did not reveal many projects following this
approach (yet), but I did find a library called <code>lib3to2</code>, which more
or less promised to do what I needed. I was not quite happy
with some of the translations though (especially the handling of <code>bytes</code>
and <code>str</code>), and I found it quite slow. While the <code>lib3to2</code> library
supports Python 2.6 and even 2.5, I am only interested in 2.7, which means
some things could be done much simpler.</p>
<h2>The translate_to_legacy module</h2>
<p>I decided to write my own
<a href="https://github.com/almarklein/translate_to_legacy">translator module</a>,
which uses a tokenizer instead of a full AST parser. This means higher
speed and much simpler code, at the cost of a less detailed view of the
source, but it proved sufficient for all the translations that I wanted
to do.</p>
<p>I decided to only target Python 2.7, which means that far fewer
translations are necessary. For instance, you can just keep <code>b'xx'</code>,
and there is a <code>bytes</code> class as an alias for <code>str</code>.</p>
<p>A brief overview of the applied translations:</p>
<ul>
<li>futures: at the top of each file <code>from __future__ import ...</code> was added, for
'print_function', 'absolute_import', 'with_statement',
'unicode_literals', and 'division'. The 'unicode_literals' means that
all string literals are unicode, so no need to add a "u" in front
of any strings.</li>
<li>cancel: if the code already contains <code>from __future__ import ...</code> with
any of the above names, it is assumed that the code is already made to
be compatible for Python 2 and 3. No translation is performed.</li>
<li>newstyle: classes that do not inherit from a base class, are made to
inherit from <code>object</code>.</li>
<li>super: use of <code>super()</code> is translated to <code>super(Cls, self)</code>.</li>
<li>unicode: translate <code>chr()</code> and <code>str()</code> to their unicode equivalents, and
<code>isinstance(x, str)</code> is translated to <code>isinstance(x, basestring)</code>.</li>
<li>range: translated to <code>xrange()</code>.</li>
<li>encode: <code>.encode()</code> and <code>.decode()</code> are translated to
<code>.encode('utf-8')</code> and <code>.decode('utf-8')</code>.</li>
<li>getcwd: <code>os.getcwd()</code> is translated to <code>os.getcwdu()</code>.</li>
<li>imports: simple import translations.</li>
<li>imports2: advanced import translation (e.g. <code>urllib.request.urlopen</code> -&gt;
<code>urllib2.urlopen</code>). In contrast to <code>lib3to2</code>, we only translate imports,
not the use of imported variable names.</li>
</ul>
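<p>As a rough illustration of what such a translation step can look like, here is a toy regex version of two of the rules above (this is <em>not</em> the actual <code>translate_to_legacy</code> code, which works on a token stream):</p>

```python
import re

FUTURES = ("from __future__ import print_function, absolute_import, "
           "with_statement, unicode_literals, division\n")

def translate(source):
    """Toy Python-3-to-2.7 translation: add the future imports and apply
    a couple of simple substitutions."""
    if "from __future__ import" in source:
        # Assume the code is already 2/3 compatible; do nothing (the
        # 'cancel' rule above).
        return source
    source = re.sub(r"\brange\(", "xrange(", source)
    source = re.sub(r"isinstance\((\w+), str\)",
                    r"isinstance(\1, basestring)", source)
    return FUTURES + source

print(translate("for i in range(3):\n    print(i)\n"))
```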
<p>With these translations, and only minor tweaks to the code, things were
looking good.</p>
<h2>The loose ends</h2>
<p>There are however, a few situations that need special attention. Most
notably places that use <code>isinstance(x, bytes)</code>. I used this in one
spot to determine whether a filename or actual data was provided. But
on legacy Python a <code>str</code> is the same as <code>bytes</code>, so it would work
incorrectly. In this case I wrote a little function to check whether
the given input looked like a filename or not (e.g. no zero bytes and
no newlines). It's not 100% bulletproof, but good enough. (On Python 3
it obviously still just checks whether the input is <code>bytes</code>.)</p>
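<p>A heuristic along these lines (a hypothetical reconstruction, not the exact code from the project) might look like:</p>

```python
def looks_like_filename(data):
    """Guess whether a bytes object is a filename rather than raw data.
    Not bulletproof: short raw data without zero bytes or newlines will
    be misclassified."""
    if not isinstance(data, bytes):
        return False
    if b"\x00" in data or b"\n" in data:
        return False  # real file contents often contain these
    return len(data) < 256  # filenames are usually short

assert looks_like_filename(b"/home/user/data.bin")
assert not looks_like_filename(b"PK\x03\x04\x00\x00")
```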
<p>Another situation that needs special attention is the creation of
metaclasses, e.g. <code>type('MyName', bases, {})</code>. On Python 2, the name cannot
be unicode. Together with the 'unicode_literals', this causes problems.
It's easily resolved, but it takes a few extra lines of code.</p>
<p>Likewise, setting environment variables must be done with <code>str</code> objects, and
not <code>unicode</code> in Python 2. </p>
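<p>Both loose ends boil down to the same fix: coerce the value to the native <code>str</code> type. A sketch (on Python 2, <code>str()</code> on an ASCII literal yields the byte string that <code>type()</code> and <code>os.environ</code> expect; on Python 3 it is a no-op; <code>make_class</code> is a hypothetical helper name):</p>

```python
import os

def make_class(name, bases=(object,), namespace=None):
    # type() on Python 2 rejects a unicode class name, so coerce first.
    return type(str(name), bases, namespace or {})

Cls = make_class('MyName')
assert Cls.__name__ == 'MyName'

# The same trick for environment variables, which must be native str
# objects on Python 2 when 'unicode_literals' is in effect.
os.environ[str('MY_SETTING')] = str('1')
```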
<h2>Using this approach in your projects</h2>
<p>To adopt this approach in another project you need to:</p>
<ul>
<li>add <a href="https://github.com/almarklein/translate_to_legacy/blob/master/translate_to_legacy.py">one module</a>
to the root of your project.</li>
<li>in <code>setup.py</code>
<a href="https://github.com/zoofIO/flexx/blob/v0.3/setup.py#L54-L69">invoke the translation</a>
at build time.</li>
<li>in the root <code>__init__.py</code> add
<a href="https://github.com/zoofIO/flexx/blob/v0.3/flexx/__init__.py#L32-L33">two lines</a>
to make legacy Python use the translated code.</li>
<li>optionally add more translations by subclassing <code>LegacyPythonTranslator</code>.</li>
</ul>
<p>You might need to make a few small modification to make the translations
work correctly, or resolve tricky situations like <code>isinstance(x, bytes)</code>.</p>
<p>Note that the code is still a single-source distribution, so if your code
is pure Python, you can e.g. still build universal wheels, or noarch
conda packages.</p>
<h2>Summary</h2>
<p>If you want to write in Python 3, but still support legacy Python, using
a translation step at build time seems like a viable solution.
If you limit legacy support to Python 2.7, the number of required
translations is certainly feasible, though you may have to make small
adjustments to your code, e.g. to make any import translations work
appropriately. Further, be aware where <code>bytes</code> are used explicitly.</p>
<p>Other than that, you can just write in Python 3! Well, mostly anyway,
some features like function annotations, <code>async</code> and matrix multiplication
cannot be translated. But no worries, by 2020 we can forget about Python 2
once and for all!</p>We need more visualization libs - and a protocol to bind them2015-10-20T00:00:00+02:002015-10-20T00:00:00+02:00Almartag:almarklein.org,2015-10-20:/future_vis2.html<p>We have a rich ecosystem of visualization libraries, each with their
own API. By splitting our libraries in a user-facing part and a
rendering backend, and defining a standard to allow all these to
connect, we can have a rich visualization ecosystem while users only
have to learn one API.</p>
<p><img alt="Escher's study of reptiles" src="images/escher_reptiles.jpg">
<br /><small><i>
Image by Escher via Wikimedia (fair use)
</i></small></p>
<hr>
<h2>The problem</h2>
<p>As I mentioned in a <a href="http://www.almarklein.org/future_vis.html">previous post</a>,
we seem to be getting different visualization libraries which are each
good at a particular task. I mentioned Bokeh and Vispy in particular:
the former does very well in the browser, but it can't do 3D. The latter
has awesome performance by making use of the GPU, but it does not work
(well) in the browser. And neither can export vector images like
Matplotlib does. The fact that we get more and more cool visualization
libraries is awesome, but it's not making it easier for our users, who
are usually researchers just trying to get things done.</p>
<p>Imagine a scientist who works with 3D data, who makes some plots
that should be shared online, which also need to export to EPS for a
paper. This person would now need to learn three different visualization
libraries.</p>
<p><em>The main problem is that users need to learn different APIs to perform
different visualization tasks.</em></p>
<h2>Building a monolithic library that is good at everything is <em>not</em> a good idea</h2>
<p>The solution I proposed earlier was basically to discuss building a new
library that has it all. After thinking this over, I think that's a
terrible idea:</p>
<ul>
<li>It will be a tremendous effort, if only to get the right people together.</li>
<li>Building something that can do what Vispy can do, and at the same
time render in a browser like Bokeh can, would be challenging, and
would result in a complex design.</li>
<li>It's impossible to include all potential features. Even if we could
create something that has all the features people need today, tomorrow
someone will need something else, and the design might not be flexible
enough to add such a feature.</li>
</ul>
<p>I think that there is a better approach to fix the problem. One
that is less crazy, and allows for a much easier transition. </p>
<h2>We need a split</h2>
<p>If we separate the user API and the rendering part of a visualization
system, and define a protocol for the communication between the two,
it should be possible to connect different rendering systems to the
same front-end (i.e. user API); the user only needs to learn how to work
with one library, and can switch between rendering backends depending
on the needs (e.g. rendering to SVG, rendering something 3D, or
rendering in a browser).</p>
<p><img width=450 src='../images/vis_split.svg' /></p>
<p>By defining a formal protocol and separating user-targeted front-ends
from rendering back-ends each part can focus more on one particular
task. Front-ends focus on allowing the user to effectively spell out
a visualization, while backends focus on rendering a visualization in a
particular way, supporting a particular set of rendering primitives
(e.g. Vispy supports volumes, Matplotlib supports pie charts).</p>
<p>I suspect that the number of front-ends will first expand and then
settle on a few common libraries. For the rendering backends, I think
we would see a much broader array of options, including tools to support
very specific visualizations.</p>
<h2>State of the art</h2>
<p>This idea is not new. <a href="https://github.com/ellisonbg/altair">Altair</a> for
instance, proposes something very similar. Also Bokeh is somewhat
organized in these two parts, except that the protocol is not
standardized.</p>
<p>However, the above projects are very much aimed at visualizing the
relation between 2 or more 1D data structures (i.e. plotting and
charts). The visualization of 3D data, let alone fancy specific data
structures, seems an afterthought.</p>
<p>Further, these approaches are based on the serialization of a scene
into one static representation (e.g. a JSON structure). What about
interaction? What about <em>changes</em>? Bokeh does interaction with the
Python process via the backbone model and AJAX to update data, but can
we not make this part of the protocol?</p>
<h2>Proposal</h2>
<p>Instead of defining a static data structure, I think we should formalize
a protocol to represent <em>changes</em>.</p>
<p>Here's a stab at what such a protocol might look like: consider a simple
core set of commands to create a tree of objects with attributes. Each
command consists of a message written as a "tuple". This tuple can exist
as an object, or be serialized so it can be sent over a socket or saved
to disk. There are just three commands to manipulate objects:</p>
<ul>
<li><code>('create', type, object_id)</code></li>
<li><code>('delete', object_id)</code></li>
<li><code>('set', object_id, attr_name, value)</code></li>
</ul>
<p>The code that produces these commands should manage the object ids of
the objects. For data we need something similar, allowing setting and
(partial) updating of data:</p>
<ul>
<li><code>('data_create', data_id, shape, dtype)</code></li>
<li><code>('data_delete', data_id)</code></li>
<li><code>('data_set', data_id, offset, bytes)</code></li>
</ul>
<p>The renderer can also communicate back (e.g. when the value of a slider has changed):</p>
<ul>
<li><code>('set', object_id, attr_name, value)</code></li>
</ul>
<p>These commands form a foundation, on top of which we can define a
standard set of "types" and their "attributes". E.g. we can define that the
"Line" type shows a line, and that it has attributes "width" and
"color". The formulation of the standard would be an ongoing process and should
probably involve a group of people with representatives from all major
visualization libraries.</p>
<p>Naturally, each rendering library can also specify its own types, e.g. Vispy
would have objects with attributes to allow custom shading. Similarly,
rendering libraries do not necessarily have to implement every part of the
standard (e.g. Vispy may not support pie charts).</p>
<h2>An example</h2>
<p>Here is an example of how this protocol could look for a simple plot with
a red line:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="s1">&#39;create&#39;</span><span class="p">,</span> <span class="s1">&#39;Plot&#39;</span><span class="p">,</span> <span class="s1">&#39;plot1&#39;</span><span class="p">)</span>
<span class="p">(</span><span class="s1">&#39;create&#39;</span><span class="p">,</span> <span class="s1">&#39;Line&#39;</span><span class="p">,</span> <span class="s1">&#39;line1&#39;</span><span class="p">)</span>
<span class="p">(</span><span class="s1">&#39;set&#39;</span><span class="p">,</span> <span class="s1">&#39;line1&#39;</span><span class="p">,</span> <span class="s1">&#39;parent&#39;</span><span class="p">,</span> <span class="s1">&#39;plot1&#39;</span><span class="p">)</span> <span class="c1"># Make the line a child of the plot</span>
<span class="p">(</span><span class="s1">&#39;data_create&#39;</span><span class="p">,</span> <span class="s1">&#39;data1&#39;</span><span class="p">,</span> <span class="p">(</span><span class="mi">100</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="p">(</span><span class="s1">&#39;data_set&#39;</span><span class="p">,</span> <span class="s1">&#39;data1&#39;</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="sa">b</span><span class="s1">&#39;000as...&#39;</span><span class="p">)</span>
<span class="p">(</span><span class="s1">&#39;data_create&#39;</span><span class="p">,</span> <span class="s1">&#39;data2&#39;</span><span class="p">,</span> <span class="p">(</span><span class="mi">100</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="s1">&#39;float32&#39;</span><span class="p">)</span>
<span class="p">(</span><span class="s1">&#39;data_set&#39;</span><span class="p">,</span> <span class="s1">&#39;data2&#39;</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="sa">b</span><span class="s1">&#39;f7a41...&#39;</span><span class="p">)</span>
<span class="p">(</span><span class="s1">&#39;set&#39;</span><span class="p">,</span> <span class="s1">&#39;line1&#39;</span><span class="p">,</span> <span class="s1">&#39;x&#39;</span><span class="p">,</span> <span class="s1">&#39;data1&#39;</span><span class="p">)</span> <span class="c1"># Set x to data with id &#39;data1&#39;</span>
<span class="p">(</span><span class="s1">&#39;set&#39;</span><span class="p">,</span> <span class="s1">&#39;line1&#39;</span><span class="p">,</span> <span class="s1">&#39;y&#39;</span><span class="p">,</span> <span class="s1">&#39;data2&#39;</span><span class="p">)</span>
<span class="p">(</span><span class="s1">&#39;set&#39;</span><span class="p">,</span> <span class="s1">&#39;line1&#39;</span><span class="p">,</span> <span class="s1">&#39;color&#39;</span><span class="p">,</span> <span class="s1">&#39;#ff0000&#39;</span><span class="p">)</span> <span class="c1"># make the line red</span>
</pre></div>
<p>This is not as readable as a JSON structure would be, but the point is
that changes can be handled by just adding commands. Also, it's trivial
to transform a series of commands to a JSON structure, or the other way
around.</p>
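<p>As a sketch, a minimal consumer of such a command stream could maintain the scene as plain dictionaries (hypothetical code, just to show that the three object commands suffice; the data commands are omitted for brevity):</p>

```python
def apply_commands(commands):
    """Build a scene (a dict mapping object ids to attribute dicts) from
    ('create' / 'delete' / 'set') command tuples."""
    scene = {}
    for cmd in commands:
        if cmd[0] == 'create':
            _, type_name, object_id = cmd
            scene[object_id] = {'type': type_name}
        elif cmd[0] == 'delete':
            scene.pop(cmd[1], None)
        elif cmd[0] == 'set':
            _, object_id, attr_name, value = cmd
            scene[object_id][attr_name] = value
    return scene

scene = apply_commands([
    ('create', 'Plot', 'plot1'),
    ('create', 'Line', 'line1'),
    ('set', 'line1', 'parent', 'plot1'),
    ('set', 'line1', 'color', '#ff0000'),
])
assert scene['line1']['color'] == '#ff0000'
```

<p>A renderer would react to these commands incrementally, which is exactly what makes the change-based protocol suited for interaction.</p>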
<h2>Further thoughts</h2>
<p>Having the proposed protocol in place would mean that Bokeh and Vispy
can keep doing what they're good at, and let exporting to vector
graphics be handled by Matplotlib. New user-facing libraries like
<a href="http://ioam.github.io/holoviews/">Holoviews</a> would just work for all
available rendering engines. And new rendering engines like what
<a href="http://cyrille.rossant.net/compiler-data-visualization/">Cyrille Rossant proposes</a>
should just work without the user having to change the code. Also, I would
love to make a simple volume renderer in WebGL, which would allow
existing code for displaying a volume to be exported to static HTML.</p>
<p>Maybe we could eventually have some sort of auto-selection for the
backend, such that visualizations get shown in the backend that is most
capable of handling what the user spelled out. If that works well,
people could make dedicated renderers for specific visualizations, which
would Just Work for any user.</p>
<h2>Summary</h2>
<p>Here's what I think we need to do to make Python's visualization ecosystem
more friendly and powerful:</p>
<ul>
<li>We should define a standard to describe visualizations.</li>
<li>We should split up our current visualization libraries in parts that
are either user-facing or rendering backend.</li>
<li>The protocol should describe changes, rather than a static scene.</li>
</ul>
<p>In this way, a user can learn just one API and still "have it all",
by using a different rendering backend for different needs.</p>The future of visualization in Python - are we going where we want to be?2015-08-02T00:00:00+02:002015-08-02T00:00:00+02:00Almartag:almarklein.org,2015-08-02:/future_vis.html<p>Bokeh and VisPy are both awesome projects. However, I wonder
whether we need to change where things are currently going. While Bokeh
is great at 2D and the browser, 3D is not supported. While Vispy is
super-fast and good at 3D and custom visualizations, its support for
the browser is poor. I don't want to tell scientists that they need two
or three visualization libraries. I want it all in one library.</p>
<p>Bokeh and VisPy are both awesome projects. However, I wonder
whether we need to change where things are currently going. While Bokeh
is great at 2D and the browser, 3D is not supported. While Vispy is
super-fast and good at 3D and custom visualizations, its support for
the browser is poor. I don't want to tell scientists that they need two
or three visualization libraries. I want it all in one library.</p>
<p><img alt="Escher's hand with reflective sphere" src="images/escher_sphere.jpg">
<br /><small><i>
Image by Escher via Wikimedia (fair use)
</i></small></p>
<hr>
<h2>Pride and discomfort</h2>
<p>A short while ago, I was watching the talks from SciPy 2015 about
<a href="https://www.youtube.com/watch?v=c9CgHHz_iYk">Bokeh</a> by Bryan Van de Ven
and <a href="https://www.youtube.com/watch?v=_3YoaeoiIFI">VisPy</a> by Luke Campagnola.
Being involved in both these projects, I felt very proud at seeing how
both these projects are progressing. Both projects are awesome in their own way
and really change the way that scientists can do visualization in Python.</p>
<p>At the same time, however, I felt a certain sense of discomfort. At first
I was not sure why this was, but it was in part triggered by a question
from the audience at the end of the VisPy talk about whether the VisPy
and Bokeh projects can be combined in some way.</p>
<p>I don't think they can.</p>
<p>Bokeh is very much aimed at 2D plotting and does all its rendering in
JavaScript. VisPy has strong support to allow 3D visualization, but does not
have good browser support. Combining the two projects would essentially
mean a rewrite.</p>
<p>And this is the thing that bothers me; both projects offer something
great, but both also lack a fundamental feature that is essential to
becoming <em>the</em> visualization library.</p>
<h2>Bokeh</h2>
<p>Bokeh was written from the start to render in the browser. The drawing
system is implemented in CoffeeScript (which is compiled to JavaScript),
which means that interactions with the plot can work without a Python
(server) process.</p>
<p>I think this is essential for a visualization library of the future.
Scientists want to share their results using interactive visualizations,
or even small (dashboard) apps. HTML is the obvious medium to achieve
this.</p>
<p>Bokeh uses the 2D canvas for its drawing, which is pretty fast, but not
nearly as fast as WebGL. WebGL is currently being added to Bokeh,
but I'm not sure if we can reach the performance that WebGL might
ideally provide, because Bokeh was not designed to target WebGL from
the start.</p>
<p>There have also been discussions (triggered by user requests) about
adding support for 3D. I have mixed feelings about this, because 3D is very
much an afterthought. Sure, we could add support for a few specific
plot types, but users are going to ask for more. And 3D is always going
to suck, as it does in Matplotlib, because 3D was not a primary target
to start with. For proper 3D support, you need a good scene graph and
first class support for lighting, cameras, etc. You can build a good
2D plotting API on top of a 3D visualization system (if taken into
account from the start), but not the other way around.</p>
<h2>VisPy</h2>
<p>VisPy was written from the start to be a very fast and flexible
visualization system, to support 2D, 3D and any special visualization
that a user could think of.</p>
<p>I think this is essential for a visualization library of the future.
Scientists sometimes have weird data that needs to be visualized in
special ways, and 3D is nowadays not a rarity.</p>
<p>VisPy uses OpenGL for its drawing. We did target OpenGL ES 2.0 from the
start, to allow rendering in WebGL and on mobile devices, so we did
have the web in mind to some extent.
However, the browser support of VisPy is implemented at a low level and
goes more or less like this: the Python process sends OpenGL commands
(via a custom format) to the browser to make the visualization appear.
User input (mouse, keyboard) is captured and sent to the Python process,
where VisPy's event system takes care of translating and zooming
cameras, etc., which causes an update, and thus new commands being sent to
the browser. This all works pretty well, except that the visualization
relies on the Python process. As a consequence, VisPy in the browser is not
as snappy as it should be, and it won't work in an exported HTML
document.</p>
<p>Cyrille Rossant has thought of some ideas on how to incorporate e.g.
camera models in JavaScript, but these solutions are complex and do not
scale well to user-defined interactions.</p>
<h2>I want it all</h2>
<p>Don't get me wrong: I think both Bokeh and VisPy are awesome. However,
the way things are going now, it looks like we'll be having one library
that's good at interactive 2D plotting via HTML, but sucks at 3D, and
one library that's good at 2D/3D/other visualizations, but sucks on the
web.</p>
<p>And this worries me.</p>
<p>Further, although both Bokeh and VisPy have plans to support svg/eps
export, for the time being Matplotlib is the way to go for publication
quality static images.</p>
<p>Imagine a scientist asking which visualization library she should use.
<em>"It depends ..."</em> What if she has 3D data, she wants to share
visualizations with her peers, and wants publication quality figures?
These are not unreasonable or rare requests, and we should be able to
answer: <em>"Sure, just use this one library!"</em></p>
<p>Here is my wish list for the utopian visualization library:</p>
<ul>
<li>Great at 2D, 3D and flexible enough for custom visualizations.</li>
<li>Leverages the GPU to allow large datasets and still render in real time.</li>
<li>Targets (or can target) the browser; visualizations in static HTML
should be fully functional and interactive.</li>
<li>Also works great on the desktop (though this can be achieved via
<a href="http://flexx.readthedocs.org/en/latest/webruntime/index.html">Xul/Electron/NW.js</a>).</li>
<li>Supports export to eps and svg.</li>
</ul>
<p>(Let me know if you think more points should be added.)</p>
<p><em>EDIT: In my <a href="http://www.almarklein.org/future_vis2.html">next post</a> I
explain that this is actually a bad idea, and that there are other ways
to achieve an easier workflow for our users.</em></p>
<h2>Can we have it all?</h2>
<p>I'm certain that we can have it all, but I won't pretend it's easy.
To realize such a system, all requirements need to be incorporated from
the start. Modifying either Bokeh or VisPy to support these will
essentially be a rewrite.</p>
<p>One requirement would be that the complete drawing system runs in
JavaScript. That does not sound like a fun implementation task, since
the scene graph, transformations, and camera/interaction models are
complex enough as it is.
Maybe we can use
<a href="http://flexx.readthedocs.org/en/latest/pyscript/">PyScript</a> to write
such a system in JavaScript using a Python syntax. That way we should
be able to reuse some of the code from VisPy.</p>
<p>A compromise from the point of view of Bokeh is that WebGL is not as
widely supported as the 2D canvas, although this is getting much better.</p>
<p>A compromise from the point of view of VisPy is that WebGL is slightly
slower than desktop GL and lacks certain features that are currently
supported in VisPy, such as integration with OpenCL. In
theory, the code could be written in a way that allows running it both
as Python and PyScript, which might solve these issues.</p>
<p>Finally, implementing such a system is a major undertaking. Are there
enough developers on the current projects that are interested in this?
Are there financial resources to get this off the ground? Should we
even try to create a single holistic visualization library?</p>
<p>I don't know. But I think we should at least discuss this.</p>
<h2>Side notes</h2>
<p>Khronos (the group responsible for the OpenGL standard) is working on a
new API to target the GPU: Vulkan. This API is lower level and simpler,
and intended for both visualization and compute. Cyrille Rossant wrote a
<a href="http://cyrille.rossant.net/compiler-data-visualization/">blog post</a> about
targeting it in VisPy. It is not clear yet whether Vulkan will be
available in the browser. If it is, I think we should probably use
that instead of WebGL.</p>
<h2>Final words</h2>
<p>I know that what I am proposing is a bit insane. Yet, if we do not take
some form of action, we'll be moving towards a future that is different
from what we need to serve scientists with a proper unified
visualization solution.</p>Performance gain for combining multiple GL objects into a single buffer2015-06-26T00:00:00+02:002015-06-26T00:00:00+02:00Almartag:almarklein.org,2015-06-26:/gl_collections.html<p>Rendering a set of 100,000 vertices with OpenGL is very fast. However,
rendering 100 sets of 1000 vertices is significantly slower (even though
the total number of vertices is the same). Therefore, in visualization
libraries, collecting multiple objects in a single buffer can help
increase performance. In this post I try to get a grip on how much this
really matters. Result: it depends.</p>
<p>Rendering a set of 100,000 vertices with OpenGL is very fast. However,
rendering 100 sets of 1000 vertices is significantly slower (even though
the total number of vertices is the same). Therefore, in visualization
libraries, collecting multiple objects in a single buffer can help
increase performance. In this post I try to get a grip on how much this
really matters. Result: it depends.</p>
<p><img alt="Escher's moebius band of birds" src="images/escher_birds.jpg">
<br /><small><i>
Image by Omega via Flickr (CC BY-SA 2.0)
</i></small></p>
<hr>
<h2>Introduction</h2>
<p>OpenGL (and WebGL for the browser) is the obvious choice for making fast
visualizations. It is capable of drawing tens of millions of points
while maintaining a framerate that is sufficient for interactive use.
Even without a beefy GPU, Intel integrated graphics are nowadays
sufficiently powerful to e.g. render volumes interactively.</p>
<p>In a visualization system there are sometimes many (small) objects
that need to be visualized. The program iterates over these objects and
draws them one by one, which adds overhead and thus reduces performance.</p>
<p>This is why we are working on implementing a technique in
<a href="http://vispy.org">Vispy</a> that allows multiple objects to share buffers,
and be drawn using only one GL call. Nicolas Rougier, who has been
doing most of the work in this direction, calls this technique
"collections".</p>
<h2>Are collections worth the effort?</h2>
<p>Recently, I was implementing a WebGL-based plotting system in JavaScript
for the <a href="http://bokeh.pydata.org">Bokeh</a> project, and I (obviously)
wanted good performance, so I thought about collections. However,
implementing collections also adds complexity to the code, and this
caused me to wonder how much we really need it.</p>
<p>The overhead consists of (at least) three sources:</p>
<ul>
<li>There is a for loop in the code. The significance of this depends
on the programming language (and its implementation). It's pretty bad
for an interpreted language like Python. For most JavaScript
implementations it should matter much less.</li>
<li>There is an overhead in the OpenGL API calls. In Python the calls go
via ctypes. In the browser it's more direct, but on Windows they go
via the Angle library, which translates the calls to DirectX.</li>
<li>There is an overhead in the driver. Swapping the current GL program
takes time.</li>
</ul>
<p>To what extent each source degrades performance is not very clear. The
only way to find out, of course, is to try!</p>
<h2>Method</h2>
<p>I wrote two simple scripts (one in Python and one in JavaScript) that
draw M lines of N points, and measure the FPS. The Python
code relies on vispy.gloo, and the JS code on a variant of gloo
implemented in JS. The code can be obtained from its
<a href="https://github.com/almarklein/gl_collections_bench">github repo</a>.</p>
<p>The scripts were run multiple times, while the M parameter was varied
from 1 to 500, increasing the number of GL programs (thereby simulating
the number of objects being drawn) while keeping the total number of
vertices the same.</p>
<p>This was done on a number of platforms and browsers. Most experiments were
performed with a total of 100,000 points, and a smaller experiment
was done with 1 million points to verify that the trends hold up.</p>
<p>In a second experiment, the same measurement was performed, except that
for all M iterations, we used the same GL program, attaching the i-th buffer
to it right before drawing.</p>
<p>All experiments were run on a laptop with an Intel i7-4710HQ CPU (2.50GHz)
and a GeForce GTX 870M GPU.</p>
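<p>As a structural sketch of the benchmark loop (with a stub standing in for the real <code>vispy.gloo</code> draw call; the function names here are mine, not from the linked repo):</p>

```python
import time

def measure_fps(draw_frame, n_frames=100):
    # Time n_frames full redraws and return frames per second.
    t0 = time.perf_counter()
    for _ in range(n_frames):
        draw_frame()
    return n_frames / (time.perf_counter() - t0)

def make_scene(m, n, draw_line):
    # A scene of M lines of N points each: one draw call per object,
    # while the total vertex count (M * N) stays the same.
    def draw_frame():
        for i in range(m):
            draw_line(i, n)  # stand-in for program.draw('line_strip')
    return draw_frame
```

<p>Varying <code>m</code> while keeping <code>m * n</code> constant is exactly what isolates the per-object overhead from the per-vertex cost.</p>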
<h2>Results</h2>
<p><img width=400 src='https://raw.githubusercontent.com/almarklein/gl_collections_bench/master/benchmark_collections_result1.jpg' />
<img width=400 src='https://raw.githubusercontent.com/almarklein/gl_collections_bench/master/benchmark_collections_result1_.jpg' /></p>
<p>The illustration above clearly shows that Python suffers the most as the number of
objects is increased, the FPS dropping below 10 at about 180 objects
on Linux. Interestingly, on Windows things are much better (though the
curve is equally steep). This can probably be explained by the fact that
the GL drivers are better developed on Windows (as any gamer can tell you).</p>
<p>We can see that in the browser the framerate is capped at 60 FPS. This
is common practice for applications, but it is explicitly turned off in Vispy.</p>
<p>For Firefox, the framerate stays at a steady 60 FPS, even for Intel
graphics. Wait what? I am not sure how to explain that ... This looks a bit
like a vsync thing.</p>
<p>For Chrome, the downward trend is less steep than it
is for Python. This can be explained by the fact that the overhead of
iterating over the objects is smaller (because JavaScript is much
faster). What's left is mostly the overhead of the GPU driver itself.</p>
<p><img width=400 src='https://raw.githubusercontent.com/almarklein/gl_collections_bench/master/benchmark_collections_result2.jpg' />
<img width=400 src='https://raw.githubusercontent.com/almarklein/gl_collections_bench/master/benchmark_collections_result2_.jpg' /></p>
<p>This illustration shows the same trends when we render more vertices. However,
they are weaker: the cost of drawing one object is now higher,
making the relative cost of the overhead smaller. In other words,
for more complex visualizations, the overhead of rendering multiple
objects is smaller.</p>
<p>We also see that Firefox has a small dip at first, but then
remains steady at 30 FPS. Not sure what this means, but I think we should
ignore the Firefox measurements.</p>
<p><img width=400 src='https://raw.githubusercontent.com/almarklein/gl_collections_bench/master/benchmark_collections_result3.jpg' />
<img width=400 src='https://raw.githubusercontent.com/almarklein/gl_collections_bench/master/benchmark_collections_result3_.jpg' /></p>
<p>This illustration shows that by using multiple buffers against the same
GL program, the performance can be increased, though not by a lot.</p>
<h2>Conclusions</h2>
<p>It is clear that drawing M objects of N points is more costly
than drawing MxN points in one go. The effects are larger in Python
than they are in JavaScript, which can be explained by the fact that
JS is faster.</p>
<p>The relative importance of the overhead also depends on the complexity
of the visualization.</p>
<p>Finally, in this experiment we did not take into account the costs for
making collections happen: collections require code to pack multiple
objects together. Typically, right before each draw, some code will be
run for each object, for example to prepare the collection, or to ensure
that it is up to date. How high this cost really is depends on how
efficiently the system is able to integrate the idea of collections.
And of course, implementing collections adds a maintenance burden.</p>
<p>From this we can state that collections are probably worth the effort
when implementing fast visualization of multiple simple objects in
Python. When implementing complex visualizations in JavaScript, this
is much less the case.</p>
<p>A marginal performance gain can be achieved by reusing the GL program
object, which will usually be easier to implement than sharing both
program and buffers.</p>
<p>Finally, these benchmarks were performed on just one machine, and different
results might be found on different hardware. Nevertheless, I believe
the trends should hold up.</p>Comparing methods for box-layout in HTML2015-04-02T00:00:00+02:002015-04-02T00:00:00+02:00Almartag:almarklein.org,2015-04-02:/html_boxlayout.html<p>This post describes a small experiment that compares a few methods for
doing a box-layout in HTML. On a variety of browsers the result was
validated, and performance measured. The results show that the CSS
<code>display: flex</code> method is the way to go.</p>
<p>This post describes a small experiment that compares a few methods for
doing a box-layout in HTML. On a variety of browsers the result was
validated, and performance measured. The results show that the CSS
<code>display: flex</code> method is the way to go.</p>
<p><img alt="Escher's impossible cube" src="images/impossible_cube.png">
<br /><small><i>
Image from Wikimedia (CC BY-SA 3.0)
</i></small></p>
<hr>
<h2>About box-layout</h2>
<p>The box-layout model is a concept in GUI design that allows one to stack
multiple elements either horizontally (HBox) or vertically (VBox). Each
element can have a flex value (a.k.a. stretch factor) of 0 or more. A
flex of 0 means that the element should assume its "natural size". A
higher value indicates that it can assume a larger size; remaining space
is distributed among the elements by ratio of the flex values.</p>
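<p>The distribution rule above can be sketched in a few lines; this is a simplification that ignores natural size as a minimum for flexed elements, and the function name is mine, not from any spec:</p>

```python
def box_layout(total, natural, flexes):
    # Elements with flex 0 keep their natural size; the space that is
    # left over is divided among the rest in proportion to their flex.
    fixed = sum(n for n, f in zip(natural, flexes) if f == 0)
    remaining = max(total - fixed, 0)
    total_flex = sum(f for f in flexes if f > 0)
    return [n if f == 0 else remaining * f / total_flex
            for n, f in zip(natural, flexes)]
```

<p>For example, in a 100-pixel HBox holding a flex-0 element of natural size 20 plus two elements with flexes 1 and 3, the remaining 80 pixels split 20/60.</p>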
<p>This model is a very common tool to layout widgets in an application
(e.g. Qt's <a href="http://doc.qt.io/qt-4.8/qhboxlayout.html">QHBoxLayout</a>).
However, HTML does not have a trivial way to achieve such a layout;
there are multiple possible solutions. This post tries to explore which
one is best.</p>
<h2>Natural size</h2>
<p>As mentioned above, each element will scale to its natural size or
larger. The natural size is determined by the content of the element.
E.g. for a button, it is the text on the button. But it also depends
on any children of the element (which could represent another box
layout). Solving this is complex.</p>
<p>A particularly problematic issue is that <em>in JavaScript there is no way
to measure the natural size of the HTML elements</em>. Internally, the
browser has information on the natural size of all elements, but it's
impossible to access this information from JavaScript. Therefore, anyone
who wants to take natural size into account needs to rely on "native"
HTML+CSS methods to achieve layout.</p>
<h2>The methods</h2>
<p>Below is a description of the three methods that were tested. For each
method two examples are provided: one simple version in the form of one
hbox with 3 elements, and one more complex version that consists of an
hbox with two vboxes that each have 4 hboxes with 3 elements. The latter
is thus an example of deep nesting as one might find in more complex
user interfaces.</p>
<h3>HTML Table</h3>
<p>In the old days, a table was used for many layout tasks, because it
was all there was. This method essentially comes down to putting the
elements inside a table like so:</p>
<div class="highlight"><pre><span></span><span class="c">&lt;!-- HBox implementation using a table element --&gt;</span>
<span class="nt">&lt;table&gt;</span> <span class="nt">&lt;tr&gt;</span>
<span class="nt">&lt;td&gt;</span> ELEMENT1 <span class="nt">&lt;/td&gt;</span> <span class="nt">&lt;td&gt;</span> ELEMENT2 <span class="nt">&lt;/td&gt;</span> <span class="nt">&lt;td&gt;</span> ELEMENT3 <span class="nt">&lt;/td&gt;</span>
<span class="nt">&lt;/tr&gt;&lt;/table&gt;</span>
</pre></div>
<p>One advantage is that the table can already do much of the layout,
especially for the horizontal direction. In the vertical direction it
needs some help from JavaScript when resizing.</p>
<p>The implementation used here is based on an earlier version of a UI
project that I'm working on. It uses buttons for elements, which is why
it looks a bit different from the other methods. But it's the layout
that matters. Also I did not add images to the layout here.</p>
<p>Links:
<a href='html/boxdemo_table1.html' target='new'>Simple table layout</a>,
<a href='html/boxdemo_table2.html' target='new'>Nested table layout</a></p>
<h3>CSS Box</h3>
<p>In 2009, the CSS <code>display: box</code> model was defined. This is probably the
most common method found on the web used for box-layout. However, it’s
more or less deprecated. To get this working, it is important to use
all the <code>-moz</code>, <code>-webkit</code> CSS prefixes (see the source of the linked pages).</p>
<p>Links:
<a href='html/boxdemo_box1.html' target='new'>Simple box layout</a>,
<a href='html/boxdemo_box2.html' target='new'>Nested box layout</a></p>
<h3>CSS Flex</h3>
<p>The CSS <code>display: flex</code> model is like the next generation of <code>display: box</code>.
It is the latest iteration of the <a href="http://www.w3.org/TR/css-flexbox-1/">flexbox
model</a>, and should presumably
become <em>the way</em> to achieve box-layout in HTML. However, at the time
of writing it is still in draft.
To get this working, it is important to use all the <code>-moz</code>, <code>-ms</code>,
<code>-webkit</code> CSS prefixes.</p>
<p>Links:
<a href='html/boxdemo_flex1.html' target='new'>Simple flex layout</a>,
<a href='html/boxdemo_flex2.html' target='new'>Nested flex layout</a></p>
<h2>Results</h2>
<p>The tests were loaded in several different browsers and machines. Between
the big three (FireFox, Chrome and IE) some FPS measurements were taken.
These were taken on two machines: a modern Windows laptop, and a relatively old
laptop running Linux (thus no IE measurement).</p>
<style>
table.boxresults {
text-align: center;
padding: 10px;
}
table.boxresults td {
border: 1px solid #999;
border-width: 0px 1px 0px 0px;
padding: 2px 7px 2px 7px;
}
table.boxresults th {
text-align: center;
font-size: 1.1em;
border: 1px solid #444;
border-width: 0px 0px 1px 0px;
}
table.boxresults td.browser {
text-align: right;
}
</style>
<table class='boxresults'>
<tr>
<th></th><th>Table</th> <th>Box</th> <th>Flex</th>
</tr><tr>
<td class='browser'>Firefox 36/35:</td> <td>&#x2714 20/1 fps</td> <td>&#x2714 42/20 fps</td> <td>&#x2714 32/10 fps</td>
</tr><tr>
<td class='browser'>Chromium 41/40:</td> <td>&#x2714 55/28 fps</td> <td>&#x2714 55/45 fps</td> <td>&#x2714 60/40 fps</td>
</tr><tr>
<td class='browser'>IE 11:</td> <td>&#x2714 60/- fps</td> <td>fail</td> <td>&#x2714 60/- fps</td>
</tr><tr>
<td class='browser'>IE 10:</td> <td>~</td> <td>fail</td> <td>~</td>
</tr><tr>
<td class='browser'>IE 9:</td> <td>fail</td> <td>fail</td> <td>fail</td>
</tr><tr>
<td class='browser'>Qt Webkit:</td> <td>&#x2714</td> <td>&#x2714 </td> <td>&#x2714 </td>
</tr><tr>
<td class='browser'>Iceweasel (RaspPI):</td> <td>&#x2714 </td> <td>&#x2714 </td> <td>&#x2714 </td>
</tr><tr>
<td class='browser'>Epiphany (RaspPI):</td> <td>&#x2714 </td> <td>&#x2714 </td> <td>&#x2714 </td>
</tr><tr>
<td class='browser'>Chromium (RaspPI):</td> <td>&#x2714 </td> <td>&#x2714 </td> <td>&#x2714 </td>
</tr><tr>
<td class='browser'>Firefox (mobile):</td> <td>&#x2714 </td> <td>&#x2714 </td> <td>&#x2714 </td>
</tr><tr>
<td class='browser'>Standard Android (mobile):</td> <td>&#x2714 </td> <td>&#x2714 </td> <td>fail </td>
</tr>
</table>
<p>Notes:</p>
<ul>
<li>On Linux, Chrome does not resize the content until you release the
mouse. In this case the developer mode was used to resize either width
or height.</li>
<li>On Raspberry Pi, none of the browsers resize the content until you
release the mouse.</li>
<li>The standard Android browser that was tested is from a rather old phone.</li>
<li>On IE10, the Flex method almost works, but things seem to go wrong
when there is deeper nesting (the top-level hbox only shows the left vbox).</li>
<li>With the Table method, the minimum size is not taken into account if
flex &gt; 0.</li>
<li>In the Box method the elements seem just a bit too small on Firefox.</li>
</ul>
<h2>Conclusions</h2>
<p>One could argue about how important performance really is. Resizing is
not something that happens all the time. On some browsers (in particular
on mobile devices) the resize event is not fired until you're done
dragging the window border. One can also imagine a hybrid approach where
the browser fires resize events during dragging, but not <em>all</em> the time.
Perhaps this is how IE gets its surprisingly high performance.</p>
<p>Even though performance may not matter that much, the performance of
the Table method is just terrible. In addition, it is not a true
box-layout because natural size is not taken into account when flex &gt;
0. Avoid using tables for layout. Really, don't use them.</p>
<p>The Box method works pretty well, but is not supported on IE and has
some issues on other browsers as well. On Firefox it seems slightly
faster than the Flex method, but this might be because it is cutting
some corners.</p>
<p>It’s good to see that the Flex method is so well supported. Even though
its specification is officially still in draft, it works on a wide range
of browsers. It's the clear winner according to this comparison.</p>Volume rendering in Vispy2015-01-28T00:00:00+01:002015-01-28T00:00:00+01:00Almartag:almarklein.org,2015-01-28:/volume_rendering.html<p>We recently added volume rendering to <a href="http://vispy.org">Vispy</a>. In
this post I'll describe the method that is used, what the advantages
of this method are, and possible future additions. I tried to be gentle
and explain the method without giving too much boring details. Plus
there is some fancy footage to demonstrate the new functionality.</p>
<p>We recently added volume rendering to <a href="http://vispy.org">Vispy</a>. In
this post I'll describe the method that is used, what the advantages
of this method are, and possible future additions. I tried to be gentle
and explain the method without giving too much boring details. Plus
there is some fancy footage to demonstrate the new functionality.</p>
<p>The method is heavily inspired by the one
used in the Visvis project. We made several improvements to it, such
as ray position refinements, and the ability to render from inside the
volume.</p>
<p>At the time of writing, the work is in a
<a href="https://github.com/vispy/vispy/pull/612">pull request</a>, pending for merge.</p>
<p><em>Edit: the PR got merged in March 2015</em></p>
<p><img alt="Escher's cubic space division" src="images/escher_cubic_space.jpg">
<br /><small><i>
Image by Escher (public domain)
</i></small></p>
<h2>A brief overview of shaders</h2>
<p>Before we start, let's briefly explain how executing code on the GPU works
in OpenGL. To make any visualization, you need to write
two programs: the vertex shader and the fragment shader. The vertex
shader gets executed for each <em>vertex</em>, i.e. each point in the
visualization. Here, you can transform the point, apply color, and
prepare information for the next stage: the fragment shader. The
fragment shader gets executed for each <em>fragment</em>, i.e. each pixel in
the output image that ends up on the screen. Here, you can look up
colors from a texture and do calculations to determine the final output
color and depth.</p>
<p>There are other shaders (e.g. the geometry shader), but in Vispy we
limit ourselves to the vertex and fragment shader to remain compatible
with WebGL.</p>
<p><img alt="glsl shaders" src="https://docs.google.com/drawings/d/1PVLVbCLYa6Q8K9SeFOIoFCHv5FZnMJ3dj5htKwleV_I/pub?w=600">
<br /><i> Image illustrating the vertex and fragment shaders. </i></p>
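<p>To illustrate the two stages, here is a minimal vertex/fragment pair written as Python strings, the way <code>vispy.gloo</code> programs typically hold GLSL source; the <code>a_</code>/<code>u_</code> names are just a common convention, not from any specific Vispy program:</p>

```python
# Minimal GLSL shader pair, stored as Python strings.
VERT = """
attribute vec2 a_position;   // per-vertex data (an *attribute*)
void main() {
    // Runs once per vertex: place the point in clip coordinates.
    gl_Position = vec4(a_position, 0.0, 1.0);
}
"""

FRAG = """
uniform vec4 u_color;        // same value for every fragment (a *uniform*)
void main() {
    // Runs once per output pixel: decide its final color.
    gl_FragColor = u_color;
}
"""
```

<p>A <code>gloo.Program(VERT, FRAG)</code> would compile these and let you assign the attribute and uniform by name.</p>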
<h2>Setting things up</h2>
<p>In order for the shaders to work as we want, they need to be supplied
with the correct information. The most elemental data are the
vertices: the locations that form the base of the visualization. For
our volume rendering method, we supply the locations of
the 8 corners of the cube that we want to visualize. For each corner,
we also supply the texture coordinate, i.e. the corresponding location inside
the volume. In OpenGL terms, per-vertex data like the locations and
texture coordinates are called <em>attributes</em>.</p>
<p>Further, we also supply the shaders with a texture that contains the
volume data, the shape of the volume, and a parameter to determine the
step size. These are called <em>uniforms</em>. Other uniforms that are used,
but are not specific to volume rendering are the transformations to map
the vertex positions to screen coordinates depending on the camera
settings.</p>
<h2>Inside the vertex shader</h2>
<p>As is common in vertex shaders, we transform the vertex position so
that it ends up in the correct position on the screen. In other words,
the corners of the cube are projected onto the screen, depending on
the camera state.</p>
<p>In order to cast a ray through the volume, we need to know the ray
direction, which is influenced by the orientation of the volume, as well
as the position and orientation of the camera. We calculate the ray
direction in the vertex shader.</p>
<p>To calculate the ray direction, we map the vertex position to the view
coordinate frame. This differs in a subtle way from the screen
coordinate frame, because of the notion of viewboxes (i.e. subplots)
in Vispy. In this coordinate frame, we transform the point a little bit
forward, and then project it back to the coordinate frame local to the
volume. The ray direction is defined as the difference between this
new position and the original position.</p>
<p>This ray direction vector is thus calculated for each vertex, and then passed
to the fragment shader. In OpenGL, a value that is sent from the vertex
to the fragment shader is called a <em>varying</em>. The value of the ray
direction as received in the fragment shader is interpolated between
the vertices.</p>
<p><img alt="calculation of ray direction" src="https://docs.google.com/drawings/d/1_Y2fu3uwlPcz4gq9eW3XqPGvQDr3Hp727jkBNq0GfW8/pub?w=400">
<br /><i> Image illustrating the calculation of the ray direction. </i></p>
<h2>Taking care of perspective</h2>
<p>If the camera uses orthographic (i.e. non-perspective) projection, then
the ray direction is the same for each vertex. This is not the case for
perspective projection: the ray direction is different for each corner of
the cube, and linear interpolation between these vectors will cause a
wobbly effect. To mitigate this problem, we simply use more vertices.
The number of vertices that is needed relates to the field of view. For
typical values, a subdivision of around 10 seems sufficient.</p>
<p><img alt="Vertex subdivision" src="https://docs.google.com/drawings/d/1hPVFFdugrRo9dCta1Obb548ovUJTclgdfIZlTig4qrc/pub?w=200">
<br /><i> Image illustrating the additional vertices on the front-facing planes of a cube. </i></p>
<h2>The size of our steps</h2>
<p>Although the ray direction is known, we should still scale the vector
to determine the size of the steps. We take steps of approximately the
voxel size, multiplied with the user settable <code>u_relative_step_size</code>.
Higher values will typically yield prettier results, at the cost of
performance. Later in this post we discuss a trick to get good
results with relatively large steps. The main point is that we should
not completely step over voxels, because we might miss important
structure in the volume.</p>
<h2>How far can you go?</h2>
<p>Now that the ray casting vector is fully determined, there is but one
thing to calculate before we can step through the volume inside the
fragment shader: the number of steps. </p>
<p>There are several approaches for calculating the number of steps. A
common method is to first render the backfaces to a texture (using an
FBO), thereby creating a depth map that you can use during raycasting.
Our method uses a more direct approach, which does not need an FBO and
probably performs better (though I don't have hard data on that).</p>
<p>In the fragment shader, we have a texture coordinate which corresponds
to the start location of the ray (on the edge of the cuboid). This is
the starting point of the ray casting. We also know the direction
through the volume. We calculate the distance from this starting point
to each of the six planes (i.e. faces) of the volume, using a simple
mathematical formula, expressing the distance in number of steps. This
formula yields a negative distance if the plane lies behind the ray. The
number of steps is then the smallest of these distances, after
discarding the negative values.</p>
<p>We apply a trick by defining all planes a <em>tiny</em> bit to the
outside, so that the plane that the start position is on ends up
behind the ray, yielding a (small) negative distance, and is thus discarded.</p>
<p>Further, it's important to set the wrapping property of the texture to
clamp (and not repeat), so that moving half a step outside the volume
won't yield wrong results.</p>
<video width="400" controls>
<source src="http://www.almarklein.org/images/vispy_volume_nsteps.mp4" type="video/mp4">
Your browser does not support HTML5 video.
</video>
<p><i>Video showing the number of steps encoded in color (brighter is more steps)</i></p>
<h2>Inside the volume</h2>
<p>A nice feature of this method is that you can put the camera <em>inside</em> the
volume. Instead of taking the front-facing planes of the volume as a
starting point, we use the back-facing planes of the volume and the
front-facing planes are discarded. In effect, the volume can also be
rendered from within, and it should be easier to render other
objects inside the volume (e.g. segmentation results).</p>
<p>To allow this, however, we need to take the clipping plane of the camera
into account. So in addition to the six planes that we test to determine
the number of steps, we also test the plane that is at the camera
position. You can see in the video above how the number of steps
decreases (i.e. the color becomes less bright) as the camera moves
further inside the volume.</p>
<h2>The casting of the ray</h2>
<p>From here the casting of the ray is relatively simple. In a loop from zero
to <code>nsteps</code>, we increase our texture coordinates with the ray vector.
At each iteration we sample the value from the 3D texture and process
it. The kind of processing that we apply depends on the rendering style that
is used. In maximum intensity projection (MIP) we simply remember the highest
intensity. In isosurface rendering, we cast the ray until the intensity
rises above a certain threshold value, and then do the lighting calculations.</p>
<p>In a post-processing stage, we re-cast a fraction of the ray in very small
steps around the depth of interest. This refines the result, so that
we can get away with relatively large step sizes and still get
consistent and pretty results.</p>
<video width="600" controls>
<source src="http://www.almarklein.org/images/vispy_volume_grid.mp4" type="video/mp4">
Your browser does not support HTML5 video.
</video>
<p><i>Video showing the volume rendering in action </i></p>
<h2>Future improvements</h2>
<p>There are several ideas to bring the implementation further:</p>
<ul>
<li>More render styles. See e.g. <a href="https://code.google.com/p/visvis/wiki/example_volumeRenderStyles">this visvis example</a></li>
<li>Colormapping, to give these gray MIP's a more appealing and useful appearance.</li>
<li>Allowing for anisotropic data, useful for medical images.</li>
<li>Use a 2D texture to store the 3D data, so that our volume rendering can be used in WebGL, e.g. in the IPython notebook.</li>
<li>Adopt techniques like shadows (see the upcoming WebGL Insights for an example).</li>
<li>Use adaptive step sizes to realize increased performance.</li>
<li>The use of <a href="http://graphics.cs.kuleuven.be/publications/BLD14OCCSVO/">octrees</a> allows rendering massive datasets in realtime.</li>
</ul>New task: don't forget to organize your ideas and knowledge too!2014-11-19T00:00:00+01:002014-11-19T00:00:00+01:00Almartag:almarklein.org,2014-11-19:/getting_things_done.html<p>Like many people, I use todo lists to organize my tasks. I’ve tried
different todo-list solutions and even made a few apps myself. In this
post I try to explain what I learned from these tools, why organizing
knowledge and ideas may be more important than organizing tasks, and
why I like <a href="http://trello.com">Trello</a> so much.</p>
<p>Like many people, I use todo lists to organize my tasks. I’ve tried
different todo-list solutions and even made a few apps myself. In this
post I try to explain what I learned from these tools, why organizing
knowledge and ideas may be more important than organizing tasks, and
why I like <a href="http://trello.com">Trello</a> so much.</p>
<p><img alt="The Arctic Council planning a search for Sir John Franklin" src="images/search_for_Sir_John_Franklin.jpg">
<br /><small><i>
Image by Stephen Pearce (public domain)
</i></small></p>
<hr>
<h2>Organizing knowledge</h2>
<p>Some tasks assigned to you originate from someone else. But (I hope)
there are also many tasks that originate from yourself. Where do
tasks come from in that case? Most tasks are part of a bigger plan,
which was in its turn formed from ideas, ideals, and a certain purpose.</p>
<p>Managing tasks is only one side of the coin. You may be very efficient
in executing your tasks, but if you neglect the origin - your goals
and ideals - you live an empty life.</p>
<p>I believe everyone has a purpose, and in essence this purpose is to make
yourself useful, by doing something that you love doing. And the main
goal in life, as I see it, is to find that purpose.</p>
<p>Starting with a purpose, you develop ideas, and by combining ideas you
create plans, which in turn lead to concrete tasks towards achieving
your goals. Organizing your knowledge, ideas and plans can help you
during this process of creation.</p>
<h2>Tools to manage stuff</h2>
<h3>The common todo list</h3>
<p>Many todo lists are a simple one-dimensional lists of tasks. In many cases
this suffices.
I used to keep track of tasks using a text file (in my Dropbox folder
for easy access). I've also tried
<a href="https://www.gmail.com/mail/help/tasks/">Google tasks</a>,
<a href="https://www.rememberthemilk.com/">Remember the milk</a>,
<a href="https://evernote.com/">Evernote</a>, and others.</p>
<p>Although most implementations allow keeping track of multiple lists, this
approach provides little means to <em>structure</em> tasks and set priorities.
I do think common todo lists can be useful. But the most useful
variant is often just a piece of paper to write down the things planned
for this day or week. </p>
<h3>Priority vs importance</h3>
<p><img alt="my Eisenhower-inspired task app" src="images/screen_gtasks2d.png"></p>
<p>When I learned about it, I liked the distinction between priority and
importance that is promoted by the
<a href="http://en.wikipedia.org/wiki/Time_management#The_Eisenhower_Method">Eisenhower matrix</a>.
I liked how this method uses two dimensions to structure tasks.
Therefore I made a small Python+Qt application based on this idea. It
uses Google tasks as a back-end, so that tasks can be managed from
anywhere. All tasks sit on a 2D canvas: the further to the right a task
sits, the more urgent it is; the further to the top, the more important.
It did give a nice overview of tasks, but while using it, it quickly
became cluttered with tasks, and at some point I needed to move tasks
around to prevent them from overlapping. I thought about filtering tasks
based on tags, but this method just did not seem to scale up very well.</p>
<p>This was also when I started to realize that what I wanted was to
organize my <em>thoughts</em>, of which organizing tasks is just one aspect.
I wanted to keep notes and a
<a href="https://medium.com/the-writers-room/8d6e7df7ae58">spark file</a>.
These do not really fit in an Eisenhower matrix.</p>
<h3>Filter, don't sort</h3>
<p><img alt="my filter-based task app" src="images/screen_notes.png"></p>
<p>The next app I made was inspired by <a href="http://todotxt.com/">todo.txt</a>. I wanted
to keep it simple and always accessible. Therefore the app stores the
data in a text file in my Dropbox folder.
It uses a simple <a href="https://bitbucket.org/almarklein/notes">protocol</a>
that is human readable. By default, the app shows all
items in chronological order. By writing a tagname in the search field,
the app will show only tasks marked with this tag. The same search field
allows plain-text searching through all my notes/tasks/ideas, or
selecting only tasks or ideas.</p>
<p>This solution has worked quite well for me for a year or so, until I
learned about something better ...</p>
<p><em>edit (June 2017): After hardly using Trello for almost 2 years, I've revived this little app again, and use it almost daily</em></p>
<h3>Hello Trello!</h3>
<p><img alt="Trello web app" src="images/screen_trello.png"></p>
<p>I've been using <a href="http://trello.com">Trello</a> for a while now and
absolutely love it. Trello is brilliant in its simplicity; it offers
simple tools to organize basically anything you want. This is at the
same time its biggest pitfall, because you have to think about how you
are going to organize things. In fact, I had tried Trello before, but
then thought that it did not meet my needs. Only when I read about it
again and tried it more seriously did I realize its true power.</p>
<p>With Trello, you can organize information in four levels:</p>
<ul>
<li>The user or <strong>organization</strong> (multiple people can be part of an
organization).</li>
<li>Each user (or organization) can have multiple <strong>boards</strong>. Only one board
is shown at a time. Many things like access rights are also managed
on the board level.</li>
<li>Each board has <strong>lists</strong>, which are organized horizontally. You can
move lists around by dragging their title.</li>
<li>Each list contains (vertically stacked) <strong>cards</strong>. Again you can
move cards around (also to other lists) by dragging.
The card is the unit element in Trello. By clicking on a card you see
the "back" of the card, where you can write details, create checklists,
set due dates, have discussions, attach files and assign people.</li>
</ul>
<p>For myself, Trello is in many ways what I was looking for:</p>
<ul>
<li>It uses the horizontal dimension very effectively for providing structure.</li>
<li>It's flexible enough to structure tasks as you like.</li>
<li>It can help you organize not only tasks, but also thoughts/ideas/notes/etc.</li>
<li>You can easily share boards with other people.</li>
<li>It has a RESTful API (which I gratefully used to inject all
my existing tasks, notes and ideas). This also prevents lock-in.</li>
<li>It has a really good smartphone app.</li>
<li>It's free.</li>
</ul>
<h2>How to use Trello</h2>
<p>Trello can be used in many different ways. You'll have to experiment to
find out what works for you. Fortunately it is easy and intuitive
to move things around. Here are some ideas:</p>
<ul>
<li>By default, a board has three lists: "todo", "doing", "done". This provides
a good starting point for many projects.</li>
<li>I have a board for general tasks where I have a list for stuff that
I should someday do, a list for things that need attention soon,
and a list for things that are urgent.</li>
<li>In the image above you see the Trello board that I use to "plan" my
blog. On the left is a list of ideas: stuff that touches me or that I think is
important. At some point I turn one or more of these ideas into a plan for
a blog post, which I start writing: I move the card to the second
list. When done and published, I move the card to the final column.</li>
<li>People use it for SCRUM development: each card is an issue. Developers
are assigned to the issue, and the issue travels from one list to another
as it gets a different state (e.g. in progress, fixed, under review, verified).</li>
<li>Sometimes a board represents a project. Sometimes it may be a way to
organize knowledge. Or a means to communicate with customers.</li>
</ul>
<p>For more examples see e.g.
<a href="http://blog.trello.com/trello-is-now-trello-inc/">this post</a>.
Above all, I encourage you to just try it. Create lists and cards, and don't
be afraid to move things around in search of a better structure.</p>
<h2>Conclusion</h2>
<p>Organizing your tasks is great to be productive, but it's equally
important to organize your knowledge and ideas, so that you are being
productive in the right direction. For short-term stuff I still prefer
pencil and paper. But for everything else, I think Trello does a really
good job.</p>Scientists need a dynamic programming language2014-09-20T00:00:00+02:002014-09-20T00:00:00+02:00Almartag:almarklein.org,2014-09-20:/why_dynamic.html<p>Dynamic programming languages provide great advantages due to
their interactive workflow, especially in science where algorithms are
complex and take many iterations to get right. Developer time is more
important than CPU time; writing all your code in a static language is
(often) a bad case of premature optimization.</p>
<p>This post is a story about how I learned the importance of dynamic
languages the hard way. I am sharing it here so that others might learn
from it too. It also touches on some of the benefits of Python compared
to Matlab.</p>
<p>Dynamic programming languages provide great advantages due to
their interactive workflow, especially in science where algorithms are
complex and take many iterations to get right. Developer time is more
important than CPU time; writing all your code in a static language is
(often) a bad case of premature optimization.</p>
<p>This post is a story about how I learned the importance of dynamic
languages the hard way. I am sharing it here so that others might learn
from it too. It also touches on some of the benefits of Python compared
to Matlab.</p>
<p><img alt="A Starry night" src="images/starrynight.jpg">
<br /><small><i>
Image by Van Gogh (public domain)
</i></small></p>
<hr>
<h2>Searching for an alternative to Matlab</h2>
<p>Like most PhD students at my university, I started out with Matlab to do
my programming. I liked it a lot, and over time got quite
experienced. I also used Matlab at home for a few hobby projects,
but for that I needed to resort to an illegal version, which always
felt a bit awkward.</p>
<p>As I grew more adept using Matlab, I also started to realize some of
its fundamental
<a href="http://www.pyzo.org/python_vs_matlab.html#the-problem-with-matlab">quirks</a>
stemming from its origin as a matrix manipulation package.
What finally drove me into looking for alternatives was the fact that
I was working with dynamic CT data, and I wanted to visualize dynamic
volumes together with segmentation results. Good luck with that in
Matlab.</p>
<p>Near the end of 2007, when I was almost a year in my PhD project, a
colleague and myself started looking for alternatives, and were quite
impressed with <a href="http://vtk.org">VTK</a>. Since we could not use it from
Matlab, we considered using C++ or C#. We eventually settled on C#,
because it is a very pleasant language overall, and makes creating GUI's
very easy. We used a C# wrapper for VTK that someone had fortunately made
available.</p>
<h2>Getting serious with C-sharp</h2>
<p>In February 2008 two other colleagues joined us and we started to design
a framework where algorithms could be represented as building blocks
with predetermined input and output ports (similar to VTK). One big
motivator was that students and ourselves could build such building
blocks which could then easily be reused by others. We hoped that this
would solve the all-too-frequent problem that code developed by students
was usually lost and forgotten after the student left. The combination
of C# with VTK offered us a number of advantages:</p>
<ul>
<li>Code reuse</li>
<li>Speed</li>
<li>3D visualization</li>
<li>Easy building of GUI's </li>
<li>Create portable apps</li>
<li>Object oriented code</li>
<li>Free (as in beer)</li>
</ul>
<p>That seemed like a pretty good list. Full of
enthusiasm and high expectations we started implementing this
"framework", which we had called "POT" (I can't remember what the acronym
meant, but it was unrelated to cannabis). We put a lot of energy into
it and things were looking pretty good. We could interface with VTK and
Matlab, and we even had a UI in which you could drag blocks around and
connect them to each-other. We had high hopes and started to suggest
our ideas to other members in our group.</p>
<h2>Realizing that C-sharp was not it</h2>
<p>Then came the moment that I started using the framework myself, by
porting some Matlab code to C#. I remember that it was a
Friday. I quickly ran into problems with porting my algorithms. I was
used to a rapid-prototyping approach: make a small script with a minimal
example and then expand and develop it further while repeatedly
executing it in the running process. Not so in C#: create a new project,
add references, build a small GUI, etc. And then run it … if it does
not work (which is usually the case the first X times you try), you
need to debug: find the bug, exit the debugger, fix the bug, run again,
repeat ...
<p>That's when I realized that I had tremendously underestimated the
flexibility of Matlab. This realization came as a lightning strike.
That Friday was the last day that I used our framework (and C# for that
matter). The months of effort turned out to be a waste of time. I learned
an important lesson though, and I never forgot it.</p>
<h2>Defeat ... and victory!</h2>
<p>Reluctantly I went back to Matlab. I started experimenting with Matlab's
new functionality for classes, and I did some experiments to use OpenGL
via DirectX (I had already discarded VTK due to its sheer
size and verbose API). About two weeks later I ran into the <a href="http://vnoel.wordpress.com/2008/05/03/bye-matlab-hello-python-thanks-sage/">blog
post</a>
that changed my life.</p>
<p>After learning about <a href="http://python.org">Python</a> I experienced the same
symptoms of being in love (don't tell my wife) and I barely slept for
3 days. I realized very soon that Python had been exactly what I was
looking for. It ticked all the boxes.</p>
<p>Python is a beautiful language. After using Python for a while, many
things in Matlab seemed very awkward. Like indexing, string
manipulation, building GUI's, creating classes. Even the fact that you
need one file for each function definition seemed so primitive in
retrospect! There has not been a single moment when I regretted the
move to Python.</p>
<p>On the other hand, the scientific ecosystem was a bit immature at that
moment, which meant that I had to get my hands dirty and become active
in developing tools. I am still doing this today, and I love doing it!</p>
<h2>What I learned</h2>
<p>The important lesson that I learned is that the list mentioned above
misses one very important element: interactivity. I realize now that
for scientific computing a highly interactive workflow is incredibly
beneficial for your productivity. You can work with your code
in a very direct and effective manner: you can execute code
in a running process, redefine functions without having to reload your
data, and introspect many aspects of your code at runtime. Dynamic
languages like Matlab provide this, compiled languages like C# do not.</p>
<p>I think this realization has also been an important factor in the
development of <a href="http://iep-project.org">IEP</a>. Many ideas were in fact
inspired by how Matlab allows code execution, like being able to execute
a cell (a piece of code between two double comments <code>##</code>).</p>
<h2>Languages side by side</h2>
<p>So let's have a look at the list again (the complete version this time),
and see how C#, Matlab and Python compare:</p>
<ul>
<li><strong>Code reuse:</strong> this could have worked in our framework, except that the learning
curve is too steep to let our students work with C#. Code reuse in
Matlab is always a bit messy, since you have to fiddle with paths.
In Python we have packages and modules.</li>
<li><strong>Speed:</strong> yes, C# is faster than Matlab and Python, but there are many
ways to deal with this. Matlab has Mex-files, Python has
<a href="http://www.cython.org/">Cython</a>, <a href="http://numba.pydata.org/">Numba</a>
and other solutions. Choosing a compiled language as a primary
development tool may just be a bad case of premature optimization, because
an interactive workflow increases development speed (and fun), which
is often more important than a high runtime speed.</li>
<li><strong>3D visualization:</strong> I started making friendly Python wrappers to OpenGL. This culminated
in <a href="http://code.google.com/p/visvis/">Visvis</a> and now <a href="http://vispy.org">Vispy</a>.</li>
<li><strong>Easy building of GUI's:</strong> good in C#, terrible in Matlab, in Python we
have Qt, WX, and others.</li>
<li><strong>Create portable apps:</strong> Matlab has the MCR, but this is neither very
reliable, pretty, nor compact. With Python you can freeze your code into
a compact standalone application.</li>
<li><strong>Object oriented:</strong> available in Matlab. Very nicely done in Python. </li>
<li><strong>Free:</strong> Python is free as in speech and as in beer. Matlab definitely not.</li>
<li><strong>Dynamic:</strong> Python and Matlab are both dynamic languages that allow
an interactive workflow. </li>
</ul>
<h2>Conclusion</h2>
<p>For most scientists it would be most productive to use a dynamic
language, and only use a compiled language (or other tools) to make the
crucial bits of code faster.</p>
<p>Python is currently one of the best choices, although statisticians may
also like <a href="http://www.r-project.org/">R</a>. Further, there are very
interesting languages being developed right now (e.g.
<a href="http://pypy.org">Pypy</a> and <a href="http://julialang.org">Julia</a>), and I
suspect there will be more of that in the future.</p>The power of post-mortem debugging2014-09-03T00:00:00+02:002014-09-03T00:00:00+02:00Almartag:almarklein.org,2014-09-03:/pm-debugging.html<p>Post-mortem debugging refers to the concept of entering debug mode
<em>after</em> something has broken. There is no setting of breakpoints
involved, so it's very quick and you <em>can</em> inspect the full stack
trace, making it an effective way of tracing errors.</p>
<p>This post explains some of the benefits and how it can be used from
IEP and other environments.</p>
<p>Post-mortem debugging refers to the concept of entering debug mode
<em>after</em> something has broken. There is no setting of breakpoints
involved, so it's very quick and you <em>can</em> inspect the full stack
trace, making it an effective way of tracing errors.</p>
<p>This post explains some of the benefits and how it can be used from
IEP and other environments.</p>
<p><img alt="Autopsy" src="images/autopsy.jpg">
<br /><small><i>
Image by Rembrandt (public domain), via Wikimedia Commons
</i></small></p>
<hr>
<h2>Using post-mortem debugging in your workflow</h2>
<p>If you have an error in your code, there are several ways to debug:</p>
<ul>
<li>Trial and error: modifying a suspicious piece of code.</li>
<li>Add <code>print(x)</code> to your code to see the value of important objects.</li>
<li>Set breakpoints.</li>
<li>Post-mortem debugging.</li>
</ul>
<p>These approaches do not exclude one-another, and I often use a
combination of methods to find the source of a problem.</p>
<p>The greatest benefit of post-mortem debugging is that you can use it
directly after something has gone wrong. With all other methods, we
must first find out where (approximately) the error occurred, so that
we can place our breakpoints or print-statements at the proper location.</p>
<p>I have post-mortem debugging bound to <code>CTRL-P</code> in IEP; as soon as I see
a red traceback appearing in the shell, I press the shortcut and I
immediately see what went wrong and where. In some cases the error can
be fixed at once. In some cases not, and I need some of the other
methods too.</p>
<p>In most environments the experience is quite similar to debugging with breakpoints,
except you don't have to set breakpoints. Apart from not being able to
continue execution using step/step-in/continue, you can inspect the
namespace of all frames in the call stack. And you can run code in
these namespaces as usual (e.g. to test whether a modified version
of the offending line would work correctly).</p>
<p><img alt="post-mortem debugging in IEP" src="images/iep_screen_pmdebug.png"></p>
<h2>How you can use post-mortem debugging</h2>
<p>Post-mortem debugging is a method that requires an environment that
provides dynamic execution of code. When an error occurs in such an
environment, rather than stopping the process, an error message is
displayed and the process returns to a REPL (interactive prompt) of
some kind. As such, it is a method that works mostly with dynamic
languages.</p>
<p>Whether or not you can use this method depends on the environment and
the available tools. Here's a (very incomplete) list:</p>
<ul>
<li>In <a href="http://iep-project.org">IEP</a>, click <code>shell &gt; Postmortem: debug from last traceback</code>.
You can bind it to a key-combination for easier access.</li>
<li>The builtin Python debugger can enter post-mortem debugging using <code>pdb.pm()</code>.</li>
<li>Matlab can be configured to enter post-mortem debug mode when an
exception occurs.</li>
<li>(let me know if you know how to use it in other environments)</li>
</ul>
<h2>How it works (in Python)</h2>
<p>Python has special placeholders for the last exception that occurred in
<code>sys.last_traceback</code>, <code>sys.last_value</code> and <code>sys.last_type</code>. A debugger
can grab these (especially the <code>last_traceback</code>) to reconstruct the call
stack up to the point where the exception was raised.</p>
<p>The <code>sys.last_traceback</code> object is an object that can be used to
traverse through all frames on the call stack. These frames are
essentially a snapshot of the execution state of your program at the
moment that the exception occurred; the local variables in each function
up until the exception can be inspected. (A side effect of this is that
any (possibly large) objects that are present in the call-stack are not
cleaned up.)</p>
<p>So where do these <code>sys.last_*</code> variables get their values? When an
exception falls all the way through (i.e. an unhandled exception), the
interpreter catches it. It sets these values, and prints out the
familiar (red) error message to the shell.</p>
<p>When a debugger wants to enter post-mortem debugging, it can use
the <code>sys.last_*</code> variables to gain access to all there is to know about
that particular exception.</p>
<h2>How to handle an exception and still allow post-mortem debugging</h2>
<p>Sometimes you want to handle an exception by printing a warning/error,
but still allow entering post-mortem debugging for that error. This can
be particularly useful for event-driven applications (we use it in some
parts of <a href="http://vispy.org">vispy</a> too).</p>
<p>The function to use is <code>sys.exc_info()</code>. It should be called in the
scope where an exception is handled, and then returns information about that
exception. To handle an exception and still allow post-mortem debugging, you
can use the following code:</p>
<div class="highlight"><pre><span></span><span class="k">try</span><span class="p">:</span>
<span class="o">...</span>
<span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="c1"># get traceback and store (for post-mortem debugging)</span>
<span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">tb</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">exc_info</span><span class="p">()</span>
<span class="n">tb</span> <span class="o">=</span> <span class="n">tb</span><span class="o">.</span><span class="n">tb_next</span> <span class="c1"># Skip *this* frame</span>
<span class="n">sys</span><span class="o">.</span><span class="n">last_type</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">last_value</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">last_traceback</span> <span class="o">=</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">tb</span>
<span class="k">del</span> <span class="n">tb</span> <span class="c1"># Get rid of it in this namespace</span>
<span class="c1"># Now do however you wanted to handle the error</span>
<span class="o">...</span>
</pre></div>
<h2>Post-mortem debugging in IEP</h2>
<p>Until late 2013, IEP did not support breakpoints. But it has had
post-mortem debugging for a long time. The suggested means of debugging
was to place <code>1/0</code> in your code, so that you can quickly go to a certain
position and inspect the stack trace. IEP now does have breakpoints,
which are definitely an enrichment, but not a replacement for
post-mortem debugging.</p>
<p>Once in debug mode, the shell will move to the top frame of the stack
and the editor will open the associated file at the right line number.
In the shell widget, you can use the stack button to easily navigate
through the frames. Click the little cross to the left of it to exit
debug mode. (See the image near the top of this article.)</p>
<h2>Summary</h2>
<p>Post-mortem debugging is a simple and very quick way to start the
debugging process, and it's easy to use from IEP.</p>The importance of open source software in science2014-07-30T00:00:00+02:002014-07-30T00:00:00+02:00Almartag:almarklein.org,2014-07-30:/open.html<p>For the first post in this blog I wanted to write something that really
matters. At least to me. I tried to get at the core drivers behind what
I do: what are the fundamental reasons why I love open source software
so much?</p>
<p>Here's why I think that open source is <em>necessary</em> to improve/fix the
current scientific system and to guarantee our freedom to seek knowledge.</p>
<p>For the first post in this blog I wanted to write something that really
matters. At least to me. I tried to get at the core drivers behind what
I do: what are the fundamental reasons why I love open source software
so much?</p>
<p>Here's why I think that open source is <em>necessary</em> to improve/fix the
current scientific system and to guarantee our freedom to seek knowledge.</p>
<p><img alt="Cat anatomy" src="images/catanatomy.jpg">
<br /><small><i>
Image by Internet Archive Book Images (no restrictions), via Wikimedia Commons
</i></small></p>
<hr>
<h2>Open source software is rising</h2>
<p>When I started using Python in 2008, the scientific Python stack was
not very mature yet. So although I believed that Python was a great
tool for scientists, I did not feel that I could recommend it to my
colleagues or students yet.</p>
<p>Wow, a lot has changed since then! The development of the scientific
Python community is increasing fast, and with that the development of
the scientific packages. Many packages can be considered quite mature
now. We have also established the <em>Scipy stack</em>, a set of packages that is
considered the core of the Scipy ecosystem.</p>
<p>Right now, I would definitely recommend Python to anyone who wants to
do something related to scientific computing, imaging or data analysis ...</p>
<p>At the same time, other very interesting open source projects are being
developed that enrich the scientific ecosystem. For example, Julia is
a promising language that borrows ideas from Python and Matlab.</p>
<h2>Not only software</h2>
<p>The rise of open source software is not an isolated occurrence. It goes
hand in hand with the increasing demand for open access publications
and open standards.
People are starting to realize that publications funded with public money should
not be used for the monetary profit of a small group of people; work paid
for by the public should benefit the general public.</p>
<h2>Reproducibility of scientific results</h2>
<p>A similar trend occurs for the reproducibility of scientific results.
Any scientific publication that talks of results that cannot be
reproduced by others is essentially worthless. The definition of science
is quite clear about that (from Wikipedia):</p>
<blockquote>
<p>Science is a systematic enterprise that builds and organizes knowledge
in the form of testable explanations and predictions about the
universe.</p>
</blockquote>
<p>In order to make scientific results reproducible, you need (at least)
three things:</p>
<ul>
<li>The data on which the analysis was done should be publicly available</li>
<li>The code to perform the analysis should be publicly available</li>
<li>The tools necessary to run the code should be publicly available</li>
</ul>
<p>Only if these three conditions are met can a scientific result be
considered reproducible. Only then can others verify the results and
ensure there are no flaws in the analysis. And only then can we reliably
build further on top of these results.</p>
<p>Sadly, there are currently not many publications that meet these
criteria. But... now that powerful open source tools are available,
there are few technical limitations that prevent improvement in this
area. Further, the open source culture in which sharing of code and
knowledge is considered normal can also help solve this problem.</p>
<h2>Equality</h2>
<p>Anyone should be able to participate in the scientific process, no
matter where they live and no matter their financial situation:
scientific results should be publicly and freely available, and the
same goes for the tools to produce such results.
If you are at a university, you may have access to tools and publications
via licenses that your university pays for. But many universities don't have
the financial capacity for this. And even if they did, one should not have
to rely on a university for this.</p>
<h2>Broadening the audience</h2>
<p>Not so long ago, data analysis happened mostly in C and Fortran; one
needed to be an adept programmer in order to do data analysis. This has
changed with the advent of e.g. Python and (granted) Matlab.
Nevertheless, the art of scientific computing is still dominated by
hard-core coders.</p>
<p>With digital data becoming more ubiquitous, more (non-computational)
scientists rely on the analysis of data to obtain results. Therefore,
there is a need to open up the art of scientific programming to a
(much) broader audience.</p>
<p>The bottom line is that we need to make it easier to get started:</p>
<ul>
<li>It must be easy to get Python &amp; packages (Anaconda, Pyzo and WinPython
help here)</li>
<li>It must be easy to do the programming itself (we need to further improve
IPython, IEP, Spyder, etc.)</li>
<li>It must be easier for newcomers to find their way around
(<a href="http://scipy.org">scipy.org</a> helps)</li>
<li>Last, but not least, we must <em>educate</em> people to learn how to use these
tools</li>
</ul>
<p>Things are improving in this area as well, but there is still a lot of
work to do.</p>
<h2>The freedom to seek knowledge</h2>
<p>Let's take it one step further. I think it should be more common
(also in our everyday lives) to apply data science. Instead of just believing
what other people tell us, we should analyse the available data and
decide for ourselves what is true and what is not. Whether this concerns
important scientific subjects or just practical everyday issues, it
will help us make <em>informed</em> decisions based on our own results.</p>
<h2>Summary</h2>
<p>To improve our current scientific system, we need good open source
tools, and we need to make these tools easy enough for all scientists
to use; we need to make science more accessible.</p>
<p>But we also need a shift in culture. One in which openness is the norm
and in which we can and will verify the results of others. So that
rather than relying on second-hand information, anyone can seek
knowledge for himself. Because you can only truly believe something when
you've seen it with your own eyes.</p>
<h2>Further reading</h2>
<ul>
<li>
<p>Fernando Perez also speaks about the importance of open source in
science, e.g. in <a href="http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html">this blog post</a>.</p>
</li>
<li>
<p>Lorena Barba's recent <a href="http://lorenabarba.com/blog/why-i-push-for-python/">blog post</a>
about teaching Python at the university.</p>
</li>
<li>
<p>Cyrille Rossant's <a href="http://cyrille.rossant.net/why-using-python-for-scientific-computing/">blog post</a>
that lists additional reasons for using Python in science.</p>
</li>
</ul>