October 21, 2014

This might look bewildering at first. It’s a custom hg wip command, a variation of hg log. You can see the DAG on the left, what looks like commit summaries in white, and what looks like usernames in mauve. This is not what stock Mercurial looks like, but in my opinion, this looks great. It has so much information packed into such a small space, all colour-coded for your convenience. Let’s take a look.

Current commit

The current commit is simply where you’re standing. It’s the commit that hg diff or hg status would use as reference for telling you how the working copy has changed. All that I did for hg wip was to highlight more clearly what this commit is. Note in the DAG that this commit is marked with an “@”.

Bookmarks

I have chosen to display bookmarks in green. Recall that bookmarks are simply movable markers that follow along with the commits that they’re pointing to. They’re very similar to git branches. Here I am using them to refer to each of my feature branches. There is a special “@” bookmark that indicates where upstream development is.

Branches

In cyan we see the branch names. In this case there are only two commits on the stable branch. Branch names are permanent names tattooed on commits. This is useful for when you want to have a permanent record of where a commit was done.

Phases

This part is one of my favourites. Mercurial phases indicate whether commits have been published yet or not, and whether they should be shared or not. Commits that have been published and therefore immutable are in orange. The commits I’ve been working on but haven’t been accepted into upstream yet are yellow drafts. Finally, a couple of local patches that should never be shared are secret commits in blue. I am using local revision numbers to identify commits, but it would also be possible to use global hashes.

Selective about which commits to show

Another cool thing about my hg wip command is that it only shows what I consider interesting information about my work-in-progress commits, hence the name of the command. To be precise, it only shows my draft and secret commits, current open heads of development, and the minimum common commits. No other intermediate commits are shown. Actually, usually I don’t even need the extra newlines, and I prefer this more compact format:

So, how is this command built? It actually depends on a number of different Mercurial features, all useful on their own. All of the following snippets go into your ~/.hgrc config file.

Whoa! That’s even worse than the revset. Unfortunately, due to the nature of the templating engine, inserting whitespace to make this more readable would also make this extra whitespace show up in the final output.

If we are willing to define more intermediate templates and move them to an external template file, it can actually be made readable:

Okay, this is a bit better, but it requires using an extra config file. The syntax is a bit tricky, but most of it is self-explanatory. Special keywords in {braces} get expanded to their corresponding values. There are a few functions like if() and ifcontains() to selectively output extra information if that information exists. For example, I use ifeq() to hide the branch name if this name is “default”.

Once you’re in a function, keywords are already available. For example, rev gives you the revision number, and if you wanted global hashes instead, you could change that to node or shortest(node). You can even use a revset in the templating engine! In this case, for selecting the current commit and comparing it to the revision being printed.

. Most modern terminals can support it, but sometimes you have to specify the $TERM enivronment variable to be xterm-256color or something similar. Then you can define your own colours from the 256 available, referring to the numbers on this chart. If you dislike my colour choices, this is where you can configure them.

Define the command

The final part is to just turn this on:

[alias]wip= log --graph --rev=wip --template=wip

This defines the hg wip command to be an alias to hg log together with the parameters for displaying the graph, using the wip revset, and the wip template.

And voilà, fancy shmancy colourful hg command! Here is the complete addition to your ~/.hgrc all at once, for delicious copypasta:

[revsetalias]wip=(parents(not public()) or not public() or . or head()) and (not obsolete() or unstable()^) and not closed()

August 27, 2014

Dense Output

Specifying specific output times for the solution, should not affect the internal time steps that the solver uses. The basic idea of the Dense Output concept is to provide the solution at a given time \(s \in [t, t+dt]\) with the same order of accuracy as the solutions computed at the internal time points by using suitable interpolation methods.Up to now only linear interpolation was performed and this significantly lowered the accuracy if a higher order solver was used.I then implemented a series of interpolation function:

linear_interpolation:

x_out = linear_interpolation (t, x, t_out)

Given the time span \(t=[t_0, t_1]\) and the function values \(x=[x_0, x_1]\), it returns the linear interpolation value \(x_{out}\) at the point \(t_{out}\).

quadratic_interpolation:

x_out = quadratic_interpolation (t, x, der, t_out)

Given the time span \(t=[t_0, t_1]\), the function values \(x=[x_0, x_1]\) and the derivative of the function at the point \(x_0\), it returns the quadratic interpolation value \(x_{out}\) at the point \(t_{out}\).

hermite_cubic_interpolation:

x_out = hermite_cubic_interpolation (t, x, der, t_out)

Given the time span \(t=[t_0, t_1]\), the function values \(x=[x_0, x_1]\) and the derivatives of the function at both points \(x_0\) and \(x_1\), it returns the 3rd order approximation \(x_{out}\) at the point \(t_{out}\) by performing Hermite interpolation.

hermite_quartic_interpolation:

x_out = hermite_quartic_interpolation (t, x, der, t_out)

Given the time span \(t=[t_0, t_1]\), the function values \(x=[x_0, x_{1/2}, x_1]\) (where \(x_{1/2}\) is the value of the function at the time \(t_0+dt/2\)) and the derivatives of the function at the extremes \(x0\) and \(x1\), it returns the 4th order approximation \(x_{out}\) at the point \(t_{out}\) by performing Hermite interpolation.

dorpri_interpolation:

x_out = dorpri_interpolation (t, x, k, t_out)

This interpolation method is specific for the Dormand-Prince Runge-Kutta scheme. Given the time span \(t=[t_0, t_1]\), the function value \(x=x_0\) and the vector \(k\) with the function evaluations required in the Dormand-Prince method, it returns the 4th order approximation \(x_{out}\) at the point \(t_{out}\). For more information on the method have a look at Hairer, Noersett, Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems (pag. 191-193).

hermite_quintic_interpolation:

x_out = hermite_quintic_interpolation (t, x, der, t_out)

Given the time span \(t=[t_0, t_1]\), the function values \(x=[x_0, x_{1/2}, x_1]\) and the derivatives of the function at each point, it returns the 5th order approximation \(x_{out}\) at the point \(t_{out}\) by performing Hermite interpolation.

These methods are then used to perform the Dense Output according to the order of the solver chosen. This is the piece of code in integrate_adaptive.m that performs the interpolation:

% if there is an element in time vector at which the solution is required% the program must compute this solution before going on with next stepselseif( vdirection*z(end) > vdirection*tspan(counter) )% initializing counter for the following cycle i = 2; while ( i <= length (z) )

% if new time requested is not out of this interval if ((counter <= k) && (vdirection*z(end) > vdirection*tspan(counter))) % update the counter i++; else % else, stop the cycle and go on with the next iteration i = length (z) + 1; endif

endwhile

endif

It is important to notice that:

The 1st order approximation doesn't require any additional function evaluation.

The 2nd order approximation may require the evaluation of the function at the current time. This can be avoided if the stepper already returns that value.

The only 3rd order solver implemented is ode23. The 3rd order approximation exploits the Runge-Kutta \(k\) values to avoid further function evaluations.

There are no 4th order schemes as yet implemented. However if ones were to use ode45 without local extrapolation then the dorpri_interpolation function can be used to obtain a 4th order approximation without any additional function evaluation. For any other 4th order scheme the hermite_quartic_interpolation function can be used.

For the 5th order method ode45, Shampine proposes to obtain a 4th order approximation at the middle point and to use quartic interpolation. It is however possible to directly do quintic interpolation but this require an additional function evaluation without (according to Shampine) a significant improvement.

For the higher order solvers (ode78), a suitable interpolator has not yet been implemented.

Finally, since I wrote the interpolation functions in such a way that they are independent of the number of output points requested, a possible improvement would be to compute all the values of the dense output between \(t\) and \(t+dt\) all at once instead of one value at a time.

August 18, 2014

During this week I have been reorganizing all the code, docs and tests in a better way for integrating into Octave. As Rik kindly suggested, I decided to organize things this way:

Inside libinterp/dldfcn directory I have created two files, __ichol__.cc and __ilu__.cc

Within those files there are the dld functions that implements the each of the algorithms. They are ment to be built-in functions and follows the __foo__.cc naming convention.

* __ilu__.cc: contains __ilu0__() , __iluc__() and __ilutp__()

* __ichol__.cc: contains __ichol0__() and __icholt__().

I have moved all the tests from .cc files to .m scripts so no tests are performed for built-in functions.

The code is ready to be pulled from my repo to be reviewed :

https://edu159@bitbucket.org/edu159/octave-edu159

Practical examples: preconditioned conjugate gradient

It is interesting to show how preconditioning techniques can improve the convergency of some iterative solvers. In this case I am running a Matlab example using the Poisson matrix (that is positive definite) obtained with gallery() function. The scritp:

1. In this first case the convergency of pcg using ICHOL(0) algorithm, modified ICHOL(0) algorithm and without preconditioning are compared.

N = 100;

A = gallery ('Poisson', N);

b = ones (size (A, 1), 1);

tol = 1e-6; maxit = 100;

[x0, fl0, rr0, it0, rv0] = pcg (A, b, tol, maxit);

L1 = ichol (A);

[x1, fl1, rr1, it1, rv1] = pcg(A, b, tol, maxit, L1, L1');

opts.type = 'nofill'; opts.michol = 'on';

L2 = ichol (A, opts);

e = ones (size (A, 2), 1);

norm(A * e - L2 * (L2' * e))

[x2, fl2, rr2, it2, rv2] = pcg (A, b, tol, maxit, L2, L2');

semilogy (0:maxit, rv0 ./ norm (b), 'b.');

hold on;

semilogy (0:it1, rv1 ./ norm(b), 'r.');

semilogy (0:it2, rv2 ./ norm(b), 'k.');

xlabel ('iterations');

ylabel ('error');

legend ('No Preconditioner', 'IC(0)', 'MIC(0)');

Octave

Matlab

2. In this second part of the script what is compared is the convergency of ICHOLT algorithm with different values of droptol.

L3 = ichol(A, struct('type', 'ict', 'droptol', 1e-1));

[x3, fl3, rr3, it3, rv3] = pcg (A, b, tol, maxit, L3, L3');

L4 = ichol (A, struct ('type', 'ict', 'droptol', 1e-2));

[x4, fl4, rr4, it4, rv4] = pcg (A, b, tol, maxit, L4, L4');

L5 = ichol (A, struct ('type', 'ict', 'droptol', 1e-3));

[x5, fl5, rr5, it5, rv5] = pcg (A, b, tol, maxit, L5, L5');

figure; semilogy (0:maxit, rv0 ./ norm (b), 'b-', 'linewidth', 2);

hold on;

semilogy (0:it3, rv3 ./ norm(b), 'b-.', 'linewidth', 2);

semilogy (0:it4, rv4 ./ norm(b), 'b--', 'linewidth', 2);

semilogy (0:it5, rv5 ./ norm(b), 'b:', 'linewidth', 2);

ylabel ('error');

xlabel ('iterations');

legend ('No Preconditioner', 'ICT(1e-1)', 'ICT(1e-2)', ...

'ICT(1e-3)', 'Location', 'SouthEast');

Octave

Matlab

As it can be seen Octave plots are the same as Matlab's ones. Both lead to a decrease in the number of steps upt to convergence of the pcg method. ILU algorithms could also have been used here, but due to the simetry of the problem matrix ICHOL is faster.

FSAL - new stepper implementation

As stated in the previous post, the implementation of the steppers as it was did not allow the possibility to exploit the FSAL (First Same As Last) property of the Bogacki-Shampine algorithm (ode23) and of the Dormand-Prince algorithm (ode45).The input and output arguments of the steppers have then be modified. As an example here is the runge_kutta_23 stepper:

where k_vals has to be initialized for the first iteration as f(t, x).This implementation will reduce the number of function evaluation for each step.

Furthermore, after some tests in MATLAB, the return values for the solution and the estimate in the runge_kutta_23 and runge_kutta_45 steppers have been swapped to automatically perform local extrapolation. The MATLAB functions are in fact of order 3 and 5 respectively.

August 13, 2014

Status of the code: bugfixes and new issues

ODESET and ODEGET

odeset and odeget functions have been slightly modified to be compliant with MATLAB. Each MATLAB option is present and all the options are tested. The coding style has been adapted to the GNU-Octave standard.

ode_struct_value_check: this function has been introduced by Roberto in addition to odepkg_structue_check. The relation between the two functions has to be clarified: in particular it is necessary to understand if it is really necessary to have two different functions or one is sufficient.

CHANGES TO THE STEPPERS

The runge_kutta_78 stepper has been implemented.

Two 4th order steppers have been implemented: runge_kutta_45_dopri (Dormand-Prince coefficients) and runge_kutta_45_fehlberg (Fehlberg coefficients).

CHANGES TO THE SOLVERS

ode78 solver has been updated to the new structure. It now exploits the runge_kutta_78 stepper.

A series of tests has been added to each solver to check all the functionalities and the all options. This has made me possible to detect some bugs that have been corrected. In particular the adaptive timestep evaluation had some issues that lead to the use of too short timesteps. This has been corrected and now the algorithm proposed in [1] is used.

Furthermore the current implementation uses linear interpolation to evaluate the solution at the user specified times. This leads to a considerable loss in accuracy and is not consistent with MATLAB (which guarantees the same order of accuracy of the scheme used). In [1] different methods are proposed for the dense output: these will be used as a reference for the implementation of a better solution.

In the released version of odepkg some of the solvers perform local extrapolation, that is the higher-order estimate is chosen as the solution. With the new stepper structure, as it is now, this choice effects all the solvers. It have to be decided whether to perform it or not (MATLAB doesn't seem to do it, thus I suggest to avoid it).

MATLAB implementation of ode45 uses the Dormand-Prince (DP) coefficients. In the released version of odepkg there exits two solvers: ode45 that uses the Fehlberg coefficients and ode54 that uses the DP coefficients. To be consistent with MATLAB, ode45 now uses the DP method. This makes the runge_kutta_45_fehlberg stepper and the ode54 solver, as it is now, redundant. Either their elimination or a change of the solver might be considered. However one of the advantages of DP coefficients is the possibility to reuse the last function evaluation at a given step as the first evaluation of the subsequent one. This is not easily done with the stepper structure introduced by Roberto.

CHANGES TO THE OPTIONS

InitialStep option has been modified to be MATLAB compatible (it must be a positive scalar).

RelTol defalut value has been changed to 1e-3 instead of 1e-6 to be MATLAB compatible.

MaxStep option has been implemented.

NormControl option has been implemented.

TODO

In addition to the general plan, a couple of issues need to be addressed:

Clarify the relation between ode_struct_value_check and odepkg_structue_check.

Decide if local extrapolation has to be used or not. My opinion (and the current implementation) is to avoid it to be compliant to what MATLAB seems to be doing.

Solve the dense output problem in a way that guarantees the consistency with MATLAB.

Consider if it's possible to reduce the number of function evaluation for the Dormand-Prince stepper (ode45) and the Bogacki-Shampine stepper (ode23) exploiting the FSAL property (first same as last).

Decide if in the future releases of odepkgode54 has to be removed or maybe changed to become a Fehlberg solver.

August 12, 2014

As said in my previous post, a missing feature in fem-fenics was the marking of subdomains. Indeed, I proposed an example that needed a file generated with a run of the corresponding Python code, which is not, honestly, the best approach. In order to address this issue, these days I have implemented a new class, subdomain, which can be used to mark mesh entities. In the following I will describe how to use this new functionality. Here is the code:

As you can see, it is basically the same as in the previous post, except the line used to import the meshfunction. I wrote in the corresponding comment where the edits are to be found. Now the workflow comprises these steps: first of all, a meshfunction needs to be created, then a subdomain, in the end we should mark cells.

The call to MeshFunction is something new, since it is now possible to instantiate a meshfunction given a mesh, the required topological dimension and the value to initialise it with. Moreover, the optional label "dx" means that it can be used in calls to BilinearForm or LinearForm to supply markers for subsets of the integration domains. In the example, this instruction returns a meshfunction of dimension 2, which means it holds values associated with each triangle in the mesh, initialised to be 0 in every entry.

The subsequent instruction, instead, defines a subdomain, passing as arguments a function handle and a logical flag. The former will be the override of the dolfin::SubDomain::inside method, so it must return true for entities contained in the subset and false otherwise. In facts it checks whether the coordinates are inside the 2-interval defining the obstacle. The latter, instead, can be used to ask for a boundary subdomain, when set to true.

At last, mark is called to set the entries corresponding to cells inside the subdomain to 1, so that the returned meshfunction now represents the obstacle: after these lines, the variable named domains assumes value 1 on cells inside the obstacle region and 0 outside. Thus, it is now possible to solve a problem whose formulation entails subdomains entirely using fem-fenics.

August 11, 2014

It's been quite long since I posted here due to some personal situations. Anyway to sum up: I have finished ilu and ichol functions as I have planned in the beginning with great results.Things done after mid-term evaluation:

Implementing ICHOLT and ICHOL0 algorithms.

Fixing several bugs in ILU algorithms and introducing some enhancements for big sparse matrices with verly low densities.

The files involved in ichol, within the repository, are:

src/icholt.cc

src/ichol0.cc

ichol.m

You can clone the code from the repo:

https://edu159@bitbucket.org/edu159/gsoc2014-edu15

Before going into the details of the algorithms' implementation, I want to point out some details about how ichol behave in MATLAB.

In the real case the matrix must be symetric positive definite. In the complex case the input matrix must be hermitian. That means: diagonal elements of the input and output matrix have to be non-zero, positive and real values. So that, at each iteration those conditions have to be fullfilled.

If ichol is called just as L = ichol (A), Matlab ignores complex numbers and only work with their real part. Using L = ichol (A, setup) call, complex numbers are considered. Seriusly I do not understand why they do that and I have not followed that behaviour. Anyway if to be 100% compatible I must change that, it would be only a line of code extra.

Details of implementation

-->src/ichol0.cc

In this file is located the implementation of ICHOL(0) algorithm. The zero-pattern of the output matrix is the same as the input one so it is known from the beginning how much memory is needed to be allocated. The milu = ['on'|'off'] parameter indicates whether the dropped elements are added to the pivot or not (that keeps the colum sumation). I will show two examples, one that corresponds to a big matrix with a very low density and the one that used Kai last year in his blog.Example 1:

A = gallery ('poisson', 500); size (A) ans = 250000 250000 tic; L = ichol (A); toc; Elapsed time is 0.031718 seconds. density = nnz (A) / (size (A)(1))^2 density = 1.9968e-05 norm (A - L*L', 'fro') / norm (A, 'fro') ans = 0.0924207846384523 norm(A-(L*L').*spones(A),'fro')./norm(A,'fro') ans = 2.28617974245061e-17It can be seen that the product L*L' is quite different from A, but the product L*L' will match A on its pattern (that is expected for the ICHOL(0) algorithm. The execution time is just given to give an idea of how fast the code is. It is executed in a i7 2.4GHz.Example 2:

This example is taken from that post, written by Kai the past year. He faced problems with the michol option, obtaining different results from Matlab.input:

This file contains the implementation of ICHOLT algorithm. In this case the final structure of the output matrix is unknown. Therefore, a policy should be adopted for allocating memory. After trying different ways of doing that I end up using that one:

What is done here is just to increment 10% of the actual size of the ridx and data internal arrays of the output sparse matrix. But only if that amount is larger than the dimension of the input matrix (n). In other case the increment in size is just n. That policy seems to work very well in every case I tested and do not slow down the process at all due to reallocations.Example 3:icholt accepts a parameter for controling the sparsity of the ouput matrix called droptol. If droptol = 0 then the complete factorization takes place. If we increase that value the output matrix will become more sparse as more elements will be dropped. Taking the same matrix than in example 1:

As it can be seen, the higher the droptol parameter is, the sparser the matrix become. That lead to less execution times but on the other hand a higher error is obtained in the factorization. The complete factorization obviously have practically no error. Cool.

Location of source files inside Octave core

Now I've finished with the development of the algorithms, the final step is to integrate them into Octave core. For doing so I will create a subrepo of the default Octave repository and add the files. I have chosen the location for the functions looking at the last year repository Kai set.

That is just a sugestion and should be revised and accepted by the maintainers.

Future contributions

There is a week left that I want to use it to start (and hopefully finish) the development of sprandsym extra parameters that Matlab have but Octave does not. As I submitted in the past a changeset for a similar functionality in sprand and sprandn, it will be much easier to implement for me.

Also I am interested in developing some sparse linear solvers like minres and lsqr that Octave lacks. They are tightly related to the preconditioners I have been working on, and would be nice if they could be assigned to me for developing them.

August 09, 2014

As you may recall from my last post, for DirichletBC to work in parallel runs I had to implement a new class, meshfunction. However it was still quite unfinished, with no way for the user to create one, except extracting it from a mesh produced by the msh package, no description to display, no way to save it. These days I have been tackling this issue: while at it I wondered what one could do with meshfunction and found out that they can come in handy when you are dealing with obstacles.

At this link you can find a detailed explanation of the problem. It is a Poisson equation with variable diffusion coefficient on the unit square. Precisely, on [0.2, 1]x[0.5, 0.7] its value is 0.01, otherwise it is 1. The mentioned subset is the obstacle to diffusion, so we study its effect applying u = 0 on the y = 0 edge and u = 5 on y = 1. Here is the fem-fenics code:

In the beginning there is the now familiar ufl block. As you might have noticed, subscripted measures appear in the definition of the bilinear form a and of the linear functional L. This is UFL notation for the integration on specific subsets of the computational domain. For instance, dx(1) is an integral over the subdomain marked with label 1, while ds(3) is an integral over the exterior edges marked with label 3. A third possibility, even if not used in this example, is to use dS for integrals on interior facets, which could be of use for interior penalty methods. Going back to the example, you can see that markers are used to enforce non-homogeneous Neumann conditions on the side edges and to assign the proper coefficient on the two subdomains.

After defining the problem in UFL language, there are instructions to define the mesh, the function space, the essential boundary conditions and all the coefficients involved. All such lines come from fem-fenics before this summer or have been described in my previous posts, so I will not cover them in detail. The same applies for the assembly, solve and all the output in the end of the script. The only note is that the very last lines will error out in parallel runs: point-wise evaluations in DOLFIN can be performed only on local cells, but with meshgrid we are providing to every process the whole domain.

The computed solution

In between there are my latest efforts. At first, the brand new MeshFunction. With this, providing a mesh and a file name you can import a dolfin::MeshFunction. In this case it was saved in the XDMF format, here you can find the files needed to execute the script. DOLFIN uses this format for parallel input/output. It comprises a .h5 file storing data and a .xdmf with metadata useful to read the other one. The optional first argument is a string identifying the role of the returnedmeshfunction in the variational problem. In this case, with "dx" it will be searched for markers of the integrals on cells. All the previously mentioned measures are available, and "ds" is automatically attached to the meshfunction returned by Mesh. In the example this behaviour is exploited for the measure on edges.Afterwards, the mesh functions are passed as arguments toBilinearForm and LinearForm, so that the markers are available to assemble the system. In addition to the usual parameters, such as the name of the imported UFL problem, the function spaces and the coefficients, it is now possible to provide mesh functions properly labeled and they will be used.

Currently fem-fenics allows for easily marking subdomains and exterior edges copying markers from the PDEtool representation returned by the functions of the msh package, which makes it quite tricky to properly identify the obstacle in the example. The approach used in the python interface to DOLFIN entails subclassing dolfin::Subdomain with the proper implementation of the inside method, then use an object of the derived class to mark a dolfin::MeshFunction. This could be an interesting feature to implement in the future also in fem-fenics.

The basic structure has remained the same. DOLFIN boasts the capability to be run both in serial and in parallel execution without intervening on the code, so I did my best to have the same behaviour from fem-fenics. The Poisson.m m-file above can be run either as you usually would do with any other m-file, or from the command line with an invocation such as:

mpiexec -np 4 octave --eval Poisson

Now, how is this possible? In the beginning, with the ufl block, the variational problem is defined in UFL language, written to an .ufl file and compiled via FFC. Since IO is performed, ufl.m ensures that only process zero will open and write to the file. Moreover, a MPI barrier makes sure that no process will proceed before the .ufl file is imported.

As soon as the just-in-time compilation is over, there are two instructions to build the mesh, in this case on the unit square. For this, we rely on the msh package, which returns a PDE-tool-like representation of it. Mesh.oct must, then, convert it to DOLFIN internal representation and distribute it among processes. Here comes an issue: fem-fenics relies on markers present in the PDE-tool format to impose essential boundary conditions, and in serial runs dolfin::Mesh can store them, so that DirichletBC.oct needs just to know the boundary subset label. Unfortunately, this feature is not supported yet in parallel by the DOLFIN library, then Mesh.oct has been edited to return, if requested, also a meshfunction holding this information, in the example above facets. This way markers can be conveyed to DirichletBC.oct and boundary conditions can be applied on the correct edges.

Further intervention was needed for the assembly and solve phase. In assemble_system.oct both the matrix and the vector are assembled locally on each portion of the mesh and, afterwards, gathered on process zero and joined, so that the system can be solved with the backslash instruction of Octave. In order to allow output in VTK format, in Function.oct the solution is split up and properly distributed among processes, so that each one holds the portion of degrees of freedom related to its subdomain and to the neighbouring vertices. After save.oct has written the solution to poisson.pvd and its auxiliary files, it can be visualised with ParaView.

July 30, 2014

Lately I have not been very active on the blog since I am encountering some difficulties in the attempt to introduce MPI parallelism. Meanwhile, I have extended support to the latest version of the DOLFIN library.

Among other changes, one that strongly affects fem-fenics is the shift from the shared pointer implementation by the Boost libraries to the one included in the Standard Template Library with the new C++11 standard. This change alone calls for edits in almost all the codebase of the package, as basically all DOLFIN data structures are stored via smart pointers in the corresponding fem-fenics classes. However, currently version 1.3.0 is still present in the official repositories of the main Linux distributions, thus switching abruptly to the latest version would have prevented further releases of the package for a while.

In order to tackle the above mentioned issue, I resorted to the preprocessor capabilities, so as to infer from the DOLFIN version available on the compiling computer the right kind of pointers to use. Among other options, the preprocessor flags obtained using pkg-config define also a macro reporting the DOLFIN version. It is, then, possible to check it and choose the correct pointer implementation right before compilation. Currently in fem-fenics every occurrence of boost::shared_ptr has been replaced by a SHARED_PTR macro, which in turn is defined in a new header that takes care of setting it to the right value. There is just a catch: preprocessor conditionals cannot compare strings, but the DOLFIN_VERSION macro is indeed defined as a string. In order for this approach to work, the package Makefile, for the initial compilation, and the get_vars.m function, for the just-in-time ones, perform the actual check and define an auxiliary macro if the latest version is found on the system.

SoCiS 2014 - New timeline

On the occasion of SoCiS 2014 I will take over the work that has been done last year on odepkg and continue it. The final goal is to release a new stable version of odepkg and to insert the most common solver in core-Octavein such a way that everything is MATLAB-compatible.

The following list explains the main points of the timeline for my SOCIS project:

Check of the current status of the code, in particular with respect to the current release on SourceForge. The two repository will be merged so that every test present in the old version will be added to the new code. Verify if there are missing features and add them if necessary.

Comparison of the performance between the old and the new structure. In particular we expect that the introduction of the Levenshtein algorithm for the string comparisons will be a critical issue. If necessary implement levenshtein.m and fuzzy_compare.m in C++.

Verify that the functions odeset and odeget are MATLAB-compatible and compliant to Octave core. Add the two functions to the core.

Move ode45, ode23and ode23s to Octave core.

Implement ode15ssolver. This solver is still missing in odepkg but is highly suitable for stiff problems.

July 12, 2014

fem-fenics provides a couple of utilities for assembling matrices and vectors, which compose the algebraic system weak formulations are reduced to when applying FEMs. Left aside all the checks needed to verify inputs, their job boils down to creating proper DOLFIN objects to store these matrices or vectors, possibly applying boundary conditions, then building an Octave-friendly representation. This last task is quite critical for the implementation of the MPI parallelisation, as the underlying DOLFIN representation of matrices and vectors is transparently distributed among processes, thus making the serial code used to pass them on to Octave useless. Lately I implemented some new classes to manage this aspect, so I will highlight my design considerations.

The translation from DOLFIN's to Octave's data structures is logically a well defined task, whilst its implementation needs to vary according to its serial or parallel execution. Furthermore, it strictly depends on the linear algebra back-end used, for each of them stores a different representation and exposes a different interface to access it. To address these difficulties, I wrote a hierarchy of factories to provide the adequate methods, based on the run-time necessities. Moreover, this way the code is easily expandable to allow for more back-ends to be used in fem-fenics (currently only uBLAS is available). There is an abstract class, to declare the interface of its derived ones, and a concrete factory implementing the uBLAS-specific methods.

Since in a future fem-fenics there will be several algebraic back-ends available for use, the hierarchy will expand. This means that the checks of the run-time configuration will eventually become more complex. Another issue comes from the need to use different types depending on information available only at run-time. Both to encapsulate those checks, avoiding code duplication, and to solve the problem of choosing the right class, I added to the hierarchy a class implementing the Pimpl idiom. With this design, the "user code" in the C++ implementation of assemble.oct and assemble_system.oct needs just to create a femfenics_factory object and use it to extract the data structures of interest, while every other hassle is dealt with behind the scenes by this class.

UML diagram of the new hierarchy

In the diagram above you can see the already implemented classes and an example class to point out where others will collocate amongst them. femfenics_factory has private methods to check which is the right concrete class to use each time, and implements the public methods of the abstract class dispatching the call through a reference. uBLAS_factory, as other concrete classes are expected to do, holds the real code for creating Octave matrices and vectors and exposes a static method, instance, which allows for access to the singleton object of this type. femfenics_factory, in turn, obtains with it the reference needed for dispatching.

These days I have started my investigations on the actual implementation of the MPI parallelisation in fem-fenics. I found out some information that I will point out here, together with my goals for the next weeks.

First of all, apparently MPI can be used without user intervention on the serial code. This is a feature that DOLFIN boasts, but I would expect it not to pass on to fem-fenics, at least not without some effort on the implementation side. Furthermore, DOLFIN offers also a wrapper for MPI functionalities, thus probably it can be helpful in managing data transfers among threads in C++ code.

An issue that will need to be addressed is making ufl.m robust to parallel execution, since its serial implementation leads to all workers trying to open the same file, thus leading to an error that stops computation. Anyway, even if they could all open the file and write to it, this would entail that lines are copied in random order or more than once, so it must be fixed.

In the end, it seems that the partitioning procedure produces matrices that are not slices of the one assembled in serial execution. Due to this fact, I must go deep in the algorithm to find out the proper way to merge the pieces and obtain the complete matrix, which will be stored as octave_value to allow for further computation using Octave's features.

July 03, 2014

The purpose of this post is to explain the details behind the implementation of the ilu function, my work during this first period of GSOC program. The files involved are:

src/ilu0.cc

src/iluc.cc

src/ilutp.cc

ilu.m

You can pull the repo using mercurial from:

https://edu159@bitbucket.org/edu159/gsoc2014-edu159

--> src/ilu0.cc

This file contains the implementation of ILU0 algorithm, the easiest one. In this version the zero-pattern of the input matrix is not modified so it is known the final structure of the output matrix. That simplifies things. For the milu=['col'|'row'] option, it is needed to implement both the IKJ and JKI versions of the algorithm to efficiently compute the compensation of the diagonal element with dropped elements. I managed to do both in the same function, just changing a few lines of code. Lets use Matlab's documentation example:Example 1:

The following little function can be used, when milu = ['row'|'col'] to check that all the columns/rows preserve its sumation (not only with ilu0 but with iluc and ilutp). Just run it after calling ilu in any form.

NOTE: I have found in Matlab 2013a that the row and col sumation does not work well always, and the row and column compensation fails for ilutp and iluc. I will show an example later.

--> src/ilutp.cc

This algorithm is the trickiest one due to pivoting, and has caused me more than one headache during its coding because it is not well described in Saad's book, just a few indications. I have found here several bugs in Matlab's 2013a implementation that make me a bit reticent about trusting results correctness.

That 0 cannot be there. By construction L has to be a lower unit triangular matrix and that zero element spoils the L*U product. Again WRONG.

I have encountered more issues when testing Matlab using some testing matrices with 2000x2000 and 5000x5000 dimensions. With them my output is not the same as Matlab's (nnz of L and U are different from Matlab's), but taking into account the errors I found, I trust the most my version and not theirs. BTW in my case the rows and columns sums were preserved, theirs not. Obviously I have checked that those examples behave correctly in my code detecting 0 pivots

Pivoting: It worths to mention how pivoting is performed in that algorithm. When milu = 'row' the U matrix is column permuted (IKJ version used) but when milu=['off',|'col'] L is the permuted one and it is row permuted (JKI version used). Both algorithms share a lot of similarities and the code is designed to work in one version or another depending on milu option. That way code duplication is avoided. That was one of my primary fears when I realized that both versions were needed to attain Matlab compatibility.

--> src/iluc.cc

This is the file containing the crout version of ILU. This version is an enhancement of pure IKJ and JKI variants of gaussian eliminations. At iteration k the k:n section of k column and k row is computed. The enhancement is noticed in the execution time for the same input matrix. The following example is a comparison between my versions of ilutp and iluc:

For a 2000x2000 matrix ( I have not included this matrix in the repository due to it size):

With setup.droptol = 0.01 and setup.milu = 'off'.

Octave: ilutp --> 12.3458 secondsiluc --> 6.31089 seconds

Matlab:ilutp --> 12.868686 secondsiluc --> 7.498106 seconds

That is just to illustrate the performance of different versions.

NOTE: In iluc the dropping strategy for elements in U (stored as CRS) is to drop the element aij if (abs(aij) < droptol * norm(A(i, :))). For the L part (stored as CCS) aij is dropped if (abs(aij) < droptol * norm(A(:, j))).

That is all I wanted to show till now. I have written tests for the functions and adapted several ones from Kai last year work. However I want to add some more function-specific ones for validating results. The last thing pending is to place the source files inside the Octave source tree. I am not totally sure where they should go. On the other hand I have already started to work on ichol function and next week I'll report about my progress.

I know the post is a bit long but I think it is needed due to the poor verbosity I had through the blog during this period. I am aware of that (Jordi pointed me out a few days ago) and I will take into account for the following weeks.

June 27, 2014

This is a short post just to clarify my state at midterm. As I have arranged with Kai, at the beginning of the next week I will write a verbose post to explain all the details related to the development of "ilu" function. I need a couple of days to tidy up all the code and make it presentable.

The state of the code now is functional. It lacks of tests and documentation and need a bit of re-factorization I will do this weekend. I will list several relevant points about the implementation of the function.

It is programmed to output the same as Matlab.

My version is at least as fast as Matlab's, outperforming by small amounts of time in large output cases.

I found a bug. At least on Matlab2013a regarding "ilutp" option when using milu="col". The col sum is not preserved in at least one case so I found in my testing cases that my function does not output the same. I will explain with more detail that issue next week.

In the end I am happy with the performance of the function and that I'm meeting the time-line proposed at the beginning of GSOC (with a few days of delay maybe).

Future work: The second part of the GSOC I will implement "ichol" function. There are several points to discuss about its development with Kai because he implemented the code last year but there were some kind of issues with the licensing of it. This period is a bit longer and I will have no classes nor exams. Because of that, if I have time remaining at the end, I can start implementing "minres" or "lsqr" algorithms that Octave lacks of too. So there will be no time wasted.

June 25, 2014

I will try to give a comprehensive feel of what I achieved in this first part of the Google Summer of Code, since it is time for the mid term evaluation. Let's start with an example: as usual, it is the Poisson equation, but today, as a twist, we consider a fully Neumann problem. In order for such a problem to be well posed there is the need of an additional constraint, otherwise the solution would not be unique, so in the Octave code there is the Lagrange multiplier c. Here you can find more details and the C++ and Python code, I will just write down the differential problem for convenience:

- Δu = f in Ω

∇u ⋅ n = g on ∂Ω

Here is the Octave code that solves the above mentioned problem:

pkg load fem-fenics mshufl start NeumannPoissonufl CG = FiniteElement '("CG", triangle, 1)'ufl R = FiniteElement '("R", triangle, 0)'ufl W = CG * Ruflufl "(u, c)" = TrialFunctions (W)ufl "(v, d)" = TestFunctions (W)ufl f = Coefficient (CG)ufl g = Coefficient (CG)uflufl a = "(inner (grad (u), grad (v)) + c*v + u*d)*dx"ufl L = f*v*dx + g*v*dsufl end# Create mesh and function spacex = y = linspace (0, 1, 33);mesh = Mesh(msh2m_structured_mesh (x, y, 1, 1:4));W = FunctionSpace ("NeumannPoisson", mesh);# Define variational problemf = Expression ('f', @(x,y) 10*exp(-((x - 0.5)^2 + (y - 0.5)^2) / 0.02));g = Expression ('g', @(x,y) - sin (5.0 * x));a = BilinearForm ("NeumannPoisson", W, W);L = LinearForm ("NeumannPoisson", W, f, g);# Compute solution[A, b] = assemble_system (a, L);sol = A \ b;solution = Function ('solution', W, sol);u = Function ('u', solution, 1);# Plot solution[X, Y] = meshgrid (x, y);U = u (X, Y);surf (X, Y, U);At the very beginning you can see a block with every line starting with ufl. That is what you would have to put in a separate UFL file before this summer. In a sense it is not plain UFL, but there are extra quotes and apices. They are needed because, using the current version of Octave, those brackets with commas inside would otherwise be interpreted as function calls. After this blocks closes with the ufl end line, the resulting UFL file is compiled to obtain a FunctionSpace, a BilinearForm and a LinearForm. These are oct-files that fem-fenics will use later on to define the corresponding variables in Octave. A robust implementation of ufl.m, the function that provides this binding to the UFL language, is one of the results of the first term.In the end of the snippet you can see that the solution u is evaluated in its domain exactly as you expect to do with a regular function taking two arguments and returning one value. This is due to the new subsref method of the function class, which is used to represent the elements of a function space. Aside from surface plots, this feature can be of interest to generalise methods that rely on analytical solutions to differential problems, or to apply basically any algorithm to such functions. Here is the plot you will obtain with this script:

I wrote in an earlier post of the interpolate function: with this you can get the representation of a Function or Expression on a given FunctionSpace. It is useful, for instance, to compare your numerical solution with an exact one you happen to know. Or, in the example above, you might want to view what is the forcing term like:

f_cg = interpolate ("f_cg", f, u);F = f_cg (X, Y);surf (X, Y, F);

There is one last achievement to highlight for the mid term evaluation: currently both the initial compilation of the package and all the ones performed just-in-time when importing UFL instructions proceed smoothly without user intervention. To this end, now the build system relies on pkg-config to get at once all the flags needed for proper compilation and linking, since some dependencies of dolfin, the FEniCS interface, are not to be found in standard directories. In order to exploit the extracted information also for the subsequent run time builds, the autoconf substitution is performed also in the get_vars.m auxiliary function, which in turn provides it to generate_makefile.m. An implementation detail that proved quite tricky is how to pass all the preprocessor flags to mkoctfile: only a subset of the options of g++ are hard-coded in it, so I needed to resort to a workaround. Indeed, CPPFLAGS are always passed as environment variables and not as command line flags, so that mkoctfile will just copy and deliver them to the real compiler.

To further enhance the build system, I implemented other internal functions that hash the UFL file that was compiled and, later, check it to understand if it changed between the previous and the freshly requested build. In the example above, you will find in your working directory four new files after a run: the three already mentioned oct-files and a text file storing the md5 sum of the UFL that has been imported. Until one of these files gets somehow deleted or the problem in the ufl block changes, you will not need to take on a time consuming compilation any more.

June 23, 2014

As said in the previous post, my latest result is the possibility to evaluate a fem-fenics function on a point of its domain. This way it is possible to generalise methods that, otherwise, would rely on analytical solutions. In [1] we have an example of such a method.

The paper deals with the deposition of nanoparticles in tissues, for the treatment of cancer. The phenomenon is described with a Monte Carlo simulation of these particles' trajectories, assuming that the velocity field of the carrying fluid is known. In this study, some simplifying hypotheses about the geometry of the cells and the fluid layer nearby allow for an analytical solution of the Stokes equation. Unfortunately, these assumptions do not hold generally in human tissues: for instance, in the liver cells have cubic shape, contrasting to the spherical one used in this paper. Now, if this method is implemented in Octave, we can solve numerically the Stokes equation on a realistic domain and obtain right away a more general approach to this significant application.

The evaluation of a fem-fenics function was already possible via the feval method, but it had some glitches. One aspect is that the solution of a differential problem could not be used as if it was a regular Octave function, then a user should have adapted his/her algorithms to take advantage of it. One more critical issue is that the previous implementation did not handle the exception raised by the underlying FEniCS method when it gets as argument the coordinates of a point outside of the domain, thus leading to a crash of Octave.

In order to address these problems, I added the subsref method to the function class and implemented the proper exception handling in feval. To avoid code duplication, the former relies on the latter for the real computation, so it basically just forwards the parameters after checking that the right type of indexing was used. As a result, it is now possible to solve the equations:

- ν Δu + ∇p = 0

∇ ⋅ u = 0

with relevant border conditions, on a proper mesh and finite element space, and then evaluate the solution with the Octave expression values = u (points), where points is a matrix holding the coordinates of every point where to do so, one per column. Moreover, a careless evaluation will not result in your Octave session crashing any more.

Even if this feature of the package underwent some improvement, there is still room for more. Two issues I have not addressed yet are the somehow weird interface and the possibility to create a function handle to perform evaluations with. Regarding the former, we might observe that the above mentioned expression remains exactly the same no matter what the geometrical dimension of the domain is. I should modify the implementation so that a vectorial function on a 3D space is evaluated with [ux, uy, uz] = velocity (x, y, z). Moving to the latter, in my understanding the class design should be modified to allow the exploitation of the Octave internals managing functions, so this would require a careful reflection on all the possible collateral effects of such a change.

June 16, 2014

The mid-term review is approaching, so it is time to highlight what is done, what is underway and what are the future goals. In this post I will try to do so as clearly as possible.

Mid-term review

The main effort during the first part of the Google Summer of Code was the implementation of the bindings for the UFL language in Octave. Now UFL code can be written directly in m-files, without the need of a separate file to define the problem. To this end, ufl has been implemented for opening a file, writing to it and importing the variational problem when it is complete.

Further, I implemented interpolate, which allows the interpolation of a Function or an Expression on a given FunctionSpace. This can be of interest to test the validity of a discretisation method, for instance if an analytical solution is available in closed form, so that it is possible to compare it with the numerically obtained one.

Lately, I focused on the build system, both for the package compilation and for the just-in-time ones needed to import variational problems in Octave. The former is now backed by pkg-config, so that all the proper compiling and linking options required by the dependencies are obtained at once. Thinking about the latter, this information is used to accordingly configure the get_vars function, which provides it to the one that generates the Makefiles used to compile oct-files just-in-time. In the end, currently these oct-files are compiled again only when necessity arises, for example if one of them has been deleted or if the UFL file has been changed.

In the upcoming week I will add another feature: it will be possible to get a function handle for the evaluation of a Function. This way the solution of a variational problem can be used exactly as any other function in Octave, for instance allowing the generalisation of algorithms relying on exact solutions of differential problems, which are thus limited to simple cases. I will provide some details on an application in my post about this feature.

Final review

In the second part of the project I will be mainly committed to the parallelisation of the package execution via MPI. As noted in an earlier post, the parallelisation through the OpenMP paradigm has been quickly abandoned because it does not provide a significant performance gain, while opening the way to bugs and errors. Parallelism is, anyway, an interesting feature for the package's use cases, so it will be the main goal of the final hand in.

June 15, 2014

One of the known issues of the fem-fenics package was related to the errors during the just-in-time compilation, due to missing include directories. Among my preliminary contributions there is a changeset addressing the problem in the initial build of the package, when installing it into Octave. Now I went on and solved it also during the usage of fem-fenics.

At the moment of the first build, autoconf is in charge of finding out the relevant compiler and linker flags through pkg-config. They are, then, substituted in the Makefile, which compiles the package making use of them. This piece of information is needed also when an UFL file is imported and transformed into the oct-files used to transform the weak formulation at hand into an algebraic system, but until now the user had to supply it by the means of an environment variable.

Currently, I added a new utility function that provides those flags. In the configuration process they are substituted also in get_vars.m, which is called by generate_makefile.m when a differential problem is imported. The latter replaces two placeholders and writes the ad hoc Makefile with all the necessary compile and link options. This way users will not need to provide compilation flags anymore, instead the package will manage this aspect on its own.

As noted in a previous post, however, this just-in-time build is relatively time consuming, taking around half a minute each time. Nonetheless, a common usage pattern could entail the resolution of the same weak formulation on varying meshes or with different boundary conditions, forcing terms or physical parameters. Every mentioned situation does not need the recompilation of the problem's oct-files, since they carry information only about the function space and the formal expressions of the bilinear form and the linear operator. It is useful, then, to take on the build process only when needed.

To add this feature, I created three function to perform appropriate actions. After every successful just-in-time compilation, save_hash.m takes care of hashing the imported UFL file and writing the result to <ufl-filename>.md5sum. On the other hand, at the beginning of every import_ufl_*.m function, a check like this is performed:

if (check_hash (var_prob) ||

! check_oct_files (var_prob, "Problem"))

You can see in it the remaining two functions implemented lately. The first one, check_hash.m, receives as argument the name of the variational problem, reconstructs the UFL file name, looks for a saved hash sum and compares it with the current file's. It returns true if the proper .md5sum file is not found or if the new and old hashes are not the same. Clearly, the oct-files should be rebuilt if one of them is missing: check_oct_files.m looks for the relevant files, with its second option stating which import is underway (thus, which files are expected as output), and returns true if they are all available.

These days I have worked on the implementation of the interface to the OpenMP-powered assembly offered by FEniCS. Despite being potentially a one-line intervention, it proved quite tricky: indeed, with the needed addition, the fem-fenics function for system assembly broke with a huge number of run time errors, probably due to a change in the underlying data structure that is transparent to the library users, but does not go unnoticed if you need to access it directly, as fem-fenics does. This led me to leave this functionality behind.

My choice is backed by some computational experiments. They show that the approach enacted by the FEniCS library is quite effective, with times required for assembly reduced by half using four threads instead of just one. However, they are negligible compared to the linear system solve phase, even when the OpenMP parallelisation is disabled. I used a great number of mesh nodes in order to have meaningful timings: even if linear systems took some minutes for resolution, the assembly phase lasted as much as a couple of hundredth of a second in serial code. If we add to these findings that the fem-fenics package requires a just-in-time compilation lasting around half a minute, we understand that there is no point in devoting effort for the implementation of this feature.

It has been a bit more than two week since my last posting. I just wanted something solid enough to show before doing it again :). Because one image is better than a 1000 words. This is the state of my project till now:

In green color is what it is finished and working (obvious...) and in pink what it is partially finished. Red stuff is not working at all.

ILUTP implementation:

As I did with ilu0 function, I started the implementation of ilutp using the IKJ variant of the Gaussian elimination as prof. Saad does in his book. For working efficiently with CCS(column compressed storage) structure of sparse matrices it is only needed a transposition before and after the process. So I came up with a working version without pivoting using this strategy a week before this post (src/ilutp_crs.cc file in the repository). All OK till that point. Well ... it was not all OK. When pivoting comes into play, all get messy. It is not feasible to do row-pivoting efficiently after transposing the matrix and using the CCS structure with the IKJ algorithm. What I realized is that Matlab, by default, implements for milu="col" and milu="off" options a JKI variant of the algorithm. This way row- pivoting can be used and no transposition is needed using the CCS structure. So for the whole last week I had to almost rewrite entirely the function to implement it in the JKI way. That was a serious delay because I was not familiar with that variant. On the other hand I also got to the conclusion that milu="row" option demands a IKJ implementation with column pivoting. It can be infer from the documentation:

Column pivoting means that if CCS is used as storage structure (Octave does), the strategy must be to [transpose - perform IKJ algorithm with column pivoting - transpose again]. So it is needed another implementation. That is the reason milu="row" is not working with ilutp. I had no time to implement that variant with pivoting. However, I have half way traversed because of my early IKJ implementation. So I am working on it.

I am taking special care to output exactly the same as Matlab, that means figuring out some nuances of their implementation that can only be understood after trial and error experimentation with their ilu version. I tried to test intensively the function and for my test cases my version outputs the same as Matlab's.

I have integrated the ilu0 and ilutp function inside a m-file wrapper called ilu.m located in the root directory of the repository. The file was written last year by Kai and need to be changed a bit. But for now it is OK to provide a user-friendly interface to try my functions. Use it the same way as you were in Matlab.

Just execute make in the root directory and then open the Octave interpreter inside it too.

For the next week I am planning to finish the implementation for the milu option in both ilu0 and ilutp. (You can find the files as src/ilutp.cc and src/ilu0.cc in the project directory)

P.D: For who cares about performance ( I do), my version is a bit faster than Matlab's. You can try it for big matrices. I did, and for low values of droptol (means few terms of the matrix will be dropped), using pivoting and relatively big matrices (5000x5000) my version lasted around 200 secs and Matlab 220 secs. For a 2000x2000 one, the times were 19secs Matlab's, 13 secs mine. The numbers are just for you to get an idea. But they are good news.

June 09, 2014

Note: in order to satisfy the exquisite tastes of today’s discerning internet readers, the following blog post is written in Cracked.com style.

We have been using open source for so long that we have forgotten, culturally, where it came from. It seems so natural and ubiquitous that we can no longer remember how things were before it. Some of us are young enough to have never even lived through times were open source wasn’t everywhere.

I am here to set the record straight on a few things, because I have noticed that even people who have lived through ye olden times have forgotten where things came from. Open source wasn’t spawned single-handedly by the sheer might of Linus Torvalds’s virtual gonads. Open source doesn’t mean that money is forbidden. Open source doesn’t mean that Richard Stallman is a twit.

Wait, wait, wait, let me get this straight. Open source was created… by a girl?!?!

OSI is an organisation that got together for a single purpose: to keep saying “open source, open source, open source” so much until everyone else was saying it too. This was all in February 1998, remember. That means open source is barely a year older than The Matrix. Neo had probably not even heard about it, because…

2. Nobody called it “open source” before OSI

The greatest testament to how good OSI’s marketing campaign was is that we have come to believe that the term is so natural that we always just called it that. They have convinced us all that “open source” was our idea, without needing to get into our dreams to do so.

… and from now on, you will worship penguins and red t-rexes.

Needless to say, it was not our idea. By far, the most common way to refer to “open source” before 1998 was “free software”.

Now, I know what you’re thinking. “Oh god, not this stupid flamewar again. Jordi, we know you’re a FSF-spouting propaganda drivel machine, why do you keep pushing the stupid term for open source that Richard Stallman keeps talking about?”

3. Open source has a precise definition

Now, here’s the thing: OSI didn’t just say, “here is open source, go wild and free, call anything you want open source!” Nope, in what might appear at first blush to be a cruel ironic twist, OSI did not make the definition of “open source” itself open source. In fact, they even trademarked “open source”, and ask that you only use the phrase according to their trademark guidelines!

Those controlling bastards trampling on our freedom with their smug little ®

Alright, so what does “open source” mean?

Well, in the beginning, Bruce Perens wrote the Debian Free Software Guidelines (there’s that pesky “free” term again). Then, he decided he was just going to grab those very same guidelines, run sed -i s/Debian/Open Source/g, and make that the official definition of open source.

This means that “open source” means a lot more than just “show me the code”. In particular it means that,

If you don’t let people sell it, it’s not open source.

If you don’t let people give it to their friends, it’s not open source.

If you don’t treat all receipients of your software equally, it’s not open source.

If you’re not a pinko commie ideologue, it’s not open source.

So why did OSI insist so much on a precise definition of open source? Well, because…

4. “Open source” is a synonym for “free software”

Okay, this is one that really gets people riled and the one where the flamewars arise. I am here to tell everyone that if you’re flaming over whether stuff is open source or if it’s free software, you guys need to chill the fuck out: everything that is open source is also free software, and vice versa.

I bet that declaration alone is gonna rile everyone up even more, eh?

This guy has tuned to perfection the built-in flamewar radar under his beard through years of hard labour in the Usenet grand banks.

Okay, let’s look at this from a different angle with an analogy. The issue here is with something that philosophers like to call intensionality vs extensionality.

Now, you might recognise that lady above, and you probably also know that England also has a queen, and by now my astute readers and you have doubtlessly put together that the Queen of Canada also happens to be the Queen of England. Two names for the same person!

However, Canada’s Constitution Act doesn’t actually specify “The Queen of Canada will be whoever occupies the position of Queen of England”. It just says that Canada has a queen and goes on to list the duties of said queen. This is called the intensionality, the words by which we describe what something is. The extensionality refers to the actual objects in the world that are described by these words. In this case, “Queen of Canada” and “Queen of England” could, perhaps, under some weird political shenanigans end up being two different people, but in practice they end up referring to the same person. So the extensionalities of “Queen of Canada” and “Queen of England” are the same.

It is the same with free software and open source. The definitions look different, but in practice the software that they refer to ends up being the same. Oh, sure, there are some very minor disagreements over whether this or that license is OSI-approved but not FSF-approved or vice versa, but the whole point of coining “OSI” was to have another word to refer to “free software”.

In other words, it was always OSI’s intention for “open source” to be a synonym for “free software”. Hell, even Bruce Perens said so. Why did OSI want a synonym?

5. Open source came with certain promises

The whole point of coining the phrase “open source” was to push a certain point of view. The biggest proponent for the “open source” phrase was Eric Raymond. He and OSI have always described open source as marketing for free software.

So this marketing campaign came with certain promises, promises that we have forgotten were ever part of a marketing campaign by OSI, because they’re so ingrained into open source itself. Stop me if you’ve heard any of these before…

Open source is a cheaper model to develop software

Open source ensures that software has fewer bugs, because more eyes can look at the source code

Release early, release often.

The best software is created by scratching an itch.

And so on… the whole point was to make free software attractive to business by de-emphasising the whole “freedom” part of it. Instead, OSI promised that by making your software open source, you would have better software, that open source was a better development model, leading to cheaper, less buggy software.

Less buggy? Really?

The “cheaper model” thing is also still a fairly popular meme nowadays. When you look at free projects in Ohloh.com, one of the lines is how much money it would have cost to build this or that under some model called COCOMO.

I’m not trying to say that OSI is right or wrong about its promises. Some free software really is less buggy than non-free variants. It probably is way cheaper to develop Linux when all of the big companies chip in a few developers here and there to maintain it. All I’m saying is that we have forgotten that with the word “open source”, certain promises came attached to it. Some of these promises might even appear to be broken in some cases.

So next time you hear someone tell you that there will be fewer bugs and everyone will come sending you patches the moment you reveal your source code, remember that they’re repeating campaign slogans. And remember that even if those slogans might not always be true, there might be other reasons why you should give everyone else freedom to enjoy and distribute and hack your software.

June 06, 2014

So this post made the rounds a couple of days ago, and it got me thinking… can Mercurial (hg) do any better? I think it can, especially with Evolve. Here is me describing how Evolve works:

As to the movie, if you have not seen it yet, you might want to wait until after you do, but the basic gist is a time-travel plot where they go back and fix timelines.

In the beginning

History is terribly wrong, an awful, crippling bug has been discovered way back in history, and it’s so terrible that a big chunk of current history has to be thrown out. Someone created evil sentinels, so evil that they decided to exterminate all mutants and most humans.

Preparing Wolverine

Professor X and Magneto brief Wolverine on his impending task. The history has been made public, but the situation is so hopeless that hg admin Kitty Pryde decides to operate on Wolverine’s repo, the only one that could withstand the changes:

Now Wolverine’s repo can endure any change. It’s a desperate move, but these are desperate times. Kitty sends Logan back:

$ hg update -r mystiques-first-kill

Making the fixes

Wolverine dispatches some minor thugs and squashes a few bugs, but the first change needs to alter the timeline,

$ hg amend -m "Attempt some wisecracks with some thugs"

137 new unstable changesets

Now all of the history that was based on top of this commit is unstable. It’s still there, for now, but things are rocky. Sentinels are approaching in the bad future and might kill everyone. Shit will get real there.

That’s ok, though, Wolverine is badass, doesn’t give a fuck, and goes about his business,

Finalising

At this point, the unstable history with the bad timeline is no longer needed. If the X-Men had wanted to keep any part of it, they might have used the hg evolve command, but they just want to forget the whole mess

June 03, 2014

These days I have been studying the documentation of the FEniCS project, mainly the FEniCS book, in order to understand the features related to parallel execution that it boasts. This preliminary study is aimed at adding them to the fem-fenics package. First of all I will summarise my findings, then I will comment the problems I need to address to implement this functionality.

Parallelism in FEniCS

FEniCS implements parallelism in such a way to be transparent to the user of the library. Moreover, it scales on different architectures, ranging from multi-core personal computers to distributed clusters. To this end, FEniCS makes use of two paradigms, which can be exploited both separately and together.

The first approach is tailored for shared memory architectures, such as the vast majority of the PCs nowadays, but also in many cases each node of a computational cluster. The implementation is based on OpenMP, and adding a simple instruction one can enable parallelisation to speed up the matrix assembly phase. It should be noted that this paradigm has little support in the underlying linear algebra libraries, so the resolution phase can take advantage of multi-threading only with the PaStiX solver. Since in a shared memory model parallel programs might suffer race conditions, the mesh is coloured to identify subsets, so that no two neighbouring elements belong to the same set. Obviously, the notion of proximity depends on the particular function space, then this is considered in the colouring algorithm. The assembly proceeds iterating over colours and splitting their nodes among threads: with this technique race conditions are avoided and the user can enjoy the benefits of parallelisation without incurring in unpredictable behaviour.

Contrasting to the first approach, the second paradigm is based on MPI and addresses the needs of distributed memory architectures. Unfortunately, the latter is less immediate than the former, requiring a DOLFIN program to be launched with the MPI execution utility, but in this case the code need not be modified. In this implementation, the mesh is split so that each process gets its part of it, with an algorithm striving to minimise inter-process communication. With scalability in mind, no single process holds the full matrix and, moreover, everything happens behind the scenes: this way the user has no need of taking care of low level issues. The distributed memory paradigm is diffusely supported in the algebraic back-ends, so it allows the usage of several solvers, both indirect and direct. As already noted, this and the previous approach can be combined, for instance distributing the computation on a cluster and further speeding up the assembly process enabling multi-threading within each node, provided they are multi-core machines.

The implementation in fem-fenics

The shared memory paradigm should be quite straightforward to implement in fem-fenics. I expect to operate on a couple of functions: the private generate_makefile and the two assemble and assemble_system. The former should have the proper compilation flag (-fopenmp) added. The latter should have a new line reading like:

dolfin::parameters["num_threads"] = femfenicsthreads;

The number of threads could be passed to those functions as an argument, but this would ruin the interface compatibility with FEniCS, so this is a poor approach. Another way of addressing the issue is to define a global Octave variable in PKG_ADD and store in it the desired number of concurrent threads to use for the assembly.

The implementation of the distributed memory paradigm, instead, seems quite tricky. Basically, Octave does not use MPI, at least not Octave core. Nonetheless, there are two Forge packages with this goal, mpi and parallel. I will go through the documentation of these packages to understand if and, in case, how they address the problem of launching the oct-file with mpirun or mpiexec. Even leaving this aspect aside, I still do not know how easily the distributed objects storing matrices and vectors can be accessed to obtain the whole data.

In conclusion, I will initially work to add shared memory parallelism, at the same time looking deeper into the issues related to the distributed memory paradigm, which I suspect of being more than the ones highlighted.

May 26, 2014

This week has been more about researching than coding. I have finally been able to reproduce the output from the ilutp(ilu with threshold and pivoting) Matlab's algorithm with an m-script (named ILU_pc.m in my project's directory). The fact is that Matlab does not implement the algorithm as is described in Yousef Saad's book in a few ways. Because of that I had to do reverse engineering, testing many cases and matrices. That is the function, ugly as hell, but is just for testing purposes.

for i = 1:n for j = i+1:n if (abs(A(i,j)) < (tau*norm(B(:,j)))) A(i,j) = 0; end end end end

The next goal to achieve is obviously to implement the function as .oct file translating this algorithm into a sparse one using Octave's internal data types.

All the testing I did was at college using their Matlab license. That delayed me because I couldn't do almost nothing in the weekend. Now I have a function that reproduce the behavior of Matlab's version I can test against it my c++ code.

May 23, 2014

This week I started my work on the ufl function: it is now possible to write ufl code on-the-go, directly in your m-files. You can see below how the Poisson.ufl file of the homonymous example provided with fem-fenics (on the left) can be translated to a snippet of Octave code:

How to use ufl

Basically, you just need to prepend what you would have written in your .ufl file with ufl. As you can see, anyway, there are also two new instructions. fem-fenics still needs to store your code in a separate file, which is then compiled using ffc, the FEniCS form compiler, but now ufl takes care of the process.

Your code should begin with the start command, and optionally with the name you want to assign to the file: in this example, we choose to open a new Poisson.ufl file. Be aware that ufl will not overwrite an existing file so, if you plan to use your script for several runs, my suggestion is to keep your working directory clean and tidy with a delete ('Poisson.ufl') after the snippet above.

When you are fine with your ufl code, the end command will tell ufl that it can compile and provide you with your freshly built problem. You can also specify options like BilinearForm (it is not the only one available, find a comprehensive list in the help message, in Octave), in case you wrote just part of the problem in your last lines.

What now?

A lot of commitment was devoted to this function. This is not due to intrinsic difficulties: a sketch of the function's code has been around for a while and the current implementation has not consistently slid away from it. The goal was to obtain a robust piece of code, since it will be the cornerstone of a new paradigm in fem-fenics usage. At least each and every example provided with the package needs to be modified to take advantage of this change, and this will be my next task.

May 19, 2014

As said in my previous post, I have been working on extending the implementation of interpolate to allow for an Expression as input. Currently it can also be used as in the Python dolfin interface, see here. Let's see how to use this new function in fem-fenics.

The Poisson equation

This example can be found in the FEniCS Book, it is the very first. The problem at hand is the Poisson equation with Dirichlet boundary conditions:

- Δu = f in Ωu = u0on ∂Ω

We will solve this problem on the unit square, with f constant and equal to -6 and u0 = 1 + x2 + 2y2﻿﻿. It can be verified that the exact solution is uex = 1 + x2 + 2y2. With the following ufl file:

element = FiniteElement("Lagrange", triangle, 1)

u = TrialFunction(element)

v = TestFunction(element)

f = Coefficient(element)

a = inner(grad(u), grad(v))*dx

L = f*v*dx

and Octave script:

pkg load fem-fenics msh

import_ufl_Problem ('Poisson')

# Create mesh and define function space

x = y = linspace (0, 1, 20);

mesh = Mesh(msh2m_structured_mesh (x, y, 1, 1:4));

V = FunctionSpace('Poisson', mesh);

func = @(x,y) 1.0 + x^2 + 2*y^2;

# Define boundary condition

bc = DirichletBC(V, func, 1:4);

f = Constant ('f', -6.0);

# Define exact solution

u_e = Expression ('u_ex', func);

a = BilinearForm ('Poisson', V, V);

L = LinearForm ('Poisson', V, f);

# Compute solution

[A, b] = assemble_system (a, L, bc);

sol = A \ b;

u = Function ('u', V, sol);

# Save solution

save (u, 'poisson');

# Interpolate and save the exact solution

u_e_int = interpolate (u_e, V);

save (u_e_int, 'exact');

it is possible to compute the numerical solution, interpolate the analytical one on the same function space and then compare them. Using a visualisation tool like Paraview, one can verify that the computed solution and the interpolation of the exact one are practically the same. This is due to the fact that the Finite Elements Method with triangle elements on a rectangular domain can exactly represent a second order polynomial, as the solution of the problem at hand.

Here you can see a good solution poorly post-processed in Paraview to the Poisson problem solved in the example.

Taking the idea from Kai's last year blog, I will keep track of what is already done with the following figure.

Regarding repository setup, Kai helped me to configure a subrepository using bitbucket service. At present, it only contains an outdated Octave development version just to make sure things work. For cloning:

hg clone https://edu159@bitbucket.org/edu159/octave-subrepo

However, I would not need to use this subrepo until the final integration of my code into Octave. For development purposes I have set another repository for daily work, as I am working with .oct files that compile standalone. Here is the repo you should check for my updated work.

May 17, 2014

Lately I have coded my first function for fem-fenics: it is interpolate, which wraps the homonymous method of the dolfin::Function class. This allows to interpolate a Function, G, on the FunctionSpace of FunctionF, even if they are not defined on the same mesh, with a call like this:

res = interpolate (F, G)

I am working on extending it to allow for an Expression as input. With this function it is possible to make a quantitative comparison between the results of different discretisation approaches or to check the accuracy of a method, comparing the computed solution and an analytically obtained one.

The implementation

I provide here a comprehensive overview of the code for interpolate. First of all, the number of input arguments is obtained and an octave_value is declared, in order to hold the output. Then there is a check ensuring that exactly two arguments are provided and no more than one output value is asked for.

After verifying these preliminary conditions, there are some instructions checking that the function type is loaded and, if necessary, registering it. This way Octave is aware of it and can store it in an octave_value.

Eventually, the real computation is performed. After checking that the inputs are of the function type, with a static_cast the actual objects are extracted from the arguments:

Here comes the tricky part. The classes in fem-fenics are designed to hold constant dolfin objects, but dolfin::Function::interpolate is not a constant method. In order to be able to call it, a local dolfin::Function is constructed, used to perform the interpolation, then fed to the function constructor and assigned to the return value:

implement in fem-fenics further FEniCS functionalities, preferably according to the FEniCS developers' directions;

improve the build and distribution system, so that end users can enjoy full functionality right away after installing from Forge.

details

I will address point 3 implementing an .m function which accepts strings as arguments and writes them to a file. There should be two keywords, such as start and end, to identify where the UFL input begins and finishes. After writing this code, it will be compiled when needed. This way UFL instructions could be written directly in .m files in the following manner:

To address point 5, instead, I will add instructions to PKG_ADD to automatically find out, through pkg-config, the proper flags to allow for the just in time compilation of .oct files. I will also add instructions to PKG_DEL to restore the environment at its previous state when the package is unloaded. This would allow end users to use the package without taking care of the problems reported here.

tentative agenda

22 April - 18 May: Study of the documentation and interaction with my mentor and the Octave community to understand thoroughly the code base and the contribution expected from my project

19 May - 22 June: First phase of the project. I will implement bindings for the UFL language in Octave and adapt accordingly the provided examples. I will also work on the build and distribution system to allow for feedback on it from the community. In the end, I will commit a first set of new functions

23 June - 27 June: Period for the submission of the mid-term review, I will double check the functionalities already implemented and improve their documentation

28 June - 10 August: Second phase of the project. I will improve the package performance, both reducing copies of matrices between Octave and dolfin and implementing checks to avoid useless just in time compilations. Furthermore, I will add a second, probably larger, set of new functions, as suggested by the community and FEniCS developers. I expect to code some new examples which make use of the freshly introduced capabilities

11 August - 22 August: Week devoted to final minor fixes and to the improvement of the documentation

My name is Eugenio, a student in Mathematical Engineering at Politecnico di Milano. This summer I will be working with GNU Octave to continue the implementation of fem-fenics, an Octave Forge package started in last year Google Summer of Code. It is intended as a wrapper of FEniCS, a general purpose finite elements library, and its goal is to provide such numerical methods in the familiar interface offered by Octave.

In this blog you will find up-to-date information about the state of my contribution.

April 16, 2014

I just spent 5 days at PyCon 2014 here in Montréal (3 days for the actual conference, 2 days sprinting), and wow, what a great conference that was.

There are many things I want to praise about the whole experience. The venue was great, the organisation was superb, the talks were interesting, the infrastructure was amazing, the atmosphere was friendly… but most of all, I think I want to praise the entire culture of inclusiveness that the Python community is trying to promote.

It is interesting that the only true common thread at the conference was a programming language (and not even that, sometimes, some of the talks were hardly about the Python programming language at all). Python was originally conceived as a programming language that was meant to be as easy as possible to understand. Whether it has succeeded from a purely language-design point of view is hard to say, and not everything about Python is great. The language has its gotchas here and there, just like any other language. And yet, despite not being a perfect language programming language, it’s able to bring together such a diverse group of individuals together to accomplish common goals.

Now, to be clear, I don’t think that PyCon has eliminated sexism or that we have “won” this battle. As I overheard someone say, PyCon will not be inclusive enough for women unless the lines for the women’s bathroom are as long as the lines for the men’s. And there are still many issues, such as women still being “invisible” and ignored, or as I overheard someone else say, she actually had to say to some guy to look up from her breasts while she was talking to him. It’s not there all the way yet.

This just seems like a good start. I hope next year at PyCon 2015, we’ll be able to get 50% women attendees and speakers!

March 25, 2014

My intention is upgrading some functions related with sparse matrices so they become compliant with Matlab and implement others that are not present in Octave right now. This is the list plus some comments about how I expect to do things.

1: I have already submitted a patch but maybe some modifications are needed to improve performance. Anyway I have mailed Rik (the last one that modified the source code of that functions) and has told me that he will give it a look in a couple of weeks.

2: I have not yet investigated enough to give a road map for implementing that function.

In this website there are codes written in several programming languages that implement the algorithms (the authors are the ones that written the paper that Matlab gives as reference in their documentation for both functions). I have emailed professor Michael Saunders about adapting them into Octave versions and he answered me that I am welcome to do while I respect the license (CPL or BSD licenses). They are meant to be very unrestrictive but I need to get informed about the compatibility with GPL3.

5: Here comes the big one. That function has a big chunk of options and the last year was almost implemented by Kai Torben as his GSOC project. He interfaced Octave with ITSOL/ZITSOL libraries but in the end there were some issues with that approach:

ILUTP algorithm did not work

He had to patch the library to get things work!

modified versions of algorithms ("milu" option) were not implemented in the libraries

That "ugly" scenario lead to finally not being able to include ITSOL as a dependency with Octave. Bottom line, the integration with the development repository could not be achieved.

My approach: I will write from scratch all the functions needed as oct-files (ILUTP, ILU0, ILUC and ILUT) for real and complex numbers implementing the algorithms described by Yousef Saad in his book "Iterative methods for sparse linear systems Ed. 2". I can use some of the code that Kai wrote, mostly the tests and the m-file "ilu.m" that glue together all the functions.

6: That function need less work since it is almost all implemented. Just some complex implementations are missing and the modified version of the algorithms too. Since Kai implemented those functions from scratch there are no dependency problems. That is nice.

Update 1: Kai has commented me that there are some license issues on the FORTRAN code he used (see here). So ichol related functions need to be implemented from scratch.

Update 2: Following Kai's recommendations I will focus on ilu/ichol functions for the GSoC period and just in case there is some time left go ahead with the other functions.

I have implemented the most basic function ilu0 that do the incomplete LU decomposition with 0-fill. I have drawn a table with the execution times of that algorithm using Matlab, my version and using the code that Kai implemented last year using ITSOL. The function right now can be used with real and complex numbers and has the milu='row' option implemented.

The table shows for a NxN sparse matrix with a number of non-zero elements NNZ, the time of execution (tic - toc was used).

N-NNZ

ILU0-mine

ILU0-Matlab

ILU0-ITSOL

50 - 683

0.000055 s

0.000065 s

0.00085 s

400 - 72435

0.027 s

0.024 s

0.04 s

2000 - 1805571

3.35 s

3.25 s

4.88 s

5000 - 6482839

14.2 s

14.5 s

22.75 s

It can be seen that the implementation using ITSOL is the slowest. Maybe just because the overhead of copying and translating data back and forth between Octave and ITSOL. Between my version and Matlab's there is almost no difference.

Here you can download the code (ilu0.cc). It does not have any test nor documentation written yet.

February 26, 2014

Octave has been accepted into Google Summer of Code, for our first time as an independent organization! Student applications are due March 21, and decisions about accepting students will be made in early April.

To make GSoC a success, we need not only strong student programmers but also committed mentors to supervise the students and work with them to have contributions be useful for Octave. This can be tough to achieve because mentors are unpaid (though Google may send you a nifty T-shirt).

Each project should have a primary mentor and a backup mentor. In my experience, primary mentors should plan to devote 5-10 hours per week to the project, including speaking with the students and reviewing their code. Backup mentors have a smaller time commitment but still should keep up with project progress and be available to step in when the primary mentor is otherwise occupied.

If you’d like to be a project mentor:

1) Go to the wiki projects page and put your name next to projects you’d be interested in mentoring. Feel free to improve the project description at the same time.

February 01, 2014

I am cautiously hopeful for bitcoin. I just ate pizza and poutine with a group of friends, paying at the restaurant with bitcoins!

Acquiring the bitcoins

I am not a speculator. I am not an investor. I am not a miner. I am not trying to get rich nor make any money whatsoever by manipulating bitcoins. I just want to be able to earn and spend money on the internet without needing the permission of a bank, or the permission of Paypal, or making it anyone’s business but mine and the person I’m actually doing business with.

I acquired some bitcoins about a year ago, at the time it was about 1.8 bitcoins worth around 30 USD. I did not buy them. I did not invest on bitcoins. I did not mine them. I received them as a payment for tutoring mathematics in an IRC channel for an hour and a half. I earned these bitcoins through my labour, just like I would earn any other currency.

At the time there was not much I could do with bitcoins. Although I freely encouraged and accepted the payment in bitcoins, I did it more with amusement than conviction. I am not a criminal, but since at the time Silk Road still had a sort of forbidden underground allure, my client suggested that I could use the bitcoins to buy illegal drugs on the black market. I had no intention to do so, but I decided to keep the bitcoins around, as a cautious hope I could use them for something else.

First purchases

I kept the bitcoins for quite some time, since I could not find anything to spend them on. The first thing I could find that seemed interesting was a 4chan “account” (in reality, just an exemption from its captcha). Despite the website’s bad reputation as being the cesspool of the internet, I think that there is value in its pro-anonymity ethos, something that is slowly being eroded away in today’s online world. This was also my first test case for the possibility of anonymous online currency. In keeping with this ethos, I made the payment, and when there was a slight hiccup with the transaction, used a throw-away email address to resolve the issue. No chargebacks, but people are still people and are still mostly nice. I thus obtained my 4chan captcha exemption. I have not really used it since then, but I am satisfied knowing that I have made a small contribution towards promoting anonymity.

I kept my eye out for news stories of more opportunities to spend bitcoins on. The currency seemed to be slowly gaining more adoption, especially for non-material goods and services. My next purchase was one month of reddit gold as a gift for a particularly witty commentator. These two purchases together had already given a significant blow to my bitcoin balance, but I was not duly concerned. After all, this was just pocket change I acquired for 90 minutes of a hobby task.

Then, suddenly, over a couple of weeks the bitcoin price exploded from 20 USD per bitcoin to over 1000 USD per bitcoin. I didn’t exactly become a millionaire, but my paltry fraction of a bitcoin now had the power to buy more things.

I finally got around to reading Satoshi Nakamoto’s whitepaper. Wow. I don’t know if this person or group is a genius or just a child of the times, but Nakamoto seems to have solved a cryptography protocol problem that nobody had solved before. Nakamoto didn’t invent the idea of cryptocurrencies, but merely built upon the ideas of others in order to build a decentralised system. I was duly impressed.

Food for bitcoins

I eagerly announced yesterday on IRC that I was planning to go to this restaurant to try out my bitcoins. Another chatter local to Montréal decided to tag along for fun. We made plans to go that very evening for a night of socialising, three couples. I eagerly explained to everyone my excitement over finally being able to pay someone face-to-face with bitcoins.

My fiancée and I got to Montréal Poutine in the Old Port a few minutes earlier than everyone else. It was a small location, hardly more than a greasy spoon, but par for the course for a poutine joint. There were signs all over the establishment announcing the possibility of paying with bitcoins and a wifi password on their chalkboards.

We eyed the menu, and I nervously made a few preliminary checks to ensure the transaction would go smoothly. Due to one of my many quirks, I do not own a pocket computer (“smartphones”, neither smart and hardly phones), so I was prepared to pay with my laptop and a webcam for scanning a QR code. I ensured that the internet connection was stable and that I could scan a QR code. As instructed in bitcoinaccepted.ca (or in payezenbitcoin.ca, because this is .ca), I proudly announced to my server that I was going to pay with bitcoins. “Oh, wow, it’s my first time,” she said in French. “Mine too,” I replied with a smile.

Our company arrived, the two other couples. We ordered various kinds of poutine, pizza, avocado fries, beer and mineral water. We chatted about many things, and soon predictable discussions about bitcoins ensued. Their method of operation, their security, their volatility but recent relative stability. Musings if they would ever work or not. The more techy in the bunch were showing signs of optimism, while those who found the idea more foreign were predicting the eventual doom for bitcoins.

After a very pleasant evening, it was time to pay the bill.

The moment of truth

I announced that I had enough bitcoins to pay for all the food, but I asked everyone else to divvy up the drinks between them. I also told them that I had read that tips were not accepted in bitcoins yet, so that would have to be paid elsehow.

I readied my laptop and the webcam, to the growing amusement of my companions. The server came with one bill for food and another for drinks. I reminded her that I was going to pay with bitcoins, but I wasn’t quite sure what to expect. I had seen videos online of handheld machines that displayed a QR code with the bitcoin address to pay to, and that was my guess for what I would see. Instead, she pointed me to a static QR code pasted next to the cash register. Was this the bitcoin address to pay to? I walked with my baroque laptop-and-webcam rig over to the cash register, in good fun as I heard my giggling friends behind me.

I used Electrum to scan the QR code, looking for the address. Instead of a bitcoin address, this was a Bitpay URL. I wasn’t quite sure what to do with it, and I floundered for a few seconds. I opened a generic QR code scanner to get the URL. One of the guys with us helpfully walked over to help me scan the QR code, but I had already managed to get the URL loaded in Firefox by this time. The server was re-reading the instructions next to the QR code on how to pay.

At the Bitpay URL, there were two fields to fill out: an order number and the amount in CAD to pay. The server read the instructions again and said to leave the order number blank. I filled in the amount with taxes, showed it to her, and we agreed it was correct. I clicked enter. A bitcoin address showed up. I went back to Electrum to send money to that address. I stumbled when typing my wallet password. The server, getting the hang of the transaction, said bemusedly, “I cannot help you with that.” Finally I got it, and the familiar “this invoice has been paid” message showed up on the Bitpay website.

A few seconds passed, and the restaurant staff confirmed on another computer that they had received the payment. I showed everyone at the table the paid invoice on my laptop screen. A few other customers in the restaurant were showing some interest. I announced to everyone, “it’s ok, bitcoin has worked, I made the payment!”

Everyone relaxed and proceeded to tackle the more familiar problem of paying for the drinks with more conventional currency.

Future of bitcoins

I don’t know if bitcoin is going to crash or not. I don’t understand enough economics to comment on what the value of bitcoins is, or if it really is different from the value that we give to any other currency. I have a rough understanding of the cryptography behind it and what makes the whole thing work. I know people are working on making bitcoins more accessible to people less enthusiastic than me, and I very much wish they succeed.

All I know is that so far bitcoins have worked for me. I look forward to getting better acquainted with this new way to conduct business. I wish good fortune to everyone else who is also trying to build a new currency for an internet-powered age.

December 15, 2013

RATTLE

Constrained mechanical systems form an important class of differential equations on manifolds. For all the theory that I present here and I've used for the implementation I refer to "Geometric Numerical Integration" by Hairer, Lubich and Wanner.

Symplectic Euler method can be extended to constrained systems but we focus on SHAKE and RATTLE algorithms. SHAKE is a 2-steps algorithm so that, since I'm implementing only 1-step algorithms and the overall structure of solvers and integrators is made for 1-step solvers, I implemented just RATTLE algorithm.

December 09, 2013

In this first post I want to explain the organization of the code that I'm going to implement. The main idea is to have a structured organization and to subdivide the code according to the most important operations which must be executed. So that should be easier to optimize the bottlenecks and to extend the code with new functionalities.

We will have two main actors: the steppers and the integrate functions.

Steppers

A stepper will be a function and will represent the numerical method used for the integration. Its job is to execute just one integration step and its signature will be:

[x_next,err] = stepper(f,x,t,dt)

x_next is the solution at the next step, err is an estimation of the error (obtainable for example with Richardson Extrapolation), f is a function handle representing the equations to be integrated, x is the solution at the current time t and finally dt is the time step.

Inside this function I'll check if err is really requested by means of nargout keyword. The estimation of the error will be useful to determine the optimal dt in adaptive integrators.

To set the parameters for both the numerical method and the solver I'll use odeset and odeget which are not embedded into Octave but are functions of Odepkg. My intent is to edit these functions in order to make them suitable for my parameters setting.

Integrate Functions

An integrate function will be the function executing the algorithm of integration on more steps. We will have different integrate functions depending on how we want to obtain the solution. For example we can have:

integrate_const(stepper,f,x0,t0,t1,dt) integrating from t0 to t<=t1 with a fixed dt;

integrate_adaptive(stepper,p,f,x0,t0,t1,dt) integrating from t0 to t1 with an adaptive timestep, where p is the order of the stepper.

A simple example: Forward Euler

In this part I'll show you an example of two simple steppers (fe_richardson and fe_heun). Both the steppers make use of the Forward Euler method to find the new solution: $$x_{k+1} = x_{k} + h f(t_{k},x_{k})\ .$$ They differ in the way the error is estimated: fe_richardson makes use of Richardson Extrapolation while fe_heun makes use of Heun-Euler method.

The Heun-Euler method combines the Heun method, which is of order 2, with the Euler method, which is of order 1. This combination allows to get an estimation of the local error. We can write the Heun method as $$\begin{aligned} \tilde{x}_{k+1} &= x_{k} + h f(t,x_{k}) \\ x_{k+1}^{*} &= x_{k} + \frac{h}{2} (f(t,x_{k}) + f(t+h,\tilde{x}_{k+1})) \end{aligned}$$ So that the error estimation is given by: $$ e_{k+1} = \tilde{x}_{k+1} - x^*_{k+1}=\frac{h}{2}(f(t+h,\tilde{x}_{k+1}) - f(t,x_{k}))$$ This is the implementation of the fe_heun method:

The adaptive integrator

Finally I'll describe how integrate_adaptive is implemented. As i said this function does the integration on the total interval [t0,t1] adapting the timestep at each iteration, by means of the error estimation returned by the stepper. This function takes as input:

stepper: the stepper;

p: the order of the stepper;

f: the function to be integrated;

x0: the initial condition;

t0, t1: the extremes of the time interval;

dt: the first timestep.

While the actual time is less or equal to the final instant, this function recursively calls the stepper and then updates the solution, the time and the next timestep. Finally, since the very last instant may be greater than t1, the function substitutes the last element of the solution with a linear interpolation calculated in t1 between the last two elements of the solution.

The method used to calculate the next timestep is taken from the book of Shampine, Gladwell and Thomson. tau is a parameter which now is fixed but in the future will be setted by means of odeset and odeget functions.

How it will be at the end

At the end, functions like ode45 will be just aliases. For example, we define a function called odefwe which does the integration by means of Forward Euler method, adapting the timestep with Richardson Extrapolation. The call to this function will just hide a call to integrate_adaptive function in this way:

Odeset & odeget

Odeset and odeget functions allow to build a valid ODE options structure. They are already available in Octave odepkg but they are not perfectly compatible with MATLAB odeset and odeget functions. Furthermore, for geometric integrators like Spectral Variational Integrators, I will need new options into the ODE options structure, which now are not admitted in Octave odepkg.

So that I have written my own versions of odeget.m, odeset.m and the new function ode_struct_value_check.m in order to have full compatibility with MATLAB odeset and odeget (their same behaviour on the basis of their official documentation) and also the possibility to add new field names which i will need for future solvers of this project.

The new fields are the union of MATLAB ODE options and Octave ODE options, plus my new options (default values are in square brackets):

The last usage is needed for MATLAB compatibility and represents a fast access to the given structure with no error checking on options names.

Fuzzy string comparison

Users may do mistakes while typing ODE option names or they may want to write everything in low cases for a faster coding. So that my odeset and odeget functions behave in a user-friendly way, calling the fuzzy_compare function which determines the distance of each structure option name from the given word and returns the indices of those options whose distance is under a given tolerance.

The idea is that word cases are not relevant, if we want an exact matching then the tolerance will be 0, if no index is returned then it will definitely be a typing error, if one index is returned but words don't match exactly a warning will be displayed saying that the program is going on assuming that the user was intending the closest option name, otherwise all the fields whose distance is under the tolerance will be displayed and the user will be asked to insert the name again.

Signature of this function follows.

res = fuzzy_compare(string1,string_set,correctness)

string1 must be a string and is the option name we're looking for;

string_set must be a cell array of strings or a column vector of strings; it represents the set of option names;

correctness is the tolerance we want. It is an optional input argument;

We may set correctness equal to 'exact' or 0 and in this case fuzzy_compare will look only for exact matching; if we set correctness to any positive integer it will be the tolerance of the method (that is the maximum distance, from the given string, accepted). If we don't specify anything for correctness then the tolerance will be calculated as a function of the minimum distance and of the length of the given string: the less the distance, the less the tolerance; the greater the length, the greater the tolerance.

tolerance = minimus + floor(length(string1)/4)*floor(minimus/3)

There exist many definitions of distance between strings. I've chosen the Levenshtein distance.

Levenshtein distance

The Levenshtein distance is a string metric and is equal to the minimum number of single-characters edit (insertion, deletion and substitution) required to change one word into the other. This is the main algorithm of levenshtein function:

From these results it can be derived that the local truncation error (LTE) is $$ \mathbf e_{k+1} = \mathbf y_{k+1} - \mathbf u(t_{k+1}) = \mathbf o({\Delta t}^2)\ .$$

Inexact Newton

Now we have to solve the non-linear system built up with the backward Euler method. We want to use an inexact Newton solver. The classical Newton method is written as: given a non-linear functions system F(x) = 0 with a given initial condition, solve the linearized system: $$ \mathbf F'(\mathbf x_k)\mathbf s_k + \mathbf F(\mathbf x_k) = \mathbf 0 \\ \mathbf x_{k+1} = \mathbf x_k + \mathbf s_k\ .$$ Go on iterating this method until a given tolerance is reached.

In many cases, especially when we don't have an explicit expression for the Jacobian or it can't be inverted, we can use an inexact Newton method: $$ || \mathbf F'(\mathbf x_k)\mathbf s_k + \mathbf F(\mathbf x_k) || \leq \eta_k || \mathbf F(\mathbf x_k)||\ ,$$ where \(\eta_k\) is said to be the forcing term. The linearized system can be solved with an iterative method such as GMRES or preconditioned conjugate gradient.

Choosing the forcing term with one of the two previous possibilities, rather than imposing a fixed tolerance, it is possible to avoid the phenomenon of oversolving. Another advantage is that if we use the GMRES as the linear solver it's not necessary to know the Jacobian nor to datermine it, because we can approximate the term $$ \mathbf F'(\mathbf x_{k})\mathbf s_k $$ with a first order finite differences formulae: $$ \mathbf F'(\mathbf x_k)\mathbf s_k \approx \frac{\mathbf F(\mathbf x_k + \delta\mathbf s_k) - \mathbf F(\mathbf x_k)}{\delta} \ .$$

Variational integrators

The variational integrators are a class of numerical methods for mechanical systems which comes from the discrete formulation of Hamilton's principle of stationary action. They can be applied to ODEs, PDEs and to both conservative and forced systems. In absence of forcing terms these methods preserve momenta related to symmetries of the problem and don't dissipate energy. So that they exhibit long-term stability and good long-term behaviour.

Considering a configuration manifold V, the discrete Lagrangian is a function from V to the real numbers space, which represents an approximation of the action between two fixed configurations: $$ L_d(\mathbf q_k,\mathbf q_{k+1}) \approx \int_{t_k}^{t_{k+1}} L(\mathbf q,\dot{\mathbf q};t) dt \hspace{0.5cm}\text{with}\hspace{0.4cm}\mathbf q_{k},\mathbf q_{k+1}\hspace{0.3cm}\text{fixed.}$$ From here, applying the Hamilton's principle, we can get the Euler-Lagrange discrete equations: $$D_2L_d(\mathbf q_{k-1},\mathbf q_k) + D_1 L_d(\mathbf q_k,\mathbf q_{k+1}) = 0\ ,$$ and thanks to the discrete Legendre transforms we get the discrete Hamilton's equation of motion: $$\left\{\begin{array}{l} \mathbf p_k = -D_1 L_d(\mathbf q_k,\mathbf q_{k+1}) \\ \mathbf p_{k+1} = D_2 L_d(\mathbf q_k,\mathbf q_{k+1}) \end{array}\right.\ ,$$ so that we pass from a 2-step system of order N to a 1-step system of order 2N. This system gives the updating map: $$ (\mathbf q_k,\mathbf p_k)\rightarrow(\mathbf q_{k+1},\mathbf p_{k+1})\ .$$ For all the theory behind this I refer to "Discrete mechanics and variational integrators" by Marsden and West.

odeSPVI

Within odeSPVI I implemented a geometric integrator for Hamiltonian systems, like odeSE and odeSV, which uses spectral variational integrators with any order for polynomials of the basis and with any number of internal quadrature nodes, for both unidimensional and multidimensional problems.

[T,Y]=odeSPVI(@hamilton_equations,time,initial_conditions,options)

This solver just uses the stepper spectral_var_int.m which is the function where I implemented the resolution of the system displayed above; hamilton_equations must be a function_handle or an inline function where the user has to implement Hamilton's equations of motion: $$ \left\{\begin{array}{l} \dot{\mathbf q} = \frac{\partial H}{\partial \mathbf p}(\mathbf q,\mathbf p)\\ \dot{\mathbf p} = - \frac{\partial H}{\partial \mathbf q}(\mathbf q,\mathbf p) + \mathbf f(\mathbf q,\mathbf p)\end{array}\right.$$ where \(H(\mathbf q,\mathbf p)\) is the Hamiltonian of the system and \(\mathbf f(\mathbf q,\mathbf p)\) is an optional forcing term.

At lines 22 and 56 I put a comment line to show that it is possible to solve the implicit system also with inexact_newton (which is the one explained in the previous post on backward Euler), so it's not mandatory to use only fsolve. I will make it an option in order to let the user choose which solver to use.

Another important aspect to point out is that in case of an adaptive timestep the error is estimated using a new solution on the same timestep but with polynomials one order higher and one more internal node for the quadrature rule. Furthermore, for this last solution, the starting guess for fsolve is chosen in a non-trivial way: at line 53 we see that y0 has in the first q_dofs_err-1 rows the same modal values calculated before for the new solution x_next and a row of zeros just below. Then the starting nodal values for the momenta are set (lines 47-51) as a trivial average of new solution nodal values. This can seem wrong but I empirically stated that the number of iterations done by fsolve are the same of cases in which the reinitialization of nodal values is more sophisticated, so that is less computationally expensive to do a trivial average.

In the following code is shown the implementation of the system to be solved:

Here, at line 46, I don't show the implementation of the Jacobian of the implicit system because, with this visualization style, it may appear very chaotic. The important aspect is that the user can use the implemented Jacobian to speedup the computation. I stated that it really allows to speedup the computation when using fsolve as solver. Furthermore the code tries to see if the user has implemented, in the function defining the Hamilton's equations, also the Hessian of the Hamiltonian (which is needed int the computation of the Jacobian of the system). If the function passes as second parameter the Hessian, then the program uses that one and the computation is faster, otherwise it approximates the Hessian with the following function and the computation will be slower:

The Hessian of the Hamiltonian must be stored in a particular way (that I have to optimize yet, but the actual one works fine too) which is showed in the following example which is the definition of the Hamilton's equations for the armonic oscillator:

December 02, 2013

Symplectic Euler method

The symplectic Euler method is a modification of the Euler method and is useful to solve Hamilton's equation of motion, that is a system of ODE where the unknowns are the generalized coordinates q and the generalized momenta p. It is of first order but is better than the classical Euler method because it is a symplectic integrator, so that it yelds better results.

Given a Hamiltonian system with Hamiltonian H=H(t;q,p) then the system of ODE to solve writes: $$\left\{\begin{array}{l} \dot{q} = \frac{dH}{dp}(t;q,p) = f(t;q,p)\\ \dot{p}=-\frac{dH}{dq}(t;q,p) = g(t;q,p) \end{array}\right. $$

From E. Hairer, C. Lubich and G. Wanner, "Geometric Numerical Integration" we can state that the semi-implicit Euler scheme for previous ODEs writes: $$ \left\{\begin{array}{l} p_{k+1} = p_k + h g(t_k;q_k,p_{k+1}) \\ q_{k+1}=q_k + h f(t_k;q_k,p_{k+1}) \end{array}\right. $$ If g(t;q,p) does not depend on p, then this scheme will be totally explicit, otherwise the first equation will be implicit and the second will be explicit.

The function that I created to make use of this integrator is odeSE.m and it passes to the integrate functions the stepper defined in the function symplectic_euler.m. The signature of odeSE is similar to those of the other ode solvers:

[t,y]=odeSE(@ode_fcn,tspan,y0)[t,y]=odeSE(@ode_fcn,tspan,y0,options)

It's important to know that y0 must be a vector containing initial values for coordinates in its first half and initial values for momenta in its second half; the function handle (or inline function) ode_fcn must take exactly two input arguments (the first is the time and the second is the vector containing coordinates and momenta such as in y0) and returns a vector containing dq/dt in its first half and dp/dt in the second half.

options variable can be set with odeset and, if the system of ODE is explicit, the field options.Explicit can be set to 'yes' in order to speedup the computation. If tspan has only one element, then options.TimeStepNumber and options.TimeStepSize must not be empty, so that it will be used the integrate function integrate_n_steps.

Velocity-Verlet method

The velocity-Verlet method is a numerical method used to integrate Newton's equations of motion. The Verlet integrator offers greater stability, as well as other properties that are important in physical systems such as preservation of the symplectic form on phase space, at no significant additional cost over the simple Euler method.

If we call x the coordinate, v the velocity and a the acceleration then the equations of motion write: $$\left\{ \begin{array}{l} \frac{dx}{dt} = v(t,x)\\ \frac{dv}{dt} = a(t,x) \end{array} \right. $$

This method is one order better than the symplectic Euler method. The global error of this method is of order two. Furthermore, if the acceleration indeed results from the forces in a conservative mechanical or Hamiltonian system, the energy of the approximation essentially oscillates around the constant energy of the exactly solved system, with a global error bound again of order two.

The function that uses velocity-Verlet scheme is odeVV.m and the stepper called at each iteration is velocity_verlet.m. The signature of odeVV is the same of those of others ODE solvers:

[t,y]=odeVV(@ode_fcn,tspan,y0)[t,y]=odeVV(@ode_fcn,tspan,y0,options)

The documentation of input arguments is the same descripted in the previous symplectic Euler section, but there is the difference that now the function ode_fcn must return a vector containing the velocities in its first half and the expression of the acceleration in the second half.

Stormer-Verlet method

The Stormer-Verlet scheme is a symplectic integrator of order two, useful to integrate Hamiltonian systems in the form described in the previous symplectic Euler section. It is characterized by the approximation of the integral defining the discrete Lagrangian $$ L_d(q_k,q_{k+1})\approx\int_{t_k}^{t_{k+1}}L(t;q,\dot{q})dt $$ with the trapezoidal rule. So that $$ L_d(q_k,q_{k+1}) = \frac{h}{2}\bigg( L\Big(q_k,\frac{q_{k+1}-q_k}{h}\Big) + L\Big(q_{k+1},\frac{q_{k+1}-q_k}{h}\Big) \bigg)\ .$$

November 15, 2013

Ode solvers options

What I want to do at this moment of the project is to add to all the solvers I've written since now, the ability to manage all the possible options of the ODE structure (defined previously in odeset structure) that are already managed in corresponding MATLAB ode solvers; so that the solvers I've written will be totally MATLAB-compatible and this will be the same also for the new solvers I'm going to implement.