Posts Tagged ‘julia’

Introduction

Julia is a programming language that is used quite extensively by the scientific community. It is open source, it just reached its version 1.0 milestone after quite a few years of development and it is nearly as fast as C but with many features associated with interpretive languages like R or Python.

There don’t seem to be many articles on getting all up and running with Julia, so I thought I’d write about some things that I found useful. This is all based on playing with Julia on my laptop running Ubuntu Linux.

Run in the Cloud

One option is to avoid any installation hassles is to just run in the cloud. You can do this with JuliaBox. JuliaBox gives you a Jupyter Notebook interface where you can either play with the various tutorials or do your own programming. Just beware the resources you get for free are quite limited and JuliaBox makes its money by charging you for additional time and computing power.

Sadly at this point, there aren’t very many options for running Julia in the cloud since the big AI clouds seem to only offer Python and R. I’m hoping that Google’s Kaggle will add it as an option, since the better performance will open up some intriguing possibilities in their competitions.

JuliaBox gives you easy direct access to all the tutorials offered from Julia’s learning site. Running through the YouTube videos and playing with these notebooks is a great way to get up to speed with Julia.

Installing Julia

Julia’s website documents how to install Julia onto various operating systems. Generally the Julia installation is just copying files to the right places and adding the Julia executable to the PATH. On Ubuntu you can search for Julia in the Ubuntu Software App and install it from there. Either way this is usually pretty easy straight forward. This gives you the ability to run Julia programs by typing “julia sourefile.jl” at a command prompt. If you just type “julia” you get the REPL environment for entering commands.

You can do quite a lot in REPL, but I don’t find it very useful myself except for doing things like package management.

If you like programming by coding in your favorite text editor and then just running the program, then this is all you need. For many purposes this works out great.

The Juno IDE

If you enjoy working in full IDE’s then there is Juno which is based on the open source Atom IDE. There are commercial variants of this IDE with full support, but I find the free version works just fine.

To install Juno you follow these instructions. Basically this involves installing the Atom IDE by downloading and running a .deb installation package. Then from within Atom, adding Julia support to the IDE.

Atom has integration with Julia’s debugger Gallium as well as provides a plot plane and access to watch variables. The editor is fairly good with syntax highlighting. Generally not a bad way to develop in Julia.

Jupyter

JuliaBox mentioned above uses Jupyter and runs it in the cloud. However, you can install it locally as well. Jupyter is a very popular and powerful notebook metaphor for developing programs where you see the results of each incremental bit of code as you write it. It is really good at displaying all sorts of fancy formats like graphs. It is web based and will run a local web server that will use the local Julia installation to run. If you develop in Python or R, then you’ve probably already played with Jupyter.

To install it you first have to install Jupyter. The best way to do this is to use “sudo apt install jupyter”. This will install the full Jupyter environment with full Python support. To add Julia Jupyter support, you need to run Julia another way (like just entering julia to get the REPL) and type “Pkg.add(“IJulia”)”. Now next time you start Jupyter (usually by typing “jupyter notebook”), you can create a new notebook based on Julia rather than Python.

Julia Packages

Once you have the core Julia programming environment up and running, you will probably want to install a number of add-on packages. The package manager is call Pkg and you need to type “using Pkg” before using it. These are all installed by the Pkg.add(“”) command. You only need to add a package once. You will probably want to run “Pkg.update()” now and again to see if the packages you are using have been updated.

There are currently about 1900 registered Julia packages. Not all of them have been updated to Julia version 1.0 yet, so check the compatibility first. There are a lot of useful packages for things like machine learning, scientific calculations, data frames, plotting, etc. Certainly have a look at the package library before embarking on writing something from scratch.

Summary

These are currently the main ways to play with Julia. I’m sure since Julia is a very open community driven system, that these will proliferate. I don’t miss using the giant IDEs Visual Studio or Eclipse, these have become far too heavy and slow in my opinion. I find I evenly distribute my time between using Jupyter, Juno and just edit/run. Compared to Python it may appear their aren’t nearly as many tools for Julia, but with the current set, I don’t feel deprived.

Introduction

I was just watching an old episode of “Mayday: Air Crash Investigations“, on the crash of a Russian passenger jet with a DHL cargo plane over Switzerland. In this episode, both planes had onboard collision avoidance systems, but one plane listened to air traffic control rather than the collision avoidance system and went down rather than up, resulting in the collision. In reading about the programming language Julia recently, I had noticed several presentations on the development of the next generation of collision avoidance systems, in Julia. This piqued my interest, along with the fact that my wife is currently getting her pilot’s license, to have a slightly deeper look into this.

Modern airliners have employed an onboard Traffic Collision Avoidance Systems (TCAS) since the 1980s. TCAS is required on any passenger airplane that takes more than 19 passengers. These systems work by monitoring the transponders of nearby aircraft and determining when a collision is imminent. At this point it provides a warning to the plane’s pilot along with a course of action. The TCAS systems on the two aircraft communicate so one plane is ordered to go up and the other to descend.

Generally there are three layers to collision avoidance that operate on different timescales. At the coarsest level planes travelling in one direction are required to be at a different altitude than planes in the reversion direction. Usually one direction gets even altitudes like 30,000 feet and the reverse gets odd altitude like 31,000 feet. At a finer level, air traffic control is responsible for keeping the planes apart at medium distances. Then close up (minutes apart) it is TCAS’s job to avoid the collisions. This is partly due to the aftermath of the Russian/DHL crash and partly due to a realization that the latency in communications with air traffic control is too great when things get too close for comfort.

Interestingly it was the collision of two passenger plane’s over the Grand Canyon in 1956 that caused congress to create the FAA and started the development of the current TCAS system. It took thirty years to develop and deploy since it required computers to get much smaller and faster first.

Why Julia

The FAA has funded the development of the next generation of traffic avoidance which has been dubbed ACAS X. This started in 2008 and after quite a bit of study, it was decided to use Julia extensively in its development. Reading the reasons for why Julia was selected is rather scary when you consider what it highlights about the current TCAS system.

Problem 1 – Specifications

A big problem with TCAS was that the people that defined the system wrote the specification first as English like pseudo-code and then re-wrote that as a more programmy pseudo-code with variables and such. Then others would take this code and implement it in Mathlab to test the algorithms. Then the people who actually made the hardware would take this and re-implement it in C++ or Assembler. When people had a recent look at all this code, they found it to be a big mess, where the different specs and code bases had been maintained separately and didn’t match. There was no automation and very little validation. The first idea of fixing this code base was rejected as completely unreliable and impossible to add new features to.

They wanted to the new system to take advantage of modern technologies like satellite navigation systems, GPS, and on-board radar systems. This means the new system will work with other planes that don’t have collision avoidance or perhaps don’t even have a transponder. In fact they wanted the new system to be easily extensible as new sensor inputs are added. Below is a small example of the reams of pseudo code that makes up TCAS.

The hope with Julia is to unify these different code bases into one. The variable pseudo-code would actually be true Julia code and the English code would be incorporated into JavaDoc like comments in the code (actually using Latex). This would then eliminate the need to use Mathlab to test the pseudo-code. The consensus is that Julia code is easily as readable as the above pseudo-code but with the advantage of being runnable and testable.

The FAA doesn’t have the authority to mandate Avionics hardware companies run Julia on their ACAS X systems, but the hope is that the performance of Julia is good enough that they won’t bother reimplementing the system in C++ and that everything will be the same Julia code. Current estimates have the Julia code running 1.5 times the speed of C code and the thought is that with newer computer chips, this should be sufficient. The hope then is that the new system will not have the translation errors that dog TCAS.

Now that the specification is true computer code many other tools can be written or used to help check correctness, such as the tool below which generates a flowchart from the Julia code/specification.

Problem 2 – Testing/Validation

Certainly with TCAS implementing the system in Mathlab was hard. But then Mathlab is quite slow and that greatly restricts the number of test cases that can be effectively be automated. The TCAS system is based on a huge number of giant decision trees and billions of test cases. A number of test/validation frameworks have been developed to test the new ACAS X system including using theorem proving, probabilistic model checking, adaptive stress testing, simulations and weakest precondition code analysis.

Now if the Avionics hardware manufacturers run the actual Julia code, there will have only been one code base from specification to deployment and it will have all been very thoroughly developed, tested and validated.

Summary

The new ACAS X system is currently being flight tested and is projected to start being deployed in regular commercial aircraft starting in 2020. Looking at the work that has gone into this system, it looks like it will make flying much safer. Hopefully it also sets the stage for how future large safety-critical systems will be developed. Further it looks like the Julia programming language will play a central part in this.

Introduction

Flux is a Neural Network Machine Learning library for the Julia programming language. It is entirely written in Julia and relies on Julia’s built-in support for running on GPUs and providing distributed processing. It makes writing Neural Networks easy and leverages the power and expressiveness of the Julia language to make creating your Neural Network just the same as writing any other Julia expressions.

My last article pointed out some problems with using TensorFlow from Julia, due to many of the newer features being implemented in Python rather than being implemented in the core shared library. One recommendation from the TensorFlow folks is that if you want eager execution then use Flux rather than TensorFlow. The Flux folks claim a real benefit of Flux over TensorFlow is that you only need to know one language to do ML. Whereas for TensorFlow you need to know TensorFlow (its graph language) plus the host language like Python. Then it’s confusing because there is a lot of duplication and it isn’t always clear in which system to do things or whether to use a TensorFlow of Python data type. Flux then simplifies all this.

Although this all sounds wonderful remember that Julia just hit version 1.0 and Flux just hit version 0.67. The main problem I found was excessive memory usage, which I’ll benchmark and discuss later on.

Also note that Flux isn’t a giant compilation of algorithms like SciKit Learn. It is rather specific to Neural Networks. There are other libraries available in Julia for things like Random Forests, but you need to find the correct package and install it. Then each of these may or may not fully support Julia 1.0 yet.

MNIST in Flux

To give a flavour for using Julia and Flux here are a couple of examples from the FluxML model zoo. You can see it’s very simple to setup the Neural Network layers, perform the training and test the accuracy.

Performance

One of Julia’s promises is the ease of use of a scripting language like Python with the speed of a compiled language like C. As it stands Flux isn’t there yet. There seem to be some points where Flux goes away for a long time. These might be the garbage collector kicking in, or something else. I find the speed is about the same order of magnitude as other systems (modulo the pauses), but the big problem is memory usage.

To solve MNIST using a convolutional Neural Network from Python using the TensorFlow tutorial runs quite well and uses 400Meg of memory. Running the similar model using Julia and TensorFlow uses 600Meg of memory. Running the simple model above using Julia and Flux takes 2Gig or memory. Running the convolutional model above uses 2.6Gig. This laptop that I’m using has 4Gig of RAM and is running Ubuntu Linux. This is why I think the big stalls in performance is garbage collection.

The problem with this is that MNIST is a nice small dataset and the model used to solve it isn’t very large as Neural Networks go. If Flux is using six times as much memory as Python then it really diminishes its usefulness as an ML toolkit.

I spent a bit of time looking at the Julia Differential Equations tutorial. They were pointing out that using matrix operations in the Julia expression evaluator would lead to lots of unnecessary temporary storage for instance to evaluate:

D = A + B + C

Where these are all large matrices has to create a temporary matrix to hold the sum A + B which is then added to C. This temporary matrix has to be allocated from the heap and then later garbage collected. This process seems to be rather inefficient in Julia, at least by going by all the workarounds they have to avoid this situation. They have SVectors which are for small vectors that can be allocated on the stack rather than the heap. They recommend using the +. operator which does things element by element and is smart enough to not create lots of temporary values on the heap. I wonder if Flux needs some optimisations like they spent so much time putting into the Differential Equations library.

Summary

Julia and Flux make a nice system for Machine Learning in theory. I think until the technology matures a bit and some problems like memory management are better addressed, that using this for large projects is a bit problematic. A lot of the current ML systems being worked on with Flux are by PhD candidates who are developing Flux as part of their thesis work. Hopefully they improve the memory usage and allow Flux and Julia to live up to their full potential.

Introduction

Last time, I gave a quick introduction to the Julia programming language which has just reached the 1.0 release mark after ten years of development. Julia is touted as the next great thing for scientific computing, machine learning, data science and artificial intelligence. Its hope is to supplant Python which is currently the goto language in these fields. The goal is a more unified language, since it was developed well after Python and learned from a lot of its mistakes. It also claims to have the flexibility of Python but with the speed of a true compiled language like C.

I saw that in the list of packages there was support for using Google’s TensorFlow AI system natively from Julia so I thought I would give this a try. Although it worked, it did reveal some challenges that Julia is going to face in its battle to become a true equal with Python.

Using TensorFlow in Julia

The TensorFlow wrapper/interface for Julia is in a package created by a PhD candidate at MIT, Jon Malmaud. You can add it to Julia using Pkg.add(“TensorFlow”) as well as view the source code on GitHub. Since I wrote an article recently comparing TensorFlow running on a Raspberry Pi to running on my laptop, I thought I’d use the same example and compare Julia to those cases. I cut/pasted the code into the Julia IDE Juno and made some code syntax changes and gave it a go. It came back that the Keras object was undefined.

I then noticed that in the Tensorflow.jl github there were a couple of examples doing predictions on the MNIST dataset, so at least these were solving the same problem as my article, just using different models. I fired these up, but they failed with syntax errors in the code to load the MNIST dataset. Right now this is a bit of a problem in Julia that not all libraries have been updated to the Julia 1.0 syntax. I had a look at the library to load MNIST and noticed that no one had contributed to it in three years. It appeared to be abandoned with no plans to continue it. After a bit more research I found another Julia package called MLDatasets that was maintained and would load MNIST along with several other popular datasets.

I logged an issue with the Tensorflow.jl repository that they should fix this. They replied that they didn’t have time but if I wanted to fix it, to go ahead. So I fixed this and checked it in to the Tensorflow.jl Github. So now these MNIST examples work with Julia 1.0. I was then happy to have given my small contribution back to this community.

I then thought, why not be ambitious and add the Keras layer to Tensorflow.jl? Well this led to some interesting revelations to how Tensorflow is architected.

Problems with the Tensorflow Architecture

Looking at some of the issues in the Tensorflow.jl library there were requests for things like TensorFlow’s eager execution and the TensorFlow layers interface. The answer to these issues was that the Julia interface only talked to the DLL/SO interface to Tensorflow and that these modules didn’t exist there and were in fact written in Python rather than C++. I had a look inside the TensorFlow Github and found that their Keras layer is also written in Python.

Originally Tensorflow.jl talked to the Tensorflow Python interface. Julia is really good at interoperability and can easily talk to both Python libraries as well as C/C++ DLL/SOs. The problem with talking to Python libraries is that it involves running a Python process and then doing process to process communications to execute the code. This tends to be way slower than talking to DLLs or SOs. So early on the TensorFlow.jl library was changed to just talk to the DLL/SO interface for Tensorflow and eliminated all Python dependencies. This then lets Julia use the really performant part of TensorFlow and perform all the core operations very quickly.

Now the problem seems to be that Google is doing a lot of the new Tensorflow development in Python and not putting the code into the core shared library. Google is also spending a lot of time promoting these new interfaces as the way to go. This means if you aren’t programming in Python you are definitely a second class citizen.

OK, so is this just bad for the newbie language Julia? Should Julia programmers just use the Jula native Flux AI library? Well, the other thing Google is promoting is running TensorFlow on things like mobile devices, but then you are accessing TensorFlow from Swift on iOS or from Java on Android. Now you have the same problems as the Julia programmer. You only have efficient access to the core low level APIs for TensorFlow and all the new fancy high level access is denied to you. Google’s API block diagram below highlights this.

To me this is a big architectural problem with TensorFlow. Its great to use from Python, but is really limited in other environments. The videos and blogs starting to surface on TensorFlow 2.0 are promoting eager execution and the Keras layer will be the default and primary ways to program with TensorFlow. This then begs the question as to whether these will be moved into the core shared library or will remain as Python code? At this point I haven’t seen this explained, but as we get closer to the 2.0 preview later this year, I’ll be watching this keenly.

It would certainly be nice if they move this Python code into C++ in the shared library so everyone can use it. At that point I think TensorFlow would be much more usable from Julia, Swift, Java, C++, etc. Here’s hoping that is a major upgrade in the 2.0 release.

Julia TensorFlow Code

Just for interest here is the simplest Julia MNIST example just to give a flavour for the code. This is a simple linear model, so doesn’t give great results. There is a more complicated example that uses a convolutional neural network and gives far superior results.

Summary

You can certainly use TensorFlow from Julia. Just beware that you are limited to the lower level APIs, so anything TensorFlow has implemented in Python isn’t available to you. This means you set up the graph and then execute it, really like you always did in the earlier versions of TensorFlow. It would certainly be nice if Google fixes this problem for TensorFlow 2.0.

Introduction

A couple of weeks ago I saw the press release about the release of version 1.0 of the Julia programming language and thought I’d check it out. I saw it was available for the Raspberry Pi, so I booted up my Pi and installed it. Julia has been in development since 2012, it was created by four MIT professors as an open source project for mathematical computing.

Why Julia?

Most people doing data science and numerical computing use the Python or R languages. Both of these are open source languages with huge followings. All new machine learning projects need to integrate to these to get anywhere. Both are very productive environments, so why do we need a new one? The main complaint about Python and R is that these are interpreted languages and as a result are very slow when compared to compiled languages like C. They both get around this by supporting large libraries of optimized code written in C, C++, Assembler and Fortran to give highly optimized off the shelf algorithms. These work great, but if one of these doesn’t apply and you need to write Python loops to process a large data set then it can get really frustrating. Another frustration with Python is that it doesn’t have a built in array data type and relies on the numpy and pandas libraries. Between these you can do a lot, but there are holes and strange differences between the two systems.

Julia has a powerful builtin array type and most of the array manipulation features of numpy and pandas are built in to the core language. Further Julia was created from scratch around powerful new just in time (JIT) compiler technology to provide both the speed of development of an interpreted language combined with the speed of a compiled language. You don’t get the full speed of C, but it’s close and a lot better than Python.

The Julia language borrows a lot of features from Python and I find programming in it quite similar. There are tuples, sets, dictionaries and comprehensions. Functions can return multiple values. For loops work very similarly to Python with ranges (using the : built into the language rather than the range() function).

Julia can call C functions directly (meaning you can get pointers to objects), and this allows many wrapper objects to have been created for other systems such as TensorFlow. This is why Julia is very precise about the physical representation of data types and the ability to get a pointer to any data.

Julia uses the end keyword to terminate blocks of code, rather than Pythons forced indentation or C’s semicolons. You can use semicolons to have multiple statements on one line, but don’t need them at the end of a line unless you want it to return null.

Julia has native built in support of most numeric data types including complex numbers and rational numbers. It has types for all the common hardware supported ints and floats. Then it also has arbitrary precision types build around GNU’s bignum library.

There are currently 1906 registered Julia packages and you can see the emphasis on scientific computing, along with machine learning and data science.

The creators of Julia always keep performance at the top of mind. As a result the parallelization support is exceptional along with the ability to run Julia code on CUDA NVidia graphics cards and easily setup clusters.

Is Julia Ready for Prime Time?

As of the time of this writing, the core Julia 1.0 language has been released and looks quite good. Many companies have produced impressive working systems with the 0.x versions of Julia. However right now there are a few problems.

Although Julia 1.0 has been released, most of the add on packages haven’t been upgraded to this version yet. In the first release you need to add the Pkg package to add other packages to discourage people using them yet. For instance the library with GPIO support for the Pi is still at version 0.6 and if you add it to 1.0 you get a syntax error in the include file.

They have released the binaries for all the versions of Julia, but these haven’t made them into the various package management systems yet. So for instance if you do “sudo apt install julia” on a Raspberry Pi, you still get version 0.6.

Hopefully these problems will be sorted out fairly quickly and are just a result of being too close to the bleeding edge.

I was able to get Julia 1.0 going on my Raspberry Pi by downloading the ARM32 files from Julia’s website and then manually copying them over the 0.6 release. Certainly 1.0 works much better than 0.6 (which segmentation faults pretty much every time you have a syntax error). Hopefully they update Raspbian’s apt repository shortly.

Julia for Machine Learning

There is a TensorFlow.jl wrapper to use Google’s TensorFlow. However the Julia group put out a white paper dissing the TensorFlow approach. Essentially TensorFlow is a separate programming language that you use from another programming language like Python. This results in a lot of duplication and forces the programmer to operate in two different paradigms at once. To solve this problem, Julia has the Flux machine learning system built natively in Julia. This is a fairly powerful machine learning system that is really easy to use, reducing the learning curve to getting working models. Hopefully I’ll write a bit more about Flux in a future article.

Summary

Julia 1.0 looks really promising. I think in a month or so all the add-on packages should be updated to the 1.0 level and all the binaries should make it out to the various package distribution repositories. In the meantime, it’s a good time to learn Julia and you can accomplish a lot with the core language.

I was planning to publish a version of my LED flashing light program in Julia, but with the PiGPIO package not updated to 1.0 yet, this will have to wait for a future article.