slaweet.com
http://slaweet.com
A blog of a front-end web developer

Speeding up a database intensive Django command
http://slaweet.com/speeding-up-a-django-command/
Tue, 01 Nov 2016 08:55:39 +0000

In the Python web framework Django, you can create custom commands that run scripts for your website from the console. In this post I’ll look at a particularly slow command from our adaptive learning system library proso-apps and what I did to make it faster.

The command is called load_flashcards and its sole purpose is to load content from a JSON file into the database. To give an example of what the content can be, let’s look at what the JSON file contains in the project outlinemaps.org. There is a file geography-flashcards.json, which contains the list of all maps, the list of all terms (countries, cities, rivers, mountains, etc.) and then a list of all flashcards. A flashcard is a pair of a term and its identifier on a map.

The file has grown quite large. It has about 50k lines (over 1MB), which is so large that GitHub won’t preview it. The command has to read all the data from the JSON file and for each item (map, term or flashcard):

create it, if it doesn’t exist

update its properties if they were changed

mark it as inactive (or delete – if specified by a command line option) if it’s no longer in the JSON file.
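
The three rules above can be sketched in plain Python, without the ORM. This is only an illustration under assumptions: the function name sync_items, the "id" key and the "active" flag are hypothetical and are not taken from proso-apps.

```python
def sync_items(stored, incoming, delete=False):
    """Apply the three rules to `stored`, a dict mapping identifier -> item dict.

    `incoming` is the list of item dicts parsed from the JSON file.
    The "id" key and the "active" flag are hypothetical names.
    """
    seen = set()
    for item in incoming:
        key = item["id"]
        seen.add(key)
        if key not in stored:
            # Rule 1: create the item if it does not exist yet.
            stored[key] = dict(item, active=True)
        else:
            # Rule 2: overwrite properties with the values from the file.
            stored[key].update(item)
    for key in list(stored):
        if key not in seen:
            # Rule 3: the item is no longer in the JSON file.
            if delete:
                del stored[key]
            else:
                stored[key]["active"] = False
    return stored
```

Because every item in the file is visited on each run, the cost grows with the size of the file, not with the size of the change.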

Running the load_flashcards command on this huge file takes several minutes, so it’s worth looking at what takes so long. Note that most of the time only a small part of the file has changed, yet the command still takes that long, because it has to check everything.

So what takes so long? My guess was reading from and writing to the hard drive. The JSON file is read just once (1 I/O operation), but using the database involves many I/O operations. There are profiler tools for Python (e.g. cProfile), but in this case I knew the bottleneck was database access, so I used the Unix time command to measure the overall time the script took to execute. Usage of the time command is pretty straightforward. Just prepend it to the actual command:

time ./manage.py load_flashcards geography-flashcards.json

Avoid Model.save()

The first thing I noticed was that the script didn’t have to call Django’s Model.save() method on every database object, but only on those that actually changed. So instead of blindly pouring all data from the JSON objects into the database objects, I started to check whether there were actually any differences between the two.

The basic idea is illustrated by the following piece of code. Note that the code assumes that the variable db_flashcard contains the database ORM object and the variable flashcard contains the dict parsed from the JSON file.
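
A minimal sketch of that idea, not the actual proso-apps code. The helper update_if_changed, the class FakeFlashcard and the field name description are hypothetical. Only db_flashcard, flashcard and the save() call come from the text, and the stand-in class exists only so the example runs without a Django project.

```python
def update_if_changed(db_flashcard, flashcard, fields):
    """Copy changed values from the parsed dict onto the ORM object.

    Calls save() only when at least one field differs.
    Returns True if the object was saved.
    """
    changed = False
    for field in fields:
        if field in flashcard and getattr(db_flashcard, field) != flashcard[field]:
            setattr(db_flashcard, field, flashcard[field])
            changed = True
    if changed:
        db_flashcard.save()  # one database write, only when needed
    return changed


class FakeFlashcard:
    """Stand-in for a Django model instance, used only for this demo."""

    def __init__(self, description):
        self.description = description
        self.save_calls = 0

    def save(self):
        # A real model would write to the database here.
        self.save_calls += 1


db_flashcard = FakeFlashcard("Capital of France")
# Same value as in the database: save() is not called.
update_if_changed(db_flashcard, {"description": "Capital of France"}, ["description"])
# Different value: the attribute is updated and save() is called once.
update_if_changed(db_flashcard, {"description": "Paris"}, ["description"])
```

With a real model, the comparison costs nothing extra as long as the object is already loaded, and unchanged objects no longer trigger a write.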

Avoid Model.objects.get() and Model.objects.filter()

Furthermore, I found out that the command was heavily using Model.objects.get() and Model.objects.filter() to look up related objects one at a time, for example to look up the Term for a Flashcard. Since there were thousands of flashcards, this resulted in thousands of database queries. Even though each query took just a few milliseconds, all of them combined caused a significant slow-down.

The solution is to load all objects (e.g. all Term objects) with a single database query and store them in a Python dictionary, so that each later lookup is an in-memory dictionary access instead of a query.
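
A rough sketch of the dictionary approach. The helper index_by, the attribute name identifier and the flashcard key "term" are assumptions made for illustration. In the real command the list of objects would come from one query such as Term.objects.all(). Here SimpleNamespace objects stand in for model instances so the example runs on its own.

```python
from types import SimpleNamespace


def index_by(objects, key):
    """Build a lookup dictionary from a single pass over `objects`."""
    return {getattr(obj, key): obj for obj in objects}


# In the command this would be fed by a single query, e.g.
#   terms = index_by(Term.objects.all(), "identifier")
# Stand-in objects are used here instead of model instances.
terms = index_by(
    [
        SimpleNamespace(identifier="cz", name="Czech Republic"),
        SimpleNamespace(identifier="sk", name="Slovakia"),
    ],
    "identifier",
)

flashcards = [{"term": "cz"}, {"term": "sk"}, {"term": "cz"}]
for flashcard in flashcards:
    # Dictionary access replaces a Term.objects.get(...) call per flashcard.
    term = terms[flashcard["term"]]
```

The trade-off is memory: all objects of the given type are held at once, which is reasonable for thousands of rows but worth reconsidering for much larger tables.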

Conclusion

Boosting grunt performance
http://slaweet.com/boosting-grunt-performance/
Mon, 24 Oct 2016 18:22:28 +0000

I guess you’ve heard about Grunt, the JavaScript task runner. It’s a great automation tool, but when you start using it and keep adding more stuff to it, you’ll probably reach a point where the execution takes more time than you are willing to wait.

I’ve recently hit that point with a document portal I’m working on at Red Hat, so I went looking for ways to profile grunt and speed it up. While there are some articles and Stack Overflow answers on the topic, none of them contained all that I needed, so the purpose of this blog post is to give you a digest with all the information in one place.

The grunt setup of the project is specific in that it uses no CSS preprocessor and no JavaScript minification (such as uglify), because the portal is not a single page app and grunt is used mainly for linting server-side JavaScript code. Another project-specific thing is my first optimization.

src path wildcards

The project is based on the CMS Alfresco and uses its folder structure, which is rather deep. So I was heavily using wildcards in the paths of the src files of jshint and jscs to save myself some typing (e.g. “**/group-manager.js” instead of “Data Dictionary/Scripts/com/redhat/pnt/group-manager.js”). The problem is that grunt had to traverse the folder structure of the whole project (including a huge “node_modules” folder) to find those files. Avoiding wildcards sped grunt up from 36s to 5.6s. You probably won’t hit this in many projects, but I just wanted to mention it.

time-grunt

The first step in any optimization should be to measure “what the heck takes so long”. In the case of grunt, the profiler tool is time-grunt. time-grunt creates a time report that shows how much time each task took and appends the report to the output of each grunt call. It can look like this:

grunt-newer

The first thing we can do to speed up those tasks is not to run them on files that haven’t changed since the last run. Luckily (as mentioned in Two tips to boost Grunt performance), there is a grunt plugin exactly for that, called grunt-newer. The installation is pretty easy and described in the aforementioned link:

npm install grunt-newer --save-dev

Once the plugin has been installed, it may be enabled inside your gruntfile.js with this line:

grunt.loadNpmTasks('grunt-newer');

To use grunt-newer, you just need to prefix respective tasks with ‘newer:’, e.g. “newer:jshint”. So let’s do this and look at what the time-grunt output will look like when we change only one file:

Wow. We got a great speed-up (from 5.6s to 1.9s). Now we can see that the slowest task is “loading tasks” and that is exactly what we’ll look into next.

Don’t load all tasks

The second tip is not to load all tasks every time. While the aforementioned blog post describes a quite complicated way to achieve that, there is a grunt plugin to do that for us automatically: jit-grunt. The installation is again trivial:

npm install jit-grunt --save-dev

Remove the grunt.loadNpmTasks calls and add require('jit-grunt')(grunt) instead. That’s all.

Now the speed-up is not that impressive (from 1.9s to 1.6s), but it’s important to mention that all tasks in this project run on JavaScript files; there is, for example, no CSS preprocessor (such as sass). If we had a project with tasks that don’t run on JavaScript files, there would surely be a better speed boost.

grunt-parallel

Another boost I was considering was to run the tasks in parallel (using grunt-parallel), because I am fortunate enough to have three tasks that don’t depend on each other (jscs, jshint and jasmine tests). I tested it and the results varied widely (between 1.1s and 1.6s); apparently the parallelism makes the performance less deterministic. grunt-parallel also causes the parallel tasks to be time-reported separately and thus to be missing from the overall report.

So grunt-parallel doesn’t bring a predictable speed-up and adds some clutter to the output. Unless there are tasks that take several seconds, I don’t recommend using it.

Other issues

Other issues I would like to address in the future include:

Tasks are loaded each time a watch is triggered, not just at launching grunt watch.

sass doesn’t work at all with grunt-newer, because with sass imports a change in one file might require other files to be recompiled.

uglify doesn’t work well with grunt-newer. When one file is changed, it still processes all files, because it doesn’t have them cached before it concatenates them with the rest. They could have

Conclusion

The best speed-up is obtained by running tasks only on changed files, with grunt-newer. Another good option is to use jit-grunt to load grunt tasks just in time and avoid unnecessary loads. If grunt is still slow after applying those, I recommend using the profiler time-grunt to find out what the next pain point is.

Hurray! One more blog on the internet
http://slaweet.com/one-more-blog/
Fri, 30 Sep 2016 07:49:06 +0000

Here I am, going into the business of blog posting as advised by John Sonmez in his Blogging Course. What will it be about? I have no idea so far. Or rather, I have many different ideas, but none of them seems very good or capable of providing enough topics for many blog posts. So, we’ll see. Either way it will have something to do with programming and software.

I’d like to share the experience of starting the blog. Mainly it was about two things:

Choose and register a domain

Choose and set up WordPress hosting

Choose and register a domain

I wanted to have a nice domain, not some random *.wordpress.com. Since I didn’t know what the topic of the blog would be, I went for my nickname, slaweet, which is a fairly unique identifier of my person that I’ve been using for over 6 years. The nickname is kind of my personal brand as it evolved from my name: Stanislav Vít -> stanislavvít -> stanislaweet -> slaweet. Then it was just a question of whether I wanted slaweet.cz or slaweet.com. In the end I went with slaweet.com as I want to write in English and the topics should be for a global audience.

Choose and set up WordPress hosting

While John Sonmez recommended some hosting for $4 a month in his course, I first wanted to see what’s out there for free. I found https://www.endora.cz/ (Czech only) with some free plans that display ads at the bottom of your page. Then I found out that they offer a paid plan for as little as 14CZK/month (that’s less than $1/month), which got me sold.

Then I spent some time trying to figure out how to use the domain I had already purchased. At first it looked like I had to register the domain through them, then I found an option not to do that. Then it was a bit of a hassle to find the IP address that I could set in the DNS records of my domain. Finally, there was an error when I tried to deploy WordPress, twice, with no explanation of what the error was. That was a crazy UX. I ended up emailing user support and got a reply, at about the time my third attempt succeeded, saying that my hosting was probably too fresh. So finally I got up and running so that I could write this post.