Posts tagged with Web

A week of innovation

First of all I want to thank Lifesum for having another "Innovation Week". It is a great opportunity and I hope that more companies will start following the example. In a few words, the idea is to allow everyone in the company to freely pick a project or idea that they want to develop and work on it for one week.
The benefits range from simply making people happy, thanks to the break from routine and the opportunity to work on something a bit different, to seeing some pretty amazing prototypes that can be easily implemented in the company's product.

What is Service Discovery?

In summary, service discovery is the ability of the separate services in a scalable infrastructure to communicate with each other and with the outside world. In other words - how to route requests to the corresponding service while balancing the load across the instances in the pool and monitoring their health.
Sounds simple, right?

Well, unfortunately service discovery in the real world is not that simple. In my presentation at the Stockholm Python MeetUp I talked a bit more about the complexity of service discovery, the suboptimal solutions and Smartstack - a solution created at Airbnb to simplify the whole process.
You can see more about it in my presentation:

However, the whole idea sounded so awesome that my colleague Esma and I decided to team up and explore a bit more of the opportunities that Smartstack provides for us.
As a matter of fact, we decided to explore two different approaches: Smartstack and Consul. However, we had some issues with the Consul setup and found that it was not acting exactly the way we expected, so in the end we focused all our attention on Smartstack.

How does it work?

Smartstack consists of two main components - Nerve and Synapse.
Nerve handles the service registration, while Synapse reads the information about the available services and configures a local HAProxy that acts as a load balancer for the service pool.
For our tests we used Zookeeper as the registry for the services.
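
To make the division of labour concrete, here is a toy, in-memory model of that flow in Python. It is only a sketch and not how Nerve or Synapse are implemented or configured: the `Registry` class stands in for Zookeeper, and `nerve_report` and `synapse_backend_lines` are hypothetical helper names invented for this illustration.

```python
from collections import defaultdict


class Registry:
    """Stand-in for Zookeeper: maps a service name to its live instances."""

    def __init__(self):
        self._services = defaultdict(set)

    def register(self, service, address):
        self._services[service].add(address)

    def deregister(self, service, address):
        self._services[service].discard(address)

    def instances(self, service):
        return sorted(self._services[service])


def nerve_report(registry, service, address, healthy):
    """Nerve-like step: keep the registry in sync with a health check result."""
    if healthy:
        registry.register(service, address)
    else:
        registry.deregister(service, address)


def synapse_backend_lines(registry, service):
    """Synapse-like step: turn registered instances into load balancer entries."""
    return [
        "server {0}_{1} {2}".format(service, index, address)
        for index, address in enumerate(registry.instances(service))
    ]


registry = Registry()
nerve_report(registry, "service_a", "10.0.0.1:8000", healthy=True)
nerve_report(registry, "service_a", "10.0.0.2:8000", healthy=True)
nerve_report(registry, "service_a", "10.0.0.1:8000", healthy=False)  # node crashed
print(synapse_backend_lines(registry, "service_a"))
```

After the simulated crash only the second instance remains in the generated list, which is the behaviour we were checking for in the real setup.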

What have we built?

We created a small project consisting of a pool of Zookeeper instances, two nodes for service A and one node for service B. We tested multiple scenarios in which one or more nodes crash, both Zookeeper instances and service instances, and observed how the system operates during the crashes and how it recovers after the nodes are brought back up.

Results

As a result we created a public repository service-discovery giving details about the whole setup process, our tests and their results. Actually now is the time to praise Esma and her awesome work on conducting the tests and writing the great documentation from the repo above.
So, if you have ever wondered how service discovery works or you just want to test Smartstack, clone the repo and follow the instructions. We would be happy to hear about your experience.

Or how to use the benefits of Django template system during the PSD to HTML phase

There are two main approaches to start designing a new project - a Photoshop mock-up or an HTML prototype. The first one is more traditional and well established in the web industry. The second one is more alternative and (maybe) modern. I remember a video of Jason Fried from 37signals where he talks about design and creativity. You can see it at http://davegray.nextslide.com/jason-fried-on-design. There he explains how he stays away from Photoshop in the initial phase in order to concentrate on the things that you can interact with instead of focusing on design details.

I am not planning to argue which method is better. The important thing is that sooner or later you reach the point where the HTML coding has to start. Unfortunately, this often happens in a pure HTML/CSS environment outside of the Django project, and then extra time is wasted converting the result to Django templates.

Wouldn't it be awesome if you could give the front-end developers something that they can install and run with a simple command, while still allowing them to work in the Django environment with all the benefits it provides - template nesting and including, sekizai tags etc.?

I have been planning to do this for a long time and finally it is ready and available at Django for Prototyping. Currently the default template includes Modernizr, jQuery and jQuery UI, but you can easily modify it according to your needs. I would be glad to get any feedback and ideas for improvement, so feel free to try it and comment.

Preface: Nine months ago (I can't believe it was that long) I created a script called Simple Site Checker to make it easier to check sitemaps for broken links. The script code is publicly available on GitHub. Yesterday (now that I have finally found time to finish this post it must be "a few weeks ago") I decided to run it again on this website and nothing happened - no errors, no warnings, nothing. Setting the output level to DEBUG showed the message "Loading sitemap ..." and then the script exited.
Here the fault was mine: I had missed a corner case in the error handling, i.e. when the sitemap URL returns something other than "200 OK" or "500 Internal Server Error". It took just a few seconds to fix the mistake.

Problem and Solution: I ran the script again and, what a surprise, the sitemap URL was returning "403 Forbidden". At the same time the sitemap was perfectly accessible via my browser. After some thinking I remembered that some security plugins block access to the website if no User-Agent header is supplied. The reason for this is to block access by simple scripts. In my case even an empty User-Agent did the trick and fooled the plugin.
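
As a minimal sketch of the fix, this is how a request with an explicit User-Agent header can be built in Python. I am assuming the standard library `urllib.request` here, which is not necessarily what the script itself uses; `build_request` and the "simple-site-checker" agent string are made-up names for illustration.

```python
import urllib.request


def build_request(url, user_agent="simple-site-checker"):
    """Build a request that always carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("http://example.com/sitemap.xml")
# urllib stores header names capitalized, hence "User-agent" in the lookup.
print(request.get_header("User-agent"))
# To actually fetch the sitemap: urllib.request.urlopen(request)
```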

Final words: As a result of the issue mentioned above, one bug in Simple Site Checker was found and fixed. At the same time another issue, about missing status and progress information, was raised. More details can be found on GitHub, but in a few words: an info message is now logged for each processed URL to indicate the progress.

If you have any ideas for improvement or anything else feel free to comment, create issues and/or fork the script.

Preface: Have you noticed how on some websites, when you click for the first time on a link that opens a lightbox or any other overlay, it takes some time to display the border/background/button images? Not quite fancy, right?
This is because the loading of these images starts at the moment the overlay is rendered on the screen. If this is your first visit and these images are not in your browser cache, it will take some time for the browser to retrieve them from the server.

Solution: The solution is to preload the images, i.e. to force the browser to request them from the server before they are actually used. With a simple JavaScript function and a list of the image URLs this is a piece of cake:

Please keep in mind that the code above uses the jQuery library.
Specialty: Pretty easy, but you have to hardcode the URLs of all images. Also, if you are using Django Compressor you are probably aware that it adds an extra hash to the URLs of the images in the compressed CSS files. The hash depends on the COMPRESS_CSS_HASHING_METHOD setting and cannot be avoided. It is pretty useful because it forces the client browser to reload the images every time something has changed. Unfortunately, our hardcoded list of URLs does not have this hash. So wouldn't it be much simpler if, instead of hardcoding the URLs, we just read them from the CSS files?
Solution 2:

Now with the help of regular expressions we can read the image URLs directly from the CSS file, together with the hash part. Please note the zero index in the CSS file selector: if your main CSS is not the first declared style sheet, you will have to change the index according to its position.
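
The solution itself is JavaScript running in the browser. Purely to illustrate the regular expression idea, here is a rough Python sketch that pulls the `url(...)` values out of a CSS string. The pattern, the function name `extract_image_urls` and the `?abc123` hash suffix are my own assumptions for this example.

```python
import re

# Matches url(...) values with optional single or double quotes.
CSS_URL_PATTERN = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")


def extract_image_urls(css_text):
    """Return every URL referenced through url(...) in the given CSS text."""
    return CSS_URL_PATTERN.findall(css_text)


css = """
.overlay { background: url("/static/img/bg.png?abc123"); }
.close { background-image: url(/static/img/close.png?abc123); }
"""
print(extract_image_urls(css))
```

The extracted URLs keep their hash part, so they match exactly what the browser will request when the overlay is shown.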

I hope you will find these solutions simple and useful. As always, feel free to comment, share and propose code improvements.

... neither is any other language or framework

This post was inspired by the recurring discussion on the topic "Python vs other language" (in this specific case the other one was PHP, and the question was asked in a Python group, so you may guess whether there were any answers in favor of PHP). It is very simple: I believe that every Python developer will tell you that Python is the greatest language ever built, how easy it is to learn, how readable and flexible it is, how much fun it is to work with and so on. They will tell you that you can do everything with it: web and desktop development, testing, automation, scientific simulations etc. But what most of them will forget to tell you is that it is not a panacea.

As a matter of fact, you can build "ugly" and unstable applications in Python too. Most problems come not from the language or framework used, but from bad coding practices and a poor understanding of the environment. Python will force you to write readable code but it won't solve all your problems. It is hard to make a complete list of what exactly you must know before starting to build applications, and a big part of the knowledge comes with experience, but here is a small list of some essential things.

If you are going to develop web applications, learn about the client-server relationship.

Use "layers" to separate the different parts of your application - database methods, business logic, output etc. MVC is a nice example of such separation.

Never store passwords in plain text. Even hashed passwords are not completely safe - check what rainbow tables are.

Comment/Document your code

Write unit tests and learn TDD.

Learn how to use version control.

There is a client waiting on the other side - don't make him wait too long.

Learn functional programming.
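
To illustrate the point about passwords: a common defence against rainbow tables is to add a random salt to each password and use a deliberately slow key derivation function. Below is a minimal sketch using only the Python standard library. The function names and the iteration count are my own choices for this example, not a recommendation tuned for production.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, derived_key) for the password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password, salt, expected_key, iterations=200_000):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))
print(verify_password("wrong guess", salt, key))
```

Because every password gets its own salt, a precomputed table of hashes is of no use to an attacker, and the slow derivation makes brute forcing each password individually more expensive.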

I hope the above does not sound like an anti-Python talk. That is not the idea. Firstly, because there are things that are more important than the language itself (the list above), and secondly because... Python is awesome )))
There are languages that will help you learn the things above faster, and Python is one of them - built-in documentation features, easy to learn and try, and extremely useful. My advice is not to start with PHP as your first programming language, because it will make you think that mixing variables of different types is OK. It may be fast for some things, but most of the time it is not safe, so you had better start with a more strictly typed language where you can learn casting, escaping user output etc.

I have probably missed a few (or more) points but I hope I've covered the basics. If you think that anything important is missing, just add it in the comments and I will update the post.

During the development of Simple Site Checker I realised that it would be useful for test purposes if there is a website returning all possible HTTP status codes. Thanks to Google App Engine and webapp2 framework building such website was a piece of cake.

The home page provides a list of all HTTP status codes and their names. If you want to get an HTTP response with a specific status code, just add the code after the slash, for example:
http://httpstatuscodes.appspot.com/200 - returns 200 OK
http://httpstatuscodes.appspot.com/500 - returns 500 Internal Server Error
Also, at the end of each page there is a link to the HTTP protocol Status Code Definitions, with a detailed explanation of each one of them.
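
The core of such a site is a mapping from the URL path to a status code and its reason phrase. The real site is built with webapp2 on Google App Engine; the snippet below is only a framework-free sketch of that mapping using Python's standard `http.HTTPStatus`, and `status_for_path` is a hypothetical helper name.

```python
from http import HTTPStatus


def status_for_path(path):
    """Map a path such as '/404' to (code, reason phrase); None if unknown."""
    try:
        status = HTTPStatus(int(path.strip("/")))
    except ValueError:
        # Raised both for non-numeric paths and for unknown status codes.
        return None
    return status.value, status.phrase


print(status_for_path("/200"))
print(status_for_path("/500"))
```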

... a command line tool to monitor your sitemap links

I had been thinking of building such a tool for a while and fortunately I found some time, so here it is.

Simple Site Checker is a command line tool that allows you to run a check over the links in your XML sitemap.

How it works: The script requires a single argument - a URL or a relative/absolute path to an XML sitemap. It loads the XML, reads all loc tags in it and starts checking the links in them one by one.
By default you will see no output unless there is an error - the script is unable to load the sitemap or any link check fails.
Using the verbosity argument you can control the output, if you need more detailed information like elapsed time, checked links etc.
You can run this script through a cron-like tool and get an e-mail in case of error.
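
The two steps, reading the loc tags and checking the links one by one, can be sketched as follows. This is not the actual code of Simple Site Checker, just an illustration with the standard library. `extract_links` and `failed_links` are names invented here, and the status lookup is faked with a dictionary so that the example runs without network access.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_links(sitemap_xml):
    """Return the URL inside every <loc> tag of a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def failed_links(links, fetch_status):
    """Check links one by one and collect (url, status) for every failure."""
    failures = []
    for url in links:
        status = fetch_status(url)
        if status != 200:
            failures.append((url, status))
    return failures


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/missing/</loc></url>
</urlset>"""

# Fake status lookup standing in for a real HTTP request.
fake_statuses = {"http://example.com/": 200, "http://example.com/missing/": 404}
links = extract_links(sitemap)
print(failed_links(links, fake_statuses.get))
```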

... or how to reuse your plugins inside sections with different design

Problem: Frequently, on the websites I am developing, I need to display the same set of data in several different ways. For example, a news box may need to appear in different sections of the website, i.e. in the sidebar, the main content etc. Using Django CMS plugins makes this quite easy.
For simplicity we will take the following case: an image/text tuple with two layout variations - image to the left of the text and image to the right.

Same data but different layout. All you need to do is allow your users to change the plugin template according to their needs. If you don't have experience with Django CMS plugins, I advise you to check how to create custom Django CMS plugins before you continue with the solution.

Solution: First you will have to create a tuple holding your templates(and their human readable names) and add a field that will hold the chosen template to the plugin model.
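
As a rough, framework-free sketch of that idea: in a real plugin the tuple would typically be passed as `choices` to a `CharField` on the plugin model, and the stored value would be used when the plugin is rendered. The template paths and the `resolve_template` helper below are hypothetical and only illustrate the selection logic.

```python
# (template path, human readable name); both paths are hypothetical.
PLUGIN_TEMPLATES = (
    ("plugins/image_text_left.html", "Image on the left"),
    ("plugins/image_text_right.html", "Image on the right"),
)
DEFAULT_TEMPLATE = PLUGIN_TEMPLATES[0][0]


def resolve_template(chosen):
    """Return the chosen template if it is a known choice, else the default."""
    known = {path for path, _name in PLUGIN_TEMPLATES}
    return chosen if chosen in known else DEFAULT_TEMPLATE


print(resolve_template("plugins/image_text_right.html"))
print(resolve_template("something/unknown.html"))
```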

Final words: Yep, this is all. Simple isn't it? It is amazing how sometimes such small things are so useful. If you are having bigger difference in the layout of your templates you will probably have to put a little more stuff in the context that some of your templates may not need but it is OK. Feel free to comment and if you are using this "trick" please add your use case - it will be interesting to see in how many different cases this works.

... or how to pull data about page visits instead of implementing custom counter

Preface: OK, so you have a website, right? And you are using Google Analytics to track your page views, visitors and so on? (If not, you should consider starting to use it. It is awesome, free and has lots of features such as custom segments, map overlay, AdSense integration and many more.)
So you know how many people have visited each page of your website, the bounce rate, the average time they spend on the page etc. And this data is available only to you or to the limited number of people to whom you have granted access.

Problem: But what happens if one day you decide to show public statistics about the visitors of your website? For example: how many people have opened the "Product X" page?
Of course you can add a custom counter that increases the views each time the page is opened. Developed, tested and deployed in no time. Everyone is happy until one day someone's cat takes a nap on the keyboard and "accidentally" keeps the F5 button pressed for an hour. The result is simple - one of your pages has 100 times more visits than the others. OK, you can fix this by adding cookies, IP tracking etc. But all this is reinventing the wheel. You already have all this data in your Google Analytics; the only thing you have to do is reach out and take it.

Solution: In our case "the hand" will be an HTTP request via the Google Data API. First you will need to install the Python version of the API:

    sudo easy_install gdata

Once you have the API installed you have to build a client and authenticate:

SOURCE_APP_NAME is the name of the application that makes the request. You can set it to anything you like.
After you build the client (2) you must authenticate using your Google account (3-9). If you have both a Google account and a Google Apps account with the same username, be sure to provide the correct account type (8).
Now you have authenticated and it is time to build the request. Obviously you want to filter the data according to some rules. The easiest way is to use the Data Feed Query Explorer to build and test your filter and then to port it to the code. Here is an example of how to get the page view data for a specific URL for a single month (remember to update the PROFILE_ID according to your profile).

Final words: As you see, it is relatively easy to get the data from Google, but remember that this code makes two requests to Google each time it is executed, so you will need to cache the result. The GA data is not real-time, so you may automate the process to pull the data periodically (if I remember correctly the data is updated once an hour) and store the results on your side, which will really improve the speed. Also keep in mind that this is just an example of how to use the API. Instead of pulling the data page by page (as shown above) you may pull the results for multiple URLs at once and process the feed to get your data. It is all in your hands.
Do you have something to add? Cool, I am always open to hearing (reading) your comments and ideas.

What is Memcached: Memcached is a tool that allows you to store key-value pairs in memory. The keys are limited to 250 bytes and, for better performance, the value size is limited to 1 MB (more details), but this size is more than enough for web usage.

Memcached installation:

    apt-get install memcached
    apt-get install python-memcache

The first line installs Memcached and the second one installs the Python API for communication between your application and the Memcached daemon. After this the Memcached daemon is up and running. With the default configuration it runs on port 11211 on localhost (127.0.0.1). If you want to modify this, the configuration file (in my case) is located at /etc/memcached.conf
Django configuration: This one depends on the Django version that you use. For 1.2.5 and prior, the following code should be added to your settings file (settings.py):

In both cases, if you use a different port and/or IP you have to replace them above.
You can find more info about cache backend configuration in the Django documentation.
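
For reference, the two settings styles look roughly like this. This is a sketch based on the Django documentation of that era rather than on the snippets from this post: `CACHE_BACKEND` is the style for Django 1.2.x and earlier, and the `CACHES` dictionary was introduced in Django 1.3. The `MemcachedCache` backend shown here is the one that pairs with python-memcache, and it is no longer available in recent Django releases, so check the documentation for your version.

```python
# settings.py

# Django 1.2.x and earlier: a single URI-style setting.
CACHE_BACKEND = "memcached://127.0.0.1:11211/"

# Django 1.3 and later: the CACHES dictionary.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```

You would keep only the variant that matches your Django version.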
So now you have Memcached running and Django configured. If you have doubts about whether this is suitable for your case, take a look at the posts mentioned above or just add a comment describing your case and I will be happy to give you advice. Now it is time to start using it.
Cache usage (part I) - how to cache on Python level: If you have some heavy calculations in your view, you can cache their result and reuse it on later requests to lower the load. Example:

    from django.core.cache import cache


    def heavy_view(request):
        cache_key = 'my_heavy_view_cache_key'
        cache_time = 1800  # time to live in seconds
        result = cache.get(cache_key)
        if result is None:  # nothing cached yet (or the entry has expired)
            result = ...  # some calculations here
            cache.set(cache_key, result, cache_time)
        return result

The process is simple: you ask the cache for the value stored under a given key (cache.get). If the result is None, you execute the code that generates it and store the result in the cache (cache.set).
My advice is to declare the key and the time as variables because this will make future changes easier.
Cache usage (part II) - how to cache on template level: This is suitable for cases when you have some heavy processing in the template (such as regroup) or you want to cache only part of the template (such as a latest news section). Example:

The basic usage is {% cache time_in_seconds key %} ... {% endcache %}
You can also cache code fragments based on dynamic properties, for example the current user's recent conversations: just pass a third parameter that uniquely identifies the fragment to be cached.

Final words: As you see from the examples above, using Django and Memcached is really easy. Using caching correctly will speed up your website and in turn improve your user experience (UX) and SEO. Using it incorrectly will have the opposite effect. Just take a moment and think about what can be cached, how long it can be cached and whether there is a reason to cache it at all. Try to avoid double caching - there is no need to use caching in templates and then cache the rendered template in the view too.