Piotr Gabryjeluk blog
http://piotr.gabryjeluk.pl
Blog, photos and developer notes of Piotr Gabryjeluk, one of the Wikidot.com developers.
Tue, 26 Sep 2017 21:42:27 +0000

July News
http://piotr.gabryjeluk.pl/blog:july-news
<p>Some of you may be used to me posting more often than I have lately.<br />
Some of you may wonder why I stopped blogging.</p>
<p>by <span class="printuser avatarhover"><a href="http://www.wikidot.com/user:info/gabrys" ><img class="small" src="http://www.wikidot.com/avatar.php?userid=2462&amp;amp;size=small&amp;amp;timestamp=1506462147" alt="Gabrys" style="background-image:url(http://www.wikidot.com/userkarma.php?u=2462)" /></a><a href="http://www.wikidot.com/user:info/gabrys" >Gabrys</a></span></p>
Tue, 28 Jul 2009 15:41:03 +0000
Some of you may be used to me posting more often than I have lately. Some of you may wonder why I stopped blogging.

Brussels

Last month was full of adventures. It started on the 1st of July with me going to Brussels to meet a friend of ours and talk about a wikipedia-like site about art. We're going to help him build the most complete site about art using the Wikidot software!

By the way, this was my first flight ever. Quite a strange feeling, but generally fine.

Forms

I was working on some nice technical and UI improvements to Wikidot that are crucial for the art site (but really, really nice for Wikidot as well), such as forms for entering, editing and viewing structured data on wiki pages.

Search issues

That week was also spent on some massive Wikidot.com search engine tweaks. A stupid one-line bug, which failed to export the proper LC_ALL environment variable in the indexing script, caused many sites that used Asian or Eastern European languages not to be indexed (most notably the great ИСТОРИЈСКА БИБЛИОТЕКА). At first we thought we could re-index the broken sites, but our re-indexing mechanism was way too slow (it would have taken weeks for all the broken sites).

Pieter then challenged me. He said he could index the whole of Wikidot in 6 hours. I thought it wasn't even possible, but then I started to work on it and managed to index the whole of Wikidot in less than 2 hours, without indexing tags at first. With tags, it took about 2 hours and 10 minutes. That was damn fast!

Inspired by this, and by a disk-full error on the /var partition of our webserver (this is why we keep user-uploaded files and other important things on separate disks), I also rewrote the incremental indexer to work in a similar way to the whole-Wikidot re-indexer.

search-api reindex

If you care about some technical details:

all search operations are issued through search-api, a separate standalone program

before being rewritten in pure Python, search-api was written in BASH and was a wrapper around commands such as:

java -jar searchApiHelper.jar search "phrase-to-search"

php search-api-helper.php flush

search-api also takes care of file locking to ensure that:

only one process tries to modify the index at a time

items are added to the queue one after another

when doing some big index modification (read: a full re-index), the queue is not flushed (so that after the re-index, all queued changes are applied to the new index)

when flushing the queue takes longer and cron tries to start more flushing processes, they simply exit (so only one process flushes the queue at a time)
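The cron-safety rule from the last two points can be sketched with a non-blocking flock. This is a minimal sketch, not the actual search-api code; the function name and the flush callback are made up for illustration:

```python
import fcntl

def try_flush(lock_path, flush):
    """Run flush() only if no other flushing process holds the lock.

    A second process started by cron simply gets False and can exit,
    so only one process flushes the queue at a time.
    """
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return False  # another flusher is already running
    try:
        flush()  # flush queued items one after another
        return True
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()
```

Holding one exclusive lock for the whole flush also keeps a full re-index and the incremental indexer from modifying the index concurrently.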

Union of Rock Festival

Just after the week spent in a nice hotel in Brussels, I went to Węgorzewo in Mazury (Poland's biggest lake district) to have fun at a rock music festival. Unfortunately, the musical level was not very impressive, so I mainly enjoyed the atmosphere in the camping area.

The weather was not great. It was wet everywhere, the ground was covered in 20 centimeters of mud, and it was hard to walk around without getting dirty. But during my first day there, I learned how to manage it.

Improved workflow at Wikidot

Some of you have noticed that we recently started to work more efficiently, but this is not quite true. In fact we work as efficiently as before, but we are better organized and prioritize our tasks better. We also keep track of what we do, so we can tell afterwards what we've done. For us this means a little extra work "documenting" our work (so maybe we work even less efficiently than before?), but to the outside world we make more noise (in a positive sense) about it. So basically, people know what we do, what we are going to do, and when they can expect changes, and most importantly, they understand why some feature request is being postponed. The reason is (and was) that we have more important things to do, but before, people couldn't tell that.

Squark turned into a professional project manager who manages our time. pieterh decided to talk to the Community and listen to their complaints (he reads, or at least skims, every post on the Community forums). He tells Łukasz what needs to be done, and Łukasz knows when we will have time to do it. This way, communication inside Wikidot has improved. Also, michal-frackowiak and I no longer watch the Community forums (which some of you may regret), but this allows us to concentrate on our work.

The work continues

As I mentioned before, we want to introduce a great feature to Wikidot: forms. But the implementation currently concentrates on the open-source version of the Wikidot software (once it's ready, working and tested, we'll copy the feature to the Wikidot.com service).

aptitude install wikidot

As forms are a huge change, I started to prepare good ground for them and closed the most important bugs in the Wikidot open-source version, and I'm about to start making Ubuntu packages for it to allow even simpler installation on Debian-based systems. Right now the installation involves only six child's-play steps and in fact can be done by copying and pasting a few commands.

Yesterday's party

Yesterday I went to meet some friends from my school days in the heart of the city. It was meant to be a meeting for "a beer or two" but evolved into beer and dancing till morning. That was the first time I took a morning bus (and not even the first one) home right after partying.

It was great fun, and I met great folks.

Summary

I hope that with this long blog post (divided into friendly sections ;) ) I have made up for the long period of not posting anything here.

Working On Mobile Webbrowser
http://piotr.gabryjeluk.pl/blog:working-on-mobile-webbrowser
<p>As I noted <a href="http://piotr.gabryjeluk.pl/dev:easter-time">before</a>, I'm running Gentoo in a chroot on my iPAQ H3870. As the next step of fun with this PDA, I'm willing to create a mobile web browser for this (and other) Linux-powered mobile devices. Inspired by the iPhone's Safari, I want the browser to have the following features:</p>
<ul>
<li>fast</li>
<li>easy to use</li>
</ul>
<p>by <span class="printuser avatarhover"><a href="http://www.wikidot.com/user:info/gabrys" ><img class="small" src="http://www.wikidot.com/avatar.php?userid=2462&amp;amp;size=small&amp;amp;timestamp=1506462147" alt="Gabrys" style="background-image:url(http://www.wikidot.com/userkarma.php?u=2462)" /></a><a href="http://www.wikidot.com/user:info/gabrys" >Gabrys</a></span></p>
Wed, 15 Apr 2009 18:42:03 +0000
As I noted before, I'm running Gentoo in a chroot on my iPAQ H3870. As the next step of fun with this PDA, I'm willing to create a mobile web browser for this (and other) Linux-powered mobile devices. Inspired by the iPhone's Safari, I want the browser to have the following features:

fast

easy to use

I want to use Qt and WebKit for this purpose. I will use PyQt for prototyping; as the interface will be minimal, this should not add much overhead. For the final version, I'll probably compile the C++ code statically (linking the latest Qt library into the resulting program).

What features should the browser have, and how will I implement them?

fast — using the WebKit engine — well integrated with Qt 4.4+

fast — using a fast JavaScript engine — one of the newest Qt/WebKit features, using JIT

easy to use — full-page zooming — using the Qt/WebKit zoomFactor property

easy to use — kinetic scrolling — a feature popularized by the iPhone GUI (already implemented by some Qt hackers)

fast — some hacky features should be implemented, like a weight(-and-number-of-connections)-reducing proxy (as in the Opera browser) and some AdBlock-like features (probably non-configurable)

I'm planning a VERY minimal interface. No long history, no bookmark management. Only a button to "save" a page to the browser's dashboard. I'm also thinking about some cool internal things to really make the browser usable and as good as the iPhone's browser.
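Of the features listed above, kinetic scrolling is the easiest to reason about in isolation: after the finger lifts, the view keeps moving with the release velocity, which decays every frame. A toolkit-agnostic sketch of that physics (in a real browser a Qt timer would drive this; the constants here are made-up assumptions):

```python
def kinetic_offsets(velocity, friction=0.9, min_velocity=0.5):
    """Return the successive scroll offsets of a flick, starting from 0,
    until the motion dies out."""
    offsets, position = [], 0.0
    while abs(velocity) >= min_velocity:
        position += velocity
        velocity *= friction  # exponential decay feels natural to the finger
        offsets.append(position)
    return offsets

flick = kinetic_offsets(10.0)
# the view travels a little less each frame, easing to a stop
```

Each frame the widget would scroll to the next offset; the friction constant controls how "heavy" the page feels.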

Wikidot API
http://piotr.gabryjeluk.pl/blog:wikidot-api
<p>A few days ago I started working on Wikidot <a href="http://en.wikipedia.org/wiki/API" >API</a>. The API will be a standardized way to access the Wikidot.com service in a programmable way (i.e. not using a browser) to retrieve, create and update information stored on Wikidot, including site browsing, page editing and commenting.</p>
<p>In simple words this will allow people to write applications that connect to Wikidot.com and perform some actions for the user that runs the application.</p>
<p>by <span class="printuser avatarhover"><a href="http://www.wikidot.com/user:info/gabrys" ><img class="small" src="http://www.wikidot.com/avatar.php?userid=2462&amp;amp;size=small&amp;amp;timestamp=1506462147" alt="Gabrys" style="background-image:url(http://www.wikidot.com/userkarma.php?u=2462)" /></a><a href="http://www.wikidot.com/user:info/gabrys" >Gabrys</a></span></p>
Thu, 22 Jan 2009 18:57:07 +0000
A few days ago I started working on Wikidot API. The API will be a standardized way to access the Wikidot.com service in a programmable way (i.e. not using a browser) to retrieve, create and update information stored on Wikidot, including site browsing, page editing and commenting.

In simple words this will allow people to write applications that connect to Wikidot.com and perform some actions for the user that runs the application.

Technically, the Wikidot.com API is an XML-RPC service exporting methods from a few especially designed classes.

To connect to an XML-RPC service, you must know its endpoint, which is a regular URL (http:// or https://) address. We decided to use HTTPS to secure the channel from the very start.

The operations we are going to support are:

Browse

site.categories

site.pages

page.get

The above are already implemented. Using these API calls you can retrieve almost all the data you have stored on your Wikidot.com sites!

Modify

page.save

This will be the basic method to update the content on your site. We plan several other methods, but this is the one that is the most important.

Comments

page.comments

page.comment

They will be used to get and post comments on a given page. Using the reply_to parameter, it is possible to reply to a particular comment.

Forum

forum.groups

forum.categories

forum.threads

forum.post

This bunch of methods is going to give you full access to the forums you have started on Wikidot.

How to use the API

We haven't yet enabled API access on the main Wikidot.com server, but testing the API with the Python XML-RPC library is as easy as this:

then we construct the ServerProxy object s, supplying the endpoint URL (SOME-URL in this case, as we haven't yet decided what the URL is going to be)

we can see a list of methods by calling system.listMethods on the ServerProxy object

we get a help message for a method by calling system.methodHelp

then we get the categories of the site gamemaker (yeah, it's a part of wikicomplete.info)

then we call the site.pages method (specifying the site and category parameters), but instead of displaying the whole list of structures that describe the pages, we only display their names

calling page.get returns an array with the information about a page, including:

wiki source, array key: source

generated HTML, array key: html

array with various meta-data, array key: meta

we call page.get, passing as the argument an array that specifies the site and page name; we get the page object, but display only what's stored under the source array key

As you can see, playing with this is really easy, as is browsing the available methods and using them.
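Since the real endpoint URL isn't public yet, the session described above can be rehearsed against a local dummy server. The method names are the ones from this post, but the returned data below is invented purely for illustration:

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# a local stand-in for the (not yet public) Wikidot endpoint
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_introspection_functions()  # system.listMethods, system.methodHelp
server.register_function(lambda a: ["start", "blog"], "site.categories")
server.register_function(lambda a: [{"name": "start"}, {"name": "about"}], "site.pages")
server.register_function(
    lambda a: {"source": "+ Hello", "html": "<h1>Hello</h1>", "meta": {}},
    "page.get")
threading.Thread(target=server.serve_forever, daemon=True).start()

s = xmlrpc.client.ServerProxy("http://localhost:%d" % server.server_address[1])
print(s.system.listMethods())                                    # browse the methods
print(s.system.methodHelp("site.pages"))                         # help for one method
print([p["name"] for p in s.site.pages({"site": "gamemaker"})])  # page names only
page = s.page.get({"site": "gamemaker", "page": "start"})
print(page["source"])                                            # just the wiki source
```

Swapping the local URL for the real endpoint is the only change a client would need once the API goes live.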

Why XML-RPC

We've chosen this protocol because it is an easy way to develop both server and client in almost any programming language. Also it gives some flexibility in passed arguments and return values.

We use struct XML-RPC type as the argument and return value type, which is mapped to associative array or dictionary in client (and server) libraries. Each API method gets a bunch of required and optional parameters, that are basically values stored in the struct passed to API methods.

For example site.pages gets a struct with the following keys:

site (site name to get pages from) — required

category (category to get pages from) — optional

This means you have to create an associative array (when using PHP) or a dictionary (when using Python) and pass it as the method argument:
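On the Python side that struct is just a dict. A sketch (the category name "start" is made up here; dumps only shows what the library would send over the wire):

```python
import xmlrpc.client

# the struct for site.pages is an ordinary dictionary
params = {"site": "gamemaker", "category": "start"}

# the library marshals it into an XML-RPC <struct> inside the method call
wire = xmlrpc.client.dumps((params,), methodname="site.pages")
print("<struct>" in wire and "site.pages" in wire)  # True
```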

Bridging Python And PHP
http://piotr.gabryjeluk.pl/blog:bridging-python-and-php
<p>Imagine you have a PHP-based application (like Wikidot). Now, you want to extend it using Python. Of all the ways to do it, I'll show you how to achieve this using the XML-RPC protocol.</p>
<p>by <span class="printuser avatarhover"><a href="http://www.wikidot.com/user:info/gabrys" ><img class="small" src="http://www.wikidot.com/avatar.php?userid=2462&amp;amp;size=small&amp;amp;timestamp=1506462147" alt="Gabrys" style="background-image:url(http://www.wikidot.com/userkarma.php?u=2462)" /></a><a href="http://www.wikidot.com/user:info/gabrys" >Gabrys</a></span></p>
Sun, 11 Jan 2009 10:48:16 +0000
Imagine you have a PHP-based application (like Wikidot). Now, you want to extend it using Python. Of all the ways to do it, I'll show you how to achieve this using the XML-RPC protocol.

Background

On the server side, this works like taking a bunch of functions from your application and exporting them over HTTP.

On the client side, this works like connecting to an XML-RPC server, finding out what functions it delivers, and constructing a so-called server proxy: an object with a method for every function exported by the XML-RPC server.

Calling the methods of the server proxy connects to the server using HTTP, passes the arguments and transports the result back to the client. So basically this works AS IF you had a remotely located object available locally.

The data encoding between client and server is defined in the XML-RPC specification and is a language based on XML (but you never actually touch it; the XML is converted to objects by the libraries).
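With Python's standard xmlrpclib (xmlrpc.client in Python 3), that conversion is a one-liner each way; the method name below is just an example:

```python
import xmlrpc.client

# marshal a call: Python values -> XML-RPC XML
xml = xmlrpc.client.dumps(("hello", 3), methodname="myclass.repeat")

# unmarshal it back: XML -> Python values
params, method = xmlrpc.client.loads(xml)
print(method, params)  # myclass.repeat ('hello', 3)
```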

Overview

We want to run an XML-RPC server exposing a class in PHP and an XML-RPC client in Python to communicate with the XML-RPC server.

Traditionally we would need an HTTP server for the PHP XML-RPC server, because HTTP is used as the XML-RPC transport. But digging a bit into the specification, you'll discover that no HTTP-specific parts of the protocol are used. HTTP is just a line to transport the XML data.

So you may wonder if it's possible to use XML-RPC with a transport other than HTTP. In short: yes. But you may need to hack around the XML-RPC libraries (because they usually assume you'll want to use HTTP).

The set_include_path line adds the /path/to/zf/library directory to the PHP path, so you can import the Zend_XmlRpc_Server class (located in the /path/to/zf/library/Zend/XmlRpc/Server.php file).

Then an instance of Zend_XmlRpc_Server is created, and MyClass is attached as the class for the myclass XML-RPC namespace. This means the repeat method is to be called via XML-RPC as myclass.repeat.

If you place the file on your server and have it under some URL, for example:

Omitting the HTTP protocol

You probably have both the Python and PHP scripts running on the same machine, so the HTTP part is quite useless and an additional point of failure.

As I already stated, HTTP is only a transport, and you can replace it (at some cost) with some other transport.

I came up with the idea of using stdout/stdin as the transport: Python would execute a PHP script (via the command-line interface) and pass the XML-RPC request to the script's stdin. PHP would then have to read the XML-RPC request from stdin instead of from an HTTP request.

The change is passing an instance of Zend_XmlRpc_Request_Stdin to $server->handle(). That is all that's needed. The guys from Zend Framework already anticipated such a use.

Then, the client part.

xmlrpclib allows passing a custom transport in case you want to implement some proxies or other things. We'll make a transport that, instead of making an HTTP connection, runs a PHP script, passes the request to its stdin and gets the response from its stdout:
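The shape of such a transport might look like this (a sketch of the stdin/stdout plumbing, not the original code; the command it runs would normally be something like ["php", "server.php"], but here a tiny Python helper stands in for the PHP side so the example is self-contained):

```python
import subprocess
import sys
import xmlrpc.client

class SubprocessTransport(xmlrpc.client.Transport):
    """Send the XML-RPC request to a subprocess's stdin and read the
    response from its stdout, instead of making an HTTP connection."""

    def __init__(self, command):
        super().__init__()
        self.command = command  # e.g. ["php", "server.php"] (placeholder)

    def request(self, host, handler, request_body, verbose=False):
        proc = subprocess.run(self.command, input=request_body,
                              stdout=subprocess.PIPE, check=True)
        params, _ = xmlrpc.client.loads(proc.stdout)
        return params

# stand-in for the PHP script: read one request from stdin,
# write a methodResponse to stdout
helper = [sys.executable, "-c", (
    "import sys, xmlrpc.client as x\n"
    "params, method = x.loads(sys.stdin.buffer.read())\n"
    "text, times = params\n"
    "sys.stdout.write(x.dumps((text * times,), methodresponse=True))\n"
)]

# the URL is a dummy; only the transport matters here
proxy = xmlrpc.client.ServerProxy("http://localhost",
                                  transport=SubprocessTransport(helper))
print(proxy.myclass.repeat("ab", 3))  # ababab
```

From the caller's point of view nothing changes: proxy.myclass.repeat works exactly as it would over HTTP.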

Only public methods are exposed to XML-RPC clients, so you can hide some logic inside private or protected methods and only expose what you need from given classes.

This solution is a quick way to actually reuse some of your well-working PHP code in your fancy new and elegant Python application. It can help if, say, you want to make a filesystem with Python-FUSE but want the data to come from a PHP application.

My girlfriend was very unhappy that my played songs were not being submitted to the Last.fm social music revolution portal.

That was because I used to use Music Player Daemon (MPD) and its various clients. Most of the clients don't implement the AudioScrobbler protocol, but in fact this is not needed, because a separate MPD client, meant just to submit the info to Last.fm, can run in parallel with the one actually playing music.

I used to use scmpc for this, but since I bought my new laptop and migrated to Ubuntu I quit it — because the application was not in their repo.

Today I decided to find some short Python-based implementation of AudioScrobbler and enrich one of the MPD clients with Last.fm integration. I found pympd really good — nice looking, with a plugin architecture, clean and simple. So I decided it would be my new favorite MPD client. Then I quick-hacked a random plugin into a new one, including almost 100% of the source from the python-scrobbler project. The plugin:

sends "now playing" info to Last.fm on each track change and on plugin load

sends "song played" info to Last.fm on track change and on the plugin unload event, if the song was listened to for at least half of its total length and is longer than 30 seconds.
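The submission rule from the last bullet is easy to isolate (a sketch of just this check, not the plugin's actual code; the real AudioScrobbler specification has a few more conditions):

```python
def should_submit(track_length_s, listened_s):
    """Report a play only if the track is longer than 30 seconds
    and at least half of it was actually listened to."""
    return track_length_s > 30 and listened_s >= track_length_s / 2

print(should_submit(200, 120))  # True: long enough, over half played
print(should_submit(200, 40))   # False: skipped too early
print(should_submit(25, 25))    # False: track too short
```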

For now, the Last.fm user and password are hardcoded, but I hope to create a quick-and-dirty configuration window for it.

I had some problems with time conversion. The python-scrobbler sources suggest using the datetime.utcnow() method, while actually using datetime.now() gives the right results.

Working On TagFs
http://piotr.gabryjeluk.pl/blog:working-on-tagfs
<p>Today I spent another hour or two working on tagfs (described in <a href="http://piotr.gabryjeluk.pl/dev:tagfs-idea">TagFs Idea</a>). It's now pretty much usable. You can now:</p>
<p>by <span class="printuser avatarhover"><a href="http://www.wikidot.com/user:info/gabrys" ><img class="small" src="http://www.wikidot.com/avatar.php?userid=2462&amp;amp;size=small&amp;amp;timestamp=1506462147" alt="Gabrys" style="background-image:url(http://www.wikidot.com/userkarma.php?u=2462)" /></a><a href="http://www.wikidot.com/user:info/gabrys" >Gabrys</a></span></p>
Fri, 26 Sep 2008 19:23:52 +0000
Today I spent another hour or two working on tagfs (described in TagFs Idea). It's now pretty much usable. You can now:

browse files with the tag-directory mapping

directory tag1/tag2/tag3 is the same as tag2/tag3/tag1 or tag3/tag1/tag2

read/write files

file tag1/tag2/tag3/file is the same as tag2/tag3/tag1/file AND even tag2/tag1,tag3,file

tags are allowed not only as directory names, but also as a comma-separated file prefix

create new files

echo Some Content > tag1/tag2/some_file

echo Some Other Content > tag1/tag2,some_file

files have all their "real" properties (owner, group, modification time — this is stored on the back-end filesystem)
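The mapping in the list above boils down to treating a path as an unordered set of tags plus an optional file name. A sketch of that canonicalization (illustrative, not the actual tagfs code; in the real filesystem a backend lookup decides whether the last path component is a file, so here the caller says so explicitly):

```python
def canonical(path, is_file=False):
    """Reduce a tagfs path to (frozenset of tags, file name or None)."""
    parts = [p for p in path.strip("/").split("/") if p]
    name = None
    if parts and "," in parts[-1]:
        # comma-separated prefix: the last component carries tags AND the name
        *extra_tags, name = parts[-1].split(",")
        parts = parts[:-1] + extra_tags
    elif is_file and parts:
        name = parts.pop()
    return frozenset(parts), name

# all the spellings from the list above name the same file
print(canonical("tag1/tag2/tag3/file", is_file=True)
      == canonical("tag2/tag3/tag1/file", is_file=True)
      == canonical("tag2/tag1,tag3,file"))  # True
```

Because the tag set is a frozenset, any permutation of the tags produces the same key, which is exactly the browsing behavior described above.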

Things to do yet:

forbid creating a file whose tags would produce a directory with the name of an existing file

example: you have tag1/great file

then you create a tag1/tag2,great,people.txt file

this way, you would have a great directory showing up in the tag1 directory, because there is a file having both the tag1 and great tags

you also have the great file there, so you end up with a file and a directory (tag) of the same name

scan and list files by their properties

file type — PDF, JPEG, HTML, …

EXIF tags for JPEG — date taken, camera info

ID3 tags for MP3 — artist, title, album

improve directory listing

include SOME files from sub-directories if the sub-directories contain a small number of files

as a result we get an easily browsable repository of files

do some marketing

I would like to serve files from such a file system via FTP or Apache to let people get a feel for the system

tag filesystem idea
http://piotr.gabryjeluk.pl/blog:tagfs-idea
<p>I just thought of my ideal file system, where nothing is hidden and everything is <strong>easy</strong> to find (without going through the whole system &#8212; either manually, or automatically &#8212; <tt>find</tt>, or even the indexed way &#8212; <tt>mlocate</tt>).</p>
<p>by <span class="printuser avatarhover"><a href="http://www.wikidot.com/user:info/gabrys" ><img class="small" src="http://www.wikidot.com/avatar.php?userid=2462&amp;amp;size=small&amp;amp;timestamp=1506462147" alt="Gabrys" style="background-image:url(http://www.wikidot.com/userkarma.php?u=2462)" /></a><a href="http://www.wikidot.com/user:info/gabrys" >Gabrys</a></span></p>
Sun, 31 Aug 2008 08:14:59 +0000
I just thought of my ideal file system, where nothing is hidden and everything is easy to find (without going through the whole system — either manually, or automatically — find, or even the indexed way — mlocate).

The main idea is tags. When I save a file I want to give it some tags. Moreover, I want to be able to find a file based on its properties, like

I dealt with some problems I came across, like proper object encapsulation. Now my database allows any object to be stored in it, even objects of classes that are used internally by the database itself.

I implemented the "." and "WHERE" operators, which seemed the worst, because they needed a working environment stack.

Now comes what's most individual: optimizing things with a query cache. Actually it's not a query cache but rather an evaluateNode cache, though it works very similarly (the only difference is that the evaluateNode cache sits deeper in the processing and can be used inside queries; this means it can accelerate the processing even within a single query!).

More work will surely come on this, because the cache seems really tricky with such a flexible database model as we've assumed.

As pointed out in the previous day's comment, there is another IDE for developing in Python: BoaConstructor. As a matter of fact, I hadn't taken it into consideration, because I was told it is an IDE for GUI development. As I was not about to create a GUI, it was completely skipped.

My impressions of the IDEs:

pida, geany, drpython — interesting, but not worth a try for me

eric — good overall, but it raises many dialog boxes with information about exceptions. These are fatal and concern the GUI, so you have to click OK and continue working

I tried Eric first, because it was described as a really good program. I like it, but it is not polished and would never be bought (if commercial) by anyone:

the support for SVN is really tricky

the GUI is really not intuitive — no "delete a file" or "new file (here)" in the navigator context menu

Today I start working on an object database to be implemented in Python. This is my individual project for my classes.

During the study year I was taught how to parse sophisticated grammars with PLY (the Python Lex-Yacc library). I then learned how to process the parsed queries to get the results.

The database query language is similar to SQL. The difference is that we don't have definitions of tables and fields. Any object (a row in a table) can have different fields than other objects with the same name (of the same table).

This seems quite hard, but it has its pros.

Moreover, we analyzed the following query using simple parsing and processing:

SELECT employee.salary WHERE employee.salary > AVG(employee.salary);

Now, having as many as 10000 employees, the standard processing would mean calculating the average (AVG) of ALL salaries for EACH of the 10000 employees.

So the teacher proposed using tree decoration to mark the branches of the query syntax tree that can be evaluated once (i.e. whose value does not depend on the rest of the query).

I proposed to do it another way: not to analyze the tree too much, but instead implement a sort of caching. Each time we try to get AVG(employee.salary), we check whether it was already calculated; if not (only the first time we get the AVG), we calculate it and populate the cache with the value.

This has the following pros:

we can use the cache in successive queries

we don't have to decorate query trees — it's not Christmas here
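The caching idea above can be sketched on the AVG example (hypothetical node class and names, not the actual project code):

```python
class AvgNode:
    """AVG(field) over the whole collection; its value is row-independent,
    so it is a perfect candidate for the evaluateNode cache."""

    def __init__(self, field):
        self.field = field
        self.computations = 0  # instrumentation: how often we really computed

    def evaluate(self, rows, cache):
        key = ("AVG", self.field)
        if key not in cache:       # first time only: compute and remember
            self.computations += 1
            cache[key] = sum(r[self.field] for r in rows) / len(rows)
        return cache[key]

def select_above_average(rows, field):
    """SELECT rows WHERE row.field > AVG(field), using the node cache."""
    avg = AvgNode(field)
    cache = {}  # shared across rows; could also persist across queries
    result = [r for r in rows if r[field] > avg.evaluate(rows, cache)]
    return result, avg.computations

employees = [{"salary": s} for s in (1000, 2000, 3000, 4000)]
rows, computations = select_above_average(employees, "salary")
print(rows)          # the two rows above the 2500 average
print(computations)  # 1 — computed once, not once per row
```

Keeping the cache dictionary alive between queries is what would make the same AVG free in successive queries as well.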

I start today's work by selecting a Python IDE. During the classes I used PyDev — the Python plugin for the Eclipse IDE, but I believe there are many new choices out there now — after a few months: