Alternatives to PHP?

When I was loading gigabytes of XML into a MySQL database with a PHP script, I started thinking about alternatives to PHP. Why do I need that?

1. PHP is slow – I’m speaking about area of data processing and implementation of algorithms
2. No good cli debugger – I’m just tired of debugging with ‘echo’ and ‘var_dump’
3. Unpredictable memory consumption – it’s easy in processing of big files to eat all available memory
4. Need something new – I’ve been using PHP for almost 10 years, so I want to try something else to refresh my mind.

List of my requirements:

– Stable binding to MySQL; support for the new protocol with prepared statements is necessary
– Good XML handling
– Fast in terms of both performance and development
– Multiplatform; at least Linux, Windows and Solaris are necessary
– Under active development, with a wide community
– Web binding
– Not Java (I don’t like it, like most PHP guys, I believe)
– Has something that impresses me

So I went to The Computer Language Shootout Benchmarks and walked through the wide list of languages on offer: D, OCaml, Haskell, Erlang, Python, Ruby, Lisp, Scheme, Lua, Eiffel, C#, SML, Perl, Tcl and the rest. Unfortunately, my first requirement, MySQL support, narrows the list to Perl, Ruby, Java, C and C++; even Python supports only the old MySQL protocol (am I wrong here?). Excluding Java (see my requirements) and C/C++ (I don’t consider them seriously for Web development), I am left with Perl and Ruby. One problem with these languages is that, if the Shootout Benchmarks are to be believed, they are only about as fast as PHP, so I’m not sure I should replace PHP with Perl or Ruby. However, taking into account the increasing popularity of Ruby, maybe I’ll take a closer look at it.

Speaking of impressive features, I’d look at OCaml and Haskell as functional languages. I also enjoyed the syntax of the J and K languages, e.g. a program to count the words in a file and the quicksort algorithm:

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

Hi, before ruling out Perl, please have a look at this discussion about the benchmarks you mention: http://www.perlmonks.org/?node_id=392696 Most of the benchmarks were written by people with little experience of Perl. While Perl is not really designed to wrestle with gcc on efficiency, judicious use can achieve surprisingly good results. If you write Perl code by mimicking PHP/Java/C/C++ code, you will likely end up with highly inefficient programs. If you want to code for efficiency, you should learn the Perl way of doing things. This may not suit you, but you should at least know that the most efficient results come from using the appropriate idioms.

1. PHP is slow – I’m speaking about area of data processing and implementation of algorithms

Really? What are you comparing it to? A compiled language like C or C++, perhaps? Yeah… that’s a fair comparison. I dread the day you do car reviews; I’ll bet you’d be comparing a Toyota Corolla to a Williams/BMW Formula 1 car.

For complex algorithmic tasks like counting words, for example, PHP offers a native function written in C called str_word_count(), and over 60 other string functions in addition to that (not to mention PCRE regex support).

2. No good cli debugger – I’m just tired of debugging with ‘echo’ and ‘var_dump’

Before publishing a research article, please educate yourself with a search engine; I hear Google is very user friendly. There is a free, open source debugger for PHP called Xdebug that works quite well on the CLI and offers all the capabilities you’d expect from something like GDB, plus a few PHP-specific features.

3. Unpredictable memory consumption – it’s easy in processing of big files to eat all available memory

Ok, well, if you do malloc(large_random_value) it’d be hard to predict your memory usage as well. If your code is not designed to be careful in its use of memory, PHP offers memory_limit, which lets you restrict PHP’s memory utilization to a given value. PHP’s memory usage is actually quite linear and very easy to predict in just about all cases, with the exception of instances where memory is allocated outside of PHP by a 3rd party library such as libxml. This, however, would be an issue in any scripting or programming language. In the interest of science, why don’t you write some C code that uses libxml and see that crash; then you can publish an article on how horrible and slow (and memory inefficient) C is.

Also, if you are planning to store gigabytes of XML, use something equally elephantine like Oracle or IBM’s DB2. If you need something smaller, try Berkeley DB4. All of the previously mentioned solutions can represent XML data internally and are far more efficient for this use.

4. Need something new – I’ve been using PHP for almost 10 years, so I want to try something else to refresh my mind.

This perhaps is the only point that makes sense. If you really want speed and processing efficiency, native ASM is the only true way; a custom XML parser in ASM will surely be an entertaining hack that will only run on one CPU, but reeeealy fast. Make sure to use the CPU’s native vector instructions to really push all you can out of it! Or, if you are totally lame, you can stoop down to using C. And then you can make a PHP module (which is how it is supposed to be done anyway), thereby making your work useful to the community, as opposed to this particular article.

P.S. Rasmus? Is this you? Dude….. You wrote PHP 10 years ago and now you’re bored? If by some remote chance this is not Rasmus, please indicate when you are using “dog years” to count time, as PHP has only been out for 10 years and for the 1st year pretty much had a 3 digit user base.

Take it easy – I’m sorry it hurt you so deeply. You are right – this is a Sunday piece of flame. I’m still using PHP for the next bunch of XMLs.

1) Speed. I’m speaking not only about C/C++ but also about Python, Perl, Lua, Java, Haskell, OCaml, e.g. http://www.timestretch.com/FractalBenchmark.html Btw, in the list of programming languages, which is the Toyota Corolla and which is the Williams/BMW Formula 1 from your point of view?

2) Thank you for pointing me to Xdebug as a CLI debugger. As far as I can see, it is still in beta and isn’t under active development. Have you used it for debugging CLI PHP scripts?

3) Memory: I’ve posted a bug report about memory consumption in libxml: http://bugs.php.net/bug.php?id=38604. I’ve also seen this kind of problem several times in other areas. Well, it may be a problem in third-party libraries, but that fact does not make my life any easier.

Regarding your sarcasm about Rasmus – (un)fortunately I’m not Rasmus, but IIRC I started using PHP in 1997, when I made the first website for my company. That was an early version of PHP 3.

That isn’t sendmail. It’s the J programming language: http://en.wikipedia.org/wiki/J_programming_language which is said to be “very terse and powerful, and is often found to be useful for mathematical and statistical programming, especially when performing operations on matrices”.

I think Perl is a good alternative. I agree with Giuseppe. I have been programming in Perl for 8 years and PHP for 6, and know both quite well. If you learn the intricate details of Perl, it can be very efficient indeed, especially if you use SAX to parse your XML. I’ve converted some scripts at work to parse large files with SAX and had good results.

By the way, I have deep expertise with XML and related technologies. I think you will find Perl’s support for XML so good and so varied that you can achieve better results by choosing the one of its many XML tools that fits your specific need, rather than, for example, .NET’s XML support, which only offers you two or three ways to do something. (.NET’s XML parser takes laughable amounts of memory too, if you use it badly.)

I only say that to point out that if you use the wrong thing in Perl, it will suck, just as if you use the wrong thing in .NET it will suck, but you can do the right thing and it will fly.

I too have never found a good CLI debugger for PHP. There are some, but it can be very frustrating to get them to work, and I’ve never found one satisfactory. On the other hand, perl -d is fantastic.

This little line is certainly only there to reproduce the leak/bug, but it is somehow representative of how to do things wrong (and I seriously hope you don’t read 1G with this line ;-): simplexml_load_string(file_get_contents($filename)); http://www.php.net/simplexml_load_file

Also, as long as you use P*, Ruby or Mono, you will hardly see huge performance improvements, as they all use libxml (unless you use Sablotron or other alternatives).

If you would like to limit your memory usage (and keep it constant), I recommend xmlreader (available for C# and Perl too; no idea if Python/Ruby have it). It is as fast as the other APIs but has a very low and “constant” memory footprint.
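To make the streaming idea concrete, here is a minimal sketch in Python rather than PHP. It does not use xmlreader itself; it assumes the standard library's xml.etree.ElementTree.iterparse as a stand-in for the same pull-style approach. The function name count_records and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag):
    """Stream through an XML document and count <tag> elements.

    Finished records are dropped as we go, so memory use does not
    grow with the size of the input.
    """
    count = 0
    # Ask for "start" events too, so the first event hands us the root.
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            count += 1
            # Detach everything the root has collected so far.
            root.clear()
    return count


# Hypothetical sample data; a real run would pass a file name instead.
sample = io.BytesIO(b"<items><item id='1'/><item id='2'/><item id='3'/></items>")
print(count_records(sample, "item"))  # prints 3
```

The contrast with loading the whole file into a tree first is that only the record currently being read has to fit in memory.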

But the point is clearly about knowing what you are doing and how you should do it (which extension, function or API fits best); it does not matter whether you have been developing with PHP (or any other language) for N years or 2 months.

There is always a way to speed up programs and algorithms, for instance by rewriting them in assembly language; in that case, however, the speed comes bundled with a portability nightmare.

Perl syntax is a little too loose, and mixing programming styles will most likely make the code unreadable and unmaintainable (more advice on coding styles can be found at http://thc.segfault.net/root/phun/unmaintain.html). Personally, when writing Perl code I use a subset of Perl syntax constructions that resembles C-language syntax.

First, I should note that you should not take this as an attempt to start a flame. I think anyone who develops long enough in any language (or with any piece of technology in general) can get upset with it on a specific project. This is great motivation to try out other things.

Speaking of the benchmarks mentioned, I agree this is not really the point – if you develop in PHP, all the heavyweight processing is normally done in modules written in C/C++: XML processing, regexp matching, sorting and even the MySQL client. If you need something else that is very CPU hungry and can’t be mapped efficiently onto existing routines, you should consider implementing it as an extension.

With the CLI debugger, the keyword is “GOOD”, meaning one that allows you to debug applications effectively in CLI mode. Why is an IDE not enough? Because we have to work with remote servers which might not even have X Windows. Not to mention that working with a remote X Windows client can be pretty slow.

Now memory consumption – what Vadim is talking about is memory leaks. I’ve run into this with a number of extensions (XML in particular) and unfortunately the developers do not seem to care. For Web applications it is not critical, as memory is freed after each request is processed. It is, however, a big problem for batch jobs and permanently running scripts. A number of customers I have worked with used workarounds along the lines of: fork, process 100 files and exit, simply working around the memory leaks.
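The fork-and-exit workaround described above could look roughly like this. It is a sketch in Python, not PHP, and not anyone's actual production code; worker_script is a hypothetical script that processes the file names it receives as arguments, and the batch size of 100 is simply the figure mentioned above.

```python
import subprocess
import sys


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_fresh_processes(worker_script, files, batch_size=100):
    """Run the (hypothetical) worker once per batch, each in a new process.

    Whatever memory a batch leaks goes back to the OS when its process
    exits, so the leak never accumulates across batches.
    """
    for batch in batches(files, batch_size):
        # check=True stops the whole run if a worker exits with an error.
        subprocess.run([sys.executable, worker_script, *batch], check=True)
```

This only caps the damage at one batch's worth of leaked memory; the leak itself is still there.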

There also does not seem to be a whole lot of memory allocation tracking tools for PHP. I mean something that could tell you where memory was allocated, for which objects and so on, which would help pinpoint exactly where a leak happens.

My other concern with PHP, which Vadim does not mention, is the lack of error handling. Of course there are exceptions, but there does not seem to be a way to intercept a fatal error so that you could display a nice error message instead of a partially created page. Yes, these are often caused by development errors, for example passing false instead of an object and then trying to call a method on it. This kind of error should also be catchable.

I don’t want to go off topic, but if you want to intercept fatal errors Peter, you can set a custom error handler with set_error_handler(). Actually, you can even convert almost all runtime errors into exceptions by using your own error handler.

I’d like to know why you don’t like Java. I suspect I know the answer, but I’d like to hear it. Putting that aside, have you actually considered JSTL, the JavaServer Pages Standard Tag Library?

While this is built on Java, it’s very PHP/ASP-like in syntax. Indeed, I use it for quite rapid prototyping. The clear benefit is that it’s interpreted, so you need only an editor and a browser to code away. You can, however, revert to straight Java code at any time if there is some functionality that is not supported (I find this rare).

Java…. It is just a whole other world. I’m not saying it is bad, though.

Things I find inconvenient (possibly I am wrong):

– Need to compile stuff. For scripts I prefer to be able to run them right after making changes.
– Class names. They are smart and standardized, which does not make them pretty for my taste.
– Too much standardization. For example, Connector/J has to be JDBC compliant, which means it has to do many smart things that are required by the specs.
– Love of complication. Java applications are typically designed “right”, which makes them complicated. Take a look at any stack trace posted for Connector/J bugs… it is rarely less than 20 levels deep.
– Too many third party extensions, many of which are commercial.
– Not overly convenient for working with strings.
– Product of a large company. With Perl and PHP it is relatively easy to reach the developers.
– Fully OO. I like objects, but not for 2 line scripts.

I guess most of these are just lame excuses; the real reason would be that I just do not feel like it. I enjoy playing with something that allows me to do things quick and dirty. Most of my tasks are far from a plane autopilot in their complexity and reliability requirements.

Java is probably good for the enterprise world, but I do not expect it to get much traction in the Web world, which wants applications to be developed quickly by students implementing prototypes of their ideas.

I knew someone would write about set_error_handler. Unfortunately it does not work.

Here is what documentation says:

The following error types cannot be handled with a user defined function: E_ERROR, E_PARSE, E_CORE_ERROR, E_CORE_WARNING, E_COMPILE_ERROR, E_COMPILE_WARNING, and most of E_STRICT raised in the file where set_error_handler() is called.

Possible memory leaks were exactly the reason why I never used PHP for anything else but short running scripts (like web apps). As soon as I got the task to write some long running daemons in a scripting language for the first time some three years ago, I turned to Perl or even Bash (for simpler stuff), even if I never hit a real problem with memory leaks in PHP before.

But as soon as I started to write the first few lines of the daemon in PHP, I started to get a bad feeling about it (even though PHP would have fit the task at hand and I was using almost nothing but PHP at the time). I just realized that nobody was using PHP for anything that runs longer than a few seconds – and I just didn’t want to be the first one to hit the possible bugs.

If you change your mind and still want to use your PHP expertise, then XMLReader is really nice for huge XML -> database transformations. If you don’t need to do complicated processing of the data then it’s quite nice and, if you take care to avoid circular references (not that hard with the XMLReader processing model), it doesn’t leak memory. I haven’t had the “pleasure” of working with gigabyte-sized XML files, but hundreds of MBs aren’t a problem. As I have 6 years of experience using PHP, the development goes really fast too. Not that Python et al. are bad for this kind of thing, but experience really counts towards development time.
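For comparison, an XML -> database transformation of this kind might be sketched as follows in Python. This is not XMLReader and not MySQL: it assumes the standard library's iterparse for streaming and uses sqlite3 purely as a stand-in database so the example runs anywhere. The items table, the item/name element names and load_items are all hypothetical. Python's MySQL drivers generally use %s rather than ? as the parameter placeholder.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_items(source, conn, batch_size=1000):
    """Stream <item> records from XML into the items table in batches."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER, name TEXT)")
    insert = "INSERT INTO items (id, name) VALUES (?, ?)"
    pending = []
    total = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        pending.append((int(elem.get("id")), elem.findtext("name")))
        # Free the record's children and text; the emptied element itself
        # stays attached to its parent.
        elem.clear()
        if len(pending) >= batch_size:
            # Parameterized insert: values are never pasted into the SQL.
            conn.executemany(insert, pending)
            total += len(pending)
            pending = []
    if pending:
        conn.executemany(insert, pending)
        total += len(pending)
    conn.commit()
    return total


# Hypothetical sample data and an in-memory database for the demo.
sample = io.BytesIO(
    b"<items><item id='1'><name>a</name></item>"
    b"<item id='2'><name>b</name></item></items>"
)
conn = sqlite3.connect(":memory:")
print(load_items(sample, conn))  # prints 2
```

Batching the inserts keeps the number of round trips to the database down while the parser never holds more than one full record at a time.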

One advantage of Perl is that it has attracted the type of developers who like to solve this kind of problem (unlike PHP). Browse CPAN – Perl’s XML tools are also good. And demand Unicode support (some languages, like Perl, have it… others don’t).

C# has MySQL Connector/NET and is reasonably fast and predictable. The Mono project has a page on the topic with a quick example – http://mono-project.com/MySQL – and there are also the native System.Xml classes for handling XML internally.

Perl is great, but it is sad to see so many dead links to what used to be booming sites. Perhaps this is a wrong perception on my part, but it seems to me that a lot of open source projects lack the wealth of community support they had years ago.

Have you tried Euphoria? I’ve been looking into it for some time. I just can’t stand PHP anymore; it’s slow, messy, and hard to develop in. From what I understand, Euphoria is easy to install: just put the interpreter in cgi-bin, as it’s written in C. According to a benchmark I found, it’s apparently 35 times faster than Perl and 31 times faster than Python: http://www.rapideuphoria.com/bench.txt It has an ODBC library that allows it to connect to MySQL: http://www.usingeuphoria.com/?page=bestofeu

I also see it has a Euphoria-to-C translator that makes it 3.7 times faster. I think I’m going to download it and play around with it; I’ll let you know what I think.

You could try WebDNA: it is lighter than PHP and MySQL and is also much faster (resilient in-memory database). It is 100% compatible with any browser (it does not need anything client-side) and you can hack virtually anything the server sends to the browser. I run it with heavy in-memory databases, but you can use MySQL if you want to.