Mark Maunder

Main menu

Category Archives: perl

I’ve been asked this question twice in the last 2 weeks by people wanting to write their first Web application. So I’m going to answer it here for anyone else interested:

If you want to write Web applications you need to learn the following languages: Javascript, PHP, HTML, CSS and SQL. It sounds like a lot, but it really is not. You can learn enough of each of these languages to write a basic Web application within a week. Trust me. It’s easy!

PHP is the guts of your Web application. It is the language that runs on your web server. It is also the only language where you have a choice about learning it or learning another language. You must learn HTML, CSS and Javascript and 99% of web programmers learn SQL to talk to a database. But there are many other languages to choose from that can do the same thing that PHP does.

However, if you are starting out writing web applications, PHP is the first server language you should learn and here is why:

PHP is used by a huge number of websites, both big and small. Most of Facebook is written in PHP. Wikipedia powered by Mediawiki is written in PHP.

WordPress, the worlds most popular open source blog platform is written in PHP. If you know PHP you can change it any way you like or even contribute to the community. WordPress is used by eBay, Yahoo, Digg, The Wall Street Journal, Techcrunch, TMZ, Mashable and of course the whole of WordPress.com is powered by WordPress written in PHP.

Most of the worlds best content management systems are written in PHP.

99% of web programmers can understand PHP, even though some don’t realize it. (like Perl Developers)

PHP is a mature language which means the bugs have all been ironed out and it runs fast!

If you Google a question you have about PHP, you have a much higher likelihood of finding an answer than any other server programming language.

Don’t learn Perl because even though it’s a mature, fast and popular language, it’s harder to learn than PHP.

Don’t learn Java because Java is better suited to launching spacecraft and running systems that control oil rigs or banking software than Web applications. It is strongly typed which means that you need to write more lines of code to get the same thing done. It’s also harder to learn because it’s a purer object oriented language . It also is owned by Oracle which means it’s a commercial language and that means Oracle will continually be trying to sell you stuff by making things seem harder than they are and claiming they have the answer to the problem they created in your mind.

Don’t learn .NET because it’s also a commercial language and pretty much everything made by Microsoft either will cost you money or will break a lot.

Don’t learn Ruby because the guys who run the community are totala-holes who will insult you for asking beginner questions. Ruby is also way less popular than PHP or Perl even though it’s used to power Twitter. It’s also the reason Twitter is down so often.

Don’t learn Brainf*ck, Cobol, D, Erlang, Fortran, Go, Haskell, Lisp, OCaml, Python or Smalltalk because these are languages that people tell you they know to show off. Some of them have specific advantages like parallelism, being a pure object oriented language or being compact. But they are not for you if you are starting out. In fact, the combination of PHP and Javascript will give you 99.99999% of what all these languages offer.

You also need to learn two presentation languages: HTML and CSS. They are actually part of each other because HTML is not too useful without CSS and vice versa.

HTML tells the browser the structure and content of a page. e.g. Put a form after a paragraph and have one field for email and one for full name.

CSS tells the browser how to make that page look e.g. Which fonts to use, what size they should be, what colors, how wide or tall things on the page should be, how thick borders should be, how much padding to use and how thick to make margins.

Then you also need to learn a data storage language called SQL which lets you talke to a database to store things like visitor names, email addresses and so on. For example, using SQL you can tell a database to store an email address and full name by saying “INSERT INTO visitors (name, email) values (‘Mark Maunder’, ‘mark@example.com’);. There are other ways to store data and a popular terrorist movement calling itself NoSQL has formed in the last 4 years and they spend their time sowing fear and doubt about SQL and confusing beginners like you. The reality is that 99% of web applications use SQL and continue to use SQL. It works, it’s fast, it’s easy to learn and everyone understands it. It’s used by WordPress, Wikipedia, Facebook and everyone else who counts, whether they like it or not. Just learn SQL!! I also recommend you use MySQL to store your web application’s data (even though it’s owned by Oracle) because it’s the most popular open source database out there. PHP applications use MySQL more than any other database engine on the web.

HTML (a presentation language that tells the browser the structure of a page)

CSS (a presentation language that tells the browser how to make a page look once it’ knows the structure)

SQL (a data access language that lets you store and retrieve data from a database)

Each of these languages runs or executes in a certain place or environment:

Javascript runs inside the browser of someone who has visited your website. What’s cool about this is that it uses your visitor’s CPU and memory instead of the resources on your server.

PHP runs on your own web server. Most websites use a kind of “container” or application server to run PHP called Apache Web Server with something called mod_php installed. Apache handles all the web server stuff like receiving the request for the document and making sure it’s formatted correctly. It then passes the request to mod_php which is executing your PHP code. This actually runs your web application written in PHP, your program sends the response back to Apache which sends it back to your visitor.

HTML is interpreted by a visitor’s browser and tells the browser how to structure the page as it loads.

CSS is also interpreted by a visitor’s browser and tells the browser how to make the HTML look.

SQL is a language that you use inside your PHP application to talk to a database like MySQL. You will actually write SQL in your PHP code but it will be sent to the database engine which is where it is interpreted and executed. The database then sends your PHP code whatever it asked for, if anything. (sometimes you’re just inserting data and not asking a question)

As you progress you will get familiar with the platforms you run each of these languages on. They include:

Linux is the operating system you will run on your server. Everything else on your server runs on top of Linux. Linux lets your web application talk to the server’s hardware.

Apache Web Server running mod_php. This is the application server you will use to run your PHP code. It will receive the web requests, forward them to your PHP code, and receive the response which is forwarded to your visitor.

MySQL database engine. You will talk to mysql using SQL which is written inside your PHP code.

One last note to help you in your language decision making. It’s important that you understand there are a few phenomenon that may confuse you in your language research:

The first is that some software developers have little life beyond writing software and have large egos. One of the few things they have to impress you with is their own intelligence. They will try to make programming sound harder than it actually is. It’s not hard. It’s easy.

Secondly, an arrogant programmer may regale you with a list of programming languages to choose from and tell you that he or she knows them all. They may make the choice sound complicated. It’s not. They’re just showing off. Choose PHP and the set of tools listed above and you’ll be fine.

Third, remember that there is always something new and shiny coming out that will get a lot of attention and is advertised to “change the way we…” or will “make everything you know about programming irrelevant”. Ignore the noise and stay focused on the basics. Until a new language, operating system, application or piece of hardware has been around for a while (usually at least 5 years), it’s going to be full of bugs, run slow, break often and it will be hard to get help by Googling because few people are using it and have had the problem you’re having.

Lastly, many companies like Google and Facebook spend a lot of time and energy trying to attract the best software engineers in the world. Google associated themselves with NASA purely for this reason, even though they’re in completely different businesses. To draw attention to themselves as thought leaders in software they talk a lot about languages like Erlang, Haskell and so on. The reality is that their bread and butter languages are pretty ordinary – languages like C++ and PHP. So don’t get confused when you see Facebook talking about using Erlang for real-time chat. They’re just showing off. Their bread and butter is PHP, HTML, CSS, SQL and Javascript, like most of the rest of the Web.

Who am I and how dare I express an opinion on this? I’ve been programming web applications since 2 years after the Web was invented. I’m the CEO and CTO of a company who’s web apps are seen by over 200 million unique people every month. I also own the company. I’ve seen languages and platforms come and go including Netscape Commerce Server, Java Applets, Visual Basic, XML, NetWare, Windows NT, Microsoft IIS, thin clients, network computers, etc.

The video below is Perl’s development history since Larry Wall and a small team started out in January 1988. It’s visualized using gource. Notice how dev activity has continued to increase all the way to 2010.

Perl is a powerful language. It’s also fast and everything you need has already been implemented, debugged, refactored, reimplemented and made rock solid. If you ever have a problem, it’s already been solved by someone else. When I was in my 20’s I was a big fan of the new-new thing. Now, as a startup owner taking strategic risks and trying to reduce risks in other areas of the business, I love Perl because I know it will do right by me and I deploy code knowing I’m not betting on a language that might one day grow up to be what Perl already is.

Watch the video in 720 hidef on full screen (Youtube link) to see all the labels. It’s awesome realizing how much work and evolution has gone into some of the core libs that I use.

I have this module that a great group of guys in Malaysia have put together. But their language of choice is PHP and mine is Perl. I need to modify it slightly to integrate it. For example, I need to add my own session code so that their code knows if my user is logged in or not and who they are.

I started writing PHP but quickly started duplicating code I’d already written in Perl. Fetch the session from the database, de-serialize the session data, that sort of thing. I also ran into issues trying to recreate my Perl decryption routines in PHP. [I use non-mainstream ciphers]

Then I found ways to run Perl inside PHP and vice-versa. But I quickly realized that’s a very bad idea. Not only are you creating a new Perl or PHP interpreter for every request, but you’re still duplicating code, and you’re using a lot more memory to run interpreters in addition to what mod_php and mod_perl already run.

Eventually I settled on creating a very lightweight wrapper function in PHP called doPerl. It looks like this:

On the other side I have a very fast mod_perl handler that only allows connections from 127.0.0.1 (the local machine). I deserialise the incoming JSON data using Perl’s JSON::from_json(). I use eval() to execute the function name that is, as you can see above, part of the URL. I reserialize the result using Perl’s JSON::to_json($result) and send it back to the PHP app as the HTML body.

This is very very fast because all PHP and Perl code that executes is already in memory under mod_perl or mod_php. The only overhead is the connection creation, sending of packet data across the network connection and connection breakdown. Some of this is handled by your server’s hardware. [And of course the serialization/deserialization of the JSON data on both ends.]

The connection creation is a three way handshake, but because there’s no latency on the link it’s almost instantaneous. The transferring of data is faster than a network because the MTU on your lo interface (the 127.0.0.1 interface) is 16436 bytes instead of the normal 1500 bytes. That means the entire request or response fits inside a single packet. And connection termination is again just two packets from each side and because of the zero latency it’s super fast.

I use JSON because it’s less bulky than XML and on average it’s faster to parse across all languages. Both PHP and Perl’s JSON routines are ridiculously fast.

My final implementation on the PHP side is a set of wrapper classes that use the doPerl() function to do their work. Inside the classes I use caching liberally, either in instance variables, or if the data needs to persist across requests I use PHP’s excellent APC cache to store the data in shared memory.

Update: On request I’ve posted the perl web service handler for this here. The Perl code allows you to send parameters via POST using either a query parameter called ‘json’ and including escaped JSON that will get deserialized and passed to your function, or you can just use regular post style name value pairs that will be sent as a hashref to your function. I’ve included one test function called hello() in the code. Please note this web service module lets you execute arbitrary perl code in the web service module’s namespace and doesn’t filter out double colon’s, so really you can just do whatever the hell you want. So I’ve included two very simple security mechanisms that I strongly recommend you don’t remove. It only allows requests from localhost, and you must include an ‘auth’ post parameter containing a password (currently set to ‘password’). You’re going to have to implement the MM::Util::getIP() routine to make this work and it’s really just a one liner: