Background (or Why would I even think to do this?): I inherited some useful scripts that were written in perl. When I started working with them, it was far easier to refresh my decade-old knowledge of perl than to rewrite them in some other language. The scripts update data in a database, and since my application is hosted in Azure, I decided to have these scripts write to SQL Azure. This is no problem in perl; it’s pretty much the same as connecting to any MSSQL database. Since the data that these scripts generate is time sensitive, I really need to schedule them to run. (Most of them need to run daily, and a couple of them more often.) Since my main app, which uses this data, already runs in Azure, and my database is SQL Azure, I decided to look into running these scripts from Azure itself. So now I have these scripts running under my Azure Web Role deployment, scheduled with Quartz.Net.

The Project:

It includes a web role that includes a startup task. The startup task uses a powershell script to download Strawberry Perl from blob storage and then unzips it with 7zip. Lastly, the startup task installs required CPAN modules.

In the OnStart() method of the web role, a quartz.net perl job is scheduled to run periodically (1 minute in the sample project, below).

When the quartz job is triggered to run, the perl job starts the perl process and executes the perl script. It captures standard output and standard error, and writes them to the trace logs.

That’s really all there is to it, though it’s trickier to set up than I had thought. I’ve posted the solution to github, to hopefully make this easier for the next person.

If you need to do this or something similar to this, I highly recommend the following articles/blog posts. They were incredibly valuable to me:

download 7za.exe, the 7-Zip Command Line Version, available here: http://www.7-zip.org/download.html; place this in the root of WebRole1 and set “Copy to Output Directory” to “Copy always”

set the Diagnostics Connection String in ServiceConfiguration.Cloud.cscfg

set the url to the strawberry perl zip file in downloadPerl.ps1

This should run in the development environment and in Azure. You can browse to the startup log files in \approot\bin\startuplogs. In the development environment, during debugging, this is under [Your Cloud Project]\csx\Debug\roles\WebRole1. In the cloud I found this under an E:\ or F:\ drive (while connected via RDP).

Some gotchas (or maybe tips):

Multiple instances of this role will each execute the perl scripts on its own schedule. Therefore, the scripts need to be idempotent. In the long run I don’t want this behavior, so the next step for this project is to use blob leases in the perl job. I plan to deploy the script changes via blob, so at the start of a job, the job will try to acquire a blob lease on the script. If the job can acquire a lease it will download the script, execute it, and release the lease. If the job cannot acquire a lease it won’t do anything. The scripts already contain logic to make sure the data was not already generated by a different run of the script. In this case a script will start, but it will notice that there’s nothing to do and end.
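The planned lease-gated behavior can be sketched like this. The lease client here is an in-memory stand-in (the real one would use the Azure blob lease API), so only the control flow is real:

```javascript
// Sketch of the planned job: try to acquire a lease on the script blob;
// the winner downloads and runs the script, everyone else does nothing.
function runIfLeader(leaseClient, downloadAndRun) {
  const lease = leaseClient.tryAcquire();   // null if another instance holds it
  if (lease === null) return "skipped";
  try {
    downloadAndRun();                       // download the script blob, execute it
    return "ran";
  } finally {
    leaseClient.release(lease);             // always give the lease back
  }
}

// Minimal in-memory lease, just to exercise the logic above.
function makeLease() {
  let held = false;
  return {
    tryAcquire: () => (held ? null : (held = true, "lease-1")),
    release: () => { held = false; },
  };
}
```

With N role instances, only one `tryAcquire` succeeds per scheduling window, which is exactly the single-runner behavior I want.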

Because I used a feature of powershell 2, I needed to specify osFamily="2" in ServiceConfiguration.Cloud.cscfg, so my role would run on Server 2008 R2.

I initially used powershell and shell.application to unzip the perl zip file. This worked quite well in the development environment, but failed in Azure. After being unsuccessful at tracking down the cause, I switched to 7zip, which just worked. I would have preferred not having this dependency.

I made sure to log/trace almost every step of the startup. It was invaluable to be able to browse to the startup logs in the development environment, and later in Azure (via RDP) to see what was going on.

I had to explicitly add a DiagnosticMonitorTraceListener to my WebRole OnStart, so that standard output and standard error from my perl job execution would also be logged. (See Neil MacKenzie’s article)

This is for my own future reference. I had nunit tests that I was able to debug without any problem in Visual Studio 2008, with a dll that was targeting .Net 2, but I needed a feature in .Net 4 for a POC. I converted the Visual Studio project to VS 2010, and changed the dll and unit test dll to compile for .Net 4. The tests still ran fine, but I could no longer break in the debugger (from F5). In order to do this, I needed to target .Net 4 framework with the nunit /framework switch (see image), and…

to configure nunit-console-x86.exe to run with .Net 4 by commenting out supportedRuntime version="v2.0.50727" in C:\Program Files (x86)\NUnit 2.6\bin\nunit-console-x86.exe.config, like so:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <!-- The .NET 2.0 build of the console runner only runs under .NET 2.0 or higher. The setting useLegacyV2RuntimeActivationPolicy only applies under .NET 4.0 and permits use of mixed mode assemblies, which would otherwise not load correctly. -->
  <startup useLegacyV2RuntimeActivationPolicy="true">
    <!-- Comment out the next line to force use of .NET 4.0 -->
    <!-- <supportedRuntime version="v2.0.50727" /> -->
    <supportedRuntime version="v4.0.30319" />
  </startup>
  <runtime>
    <!-- Look for addins in the addins directory for now -->
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath="lib;addins"/>
    </assemblyBinding>
  </runtime>
</configuration>

The top screenshot also shows that I’m redirecting standard output to a file. This is because I’m doing some crude performance analysis in the tests, collecting timing information and then writing it to standard output. This file (“TestResults.txt”) gets written to the same directory that the dll is in. I’m also targeting a particular test category (“MyCategory”), because the test suite has many more tests, but I only care about one group (aka category).

The NCAA collects player statistics from many different sports, but if you want to get that data, it’s not always easy. It’s not like they have an open API. Recently, I wanted to get the season-long statistics for all NCAA baseball players, divisions 1-3, so that data could be used for other purposes. I wrote a screen scraper to do it. It’s a very simple script, which crawls the site, scraping the information and outputting it to csv (comma separated values) so it can then be imported into an Excel spreadsheet or a database. Highlights:

Can scrape 1 school’s worth of baseball player data, if you know the NCAA’s numeric identifier for the school.

Can scrape all men’s baseball player data for all NCAA schools for which data is available. (Not all schools have data listed)

Implementation details:

It’s written in php.

It’s written for men’s baseball, but it could be made more generic by changes to the parsing of the HTML pages and the generation of the csv.

It could be changed to pull another sport’s data in under an hour (probably closer to 1/2 hour).

It does not include the NCAA’s player ids in the output, but that would be easy to add.

My challenges:

I hadn’t written anything serious in php in a while, since my day job consists mostly of Java, JavaScript, C#, and VB.Net. I chose php only because I suspected that the person who was going to use this would have an easier time with it than with another language.

Because the NCAA.org site requires an established Java session, each request to the site is actually made in 2 requests. The first retrieves a jsessionid, and the second requests the data, attaching the jsessionid. This makes the application take twice as many requests as should be needed. (This certainly could be improved, but considering the script is not intended to be run many times, this is probably sufficient.)
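The first half of that two-request dance is just pulling the session id out of the initial response and attaching it to the data request. A sketch (in JavaScript rather than the php the scraper uses; the header value and URL are illustrative):

```javascript
// Sketch of the jsessionid handshake: request 1 yields a Set-Cookie header;
// we extract JSESSIONID and append it to the URL of request 2 in the
// ;jsessionid= form that Java servlet containers accept.
function extractJsessionId(setCookieHeader) {
  const m = /JSESSIONID=([^;]+)/i.exec(setCookieHeader || "");
  return m ? m[1] : null;
}

function dataUrlWithSession(baseUrl, jsessionid) {
  return baseUrl + ";jsessionid=" + jsessionid;
}
```

Caching the session id across requests (instead of fetching a fresh one each time) is the obvious way to halve the request count, if the script ever needed to be faster.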

Considering I wrote it in 4-5 hours, in a language I haven’t written serious code with in several years, and considering I only needed it for a 1 time scraping, I think it came out ok. I don’t expect to need this script again. The code is available here:

Review:

I finally got a chance to finish reading “High Performance JavaScript”. This has been on my to-do list for almost a year. I bought the book after seeing an online video of Nicholas Zakas speaking about JavaScript performance, and when I found out that he would be speaking at the Boston Web Performance Group, I finally decided to stop putting it off and read it. I’m glad I did, and I’m glad I got to see his presentation in person. This book is chock full of tips on writing performant JavaScript. Admittedly, browser vendors are improving JavaScript engines at a rapid pace, making some techniques less important than they were just a couple of years ago, but the insights here are still indispensable. Understanding how the JavaScript engines work under the covers should be mandatory for anyone who writes front-end JavaScript for a living, and this is one of the best resources I know to learn this.

If writing JavaScript is a significant part of your job, I highly recommend that you read this book. A lot of the focus is on front-end JavaScript, but much of the information applies to server-side JavaScript too (Node.js). I couldn’t give it 5 stars, just because the rapid evolution of JavaScript engines and browser technologies makes it a little murky which of the information and techniques discussed are the most important…today. I also didn’t particularly like the abrupt changes of writing style by the guest authors, but that’s a minor annoyance, and shouldn’t stop anyone from jumping in.

Overview:

Preface – A brief overview of the history of JavaScript and JavaScript engines.

Ch. 1 – Loading and Execution – This chapter discusses the JavaScript source files themselves and what impact their loading and execution has on the actual and perceived performance of a web page. It gets into the nitty gritty of how to avoid having scripts that block the UI on the page, and how to minimize the number of HTTP requests the page makes.

Ch. 2 – Data Access – This covers where data values are stored within your JavaScript program, i.e. literals, variables, arrays, and objects. There is a cost to accessing your data, and this chapter is worth the price of the whole book, just because of the explanations of identifier resolution and scope chains.
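The chapter’s core advice on identifier resolution can be illustrated in a few lines. This is my own toy example, not code from the book, and `config` just stands in for any out-of-scope object:

```javascript
// Each use of an out-of-scope identifier walks the scope chain (and each
// property access costs a lookup), so cache a repeatedly used value in a
// local variable. Both functions compute the same sum.
const config = { step: 2 };     // stands in for a global/outer-scope object

function sumSlow(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += config.step;       // scope-chain walk + property lookup each pass
  }
  return total;
}

function sumFast(n) {
  const step = config.step;     // resolve once, reuse the local
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += step;
  }
  return total;
}
```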

Ch. 3 – DOM Scripting – This was written by Stoyan Stefanov (http://phpied.com). It covers the interactions between the browsers’ scripting engines and DOM engines. It has very good explanations of html collections and how to work with them in the most performant way. It covers browser repaints and reflows and how to minimize them. And it goes over event delegation and using parent event handlers to handle events on children to minimize the quantity of event handlers on a page.
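The event-delegation idea reduces many per-child handlers to one parent handler that routes by `event.target`. Here is a sketch of the routing logic only; the “event” is a plain object, since there is no DOM in this example:

```javascript
// One delegated handler on a parent, instead of one handler per child.
// The handler inspects event.target and dispatches to the matching action.
function makeDelegatedHandler(actions) {
  return function (event) {
    const action = actions[event.target.id];  // route by the child's id
    return action ? action() : "ignored";     // unknown targets fall through
  };
}

// Usage: one handler covers every button under the parent.
const handler = makeDelegatedHandler({
  save: () => "saved",
  del: () => "deleted",
});
```

In a real page this `handler` would be attached once, e.g. to a container’s click event, rather than to each child element.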

Ch. 4 – Algorithms and Flow Control – This chapter discusses the performance implications of how you use loops and conditionals, and how you structure your algorithms. It includes comparisons of the different loop constructs, of recursion vs. iteration, etc.

Ch. 5 – Strings and Regular Expressions – This was written by Steven Levithan (http://blog.stevenlevithan.com/). This chapter explains the common pitfalls that lead to poorly performing string operations and regexes, and how to avoid them. In doing so, the author describes what the browser is doing under the covers… interesting, but probably needs to be used as a reference. (i.e. not something I’ll just remember after one read.)
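One classic example from this territory is string building in a loop. In older engines repeated `+=` could copy the accumulated string each time, and collecting pieces in an array and joining once was the workaround (modern engines optimize `+=` heavily, so measure before rewriting):

```javascript
// Two ways to build one string from many pieces; both produce the same
// result, but they differ in how much copying older engines did.
function joinNaive(parts) {
  let s = "";
  for (const p of parts) s += p;        // may re-copy s on each pass
  return s;
}

function joinBuffered(parts) {
  const buf = [];
  for (const p of parts) buf.push(p);   // cheap appends
  return buf.join("");                  // single final concatenation
}
```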

Ch. 7 – Ajax – This was written by Ross Harmes (http://techfoolery.com/). – “This chapter examines the fastest techniques for sending data to and receiving it from the server, as well as the most efficient formats for encoding data.” There are some interesting things in this chapter that I rarely see talked about elsewhere, such as Multipart XHR, image beacons, and JSON-P.

Ch. 8 – Programming Practices – Covers some programming practices to avoid, such as avoiding double evaluation, and not repeating browser detection code unnecessarily. Also covers practices to use, such as object and array literals and using native methods when available.
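Double evaluation is worth a tiny example. Code handed over as a string (to `eval`, the `Function` constructor, or string-form `setTimeout`) must be parsed and evaluated a second time at runtime; the function form does the same work without the extra parse. This is my own minimal illustration of the anti-pattern and its fix:

```javascript
// Anti-pattern: the string is re-parsed and re-evaluated on every call.
function incrementViaString(obj) {
  eval("obj.count += 1");         // double evaluation (avoid in real code)
  return obj.count;
}

// Fix: plain code, compiled once with everything else.
function incrementViaFunction(obj) {
  obj.count += 1;
  return obj.count;
}
```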

Ch. 9 – Building and Deploying High-Performance JavaScript Applications – This was written by Julien Lecomte (http://www.julienlecomte.net/blog/category/web-development/). This chapter discusses the important server-side processes and tasks that go into better performing web applications, including combining and minifying resources, gzipping, applying proper cache headers, and using a CDN.

Ch. 10 – Tools – This was written by Matt Sweeney of Yahoo. This chapter lists tools for profiling (not debugging) your application. This breaks down into JavaScript profiling and Network analysis, and includes descriptions of many useful tools such as native browser profilers, Page Speed, Fiddler, Charles Proxy, YSlow, and dynaTrace Ajax Edition.


I’m taking a small break from my technology posts to blog about an event I was privileged (and lucky) to attend in December. It was the FanDuel Fantasy Football Championship which took place in Las Vegas. Anyone that knows me knows that besides my family, my biggest interests are sports, classic movies, and programming, not necessarily in that order. So it should come as no surprise that I play fantasy sports. I’ve been playing in traditional fantasy football leagues for years, but last year I discovered weekly/daily fantasy via FanDuel.com. These are salary cap games, where you ‘buy’ players for your team and need to stay under a salary cap while filling out an entire roster. Just like other fantasy games you accrue points when your player scores or accumulates other stats that are significant in the fantasy game. The unique aspect of these games is that they are daily (or weekly depending on the sport), so almost any day of the year you can compete in a fantasy sports game. And because of the current state of gambling laws, fantasy sports is not considered gambling, so you can enter pay games and win real money.

That being said, I don’t play every day, but I follow football and hockey closely enough that I always think I can enter a roster and win. Guess what? I did, and more than once. Week 1 of the NFL season, I came in first place in one of the FanDuel football tournaments. That earned me enough funds to enter other tournaments for the rest of the NFL season, and I ended up coming in first place in the Week 11 Qualifier of the FanDuel Fantasy Football Championship (FFFC). The prize for that was a trip to Las Vegas and an entry into a championship fantasy tournament with 11 other finalists. How cool is that?! (Here’s my interview with FanDuel.)

Well, I have got to say, that trip to Las Vegas and the entire FFFC Championship was one of the coolest experiences I have ever had. I got to bring my husband, and we left a day early, just to have a little more time to enjoy ourselves. The representatives from FanDuel were incredible hosts, and the other qualifiers and their guests were super nice.

The championship contest itself was nail-biting. I was in first place through a lot of the early afternoon games, but could tell I would probably slip in the late games. (All of my players, except one, were playing in the early games.) Lucky for me, I didn’t slip that far. I finished the day in 3rd place!! Not bad for the only girl in the final. My NFL thanks go to Rob Gronkowski, who had a big fantasy day, and to the Green Bay coaching staff that pulled Aaron Rodgers in the 2nd half of the Packers game. (I didn’t have him and others did, so I would have slipped further.) And my jeers go to Michael Turner, who had one of the best potential matchups of his season but had an awful game (stats-wise). I should have gone with MJD.

Enough football… This was such a fun event that I aspire to reach the finals again next year. And now I’m even more hooked on fantasy sports. I’ve noticed that more and more of these daily fantasy sites are popping up, and it seems to be a growing industry. If you’re into sports or fantasy sports, you might like to give this stuff a try.

Note that the default username for mysql in WampServer is root, and the default password is empty. Choose these and use the defaults for the host, database name, and tables prefix. Then click next:

Click continue:

If you didn’t install php extensions, like I didn’t, then you get this:

These extensions are easy to install from the WampServer menu:

Go to PHP > PHP extensions. Check the name of each missing extension, one at a time. (It appears that WampServer restarts after each install.) When you’re done, go back to your browser and click Reload at the very bottom of the page.

Hopefully, you see this at the bottom of your page:

Click Continue. The system will chug away for a while. Be patient. If all goes well, you’ll get another screen looking something like this:

Click Continue. On the next page you simply fill in information about the admin account.

After that you fill in other settings for your site. And TADA, you are done:

Considering that I have never used WampServer or Moodle before, the fact that I could install this in about an hour is pretty impressive to me.

Review:

“Using the HTML5 Filesystem API” is a good read. It’s small, but concise, and the perfect size (for me). The book covers the evolving specification that is the HTML5 FileSystem API. It isn’t implemented in every browser but does have an implementation in Google Chrome. (The author works for Google.) The HTML5 FileSystem allows browser-based apps to read & write files and directories in a sand-boxed way on users’ computers. It’s sand-boxed in that my app can’t access the file system of your app. The book covers the usual file system “stuff”: how to create a file, how to write to a file, how to read from a file, etc. Most importantly, there are appropriate code samples, because anyone that’s looking into using this API will want to see/copy code.

Overall, for an API book, I think this is the perfect amount of detail and examples. But because there is also great, free information at HTML5Rocks.com, and because the specification is still evolving, I hesitate to say you should go out and buy this. If you need to learn the API soon, and you like learning from books, then by all means, this book is worth reading.

Overview:

Ch. 1 – Introduction – Introduces the HTML5 FileSystem API, an evolving specification. The API is a way for web apps/sites to write to the user’s file system in a sand-boxed way.

Ch. 3 – Getting Started – Shows you how to start programming the FileSystem API. You use window.requestFileSystem(type, size, successCallback, opt_errorCallback) to get a local, sand-boxed file system for your app; and in some cases you need to use window.webkitStorageInfo.requestQuota(TYPE, SIZE, callback) to get permission to request the storage.

Ch. 4 – Working with Files – Create, read, write, remove, and import are all covered; as are ways to get files into your browser, via drag-drop, file input dialog, and XMLHttpRequest. Chrome, Safari 5, and Firefox 4 support dragging files from the desktop and dropping them on the browser.

Ch. 5 – Working with Directories – Like the chapter on files, this shows how to create, list, and remove directories.

Ch. 6 – Copying, Renaming, and Moving Entries – Files and directories both inherit from the same superclass, Entry, so this chapter is about both files and directories. Folders are always copied recursively.

Ch. 7 – Using Files – Discusses how you can actually use files in your HTML5 apps. It goes over using them as:

Filesystem URLs – filesystem:<ORIGIN>/<STORAGE_TYPE>/<FILENAME>

Blob URLs – blob:<ORIGIN>/<UNIQUE_RANDOM_STR>

Data URLs – data:<mimetype>[;base64],<data>
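The data URL form above is easy to construct by hand. In a browser you would typically use `btoa` or a FileReader; the sketch below uses Node’s `Buffer` instead, and the mimetype and payload are just examples:

```javascript
// Build a base64 data: URL of the shape data:<mimetype>;base64,<data>.
function toDataUrl(mimetype, text) {
  const base64 = Buffer.from(text, "utf8").toString("base64");
  return "data:" + mimetype + ";base64," + base64;
}
```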

Ch. 8 – The Synchronous API – Some HTML5 FileSystem APIs have synchronous counterparts that can be used in Web Workers. This chapter goes over that and discusses the differences with the asynchronous API. There’s a useful example on how to download files in a web worker using XHR2.