Document automation in the 21st century

MS Office and Ruby

Jan 21st, 2013

Why do it?

Why use Ruby and not VBA? That’s actually a good question. VBA is the obvious incumbent… built-in to Office and easily accessible without the necessity to install any other tools/programming languages. However, there are a number of reasons why VBA is just so much harder to use once your requirements get a little trickier. Let’s go through some of these:

Regular Expression Support

When was the last time that you tried using regular expressions in VBA? Sure it is possible but it requires that you first include references to the Microsoft VBScript Regular Expressions 5.5 library in your project. Once you do this you can instantiate a new RegExp object and test various text strings against this object looking for a match.

This is what this would look like in code:

123456789

DimrxAsRegExpDimtest_stringAsStringSetrx=NewRegExprx.Pattern="<<Pattern goes here>>"test_string="<<String that you want to filter on based on matches>>"Ifrx.Test(test_string)Thendebug.Printtest_stringEndIf

The equivalent Ruby code would be something like this:

1

putstest_stringif/Pattern/.matchtest_string

So clearly Ruby wins out in terms of simplicity, its Perl heritage shining through as a killer tool in doing text manipulation.

No String Interpolation

There just isn’t any. Period. Multiline text strings are a pain also. You end up building strings by doing silly things such as:

12345

DimsqlAsStringDimlist_of_idsAsVariantlist_of_ids=Array(2,12,44,51)sql="SELECT FROM "&table_name&" WHERE "&vbCrLf&_"ID IN ("&IDs_as_comma_separated_string(list_of_ids)&")"

Did you notice that we would also need a separate function to return the IDs in the Array as a comma separated string? But that’s a matter for another discussion! Did you also notice the underscore + space above ? It is a line continuation trick in VBA. Nifty, but not so nifty when you consider that you could have done this in ruby:

No separate functions needed to join the arrays with a comma+space, nice multi-line strings with the HEREDOC style string declaration and complex string interpolation to boot.

Ruby wins again!!!

Hard to Use Data Types

I don’t know about you but arrays are hard to use in VBA. Once they are initialised dynamically and then set to a specific size and you need to change the size of the array doing things like Redim Preserve or Redim (if you don’t care about losing the data stored in the array up to that point).

12345678910111213

Dimarr()AsString' <--- initialiseDimiAsLongReDimarr(20)AsString'<--- ReDim to a certain size' How many will I need? ConstSomeLargeValueLargerThanArrayBounds=2000Fori=1ToSomeLargeValueLargerThanArrayBoundsIfi>UBound(arr)Then'<---ReDimPreservearr(i)AsString'<--- Three LOC to manage this EndIf'<---arr(i)="String "&iNexti

Whilst in Ruby, the following would apply:

1234

arr=[]some_iterable.each_with_indexdo|iter,i|arr<<"String #{i}"end

Have you tried to sort an array recently within VBA? This is even messier! You need to bring your own BubbleSort function (or your favourite sorting algorithm) and don’t even try sorting on more than one criteria, unless you are a sucker for punishment!

There are Collections, but they are fairly complicated to use compared to what you get out of them. What about hashes or dictionaries? Again, it is possible to use these by including the Microsoft Scripting Runtime library in your project. However it still feels clumsy and overly verbose compared to Ruby.

then later, assuming that we just want to return a dictionary containing the matches as keys with the length of the strings as values

1234

IfNothash.Exists(thing)Thenhash.AddKey:=thing,_Item:=Len(thing)EndIf

Notice also that you can’t have hash.Add(Key:=thing, Item:=Len(thing) with the parenthesis!!! VBA is very confusing when it comes to the use of parenthesis/brackets around the argument list. For example, when calling a function which does not return a value, we do not use brackets around the argument list. However, if we were returning a value that is being assigned to another variable then we must use brackets around the argument list. When calling a subroutine we don’t use parenthesis, unless we are calling the subroutine by using an explicit Call. The .Add instance method of a Dictionary isn’t returning a value so it must not use brackets around the argument list. Confused yet?!?

Thankfully, looping through the Dictionary to return the keys and values is pretty straight forward:

hash={}# Loop through a whole bunch of code and update thinghash[thing]=thing.sizeunlesshash.include?(thing)# keep looping until donehash.each{|my_key,my_val|puts"#{my_key}\t#{my_val}"}

Dictionaries (called hashes by Rubyists) are great in Ruby!!! In Ruby 1.9 they even maintain the order of the elements in the hash (which will come in handy when we automate the generation of a Gant Chart from a YAML file in an up and coming post).

Arrays are also a pleasure to use in Ruby! You can dynamically add and remove elements from the array without having to write supporting boilerplate code (as you do in VBA with Redim Preserve). You can also mix data types! Then there are sets which allow for the quick implementation of algorithms returning unique values from an array. It is straight forward to sort elements in the array and it is not hard to do so for multiple array elements. It is possible to reverse, join and do all sorts of useful things to arrays in Ruby, either in-place or to a copy of the original array. This is an area where Ruby clearly wins hands down! Ruby code is much more succint and expressive than VBA and working with data structures (even relatively complex ones) is made easy!

Difficult Code Re-use

In VBA code within MS Office, code re-use is facilitated by the use of Templates, Add-Ins, etc. However, quite often, when one isn’t specifically working on building templates and Add-Ins, one-off macros tend to reside in the same spreadsheet where you needed to implement them in order to meet that important deadline. Trying to figure out where these were stored some time later in the future is a struggle, unless you are very methodical and suffer a little from obsessive compulsive disorder (OCD)!

In Ruby, it is easy enough to generate modules or classes in order to re-use code. Using code generated by the Ruby community is easy with the use of gems. Installing gems is dead easy wiht the “gem install” command. However, it would be a good idea to ensure that the scripts are kept in the same folder as the native Office document that is associated with them, unless the script was a throw-away script that isn’t going to be required some time later. Even then, there is an option to search through the scripts as these are plain text files. Naturally, it is a bit harder to do the same against, say, an Excel binary or even against a hierarchy of XML files in the newer versions of Office.

Typical Example

Let’s apply this to the following example. Suppose that I have a workbook with two worksheets, Fruits and Results; one containing a list of Fruits and another containing the results of running a regular expression filter on those fruits that match a specific criterion. Suppose that we are specifically looking for fruits that start with B and end in y.

This is a contrived example as one wouldn’t even need to use Excel to implement a solution to this, but please just go along with me for the time being…

Based on what we covered above, the very first step in writing the macro solution to this problem would be to include the specific references to the two libraries required:

OptionExplicit'------------------------------------PublicSubConTheFruiterer()DimxlsAsExcel.WorksheetDimEOFAsLongDimhashAsScripting.DictionaryDimrxAsRegExpDimtest_stringAsStringDimiAsLongSethash=NewScripting.Dictionaryhash.CompareMode=TextCompareSetrx=NewRegExprx.Pattern="^B.*y$"Setxls=ThisWorkbook.Worksheets("Fruits")EOF=xls.Range("B1048576").End(xlUp).RowFori=1ToEOFtest_string=xls.Range("B"&i).ValueIfrx.Test(test_string)ThenIfNothash.Exists(test_string)Thenhash.AddKey:=test_string,_Item:=Len(test_string)EndIfEndIfNextiDimmy_valAsVariantDimrow_idAsLongrow_id=0Setxls=ThisWorkbook.Worksheets("Results")' Clear the results of a previous runxls.Range("A1").CurrentRegion.ClearForEachmy_valInhash.Keysrow_id=row_id+1xls.Range("A"&row_id).Value=my_valxls.Range("B"&row_id).Value=hash(my_val)Nextmy_valEndSub

Everything above is pretty standard apart from the method that we employed to find the last used row in the ‘Fruits’ worksheet:

1

EOF=xls.Range("B1048576").End(xlUp).Row

This is the canonical MS way of finding the last used row and it is similar to the way that we do it directly from Excel. Think of it as navigating to the bottom of Column B and with the cursor in that position, doing a control up-arrow to get to the last used row. This is exactly what we will be doing in our Ruby solution as well.

When run, we end up with the following:

The Ruby Way

For all its expressiveness and built-in text wrangling tools, writing scripts in Ruby to manipulate Office documents does have some disadvantages. Let’s quickly go through these:

No VBE to develop the code in

Actually, I use Sublime Text 2 and prefer to use it any day compared to the Code Editor IDE built-in to Office applications (i.e. the VBE). However, you will be without code completion and will have to keep two or more separate files for the solution, which sort of brings me to the next point…

Not Integrated

You won’t be able to make use of the Form widgets available in Office to build custom applications. Hence, using Ruby lends itself to applications where some high powered analysis of the data in the worksheet is required or where you will run a script to auto-generate a document, powerpoint or project schedule from a number of source files. These are the sort of applications that we will cover at length in this forum.

Impedance Mismatch

You will need to think in Ruby but, at the same time, be constrained by the OLE ecosystem. Your code won’t look like pure Ruby code with pure snake_case variable declarations and method definitions. The OLE methods will be CamelCased. Rather than starting coding your solution straight away, there will be some extra steps associated with calling the win32ole library, instantiating the appropriate Office Application, etc. Fortunately, Ruby syntax is not so dissonant from VBA syntax as to make the whole premise unviable. For instance, instance and class methods use the dot notation and named parameters in VBA can be easily emulated with Ruby 1.9’s new hash notation.

Let’s look at this VBA snippet to add a new task to a project schedule in MS Project:

VBA Constants are harder to use

Actually, they are but only a little if you use this technique…

Start off by creating a class to hold the constants

Load the constants in the class

Use the constant

Note that you will need to reference the constant with reference to the class. The constant will need to be capitalised in order to work in with Ruby (where all constants are capitalised). Hence xlUp becomes ExcelConst::XlUp.

123456

# N.B.: app previously set to the Excel ApplicationclassExcelConstendWIN32OLE.const_load(app,ExcelConst)eof=sht.Range("C65536").End(ExcelConst::XlUp).row

Ok, so let’s translate the above VBA macro into Ruby. We’ll start with the standard boilerplate which is going to be part of each Ruby script accessing MS Excel via OLE.

Line 7: the use of a block to iterate through the rows in the (1 .. eof).each bit

Line 9: the baked-in use of regular expressions in /pattern/.match test_string

Line 10: the use of the keyword unless in place of if not

Line 10: the fact that unless can be used as a statement modifier

Lines 16 to 20: the ease of iteration over the contents of the hash

It should be said that with this very trivial example, the line count of both solutions is roughly the same. This is due, however, to the fact that we needed to deal with the WIN32OLE boilerplate code in the ruby code. The actual code associated with the business logic is much more succint in the Ruby version. With more complex reports, you will probably find that the Ruby version will be significantly shorter than the VBA version.

In summary, VBA provides some very good tools for customising and extending Ms Office applications. However, by chosing to implement certain solutions in a standalone text editor using scripting languages such as Ruby, one tends to be much more succint and expressive in problem domains such as text manipulation, albeit, at the loss of facilities such as code completion, syntax highlighting, etc. By using a good text editor such as Sublime Text 2 (or EmEditor, Notepad++, etc.) and the techniques demonstrated above, it is possible to minimise the impact of losing the native IDE, whilst leveraging off the strengths or Ruby such as a wide collection of re-usable code packaged as gems and baked in features such as regular expression, hashes, sets, etc.

That’s all for now. I hope that you have enjoyed my very first blog and please keep coming back for some more updates…