Matlab users are used to xlsread and xlswrite, functions that can only read data from, or write data to, one sheet in a spreadsheet file at a time. For each operation, xlsread and xlswrite first have to read the entire spreadsheet file, for write operations xlswrite also has to finally write it out completely to disk.
There are faster ways, but then you'll have to dive into ActiveX/COM/VisualBasic programming.

If you want to move multiple pieces of data to/from a spreadsheet file, the io package offers a much more versatile scheme:

First open the spreadsheet file using xlsopen (for Excel or gnumeric files) or odsopen (.ods or .gnumeric).

NOTE: the output of these functions is a file pointer handle that you should treat carefully!

Mixing read and write operations in any order is permitted (the only exception: not with the JXL -JExcelAPI- interface).
The same goes for odsopen-ods2oct-oct2ods-odsclose sequences.

Obviously this is much more flexible (and FASTER) than xlsread and xlswrite. In fact, Octave's io package xlsread is a mere wrapper for an xlsopen-xls2oct-parsecell-xlsclose sequence. Similarly for xlswrite, odsread, and odswrite.

The io package comes with different interfaces to read/write different file formats.

COM

This (interface) is only available on MS Windows and with an MS Office installation.

[POI, POI/OOXML, JXL, OXS, UNO, OTK, JOD]

These are java-based interfaces. They are generally slower than Octave's native OCT interface; OTOH they offer more flexibility. Generally the OCT interface offers sufficient flexibility and speed.

OCT

This is the new impressive and fast (mostly written in Octave itself! + two C files to bypass bottlenecks) interface which presently supports .xlsx, .ods and .gnumeric files.

(Note that .ods is a complicated file format with many gotchas that doesn't lend itself for fast file I/O. So unfortunately the fastest .ods interface is the Java-based jOpenDocument (JOD) (luckily it is GPL). However if speed is not an issue or if you hate Java, the OCT interface still performs fast enough.)

So, if you want to read/write .xlsx files, you'll only need the io-package >=2.2.0.

But if you have to read/write .xls files, you'll need either

MS Windows with MS Office backings - or

Octave built with --enable-java, + a Java JRE or -JDK, and one or more of the Java interfaces (i.e., the class libs)!

If you want to read/write .gnumeric files, the OCT interface is even the only option.

For some rarely used file formats you'll need LibreOffice + Octave built with Java enabled + a Java JRE or -JDK. But OK, once there you can enjoy formats then like Unified Office Format, Data Interchange Format, SYLK, OpenDocument Flat XML, the old OpenOffice.org .sxc format and some others you may have heard of ;-)

Generally, if you use a Java-based interface for spreadsheet I/O, it doesn't matter much whether you use Octave 32-bit or Octave 64-bit.
However, the UNO interface (for LibreOffice & OpenOffice.org) is more pertinent: 64-bit Octave can only use a 64-bit LibreOffice with UNO, same for 32 bit.

(On Windows, the io package tries to automatically find all required Java class libs and LibreOffice. To help it, put the Java class libs in you user profile (home directory) in a subdir "java", e.g., C:\Users\Eddy\java. chk_spreadsheet_support searches that location automagically.
On Linux this automatic searching has been disabled as the io package took ages (well, minutes) to load...)

chk_spreadsheet_support.m — internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.

The following are support files called by the scripts and not meant for direct invocation by users:

Octave with Java package (preferrably >= 1.2.8, although 1.2.6 will do for most functionality)

For Linux:

Octave with Java package (preferrably >= 1.2.8, although 1.2.5 will do for most functionality)

For ODS access, you'll need to choose at least one of the following java class files collections:

(currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:

jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.3 (final) is the most recent one and recommended for Octave).

OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org or www.libreoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.

These must be referenced with full pathnames in your javaclasspath.
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the Java classpath.

odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.

odswrite works similar to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, a.o. oct2ods.

odsfinfo can be used to explore odsfiles with unknown content for sheet names and to get an impression of the data content sizes.
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provides for much more speed and flexibility as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.

Same reasoning goes for odswrite.

Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.

Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.

If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.

(When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.

In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:

options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.

Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....

The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.

Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).

OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.

MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year). (Why mention MS-Excel here? See below:)

Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e.,strings) OR old numerical values from the Excel spreadsheet.

So: you should carefully check what happens to date cells.

As octave has no ”date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.

While adding data and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.

The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.

For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.

The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:

After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.

Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be catched. Cells showing #Value# in OpenOffice.org Calc due to e.g., invalid formulas, may have a 0 (zero) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).

Smaller gotcha's :

while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).

NOT fixed in jOpenDocument version 1.2 & 1.3 final:

jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.

The OCT interface (present as of io-1.2.4) offers read support for ODS 1.2, complete with all the options of ODFtoolkit and UNO, but fairly slow.

The OTK interface (ODFtoolkit) is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up til 0.7.5. did little to hide the gory details for the developers.

While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.

The JOD (jOpenDocument) interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.

However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). JopenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.

The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:

admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;

it has built-in formula validator and evaluator;

it has a much more reliable data parser;

it can read much more spreadsheet formats than just ODS; .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.

it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM.

Some hints for troubleshooting ODS support are given here.
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments (, 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.

Check if all classes (.jarfiles) are in class path. Do a 'jcp = javaclasspath (-all)' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp,”:”)' (w/o quotes). See above under #Required support software what classes should be mentioned.

If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().

Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.

Try opening an ods file:

ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.

ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.

For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:

unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory;

juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;

The subdirectory program/ (where soffice[.exe] (or ooffice) resides).

The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and -clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these these classes manually to the javaclasspath.

As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.

Suggestions for future development:

Reliable and easy ODS write support (maybe when jOpenDocument is more mature)

Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO is MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.

Passing function handle a la Matlab's xlsread

Adding styles (borders, cell lay-out, font, etc.)

Some notes on the choice for Java:

It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.

A BIG advantage is that a Java-based solution is platform-independent (portable).

But Java is known to be not very conservative with resources, especially not when processing XML-based formats.

So Java is a compromise between portability and rapid development time versus capacity (and speed).
But IMO data sets larger than 5.105 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.

I have tried various odfdom version. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((
In addition processing ODS files became significantly slower (up to 7 times!).

End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet. Version 0.8.9 introduced an undocumented dependency on some obscure Java class lib - I think due to a bit sloppy development procedures. Anyway I couldn't get it to work.
So at the moment (Summer 2013 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.

xlsopen.m — Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e.,COM; Java & Apache POI; JexcelAPI; OpenXLS; etc.), but it's choice can be overridden.

xls2oct.m — Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.

oct2xls.m — Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. octxls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.

xlsclose.m — Function for closing (the handle to) an Excel workbook. When data have been written to the workbook oct2xls will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.

parsecell.m — Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.

chk_spreadsheet_support.m — Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.

spsh_chkrange.m, spsh_prstype.m, getusedrange.m, calccelladdress.m, parse_sp_range.m — Support files called by the scripts and not meant for direct invocation by users.

These class libs must be referenced with full pathnames in your javaclasspath.

They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.

UNO specific (invoking OpenOffice.org (or clones) behind the scenes):

NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.

xlsfinfo can be used for finding out what worksheet names exist in the file. For OOXML files you can do with the OCT interface (specify "oct" for the REQINTF parameter). For other Excel file types you need MS-Excel for Windows (or later version) and the windows package (specify "com" for REQINTF), and/or Apache POI and Java support (then the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed).

Invoking xlsopen/..../xlsclose directly provides for much more flexibility, speed, and robustness than xlsread / xlswrite. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.

Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').

When using xlsopen/.../xlsclose be sure to keep track of the file handle struct.

A possible scenario:

xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])
# Set rw to 1 if you want to write to a workbook immediately.
# In that case the check for file existence is skipped and
# -if needed- a new workbook created.
# If you really want an other interface than auto-selected
# by xlsopen you can request that. But xlsopen still checks
# proper support for your choice.
# Read some data
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)
# Be sure to specify xlh as output argument as xls2oct keeps
# track of changes and the need to write the workbook to disk
# in the xlhstruct. And the origin range is conveyed through
# the xlh pointer struct.
# Separate data into numeric and text data
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)
# Get more data from another worksheet in the same workbook
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)
# <... Analysis and preparation of new data in cell array Newdata....>
# Add new data to spreadsheet
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)
# Close the workbook and write it to disk; then clear the handle
xlh = xlsclose (xlh)
clear xlh

When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.

When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no more possible after the first call to oct2xls(). This is a limitation of JExcelAPI.

options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.

Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on ,unless you open them in Excel and write them back to disk.

While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).
Worse, older Excel versions feature less functions than newer versions. So be wary as this may make for interesting confusion.

Octave Forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:

it relies on cached range values and thus may be out-of-date;

it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.

When using the Java interface, reading and writing xls-files by Octave Forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic" option) – and then differently than under Windows.....

When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.

Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)

Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.

If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; Octave Forge's xlswrite doesn't.

xlsfinfo

When invoking Excel/COM interface, Octave Forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).

Using Excel itself (through COM / ActiveX on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.

JExcelAPI (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back wen writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work, worse yet, you may completely loose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in Octave Forge/Java or JExcelAPI ? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JexcelAPI, unlike ApachePOI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.

Apache POI (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is a more versatile than JExcelAPI, while it doesn't support BIFF5 it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard to catch up for Apache POI. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.

OpenXLS (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).

UNO (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.

All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JexcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)

OCT offers read support for OOXML files (.xlsx) only, but it is by far the fastest read option; faster than Excel itself.

Some notes on the choice for Java:

It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.

A BIG advantage is that a Java-based solution is platform-independent ("portable").

But Java is known to be not very conservative with resources, especially not when processing XML-based formats.

So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.

Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.

Check if COM / ActiveXworks (only under Windows OS). Do a pkg list and see:

In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.

Check if all classes (.jarfiles) are in class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")' (w/o quotes). See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.

If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().

Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.

Try opening an xls file:

xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.

xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.

xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. Xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.

Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 636960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?

Since io package version 1.2.4, an interface called "OCT" was added. Except for unzip, it has no dependencies. It's still experimental but fast! Feel free to test it and give us a feedback.
Currently it supports reading and writing .xlsx, .ods and .gnumeric files (the latter in yet-to-be-released io-2.2.2).
If

chk_spreadsheet_support == 0

it's used automatically (default interface). Otherwise you can force the usage like