Find Us

F# Excel Type Provider

There comes a time in the life of every developer where he has to manipulate getting data from an Excel file. Either because migrating the information to another system is needed, or just for a preprocessing to conform with a defined format, for example, a Power BI data source.

Suppose we have to migrate data from an Excel file, with an identified format, to a database of a system we are building. The main problem is to browse the file and get the relevant data for our new system, which can also be calculated. Let’s also assume that the classes that are responsible for saving the information in our database are already implemented, once it is obtained.

An implementation of our model in C # would be as follows:

On the other hand, Excel files come from another system, which contains a lot of information; much more than we need.

If we read these files using C#, we should use a library. The most basic would be Microsoft.Office.Interop or you can use a third party one such as Syncfusion. The disadvantage of the first option is that it is extremely difficult to use such libraries because they are not code-managed and if the developer is not extremely careful there might be memory problems. Not to mention the fact that including it as a reference in a project can become extremely awkward. On the other hand, higher-level libraries are much easier to use, but many of them are expensive. Finally, none of them will give the developer the capacity to get Intellisense against the file itself. This can make the developer consume a huge amount of time.

It is here where F# and its Type Providers gain access. F# is a .NET platform language with functional characteristics, although it also has the possibility of being object oriented. In addition, it has a unique feature: a developer can implement a very special type of library called Type Provider. These libraries allow, among other things, to connect to different data sources such as Excel, CSV, JSON, etc. and allow them to be manipulated using Intellisense on the content of these data sources.

In our case, a probable Excel file whose data we would like to migrate can have this form:

The code used by the provider is extremely short, understandable and efficient. What is done is to take a sample Excel at development time and assume that all files to be processed will have that structure.

In line 9, the type of Excel to be processed is defined. This is done from a sample file, which should be one with the same “outline” as those to which the process will be applied. In line 11 is defined a function in charge of making the transformation between an Excel row and an object of our domain, in this case Product. There are several things to note about this method. The first is that we know, for each column exactly what type it is. Which it is very powerful to avoid programming errors. The second is that in F # you can have property names (also variables, and any names in general) composed and separated by spaces. For this, double quotes ““” are used.

The second function seems complex but is actually very simple. On line 23 instantiates an object of type “ExcelType” using an Excel file. Here what will happen is that it will try to use the Excel structure that defines the type, in this case “typeOfExcel.xlsx” but with the data that is inside the file “simple.xlsx”. Now comes what may seem a bit strange, but that is actually very simple and powerful. The F# “|>” operator takes the value returned by the previous expression and passes it as a parameter to the next function. In other words, the value of “file.Data” is first taken and passed to the function of line 26. This is responsible for filtering the rows that are empty (this can happen if an extra Enter was specified in the file next to the last row with a value). Then, the result, i.e. the filtered rows, is passed as a parameter to the function of line 27, which uses the function defined in line 11 to convert each row of Excel in a Product instance. What the last line does is return these products. In F # the return keyword is not necessary since it assumes that a function returns the last expression of its body, which in this case is “value”.

To use the power of this language, it is not even necessary to create a project. Visual Studio has a window that allows you to test and execute F # code interactively, so you only need access to the ExcelProvider library to use this with Excels on a daily basis.

The code that shows the example described is found in the following Github Repository.