A data frame is like a table with rows and columns where columns don’t need need to be of same type.
A data series is like a vector (an array of values) where each value is itself a key value pair.
A very famous data series is a time series which is a vector of value with a key representing an instant in time and a value associated with it (it isn’t limited to this single property).

Although Deedle website contains very good tutorials, like the Deedle in 10 minutes tutorial, I still found it a bit hard to grasp.

So why would you need Deedle?

One of the reason why I think Deedle is interesting is that it makes the manipulation of data frame much more pleasant.
With Deedle,it is possible to goup by rows and more importantly on multiple levels.
It makes it possible to target a certain level of the frame (we will see more later).
It allows to change values from a row or values from a column easily.
It is also possible to add or drop columns which is very helpful to add label or category column and dump useless columns.
And lastely it is possible to take out the data and plot it into a graph.

I started to look at Deedle by making very basic operations.
Even after reading all the information on Deedle website, it took me few hours to put together the operations so I wanted to share that with you so that hopefully you would not take as long as I did.
In this post I played with my bank statements data.
I will explain some common operations which will help to understand how Deedle works:

Extract data from CSV and load into frame

Label data in a new column

Group by

Execute operation on grouped data frames

Print the data frames

When you will be done with this, you will be able to have a good understanding on how to use the library in your advantage to make more complex operations.
Let’s start by loading data from CSV.

I start first by getting all the csv files in /data folder.
Then I load each file in a Frame using Frame.ReadCsv.
Because my csv has no header, when the frame gets loaded it has generic column names Column 1, Column 2 and Column 3.
Therefore I use Frame.indexColsWith to specify my own column keys.
Then I collect all values and parse it to the correct type and then concatenate all values together in a single dataframe.

2. Label data in a new column

Adding a new column is very helpful. In my bank statement I have data like that:

All I really care is that it comes from Amazon. I don’t really care that it has a reference of X or Y.
So what I will do is to add a fourth column which will contain a label which represents the store.
For the two records it will be Amazon.
At the same time I want to have a fifth column which will represent the category.
Amazon will be under Online which stands for online purchases.

So how will we do that?
We will use a regex pattern to match all titles and derive a label and category out of it.

As an example, I have added two categories and two labels. When we pass a title with a default Other category, it will try to match any of the regex patterns and return the appropriate pair.
If nothing match, it will just return the value which was passed without changing anything.

So if we pass INT'L YYYYYYYYYY Amazon UK Retail AMAZON.CO.UK we will get (AMAZON, Online).
If we pass ALDI GR KENT we will get (ALDI, Supermarket).

Then using that pair we can create the two new columns for the data frame.

We use AddColumn to append a column to the current frame.
Frame.getCol takes a column key and returns the whole column as a data series.
We use the whole title column to build a label and category column by mapping over all values using Series.mapValues.

3. Group by

The most common operation when dealing with data frame is to group data.
For expenses it is always useful to get the frame grouped by month so that we can either display it nicely or calculate the sum of all the expenses for each month.

By using Frame.groupRowsUsing, we can group by the result of a function.
Here we get the date and extract the month of each date to group the frame on it.
Since the data frame was previously indexed by a default id, we end up with a data frame with two identifiers: the month and the default id.

4. Execute operation on grouped data frames

Usually, we group data to execute operation on the groups.
Calculate a sum, an average or just the total count of elements in each group.
With Deedle we can execute operations on grouped data frames and even better, we can target a particular level to apply an operation.
Taking the previous example, we will get the sum of all category per month.

After grouping the values, the row key becomes int * (string * int) which is month * (category * default_id).
We then map over all row keys and flatten it to int * string * int with Pair.flatten3.
Because a data frame is composed by columns of different types, if we want to apply an operation like a sum, we need to get only the columns which are numerics.
We do that using Frame.getNumericCols. We then apply the sum on a particular level by giving Stats.levelSum Pair.get1And2Of3. It means that we take the first key the month and the second key category. Stats.LevelSum applies the sum to the values of the same category.

Date and Category being non numeric, their values is missing. For the amount, we get the sum for each combination month * category.

5. Print the data frames

When grouping, we get a data frame with a second identifier.
An interesting function to work with grouped data frame is Frame.nest.
Taking a data frame indexed multiple times, Frame.nest will return a data series with the first identifier as key and the nested data frame as values.
We can use that to printfn a nicely formatted data frame.

Nice isn’t it?
I have created other functions that I find useful and it is accessible from my github here.

Conclusion

Today we saw how to get started with Deedle by doing some very simple data frame manipulation.
Extract the data from csv, add column to label the data, group the data, execute operation on the data frame and finally print the result nicely.
I hope you enjoyed reading this post as much I enjoyed writing it and as always, if you have any comments
feel free to leave those here or hit me on Twitter @Kimserey_Lam. See you next time!

Thanks for reading my post Rei! If you are asking yourself about why should you use Deedle, it is probably because you actually do not need it.In my case, I needed a way to fill missing values, transpose data, work with nested groups, labelling and perform statistics like moving sum, average, standard deviations.It is surely possible to do all this with maps but to me the trade off between doing it myself and learning Deedle was a easy win to go for Deedle.

Post a Comment

Popular posts from this blog

Absolute layout and relative layout Xamarin formsIn Xamarin Forms, layouts are used to position elements in the page.
There are multiple layouts available like stack layout, grid layout, table layout, absolute layout and relative layout.
Stack layout is straight to grasp the concept, we stack elements one by one horizontally or vertically.
Grid layout is also straight forward, we define rows and columns and we place the elements by indicating the row index and column index.
On the other hand, absolute layout and relative layout isn’t that straight forward because in Xamarin it is NOT the same as positions in CSS.So today, we will see how and when we can use absolute layout and/or relative layout in our advantage.
This post is composed by two parts: 1. Absolute layout
2. Relative layout
1. Absolute layoutThe first thing to understand is that absolute layout has nothing to do with CSS absolute position.
In Xamarin Forms. absolute layout allows us to position elements in the page by pre…

Build an accordion view in Xamarin.FormsFew weeks ago I posted about absolute and relative layouts.
Layouts are called Layouts because they contain children which are placed in a particular way.
Xamarin.Forms has a lot of layouts and views to structure pages like grid, table view or list view.Today I would like to show you how we can use some of these basic views to build an Accordion view.Here’s a preview of the Accordion view:Full source code available on GitHub - https://github.com/Kimserey/AccordionView/blob/master/Accordion/AccordionView.csThis post will be composed of four steps:Create a BindablePropertyDefine the accordion expandable sectionDefine the accordion viewUsage sample1. Create a BindablePropertyAs we saw in one of my previous post,
Xamarin.Forms works around data bindings.
View properties are bound to viewmodel properties.Default views like Label, Button, ListView or TableView come with the necessary bindable properties like BackgroundColor, TextColor, ItemsSource, It…

Use Font Awesome from your Xamarin.Forms projectIcons are important in mobile applications where space is limited.
We use icon everywhere to convey action intent like + would be to add an item or a bin would be to delete one.
There are two ways to add icons to our mobile app:with imageswith fontsToday we will see the second option - how we can add Font awesome to our Xamarin.Android project and how we can use it from Xamarin.Forms.
This post will be composed by three parts: 1. Why Font Awesome
2. Add Font Awesome to the Droid project
3. Use with Xamarin.Forms
1. Why Font AwesomeIf you are familiar with Web development, you must have encountered Font awesome http://fontawesome.io/.
It is an icon font containing lots of icon for every type of usage.
The advantaged of using a font is that we can leverage the text options, like text color or text size, to easily update the style of the icon to match our app.Let’s see how we can bring Font Awesome to the Droid project.2. Add Font awesome…