Ask Ben: Using POI Utility To Move Excel Data Into A Database

I am soaking up your work with Excel spreadsheets. I am stuck on this question. I have followed your instructions for creating the multi sheet spreadsheet on meals and I can read the spreadsheet using your POIutility and I can dump out the array. I want to be able to use the data from the spreadsheet and write it into a database. You wrote "all values from the Excel are stored in the resultant ColdFusion queries as CF_SQL_VARCHAR values" - how do I access these?

Getting the data from an Excel using my POI Utility ColdFusion component is just a matter of understanding how the data gets read in from the Excel and how to access it once it is read in. The POI Utility reads each sheet of an Excel workbook into a "Sheet" object. All of these sheet objects are returned in a single array that looks like this (using my previous Meals example):

As you can see each index of the above array contains a single sheet object that looks like this:

Within that sheet object, there is the Query object (struct key: Query). This is the query object that contains the actual grid data from the Excel sheet. Now, taking that query object and using it to insert into a database could be done as follows:

<!--- Create a new instance of the POI utility. --->

<cfset objPOIUtility = CreateObject(

"component",

"POIUtility"

).Init()

/>

<!---

Get the path to our Excel document. Our Excel document

workbook contains three sheets with information regarding

three meals (Breakfast, lunch, and dinner).

--->

<cfset strFilePath = ExpandPath( "./meals.xls" ) />

<!---

Read the Excel document into an array of Sheet objects.

Each sheet object will contain the data in the Excel

sheet as well as some other property-type information.

We are telling the POI Utility to expect the first row

of the Excel document to function as a header row.

--->

<cfset arrSheets = objPOIUtility.ReadExcel(

FilePath = strFilePath,

HasHeaderRow = true

) />

<!---

We want to get the data from the Excel document into our

local database. The first thing we are going to want to

do is loop over each sheet and attach each set of

data individually.

--->

<cfloop

index="intSheet"

from="1"

to="#ArrayLen( arrSheets )#"

step="1">

<!---

Let's get a pointer to the current sheet object. We

could continue to refer to the sheet as an index of

the sheets array, but this is more convenient and

breaks it up into easier to read code.

--->

<cfset objSheet = arrSheets[ intSheet ] />

<!---

The data from the excel object is stored in a query

within this "sheet" object and can be accessed at the

key "query." For ease of use and short hand, let's

get a pointer to that query.

--->

<cfset qSheetData = objSheet.Query />

<!---

We can treat this query just like any old ColdFusion

query because it is just a plain old ColdFusion query.

Let's loop over it to get at each row.

NOTE: Since we told the POI Utility to use the first

row as a header row, the first row has already been

stripped off and returned as part of the sheet object.

We will not encounter it in THIS query.

--->

<cfloop query="qSheetData">

<!---

For ease of demonstration, let's get the values

out of the query that we want to use for our

database insert. This is step that is not required,

but helps clarify how things are working.

NOTE: Remember that the POI Utility auto names the

columns as it encounters them as COLUMN1, COLUMN2,

COLUMN3, .... etc.

We are throwing the second column (quantity) into a

Val() method call as we need to get it as a number.

The POI Utility reads in everything as a string.

--->

<cfset strFood = qSheetData.column1 />

<cfset flQuantity = Val( qSheetData.column2 ) />

<cfset strTastiness = qSheetData.column3 />

<!---

Now that we have the values we are going after, we

can insert them into our database.

--->

<cfquery name="qInsert" datasource="#App.DSN.Source#">

INSERT INTO food_diary

(

meal,

food,

quantity,

tastiness,

date_created

) VALUES (

<cfqueryparam value="#objSheet.Name#" cfsqltype="CF_SQL_VARCHAR" />,

<cfqueryparam value="#strFood#" cfsqltype="CF_SQL_VARCHAR" />,

<cfqueryparam value="#flQuantity#" cfsqltype="CF_SQL_FLOAT" />,

<cfqueryparam value="#strTastiness#" cfsqltype="CF_SQL_VARCHAR" />,

<cfqueryparam value="#Now()#" cfsqltype="CF_SQL_TIMESTAMP" />

);

</cfquery>

</cfloop>

</cfloop>

At this point, each record of each query of each sheet has been inserted into the local database. I hope that that helps. Please let me know if you have any further questions.

Reader Comments

I have to write something very similar to that today. It has to read a bunch of excel files and import just certain columns/rows into a database. Normally, I would do that in C# but since I've seen this post I'm now wondering if doing it in CF would be faster. Basically, I have to read a directory with about 3500 excel files and import just some data from each one. Any thoughts/suggestions?

I don't have much experience with scalability and speed of the POI system. However, if you know the exactly cell columns/rows that you need to extract, the POI has a great API for reaching right into the sheets and reading that data.

However, if you have a C# solution that is working and is fast, I guess go with that. But of course, POI has some fun learning :)

quick suggestion on your posts that contain code: maybe you can have a a link to generate a file with the code inside the code viewer so people can directly download the code you have posted. If you end up writing something of that sort, send it over (or post it), it would be useful to have for many people.

At the top of every code snippet there is a link to open the code up in a new window. The new window uses code-coloring (as best it can) and at the bottom there is a text area where you can copy-n-paste the code.

I of course do not advertise that, so unless you have seen it before, there is no way you would know that there is a cut-n-paste box at the bottom :) My bad.

I have seen what you are talking about and I have used it. My suggestion was to put another link next to the "Launch code in a new windows >>" link. When clicking on the new link, instead of opening a new window, pass the code snippet to a cf code that sticks it into a new file and prompts you to download the snippet. You know what I'm saying? Or is it just a stupid idea?

He, he...you are going a bit over board with the "brilliance" and "badass" stuff :-) Funny enough, I named my PC "badass" since I just upgraded it to Core 2 Duo E6600 with 2GB ram and Radion X1950XT (and the guys at work were making fun of me). Anyway, the idea came to me as part of the CodeSnippets web application I want to develop, so you wouldn't mind sharing what you write, that would be awesome!

Take a look at the comments that Josh G made. He was getting the same problem. I cannot remember what the solution was, but I think he realized that he was using an outdated version of the POI package or something.

I wish I could find his email back to me. Perhaps if you send me the data file, I can give it a go on my end and see what I can do.

This is great, I am able to run queries now, however, for some reason it is reading from all tabs when I try to output some data, basically I am not able to specify which sheetindex is should be reading from. I may not want all sheets read together.

The reason I want to do this is because my data columns may be in varying order and am trying to query 'o query the data to put things in back with the correct column names to efficiently insert the data into the db (one query, as opposed to a loop line by line). Here is my complete code . . .

I have downloaded the POI Utility and I am using the ReadExcel utility for importing data from an excel sheet. There is a column in the excel sheet which contains dates. When I import the rows the date values are getting converted in to a number. Is there any code change required from my end?

Reading dates is a funny thing. Really, dates are stored in the Excel file as a number (they are simply formatted to look like dates). ColdFusion should be able to handle these properly, you just have to format them:

#DateFormat( qExcel.date_column )#

I am working on trying improve the reading in of Excel files. I have been concentrating lately on the creating of Excel files, so I have been distracted.

If anyone can figure out how to pass the Euro currency symbol to Excel, please let me know. Yen and Pound go through perfectly, while the Euro gets turned into a funky character (¬ when I save the Excel as MHT).

I'm outputting my page with the poi tags to the screen by changing all <poi's to -poi and putting an abort at the end. The Euro is correctly getting passed to the utility.

To all the folks who like me wished the header row column names were used instead of column1, column2, etc. you can do something like the following at the beginning of looping through rows, to put the values you'll use into a struct.

I have some queries for storing excel data in database using apache POI,

1.how to get the cell contents in list2.how to store in batch wise3.what is I am storing the data and DB goes down4.some sample code to throw the exception saying record numbers to the user5.what if two users try to upload the same file at the same time in database