Just use an ordinary Table input step. Everybody who has used PDI before knows that the Table input step retrieves data based on an SQL statement. However, the SQL statement is not required to be a SELECT statement: any statement that returns a result set will do, including SHOW and CALL statements.
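For example, you could type something like the following directly into the SQL field of the "Table input" step (the procedure name and parameters here are made up for illustration; the procedure is assumed to end with a SELECT that produces the result set):

```sql
-- Hypothetical procedure call typed into the "Table input" step.
-- Any statement that returns a result set works here, not just SELECT.
CALL get_monthly_sales(2008, 'EMEA');
```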

In the case of your reporting application, I can only guess... What product did you experience this with? I think this functionality was recently added to Pentaho Reporting (JFreeReport).

I do know that, at least in the MySQL case, the application must use the prepared statement protocol when executing the procedure, or else you won't be able to capture the result set.

Note that in the case of Pentaho Data Integration, it is in fact supported to call the stored procedure directly by explicitly typing the CALL statement in the "Table input" step. The only odd thing here is the name of the step, "Table input", because it does not literally read rows from a table; rather, it executes an SQL statement in general.

This leaves the questions: "Why doesn't the existing Call DB Procedure step support this?" and "Why isn't there a step especially for getting result sets out of stored procedures (or table functions, for that matter)?"

I can only guess, but I think the answer is that in the case of the "Call DB Procedure" step, the design consideration was simply to allow a PDI input stream to drive stored procedure execution and to obtain the return value, and possibly OUT parameter values - no more, no less.

I think there is no technical reason not to support it - the JDBC API allows result sets to be returned from prepared statements, so if the Call DB Procedure step used that, it should be able to capture a result set (or even multiple, although that would open a whole can of worms of its own).

That said, for a data integration tool like PDI, I do not think it is terribly important to be able to capture SP result sets; it's more of a nice-to-have thing. But maybe I'm too short-sighted here. Please feel free to continue this discussion, and maybe some of the PDI developers will chime in to provide their point of view.

Thanks for your elaborate response. I'm ashamed to say I don't remember the name of the application. I encountered it at a customer's site.

I had a good reason to use a stored procedure instead of a basic query. There was a problematic, long-running LEFT JOIN query, which we managed to speed up significantly by rewriting it as a UNION of two INNER JOINs, using two temporary tables that were truncated on each run. This process required a stored procedure. All worked well until we realized the reporting tool did not support SPs. Eventually we used another solution, which was not as fast as the SP one.
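The commenter doesn't show the actual queries, but a minimal sketch of this kind of rewrite, with made-up table and column names, might look like this (the original outer join is replaced by an inner-join branch plus a no-match branch, staged through truncated tables and combined with UNION ALL):

```sql
-- Hypothetical sketch only; table and column names are invented.
-- Slow original:
--   SELECT c.id, o.total
--   FROM customers c LEFT JOIN orders o ON o.customer_id = c.id;

CREATE PROCEDURE build_customer_totals()
BEGIN
  -- Reuse the staging tables on each run instead of recreating them.
  TRUNCATE TABLE tmp_matched;
  TRUNCATE TABLE tmp_unmatched;

  -- Branch 1: customers that do have orders (plain INNER JOIN).
  INSERT INTO tmp_matched (customer_id, total)
    SELECT c.id, o.total
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id;

  -- Branch 2: customers without orders (the rows the LEFT JOIN padded with NULLs).
  INSERT INTO tmp_unmatched (customer_id, total)
    SELECT c.id, NULL
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

  -- Final result set returned by the procedure.
  SELECT customer_id, total FROM tmp_matched
  UNION ALL
  SELECT customer_id, total FROM tmp_unmatched;
END
```

(In the mysql client you would wrap the CREATE PROCEDURE in a DELIMITER change; that is omitted here for brevity.)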

As a Java programmer, I know just how easy it is to get results from an SP: almost as easy as running a query. You just need to know in advance that it is indeed an SP you're calling. And if you're not passing parameters, it's even easier.

thanks for the kind words. I am pretty sure you'll hear more about it. BTW - looking forward to meeting you again at the MySQL UC. I'm sure we can find some illegal roulette racket somewhere to spill some chips ;)

Maybe you can check out Pentaho Reporting in that case (aka JFreeReport) - like I said, I think they added it. I believe BIRT also supports it.

That said, what you are describing sounds like a pretty complex thing to do just to run a report. My gut feeling says it would probably be better to set up an environment that is designed to cater to queries, and that uses some after-hours time to pre-process things, so you know you can always get the answers efficiently whenever you need them.

I realize it might not always be possible, and some customers are truly scared by the term "Data Warehouse", but really, I am feeling more and more inclined to advise against one-by-one copies of the source systems as "reporting environments", in favor of a proper star-schema-based data warehouse. This would in almost all cases eliminate the need for explicit temporary tables and outer joins altogether.

After some searching, I came across your recommendation. The problem I'm finding is that PDI does not detect the column names returned (as a result set) from the call, so I am getting some rather dire errors. I'm sending the results of the stored procedure call to a Stream lookup step, which may be my problem. Not sure if you've encountered this, but it's a potential downside to this solution.

You must have a lot better luck than I do. I'm returning a data set out of my stored procedure using SELECT, sending that to a Select values step, where I try to set the data types of the two columns being returned from the SP, and sending that to the Stream lookup, but I am getting null pointer errors from the Stream lookup step. It's not a deal-breaker, as I have a workaround in place; it's just frustrating.

Wow, ignore that, I was making a very stupid mistake. I seem to have the same luck.

Using a Select Values did the trick. It still gives errors if I try to "get fields" or "get input/output fields" anywhere downstream from the Stream Lookup, but it appears that the Stream Lookup is working perfectly.

Hey Roland, just got your book... congrats, and a good read! I have a transformation I'd like your take on from a "best practice" perspective. I'm thinking there are multiple ways of doing this, so I'd like to understand which is the most efficient.

Scenario: a table input with a field that I need to make consistent. Let's say the values I can receive from this field are as follows:

1. example - A101
2. example - A202
3. example - B402

I'd like to create a transformation that catches 1 and 2 and outputs "Jane Smith", while 3 outputs "John Doe".

What is the "Pentaho way" of doing this? An SQL CASE statement? A JavaScript step? A built-in transformation step?

thanks for your interest and support! I always enjoy hearing the book is useful to someone.

Regarding your scenario: Although you could solve it in the SQL statement in the input step using a CASE...WHEN expression, I would advise against that. Best practice is to make all transformations to the data as explicit as possible, and writing SQL expressions hides the logic.

In Pentaho Data Integration, I would most probably use a "Value Mapper" step for this case. The value mapper operates on a field in the input stream: it checks whether the value matches one of the listed "source" values, and returns the mapped "target" value if there is a match. In this particular case, you would have the following source/target pairs: {A101, Jane Smith}, {A202, Jane Smith}, {B402, John Doe}. You can find the value mapper in the "Transform" category. It is first discussed in the book on page 273, "The Value Mapper Step".

I should point out that the value mapper is the way to go if you "just" want to map a few values. If the mapping is actually stored in a database table, you should use a database lookup step. Especially if you need to manage the mapping over time (and you want to keep track of the history of your mapping practice), you could start to explicitly store the mappings in a lookup table in your staging area, and use the lookup step to do it.

Now about your suggestions:

You could do it with a JavaScript step too, but then again: you can do everything with a script step. Like writing SQL expressions in the input step, a JavaScript step obscures the logic, and you should only use it as a last resort. Also, the JavaScript step is quite slow.

There is also a Switch/Case step (in the "Flow" category), but you would use that if you actually wanted to apply a different treatment to the rows depending on a field value.

If you need to use stored procedures to return data sets, and include parameters for ad hoc reporting, you can use a product called JEvolution from www.mcmsoftware.com; see the following video: http://www.mcmsoftware.com/docs/jevolfromstp.htm

Visual Studio and SQL Server Management Studio are two different entities, so interacting with the database from Visual Studio takes a lot of time.

When we are using desktop applications we do not notice this time delay, but with web applications the delay becomes an important factor, because the database is on a remote server, so a huge time delay is observed.

Normally, the query that we write in Visual Studio is first compiled and then executed in SQL Server, which then returns the results. This is why a time delay is observed.

The answer to this is to use a stored procedure. The stored procedure is stored in compiled form in SQL Server, so when we call it from a Visual Studio application it is simply executed, saving a lot of time.

An illustration of proper usage of stored procedures is shown in the following video:

I see it's been a while since this thread was active. I'm also trying to execute an SP within Kettle (Spoon 4.1).

I used your tip of using the Table input step and calling the SP, which works. However, my table gets filled with data, but the transformation returns an error: 'Couldn't get row from resultset'. Maybe you already explained it in this thread, but I don't know how to keep my transformation from returning this error.

Perhaps you know a solution? P.S. Your book is awesome! I got it last month as well.

Do you know of a way to select from a stored procedure in MySQL? For example, a hypothetical syntax might be:

SELECT s.* FROM db.sp_my_stored_procedure(@arg1,@arg2) s;

This would allow me to also join the stored procedure result set to other tables - handy. I've worked around this by having my stored procedures create temporary and permanent tables, but that's pretty clunky.

Unfortunately, this is not possible. There must be a feature request for it somewhere on bugs.mysql.com; I've seen many people request this feature.

I have a suggestion that might improve the workaround you're using at the moment.

Instead of creating a (temporary) table, you can have your stored procedure create a view. The advantage is that it doesn't cost storage, and typically the generating procedure returns faster because it doesn't have to wait until the data is stored to disk. If the query in which you then use the view allows the view to be processed using the MERGE algorithm, it should be reasonably fast (provided there are good indexes on the underlying tables). If the view is evaluated using the TEMPTABLE algorithm, it could be costly, because you won't have any indexes on the result set produced by the view.
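A minimal sketch of this approach, with invented names, might look like the following. Since the view definition here depends on a parameter, the CREATE VIEW statement is built as a string and executed via a server-side prepared statement:

```sql
-- Hypothetical sketch: have the procedure (re)create a view
-- instead of filling a temporary table. Names are made up.
CREATE PROCEDURE prepare_report(IN p_year INT)
BEGIN
  -- Build the CREATE VIEW statement dynamically so the parameter
  -- value can be baked into the view definition.
  SET @ddl := CONCAT(
    'CREATE OR REPLACE VIEW v_report AS ',
    'SELECT customer_id, SUM(total) AS total ',
    'FROM orders ',
    'WHERE YEAR(order_date) = ', p_year, ' ',
    'GROUP BY customer_id'
  );
  PREPARE stmt FROM @ddl;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END
```

After calling `CALL prepare_report(2008);` the caller can then `SELECT ... FROM v_report JOIN other_table ...` and, if the view is simple enough to be merged, benefit from the indexes on the underlying tables.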


About Me

I'm Roland Bouman (@rolandbouman on twitter). I'm a software (web) application developer and I work on both the front end as well as the back end. I do data modeling, database design, ETL, Analytics, and Business Intelligence.