Table Functions, Part 5a: An introduction to pipelined table functions

[Gee, that was embarrassing. I start a series on table functions, and then it falters and stops. My apologies; I got so busy eating and drinking and breathing and sleeping and....anyway, back to the series!]

In this post I introduce PL/SQL developers to the pipelined table function.

Here's my first piece of advice regarding this interesting and in many ways unique (within PL/SQL) feature:

You are unlikely to ever need a pipelined table function, but when you do, it is going to be amazing.

I bring this up right at the start because I have found over the years that many developers talk in the most glowing terms about pipelined table functions. Yet when I look at what they are doing with those functions, it becomes clear that they are not pumped up about the pipelining. They are not, in fact, benefiting at all from the pipelining feature. Instead, they are excited simply about table functions. And that's cool, that's fine. Table functions are, indeed, quite awesome. Read my series! :-)

A pipelined table function is a specialized variant of the more general species. So far as I know, the pipelined table function (ok, I give up: PTF from here on) has three primary use cases:

Make sure that a table function executed within a parallel query does not cause serialization. That is, parallelize (which is strikingly different from paralyze) execution of the table function.

Reduce user perception of the elapsed time needed to retrieve data.

Reduce the amount of PGA (process global area) consumed by a table function.

I discuss these briefly below, and then describe the architecture and flow of PTFs.

Parallel Query

I am not an expert in parallel query, but the basic idea is obvious: you have a long-running query (it takes days!). You need to make it run faster. So you give the Oracle Database SQL engine sufficient information so that it can break up that one big monster query into multiple queries that run in parallel. The results from each "partitioned" query are then merged into a single result set, and you have your answer - in much less time.
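A minimal sketch of what this looks like for a PTF (all names here are illustrative, not from any real schema): the function takes a ref cursor parameter and declares PARALLEL_ENABLE with a PARTITION BY clause, so each parallel execution server can process its own slice of the rows.

```sql
CREATE OR REPLACE TYPE t_emp_obj AS OBJECT (empno NUMBER, ename VARCHAR2 (100));
/
CREATE OR REPLACE TYPE t_emp_tab AS TABLE OF t_emp_obj;
/

CREATE OR REPLACE FUNCTION transform_emps (emps_in SYS_REFCURSOR)
   RETURN t_emp_tab
   PIPELINED
   PARALLEL_ENABLE (PARTITION emps_in BY ANY)
IS
   l_empno   NUMBER;
   l_ename   VARCHAR2 (100);
BEGIN
   LOOP
      FETCH emps_in INTO l_empno, l_ename;
      EXIT WHEN emps_in%NOTFOUND;

      -- Pipe each transformed row out immediately.
      PIPE ROW (t_emp_obj (l_empno, UPPER (l_ename)));
   END LOOP;

   CLOSE emps_in;
   RETURN;
END;
/

-- Each parallel execution server runs its own copy of the function
-- against its own subset of the cursor's rows.
SELECT /*+ PARALLEL (4) */ *
  FROM TABLE (transform_emps (CURSOR (SELECT empno, ename FROM emp)));
```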

Reduced User Perception of Elapsed Time

This one's a bit harder to explain, but it should be clear enough once you get through the architecture/flow section below. The basic idea: a user visits your website, enters some criteria, and presses Submit. Surprise! They've just asked for 127,544 rows of data to be returned to the screen. And so they wait and wait and wait, as the page gets longer and longer.

Users hate to wait.

So you build that query around a PTF, and then the scripting language (or Java or whatever else you are using) can consume and display, say, the first 100 rows virtually instantly, and the user can start looking at and working with the data, as the rest of the data is being sent to the website.

The total time taken to retrieve all data is not faster, but the user's perception is: "Wow! I didn't have to wait at all!"
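Sketched in SQL, the consuming side of that pattern might look like this (stream_orders is a hypothetical pipelined function, not from the post):

```sql
-- Because a PTF pipes rows out as they are produced, this query can
-- return its 100 rows almost immediately; Oracle stops invoking the
-- function once the ROWNUM predicate is satisfied.
SELECT *
  FROM TABLE (stream_orders ())
 WHERE ROWNUM <= 100;
```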

I think that sounds so cool, and I know (or believe) that people have used PTFs for this, because they told me. In my classes. In front of other developers. So they wouldn't lie, right?

But no one has ever pointed me to a demonstration. So if you have done something like this, please let us know via a comment on this post.

According to the documentation, to achieve this outside of PL/SQL you follow the interface approach, in which "the consumer and producers run on the same execution thread. Producer explicitly returns the control back to the consumer after producing a set of results. In addition, the producer caches the current state so that it can resume where it left off when the consumer invokes it again." You can do this in C or Java, and you will find examples here.

Reduced PGA Memory Consumption

When you populate data in collections (and table functions return collections), you consume Process Global Area (PGA) memory. If your session consumes too much of this memory (limits are controlled by database initialization parameters), the session dies.

If your "normal" (non-pipelined) table function is consuming too much PGA, you can switch to a PTF and watch PGA consumption plummet.

Why? Well, to answer that question, you need to understand how PTFs work. And so....

Pipelined Table Function Architecture and Flow

Before I get into the details, it's important to remember the following about PL/SQL: when you invoke a function, that function runs to completion, and only then is control (along with the function's return value) passed back to the invoking block or query.

To drive this point home, I create a simple table function that includes a call to DBMS_LOCK.SLEEP (note: you may need to have someone with DBA authority grant you the EXECUTE privilege on this package to reproduce what I am doing):
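The demo code itself isn't shown here; a minimal reconstruction along these lines (my names, not necessarily the original's) behaves as described:

```sql
-- A schema-level nested table type, required so the function can be
-- called in the FROM clause via TABLE.
CREATE OR REPLACE TYPE numbers_t IS TABLE OF NUMBER;
/

-- A non-pipelined table function that sleeps before adding each row.
-- Nothing is returned until the entire collection is built.
CREATE OR REPLACE FUNCTION slow_numbers (
   count_in IN INTEGER, sleep_in IN NUMBER)
   RETURN numbers_t
IS
   l_numbers   numbers_t := numbers_t ();
BEGIN
   FOR indx IN 1 .. count_in
   LOOP
      DBMS_LOCK.SLEEP (sleep_in);
      l_numbers.EXTEND;
      l_numbers (l_numbers.COUNT) := indx;
   END LOOP;

   RETURN l_numbers;
END;
/

-- Doubling the per-row sleep doubles the elapsed time of the query:
SELECT * FROM TABLE (slow_numbers (5, 1));
SELECT * FROM TABLE (slow_numbers (5, 2));
```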

As you can see, the time it took for the query to complete doubled when I doubled the time spent "sleeping" inside the function.

This, then, is the norm in the world of PL/SQL: Call me and you must wait for me to finish.

Clearly, that approach doesn't work very well if you want to execute a parallel query and one of the "tables" you are querying from is a table function. You can't parallelize an operation when one component of that operation insists on serializing, right?

Same goes for the user perception use case: if a user submits a request that executes a table function that must return 100,000 rows, everyone must wait till all those rows return.

And it returns those rows in the form of a PL/SQL collection, so that table function could and often will consume large amounts of PGA memory.

Sigh. What's a programmer to do?

Pipeline!

Well, that's just a word. So what does pipelining do in a table function?

It allows you to "pipe" a row of data from inside the function out to the calling query. The query can then immediately use that row of data, without waiting for the function to complete.
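In syntax terms, a minimal sketch (illustrative names only): the function header adds the PIPELINED keyword, rows leave the function via PIPE ROW, and the final RETURN carries no value.

```sql
-- Assumes a schema-level type: CREATE TYPE numbers_t IS TABLE OF NUMBER;
CREATE OR REPLACE FUNCTION piped_numbers (count_in IN INTEGER)
   RETURN numbers_t PIPELINED
IS
BEGIN
   FOR indx IN 1 .. count_in
   LOOP
      PIPE ROW (indx);   -- this row is available to the query right away
   END LOOP;

   RETURN;               -- no expression: all rows were already piped
END;
/

SELECT COLUMN_VALUE FROM TABLE (piped_numbers (3));
```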

Rather than bloat up this post (and further delay its publication) with a full-bore explanation of the syntax for pipelining, I will simply offer a rewrite of the example from my streaming table function post as a PTF (note the pipeline-specific syntax). In the next post in this series, I will provide a step-by-step explanation.

You wondered oh so well, Jobo. I caught that typo when I tested the code (and produced the timings) but then forgot to apply the fix to the code in the post. I have since fixed it (thanks to Patrick Barel for first pointing it out - and oh so politely via email - an unnecessarily gracious act. I am fine with being called out in public!).

For one client we once had to implement a specific case for the Oracle Siebel audit trail. The audit trail in Siebel is "encoded" as a string delimited by * characters, according to some special logic.

The client's security department wanted to query the decoded audit trail, but they required that it be a SELECT from a table; they could not execute a procedure beforehand. And of course the data had to be current, without any delay, from a specific date.

So we implemented it as a pipelined function encapsulated in a view.

In addition to the features mentioned above that are characteristic of pipelined functions, there is also a PL/SQL implementation aspect that makes them appealing:

When we need to define a table function that returns a collection of a composite data type, making the function pipelined allows for nicely organizing the code, by keeping both the function and the necessary record and collection type in a single package.

Otherwise, a similar non-pipelined table function would require creating separate schema-level object and collection types. For pipelined functions, Oracle does this work for you behind the scenes :)

This can be especially attractive when the involved types are only necessary for that specific function or package, that is, without having a wider usage in the application.
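A sketch of the packaging arrangement described here (illustrative names; this is supported for pipelined functions specifically, since Oracle creates the hidden schema-level types for you):

```sql
CREATE OR REPLACE PACKAGE emp_pkg
IS
   -- Record and collection types live right in the package spec...
   TYPE emp_rt IS RECORD (empno NUMBER, ename VARCHAR2 (100));

   TYPE emp_tt IS TABLE OF emp_rt;

   -- ...alongside the pipelined function that returns them.
   FUNCTION emps RETURN emp_tt PIPELINED;
END emp_pkg;
/

CREATE OR REPLACE PACKAGE BODY emp_pkg
IS
   FUNCTION emps RETURN emp_tt PIPELINED
   IS
      l_row   emp_rt;
   BEGIN
      l_row.empno := 1;
      l_row.ename := 'SMITH';
      PIPE ROW (l_row);
      RETURN;
   END;
END emp_pkg;
/

SELECT * FROM TABLE (emp_pkg.emps ());
```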

Regarding the features specific to pipelined functions, the reduction in PGA consumption is indeed an essential one.

I had the opportunity to experience this directly when I converted a table function used in Oracle8i, before pipelined functions were introduced, into a pipelined function in later versions, and the improvement in memory usage was huge :)

Sometimes "the feel of an improvement" could be more rewarding than being "born directly" into a version with all these goodies already at hand :)

I fully agree with the implementation aspect: it makes code more readable (shorter SQL), more testable (e.g., unit testing), and reduces side effects. But I wonder how you made it work with local types. It worked for me with schema-level object and collection types, e.g.

CREATE OR REPLACE TYPE T_OBJ AS OBJECT( ...);

CREATE OR REPLACE TYPE T_SET AS TABLE OF T_OBJ;

but not with the following local declaration (neither in header nor in body of my package):

type T_Obj is record( ...);

type T_Set is table of T_Obj;

I've got:

460/22 PLS-00642: local collection types not allowed in SQL statements

Of course. I defined a PL/SQL collection, not an object type, in my package. But Iudith wrote:

"When we need to define a table function that returns a collection of a composite data type, making the function pipelined allows for nicely organizing the code, by keeping both the function and the necessary record and collection type in a single package."

My apologies. I did not read your code closely enough. Yes, you declared a record. That will not work because when you invoke the function inside SQL, that record type will not be recognizable. So you create the object type at the schema level, but you can declare your collection type of that object type in the package specification. Make sense? [Hope I got it right this time!]
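A sketch of that arrangement (illustrative names): the object type lives at schema level, while the collection type is declared in the package specification.

```sql
-- The object type must be a schema-level object so SQL can see it.
CREATE OR REPLACE TYPE t_obj AS OBJECT (id NUMBER, name VARCHAR2 (100));
/

CREATE OR REPLACE PACKAGE obj_pkg
IS
   -- The collection of that object type can stay inside the package.
   TYPE t_set IS TABLE OF t_obj;

   FUNCTION get_rows RETURN t_set PIPELINED;
END obj_pkg;
/

CREATE OR REPLACE PACKAGE BODY obj_pkg
IS
   FUNCTION get_rows RETURN t_set PIPELINED
   IS
   BEGIN
      PIPE ROW (t_obj (1, 'A'));
      RETURN;
   END;
END obj_pkg;
/

SELECT * FROM TABLE (obj_pkg.get_rows ());
```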

A question, please. I have seen recently that pipelined functions work very well with the ROWNUM pseudocolumn to limit the returned results.

Is it known how it behaves regarding:

1. Pagination (especially for the last pages; even if the probability of needing them is small, I have known people who did this)?

2. Whether an ORDER BY inside a PTF is preserved at the output level? I encountered an example in SQL Server where the order behaved very strangely, changing from what was inside a cursor to what turned out to apply outside the cursor (i.e., after the cursor prematurely exited because of this unforeseen behavior), and I am curious how Oracle behaves in this respect with its PL/SQL features.

