The PL/SQL Challenge (www.plsqlchallenge.com) offers a daily quiz on the PL/SQL language, through which thousands of Oracle technologists demonstrate and deepen their knowledge of PL/SQL. This blog contains posts by the PL/SQL Challenge founder, Steven Feuerstein, as well as comments from players.

23 October 2010

Let's talk about varrays and the 22 October quiz (1525)

The 22 October quiz asked "Which of the following statements describe a reason to use a collection of type varray with BULK COLLECT?"
The choices and my explanations for them quickly attracted response. I was not terribly surprised. This question is a bit different from most of the previous ones. Rather than ask you to evaluate code, I ask you to pick reasons, which inevitably have to do with intentions and the solving of problems. Good stuff!
Or risky material.
My newest reviewer (and also, therefore, the most recently active player of my reviewers), Michael Brunstedt, actually pleaded with me to not go with this quiz, precisely because it was "conceptual" and therefore open to (mis)interpretation. "This quiz has bad karma," said Mike.
Well, I ignored him and published the quiz. I offer below the feedback of several players. I will hold off responding myself until I hear more from other players.
"I'm not sure if the question and answer two (will establish a upper limit...) are really clear. Yes, a varray will establish a upper limit because you will get an exception when more records are fetched than the varray has been defined for. But I'm not sure if that's a reason why I would use varrays instead of other array types. I have checked that question because it establishes that upper limit but in a program it doesn't make much sense because I will get an exception if I don't use the LIMIT clause (which really establishes an upper limit)."
"I choose none but I think that you will consider "Use of a varray will establish..." Because you can trap the exception and no nothing and the varray will keep the previous fetched values...even soo that is very bad practice (I think)...you could do that in the SQL (the limit)"
"First, I should witness again how much I love the quiz, more and more every day ! Nothing helps, the passion for constant self-improvement is there, it is burning and transforming you not only in a more eager professional, but also in a more passionate competitor !!! Though there are about 2 hours left for today's quiz, I just started to wonder about it having been somewhat different from the usual ones, leaving more place to interpretation. Specifically, I am not completely content with the fact that the second choice: "Use of a varray will establish an upper limit to the number of rows that can be fetched ..." does allow for possible opposite interpretations to this statement. First, strictly speaking, the use of a varray by itself WILL NOT establish an upper limit to the number of rows that can be fetched, except maybe with the following addition: "the number of rows that can be fetched without raising an exception". In fact, the programmer is the one that should take care NOT TO EXCEED the number of the elements predefined in the varray definition, either by safely limiting the number of rows in the result set or by catching an eventual exception raised if the result set is larger than the varray size. I did not check yet whether handling such an exception does indeed ensure that anyway, a number of rows not exceeding the varray size are still deposited into the target varray. If this does indeed happen, then we can be "that generous" and consider the varray as "kind of a tool" that can be used to limit the number of fetched rows, in addition to the other "traditional" ones, like limiting the rows selected or using an explicit cursor with FETCH ... BULK COLLECT INTO ... LIMIT "n". But, again, it is the programmer that uses varray as a limitation tool, and NOT the varray itself. I would say "it is the varray only" if this would happen "silently", without raising any exception, but just ignoring the rows that do not fit into the varray size. By the way, I remember something related to a similar behavior years back, when using pl/sql in SQL*FORMS V3. If a SELECT INTO ... ORDER BY ... happened to return more than one row, then a TOO_MANY_ROWS exception was of course raised, but, if that exception was handled, then the target INTO variables still contained the first row's values (as by the ORDER BY), which was exactly our purpose, so we deliberately used such coding, though it is not the best way to go. When upgrading to OracleForms6i, this behavior changed, in the sense that, after handling the TOO_MANY_ROWS exception, there were NO FETCHED VALUES deposited at all in the target variables, so we changed each such select into an explicit cursor and fetching only one row from it. Second, back to the quiz, considering that the question asked for a "REASON to use varray with BULK COLLECT", the above remark adds even more to the possible ambiguity of this choice: One programmer can consider the varray size limitation and the need to prepare for an eventual exception handling as a rationale AGAINST using varrays with BULK COLLECT, while another programmer, just the opposite, can see in this a possible and "simple enough tool" for obtaining a limited number of rows in a result set, again, as long as the behavior is the one that I tried to describe above and it is not changed by Oracle. Once again, I will test it immediately when I will be back at work and have the database at hand, but, till next week, I just wonder about what other competitors will think about this choice. Regarding the last choice, related to the efficiency of varray versus other collection types, I am not very sure that there is a 100% answer about whether this is right or not, it probably depends also on the number of rows in the result set, though, if I remember it correctly, it is generally considered that when storing varrays IN THE DATABASE (which is not the case here), especially small size ones, they can offer certain advantages regarding storage, ease of manipulation, subscript stability, a.s.o. upon nested tables. So, in summary, if I am allowed a personal opinion, this quiz was much more "phylosophical" than any previous one that I can remember, I dare presume that there will be more comments around it than on some of those previous ones that made a few compeitors so angry about the English text ... Here the controversial case is a "purely PL/SQL" one, so it will be interesting ...
" I considered today's quiz questions to be a bit more ambiguous and grammatical than typical, I thought you were trying to eliminate subjectivity and English language skills from these questions . . . I chose not to check the answer "Varrays are the most efficient type of collection, so the BULK COLLECT query will ..." question, because I could not find any Oracle documentation that denoted that 25% improvement is typical, and I know that each configured Oracle environment can behave quite differently than another when it comes to this kind of performance tuning, therefore even though they are the most efficient and faster, the 25% improvement clause causes the statement as a whole to fail the correctness test. I was iffy about checking the "Use of a varray will establish an upper limit..." question, but checked it in the end because someone could indeed decide that their program should raise an exception if more than records are returned, therefore this could be a reason for choosing them since they wouldn't have to wait for the statement to return all the records that will be ignored anyway since there were too many (and also sucking up SGA memory & processor cycles the whole time) until they could check their record count against ."
"On the Friday October 22 quiz, I'm not sure that I agree with marking correct the answer "Use of a varray will establish an upper limit on the number of rows that can be fetched with a "SELECT column_name BULK COLLECT INTO array_name ..." statement." I read that as saying that a SELECT ... BULK COLLECT INTO array_name would do the equivalent of an implicit LIMIT size_of_array, which is, of course, incorrect. I would further argue that bulk collecting into a varray does not limit the number of rows fetched-- Oracle is attempting to fetch all the rows. It just happens that the N+1th fetch into a varray of N elements happens to throw an error. "
So....what's your opinion on this matter?
And now I will prepare for the upgrade to 1.7.
Cheers, SF

29 comments:

The nice point about question was using VARRAYS for bulk collect statements.

As regards the option # 3, I am not a native speaker, but I understood the word "ESTABLISH" as the thing that helps to control the rows limit.We can use the optional LIMIT clause, allowed only in bulk FETCH statements, that limits the number of rows fetched from the database. LIMIT clause will "establish" it.The case when more than likely we will get an exception doesn't suit the word "establish".Choosing that option I was sure that it was like a gamling with 50% probablity of my understanding the main idea of that option.

Anyway, that was nice to pick the point up about using varrays, because I would never consider their using for this sakes.

What would you do with an exception then? Ask the developer to change the code to increase varray size? What if the new size is still not big enough some day? A loop with fetch limit would be a better solution.

Before creating a software solution, you should have unambiguous requirements, good validation rules and the goal to build something that consistently does what it is supposed to do.

Upon completion of the requirements analysis, we pick a feasible environment in which to design a solution. E.g. Oracle 10gR2 with the built-in PL/SQL. PL/SQL is a very mature programming language. It has evolved over many years and often provides several solutions to the same problem. How do you pick the best or most feasible solution? Well, this is where things like experience and best practices come to play. What may happen? Sometimes two or more solutions score equally well and you simply pick the one you, for whatever reason, like the best.

In reality, we are taking a concept and turning it into fact while dropping all ambiguity along the way. Sounds familiar? I bet most developers can say: Been there, done that.

The scenario usually takes place in the confinements of a team and a few discussions are likely to occur before a solution is presented. That is at least my experience :-)

Let's translate this to the varray quiz.We have a requirement to retrieve data and have agreed that BULK COLLECT INTO is the way to go. As we all know, we can achieve that in different ways and this is where the discussions take off. One will argue for this, the other for that and Steven, without recommending actually doing it, asks for reasons why a varray should be used. Sitting around the table, at the bar or wherever these meetings are held, everything is discussed and we come to a mutual agreement (or the boss wins) and we all move on. But, wait a minute, we did not have that discussion... The discussion amounted to three choices, each with a boolean answer. There is no possibility to say; Yes, but I would rather... or No, it will work, but...

Had the quiz concentrated entirely on varray as the only solution, then maybe ok. But, the choices forced upon us the need to compare with the other collection types. Directly by saying that varray is faster and indirectly by saying that varray may limit the number of rows retrieved. The last one we cannot dispute though, after all varray does enforce an "upper limit". If you would ever do it that way is another question (one that wasn't asked).

Are conceptual quizzes bad?Personally, I think anything conceptual is better off in a discussion forum than a venue where I can only answer TRUE or FALSE. Others may (rightfully) argue that, in the world of software engineering, all great solutions start out as a concept and that tools used to create them are no different. So why can’t we quiz about concepts in PL/SQL?

My personal opinion is that conceptual quizzes are not bad, they are just extremely difficult to create. As long as there is enough fact around to quiz about, I would hold back on the concepts.

Your reviewer, Mike, was right; this quiz was destined to have trouble. Conceptual questions have a much greater risk of inviting challenge than do those based on issues of syntax or code behavior. This quiz included two problematic areas: questions of performance, and "creative" use of undocumented behavior.

I chose to select the performance choice (despite the suspicious reference to 25% improvement) because VARRAY handling is more efficient than either nested table or associative array handling. The performance difference is heavily influenced by many variables (memory fragmentation, available memory, collection size, etc.). Determining typical relative performance would require extensive experimentation (assuming it could be determined at all). I suspect that query processing time will always be far more significant than the time associated with collection handling.

The use of a VARRAY to limit the number of rows returned by a query is "creative" but appears to be undocumented and unintended behavior (I’m assuming the demonstration code presented in the answer works with 10g). I don’t have a 10g instance available, but with 11gR2 an ORA-22165 exception is raised if the query attempts to return more rows than the VARRAY can hold. The VARRAY is populated even if this exception occurs (when performing a BULK COLLECT the collection is handled as if it were an OUT NOCOPY subprogram parameter; i.e., changes to the collection are not reverted on failure). I didn’t run any test code prior to answering the quiz but instead rejected the "limit" choice because it is not a reason I would choose to use a VARRAY over another collection type. Even if an exception were not raised when attempting to query more rows than will fit into a VARRAY, I would still not consider this a valid reason to prefer a VARRAY since the LIMIT clause effectively addresses the memory use issues and clearly documents the intent to restrict the number of rows retrieved regardless of the collection type.

For what it's worth, I loved this quiz -- even though I only got 1 out of 3 correct (I'm new to bulk collect and I thought it didn't work with varrays -- all the more reason to dig out a reference manual before I submit my answers).

I appreciate being able to read Feuerstein's reasoning for the choices, as well as everyone else's take on the reasons.

Hello,To me, it was a very strange question. I've chosen the option that was stressing the performance issues of using varray type. Although I had doubt about the gain of 25%, I just clicked on that option and submitted the answer. Then I scanned the Oracle documentation and found the source of the information in my mind about varray's performance advantage over nested tables: Database Application Developer's Guide - Object-Relational Features book's chapter 8 says "If the entire collection is manipulated as a single unit in the application, varrays perform much better than nested tables. The varray is stored packed and requires no joins to retrieve the data, unlike nested tables". Of course it was totally a different aspect of Oracle collections but it was the thing that led me to the wrong direction.The option telling the upper limit establishment succeeded by varray type was saying "Use of a varray will establish an upper limit on the number of rows that can be fetched with a "SELECT column_name BULK COLLECT INTO array_name ..." statement". I considered that statement was denoting a built-in limit applied to the SQL statement caused by using the varray collection, and simply eliminated it. And I am surprised a lot when I saw that it was the correct answer, because I was sure that this sort of coding throws an exception and there was no auto limiting feature coming with the varray types used in bulk collect statements and I still cannot understand why this option is correct. I think that, the statement in this option is very likely to be misinterpreted.Regards,

Regarding performance: I realized earlier today that the people who are talking about improved performance with varrays are talking about (as Bora related above) to the use of a collection as a column type (in SQL, that is, not in PL/SQL).

Yes, there are at least theoretical gains in performance by using varrays. But this question was not about using nested tables or varrays as column types. Or I should say: I did not explicitly exclude that usage, but did say explicitly "than if you use an associative array or nested table." There is no possibility of using an associative array as a column type.

In addition, this is a PL/SQL quiz, not a SQL quiz, so I don't think it was reasonable to conclude from this choice that it was correct because you could see performance gains by using a varray as a column type, over nested tables.

Hi again,I am aware of the difference between the performance advantages of using varrays intead of nested tables, mentioned in the Oracle documentation and the concern of the quiz. I just wanted to share the vague information pieces in my mind which affected my choice in a negative way. That's why I used the words "Of course it was totally a different aspect of Oracle collections but it was the thing that led me to the wrong direction." in my former comment.I actually am wondeing about the final verdict on the option which had been stating the varray's upper limit establishment to the rows that can be selected.And one last thing: I really am a fan of PL/SQL challenge so that I am trying to inform all the professionals in my network about it. I think you have created a great platform for PL/SQL specialists all over the world. You teach, attract our attentions to the right points, introduce new features and help us position the pieces in the right places. So you deserve positive feedbacks undoubtedly.Regards,

When testing on my notebook with 11gR2, nested tables perform ~12% worse than VARRAYs. I executed a similar test using a SQL query selecting 10,000,000 rows instead of performing an element copy and only saw ~2% difference. The minor performance difference when performing a query confirms the assumption I made in my earlier posting that SQL query processing time is much more significant than any performance differences between the various collection types. As the number of rows returned by the query is reduced the performance differences between collection types becomes too small to measure reliably. Removing the calls to SYS.DBMS_SESSION.FREE_UNUSED_USER_MEMORY introduced randomness in the timings due to the varying times of memory reclamation.

I object to “establish an upper limit”. I also interpreted it as “automatic LIMIT clause” and marked it as incorrect.Replacing one error with another could be a reason of choosing varrays. But in strangely thinking minds. Steven, did you ever yourself use your “choice”? - I’ll never choose within two errors. I’ll rather write code that always work.At last you will not say that associative arrays and nested tables has no some “upper limit”, will you? :)

Personally, I like the conceptual questions, so I'd hate to see you ask fewer concept questions just because they are more likely to generate challenges. I actually prefer questions that generate challenges because that often leads to some interesting discussions and tidbits.

On the other hand, my bias would be that a correct answer to a conceptual question ought to be both literally "correct" as well as practically correct. In a question based on code, I'm fine with implementations that are technically correct but would never be considered by a competent programmer in practice. The code itself is an unambiguous specification and the hack-ish nature of the code is designed as an illustration of some point, not as a guidepost for rational development practices.

For concept questions, though, I think the bar for correctness needs to be higher-- a correct answer needs to be both literally correct but also practically valid in at least some situations. Realistically, there is always going to be some level of ambiguity in a questions that don't include code. Knowing that we're not trying to lawyerly parse a statement for loopholes but to evaluate whether it makes sense in practice helps remove a great deal of potential ambiguity. When the question asks which are reasons to use varrays with BULK COLLECT, the litmus test ought to be whether the reason is one that would potentially influence the thinking of a good PL/SQL programmer. The third choice fails that test-- no competent programmer would argue that a varray's ability to throw an exception if too many rows were fetched is a reason to use a varray rather than to use the LIMIT clause of a BULK COLLECT.

a) My family cannot travel in busb) Bus is usually 25% faster than train or taxic) There is an upper limit of passengers in the bus.

Is (C) correct?Sure. My family was going to city for watching a movie. I am not sure what is the size of my family! All I know is that it should not be more than what a movie theater can contain. So I chose a bus so that I can ensure that the number of my family members are less than that of a movie theater capacity!

Only when the conditions, circumstances & rules are known, the creative 'reasoning' for option C might appear (arguably) correct.

IMO the rules/conditions were not clearly specified while taking the quiz, hence option is invalid.

That is, AAs faster. In fact, there is more of a difference here than I expected, but varrays do NOT seem to be at least 25% faster or anything like that.

Does anyone else consistent 25%+ better performance with a varray?

2. Regarding the use of 25%: I actually put that in there so that a person could not do some timings, find a +/-2% difference and make a decision. That is, I felt that the arbitrarily chosen 25% was so large as to make it clear that this choice could not be correct. Sure, you might ask: How could a player figure that out? Well, a player could figure it out from experience or writing a test as JHall62 did. That's what I did, in fact. So I believe that this choice remains incorrect - and it is a matter of learning, for people who were not aware of this. That is, when you took the test you didn't know. Now you do.

3. Objections to my use of varray to "establish an upper limit" - it seems like we have a number of oppositions to the semantics. I don't really get this. If you use a varray as the "target" collection in a BULK COLLECT query, you have established an upper limit on the number of rows that can be fetched. What is ambiguous about that? If you try to fetch more that than number of rows, Oracle raises an exception.

4. Use LIMIT with FETCH...BULK COLLECT instead: sure, you could do this as well, though the LIMIT is clause is designed more to help you manage the amount of memory your session uses. The idea with the varray limit is that you have decoratively (through the definition of the varray) established that limit and now Oracle will do the checking for you. As I note in my answer, the scenario in which this could make sense is when you KNOW that the number of rows in your table should never exceed a certain amount and if it goes beyond that, you have an error in your data. In this scenario, Oracle will enforce that limit for you. Have I run into a need for this technique? No. In fact, I hadn't thought of it until Bryn Llewellyn, PL/SQL Product Manager, brought it to my attention. Would I ever use it? I don't know, probably unlikely, but it is a tool in my disposal. Does that make it wrong? I don't see how. Now lots more people are aware of the possibility of applying varrays in this way. And if they read this blog (and my answer text, which I will probably amend to include some of this discussion!) they will be aware of the possible complications.

That is, AAs faster. In fact, there is more of a difference here than I expected, but varrays do NOT seem to be at least 25% faster or anything like that.

Does anyone else consistent 25%+ better performance with a varray?

2. Regarding the use of 25%: I actually put that in there so that a person could not do some timings, find a +/-2% difference and make a decision. That is, I felt that the arbitrarily chosen 25% was so large as to make it clear that this choice could not be correct. Sure, you might ask: How could a player figure that out? Well, a player could figure it out from experience or writing a test as JHall62 did. That's what I did, in fact. So I believe that this choice remains incorrect - and it is a matter of learning, for people who were not aware of this. That is, when you took the test you didn't know. Now you do.

3. Objections to my use of varray to "establish an upper limit" - it seems like we have a number of oppositions to the semantics. I don't really get this. If you use a varray as the "target" collection in a BULK COLLECT query, you have established an upper limit on the number of rows that can be fetched. What is ambiguous about that? If you try to fetch more that than number of rows, Oracle raises an exception.

4. Use LIMIT with FETCH...BULK COLLECT instead: sure, you could do this as well, though the LIMIT is clause is designed more to help you manage the amount of memory your session uses. The idea with the varray limit is that you have decoratively (through the definition of the varray) established that limit and now Oracle will do the checking for you. As I note in my answer, the scenario in which this could make sense is when you KNOW that the number of rows in your table should never exceed a certain amount and if it goes beyond that, you have an error in your data. In this scenario, Oracle will enforce that limit for you. Have I run into a need for this technique? No. In fact, I hadn't thought of it until Bryn Llewellyn, PL/SQL Product Manager, brought it to my attention. Would I ever use it? I don't know, probably unlikely, but it is a tool in my disposal. Does that make it wrong? I don't see how. Now lots more people are aware of the possibility of applying varrays in this way. And if they read this blog (and my answer text, which I will probably amend to include some of this discussion!) they will be aware of the possible complications.

Regarding the performace-related question, it is indeed a matter of testing.

The more problematic is the "varray fetch limitation" question.If we accept the rationale in the anwser explanation:"Use of a varray places a "hard" limit on the maximum number of rows that can be fetched by the SELECT statement.", which leads to less PGA used, a.s.o, then this has enough for reversing the correct answer to the otherchoice, the performance-related one.

I rather agree with Vitaliy, that is ,to accept that varrays establish (by themselves) an upper limit on the number of rows fetched would have been true if simply SELECT-ing into a varray behave like a FETCH ... LIMIT "varray size", that is, without raising any exception.

In summary, really a "bad karma" quiz ...

I agree that conceptual questions may sometimes be more interesting than others,but they generally cannot be scored "black" and "white" and counted for a contest ranking,therefore are better for forums.

I am not sure that the scores of this quiz do indeed reflect the pl/sql knowledge more than they reflect the interpretation that each player gave to the question itself.

Steven, your timing tests support my assertion that performance is significantly influenced by many factors. It also opens the possibility that someone could have performed a single comparison test that happened to show a >25% speed improvement using a VARRAY, possibly due to memory reclamation occurring when the non-VARRAY query was running. Prior to adding the forced user memory recovery to my script I would occasionally encounter significant and unpredictable timing variances. The script I posted does not NULL the collections after each test (I added this after I pasted the code into my post). This could explain why you observed much poorer performance with VARRAYs (as the second test in the script it may have had less memory available).

In light of this discussion I now understand the intent with the "limit" choice was to force an exception if more than a specific number of rows are returned by the query (this was not clear from the quiz); however, it seems that many, myself included, interpreted the intent to be similar to using a LIMIT clause in a FETCH. The use of the expression "upper limit" is ambiguous without additional context in the same way that "speed limit" can be ambiguous. When applied to traffic laws "speed limit" refers to a speed above which one will be in violation of the law (an exception). When applied to a governor (an engine control device) "speed limit" refers to the maximum speed at which the engine will operate (depressing the accelerator further results in no additional increase in speed but does not result in a failure).

Although concept focused quizzes are more problematic than those based on concrete syntax or code behavior, the resulting discussions may be more valuable than the lost points.

I think most people don't have problem with accepting the explanation and meaning in VARRAY-limit question once it was explained. At least for me this was more english language skills quiz rather than PL/SQL language skills quiz. The ambiguity in the text of choice was (again, for me as a non-native speaker) the same a descibed by Vitalij (“establish an upper limit” interpreted as “automatic LIMIT clause”).

Well, I started my comment with "most people" but obviously I can't speak instead of "crowd" so the above is just my opinion.

I agree with you that you can use VARRAY in the way you described, I would never use it that way but as often PL/SQL Challenge is about PL/SQL and different ways of its use rather than (always about) best practice.

The only reason I can think of to use a VARRAY limit rather than a BULK COLLECT...LIMIT would be to 'tidy' the magic numbers. That is, if you expect to process 25 rows (or less), you use a VARRAY with size 25. To do the same with BULK COLLECT LIMIT, you'd need to use 25 + 1 (plus manually check to see if there is a 26th row).But heavily on the other side is that the magic number for the varray limit is hardcoded into the declaration. BULK COLLECT...LIMIT can use a variable and/or expression, making it easier to put all the magic numbers in one place.

Not 25%, but an improvement. Will that be consistent depending on OS, hardware, data design? Unlikely.

I'd be happier saying a UNION ALL produces a consistent improvement (of value) as a set operator, but not this. You would probably choose array type depending on your problem at hand, not so much performance.

Shame, because I thought VARRAYS might inherently perform better, so I ticked the box. Now I know.

Regarding the same choice, Steven, you believe yourself to be RIGHT but I think myself definitely to be NOT WRONG. Does it establish a paradox? Or show choice's ambiguity instead?Responders to both interpretations prove their choice by knowledge that varray can not contain more then it's limit elements. One fact but two opposite conclusion…

Steven, I’m surprised that your tests showed VARRAYS performing more poorly than the other collection types. How many times did you run the script? I had thought that the ten iterations in the script would be sufficient to eliminate any significant affects from the occasional long run caused by unrelated database activity, but have discovered otherwise. Within each environment most runs have been consistent, but there have been a few exceptions where the execution times were at least an order of magnitude greater than the average. If the outliers are ignored then VARRAYs have performed better than nested tables (sometimes surprisingly so). The following lists the average percent performance penalty associated with using nested tables instead of VARRAYs for both the previously posted “copy” script and for a similar script that performed a SELECT INTO (labeled as SQL):

My guess is that the performance differences between these environments are due mainly to the amount of PGA memory available. If appears that when ample PGA is available the simpler structure of VARRAYs vs. nested tables has a significant impact on performance. As memory becomes more constrained the timing variances seem to increase.