Developing for the Oracle Database

1 Coding Standards
What this document does not claim to offer is a prescriptive guide on the minutiae of coding standards.

Coding standards and naming conventions, particularly in SQL and PL/SQL, are an emotive and divisive subject.

This is largely the fault of a historical absence of mature IDEs and a lack of any rigorous, generally accepted standards as seen with other languages.

Many developers still hand-craft scripts and routines and the various tools available often have differences, subtle or otherwise, in what built-in formatting they can offer.

Good developers can adapt, good standards can adapt.

The most important objectives for coding standards are to:
• Make development faster and debugging easier
• Make other people’s code easier to understand
• Limit bugs

The following expectations support these objectives:
• Where possible use SQL before using PL/SQL
• Code will be reasonably formatted and legible, preferably with a consistent style within the module at least.
• It is preferred but not mandated that Oracle keywords be capitalized – e.g. CREATE OR REPLACE PACKAGE, SELECT, UPDATE, DELETE, FROM – and lowercase used for objects and columns, etc
• In SELECT statements, tables should be aliased – this provides a very small benefit to the optimizer when parsing but also prevents bugs particularly in subqueries.
• Procedures, cursors and variables, etc should be properly scoped – e.g. public vs private, global vs local, parameter scope, etc
• Meaningful names will be given for code items
• Reasonable naming might include some sort of prefixed or suffixed indicator of purpose, e.g. k_ for constants, l_ or v_ for local variables, g_ for global variables, p_ for procedure, f_ for function, _pkg for package, i_ for in parameters, o_ for out parameters, io_ for in out parameters.
• Package and procedure level comments should indicate why/when a particular program was changed, but SVN, or other code repositories, are the appropriate mechanism for code control.
• Code comments should be used when they add value.
• Excessive commenting and stating the obvious should be avoided – these are often more effective when refactoring the code concerned into a private routine (procedure/function) which is named appropriately (e.g. function f_format_swift_string).
• CamelCase is not considered appropriate for the database as unquoted identifiers are stored in the data dictionary in uppercase.
• Package headers and bodies should be checked into separate files for clarity and to prevent unnecessary recompilation of unchanged code and dependencies (version dependent)
• Performance should be built in and evidence of such documented.
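As an illustrative sketch only – the package, table and column names below are hypothetical – the naming conventions above might look like:

```sql
CREATE OR REPLACE PACKAGE payments_pkg AS

  -- k_ prefix for constants
  k_max_retries CONSTANT PLS_INTEGER := 3;

  -- p_ prefix for procedures, i_/o_ prefixes for IN/OUT parameters
  PROCEDURE p_apply_payment
  ( i_account_id  IN  accounts.account_id%TYPE
  , o_new_balance OUT accounts.balance%TYPE
  );

END payments_pkg;
/
```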

2 Writing Optimal SQL

2.1 Key points
Writing optimal SQL should be relatively simple but many people struggle, particularly when making the transition from an object/attribute language to a set-based language like SQL.

The key tenets of performant database code in Oracle are:
• Think in sets.
• Think about how the database might be able to process and apply logic to a set of data with great efficiency.
• Ask the right question in the best way.
• Know your data.

In support, when thinking about database code and SQL operations:

• If the query needs to be long/big, make it long/big.
• Bulk operations are critical, row-by-row operations are a cardinal performance sin.
• Eliminate data at the earliest opportunity.
• Sort on the smallest possible set – if possible avoid aggregations, sorting and distinct operations on the largest sets of data.
• Use bind variables when you require shareable SQL and when bind variables make sense
• Use literals when literals make sense.
• Use a mix of binds and literals if appropriate.
• Avoid PL/SQL in SQL.
• Be careful of applying functions (TRUNC, etc) to columns in the WHERE clause.
• User-defined functions which are called from SQL and which themselves contain SQL are, almost without exception, unacceptable.
• Never rely on implicit datatype conversion. Use the correct datatypes for parameters and where possible convert parameters NOT columns.
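To illustrate the bulk-operations point – using a hypothetical orders table – compare the row-by-row approach with the equivalent set-based statement:

```sql
-- Row-by-row: a cursor loop issuing one UPDATE per row
BEGIN
  FOR r IN (SELECT order_id FROM orders WHERE status = 'NEW') LOOP
    UPDATE orders SET status = 'QUEUED' WHERE order_id = r.order_id;
  END LOOP;
END;
/

-- Set-based: one statement, one pass through the data
UPDATE orders SET status = 'QUEUED' WHERE status = 'NEW';
```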

When writing SQL, focus on the question being asked by the SQL statement.

If you put the question into words as a comment before a complex SQL statement, then this can often add value to the next developer.

Often the most performant version of a SQL statement is the one which asks the question at hand in the most natural way.

To this end, proper consideration needs to be given to:
• Subqueries – EXISTS / IN / NOT EXISTS / NOT IN
• Set-operators – MINUS, UNION, UNION ALL, INTERSECT
• Use of DISTINCT is often an indication of a wrong SQL statement or poor design
• Common Table Expressions:

Often it can help to use a Common Table Expression (CTE), aka the WITH clause, to separate the main logic of the query from the subsequent fetching of additional data/attributes, e.g.
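A sketch, using hypothetical table names:

```sql
WITH core AS (
  -- main logic: identify the qualifying accounts
  SELECT   account_id
  FROM     transactions
  GROUP BY account_id
  HAVING   SUM(amount) > 10000
)
-- subsequent fetching of additional attributes
SELECT c.account_id
,      a.account_name
,      a.branch_code
FROM   core     c
,      accounts a
WHERE  a.account_id = c.account_id;
```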

Again, this guide does not seek to be prescriptive on the preference of ANSI join syntax over traditional Oracle join syntax.

The bottom line is that if a developer finds it easier to write a correct and optimal SQL statement using one rather than the other, then that is what matters most.

There are some SQL statement constructs which are more conveniently written in ANSI – the FULL OUTER JOIN for example.

It is also true that the optimizer always transforms ANSI SQL to the equivalent Oracle syntax, and there are some limitations to the optimizer’s other complex query transformations when using ANSI SQL.

And unfortunately there are bugs in both.
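For example (hypothetical tables), here is the same outer join in both syntaxes, plus the FULL OUTER JOIN which only ANSI expresses conveniently:

```sql
-- Traditional Oracle outer join syntax
SELECT e.employee_id, d.department_name
FROM   employees e, departments d
WHERE  d.department_id (+) = e.department_id;

-- Equivalent ANSI syntax
SELECT e.employee_id, d.department_name
FROM   employees e
LEFT OUTER JOIN departments d ON d.department_id = e.department_id;

-- FULL OUTER JOIN: conveniently expressed only in ANSI syntax
SELECT e.employee_id, d.department_name
FROM   employees e
FULL OUTER JOIN departments d ON d.department_id = e.department_id;
```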

2.5 Eliminate Early

Where there are predicates (WHERE clauses) which can significantly reduce the dataset early, check that they are being applied early enough in the execution plan (more information to follow). If not, consider whether the SQL statement might be rephrased or reconstructed (e.g. with a CTE/WITH clause) to make sure they are applied at an appropriate stage.
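A sketch of the idea with hypothetical tables – restrict the large rowsource first, then go and get the extra attributes:

```sql
WITH recent AS (
  -- apply the selective predicate to the big table first
  SELECT *
  FROM   big_transactions
  WHERE  trade_date >= DATE '2013-11-01'
)
SELECT r.transaction_id
,      r.amount
,      c.customer_name
FROM   recent    r
,      customers c
WHERE  c.customer_id = r.customer_id;
```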

2.6 Sort / Aggregate on the smallest possible dataset

Similar to eliminate early. Sorting and aggregating requires memory and under certain conditions can spill to expensive (unscalable) disk operations.
Wherever possible, do the sort or aggregation on the smallest set of rows (not necessarily applicable to the order by clause of a query).
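For example (hypothetical tables), aggregating the large table before joining keeps the GROUP BY working on the smallest possible set:

```sql
-- Aggregate first on the narrow key, then join to pick up attributes
SELECT c.customer_name
,      t.total_amount
FROM   customers c
,     (SELECT   customer_id
       ,        SUM(amount) AS total_amount
       FROM     transactions
       GROUP BY customer_id) t
WHERE  t.customer_id = c.customer_id;
```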

2.7 What’s the big deal with PL/SQL functions called from SQL?

The bottom line is that it’s about performance.

We could get into a whole argument about reusability vs performance, but performance wins in the end.

Often the correct mechanism for reusability in the Oracle database is not a function but a view joined to appropriately in the main SQL.

In the name of reusability, functions encourage row-by-row operations and discourage thinking in sets.

If the function itself contains SQL, then this SQL will not be part of the read consistency mechanism of the calling statements which can be potentially problematic.

If you absolutely have to, have to, have to use functions in SQL, then think again.

Then if you really, really do then please look at deterministic functions and consider wrapping the function call in a (select from dual) to expose the potential benefits of subquery caching for functions called with repeated parameters.
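A sketch of the wrapping technique, assuming a hypothetical function f_format_swift_string (declared DETERMINISTIC where appropriate):

```sql
-- Direct call: the function may be executed once per row
SELECT f_format_swift_string(t.swift_code)
FROM   trades t;

-- Wrapped in a scalar subquery: repeated inputs can benefit from
-- subquery caching, reducing the number of function executions
SELECT (SELECT f_format_swift_string(t.swift_code) FROM dual)
FROM   trades t;
```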

2.8 What about simple functions like TRUNC in the WHERE clause?

Using functions on columns in the WHERE clause can prevent the optimizer from using an index or from pruning a partition unless a function-based index is in place on the column.

For this reason, it is often best to avoid this sort of construct:
WHERE TRUNC(some_date_column) = TO_DATE('01-NOV-2013','DD-MON-YYYY')

In favour of this:
WHERE some_date_column >= TO_DATE('01-NOV-2013','DD-MON-YYYY')
AND some_date_column < TO_DATE('02-NOV-2013','DD-MON-YYYY')
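Where the function really is required in the predicate, a function-based index (sketched here with hypothetical names) makes it indexable again:

```sql
-- Index the expression itself so the TRUNC() predicate can use it
CREATE INDEX some_table_trunc_dt_ix
ON some_table (TRUNC(some_date_column));
```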

2.9 Using the correct datatype, be explicit

Performance problems related to using the incorrect datatypes are common.

The optimizer will implicitly add functions to make sure the datatypes on both sides of the predicate match.

Always convert date-like parameters to DATEs where the column datatype is also DATE.

Never rely on implicit datatype conversion.
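A sketch of the classic trap, assuming a hypothetical accounts table where account_ref is VARCHAR2:

```sql
-- The numeric literal forces TO_NUMBER(account_ref) on every row,
-- disabling any normal index on account_ref
SELECT * FROM accounts WHERE account_ref = 12345;

-- Convert the parameter, not the column
SELECT * FROM accounts WHERE account_ref = '12345';
```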

3 Execution Plans & Metrics – What we want, why we want it and how to get it
We have flown through some aspects of how to have a better chance of writing an optimal SQL statement.

3.1 How can we tell if it’s optimal?

Run it.

Run it twice to rule out the effect of uncontrollable factors like OS caching, SAN caching, etc.

Run it on representative data.

Run it on current volumes.

Run it on expected future volumes.

Then what?

In order to validate that our SQL statement is likely to have effective performance, what we want is the actual execution plan used and preferably the actual rowsource metrics.

Firstly, for getting actual execution metrics we can do one of two things prior to running the SQL statement concerned:
1. Add the /*+ gather_plan_statistics */ hint to the SQL or
2. In the same session, run alter session set statistics_level = all;
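For example (the target query and its tables are hypothetical; the DBMS_XPLAN call takes the parameters described below):

```sql
alter session set statistics_level = all;

select count(*)
from   orders o, customers c
where  c.customer_id = o.customer_id;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```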

DBMS_XPLAN.DISPLAY_CURSOR is a convenient wrapper to get the execution plan and metrics from V$SQL_PLAN.

The first parameter is SQL_ID and by passing in NULL, we default to the previous SQL_ID run in this session.

The second parameter is CHILD_CURSOR_NO and this should be the previous child_id for the previous sql_id.
The third parameter is the FORMAT and ‘ALLSTATS LAST’ format says to get all statistics for the last execution.

If this works, it should produce output which is examined in more detail in section 6.

3.2.2 What if this doesn’t work?

If you find you don’t have the privileges to run these commands – you need access to V$SESSION, for example, to use DBMS_XPLAN.DISPLAY_CURSOR – then request them. There is no good reason for these read-only privileges not to be given.

Otherwise the approach above is effective 90% of the time.

For parallel execution plans, see section 3.2.3 below.

However, specifically in SQL Developer, the tool itself runs some recursive internal commands, which means that by the time our DBMS_XPLAN statement runs, the previous SQL ID is no longer that of our target SQL statement.

There is one such example in SQL Developer 3 related to timestamp columns which affects the test script when running everything as a script (F5). In this case, there are two alternatives. Firstly, run the individual commands in SQL Developer as Run Statement (F9 / Ctrl + Enter). Alternatively, just comment out the timestamp columns in the SELECT part of the statement, for the purposes of this exercise.

Furthermore, in SQL Developer 4 there are further changes to recursive operations which seem to affect some statements.

In all such cases, if the output of the DBMS_XPLAN.DISPLAY_CURSOR is not the execution plan of the statement being profiled then the approach should be to identify the SQL statement in the shared pool (Look for matching SQL_TEXT in V$SQL) and plug the specific SQL_ID into the first argument of the DBMS_XPLAN call (no need to rerun the target SQL statement).
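One convenient approach – sketched here with a hypothetical query – is to tag the target statement with a distinctive comment so it is easy to find in the shared pool:

```sql
select /* my_target_query */ count(*) from orders;

select sql_id, child_number, sql_text
from   v$sql
where  sql_text like 'select /* my_target_query */%';
```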

3.2.3 Parallel Execution Plans

For parallel execution plans, the approach of using DBMS_XPLAN.DISPLAY_CURSOR with the format of ‘ALLSTATS LAST’ is not appropriate because it fetches the execution metrics from the last execution of the statement – which is the Query Coordinator (QC) and does not include the metrics of the parallel slaves.

A better approach for parallel execution plans is to use real time sql monitoring and the easiest way to do this is to run the following and capture the output report:
select dbms_sqltune.report_sql_monitor(sql_id => '<sql_id>') from dual;

This requires you to identify the SQL_ID of the target SQL statement from V$SQL (seek matching text in the SQL_TEXT/SQL_FULLTEXT columns).
It may also require you to add the /*+ monitor */ hint to your SQL statement because, by default, monitoring only kicks in for executions which last longer than a threshold number of seconds (five by default) and for statements whose plans are below a certain length.

This follows the principle that:
“if an access plan is not optimal it is because the cardinality estimate for one or more of the row sources is grossly incorrect”
and
“the cbo (cost-based optimizer) does an excellent job of finding the best access plan for a given sql provided it is able to accurately estimate the cardinalities of the row sources in the plan”

By gathering the actual execution plan and the actual execution metrics, we can show whether the optimizer was accurate in its estimations. If it was accurate then, from a developer perspective and for the purposes of most code reviews, there is a good likelihood that the SQL is good enough for the optimizer to do a good job with.

4 Interpretation of Execution Plans and Execution Metrics

If we’ve been lucky we should have the actual execution plan and the actual execution metrics.

4.1 What are we looking for? How do we interpret it?

Providing a thorough guide on how to interpret most variations of execution plans is beyond the scope of this guide, although we will provide a basic guide in Appendix A.

Essentially, what we want to see in the execution metrics is that the optimizer’s estimates are broadly accurate.

How accurate?

In general, we shouldn’t necessarily be overly concerned until the estimates and actuals differ by a factor of 10x or more.

And when we are looking at estimates vs actuals, we need to consider the “Starts” column: what we are looking for is that “Starts * E-rows” is in the right ballpark compared to “A-rows”. For more information, please see Appendix A.

Where the estimates are accurate, there is a very good chance that the plan is a good one.

Here’s another example, this time not so good because the estimate of rows in T1 was 1 whereas the actual was 10000.

This led the optimizer to choose an index access path over a full table scan and a NESTED LOOP rather than a HASH JOIN.
SQL_ID 9quvuvkf8tzwj, child number 0
-------------------------------------
select /*+ cardinality(t1 1) */ * from t1, t2 where t1.col1 = t2.col1

There are four key elements:
• The SQL statement
• The SQL ID – a hash value of the SQL statement, usually consistent between databases and even across versions
• The execution plan
• The predicate section – not to be overlooked. Can highlight issues with implicit functions and datatype conversions amongst other things

For the execution plan itself there are a number of elements to be concerned with:
• Optimizer – all modern versions of Oracle use the Cost-Based Optimizer (CBO). This uses statistics and cost calculations to choose the best-cost plan for execution.

• Cost – Cost is an estimated indicator of time which the optimizer uses to compare execution plan possibilities, usually choosing the lowest cost plan. However, to all intents and purposes, developers should ignore it.

• Cardinality – An estimate of the number of rows for a particular rowsource, for a particular join, etc. Exposed in the execution plan as E-Rows for estimates and A-Rows for actuals. When comparing E-Rows to A-Rows it is important to take Starts into account, i.e. to compare “Starts * E-Rows” to A-Rows. Nested loop operations, for example, will have multiple starts for the inner/probed rowsource.

• Parent:child operations – An execution plan is generally a succession of parent:child operations – follow and match the indentation. A join mechanism should have two children.

• Join mechanism – A join mechanism joins two rowsources. There are a variety of mechanisms but in general there are two main methods depending on the cardinalities:

o NESTED LOOP – Essentially a FOR LOOP – For each row in the outer/driving rowsource, probe the inner/probed rowsource. Generally used for low cardinality rowsources.

o HASH JOIN – Hash the outer/driving rowsource based on the join key(s), then probe it with the inner rowsource. Generally used for high cardinality rowsources. If the cardinality estimate is too low, the work area sizes used for hashing may be too small and spill to temp space on disk – slow and unscalable.

• Join order – Depending on the cardinalities, the optimizer can choose to join T1 to T2 or T2 to T1. The number of permutations for join order is N! where N is the number of tables being joined. The optimizer will limit itself to a maximum number of permutations to evaluate.

• Access path – how the data is fetched from the table, i.e. by index via various different index access mechanisms or by tablescan, etc.

• Row estimates of 1:
o The minimum row estimate is 1 and in some cases this actually means 0.
o If this is not a primary key access and the actual result is not really 0 or 1 rows, are there any statistics for this object?
o Row estimates of 0/1 where the actual number of rows is significantly more than 1 can cause significant performance problems

• MERGE JOIN CARTESIAN + BUFFER SORT – particularly where the estimate is 1. Can be particularly detrimental if the actual rows are greater than 1. Rarely a good operation but can be symptomatic of a missing join.

• Implicit datatype conversions

• Nested loop operations where the inner/probed table/rowsource is a FULL segment scan.

• VIEW operations – symptomatic of a non-mergeable view which may or may not be a problem

• FILTER operations where the row-by-row operation is significant

5.3 Is there anything else to look out for?

Yes: the Buffers column in the plan output is a measure of logical IO.

When comparing different ways of doing things, when tuning SQL, one of the key measures that should be targeted is a reduction in logical IO.

If one approach uses significantly less logical IO compared to another approach then that is significant. The statement with the lower IO is likely to be better, is more likely to benefit from having more of the data it’s interested in cached and is less likely to impact other queries and the caching of other data.

There should probably be a rule of thumb about the ratio of logical IO to rows fetched. The difficulty is picking the right indicators.

If a query selects 100 rows at a cost of 100 million buffer gets, and those all-important estimates are reasonably accurate, this is a strong signal that the indexes are probably not optimal for that particular query.

As a rule of thumb, a ratio of a couple of consistent gets or less per row is damn good. 100,000s or millions may well be an indicator of significant inefficiencies.

But, as always, it depends.

It also significantly depends on whether the query itself is fast enough for the business requirement and whether it has the potential to impact other users of the database.

Furthermore, one lone query is unlikely to justify a new index but that is beyond the scope of this guide.

5.4 Further Reading

A 10053 Trace provides a very detailed walkthrough of the optimizer’s process of coming up with the execution plan. Not for the faint-hearted.

6 Appendix B Interacting with the database
The Oracle database is a very powerful piece of software.

It’s also likely to be one of the most expensive pieces of software in an application’s tech stack.

The keys to a performant Oracle database also differ significantly from other vendors.

How you best approach something in Sybase or SQL Server is not necessarily how you should approach it in Oracle.

One classic example is the use of temporary tables.

Developers should know how to get the best out of a particular database.

To treat it like a bit bucket or a “slow, dumb backup datastore” is to waste money and resources.

6.1 Vendor-specific Database features

Application developers should not be overly wary of using a feature particular to the Oracle database. Some tools can make it difficult to use vendor-specific features or optimizations but an investment in time and effort to do so can reap significant performance benefits.

Whilst such wariness might be appropriate for third-party product developers who have to write software that can be installed on different database vendors, it is largely unnecessary for enterprises writing internal software systems.

It is unlikely that the Oracle database on a particular system will be replaced by another vendor database.

It is far more likely that a Java component interacting with the database will eventually be replaced by a C# component, or that the usage of the Oracle database will be deprecated in favour of caching and NoSQL technologies, so if you’re going to use SQL, use Oracle-specific features where they offer benefit.

6.2 Round-tripping

The default fetchsize for JDBC is 10 (and SQL*Plus’s default ARRAYSIZE is 15). Such defaults are almost never appropriate for general usage, as many SQL statements can be expected to fetch significantly more than 10 rows, and significant gains can be made simply by increasing this setting.

The issue is not only about roundtrips across the network, it’s also related to the logical IO that a query needs to do. If you ask for just 10 rows, the database will do all the IO it needs to do to fetch the first ten rows. When you ask for the next 10 rows, the server process on the database might well have to do a logical read of some of the same blocks as the previous fetch which can lead to significant inefficiencies compared to a larger fetchsize.
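In SQL*Plus, for example, the fetch size is controlled by ARRAYSIZE (JDBC exposes the same control via Statement.setFetchSize):

```sql
-- SQL*Plus: raise the fetch size well above the default
set arraysize 500
```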

6.3 Abstraction & Interfacing with the database

Abstraction is a principle that is put on a pedestal in the middle tier and yet often abandoned when interacting with the database.

Put simply, if SQL is embedded in Java code then this introduces unnecessary dependencies on the database model and limits the ability to make subtle changes to the SQL or to the model without changing the application server code and doing an app server release.

Views, procedures and packages can all provide an interface to the database and the data model.
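A sketch of the view-as-interface idea with hypothetical tables – the application queries the view, so the underlying model can change without an application release:

```sql
-- The application sees a stable interface, not the base tables
CREATE OR REPLACE VIEW customer_balances_v AS
SELECT   c.customer_id
,        c.customer_name
,        SUM(a.balance) AS total_balance
FROM     customers c
,        accounts  a
WHERE    a.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name;
```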

6.4 It’s all about the data.

Interacting with data appropriately, regardless of database vendor, is crucial.

Think in Sets.

Also consider the success of Engineered Systems like Oracle Exadata.

One of the things that Exadata focuses on, for example, is being able to eliminate redundant data as early as possible.

This means that the logic in the storage cells can eliminate the data before it even gets to the traditional database memory, before it goes anywhere near the network, long before it goes up to the application.

And it can do this with significant degrees of parallelism, usually with far more efficiency than similar processing in application threads.

Why is this relevant?

Eliminate early.

Let the database do the work it was designed to do.

Applications should let the database give them the smallest set of data that they need and should not bring excessive amounts of data into the middle tier for elimination and aggregation there.

Volumes of data are exploding. The best chances of scaling efficiently to deal with these volumes of data are to interact with the data appropriately.
