2) You are allowed to create a user-defined function named "sin" which can shadow the builtin function.

3) The second invocation of sin() returns the result of running the builtin function. This is because the second invocation is character-for-character identical to the first, so Derby just uses the previously prepared statement.

4) But the third invocation of sin() returns the result of running the user-defined function. Note that the third invocation has an extra space in it, which causes Derby to compile it from scratch, picking up the user-defined function instead of the builtin one.

Rick Hillegas
added a comment - 13/Aug/12 14:47 Behavior is different if you try to override one of the SQL Standard operators like abs. Consider the following script:
connect 'jdbc:derby:memory:db;create=true';
create function abs( a double ) returns double language java parameter style java no sql external name 'FakeSin.sin';
values abs( 0.5 );
values app.abs( 0.5 );
The first invocation of abs() returns the result of the builtin operator. The second returns the result of the user-defined function.

Marking this issue as "deviation from standard". The SQL Standard addresses the ABS case by making ABS a reserved keyword. That is, if Derby followed the Standard in this matter, then you would not be allowed to declare a user-defined function with the name ABS. The ABS case arises in Derby because Derby treats ABS as a non-reserved keyword. Changing ABS to a reserved keyword would create backward-compatibility problems.

Rick Hillegas
added a comment - 13/Aug/12 15:27 Marking this issue as "deviation from standard". The SQL Standard addresses the ABS case by making ABS a reserved keyword. That is, if Derby followed the Standard in this matter, then you would not be allowed to declare a user-defined function with the name ABS. The ABS case arises in Derby because Derby treats ABS as a non-reserved keyword. Changing ABS to a reserved keyword would create backward-compatibility problems.

1) We could make it illegal to declare a function whose unqualifed name would conflict with the name of a builtin Derby function or operator. Existing functions which violate this rule could be marked as invalid.

2) We could make all of these user-defined functions behave like the ABS case: Without a schema qualifier, the name would resolve to the builtin function or operator. To specify the user-defined function, you would have to qualify it with its schema name.

Here is my analysis of how these solutions would affect existing applications:

(1)

+ Existing code which invokes the user-defined functions would flat-out break, forcing the application writer to fix the problem.

Fixing the problem would be a two part process: i) drop the offending function and recreate it with a non-conflicting name, then
ii) update existing invocations of the function across the application.

(2)

Existing code which invokes the user-defined function might silently start using the builtin function/operator rather than the user-defined function. This could produce wrong results.
+ Fixing the problem would just be a matter of adding a schema qualification to existing invocations.

Rick Hillegas
added a comment - 13/Aug/12 20:14 Here are some suggestions for how to address this issue:
1) We could make it illegal to declare a function whose unqualifed name would conflict with the name of a builtin Derby function or operator. Existing functions which violate this rule could be marked as invalid.
2) We could make all of these user-defined functions behave like the ABS case: Without a schema qualifier, the name would resolve to the builtin function or operator. To specify the user-defined function, you would have to qualify it with its schema name.
Here is my analysis of how these solutions would affect existing applications:
(1)
+ Existing code which invokes the user-defined functions would flat-out break, forcing the application writer to fix the problem.
Fixing the problem would be a two part process: i) drop the offending function and recreate it with a non-conflicting name, then
ii) update existing invocations of the function across the application.
(2)
Existing code which invokes the user-defined function might silently start using the builtin function/operator rather than the user-defined function. This could produce wrong results.
+ Fixing the problem would just be a matter of adding a schema qualification to existing invocations.
Preferences? Other ideas? Thanks.

I think I'd prefer option (1). The compatibility issues sound more manageable with that option (a failure is easier to detect than silently changing the results). Also, option (2) opens for the possibility that you can create a function which is shadowed by a built-in function, and if you don't know that there is a built-in with the same name, you may unknowingly call the wrong method in your SQL queries.

Knut Anders Hatlen
added a comment - 17/Aug/12 16:23 I think I'd prefer option (1). The compatibility issues sound more manageable with that option (a failure is easier to detect than silently changing the results). Also, option (2) opens for the possibility that you can create a function which is shadowed by a built-in function, and if you don't know that there is a built-in with the same name, you may unknowingly call the wrong method in your SQL queries.

I think the most likely trouble for existing applications with option 1 would be if someone implemented a function before it was added as a builtin function in Derby later. For example these added in 10.3:https://issues.apache.org/jira/browse/DERBY-1808

Similarly if an application written against 10.9 creates a function by the same name as a builtin function that is added sometime in the future, it would also break when upgraded to the new derby version which has the new builtin function.

I am not sure what the correct answer is but wonder if the standard speaks to this point wrt non-reserved words like SIN and what other database products do. I am concerned about introducing an incompatibility that is not specific to one release but has the potential to create an incompatibility for every function added at a time when the likely affected applications are quite old and possibly don't have developers ready to investigate and fix such an issue.

Kathey Marsden
added a comment - 09/Sep/12 20:49 I think the most likely trouble for existing applications with option 1 would be if someone implemented a function before it was added as a builtin function in Derby later. For example these added in 10.3:
https://issues.apache.org/jira/browse/DERBY-1808
Similarly if an application written against 10.9 creates a function by the same name as a builtin function that is added sometime in the future, it would also break when upgraded to the new derby version which has the new builtin function.
I am not sure what the correct answer is but wonder if the standard speaks to this point wrt non-reserved words like SIN and what other database products do. I am concerned about introducing an incompatibility that is not specific to one release but has the potential to create an incompatibility for every function added at a time when the likely affected applications are quite old and possibly don't have developers ready to investigate and fix such an issue.

The Standard only addresses the situation where the operator/function name is a reserved word. For instance, DATE is a reserved word under the Standard and so an application should not be allowed to create a function called DATE--Derby allows this and this creates a conflict with the builtin Derby DATE operator.

The Standard is silent on operator/function names which are not reserved words. So, for instance, the Standard has nothing to say about whether an application can create a function called ABS or SIN. Note that these are both the names of JDBC escape functions. We would not have a problem with these operator names if Derby only invoked them via JDBC escape syntax.

But like other RDBMSes, Derby can declare its own operators on top of those defined by the Standard. As we go forward, we should be careful to not enlarge the problem described by this JIRA. That is, if we add new operators, we should make sure that they cannot collide with the names of application-defined functions.

I don't see how to address this issue without introducing a backward incompatibility. Regardless of the approach we take, we could add a scrap of upgrade logic which looks for conflicting application-defined function names. If we find a conflict, we could fail the upgrade with a message advising the user that the conflict needs to be fixed first.

Rick Hillegas
added a comment - 10/Sep/12 17:34 The Standard only addresses the situation where the operator/function name is a reserved word. For instance, DATE is a reserved word under the Standard and so an application should not be allowed to create a function called DATE--Derby allows this and this creates a conflict with the builtin Derby DATE operator.
The Standard is silent on operator/function names which are not reserved words. So, for instance, the Standard has nothing to say about whether an application can create a function called ABS or SIN. Note that these are both the names of JDBC escape functions. We would not have a problem with these operator names if Derby only invoked them via JDBC escape syntax.
But like other RDBMSes, Derby can declare its own operators on top of those defined by the Standard. As we go forward, we should be careful to not enlarge the problem described by this JIRA. That is, if we add new operators, we should make sure that they cannot collide with the names of application-defined functions.
I don't see how to address this issue without introducing a backward incompatibility. Regardless of the approach we take, we could add a scrap of upgrade logic which looks for conflicting application-defined function names. If we find a conflict, we could fail the upgrade with a message advising the user that the conflict needs to be fixed first.
Thanks,
-Rick

The functions that can be shadowed live in the SYSFUN schema and can be invoked with a qualified name, like this:

values sysfun.sin(0.5);

See the discussion about SQL-path in DERBY-6362. I suppose Derby's current behaviour is mostly consistent with what you'd expect if CURRENT_PATH = (CURRENT_SCHEMA, SYSFUN). What does not seem to be as expected, I think, is that

values sin(0.5)

has different behaviour depending on whether it is actually compiled or just fetched from the internal statement cache. The presence of an internal statement cache should be invisible to the applications. The resolution of the unqualified function name should happen when the application prepares the statement and not be influenced by identical-looking, previously compiled statements.

Knut Anders Hatlen
added a comment - 26/Nov/13 12:57 The functions that can be shadowed live in the SYSFUN schema and can be invoked with a qualified name, like this:
values sysfun.sin(0.5);
See the discussion about SQL-path in DERBY-6362 . I suppose Derby's current behaviour is mostly consistent with what you'd expect if CURRENT_PATH = (CURRENT_SCHEMA, SYSFUN). What does not seem to be as expected, I think, is that
values sin(0.5)
has different behaviour depending on whether it is actually compiled or just fetched from the internal statement cache. The presence of an internal statement cache should be invisible to the applications. The resolution of the unqualified function name should happen when the application prepares the statement and not be influenced by identical-looking, previously compiled statements.

I'm not sure. Normalizing statements before storing them in the cache would probably mean that we'd have to run them through parsing and binding, which is what the cache is there to avoid in the first place. Perhaps we could find a way to have CREATE FUNCTION evict affected statements from the cache if a function that shadows SYSFUN is created. Then we'd limit the performance hit to CREATE FUNCTION time and avoid the extra cost on each cache lookup.

One rather broad fix, that I think is possible in the current code without adding any new dependency tracking or parsing, is to make CREATE FUNCTION invalidate all statements in the cache whose original compilation schema is the same as the schema of the shadowing function. It is broader than it needs to be, since we only need to invalidate those statements that have unqualified invocations of the affected function. But if it is limited to only those CREATE FUNCTION calls that actually create a shadowing function, it might be an acceptable price to pay, and better than building something complex in order to improve the performance of the statement cache in some rare edge cases. (Maybe these cases are so rare that it would be acceptable to make it even simpler: clear the entire statement cache each time a shadowing function is created.)

Knut Anders Hatlen
added a comment - 01/Dec/13 17:04 I'm not sure. Normalizing statements before storing them in the cache would probably mean that we'd have to run them through parsing and binding, which is what the cache is there to avoid in the first place. Perhaps we could find a way to have CREATE FUNCTION evict affected statements from the cache if a function that shadows SYSFUN is created. Then we'd limit the performance hit to CREATE FUNCTION time and avoid the extra cost on each cache lookup.
One rather broad fix, that I think is possible in the current code without adding any new dependency tracking or parsing, is to make CREATE FUNCTION invalidate all statements in the cache whose original compilation schema is the same as the schema of the shadowing function. It is broader than it needs to be, since we only need to invalidate those statements that have unqualified invocations of the affected function. But if it is limited to only those CREATE FUNCTION calls that actually create a shadowing function, it might be an acceptable price to pay, and better than building something complex in order to improve the performance of the statement cache in some rare edge cases. (Maybe these cases are so rare that it would be acceptable to make it even simpler: clear the entire statement cache each time a shadowing function is created.)

There's also the question what to do with the ambiguous function names in triggers, views, generation clauses and similar, where we store the original SQL text in system tables. Currently, they will resolve to whatever was the correct function at some ill-defined point in time (such as the last time the trigger SPS was recompiled). They should resolve to whatever was the correct function at definition time.

DERBY-6362 has some code attached to it, which replaces unqualified identifiers with schema-qualified identifiers in CHECK constraints before storing them. Although it's not the primary purpose of DERBY-6362, it has the nice side effect that CHECK constraints will preserve the schema that the function resolved to at the time the constraint was created. I think that code is general enough to be used in triggers, views and generation clauses as well.

Knut Anders Hatlen
added a comment - 02/Dec/13 14:45 There's also the question what to do with the ambiguous function names in triggers, views, generation clauses and similar, where we store the original SQL text in system tables. Currently, they will resolve to whatever was the correct function at some ill-defined point in time (such as the last time the trigger SPS was recompiled). They should resolve to whatever was the correct function at definition time.
DERBY-6362 has some code attached to it, which replaces unqualified identifiers with schema-qualified identifiers in CHECK constraints before storing them. Although it's not the primary purpose of DERBY-6362 , it has the nice side effect that CHECK constraints will preserve the schema that the function resolved to at the time the constraint was created. I think that code is general enough to be used in triggers, views and generation clauses as well.

In the approach used in DERBY-6362, it means that the SQL text of a CHECK constraint is expanded before it is put into a column in SYS.SYSCHECKS. That is, if you execute "create table(x int, check(x < sin(pi())))", it would previously store 'check(x < sin(pi()))' in SYS.SYSCHECKS.CHECKDEFINITION, and now it will store 'check(x < "SYSFUN"."SIN"("SYSFUN"."PI"()))' instead.

We could do the same with the SQL text that we put into SYS.SYSTRIGGERS.TRIGGERDEFINITION, SYS.SYSTRIGGERS.WHENCLAUSETEXT, SYS.SYSVIEWS.VIEWDEFINITION and SYS.SYSCOLUMNS.COLUMNDEFAULT. As an extra bonus, it will also solve DERBY-6370 for us.

Knut Anders Hatlen
added a comment - 04/Dec/13 08:15 In the approach used in DERBY-6362 , it means that the SQL text of a CHECK constraint is expanded before it is put into a column in SYS.SYSCHECKS. That is, if you execute "create table(x int, check(x < sin(pi())))", it would previously store 'check(x < sin(pi()))' in SYS.SYSCHECKS.CHECKDEFINITION, and now it will store 'check(x < "SYSFUN"."SIN"("SYSFUN"."PI"()))' instead.
We could do the same with the SQL text that we put into SYS.SYSTRIGGERS.TRIGGERDEFINITION, SYS.SYSTRIGGERS.WHENCLAUSETEXT, SYS.SYSVIEWS.VIEWDEFINITION and SYS.SYSCOLUMNS.COLUMNDEFAULT. As an extra bonus, it will also solve DERBY-6370 for us.