To parse a statement, then execute it, and possibly fetch all rows, there are additional forms. They may not be suited to large queries, but for small queries they come in handy. One form fetches, the other simply executes:
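A brief sketch of the two forms (the table and column names are made up; both forms are described in detail under "Immediate Execution" and require a DB handle to be set):

```perl
use SQL::Yapp dbh => sub { $dbh };

my @rows = sqlFetch{ SELECT * FROM customer };     # parse, execute, fetch all rows
sqlDo    { DELETE FROM customer WHERE id = 17 };   # parse and execute, no result
```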

DESCRIPTION

The purpose of this module is to provide a means to make SQL injections totally impossible, and to provide an easy, native SQL interface in Perl. These two goals go hand in hand: by embedding a full SQL parser in the Perl compiler, forcing proper quotation is easy.

This package also provides basic compile-time syntax checking of SQL.

Currently, the major goals are security and ease of use, rather than completeness or efficiency. We'll add more SQL syntax over time to make this more and more complete. So for some things, you'll still need the raw DBI interface.

Initialisation

This package needs a way to quote identifiers and values for proper SQL output. One way of providing this is by a DBI handle:

my $dbh;
use SQL::Yapp dbh => sub { $dbh };

In libraries, where you only want to parse SQL, you do not need to specify the DBI handle; you may simply use:

use SQL::Yapp;

This enables the compile-time SQL parser for the given file. You only need to specify the DBI handle if you want to stringify the parsed SQL objects. The SQL parser works without it.

You can set and change the link later, too:

SQL::Yapp::dbh { $dbh };

The current DB handle can be queried using the get_dbh function:

my $dbh = SQL::Yapp::get_dbh();
$dbh->prepare(...);

By setting a DBI handle, the library auto-implements the two required quotation functions, one for values and one for identifiers. Instead of passing the DBI handle reference, you can alternatively implement your own quotation functions:
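For example, a sketch of providing custom quotation functions instead of a DBI handle (the option names quote and quote_identifier mirror the two exported function names; the quoting rules shown are simplistic assumptions, not production-quality):

```perl
use SQL::Yapp
    quote            => sub { my ($s) = @_; $s =~ s/'/''/g; "'$s'" },
    quote_identifier => sub { join('.', map { "`$_`" } grep { defined } @_) };
```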

In addition to auto-setting the quote functions, setting the DB handle enables the special execution forms sqlDo{...} and sqlFetch{...}, which are only supported if the DB handle is set.

A fancy package option is 'marker', which defines the string the package reacts to in your Perl script. The default is 'sql', so sql{...} encloses SQL blocks. You might want to use something different:

use SQL::Yapp marker => 'qqq';

Now, your SQL commands need to be embedded in qqq{...} instead. The prefix is used for other kinds of embedding, too, e.g. qqqExpr{...}.

You cannot change the marker dynamically, but can only set it in the package initialisation, because it is needed at compile time.

You may pass these names in the use clause to import them just like in the initialisation of any other module:

use SQL::Yapp qw(quote_identifier);

You may also mix this with initialisation described above:

use SQL::Yapp qw(quote_identifier), marker => 'qqq';

All the exported functions get/set parameters of the library and their values can be set in the use clause already just like in the above examples. The xlat_* function family is described in Section "Identifier Name Translation".

The dialect option is an abbreviation for using read_dialect and write_dialect with the same value.

The read_dialect defines from which SQL dialect to accept incompatible extensions (compatible extensions are always accepted and normalised).

The write_dialect defines for which dialect to produce output. Note that different quotation methods are automatically handled by the DBI driver, so no change to write_dialect is necessary for that. This option is about less trivial, additional syntax changes. Currently, the following dialects are known:

'mysql'
'postgresql'
'oracle'
'std'
'generic'

'generic' means to try to please everyone while 'std' means to try to please no-one, i.e., to stick to the standard. :-)
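For example, a sketch of the dialect options as described above (accepting MySQL-specific input while producing PostgreSQL-flavoured output):

```perl
use SQL::Yapp
    read_dialect  => 'mysql',
    write_dialect => 'postgresql';

# Or both at once:
# use SQL::Yapp dialect => 'mysql';
```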

The read_dialect option must be given at initialisation, because it must be known at compile time. The write_dialect option may be set any time before SQL expressions are evaluated (and thus stringified into SQL syntax).

For information about what normalisation is done, please refer to Section "Normalisation".

Finally, there's the debug boolean option, which, when set to 1, will dump the compiled code to STDERR. This is for developers.

For programs that do not know in advance how to connect to SQL, it is infeasible to set dbh in the use clause. The SQL parser/preprocessor of the library still works without it, so you can write without problems:

use SQL::Yapp;
sub get_select()
{
return sql{ SELECT * FROM mydb };
}

Without setting the DB handle, the expressions the preprocessor generates cannot be stringified and executed, because the library does not know how to quote properly.

Basic Syntax and Usage

The embedded SQL syntax is based on normal SQL syntax, with interpolations of Perl values made easy. In Perl, an SQL expression is enclosed in an sql{...} block, like so:

my $query= sql{SELECT foo FROM bar};

The result is a list of blessed references, enclosed in a do{...} block.

The above $query automagically stringifies to SQL syntax when embedded in a string, e.g.:

"$query"

will return a string suitable for DBI interface. So you can use this with DBI as follows:

my $dbq= $dbh->prepare(sql{SELECT foo FROM bar});

Note again: the result of sql{...} is a list. So if you have multiple statements in your sql{...} block, you get multiple results. This way, the structure embeds nicely into Perl using Perl native concepts:

my @query= sql{SELECT foo FROM bar ; SELECT foz FROM baz};

In this example, @query has 2 elements, each an SQL select statement object. It is effectively the same as:

my @query= (sql{SELECT foo FROM bar}, sql{SELECT foz FROM baz});

The SQL::Yapp syntax is a bit different from standard SQL syntax. Most importantly, table and column names must be lower case (unless escaped with `...`), keywords must be upper case, and all literal values and comments use Perl syntax instead of SQL syntax.

As already mentioned, an sql{...} block expands to a do{...} block. This is important in some places and was mainly implemented this way because of the similar look of sql{...} and do{...}. It has some consequences:

Firstly, you cannot directly index the result, but must put parentheses around the block to do that:

my $second= (sqlExpr{ 1, 2, 3})[1];

Secondly, you can use sql{...} in places where you might otherwise get a surprising effect if we had used parentheses for enclosing, e.g.:

my @q= map sql{ SELECT .$_ FROM tab }, @col;

With parens, map (...), @col would produce a syntax error, but map do{...}, @col is fine.

Tokens

UPPER_CASE

SQL keywords and function names:

SELECT, FROM, MAX, SUM, ...

lower_case

Names: tables, columns, variables, you name it:

client_address, surname, ...

CamelCase

Type names that may precede Perl interpolations to indicate the intended item they store. Usually, such a type is inferred from context, but sometimes they are needed:

SELECT Column $a FROM ...

Other Identifier

Any other identifier with mixed case (or no letters at all) will raise a syntax error.

backquoted string: `...`

Quoted name, no escape characters allowed, may not contain newlines. Needed for names (tables, columns, etc.) that are not all lowercase.

Numbers

Numbers basically use Perl syntax: 99, 0xff, 077, 0b11, 0.9e-9

Strings

Again, these use Perl syntax. Singly and doubly quoted strings are supported. Backquoted strings are not directly supported, because they are used for identifiers already and are needed by SQL directly. You can use {`...`} instead, however.

'blah'

Singly quoted string.

"blah"

Doubly quoted string.

Symbols

These SQL syntax elements act like keywords, but are symbolic:

(, ), {, }, ...

Comments

The syntax is the same as in Perl:

# this is a comment

Deliberate Restrictions

Identifiers containing $ or # characters must be quoted with `...`. This is because $ and # interfere with Perl syntax.

In an ExprAs object, 'AS' is mandatory, just like in good SQL programming practice:

Bad here (but works in plain SQL)

SELECT a b FROM c;

Good

SELECT a AS b FROM c;

Table and column names may not contain newline characters.

Differences

As mentioned above, strings, numbers and comments follow Perl syntax. This change was made for more elegant embedding of SQL into Perl. It also helps syntax highlighting. E.g., you can naturally use string interpolations:
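For example, a sketch of Perl string interpolation inside an embedded SQL string ($name and the table are made up):

```perl
my $name = 'John';
my $q = sql{ SELECT * FROM customer WHERE (first_name = "Mr. $name") };
```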

Extensions

LIMIT clauses are parsed in both MySQL and PostgreSQL format and always generated in PostgreSQL format, i.e.:

sql{SELECT ... LIMIT 5, 2}

will stringify as:

SELECT ... LIMIT 2 OFFSET 5

Missing Error Checking

In some places, this package does not fully check your SQL code at compile time, usually for two reasons: (a) to keep the code of the preprocessor simpler, (b) to keep the number of possible syntax structures and object types low for the user.

Column/Row Functions In Expressions

Expressions allow more or fewer types depending on where they are used. The SQL grammar distinguishes them accordingly, but we do not. E.g., COUNT(*) cannot be used inside a WHERE clause in SQL, but we do not check that and leave it to your data base server.

I am sure there's more that could be documented here.

Immediate Execution

SQL commands can be immediately prepared and executed, and possibly fetched. This is a similar simplification as with DBI's selectall_arrayref() etc. functionality. This module introduces two blocks for that: sqlDo{} and sqlFetch{}.

With sqlDo{}, any statements can be just executed, without any return value:

my %newentry = (...);
sqlDo{
INSERT INTO table SET %newentry
};

With sqlFetch, all rows can be immediately retrieved. There are two possible conversions for each row: hash or scalar. A hash conversion is selected by default, making this like fetching all rows using fetchrow_hashref() when using DBI. E.g., to read a whole table, you could use:

my @table = sqlFetch{ SELECT * FROM table };

No prepare() or execute() is necessary, so this makes for very concise code. If only one column is selected, then each row is returned as a scalar, instead of as a hashref:

my @id = sqlFetch{ SELECT id FROM table };

The distinction is made automatically. There is currently no way to force one or another row conversion. The scalar conversion is selected only if the query clearly returns only one column, which needs to be visible without looking deeply into any embedded Perl interpolations.

Scalar conversion is used if a single column is explicitly given in SQL:
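For example (a sketch with a hypothetical table):

```perl
my @a = sqlFetch{ SELECT a FROM t };   # one explicit column: each row is a scalar
```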

Hashref conversion is used if a non-scalar Perl interpolation is found:

SELECT .@a FROM

Hashref conversion is used if a complex and/or intransparent (to this module) Perl interpolation is found, regardless of the actual number of columns selected:

SELECT {$a} FROM

In this case, a human sees only one column, but {} is intransparent to this SQL module, so it assumes a non-trivial case.

The distinction of whether hash or scalar conversion is used is purely syntactical and statically done at compile time, regardless of the actual columns returned by using complex embedded Perl code.

If the returned list is evaluated in scalar context, then, as usual, the module assumes that exactly one result is wanted. A multi-row result will cause an error. Allowing scalar context is especially handy when retrieving a single value from the data base:

my $count = sqlFetch{ SELECT COUNT(*) FROM table };

Due to COUNT(*) being the only column specification, scalar row conversion is selected. And since $count is scalar, sqlFetch is evaluated in scalar context, and returns the single row. Together, the behaviour is what is probably expected here.

Perl Interpolation

The basic construct for embedding Perl code in SQL is with braced code blocks:

sql{
SELECT foo FROM bar WHERE
{ get_where_clause() }
}

Interpolation of Perl is triggered by $, @, %, "..." and {...} in embedded SQL. The syntax of such expressions is just like in Perl. All but {...} behave just like Perl; {...} is not an anonymous hash, but equivalent to a do{...} block in Perl. Inside "..." strings, you can also use Perl interpolation.

When parsing SQL expressions (i.e., values), it is unclear whether a string or a column is used. In that case, a string is used. If you mean to interpolate a column name, use a single dot in front of your interpolation (this single dot is special syntax, and the final SQL string will not contain that dot, but be proper SQL syntax):
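For example, a sketch of the difference (table and column names are made up):

```perl
my $name = 'age';
my $q1 = sql{ SELECT a FROM t WHERE b =  $name };   # compares b to the string 'age'
my $q2 = sql{ SELECT a FROM t WHERE b = .$name };   # compares b to the column `age`
```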

It is impossible to interpolate raw SQL strings with this module, since everything is parsed and thus syntax-checked. That's the whole point: we want guarantees that SQL injections are impossible, so we won't jeopardise this by letting arbitrary raw strings be injected.

However, this module allows fully recursive embedding, i.e., it allows the use of sql{ ... } within the embedded Perl code. Like so:
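A sketch of such recursive embedding ($use_name and the names are hypothetical):

```perl
my $q = sql{
    SELECT foo FROM bar WHERE
    { $use_name ? sql{ name = 'john' } : sql{ id = 17 } }
};
```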

All sql{...} blocks inside the embedded Perl code will not parse a statement list, but an expression, because the {...} is inside the 'WHERE' clause. This means that sql{...} is context-dependent.

At top level, sql{...} is equivalent to sqlStmt{...}. In the example above, it is equivalent to sqlExpr{...}, because it is inside a WHERE clause, where expressions are expected. You can construct SQL expressions directly, too, by changing the default:
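For example, a sketch of constructing an expression directly and interpolating it later (names are made up):

```perl
my $expr = sqlExpr{ (age >= 18) AND (country = 'NL') };
my $q    = sql{ SELECT name FROM customer WHERE {$expr} };
```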

Note: Type checking of interpolations will be done at run-time. So the following only fails at run-time, not compile time:
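A hedged reconstruction of the kind of example meant here ($is_large and the names are hypothetical):

```perl
my $x = $is_large ? sqlStmt{ SELECT a FROM t } : sqlExpr{ a };
my $q = sql{ SELECT foo FROM bar WHERE {$x} };   # fails at run-time if $is_large
```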

In the above example, this module has no way of knowing: (a) that the ?: operator yields inconsistent kinds of SQL things, (b) that the embedded Perl expression may return sqlStmt. So the case that sqlStmt is returned only fails at run-time.

Actually, this means that you have the same dynamic type checking that Perl has. The above only fails when $is_large is true. And the following is, maybe surprisingly, correct (both for true and false values of $is_large):

Depending on context, embedded Perl is evaluated in scalar or in list context. Inside SQL lists, the embedded block will be evaluated in list context:

my $q= sql{
SELECT { code1 }
};

code1 will be evaluated in list context, and each result will be one value of the SELECT statement.

Furthermore, the arguments of some binary operators, namely +, *, AND, OR, XOR, and ||, as well as the arguments of any function, are evaluated in list context, each element of the list becoming one operand:

my @a= (1,2,3);
my $q= sql{
SELECT 0 + @a
}

This selects 0+1+2+3.

Finally, in the above positions, many unary operators, namely -, NOT, and any operator starting with IS ..., 'pass through' list context and are evaluated point-wise. E.g.:

my $q=sql{
SELECT 0 AND NOT(@a)
};

This will become 0 AND (NOT 1) AND (NOT 2) AND (NOT 3).

In all other situations, values are evaluated in scalar context. Here's an example of scalar context:

my $q= sql{
SELECT name AS { code2 }
};

Here, code2 will be evaluated in scalar context, because only one single identifier can be used in the AS clause.

Some list interpolations allow syntactic hashes and then do something special with them. This means that hashes usually behave differently in list context depending on whether you write them as %a or { %a }. The former may have special meaning to embedded SQL (see below), while the latter has Perl meaning, listing the hash as a list, interleaving keys and values.

Note that in contrast to Perl, syntactic arrays are not allowed in scalar context. E.g. the following code is wrong:
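A hedged sketch of the kind of error meant here (AS expects exactly one identifier, so a syntactic array is rejected):

```perl
my @a = ('x', 'y');
my $q = sql{ SELECT name AS @a };   # WRONG: syntactic array in scalar context
```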

Be advised to use the typecast Join before Join interpolations, because there is no single keyword to start the block of JOIN clauses in an SQL statement, so you might run into ambiguities. Using Join makes the situation unambiguous to the parser.

If you want to multiply nothing else but the values in @a, simply use an empty list to construct the syntactic context needed for the operator:

{} * @a

This expands to:

1 * 2 * 3

This interpolation is especially handy for constructing WHERE clauses with the AND operator, e.g.:

WHERE {} AND %cond

With %cond=( a => 1, b => 2, c => 3 ), this expands to:

WHERE a = 1 AND b = 2 AND c = 3

All of these functions also work with zero parameters, i.e., {}*{} and {}AND{} will expand to 1, while {}+{} and {}OR{} will expand to 0.

For AND and OR, this module provides convenience prefix versions, because they are relatively frequent in WHERE clauses. These prefix versions expand to normal infix notation, so the following produce equivalent SQL code:
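For example, these two forms produce equivalent SQL code (the prefix form expands to the normal infix notation):

```perl
sql{ SELECT * FROM t WHERE {} AND %cond };
sql{ SELECT * FROM t WHERE    AND %cond };
```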

Such prefixing abbreviations are not supported for the symbolic operators + and -, because they are predefined prefix operators and had a different meaning in previous versions of this module. For *, we do not provide such a special form either, for symmetry with +.

As usual, unary operators that support interpolation can be combined with this:

WHERE AND NOT { 1, 2, 3 }

expands to

WHERE (NOT 1) AND (NOT 2) AND (NOT 3)

Also note that a unary operator followed by ( parses as a function call. This means that in list context, a unary prefix operator can actually be invoked with several arguments, which are applied point-wise, just like in Perl interpolation. So what may intuitively look very similar is not so internally, but still does the same thing:

WHERE AND NOT (1,2,3) # NOT has multiple arguments, which are
# applied point-wise, because NOT
# is in list context (from the AND)

and

WHERE AND NOT {1,2,3} # NOT has a single argument, a Perl
# interpolation that returns multiple
# values.

Both do the same:

WHERE (NOT 1) AND (NOT 2) AND (NOT 3)

The following is an error, because NOT only takes one argument:

WHERE NOT (1,2,3) # NOT is in scalar context and thus cannot
# take multiple arguments.

Hash Interpolation in Expressions

As already indicated, if hash interpolation is used in expression list context, such hashes are turned into lists of equations. The hash keys will be quoted with quote_identifier(); the hash values may be any of the things described above. Each key-value pair will expand to an expression:

`key` = value

This kind of interpolation is especially handy together with operators like AND and + which allow list context expansion, as described in the previous section. For example:
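For example (a sketch with a made-up table):

```perl
my %cond = ( a => 1, b => 2 );
my $q = sql{ SELECT * FROM t WHERE AND %cond };
# expands to: WHERE `a` = 1 AND `b` = 2   (hash order may vary)
```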

Because + is also a list-context infix operator, and because its purpose as prefix operator is very limited, it was felt that it is too confusing to let it operate point-wise. So the following is an error:
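A hedged reconstruction of the error meant here:

```perl
my @a = (1, 2, 3);
my $q = sql{ SELECT + @a };   # ERROR: prefix + is not applied point-wise
```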

It makes no sense to apply AS name to both elements of @col, so it is not allowed. Without AS, the clause does support array interpolation, of course:

my $q=sql{
SELECT .@col # <--- OK, will become: SELECT `x`, `y`
};

Type Interpolation

Types can be stored in Perl variables:

my $t1= sqlType{ VARCHAR(50) };

Types can be easily extended:

my $t2= sqlType{ $t1 CHARACTER SET utf8 };

This is equivalent to:

my $t2= sqlType{ VARCHAR(50) CHARACTER SET utf8 };

You can also remove specifiers, i.e., do the opposite of extending them, with a syntax special to this module starting with DROP:

my $t1b= sqlType{ $t2 DROP CHARACTER SET };

This makes $t1b the same type as $t1.

To allow modification of types already constructed and stored as a Perl object, type attributes or base types can be changed by simply listing them after the base type or interpolation. Any new value overrides the old value, e.g. to change the size:

my $t3= sqlType{ $t1 (100) };

This is equivalent to:

my $t3= sqlType{ VARCHAR(100) };

You can even change the base type, keeping all other attributes if they are sensible. Any attributes not appropriate for the new base type will be removed:

my $t4= sqlType{ $t2 DECIMAL };

This is equivalent to:

my $t4= sqlType{ DECIMAL(50) };

The character set attribute has silently been removed. If you change the base type again, it will not reappear magically.

my $t5= sqlType{ $t4 CHAR };

This is equivalent to:

my $t5= sqlType{ CHAR(50) };

Note how the character set was removed.

In list context, modifications made to an array Perl interpolation will affect all the elements:
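A sketch of what this might look like (the exact behaviour on mixed base types is an assumption here):

```perl
my @types  = ( sqlType{ VARCHAR(50) }, sqlType{ CHAR(10) } );
my @padded = sqlType{ @types (100) };   # size 100 applied to every element
```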

There are two types of Column interpolations: one-element vs. multi-element.

For one-element Columns, embedded Perl code may return sqlColumn{...} objects or strings:

my @col= ('name', sqlColumn{age});
my $q= sql{
SELECT .@col
};

The prefixed . above is syntactic sugar. More generally, you can explicitly request expansion of @col as Column objects:

my $q= sql{
SELECT Column @col
};

For multi-element Columns, only strings or sqlExpr{*} (in Perl, there is the constant SQL::Yapp::ASTERISK for this) are allowed in an interpolation, because a column specification cannot be qualified further:
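For example, a sketch of a multi-element Column interpolation qualified by a table name (names are made up):

```perl
my @col = ( 'name', SQL::Yapp::ASTERISK );
my $q   = sql{ SELECT tab.@col };   # roughly: `tab`.`name`, `tab`.*
```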

GROUP BY / ORDER BY Interpolation

In any place where a list of order clauses can be listed inside a GROUP BY or ORDER BY clause, the whole clause is dropped if the list is empty:

my @a= ();
my $q= sql{
SELECT foo FROM bar GROUP BY @a;
};

This will expand to something like:

SELECT foo FROM bar

In order position, Perl interpolation of strings, except "..." interpolation, generates column names instead of plain strings. To force interpretation as a string, use "..." interpolation. This is different from Expr, which defaults to string interpretation and needs a single dot to force column name interpretation. Compare the following examples:
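A sketch of the contrast (table and column names are made up):

```perl
my $col = 'age';
sql{ SELECT name FROM t ORDER BY  $col  };   # orders by the column `age`
sql{ SELECT name FROM t ORDER BY "$col" };   # orders by the constant string 'age'
```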

Interpolation In ASC/DESC Clause

If an ASC/DESC keyword follows a list interpolation, it is used for each of the elements of the list. For example:

my @col= ('x', 'y');
my $q=sql{
SELECT ... ORDER BY @col DESC
};

This is valid (if you fill in valid code for ...) and similar to:

my $q=sql{
SELECT ... ORDER BY x DESC, y DESC
};

This even works if the elements are themselves Order objects that carried an ASC or DESC modifier: the direction will either be kept (in case of an additional ASC) or swapped (in case of an additional DESC):
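A sketch of this (assuming an sqlOrder{...} block analogous to the other sql...{...} forms, matching the 'Order' typecast in the grammar below):

```perl
my @ord = ( sqlOrder{ x ASC }, sqlOrder{ y DESC } );
my $q   = sql{ SELECT a FROM t ORDER BY @ord DESC };
# roughly: ORDER BY `x` DESC, `y` ASC
```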

LIMIT Interpolation

Perl code in LIMIT clauses may return a number or undef. Specifying an offset but no count limit is not directly supported, so we will generate a very large count limit in that case, hoping that the data base server can handle that. Example:
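For example (a sketch; the table is made up):

```perl
my ($offset, $cnt) = (10, undef);
my $q = sql{ SELECT a FROM t LIMIT $offset, $cnt };
# with an undefined count, a very large count limit is generated
```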

An xlat_table function that prepends foo_, as in the use statement shown further below, would prefix all table names with foo_. Every table name, mind you. Because the library knows about the whole SQL structure and parses everything, the translation works throughout: for literal table names as well as all Perl interpolations. For example:

my $q= sql{
SELECT name FROM customer
};

This would be expanded (depending on how quote_identifier() quotes) similar to:

SELECT `name` FROM `foo_customer`;

The same is achieved with the following:

my $table= 'customer';
my $q= sql{
SELECT name FROM $table
};

Note that the package cannot distinguish aliases and real table names, so the following is modified more than you might expect (which usually does not hurt, but you should know):

my $q= sql{
SELECT c.name FROM customer AS c
};

This results in:

SELECT `foo_c`.`name` FROM `foo_customer` AS `foo_c`

You can specify such modifications in the use statement already:

use SQL::Yapp xlat_table => sub { 'foo_'.$_[0] };

Simple prefixing can be achieved by convenience options for columns, tables, schemas, and catalogs, so you don't need to use xlat_ options, but can write more readably:
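A sketch of what this might look like (the option names table_prefix and column_prefix are assumptions based on the text, not confirmed by it):

```perl
use SQL::Yapp
    table_prefix  => 'foo_',
    column_prefix => 'bar_';
```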

It is important to set the check_identifier function as early as the use statement, because it is invoked at compile time. Setting it afterwards is possible, but only allows run-time checks (which must be explicitly enabled, see below). You can set the function later by invoking:

SQL::Yapp::check_identifier { ... };

For columns, the function will be invoked with five parameters, namely:

$check_identifier->('Column', $catalog, $schema, $table, $column)

The $catalog and $schema will be undef if unspecified. For unqualified columns (i.e., without explicit table name), the $table parameter will be either undef, if no possible table is known, or $table will be an array reference with all tables that might contain the column.

If the column is *, this function will not be invoked.

For identifiers other than columns, the functions will be invoked with only four parameters, the first being the kind of identifier (in the same syntax as the name after the sql...{...}, e.g., Table, Index, Constraint, CharSet, etc.) followed by the schema-qualified identifier, again using undef for unqualified parts:

$check_identifier->($kind, $catalog, $schema, $identifier);

For example, for a table:

$check_identifier->('Table', $catalog, $schema, $table_name);

By default, only compile-time checks are performed. You can request run-time checks, too, so that all identifiers are checked, including those interpolated from Perl code, which is not seen at compile-time, of course. Run-time checks are enabled by setting the runtime_check flag to one, either early:

use SQL::Yapp
runtime_check => 1,
...;

or later:

SQL::Yapp::runtime_check(1);

Both involved functions can also be imported:

use SQL::Yapp
...,
qw(check_identifier runtime_check ...);

Note that the package does not (yet) understand the SQL syntax well enough to infer possible tables for columns, so we currently never pass an array ref. But be prepared for it, to avoid surprises with a later version of the library.

Normalisation

We currently don't do very much to normalise the SQL syntax so that queries work for multiple data bases no matter how you write them. What we do is listed in this section.

MySQL allows you to specify DELETE statements with a different syntax, listing some tables before FROM and some after it. This syntax is rejected. You are forced to write this with a USING clause. This is normalisation by forcing good upon the user.

MySQL has an extension to the INSERT statement that allows the use of SET instead of VALUES. I personally find this much more natural and easier to read and maintain than the normal syntax, where column names are separated from their respective values.

For this reason, this syntax is allowed although there is a portable alternative. If you pass a single hash table to the SET clause, it will be normalised to the standard form, e.g.:

my %a= ( a => 5, b => 6 );
my $q= sql{
INSERT INTO t SET %a
};

will be normalised to (the column order may be different, depending on Perl's mood of enumerating the hash table):

INSERT INTO `t` (`a`,`b`) VALUES (5,6);

You can do all kinds of fancy things and the transformation will still work:
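A sketch of such a fancier case (whether a hash may be mixed with explicit assignments like this is an assumption here):

```perl
my %a = ( a => 5, b => 6 );
my $q = sql{ INSERT INTO t SET %a, c = 7 };
# normalised roughly to: INSERT INTO `t` (`a`,`b`,`c`) VALUES (5,6,7)
```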

In short, you can freely use INSERT ... SET even for data base servers that only support INSERT ... VALUES.

Operator/Function Normalisation

POW() and ** will be normalised to POWER.

|| will be normalised to CONCAT if write_dialect == 'mysql'. And vice versa: CONCAT will be translated to || in any dialect but mysql.

Note: There is no way concatenation can be normalised so that it works automatically in all common SQL dialects. One way to make mysql more conformant is to switch the server to ANSI mode.

MySQL has an extension that parses &, |, and ^ as bit operations, like in C. Oracle, on the other hand, has BITAND, but with slightly different semantics. To ease porting, the C-style operators are converted to BITAND, BITOR, BITXOR for Oracle, and vice versa for MySQL. You need a bit more work (function definitions) for this to work in Oracle, however, but it is a start.

Manual Parsing

Sometimes you may want to parse SQL structures at run time, not at compile time. There is a function parse to do this. Its invocation is straightforward:

my $perl= SQL::Yapp::parse('ColumnSpec', 'VARCHAR(50) NOT NULL');

The result of this function is a string with Perl code (this is what the compiler needs, so this is what you get here). To create an object, you need to evaluate this:

my $obj= eval($perl);

This $obj behaves exactly like a structure created at compile time, e.g. the following creates the same object:

my $obj2= sqlColumnSpec{VARCHAR(50) NOT NULL};

Neither parse() nor eval() needs the DBI link (the dbh module option). You only need it when you stringify the object.

SYNTAX

In the following sections, the supported syntax is listed in detail. A BNF variant is used to represent the syntax, and most people will probably find it intuitive without further explanation. Still, here are some explanations.

The ::= operator is left out. Instead, the preceding headline names the non-terminal currently being defined.

CamelCase identifiers (and maybe a little more) in angle brackets are non-terminals and refer to other rules:

<Join>
<SELECT Stmt>

Plain English text in angle brackets is a terminal that is informally explained by that text:

<a number in Perl syntax>

Optional parts:

[ ... ]

Optional parts that, depending on other syntax elements or other constraints, may even be forbidden, and need further clarification to fully describe the syntax:

[ ... ]?

Alternatives:

A | B | C

Grouping, for example together with a list of alternatives:

A ( B | C )

Literal parenthesis also form groups:

'(' ... ')'

Sequences that contain one item A or more:

A ...

Sequences with comma separators that contain one item A or more:

A , ...

Note that this grammar allows redundant commas and other separators in all lists, except after the last element. Lists delimited by parentheses even allow a redundant comma after the last element, before the closing parenthesis. Such lists are rectified and printed in proper SQL syntax. This is such a common typo, especially in CREATE TABLE statements, that it was felt it should be tolerated.

<UPDATE Stmt>

This is a blend of MySQL and PostgreSQL syntaxes in order to support both. MySQL allows multiple tables to be updated in one statement and does not support the FROM clause, while PostgreSQL allows only one table but supports FROM. No normalisation is provided (any attempt would be messy), so you must use the appropriate syntax for your DB.

ONLY is PostgreSQL-only, while IGNORE, LIMIT, and ORDER BY are MySQL extensions.

MODIFY and CHANGE are MySQL extensions; MySQL does not know some of the ALTER COLUMN constructs that PostgreSQL uses. It seems almost impossible to specify something useful that is understood by both DB systems.

Other MySQL extensions: <ColumnPos>, multi-column syntax.

Other PostgreSQL extensions: TYPE and a few other things this module does not yet support.

When producing SQL, this package always qualifies a JOIN with exactly one of CROSS, UNION, INNER, LEFT, RIGHT, and FULL, with the exception that instead of NATURAL INNER JOIN, the specification NATURAL JOIN is printed for better compatibility (e.g. with MySQL).

Note that in MySQL, INNER JOIN and CROSS JOIN are not distinguished, but in standard SQL, they are. This module forces you to write more portable SQL: use INNER JOIN if there is an ON clause, and CROSS JOIN if not.

<Order>

( <Column>
| <Expr>
| [ 'Order' ] <Perl> )
[ ASC | DESC ]

Note that a string returned from a Perl interpolation is parsed as a column name in <Order> position, but a plain string is parsed as a plain string.

<TableName>

<Table>

Table specifications are constructed from at most three components, the last of which is the table name; the one before it is the schema, and the one before that is the catalog. All but the table name are optional.

Text::Balanced::extract_variable() is used for extraction of the sigil tokens, Text::Balanced::extract_delimited() is used for extracting the strings, and Text::Balanced::extract_codeblock() is used for extracting Perl code enclosed in braces.

The package distinguishes the different interpolation forms by context, e.g. it might handle hashes differently from arrays, or produce an error message for a literal that is inappropriate in one place but fine in another. For this reason, Perl casts are handled as well:

${ ...PerlCode... }
@{ ...PerlCode... }
%{ ...PerlCode... }

Decimal numbers are not modified but parsed as strings so that arbitrarily large numbers are supported as literals.

As mentioned already, literal constant values have Perl syntax, and Perl interpolations are allowed at any place.

SQL has a few special literals that are always recognised, even where they may be semantically or even syntactically misplaced. Your SQL server will tell you; this package does not check this.

In order to support all the functions and operators of any SQL dialect that might be used, expression syntax in general does not follow Perl, but SQL syntax. Otherwise, it would be necessary to translate Perl to the SQL dialect in use, but this package is not mainly meant to normalise SQL, but to embed whatever dialect you are using into Perl, making injections impossible, and thus making SQL usage safe. See "Normalisation".

So this package tries to parse SQL expressions with as little knowledge as possible. This means sacrificing early error detection, of course: many syntax errors in expressions will only be found by the SQL server. We parse only as much as is needed to ensure easy and safe Perl code interpolation.

However, this package assigns no precedence to any of the operators, meaning you have to use parentheses. This was done for two reasons: (1) to find bugs, (2) to handle = uniformly in UPDATE...SET and SELECT statements: in the former, = has very low precedence and is an assignment operator, while in the latter = has medium precedence and is an equality operator. We would like to handle the two uniformly.

This is especially interesting for handling hash interpolation uniformly in these two cases. It was felt that the exact precedence order of SQL is a mystery to many Perl programmers anyway (as is the precedence of the operators in Perl itself :-P), so using parens wasn't felt too high a price to pay. (There's a hack to enable some precedence parsing for the most common operators, but that's kept a secret until enough people complain.)
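A hedged sketch of that uniform handling, assuming the documented hash interpolation and hypothetical table and column names; sqlDo and sqlFetch require a DB handle to be set:

    my %new = (name => 'Ada', age => 36);

    # In UPDATE, the hash expands to assignments: SET name = 'Ada', age = 36
    sqlDo{ UPDATE person SET %new WHERE id = 1 };

    # In SELECT, the same hash expands to equations joined by AND:
    my @rows = sqlFetch{ SELECT * FROM person WHERE %new };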

Known associative and commutative operators may be used in sequence without parentheses.

To make life easier for Perl programmers, the == and != operators are recognised as aliases for = and <>, respectively. There are no aliases for && and ||, because || has a special meaning in standard SQL, namely string concatenation.

Any unrecognised keywords and symbols have a default behaviour when parsing embedded SQL: they are treated as functors, i.e., as function invocations.
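For example (assuming IFNULL is not in the parser's keyword list, which it may well not be for every dialect), an unknown identifier followed by a parenthesised argument list simply passes through as a function call:

    # IFNULL is unknown to the parser, so it is emitted verbatim
    # as a generic function call:
    sql{ SELECT IFNULL(nickname, name) FROM person };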

Missing: PostgreSQL ROW(<Expr> , ...) or simply (<Expr> , ...) values. The former works, because ROW is an unknown keyword and is thus treated like a function call, which actually produces the right result here, although it is really a constructor term.

Missing: <Column>.<Field> This is tricky because the package doesn't really understand identifier chains: it treats the last component as a column name, the second-to-last as a table name, etc. Specifying a field name disturbs this counting, so the result will be wrong. (Often you won't notice, but it is wrong regardless.)

<Prefix>

+
| -
| NOT
| ANY | SOME | ALL
| AND | OR

ANY, SOME, and ALL must follow a comparison operator and must precede a subquery.

AND, OR as prefix operators are extensions for easy Perl interpolation.
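A hedged sketch of both prefix forms, assuming sqlExpr is available for parsing single expressions and using hypothetical table names:

    my @cond = (sqlExpr{ a = 1 }, sqlExpr{ b = 2 });

    # Prefix AND joins the interpolated list into one conjunction:
    sql{ SELECT * FROM t WHERE AND @cond };

    # ANY/SOME/ALL follow a comparison operator and precede a subquery:
    sql{ SELECT * FROM t WHERE x >= ALL (SELECT y FROM u) };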

So in contrast to SQL, attributes and even base types can be mixed and given in any order after an initial base type or Perl interpolation. The order will be normalised when printing SQL, of course. The reason why this change was made is that it allows the modification of types stored in Perl. See "Type Interpolation".

Not all combinations of type attributes are accepted. What's accepted depends on the read-dialect used. Check your SQL manual for details.

<Transcoding>

IMPLEMENTATION

The lexer uses the Text::Balanced package for extracting embedded Perl.

The parser for SQL statements is a hand-written recursive descent parser. The lexer shifts pos() along as it scans the text, so the interface is well-suited for Text::Balanced. The author particularly likes pattern matching with m/\G.../gc.
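The m/\G.../gc technique mentioned above can be sketched in plain Perl, independent of this module: pos() advances through the string as each /gc match consumes one token, so a lexer is just a loop of anchored matches.

```perl
# Toy lexer illustrating \G-anchored scanning with /gc:
# each match resumes at pos() and consumes one token.
sub toy_lex {
    my ($text) = @_;
    my @tok;
    while ($text =~ m/\G\s*([A-Za-z_]\w*|\d+|<>|<=|>=|.)/gc) {
        push @tok, $1;
    }
    return @tok;
}

my @t = toy_lex("SELECT a <= 10");
# @t is ('SELECT', 'a', '<=', '10')
```

A real lexer would dispatch to Text::Balanced at the same pos() for embedded Perl, which is why the interfaces fit together well.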

SEE ALSO

There is a similar, smaller module that also uses source filtering: SQL::Interpolate::Filter. Like this module, it is activated by quote-like syntax, and it also uses the 'sql' prefix. (This is a coincidence, but then, what other prefix would you naturally use?) It is somewhat different from our approach: it replaces variable names with bind placeholders and does not really parse the complete SQL syntax, so it only handles values, not, say, column names, join clauses, etc. It also only parses complete statements -- the parser cannot be asked to parse a single expression.

Another module that uses source filtering is SQL::PreProc. It uses a different approach and allows using SQL statements directly in source code, without special markup around it. This leads to a different programming paradigm and looks very different from plain DBI usage.

Another source filter is Filter::SQL, which, like SQL::PreProc, embeds statements directly into Perl without special markup.

Different approaches for making SQL usage safe are found in SQL::Abstract and SQL::DB, which provide SQL queries with a Perl-style interface, so the queries don't look like SQL anymore, but are also safe to use without possibility of SQL injections.

BUGS

Source filters are usually fragile, meaning that you can write Perl code that breaks the filter. To do it properly, it would be necessary to plug into the Perl parser itself. For example, currently, ${sql{a}} (for $sql{a}) will trigger filtering. (And even $sql{a} needed a hack to make it work.)

Moreover, unfortunately, it was infeasible to use Filter::Simple in code_no_comments mode, because that filter is way too slow. This means that sql{...} is also recognised inside comments and strings.

The supported syntax is currently mainly based on MySQL, and while I also looked at PostgreSQL and the SQL-2003 specs sometimes, I am pretty sure that a lot of useful stuff is missing for many DBs. Please don't hesitate to tell me what you're using so I can add it.

Finally, the resulting Perl code could be optimised more. This is on my TODO list.

There are some pretty bad problems when sql{...} is used inside comments. This may lead to syntax errors, because line breaks might be introduced. A more context-aware parser I experimented with was too slow to be used in practice. This needs more work. Sorry!

Missing Syntax

UNION is missing.

WITH ... SELECT is not yet implemented.

The MySQL REPLACE command is currently not supported; it is an extension. You can use DELETE+INSERT instead, which is more portable anyway. The REPLACE command will nevertheless be added later.

Several other SQL commands are also missing.

AUTHOR

Henrik Theiling <cpan@theiling.de>

COPYRIGHT AND LICENSE

Copyright by Henrik Theiling <cpan@theiling.de>

This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
