ljb wrote:
> gwchamb(at)gmail(dot)com wrote:
>> Apache 2.2.3, PostgreSQL 8.2.1, PHP 5.1.6, Linux
>>
>> I have inserted (via pg_query_params) into a bytea field some binary
>> data (a JPEG image in this case) which I have escaped using
>> pg_escape_bytea. It appears, however, that the extracted data is
>> corrupt (NOT unescaped, more precisely), even after unescaping it with
>> pg_unescape_bytea. If I perform another (a subsequent)
>> pg_unescape_bytea, it appears to be partially unescaped, but there
>> still remain errors because the rest of the image is severely
>> distorted -- but minimally recognizeable as part of the original
>> image. What am I missing? I'm using the lo_* functions as an
>> alternative, but it's hard to dismiss the ease with which it appears
>> to deal with binary data with a bytea field.
>
> Interesting problem.
...
> The escaping done by pg_escape_bytea() is wrong for parameterized queries.
> With a binary-mode query parameter (which pg_query_params() can't do
> anyway), you want no escaping at all. With a text-mode parameter (as
> pg_query_params() does), you need to escape for only the bytea-input
> parsing, not the SQL parsing. So for example if your data has a byte with
> value 1, you need to pass that as the 4 characters: \001.
> pg_escape_bytea() returns that as the 5 characters: \\001 (unless the new
> 'standard conforming strings' is on), so it won't work. Nor can I think of
> another PHP escaping function that does work here.
Isn't it possible to partly undo the effect of pg_escape_bytea()?
Something like this:
<?php
/* some binary data */
$data = implode("", range("\000", "\377"));
echo "pg_escape_bytea():\n";
echo pg_escape_bytea($data), "\n\n";
echo "pg_escape_bytea(), corrected:\n";
echo str_replace("\\\\", "\\", pg_escape_bytea($data)), "\n\n";
?>
Here's the result:
$ php -f test.php | fold -w 70
pg_escape_bytea():
\\000\\001\\002\\003\\004\\005\\006\\007\\010\\011\\012\\013\\014\\015
\\016\\017\\020\\021\\022\\023\\024\\025\\026\\027\\030\\031\\032\\033
\\034\\035\\036\\037 !"#$%&''()*+,-./0123456789:;<=>?(at)ABCDEFGHIJKLMNOP
QRSTUVWXYZ[\\\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\\177\\200\\201\\202\
\203\\204\\205\\206\\207\\210\\211\\212\\213\\214\\215\\216\\217\\220\
\221\\222\\223\\224\\225\\226\\227\\230\\231\\232\\233\\234\\235\\236\
\237\\240\\241\\242\\243\\244\\245\\246\\247\\250\\251\\252\\253\\254\
\255\\256\\257\\260\\261\\262\\263\\264\\265\\266\\267\\270\\271\\272\
\273\\274\\275\\276\\277\\300\\301\\302\\303\\304\\305\\306\\307\\310\
\311\\312\\313\\314\\315\\316\\317\\320\\321\\322\\323\\324\\325\\326\
\327\\330\\331\\332\\333\\334\\335\\336\\337\\340\\341\\342\\343\\344\
\345\\346\\347\\350\\351\\352\\353\\354\\355\\356\\357\\360\\361\\362\
\363\\364\\365\\366\\367\\370\\371\\372\\373\\374\\375\\376\\377
pg_escape_bytea(), corrected:
\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017\020\0
21\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !"#$%&''()*
+,-./0123456789:;<=>?(at)ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmno
pqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\21
4\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\
236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\25
7\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\
301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\32
2\323\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342\343\
344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\36
5\366\367\370\371\372\373\374\375\376\377
Isn't the latter ready for pg_query_params()?
The only doubt I have is about the double '. Is it needed for bytea
parsing or sql parsing? You may have to replace it with a single ', if
it's only for SQL:
str_replace(array("\\\\", "''"), array("\\", "'"),
pg_escape_bytea($data)), "\n\n";
See http://www.postgresql.org/docs/8.2/interactive/datatype-binary.html.
The man page is not clear, the single quote is listed among the escaped
chars, but from the examples is seems that only SQL escaping is
performed. Compare with \:
SELECT E'\\\\'::bytea; -- after SQL parsing, you get \\
SELECT E'\''::bytea; -- after SQL parsing, you get '
it seems that the bytea parser is feeded with a double slash, but a
single quote. If so, the octal value 39 should be removed from the table
in the manual page, since it requires no escaping at bytea level.
pg_escape_bytea() seems to confirm this, it returns '' for ' but \\\\
for \. \ is escaped twice, ' only once.
Unfortunately I can't try it now with pg_query_params(), but I bet on
the second form... well no, maybe I can, yes, here we go:
<?php
$data = implode("", range("\000", "\377"));
$data_escaped = str_replace(
array("\\\\", "''"),
array("\\", "'"),
pg_escape_bytea($data));
/* fill in the conn string, if required */
$db = pg_connect("");
pg_query($db, "CREATE TEMP TABLE atest (data bytea);");
pg_query_params($db, "INSERT INTO atest VALUES ($1)", array($data_escaped));
$res = pg_query($db, "SELECT data FROM atest");
$data2 = pg_unescape_bytea(pg_fetch_result($res, 0));
echo ($data2 == $data) ? "OK!" : "PANIC!", "\n";
?>
$ php -f test.php
OK!
Seems to work...
.TM.