According to the Impala documentation, a backslash used as a regex escape character must itself be escaped, i.e. written as a double backslash. However, that does not work here (see Col2 in the result above). A single backslash does work instead.

If we add an unrelated comparison to the query, this behaviour changes:

Now a double backslash is required for the regex to work correctly. The result is the same if the comparison uses another table column, or if the comparison is moved into the WHERE clause.

This strange behavior occurs only when queries are run over JDBC/ODBC against a non-default database. Hue and Impala-Shell work as expected, and JDBC/ODBC queries also work as expected when they run against tables in the default database.

I've tested this on CDH 5.15.0 and 5.13.1 with the JDBC 2.6.3.1004 and ODBC 2.5.37.1014 (32-bit) drivers.

Is this a bug or am I missing something? Anyone else experiencing the same issue?

1. When the query (select regexp_extract(text, '\w', 0), regexp_extract(text, '\\w', 0), text from test.test;) was run through a client with the newest connector, the Impala parser received the following input:

We also ran several queries and noticed that when the query contains a " (double quote) character, the driver passes the statement through as is. When there is no " in the query, the driver rewrites the statement with backticks and additional backslashes, which causes the characters to be escaped twice.
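To make the double-escaping effect concrete, here is a small Python simulation. It is a sketch under stated assumptions, not the driver's or Impala's actual code: driver_rewrite is a hypothetical model of the connector behavior observed above (backslashes doubled unless the statement contains a "), and unescape_literal is a simplified model of Impala's string-literal unescaping (\\ becomes \, and a backslash before any other character is dropped). "Direct" stands for Hue/Impala-Shell, where the statement is not rewritten.

```python
import re


def driver_rewrite(literal: str, statement_has_double_quote: bool) -> str:
    # Hypothetical model of the observed connector behavior:
    # with a " anywhere in the statement, the text is passed through unchanged;
    # otherwise every backslash is doubled before the statement reaches Impala.
    if statement_has_double_quote:
        return literal
    return literal.replace("\\", "\\\\")


def unescape_literal(literal: str) -> str:
    # Simplified model of Impala string-literal unescaping (an assumption):
    # "\\" becomes "\", and a backslash before any other character is dropped,
    # so "\w" becomes "w". A real parser also handles \n, \t and so on.
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)


def regexp_extract(text: str, pattern: str, index: int) -> str:
    # Mimics Impala's regexp_extract(): group `index` of the first match, '' if none.
    match = re.search(pattern, text)
    return match.group(index) if match else ""


def simulate(typed: str, text: str, through_driver: bool,
             has_double_quote: bool = False) -> str:
    # typed: the pattern exactly as written between the single quotes in the query.
    sent = driver_rewrite(typed, has_double_quote) if through_driver else typed
    return regexp_extract(text, unescape_literal(sent), 0)


if __name__ == "__main__":
    for typed in (r"\w", r"\\w"):
        print(typed,
              "direct:", repr(simulate(typed, "abc", through_driver=False)),
              "driver:", repr(simulate(typed, "abc", through_driver=True)))
```

Under these assumptions the direct path needs '\\w' (it becomes the regex \w), while the driver path turns '\\w' into the regex \\w, which only matches a literal backslash followed by w. That is why a single backslash appears to work over JDBC/ODBC and the documented double backslash does not.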

This leads us to believe that the problem is most likely in the connector, not in Impala. For now, it can be worked around either by including a " character in the query or by enabling the UseNativeQuery option. That option ensures that the driver does not transform the queries issued by an application and runs them as is, as explained on the Simba documentation page [1].
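For JDBC, UseNativeQuery is passed as a connection property in the URL. The sketch below only builds such a URL; it assumes the Cloudera/Simba URL format jdbc:impala://host:port/schema;Property=Value and Impala's default JDBC port 21050. The host name is a placeholder, and any authentication properties your cluster needs are omitted, so check the driver guide for your setup (the ODBC driver has a corresponding DSN setting).

```python
def impala_jdbc_url(host: str, schema: str, port: int = 21050, **properties: str) -> str:
    # Builds a URL of the form jdbc:impala://host:port/schema;Key=Value;...
    # (format assumed from the Cloudera/Simba Impala JDBC driver documentation).
    url = f"jdbc:impala://{host}:{port}/{schema}"
    for key, value in properties.items():
        url += f";{key}={value}"
    return url


if __name__ == "__main__":
    # "impala-host.example.com" is a placeholder host name.
    print(impala_jdbc_url("impala-host.example.com", "test", UseNativeQuery="1"))
```

With UseNativeQuery=1, the statement should reach Impala exactly as the application wrote it, so the documented double-backslash form applies again.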