środa, 29 sierpnia 2012

DB2, PostgreSQL, FDW and Linux 64 bit

IntroductionFDW (foreign data wrapper) is a feature in PostgreSQL which allows access to external resources as if they were relational tables in PostreSQL database. There is also ODBC wrapper available so I was interested if it is possible to configure integration with DB2. Finally I was successful but it was a journey littered with troubles.DB2 data server client
Download and install DB2 data server client. Then configure connection to DB2 database by means of CLP application. In my environment remote DB2 database SAMPLE is created on server think and access port is 50005, user/password db2inst3/db2inst3

ODBC DB2 connection on Linux
We have to setup ODBC connection with DB2 database (local or remote) on the machine were PostgreSQL is installed. Be sure that ODBC Linux version is installed, open source unixODBC is available for all Linux distribution.
Next step is to configure ODBC access to DB2 database (DB2 Infocenter detailed information) . It can be done by GUI (ODBCConfig) or manually. You have to identify where libdb2.so ODBC driver is located and add the driver definition to /etc/odbcinst.ini file. In my environment this definition looks like:

The last command should yield the content of EMP table created in DB2 SAMPLE database.Additional remark.
In order to have it running we have to catalog DB2 connection using DB2 CLP and ODBC connection also. Also the database alias used in CLP (DB2SAMPL here) should match the DSN name in ODBC. Otherwise the ODBC connection is not working.Prerequisites in PostgreSQL
Together with PostgreSQL server additional packages should be installed. In my environment (Fedora 14) I installed these packages via commands:

select * from employee;
select * from employee where salary > 70000;
select count(empno),workdept from employee group by workdept order by count(empno);
select max(salary),min(salary),workdept from employee group by workdept;
select count(empno),workdept from employee group by workdept having count(empno) > 5;

One has to remember that foreign tables are "read-only" in PostgreSQL. Cannot be updated and do not participate in transactional processing (more details)
More complicated example
What about joins between foreign and native tables ?
Assume that together with official personal data in SAMPLE database the sleeky employer wants to keep some political incorrect information in a database.
So create additional table in PostgreSQL:

Let's check the number of people with specified and unspecified political views in the company

select workdept,sum(case when p=1 then 1 else 0 end) as unstable,sum(case when p=0 then 1 else 0 end) as stable from (select workdept, case when s.politicalviews is null then 1 else 0 end as p from employee e left outer join secret s on e.empno = s.empno) as w group by workdept;

create view companyandpolitics select workdept,sum(case when p=1 then 1 else 0 end) as unstable,sum(case when p=0 then 1 else 0 end) as stable from (select workdept, case when s.politicalviews is null then 1 else 0 end as p from employee e left outer join secret s on e.empno = s.empno) as w group by workdept;
select * from politics;

For some reason the last character is lost if the length of CHAR column is the same in DB2 and foreign table definition. I cannot tell where is the problem: in DB2, DB2 ODBC/CLI, fdw_odbc extension or FDW implementation in PostreSQL. The solution is simply to extend CHAR size in SELECT by DB2 CAST scalar function.

DECIMAL. DB2 ODBC/CLI maps DECIMAL column type to CHAR as a default.. In my locale as a decimal point , (colon) is used which is not acceptable by PostgreSQL. So the solution is to use DB2 REPLACE scalar function to replace , (colon) with . (dot).

I did not test date, timestamp, UTF-8 string etc. But - if any problem arises - I hope that it is possible to overcome any issue using method described above. It is also possible to modify the defaults of DB2 ODBC/CLI connection - look for more details.

Conclusion
Unfortunately, it was a painful job to setup this FDW for DB2. This solution is almost not documented, error handling is very poor (not existing at all). In case of any problem it can break the whole PostgreSQL or simply brings nothing without any error message. So I had to spend some time to dig through the source code and run this extension with debug mode enabled and additional logging. To my mind the extension requires more thorough test and improvements to be fully trusted.
But the result is amazing - we have a powerful tool for integrating PostgreSQL with DB2.