Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE

Details

Type: Bug

Status:Resolved

Priority: Major

Resolution:
Fixed

Affects Version/s:0.22.0

Fix Version/s:
None

Component/s:
None

Labels:

None

Hadoop Flags:

Reviewed

Description

When Oracle table contains the columns "TIMESTAMP(6) WITH LOCAL TIME ZONE" and "TIMESTAMP(6) WITH TIME ZONE", Sqoop fails to map values for those columns to valid Java data types, resulting in the following exception:

ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.sqoop.orm.ClassWriter.generateFields(ClassWriter.java:253)
at org.apache.hadoop.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:701)
at org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:597)
at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:75)
at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:87)
at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:201)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

I have modified the code for Hadoop and Sqoop so this bug is fixed on my machine. Please let me know if you would like me to generate the patch and upload it to this ticket.

The motivation for this fix is the following. Oracle releases 10.2+ introduced new data types "TIMESTAMP WITH TIME ZONE" and "TIMESTAMP WITH LOCAL TIME ZONE", and Sqoop did not handle these types properly. To be specific, it did not find the proper Java data types mapping for those SQL columns, which caused the exception described earlier. This error occurred in two situations:

2. Sqoop was attempting to map the table metadata to Java data types while loading data to hive.

Hence, I added two methods "toJavaType" and "toHiveType" method to SqlManager interface, and implemented these methods in OracleManager class. From now on, any new data types that are very specific to database vendor and not yet supported can be handled properly.

Leonid Furman
added a comment - 06/Jan/10 02:22 The motivation for this fix is the following. Oracle releases 10.2+ introduced new data types "TIMESTAMP WITH TIME ZONE" and "TIMESTAMP WITH LOCAL TIME ZONE", and Sqoop did not handle these types properly. To be specific, it did not find the proper Java data types mapping for those SQL columns, which caused the exception described earlier. This error occurred in two situations:
1. Sqoop was attempting to map the table metadata to Java data types while running table import.
2. Sqoop was attempting to map the table metadata to Java data types while loading data to hive.
Hence, I added two methods "toJavaType" and "toHiveType" method to SqlManager interface, and implemented these methods in OracleManager class. From now on, any new data types that are very specific to database vendor and not yet supported can be handled properly.
Please feel free to provide feedback.
Thanks,
Leonid.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

Hadoop QA
added a comment - 06/Jan/10 04:57 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12429497/MAPREDUCE-1327.patch.gz
against trunk revision 896265.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/247/console
This message is automatically generated.

Aaron Kimball
added a comment - 06/Jan/10 22:04 Looking at this patch myself, a few things stand out:
This patch seems to revert many recent changes to Sqoop. Please re-merge your branch with trunk before resubmitting and make sure you're not submitting extraneous changes.
Please don't go over 80 columns / line
Uses of System.out and System.err should be replaced with appropriate LOG.info(), debug(), etc statements.
Don't trap the exception on failure to set tz in OracleManager; that should rethrow a SQLException
The base SqlManager.toJavaType() method should still return null in the undefined 'else' case. An extra 'dbToJavaType' method is unnecessary; subclasses should simply override toJavaType().
The base toHiveType() method should not be empty; it should call o.a.h.sqoop.hive.HiveTypes.toHiveType().
I don't understand SqlManager.getDatabaseType(). Nothing Oracle-specific should be in this class; that belongs in OracleManager.
getTypeClass() similarly does not belong in the base SqlManager class.

I changed the code according to your guidelines, and merged it to the trunk.

As far as the unit tests for supporting new Oracle data types, I modified the OracleManagerTest.java file to include data types TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE. I tested it in my local environment, so in order to verify that it works a separate QA environment with Oracle database will need to be configured.

Leonid Furman
added a comment - 07/Jan/10 00:18 Aaron,
I changed the code according to your guidelines, and merged it to the trunk.
As far as the unit tests for supporting new Oracle data types, I modified the OracleManagerTest.java file to include data types TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE. I tested it in my local environment, so in order to verify that it works a separate QA environment with Oracle database will need to be configured.
Let me know if the code changes are sufficient.
Thanks!

Leonid Furman
added a comment - 07/Jan/10 01:49 You are right, it shouldn't. I might have modified build.xml file in the following way:
from
sysproperty key="sqoop.throwOnError" value="" />
to
sysproperty key="sqoop.throwOnError" value="true" />
to include more error messages while testing the patch.
Let me know if I should resubmit the patch.

Aaron Kimball
added a comment - 07/Jan/10 01:53 Well, there's that one in /src/contrib/sqoop/build.xml. But also the top level /build.xml file has an extraneous one-liner change.
Also when running the test, I get this problem:
Testcase: testOracleImport took 1.679 sec
FAILED
SQLException when running test setUp(): java.sql.SQLException: java.lang.reflect.InvocationTargetException
junit.framework.AssertionFailedError: SQLException when running test setUp(): java.sql.SQLException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.sqoop.manager.OracleManagerTest.setUp(OracleManagerTest.java:118)
I'm using ojdbc6_g.jar and Oracle XE 10.2.0.

Leonid Furman
added a comment - 07/Jan/10 02:26 1. The one line change for the top level /build.xml was made to remove the capitalization from the project name. I was following the instructions provided on this page:
http://wiki.apache.org/hadoop/BuildingHadoopFromSVN
2. In regards to the unit test error:
Testcase: testOracleImport took 1.679 sec
FAILED
SQLException when running test setUp(): java.sql.SQLException: java.lang.reflect.InvocationTargetException
junit.framework.AssertionFailedError: SQLException when running test setUp(): java.sql.SQLException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.sqoop.manager.OracleManagerTest.setUp(OracleManagerTest.java:118)
I'm using ojdbc6.jar, not ojdbc6_g.jar, and Oracle XE 10.2.0.
Can you paste more detailed error message, preferrably from the stack trace? For testing purpose, I also changed the top level build.xml by setting the property from:
<property name="test.output" value="no"/>
to
<property name="test.output" value="yes"/>

1) I don't see how those instructions suggest you do that. In any case, don't propagate that change back into any patches you generate.

2) here's the stack trace:

10/01/06 18:33:13 ERROR manager.OracleManager: Could not set time zone for oracle connection
java.lang.reflect.InvocationTargetException
»- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
»- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
»- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
»- at java.lang.reflect.Method.invoke(Method.java:597)
»- at org.apache.hadoop.sqoop.manager.OracleManager.setSessionTimeZone(OracleManager.java:107)
»- at org.apache.hadoop.sqoop.manager.OracleManager.makeConnection(OracleManager.java:89)
»- at org.apache.hadoop.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:50)
»- at org.apache.hadoop.sqoop.manager.OracleManagerTest.setUp(OracleManagerTest.java:91)
»- at junit.framework.TestCase.runBare(TestCase.java:132)
»- at junit.framework.TestResult$1.protect(TestResult.java:110)
»- at junit.framework.TestResult.runProtected(TestResult.java:128)
»- at junit.framework.TestResult.run(TestResult.java:113)
»- at junit.framework.TestCase.run(TestCase.java:124)
»- at junit.framework.TestSuite.runTest(TestSuite.java:232)
»- at junit.framework.TestSuite.run(TestSuite.java:227)
»- at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:79)
»- at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
»- at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
»- at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
»- at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
Caused by: java.sql.SQLDataException: ORA-01882: timezone region not found
»- at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:159)
»- at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:137)
»- at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:257)
»- at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:468)
»- at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:418)
»- at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1089)
»- at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:188)
»- at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1222)
»- at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1516)
»- at oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:2284)
»- at oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:2227)
»- at oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:809)
»- at oracle.jdbc.driver.PhysicalConnection.setSessionTimeZone(PhysicalConnection.java:13459)
»- ... 20 more

Handy tip: You don't need to see all the output on the console; it appears in text files in build/contrib/sqoop/test/TEST-(testname).txt

Aaron Kimball
added a comment - 07/Jan/10 02:34 1) I don't see how those instructions suggest you do that. In any case, don't propagate that change back into any patches you generate.
2) here's the stack trace:
10/01/06 18:33:13 ERROR manager.OracleManager: Could not set time zone for oracle connection
java.lang.reflect.InvocationTargetException
»- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
»- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
»- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
»- at java.lang.reflect.Method.invoke(Method.java:597)
»- at org.apache.hadoop.sqoop.manager.OracleManager.setSessionTimeZone(OracleManager.java:107)
»- at org.apache.hadoop.sqoop.manager.OracleManager.makeConnection(OracleManager.java:89)
»- at org.apache.hadoop.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:50)
»- at org.apache.hadoop.sqoop.manager.OracleManagerTest.setUp(OracleManagerTest.java:91)
»- at junit.framework.TestCase.runBare(TestCase.java:132)
»- at junit.framework.TestResult$1.protect(TestResult.java:110)
»- at junit.framework.TestResult.runProtected(TestResult.java:128)
»- at junit.framework.TestResult.run(TestResult.java:113)
»- at junit.framework.TestCase.run(TestCase.java:124)
»- at junit.framework.TestSuite.runTest(TestSuite.java:232)
»- at junit.framework.TestSuite.run(TestSuite.java:227)
»- at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:79)
»- at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
»- at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
»- at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
»- at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
Caused by: java.sql.SQLDataException: ORA-01882: timezone region not found
»- at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:159)
»- at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:137)
»- at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:257)
»- at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:468)
»- at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:418)
»- at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1089)
»- at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:188)
»- at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1222)
»- at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1516)
»- at oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:2284)
»- at oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:2227)
»- at oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:809)
»- at oracle.jdbc.driver.PhysicalConnection.setSessionTimeZone(PhysicalConnection.java:13459)
»- ... 20 more
Handy tip: You don't need to see all the output on the console; it appears in text files in build/contrib/sqoop/test/TEST-(testname).txt

1. The instructions for Hadoop build and release suggested that project names should be lower case:

"patch the build.xml files of hadoop* to make the project names all lower case otherwise they get published with the wrong name in Ivy"

but I will resubmit the patch later without these changes.

2. By any change, are you using Oracle JDeveloper to run the unit test? From what I searched on google, the error "ORA-01882: timezone region not found" typically comes from not having the "user.timezone" system property in JDeveloper. Please correct me if I am wrong.

Leonid Furman
added a comment - 07/Jan/10 02:49 Thanks for the tips, they are very helpful!
1. The instructions for Hadoop build and release suggested that project names should be lower case:
"patch the build.xml files of hadoop* to make the project names all lower case otherwise they get published with the wrong name in Ivy"
but I will resubmit the patch later without these changes.
2. By any change, are you using Oracle JDeveloper to run the unit test? From what I searched on google, the error "ORA-01882: timezone region not found" typically comes from not having the "user.timezone" system property in JDeveloper. Please correct me if I am wrong.
I am running the unit test from the command line:
ant -Dtestcase=OracleManagerTest test
and don't have this error.

My understanding is that the timezone on your machine with JVM should be synchronized (i.e. to be the same) with timezone in Oracle DB. Perhaps this is why it is working for me. If this is not the case for your environment, may be the OracleManager.java should have the code:

Leonid Furman
added a comment - 07/Jan/10 03:03 In your Oracle database, can you run the query below:
select dbtimezone from dual
to find the timezone set in the database.
My understanding is that the timezone on your machine with JVM should be synchronized (i.e. to be the same) with timezone in Oracle DB. Perhaps this is why it is working for me. If this is not the case for your environment, may be the OracleManager.java should have the code:
System.getProperty"user.timezone")
instead of
Calendar.getInstance().getTimeZone().getID()
Anyway, can you try to run the unit test this way:
ant -Dtestcase=OracleMangerTest -Duser.timezone=YOUR_TIME_ZONE test

Leonid Furman
added a comment - 07/Jan/10 03:19 In your Oracle database, run the query:
select * from V$TIMEZONE_NAMES
which should give you the list of available time zones. if your time zone is not in this list, may be it causes the error. Can you check?

Leonid Furman
added a comment - 08/Jan/10 03:29 Aaron,
Did you have a chance to check if the query below generates the error message "timezone region not found"?
SELECT TZ_OFFSET(YOUR_TIME_ZONE) FROM dual
In the mean time, I will try to reproduce your scenario by writing a sample program where Oracle database and JVM reside in different time zone regions.

However this patch configures things re. Sqoop's connection to Oracle, it'll need to work without specifying particular JVM settings. Maybe it can pick a default timezone if the JVM doesn't have one (e.g., default to UTC).

Leonid Furman
added a comment - 11/Jan/10 22:35 Aaron,
I made the changes to the setSessionTimezone method such that it will set the default time zone as "UTC" in case JVM specific time zone is not valid.
Can you review a new patch? Thanks!

Thanks for the new patch. Note that Hudson won't be able to apply this – you generated it with paths like 'hadoop-mapreduce/src/contrib/sqoop/...'; you'll need to recreate the patch so that it applies with patch -p0 (i.e., paths like 'src/contrib/sqoop/...')

I noticed that you have flagged this issue as an incompatible change. What API are you breaking / other incompatibility are you introducing with this issue?

Also, please don't delete the old patch file when you upload a new one – sometimes it's helpful to refer to older versions of the code. Instead, just upload new versions, maybe with a numeric id (see MAPREDUCE-1126 for an example.)

Aaron Kimball
added a comment - 12/Jan/10 00:20 Leonid,
Thanks for the new patch. Note that Hudson won't be able to apply this – you generated it with paths like 'hadoop-mapreduce/src/contrib/sqoop/...'; you'll need to recreate the patch so that it applies with patch -p0 (i.e., paths like 'src/contrib/sqoop/...')
I noticed that you have flagged this issue as an incompatible change. What API are you breaking / other incompatibility are you introducing with this issue?
Also, please don't delete the old patch file when you upload a new one – sometimes it's helpful to refer to older versions of the code. Instead, just upload new versions, maybe with a numeric id (see MAPREDUCE-1126 for an example.)
After applying this patch, the oracle test now fails for me with:
Testcase: testOracleImport took 3.542 sec
FAILED
null expected:<...ineering,2009-12-29 [12.00.00.0,2009-12-29 12.00.]00.0> but was:
<...ineering,2009-12-29 [04:00:00.0,2009-12-29 04:00:]00.0>
junit.framework.ComparisonFailure: null expected:<...ineering,2009-12-29 [12.00.00.0,2009-12-29 12.00.]00.0> but
was:<...ineering,2009-12-29 [04:00:00.0,2009-12-29 04:00:]00.0>
at org.apache.hadoop.sqoop.manager.OracleManagerTest.runOracleTest(OracleManagerTest.java:193)
at org.apache.hadoop.sqoop.manager.OracleManagerTest.testOracleImport(OracleManagerTest.java:218)
I suspect this is due to timezone normalization (12:00 vs 04:00). You may need to determine the correct timezone and make the test expect a different result based on its offset.
The new changes in the patch look good though. One nitpick about the code:
SqlManager.java: "This method can no longer remain static" – no need for this comment; since the code doesn't say 'static' in it any more, that'll be out-of-context.

Leonid Furman
added a comment - 12/Jan/10 01:16 Thanks for the tips and feedback!
This patch doesn't introduce any incompatibility change. I think I marked it with 'incompatible change' flag by mistake.
I fixed OracleManagerTest unit test and it is working for me. The problem was the typo in one of the values of expected results array.
The new patch - MAPREDUCE-1327 .2.patch - does not include any parent directories like 'hadoop-mapreduce', but only contains the src/ directory tree.

Leonid Furman
added a comment - 13/Jan/10 00:06 Aaron,
The new patch MAPREDUCE-1327 .4.patch is available. I modified the code for unit test OracleManagerTest.java such that it takes timezone offset into consideration when comparing the test results.
Thanks!

Looking at OracleManagerTest.compareRecords() though, the date-checking loop should call fail() or throw a new IOException if it can't parse one of the dates. As-is, the mismatch flag is never reported back to JUnit as a test failure condition from that loop.

Aaron Kimball
added a comment - 13/Jan/10 01:52 This test now passes.
Looking at OracleManagerTest.compareRecords() though, the date-checking loop should call fail() or throw a new IOException if it can't parse one of the dates. As-is, the mismatch flag is never reported back to JUnit as a test failure condition from that loop.

Leonid Furman
added a comment - 13/Jan/10 02:13 Thank you for the comments! Attached please find a new patch - MAPREDUCE-1327 .5.patch. The methd 'compareRecords' in OracleManagerTest.java now throws an IOException if it can't parse a timestamp.

I also wanted to ask if this patch will be applied to Cloudera source repository any time soon. The reason I am asking is because my HDFS cluster is running on the current Hadoop release version - 0.20.0, which doesn't support Oracle. Therefore, if I build hadoop from trunk and run Sqoop, it will not work. But Cloudera's latest release supports Oracle, and when this patch MAPREDUCE-1327 is applied to Cloudera, installed on both the namenode and HDFS cluster, Sqoop should work as expected.

Leonid Furman
added a comment - 13/Jan/10 02:32 Thanks Aaron! It's been a great experience!
I also wanted to ask if this patch will be applied to Cloudera source repository any time soon. The reason I am asking is because my HDFS cluster is running on the current Hadoop release version - 0.20.0, which doesn't support Oracle. Therefore, if I build hadoop from trunk and run Sqoop, it will not work. But Cloudera's latest release supports Oracle, and when this patch MAPREDUCE-1327 is applied to Cloudera, installed on both the namenode and HDFS cluster, Sqoop should work as expected.
Thank you in advance.