Internally, the Connector uses the InterSystems JDBC driver to read and write values to and from servers. This constrains the data types that can be serialized in and out of database tables via Spark. The JDBC driver exposes the JDBC data types in the following tables as available projections for InterSystems IRIS™ data types, and converts them to and from the listed Spark Catalyst types (members of the org.apache.spark.sql.types package).
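For example, you can observe the mapping that applies to a given table by reading it through the Connector and examining the schema of the resulting DataFrame. This sketch assumes the "iris" short format name and an illustrative table name; adapt both to your own configuration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("TypeMapping").getOrCreate()

    // Read a table through the Connector and print the Spark Catalyst
    // type inferred for each of its columns.
    val df = spark.read
      .format("iris")                      // assumed short format name
      .option("dbtable", "Sample.Person")  // hypothetical table name
      .load()

    df.schema.fields.foreach(f => println(s"${f.name} -> ${f.dataType}"))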

This mapping between Spark Catalyst and JDBC data types differs subtly from that used by the standard Spark jdbc data source, as noted in the following sections.

JDBC / Spark Value Type Conversions

The following JDBC value types are exposed, and are converted directly to and from the listed Spark Catalyst types:

The following JDBC object types are exposed, and are represented by the listed Spark Catalyst types. Bidirectional conversion is not supported for LONGVARCHAR, GUID, LONGVARBINARY, and TIME because these JDBC types do not correspond to unique Spark Catalyst types:

com.intersys.spark.core provides the underlying implementation for the connector. It is implemented in Scala and resides in the isc-spark JAR file. These classes should be considered private: their APIs may undergo breaking changes in a future release, and so should not be referenced directly.

com.intersys.sqf consists of a number of supporting classes concerned with factorizing SQL queries. These classes are implemented in Java, are also used by the InterSystems JDBC driver, and reside in that driver's JAR file. They should be considered private: their APIs may undergo breaking changes in a future release, and so should not be referenced directly.

Logging

The connector logs various events of interest using the same infrastructure that Spark itself uses, namely Log4j.

The content, format, and destination of log output for the system as a whole are configured by the file ${SPARK_HOME}/conf/log4j-defaults. The connector is implemented in classes that reside in the com.intersys.spark package, so its logging can easily be configured by specifying keys of the form:
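    # Illustrative entries (standard Log4j properties syntax; the logger
    # names extend the package prefix and the levels are examples only):
    log4j.logger.com.intersys.spark=INFO
    log4j.logger.com.intersys.spark.core=DEBUG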

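Server-Synthesized Column Names

As an illustration of the issue discussed below, consider a DataFrame built from a query whose SELECT list includes an unaliased aggregate (a sketch; the "iris" format name, the "query" option, and the table name are assumptions):

    // The second selection expression, MIN(a), carries no alias.
    val df = spark.read
      .format("iris")
      .option("query", "SELECT a, MIN(a) FROM mytable GROUP BY a")
      .load()

    df.printSchema()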
Notice that no alias is provided for the selection expression min(a). The server synthesizes names for such columns, and in this case might describe the schema of the resulting DataFrame as having two columns, named 'a' and 'Aggregate_2' respectively.

No actual field named 'Aggregate_2' exists in the table, however, so an attempt to reference it in an enclosing selection would fail, as in this hypothetical sketch (continuing the example above):
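    // Spark accepts the name reported in the schema, but the query pushed
    // down to the server references a field that does not actually exist
    // there, so the action fails at execution time.
    df.select("Aggregate_2").show()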

For this reason, you should consider modifying the original query to attach aliases to any columns that would otherwise receive server-synthesized names.
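For example, attaching an alias in the query above gives the second column a stable, user-chosen name (same assumed format and option names as before):

    val fixed = spark.read
      .format("iris")
      .option("query", "SELECT a, MIN(a) AS min_a FROM mytable GROUP BY a")
      .load()

    fixed.select("min_a").show()   // 'min_a' can safely be referenced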

We hope to address this issue in a subsequent release.

Java 9 Compatibility

Java 9, and the version 9 JVM on which it runs, became generally available in September 2017. Neither Apache Spark nor the InterSystems Spark Connector currently runs on this version of the JVM. We hope to address this issue in a subsequent release.

Handling of TINYINT

The mapping between Spark Catalyst and JDBC data types (see SQL/Spark Datatype Mapping earlier in this chapter) differs subtly from that used by the Spark jdbc data source. The Connector achieves this mapping by automatically installing its own subclass of org.apache.spark.sql.jdbc.JdbcDialect, but this also has the side effect of changing the mapping used by the Spark jdbc data source itself.

By and large this is a good thing, but one problem has been identified: Spark 2.1.1 has a bug in which it neglects to implement a low-level reader function for ByteType. As a result, once the Connector has been loaded, an attempt to read an InterSystems IRIS table that includes a column of type TINYINT using the Spark jdbc data source will fail.
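For example, a read of the following shape can fail under Spark 2.1.1 once the Connector has been loaded (a sketch; the URL, table name, and credentials are placeholders):

    // A plain Spark jdbc read of a table containing a TINYINT column; the
    // column maps to ByteType, for which Spark 2.1.1 lacks a low-level reader.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:IRIS://localhost:51773/USER")
      .option("dbtable", "MySchema.TableWithTinyInt")
      .option("user", "_SYSTEM")
      .option("password", "SYS")
      .load()

    df.show()   // may fail while reading the TINYINT column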

For now, it is probably best to avoid reading and writing DataFrames using the Spark jdbc data source directly once the Connector has been loaded. We hope to address this issue in a subsequent release.

JDBC Isolation Levels

The InterSystems IRIS server does not currently support writing a dataset to a SQL table using JDBC isolation levels other than NONE and READ_UNCOMMITTED. We hope to address this issue in a subsequent release.
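When writing through the Spark jdbc data source, you can stay within this restriction by setting the standard isolationLevel option explicitly (a sketch; df is some existing DataFrame and the connection details are placeholders):

    df.write
      .format("jdbc")
      .option("url", "jdbc:IRIS://localhost:51773/USER")
      .option("dbtable", "MySchema.Target")
      .option("user", "_SYSTEM")
      .option("password", "SYS")
      .option("isolationLevel", "READ_UNCOMMITTED")   // or "NONE"
      .save()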