The Java programming language has been consistently popular for two decades, and is important in many development environments. Its longevity, and the compatibility of code between versions and operating systems, leaves the landscape of Java applications in many industries very much divided between new offerings and long-established legacy code.

Financial technology is no exception.
Competition in this risk-averse domain drives it to push against boundaries.
Production systems inevitably mix contemporary and legacy code.
Because of this, developers need tools for communication and integration.
Implementation risks must be kept to a strict minimum.
Kx technology is well-equipped for this issue.
By design kdb+’s communication with external processes is kept simple, and reinforced with interface libraries for other languages.

The Java API for kdb+ is a Java library.
It fits easily in any Java application as an interface to kdb+ processes.
As with any API, potential use cases are many.
To introduce kdb+ gradually into a wider system, such an interface is essential for any interaction with Java processes, upstream or downstream.
The straightforward implementation keeps changes to legacy code lightweight, reducing the risk of wider system issues arising as kdb+ processes are introduced.

This paper illustrates how the Java API for kdb+ can be used to enable a Java program to interact with a kdb+ process.
It first explores the API itself: how it is structured, and how it might
be included in a development project.
Examples are then provided for core use cases for the API in a standard setup.
Particular consideration is given to how the API facilitates subscription and publication to a kdb+ tickerplant process, a core component of any kdb+ tick-capture system.

The examples presented here form a set of practical templates complementary to the primary source of information on code.kx.com.
These templates can be combined and adapted to apply kdb+ across a
broad range of problem domains. They are available on GitHub.

The API is contained in a single source file on GitHub.
Inclusion in a development project is, therefore, a straightforward matter
of including the file with other source code under the package kx, and
ensuring it is properly imported and referenced by other classes. If
preferred, it can be compiled separately into a class or JAR file to be
included in the classpath for use as an external library or uploaded to
a local repository for build integration.

As the API is provided as source, it is perfectly possible to customize code to meet specific requirements.
However, customization is not advised without prior knowledge of how the interactions work, unless the solution to the requirement or issue is already known.
It is also possible, and in some contexts encouraged, to wrap the
functionality of this class within a model suitable for your framework.
An example might be the open-source qJava library.
Although it is not compatible with the most recent kdb+ version at the time of writing, it shows how to use c.java as a core over which an object-oriented framework of q types and functionality has been applied.

The source file is structured as a single outer class, c.
Within it, a number of constants and inner classes together model an
environment for sending and receiving data from a kdb+ process.
This section explores the fundamentals of the class to provide context and understanding of practical use-cases for the API.

These constructors are straightforward to use.
The host and port specify a socket-object connection, with the username/password string serialized and passed to the remote instance for authorization.
The core logic is the same for all; the host/port-only constructor attempts to retrieve the user string from the Java properties, and the constructor with the useTLS boolean will, when flagged true, attempt to use an SSL socket instead of an ordinary socket.

It is also possible to set up the object to accept incoming connections
from kdb+ processes rather than just making them. There are two
constructors which, when passed a server socket reference, will allow a
q session to establish a handle against the c object:

public c(ServerSocket s)
public c(ServerSocket s,IAuthenticate a)

IAuthenticate is an interface within the c class that can be
implemented to emulate kdb+ server-side authentication, allowing
authentication rules to be established much as they might be through
the kdb+ function .z.pw.

These two constructor families represent two ‘modes’ in which
the c object can be instantiated. The first, and by far the more
widely used, makes connections to kdb+ processes. It is the natural
choice for queries, subscriptions and any task that sends data to or
receives data from those processes. The second, in which Java acts as
the server, is useful for managing and aggregating kdb+ clients,
perhaps as a data sink or as an intermediary interface for another
technology.

Interactions between Java and kdb+ through these connections are
largely handled by what might be called the ‘k’ family of methods in
the c class. There are thirteen combined methods and overloads that
fall under this group. They can be divided roughly into four groups:

These methods are responsible for handling synchronous queries to a kdb+
process. The String parameter will represent either the entire q
expression or the function name; in the case of the latter, the Object
parameters may be used to pass values into that function. In all
instances, the String/Object combinations are merged into a single
object to be passed to the synchronized k(Object) method.

These methods are responsible for handling asynchronous queries to a
kdb+ process. They operate logically in a similar manner to the
synchronous query method, with the exception that they are, of course,
void methods in that they neither wait for nor return any response from
the process.

This method waits on the class input stream and will deserialize the
next incoming kdb+ message. It is used by the c synchronous methods in
order to capture and return response objects, and is also used in
server-oriented applications in order to capture incoming messages from
client processes.

These methods are typically used in server-oriented applications to
serialize and write response messages to the class output stream.
kr(Object) will act much like any synchronous response, while ke(String)
will format and output an error message.

The use of these constructors and methods will be treated in more
practical detail through the use-case examples below.

The majority of q data types are represented in the API through mapping
to standard Java objects. This is best seen in the method
c.r(),
which reads bytes from an incoming message and converts those bytes into
representative Java types.

The method c.r() reads the type of the next object in the byte stream and
dispatches to further methods, each of which returns the appropriately
typed object. Most mappings are self-explanatory: booleans and integer
primitives map directly to one another, and q GUIDs map to java.util.UUID.
There are some types with caveats, however:

The kdb+ float type (9) corresponds to java.lang.Double and not java.lang.Float, which corresponds to the kdb+ real type (8).

Java strings map to the kdb+ symbol type (11). In terms of reading
or passing in data, this means that passing "String" from Java to
kdb+ would result in `String. Conversely, passing "String" (type 10
list) from kdb+ to Java would result in a six-element character array.

Of particular interest is how the mapping handles temporal types, of
which there are eight:

| q type | id | Java type | note |
|---|---|---|---|
| datetime | 15 | java.util.Date | This Java class stores times as milliseconds passed since the Unix epoch. Therefore, like the q datetime, it can represent time information accurate to the millisecond. (This despite the default output format of the class.) |
| date | 14 | java.sql.Date | While this Java class extends java.util.Date, it is used specifically for the date type as it restricts usage and output of time data. |
| time | 19 | java.sql.Time | This also extends java.util.Date, restricting usage and output of date data this time. |
| timestamp | 12 | java.sql.Timestamp | This comes yet again from the base date class, extended this time to include nanoseconds storage (which is done separately from the underlying date object, which only has millisecond accuracy). This makes it directly compatible with the q timestamp type. |

When manipulating date, time and datetime data from kdb+ it is important
to note that, while java.sql.Date and java.sql.Time extend java.util.Date
and can be assigned to a java.util.Date reference, many of the methods
from the original date class are overridden in these subclasses to throw
exceptions if invoked. For example, in order to create a single date
object from two separate SQL Date and Time objects, a java.util.Date
object should be instantiated by adding the getTime() values from both
SQL objects:
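A minimal sketch of this addition follows. The class and method names are hypothetical, and the two SQL objects are built directly from millisecond values as stand-ins for objects returned from a q session. It assumes both values are expressed relative to UTC; if each carried a local time-zone offset, that offset would be counted twice in the sum and would need to be corrected.

```java
class CombineDateTime {
    // Hypothetical helper: adds the millisecond values of a date-only
    // object and a time-only object to produce a single java.util.Date.
    static java.util.Date combine(java.sql.Date date, java.sql.Time time) {
        return new java.util.Date(date.getTime() + time.getTime());
    }

    public static void main(String[] args) {
        // Stand-ins for values returned from q, built from UTC milliseconds.
        java.sql.Date date = new java.sql.Date(1483228800000L);       // 2017-01-01T00:00:00Z
        java.sql.Time time = new java.sql.Time(12L * 60 * 60 * 1000); // 12 hours after the epoch
        java.util.Date both = combine(date, time);
        System.out.println(both.getTime()); // prints 1483272000000
    }
}
```

The combined object is an ordinary java.util.Date, so none of the overridden, exception-throwing methods of the SQL subclasses apply to it.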

The four time types represented by inner classes are somewhat less
prevalent than those modeled by Date and its subclasses. These classes
exist as comparable models because the standard Java library lacks a
clear counterpart for them. Their modeling is for the most part fairly
simple, and the values can be easily set or extracted.

Kdb+ dictionaries (type 99) and tables (type 98) are represented by the
internal classes Dict and Flip respectively. The makeup of these models
is simple but effective, and useful in determining how best to
manipulate them.

The Dict class
consists of two public java.lang.Object fields (x for keys, y for
values) and a basic constructor, which allows any of the represented
data types to be used. However, while from a Java perspective any object
could be passed to the constructor, dictionaries in q are always
structured as two lists. This means that if the object is being created
to pass to a q session directly, the Object fields in a Dict object
should be assigned arrays of a given representative type, as passing in
an atomic object will result in an error.

For example, the first of the following dictionary instantiations is
legal with regard to the Java object, but because the pairs being
passed in are atomic, it would signal a type error in q. Instead, the
second example should be used, and can be seen as mirroring the practice
of enlisting single values in q:

Flip (table) objects
consist of a String array for columns, an Object array for values, a
constructor and a method for returning the Object array for a given
column. The constructor takes a dictionary as its parameter, which is
useful for the conversion of one to the other should the dictionary in
question consist of single symbol keys. Of course, with the fields of
the class being public, the columns and values can be assigned manually.

Keyed tables in q are dictionaries in terms of type, and therefore will
be represented as a Dict object in Java. The method
td(Object)
will create a Flip object from a keyed table Dict, but will remove its
keyed nature in the process.

The globally unique identifier (GUID) type was introduced into kdb+ with
version 3.0 for the purpose of storing arbitrary 16-byte values, such as
transaction IDs. Storing such values in this form saves memory and
storage, and improves performance in certain operations such as table
lookups, when compared with standard types such as strings.

Java has its own unique identifier type: java.util.UUID (universally
unique identifier). In the API the kdb+ GUID type maps directly to this
object through the extraction and provision of its most and least
significant long values. Otherwise, the only high-level difference in
how this type can be used when compared to other types handled by the
API is that a RuntimeException will be thrown if an attempt is made to
serialize and pass a UUID object to a kdb+ instance with a version lower
than 3.0.
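The relationship between a UUID and its two long values can be sketched with the standard library alone. The class and method names below are hypothetical and are not part of c.java; the sketch only illustrates the split and the reconstruction that the mapping relies on.

```java
import java.util.UUID;

class GuidBits {
    // Splits a UUID into the two longs that together hold its 16 bytes.
    static long[] toLongs(UUID id) {
        return new long[] { id.getMostSignificantBits(), id.getLeastSignificantBits() };
    }

    // Rebuilds a UUID from its most and least significant long values.
    static UUID fromLongs(long most, long least) {
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        long[] parts = toLongs(original);
        UUID rebuilt = fromLongs(parts[0], parts[1]);
        System.out.println(original.equals(rebuilt)); // prints true
    }
}
```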

Definitions for q null type representations in Java are held in the
static Object array NULL, with index positions representing the q type.

public static Object[] NULL={
null,
new Boolean(false),
new UUID(0,0),
null,
new Byte((byte)0),
new Short(Short.MIN_VALUE),
new Integer(ni),
new Long(nj),
new Float(nf),
new Double(nf),
new Character(' '),
"",
new Timestamp(nj),
new Month(ni),
new Date(nj),
new java.util.Date(nj),
new Timespan(nj),
new Minute(ni),
new Second(ni),
new Time(nj)
};

Of note are the integer types, as the null values for these are
represented by the minimum possible value of each of the Java
primitives. Shorts, for example, have a minimum value of -32768 in
Java, but a minimum value of -32767 in q. The extra negative value in
Java can therefore be used to signal a null value to the q connection
logic in the c class.
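As a small illustration of this convention, the sketch below tests primitive values against the minimum value of their Java type. The helper names are hypothetical and are not part of c.java.

```java
class QNullCheck {
    // Each check compares against the minimum value of the Java primitive,
    // which is the value reserved to signal a q null for that integer type.
    static boolean isNullShort(short v) { return v == Short.MIN_VALUE; }
    static boolean isNullInt(int v)     { return v == Integer.MIN_VALUE; }
    static boolean isNullLong(long v)   { return v == Long.MIN_VALUE; }

    public static void main(String[] args) {
        System.out.println(Short.MIN_VALUE);              // prints -32768
        System.out.println(isNullShort((short) -32768));  // prints true
        System.out.println(isNullShort((short) -32767));  // prints false
    }
}
```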

Float and real nulls are both represented in Java by the
java.lang.Double.NaN constant. Time values, essentially being longs
under the bonnet, are represented by the same null value as longs in
Java, as is the timespan model class. Month, minute and second, each
with custom model classes, use the same null value as ints.

The method
c.qn(Object)
can assist with checking and identifying null value representations, as
it will check both the Object type and value against the NULL list.

It is worth noting that infinity types are not explicitly mapped in
Java, although kdb+ float and real infinities will correspond with the
infinity constants in java.lang.Double and java.lang.Float
respectively.

KException
is the single custom exception defined and thrown by the API. It is
fairly safe to assume that a thrown KException denotes a q error signal,
which will be included in the exception message when thrown.

Other common exceptions thrown in the API logic include:

IOException

Denotes issues with connecting to the kdb+ process. It is also thrown by c.java itself for such issues as authentication.

RuntimeException

Thrown when certain type implementations are attempted on kdb+ versions prior to their introduction (such as GUIDs prior to kdb+ 3.0).

UnsupportedEncodingException

It is possible, through the method setEncoding, to specify a character encoding different from the default (ISO-8859-1). This exception will be thrown if the encoding is changed to a charset not supported on the target Java platform.

The examples that follow consist of common practical tasks that a Java
developer might be expected to carry out when interfacing with kdb+. The
inline examples take the form of extracted sections of key logic
and output, and are available as example classes from
the KxSystems/javakdb repository for use as starting points or templates.

These examples assume, at minimum, a standard installation of 32-bit
kdb+ on the local system, and a suitable Java development environment.

During development, it can be helpful to start a basic q server to which
a Java process can connect. This requires the opening of a port, for
which there are two basic methods:

Example: Starting q with -p parameter

$ q -p 10000

q)\p // command to show the port that q is listening on
10000i

Example: Using the \p system command

$ q

q)\p 10000 // set the listening port to 10000
q)\p
10000i

To close the port, it should be set to its default value of 0 i.e. \p 0.

Setting up a q session in this manner will allow other processes to open
handles to it on the specified port. The remainder of the examples in
this paper assume an opened q session listening on port 10000, with
no further configuration unless otherwise specified.

As discussed in the previous section, the c class establishes
connections via its constructors.

For connecting to a listening q process, one useful mechanism might be
to create a factory class with a method that returns a connected c
object based on what is passed to it. This way, any number of credential
combinations can be set whilst allowing the creation of multiple
connections, say for reconnection purposes:

These constructors will always return a c object connected to the target
session, and failure to do so will result in a thrown exception;
IOException will denote the port not being open or available, and a
KException will denote something wrong with the q process itself (such
as 'access for incorrect or incomplete credentials).

For the remaining examples, connections will be made using a custom
QConnectionFactory object returned from a static method getDefault(),
which will instantiate the object with the host localhost and the port
10000:

Queries can be made using the ‘k’ family of methods in the c class.
For synchronous queries, which might be used to retrieve data (or, more
generally, to halt execution of the Java process until a response
is received), the k methods with parameter combinations of strings
and objects are used.
For asynchronous queries, as might be used in a
feed-handler process to push data to a tickerplant, the ks methods are
used.

The methods k(), kr() and ke() would not see explicit use in the
querying of a server q process, but are more significant when the Java
process acts as the server, as will be touched upon below.

The following examples demonstrate some of the means by which these
synchronous and asynchronous queries may be called:

The relationship between the kdb+ types and their Java counterparts has
been discussed in the previous section. From a practical perspective, it
is important to note that almost all objects and fields that might
return from a given synchronous query will be of type Object, and will
therefore more often than not require casting in order to be manipulated
properly. Care must be taken, therefore, to ensure that the types that
can be returned from a given query are known and handled appropriately
so as to avoid unwanted exceptions.

The exception to this might be the column names of a flip object (once
cast itself) held in the field flip.x. This field is already typed as
String[], as column names must always be symbols in q.

Kdb+ types that map to primitives (such as int) can be passed in Java to
a k method as a primitive thanks to
autoboxing,
but will always be returned as the corresponding wrapper object (such as
Integer).

Lists will always be returned as an array of the given list type, or as
Object[] if the list is generic. Extraction of atomic values from a
list, therefore, is as simple as casting the return object to the
appropriate array type and accessing the desired index:
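A minimal sketch of this cast is given below. The class name, the helper and the sample values are hypothetical; the Object is created locally as a stand-in for a value returned by a synchronous query, on the assumption that the query returns a q long list.

```java
class ListCast {
    // Casts an Object expected to hold a q long list and returns one element.
    // The type check guards against an unexpected return type.
    static long elementAt(Object result, int index) {
        if (!(result instanceof long[])) {
            throw new IllegalArgumentException("Expected long[] but got " + result);
        }
        long[] values = (long[]) result;
        return values[index];
    }

    public static void main(String[] args) {
        // Stand-in for the Object returned by a synchronous query.
        Object result = new long[] { 10L, 20L, 30L };
        System.out.println(elementAt(result, 1)); // prints 20
    }
}
```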

Accessing a list from a nested list is similar to accessing a value from
any list. Here there are two casts required: a cast to Object[] for
the parent list and then again to the appropriate typed array for the
extracted list:
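The two casts might look like the following sketch, in which the nested structure is built locally as a stand-in for a returned generic list of int lists. The class and method names are hypothetical.

```java
class NestedListCast {
    // First cast: the generic parent list arrives as Object[].
    // Second cast: the chosen element is itself a typed array.
    static int[] innerList(Object result, int index) {
        Object[] outer = (Object[]) result;
        return (int[]) outer[index];
    }

    public static void main(String[] args) {
        // Stand-in for a returned list of two int lists.
        Object result = new Object[] { new int[] { 1, 2, 3 }, new int[] { 4, 5 } };
        int[] second = innerList(result, 1);
        System.out.println(second[0] + " " + second.length); // prints 4 2
    }
}
```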

The Dict inner class is used for all returned objects of q type
dictionary (and therefore, by extension, keyed tables). Key values are
stored in the field Dict.x, and values in Dict.y, both of which will
generally be castable as an array.

Aside from matching the index positions of x and y, there is no
intrinsic key-value pairing between the two, meaning that alteration of
either of the array structures can compromise the key-value
relationship. The following example illustrates operations that might be
performed on a returned dictionary object:
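Because the pairing depends only on index position, a lookup can be sketched as a search of the key array followed by a read of the value array at the same index. The sketch below works on two plain Objects standing in for Dict.x and Dict.y, and assumes symbol keys (String[]). The class and method names are hypothetical.

```java
import java.lang.reflect.Array;

class DictLookup {
    // Finds the key in the key array and returns the value held at the
    // same index in the value array, or null if the key is absent.
    static Object valueFor(Object x, Object y, String key) {
        String[] keys = (String[]) x;
        for (int i = 0; i < keys.length; i++) {
            if (keys[i].equals(key)) {
                // Array.get works for primitive and object arrays alike,
                // boxing primitives into their wrapper types.
                return Array.get(y, i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Object x = new String[] { "a", "b", "c" }; // stand-in for Dict.x
        Object y = new long[] { 1L, 2L, 3L };      // stand-in for Dict.y
        System.out.println(valueFor(x, y, "b"));   // prints 2
    }
}
```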

The inner class c.Flip used to represent tables operates in a similar
manner to c.Dict. The primary difference, as previously mentioned, is
that Flip.x is already typed as String[], while Flip.y will still
require casting. The following example shows how the data from a
returned Flip object might be used to print the table to console:
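One way to approach this is sketched below using only the two fields described above: a String[] of column names and an Object[] of column arrays. The class name, method name and sample table are hypothetical, and the arrays are built locally as stand-ins for Flip.x and Flip.y.

```java
import java.lang.reflect.Array;

class FlipPrinter {
    // Renders column names as a header line, then one tab-separated line per row.
    static String render(String[] columns, Object[] data) {
        StringBuilder sb = new StringBuilder(String.join("\t", columns)).append('\n');
        // Every column array has the same length, so the first gives the row count.
        int rows = data.length == 0 ? 0 : Array.getLength(data[0]);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columns.length; c++) {
                if (c > 0) {
                    sb.append('\t');
                }
                sb.append(Array.get(data[c], r));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] x = { "sym", "price" };                  // stand-in for Flip.x
        Object[] y = { new String[] { "AAPL", "MSFT" },   // stand-in for Flip.y
                       new double[] { 1.5, 2.0 } };
        System.out.print(render(x, y));
    }
}
```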

When passing objects to q via the c class, how a given object is created matters less than what it contains. Such an operation
is subject to the common pitfalls associated with passing values to a q
expression: those of type and rank.

The k family of methods, regardless of its return protocol, will take
either the String of a q expression or the String of a q operator or
function, complemented by Object parameters. Given the nature of q as an
interpreted language, all of these are serialized and sent to the q
session with little regard for logical correctness.

It is important, therefore, that any expressions passed to a query
method are syntactically accurate and refer to variables that actually
exist in the target session. It is also important that any passed
objects are mapped to a relevant q type, and function within the context
that they are sent. KException messages to look out for while
implementing these operations are 'type and 'rank, as these will
generally denote basic type and rank issues respectively.

The following method might be applied to all direct type mappings in
the API; for simple lists (lists in which all elements are of the same
type), it is enough to pass a Java array of the appropriate type.

The following example invokes the q set function, which allows for the
passing of a variable name as well as an object with which the variable
might be set:

c.Flip objects are created slightly differently; it is best to
instantiate these by passing a c.Dict object into the constructor. This
is because tables are essentially collections of dictionaries in kdb+,
and therefore using this constructor helps ensure that the Flip object
is set up correctly.

It is worth noting that for this method to work correctly, the passed
Dict object must use String keys, as these will map into the Flip
object’s typed String[] columns:

Globally unique identifier (GUID) objects are represented in Java by
java.util.UUID objects, and are passed to kdb+ in the same manner as
other basic types. The Java class has a useful static method for
generating random identifiers, which further streamlines this process
and is useful where only a certain number of arbitrary identifiers are
required:

Requirements will often dictate that, while q processes will need to be
bounced (such as for End-of-Day processing), a Java process must be
able to handle the loss and reacquisition of those processes without
being restarted itself. A simple example might be a graphical user
interface, where the forced shutdown of the entire application due to a
dropped connection, or the lack of ability to reconnect, would be very
poor design indeed.

Use of patterns such as factories can help with the task of setting up a
reconnection mechanism, as it allows for the simple creation of a
preconfigured object. Because c objects connect on instantiation, a
connection can be re-established simply by calling the relevant factory
method.

In order to handle longer periods of potential downtime, either loops or
recursion should be used. The danger with a recursive approach here is
that, given an extended outage and no timeout limit, there is a risk of
overflowing the method-call stack, as each failed attempt pushes a new
method call onto the stack.

For mechanisms that may need to wait indefinitely, it might be
considered safer to use an indefinite while-loop that makes use of catch
blocks, continue and break statements. This averts the danger of
StackOverflowError occurring and is easily modified to implement a
maximum number of tries:
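A minimal sketch of such a loop is shown below. To stay self-contained it takes the connection attempt as a java.util.concurrent.Callable rather than calling the c constructors directly; the class name, method name and parameters are hypothetical. It retries only on IOException, on the assumption that this is the exception signalling an unavailable port, and lets any other exception propagate.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class Reconnector {
    // Repeats the connection attempt until it succeeds or maxTries is reached.
    static <T> T connectWithRetry(Callable<T> connector, int maxTries, long waitMillis)
            throws Exception {
        T connection = null;
        int tries = 0;
        while (true) {
            try {
                connection = connector.call();
                break; // connected, so leave the loop
            } catch (IOException e) {
                tries++;
                if (tries >= maxTries) {
                    throw e; // give up and report the last failure
                }
                Thread.sleep(waitMillis); // wait before the next attempt
            }
        }
        return connection;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = { 0 };
        // Simulated connector that fails twice before succeeding.
        String result = connectWithRetry(() -> {
            if (attempts[0]++ < 2) {
                throw new IOException("connection refused");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + attempts[0] + " attempts");
        // prints "connected after 3 attempts"
    }
}
```

In a real application the Callable would wrap whichever factory method creates the connection, and passing Integer.MAX_VALUE as the limit approximates waiting indefinitely.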

A kdb+ tickerplant is a q process specifically designed to handle
incoming high-frequency data feeds from publishing processes. Its primary
responsibility is the management of subscription requests and the fast
publication of data to subscribers. The following diagram illustrates a
simple dataflow of a potential kdb+ tick system:

Of interest in this whitepaper are the Java publisher and subscriber processes. As the kdb+ tick system is very widely used, both of these kinds of processes are highly likely to come up in development tasks involving kdb+ interfacing.

To facilitate the testing of Java subscriber processes, we can use
example q processes freely available in the Kx repository. A
tickerplant can be simulated with tick.q; trade data, using the trade
schema defined in sym.q, can then be published to this tickerplant
using the definition for the file feed.q given here:

The tickerplant and feed handlers can then be started by executing the
following commands consecutively:

$ q tick.q sym -t 2000
$ q feed.q

Once the feedhandler is publishing to the tickerplant, processes can
connect to it in order either to publish or subscribe to it.

It should be noted that in this example and below we are using a
Java process to subscribe to a tickerplant being fed directly
by a simulated feed. While we are doing this here in order to
facilitate a simple example setup, in production this is not
usually encouraged. Processes such as Java subscribers would generally
connect to derivative kdb+ processes such as chained tickerplants (as in the above diagram),
for which standard publishing and subscription logic should be
the same as that covered here.

Typical subscriber processes are required to make an initial subscription request to the tickerplant in order to receive data.
See the publish and subscribe cookbook article for details.
This request involves calling the .u.sub function with two
parameters. The first parameter is the table name and the second is a
list of symbols for subscription. (Specifying a backtick in any of the
parameters means all tables and/or all symbols).

If the .u.sub function is called synchronously, the tickerplant will
return the table schema. If subscribing to one table, the returned
object will be a generic Object array, with the table name in
object[0] and a c.Flip representation of the schema in object[1]:

If more than one table is being subscribed to, the returned object will
be an Object array consisting of the above object arrays; therefore, in
order to retrieve each individual Flip object, this should be iterated
against:

Upon calling .u.sub and retrieving the schema, the tickerplant process
will start to publish data to the Java process. The data it sends can be
retrieved through the parameter-free k() method, which will wait for a
response and return an Object (a c.Flip of the passed data) on
publication:

This mechanism might be then enveloped in an indefinite loop, such as a
while(true) loop. Each iteration waits on the k() method returning
published data, which will continue until one of the contributing
processes fails (at which point an exception is caught and handled
appropriately).

Publishing data to a tickerplant is almost always a necessity for a kdb+
feed-handler process. Java, as a common language of choice for
third-party API development (e.g. Reuters, Bloomberg, MarkIT), is a popular language for feedhandler development, within which c.java is
used to handle the asynchronous invocation of a publishing function.

Care has to be taken here to ensure that all typed arrays maintain
the same length, as failure to do so will likely result in a kdb+
type error. Such errors are especially troublesome when using
asynchronous methods, which will not return KExceptions in the same
manner as sync methods! It is also worth noting that the order of the
typed arrays within the object array should match that of the table
schema.
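Since an asynchronous call will not report such an error back, one option is to check the column arrays before they are sent. The sketch below is a hypothetical helper, not part of c.java, and the three-column layout in the example (sym, price, size) is an assumed schema used only for illustration.

```java
import java.lang.reflect.Array;

class ColumnCheck {
    // Returns true only when every element is an array and all arrays
    // share the same length, as a table row batch requires.
    static boolean sameLength(Object[] columns) {
        int expected = -1;
        for (Object column : columns) {
            if (column == null || !column.getClass().isArray()) {
                return false;
            }
            int length = Array.getLength(column);
            if (expected < 0) {
                expected = length;
            } else if (length != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed schema for illustration: sym, price, size.
        Object[] data = {
            new String[] { "AAPL", "MSFT" },
            new double[] { 101.5, 55.25 },
            new long[] { 100L, 200L }
        };
        System.out.println(sameLength(data)); // prints true
    }
}
```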

It is standard tickerplant functionality to add a timespan column to
the front of each row received from a feed handler if one is not
included with the data passed. This column records when the data was
received by the tickerplant. It is possible for the publisher to create
the timespan column itself, which prevents the tickerplant from adding
one:

The examples thus far have emphasized interfacing between Java and kdb+
from the perspective of a Java client connecting to a kdb+ server, using
the constructors relevant to this purpose. It is equally possible to
reverse these roles using the c(ServerSocket) constructor, which enables
a Java process to listen for incoming kdb+ messages on the specified
port.

While the use cases for this ‘server’ mode of operation are not as
common as they might be for ‘client’-mode connections, it is nevertheless
available to developers as a means of implementing communication between
Java and kdb+ processes. The following examples demonstrate the
basic mechanisms by which this can be done.

In a manner similar to tickerplant subscription, the method k() (without
parameters) can be used to wait on and listen to any connecting q
session. In this example, the object is retrieved in this fashion and
deciphered, either to return an error when passed the
symbol `returnError or to return a message describing what was sent:

In the above example, the server c object is instantiated with a
new ServerSocket being created in its constructor. This is acceptable in
this instance because we care only about the handling of one
connection.

In general, ServerSocket objects should not be used in this manner, as
they are designed to handle more than a single incoming connection.
Instead, the ServerSocket should be passed as a reference. With the
addition of some simple threading, an application capable of handling
messages from multiple q sessions can be created:

This will allow any number of connections to be established, with factors
such as connection limitation and load balancing left up to how the
process is implemented. As in any case where threading is used, take
care that such a method does not enable race conditions or concurrency
issues; if necessary, steps can be taken to reduce the risk of such
operations, such as synchronized blocks and methods.

This document has covered a variety of topics concerning
the mechanics and application of the c.java interface for kdb+. Of the
workings and examples shown, the most common use case for this interface
will be connecting to a q process, executing queries and functions and
managing any result objects. However, this document has also displayed
the versatile nature of c.java as a tool, providing a handful of
solutions to a given problem and able to fulfill server as well as
client functions.

The practical examples should also help demonstrate that tasks required
as part of a standard kdb+ toolset can be handled easily, both by Java
developers interfacing with kdb+ for the first time and by kdb+
developers who are required to venture into Java development, for
example to help complete development of a feed handler. The benefit of
such interfaces is felt keenly through the common role of these
developers in helping to reconcile longstanding applications with
contemporary technologies, often to the benefit of both.

Peter Lyness joined First Derivatives as a software engineer in 2015. Since then he has implemented a number of Java-based technical solutions for clients, including kdb+ interface logic for upstream static and real-time data feeds.