Associating Kafka message keys: The above statement does not make
any assumptions about the Kafka message key in the underlying Kafka
topic. However, if the value of the message key in Kafka is the same as
one of the columns defined in the stream in KSQL, you can provide such
information in the WITH clause. For instance, if the Kafka message key
has the same value as the pageid column, you can write the CREATE
STREAM statement as follows:
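A sketch of such a statement, assuming the pageviews topic and column names used earlier in this tutorial; the KEY property tells KSQL that the message key carries the same value as the pageid column:

```sql
-- Assumed topic name, format, and column names; KEY declares that the
-- Kafka message key matches the pageid column.
CREATE STREAM pageviews
  (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews',
        VALUE_FORMAT='DELIMITED',
        KEY='pageid');
```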

Associating Kafka message timestamps: If you want to use the value
of one of the columns as the Kafka message timestamp, you can provide
that information to KSQL in the WITH clause. The message timestamp is
used in window-based operations in KSQL (such as windowed aggregations)
and to support event-time-based processing. For instance, if you
want to use the value of the viewtime column as the message
timestamp, you can rewrite the above statement as follows:
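A sketch, again assuming the pageviews schema from earlier; the TIMESTAMP property tells KSQL to read the message timestamp from viewtime:

```sql
-- Assumed topic name, format, and column names; TIMESTAMP tells KSQL
-- to use the viewtime column as the event timestamp.
CREATE STREAM pageviews
  (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews',
        VALUE_FORMAT='DELIMITED',
        KEY='pageid',
        TIMESTAMP='viewtime');
```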

Now that you have the pageviews stream and users table, take a
look at some example queries that you can write in KSQL. The focus is on
two types of KSQL statements: CREATE STREAM AS SELECT (a.k.a. CSAS) and CREATE TABLE
AS SELECT (a.k.a. CTAS). For these statements, KSQL persists the result of the query
in a new stream or table, which is backed by a Kafka topic.

Use a WHERE clause to select a subset of data. If you
want to route rows that match different criteria to different streams
backed by different underlying Kafka topics (i.e., content-based routing),
write multiple KSQL statements as follows:
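For instance, the two statements below split an input stream by region; the stream, topic, and column names are illustrative, and each output stream is backed by its own Kafka topic:

```sql
-- Illustrative names; each CSAS writes matching rows to its own topic.
CREATE STREAM pageviews_region_1 WITH (KAFKA_TOPIC='pageviews_region_1') AS
  SELECT * FROM pageviews_transformed WHERE regionid = 'Region_1';

CREATE STREAM pageviews_region_2 WITH (KAFKA_TOPIC='pageviews_region_2') AS
  SELECT * FROM pageviews_transformed WHERE regionid = 'Region_2';
```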

When joining two objects (streams or tables), the number of partitions in each must be the same. You can use KSQL itself to create re-partitioned streams and tables as required. In this example you will join users with the pageviews_transformed topic, which has 5 partitions. First, generate a new users table backed by a topic whose partition count matches that of pageviews_transformed:

CREATE TABLE users_5part WITH (PARTITIONS=5) AS SELECT * FROM users;

Now you can use the following query to create a new stream by joining the
pageviews_transformed stream with the users_5part table.
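One way this join might be written; the output stream name and the selected columns are assumptions based on the schemas above:

```sql
-- Stream-table join; column list and output name are illustrative.
CREATE STREAM pageviews_enriched AS
  SELECT pv.viewtime, pv.userid, pv.pageid, u.regionid, u.gender
  FROM pageviews_transformed pv
  LEFT JOIN users_5part u ON pv.userid = u.userid;
```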

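A counting query of the kind discussed next might be sketched as follows; the table and column names are assumptions:

```sql
-- Counts pageviews per region over the whole lifetime of the query.
CREATE TABLE pageviews_per_region AS
  SELECT regionid, COUNT(*) AS numviews
  FROM pageviews_enriched
  GROUP BY regionid;
```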
The above query counts the pageviews from the time you start the query
until you terminate it. Note that we used a CREATE TABLE AS SELECT
statement here, since the result of the query is a KSQL table. The
result of an aggregate query in KSQL is always a table, because KSQL
computes the aggregate for each key (and possibly for each window per
key) and updates these results as it processes new input data.

KSQL supports windowed aggregation too. Let’s rewrite the above query
so that we compute the pageview count per region every 1 minute:
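A sketch using a tumbling window; the table name is illustrative:

```sql
-- One count per region per 1-minute tumbling window.
CREATE TABLE pageviews_per_region_per_minute AS
  SELECT regionid, COUNT(*) AS numviews
  FROM pageviews_enriched
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY regionid;
```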

The UCASE and LCASE functions in KSQL convert the values of the
gender and regionid columns to upper and lower case, so that you can
match them correctly. KSQL also supports the LIKE operator for prefix,
suffix, and substring matching.
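For example, a filter along these lines (the column values are assumptions) combines the case functions with LIKE:

```sql
-- UCASE/LCASE normalize case; LIKE '%_6' matches a suffix.
SELECT * FROM pageviews_enriched
  WHERE UCASE(gender) = 'FEMALE' AND LCASE(regionid) LIKE '%_6';
```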

KSQL supports HOPPING windows and SESSION windows too. The following
query is the same as the query above, except that it computes the count
over a hopping window of 30 seconds that advances by 10 seconds:
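A sketch of the hopping-window variant; the table name is illustrative:

```sql
-- Overlapping 30-second windows, each advancing by 10 seconds.
CREATE TABLE pageviews_per_region_per_30secs10secs AS
  SELECT regionid, COUNT(*) AS numviews
  FROM pageviews_enriched
  WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS)
  GROUP BY regionid;
```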

The next statement counts the number of pageviews per region for session
windows with a session inactivity gap of 60 seconds. In other words, you
are sessionizing the input data and then performing the
counting/aggregation step per region.
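A sketch of the session-window variant; the table name is illustrative:

```sql
-- A session window closes after 60 seconds of inactivity per key.
CREATE TABLE pageviews_per_region_per_session AS
  SELECT regionid, COUNT(*) AS numviews
  FROM pageviews_enriched
  WINDOW SESSION (60 SECONDS)
  GROUP BY regionid;
```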

Sometimes, you may want to include the bounds of the current window in the result so that it is
more easily accessible to consumers of the data. The following statement extracts the start and
end time of the current session window into fields within output rows.
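Depending on the KSQL version, the window bounds are exposed either through the WindowStart() and WindowEnd() functions or through the WINDOWSTART and WINDOWEND pseudo columns; a sketch using the function form, with illustrative names:

```sql
-- Extracts the session window bounds into output fields.
-- Newer versions expose these as the WINDOWSTART/WINDOWEND pseudo columns.
CREATE TABLE pageviews_per_region_per_session_bounds AS
  SELECT regionid,
         WindowStart() AS session_start_ts,
         WindowEnd()   AS session_end_ts,
         COUNT(*)      AS numviews
  FROM pageviews_enriched
  WINDOW SESSION (60 SECONDS)
  GROUP BY regionid;
```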

The interests column in the users table is an array of
strings that represents the interests of each user. The contactinfo
column is a string-to-string map that holds the following
contact information for each user: phone, city, state, and zipcode.

Tip

If you are using ksql-datagen, you can use quickstart=users_ to generate data that include the interests and contactinfo columns.

The following query will create a new stream from pageviews_enriched
that includes the first interest of each user along with the city and
zipcode for each user:
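A sketch of such a query; note that array indexing is version-dependent (0-based in older KSQL, 1-based in newer ksqlDB), and the output stream name is illustrative:

```sql
-- Array element and map lookups; use interests[1] on 1-based versions.
CREATE STREAM pageviews_interest_contact AS
  SELECT interests[0]           AS first_interest,
         contactinfo['city']    AS city,
         contactinfo['zipcode'] AS zipcode,
         viewtime, userid, pageid
  FROM pageviews_enriched;
```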