Tutorial: Ingest data into a SQL Server data pool with Transact-SQL

In this article

This tutorial demonstrates how to use Transact-SQL to load data into the data pool of a SQL Server 2019 Big Data Clusters. With SQL Server Big Data Clusters, data from a variety of sources can be ingested and distributed across data pool instances.

In this tutorial, you learn how to:

Create an external table in the data pool.

Insert sample web clickstream data into the data pool table.

Join data in the data pool table with local tables.

Tip

If you prefer, you can download and run a script for the commands in this tutorial. For instructions, see the Data pools samples on GitHub.

In CTP 3.1, the creation of the data pool is asynchronous, but there is no way to determine when it completes yet. Wait for two minutes to make sure the data pool is created before continuing.

Load data

The following steps ingest sample web clickstream data into the data pool using the external table created in the previous steps.

Use an INSERT INTO statement to insert the results from the query into the data pool (the web_clickstream_clicks_data_pool external table).

INSERT INTO web_clickstream_clicks_data_pool
SELECT wcs_user_sk, i_category_id, COUNT_BIG(*) as clicks
FROM sales.dbo.web_clickstreams_hdfs
INNER JOIN sales.dbo.item it ON (wcs_item_sk = i_item_sk
AND wcs_user_sk IS NOT NULL)
GROUP BY wcs_user_sk, i_category_id
HAVING COUNT_BIG(*) > 100;

Inspect the inserted data with two SELECT queries.

SELECT count(*) FROM [dbo].[web_clickstream_clicks_data_pool]
SELECT TOP 10 * FROM [dbo].[web_clickstream_clicks_data_pool]

Query the data

Join the stored results from the query in the data pool with local data in the Sales table.