Move data from an SFTP server using Azure Data Factory

This article applies to version 1 of Data Factory. If you are using the current version of the Data Factory service, see SFTP connector in V2.

This article outlines how to use the Copy Activity in Azure Data Factory to move data from an on-premises/cloud SFTP server to a supported sink data store. This article builds on the data movement activities article that presents a general overview of data movement with copy activity and the list of data stores supported as sources/sinks.

Data Factory currently supports only moving data from an SFTP server to other data stores, not moving data from other data stores to an SFTP server. It supports both on-premises and cloud SFTP servers.

Note

Copy Activity does not delete the source file after it is successfully copied to the destination. If you need to delete the source file after a successful copy, create a custom activity to delete the file and use the activity in the pipeline.

Supported scenarios and authentication types

You can use this SFTP connector to copy data from both cloud SFTP servers and on-premises SFTP servers. Basic and SshPublicKey authentication types are supported when connecting to the SFTP server.

The easiest way to create a pipeline is to use the Copy Wizard. You can also use the following tools to create a pipeline: Azure portal, Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See the Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity. For JSON samples that copy data from an SFTP server to Azure Blob Storage, see the JSON Example: Copy data from SFTP server to Azure blob section of this article.

Linked service properties

The following sections describe the JSON elements specific to the SFTP linked service.

Using SSH public key authentication

To use SSH public key authentication, set authenticationType to SshPublicKey, and specify the following properties in addition to the generic SFTP connector properties described in the previous section:

| Property | Description | Required |
| --- | --- | --- |
| username | User who has access to the SFTP server. | Yes |
| privateKeyPath | Specify the absolute path to the private key file that the gateway can access. Applies only when copying data from an on-premises SFTP server. | Specify either the privateKeyPath or the privateKeyContent. |
| privateKeyContent | A serialized string of the private key content. The Copy Wizard can read the private key file and extract the private key content automatically. If you are using any other tool/SDK, use the privateKeyPath property instead. | Specify either the privateKeyPath or the privateKeyContent. |
| passPhrase | Specify the pass phrase/password to decrypt the private key if the key file is protected by a pass phrase. | Specify if the private key file is protected by a pass phrase. |
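
For reference, here is a minimal sketch of a linked service that uses SshPublicKey authentication against an on-premises SFTP server. The host, user, gateway, and key-path values are placeholders; substitute your own.

```json
{
    "name": "SftpLinkedService",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "mysftpserver",
            "port": 22,
            "authenticationType": "SshPublicKey",
            "username": "myuser",
            "privateKeyPath": "D:\\privatekey_openssh",
            "passPhrase": "<pass phrase for the private key>",
            "gatewayName": "mygateway"
        }
    }
}
```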

Dataset properties

For a full list of sections & properties available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types.

The typeProperties section is different for each type of dataset. It provides information that is specific to the dataset type. The typeProperties section for a dataset of type FileShare has the following properties:

| Property | Description | Required |
| --- | --- | --- |
| folderPath | Sub path to the folder. You can combine this property with partitionedBy to have folder paths based on slice start/end date-times. | Yes |
| fileName | Specify the name of the file in the folderPath if you want the table to refer to a specific file in the folder. If you do not specify any value for this property, the table points to all files in the folder. When fileName is not specified for an output dataset, the name of the generated file is in the following format: Data.<Guid>.txt (Example: Data.0a405f8a-93ff-4c6f-b3be-f69616f1df7a.txt) | No |
| fileFilter | Specify a filter to be used to select a subset of files in the folderPath rather than all files. | No |
| useBinaryTransfer | Specify whether to use binary transfer mode. True for binary mode, false for ASCII. Default value: True. This property can be used only when the associated linked service type is FtpServer. | No |

Note

fileName and fileFilter cannot be used simultaneously.

Using the partitionedBy property

As mentioned in the previous section, you can specify a dynamic folderPath and fileName for time-series data with the partitionedBy property. You can do so with Data Factory macros and the system variables SliceStart and SliceEnd, which indicate the logical time period for a given data slice.

Sample 1:
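
A minimal sketch of such a definition, using the folder naming from the explanation that follows:

```json
"folderPath": "wikidatagateway/wikisampledataout/{Slice}",
"partitionedBy":
[
    { "name": "Slice", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyyMMddHH" } }
],
```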

In this example, {Slice} is replaced with the value of the Data Factory system variable SliceStart in the specified format (YYYYMMDDHH). SliceStart refers to the start time of the slice, so the folderPath is different for each slice. Example: wikidatagateway/wikisampledataout/2014100103 or wikidatagateway/wikisampledataout/2014100104.
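
Sample 2:

A sketch of the second pattern; the folder and file names are illustrative:

```json
"folderPath": "wikidatagateway/wikisampledataout/{Year}/{Month}/{Day}",
"fileName": "{Hour}.csv",
"partitionedBy":
[
    { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
    { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
    { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
    { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
],
```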

In this example, year, month, day, and time of SliceStart are extracted into separate variables that are used by folderPath and fileName properties.

Copy activity properties

For a full list of sections & properties available for defining activities, see the Creating Pipelines article. Properties such as name, description, input and output tables, and policies are available for all types of activities.

However, the properties available in the typeProperties section of the activity vary with each activity type. For the Copy activity, the type properties vary depending on the types of sources and sinks.

In Copy Activity, when source is of type FileSystemSource, the following properties are available in typeProperties section:

| Property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| recursive | Indicates whether the data is read recursively from the subfolders or only from the specified folder. | True, False (default) | No |
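
For example, the typeProperties section of a Copy activity that reads subfolders recursively from an SFTP source into blob storage might look like the following sketch (the sink type matches the sample later in this article):

```json
"typeProperties": {
    "source": {
        "type": "FileSystemSource",
        "recursive": true
    },
    "sink": {
        "type": "BlobSink"
    }
}
```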

Supported file and compression formats

See the File and compression formats in Azure Data Factory article for details.

JSON Example: Copy data from SFTP server to Azure blob

The following example provides sample JSON definitions that you can use to create a pipeline by using the Azure portal, Visual Studio, or Azure PowerShell. They show how to copy data from an SFTP source to Azure Blob Storage. However, data can be copied directly from any of the sources to any of the sinks stated here by using the Copy Activity in Azure Data Factory.
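
The sample assumes the following Data Factory entities; the JSON below is a sketch with placeholder names, credentials, and paths rather than a definitive configuration:

- A linked service of type Sftp
- A linked service of type AzureStorage
- An input dataset of type FileShare
- An output dataset of type AzureBlob
- A pipeline with a Copy Activity that uses FileSystemSource and BlobSink

SFTP linked service (this sketch assumes basic authentication to an on-premises server through a gateway):

```json
{
    "name": "SftpLinkedService",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "mysftpserver",
            "port": 22,
            "authenticationType": "Basic",
            "username": "myuser",
            "password": "mypassword",
            "gatewayName": "mygateway"
        }
    }
}
```

Azure Storage linked service (replace the account name and key with your own):

```json
{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
        }
    }
}
```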

Data is written to a new blob every hour (frequency: hour, interval: 1). The folder path for the blob is dynamically evaluated based on the start time of the slice that is being processed. The folder path uses year, month, day, and hours parts of the start time.
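
SFTP input dataset (a sketch; the folder and file names are placeholders). Setting "external": true informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory:

```json
{
    "name": "SFTPFileInput",
    "properties": {
        "type": "FileShare",
        "linkedServiceName": "SftpLinkedService",
        "typeProperties": {
            "folderPath": "mysharedfolder",
            "fileName": "myfile.csv"
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```

Azure Blob output dataset (a sketch; the container and path are placeholders, and the folder path uses the year, month, day, and hour parts of the slice start time as described above):

```json
{
    "name": "AzureBlobOutput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "mycontainer/sftp/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "\t"
            },
            "partitionedBy": [
                { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
                { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
                { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
                { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
            ]
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```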

The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to FileSystemSource and sink type is set to BlobSink.
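
A sketch of such a pipeline definition; the activity name, dataset names, schedule, and policy values are illustrative:

```json
{
    "name": "SFTPToBlobPipeline",
    "properties": {
        "description": "Copy data from an SFTP server to an Azure blob every hour",
        "activities": [
            {
                "name": "SFTPToBlobCopy",
                "type": "Copy",
                "inputs": [ { "name": "SFTPFileInput" } ],
                "outputs": [ { "name": "AzureBlobOutput" } ],
                "typeProperties": {
                    "source": {
                        "type": "FileSystemSource",
                        "recursive": false
                    },
                    "sink": {
                        "type": "BlobSink"
                    }
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "policy": {
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1,
                    "timeout": "01:00:00"
                }
            }
        ],
        "start": "2017-02-20T18:00:00Z",
        "end": "2017-02-20T19:00:00Z"
    }
}
```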