Move data from an FTP server by using Azure Data Factory

This article applies to version 1 of Data Factory. If you are using the current version of the Data Factory service, see FTP connector in V2.

This article explains how to use the copy activity in Azure Data Factory to move data from an FTP server. It builds on the Data movement activities article, which presents a general overview of data movement with the copy activity.

You can copy data from an FTP server to any supported sink data store. For a list of data stores supported as sinks by the copy activity, see the supported data stores table. Data Factory currently supports moving data only from an FTP server to other data stores, not from other data stores to an FTP server. It supports both on-premises and cloud FTP servers.

Note

The copy activity does not delete the source file after it is successfully copied to the destination. If you need to delete the source file after a successful copy, create a custom activity to delete the file, and use the activity in the pipeline.

Enable connectivity

If you are moving data from an on-premises FTP server to a cloud data store (for example, to Azure Blob storage), install and use Data Management Gateway. The Data Management Gateway is a client agent that is installed on your on-premises machine, and it allows cloud services to connect to an on-premises resource. For details, see Data Management Gateway. For step-by-step instructions on setting up the gateway and using it, see Moving data between on-premises locations and cloud. You use the gateway to connect to an FTP server, even if the server is on an Azure infrastructure as a service (IaaS) virtual machine (VM).

It is possible to install the gateway on the same on-premises machine or IaaS VM as the FTP server. However, we recommend that you install the gateway on a separate machine or IaaS VM to avoid resource contention, and for better performance. When you install the gateway on a separate machine, the machine should be able to access the FTP server.
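As an illustration of how the gateway fits in, an FTP linked service for an on-premises server might look roughly like the following. This is a sketch rather than a definitive definition: the linked service name and every value in angle brackets are placeholders, and the property names (host, authenticationType, username, password, gatewayName) are assumed from the version 1 FtpServer linked service schema.

```json
{
    "name": "FTPLinkedService",
    "properties": {
        "type": "FtpServer",
        "typeProperties": {
            "host": "<ftp server host name>",
            "authenticationType": "Basic",
            "username": "<user name>",
            "password": "<password>",
            "gatewayName": "<name of the Data Management Gateway>"
        }
    }
}
```

For an FTP server that is reachable directly from the cloud, the gatewayName entry would presumably be omitted.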

Get started

You can create a pipeline with a copy activity that moves data from an FTP source by using different tools or APIs.

You can use the following tools to create a pipeline: Azure portal, Visual Studio, PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.

Whether you use the tools or APIs, perform the following steps to create a pipeline that moves data from a source data store to a sink data store:

Create linked services to link the input and output data stores to your data factory.

Create datasets to represent input and output data for the copy operation.

Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output.

When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools or APIs (except .NET API), you define these Data Factory entities by using the JSON format. For a sample with JSON definitions for Data Factory entities that are used to copy data from an FTP data store, see the JSON example: Copy data from FTP server to Azure Blob section of this article.

Dataset properties

For a full list of sections and properties available for defining datasets, see Creating datasets. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types.

The typeProperties section is different for each type of dataset. It provides information that is specific to the dataset type. The typeProperties section for a dataset of type FileShare has the following properties:

| Property | Description | Required |
| --- | --- | --- |
| folderPath | Path to the folder. You can combine this property with partitionedBy to have folder paths based on slice start and end date-times. | Yes |
| fileName | Specify the name of the file in the folderPath if you want the table to refer to a specific file in the folder. If you do not specify any value for this property, the table points to all files in the folder. When fileName is not specified for an output dataset, the name of the generated file is in the following format: Data.<Guid>.txt (Example: Data.0a405f8a-93ff-4c6f-b3be-f69616f1df7a.txt) | No |
| fileFilter | Specify a filter to be used to select a subset of files in the folderPath, rather than all files. | No |
| useBinaryTransfer | Specify whether to use the binary transfer mode. The values are true for binary mode (this is the default value), and false for ASCII. This property can be used only when the associated linked service is of type FtpServer. | No |

Note

fileName and fileFilter cannot be used simultaneously.
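To make the dataset properties concrete, here is a minimal sketch of an FTP input dataset of type FileShare. The dataset name, the linked service name, the file name, and the angle-bracket path are placeholders chosen for illustration. The external and availability sections follow the general version 1 dataset layout and are assumptions here, not values taken from this article.

```json
{
    "name": "FTPFileInput",
    "properties": {
        "type": "FileShare",
        "linkedServiceName": "FTPLinkedService",
        "typeProperties": {
            "folderPath": "<path to the folder on the FTP server>",
            "fileName": "test.csv",
            "useBinaryTransfer": true
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```

Because fileName is set, fileFilter is left out, in line with the note above.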

Use the partitionedBy property

As mentioned in the previous section, you can specify a dynamic folderPath and fileName for time series data with the partitionedBy property.

Sample 1

In this example, {Slice} is replaced with the value of the Data Factory system variable SliceStart, in the format specified (YYYYMMDDHH). SliceStart refers to the start time of the slice. The folder path is different for each slice, for example, wikidatagateway/wikisampledataout/2014100103 or wikidatagateway/wikisampledataout/2014100104.
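A typeProperties fragment matching this description could look like the sketch below. It is wrapped in braces only so that it is valid JSON on its own. The format string is written as yyyyMMddHH on the assumption that Data Factory expects .NET-style date format specifiers.

```json
{
    "folderPath": "wikidatagateway/wikisampledataout/{Slice}",
    "partitionedBy": [
        {
            "name": "Slice",
            "value": {
                "type": "DateTime",
                "date": "SliceStart",
                "format": "yyyyMMddHH"
            }
        }
    ]
}
```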

Sample 2

In this example, the year, month, day, and time of SliceStart are extracted into separate variables that are used by the folderPath and fileName properties.
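One way to express this is sketched below, again wrapped in braces so that the fragment is valid JSON. The variable names (Year, Month, Day, Hour), the .csv extension, and the .NET-style format strings are illustrative assumptions.

```json
{
    "folderPath": "wikidatagateway/wikisampledataout/{Year}/{Month}/{Day}",
    "fileName": "{Hour}.csv",
    "partitionedBy": [
        { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
        { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
        { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
        { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
    ]
}
```

Each entry in partitionedBy defines one variable, and each {Name} token in folderPath or fileName is replaced with that variable's value for the slice being processed.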

Copy activity properties

For a full list of sections and properties available for defining activities, see Creating pipelines. Properties such as name, description, input and output tables, and policies are available for all types of activities.

Properties available in the typeProperties section of the activity, on the other hand, vary with each activity type. For the copy activity, the type properties vary depending on the types of sources and sinks.

In the copy activity, when the source is of type FileSystemSource, the following property is available in the typeProperties section:

| Property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| recursive | Indicates whether the data is read recursively from the subfolders, or only from the specified folder. | True, False (default) | No |
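For instance, the typeProperties section of a copy activity that should also read files from subfolders might be written as follows. This is a minimal sketch: the BlobSink sink is only an example, and the outer braces are added so that the fragment is valid JSON on its own.

```json
{
    "typeProperties": {
        "source": {
            "type": "FileSystemSource",
            "recursive": true
        },
        "sink": {
            "type": "BlobSink"
        }
    }
}
```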

JSON example: Copy data from FTP server to Azure Blob

This sample shows how to copy data from an FTP server to Azure Blob storage. However, data can be copied directly to any of the sinks stated in the supported data stores and formats, by using the copy activity in Data Factory.

Azure Blob output dataset

Data is written to a new blob every hour (frequency: hour, interval: 1). The folder path for the blob is dynamically evaluated, based on the start time of the slice that is being processed. The folder path uses the year, month, day, and hours parts of the start time.
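An output dataset along these lines would match that description. Treat it as a sketch: the dataset name, the linked service name (StorageLinkedService), the container and folder layout, and the text format settings are assumptions made for illustration.

```json
{
    "name": "AzureBlobOutput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "folderPath": "mycontainer/ftp/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "\t"
            },
            "partitionedBy": [
                { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
                { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
                { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
                { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
            ]
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```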

A copy activity in a pipeline with file system source and blob sink

The pipeline contains a copy activity that is configured to use the input and output datasets, and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to FileSystemSource, and the sink type is set to BlobSink.
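A pipeline definition of this shape is sketched below. The pipeline, activity, and dataset names are hypothetical, the start and end times are arbitrary, and the policy values are example settings rather than recommendations.

```json
{
    "name": "FtpToBlobPipeline",
    "properties": {
        "activities": [
            {
                "name": "FTPToBlobCopy",
                "type": "Copy",
                "inputs": [ { "name": "FTPFileInput" } ],
                "outputs": [ { "name": "AzureBlobOutput" } ],
                "typeProperties": {
                    "source": { "type": "FileSystemSource" },
                    "sink": { "type": "BlobSink" }
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "policy": {
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1,
                    "timeout": "00:05:00"
                }
            }
        ],
        "start": "2016-08-24T18:00:00Z",
        "end": "2016-08-24T19:00:00Z"
    }
}
```

The scheduler section of the activity is what makes the copy run hourly, and it is expected to line up with the availability settings of the output dataset.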