Using OAuth 2.0 with Talend to Access Google APIs

This tutorial was originally written for my website in 2014, using version 5.5.1 of Talend. The Google screenshots may be slightly out of date, but you should still be able to make use of it. I have attached a copy of the Job and a copy of the parameters file I used for it.

Google make use of the OAuth 2.0 protocol for authentication for (I think, but am happy to be corrected) all of their services. They do a pretty good job of describing the protocol, and once you have read through their documentation you should have a better idea of what you are doing, which should make this easier. The Google documentation is here.

The first thing that needs to be done is to create a Google Project. This is described below....

Once you have logged in, you should see a page with content as below....

To create a project, click on the "Create Project" button circled in red. A popup will appear for you to fill out the "Project Name" and "Project Id". It is probably best to leave the "Project Id" as the randomly generated value you are given, as none of the values I have ever changed it to have worked.

This tutorial will be built as if it is going to be used to access Google Drive. Accessing other Google tools is done in the same way. You need to specify which APIs you require when you register a project.

Ensure that the "I have read and agree..." tick box is ticked (oh, and remember to actually read it).

Then press the "Create" button.

After a few seconds you will see the following screen appear....

Next we need to give the project access to certain APIs. In this case we will give the project access to Google Drive. To do this, click on the "APIs & auth" link (circled in red). When the tree expands, click on the "APIs" link. The next screen will appear...

You may see some other APIs automatically selected. You can leave those or (as I have done) remove them. Then select the "Drive API" and "Drive SDK" (or whichever Google products you wish to use). As I said earlier, this tutorial is an example of how to access Google Drive, but it can be followed to give access to any of the Google applications.

Once the APIs that are required have been selected, click on the "Credentials" link to see the screen below....

Here is where we create our OAuth 2.0 Client ID. To start the process click on the "Create new Client ID" button (circled in red). This will reveal the screen below....

In this example we are using the "Web application" application type. While this isn't necessarily the best type to choose for Talend, it doesn't impose any limitations on what it can be used with. It does mean that a user will need to log in and grant access the first time the Job is run, but the "refresh token" returned at that point can then be used to create future "access tokens" without any user interaction. This is explained later when describing the Talend Job.

You can see that the "AUTHORIZED JAVASCRIPT ORIGINS" and "AUTHORIZED REDIRECT URI" both contain "http://localhost". We are not using any JavaScript, so we don't need to worry too much about the "AUTHORIZED JAVASCRIPT ORIGINS". The "AUTHORIZED REDIRECT URI" is the URI that Google redirects the browser to once the user has granted access; the authorisation code (not the access token itself) is appended to it as a "code" query parameter. This is described later.

Once you click on the "Create Client ID" button, you will see the following section appear on the screen. This holds all of the details you will need for your Talend Job...

The "CLIENT ID" and "CLIENT SECRET" are required by the Talend Job in order to get refresh tokens and access tokens.

Now we can look at building the Talend Job.

The "Retrieve Google Access Code" Job

This Job is intended to be used as a child job by other Talend Jobs that require access to Google products. The purpose of this Job is simply to return an access token.

This Job isn't terribly complicated, but there are lots of IF Conditions to control the data flow. These accommodate the several scenarios that might be hit when retrieving an access token. A screenshot of the Job can be seen below....

As explained earlier, this tutorial will not go into much detail about configuring components. Each of the numbers in the screenshot above corresponds to an area that needs a bit of explanation. If there is anything that you feel is not described adequately, please feel free to leave a comment or question below and I will get back to you.

1) Reading Context Variables

This subjob is used to read Context variables in from a flat file. It makes use of a tFileInputDelimited component and a tContextLoad component.

The tFileInputDelimited component makes use of a Context variable called "context_file" to point to the correct file location and requires a schema of two columns: "key" and "value".

This enables the population of the following Context variables which have been created for this Job....

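| Name          | Type   | Default Value                             |
|---------------|--------|-------------------------------------------|
| access_token  | String |                                           |
| client_id     | String |                                           |
| client_secret | String |                                           |
| context_file  | String | "C:/GoogleDrive/Config/contextGoogle.csv" |
| redirect_uri  | String |                                           |
| refresh_token | String |                                           |
| scope         | String |                                           |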

The only Context variable that needs a default value is the "context_file" variable. The rest are handled within the file referenced by that variable.
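For readers who want to see what this subjob does outside Talend, here is a minimal sketch in Java (the language Talend generates). It assumes the file has no header row and uses ";" as the field separator (Talend's default for tFileInputDelimited); `ContextFileLoader` and `parse` are hypothetical names, and a plain `java.util.Properties` stands in for the Job's context.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class ContextFileLoader {
    // Parses "key;value" lines into a Properties object, roughly what the
    // tFileInputDelimited -> tContextLoad pair does. Separator is an assumption.
    static Properties parse(String content) {
        Properties context = new Properties();
        for (String line : content.split("\\R")) {
            int sep = line.indexOf(';');
            if (sep <= 0) {
                continue; // skip blank or malformed lines
            }
            String key = line.substring(0, sep).trim();
            String value = line.substring(sep + 1).trim();
            context.setProperty(key, value); // empty values are kept as ""
        }
        return context;
    }

    public static void main(String[] args) throws IOException {
        // args[0] plays the role of context_file,
        // e.g. C:/GoogleDrive/Config/contextGoogle.csv
        Properties context = parse(Files.readString(Path.of(args[0])));
        context.list(System.out);
    }
}
```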

2) "Access Token Empty And Refresh Token Not Empty" (Run If)

This "Run If" link tests to see if the "access_token" Context variable is empty and the "refresh_token" is not empty. If this test is "true", then the next phase is to generate an access token from the refresh token. The code used is below...

This "Run If" link tests to see if the "access_token" Context variable is not empty and the "refresh_token" is not empty. If this test is "true", then the next phase is to test the access token. The code used is below...

This "Run If" link tests to see if the "access_token" Context variable is empty and the "refresh_token" is empty. If this test is "true", then the next phase is to generate a new refresh token and access token. The code used is below...
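The Run If expressions themselves only appear in the screenshots. As a sketch, assuming each check is a simple null/empty test, the three conditions could be expressed like this; `RunIfConditions` and its method names are hypothetical, and in the Job each method body would be written directly against `context.access_token` and `context.refresh_token`.

```java
class RunIfConditions {
    // Treat null and blank the same, since an empty field in the file may load as either.
    static boolean isEmpty(String value) {
        return value == null || value.trim().isEmpty();
    }

    // Access token empty, refresh token present: generate an access token from the refresh token.
    static boolean refreshAccessToken(String accessToken, String refreshToken) {
        return isEmpty(accessToken) && !isEmpty(refreshToken);
    }

    // Both present: test the existing access token.
    static boolean testAccessToken(String accessToken, String refreshToken) {
        return !isEmpty(accessToken) && !isEmpty(refreshToken);
    }

    // Both empty: start the browser-based authorisation from scratch.
    static boolean authoriseFromScratch(String accessToken, String refreshToken) {
        return isEmpty(accessToken) && isEmpty(refreshToken);
    }
}
```

Typed into a Run If link, the first condition might look something like `(context.access_token == null || context.access_token.trim().isEmpty()) && !(context.refresh_token == null || context.refresh_token.trim().isEmpty())`. Note that none of these three conditions fires when an access token exists without a refresh token.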

This "Run If" link tests to see if any rows have come from the tLogRow_2 component. This is done to prevent the code from following this path if no rows were output. The code used is below...

((Integer)globalMap.get("tLogRow_2_NB_LINE"))>0

10) "tJava_3" (tJava)

This component is used to reset the "access_token" and "refresh_token" to an empty string if trying to acquire an access token from the refresh token fails. It also points the user to where to go to revoke access to the Talend Job so that the process can be started again from scratch. This should not be a common situation, but needs to be handled. The code for this can be seen below....
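The tJava_3 code is only shown as a screenshot. A minimal sketch of the same idea follows, with `java.util.Properties` standing in for the Job's context (an assumption; the Job sets the values on its own context object). The revocation URL used here is the current Google Account permissions page and may differ from the one in the original Job; `ResetTokens` is a hypothetical name.

```java
import java.util.Properties;

class ResetTokens {
    // Clears both tokens so that the next run starts the authorisation from scratch,
    // then tells the user where to revoke the Job's existing access.
    static void reset(Properties context) {
        context.setProperty("access_token", "");
        context.setProperty("refresh_token", "");
        System.out.println("The refresh token could not be used to obtain a new access token.");
        System.out.println("Revoke this application's access at https://myaccount.google.com/permissions");
        System.out.println("and then run the Job again to authorise it from scratch.");
    }
}
```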

These subjobs are used to take the current values held by the Context variables and output them to the file that holds those values. The tContextDump component needs no configuration. The tFileOutputDelimited component needs basic configuration which can be seen below...

The "File Name" value is set to the "context_file" Context variable. This is the only Context variable with a default set in the Job.

The schema needs to be a copy of the tContextDump schema. This is achieved by clicking on the "Edit schema" button and copying the input schema to the output schema.
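Conceptually, tContextDump emits one key/value row per Context variable and tFileOutputDelimited writes each row as a line. The sketch below reproduces that output in plain Java, assuming the ";" separator used when reading the file; sorting the keys is a choice made here for readability, and Talend's own ordering may differ. `ContextFileWriter` is a hypothetical name.

```java
import java.util.Properties;
import java.util.TreeSet;

class ContextFileWriter {
    // Serialises every context entry as a "key;value" line, the same shape the
    // tFileInputDelimited at the start of the Job reads back in.
    static String toDelimited(Properties context) {
        StringBuilder out = new StringBuilder();
        for (String key : new TreeSet<>(context.stringPropertyNames())) {
            out.append(key).append(';').append(context.getProperty(key)).append('\n');
        }
        return out.toString();
    }
}
```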

12) "tJava_1" (tJava)

This component is used to build a URI that is printed for the user to paste into a web browser. It is made up of several Context variables which must be set in the Context variable file. This URI is described by Google here.
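As a sketch of what such a URI contains, the following builds it from the client_id, redirect_uri and scope values. It uses Google's current authorisation endpoint (a 2014 Job may have used `https://accounts.google.com/o/oauth2/auth`) and assumes `access_type=offline` and `prompt=consent` are wanted so that a refresh token is returned; `AuthUrlBuilder` is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AuthUrlBuilder {
    static final String AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // In tJava_1 the arguments would be context.client_id, context.redirect_uri and context.scope.
    static String build(String clientId, String redirectUri, String scope) {
        return AUTH_ENDPOINT
                + "?response_type=code"           // ask for an authorisation code
                + "&client_id=" + enc(clientId)
                + "&redirect_uri=" + enc(redirectUri)
                + "&scope=" + enc(scope)
                + "&access_type=offline"          // request a refresh token as well
                + "&prompt=consent";              // show the consent screen so a refresh token is issued
    }

    public static void main(String[] args) {
        System.out.println(build("YOUR_CLIENT_ID", "http://localhost",
                "https://www.googleapis.com/auth/drive"));
    }
}
```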

This component is used to retrieve the value of the redirect URL that is returned after a successful authorisation via a web browser. This is demonstrated later. The configuration of this component can be seen below...

Ensure that the "Buttons" drop down is set as "Question".

14) "tJava_2" (tJava)

This component is used to receive the result from the tMsgBox component and extract the authorization code from it. This is used by the next component to authorise the request for an access token. This process is described by Google here.
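A minimal sketch of the extraction, assuming the pasted value is the full redirect URL with the code in its "code" query parameter (Google may percent-encode characters such as "/" in it). `AuthCodeExtractor` is a hypothetical name; it returns null when there is no code, for example when the user denied access and the URL carries an "error" parameter instead.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class AuthCodeExtractor {
    // Returns the decoded "code" query parameter of the pasted redirect URL, or null if absent.
    static String extractCode(String redirectUrl) {
        String query = URI.create(redirectUrl.trim()).getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("code")) {
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```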

It should be noted that although we are receiving JSON back, this component will automatically convert it to a DOM document with the JSON wrapped with a "ROOT" element by default.
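The request this component sends is an OAuth 2.0 authorisation-code exchange. The sketch below builds an equivalent request with `java.net.http` (Java 11+) without sending it. It assumes Google's current token endpoint (`https://oauth2.googleapis.com/token`; a 2014 Job may have used `https://accounts.google.com/o/oauth2/token`) and a form-encoded body; `TokenRequest` is a hypothetical name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class TokenRequest {
    static final String TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token";

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // The redirect_uri must match the one used when requesting the authorisation code.
    static String body(String code, String clientId, String clientSecret, String redirectUri) {
        return "code=" + enc(code)
                + "&client_id=" + enc(clientId)
                + "&client_secret=" + enc(clientSecret)
                + "&redirect_uri=" + enc(redirectUri)
                + "&grant_type=authorization_code";
    }

    static HttpRequest build(String body) {
        return HttpRequest.newBuilder(URI.create(TOKEN_ENDPOINT))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` should return JSON containing `access_token`, `expires_in` and, when offline access was requested, a `refresh_token`.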

16) "tExtractXMLField_1" (tExtractXMLField)

This component is used to retrieve the access token and refresh token from the returned JSON string which has been converted to an XML document. The configuration of this component can be seen below.....

An output schema is required. To set this up click on the "Edit schema" button circled in red. Two columns called "access_token" and "refresh_token" are required.

Ensure that the areas circled in red are configured as seen above.

The "XPath query" required for the access_token column is "./access_token".

The "XPath query" required for the refresh_token column is "./refresh_token".
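The same extraction can be reproduced with the JDK's XPath API, which helps when checking the queries against a sample response. This sketch assumes the loop XPath is "/ROOT" (matching the ROOT wrapper described above) and uses the same relative queries; `TokenXmlExtractor` is a hypothetical name. A missing element comes back as an empty string, which is the case the tJavaRow_1 IF Condition guards against.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TokenXmlExtractor {
    // Returns {access_token, refresh_token}; missing elements are returned as "".
    static String[] extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node root = (Node) xpath.evaluate("/ROOT", doc, XPathConstants.NODE); // loop XPath
            if (root == null) {
                return new String[] {"", ""};
            }
            String access = xpath.evaluate("./access_token", root);
            String refresh = xpath.evaluate("./refresh_token", root);
            return new String[] {access, refresh};
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse token response", e);
        }
    }
}
```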

17) "tJavaRow_1" (tJavaRow)

This component is used to take the "access_token" and "refresh_token" column values from the previous component and set them as the current values of the Context variables "access_token" and "refresh_token". The code to do this is below....

The "input_row...." bits of code represent the values coming in. The "context.setProperty(..." sections assign the "access_token" and "refresh_token" values.

An "IF Condition" is used to cover situations where a "refresh_token" is not received. This should not happen, but this code prevents the Job from falling over if it does.
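Since the tJavaRow_1 code is only shown as a screenshot, here is a sketch of the logic described above. A `java.util.Properties` stands in for the Job's context (an assumption), and the method parameters play the role of `input_row.access_token` and `input_row.refresh_token`; `StoreTokens` is a hypothetical name.

```java
import java.util.Properties;

class StoreTokens {
    // In the tJavaRow this would read roughly:
    //   context.setProperty("access_token", input_row.access_token);
    //   if (input_row.refresh_token != null && !input_row.refresh_token.isEmpty()) {
    //       context.setProperty("refresh_token", input_row.refresh_token);
    //   }
    static void store(Properties context, String accessToken, String refreshToken) {
        // Properties rejects null values, so fall back to an empty string.
        context.setProperty("access_token", accessToken == null ? "" : accessToken);
        // Only overwrite the stored refresh token when one was actually received.
        if (refreshToken != null && !refreshToken.isEmpty()) {
            context.setProperty("refresh_token", refreshToken);
        }
    }
}
```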

19) Read the newly set Context variables into the Job and output just the Access Token

This subjob is run at the end of the Job. It will always run, no matter which path the code has taken. It is used to return the access token that has been retrieved/generated. As it has no idea where the access token has come from, it reads the latest value from the Context variable file. As ALL Context variables will be returned from this file, a tMap component is used to filter the return values.

The tFileInputDelimited component points to the Context variable file using the context_file Context variable. It also has the schema that can be seen in the tMap "row14" table. This needs to be configured.

The tMap component can be seen below....

The filter that is used in the "access_token_return" table can be seen below...

row14.key.compareToIgnoreCase("access_token")==0

Remember that "row14" might be named differently in a version you write. If you have errors here, check the input row name.
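The tMap filter keeps only the row whose key is "access_token", ignoring case. A sketch of the equivalent filter over key/value rows, with `Map.Entry` standing in for the Talend row type (an assumption) and `AccessTokenFilter` as a hypothetical name:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AccessTokenFilter {
    // Same test as the tMap expression: row14.key.compareToIgnoreCase("access_token")==0
    static List<Map.Entry<String, String>> filter(List<Map.Entry<String, String>> rows) {
        return rows.stream()
                .filter(row -> row.getKey().compareToIgnoreCase("access_token") == 0)
                .collect(Collectors.toList());
    }
}
```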

20) "Test List Files Services" (tRESTClient)

This component simply tests the access token that the Context variable file says exists. If the test succeeds, the Job will end. If it fails, the error trigger will be used and the Job will attempt to generate a new one.

To configure this component copy the configuration shown below. Ensure that the sections circled in red are set correctly. To add the "Query parameters" use the green plus symbol circled in red.

The values required can be seen above, but you can find them below so that you can copy and paste them.....

This HTTP request is described by Google here. I have used a query to search for files with a modified date earlier than 2000/01/01, so that a successful response will return no data.

In order for the HTTP request to work, we need to provide the access token. This is done via the "Advanced Settings" tab as can be seen below....

The access token is provided by the HTTP header "Authorization". Its value must be a combination of the word "Bearer " (with a space) and the access token that has been supplied.
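Put together, the test call amounts to a GET request with a search query and a Bearer header. The sketch below uses the Drive API v3 files endpoint and its `modifiedTime` field; the 2014 Job would have used v2, where the field is called `modifiedDate`. `DriveTokenCheck` is a hypothetical name, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class DriveTokenCheck {
    static final String FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files";

    static HttpRequest build(String accessToken) {
        // No file should match, so a valid token yields an empty file list.
        String q = "modifiedTime < '2000-01-01T00:00:00'";
        String uri = FILES_ENDPOINT + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Authorization", "Bearer " + accessToken) // "Bearer " plus the token
                .GET()
                .build();
    }
}
```

A 200 response with an empty file list indicates the token is valid; a 401 indicates it is invalid or expired, which is presumably what sends the Job down the error trigger.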

The Context Variable File

Below we can see an example of what the Context variable file will need to look like when it is first run. The variables that are assigned values here must be assigned values in your version. When the Job has been run for the first time, all of the values will be populated.
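The example file itself appears only as a screenshot. As a hypothetical illustration, assuming the ";" separator and the Drive scope, a first-run file might look like this, with the placeholder values replaced by the Client ID and Client Secret from the Google project and the token values left empty:

```text
client_id;YOUR_CLIENT_ID.apps.googleusercontent.com
client_secret;YOUR_CLIENT_SECRET
redirect_uri;http://localhost
scope;https://www.googleapis.com/auth/drive
access_token;
refresh_token;
```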

This Job can be run on its own to demonstrate that it works. It will print the access token to the System.out. It can also be used as a child Job that returns a key/value pair holding the access_token to be used by the parent. This section will demonstrate the Job being run as a standalone Job.

1) Running the Job

When running for the first time, we need to make sure that the Context variable file is fully configured minus values for the refresh_token and access_token (as seen above). Once that is sorted, load the Job and click on the "Run" button (circled in red).

This will produce a string in the System.out. Copy this string (circled in red) and paste it into a web browser.

2) Authorise the Talend Job with Google

When the Google authentication page loads, click on the "Accept" button, as below....

3) Copy the Authorisation Redirect URL

If the authorisation has worked, a redirect URL (like below) will be returned. Copy it.

4) Pass the Redirect URL back to the Talend Job

Paste the value copied from the web browser address bar into the message box and click "OK". The Job will then continue.

5) The Access Token is Generated

As can be seen below, the Access Token will be displayed at the bottom of the System.out output (circled in red). It will also be added to the Context variable file along with the Refresh Token.

Refreshing the Access Token from the Refresh Token

After the Refresh Token and Access Token have been generated for the first time, there should be no need for further human interaction unless the Refresh Token has been lost. To show this, open the Context variable file and add a few random characters to the access_token value. Then run the Job as above. You will notice that no user interaction is required and that a new access token is generated.
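Behind the scenes this path sends a refresh-token grant rather than an authorisation-code exchange. A sketch of that request, assuming Google's current token endpoint and a form-encoded body (`RefreshTokenRequest` is a hypothetical name); the response normally contains a new `access_token` but no new `refresh_token`, so the stored refresh token keeps being reused.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RefreshTokenRequest {
    static final String TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token";

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // No browser step and no redirect_uri: only the stored refresh token and client credentials.
    static String body(String refreshToken, String clientId, String clientSecret) {
        return "refresh_token=" + enc(refreshToken)
                + "&client_id=" + enc(clientId)
                + "&client_secret=" + enc(clientSecret)
                + "&grant_type=refresh_token";
    }

    static HttpRequest build(String body) {
        return HttpRequest.newBuilder(URI.create(TOKEN_ENDPOINT))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```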

Resetting the Refresh Token

You may find that, for whatever reason, the Refresh Token is not working or has been lost. If this is the case, the Talend Job will need its access revoked before the Job can be run again from scratch. This is an unusual situation, but needs to be covered. To emulate this, open the Context variable file and alter some of the characters of the access_token and refresh_token values. Then run the Job. You will see a screen like the one below, telling you to revoke access and giving you a URL to use....

Open the URL in a web browser and revoke access to your Talend Job (using the name you specified when you created the Google Project). Then start from scratch.

Running the Job as a child Job

This Job can be run like any other child Job in Talend. Remember to configure a schema for the child Job that returns exactly what is output by the tBufferOutput component.

Re: Using OAuth 2.0 with Talend to Access Google APIs

It has been a while since I wrote this Job (I believe I did it in 2014), so the flow may have changed slightly. But the point of this Job is that once the first access token is received, the refresh token that comes with it is saved into a CSV file. The next time the access token fails and a new one is required, the refresh token will automatically be used to create one.

Using the refresh token to retrieve a new access token has a slightly different flow from the original flow to generate both a refresh token and access token. This may be what is tripping you up. However, if you have received a refresh token with your first access token, you should be able to use it to generate a new access token.

Re: Using OAuth 2.0 with Talend to Access Google APIs

Actually, we are trying to make a POST (JSON) API call using OAuth 2.0 authentication. Below are the steps we followed with a curl command. We need to implement exactly the same thing in Talend.