Problem Description
After updating Java, you get the following error message when trying to run a Job in Talend Studio 6.x:
Cannot run program "<SOME_PATH>/java.exe": CreateProcess error=2 The system cannot find the file specified.
Root Cause
In the error message, <SOME_PATH> is the full path where the previous Java version was installed. During the Java update, that version was removed and replaced with a new one, so the path is no longer valid and the Java executable cannot be found.
Solution
Update the Studio preferences to point to the new, valid Java interpreter location.
In Studio, navigate to Window > Preferences > Talend > Java interpreter.
Click Browse and navigate to the new java.exe file.
Click Apply, then click OK.

Problem Description
When importing a Talend DI Job in TMM, the import log file displays the following warning:
Context parameter type mismatch. XXXXX is not a valid numeric. Parameter 'aParam' is set to default 'null' value
Root Cause
The context variable has a value that is incompatible with its declared type, for example a non-numeric string assigned to a numeric context variable.
Solution
You can ignore the warning, but the context variable will remain null and could cause lineage issues in TMM.
Resolution
Provide an integer value to the context variable.
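The warning can be reproduced outside TMM: a non-numeric string simply cannot be parsed as an integer, which is why the parameter falls back to a null default. A minimal sketch (the class and method names are illustrative, not TMM code):

```java
// Illustrative sketch, not TMM code: a non-numeric string cannot be
// parsed as an integer, so the parameter defaults to null.
public class ContextParamCheck {
    // Returns the parsed Integer, or null when the value is not numeric,
    // mimicking the "set to default 'null' value" behavior in the warning.
    static Integer parseOrNull(String value) {
        try {
            return Integer.valueOf(value);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("XXXXX")); // null: triggers the warning
        System.out.println(parseOrNull("42"));    // 42: valid numeric value
    }
}
```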

Problem Description
When viewing import log files, regardless of the bridge used, the following error stack appears:
Error opening jar file C:\Talend\TMM\install\TMM-6.4\TalendMetadataManagement\tomcat\bin\..\..\java\SasBi.jar
java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:219)
at java.util.zip.ZipFile.<init>(ZipFile.java:149)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at MITI.bridges.javabridgeinterface.BridgeLoader.<init>(BridgeLoader.java:52)
at MITI.mimb.executable.MimbExecutable.generateRequestActions(MimbExecutable.java:631)
at MITI.mimb.executable.MimbExecutable.executeRequest(MimbExecutable.java:507)
at MITI.mimb.executable.MimbExecutable.executeRequest(MimbExecutable.java:489)
at MITI.providers.mimb.MimbExecutionThread.run(MimbExecutionThread.java:178)
Root Cause
This exception is caused by the SasBi.jar file, which was renamed to Sas.jar in recent cumulative patches. The original SasBi.jar is still on the file system from the previous install, but because its name changed, it is not overwritten by patches and must be removed manually.
Solution
Remove the ${TMM_HOME}/TalendMetadataManagement/java/SasBi.jar file.
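If you want to confirm which jar is stale before deleting it, a small sketch like the following (not part of TMM; the directory path is an example) flags any jar in a directory that fails to open, which is exactly the check that raises the ZipException above:

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hedged sketch (not part of TMM): flags jar files in a directory that
// cannot be opened as zip archives.
public class JarCheck {
    // Returns true when the jar opens cleanly as a zip archive.
    static boolean opensCleanly(File jar) {
        try (JarFile jf = new JarFile(jar)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example directory; substitute your own
        // ${TMM_HOME}/TalendMetadataManagement/java location.
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) return;
        for (File jar : jars) {
            if (!opensCleanly(jar)) {
                System.out.println("Stale or corrupt jar: " + jar);
            }
        }
    }
}
```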

Problem
When using a tHiveInput component in a standard Job, a query that mixes lowercase and uppercase column names returns all columns, even if the schema contains only lowercase names. In a Big Data Batch Job, the same query with the same schema does not return the column whose name is given in uppercase in the query.
Root Cause
This is a known behavior, and Hive is not at fault. Hive is case insensitive and always uses lowercase, regardless of the case used in the Studio. The difference between a DI Job and a Spark Job is that Spark uses Avro.
With Hive, you can request fields in any case, but Avro is case sensitive. Moreover, Avro field names are created with the case used in the Hive query, while fields are retrieved with the case used in the Studio schema.
This means that if the field names in your Hive query do not match the Studio schema, the values are retrieved from Hive but not found in the Avro payload.
Example (query case differs from the Studio schema):
Studio schema: col1, COL2
Hive stores the fields as: col1, col2
The Hive query "SELECT col1, col2 FROM ..." retrieves columns col1, col2
An Avro payload is created with fields col1, col2
Studio then looks up fields col1, COL2
Result: col1, null
Example (query case matches the Studio schema):
Studio schema: col1, COL2
Hive stores the fields as: col1, col2
The Hive query "SELECT col1, COL2 FROM ..." retrieves columns col1, col2
An Avro payload is created with fields col1, COL2
Studio then looks up fields col1, COL2
Result: col1, COL2
Solution
In a Spark Job, the case of the column names in the query must match the case in the schema.
Workaround
Alternatively, use only lowercase column names in both the schema and the query.
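The behavior can be illustrated with a simplified sketch in which a plain Map stands in for the Avro record (field names follow the failing example above; this is not Talend code):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration (not Talend code): a plain Map stands in for
// the Avro record. Avro field lookup is case sensitive, so a field
// written as "col2" is not found under the name "COL2".
public class AvroCaseDemo {
    // Builds a record whose field names follow the Hive query case:
    // "SELECT col1, col2 FROM ..."
    static Map<String, Object> avroRecordFromQuery() {
        Map<String, Object> record = new HashMap<>();
        record.put("col1", "a");
        record.put("col2", "b");
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> record = avroRecordFromQuery();
        // Studio then looks the fields up with the schema case: col1, COL2
        System.out.println(record.get("col1")); // a
        System.out.println(record.get("COL2")); // null -> column appears empty
    }
}
```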

This article describes how to stitch several files located on the same drive to a Talend Job in TMM 6.4.
You start with a very simple Job that reads from an input file, then writes to an output file.
The input file is a CSV file with full pathname: C:\tmp\tmm-inout\file-in.csv.
The output file is also a CSV file with full pathname: C:\tmp\tmm-inout\file-out.csv.
The Talend job
The tMap in the job
Use the Talend DI bridge to harvest the Job into TMM.
The Talend job harvested in TMM
Put it into a configuration and open the connection editor.
Connection editor
Notice that, although the Job has both an input and an output file, there is only one connection in the editor. This is not an issue: like other ETL/DI tools, TMM factorizes data connectors as much as possible to minimize stitching work, so only one connection is needed.
Note: If the files are located on different drives, the connection editor shows several connections, and you need a separate model for each.
The files are located on different drives
The connections can't be factorized
Back to the single-drive case, the next step is to harvest the files into TMM. In TMM 6.4, you can only use the initial file data catalog beta bridge, known as FlatExcelFile. This bridge will be deprecated and removed in TMM 7.0, the next main release, and replaced by the new official file system data catalog bridges.
Settings for the Flat Files model
The files harvested in TMM
Drop this model into the configuration created earlier, and go back to the connection editor. Select the store, then the store schema.
Selecting the "Files" store in the editor
Selecting the schema in the editor
The configuration shows no warning for the connection on the Talend Job, and the stitching reporter confirms it is all fine.
Stitching reporter
From the ExplorerUI, you get the expected data impact...
Data impact for "key" field
...and data lineage.
Data lineage for "valeur" field
The same situation can occur with a database server used by a DI/ETL Job in which several schemas are used as input, output, or both. The second example below briefly illustrates this with a PL/SQL script instead of a Talend Job.
The PL/SQL script is harvested into TMM:
insert into tmmuser.month_sales(bname, avgprice, quantity)
select bname, avg(sprice), sum(quantity) from tmmuser.movement group by bname;
PLSQL harvested in TMM
The database is harvested too.
Source and destination tables in the database model
Stitch all this in a configuration.
Selecting the DB store
Selecting the DB store schema
Architecture diagram
The data lineage is performed as expected.
Data lineage from destination table in the ExplorerUI

In Talend Jobs, you can use the implicit context load feature, which loads context variables dynamically at run time, so no value needs to be provided for these variables in the Job itself. For more information, see Using the Implicit Context Load feature in the Talend Help Center.
A very simple Job using this feature would look like the one below.
Job design
Database settings with context variables
Context settings: no values given
Job settings with focus on implicit context load
Context file content
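Such a context file is typically a simple delimited key/value list. A hypothetical example, assuming ";" is configured as the field separator in the Job (the variable names and values below are examples only):

```
host;dbserver.example.com
port;3306
database;sales
user;talend
```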
When configuring the model for this Job in Talend Data Catalog, you must provide the context file location in the Context File Directory parameter so that the bridge can resolve the context variables.
Model settings
After importing the model, you can see that the DB connection is resolved with expected values.
Connection in the job model
And you end up with proper stitching.
Stitching completed
If the Context File Directory parameter is omitted or wrongly set, the Job model will look as follows, and no stitching will be possible.
Job model without proper context file location

Talend Version
6.4
Summary
Additional Versions
Product
Talend Metadata Manager
Component
Metadata Excel Format bridge
Problem Description
Using the Metadata Excel Format bridge to import a FileDelimited model loads nothing into TMM, although no error is thrown: the imported model shows no content.
Problem root cause
The schema name is wrongly set in the Excel file. This can be verified with the Validate command of the provided TMM Excel add-on.
Solution or Workaround
Remove the schema name and reimport the model. TMM then shows the expected content.
JIRA ticket number

Talend Version
6.4.1
Summary
Additional Versions
Product
Talend Big Data
Component
tHiveConnection
Problem Description
A Cloudera distribution is used, and the connection is configured to submit Hive queries to a specific Job queue using an additional JDBC property:
"jdbc:hive2://<host>:<port>/dbName;sess_var_list?mapred.job.queue.name=your_queue_name".
At runtime, the Job fails with the error message:
[FATAL]: <XXX> job returns 1. It doesn't terminate normally.
Exception in component tHiveConnection_1
java.lang.IllegalArgumentException: Illegal character in path at index nnnn:
Problem root cause
The JDBC property was set with spaces on either side of the = character:
"jdbc:hive2://<host>:<port>/dbName;sess_var_list?mapred.job.queue.name = your_queue_name"
Solution or Workaround
In the JDBC property, do not include spaces around the = character.
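The failure can be reproduced with plain Java: with the jdbc: prefix dropped, the rest of the connection string parses as a standard URI, and the space makes it invalid (host, port, and queue name below are placeholders):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the root cause (host, port, and queue name are placeholders):
// with the "jdbc:" prefix dropped, the rest of the connection string is a
// standard URI, and the spaces around "=" make it invalid.
public class JdbcUrlCheck {
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "hive2://host:10000/db;sess?mapred.job.queue.name=q1";
        String bad  = "hive2://host:10000/db;sess?mapred.job.queue.name = q1";
        System.out.println(isValidUri(good)); // true
        System.out.println(isValidUri(bad));  // false: space is an illegal character
    }
}
```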
JIRA ticket number

Question
How can I submit Hive Queries to a specific Job queue in tHiveConnection when using a Cloudera distribution?
Answer
This can be achieved by specifying the queue name in the JDBC parameter list.
Syntax:
jdbc:hive2://<host>:<port>/dbName;sess_var_list?mapred.job.queue.name=your_queue_name
In Talend Studio, specify it as a property in the Additional JDBC Settings field of the tHiveConnection component.
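Outside Studio, the equivalent plain-JDBC connection URL can be assembled the same way. A hedged sketch (host, port, database, and queue name are placeholders; the helper method is illustrative):

```java
// Hedged sketch (host, database, and queue names are placeholders):
// builds a hive2 connection URL that routes queries to a specific queue
// via a Hive configuration property appended after "?".
public class HiveUrlBuilder {
    static String hiveUrl(String host, int port, String db, String queue) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
             + "?mapred.job.queue.name=" + queue;
    }

    public static void main(String[] args) {
        System.out.println(hiveUrl("hiveserver", 10000, "default", "etl_queue"));
        // jdbc:hive2://hiveserver:10000/default?mapred.job.queue.name=etl_queue
    }
}
```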

Question
Once a workflow glossary has been enabled, how do I assign security roles to it?
Answer
Workflow roles are assigned to categories in the Explorer UI, so the glossary must first be in a configuration. Then select the category you want to set roles on, and choose Assign Workflow Roles from the top-right menu.
Talend Data Catalog on-line help reference for glossary workflow
The Business Glossary