
The purpose of a Columnar (NoPI) table is to spread the rows evenly across the AMPs. This is why a NoPI table is often used as a staging table.

Columnar Table Fundamentals

Columnar Tables must be NoPI Tables, meaning they have No Primary Index (NoPI).

The NoPI brings even distribution to the table.

Columnar Tables allow Columns to be Partitioned.

An AMP still holds all the data for a given row, but the row is partitioned vertically.

Each column is placed inside its own individual Container.

All Containers hold the same number of rows, in the exact same order.

A single column or multiple columns can be placed inside a container.

Each container looks like a small table for I/O purposes.

Add up all the containers and you rebuild the row.

Columnar Tables make sense when users query only certain columns.

When a row is deleted, it is NOT physically deleted but merely marked as deleted.

Normal table vs columnar table
The two tables in this comparison contain the same Employee data, but one is a columnar table. Employee_Normal places 3 rows on each AMP, each with 5 columns. The other table, Employee_Columnar, has 5 Containers, each holding one column.
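
As a minimal sketch, a columnar version of the Employee table could be created like this (the column names are illustrative; NO PRIMARY INDEX and PARTITION BY COLUMN are the key clauses):

CREATE TABLE Employee_Columnar
(
  Employee_No INTEGER,
  Dept_No     INTEGER,
  First_Name  VARCHAR(20),
  Last_Name   VARCHAR(20),
  Salary      DECIMAL(10,2)
)
NO PRIMARY INDEX
PARTITION BY COLUMN;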

In TPT, selecting the wrong operator to process the CLOB or BLOB data type terminates the job.

TPT Script Structure

Building TPT Scripts

TPT uses a SQL-like scripting language for extract, basic transformation, and load functions. This easy-to-use language is based on SQL, making it familiar to most database users. All operators use the same scripting language. This represents an improvement over the individual utilities, each of which has its own unique scripting language. A single script can be used to define multiple operators and schemas to create complex extracting and loading jobs.

Only a few statements are needed to build a TPT script. The basic statements are:

DEFINE JOB

Defines the overall job and packages together all following DEFINE and APPLY statements.

DEFINE SCHEMA

Defines the structure of a data object in terms of columns of specific data types. A given schema definition can be used to describe multiple data objects. Also, multiple schemas can be defined in a given script.

DEFINE OPERATOR

Defines a specific TPT operator to be used in this job.

DEFINE DBMS

Defines an instance of a database server to be used in this job.

APPLY

A processing statement used to initiate a TPT load, update, or delete operation.

Note: Job variables can be maintained in a separate job variables file, or passed directly on the command line. An example script is sketched below.
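
Putting these statements together, a minimal sketch of a complete script might look like the following. The schema, operator, and table names (Employee_Schema, File_Reader, Employee_Stage, and so on) and the job variables (@SourceFile, @TdpId, ...) are illustrative, not from the original text:

DEFINE JOB load_employee_job
DESCRIPTION 'Load a staging table from a flat file'
(
  DEFINE SCHEMA Employee_Schema
  (
    Employee_No VARCHAR(10),
    Dept_No     VARCHAR(10),
    Last_Name   VARCHAR(20),
    First_Name  VARCHAR(20)
  );

  DEFINE OPERATOR File_Reader
  TYPE DATACONNECTOR PRODUCER
  SCHEMA Employee_Schema
  ATTRIBUTES
  (
    VARCHAR FileName      = @SourceFile,  /* job variable */
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = ','
  );

  DEFINE OPERATOR Load_Op
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR TdpId        = @TdpId,
    VARCHAR UserName     = @UserName,
    VARCHAR UserPassword = @UserPassword,
    VARCHAR TargetTable  = 'Employee_Stage',
    VARCHAR LogTable     = 'Employee_Stage_Log'
  );

  APPLY
    ('INSERT INTO Employee_Stage VALUES (:Employee_No, :Dept_No, :Last_Name, :First_Name);')
  TO OPERATOR (Load_Op)
  SELECT * FROM OPERATOR (File_Reader);
);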

Job execution

tbuild -f <script file name> -z <checkpoint interval>

The -z option sets the checkpoint interval to the number of seconds specified.

SET CHECKPOINT INTERVAL 160 SEC

Or

SET CHECKPOINT INTERVAL 12 MINUTES

The checkpoint interval can be specified in a job script between the last DEFINE statement and the APPLY statement(s).

tbuild

We have seen the tbuild command in many of the previous examples. This command is used to initiate a TPT job. The following key options may be used with tbuild:

-f  Specifies the filename to be used as input.

-u  Specifies job variable values which are to be applied.

-z  Specifies a checkpoint interval to be used for the client side.

-s  Specifies that job execution is to start at a specific job step.

-v  Specifies that job attributes are to be read from an external file.

-l  Specifies the latency interval, i.e., how often to flush stale buffers.

-n  Specifies that the job should continue even if a step return code is greater than 4.

Note: If the checkpoint interval is specified both in the job script and with the tbuild -z command option, the -z option takes precedence.
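
For example, the following launches a job from a script file, reads job attributes from an external file, and sets a 120-second client-side checkpoint interval (the file names are illustrative):

tbuild -f load_employee.tpt -v jobvars.txt -z 120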

Troubleshooting a Failed Job

Common Job Failures and Remedies

There are two categories of job failures. The evaluation and correction of each type of failure must be handled differently:

• Some jobs fail at launch, during execution of the tbuild statement, but before the initial job step has run.

• Some jobs launch successfully, and one or more job steps may execute successfully, but the job fails to run to completion.

The following sections describe common errors encountered by Teradata PT jobs.

When the Job Fails to Begin Running

When a job is launched but fails to begin execution, the associated errors appear in the public log. Errors are detected according to the launch sequence:

1. Teradata PT first processes the options specified in the tbuild command. If it detects tbuild command errors, the job stops.

Error types encountered: tbuild command errors

2. If Teradata PT encounters no tbuild command errors, it then parses the job script and creates a parallel job execution plan that will perform the operations specified in the APPLY statement(s) in the job script.

Error types encountered:

Preprocessor errors -- incorrect use of job variables or the INCLUDE directive.

Job script compilation errors -- syntactic and semantic errors.

3. Only when script compilation is successful and the execution plan has been generated does Teradata PT allocate resources for, and launch, the various internal tasks required to execute the job plan.

A parameterized query uses placeholders for parameters; the parameter values are provided in a separate statement at execution time.

The main purpose is to distinguish between code and data. This also prevents attackers from changing the query by injecting SQL commands.
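
In Teradata SQL this style of parameterization is typically done with the USING request modifier. A minimal sketch, with illustrative table and column names:

USING (in_emp INTEGER)
SELECT First_Name, Last_Name
FROM   Employee
WHERE  Employee_No = :in_emp;

The value for :in_emp is supplied by the client separately from the SQL text (for example, from a BTEQ .IMPORT file), so the query text itself never changes.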

A new Data Dictionary table, DBC.DBQLParamTbl, logs the parameter values used by parameterized queries.

Light-Weight Redistribution:

Light-Weight Redistribution (LWR) is also referred to as Single Sender Redistribution (SSR). With this feature, the Teradata Database optimizer can switch from an all-AMP row redistribution to a few-AMP row redistribution.

While executing the query, if the optimizer determines that a query step is eligible for SSR, the retrieved rows are redistributed by hash code to only a few receiver AMPs. Without SSR, rows are normally redistributed across all AMPs.

SELECT OADD_MONTHS (DATE '2014-04-15', 2), OADD_MONTHS (DATE '2008-02-29', 1);
---------------------------------------------------------------------------------------------------------------------
OADD_MONTHS(2014-04-15,2) OADD_MONTHS(2008-02-29,1)
2014-06-15 2008-03-31
Since February 29 is the last day of February, March 31 is returned, because 31 is the last day of March.

SELECT TRUNC(CAST('2014/06/05' AS DATE), 'D') (FORMAT 'yyyy-mm-dd');
----------------------------------------------------------------------------------------------------
TRUNC('2014/06/05','D')
2014-06-01
The date was truncated to the first day of that week.

SELECT ROUND(CAST('2003/09/20' AS DATE), 'RM') (FORMAT 'yyyy-mm-dd');
----------------------------------------------------------------------------------------------------------
ROUND('2003/09/20','RM')
2003-10-01
Since the day is greater than or equal to 16, the date is rounded to the beginning of the next month.

Teradata 14 released many Domain Specific Functions. Here we discuss the new numeric functions, which are equivalents of Oracle functions.
The following numeric functions are available in the Teradata 14 release.

SIGN -- Returns the sign of a value.

TRUNC -- Truncates a numeric value.

ROUND -- Rounds a numeric value.

GREATEST -- Returns the highest value from a list of given values.

LEAST -- Returns the lowest value from a list of given values.

TO_NUMBER -- Converts a string to a number via a format string.

CEILING -- Returns the smallest integer not less than the input parameter.

FLOOR -- Returns the largest integer equal to or less than the input parameter.
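
A quick, illustrative query exercising several of these functions (the values are mine; expected results are shown as comments):

SELECT SIGN(-42)            -- -1
     , TRUNC(123.456, 1)    -- 123.4
     , ROUND(123.456, 2)    -- 123.46
     , GREATEST(3, 9, 5)    -- 9
     , LEAST(3, 9, 5)       -- 3
     , TO_NUMBER('123.45')  -- 123.45
     , CEILING(2.1)         -- 3
     , FLOOR(2.9);          -- 2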

Teradata 14 also added the NUMBER data type and new string functions, and many of the new functions support regular expressions. These new functions are designed to be compatible with Oracle.
Here we discuss the following domain-specific regular expression functions:

REGEXP_SUBSTR

REGEXP_REPLACE

REGEXP_INSTR

REGEXP_SIMILAR

REGEXP_SPLIT_TO_TABLE

Each function is discussed in detail below.

REGEXP_SUBSTR
This function extracts a substring from a source string that matches a regular expression pattern.
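
A small illustrative example (the literals are mine, not from the original post):

SELECT REGEXP_SUBSTR('Teradata 14 Release', '[0-9]+', 1, 1, 'c');

This returns '14': the first occurrence of one or more digits, matched case-sensitively via the 'c' argument.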

REGEXP_REPLACE
This function replaces occurrences of a regular expression pattern in a source string with a replacement string. For example, the 1st occurrence of 'God's' in a source string can be replaced with 'Mother', with the 'c' argument making the match case-sensitive.

SELECT REGEXP_REPLACE('I love the buzz-buzz buzzing of the bee', 'buzz', 'BUZZ', 1, 2, 'c');
In this example, the 2nd occurrence of 'buzz' is replaced with 'BUZZ', with the 'c' argument making the match case-sensitive.

SELECT REGEXP_REPLACE ('ABCD123-$567xy','[^0-9]*','',1,0,'i');
Here every non-numeric character is replaced with the empty string, so only the numeric characters remain.

REGEXP_INSTR
This function searches the source string for a match to a regular expression pattern and returns the beginning or ending position of that match.
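
For example (an illustrative query; the 0 requests the beginning position of the match, and 'c' makes it case-sensitive):

SELECT REGEXP_INSTR('choose a chocolate chip', 'ch', 1, 3, 0, 'c');

This returns 20: the beginning position of the 3rd occurrence of 'ch'.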

REGEXP_SIMILAR
This function compares a source string to a regular expression and returns an integer value.
1 (true) if the entire string matches regexp_arg
0 (false) if the entire string does not match regexp_arg
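
A sketch of the kind of query that produces the result list below; the Employee table and Full_Name column are assumed for illustration:

SELECT Full_Name
FROM   Employee
WHERE  REGEXP_SIMILAR(Full_Name, '(Smith|John) B(y|i)rd', 'c') = 1;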

The query returns the names of employees that match:
Smith Byrd
Smith Bird
John Bird
John Byrd

REGEXP_SPLIT_TO_TABLE
This table function splits a source string into a table of strings using a regular expression as the delimiter.
General Syntax:
REGEXP_SPLIT_TO_TABLE (source_string, regexp_string, match_arg)

REGEXP_SPLIT_TO_TABLE Function Examples:

CREATE TABLE split_table_latin
(
  id      INTEGER,
  src     VARCHAR(100) CHARACTER SET LATIN,
  pattern VARCHAR(100) CHARACTER SET LATIN,
  match   VARCHAR(100) CHARACTER SET LATIN
);
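
The function can then be invoked as a table function over those rows. A sketch of the invocation; treat the exact RETURNS clause as an assumption based on the typical documentation pattern:

SELECT * FROM TABLE
(
  REGEXP_SPLIT_TO_TABLE(split_table_latin.id, split_table_latin.src,
                        split_table_latin.pattern, 'c')
  RETURNS (outkey   INTEGER,
           tokennum INTEGER,
           token    VARCHAR(100) CHARACTER SET LATIN)
) AS t;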

A) Explain the EXPLAIN: Check the EXPLAIN plan to see exactly how Teradata will execute the query. Try to understand the basic keywords in the EXPLAIN plan, such as confidence level, the join strategy used, and whether redistribution is happening.

B) Collect STATS: The statistics on the columns used in join conditions should be up to date. Secondary Indexes without proper STATS can be of little or no help. Check the STATS status of the table.

C) Use Proper PI: Check whether the Primary Index is properly defined on each of the tables in the query, and check whether the PI of the target table being loaded is unique.

D) Use PPI: If a Partitioned Primary Index is created on a table, try to use it in the filter condition; not using it will degrade performance.

E) No FUNCTIONS in Conditions: Try to avoid using functions in join conditions. For example, applying COALESCE or TRIM to a join column causes high CPU consumption.

G) Same column DATA TYPES: Define the same data type for joining columns.

H) Avoid IN clause in filter conditions: When there is a huge number of values in the WHERE condition, the better option is to insert those values into a volatile table and use the volatile table with an INNER JOIN in the main query, as shown in the sketch after this list.

I) Use Same PI in Source & Target: PI columns also help when saving data to disk. If the source and target have the same PI, data can be copied very efficiently from source to target.

J) Collect STATS on VOLATILE tables: Collecting stats on volatile tables where required can save AMPCPU. Remove stats that are present but not being used.
If the volatile table has a UNIQUE PI, go for sample stats rather than full stats.

K) DROP volatile tables explicitly: Once a volatile table is no longer required, drop it. Don't wait for the complete procedure to be over. This frees some spool space immediately and can be very helpful in avoiding the No More Spool Space error.

L) NO LOG for volatile tables: Create volatile tables with the NO LOG option.

Q) Try MSR: If the same target table is loaded multiple times, try a Multi-Statement Request (MSR) for those sections. This speeds up the final MERGE step into the target table, and you may see a good CPU gain.
R) Try OLAP Functions: Check whether replacing a correlated subquery with an OLAP function results in AMPCPU savings.

S) Avoid DUPLICATE data: If the join columns in the tables involved in the query have duplicates, use DISTINCT or GROUP BY, load the result into a volatile table, collect stats, and use the volatile table.

T) Use Proper JOINS: Don't use RIGHT, LEFT, or FULL OUTER joins where an INNER join is sufficient.

U) Use proper ALIASes: Check the aliases in the joins. A small mistake can lead to a product join.

V) Avoid CAST: Avoid unnecessary casting of DATE columns. Once defined as DATE, date columns can be compared against each other even when they are in different formats, because internally DATE is stored as an INTEGER. CAST is mainly required when you have to compare a VARCHAR value as a DATE.
W) Avoid UDFs: Most of the functions needed for data manipulation are available in Teradata, so avoid User Defined Functions.

X) Avoid FULL TABLE SCANs: Try to avoid FTS scenarios: a Secondary Index should be defined on the columns used in joins or as an alternate access path, and stats should be collected on the SI columns; otherwise the optimizer may choose an FTS even when an SI is defined on that column.

Y) Avoid using IN/NOT IN: For a large list of values, avoid IN/NOT IN in SQL. Write the list values to a temporary table and use that table in the query.
Z) Use CONSTANTS: Use constants to specify index column contents whenever possible, instead of specifying the constant once and joining the tables. This may provide a small performance savings.
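
As a minimal sketch of the volatile table pattern from items H, J, K, and L above (table and column names are illustrative):

CREATE VOLATILE TABLE filter_vals, NO LOG              -- L) NO LOG option
(
  cust_id INTEGER
)
PRIMARY INDEX (cust_id)
ON COMMIT PRESERVE ROWS;

INSERT INTO filter_vals VALUES (1001);
INSERT INTO filter_vals VALUES (1002);

COLLECT STATISTICS COLUMN (cust_id) ON filter_vals;    -- J) collect stats where required

SELECT t.*
FROM   big_table t
INNER JOIN filter_vals v
       ON t.cust_id = v.cust_id;                       -- H) inner join instead of a long IN list

DROP TABLE filter_vals;                                -- K) drop explicitly when no longer needed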