SQL Basics – Working with ERP Data

24012010

January 24, 2010

This blog article is based on a portion of a presentation that I gave at a regional ERP user’s group meeting. While some of the information is specific to that particular ERP platform, the concepts should be general enough that the material may be applied to other environments.

Typically, a language called Structured Query Language (SQL) is used to directly communicate with the database. As with all languages, there are syntax rules that must be followed. In general, data is stored in a series of tables, which may be thought of as if they were worksheets in an Excel spreadsheet. The various tables may be joined together to provide greater detail, but great care must be taken to correctly join the tables together. The correct table joining conditions may be partially determined by examining the primary and foreign key relationships between the tables, and we will talk about that more later in the presentation.

Tips:

Relationships between tables containing related information may be determined by:

Primary key columns are often named ID, and the foreign key columns are often named table_ID, for example: ACCOUNT.ID = ACCOUNT_BALANCE.ACCOUNT_ID

Relationships may be discovered by searching for other tables in the database containing the same column names (see Data Dict Tables worksheet).

SQL Basics:

Indexes on table columns may allow a query to execute faster, but it is important that all of the beginning columns in the index are used (don’t forget the TYPE column when retrieving information from the WORK_ORDER table, or the WORKORDER_TYPE column when accessing the OPERATION table). While indexes usually help when a small amount of information is needed from a table, other methods (full table scan) are sometimes more appropriate.

Indexes usually cannot be used for those columns in the WHERE clause if the column appears inside a function name – index will not be used for TRUNC(LABOR_TICKET.TRANSACTION_DATE) = – unless a function based index is created for that function and column combination.

When multiple tables must be accessed, each column retrieved should be prefixed with the table name (or an aliased name for the table) containing the column. Prefixing the columns improves the readabilty of the SQL statement and prevents errors that happen when two tables contain columns with the same names.

In a WHERE clause, character type data should appear in single quotes ( ‘ ), and number type data should not appear in single quotes. Dates should not rely on implicit data type conversion – don’t use ’24-JAN-2010′ as there is a chance that the implicit conversion will fail in certain environments.

Information retrieved from the database using a SQL statement may be grouped to summarize the data.

Executing SQL:

Assume that we are new to SQL and just start typing a SQL statement, hoping that the database will be able to help us make a correct request – since that kind of works in Microsoft Access.

SELECT DISTINCT
*
FROM
WORK_ORDER,
OPERATION,
REQUIREMENT;

When we execute this SQL statement, the database server spins and spins (not the formal meaning of a spin), until the SQL statement finally falls over and dies (to the uninitiated, this is not supposed to happen when a query executes).

Depending on the database engine and the database administrator, it might be that the database is down for a long time, or that just the query tool that submitted the SQL statement crashes after forcing the CPUs on the server to spin excessively. Be careful about who has access to a query tool that access the database.

Simple SQL – Retrieve the part ID, description, product code, and quantity on hand for all parts:

The following is a simple SQL statement which will retrieve four columns from the PART table for all parts, essentially in random order. You may notice that my SQL statement is formatted in a very specific way – the reason for this formatting will become more clear later. Essentially, standardized formats help improve database performance (by reducing the number of hard parses) – for ad hoc SQL statements (those created for one time use), the performance difference probably will not be noticed, but when placed into various applications that execute the SQL statements repeatedly, the performance difference will be very clear.

SELECT
ID,
DESCRIPTION,
PRODUCT_CODE,
QTY_ON_HAND
FROM
PART;

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code of AAAA:

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code of AAAA with more than 10 on hand:

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code beginning with A with 10 to 100 on hand:

SELECT
ID,
DESCRIPTION,
PRODUCT_CODE,
QTY_ON_HAND
FROM
PART
WHERE
COMMODITY_CODE LIKE 'A%'
AND QTY_ON_HAND BETWEEN 10 AND 100;

Retrieve the part ID, description, product code, and quantity on hand sorted by product code, then part ID – Fixing the Random Order:

SELECT
ID,
DESCRIPTION,
PRODUCT_CODE,
QTY_ON_HAND
FROM
PART
WHERE
COMMODITY_CODE LIKE 'A%'
AND QTY_ON_HAND BETWEEN 10 AND 100
ORDER BY PRODUCT_CODE, ID;

Retrieve the product code, and total quantity on hand by product code, sorted by product code:

SELECT
PRODUCT_CODE,
SUM(QTY_ON_HAND) AS TOTAL_QTY
FROM
PART
WHERE
COMMODITY_CODE LIKE 'F%'
GROUP BY PRODUCT_CODE
ORDER BY
PRODUCT_CODE;

The above example changed the previous example quite a bit, so that only those parts with a commodity code beginning with F are returned – in the example, I want to determine the total number of parts on hand by product code (labeled TOTAL_QTY) for those parts with a commodity code beginning with F. In addition to the ORDER BY clause, a GROUP BY clause was also needed. The columns that must be listed in the group by clause are those columns in the SELECT clause which are not inside a SUM(), AVG(), MIN(), MAX(), or similar function.

Retrieve the product code, and total quantity on hand by product code, return only those with a total quantity on hand more than 100, sorted by product code:

SELECT
PRODUCT_CODE,
SUM(QTY_ON_HAND) AS TOTAL_QTY
FROM
PART
WHERE
COMMODITY_CODE LIKE 'F%'
GROUP BY
PRODUCT_CODE
HAVING SUM(QTY_ON_HAND) > 100
ORDER BY
PRODUCT_CODE;

Retrieve the top level part ID produced by all unreleased, firmed, and released work orders, include the work order, lot, part description, and quantity on hand:

Now that we know how to work with data stored in a single table, let’s take a look at an example with two tables. Each column returned from the tables should be prefixed with the table name – primarily in case where the same column name appears in both tables, but doing this also makes it easier to troubleshoot problems with the SQL statement at a later time. The following SQL statement retrieves a list of all parts produced by non-closed and non-canceled work orders that are in the system (status is unreleased, firmed, or released).

Retrieve the engineering master information for a part

Back to the original example which brought down the database server (or at the least filled the temp tablespace to its maximum size), adding in two references to the PART table, each with a different alias name. This SQL statement will retrieve the main header card, all operations, and all material requirements for a specific fabricated part. But, there is a catch. Operations without material requirements are excluded from the output. Fixing that problem requires the use of an outer join, which on Oracle is indicated by a (+) following the column name that is permitted to be NULL, and on SQL Server the outer join is indicated by an * to the side of the equality that is NOT permitted to be NULL. (Note that there are also ANSI style inner and outer joins, but these are not mentioned here).

Analyze the UNIT_MATERIAL_COST column in the PART table. For each part, find the relative cost (high to low) ranking, average cost, smallest cost, highest cost, and the total number in each group, when the parts are grouped individually by product code, commodity code, and also preferred vendor (all parts without a preferred vendor are grouped together):

SELECT
ID,
DESCRIPTION,
PRODUCT_CODE,
COMMODITY_CODE,
UNIT_MATERIAL_COST,
ROW_NUMBER() OVER (PARTITION BY PRODUCT_CODE ORDER BY COMMODITY_CODE,ID) PART_WITHIN_PC,
COUNT(1) OVER (PARTITION BY PRODUCT_CODE ORDER BY COMMODITY_CODE,ID) PART_WITHIN_PC2,
RANK() OVER (PARTITION BY PRODUCT_CODE ORDER BY UNIT_MATERIAL_COST DESC NULLS LAST) RANK_PC_COST,
AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) AVG_PC_COST,
MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) MIN_PC_COST,
MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) MAX_PC_COST,
COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) COUNT_PC,
RANK() OVER (PARTITION BY COMMODITY_CODE ORDER BY UNIT_MATERIAL_COST DESC NULLS LAST) RANK_CC_COST,
AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) AVG_CC_COST,
MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) MIN_CC_COST,
MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) MAX_CC_COST,
COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) COUNT_CC,
RANK() OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB') ORDER BY UNIT_MATERIAL_COST
DESC NULLS LAST) RANK_VENDOR_COST,
AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) AVG_VENDOR_COST,
MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) MIN_VENDOR_COST,
MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) MAX_VENDOR_COST,
COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY PREF_VENDOR_ID) COUNT_VENDOR
FROM
PART
ORDER BY
ID;

On Oracle, there are also analytical functions which allow information to be grouped together without the need for a GROUP BY clause, and each column returned could potentially be grouped using different criteria. There are several interesting analytical functions that make otherwise difficult comparisons both easy to accomplish and efficient to execute. Many of the analytical functions allow data to be summarized by groups without losing the detail contained in each row of the data, for instance we are able to select the part_ID, description, and unit_material_cost without grouping on those columns. PARTITION BY may be thought of as behaving like GROUP BY. The inclusion of ORDER BY within the OVER clause means that only those rows encountered to that point, when sorted in the specified order, will be considered.

Show the sum of the hours worked for each employee by shift date, along with the previous five days and the next five days, and the next Monday after the shift date – looking at previous and next rows in the data, using inline view:

SELECT
EMPLOYEE_ID,
SHIFT_DATE,
NEXT_DAY(SHIFT_DATE,'MONDAY') PAYROLL_PREPARE_DATE,
LAG(HOURS_WORKED,5,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV5_HOURS,
LAG(HOURS_WORKED,4,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV4_HOURS,
LAG(HOURS_WORKED,3,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV3_HOURS,
LAG(HOURS_WORKED,2,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV2_HOURS,
LAG(HOURS_WORKED,1,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV_HOURS,
HOURS_WORKED,
LEAD(HOURS_WORKED,1,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT_HOURS,
LEAD(HOURS_WORKED,2,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT2_HOURS,
LEAD(HOURS_WORKED,3,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT3_HOURS,
LEAD(HOURS_WORKED,4,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT4_HOURS,
LEAD(HOURS_WORKED,5,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT5_HOURS
FROM
(SELECT
EMPLOYEE_ID,
SHIFT_DATE,
SUM(HOURS_WORKED) HOURS_WORKED
FROM
LABOR_TICKET
WHERE
SHIFT_DATE>=TRUNC(SYSDATE-14)
GROUP BY
EMPLOYEE_ID,
SHIFT_DATE
ORDER BY
EMPLOYEE_ID,
SHIFT_DATE);

LAG and LEAD are interesting functions which permit looking at previous and next rows, when sorted in the specified order.

SQL coding is not hard to understand – as long as you build out from a simple SQL statement to the SQL statement that returns the desired output.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: