A KISS approach to naming standards in Data Services

A strict naming schema for all DS objects (projects, jobs, workflows, dataflows, datastores, file formats, custom functions) is essential when working in a multi-user environment. The central repository has no folder concept or hierarchy or grouping functionality. The only way to distinguish between objects from one grouping and another one is by name. Most effective approach for naming objects is based on prefixing.

Note: In order to display the full names of DS objects in the Designer workspace, increase the icon name length. You do this by selecting Tools –> Options from the Designer menu, then expand Designer and select General. In this window, specify the value of 64 in the “Number of characters in workspace icon name” box.

General note: Versioning should not be handled by naming conventions. So, never include a version number in an object name. Use the central repository concept for maintaining successive versions of any object.

1.Reusable objects

Object

Naming Convention

Example

Project

<project_name>

BI4B

Job

<project_name>_<job_name>

BI4B_D

Workflow contained in one job only

<project_name>_<job_name>_[XT|TF|LD|AG…]

<project_name>_<job_name>_[XT|TF|LD|AG…]_<workflow_name>

<project_name>_<job_name>_<workflow_name>

BI4B_D_XT

BI4B_D_LD_Facts

BI4B_D_Init

Workflow that is reused

<project_name>_COMMN_[XT|TF|LD|AG…]_<workflow_name>

COMMN_[XT|TF|LD|AG…]_<workflow_name>

BI4B_COMMN_Init

COMMN_ErrorHandling

Dataflow contained in one job only

<project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>

BI4B_D_LD_Opportunities

Dataflow that is reused

<project_name>_COMMN_[XT|TF|LD|AG…]_<dataflow_name>

BI4B_COMMN_LD_JobCycles

COMMN_LD_Jobs

Embedded Dataflow

<project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>_EMB

BI4B_D_LD_Employees_EMB

ABAP Dataflow

<project_name>_<job_name>_XT_<dataflow_name>_ABAP

BI4B_D_XT_KNA1_ABAP

Custom Function contained in one job only

<project_name>_<function_name>

BI4B_getDate

Custom Function that is reused

COMMN_<function_name>

COMMN_dateKey

1.1. Projects: <project_name>

Give every DS project a 5-character short name. The name has to be short, because it will be used as a prefix for the name of all reusable objects defined within the project.

E.g.: P2345

1.2. Jobs: <project_name>_<job_name>

Give every job a 5-character short name. Use < project name>_ as a prefix for the job name. The name has to be short, because it will be used as a prefix for the name of all workflows and dataflows defined within that job.

Name every workflow with <project_name>_<job name>_ as a prefix. Use COMMN_ as prefix for shared workflows, used across projects, <project_name>_COMMN_ when used in multiple jobs within a given project.

Workflows are often used to group dataflows for serial or parallel execution. In a typical ETL job, dataflows are executed in “stages”: a first set of dataflows have be executed (in parallel) before a next set can be started; and so on. A data warehouse loading job may extract data from the sources, load them into staging, optionally transform from staging-in to staging-out before loading into the core EDW and aggregating into the semantic layer.

Distinguish between job stages by extending the prefix with a 2 character code:

XT: extract from source to staging

TF: transform from staging-in to staging-out

LD: load from staging into the core EDW layer

AG: load (physically aggregate) from core to semantic layer

…

The workflow name will be used as a prefix for the name of all embedded workflows and dataflows.

E.g.: P2345_J2345_XT

Within a workflow, objects (scripts, sub-workflows, dataflows) must either all be defined in parallel or all sequentially, and will be executed as such. There is no limit to the number of objects within a workflow. When the number of objects is higher than the number of processors available, DS will internally control the execution order of embedded parallel objects. Only when there are fewer objects than the number of processors available, they will really be executed in parallel.

Complex hierarchical structures can be defined by nesting workflows. There is no limit to the number of nesting levels, either. With nested workflows, use a name (_Facts for facts extraction or load, _Dims for dimension processing…) combined with an outline numbering scheme (1, 11, 111, 112, 12, 2…).

E.g.: P2345_J2345_LD_Dims21

Some workflows may not contain dataflows at all; they only contain not reusable objects. In that case, just name the workflow according to its function.

E.g. for a workflow embedding an initialization script: P2345_J2345_Initialise

1.4. Dataflows

DS supports three types of dataflows. The dataflow names must be unique across the different types. To distinguish the embedded and ABAP dataflows from the regular ones, use a suffix in their name.

According to design and development best practices there should only be a single target table in a dataflow. Name a dataflow according to that target table.

Use <project_name>_<job name>_ as a prefix. Use COMMN_ as prefix for shared dataflows, used across projects, <project_name>_COMMN_ when used in multiple jobs within a given project. Distinguish between dataflow locations (extract, transform, load, aggregate…) by extending the prefix with a 3 character code (XT_, TF_, LD_, AG_…) as from the embedding workflow.

Name every embedded dataflow with <project_name>_<job name>_ as a prefix; use _EMB as a suffix for the dataflow name. Distinguish between dataflow locations (extract, transform, load and aggregate) by extending the prefix with a 3 character code (XT_, TF_, LD_ and AG_).

E.g.: P2345_J2345_LD_TargetTable_EMB

ABAP dataflows: <project_name>_<job_name>_XT_<dataflow_name>_ABAP

An ABAP dataflow is always used as a source in a regular dataflow. Reuse that name for the ABAP dataflow and add _ABAP as a suffix to make it unique.

E.g.: P2345_J2345_XT_S_TABLE1_ABAP

1.5. Custom Functions: <project_name>_<function_name>

Give every Custom Function a descriptive name. Use <project_name>_<job name>_ as a prefix. Use COMMN_ as prefix for shared custom functions, used across projects.

E.g.: P2345_TrimBlanksExt

2.Datastores: [SAP|BWS|BWT|HANA…]_<datastore_name>

As datastores are often used in multiple projects, they do not follow the same naming conventions as for other reusable projects.

Name a datastore in line with its physical name, and make the prefix depend on the object’s type:

Datastore Type

Database Type

Naming Convention

Example

Database

SQL Server

SQL_

SQL_OC4A1

Database

Oracle

ORA_

ORA_ITSD

Database

Teradata

TD_

TD_Staging

Database

DB2

DB2_

DB2_MDM

Database

MySQL

MySQL_

MySQL_GEB

Database

Sybase ASE

ASE_

ASE_CRN

Database

Sybase IQ

IQ_

IQ_CMDWH

SAP Applications

SAP_

SAP_MDR

SAP BW as a source

BWS_

BWS_Achme

BW as a target

BWT_

BWT_Hana

SAP Hana

HANA_

HANA_DB

Adapter

AS_

AS_Nexus

Web Services

WS_

WS_Weather

Note 1: Pay attention when choosing datastore names. A datastore name cannot be changed anymore once the object has been created.

Note 2: Landscape-related information should not be handled with datastore names. So, never include a physical system indicator (DEV, T, QA…) in a datastore name. Landscape information should be configured using datastore configurations. Create one datastore, then create a datastore configuration for every tier (development, test, QA, production…) in the landscape.

3.File formats: <project_name>_<file_format_name>

Reuse the file name for the format name of a project-specific file. Use < project name>_ as a prefix.

E.g.: P2345_ISO_Cntr_Codes

Note: Pay attention when choosing a file format names. A file format name cannot be changed anymore once the object has been created.

4.Not reusable objects

Because not reusable objects are only defined within the context of a workflow or a dataflow, no strict naming standards are necessary. Names will only serve documentation purposes.

I don’t, because I find prefixes annoying and irritating 🙂 , especially when I am scrolling thru a long list of dataflow names. Let’s do a little test: how long does it take you to find out whether the two strings NBVCXWMLKJHGFDSQAZERTYUIOP and N8VCXWMLKJHGFDSQAZERTYUIOPare different? I bet no time at all! You see that right away. At least I do 🙂 . But what about NBVCXWMLKJHGFDSQAZERTYUIOPand NBVCXWMLKJHGFDSQAZERTYU1OP? It takes me a tiny bit longer. Because just as everybody using latin, cyrillic… script, I am reading left to right. And the further away to the right I find the difference, the longer it takes.

Also, I don’t need the prefix, because I can tell from an object’s icon, from its context, from its content… what type it is. Even if a dataflow and a workflow of mine would have exactly the same name, I am able to distinguish between them in the logs, because DS nicely prints the tracing level.

And I don’t, in analogy with what I and (most probably) you do at database level. How do you call your database table where you keep track of client information? Customer? Customers? KNA1? (KN is an abbreviation of Kunde, the German word for Customer.) Most probably something like that. I for sure have never called it Table_Customer. Not even T_Customer. No prefixes in my db? No prefixes in my DS! It’s as simple as that. By the way, that’s why I don’t feel the need for suffixes either (though I might have called one of my very first tables, long long time ago, EMP_TABLE or something like that – I definitely don’t do that anymore).

All that said, naming standards are a matter of personal preference. And circumstances. Take my datastores. I use a prefix according to its type. That’s because as an SAP consultant I hop from customer to customer, dealing with all sorts of data sources and targets. And the prefix helps me to identify quickly. But suppose I’d work in a 100% MS shop, all my databases would be SQL Server and then there would be no added value from the prefix. I am sure I would adopt a different naming scheme then, that would perfectly serve its purpose, too.

For staging-in, I do something very similar: I don’t use a prefix (you should know by now I am a bit allergic to them) but I define a schema per source system/schema combination and use exactly the same table names as in the source database.

For staging-out I typically adapt to the database naming standards in force, if any.

Hi, just wanted to hop in the discussion even if I get here 2,5 years later. Dirk approach is for me the latest trend as it follows the same principle as databases naming conventions nowadays of not using prefixes. Even if I dislike a lot repeating the project name, it’s useful because of the no folder concept. Good post 🙂 and remember, having a naming convention is always better than having none 😉

I am facing one issue with BODS while working on SQL Server 2014 as staging area. The job has multiple data flows in series which are based on table comparison. I am getting deadlocks frequently for the same job. Is there any fix available for the same in BODS or SQL Server?