Fresh Informatica Data Warehousing Interview Questions & Answers:

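Factless facts: fact tables that contain no measures.
Additive facts: measures that can be summed (aggregated) across all dimensions.
Non-additive facts: measures that cannot be summed across any dimension, such as ratios and percentages.
Semi-additive facts: measures that can be summed across some dimensions but not across others.
Periodic snapshot facts: store one row per entity per period (for example, a month-end balance), summarizing what happened over that period.
Accumulating snapshot facts: store one row for the entire lifetime of an event or process, updated as it progresses.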

There are three types of facts:
1) Additive fact: a fact that can be summarized across any one dimension or across all dimensions. Ex: QTY, REVENUE

2) Semi-additive fact: a fact that can be summarized across some dimensions but not all of them. Ex: current balance

3) Non-additive fact: a fact that cannot be summarized across any dimension. Ex: percentage of profit
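The difference between the three kinds shows up as soon as you aggregate. Below is a minimal sketch with made-up sample rows (the data and function names are hypothetical). A balance can be summed across accounts for one date, but across time you take the closing value. A profit percentage has to be recomputed from its additive parts instead of being summed.

```python
# Hypothetical daily account-balance snapshot rows: (date, account, balance)
snapshots = [
    ("2024-01-01", "A", 100),
    ("2024-01-01", "B", 50),
    ("2024-01-02", "A", 120),
    ("2024-01-02", "B", 40),
]

def total_balance_on(date):
    """Semi-additive: summing balances across accounts for one date is meaningful."""
    return sum(b for d, _, b in snapshots if d == date)

def closing_balance(account):
    """Across time a balance is not summed; take the latest (period-end) value."""
    rows = sorted((d, b) for d, a, b in snapshots if a == account)
    return rows[-1][1]

# Hypothetical orders: (revenue, profit); both columns are additive
sales = [(200.0, 20.0), (100.0, 30.0)]

def profit_pct(rows):
    """Non-additive: recompute the percentage from summed revenue and profit."""
    revenue = sum(r for r, _ in rows)
    profit = sum(p for _, p in rows)
    return 100.0 * profit / revenue
```

Adding the per-order percentages (10% + 30%) would give a meaningless 40%. Recomputing from the totals gives about 16.7%.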

The repository is a database in which all Informatica components are stored in the form of tables. The Repository Server controls the repository and maintains data integrity and consistency across the repository when multiple users work in Informatica. The PowerCenter Server (Informatica Server) is responsible for executing the components (sessions) stored in the repository.

There are mainly three kinds of ports: input, output, and variable ports. An input port receives the data flowing into the transformation. An output port passes data on to the next transformation. A variable port holds intermediate values, such as the results of calculations, that are used inside the transformation.

Variable ports can also store values from previous records, which is not otherwise possible in Informatica.
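The previous-record trick works because variable ports are evaluated in order for each row and keep their value between rows. Below is a minimal Python sketch of an Expression transformation with that behavior. The port names (`v_prev_amount`, `v_curr_amount`) are hypothetical, and `None` stands in for whatever initial value your version assigns to the port (an assumption).

```python
def add_previous_amount(rows):
    """Mimic variable ports that carry the previous row's value forward.

    Hypothetical port order, evaluated top to bottom for each row:
      v_prev_amount = v_curr_amount   (still holds the last row's value here)
      v_curr_amount = amount
      o_prev_amount = v_prev_amount
    """
    v_curr_amount = None  # persists across rows, like a variable port
    out = []
    for row in rows:
        v_prev_amount = v_curr_amount  # read before it is overwritten
        v_curr_amount = row["amount"]
        out.append({**row, "prev_amount": v_prev_amount})
    return out
```

Evaluation order is what matters. If the port that copies the old value came after the port that reads the new value, both would hold the current row's amount.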

By definition, an active transformation can change the number of rows that pass through it. In a Union transformation, the number of rows resulting from the union can differ from the number of rows in any single input, so it is active.

Because the Union transformation combines the results of two or more inputs (like UNION ALL in SQL), the output usually has more rows than any one input. So it is an active transformation.

When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica Server reads when it runs a session.

The Source Qualifier (SQ) transformation is created automatically in the Designer when you add a relational or flat file source to a mapping. It reads data from the source tables into the mapping.

If the data in table1 depends on the data in table2, then table2 must be loaded first. In such cases, to control the load order of the tables, we need conditional loading, which is what constraint-based loading provides.

In Informatica, this feature is enabled with a single check box at the session level.

Constraint-based loading (CBL) specifies the order in which data loads into the targets based on key constraints.
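Deriving a load order from key constraints is a topological sort: a table is loaded only after every table it references. Below is a minimal sketch using Python's standard `graphlib` (Python 3.9+). The table names and foreign keys are hypothetical, and this shows the general idea rather than how the Informatica Server implements it.

```python
from graphlib import TopologicalSorter

# Hypothetical targets: each table maps to the tables it references via foreign keys
fk_parents = {
    "ORDERS": {"CUSTOMERS"},
    "ORDER_LINES": {"ORDERS", "PRODUCTS"},
    "CUSTOMERS": set(),
    "PRODUCTS": set(),
}

def load_order(dependencies):
    """Return a load order in which every parent table precedes its children."""
    # static_order() yields a node only after all of its predecessors
    return list(TopologicalSorter(dependencies).static_order())
```

Any order it returns loads CUSTOMERS before ORDERS, and both ORDERS and PRODUCTS before ORDER_LINES.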

The target load plan defines the order in which the Informatica Server reads data from the source qualifiers in a mapping and loads the corresponding targets.


Partition points mark the thread boundaries in a pipeline and divide the pipeline into stages. The Informatica Server sets partition points at several transformations in a pipeline by default. If you use PowerCenter, you can define other partition points. When you add partition points, you increase the number of transformation threads, which can improve session performance. The Informatica Server can redistribute rows of data at partition points, which can also improve session performance.

Unconnected:
The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.

Connected:
The flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.

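An unconnected Stored Procedure transformation is reusable: it can be called many times, for example from several expressions. A connected Stored Procedure transformation runs only once, at its position in the data flow.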

Normal load: writes information to the database log file, so it helps if recovery is needed. When the source is a text file and you are loading data into a table, use normal load only; otherwise the session will fail.

Bulk load: does not write information to the database log file, so if recovery is needed, nothing can be done.

One way is to supply sorted input to the Aggregator transformation. Where sorted input cannot be supplied, configure the data cache and index cache at the session or transformation level to allocate enough space for the aggregation.
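Sorted input helps because the Aggregator can finish each group as soon as the key changes, instead of caching every group until the last row arrives. Below is a minimal Python sketch contrasting the two approaches. The `(key, amount)` row shape is an assumption.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows):
    """SUM(amount) per key, assuming rows are already sorted by key.

    Only the current group is held in memory; each result is emitted
    as soon as the key changes.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

def aggregate_unsorted(rows):
    """Without sorted input, every group stays in a cache until all rows are read."""
    cache = {}
    for key, amount in rows:
        cache[key] = cache.get(key, 0) + amount
    return sorted(cache.items())
```

`groupby` silently produces wrong results on unsorted input, which parallels the requirement that rows really are sorted when the sorted-input option is used.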

The following answer is taken from the Informatica 7.1.1 manual:

While running a workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:

The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the Rank Index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
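The "top 5 salespersons per quarter" example can be sketched in Python. The rows are grouped, each group is sorted by the ranked value, the first N rows are kept, and each kept row is numbered starting at 1, which plays the role of RANKINDEX. The data and function name are hypothetical. Ties are broken by input order here, which may differ from how the Rank transformation treats ties.

```python
from collections import defaultdict

def top_n_by_group(rows, n):
    """Return (group, rank_index, name, value) for the top-n values in each group."""
    groups = defaultdict(list)
    for group, name, value in rows:
        groups[group].append((name, value))
    result = []
    for group in sorted(groups):
        ranked = sorted(groups[group], key=lambda nv: nv[1], reverse=True)[:n]
        for rank_index, (name, value) in enumerate(ranked, start=1):
            result.append((group, rank_index, name, value))  # 1 = highest value
    return result
```

For example, ranking quarterly sales rows with `n=5` gives rank indexes 1 to 5 within each quarter.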

What does this do?
Each session uses SHARED/LOCKED (semaphores) memory blocks. The ABORT function kills JUST THE CODE threads, leaving the memory LOCKED and SHARED and allocated. The good news: It appears as if AIX Operating system cleans up these lost memory blocks. The bad news? Most other operating systems DO NOT CLEAR THE MEMORY, leaving the memory "taken" from the system. The only way to clear this memory is to warm-boot/cold-boot (restart) the informatica SERVER machine, yes, the entire box must be re-started to get the memory back.

If you find your box running slower and slower over time, or not having enough memory to allocate new sessions, then I suggest that ABORT not be used.

So then the question is: When I ask for a STOP, it takes forever. How do I get the session to stop fast?

Well, first things first. STOP is a REQUEST to stop. It fires a request (equivalent to a Ctrl-C in SQL*Plus) to the source database and waits for the source database to clean up. The bigger the data in the source query, the more time it takes to "roll back" the source query to maintain transaction consistency in the source database (i.e., joins of huge tables, big GROUP BYs, big ORDER BYs).

It then cleans up the buffers in memory by releasing the data (without writing it to the target), but it WILL run the data all the way through to the target buffers, never sending it to the target DB. The bigger the session memory allocations, the longer it takes to clean up.

Then it fires a request to stop against the target DB, and waits for the target to roll-back. The higher the commit point, the more data the target DB has to "roll-back".

FINALLY, it shuts the session down.

WHAT IF I NEED THE SESSION STOPPED NOW?
Pick up the phone and call the source system DBA, have them KILL the source query IN THE DATABASE. This will send an EOF (end of file) downstream to Informatica, and Infa will take less time to stop the session.

If you use ABORT, be aware that you are choosing to "LOSE" memory on the server on which Informatica is running (except on AIX).

If you use ABORT and you then re-start the session, chances are, not only have you lost memory - but now you have TWO competing queries on the source system after the same data, and you've locked out any hope of performance in the source database. You're competing for resources with a defunct query that's STILL rolling back.

There are mainly two types of transformations:
1) Active transformation: an active transformation can change the number of rows that pass through it from source to target, for example by eliminating rows that do not meet the condition in the transformation.
2) Passive transformation: a passive transformation does not change the number of rows that pass through it; it passes all rows through the transformation.

Transformations can also be connected or unconnected.
Connected transformation: connected to other transformations or directly to the target table in the mapping.
Unconnected transformation: not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.

List of transformations available in Informatica:
1. Source Qualifier transformation
2. Expression transformation
3. Filter transformation
4. Joiner transformation
5. Lookup transformation
6. Normalizer transformation
7. Rank transformation
8. Router transformation
9. Sequence Generator transformation
10. Stored Procedure transformation
11. Sorter transformation
12. Update Strategy transformation
13. Aggregator transformation
14. XML Source Qualifier transformation
15. Advanced External Procedure transformation
16. External Procedure transformation
17. Custom transformation

Which transformations you use depends on the requirement. In our project we mostly use the Source Qualifier, Aggregator, Joiner, and Lookup transformations.

When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001.

By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server setup before you run a session so the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.
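The recovery behavior described above (re-read all sources, skip rows up to the last committed row ID, resume with the next one) can be sketched in a few lines of Python. This is a simplified illustration. `last_committed_row_id` is a hypothetical parameter standing in for the value read from OPB_SRVR_RECOVERY, and row IDs are assumed to be 1-based positions in the source.

```python
def recover_load(source_rows, last_committed_row_id, target):
    """Resume a failed load by bypassing rows already committed to the target."""
    for row_id, row in enumerate(source_rows, start=1):
        if row_id <= last_committed_row_id:
            continue  # committed before the failure; do not load it again
        target.append(row)
    return target
```

With 10,000 rows committed, loading resumes at row 10,001, as in the example above. The sketch also shows why recovery only works if the source returns rows in the same order on every read.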

If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
- Run the session again if the Informatica Server has not issued a commit.
- Truncate the target tables and run the session again if the session is not recoverable.
- Consider performing recovery if the Informatica Server has issued at least one commit.

Use a Sorter transformation. When you configure the Sorter transformation to treat output rows as distinct, it configures all ports as part of the sort key. It therefore discards duplicate rows compared during the sort operation.
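Below is a minimal Python sketch of "distinct by sorting on every column". Once all columns are in the sort key, duplicate rows end up next to each other, so one comparison with the previous row is enough to drop them. Rows are modeled as tuples, which is an assumption.

```python
def sorted_distinct(rows):
    """Sort on all columns (every port is a sort key), then drop adjacent duplicates."""
    result = []
    for row in sorted(rows):
        if not result or row != result[-1]:
            result.append(row)  # first occurrence of this row
    return result
```

The output is sorted as a side effect, just as a distinct Sorter returns its rows in sort order.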

2. Import the source and target definitions from PeopleSoft using ODBC connections.

3. In the Workflow Manager, define a connection for the PeopleSoft source/target under "Application Connection Browser". Select the proper connection (PeopleSoft on Oracle, Sybase, DB2, or Informix) and run it like a normal session.

Because a data warehouse must maintain historical data. For example, suppose we keep an employee's details, such as where they worked previously and where they work now, in one table. That table needs several records with the same employee ID. If the employee ID is the primary key, it will not allow those records. So to maintain historical data we use surrogate keys (for example, generated from an Oracle sequence) as the key column.

Because all the dimensions maintain historical data, they are denormalized: another record with the same employee number is kept in the table, even though it is not an exact duplicate.
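This surrogate-key approach is usually called a Type 2 slowly changing dimension. Below is a minimal Python sketch: each change to an employee's location expires the current row and adds a new row with a fresh surrogate key, while the employee ID repeats. The class and column names are hypothetical, and `itertools.count` stands in for the database sequence.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass
class EmployeeDimRow:
    emp_sk: int         # surrogate key: unique for every version of the employee
    emp_id: int         # natural (business) key: repeats across versions
    location: str
    current_flag: bool

class EmployeeDim:
    def __init__(self):
        self.rows = []            # list of EmployeeDimRow, history included
        self._next_sk = count(1)  # stands in for an Oracle sequence

    def upsert(self, emp_id: int, location: str) -> None:
        current = self._current(emp_id)
        if current is not None and current.location == location:
            return  # nothing changed, so no new version
        if current is not None:
            current.current_flag = False  # expire the old version but keep it
        self.rows.append(EmployeeDimRow(next(self._next_sk), emp_id, location, True))

    def _current(self, emp_id: int) -> Optional[EmployeeDimRow]:
        for row in self.rows:
            if row.emp_id == emp_id and row.current_flag:
                return row
        return None
```

After an employee moves from one location to another, the table holds two rows with the same `emp_id` and different `emp_sk` values. A primary key on `emp_id` would have rejected the second row.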

In a relational data model, for normalization purposes, the year, quarter, month, and week lookups are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called the TIME DIMENSION for performance and for slicing data.

This dimension helps analyze sales on a daily, weekly, monthly, and yearly basis. We can do trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
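Below is a minimal Python sketch of how the merged time dimension might be generated: one row per calendar day, with the year, quarter, month, and week levels stored as columns of the same row. The column names and the `YYYYMMDD` integer key are hypothetical choices, and the week uses ISO numbering (an assumption; your calendar may differ).

```python
from datetime import date, timedelta

def build_time_dimension(start: date, end: date):
    """One row per day, with year/quarter/month/week levels merged into the row."""
    rows = []
    day = start
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # hypothetical key, e.g. 20240401
            "date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "iso_year": iso_year,
            "iso_week": iso_week,
        })
        day += timedelta(days=1)
    return rows
```

Since each fact row joins to a single day row, grouping by `year`, `quarter`, `month`, or `iso_week` supports the trend comparisons described above without additional joins.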

Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions. Definitions of database objects or files that contain the target data.
Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
Reusable transformations. Transformations that you can use in multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

The Metadata Reporter is a web-based application that enables you to run reports against repository metadata.
With the Metadata Reporter, you can access information about your repository without knowing SQL, the transformation language, or the underlying tables in the repository.

Standalone repository. A repository that functions individually, unrelated and unconnected to other repositories.
Global repository. (PowerCenter only.) The centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository. The global repository can contain common objects to be shared throughout the domain through global shortcuts.
Local repository. (PowerCenter only.) A repository within a domain that is not the global repository. Each local repository in the domain can connect to the global repository and use objects in its shared folders.

The PowerCenter repository is used to store Informatica's metadata.
Information such as mapping names, locations, target definitions, source definitions, transformations, and data flow is stored as metadata in the repository.

Specifies the directory used to cache master records and the index to these records. By default, the cached files are created in a directory specified by the server variable $PMCacheDir. If you override the directory, make sure the directory exists and contains enough disk space for the cache files. The directory can be a mapped or mounted drive.

It is a session option. When the Informatica Server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally. We use it for performance.

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes incrementally and you can capture changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.

For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.

When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly.
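Below is a minimal Python sketch of the March 1 / March 2 pattern. The first run builds the aggregate from the full source. Later runs read only new rows and fold them into the saved aggregate. The JSON file is a hypothetical stand-in for the aggregate cache the Integration Service keeps between runs, and only SUM is shown. A running sum can be updated from its previous value, but a median cannot, which is consistent with the note below about percentile and median functions.

```python
import json
from pathlib import Path

def run_incremental(new_rows, cache_file: Path):
    """Update SUM(amount) per key using only new rows plus the saved aggregate.

    Assumption: cache_file is a hypothetical JSON file standing in for the
    incremental aggregation cache kept between session runs.
    """
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    for key, amount in new_rows:
        cache[key] = cache.get(key, 0) + amount
    cache_file.write_text(json.dumps(cache))  # persist for the next run
    return cache
```

If the filter let old rows through again, they would be added twice. That is why the example above filters to only the March 2 records.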

Consider using incremental aggregation in the following circumstances:

You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process new data.

Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.

Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.

You should configure the mapping with the least number of transformations and expressions to do the most amount of work possible. You should minimize the amount of data moved by deleting unnecessary links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Limiting the number of connected input/output or output ports reduces the amount of data the transformations store in the data cache.

You can also perform the following tasks to optimize the mapping:

Configure single-pass reading.
Optimize datatype conversions.
Eliminate transformation errors.
Optimize transformations.
Optimize expressions.

Rejected rows (for example, rows that violate a unique key constraint) are all written by the session to $PMBadFileDir (default relative path <INFA_HOME>/PowerCenter/server/infa_shared/BadFiles), which is configured at the Integration Service level. Every target has a Reject Filename property that names the file in which its rejected rows are stored.

While importing the flat file definition, just specify the scale for the numeric datatype. In the mapping, the flat file source supports only the Number datatype (not Decimal or Integer). The Source Qualifier associated with that source will have a Decimal datatype for that Number port of the source.

Once the session has succeeded, right-click the session and open the Statistics tab.

There you can see how many source rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called quantitative testing.
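The quantitative check boils down to one comparison between the three counts from the statistics. Below is a minimal Python sketch, assuming a straight one-to-one load. Filters, aggregators, routers, or lookups that legitimately change row counts would need their own expected numbers. The function name is hypothetical.

```python
def reconcile(source_rows_applied: int, target_rows_loaded: int, rows_rejected: int) -> bool:
    """Every applied source row should be either loaded or rejected (one-to-one load assumed)."""
    return source_rows_applied == target_rows_loaded + rows_rejected
```

For example, 1,000 applied rows with 990 loaded and 10 rejected balance. If 20 rows are unaccounted for, the mapping needs a closer look.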

Once the rows have loaded successfully, we go on to qualitative testing.

Steps

1. Take the DATM (the document in which all business rules for the corresponding source columns are specified) and check whether the data was loaded into the target table according to the DATM. If any data was not loaded according to the DATM, check the code and fix it.

A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica Server saves the value of the mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
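Below is a minimal Python sketch of why a mapping variable is useful. Its saved value lets each run pick up where the previous one stopped, as in incremental extraction. The JSON file is a hypothetical stand-in for the repository, `$$LastLoadedId` is an example variable name, and the update mimics a max-style aggregation similar to SETMAXVARIABLE.

```python
import json
from pathlib import Path

def run_session(source_rows, repo_file: Path, var_name="$$LastLoadedId"):
    """Load only rows newer than the saved variable value, then save the new value.

    Assumptions: repo_file is a hypothetical JSON stand-in for the repository,
    and 0 is used as the variable's initial value on the first run.
    """
    saved = json.loads(repo_file.read_text()) if repo_file.exists() else {}
    last_id = saved.get(var_name, 0)
    new_rows = [r for r in source_rows if r["id"] > last_id]
    for r in new_rows:
        last_id = max(last_id, r["id"])  # the variable changes during the run
    saved[var_name] = last_id  # persisted at the end of the run
    repo_file.write_text(json.dumps(saved))
    return new_rows
```

A mapping parameter, by contrast, would be read from the parameter file on every run and would never be written back.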

Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether the session and users are valid. All the metadata of sessions and mappings is stored in the repository.

The repository always stores the metadata, which includes all information about transformations, sessions, mappings, scheduling, and user validation.

Reusable transformations can be used in multiple mappings. When you need to incorporate the transformation into a mapping, you add an instance of it to the mapping. If you later change the definition of the transformation, all instances of it inherit the changes. Since an instance of a reusable transformation is a pointer to that transformation, when you change the transformation in the Transformation Developer, its instances automatically reflect these changes. This feature can save you a great deal of work.

A reusable transformation is a reusable metadata object that defines business logic in a single transformation.

Session: a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
Batches: provide a way to group sessions for either serial or parallel execution by the Informatica Server.
There are two types of batches:
Sequential: runs sessions one after the other.
Concurrent: runs sessions at the same time.
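Below is a minimal Python sketch of the difference between the two batch types. `fake_session` is a hypothetical stand-in that just sleeps. The sequential batch waits for each session to finish before starting the next, while the concurrent batch starts them all at once with a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_session(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a session: sleeps, then reports its name."""
    time.sleep(seconds)
    return name

def run_sequential(sessions):
    # each session starts only after the previous one has finished
    return [fake_session(name, secs) for name, secs in sessions]

def run_concurrent(sessions):
    # all sessions start together; results are collected in submission order
    with ThreadPoolExecutor(max_workers=len(sessions)) as pool:
        futures = [pool.submit(fake_session, name, secs) for name, secs in sessions]
        return [f.result() for f in futures]
```

With three sessions of 0.2 s each, the sequential batch takes about 0.6 s and the concurrent batch about 0.2 s. The concurrent version only helps if the sessions do not depend on each other's output.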

As far as I know, there is no restriction on the number of sources or targets inside a mapping.

The real question is: if you make N tables participate in processing at the same time, what happens to your database? From an organizational point of view, using many tables at once is never encouraged, because it reduces database and Informatica Server performance.

Manages session and batch scheduling: When you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica Server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications before starting the DTM process.

Locking and reading the session: When the Informatica Server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is running.

Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: When the session starts, the Load Manager checks whether the user has privileges to run the session.

Copyright 2007-2017 by Interview Questions Answers .ORG All Rights Reserved.Powered Global Guideline.