Friday 30 March 2018

Parameters and variables in Informatica PowerCenter


Parameters and variables are used to store information during the execution of a mapping or session. Each parameter and variable is defined with a specific data type, and their main purpose is to provide increased development flexibility.





  • The value of a parameter is fixed throughout the run of the mapping
  • The value of a variable can change during the run

       The format for declaring variables and parameters is $$VariableName or $$ParameterName.

  • Steps to create parameters and variables -->

    Step 1 – To create a mapping parameter, in the Mapping Designer:
    1. Select the Mappings menu
    2. Select Parameters and Variables
    Step 2 – On the next screen:
    1. Click on the add new variable button
    2. From the drop down, select the type as Parameter
    3. Enter the parameter name as $$Deptno
    4. Enter an initial value of 10
    5. Select the OK button
    Now you have created a mapping parameter, $$Deptno, with an initially assigned value of 10. This parameter can be referenced inside the mapping, as in the sketch below.
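
    For example, a minimal sketch of referencing the parameter in the Source Qualifier's SQL override (the EMP source table and its DEPTNO column are assumed here purely for illustration):

        SELECT * FROM EMP
        WHERE  DEPTNO = $$Deptno

    At run time, the Integration Service expands $$Deptno to 10, or to whatever value a parameter file supplies.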

    How to Create a Mapping Variable

    Step 1 – In the Mapping Designer:
    1. Select the Mappings menu
    2. Select Parameters and Variables
    Step 2 – On the next screen:
    1. Click on the add new variable button
    2. From the drop down, select the type as Variable
    3. Enter the variable name as $$TotalSalary
    4. Select the data type as decimal
    5. Enter an initial value of 0
    6. Select the OK button
    This will create a mapping variable.
    Note – mapping parameter and variable names always begin with $$. 

    To change the value of a variable, one of the following functions can be used within an expression: 

    SETVARIABLE($$Variable, value)
    SETMAXVARIABLE($$Variable, value)
    SETMINVARIABLE($$Variable, value)
    SETCOUNTVARIABLE($$Variable)

    where

  • SETVARIABLE sets the variable to a value that you specify (it executes only if a row is marked as insert or update). At the end of a successful session, the Integration Service saves either the MAX or MIN of (start value, final value) to the repository, depending on the aggregation type of the variable. Unless overridden, it uses the saved value as the start value of the variable for the next session run.
  • SETCOUNTVARIABLE increments a counter variable: +1 if the row type is Insert, -1 if the row type is Delete, and 0 for Update and Reject.
  • SETMAXVARIABLE compares the current value to the value passed into the function, returns the higher of the two, and sets the current value to it.
  • SETMINVARIABLE compares the current value to the value passed into the function, returns the lower of the two, and sets the current value to it.
    • Note --> At the end of a successful session, the values of variables are saved to the repository. The SETVARIABLE function writes the final value of a variable to the repository based on the aggregation type selected when the variable was defined. A typical use is sketched below.
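
    As a sketch of a common use, an incremental load can track the latest processed date in a variable. Here $$Max_Load_Date is a hypothetical Date/Time mapping variable with aggregation type Max, and LOAD_DATE is an assumed source port:

        -- output port expression in an Expression transformation;
        -- raises the variable's current value row by row
        SETMAXVARIABLE($$Max_Load_Date, LOAD_DATE)

    On the next run, the saved value can be used in the Source Qualifier filter, e.g. LOAD_DATE > $$Max_Load_Date, so that only new rows are extracted.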

    • Initialization priority of parameters: parameter file, declared initial value, default value
    • Initialization priority of variables: parameter file, repository value, declared initial value, default value
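
    A minimal sketch of a parameter file that overrides both values created above (the folder, workflow and session names are placeholders):

        [MyFolder.WF:wf_load_emp.ST:s_m_load_emp]
        $$Deptno=20
        $$TotalSalary=0

    The file path is set in the session or workflow properties, or passed with the pmcmd startworkflow -paramfile option.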

    Thursday 29 March 2018

    Constraint Based Loading and Target Load Order


    When you have a parent-child relationship (primary key, foreign key) defined in the target database, loading data into the target requires special consideration.
    Either you have to load data into the targets in different sessions, or you have to ensure that the database constraints are honored while loading data into the targets.

    The example given here shows how to use Constraint Based Loading and Target Load Order while loading such targets.

    We will look at target load order in this post and constraint based loading in the subsequent one.

    Target Load Order -->

    If you have multiple Source Qualifiers connected to multiple targets and you want to decide the order in which the targets are loaded, you go with target load order.
    The target load order is decided based on the Source Qualifier transformations present in the mapping.

    Target Load Order Group:

    A target load order group is the collection of source qualifiers, transformations and targets linked together in a mapping. The Integration Service reads the sources within a target load order group concurrently, and it processes target load order groups sequentially. A single mapping can therefore contain two or more target load order groups, one per independent pipeline.

    Use of Target Load Order:

    Target load order is useful when the data of one target depends on the data of another target. For example, the employees table data depends on the departments data because of the primary key and foreign key relationship, so the departments table should be loaded first and then the employees table. Target load order helps maintain referential integrity when inserting, deleting or updating tables that have primary key and foreign key constraints, as in the sketch below.
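
    A minimal sketch of such a constraint in Oracle-style DDL (the table and column names are illustrative):

        CREATE TABLE DEPARTMENTS (
            DEPT_ID    NUMBER PRIMARY KEY,
            DEPT_NAME  VARCHAR2(50)
        );

        CREATE TABLE EMPLOYEES (
            EMP_ID     NUMBER PRIMARY KEY,
            EMP_NAME   VARCHAR2(50),
            DEPT_ID    NUMBER REFERENCES DEPARTMENTS (DEPT_ID)
        );

    Inserting a row into EMPLOYEES before its DEPT_ID exists in DEPARTMENTS would violate the foreign key, which is why DEPARTMENTS must be loaded first.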

    Target Load Order Setting:

    You can set the target load order or plan in the Mapping Designer. Follow the steps below to configure the target load order:

    1. Log in to the PowerCenter Designer and create a mapping that contains multiple target load order groups.
    2. Click on Mappings in the toolbar and then on Target Load Plan. A dialog box will pop up listing all the source qualifier transformations in the mapping and the targets that receive data from each source qualifier.

    3. Select a source qualifier from the list.
    4. Click the Up and Down buttons to move the source qualifier within the load order.
    5. Repeat steps 3 and 4 for other source qualifiers you want to reorder.
    6. Click OK.



    Wednesday 28 March 2018

     SCD Type 2 implementation in Informatica 



    Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time, people change their names for some reason, and country and state names may change over time. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time.


    SCD type 2 will store the entire history in the dimension table.

    We will see how to implement SCD Type 2 flagging in Informatica. As an example, consider the customer dimension. The source and target table structures are shown below:

    Source table --> Customer
    Customer_Id
    Location
    Target dimension table --> Customer_Dim
    Cust_Key
    Customer_Id
    Location
    Flag
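
    A minimal Oracle-style DDL sketch of these structures (the data types are assumptions):

        CREATE TABLE Customer (
            Customer_Id  NUMBER,
            Location     VARCHAR2(50)
        );

        CREATE TABLE Customer_Dim (
            Cust_Key     NUMBER,
            Customer_Id  NUMBER,
            Location     VARCHAR2(50),
            Flag         NUMBER(1)
        );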

    The basic steps involved in creating an SCD Type 2 flagging mapping are:
    • Identifying new records and inserting them into the dimension table with the flag column value as 1. A flag value of 1 corresponds to active records.
    • Identifying changed records and inserting them into the dimension table with the flag value as 1.
    • Identifying changed records and updating the existing records in the dimension table with the flag value as 0. A flag value of 0 identifies a record as inactive or historical.

    We will divide the steps to implement the SCD type 2 flagging mapping into four parts.

    SCD Type 2 Flag implementation - Part 1

    Here we will see the basic setup and mapping flow required for SCD Type 2 flagging. The steps involved are:

    • Create the source and dimension tables in the database.
    • Open the mapping designer tool, source analyzer and either create or import the source definition.
    • Go to the Warehouse designer or Target designer and import the target definition.
    • Go to the mapping designer tab and create new mapping.
    • Drag the source into the mapping.
    • In the toolbar, go to Transformation and then Create.
    • Select the Lookup transformation, enter a name and click on Create. You will get a window as shown in the below image.


    • Select the customer dimension table and click on OK.
    • Edit the lookup transformation, go to the Ports tab and remove the unnecessary ports. Keep only the Cust_Key, Customer_Id and Location ports in the lookup transformation. Create a new port (IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the Customer_Id port of the source qualifier transformation.


    • Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id = IN_Customer_Id
    • Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL Override. Alternatively you can generate the SQL query by connecting the database in the Lookup SQL Override expression editor and then add the WHERE clause.

    SELECT  Customer_Dim.Cust_Key as Cust_Key,
            Customer_Dim.Location as Location,
            Customer_Dim.Customer_Id as Customer_Id
    FROM    Customer_Dim
    WHERE   Customer_Dim.Flag = 1
    

    • Click on OK in the lookup transformation. Connect the Customer_Id port of the source qualifier transformation to the IN_Customer_Id port of the LKP transformation.
    • Create an expression transformation with input/output ports Cust_Key, LKP_Location, Src_Location and output ports New_Flag, Changed_Flag. Enter the below expressions for the output ports.

    New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
    Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND
                   LKP_Location != Src_Location, 1, 0)
    

    • The part of the mapping flow is shown below.


    SCD Type 2 Flag implementation - Part 2

    In this part, we will identify the new records and insert them into the target with flag value as 1. The steps involved are:

    • Now create a filter transformation to identify and insert new records into the dimension table. Drag the ports of the expression transformation (New_Flag) and the source qualifier transformation (Customer_Id, Location) into the filter transformation.
    • Go to the properties tab of the filter transformation and enter the filter condition as New_Flag = 1.
    • Now create an update strategy transformation and connect the ports of the filter transformation (Customer_Id, Location) to it. Go to the properties tab and enter the update strategy expression as DD_INSERT.
    • Now drag the target definition into the mapping and connect the appropriate ports of update strategy transformation to the target definition.
    • Create a sequence generator and an expression transformation. Call this expression transformation "Expr_Flag".
    • Drag and connect the NextVal port of the sequence generator to the expression transformation. In the expression transformation, create a new output port (Flag) and assign the value 1 to it.
    • Now connect the ports of the expression transformation (NextVal, Flag) to the target definition ports (Cust_Key, Flag). The part of the mapping flow is shown in the below image.


    SCD Type 2 Flag implementation - Part 3

    In this part, we will identify the changed records and insert them into the target with flag value as 1. The steps involved are:

    • Create a filter transformation and call it FIL_Changed; it is used to find the changed records. Now drag the ports from the expression transformation (Changed_Flag), the source qualifier transformation (Customer_Id, Location) and the LKP transformation (Cust_Key) into the filter transformation.
    • Go to the filter transformation properties and enter the filter condition as Changed_Flag = 1.
    • Now create an update strategy transformation and drag the ports of the filter transformation (Customer_Id, Location) into it. Go to the properties tab and enter the update strategy expression as DD_INSERT.
    • Now drag the target definition into the mapping and connect the appropriate ports of update strategy transformation to the target definition.
    • Now connect the NextVal and Flag ports of the expression transformation (Expr_Flag, created in part 2) to the Cust_Key and Flag ports of the target definition respectively. The part of the mapping diagram is shown below.


    SCD Type 2 Flag implementation - Part 4

    In this part, we will update the changed records in the dimension table with flag value as 0.

    • Create an expression transformation and drag the Cust_Key port of the filter transformation (FIL_Changed, created in part 3) into it.
    • Go to the ports tab of the expression transformation and create a new output port (Flag). Assign the value 0 to this Flag port.
    • Now create an update strategy transformation and drag the ports of the expression transformation into it. Go to the properties tab and enter the update strategy expression as DD_UPDATE.
    • Drag the target definition into the mapping and connect the appropriate ports of update strategy to it. The complete mapping image is shown below.

    What is a data warehouse?
    A data warehouse is an electronic store of an organization's historical data for the purpose of data analytics, such as reporting, analysis and other knowledge-discovery activities.
    Other than data analytics, a data warehouse can also be used for data integration, master data management, etc.
    According to Bill Inmon, a data warehouse should be subject-oriented, non-volatile, integrated and time-variant.

    What is the difference between OLTP and OLAP?
    OLTP is the transaction system that collects business data, whereas OLAP is the reporting and analysis system on that data.
    OLTP systems are optimized for INSERT and UPDATE operations and are therefore highly normalized. OLAP systems, on the other hand, are deliberately denormalized for fast data retrieval through SELECT operations.


    What is a data mart?
    Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.
    What is the ER model?
    The ER model, or entity-relationship model, is a methodology of data modeling whose goal is to normalize the data by reducing redundancy. This is different from dimensional modeling, where the main goal is to improve the data retrieval mechanism.
    What is dimensional modeling?
    A dimensional model consists of dimension and fact tables. Fact tables store the transactional measurements and the foreign keys to the dimension tables that qualify the data. The goal of the dimensional model is not to achieve a high degree of normalization but to facilitate easy and fast data retrieval, as in the sketch below.
    Ralph Kimball is one of the strongest proponents of this very popular data modeling technique, which is often used in many enterprise-level data warehouses.
    If you want to read a quick and simple guide on dimensional modeling, please check our Guide to dimensional modeling.
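
    A minimal sketch of the idea in Oracle-style DDL (the names and types are illustrative): the fact table holds measures plus foreign keys to the dimensions.

        CREATE TABLE Dim_Product (
            Product_Key   NUMBER PRIMARY KEY,
            Product_Name  VARCHAR2(50)
        );

        CREATE TABLE Fact_Sales (
            Product_Key   NUMBER REFERENCES Dim_Product (Product_Key),
            Sale_Date     DATE,
            Sales_Qty     NUMBER
        );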
    What is a dimension?
    A dimension is something that qualifies a quantity (measure).
    For example, if I just say "20kg", it does not mean anything. But if I say, "20kg of rice (product) was sold to Ramesh (customer) on 5th April (date)", then that makes meaningful sense. Product, customer and date are dimensions that qualify the measure, 20kg.
    Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
    What is a fact?
    A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.
    What are additive, semi-additive and non-additive measures?
    Non-additive Measures
    Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG(), etc.). An example of a non-additive fact is any kind of ratio or percentage, such as a 5% profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when it is stored in fact tables, e.g. some kind of varchar flag in the fact table.
    Semi-additive Measures
    Semi-additive measures are those where only a subset of aggregation functions can be applied. Take account balance: a SUM() over balances does not give a useful result, but the MAX() or MIN() balance might be useful. Similarly, for a price rate or currency rate, a sum is meaningless while an average might be useful. A sketch follows below.
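
    A minimal SQL sketch of the point (the table and column names are hypothetical):

        -- useful: peak balance per account
        SELECT   Account_Id, MAX(Balance) AS Peak_Balance
        FROM     Fact_Account_Balance
        GROUP BY Account_Id;

    A SUM(Balance) over the same daily snapshot rows would count the same money once per day, which is why balance is only semi-additive.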
    Additive Measures
    Additive measures can be used with any aggregation function, like SUM(), AVG(), etc. An example is sales quantity.