
Wednesday, 28 May 2014

Aggregator Transformation in Informatica

Type: Active & Connected.

The Aggregator transformation performs calculations on groups of rows instead of processing row by row as the Expression transformation does.
While doing so, you can also use conditional clauses, similar to a filter condition, for more flexibility.

Components Of Aggregator Transformation:

Aggregator cache:
     The Integration Service reads and stores group and row data in the aggregate cache. It stores group values in the index cache
and row data in the data cache.
For example, if we need the department-wise average salary, the Integration Service allocates memory for the index cache and stores the group values
for the department IDs (say 10, 20, 30, ...), and allocates memory for the data cache and stores the row data values such as salaries, as sketched below.
If the allocated memory is not enough to hold the data, the Integration Service stores the overflow values in cache files in the specified cache directory.
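
A minimal Python sketch of this index/data cache split, assuming a few made-up department/salary rows (the names and structures below are illustrative, not Informatica internals):

rows = [
    {"department_id": 10, "salary": 4000},
    {"department_id": 20, "salary": 6000},
    {"department_id": 10, "salary": 5000},
]

index_cache = {}   # group-key value -> slot in the data cache
data_cache = []    # per-group accumulators: [sum_of_salaries, row_count]

for row in rows:
    key = row["department_id"]
    if key not in index_cache:          # first row seen for this group
        index_cache[key] = len(data_cache)
        data_cache.append([0, 0])
    slot = data_cache[index_cache[key]]
    slot[0] += row["salary"]            # accumulate salary for the group
    slot[1] += 1                        # count rows in the group

for key, pos in index_cache.items():
    total, count = data_cache[pos]
    print(key, total / count)           # department-wise AVG(salary)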

Group By Ports:
     Allow you to define one or more groups. A group-by port can be an input, input/output, or output port; even a variable port is allowed.

Sorted Input:
     Selecting the sorted input option improves session performance. To use sorted input, you must pass data to the
Aggregator transformation already sorted by the group-by ports.
     With sorted input, the Integration Service performs the aggregation in memory as it reads the rows for each group,
so you do not need to configure the cache memory size; the sketch below shows why.
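
A Python sketch of why sorted input avoids caching every group: once the group-by key changes, the finished group can be emitted immediately. The rows and field names are illustrative:

rows = [  # already sorted by department_id
    {"department_id": 10, "salary": 4000},
    {"department_id": 10, "salary": 5000},
    {"department_id": 20, "salary": 6000},
]

current_key, total, count = None, 0, 0
for row in rows:
    if row["department_id"] != current_key:
        if current_key is not None:
            print(current_key, total / count)  # emit the finished group
        current_key, total, count = row["department_id"], 0, 0
    total += row["salary"]
    count += 1
if current_key is not None:
    print(current_key, total / count)          # emit the last group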

Example aggregate functions: AVG, COUNT, MIN, MAX, FIRST, LAST, SUM, etc.

A conditional clause such as SUM(new_salary, new_salary > old_salary) returns the sum of salaries only for the records
where the new salary is greater than the old salary.

You can also use non-aggregate functions. For example,
IIF(SUM(quantity) > 0, SUM(quantity), 1) returns the sum of quantities per group; if that sum is zero or negative, it returns 1. Both expressions are sketched below.
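
A rough Python equivalent of both expressions, using made-up sample rows (the values below are illustrative):

rows = [
    {"new_salary": 5000, "old_salary": 4000, "quantity": -2},
    {"new_salary": 3000, "old_salary": 3500, "quantity": 1},
]

# Conditional clause: only rows where new_salary > old_salary contribute.
conditional_sum = sum(
    r["new_salary"] for r in rows if r["new_salary"] > r["old_salary"]
)

# Non-aggregate IIF wrapped around the aggregate result.
quantity_sum = sum(r["quantity"] for r in rows)
result = quantity_sum if quantity_sum > 0 else 1

print(conditional_sum, result)  # 5000 1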

You can configure the Integration Service to treat null values in aggregate functions as 0; otherwise, it treats them as NULL.

To improve performance, pass sorted data and, if required, filter the data before passing it to the Aggregator transformation.

Monday, 26 May 2014

Data Transformation Manager in Informatica



The DTM process performs various actions, e.g., reading data from the source, writing to the target, and processing transformations. The DTM process is started by the Integration Service process.



The Integration Service process provides the DTM process with session instance information. The DTM process then reads metadata about the session and the associated mapping from the repository and validates them.

If the session uses a parameter file, the Integration Service process passes the parameter file information to the DTM process. Using this information, the DTM can resolve session-level, mapping-level, and workflow-level parameters and variables. A sketch of reading such a file follows.
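
A PowerCenter parameter file groups parameters under section headings such as [Global] or [folder.WF:workflow.ST:session]. Below is a minimal Python sketch that reads such a file into a dictionary; the folder, workflow, and parameter names are invented for illustration:

param_text = """\
[Global]
$$Country=India
[MyFolder.WF:wf_load.ST:s_m_load]
$$LoadDate=2014-05-26
$DBConnection_Src=ORA_SRC
"""

params = {}   # section heading -> {parameter name: value}
section = None
for line in param_text.splitlines():
    line = line.strip()
    if not line:
        continue
    if line.startswith("[") and line.endswith("]"):
        section = line[1:-1]          # start a new section
        params[section] = {}
    elif "=" in line and section:
        name, value = line.split("=", 1)
        params[section][name] = value

print(params["MyFolder.WF:wf_load.ST:s_m_load"]["$$LoadDate"])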

The DTM process also validates all the connection objects associated with sources, targets, and lookups, and checks whether the user who started the workflow has the required execute permissions.

Once connection object verification is done, the DTM executes any pre-session commands and, after the session runs, any post-session commands provided.

One of the important functions of the DTM process is to create multiple threads to perform the transformations specified in the mapping. The DTM process creates separate threads for lookups, readers, writers, etc., as sketched below.
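
A toy Python version of this threading model, with reader, transformation, and writer threads connected by queues (the rows and the transformation itself are illustrative, not PowerCenter internals):

import queue
import threading

read_q, write_q = queue.Queue(), queue.Queue()
DONE = object()  # sentinel marking end of data

def reader():
    for row in ({"salary": 4000}, {"salary": 6000}):  # stands in for the source
        read_q.put(row)
    read_q.put(DONE)

def transformer():
    while (row := read_q.get()) is not DONE:
        row["salary"] *= 1.1          # an expression-like transformation
        write_q.put(row)
    write_q.put(DONE)

def writer():
    while (row := write_q.get()) is not DONE:
        print("writing", row)         # stands in for the target load

threads = [threading.Thread(target=f) for f in (reader, transformer, writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()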

The DTM process is also responsible for creating session logs. Informatica has a strong logging mechanism in which all errors, transformation statistics, and the overall session processing status are logged. This makes Informatica powerful for debugging any issue while executing a session.

The DTM process also checks for any post-session success or failure emails and processes them accordingly.

Thank you..

Informatica Integration Service Components

The Integration Service acts as the controller for the entire workflow execution. The Integration Service gets into action whenever a workflow is kicked off (either manually or by schedule). It reads workflow, session/task, and mapping information from the repository database and performs the execution as per the transformations defined.
   PowerCenter Integration Service uses three components during execution of a workflow:
  • Integration Service process
  • Load Balancer
  • Data Transformation Manager (DTM) process

Integration Service Process:

Whenever the Integration Service receives a request to execute a workflow, it creates an Integration Service process. Note that the Integration Service can start one or more Integration Service processes to run and monitor workflows.


When the workflow is scheduled, run manually, or changed, the Workflow Manager interacts with the Integration Service process. This interaction happens over TCP/IP. The Integration Service process locks the workflow and then asks the Repository Service for metadata about the workflow. The Repository Service in turn connects to the repository database, fetches the desired information, and returns it to the Integration Service process.


Once the Integration Service process receives the metadata back from the Repository Service, it creates connections to the source and target databases using ODBC or native drivers. The Integration Service process also creates connections to any lookup or stored procedure databases used by the workflow. Once the data loading process is done, the Integration Service process releases and closes all open database connections.


The Integration Service process is also responsible for starting the DTM process that actually executes the workflow sessions. The Integration Service process passes the parameter file and session information to the DTM process, which helps the DTM retrieve the required metadata from the repository, as sketched below.
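
Conceptually, the handoff resembles this Python sketch, where a parent process passes a session name and parameter file path to a child process; the names and paths are invented, and this is not how PowerCenter is actually invoked:

import multiprocessing

def dtm_process(session_name, parameter_file):
    # In PowerCenter, the DTM would now read session and mapping
    # metadata from the repository and run the session threads.
    print(f"DTM running {session_name} with {parameter_file}")

if __name__ == "__main__":
    p = multiprocessing.Process(
        target=dtm_process,
        args=("s_m_load_customers", "/infa/params/wf_load.par"),
    )
    p.start()
    p.join()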


          To summarize, the Integration Service process is responsible for carrying out the tasks below:
  • Workflow Schedule Management
  • Locking Workflow Before Execution
  • Reading Parameter File
  • Creating Workflow Logs
  • Running any Tasks in Workflow (e.g., Email)
  • Creating DTM processes to run Workflow Sessions
You can also read more about DTM process on this blog.

Thank you....

Informatica 9.x Architecture



  •   Administrator Console: This is a web-based platform where an administrator controls how Informatica is set up. This involves the setup and maintenance of the domain, nodes, and various services such as the Repository Service and Integration Service. This tool also provides security controls such as adding groups and users and managing their access to the repository.

  • Informatica Services Platform (ISP): The ISP is the administrative framework for Informatica services. It allows monitoring, start/stop, and failover of services. It provides a set of core services used internally, like Authentication Service, Name Services, and so on.
  •  Integration Service (IS): The Integration Service is created while installing PowerCenter, and we can then manage it using the Administrator Console. The IS reads workflow information from the repository by requesting it from the Repository Service. When we run a workflow, the IS locks the workflow and then executes the tasks and sessions related to that workflow. The IS loads the extracted data into the mapping target(s), and also executes pre- and post-session commands and email notifications.

  • Repository & Repository Service:

The Informatica repository is at the center of the Informatica suite. The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools.
Metadata is data about data, and includes information such as source definitions, target definitions, mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
     The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version. 
     Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica server and client tools use.

  • Metadata Manager Service: Metadata Manager interacts with Repository service via Metadata Manager Service.
  • Data Integration Service (DIS): The DIS is the container for all data integration functionality, and DIS plug-ins provide the different data integration functions:
     The Profiling Service plug-in translates profiles into mappings.
     The SQL Service plug-in translates SQL into mappings.
     The Mapping Service plug-in executes data quality plans.

  • Data Transformation Manager (DTM): The DTM interprets and executes mappings. It has been re-engineered to be lightweight and embeddable. It allows repository-less execution: in previous versions of PowerCenter, the DTM connected to the repository to fetch the execution plan, but now a mapping can be run without reading from the repository, such as a mapping stored to disk. The DTM enables concurrent execution semantics: multiple DTM instances can coexist in the same process. It supports logical transformations, which are translated to executable transformations before execution. It allows different execution modes: single-threaded and pipelined. The DTM comes up with an optimal execution plan for mappings.
  • Informatica Analyst: According to Informatica documentation:

Informatica Analyst is a web-based application client that analysts can use to analyze, cleanse, standardize, profile, and score data in an enterprise. Business analysts and developers use Informatica Analyst for data-driven collaboration. You can perform column and rule profiling, scorecarding, and bad record and duplicate record management.

  • Informatica Developer: It is an application client that developers use to design and implement data quality and data services solutions.
For more about the Integration Service, see the earlier post on this blog.

Thank you…