PERFORMANCE TUNING IN INFORMATICA
Performance tuning is done to improve session performance. To tune a session, performance bottlenecks are identified and eliminated. Bottlenecks should be looked for in the following places, in the given order:
1. Target
2. Source
3. Mapping
4. Session
5. System
General techniques to identify bottlenecks:
1. Thread statistics should be analysed to identify source, target or transformation bottlenecks. By default, the Integration Service uses 1 reader thread, 1 transformation thread and 1 writer thread to process a session. The thread with the highest busy percentage identifies the bottleneck.
2. Add a Filter transformation after each source qualifier. Set the filter condition to false so that no data is passed from the filter. If the time to run the session remains about the same, there is a source bottleneck.
3. Add a Filter transformation before each target definition. Set the filter condition to false so that no data is loaded into the target. If the time taken to run the session is almost the same as before, there is a mapping bottleneck.
General techniques to improve session performance:
1. If a target bottleneck is identified, consider dropping key constraints and indexes on the target tables before the session runs, and rebuild them after the session completes, as sketched below.
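A minimal sketch of the pre- and post-load DDL, assuming a hypothetical Oracle target table T_SALES with an index IDX_T_SALES_CUST; the exact statements depend on the target database:

    -- Pre-session SQL: drop the index before the load (assumed index name)
    DROP INDEX IDX_T_SALES_CUST;

    -- Post-session SQL: rebuild the index after the load completes
    CREATE INDEX IDX_T_SALES_CUST ON T_SALES (CUSTOMER_ID);

These statements can be run from the session's Pre SQL and Post SQL properties or from separate tasks in the workflow.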
2. Use bulk loading when inserting large amounts of data. When bulk loading, the Integration Service bypasses the database log, which speeds up performance. But there is a trade-off: without writing to the database log, the target database cannot perform rollback.
3. If multiple source tables are joined in one source qualifier, optimize the query by using hints and/or adding indexes on the joined columns, GROUP BY or ORDER BY columns. Configure the database to run parallel queries to improve performance, as in the sketch below.
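A minimal sketch of such an optimization, assuming two hypothetical Oracle source tables ORDERS and ORDER_ITEMS joined in one source qualifier; the hint syntax and degree of parallelism are Oracle-specific and illustrative only:

    -- Index the joined column on the larger (detail) table
    CREATE INDEX IDX_ORDER_ITEMS_ORDER_ID ON ORDER_ITEMS (ORDER_ID);

    -- Source qualifier SQL override with parallel hints
    SELECT /*+ PARALLEL(o, 4) PARALLEL(i, 4) */
           o.ORDER_ID,
           o.ORDER_DATE,
           i.ITEM_ID,
           i.QUANTITY
    FROM   ORDERS o
           JOIN ORDER_ITEMS i
             ON i.ORDER_ID = o.ORDER_ID

Verify any hint with the DBA and with an execution plan before relying on it; a badly chosen hint can make the query slower.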
4. Reduce unnecessary transformations in the mapping and delete unnecessary links between transformations to optimize the mapping.
5. If the session reads from a flat file source, performance can be improved by setting the number of bytes the Integration Service reads per line. By default, the Integration Service reads 1024 bytes per line. If each line in the source file is less than the default setting, decrease the 'line sequential buffer length' in the session properties.
6. Use filters as early as possible in the mapping.
7. Eliminate unnecessary datatype conversions. Try to use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations.
8. The Integration Service reads expressions written with operators faster than expressions written with functions. Wherever possible, use operators, e.g. use the || operator in place of the CONCAT function, as illustrated below.
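A small illustration in the expression language, using hypothetical ports FIRST_NAME and LAST_NAME (CONCAT takes two arguments, so joining three values requires nesting):

    -- Slower: nested function calls
    CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

    -- Faster: operator form, same result
    FIRST_NAME || ' ' || LAST_NAME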
9. When an Aggregator transformation is used, the following guidelines should be followed:
- Group by simple columns. If possible, prefer numeric columns for the GROUP BY ports.
- Use sorted input. When the sorted input option is used, the Integration Service assumes that all data is sorted by group, so it does not wait for the whole data set to come into memory before aggregating (see the sketch after this list).
- Use incremental aggregation if the changes in the source affect less than half the target. In this case, the Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.
- Filter data before aggregation to avoid unnecessary aggregation.
- Avoid unnecessary port connections to reduce the amount of data the Aggregator transformation stores in the data cache.
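A minimal sketch of feeding sorted data to an Aggregator that has sorted input enabled, assuming a hypothetical SALES source table and an Aggregator that groups by CUSTOMER_ID and PRODUCT_ID (a Sorter transformation is an alternative when the source is not relational):

    -- Source qualifier SQL override: sort by the group-by ports
    -- so the Aggregator's Sorted Input option can be used
    SELECT CUSTOMER_ID,
           PRODUCT_ID,
           SALE_AMOUNT
    FROM   SALES
    ORDER  BY CUSTOMER_ID, PRODUCT_ID

The ORDER BY columns must match the Aggregator's group-by ports and their order; otherwise the session can fail or aggregate incorrectly.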
10. When using a Joiner transformation, the following should be considered:
- In case of large data sets, configure the Joiner transformation to use sorted input. In this case, the Integration Service improves performance by minimizing disk input/output. Also, designate as the master source the source with fewer duplicate key values. When the Integration Service processes a sorted Joiner transformation, it caches rows for 100 unique keys at a time; if the master source contains many rows with the same key value, the Integration Service needs to cache more rows, and performance can be slowed.
- Designate as the master source the source with fewer rows. During a session run, the Joiner transformation compares each row of the detail source against the master source; the fewer rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.
- Perform the join in the database or in the Source Qualifier transformation if possible, as in the sketch after this list.
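A minimal sketch of pushing a join into the Source Qualifier SQL override instead of using a Joiner transformation, assuming hypothetical ORDERS and CUSTOMERS tables that live in the same source database and use the same connection:

    -- Source qualifier SQL override: let the database perform the join
    SELECT o.ORDER_ID,
           o.ORDER_DATE,
           c.CUSTOMER_NAME
    FROM   ORDERS o
           JOIN CUSTOMERS c
             ON c.CUSTOMER_ID = o.CUSTOMER_ID

If the two sources sit in different databases or use different connections, a Joiner transformation is still required.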
11. When using a Lookup transformation, the following should be considered:
- If you have privileges to modify the database containing a lookup table, you can improve lookup initialization time by adding an index to the lookup table.
- You can improve performance by indexing the columns in the lookup ORDER BY.
- By default, the Integration Service generates an ORDER BY clause for a cached lookup that contains all lookup ports. To increase performance, you can suppress the default ORDER BY clause and enter an override ORDER BY with fewer columns. Place two dashes '--' as a comment notation after the ORDER BY clause to suppress the ORDER BY clause that the Integration Service generates (see the sketch after this list).
- If you include more than one lookup condition, place the conditions in the following order to optimize lookup performance: equal to (=), less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=), not equal to (!=).
- Improve session performance by caching small lookup tables.
- If the lookup table is on the same database as the source table in the mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.
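A minimal sketch of a lookup SQL override that suppresses the generated ORDER BY, assuming a hypothetical PRODUCTS lookup table with lookup ports PRODUCT_ID, PRODUCT_NAME and PRICE, where ordering by PRODUCT_ID alone is sufficient:

    -- Lookup SQL override: order by a single column and comment out
    -- the ORDER BY clause that the Integration Service appends
    SELECT PRODUCT_ID,
           PRODUCT_NAME,
           PRICE
    FROM   PRODUCTS
    ORDER  BY PRODUCT_ID --

The trailing '--' turns the automatically appended ORDER BY into a comment, so only the override ORDER BY is executed.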
12. If possible, run sessions and workflows concurrently to improve performance.
13. Increase the commit interval. By doing so, the number of times the Integration Service commits data decreases, and hence performance is gained. However, if the commit interval is too high, the Integration Service may fill the database log file and cause the session to fail.
14. If the session runs with the 'high precision' setting enabled, disabling this property may result in a performance gain.
15. Set the proper error tracing level in the session, e.g. the Verbose Data setting should only be used for debugging mappings and should be avoided in normal runs.
16. Use pipeline partitioning to improve session performance. Increasing the number of partitions or partition points increases the number of threads. If the Integration Service nodes contain enough CPU bandwidth, this approach can significantly improve performance.