Why is Informatica PowerCenter's lookup cache faster than a direct lookup to the source?

https://stackoverflow.com/questions/18867265

29-06-2022

Question

I understand it is faster, but why? Both the direct lookup and the cached lookup query an on-disk table. I'd expect the cache to be held in memory for it to be faster.

More information here: http://www.clearpeaks.com/blog/etl/boost-performance-of-informatica-lookups

When a lookup is cached: Informatica queries the database, brings the whole set of rows to the Informatica server and stores in a cache file. When this lookup is called next time, Informatica uses the file cached. As a result, Informatica saves the time and the resources to hit the database again.

Why is it faster to use a cache file than the DB?

Solution

In a direct (uncached) lookup, Informatica issues a SELECT query against the database for each and every record. So if a million records come from the source, it hits the database a million times. This costs time on the database side and also in moving the data over the network: sending many small chunks of data repeatedly is costlier than sending it all at once.
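To make the per-record cost concrete, here is a minimal sketch of the uncached pattern. It is not Informatica code: an in-memory SQLite database stands in for the real lookup source, and the `customers` table, the `uncached_lookup` function and the 1 ms round-trip figure are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical lookup table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"name-{i}") for i in range(1000)],
)

queries_sent = 0  # counts round trips to the database


def uncached_lookup(customer_id):
    """Issue one SELECT per source row, as an uncached lookup does."""
    global queries_sent
    queries_sent += 1
    row = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


# 5000 source rows produce 5000 separate queries.
source_rows = [i % 1000 for i in range(5000)]
results = [uncached_lookup(i) for i in source_rows]

# Assumed network round trip of 1 ms per query (illustrative only).
ASSUMED_ROUND_TRIP_MS = 1
estimated_network_ms = queries_sent * ASSUMED_ROUND_TRIP_MS
print(queries_sent, estimated_network_ms)
```

With a real network between the Informatica server and the database, each of those queries pays the round-trip latency separately, so under the same assumed 1 ms figure a million source rows would spend roughly 1000 seconds on round trips alone.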

In a cached lookup, by contrast, Informatica fetches the whole set of records once and caches it on its own server. The cache is also sorted and indexed on your condition columns. When a lookup needs to be performed, it just searches the cache for the input fields (very efficient search algorithms, such as binary search, are available), which is a lot faster.
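The cached pattern can be sketched roughly as follows. This is only an illustration of the idea, not how Informatica actually implements its cache: an in-memory SQLite database stands in for the lookup source, the `customers` table and `cached_lookup` function are hypothetical, and binary search via Python's `bisect` module is used simply because it is the example search algorithm mentioned above.

```python
import bisect
import sqlite3

# Hypothetical lookup table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"name-{i}") for i in range(1000)],
)

# A single query fetches the whole table, sorted on the condition column.
cache = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
keys = [row[0] for row in cache]  # sorted keys used for binary search


def cached_lookup(customer_id):
    """Binary-search the local cache; no database call is made here."""
    pos = bisect.bisect_left(keys, customer_id)
    if pos < len(keys) and keys[pos] == customer_id:
        return cache[pos][1]
    return None  # no matching row in the lookup table


# 5000 source rows are resolved locally after the one initial query.
source_rows = [i % 1000 for i in range(5000)]
results = [cached_lookup(i) for i in source_rows]
print(cached_lookup(42), cached_lookup(5000))
```

Here the database is hit once, and every later lookup is a search over sorted local data that takes on the order of log2(n) comparisons, with no network traffic per record.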

Licensed under: CC-BY-SA with attribution