Question

I have a problem with a big query that joins ten tables. I'm migrating data from a wide fact table (f1) into a star schema. I begin by populating the dimension tables from f1, and then I populate the new fact table (f2) with a join to the dimension tables to get the corresponding IDs.
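
Simplified, the load looks something like this (the table and column names here are only illustrative, not my real schema, and the real query joins all ten dimension tables at once):

    -- Illustrative sketch only: populate one dimension from the wide fact table,
    -- then build the new fact table by joining back to it for the generated ID
    -- (this assumes search_term_id is an IDENTITY/auto-increment column).
    INSERT INTO owa_search_term_dim (search_term)
    SELECT DISTINCT search_term FROM f1;

    INSERT INTO f2 (search_term_id, page_date, page_views)
    SELECT d.search_term_id, f1.page_date, f1.page_views
    FROM f1
    JOIN owa_search_term_dim d ON d.search_term = f1.search_term;
    -- ...plus joins to the remaining dimension tables in the real query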

Unfortunately I'm getting an error, "inner partition did not fit in memory". From the log I see:

2012-10-18 16:20:31.607 Init Session:0x2aac6c02b250 [EE] <INFO>   ENABLE_JOIN_SPILL may allow this query to run, with reduced performance 
2012-10-18 16:20:31.607 Init Session:0x2aac6c02b250 [EE] <INFO> Query Retry action: Setting add_vertica_options('EE','ENABLE_JOIN_SPILL');

but that doesn't work either since later I get:

2012-10-18 16:23:31.138 Init Session:0x2aac6c02b250 [EE] <INFO>   Join ((public.owa_search_term_dim x public.page_impressions_with_session) using owa_search_term_dim_projection_node0001 and previous join (PATH ID: 7)) inner partition did not fit in memory; value 
2012-10-18 16:23:31.138 Init Session:0x2aac6c02b250 [EE] <INFO> Query Retry action: Swapping join order with override: 1|7|0

This goes on for some time, while Vertica apparently tries to find a way to perform the join, but eventually bails with an error saying the join didn't fit in memory.

Are there any tips on how to minimize the memory needed to perform joins, or on why spilling to disk isn't working? I can handle a performance hit; I just need to be able to execute the query.


Solution

These are the things I've done to work around this error:

  • Rewrite the query
    Sometimes the initial query isn't as optimized as it could be. One of the ways I approach this is to restructure it with subqueries.
  • Use Temporary Tables
    Some of the reports I've had to generate work very well by using temporary tables. This is a more 'extreme' version of using subqueries (see the first sketch after this list).
  • Additional Filters
    Sometimes little things like adding extra filters, and making sure they get pushed down to the joined tables, will make the difference between a five-minute out-of-memory query and a 30-second working query.
  • Limit Data
    Work on multiple subsets of the data in multiple steps. Much like adding filters, working on subsets reduces the amount of resources Vertica uses, allowing the query to succeed. I frequently do this for date-based aggregations, going day -> month -> year (see the staged sketch after this list). This approach has never failed me, and I end up with an accurate yearly aggregation when simply aggregating the whole year at once would never work.
  • Projections
    Using query-specific projections tailored to this query may allow Vertica to use fewer resources (see the projection sketch after this list).
  • Explain Plan
    There are two main benefits I get from looking through the explain plan (see the EXPLAIN sketch after this list).
    A) Ensure that Vertica is using the expected projections, for example the query-specific projections created to optimize performance. If I find it isn't, I can review my expectations and assumptions about the query.
    B) Check that the maximum possible filters are being applied to every table. In some of my more complicated subqueries I found that the date filter wasn't being pushed down correctly to all the tables. Once I corrected this, the query was an order of magnitude faster (the five-minute to 30-second difference mentioned above).
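
For example, a minimal sketch of the temp-table approach (the column and fact-table names are made up for illustration, not taken from the question; the idea is to filter and stage first, then join the smaller result to the dimensions):

    -- Stage a filtered subset of the wide table first.
    CREATE LOCAL TEMP TABLE stage_impressions
    ON COMMIT PRESERVE ROWS AS
    SELECT session_id, search_term, page_date
    FROM page_impressions_with_session
    WHERE page_date BETWEEN '2012-09-01' AND '2012-09-30';

    -- Then join the much smaller staged set to the dimension for its ID.
    INSERT INTO fact_impressions (session_id, search_term_id, page_date)
    SELECT s.session_id, d.search_term_id, s.page_date
    FROM stage_impressions s
    JOIN owa_search_term_dim d ON d.search_term = s.search_term;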
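
The day -> month -> year staging looks roughly like this (again, illustrative names only):

    -- Aggregate in steps so no single query has to crunch the whole year at once.
    CREATE LOCAL TEMP TABLE daily_counts ON COMMIT PRESERVE ROWS AS
    SELECT page_date, COUNT(*) AS impressions
    FROM page_impressions_with_session
    GROUP BY page_date;

    CREATE LOCAL TEMP TABLE monthly_counts ON COMMIT PRESERVE ROWS AS
    SELECT DATE_TRUNC('month', page_date) AS month_start, SUM(impressions) AS impressions
    FROM daily_counts
    GROUP BY 1;

    SELECT DATE_TRUNC('year', month_start) AS year_start, SUM(impressions) AS impressions
    FROM monthly_counts
    GROUP BY 1;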
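
A query-specific projection sorted and segmented on the join key can also help; something along these lines (the column names are assumptions about the dimension, not its actual definition):

    CREATE PROJECTION owa_search_term_dim_by_term AS
    SELECT search_term_id, search_term
    FROM owa_search_term_dim
    ORDER BY search_term
    SEGMENTED BY HASH(search_term) ALL NODES;

    SELECT START_REFRESH();  -- populate the new projection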
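
Checking the plan is as simple as prefixing the query with EXPLAIN and reading which projections it picks and where the predicates land (hypothetical query shown):

    EXPLAIN
    SELECT d.search_term_id, COUNT(*)
    FROM page_impressions_with_session p
    JOIN owa_search_term_dim d ON d.search_term = p.search_term
    WHERE p.page_date >= '2012-10-01'
      AND p.page_date < '2012-11-01'
    GROUP BY d.search_term_id;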

Using these steps, I haven't encountered any situation where I couldn't get a result. Sometimes it takes a while. I have a set of queries pumping data into a series of 14 temp tables that ends in a very small result set, but it takes over 15 minutes to run because of the sheer amount of crunching that has to be done.

OTHER TIPS

Nija's answer is the better answer, but here's a suggestion to consider: get more memory. Sometimes you outgrow your system.

His suggestion to use temp tables is something I've used in the past, but I haven't encountered this problem in quite some time, and that's only because our system doesn't do many joins.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow