Question

I have found that in Progress 10.1, when more than one index could be used to resolve a query, the database uses the first index in the list of indexes rather than the most optimal one, nor does it use a subset of the two indexes.

Did anyone else experience this?

=================================================================

Several indexes are defined, but the two we are looking at are:

XIE1cac_role_person: owning_entity_mnemonic, owning_entity_key, role_key

XIE2cac_role_person: contract_obj, person_role_code, effective_from_date

Initially my code was as follows, and it was using the first index, which returns a much larger dataset:

FOR EACH cac_role_person NO-LOCK
    WHERE cac_role_person.contract_obj = cbm_contract.contract_obj
      AND cac_role_person.owning_entity_mnemonic = "BROKER"
      AND ((cac_role_person.effective_to_date > TODAY
            AND cac_role_person.effective_to_date >=
                cbm_contract_component.contract_component_start_date)
        OR (cac_role_person.effective_to_date = ?
            AND cac_role_person.effective_from_date <=
                cbm_contract_component.contract_component_start_date)):

So I now force it to use the second index:

FOR EACH cac_role_person NO-LOCK USE-INDEX XIE2cac_role_person
    WHERE cac_role_person.contract_obj = cbm_contract.contract_obj
      AND cac_role_person.owning_entity_mnemonic = "BROKER"
      AND ((cac_role_person.effective_to_date > TODAY
            AND cac_role_person.effective_to_date >=
                cbm_contract_component.contract_component_start_date)
        OR (cac_role_person.effective_to_date = ?
            AND cac_role_person.effective_from_date <=
                cbm_contract_component.contract_component_start_date)):

The first version of the code resulted in about 4 000 fixes in 30 hours, and the improvement resulted in 70 000 fixes in 12 hours. (The loop is part of a much bigger piece, but this was the only change I needed to speed up processing 17 times.)

Solution

There are certainly cases where a programmer can make a better index choice than the compiler, but they are pretty rare.

Without knowing all of your actual index definitions (which you have not provided), it is not possible to fully evaluate which indexes the compiler could choose. Given what you have shared, the choice follows the rules (see below), but the rules are not as you describe above.

And without knowing the distribution of the data, it isn't really possible to say whether the chosen indexes are "best" or optimal. Having said that, it seems intuitive that a field with values like "BROKER" is going to be less selective than one called "contract_obj". But that's just a guess.

The Progress 4GL engine can use multiple indexes to resolve a query, but that doesn't mean that it will do so, nor does it mean that the result will necessarily be better if it does. To know whether it did, you need to compile with XREF and review the results.
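
A minimal sketch of doing that (the program and output file names here are hypothetical):

/* Compile the procedure and write the cross-reference to a file.      */
/* "broker_roles.p" and "broker_roles.xrf" are placeholder names.      */
COMPILE broker_roles.p XREF broker_roles.xrf.

/* In the .xrf output, look at the SEARCH entries: each one names the  */
/* table and the index the compiler selected, and WHOLE-INDEX is noted */
/* when the query could not be bracketed on that index.                */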

The 4GL engine uses static, compile-time index selection. You can find some very detailed information about the rules here: http://pugchallenge.org/downloads/352_Pick_an_Index_2013.pdf

The most important rule is: maximize the depth of equality matches on leading components. You have two possible equality matches:

cac_role_person.contract_obj = cbm_contract.contract_obj 
cac_role_person.owning_entity_mnemonic = "BROKER"

So your "best" indexes (without knowing anything about the data distribution) will almost certainly be ones that have those two fields as their leading components. Ideally your third component would be the cac_role_person.effective_to_date field. If you do not have any indexes that meet that criteria you might want to consider adding one.

The two indexes that you have shown each have a single equality match on the leading component, so they are of equal strength. Tie-breaker criteria then come into play -- if one of them is designated as the "primary" index, it wins. Otherwise, since no BY criteria are indicated, alphabetical order wins.
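
To illustrate the BY point only (this is not a change your query needs): a BY clause that one of the candidate indexes can satisfy without a separate sort may tip the choice. In this sketch the BY field is the second component of XIE1cac_role_person, whose leading component is equality-matched:

/* Hedged sketch: with equality on owning_entity_mnemonic (the leading  */
/* component of XIE1cac_role_person) and a BY on its second component,  */
/* that index can deliver the rows in the requested order.              */
FOR EACH cac_role_person NO-LOCK
    WHERE cac_role_person.owning_entity_mnemonic = "BROKER"
    BY cac_role_person.owning_entity_key:
    /* process the record */
END.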

If you lack appropriate indexes, or you are doing a table scan on purpose, then it is often fastest to specify the smallest index. You can determine which index that is by looking at the output of:

proutil dbName -C dbanalys > dbName.dba

The index with the fewest blocks is the one that you want. If they are all roughly the same size, go for the highest "utilization".
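
For example, a sketch of a deliberate whole-table read through the smallest index (the index name here is hypothetical -- substitute whichever index dbanalys reports as having the fewest blocks):

/* Deliberate full-table read: every row is touched anyway, so the     */
/* cheapest structure to walk is the physically smallest index.        */
FOR EACH cac_role_person NO-LOCK USE-INDEX XPKcac_role_person:
    /* process every row */
END.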

FWIW -- the SQL engine uses a cost-based optimizer. You must, however, regularly update the statistics if you want that to work well. (And it won't help your 4GL queries.) (The SQL syntax available inside the 4GL is embedded SQL-89, and it does not know about the cost-based optimizer -- so that will not help either. Trying to use SQL inside a 4GL session is the road to endless frustration -- don't go there.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow