When is it okay to use Parallel Arrays?

https://softwareengineering.stackexchange.com/questions/350006

13-01-2021

Question

I've been running into code (new code) that uses what I call 'Parallel Arrays' or lists: two arrays that contain related data and are linked by their position (index) in the array.

I consider this confusing and prone to all sorts of errors. The solution I normally propose is to create an object called Company with the fields CompanyId and CompanyName.

A very real example:

List<string> companyNames;
List<int> companyIds;

// ... both lists get populated somewhere, and we then process them by index

for (var i = 0; i < companyNames.Count; i++)
{
    UpdateCompanyName(companyIds[i], companyNames[i]);
}
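
For comparison, here is a minimal sketch of the object-based alternative described above. The `Company` record and the `CompanyUpdater` helper are illustrative names, not code from the original project, and the `updateCompanyName` delegate stands in for the `UpdateCompanyName` method, whose definition is not shown. The sketch assumes C# 9 or later for the `record` syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: groups the two values that the parallel lists kept apart.
public sealed record Company(int CompanyId, string CompanyName);

public static class CompanyUpdater
{
    // Bridges existing parallel lists into objects, failing fast if they are out of step.
    public static List<Company> FromParallelLists(IReadOnlyList<int> ids, IReadOnlyList<string> names)
    {
        if (ids.Count != names.Count)
            throw new ArgumentException("ids and names must have the same length.");

        var companies = new List<Company>(ids.Count);
        for (var i = 0; i < ids.Count; i++)
        {
            companies.Add(new Company(ids[i], names[i]));
        }
        return companies;
    }

    // The delegate is a stand-in for the UpdateCompanyName call in the question.
    public static void UpdateAll(IEnumerable<Company> companies, Action<int, string> updateCompanyName)
    {
        foreach (var company in companies)
        {
            updateCompanyName(company.CompanyId, company.CompanyName);
        }
    }
}
```

Because each id travels with its name, the two values cannot drift out of step when the collection is sorted, filtered, or appended to.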

Are these parallel arrays considered bad practice?

Solution

Here are some reasons why someone might use parallel arrays:

In a language that does not support classes or structs.
To avoid thread locking when each thread modifies only one of the columns.
When the persistence method forces these things to be stored separately and you are reconstituting them.
They can consume less memory when the equivalent structure would be padded (not applicable for these data types in C#).
When parts of the data need to be kept close together to make efficient use of the CPU cache (this would not help in the code above).
To use Single Instruction, Multiple Data (SIMD) instructions (not applicable to this code, or to strings at all).

I do not see any compelling reason to do this in this case. Even in the situations listed above, there are likely better options, or the benefit is small in a high-level language.
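
As a rough illustration of the padding point in the list above, the sketch below compares the per-element size of a struct with the combined size of the same fields held in two parallel arrays. The `Reading` struct and the `PaddingDemo` class are hypothetical, and `Marshal.SizeOf` reports the marshaled size, which is assumed here to match the in-memory layout for this simple blittable struct.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical element type: a 1-byte flag next to an 8-byte value.
// With sequential layout the value is aligned, so padding bytes follow the flag.
[StructLayout(LayoutKind.Sequential)]
public struct Reading
{
    public byte Flag;
    public long Value;
}

public static class PaddingDemo
{
    // Bytes per element when stored as an array of structs (padding included).
    public static int BytesPerElementAsStruct() => Marshal.SizeOf<Reading>();

    // Bytes per element when the fields live in a byte[] and a long[] instead.
    public static int BytesPerElementAsParallelArrays() => sizeof(byte) + sizeof(long);
}
```

On a typical 64-bit runtime the struct is expected to occupy 16 bytes against 9 bytes for the parallel arrays, but the exact figure depends on the platform's alignment rules. For the `int` and `string` pair in the question there is no such padding to save.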

Other tips

I've been guilty of using parallel arrays. Sometimes your head is so deep in the structure that you don't want to think about how to abstract it. An abstraction can be a little harder to refactor, so you're reluctant to launch right into one until you've proven what you really need.

At that point, though, it's worth considering refactoring to abstract away the details. Often the biggest reason I'm reluctant to do it turns out to be that it's hard to think of a good name.

If you can see a good way to abstract parallel arrays away, do it every time. But don't paralyze yourself by refusing to touch them. Sometimes a little dirty code is the best stepping stone to great code.

This pattern is sometimes also called Structure of Arrays (SoA), as opposed to Array of Structures (AoS), and is extremely useful when vectorizing code. Rather than writing a calculation that runs on a single structure and vectorizing bits of it, you write the calculation as you normally would, except with SSE intrinsics, so that it runs on four structures at once instead of one. This is usually easier, and almost always faster. The SoA format makes this very natural. It also improves alignment, which makes the SSE memory operations faster.
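
To illustrate the idea, here is a minimal sketch of a structure-of-arrays layout processed in vector-sized chunks. It uses the portable `System.Numerics.Vector<T>` type rather than raw SSE intrinsics, so the number of elements handled per step is whatever the hardware offers (4 floats with SSE, more with wider registers). The `PointsSoA` class and its `Lengths` method are made up for this example.

```csharp
using System;
using System.Numerics;

// Structure-of-arrays layout for 2D points: one array per field.
public sealed class PointsSoA
{
    public float[] X;
    public float[] Y;

    public PointsSoA(float[] x, float[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("x and y must have the same length.");
        X = x;
        Y = y;
    }

    // Computes sqrt(x*x + y*y) for every point, Vector<float>.Count points per step.
    public float[] Lengths()
    {
        var result = new float[X.Length];
        var width = Vector<float>.Count;
        var i = 0;

        // Vectorized part: each iteration loads `width` consecutive x and y values.
        for (; i <= X.Length - width; i += width)
        {
            var vx = new Vector<float>(X, i);
            var vy = new Vector<float>(Y, i);
            Vector.SquareRoot(vx * vx + vy * vy).CopyTo(result, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < X.Length; i++)
        {
            result[i] = MathF.Sqrt(X[i] * X[i] + Y[i] * Y[i]);
        }

        return result;
    }
}
```

With an array of structures, the x and y values would be interleaved in memory and each vector load would first need a shuffle or gather step; keeping each field contiguous is what lets the loop body read like the scalar version.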

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange