Question

Pseudo code and comments to explain:

// first select companies to process from the db
foreach (company in companies) {
    // select the employees of this company
    foreach (employee of company) {
        // select the items they can access
        foreach (item accessible to employee) {
            // do some calculation and save
        }
    }
}
I believe this will be O(n^3) time complexity; please correct me if I am wrong, as Big O has always given me a headache. My question is: if you introduce parallel processing at the first level, what does it become? What if we introduce a second Parallel.ForEach() for the second iteration as well?
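
For reference, a rough C# sketch of the two variants being asked about might look like the following (the collection and member names companies, company.Employees, employee.Items and Calculate are made up for illustration):

// Parallel at the first level only:
Parallel.ForEach(companies, company =>
{
    foreach (var employee in company.Employees)
    {
        foreach (var item in employee.Items)
        {
            Calculate(item); // do some calculation and save
        }
    }
});

// Parallel at the first and second levels:
Parallel.ForEach(companies, company =>
{
    Parallel.ForEach(company.Employees, employee =>
    {
        foreach (var item in employee.Items)
        {
            Calculate(item); // do some calculation and save
        }
    });
});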

Edit: It was suggested that "What is O(...) and how do I calculate it?" would address my question; however, I am more interested in whether Parallel.ForEach impacts time complexity, so I believe the question is sufficiently different to stand on its own.

Solution

  1. Time complexity is not about how long an algorithm takes to run; it is about how much longer it takes as the input grows. That is, even the most complex (in terms of time complexity) algorithms can run very fast for adequately small inputs!

  2. Your proposed code will not have O(n^3) complexity, but rather O(n*m*k), because, in all likelihood, you do not have equal numbers of companies, employees per company and items per employee (i.e. n companies, m employees per company and k items per employee).

  3. Yes, you can think of each nested loop as a multiplication in terms of Big-O notation. For each of the n outer items, you must perform, say, m inner actions, so each additional outer item adds another m actions. Note, however, that if other things are going on inside your loops (e.g. early stopping), the time complexity may change.

  4. Because of 1, Parallel.ForEach will not change the time complexity, although it will certainly lower the overall running time. A simplified way to think of this is as a constant factor in front of whatever is inside the parentheses: think of O(n*m*k) as an alias for t(n, m, k) = a*(n*m*k), where t is the actual time. Increasing or decreasing the constant a obviously affects the total running time of the algorithm. Parallel.ForEach acts on that constant, but the time complexity does not change (i.e. the time still grows with the input approximately as it would without Parallel.ForEach). Moreover, the effect of Parallel.ForEach depends on the actual hardware that is available, so it is case-specific; in certain cases, you may even end up slowing your run down instead of speeding it up. (See the sketch after the note below.)

In simple words, if all companies have the same number of employees and each employee has the same number of items, then tripling the number of companies will triple the total running time (approximately, of course, and always provided that you don't do any "tricks" inside the loops, which may depend on the company), with or without Parallel.ForEach, and this is what Big-O notation expresses.

*Once again, note that point 4 above is a simplification, just for the purpose of illustration. Things are much more complicated and, in reality, run-time analysis involves multiple tests against various input sizes and distributions.
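
As a minimal sketch of point 4 (the workload, counts and names below are made up purely for illustration), the following self-contained C# program runs the triple loop with Parallel.ForEach on the outer level for a few values of n; doubling n should roughly double the elapsed time, parallel or not, because only the constant factor changes:

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class GrowthDemo
{
    // Simulated "calculation" per item; the actual work is irrelevant here.
    static double Work(int seed)
    {
        double acc = 0;
        for (int i = 0; i < 20_000; i++) acc += Math.Sqrt(seed + i);
        return acc;
    }

    static void Main()
    {
        const int m = 50, k = 50; // employees per company, items per employee

        foreach (int n in new[] { 8, 16, 32 }) // number of companies
        {
            var sw = Stopwatch.StartNew();
            Parallel.ForEach(Enumerable.Range(0, n), company =>
            {
                for (int e = 0; e < m; e++)      // employees of this company
                    for (int i = 0; i < k; i++)  // items of this employee
                        Work(company + e + i);
            });
            sw.Stop();
            // The elapsed time should grow roughly linearly in n,
            // i.e. the parallel loop changes the constant, not the growth.
            Console.WriteLine($"n = {n}: {sw.ElapsedMilliseconds} ms");
        }
    }
}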

Update

To provide a more direct answer to the question, let's reference a parallel vector model of computation. The step complexity is a theoretical measure that assumes the ideal case of a processing device with infinitely many processors. This is done to abstract away the complication of restricting the length of an input vector, in order to focus on the "parallelizability" of a problem. As a result, if a process can be broken down into steps that can be executed in parallel, the step complexity is the time complexity of each such step. Of course, a problem may have more than one way to be split into parallel steps, and minimizing this step complexity is important in its own right. The point is that, in the real world (see here for some further analysis):

T_p <= T_N + T_1 / p,

where N > p and p is the count of processors you are considering. Considering N = ∞, to relate to the aforementioned cases, this reads something along the lines of: the total running time for p processors is bounded by the step complexity, plus the total work (sometimes called work complexity, i.e. the total work if the algorithm ran sequentially on 1 processor) divided by the number of processors. What this means, in very simple words, is that you get at most a p-fold improvement, minus something that depends on how well you designed your algorithm (of course, this is an upper bound; in the real world, mixed results are observed, which sometimes appear to conflict with theory). The important thing here is that the p-fold improvement term dominates the complexity as p gets smaller (i.e. as reality gets more relevant).
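
As a purely hypothetical worked example of this bound (all numbers made up): suppose the total sequential work is T_1 = 1,000,000 elementary operations and the step complexity is 1,000 of those operations. With p = 8 processors, the bound gives T_8 <= 1,000 + 1,000,000 / 8 = 126,000, i.e. close to the ideal 8-fold improvement, while with p = 2 it gives T_2 <= 1,000 + 500,000 = 501,000, where the "total work divided by p" term clearly dominates.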

The take-home point, however, is that this "p-fold improvement" scales with the number of processors, not with the size of the input! If you are a general-purpose algorithm designer, you usually have little control over p. What you do have control over is typically T_1, i.e. you can employ algorithms that are as efficient as possible sequentially to begin with.

Optimizing an algorithm for parallel execution is a complicated task. Unless the requirements and use case explicitly demand it, and are adequately backed up, so that you absolutely must sit down and thoroughly analyze your algorithm, the most you can expect to get from Parallel.ForEach is something slightly worse than this p-fold improvement (and, often, quite a bit of trouble debugging). Note that when p = 1, it is not unexpected to get increased run-time.
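
As a hypothetical way to observe that last point, one can cap the degree of parallelism at 1 via ParallelOptions.MaxDegreeOfParallelism and compare against a plain foreach over the same made-up workload; the parallel version then pays the partitioning and delegate-invocation overhead without getting any benefit in return:

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class OverheadDemo
{
    static void Main()
    {
        var data = Enumerable.Range(0, 10_000_000).ToArray();

        // Plain sequential loop.
        double sum1 = 0;
        var sw = Stopwatch.StartNew();
        foreach (var x in data) sum1 += Math.Sqrt(x);
        Console.WriteLine($"foreach:                {sw.ElapsedMilliseconds} ms");

        // Parallel.ForEach restricted to a single worker (p = 1).
        // The shared accumulator is only safe because the degree of parallelism is 1.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 1 };
        double sum2 = 0;
        sw = Stopwatch.StartNew();
        Parallel.ForEach(data, options, x => { sum2 += Math.Sqrt(x); });
        Console.WriteLine($"Parallel.ForEach (p=1): {sw.ElapsedMilliseconds} ms");

        // Use the sums so the work cannot be optimized away.
        Console.WriteLine(sum1 - sum2);
    }
}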

Additional Comments

Some additional points, in order to address some of the comments, as the original example is probably overly simplified.

  1. Big-O notation is a mathematical "artifact". Using the definition in the context of computational complexity requires a certain degree of "abuse" in order to be useful. For example, if f(x) = x^3 + 5x, then f(x) = O(x^3) (since, e.g., x^3 + 5x <= 6x^3 for all x >= 1), but f(x) = O(x^15) is just as correct from a mathematical perspective. Even the use of the equals sign is not strictly OK.

  2. Context is important. That means that the function f and the variable x can mean anything. Big-O simply expresses an asymptotic upper bound. In most contexts, the tightest known bound is typically used, because it conveys the most useful information. But Big-O stops right there. Further interpretation is based on what function f means in the desired context.

  3. Analyzing the same algorithm can produce different expressions of its complexity; it all depends on the definition of the problem size. For example, consider square matrix addition. It is equally valid to say that the time complexity is O(N) or O(n^2); the key is that these statements use different size measures for the problem. If one cares about the total number of elements N, then yes, square matrix addition scales no worse than linearly (this is precisely what Big-O states) with respect to N. Equivalently, though less obviously, square matrix addition scales no worse than quadratically with respect to the row (or column) count n: 3x the rows (and columns, as the matrices are square) ends up taking <= 9x the time, since the total number of elements is now (3n)x(3n) = 9(n^2) (see the matrix-addition sketch further below).

  4. How is this relevant to the question? Well, one of the comments indicates that to arrive at an O(n*m*k) complexity for the OP's code, certain important assumptions have to be made. In order to clarify some intricate points, consider the following scenarios:

    • Non-parallel computation, with n indicating the total number of companies, m the number of employees per company (the same for every company), and k the number of items per employee (the same for every employee), with no employee belonging to more than 1 company and no item belonging to more than 1 employee. What matters in this scenario is that the computation involves n iterations, of m iterations, of k iterations (as in a 3-dimensional matrix), and the time complexity is O(n*m*k). Alternatively, if q is the overall total number of items, the time complexity is O(q). Facts of life, such as employees working in more than one company or items belonging to multiple employees, do change the complexity. Also, in real-world input there is no single m or k variable, because these represent per-company/per-employee counts, not overall counts.

    • Considering more realistic conditions: employees can only work in 1 company, but items may well be accessible to more than 1 employee (i.e. considering domain implications). In that case, things are slightly more complicated. Consider a total of n companies, a total of m employees (across all companies) and a total of k items (across all companies). In the worst case, all employees of a company can access all of that company's items, so m_i * k_i iterations have to be performed for each company i (with m_i the number of employees and k_i the number of items of company i). Now, regardless of how many companies there are, m * k = (m_1 + m_2 + ... + m_n) * (k_1 + k_2 + ... + k_n) >= m_1 * k_1 + m_2 * k_2 + ... + m_n * k_n (for example, with two companies, m_1 = 3, m_2 = 2, k_1 = 4, k_2 = 5: (3 + 2) * (4 + 5) = 45 >= 3*4 + 2*5 = 22). In short, as the number of companies n grows, the worst case is that each employee accesses all items of their company. As a result, under the given assumptions, the time complexity is O(m*k), i.e. the triple loop does not scale worse than an equivalent double loop running over all employees and all items. In "domain words", your algorithm will never run slower than the equivalent "flattened down" case (i.e. the case where all your employees and items belong to 1 company).

An important note for the last point, above, is that O(m*k) is just one asymptotic upper bound. It may not be the tightest known, i.e. an actual programmed solution may scale far better than that. A tighter bound may exist, but, of course, determining it would require further analysis of all conditions involved in the algorithmic procedure, alternative calculation paths etc.
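
To make the square-matrix-addition example from point 3 above concrete, here is a minimal C# sketch (the fill values are arbitrary): the double loop performs exactly one addition per element, so the same code is O(n^2) with respect to the row count n and O(N) with respect to the total element count N = n*n.

using System;

class MatrixAddDemo
{
    static void Main()
    {
        int n = 3; // rows (and columns, since the matrices are square)
        var a = new double[n, n];
        var b = new double[n, n];
        var sum = new double[n, n];

        // Fill with arbitrary values, just so there is something to add.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
            {
                a[i, j] = i + j;
                b[i, j] = i * j;
            }

        // One addition per element: n * n = N operations in total.
        // O(n^2) with respect to rows n, O(N) with respect to total elements N.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum[i, j] = a[i, j] + b[i, j];

        Console.WriteLine(sum[n - 1, n - 1]); // 2*(n-1) + (n-1)^2 for these fills
    }
}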

Other tips

As a small addition to the answer above, O-Notation is still sometimes used in parallel algorithm analysis.

For example, see: https://en.wikipedia.org/wiki/Analysis_of_parallel_algorithms

Just so you know what is meant when you stumble upon such a thing in the literature.

In that sense, you could say that your time complexity might be O(n*m*k / p), where p is the number of processors. This, however, assumes an unknown, growing number of processors, so it never applies to a real machine. (But, as we all know, every algorithm is always O(1) on a real machine. ;-) )

Licensed under: CC-BY-SA with attribution