Question

Let's say I have a function which sorts a database in O(n^2) time. I want to go about refactoring it so it runs in O(n log(n)) time, and in doing so I will change the fundamental way the operation runs, while keeping the return values and inputs equivalent.
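
For concreteness, here is a minimal sketch of the kind of change I mean (the function names and the selection/merge pairing are just illustrative):

```python
# Illustrative only: an O(n^2) sort replaced by an O(n log n) one,
# keeping the inputs and return value equivalent.

def sort_records_before(records):
    """O(n^2): selection sort."""
    result = list(records)
    for i in range(len(result)):
        smallest = min(range(i, len(result)), key=lambda j: result[j])
        result[i], result[smallest] = result[smallest], result[i]
    return result

def sort_records_after(records):
    """O(n log n): same signature and result, different algorithm."""
    return sorted(records)  # Timsort: O(n log n)
```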

What do I call this refactoring activity?

"Speeding-up-ifying" doesn't seem quite right, since you can make an algorithm go faster without changing the big O speed at which it executed.

"Simplifying" also doesn't seem right.

What do I call this activity?

Update

The best answer I could find is "reducing the asymptotic time complexity".

Solution

It is typically called "performance optimization", but I would not call it "refactoring", since that term refers to changes in code which don't alter its visible behaviour. And a change in Big-O is definitely something I would call a visible change.

"in doing so I will change the fundamental way the operation runs"

In this case, your optimization is a rewrite of that function. Not every optimization that changes the Big-O is necessarily a rewrite; sometimes small changes suffice to achieve such an improvement. But even then, I am reluctant to use the term "refactoring" for this, because it tends to give a wrong impression about the nature of the change.

EDIT: I checked Fowler's refactoring list, and among these roughly 100 named refactorings, the very last one is called "Substitute Algorithm". So if we take this as a canonical reference, there is a small grey area where an optimization of the described form might be called a special kind of refactoring (though IMHO not a typical one). Note also that Fowler's goal with all refactorings was to improve the design, with a focus on the maintainability and evolvability of existing code without rewriting it, and clearly not on performance optimization.
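
To make "Substitute Algorithm" concrete, here is a minimal sketch in the spirit of Fowler's own example (the function and names here are hypothetical). The observable behaviour is identical; only the algorithm inside the function changes:

```python
def found_person_before(people):
    # Original: explicit loop with repeated, hard-coded conditionals.
    for person in people:
        if person == "Don":
            return "Don"
        if person == "John":
            return "John"
        if person == "Kent":
            return "Kent"
    return ""

def found_person_after(people):
    # Substituted algorithm: same inputs, same return value.
    candidates = {"Don", "John", "Kent"}
    for person in people:
        if person in candidates:
            return person
    return ""
```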

Other tips

I don't think there is a standard term. "Optimizing" is commonly used, though "improving" or "speeding up" would also be correct, with the clarification that you are talking about code changes and not hardware changes.

As others have said, "optimizing" is the general term for improving the performance of an algorithm. However, optimizing often means improving constant factors. If I wanted to concisely but clearly state that I've reduced the asymptotic (time) complexity, I'd say that I've "improved the asymptotic performance". Sometimes people will describe this as "improving the scaling", but that is especially ambiguous nowadays.

Warning: Reducing the asymptotic time complexity is not necessarily the same as optimizing in a practical context. Oftentimes, asymptotically non-optimal algorithms are used because, on the range of inputs the program is actually exposed to, they perform better.
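
A minimal sketch of that point (timings are indicative, and the functions are illustrative): an O(n^2) insertion sort can beat an O(n log n) merge sort on small inputs because of its smaller constant factors, which is why hybrid sorts such as Timsort switch to insertion sort for short runs.

```python
import random
import timeit

def insertion_sort(a):
    """O(n^2), but with tiny constant factors."""
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    """O(n log n), but with more overhead per element."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

small = [random.random() for _ in range(16)]
print(timeit.timeit(lambda: insertion_sort(small), number=10_000))
print(timeit.timeit(lambda: merge_sort(small), number=10_000))
# On typical CPython builds, insertion sort wins at this input size.
```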

It will be context-dependent.

"Fixing a quadratic runtime performance bug" is typically what I see. However, whether that deserves fixing (a code change) is context-dependent.

Keep in mind that databases provide lots of tools to improve time complexity. For example, to get the top N results from a database, ask for the top N directly instead of fetching and sorting everything. When converting an inefficient kludge into a built-in optimized call, further explanation seems superfluous.
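
A minimal sketch of that idea (the data and names are hypothetical); in SQL the same point is made with ORDER BY ... LIMIT N rather than sorting client-side:

```python
import heapq

rows = [{"id": i, "score": (i * 37) % 101} for i in range(10_000)]

# Kludge: sort all n rows, keep 10 -> O(n log n) work for 10 results.
top_kludge = sorted(rows, key=lambda r: r["score"], reverse=True)[:10]

# Built-in: heap-based selection -> O(n log k) with k = 10.
top_builtin = heapq.nlargest(10, rows, key=lambda r: r["score"])

assert [r["score"] for r in top_kludge] == [r["score"] for r in top_builtin]
# In SQL, the equivalent is "ORDER BY score DESC LIMIT 10" instead of
# fetching every row and sorting in the application.
```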

The reason I consider an algorithm with quadratic runtime to deserve a code review (discussion) is not so much that it is slow (slow is relative; quadratic is fast compared to exponential), but that human intuition (that of your customers, or of fellow programmers) is innately uncomfortable with software functionality that deviates too far from linear runtime, due to expectations carried over from everyday life.

A lot of customer complaints about software performance fall into these two categories:

  • The whole system (software and hardware) was specified based on estimated usage. Last week everything ran fine, and a certain functionality took less than 5 seconds. This week, after installing an update, the same functionality takes more than 1 minute.

    • This is a comparison with previously benchmarked performance. The customer holds future performance to an absolute yardstick on a human time scale (seconds to minutes).
  • I submitted 100 jobs to the system. Why is it taking 400x the time it takes to process a single job?

    • The customer expects processing time to be linear. In fact, the customer cannot understand or accept that there exist tasks that scale worse than linearly, as the sketch after this list illustrates.
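
A back-of-the-envelope sketch of that mismatch (assumed cost models, not real measurements): how the time for 100 jobs compares to the time for one job under different complexities.

```python
import math

def ratio(cost, n=100):
    """Time for a batch of n jobs relative to a single job."""
    return cost(n) / cost(1)

print(f"linear,       O(n):       ~{ratio(lambda n: n):,.0f}x")
print(f"linearithmic, O(n log n): ~{ratio(lambda n: n * math.log2(max(n, 2))):,.0f}x")
print(f"quadratic,    O(n^2):     ~{ratio(lambda n: n * n):,.0f}x")
# linear ~100x, linearithmic ~664x, quadratic ~10,000x -- anything
# super-linear quickly breaks the "100 jobs = 100x one job" intuition.
```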

For this reason, a customer will consider the execution time to be a bug if both are true:

  • Slower than linear
  • Noticeable (i.e. it takes seconds or minutes at typical task sizes, putting it within the human time range)

Legitimate arguments that explain that a quadratic runtime algorithm doesn't pose a problem (i.e. doesn't deserve a code change):

  • The size of the tasks typically handled by this quadratic runtime function is somewhat bounded
  • Given the typical size range, the actual (absolute) execution time is still small enough to be dismissed
  • If a user actually tries to submit a task that is large enough to be noticeable, the user will receive a message warning about the long running time
  • The users of the system are all experts, so they know what they are doing. For example, users of an API should have read the fine print in the API documentation.

A lot of algorithms useful for typical application development are in fact slower than linear (mostly O(N log N), as in sorting). Therefore, large-scale software will try to work around that, for example by sorting only the relevant part of the data, or by using histogram (statistical) filtering techniques which achieve a similar effect.

This applies to software customers, but if you consider the users of a software library or API function to be "customers" as well, then the answer would still apply.

If the original algorithm was correctly implemented (or any bugs were irrelevant), then "changing the algorithm" or "algorithm substitution", since such changes mean you are doing precisely that: substituting an algorithm with a different time complexity for the one previously used.

If the previous time complexity was due to a bug in the implementation (e.g. say you were accidentally resetting the starting point of an inner loop on a sequence, so that what should have been O(n) became O(n^2)), then "fixing a quadratic cost bug" or similar.
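
A minimal sketch of that kind of accidental quadratic bug (the functions and data are hypothetical): an inner scan that should resume where it left off is restarted on every iteration.

```python
def total_preceding_buggy(starts, ends):
    """For each end (both lists sorted), count starts before it; sum them."""
    total = 0
    for e in ends:
        i = 0  # BUG: the scan restarts from the beginning for every end
        while i < len(starts) and starts[i] < e:
            i += 1
        total += i
    return total  # O(len(ends) * len(starts))

def total_preceding_fixed(starts, ends):
    """Same result: the scan position persists across iterations."""
    total = 0
    i = 0  # FIX: keep the starting point of the inner loop
    for e in ends:
        while i < len(starts) and starts[i] < e:
            i += 1
        total += i
    return total  # O(len(starts) + len(ends))
```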

There's an overlap: the effect on the code is the same whether you knew from the outset that the work could be done in O(n) time and the O(n^2) version was an error, or you first deliberately implemented it in O(n^2) time, then realised it would still work correctly without resetting the starting point, and so substituted an O(n) algorithm for an O(n^2) one.

Speed optimization by an order (or orders) of magnitude. Though this is mathematically loose language, it best conveys the idea of the order being changed.

"Improved the scalability" is also heard.

Licensed under: CC-BY-SA with attribution