Pergunta

I work in a DevOps role and mainly write glue code and OS-level automation. I've rarely needed to deviate from existing language implementations or "reinvent the wheel" in the small programs I've written.

What is the advantage of creating a "manual" implementation of an algorithm, i.e. merge sort, than using a language's builtin implementation if it exists? For example in Python, sort() is a very fast way to sort a list in place. At what point does it make sense to abandon sort() and write my own implementation of a sort algorithm?

Foi útil?

Solução

You should stick to builtins and the standard library, or possibly other libraries, and avoid writing your own algos as far as possible. The built-in stuff is likely to be faster, and likely to have fewer bugs than whatever you could write. This works fine until the assumptions of the standard library break down in your use case.

For example, you might have some data type where radix sort would be efficient, but comparison-based sorts would not. Or your data might be so large that it doesn't fit into memory. In such scenarios, rolling your own sorting algorithm could make sense. But you will not encounter such scenarios when doing DevOps stuff in Python.

A scenario that I encountered where a standard library was insufficient is the hash table in C++. It provides very strong guarantees, but those guarantees require overhead. Too much overhead for my performance envelope – I collected experimental data on this. So, I wrote my own hash table that only does what I need. My implementation can also make strong assumptions about the distribution and layout of the data. The result was much, much faster. But beware: while DIY data structures are feasible in C++, the Python language makes it essentially impossible to outperform built-in functions and types.

Licenciado em: CC-BY-SA com atribuição
scroll top