Question

I'm currently a first-semester IT student, and I'm wondering whether it's better to write my own method to find something in, e.g., a C# List, or to use a built-in method such as LINQ, .Find, or .OrderBy with lambdas. I heard from a senior that .NET optimizes my code according to the contents and amount of data in my list and finds the best way of searching or sorting it. Is that true?

The question basically applies to any built-in method versus an algorithm learned as pseudocode; C# is only an example of what I mean. I know that learning the pseudocode versions will let me solve the problem in any language, but I'm still curious about their efficiency.


Solution

Generally speaking, if the language or standard library provides a function to do what you want done, use it until or unless you have a specific reason to use something else. The latter can happen, but as a general rule it's fairly unusual.

Of course, in at least some cases it can make sense to consider a middle ground, such as writing a wrapper that usually uses the standard routine but handles a few corner cases that the standard version handles poorly and that happen to be important for your particular purpose. For one example, I once wrote a wrapper around the qsort in a C standard library because it did really poorly when asked to sort a collection that contained lots of duplicate items, and we expected a lot of duplicates much of the time.
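
For illustration only, here is a minimal sketch of that wrapper pattern in C++ (not the original C code): it samples the input, switches to a simple counting pass when duplicates appear to dominate, and otherwise defers to the standard sort. The sample size and threshold are arbitrary assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch: defer to the standard sort unless a small sample
// suggests the input is dominated by duplicate values.
void sort_ints(std::vector<int>& values)
{
    const std::size_t sample_size = std::min<std::size_t>(values.size(), 64);
    std::map<int, std::size_t> distinct;
    for (std::size_t i = 0; i < sample_size; ++i)
        ++distinct[values[i]];

    // Mostly duplicates: one counting pass per distinct key is enough.
    if (sample_size > 0 && distinct.size() * 4 < sample_size) {
        std::map<int, std::size_t> counts;
        for (int v : values)
            ++counts[v];
        std::size_t out = 0;
        for (const auto& [value, count] : counts)
            for (std::size_t i = 0; i < count; ++i)
                values[out++] = value;
        return;
    }

    // Otherwise the standard routine is the right tool.
    std::sort(values.begin(), values.end());
}
```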

OTHER TIPS

If your language's standard library provides a feature you need, and you have no reason to believe there is something wrong with its implementation or API, then use the standard version.

Re-implementing the feature yourself will at best be a waste of time, since you'll be duplicating work other people have already done for you. At worst, your version will be slow and buggy, because you won't have spent nearly as much time on the finer points of sorting-algorithm implementation as the language implementers have, and the inevitable subtle differences between your version and the standard one will cause endless confusion for other developers.

It is important to have some clue how these algorithms are implemented, at least conceptually, because that's crucial to understanding why some methods (such as sorting methods) have the particular behaviors, performance characteristics and APIs that they do. For instance, if you never take an algorithms and data structures course, then you'll have no idea why a hash table can't be sorted (without copying all the entries to some other structure, or using a terrible hash function).
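
To make that last point concrete, here is a small C++ illustration (the data is made up): a hash-based container has no order you can sort in place, so the usual workaround is to copy its entries into a sequence and sort that.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main()
{
    std::unordered_map<std::string, int> ages{{"bob", 31}, {"alice", 29}, {"carol", 40}};

    // The hash table itself cannot be sorted, so copy its entries out first.
    std::vector<std::pair<std::string, int>> by_name(ages.begin(), ages.end());
    std::sort(by_name.begin(), by_name.end());  // orders by key, then value

    for (const auto& [name, age] : by_name)
        std::cout << name << ": " << age << '\n';
}
```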

Of course, if there really is something wrong with the standard library feature, then you don't have much of a choice. But that's not very common, and if you read enough good books/blogs/etc about your language of choice you'll soon learn where the genuine flaws are and how to avoid them.

While the advice being given here about not implementing your own sorting algorithm in the real world is, I think, sound, you still have to choose among initial data structures, and a List is not always the best starting point for sorting/searching.

Searching the List once is one thing, but searching it many times is another.

In the latter case, the List might be better transformed into a more search-oriented data structure (an array, a sorted array, a tree, a hash table, a specialized tree, a B+ tree, etc...), with a lot depending on factors around mutability.
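
As a hedged C++ sketch of two common transformations (the sample data is made up): sort once and answer repeated lookups with binary search, or build a hash set once for repeated membership tests.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

int main()
{
    std::vector<int> data{42, 7, 19, 7, 3};  // stand-in for the original list

    // Option 1: sort once, then answer many lookups with binary search (O(log n) each).
    std::vector<int> sorted = data;
    std::sort(sorted.begin(), sorted.end());
    bool found = std::binary_search(sorted.begin(), sorted.end(), 19);

    // Option 2: build a hash set once, then answer lookups in expected O(1).
    std::unordered_set<int> index(data.begin(), data.end());
    bool also_found = index.count(19) > 0;

    return (found && also_found) ? 0 : 1;
}
```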

In some cases it may be reasonable to load the data into a database and offload the search/sort to it.

In my experience, when it comes to the performance of standard data structures, algorithms, and memory allocators (and performance is typically the only reason to prefer your own over the standard one), they're pretty easy to beat if your solution is less generalized than the standard library's, and almost impossible to beat if you try to fulfill the exact same requirements.

As an example, it's very difficult to beat std::sort in C++ if you want a generalized comparison-sort (at least unless you use a parallelized version). However, you can easily beat std::sort if you use even a naive radix sort which only operates on integers or pointers.
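
For instance, here is a hedged sketch of such a naive LSD radix sort for unsigned 32-bit integers. It can beat a general comparison sort on large inputs precisely because it only handles this one key type.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive least-significant-digit radix sort, one byte per pass.
void radix_sort(std::vector<std::uint32_t>& values)
{
    std::vector<std::uint32_t> buffer(values.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Count how many values fall into each of the 256 byte buckets.
        std::array<std::size_t, 257> offsets{};
        for (std::uint32_t v : values)
            ++offsets[((v >> shift) & 0xFF) + 1];

        // Turn counts into starting offsets for each bucket.
        for (std::size_t i = 1; i < offsets.size(); ++i)
            offsets[i] += offsets[i - 1];

        // Scatter values into the buffer in bucket order (stable per pass).
        for (std::uint32_t v : values)
            buffer[offsets[(v >> shift) & 0xFF]++] = v;

        values.swap(buffer);
    }
}
```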

Likewise you'll generally be hard-pressed to beat the efficiency of C's malloc if you want a general-purpose thread-safe allocator that can handle variable-sized allocations and free individual chunks of memory. But you can easily beat it if you use a free list which can only pool chunks of a specific size and is only designed to be used from one thread at a time, or a sequential allocator which just pools in a straight sequential fashion and doesn't offer the ability to free chunks individually.
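
A hedged sketch of the second option, a sequential (bump) allocator, assuming a single thread and a fixed-capacity buffer; the name SequentialAllocator is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: advance a pointer through a preallocated buffer and
// only free everything at once, which is why it is so much cheaper than malloc.
class SequentialAllocator {
public:
    explicit SequentialAllocator(std::size_t capacity) : buffer_(capacity) {}

    void* allocate(std::size_t bytes)
    {
        // Keep every allocation aligned for the strictest fundamental type.
        bytes = (bytes + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        if (offset_ + bytes > buffer_.size())
            return nullptr;  // out of space; a real version might chain buffers
        void* p = buffer_.data() + offset_;
        offset_ += bytes;
        return p;
    }

    void reset() { offset_ = 0; }  // "frees" everything in one step

private:
    std::vector<char> buffer_;
    std::size_t offset_ = 0;
};
```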

It'll be extremely difficult, if not impossible, to beat std::vector in C++ in all cases while fulfilling identical requirements. But it's easy enough to beat it for a sequence that's specifically optimized to store only, say, 0 to 8 elements.
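
As a hedged sketch of such a specialized sequence (the name Inline8 is made up): a fixed inline capacity of 8 elements, no heap allocation, no growth logic.

```cpp
#include <cstddef>

// Hypothetical sketch: trades std::vector's generality for a tiny fixed case.
// Requires T to be default-constructible, which is fine for this narrow use.
template <typename T>
class Inline8 {
public:
    bool push_back(const T& value)
    {
        if (size_ == 8)
            return false;  // full; std::vector would grow instead
        items_[size_++] = value;
        return true;
    }

    T& operator[](std::size_t i) { return items_[i]; }
    std::size_t size() const { return size_; }

private:
    T items_[8]{};
    std::size_t size_ = 0;
};
```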

Likewise your HandRolledArrayList<Float> is unlikely to outperform ArrayList<Float> in Java. But you can easily outperform it if you just use an array of float instead.

And if you're working in a truly performance-critical area where you're looping over millions of things and/or have discovered hotspots through measurement, then it can sometimes pay to implement a data structure, allocator, or algorithm that competes with what the standard library provides, but typically only if your implementation is far less generally applicable than the requirements the standard library implementers had to fulfill.

Licensed under: CC-BY-SA with attribution