Sorting an array of numbers with decimal places

https://softwareengineering.stackexchange.com/questions/220241

01-10-2020
|

Frage

Suppose I have an array of float values in the following format:

{ 1.34, 1.15, 1.1, 1.2, 1.26, 1.10, 1.20, 1.17 }

Suppose they have been provided by user input (or some other mechanism), where the user takes "1.1" to actually mean "1.01" - meaning that it is separate from "1.10".

Using a standard sorting algorithm (bubble sort, quick sort, or some framework specific sort) is going to result in an array which should look similar to the following:

{ 1.1, 1.10, 1.15, 1.17, 1.2, 1.20, 1.26, 1.34 }

However, the required output array would be the following:

{ 1.1, 1.2, 1.10, 1.15, 1.17, 1.20, 1.26, 1.34 }

I'm thinking that the way to do this would be to iterate through the array before sorting and:

check whether the value of has N decimal places
if so, elevate to M (N + 1) decimal places - add a leading 0?

This would lead to two arrays, one containing values with N or M number decimal places (raw user input) and one containing all values with M number decimal places ("sanitised" user input). Which would mean that sorting the array containing values with M decimal places would provide the required result.

Are there any other ways to do this? I'm looking for faster algorithms, or those with a lower overhead.

I see this as being a common problem domain, and I'm going to guess that there are a multitude of ways to fix this. What are some of the other algorithms that can be used to attain the required result.

Lösung

Your problem is that your "numbers" don't have decimal places. You have a string which consists of two integers which are separated by .. Parse them as two integers and write a custom comparer. The sorting algorithm can remain the same, you just need to pass in the comparer.

In C# such a comparer could look similar to this:

void Test()
{
    var data=new string[]{ "1.34", "1.15", "1.1", "1.2", "1.26", "1.10", "1.20", "1.17" };
    Array.Sort(data, VersionComparer);
    data.Dump();
}

static int VersionComparer(string s1, string s2)
{
    List<int> parts1 = s1.Split('.').Select(int.Parse).ToList();
    List<int> parts2 = s2.Split('.').Select(int.Parse).ToList();

    while(parts1.Count < parts2.Count)
        parts1.Add(0);
    while(parts2.Count < parts1.Count)
        parts2.Add(0);

    for(int i = 0; i < parts1.Count; i++)
    {
         if(parts1[i] < parts2[i])
             return -1;
         if(parts1[i] > parts2[i])
             return +1;
    }
    return 0;
}

Andere Tipps

A bit of nitpicking: If some values have more significant digits than others, than what you have is not floats. It's strings intended to be interpreted as floats, but you do have to interpret them every time you access them, and that's precisely the issue here.

Basically you have two options. Either you use a bog-standard sort algorithm and just program your comparator routine so that it performs Float.parseFloat() before calculating anything with its arguments - or you do the parsing once and for all in a separate step and store the numeric values somewhere, so you don't have to do it again.

Obviously, which solution is better depends on how often the comparator will be called. If your list is in random order, then most values will probably be touched more than once during the sorting, so pre-computing the values that you actually want to compare makes sense. But if the list tends to be near-sorted or if float parsing is too cheap to worry about (compared to the programming effort of adding a transformation step), then doing it on the fly during each comparison is better.

As always, which one is better in general can't be answered, and which one is better in your case can't be answered without profiling both solutions. Never assume - always measure.

Depending on how large of an array you are working with you may wish to apply the functional programming "decorate sort undecorate" pattern (which is a form of memoization).

Constantly splitting the string into parts getting its length and then discarding the data can be inefficient, after all, the string isn't changing while you are sorting it, you only need to do it once.

In perl, this is known as a Schwartzian transform - the code for this looks like:

#!/usr/bin/perl
use strict;
my @data = qw ( 1.34 1.15 1.1 1.2 1.26 1.10 1.20 1.17 );
my @sorted =
    map { $_->[0] }
    sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2]  }
    map {
      my @l = split(/\./);
      [$_, length($l[1]), $l[1]] }
    @data;
print join(',', @sorted),"\n";

The trick to this is that the change of "1.15" to ["1.15", 2, 15] is only done once, and then the sort is working off of those values. For small sized arrays this is a minor boost in performance. For large arrays, it can be very significant.

In languages that lack the "just toss things in an array and sort it however", this becomes a bit more complex. You would need to build an object that has the original data and the component parts and the sortable. This is slightly easier to work with because you could put it in the constructor.

In Java, one could use the following approach:

public class DecimalSortable implements Comparable<DecimalSortable> {
    private int len;
    private int dec;
    private String data;
    public DecimalSortable(String data) {
        this.data = data;
        String[] array = data.split("\\.");
        len = array[1].length();
        dec = Integer.parseInt(array[1]);
    }

    @Override
    public int compareTo(DecimalSortable o) {
        int rv = Integer.compare(len, o.len);
        if(rv == 0) {
            rv = Integer.compare(dec, o.dec);
        }
        return rv;
    }

    public String getData() {
        return data;
    }
}

Note the single call to split and extraction of the values that are then later sorted on. Yes, its a bit more overhead than the more dynamic languages, but you get the idea of how it works and how to avoid repeated calls to split (and the related creation of new String objects for each comparison).

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit softwareengineering.stackexchange