Efficient streaming sort

https://cs.stackexchange.com/questions/91940

05-11-2019
|

Pergunta

Consider following task: we have an input of N+1 lines, where first line contains N - number of items, and then we have N lines, each one contains one item, which is a tuple (number id, number m), id < 10^9, m < 100. We have to perform a stable sort on this sequence and print the result.

Sample input:

Sample output:

My current naive implementation was just allocate vector of size N, fill it, and then run any stable sort alhorithm on it. In following example (all examples are written in Rust) I'm just adding row number as part of the sorting key and then perform quick sort:

use std::io::{self, BufRead};
use std::cmp::Ordering;

fn main() {
    let stdin = io::stdin();
    let mut lines = stdin.lock().lines();
    let n: usize = lines.next().unwrap().unwrap().parse().unwrap();
    let mut tuples = Vec::with_capacity(n);

    for i in 0..n {
        let line = lines.next().unwrap().unwrap();
        let mut line = line.split_whitespace();
        let id: u32 = line.next().unwrap().parse().unwrap();
        let m: u8 = line.next().unwrap().parse().unwrap();
        tuples.push((id, m, i));
    }

    tuples.sort_unstable_by(|a, b| {
        let cmp = b.1.cmp(&a.1);
        match cmp {
            Ordering::Equal => a.2.cmp(&b.2),
            other => other
        }
    });

    for (id, m, ..) in tuples {
       println!("{} {}", id, m);
    }
}

My second implementation used BTreeSet, but it wasn't a success, and it have taken much more time than first naive one.

My third idea was using idea from insertion sort to keep array sorted from the scratch. When we get yet another element from the stream we just seek the position where it should be placed and then insert in there. When all elements are processed we already have a final array. Here is a sample implementation:

use std::io::{self, BufRead};
use std::cmp::Ordering;

struct TeamResult {
    id: u32,
    m: u8,
}

impl TeamResult {
    pub fn new(id: u32, m: u8) -> Self {
        Self {
            id,
            m
        }
    }
}

impl Ord for TeamResult {
    fn cmp(&self, other: &Self) -> Ordering {
        other.m.cmp(&self.m)
    }
}

impl PartialOrd for TeamResult {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(&other))
    }
}

impl PartialEq for TeamResult {
    fn eq(&self, other: &Self) -> bool {
        self.m.eq(&other.m)
    }
}

impl Eq for TeamResult { }

fn main() {
    let stdin = io::stdin();
    let mut lines = stdin.lock().lines();
    let n: usize = lines.next().unwrap().unwrap().parse().unwrap();
    let mut tuples = Vec::with_capacity(n);

    for i in 0..n {
        let line = lines.next().unwrap().unwrap();
        let mut line = line.split_whitespace();
        let id: u32 = line.next().unwrap().parse().unwrap();
        let m: u8 = line.next().unwrap().parse().unwrap();

        let item_to_insert = TeamResult::new(id, m);
        let index_to_insert = search_pos(&tuples[0..i], &item_to_insert);
        tuples.insert(index_to_insert, item_to_insert);
    }

    for x in tuples {
        println!("{} {}", x.id, x.m);
    }
}

fn search_pos<T: Ord>(a: &[T], value: &T) -> usize {
    let len = a.len();

    if len == 0 {
        return 0;
    }

    if value < &a[0] {
        return 0;
    }

    if value > &a[len - 1] {
        return len;
    }

    let mut lo = 0;
    let mut hi = len - 1;

    while lo < hi {
        let mid = (hi + lo) / 2;

        match value.cmp(&a[mid]) {
            Ordering::Less => hi = mid - 1,
            Ordering::Greater => lo = mid + 1,
            Ordering::Equal => {
                lo = mid;
                break;
            }
        };
    }

    while lo <= hi && value >= &a[lo] {
        lo += 1
    }

    lo
}

However, it was even slower than a BTree one and it timeouted.

I wonder what are theorerical and practical optimal implementations? I mean I thing that in theory BTree should be faster on this task, but in practice simple array sort beats it.

Nenhuma solução correta

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange