Calculating percentiles (Ruby)

https://stackoverflow.com/questions/2609597

25-09-2019

Question

My code is based on the methods described here and here.

def fraction?(number)
  number - number.truncate
end

def percentile(param_array, percentage)
  another_array = param_array.to_a.sort
  r = percentage.to_f * (param_array.size.to_f - 1) + 1
  if r <= 1 then return another_array[0]
  elsif r >= another_array.size then return another_array[another_array.size - 1]
  end
  ir = r.truncate
  another_array[ir] + fraction?((another_array[ir].to_f - another_array[ir - 1].to_f).abs)
end

Example usage:

test_array = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.061, 95.1591, 95.1195,
95.1065, 95.0925, 95.199, 95.1682]
test_values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

test_values.each do |value|
  puts value.to_s + ": " + percentile(test_array, value).to_s
end

Output:

0.0: 95.061
0.1: 95.1205
0.2: 95.1325
0.3: 95.1689
0.4: 95.1692
0.5: 95.1615
0.6: 95.1773
0.7: 95.1862
0.8: 95.2102
0.9: 95.1981
1.0: 95.199

The problem here is that the 80th percentile is higher than the 90th and the 100th. As far as I can tell, though, my implementation matches the description, and it gives the right answer for the example given there (0.9).

Is there an error in my code that I am not seeing? Or is there a better way to do this?

Solution
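
The symptom described in the question can be checked mechanically: for increasing percentages, the returned values must never decrease, and none may exceed the largest input value. The following is a minimal sketch of such a check, run against the values printed above; the helper name `non_decreasing?` is hypothetical and not part of the original code.

```ruby
# Hypothetical helper: true when every value is >= the one before it.
def non_decreasing?(values)
  values.each_cons(2).all? { |a, b| a <= b }
end

# Values printed by the question's code for p = 0.0, 0.1, ..., 1.0
reported = [95.061, 95.1205, 95.1325, 95.1689, 95.1692, 95.1615,
            95.1773, 95.1862, 95.2102, 95.1981, 95.199]

# Largest value in the question's test_array
largest_input = 95.199

puts non_decreasing?(reported)        # false: 0.5 is below 0.4, 0.9 is below 0.8
puts reported.max > largest_input     # true: the 0.8 result exceeds the maximum
```

Both checks flag the output, which points at the interpolation step rather than at the sorting or the rank calculation.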

script

This sounds like a homework problem. Anyway, it was a bit of fun to do.

# Score class
class Score
  attr_accessor :value, :percentile
  def initialize(score)
    self.value = score.to_f
  end
  def <=>(foo)
    self.value <=> foo.value
  end
end

# load scores
scores = []
DATA.each do |line|
  scores << Score.new(line)
end
scores.sort!
scores_count = scores.size

# iterate through scores and calculate percentile
scores.each_with_index do |s, i|

  # L/N(100) = P
  # L = number of scores beneath this score (score array index)
  # N = total number of scores
  # P = percentile
  s.percentile = (i.to_f/scores_count.to_f*100).ceil
end

# output
puts "What is the precise percentile of each score"
scores.each_with_index do |s,i|
  puts "#{s.value} is in the #{s.percentile} percentile"
end

# bonus: what score is in the Xth percentile?
puts "\nWhat score is in the Xth percentile?"
percentiles = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
percentiles.each do |p|

  # P/100(N) = L
  # P = percentile
  # N = total number of scores
  # L = score array index
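  l = (p.to_f/100*scores_count).ceil
  # for p = 100, l equals scores_count, one past the last index, so no score qualifies
  next if l >= scores_count
  puts "#{p} percentile? #{scores[l].value}"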
end


__END__
95.1772
95.1567
95.1937
95.1959
95.1442
95.061
95.1591
95.1195
95.1065
95.0925
95.199
95.1682

Output

What is the precise percentile of each score
95.061 is in the 0 percentile
95.0925 is in the 9 percentile
95.1065 is in the 17 percentile
95.1195 is in the 25 percentile
95.1442 is in the 34 percentile
95.1567 is in the 42 percentile
95.1591 is in the 50 percentile
95.1682 is in the 59 percentile
95.1772 is in the 67 percentile
95.1937 is in the 75 percentile
95.1959 is in the 84 percentile
95.199 is in the 92 percentile

What score is in the Xth percentile?
0 percentile? 95.061
10 percentile? 95.1065
20 percentile? 95.1195
30 percentile? 95.1442
40 percentile? 95.1567
50 percentile? 95.1591
60 percentile? 95.1772
70 percentile? 95.1937
80 percentile? 95.1959
90 percentile? 95.199

Other tips

Got it working. I added -Infinity to the array so that I could use indices in the range 1 to N. I was also multiplying by the wrong variable in the last line.

def percentile(param_array, percentage)
  another_array = param_array.to_a.dup
  another_array.push(-1.0/0.0)                   # add -Infinity to be 0th index
  another_array.sort!
  another_array_size = another_array.size - 1    # disregard -Infinity
  r = percentage.to_f * (another_array_size - 1) + 1
  if r <= 1 then return another_array[1]
  elsif r >= another_array_size then return another_array[another_array_size]
  end
  ir = r.truncate
  fr = fraction? r
  another_array[ir] + fr*(another_array[ir+1] - another_array[ir])
end

The r = ... line can be replaced with r = percentage.to_f * (another_array_size + 1) to use the formula from the first link instead of the Excel one.

Output:

0.0: 95.061
0.1: 95.0939
0.2: 95.1091
0.3: 95.12691
0.4: 95.1492
0.5: 95.1579
0.6: 95.16456
0.7: 95.1745
0.8: 95.1904
0.9: 95.19568
1.0: 95.199
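
The two rank formulas mentioned above can be compared side by side. The following is a minimal sketch, not the answer's exact code: it uses a 0-based sorted copy instead of the -Infinity sentinel, assumes a non-empty array, and the method name `interpolated_percentile` and its `method:` keyword are hypothetical.

```ruby
# Linear-interpolation percentile on a sorted copy of the input.
# method: :excel      uses rank r = p * (N - 1) + 1
# method: :n_plus_one uses rank r = p * (N + 1)
# Both names are hypothetical labels chosen for this sketch.
def interpolated_percentile(values, p, method: :excel)
  sorted = values.sort
  n = sorted.size
  r = method == :excel ? p.to_f * (n - 1) + 1 : p.to_f * (n + 1)

  # Clamp ranks that fall outside 1..N to the smallest or largest value
  return sorted.first if r <= 1
  return sorted.last if r >= n

  ir = r.floor             # 1-based rank of the lower neighbour
  fr = r - ir              # fractional part of the rank
  lower = sorted[ir - 1]   # convert 1-based rank to 0-based index
  upper = sorted[ir]
  lower + fr * (upper - lower)
end

data = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.061, 95.1591, 95.1195,
        95.1065, 95.0925, 95.199, 95.1682]

[0.0, 0.1, 0.5, 0.9, 1.0].each do |p|
  excel = interpolated_percentile(data, p).round(4)
  n_plus_one = interpolated_percentile(data, p, method: :n_plus_one).round(4)
  puts "#{p}: excel=#{excel} n_plus_one=#{n_plus_one}"
end
```

The key point is that the fractional part of the rank r scales the gap between the two neighbouring values, so the result always lies between them and the output stays monotonic.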

You can also monkeypatch Enumerable:

module Enumerable

  def rank value, n_tiles
    count = self.count   # Enumerable provides count; length exists only on some classes such as Array

    raise "You cannot split an array of #{count} elements into #{n_tiles} tiles!" if n_tiles > count 

    ordered_array = self.sort
    split_size = count / n_tiles

    boundaries = []
    (n_tiles - 1).times do |i|
      boundaries << ordered_array[(i + 1) * split_size - 1]
    end

    boundaries.each_with_index do |boundary, i|
      if value > boundaries.last
        return n_tiles
      elsif value <= boundary
        return (i + 1)
      end
    end
  end

end

After that, you would be able to do something like this:

a = [1,4,2,5,3,6]

# Test in which range (rank) the number '1' would be placed if the array is ordered and split into 3 pieces:
a.rank(1,3)
#=> 1
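
If patching Enumerable globally is not desirable, the same idea can be written as a plain method. This is a sketch under the same assumptions as the code above (tiles of size count / n_tiles, boundaries taken from the sorted values); the name `tile_rank` is hypothetical.

```ruby
# Hypothetical standalone variant of the rank method above.
# Returns the 1-based tile (1..n_tiles) that value falls into.
def tile_rank(values, value, n_tiles)
  sorted = values.sort
  count = sorted.size
  if n_tiles < 1 || n_tiles > count
    raise ArgumentError, "cannot split #{count} elements into #{n_tiles} tiles"
  end

  split_size = count / n_tiles
  # Upper boundary of every tile except the last one
  boundaries = (1...n_tiles).map { |i| sorted[i * split_size - 1] }

  # First boundary that is >= value; nil means value lies above all boundaries
  index = boundaries.index { |boundary| value <= boundary }
  index ? index + 1 : n_tiles
end

a = [1, 4, 2, 5, 3, 6]
puts tile_rank(a, 1, 3)   # 1
puts tile_rank(a, 3, 3)   # 2
puts tile_rank(a, 6, 3)   # 3
```

With six sorted values and three tiles, the boundaries are 2 and 4, so 1 lands in the first tile, 3 in the second, and 6 in the third.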

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow