How to bin a series of float values into a histogram in Python?

https://stackoverflow.com/questions/1721273

19-09-2019

Problem

I have a set of floating-point values (all small, below 0.150). I want to bin them into a histogram, i.e. each bar of the histogram should cover a sub-range of [0, 0.150).

My data looks like this:

and I want the result to look like this:

[0, 0.005) 5
[0.005, 0.011) 0
... etc.

I tried to bin them with the code below, but it doesn't work. What is the correct way to do this?

#! /usr/bin/env python


import fileinput, math

log2 = math.log(2)

def getBin(x):
    return int(math.log(x+1)/log2)

diffCounts = [0] * 5

for line in fileinput.input():
    words = line.split()
    diff = float(words[0]) * 1000;

    diffCounts[ str(getBin(diff)) ] += 1

maxdiff = [i for i, c in enumerate(diffCounts) if c > 0][-1]
print maxdiff
maxBin = max(maxdiff)


for i in range(maxBin+1):
     lo = 2**i - 1
     hi = 2**(i+1) - 1
     binStr = '[' + str(lo) + ',' + str(hi) + ')'
     print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
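For the fixed-width bins the question asks for, no logarithm is needed: the bin index of a value x is simply floor(x / width). Below is a minimal pure-Python sketch, assuming a uniform width of 0.005 over [0, 0.150) (the width of the first bin in the expected output) and one value per input line. The names `WIDTH`, `UPPER`, `fixed_width_counts` and `print_counts` are hypothetical. Values are parsed with `decimal.Decimal` because with binary floats `0.015 / 0.005` evaluates to `2.9999999999999996`, which would drop an on-edge value into the bin below.

```python
from decimal import Decimal

# Hypothetical bin layout: uniform width 0.005 covering [0, 0.150)
WIDTH = Decimal("0.005")
UPPER = Decimal("0.150")
N_BINS = int(UPPER / WIDTH)  # 30 bins


def fixed_width_counts(lines):
    """Count values into half-open bins [i*WIDTH, (i+1)*WIDTH).

    Each value is parsed as a Decimal straight from its text, so a value
    lying exactly on an edge (e.g. "0.015") lands in the bin it starts.
    """
    counts = [0] * N_BINS
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x = Decimal(line)
        if not (0 <= x < UPPER):
            raise ValueError(f"value {x} outside [0, {UPPER})")
        counts[int(x / WIDTH)] += 1  # int() truncates, i.e. floors for x >= 0
    return counts


def print_counts(counts):
    # Print in the "[lo, hi)<TAB>count" format from the question
    for i, c in enumerate(counts):
        lo, hi = i * WIDTH, (i + 1) * WIDTH
        print(f"[{lo}, {hi})\t{c}")


if __name__ == "__main__":
    sample = ["0", "0.005", "0.124", "0", "0.004", "0", "0.111", "0.112"]
    print_counts(fixed_width_counts(sample))
```

To read a real file, pass the open file object (one value per line) to `fixed_width_counts` instead of the sample list.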

Solution

Don't reinvent the wheel if you can avoid it. NumPy has everything you need:

#!/usr/bin/env python
import numpy as np

a = np.fromfile(open('file', 'r'), sep='\n')
# [ 0.     0.005  0.124  0.     0.004  0.     0.111  0.112]

# You can set arbitrary bin edges:
bins = [0, 0.150]
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [8]
# bin_edges: [ 0.    0.15]

# Or, if bin is an integer, you can set the number of bins:
bins = 4
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [5 0 0 3]
# bin_edges: [ 0.     0.031  0.062  0.093  0.124]
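To get the 0.005-wide bars from the question rather than one bar or four auto-sized ones, `bins` can be an explicit array of edges. A minimal sketch, assuming the sample values shown in the comment above and 30 bins of width 0.005 over [0, 0.150]; `np.linspace` is used instead of `np.arange` with a float step to avoid accumulated rounding in the edges. Note that `np.histogram` treats every bin as half-open `[lo, hi)` except the last, which is closed `[lo, hi]`.

```python
import numpy as np

# Sample values from the comment in the answer's code
a = np.array([0., 0.005, 0.124, 0., 0.004, 0., 0.111, 0.112])

# 31 edges -> 30 bins of width 0.005 covering [0, 0.150]
edges = np.linspace(0.0, 0.150, 31)
hist, bin_edges = np.histogram(a, bins=edges)

# Print in the "[lo, hi)<TAB>count" format from the question
for lo, hi, count in zip(bin_edges[:-1], bin_edges[1:], hist):
    print(f"[{lo:.3f}, {hi:.3f})\t{count}")
```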

Other tips

import matplotlib.pyplot as plt

# Read one float per line
with open('pulse_data.txt') as inf:
    data = [float(line) for line in inf]

# Binning: B equal-width bins between the minimum and maximum value
B = 50
minv = min(data)
maxv = max(data)
# B + 1 slots, because d == maxv maps to index B
bincounts = [0] * (B + 1)
for d in data:
    b = int((d - minv) / (maxv - minv) * B)
    bincounts[b] += 1

# Plot the histogram counts as points
plt.plot(bincounts, 'o')
plt.show()
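The manual binning above keeps an extra slot (`B + 1`) that only ever holds the maximum value(s), and it divides by zero if every value is the same. A small sketch of the same equal-width scheme as a function, with both cases handled: the maximum is clamped into the last bin (so the last bin is closed on the right, as in `np.histogram`) and an all-equal input raises an error. The function name `bin_counts` is hypothetical.

```python
def bin_counts(data, n_bins):
    """Equal-width binning between min(data) and max(data).

    Returns exactly n_bins counts. The maximum value is clamped into the
    last bin, so that bin is closed on the right.
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are equal; bin width would be zero")
    counts = [0] * n_bins
    for d in data:
        b = int((d - lo) / (hi - lo) * n_bins)
        counts[min(b, n_bins - 1)] += 1  # d == hi would otherwise give n_bins
    return counts


print(bin_counts([0, 0.5, 1.0], 2))
```

The returned list can be plotted exactly like `bincounts` above.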

The first error is:

Traceback (most recent call last):
  File "C:\foo\foo.py", line 17, in <module>
    diffCounts[ str(getBin(diff)) ] += 1
TypeError: list indices must be integers

Why are you converting the int to a str when the list index has to be an int? Fix that, and we get:

Traceback (most recent call last):
  File "C:\foo\foo.py", line 17, in <module>
    diffCounts[ getBin(diff) ] += 1
IndexError: list index out of range

That's because you only have 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens:

6
Traceback (most recent call last):
  File "C:\foo\foo.py", line 21, in <module>
    maxBin = max(maxdiff)
TypeError: 'int' object is not iterable

maxdiff is a single value, not your list of ints, so what is the max() for? Remove it, and now we get:

6
Traceback (most recent call last):
  File "C:\foo\foo.py", line 28, in <module>
    print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
TypeError: argument 2 to map() must support iteration

Sure enough, you're passing a single value as the second argument to map. Let's simplify the last two lines from:

 binStr = '[' + str(lo) + ',' + str(hi) + ')'
 print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))

to:

 print "[%f, %f)\t%r" % (lo, hi, diffCounts[i])

Now it prints:

6
[0.000000, 1.000000)    3
[1.000000, 3.000000)    0
[3.000000, 7.000000)    2
[7.000000, 15.000000)   0
[15.000000, 31.000000)  0
[31.000000, 63.000000)  0
[63.000000, 127.000000) 3
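The fixes walked through above, collected into one Python 3 sketch. Two deliberate changes from the question's script, both assumptions of this sketch: a `collections.Counter` replaces the fixed-size list, so no bucket count has to be guessed (the cause of the `IndexError`), and `math.log2` replaces `math.log(x+1)/log2`, because `log2` is exact for powers of two and so cannot push a value like 7 into the wrong bucket. The values come from a list instead of `fileinput`; the names `get_bin` (after `getBin`) and `log2_histogram` are hypothetical.

```python
import math
from collections import Counter


def get_bin(x):
    # Bucket i holds [2**i - 1, 2**(i+1) - 1)
    return int(math.log2(x + 1))


def log2_histogram(values):
    # Scale by 1000 as the question does, then count per bucket.
    # A Counter grows as needed, so no bucket count is fixed up front.
    counts = Counter(get_bin(float(v) * 1000) for v in values)
    max_bin = max(counts)  # highest non-empty bucket (ValueError if empty)
    return [counts[i] for i in range(max_bin + 1)]


if __name__ == "__main__":
    sample = [0, 0.005, 0.124, 0, 0.004, 0, 0.111, 0.112]
    for i, c in enumerate(log2_histogram(sample)):
        lo, hi = 2**i - 1, 2**(i + 1) - 1
        print(f"[{lo}, {hi})\t{c}")
```

With these sample values the counts are the same as in the output above (3, 0, 2, 0, 0, 0, 3). To read a file as the original does, pass `(line.split()[0] for line in fileinput.input())` as `values`.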

I'm not sure what else to do, because I don't understand the bucketing you're hoping to use. It seems to involve powers of two, but that doesn't make sense to me...
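Whichever scheme is intended, a sorted list of bin edges plus the standard-library `bisect` module handles uniform and non-uniform bins the same way, so the bucketing choice reduces to choosing the edges. A minimal sketch, assuming half-open bins `[edges[i], edges[i+1])`; the function name `count_by_edges` is hypothetical, and the power-of-two edges below reproduce the question's buckets on values already scaled by 1000.

```python
import bisect


def count_by_edges(values, edges):
    """Count values into half-open bins [edges[i], edges[i+1]).

    `edges` must be sorted ascending; values outside
    [edges[0], edges[-1]) raise ValueError.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_right places v after any equal edge, so a value equal
        # to edges[i] lands in bin i (bins are closed on the left)
        i = bisect.bisect_right(edges, v) - 1
        if i < 0 or i >= len(counts):
            raise ValueError(f"{v} is outside [{edges[0]}, {edges[-1]})")
        counts[i] += 1
    return counts


# The question's power-of-two edges: 0, 1, 3, 7, 15, 31, 63, 127
pow2_edges = [2**i - 1 for i in range(8)]
print(count_by_edges([0, 5, 124, 0, 4, 0, 111, 112], pow2_edges))
```

For the 0.005-wide bins from the question, the same function takes edges such as `[0, 0.005, 0.010, ...]` instead.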

License: CC BY-SA, with attribution

Not affiliated with Stack Overflow