@Bakuriu is probably right that this is a micro-optimization. Your bottleneck is almost certainly IO, followed by decompression; allocating the memory twice is unlikely to be significant.
However, if you want to avoid the extra memory allocation, you can use numpy.frombuffer to view the bytes as a numpy array. This avoids duplicating memory (the bytes object and the array share the same buffer), but the array will be read-only when the underlying buffer is immutable. If you need to write to it, view a mutable buffer (such as a bytearray) instead, or take an explicit copy.
In your case, it would be as simple as replacing fromstring with frombuffer (which is the recommended replacement anyway, now that fromstring is deprecated for binary data):
f = gzip.GzipFile(filename)
f.read(10000) # fixed length ascii header
event_dtype = np.dtype([
('Id', '>u4'), # simplified
('UnixTimeUTC', '>u4', 2),
('Data', '>i2', (1600,1024) )
])
s = f.read(event_dtype.itemsize)
event = np.frombuffer(s, dtype=event_dtype, count=1)
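If you want to verify the no-copy claim on your own data, newer NumPy versions offer np.shares_memory. A minimal sketch, using a zero-filled stand-in buffer (and a simplified dtype) in place of the bytes returned by f.read:

```python
import numpy as np

# Stand-in for the bytes read from the file (all zeros, just for illustration)
event_dtype = np.dtype([('Id', '>u4'), ('UnixTimeUTC', '>u4', 2)])
s = bytes(event_dtype.itemsize)

view = np.frombuffer(s, dtype=event_dtype, count=1)
copy = view.copy()  # an explicit copy, for contrast

# Two views of the same buffer share memory; the copy does not
print(np.shares_memory(view, np.frombuffer(s, dtype=event_dtype)))  # True
print(np.shares_memory(view, copy))                                 # False
print(view.flags.writeable)  # False: the view of immutable bytes is read-only
```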
Just to prove that memory is not duplicated using this approach:
import numpy as np

# bytes objects are immutable, so wrap the data in a mutable bytearray;
# the resulting view is then writeable from the start
x = bytearray(b"hello")
y = np.frombuffer(x, dtype=np.uint8)

# Prove that we're using the same memory
y[0] = 121  # ord('y')
print(x.decode())  # <-- Notice that we're modifying y and printing x...
This yields yello instead of hello.
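One caveat with recent NumPy (1.17+): a view of an immutable bytes object cannot be flipped to writeable after the fact, so if you only have bytes and need to mutate, take an explicit copy (which does allocate new memory). A small sketch:

```python
import numpy as np

data = b"hello"  # an immutable bytes object
ro = np.frombuffer(data, dtype=np.uint8)
print(ro.flags.writeable)  # False: views of read-only buffers start read-only

try:
    ro.flags.writeable = True  # recent NumPy refuses: the base buffer is immutable
except ValueError as exc:
    print("cannot make writeable:", exc)

# Taking an explicit copy gives a mutable array backed by fresh memory
rw = ro.copy()
rw[0] = 121  # ord('y')
print(bytes(rw))  # b'yello'; the original data is untouched
```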
Regardless of whether or not it's a significant optimization in this particular case, it's a useful approach to be aware of.