I don't pretend to be an expert in any of this, but I'm intrigued by GIS data so this caught my interest. I installed liblas
and its dependencies on my Fedora 19 system and played with the example data files that came with liblas
.
Using your code I ran into the same problem of watching all my memory get eaten up. I don't know why that should happen - perhaps unwanted references hanging around preventing the garbage collector from working as we'd hope. This could probably be fixed, but I won't try it.
I did notice some interesting features of the liblas
module and decided to try them. I believe you can get the data you seek.
After opening your file, have a look at the XML description from the header.
h = f.get_header()
print(h.get_xml())
It's hard to look at (feel free to play with xml.dom.minidom or lxml.etree), but in my example files it showed the byte layout of the point data (mine had 28 bytes too). In mine, offset 18 was a single short (2 bytes) assigned to Point Source ID. You should be able to retrieve this with p.data[18:19]
, p.get_data()[18:19]
, p.point_source_id
, or p.get_point_source_id()
. Unfortunately the data
references chew up memory and p.point_source_id
has a bug (bug fix pull request submitted to developers). If we change your code to use the last access method, everything seems to work fine. So, try this in your for
loop instead:
for p in f:
line_num = p.get_point_source_id()
if line_num not in line_list:
line_list.append(line_num)
counter += 1
Note that
counter == h.get_count()
If you just want the set of unique Point Source ID values ...
line_set = set(p.get_point_source_id() for p in f)
Hopefully your data value is also available as p.get_point_source_id()
. Let me know how it works for you in the comments. Cheers!