Question

I am working with multi-pulse lidar data that collects points along a number of lines within the flight path. I am trying to determine the name and number of individual lines within the LAS file. I am using the liblas module in Python.

I found this documentation that explains the different fields stored in an las file. It mentions a data field (get_data and set_data) at the very bottom of the page.

The 'point data format' and 'point data record length' in the header set aside space for this 'data' field. My header says I have 28 bytes set aside for the data field, and there are 28 values stored in the data field. The 19th value (at least in two datasets from two different sensors) refers to the line number. I have a single value in single pulse data and 4 in multi-pulse data.

I was wondering if there is a standard for what is stored in this field or if it is proprietary.

Also, as a way to get the name of each scan line, I wrote the following code:

import liblas
from liblas import file as lasfile

# Get parameters
las_file = r"E:\Testing\00101.las"

f = lasfile.File(las_file, mode='r')

line_list = []
counter = 0
for p in f:
    line_num = p.data[18]
    if line_num not in line_list:
        line_list.append(line_num)
    counter += 1
print line_list

It results with the following error:

Traceback (most recent call last):
  File "D:\Tools\Python_Scripts\point_info.py", line 46, in <module>
    line_num = p.data[18]
  File "C:\Python27\ArcGIS10.1\lib\site-packages\liblas\point.py", line 560, in get_data
    length = self.header.data_record_length
  File "C:\Python27\ArcGIS10.1\lib\site-packages\liblas\point.py", line 546, in get_header
    return header.Header(handle=core.las.LASPoint_GetHeader(self.handle))
WindowsError: [Error -529697949] Windows Error 0xE06D7363

Does anyone know more about the line numbers stored in the LAS point/header? Can anyone explain the error? It seems to allocate nearly 2 GB of RAM before I get the error. I am on Windows XP, so I'm guessing it's a memory error, but I don't understand why accessing this 'data' field hogs memory. Any help is greatly appreciated.


Solution

I don't pretend to be an expert in any of this, but I'm intrigued by GIS data so this caught my interest. I installed liblas and its dependencies on my Fedora 19 system and played with the example data files that came with liblas.

Using your code I ran into the same problem of watching all my memory get eaten up. I don't know why that should happen - perhaps unwanted references hanging around preventing the garbage collector from working as we'd hope. This could probably be fixed, but I won't try it.

I did notice some interesting features of the liblas module and decided to try them. I believe you can get the data you seek.

After opening your file, have a look at the XML description from the header.

h = f.get_header()
print(h.get_xml())

It's hard to look at (feel free to play with xml.dom.minidom or lxml.etree), but in my example files it showed the byte layout of the point data (mine had 28 bytes too). In mine, offset 18 was a single short (2 bytes) assigned to Point Source ID. You should be able to retrieve this with p.data[18:20], p.get_data()[18:20], p.point_source_id, or p.get_point_source_id(); the two slices return the raw bytes at offsets 18 and 19, which you would still have to combine into one integer. Unfortunately, the data references chew up memory and p.point_source_id has a bug (I have submitted a pull request with a fix to the developers). If we change your code to use the last access method, everything seems to work fine. So, try this in your for loop instead:

for p in f:
    line_num = p.get_point_source_id()
    if line_num not in line_list:
        line_list.append(line_num)
    counter += 1

Note that, once the loop has finished,

counter == h.get_count()
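
Since you asked for both the line identifiers and how many there are, a per-line tally may be more useful than a single counter. Here is a minimal sketch, assuming each point exposes get_point_source_id() as in the loop above. The names count_points_per_line and FakePoint are hypothetical and only exist so the example runs without liblas or a LAS file:

```python
from collections import Counter


def count_points_per_line(points):
    """Tally points by Point Source ID (hypothetical helper, not part of liblas)."""
    return Counter(p.get_point_source_id() for p in points)


class FakePoint(object):
    """Stand-in for a liblas point so the sketch runs without a LAS file."""

    def __init__(self, source_id):
        self._source_id = source_id

    def get_point_source_id(self):
        return self._source_id


if __name__ == "__main__":
    # Six fake points spread over three flight lines.
    demo = [FakePoint(i) for i in (101, 101, 102, 103, 103, 103)]
    per_line = count_points_per_line(demo)
    for line_id in sorted(per_line):
        print("%s: %d points" % (line_id, per_line[line_id]))
```

With a real file you would pass f instead of the fake list; sum(per_line.values()) should then match h.get_count().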

If you just want the set of unique Point Source ID values, you can build it directly:

line_set = set(p.get_point_source_id() for p in f)
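
On whether the 'data' field is standard: as far as I can tell it is just the raw point record, so its layout follows the point data format named in the header. The sketch below assumes point data format 1 as described in the LAS 1.2 specification (28 bytes, little-endian), which matches my example files but may not match yours. POINT_FORMAT_1 and point_source_id_from_record are hypothetical names, and the record is synthetic so the code runs without liblas:

```python
import struct

# Point data record format 1 (LAS 1.2): X, Y, Z (int32), intensity (uint16),
# return/scan flag byte (uint8), classification (uint8), scan angle rank (int8),
# user data (uint8), point source ID (uint16), GPS time (float64).
# "<" means little-endian with no padding, so the sizes add up to 28 bytes.
POINT_FORMAT_1 = struct.Struct("<iiiHBBbBHd")
POINT_SOURCE_ID_OFFSET = 18  # 12 (X, Y, Z) + 2 + 1 + 1 + 1 + 1


def point_source_id_from_record(record):
    """Read the 2-byte Point Source ID from a raw record.

    Accepts a byte string or a sequence of integer byte values.
    """
    start = POINT_SOURCE_ID_OFFSET
    raw = bytes(bytearray(record[start:start + 2]))
    return struct.unpack("<H", raw)[0]


if __name__ == "__main__":
    # Synthetic record with made-up values; 4021 is the Point Source ID.
    record = POINT_FORMAT_1.pack(1000, 2000, 3000, 55, 0x09, 2, -5, 0, 4021, 123456.5)
    print(POINT_FORMAT_1.size)
    print(point_source_id_from_record(record))
    # The same bytes as a list of integers.
    print(point_source_id_from_record(list(bytearray(record))))
```

If your header reports a different point data format or record length, check the offsets in the XML description before relying on this layout.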

Hopefully your data value is also available as p.get_point_source_id(). Let me know how it works for you in the comments. Cheers!

Licensed under: CC-BY-SA with attribution