Successively take values between two sub-strings and temporarily store them before writing them in a different order within a Python file read

StackOverflow https://stackoverflow.com/questions/16120714

  •  11-04-2022

Question

I am reading the body of a .gpx file and need to change the format of the data so it can be read as a .kml file.

.kml has lat and lon in a swapped order compared with .gpx, so I need a way to successively take the values between two sub-strings and temporarily store them before writing them out in a different order. .kml also separates <time> from <coords>, but that is pretty much the same type of task.

I have looked at many resources, and I tried:

lat = re.search('<trkpt lat="(.*)" lon="', x)
lon = re.search('" lon="(.*)">', x)

which obviously doesn't work for multiple values in the original file. My code probably isn't very Pythonic (yet). The code:

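import re

def convert(fileName):
    # Read the whole .gpx file; the with-block closes it afterwards
    with open(fileName, "r") as f:
        x = f.read()
    # Turn each <trkpt ...><ele>...</ele> into a <gx:coord> line
    # (lat and lon are still in .gpx order at this point)
    x = re.sub(r'<trkpt lat="', '           <gx:coord>', x)
    x = re.sub(r'" lon="', ' ', x)
    x = re.sub(r'"><ele>', ' ', x)
    x = re.sub(r'</ele>', '</gx:coord>\n', x)
    # Drop the <speed> elements, which .kml does not need
    x = re.sub(r'<speed>.*?</speed>', '', x)
    return x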

This gets me close to the format that's needed, but I can't work out how to successively pass over the multiple values, swap them around a bit and progressively rewrite them.

I'm new to Python... please send help. Thanks!
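One way to step through every track point in turn is `re.finditer`, which yields one match per `<trkpt>` so the captured values can be reordered as they are written out. The following is only a minimal sketch: it assumes every point carries `lat`, `lon` and `<ele>` in the order shown in the question, and the name `gpx_to_coords` is made up for illustration.

```python
import re

# One pattern with three capture groups. [^"]* and [^<]* cannot run past
# the closing quote or tag, so each match stays inside a single <trkpt>.
TRKPT = re.compile(r'<trkpt lat="([^"]*)" lon="([^"]*)">\s*<ele>([^<]*)</ele>')

def gpx_to_coords(gpx_text):
    """Return one <gx:coord> line per track point, ordered lon lat ele."""
    lines = []
    for match in TRKPT.finditer(gpx_text):
        lat, lon, ele = match.groups()
        # Swap the order here: .kml wants longitude first.
        lines.append("<gx:coord>%s %s %s</gx:coord>" % (lon, lat, ele))
    return "\n".join(lines)

sample = ('<trkpt lat="-33.8598" lon="151.17912"><ele>7.8</ele></trkpt>'
          '<trkpt lat="-33.859936" lon="151.17906"><ele>20.8</ele></trkpt>')
print(gpx_to_coords(sample))
```

Because each match carries all three values together, nothing has to be stored beyond the current loop iteration.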

EDIT

Examples of each file type follow (for clarity I have taken off the header text of each).

.gpx looks like this and keeps the time and coordinates together in each point. As you can see, each data point sits between <trkpt and </trkpt> (.gpx also has speed and sometimes other stuff that needs cleaning out too):

<trkseg> <trkpt lat="-33.8598" lon="151.17912"><ele>7.8</ele><speed>0.9013878</speed><time>2012-09-25T07:38:42Z</time></trkpt><trkpt lat="-33.859936" lon="151.17906"><ele>20.8</ele><speed>2.25</speed><time>2012-09-25T07:38:43Z</time></trkpt><trkpt lat="-33.859818" lon="151.17934"><ele>-3.4</ele><speed>1.5</speed><time>2012-09-25T07:38:45Z</time></trkpt> <trkpt lat="-33.859947" lon="151.17914"><ele>16.2</ele><speed>1.5</speed><time>2012-09-25T07:38:49Z</time></trkpt><trkpt lat="-33.860016" lon="151.1792"><ele>18.0</ele><speed>1.75</speed><time>2012-09-25T07:38:52Z</time></trkpt><trkpt lat="-33.86008" lon="151.17923"><ele>18.4</ele><speed>1.5811388</speed><time>2012-09-25T07:38:57Z</time></trkpt><trkpt lat="-33.86013" lon="151.17932"><ele>18.1</ele><speed>1.75</speed><time>2012-09-25T07:39:03Z</time></trkpt>

OK... and this is the equivalent .kml, which separates the <when> elements from the coordinates (<gx:coord>). Of course there is always the same number of each. You can see the elevation (<ele> in the .gpx) is an untagged number in the coords after the position data.

        <when>2012-09-25T07:38:42Z</when>
        <when>2012-09-25T07:38:43Z</when>
        <when>2012-09-25T07:38:45Z</when>
        <when>2012-09-25T07:38:49Z</when>
        <when>2012-09-25T07:38:52Z</when>
        <when>2012-09-25T07:38:57Z</when>
        <when>2012-09-25T07:39:03Z</when>
        <gx:coord>151.17912 -33.8598 7.8</gx:coord>
        <gx:coord>151.17906 -33.859936 20.8</gx:coord>
        <gx:coord>151.17934 -33.859818 -3.4</gx:coord>
        <gx:coord>151.17914 -33.859947 16.2</gx:coord>
        <gx:coord>151.1792 -33.860016 18</gx:coord>
        <gx:coord>151.17923 -33.86008 18.4</gx:coord>
        <gx:coord>151.17932 -33.86013 18.1</gx:coord>
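Since .gpx is XML, the same mapping can also be sketched with the standard library's `xml.etree.ElementTree` instead of regular expressions. This is a sketch under assumptions, not a full converter: it assumes the input is well-formed XML (the sample here is a closed `<trkseg>`), that every `<trkpt>` has both `<ele>` and `<time>`, and the helper names `local_name` and `track_to_kml_lines` are invented for the example.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]

def track_to_kml_lines(gpx_text):
    """Return (<when> lines, <gx:coord> lines), one of each per <trkpt>."""
    root = ET.fromstring(gpx_text)
    whens, coords = [], []
    for pt in root.iter():
        if local_name(pt.tag) != "trkpt":
            continue
        # Child text keyed by local name, e.g. {"ele": "7.8", "time": "..."}
        child = {local_name(c.tag): (c.text or "") for c in pt}
        whens.append("<when>%s</when>" % child["time"])
        coords.append("<gx:coord>%s %s %s</gx:coord>"
                      % (pt.get("lon"), pt.get("lat"), child["ele"]))
    return whens, coords

sample = """<trkseg>
<trkpt lat="-33.8598" lon="151.17912"><ele>7.8</ele><speed>0.9013878</speed><time>2012-09-25T07:38:42Z</time></trkpt>
<trkpt lat="-33.859936" lon="151.17906"><ele>20.8</ele><speed>2.25</speed><time>2012-09-25T07:38:43Z</time></trkpt>
</trkseg>"""

whens, coords = track_to_kml_lines(sample)
print("\n".join(whens + coords))
```

Extra children such as `<speed>` are simply ignored, so no separate clean-up step is needed for them in this approach.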


Solution

This works but is SLOW: for a small .gpx of 477k, it writes a .kml of 207k and takes 198 seconds to complete. My hunch is that the StringIO.StringIO(x) is what's so slow. Any ideas on how to speed it up would be fantastic.

Here are ONLY the key snippets of what I have done:

import re

f = open(fileName, "r")
x = f.read()
f.close()
x = re.sub(r'\n', '', x)  # remove any newlines
name = re.search('<name>(.*)</name>', x, re.S)
print("Attachment name (as recorded from GPS device): " + name.group(1))

# Note: flags go in by keyword; the fourth positional argument of re.sub is count.
# count=1 stops after the header has been stripped once.
x = re.sub(r'<(.*)<trkseg>', '', x, count=1, flags=re.S)  # strip header
x = x.replace("</trkseg></trk></gpx>", "")  # strip footer
x = x.replace("<trkpt", "\n<trkpt")  # put each track point on its own line
x = re.sub(r'<speed>(.*?)</speed>', '', x, flags=re.S)  # strip speed
x = re.sub(r'<extensions>(.*?)</extensions>', '', x, flags=re.S)  # strip out extensions
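One thing worth checking with `re.sub` calls like these: the function's signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a flag such as `re.S` given as the fourth positional argument is read as `count`, not as a flag. The small illustration below uses made-up input text to show the difference.

```python
import re
import warnings

text = "line\n" * 20

with warnings.catch_warnings():
    # Recent Python versions may warn about passing count positionally.
    warnings.simplefilter("ignore")
    # re.S lands in the count slot, so at most int(re.S) newlines are removed.
    positional = re.sub(r"\n", "", text, re.S)

# Passing the flag by keyword leaves count at 0, meaning "replace all".
keyword = re.sub(r"\n", "", text, flags=re.S)

print(positional.count("\n"), keyword.count("\n"))
```

The positional form runs without error, which makes the slip easy to miss: it silently caps the number of replacements and never enables the flag.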

then

#.kml header goes here
kmlTrack = """<?xml version="1.0" encoding="UTF-8"?><kml xmlns="http://www.ope......etc etc

then

import StringIO  # Python 2 module; on Python 3 use io.StringIO instead

checkSumA = 0
checkSumB = 0

# First pass: one <when> per track point
buf = StringIO.StringIO(x)
for line in buf:
    timm = re.search('time>(.*?)</time', line, re.S)
    if timm is not None:
        kmlTrack += ("          <when>" + timm.group(1) + "</when>\n")
        checkSumA += 1  # "=+ 1" would assign +1 every time instead of counting

# Second pass: one <gx:coord> per track point, written as lon lat ele
buf = StringIO.StringIO(x)
for line in buf:
    lat = re.search('lat="(.*?)" lo', line, re.S)
    lon = re.search('lon="(.*?)"><ele>', line, re.S)
    ele = re.search('<ele>(.*?)</ele>', line, re.S)
    if lat is not None and lon is not None and ele is not None:
        kmlTrack += ("          <gx:coord>" + lon.group(1) + " " + lat.group(1) + " " + ele.group(1) + "</gx:coord>\n")
        checkSumB += 1

# These snippets sit inside a function, hence the return statements
if checkSumA == checkSumB:
    # put a footer on
    kmlTrack += """     </gx:Track></Placemark></Document></kml>"""
else:
    print("checksum error")
    return None

with open("Realbush2.kml", "a") as myfile:
    myfile.write(kmlTrack)
# c is a timing value computed in code not shown here
return ("successful .kml file-write completed in: " + str(c.seconds) + " seconds.")

Once again, this is working but it is very slow. If anyone can see how to speed this up, please let me know! Thanks
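Without profiling it is hard to be sure where the time goes, but two common suspects in code of this shape are a greedy pattern such as `<(.*)<trkseg>` being retried across a file that is one long line, and building the output with repeated `+=` on an ever-growing string. A possible single-pass variant is sketched below. It assumes every `<trkpt>` contains `<ele>` followed later by `<time>` (a point without `<time>` would make the lazy `.*?` run on into the next point), and the name `build_track_body` is hypothetical.

```python
import re

# Matches one whole track point. Tags between <ele> and <time>, such as
# <speed>, are skipped by the lazy .*? in the middle.
POINT = re.compile(
    r'<trkpt lat="([^"]*)" lon="([^"]*)">\s*'
    r'<ele>([^<]*)</ele>.*?<time>([^<]*)</time>',
    re.S,
)

def build_track_body(gpx_text):
    """Scan the text once and return the <when> and <gx:coord> lines."""
    whens, coords = [], []
    for lat, lon, ele, when in POINT.findall(gpx_text):
        whens.append("          <when>%s</when>" % when)
        coords.append("          <gx:coord>%s %s %s</gx:coord>" % (lon, lat, ele))
    # Join once at the end instead of growing a string inside the loop.
    return "\n".join(whens + coords) + "\n"

sample = ('<trkseg><trkpt lat="-33.8598" lon="151.17912"><ele>7.8</ele>'
          '<speed>0.9013878</speed><time>2012-09-25T07:38:42Z</time></trkpt>'
          '<trkpt lat="-33.859936" lon="151.17906"><ele>20.8</ele>'
          '<speed>2.25</speed><time>2012-09-25T07:38:43Z</time></trkpt></trkseg>')
print(build_track_body(sample))
```

Because both lists are filled from the same match, they always have the same length, so the separate checksum comparison is not needed in this variant. Timing both versions on the real file would show whether this is actually where the 198 seconds were spent.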

Licensed under: CC-BY-SA with attribution