Question

My first Python project is a converter that gets data into shape for a MySQL import. I've already cut away all unwanted lines of the file using target.writelines(data[start:stop]).

Now I've got about 2000 lines like this:

1,2011,54,0,.375,-.183,2.325,1.221,0,.016,0,0,431.4,.345,1.563,25.13,13.23

where 54 represents the Julian day (the day of the year).

For import into a table with three columns (datetime, value id, value), it should be turned into:

2011-23-02 00:00:00,1,-0.183
2011-23-02 00:00:00,2,2.325
2011-23-02 00:00:00,3,1.221
2011-23-02 00:00:00,4,0
2011-23-02 00:00:00,5,0.016
2011-23-02 00:00:00,6,0
2011-23-02 00:00:00,7,0
2011-23-02 00:00:00,8,431.4
2011-23-02 00:00:00,9,0.345
2011-23-02 00:00:00,10,1.563

Note that the first, fifth and last two values have been removed, and the year, day and time (second to fourth values) have become the timestamp. The first and fifth values are present in every line; the last two only appear at 0 and 12 o'clock.
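To double-check which fields survive, the sample line can be split and sliced. This is a minimal sketch assuming every line follows the sample's layout (record number, year, day of year, time, one unwanted value, then the ten values to keep, optionally followed by two more); the helper name `kept_values` is hypothetical.

```python
def kept_values(line):
    """Return the ten values that become output rows.

    Fields 0-3 are the record number, year, day of year and time,
    field 4 is discarded, and the optional two trailing fields
    (only present at 0 and 12 o'clock) are dropped as well.
    """
    fields = line.strip().split(",")  # strip() removes the trailing newline
    return fields[5:15]

sample = "1,2011,54,0,.375,-.183,2.325,1.221,0,.016,0,0,431.4,.345,1.563,25.13,13.23"
print(kept_values(sample))
```

Slicing with `fields[5:15]` works for both the 15-field and the 17-field lines, so no special case is needed for 0 and 12 o'clock.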

I've read that the conversion from a Julian day can be achieved with the datetime module (see the question "Convert julian day into date").
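Besides parsing with `strptime`, a day-of-year number can be turned into a date by adding it as an offset to 1 January of the same year with `datetime.timedelta`. A minimal sketch; the function name `day_of_year_to_date` is hypothetical. Because the offset is applied within a real year, leap years come out right.

```python
from datetime import date, timedelta

def day_of_year_to_date(year, day_of_year):
    """Convert a year and a 1-based day of the year to a date."""
    # Day 1 is 1 January, so add (day_of_year - 1) days to it.
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

print(day_of_year_to_date(2011, 54))  # 2011-02-23
```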

Which Python tools would you suggest to get the job done efficiently?

UPDATE

Thank you CoDEmanX, I implemented your suggested code with some alterations and am almost done. Two questions are left:

  1. Is there a built-in way to handle leap years correctly (e.g. Julian day 60 should be 29 February in a leap year and 1 March in other years)?

  2. I tried to implement the time (hours, minutes). Since the time variable varies in length (1-4 characters), my current implementation only works from 1000 to 2355. I could check the length of time and write a date format command for each case, but my guess is that there's a simpler solution.

    lines = f_open.readlines()
    # split string and ignore unwanted elements
    for line in lines:
        _, year, julian, time, value1, value2, value3, value4, value5, value6, value7, value8, value9, value10, *_ = line.split(",")
        # format date, convert julian day-of-year to 'month-day'
        date = "%s-%s %s:%s:00" % (int(year), datetime.strptime(julian, "%j").strftime("%m-%d"), time[:2], time[2:])
        with open(targetName, 'a') as target:
            target.write(",".join((date, "1", value1+"\n")))
            target.write(",".join((date, "2", value2+"\n")))
            target.write(",".join((date, "3", value3+"\n")))
            target.write(",".join((date, "4", value4+"\n")))
            target.write(",".join((date, "5", value5+"\n")))
            target.write(",".join((date, "6", value6+"\n")))
            target.write(",".join((date, "7", value7+"\n")))
            target.write(",".join((date, "8", value8+"\n")))
            target.write(",".join((date, "9", value9+"\n")))
            target.write(",".join((date, "10", value10+"\n")))
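The slices `time[:2]` and `time[2:]` above assume exactly four digits. A length-independent alternative is to treat the time as the integer HHMM and split it with `divmod`. A minimal sketch, assuming the time field always holds an integer from 0 to 2359; `hhmm_to_clock` is a hypothetical helper name.

```python
def hhmm_to_clock(raw):
    """Turn an HHMM value of 1-4 digits (e.g. "0", "55", "130", "2355")
    into an "HH:MM:SS" string."""
    hours, minutes = divmod(int(raw), 100)  # e.g. 130 -> (1, 30)
    return "%02d:%02d:00" % (hours, minutes)

for raw in ("0", "55", "130", "2355"):
    print(raw, "->", hhmm_to_clock(raw))
```

Padding the string first with `time.zfill(4)` (so "55" becomes "0055") and then slicing would work as well.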
    

Solution

The built-in Python modules should be sufficient. Avoid premature optimization: only look for ways to improve speed and efficiency if the simple solution turns out to be too slow.

from datetime import datetime

line = "1,2011,54,0,.375,-.183,2.325,1.221,0,.016,0,0,431.4,.345,1.563,25.13,13.23"

# split string: ignore the 1st (record number), 4th (time) and 5th elements
# and everything after the 6th element
_, year, julian, _, _, value, *_ = line.split(",")

# format date, convert julian day-of-year to 'month-day'
# (without a year, strptime assumes 1900, which is not a leap year)
# MySQL DATETIME expects YYYY-MM-DD, so the month comes before the day
date = "%s-%s 00:00:00" % (int(year), datetime.strptime(julian, "%j").strftime("%m-%d"))

# the first kept value gets value id 1
print(",".join((date, "1", value)))
#>>> 2011-02-23 00:00:00,1,-.183

# could cast to numeric types if needed
#value = float(value)
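If you want the leading zero shown in the question's expected output (`-0.183` rather than `-.183`), converting with `float` would also turn `0` into `0.0`. `decimal.Decimal` parses the text exactly and keeps the written precision. A minimal sketch, only needed if the output should match the question's sample character for character:

```python
from decimal import Decimal

raw_values = [".375", "-.183", "0", ".016", "431.4"]
# Decimal keeps the number of decimal places as written and adds the
# leading zero; float("0") would print as "0.0" instead
normalized = [str(Decimal(v)) for v in raw_values]
print(normalized)  # ['0.375', '-0.183', '0', '0.016', '431.4']
```

Note that very small values (seven or more leading zeros after the point) would be printed by `Decimal` in exponent notation.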

Strictly speaking, the value 54 is a day of the year (an ordinal date) rather than a real Julian date. Leap years in real Julian calendar dates are not easy to handle: the Wikipedia article on the Julian calendar shows that even that calendar was not applied consistently (Rome only switched to that calendar system around 45 BC, for instance).

The datetime module does take leap years into account, however, if you give it both the year and the day of the year:

>>> datetime.strptime("2004 60", "%Y %j")
datetime.datetime(2004, 2, 29, 0, 0)

>>> datetime.strptime("2005 60", "%Y %j")
datetime.datetime(2005, 3, 1, 0, 0)
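Since only leap years have a day 366, it can be worth rejecting impossible day numbers before converting. A minimal sketch using `calendar.isleap` from the standard library; the function name `checked_date` and the choice to raise `ValueError` are assumptions, not something the data format requires.

```python
import calendar
from datetime import datetime

def checked_date(year, day_of_year):
    """Convert year + day of year, rejecting days the year does not have."""
    days_in_year = 366 if calendar.isleap(year) else 365
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError("day %d does not exist in %d" % (day_of_year, year))
    # "%Y %j" lets strptime apply the leap-year rules of that year
    return datetime.strptime("%d %d" % (year, day_of_year), "%Y %j")

print(checked_date(2004, 60))  # 2004-02-29 00:00:00
print(checked_date(2005, 60))  # 2005-03-01 00:00:00
```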

So you can calculate the date like this (with the month before the day, since MySQL's DATETIME format is YYYY-MM-DD):

date = datetime.strptime(year+julian, "%Y%j").strftime("%Y-%m-%d")

Putting it all together, including splitting the values onto separate lines:

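from datetime import datetime

# ...
# open the output file once instead of once per input line
with open(targetName, 'a') as target:
    for line in f_open:
        # strip the newline, then split; skip the record number (1st) and
        # the 5th element, keep year, day of year, time and the values
        _, year, julian, time, _, *values = line.strip().split(",")

        # convert year + day of year to a date (leap years are handled);
        # MySQL DATETIME expects YYYY-MM-DD
        date = datetime.strptime(year+julian, "%Y%j").strftime("%Y-%m-%d")
        # pad the time to 4 digits (e.g. "55" -> "0055") so "%H%M" always works
        time = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
        timestamp = "%s %s" % (date, time)

        # one output row per value; value ids start at 1
        for i, value in enumerate(values[:10], 1):
            target.write(",".join((timestamp, str(i), value)))
            target.write("\n")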
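If you prefer to let the standard `csv` module handle splitting and writing, the same conversion can be written as a function over file-like objects, which also makes it easy to try out with `io.StringIO`. This is a sketch under the same assumptions as above (ten values starting at the sixth field, YYYY-MM-DD output for MySQL); `convert` is a hypothetical name, and the second input line is just the sample line with a made-up time of 130 and without the two trailing values.

```python
import csv
import io
from datetime import datetime

def convert(source, target):
    """Read lines like "1,2011,54,0,.375,..." from `source` and write one
    "timestamp,value id,value" row per kept value to `target`."""
    writer = csv.writer(target, lineterminator="\n")
    for fields in csv.reader(source):
        if not fields:
            continue  # skip blank lines
        year, day_of_year, hhmm = fields[1], fields[2], fields[3]
        # year + day of year handles leap years; zfill pads the time to HHMM
        stamp = datetime.strptime(
            year + " " + day_of_year + " " + hhmm.zfill(4), "%Y %j %H%M")
        timestamp = stamp.strftime("%Y-%m-%d %H:%M:%S")
        # fields[5:15] are the ten values; field 4 and the optional
        # two trailing fields are dropped
        for value_id, value in enumerate(fields[5:15], 1):
            writer.writerow([timestamp, value_id, value])

sample = io.StringIO(
    "1,2011,54,0,.375,-.183,2.325,1.221,0,.016,0,0,431.4,.345,1.563,25.13,13.23\n"
    "1,2011,54,130,.375,-.183,2.325,1.221,0,.016,0,0,431.4,.345,1.563\n"
)
out = io.StringIO()
convert(sample, out)
print(out.getvalue(), end="")
```

In the real script, `source` and `target` would be the opened input and output files; the `csv` documentation recommends opening them with `newline=""`.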
Licensed under: CC-BY-SA with attribution