Os.path : can you explain this behavior?
Question
I love Python because it comes batteries included, and I use built-in functions, a lot, to do the dirty job for me.
I have always been using happily the os.path module to deal with file path but recently I ended up with unexpected results on Python 2.5 under Ubuntu linux, while dealing with string that represent windows file paths :
filepath = r"c:\ttemp\FILEPA~1.EXE"
print os.path.basename(filepath)
'c:\\ttemp\\FILEPA~1.EXE']
print os.path.splitdrive(filepath)
('', 'c:\ttemp\\FILEPA~1.EXE')
WTF ?
It ends up the same way with filepath = u"c:\ttemp\FILEPA~1.EXE" and filepath = "c:\ttemp\FILEPA~1.EXE".
Do you have a clue ? Ubuntu use UTF8 but I don't feel like it has something to do with it. Maybe my Python install is messed up but I did not perform any particular tweak on it that I can remember.
Solution
If you want to manipulate Windows paths on linux you should use the ntpath module (this is the module that is imported as os.path on windows - posixpath is imported as os.path on linux)
>>> import ntpath
>>> filepath = r"c:\ttemp\FILEPA~1.EXE"
>>> print ntpath.basename(filepath)
FILEPA~1.EXE
>>> print ntpath.splitdrive(filepath)
('c:', '\\ttemp\\FILEPA~1.EXE')
OTHER TIPS
From a os.path
documentation:
os.path.splitdrive(path)
Split the pathname path into a pair (drive, tail) where drive is either a drive specification or the empty string. On systems which do not use drive specifications, drive will always be the empty string. In all cases, drive + tail will be the same as path.
If you running this on unix, it doesnt use drive specifications, hence - drive will be empty string.
If you want to solve windows paths on any platform, you can just use a simple regexp:
import re
(drive, tail) = re.compile('([a-zA-Z]\:){0,1}(.*)').match(filepath).groups()
drive
will be a drive letter followed by :
(eg. c:
, u:
) or None
, and tail
the whole rest :)
See the documentation here, specifically:
splitdrive(p) Split a pathname into drive and path. On Posix, drive is always empty.
So this won't work on a Linux box.