Question

I am having issues with non-ASCII characters in filenames.

I am using Python 2.7.3

python -V
Python 2.7.3

I installed http://pymediainfo.readthedocs.org/en/latest/

via

easy_install pymediainfo

imported as below

from pymediainfo import MediaInfo

media_info = MediaInfo.parse(os.path.join(path, to, file))

Using this with ASCII filenames works fine.

For debugging I printed the command from:

/usr/local/lib/python2.7/dist-packages/pymediainfo-1.3.5-py2.7.egg/pymediainfo/__init__.py

This is the method that runs the mediainfo command in that file:

ENV_DICT = {
    "PATH": "/usr/local/bin/:/usr/bin/",
    "LD_LIBRARY_PATH": "/usr/local/lib/:/usr/lib/"}

@staticmethod
def parse(filename, environment=ENV_DICT):
    command = ["mediainfo", "-f", "--Output=XML", filename]        
    print command
    print repr(command)
    fileno_out, fname_out = mkstemp(suffix=".xml", prefix="media-")
    fileno_err, fname_err = mkstemp(suffix=".err", prefix="media-")
    fp_out = os.fdopen(fileno_out, 'r+b')
    fp_err = os.fdopen(fileno_err, 'r+b')
    p = Popen(command, stdout=fp_out, stderr=fp_err, env=environment)
    p.wait()
    fp_out.seek(0)
    xml_dom = MediaInfo.parse_xml_data_into_dom(fp_out.read())
    fp_out.close()
    fp_err.close()
    return MediaInfo(xml_dom)

both print and print repr() display:

['mediainfo', '-f', '--Output=XML', "/mnt/path/Long 73\xc2\xb0 58' W.avi"]

the filename is:

Long 73° 58' W.avi

Looking at a UTF-8 table, \xc2\xb0 corresponds to °.

I am aware this might just be the console not interpreting the encoding as it should, but the output of mediainfo is just

<?xml version="1.0" encoding="UTF-8"?>
<Mediainfo version="0.7.58">
</Mediainfo>

which means "file not found"

os.path.isfile(os.path.join(path, to, file))

returns true for these files

and in bash using

mediainfo -f --Output=XML "/path/to/file"

works

I have googled and searched around and cannot find the answer.

Any ideas?

Thanks

UPDATE

I used this new test script

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import os
import subprocess as sub
root = "/mnt/path"
for rootfldr in sorted(os.listdir(root)):
    if os.path.isfile(os.path.join(root, rootfldr)):
        command = ['mediainfo', '-f', '--Output=XML', rootfldr]
        aa = sub.Popen(command, stdout=sub.PIPE, stderr=sub.PIPE, stdin=sub.PIPE)
        result = aa.communicate()[0]
        print rootfldr
        print result

And the results were fine (some of the filenames had non-ASCII characters).

I then decided to change this line (from the pymediainfo parse function):

p = Popen(command, stdout=fp_out, stderr=fp_err, env=environment)

to

p = Popen(command, stdout=fp_out, stderr=fp_err)

and the problem was solved.

I am guessing that something is missing and/or wrong in

ENV_DICT = {
    "PATH": "/usr/local/bin/:/usr/bin/",
    "LD_LIBRARY_PATH": "/usr/local/lib/:/usr/lib/"}

Solution

The command looks OK. The filename is passed as a bytestring containing UTF-8-encoded text. If your filesystem uses UTF-8, then it is the correct filename:

>>> print "/mnt/path/Long 73\xc2\xb0 58' W.avi".decode('utf-8')
/mnt/path/Long 73° 58' W.avi

It might be a bug in pymediainfo. As a workaround, try passing the environment argument explicitly, e.g. environment=os.environ.
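
One plausible explanation, not confirmed here, is that the env argument of Popen replaces the child's whole environment rather than adding to it, so with ENV_DICT the mediainfo process starts without LANG or the LC_* variables and may fall back to a locale that cannot handle the UTF-8 filename. A minimal sketch of a gentler workaround follows: keep the current environment and only overlay the two paths. The helper names merged_environment and missing_locale_vars are made up for this illustration and are not part of pymediainfo.

```python
import os

# Copied from the pymediainfo source shown in the question.
ENV_DICT = {
    "PATH": "/usr/local/bin/:/usr/bin/",
    "LD_LIBRARY_PATH": "/usr/local/lib/:/usr/lib/",
}


def merged_environment(overrides, base=None):
    """Return a copy of base (default: os.environ) with overrides applied.

    Hypothetical helper: it keeps variables such as LANG that the
    parent process already has, and only replaces the given keys.
    """
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def missing_locale_vars(env):
    """List the common locale variables that env does not define."""
    return [name for name in ("LANG", "LC_ALL", "LC_CTYPE") if name not in env]
```

With this, missing_locale_vars(ENV_DICT) reports all three variables as absent, and the call in the question could become MediaInfo.parse(filename, environment=merged_environment(ENV_DICT)), assuming the parse signature shown above. Passing environment=os.environ directly is simpler if the custom paths are not needed.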

Licensed under: CC-BY-SA with attribution