Question

I have a program that scans my mp3s and creates a database of the information. The source is on GitHub.

When I run the update.sh script from the command line, it works fine. When I run it as a cron job I see errors like (the exact code varies):

'utf-8' codec can't encode character '\udce2' in position 37: surrogates not allowed

And looking at the logs, I am seeing slightly different data being printed for the track title. When invoked from the command line, I see:

DEBUG:root:delegating artist Wendy Carlos, track "Jesu, Joy of Man’s Desiring", BWV 147 No. 10 to finder

but when run from cron, I see:

DEBUG:root:delegating artist Wendy Carlos, track "Jesu, Joy of Man\u2019s Desiring", BWV 147 No. 10 to finder

In both cases the text following "track" is an ID3 tag from the stagger library (tag.title).

Now stagger is a pure Python library that appears to be giving different results for identical input. So while I could call it a bug and leave it to them to fix, I suspect something in the environment differs between the command line and cron. But what? Python is invoked from a virtualenv in both cases (see script linked above).

So my question is - what kind of thing could cause this? It can't be magic. There must be a rational explanation...

[Also, in this case, it's a v2.4 ID3 tag according to kid3]

[Also, to be perfectly fair, it's not clear that it's stagger that's causing the error. It may be that it's returning the same value, but Python is handling it differently in the two cases.]


Solution

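It might be due to differences in the value of the LANG environment variable. cron runs jobs with a minimal environment in which LANG is often unset, so Python may fall back to an ASCII encoding for file names and the standard streams. That would account for both symptoms: sys.stderr escapes characters it cannot encode (hence \u2019 in the log), and bytes in file names that cannot be decoded come through as lone surrogates such as '\udce2', which then cannot be encoded as UTF-8. Please see this article for details.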
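
One way to check this is to run a small diagnostic both from the shell and from cron and compare the output. The sketch below is not taken from the asker's project; the function names describe_environment and demonstrate_surrogate are made up for illustration. The first reports the settings that decide how Python encodes text, and the second reproduces a "surrogates not allowed" error by decoding UTF-8 bytes as ASCII with the surrogateescape handler, which is what Python uses for file names it cannot decode.

```python
import locale
import os
import sys


def describe_environment():
    """Collect the settings that decide how Python encodes and decodes text."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred encoding": locale.getpreferredencoding(False),
        "filesystem encoding": sys.getfilesystemencoding(),
        "stdout encoding": getattr(sys.stdout, "encoding", None),
    }


def demonstrate_surrogate():
    """Reproduce the kind of error in the question without touching any files.

    Returns the mangled string and the error message (or None if no error).
    """
    raw = "Man\u2019s".encode("utf-8")  # b'Man\xe2\x80\x99s'
    # With surrogateescape, each undecodable byte becomes a lone surrogate,
    # so the byte 0xE2 turns into '\udce2'.
    mangled = raw.decode("ascii", errors="surrogateescape")
    try:
        mangled.encode("utf-8")
    except UnicodeEncodeError as exc:
        return mangled, str(exc)
    return mangled, None


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    mangled, error = demonstrate_surrogate()
    print(repr(mangled))
    print(error)
```

If the cron run reports an ASCII encoding (or LANG as None) while the shell run reports UTF-8, then exporting LANG with a UTF-8 locale near the top of update.sh, or setting it in the crontab, is a likely fix. The locale has to be one that is actually installed on the machine; en_US.UTF-8 is a common choice but is only an assumption here.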

Licensed under: CC-BY-SA with attribution