Question

I'm looking for best practices on debugging Python UDFs.

I can't get this UDF to run, and I can't get a useful error message to appear in the logs.

This function takes a date in the format 'DD-MON-YYYY' as input ('01-JAN-2013', for example) and returns the week of the year in which that day falls (for '01-JAN-2013', that is week zero, so the return value would be 0).
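The "zeroth week" comes from the `%U` directive, which numbers weeks starting on Sunday and puts any days before the year's first Sunday into week 0. A small sketch of that arithmetic, cross-checked against `strftime("%U")`; the helper name `sunday_week_number` is hypothetical and not part of Pig or the original UDF:

```python
import datetime

def sunday_week_number(d):
    """Sunday-based week of the year, the value strftime('%U') produces:
    days before the year's first Sunday are in week 0."""
    yday = d.timetuple().tm_yday - 1   # 0-based day of the year (Jan 1 -> 0)
    wday = (d.weekday() + 1) % 7       # weekday with Sunday = 0 (weekday() uses Monday = 0)
    return (yday + 7 - wday) // 7

if __name__ == "__main__":
    # 2013-01-01 is a Tuesday and 2013-01-06 is the year's first Sunday
    for day in (1, 5, 6):
        d = datetime.date(2013, 1, day)
        print("%s -> week %d (strftime: %s)" % (d, sunday_week_number(d), d.strftime("%U")))
```

Days 1 and 5 land in week 0 and the first Sunday (day 6) starts week 1.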

@outputSchema("week_number:int")
def week_from_date(input_date):
    date_to_match = re.match('(\d{2}).?([A-Za-z]{3}).?(\d{4})', input_date)
    if date_to_match:
        day, month, year = date_to_match.group(1), date_to_match.group(2), date_to_match.group(3)        
        import time
        from time import gmtime, strftime
        d = time.strptime("%s %s %s" % (day, month, year), "%d %b %Y")
       return int(strftime("%U", d))
    else:
            return -1

I'm receiving this error: Backend error : Error executing function

Is there any way to get a more descriptive error message? What are the best practices for debugging Python UDFs?
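One common practice is to have the UDF report its own Python traceback instead of relying on Pig's one-line summary. The sketch below uses a hypothetical `log_errors` decorator (not part of Pig) that writes the failing arguments and the full traceback to stderr and then re-raises. Where stderr ends up (typically the task logs) depends on how the UDF is run, so treat that as an assumption. In the question's UDF it would be stacked directly under `@outputSchema`.

```python
import functools
import sys
import traceback

def log_errors(func):
    """Hypothetical helper: print the failing arguments and the full Python
    traceback to stderr, then re-raise so the task still fails visibly."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            sys.stderr.write("UDF %s failed for args=%r\n" % (func.__name__, args))
            traceback.print_exc(file=sys.stderr)
            raise  # do not swallow the error; Pig should still see the failure
    return wrapper

@log_errors
def parse_day(text):
    # Hypothetical example body: raises ValueError when the first two chars are not digits
    return int(text[:2])
```

`parse_day("01-JAN-2013")` returns 1, while `parse_day("xx")` prints a traceback naming `parse_day` and the offending argument before the ValueError propagates.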


Solution

Looking at your code, I see indentation mistakes that may be the source of the issue (although they may have been introduced when you posted the question rather than being in the original code).
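For reference, here is a sketch of the UDF with consistent indentation, assuming the posted indentation matches the real file. Two further assumptions are worth checking: the snippet as posted never imports `re`, which (if it is not imported elsewhere in the file) would make every call fail with a NameError at run time, and `outputSchema` is expected to be supplied by Pig when the file is registered as a Jython UDF, so a stand-in is defined only when the name is missing. Running `python -m py_compile` on the script file also reports indentation errors with their line numbers before Pig is involved.

```python
import re
import time

try:
    outputSchema  # assumed to be provided by Pig for Jython UDFs
except NameError:
    # Local stand-in so the module also runs in a plain Python interpreter;
    # it only records the schema string on the function.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator

@outputSchema("week_number:int")
def week_from_date(input_date):
    # Expects DD-MON-YYYY such as '01-JAN-2013'; the separators are optional
    date_to_match = re.match(r'(\d{2}).?([A-Za-z]{3}).?(\d{4})', input_date)
    if not date_to_match:
        return -1
    day, month, year = date_to_match.groups()
    d = time.strptime("%s %s %s" % (day, month, year), "%d %b %Y")
    # %U: Sunday-based week of the year; days before the first Sunday are week 0
    return int(time.strftime("%U", d))

if __name__ == "__main__":
    for sample in ("01-JAN-2013", "06-JAN-2013", "not a date"):
        print("%s -> %d" % (sample, week_from_date(sample)))
```

Run locally, it prints 0 for '01-JAN-2013', 1 for '06-JAN-2013' and -1 for 'not a date'. An input such as '01-XYZ-2013' still passes the regex but makes `time.strptime` raise ValueError, which is another way to end up with Pig's generic error.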

However, you can find a more detailed stack trace in two places:
- the Pig logs, usually a text file such as pig_1388770791476.log;
- the Hadoop job tracker: click on the related job and then on the killed task to see the errors and the corresponding stack trace.
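When a log is long, a small script can pull out just the Python tracebacks. This is a sketch under an unverified assumption about the log contents: that a Jython error appears in the standard Python layout (a `Traceback (most recent call last):` header, indented frame lines, then an unindented exception line). The helper name `extract_tracebacks` is hypothetical, and the `pig_*.log` glob only mirrors the example file name above.

```python
import glob

def extract_tracebacks(lines):
    """Collect Python traceback blocks from an iterable of log lines."""
    header = "Traceback (most recent call last):"
    blocks, current = [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if current is None:
            pos = line.find(header)
            if pos != -1:
                current = [line[pos:]]  # drop any log prefix before the header
        else:
            current.append(line)
            if line and not line[0].isspace():
                blocks.append(current)  # the unindented exception line ends the block
                current = None
    if current:
        blocks.append(current)          # keep a block cut off at end of file
    return blocks

if __name__ == "__main__":
    # Hypothetical usage: scan every pig_*.log in the current directory
    for path in sorted(glob.glob("pig_*.log")):
        with open(path) as handle:
            for block in extract_tracebacks(handle):
                print("%s:\n%s\n" % (path, "\n".join(block)))
```

If the real log wraps the traceback in a Java stack trace or prefixes every line, the end-of-block test above will need adjusting.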

Licensed under: CC-BY-SA with attribution