How can I process this text file and parse out what I need?

https://stackoverflow.com/questions/1246752

12-09-2019

Question

I'm trying to parse the output of the Python doctest module and store it in an HTML file.

I've got output similar to this:

**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
    [factorial(n) for n in range(6)]
Expected:
    [0, 1, 2, 6, 24, 120]
Got:
    [1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
    factorial(30)
Expected:
    25252859812191058636308480000000L
Got:
    265252859812191058636308480000000L
**********************************************************************
1 items had failures:
   2 of   8 in __main__.factorial
***Test Failed*** 2 failures.

Each failure is preceded by a line of asterisks, which separates the test failures from one another.

What I'd like to do is extract the file name and the method that failed, as well as the expected and actual results. Then I'd like to create an HTML document from that (or store it in a text file and do a second round of parsing).

How can I do this using just Python or some combination of Unix shell utilities?

EDIT: I put together the following shell script, which matches each block the way I want, but I'm not sure how to redirect each sed match to its own file.

python example.py | sed -n '/.*/,/^\**$/p' > `mktemp error.XXX`
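Since the answers below use Python, here is a minimal Python sketch of the same split-and-redirect idea, assuming the divider is a line made only of asterisks as in the output above. The function name `split_failures_to_files` and the `error.` prefix are hypothetical, chosen to mimic `mktemp error.XXX`.

```python
import re
import sys
import tempfile


def split_failures_to_files(doctest_output, directory=None):
    """Write each failure block to its own file and return the file paths.

    Hypothetical helper: block boundaries are lines consisting only of '*'.
    """
    divider = re.compile(r'^\*+$', re.MULTILINE)
    # The text before the first divider is empty and the text after the last
    # divider is the summary, so only the blocks in between are failures.
    blocks = divider.split(doctest_output)[1:-1]
    paths = []
    for block in blocks:
        # NamedTemporaryFile(delete=False) creates a uniquely named file that
        # survives after it is closed, much like `mktemp error.XXX`.
        with tempfile.NamedTemporaryFile('w', prefix='error.', suffix='.txt',
                                         dir=directory, delete=False) as fh:
            fh.write(block.strip() + '\n')
            paths.append(fh.name)
    return paths


if __name__ == '__main__':
    # Usage: python example.py | python split_failures.py
    for path in split_failures_to_files(sys.stdin.read()):
        print(path)
```

Each printed path holds one failure block, which can then be fed to a second parsing pass.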

Solution

Here's a quick and dirty script that parses the output into tuples with the relevant information:

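import sys
import re

# A divider line consists only of asterisks.
stars_re = re.compile('^[*]+$', re.MULTILINE)
file_line_re = re.compile(r'^File "(.*?)", line (\d*), in (.*)$')

doctest_output = sys.stdin.read()
# Drop the (empty) text before the first divider and the summary after the last one.
chunks = stars_re.split(doctest_output)[1:-1]

for chunk in chunks:
    chunk_lines = chunk.strip().splitlines()
    m = file_line_re.match(chunk_lines[0])
    if m is None:
        continue  # skip anything that isn't a "File ..." failure block

    file, line, module = m.groups()
    # Assumes the example, expected and actual output each fit on one line.
    failed_example = chunk_lines[2].strip()
    expected = chunk_lines[4].strip()
    got = chunk_lines[6].strip()

    print((file, line, module, failed_example, expected, got))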

Other answers

You could write a Python program to pick this apart, but a better approach may be to customize doctest so that it produces the report you want in the first place. From the docs for doctest.DocTestRunner:

                                  ... the display output
can be also customized by subclassing DocTestRunner, and
overriding the methods `report_start`, `report_success`,
`report_unexpected_exception`, and `report_failure`.
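A minimal sketch of that idea, assuming Python 3: the class name `CollectingRunner`, its `failures_seen` attribute and the `collect_failures` helper are hypothetical. It overrides `report_failure(out, test, example, got)` to store the same fields the other answers extract, instead of printing them.

```python
import doctest


class CollectingRunner(doctest.DocTestRunner):
    """Hypothetical DocTestRunner that records failures instead of printing them."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.failures_seen = []

    def report_failure(self, out, test, example, got):
        # Same arithmetic as doctest's default "File ..., line N" header:
        # the docstring's starting line plus the example's offset, plus one.
        if test.lineno is not None and example.lineno is not None:
            line = test.lineno + example.lineno + 1
        else:
            line = None
        self.failures_seen.append({
            'file': test.filename,
            'line': line,
            'name': test.name,
            'example': example.source.strip(),
            'expected': example.want.strip(),
            'got': got.strip(),
        })


def collect_failures(module):
    """Run every doctest found in `module` and return the recorded failures."""
    runner = CollectingRunner(verbose=False)
    for test in doctest.DocTestFinder().find(module):
        runner.run(test, out=lambda text: None)  # discard the default text report
    return runner.failures_seen
```

The resulting list of dicts can go straight into an HTML template, with no text parsing step at all. Unexpected exceptions go through `report_unexpected_exception` instead, which would need its own override if you want them in the report.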

I wrote a quick parser in pyparsing to do this.

from pyparsing import *

str = """
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
    [factorial(n) for n in range(6)]
Expected:
    [0, 1, 2, 6, 24, 120]
Got:
    [1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
    factorial(30)
Expected:
    25252859812191058636308480000000L
Got:
    265252859812191058636308480000000L
**********************************************************************
"""

quote = Literal('"').suppress()
comma = Literal(',').suppress()
in_ = Keyword('in').suppress()
block = OneOrMore("**").suppress() + \
        Keyword("File").suppress() + \
        quote + Word(alphanums + ".") + quote + \
        comma + Keyword("line").suppress() + Word(nums) + comma + \
        in_ + Word(alphanums + "._") + \
        LineStart() + restOfLine.suppress() + \
        LineStart() + restOfLine + \
        LineStart() + restOfLine.suppress() + \
        LineStart() + restOfLine + \
        LineStart() + restOfLine.suppress() + \
        LineStart() + restOfLine  

# One Group per failure block; named `report` to avoid shadowing the built-in all().
report = OneOrMore(Group(block))

result = report.parseString(str)

for section in result:
    print(section)

This gives:

['example.py', '16', '__main__.factorial', '    [factorial(n) for n in range(6)]', '    [0, 1, 2, 6, 24, 120]', '    [1, 1, 2, 6, 24, 120]']
['example.py', '20', '__main__.factorial', '    factorial(30)', '    25252859812191058636308480000000L', '    265252859812191058636308480000000L']

This is probably one of the less elegant scripts I've ever written, but it should give you a framework for what you want without resorting to Unix utilities and separate scripts to generate the HTML. It's untested, but it should only need minor tweaks to work.

import html
import os

# Create a list of all files in the current directory.
dirList = os.listdir('.')

# Make sure the output directory exists before writing into it.
os.makedirs('processed', exist_ok=True)

# Ignore anything that isn't a .txt file.
#
# Read in the text, then split it into a list of lines.
for thisFile in dirList:
    if not thisFile.endswith(".txt"):
        continue

    with open(thisFile, 'r') as infile:
        rawText = infile.read()

    yourList = rawText.split('\n')

    # Strings
    compiledText = ''

    for i in yourList:

        # Clunky way of deciding whether the current line
        # should be included in compiledText.

        if i.startswith("*****"):
            compiledText += "\n\n--- New Report ---\n"

        if i.startswith("File"):
            compiledText += i + '\n'

        if i.startswith("Fail"):
            compiledText += i + '\n'

        if i.startswith("Expe"):
            compiledText += i + '\n'

        if i.startswith("Got"):
            compiledText += i + '\n'

        if i.startswith(" "):
            compiledText += i + '\n'

    # Insert your HTML template below. <pre> keeps the line breaks, and
    # html.escape() protects characters such as < and & in the test output.
    htmlText = '<html>\n<body>\n<pre>' + html.escape(compiledText) + '</pre>\n</body>\n</html>\n'

    # Write out to file (one .html file per input .txt file).
    with open(os.path.join('processed', thisFile + '.html'), 'w') as outfile:
        outfile.write(htmlText)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow