Question

I want to read text files from a folder and its subfolders and save their contents to a JSON file as a dictionary with Python. I'm not sure whether the indexing is already correct, but the search script always fails with an error message.

Here is my indexing code:

#!/usr/bin/python

import sys
import glob
import os
import json

basePath = str(sys.argv[1])
allfolder = []
filename = []
fh = []

for files in glob.glob( basePath + '/*.txt' ):
    filename.append(files)

for root, dirs, files in os.walk( basePath ):
    allfolder.append(dirs)

searchfolder = allfolder[0]

for folder in searchfolder:
    for files in glob.glob( basePath + '/' + folder + '/*.txt' ):
        filename.append(files)

dic = open('index.json',"w")
info = {}

for i in filename:
    fobj = open(i,"r")
    for line in fobj:
        zeile = line.split(" ")
        for a in zeile:
            b = a.strip()
            if b == "":
                break
            dic.write(json.dumps({'wort' : b, 'pfad' : i}, indent=2))
    fobj.close()
dic.close()

And my search code:

#!/usr/bin/python

import sys
import os
import json

dictionary = 'index.json'
search = str(sys.argv[1])

if os.path.isfile(dictionary) == False:
    print('Die Datei wurde nicht gefunden')

json_data=open(dictionary)

data = json.load(json_data)
pprint(data)
json_data.close()

And now the error message:

christoph@Notebook-CSWS:~/System/Blatt4$ python3 sucher.py a
Traceback (most recent call last):
  File "sucher.py", line 15, in <module>
    data = json.load(json_data)
  File "/usr/lib/python3.3/json/__init__.py", line 274, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.3/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.3/json/decoder.py", line 355, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 4 column 2 - line 54490 column 2 (char 87 - 1833872)

Can anyone help me with my problem?
Thank you in advance!


Solution

The problem is that you are creating a broken JSON file.

If you run a tool like jsonlint on your "index.json" file, it will show you the problem in the file's format.

Your indexing code calls dic.write in a loop, so it appends JSON strings one after another. Each piece is likely valid on its own, but the concatenation is not a single valid JSON document. That is what the "Extra data" error reports: the parser finished reading the first object and then found more text after it.
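To see the effect in isolation, here is a minimal sketch. The two records and the helper is_valid_json are made up for illustration; only json.dumps and json.loads come from the standard library.

```python
import json

# Two records, each serialized separately, the way the indexing loop does it.
# The words and paths are invented sample data.
pieces = [
    json.dumps({'wort': 'a', 'pfad': 'x.txt'}, indent=2),
    json.dumps({'wort': 'b', 'pfad': 'y.txt'}, indent=2),
]


def is_valid_json(text):
    """Return True if text parses as exactly one JSON document."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# Every piece parses on its own ...
each_valid = all(is_valid_json(p) for p in pieces)

# ... but writing them back to back, as dic.write does, gives text that
# json.loads rejects with an "Extra data" error.
concatenated_valid = is_valid_json("".join(pieces))

print(each_valid, concatenated_valid)
```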

One solution is:

Write the resulting JSON file in one shot

If you can afford the memory, build the complete content first (for example as a dictionary), and once it is complete, dump it into the JSON file just once.
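A possible sketch of that approach follows. The function names collect_text_files, build_index and write_index, and the choice to map each word to the list of files containing it, are assumptions for illustration and not part of the original code.

```python
import json
import os


def collect_text_files(base_path):
    """Return all .txt files below base_path, including subfolders."""
    found = []
    for root, dirs, files in os.walk(base_path):
        for name in files:
            if name.endswith(".txt"):
                found.append(os.path.join(root, name))
    return found


def build_index(paths):
    """Map each word to the sorted list of files that contain it."""
    index = {}
    for path in paths:
        with open(path, "r") as fobj:
            for line in fobj:
                # split() without arguments also drops empty strings
                for word in line.split():
                    index.setdefault(word, set()).add(path)
    # Sets are not JSON-serializable, so convert them to sorted lists.
    return {word: sorted(files) for word, files in index.items()}


def write_index(base_path, out_path="index.json"):
    """Build the whole index in memory, then dump it exactly once."""
    index = build_index(collect_text_files(base_path))
    with open(out_path, "w") as out:
        json.dump(index, out, indent=2)  # one dump gives one valid document
    return index
```

Calling write_index(sys.argv[1]) from the indexing script would then produce a file that json.load in the search script can read back as a single dictionary.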

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow