Question

I have a rather hard problem that I just can't solve. The idea is to loop through a block of data and detect its indentation (always spaces). Every time a line is indented more deeply than the previous one, for example by 4 more spaces, the previous line should become a key in a dictionary and the lines that follow should be appended as its values.

If there is another indent, a new dictionary should be created with its own key and values. This should happen recursively until all the data has been processed. To make things easier to understand, I made an example:

Chassis 1:
    Servers:
        Server 1/1:
            Equipped Product Name: EEE UCS B200 M3
            Equiped PID: e63-samp-33
            Equipped VID: V01
            Acknowledged Cores: 16
            Acknowledged Adapters: 1
    PSU 1:
        Presence: Equipped
        VID: V00
        HW Revision: 0

The idea is to be able to get any part of the data returned in dictionary form. dictionary.get("Chassis 1:") should return all of the data, dictionary.get("Servers:") should return everything that is indented deeper than the line "Servers:", dictionary.get("PSU 1:") should give {"PSU 1:": ["Presence: Equipped", "VID: V00", "HW Revision: 0"]}, and so on. I've drawn a little diagram to demonstrate this; every colour is another dictionary.

When the indentation becomes shallower again, for example going from 8 to 4 spaces, the data should be appended to the dictionary that holds the less-indented data.

I've made an attempt in code, but it is not coming anywhere near what I want:

for item in Array:
    regexpatt = re.search(":$", item)
    if regexpatt:
        keyFound = True
        break

if not keyFound:
    return Array

#Verify if we still have lines with spaces
spaceFound = False
for item in Array:
    if item != item.lstrip():
        spaceFound = True
        break

if not spaceFound:
    return Array

keyFound = False
key=""
counter = -1
for item in Array:
    counter += 1
    valueTrim = item.lstrip()
    valueL = len(item)
    valueTrimL = len(valueTrim)
    diff = (valueL - valueTrimL)
    nextSame = False
    if item in Array:
        nextValue = Array[counter]
        nextDiff = (len(nextValue) - len(nextValue.lstrip()))
        if diff == nextDiff:
            nextSame = True


    if diff == 0 and valueTrim != "" and nextSame is True:
        match = re.search(":$", item)
        if match:
            key = item
            newArray[key] = []
            deptDetermine = True
            keyFound = True
    elif diff == 0 and valueTrim != "" and keyFound is False:
        newArray["0"].append(item)
    elif valueTrim != "":
        if depthDetermine:
            depth = diff
            deptDetermine = False
        #newValue = item[-valueL +depth]
        item = item.lstrip().rstrip()
        newArray[key].append(item)

for item in newArray:
    if item != "0":
        newArray[key] = newArray[key]

return newArray

The result should be like this for example:

{
    "Chassis 1": {
        "PSU 1": {
            "HW Revision: 0", 
            "Presence: Equipped", 
            "VID: V00"
        }, 
        "Servers": {
            "Server 1/1": {
                "Acknowledged Adapters: 1", 
                "Acknowledged Cores: 16", 
                "Equiped PID: e63-samp-33", 
                "Equipped Product Name: EEE UCS B200 M3", 
                "Equipped VID: V01"
            }
        }
    }
}

I hope this explains the concept enough


Solution

This should give you the nested structure you want.

If you also want every nested dictionary to be available directly from the root, uncomment the "if ... is not root" parts.

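def parse(data):
    """Build nested dicts from an iterable of indented "key: value" lines."""
    root = {}
    currentDict = root
    prevLevel = -1
    # Stack of (dict, level) pairs: the dict each open section was added to
    # and the indentation level of that section's header line.
    parents = []
    for line in data:
        if line.strip() == '':
            continue
        # Indentation level = number of leading spaces.
        level = len(line) - len(line.lstrip(" "))
        # Every line is expected to contain a colon; "Key:" gives an empty value.
        key, value = [val.strip() for val in line.split(':', 1)]

        if level > prevLevel and not value:
            # Deeper header: open a new dict inside the current one.
            currentDict[key] = {}
            # if currentDict is not root:
            #     root[key] = currentDict[key]
            parents.append((currentDict, level))
            currentDict = currentDict[key]
            prevLevel = level
        elif level <= prevLevel and not value:
            # Header at the same or a shallower level (this includes sibling
            # sections): unwind the stack to the dict that holds this level.
            parentDict, parentLevel = parents.pop()
            while parentLevel != level:
                if not parents:
                    return root
                parentDict, parentLevel = parents.pop()
            parentDict[key] = {}
            parents.append((parentDict, level))
            # if parentDict is not root:
            #     root[key] = parentDict[key]
            currentDict = parentDict[key]
            prevLevel = level
        else:
            # "key: value" line: store it in the current section.
            currentDict[key] = value
    return root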




import json

with open('data.txt', 'r') as f:
    data = parse(f)

# Pretty-print the nested dict.
print(json.dumps(data, sort_keys=True, indent=4))

output:

{
    "Chassis 1": {
        "PSU 1": {
            "HW Revision": "0", 
            "Presence": "Equipped", 
            "VID": "V00"
        }, 
        "Servers": {
            "Server 1/1": {
                "Acknowledged Adapters": "1", 
                "Acknowledged Cores": "16", 
                "Equiped PID": "e63-samp-33", 
                "Equipped Product Name": "EEE UCS B200 M3", 
                "Equipped VID": "V01"
            }
        }
    }
}
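
To look up a section by name at any depth (the question's dictionary.get("Servers") case) without aliasing every key into the root, one option is a small recursive search over the nested result. This is only a sketch: find_section is a hypothetical helper, not part of the answer above, and it returns just the first match if a name occurs more than once.

```python
def find_section(tree, name):
    """Return the first value stored under `name` at any depth, or None."""
    if name in tree:
        return tree[name]
    for value in tree.values():
        if isinstance(value, dict):
            found = find_section(value, name)
            if found is not None:
                return found
    return None


# Sample data shaped like the output above (trimmed for brevity).
data = {
    "Chassis 1": {
        "Servers": {"Server 1/1": {"Equipped VID": "V01"}},
        "PSU 1": {"Presence": "Equipped", "VID": "V00", "HW Revision": "0"},
    }
}

print(find_section(data, "PSU 1"))
print(find_section(data, "Server 1/1"))
```

Searching on demand keeps the tree unambiguous, whereas copying every section into the root would silently overwrite sections that share a name.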

OTHER TIPS

That data format really does look like YAML. Just in case someone stumbles onto this and is fine with a library solution:

import yaml
import pprint

s = """
Chassis 1:
    Servers:
        Server 1/1:
            Equipped Product Name: EEE UCS B200 M3
            Equiped PID: e63-samp-33
            Equipped VID: V01
            Acknowledged Cores: 16
            Acknowledged Adapters: 1
    PSU 1:
        Presence: Equipped
        VID: V00
        HW Revision: 0
"""

d = yaml.safe_load(s)
pprint.pprint(d)

The output is:

{'Chassis 1': {'PSU 1': {'HW Revision': 0,
                         'Presence': 'Equipped',
                         'VID': 'V00'},
               'Servers': {'Server 1/1': {'Acknowledged Adapters': 1,
                                          'Acknowledged Cores': 16,
                                          'Equiped PID': 'e63-samp-33',
                                          'Equipped Product Name': 'EEE UCS B200 M3',
                                          'Equipped VID': 'V01'}}}}
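
Note one difference from the hand-written parser: the YAML loader turns scalars such as 0 and 16 into integers, whereas parse() keeps every value as a string. If integers are wanted from the parse() result as well, a post-processing pass along these lines might do. coerce_ints is a hypothetical helper, and it assumes only plain non-negative decimal integers need converting.

```python
def coerce_ints(tree):
    """Return a copy of a nested dict with all-decimal strings turned into ints."""
    result = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            # Recurse into nested sections.
            result[key] = coerce_ints(value)
        elif isinstance(value, str) and value.isdecimal():
            result[key] = int(value)
        else:
            # Leave everything else (e.g. "V00", "Equipped") untouched.
            result[key] = value
    return result


parsed = {"PSU 1": {"Presence": "Equipped", "VID": "V00", "HW Revision": "0"}}
print(coerce_ints(parsed))
```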


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow