Question
This is more of a question about programming style. I scrape webpages for fields such as "Temperature: 51 - 62", "Height: 1000-1500", etc. The results are saved in a dictionary:
{"temperature": "51-62", "height":"1000-1500" ...... }
All keys and values are strings. Every key can map to one of many possible values. Now I want to convert this dictionary to a numpy array/vector. I have the following concerns:

I am wondering what the clearest and most efficient way is to write such a conversion in Python. I am thinking of building another dictionary that maps each key to an index in the vector, and additional dictionaries that map the string values to integers.

Another problem I am having is that I am not sure about the range of some keys. I want to dynamically keep track of the mapping between string values and integers. For example, I may find that key1 can map to a val1_8 in the future.
Thanks
Solution
Try a pandas Series, it was built for this.
import pandas as pd
s = pd.Series({'a':1, 'b':2, 'c':3})
s.values # a numpy array
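Applied to the data in the question, a minimal sketch might look like the following. It assumes the scraped dict from the question and uses `pd.factorize` to assign an integer code to each distinct string value (new values simply receive the next unused code); the variable names are illustrative.

```python
import pandas as pd

# the scraped dictionary from the question (all values are strings)
record = {"temperature": "51-62", "height": "1000-1500"}

# a Series keeps the key -> value association and a stable key order
s = pd.Series(record)

# factorize assigns a distinct integer code to each unique string value
codes, uniques = pd.factorize(s)

codes          # integer vector, one entry per key
list(uniques)  # the string value behind each code
s.values       # the underlying array of the Series
```
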
OTHER TIPS
>>> # a sequence of dictionaries in an iterable called 'data'
>>> # assuming that not all dicts have the same keys
>>> from pprint import pprint
>>> pprint(data)
[{'x': 7.0, 'y1': 2.773, 'y2': 4.5, 'y3': 2.0},
{'x': 0.081, 'y1': 1.171, 'y2': 4.44, 'y3': 2.576},
{'y1': 0.671, 'y3': 3.173},
{'x': 0.242, 'y2': 3.978, 'y3': 3.791},
{'x': 0.323, 'y1': 2.088, 'y2': 3.602, 'y3': 4.43}]
>>> # get the unique keys across entire dataset
>>> keys = [list(dx.keys()) for dx in data]
>>> # flatten and coerce to 'set'
>>> keys = {itm for inner_list in keys for itm in inner_list}
>>> # create a map (look-up table) from each column
>>> # index in a NumPy array to a key
>>> # (set ordering is arbitrary; the order shown here
>>> # matches the output below)
>>> LuT = dict(enumerate(keys))
>>> LuT
{0: 'y2', 1: 'y3', 2: 'y1', 3: 'x'}
>>> cols = list(LuT.keys())
>>> cols
[0, 1, 2, 3]
>>> # pre-allocate a NumPy array (100 rows is arbitrary);
>>> # number of columns is len(LuT)
>>> import numpy as NP
>>> D = NP.zeros((100, len(LuT)))
>>> # now populate the array from the original data using LuT;
>>> # keys missing from a row default to 0
>>> for i, row in enumerate(data):
...     D[i,:] = [row.get(LuT[c], 0) for c in cols]
>>> D[:5,:]
array([[ 4.5 , 2. , 2.773, 7. ],
[ 4.44 , 2.576, 1.171, 0.081],
[ 0. , 3.173, 0.671, 0. ],
[ 3.978, 3.791, 0. , 0.242],
[ 3.602, 4.43 , 2.088, 0.323]])
Compare the last result (the first 5 rows of D) with data, above.

Note that column ordering is preserved for every row, even for a dictionary with a less-than-complete set of keys: column 0 of D always holds the values keyed to y2, and so on. For example, the third row of data has only two key/value pairs; in the third row of D, the first and last columns are both 0, and those columns correspond to y2 and x, which are exactly the two missing keys.
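For the second concern in the question (tracking the mapping between string values and integers as new values appear), one possibility is a per-key lookup table that grows on demand. This is only a sketch; the names `encoders` and `encode` are my own invention, not part of any library.

```python
from collections import defaultdict

# one lookup table per key; each table grows as new string values appear
encoders = defaultdict(dict)

def encode(key, value):
    """Return a stable integer code for value under key, adding it if unseen."""
    table = encoders[key]
    if value not in table:
        table[value] = len(table)  # next unused integer
    return table[value]

# values seen earlier keep their codes; new ones extend the mapping
encode("temperature", "51-62")   # first value for this key
encode("temperature", "62-73")   # a new value gets the next code
encode("temperature", "51-62")   # repeated value keeps its original code
```

Because codes are assigned in order of first appearance, a vector built today stays consistent when a previously unseen value (the question's hypothetical val1_8) shows up later.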