Skip keys without Type checking in Python (pymssql)
21-06-2021
Question
I need to access all the non-integer keys for a dict
that looks like:
result = {
    0 : "value 1",
    1 : "value 2",
    "key 1" : "value 1",
    "key 2" : "value 2",
}
I am currently doing this by:
headers = [header for header in tmp_dict.keys() if not isinstance(header, int)]
My question:
- Is there a way to do this without type checking?
- This tmp_dict is coming out of a query using pymssql with the as_dict=True attribute, and for some reason it returns all the column names with data as expected, but also includes the same data indexed by integers. How can I get my query result as a dictionary with only the column names and data?
Thanks for your help!
PS - Despite my issues being resolved by potentially answering 2, I'm curious how this can be done without type checking. Mainly for the people who say "never do type checking, ever."
Solution
With regard to your question about type checking, the duck-type approach would be to see whether it can be converted to or used as an int.
def can_be_int(obj):
    try:
        int(obj)
    except (TypeError, ValueError):
        return False
    return True

headers = [header for header in tmp_dict.keys() if not can_be_int(header)]
Note that floats can be converted to ints by truncating them, so this isn't necessarily exactly equivalent.
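To make that caveat concrete, here is a small sketch (Python 3 syntax; the sample keys are made up for illustration and are not from pymssql) of which keys the can_be_int helper treats as integer-like. A float key and a digits-only string key are both filtered out along with the real integers.

```python
def can_be_int(obj):
    """Return True if int(obj) succeeds, False otherwise."""
    try:
        int(obj)
    except (TypeError, ValueError):
        return False
    return True

# Hypothetical keys, chosen only to show the edge cases.
sample = {0: "a", 1: "b", "key 1": "a", "2021": "c", 2.7: "d"}

# Keeps only the keys that int() rejects.
headers = [key for key in sample if not can_be_int(key)]
print(headers)
```

So a column literally named "2021" would be lost with this approach, which isinstance() would not do.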
A slight variation on the above would be to use coerce(0, obj) in place of int(obj). This will allow any kind of object that can be converted to a common type with an integer. Note that coerce() is a Python 2 built-in and was removed in Python 3. You could also do something like 0 + obj and 1 * obj, which will check for something that can be used in a mathematical expression with integers.
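As a rough sketch of the arithmetic variant (Python 3 syntax; adds_to_int is a hypothetical helper name), the addition is the part that actually rejects strings. Multiplying a string by an integer is legal in Python, so 1 * obj on its own would not filter out column names.

```python
def adds_to_int(obj):
    """Return True if obj can be added to an integer."""
    try:
        0 + obj
    except TypeError:
        return False
    return True

tmp_dict = {0: "value 1", 1: "value 2", "key 1": "value 1", "key 2": "value 2"}
headers = [header for header in tmp_dict if not adds_to_int(header)]
print(headers)

# String repetition: this succeeds and returns the string unchanged.
repeated = 1 * "key 1"
print(repeated)
```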
You could also check to see whether its string representation is all digits:
headers = [header for header in tmp_dict.keys() if not str(header).isdigit()]
This is probably closer to a solution that doesn't use type-checking, although it will be slower, and it's of course entirely possible that a column name would be a string that is only digits! (Which would fail with many of these approaches, to be honest.)
Sometimes explicit type-checking really is the best choice, which is why the language has tools for letting you check types. In this situation I think you're fine, especially since the result dictionary is documented to have only integers and strings as keys. And you're doing it the right way by using isinstance() rather than explicitly checking type() == int.
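The difference between the two checks shows up with subclasses. A minimal illustration, where ColumnIndex is a hypothetical int subclass invented for this example:

```python
class ColumnIndex(int):
    """Hypothetical int subclass, for illustration only."""

idx = ColumnIndex(3)

print(isinstance(idx, int))   # True: subclasses are accepted
print(type(idx) == int)       # False: exact type match only
print(isinstance(True, int))  # True: bool is a subclass of int
```

isinstance() accepts anything that behaves as an int through inheritance, which is usually what you want when filtering keys.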
OTHER TIPS
Looking at the source code of pymssql (1.0.2), it is clear that there is no option for the module to not generate data indexed by integers. But note that data indexed by column name can be omitted if the column name is empty.
/* mssqldbmodule.c */
PyObject *fetch_next_row_dict(_mssql_connection *conn, int raise) {
    [...]
    for (col = 1; col <= conn->num_columns; col++) {
        [...]
        // add key by column name, do not add if name == ''
        if (strlen(PyString_AS_STRING(name)) != 0)
            if ((PyDict_SetItem(dict, name, val)) == -1)
                return NULL;
        // add key by column number
        if ((PyDict_SetItem(dict, PyInt_FromLong(col-1), val)) == -1)
            return NULL;
    }
    [...]
}
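For readers who don't read C, the loop above can be paraphrased in Python roughly as follows. This is only an illustrative sketch: build_row_dict and its arguments are hypothetical names, not part of pymssql, and the elided parts of the C function are not modelled.

```python
def build_row_dict(column_names, values):
    """Mimic the dict-building loop: one entry per name, one per index."""
    row = {}
    for index, (name, value) in enumerate(zip(column_names, values)):
        # Add the key by column name, unless the name is empty.
        if name != "":
            row[name] = value
        # Always add the key by zero-based column number.
        row[index] = value
    return row

# The second column has no name, so it is reachable only by index.
row = build_row_dict(["id", "", "name"], [7, 42, "Ann"])
print(row)
```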
Regarding your first question, filtering the result set by type checking is surely the best way to do that. And this is exactly how pymssql builds the tuple it returns when as_dict is False:
if self.as_dict:
    row = iter(self._source).next()
    self._rownumber += 1
    return row
else:
    row = iter(self._source).next()
    self._rownumber += 1
    return tuple([row[r] for r in sorted(row.keys()) if type(r) == int])
The rationale behind as_dict=True is that you can access by index and by name. Normally you'd get a tuple you index into, but for compatibility reasons being able to index a dict as though it was a tuple means that code depending on column numbers can still work, without being aware that column names are available.
If you're just using result to retrieve columns (either by name or index), I don't see why you're concerned about removing them? Just carry on regardless. (Unless for some reason you plan to pickle or otherwise persist the data elsewhere...)
The best way to filter them out, though, is using isinstance; duck typing in this case is actually unpythonic and inefficient. E.g.:
names_only = dict( (k, v) for k,v in result.iteritems() if not isinstance(k, int) )
Instead of a try and except dance.
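The line above uses iteritems(), which exists only in Python 2. On Python 3, the same filter could be written as a dict comprehension over items(); the sample data is the dictionary from the question.

```python
result = {
    0: "value 1",
    1: "value 2",
    "key 1": "value 1",
    "key 2": "value 2",
}

# Keep only the entries whose key is not an integer.
names_only = {k: v for k, v in result.items() if not isinstance(k, int)}
print(names_only)
```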
>>> sorted(result)[len(result)/2:]
['key 1', 'key 2']
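This will remove the duplicated integer-keyed entries. It relies on Python 2 sorting integer keys ahead of string keys, and on every column having a non-empty name so that exactly half of the keys are integers. I think what you're doing is fine though.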