Skip keys without Type checking in Python (pymssql)

https://stackoverflow.com/questions/11545740

21-06-2021

Question

I need to access all the non-integer keys for a dict that looks like:

result = { 
           0 : "value 1",
           1 : "value 2",
           "key 1" : "value 1",
           "key 2" : "value 2", 
         }

I am currently doing this by:

headers = [header for header in tmp_dict.keys() if not isinstance(header, int)]

My question:

Is there a way to do this without type checking?
This tmp_dict comes from a pymssql query run with as_dict=True. For some reason it returns all the column names with their data, as expected, but also includes the same data indexed by integers. How can I get my query result as a dictionary with only the column names and data?

Thanks for your help!

PS - Although my issue may be resolved by an answer to the second question, I'm still curious how this can be done without type checking. Mainly for the people who say "never do type checking, ever."

Solution

With regard to your question about type checking, the duck-type approach would be to see whether it can be converted to or used as an int.

def can_be_int(obj):
    try:
        int(obj)
    except (TypeError, ValueError):
        return False
    return True

headers = [header for header in tmp_dict.keys() if not can_be_int(header)]

Note that floats can be converted to ints by truncating them, so this isn't necessarily exactly equivalent.
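To make that caveat concrete, here is a small self-contained sketch comparing the int() probe with isinstance. The sample keys are made up for illustration and are not real pymssql output:

```python
def can_be_int(obj):
    """Return True if int(obj) succeeds."""
    try:
        int(obj)
    except (TypeError, ValueError):
        return False
    return True

# Hypothetical keys, chosen only to show the edge cases.
keys = [0, 1, "key 1", "2021", 3.7]

by_probe = [k for k in keys if not can_be_int(k)]
by_type = [k for k in keys if not isinstance(k, int)]

print(by_probe)  # ['key 1']
print(by_type)   # ['key 1', '2021', 3.7]
```

The probe also discards the digit-only string and the float, because int() accepts both.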

A slight variation on the above would be to use coerce(0, obj) in place of int(obj) (Python 2 only; coerce() was removed in Python 3). This accepts any object that can be converted to a common type with an integer. You could also try 0 + obj, which checks for something that can be added to an integer. Note that 1 * obj is not a useful probe here, because multiplying a string by an integer is valid (it repeats the string) and so does not raise.
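A minimal sketch of the addition-based probe, run against the sample dict from the question. The helper name adds_to_int is made up for this example:

```python
def adds_to_int(obj):
    """Return True if obj can be added to an integer (hypothetical helper)."""
    try:
        0 + obj
    except TypeError:
        return False
    return True

result = {0: "value 1", 1: "value 2", "key 1": "value 1", "key 2": "value 2"}

headers = sorted(key for key in result if not adds_to_int(key))
print(headers)  # ['key 1', 'key 2']

# Multiplication behaves differently: for a string it means repetition,
# so it succeeds instead of raising TypeError.
print(1 * "key 1")  # key 1
```

Like the int() probe, this treats floats and other numeric types as "integer-like", so it is looser than isinstance(key, int).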

You could also check to see whether its string representation is all digits:

headers = [header for header in tmp_dict.keys() if not str(header).isdigit()]

This is probably closer to a solution that doesn't use type-checking, although it will be slower, and it's of course entirely possible that a column name would be a string that is only digits! (Which would fail with many of these approaches, to be honest.)

Sometimes explicit type-checking really is the best choice, which is why the language has tools for letting you check types. In this situation I think you're fine, especially since the result dictionary is documented to have only integers and strings as keys. And you're doing it the right way by using isinstance() rather than explicitly checking type() == int.
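The difference between the two checks shows up with subclasses. In this sketch, ColumnIndex is a hypothetical int subclass used only for illustration:

```python
class ColumnIndex(int):
    """Hypothetical int subclass, used only for illustration."""

key = ColumnIndex(0)

print(isinstance(key, int))  # True: subclasses are accepted
print(type(key) == int)      # False: only the exact type matches
```

isinstance() keeps working if a library ever returns an int subclass for its keys, while the type() comparison would silently stop matching.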

OTHER TIPS

Looking at the source code of pymssql (1.0.2), it is clear that there is no option for the module to not generate data indexed by integers. But note that data indexed by column name can be omitted if the column name is empty.

/* mssqldbmodule.c */
PyObject *fetch_next_row_dict(_mssql_connection *conn, int raise) {
    [...]
    for (col = 1; col <= conn->num_columns; col++) {
        [...]
        // add key by column name, do not add if name == ''
        if (strlen(PyString_AS_STRING(name)) != 0)
            if ((PyDict_SetItem(dict, name, val)) == -1)
                return NULL;

        // add key by column number
        if ((PyDict_SetItem(dict, PyInt_FromLong(col-1), val)) == -1)
            return NULL;
    }
    [...]
}

Regarding your first question, filtering the result set by type checking is surely the best way to do that. It is also exactly how pymssql itself builds the row when as_dict is False:

if self.as_dict:
    row = iter(self._source).next()
    self._rownumber += 1
    return row
else:
    row = iter(self._source).next()
    self._rownumber += 1
    return tuple([row[r] for r in sorted(row.keys()) if type(r) == int])
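The as_dict=False branch can be imitated in plain Python on a hand-built row. This is only a sketch: the row contents are made up, no pymssql call is involved, and the keys are filtered before sorting so that it also runs on Python 3, where mixed int and str keys cannot be ordered:

```python
# Hand-built stand-in for a row that carries both key styles.
row = {0: "value 1", 1: "value 2", "key 1": "value 1", "key 2": "value 2"}

# Keep only the integer keys, in column order, then read the values.
int_keys = sorted(k for k in row if isinstance(k, int))
as_tuple = tuple(row[k] for k in int_keys)

print(as_tuple)  # ('value 1', 'value 2')
```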

The rationale behind as_dict=True is that you can access by index and by name. Normally you'd get a tuple you index into, but for compatibility reasons being able to index a dict as though it was a tuple means that code depending on column numbers can still work, without being aware that column names are available.

If you're just using result to retrieve columns (either by name or index), I don't see why you're concerned about removing them? Just carry on regardless. (Unless for some reason you plan to pickle or otherwise persist the data elsewhere...)

The best way to filter them out, though, is isinstance. Duck typing in this case is actually unpythonic and inefficient. E.g.:

# items() works on both Python 2.7 and Python 3 (iteritems() is Python 2 only)
names_only = {k: v for k, v in result.items() if not isinstance(k, int)}

That avoids a try/except dance.
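The same filter could be applied to a whole result set. The sketch below assumes each row is a dict shaped like the one in the question; strip_index_keys and the sample rows are made up for illustration:

```python
def strip_index_keys(rows):
    """Return copies of the row dicts without their integer keys."""
    return [
        {k: v for k, v in row.items() if not isinstance(k, int)}
        for row in rows
    ]

# Hand-built stand-in for rows that carry both key styles.
rows = [
    {0: 1, 1: "alice", "id": 1, "name": "alice"},
    {0: 2, 1: "bob", "id": 2, "name": "bob"},
]

for row in strip_index_keys(rows):
    print(row)
```

The original rows are left untouched, so code that still indexes by column number keeps working.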

>>> sorted(result)[len(result)/2:]
['key 1', 'key 2']

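This will remove the duplicated integer-keyed entries. It relies on Python 2 behavior: in CPython 2, integers sort before strings, so the second half of the sorted keys holds the column names. It also assumes every column has a non-empty name, so that exactly half of the keys are integers. On Python 3, sorted() raises TypeError for mixed int and str keys. I think what you're doing is fine though.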

Licensed under: CC-BY-SA with attribution
