Question

Is the exact behavior of str.__mod__ documented?

These two lines of code work just as expected:

>>> 'My number is: %s.' % 123
'My number is: 123.'
>>> 'My list is: %s.' % [1, 2, 3]
'My list is: [1, 2, 3].'

This line behaves as expected too:

>>> 'Not a format string' % 123
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting

But what does this line do, and why doesn't it raise an error?

>>> 'Not a format string' % [1, 2, 3]
'Not a format string'

P. S.

>>> import sys
>>> print(sys.version)
3.3.2 (default, Aug 15 2013, 23:43:52) 
[GCC 4.7.3]

Solution

When the newest printf-style formatting was added, it seems quite a few little quirks appeared in the % formatting. Today (version 3.8), this is documented here, but it was already mentioned as far back as version 3.3 here.

The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). Using the newer formatted string literals, the str.format() interface, or template strings may help avoid these errors. Each of these alternatives provides their own trade-offs and benefits of simplicity, flexibility, and/or extensibility.
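The tuple quirk mentioned in that quote can be reproduced with a short sketch. The variable names are only illustrative; the point is that a tuple on the right-hand side of % is unpacked as the argument list, while the newer alternatives treat it as a single value.

```python
point = (1, 2)

# With %, a tuple on the right-hand side is unpacked as the argument list,
# so a single %s cannot consume both items.
try:
    "Point: %s" % point
    percent_error = None
except TypeError as exc:
    percent_error = str(exc)

# Wrapping the tuple in a one-element tuple makes it a single argument.
wrapped = "Point: %s" % (point,)

# str.format() and formatted string literals take the tuple as one value.
formatted = "Point: {}".format(point)
literal = f"Point: {point}"

print(percent_error)
print(wrapped, formatted, literal, sep="\n")
```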

In this specific case, Python sees a non-tuple value with a __getitem__ method on the right-hand side of the % and assumes a format_map has to be done. This is typically done with a dict, but could indeed be done with any object that has a __getitem__ method.

In particular, a format_map is allowed to ignore unused keys, because you typically do not iterate over items of a mapping to access them.

>>> "Include those items: %(foo)s %(bar)s" % {"foo": 1, "bar": 2, "ignored": 3}
'Include those items: 1 2'

Your example is a use of that feature where all keys of your container are ignored.

>>> "Include no items:" % {"foo": 1, "bar": 2}
'Include no items:'
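To see which lookups actually happen, here is a minimal sketch using a hypothetical Recorder class (not part of any library) whose __getitem__ remembers every key it is asked for. With no %(name)s placeholders, nothing is looked up and nothing is reported as unused.

```python
class Recorder:
    """Hypothetical helper: remembers every key that % formatting looks up."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        return "<%s>" % key


rec = Recorder()

# No %(name)s placeholders: __getitem__ is never called and no error is raised.
no_keys = "Include no items:" % rec
keys_after_first = list(rec.keys)

# Each %(name)s placeholder triggers one lookup with the name as a str key.
with_keys = "Include those items: %(foo)s %(bar)s" % rec
keys_after_second = list(rec.keys)

print(no_keys, keys_after_first)
print(with_keys, keys_after_second)
```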

If you want further proof of that, check what happens when you use a list as the right-hand side.

>>> lst = ["foo", "bar", "baz"]
>>> "Include those items: %(0)s, %(2)s" % lst
TypeError: list indices must be integers or slices, not str

Python does indeed attempt to get lst["0"]. Unfortunately, there is no way to specify that the "0" should be converted to an int, so this is doomed to fail with the % syntax.
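For comparison, a sketch of two ways to index a list from a format string. str.format() converts an all-digit index inside [...] to an int on its own. With %, a plain list cannot do this, but a small adapter can; IntKeys below is a hypothetical class written for this illustration, not something Python provides.

```python
lst = ["foo", "bar", "baz"]

# str.format() turns the digit strings inside [...] into integer indices.
via_format = "Include those items: {0[0]}, {0[2]}".format(lst)


class IntKeys:
    """Hypothetical adapter: converts the str keys that % passes into ints."""

    def __init__(self, seq):
        self._seq = seq

    def __getitem__(self, key):
        return self._seq[int(key)]


# The adapter receives "0" and "2" as str keys and converts them itself.
via_percent = "Include those items: %(0)s, %(2)s" % IntKeys(lst)

print(via_format)
print(via_percent)
```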

Older versions

For the record, this seems to be a quirk which appeared way before Python 3.0, as I get the same behaviour as far back as I can go, despite the documentation starting to mention it only for version 3.3.

Python 3.0.1+ (unknown, May  5 2020, 09:41:19) 
[GCC 9.2.0] on linux4
Type "help", "copyright", "credits" or "license" for more information.
>>> 'Not a format string' % [1, 2, 3]
'Not a format string'

Other tips

I think the responsible lines can be found in the CPython source code (I checked out git tag v3.8.2):

In the function

PyObject *
PyUnicode_Format(PyObject *format, PyObject *args)

in Objects/unicodeobject.c, line 14944, there are the following lines

Objects/unicodeobject.c, line 15008

if (ctx.argidx < ctx.arglen && !ctx.dict) {
    PyErr_SetString(PyExc_TypeError,
                    "not all arguments converted during string formatting");
    goto onError;
}

This raises the error if not all arguments were consumed (argidx < arglen), but it does not raise if ctx.dict is "true". When is it "true"?

Objects/unicodeobject.c, line 14976

if (PyMapping_Check(args) && !PyTuple_Check(args) && !PyUnicode_Check(args))
    ctx.dict = args;
else
    ctx.dict = NULL;

OK, PyMapping_Check checks the passed args. If it returns "true" and args is neither a tuple nor a unicode string, ctx.dict is set to args.

What does PyMapping_Check do?

Objects/abstract.c, line 2110

int
PyMapping_Check(PyObject *o)
{
    return o && o->ob_type->tp_as_mapping &&
        o->ob_type->tp_as_mapping->mp_subscript;
}

From my understanding, if the object can be used as a "mapping", i.e. it can be indexed/subscripted, this will return 1. In that case ctx.dict is set to args, which is non-NULL, so the error branch is skipped.

Both dict and list can be used as such mappings, and will thus not raise an error when used as arguments. tuple is explicitly excluded in the check in line 14976, probably since it is used to pass variadic arguments to the formatter.
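The check can be approximated in pure Python and compared against what % actually does. This is only a rough sketch: looks_like_mapping is a hypothetical helper, and hasattr(..., "__getitem__") is an approximation of the C-level mp_subscript test, so C types that implement only the sequence slot may behave differently.

```python
def looks_like_mapping(obj):
    """Hypothetical, rough Python-level approximation of the C condition."""
    has_subscript = hasattr(type(obj), "__getitem__")
    return has_subscript and not isinstance(obj, (tuple, str))


def raises_type_error(obj):
    """Report whether formatting a placeholder-free string with obj fails."""
    try:
        "Not a format string" % obj
    except TypeError:
        return True
    return False


class WithGetitem:
    def __getitem__(self, key):
        raise KeyError(key)


class Plain:
    pass


samples = [[1, 2, 3], {"a": 1}, WithGetitem(), (1, 2), "text", 123, Plain()]
results = [(looks_like_mapping(o), raises_type_error(o)) for o in samples]

for obj, (mapping_like, failed) in zip(samples, results):
    print(type(obj).__name__, mapping_like, failed)
```

For these samples, the error is raised exactly when the object is not treated as a mapping.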

Whether or why this behaviour is intentional is unclear to me, though; the relevant parts of the source code are uncommented.


Based on this, we can try:

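assert 'foo' % [1, 2] == 'foo'
assert 'foo' % {3: 4} == 'foo'

class A:
    pass

# No __getitem__, so A() counts as one unused positional argument:
try:
    'foo' % A()
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting

class B:
    def __getitem__(self, key):  # never called here; its presence is enough
        pass

assert 'foo' % B() == 'foo'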

So it is sufficient for an object to have a __getitem__ method defined to not trigger an error.
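If the silent success is unwanted, one possible guard is to always wrap the value in a one-element tuple, so it is counted as a positional argument no matter what methods it defines. This is a sketch only: percent_positional is a hypothetical helper name, and it deliberately gives up %(name)s style lookups.

```python
def percent_positional(fmt, arg):
    """Hypothetical helper: make `arg` count as exactly one positional value."""
    return fmt % (arg,)


ok = percent_positional("My list is: %s.", [1, 2, 3])

# With the wrapper, a list passed to a placeholder-free string is reported.
try:
    percent_positional("Not a format string", [1, 2, 3])
    leftover_detected = False
except TypeError:
    leftover_detected = True

print(ok)
print(leftover_detected)
```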


EDIT: In v3.3.2, which was referenced in the OP, the offending lines are lines 13922, 13459 and 1918 in the same files; the logic looks the same.


EDIT2: In v3.0, the checks are in lines 8841 and 9226 in Objects/unicodeobject.c; PyMapping_Check from Objects/abstract.c is not yet used in the Unicode formatting code.


EDIT3: According to some bisecting and git blame, the core logic (on ASCII strings, not unicode strings) goes back to Python 1.2, and was implemented by GvR himself over a quarter of a century ago:

commit caeaafccf7343497cc654943db09c163e320316d
Author: Guido van Rossum <guido@python.org>
Date:   Mon Feb 27 10:13:23 1995 +0000

    don't complain about too many args if arg is a dict

diff --git a/Objects/stringobject.c b/Objects/stringobject.c
index 7df894e12c..cb76d77f68 100644
--- a/Objects/stringobject.c
+++ b/Objects/stringobject.c
@@ -921,7 +921,7 @@ formatstring(format, args)
                        XDECREF(temp);
                } /* '%' */
        } /* until end */
-       if (argidx < arglen) {
+       if (argidx < arglen && !dict) {
                err_setstr(TypeError, "not all arguments converted");
                goto error;
        }

Probably GvR can tell us why this is intended behaviour.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow