Question

I've replaced the "datetime" example with a real lxml example.
(Sorry if my English reads strangely; I wrote it with Google Translate.)

I like lxml for its very good performance, but code written with it is hard to read.
I use XML a lot, so I frequently have to modify Python code that uses it.
Once some time has passed I've forgotten how that code works, and because it is so hard to understand,
I end up spending a lot of time debugging and fixing it.
For example, a search in a deep XML hierarchy usually looks something like this:

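import lxml.etree

elem = lxml.etree.parse("xxx/xxx/sample.xml").getroot()

elem.xpath("//depth3/text()")[0]

elem.find("./depth1/depth2/depth3").text          # element text
elem.find("./depth1/depth2/depth3").get("attr1")  # attribute value (already a str)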

I would like to write it like this instead.
(This code is only for my own use.)

elem.depth3.text (Ex.1)
OR
elem.depth1.depth2.depth3.text (Ex.2)

To implement this, I first tried class inheritance.
I customized the approach in "Using custom Element classes in lxml" a little.
I used __getattr__ to search for an XML element by tag name.

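from lxml import etree
class CustomElement(etree.ElementBase):
    # Called only when normal attribute lookup fails.
    def __getattr__(self, k):
        ret = self.xpath("//" + k)  # all <k> elements in the document (a list)
        setattr(self, k, ret)       # cache the result on this Python object
        return getattr(self, k)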

Example (Ex.1) works.
But example (Ex.2) raises AttributeError: the object returned for depth1 is an etree._Element instance, which has no __getattr__.
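Ex.2 breaks because the hook lives only on the outermost object, and what comes back from the first lookup does not carry it. One way around that, without touching lxml's element class lookup, is to wrap every result in an object that has the hook. The sketch below is an assumption-laden alternative, not lxml API: it swaps in the standard library's xml.etree.ElementTree so it runs without lxml, and the `Node` class and its search order (direct child first, then any descendant) are hypothetical choices.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper: attribute access finds child elements by tag.

    Every result is wrapped again, so chains like node.depth1.depth2.depth3
    keep the hook at each step.
    """

    def __init__(self, element):
        self._element = element

    def __getattr__(self, tag):
        # Only reached when normal lookup fails (text/get below win first).
        if tag.startswith("_"):
            # Avoid recursion on private/dunder probes (copy, pickle, ...).
            raise AttributeError(tag)
        child = self._element.find(tag)  # first direct child <tag>
        if child is None:
            child = self._element.find(".//" + tag)  # else first descendant
        if child is None:
            raise AttributeError(tag)
        return Node(child)

    @property
    def text(self):
        return self._element.text

    def get(self, key, default=None):
        return self._element.get(key, default)


root = Node(ET.fromstring(
    "<root><depth1><depth2><depth3 attr1='a'>deep</depth3></depth2></depth1></root>"
))
print(root.depth3.text)                # Ex.1 style -> deep
print(root.depth1.depth2.depth3.text)  # Ex.2 style -> deep
```

Because `text` and `get` are real attributes of the wrapper, a child element named `<text>` or `<get>` would be shadowed; that trade-off is inherent to attribute-style access.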

(Supplement) In my first question I used the example of adding a "millisecond" attribute to "datetime"; it isn't practical, but it was easy to understand.

Then I thought of using the ctypes module to add functions to lxml's Element class.

import ctypes
import lxml.etree

class PyObject_HEAD(ctypes.Structure):
    _fields_ = [
        ('HEAD', ctypes.c_ubyte * (object.__basicsize__ -
                           ctypes.sizeof(ctypes.c_void_p))),
        ('ob_type', ctypes.c_void_p)
    ]
def __getattr__(self, k):
    ret = self.xpath("//" + k)
    setattr(self, k, ret)
    return getattr(self, k)

_get_dict          = ctypes.pythonapi._PyObject_GetDictPtr
_get_dict.restype  = ctypes.POINTER(ctypes.py_object)
_get_dict.argtypes = [ctypes.py_object]

EE = _get_dict(lxml.etree._Element).contents.value
EE["__getattr__"] = __getattr__

elem = lxml.etree.parse("xxx/xxx/sample.xml").getroot()
elem.xpath("//depth3")[0]

=> returns an _Element object

from inspect import getsource
print getsource(elem.__getattr__)

=>def __getattr__(self, k):
=> ret = self.xpath("//" + k)
=> setattr(self, k, ret)
=> return getattr(self, k)
The source has been added.

elem.depth3

=> AttributeError .. no attribute 'depth3'

I don't know whether I should write this using "PyObject_GetAttr" instead, or how.
Please tell me if you know.

Best regards


====================Previous Question===================================
I'm trying to extend built-in types using ctypes. Adding an ordinary function usually works. However, adding a special method does not work. Why?

import ctypes as c

class PyObject_HEAD(c.Structure):
    _fields_ = [
        ('HEAD', c.c_ubyte * (object.__basicsize__ -
                              c.sizeof(c.c_void_p))),
        ('ob_type', c.c_void_p)
    ]

pgd = c.pythonapi._PyObject_GetDictPtr
pgd.restype = c.POINTER(c.py_object)
pgd.argtypes = [c.py_object]

import datetime

def millisecond(td):
    return td.microsecond // 1000  # whole milliseconds

d = pgd(datetime.datetime)[0]
d["millisecond"] = millisecond

now = datetime.datetime.now()
print now.millisecond(), now.microsecond

This prints 155 155958, Ok!

def __getattr__(self, k):
    return self, k

d["__getattr__"] = __getattr__

now = datetime.datetime
print now.hoge

This doesn't work, why?

Traceback (most recent call last):
  File "xxxtmp.py", line 31, in <module>
    print now.hoge
AttributeError: type object 'datetime.datetime' has no attribute 'hoge'

Solution

PyObject_GetAttr (Objects/object.c) uses the type's tp_getattro slot, or tp_getattr if the former isn't defined. It doesn't look up __getattribute__ in the MRO of the type.
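The slot mechanism can be observed from pure Python. This is a minimal sketch of CPython behaviour with hypothetical names (`Plain`, `fallback`): a `__getattr__` placed in an instance dict is ignored, while assigning it through the type's normal `__setattr__` works, because that code path also updates the type's slots. Writing straight into the type's dict via ctypes, as in the question, skips that slot update; and built-in or extension types reject the normal assignment with TypeError.

```python
class Plain:
    """A heap type defined in Python, with no __getattr__ yet."""


obj = Plain()

# 1) A __getattr__ in the *instance* dict is ignored: implicit special
#    method lookup goes through the type, not the instance.
obj.__dict__["__getattr__"] = lambda name: "instance hook"
try:
    obj.missing
    instance_hook_used = True
except AttributeError:
    instance_hook_used = False


# 2) Assigning through the type's normal __setattr__ also updates the
#    type's tp_getattro slot, so the hook applies to existing instances.
def fallback(self, name):
    return "fallback:" + name


Plain.__getattr__ = fallback
print(instance_hook_used)  # False
print(obj.missing)         # fallback:missing
```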

For a custom __getattr__ you'll need to subclass datetime. Your heap type will use slot_tp_getattr_hook (Objects/typeobject.c) as its tp_getattro. This function will look for __getattribute__ and __getattr__ in the type's MRO by calling _PyType_Lookup (Objects/typeobject.c).
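A sketch of the subclassing route, with a hypothetical `MyDateTime` class that exposes `millisecond` as a property rather than a method. Note also that the test in the question does `now = datetime.datetime` and then `now.hoge`, i.e. it looks up `hoge` on the class object itself. A `__getattr__` defined in a class only handles lookups on its instances (hooking the class itself would take a metaclass), so the sketch uses an instance.

```python
import datetime


class MyDateTime(datetime.datetime):
    """Hypothetical subclass: a heap type, so __getattr__ is honoured."""

    @property
    def millisecond(self):
        return self.microsecond // 1000

    def __getattr__(self, name):
        # Only called after normal lookup fails. Dunder names get a normal
        # AttributeError so protocols like copy and pickle are unaffected.
        if name.startswith("__"):
            raise AttributeError(name)
        return self, name


d = MyDateTime(2020, 1, 2, 3, 4, 5, 155958)
print(d.millisecond)  # 155
print(d.hoge[1])      # hoge
```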


Given your update, see "using custom Element classes in lxml". For multiple results I've hacked a __getattr__ hook that uses a suffix notation for the index. It defaults to index 0 otherwise. Admittedly I haven't given it much thought, but clashes with existing names can be avoided if you always use the index.

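from lxml import etree

def make_parser(element):
    # Every element this parser creates will be an instance of `element`.
    lookup = etree.ElementDefaultClassLookup(element=element)
    parser = etree.XMLParser()
    parser.set_element_class_lookup(lookup)
    return parser

class CustomElement(etree.ElementBase):
    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        # "foo_1" -> child <foo> at index 1; plain "foo" -> index 0.
        try:
            name, index = attr.rsplit('_', 1)
            index = int(index)
        except ValueError:
            name = attr
            index = 0
        try:
            return self.xpath(name)[index]
        except IndexError:
            # Report a missing child as a missing attribute so that
            # hasattr() and getattr(obj, name, default) behave normally.
            raise AttributeError(attr)

parser = make_parser(CustomElement)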

For example:

>>> spam = etree.fromstring(r'''
... <spam>
...     <foo>
...         <bar>eggs00</bar>
...         <bar>eggs01</bar>
...     </foo>
...     <foo>
...         <bar>eggs10</bar>
...         <bar>eggs11</bar>
...     </foo>
... </spam>
... ''', parser)

>>> spam.foo_0.bar_0.text
'eggs00'
>>> spam.foo_0.bar_1.text
'eggs01'
>>> spam.foo_1.bar_0.text
'eggs10'
>>> spam.foo_1.bar_1.text
'eggs11'
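The suffix rule can be checked on its own without lxml. `split_index` below is a hypothetical helper that mirrors the name parsing in `CustomElement.__getattr__`. It makes the ambiguities visible: a tag that itself ends in `_<digits>` (say `depth_3`) parses as index 3 of `depth`, and a child named like an existing attribute (`text`, `tag`) is shadowed because `__getattr__` only runs when normal lookup fails. Always writing the index (`depth_3_0`, `text_0`) sidesteps both.

```python
def split_index(attr):
    """Hypothetical helper mirroring the suffix rule in CustomElement.__getattr__."""
    try:
        name, index = attr.rsplit("_", 1)
        return name, int(index)
    except ValueError:
        # No underscore, or the suffix is not an integer: whole name, index 0.
        return attr, 0


for attr in ["foo", "foo_1", "my_tag", "depth_3", "depth_3_0"]:
    print(attr, "->", split_index(attr))
```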

Other tips

I don't think you can override __getattr__ that way. Basically, you are hacking the type's __dict__ to include a new method. When you call now.millisecond, the original "attribute getter" is called, looks into that dict, and returns your new method. I'm not sure where this attribute getter resides (it might be in C code), but it can't be in the dict it looks things up in, so you can't override it this way.

You might try __getattribute__ instead, but I don't know whether that will work either. Be aware that it's much harder to implement correctly (see https://stackoverflow.com/a/3278104/143091).

That being said, it's probably not a good idea to hack builtins this way. A lot of Python standard library code might depend on behavior that you change, and your code might fail in hard-to-understand ways. It's also confusing for people who know Python and try to understand your code.

I hope you didn't get this nasty trick from me. I only use it to backport features that are not available in older versions of Python or of a library, for example:

if not hasattr(wnck.Screen, "get_workspaces"):
    def get_workspaces(screen):
        return [screen.get_workspace(i) for i in range(screen.get_workspace_count())]
    # Inject into the type's dict (_get_dict as defined in the question).
    _get_dict(wnck.Screen)[0]['get_workspaces'] = get_workspaces

This way, I can develop primarily for the modern version of the library, but still support ancient versions if just a function or two are missing, without having to change my code.
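For classes that accept ordinary attribute assignment (any class defined in Python does), the same backport-if-missing pattern needs no ctypes at all. A sketch with a hypothetical `OldScreen` class standing in for an old library type; it is not the real wnck API.

```python
class OldScreen:
    """Hypothetical stand-in for an old library class lacking get_workspaces."""

    def __init__(self, workspaces):
        self._workspaces = list(workspaces)

    def get_workspace_count(self):
        return len(self._workspaces)

    def get_workspace(self, i):
        return self._workspaces[i]


# Backport only if the running version lacks the method.
if not hasattr(OldScreen, "get_workspaces"):
    def get_workspaces(screen):
        return [screen.get_workspace(i)
                for i in range(screen.get_workspace_count())]
    OldScreen.get_workspaces = get_workspaces

screen = OldScreen(["main", "web"])
print(screen.get_workspaces())  # ['main', 'web']
```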

License: CC BY-SA (attribution required)
Not affiliated with Stack Overflow