pyDatalog: handling unbound variables in a custom predicate

https://stackoverflow.com/questions/15199636

18-03-2022

Question

I'm writing a pyDatalog program to analyse weather data from Weather Underground (just as a demo for myself and others in the company at the moment). I have written a custom predicate resolver that returns the readings whose time of day lies between a start time and an end time:

from sqlalchemy import Table, text

# Base, engine and makeTime (string -> datetime.time) are defined elsewhere;
# Base uses pyDatalog's SQLAlchemy mix-in, which provides cls.session.

# class for the reading table.
class Reading(Base):
    __table__ = Table('reading', Base.metadata, autoload=True, autoload_with=engine)

    def __repr__(self):
        return str(self.Time)

    # predicate to resolve 'timeBetween(X, Y, Z)' statements
    # matches items as X where the time of day is between Y and Z (inclusive).
    # if Y is later than Z, it returns the items not between Z and Y (exclusive).
    # TODO - make it work where t1 and t2 are not bound.
    # somehow needs to tell the engine to try somewhere else first.
    @classmethod
    def _pyD_timeBetween3(cls, dt, t1, t2):
        if dt.is_const():
            # dt is already known: apply the same wrap-around rule as the SQL branch
            if t1.is_const() and t2.is_const():
                t, start, end = dt.id.Time.time(), makeTime(t1.id), makeTime(t2.id)
                if end > start:
                    ok = start <= t <= end
                else:
                    ok = t >= start or t <= end
                if ok:
                    yield (dt.id, t1.id, t2.id)
        else:
            # dt is an unbound variable
            if t1.is_const() and t2.is_const():
                op = 'and' if makeTime(t2.id) > makeTime(t1.id) else 'or'
                # bind the times as parameters instead of formatting them into the SQL
                sqlWhere = "time(Time) >= :t1 %s time(Time) <= :t2" % op
                query = cls.session.query(cls).filter(text(sqlWhere)).params(t1=t1.id, t2=t2.id)
                for instance in query:
                    yield (instance, t1.id, t2.id)
            # if t1 or t2 is unbound, nothing is yielded
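The unbound-`dt` branch builds its WHERE clause by switching between `and` and `or` depending on whether the end time is later than the start time. The following is a minimal sketch of the same filter using Python's built-in `sqlite3` module. It assumes an SQLite table `reading` with a text `Time` column in `YYYY-MM-DD HH:MM:SS` form, and the sample rows are invented for illustration. `time_between_sql` is a hypothetical helper, and plain string comparison of zero-padded `HH:MM:SS` values stands in for the question's `makeTime`.

```python
import sqlite3

def time_between_sql(conn, t1, t2):
    """Hypothetical helper mirroring the resolver's SQL branch.

    Returns Time values whose time of day lies in [t1, t2]; when t1 is not
    earlier than t2 it returns those outside the open interval (t2, t1).
    """
    # Zero-padded 'HH:MM:SS' strings compare in chronological order.
    op = "AND" if t2 > t1 else "OR"
    # op comes from a fixed pair of keywords; the times are bound as parameters.
    sql = ("SELECT Time FROM reading "
           "WHERE time(Time) >= ? " + op + " time(Time) <= ? ORDER BY Time")
    return [row[0] for row in conn.execute(sql, (t1, t2))]

# Invented sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (Time TEXT)")
conn.executemany("INSERT INTO reading VALUES (?)", [
    ("2013-02-19 10:55:00",), ("2013-02-19 11:25:00",),
    ("2013-02-19 15:00:00",), ("2013-02-19 22:10:00",),
])

print(time_between_sql(conn, "11:00:00", "15:00:00"))  # inside the window
print(time_between_sql(conn, "15:00:00", "11:00:00"))  # wrap-around window
```

With these rows, the first call keeps 11:25 and 15:00. The second call keeps everything except 11:25, because the endpoints are inclusive in both modes.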

This works fine in the case where t1 and t2 are bound to specific values:

:> easterly(X) <= (Reading.WindDirection[X] == 'East')
:> + rideAfter('11:00:00')
:> + rideBefore('15:00:00')
:> goodTime(X) <= rideAfter(Y) & rideBefore(Z) & Reading.timeBetween(X, Y, Z)
:> goodTime(X)
[(2013-02-19 11:25:00,), (2013-02-19 12:45:00,), (2013-02-19 12:50:00,), (2013-02-19 13:25:00,), (2013-02-19 14:30:00,), (2013-02-19 15:00:00,), (2013-02-19 13:35:00,), (2013-02-19 13:50:00,), (2013-02-19 12:20:00,), (2013-02-19 12:35:00,), (2013-02-19 14:05:00,), (2013-02-19 11:20:00,), (2013-02-19 11:50:00,), (2013-02-19 13:15:00,), (2013-02-19 14:55:00,), (2013-02-19 12:00:00,), (2013-02-19 13:00:00,), (2013-02-19 14:20:00,), (2013-02-19 14:15:00,), (2013-02-19 13:10:00,), (2013-02-19 12:10:00,), (2013-02-19 14:45:00,), (2013-02-19 14:35:00,), (2013-02-19 13:20:00,), (2013-02-19 11:10:00,), (2013-02-19 13:05:00,), (2013-02-19 12:55:00,), (2013-02-19 14:10:00,), (2013-02-19 13:45:00,), (2013-02-19 13:55:00,), (2013-02-19 11:05:00,), (2013-02-19 12:25:00,), (2013-02-19 14:00:00,), (2013-02-19 12:05:00,), (2013-02-19 12:40:00,), (2013-02-19 14:40:00,), (2013-02-19 11:00:00,), (2013-02-19 11:15:00,), (2013-02-19 11:30:00,), (2013-02-19 11:45:00,), (2013-02-19 13:40:00,), (2013-02-19 11:55:00,), (2013-02-19 14:25:00,), (2013-02-19 13:30:00,), (2013-02-19 12:30:00,), (2013-02-19 12:15:00,), (2013-02-19 11:40:00,), (2013-02-19 14:50:00,), (2013-02-19 11:35:00,)]

However, if I declare an equivalent rule with the literals in the other order (i.e. so that Y and Z are still unbound when the engine tries to resolve timeBetween), the query returns an empty set:

:> atoms('niceTime')
:> niceTime(X) <= Reading.timeBetween(X, Y, Z) & rideAfter(Y) & rideBefore(Z)
<pyDatalog.pyEngine.Clause object at 0x0adfa510>
:> niceTime(X)
[]

This seems wrong: the two rules should return the same set of results.

Is there a way to handle this situation in pyDatalog? I think the timeBetween predicate needs some way to tell the engine to back off and resolve the other literals before trying this one again, but I can't find any reference to this in the docs.

Solution

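The pyDatalog reference says: "although the order of pyDatalog statements is indifferent, the order of literals within a body is significant". pyDatalog resolves the literals in a body in the order they are written. In niceTime, Reading.timeBetween(X, Y, Z) is therefore called while Y and Z are still unbound; your resolver yields nothing in that case, so the whole body fails. Keeping the order used in goodTime avoids the problem.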
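To see why the order matters, here is a toy model. It is not pyDatalog's real engine. In this model each literal is a generator over variable bindings, and a rule body is solved strictly left to right. The hypothetical `time_between` goal copies the question's behaviour of yielding nothing when Y or Z is unbound. `fact`, `solve_body` and the sample times are likewise made up for illustration.

```python
from typing import Callable, Dict, Iterator, List

Bindings = Dict[str, object]
# A goal maps the current bindings to zero or more extended bindings.
Goal = Callable[[Bindings], Iterator[Bindings]]

def fact(var: str, value: object) -> Goal:
    """Like '+ rideAfter(...)': binds var, or checks an existing binding."""
    def solve(b: Bindings) -> Iterator[Bindings]:
        if var not in b:
            yield {**b, var: value}
        elif b[var] == value:
            yield b
    return solve

def time_between(readings: List[str]) -> Goal:
    """Mimics the question's resolver: no answers unless Y and Z are bound."""
    def solve(b: Bindings) -> Iterator[Bindings]:
        if "Y" not in b or "Z" not in b:
            return  # the unbound case simply produces nothing
        for r in readings:
            if b["Y"] <= r <= b["Z"] and b.get("X", r) == r:
                yield {**b, "X": r}
    return solve

def solve_body(body: List[Goal], b: Bindings) -> Iterator[Bindings]:
    """Resolve the goals of a body strictly in the order they are written."""
    if not body:
        yield b
        return
    for b2 in body[0](b):
        yield from solve_body(body[1:], b2)

readings = ["10:55:00", "11:25:00", "14:30:00"]
after, before = fact("Y", "11:00:00"), fact("Z", "15:00:00")
tb = time_between(readings)

good = [b["X"] for b in solve_body([after, before, tb], {})]  # goodTime order
nice = [b["X"] for b in solve_body([tb, after, before], {})]  # niceTime order
print(good, nice)
```

The goodTime order finds 11:25 and 14:30. The niceTime order finds nothing, because the first goal fails before Y and Z are ever bound.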

Having said that, it would be possible to improve pyDatalog to resolve predicates with bound variables first, but I'm not sure why this would be important.
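One way to read "resolve predicates with bound variables first" is as a reordering step that runs before evaluation. The sketch below assumes each literal declares which variables it needs bound (`needs`) and which it binds (`binds`). `Lit` and `schedule` are hypothetical names, not part of pyDatalog. The scheduler greedily picks the first literal whose inputs are already bound and otherwise keeps the written order.

```python
from typing import FrozenSet, List, NamedTuple

class Lit(NamedTuple):
    name: str
    needs: FrozenSet[str]  # variables that must be bound before the call
    binds: FrozenSet[str]  # variables the literal binds when it succeeds

def schedule(body: List[Lit], bound: FrozenSet[str] = frozenset()) -> List[Lit]:
    """Greedy reordering: run the first literal whose inputs are all bound."""
    known = set(bound)
    pending = list(body)
    order: List[Lit] = []
    while pending:
        for lit in pending:
            if lit.needs <= known:
                break
        else:
            raise ValueError("no literal can run: some inputs stay unbound")
        pending.remove(lit)
        order.append(lit)
        known |= lit.binds
    return order

# The niceTime body, annotated with the assumed needs/binds sets.
nice_body = [
    Lit("timeBetween", frozenset({"Y", "Z"}), frozenset({"X"})),
    Lit("rideAfter", frozenset(), frozenset({"Y"})),
    Lit("rideBefore", frozenset(), frozenset({"Z"})),
]
print([lit.name for lit in schedule(nice_body)])
```

Applied to the niceTime body, the scheduler recovers the goodTime order. A real engine would also need each predicate's binding requirements declared somewhere, which pyDatalog custom resolvers do not currently state.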

Licensed under: CC-BY-SA with attribution
