Question

I need the author name and last commit time for a specified file using Python. Currently, I'm trying to use dulwich.

There are plenty of APIs to retrieve objects for a specific SHA, like:

repo = Repo("myrepo")
head = repo.head()
object = repo.get_object(head)
author = object.author
time = object.commit_time

But how do I find the most recent commit for a specific file? Is there a way to retrieve it like:

repo = Repo("myrepo")
commit = repo.get_commit('a.txt')
author = commit.author
time = commit.commit_time

or

repo = Repo("myrepo")
sha = repo.get_sha_for('a.txt')
object = repo.get_object(sha)
author = object.author
time = object.commit_time

Thank you.


Solution 2

Something like this seems to work:

from dulwich import repo, diff_tree

fn = b'a.txt'  # Dulwich tree paths are bytes
r = repo.Repo('.')
prev_id = prev_tree = res = None
walker = r.get_graph_walker()
cset = walker.next()
while cset is not None:
    commit = r.get_object(cset)
    if prev_tree is not None:
        # Diff the newer tree against this older one. A hit means the
        # newer commit (prev_id) is the one that touched the file.
        for x in diff_tree.tree_changes(r.object_store, prev_tree, commit.tree):
            if fn in (x.old.path, x.new.path):
                res = prev_id
                break
        if res:
            break
    prev_id, prev_tree = cset, commit.tree
    cset = walker.next()

print(fn, res)

OTHER TIPS

A shorter example, using Repo.get_walker:

import time
from dulwich.repo import Repo

r = Repo(".")
p = b"the/file/to/look/for"  # paths are bytes in Dulwich

w = r.get_walker(paths=[p], max_entries=1)
try:
    c = next(iter(w)).commit
except StopIteration:
    print("No file %s anywhere in history." % p.decode())
else:
    print("%s was last changed at %s by %s (commit %s)" % (
        p.decode(), time.ctime(c.author_time), c.author.decode(), c.id.decode()))
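
Since the question asks for the author name and a commit time, it may help to post-process the raw commit fields. As far as I know, Dulwich stores the identity as bytes in the form "Name <email>" and the time as epoch seconds with a separate UTC offset in seconds (c.commit_time and c.commit_timezone, or the author_* equivalents). The helper below is only a sketch: describe_commit is a made-up name, and the sample values stand in for a real commit.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parseaddr


def describe_commit(author: bytes, timestamp: int, tz_offset: int = 0):
    """Return (author_name, timezone-aware datetime) from raw commit fields.

    author    -- raw identity bytes such as b"Jane Doe <jane@example.com>"
    timestamp -- seconds since the epoch (e.g. c.commit_time)
    tz_offset -- offset from UTC in seconds (e.g. c.commit_timezone)
    """
    # Split "Name <email>" into its parts; only the name is kept here.
    name, _email = parseaddr(author.decode("utf-8", errors="replace"))
    tz = timezone(timedelta(seconds=tz_offset))
    return name, datetime.fromtimestamp(timestamp, tz)


# Hypothetical values standing in for c.author, c.commit_time, c.commit_timezone
name, when = describe_commit(b"Jane Doe <jane@example.com>", 0, 3600)
print(name, when.isoformat())
```

With a real commit c from the walker, the call would be describe_commit(c.author, c.commit_time, c.commit_timezone).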

An answer updated for Python 3.10 and Dulwich 0.20.32 or later, based on the answer by @jelmer:

from dulwich import repo
import datetime

r = repo.Repo("path/to/repo")
p = b"relative/path/to/file/in/repo" # Must be bytes not string
w = r.get_walker(paths=[p], max_entries=1)
l = list(w)
if l:
    c = l[0].commit
    when = datetime.datetime.fromtimestamp(c.author_time)
    print(f"{p} last modified {when} by {c.author} in {c.id}")
else:
    print(f"No file called {p} found in repo")

However, I find this quite slow (0.688 seconds on my test repo); the equivalent using GitPython is faster for me (0.453 seconds):

import git # import of GitPython

repo = git.repo.Repo("path/to/repo")
p = "relative/path/to/file/in/repo" # Note string rather than bytes
walker = repo.iter_commits(paths=[p, ], max_count=1)
l = list(walker)
if l:
    c = l[0]
    print(f"{p} last modified {c.authored_datetime} by {c.author} in {c}")
else:
    print(f"No file called {p} found in repo")
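
Timings like these depend heavily on the repository and the machine, so it is worth measuring on your own data. A minimal sketch of one way to do that, assuming each lookup is wrapped in a zero-argument function (best_of is a hypothetical helper, and the lambda is just a placeholder workload):

```python
import time


def best_of(func, repeats=5):
    """Call func() several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder workload; swap in the Dulwich or GitPython lookup to compare them.
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed:.6f} s")
```

Taking the minimum of several runs reduces noise from disk caching and other processes.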
Licensed under: CC-BY-SA with attribution