Question

Basically I want to get the number of lines-of-code in the repository after each commit.

The only (really crappy) ways I have found are to use git filter-branch to run wc -l *, or a script that runs git reset --hard on each commit and then runs wc -l.

To make it a bit clearer: when the tool is run, it should output the lines of code of the very first commit, then of the second, and so on. For example, this is the output I want:

me@something:~/$ gitsloc --branch master
10
48
153
450
1734
1542
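
Producing output like the above does not require checking out each commit. The following is a minimal sketch in Python (the language of the script in the answers below), assuming git is on the PATH and the script runs inside the repository. `git rev-list --reverse` lists the branch's commits oldest first, and `git grep -I -c '' <commit>` counts the lines of every non-binary file in that commit's tree (an empty pattern matches every line; `-I` skips binary files). The function names are hypothetical.

```python
import subprocess


def count_lines_in_grep_output(output):
    """Sum the per-file counts printed by `git grep -c`.

    Each line looks like '<commit>:<path>:<count>'; rsplit keeps paths that
    themselves contain ':' intact.
    """
    return sum(int(line.rsplit(':', 1)[1]) for line in output.splitlines() if line)


def loc_at(commit):
    """Count the lines in all non-binary files of `commit` without checking it out."""
    # An empty pattern matches every line and -I skips binary files.
    # git grep exits with status 1 when nothing matches (e.g. an empty tree).
    result = subprocess.run(['git', 'grep', '-I', '-c', '', commit],
                            stdout=subprocess.PIPE, universal_newlines=True)
    if result.returncode not in (0, 1):
        raise RuntimeError('git grep failed for ' + commit)
    return count_lines_in_grep_output(result.stdout)


def loc_per_commit(branch='master'):
    """Yield (commit, line count) for every commit on `branch`, oldest first."""
    revs = subprocess.run(['git', 'rev-list', '--reverse', branch],
                          stdout=subprocess.PIPE, universal_newlines=True,
                          check=True).stdout.split()
    for commit in revs:
        yield commit, loc_at(commit)
```

For example, `for commit, loc in loc_per_commit('master'): print(loc)` prints one count per line, like the sample output. It counts every text line, blank lines and comments included, much as wc -l would.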

I've played around with the Ruby 'git' library, but the closest I got was the .lines() method on a diff, which seems like it should give the added lines but does not (for example, it returns 0 when you delete lines):

require 'rubygems'
require 'git'

total = 0
g = Git.open('/Users/dbr/Desktop/code_projects/tvdb_api')

last = nil
g.log.each do |cur|
  diff = g.diff(last, cur)
  total += diff.lines
  puts total
  last = cur
end
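
The diff-based loop above can work if it sums the net change (insertions minus deletions) between consecutive commits rather than a count of changed lines. Below is a minimal sketch in Python, assuming git is on the PATH; `_git`, `net_lines` and `running_loc` are hypothetical names. Each commit is diffed against the previous one, starting from the empty tree, so the differences telescope: the running total after each commit should equal the number of text lines in that commit's tree, whatever order rev-list emits. Binary files, which `--numstat` reports as `-`, are skipped.

```python
import subprocess


def _git(*args, stdin=None):
    # Run a git command in the current directory and return its stdout as text.
    return subprocess.run(('git',) + args, input=stdin, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True).stdout


def net_lines(numstat_output):
    """Sum (added - deleted) over `git diff --numstat` output.

    Each line is '<added>\t<deleted>\t<path>'; binary files are reported as
    '-\t-\t<path>' and are skipped.
    """
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split('\t', 2)
        if len(parts) < 3 or parts[0] == '-':
            continue
        total += int(parts[0]) - int(parts[1])
    return total


def running_loc(branch='master'):
    """Yield (commit, line count) for each commit on `branch`, oldest first."""
    # Hash of the empty tree, so the first commit is diffed against "nothing".
    prev = _git('hash-object', '-t', 'tree', '--stdin', stdin='').strip()
    total = 0
    for commit in _git('rev-list', '--reverse', branch).split():
        total += net_lines(_git('diff', '--numstat', prev, commit))
        yield commit, total
        prev = commit
```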

Solution

You might also consider gitstats, which generates a graph of lines of code over time as part of an HTML report.

OTHER TIPS

You can get the number of added and removed lines for each commit from git log, like this:

git log --shortstat --reverse --pretty=oneline

From this output you can write a script similar to yours. In Python:

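#!/usr/bin/env python3

"""
Display the cumulative size of the current git branch after each commit.
"""

import re
import subprocess
import sys

# git omits the insertions or deletions part when it is zero, and uses the
# singular "file"/"insertion"/"deletion" when the count is 1, so match each
# part separately instead of with one fixed pattern.
FILES_RE = re.compile(r'(\d+) files? changed')
INS_RE = re.compile(r'(\d+) insertions?\(\+\)')
DEL_RE = re.compile(r'(\d+) deletions?\(-\)')


def count(pattern, line):
  m = pattern.search(line)
  return int(m.group(1)) if m else 0


def main(argv):
  out = subprocess.run(["git", "log", "--shortstat", "--reverse",
                        "--pretty=oneline"], stdout=subprocess.PIPE,
                       universal_newlines=True, check=True).stdout
  total_files, total_insertions, total_deletions = 0, 0, 0
  commit = None
  for line in out.splitlines():
    if not line:
      continue
    if not line.startswith(' '):
      # This is a description line: "<hash> <subject>"
      commit = line.split(' ', 1)[0]
    else:
      # This is a stat line, e.g. " 2 files changed, 10 insertions(+), 3 deletions(-)"
      # Note: total_files is a running total of files *changed*, not the
      # number of files in the tree.
      total_files += count(FILES_RE, line)
      total_insertions += count(INS_RE, line)
      total_deletions += count(DEL_RE, line)
      print("%s: %d files, %d lines" % (commit, total_files,
                                        total_insertions - total_deletions))
  return 0


if __name__ == '__main__':
  sys.exit(main(sys.argv))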

http://github.com/ITikhonov/git-loc worked right out of the box for me.

The first thing that comes to mind is that your git history may be nonlinear (it may contain merges), in which case it can be difficult to determine a sensible sequence of commits.
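
One common way to get a single sequence from a nonlinear history is to follow only the first parent of each merge, which is what `git rev-list --first-parent --reverse master` does. The sketch below shows the idea on an in-memory graph; the `parents` mapping (commit id to its parent ids, first parent first) and the function name are hypothetical, and the output of `git rev-list --parents` is one way to build such a mapping.

```python
def first_parent_chain(parents, head):
    """Return commits from the root to `head`, following only first parents.

    `parents` maps each commit id to the list of its parent ids, first parent
    first; a root commit maps to an empty list (or is absent).
    """
    chain = []
    commit = head
    while commit is not None:
        chain.append(commit)
        ps = parents.get(commit, [])
        # At a merge, ignore the side branch and keep walking the first parent.
        commit = ps[0] if ps else None
    chain.reverse()  # oldest first, like --reverse
    return chain
```

Commits that only exist on merged side branches are skipped, so each merge counts as one step in the sequence.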

Having said that, it seems like you could keep a log of commit IDs and the number of lines of code at each commit. In a post-commit hook, start from the HEAD revision and work backwards (following every parent of a merge) until all paths reach a commit you've already seen. That should give you the total lines of code for each commit ID.
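
The hook idea can be sketched as a memoized walk. This is a minimal sketch under stated assumptions: `parents` maps each commit id to its parent ids, `cache` holds the line counts already recorded (persisting it between hook runs, for example in a file, is left out), and `count_loc` is any function returning the line count of one commit; all of these names are hypothetical.

```python
def fill_loc_cache(parents, head, cache, count_loc):
    """Walk back from `head` until every path reaches a commit already in
    `cache`, then record count_loc(commit) for each newly visited commit.

    Returns the list of commits that were added to the cache.
    """
    stack = [head]
    visited = set()
    new = []
    while stack:
        commit = stack.pop()
        # Stop this path at commits we already know about.
        if commit in cache or commit in visited:
            continue
        visited.add(commit)
        new.append(commit)
        # Follow every parent, so both sides of a merge are covered.
        stack.extend(parents.get(commit, []))
    for commit in new:
        cache[commit] = count_loc(commit)
    return new
```

Because the walk stops at cached commits, each hook run only does work proportional to the commits added since the previous run.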

Does that help any? I have a feeling that I've misunderstood something about your question.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow