A Git commit SHA is generated from the following information:
- commit message
- author signature (identity + timestamp)
- committer signature (identity + timestamp)
- tree sha (hierarchy of directories and files within the commit)
- list of the shas of the parent commits
Since the SHAs are different, at least one of these pieces of information must differ between the commits.
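To make the list above concrete, here is a minimal Python sketch of how such a hash can be derived. It assumes the plain commit layout (tree, parents, author, committer, a blank line, then the message) with no extra headers such as `gpgsig` or `encoding`, and SHA-1 object ids. The function names `git_object_id` and `commit_id`, and the identity used in the example, are made up for illustration and are not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 over '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list[str], author: str,
              committer: str, message: str) -> str:
    """Build the raw text of a commit object and hash it.

    Assumption: no optional headers (gpgsig, encoding, ...) are present.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode("utf-8"))


# Hypothetical commit data: same tree, same message, same author.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "Jane Doe <jane@example.com>"

first = commit_id(EMPTY_TREE, [], f"{who} 1371063948 +0200",
                  f"{who} 1371063948 +0200", "Initial commit\n")
# Only the committer timestamp differs (as after an amend).
second = commit_id(EMPTY_TREE, [], f"{who} 1371063948 +0200",
                   f"{who} 1371064000 +0200", "Initial commit\n")

print(first)
print(second)
print(first != second)
```

Both commits point at the same tree, yet their ids differ because a single metadata field changed.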
To see what these values are for each commit (and how they differ from one another), you can run the following command to get the raw output of each commit:
$ git show --format=raw <commit_sha>
Example of the output of this command
Based on a random commit of the libgit2 project
$ git show --format=raw eb58e2d
commit eb58e2d0be4e07c2ef873a5f0562eaa90826c2de
tree 41959050b1e3adb428e140102a0c321949be516b
parent 3b5001b4c911db9c47d62399c1adc03bd8a3ca72
parent 3e9e6cdaff8acb11399736abbf793bf2d000d037
author Vicent Marti <tanoku@gmail.com> 1371063948 +0200
committer Vicent Marti <tanoku@gmail.com> 1371063948 +0200
Merge remote-tracking branch 'arrbee/minor-paranoia' into development
diff --cc src/refdb.c
index 359842e,4271b58..6da409a
--- a/src/refdb.c
+++ b/src/refdb.c
@@@ -86,9 -86,10 +86,10 @@@ int git_refdb_compress(git_refdb *db
return 0;
}
-static void refdb_free(git_refdb *db)
+void git_refdb__free(git_refdb *db)
{
refdb_free_backend(db);
+ git__memset(db, 0, sizeof(*db));
git__free(db);
}
Back to your questions
I get zero output - doesn't this mean the commits are identical
This means that the content the commits point at is the same. The metadata (message, author, committer, parents) may still differ.
Maybe I mis-understand, but shouldn't an SHA-1 hash directly represent content in a file?
In Git, SHA-1 hashes are used to identify Git objects: blobs (i.e. file contents), trees (i.e. lists of blobs and subtrees) and commits. You can find more information about this in chapter 9.2, Git Internals - Git Objects, of the Pro Git book.
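As a small illustration of the blob case, the following Python sketch hashes file content the way Git does for a blob: the SHA-1 is taken over a short header (`blob <size>` followed by a NUL byte) plus the content, not over the bare file bytes. The helper name `blob_id` is made up for this example, and it assumes a repository using SHA-1 object ids.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the id Git assigns to a blob holding `content`."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known id.
print(blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
# Same value as: echo 'test content' | git hash-object --stdin
print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

This is why a commit hash cannot be read as a hash of any single file: the file contents only enter it indirectly, through the tree the commit points at.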
For example, in my log I found the following quadruple
This may happen when you amend, rebase, or fix up your commits, for instance. In these cases, the commit date may be the only thing that changes.
In any case, I am wondering if it is wise / unwise to attempt to filter such apparent duplicates
You don't have to clean up by yourself. Those objects are stored in the Git object database. Git implements a garbage collection mechanism which regularly and automatically removes orphaned (unreachable) objects from it (see the git-gc documentation for more details).