Git list all files modified (not added) since a specific commit INCLUDING ones that were added and later modified

StackOverflow https://stackoverflow.com/questions/23311187

Question

I recently started working on a project with a huge code base. I decided to create a local git repo to keep track of all my changes. Rather than downloading all the project's existing files and adding them to git, I only downloaded the ones I needed. As I needed more files, I downloaded them and added them to git.

Now the client wants me to provide a list of all files that I've changed since a particular commit.

git diff --diff-filter=M --name-only $last_deploy_commit_id

gives only the modified files that existed at that commit.

git diff --diff-filter=A --name-only $last_deploy_commit_id

lists all files added since that commit but not (necessarily) modified later on.

git diff --diff-filter=AM --name-only $last_deploy_commit_id

lists all files added OR modified since that commit.

What I want is to have a list of all files that

  • Either, already existed and were modified since that commit
  • Or, didn't exist at that commit, were created AND were later modified, both since that commit.

Is there a way to do this? I'm on Windows, if that helps. I'm open to using some PowerShell based script if need be.


Solution

You can pass the --name-status flag to git log to do this, along with a commit range <commit>^..HEAD:

$ git log --oneline 70f5c30^..HEAD --name-status
7f6aafa Add poopoo
A       poopoo.txt
1d961ae Add hello and goodbye
M       blar.txt
M       rawr.txt
0a1acf9 Add rawr
A       rawr.txt
70f5c30 Add blar moo and I'LL BE BACK!
M       README.md
A       blar.txt

The commit range <commit>^..HEAD uses an exclusive starting point, meaning that it's not included, so you have to use the parent of <commit>, which is <commit>^. See Pro Git: Commit Ranges.

NOTE: git log is a porcelain command, meaning that it's not guaranteed to be backwards compatible in future versions of Git. Normally, if you want to use the output of Git commands in a script, you'd use one of the plumbing commands instead. But since this seems to be a one-time use thing, using git log just this once for this purpose seems like a reasonable solution.
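
If the list does end up in a script, the same information can probably be gathered with plumbing commands instead. The following is only a sketch under a few assumptions: `git rev-list` enumerates the commits in the range, `git diff-tree --no-commit-id --name-status -r` prints the status lines for each one, and the helper names `parse_name_status` and `changes_since` are made up for this illustration. Merge commits and renames are not handled specially here.

```ruby
require 'open3'

# Parse `git diff-tree --name-status` output into [status, path] pairs.
# Each non-empty line is expected to look like "M\tpath/to/file".
def parse_name_status(text)
  text.each_line.map(&:chomp).reject(&:empty?).map do |line|
    status, path = line.split("\t", 2)
    [status, path]
  end
end

# Hypothetical helper: collect [status, path] pairs for every commit in
# <start_commit>^..HEAD, using plumbing commands only.
def changes_since(start_commit)
  shas, list_result = Open3.capture2('git', 'rev-list', "#{start_commit}^..HEAD")
  raise 'git rev-list failed' unless list_result.success?

  shas.split.flat_map do |sha|
    # --root makes diff-tree also report files for a commit without a parent.
    out, diff_result = Open3.capture2('git', 'diff-tree', '--no-commit-id',
                                      '--name-status', '-r', '--root', sha)
    raise "git diff-tree failed for #{sha}" unless diff_result.success?

    parse_name_status(out)
  end
end
```

Calling `changes_since('70f5c30')` inside the repository should then return pairs such as `["M", "blar.txt"]`, which can be filtered the same way as the git log output.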

Filtering Out Added but Un-modified Files

After getting the output above, you could then grep (Select-String is the closest Windows PowerShell equivalent) for lines that start with M or A and sort them, then filter out filenames that have a line for A but no line for M.

I don't want to spend the time to learn enough PowerShell in order to do this, but here's how you could filter the results if you were using a Unix environment with Ruby:

$ git log --oneline <commit>^..HEAD --name-status | \
    grep --extended-regexp "^(A|M)" | \
    ruby ~/Desktop/stackoverflow-answer.rb

where stackoverflow-answer.rb contains the following:

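# Group the status letters (A or M) seen for each filename, then drop files
# whose only status is a single 'A' (added but never modified afterwards).
x = ARGF.map { |line| line.split("\t").map(&:chomp) }
        .each_with_object({}) do |parts, hash|
          if hash[parts.last]
            hash[parts.last] << parts.first
          else
            hash[parts.last] = [parts.first]
          end
        end
        .reject { |k,v| v.size == 1 && v.first == 'A' }
        .keys
puts x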
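
Since the question mentions Windows, where grep may not be available, here is a possible variant that does the line filtering in Ruby itself. It is a sketch, not a tested drop-in replacement: the method name `modified_files` is hypothetical, and it assumes the status letter and the filename are separated by a tab, as in the git log output above.

```ruby
# Return the paths that should be reported as modified: every path seen with
# an A or M status, except those whose only entry is a single 'A'.
def modified_files(log_output)
  statuses = Hash.new { |hash, path| hash[path] = [] }

  log_output.each_line do |line|
    # Status lines look like "A\tfile" or "M\tfile". Commit subject lines
    # and other statuses (for example D) do not match and are skipped.
    match = line.chomp.match(/\A([AM])\t(.+)\z/)
    statuses[match[2]] << match[1] if match
  end

  statuses.reject { |_path, seen| seen == ['A'] }.keys
end
```

For instance, ending the script with `puts modified_files(ARGF.read)` would let you pipe the git log output straight into it, without the grep step.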

OTHER TIPS

Warning: a PowerShell command won't work in a root Git repo folder which has been renamed.
With Git 2.20 (Q4 2018), the way the Windows port figures out the current directory has been improved.

See commit 4745fee (23 Oct 2018) by Anton Serbulov (skvoboo).
See commit 937974f (23 Oct 2018) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit cc67487, 30 Oct 2018)

mingw: fix getcwd when the parent directory cannot be queried

GetLongPathName() function may fail when it is unable to query the parent directory of a path component to determine the long name for that component. It happens, because it uses FindFirstFile() function for each next short part of path. The FindFirstFile() requires List Directory and Synchronize desired access for a calling process.

In case of lacking such permission for some part of path, the GetLongPathName() returns 0 as result and GetLastError() returns ERROR_ACCESS_DENIED.

GetFinalPathNameByHandle() function can help in such cases, because it requires Read Attributes and Synchronize desired access to the target path only.

The GetFinalPathNameByHandle() function was introduced on Windows Server 2008/Windows Vista. So we need to load it dynamically.

That will help when doing a git log in a PowerShell session:

mingw: ensure getcwd() reports the correct case

When switching the current working directory, say, in PowerShell, it is quite possible to use a different capitalization than the one that is recorded on disk.
While doing the same in cmd.exe adjusts the capitalization magically, that does not happen in PowerShell so that getcwd() returns the current directory in a different way than is recorded on disk.

Typically this creates no problems except when you call:

git log .

in a subdirectory called, say, "GIT/" but you switched to "Git/" and your getcwd() reports the latter, then Git won't understand that you wanted to see the history as per the GIT/ subdirectory but it thinks you wanted to see the history of some directory that may have existed in the past (but actually never did).

Licensed under: CC-BY-SA with attribution