Is using Git Stash as a workflow an antipattern?

https://softwareengineering.stackexchange.com/questions/255165

05-10-2020

Question

I've recently been looking at how my team and I use Git and how our workflows work. We currently use a feature-branch workflow, which seems to work well.

I've also seen some individuals on our team use a workflow based on git stash. The workflow goes something like this:

1. Work on a main branch (like master)
2. Make commits as you go
3. If you need to get changes or switch branches, push your uncommitted changes onto the stash
4. Once you're done updating, pop the changes off the stash

I should mention that this workflow is used instead of a feature branch workflow. Instead of taking a branch and working on it, developers here only ever work on a single branch and push/pop off the stack as they see fit.

I actually don't think this is a great workflow, and branching would be more appropriate than using git stash in this way. I can see the value of git stash as an emergency operation, but not for using it in a daily, regular workflow.

Would using git stash regularly be considered an anti-pattern? If so, what are some specific problems that could arise? If not, what are the benefits?

Solution

From the Git SCM Book:

Often, when you’ve been working on part of your project, things are in a messy state and you want to switch branches for a bit to work on something else. The problem is, you don’t want to do a commit of half-done work just so you can get back to this point later. The answer to this issue is the git stash command.

Stashing takes the dirty state of your working directory — that is, your modified tracked files and staged changes — and saves it on a stack of unfinished changes that you can reapply at any time.

Given this description, I would say this is an Anti Pattern. An overly simplified explanation of Git Stash would be that it is the "Cut and Paste" of source control. You take a bunch of changed files, "stash" them away in a holding pen outside of Git's normal branching workflow, and then reapply those changes to a different branch at a later date.

Going back a little further, committing to master is the anti pattern here. Use branches. That's what they were designed for.

It really boils down to this:

You can hammer a screw into the wall and it will hold up a picture, but using a screwdriver is what you should do. Don't use a hammer when the screwdriver is sitting right beside you.

About Committing "Broken" Code

While the following is opinion, I have come to this opinion from experience.

Commit early, and commit often. Commit as much broken code as you want. View your local commit history as "save points" while you hack away at something. Once you've done a logical piece of work, make a commit. Sure it might break everything, but that doesn't matter as long as you don't push those commits. Before pushing, rebase and squash your commits.

1. Create new branch
2. Hack hack hack
3. Commit broken code
4. Polish the code and make it work
5. Commit working code
6. Rebase and Squash
7. Test
8. Push when tests are passing
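The steps above can be sketched as a short shell session. This is only an illustration in a throwaway repository: the branch name `feature/login`, the file `app.txt`, and the commit messages are hypothetical, and `git reset --soft` is used here as a non-interactive stand-in for squashing with an interactive rebase. There is no remote in the sketch, so the final push is left as a comment.

```shell
set -eu

# Throwaway repository so the sketch never touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Step 1: create a new branch (hypothetical name).
git checkout -q -b feature/login

# Steps 2-3: hack, then commit broken code as a local save point.
echo "half-done" >> app.txt
git commit -q -am "WIP: broken login"

# Steps 4-5: polish the code, then commit the working version.
echo "done" >> app.txt
git commit -q -am "Finish login"

# Step 6: squash the local commits into one.
# Moves the branch pointer back to master but keeps all changes staged.
git reset -q --soft master
git commit -q -m "Add login feature"

# Steps 7-8: run the tests here, then push once they pass, e.g.
#   git push -u origin feature/login
git log --oneline master..HEAD
```

Because the intermediate commits were never pushed, collapsing them affects nobody else's history.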

For the OP, this Linux kernel message thread might be of interest, because it sounds like some members of the OP's team are using Git in a similar manner.

@RibaldEddie said in a comment below:

First of all, a stash is not outside of a "branching workflow" since under the hood a stash is just another branch.

(at the risk of incurring the wrath of many people)

Linus said:

With "git stash", you can have multiple different stashed things too, but they don't queue up on each other - they are just random independent patches that you've stashed away because they were inconvenient at some point.

What I think @RibaldEddie is trying to say is that you can use git stash in a feature branch workflow -- and this is true. It's not the use of git stash that is the problem. It is the combination of committing to master and using git stash. This is an anti pattern.
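For readers curious about the "under the hood" remark, a stash entry can be inspected like any other commit. The sketch below uses a throwaway repository with a hypothetical file `app.txt`; it shows that `refs/stash` points at an ordinary commit object whose first parent is the commit that was checked out when the stash was made.

```shell
set -eu

# Throwaway repository so the sketch never touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Leave some uncommitted work and stash it.
echo "uncommitted" >> app.txt
git stash push -q -m "half-done work"

# The stash is stored as a commit object referenced by refs/stash.
git cat-file -t refs/stash

# Its first parent is the commit that HEAD pointed at when stashing.
git rev-parse "stash@{0}^1" HEAD

# The stash can be browsed with the usual history tools.
git log --oneline --graph "stash@{0}"
```

So a stash lives inside Git's object model, but it is tracked through `refs/stash` and its reflog rather than through a named branch.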

Clarifying `git rebase`

From @RibaldEddie's comment:

Rebasing is much more like copy-pasting and even worse modifies committed history.

(Emphasis mine)

Modifying commit history is not a bad thing, as long as it is local commit history. If you rebase commits that you've already pushed, you'll essentially orphan anyone else using your branch. This is bad.

Now, say you've made several commits during the course of a day. Some commits were good. Some... not so good. The git rebase command in conjunction with squashing your commits is a good way to clean up your local commit history. It's nice to merge in one commit to public branches because it keeps the commit history of your team's shared branches clean. After rebasing, you'll want to test again, but if tests pass then you can push one clean commit instead of several dirty ones.
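One way to do this kind of local cleanup without hand-editing a rebase todo list is `git commit --fixup` combined with `git rebase -i --autosquash`. The following is a minimal sketch in a throwaway repository; the branch `feature/report`, the file `report.txt`, and the messages are hypothetical, and `GIT_SEQUENCE_EDITOR=true` simply accepts the todo list that Git generates.

```shell
set -eu

# Throwaway repository so the sketch never touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature/report

# The "good" commit that should survive the cleanup.
echo "report" > report.txt
git add report.txt
git commit -q -m "Add report"
target="$(git rev-parse HEAD)"

# Two "not so good" follow-ups, marked as fixups of that commit.
echo "fix 1" >> report.txt
git commit -q -a --fixup="$target"
echo "fix 2" >> report.txt
git commit -q -a --fixup="$target"

# Fold the fixups into their target; only local, unpushed history changes.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true git rebase -q -i --autosquash master

git log --oneline master..HEAD
```

After the rebase the branch holds a single "Add report" commit containing all three edits, which is the state you would test again before pushing.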

There is another interesting Linux Kernel thread on clean commit history.

Again, from Linus:

I want clean history, but that really means (a) clean and (b) history.

People can (and probably should) rebase their private trees (their own work). That's a cleanup. But never other peoples code. That's a "destroy history"

So the history part is fairly easy. There's only one major rule, and one minor clarification:

You must never EVER destroy other peoples history. You must not rebase commits other people did. Basically, if it doesn't have your sign-off on it, it's off limits: you can't rebase it, because it's not yours.

Notice that this really is about other peoples history, not about other peoples code. If they sent stuff to you as an emailed patch, and you applied it with "git am -s", then it's their code, but it's your history.

So you can go wild on the "git rebase" thing on it, even though you didn't write the code, as long as the commit itself is your private one.

Minor clarification to the rule: once you've published your history in some public site, other people may be using it, and so now it's clearly not your private history any more.

So the minor clarification really is that it's not just about "your commit", it's also about it being private to your tree, and you haven't pushed it out and announced it yet.

...

Now the "clean" part is a bit more subtle, although the first rules are pretty obvious and easy:

Keep your own history readable

Some people do this by just working things out in their head first, and not making mistakes. but that's very rare, and for the rest of us, we use "git rebase" etc while we work on our problems.

So "git rebase" is not wrong. But it's right only if it's YOUR VERY OWN PRIVATE git tree.

Don't expose your crap.

This means: if you're still in the "git rebase" phase, you don't push it out. If it's not ready, you send patches around, or use private git trees (just as a "patch series replacement") that you don't tell the public at large about.

(emphasis mine)

Conclusion

In the end, the OP has some developers doing this:

git checkout master
(edit files)
git commit -am "..."
(edit files)
git stash
git pull
git stash (pop|apply)

There are two problems here:

1. Developers are committing to master. Lock this down immediately. Really, this is the biggest problem.
2. Developers are constantly using git stash and git pull on master when they should be using feature branches.
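How to "lock this down" depends on where the shared repository is hosted, and server-side branch protection is the more reliable route. As a purely local, hypothetical illustration, a `pre-commit` hook can refuse commits made directly on master. The hook body, branch name `feature/guarded`, and file `app.txt` below are assumptions for the sketch, and a local hook can be bypassed, so treat it as a reminder rather than enforcement.

```shell
set -eu

# Throwaway repository so the sketch never touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Hypothetical local guard: reject commits while master is checked out.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
branch="$(git symbolic-ref --short -q HEAD || true)"
if [ "$branch" = "master" ]; then
    echo "Direct commits to master are blocked; use a feature branch." >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

# A commit on master is now rejected by the hook.
echo "change" >> app.txt
if git commit -q -am "Change made directly on master"; then
    echo "unexpected: commit on master succeeded"
else
    echo "commit on master was blocked"
fi

# The same change goes through on a feature branch.
git checkout -q -b feature/guarded
git commit -q -am "Change made on a feature branch"
```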

There is nothing wrong with using git stash -- especially before a pull -- but using git stash in this manner is an anti pattern when there are better workflows in Git.

Their use of git stash is a red herring. It is not the problem. Committing to master is the problem.

OTHER TIPS

I personally only use stash for short, unexpected interruptions, like someone asking a question that requires changing to a different branch. I do this because I have forgotten about stashes before, then they wouldn't apply cleanly. Regular commits on feature branches are much harder to forget about, and easier to merge, so now I tend to just make a broken commit, then do a git reset HEAD~1 or a rebase if I don't want to keep it later.
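The "broken commit, then reset" habit described above might look like the following sketch. It runs in a throwaway repository, and the branch `feature/search`, the file `app.txt`, and the commit messages are hypothetical.

```shell
set -eu

# Throwaway repository so the sketch never touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b feature/search
echo "half-done search" >> app.txt

# Interruption: park the work as a throwaway commit instead of a stash.
git commit -q -am "WIP: half-done search"

# Switch away, deal with the interruption, and come back.
git checkout -q master
git checkout -q feature/search

# Undo the WIP commit but keep its changes in the working tree.
git reset -q HEAD~1

# app.txt shows up as modified again, ready to continue.
git status --short
```

Unlike a stash, the WIP commit sits on the branch it belongs to, so it shows up in `git log` and is hard to forget.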

However, the great thing about distributed version control is people can use their preferred workflow in their own repositories, as long as the shared repositories meet the standards. I would make sure people aren't just using a stash workflow because they don't have enough training or awareness of alternatives, but if they still choose a workflow you find suboptimal, I would leave it be.

I think the part of your question that is an anti-pattern is the use of a single shared master branch. However, if you were to include a develop branch in addition to the master branch and then use stashes to deal with your own context switches in the develop branch, that would not be an anti-pattern, and it very closely mirrors some of the workflows described by organizations like Etsy and Facebook.

That having been said, @Greg Burghardt's answer above is a bit too favourable to the so-called git-flow or feature-branch workflow. I used to advocate for a similar strategy, but after realizing that it adds unnecessary complexity and creates a false sense of security, I no longer do. It's also a holdover from the days of centralized version control systems like Subversion.

Firstly, since Git is a decentralized version control system, unlike Subversion, a developer's local repository is essentially a giant branch of the code in and of itself. What an individual developer does locally doesn't and shouldn't have an impact on the other team members unless broken or buggy code is pushed up to any shared branches in a shared repository.

The rebase command, however, can damage the history of a branch when there is a merge conflict in one of the replayed commits. From http://ryantablada.com/post/the-dangers-of-rebasing-a-branch

The rest of the rebase goes along smoothly, tests all seem to pass. A PR is made.

And then some more code is written that relies on the commentsForAllPosts property and everything is broken. But who do we go and ask for help? git blame shows that line of code has only been written by the server side engineer and he throws up his hands.

Now your front-end engineer is out on vacation, sick leave, or who knows. No one can figure out what that code should look like!

Rebase has killed the team's ability to look at the history to find what went wrong because any merge conflicts on the child branch are killed and the original code is lost forever.

If this same merge conflict came up and merge was used, the blame would show that that line of code had been touched in the merge process, the commit on the parent branch, and the commit on the child branch. Some toying around with the three permutations and you can get the original intent back into the code base and working without a ton of head scratching a finger pointing. And all you really had was another commit

Furthermore, a multiple branching model presupposes that no two branches could ever contain interdependent code changes. When that does inevitably occur, the developer now has to juggle yet more branches to work efficiently.

The fundamental anti-pattern that I see is not related to branches vs. stashes, but rather more about the kinds of problems that some very smart people have been talking about for a while now: do you have confidence in your code via the use of unit tests and a good architecture? Are you able to make incremental changes to your code such that your developers can reason about changes easily and understand what a change will do? Do your developers even run through new code once to see if it actually works? (Yes I have seen this before).

If the answer to those questions is no, then it doesn't really matter how many branches you have-- developers will say that code is ready, working, and fit for production when it really isn't, and no number of branches will help you when that code goes up to production anyway.

git stash is a tool. It in itself is not a pattern, nor an anti-pattern. It is a tool, much like a hammer is a tool. Using a hammer to drive nails is a pattern and using a hammer to drive screws is an anti pattern. Likewise, there are workflows and environments where git stash is the correct tool to use, and workflows and environments where it is wrong.

The 'everyone commit and push to the mainline' workflow is one that works quite reasonably where there are no high-risk changes. It's often seen in svn environments where there is one authoritative central server that has the code.

Git, however, tries to do away with the one central server. Having all the developers doing commit, pull (or rebase if you're into that), push all the time can make a big mess.

The biggest problems come up when you've got something in progress that's broken and you need to work on a priority bug. This means you need to set that work aside for a bit, grab the latest, and work on the bug without the previous work in progress causing issues with the build that you're trying to do.

For this, git stash would be the proper tool to use.

There is, however, a bigger problem lurking at the heart of this workflow. It is that all the roles of version control branches are on a single branch. Mainline, development, maintenance, accumulation, and packaging are all on master. This is a problem. (See Advanced SCM Branching Strategies for more on this approach to branches.)

And yes, you did call out that it's not a great workflow and that there are problems with it. However, the problem isn't with the tool git stash. The problem is the lack of distinct branching for roles or for incompatible policies.

git stash, however, is something I've used when I had built for a bit and got into a sticky state that I wasn't sure was the right one, so I stashed my changes and explored another approach to solving the issue. If that worked, great: discard the stuff on the stash and continue on. If the other exploration got stickier, then reset back to the previous change and reapply the stash to work on that. The alternative would be to commit, checkout, branch, and then either continue on this new branch or go back and reset it. The question is: is it really worth putting that into the history when it's just something I want to explore for a bit?
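That exploratory use of the stash could look roughly like this. The sketch runs in a throwaway repository with a hypothetical file `app.txt`; the labels "approach A" and "approach B" are made up, and the session follows the case where the second approach turns out worse and the first one is restored.

```shell
set -eu

# Throwaway repository so the sketch never touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The first attempt gets sticky: park it on the stash.
echo "approach A" >> app.txt
git stash push -q -m "approach A"

# Explore a second approach from the clean working tree.
echo "approach B" >> app.txt

# If approach B had worked, the stash could simply be discarded:
#   git commit -am "Use approach B" && git stash drop

# Here approach B got stickier: throw it away and bring approach A back.
git reset -q --hard HEAD
git stash pop -q

cat app.txt
```

Neither attempt ever enters the commit history, which is the appeal of the stash for short-lived experiments like this one.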

git stash isn't an anti-pattern. Using git stash as an alternative to branching while everyone commits to master is an anti-pattern, but not because of git stash.

If you haven't hit it yet, just wait: this anti-pattern will catch up with you when you have issues with builds, when someone needs to make significant architectural changes to lots of files (with the resulting merge conflicts), or when some untested work-in-progress code leaks out to production.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange
