Question

My research team and I are using Git to manage our various scripts and notes/publications for our ongoing projects, and I'm wondering what is considered best practice for the situation we are in.

Until now, the scripts we used were fairly small and did not require much computing power. We would usually agree on what a script should do; one or several people coded it, ran it on their personal machines, and pushed the output files to the master branch. We did not follow a specific workflow and mostly used the repo like a Dropbox: since we are a small team, people push to the master branch once they are finished with whatever they are working on so everyone else can see it.

Recently we have needed more computing power and got access to a high-performance machine where we can run our scripts for several days. Our current workflow is as follows: we code the scripts and test them on our personal machines with small runs; once we are ready for an extensive run, we put all the necessary files into a production/ directory and I scp it to the server.

We cannot clone the whole repo on the server because we only have a limited amount of space there, and we are not comfortable copying unpublished results and other internal files onto a machine we share with other people.

We've considered creating another repo just for this, but it does not seem better than what we currently do, since in the end I would still need to copy the files from the main repo into the new one. I have also read about sparse checkouts, but as far as I understand, the whole repository is still downloaded.

Is there a best practice or standard workflow for this kind of thing?


Solution

When I read

We cannot clone the whole repo on the server

the first thing I asked myself was: why would you want to do this? Git is an excellent tool for development, but for deploying programs into a production environment it is never my first choice (though I have seen people try to use, or perhaps abuse, it for this purpose).

once we are ready for an extensive run, we put all the necessary files into a production/ directory and I scp it to the server

If that workflow works well for you, why not simply automate it? For every individually deployable component, write a deploy script that does exactly what you described above: get the latest version out of the repo, put exactly the files needed for deployment (nothing less, nothing more) into a production directory, and then scp those files to a specific location on the server.
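
A minimal sketch of such a deploy script in Python, assuming the files to deploy live under production/ in the repo; the remote host, user, and target path below are placeholders to adapt to your setup:

```python
#!/usr/bin/env python3
"""Sketch of a deploy helper: export only the production files and copy them to the server."""
import subprocess
import tempfile
from pathlib import Path

REMOTE = "user@hpc.example.org:/scratch/myproject"  # hypothetical target, adjust to your server
SUBDIR = "production"                               # directory in the repo that gets deployed

def deploy():
    with tempfile.TemporaryDirectory() as tmp:
        # Export a clean copy of production/ from the latest commit,
        # without the .git history or any other internal files.
        archive = Path(tmp) / "deploy.tar"
        subprocess.run(
            ["git", "archive", "--output", str(archive), "HEAD", SUBDIR],
            check=True,
        )
        subprocess.run(["tar", "-xf", str(archive), "-C", tmp], check=True)

        # Copy only the exported files to the server.
        subprocess.run(["scp", "-r", str(Path(tmp) / SUBDIR), REMOTE], check=True)

if __name__ == "__main__":
    deploy()
```

Using git archive rather than copying the working directory guarantees that only committed, version-controlled files end up on the shared machine.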

If required, the script can automate many more of the gory details of your deployment process, such as backing up the previous version on the server, adjusting the configuration, writing a log file, adding a tag to your Git repo to mark the deployed version, or putting a copy of the deployed version into a second repo for archiving purposes.
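
For instance, a small addition to the same script could record what was deployed; the tag format and log file name here are assumptions, not a convention you have to follow:

```python
import datetime
import subprocess

def record_deployment(logfile="deploy.log"):
    """Tag the deployed commit and append a line to a local log file (names are placeholders)."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H%M")
    tag = f"deployed-{stamp}"
    commit = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()

    # Mark the deployed version in the repo and share the tag with the team.
    subprocess.run(["git", "tag", tag, "HEAD"], check=True)
    subprocess.run(["git", "push", "origin", tag], check=True)

    with open(logfile, "a") as log:
        log.write(f"{stamp}: deployed {commit} as {tag}\n")
```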

If your deployment requirements increase over time, you may consider getting or buying a ready-made deployment tool for your environment. There are several tools available specifically for web-server deployments, but I suspect they could also work for your use case. My impression, however, is that you currently don't really need such a tool and that a simple script will serve you better; but maybe in the future, who knows?

OTHER TIPS

Maybe you should look at this: https://medium.com/faun/beginner-friendly-introduction-to-gitlab-ci-cd-1c80ee5ba0ae. Sorry, I can't comment, so I'm posting it here as an answer. In my case, I use continuous integration on GitLab and trigger a remote script via a webhook to a PHP server, which runs the tests and deploys to the server (I'm working on a web app). I hope it helps.

Licensed under: CC-BY-SA with attribution