How do we avoid development files in the release?

https://softwareengineering.stackexchange.com/questions/366771

30-01-2021

Question

Situation:
Our Python project is hosted on GitHub. The actual release should contain only a handful of files, but our project also contains several non-release files that are required for testing and, ironically, for packaging, etc.
We roughly follow the GitFlow model, so develop is our mainline that we build on and stabilize before merging into the master branch, from which the release is made.

Problem:
When we make a release, that release contains not only the actual release files but also contains all the meta and testing stuff. This happens because GitHub's way of creating releases is to simply ZIP up a snapshot of a branch.

Target:
I want the release to contain only the relevant files: 1 Python script, 1 plugin file, 1 config file, 1 license file, 1 readme file.
I want to exclude all test data, test config scripts, meta git files, meta GitHub files, etc.

Ideas:

Our master branch could contain only those relevant files. Care would have to be taken to avoid "contaminating" it with unwanted files during merges from develop.
Drawback: This prevents us from doing automated verification testing on the master branch.
We could introduce a release branch (true to GitFlow) and do the verification testing there, then merge only the relevant files to a "clean" master branch.
Drawback: Having more branches adds more complexity to this small project.
I'm very curious to hear your ideas! What have I completely overlooked?

Solution

When we make a release, that release contains not only the actual release files but also contains all the meta and testing stuff. This happens because GitHub's way of creating releases is to simply ZIP up a snapshot of a branch.

That's not entirely correct, as far as I can tell. In GitHub, a release is a tag pointing to a specific commit - and while GitHub provides an option to download the source code corresponding to this tag as a .zip file from the release page, this is not the intended way to distribute the application.

Instead, you're expected to attach one or more assets to the release - which means preparing the packages yourself (or using a build/release script) and attaching them either via the GitHub API or manually from the release creation webpage.

The tag itself and the option to download sources are only there in case you want to inspect the source code at the point of that specific release. They're not intended to be consumed by end users. Think about what would happen if the project were written in a compiled language such as C# or Java: the user would not even be able to run the release, since the build artifacts are not in the repository at all.

Other answers

You are confusing your source code with your deliverables.

All your source code, all your documentation, all your test code, should all be in your git repository. And then you should have a build script that extracts all the deliverables into one directory (with subdirectories obviously), and that gets shipped to the customer.

Let's say that for a new feature your deliverable should contain another Python file implementing that feature, and you have two test scripts and four files with test data. You update the script that builds the deliverable by adding a command to copy that one Python file into the deliverable, then you commit your Python file, your two test scripts, your four test data files, and your modified build script to git. That is seven new files and one modified file in git, and one additional file sent to the customer.
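A minimal sketch of such a build script in Python, assuming the five release files from the question live at the hypothetical paths myscript.py, myplugin.py, config.ini, LICENSE and README.md (the function name build_deliverable is also just illustrative). Adding a file to the release then means adding one entry to the list.

```python
import shutil
from pathlib import Path

# Hypothetical names standing in for the five release files from the question.
DELIVERABLES = [
    "myscript.py",
    "myplugin.py",
    "config.ini",
    "LICENSE",
    "README.md",
]


def build_deliverable(repo_root: Path, out_dir: Path) -> list[Path]:
    """Copy only the listed deliverables from repo_root into a fresh out_dir."""
    if out_dir.exists():
        shutil.rmtree(out_dir)  # start from a clean directory on every build
    out_dir.mkdir(parents=True)
    copied = []
    for rel in DELIVERABLES:
        src = repo_root / rel
        dst = out_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)  # preserve subdirectories
        shutil.copy2(src, dst)  # raises FileNotFoundError if a deliverable is missing
        copied.append(dst)
    return copied
```

Run it from the repository root, e.g. build_deliverable(Path("."), Path("dist/release")). Because the output directory is wiped on every build, point it at a dedicated folder, never at the repository itself. With an explicit allowlist, test data and CI files cannot leak into the deliverable by accident.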

The idea of the git repository is that I can start at your company, take my empty computer, clone your git repository, and have everything I need to develop.

Your “release” isn't a ZIP archive of your sources, but should be

a specific state of your project, and
any number of build artefacts that can be installed/deployed.

Your source code at some commit corresponds to a release, but it's not the same thing as the release. Instead, you'll have a build process that creates artefacts (e.g. packages, dists, wheels, binaries) from the source. It is the job of the build process to include only the correct files.
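As a rough illustration of a build step that decides which files end up in an artefact, here is a Python sketch that writes a versioned ZIP from an explicit allowlist. The file names, the myproject- prefix and build_release_zip are hypothetical. For a real Python package you would more likely produce an sdist and a wheel with a packaging tool (e.g. python -m build), whose configuration plays the same role as this list.

```python
import zipfile
from pathlib import Path

# Hypothetical allowlist; in practice it mirrors the project's deliverables.
RELEASE_FILES = ["myscript.py", "myplugin.py", "config.ini", "LICENSE", "README.md"]


def build_release_zip(repo_root: Path, version: str, out_dir: Path) -> Path:
    """Write <out_dir>/myproject-<version>.zip containing only RELEASE_FILES."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"myproject-{version}.zip"
    prefix = f"myproject-{version}/"  # single top-level folder inside the archive
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel in RELEASE_FILES:
            # Anything not listed (tests, CI config, .git* files) is never added.
            zf.write(repo_root / rel, arcname=prefix + rel)
    return archive
```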

Git branches are useful to represent different threads of development that may branch off and merge again. It is not possible (without excessive effort) to have some files or changes that only live in specific branches. Instead, the code should always be in a buildable state, on whatever branch it lives.

GitHub's built-in “releases” page is mostly a pretty view of your Git tags. You don't have to use it; GH doesn't dictate your workflow. However, it is a low-effort way to get a “downloads” site for open source projects. Using GH releases, you can

create or use a Git tag (e.g. v1.2.3)
describe the release (often a changelog)
add any kind of download

GH adds the source archive as a default download, but you can add your own. For a Python project, this could be a wheel or sdist. Note that GH offers an API that lets you automate the upload as part of your build script, and some third-party CI providers such as Travis CI have built-in integrations.
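A sketch of that automation using only the Python standard library, assuming a token with permission to create releases. The two endpoints (POST /repos/{owner}/{repo}/releases on api.github.com, and the asset upload on uploads.github.com) are part of GitHub's REST API; the owner, repo, token and helper names are placeholders. The functions only build the requests, so they can be inspected before anything is sent.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"
UPLOADS = "https://uploads.github.com"


def _headers(token: str, content_type: str) -> dict[str, str]:
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "Content-Type": content_type,
    }


def create_release_request(owner: str, repo: str, tag: str, notes: str, token: str):
    """Build (but do not send) a request creating a release for `tag`."""
    payload = {"tag_name": tag, "name": tag, "body": notes}  # body = changelog text
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/releases",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers=_headers(token, "application/json"),
    )


def upload_asset_request(owner: str, repo: str, release_id: int,
                         filename: str, content: bytes, token: str):
    """Build a request attaching a file (e.g. a wheel or zip) to a release."""
    query = urllib.parse.urlencode({"name": filename})
    return urllib.request.Request(
        f"{UPLOADS}/repos/{owner}/{repo}/releases/{release_id}/assets?{query}",
        data=content,  # raw file bytes
        method="POST",
        headers=_headers(token, "application/octet-stream"),
    )
```

To actually send them, pass each request to urllib.request.urlopen; the JSON returned when creating a release contains the release's id, which the upload request needs.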

I'd like to elaborate a bit on what @maciej-stachowski said:

That's not entirely correct, as far as I can tell. In GitHub, a release is a tag pointing to a specific commit - and while GitHub provides an option to download the source code corresponding to this tag as a .zip file from the release page

This is not entirely correct either. GitHub uses git archive to create archive files (.zip, .tar.gz) from the repository and offers them for download. The result of the git archive operation is what you get whether you click the Download ZIP button on the current master or download the source archive for a specific tag.

Now the nice thing about git archive is that you can actually control what goes into the archive via the .gitattributes file. While this method doesn't allow you to run any code (so you cannot add files that are not in your repository, neither by generating artifacts nor by downloading them), it does allow you to filter out files you don't want in your source archives. A typical .gitattributes file looks like this:

# exclude .gitignore and similar from the generated tarball
.git*           export-ignore
# exclude CI configurations (useless outside the repo)
.travis.yml     export-ignore
.appveyor.yml   export-ignore

In general I find it good practice to exclude things like .gitignore from source tarballs. It helps tremendously when downstream packagers use git for packaging: they often do want to see all the build artifacts that .gitignore would otherwise hide. Also, the various CI configurations are normally only useful in the context of your git provider (e.g. GitHub).
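To check the effect locally, one can run git archive and git check-attr. The Python sketch below only wraps those two commands; the helper names and the example ref and prefix are made up for illustration, and git must be on the PATH for export_ignored to work.

```python
import subprocess


def git_archive_command(ref: str, output: str, prefix: str) -> list[str]:
    """Command that builds a source archive of `ref`, honouring export-ignore."""
    fmt = "zip" if output.endswith(".zip") else "tar.gz"
    return ["git", "archive", f"--format={fmt}", f"--prefix={prefix}",
            f"--output={output}", ref]


def parse_check_attr(output: str) -> dict[str, bool]:
    """Parse `git check-attr` lines of the form '<path>: <attr>: <value>'."""
    result = {}
    for line in output.splitlines():
        if not line:
            continue
        path, _attr, value = line.rsplit(": ", 2)  # split from the right: paths may contain ': '
        result[path] = value.strip() == "set"
    return result


def export_ignored(paths: list[str], cwd: str = ".") -> dict[str, bool]:
    """Ask git which of `paths` carry the export-ignore attribute."""
    out = subprocess.run(
        ["git", "check-attr", "export-ignore", "--", *paths],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout
    return parse_check_attr(out)
```

Keep in mind that git check-attr reads the attributes from the working tree, while git archive by default uses the .gitattributes committed in the archived revision, so the two agree only once the file is committed.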

Licensed under: CC-BY-SA with attribution
