How to run long (>10 hours running time) verification test for continuous integration of scientific software

https://softwareengineering.stackexchange.com/questions/382714

17-02-2021
|

Pregunta

This question continues on a previous question asked here about continuous integration for scientific software. Like the poster of that question, I am developing software for numerical simulations and I am in the process of applying continuous integration (CI).

My main problem is that I have tests that are used for verification that have to run for a long time (>10 hours). Also, those tests require the use high performance computing resources (an HPC cluster).

From what I have read so far, the idea behind CI is to make sure that a merge cannot be executed in the remote repository, if:

the build has failed (this is not a problem)
a verification case is broken by the pull request.

Testing for the build success is possible on the server that is hosting git (Bitbucket, Gitlab, etc), because the code can be compiled rather quickly (order of minutes).

Testing for the verification cases would require the remote git repository server to communicate with the HPC cluster and run simulations there until it is certain that no verification case is broken.

I am using Bitbucket as the remote git server, and I was reading about Bitbucket pipelines, Travis, Jenkins, etc.

I have found information about integrating Bitbucket and Jenkins: here and here.

The problem I see with the use of another server for running the tests is the authentication (security). The users of the HPC cluster are accessing this machine via SSH. The HPC cluster manages the execution of simulations with a help of the workload manager that schedules simulations as jobs that are described with a queue, priority and status.

If I use Jenkins to submit a job on the HPC cluster using the ssh plugin, the script will submit a simulation and exit with 0, if the submission was successful. This does not mean that the test has succeeded, because the simulation can take hours to complete.

Also, if the Jenkins server is to use SSH to connect to the HPC cluster, it needs the public SSH key. I haven't found the way for Bitbucket to communicate this information to Jenkins.

Has anyone tried to use continuous integration with tests that run for hours/days?

Edit : The responses to the question address the fact that one cannot wait for 10 hours for his/her commit to be accepted. This is not the plan: the idea is to run the whole test suite, when a pull request is submitted to the main upstream repository, to make sure that nothing submitted in the pull request breaks what has already been implemented. Same tests can (and should) be run manually by the devs on the HPC cluster before even submitting the pull request. In my field, the pull request means a numerical agorithm has been developed and tested, this happens maybe once in a month.

Solución

I believe you have three issues here.

Executing asynchronous tests
Security Credentials
Quality Gateways

Asynchronous Tests

While not ideal, Jenkins was designed for synchronous work loads. There is definitely a mismatch here.

So my first question is how do you know when the tests have completed for better or worse?

If you have to poll a directory or a service for status and results, simply start that polling as the next step after job submission.
If you receive an event, split the pipeline up into two segments. Have the second segment triggered by receiving that event. You may need to write a small program to map the event sent from your cluster into a web request to trigger the next stage in jenkins.

Security Credentials

Jenkins is capable of storing the key information securely using its internal credential management. This will require setting up a specific key for Jenkins. This may have a plus side in allowing usage data to be collected on, and maybe even limitations to be imposed on the Jenkins specific account on the compute cluster.

Quality Gateways

Continuous Integration pipelines are built around providing feed-back as quickly as possible.

So when you say that testing takes ten hours, I wince because I am hearing ten hours till any results are available. Now taking ten hours to complete testing is fine, indeed there are projects out there that take days to completely test. The point is that Continuous means a stream of information, it needs to be available in a timely manner not ten hours later.

Think of the pipeline as raising a Quality meter. Each stage in the pipeline pushes that meter higher. You want to schedule your stages in such a way that you discover errors early, by essentially pushing the quality meter up as quickly as possible.

This maximises the likelihood that the developer is still thinking about that problem and can quickly fix it. It also maximises the total resource saving possible, by discovering that the build is bad earlier, and not pursuing further verification. Also the pull request can be rejected sooner, as any error is a show stopper.

Test Suites

So try and split your tests up into short running test suites. Anywhere between 15 minutes and 1 hour are good lengths. Much shorter and the overhead becomes burdensome, much longer and the results aren't keeping that continuity of information flow. Try and schedule the shorter tests first (for that quick turnaround), but balance that up against the total time to completion. It may make sense to operate a single queue of faster tests, and parallelise the longer tests alongside.

Smoke Tests

Smoke tests are another avenue for reducing the time to discover an error, by reducing the size of the problem and running a single variation of an algorithm up front. The test will run faster than its heavier weight cousins and will exercise much of the mechanics as well. It is not as complete as the heavier tests, but will locate obvious issues sooner. Again balance these up as they don't replace your current tests, they serve to outline important sections in the hopes of identifying breakages earlier.

Reduce Platform Dependence

While you still need that final verification step on the compute cluster, most errors are detectable as unit tests. The more of the algorithm you can cover-off in unit testing the quicker you can prove confidence that it has not been broken. As a bonus unit tests don't need the compute cluster in order to be run.

Otros consejos

It's great that you are trying to implement some kind of continuous integration, but here it does not seem a good fit. It is not possible to run the complete test suite before merging any changes, at least not without hurting productivity noticeably. You therefore need to consider whether the benefits of these slow tests outweigh their costs (here, possibly literal costs for time on the cluster).

You can then devise a strategy to reduce the costs. For example:

run the full test suite less frequently, e.g. every night or before a release.
use a smaller test suite for CI. The focus of this test suite is not demonstrating that it works, but just catching obvious problems early (a kind of smoke testing).

There are lots of possibilities to create a smaller, faster test suite:

If the test suite consists of multiple problems, only select a subset for the CI tests, possibly at random.
Choose smaller problem sizes. E.g. reduce the resolution or duration of simulations. Use sampled data sets.
Focus on component tests rather than complete runs of the software.

To be clear: it is totally fine not to do any CI testing. That wouldn't be great, but it can be a valid decision if you're aware of the risks. (Primarily, the risk that the software was broken without anyone noticing). But if the only way to make a commit is to wait hours or even days, that might be worse than no tests at all. As long as you still run the complete test suite regularly, you can hedge the risk that things break without notice. While you won't be alerted before the defect is merged, you'll still be alerted close to the cause of the problem.

Your technical difficulties of configuring Jenkins are minor in this regard. Possibly your HPC job submission mechanism implies that while Jenkins is suitable to kick off HPC jobs, Jenkins might not be a suitable platform for aggregating test results or gating branch merges. This might not be a huge problem e.g. if you give up the goal of CI-style gated merges, and instead settle for nightly tests. Then, it might be sufficient if the devs find the results as an email the next morning.

Continuous integration with tests that run for hours/days is not continuous integration.

Seriously you've just taken us back to the days of nightly builds. Now sure there is no LOGICAL reason a test has to complete in a timely manner. But the humans lose the benefits of continuous integration if you do this.

Continuous integration ensures that integration is performed by the same person who made these changes. It forces coders to look at how their stuff impacts other stuff before they break that stuff.

If I can turn stuff in and go on vacation and completely miss the fall out caused by my stuff, stop calling it continuous integration. I don't care what tools you used.

Licenciado bajo: CC-BY-SA con atribución

No afiliado a softwareengineering.stackexchange