Question

I want to test multiple implementations of the same problem. For example, I have a problem (a part of the code) "X" with implementations "X1" and "X2", and a problem "Y" with implementations "Y1", "Y2", and "Y3". Think of them as small, 10-line functions whose bodies differ slightly. Can I use git to switch between these implementations, that is, to replace the corresponding lines of code? I thought about branching, but I don't know how to apply it here. I would like to switch between the implementations of each problem independently, in any combination across problems (so "X1" with "Y2", then "X2" with "Y1", etc.), then measure the performance and choose the final implementation to use. Can git do this?


Solution

This is an unorthodox use of git; a simple configuration file in which you choose the implementation to use would probably be a better approach.
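
A minimal sketch of that configuration-based approach, assuming each problem has a handful of interchangeable functions. The names `x1`, `x2`, `y1`, `y2`, `y3`, `IMPLEMENTATIONS`, `run`, `all_configs`, and `benchmark_all` are made up for illustration, and the function bodies are placeholders for the real variants:

```python
import itertools
import timeit

# Hypothetical implementations of problem X: sum of squares below n.
def x1(n):
    return sum(i * i for i in range(n))

def x2(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Hypothetical implementations of problem Y: reverse a string.
def y1(s):
    return s[::-1]

def y2(s):
    return "".join(reversed(s))

def y3(s):
    chars = list(s)
    chars.reverse()
    return "".join(chars)

# The "configuration": for each problem, a mapping from name to function.
IMPLEMENTATIONS = {
    "X": {"X1": x1, "X2": x2},
    "Y": {"Y1": y1, "Y2": y2, "Y3": y3},
}

def run(config, n=1000, text="benchmark"):
    """Run the program with the implementations selected in config."""
    x = IMPLEMENTATIONS["X"][config["X"]]
    y = IMPLEMENTATIONS["Y"][config["Y"]]
    return x(n), y(text)

def all_configs():
    """Yield every combination, e.g. {"X": "X1", "Y": "Y2"}."""
    problems = sorted(IMPLEMENTATIONS)
    choices = [sorted(IMPLEMENTATIONS[p]) for p in problems]
    for combo in itertools.product(*choices):
        yield dict(zip(problems, combo))

def benchmark_all(repeat=100):
    """Return (config, seconds) pairs, fastest first."""
    results = []
    for config in all_configs():
        seconds = timeit.timeit(lambda: run(config), number=repeat)
        results.append((config, seconds))
    return sorted(results, key=lambda item: item[1])

if __name__ == "__main__":
    for config, seconds in benchmark_all():
        print(config, f"{seconds:.4f}s")
```

Here every combination ("X1" with "Y2", "X2" with "Y1", and so on) is covered in one run, without touching the working tree.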

However, git is actually well suited for "code swapping", so you could, for instance, test different implementations like this:

$ git init .
$ git checkout -b impl1
$ cat << EOF > main.py
> print("implementation1")
> EOF
$ git add -A && git commit -m "Implementation 1"
$ git checkout -b master  ## You can probably skip this ...
$ git checkout -b impl2
$ cat << EOF > main.py
> print("implementation2")
> EOF
$ git add -A && git commit -m "Implementation 2"

Now you can switch between implementations like this:

$ git checkout impl1
$ git checkout impl2

And test their performance, or whatever, against each other:

$ git checkout impl1 && time python main.py
Switched to branch 'impl1'
implementation1
python main.py  0,02s user 0,01s system 77% cpu 0,040 total

$ git checkout impl2 && time python main.py
Switched to branch 'impl2'
implementation2
python main.py  0,02s user 0,01s system 91% cpu 0,034 total

Everything seems normal: print takes about the same time to print different strings :)
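
The checkout-and-time steps could also be scripted. A minimal sketch, assuming the branches `impl1` and `impl2` from the example exist, the working tree is clean, and `git` and `python` are on the PATH. `time_branches` and its `run` parameter are hypothetical names; `run` is injectable only so the function can be tried without a real repository:

```python
import subprocess
import time

def time_branches(branches, command, run=subprocess.run):
    """Check out each branch in turn and time one run of command.

    Returns a dict mapping branch name to elapsed wall-clock seconds.
    """
    timings = {}
    for branch in branches:
        # check=True raises CalledProcessError if the checkout fails,
        # for example because of conflicting uncommitted changes.
        run(["git", "checkout", "--quiet", branch], check=True)
        start = time.perf_counter()
        run(command, check=True)
        timings[branch] = time.perf_counter() - start
    return timings
```

Inside the repository, `time_branches(["impl1", "impl2"], ["python", "main.py"])` would return one timing per branch. For a script this small, a single run mostly measures interpreter startup, so repeating the measurement several times is likely to give a more useful comparison.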

Other tips

I'm actually doing this as part of a deploy-to-EMR script for machine learning tasks carried out in Spark. I have a script that fires up a cluster, checks out the code, and runs a branch as part of an experiment. The results are then stored in a database. Easy peasy.

To be more specific, one experiment might test whether a certain pipeline step should come before or after another one, or perhaps be left out entirely (an A/B/C test, really). Of course, this could be done with a switch/if/else statement inside the code, but the nice part of using branches is how easy it is to select the best-performing branch and merge it into the production branch. An added benefit is being able to fire up multiple clusters in parallel and run A/B/C/etc. simultaneously without much effort.
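
For comparison, the in-code alternative mentioned above might look like the sketch below. It is not the actual Spark pipeline: `normalize`, `drop_small`, `VARIANTS`, and `run_pipeline` are hypothetical stand-ins that only illustrate the before/after/left-out choice:

```python
def normalize(values):
    """Hypothetical step: scale values so the largest becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]

def drop_small(values, threshold=0.5):
    """Hypothetical step: discard values below the threshold."""
    return [v for v in values if v >= threshold]

# Each variant is an ordered list of steps.
VARIANTS = {
    "A": [drop_small, normalize],  # filter before normalizing
    "B": [normalize, drop_small],  # filter after normalizing
    "C": [normalize],              # filter left out entirely
}

def run_pipeline(variant, values):
    """Apply the steps of the chosen variant in order."""
    for step in VARIANTS[variant]:
        values = step(values)
    return values
```

With branches, each variant would instead live in its own commit history, so the winning one can be merged directly rather than the losing code paths being deleted by hand afterwards.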

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow