Question

Imagine a distributed software system, installed on a group of a few hundred computers (nodes). Nodes are responsible for automatically running scheduled tasks. There are hundreds of tasks, and every task is scheduled to run on about 5-10 nodes. Nodes may stop for days, and may be removed from the system. Every task is defined by one or more source files, and node-specific config files. The code is developed and tested directly on nodes (using remote access), since only these are equipped with the special hardware and have the network context required to run the tasks (building a separate test system would be too expensive). The source files of every task refer to shared source files (libraries), and libraries may refer to other libraries. The dependency tree of tasks and libraries is complicated.

I don't have any experience with distributed version control systems, but I feel that this system could be built around a DVCS. Different libraries, and source files of different tasks, would have their own repository. Every node which runs a given task should have an instance of the repo of that task. The repo of every library, used by at least one task of a node, should also be present on that node. Developers would modify and commit code locally on nodes, and distribute the modifications to repos on other nodes using DVCS techniques.

Question #1 What would be the best approach to distribute code changes to other nodes?

Some possible scenarios:

  1. Developers push their modifications to every other node which has an instance of the same repo. (But they may forget, or not have time, to do so.)
  2. Nodes automatically pull every change from every other remote repo, and update themselves. (But there may be conflicts.)
  3. For each repo, one of the instances is used as a "reference". Developers push their modifications to this instance, and every other node having an instance automatically pulls from here and updates itself. (But the node having the reference instance may stop.)
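Scenario 3 with a fallback for a stopped reference node can be sketched as follows. This is only a minimal illustration under stated assumptions: the DVCS is assumed to be Git, every node is assumed to hold the same ordered candidate list per repo, and the node names, the `ssh://.../srv/repos/...` URL layout, and the helper names `pick_pull_source` and `pull_command` are all hypothetical.

```python
from typing import List, Optional, Set


def pick_pull_source(candidates: List[str], reachable: Set[str], self_node: str) -> Optional[str]:
    """Return the node to pull from, or None if there is nothing to pull from.

    candidates is ordered by priority: the first entry is the usual reference.
    """
    for node in candidates:
        if node == self_node:
            # Every higher-priority candidate is down, so this node acts as the reference.
            return None
        if node in reachable:
            return node
    return None


def pull_command(repo_path: str, source_node: str, repo_name: str) -> List[str]:
    """Build a fast-forward-only pull, so a node never creates a merge on its own."""
    url = f"ssh://{source_node}/srv/repos/{repo_name}"  # hypothetical URL layout
    return ["git", "-C", repo_path, "pull", "--ff-only", url]
```

A node would call `pick_pull_source` periodically and, if it returns a node, run the resulting command (for example with `subprocess.run`). The fast-forward-only pull avoids the automatic conflicts of scenario 2, but a fallback reference may lag behind the stopped one, so developers would still have to push to whichever node is currently acting as the reference.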

Question #2 What would be the best way to handle dependencies?

If more than one task (or library) refers to the same library, and that library has to be modified, one or more of the referring tasks (or libraries) may stop working (dependency hell). It would be better to stick with the originally referenced version, and upgrade to the new one only after proper testing. That is, more than one version of the same source file would have to be present in the same repo, which does not seem possible. Do I have to create a new branch for the new version of the referenced library? If yes, how should I upgrade the referring repos?
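The idea of "stick with the originally referenced version, upgrade after testing" can be sketched as a table of pins, where every task records the exact revision of each library it was tested against. This is a rough sketch, not a feature of any particular DVCS: the `Pins` structure, the function names, and the revision labels are hypothetical, and library-to-library references are assumed to be already flattened into each task's pin list.

```python
from typing import Dict, Iterable, Set

Pins = Dict[str, Dict[str, str]]  # task -> {library -> pinned revision}


def revisions_needed(pins: Pins, tasks_on_node: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect, per library, every revision that the node's tasks are pinned to."""
    needed: Dict[str, Set[str]] = {}
    for task in tasks_on_node:
        for library, revision in pins.get(task, {}).items():
            needed.setdefault(library, set()).add(revision)
    return needed


def upgrade(pins: Pins, task: str, library: str, new_revision: str) -> Pins:
    """Return a copy of the pins in which only one task moves to the new revision."""
    updated = {name: dict(libs) for name, libs in pins.items()}
    updated.setdefault(task, {})[library] = new_revision
    return updated


pins = {"task-a": {"libfoo": "r1"}, "task-b": {"libfoo": "r1"}}
after = upgrade(pins, "task-b", "libfoo", "r2")  # task-a stays on r1
print(revisions_needed(after, ["task-a", "task-b"]))
```

A repo already stores every revision of a library, so two revisions can coexist on a node as long as each is checked out into its own directory (with Git, for example, separate working trees of one repo). Upgrading a referring task then means changing its pin after testing, rather than modifying the library in place.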

Thank you for your help.


Solution

I don't have any experience with distributed version control systems, but I feel that this system could be built around a DVCS.

That feeling is wrong, in general. A VCS (version control system), also known as SCM (source control management), exists to track

  • changes
  • mostly in their historical aspect
  • as a flat array, without complex dependencies (although some dependencies are taken into account)

You should look at another category of tools: configuration management (CM) software, which natively handles deployment, policies, complex dependencies, conditions, etc.

You can get some imitation of CM with a DVCS, but it will be hard work, and the result will be a pale semblance of the existing, well-established tools.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow