What are real-world examples of Gradle's dependency graph?

https://stackoverflow.com/questions/2414281

gradle

19-09-2019
|

Question

As noted in the documentation, Gradle uses a directed acyclic graph (DAG) to build a dependency graph. From my understanding, having separate cycles for evaluation and execution is a major feature for a build tool. e.g. The Gradle doc states that this enables some features that would otherwise be impossible.

I'm interested in real-world examples that illustrate the power of this feature. What are some use-cases for which a dependency graph is important? I'm especially interested in personal stories from the field, whether with Gradle or a similarly equipped tool.

I am making this 'community wiki' from the outset, as it will be difficult to assess a 'correct' answer.

Solution

This provocative question provided motivation for finally looking into Gradle. I still haven't used it, so I can only offer analysis noted while browsing the docs, not personal stories.

My first question was why a Gradle task dependency graph is guaranteed to be acyclic. I didn't find the answer to that, but a contrary case is easily constructed, so I'll presume that cycle detection is a validation that is run when the graph is built, and the build fails prior to execution of the first task if there are illegal cyclical dependencies. Without first building the graph, this failure condition might not be discovered until the build is nearly complete. Additionally, the detection routine would have to run after every task was executed, which would be very inefficient (as long as the graph was built incrementally and available globally, a depth-first search would only be required to find a starting point, and subsequent cycle evaluations would require minimal work, but the total work would still be greater than doing a single reduction on the entire set of relations at the outset). I'd chalk up early detection as a major benefit.

A task dependency can be lazy (see: 4.3 Task dependencies, and a related example in 13.14). Lazy task dependencies could not be evaluated correctly until the entire graph is built. The same is true for transitive (non-task) dependency resolution, which could cause innumerable problems, and require repeated recompilations as additional dependencies are discovered and resolved (also requiring repeated requests to a repository). The task rules feature (13.8) wouldn't be possible either. These issues, and likely many others, can be generalized by considering that Gradle uses a dynamic language, and can dynamically add and modify tasks, so prior to a first-pass evaluation, results could be non-deterministic since the execution path is built and modified during runtime, thus, different sequences of evaluation could produce arbitrarily different results if there are dependencies or behavioral directives that are unknown until later, because they haven't been created yet. (This may be worthy of investigating with some concrete examples. If it is true, then even two passes would not always be sufficient. If A -> B, B -> C, where C changes the behavior of A so that it no longer depends on B, then you have a problem. I hope there are some best practices on restricting metaprogramming with non-local scope, to not allow it in arbitrary tasks. A fun example would be a simulation of a time travel paradox, where a grandchild kills his grandfather or marries his grandmother, vividly illustrating some practical ethical principles!)

It can enable better status and progress reporting on a currently executing build. A TaskExecutionListener provides before/after hooks to the processing of each task, but without knowing the number of remaining tasks, there isn't much it could say about status other than "6 tasks completed. About to execute task foo." Instead, you could initialize a TaskExecutionListener with the number of tasks in gradle.taskGraph.whenReady, and then attach it to the TaskExecutionGraph. Now it could provide information to enable report details like "6 of 72 tasks completed. Now executing task foo. Estimated time remaining: 2h 38m." That would be useful to display on a console for a continuous integration server, or if Gradle was being used to orchestrate a large multi-project build and time estimates were crucial.

As pointed out by Jerry Bullard, the evaluation portion of the lifecycle is critical to determining the execution plan, which provides information about the environment, since the environment is partially determined by the execution context (Example 4.15 in the Configure by DAG section). Additionally, I could see this being useful for execution optimization. Independent subpaths could be safely handed to different threads. Walking algorithms for execution can be less memory intensive if they aren't naive (my intuition says that always walking the path with the most subpaths is going to lead to a larger stack than always preferring paths with the least subpaths).

An interesting use of this might be a situation where many components of a system are initially stubbed out to support demos and incremental development. Then during development, rather than updating the build configuration as each component becomes implemented, the build itself could determine if a subproject is ready for inclusion yet (perhaps it tries to grab the code, compile it, and run a pre-determined test suite). If it is, the evaluation stage would reveal this, and the appropriate tasks would be included, otherwise, it selects the tasks for the stubs. Perhaps there's a dependency on an Oracle database that isn't available yet, and you're using an embedded database in the meantime. You could let the build check the availability, transparently switch over when it can, and tell you that it switched databases, rather than you telling it. There could be a lot of creative uses along those lines.

Gradle looks awesome. Thanks for provoking some research!

OTHER TIPS

An example from the same documentation illustrates the power of this approach:

As we describe in full detail later (See Chapter 30, The Build Lifecycle) Gradle has a configuration phase and an execution phase. After the configuration phase Gradle knows all tasks that should be executed. Gradle offers you a hook to make use of this information. A use-case for this would be to check if the release task is part of the tasks to be executed. Depending on this you can assign different values to some variables.

In other words, you can hook into the build process early, so you can alter its course as needed. If some actual build work had already been performed, it may be too late to change.

I'm evaluating different build systems now and with gradle I managed to add ugly code that enumerates all tasks of type 'jar' and changes them so, that every jar manifest includes 'Build-Number' attribute (which is used later to compose final file names):

gradle.taskGraph.whenReady {
    taskGraph ->
    taskGraph.getAllTasks().findAll {
        it instanceof org.gradle.api.tasks.bundling.Jar
    }.each {
        it.getManifest().getAttributes().put('Build-Number', project.buildVersion.buildNumber)
    }
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow