"Fossies" - the Fresh Open Source Software Archive

Member "UXP-2019.10.31/taskcluster/docs/taskgraph.rst" (30 Oct 2019, 11815 Bytes) of package /linux/www/UXP-2019.10.31.tar.gz:


TaskGraph Mach Command

The task graph is built by linking different kinds of tasks together, pruning out tasks that are not required, then optimizing by replacing subgraphs with links to already-completed tasks.

Concepts

Kinds

Kinds are the focal point of this system. They provide an interface between the large-scale graph-generation process and the small-scale task-definition needs of different kinds of tasks. Each kind may implement task generation differently. Some kinds may generate task definitions entirely internally (for example, symbol-upload tasks are all alike, and very simple), while other kinds may do little more than parse a directory of YAML files.

A kind.yml file contains data about the kind, as well as referring to a Python class implementing the kind in its implementation key. That implementation may rely on lots of code shared with other kinds, or contain a completely unique implementation of some functionality.

The full list of pre-defined keys in this file is:

implementation

Class implementing this kind, in the form <module-path>:<object-path>. This class should be a subclass of taskgraph.kind.base:Kind.

kind-dependencies

Kinds which should be loaded before this one. This is useful when the kind will use the list of already-created tasks to determine which tasks to create, for example adding an upload-symbols task after every build task.

Any other keys are subject to interpretation by the kind implementation.

The result is a nice segmentation of implementation so that the more esoteric in-tree projects can do their crazy stuff in an isolated kind without making the bread-and-butter build and test configuration more complicated.
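As a minimal sketch of how an implementation value of the form <module-path>:<object-path> might be resolved to a Python object: the helper name resolve_implementation is hypothetical, and collections:OrderedDict is only a standard-library stand-in for a real Kind subclass. The loader actually used by taskgraph may differ.

```python
import importlib


def resolve_implementation(spec):
    """Resolve a '<module-path>:<object-path>' string to a Python object.

    Hypothetical helper for illustration; not taken from the taskgraph source.
    """
    module_path, sep, object_path = spec.partition(":")
    if not sep or not module_path or not object_path:
        raise ValueError("expected '<module-path>:<object-path>', got %r" % (spec,))
    obj = importlib.import_module(module_path)
    # The object path may be dotted, e.g. "Outer.Inner".
    for name in object_path.split("."):
        obj = getattr(obj, name)
    return obj


# Stand-in value; a real kind.yml would name a subclass of taskgraph.kind.base:Kind.
resolved = resolve_implementation("collections:OrderedDict")
print(resolved)
```

Splitting on the colon first keeps the module path and the object path unambiguous even when both contain dots.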

Dependencies

Dependencies between tasks are represented as labeled edges in the task graph. For example, a test task must depend on the build task creating the artifact it tests, and this dependency edge is named 'build'. The task graph generation process later resolves these dependencies to specific taskIds.
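The labeled edges described above can be pictured as a mapping from edge name to task label, which is later rewritten to taskIds. The following is a sketch only: the labels, the taskId, and the function name resolve_dependencies are all hypothetical.

```python
# Hypothetical labels and taskIds, for illustration only.
test_task_dependencies = {"build": "build-linux64/opt"}
label_to_taskid = {"build-linux64/opt": "TASKID-OF-BUILD"}


def resolve_dependencies(dependencies, label_to_taskid):
    """Map each edge name to the taskId of the task that the edge points at."""
    return {name: label_to_taskid[label] for name, label in dependencies.items()}


resolved = resolve_dependencies(test_task_dependencies, label_to_taskid)
print(resolved)
```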

Decision Task

The decision task is the first task created when a new graph begins. It is responsible for creating the rest of the task graph.

The decision task for pushes is defined in-tree, in .taskcluster.yml. That task description invokes mach taskgraph decision with some metadata about the push. That mach command determines the optimized task graph, then calls the TaskCluster API to create the tasks.

Note that this mach command is not designed to be invoked directly by humans. Instead, use the mach commands described below, supplying parameters.yml from a recent decision task. These commands allow testing everything the decision task does except the command-line processing and the queue.createTask calls.

Graph Generation

Graph generation, as run via mach taskgraph decision, proceeds as follows:

  1. For all kinds, generate all tasks. The result is the "full task set".
  2. Create dependency links between tasks using kind-specific mechanisms. The result is the "full task graph".
  3. Select the target tasks (based on try syntax or a tree-specific specification). The result is the "target task set".
  4. Based on the full task graph, calculate the transitive closure of the target task set. That is, the target tasks and all requirements of those tasks. The result is the "target task graph".
  5. Optimize the target task graph based on kind-specific optimization methods. The result is the "optimized task graph" with fewer nodes than the target task graph.
  6. Create tasks for all tasks in the optimized task graph.

Transitive Closure

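Transitive closure is a fancy name for this sort of operation: repeatedly add to the task set every dependency of a task already in the set, until a full pass adds nothing new.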

The effect is this: imagine you start with a linux32 test job and a linux64 test job. In the first round, each test task depends on the test docker image task, so add that image task. Each test also depends on a build, so add the linux32 and linux64 build tasks.

Then repeat: the test docker image task is already present, as are the build tasks, but those build tasks depend on the build docker image task. So add that build docker image task. Repeat again: this time, none of the tasks in the set depend on a task not in the set, so nothing changes and the process is complete.

And as you can see, the graph we've built now includes everything we wanted (the test jobs) plus everything required to do that (docker images, builds).
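The walk-through above can be sketched in Python as a loop that runs until nothing changes. The task labels and the function name transitive_closure are hypothetical, chosen to mirror the linux32/linux64 example; this is not the code taskgraph itself uses.

```python
def transitive_closure(target_tasks, dependencies):
    """Return the target tasks plus everything they transitively depend on.

    `dependencies` maps a task label to the labels it depends on.
    """
    tasks = set(target_tasks)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot, since the set grows inside the loop.
        for task in list(tasks):
            for dependency in dependencies.get(task, ()):
                if dependency not in tasks:
                    tasks.add(dependency)
                    changed = True
    return tasks


# Hypothetical labels mirroring the example in the text.
dependencies = {
    "test-linux32": ["docker-image-test", "build-linux32"],
    "test-linux64": ["docker-image-test", "build-linux64"],
    "build-linux32": ["docker-image-build"],
    "build-linux64": ["docker-image-build"],
}
closure = transitive_closure({"test-linux32", "test-linux64"}, dependencies)
print(sorted(closure))
```

Starting from the two test tasks, the result contains six tasks: the tests, both builds, and both docker image tasks.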

Optimization

The objective of optimization is to remove as many tasks from the graph as possible, as efficiently as possible, thereby delivering useful results as quickly as possible. For example, ideally if only a test script is modified in a push, then the resulting graph contains only the corresponding test suite task.

A task is said to be "optimized" when it is either replaced with an equivalent, already-existing task, or dropped from the graph entirely.

A task can be optimized if all of its dependencies can be optimized and none of its inputs have changed. For a task on which no other tasks depend (a "leaf task"), the optimizer can determine what has changed by looking at the version-control history of the push: if the relevant files are not modified in the push, then it considers the inputs unchanged. For tasks on which other tasks depend ("non-leaf tasks"), the optimizer must replace the task with another, equivalent task, so it generates a hash of all of the inputs and uses that to search for a matching, existing task.
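The first rule in the paragraph above (a task can be optimized only if its own inputs are unchanged and all of its dependencies can be optimized) can be sketched as a recursive check. This is an illustration under stated assumptions: the set of tasks with changed inputs is taken as given, the graph is assumed to be acyclic, and the names optimizable, dependencies, and changed are hypothetical. It does not model the hash-based search for an equivalent existing task.

```python
def optimizable(label, dependencies, changed, memo=None):
    """True if `label` has unchanged inputs and all its dependencies are optimizable.

    `dependencies` maps a label to the labels it depends on; `changed` is the
    set of labels whose inputs were modified. Assumes an acyclic graph.
    """
    if memo is None:
        memo = {}
    if label not in memo:
        memo[label] = label not in changed and all(
            optimizable(dep, dependencies, changed, memo)
            for dep in dependencies.get(label, ())
        )
    return memo[label]


# Hypothetical graph: two tests depend on one build, which depends on an image.
dependencies = {
    "test-a": ["build"],
    "test-b": ["build"],
    "build": ["docker-image"],
}
changed = {"build"}  # assumption: only the build's inputs changed in this push

for label in ["docker-image", "build", "test-a", "test-b"]:
    print(label, optimizable(label, dependencies, changed))
```

With only the build changed, the docker image task remains optimizable, while the build and both tests that depend on it do not.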

In some cases, such as try pushes, tasks in the target task set have been explicitly requested and are thus excluded from optimization. In other cases, the target task set is almost the entire task graph, so targeted tasks are considered for optimization. This behavior is controlled with the optimize_target_tasks parameter.

Action Tasks

Action Tasks are tasks which help you to schedule new jobs via Treeherder's "Add New Jobs" feature. The Decision Task creates a YAML file named action.yml which can be used to schedule Action Tasks after suitably replacing {{decision_task_id}} and {{task_labels}}, which correspond to the decision task ID of the push and a comma-separated list of the task labels to be scheduled.

This task invokes mach taskgraph action-task, which builds a task graph of the requested tasks. This graph is optimized against the tasks that the decision task already created for the same push.

So for instance, if you had already requested a build task in the try command, and you wish to add a test which depends on this build, the original build task is re-used.
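The placeholder substitution described for action.yml amounts to plain string replacement. A minimal sketch follows; the template fragment and the function name fill_action_template are hypothetical, and the real action.yml has a different, fuller structure.

```python
def fill_action_template(template, decision_task_id, task_labels):
    """Substitute the two placeholders named in the text above."""
    return (
        template
        .replace("{{decision_task_id}}", decision_task_id)
        .replace("{{task_labels}}", ",".join(task_labels))
    )


# Hypothetical template fragment, for illustration only.
template = "decision: {{decision_task_id}}\nlabels: {{task_labels}}\n"
filled = fill_action_template(template, "DECISION-TASK-ID", ["test-a", "test-b"])
print(filled)
```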

Action Tasks are currently scheduled by [pulse_actions](https://github.com/mozilla/pulse_actions). This feature is only present on try pushes for now.

Mach commands

A number of mach subcommands are available aside from mach taskgraph decision to make this complex system more accessible to those trying to understand or modify it. They allow you to run portions of the graph-generation process and output the results.

mach taskgraph tasks

Get the full task set

mach taskgraph full

Get the full task graph

mach taskgraph target

Get the target task set

mach taskgraph target-graph

Get the target task graph

mach taskgraph optimized

Get the optimized task graph

Each of these commands takes a --parameters option giving a file with parameters to guide the graph generation. The decision task helpfully produces such a file on every run, and that is generally the easiest way to get a parameter file. The parameter keys and values are described in parameters; using that information, you may modify an existing parameters.yml or create your own.

Task Parameterization

A few components of tasks are only known at the very end of the decision task -- just before the queue.createTask call is made. These are specified using simple parameterized values, as follows:

{"relative-datestamp": "certain number of seconds/hours/days/years"}

Objects of this form will be replaced with an offset from the current time just before the queue.createTask call is made. For example, an artifact expiration might be specified as {"relative-datestamp": "1 year"}.

{"task-reference": "string containing <dep-name>"}

The task definition may contain "task references" of this form. These will be replaced during the optimization step, with the appropriate taskId for the named dependency substituted for <dep-name> in the string. Multiple labels may be substituted in a single string, and <<> can be used to escape a literal <.
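Both parameterized forms can be sketched in a few lines of Python. This is an illustration rather than the taskgraph implementation: the function names, the unit table (with a year taken as 365 days), and the example taskId are assumptions.

```python
import datetime
import re

# Assumed unit table; the text names seconds, hours, days and years.
SECONDS_PER_UNIT = {
    "second": 1,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "year": 365 * 24 * 60 * 60,  # assumption: a year is 365 days
}


def resolve_relative_datestamp(spec, now):
    """Turn a string such as '2 days' into a datetime offset from `now`."""
    amount, unit = spec.split()
    seconds = int(amount) * SECONDS_PER_UNIT[unit.rstrip("s")]
    return now + datetime.timedelta(seconds=seconds)


def resolve_task_reference(text, dependency_task_ids):
    """Replace each <dep-name> with its taskId; '<<>' yields a literal '<'."""
    def substitute(match):
        name = match.group(1)
        if name == "<":
            return "<"
        return dependency_task_ids[name]

    return re.sub(r"<(<|[^<>]+)>", substitute, text)


now = datetime.datetime(2019, 10, 30)
expires = resolve_relative_datestamp("2 days", now)
print(expires.isoformat())

# Hypothetical dependency name and taskId.
reference = resolve_task_reference(
    "artifact of <build> and a literal <<>", {"build": "TASK123"}
)
print(reference)
```

Handling the escape inside the same regular expression means a literal < is never mistaken for the start of a dependency name.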

Taskgraph JSON Format

Task graphs -- both the graph artifacts produced by the decision task and those output by the --json option to the mach taskgraph commands -- are JSON objects, keyed by label, or for optimized task graphs, by taskId. For convenience, the decision task also writes out label-to-taskid.json containing a mapping from label to taskId. Each task in the graph is represented as a JSON object.

Each task has the following properties:

task_id

The task's taskId (only for optimized task graphs)

label

The task's label

attributes

The task's attributes

dependencies

The task's in-graph dependencies, represented as an object mapping dependency name to label (or to taskId for optimized task graphs)

task

The task's TaskCluster task definition.

kind_implementation

The module and the class name which was used to implement this particular task. It is always of the form <module-path>:<object-path>.

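The results from each command are in the same format, but differ in content. For example, optimized task graphs are keyed by taskId and map dependency names to taskIds, while the other outputs use labels for both.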

The output of the mach taskgraph commands is suitable for processing with the jq utility. For example, to extract all tasks' labels and their dependencies:

jq 'to_entries | map({label: .value.label, dependencies: .value.dependencies})'
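The same extraction can be sketched in Python for cases where jq is not available. The two-task graph below is hypothetical sample data in the format described above, not real decision-task output.

```python
import json

# Hypothetical two-task graph keyed by label (not real decision-task output).
graph_json = """
{
  "build-linux64/opt": {
    "label": "build-linux64/opt",
    "attributes": {},
    "dependencies": {},
    "task": {}
  },
  "test-linux64/opt": {
    "label": "test-linux64/opt",
    "attributes": {},
    "dependencies": {"build": "build-linux64/opt"},
    "task": {}
  }
}
"""

graph = json.loads(graph_json)

# Keep only each task's label and its named dependencies.
summary = [
    {"label": task["label"], "dependencies": task["dependencies"]}
    for task in graph.values()
]
print(json.dumps(summary, indent=2))
```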