Graph framework

When we started with QI, we knew it had to be modular and that workflows had to be data rather than code (in Gamp, they had been code; see "Historical note" below). I earned my PhD researching Kahn process networks, which pushed me towards a graph-based approach where nodes process data and edges define the data flow. We agreed that this was the right approach, not least because it would be familiar to at least some of our users: Foundry's Nuke, for example, uses it.

Before building this framework, I studied the TPL Dataflow library in some detail, and it left me doubting whether it could be adapted to meet all of our upfront requirements:

  • Workflow description must be serializable independently of the code that executes it.

  • Messages in transit along the edges must also be serializable and deserializable. (The intention was to be able to cancel workflow execution and resume it later. We eventually solved this in another way, though, and removed this capability from the code.)

  • Event reporting and interactive queries. The latter sends an event to the front-end and blocks the node until a reply is received.

  • Persistent (serializable) trace of graph execution.
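The interactive-query requirement (a node posts an event to the front-end and blocks until a reply arrives) can be sketched as below. This is a minimal sketch, not QI's implementation: it assumes one thread per node and an in-process queue between the graph and the front-end, and the names `EventBus`, `Query`, `ask` and `answer` are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Query:
    """An interactive query sent from a node to the front-end (hypothetical)."""
    node_name: str
    question: str
    answered: threading.Event = field(default_factory=threading.Event)
    reply: Optional[str] = None


class EventBus:
    """Hypothetical channel carrying events from graph nodes to the front-end."""

    def __init__(self) -> None:
        self.events: "queue.Queue[object]" = queue.Queue()

    def report(self, event: object) -> None:
        # Plain event reporting (e.g. progress): fire and forget.
        self.events.put(event)

    def ask(self, node_name: str, question: str,
            timeout: Optional[float] = None) -> Optional[str]:
        # Interactive query: post the query, then block the calling node
        # until the front-end answers. Returns None if the timeout expires.
        q = Query(node_name, question)
        self.events.put(q)
        if not q.answered.wait(timeout):
            return None
        return q.reply

    @staticmethod
    def answer(q: Query, reply: str) -> None:
        # Called from the front-end side; wakes the blocked node.
        q.reply = reply
        q.answered.set()


def fake_front_end(bus: EventBus) -> None:
    # Stand-in for the UI: answers the first query it sees, then stops.
    while True:
        ev = bus.events.get()
        if isinstance(ev, Query):
            EventBus.answer(ev, "overwrite")
            return


bus = EventBus()
ui = threading.Thread(target=fake_front_end, args=(bus,))
ui.start()
answer = bus.ask("copy", "Target file exists. Overwrite or skip?")
ui.join()
print(answer)  # overwrite
```

The node thread is suspended inside `ask` for as long as the user takes to respond; the optional timeout is an assumption added so that a forgotten dialog cannot stall a node forever.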

As will become evident from the "Getting started" section, the graph topology is decoupled from the code executing it. This enabled users to create libraries of "workflow templates". A user could load a workflow template from the database, customize its parameters (e.g., the input path containing the files to process) and then start the job.
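The template idea can be sketched as a pure-data description plus a registry that maps type names to code. The JSON layout, the node types `FileSource`, `Copy` and `Drain`, and the helpers `node_type` and `instantiate` are all hypothetical; QI's real schema is not shown here, and wiring up the edges is omitted for brevity.

```python
import json
from typing import Any, Callable, Dict

# A serialized workflow template: pure data, no code (hypothetical schema).
TEMPLATE_JSON = """
{
  "name": "copy-and-verify",
  "nodes": [
    {"id": "src",  "type": "FileSource", "params": {"input_path": ""}},
    {"id": "copy", "type": "Copy",       "params": {"verify": true}},
    {"id": "out",  "type": "Drain",      "params": {}}
  ],
  "edges": [
    {"from": "src",  "port": "out", "to": "copy"},
    {"from": "copy", "port": "out", "to": "out"}
  ]
}
"""

# Registry mapping type names in the description to the classes implementing them.
NODE_TYPES: Dict[str, Callable[..., Any]] = {}


def node_type(name: str):
    def register(cls):
        NODE_TYPES[name] = cls
        return cls
    return register


@node_type("FileSource")
class FileSource:
    def __init__(self, input_path: str) -> None:
        self.input_path = input_path


@node_type("Copy")
class Copy:
    def __init__(self, verify: bool) -> None:
        self.verify = verify


@node_type("Drain")
class Drain:
    pass


def instantiate(template: dict, overrides: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Create node objects from a template, applying per-node parameter overrides.
    The template itself is left unchanged, so it can be reused for the next job."""
    nodes = {}
    for spec in template["nodes"]:
        params = {**spec["params"], **overrides.get(spec["id"], {})}
        nodes[spec["id"]] = NODE_TYPES[spec["type"]](**params)
    return nodes


template = json.loads(TEMPLATE_JSON)
job = instantiate(template, {"src": {"input_path": "/media/card01"}})
print(job["src"].input_path)  # /media/card01
```

Because the description never references Python objects directly, it can be stored in a database and edited without recompiling anything; only the registry ties names to code.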

Historical note

Quine was a follow-up to another failed startup where we (the co-founders) first met. In that previous startup, we developed a workflow application called Gamp. It was loved by users, but it had severe technical problems: workflow steps were hard-coded into the application and were intertwined. Almost any customization also needed a few debugging sessions to get right. Not least, it was written in C++ with Qt, which was a nightmare to deploy on two different platforms (Windows and OSX, OSX being the worst).

"Gamp" is a Norwegian word for a workhorse.

QI had a number of predefined node types that performed various "fixed" functions, and users could combine them into flexible workflows of their own; an example is shown in the figure below.

Simple Workflow

While QI was a huge technical advancement over Gamp (hey, no need to compile and deploy a new binary in order to support a slightly different workflow!), the figure hints at the messiness of the real world, which did not fit neatly into the graph framework and which we handled through various ad-hoc "hacks":

  • Users wanted an estimate of global progress ("when will the job finish?"), not just the progress of individual nodes for individual files. While parallel execution of nodes improves throughput (e.g., while file #13 is being copied, file #4 is being transcoded and file #3 is being uploaded), it makes it difficult to show the user what is going on.

  • The copy node is special. Users want to know immediately when all files have been copied and verified so that they can reuse the original magazine ASAP.

  • In a "clean" implementation, nodes should not need to know about each other. However, situations occurred where they had to inspect parts of each other's state.

  • Semi-success states. Processing a file at a node might result in an error, but this should not put the whole job into an error state.

  Tip

The real world is messy.

Users want automation, but they also want to "feel in control". They are, in fact, NOT in control when automation takes over manual tasks, but the illusory perception of control is important from a UI/UX perspective.

Even though QI was a powerful, high-performance tool that was used on at least 150 professional drama productions over the years, its users were never "happy"; they "tolerated" it because it saved them many hours of work per day. One big reason was its data-oriented UI, which required a "workflow expert" to configure it before use in a production. Another big reason was its lack of reporting capabilities that would have given them the illusion of control. We were simply unable to find a satisfactory UI/UX solution to our users' conflicting requirements:

  • Users want performance and automation with little interaction: this is best achieved with batched, asynchronous and parallel execution.

  • Users want the same kind of intuitive feedback about job state that they're used to getting from interactive, serial processes.

Our users got their job done and many work hours were saved (less overtime!), but they were never confident that QI had performed all the tasks it should have. This distrust persisted even after many successful workflow runs because they never felt that they were "in control".

Concepts

The basic concepts are nodes, which are connected through ports, over which they exchange messages. A connected set of nodes constitutes a graph. Receiving messages is handled implicitly by the base classes, whereas messages must be sent explicitly to an output port.
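The split between implicit receiving and explicit sending can be sketched with a small base class. This is an illustrative sketch under assumed names (`InputPort`, `OutputPort`, `Node`, `run_once`, `process`, and the example node `Upper`), not QI's actual class hierarchy; each port is backed by a thread-safe FIFO queue.

```python
import queue
from typing import Any, List


class InputPort:
    """Buffers incoming messages for the node that owns it."""

    def __init__(self) -> None:
        self.queue: "queue.Queue[Any]" = queue.Queue()


class OutputPort:
    """Delivers each sent message to every connected input port."""

    def __init__(self) -> None:
        self.targets: List[InputPort] = []

    def connect(self, target: InputPort) -> None:
        self.targets.append(target)

    def send(self, message: Any) -> None:
        for target in self.targets:
            target.queue.put(message)


class Node:
    """Base class: receiving is implicit, sending is explicit."""

    def __init__(self) -> None:
        self.input = InputPort()

    def run_once(self) -> None:
        # The base class takes the next message (blocking if none is queued)
        # and dispatches it; subclasses never touch the input queue directly.
        self.process(self.input.queue.get())

    def process(self, message: Any) -> None:
        raise NotImplementedError


class Upper(Node):
    """Example transform node with one output port."""

    def __init__(self) -> None:
        super().__init__()
        self.out = OutputPort()

    def process(self, message: Any) -> None:
        self.out.send(str(message).upper())  # explicit send to an output port


node = Upper()
sink = InputPort()
node.out.connect(sink)
node.input.queue.put("hello")
node.run_once()
result = sink.queue.get()
print(result)  # HELLO
```

A scheduler (one thread per node, or a task pool) would call `run_once` repeatedly; that part is left out here.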

Predefined abstract node types
Graph Nodes

A node can have at most one input port for receiving messages, and zero or more output ports for sending messages. Predefined (abstract) node types are shown in the figure below. Nodes with multiple input ports are not provided because 1) they were not needed to define real-world workflows, and 2) they admit multiple, equally valid, firing semantics ("AND": the node fires when there is a message on every port; "OR": the node fires when at least one port has a message).
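The ambiguity of multi-input nodes shows up as soon as the two firing rules are applied to the same arrivals. In this sketch, `and_fire` and `or_fire` are hypothetical helpers, and consuming exactly one message per port on each "AND" firing is an assumption; other variants are equally plausible, which is part of the problem.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Arrival = Tuple[str, int]  # (port name, message)


def and_fire(arrivals: List[Arrival], ports: List[str]) -> List[Dict[str, int]]:
    """AND: fire once every port holds a message, consuming one from each."""
    pending: Dict[str, Deque[int]] = {p: deque() for p in ports}
    firings = []
    for port, msg in arrivals:
        pending[port].append(msg)
        if all(pending[p] for p in ports):
            firings.append({p: pending[p].popleft() for p in ports})
    return firings


def or_fire(arrivals: List[Arrival]) -> List[Dict[str, int]]:
    """OR: fire on every message, whichever port it arrives on."""
    return [{port: msg} for port, msg in arrivals]


arrivals = [("a", 1), ("a", 2), ("b", 10)]
print(and_fire(arrivals, ["a", "b"]))  # [{'a': 1, 'b': 10}]
print(or_fire(arrivals))               # [{'a': 1}, {'a': 2}, {'b': 10}]
```

Under "AND", message 2 on port `a` is still waiting when the input ends; under "OR", the node fires three times. A single input port sidesteps the choice entirely.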

Every graph must have a single source node and a single drain node (a transform node with no outputs). The latter requirement is somewhat artificial but aids in validation before the graph is started:

  • Every output port of every node must be connected to at least one input port. The same output port may be connected to many (different) input ports.

  • Every input port of every node must have a connection from at least one output port.

These checks exist to ensure graph termination when there are no more messages to process.
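The pre-start checks can be sketched as a function that collects every violation instead of stopping at the first one. The `NodeSpec` structure and the `(from_node, from_port, to_node)` edge tuples are assumptions for illustration; because a node has at most one input port, the target node name is enough to identify an input port. The edge-endpoint checks at the end are an extra sanity check not listed in the text.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (from_node, from_port, to_node)


@dataclass
class NodeSpec:
    name: str
    has_input: bool                                   # False only for the source
    outputs: List[str] = field(default_factory=list)  # output port names


def validate(nodes: List[NodeSpec], edges: List[Edge]) -> List[str]:
    """Return a list of validation errors; an empty list means the graph is valid."""
    errors: List[str] = []
    sources = [n.name for n in nodes if not n.has_input]
    drains = [n.name for n in nodes if n.has_input and not n.outputs]
    if len(sources) != 1:
        errors.append(f"expected 1 source node, found {len(sources)}")
    if len(drains) != 1:
        errors.append(f"expected 1 drain node, found {len(drains)}")

    connected_outputs: Set[Tuple[str, str]] = {(f, p) for f, p, _ in edges}
    fed_inputs: Set[str] = {t for _, _, t in edges}
    for n in nodes:
        for port in n.outputs:
            if (n.name, port) not in connected_outputs:
                errors.append(f"output {n.name}.{port} is not connected")
        if n.has_input and n.name not in fed_inputs:
            errors.append(f"input of {n.name} has no incoming connection")

    # Edges must start at a declared output port and end at an input port.
    by_name = {n.name: n for n in nodes}
    for f, p, t in edges:
        if f not in by_name or p not in by_name[f].outputs:
            errors.append(f"edge from unknown output {f}.{p}")
        if t not in by_name or not by_name[t].has_input:
            errors.append(f"edge into {t}, which has no input port")
    return errors


nodes = [
    NodeSpec("src", has_input=False, outputs=["out"]),
    NodeSpec("copy", has_input=True, outputs=["ok", "failed"]),
    NodeSpec("drain", has_input=True),
]
edges = [("src", "out", "copy"), ("copy", "ok", "drain")]
print(validate(nodes, edges))  # ['output copy.failed is not connected']
```

Routing `copy.failed` to the drain as well (one more edge) makes the list empty, so every message that leaves the source has somewhere to go.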

See Also