Pipelines
=========

Pipelines act as graph managers and lifecycle coordinators. A pipeline object
knows when to start nodes, how to connect them, and where to store their
state. It transforms a declarative configuration into a running, observable
dataflow system.

At its core, a Pipeline is a configuration interpreter. It doesn't contain
business logic, strictly speaking, so its only responsibility is to
instantiate nodes, wire them together according to a graph definition, and
manage their collective lifecycle. This separation of concerns is deliberate:
the pipeline handles topology, while nodes handle data processing and
transformation.

The easiest way to interact with Juturna pipelines is through ``JSON``
objects. The definition of a pipeline graph can be included in a plain
``JSON`` file, and the library will *compile* it into a pipeline instance.
When defining a pipeline, however, we still need to provide Juturna with a
few extra bits of information.

Configuration items
-------------------

The most basic pipeline configuration file looks something like this:

.. code-block:: json

   {
     "version": "1.0.1",
     "plugins": ["./plugins"],
     "pipeline": {
       "name": "my_awesome_pipeline",
       "id": "1234567890",
       "folder": "./running_pipelines",
       "nodes": [ ],
       "links": [ ]
     }
   }

``version`` refers to the Juturna version in use. This field must be
specified, even though it is likely to be removed in the future; as of now,
it is important to keep it to make sure the API version is correct and
nothing breaks apart.

``plugins`` is a collection of directories Juturna will look into when
automatically importing plugin nodes.

``name`` is a symbolic name to assign to the pipeline, used for logging and
for static file and artifact management.

``id`` is a unique pipeline identifier, not as handy as the pipeline name but
still required by other tools that might wrap Juturna and manage multiple
pipelines at once. With those tools, pipeline ids are likely to be assigned
automatically in order to prevent overlaps.

``folder`` is the path to the folder where the required pipeline tree will be
created (this is where any files generated by the pipeline are stored).

``nodes`` is the list of nodes composing the pipeline.

``links`` is the list of connections between node pairs.

Once we have this configuration in place, we can go ahead and create the
actual pipeline object.

.. code-block:: python

   import juturna as jt

   pipe = jt.components.Pipeline.from_json('path/to/config.json')

The ``Pipeline`` class constructor accepts a dictionary object:

.. code-block:: python

   import json

   import juturna as jt

   with open('path/to/config.json') as f:
       config = json.load(f)

   pipe = jt.components.Pipeline(config)

Pipeline lifecycle
------------------

Instantiating a pipeline object doesn't really do much: an empty pipeline
container is created, holding all the instructions required to build the
actual nodes and link them together.

A pipeline object can be in only three states:

- a ``NEW`` pipeline is freshly created, and can't really do much;
- a ``READY`` pipeline has instantiated all its concrete nodes and acquired
  the system and external resources required for its functioning - as the
  state name says, this pipeline is ready to go;
- a ``RUNNING`` pipeline is consuming data sources, processing data and
  sending it downstream, so it is working (hopefully) just how it is
  supposed to.

.. image:: ../_static/img/pipeline_lifecycle.svg
   :alt: pipeline
   :width: 80%
   :align: center
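To make these transitions concrete, here is a minimal sketch of a pipeline
moving from ``NEW`` to ``RUNNING``, using the lifecycle methods detailed
below:

.. code-block:: python

   import juturna as jt

   # a freshly created pipeline starts in the NEW state
   pipe = jt.components.Pipeline.from_json('path/to/config.json')

   pipe.warmup()  # NEW -> READY: nodes are instantiated and linked
   pipe.start()   # READY -> RUNNING: data starts flowing through the graph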
There are four main methods to manage a pipeline lifecycle:

``warmup()`` triggers node instantiation and linking, moving a pipeline from
``NEW`` to ``READY``. Internally, this calls the warm up method on every node
in the pipeline. Depending on the resources nodes need to acquire (learning
models, third-party services, network connections), calling ``warmup`` can be
time-consuming.

``start()`` triggers the beginning of the pipeline workflow, moving its state
from ``READY`` to ``RUNNING``. Internally, this calls the start method on
every node in the pipeline.

``stop()`` interrupts the pipeline execution, moving it from ``RUNNING`` back
to ``READY``. Again, this call is propagated to every node in the pipe.

``destroy()`` can be invoked when any kind of custom memory management should
be performed by any of the pipeline's composing nodes.

Any call sequence that does not respect the state transitions discussed here
will raise an exception:

#. only a new pipe can be warmed up;
#. only a ready pipe can be started;
#. a running pipeline cannot be destroyed.

.. admonition:: Warming up and configuring
   :class: note

   A node defines two separate methods to get up and running: ``warmup`` and
   ``configure``. Ideally, you want to use the ``configure`` method to
   acquire resources that come from outside the node (ports, connections),
   and so do not depend solely on the node implementation, and ``warmup`` to
   prepare the node's internal status for execution.

.. admonition:: Restarting a pipeline might catch fire (|version|-|release|)
   :class: attention

   Please be aware that the pipeline lifecycle and state transitions do not
   currently support restarting a stopped pipeline. When designing your
   workflow, please assume **pipelines cannot be restarted once stopped,
   only destroyed**. If you need to stop a pipeline and later restart it,
   destroy it and create a new pipe instead, as shown in the sketch below.
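To tie everything together, here is a minimal sketch of the complete, one-way
lifecycle, including the destroy-and-rebuild pattern recommended above; it
assumes the original configuration file is still available on disk.

.. code-block:: python

   import juturna as jt

   pipe = jt.components.Pipeline.from_json('path/to/config.json')

   pipe.warmup()   # NEW -> READY
   pipe.start()    # READY -> RUNNING

   # ... the pipeline consumes, processes, and forwards data ...

   pipe.stop()     # RUNNING -> READY
   pipe.destroy()  # node-level cleanup; this pipeline cannot be reused

   # to "restart", build a fresh pipeline from the same configuration
   pipe = jt.components.Pipeline.from_json('path/to/config.json')
   pipe.warmup()
   pipe.start()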