# Task Execution

WARNING

App Engine and its Task system are still in BETA. Therefore, all information presented in this documentation is subject to potentially breaking changes.

# Execution Flow

Executing an uploaded task in app-engine follows these steps:

  1. Create a task Run, which is an isolated context for the task.
  2. Provision the inputs of the task.
  3. Start the provisioned Run.
  4. The task processes the inputs and generates outputs.
  5. Use the APIs to retrieve the outputs generated by the task from app-engine storage.
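
The flow above can be summarized as a request sequence. The endpoint paths below are illustrative placeholders only, not the actual app-engine API routes; consult the API reference for the real ones:

```
POST   /task-runs                        # 1. create an isolated Run
PUT    /task-runs/{id}/inputs/{param}    # 2. provision each input
POST   /task-runs/{id}/start             # 3. start the Run
GET    /task-runs/{id}/status            # 4. wait for the task to finish
GET    /task-runs/{id}/outputs/{param}   # 5. retrieve the generated outputs
```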

# Data Provisioning

Tasks receive their inputs and write their outputs as files and directories within the task container, specifically in an input folder (e.g., /inputs) and an output folder (e.g., /outputs). The data is provisioned to app-engine storage from Cytomine or directly through the input provisioning API.
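
From the task's point of view, the container filesystem might look like this (the folder and file names are illustrative examples, not fixed paths):

```
/inputs/            # provisioned by app-engine before the Run starts
    my_image.tif    # one entry per input parameter (illustrative name)
/outputs/           # written by the task, then collected into app-engine storage
    result.tif      # illustrative output file
```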

# Task Execution

App-engine schedules tasks to run in a configured Kubernetes cluster. When Cytomine is deployed using Docker Compose, app-engine uses k3s (opens new window) to run the tasks; when Cytomine is deployed using Helm (opens new window), app-engine schedules the tasks to the same Kubernetes cluster. Cytomine can therefore be deployed either on a single machine using Docker Compose or into a multi-node Kubernetes cluster, and based on these two deployment modes app-engine can manage provisioning in two ways:

  1. Local mode: for single-machine deployments using Docker Compose.
  2. Cluster mode: for multi-node deployments.

# Local Mode

In this mode, app-engine shares data with the task scheduled in k3s using volumes to optimize data-sharing speed: the task reads and writes directly from app-engine storage.

To enable this mode, add the following entries to the Docker Compose file, cytomine/compose.yml:

WARNING

Make sure all four paths are the same, otherwise data sharing fails. The variable AE_DATA_PATH should be an absolute path to the needed data and must be defined in the .env file.

  1. The environment variable SCHEDULER_RUN_MODE has two possible values, local and cluster; make sure it is set to local.
  2. In app-engine, make sure STORAGE_BASE_PATH and RUN_STORAGE_BASE_PATH both point to the absolute path of the app-engine directory within the Cytomine data directory.
  3. In app-engine, define a bind mount to the absolute path of the app-engine directory within the Cytomine data directory; make sure both the host and container paths are absolute and matching.
  4. In k3s, define a bind mount to the absolute path of the app-engine directory within the Cytomine data directory; make sure both the host and container paths are absolute and matching.
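
As a minimal sketch, AE_DATA_PATH could be defined in the .env file like this (the path is illustrative; point it at the app-engine directory within your Cytomine data directory):

```
# .env (illustrative; AE_DATA_PATH must be an absolute path)
AE_DATA_PATH=/home/cytomine/cytomine-data/app-engine
```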

# Example

For the app-engine:

```yaml
app-engine:
    image: cytomine/app-engine:latest
    restart: unless-stopped
    depends_on:
      - postgis
      - registry
      - k3s
    volumes:
      - ${DATA_PATH:-./data}/app-engine:/data
      - ${PWD}/.kube/shared:/root/.kube:ro
      - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
      - ${DATA_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:rw
    environment:
      API_PREFIX: ${APP_ENGINE_API_BASE_PATH:-/app-engine/}
      DB_HOST: postgis
      DB_NAME: appengine
      DB_PASSWORD: password
      DB_PORT: 5432
      DB_USERNAME: appengine
      REGISTRY_URL: http://registry:5000
      STORAGE_BASE_PATH: /app-engine-shared-inputs
      RUN_STORAGE_BASE_PATH: /app-engine-shared-inputs
      REF_STORAGE_BASE_PATH: /app-engine-shared-datasets
      SCHEDULER_RUN_MODE: local # values: local, cluster
      SCHEDULER_ADVERTISED_URL: http://172.16.238.10:8080
      SCHEDULER_REGISTRY_ADVERTISED_URL: 172.16.238.4:5000
      SCHEDULER_USE_HOST_NETWORK: true
      SCHEDULER_TASKS_NAMESPACE: app-engine-tasks
      KUBERNETES_MASTER: https://k3s:6443/
    networks:
      host_network:
        ipv4_address: 172.16.238.10
```

For k3s:

```yaml
k3s:
    image: "rancher/k3s:v1.30.14-rc3-k3s3"
    command: server
    tmpfs:
      - /run
      - /var/run
    ulimits:
      nproc: 65535
      nofile:
        soft: 65535
        hard: 65535
    privileged: true
    restart: always
    environment:
      - K3S_TOKEN=65535-65535-65535 # plain text secret!
      - K3S_KUBECONFIG_OUTPUT=/output/config
      - K3S_KUBECONFIG_MODE=666
    volumes:
      - k3s-server:/var/lib/rancher/k3s
      - ./k3s/registries.yaml:/etc/rancher/k3s/registries.yaml:ro
      - ./k3s/serviceaccounts.yaml:/var/lib/rancher/k3s/server/manifests/serviceaccounts.yaml:ro
      - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
      - ${DATA_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:ro
      # This is just so that we get the kubeconfig file out
      - .kube/shared:/output/
    ports:
      - 6443:6443
    networks:
      host_network:
        ipv4_address: 172.16.238.15
```

# Cluster Mode

When deploying on multiple machines, app-engine and a task might run on different nodes, so sharing the same data without data movement is not possible. In this case, init containers pull the inputs from app-engine storage via API calls and, once the task has finished, push the outputs back to app-engine storage. This data transfer impacts performance due to the heavy I/O involved. To enable cluster mode, set SCHEDULER_RUN_MODE to cluster.
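
As a sketch, switching to cluster mode only requires changing the scheduler mode in the app-engine environment; the shared-volume entries used by local mode are then not needed, since data moves through API calls instead (this excerpt is an assumption based on the local-mode example above):

```yaml
app-engine:
    environment:
      SCHEDULER_RUN_MODE: cluster # values: local, cluster
```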

# Datasets Referencing

WARNING

Currently, dataset referencing only works in local mode.

Tasks can process large datasets of images or files, but provisioning them one by one or sending them over the wire is extremely slow and I/O intensive, so using references optimizes input provisioning. To handle a dataset, app-engine supports an array data type that creates arrays of other types. Given that the dataset is in a directory on the same machine, it can be accessed both by app-engine and by the task running in k3s by sharing the data using volumes.

WARNING

Not to be confused with app-engine storage: the dataset is simply a directory of images somewhere on the machine, and it is not used by app-engine except for referencing.

Given a dataset of files or images in a directory somewhere on the machine, app-engine creates symbolic links to the dataset items to reference them in both storage and the running task. To set it up, do the following:

WARNING

Make sure all three paths are the same, otherwise data sharing fails because the symbolic links require matching absolute paths.

  1. In app-engine, define a bind mount to the absolute path of the dataset directory; make sure both the host and container paths are absolute and matching.
  2. In k3s, define a bind mount to the absolute path of the dataset directory; make sure both the host and container paths are absolute and matching.
  3. In app-engine, set REF_STORAGE_BASE_PATH to the absolute path of the dataset directory.
  4. Local mode is set up as described above.
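
Stripped of the compose variables, the three matching paths can be sketched as follows (the dataset path /data/my-dataset is illustrative):

```yaml
app-engine:
    volumes:
      - /data/my-dataset:/data/my-dataset:ro   # host and container paths match
    environment:
      REF_STORAGE_BASE_PATH: /data/my-dataset  # same absolute path
k3s:
    volumes:
      - /data/my-dataset:/data/my-dataset:ro   # same path inside k3s
```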

# Example

```yaml
app-engine:
    image: cytomine/app-engine:latest
    restart: unless-stopped
    depends_on:
      - postgis
      - registry
      - k3s
    volumes:
      - ${DATA_PATH:-./data}/app-engine:/data
      - ${PWD}/.kube/shared:/root/.kube:ro
      - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
      - ${DATASET_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:ro
    environment:
      API_PREFIX: ${APP_ENGINE_API_BASE_PATH:-/app-engine/}
      DB_HOST: postgis
      DB_NAME: appengine
      DB_PASSWORD: password
      DB_PORT: 5432
      DB_USERNAME: appengine
      REGISTRY_URL: http://registry:5000
      STORAGE_BASE_PATH: /app-engine-shared-inputs
      RUN_STORAGE_BASE_PATH: /app-engine-shared-inputs
      REF_STORAGE_BASE_PATH: /app-engine-shared-datasets
      SCHEDULER_RUN_MODE: local # values: local, cluster
      SCHEDULER_ADVERTISED_URL: http://172.16.238.10:8080
      SCHEDULER_REGISTRY_ADVERTISED_URL: 172.16.238.4:5000
      SCHEDULER_USE_HOST_NETWORK: true
      SCHEDULER_TASKS_NAMESPACE: app-engine-tasks
      KUBERNETES_MASTER: https://k3s:6443/
    networks:
      host_network:
        ipv4_address: 172.16.238.10
```

For k3s:

```yaml
k3s:
    image: "rancher/k3s:v1.30.14-rc3-k3s3"
    command: server
    tmpfs:
      - /run
      - /var/run
    ulimits:
      nproc: 65535
      nofile:
        soft: 65535
        hard: 65535
    privileged: true
    restart: always
    environment:
      - K3S_TOKEN=65535-65535-65535 # plain text secret!
      - K3S_KUBECONFIG_OUTPUT=/output/config
      - K3S_KUBECONFIG_MODE=666
    volumes:
      - k3s-server:/var/lib/rancher/k3s
      - ./k3s/registries.yaml:/etc/rancher/k3s/registries.yaml:ro
      - ./k3s/serviceaccounts.yaml:/var/lib/rancher/k3s/server/manifests/serviceaccounts.yaml:ro
      - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
      - ${DATASET_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:ro
      # This is just so that we get the kubeconfig file out
      - .kube/shared:/output/
    ports:
      - 6443:6443
    networks:
      host_network:
        ipv4_address: 172.16.238.15
```

To provision an input of type array and subtype image, pass the absolute paths of the array items as follows:

```json
{
    "param_name": "input",
    "value": [
        {
            "index": 0,
            "value": "/home/some-user/dataset/image__4er5tgf4.tif" // <-- must match the DATASET_PATH variable
        },
        {
            "index": 1,
            "value": "/home/some-user/dataset/image__3c61911a.tif" // <-- must match the DATASET_PATH variable
        }
    ]
}
```
Last Updated: 10/23/2025, 9:14:57 AM