# Task Execution
WARNING
App Engine and its Task system are still in BETA. Therefore, all information presented in this documentation is subject to potentially breaking changes.
# Execution Flow
App-engine executes an uploaded task in the following steps:
- Create a task `Run`, which is an isolated context for the task.
- Provision the inputs of the task.
- Run the provisioned `Run`.
- The task processes the inputs and generates outputs.
- Use APIs to retrieve the outputs generated by the task from app-engine storage.
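The ordering of these steps can be pictured as a small state machine. The sketch below is only a toy model: the state names (`CREATED`, `PROVISIONED`, `RUNNING`, `FINISHED`) and the `TaskRun` class are hypothetical and are not taken from the app-engine API; they simply encode that a step cannot be skipped.

```python
from enum import Enum


class RunState(Enum):
    # Hypothetical state names; the documentation only describes the steps.
    CREATED = "created"
    PROVISIONED = "provisioned"
    RUNNING = "running"
    FINISHED = "finished"


# Allowed transitions, following the order of the steps listed above.
NEXT_STATE = {
    RunState.CREATED: RunState.PROVISIONED,
    RunState.PROVISIONED: RunState.RUNNING,
    RunState.RUNNING: RunState.FINISHED,
}


class TaskRun:
    """Toy model of a task Run: an isolated context holding inputs and outputs."""

    def __init__(self, task_name):
        self.task_name = task_name
        self.state = RunState.CREATED
        self.inputs = {}
        self.outputs = {}

    def _advance(self, expected):
        # Refuse to move on unless the run is in the state this step requires.
        if self.state is not expected:
            raise RuntimeError(
                f"expected state {expected.name}, but run is {self.state.name}"
            )
        self.state = NEXT_STATE[expected]

    def provision(self, inputs):
        self._advance(RunState.CREATED)
        self.inputs = dict(inputs)

    def start(self):
        self._advance(RunState.PROVISIONED)

    def finish(self, outputs):
        self._advance(RunState.RUNNING)
        self.outputs = dict(outputs)

    def fetch_outputs(self):
        if self.state is not RunState.FINISHED:
            raise RuntimeError("outputs are only available once the run has finished")
        return dict(self.outputs)
```

Calling `start()` on a freshly created run raises `RuntimeError`, which mirrors the requirement that inputs are provisioned before the run is launched.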
# Data Provisioning
Tasks receive their inputs and write their outputs as files and directories within the Task container, specifically in an input folder (e.g., `/inputs`) and an output folder (e.g., `/outputs`). The data is provisioned to app-engine storage either from Cytomine or directly through the input provisioning API.
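From the task's point of view, this contract amounts to reading files from one directory and writing files to another. The following is a minimal sketch of such a task body, assuming the `/inputs` and `/outputs` folder names given as examples above; the function name `run_task` and the `.size` output naming are made up for illustration.

```python
from pathlib import Path


def run_task(input_dir="/inputs", output_dir="/outputs"):
    """Read every input file and write its byte size to a matching output file."""
    inputs = Path(input_dir)
    outputs = Path(output_dir)
    outputs.mkdir(parents=True, exist_ok=True)
    written = []
    for item in sorted(inputs.iterdir()):
        if not item.is_file():
            continue  # this toy task ignores sub-directories
        result = outputs / f"{item.name}.size"
        result.write_text(str(item.stat().st_size))
        written.append(result.name)
    return written
```

The task never talks to app-engine directly in this sketch: everything it needs is already on disk when it starts, and everything it produces is left on disk when it exits.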
# Task Execution
App-engine schedules tasks to run in a configured Kubernetes cluster. When Cytomine is deployed using Docker Compose, it uses k3s to run the tasks; when it is deployed using Helm, it schedules the tasks to the same Kubernetes cluster. Cytomine can be deployed on a single machine using Docker Compose or into a multi-node Kubernetes cluster, and based on these two deployment modes app-engine can manage provisioning in two ways:
- Local mode: for single machine deployment using docker compose
- Cluster mode: for multi-node deployment
# Local Mode
In this mode, app-engine shares data with the task scheduled in k3s using volumes, which optimizes data sharing speed: the task reads from and writes to app-engine storage directly.
To enable this mode, make these entries in the Docker Compose YAML file `cytomine/compose.yml`:
WARNING
Make sure all four paths are the same, otherwise the data sharing fails. The variable `AE_DATA_PATH` should be an absolute path to the needed data and must be written in the `.env` file.
- The environment variable `SCHEDULER_RUN_MODE` has two possible values, `local` or `cluster`; make sure to set it to `local`.
- In app-engine, make sure `STORAGE_BASE_PATH` and `RUN_STORAGE_BASE_PATH` both point to the absolute path of the app-engine directory within the Cytomine data directory.
- In app-engine, define a bind mount to the absolute path of the app-engine directory within the Cytomine data directory; make sure both host and container paths are absolute and matching.
- In k3s, define a bind mount to the absolute path of the app-engine directory within the Cytomine data directory; make sure both host and container paths are absolute and matching.
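Because a mismatch between these paths fails silently until a task runs, it can help to check them before starting the stack. The sketch below is one possible check, not part of Cytomine: it assumes the two services have already been loaded as plain dictionaries (for example by parsing `compose.yml` with a YAML library), and it interprets "four paths" as the two environment variables plus the container-side path of the bind mount in each service. The helper names `split_volume` and `check_local_mode` are hypothetical.

```python
def split_volume(volume):
    """Split a short-syntax volume 'source:target[:mode]' into (source, target)."""
    # Strip an optional access mode first, so that a ':' inside a
    # '${VAR:-default}' expression in the source is not mistaken for it.
    for mode in (":ro", ":rw"):
        if volume.endswith(mode):
            volume = volume[: -len(mode)]
            break
    source, target = volume.rsplit(":", 1)
    return source, target


def check_local_mode(app_engine, k3s):
    """Return a list of problems; an empty list means the settings are consistent."""
    env = app_engine["environment"]
    problems = []
    if env.get("SCHEDULER_RUN_MODE") != "local":
        problems.append("SCHEDULER_RUN_MODE is not 'local'")
    storage = env.get("STORAGE_BASE_PATH")
    if storage != env.get("RUN_STORAGE_BASE_PATH"):
        problems.append("STORAGE_BASE_PATH and RUN_STORAGE_BASE_PATH differ")
    # Both services must mount something at the storage path.
    for name, service in (("app-engine", app_engine), ("k3s", k3s)):
        targets = [split_volume(volume)[1] for volume in service["volumes"]]
        if storage not in targets:
            problems.append(f"{name} has no volume mounted at {storage}")
    return problems
```

The check only looks at container-side paths; whether the host-side paths resolve to the same directory depends on the values of the variables in the `.env` file.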
# Example
For the app-engine:
```yaml
app-engine:
  image: cytomine/app-engine:latest
  restart: unless-stopped
  depends_on:
    - postgis
    - registry
    - k3s
  volumes:
    - ${DATA_PATH:-./data}/app-engine:/data
    - ${PWD}/.kube/shared:/root/.kube:ro
    - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
    - ${DATA_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:rw
  environment:
    API_PREFIX: ${APP_ENGINE_API_BASE_PATH:-/app-engine/}
    DB_HOST: postgis
    DB_NAME: appengine
    DB_PASSWORD: password
    DB_PORT: 5432
    DB_USERNAME: appengine
    REGISTRY_URL: http://registry:5000
    STORAGE_BASE_PATH: /app-engine-shared-inputs
    RUN_STORAGE_BASE_PATH: /app-engine-shared-inputs
    REF_STORAGE_BASE_PATH: /app-engine-shared-datasets
    SCHEDULER_RUN_MODE: local # values: local, cluster
    SCHEDULER_ADVERTISED_URL: http://172.16.238.10:8080
    SCHEDULER_REGISTRY_ADVERTISED_URL: 172.16.238.4:5000
    SCHEDULER_USE_HOST_NETWORK: true
    SCHEDULER_TASKS_NAMESPACE: app-engine-tasks
    KUBERNETES_MASTER: https://k3s:6443/
  networks:
    host_network:
      ipv4_address: 172.16.238.10
```
For k3s:
```yaml
k3s:
  image: "rancher/k3s:v1.30.14-rc3-k3s3"
  command: server
  tmpfs:
    - /run
    - /var/run
  ulimits:
    nproc: 65535
    nofile:
      soft: 65535
      hard: 65535
  privileged: true
  restart: always
  environment:
    - K3S_TOKEN=65535-65535-65535 # plain text secret!
    - K3S_KUBECONFIG_OUTPUT=/output/config
    - K3S_KUBECONFIG_MODE=666
  volumes:
    - k3s-server:/var/lib/rancher/k3s
    - ./k3s/registries.yaml:/etc/rancher/k3s/registries.yaml:ro
    - ./k3s/serviceaccounts.yaml://var/lib/rancher/k3s/server/manifests/serviceaccounts.yaml:ro
    - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
    - ${DATA_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:ro
    # This is just so that we get the kubeconfig file out
    - .kube/shared:/output/
  ports:
    - 6443:6443
  networks:
    host_network:
      ipv4_address: 172.16.238.15
```
# Cluster Mode
When deploying on multiple machines, app-engine and the task might run on different nodes, so sharing the same data without data movement is not possible. In this case, init containers pull the inputs from app-engine storage via API calls and, once the inputs are processed, the outputs are pushed back to app-engine storage. This data transfer impacts performance due to heavy I/O. To enable cluster mode, set `SCHEDULER_RUN_MODE` to `cluster`.
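The cost of cluster mode comes from the fact that every input and every output crosses the boundary between app-engine storage and the node running the task. The sketch below imitates that round trip with local directories: plain directory copies stand in for the API transfers, and the function name `run_in_cluster_mode` and the `inputs`/`outputs` sub-directory layout are assumptions made for illustration only.

```python
import shutil
from pathlib import Path


def run_in_cluster_mode(storage_dir, node_dir, task):
    """Copy inputs to the node, run the task there, then copy outputs back.

    Directory copies stand in for the API transfers used in cluster mode.
    """
    storage, node = Path(storage_dir), Path(node_dir)
    # "Pull": inputs leave app-engine storage and land on the task's node.
    shutil.copytree(storage / "inputs", node / "inputs")
    (node / "outputs").mkdir()
    # The task only ever sees its node-local copies.
    task(node / "inputs", node / "outputs")
    # "Push": outputs travel back to app-engine storage.
    shutil.copytree(node / "outputs", storage / "outputs")
    return sorted(path.name for path in (storage / "outputs").iterdir())
```

In local mode both copy steps disappear, because the task reads and writes the shared volume directly.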
# Datasets Referencing
WARNING
Currently, dataset referencing only works in local mode.
Tasks can process large datasets of images or files, but provisioning them one by one or sending them over the wire is extremely slow and I/O intensive, so using references optimizes input provisioning. To handle a dataset, app-engine supports an array data type to create arrays of other types. Provided the dataset is in a directory on the same machine, it can be accessed by both app-engine and the task running in k3s by sharing the data using volumes.
WARNING
Not to be confused with app-engine storage: the dataset is simply a directory of images somewhere on the machine, and app-engine does not use it except for referencing.
Given a dataset of files or images in a directory somewhere on the machine, app-engine creates symbolic links to the dataset items to reference them in both storage and the running task. To set it up, do the following:
WARNING
Make sure all three paths are the same, otherwise the data sharing fails because the symbolic links require matching absolute paths.
- In app-engine, define a bind mount to the absolute path of the dataset directory; make sure both host and container paths are absolute and matching.
- In k3s, define a bind mount to the absolute path of the dataset directory; make sure both host and container paths are absolute and matching.
- In app-engine, set `REF_STORAGE_BASE_PATH` to the absolute path of the dataset directory.
- Local mode is set as described above.
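The reason the paths must match is that a symbolic link stores its target as a path string, so the link only resolves in an environment where that same absolute path exists. The sketch below shows the idea with Python's standard library; it is not app-engine's own code, and the function name `reference_dataset_item` is hypothetical.

```python
import os
from pathlib import Path


def reference_dataset_item(item_path, storage_dir, name):
    """Create a symbolic link in storage_dir that points at a dataset item."""
    item = Path(item_path)
    if not item.is_absolute():
        raise ValueError("dataset items must be referenced by absolute path")
    link = Path(storage_dir) / name
    # The link records the absolute target path; no file data is copied.
    os.symlink(item, link)
    return link
```

Reading through the returned link yields the content of the original file, while `os.readlink` on it returns the absolute path that was passed in. If a container mounts the dataset at a different path, the link is still present there but points to a location that does not exist.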
# Example
For the app-engine:
```yaml
app-engine:
  image: cytomine/app-engine:latest
  restart: unless-stopped
  depends_on:
    - postgis
    - registry
    - k3s
  volumes:
    - ${DATA_PATH:-./data}/app-engine:/data
    - ${PWD}/.kube/shared:/root/.kube:ro
    - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
    - ${DATASET_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:ro
  environment:
    API_PREFIX: ${APP_ENGINE_API_BASE_PATH:-/app-engine/}
    DB_HOST: postgis
    DB_NAME: appengine
    DB_PASSWORD: password
    DB_PORT: 5432
    DB_USERNAME: appengine
    REGISTRY_URL: http://registry:5000
    STORAGE_BASE_PATH: /app-engine-shared-inputs
    RUN_STORAGE_BASE_PATH: /app-engine-shared-inputs
    REF_STORAGE_BASE_PATH: /app-engine-shared-datasets
    SCHEDULER_RUN_MODE: local # values: local, cluster
    SCHEDULER_ADVERTISED_URL: http://172.16.238.10:8080
    SCHEDULER_REGISTRY_ADVERTISED_URL: 172.16.238.4:5000
    SCHEDULER_USE_HOST_NETWORK: true
    SCHEDULER_TASKS_NAMESPACE: app-engine-tasks
    KUBERNETES_MASTER: https://k3s:6443/
  networks:
    host_network:
      ipv4_address: 172.16.238.10
```
For k3s:
```yaml
k3s:
  image: "rancher/k3s:v1.30.14-rc3-k3s3"
  command: server
  tmpfs:
    - /run
    - /var/run
  ulimits:
    nproc: 65535
    nofile:
      soft: 65535
      hard: 65535
  privileged: true
  restart: always
  environment:
    - K3S_TOKEN=65535-65535-65535 # plain text secret!
    - K3S_KUBECONFIG_OUTPUT=/output/config
    - K3S_KUBECONFIG_MODE=666
  volumes:
    - k3s-server:/var/lib/rancher/k3s
    - ./k3s/registries.yaml:/etc/rancher/k3s/registries.yaml:ro
    - ./k3s/serviceaccounts.yaml://var/lib/rancher/k3s/server/manifests/serviceaccounts.yaml:ro
    - ${AE_DATA_PATH:-./data}/app-engine-shared-inputs:/app-engine-shared-inputs:rw
    - ${DATASET_PATH:-./data}/app-engine-shared-datasets:/app-engine-shared-datasets:ro
    # This is just so that we get the kubeconfig file out
    - .kube/shared:/output/
  ports:
    - 6443:6443
  networks:
    host_network:
      ipv4_address: 172.16.238.15
```
To provision an input of type array and subtype image, pass the absolute paths of the array items as follows:
```json
{
  "param_name": "input",
  "value": [
    {
      "index": 0,
      "value": "/home/some-user/dataset/image__4er5tgf4.tif"
    },
    {
      "index": 1,
      "value": "/home/some-user/dataset/image__3c61911a.tif"
    }
  ]
}
```
The path in each `value` should match the `DATASET_PATH` variable.
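For a large dataset, writing this body by hand is impractical, so it can be generated from the directory listing. The sketch below builds the same structure as the example above; the function name `build_array_provision` and the `*.tif` default pattern are illustrative assumptions, and sending the body to the provisioning API is left out because the endpoint is not described here.

```python
from pathlib import Path


def build_array_provision(param_name, dataset_dir, pattern="*.tif"):
    """Build the provisioning body for an array input from a dataset directory."""
    directory = Path(dataset_dir)
    if not directory.is_absolute():
        raise ValueError("dataset_dir must be an absolute path")
    # Sort so that the index assigned to each file is reproducible.
    items = sorted(directory.glob(pattern))
    return {
        "param_name": param_name,
        "value": [
            {"index": index, "value": str(path)}
            for index, path in enumerate(items)
        ],
    }
```

The resulting dictionary can be serialized with `json.dumps` before it is sent.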