Package Software in Cytomine-Core
Interactions between components
Data on Cytomine (annotations, properties, terms, ...) generally come from two different sources:
- In human usage, a user (e.g. an expert) identifies interesting structures on images and adds so-called user annotations or other content using the web client interface.
- It is sometimes very useful to get the help of a program to annotate or create new content automatically.
In the second case, a human user asks for the execution of an available program (Software). This execution (Job) is determined by its arguments (JobParameter), that is, values for the parameters required by the program (SoftwareParameter). The job is run on a particular execution platform (ProcessingServer) and interacts with Cytomine using the REST API, meaning that the job must authenticate itself. For this purpose, a new special user (UserJob) is created, which inherits the authorities of the human user who asked for the execution. To be available in a project, a software must be linked to it (SoftwareProject).
The files required to execute the software (algorithm's code, ...) are retrieved from a VCS (version control system such as Git) provider (e.g. GitHub) (SoftwareUserRepository).
The annotations produced by jobs are called algo annotations and belong to a user job.
|Software||A representation of an executable program.||SampleDetector: A script using adaptive threshold to identify samples in a whole-slide image|
|SoftwareParameter||A parameter of the given software.||Integer imageID|
|SoftwareParameterConstraint||A constraint that an instance of the software parameter (i.e. a job parameter) must satisfy.||imageID > 0|
|ParameterConstraint||A generic constraint to apply to a software parameter.||x > y|
|SoftwareUserRepository||A repository from a provider (e.g. GitHub) that contains the software code.||A GitHub account with the code of SampleDetector|
|SoftwareProject||A link to make a given software available in a specific project.||A link between SampleDetector and MyProject|
|Project||A Cytomine project, containing images, annotations, users and jobs.||MyProject: A project with whole-slide images|
|Job||A particular execution of a software launched by a user.||An execution at time T of SampleDetector|
|JobParameter||A value given for a specific parameter for a specific job.||imageID=123456|
|ProcessingServer||An execution server where the job will be launched.||Local-server|
|UserJob||A special user created for a job. All data generated by the job belongs to this special user. This user inherits authorities from the human user that launched the job.|
In short, it is essential to understand the distinction between a Software and a Job. Think of a Software as a program or a function, and of a Job as an instantiation of this program or function with arguments (parameter values) that is run on a particular machine, i.e. a ProcessingServer. This machine can be the local server (Cytomine server) or a remote one in a data center, in the cloud, etc.
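The Software/Job distinction can be illustrated with a minimal Python sketch. The class shapes and the `validate` helper below are hypothetical and only mirror the resource names used in this page; they are not Cytomine's actual domain classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SoftwareParameter:
    """A parameter declared by a software (hypothetical minimal shape)."""
    name: str
    kind: type
    required: bool = True


@dataclass(frozen=True)
class Software:
    """The program itself: a name, a version and the parameters it expects."""
    name: str
    version: str
    parameters: Tuple[SoftwareParameter, ...]


@dataclass
class Job:
    """One execution of a software: parameter values plus a target server."""
    software: Software
    processing_server: str
    values: Dict[str, object] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the job is well-formed."""
        problems = []
        for param in self.software.parameters:
            if param.name not in self.values:
                if param.required:
                    problems.append(f"missing value for {param.name}")
                continue
            if not isinstance(self.values[param.name], param.kind):
                problems.append(f"{param.name} must be of type {param.kind.__name__}")
        return problems


# One software, declared once...
detector = Software("SampleDetector", "1.0", (SoftwareParameter("imageID", int),))
# ...and one job: the same software with concrete values on a given server.
job = Job(detector, "Local-server", {"imageID": 123456})
print(job.validate())
```

The same `detector` object can back any number of jobs, each with its own parameter values and processing server.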
Software uniqueness is enforced on the pair (name, version), meaning that an algorithm with several versions results in several software entries with the same name but different versions. Each software should be tagged with a version number to ensure reproducibility of results. When a new version of a software is available, the previous version is set to deprecated and the new release is added to Cytomine as a new software. This new software is automatically added to the projects that have the previous version.
A software without a version number is only accepted while you are developing a new software. In this case, the software won't be executable, as the code is stored on your local machine.
|Software name||Software version||Comment|
|MySoft||(none)||Development version. Not executable.|
|MySoft||1.0||the first release of MySoft. Executable.|
|MySoft||1.1||the second release of MySoft. Executable. Deprecates MySoft (1.0)|
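The uniqueness and deprecation rules above can be sketched as a small in-memory registry. `SoftwareRegistry` and `SoftwareEntry` are hypothetical names; the sketch assumes that releases are added in chronological order (it does not compare version numbers) and that a missing version denotes a development version.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SoftwareEntry:
    name: str
    version: Optional[str]  # None stands for a development version
    deprecated: bool = False

    @property
    def executable(self) -> bool:
        # Assumption: only versioned releases carry an executable command.
        return self.version is not None


class SoftwareRegistry:
    """Hypothetical in-memory registry keyed by the pair (name, version)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Optional[str]], SoftwareEntry] = {}

    def add(self, name: str, version: Optional[str] = None) -> SoftwareEntry:
        key = (name, version)
        if key in self._entries:
            raise ValueError(f"{name} ({version}) already exists")
        if version is not None:
            # A new release deprecates every earlier release with the same name.
            for entry in self._entries.values():
                if entry.name == name and entry.version is not None:
                    entry.deprecated = True
        entry = SoftwareEntry(name, version)
        self._entries[key] = entry
        return entry

    def get(self, name: str, version: Optional[str] = None) -> SoftwareEntry:
        return self._entries[(name, version)]


registry = SoftwareRegistry()
registry.add("MySoft")          # development version
registry.add("MySoft", "1.0")   # first release
registry.add("MySoft", "1.1")   # second release, deprecates 1.0
```

After these three calls, the registry holds the same three rows as the table: the development version is not executable, 1.0 is deprecated, and 1.1 is the current release.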
The statuses that a software can take are:
|Status||Explanation||Web UI representation|
|executable||True if the software resource has an executable command. In practice, only software under development should not be executable.|
|deprecated||True if software with a given version is not the latest release.|
A Cytomine job lifecycle can be summarized with the following statuses:
|not launched||True if the job has been asked for execution from Cytomine-Core but Cytomine-software-router has not handled the request yet.|
|waiting||True if the job is being transferred to its processing server.|
|in queue||True if the job is in the queue of its processing server.|
|running||True if the job is running. It is the responsibility of the software maintainer to set the status to running in the running script.|
|success||True if the job has finished successfully.|
|error||True if an error occurred during execution (in queue or running).|
|killed||True if the job has been killed. Killing a job is only possible if it is in queue or running.|
When the status switches from running to success/error/killed, the log (standard output of the running job) is linked to the job as an attached file and can be downloaded from the web interface.
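The lifecycle can be read as a small state machine. The sketch below is an interpretation of the status table, not code taken from Cytomine: the transition map is inferred from the descriptions (errors and kills are only possible in queue or while running), and the names `ALLOWED`, `can_transition` and `is_final` are hypothetical.

```python
from enum import Enum


class JobStatus(Enum):
    NOT_LAUNCHED = "not launched"
    WAITING = "waiting"
    IN_QUEUE = "in queue"
    RUNNING = "running"
    SUCCESS = "success"
    ERROR = "error"
    KILLED = "killed"


# Transitions inferred from the status descriptions (an assumption).
ALLOWED = {
    JobStatus.NOT_LAUNCHED: {JobStatus.WAITING},
    JobStatus.WAITING: {JobStatus.IN_QUEUE},
    JobStatus.IN_QUEUE: {JobStatus.RUNNING, JobStatus.ERROR, JobStatus.KILLED},
    JobStatus.RUNNING: {JobStatus.SUCCESS, JobStatus.ERROR, JobStatus.KILLED},
    JobStatus.SUCCESS: set(),
    JobStatus.ERROR: set(),
    JobStatus.KILLED: set(),
}


def can_transition(current: JobStatus, target: JobStatus) -> bool:
    """True if the lifecycle allows moving from `current` to `target`."""
    return target in ALLOWED[current]


def is_final(status: JobStatus) -> bool:
    """True for statuses that no longer change (success, error, killed)."""
    return not ALLOWED[status]
```

For instance, `can_transition(JobStatus.WAITING, JobStatus.KILLED)` is false under this reading, because a job can only be killed once it is in queue or running.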
Software router architecture
The software router is a component external to Cytomine-Core and is responsible for:
- automating the addition of new software from trusted repositories and managing their new versions
- communicating with remote servers to manage running jobs
Communication between Cytomine-Core and Cytomine-software-router is done through queues using AMQP, an open standard protocol for message-oriented middleware.
Software management in software router
One of the major roles of the software router is to automate the addition of new algorithms. To accomplish this, a thread polls, at a defined interval, the list of built Docker containers (each container represents an algorithm). Older versions are replaced by the newer versions detected by this mechanism. If a new version is detected, the algorithm descriptor is retrieved from the corresponding GitHub repository and used to add the algorithm (execution command, arguments, …) to the Cytomine interface. The descriptor is an adaptation of existing Boutiques descriptors with additional features specific to Cytomine.
The images used to run a job are pulled from Docker Hub and converted to Singularity images, a technology that brings the benefits of containers to the HPC world. The images are refreshed at the same time as the algorithms and transferred to a specific processing server via SCP.
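The periodic check for new versions can be sketched as follows. This is only an illustration of the idea, not the router's implementation: `find_updates`, `poll_once`, the `list_images` callback and the assumption that versions are dotted numeric tags (optionally prefixed with `v`) are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as '1.10' or 'v1.10' into a comparable tuple (1, 10)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def find_updates(known: Dict[str, str],
                 available: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return {name: newest_tag} for algorithms that are new or have a newer tag."""
    newest: Dict[str, str] = {}
    for name, tag in available:
        if name not in newest or parse_version(tag) > parse_version(newest[name]):
            newest[name] = tag
    return {
        name: tag
        for name, tag in newest.items()
        if name not in known or parse_version(tag) > parse_version(known[name])
    }


def poll_once(known: Dict[str, str],
              list_images: Callable[[], Iterable[Tuple[str, str]]],
              on_update: Callable[[str, str], None]) -> Dict[str, str]:
    """Run one polling cycle and record the versions that were handled."""
    updates = find_updates(known, list_images())
    for name, tag in updates.items():
        on_update(name, tag)  # e.g. fetch the descriptor and register the software
        known[name] = tag
    return updates


# Stand-in for the remote listing; a real router would query the image registry.
def fake_listing():
    return [("SampleDetector", "1.0"), ("SampleDetector", "1.1"), ("CellCounter", "v0.2")]


known_versions = {"SampleDetector": "1.0"}
handled = []
updates = poll_once(known_versions, fake_listing, lambda n, t: handled.append((n, t)))
```

In a long-running thread, `poll_once` would be called in a loop with a sleep of the configured interval between cycles.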
Job management in software router
Algorithms can be executed on a default processing server or on another one. The architecture provides a SLURM container instance for local execution. For each execution demand, the request is sent from Cytomine-Core to the software router through a specific queue associated with the chosen processing server. The execution command is transformed by a processing method so that it can be understood by a specific type of processing server (GPU, CPU-only, …). The log file is retrieved via SCP and added as an attached file to the job domain.
The package ProcessingMethod contains all the implemented processing methods associated with processing servers. The role of this package is to build a specific command understandable by the targeted processing server. Currently, only the SLURM processing method is implemented, meaning that the router is able to convert a Cytomine job execution request into a regular SLURM job. The package has been designed to make it easy to add implementations for other processing methods, such as other job schedulers or related tools (Kubernetes, ...).
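The role of a processing method can be illustrated with a short Python sketch. The class names, the `build_command` signature, the job name and the output file name are hypothetical; only the `sbatch` options (`--job-name`, `--output`, `--wrap`) and `singularity run` are real commands, and the command actually built by the router may differ.

```python
import shlex
from abc import ABC, abstractmethod
from typing import List


class ProcessingMethod(ABC):
    """Turns a generic job request into a command for one kind of server."""

    @abstractmethod
    def build_command(self, job_id: int, image_path: str, arguments: List[str]) -> str:
        ...


class SlurmProcessingMethod(ProcessingMethod):
    """Wraps the container invocation into a SLURM batch submission."""

    def build_command(self, job_id: int, image_path: str, arguments: List[str]) -> str:
        # Command executed inside the allocation: run the Singularity image.
        inner = shlex.join(["singularity", "run", image_path, *arguments])
        # shlex.join quotes the --wrap value so it stays a single argument.
        return shlex.join([
            "sbatch",
            f"--job-name=cytomine-{job_id}",
            f"--output={job_id}.out",
            f"--wrap={inner}",
        ])


# Other schedulers would be added as further subclasses and registered here.
METHODS = {"slurm": SlurmProcessingMethod}

method = METHODS["slurm"]()
command = method.build_command(42, "SampleDetector.simg", ["--imageID", "123456"])
print(command)
```

Keeping the scheduler-specific details behind one abstract method means that supporting another job scheduler only requires a new subclass and a new entry in the lookup table.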