
In this how-to, we will see how to develop and share a new Cytomine software step by step. A software is an "external" code snippet (e.g. written in Python) that can manipulate Cytomine data, process images, create annotations, etc. In Cytomine, such a software is encapsulated into a reproducible software environment (using container technologies) with versioning (using git tags). In addition, an automated build configuration allows new software releases to be deployed automatically on Cytomine servers.

The following quick start summary illustrates the process to add such a software to a Cytomine instance:


Step 1 - Requirements

In order to keep track of the software version history and for the sake of result reproducibility, a Cytomine software has to be linked to a repository of a version control system (VCS) such as Github (currently the only one supported by Cytomine) and to an image building system such as Docker Hub (currently the only one supported by Cytomine).

If you don't have a Github account yet, please create one. If you develop your software for an organization (your lab, your company, ...), you can create a Github organization (a kind of user group) and enroll Github users (e.g. your lab members) into it. Please refer to the Github documentation.

In order to be reproducible on any platform, a Cytomine software shares code and execution environment. While the code sharing is done through Github, execution environment is shared on a Docker registry, a platform to share Docker images. Currently, Cytomine only supports Docker Hub as Docker registry.

Please create an account on Docker Hub, create an organization if necessary and link your Github account with your Docker Hub account. Linking these 2 accounts allows Github to automatically ask DockerHub to update the built image when files on Github change. Read the following tutorial if necessary:

Platform usage can be summarized with the following table:

Source                                         | Compiled              | Platform type   | Platform example
Algorithm code files                           | (depends on language) | VCS             | Github
Execution environment description (Dockerfile) | Docker image          | Docker registry | Docker Hub

In this tutorial, the Github username will be urubens, linked to the Github organization Cytomine-ULiege and the Docker Hub organization cytomineuliege (Docker Hub accepts neither uppercase letters nor hyphens in organization names).


In order to follow this tutorial, you also need to install:

  1. Cytomine Python client on your machine. Follow the installation instructions on the page Data access using Python client
  2. Docker CE (to test your Dockerfile) on your machine. Follow the installation instructions on the official documentation (see "Install Docker CE").

Step 2 - Github & DockerHub setup

First, create a Github repository to store the source code of your software (if you don't have experience with Github, please read this guide). 

The repository name is important!

For a software to be added automatically, Cytomine crawls your Github userspace (personal or organization) and registers as software all the repositories whose names begin with a given prefix.

After this optional prefix, the software name must be given.

Thus, the repository name is "${prefix}${softwareName}".

An administrator of the Cytomine instance where the software is to be automatically installed has to trust the userspace or organization space where the software is available. Check [HOWTO] Manage software and servers for administrators.

Other option: If you develop the software in a Github userspace A and you want it available in a Github userspace B which is trusted, you can simply fork the repository into B space (and re-create an automated build for the forked repository).


In your Github space, click on "New" and follow the procedure to create a repository for your software.

In this tutorial, we use the prefix "S_" and name our software "SampleDetector". The repository name is then "S_SampleDetector".

Due to DockerHub limitations, your repository name should not be longer than 50 characters (including the organization name), otherwise the linking between Github and DockerHub will fail.
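The naming rules above can be checked before the repository is created. The following sketch is only an illustration: the helper name check_repository_name is hypothetical, and it assumes that the 50-character limit is measured on "organization/repository" (the tutorial only says that the organization name is included).

```python
MAX_LENGTH = 50  # limit quoted in this tutorial for the Github/DockerHub link


def check_repository_name(organization, prefix, software_name):
    """Return (repository_name, problems) for a planned Github repository."""
    repository = f"{prefix}{software_name}"
    problems = []
    if not software_name:
        problems.append("the software name is empty")
    # Assumption: the limit applies to "organization/repository".
    full_name = f"{organization}/{repository}"
    if len(full_name) > MAX_LENGTH:
        problems.append(
            f"{full_name!r} is {len(full_name)} characters long (limit: {MAX_LENGTH})"
        )
    return repository, problems


repository, problems = check_repository_name("Cytomine-ULiege", "S_", "SampleDetector")
print(repository, problems)
```

With the values used in this tutorial, the helper returns "S_SampleDetector" and an empty list of problems.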

We will now create a link between Github and DockerHub. The goal is to have an automatic DockerHub build when a new Github release (git tag) is published (see step 9).

Go to DockerHub and click on "Create automated build" (1). Then, create a Github auto-build (2). Select the user/organization that you used to create your Github repository, and then the repository corresponding to your software.

Now, we can configure the automated build:

  • (1) Choose the right Docker namespace (also called Docker Hub username, in our case corresponding to the lab organization) that you created at step 1. You will need this value in step 4.
  • (2) Choose the Docker registry repository name. In practice, keep the same as Github repository (Dockerhub will convert uppercase letters into lowercase). You will need this value in step 4.
  • (3) Click on "Create".

Go to "Build Setting".

Change the build settings (defaults shown on the first screenshot) to automate builds by Github tags (releases), as illustrated on the second screenshot. You thus have to:

  • Choose "Tag" instead of "Branch"
  • Empty the name input or insert "/.*/", which is the default value ("This will target all tags" appears in grey)
  • Empty the Docker tag name ("Same as tag" appears in grey)
  • Remove the second build setting by clicking on red minus icon.
  • Save changes.

You now have a link between the software repository on Github and the software execution environment stored on Docker Hub: when a new code release is published on Github, the execution environment is automatically rebuilt and a new version is automatically produced.

Step 3 - Get repository locally

Clone the Git repository with the git clone command. In our example:

git clone https://github.com/Cytomine-ULiege/S_SampleDetector.git
cd S_SampleDetector

A sample repository containing boilerplate code is available in this repository. In the future, a command line utility will be able to generate this boilerplate code directly for you.

Step 4 - Write descriptor.json

Create a new file named descriptor.json. This file allows you to describe your software, as well as software parameters in a standardized way. The descriptor schema has been inspired by the Boutiques project and some extensions have been made especially for Cytomine. The current Cytomine schema version is cytomine-0.1.

Currently, you have to write the whole descriptor by hand. However, we plan to adapt the Boutiques command line tool (bosh) to the Cytomine schema in order to provide a template generator and reduce boilerplate code.

Refer to the Software JSON descriptor reference for a complete and detailed explanation of each attribute that can be defined in the descriptor.

You need to provide at least the attributes name, container-image (with at least image and type), command-line, inputs (a list of inputs, each with at least id, name and type) and schema-version.

We can start with the main general information:
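Since no validator is available yet, the list of required attributes above can be checked with a few lines of Python. This is a minimal sketch, not the official validation: the function name missing_attributes is hypothetical, and only the presence of the attributes listed above is checked, not their values. Note that json.loads only accepts strict JSON, so the "//" annotations used in the snippets of this page must not appear in the file being checked.

```python
import json

REQUIRED_TOP_LEVEL = ("name", "container-image", "command-line", "inputs", "schema-version")
REQUIRED_CONTAINER = ("image", "type")
REQUIRED_INPUT = ("id", "name", "type")


def missing_attributes(descriptor):
    """List the required attributes that are absent from a descriptor dict."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in descriptor]
    container = descriptor.get("container-image")
    if isinstance(container, dict):
        missing += [f"container-image.{key}" for key in REQUIRED_CONTAINER if key not in container]
    for position, item in enumerate(descriptor.get("inputs", [])):
        missing += [f"inputs[{position}].{key}" for key in REQUIRED_INPUT if key not in item]
    return missing


# Hypothetical descriptor in which the single input lacks its "name" attribute.
example = json.loads("""
{
  "name": "SampleDetector",
  "container-image": {"image": "cytomineuliege/s_sampledetector", "type": "singularity"},
  "schema-version": "cytomine-0.1",
  "command-line": "python run.py CYTOMINE_HOST",
  "inputs": [{"id": "cytomine_host", "value-key": "@ID", "type": "String"}]
}
""")
print(missing_attributes(example))
```

For this example the function reports inputs[0].name as the only missing attribute.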

{
	"name": "SampleDetector", //(1)
	"container-image": {
		"image": "cytomineuliege/s_sampledetector", //(2)
		"type": "singularity" //(3)
	},
	"schema-version": "cytomine-0.1",
	"description": "optional description",
    //...
}

(1) Current Cytomine constraint: the name must be equal to the repository name with the prefix removed.

Example: Repository = "S_MySoftware" → descriptor name = "MySoftware"


(2) Current Cytomine constraint: the image name must be equal to the lowercase repository name in the VCS (e.g. Github, including the prefix), preceded by the image index (e.g. Docker Hub) username.

Example: Repository = "S_MySoftware" and DockerHub username = "cytomineuliege" → image = "cytomineuliege/s_mysoftware"


(3) Current Cytomine constraint: the type must be "singularity" because Cytomine executes all containers with Singularity. Cytomine converts Docker images into Singularity ones in the background before running them.
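The three constraints above are mechanical, so the corresponding descriptor fields can be derived from the repository name. The sketch below is purely illustrative; the function name descriptor_identity and its parameters are hypothetical and are not part of the Cytomine client.

```python
def descriptor_identity(repository, prefix, dockerhub_namespace):
    """Derive the "name" and "container-image" fields from a Github repository name."""
    if not repository.startswith(prefix):
        raise ValueError(f"{repository!r} does not start with prefix {prefix!r}")
    return {
        # (1) repository name with the prefix removed
        "name": repository[len(prefix):],
        "container-image": {
            # (2) Docker Hub namespace, then the lowercase repository name
            "image": f"{dockerhub_namespace}/{repository.lower()}",
            # (3) always "singularity"
            "type": "singularity",
        },
    }


print(descriptor_identity("S_MySoftware", "S_", "cytomineuliege"))
```

The call reproduces the examples given above: the name is "MySoftware" and the image is "cytomineuliege/s_mysoftware".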


Then, we have to describe the command line that executes the code. Suppose that we want to run a Python script named "run.py" with a parameter alpha. To interact with Cytomine, 5 extra parameters are required; they are filled in automatically by the server:

  1. cytomine_host - so that the script communicates with the right Cytomine instance
  2. cytomine_public_key - so that the script can authenticate with Cytomine
  3. cytomine_private_key - so that the script can authenticate with Cytomine
  4. cytomine_id_project - in order to create the related job (a job is a software execution in a given project)
  5. cytomine_id_software - in order to create the related job (a job is a software execution in a given project)

In the command line, we have to provide a placeholder for each parameter. These placeholders will be replaced by the appropriate values (see below). We thus have:

{
    //...
	"command-line": "python run.py CYTOMINE_HOST CYTOMINE_PUBLIC_KEY CYTOMINE_PRIVATE_KEY CYTOMINE_ID_PROJECT CYTOMINE_ID_SOFTWARE ALPHA",
	"inputs": [
		//...
	]
}

As in our example we have 6 parameters (5 mandatory parameters + the alpha parameter), we have to define 6 inputs (see Software JSON descriptor reference for a full example):

{
    //...
	"command-line": "python run.py CYTOMINE_HOST CYTOMINE_PUBLIC_KEY CYTOMINE_PRIVATE_KEY CYTOMINE_ID_PROJECT CYTOMINE_ID_SOFTWARE ALPHA",
	"inputs": [
		{
			"id": "cytomine_host", // The name of the software parameter in Cytomine
			"value-key": "@ID", // The placeholder for this parameter in the command line; @ID is replaced by the value of "id" in uppercase
			"command-line-flag": "--@id", // The flag put before the value in the command line; @id is replaced by the value of "id"
			"type": "String",
			"optional": false,
			"set-by-server": true
		},
		{
			//...same kind of objects. See Software JSON descriptor reference for a full example.
		}
	]
}

At execution, the generated command line will then be, for example:

python run.py --cytomine_host research.cytomine.be --cytomine_public_key XXX --cytomine_private_key XXX --cytomine_id_project 123 --cytomine_id_software 456 --alpha 3
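To make the substitution rules concrete, the following sketch rebuilds the command line above from a descriptor and a set of parameter values. It only illustrates the rules described in this step ("@ID" and "@id" expansion, then placeholder replacement); it is not the code used by the Cytomine server, and the function name expand_command_line is hypothetical.

```python
def expand_command_line(descriptor, values):
    """Replace each placeholder of the command line by "<flag> <value>"."""
    replacements = {}
    for item in descriptor["inputs"]:
        identifier = item["id"]
        # "@ID" stands for the id in uppercase, "@id" for the id as written.
        placeholder = item["value-key"].replace("@ID", identifier.upper())
        flag = item.get("command-line-flag", "").replace("@id", identifier)
        replacements[placeholder] = f"{flag} {values[identifier]}".strip()
    # Work token by token so that a placeholder never matches inside another word.
    tokens = descriptor["command-line"].split()
    return " ".join(replacements.get(token, token) for token in tokens)


ids = ["cytomine_host", "cytomine_public_key", "cytomine_private_key",
       "cytomine_id_project", "cytomine_id_software", "alpha"]
descriptor = {
    "command-line": "python run.py CYTOMINE_HOST CYTOMINE_PUBLIC_KEY CYTOMINE_PRIVATE_KEY "
                    "CYTOMINE_ID_PROJECT CYTOMINE_ID_SOFTWARE ALPHA",
    "inputs": [{"id": i, "value-key": "@ID", "command-line-flag": "--@id"} for i in ids],
}
values = {
    "cytomine_host": "research.cytomine.be",
    "cytomine_public_key": "XXX",
    "cytomine_private_key": "XXX",
    "cytomine_id_project": 123,
    "cytomine_id_software": 456,
    "alpha": 3,
}
command = expand_command_line(descriptor, values)
print(command)
```

Replacing whole tokens rather than substrings is a deliberate choice here: it avoids accidental matches when one placeholder is contained in another word of the command line.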


Currently, you have to check the consistency of your descriptor on your own. In the future, we plan to adapt the Boutiques validator tool to the Cytomine schema, in order to ensure that your descriptor is complete and meaningful.

Step 5 - Publish your descriptor

Once your JSON descriptor is ready, you can publish it on the Cytomine platform using the Python client. This method will create a new software:

  • without version number
  • not executable from the web interface

This "fake" software can only be executed by you, from your local machine, by launching your script locally. It allows you to test your implementation and to interact with your Cytomine server.

Once your software is tested and ready for production, we will see (Step 9) how to turn it into a release ready to be installed on any Cytomine instance, leading to a new software with a version number (its release) that is directly executable from the Cytomine web interface.

To publish the descriptor, you need the Python client installed on your local machine (as previously described in Step 1 and on the Data access using Python client page) and available in your current environment (e.g. source activate cytomine).

Then, in a Python console, run:

from cytomine import Cytomine
from cytomine.utilities.descriptor_reader import read_descriptor
with Cytomine("host", "public_key", "private_key") as c:
	software = read_descriptor("descriptor.json")
	print("Not executable software created with ID {}".format(software.id))

where host, public_key and private_key have been replaced by your own values.

Step 6 - Write Dockerfile

If you don't have experience with Docker, please read carefully part 1 (orientation) and part 2 (containers) of the Get started guide by Docker. In this tutorial, you will have to create a very simple Dockerfile in order to run your algorithm inside a container.

The Dockerfile is a recipe which explains how to create your execution environment. We simply give here the minimum explanation required to deploy your application. For a more detailed guide, check the Dockerfile reference.

Basically, your Dockerfile will start with a FROM instruction which initializes a new build stage and sets the base Image for subsequent instructions. We have defined 3 base images (see Cytomine-software-utils repository for their complete definition):

  • cytomineuliege/software-python3-base: A Python 3 execution environment with the Cytomine Python client and its dependencies installed
  • cytomineuliege/software-python2-base: A Python 2.7 execution environment with the Cytomine Python client and its dependencies installed (not recommended, please use Python 3 for new projects)
  • cytomineuliege/software-java8-base: A Java 8 execution environment with the Cytomine Java client and its dependencies installed

Then add your script dependencies with the RUN instruction, which allows you to run ordinary shell commands. In the case of a Python software, you can simply use RUN pip install my_package.

Create a directory where your files will be saved and add them into it with the ADD instruction.

Finally, the ENTRYPOINT instruction indicates which script is launched at container start up (in practice: your script).

For example, if we want to make a project in Python 3 using the Scikit-learn library, the Dockerfile should be similar to:

FROM cytomineuliege/software-python3-base:latest
RUN pip install scikit-learn
RUN mkdir -p /app
ADD run.py /app/run.py
ENTRYPOINT ["python", "/app/run.py"]

We will see how to test your Dockerfile in step 8.

Avoid using the Dockerfile instruction WORKDIR because it is not well supported by Singularity. (Keep in mind that you create a Docker image which will be converted into a Singularity image in order to run it on an HPC cluster.)

Step 7 - Write code

Write your own code processing Cytomine data. If you develop in Python, you can have a look at some basic examples to manipulate data. In addition, the Cytomine Python client has several utilities, such as CytomineJob, to reduce boilerplate and Cytomine-related code. You can refer to the Data access using Python client page and to existing software implementations such as:

  • S_Stats-TermArea (a basic script in Groovy performing annotation statistics in an image)
  • S_Segment-CV-AdaptThres-Sample (a simple example in Python performing sample segmentation through thresholding)
  • S_Segment-CV-AdaptThres-Object-BI (a Python script performing object detection in big images using adaptive thresholding)
  • S_Classify-ML-RandomSubwET-Train (a machine learning workflow to train an object classification model)
  • S_Segmentation-DL-UNet-Predict (a deep learning software to perform segmentation)

As your script will be run in a Singularity container (converted from the Docker container), all the files that you create during the script execution (e.g. images downloaded from Cytomine, ...) must be located in the $HOME directory: Singularity runs the container in a user-isolated mode which, unlike Docker, does not have access to /root and /.

In Python, use the following base directory:

import os
base_path = os.getenv("HOME") # Mandatory for Singularity
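As an illustration, a run.py skeleton could read the parameters declared in Step 4 with argparse and create its working directory under $HOME. This is a sketch under assumptions: the function names are hypothetical, alpha is assumed to be numeric, the fallback to the user's home directory when HOME is unset is an addition for local testing, and the Cytomine client utilities (such as CytomineJob) are deliberately left out.

```python
import argparse
import os


def parse_parameters(argv):
    """Parse the flags generated from the descriptor (see Step 4)."""
    parser = argparse.ArgumentParser(description="SampleDetector parameters")
    for name in ("cytomine_host", "cytomine_public_key", "cytomine_private_key"):
        parser.add_argument(f"--{name}", required=True)
    for name in ("cytomine_id_project", "cytomine_id_software"):
        parser.add_argument(f"--{name}", type=int, required=True)
    # Assumption: alpha is a numeric parameter.
    parser.add_argument("--alpha", type=float, required=True)
    return parser.parse_args(argv)


def working_directory(name):
    """Create (if needed) and return a directory located under $HOME."""
    base_path = os.getenv("HOME", os.path.expanduser("~"))  # Mandatory for Singularity
    path = os.path.join(base_path, name)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    import sys

    parameters = parse_parameters(sys.argv[1:])
    print(parameters.alpha, working_directory("sampledetector"))
```

The flag names match the ones produced by "command-line-flag": "--@id" in the descriptor, so the command line generated by Cytomine can be passed to the script unchanged.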


Step 8 - Test code

You are highly encouraged to test your code directly inside the Docker image on your own computer. Working like this has several advantages. First, the execution environment you use during development is the same as the one in which the script will be executed in production by users. Second, it saves you from installing all the required dependencies on your local machine. Taking our example where Scikit-learn is required (Step 6), you never need to install this library on your machine because it is installed in the Dockerized execution environment.

To build a Docker image (i.e. to turn the Dockerfile recipe into an executable environment), run

docker build -t cytomineuliege/s_sampledetector .

where cytomineuliege/s_sampledetector is the name of the built image we want to create.

If the command fails, there is an issue in your Dockerfile (you can have a look at our S_ repositories on our GitHub to see examples).

Then, to run the image (i.e. to start a container) on your own computer, use the command docker run ... followed by the parameters your script expects (as defined in the software JSON descriptor).

For example, a possible execution could be

docker run -it cytomineuliege/s_sampledetector --cytomine_host research.cytomine.be \
	--cytomine_public_key XXX \
	--cytomine_private_key XXX \
	--cytomine_id_project 123 \
	--cytomine_id_software 456 \
	--alpha 3

Each time you change a file (such as your run.py), you need to rebuild your Docker image before running it, so that the latest changes are incorporated into the image.

Step 9 - Make new release

When you have finished a modification and checked that it works, commit your changes with git commit (the first time, git add your files descriptor.json, run.py and Dockerfile beforehand), and then push them to Github (git push).

When you consider that your new code is tested and deserves to be released, go to Github and create a new release version.

As Github suggests, try to follow semantic versioning.

This will add a new tag to the last commit. As we configured the automated build in Step 2 to trigger a new build (with a new Docker Hub tag) for every new Github tag, a new Docker image will be automatically built and published with the same tag as the one provided on Github. Cytomine instances that trust this Github/Docker Hub repository (see [HOWTO] Manage software and servers for administrators) will add this new version as a new Cytomine software (with the new release version number) and mark previous versions as deprecated (but they will still be executable). Also, this new software release will automatically be available in all projects using previous versions.
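When choosing the tag of a new release, semantic versioning boils down to incrementing one of three numbers. The helper below is a minimal sketch with hypothetical function names; it only handles plain MAJOR.MINOR.PATCH tags with an optional leading "v" (no pre-release or build metadata) and makes no claim about how Cytomine itself orders versions.

```python
import re

SEMVER = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Return (major, minor, patch) for a tag such as "v1.2.3" or "1.2.3"."""
    match = SEMVER.match(tag)
    if match is None:
        raise ValueError(f"{tag!r} is not a MAJOR.MINOR.PATCH tag")
    return tuple(int(part) for part in match.groups()[1:])


def next_version(tag, change):
    """Compute the next tag for a "major", "minor" or "patch" change."""
    major, minor, patch = parse_version(tag)
    prefix = "v" if tag.startswith("v") else ""
    if change == "major":    # incompatible change
        return f"{prefix}{major + 1}.0.0"
    if change == "minor":    # backward-compatible feature
        return f"{prefix}{major}.{minor + 1}.0"
    if change == "patch":    # backward-compatible bug fix
        return f"{prefix}{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type {change!r}")


def latest(tags):
    """Return the highest tag, comparing numbers rather than strings."""
    return max(tags, key=parse_version)


print(next_version("v1.2.3", "minor"), latest(["v1.9.0", "v1.10.0"]))
```

Comparing the parsed numbers matters because a plain string comparison would rank "v1.9.0" above "v1.10.0".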

Questions / Answers

How to share code between several software?

Create a library with the code that has to be shared (probably in a new VCS repository) and install this library in the Dockerfile of each software that requires it.

For example, create a Github repository for your library (and create a release). Then, in the Dockerfile of your software, git clone the release and pip install your library so that your software has access to the library functions.

How to update the execution environment when a dependency is updated?

If a dependency is updated, including the ones coming from the Docker base image such as the Python client in cytomineuliege/software-python3-base, the execution environment is not updated. You need to make a new release to trigger a new image build. For the sake of reproducibility, any change in the code, including in its dependencies, must lead to a new version.

How to save data related to a job (an execution)?

For key-value pairs, use a property object (Property) with domain be.cytomine.processing.Job and the id of current job.

For files, use an attached file object (AttachedFile) with domain be.cytomine.processing.Job and the id of current job.


2 Comments

  1. In Step 2, specify where we configure Cytomine to specify the repository to check (config file?):

    "

    The repository name is important !

    In order to be automatically added, Cytomine will crawl your Github userspace (personal or organization) and try to get all its repositories beginning with a prefix to save them as softwares. XXXXXX"

  2. A Cytomine administrator of the Cytomine instance where software wants to be automatically installed has to trust the userspace or organization space where the software is available. Check here: [HOWTO] Manage software and servers for administrators

    Other option: If you develop the software in your own Github userspace and you want it available in Cytomine-ULiege which is trusted, you can simply fork the repository into the Cytomine-ULiege space (and re-create an automated build for the forked repo).