This page describes:
- An overview of the available algorithms
- A description of the main concepts of the image analysis workflow with Cytomine
- How to install the CYTOMINE "Data Mining Python Package"
- Code examples to run image analysis algorithms
- How to contribute with novel algorithms
Overview of available algorithms
Algorithms are available here: https://github.com/cytomine/Cytomine-python-datamining/tree/master/cytomine-applications
They are automatically installed by our deployment procedure in the software_router container. You can also install the Cytomine-DataMining package on remote machines (see instructions).
Image semantic segmentation based on multiple output classification using sliding window and extremely randomized trees (see Dumont et al. VISAPP 2009)
Example: tumor/necrosis segmentation in H&E/IHC histology.
This module learns and applies pixel classifiers for semantic segmentation of images (e.g. tumor contouring). It is based on a previously published algorithm (Dumont et al., VISAPP 2009). It is decomposed into a training phase (model builder) and a prediction phase (prediction).
For a given list of projects (cytomine_annotation_projects), the training phase downloads from Cytomine-Core a cropped image region for each annotation (the minimal bounding box fully containing the annotation geometry) and its corresponding binary mask (in PNG format with an alpha channel). Positive examples are selected using the cytomine_predict_terms variable; negative examples are the other annotations whose terms are in neither cytomine_predict_terms nor cytomine_excluded_terms. The method then builds a segmentation (pixel classifier) model using random, fixed-size (pyxit_target_width and pyxit_target_height) subwindow extraction and extremely randomized trees with multiple outputs. The tree model is built using the same parameters as for classification. It is saved locally on the filesystem of the running server.
The prediction phase applies such a binary segmentation model to individual tiles of a gigapixel image. A connected components labeling algorithm is then applied on the running server to extract contours from the pixel classifier mask of each tile. These contours are translated into valid geometric shapes for the spatial database using hit-or-miss transforms. Point coordinates of the valid geometries found in each tile are converted to real coordinates (in the whole-slide image) and communicated to Cytomine-Core, which internally translates the HTTP requests into spatial insertion queries (these annotations are added to the UserJob layer). Finally, once all tiles are processed, the contours are merged by spatial union queries over tiles, because a single region of interest (e.g. a tumor islet) may overlap several tiles.
Both training and prediction phases can be applied at any resolution level (cytomine_zoom_level) of gigapixel images with seamless conversion of coordinates. See the Cytomine User Guide for a step-by-step example on toy data.
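The conversion from tile coordinates to whole-slide coordinates mentioned above can be illustrated with a minimal sketch. It assumes that each zoom level halves the image resolution, so that a coordinate at zoom level z is multiplied by 2**z to reach full resolution, and that the top-left offset of the tile in the zoomed image is known. The function and parameter names are hypothetical and are not part of the Cytomine package; any axis flip applied by the server is ignored.

```python
def tile_to_slide(points, tile_x, tile_y, zoom_level):
    """Convert (x, y) points local to a tile into full-resolution coordinates.

    Assumptions (not taken from the Cytomine code base):
    - tile_x, tile_y are the top-left offsets of the tile at the given zoom level;
    - each zoom level divides the resolution by 2.
    """
    scale = 2 ** zoom_level
    return [((tile_x + x) * scale, (tile_y + y) * scale) for x, y in points]


# A contour found in the tile whose top-left corner is (512, 256) at zoom level 2
contour = [(0, 0), (10, 0), (10, 5)]
print(tile_to_slide(contour, tile_x=512, tile_y=256, zoom_level=2))
# prints [(2048, 1024), (2088, 1024), (2088, 1044)]
```

Working in tile-local coordinates keeps the per-tile processing independent of the slide size; only this last step needs to know the zoom level.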
Object retrieval based on random subwindows and vectors of random tests (see Marée et al., MIR 2010)
Example: similar cell searching in lung cytology.
The goal of an image retrieval algorithm is to retrieve, from a large database of images (the reference set), images that are visually similar to a query image.
We integrated a novel Java implementation of a previously published algorithm (Marée et al., MIR 2010) based on random subwindows and vectors of random tests. It uses an image similarity measure reminiscent of tf-idf, based on random projections of subwindows extracted from cropped annotation images. The retrieval server is called automatically by Cytomine-WebUI through its REST API once a user selects or draws an annotation. For each Cytomine project, the reference set can be configured either as the set of all annotations of this project only, or as the annotations of all projects using the same ontology. Our algorithm is incremental (novel annotations are indexed on the fly) and it can be distributed across several servers. It relies on vectors of random tests that test many subwindow pixel intensities, and it uses disk-based hashtables to index annotation identifiers. The server configuration files allow setting the number of vectors and the number of tests per vector, as well as the size interval of the random subwindows (by default 0-100%), the resized patch size (by default 16x16) and the colorspace in which the patches are encoded (by default HSV).
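To make the indexing idea more concrete, here is a toy Python sketch of hashing patches with vectors of random tests. It is not the CBIRetrieval code (which is written in Java): all names are hypothetical, the hashtables are kept in memory instead of on disk, each test is assumed to compare one pixel intensity with a random threshold, and the score is a plain co-occurrence count rather than the tf-idf-like measure of the published method.

```python
import random
from collections import Counter, defaultdict


def make_test_vectors(n_vectors, n_tests, patch_len, seed=0):
    """Each test is (pixel index, threshold); a vector holds n_tests tests."""
    rng = random.Random(seed)
    return [
        [(rng.randrange(patch_len), rng.random()) for _ in range(n_tests)]
        for _ in range(n_vectors)
    ]


def patch_keys(patch, vectors):
    """One binary key per vector: bit i is 1 if the tested pixel exceeds its threshold."""
    return [
        tuple(int(patch[idx] > thr) for idx, thr in vector)
        for vector in vectors
    ]


class ToyIndex:
    """In-memory stand-in for the disk-based hashtables (one table per vector)."""

    def __init__(self, vectors):
        self.vectors = vectors
        self.tables = [defaultdict(Counter) for _ in vectors]

    def add(self, annotation_id, patches):
        # Incremental indexing: a new annotation only touches its own buckets.
        for patch in patches:
            for table, key in zip(self.tables, patch_keys(patch, self.vectors)):
                table[key][annotation_id] += 1

    def query(self, patches, top=3):
        # Score: how often a query patch and an indexed patch share a bucket,
        # summed over all vectors.
        scores = Counter()
        for patch in patches:
            for table, key in zip(self.tables, patch_keys(patch, self.vectors)):
                for annotation_id, count in table.get(key, {}).items():
                    scores[annotation_id] += count
        return scores.most_common(top)


# Patches are flat lists of intensities in [0, 1] (here 4x4 = 16 values).
vectors = make_test_vectors(n_vectors=4, n_tests=8, patch_len=16)
index = ToyIndex(vectors)
index.add("annotation-dark", [[0.1] * 16, [0.2] * 16])
index.add("annotation-bright", [[0.9] * 16, [0.8] * 16])
print(index.query([[0.1] * 16]))
```

Because each annotation only updates the buckets of its own patches, new annotations can be added without rebuilding the index, which is what makes the approach incremental.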
CBIRetrieval: a Java library package that can be run as a simple application (command line) or can be imported in a JVM application (jar). https://github.com/loic911/CBIRetrieval
CBIRestAPI: an HTTP REST API server for CBIRetrieval lib. https://github.com/loic911/CBIRestAPI
Interest Point Detection based on multiresolution features and extremely randomized trees (see technical report, publication in preparation)
Example: Interest point detection in cephalometry (left) and zebrafish development microscopy (right).
The goal of this module is to build and apply interest point (landmark) detection models. Typical applications include morphometric measurements in developmental studies. This module implements novel variants of a previously presented algorithm. The package is decomposed into a training phase (model builder) and a prediction phase (prediction). For the training phase, it downloads images and landmarks (x,y positions and landmark class) from a given project by communicating with Cytomine-Core. It then builds landmark detection models (one model per landmark) using multiresolution pixel-based features and extremely randomized tree classifiers. These models are saved locally in files. For the prediction phase, it downloads novel images and applies all landmark prediction models to each image. Landmark coordinates are uploaded to Cytomine-Core as Point annotations with predicted terms (landmark class).
More precisely, the implementation uses a binary classification scheme: each pixel in an image either is a landmark pixel or is not. A pixel is a landmark pixel if it is located at a Euclidean distance < R from the real landmark position. Each pixel is described by square subwindows of size W, centered on the pixel and taken at D different resolutions: the image resolution divided by 2⁰ to 2^(D-1). During training, all landmark pixels are extracted from each image of the training dataset. If N is the number of landmark pixels extracted from one image, P*N non-landmark pixels are also extracted from that image, at a distance of at most RMAX from the landmark position. After this first extraction, the process is repeated Nr times by applying Nr random rotations, with angles chosen in the interval [-a,a], to each image of the training dataset. A model of T extremely randomized trees is then built. During prediction, Npred pixels are extracted at random positions drawn from a multivariate normal distribution based on the positions of the landmark in the training dataset. The position of each landmark is then computed as the median of the locations of the pixels predicted as landmarks with the highest probability by the corresponding model.
All these parameters (R, W, D, P, RMAX, Nr, a, T, Npred) can be chosen individually for model building, or tuned through K-fold cross validation.
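The sampling of training pixels and the final median step can be sketched as follows. This is a minimal illustration with hypothetical function names, not the module's code: it assumes integer pixel coordinates and integer R and RMAX, draws the P*N non-landmark pixels uniformly among the pixels at distance between R and RMAX, and leaves out the multiresolution features, the random rotations and the tree model itself.

```python
import math
import random
from statistics import median


def sample_training_pixels(landmark, width, height, R, RMAX, P, seed=0):
    """Return (positives, negatives) pixel lists for one landmark in one image.

    Positives: every pixel at Euclidean distance < R from the landmark.
    Negatives: P * N pixels drawn at random with R <= distance <= RMAX,
    where N is the number of positives (fewer if not enough candidates exist).
    """
    lx, ly = landmark
    positives, candidates = [], []
    for x in range(max(0, lx - RMAX), min(width, lx + RMAX + 1)):
        for y in range(max(0, ly - RMAX), min(height, ly + RMAX + 1)):
            distance = math.hypot(x - lx, y - ly)
            if distance < R:
                positives.append((x, y))
            elif distance <= RMAX:
                candidates.append((x, y))
    rng = random.Random(seed)
    n_negatives = min(P * len(positives), len(candidates))
    return positives, rng.sample(candidates, n_negatives)


def landmark_from_votes(pixels):
    """Coordinate-wise median of the pixel locations predicted as landmark."""
    return (median(x for x, _ in pixels), median(y for _, y in pixels))


positives, negatives = sample_training_pixels((10, 10), 50, 50, R=2, RMAX=5, P=3)
print(len(positives), len(negatives))  # prints 9 27
print(landmark_from_votes([(1, 1), (3, 5), (2, 9)]))  # prints (2, 5)
```

Taking the median rather than the mean makes the predicted position less sensitive to a few pixels wrongly predicted as landmarks far from the true location.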
This module applies an adaptive thresholding algorithm to whole image thumbnails from a project (downloaded from Cytomine-Core and Cytomine-IMS) and uploads the detected geometries to Cytomine-Core (in an individual UserJob layer for each image). It can be used to detect tissue sample regions, e.g. to quantify the tissue sample area, or as a first step before other algorithms (e.g. segmentation) so that time-consuming algorithms are not applied to image background regions. There are two variants:
https://github.com/cytomine/Cytomine-python-datamining/tree/master/cytomine-applications/detect_sample/ (for sample detection)
https://github.com/cytomine/Cytomine-python-datamining/tree/master/cytomine-applications/object_finder (for individual objects detection)
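As an illustration of adaptive thresholding, the sketch below compares each pixel with the mean of its local neighbourhood. It is a plain-Python local-mean variant written for this page, not the code of the two modules above (which may rely on a different thresholding rule or an image processing library). It assumes a grayscale image given as a list of rows and tissue that is darker than the background; the function and parameter names are hypothetical.

```python
def adaptive_threshold(image, block_radius=1, offset=0.0):
    """Mark a pixel as foreground when it is darker than its local mean minus offset.

    image: list of rows of grayscale values.
    Returns a binary mask (1 = foreground) of the same shape.
    """
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ys = range(max(0, y - block_radius), min(height, y + block_radius + 1))
            xs = range(max(0, x - block_radius), min(width, x + block_radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            if image[y][x] < local_mean - offset:
                mask[y][x] = 1
    return mask


# Bright background (200) with a dark 2x2 object (50) in the middle
thumbnail = [
    [200, 200, 200, 200],
    [200, 50, 50, 200],
    [200, 50, 50, 200],
    [200, 200, 200, 200],
]
for row in adaptive_threshold(thumbnail, block_radius=1, offset=10):
    print(row)
```

On this toy thumbnail only the four dark pixels are marked as foreground. Unlike a single global threshold, the local mean adapts to slow illumination changes across the thumbnail.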
Object sorting based on a two-tier approach: thresholding to segment objects (based on the previous object finder), followed by our image classification approach based on random subwindows and extremely randomized trees [Marée et al., Pattern Recognition Letters, 2016].
Example: cell sorting in cytology.
The aim of supervised image classification is to automatically build computerized models able to predict accurately the class (among predefined ones) of new images (objects), once trained from a set of labelled images (objects). A typical example is individual cell classification in cell populations.
This module is decomposed into three components: a validation phase (validation), a training phase (model builder) and a prediction phase (prediction). The validation phase downloads from Cytomine-Core, for a given list of projects and additional filters (terms, users, ...), a cropped image region for each annotation (the minimal bounding box fully containing the annotation geometry). Using a cross-validation protocol, it learns classification models (see algorithm below) and evaluates their classification accuracy for a given set of parameters. The predictions are uploaded to Cytomine-Core and the resulting confusion matrix can be visualized in Cytomine-WebUI (confusion matrices are interactive, with the possibility to visualize misclassified annotations directly in the original gigapixel images). The training phase similarly downloads annotations, builds a classification model on the whole set of annotations for the given set of parameters and saves the classification model locally. The prediction phase applies such a model to existing annotations downloaded from Cytomine-Core within a gigapixel image (e.g. annotations detected by the object finder module), and uploads the predicted annotation terms to Cytomine-Core (in a UserJob layer).
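The validation protocol can be sketched in a few lines of Python. This is a simplified illustration with hypothetical names, not the module's code: folds are built without shuffling, and a trivial majority-term classifier stands in for the real model so that the example stays self-contained.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k folds of near-equal size (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def cross_validate(samples, terms, k, fit, predict):
    """Train on k-1 folds, predict the held-out fold, and pool all predictions.

    Returns a confusion matrix as a Counter of (true term, predicted term) pairs.
    """
    matrix = Counter()
    for fold in k_fold_indices(len(samples), k):
        held_out = set(fold)
        train = [i for i in range(len(samples)) if i not in held_out]
        model = fit([samples[i] for i in train], [terms[i] for i in train])
        for i in fold:
            matrix[(terms[i], predict(model, samples[i]))] += 1
    return matrix


def accuracy(matrix):
    """Fraction of pairs on the diagonal of the confusion matrix."""
    correct = sum(n for (true, pred), n in matrix.items() if true == pred)
    return correct / sum(matrix.values())


# Placeholder classifier (an assumption for this example): always predicts
# the most frequent term of its training set.
def fit_majority(samples, terms):
    return Counter(terms).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


crops = list(range(6))  # stand-ins for annotation crops
terms = ["cell"] * 5 + ["debris"]
matrix = cross_validate(crops, terms, k=3, fit=fit_majority, predict=predict_majority)
print(dict(matrix))  # prints {('cell', 'cell'): 5, ('debris', 'cell'): 1}
print(accuracy(matrix))
```

Each annotation is predicted exactly once, by a model that never saw it during training, so the pooled confusion matrix covers the whole annotation set.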
The algorithm involves the extraction of random subwindows described by raw pixel values, and the use of ensembles of extremely randomized trees by different means. A more detailed description of the algorithm and an illustration of its main steps are given in our Pattern Recognition Letters paper.
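The subwindow extraction step can be illustrated with the following sketch. It is an approximation written for this page with hypothetical names and defaults: the crop is square, its side is a random fraction of the smaller image dimension, and resizing uses nearest-neighbour sampling. Every subwindow inherits the class of the image it was cropped from; the tree ensemble itself is not shown.

```python
import random


def extract_subwindow(image, rng, min_frac=0.1, max_frac=1.0, target=4):
    """Crop a random square subwindow and resize it to target x target pixels.

    image: list of rows of pixel values.
    Returns the raw pixel values of the resized crop as one flat feature vector.
    """
    height, width = len(image), len(image[0])
    side = max(1, int(rng.uniform(min_frac, max_frac) * min(height, width)))
    x0 = rng.randint(0, width - side)
    y0 = rng.randint(0, height - side)
    return [
        image[y0 + (j * side) // target][x0 + (i * side) // target]
        for j in range(target)
        for i in range(target)
    ]


def build_training_set(images, labels, n_subwindows, seed=0):
    """Extract n_subwindows per image; each one gets the label of its image."""
    rng = random.Random(seed)
    features, targets = [], []
    for image, label in zip(images, labels):
        for _ in range(n_subwindows):
            features.append(extract_subwindow(image, rng))
            targets.append(label)
    return features, targets


image = [[y * 8 + x for x in range(8)] for y in range(8)]
features, targets = build_training_set([image], ["cell"], n_subwindows=5)
print(len(features), len(features[0]), targets[0])  # prints 5 16 cell
```

Resizing every crop to the same patch size gives feature vectors of fixed length, whatever the size of the original annotation crop.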
Note that this module is currently being reimplemented in order to have a more flexible and more efficient workflow.
Segment, Locate, Dispatch and Classify (SLDC) based on [Mormont et al., Benelearn, 2016]
SLDC is a generic framework for accelerating the development and execution of multi-gigapixel image analysis workflows, with a focus on object detection and classification problems. It enables developers to implement arbitrarily complex workflows without caring about problem-independent concerns such as large image handling and parallelization. Developers can plug their segmentation and classification algorithms into SLDC, which will then provide them with the detected objects and their classification labels.
Currently, SLDC cannot be used directly through the Cytomine web interface but can be used to implement workflows that can then be added as software into Cytomine. More information here: [DOC] SLDC (Segment, Locate, Dispatch and Classify)
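The four steps can be pictured with the toy loop below. It does not use the SLDC library and does not reflect its API: all function names are hypothetical, tiles are one-dimensional lists to keep the example short, and parallelization and the merging of objects that span several tiles are left out.

```python
def run_workflow(tiles, segment, locate, dispatch, classifiers):
    """Toy Segment-Locate-Dispatch-Classify loop over image tiles.

    tiles: iterable of (offset, tile) pairs.
    segment(tile) -> binary mask; locate(mask, offset) -> list of objects;
    dispatch(obj) -> key selecting a classifier (or None to skip the object);
    classifiers[key](obj) -> class label.
    """
    results = []
    for offset, tile in tiles:
        mask = segment(tile)
        for obj in locate(mask, offset):
            key = dispatch(obj)
            if key is None:
                continue
            results.append((obj, classifiers[key](obj)))
    return results


def segment(tile):
    # Threshold the intensities of a 1-D tile.
    return [1 if value > 128 else 0 for value in tile]


def locate(mask, offset):
    # Each run of 1s becomes an object (first, last) in whole-image coordinates.
    objects, start = [], None
    for i, bit in enumerate(mask + [0]):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            objects.append((offset + start, offset + i - 1))
            start = None
    return objects


def dispatch(obj):
    # Route objects to a classifier according to their length.
    return "large" if obj[1] - obj[0] + 1 >= 3 else "small"


classifiers = {"small": lambda obj: "debris", "large": lambda obj: "cell"}
tiles = [(0, [0, 200, 0, 0]), (4, [255, 255, 255, 0])]
print(run_workflow(tiles, segment, locate, dispatch, classifiers))
# prints [((1, 1), 'debris'), ((4, 6), 'cell')]
```

The workflow code only sees callables, which is the point of such a framework: segmentation and classification can be swapped without touching the tile handling.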
How to install the "Cytomine datamining Python package"
The Cytomine Data Mining Python package includes all algorithms except the image retrieval algorithm, which is implemented in Java.
The Cytomine-DataMining Python package is installed by the automated Docker installation in the software_router container.
For algorithm developers, it is also possible to install and run it on remote computers. See the installation procedure here.
Code examples to run image analysis algorithms
Examples are provided in each directory in the source repository (in cytomine-applications).
Algorithms can also be launched from Cytomine-WebUI, see User guide documentation.
For debugging purposes of your own installation, logs of algorithms executed through the Cytomine-WebUI are available in the software_router Docker container.
On your server (hosting the Docker containers), enter the container, list the log files and print them using tail/cat:
sudo docker-enter software_router
ls -altr /software_router/algo/logs/
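If a Python interpreter is available where the logs are stored, the most recent log can also be inspected with a small helper such as the one below. This is a convenience sketch, not part of Cytomine: the function name is hypothetical, and it assumes the log directory contains plain text files.

```python
import os
from collections import deque


def tail_latest_log(log_dir, n_lines=20):
    """Return (path, lines): the most recently modified file in log_dir
    and its last n_lines lines. Returns (None, []) if there is no file."""
    paths = [os.path.join(log_dir, name) for name in os.listdir(log_dir)]
    files = [path for path in paths if os.path.isfile(path)]
    if not files:
        return None, []
    latest = max(files, key=os.path.getmtime)
    with open(latest, errors="replace") as handle:
        # deque with maxlen keeps only the last n_lines lines in memory
        lines = [line.rstrip("\n") for line in deque(handle, maxlen=n_lines)]
    return latest, lines


# Example call, using the log directory listed above:
# path, lines = tail_latest_log("/software_router/algo/logs/")
# print(path)
# print("\n".join(lines))
```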
How to contribute with novel algorithms
Novel algorithms can be implemented in Python or Java using our clients or in any language that supports HTTP requests.
Our clients include basic functions, e.g. to get image tiles, list annotations, get annotation image crops, and add annotations and predicted terms. These functions encapsulate HTTP calls.
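For algorithms written without our clients, the idea of wrapping HTTP calls in small functions can be sketched as follows. This is a generic illustration, not the Cytomine client: the class, the /api/ base path and the example.json resource are placeholders, and authentication is omitted. Refer to the client source code for the real endpoints and request signing.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class TinyRestClient:
    """Hypothetical wrapper that prepares HTTP requests (it does not send them)."""

    def __init__(self, host, base_path="/api/"):
        # base_path is a placeholder, not a documented Cytomine path
        self.base_url = urljoin(host, base_path)

    def build_get(self, resource, **params):
        """Prepare a GET request for a resource, with optional query parameters."""
        url = urljoin(self.base_url, resource)
        if params:
            url += "?" + urlencode(params)
        return Request(url, method="GET", headers={"Accept": "application/json"})

    def build_post(self, resource, payload):
        """Prepare a POST request carrying a JSON payload."""
        body = json.dumps(payload).encode("utf-8")
        return Request(
            urljoin(self.base_url, resource),
            data=body,
            method="POST",
            headers={"Content-Type": "application/json"},
        )


client = TinyRestClient("http://localhost")
request = client.build_get("example.json", project=1)  # placeholder resource
print(request.get_method(), request.full_url)
# prints GET http://localhost/api/example.json?project=1
```

A prepared request would then be sent with urllib.request.urlopen(request), once the credentials required by the server have been added.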