NrgXnat/ml-plugin

XNAT Machine Learning Development Plugin

Supports running training experiments on models for machine learning in XNAT. This workflow currently supports NVIDIA Clara and other TLT models.

Building

To build the XNAT Machine Learning plugin:

  1. If you haven't already, clone this repository and cd to the newly cloned folder.
  2. Build the plugin: ./gradlew jar (on Windows, you can use the batch file: gradlew.bat jar). This should build the plugin in the file build/libs/ml-plugin-1.0.0.jar (the version may differ based on updates to the code).
  3. Copy the plugin jar to your plugins folder: cp build/libs/ml-plugin-1.0.0.jar /data/xnat/home/plugins

You'll also need the XNAT Datasets Plugin installed.

Quick Start

The following examples:

  • Include the server address http://xnat. You should replace this with the site URL for your deployed XNAT system.
  • Use httpie to demonstrate how calls might work.

Create a dataset definition

A dataset definition is analogous to a query or stored search: it specifies the criteria data should meet to be included in the resolved dataset, but does not itself indicate any particular file or files. A dataset definition consists of the following properties:

  • Project
  • Label
  • Description (optional)
  • Criteria, which is a list of criterion objects

The criteria are the primary content in the definition. Each criterion itself consists of two properties:

  • Resolver indicates the implementation that should interpret the criterion
  • Payload is the data that the resolver interprets
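
The structure described above can be sketched as a pair of Python dataclasses. This is only an illustration of the JSON shape; the class names Criterion and DatasetDefinition are hypothetical and are not part of the plugin's API.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Criterion:
    """One criterion: which resolver interprets it, and the data it receives."""
    resolver: str
    payload: dict[str, Any]


@dataclass
class DatasetDefinition:
    """Hypothetical mirror of the dataset definition JSON properties."""
    project: str
    label: str
    criteria: list[Criterion] = field(default_factory=list)
    description: Optional[str] = None  # optional, per the property list

    def to_dict(self) -> dict[str, Any]:
        # asdict() converts nested dataclasses (the criteria) to plain dicts.
        data = asdict(self)
        if data["description"] is None:
            del data["description"]
        return data
```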

Currently the main type of dataset definition uses a resolver named TaggedResourceMap. This resolver takes one or more resource tag fields, each of which specifies a tag value and values for the following properties:

| Property | Description | Table |
|---|---|---|
| SeriesDescription | Searches image scan attributes type, series_description, and series_class | xnat_imagescandata |
| ResourceLabel | Searches the resource label for matching scans | xnat_abstractresource |
| ResourceFormat | Searches the resource format for matching scans | xnat_resource |
| ResourceContent | Searches the resource content for matching scans | xnat_resource |

The format of the search values implies the type of comparison used:

  • If a value contains a '%' character by itself (that is, not doubled as '%%'), the search uses a LIKE comparison. To include a literal '%' character without triggering a LIKE, escape it with another '%', e.g. X%%Y.
  • If a value starts and ends with the character '/', the search uses a regular expression comparison.
  • If a value starts with '/' and ends with '/i', the search uses a case-insensitive regular expression comparison.
  • Any other value is treated as a literal search, i.e. attribute = 'value'.
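
The rules above can be sketched as a small classifier. This is not the plugin's implementation: the function name classify_search_value and the returned labels are hypothetical, and the order in which the rules are checked (regular expression delimiters before '%' wildcards) is an assumption, since the rules above do not say which one wins when both could apply.

```python
def classify_search_value(value: str) -> tuple[str, str]:
    """Return (comparison, operand) for a search value.

    Assumption: regex delimiters are checked before '%' wildcards.
    """
    # "/.../i" means a case-insensitive regular expression.
    if len(value) >= 3 and value.startswith("/") and value.endswith("/i"):
        return ("regex_case_insensitive", value[1:-2])
    # "/.../" means a case-sensitive regular expression.
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        return ("regex", value[1:-1])
    # A '%' "by itself" is one left over after removing doubled '%%' escapes.
    if "%" in value.replace("%%", ""):
        return ("like", value)
    # Everything else is a literal match; '%%' stands for a literal '%'.
    return ("literal", value.replace("%%", "%"))


for sample in ["T1%", "/T1./i", "NIFTI", "X%%Y"]:
    print(sample, "->", classify_search_value(sample))
```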

Here's a sample dataset definition:

{
    "project": "AbSegCt",
    "label": "AbSegCt_training_data",
    "description": "This is a definition for data for training the AbSegCt segmentation model",
    "criteria": [
        {
            "resolver": "TaggedResourceMap",
            "payload": {
                "Images": {
                    "tag": "image",
                    "SeriesDescription": ["T1%"],
                    "ResourceFormat": ["NIFTI"],
                    "ResourceContent": ["/T1./i"],
                    "ResourceLabel": ["/nifti/i"]
                },
                "Labels": {
                    "tag": "label",
                    "SeriesDescription": ["Segment%"],
                    "ResourceFormat": ["NIFTI"],
                    "ResourceContent": ["/Segmentat.{3}/i"],
                    "ResourceLabel": ["/nifti/i"]
                }
            }
        }
    ]
}

If you save this JSON to a file named absegct-dataset-definition.json, you can create the definition object in XNAT with a call like this:

$ cat absegct-dataset-definition.json | http --session=username POST http://xnat/xapi/sets/definitions
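
For scripted use, roughly the same request can be prepared with Python's standard library. This is a minimal sketch: the helper name build_definition_request is hypothetical, authentication is left out (the httpie call relies on a stored session), and it assumes the endpoint accepts a JSON body, as the httpie call implies. The request is built but not sent.

```python
import json
import urllib.request


def build_definition_request(base_url: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that creates a dataset definition."""
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/xapi/sets/definitions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# In practice the definition would be loaded from
# absegct-dataset-definition.json; an abbreviated one is inlined here.
definition = {"project": "AbSegCt", "label": "AbSegCt_training_data", "criteria": []}
request = build_definition_request("http://xnat", definition)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it, once credentials have been added.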

Create a dataset

A dataset is the result obtained from resolving a dataset definition at a particular point in time. Once resolved, the contents of a dataset don't change when new data is added or existing data is renamed, moved, or deleted. To resolve a dataset definition, POST to a REST endpoint that identifies the definition, either by its ID or by its project and label:

$ http --session=username POST http://xnat/xapi/sets/definitions/XNAT_E00101
$ http --session=username POST http://xnat/xapi/sets/definitions/projects/AbSegCt/AbSegCt_training_data

Create a new model

To create a new model, you can POST to the REST endpoint http://xnat/xapi/ml/models/model/PROJECT/MODEL, where:

  • PROJECT indicates the project in which the model should be created
  • MODEL indicates the label for the new model object

There are two ways you can submit the actual files that compose the model:

  • Set the content type to multipart/form-data and add each file to the form as a field named modelFile
  • Set the content type to application/zip and the request body to a zip file containing all of the files

From the command-line, these calls might look like this:

$ http --session=username --form http://xnat/xapi/ml/models/model/AbSegCT/model_1 modelFile@checkpoint modelFile@model.ckpt.data-00000-of-00001 modelFile@model.ckpt.index modelFile@model.ckpt.meta modelFile@model.fzn.pb modelFile@model.trt.pb
$ http --session=username POST http://xnat/xapi/ml/models/model/AbSegCT/model_2 Content-Type:application/zip < model.zip
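
The zip-based upload can also be prepared in Python. The sketch below packs model files into an in-memory zip and builds a POST with the application/zip content type. The helper names build_model_zip and build_model_request are hypothetical, authentication is omitted, and the request is built but not sent.

```python
import io
import urllib.request
import zipfile


def build_model_zip(files: dict[str, bytes]) -> bytes:
    """Pack file name -> content pairs into a zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def build_model_request(base_url: str, project: str, model: str,
                        zip_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the model upload with a zip request body."""
    return urllib.request.Request(
        url=f"{base_url}/xapi/ml/models/model/{project}/{model}",
        data=zip_bytes,
        headers={"Content-Type": "application/zip"},
        method="POST",
    )


# Placeholder contents; real model files would be read from disk.
payload = build_model_zip({"checkpoint": b"...", "model.ckpt.index": b"..."})
request = build_model_request("http://xnat", "AbSegCT", "model_2", payload)
print(request.get_method(), request.full_url, len(payload), "bytes")
```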

Create a training configuration

Question: The model is not referenced by the training configuration. Should it be? How independent of a model is the training configuration? Can a training configuration be used for more than one model?

In addition to the standard project and label properties, a training configuration brings together a few different items:

  • The configuration to be used when launching training sessions for the model
  • Any fields within the configuration that may be parameterized at launch
  • The ID of the resolved dataset
  • A JSON template that contains a wrapper for the dataset
  • Parameters for partitioning the dataset files

A training configuration can be most easily created by POSTing a JSON body that looks something like this:

{
    "project": "project",
    "label": "label",
    "collectionId": "XNAT_E00102",
    "configuration": {
        "parameterizable": ["epochs", "multi_gpu", "learning_rate"],
        "template": { ... }
    },
    "dataset": {
        "template": { ... },
        "parameterizable": {
            "training": 70,
            "validation": 20,
            "test": 10
        }
    }
}

If you specify both project and label in the training configuration JSON, you can simply POST the JSON to the REST endpoint http://xnat/xapi/ml/config. You can omit the project and label fields from the JSON to make the same configuration template easier to reuse, but you then need to specify these values in the REST URL:

$ cat config_train.json | http --session=username POST http://xnat/xapi/ml/config
$ cat config_train.json | http --session=username POST http://xnat/xapi/ml/config/project/AbSegCT/AbSegCT_config_train

Note that the second form of this REST call uses the values for project and label from the URL, even if these have different values in the POSTed object!
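
The choice between the two endpoints, and the rule that URL values take precedence, can be sketched as follows. The function names config_endpoint and effective_identity are hypothetical helpers for client-side scripting, not part of the plugin.

```python
from typing import Optional


def config_endpoint(base_url: str, config: dict,
                    project: Optional[str] = None,
                    label: Optional[str] = None) -> str:
    """Pick the URL to POST a training configuration to."""
    if project is not None and label is not None:
        # Project and label are carried by the URL.
        return f"{base_url}/xapi/ml/config/project/{project}/{label}"
    if "project" in config and "label" in config:
        # Project and label are carried by the JSON body.
        return f"{base_url}/xapi/ml/config"
    raise ValueError("project and label must be given in the JSON or in the URL")


def effective_identity(config: dict, project: Optional[str] = None,
                       label: Optional[str] = None) -> tuple[str, str]:
    """Project and label that end up being used: URL values override the body."""
    if project is not None and label is not None:
        return (project, label)
    return (config["project"], config["label"])


body = {"project": "project", "label": "label"}
print(config_endpoint("http://xnat", body))
print(effective_identity(body, project="AbSegCT", label="AbSegCT_config_train"))
```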

Both template fields can be inserted as literal JSON (i.e. no encoding required). These are usually tightly tied to the model and training algorithm and should be specified by the developer(s) of the model to be trained. The two templates are rendered differently. The configuration template is delivered exactly as specified, except that values are substituted for any fields the user specifies at launch time. The dataset template is rendered by adding elements for each of the fields specified in the dataset's parameterizable field. Given the following dataset configuration:

"dataset": {
    "template": {
        "name": "AbSegCt",
        "quantitative": [
            0,
            1
        ],
        "licence": "CC-BY-SA 4.0",
        "labels": {
            "1": "PZ",
            "2": "TZ",
            "0": "background"
        },
        "release": "1.0 04/05/2018",
        "modality": {
            "1": "ADC",
            "0": "T2"
        },
        "tensorImageSize": "4D",
        "reference": "Miskatonic University",
        "description": "Abdominal segmentation"
    },
    "parameterizable": {
        "training": 70,
        "validation": 20,
        "test": 10
    }
}

The resulting rendered dataset would look like this (the actual image lists are truncated to a single session for readability):

{
    "name": "AbSegCt",
    "quantitative": [
        0,
        1
    ],
    "licence": "CC-BY-SA 4.0",
    "labels": {
        "1": "PZ",
        "2": "TZ",
        "0": "background"
    },
    "release": "1.0 04/05/2018",
    "modality": {
        "1": "ADC",
        "0": "T2"
    },
    "tensorImageSize": "4D",
    "reference": "Miskatonic University",
    "description": "Abdominal segmentation",
    "training": [
        {
            "image": "/data/xnat/archive/prostate/arc001/prostate_45_MR_01/SCANS/1/NIFTI/prostate_45.nii.gz",
            "label": "/data/xnat/archive/prostate/arc001/prostate_45_MR_01/SCANS/2/NIFTI/prostate_45.nii.gz"
        }
    ],
    "numTraining": 70,
    "validation": [
        {
            "image": "/data/xnat/archive/prostate/arc001/prostate_45_MR_01/SCANS/1/NIFTI/prostate_45.nii.gz",
            "label": "/data/xnat/archive/prostate/arc001/prostate_45_MR_01/SCANS/2/NIFTI/prostate_45.nii.gz"
        }
    ],
    "numValidation": 20,
    "test": [
        {
            "image": "/data/xnat/archive/prostate/arc001/prostate_45_MR_01/SCANS/1/NIFTI/prostate_45.nii.gz",
            "label": "/data/xnat/archive/prostate/arc001/prostate_45_MR_01/SCANS/2/NIFTI/prostate_45.nii.gz"
        }
    ],
    "numTest": 10
}
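
The rendering step shown above can be approximated in a few lines of Python. This is a sketch, not the plugin's code: the function name render_dataset is hypothetical, and it assumes that each num field holds the number of entries in the corresponding partition and is named by capitalizing the partition name.

```python
import copy


def render_dataset(template: dict, partitions: dict[str, list[dict]]) -> dict:
    """Copy the template and add one list plus one count per partition."""
    rendered = copy.deepcopy(template)  # leave the stored template untouched
    for name, entries in partitions.items():
        rendered[name] = list(entries)
        # e.g. "training" -> "numTraining" (naming taken from the example above)
        rendered["num" + name[:1].upper() + name[1:]] = len(entries)
    return rendered


template = {"name": "AbSegCt", "tensorImageSize": "4D"}
partitions = {
    "training": [{"image": "scan1.nii.gz", "label": "seg1.nii.gz"}],
    "validation": [],
    "test": [],
}
print(render_dataset(template, partitions))
```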

Note that the number of images in each set, as indicated by the numPartition values (numTraining, numValidation, and numTest), reflects the values set in the dataset's parameterizable field:

  • If the values for those parameters add up to 100, they are taken as percentages and the dataset is partitioned based on those percentages, regardless of the number of images in the dataset. In this case, a dataset with 500 images would have a training partition with 350 images, a validation partition with 100 images, and a test partition with 50 images.
  • If the values for the parameters don't add up to 100, they are taken as image counts and must add up to the total number of images in the dataset. Note that an image may have multiple tagged files, such as image and label: these are considered to be part of a single image.
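
The two partitioning rules can be worked through in a short sketch. The function name partition_counts is hypothetical, and rounding down when a percentage does not divide the dataset evenly is an assumption; the rules above do not describe how remainders are handled.

```python
def partition_counts(total_images: int, parameters: dict[str, int]) -> dict[str, int]:
    """Turn the parameterizable values into a number of images per partition."""
    requested = sum(parameters.values())
    if requested == 100:
        # Values are percentages of the dataset (assumption: round down).
        return {name: total_images * percent // 100
                for name, percent in parameters.items()}
    if requested == total_images:
        # Values are already image counts.
        return dict(parameters)
    raise ValueError(
        f"values sum to {requested}; expected 100 or the dataset size {total_images}"
    )


print(partition_counts(500, {"training": 70, "validation": 20, "test": 10}))
# {'training': 350, 'validation': 100, 'test': 50}
```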

Launching a training session

Once you have a model, its training configuration, and a dataset, you can begin to train the model. You may launch multiple training sessions simultaneously or serially for the same model, varying the configuration parameters each time to fine tune the training outcome. A training session launch request can take the following attributes:

| Property | Description |
|---|---|
| label | The label for the training session. This is intended to be human readable and can be used to make it easy to distinguish training sessions, e.g. "session epochs 50 learning rate 0.4" and "session epochs 50 learning rate 0.6". |
| processingId | A unique processing ID for the session. This is used internally by XNAT for things like routing requests to containers running the training session or generating processing data to allow monitoring training progress. |
| modelId | The ID of the model to be trained. |
| configurationId | The ID of the training configuration to be used for training. |
| username | The username of the user requesting the training session. |
| parameters | Any parameters and arguments for the training session. |
| sessionId | The training session ID. This is optional and can be used when updating a training session that has been queued but not yet launched. |

The REST endpoint to launch a training session is http://xnat/xapi/ml/train/launch:

$ http --session=username POST http://xnat/xapi/ml/train/launch processingId="abSegCt-model-20200413153659" label="AbSegCt model epochs 50 LR 0.5 multi-gpu" modelId=XNAT_E00100 configurationId=XNAT_E00101 parameters:='{"epochs": "50", "learning_rate": "0.5", "multi_gpu": "true"}'
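
When launching several sessions with varying parameters, the request body can be assembled programmatically. The sketch below builds the same JSON as the call above. The function name build_launch_payload is hypothetical, and deriving processingId from a prefix plus a timestamp is an assumption based on the shape of the example ID; XNAT only requires that the value be unique.

```python
import json
from datetime import datetime
from typing import Optional


def build_launch_payload(label: str, model_id: str, configuration_id: str,
                         parameters: dict[str, str], processing_prefix: str,
                         now: Optional[datetime] = None) -> dict:
    """Assemble the JSON body for a training session launch request."""
    now = now or datetime.now()
    return {
        # Assumption: prefix plus timestamp, mirroring the example ID above.
        "processingId": f"{processing_prefix}-{now:%Y%m%d%H%M%S}",
        "label": label,
        "modelId": model_id,
        "configurationId": configuration_id,
        "parameters": parameters,
    }


payload = build_launch_payload(
    label="AbSegCt model epochs 50 LR 0.5 multi-gpu",
    model_id="XNAT_E00100",
    configuration_id="XNAT_E00101",
    parameters={"epochs": "50", "learning_rate": "0.5", "multi_gpu": "true"},
    processing_prefix="abSegCt-model",
)
print(json.dumps(payload, indent=4))
```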
