
COINSTAC Computation Development Guide


COINSTAC is an Electron-based application environment for decentralized algorithms. This guide describes how to develop an algorithm for COINSTAC.

Requirements

Getting Started

To successfully run a computation in the simulator we need to:

  • create a compspec
  • encapsulate the computation in a container image
  • create the local and remote scripts
  • create an inputspec
  • run the computation in coinstac-simulator


Create a compspec

Starting a COINSTAC computation begins with making a compspec.json, a document that allows the COINSTAC system to understand how to run your computation and use it with others or by itself.

Taking the COINSTAC regression algorithm as a reference, the key sections of a compspec are:

  • meta - uniquely names and describes your computation for organizational purposes within the COINSTAC system
  • computation - describes container type, tells COINSTAC where to find and pull your image, and gives COINSTAC the initial files to run for both the remote server and each local client
  • input - describes the intended input data and parameters coming from your inputspec.json document and eventually the COINSTAC UI
  • output - defines the expected output keys and values once a computation has completed (this is usually associated with the "Table" display type)
  • display - defines how the expected output should be displayed by COINSTAC

For further detail see: Compspec API Documentation.

For now, let's take a look at a simple example:

{
  "meta": {
    "name": "Computation Name",
    "id": "cool-comp-name",
    "version": "v1.0.0",
    "repository": "https://github.com/trendscenter/coinstac-cool-comp-name",
    "description": "This Is A Cool Computation"
  },
  "computation": {
    "type": "docker",
    "dockerImage": "cool-comp-name",
    "command": [
      "python",
      "/computation/local.py"
    ],
    "remote": {
      "type": "docker",
      "dockerImage": "cool-comp-name",
      "command": [
        "python",
        "/computation/remote.py"
      ]
    },
    "input": {
      "number": {
        "defaultValue": 1,
        "label": "Number",
        "type": "number",
        "source": "owner"
      }
    },
    "output": {
    },
    "display": [
    ]
  }
}

The output and display sections can be left blank for now; they will be used later to tell the COINSTAC UI how to handle your computation's results once it completes.

Use a container system

To run your computation in COINSTAC you'll need to encapsulate it in an image. Currently COINSTAC supports Docker. (In the near future we plan to support Singularity image formats, with Singularity sif images created from Docker images.)

Create a Dockerfile with Docker

All COINSTAC images need to inherit from a COINSTAC base image which contains a microservice that allows the container to persist between iterations when running a pipeline. A list of base images can be found here.

A typical computation setup has two scripts: one for the remote and one for the local computation.

The Dockerfile example below builds on the COINSTAC base image, places your code into the root /computation directory, and installs any libraries required by your code. The Dockerfile should be in your computation's root directory or otherwise reference paths based on its location.

Note: please create and use a .dockerignore file to exclude extraneous system files not needed or used by the computation; this keeps the image size down.
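For instance, a minimal .dockerignore might look like the sketch below; the entries are illustrative, so tailor them to your own project:

```
.git
test/
__pycache__/
*.pyc
```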

FROM coinstacteam/coinstac-base
# Set the working directory
WORKDIR /computation
# Copy the requirements file into the container
COPY requirements.txt /computation
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt
# Copy the current directory contents into the container
COPY . /computation

Create the scripts

Here is a simple example that just sends a number and sums it until the total reaches 5.

local script

#!/usr/bin/python

import sys
import json

doc = json.loads(sys.stdin.read())
if 'start' in doc['input']:
    sums = 1
else:
    sums = doc['input']['sum'] + 1

output = { "output": { "sum": sums } }
sys.stdout.write(json.dumps(output))
remote script

#!/usr/bin/python

import sys
import json

doc = json.loads(sys.stdin.read())
sums = 0
for site, output in doc['input'].items():
    sums = sums + output['sum']
sums = sums / len(doc['input'])
if sums > 4:
    success = True
else:
    success = False

output = { "output": { "sum": sums }, "success": success }
sys.stdout.write(json.dumps(output))
And the compspec this uses:

{
  "meta": {
    "name": "decentralized test",
    "id": "coinstac-decentralized-test",
    "version": "v1.0.0",
    "repository": "github.com/user/computation.git",
    "description": "a test that sums the last two numbers together for the next"
  },
  "computation": {
    "type": "docker",
    "dockerImage": "coinstac-decentralized-test",
    "command": [
      "python",
      "/computation/local.py"
    ],
    "remote": {
      "type": "docker",
      "dockerImage": "coinstac-decentralized-test",
      "command": [
        "python",
        "/computation/remote.py"
      ]
    },
    "input": {
      "start": {
        "type": "number"
      }
    },
    "output": {
      "sum": {
        "type": "number",
        "label": "Decentralized Sum"
      }
    }
  }
}
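To sanity-check this iteration logic without building the Docker image, you can mimic the message loop in plain Python. The local and remote functions below are hypothetical stand-ins for the two scripts above, and the loop is only a rough sketch of what the simulator does for you:

```python
# Hypothetical stand-ins for local.py and remote.py from the example above.
def local(message):
    inp = message["input"]
    sums = 1 if "start" in inp else inp["sum"] + 1
    return {"output": {"sum": sums}}

def remote(message):
    site_outputs = message["input"]
    avg = sum(o["sum"] for o in site_outputs.values()) / len(site_outputs)
    return {"output": {"sum": avg}, "success": avg > 4}

# Simulate two sites iterating until the remote signals success.
site_msgs = {s: {"input": {"start": 1}} for s in ("local0", "local1")}
iterations = 0
while True:
    iterations += 1
    outs = {s: local(m)["output"] for s, m in site_msgs.items()}
    result = remote({"input": outs})
    if result["success"]:
        break
    # The remote's output becomes each site's input for the next iteration.
    site_msgs = {s: {"input": result["output"]} for s in site_msgs}

print(iterations, result["output"]["sum"])  # 5 iterations, sum 5.0
```

Each pass averages the site sums and feeds the average back, so the run ends once the averaged sum exceeds 4, after five iterations here.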
Creating an inputspec

To run a computation in coinstac-simulator we need a document that gives the pipeline its initial input, corresponding to the compspec. This document is a smaller chunk of what the UI generates for the pipeline; the rest is done for you in the simulator.

The inputspec.json's default location is a folder called ./test in your root directory. The inputspec is just the variables from your compspec's input section with fulfilled values. Here's an example:

{
  "start":       // just the variable name we chose for our sum example
  {
    "value": 1   // the initial value given to the pipeline
  }
}

If you specify more clients in the simulator with the -c flag, the above data will just be cloned among them to start with. To start with different data for each client, make the inputspec an array:

[
  {
    "start":         // just the variable name we chose for our sum example
    {
      "value": 1     // the initial value given to the pipeline
    }
  },
  {
    "start":
    {
      "value": 1.5   // a different initial value for the second client
    }
  }
]
Running the computation

We have our compspec, local, and remote scripts; now let's try to run it. First we'll want to build our Docker image. On some machines (Linux, and possibly Windows) you may have to run these commands as an administrator.

docker build -t coinstac/coinstac-decentralized-test .

Note: the image name in the -t tag must be the same as the dockerImage in your compspec. Now, in the root directory of your project, run:

coinstac-simulator

You should see the computation start, then end, and finally the output printed as JSON.

Local vs decentralized

Local and decentralized computations are made nearly the same way, except for two key differences: decentralized computations are always ended by the remote, and decentralized computations have a "remote" section in their compspec.

Advanced usage

In this section we'll go over some more advanced use cases of the simulator and COINSTAC itself.

Adding test files in simulator

Adding files to the simulator is a bit of a manual process at the moment: they must be put into the ./test folder under the automatic site names the simulator generates. Sim names each site local#, where # ranges from 0 to the number of clients minus one. Here's what the directory structure looks like for local0, the first client:

├── test
│ ├── inputspec.json
│ ├── local0
│ │ └── simulatorRun
│ │ ├── subject0_aseg_stats.txt
│ │ ├── subject10_aseg_stats.txt

The simulatorRun directory is the directory that is shared into Docker; put your files in there and they will be accessible to your computation via doc['state']['baseDirectory'] + '/myfile.txt', to give a Python example.
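For instance, a local script might build file paths against that key rather than hard-coding them. The message below is only an illustrative slice of the document the simulator sends on stdin, and the filename is just an example:

```python
import os

# Illustrative slice of the stdin message a local script receives.
doc = {
    "input": {"start": 1},
    "state": {"baseDirectory": "/computation/input/local0/simulatorRun"},
}

def input_path(doc, filename):
    # Resolve a shared test file against the mounted base directory.
    return os.path.join(doc["state"]["baseDirectory"], filename)

print(input_path(doc, "subject0_aseg_stats.txt"))
```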

Transferring files between clients

Files can be transferred from client to remote and from remote to client by writing to a special directory. The transfer directory can be accessed while in a computation via the state -> transferDirectory key, which looks like inputJSON['state']['transferDirectory'] in Python.

Any file put into the transfer directory is moved from that directory to the input directory on the corresponding side when that iteration ends. From remote to client it would look like this: Remote:

├── ['state']['transferDirectory']
│ ├── move-this-file.txt
Iteration ends....

Remote:

├── ['state']['transferDirectory']
│ ├──

All Clients:

├── ['state']['baseDirectory']
│ ├── move-this-file.txt

From any client to remote would look like this: A Client:

├── ['state']['transferDirectory']
│ ├── move-this-file.txt
Iteration ends....

A Client:

├── ['state']['transferDirectory']
│ ├──

Remote:

├── ['state']['baseDirectory']
│ ├── ['state']['baseDirectory']['clientID'] // files are moved into folders based on the sending client's ID so they cannot conflict
│ │ └── move-this-file.txt

NOTE: All transferred files are deleted at the end of the pipeline; to persist files, write them to the output directory.
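As a sketch, a script could drop files into the transfer directory with a small helper like the one below. The helper name and the temp-directory demo are ours, not part of the COINSTAC API; only the transferDirectory key comes from the description above:

```python
import os
import shutil
import tempfile

def share_file(doc, local_path):
    # Copy a file into the transfer directory; COINSTAC moves it to the
    # other side (remote or clients) when the iteration ends.
    dest = os.path.join(doc["state"]["transferDirectory"],
                        os.path.basename(local_path))
    shutil.copy(local_path, dest)
    return dest

# Demo with throwaway directories standing in for the real mounts.
work, transfer = tempfile.mkdtemp(), tempfile.mkdtemp()
doc = {"state": {"transferDirectory": transfer}}
src = os.path.join(work, "move-this-file.txt")
with open(src, "w") as f:
    f.write("results")
dest = share_file(doc, src)
```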

Running pipelines in sim

Simulator can run pipelines, which are multiple computations run in series. The computations can be different, or even the same computation; all that's required is a pipelinespec.json file and the -p or -pipeline flag in the simulator. Here's an example:

["./compspec.json", "/Users/someUser/coinstac/packages/coinstac-images/coinstac-decentralized-test/compspec.json"]

Running this pipeline:

coinstac-simulator -p ./pipelinespec.json

You can see that the pipeline spec file is a list of compspecs; the entries can be either absolute paths or paths relative to the pipelinespec.json.

If a computation is first in the list, the normal inputspec.json is used for its input. However, all subsequent computations must have an inputspec-pipeline.json, which is used for any computation that isn't first; having separate specs makes switching between pipeline and non-pipeline mode less cumbersome.

The pipelinespec.json file need not be in any computation's directory, but it may be easier to put it in the first computation's directory for source control.

Inputspec-pipeline and using previous step output

Later steps in the pipeline can use variables output by previous steps (note: there is a string size limit of 256 MB), using the following paradigm:

{
  "inputVariableName": {
    "fromCache": {
      "step": 0,
      "variable": "outputVariableName"
    }
  }
}

Contrast this with a normal inputspec, which looks like:

{
  "inputVariableName": {
    "value": 0
  }
}

The fromCache key tells the pipeline you want to access a previous step's data, the step key tells it which step (zero-indexed) you would like, and the variable key is the name of the previously output variable to access. Here's a full example. inputspec.json for the first step:

{"firstVariable":{"value":1}}

its output section in the compspec:

"output": {
  "firstOutput": {
    "type": "number",
    "label": "output from first step"
  }
},

inputspec-pipeline.json for the next step:

{"nextStepInputVar":{"fromCache": {"step":0, "variable":"firstOutput"}}}

You can see that the first step outputs firstOutput, which is then plugged into nextStepInputVar for the second step, chaining the pipeline I/O together.

Multiple clients in pipeline mode

Pipeline mode can also handle multiple clients, each with their own input. The inputspec.json is identical to normal multi-client mode; however, we also need a multi-client inputspec-pipeline.json for all the subsequent pipeline computations in the pipelinespec. A multi-client run for the previous example might look like this. First step inputspec.json:

[{"firstVariable":{"value":0.5}},
{"firstVariable":{"value":23}}]

second step inputspec-pipeline.json:

[{"nextStepInputVar":{"fromCache": {"step":0, "variable":"firstOutput"}}},
{"nextStepInputVar":{"fromCache": {"step":0, "variable":"firstOutput"}}}]

Some notes on this:

  • The arrays for the multiple clients between the multiple inputspec and inputspec-pipeline files must be the same length, as the client count has to be the same throughout the pipeline
  • The arrays for the multiple clients must also be in the same order, as in array element [5] in the initial input spec should correspond to array element [5] in all other inputspec-pipeline files, otherwise your client input/output chaining will be mismatched.
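A quick pre-flight check along these lines can catch a mismatched client count before a long run. This is just a sketch using the two spec arrays from the example above, not a tool COINSTAC provides:

```python
import json

# The two multi-client spec arrays from the example above.
inputspec = json.loads("""
[{"firstVariable": {"value": 0.5}},
 {"firstVariable": {"value": 23}}]
""")
inputspec_pipeline = json.loads("""
[{"nextStepInputVar": {"fromCache": {"step": 0, "variable": "firstOutput"}}},
 {"nextStepInputVar": {"fromCache": {"step": 0, "variable": "firstOutput"}}}]
""")

# Every spec in the pipeline must describe the same number of clients,
# and element i of each array must refer to the same client.
assert len(inputspec) == len(inputspec_pipeline), "client counts differ"
```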

Transfer data between local and remote easily using our intuitive 'coinstac_computation' Python library

Please check out the documentation here

A full-fledged example

A more real-world example can be found here with our Single Shot Regression Demo, which uses FreeSurfer file test data to run its simulator example.