# Home

Isabl is a platform for the integration, management, and processing of individual-centric multimodal data. Welcome to the Isabl Documentation!

Isabl is a plug-and-play data science framework designed to support the processing of multimodal patient-centric data. Have questions? Ask [here](https://gitter.im/isabl-io/community).

Isabl has been developed by [Elli Papaemmanuil's Lab](https://www.mskcc.org/research-areas/labs/elli-papaemmanuil).

{% content-ref url="/pages/-LgZKlH7TVorkX37z2JF" %}
[Quick Start](/quick-start)
{% endcontent-ref %}

## Features

* 👾 **Backend, Data Model and RESTful API**
  * Metadata version control
  * Fully featured and brisk RESTful API with extensive swagger documentation
  * Comprehensive permissions controls and user groups
  * Patient centric relational model with support for:
    * *Individuals*, *samples*,  *experiments* and *cohorts*
    * *Assembly* aware bioinformatics *applications* and *analyses*
    * Choice models such as *diseases*, *centers* and more
    * **Custom fields** for all schemas!
* 🤖 **Command Line Interface and Software Development Kit**
  * Digital Assets Management (Permissions, Storage, Tracking)
  * Automated execution and tracking of bioinformatics applications
  * Project and patient level results auto-merge
  * Operational automations on data import and analyses status change
  * Dynamic retrieval of data and results using versatile queries
  * Fully featured SDK for post-processing analyses
* 🚀 **Web Application**
  * User Interface to browse and manage the operations metadata
  * Analyses tracking and results visualization
  * Flexibility to edit and customize models
  * Batch creation of metadata by excel file submission
  * Single Page Application that provides a crisp user experience
  * Possibility to integrate third-party services like JIRA
* ✅ **Plug-n-play and reliable codebase**
  * Docker-compose is the only dependency for the web application and the backend
  * The Command Line Interface is a portable pip installable package
  * Continuously integrated, with 98%+ test coverage across the entire codebase
  * Isabl is upgradable: no need to fork the codebase

## Who is using Isabl

* [Elli Papaemmanuil's lab](https://www.mskcc.org/research-areas/labs/elli-papaemmanuil).
* The [Department of Pediatrics](https://www.mskcc.org/blog/one-one-how-msk-using-precision-medicine-tailor-treatment-children-cancer) at Memorial Sloan Kettering.
* [Sohrab Shah's lab](https://www.mskcc.org/profile/sohrab-shah).
* The [Microbiome Program](https://www.mskcc.org/research-areas/labs/jonathan-peled) at Memorial Sloan Kettering.
* The [Single-Cell Analytics Service (SAIL)](https://www.mskcc.org/research/ski/innovation-labs/single-cell-analytics-innovation-lab-sail) at Memorial Sloan Kettering.
* [Cristina Curtis' Lab](https://med.stanford.edu/curtislab.html) at Stanford Medicine.

...and many other groups, at Weill Cornell, California State University, and the University of Oviedo (Spain), are currently testing it as a potential fit!

## Infrastructure

Isabl is a modular infrastructure with four main components: (1) an individual-centric and extensible relational database (Isabl-db); (2) a comprehensive RESTful API (Isabl-api) used to support integration with data processing environments and enterprise systems (e.g. clinical databases, visualization platforms); (3) a Command Line Client (CLI; Isabl-cli) used to manage digital assets and deploy bioinformatics applications; (4) a front end single page web application (Isabl-web) with system wide queries enabled.

![Isabl is composed of a patient centric relational model, a web-based metadata architecture, and a command line client.](https://docs.google.com/drawings/d/e/2PACX-1vQnO2UBtPAGuUqobgfAH2GFbvuE5aCAzrYpxa_nBb8tigeT-GdfAkurTnOpzrpa_QDxBH-nrQ-lnxEk/pub?w=998\&h=712)

RESTful API capabilities are documented with Swagger (<https://swagger.io>) and Redoc (<https://github.com/Rebilly/ReDoc>) following OpenAPI specifications ([https://www.openapis.org](https://www.openapis.org/)). Importantly, Isabl's metadata infrastructure is decoupled from, and agnostic of, compute and data storage environments (e.g. local, cluster, cloud). This separates dependencies and fosters interoperability across compute environments.

![](https://docs.google.com/drawings/d/e/2PACX-1vTLYVgPubPSlSgyUahpZ3fOT-p9lmrMet5qCl1klS2VzEnFIE4zLW0WK3cDZaCgAmwcsa3Ta-J9ujdG/pub?w=889\&h=667)

## Data Model

Isabl's relational model maps workflows for data provenance, processing, and governance. Metadata is captured across the following thematic categories: (1) project, individual, and sample level attributes; (2) raw data properties, including experimental technique, technology, and related parameters (e.g. read length); (3) analytical workflows, including a complete audit trail of versioned algorithms, related execution parameters, reference files, analysis status tracking, and results deposition; (4) data governance information for managing system and data access across stakeholders.

![](https://docs.google.com/drawings/d/e/2PACX-1vTG3QBMOtwM5DhpFG07iQFj0SA0J7CE4e8Xd3ZJcpJy24EiDu9HbGomqslNFgqV3rauJ-z_VU-SY-ja/pub?w=1305\&h=791)

## Why Isabl

Isabl ensures that all bioinformatics operations follow the DATA reproducibility checklist (Documentation, Automation, Traceability, and Autonomy), while guaranteeing that assets are managed according to the FAIR principles (Findable, Accessible, Interoperable, Reusable).

![](https://docs.google.com/drawings/d/e/2PACX-1vRCagXfy-ubxEHKL3GOSTTEGE1g9hWk1Ic0yTx3tWsBJHWSIfO5Y2Hcu0wTeBtb3mA1DeEXKw4c1fBd/pub?w=1216\&h=810)

Here are some reasons why you may want to use Isabl:

* You don't have a team of 10+ engineers, but you do have hundreds of samples
* You'd rather not have your data managed by postdocs and PhD students
* Crosslink samples from different cohorts
* Answer new questions using existing data
* Full log and audit trail of your informatics operations
* Automatically merge results as new samples are added to big cohorts
* You want to have programmatic access to the entire data capital
* Seamlessly run reproducible pipelines across your projects

{% embed url="https://www.youtube.com/watch?v=L1JhVqZ3oBY" %}

## Similar projects

* [The Genome Modeling System](https://github.com/genome/gms) *Genome Institute at Washington University platform.*
* [SeqWare](https://seqware.github.io/) *analyze massive genomics datasets.*
* [QuickNGS](http://bifacility.uni-koeln.de/quickngs/web/) *efficient high-throughput data analysis of Next-Generation Sequencing data.*
* [HTS-flow](https://github.com/arnaudceol/htsflow) *a framework for the management and analysis of NGS data.*

## What Isabl is not

* Isabl is not a *Workflow Management System* such as [toil](https://github.com/DataBiosphere/toil) or [bpipe](https://github.com/ssadedin/bpipe); instead, Isabl facilitates the automated deployment and databasing of data processing pipelines.
* Isabl is not a *Platform as a Service (PaaS)* provider such as [DNA nexus](https://www.dnanexus.com), [Seven Bridges](https://www.sevenbridges.com), or [Fire Cloud](https://software.broadinstitute.org/firecloud/); instead, it is an information system that could feed metadata and data into these services.
* Isabl differs from *Server Workbenches* such as [Galaxy](https://usegalaxy.org/) or Pegasus: rather than being configuration friendly, Isabl is designed to conduct systematic analyses automatically and in a standardized way, with as little human input as possible.
* Isabl is not a *Workflow Language*; instead, the Bioinformatics Applications in `isabl` only define metadata-driven validation and the logic to build commands that trigger pipelines written in any language.


# Quick Start

⏱ tutorial time: 10 minutes

{% hint style="warning" %}
The Isabl Platform is **freely available for research and academic purposes**. For commercial use, a license is required. To access the source code, please [get in touch with us](mailto:undefined).
{% endhint %}

Welcome to the *10 Minutes to Isabl* guide! This tutorial will walk you through installation, metadata registration, data import, and automated data processing.

{% hint style="info" %}

* Also check the [Retrieve Data](/retrieve-data) and [Register Metadata](/data-model) guides!
* Join us on [Gitter](https://gitter.im/isabl-io/community) if you have questions
* Submit an [issue](https://github.com/isabl-io/demo/issues/new) 🐛 if you are having problems with this guide
{% endhint %}

## Intro to Isabl

Check out the documentation [home](/) page for an intro to Isabl.

![Isabl is composed of a patient centric relational model, a web-based metadata architecture, and a command line client.](https://user-images.githubusercontent.com/8843150/62899299-77088f00-bd25-11e9-9695-cda93ab825a5.png)

## Prerequisites

* [Docker Compose](https://docs.docker.com/compose/install/) for building and running the application.

{% hint style="warning" %}
Make sure your installation doesn't require `sudo` to run `docker` and `docker-compose`; otherwise, you will have issues running this demo. Check that `docker run hello-world` runs without problems. If you have permission issues, see [how to run docker as a non-root user](https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user).
{% endhint %}

## Demo Setup

Let's start by cloning the demo:

```bash
git clone https://github.com/papaemmelab/isabl_demo.git --recurse-submodules
cd isabl_demo
```

{% hint style="warning" %}
Make sure the `git submodule` folders `isabl_api` and `isabl_cli` are not empty. If they are, the `--recurse-submodules` flag probably didn't work.

As a workaround, run:

`git submodule update --init --recursive`
{% endhint %}

Next, source a simple initiation profile:

```bash
source .demo-profile
```

{% hint style="info" %}
If you are **redoing** the tutorial, we recommend removing the demo directory and cloning it again:

```bash
chmod -R u+w isabl_demo && rm -rf isabl_demo
```

Also remove the Docker volume:

```bash
docker volume rm isabl_demo_local_postgres_data
```

{% endhint %}

## Installation

{% hint style="warning" %}
This installation relies on private Docker images. Please confirm with the *isabl team admins* that you have access to them.
{% endhint %}

Build and run the application (this might take a few minutes):

{% hint style="success" %}
Before running `demo-compose build`, specify your **platform** architecture if it differs from the standard Intel `linux/amd64`, e.g. if you have an **Apple M1/M2**:

```bash
export DOCKER_DEFAULT_PLATFORM=linux/arm64
```

If you are not sure, you can check your **platform** by running:

```bash
uname -m
```

{% endhint %}

```bash
demo-compose build
```

Now we can run the application in the background:

```bash
demo-compose up -d
```

{% hint style="info" %}
Run `demo-compose down` to stop the application, and `demo-compose logs -f` in a new session to follow the logs.
{% endhint %}

Create a superuser by running `demo-django createsuperuser`, for example using credentials: `username=admin password=admin email=admin@demo.io`

Now, visit <http://localhost:8000/> and log in!

{% hint style="info" %}
`demo-compose`, `demo-django`, and `demo-cli` are simple wrappers around `docker-compose` - check them out. The `isabl_demo` directory was bootstrapped using [cookiecutter-isabl](https://isabl-io.github.io/docs/#/api/settings), a proud fork of [cookiecutter-django](https://github.com/pydanny/cookiecutter-django)! Many topics from their [guide](https://cookiecutter-django.readthedocs.io/en/latest/developing-locally-docker.html#) will be relevant to your project.
{% endhint %}

## Create Project

Creating a project in Isabl is as simple as adding a title. You can also specify optional fields:

![Hover over the menu and click the + icon to add a new project.](/files/-Lgydgys4qCBaZd1MsbU)

## Configure Isabl CLI

We need to let `isabl-cli` know where the API can be reached and which CLI settings to use (if you are using `demo-compose`, these variables are already set):

```bash
# install isabl-cli and the demo apps from the local directories (using a virtualenv is good practice)
pip install -e ./isabl_cli
pip install -e ./isabl_apps

# point to your API URL
export ISABL_API_URL=http://localhost:8000/api/v1/

# set environment variable for demo client
export ISABL_CLIENT_ID="demo-cli-client"
```

Now we can create a client for `isabl-cli`:

```bash
# create and update the client object
demo-cli python3.6 ./assets/metadata/create_cli_client.py

# check the file used to create the client
cat ./assets/metadata/create_cli_client.py
```

## Register Samples

Before we create samples, let's use `isabl-cli` to add choices for *Center*, *Disease*, *Sequencing Technique*, and *Data Generating Platform*:

```bash
demo-cli python3.6 ./assets/metadata/create_choices.py
```

{% hint style="info" %}
New options can also be easily created using the admin site: <http://localhost:8000/admin/>
{% endhint %}

We will use *Excel submissions* to register samples through the web interface. To do so, the demo project comes with a pre-filled metadata form available at:

```bash
open assets/metadata/demo_submission.xlsm
```

{% hint style="info" %}
When prompted to allow *macros*, say yes. This will enable you to toggle between optional and required columns. By the way, Isabl has multiple mechanisms for metadata ingestion! Learn more [here](/data-model).
{% endhint %}

Now let's submit this Excel form. First, open the directory:

```bash
open assets/metadata
```

And drag the `demo_submission.xlsm` file into the submit samples form:

![Click the + button in the project panel header to add new samples.](/files/-Lh6AS2RG3s-qgQoxWoU)

We can also review metadata at the command line:

```bash
isabl get-metadata experiments --fx
```

{% hint style="info" %}
Expand and navigate with arrow keys, press e to *expand all* and E to minimize. Learn more at [`fx` documentation](https://github.com/antonmedv/fx/blob/master/docs.md#interactive-mode). Use `--help` to learn about other ways to visualize metadata (e.g. `tsv`).
{% endhint %}

For this particular demo, we wanted to create a *sample tree* that showcased the flexibility of Isabl's data model. Our demo individual has two samples, one normal and one tumor. The tumor sample is further divided into two biological replicates (or *aliquots*), with two experiments conducted on the second aliquot:

![A data generation process tree that resulted in 4 sequencing experiments (or ultimately bams), produced from two samples of the same individual.](/files/-Lh11Ux_IYr_LaK9TIJC)
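As a rough mental model, the sample tree above can be sketched as nested Python dictionaries. This is a hypothetical illustration, not how Isabl stores records: the identifiers are invented, and we assume the normal sample has a single aliquot with one experiment, which together with the three tumor experiments gives the four experiments shown in the figure.

```python
# Hypothetical nested representation of the demo sample tree:
# individual -> samples -> aliquots -> experiments (identifiers are made up).
demo_tree = {
    "demo individual": {
        "demo normal": {                      # assumed: one aliquot, one experiment
            "aliquot 1": ["normal experiment"],
        },
        "demo tumor": {                       # tumor split into two aliquots
            "aliquot 1": ["tumor experiment 1"],
            "aliquot 2": ["tumor experiment 2", "tumor experiment 3"],
        },
    }
}


def list_experiments(tree):
    """Yield (individual, sample, aliquot, experiment) tuples by walking the tree."""
    for individual, samples in tree.items():
        for sample, aliquots in samples.items():
            for aliquot, experiments in aliquots.items():
                for experiment in experiments:
                    yield individual, sample, aliquot, experiment


rows = list(list_experiments(demo_tree))
print(len(rows))  # 4
for row in rows:
    print(" / ".join(row))
```

Walking the tree from the individual down makes it clear why the model is normalized: the individual and each sample are recorded once, no matter how many experiments hang below them.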

## RESTful API

Although not required for this tutorial, you are welcome to check out the RESTful API documentation at <http://localhost:8000/api/v1/> or <https://isabl.github.io/redoc/>.

![](/files/-Lh0wz_FbRqJ9QIkJHDL)

## Import Reference Data

Given that `isabl-cli` will move our test data, let's copy original assets into a *staging* directory:

```bash
mkdir -p assets/staging && cp -r assets/data/* assets/staging
```

Now let's import the genome:

```bash
demo-isabl import-reference-genome \
    --assembly GRCh37 \
    --genome-path assets/staging/reference/reference.fasta
```

We can also import BED files for our demo *Sequencing Technique*:

```bash
demo-isabl import-bedfiles \
    --technique DEMO_TECHNIQUE \
    --targets-path assets/staging/bed/targets.bed \
    --baits-path assets/staging/bed/baits.bed \
    --assembly GRCh37 \
    --species HUMAN \
    --description "Demo BED files"
```

Check that import was successful:

```bash
isabl get-bed DEMO_TECHNIQUE  # retrieve BED file
isabl get-reference GRCh37    # retrieve reference genome
```

By means of the `--data-id` flag, the command `get-reference` also allows you to retrieve the indexes generated during import. To get a list of available files per assembly run:

```bash
isabl get-reference GRCh37 --resources
```

{% hint style="info" %}
Learn more about importing data into Isabl [here](/import-data).

In the previous steps, we used `demo-isabl` to import the reference genome and BED files because it is a wrapper that runs `isabl` inside a container with `samtools`, `bwa`, and `tabix`. These tools are needed to index the imported genomic files.
{% endhint %}

## Import Experimental Data

The next step is to import data for the samples we just created:

```bash
isabl import-data \
    -di ./assets/staging            `# provide data location ` \
    -id identifier                  `# match files using experiment id` \
    -fi identifier.contains "demo"  `# filter samples to be imported ` \
    --symlink                       `# to not move the original demo data`
```

{% hint style="success" %}
Add `--commit` to complete the operation.
{% endhint %}

Retrieve the imported data for the normal sample to see how directories are created:

```bash
isabl get-data -fi sample.identifier "demo normal"
```

The front end will also reflect that data has been imported.

## Writing Applications

Isabl is a language-agnostic platform and can deploy any pipeline. To get started, we will use some applications from [isabl-io/apps](https://github.com/isabl-io/apps). Specifically, we will run alignment, quality control, and variant calling. Applications were [previously registered](/quick-start#configure-isabl-cli) in the client object. Once registered, they are available in the client:

```bash
isabl apps-grch37
```

{% hint style="info" %}
Learn more about customizing your instance with [Isabl Settings](/isabl-settings).
{% endhint %}

First we'll run alignment (pass `--commit` to deploy):

```bash
isabl apps-grch37           `# apps are grouped by assembly ` \
    bwa-mem-0.7.17.r1188    `# run bwa-mem version 0.7.17.r1188 ` \
    -fi tags.contains data  `# filter using tags, feel free to try others `
```

{% hint style="info" %}
Note that if you try to re-run the same command, Isabl will notify you that results are already available. If for some reason the analyses fail, you can force a re-run using `--force`.
{% endhint %}

Now we can retrieve bams from the command line:

```bash
isabl get-bams -fi sample.individual.identifier "demo individual"
```
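Filters such as `-fi sample.individual.identifier "demo individual"` or `-fi identifier.contains demo` combine a dotted field path with an optional lookup suffix. The sketch below is a local, hypothetical emulation of that idea on plain dictionaries, assuming Django-style semantics (`contains` is case-sensitive, `icontains` is case-insensitive, and no suffix means an exact match). The real filtering happens server-side in the Isabl API and may support many more lookups.

```python
def resolve(record, path):
    """Follow a dotted path such as 'sample.individual.identifier' through nested dicts."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def matches(record, filter_key, expected):
    """Evaluate one filter, assuming Django-style lookup suffixes (hypothetical helper)."""
    field, _, lookup = filter_key.rpartition(".")
    if lookup == "contains":
        return expected in resolve(record, field)
    if lookup == "icontains":
        return expected.lower() in resolve(record, field).lower()
    # No recognized suffix: treat the whole key as a field path and compare exactly.
    return resolve(record, filter_key) == expected


# Hypothetical experiment records.
experiments = [
    {"identifier": "demo_normal_1", "sample": {"individual": {"identifier": "demo individual"}}},
    {"identifier": "other_1", "sample": {"individual": {"identifier": "another individual"}}},
]
hits = [
    e["identifier"]
    for e in experiments
    if matches(e, "sample.individual.identifier", "demo individual")
]
print(hits)  # ['demo_normal_1']
```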

We can also visualize aligned bams online:

{% hint style="info" %}
Insert `2:123,028-123,995` in the locus bar; that's where our test data has reads. Learn more about default BAMs in the [Writing Applications](/writing-applications) guide.
{% endhint %}

![Although the BAM file is an output of the bwa-mem analysis, Isabl enables registering default bams to an experiment. Thus a link is available in the sample panel.](/files/-LgzGgaIKFlkeqCTYLk-)

## Auto-merge Analyses

Let's get some stats for our experiments with a quality control [application](https://github.com/isabl-io/apps/blob/4f893b8995c110c1d685f49a04737533173907c4/isabl_apps/apps/qc_data/apps.py#L20):

```bash
isabl apps-grch37 qc-data-0.1.0 -fi identifier.icontains demo --commit
```

This quality control application has defined logic to merge results at a project and individual level. Upon completion of analyses execution, Isabl automatically runs the auto-merge logic:

![A short message is displayed at the end of the run indicating merge analyses are being run.](/files/-Lh0jhPR5KNpMsyI-hqJ)

Isabl-web can render multiple types of results; in this case we will look at HTML reports. Results for our `qc-data` application are available at the *experiment*, *individual*, and *project* level. In this example we are looking at the *project-level* auto-merge analysis:

![A project level Quality Control report. Can you find the Experiment and Individual-level reports? ](/files/-Lh0mp_n6jPxv9A1bucE)

{% hint style="info" %}
[Applications](/writing-applications) can define any custom logic to merge analyses.
{% endhint %}
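To make the idea of individual- and project-level merges concrete, here is a hypothetical sketch that aggregates experiment-level QC values into higher-level summaries. The field names (`individual`, `project`, `total_reads`) are invented for illustration; real Isabl applications implement their own merge logic.

```python
from collections import defaultdict

# Hypothetical experiment-level QC results.
experiment_results = [
    {"experiment": "E1", "individual": "I1", "project": "P1", "total_reads": 100},
    {"experiment": "E2", "individual": "I1", "project": "P1", "total_reads": 150},
    {"experiment": "E3", "individual": "I2", "project": "P1", "total_reads": 50},
]


def merge_by(results, level):
    """Group experiment-level results by `level`, counting experiments and summing reads."""
    merged = defaultdict(lambda: {"experiments": 0, "total_reads": 0})
    for result in results:
        group = merged[result[level]]
        group["experiments"] += 1
        group["total_reads"] += result["total_reads"]
    return dict(merged)


individual_level = merge_by(experiment_results, "individual")
project_level = merge_by(experiment_results, "project")
print(individual_level)
print(project_level)
```

Because the merge is recomputed from all experiment-level results, rerunning it after new experiments finish yields up-to-date summaries, which is the behavior auto-merge automates.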

## Multi-experiment Analyses

Up until now we've run applications that are linked to one experiment only. However, analyses can be related to any number of *target* and *reference* experiments. For example, [this implementation of *Strelka*](https://github.com/isabl-io/apps/blob/4f893b8995c110c1d685f49a04737533173907c4/isabl_apps/apps/strelka/apps.py#L15) uses *tumor-normal* pairs. Before running this command, you will need to retrieve the system IDs of your experiments:

```bash
isabl get-metadata experiments -f system_id
```

Now insert those identifiers in the following command:

```bash
isabl apps-grch37 strelka-2.9.1 \
    --pairs {TUMOR 1 ID} {NORMAL ID}  `# replace tumor 1 system id and normal system id` \
    --pairs {TUMOR 2 ID} {NORMAL ID}  `# replace tumor 2 system id and normal system id` \
    --pairs {TUMOR 3 ID} {NORMAL ID}  `# replace tumor 3 system id and normal system id`
```
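Rather than copying system IDs by hand, you could assemble the `--pairs` arguments programmatically. This is a hypothetical sketch assuming each experiment record exposes `system_id` and `sample.category`, with category values such as `TUMOR` and `NORMAL`; the IDs are made up. It only builds the argument list and does not run Isabl (append `--commit` yourself when you are ready to deploy).

```python
import shlex


def build_pairs_command(experiments, app="strelka-2.9.1"):
    """Pair every tumor experiment with the single normal experiment (hypothetical helper)."""
    normals = [e for e in experiments if e["sample"]["category"] == "NORMAL"]
    tumors = [e for e in experiments if e["sample"]["category"] == "TUMOR"]
    if len(normals) != 1:
        raise ValueError(f"expected exactly one normal, found {len(normals)}")
    normal_id = normals[0]["system_id"]
    command = ["isabl", "apps-grch37", app]
    for tumor in tumors:
        command += ["--pairs", tumor["system_id"], normal_id]
    return command


# Hypothetical experiment records; system IDs are invented for illustration.
experiments = [
    {"system_id": "EXP_T1", "sample": {"category": "TUMOR"}},
    {"system_id": "EXP_T2", "sample": {"category": "TUMOR"}},
    {"system_id": "EXP_T3", "sample": {"category": "TUMOR"}},
    {"system_id": "EXP_N1", "sample": {"category": "NORMAL"}},
]
print(shlex.join(build_pairs_command(experiments)))
```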

You can retrieve registered results for the analysis, for instance the *indels* VCF:

```bash
isabl get-results -fi name STRELKA --result-key indels
```

{% hint style="info" %}
To find out what other results are available use:

```bash
# app-primary-key can be retrieved from the frontend
isabl get-results --app-results {app-primary-key}

# when writing this tutorial, the app key for strelka was 5
isabl get-results --app-results 5
```

{% endhint %}

Furthermore, you can get paths for any instance in the database using `get-outdirs`:

```bash
isabl get-outdirs -fi name STRELKA
```

Lastly, let's check the indels VCFs through the web portal:

![](/files/-Lh13xb5xlwhHghGWwiT)

## Software Development Kit

To finalize the tutorial, we'll use Isabl as an SDK with `ipython`:

```bash
# we'll access ipython using the cli container
demo-cli ipython
```

Then let's check the output directories of Strelka:

```python
import isabl_cli as ii

# retrieve analyses from API using filters
analyses = ii.get_analyses(name="STRELKA")

# list the strelka ouput directories
for i in analyses:
    !ls {i.storage_url}/strelka
```

The analysis objects are [`Munch`](https://github.com/Infinidat/munch) instances; in other words, they are dictionaries that also support attribute-style (dot) access, like JavaScript objects:

```python
analysis = analyses[0]

# get the target experiment or tumor
target = analysis.targets[0]

# print the parent sample class
print(target.sample.category)

# see available fields
print(target.keys())
```
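If you have not used `Munch` before, its attribute access behaves much like the minimal stand-in below. `DotDict` is a hypothetical illustration, not the Munch implementation (Munch also provides conversion helpers and attribute assignment); it only shows how attribute reads map onto dictionary keys, including dictionaries nested inside lists such as `analysis.targets`.

```python
def _wrap(value):
    """Recursively expose nested dicts (also inside lists) as DotDict objects."""
    if isinstance(value, dict) and not isinstance(value, DotDict):
        return DotDict(value)
    if isinstance(value, list):
        return [_wrap(item) for item in value]
    return value


class DotDict(dict):
    """A dict whose keys can also be read as attributes (a simplified, read-only look-alike)."""

    def __getattr__(self, name):
        try:
            return _wrap(self[name])
        except KeyError:
            raise AttributeError(name) from None


# Hypothetical analysis payload shaped like the example above.
analysis = DotDict({
    "name": "STRELKA",
    "targets": [{"system_id": "EXP_T1", "sample": {"category": "TUMOR"}}],
})
target = analysis.targets[0]
print(target.sample.category)   # TUMOR
print(sorted(target.keys()))    # ['sample', 'system_id']
```

Since nested values are wrapped on each access, this stand-in is suited to reading data only; use the real Munch objects returned by `isabl_cli` when you need full behavior.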

## Wrap up and Next Steps

Learn about [CLI advanced configuration](https://github.com/isabl-io/docs/tree/c6cbd729fc1d9696332a3c78ea48aa4c7f409066/guides/cli/README.md#configuration) to customize functionality:

{% content-ref url="/pages/-LgzI4B7ORXzpBOP-j8S" %}
[Isabl Settings](/isabl-settings)
{% endcontent-ref %}

Learn about writing applications:

{% content-ref url="/pages/-LgZjlyvKfTRM6B0HnMl" %}
[Writing Applications](/writing-applications)
{% endcontent-ref %}

Ready for **production**? Learn more about [deployment](https://github.com/isabl-io/docs/tree/c6cbd729fc1d9696332a3c78ea48aa4c7f409066/tutorials/deployment/README.md):

{% content-ref url="/pages/-LgZjrA075oF5YZLVGsU" %}
[Production Deployment](/production-deployment)
{% endcontent-ref %}


# Registering Metadata

🏷 Create Individuals, Samples, and Experiments before importing data.

## Isabl Data Model

`Isabl` models a data generation process where *Experiments* such as Whole Genome Sequencing are performed on *Samples* collected from different *Individuals*. This normalization approach reduces data redundancy and improves data integrity.

{% tabs %}
{% tab title="Data Generation Process" %}
![](https://user-images.githubusercontent.com/8843150/62899450-c77fec80-bd25-11e9-9d92-dc2758cb49d2.png)
{% endtab %}

{% tab title="Unique Together Constraints" %}
Unique together constraints enable Isabl to link new samples and experiments to existing records in the database. The following fields are enforced to be unique together across the entire system:

| Database Schema  | Unique Together Fields                                                           |
| ---------------- | -------------------------------------------------------------------------------- |
| **Individual**   | <ul><li>Center</li><li>Species</li><li>Identifier</li></ul>                      |
| **Samples**      | <ul><li>Individual</li><li>Sample Class</li><li>Identifier</li></ul>             |
| **Aliquots**     | <ul><li>Sample</li><li>Identifier</li></ul>                                      |
| **Experiments**  | <ul><li>Sample</li><li>Aliquot ID</li><li>Technique</li><li>Identifier</li></ul> |
| **Applications** | <ul><li>Name</li><li>Version</li><li>Species</li><li>Assembly</li></ul>          |

{% endtab %}

{% tab title="Database Diagram" %}
![](https://docs.google.com/drawings/d/e/2PACX-1vSwWHBNAC_xh7IjDKaXnh0c4PN0cg1RopPG0_s9jHS2Jg1Zg4P3o4b0qU9tJ-5dQQhH9bTht4p3etGH/pub?w=2512\&h=3263)
{% endtab %}
{% endtabs %}

The concept of cohorts, where multiple *Experiments* are grouped and analyzed together, is fundamental and well supported. Furthermore, `isabl` also tracks and executes *Assembly*-aware *Bioinformatics Applications*, making sure that results are a function of the reference genome. Instances of these applications are also tracked and referred to as *Analyses*.

![](https://user-images.githubusercontent.com/8843150/62899485-dc5c8000-bd25-11e9-894e-664f11028d20.png)

## Metadata Registration

Isabl offers different mechanisms for metadata registration.

![](https://user-images.githubusercontent.com/8843150/62899505-eb433280-bd25-11e9-9c8d-2267d7092b36.png)

{% hint style="warning" %}
Only users with the proper permissions or *superusers* can create or modify models in the database, by using any of the methods for metadata registration.

When using the web interface, buttons such as **Create New Submission (+)** may be hidden depending on your user role. If you're not seeing this feature, or you're getting *permission denied* errors from the API, please contact your `isabl` administrators.
{% endhint %}

### Adding Extra Choices

If you need more choices for `species`, `gender`, sample `category`, and technique `methods`, please refer to the [extra choices documentation](/isabl-settings#extra-choices-settings).

### Sync Diseases with Onco Tree

If you are working with cancer data, you can sync Isabl *Diseases* with [Onco Tree](http://oncotree.mskcc.org/#/home). Simply run:

```bash
# you can find more onco tree versions at http://oncotree.mskcc.org
python ./manage.py sync_oncotree --oncotree-version oncotree_2019_03_01
```

## Register Samples with Excel

Through the web interface, it is possible to import an *Excel Submission* to register new samples.

{% hint style="warning" %}
Note that this feature can only create new *Individuals*, *Samples*, and *Experiments*. If you need to create *Centers*, *Diseases*, *Techniques*, or *Platforms* to extend your available choices, use the **Admin** interface at `http://<your-isabl-host>/admin/` or the [API method](/data-model#register-samples-with-restful-api-and-cli).
{% endhint %}

To start a submission, click **Create New Submission** in the user menu at the top right, or click **Add Batch Samples** at the top right of the Project view.

![](https://docs.google.com/drawings/d/e/2PACX-1vTnj1KCwWgfTPLqedUc13XX6wCNshQGDWi-VA8gmh7oXX6tDzNXQGfVzHAXGmaAJfXcskFTPrNEfW9o/pub?w=1276\&h=267)

This opens a modal where you can download the latest *Submission* form by clicking **GET FORM**. The latest form is up to date with the available custom fields and with the choices available for options such as center, diseases, platforms, and techniques.

{% hint style="info" %}
When prompted to allow *macros*, say yes. This will enable you to toggle between optional and required columns.
{% endhint %}

![Metadata can be registered using Excel Submission forms.](https://docs.google.com/drawings/d/e/2PACX-1vQ3WHDsObpa2x9vLV4vORr6HeK_xSbSFLgMnAFP44OPVvxE_ABIoSX1NcwQgf-hf42nimp8gPWVfb-t/pub?w=2256\&h=498)

Once the submission form is filled in, upload it through the web interface to see a preliminary summary of the submitted metadata. This summary reports the number of models that will be created (e.g. *1 Individual, 2 Samples, 4 Experiments*) and any errors in the form fields (e.g. *Error: individual gender FMALE is not a valid choice*), guiding you through the submission before you commit it.
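The preview step can be thought of as counting distinct models and validating choice fields before anything is written. This is a hypothetical sketch: it assumes each spreadsheet row describes one experiment with `individual_id`, `individual_gender`, `sample_id`, and `experiment_id` columns, and that the allowed gender values are the ones listed in `VALID_GENDERS`. These names are invented and do not reflect the actual submission form.

```python
VALID_GENDERS = {"FEMALE", "MALE", "UNKNOWN"}  # hypothetical choice list


def plural(count, word):
    return f"{count} {word}" if count == 1 else f"{count} {word}s"


def summarize_submission(rows):
    """Return (summary, errors) for a batch of experiment rows without writing anything."""
    individuals, samples, experiments, errors = set(), set(), set(), []
    for row in rows:
        gender = row["individual_gender"]
        if gender not in VALID_GENDERS:
            errors.append(f"Error: individual gender {gender} is not a valid choice")
        individuals.add(row["individual_id"])
        samples.add((row["individual_id"], row["sample_id"]))
        experiments.add((row["individual_id"], row["sample_id"], row["experiment_id"]))
    summary = ", ".join([
        plural(len(individuals), "Individual"),
        plural(len(samples), "Sample"),
        plural(len(experiments), "Experiment"),
    ])
    return summary, errors


rows = [
    {"individual_id": "I1", "individual_gender": "FEMALE", "sample_id": sample, "experiment_id": exp}
    for sample, exp in [("NORMAL", "E1"), ("TUMOR", "E2"), ("TUMOR", "E3"), ("TUMOR", "E4")]
]
print(summarize_submission(rows))  # ('1 Individual, 2 Samples, 4 Experiments', [])
```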

{% hint style="success" %}
After uploading the submission file, if you don't get any validation errors and your summary looks correct, hit the **Commit Submission** button to register the submission and make definitive changes in the database.
{% endhint %}

![](/files/-Lh6AS2RG3s-qgQoxWoU)

After committing your *Submission*, your new samples are created, and you can visualize the relationships between the newly registered models in the *Sample Tree*.

![Sample Tree of the new registered samples.](/files/-LhgHqP6DuYKftqV67EG)

## Register Samples with RESTful API and CLI

`Isabl` comes with a comprehensive RESTful API reference, where you can learn how to use every available endpoint for each resource of the database. You can access it by browsing to `http://<your-isabl-host>/api/v1/`

![Swagger API Documentation](/files/-Lh0wz_FbRqJ9QIkJHDL)

Create endpoints are *get or create*: they try to retrieve existing objects using either the *primary key* field or the *unique together constraints*, and create a new object if no match is found. If the view supports nested objects, these are also retrieved or created using the same criteria.

{% hint style="warning" %}
**IMPORTANT**: If an existing object is retrieved, its fields won't be updated with the posted data.
{% endhint %}
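The *get or create* behavior, including the warning above, can be illustrated with an in-memory store keyed by unique together fields. This is a hypothetical sketch, not Isabl's server code; it uses the *Individual* constraint (center, species, identifier) from the unique together table and ignores primary-key lookups.

```python
INDIVIDUAL_UNIQUE_FIELDS = ("center", "species", "identifier")


def get_or_create(store, data, unique_fields=INDIVIDUAL_UNIQUE_FIELDS):
    """Return (instance, created); an existing instance is returned unchanged."""
    key = tuple(data[field] for field in unique_fields)
    if key in store:
        return store[key], False  # posted data is ignored: fields are NOT updated
    instance = dict(data, pk=len(store) + 1)
    store[key] = instance
    return instance, True


store = {}
base = {"center": "MSK", "species": "HUMAN", "identifier": "EXTERNAL_ID_1"}
first, created = get_or_create(store, dict(base, gender="FEMALE"))
second, created_again = get_or_create(store, dict(base, gender="MALE"))
print(created, created_again)  # True False
print(second["gender"])        # FEMALE
```

The second call matches on the unique together fields, so the posted `gender` is silently ignored; to change an existing record you need an explicit update instead.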

Let's say you want to create a new *Individual*. According to the API documentation, we need to provide at least the species, gender, identifier, and center associated with the individual. Note that `center` is a *nested* object; you may need to create a new one or query an existing one. Let's say you want to get an existing one. This is how you'd do it with `isabl_cli`:

{% code title="Using Isabl SDK" %}

```python
import isabl_cli as ii

center = ii.get_instance(
    'centers',
    'MEMORIAL SLOAN KETTERING (MSK)',
)
individual = ii.create_instance(
    'individuals',
    identifier = 'EXTERNAL_ID_1',
    species = 'HUMAN',
    gender = 'FEMALE',
    center = center,
)
```

{% endcode %}

You can also make http requests directly to the API (you can create a new token from the admin site):

{% code title="A request to the API" %}

```bash
# get token for authentication
curl -X POST \
    -d '{ "username": "<your-username>", "password": "<your-password>" }' \
    -H "Content-Type: application/json" \
    http://<your-isabl-host>/api/v1/rest-auth/login/

# create individual
curl \
    -X POST \
    -H 'Authorization: Token <your-token>' \
    -H "Content-Type: application/json" \
    -d '{"identifier": "EXTERNAL_ID_1", "species": "HUMAN", "gender": "FEMALE", "center": {"acronym": "MSK", "name": "MEMORIAL SLOAN KETTERING" } }' \
    http://<your-isabl-host>/api/v1/individuals/
```

{% endcode %}
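The same individual-creation call can be prepared with Python's standard library. This sketch only builds a `urllib.request.Request` mirroring the curl example's URL, headers, and JSON body; `localhost:8000` and `<your-token>` are placeholders, and we assume the endpoint path ends with a slash (`/api/v1/individuals/`), as Django REST Framework routes typically do. Nothing is sent unless you pass the request to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_create_individual_request(host, token, payload):
    """Build (but do not send) a POST request to the individuals endpoint."""
    return urllib.request.Request(
        url=f"http://{host}/api/v1/individuals/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "identifier": "EXTERNAL_ID_1",
    "species": "HUMAN",
    "gender": "FEMALE",
    "center": {"acronym": "MSK", "name": "MEMORIAL SLOAN KETTERING"},
}
request = build_create_individual_request("localhost:8000", "<your-token>", payload)
print(request.get_method(), request.full_url)

# To actually send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```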

## Manage User Groups and Permissions

Isabl offers an optional configuration of groups that you can adopt:

| Group         | Permissions                                                                                                                                  |
| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| **Managers**  | Can create, update, and delete Custom Fields, Individuals, Centers, Diseases, Experiments, Techniques, Platforms, Projects, and Submissions. |
| **Analysts**  | Can create, update, and delete Custom Fields, Applications, Analyses, and Assemblies. They can also download analyses results.               |
| **Engineers** | Have the same permissions as both Managers and Analysts.                                                                                     |

To create these groups, run the following command:

```bash
python manage.py create_default_groups
```

{% hint style="info" %}
These groups are **optional** and you can create your own using the Django Admin.
{% endhint %}

{% hint style="success" %}
**Pro tip:** use the `Can Download Results` permission to configure which users can download analyses results in your Isabl instance.
{% endhint %}


# Importing Data

📦 Learn how to import raw data into Isabl using existing metadata.

Isabl-CLI enables tracking and management of raw data, as well as of reference resources that are a function of a *genome assembly* or an *experimental technique*.

## Data Import

Isabl-CLI supports automated data import by recursively exploring data deposition directories and matching raw data files with identifiers registered in the database. For example, the client can be instructed to explore the `/projects` directory (**A**), retrieving only samples from *Project 393*, and match files using *Sample Identifiers* (**B**).

![Isabl supports automatic import from data deposition directories.](https://user-images.githubusercontent.com/8843150/62899370-a1f2e300-bd25-11e9-9e50-1d88e870d19a.png)

Upon `--commit`, Isabl-CLI moves (or symlinks) matched files into scalable directory structures (**C**). Each experiment's data path is derived from the last four digits of its primary key. For instance, data for Experiment 57395 will be stored at `{storage-directory}/experiments/73/95/57395/`. This hashing approach ensures a maximum of 1000 subdirectories in any folder, even in a worst-case scenario of 10 million experiments.
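
The path layout above can be reproduced with a few lines of Python. This is a minimal sketch for illustration, not Isabl-CLI's own implementation; in particular, zero-padding primary keys shorter than four digits (so that 57 maps to `00/57`) is an assumption.

```python
from pathlib import PurePosixPath


def experiment_data_dir(storage_directory: str, pk: int) -> PurePosixPath:
    """Return {storage_directory}/experiments/{xx}/{yy}/{pk} for an experiment.

    xx and yy are the first and second pair of the last four digits of the
    primary key. Zero-padding short keys (57 -> "0057") is an assumption.
    """
    last_four = f"{pk:04d}"[-4:]
    return PurePosixPath(
        storage_directory, "experiments", last_four[:2], last_four[2:], str(pk)
    )


print(experiment_data_dir("/storage", 57395))  # /storage/experiments/73/95/57395

# Worst case for 10 million experiments: each {xx}/{yy} folder is shared by
# all primary keys ending in the same four digits.
print(10_000_000 // 10_000)  # 1000
```

Each of the two hashed levels holds at most 100 folders, so the only level that grows with the number of experiments is the innermost one.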

### Supported Data Formats

Isabl experiments can be linked to any kind of data. By default, Isabl will match the following data types:

```python
RAW_DATA_FORMATS = [
    ("CRAM", "CRAM"),
    ("FASTQ_R1", "FASTQ_R1"),
    ("FASTQ_R2", "FASTQ_R2"),
    ("FASTQ_I1", "FASTQ_I1"),
    ("BAM", "BAM"),
    ("PNG", "PNG"),
    ("JPEG", "JPEG"),
    ("TXT", "TXT"),
    ("TSV", "TSV"),
    ("CSV", "CSV"),
    ("PDF", "PDF"),
    ("DICOM", "DICOM"),
    ("MD5", "MD5"),
]
```

If you need to support more raw data formats, add **EXTRA\_RAW\_DATA\_FORMATS** to both the API and the client settings. On the API side, this extends the [valid data format choices](/isabl-settings#extra-choices-settings) in the backend; on the client side, you can provide a new format file validator in the [client settings](/isabl-settings#isabl-cli-settings) or a new [data importer](/isabl-settings#isabl-cli-settings).

For example, to support the `MAF` format:

```python
# In the api settings
EXTRA_RAW_DATA_FORMATS = [("MAF", "MAF")]

# In the cli client settings
EXTRA_RAW_DATA_FORMATS = [("\\.maf(\\.gz)?$", "MAF")]
```
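
The client-side entry pairs a regular expression with a format name. The sketch below shows how such pairs could be used to classify files. It is an illustration only: it assumes patterns are tested with `re.search` against the file path, and only the MAF pattern comes from the example above (the FASTQ and BAM patterns are hypothetical).

```python
import re

# (pattern, format) pairs in the style of the client EXTRA_RAW_DATA_FORMATS.
# "\\.maf(\\.gz)?$" in the settings is the same string as r"\.maf(\.gz)?$".
FORMAT_PATTERNS = [
    (r"\.maf(\.gz)?$", "MAF"),
    (r"_R1(_\d+)?\.fastq(\.gz)?$", "FASTQ_R1"),  # hypothetical pattern
    (r"\.bam$", "BAM"),  # hypothetical pattern
]


def match_format(path: str):
    """Return the first data format whose regex matches `path`, else None."""
    for pattern, data_format in FORMAT_PATTERNS:
        if re.search(pattern, path):
            return data_format
    return None


print(match_format("/data/sample_1.maf.gz"))  # MAF
print(match_format("/data/notes.docx"))  # None
```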

{% hint style="info" %}
**Tip:** subclassing `isabl_cli.data.LocalDataImporter` and overriding `RAW_DATA_INSPECTORS` might be enough to support new data formats.
{% endhint %}

### Import Data from Yaml

Isabl-CLI also supports explicit importing into a single experiment by specifying absolute file paths and metadata in a yaml file via the `import-data-from-yaml` command. The metadata will be added to the `file_data` field in an experiment's `raw_data`.

The two main parameters to be specified when importing are:

* **-fi:** an argument that takes a pair of values (field, field value) to identify an experiment. For example, if you had an experiment with a `system_id` of `TEST_EXPERIMENT_T01`, the argument would look like:

  ```bash
  -fi system_id TEST_EXPERIMENT_T01
  ```
* **--files-data:** an argument that takes an absolute path to the yaml file containing absolute file paths and metadata. For example, if you had a yaml file `/absolute/path/to/files_data.yaml` with the following contents:

  <pre class="language-yaml" data-title="/absolute/path/to/files_data.yaml"><code class="lang-yaml">/absolute/path/to/file_1.fastq.gz: 
      metadata1: value1 
      metadata2: value2
  /absolute/path/to/file_2.fastq.gz:
      metadata3: value3 
      metadata4: value4
  </code></pre>

  the argument would look like:

  ```bash
  --files-data /absolute/path/to/files_data.yaml
  ```

Full command using examples above:

```bash
isabl import-data-from-yaml \
    -fi system_id TEST_EXPERIMENT_T01 \
    --files-data /absolute/path/to/files_data.yaml \
    --commit
```

{% hint style="info" %}
**View command details by running:** `isabl import-data-from-yaml --help`
{% endhint %}
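
If the file metadata already lives in Python, one way to produce the `--files-data` file without a YAML library is to write JSON, since JSON objects like this one are also valid YAML mappings. This sketch assumes the importer loads the file with a standard YAML parser; the paths and metadata are the placeholders from the example above.

```python
import json

# Placeholder paths and metadata copied from the files_data.yaml example.
files_data = {
    "/absolute/path/to/file_1.fastq.gz": {"metadata1": "value1", "metadata2": "value2"},
    "/absolute/path/to/file_2.fastq.gz": {"metadata3": "value3", "metadata4": "value4"},
}


def dump_files_data(data: dict) -> str:
    """Serialize {absolute_path: metadata} as JSON text that YAML loaders accept."""
    for path in data:
        # import-data-from-yaml expects absolute file paths as keys
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
    return json.dumps(data, indent=2)


print(dump_files_data(files_data))
```

Writing the result to `/absolute/path/to/files_data.yaml` (for instance with `pathlib.Path.write_text`) gives a file that can be passed to `--files-data`.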

## Import Reference Data and BED files

You can link reference data to assemblies and techniques. Here are a few ways to do so.

### Link Arbitrary Reference Data for Techniques and Assemblies

Isabl also supports registering *arbitrary* resources for any assembly or technique (e.g. gene annotations):

```bash
isabl import-reference-data --help

# extra resources are included in the assembly directory
assemblies/
├── GRCh37
│   ├── chr_alias
│   │   └── hg19_alias.tab
│   ├── cytoband
│   │   └── cytoBand.txt
│   ├── genes
│   │   └── refGene.txt
│   └── genome_fasta
│       └── GRCh37.fasta ...
└── GRCm38
```

### Import Assembly Reference Genome

Isabl supports tracking resources for assemblies and techniques. For instance, it ensures that reference FASTA files are uniformly indexed, named, and tracked across genome builds:

```bash
# indexes are created with `bwa index`, `samtools faidx`, `samtools dict`
isabl import-reference-genome --help

# example of isabl assemblies directories
assemblies/
├── GRCh37
│   └── genome_fasta
│       ├── GRCh37.fasta
│       ├── GRCh37.fasta.amb
│       ├── GRCh37.fasta.ann
│       ├── GRCh37.fasta.bwt
│       ├── GRCh37.fasta.dict
│       ├── GRCh37.fasta.fai
│       ├── GRCh37.fasta.pac
│       └── GRCh37.fasta.sa
└── ...
```

### Import BED Files for Sequencing Techniques

Lastly, you can register BED files for any *sequencing* technique, which will be compressed, indexed, moved to the technique data directory, and registered in the database:

```bash
# compressed with bgzip, indexed with tabix
isabl import-bedfiles --help

# example of isabl technique directories
techniques/
├── 34
│   └── bed_files
│       └── GRCh37
│           ├── dna-td-hemepact-v4-grch37.baits.bed
│           ├── dna-td-hemepact-v4-grch37.baits.bed.gz
│           ├── dna-td-hemepact-v4-grch37.baits.bed.gz.tbi
│           ├── dna-td-hemepact-v4-grch37.targets.bed
│           ├── dna-td-hemepact-v4-grch37.targets.bed.gz
│           └── dna-td-hemepact-v4-grch37.targets.bed.gz.tbi
└── ...
```

Imported assets are available for systematic processing by Isabl applications.

## Customizing Import Logic

All registration mechanisms are configurable and can be customized by providing an alternative Python subclass through the following settings:

| Setting Name                    | Default                                      |
| ------------------------------- | -------------------------------------------- |
| **DATA\_IMPORTER**              | isabl\_cli.data.LocalDataImporter            |
| **REFERENCE\_GENOME\_IMPORTER** | isabl\_cli.data.LocalReferenceGenomeImporter |
| **REFERENCE\_DATA\_IMPORTER**   | isabl\_cli.data.LocalReferenceDataImporter   |
| **BED\_IMPORTER**               | isabl\_cli.data.LocalBedImporter             |

Although only local storage is supported at the time of writing, Isabl-CLI's capabilities can be extended to cloud solutions, including integration with cloud workbenches such as Arvados.


# Retrieving Data

🤓 The ultimate guide for data analysts using Isabl!

## Introduction to Filters

**Filters** enable you to subset the data you are interested in. For example, you can use filters to retrieve all the BAM files of a given project, or get all VCFs from a given variant calling application. Filters are *field-value* pairs and can be used both in the Command Line and within Python. Check out these examples:

{% code title="# A request to the API" %}

```bash
curl "http://{my-isabl-instance}/api/v1/experiments?sample__identifier={the-sample-id}"
```

{% endcode %}

{% code title="# Using Isabl CLI" %}

```bash
isabl get-outdirs -fi application.name BWA_MEM -fi status SUCCEEDED
```

{% endcode %}

{% code title="# Using Isabl SDK" %}

```python
import isabl_cli as ii
samples = ii.get_instances('samples', individual__species="HUMAN")
```

{% endcode %}

{% hint style="info" %}
Note that fields can *traverse* the relational model. To do so, join the fields with `__` (e.g. `samples__disease__acronym=AML`), or with a dot in the Command Line (e.g. `application.name=PINDEL`).
{% endhint %}
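
Because dots and double underscores are interchangeable in field names, CLI-style filters map directly onto API query strings. The helper below is a hypothetical sketch (not part of Isabl-CLI); the `/api/v1/analyses` path follows the pattern of the experiments endpoint shown above.

```python
from urllib.parse import urlencode


def cli_filters_to_query(pairs) -> str:
    """Convert CLI-style (field, value) filter pairs into an API query string.

    Dots in field names become double underscores, matching the equivalence
    between application.name and application__name described above.
    """
    return urlencode([(field.replace(".", "__"), value) for field, value in pairs])


query = cli_filters_to_query([("application.name", "BWA_MEM"), ("status", "SUCCEEDED")])
print(f"/api/v1/analyses?{query}")
# /api/v1/analyses?application__name=BWA_MEM&status=SUCCEEDED
```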

### Filters Modifiers

As indicated in the previous *hint*, filter fields can traverse database relationships. Furthermore, all filters can be augmented using *lookups*:

{% tabs %}
{% tab title="Related Filters" %}

```python
{related_field}__{related_filter}="query"
```

Here is a quick representation of Isabl's relational model, and hence of the available related filters:

![](/files/-Lm6iXjfM-_YG9bKsQ4c)
{% endtab %}

{% tab title="Lookup Types" %}
Furthermore, all query parameters in this API support advanced lookup types:

| Lookup Type         | Description             | Example                                           |
| ------------------- | ----------------------- | ------------------------------------------------- |
| **`!`**             | Negate any query        | `name!=isabel`                                    |
| **`[i]exact`**      | Exact match             | `name__exact=isabl`  `name__iexact=IsAbL`         |
| **`[i]contains`**   | Value contains query    | `name__contains=isa`  `name__icontains=iSa`       |
| **`[i]startswith`** | Value starts with query | `name__startswith=isab`  `name__istartswith=iSab` |
| **`[i]endswith`**   | Value ends with query   | `name__endswith=bl`  `name__iendswith=bL`         |
| **`in`**            | Comma separated query   | `name__in=isabl,besuhof`                          |
| **`isnull`**        | Value is null           | `name__isnull=true`                               |
| **`regex`**         | Use regex pattern       | `name__regex=isabl`                               |
| **`gt`**            | Greater than            | `total__gt=1`                                     |
| **`gte`**           | Greater or equal        | `total__gte=1`                                    |
| **`lt`**            | Less than               | `total__lt=1`                                     |
| **`lte`**           | Less than or equal      | `total__lte=1`                                    |

{% endtab %}

{% tab title="Datetime Lookups" %}
Moreover, *Datetime* query parameters support extra lookups:

| Lookup Type  | Description                      | Example                                                    |
| ------------ | -------------------------------- | ---------------------------------------------------------- |
| *(none)*     | No lookup, `ISO` format required | `created=`                                                 |
| **`date`**   | Filter by date `YYYY-MM-DD`.     | `created__date=2016-06-04`  `created__date__gt=2016-06-04` |
| **`day`**    | Filter by day `DD`               | `created__day=04`                                          |
| **`month`**  | Filter by month `MM`             | `created__month=06`                                        |
| **`year`**   | Filter by year `YYYY`            | `created__year=2016`                                       |
| **`time`**   | Filter by time `HH:MM:SS`        | `created__time=21:00:51`                                   |

{% endtab %}

{% tab title="Full Relational Model" %}
![UML diagram of the database schema](/files/-M-h4Hc9ZnuDSkPKS3IG)
{% endtab %}
{% endtabs %}

{% hint style="info" %}
To get a full description of all available filters, please visit Isabl's Redoc API documentation at <https://isabl.github.io/redoc/> or [https://isabl.mskcc.org/api/v1](https://isabl.mskcc.org/api/v1/) (replacing `isabl.mskcc.org` with your own host). Another useful way to explore the relational model is by using [`isabl get-metadata`](/retrieve-data#dynamically-explore-metadata).
{% endhint %}
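
When building query strings by hand, the lookups above are simply suffixes on the field name. The helper below is a hypothetical sketch, not part of Isabl-CLI: it joins list values with commas for `__in`, lowercases booleans as in `name__isnull=true`, and, because `!` cannot appear in a Python keyword argument, uses an invented `_not` suffix to produce the negation `field!=value`.

```python
from urllib.parse import urlencode


def build_filters(**lookups) -> str:
    """Build an API query string from lookup keyword arguments."""
    params = []
    for key, value in lookups.items():
        if key.endswith("_not"):
            # hypothetical convention: status_not="FAILED" -> status!=FAILED
            key = key[: -len("_not")] + "!"
        if isinstance(value, bool):
            value = str(value).lower()
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        params.append((key, value))
    return urlencode(params)


print(build_filters(name__in=["isabl", "besuhof"], created__date__gt="2016-06-04"))
# name__in=isabl%2Cbesuhof&created__date__gt=2016-06-04
```

Commas and `!` are percent-encoded (`%2C`, `%21`); the server decodes them back before applying the filter.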

### Common Filters

Here are some common and useful filters for Isabl.

#### Limit vs Count Limit

The filter `count_limit` enables you to limit the total number of instances that will be retrieved. For example, to get the output directories of the first 10 successful analyses you could do:

```bash
isabl get-outdirs -fi status SUCCEEDED -fi count_limit 10
```

On the other hand, `limit` determines how many instances are retrieved per request. For example, the following command would retrieve paths to *all* successful analyses in batches of 100000:

```bash
isabl get-outdirs -fi status SUCCEEDED -fi limit 100000
```
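
To make the difference concrete, here is a small simulation in plain Python. `fake_fetch` is a hypothetical stand-in for a single paginated API request (it slices an in-memory list); the generator requests pages of `limit` items and stops once `count_limit` results have been yielded or a short page signals the end.

```python
from typing import Callable, Iterator, List, Optional

DATA = [{"pk": pk} for pk in range(25)]  # hypothetical query results


def fake_fetch(offset: int, limit: int) -> List[dict]:
    """Stand-in for one API request returning `limit` results from `offset`."""
    return DATA[offset : offset + limit]


def iterate_results(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int = 100,
    count_limit: Optional[int] = None,
) -> Iterator[dict]:
    """Yield results page by page: `limit` sizes each request, `count_limit` caps the total."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    offset = 0
    yielded = 0
    while count_limit is None or yielded < count_limit:
        page = fetch_page(offset, limit)
        for item in page:
            if count_limit is not None and yielded >= count_limit:
                return
            yield item
            yielded += 1
        if len(page) < limit:  # a short page means there is nothing left
            return
        offset += limit


first_ten = list(iterate_results(fake_fetch, limit=4, count_limit=10))  # 3 requests
everything = list(iterate_results(fake_fetch, limit=10))  # all 25, in 3 requests
```

In other words, `limit` only changes how many round trips are made, while `count_limit` changes how many results you get back.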

#### Has BAM File

For example, to get all experiments that have a BAM file for `GRCh37`:

```python
import isabl_cli as ii

experiments = ii.get_experiments(has_bam_for="GRCh37")
```

### Performance Filters

The following filters can be used to (quite dramatically) improve the performance for some queries:

| Filter          | Description                                                                                                                           | Usage              |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
| **`distinct`**  | If you set `distinct` to *false*, results within the query are not guaranteed to be unique, but the response will be faster.          | `distinct=false`   |
| **`paginator`** | By activating *cursor* pagination, you will be able to traverse query results, but you won't know the total number of results.        | `paginator=cursor` |

{% hint style="info" %}
`paginator=cursor` is still experimental, please [report an issue](https://github.com/isabl-io/api/issues/new/choose) if you have trouble.
{% endhint %}

## Isabl Command Line Client

Filters in the command line are usually provided using the `-fi` or `--filters` flags. Relations or lookups can be provided using double underscores or dots (e.g. `application.name` or `application__name`). Here is a list of Isabl commands available to retrieve information:

| Command             | Description                                                                                                                                                                                                                                              | Example                                                    |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| **`get-count`**     | Get count of database instances given a particular query. For example, how many failed analyses are in the system?                                                                                                                                       | `isabl get-count analyses -fi status FAILED`               |
| **`get-metadata`**  | Retrieve instances metadata in multiple formats. To limit the number of fields you are interested in use `-f` (i.e. [`--fields`](/retrieve-data#dynamically-explore-metadata)).                                                                          | `isabl get-metadata samples -fi category TUMOR -f disease` |
| **`get-data`**      | This command retrieves the *raw* data linked to experiments as imported in Isabl (e.g. BAM, FASTQ, CRAM). Use `--verbose` to see which experiments have ***missing*** data.                                                                             | `isabl get-data -fi projects.pk.in 102,103`                |
| **`get-bams`**      | Get the *official* BAM registered for a given list of experiments. Use `--assembly` if there are BAMs available for different versions of the genome. Use [`has_bam_for`](/retrieve-data#has-bam-file) to filter for experiments with a registered BAM. | `isabl get-bams -fi has_bam_for GRCh37`                    |
| **`get-reference`** | Isabl supports linking auxiliary resources to *assembly* instances. [By default](/retrieve-data#assembly-resources), `get-reference` gives you the path to the reference FASTA; however, you can retrieve other linked resources.                        | `isabl get-reference GRCh37`                               |
| **`get-bed`**       | Retrieve a BED file linked to a particular sequencing technique. By default, the *targets* BED file is returned; to get the *baits* BED, use `--bed-type`.                                                                                              | `isabl get-bed HEMEPACT --assembly GRCh37`                 |
| **`get-paths`**     | Retrieve the storage directory for any instance in the database. Use [`--pattern`](/retrieve-data#retrieving-application-results) to retrieve files within those directories.                                                                            | `isabl get-paths projects 102`                             |
| **`get-outdirs`**   | This command is a shortcut for `isabl get-paths analyses`. Learn more about retrieving results [here](/retrieve-data#retrieving-application-results).                                                                                                   | `isabl get-outdirs -fi name PINDEL -fi status SUCCEEDED`   |
| **`get-results`**   | Retrieve analyses results produced by applications. Use [`--app-results`](/retrieve-data#retrieving-application-results) to list all available choices for a given application.                                                                         | `isabl get-results -fi application.pk 1 -r command_script` |

### Dynamically Explore Metadata

Another useful way to explore the relational model is by using `isabl get-metadata`:

```bash
isabl get-metadata experiments --fx
```

![](/files/-LhLm4A0slpYiRHCGr8w)

{% hint style="info" %}
Expand and navigate with arrow keys, press e to *expand all* and E to minimize. Learn more at [`fx` documentation](https://github.com/antonmedv/fx/blob/master/docs.md#interactive-mode). Use `--help` to learn about other ways to visualize metadata (e.g. `tsv`).
{% endhint %}

Furthermore, you can limit the amount of information you are retrieving by passing the list of fields you are interested in:

```bash
isabl get-metadata analyses -f application.name -f status
```

### Assembly Resources

By default, the command `get-reference` helps you retrieve the assembly reference genome.

```bash
isabl get-reference GRCh37    # retrieve reference genome
```

However, by means of the `--data-id` flag, the command `get-reference` also allows you to retrieve the indexes generated during import. To get a list of available files per assembly use `--resources`:

```bash
$ isabl get-reference GRCh37 --resources

    genome_fasta         Reference Genome Fasta File.
    genome_fasta_fai     Index generated by: samtools faidx
    ...
```

Then get the one you are interested in with:

```bash
isabl get-reference GRCh37 --data-id genome_fasta_fai
```

### Retrieving Application Results

You can use `get-outdirs` within the command line to systematically explore output directories. For example:

```bash
isabl get-outdirs -fi status FAILED | xargs tree -L 2
```

Furthermore, you can retrieve files within those directories by using `--pattern`:

```bash
isabl get-outdirs -fi status FAILED --pattern 'head_job.*'
```

Additionally, you can retrieve results directly registered by the application:

```bash
for i in $(isabl get-results -fi status FAILED -r command_err); do
   echo "exploring $i"
   cat "$i"
done
```

To visualize what results are available for a given application run:

```bash
isabl get-results --app-results <application primary key>
```

{% hint style="info" %}
You can retrieve the application primary key from the front end.
{% endhint %}

## Isabl Software Development Kit

Importantly, `isabl-cli` can also be used as a Software Development Kit within python:

{% code title="Try from an ipython session" %}

```python
import isabl_cli as ii  # ii stands for `interactive isabl` 😎
```

{% endcode %}

{% hint style="info" %}
If you are using `ipython`, use `?` to get help on a method (e.g. `ii.get_instances?`)
{% endhint %}

### Getting Instances

To get started, we can retrieve specific instances from the database:

```python
# retrieve an experiment with a system id (primary keys also work)
experiment = ii.Experiment("DEM_10000_T01_01_TD1")

# we can also get an analysis using its primary key (we'll limit retrieved fields)
analysis = ii.Analysis(10235, fields="status,application")

# same thing for assemblies
assembly = ii.Assembly("GRCh37")
```

{% hint style="info" %}
These instances are [`Munch`](https://github.com/Infinidat/munch) objects; in other words, they are dot-dicts (like JavaScript objects). So you can use either `analysis["status"]` or `analysis.status`.
{% endhint %}
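
To see what *dot-dict* means in practice, here is a minimal stand-in written with the standard library only. It is a simplified illustration, not the munch library: nested dictionaries are wrapped on first attribute access so that chained lookups such as `analysis.application.name` work, and the analysis values shown are hypothetical.

```python
class DotDict(dict):
    """Minimal dict with attribute access, a simplified stand-in for Munch."""

    def __getattr__(self, name):
        # only called when normal lookup fails, so dict methods like keys() still work
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        if isinstance(value, dict) and not isinstance(value, DotDict):
            value = DotDict(value)
            self[name] = value  # keep the wrapper so nested edits persist
        return value

    def __setattr__(self, name, value):
        self[name] = value


analysis = DotDict({"status": "SUCCEEDED", "application": {"name": "PINDEL"}})
print(analysis["status"] == analysis.status)  # True
print(analysis.application.name)  # PINDEL
```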

A more general way to retrieve *any* object in the database is using `get_instance`:

```python
project = ii.get_instance("projects", 100)  # the signature is (endpoint, identifier)
```

Some examples of things you can do with these instances:

```python
# get the target experiment or tumor
target = analysis.targets[0]

# print the experiment sample class
print(experiment.sample.category)

# see available analysis fields
print(analysis.keys())

# see all available data
print(vars(assembly))
```

To get multiple instances you can do:

```python
# get all TUMOR experiments in project 102
experiments = ii.get_experiments(projects=102, sample__category="TUMOR")

# get the first 10 SUCCEEDED analyses in the same project
analyses = ii.get_analyses(projects=102, status="SUCCEEDED", count_limit=10)

# get all the projects where I'm the owner
projects = ii.get_projects(owner__startswith="besuhof")
```

Similarly to `isabl get-count`, you can determine the number of available results for a given query:

```python
ii.get_instances_count("analyses", status="FAILED")
```

### Getting all Samples from an Individual

To retrieve all samples and experiments for a given individual:

```python
# you can also use the individual system_id
individual = ii.get_tree(10000)

# then all samples are available at
samples = individual.sample_set

# and all experiments for a given sample
experiments = samples[0].experiment_set
```
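
Since tree instances are nested dictionaries, collecting information across samples is a plain loop. The tree below is a hypothetical stand-in shaped like the `sample_set`/`experiment_set` structure described above; the individual and normal-experiment identifiers are invented for illustration.

```python
from typing import Dict, List

# Hypothetical tree mimicking the individual -> sample_set -> experiment_set shape.
individual = {
    "system_id": "DEM_10000",
    "sample_set": [
        {"category": "TUMOR", "experiment_set": [{"system_id": "DEM_10000_T01_01_TD1"}]},
        {"category": "NORMAL", "experiment_set": [{"system_id": "DEM_10000_N01_01_TD1"}]},
    ],
}


def experiments_by_category(tree: Dict) -> Dict[str, List[str]]:
    """Group experiment system ids by the category of their sample."""
    grouped: Dict[str, List[str]] = {}
    for sample in tree.get("sample_set", []):
        for experiment in sample.get("experiment_set", []):
            grouped.setdefault(sample["category"], []).append(experiment["system_id"])
    return grouped


print(experiments_by_category(individual))
```

Because `Munch` objects are `dict` subclasses, the same function should also accept the object returned by `ii.get_tree`, assuming the field names match.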

You can also retrieve multiple *trees:*

```python
individuals = ii.get_trees(projects=267)
```

### Create, Delete, and Modify Instances

If you have permissions, you will be able to systematically alter instances in the database:

```python
# create a disease
ii.create_instance("diseases", name="Osteosarcoma", acronym="OS")

# update an individual's gender
ii.patch_instance("individuals", individual.pk, gender="UNKNOWN")

# delete an analysis
ii.delete_instance("analyses", analysis.pk)
```

{% hint style="danger" %}
*With great power, comes...* yeah you know how it goes. Just be careful.
{% endhint %}

### Isabl SDK Utils

Here are other useful utilities available in `isabl-cli`:

| Method                 | Description                                                                               |
| ---------------------- | ----------------------------------------------------------------------------------------- |
| `ii.api.chunks`        | Given a list of elements, return a list of chunks of `n` elements from the original list. |
| `ii.api.api_request`   | Perform an authenticated request to Isabl API.                                            |
| `ii.api.retry_request` | Retry an HTTP request multiple times with a delay between each failure.                   |

## **Isabl Web**

Isabl Web is a great tool to retrieve information and understand the state of affairs within the system. Simply type something in the search bar to retrieve instances across multiple schemas:

![](/files/-LhMv6xFDv8QwX_Smynu)

Multiple panels will be stacked horizontally as you request more information:

![image](https://user-images.githubusercontent.com/8843150/62899748-73c1d300-bd26-11e9-9039-55567ecf5aca.png)

### **Projects Detail Panel**

The projects detail panel conveys all assets and stakeholders linked to a particular project:

![](/files/-LhMrwIYakuX9VQ0E1Gx)

*Live Tables* are directly wired to the API and enable you to search and filter on specific columns. For the latter, simply click on the column name:

![Directly searching on the sample Identifier column.](/files/-LhMw8nsnLrfR0NaGqqJ)

### **The Samples View**

The *samples tree* panel provides access to all assets generated on a given individual.

![A data generation process tree that resulted in 4 experiments (or ultimately bams), produced from two samples of the same individual.](/files/-Lh11Ux_IYr_LaK9TIJC)

By clicking on a given node in the tree, you can retrieve more metadata, filter out available analyses on that instance, and even get access to BAM files:

![Although the BAM file is an output of the bwa-mem analysis, Isabl enables registering default bams to an experiment. Thus a link is available in the sample panel.](/files/-LgzGgaIKFlkeqCTYLk-)

### Analyses Results

We can retrieve different types of results for all analyses generated by Isabl, for example a project-level quality control report:

![A project level Quality Control report. Can you find the Experiment and Individual-level reports? ](/files/-Lh0mp_n6jPxv9A1bucE)

Similarly we can retrieve other types of results such as a VCF:

![](/files/-Lh13xb5xlwhHghGWwiT)


# Writing Applications

⚡️Isabl Applications enable you to systematically deploy data science tools across thousands of Experiments in a metadata driven approach. Learn how to build them here.

## Introduction

Isabl *Applications* enable you to systematically deploy data science tools across thousands of *Experiments* in a metadata driven approach. The most important things to know about applications are:

* Applications are **agnostic** to the underlying tools being utilized.
* Applications can submit analyses to **multiple** compute environments (local, cluster, cloud).
* Results are stored as **analyses** for which uniqueness is a function of the experiments used.
* Once implemented, applications can be deployed across any subset of experiments in the database.

{% hint style="info" %}
Isabl applications are not Workflow Management Systems (see [what Isabl is not trying to solve](/#what-isabl-is-not)). Instead they use metadata to systematically build and deploy any type of execution commands across thousands of experiments.
{% endhint %}

### What Does an Application Look Like?

During this tutorial we will build a hello world application that showcases the functionalities and advantages of processing data with Isabl. Here is a really simple example of an Isabl application that echoes an experiment's sample identifier and its raw data.

{% code title="hello\_world/apps.py" %}

```python
from isabl_cli import AbstractApplication
from isabl_cli import options


class HelloWorldApp(AbstractApplication):

    NAME = "HELLO WORLD"
    VERSION = "1.0.0"
    cli_options = [options.TARGETS]

    def get_command(self, analysis):
        experiment = analysis.targets[0]  
        return f"echo {experiment.sample.identifier} {experiment.raw_data} "

```

{% endcode %}

This application can now be executed system-wide using:

```bash
isabl apps hello-world --filters projects 102
```

{% hint style="info" %}
`cli_options` enabled us to run the app across multiple experiments using RESTful API filters (i.e. `--filters`). We will learn more about how to link experiments with analyses [later](/writing-applications#command-line-configuration).
{% endhint %}
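
Because `get_command` only reads metadata from the analysis object, its logic can be dry-run without a database. The sketch below mirrors the body of `HelloWorldApp.get_command` as a plain function and feeds it a hypothetical stand-in analysis; `raw_data` is simplified to a list of paths, and its real structure may differ.

```python
from types import SimpleNamespace


def hello_world_command(analysis) -> str:
    """Build the echo command the same way HelloWorldApp.get_command does."""
    experiment = analysis.targets[0]
    return f"echo {experiment.sample.identifier} {experiment.raw_data}"


# Hypothetical stand-in for an analysis retrieved from the API.
fake_analysis = SimpleNamespace(
    targets=[
        SimpleNamespace(
            sample=SimpleNamespace(identifier="SAMPLE_1"),
            raw_data=["/data/sample_1.bam"],
        )
    ]
)
print(hello_world_command(fake_analysis))
# echo SAMPLE_1 ['/data/sample_1.bam']
```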

### Analyses and Results

Results produced by applications are stored as analyses. The uniqueness of an analysis is determined by the experiments associated with it. Specifically, analyses can be linked to multiple *targets* and *references* experiments (e.g. tumor-normal pairs). The possibility of linking analyses to multiple experiments allows for a wide variety of experimental designs:

* Single target analyses (e.g. quality control applications).
* Tumor-normal pairs (e.g. variant calling applications).
* One target vs. a pool of references (e.g. copy number applications).
* Multiple targets against multiple references (e.g. all vs. all contamination testing).

Importantly, if someone tries to run the same application over the same experiments, a new analysis won't be created; instead, the existing one will be retrieved.

![Isabl applications are python classes with the role of constructing, validating, and deploying commands for tools (or pipelines) into compute environments across several samples, all guided by metadata retrieved from Isabl API.](/files/-Lo6gA3NA2NTbd5gbGwS)
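
The uniqueness rule can be pictured as a get-or-create lookup keyed on the application and its experiments. The registry below is a toy in-memory model, not Isabl's implementation: identifying the application by a name-plus-version string and treating targets and references as unordered sets are assumptions, and the experiment identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

AnalysisKey = Tuple[str, FrozenSet[str], FrozenSet[str]]


class AnalysisRegistry:
    """Toy in-memory model of analysis uniqueness (not Isabl's implementation)."""

    def __init__(self) -> None:
        self._analyses: Dict[AnalysisKey, dict] = {}

    def get_or_create(
        self,
        application: str,
        targets: Iterable[str],
        references: Iterable[str] = (),
    ) -> Tuple[dict, bool]:
        """Return (analysis, created) for an application and its experiments."""
        key = (application, frozenset(targets), frozenset(references))
        if key in self._analyses:
            return self._analyses[key], False  # existing analysis is reused
        analysis = {"pk": len(self._analyses) + 1, "application": application}
        self._analyses[key] = analysis
        return analysis, True


registry = AnalysisRegistry()
first, created = registry.get_or_create("HELLO WORLD 1.0.0", ["TUMOR_1"], ["NORMAL_1"])
again, created_again = registry.get_or_create("HELLO WORLD 1.0.0", ["TUMOR_1"], ["NORMAL_1"])
print(created, created_again, first is again)  # True False True
```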

### Getting Started with Isabl Applications

You should store your applications and custom Isabl logic in your own python package. [Cookiecutter Apps](https://github.com/isabl-io/cookiecutter-apps) will help you bootstrap your own Isabl project:

```bash
# first make sure you have cookiecutter installed
pip install cookiecutter

# now lets bootstrap your project
cookiecutter https://github.com/isabl-io/cookiecutter-apps

# finally install your project in a new virtual environment
cd <project-name> && pip install -r requirements.txt
```

An example of a project generated with [Cookiecutter Apps](https://github.com/isabl-io/cookiecutter-apps) is available [here](https://github.com/isabl-io/example_apps). This project, and every project created with Cookiecutter Apps, includes the hello world application described in this tutorial; check it out [here](https://github.com/isabl-io/example_apps/blob/master/example_apps/apps/hello_world/apps.py). Now let's learn about writing apps!

### Registering Applications

To make sure your applications are available when running `isabl --help`, add them to the client setting [**`INSTALLED_APPLICATIONS`**](/isabl-settings#isabl-cli-settings):

```javascript
"INSTALLED_APPLICATIONS": [
    "example_apps.apps.HelloWorldApp",
] 
```

## Creating Applications

All Isabl Applications inherit from `isabl_cli.AbstractApplication` and are configured using a class-based approach. Your role is to override attributes and methods to drive the behavior of your app.

{% hint style="info" %}
Applications are represented both with a python class and a database object. The database object is created and updated automatically when the application is run.
{% endhint %}

### Versioning Applications

Applications are uniquely versioned by setting the `NAME` and `VERSION` attributes. The version of an application is not necessarily the version of the underlying tool being executed:

```python
class HelloWorldApp(AbstractApplication):

    NAME = "Hello World"
    VERSION = "1.0.0"
```

{% hint style="info" %}
A good strategy to version applications is to ask the question: *are results comparable across experiments?* An optimization (or bug fix) that doesn't change outputs **might not** require a version change.
{% endhint %}

Optionally, you can also set `ASSEMBLY` and `SPECIES` to version the application as a function of a given genome assembly. This is particularly useful for NGS applications, as results are often only comparable if data was analyzed against the same version of the genome:

```python
class HelloWorldApp(AbstractApplication):

    NAME = "Hello World"
    VERSION = "1.0.0"
    ASSEMBLY = "GRCh37"
    SPECIES = "HUMAN"
```

You can also attach metadata to the database object, such as an application description and a URL (or comma-separated URLs):

```python
class HelloWorldApp(AbstractApplication):
    
    application_description = "An App to showcase different Isabl functionalities."
    application_url = "https://docs.isabl.io/writing-applications"
```

### Application Settings

Applications can depend on multiple configurations such as paths to executables, reference files, compute requirements, etc. These settings are explicitly defined using the `application_settings` dictionary:

```python
class HelloWorldApp(AbstractApplication):

    application_import_strings = {"sym_link"}
    application_settings = {
        "echo_path": "echo",
        "default_message": "Hello World",
        "sym_link": "isabl_cli.utils.force_symlink"
    }
```

{% hint style="info" %}
If a setting holds an import string that should be imported, include its key in `application_import_strings`.
{% endhint %}
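Import strings are dotted paths to Python objects, like `"isabl_cli.utils.force_symlink"` above, that get resolved before your app uses them. The sketch below shows the general idea with the standard library's `importlib`; `import_from_string` and `resolve_import_settings` are hypothetical helpers written for illustration, not Isabl CLI's own loader.

```python
import importlib


def import_from_string(dotted_path):
    """Resolve "package.module.attribute" into the object it names (hypothetical helper)."""
    module_path, _, attribute = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted import path")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


def resolve_import_settings(settings, import_strings):
    """Return a copy of settings where every key listed in import_strings is imported."""
    resolved = dict(settings)
    for key in import_strings:
        resolved[key] = import_from_string(settings[key])
    return resolved
```

For instance, `resolve_import_settings({"join": "os.path.join"}, {"join"})["join"]` is the `os.path.join` function itself, while settings not listed in `import_strings` are left untouched.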

Optional settings can be set to `None`, whilst settings that are required but not yet defined can be set to the `NotImplemented` Python object. Settings defined in the application Python class are considered the *default settings*, yet they can be overridden using the database application field `settings`:

```python
> HelloWorldApp().application.settings
{
   # settings as a function of Isabl CLI client's primary key
   1: {"echo_path": "/usr/local/bin/echo"}  
}
```

Note that `application.settings` are keyed by the Isabl CLI client's primary key. This enables you to run the same application on different compute architectures. You can configure `application.settings` using the Django Admin site or the application method `patch_application_settings`:

```python
HelloWorldApp().patch_application_settings(
    client_id=1,
    echo_path="/usr/bin/echo"
)
```
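To picture how the defaults and the database `settings` interact, the sketch below overlays the entry stored for one client on top of the class-level defaults using plain dictionaries. `resolve_settings` is a hypothetical helper; Isabl CLI performs this resolution internally and its exact rules may differ.

```python
def resolve_settings(defaults, db_settings, client_id):
    """Overlay the settings stored for one client on top of the class defaults."""
    resolved = dict(defaults)
    # database settings are keyed by the Isabl CLI client's primary key
    resolved.update(db_settings.get(client_id, {}))
    return resolved


defaults = {"echo_path": "echo", "default_message": "Hello World"}
db_settings = {1: {"echo_path": "/usr/local/bin/echo"}}

print(resolve_settings(defaults, db_settings, 1))
# {'echo_path': '/usr/local/bin/echo', 'default_message': 'Hello World'}
```

A client without an entry in the database simply runs with the class defaults.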

{% hint style="info" %}
If the value of a setting is a dictionary, the *schema* (i.e. keys) of that setting **will be validated** unless the dictionary contains `"skip_check": True`.
{% endhint %}
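As a rough illustration of that rule, and assuming the *schema* check means that an overriding dictionary must keep exactly the keys of the default one, a check could look like the sketch below. `check_dict_setting` is a hypothetical helper, not Isabl CLI's implementation.

```python
def check_dict_setting(name, default, override):
    """Assert a dictionary setting keeps the keys of its default, unless skip_check is set."""
    if not isinstance(override, dict) or override.get("skip_check"):
        return
    expected, found = set(default), set(override)
    assert expected == found, (
        f"{name}: expected keys {sorted(expected)}, got {sorted(found)}"
    )
```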

### Validate Application Settings

You can make sure applications are properly configured by performing settings validation. To do so, simply define `validate_settings` and raise an `AssertionError` if something is not set properly:

```python
from shutil import which

class HelloWorldApp(AbstractApplication):

    def validate_settings(self, settings):
        assert which(settings.echo_path), f"{settings.echo_path} not in PATH"
```

{% hint style="info" %}
The `application_settings` dictionary defines *default settings*, but during execution your app may have different settings for clients or environments. For example, you may have a small test reference file for testing and the real one for production. That's why you can define `NotImplemented` by default, but **validate** that it's in fact implemented on execution.
{% endhint %}
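One way to follow that pattern is to fail fast on any setting still equal to `NotImplemented`, while letting optional `None` settings through. The helper below is a hypothetical sketch that operates on a plain dictionary; inside an app you could apply similar logic from `validate_settings`, adapting it to the settings object your version of Isabl CLI passes in.

```python
def assert_settings_implemented(settings):
    """Raise AssertionError listing every setting still set to NotImplemented."""
    missing = sorted(key for key, value in settings.items() if value is NotImplemented)
    assert not missing, f"settings not implemented: {', '.join(missing)}"
```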

## Running Applications

Applications can be launched both from the command line and from Python (we will learn more about the latter in the [operational automations guide](/operational-automations)).

### Command Line Configuration

To support CLI capabilities you have to tell the application how to link analyses to experiments using command line options:

```python
from isabl_cli import options

class HelloWorldApp(AbstractApplication):

    cli_help = "This is the Hello World App - a way to learn Isabl applications."
    cli_options = [options.TARGETS]
```

The attribute `cli_options` is set to a list of [Click Options](https://click.palletsprojects.com/en/7.x/options/) that will be used to retrieve experiments from the API and link them to new analyses. Out of the box, Isabl supports the following CLI options to retrieve experiments:

| `isabl_cli.options`                            | Description                                                                                                                                                                                                                                                                  |
| ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`TARGETS`**                                  | Enables `--targets-filters (-fi)` to provide key-value pairs of RESTful API filters used to retrieve experiments (e.g. `-fi sample.category TUMOR`). Each experiment will be linked to a new analysis on a one-to-one basis using the *analysis.targets* field.                  |
| **`PAIRS`**, **`PAIR`**, **`PAIRS_FROM_FILE`** | Enable `--pairs (-p)`, `--pair (-p)`, and `--pairs-from-file (-pf)` to provide pairs of target and reference experiments (e.g. `-p TUMOR-ID NORMAL-ID`). Each pair will be linked to a new analysis (the *targets* list is one experiment, the *references* list is one experiment). |
| **`REFERENCES`**, **`NULLABLE_REFERENCES`**    | Enable `--references-filters (-rfi)` to provide filters used to retrieve reference experiments. These must be coupled with **`TARGETS`**; each analysis will then be linked to one target and to as many references as the filters match.                                     |
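Whichever option you use, the result is a list of `(targets, references)` tuples, one per analysis. The sketch below mimics the shapes described in the table, using experiment identifiers as plain strings; the helper names are hypothetical and this is not how `isabl_cli.options` is implemented.

```python
def tuples_from_targets(targets):
    """TARGETS: one analysis per target experiment, with no references."""
    return [([target], []) for target in targets]


def tuples_from_pairs(pairs):
    """PAIRS: one analysis per (target, reference) pair."""
    return [([target], [reference]) for target, reference in pairs]


def tuples_from_targets_and_references(targets, references):
    """TARGETS + REFERENCES: each target is linked to all matched references."""
    return [([target], list(references)) for target in targets]
```

These are the same shapes that `get_experiments_from_cli_options` must return when you write your own retrieval logic.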

When these options are not adequate for your experimental design, you can implement `get_experiments_from_cli_options`. This function takes the *evaluated* `cli_options` and must return a list of tuples: one tuple per analysis, each tuple with **2** lists: the target experiments and the reference experiments. Here is an example of an application that creates only one analysis linked to all Whole Genome experiments in a project:

```python
from isabl_cli import api
import click

class HelloWorldApp(AbstractApplication):

    cli_options = [
        click.option("--project", help="project key", required=True),
        click.option("--method", help="technique method", default="WG"),
    ]

    def get_experiments_from_cli_options(self, **cli_options):
        project = cli_options["project"]
        method = cli_options["method"]
        filters = dict(projects=project, technique__method=method)
        targets = api.get_instances("experiments", **filters)
        return [(targets, [])]  # will result in only one analysis
```
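The same hook can group experiments differently. For example, to create one analysis per individual instead of one per project, you could group the retrieved experiments by individual before returning the tuples. The sketch below uses plain dictionaries as stand-ins for API instances; the `experiment["sample"]["individual"]["pk"]` nesting is an assumption, so adapt the lookups (e.g. to attribute access) to the objects your API client returns.

```python
from collections import defaultdict


def group_by_individual(experiments):
    """Return one (targets, references) tuple per individual."""
    groups = defaultdict(list)
    for experiment in experiments:
        # assumed nesting: experiment -> sample -> individual
        groups[experiment["sample"]["individual"]["pk"]].append(experiment)
    # sort by individual key so analyses are created in a stable order
    return [(groups[key], []) for key in sorted(groups)]
```

Inside `get_experiments_from_cli_options`, you would then `return group_by_individual(targets)` instead of `[(targets, [])]`.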

By default, applications come with `--force` to remove analyses and start them from scratch, `--restart` to run failed analyses again without trashing them, and `--local` to run analyses locally, one after the other. Each flag is controlled by one of the following attributes:

```python
class HelloWorldApp(AbstractApplication):

    cli_allow_force = True
    cli_allow_restart = True
    cli_allow_local = True
```

As such, the CLI configuration for our Hello World app will result in the following help message:

```bash
$ isabl apps hello-world-1.0.0 --help

Usage: isabl apps hello-world-1.0.0 [OPTIONS]

  This is the Hello World App - a way to learn Isabl applications.

Options:
  --targets-filters       API filters for target experiments [required] <TEXT TEXT>...
  --commit                Submit application analyses.  [default: False]
  --force                 Wipe unfinished analyses and start from scratch.
  --restart               Attempt restarting failed analyses from previous checkpoint.
  --help                  Show this message and exit.
```

The `--force` flag will not completely remove the analyses, but it will move them to a temporary trash directory within the [**`BASE_STORAGE_DIRECTORY`**](/isabl-settings#isabl-cli-settings). You may want to clean this location periodically using `crontab -e`:

{% code title="crontab -e" %}

```bash
# Clears trash directory.
0 0 * * *   source ~/.bash_profile &> /dev/null; rm -rf <replace with BASE_STORAGE_DIRECTORY>/.analyses_trash/* &> /dev/null;
```

{% endcode %}

{% hint style="success" %}
You can use `cli_options` to include any other argument your app may need in order to successfully build and deploy data processing tools.
{% endhint %}

### Validate Experiments Before Creating Analyses

One of the advantages of metadata-driven applications is that we can prevent analyses that don't make sense, for example running a variant calling application on imaging data. Simply raise an `AssertionError` if something doesn't make sense, and the error message will be shown to the user:

```python
class HelloWorldApp(AbstractApplication):

    def validate_experiments(self, targets, references):
        assert len(targets) == 1, "only one target experiment per analysis"
        assert targets[0].raw_data, "target experiment has no linked raw data"
        self.validate_dna_only(targets)  # multiple validators are readily available
```

Analyses are understood to be unique if their *targets*, *references*, and *application* are the same ([as well as previously linked dependencies](/writing-applications#dependencies-on-other-applications)). If you need custom *get or create* logic, you can override the `get_or_create_analyses` method.

{% hint style="info" %}
`AbstractApplication` comes with [readily available validators](https://github.com/papaemmelab/isabl_cli/blob/bbc76fbb9e79ae72edd713c59b0059c8e28fa927/isabl_cli/app.py#L1614) that you may want to use. Here are some commonly used ones:

* **`validate_dna_only`** Check technique category is `DNA`.
* **`validate_same_technique`** Validate experiments have the same experimental technique.
* **`validate_same_individual`** Check experiments come from the same individual.
{% endhint %}
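If none of the built-in validators fit, writing your own is a matter of asserting on the experiments' metadata. The sketch below shows a same-individual check in that style, written against plain dictionaries; the `experiment["sample"]["individual"]["pk"]` nesting is an assumption, and this is not the library's `validate_same_individual`.

```python
def assert_same_individual(experiments):
    """Raise AssertionError if the experiments come from more than one individual."""
    individuals = {e["sample"]["individual"]["pk"] for e in experiments}
    assert len(individuals) <= 1, (
        f"experiments from multiple individuals: {sorted(individuals)}"
    )
```

From `validate_experiments`, such a helper could be called as `assert_same_individual(targets + references)`, and its message would reach the user like any other `AssertionError`.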

### Building Commands Using Metadata

Now that we know how to link analyses to experiments, let's dive into creating data processing commands. Our only objective is to use the *analysis* and *settings* objects to build a `shell` command and return it as a string (ignore `inputs` for now, we will learn more about it when specifying [application dependencies](/writing-applications#dependencies-on-other-applications)).

```python
from os.path import join
import click
from isabl_cli import options

class HelloWorldApp(AbstractApplication):
    
    cli_options = [options.TARGETS, click.option("--message")]
    
    def get_command(self, analysis, inputs, settings):
        echo = settings.echo_path
        target = analysis.targets[0]
        message = settings.run_args.message or settings.default_message
        output_file = join(analysis.storage_url, "output.txt")
        input_file = join(analysis.storage_url, "input.txt")
        settings.sym_link(target.raw_data[0].file_url, input_file)
        return (
            f"bash -c '{echo} Sample: {target.sample.identifier} > {output_file}' && "
            f"bash -c '{echo} Message: {message} >> {output_file}' && "
            f"bash -c '{echo} Data: >> {output_file}' && "
            f"bash -c 'cat {input_file} >> {output_file}' "
        )
```

All options passed in `cli_options` are available during `get_command` using the settings attribute `run_args`. In this simple example, we allowed the user to pass a custom `--message`.

Similarly, `settings.restart` tells `get_command` whether the analysis is being [restarted](#command-line-configuration), which can be useful for cleaning up files:

```python
from os.path import join, exists
from os import remove
from isabl_cli import options

class HelloWorldApp(AbstractApplication):
    cli_options = [options.TARGETS]

    def get_command(self, analysis, inputs, settings):
        if settings.restart:
            to_rm_path = join(analysis.storage_url, "run.lock")
            if exists(to_rm_path):
                remove(to_rm_path)
        
        return "echo 'Hello World!'"
```

{% hint style="info" %}
Isabl is **agnostic** to compute architecture: `get_command` does not need to worry about HPC schedulers or cloud infrastructure (e.g. **`LSF`**, **`AWS`**); its only role is to return a shell command.
{% endhint %}
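Because `get_command` returns a plain string, user-provided values such as `--message` end up inside a shell command; in the example above, a message containing a single quote would break the `bash -c '...'` wrapping. One defensive option, not something the original example does, is to quote every interpolated value with Python's `shlex.quote`. `build_hello_command` is a hypothetical helper you could call from `get_command`.

```python
import shlex


def build_hello_command(echo, sample_id, message, input_file, output_file):
    """Build the Hello World shell command, quoting every interpolated value."""
    q = shlex.quote
    return (
        f"{q(echo)} Sample: {q(sample_id)} > {q(output_file)} && "
        f"{q(echo)} Message: {q(message)} >> {q(output_file)} && "
        f"{q(echo)} Data: >> {q(output_file)} && "
        f"cat {q(input_file)} >> {q(output_file)}"
    )
```

With this approach a message like `it's $HOME` is written to the output file literally, without being expanded or breaking the command.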

### Submitting Analyses to Compute Architectures

`Isabl` is agnostic of the compute infrastructure you're working on and can be configured to work with different batch systems (e.g. local, HPC, cloud). Currently, `Isabl` supports **`local`**, **`LSF`**, **`SGE`**, and **`Slurm`** submissions; however, you can create a submitter for [other schedulers](/writing-applications#other-schedulers).

{% hint style="info" %}
**Importantly**, `Isabl` is not a workflow management system or language like Toil, Bpipe, CWL, etc. Isabl can, however, submit a *head job* for each analysis to a compute infrastructure.
{% endhint %}

#### Analyses Batch Submission

Isabl comes with prebuilt logic to submit thousands of analyses to **`LSF`**, **`SGE`**, and **`Slurm`** using job arrays. To do so, simply set the Isabl CLI setting [**`SUBMIT_ANALYSES`**](/isabl-settings#isabl-cli-settings) as follows:

```javascript
// IBM's LSF
"SUBMIT_ANALYSES": "isabl_cli.batch_systems.lsf.submit_lsf",

// Sun Grid Engine
"SUBMIT_ANALYSES": "isabl_cli.batch_systems.sge.submit_sge",

// Slurm
"SUBMIT_ANALYSES": "isabl_cli.batch_systems.slurm.submit_slurm",
```

These submitters can check for the following configurations in [**`SUBMIT_CONFIGURATION`**](/isabl-settings#isabl-cli-settings):

| Configuration Name     | Type          | Description                                                                                                               |
| ---------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------- |
| **`get_requirements`** | Import String | An import string to a function that determines scheduler requirements (e.g. LSF resources) as a function of the experimental methods, see below. |
| **`extra_args`**       | String        | Default `qsub`, `bsub`, or `sbatch` args to be used across all submissions.                                                                       |
| **`throttle_by`**      | Integer       | The total number of analyses that are allowed to run at the same time (default is 50).                                    |

The function referenced by `get_requirements` must take the application and a list of the targets' technique methods (which are submitted together in the same job array), and return the requirements as a string:

```python
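from example_apps import apps  # assumed module where HelloWorldApp is defined


def get_lsf_requirements(app, targets_methods):
    if isinstance(app, apps.HelloWorldApp):
        # request more memory when whole genome (WG) experiments are in the array
        memory = 10 if "WG" in targets_methods else 1
        return f"-n 1 -R 'rusage[mem={memory}]'"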
```
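A function for Slurm would follow the same contract, returning `sbatch` arguments instead. The sketch below is hypothetical: it identifies the app by its `NAME` attribute rather than by class, and the memory values and the fallback for other apps are placeholders to adapt to your cluster.

```python
def get_slurm_requirements(app, targets_methods):
    """Return sbatch arguments for a job array given the targets' technique methods."""
    if getattr(app, "NAME", None) == "Hello World":
        # request more memory when whole genome (WG) experiments are in the array
        memory = "10G" if "WG" in targets_methods else "1G"
        return f"--cpus-per-task=1 --mem={memory}"
    return "--cpus-per-task=1 --mem=4G"  # placeholder default for other apps
```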

#### Other Schedulers

You can implement [**`SUBMIT_ANALYSES`**](/isabl-settings#isabl-cli-settings) functions for other schedulers. The function must take a list of tuples, each tuple being an analysis and the analysis head job script.
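As a starting point, here is a minimal sketch of such a function. It follows the description above, assuming the second element of each tuple is the path to an executable head job script, and submits one job per analysis through a placeholder `my-scheduler submit` command. Check the built-in submitters, such as `isabl_cli.batch_systems.slurm.submit_slurm`, for the exact signature and return value your version of Isabl CLI expects.

```python
import subprocess


def build_submission(script_path, scheduler_cmd=("my-scheduler", "submit")):
    """Return the argv that submits one head job script (hypothetical scheduler CLI)."""
    return [*scheduler_cmd, script_path]


def submit_my_scheduler(command_tuples, scheduler_cmd=("my-scheduler", "submit")):
    """Submit one job per (analysis, head job script) tuple and return what was submitted."""
    submitted = []
    for analysis, script_path in command_tuples:
        argv = build_submission(script_path, scheduler_cmd)
        # check=True raises CalledProcessError if the submission command fails
        subprocess.run(argv, check=True)
        submitted.append((analysis, argv))
    return submitted
```

Unlike the prebuilt submitters, this sketch does not use job arrays or throttling; it only illustrates the shape of the input.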

### Applications Run by Multiple Users

Isabl applications can be run by multiple users in the same Unix group. However, if applications are run by users other than the [**`ADMIN_USER`**](/isabl-settings#isabl-cli-settings) and are not [re-runnable](/writing-applications#re-runnable-applications), analyses will be set to `FINISHED` instead of `SUCCEEDED`. The [**`ADMIN_USER`**](/isabl-settings#isabl-cli-settings) can then run `isabl process-finished` to copy and own the results, set their permissions to *read-only*, and update the analyses status to `SUCCEEDED`. We recommend you add the following cron task to the [**`ADMIN_USER`**](/isabl-settings#isabl-cli-settings) profile using `crontab -e`:

{% code title="crontab -e" %}

```bash
# Change analyses permissions and updates them to SUCCEEDED.
*/30 * * * * source ~/.bash_profile &> /dev/null; isabl process-finished &>> ~/moving.log
```

{% endcode %}

### Running Applications from Python

Apps can be triggered programmatically from Python using the `run` method:

```python
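from isabl_cli import api

# retrieve the target experiment (replace the placeholder identifier)
target_experiment = api.get_instance("experiments", "EXPERIMENT-ID")

HelloWorldApp().run(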
    tuples=[([target_experiment], [])],
    run_args=dict(message="custom message"),
    commit=True,
)
```

{% hint style="info" %}
**Tip:** this is useful when creating [operational automations](/operational-automations)!
{% endhint %}

## Application Results

You can provide a specification of your application results using the `application_results` dictionary. Each key is a result *id* and the value is a dictionary with the specs of the result:

```python
class HelloWorldApp(AbstractApplication):

    application_results = {
        "input": {
            "frontend_type": "text-file",
            "description": "Symlink to experiment raw data.",
            "verbose_name": "Hello World Input",
            "external_link": "https://en.wikipedia.org/wiki/Symbolic_link",
            "pattern": "input.txt",
        },
        "output": {
            "frontend_type": "text-file",
            "description": "Sample id, hello world message, and content of raw data.",
            "verbose_name": "Hello World Result",
            "external_link": "https://hello.world/",
            "pattern": "output.txt",
        },
        "count": {
            "frontend_type": "number",
            "description": "Count of characters in output file.",
            "verbose_name": "Characters Count",
        },
    }
```

{% hint style="info" %}
By default, all applications come with 3 default results: `command_script`, `command_log`, and `command_err`. These point to the analysis head job command, standard output, and standard error, respectively.
{% endhint %}

Results can be paths to files, strings (e.g. MD5s), numbers, and any other serializable value. Here is a full list of the different specifications a result can have:

* **`frontend_type`** Defines [how the result should be displayed](/writing-applications#frontend-result-types) in the frontend.
* **`description`** Information about the result (required).
* **`verbose_name`** Name displayed for the result in the results list (required).
* **`optional`** If `False` and the result is missing, an alert will be shown online (optional).
* **`external_link`** URL to a resource that explains the result (optional).
* **`pattern`** A glob pattern used to recursively match a filename within the analysis folder (optional).
* **`exclude`** A string used to exclude files that match `pattern` (optional).
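Because `description` and `verbose_name` are required, a small unit test can catch incomplete specifications early. The sketch below checks a plain `application_results` dictionary against the keys listed above; `check_result_specs` is a hypothetical test helper, not part of Isabl CLI.

```python
REQUIRED_KEYS = {"description", "verbose_name"}
KNOWN_KEYS = REQUIRED_KEYS | {"frontend_type", "optional", "external_link", "pattern", "exclude"}


def check_result_specs(application_results):
    """Assert each result spec has the required keys and no unknown (e.g. misspelled) ones."""
    for result_id, spec in application_results.items():
        missing = REQUIRED_KEYS - set(spec)
        unknown = set(spec) - KNOWN_KEYS
        assert not missing, f"{result_id}: missing {sorted(missing)}"
        assert not unknown, f"{result_id}: unknown keys {sorted(unknown)}"
```

Calling `check_result_specs(HelloWorldApp.application_results)` from a test would flag, for example, a misspelled `patern` key.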

### Frontend Result Types

{% hint style="success" %}
Specifying the result `frontend_type` defines how to render it in Isabl Web. When set to `None`, the result will still be available in the analysis, but it won't be shown in the frontend.
{% endhint %}

Here is a full list of the result types that are supported for rendering in Isabl Web:

| Frontend Type              | Description                                                                                                                                 |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| **`text-file`**            | It's shown as a raw file, and its content is streamed as the user requests it.                                                              |
| **`tsv-file`**             | It can be shown as raw text or tabulated for easier inspection (e.g. VCF, TSV).                                                             |
| **`string`**, **`number`** | It's shown as a string and can't be downloaded.                                                                                             |
| **`image`**                | Previews are displayed in a gallery in the *analysis* view.                                                                                 |
| **`html`**, **`pdf`**      | Rendered as html in an `iframe`.                                                                                                            |
| **`igv_bam[:index]`**      | Can be streamed and visualized in an embedded IGV viewer. If another result called `bai` is the BAM index, you can set it to `igv_bam:bai`. |
| **`None`**                 | For non-previewable files that are large, compressed, or in an unsupported format (e.g. FASTQ, .RData objects, .pkl models).               |

{% hint style="info" %}
If a desired data type is not supported, please consider contributing to Isabl Web to add it. See [Contributing Guide](/contributing-guide)
{% endhint %}

### Re-runnable Applications

By default, analysis results are protected upon completion (i.e. permissions are set to read only). If you want your application to be re-runnable indefinitely, set:

```python
class HelloWorldApp(AbstractApplication):

    application_protect_results = False
```

### Databasing Analysis Results

When `application_results` is defined, you must implement `get_analysis_results`. This method must return a serializable dictionary of results, and it is only run after the analysis has completed successfully. For our example, it can be something like:

```python
from os.path import join


class HelloWorldApp(AbstractApplication):

    def get_analysis_results(self, analysis):
        output = join(analysis.storage_url, "output.txt")

        with open(output) as f:
            count = sum(len(i) for i in f)

        return {
            "input": join(analysis.storage_url, "input.txt"),
            "output": output,
            "count": count
        }
```

When results define a `pattern` in `application_results`, the `get_analysis_results` method can be simplified, as Isabl will match the filename using the `pattern` and `exclude` fields by recursively walking the analysis output directory.

For instance, the previous app only needs the `count` result to be computed, as `input` and `output` can be matched automatically, hence it can be simplified as:

```python
class HelloWorldApp(AbstractApplication):

    def get_analysis_results(self, analysis):
        results = super().get_analysis_results(analysis)
        
        with open(results["output"]) as f:
            results["count"] = sum(len(i) for i in f)
        
        return results
```

{% hint style="success" %}
In cases where all `application_results` are filenames with a `pattern`, `get_analysis_results` is not even needed.
{% endhint %}
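Isabl performs this matching for you; the sketch below only approximates the idea with `pathlib`, under two assumptions: `pattern` is a glob matched at any depth, and `exclude` is a substring that disqualifies a path relative to the analysis directory. `match_result` is a hypothetical name, and Isabl's actual behavior when several files match may differ.

```python
from pathlib import Path


def match_result(storage_url, pattern, exclude=None):
    """Return the first file under storage_url matching pattern, or None.

    Files whose path relative to storage_url contains `exclude` are skipped.
    """
    root = Path(storage_url)
    for path in sorted(root.rglob(pattern)):
        relative = str(path.relative_to(root))
        if path.is_file() and not (exclude and exclude in relative):
            return str(path)
    return None
```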

### Project and Individual Level Auto-merge

Isabl applications can produce auto-merge analyses at a *project* and *individual* level. For example, you may want to merge variants whenever new results are available for a given project, or update quality control reports when a new sample is added to an individual. A newly versioned analysis will be created for each type of auto-merge, and your role is to take a list of succeeded analyses and implement the merge logic.

```python
from os.path import join


class HelloWorldApp(AbstractApplication):

    application_project_level_results = {
        "merged": {
            "frontend_type": "text-file",
            "description": "Merged output files.",
            "verbose_name": "Merged Output Files",
        },
        "count": {
            "frontend_type": "number",
            "description": "Count of characters for merged output.",
            "verbose_name": "Merged Output Characters Count",
        },
    }
    
    def merge_project_analyses(self, analysis, analyses):
        with open(join(analysis.storage_url, "merged.txt"), "w") as f:
            for i in analyses:
                with open(i.results.output) as output:
                    f.write(output.read())

    def get_project_analysis_results(self, analysis):
        merged = join(analysis.storage_url, "merged.txt")

        with open(merged) as f:
            count = sum(len(i) for i in f)

        return {"merged": merged, "count": count}

```

The first argument of `merge_project_analyses` is the project level `analysis`, which is unique per project and application. The second argument is a list of all completed `analyses` of this application for a given project. Your role is to merge the outputs of `analyses` into the project level `analysis` directory. Similar methods are needed for the **individual** level auto-merge. If the project level merge logic also works for individuals, you can simply reuse it:

```python
class HelloWorldApp(AbstractApplication):

    # reuse the project merge logic at the individual level
    application_individual_level_results = application_project_level_results
    merge_individual_analyses = merge_project_analyses
    get_individual_analysis_results = get_project_analysis_results

```

To test the auto-merge logic at any time, use either of these commands:

```bash
# for project level auto merge
isabl merge-project-analyses --project <project id> --application <application id>

# for individual level automerge
isabl merge-individual-analyses --individual <individual id> --application <application id>
```

{% hint style="info" %}
Please note that merged output will always be stored in the same analysis for a given project or individual and application. Furthermore, you can validate analyses before running the merge operation by implementing `validate_project_analyses`, and `validate_individual_analyses`.
{% endhint %}

#### Submitting Merge Analysis to A Compute Architecture

Merge operations are triggered automatically when the last analysis meant to be merged finishes running. By default, the merge operation will be conducted right after the analysis is patched to `SUCCEEDED`. However, you can define how merge analyses are submitted using the Isabl CLI setting [**`SUBMIT_MERGE_ANALYSIS`**](/isabl-settings#isabl-cli-settings). For example, in **`LSF`**:

```python
import subprocess

def submit_merge_analysis_to_lsf(instance, application, command):
    """Submit project merge to LSF."""
    command = ["bsub", "-n", "1", "-W", "40000", "-M", "32"] + command.split()
    subprocess.check_call(command)
    print("Submitted merge analysis using: " + " ".join(command))
```

Here is an example for **`SGE`**:

```python
import subprocess

def submit_merge_analysis_to_sge(instance, application, command):
    """Submit project merge to SGE."""
    command = f"qsub -l h_vmem=32G << EOF\n{command}\nEOF\n"
    subprocess.check_call(command, shell=True)
    print(f"Submitted project level merge with: {command}")
```
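A Slurm version could follow the same signature. This is a sketch rather than a tested recipe: it relies on `sbatch --wrap` to run the command string as a batch script, and the memory and time limits are placeholder values to adapt to your cluster.

```python
import shlex
import subprocess


def build_slurm_merge_command(command):
    """Return the sbatch argv used to run a merge command on Slurm."""
    # --wrap runs the given string inside a generated batch script; limits are placeholders
    return ["sbatch", "--ntasks=1", "--mem=32G", "--time=24:00:00", f"--wrap={command}"]


def submit_merge_analysis_to_slurm(instance, application, command):
    """Submit project or individual merge to Slurm."""
    argv = build_slurm_merge_command(command)
    subprocess.check_call(argv)
    print("Submitted merge analysis using: " + shlex.join(argv))
```

Passing the whole command through `--wrap` avoids the need for a heredoc and `shell=True`.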

## Optional Functionality

This section lists additional optional functionality supported by Isabl applications, in particular dependencies on other applications, the after-completion analysis status, unique analyses per individual, and failure notifications.

### Analyses Inputs and Dependencies on Other Applications

Application *inputs* are analysis-specific settings (*settings* are the same for all analyses, yet *inputs* are potentially different for each analysis). Inputs can be formally defined using `application_inputs`. Inputs set to `NotImplemented` are considered required and must be resolved using `get_dependencies`:

```python
from isabl_cli import utils


class HelloWorldApp(AbstractApplication):
  
    application_inputs = {"previously_generated_result": NotImplemented}

    # resolve the inputs of each analysis; this method runs before get_command
    # and must return a tuple: (list of dependency analyses primary keys, inputs dict)
    def get_dependencies(self, targets, references, settings):
        result, analysis_key = utils.get_result(  # return result value and key of analysis that generated it
            experiment=targets[0],                # get result for the first target experiment
            application_key=123,                  # primary key of previous app in database
            result_key="a_result_name",           # name of result in the app definition
        )

        return [analysis_key], {"previously_generated_result": result}
```

{% hint style="info" %}
The main objective of `application_inputs` and `get_dependencies` is to retrieve results and analyses that should be linked to the newly created analysis. Linked analyses are accessible from the analysis detail frontend view.
{% endhint %}

### Get After Completion Status

In certain cases you don't want your analyses to be marked as `SUCCEEDED` after completion, for example to flag them for manual review or to keep track of analyses that still need an extra step. For these cases, you may want to set the *after-completion* status to `IN_PROGRESS`:

```python
class HelloWorldApp(AbstractApplication):
    def get_after_completion_status(self, analysis):
        return "IN_PROGRESS"
```
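The returned status can also depend on the analysis itself. As a hypothetical example, an app could flag analyses for review only when its command leaves a marker file behind. The sketch below shows that decision as a standalone function of the analysis `storage_url`; the `needs_review.txt` marker is an assumption, and in an app you would call it from `get_after_completion_status` with `analysis.storage_url`.

```python
from os.path import exists, join


def after_completion_status(storage_url, marker="needs_review.txt"):
    """Return IN_PROGRESS if the hypothetical review marker exists, else SUCCEEDED."""
    if exists(join(storage_url, marker)):
        return "IN_PROGRESS"
    return "SUCCEEDED"
```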

### Unique Analysis Per Individual

It is possible to create applications that are unique at the individual level. To do so, set `unique_analysis_per_individual = True`. A good example of a unique per individual application could be a patient centric report that aggregates results across all samples. If you are interested in how analyses for these applications are created, take a look at `AbstractApplication.get_individual_level_analyses`.

{% hint style="info" %}
Individual Level Auto-Merge and Unique Analyses Per Individual are different concepts. Applications that require a unique analysis per individual **don't** support individual level auto-merge.
{% endhint %}

### Get Notified When Analyses Fail

You can configure Isabl API to periodically check if any analysis has failed and send you email notifications. To do so, head to the admin site at `/admin/django_celery_beat/periodictask/add/` and in *Task (registered)* select `isabl_api.tasks.report_status_change_task`, then create a 1 hour interval, and provide the following Keyword arguments `{"status": "FAILED", "seconds": 3600}` (i.e. *every hour check how many analyses failed in the past hour*):

![Every hour check how many analyses failed in the past hour.](/files/-LqNEqiXFED-W83LOa60)

## Testing Applications

Our goal is to make it extremely easy to test your applications. Ideal apps can be tested locally, with fake/dummy data using *factory-created* database instances. [Isabl CLI](https://github.com/isabl-io/cli) and [Cookiecutter Apps](https://github.com/isabl-io/cookiecutter-apps) come with a range of utilities to help you test your applications.

### Useful Pytest Fixtures

If you created your project using [Cookiecutter Apps](https://github.com/isabl-io/cookiecutter-apps), the following `pytest` fixtures are available to you:

| Name          | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`commit`**  | Enables you to run `pytest` with a `--commit` flag; this flag can [later](/writing-applications#application-test-example) be used to actually commit the application. |
| **`datadir`** | Path to the dummy data directory. Your tests directory comes with a `data` folder, which can be populated with small dummy files that are useful for running your apps. |
| **`tmpdir`**  | This is like the regular pytest [`tmpdir`](http://doc.pytest.org/en/latest/tmpdir.html), but with some perks. First, it sets the current [**`DATA_STORAGE_DIRECTORY`**](/isabl-settings#isabl-cli-settings) to a temporary directory. Second, it provides `tmpdir.docker`, a method that creates executable scripts that run a given `entrypoint` inside a Docker container. For example, `tmpdir.docker("ubuntu", "echo")` creates an executable script that calls `echo` using an Ubuntu image. |

Look at the [test example](/writing-applications#application-test-example) to learn how to use these fixtures.
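To picture what `tmpdir.docker` hands back, here is a minimal sketch of a helper that writes an executable shell wrapper around `docker run`. The function name `make_docker_wrapper` and the exact `docker run` flags are assumptions for illustration, not Isabl's implementation; a real helper would likely also mount the directories your app reads and writes.

```python
import os
import stat
import tempfile


def make_docker_wrapper(directory, image, entrypoint):
    """Write an executable script that runs `entrypoint` inside `image`.

    Hypothetical helper approximating the idea behind `tmpdir.docker`.
    """
    # image names like "org/tool:1.0" contain characters unsafe in file names
    safe_name = f"{image}-{entrypoint}".replace("/", "_").replace(":", "_")
    path = os.path.join(directory, safe_name)
    with open(path, "w") as handle:
        handle.write("#!/bin/bash\n")
        # forward every argument given to the wrapper to the container entrypoint
        handle.write(f'docker run --rm --entrypoint {entrypoint} {image} "$@"\n')
    # make the wrapper executable for user, group, and others
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(make_docker_wrapper(demo_dir, "ubuntu", "echo"))
```

The returned path can then be used wherever an application setting expects an executable, which is how `echo_path` is configured in the test example.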

### Creating Fake Data and Metadata

Isabl CLI comes with a full set of factories that facilitate the creation of fake metadata. Here is an example of how to create a fake experiment:

```python
from isabl_cli import api
from isabl_cli import factories


meta_data = factories.ExperimentFactory()
meta_data["sample"]["individual"]["species"] = "HUMAN"
experiment = api.create_instance("experiments", **meta_data)
assert experiment.pk > 0, "failed to create database instance"
```

{% hint style="info" %}
We recommend limiting the use of these factories to development instances of Isabl API. By default, `ISABL_API_URL` is set to <http://0.0.0.0:8000/api/v1/>.
{% endhint %}

Being able to actually run the applications (i.e. passing `--commit`) during testing can be valuable. In Next Generation Sequencing, for example, you could create fake BAMs and tiny reference genomes (a few KB) to test variant calling applications.

### Application Test Example

Here is a comprehensive example that tests our `HelloWorldApp`; projects created with [Cookiecutter Apps](https://github.com/isabl-io/cookiecutter-apps) include this test:

```python
from os.path import join

from isabl_cli import api, factories, utils

# hypothetical import path: point this at the module that defines your app
from my_apps.apps import HelloWorldApp


def test_hello_world_app(tmpdir, datadir, commit):
    # path to hello_world test data
    hello_world_datadir = join(datadir, "hello_world")
    raw_data = [
        dict(
            file_url=join(hello_world_datadir, "test.txt"),
            file_data=dict(extra="annotations"),
            file_type="FASTQ_R1",
        ),
    ]

    # create a fake experiment whose raw data points to the test files
    meta_data = factories.ExperimentFactory(raw_data=raw_data)
    meta_data["sample"]["individual"]["species"] = "HUMAN"
    meta_data["storage_url"] = hello_world_datadir
    experiment = api.create_instance("experiments", identifier="a", **meta_data)

    # overwrite the configuration of the default client
    app = HelloWorldApp()
    app.application.settings.default_client = {
        "default_message": "Hello from Elephant Island.",
        "echo_path": tmpdir.docker("ubuntu", "echo"),
    }

    # run application and make sure results are reported
    utils.assert_run(
        application=app,
        tuples=[([experiment], [])],
        commit=commit,
        results=["output", "count", "input"],
    )
```

{% hint style="info" %}
[The actual test](https://github.com/isabl-io/example_apps/blob/master/tests/test_hello_world.py) is much more comprehensive. It creates two experiments per individual and validates that [auto-merge](/writing-applications#project-and-individual-level-auto-merge) was actually conducted, that the application is [re-runnable](/writing-applications#re-runnable-applications), and that the [command line configuration](/writing-applications#command-line-configuration) works well.
{% endhint %}


# Operational Automations

🤖 Once you have set up your Isabl instance and created a few applications, you can automate your processes! In Isabl, this is achieved using signals.

## Operational Signals

Signals are Python functions that take *one* argument: an **experiment** or an **analysis**. Signals for experiments are triggered on data import, whilst signals for analyses are triggered on status change (e.g. analysis failure or completion).

![](https://cdn-images-1.medium.com/max/800/1*nrOMsAUtbsmIHol-43Yunw.png)

## Registering Signals

Register signals by including the function import string in either the [**`ON_STATUS_CHANGE`**](/isabl-settings#isabl-cli-settings) or the [**`ON_DATA_IMPORT`**](/isabl-settings#isabl-cli-settings) Isabl CLI setting:

```python
# experiments signals are triggered on data import
"ON_DATA_IMPORT": [
    "my_apps.signals.trigger_apps"
]

# analyses signals are triggered on status change
"ON_STATUS_CHANGE": [
    "my_apps.signals.trigger_dependencies"
]
```

### Signals on Data Import

Signals for experiments are triggered on data import and receive the experiment object as their only argument. You can use the metadata of the experiment to determine what automation should be applied.

```python
from isabl_apps import apps


def signal_data_import(experiment):
    """Run upon data import using the cli."""
    species = experiment.sample.individual.species
    category = experiment.technique.category
    dna_aligner = {"HUMAN": apps.BwaMemGRCh37, "MOUSE": apps.BwaMemGRCm38}.get(species)
    tuples = [([experiment], [])]

    if category == "DNA" and dna_aligner:
        dna_aligner().run(tuples=tuples, commit=True)
```

Some examples are:

* Trigger assembly/species/category aware alignment
* Perform gene quantification or fusion calling on RNA
* Create more human-readable symlinks to the raw data
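Because signals are plain functions, it can help to keep the decision (which app to run) separate from the side effect (calling `run(..., commit=True)`), so the decision can be unit-tested without an Isabl API. The sketch below assumes the attribute paths used in `signal_data_import` and uses `types.SimpleNamespace` objects as stand-ins for experiments; the aligner names are placeholder strings, not the real app classes.

```python
from types import SimpleNamespace

# Placeholder names standing in for app classes such as apps.BwaMemGRCh37.
DNA_ALIGNERS = {"HUMAN": "BwaMemGRCh37", "MOUSE": "BwaMemGRCm38"}


def choose_dna_aligner(experiment):
    """Return the aligner to run for a DNA experiment, or None if none applies."""
    if experiment.technique.category != "DNA":
        return None
    return DNA_ALIGNERS.get(experiment.sample.individual.species)


def fake_experiment(species, category):
    """Build a stand-in exposing the same attribute paths the signal reads."""
    return SimpleNamespace(
        sample=SimpleNamespace(individual=SimpleNamespace(species=species)),
        technique=SimpleNamespace(category=category),
    )


if __name__ == "__main__":
    print(choose_dna_aligner(fake_experiment("HUMAN", "DNA")))
```

In the actual signal, the app would then be instantiated and run only when a match is found, exactly as `signal_data_import` does.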

### Signals on Analysis Status Change

Analyses signals are triggered on status change. Each signal receives the analysis object as its only argument. You can use the metadata of the analysis (e.g. its status, application, and targets) to determine what automation should be applied.

```python
from isabl_apps import apps


def signal_apps_automation(analysis):
    """Run upon an analysis status update in the database."""
    qc_app = {
        "GRCh37": apps.QualityControlGRCh37,
        "GRCm38": apps.QualityControlGRCm38,
    }.get(analysis.application.assembly.name)

    if (
        analysis.status == "SUCCEEDED"
        and analysis.application.name in ["DISAMBIGUATE", "BWA_MEM", "STAR"]
        and qc_app
    ):
        qc_app().run(tuples=[(analysis.targets, [])], commit=True)
```

Some examples are:

* Trigger Quality Control/Coverage calculation after alignment has successfully been run
* Trigger Variant Calling after alignment
* Trigger Report Generation after analyses have been completed
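As the number of automations grows, a lookup table from upstream application names to downstream apps can be easier to maintain than nested `if` statements. This is a hypothetical pattern, not an Isabl API: `DOWNSTREAM` and the app names in it are placeholders, and the analysis is faked with `types.SimpleNamespace` using the `status` and `application.name` attributes read by `signal_apps_automation`.

```python
from types import SimpleNamespace

# Hypothetical table: upstream application name -> downstream apps to trigger.
DOWNSTREAM = {
    "BWA_MEM": ["QUALITY_CONTROL", "VARIANT_CALLING"],
    "STAR": ["QUALITY_CONTROL"],
}


def downstream_apps(analysis):
    """Return the downstream app names to trigger for a finished analysis."""
    if analysis.status != "SUCCEEDED":
        return []
    # copy so callers can't accidentally mutate the shared table
    return list(DOWNSTREAM.get(analysis.application.name, []))


def fake_analysis(status, application_name):
    """Stand-in exposing the `status` and `application.name` attributes."""
    return SimpleNamespace(
        status=status, application=SimpleNamespace(name=application_name)
    )
```

A real signal would map each returned name to an app class and call `run(tuples=[(analysis.targets, [])], commit=True)` on it.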

## Working with Signals

Here are a few examples of how to work with signals, trigger them, and get notified if signals fail.

### Running Signals Manually

You can trigger signals manually with Isabl CLI:

```bash
# experiment signals
isabl run-signals experiments -s my_apps.signals.trigger_apps -fi projects 100 

# analyses signals
isabl run-signals analyses -s my_apps.signals.trigger_dependencies -fi projects 100 
```

### Rerunning Failed Signals

When signals fail during automation, database records are created to keep track of this event. Rerun failed signals with:

```bash
# rerun all failed signals
isabl rerun-signals

# rerun failed signals using filters
isabl rerun-signals \
    -fi import_string my_apps.signals.trigger_apps \
    -fi target_endpoint analyses \
    -fi target_id <an analysis primary key>
```

### Get Notified When Signals Fail

You can configure Isabl API to periodically check whether any signal has failed and send you email notifications. To do so, go to the admin site at `/admin/django_celery_beat/periodictask/add/`, select `isabl_api.tasks.report_failed_signals_task` under *Task (registered)*, create a 15-minute interval, and click save:

![](/files/-LqNA4xA-9YYmv-F0Bb9)


# Project Privacy

🔒 Configure metadata access and privacy by Project

{% hint style="success" %}
Since [`isabl_web@0.3.23`](https://www.npmjs.com/package/@papaemmelab/isabl-web/v/0.3.23) and [`isabl_api@1.1.0`](https://github.com/papaemmelab/isabl_api/releases/tag/v1.1.0), Isabl supports project privacy.
{% endhint %}

Projects now have a :lock: or `:share:` icon next to their title that shows whether a project is `private 🔒` or `public`, respectively.

<figure><img src="/files/pbV7RVq6ucuRReaVOsOw" alt="" width="389"><figcaption><p> Icon <span data-gb-custom-inline data-tag="emoji" data-code="1f512">🔒</span> showing the project is Private.</p></figcaption></figure>

The icon is *blue* and *active* if the current user can modify the project's sharing permissions. Clicking the icon opens a modal to modify the sharing settings:

<figure><img src="/files/rZDgQgnpJ7vmHN1TIchS" alt="" width="500"><figcaption><p>Project Sharing Settings Modal, when clicking the <span data-gb-custom-inline data-tag="emoji" data-code="1f512">🔒</span>icon.</p></figcaption></figure>

This lets you configure which users **can view** the project and which users **can share** it:

* By default, every superuser and the project owner can always view and share the project, and change its **visibility** (Public/Private).
* Users who **can view** can see all project metadata, experiments, analyses, submissions, etc.
* Users who **can share** can modify the project settings by adding new users who can view and/or share.
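The permission rules above can be summarized as two small predicates. This is an illustrative sketch only: `ProjectSharing`, `can_view`, and `can_share` are hypothetical stand-ins rather than Isabl's data model, and the sketch assumes that users who can share are also allowed to view.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectSharing:
    """Hypothetical model of a project's sharing settings (not Isabl's schema)."""

    owner: str
    is_public: bool = True
    viewers: set = field(default_factory=set)
    sharers: set = field(default_factory=set)


def can_share(user, is_superuser, project):
    # superusers and the owner can always share; others need explicit rights
    return is_superuser or user == project.owner or user in project.sharers


def can_view(user, is_superuser, project):
    # public projects are visible to everyone; sharers are assumed to view too
    return (
        project.is_public
        or can_share(user, is_superuser, project)
        or user in project.viewers
    )
```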

Similarly, a project's **Sharing Settings** can optionally be defined when creating a new project. Projects are always `public` by default (the screenshot shows `private` selected to display the options available in the form when a project is private).

<figure><img src="/files/kIuVahO2fwTY43oWI1sr" alt="" width="375"><figcaption></figcaption></figure>

**What metadata can be private?**

{% hint style="success" %}
In practical terms, the metadata of any model (`Individual`, `Sample`, `Aliquot`, `Submission`, `Analysis`) linked to an `Experiment` that belongs exclusively to private `Projects` will look as if **it doesn't exist in the database** to a user who doesn't have access to any of those private projects.
{% endhint %}

**Example:**

There are 5 samples, each belonging to `Project 1` (🔒 Private) and/or `Project 2` (📖 Public):

<figure><img src="/files/aYGHUZfcodMynVOgHDCV" alt="" width="563"><figcaption></figcaption></figure>

* A user **with access** to the private `Project 1` will see all the `Experiments`, with the following **Individual Tree**:

<figure><img src="/files/SRCbTkbbwBIEP67OVptK" alt="" width="563"><figcaption></figcaption></figure>

* A user **without access** to the private `Project 1` will only see the `Experiments` that belong to the public `Project 2`, with the following **Individual Tree**:

<figure><img src="/files/iHD0ykISY0ohaPfCKB7p" alt="" width="563"><figcaption></figcaption></figure>
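The visibility rule from the hint can be expressed as a filter over experiments. In this hypothetical sketch, each experiment maps to a list of `(project_name, is_private)` pairs, and an experiment is hidden only when it belongs exclusively to private projects the user cannot access. Treating an experiment with no projects as visible is an assumption, and related metadata (individuals, samples, and so on) would follow the experiments that remain.

```python
def visible_experiments(experiments, accessible_private_projects):
    """Return ids of experiments a user can see under the privacy rule.

    `experiments` maps an experiment id to a list of (project_name, is_private)
    pairs. An experiment is hidden only when it belongs exclusively to private
    projects that the user cannot access.
    """
    visible = []
    for experiment_id, projects in experiments.items():
        hidden = bool(projects) and all(
            is_private and name not in accessible_private_projects
            for name, is_private in projects
        )
        if not hidden:
            visible.append(experiment_id)
    return visible
```

With one experiment only in `Project 1`, one in both projects, and one only in `Project 2`, a user without access to `Project 1` sees the last two, matching the Individual Trees above.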


# Isabl Settings

Each component of Isabl can be configured in multiple ways. Settings can be strings, numbers, objects, import strings, and other types.

{% hint style="info" %}
**`Import Strings`** point to an *importable* object, whether it's a class, a function, etc. (e.g. `isabl_cli.data.DataImporter`)
{% endhint %}

## Database Managed Settings

Some Isabl API, Isabl Web, and Isabl CLI settings can be configured from the admin site! To do so, go to `/admin/isabl_api/client/` and update the clients called `default-backend-settings` for Isabl API and `default-frontend-settings` for Isabl Web. You may want to create your own client for Isabl CLI (see [Writing Applications](/writing-applications)).

## Isabl API Settings

Here is a detailed list of available configurations for Isabl API. To configure Isabl API, add a dictionary called `ISABL_SETTINGS` to your Django settings.

| Setting Name                       | Type            | Description                                                                                                                                                                                                                    |
| ---------------------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **EXTRA\_GENDERS**                 | *List*          | `Individual genders` as two-value tuples (value, verbose name).                                                                                                                                                                |
| **EXTRA\_STATUS**                  | *List*          | `Analysis status` as two-value tuples (value, verbose name).                                                                                                                                                                   |
| **EXTRA\_RAW\_DATA\_FORMATS**      | *List*          | Experiment `raw_data` formats as two-value tuples (value, verbose name). Learn how to support extra formats [here](/import-data#supported-data-formats). |
| **EXTRA\_SPECIES**                 | *List*          | Individual `species` as three-value tuples (value, verbose name, system\_id modifier). For example: `[("YEAST", "YEAST", "Y")]` |
| **EXTRA\_SAMPLE\_CATEGORIES**      | *List*          | Sample `categories` as three-value tuples (value, verbose name, system\_id modifier). For example: `[("UNKNOWN", "UNKNOWN", "U")]` |
| **EXTRA\_TECHNIQUE\_METHODS**      | *List*          | Technique `methods` as three-value tuples (value, verbose name, system\_id modifier). For example: `[("SINGLE-CELL RNA", "SINGLE-CELL RNA", "scRNA")]` |
| **API\_V1\_EXTRA\_PATTERNS**       | *Import String* | Import path to a list of [`urlpatterns`](https://docs.djangoproject.com/en/2.2/ref/urls/) to append to API v1. |
| **SYSTEM\_ID\_PREFIX**             | *String*        | A prefix for the default system IDs generated by Isabl (to help differentiate objects across Isabl instances). For example, if set to `DEMO`, then experiments might be named `DEMO_H000002_T01_01_TD01`. |
| **SYSTEM\_ID\_GENERATOR**          | *Import String* | Import path to a function that takes either an `Individual`, `Sample`, `Aliquot`, or `Experiment` and returns a system identifier. This will overwrite the default generation of system ids (e.g. `DEMO_H000002_T01_01_TD01`). |
| **FRONTEND\_URL**                  | *String*        | URL where the frontend is deployed (not required if the frontend runs on the same host as the backend). |
| **JIRA\_SETTINGS**                 | *Dictionary*    | Feature to integrate Jira with Isabl projects (it creates a Jira Epic for each Isabl project and can map each Isabl Group to its own Jira project). [See more](/isabl-settings#integrating-with-jira). |
| **REQUIRE\_ACCOUNT\_APPROVAL**     | *Boolean*       | Disable new accounts and email the admins to request approval before new users can log in. |
| **S3\_BASE\_STORAGE\_DIRECTORIES** | *Dictionary*    | Support loading analyses results from an AWS S3-based `BASE_STORAGE_DIRECTORY`. Learn more [here](/isabl-settings#loading-results-from-a-aws-s3-bucket). |
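For orientation, this is what an `ISABL_SETTINGS` dictionary might look like in the Django settings module of a project that installs Isabl API. The values are illustrative and taken from the examples in the table; the commented-out import string is hypothetical and should point to your own function if you need one.

```python
# settings.py of the Django project that installs isabl-api (illustrative values)
ISABL_SETTINGS = {
    "SYSTEM_ID_PREFIX": "DEMO",
    "EXTRA_SPECIES": [("YEAST", "YEAST", "Y")],
    "EXTRA_SAMPLE_CATEGORIES": [("UNKNOWN", "UNKNOWN", "U")],
    "EXTRA_TECHNIQUE_METHODS": [("SINGLE-CELL RNA", "SINGLE-CELL RNA", "scRNA")],
    "REQUIRE_ACCOUNT_APPROVAL": True,
    # hypothetical import string; uncomment and adapt to use a custom generator
    # "SYSTEM_ID_GENERATOR": "my_project.ids.generate_system_id",
}
```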

Learn more about production deployment:

{% content-ref url="/pages/-LgZjrA075oF5YZLVGsU" %}
[Production Deployment](/production-deployment)
{% endcontent-ref %}

### Loading results from a AWS S3 bucket

If your Isabl CLI [**`BASE_STORAGE_DIRECTORY`**](/isabl-settings#isabl-cli-settings) is located in an S3 bucket, you can enable rendering results through the website using the **`S3_BASE_STORAGE_DIRECTORIES`** setting:

```python
"S3_BASE_STORAGE_DIRECTORIES": {
    "/base/path/in/output-file": {
        "bucket": "an-S3-bucket-name",
        "access_key": "The AWS Access Key",
        "secret_key": "The AWS Secret Key",
        "base_directory": "/base/path/in/s3/object",
        "prefer_original": False
    }
}
```

For example, if an analysis output path is `/mnt/data/analyses/01/01/1/head_job.log` and the S3 path to it is `s3://my-bucket/data/analyses/01/01/1/head_job.log`, you should use this setting:

```python
"S3_BASE_STORAGE_DIRECTORIES": {
    "/mnt/data/": {
        "bucket": "my-bucket",
        "base_directory": "/data/",
        ...
    }
}
```

If `prefer_original` is set to `True` (`False` by default), Isabl will first try to retrieve the original path if it exists (e.g. `/mnt/data/analyses/01/01/1/head_job.log`); otherwise, it will try to resolve the file in S3 (e.g. `s3://my-bucket/data/analyses/01/01/1/head_job.log`).
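The prefix translation described above can be sketched as a small function. This illustrates the mapping under stated assumptions and is not isabl-api's implementation, which also handles credentials and the actual download. `resolve_result_path` is a hypothetical name, and the sketch assumes base paths end with a slash so that `/mnt/data/` does not accidentally match `/mnt/database/`.

```python
import os


def resolve_result_path(path, s3_dirs):
    """Translate a local result path into an s3:// URI using the mapping.

    `s3_dirs` mirrors the shape of S3_BASE_STORAGE_DIRECTORIES. Returns the
    original path when `prefer_original` is set and the file exists locally,
    the translated URI when a prefix matches, and the original path otherwise.
    """
    # try longer prefixes first so nested mappings win over broader ones
    for base in sorted(s3_dirs, key=len, reverse=True):
        if not path.startswith(base):
            continue
        config = s3_dirs[base]
        if config.get("prefer_original", False) and os.path.exists(path):
            return path
        relative = path[len(base):].lstrip("/")
        key = os.path.join(config["base_directory"], relative).lstrip("/")
        return f"s3://{config['bucket']}/{key}"
    return path
```

With the mapping from the example, `/mnt/data/analyses/01/01/1/head_job.log` resolves to `s3://my-bucket/data/analyses/01/01/1/head_job.log`.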

### Integrating with Jira

Isabl provides a simple integration with [Jira](https://www.atlassian.com/software/jira/guides/use-cases/what-is-jira-used-for), a project management tool widely used for development and analysis work. The module creates a *Jira Epic* for each Isabl project and lets you create a simple list of tasks (*Jira issues*) from the web interface.

![Jira Integration Demo](/files/-MRgh89mUx9bNFgMO_sN)

To enable it, add the following settings to both Isabl Web and Isabl API.

**In the Web Settings:**

{% code title="index.html" %}

```javascript
window.$isabl = {
    ...
    jira: true
}
```

{% endcode %}

**In the API Settings:**

Jira lets you [create custom fields](https://confluence.atlassian.com/adminjiraserver/adding-a-custom-field-938847222.html) for Epics and Issues. Create two custom fields to store the Isabl Project ID and the Project Link, and use their IDs in the settings:

```python
"JIRA_SETTINGS": {
    "active": True,
    "default": {
        "URL": "https://my-jira-instance",
        "AUTH": ("jira-username", "secret-jira-password"),
        "PROJECT_KEY": "ISABL",
        "URL_FIELD": "customfield_id_1",
        "PROJECT_FIELD": "customfield_id_2",
    }
},
```

{% hint style="success" %}
**Pro Tip:** *Jira Epics* are created under *Jira Projects*. You can create a different Jira Project for each Isabl Project Group by adding extra dictionaries keyed by Isabl group acronyms.

For example, if a group in Isabl has the name Pathology and the acronym PATH, you can configure your settings as in the example below. All Isabl Projects that belong to the Pathology group will then have their own separate Jira Project under `https://my-jira-instance/projects/PATH/issues/`

```python
"JIRA_SETTINGS": {
    "active": True,
    "default": {
        "URL": "https://my-jira-instance",
        "AUTH": ("jira-username", "secret-jira-password"),
        "PROJECT_KEY": "ISABL",
        "URL_FIELD": "customfield_id_1",
        "PROJECT_FIELD": "customfield_id_2",
    },
    "PATH": {
        "URL": "https://my-jira-instance",
        "AUTH": ("jira-username", "secret-jira-password"),
        "PROJECT_KEY": "PATH",
        "URL_FIELD": "customfield_id_1",
        "PROJECT_FIELD": "customfield_id_2",
    }
},
```

{% endhint %}
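The per-group mapping presumably falls back to the `default` entry when a group acronym has no dedicated configuration. The hypothetical helper below sketches that lookup; it is an assumption about the behavior, not isabl-api's code.

```python
def jira_settings_for_group(jira_settings, group_acronym):
    """Return the Jira config for an Isabl group acronym, else the default.

    Returns None when the integration is turned off via "active": False.
    """
    if not jira_settings.get("active", False):
        return None
    config = jira_settings.get(group_acronym)
    # ignore non-dict entries such as the boolean "active" flag
    if not isinstance(config, dict):
        config = jira_settings["default"]
    return config
```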

## Isabl CLI Settings

Before anything, you must set the following environment variables:

```bash
# let Isabl CLI know where the API is being hosted
export ISABL_API_URL='https://your-isabl-instance.org/api/v1/'

# you can create a client object from the Django admin
export ISABL_CLIENT_ID=<insert-client-primary-key-or-slug>
```

Once this is set, you can configure your client object settings with the following configurations (see [**`isabl_cli.settings`**](https://github.com/isabl-io/cli/blob/master/isabl_cli/settings.py#L20) to check default settings):

<table data-header-hidden><thead><tr><th width="297.71060011217054">Setting Name</th><th width="194.33333333333331">Type</th><th>Description</th></tr></thead><tbody><tr><td>Setting Name</td><td>Type</td><td>Description</td></tr><tr><td><strong>API_BASE_URL</strong></td><td><em>String</em></td><td>URL host where your Isabl API is running. Needed to connect the CLI with the API.</td></tr><tr><td><strong>BASE_STORAGE_DIRECTORY</strong></td><td><em>String</em></td><td>Directory where data will be stored in the file system.</td></tr><tr><td><strong>TIME_ZONE</strong></td><td><em>String</em></td><td>Current timezone used by the <code>pytz</code> package.</td></tr><tr><td><strong>INSTALLED_APPLICATIONS</strong></td><td><em>Import Strings Array</em></td><td>Array of registered applications to be run with the CLI tool. These apps can be listed with <code>isabl apps --help</code>.</td></tr><tr><td><strong>CUSTOM_COMMANDS</strong></td><td><em>Import Strings Array</em></td><td>Array of custom commands to be added to the CLI tool.</td></tr><tr><td><strong>SYSTEM_COMMANDS</strong></td><td><em>Import Strings Array</em></td><td>Array of system commands that are used in isabl-cli.</td></tr><tr><td><strong>ADMIN_COMMANDS</strong></td><td><em>Import Strings Array</em></td><td>Array of commands that are only executable by the admin user.</td></tr><tr><td><strong>ADMIN_USER</strong></td><td><em>String</em></td><td>Linux user to which admin operations will be limited. This user will own the data and analyses results. Nevertheless, Isabl Applications can be run by multiple Unix users. Learn more about it <a href="/pages/-LgZjlyvKfTRM6B0HnMl#applications-run-by-multiple-users">here</a>.</td></tr><tr><td><strong>CREATE_SYMLINKS</strong></td><td><em>Boolean</em></td><td>By default, Isabl creates a <a href="/pages/-LgZWw5hvqD6PzJLeGcu">symlink farm</a> upon data import and analyses completion. 
You can turn it off with this setting.</td></tr><tr><td><strong>DEFAULT_LINUX_GROUP</strong></td><td><em>String</em></td><td>Linux group used by the admin user for data.</td></tr><tr><td><strong>MAKE_STORAGE_DIRECTORY</strong></td><td><em>Import String</em></td><td>Get and create the path to a data directory.</td></tr><tr><td><strong>TRASH_ANALYSIS_STORAGE</strong></td><td><em>Import String</em></td><td>Move an analysis <code>storage_url</code> to a trash directory.</td></tr><tr><td><strong>REFERENCE_DATA_IMPORTER</strong></td><td><em>Import String</em></td><td>Register input_bed_path in technique's storage dir and update data.</td></tr><tr><td><strong>DATA_IMPORTER</strong></td><td><em>Import String</em></td><td>Import raw data for multiple experiments.</td></tr><tr><td><strong>BED_IMPORTER</strong></td><td><em>Import String</em></td><td>Register input_bed_path in technique's storage dir and update data.</td></tr><tr><td><strong>ON_DATA_IMPORT</strong></td><td><em>Import Strings Array</em></td><td>Methods triggered when data has been imported successfully.</td></tr><tr><td><strong>ON_STATUS_CHANGE</strong></td><td><em>Import Strings Array</em></td><td>Methods triggered when an analysis changes status.</td></tr><tr><td><strong>ON_SIGNAL_FAILURE</strong></td><td><em>Import Strings Array</em></td><td>Methods triggered when a signal fails.</td></tr><tr><td><strong>SUBMIT_ANALYSES</strong></td><td><em>Import String</em></td><td>A function that takes a list of tuples (analysis, path to analysis script) and submits them to a given compute architecture.</td></tr><tr><td><strong>SUBMIT_CONFIGURATION</strong></td><td><em>Dictionary</em></td><td>A schema-less dictionary to set configurations that the <strong><code>SUBMIT_ANALYSES</code></strong> function may use. See Batch Systems.</td></tr><tr><td><strong>EXTRA_RAW_DATA_FORMATS</strong></td><td><em>Tuples Array</em></td><td>For every newly supported format, a tuple of <code>(regex validator, file_data_type)</code>. For example, 
to support the Illumina ORA format: <code>["\\.ora?$", "ORA"]</code></td></tr></tbody></table>
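As a rough illustration of the **`SUBMIT_ANALYSES`** contract described in the table, here is a hypothetical function that receives `(analysis, script_path)` tuples and runs each script locally with `bash`, one after another. The hooks shipped with Isabl CLI (such as `isabl_cli.batch_systems.submit_lsf`, used in the production example) may take different arguments, so treat this signature as an assumption.

```python
import subprocess


def submit_locally(analyses_and_scripts):
    """Run each (analysis, script_path) pair with bash, sequentially.

    Hypothetical stand-in for a SUBMIT_ANALYSES hook. Returns a list of
    (analysis, exit_code) pairs; a non-zero code means the script failed.
    """
    results = []
    for analysis, script_path in analyses_and_scripts:
        # check=False: record failures instead of raising on the first one
        completed = subprocess.run(["bash", script_path], check=False)
        results.append((analysis, completed.returncode))
    return results
```

A cluster-oriented hook would replace `subprocess.run` with a call to the scheduler's submission command and could read its options from **`SUBMIT_CONFIGURATION`**.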

{% content-ref url="/pages/-LgZjlyvKfTRM6B0HnMl" %}
[Writing Applications](/writing-applications)
{% endcontent-ref %}

#### Submit Configuration

## Isabl Web Settings

Customization of the User Interface can be achieved by defining a global `$isabl` settings dictionary in the main `index.html`.

```markup
<script>
    window.$isabl = { }
</script>
```

You can configure the `window.$isabl` with the following parameters:

<table data-header-hidden><thead><tr><th width="213.43835616438355">Setting Name</th><th width="150">Type</th><th width="150">Default</th><th>Description</th></tr></thead><tbody><tr><td>Setting Name</td><td>Type</td><td>Default</td><td>Description</td></tr><tr><td><strong><code>apiHost</code></strong></td><td><code>String</code></td><td><code>''</code></td><td>URL host where your <code>isabl api</code> is running.</td></tr><tr><td><strong><code>name</code></strong></td><td><code>String</code></td><td><code>'isabl'</code></td><td>Custom app title that is shown at the top of the app.</td></tr><tr><td><strong><code>logo</code></strong></td><td><code>String</code></td><td></td><td>Custom image path that is shown as the main logo of the app. Use a square image for better display.</td></tr><tr><td><strong><code>jira</code></strong></td><td><code>Boolean</code></td><td><code>false</code></td><td>Activates the Jira card in the projects view. If the Jira endpoint is available from the API, it will show the current tickets for each project. Learn more about <a href="https://developer.atlassian.com/server/jira/platform/rest-apis/">Jira integration</a>.</td></tr><tr><td><strong><code>oncoTree</code></strong></td><td><code>Boolean</code></td><td><code>false</code></td><td>If enabled, more information about the <a href="http://oncotree.mskcc.org/#/home">OncoTree</a> disease is added to the <em>Sample details</em>.</td></tr><tr><td><strong><code>customFields</code></strong></td><td><code>Object</code></td><td><code>{}</code></td><td>Every detail field shown for each model in the frontend can be customized by overwriting the corresponding key of a fields object. Available fields can be seen <a href="https://github.com/isabl-io/web/blob/master/src/utils/fields.js">here</a>. 
Learn more on how to create custom fields (<strong>TODO</strong>).</td></tr><tr><td><strong><code>restartButton</code></strong></td><td><code>Boolean</code></td><td><code>false</code></td><td>If enabled, <code>FAILED</code> analyses can send a restart (or forced restart) signal to the database. Learn more in <a href="/pages/-LgZjctcl_EQ4x-ZvNi9">signals</a>.</td></tr><tr><td><strong><code>lightTheme</code></strong></td><td><code>Object</code></td><td>See <a href="https://github.com/isabl-io/web/blob/master/src/utils/settings.js#L8">defaults</a></td><td>Colors to customize the light theme.</td></tr><tr><td><strong><code>darkTheme</code></strong></td><td><code>Object</code></td><td>See <a href="https://github.com/isabl-io/web/blob/master/src/utils/settings.js#L20">defaults</a></td><td>Colors to customize the dark theme.</td></tr><tr><td><strong><code>modelIcons</code></strong></td><td><code>Object</code></td><td><code>{}</code></td><td><a href="https://material.io/resources/icons/?style=baseline">Material Icon</a> names to customize panel icons.</td></tr><tr><td><strong><code>distinctRecords</code></strong></td><td><code>Boolean</code></td><td><code>false</code></td><td>By default, API queries don't remove duplicates because doing so impacts speed. Set <code>true</code> to avoid duplicate records.</td></tr><tr><td><strong><code>treeIcons</code></strong></td><td><code>Object</code></td><td><code>{}</code></td><td><p>Please use these if the graph icons are broken:</p><pre><code>"treeIcons": {
    "male": "https://isabl-web.s3.us-east-1.amazonaws.com/male.svg",
    "female": "https://isabl-web.s3.us-east-1.amazonaws.com/female.svg",
    "normal": "https://isabl-web.s3.us-east-1.amazonaws.com/normal.svg",
    "tumor": "https://isabl-web.s3.us-east-1.amazonaws.com/tumor.png",
    "aliquot": "https://isabl-web.s3.us-east-1.amazonaws.com/normal.svg",
    "dna": "https://isabl-web.s3.us-east-1.amazonaws.com/dna.svg",
    "rna": "https://isabl-web.s3.us-east-1.amazonaws.com/rna.svg",
    "mouse": "https://isabl-web.s3.us-east-1.amazonaws.com/mouse.svg",
    "image": "https://isabl-web.s3.us-east-1.amazonaws.com/image.svg",
    "alien": "https://isabl-web.s3.us-east-1.amazonaws.com/unknown.svg",
  }
</code></pre></td></tr></tbody></table>

### Example of a custom UI configuration

```markup
<script>
    const customFields = {
        // overwrite 2 of the fields of the analysis panel
        analysisFields: {
            // Add a new formatter for status
            status: {
                section: 'Analysis Details',
                verboseName: 'Status',
                field: 'status',
                formatter: value => {
                    if (value === 'SUCCEEDED') {
                        return 'DONE!'
                    } else if (value === 'FAILED') {
                        return 'OOPS...'
                    } else {
                        return value
                    }
                }
            },
            // Make notes not editable
            notes: {
                section: 'Analysis Details',
                verboseName: 'Notes',
                field: 'notes',
                editable: false
            }
        },
        // display a custom project field that already exists in the db.
        // Use http://<api-host>/admin/ to create custom fields.
        projectFields: {
            // custom field
            irbConsent: {
                section: 'Project Details',
                verboseName: 'IRB Consent',
                field: 'custom_fields.irb_consent',
                editable: true,
                apiPermission: 'change_project'
            }
        }
    }

    window.$isabl = {
        apiHost: "http://my.isabl.api.host",
        name: "My Cool App",
        logo: "/path/to/my/awesome/logo",
        customFields,
        restartButton: true,
        jira: true,
        oncoTree: true,
        darkTheme: {
          primary: '#1dc5a7',
          background: '#1a202c',
          surface: '#3a4556',
          accent: '#4a5568'
        },
        modelIcons: {
          project: 'insert_chart',
          analysis: 'bubble_chart',
          individual: 'person',
          sample: 'gesture',
          experiment: 'flash_on',
          search: 'search',
          submission: 'assignment'
        },
    }
</script>
```

{% hint style="warning" %}
**Important:** If you're running your frontend instance on a different host than the backend, you should add this `ENV` variable to the Django project where `isabl-api` is running:

```bash
export FRONTEND_URL="http://your-frontend-host.com"
```

{% endhint %}

{% hint style="success" %}
If you want to serve your own HTML outside of `isabl-api`, you can consume the `isabl-web` frontend like this:

{% code title="index.html" %}

```markup
<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <meta name="viewport" content="width=device-width,initial-scale=1">
        <meta charset="UTF-8">
        <link href='https://fonts.googleapis.com/css?family=Roboto:300,400,500,700|Material+Icons' rel="stylesheet" type="text/css">
        <link rel="icon" href="favicon.ico">
        <title>My Awesome App</title>
    </head>
    <body>

        <div id="isabl-web"></div>
        <script src="https://cdn.jsdelivr.net/npm/vue"></script>
        <script src="https://cdn.jsdelivr.net/npm/isabl-web"></script>
        <script>
            window.$isabl = {
                apiHost: "http://my.isabl.api.host",
                //... other settings
            }
        </script>
    </body>
</html>
```

{% endcode %}
{% endhint %}


# Production Deployment

## Isabl API in Production

[Install `isabl-api` on premises](/production-deployment#isabl-api-on-premise) as a third-party application in a Django project.

## Isabl CLI in Production

In your production environment, you can install [Isabl CLI](https://github.com/papaemmelab/isabl_cli) with:

```bash
# install isabl-cli from Github
pip install git+https://github.com/papaemmelab/isabl_cli#egg=isabl-cli

# or clone locally and install as editable
git clone https://github.com/papaemmelab/isabl_cli <your-dir>/isabl_cli
pip install -e <your-dir>/isabl_cli
```

```bash
# let the client know what API should be used
export ISABL_API_URL="https://isabl.mskcc.org/api/v1/"

# set client id, you can create a new client in the admin site
export ISABL_CLIENT_ID="<replace with client primary key>"

# isabl should be now available
isabl --help
```

For example, if `ISABL_CLIENT_ID=1`, you can update the settings field at [https://my.isabl/admin/isabl\_api/client/1/change/](https://isabl.mskcc.org/admin/isabl_api/client/1/change/). An example of such a configuration could be:

```javascript
{
  "ADMIN_USER": "isablbot",
  "DEFAULT_LINUX_GROUP": "isabl",
  "BASE_STORAGE_DIRECTORY": "/isabl/data",
  "SUBMIT_ANALYSES": "isabl_cli.batch_systems.submit_lsf",
  "ON_DATA_IMPORT": ["isabl_apps.signals.signal_data_import"],
  "CUSTOM_COMMANDS": ["isabl_apps.cli.one_click_genome"],
  "ON_STATUS_CHANGE": ["isabl_apps.signals.signal_apps_automation"],
  "INSTALLED_APPLICATIONS": ["isabl_apps.apps.BwaMemGRCh37"]
}
```
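Several of these settings (`SUBMIT_ANALYSES`, `ON_DATA_IMPORT`, `CUSTOM_COMMANDS`, `ON_STATUS_CHANGE`, `INSTALLED_APPLICATIONS`) hold dotted import strings, so a typo only surfaces when the CLI tries to load them. The sketch below checks that each string resolves in the current environment before you save it in the admin. It assumes strings of the form `package.module.attribute` resolved with `importlib`; `resolve_import_string` and `check_import_strings` are hypothetical helpers, not part of Isabl CLI, whose own loading logic may differ.

```python
import importlib

# Settings that hold import strings (a single string or a list of strings).
IMPORT_STRING_SETTINGS = (
    "SUBMIT_ANALYSES",
    "ON_DATA_IMPORT",
    "CUSTOM_COMMANDS",
    "ON_STATUS_CHANGE",
    "INSTALLED_APPLICATIONS",
)


def resolve_import_string(path):
    """Resolve 'package.module.attribute' to the attribute object (hypothetical helper)."""
    module_name, _, attribute = path.rpartition(".")
    if not module_name:
        raise ImportError(f"{path!r} is not a dotted import string")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def check_import_strings(settings):
    """Return {import_string: error_message} for every string that fails to resolve."""
    errors = {}
    for key in IMPORT_STRING_SETTINGS:
        values = settings.get(key, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            try:
                resolve_import_string(value)
            except (ImportError, AttributeError) as error:
                errors[value] = str(error)
    return errors
```

Run `check_import_strings` with the same dictionary you paste into the admin, from an environment where your extension packages are installed; an empty result means every import string resolved.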

This is what the admin website looks like when editing Isabl CLI settings:

![Editing Isabl CLI settings from the Admin.](/files/-Lsx30RcC9wLXYUyzGdz)

### Multiuser Setup

Isabl CLI can be used by multiple users. By default, any user can import data, and result files are owned by whoever triggered the application. These capabilities can be restricted to an [**`ADMIN_USER`**](/isabl-settings#isabl-cli-settings). In this setup, data and results are owned by the `ADMIN_USER`, yet [applications can be triggered by any user](/writing-applications#applications-run-by-multiple-users).

{% hint style="info" %}
An [**`ADMIN_USER`**](/isabl-settings#isabl-cli-settings) is a shared Unix account that can be accessed by one or more engineers. These engineers are responsible for the data and results of Isabl installations.
{% endhint %}

First, you need to assign the right API permissions to your users. To facilitate this, Isabl comes with the following command:

```bash
# from the django project directory run
python manage.py create_default_groups

# if you are using docker compose
docker-compose -f production.yml run --rm django python manage.py create_default_groups
```

This command will create the following three Django groups:

| Group name    | Description                            | Permissions to                                                                                                                  |
| ------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| **Managers**  | Individuals who register samples.      | CustomField, Individual, Center, Disease, Experiment, Technique, Platform, Project, Submission, Analysis                        |
| **Analysts**  | Individuals who run analyses.          | CustomField, Application, Analysis, Assembly                                                                                    |
| **Engineers** | A combination of Managers and Analysts | CustomField, Individual, Center, Disease, Experiment, Technique, Platform, Project, Submission, Analysis, Application, Assembly |
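The Engineers row is described as a combination of Managers and Analysts. Transcribing the table into Python sets makes that claim easy to verify and shows which models each group adds. This is only a worked check; the model names are copied from the table, not read from a running Isabl instance.

```python
# Model permissions per group, transcribed from the table above.
MANAGERS = {"CustomField", "Individual", "Center", "Disease", "Experiment",
            "Technique", "Platform", "Project", "Submission", "Analysis"}
ANALYSTS = {"CustomField", "Application", "Analysis", "Assembly"}
ENGINEERS = {"CustomField", "Individual", "Center", "Disease", "Experiment",
             "Technique", "Platform", "Project", "Submission", "Analysis",
             "Application", "Assembly"}

# Engineers hold exactly the union of Managers and Analysts.
assert ENGINEERS == MANAGERS | ANALYSTS

# Models that Analysts can manage but Managers cannot.
print(sorted(ANALYSTS - MANAGERS))  # ['Application', 'Assembly']
```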

Then you will need to configure the `ADMIN_USER` and the `DEFAULT_LINUX_GROUP` in the Isabl CLI *client object* (you can do so by editing the client whose primary key is `ISABL_CLIENT_ID` in the Django admin website). For example:

```javascript
{
  "ADMIN_USER": "isablbot",
  "DEFAULT_LINUX_GROUP": "isabl",
  ...
}
```

Once you follow the [writing applications guide](/writing-applications), you will see that Isabl applications can be managed as a Python package. If multiple users trigger applications, you may want all of them to point to the same package. You can do this either with the `PYTHONPATH` environment variable or by pip installing your apps repo locally:

```bash
# using an environment variable
export PYTHONPATH=/path/to/my/isabl/apps

# alternatively you can have other users pip install the repo
pip install --editable /path/to/my/isabl/apps

# you may need to update the .eggs directory permissions
chmod -R g+rwX /path/to/my/isabl/apps/.eggs
```
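To confirm that every user resolves the same copy of your apps, each user can run a check like the following from their own shell; the printed path should be identical for everyone. This is a sketch using the standard library's `importlib.util.find_spec`; `package_location` is a hypothetical helper and `my_isabl_apps` is a placeholder for your apps package name.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package is imported from, or None if not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # "my_isabl_apps" is a placeholder; replace it with your apps package name.
    print(package_location("my_isabl_apps"))
```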

Learn more about [Writing Applications](/writing-applications):

{% content-ref url="/pages/-LgZjlyvKfTRM6B0HnMl" %}
[Writing Applications](/writing-applications)
{% endcontent-ref %}

Learn more about [Isabl CLI settings](/isabl-settings#isabl-cli-settings):

{% content-ref url="/pages/-LgzI4B7ORXzpBOP-j8S" %}
[Isabl Settings](/isabl-settings)
{% endcontent-ref %}

Learn more about [Retrieving Data](/retrieve-data) using `isabl-cli` to fetch data:

{% content-ref url="/pages/-LgZjZpKw4RyA4xMgfnc" %}
[Retrieving Data](/retrieve-data)
{% endcontent-ref %}

{% hint style="success" %}
**Pro tip:** use the `Can Download Results` permission to configure which users can download analysis results in your Isabl instance.
{% endhint %}

### Initialize Data Lake

As the admin user, run the following snippet in the [**`BASE_STORAGE_DIRECTORY`**](/isabl-settings#isabl-cli-settings):

```bash
# 00 01 02 ... 99
DIRS=$(seq -w 0 99)
BASE="analyses experiments"

# go to your data lake base directory (see: BASE_STORAGE_DIRECTORY)
cd /path/to/my/data/lake

for i in $BASE; do
    for j in $DIRS; do
        for k in $DIRS; do
            DIR="$i/$j/$k"
            mkdir -p "$DIR"
            chmod u+wrX,g+wrX "$DIR"
        done
        # remove group write permission from the intermediate directory
        chmod g-w "$i/$j/"
    done
done
```
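If you prefer to initialize the data lake from Python (for example, from a provisioning script), the following sketch mirrors the shell loop above using only the standard library. `init_data_lake` is a hypothetical helper, not part of Isabl CLI; run it as the admin user with `root` set to your `BASE_STORAGE_DIRECTORY`.

```python
import os
import stat
from pathlib import Path


def init_data_lake(root, bases=("analyses", "experiments"), width=100):
    """Create root/<base>/<NN>/<NN> directories, mirroring the shell loop above."""
    names = [f"{i:02d}" for i in range(width)]
    rwx_user_group = stat.S_IRWXU | stat.S_IRWXG  # u+rwx,g+rwx
    for base in bases:
        for outer in names:
            outer_dir = Path(root) / base / outer
            for inner in names:
                leaf = outer_dir / inner
                leaf.mkdir(parents=True, exist_ok=True)
                # add user/group read, write and execute on top of the current mode
                os.chmod(leaf, stat.S_IMODE(leaf.stat().st_mode) | rwx_user_group)
            # like `chmod g-w "$i/$j/"`: drop group write on the intermediate directory
            mode = stat.S_IMODE(outer_dir.stat().st_mode)
            os.chmod(outer_dir, mode & ~stat.S_IWGRP)
```

For example, `init_data_lake("/path/to/my/data/lake")` creates the same 2 × 100 × 100 directory tree as the shell version.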

## Isabl API on Premise

You can bootstrap a new Django project using [Cookiecutter API](https://github.com/isabl-io/cookiecutter):

```bash
# install cookiecutter
pip install cookiecutter 

# then bootstrap the project
cookiecutter https://github.com/papaemmelab/cookiecutter-api
```

#### Cookiecutter API Features

* [Isabl](https://isabl-io.github.io/docs/#/) out of the box
* For Django 2.0 & Python 3.6
* Renders a Django project with 100% starting test coverage
* [12-Factor](http://12factor.net/) based settings via [django-environ](https://github.com/joke2k/django-environ)
* Secure by default with SSL.
* Optimized development and production settings
* Registration via [django-allauth](https://github.com/pennersr/django-allauth)
* Send emails via [Anymail](https://github.com/anymail/django-anymail) (using [Mailgun](http://www.mailgun.com/) by default, but switchable)
* Media storage using Amazon S3
* [Docker-compose](https://github.com/docker/compose) for development and production (using [Caddy](https://caddyserver.com/) with [LetsEncrypt](https://letsencrypt.org/) support)
* [Procfile](https://devcenter.heroku.com/articles/procfile) for deploying to Heroku
* Run tests with `py.test`
* Customizable PostgreSQL version
* [Celery](http://www.celeryproject.org/) with [Flower](https://github.com/mher/flower)
* **optional** - Serve static files from Amazon S3 or [Whitenoise](https://whitenoise.readthedocs.io/)
* **optional** - Integration with [MailHog](https://github.com/mailhog/MailHog) for local email testing

#### Cookiecutter API Constraints

* Only maintained 3rd party libraries are used.
* Uses PostgreSQL everywhere (9.2+)
* Environment variables for configuration (This won't work with Apache/mod\_wsgi except on AWS ELB).

{% hint style="info" %}
[Isabl Cookiecutter](https://github.com/isabl-io/cookiecutter) is a proud fork of [cookiecutter-django](https://github.com/pydanny/cookiecutter-django), please note that most of their [documentation](https://cookiecutter-django.readthedocs.io/en/latest/) remains relevant! Also see [troubleshooting](https://cookiecutter-django.readthedocs.io/en/latest/troubleshooting.html). For reference, we forked out at commit [4258ba9](https://github.com/pydanny/cookiecutter-django/commit/4258ba9e2ddc822953e326f98f1f74842fa0fed1). If you have differences in your preferred setup, please fork [Isabl Cookiecutter](https://github.com/isabl-io/cookiecutter) to create your own version. **New to Django?** [Two Scoops of Django](http://twoscoopspress.com/products/two-scoops-of-django-1-11) is the best dessert-themed Django reference in the universe!
{% endhint %}

### Understanding the Docker Compose Setup

Before you begin, check out the `production.yml` file in the root of this project. Keep note of how it provides configuration for the following services:

* `django`: your application running behind `Gunicorn`;
* `postgres`: PostgreSQL database with the application's relational data;
* `redis`: Redis instance for caching;
* `caddy`: Caddy web server with HTTPS on by default.

Provided you have opted for Celery (via setting `use_celery` to `y`), there are three more services:

* `celeryworker` running a Celery worker process;
* `celerybeat` running a Celery beat process;
* `flower` running [Flower](https://github.com/mher/flower) (for more info, check out [CeleryFlower](https://cookiecutter-django.readthedocs.io/en/latest/developing-locally-docker.html#celeryflower) instructions for local environment).

{% hint style="info" %}
Check the original `cookiecutter-django` [deployment documentation](https://cookiecutter-django.readthedocs.io/en/latest/deployment-with-docker.html) to learn about AWS deployment, Supervisor examples, Sentry configuration, and more. If you are deploying on an **intranet**, please see the [HTTPS is On by Default](#https-is-on-by-default) section.
{% endhint %}

### Configuring the Stack

Most of the services above are configured through environment variables. See [envs](https://cookiecutter-django.readthedocs.io/en/latest/developing-locally-docker.html#envs) for the details.

You will probably also need to set up the mail backend, for example by adding a [Mailgun](https://mailgun.com) API key and a Mailgun sender domain. Otherwise, the account creation view will crash with a 500 error when the backend attempts to send an email to the account owner.

### HTTPS is On by Default

The Caddy web server used in the default configuration will get you a valid certificate from Let's Encrypt and renew it automatically. All you need to do to enable this is make sure that your DNS records point to the server Caddy runs on. You can read more about this at [Automatic HTTPS](https://caddyserver.com/docs/automatic-https) in the Caddy docs. Please note:

* If you are not using a subdomain of the domain name set in the project, then remember to put your staging/production IP address in the `DJANGO_ALLOWED_HOSTS` environment variable (see [settings](https://cookiecutter-django.readthedocs.io/en/latest/settings.html#settings)) before you deploy your website. Failure to do this means you will not have access to your website through the HTTP protocol.
* Access to the Django admin is set up by default to require HTTPS in production or once *live*.
* **⚠️ Attention!** If you are running your application on an intranet, you may want to use the [tls](https://caddyserver.com/docs/tls) Caddy directive. Make sure that the `DOMAIN_NAME` configuration has the `https://` scheme prepended in the Caddy environment file `.envs/.production/.caddy` (see this [ticket](https://github.com/mholt/caddy/issues/1673) to learn more). Then include the following configuration in `compose/production/caddy/Caddyfile` to use a self-signed certificate:

  ```
    tls self_signed
  ```

  Alternatively, if you have a local certificate and key provided by your institution, you will need to copy them into the Caddy image via `compose/production/caddy/Dockerfile` and use:

  ```
    tls /path/to/cert /path/to/key
  ```

### Postgres Data Volume Modifications

**Optional** | Postgres is saving its database files to the `production_postgres_data` volume by default. Change that if you want something else and make sure to make [backups](https://cookiecutter-django.readthedocs.io/en/latest/docker-postgres-backups.html) since this is not done automatically.

### Building & Running Production Stack

You will need to build the stack first. To do that, run:

```bash
docker-compose -f production.yml build
```

Once this is ready, you can run it with:

```bash
docker-compose -f production.yml up
```

To run the stack and detach the containers, run:

```bash
docker-compose -f production.yml up -d
```

To run a migration, open up a second terminal and run:

```bash
docker-compose -f production.yml run --rm django python manage.py migrate
```

To create a superuser, run:

```bash
docker-compose -f production.yml run --rm django python manage.py createsuperuser
```

If you need a shell, run:

```bash
docker-compose -f production.yml run --rm django python manage.py shell
```

To check the logs out, run:

```bash
docker-compose -f production.yml logs
```

If you want to scale your application, run:

```bash
docker-compose -f production.yml scale django=4
docker-compose -f production.yml scale celeryworker=2
```

{% hint style="danger" %}
**Warning!** Don't try to scale `postgres`, `celerybeat`, or `caddy`.
{% endhint %}

To see how your containers are doing run:

```bash
docker-compose -f production.yml ps
```

### Mounting a Remote Data Directory

It's likely that the data resides on a different server than the web application. To make results available to the web server, you may want to consider `sshfs`:

```bash
sshfs \
    -o nonempty \
    -o follow_symlinks \
    -o IdentityFile=/path/to/id_rsa \
    -o allow_other \
    $USER@<remote-server>:/remote/path /remote/path
```

{% hint style="info" %}
Note that we are mounting `/remote/path` to `/remote/path` so that the paths pushed by **Isabl CLI** match those available in the web server. Also note that you may need to restart the docker compose services after mounting this directory.
{% endhint %}
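Because the web server only sees results once the mount is in place, a deployment script can verify it before (re)starting the Docker Compose services. A minimal sketch using the standard library; `check_mount` is a hypothetical helper and `/remote/path` stands for your mounted directory.

```python
import os


def check_mount(path):
    """Raise if `path` is not a mount point; otherwise confirm it can be listed."""
    if not os.path.ismount(path):
        raise RuntimeError(f"{path} is not mounted; run sshfs before starting the stack")
    # listing the directory confirms the mount is readable by the current user
    os.listdir(path)
    return True
```

For example, call `check_mount("/remote/path")` right before `docker-compose -f production.yml up -d`.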


# Maintenance

🧼 Some utilities and good practices to keep your isabl instance data safe

## Database backups

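The [cookiecutter-api](https://github.com/papaemmelab/cookiecutter-api) comes with some handy utilities to manage Postgres backups, which you can inspect in your project code directory (`compose/production/postgres/maintenance`). In the commands below, use `exec` if the `postgres` container is already running, or `run --rm` to start a one-off container: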

**1. To create a backup:**

```bash
docker-compose -f <environment>.yml (exec |run --rm) postgres backup
```

**2. To list/view current backups:**

```bash
docker-compose -f <environment>.yml (exec |run --rm) postgres backups
```

**3. To restore a backup:**

```bash
docker-compose -f <environment>.yml (exec |run --rm) postgres restore
```

{% hint style="warning" %}
It's a good practice to run your `backup` script regularly, for example by adding it to a cron job that runs periodically (e.g. every week or every month).
{% endhint %}

## Backup to Amazon S3

After this [commit](https://github.com/papaemmelab/cookiecutter-api/commit/c379dab5344491bceaebaa74257352edaec83f70) in `cookiecutter-api`, you can store backups in the cloud in Amazon S3.

You need to add these 3 environment variables to your `.envs/.production/.django` file:

```bash
export DJANGO_AWS_ACCESS_KEY_ID=
export DJANGO_AWS_SECRET_ACCESS_KEY=
export DJANGO_AWS_STORAGE_BUCKET_NAME=
```

**4. Upload recursively all the database backups to your AWS S3 bucket**

```bash
docker-compose -f production.yml run --rm awscli upload
```

**5. Download a specific backup to your instance docker volume**

```bash
docker-compose -f production.yml run --rm awscli download backup_2018_03_13T09_05_07.sql.gz
```
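When restoring, you usually want the newest file. The following sketch assumes backup names follow the pattern of the example above (`backup_YYYY_MM_DDTHH_MM_SS.sql.gz`); `latest_backup` is a hypothetical helper that picks the most recent one from a list of filenames and ignores anything that does not match.

```python
from datetime import datetime

# Pattern inferred from names like backup_2018_03_13T09_05_07.sql.gz
BACKUP_FORMAT = "backup_%Y_%m_%dT%H_%M_%S.sql.gz"


def latest_backup(filenames):
    """Return the most recent backup filename, or None if none match the pattern."""
    dated = []
    for name in filenames:
        try:
            dated.append((datetime.strptime(name, BACKUP_FORMAT), name))
        except ValueError:
            continue  # not a file produced by the `backup` script
    if not dated:
        return None
    return max(dated)[1]
```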

Learn more in the [cookiecutter-django docs](https://cookiecutter-django.readthedocs.io/en/latest/docker-postgres-backups.html#backup-to-amazon-s3).


# Batch Systems

🪵 How to use different known batch systems for scalable job execution.

Isabl currently supports the following **Batch Systems** out of the box to submit jobs with `isabl_cli`:

<table><thead><tr><th width="156">Batch System</th><th width="219">Import string</th><th>Resources</th></tr></thead><tbody><tr><td><strong>LSF</strong></td><td><code>isabl_cli.batch_systems.submit_lsf</code></td><td>See the <a href="https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=started-quick-start-guide">IBM Spectrum LSF Reference</a></td></tr><tr><td><strong>Slurm</strong></td><td><code>isabl_cli.batch_systems.submit_slurm</code></td><td>See the <a href="https://slurm.schedmd.com/documentation.html">Slurm - Workload Manager</a></td></tr><tr><td><strong>SGE</strong></td><td><code>isabl_cli.batch_systems.submit_sge</code></td><td>See the <a href="http://star.mit.edu/cluster/docs/0.93.3/guides/sge.html">Sun Grid Engine - queueing system</a></td></tr></tbody></table>

By default, all submissions are run **locally** using `isabl_cli.batch_systems.submit_local`.

{% hint style="success" %}
**What about other systems?**

To support other systems or to customize your own submission steps, you just need to write your own submit method and set its import string as the `SUBMIT_ANALYSES` setting of `isabl_cli`.

**Example:**

```python
# my_isabl_extension/batch_systems.py
def submit_aws_batch(app, command_tuples):
    ...
```

```json
// Isabl CLI settings:
{..., "SUBMIT_ANALYSES": "my_isabl_extension.batch_systems.submit_aws_batch"}
```

For help creating your custom submit method, use the [existing ones](https://github.com/papaemmelab/isabl_cli/tree/master/isabl_cli/batch_systems) as a reference.

Please consider contributing any new one to the **Isabl Project!** :love\_letter:
{% endhint %}
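A custom submit method can start as small as the following sketch. It is hypothetical: it assumes `command_tuples` is a sequence of `(analysis, command_path)` pairs where each command is a shell script, so compare it with the existing submit methods in `isabl_cli/batch_systems` before relying on that shape. `SUBMIT_PREFIX` is a placeholder for your scheduler's submission command; `["bash"]` simply runs each script on the current host.

```python
import shlex
import subprocess

# Hypothetical: the command-line prefix of the scheduler you want to support,
# e.g. ["my-scheduler", "submit"]. ["bash"] just runs each script locally.
SUBMIT_PREFIX = ["bash"]


def submit_custom(app, command_tuples, prefix=None):
    """Hand each analysis command to a scheduler and return the exit codes.

    Assumes `command_tuples` yields (analysis, command_path) pairs, where
    `command_path` is a shell script. `app` is kept to match the signature
    of the built-in submit methods but is not used here.
    """
    prefix = SUBMIT_PREFIX if prefix is None else prefix
    exit_codes = []
    for analysis, command_path in command_tuples:
        argv = prefix + [command_path]
        print("Submitting:", " ".join(shlex.quote(arg) for arg in argv))
        exit_codes.append(subprocess.run(argv).returncode)
    return exit_codes
```

Registering it works like any other submit method: set `SUBMIT_ANALYSES` to its import string, e.g. `my_isabl_extension.batch_systems.submit_custom`.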

### **Submit Configuration**

`isabl_cli`'s settings have a `SUBMIT_CONFIGURATION` dictionary to provide extra arguments to the submission methods. The currently supported parameters are:

<table><thead><tr><th width="213">Attribute</th><th width="93">Type</th><th width="209">Description</th><th>Example value</th></tr></thead><tbody><tr><td><strong><code>extra_args</code></strong></td><td><em>String</em></td><td>Any additional arguments passed to the batch submission command.</td><td><code>" --time=48:00:00 "</code>, e.g. to set a maximum run time.</td></tr><tr><td><strong><code>get_requirements</code></strong></td><td><em>Import String</em></td><td>Function that defines custom resource needs for different applications or methods.</td><td>See the example below.</td></tr><tr><td><strong><code>throttle_by</code></strong></td><td><em>Integer</em></td><td>Maximum number of jobs running simultaneously in a job array submission.</td><td><code>50</code></td></tr><tr><td><strong><code>unbuffer</code></strong></td><td><em>Boolean</em></td><td>Redirect stdout and stderr to the same file, preserving the characters that allow colored logs.</td><td><code>True</code>. If not defined, it is <code>False</code> by default. See <a href="#colored-logs-with-batch-systems">Colored Logs</a>.</td></tr></tbody></table>
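To make the roles of `throttle_by` and `extra_args` concrete, here is a sketch of how they could be combined into a Slurm job-array submission. `--array=1-N%M` is standard `sbatch` syntax that limits a job array to M simultaneously running tasks; the helper `build_sbatch_array` and the order of its arguments are assumptions, not Isabl CLI's actual implementation.

```python
def build_sbatch_array(n_jobs, script, throttle_by=None, extra_args="", requirements=""):
    """Return an sbatch command string for a job array of `n_jobs` tasks (hypothetical)."""
    array = f"1-{n_jobs}"
    if throttle_by:
        # '%M' caps how many array tasks Slurm runs at the same time
        array += f"%{throttle_by}"
    parts = ["sbatch", f"--array={array}", requirements.strip(), extra_args.strip(), script]
    # drop empty pieces so unset options don't leave stray spaces
    return " ".join(part for part in parts if part)
```

For example, `build_sbatch_array(120, "array.sh", throttle_by=50, extra_args=" -p GPU_PARTITION ")` returns `"sbatch --array=1-120%50 -p GPU_PARTITION array.sh"`: all 120 jobs are queued, but at most 50 run at once.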

#### Example Settings:

```json
// isabl cli's settings
{
    ...
    "SUBMIT_ANALYSES": "isabl_cli.batch_systems.submit_slurm",
    "SUBMIT_CONFIGURATION": {
        "extra_args": " -p GPU_PARTITION ",
        "get_requirements": "my_isabl_extension.batch_systems.get_slurm_requirements",
        "throttle_by": 50,
        "unbuffer": true
    },
    "SUBMIT_MERGE_ANALYSIS": "my_isabl_extension.batch_systems.submit_merge_analysis_to_slurm",
    ...
}
```

```python
# my_isabl_extension/batch_systems.py
import shlex
import subprocess

import click

from my_isabl_extension import apps


def get_slurm_requirements(app, targets_methods):
    """Get custom requirements for any app or method."""

    # Defaults (Slurm's --time is given in minutes here)
    runtime = 240  # 4 hours
    cores = 1
    memory = 6  # GB per CPU

    # Technique.method-specific
    method = targets_methods[0]
    if method == "WG":
        runtime = 10080  # 7 days

    # Application-specific
    if isinstance(app, apps.BwaMemGRCh37):
        cores = 32
        runtime = 1440  # 24 hours
        memory = 8

    return (
        f"--ntasks=1 "
        f"--cpus-per-task={cores} "
        f"--time={runtime} "
        f"--mem-per-cpu={memory}G "
    )


def submit_merge_analysis_to_slurm(instance, application, command):
    """Submit command to run merge in slurm."""

    command = f"sbatch --ntasks=1 --cpus-per-task=1 --mem=32G {command}"
    # shlex.split keeps quoted arguments together, unlike str.split
    subprocess.check_call(shlex.split(command))
    click.secho(f"Submitted project level merge with: {command}", fg="yellow")
```


# Contributing Guide

⏱ tutorial time: 20 minutes

This tutorial will help you set up a **full Development environment** with all components of the `isabl` infrastructure.

{% hint style="info" %}
📘 Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given. `isabl` could always use more documentation, whether as part of the **README**, in docstrings, or even on the web in blog posts, articles, and such. Also, bet you've read the [Zen Of Python](https://www.python.org/dev/peps/pep-0020/#the-zen-of-python).
{% endhint %}

## Isabl API

1. Clone locally:

   ```
    git clone git@github.com:papaemmelab/isabl_api.git
    cd isabl_api
   ```
2. Build containers:

   ```
    docker-compose build
   ```
3. Run tests using [tox](http://tox.readthedocs.io/):

   ```
    docker-compose run --rm django tox
   ```

   Or run [pytest](https://docs.pytest.org/en/latest/), [pylint](https://www.pylint.org/) or [pydocstyle](http://www.pydocstyle.org/en) individually:

   ```
    docker-compose run --rm django pytest --ds example.settings --cov=isabl_api
    docker-compose run --rm django pylint isabl_api
    docker-compose run --rm django pydocstyle isabl_api
   ```

#### Create a superuser

Create a superuser with username and password set to `admin` (we will need it later):

```
docker-compose run --rm django python manage.py migrate
docker-compose run --rm django python manage.py createsuperuser --email admin@isabl.io --username admin
```

#### Start the backend

```
docker-compose up
```

Now you can log in to the frontend at <http://localhost:8000> (there won't be much to see yet). An easy way to create objects is to run the [client](#isabl-cli) tests.

#### Coverage report

Since tests were run inside a container, we need to combine the coverages to see the html report:

```
alias opencov="mv .coverage .coverage.tmp && coverage combine && coverage html && open htmlcov/index.html"
pip install coverage && opencov
```

## Isabl CLI

1. Clone locally:

   ```
    git clone git@github.com:papaemmelab/isabl_cli.git
    cd isabl_cli
   ```
2. Install with pip (installing in a [virtual environment](https://docs.python.org/3/tutorial/venv.html) is strongly recommended):

   ```
    pip install -r requirements.txt
   ```
3. Run tests using [tox](http://tox.readthedocs.io/), making sure you have created the admin user first:

   ```
    tox
   ```

   Or run [pytest](https://docs.pytest.org/en/latest/), [pylint](https://www.pylint.org/) or [pydocstyle](http://www.pydocstyle.org/en) individually:

   ```
    py.test tests/ --cov=isabl_cli -s
    pylint isabl_cli
    pydocstyle isabl_cli
   ```

**Note:** if your changes depend on a particular branch of Isabl API, make sure the Isabl CLI and Isabl API branches have the same name so that the Travis configuration can pick it up.

## Isabl Web

1. Clone locally:

   ```
    git clone git@github.com:papaemmelab/isabl_web.git
    cd isabl_web
   ```
2. Install yarn:

   ```
    brew install yarn
   ```
3. Install dependencies:

   ```
    yarn install
   ```
4. Start the Vue development server:

   ```
    yarn serve
   ```

   **Important!** export `FRONTEND_URL=localhost:8080` before running `docker-compose up` in the api repository, note that the port may vary.

## Documentation

Simply create a pull request on [GitHub](https://github.com/isabl-io/docs):

```
 git clone git@github.com:isabl-io/docs.git
```

## Contribute with Github

1. Create a branch for local development and get ready to make changes locally:

   ```
    git pull
    git checkout -b name-of-your-bugfix-or-feature
   ```
2. Commit your changes and push your branch to GitHub (see the emoji reference):

   ```
    git add .
    git config commit.template .gitmessage
    git commit -m ":emoji: your short and nice description"
    git push origin name-of-your-bugfix-or-feature
   ```
3. Create a test in:

   ```
    tests/
   ```
4. Submit a pull request through the GitHub website.

### Formatting projects

Python projects are formatted with [black](https://github.com/ambv/black), which is required for `api`, `cli` and `apps`. Simply run:

```
pip install black && black .
```

Project `web` is formatted following the [Vue style guide](https://vuejs.org/v2/style-guide/). For this one, simply run:

```
yarn lint
```

### Bumping the version on PyPI

Follow the [semantic versioning](http://semver.org/) guidelines and update the `VERSION` file before creating a PR, for instance:

```
echo v0.1.0 > isabl_api/VERSION
git add isabl_api/VERSION
git commit -m ":gem: bump to version 0.1.0"
```
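A small helper can compute the next version for you. This is a sketch (`bump_version` and `bump_version_file` are hypothetical, not part of the Isabl repos) that assumes the `VERSION` file holds a single `vMAJOR.MINOR.PATCH` string, as in the example above.

```python
from pathlib import Path


def bump_version(version, part="patch"):
    """Bump a '[v]MAJOR.MINOR.PATCH' string following semantic versioning."""
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(x) for x in version.lstrip("v").split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"{prefix}{major}.{minor}.{patch}"


def bump_version_file(path, part="patch"):
    """Rewrite a VERSION file in place and return the new version."""
    version_file = Path(path)
    new_version = bump_version(version_file.read_text().strip(), part)
    version_file.write_text(new_version + "\n")
    return new_version
```

For example, `bump_version_file("isabl_api/VERSION", "minor")` turns `v0.1.0` into `v0.2.0`; breaking changes bump `major`, new features `minor`, and bug fixes `patch`.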

### Emoji reference

We use emojis to quickly categorize commits and pull requests. These are some common types of changes, but feel free to ignore the conventions:

| emoji | name               | type of change              |
| ----- | ------------------ | --------------------------- |
| 🚀    | rocket             | new feature                 |
| 🐛    | bug                | bug fix                     |
| 📝    | memo               | changes to documentation    |
| 🎨    | art                | formatting, no code change  |
| 🔧    | wrench             | refactoring production code |
| ✅     | white\_check\_mark | adding/editing test logic   |
| 👕    | shirt              | no production code change   |
| 💎    | gem                | bump to new version         |

{% hint style="info" %}
**Tip:** To insert an emoji on a Mac, press `control+cmd+space`. Alternatively, type the emoji's name between two colons (e.g. `:rocket:`).
{% endhint %}


# Other CLI commands

☄️ Isabl comes with a bunch of built-in commands to run from the terminal.

## Getting Super Powers

The `isabl-cli` Python package comes with many useful functions that cover three main needs:

1. Create and execute analyses ([See Running Applications](/writing-applications#running-applications))
2. Retrieve information ([See Retrieving Data](/retrieve-data#isabl-command-line-client))
3. Import and manage files in your workspace (See [Importing Data](/import-data#data-import))

```
$ isabl --help

Usage: isabl [OPTIONS] COMMAND [ARGS]...

  Run Isabl command line tools.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  apps-grch37                GRCh37 applications.
  apps-grcm38                GRCm38 applications.
  get-bams                   Get storage directories, use `pattern` to
                             match...
  get-bed                    Get a BED file for a given Sequencing...
  get-count                  Get count of database instances.
  get-data                   Get file paths for experiments raw data.
  get-metadata               Retrieve metadata for multiple instances.
  get-outdirs                Get analyses outdirs, use `pattern` to match...
  get-paths                  Get storage directories, use `pattern` to
                             match...
  get-reference              Retrieve reference data from assemblies...
  get-results                Get analyses results.
  import-bedfiles            Register targets and baits BED files in a...
  import-data                Find and import experiments data from many...
  import-reference-data      Register reference data for assemblies...
  import-reference-genome    Register an assembly reference genome.
  login                      Login with isabl credentials.
  merge-individual-analyses  Merge analyses by individual.
  merge-project-analyses     Merge analyses by project.
  patch-results              Update the results field of many analyses.
  process-finished           Process and update finished analyses.
  rerun-signals              Rerun failed signals.
  run-failed-analyses        Command to run failed analyses in batch.
  run-signals                Run any arbitrary signal on analyses or...
  run-web-signals            Run signals triggered from the frontend.

```

{% hint style="warning" %}
Some of these commands are only available for the **admin** user, like:

* To manage files and data ([See Importing Data](/import-data)): `import-data`, `import-bedfiles`, `import-reference-data`, `import-reference-genome`
* To change permissions of finished analyses: `process-finished`
* To update the linked results of completed analyses: `patch-results`
{% endhint %}

### Create your custom CLI Commands

You can customize your available commands by extending `isabl-cli`. The following examples show a command that executes more than one app at the same time and a command for a common metadata query:

{% code title="my\_commands/cli.py" %}

```python
import datetime
import click
from isabl_cli import options
from isabl_cli import commands
from my_apps import apps


@click.command()
@options.PAIRS
@options.PAIRS_FROM_FILE
@options.SKIP
@options.COMMIT
@options.FORCE
@options.RESTART
@options.QUIET
def triple_caller(commit, force, quiet, restart, skip, pairs, pairs_from_file):
    """
    Command to run THE 3-CALLER pipeline.
    """
    for pipe in [
        apps.CallerOneGRCh37,
        apps.CallerTwoGRCh37,
        apps.CallerThreeGRCh37,
    ]:
        pipe().run(
            tuples=pairs + pairs_from_file,
            commit=commit,
            force=force,
            restart=restart,
            verbose=not quiet,
            run_args=dict(skip=skip),
        )
        

@click.command()
@click.pass_context
def get_succeded_kids_analyses(ctx):
    """
    Command to print information from all Succeeded Analyses executed on
    Pediatric Children (age <= 18) of MSK Kids.
    """
    current_year = datetime.datetime.now().year
    filters = dict(
        status="SUCCEEDED",
        targets__center__name="MSK Kids",
        targets__sample__individual__birth_year__gte=(current_year - 18),
    )
    fields = [
        "targets__sample__individual__system_id",
        "pk",
        "status",
        "application__name",
        "application__version",
    ]
    ctx.invoke(
        commands.get_metadata,
        identifiers=None,
        endpoint="analyses",
        field=[i.split("__") for i in fields],
        filters=filters,
        no_headers=False,
        json_=False,
        use_fx=False,
    )

```

{% endcode %}

Then you can add the commands to your CLI Settings ([Learn how to customize your cli](/isabl-settings#isabl-cli-settings)):

```python
CUSTOM_COMMANDS = [
    "my_commands.cli.triple_caller",
    "my_commands.cli.get_succeded_kids_analyses",
]
```

Now your custom commands will be available:

```bash
$ isabl --help

Usage: isabl [OPTIONS] COMMAND [ARGS]...

  Run Isabl command line tools.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  
  ...
  
  triple-caller                Command to run THE 3-CALLER pipeline.
  get-succeded-kids-analyses   Command to print metadata of all Succeded...
```


# Bonus tips

🎱 Some extra features to maximize your Isabl journey!

### 🌈 Colored Logs with Batch Systems

<div><figure><img src="/files/2LlkJ7vkncjjmmayxZF4" alt=""><figcaption><p>Default: No colored logs </p></figcaption></figure> <figure><img src="/files/nvlXrhjt2DerRyjB8Uoa" alt=""><figcaption><p>Colored unbuffered logs</p></figcaption></figure></div>

When running jobs using batch systems like LSF or Slurm, many programs detect whether their output is being sent to a terminal (`TTY`) or a file and may disable features like color formatting or progress bars if they detect non-interactive output.
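The check behind this behavior is often a one-liner. Many command-line tools do something like the following sketch before printing ANSI color codes (the function names are illustrative, not taken from any particular program). When a batch system writes output to a log file, `isatty()` returns `False`, so colors are dropped.

```python
import sys


def use_color(stream=None):
    """Mimic the common check: emit ANSI colors only when writing to a terminal."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def paint(text, stream=None):
    """Wrap `text` in a green ANSI escape sequence only if `stream` is a terminal."""
    return f"\033[32m{text}\033[0m" if use_color(stream) else text
```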

One workaround (use it carefully) is the [`unbuffer`](https://manpages.debian.org/stretch/expect/unbuffer.1.en.html) utility from the `expect` package. `unbuffer` runs the program in a pseudo-terminal (PTY), tricking it into thinking it is running interactively in a terminal, so it keeps producing colored output.
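Python's standard library exposes the same pseudo-terminal mechanism through the `pty` module. The sketch below is not how `unbuffer` is implemented (that tool ships with Expect); it only illustrates the idea: the child process writes to the slave end of a PTY, so its `isatty()` check returns `True`, and the parent collects the output from the master end. It is Unix-only, and `run_in_pty` is a hypothetical helper.

```python
import os
import pty
import subprocess


def run_in_pty(argv):
    """Run `argv` with stdout/stderr attached to a pseudo-terminal; return (code, output)."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdout=slave, stderr=slave, close_fds=True)
    os.close(slave)  # the child keeps its own copy of the slave end
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:  # Linux raises EIO once the child has closed the terminal
            break
        if not data:
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    # terminals translate "\n" into "\r\n", so expect carriage returns in the output
    return proc.returncode, b"".join(chunks).decode(errors="replace")
```

For example, `run_in_pty([sys.executable, "-c", "import sys; print(sys.stdout.isatty())"])` reports `True` from inside the child, whereas the same command redirected to a file would print `False`.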

{% hint style="success" %}
**Requirements:**

* **`unbuffer`** available on your system. See tips on [how to install it on an HPC](#bonus-tip-install-unbuffer-in-an-hpc-with-miniconda).
* **`unbuffer: True`** in your `isabl_cli` **`SUBMIT_CONFIGURATION`**.
* Version **`>=0.3.28`** of **`@papaemmelab/isabl_web`**.
* **`isabl_cli`** including commit [8c8ad75](https://github.com/papaemmelab/isabl_cli/commit/8c8ad75c85b6c86f291787fcfbf4a51532930154).
{% endhint %}

{% hint style="warning" %}
Unbuffered output sends both stdout and stderr to the same `head_job.log` file, so `head_job.err` will not contain the errors of the running application.
{% endhint %}

{% hint style="danger" %}
Simulating a PTY gives you colored output, but some outputs, such as progress bars, print control characters that can clutter your logs. Disable them if possible (e.g. `toil ... --disableProgress`). For specific applications, you can instead set the environment variable `TERM=dumb` to tell the program that the attached terminal has no formatting capabilities.
{% endhint %}
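If you launch an application from your own Python code, you can apply `TERM=dumb` to that application alone, rather than to the whole job, by passing a modified environment to its process. This is a minimal sketch. `run_with_dumb_term` is a hypothetical helper, and the probe command stands in for your application.

```python
import os
import subprocess
import sys


def run_with_dumb_term(cmd):
    """Run `cmd` with TERM=dumb, leaving the parent's environment untouched."""
    env = {**os.environ, "TERM": "dumb"}
    return subprocess.run(cmd, env=env, stdout=subprocess.PIPE, text=True, check=True)


if __name__ == "__main__":
    # Stand-in application that simply reports the TERM value it received.
    probe = [sys.executable, "-c", "import os; print(os.environ['TERM'])"]
    print(run_with_dumb_term(probe).stdout.strip())  # dumb
```

From a shell, the equivalent is to prefix the command, e.g. `TERM=dumb my_app ...`, where `my_app` is a placeholder.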

#### **➕ Bonus tip**: **Install `unbuffer` in an HPC with miniconda.**&#x20;

```bash
conda create -n myenv
conda activate myenv
conda install -c conda-forge expect
```

Assume `$OPT_DIR` is where you install your packages and `$BIN_DIR` is where you keep your executables (and is on your `PATH`). Create an executable script called `unbuffer` in `$BIN_DIR`. The script exposes your miniconda env's shared libraries and then calls the env's `unbuffer` executable:

{% code title="${BIN\_DIR}/unbuffer" %}

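```bash
#!/bin/bash

# Directory where miniconda3 is installed (replace the placeholder)
OPT_DIR=<your-installation-dir>
ENV_DIR="${OPT_DIR}/miniconda3/envs/myenv"

# Expose the env's shared libraries, keeping any existing LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${ENV_DIR}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Replace this wrapper process with the env's unbuffer, forwarding all arguments
exec "${ENV_DIR}/bin/unbuffer" "$@"
```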

{% endcode %}

Test that it works:

```bash
$ unbuffer echo "Hello World!"
Hello World!
```

### :dna: Use IGV links to be one click away from visualizing your mutation calls

The frontend lets users visualize alignment files (`bam`/`cram`) in IGV through the API's `igv` endpoints, in two cases:

* `/api/v1/experiments/igv/{system_id}?assembly={assembly}&locus={locus}`: to visualize an aligned file stored in `Experiment.bam_files`.
* `/api/v1/analyses/igv/{pk}?result={result}&result_index={result_index}&locus={locus}`: to visualize an analysis result whose `frontend_type` is `igv_<result><result-index>`. See [`Frontend Result Types`](/writing-applications#frontend-result-types).

You can also use these endpoints to build your own **IGV links** dynamically when analyzing data or creating reports!
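Because these are plain URLs, building them only takes string formatting and URL encoding. This is a minimal sketch using Python's standard library. The helpers are hypothetical and not part of `isabl_cli`. The host `isabl.example.org`, the system ID `DEMO_1`, the assembly `GRCh37`, and the result key `bam` are placeholders.

```python
from urllib.parse import quote, urlencode

# Hypothetical host of your Isabl deployment; replace with your own.
INSTANCE = "isabl.example.org"


def _query(params):
    # quote (not the default quote_plus) encodes spaces as %20;
    # safe=":" keeps loci such as 7:74125440 readable.
    return urlencode(params, quote_via=quote, safe=":")


def experiment_igv_url(system_id, assembly, loci, instance=INSTANCE):
    """Link to the experiments IGV endpoint; each locus opens in its own tab."""
    query = _query({"assembly": assembly, "locus": " ".join(loci)})
    return f"https://{instance}/api/v1/experiments/igv/{quote(system_id)}?{query}"


def analysis_igv_url(pk, result, result_index, loci, instance=INSTANCE):
    """Link to the analyses IGV endpoint for a given analysis result."""
    query = _query(
        {"result": result, "result_index": result_index, "locus": " ".join(loci)}
    )
    return f"https://{instance}/api/v1/analyses/igv/{pk}?{query}"


if __name__ == "__main__":
    print(experiment_igv_url("DEMO_1", "GRCh37", ["7:74125440", "7:74015371"]))
    # https://isabl.example.org/api/v1/experiments/igv/DEMO_1?assembly=GRCh37&locus=7:74125440%207:74015371
```

Joining the loci with a space and letting `quote` encode it yields the `%20` separator that the `locus` parameter expects for multiple loci.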

#### Demo time

This example generates clickable links in Jupyter to visualize fusion events. With one click, you can view the reads of a specific sample at the locus of interest and check whether a mutation is a real event. The `locus` parameter supports multiple loci separated by a space, for example the 2 breakpoints of a fusion. Each locus opens in a new IGV tab.

<figure><img src="/files/JCIgZtNUOB83OyYoKnNA" alt=""><figcaption><p>Create IGV links in a Jupyter Notebook</p></figcaption></figure>

Each event gets its own **IGV link**. For example, clicking the `GTF2I::GTF2IRD1` fusion link opens IGV showing both breakpoints:

`https://{my-instance}/api/v1/experiments/igv/{system_id}?assembly={assembly}&locus={locus1}%20{locus2}`

{% hint style="info" %}
`%20` is the URL encoding of a `space` character, which separates multiple loci in the `locus` parameter.
{% endhint %}

<figure><img src="/files/Hw22z8fzhKOSSIzmHRtG" alt=""><figcaption><p>Visualize events at the desired loci. Example showing a <strong><code>GTF2I::GTF2IRD1</code></strong> fusion at 2 loci: <strong><code>7:74125440</code></strong> and <strong><code>7:74015371</code></strong></p></figcaption></figure>

This is especially useful when generating reports with tens or even thousands of variants: you can open any variant in IGV with a single click!
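You can approximate the notebook in the demo with a few lines of Python that turn each fusion event into an HTML anchor. This is a sketch, not the code behind the screenshot. `fusion_links_html`, the `(fusion, system_id, breakpoints)` event format, the host, the system ID, and the assembly are all assumptions. Only the breakpoints come from the `GTF2I::GTF2IRD1` example.

```python
from html import escape
from urllib.parse import quote

# Hypothetical host of your Isabl deployment; replace with your own.
INSTANCE = "isabl.example.org"


def fusion_link(fusion, system_id, assembly, breakpoints, instance=INSTANCE):
    """Return an HTML anchor that opens all fusion breakpoints in IGV."""
    locus = quote(" ".join(breakpoints), safe=":")  # space becomes %20
    url = (
        f"https://{instance}/api/v1/experiments/igv/{quote(system_id)}"
        f"?assembly={quote(assembly)}&locus={locus}"
    )
    # escape() turns & into &amp;, as required inside an HTML attribute.
    return f'<a href="{escape(url)}" target="_blank">{escape(fusion)}</a>'


def fusion_links_html(events, assembly, instance=INSTANCE):
    """Build an HTML list with one IGV link per (fusion, system_id, breakpoints)."""
    items = [
        f"<li>{fusion_link(fusion, system_id, assembly, bps, instance)}</li>"
        for fusion, system_id, bps in events
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Breakpoints from the GTF2I::GTF2IRD1 example; DEMO_1 and GRCh37 are placeholders.
    events = [("GTF2I::GTF2IRD1", "DEMO_1", ["7:74125440", "7:74015371"])]
    print(fusion_links_html(events, "GRCh37"))
```

In Jupyter, render the result with `IPython.display.HTML`, for example `HTML(fusion_links_html(events, "GRCh37"))`, to get one clickable IGV link per event.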


