Welcome to the 10 Minutes to Isabl guide! This tutorial will walk you through installation, metadata registration, data import, and automated data processing.
Also check the Retrieve Data and Register Metadata guides!
Join us on Gitter if you have questions
Submit an issue 🐛 if you are having problems with this guide
Check out the documentation home page for an intro to Isabl.
You will need Docker Compose to build and run the application.
Make sure your installation doesn't require sudo to run docker and docker-compose. Otherwise, you will have issues running this demo. Check that docker run hello-world runs without problems. If you have permission issues, see how to run docker as a non-root user.
Let's start by cloning the demo:
git clone https://github.com/isabl-io/demo.git --recurse-submodules && cd demo
Next, source a simple initialization profile:
source .demo-profile
If you are redoing the tutorial, we recommend removing the demo directory and cloning it again:
chmod -R u+w demo && rm -rf demo
Also remove the Docker volume:
docker volume rm isabl_demo_local_postgres_data
Build and run the application (this might take a few minutes):
demo-compose build
Now we can run the application in the background:
demo-compose up -d
You can type demo-compose down to stop the application.
Visit http://localhost:8000/ and log in with username=admin and password=admin!
demo-compose, demo-django, and demo-cli are simple wrappers around docker-compose; check them out. The isabl_demo directory was bootstrapped using cookiecutter-isabl, a proud fork of cookiecutter-django! Many topics from their guide will be relevant to your project.
Creating a project in Isabl is as simple as adding a title. You can also specify optional fields:
We need to let isabl-cli know where the API can be reached and which CLI settings to use (if you are using demo-compose, these variables are already set):
# point to your API URL
export ISABL_API_URL=http://localhost:8000/api/v1/
# set environment variable for demo client
export ISABL_CLIENT_ID="demo-cli-client"
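Before running any isabl-cli command, it can help to confirm that both variables are actually visible to the process. The snippet below is a minimal sketch in plain Python; the helper name check_isabl_env is hypothetical and is not part of isabl-cli.

```python
import os

# Variables this tutorial asks you to export before using isabl-cli.
REQUIRED_VARS = ("ISABL_API_URL", "ISABL_CLIENT_ID")


def check_isabl_env(environ=os.environ):
    """Hypothetical helper: return the required variables as a dict,
    raising a clear error if any of them is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling check_isabl_env() with no arguments inspects your current shell environment; passing a dict lets you try it without touching the environment.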
Now we can create a client for isabl-cli:
# create and update the client object
demo-cli python3.6 ./assets/metadata/create_cli_client.py
# check the file used to create the client
cat ./assets/metadata/create_cli_client.py
Before we create samples, let's use isabl-cli to add choices for Center, Disease, Sequencing Technique, and Data Generating Platform:
demo-cli python3.6 ./assets/metadata/create_choices.py
New options can also be easily created using the admin site: http://localhost:8000/admin/
We will use Excel submissions to register samples through the web interface. To do so, the demo project comes with a pre-filled metadata form available at:
open assets/metadata/demo_submission.xlsm
When prompted to allow macros, say yes. This will enable you to toggle between optional and required columns. By the way, Isabl has multiple mechanisms for metadata ingestion! Learn more here.
Now let's submit this Excel form. First, open the directory:
open assets/metadata
Then drag the demo_submission.xlsm file into the submit samples form:
We can also review metadata at the command line:
isabl get-metadata experiments --fx
Expand and navigate with the arrow keys; press e to expand all and E to collapse all. Learn more in the fx documentation. Use --help to learn about other ways to visualize metadata (e.g. tsv).
For this particular demo, we wanted to create a sample tree that showcased the flexibility of Isabl's data model. Our demo individual has two samples, one normal and one tumor. The tumor sample is further divided into two biological replicates (or aliquots), with two experiments conducted on the second aliquot:
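To make the tree concrete, here is a plain-Python picture of it. The labels are hypothetical; the shape comes from the description above, plus the three tumor-normal pairs run with Strelka later in this guide, which imply one experiment on the normal sample and one on the first tumor aliquot.

```python
# Illustrative model of the demo tree: individual -> sample -> aliquot -> experiments.
# All labels are hypothetical; only the shape follows the tutorial.
demo_individual = {
    "normal": {"aliquot 1": ["normal experiment"]},
    "tumor": {
        "aliquot 1": ["tumor experiment 1"],
        "aliquot 2": ["tumor experiment 2", "tumor experiment 3"],
    },
}


def iter_experiments(individual):
    """Yield (sample, aliquot, experiment) triples by walking the tree."""
    for sample, aliquots in individual.items():
        for aliquot, experiments in aliquots.items():
            for experiment in experiments:
                yield sample, aliquot, experiment


experiments = list(iter_experiments(demo_individual))
tumors = [exp for sample, _, exp in experiments if sample == "tumor"]
normals = [exp for sample, _, exp in experiments if sample == "normal"]
# Each tumor experiment is paired with the normal experiment.
pairs = [(tumor, normal) for tumor in tumors for normal in normals]
print(len(experiments), "experiments;", len(pairs), "tumor-normal pairs")
```

Walking the tree this way shows why the later Strelka step takes three --pairs arguments that all share the same normal.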
Although not required for this tutorial, you are welcome to check out the RESTful API documentation at http://localhost:8000/api/v1/ or https://isabl.io/redoc.
Given that isabl-cli will move our test data, let's copy the original assets into a staging directory:
mkdir -p assets/staging && cp -r assets/data/* assets/staging
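If you prefer to script this step (for example, to reset staging between runs), the same copy can be sketched with Python's standard library. It assumes you run it from the demo root, like the shell command; stage_assets is a hypothetical helper name. Note that dirs_exist_ok requires Python 3.8 or newer, so run it on the host rather than through demo-cli python3.6.

```python
import shutil
from pathlib import Path


def stage_assets(data_dir="assets/data", staging_dir="assets/staging"):
    """Copy every entry of data_dir into staging_dir, mirroring
    `mkdir -p assets/staging && cp -r assets/data/* assets/staging`.
    Unlike the shell glob `*`, iterdir() also picks up hidden files."""
    src, dst = Path(data_dir), Path(staging_dir)
    dst.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            # dirs_exist_ok lets re-runs overwrite existing copies, like `cp -r`
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)
    return dst
```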
Now let's import the genome:
isabl import-reference-genome \
    --assembly GRCh37 \
    --genome-path assets/staging/reference/reference.fasta
We can also import BED files for our demo Sequencing Technique:
isabl import-bedfiles \
    --technique DEMO_TECHNIQUE \
    --targets-path assets/staging/bed/targets.bed \
    --baits-path assets/staging/bed/baits.bed \
    --assembly GRCh37 \
    --species HUMAN \
    --description "Demo BED files"
Check that the import was successful:
isabl get-bed DEMO_TECHNIQUE  # retrieve BED file
isabl get-reference GRCh37  # retrieve reference genome
By means of the --data-id flag, the get-reference command also lets you retrieve the indexes generated during import. To list the available files per assembly, run:
isabl get-reference GRCh37 --resources
Learn more about importing data into Isabl here.
The next step is to import data for the samples we just created:
isabl import-data \
    -di ./assets/staging `# provide data location` \
    -id identifier `# match files using experiment id` \
    -fi identifier.contains "demo" `# filter samples to be imported`
Add --commit to complete the operation.
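The -fi and -id flags select experiments and then match files to them by identifier. The sketch below is only a simplified mental model of that idea, not Isabl's actual matching code: it assumes a file belongs to an experiment when the experiment identifier appears in the file name. The function plan_import, the identifiers, and the file names are all hypothetical.

```python
from pathlib import PurePath


def plan_import(identifiers, file_paths, must_contain="demo"):
    """Hypothetical matching rule, for illustration only: keep identifiers
    containing `must_contain` (compare -fi identifier.contains "demo"),
    then assign each file to every kept identifier found in its name
    (compare -id identifier)."""
    selected = [ident for ident in identifiers if must_contain in ident]
    plan = {ident: [] for ident in selected}
    for path in file_paths:
        name = PurePath(path).name
        for ident in selected:
            # Plain substring matching would also match "demo_tumor_a" inside
            # "demo_tumor_ab"; a real importer needs stricter rules.
            if ident in name:
                plan[ident].append(path)
    return plan


example = plan_import(
    ["demo_normal", "demo_tumor_a", "other_exp"],
    [
        "staging/demo_normal_R1.fastq.gz",
        "staging/demo_normal_R2.fastq.gz",
        "staging/demo_tumor_a_R1.fastq.gz",
        "staging/other_exp_R1.fastq.gz",
    ],
)
```

Inspecting a plan like this before acting on it mirrors the tutorial's workflow of running import-data first without --commit.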
Retrieve the imported data for the normal sample to see how directories are created:
isabl get-data -fi sample.identifier "demo normal"
The front end will also reflect that data has been imported.
Isabl is a language-agnostic platform and can deploy any pipeline. To get started, we will use some applications from isabl-io/apps. Specifically, we will run alignment, quality control, and variant calling. The applications were previously registered in the client object, so they are available in the client:
isabl apps-grch37
Learn more about customizing your instance with Isabl Settings.
First, we'll run alignment (pass --commit to deploy):
isabl apps-grch37 `# apps are grouped by assembly` \
    bwa-mem-0.7.17.r1188 `# run bwa-mem version 0.7.17.r1188` \
    -fi tags.contains data `# filter using tags, feel free to try others`
Note that if you re-run the same command, Isabl will notify you that results are already available. If the analyses fail for some reason, you can force a re-run using --force.
Now we can retrieve BAMs from the command line:
isabl get-bams -fi sample.individual.identifier "demo individual"
We can also visualize aligned BAMs online:
Insert 2:123,028-123,995 in the locus bar; that's where our test data has reads. Learn more about default BAMs in the Writing Applications guide.
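The locus uses the common genome-browser form chromosome:start-end, with commas as thousands separators. As a small illustration (parse_locus is a hypothetical helper, and the 1-based inclusive width is an assumption about how the browser displays coordinates), this is how the string breaks down:

```python
import re

# chromosome:start-end, allowing commas inside the numbers
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_locus(locus):
    """Split a locus string into (chromosome, start, end) integers."""
    match = LOCUS_RE.match(locus.strip())
    if not match:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if end < start:
        raise ValueError(f"end before start in {locus!r}")
    return match["chrom"], start, end


chrom, start, end = parse_locus("2:123,028-123,995")
# Assuming 1-based inclusive coordinates, the window spans end - start + 1 bases.
width = end - start + 1
```

So the demo reads sit in a window of just under 1 kb on chromosome 2.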
Let's get some stats for our experiments with a quality control application:
isabl apps-grch37 qc-data-0.1.0 -fi identifier.icontains demo --commit
This quality control application defines logic to merge results at the project and individual levels. Once the analyses finish, Isabl automatically runs the auto-merge logic:
Isabl-web can render multiple types of results; in this case we will look at HTML reports. Results for our qc-data application are available at the experiment, individual, and project levels. In this example we are looking at the project-level auto-merge analysis:
Applications can define any custom logic to merge analyses.
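As a rough illustration only (this is not the qc-data application's code), here is one way per-experiment results could be rolled up by individual and by project. The record fields, group names, and read counts are all invented for the example.

```python
from collections import defaultdict

# Hypothetical per-experiment QC results; every value here is made up.
RESULTS = [
    {"experiment": "normal", "individual": "demo individual", "project": "demo project", "reads": 1200},
    {"experiment": "tumor 1", "individual": "demo individual", "project": "demo project", "reads": 900},
    {"experiment": "tumor 2", "individual": "demo individual", "project": "demo project", "reads": 1100},
    {"experiment": "tumor 3", "individual": "demo individual", "project": "demo project", "reads": 1000},
]


def merge_by(results, level):
    """Group per-experiment results by `level` ("individual" or "project")
    and summarize each group: its experiments and their total reads."""
    groups = defaultdict(list)
    for record in results:
        groups[record[level]].append(record)
    return {
        name: {
            "experiments": [r["experiment"] for r in records],
            "total_reads": sum(r["reads"] for r in records),
        }
        for name, records in groups.items()
    }


individual_level = merge_by(RESULTS, "individual")
project_level = merge_by(RESULTS, "project")
```

With a single demo individual both levels contain the same experiments; the distinction only matters once a project holds several individuals.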
Up until now we've run applications that are linked to only one experiment. However, analyses can be related to any number of target and reference experiments. For example, this implementation of Strelka uses tumor-normal pairs. Before you can run this command, you will need to retrieve the system IDs of your experiments:
isabl get-metadata experiments -f system_id
Now insert those identifiers in the following command:
isabl apps-grch37 strelka-2.9.1 \
    --pairs {TUMOR 1 ID} {NORMAL ID} `# replace tumor 1 system id and normal system id` \
    --pairs {TUMOR 2 ID} {NORMAL ID} `# replace tumor 2 system id and normal system id` \
    --pairs {TUMOR 3 ID} {NORMAL ID} `# replace tumor 3 system id and normal system id`
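Typing the IDs by hand is fine for three pairs. If you would rather assemble the command programmatically, the sketch below builds the same command string from IDs you paste in. strelka_command is a hypothetical helper, and the IDs in the example are placeholders, not real system IDs.

```python
import shlex


def strelka_command(tumor_ids, normal_id, app="strelka-2.9.1"):
    """Build the `isabl apps-grch37` call with one --pairs per tumor,
    each paired with the same normal experiment."""
    if not tumor_ids:
        raise ValueError("need at least one tumor system id")
    parts = ["isabl", "apps-grch37", app]
    for tumor_id in tumor_ids:
        parts += ["--pairs", tumor_id, normal_id]
    # Quote each token so IDs with unusual characters stay intact in a shell.
    return " ".join(shlex.quote(part) for part in parts)


command = strelka_command(["TUMOR_1_ID", "TUMOR_2_ID", "TUMOR_3_ID"], "NORMAL_ID")
print(command)
```

As with the other applications in this guide, append --commit to the printed command to deploy.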
You can retrieve registered results for the analysis, for instance the indels VCF:
isabl get-results -fi name STRELKA --result-key indels
To find out what other results are available use:
# app-primary-key can be retrieved from the frontend
isabl get-results --app-results {app-primary-key}
# when writing this tutorial, the app key for strelka was 5
isabl get-results --app-results 5
Furthermore, you can get paths for any instance in the database using get-outdirs:
isabl get-outdirs -fi name STRELKA
Lastly, let's check the indels VCFs through the web portal:
To wrap up the tutorial, we'll use Isabl as an SDK with ipython:
# we'll access ipython using the cli container
demo-cli ipython
Then let's check the output directories of Strelka:
import isabl_cli as ii
# retrieve analyses from API using filters
analyses = ii.get_analyses(name="STRELKA")
# list the strelka output directories (the ! prefix is IPython shell syntax)
for i in analyses:
    !ls {i.storage_url}/strelka
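The ! prefix only works inside IPython, so the loop above won't run as a plain Python script. A pure-Python equivalent could look like the sketch below. The helper list_strelka_outputs is hypothetical, and it assumes storage_url is a local filesystem path; it takes plain paths so it can be tried without an Isabl instance. In the demo session you would pass [a.storage_url for a in analyses].

```python
from pathlib import Path


def list_strelka_outputs(storage_urls, subdir="strelka"):
    """Return {storage_url: sorted entry names in <storage_url>/<subdir>}.
    Missing directories map to an empty list instead of raising."""
    listing = {}
    for url in storage_urls:
        folder = Path(url) / subdir
        if folder.is_dir():
            listing[url] = sorted(p.name for p in folder.iterdir())
        else:
            listing[url] = []
    return listing
```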
The analysis objects are Munch objects; in other words, they are dot-dicts (like JavaScript objects):
analysis = analyses[0]
# get the target experiment (the tumor)
target = analysis.targets[0]
# print the category of the parent sample
print(target.sample.category)
# see available fields
print(target.keys())
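If you haven't used Munch before, the core idea fits in a few lines. The class below is a simplified stand-in written for illustration, not the munch library itself (which offers more, including a recursive munchify helper): attribute access reads and writes dictionary keys, and a recursive converter wraps nested dicts, even inside lists, so chained lookups like target.sample.category work. The record and its "TUMOR" value are hypothetical.

```python
class DotDict(dict):
    """dict subclass whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() take precedence over keys with the same name.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


def to_dotdict(obj):
    """Recursively convert nested dicts, including dicts inside lists."""
    if isinstance(obj, dict):
        return DotDict({key: to_dotdict(value) for key, value in obj.items()})
    if isinstance(obj, list):
        return [to_dotdict(item) for item in obj]
    return obj


# Hypothetical record shaped like the analysis used above.
analysis = to_dotdict({"name": "STRELKA", "targets": [{"sample": {"category": "TUMOR"}}]})
target = analysis.targets[0]
print(target.sample.category)  # same value as target["sample"]["category"]
```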
Learn about CLI advanced configuration to customize functionality:
Learn about writing applications:
Ready for production? Learn more about deployment: