🤓 The ultimate guide for data analysts using Isabl!
Filters enable you to subset the data you are interested in. For example, you can use filters to retrieve all the BAM files of a given project, or get all VCFs from a given variant calling application. Filters are field-value pairs and can be used both in the command line and within Python. Check out these examples:
# A request to the API
# Using Isabl CLI
isabl get-outdirs -fi application.name BWA_MEM -fi status SUCCEEDED
# Using Isabl SDK
import isabl_cli as ii
samples = ii.get_instances('samples', individual__species="HUMAN")
Note that fields can traverse the relational model. To do so, concatenate the fields with double underscores in Python (e.g. samples__disease__acronym=AML), or with a dot in the command line (e.g. application.name).
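To make the two notations concrete, here is a minimal sketch that converts command line style filters into Python keyword arguments. The helper `cli_to_sdk_filters` is hypothetical and not part of isabl_cli. It only illustrates that a dot on the command line corresponds to a double underscore in Python.

```python
def cli_to_sdk_filters(pairs):
    """Convert CLI-style (field, value) pairs into Python keyword arguments.

    Hypothetical helper: dots used to traverse relations on the command
    line become double underscores, the separator used in keyword arguments.
    """
    return {field.replace(".", "__"): value for field, value in pairs}


# the same filters used in the command line example above
cli_filters = [("application.name", "BWA_MEM"), ("status", "SUCCEEDED")]
sdk_filters = cli_to_sdk_filters(cli_filters)
print(sdk_filters)
```

The resulting dictionary could then be unpacked into an SDK call with `**sdk_filters`.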
As indicated in the previous hint, filter fields can traverse database relationships. In addition, all filters can be augmented using lookups:
Full Relational Model
Here is a quick representation of Isabl's relational model, and hence of the related filters:
Furthermore, all query parameters in this API support advanced lookup types:
Negate any query
Value contains query
Value starts with query
Value ends with query
Comma separated query
Value is null
Use regex pattern
Greater or equal
Less than or equal
Moreover, Datetime query parameters support extra lookups:
Filter by date
Filter by day
Filter by month
Filter by year
Filter by time
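As an illustration of how a lookup is attached to a field, the sketch below appends the lookup name to the field with a double underscore and encodes the result as a query string. The `startswith` lookup appears elsewhere in this guide. The `year` lookup name and the `created` field are assumptions based on common Django conventions, and `with_lookup` is a hypothetical helper.

```python
from urllib.parse import urlencode


def with_lookup(field, lookup, value):
    """Return a one-item filter dict such as {"owner__startswith": "besuhof"}."""
    return {f"{field}__{lookup}": value}


filters = {}
filters.update(with_lookup("owner", "startswith", "besuhof"))
# `created` is a hypothetical datetime field and `year` an assumed lookup name
filters.update(with_lookup("created", "year", 2019))

# encode the filters the way they would appear in a request URL
query_string = urlencode(filters)
print(query_string)
```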
UML diagram of the database schema
To get a full description of all available filters, please visit Isabl's Redoc API documentation at https://isabl.github.io/redoc/ or https://isabl.mskcc.org/api/v1 (replacing isabl.mskcc.org with your own host). Another useful way to explore the relational model is by using isabl get-metadata experiments --fx.
Here are some common and useful filters for Isabl.
count_limit enables you to limit the total number of instances that will be retrieved. For example, to get the output directories for the first 10 successful analyses you could do:
isabl get-outdirs -fi status SUCCEEDED -fi count_limit 10
On the other hand, limit determines how many instances are retrieved at the same time. For example, the following command would retrieve paths to all successful analyses in batches of 100000:
isabl get-outdirs -fi status SUCCEEDED -fi limit 100000
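The difference between the two filters can be sketched with a small in-memory simulation: `limit` is the batch size and `count_limit` caps the total. This is not how Isabl implements pagination. `fetch_in_batches` and the mock records are assumptions made only for illustration.

```python
def fetch_in_batches(records, limit, count_limit=None):
    """Yield batches of at most `limit` records, stopping after `count_limit` in total."""
    total = len(records) if count_limit is None else min(count_limit, len(records))
    for start in range(0, total, limit):
        yield records[start:min(start + limit, total)]


records = list(range(25))  # stand-in for 25 matching instances

# batches of 10, but never more than 12 instances overall
batches = list(fetch_in_batches(records, limit=10, count_limit=12))
print([len(batch) for batch in batches])
```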
To get, for example, all experiments that have a BAM file for GRCh37 you could do:
experiments = ii.get_experiments(has_bam_for="GRCh37")
The following filters can be used to (quite dramatically) improve the performance for some queries:
If you set distinct to false, each result within the query won't be guaranteed to be unique, yet the response will be faster.
By activating cursor pagination, you will be able to traverse query results, but you won't know the total number of results.
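The trade-off of cursor pagination can be sketched as follows: each page only tells you where the next one starts, so the total is unknown until the last page is reached. This is a mock over an in-memory list, with an integer standing in for what would normally be an opaque cursor token. `fetch_page` and `iterate_all` are hypothetical names.

```python
DATA = [f"analysis-{i}" for i in range(7)]  # mock results


def fetch_page(cursor=None, page_size=3):
    """Return one page of mock results and the cursor of the next page (None when done)."""
    start = cursor or 0
    page = DATA[start:start + page_size]
    next_cursor = start + page_size if start + page_size < len(DATA) else None
    return page, next_cursor


def iterate_all():
    """Walk every page; the total is only known once the last page is reached."""
    cursor = None
    while True:
        page, cursor = fetch_page(cursor)
        yield from page
        if cursor is None:
            break


collected = list(iterate_all())
print(len(collected))
```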
Filters in the command line are usually provided using the --filters flag. Relations or lookups can be provided using double underscores or dots (e.g. application__name). Here is a list of Isabl commands available to retrieve information:
Get count of database instances given a particular query. For example, how many failed analyses are in the system?
Retrieve instances metadata in multiple formats. To limit the output to the fields you are interested in, use -f.
This command retrieves the raw data linked to experiments as imported in Isabl (e.g. BAM, FASTQ, CRAM). Use
Get the official bam registered for a given list of experiments. Use
Isabl supports the linkage of auxiliary resources to the assembly instances. By default, get-reference retrieves the assembly reference genome.
Retrieve a BED file linked to a particular sequencing technique. By default, the targets BED file is returned, to get the baits BED use
Retrieve the storage directory for any instance in the database. Use
Retrieve analyses results produced by applications. Use
Another useful way to explore the relational model is by using
isabl get-metadata experiments --fx
Expand and navigate with arrow keys; press e to expand all and E to minimize. Use --help to learn about other ways to visualize metadata.
Furthermore, you can limit the amount of information you are retrieving by passing the list of fields you are interested in:
isabl get-metadata analyses -f application.name -f status
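To show what selecting fields amounts to, here is a minimal sketch that keeps only the requested dotted fields from a nested record. The record is mock data rather than real Isabl output, and `pick_fields` is a hypothetical helper, not part of isabl_cli.

```python
def pick_fields(record, fields):
    """Keep only the requested dotted fields from a nested dict."""
    picked = {}
    for field in fields:
        value = record
        # walk down the nested dicts one key at a time
        for key in field.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        picked[field] = value
    return picked


analysis = {  # mock record, not real Isabl output
    "pk": 1,
    "status": "SUCCEEDED",
    "application": {"name": "BWA_MEM", "version": "0.7.17"},
}
selected = pick_fields(analysis, ["application.name", "status"])
print(selected)
```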
By default, the command get-reference helps you retrieve the assembly reference genome.
isabl get-reference GRCh37 # retrieve reference genome
However, by means of the --data-id flag, the command get-reference also allows you to retrieve the indexes generated during import. To get a list of available files per assembly use --resources:
$ isabl get-reference GRCh37 --resources
genome_fasta Reference Genome Fasta File.
genome_fasta_fai Index generated by: samtools faidx
Then get the one you are interested in with:
isabl get-reference GRCh37 --data-id genome_fasta_fai
You can use get-outdirs within the command line to systematically explore output directories. For example:
isabl get-outdirs -fi status FAILED | xargs tree -L 2
Furthermore, you can retrieve files within those directories by using --pattern:
isabl get-outdirs -fi status FAILED --pattern 'head_job.*'
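If you already have a list of output directories in Python, a similar search can be sketched with pathlib. This assumes the pattern behaves like a shell-style glob applied directly inside each directory, which may differ from what the command does. `find_in_outdirs` is a hypothetical helper and the directory below is a throwaway mock.

```python
import tempfile
from pathlib import Path


def find_in_outdirs(outdirs, pattern):
    """Return files directly inside each directory whose name matches a glob pattern."""
    matches = []
    for outdir in outdirs:
        matches.extend(sorted(Path(outdir).glob(pattern)))
    return matches


# build a throwaway directory that mimics one analysis output directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("head_job.log", "head_job.err", "results.vcf"):
        (Path(tmp) / name).touch()
    found = [path.name for path in find_in_outdirs([tmp], "head_job.*")]

print(found)
```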
Additionally, you can retrieve results directly registered by the application:
# loop over the command_err result of every failed analysis
for i in `isabl get-results -fi status FAILED -r command_err`; do
    echo exploring $i;
done
To visualize what results are available for a given application run:
isabl get-results --app-results <application primary key>
You can retrieve the application primary key from the front end.
isabl-cli can also be used as a Software Development Kit within Python:
Try from an ipython session
import isabl_cli as ii # ii stands for `interactive isabl` 😎
If you are using ipython, use ? to get help on a method.
To get started, we can retrieve specific instances from the database:
# retrieve an experiment with a system id (primary keys also work)
experiment = ii.Experiment("DEM_10000_T01_01_TD1")
# we can also get an analysis using its primary key (we'll limit retrieved fields)
analysis = ii.Analysis(10235, fields="status,application")
# same thing for assemblies
assembly = ii.Assembly("GRCh37")
These instances are
A more general way to retrieve any object in the database is using get_instance:
project = ii.get_instance("projects", 100) # the signature is (endpoint, identifier)
Some examples of things you can do with these instances:
# get the target experiment or tumor (targets is a list)
target = analysis.targets[0]
# print the experiment sample class
# see available analysis fields
# see all available data
To get multiple instances you can do:
# get all TUMOR experiments in project 102
experiments = ii.get_experiments(projects=102, sample__category="TUMOR")
# get the first 10 SUCCEEDED analyses in the same project
analyses = ii.get_analyses(projects=102, status="SUCCEEDED", count_limit=10)
# get all the projects where I'm the owner
projects = ii.get_projects(owner__startswith="besuhof")
As with isabl get-count, you can determine the number of available results for a given query:
To retrieve all samples and experiments for a given individual:
# you can also use the individual system_id
individual = ii.get_tree(10000)
# then all samples are available at
samples = individual.sample_set
# and all experiments for a given sample (here, the first one)
experiments = samples[0].experiment_set
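Walking a whole tree is a nested loop over sample_set and experiment_set. The sketch below uses mock objects built with SimpleNamespace instead of a real API response. The identifiers are made up and only the two attribute names come from the example above.

```python
from types import SimpleNamespace as NS

# mock tree: one individual, two samples, three experiments (identifiers are made up)
individual = NS(
    system_id="IND_1",
    sample_set=[
        NS(
            system_id="IND_1_T01",
            experiment_set=[NS(system_id="IND_1_T01_01"), NS(system_id="IND_1_T01_02")],
        ),
        NS(system_id="IND_1_N01", experiment_set=[NS(system_id="IND_1_N01_01")]),
    ],
)

# collect every experiment of every sample of the individual
experiment_ids = [
    experiment.system_id
    for sample in individual.sample_set
    for experiment in sample.experiment_set
]
print(experiment_ids)
```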
You can also retrieve multiple trees:
individuals = ii.get_trees(projects=267)
If you have permissions, you will be able to systematically alter instances in the database:
# create a disease
ii.create_instance("diseases", name="Osteosarcoma", acronym="OS")
# update an individual's gender
# delete an analysis
With great power, comes... yeah you know how it goes. Just be careful.
Here are other useful utilities available in isabl_cli:
Given a list of elements, return a list of chunks of a given size.
Perform an authenticated request to Isabl API.
Retry an HTTP request multiple times with a delay between each failure.
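To illustrate the idea behind the chunking and retry utilities, here is a standalone sketch. These are not the isabl_cli implementations, and their real names and signatures may differ. `chunks`, `retry` and `flaky` below are written only for this example.

```python
import time


def chunks(items, n):
    """Split a list into consecutive chunks of at most n elements."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def retry(func, attempts=3, delay=0.0):
    """Call func until it succeeds, sleeping `delay` seconds after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)


calls = {"count": 0}


def flaky():
    """Fail twice, then succeed; stands in for an unreliable HTTP request."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


batches = chunks(list(range(7)), 3)
result = retry(flaky, attempts=5)
print(batches, result, calls["count"])
```

Chunking keeps each request small, and retrying with a delay helps with transient network errors.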
Isabl Web is a great tool to retrieve information and understand the state of affairs within the system. Simply type something in the search bar to retrieve instances across multiple schemas:
Multiple panels will be stacked horizontally as you request more information:
The projects detail panel conveys all assets and stakeholders linked to a particular project:
Live Tables are directly wired to the API and enable you to search and filter on specific columns. For the latter, simply click on the column name:
Directly searching on the sample Identifier column.
The samples tree panel provides access to all assets generated on a given individual.
A data generation process tree that resulted in 4 experiments (or ultimately bams), produced from two samples of the same individual.
By clicking on a given node in the tree, you can retrieve more metadata, filter out available analyses on that instance, and even get access to BAM files:
Although the BAM file is an output of the bwa-mem analysis, Isabl enables registering default bams to an experiment. Thus a link is available in the sample panel.
We can retrieve different types of results for all analyses generated by Isabl. For example, accessing a project-level quality control report:
A project level Quality Control report. Can you find the Experiment and Individual-level reports?
Similarly, we can retrieve other types of results, such as a VCF: