Retrieving Data
🤓 The ultimate guide for data analysts using Isabl!
Introduction to Filters
Filters enable you to subset the data you are interested in. For example, you can use filters to retrieve all the BAM files of a given project, or to get all VCFs from a given variant calling application. Filters are field-value pairs and can be used both in the command line and within Python. Check out these examples:
Note that fields can traverse the relational model. To do so, concatenate the fields with __ (e.g. samples__disease__acronym=AML), or with a dot in the command line (e.g. application.name=PINDEL).
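The equivalence between the dotted and double-underscore spellings can be sketched in plain Python. This is only an illustration of the idea, not Isabl's implementation: the helper names `normalize_filter_key` and `resolve`, and the sample record, are hypothetical.

```python
from urllib.parse import urlencode


def normalize_filter_key(key: str) -> str:
    """Convert the dotted command-line spelling into the double-underscore form."""
    return key.replace(".", "__")


def resolve(record: dict, field: str):
    """Follow a double-underscore path through nested dictionaries."""
    value = record
    for part in field.split("__"):
        value = value[part]
    return value


# Hypothetical record, shaped only for illustration.
analysis = {"application": {"name": "PINDEL"}, "status": "SUCCEEDED"}

# Filters as they might be typed in the command line (dotted form).
filters = {"application.name": "PINDEL", "status": "SUCCEEDED"}
normalized = {normalize_filter_key(k): v for k, v in filters.items()}

# Field-value pairs map naturally onto a URL query string.
query_string = urlencode(normalized)

# Check whether the sample record satisfies every filter.
matches = all(resolve(analysis, k) == v for k, v in normalized.items())
print(query_string)
print(matches)
```

In practice the server resolves these relations against the database; the nested dictionary here only mimics that traversal.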
Filter Modifiers
As indicated in the previous hint, filter fields can traverse database relationships. In addition, all filters can be augmented using lookups:
Here is a quick representation of Isabl's relational model and the related filters:
To get a full description of all available filters, please visit Isabl's Redoc API documentation at https://isabl.github.io/redoc/ or https://isabl.mskcc.org/api/v1 (replacing isabl.mskcc.org with your own host). Another useful way to explore the relational model is by using isabl get-metadata.
Common Filters
Here are some common and useful filters for Isabl.
Limit vs Count Limit
The filter count_limit enables you to limit the total number of instances that will be retrieved. For example, to get the output directory for the first 10 successful analyses you could do:
On the other hand, limit determines how many instances are retrieved at a time. For example, the following command would retrieve paths to all successful analyses in batches of 10000:
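The difference between the two filters can be sketched without an Isabl server. The snippet below is a minimal simulation under stated assumptions: `FAKE_OUTDIRS`, `fetch_page`, and `iterate` are hypothetical names, and offset-based paging is assumed purely for illustration (the real API may paginate differently).

```python
from typing import Iterator, List, Optional

# Hypothetical stand-in for the database: 25 analysis output directories.
FAKE_OUTDIRS = [f"/data/analyses/{pk}" for pk in range(1, 26)]
requests_made = []


def fetch_page(offset: int, limit: int) -> List[str]:
    """Pretend API call returning at most `limit` items starting at `offset`."""
    requests_made.append((offset, limit))
    return FAKE_OUTDIRS[offset:offset + limit]


def iterate(limit: int, count_limit: Optional[int] = None) -> Iterator[str]:
    """Yield results page by page, stopping once `count_limit` items were yielded."""
    offset, yielded = 0, 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            return
        for item in page:
            yield item
            yielded += 1
            if count_limit is not None and yielded >= count_limit:
                return
        offset += limit


# count_limit caps the total: only the first 10 results are retrieved.
first_ten = list(iterate(limit=10, count_limit=10))
requests_for_first_ten = len(requests_made)

# limit alone only sets the batch size: every result is still retrieved.
requests_made.clear()
everything = list(iterate(limit=10))
requests_for_everything = len(requests_made)
```

In this sketch `count_limit` changes how many results come back, whereas `limit` only changes how many requests are needed to get them.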
Has BAM File
For example, to get all experiments that have a BAM file for GRCh37 you could do:
Performance Filters
The following filters can be used to (quite dramatically) improve the performance for some queries:
paginator=cursor is still experimental; please report an issue if you have trouble.
Isabl Command Line Client
Filters in the command line are usually provided using the -fi or --filters flags. Relations or lookups can be provided using double underscores or dots (e.g. application.name or application__name). Here is a list of Isabl commands available to retrieve information:
Dynamically Explore Metadata
Another useful way to explore the relational model is by using isabl get-metadata:
Expand and navigate with arrow keys, press e to expand all and E to minimize. Learn more in the fx documentation. Use --help to learn about other ways to visualize metadata (e.g. tsv).
Furthermore, you can limit the amount of information you are retrieving by passing the list of fields you are interested in:
Assembly Resources
By default, the command get-reference helps you retrieve the assembly reference genome.
However, by means of the --data-id flag, the command get-reference also allows you to retrieve the indexes generated during import. To get a list of available files per assembly use --resources:
Then get the one you are interested in with:
Retrieving Application Results
You can use get-outdirs within the command line to systematically explore output directories. For example:
Furthermore, you can retrieve files within those directories by using --pattern:
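Conceptually, this amounts to searching each output directory for files that match a pattern. The sketch below mimics that in plain Python; it assumes glob-style patterns (the exact matching rules of --pattern are not confirmed here), and `find_in_outdirs` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_in_outdirs(outdirs: Iterable[str], pattern: str) -> List[str]:
    """Return files inside each output directory that match a glob pattern."""
    found = []
    for outdir in outdirs:
        matches = sorted(Path(outdir).glob(pattern))
        found.extend(str(path) for path in matches if path.is_file())
    return found


# Build two throwaway directories that stand in for analysis output directories.
root = Path(tempfile.mkdtemp())
for name in ("analysis_1", "analysis_2"):
    (root / name).mkdir()
    (root / name / "calls.vcf.gz").write_text("")
    (root / name / "head_job.log").write_text("")

outdirs = [str(root / "analysis_1"), str(root / "analysis_2")]
vcfs = find_in_outdirs(outdirs, "*.vcf.gz")
print(vcfs)
```

The same idea applies to a list of directories printed by get-outdirs, for instance when post-processing them in a notebook.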
Additionally, you can retrieve results directly registered by the application:
To visualize what results are available for a given application run:
You can retrieve the application primary key from the front end.
Isabl Software Development Kit
Importantly, isabl-cli can also be used as a Software Development Kit within Python:
If you are using ipython, use ? to get help on a method (e.g. ii.get_instances?).
Getting Instances
To get started, we can retrieve specific instances from the database:
These instances are Munch objects; in other words, they are dot-dicts (like JavaScript objects). So you can use either analysis["status"] or analysis.status.
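To make the dot-dict behavior concrete, here is a dependency-free stand-in. `DotDict` is a hypothetical, simplified class written for this example; it is not the Munch implementation, and the field values are made up.

```python
class DotDict(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


# Hypothetical analysis record; nested objects are dot-dicts too.
analysis = DotDict(pk=1, status="SUCCEEDED", application=DotDict(name="PINDEL"))

# Both access styles return the same value.
same_value = analysis["status"] == analysis.status

# Attribute assignment updates the underlying dictionary.
analysis.status = "FAILED"
application_name = analysis.application.name
```

Because the object is still a dictionary, it keeps working with anything that expects one, such as `json.dumps` or `dict.items()`.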
A more general way to retrieve any object in the database is using get_instance:
Some examples of things you can do with these instances:
To get multiple instances you can do:
Similarly to isabl get-count, you can determine the number of available results for a given query:
Getting all Samples from an Individual
To retrieve all samples and experiments for a given individual:
You can also retrieve multiple trees:
Create, Delete, and Modify Instances
If you have permissions, you will be able to systematically alter instances in the database:
With great power, comes... yeah you know how it goes. Just be careful.
Isabl SDK Utils
Here are other useful utilities available in isabl-cli:
Isabl Web
Isabl Web is a great tool to retrieve information and understand the state of affairs within the system. Simply type something in the search bar to retrieve instances across multiple schemas:
Multiple panels will be stacked horizontally as you request more information:
Projects Detail Panel
The projects detail panel conveys all assets and stakeholders linked to a particular project:
Live Tables are directly wired to the API and will enable you to search and filter on specific columns. For the latter, simply click on the column name:
The Samples View
The samples tree panel provides access to all assets generated on a given individual.
By clicking on a given node in the tree, you can retrieve more metadata, filter out available analyses on that instance, and even get access to BAM files:
Analyses Results
We can retrieve different types of results for all analyses generated by Isabl. For example, we can access a project-level quality control report:
Similarly we can retrieve other types of results such as a VCF: