Home

Isabl is a platform for the integration, management, and processing of individual-centric multimodal data. Welcome to the Isabl Documentation!

Isabl is a plug-and-play data science framework designed to support the processing of multimodal patient-centric data. Have questions? Ask here.

Isabl was developed by Elli Papaemmanuil's lab.

Quick Start

Features

👾 Backend, Data Model and RESTful API
- Metadata version control
- Fully featured and brisk RESTful API with extensive Swagger documentation
- Comprehensive permission controls and user groups
- Patient-centric relational model with support for:
  - Individuals, samples, experiments and cohorts
  - Assembly-aware bioinformatics applications and analyses
  - Choice models such as diseases, centers and more
  - Custom fields for all schemas!
🤖 Command Line Interface and Software Development Kit
- Digital Assets Management (Permissions, Storage, Tracking)
- Automated execution and tracking of bioinformatics applications
- Automatic merging of results at the project and patient level
- Operational automations on data import and analyses status change
- Dynamic retrieval of data and results using versatile queries
- Fully featured SDK for post-processing analyses
🚀 Web Application
- User Interface to browse and manage the operations metadata
- Analyses tracking and results visualization
- Flexibility to edit and customize models
- Batch creation of metadata by Excel file submission
- Single Page Application that provides a crisp user experience
- Possibility to integrate third-party services like JIRA
✅ Plug-n-play and reliable codebase
- Docker Compose is the only dependency for the web application and the backend
- The Command Line Interface is a portable, pip-installable package
- Continuously integrated, with more than 98% test coverage across the codebase
- Isabl is upgradable, so there is no need to fork the codebase

Who is using Isabl

Elli Papaemmanuil's lab.
The Department of Pediatrics at Memorial Sloan Kettering.
Sohrab Shah's lab.
The Microbiome Program at Memorial Sloan Kettering.
The Single-Cell Analytics Service (SAIL) at Memorial Sloan Kettering.
Christina Curtis's lab at Stanford Medicine.

Many other groups, including groups at Weill Cornell, California State University, and the University of Oviedo (Spain), are currently testing it as a potential fit!

Infrastructure

Isabl is a modular infrastructure with four main components: (1) an individual-centric and extensible relational database (Isabl-db); (2) a comprehensive RESTful API (Isabl-api) used to support integration with data processing environments and enterprise systems (e.g. clinical databases, visualization platforms); (3) a Command Line Client (CLI; Isabl-cli) used to manage digital assets and deploy bioinformatics applications; (4) a front-end single-page web application (Isabl-web) that supports system-wide queries.

RESTful API capabilities are documented with Swagger (https://swagger.io) and Redoc (https://github.com/Rebilly/ReDoc) following the OpenAPI specification (https://www.openapis.org). Importantly, Isabl's metadata infrastructure is decoupled from, and agnostic of, compute and data storage environments (e.g. local, cluster, cloud). This decoupling separates dependencies and fosters interoperability across compute environments.
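
Because the metadata lives behind a RESTful API, any environment that can issue HTTP requests can query it. The sketch below only illustrates that idea by assembling a filtered GET URL. The host, the endpoint path, and the filter names are assumptions made for illustration, not values taken from Isabl's API reference; consult the Swagger or Redoc pages of your own instance for the real ones.

```python
from urllib.parse import urlencode

# Hypothetical values: the host, the path prefix, and the filter names
# below are illustrative and are not taken from Isabl's API documentation.
BASE_URL = "https://isabl.example.org/api/v1"


def build_query_url(endpoint: str, **filters: str) -> str:
    """Return a GET URL for a REST collection with query-string filters."""
    path = f"{BASE_URL}/{endpoint.strip('/')}/"
    # Sort the filters so the same query always yields the same URL.
    query = urlencode(sorted(filters.items()))
    return f"{path}?{query}" if query else path


url = build_query_url("experiments", technique="WGS", project="42")
print(url)
# → https://isabl.example.org/api/v1/experiments/?project=42&technique=WGS
```

The same URL could be requested from a laptop, a cluster node, or a cloud job, which is the practical meaning of a metadata layer that is agnostic of the compute environment.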

Data Model

Isabl's relational model maps workflows for data provenance, processing, and governance. Metadata is captured across the following thematic categories: (1) project, individual and sample level attributes; (2) raw data properties, including experimental technique, technology, and related parameters (e.g. read length); (3) analytical workflows, including a complete audit trail of versioned algorithms, related execution parameters, reference files, analysis status tracking, and results deposition; (4) data governance information for the management of system and data access across stakeholders.
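
To make the individual-centric hierarchy concrete, here is a minimal sketch using plain Python dataclasses. It is not Isabl's actual schema: the class names follow the entities listed on this page, while every field name and the `"CREATED"` status value are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    technique: str      # experimental technique, e.g. "WGS"
    read_length: int    # a raw data parameter


@dataclass
class Sample:
    identifier: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Individual:
    identifier: str
    samples: List[Sample] = field(default_factory=list)

    def experiments(self) -> List[Experiment]:
        """Collect every experiment reachable from this individual."""
        return [e for s in self.samples for e in s.experiments]


@dataclass
class Analysis:
    application: str            # name of the versioned algorithm
    version: str
    targets: List[Experiment]   # experiments the analysis was run on
    status: str = "CREATED"     # hypothetical status value


tumor = Sample("S1", [Experiment("WGS", 150)])
normal = Sample("S2", [Experiment("WGS", 150), Experiment("RNA", 100)])
patient = Individual("I1", [tumor, normal])
analysis = Analysis("variant-caller", "1.0.0", patient.experiments())
print(len(analysis.targets))  # → 3
```

Linking analyses to experiments, and experiments back to samples and individuals, is what allows results to be traced to the data and the application version that produced them.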

Why Isabl

Isabl ensures that all bioinformatics operations follow the DATA reproducibility checklist (Documentation, Automation, Traceability, and Autonomy), while guaranteeing that assets are managed according to the FAIR principles (Findable, Accessible, Interoperable, Reusable).

Here are some reasons why you may want to use Isabl:

You don't have a team of more than 10 engineers but do have hundreds of samples
You'd rather not have your data managed by postdocs and PhD students
You want to cross-link samples from different cohorts
You want to answer new questions using existing data
You need a full log and audit trail of your informatics operations
You want results to merge automatically as new samples are added to big cohorts
You want programmatic access to the entire data capital
You want to run reproducible pipelines seamlessly across your projects

Similar projects

The Genome Modeling System: the platform of the Genome Institute at Washington University.
SeqWare: analysis of massive genomics datasets.
QuickNGS: efficient high-throughput data analysis of Next-Generation Sequencing data.
HTS-flow: a framework for the management and analysis of NGS data.

What Isabl is not

Isabl is not a Workflow Management System such as Toil or Bpipe. Instead, Isabl facilitates the automated deployment and databasing of data processing pipelines.
Isabl is not a Platform as a Service (PaaS) provider such as DNAnexus, Seven Bridges or FireCloud. Instead, it is an information system that could potentially feed metadata and data into these services.
Isabl differs from Server Workbenches such as Galaxy or Pegasus: rather than being configuration friendly, Isabl is designed to conduct systematic analyses automatically and in a standardized way, with as little human input as possible.
Isabl is not a Workflow Language. Instead, Bioinformatics Applications in Isabl only define metadata-driven validation and the logic to build commands that trigger pipelines written in any language.
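
The last point can be sketched as follows: an application checks an experiment's metadata and, if it passes, builds the shell command that launches an external pipeline. This is a generic illustration and not the Isabl SDK; the `AlignerApp` class, its attributes, its method names, and the `run_aligner.sh` script are all hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Experiment:
    identifier: str
    technique: str
    assembly: str
    raw_data: str


class AlignerApp:
    """Hypothetical application: validate metadata, then build a command."""

    ASSEMBLY = "GRCh37"
    SUPPORTED_TECHNIQUES = {"WGS", "WES"}

    def validate(self, experiment: Experiment) -> None:
        # Metadata-driven validation: reject experiments this app cannot run.
        if experiment.technique not in self.SUPPORTED_TECHNIQUES:
            raise ValueError(f"unsupported technique: {experiment.technique}")
        if experiment.assembly != self.ASSEMBLY:
            raise ValueError(f"unsupported assembly: {experiment.assembly}")

    def build_command(self, experiment: Experiment) -> str:
        # The pipeline itself (run_aligner.sh, hypothetical) can be written
        # in any language; the app only assembles its command line.
        self.validate(experiment)
        args = [
            "run_aligner.sh",
            "--input", experiment.raw_data,
            "--reference", self.ASSEMBLY,
            "--output", f"{experiment.identifier}.bam",
        ]
        return shlex.join(args)


app = AlignerApp()
print(app.build_command(Experiment("E1", "WGS", "GRCh37", "/data/E1.fastq.gz")))
# → run_aligner.sh --input /data/E1.fastq.gz --reference GRCh37 --output E1.bam
```

Keeping validation and command construction separate from the pipeline code is what lets the same bookkeeping wrap pipelines written in any language.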


