Big Data and Data Sharing Resources

Data Resources

Data Resources - General

  • Cancer Research Data Commons 
    • The Cancer Research Data Commons (CRDC) is a centralized location where stakeholders can submit and access multiple curated data repositories

Data Resources - Pre-Clinical

  • FlowRepository 
    • FlowRepository is a repository of downloadable, annotated, and peer-reviewed flow cytometry data sets
  • immuneACCESS  
    • immuneACCESS is a large repository of T-cell and B-cell receptor sequencing data 
  • IOTN Data Sharing Catalog  
    • The IOTN Data Sharing Catalog includes datasets from IOTN awardee publications
  • iReceptor
    • iReceptor is a repository of antibody/B-cell and T-cell receptor repertoires 
  • McPAS-TCR 
    • McPAS-TCR is a curated database of pathology-associated T-cell receptor sequences
  • NCTN/NCORP Data Archive
    • The NCTN/NCORP Data Archive is a database that includes datasets originating from clinical trials within the National Clinical Trials Network (NCTN) and the NCI Community Oncology Research Program (NCORP)
  • VDJdb
    • VDJdb is a curated repository of T-cell receptor sequences with known antigen specificities

Data Resources - Data Repositories

  • dbGaP
    • The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.
  • European Genome-phenome Archive
    • The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of personally identifiable genetic, phenotypic, and clinical data generated for the purposes of biomedical research projects or in the context of research-focused healthcare systems.
  • GEO
    • GEO is a public functional genomics data repository supporting MIAME-compliant data submissions.
  • Proteomics Data Commons (PDC) 
    • The PDC hosts proteomic data, including datasets generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
  • The Cancer Imaging Archive (TCIA) 
    • TCIA de-identifies and hosts a large archive of medical images of cancer accessible for public download.
  • The NCI's Genomic Data Commons (GDC) 
    • The NCI's GDC provides the cancer research community with a unified repository and cancer knowledge base that enables data sharing across cancer genomic studies in support of precision medicine. 

Software Tools

Software Tools - Framework Software & Libraries

  • Bioconductor (Tools)
    • Bioconductor is an R-based toolset used for analysis and comprehension of high-throughput genomic data
  • IOTN Bioconductor Suite  
    • The IOTN Bioconductor suite is a collection of Bioconductor tools that can be used for immunotherapy data set analyses 
  • IOTN Software Sharing Catalog 
    • The IOTN Software Sharing Catalog is a collection of software programs and source codes from projects supported by IOTN grants 
  • SimpleITK 
    • Open-source interface to the Insight Segmentation and Registration Toolkit. The SimpleITK image analysis library is available in multiple programming languages including C++, Python, R, Java, C#, Lua, Ruby and Tcl. 
  • OpenCV
    • OpenCV is a library of programming functions mainly aimed at real-time computer vision.

Software Tools - Cytometry

  • CyTOF workflow
    • CyTOF workflow is an R-based pipeline used for differential analyses of high-dimensional mass and flow cytometry (HDCyto) data
  • Cytometry dATa anALYSis Tools
    • Cytometry dATa anALYSis Tools (CATALYST) is a tool that provides a pre-processing pipeline and visualization assistance for mass cytometry (CyTOF) data
  • DiffCyt
    • DiffCyt is a Bioconductor tool that provides statistical methods for differential discovery analyses in high-dimensional cytometry data
  • flowCore
    • flowCore is a Bioconductor tool that assists with flow cytometry data analyses

Software Tools - Single Cell Sequencing

  • Kallisto
    • Kallisto is a program used for measuring transcript and/or target sequence abundance in high-throughput sequencing datasets
  • Seurat
    • Seurat is an R-based toolkit used for single-cell genomic analyses. Guided analyses for Seurat can be found here
  • Scanpy
    • Scanpy (Single-Cell Analysis in Python) is a Python-based toolkit used for single-cell gene expression analyses
  • Azimuth
    • Azimuth provides reference-based cell type annotation for scRNA-seq data
  • scvi-tools
    • scvi-tools provides probabilistic models for single-cell omics data based on auto-encoder neural network models. Supported tasks include dimensionality reduction, dataset integration, differential expression, and automated annotation.

Software Tools - Spatial omics & Microscopy Imaging

  • Cellpose
    • Cellpose is a generalized, machine-learning-based algorithm for cell segmentation
  • CytoKit
    • CytoKit is a collection of tools for quantifying and analyzing properties of individual cells in large fluorescent microscopy datasets

Software Tools - TCR

  • FEST
    • Functional Expansion of Specific T-cells (FEST) is an online program used for analysis of TCR sequencing of short-term, peptide-stimulated cultures in order to identify antigen-specific clonotypic amplifications
  • GLIPH
    • Grouping of Lymphocyte Interactions by Paratope Hotspots (GLIPH) is an online program used for clustering T-cell receptors based on their predicted MHC-restricted peptide binding properties
  • Immunarch
    • Immunarch is an R-based package used for analysis of single-cell and bulk T-cell/antibody repertoires
  • iReceptor
    • iReceptor is an online platform used for analyzing T-cell receptor and B-cell/antibody repertoire data across federated repositories
  • MiXCR
    • MiXCR is an online tool used for analysis of raw "Immunome" sequencing data
  • SeeTCR
    • SeeTCR is an online tool for processing T-cell repertoire data into quantitative graphics
  • VDJServer
    • VDJServer is an immune repertoire analysis and archiving suite
  • VDJtools
    • VDJtools is an online tool for analysis of immune repertoire sequencing (RepSeq) data
  • VDJviz
    • VDJviz is an online browser used to navigate and analyze immune repertoire sequencing data

Educational Resources

  • CITE-Seq
    • CITE-Seq is a method that uses DNA-barcoded antibodies to allow for quantifiable detection of proteins. This website provides introductory education into these techniques.
  • Data Carpentry
    • Data Carpentry is an organization that focuses on data science and computational education for researchers
  • DataCamp
    • DataCamp is an online data science educational organization, providing both basic and advanced offerings