Glossary

  • HDF5 is a general-purpose binary container format for large scientific datasets.

  • h5py is a Python library providing low-level bindings to the libhdf5 C library and a high-level, NumPy-aware API to interact with HDF5 files on disk.

  • The cooler data model is a flexible sparse data model for Hi-C and other genomically-labeled arrays.

  • The cooler schema describes an implementation of the cooler data model using HDF5 as the underlying storage layer.

  • Cooler files store one or more cooler data collections, each representing a genomically-labeled sparse array.

  • Single-resolution cooler files are conventionally given the extension .cool. Multi-resolution files are usually suffixed .mcool.

  • The cooler Python package provides an API to create cooler files and to interact with them both as data frames and sparse matrices.

  • A genomic pairs list provides pointwise 2-tuples of single-bp genomic locations. In Hi-C this is also called a contact list.

  • A genomic matrix, 2D array or heatmap assigns unique quantitative values to pairs of genomic intervals taken from a bin segmentation of a genome assembly. In Hi-C, a contact matrix is obtained by aggregating pairs.
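
To make the relationship between a pairs list, a bin segmentation and a contact matrix more concrete, here is a minimal sketch in plain Python. It is not the cooler package's API. The toy chromosome sizes, the fixed bin width, and the helper names (make_bins, bin_offsets, aggregate_pairs) are all assumptions made for illustration, as is the choice to record each pair only once with the smaller bin index first.

```python
from collections import Counter

# Hypothetical toy assembly: chromosome name -> length in bp (assumed values).
CHROMSIZES = {"chr1": 1000, "chr2": 600}
BINSIZE = 250  # assumed fixed bin width in bp


def make_bins(chromsizes, binsize):
    """Segment each chromosome into consecutive fixed-width bins."""
    bins = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            # The last bin of a chromosome may be shorter than binsize.
            bins.append((chrom, start, min(start + binsize, length)))
    return bins


def bin_offsets(chromsizes, binsize):
    """Map each chromosome to the genome-wide index of its first bin."""
    offsets, total = {}, 0
    for chrom, length in chromsizes.items():
        offsets[chrom] = total
        total += -(-length // binsize)  # ceiling division
    return offsets


def aggregate_pairs(pairs, chromsizes, binsize):
    """Count pairs per (bin1, bin2), always storing the smaller index first."""
    offsets = bin_offsets(chromsizes, binsize)
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        i = offsets[chrom1] + pos1 // binsize
        j = offsets[chrom2] + pos2 // binsize
        counts[(min(i, j), max(i, j))] += 1
    # Sparse representation: one (bin1_id, bin2_id, count) row per nonzero cell.
    return sorted((i, j, n) for (i, j), n in counts.items())


# A toy pairs list: (chrom1, pos1, chrom2, pos2) with 0-based positions.
pairs = [
    ("chr1", 10, "chr1", 300),
    ("chr1", 260, "chr1", 40),
    ("chr1", 990, "chr2", 5),
    ("chr2", 590, "chr2", 599),
]

bins = make_bins(CHROMSIZES, BINSIZE)
pixels = aggregate_pairs(pairs, CHROMSIZES, BINSIZE)

for row in pixels:
    print(row)
```

Only nonzero cells are kept, which is why a sparse representation suits matrices like this: most pairs of bins in a real genome receive no counts at all.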