:orphan: .. _version-1: Schema ====== **Version: 1** This schema describes a compressed sparse row storage scheme (CSR) for a *symmetric* matrix with genomic dimension/axis annotations. Notes: - Any number of additional optional columns can be added to each table. (e.g. normalization vectors, quality masks). - Genomic coordinates are assumed to be 0-based and intervals half-open (1-based ends). Contact matrix ~~~~~~~~~~~~~~ The tables and indexes can be represented in the `Datashape `_ layout language: :: { chroms: { name: typevar['Nchroms'] * string[32, 'ascii'], length: typevar['Nchroms'] * int64, }, bins: { chrom_id: typevar['Nbins'] * int32, start: typevar['Nbins'] * int64, end: typevar['Nbins'] * int64, weight: typevar['Nbins'] * float64 }, pixels: { bin1_id: typevar['Nnz'] * int32, bin2_id: typevar['Nnz'] * int32, count: typevar['Nnz'] * int32 }, indexes: { chrom_offset: (typevar['Nchroms'] + 1) * int32, bin1_offset: (typevar['Nbins'] + 1) * int32 } } Notes: - Having the ``bin1_offset`` index, the ``bin1_id`` column becomes redundant, but we keep it for convenience as it is extremely compressible. It may be dropped in future versions. Metadata ~~~~~~~~~ Essential key-value properties are stored as root-level HDF5 attributes. A specific bucket called ``metadata`` is reserved for arbitrary JSON-compatible user metadata. :: nchroms : Number of rows in scaffolds table nbins : Number of rows in bins table nnz : Number of rows in matrix table bin-type : {"fixed" or "variable"} bin-size : Size of bins in base pairs if bin-type is "fixed" genome-assembly : Name of genome assembly library-version : Version of cooler library that created the file format-version : The version of the current format format-url : URL to page providing format details creation-date : Date the file was built metadata : custom user metadata about the experiment Indexes ~~~~~~~ Indexes are stored as 1D datasets in a separate group. The current indexes can be thought of as run-length encodings of the ``bins/chrom`` and ``pixels/bin1_id`` columns, respectively. - ``chrom_offset`` : indicates what row in the bin table each chromosome first appears. - ``bin1_offset`` : indicates what row in the pixel table each bin1 ID appears. This is often called *indptr* in CSR data structures.