Data model design overview
Updated: December 2005

The Workbench design manages an interconnected set of data describing chemical compounds, interactions, pathways, cell geometry, and visual attributes. The data model is a conceptual view of that data, describing its main components and how they relate to one another. This conceptual view is made concrete through several implementations:

  • Programming API - the classes and methods that manage the data
  • File format - the file syntax used to store that data
  • Database - the database schema used to store that data

The pages here discuss the conceptual view of the data. Please see the file formats and database overviews for a discussion of data storage, and the Workbench's API documentation for the Java implementation of this data model.

Principal data objects

There are four main data structures around which the rest of the data model revolves:

Compound
A description of a compound that may be used within a pathway. A compound names the chemical and includes core attributes such as its type (e.g., protein or small molecule) and ontological information (e.g., what type of protein). Database IDs may be attached so that detailed information about the compound can be looked up and cached in the Workbench's data structures.
Participant
A description of the use of a chemical compound within a pathway. A participant refers to a compound description for the compound's general attributes, then adds usage-specific attributes such as stoichiometry and charge, the cell compartment involved, time course data, and commentary on the participant's role in the pathway.
Interaction
A description of a reaction involving two or more participants, grouped into reactants, products, and catalysts. An interaction can be marked as reversible, its enzymes can be marked as activating or inhibiting the reaction, and it can be assigned a name, commentary, database IDs, and other attributes.
Pathway
A description of a reaction network involving one or more interactions and their participants.
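The relationships among the four principal objects can be sketched in Java. This is a minimal, hypothetical illustration: the class names, fields, and the hasEnoughParticipants and compounds helpers are assumptions, not the Workbench's actual API. Participant attributes such as charge and time course data, and the activating/inhibiting flag on enzymes, are omitted for brevity. The demo uses a simplified hexokinase step with ATP and ADP left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and fields are illustrative only and are not
// the Workbench's actual Java API.

/** General description of a compound, shared by every use of it. */
record Compound(String name, String simpleType, List<String> databaseIds) {}

/** One use of a compound in a pathway, adding usage-specific attributes. */
record Participant(Compound compound, int stoichiometry,
                   String compartment, String role) {}

/** A reaction whose participants are grouped by the part they play. */
final class Interaction {
    final String name;
    final boolean reversible;
    final List<Participant> reactants = new ArrayList<>();
    final List<Participant> products = new ArrayList<>();
    final List<Participant> catalysts = new ArrayList<>();

    Interaction(String name, boolean reversible) {
        this.name = name;
        this.reversible = reversible;
    }

    /** An interaction needs two or more participants in total. */
    boolean hasEnoughParticipants() {
        return reactants.size() + products.size() + catalysts.size() >= 2;
    }
}

/** A reaction network made of one or more interactions. */
final class Pathway {
    final String name;
    final List<Interaction> interactions = new ArrayList<>();

    Pathway(String name) {
        this.name = name;
    }

    /** Distinct compounds referenced by any participant, in first-seen order. */
    Set<Compound> compounds() {
        Set<Compound> result = new LinkedHashSet<>();
        for (Interaction i : interactions) {
            for (List<Participant> group : List.of(i.reactants, i.products, i.catalysts)) {
                for (Participant p : group) {
                    result.add(p.compound());
                }
            }
        }
        return result;
    }
}

class PrincipalObjectsDemo {
    public static void main(String[] args) {
        Compound glucose = new Compound("glucose", "small molecule", List.of());
        Compound g6p = new Compound("glucose-6-phosphate", "small molecule", List.of());
        Compound hexokinase = new Compound("hexokinase", "protein", List.of());

        // Each participant points at a shared compound description and adds
        // its own usage attributes (stoichiometry, compartment, role).
        Interaction step = new Interaction("glucose phosphorylation", false);
        step.reactants.add(new Participant(glucose, 1, "cytosol", "substrate"));
        step.products.add(new Participant(g6p, 1, "cytosol", "product"));
        step.catalysts.add(new Participant(hexokinase, 1, "cytosol", "enzyme"));

        Pathway glycolysis = new Pathway("glycolysis");
        glycolysis.interactions.add(step);
        System.out.println(glycolysis.compounds().size()); // prints 3
    }
}
```

Keeping the compound separate from the participant means one compound description can be reused across many interactions while each use carries its own compartment and stoichiometry.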

Secondary data objects

The principal data objects refer to secondary data objects for key attributes, including:

Cell Compartment
The region of the cell in which a participant is involved in an interaction (e.g., nucleus or cell membrane). A compartment is minimally a named region. It may also include a location and size for the region, and compartments may be organized into a containment hierarchy of compartments within compartments.
Organism
The organism for which the data is relevant (e.g., mouse or human). An organism minimally has a name, but may also include classification data that places it within a taxonomy of organisms.
Simulation Parameters
Data specifying kinetic rate laws, initial conditions, experimental data on concentration changes, and so forth.
Presentation Parameters
Visual attributes, such as the location, size, and color of pathway components.

All secondary data is optional.
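The cell compartment containment hierarchy can be illustrated with a small Java sketch. It is hypothetical: CellCompartment, addChild, isWithin, and path are assumed names, and the optional location and size attributes are left out. A compartment needs only a name, and its enclosing compartment is exposed as an Optional, in keeping with secondary data being optional.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a compartment containment hierarchy. The class and
// method names are illustrative; optional location and size are omitted.
final class CellCompartment {
    private final String name;
    private final CellCompartment parent; // null for an outermost compartment
    private final List<CellCompartment> children = new ArrayList<>();

    /** Creates an outermost compartment that has only a name. */
    CellCompartment(String name) {
        this(name, null);
    }

    private CellCompartment(String name, CellCompartment parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates a compartment nested directly inside this one. */
    CellCompartment addChild(String childName) {
        CellCompartment child = new CellCompartment(childName, this);
        children.add(child);
        return child;
    }

    String name() { return name; }

    /** The enclosing compartment, absent at the top of the hierarchy. */
    Optional<CellCompartment> parent() { return Optional.ofNullable(parent); }

    List<CellCompartment> children() { return Collections.unmodifiableList(children); }

    /** True if this compartment lies directly or transitively inside other. */
    boolean isWithin(CellCompartment other) {
        for (CellCompartment c = parent; c != null; c = c.parent) {
            if (c == other) {
                return true;
            }
        }
        return false;
    }

    /** Slash-separated path from the outermost compartment down to this one. */
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }
}

class CompartmentDemo {
    public static void main(String[] args) {
        CellCompartment cell = new CellCompartment("cell");
        CellCompartment nucleus = cell.addChild("nucleus");
        CellCompartment nucleolus = nucleus.addChild("nucleolus");
        System.out.println(nucleolus.path());         // prints cell/nucleus/nucleolus
        System.out.println(nucleolus.isWithin(cell)); // prints true
    }
}
```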

The Workbench includes general-purpose data structures for these and other uses, covering 3D locations, 3D bounded regions, colors, numeric values, value ranges, value lists, units of measure, text values, URLs, and so on.
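A few of these general-purpose structures might look like the following Java records. This is a sketch under stated assumptions: the names Point3D, BoundedRegion3D, and ValueRange are hypothetical, a 3D bounded region is modeled here as an axis-aligned box (the Workbench may support other shapes), and units are represented as plain strings.

```java
// Hypothetical sketch of a few general-purpose value types. The names, and
// the choice of an axis-aligned box for a bounded region, are assumptions.

/** A location in 3D space. */
record Point3D(double x, double y, double z) {}

/** A 3D bounded region, modeled here as an axis-aligned box. */
record BoundedRegion3D(Point3D min, Point3D max) {
    BoundedRegion3D {
        if (min.x() > max.x() || min.y() > max.y() || min.z() > max.z()) {
            throw new IllegalArgumentException("min corner must not exceed max corner");
        }
    }

    /** True if p lies inside or on the boundary of the box. */
    boolean contains(Point3D p) {
        return p.x() >= min.x() && p.x() <= max.x()
            && p.y() >= min.y() && p.y() <= max.y()
            && p.z() >= min.z() && p.z() <= max.z();
    }
}

/** An inclusive numeric range tagged with its unit of measure. */
record ValueRange(double low, double high, String unit) {
    ValueRange {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
    }

    boolean contains(double value) {
        return value >= low && value <= high;
    }
}

class ValueTypesDemo {
    public static void main(String[] args) {
        BoundedRegion3D box =
            new BoundedRegion3D(new Point3D(0, 0, 0), new Point3D(5, 5, 5));
        System.out.println(box.contains(new Point3D(2.5, 1, 4))); // prints true
        ValueRange concentration = new ValueRange(0.1, 2.0, "mM");
        System.out.println(concentration.contains(3.0)); // prints false
    }
}
```

Validating in the compact constructors keeps every instance consistent, so code that uses a region or range never has to recheck its bounds.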

Types and classifications

Ontological information may be attached to each of the principal data objects and to many of the other major data structures. This ontological information includes:

Simple type
A simple type that broadly classifies the item (e.g. protein vs. small molecule). The simple type is often all that is supported by existing file formats, and is sufficient to guide visualization and user interfaces to present the data object appropriately (e.g. draw proteins in blue and small molecules in green).
Detailed classifications
A list of one or more classification tuples, each naming a classification scheme and selecting a specific classification within that scheme. A data object may have several classifications at once, including those specific to a database, file format, project, or lab.
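The split between a simple type and a list of detailed classifications can be sketched in Java. This is a hypothetical illustration: the SimpleType constants, the ClassifiedItem class, and the scheme names "ExampleDB" and "MyLab" with their values are all made up. Drawing decisions use only the simple type, following the blue/green example above, while scheme-specific lookups go through the classification tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: the enum constants, the scheme names "ExampleDB" and
// "MyLab", and their values are made up for illustration.

/** Broad classification, enough to guide drawing and user interfaces. */
enum SimpleType { PROTEIN, SMALL_MOLECULE, OTHER }

/** A tuple naming a classification scheme and a classification within it. */
record Classification(String scheme, String value) {}

final class ClassifiedItem {
    final String name;
    final SimpleType simpleType;
    final List<Classification> classifications = new ArrayList<>();

    ClassifiedItem(String name, SimpleType simpleType) {
        this.name = name;
        this.simpleType = simpleType;
    }

    /** The first classification recorded under the given scheme, if any. */
    Optional<String> classificationIn(String scheme) {
        return classifications.stream()
            .filter(c -> c.scheme().equals(scheme))
            .map(Classification::value)
            .findFirst();
    }

    /** A display choice driven by the simple type alone. */
    String drawColor() {
        return switch (simpleType) {
            case PROTEIN -> "blue";
            case SMALL_MOLECULE -> "green";
            case OTHER -> "gray";
        };
    }
}

class ClassificationDemo {
    public static void main(String[] args) {
        ClassifiedItem kinase = new ClassifiedItem("hexokinase", SimpleType.PROTEIN);
        kinase.classifications.add(new Classification("ExampleDB", "EX:0001"));
        kinase.classifications.add(new Classification("MyLab", "glycolytic enzyme"));
        System.out.println(kinase.drawColor()); // prints blue
        System.out.println(kinase.classificationIn("MyLab").orElse("none"));
    }
}
```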

Terms and vocabularies

The principal data objects, types, classifications, and so forth are handled generically within the Workbench. They are all classed as terms - named objects that may be organized into vocabularies. Terms may be organized hierarchically within a vocabulary, as in NCBI's taxonomy of organisms or a containment hierarchy of cell compartments. Vocabularies are themselves terms, enabling vocabularies of vocabularies and hierarchically nested vocabularies.
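The term and vocabulary design can be sketched in Java by making Vocabulary a subclass of Term. This is a minimal, hypothetical sketch: Term, Vocabulary, addChild, add, lookup, and depth are assumed names rather than the Workbench's actual classes. Because a vocabulary is a term, it can be listed in another vocabulary or placed beneath another term in a hierarchy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Term and Vocabulary are illustrative names, not the
// Workbench's actual classes.

/** A named object that may sit in a broader/narrower hierarchy. */
class Term {
    private final String name;
    private Term parent; // broader term, or null at the top of a hierarchy
    private final List<Term> children = new ArrayList<>();

    Term(String name) {
        this.name = name;
    }

    String name() { return name; }

    Term parent() { return parent; }

    List<Term> children() { return Collections.unmodifiableList(children); }

    /** Places child directly beneath this term; a term has at most one parent. */
    void addChild(Term child) {
        if (child.parent != null) {
            throw new IllegalStateException(child.name + " already has a parent");
        }
        child.parent = this;
        children.add(child);
    }

    /** Number of steps from the top of this term's hierarchy. */
    int depth() {
        return parent == null ? 0 : parent.depth() + 1;
    }
}

/** A vocabulary is itself a term, so it can be listed in another vocabulary. */
class Vocabulary extends Term {
    private final Map<String, Term> members = new LinkedHashMap<>();

    Vocabulary(String name) {
        super(name);
    }

    /** Adds a term (which may itself be a Vocabulary), keyed by its name. */
    Vocabulary add(Term term) {
        members.put(term.name(), term);
        return this;
    }

    Term lookup(String name) { return members.get(name); }

    int size() { return members.size(); }
}

class VocabularyDemo {
    public static void main(String[] args) {
        Vocabulary compartments = new Vocabulary("cell compartments");
        Term cell = new Term("cell");
        Term nucleus = new Term("nucleus");
        cell.addChild(nucleus); // containment hierarchy within the vocabulary
        compartments.add(cell).add(nucleus);

        Vocabulary catalog = new Vocabulary("vocabularies");
        catalog.add(compartments).add(new Vocabulary("organisms")); // vocabulary of vocabularies
        System.out.println(catalog.size());  // prints 2
        System.out.println(nucleus.depth()); // prints 1
    }
}
```

Treating vocabularies as ordinary terms means one set of operations (naming, nesting, lookup) serves compounds, types, classifications, and the vocabularies that hold them.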