Skip to content

Design Patterns

This page explains the key structural decisions behind CoreMeta4Cat โ€” how the four pillars are modelled, how they connect to a dataset, and what design patterns make the schema extensible and machine-actionable.


Reading guide

This page is written in three tiers. Most users only need the first two.

Tier Who it's for Sections
Overview Everyone โ€” data providers, repository managers The entry point, The four pillars
Pattern explanations Users who want to understand how to navigate or extend the schema Classification pattern, Activity pattern, Mixin pattern
Technical depth Schema developers and DCAT-AP-PLUS integrators Sections marked with ๐Ÿ”ฌ

The entry point: CatalysisDataset

Every CoreMeta4Cat record starts with a CatalysisDataset. This is a dcat:Dataset โ€” fully compatible with plain DCAT and DCAT-AP โ€” extended with four additional link slots that connect to the four CoreMeta4Cat pillars.

id: ex:dataset-001
type: CatalysisDataset                  # โ†’ dcat:Dataset

# Layer 1: global classification
rdf_type:
  id: voc4cat:0007001
  title: "heterogeneous catalysis"

# Layer 2: links to the four pillars
was_generated_by:
  - id: ex:synthesis-001
    type: Synthesis
  - id: ex:characterization-001
    type: Characterization

is_about_activity:
  - id: ex:reaction-001
    type: Reaction

is_about_entity:
  - id: ex:catalyst-001
    type: CatalystSample

The key points here are:

  • rdf_type carries a controlled vocabulary term from Voc4Cat to classify which field of catalysis the dataset belongs to.
  • was_generated_by links to activities that produced the data (Synthesis, Characterization, Simulation).
  • is_about_activity links to the Reaction being studied โ€” which is the catalytic process itself, not a data-generating step.
  • is_about_entity links to the catalyst sample or material the dataset concerns.

The four pillars

The four CoreMeta4Cat pillars โ€” Synthesis, Characterization, Reaction, and Simulation โ€” are the core of the metadata model. Each is a separate class, defined in its own subprofile module, and linked from the CatalysisDataset via the slots above.

catcore.yaml  (aggregator + CatalysisDataset)
  โ”œโ”€โ”€ catcore_common.yaml          (shared slots and enumerations)
  โ”œโ”€โ”€ catcore_synthesis_ap.yaml    (Synthesis + 12 preparation methods)
  โ”œโ”€โ”€ catcore_characterization_ap.yaml  (Characterization + 28 techniques)
  โ”œโ”€โ”€ catcore_reaction_ap.yaml     (Reaction + 8 reactor types)
  โ””โ”€โ”€ catcore_simulation_ap.yaml   (Simulation + 4 methods + 12 properties)

Synthesis

What it is: The process of preparing a catalyst. Synthesis is a DataGeneratingActivity โ€” it produces data about the preparation, and it produces a CatalystSample as its physical output.

Key links:

  • realized_plan โ†’ a PreparationMethod subclass (the protocol used)
  • had_input_entity โ†’ Precursor instances (the starting materials)
  • had_output_entity โ†’ CatalystSample (the resulting catalyst)

Twelve preparation methods are currently defined, each as a concrete PreparationMethod subclass:

  • Wet chemistry

    Impregnation, Co-Precipitation, Deposition-Precipitation, Sol-Gel, Molecular Synthesis

  • Thermal / gas-phase

    Solvothermal, Combustion Synthesis, Flame Spray Pyrolysis, Sublimation, Plasma-Assisted

  • Mechanical / energy-assisted

    Mechanochemical Synthesis, Microwave-Assisted, Sonochemical Synthesis

  • Surface / thin-film

    Atomic Layer Deposition, Exsolution Synthesis

Characterization

What it is: The measurement of a catalyst or catalytic system using an analytical technique. Characterization is also a DataGeneratingActivity โ€” it produces measurement data.

Key links:

  • evaluated_entity โ†’ the CatalystSample or material being measured
  • realized_plan โ†’ a CharacterizationTechnique subclass (the measurement protocol)
  • carried_out_by โ†’ the instrument (Device) performing the measurement

Twenty-eight techniques are currently defined, organised into groups:

Group Techniques
Diffraction Powder XRD, Single Crystal XRD
X-ray spectroscopy XAS/XANES/EXAFS, XPS, EDX
Vibrational spectroscopy FTIR, DRIFTS, Raman, NMR
Electron microscopy TEM, SEM
Thermal analysis TGA, TPR, TPO
Surface & pore analysis BET
Elemental analysis ICP-AES, Elemental Analysis (CHNS)
Optical & electronic UV-Vis, Photoluminescence, Photoluminescence Lifetime
Electrochemistry Cyclic Voltammetry, Conductivity Measurement
Particle sizing Dynamic Light Scattering
Mass spectrometry ESI-MS, GC-MS, HPLC-MS
Chromatography GC, HPLC

Reaction

What it is: The catalytic reaction being studied. Unlike Synthesis and Characterization, Reaction is not a DataGeneratingActivity. It is the process being observed, not the process generating the dataset.

Key links:

  • carried_out_by โ†’ a ReactorDesignType (the physical reactor)
  • had_input_entity โ†’ reactant feeds
  • product_identification_method โ†’ a CharacterizationTechnique used for product analysis

Eight reactor design types are defined:

FixedBedReactor ยท CSTR ยท PlugFlowReactor ยท Autoclave ยท SlurryReactor ยท Microreactor ยท ElectrochemicalReactor ยท FluidizedBedReactor

Operando experiments

For in-situ or operando experiments (e.g. XRD measured while a reaction runs), the dataset carries both links simultaneously:

was_generated_by:
  - type: Characterization     # PowderXRD โ€” the process that made the data
is_about_activity:
  - type: Reaction             # the catalytic process being monitored

Simulation

What it is: A computational study of a catalyst or catalytic mechanism. Simulation is a DataGeneratingActivity โ€” it generates data computationally.

Key links:

  • realized_plan โ†’ a SimulationMethod subclass (DFT, MD, Microkinetics, MonteCarlo)
  • carried_out_by โ†’ a Software agent (the simulation package)
  • evaluated_entity โ†’ the catalyst model or structure being simulated

Twelve calculated property classes capture the type of computed output: ElectronicStructure, BandGap, ThermodynamicStability, PhononDispersion, Surfaces, GrainBoundaries, ElasticConstants, DielectricTensors, EquationsOfState, AqueousStability, Piezoelectricity, Ferroelectrics.


Pattern 1: Classification via rdf_type

Pattern summary

Flexible, machine-actionable classification of datasets, activities, and entities using ontology terms โ€” without creating a separate class for every possible value.

Rather than defining a fixed class hierarchy for every type of catalysis or every synthesis method, CoreMeta4Cat uses a single rdf_type slot on each class to carry a controlled vocabulary term from Voc4Cat, CHMO, or another ontology. This keeps the schema compact while staying fully machine-actionable.

On CatalysisDataset โ€” classify the catalysis research field:

rdf_type:
  id: voc4cat:0007001
  title: "heterogeneous catalysis"

On Synthesis โ€” classify the preparation method type:

type: Synthesis
rdf_type:
  id: voc4cat:0007016
  title: "impregnation"
realized_plan:
  type: Impregnation    # the concrete method class with all parameter slots

On Characterization โ€” classify the measurement technique:

type: Characterization
rdf_type:
  id: CHMO:0000158
  title: "powder X-ray diffraction"
realized_plan:
  type: PowderXRD       # the concrete technique class with all measurement slots

The rdf_type slot gives the machine-readable ontology term; the concrete subclass (Impregnation, PowderXRD, โ€ฆ) provides the structured parameter slots. Both are used together.

The allowed values for rdf_type on CatalysisDataset are defined in CatalysisResearchFieldEnum:

Value Ontology term Description
heterogeneous_catalysis voc4cat:0007001 Catalyst and reactants in different phases
homogeneous_catalysis voc4cat:0000294 Catalyst and reactants in the same phase
electrocatalysis voc4cat:0000216 Catalysis of electrochemical reactions
biocatalysis voc4cat:0000204 Enzyme or whole-cell catalysis
hybrid_catalysis (pending) Combination of two or more approaches
other โ€” Fallback for unlisted fields

Pattern 2: Activities and Plans

Pattern summary

A two-part structure separates what was done (the Activity) from the protocol describing how to do it (the Plan). This mirrors the PROV-O model and keeps the schema clean.

Each pillar that generates data follows this two-part structure:

Activity (what was done)        Plan (the protocol)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€       โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Synthesis              โ”€โ”€โ†’      PreparationMethod
Characterization       โ”€โ”€โ†’      CharacterizationTechnique
Simulation             โ”€โ”€โ†’      SimulationMethod

The Activity carries the instance-level data (who did it, when, on what sample, with what output). The Plan carries the method-level data (parameter settings, instrument configuration, protocol steps).

# The activity โ€” what happened
id: ex:synthesis-001
type: Synthesis
nominal_composition: "5wt% Ni/Al2O3"
had_input_entity:
  - id: ex:precursor-001
    type: Precursor
    name: "Ni(NO3)2ยท6H2O"
    precursor_quantity: 1.24   # g
had_output_entity:
  - id: ex:catalyst-001
    type: CatalystSample

# The plan โ€” the protocol
realized_plan:
  id: ex:method-001
  type: Impregnation
  impregnation_type: incipient_wetness
  impregnation_duration: 12.0   # h
  drying_temperature: 120.0     # ยฐC
  drying_time: 12.0             # h
  calcination_final_temperature: 500.0   # ยฐC
  calcination_dwelling_time: 4.0         # h
  calcination_gaseous_environment: "air"

This separation means a single PreparationMethod record could in principle be shared across multiple Synthesis activities โ€” a direct gain for reproducibility.


Pattern 3: Mixin classes

Pattern summary

Slot groups that are shared across multiple methods are factored into reusable mixin classes, so each slot is defined exactly once and inherited wherever needed.

Many preparation methods share common process steps โ€” drying, calcination, precipitation. Rather than repeating the same slots in every method class, CoreMeta4Cat uses mixin classes that bundle related slots:

Mixin Slots it provides Used by
DryingMixin drying_device, drying_temperature, drying_time, drying_atmosphere Impregnation, CoPrecipitation, DepositionPrecipitation, SonochemicalSynthesis, MolecularSynthesis
CalcinationMixin calcination_initial_temperature, calcination_final_temperature, calcination_dwelling_time, calcination_heating_rate, calcination_gaseous_environment, calcination_gas_flow_rate, number_of_cycles Impregnation, CoPrecipitation, DepositionPrecipitation, SonochemicalSynthesis, ExsolutionSynthesis
PrecipitationMixin precipitating_agent, synthesis_ph, mixing_rate, mixing_time, mixing_temperature, order_of_addition, aging_temperature, aging_time CoPrecipitation, DepositionPrecipitation
ThermalSynthesisMixin synthesis_temperature, synthesis_duration, equipment, vessel_type, atmosphere Solvothermal, PlasmaAssisted, CombustionSynthesis, MicrowaveAssisted, MechanochemicalSynthesis, Sublimation

The same pattern is used in the Characterization subprofile for analytical techniques:

Mixin Used by
XRaySourceMixin PowderXRD, SingleCrystalXRD, XPS, EDX
ElectronMicroscopyMixin TEM, SEM
TemperatureProgramMixin TPR, TPO, Thermogravimetry
ChromatographyMixin GC, HPLC, GC-MS, HPLC-MS
MassRangeMixin GC-MS, HPLC-MS, ESI-MS
ElectrochemistryMixin CyclicVoltammetry, ConductivityMeasurement

In LinkML, a mixin class has no class_uri of its own and generates no independent node shape. It is a pure slot container, mixed in via the mixins: key on a concrete class.


Pattern 4: Shared slots in catcore_common

Pattern summary

Slots referenced by two or more subprofiles are declared once in catcore_common.yaml and imported by all subprofiles. This keeps the schema DRY (Don't Repeat Yourself).

Some slots appear in multiple pillars โ€” for example, temperature is relevant to Synthesis (calcination), Characterization (temperature-programmed experiments), and Reaction (reactor temperature). These shared slots live in catcore_common.yaml:

catcore_common.yaml โ€” shared slots include:
  atmosphere, temperature, flow_rate, heating_rate,
  equipment, sample_mass, stirring_speed, stirring_duration,
  drying_*, calcination_*, concentration, solvent,
  experiment_duration, step_size, resolution, ...

Slots that are exclusive to a single subprofile are declared in that subprofile file only.


๐Ÿ”ฌ Deep dive: The EvaluatedActivity distinction

Technical section

This section is intended for schema developers and DCAT-AP-PLUS integrators. It is not required reading for data providers.

One of the most important architectural decisions in CoreMeta4Cat is that Reaction is not a DataGeneratingActivity. Instead it is an EvaluatedActivity.

In DCAT-AP-PLUS, the distinction is:

Class Meaning Links to dataset via
DataGeneratingActivity A process that produces the data in the dataset prov:wasGeneratedBy
EvaluatedActivity A process that the dataset is about, but which did not produce it is_about_activity

For catalysis, a Reaction is the catalytic process being studied. The data is generated by measuring that reaction โ€” via a Characterization activity. The Reaction itself does not produce the data file.

This distinction enables operando experiments to be described correctly:

# Operando XRD during CO oxidation
was_generated_by:
  - type: Characterization      # PowderXRD run โ€” this produced the data
    rdf_type:
      id: CHMO:0000158
      title: "powder X-ray diffraction"

is_about_activity:
  - type: Reaction              # the CO oxidation reaction being monitored
    catalyst_quantity: 50.0
    reactant: ["1 vol% CO", "2 vol% O2"]
    carried_out_by:
      type: FixedBedReactor

If Reaction were modelled as a DataGeneratingActivity, this relationship would collapse: it would be impossible to distinguish the measurement from the catalytic process it monitors.

The same distinction appears in DCAT-AP-PLUS itself โ€” the NMR example in the base documentation uses was_generated_by: NMRSpectroscopy (the measurement) and evaluated_entity: MaterialSample (the thing measured). CoreMeta4Cat extends this by adding is_about_activity: Reaction for cases where a process โ€” not just a material โ€” is being monitored.


Import hierarchy

The full import chain, from the CoreMeta4Cat top level down to the DCAT-AP-PLUS base, is:

catcore.yaml
  โ””โ”€โ”€ catcore_common.yaml
        โ””โ”€โ”€ chem_dcat_ap
              โ””โ”€โ”€ chemical_reaction_ap
                    โ””โ”€โ”€ chemical_entities_ap
                          โ””โ”€โ”€ material_entities_ap
                                โ””โ”€โ”€ dcat_ap_plus   โ† DCAT-AP-PLUS base
  โ”œโ”€โ”€ catcore_synthesis_ap.yaml
  โ”œโ”€โ”€ catcore_characterization_ap.yaml
  โ”œโ”€โ”€ catcore_reaction_ap.yaml
  โ””โ”€โ”€ catcore_simulation_ap.yaml

Each layer adds domain-specific classes and slots on top of the layer below, without modifying it. This means CoreMeta4Cat datasets remain valid DCAT-AP-PLUS instances, which in turn remain valid DCAT-AP datasets.