CoreMeta4Cat — Comprehensive Metadata Guidelines for Catalysis Research Data

Synthesis

Twelve preparation methods with method-specific parameter sets, shared mixin classes for drying and calcination steps.
Characterization

Twenty-eight analytical techniques, from Powder XRD to Cyclic Voltammetry, each with dedicated measurement slots.
Reaction

Eight reactor design types, flattened operation parameter slots, and product identification links.
Simulation

Four computational methods (DFT, MD, Microkinetics, Monte Carlo) with 12 calculated property classes.

What is CoreMeta4Cat?

CoreMeta4Cat is a LinkML-based metadata reference model for catalysis research data, developed within the NFDI4Cat initiative. It defines the minimum information that should be reported alongside research data in the field of catalysis, following the FAIR principles (Findable, Accessible, Interoperable, Reusable).

CoreMeta4Cat is built as a domain-specific application profile on top of DCAT-AP-PLUS, a provenance-aware extension of the DCAT Application Profile 3.0. This means every CoreMeta4Cat dataset is a valid dcat:Dataset, every activity is a valid prov:Activity, and all schema artefacts — SHACL shapes, JSON Schema, Python/Pydantic classes, HTML reference documentation — are generated automatically from the single LinkML source.

Quick Start: What does CoreMeta4Cat add?

In plain DCAT-AP, a Dataset can describe what data exists but says little about how it was produced or what material it concerns. DCAT-AP-PLUS adds a structured provenance graph via prov:wasGeneratedBy. CoreMeta4Cat specialises that graph for catalysis:

# A dataset about the CO oxidation performance of a supported Pt catalyst
id: ex:dataset-001
title: "CO oxidation activity of 1wt% Pt/Al2O3 at 200–400°C"
rdf_type:
  id: voc4cat:0007001
  title: "heterogeneous catalysis"

was_generated_by:
  - id: ex:reaction-001
    type: Reaction
    catalyst_quantity: 100.0   # mg
    reactant:
      - "1 vol% CO in N2"
      - "2 vol% O2 in N2"
    reactor_temperature_range: "200–400 °C"
    experiment_pressure: 1.0   # bar
    carried_out_by:
      id: ex:reactor-001
      type: FixedBedReactor

is_about_entity:
  - id: ex:catalyst-001
    type: CatalystSample
    nominal_composition: "1wt% Pt/Al2O3"

This is valid CoreMeta4Cat instance data. Every class and property is mapped to a controlled ontology term (voc4cat, CHMO, OBI, …) and can be validated and converted to RDF using standard LinkML tooling.

Two-layer architecture

CoreMeta4Cat organises metadata in two layers.

Layer 1 — Global classification is data-class-independent. It applies to every CatalysisDataset and captures the two fields needed for the coarsest-possible filtering of a repository:

Field	Example values	Obligation
Catalysis research field (`rdf_type`)	heterogeneous catalysis, electrocatalysis, biocatalysis	Recommended
Reaction type (`rdf_type` on `Reaction`)	CO oxidation, ammonia synthesis, hydrogenation	Recommended

Layer 2 — Data-class-specific metadata is structured around the four pillars: Synthesis, Characterization, Reaction, and Simulation. Each pillar maps to a DCAT-AP-PLUS Activity subclass and carries its own set of Mandatory, Recommended, and Optional fields.

CatalysisDataset (dcat:Dataset)
 ├── rdf_type → CatalysisResearchFieldEnum   [Layer 1]
 ├── was_generated_by → Synthesis            [Layer 2]
 ├── was_generated_by → Characterization     [Layer 2]
 ├── was_generated_by → Simulation           [Layer 2]
 └── is_about_activity → Reaction            [Layer 2]

The four CoreMeta4Cat pillars

Synthesis

Reproducibility of catalyst synthesis is one of the most persistent challenges in catalysis research. The Synthesis pillar defines the minimum metadata for twelve preparation methods, from common routes such as Impregnation and Co-Precipitation to more specialised techniques like Atomic Layer Deposition, Flame Spray Pyrolysis, and Exsolution Synthesis.

Method-specific parameter sets are organised into concrete PreparationMethod subclasses. Cross-cutting slot groups (drying step, calcination step, precipitation step, thermal process) are factored out as mixin classes, so parameters shared by multiple methods are defined exactly once.

Class	Key mixins applied
`Impregnation`	`DryingMixin`, `CalcinationMixin`
`CoPrecipitation`	`PrecipitationMixin`, `DryingMixin`, `CalcinationMixin`
`DepositionPrecipitation`	`PrecipitationMixin`, `DryingMixin`, `CalcinationMixin`
`Solvothermal`, `PlasmaAssisted`, `CombustionSynthesis`, `MicrowaveAssisted`, `MechanochemicalSynthesis`, `Sublimation`	`ThermalSynthesisMixin`
`SonochemicalSynthesis`, `MolecularSynthesis`	`DryingMixin` / `CalcinationMixin`
`AtomicLayerDeposition`, `SolGel`, `FlameSprayPyrolysis`, `ExsolutionSynthesis`	method-specific slots only

Characterization

The Characterization pillar covers twenty-eight analytical techniques currently used in catalysis. Each technique is modelled as a concrete CharacterizationTechnique subclass (a DCAT-AP-PLUS Plan), with slots for instrument parameters, sample state, and measurement conditions. Cross-cutting parameter groups are again factored out as mixins:

XRaySourceMixin — shared by PowderXRD, SingleCrystalXRD, XPS, EDX
ElectronMicroscopyMixin — shared by TEM, SEM
TemperatureProgramMixin — shared by TPR, TPO, Thermogravimetry
ChromatographyMixin, MassRangeMixin — shared by GC, GC-MS, HPLC, HPLC-MS

Reaction

The Reaction pillar represents the catalytic process being studied. It is modelled as a DCAT-AP-PLUS EvaluatedActivity — the process the dataset is about, not the process that generates the data. This distinction matters: for operando experiments (e.g. in-situ XRD during a reaction), the dataset carries both was_generated_by: Characterization and is_about_activity: Reaction.

The reactor is linked via carried_out_by as one of eight ReactorDesignType subclasses:

ElectrochemicalReactor
CSTR
PlugFlowReactor
Autoclave
SlurryReactor
Microreactor
FixedBedReactor
FluidizedBedReactor

Simulation

The Simulation pillar covers four major computational method classes, each a SimulationMethod subclass (DCAT-AP-PLUS Plan): DFT, MolecularDynamics, Microkinetics, and MonteCarlo. The simulation software is linked via carried_out_by as a Software agent. Twelve CalculatedProperty classes (e.g. ElectronicStructure, BandGap, PhononDispersion, ThermodynamicStability) capture the computed output type.

Documentation

Page	What it covers
Design Patterns	How the four pillars map to DCAT-AP-PLUS, the mixin pattern, ontology alignment
How to Extend	Rules for adding new preparation methods, techniques, reactor types, and properties
Schema Reference	Auto-generated reference for all classes and slots
CoreMeta4Cat Users	Projects and repositories that adopt CoreMeta4Cat

Source code

The LinkML schema, build scripts, and documentation source are on GitHub: HendrikBorgelt/CoreMeta4Cat

The schema is built as a domain-specific application profile on top of DCAT-AP-PLUS. The base layer is maintained by NFDI4Cat.