CoreMeta4Cat — Comprehensive Metadata Guidelines for Catalysis Research Data
- Synthesis: Twelve preparation methods with method-specific parameter sets, shared mixin classes for drying and calcination steps.
- Characterization: Twenty-eight analytical techniques, from Powder XRD to Cyclic Voltammetry, each with dedicated measurement slots.
- Reaction: Eight reactor design types, flattened operation parameter slots, and product identification links.
- Simulation: Four computational methods (DFT, MD, Microkinetics, Monte Carlo) with 12 calculated property classes.
What is CoreMeta4Cat?
CoreMeta4Cat is a LinkML-based metadata reference model for catalysis research data, developed within the NFDI4Cat initiative. It defines the minimum information that should be reported alongside research data in the field of catalysis, following the FAIR principles (Findable, Accessible, Interoperable, Reusable).
CoreMeta4Cat is built as a domain-specific application profile on top of DCAT-AP-PLUS, a provenance-aware extension of the DCAT Application Profile 3.0. This means every CoreMeta4Cat dataset is a valid dcat:Dataset, every activity is a valid prov:Activity, and all schema artefacts — SHACL shapes, JSON Schema, Python/Pydantic classes, HTML reference documentation — are generated automatically from the single LinkML source.
Quick Start: What does CoreMeta4Cat add?
In plain DCAT-AP, a Dataset can describe what data exists but says little about how it was produced or what material it concerns. DCAT-AP-PLUS adds a structured provenance graph via prov:wasGeneratedBy. CoreMeta4Cat specialises that graph for catalysis:
# A dataset about the CO oxidation performance of a supported Pt catalyst
id: ex:dataset-001
title: "CO oxidation activity of 1wt% Pt/Al2O3 at 200–400°C"
rdf_type:
  id: voc4cat:0007001
  title: "heterogeneous catalysis"
was_generated_by:
  - id: ex:reaction-001
    type: Reaction
    catalyst_quantity: 100.0  # mg
    reactant:
      - "1 vol% CO in N2"
      - "2 vol% O2 in N2"
    reactor_temperature_range: "200–400 °C"
    experiment_pressure: 1.0  # bar
    carried_out_by:
      id: ex:reactor-001
      type: FixedBedReactor
is_about_entity:
  - id: ex:catalyst-001
    type: CatalystSample
    nominal_composition: "1wt% Pt/Al2O3"
This is valid CoreMeta4Cat instance data. Every class and property is mapped to a controlled ontology term (voc4cat, CHMO, OBI, …) and can be validated and converted to RDF using standard LinkML tooling.
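To make the shape of the instance easier to see, the same record can be written as a plain Python dictionary and checked by hand. This is only a sketch under assumptions: the nesting follows the most natural reading of the example, and `check_activities` is a hypothetical helper, not part of LinkML or CoreMeta4Cat. It only checks that each activity has an `id` and a `type`; real validation would go through the LinkML tooling mentioned above.

```python
import json

# The example instance as a plain dictionary (nesting is an assumed reading).
dataset = {
    "id": "ex:dataset-001",
    "title": "CO oxidation activity of 1wt% Pt/Al2O3 at 200–400°C",
    "rdf_type": {"id": "voc4cat:0007001", "title": "heterogeneous catalysis"},
    "was_generated_by": [
        {
            "id": "ex:reaction-001",
            "type": "Reaction",
            "catalyst_quantity": 100.0,  # mg
            "reactant": ["1 vol% CO in N2", "2 vol% O2 in N2"],
            "reactor_temperature_range": "200–400 °C",
            "experiment_pressure": 1.0,  # bar
            "carried_out_by": {"id": "ex:reactor-001", "type": "FixedBedReactor"},
        }
    ],
    "is_about_entity": [
        {
            "id": "ex:catalyst-001",
            "type": "CatalystSample",
            "nominal_composition": "1wt% Pt/Al2O3",
        }
    ],
}


def check_activities(record):
    """Hypothetical helper: list activities that lack an 'id' or a 'type'."""
    problems = []
    for index, activity in enumerate(record.get("was_generated_by", [])):
        for key in ("id", "type"):
            if key not in activity:
                problems.append(f"was_generated_by[{index}] is missing '{key}'")
    return problems


problems = check_activities(dataset)
if problems:
    print("\n".join(problems))
else:
    # Serialise to JSON; ensure_ascii=False keeps the en dash and degree sign readable.
    print(json.dumps(dataset, indent=2, ensure_ascii=False))
```

A structural check like this is no substitute for schema validation, but it shows which keys sit on the dataset and which sit on the activity.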
Two-layer architecture
CoreMeta4Cat organises metadata in two layers.
Layer 1 — Global classification is data-class-independent. It applies to every CatalysisDataset and captures the two fields needed for the coarsest-possible filtering of a repository:
| Field | Example values | Obligation |
|---|---|---|
| Catalysis research field (rdf_type) | heterogeneous catalysis, electrocatalysis, biocatalysis | Recommended |
| Reaction type (rdf_type on Reaction) | CO oxidation, ammonia synthesis, hydrogenation | Recommended |
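The kind of coarse filtering these two fields enable might look like the sketch below. The keys `research_field` and `reaction_type` are simplified, hypothetical stand-ins for the two `rdf_type` slots, and the repository entries are invented for illustration. Because both fields are only Recommended, the filter tolerates records that omit them.

```python
# Hypothetical in-memory repository holding only the two Layer 1 fields.
repository = [
    {"id": "ex:dataset-001", "research_field": "heterogeneous catalysis",
     "reaction_type": "CO oxidation"},
    {"id": "ex:dataset-002", "research_field": "electrocatalysis",
     "reaction_type": "hydrogenation"},
    # Both fields are Recommended, so a record may omit one of them.
    {"id": "ex:dataset-003", "research_field": "heterogeneous catalysis"},
]


def filter_datasets(records, research_field=None, reaction_type=None):
    """Return the ids of records matching every Layer 1 value that was given."""
    wanted = {"research_field": research_field, "reaction_type": reaction_type}
    matches = []
    for record in records:
        # A criterion left as None is ignored; a missing field never matches.
        if all(value is None or record.get(key) == value
               for key, value in wanted.items()):
            matches.append(record["id"])
    return matches


print(filter_datasets(repository, research_field="heterogeneous catalysis"))
print(filter_datasets(repository, reaction_type="CO oxidation"))
```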
Layer 2 — Data-class-specific metadata is structured around the four pillars: Synthesis, Characterization, Reaction, and Simulation. Each pillar maps to a DCAT-AP-PLUS Activity subclass and carries its own set of Mandatory, Recommended, and Optional fields.
CatalysisDataset (dcat:Dataset)
├── rdf_type → CatalysisResearchFieldEnum [Layer 1]
├── was_generated_by → Synthesis [Layer 2]
├── was_generated_by → Characterization [Layer 2]
├── was_generated_by → Simulation [Layer 2]
└── is_about_activity → Reaction [Layer 2]
The four CoreMeta4Cat pillars
Synthesis
Reproducibility of catalyst synthesis is one of the most persistent challenges in catalysis research. The Synthesis pillar defines the minimum metadata for twelve preparation methods, from common routes such as Impregnation and Co-Precipitation to more specialised techniques like Atomic Layer Deposition, Flame Spray Pyrolysis, and Exsolution Synthesis.
Method-specific parameter sets are organised into concrete PreparationMethod subclasses. Cross-cutting slot groups (drying step, calcination step, precipitation step, thermal process) are factored out as mixin classes, so parameters shared by multiple methods are defined exactly once.
| Class | Key mixins applied |
|---|---|
| Impregnation | DryingMixin, CalcinationMixin |
| CoPrecipitation | PrecipitationMixin, DryingMixin, CalcinationMixin |
| DepositionPrecipitation | PrecipitationMixin, DryingMixin, CalcinationMixin |
| Solvothermal, PlasmaAssisted, CombustionSynthesis, MicrowaveAssisted, MechanochemicalSynthesis, Sublimation | ThermalSynthesisMixin |
| SonochemicalSynthesis, MolecularSynthesis | DryingMixin / CalcinationMixin |
| AtomicLayerDeposition, SolGel, FlameSprayPyrolysis, ExsolutionSynthesis | method-specific slots only |
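The mixin idea can be illustrated with Python dataclasses. This is a loose analogy rather than the generated Pydantic code: the class names come from the table, but every slot name (`drying_temperature_c`, `precipitating_agent`, and so on) is a hypothetical placeholder chosen for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DryingMixin:
    # Hypothetical slot names; defined once, reused by every method that dries.
    drying_temperature_c: Optional[float] = None
    drying_time_h: Optional[float] = None


@dataclass
class CalcinationMixin:
    calcination_temperature_c: Optional[float] = None
    calcination_time_h: Optional[float] = None


@dataclass
class PrecipitationMixin:
    precipitating_agent: Optional[str] = None
    precipitation_ph: Optional[float] = None


@dataclass
class PreparationMethod:
    id: str = ""


@dataclass
class Impregnation(PreparationMethod, DryingMixin, CalcinationMixin):
    impregnation_solvent: Optional[str] = None  # method-specific placeholder


@dataclass
class CoPrecipitation(PreparationMethod, PrecipitationMixin,
                      DryingMixin, CalcinationMixin):
    pass


def slot_names(cls):
    """Collect the names of all slots a class defines or inherits."""
    return {f.name for f in fields(cls)}


# Slots shared by both methods come from the mixins, not from duplication.
shared = slot_names(Impregnation) & slot_names(CoPrecipitation)
print(sorted(shared))
```

Changing a slot on `DryingMixin` changes it for every method that applies the mixin, which is the point of defining shared parameters once.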
Characterization
The Characterization pillar covers twenty-eight analytical techniques currently used in catalysis. Each technique is modelled as a concrete CharacterizationTechnique subclass (a DCAT-AP-PLUS Plan), with slots for instrument parameters, sample state, and measurement conditions. Cross-cutting parameter groups are again factored out as mixins:
- XRaySourceMixin: shared by PowderXRD, SingleCrystalXRD, XPS, EDX
- ElectronMicroscopyMixin: shared by TEM, SEM
- TemperatureProgramMixin: shared by TPR, TPO, Thermogravimetry
- ChromatographyMixin, MassRangeMixin: shared by GC, GC-MS, HPLC, HPLC-MS
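The assignments listed above can be turned into a small lookup that answers the reverse question: which mixins does a given technique pick up? The sketch uses the technique labels exactly as written here, which may differ from the schema's class names. It also assumes that ChromatographyMixin and MassRangeMixin both apply to all four chromatography techniques, which the grouping suggests but does not confirm.

```python
# Mixin-to-technique assignments as listed in the text (labels, not class names).
MIXIN_USERS = {
    "XRaySourceMixin": ["PowderXRD", "SingleCrystalXRD", "XPS", "EDX"],
    "ElectronMicroscopyMixin": ["TEM", "SEM"],
    "TemperatureProgramMixin": ["TPR", "TPO", "Thermogravimetry"],
    # Assumption: both mixins are applied to all four techniques.
    "ChromatographyMixin": ["GC", "GC-MS", "HPLC", "HPLC-MS"],
    "MassRangeMixin": ["GC", "GC-MS", "HPLC", "HPLC-MS"],
}


def mixins_for(technique):
    """Return the sorted names of all mixins listed for a technique."""
    return sorted(name for name, users in MIXIN_USERS.items()
                  if technique in users)


print(mixins_for("XPS"))
print(mixins_for("GC-MS"))
```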
Reaction
The Reaction pillar represents the catalytic process being studied. It is modelled as a DCAT-AP-PLUS EvaluatedActivity — the process the dataset is about, not the process that generates the data. This distinction matters: for operando experiments (e.g. in-situ XRD during a reaction), the dataset carries both was_generated_by: Characterization and is_about_activity: Reaction.
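A short sketch may help make the distinction concrete. The function below is a hypothetical helper that treats a dataset as operando when a Characterization generated the data and a Reaction is what the data is about. The identifiers are invented, and using the pillar names as `type` values is a simplification; real records would presumably carry more specific types.

```python
def is_operando(dataset):
    """True if a Characterization generated the data about a Reaction."""
    generated_by = {a.get("type") for a in dataset.get("was_generated_by", [])}
    about = {a.get("type") for a in dataset.get("is_about_activity", [])}
    return "Characterization" in generated_by and "Reaction" in about


# Hypothetical operando record: XRD measured while the reaction runs.
operando = {
    "id": "ex:dataset-002",
    "was_generated_by": [{"id": "ex:xrd-001", "type": "Characterization"}],
    "is_about_activity": [{"id": "ex:reaction-001", "type": "Reaction"}],
}

# The same measurement without a linked reaction is plain characterization.
ex_situ = {
    "id": "ex:dataset-003",
    "was_generated_by": [{"id": "ex:xrd-002", "type": "Characterization"}],
}

print(is_operando(operando), is_operando(ex_situ))
```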
The reactor is linked via carried_out_by as one of eight ReactorDesignType subclasses:
- ElectrochemicalReactor
- CSTR
- PlugFlowReactor
- Autoclave
- SlurryReactor
- Microreactor
- FixedBedReactor
- FluidizedBedReactor
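For client code that only needs to recognise the reactor type of a record, the eight names could be mirrored in an enum. This is a convenience sketch, not how the schema models them (there they are subclasses of ReactorDesignType), and `parse_reactor` is a hypothetical helper.

```python
from enum import Enum


class ReactorDesignType(Enum):
    """The eight reactor types, mirrored as an enum for illustration only."""
    ELECTROCHEMICAL_REACTOR = "ElectrochemicalReactor"
    CSTR = "CSTR"
    PLUG_FLOW_REACTOR = "PlugFlowReactor"
    AUTOCLAVE = "Autoclave"
    SLURRY_REACTOR = "SlurryReactor"
    MICROREACTOR = "Microreactor"
    FIXED_BED_REACTOR = "FixedBedReactor"
    FLUIDIZED_BED_REACTOR = "FluidizedBedReactor"


def parse_reactor(reaction):
    """Look up the reactor type named under carried_out_by.

    Raises ValueError if the type is not one of the eight listed names.
    """
    return ReactorDesignType(reaction["carried_out_by"]["type"])


reaction = {
    "id": "ex:reaction-001",
    "carried_out_by": {"id": "ex:reactor-001", "type": "FixedBedReactor"},
}
print(parse_reactor(reaction))
```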
Simulation
The Simulation pillar covers four major computational method classes, each a SimulationMethod subclass (DCAT-AP-PLUS Plan): DFT, MolecularDynamics, Microkinetics, and MonteCarlo. The simulation software is linked via carried_out_by as a Software agent. Twelve CalculatedProperty classes (e.g. ElectronicStructure, BandGap, PhononDispersion, ThermodynamicStability) capture the computed output type.
Documentation
| Page | What it covers |
|---|---|
| Design Patterns | How the four pillars map to DCAT-AP-PLUS, the mixin pattern, ontology alignment |
| How to Extend | Rules for adding new preparation methods, techniques, reactor types, and properties |
| Schema Reference | Auto-generated reference for all classes and slots |
| CoreMeta4Cat Users | Projects and repositories that adopt CoreMeta4Cat |
Source code
The LinkML schema, build scripts, and documentation source are on GitHub: HendrikBorgelt/CoreMeta4Cat
The schema is built as a domain-specific application profile on top of DCAT-AP-PLUS. The base layer is maintained by NFDI4Cat.