
Getting Started with CoreMeta4Cat

Adopting a new metadata standard does not have to mean changing how you work overnight. CoreMeta4Cat is designed to be useful at every stage of the journey — including if you stay with spreadsheets permanently. This page explains a pragmatic, low-barrier path towards semantically richer catalysis data, regardless of whether you use an Electronic Lab Notebook, a data management platform, or simply Excel.


Why a gradual adoption path?

Catalysis research is diverse. Some groups run highly standardised high-throughput experiments that map naturally onto structured data formats. Others carry out unique, exploratory investigations where no two experiments follow quite the same procedure — and where investing in dedicated data management infrastructure may not be feasible.

CoreMeta4Cat does not require you to change your data collection workflows. What it offers instead is a shared vocabulary: a set of agreed-upon terms and field names that can be applied to your existing data, wherever it lives, at whatever level of detail works for your group.

The path below goes from the simplest possible adoption — using the vocabulary reference to understand which terms exist — all the way to fully machine-readable, schema-validated records. You can stop at any point and already have something more interoperable than before.

Level 1 ──► Understand the vocabulary hierarchy
               ↓
Level 2 ──► Annotate your own spreadsheets with Voc4Cat terms
               ↓
Level 3 ──► Write a lightweight JSON converter for your sheet structure
               ↓
Level 4 ──► Validate records against the full CoreMeta4Cat schema

Level 1 — The vocabulary reference workbook

To help you understand which terms exist and how they relate to each other, CoreMeta4Cat provides a reference Excel workbook. This workbook is not a data entry form to fill in — it is a structured overview of the CoreMeta4Cat vocabulary hierarchy, organised by data class, that you can use as a lookup reference when designing or annotating your own data sheets.

The workbook has five sheets:

Sheet | What it shows
CatCore | The minimal global field set (catalyst type, support, metal, metal loading, additive, reaction type), with the Voc4Cat term for each
Catalysts synthesis precise | The full hierarchy of synthesis fields, organised by synthesis step (solvation, mixing, milling, pH adjustment, filtration, crystallisation, washing, dilution, impregnation, drying, calcination, sieving, pelleting), with the corresponding Voc4Cat CURIE shown in each column header
Characterization FF | Method parameter fields for 10+ characterisation techniques (PXRD, XAS, FTIR, Raman, GC-MS, …), showing which ontology terms apply to which measurement parameters
Characterization results | The result fields that should be reported for each of the 25 techniques currently in CoreMeta4Cat, with the relevant ontology term for each result type
Cat test | The full hierarchy of reaction / catalytic test fields (reactor type, operation mode, reactants, solvent, products, analysis), with Voc4Cat CURIEs

Use this workbook to find the right term for a concept you want to annotate, and to understand how fields nest within each other (e.g. which parameters belong under a calcination step versus a drying step).


Level 2 — Annotating your own spreadsheets

The most lightweight form of CoreMeta4Cat adoption is to take your existing experimental spreadsheets — the ones you already use to record synthesis batches, characterisation measurements, or reaction results — and annotate their column headers with the corresponding Voc4Cat terms.

How the annotation works

Each column header in your sheet gets a Voc4Cat CURIE added alongside it, whether in a second header row, as a cell comment, or simply appended to the header text. There is no prescribed format — what matters is that the machine-readable term is present and unambiguous.

The vocabulary reference workbook uses a two-line pattern in each column header as a model:

Human-readable label
voc4cat:XXXXXXX
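
If you later want to read such annotated headers programmatically, the label and the CURIE can be separated with a few lines of Python. The following is a minimal sketch, not part of CoreMeta4Cat itself: the function name split_header is hypothetical, and the regular expression assumes that every CURIE has the shape shown on this page (a prefix, a colon, and a seven-digit identifier).

```python
import re

# Assumption: CURIEs look like the examples on this page,
# i.e. prefix + ":" + seven digits (voc4cat:0007825, CHMO:0000158).
CURIE_PATTERN = re.compile(r"\b([A-Za-z][A-Za-z0-9]*):(\d{7})\b")


def split_header(header):
    """Split an annotated header cell into (label, curie).

    Returns (label, None) when the cell contains no CURIE.
    """
    text = str(header or "")
    match = CURIE_PATTERN.search(text)
    if match is None:
        return text.strip(), None
    # Everything except the CURIE itself is treated as the label;
    # empty parentheses left behind by "(voc4cat:...)" are dropped.
    label = text[:match.start()] + text[match.end():]
    label = label.replace("()", "").strip()
    return label, match.group(0)


print(split_header("Support\nvoc4cat:0007825"))
# → ('Support', 'voc4cat:0007825')
print(split_header("Catalyst mass [g] (voc4cat:0007792)"))
# → ('Catalyst mass [g]', 'voc4cat:0007792')
print(split_header("Notes"))
# → ('Notes', None)
```

The same function handles the second-header-row and appended-text styles, because it only looks for the CURIE and treats whatever remains as the label.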

Here are some representative examples drawn from the reference workbook:

Synthesis fields:

Label in reference workbook | Voc4Cat term | Meaning
institution | voc4cat:0007842 | Institution where the catalyst was prepared
catalyst | voc4cat:0007014 | Catalyst type
Preparation method | voc4cat:0007016 | Synthesis method
Support | voc4cat:0007825 | Support material
Precursor 1 | voc4cat:0007794 | First precursor compound
Precursor 1 amount | voc4cat:0007038 | Mass of precursor used
Metal loading, nominal (wt%) | voc4cat:0007815 | Nominal metal loading
calcination final temperature | voc4cat:0000058 | Final calcination temperature
calcination heating rate | voc4cat:0000059 | Heating rate during calcination
calcination dwelling time | voc4cat:0000060 | Hold time at calcination temperature

Reaction / catalytic test fields:

Label in reference workbook | Voc4Cat term | Meaning
Reaction type | voc4cat:0007010 | Type of catalytic reaction
Catalyst mass [g] | voc4cat:0007792 | Mass of catalyst loaded
Reactor | voc4cat:0007017 | Reactor type
operation mode | voc4cat:0000108 | Batch or flow operation
Reactor temperature | voc4cat:0007032 | Reactor temperature range
reactant | voc4cat:0000101 | Reactant compound(s)
Solvent | voc4cat:0007246 | Reaction solvent

Characterisation result fields:

Technique | Key result fields and terms
BET analysis (AFP:0003761) | Specific surface area (voc4cat), pore volume, average pore diameter
Powder XRD (CHMO:0000158) | Phase identification, crystallite size (Scherrer), phase quantification
XPS (CHMO:0000404) | Assigned species, binding energy, oxidation state, elemental concentration
ICP-OES | Chemical element (SIO), concentration (afo)
GC / GC-MS (CHMO:0000497) | Retention time (voc4cat), peak area
TEM (CHMO:0000080) | Average particle size (voc4cat), particle shape
UV-Vis (CHMO:0000292) | Wavelength (voc4cat), absorbance, attenuation coefficient
NMR (liquid) | Chemical shift (NMRCV), signal intensity, multiplet feature

You do not need to annotate every column at once. Starting with the most essential fields (catalyst type, support, metal loading, and reaction type) already makes your spreadsheet far easier to compare with data from other groups, because you are now using the same vocabulary terms.

What are CURIEs?

A CURIE (Compact URI) like voc4cat:0007014 is shorthand for a full web address: https://w3id.org/nfdi4cat/voc4cat_0007014. Every Voc4Cat term resolves to a human-readable definition at that address. This means a computer — or another researcher — can unambiguously identify what each field means, regardless of what language your column header is written in. That is the foundation of interoperability.
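
The expansion from CURIE to full address is a simple string substitution. As a minimal sketch (the function name expand_curie and the PREFIXES table are illustrative; only the voc4cat base address is taken from this page, and other prefixes would have to be added from their own ontologies):

```python
# Prefix table. Only the voc4cat entry comes from this page;
# add further prefixes (CHMO, RXNO, ...) from their own documentation.
PREFIXES = {"voc4cat": "https://w3id.org/nfdi4cat/voc4cat_"}


def expand_curie(curie, prefixes=PREFIXES):
    """Turn a CURIE such as 'voc4cat:0007014' into its full URI."""
    prefix, _, local_id = curie.partition(":")
    if not local_id or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}: unknown prefix or missing identifier")
    return prefixes[prefix] + local_id


print(expand_curie("voc4cat:0007014"))
# → https://w3id.org/nfdi4cat/voc4cat_0007014
```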

The CatCore minimal field set

The CatCore sheet in the reference workbook shows the smallest useful annotation set — fields that apply to every catalysis dataset regardless of data class. If you annotate nothing else, annotating these fields is already a meaningful step:

Field | Voc4Cat term | Notes
Catalyst type | voc4cat:0007014 | Choose from: heterogeneous (voc4cat:0007003), homogeneous (voc4cat:0007804), hybrid (voc4cat:0007805), biocatalyst
Support | voc4cat:0007825 | e.g. Al₂O₃, SiO₂, TiO₂, carbon
Metal | | e.g. Ni, Pd, Pt, Cu
Metal loading (wt%) | voc4cat:0007815 | Nominal loading
Additive | voc4cat:0007793 | Dopant (voc4cat:0007847), molecular modifier, or ligand (voc4cat:0007809)
Molar ratio metal : additive | | e.g. 1:0.1
Reaction type | voc4cat:0007010 | Use RXNO terms where available
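
To see at a glance how much of the CatCore set a sheet already covers, you could compare its header cells against the terms in the table above. This is a rough sketch under stated assumptions: the function name missing_catcore_fields is hypothetical, the check is a plain substring search, and the two fields without a listed Voc4Cat term (metal, molar ratio) are left out.

```python
# CatCore fields that have a Voc4Cat term in the table above.
CATCORE_TERMS = {
    "Catalyst type": "voc4cat:0007014",
    "Support": "voc4cat:0007825",
    "Metal loading (wt%)": "voc4cat:0007815",
    "Additive": "voc4cat:0007793",
    "Reaction type": "voc4cat:0007010",
}


def missing_catcore_fields(header_cells):
    """Return the CatCore fields whose CURIE appears in no header cell."""
    joined = "\n".join(str(cell) for cell in header_cells if cell is not None)
    return [field for field, curie in CATCORE_TERMS.items()
            if curie not in joined]


headers = ["catalyst\nvoc4cat:0007014", "Support\nvoc4cat:0007825", "Notes"]
print(missing_catcore_fields(headers))
# → ['Metal loading (wt%)', 'Additive', 'Reaction type']
```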

Level 3 — Writing a JSON converter for your sheet

If you want your spreadsheet data to be fully machine-readable — for repository deposit, automated validation, or integration with other tools — the next step is to write a converter that reads your sheet and outputs a JSON record conforming to the CoreMeta4Cat schema.

One converter per sheet structure

Every research group's spreadsheets are different. Column order, naming conventions, the number of precursor columns, how reactions and syntheses are organised — these vary between groups, and often between projects within a group. Rather than attempting to handle this diversity with a single general-purpose tool, CoreMeta4Cat takes a different approach: each group writes a small, transparent converter script tailored to their own sheet layout.

This may sound like more work, but in practice a converter for a synthesis sheet is typically 30–80 lines of Python. It is a one-time investment per sheet structure, and it produces a permanent, auditable record of exactly how your data maps onto the CoreMeta4Cat schema. If your sheet layout changes, you update the converter accordingly.

The Voc4Cat annotations you added in Level 2 do most of the work: they are the mapping key that tells the converter which column corresponds to which CoreMeta4Cat field.
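
One way to use the annotations as a mapping key is to look columns up by CURIE rather than by label, so that renaming or translating a column header does not break the converter. The sketch below assumes the CURIE is part of the header cell text; map_row_by_curie is a hypothetical helper, and the field names are the ones used in the converter example on this page.

```python
# CURIE -> record field name, as used in the converter example on this page.
CURIE_TO_FIELD = {
    "voc4cat:0000058": "calcination_final_temperature",
    "voc4cat:0000060": "calcination_dwelling_time",
    "voc4cat:0007825": "support",
    "voc4cat:0007815": "metal_loading_nominal",
}


def map_row_by_curie(header_cells, row):
    """Return {field name: cell value} for every column with a known CURIE."""
    mapped = {}
    for header, value in zip(header_cells, row):
        if header is None:
            continue
        for curie, field in CURIE_TO_FIELD.items():
            if curie in str(header):
                mapped[field] = value
    return mapped


# The labels differ from the reference workbook, but the CURIEs still match.
headers = ["Support\nvoc4cat:0007825", "T calc / °C\nvoc4cat:0000058", "Comment"]
row = ("Al2O3", 500.0, "batch 3")
print(map_row_by_curie(headers, row))
# → {'support': 'Al2O3', 'calcination_final_temperature': 500.0}
```

Columns without a known CURIE, such as the free-text comment here, are simply ignored.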

What a minimal converter looks like

import json

import openpyxl

wb = openpyxl.load_workbook("my_synthesis_data.xlsx")
ws = wb["Synthesis"]

# Read the header row. This sketch assumes row 1 holds the plain
# human-readable labels; the Voc4Cat CURIEs noted in the comments below
# record which CoreMeta4Cat field each column maps to.
headers = [cell.value for cell in ws[1]]

records = []
for row in ws.iter_rows(min_row=2, values_only=True):
    data = dict(zip(headers, row))
    if all(value in (None, "") for value in row):
        continue  # skip rows with no content (a lone 0 still counts as data)

    record = {
        "type": "CatalysisDataset",
        "was_generated_by": [{
            "type": "Synthesis",
            "nominal_composition": data.get("Name"),           # dct:title
            "realized_plan": {
                "type": "Impregnation",
                # voc4cat:0000058 — calcination final temperature
                "calcination_final_temperature":
                    data.get("calcination final temperature"),
                # voc4cat:0000060 — calcination dwelling time
                "calcination_dwelling_time":
                    data.get("calcination dwelling time"),
            },
            "had_input_entity": [{
                "type": "Precursor",
                "name": data.get("Precursor 1"),               # voc4cat:0007794
                "precursor_quantity": data.get("Precursor 1 amount")  # voc4cat:0007038
            }]
        }],
        "is_about_entity": [{
            "type": "CatalystSample",
            "support": data.get("Support"),                    # voc4cat:0007825
            "metal_loading_nominal":
                data.get("Metal loading, nominal (wt%)")       # voc4cat:0007815
        }]
    }
    records.append(record)

with open("synthesis_records.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2, ensure_ascii=False)

The column names passed to data.get(...) are the same human-readable labels you already have in your sheet header. The Voc4Cat CURIEs in the comments document the semantic meaning of each mapping for anyone reading the script later.
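
Because data.get(...) returns None without complaint when a label does not match any column, a typo in a column name goes unnoticed until validation. A small check for empty values can catch this early. The following is a sketch only; find_empty_fields is a hypothetical helper, not part of CoreMeta4Cat.

```python
def find_empty_fields(node, path=""):
    """Return the paths of all None values in a nested record."""
    empty = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            empty += find_empty_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            empty += find_empty_fields(item, f"{path}[{index}]")
    elif node is None:
        empty.append(path)
    return empty


record = {
    "type": "CatalysisDataset",
    "is_about_entity": [{
        "type": "CatalystSample",
        "support": "Al2O3",
        "metal_loading_nominal": None,  # e.g. the column label was misspelled
    }],
}
print(find_empty_fields(record))
# → ['is_about_entity[0].metal_loading_nominal']
```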

The resulting JSON record

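Running this converter writes one record per non-empty row to synthesis_records.json. A single record looks like the following, which is a valid CoreMeta4Cat JSON instance. The rdf_type entry shown here is not produced by the minimal converter above; it would have to be set in the script.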

{
  "type": "CatalysisDataset",
  "rdf_type": { "id": "voc4cat:0007001", "title": "heterogeneous catalysis" },
  "was_generated_by": [
    {
      "type": "Synthesis",
      "nominal_composition": "5wt% Ni/Al2O3",
      "realized_plan": {
        "type": "Impregnation",
        "calcination_final_temperature": 500.0,
        "calcination_dwelling_time": 4.0
      },
      "had_input_entity": [
        {
          "type": "Precursor",
          "name": "Ni(NO3)2·6H2O",
          "precursor_quantity": 1.24
        }
      ]
    }
  ],
  "is_about_entity": [
    {
      "type": "CatalystSample",
      "support": "Al2O3",
      "metal_loading_nominal": 5.0
    }
  ]
}

Level 4 — Validating against the full schema

Once you have JSON records, you can validate them against the CoreMeta4Cat schema using standard LinkML tooling:

pip install linkml

linkml-validate \
  --schema catcore.yaml \
  --target-class CatalysisDataset \
  synthesis_records.json

Validation reports which mandatory fields are missing, which values fall outside the expected type or controlled vocabulary, and which cross-record links are incomplete. You do not need to reach full validation compliance in one step — treat it as a diagnostic tool that tells you where the gaps are and helps you prioritise what to add next.
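
If you have several record files, the same command can be driven from Python. The sketch below only wraps the command line shown above with the standard subprocess module; the helper names are hypothetical, and it assumes linkml-validate is installed and on your PATH. It makes no assumption about the format of the validator's messages and simply returns them as text.

```python
import shlex
import subprocess


def build_validate_command(schema, target_class, data_file):
    """Assemble the linkml-validate call for one record file."""
    return ["linkml-validate", "--schema", schema,
            "--target-class", target_class, data_file]


def validate_file(schema, target_class, data_file):
    """Run the validator and return (exit code, combined output text)."""
    command = build_validate_command(schema, target_class, data_file)
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


command = build_validate_command(
    "catcore.yaml", "CatalysisDataset", "synthesis_records.json")
print(shlex.join(command))
# → linkml-validate --schema catcore.yaml --target-class CatalysisDataset synthesis_records.json
```

Calling validate_file(...) in a loop over your JSON files gives you one report per file, which is convenient when you work through the gaps a sheet at a time.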


What each level gives you

Level | What you need to do | What you gain
1: Vocabulary reference | Browse the reference workbook | Understanding of the field hierarchy and available Voc4Cat terms
2: Annotate your sheets | Add Voc4Cat CURIEs to column headers in your existing sheets | Vocabulary-consistent, comparable data; no programming required
3: Write a converter | One small Python script per sheet structure | Fully machine-readable JSON records, ready for repository deposit
4: Validate | Run linkml-validate | Schema compliance, and with it a sound basis for SHACL shapes, RDF export, and semantic querying through further LinkML tooling

Levels 1 and 2 require no programming knowledge and no changes to your existing data collection workflows. They represent a genuine step towards FAIR catalysis data on their own terms.


Next steps