Overview

EXACT-OM predicts correspondences between source and target ontology entities. It combines lexical label matching, ontology structure, auxiliary attributes, optional LLM arbitration, and a global candidate selector. A key runtime property is that every scored pair can carry an inspectable explanation: each final score is broken down into lexical, structural, and LLM contributions, with the structural evidence further split into hierarchy, similarity, difference, and attribute channels.

Global mode

Use this when you want EXACT-OM to generate candidates, score them, apply threshold and cardinality filters, and write a mapping TSV.

Local mode

Use this when you already have a candidate TSV. Pass -c and the system keeps the full ranked target list for each source.

Audit and review

Enable summary and JSON outputs to inspect scores, channel importances, selected triples, rationales, and run statistics.

Install

Install from the repository root. Use Python 3.10 and make sure Java is available on the system path before running ontology-backed workflows.

poetry install
poetry run exact --help
  • Required: Python 3.10, Poetry, Java/JDK or JRE.
  • Recommended: CUDA-capable GPU for large biomedical runs.
  • Visualizer frontend: Node/npm only when rebuilding explanations_visualizer.

The package defines these entry points: exact, bioml-eval, exact-llm-debug, exact-user-study, and exact-study-viz.

Inputs

Every alignment run needs a source ontology, a target ontology, an output directory, and a YAML config.

  • Source ontology (required, .owl): source entity labels, annotations, hierarchy, and graph evidence.
  • Target ontology (required, .owl): target entity labels, annotations, hierarchy, and graph evidence.
  • Training reference (optional, TSV with SrcEntity, TgtEntity, Score): supervised calibration for the global candidate selector.
  • Full reference (optional, TSV with SrcEntity, TgtEntity, Score): evaluation and reference-aware analysis.
  • Candidate file (optional, TSV with SrcEntity, TgtEntity, TgtCandidates): local ranking mode; TgtCandidates stores the target candidate list for the source.

The helper data/get_data.py downloads Bio-ML data when the data directory is empty and can also build conference-dataset folders. The repository keeps large dataset folders ignored, so expect to provide benchmark data locally.
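To work with a candidate file programmatically, a minimal sketch using only the standard library is shown below. The helper name `read_candidates` is hypothetical, and the sketch assumes TgtCandidates is serialized as a JSON array of target IRIs; adjust the decoder if your files use a different encoding.

```python
import csv
import json

def read_candidates(path):
    """Parse a candidate TSV into {source IRI: [target IRI, ...]}.

    Assumes the TgtCandidates column holds a JSON-encoded list of
    target entity IRIs; other encodings need a different decoder.
    """
    candidates = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            candidates[row["SrcEntity"]] = json.loads(row["TgtCandidates"])
    return candidates
```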

Run Alignment

Global alignment

Omit -c to let EXACT-OM build the candidate set and write a filtered global alignment.

poetry run exact \
  -s data/ncit-doid/ncit.owl \
  -t data/ncit-doid/doid.owl \
  -o exp/runs/ncit_doid/global \
  -y exact/default_config.yaml \
  -r data/ncit-doid/train.tsv \
  -f data/ncit-doid/test.tsv \
  -l -e -m 60G -d 0

Local candidate ranking

Pass -c when you want to score and rank an existing candidate set. All candidates remain in the ranking file.

poetry run exact \
  -s data/ncit-doid/ncit.owl \
  -t data/ncit-doid/doid.owl \
  -o exp/runs/ncit_doid/local \
  -y exact/default_config.yaml \
  -f data/ncit-doid/test.tsv \
  -c data/ncit-doid/test.cands.tsv \
  -l -e -m 60G -d 0

CLI options

  • -s, --source_ontology_file: path to the source OWL ontology.
  • -t, --target_ontology_file: path to the target OWL ontology.
  • -o, --output_dir: run output directory; created when missing.
  • -y, --config_file: YAML runtime configuration; defaults to built-in settings when omitted.
  • -r, --training_reference_file: optional training mappings for selector calibration.
  • -f, --full_reference_file: optional reference mappings for evaluation and analysis.
  • -c, --candidates_file: candidate restriction file; enables local ranking mode.
  • -e, --run_eval: run evaluation after writing the alignment.
  • -l, --save_logs: write exact.log in the run directory.
  • -m, --jvm_heap_size: JVM heap size; a bare number is interpreted as GB.
  • -d, --device: CUDA device id; omit for CPU.

YAML runner

For repeatable jobs, use a run-config YAML with tools/run_exact_job.py.

poetry run python tools/run_exact_job.py \
  --run-config exp/runs/ncit_doid/run.yaml \
  --dry-run

The same helper can submit through Slurm with --sbatch-script deploy/sbatch/exact_single_run.sh.

Configuration

The default config is exact/default_config.yaml. Copy it into the run folder and edit only the blocks that matter for the run. Small overrides are merged with defaults, so you do not need to repeat every parameter.
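The override behavior can be pictured as a recursive dictionary overlay. This is an illustrative sketch (the hypothetical `merge_config` below is not the package's own loader): nested blocks are merged key by key, and any scalar in the override replaces the default.

```python
def merge_config(defaults, overrides):
    """Recursively overlay a partial override dict onto defaults.

    Nested dicts are merged key by key; any other override value
    replaces the corresponding default wholesale.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, overriding only `alignment_params.threshold` leaves the default `cardinality` in place.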

Alignment decisions

alignment_params:
  threshold: 0.7
  cardinality: 1
  target_cardinality: 1
  save_json: true

threshold filters global alignments. In local mode it labels rationales as positive or negative while preserving the ranking.
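A simplified picture of the global filter, assuming greedy one-to-one selection in descending score order (the hypothetical `filter_global` below is a sketch; the actual selector logic may differ):

```python
def filter_global(pairs, threshold=0.7, cardinality=1, target_cardinality=1):
    """Greedily keep (src, tgt, score) pairs above threshold,
    honoring per-source and per-target cardinality caps,
    highest scores first."""
    kept, src_used, tgt_used = [], {}, {}
    for src, tgt, score in sorted(pairs, key=lambda p: p[2], reverse=True):
        if score < threshold:
            continue  # below alignment_params.threshold
        if src_used.get(src, 0) >= cardinality:
            continue  # source already has its quota of mappings
        if tgt_used.get(tgt, 0) >= target_cardinality:
            continue  # target already has its quota of mappings
        kept.append((src, tgt, score))
        src_used[src] = src_used.get(src, 0) + 1
        tgt_used[tgt] = tgt_used.get(tgt, 0) + 1
    return kept
```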

Candidate generation

candidates_params:
  retrieval_strategy: hybrid
  top_k: 20
  lexical_encoder_name: sentence-transformers/all-MiniLM-L6-v2

Global mode uses these settings to build source-local target candidates before scoring.
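The lexical half of hybrid retrieval can be illustrated with character-trigram overlap; the sketch below (hypothetical `lexical_top_k`) shows only that half, while the real retriever also blends dense embedding similarity from the configured encoder.

```python
def char_trigrams(text):
    """Lowercased character trigram set of a label."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def lexical_top_k(source_label, target_labels, top_k=20):
    """Rank target labels by character-trigram Jaccard similarity.

    Illustrative stand-in for the lexical side of hybrid candidate
    retrieval; not the package's actual retriever.
    """
    src = char_trigrams(source_label)
    scored = []
    for label in target_labels:
        tgt = char_trigrams(label)
        union = src | tgt
        scored.append((label, len(src & tgt) / len(union) if union else 0.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```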

Disable LLM use

model:
  params:
    use_llm: false
    generate_llm_rationales: false

This keeps lexical and structural scoring active while avoiding hosted or local LLM calls.

Use OpenRouter

export OPENROUTER_API_KEY=...

llm_routing:
  decision_profile: openrouter_gpt4o_mini
  rationale_profile: openrouter_gpt4o_mini

Hosted decision scoring is probe-gated. If chat logprobs are unavailable, the runtime falls back to the configured local decision profile.

The most common first-pass tuning knobs are candidates_params.top_k, alignment_params.threshold, dataset_params.n_hops, dataset_params.hierarchy_max_depth, and model.params.tau_LLM.

System Pipeline

Figure: EXACT-OM pipeline diagram. Default pipeline: candidate retrieval, pair-adaptive evidence channels, optional LLM arbitration, the global selector, and auditable outputs.
  1. Exact lexical prefilter. Normalized labels and synonyms are matched first. Exact matches can be removed from downstream scoring and reinserted later.
  2. Candidate generation. In global mode, a hybrid retriever combines dense label embeddings with lexical token and character similarity.
  3. Pair-adaptive scoring. Each source-target pair receives lexical, hierarchy, similarity, difference, and attribute scores with quality estimates.
  4. Adaptive fusion. Strong and reliable channels receive more weight. Empty channels become neutral and do not dilute the result.
  5. LLM arbitration. Ambiguous or internally disagreeing pairs can receive a pair brief and a binary LLM decision probability.
  6. Global selection. For generated candidates, the optional CandidateSetSelector compares each source's candidate set jointly and can abstain with NO_MATCH.
  7. Audit export. Outputs include mapping files, flattened metrics, run stats, plots, and optional full explanation JSON.
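Step 4 can be pictured as an importance-weighted average in which empty channels drop out rather than pulling the score down. The hypothetical `fuse` below is a hedged sketch, not the exact fusion rule:

```python
def fuse(channels):
    """Combine per-channel (score, importance) pairs.

    channels maps channel name -> (score, importance); a channel
    with no evidence is passed as None and is skipped, so it stays
    neutral instead of diluting the fused score.
    """
    weighted, total_weight = 0.0, 0.0
    for value in channels.values():
        if value is None:  # empty channel: neutral, no dilution
            continue
        score, importance = value
        weighted += importance * score
        total_weight += importance
    # with no evidence at all, fall back to an uninformative 0.5
    return weighted / total_weight if total_weight else 0.5
```

For example, a strong lexical channel and a weaker hierarchy channel fuse to their importance-weighted mean even when the difference channel carries no evidence.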

Evidence channels

  • Lexical: best label or synonym similarity. Inspect s_label, I_label, selected labels.
  • Hierarchy: aligned parents or configured hierarchy families. Inspect s_hier, I_hier, selected hierarchy triples.
  • Similarity: supported non-hierarchical object-property triples. Inspect s_sim, I_sim, selected similarity triples.
  • Difference: informative triples on one side without support on the other. Inspect s_diff, I_diff, unsupported evidence.
  • Attribute: definitions, synonyms, xrefs, and projected literals that support the pair. Inspect s_attr, I_attr, selected attributes.
  • LLM: decision probability on ambiguous or disagreeing evidence. Inspect p_llm, I_llm, generated rationale.

Outputs

A successful run writes a structured output tree under the directory passed with -o.

run-dir/
  exact.log
  times.txt
  dataset/
    dataset.csv
    dataset.meta.json
    feature_metrics.csv
    plots/
  model/
    alignment/
      src2tgt.maps_global.tsv
      src2tgt.maps_local.tsv
      default/
        summary_metrics.csv
        run_stats.json
        run_stats.csv
        full_explanations.json
        llm_calibration.json
    checkpoints/
    cache/
    plots/
  • src2tgt.maps_global.tsv (global mode): saved mappings with SrcEntity, TgtEntity, and Score.
  • src2tgt.maps_local.tsv (local mode): per-source ranked target candidates.
  • summary_metrics.csv (alignment_params.save_csv: true): flattened numeric scores, weights, importances, selector fields, and labels.
  • full_explanations.json (alignment_params.save_json: true): full candidate-level explanation records with channel evidence and rationales.
  • run_stats.json (when the summary CSV is enabled): run-level aggregates, LLM usage, review-band fraction, and score distributions.
  • times.txt (always, after run stages complete): stage-level runtime measurements.
Figure: example explanation table with score, channel contribution, and evidence fields.
Explanation records are intended for reviewer-facing inspection, not only aggregate metrics.

Evaluation

Use -e during an alignment run, or evaluate an existing mapping file with bioml-eval.

poetry run bioml-eval \
  --alignment_file exp/runs/ncit_doid/global/model/alignment/src2tgt.maps_global.tsv \
  --output_dir exp/runs/ncit_doid/global \
  --full_reference_file data/ncit-doid/test.tsv \
  --source_ontology_file data/ncit-doid/ncit.owl \
  --target_ontology_file data/ncit-doid/doid.owl \
  --save_logs -m 32G

Global evaluation reports precision, recall, and F1 against the reference alignment. Local candidate evaluation uses the supplied candidate file and reports ranking metrics such as MRR and Hits@K.
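The ranking metrics can be computed from per-source candidate lists and a reference mapping; the hypothetical `ranking_metrics` below is a minimal sketch, assuming each source has exactly one reference target:

```python
def ranking_metrics(ranked_candidates, gold, ks=(1, 5, 10)):
    """Compute MRR and Hits@K over per-source ranked candidate lists.

    ranked_candidates maps source -> ordered target list;
    gold maps source -> the single reference target.
    """
    reciprocal_ranks = []
    hits = {k: 0 for k in ks}
    for src, targets in ranked_candidates.items():
        try:
            rank = targets.index(gold[src]) + 1  # 1-based rank
        except ValueError:
            reciprocal_ranks.append(0.0)  # gold target not retrieved
            continue
        reciprocal_ranks.append(1.0 / rank)
        for k in ks:
            if rank <= k:
                hits[k] += 1
    n = len(ranked_candidates)
    return sum(reciprocal_ranks) / n, {k: hits[k] / n for k in ks}
```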

Additional analysis helpers include tools/analyze_alignment_run.py, tools/aggregate_results.py, tools/run_candidate_recall_experiment.py, and tools/run_cardinality_threshold_tests.py.

User Study Analysis

The exact-user-study command builds reusable artifacts from an existing local ranking run. The run must contain model/alignment/src2tgt.maps_local.tsv and model/alignment/default/full_explanations.json.

poetry run exact-user-study \
  --run-dir exp/runs/omim_ordo/local \
  --top-k 5 \
  --per-rank 4 \
  --shortlist-per-rank 8 \
  --generate-rationales \
  --jvm-heap-size 32G

Artifacts are written to <run-dir>/analysis/user_study unless --output-dir is set.

  • pair_metrics.csv and source_panels.csv: candidate and source-panel metrics.
  • study_shortlist.csv and study_selection_review.csv: balanced selection workflow files.
  • study_selected_records_with_rationales.json: final selected cases for the visualizer.
  • study_mapping.json: compact payload served by the study visualizer.
  • failure_taxonomy.csv and user_study_analysis.ipynb: failure-analysis outputs.

Study Visualizer

The visualizer serves a fixed study run through FastAPI and a static React/Cytoscape frontend. It is designed for read-only inspection and LimeSurvey iframe embedding.

cd explanations_visualizer
npm install
npm run build

cd ..
poetry run python -m study_visualizer_runtime.cli \
  --run-dir exp/runs/omim_ordo/local \
  --analysis-dir exp/runs/omim_ordo/local/analysis/user_study \
  --port 8000

Open a specific source panel with:

http://localhost:8000/?source=<exact_source_iri>

Render bundle deployment

For a lightweight hosted visualizer, export a bundle and deploy the Render assets in deploy/render.

poetry run python tools/prepare_study_visualizer_bundle.py \
  --run-dir exp/runs/omim_ordo/local \
  --bundle-dir deploy/render/study_bundles/omim-ordo \
  --overwrite

The bundle contains the config, study mapping, selected records, ontology cache, and manifest needed by the runtime service.

Python API

Use the API wrappers when integrating EXACT-OM into a script or notebook.

from exact import AlignmentRunner

runner = AlignmentRunner(
    source_ontology_file="data/ncit-doid/ncit.owl",
    target_ontology_file="data/ncit-doid/doid.owl",
    output_dir="exp/runs/ncit_doid/api",
    training_reference_file="data/ncit-doid/train.tsv",
    full_reference_file="data/ncit-doid/test.tsv",
    config_file="exact/default_config.yaml",
    save_logs=True,
    jvm_heap_size="60G",
    run_eval=True,
    device=0,
)
runner.run()

Evaluation is available through exact.EvalutionRunner. The class name is intentionally spelled as it appears in the package API.

Operations

Caching

use_file_cache: true reuses dataset and model caches. Inference checkpoints are enabled by default and resume from compatible runs.

Memory

Use -m 60G or a larger heap for large OWL files. Java is needed for ontology loading, reasoning, and visualizer ontology expansion.

Devices

Pass -d 0 for GPU 0. If CUDA is unavailable, the alignment action logs a warning and uses CPU.

Slurm

Use the scripts in deploy/sbatch with the YAML runner for reproducible cluster jobs.

Troubleshooting

  • JVM fails to initialize. Likely cause: Java is missing or JAVA_HOME does not point to a usable runtime. Action: install a JDK/JRE and rerun with an explicit heap size such as -m 32G.
  • Hosted LLM falls back locally. Likely cause: missing OpenRouter key, or the hosted decision profile lacks usable chat logprobs. Action: set OPENROUTER_API_KEY, choose a compatible profile, or disable LLM use.
  • No full_explanations.json. Likely cause: JSON export is disabled. Action: set alignment_params.save_json: true before the run.
  • User-study analysis cannot start. Likely cause: the input run is not a local ranking run or lacks full explanations. Action: run alignment with -c and enable save_json.
  • Very slow preprocessing. Likely cause: large ontology reasoning, high candidate top_k, or wide structural evidence pools. Action: increase the heap, reuse the cache, lower top_k, or reduce structural caps for exploratory runs.