Motivation
Ontology Alignment is an essential tool for interoperability and semantic data integration, but most works in the field are restricted to finding simple correspondences.
Existing lexical-based approaches are limited to finding complex mappings where there is a lexical similarity between the entities, which is often not the case in real-world ontologies.
Existing traditional Association Rule Mining based approaches have a catch-all philosophy, exhaustively searching for frequent patterns and using predefined complex alignment patterns to filter the results a posteriori.
We propose novel pattern mining based algorithms for targeted complex ontology alignment, where patterns are used a priori, to allow for a targeted Association Rule Mining process, which we have found to be:
System
External ontology loading system facilities (AMLC)
The loading step retrieves the set of individuals shared by the two ontologies and organises the ontology information (types, relations and property values of each individual, ranges and domains of the properties, and hierarchical relations between classes) in hash tables.
Matching algorithms
An individual matcher is dedicated to each complex alignment pattern and searches the hash tables that hold the data relevant to that pattern. The support (or frequency) of the source and target entities that participate in the pattern is stored, and a common Association Rule Mining matching algorithm is responsible for extracting association rules.
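As a minimal sketch of the idea above, the following Python code indexes entities by the individuals they cover and derives association rules from support counts over the shared individuals. It is not the system's actual implementation: the function names, the `min_confidence` threshold, the entity labels and the sample individuals are all hypothetical, chosen only to illustrate how support and confidence can be computed from hash-table lookups.

```python
from collections import defaultdict


def index_by_entity(assertions):
    """Build a hash table mapping each entity to the set of individuals it covers."""
    index = defaultdict(set)
    for individual, entity in assertions:
        index[entity].add(individual)
    return index


def mine_rules(source_index, target_index, shared, min_confidence=0.5):
    """Return (source, target, joint_support, confidence) for rules source -> target.

    Support is counted over the shared individuals only; confidence is
    joint support divided by the support of the source entity.
    """
    rules = []
    for src, src_inds in source_index.items():
        src_inds = src_inds & shared
        if not src_inds:
            continue  # entity has no shared instances, so no evidence
        for tgt, tgt_inds in target_index.items():
            joint = src_inds & tgt_inds
            confidence = len(joint) / len(src_inds)
            if joint and confidence >= min_confidence:
                rules.append((src, tgt, len(joint), confidence))
    return rules


# Hypothetical toy data: (individual, entity or class expression) pairs.
source_assertions = [
    ("i1", "src:AcceptedPaper"),
    ("i2", "src:AcceptedPaper"),
    ("i3", "src:AcceptedPaper"),
]
target_assertions = [
    ("i1", "tgt:Paper and (tgt:hasDecision some tgt:Acceptance)"),
    ("i2", "tgt:Paper and (tgt:hasDecision some tgt:Acceptance)"),
    ("i4", "tgt:Paper and (tgt:hasDecision some tgt:Acceptance)"),
]
shared = {"i1", "i2", "i3", "i4"}

rules = mine_rules(
    index_by_entity(source_assertions),
    index_by_entity(target_assertions),
    shared,
)
```

In this toy example the source entity has three shared instances, two of which also satisfy the target expression, so a single rule is produced with a joint support of 2 and a confidence of 2/3.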
Refinement algorithms
These algorithms receive mappings generated by some of the pattern matching algorithms as input and refine those mappings, converting simple subsumption mappings into complex equivalence ones.
Filtering algorithms
Different filters select which of the candidate mappings to include in the final alignment, excluding redundant mappings and conflicting mappings with lower confidence. An aggregator algorithm combines mappings for the same entity into a single mapping using logical operators, such as “AND” and “OR”.
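The filtering and aggregation steps can be illustrated with a short Python sketch. It makes several assumptions that the text does not state: a mapping is represented as a hypothetical `(source, target, confidence)` tuple, "redundant" is taken to mean the same source and target pair appearing more than once, and the aggregated mapping is given the lowest confidence of its parts. The system's actual filters and aggregator may behave differently.

```python
from collections import defaultdict


def filter_mappings(candidates):
    """Keep one mapping per (source, target) pair: the one with the highest confidence."""
    best = {}
    for source, target, confidence in candidates:
        key = (source, target)
        if key not in best or confidence > best[key]:
            best[key] = confidence
    return [(s, t, c) for (s, t), c in best.items()]


def aggregate(mappings, operator="OR"):
    """Combine all mappings of a source entity into one expression joined by `operator`.

    Assumption: the combined confidence is the minimum over the parts.
    """
    grouped = defaultdict(list)
    for source, target, confidence in mappings:
        grouped[source].append((target, confidence))
    result = {}
    for source, targets in grouped.items():
        targets.sort(key=lambda pair: pair[0])  # deterministic ordering
        expression = f" {operator} ".join(f"({t})" for t, _ in targets)
        result[source] = (expression, min(c for _, c in targets))
    return result


# Hypothetical candidate mappings.
candidates = [
    ("src:A", "tgt:X", 0.9),
    ("src:A", "tgt:X", 0.6),  # redundant with the first, lower confidence
    ("src:A", "tgt:Y", 0.8),
    ("src:B", "tgt:Z", 0.7),
]

filtered = filter_mappings(candidates)
aggregated = aggregate(filtered, operator="OR")
```

Here the two candidates for `src:A` and `tgt:X` collapse into the higher-confidence one, and the remaining mappings for `src:A` are merged into a single disjunctive mapping.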
Evaluation
Data
We evaluated the proposed algorithms on the Populated Conference dataset, which is available in the OAEI 2020 Complex track. The dataset comprises five ontologies, from which we chose to align cmt and conference, given the richness of this pair in complex patterns.
Manual scale
We manually classified the resulting mappings according to a rating scale consisting of the following five categories with associated scores.
Results
- Our algorithms cover eight distinct complex patterns, of which seven were found in the cmt-conference dataset.
- They were unable to find mappings for some of the patterns present in the reference. However, they found several mappings, with high weighted precision, for patterns not present in the reference.
- These results show that the reference alignment does not cover all the nontrivial correspondences that are valid between these two ontologies, suggesting that complex alignment references may be incomplete.
Conclusions
- This work represents a paradigm shift by making use of the alignment patterns to steer, rather than filter, the Association Rule Mining process.
- The manual evaluation revealed that the majority of mappings we found are correct or nearly correct, even if not present in the reference alignment.
- These results highlight the importance of establishing evaluation metrics that consider varying degrees of correctness while being fully automated.
Funding
This work was supported by FCT through the LASIGE Research Unit (UIDB/00408/2020 and UIDP/00408/2020). It was also partially supported by the KATY project which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 101017453.