EMF
This module provides building blocks for AI solutions on top of EMF Ecore models.
Similarity
org.nasdanika.ai.emf.similarity package provides a number of interfaces and classes for computing similarity between EObjects as connections between EObjectNodes.
Similarities
Similarity can be of any type. The similarity package provides concrete classes for the below similarity types:
java.lang.Doublejava.lang.FloatEStructuralFeatureSimilarity- similarity which has an aggregated similarity value and similarity values for individual features.DoubleEStructuralFeatureSimilarity- binding toDouble.FloatEStructuralFeatureSimilarity- binding toFloat.
Connections
EStructuralFeatureConnection- a connection with a value and a structural feature (attribute or reference) indicating similarity of a feature of the target to the sourceDoubleEStructuralFeatureConnection- binding toDouble.FloatEStructuralFeatureConnection- binding toFloat.
SimilarityConnection- a connection which indicates how much its target is similar to the source. Similarity does not have to be symmetrical.DoubleSimilarityConnection- binding ofSimilarityConnectiontoDouble.EStructuralFeatureVectorSimilarityConnection- binding toEStructuralFeatureSimilarityDoubleEStructuralFeatureVectorSimilarityConnection- binding toDouble.FloatEStructuralFeatureVectorSimilarityConnection- binding toFloat.
FloatSimilarityConnection- binding ofSimilarityConnectiontoFloat.
Connection factories
SimilarityConnectionFactory- abstract base class for creating similarity connections.EStructuralFeatureVectorSimilarityConnectionFactory- abstract base class forEStructuralFeatureVectorSimilarityConnectionfactories withEStructuralFeatureSimilarityvalue. Computes value from the outgoingEStructuralFeatureConnections.DoubleEStructuralFeatureVectorSimilarityConnectionFactory- binding toDouble. Computes value as a weighted sum of feature values. The default feature weight is1.0, overridegetFeatureWeight()to customize.FloatEStructuralFeatureVectorSimilarityConnectionFactory- binding toFloat. Computes value as a weighted sum of feature values. The default feature weight is1.0, overridegetFeatureWeight()to customize.
DoubleEStructuralFeatureSimilarityConnectionFactory- createsDoubleSimilarityConnectionby computing its value from outgoingDoubleEStructuralFeatureConnections using a weighted sum. The default feature weight is1.0, overridegetFeatureWeight()to customize.FloatEStructuralFeatureSimilarityConnectionFactory- createsFloatSimilarityConnectionby computing its value from outgoingFloatEStructuralFeatureConnections using a weighted sum. The default feature weight is1.0, overridegetFeatureWeight()to customize.MessageCollectorSimilarityConnectionFactory- abstract base class for connections factories which collect messages to compute similarity (see graph similarity below)DoubleMessageCollectorSimilarityConnectionFactory- binding toDoubleSimilarityConnection.EStructuralFeatureVectorMessageCollectorSimilarityConnectionFactory- computesEStructuralFeatureSimilarityfrom messageEReferenceConnections in the message path.- DoubleEStructuralFeatureVectorMessageCollectorSimilarityConnectionFactory - binding to
Double
- DoubleEStructuralFeatureVectorMessageCollectorSimilarityConnectionFactory - binding to
Graph similarity
EObjectGraphMessageProcessor and its subclass DoubleEObjectGraphMessageProcessor can be used to compute similarity between model objects by constructing a graph on top of model objects and their relationships and then sending messages between graph nodes. Similarity may be computed from message values and message paths.
The below code snippet shows how to use the above classes:
DoubleEObjectGraphMessageProcessor<Void> messageProcessor = new DoubleEObjectGraphMessageProcessor<>(false, familyResource.getContents(), progressMonitor) {
/**
* Override this method to filter messages:
* - Drop long messages
* - Pass messages only through certain types of connections
* or from/to certain types of nodes
*/
@Override
protected boolean test(Message<Double> message, ProgressMonitor tpm) {
if (message.depth() > 20 || message.value() < 0.000001) {
return false;
}
Element recipient = message.recipient();
if (recipient instanceof EObjectNode) {
EObject eObject = ((EObjectNode) recipient).get();
return eObject instanceof Person || eObject instanceof EClass;
}
return true;
}
/*
* Customize connection weights and message values,
* return null for connections which shall not be traversed
*/
@Override
protected Double getOutgoingConnectionWeight(Connection connection) {
return connection instanceof EClassConnection ? 1.0 : null;
}
@Override
protected Double getIncomingConnectionWeight(Connection connection) {
return connection instanceof EClassConnection ? 1.0 : null;
}
@Override
protected Double getIncomingEReferenceWeight(EReference eReference) {
return eReference == EcorePackage.Literals.ECLASS__ESUPER_TYPES ? 1.0 : null;
}
@Override
protected Double getOutgoingEReferenceWeight(EReference eReference) {
return eReference == EcorePackage.Literals.ECLASS__ESUPER_TYPES ? 1.0 : null;
}
@Override
protected Double getConnectionMessageValue(
BiFunction<Connection, Boolean, Double> state,
Connection activator,
boolean incomingActivator,
Node sender,
Connection recipient,
boolean incomingRrecipient,
Message<Double> parent,
ProgressMonitor progressMonitor) {
Double connectionMessageValue = super.getConnectionMessageValue(
state,
activator,
incomingActivator,
sender,
recipient,
incomingRrecipient,
parent,
progressMonitor);
if (connectionMessageValue != null) {
return 0.8 * connectionMessageValue;
}
return connectionMessageValue;
}
};
DoubleEStructuralFeatureVectorMessageCollectorSimilarityConnectionFactory similarityConnectionFactory =
new DoubleEStructuralFeatureVectorMessageCollectorSimilarityConnectionFactory();
// Optional selector to send messages only from some graph nodes
Function<Map<Element, ProcessorInfo<BiFunction<Message<Double>, ProgressMonitor, Void>>>, Stream<BiFunction<Message<Double>, ProgressMonitor, Void>>> selector = processors -> {
...
};
// Optional message filter - can be used instead of overriding the test() method
// or in combination with the test() method.
BiFunction<Message<Double>, ProgressMonitor, Message<Double>> messageFilter = (m,p) -> {
...
};
messageProcessor.processes(
1.0,
selector,
messageTransformer,
similarityConnectionFactory,
progressMonitor);
Collection<DoubleEStructuralFeatureVectorSimilarityConnection> similarityConnections = similarityConnectionFactory.createSimilarityConnections();
TestFamilySimilarity provides several examples of computing similarity of family members:
- Relatives - messages are sent through the “parents” reference
- Gender - messages are sent through the EClassConnection and then through the supertypes reference
Performance
On a Windows 11 desktop with Intel i7-14700 2.1 GHz CPU:
- Single thread - 500K messages/second
- Cached thread pool (5-20) - 600k message/second
Applications
Graph RAG & fine tuning
- Find objects similar to a given (context) object
- Generate descriptions for the context objects and similar objects
- Use generated descriptions in a chat prompt or to fine-tune a model. In the latter case similar objects can be used to generate questions/answers.
In case of the family model it might be important that Paul is a Man and a father of Lea who is a Woman.
In the case of an architecture model, Internet Banking System for example, it might be important that API Application is a Container. It might be especially important for proprietary architecture models. Let’s say that API Application is a container image to be deployed to Kubernetes following organization’s guidelines.
It may also be important that the Internet Banking System belongs to a specific line of business or product portfolio.
Decision analysis
In Multi-criteria Decision Analysis message sending can be used to compute alternatives’ weights by sending a message from the goal. Messages would pass through the criteria (flat, hierarchy, network) and be collected by the alternative nodes. Message paths can be used to explain reasoning and to detect inconsistencies.
See also Task and design spaces with Visual Collaborative Multi-Criteria Decision Analysis
Product recommender
In banking and other businesses message passing can be used to compute similarity between customers and products.
Customer to customer similarity can be computed using customer location (country/state/city), income, net worth, demographics.
If products are organized into product groups/categories, product-to-products similarity can be computed by traversing the category hierarchy. A checking account product might be more similar to a saving account product (both deposits) than to a credit card. Or vice versa - both checking and credit card accounts are used for payments.
Then customer-product similarity can be computed by sending messages from a customer node to similar customers and then from those customer nodes to the products they use.
Nature model
This scenario is to demonstrate different types of similarity.
In the Nature model as shown here we may compute:
- Food similarity. For example, Fox it more similar to Hare than to Grass because Fox eats Hare, but doesn’t eat Grass. Computing this similarity would require the following traversal:
- Object -> class (type)
- Class -> operation (method)
- Operation -> parameters
- Parameter -> parameter type
- Parameter type -> object (instance)
- Reproduction - a female fox would have greater similarity with a male fox than to a female fox or other living beings. Computing this similarity would require traversal from an object to its class and then to other instances of the class.
Using the above similarities we can solve a “life optimization problem”: given a distribution of nutrients on 2D surface compute distribution of living beings - grass, hares and foxes. Living beings shall be close to their food so they don’t starve to death. They should be far enough from each other so they don’t exhaust their food supply. Living beings of the same species shall be close enough to beings of the opposite sex to reproduce.
From the above similarities distances/forces can be computed to use in a force layout graph to compute distributions.
Nasdanika