
A minor flaw in the current IIMB 2009 version

Written by Christian Meilicke
Friday, 11 December 2009 08:23
The current IIMB 2009 version has some really nice features. In particular, the stronger T-Box makes it possible to check to what extent systems can exploit this additional information. However, while experimenting with the dataset we noticed a minor characteristic that might distort the results.

Due to the synthetic generation of the dataset, the ids of the instances in the different versions follow this pattern: actor_453896_0. Many systems use not only datatype property values and labels as a source of information during the matching process, but also the ids of instances and classes. Especially in terminological matching (and many systems have started doing this), the id is often also a description, even though this is bad modelling style. Each of these systems will have a great advantage over the other systems, because the ids themselves give away information about which instances correspond. This advantage is not based on a better strategy but on this specific characteristic of the dataset. Thus, it would be better to, e.g., randomize the ids in some way.
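As a rough illustration, here is a minimal Python sketch of both sides of the problem. It assumes (this is not stated above) that corresponding instances in two versions share the id stem and differ only in the trailing suffix, e.g. actor_453896_0 versus a hypothetical actor_453896_1. The hypothetical `naive_id_matcher` pairs instances purely by id stem, without looking at any content, while the hypothetical `randomize_ids` replaces every id with an opaque random name. All function names and all ids other than actor_453896_0 are made up for this example; it does not describe how IIMB is actually generated.

```python
import random
import re


def id_stem(instance_id: str) -> str:
    """Drop a trailing '_<digits>' suffix, assumed here to mark the version."""
    return re.sub(r"_\d+$", "", instance_id)


def naive_id_matcher(source_ids, target_ids):
    """Pair instances whose ids share a stem, without looking at any content."""
    by_stem = {id_stem(t): t for t in target_ids}
    return {s: by_stem[id_stem(s)] for s in source_ids if id_stem(s) in by_stem}


def randomize_ids(ids, rng):
    """Give every id a fresh opaque name; returns an old -> new mapping.

    Source and target ids go through one call so that the new names are
    guaranteed to be pairwise distinct.
    """
    mapping, used = {}, set()
    for old in ids:
        new = "i%08x" % rng.getrandbits(32)
        while new in used:  # redraw on the (rare) collision
            new = "i%08x" % rng.getrandbits(32)
        used.add(new)
        mapping[old] = new
    return mapping


if __name__ == "__main__":
    # Hypothetical ids: source version uses suffix _0, target version _1.
    source = ["actor_453896_0", "actor_120034_0"]
    target = ["actor_453896_1", "actor_120034_1"]
    reference = dict(zip(source, target))  # gold standard alignment

    # On the original ids the content-free matcher finds every correspondence.
    print(naive_id_matcher(source, target) == reference)  # True

    mapping = randomize_ids(source + target, random.Random(42))
    new_source = [mapping[s] for s in source]
    new_target = [mapping[t] for t in target]
    # The reference alignment has to be rewritten with the same mapping.
    new_reference = {mapping[s]: mapping[t] for s, t in reference.items()}
    print(len(new_reference))  # 2

    # After randomization the id trick finds nothing; content must be used.
    print(naive_id_matcher(new_source, new_target))  # {}
```

Keeping a single old-to-new mapping is the important design choice here: it hides the id stems from the matchers while still allowing the reference alignment to be translated, so the evaluation itself is unaffected.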
Last Updated on Friday, 11 December 2009 11:59
