INSTANCEMATCHING.ORG
How to discover similar objects in heterogeneous data sources
AKT-Rexa-SWETO DBLP
Written by Andriy Nikolov   
Tuesday, 15 December 2009 11:02

The benchmark was originally constructed at the Knowledge Media Institute. It contains the following three datasets:

  1. AKT EPrints. This dataset contains information about papers produced within the AKT research project.
  2. Rexa. This dataset was extracted from the Rexa search server, which was constructed at the University of Massachusetts using automatic information extraction (IE) algorithms.
  3. SWETO DBLP. This is a publicly available dataset listing publications from the computer science domain.

The SWETO-DBLP dataset was originally represented in RDF. The other two datasets (AKT EPrints and Rexa) were extracted from HTML sources using specially constructed wrappers and structured according to the SWETO-DBLP ontology. The AKT and Rexa datasets and the gold standard mappings can be downloaded from the author's website. The SWETO-DBLP dataset can be downloaded from the SWETO project website.

 
