Geospatial extensions of SPARQL such as GeoSPARQL and stSPARQL have recently been defined, and corresponding geospatial RDF stores have been implemented. However, there is no widely used benchmark for evaluating geospatial RDF stores that takes into account recent advances in the state of the art in this area. We have developed a benchmark, called Geographica, which uses both real-world and synthetic data to test the offered functionality and the performance of some prominent geospatial RDF stores. Our benchmark is composed of two workloads with their associated datasets and queries: a real-world workload based on publicly available linked datasets and a synthetic workload. We have performed experiments using Geographica on the following geospatial RDF stores: (i) Strabon, (ii) Parliament, and (iii) uSeekM. The results of these experiments can be found in the first manuscript [1] of this benchmark, which has been submitted to a conference.
The real-world workload uses publicly available linked geospatial data. This workload consists of a micro benchmark and a macro benchmark. The micro benchmark tests primitive spatial functions: we check the spatial component of a system with queries that use non-topological functions, spatial selections, spatial joins and spatial aggregate functions. In the macro benchmark we test the performance of the selected RDF stores in typical application scenarios such as reverse geocoding, map search and browsing, and a real-world use case from the Earth Observation domain.
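To give a feel for what a spatial selection in the micro benchmark looks like, the following is a minimal sketch that builds a GeoSPARQL query selecting all features whose geometry lies within a bounding box. It uses only standard GeoSPARQL terms (geo:hasGeometry, geo:asWKT, geof:sfWithin). The class name, the method names and the coordinates are our own illustrative assumptions; the actual Geographica queries and the vocabularies of its datasets may differ.

```java
import java.util.Locale;

/** Illustrative only: builds a GeoSPARQL spatial-selection query as a string. */
final class SpatialSelectionQuery {

    // Standard GeoSPARQL namespaces for the ontology and the filter functions.
    private static final String PREFIXES =
            "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
          + "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n";

    /** Returns a closed WKT polygon ring for the box (minX, minY)-(maxX, maxY). */
    static String boxAsWkt(double minX, double minY, double maxX, double maxY) {
        return String.format(Locale.ROOT,
                "POLYGON((%1$s %2$s, %3$s %2$s, %3$s %4$s, %1$s %4$s, %1$s %2$s))",
                minX, minY, maxX, maxY);
    }

    /** Returns a query for all features whose WKT geometry is within the given box. */
    static String withinBox(double minX, double minY, double maxX, double maxY) {
        String box = boxAsWkt(minX, minY, maxX, maxY);
        return PREFIXES
             + "SELECT ?feature ?wkt WHERE {\n"
             + "  ?feature geo:hasGeometry ?geom .\n"
             + "  ?geom geo:asWKT ?wkt .\n"
             + "  FILTER(geof:sfWithin(?wkt, \"" + box + "\"^^geo:wktLiteral))\n"
             + "}";
    }

    public static void main(String[] args) {
        // Hypothetical longitude/latitude box, chosen for illustration only.
        System.out.println(withinBox(23.5, 37.5, 24.0, 38.0));
    }
}
```

Replacing geof:sfWithin with another topological function such as geof:sfIntersects yields a different spatial selection over the same pattern.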
Query | Strabon | uSeekM | Parliament |
RG1 | 48.6 sec. | 0.15 sec. | 1.7 sec. |
RG2 | 16.2 sec. | 0.62 sec. | 0.9 sec. |

Query | Strabon | uSeekM | Parliament |
MSB1 | 0.06 sec. | 0.5 sec. | 0.2 sec. |
MSB2 | 0.69 sec. | 0.06 sec. | 19.2 sec. |
MSB3 | 0.07 sec. | 0.04 sec. | 2.8 sec. |

Query | Strabon | uSeekM | Parliament |
RM1 | 5.3 sec. | 196 sec. | 93 sec. |
RM2 | 0.8 sec. | 19 sec. | 3.4 sec. |
RM3 | 1.6 sec. | 2.5 sec. | 8.3 sec. |
RM4 | 61 sec. | 126 sec. | 509 sec. |
RM5 | 3.7 sec. | 93 sec. | 407 sec. |
RM6 | 135 sec. | > 1 hour | > 1 hour |
In the second workload of Geographica we use a generator that produces synthetic data of various sizes and generates queries of varying thematic and spatial selectivity. In this way, we can perform the evaluation of geospatial RDF stores in a controlled environment.
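As a rough illustration of how spatial selectivity can be controlled, the sketch below computes a square query window that covers a chosen fraction of a square extent. It assumes features are spread uniformly over the extent, so that a window covering a fraction s of the area selects roughly a fraction s of the features. This is an assumption made for the example; the class and method names are hypothetical and do not reflect the actual Geographica generator.

```java
import java.util.Locale;
import java.util.Random;

/** Illustrative only: query windows of a chosen spatial selectivity. */
final class SelectivityWindow {

    /**
     * Side length of a square window that covers the given fraction
     * (0 < selectivity <= 1) of a square extent with side extentSide.
     */
    static double windowSide(double extentSide, double selectivity) {
        if (!(selectivity > 0.0 && selectivity <= 1.0)) {
            throw new IllegalArgumentException("selectivity must be in (0, 1]");
        }
        // Area scales with the square of the side, hence the square root.
        return extentSide * Math.sqrt(selectivity);
    }

    /** Places such a window at a random position fully inside the extent. */
    static String randomWindowWkt(double extentSide, double selectivity, Random rng) {
        double side = windowSide(extentSide, selectivity);
        double minX = rng.nextDouble() * (extentSide - side);
        double minY = rng.nextDouble() * (extentSide - side);
        double maxX = minX + side;
        double maxY = minY + side;
        return String.format(Locale.ROOT,
                "POLYGON((%1$.6f %2$.6f, %3$.6f %2$.6f, %3$.6f %4$.6f, %1$.6f %4$.6f, %1$.6f %2$.6f))",
                minX, minY, maxX, maxY);
    }

    public static void main(String[] args) {
        Random rng = new Random(7L); // fixed seed so runs are repeatable
        for (double s : new double[] {0.001, 0.01, 0.1}) {
            System.out.println(s + " -> " + randomWindowWkt(100.0, s, rng));
        }
    }
}
```

Seeding the random number generator keeps the generated queries identical across runs, which matters when the same workload is replayed against several stores.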
Geographica is an open source Java project that uses Apache Maven as its build automation tool. A user can either clone the Geographica Mercurial repository or use Geographica from a Maven repository to test an RDF store. To get the source code of Geographica, execute the following command:
> hg clone http://hg.strabon.di.uoa.gr/Geographica
To use Geographica in a Maven project, include the following dependency in the project's pom.xml:
<dependency>
  <groupId>gr.uoa.di.rdf</groupId>
  <artifactId>runtime</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>
In addition, the following repository should be added to the <repositories> section of the pom.xml:
<repository>
  <releases>
    <enabled>false</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
  <id>strabon.snapshot</id>
  <name>Strabon - maven repository - snapshots</name>
  <url>http://maven.strabon.di.uoa.gr/content/repositories/snapshots</url>
</repository>
For each workload, Geographica defines one or more experiments, and each experiment is implemented by a Java class (e.g., MicroNonTopologicalExperiment.java). Geographica also defines a Java interface (SystemUnderTest.java) that describes the functions a geospatial RDF store must offer in order to be tested; a store is plugged into the benchmark by implementing this interface. For example, the Maven project strabon-geographica implements the interface SystemUnderTest for the geospatial RDF store Strabon and runs the experiment MicroNonTopologicalExperiment. The main functions defined by this interface are the following:
# Based on 24680540 KB RAM in the server
wal_level = minimal # minimal, archive, or hot_standby
# disable write ahead log when benchmarking
# storing times
default_statistics_target = 10000
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 23GB # physical ram-memory for system
work_mem = 15GB # memory for tmp,hashes etc per (sub)query
# 23GB would be an extreme value for benchmarking
# split memory between shared buffers
# and work_mem appropriately
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 15GB
max_connections = 48 # 3*cores for read-only scenarios. Otherwise use 2*cores
geqo_threshold = 15
from_collapse_limit = 14
join_collapse_limit = 14
kernel.shmmax = 17179869184
kernel.shmmni = 4224
kernel.shmall = 17301504
fs.file-max = 1000000
fs.inotify.max_user_watches=150000
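Returning to the idea of plugging a store into the benchmark through an interface, the following minimal sketch shows the general adapter pattern: an experiment is written once against an interface, and each store supplies its own implementation. The interface StoreAdapter and all method names below are hypothetical stand-ins invented for this example; they are not the actual methods of SystemUnderTest, and the toy adapter does not talk to any real store.

```java
import java.util.Arrays;

/** Illustrative only: times repeated evaluations of a query through an adapter. */
final class ExperimentSketch {

    /** Runs the query the given number of times and returns each elapsed time in ns. */
    static long[] timeQuery(StoreAdapter store, String query, int repetitions) {
        long[] elapsedNanos = new long[repetitions];
        store.initialize();
        try {
            for (int i = 0; i < repetitions; i++) {
                long start = System.nanoTime();
                store.evaluate(query);
                elapsedNanos[i] = System.nanoTime() - start;
            }
        } finally {
            store.close(); // release the store even if a query fails
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        long[] times = timeQuery(new FixedResultAdapter(42), "SELECT * WHERE { ?s ?p ?o }", 3);
        System.out.println(Arrays.toString(times));
    }
}

/** Hypothetical adapter contract; NOT the actual SystemUnderTest interface. */
interface StoreAdapter {
    void initialize();
    long evaluate(String query); // returns the number of results
    void close();
}

/** Toy adapter that pretends every query returns a fixed number of results. */
final class FixedResultAdapter implements StoreAdapter {
    private final long resultCount;
    int evaluations = 0;
    boolean closed = false;

    FixedResultAdapter(long resultCount) {
        this.resultCount = resultCount;
    }

    @Override public void initialize() { closed = false; }
    @Override public long evaluate(String query) { evaluations++; return resultCount; }
    @Override public void close() { closed = true; }
}
```

Keeping the timing loop outside the adapter means every store is measured by the same code, so differences in the reported times come from the stores rather than from the harness.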