Geographica: A Benchmark for Geospatial RDF Stores

Authors

George Garbis (ggarbis [at] di [dot] uoa [dot] gr)
Kostis Kyzirakos (kkyzir [at] di [dot] uoa [dot] gr)
Manolis Koubarakis (koubarak [at] di [dot] uoa [dot] gr)

Introduction

Geospatial extensions of SPARQL such as GeoSPARQL and stSPARQL have recently been defined, and corresponding geospatial RDF stores have been implemented. However, there is no widely used benchmark for evaluating geospatial RDF stores that takes into account recent advances in the state of the art in this area. We have developed a benchmark, called Geographica, which uses both real-world and synthetic data to test the offered functionality and the performance of some prominent geospatial RDF stores. Our benchmark is composed of two workloads with their associated datasets and queries: a real-world workload based on publicly available linked datasets and a synthetic workload. We have performed experiments using Geographica for the following geospatial RDF stores: (i) Strabon, (ii) Parliament, and (iii) uSeekM. The results of these experiments can be found in the first manuscript [1] of this benchmark, which has been submitted to a conference.

The real-world workload

The real-world workload uses publicly available linked geospatial data. This workload consists of a micro benchmark and a macro benchmark. The micro benchmark tests primitive spatial functions. We check the spatial component of a system with queries that use non-topological functions, spatial selections, spatial joins and spatial aggregate functions. In the macro benchmark we test the performance of the selected RDF stores in typical application scenarios like reverse geocoding, map search and browsing, and a real-world use case from the Earth Observation domain.

Datasets

Greek Administrative Geography Dataset (download)
CORINE Land Use/Land Cover Dataset (download)
LinkedGeoData Dataset (download)
GeoNames Dataset (download)
DBPedia Dataset (download)
Hotspots Dataset (download)

Micro Benchmark Queries

Macro Benchmark Queries

Macro Benchmark Detailed Results

In the manuscript [1] we report in detail the results of the micro benchmark and of the experiments on the synthetic workload. For the macro benchmark, in contrast, the manuscript reports only the average time needed for a complete iteration of all the queries of each scenario. To give complete information on the experimental results, we report here the average time needed to answer each query separately.

Response Times per query for the Reverse Geocoding scenario

Query	Strabon	uSeekM	Parliament
RG1	48.6 sec.	0.15 sec.	1.7 sec.
RG2	16.2 sec.	0.62 sec.	0.9 sec.

Response Times per query for the Map Search and Browsing scenario

Query	Strabon	uSeekM	Parliament
MSB1	0.06 sec.	0.5 sec.	0.2 sec.
MSB2	0.69 sec.	0.06 sec.	19.2 sec.
MSB3	0.07 sec.	0.04 sec.	2.8 sec.

Response Times per query for the Rapid Mapping for Fire Monitoring scenario

Query	Strabon	uSeekM	Parliament
RM1	5.3 sec.	196 sec.	93 sec.
RM2	0.8 sec.	19 sec.	3.4 sec.
RM3	1.6 sec.	2.5 sec.	8.3 sec.
RM4	61 sec.	126 sec.	509 sec.
RM5	3.7 sec.	93 sec.	407 sec.
RM6	135 sec.	> 1 hour	> 1 hour

The synthetic workload

In the second workload of Geographica we use a generator that produces synthetic data of various sizes and generates queries of varying thematic and spatial selectivity. In this way, we can perform the evaluation of geospatial RDF stores in a controlled environment.
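
As a rough illustration of what controlling spatial selectivity can mean, the sketch below builds a query rectangle that covers a chosen fraction of a rectangular extent. It is not the Geographica generator: the class and method names are hypothetical, and it assumes that features are spread uniformly over the extent, so that the fraction of the area covered approximates the fraction of features selected.

```java
import java.util.Locale;

// Hypothetical helper, not part of Geographica.
class SelectivityRectangle {

    // Returns the WKT of a rectangle centred in the extent
    // [minX, maxX] x [minY, maxY] whose area is `selectivity` times the
    // area of the extent. Under the uniformity assumption, a spatial
    // selection with this rectangle retrieves about that fraction of
    // the features.
    static String wktForSelectivity(double minX, double minY,
                                    double maxX, double maxY,
                                    double selectivity) {
        if (selectivity <= 0 || selectivity > 1) {
            throw new IllegalArgumentException("selectivity must be in (0, 1]");
        }
        // Scaling both sides by sqrt(s) scales the area by s.
        double scale = Math.sqrt(selectivity);
        double halfWidth = (maxX - minX) * scale / 2;
        double halfHeight = (maxY - minY) * scale / 2;
        double centerX = (minX + maxX) / 2;
        double centerY = (minY + maxY) / 2;
        double x1 = centerX - halfWidth;
        double y1 = centerY - halfHeight;
        double x2 = centerX + halfWidth;
        double y2 = centerY + halfHeight;
        // WKT polygons list the ring's first point again at the end.
        return String.format(Locale.ROOT,
                "POLYGON((%1$s %2$s, %3$s %2$s, %3$s %4$s, %1$s %4$s, %1$s %2$s))",
                x1, y1, x2, y2);
    }
}
```

For example, wktForSelectivity(0, 0, 10, 10, 0.25) yields a 5 x 5 square in the middle of a 10 x 10 extent, which could then be used as the constant geometry in a spatial selection query.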

Datasets

Synthetic Dataset (download)

Queries

Synthetic

Geographica source code

Geographica is an open source Java project that uses Apache Maven as its build automation tool. To test an RDF store, a user can either clone the Geographica Mercurial repository or use Geographica from a Maven repository. To get the source code of Geographica, execute the following command:
> hg clone http://hg.strabon.di.uoa.gr/Geographica
To use Geographica in a Maven project, include the following dependency in the project's pom.xml:
<dependency>
    <groupId>gr.uoa.di.rdf</groupId>
    <artifactId>runtime</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>
Also, the following repository should be added to the <repositories> section of the pom.xml:
<repository>
    <releases>
        <enabled>false</enabled>
    </releases>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
    <id>strabon.snapshot</id>
    <name>Strabon - maven repository - snapshots</name>
    <url>http://maven.strabon.di.uoa.gr/content/repositories/snapshots</url>
</repository>

For each workload, Geographica defines one or more experiments, and each experiment is implemented by a Java class (e.g., MicroNonTopologicalExperiment.java). Geographica also defines a Java interface (SystemUnderTest.java) that describes the functions a geospatial RDF store must offer in order to be tested, so a store has to implement this interface to be benchmarked with Geographica. For example, the Maven project strabon-geographica implements the interface SystemUnderTest for the geospatial RDF store Strabon and runs the experiment MicroNonTopologicalExperiment. The main functions defined by this interface are the following:

void initialize(): A function that initializes a geospatial RDF store.
void close(): A function that closes a geospatial RDF store.
void clearCaches(): A function that clears the memory caches of a geospatial RDF store.
long[] runQueryWithTimeOut(String query, int timeoutSecs): A function that evaluates a query against a geospatial RDF store. The parameter query is the query to be evaluated, and the parameter timeoutSecs is a time limit for evaluating it. If this time limit expires, the query evaluation should be aborted.
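
To illustrate how a store might plug into these functions, the sketch below restates the four functions listed above as a Java interface and implements them for a toy store. The interface is a simplified stand-in, not the actual SystemUnderTest.java, which may declare further methods and exceptions. The names SimpleSystemUnderTest and SleepingStore, the use of an ExecutorService to enforce the time limit, and the layout of the returned long[] (elapsed nanoseconds and number of results, with -1 marking a timeout) are all assumptions made for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified stand-in for Geographica's SystemUnderTest interface,
// restricted to the four functions described above (an assumption).
interface SimpleSystemUnderTest {
    void initialize();
    void close();
    void clearCaches();
    long[] runQueryWithTimeOut(String query, int timeoutSecs);
}

// Hypothetical store that "evaluates" a query by sleeping; a real
// implementation would forward the query to the RDF store instead.
class SleepingStore implements SimpleSystemUnderTest {
    private final long evaluationMillis;
    private ExecutorService executor;

    SleepingStore(long evaluationMillis) {
        this.evaluationMillis = evaluationMillis;
    }

    @Override
    public void initialize() {
        executor = Executors.newSingleThreadExecutor();
    }

    @Override
    public void close() {
        if (executor != null) {
            executor.shutdownNow();
        }
    }

    @Override
    public void clearCaches() {
        // Nothing is cached in this toy store.
    }

    @Override
    public long[] runQueryWithTimeOut(String query, int timeoutSecs) {
        Callable<Long> evaluation = () -> {
            Thread.sleep(evaluationMillis); // placeholder for query evaluation
            return 1L;                      // pretend one result was returned
        };
        long start = System.nanoTime();
        Future<Long> future = executor.submit(evaluation);
        try {
            long results = future.get(timeoutSecs, TimeUnit.SECONDS);
            return new long[] { System.nanoTime() - start, results };
        } catch (TimeoutException e) {
            future.cancel(true); // abort the evaluation once the limit expires
            return new long[] { -1L, -1L };
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A benchmark driver would then call initialize() once, call clearCaches() and runQueryWithTimeOut(...) for each measurement, and call close() at the end.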

Geographica is still a work in progress. We estimate that the first release of Geographica will be available early in July.

Examples

strabon-geographica (download)

Technical Details

We have performed experiments using Geographica for the following geospatial RDF stores:

Strabon (Version 3.2.8)
Parliament (Version 2.7.4)
uSeekM (Version 1.2.1)

The results of these experiments can be found in the manuscript [1] of this benchmark. For Strabon and uSeekM, which use PostGIS for evaluating geospatial queries, we used PostgreSQL v9.1.9 with PostGIS v2.0.0. We also tuned PostgreSQL to make better use of the system resources. Specifically, we edited the files postgresql.conf and sysctl.conf as follows.

postgresql.conf

# Based on 24680540 KB RAM in the server
wal_level = minimal                 # minimal, archive, or hot_standby
                                    # disable write ahead log when benchmarking
                                    # storing times

default_statistics_target = 10000
maintenance_work_mem = 1GB 
checkpoint_completion_target = 0.9 
effective_cache_size = 23GB          # physical ram-memory for system
work_mem = 15GB                      # memory for tmp,hashes etc per (sub)query
                                     # 23GB would be an extreme value for benchmarking
                                     # split memory between shared buffers
                                     # and work_mem appropriately

wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 15GB
max_connections = 48                 # 3*cores for read-only scenarios. Otherwise use 2*cores

geqo_threshold = 15
from_collapse_limit = 14
join_collapse_limit = 14

/etc/sysctl.conf

kernel.shmmax = 17179869184
kernel.shmmni = 4224
kernel.shmall = 17301504

fs.file-max = 1000000
fs.inotify.max_user_watches = 150000
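
The two files are related: to our understanding, PostgreSQL 9.1 allocates shared_buffers from System V shared memory, so kernel.shmmax (in bytes) has to exceed the shared_buffers setting. The short check below works through the arithmetic for the values above, assuming that PostgreSQL reads GB as 1024^3 bytes. The class and method names are hypothetical and serve only this example.

```java
// Hypothetical helper that checks kernel.shmmax against shared_buffers.
class SharedMemoryCheck {
    static final long GIB = 1024L * 1024L * 1024L;

    // Bytes left in shmmax after reserving sharedBuffersGiB for shared_buffers.
    static long headroomBytes(long shmmaxBytes, long sharedBuffersGiB) {
        return shmmaxBytes - sharedBuffersGiB * GIB;
    }

    public static void main(String[] args) {
        long shmmax = 17179869184L; // kernel.shmmax from /etc/sysctl.conf above
        long sharedBuffersGiB = 15; // shared_buffers from postgresql.conf above
        System.out.println("shmmax in GiB: " + shmmax / GIB);
        System.out.println("headroom in GiB: "
                + headroomBytes(shmmax, sharedBuffersGiB) / GIB);
    }
}
```

With the values above, kernel.shmmax amounts to 16 GiB, which leaves 1 GiB of headroom over the 15 GB of shared_buffers.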

1. G. Garbis, K. Kyzirakos, M. Koubarakis. Geographica: A Benchmark for Geospatial RDF Stores. In the 12th International Semantic Web Conference (ISWC 2013). Sydney, Australia, October 21-25, 2013 [pdf]