Department of Informatics – DDIS

Dynamic and Distributed Information Systems Group

Three DDIS Papers Accepted for the SSWS Workshop at ISWC 2013 in Sydney

12 August 2013 | Patrick Minder | No comments |

Three DDIS papers were recently accepted for the 9th International Workshop on Scalable Semantic Web Knowledge Base Systems at ISWC 2013 in Sydney. Find the abstracts of the three papers by Minh Khoa Nguyen, Lorenz Fischer, Dr. Thomas Scharrenbach, Philip Stutz, Mihaela Verman, and Prof. Abraham Bernstein in this blog post.

Abstract: Network-Aware Workload Scheduling: Scalable Linked Data Stream Processing

Lorenz Fischer, Thomas Scharrenbach, Abraham Bernstein

In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach, however, raises the question of how the data and the workload should be partitioned and distributed. A uniform scheduling strategy (a uniform distribution of the computation load among available machines), as typically used by stream processing systems, disregards network load as one of the major bottlenecks for throughput, resulting in an immense amount of inter-machine communication.

In this paper we propose a graph-partitioning-based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm real-time computation framework and evaluated its communication behavior using two real-world datasets. We show that the application of graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even when using very limited data statistics. We also find that processing RDF data as single triples at a time, rather than as graph fragments (containing multiple triples), may decrease throughput, indicating the usefulness of semantics.
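The core idea, co-locating operator instances that exchange many messages, can be illustrated with a minimal greedy partitioner over the communication graph. This is an illustrative sketch only, not the scheduler from the paper (which applies established graph partitioning algorithms); all names here are hypothetical, and edge weights stand in for measured message volumes:

```python
from collections import defaultdict

def greedy_partition(vertices, edges, k):
    """Assign each vertex (operator instance) to one of k machines,
    preferring the machine where its neighbors already live, under a
    simple balance cap. Edge weights model inter-instance traffic."""
    cap = len(vertices) / k + 1           # crude per-machine limit
    load = [0] * k
    assign = {}
    adj = defaultdict(list)
    for (u, v, w) in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # place the most heavily communicating vertices first
    order = sorted(vertices, key=lambda v: -sum(w for _, w in adj[v]))
    for v in order:
        gain = [0.0] * k                  # traffic kept local per machine
        for (u, w) in adj[v]:
            if u in assign:
                gain[assign[u]] += w
        best = max((m for m in range(k) if load[m] < cap),
                   key=lambda m: gain[m])
        assign[v] = best
        load[best] += 1
    return assign

def cut_weight(assign, edges):
    """Total inter-machine traffic under an assignment."""
    return sum(w for (u, v, w) in edges if assign[u] != assign[v])
```

On a toy graph with two densely connected clusters, the greedy assignment keeps more traffic local than a round-robin (uniform) assignment, which mirrors the contrast the paper evaluates at scale.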

 

Abstract: Eviction Strategies for Semantic Flow Processing

Minh Khoa Nguyen, Thomas Scharrenbach, and Abraham Bernstein

In order to cope with the ever-increasing data volume, Semantic Flow Processing systems have been proposed for the continuous processing of incoming data. These systems answer queries over streams of RDF triples. To achieve this goal, they match (triple) patterns against the incoming stream and generate/update variable bindings. Yet, given the continuous nature of the stream, the number of bindings can explode and exceed the available memory, in particular when computing aggregates. To keep information processing practical, Semantic Flow Processing systems therefore typically limit the considered data to a (moving) window. Whilst this technique is simple, it may fail to find patterns spread wider than the window, or may still cause memory overruns when the data is highly bursty.

In this paper we propose to retain bindings (and thus manage memory) based not on recency (i.e., a window) but on their likelihood of contributing to a complete match. We propose to base the eviction decision on this matching likelihood rather than on creation time (FIFO) or at random. Furthermore, we propose to drop variable bindings instead of data, as load shedding approaches do. Specifically, we systematically investigate deterministic and matching-likelihood-based probabilistic eviction strategies for dropping variable bindings in terms of recall. We find that matching-likelihood-based eviction outperforms FIFO and random eviction strategies on synthetic as well as real-world data.
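The contrast between the eviction strategies can be sketched with a bounded store of partial bindings. This is an illustrative toy, not the authors' implementation; the per-binding `score` is a hypothetical stand-in for whatever matching-likelihood estimate the system derives (e.g. from pattern selectivity statistics):

```python
import random

class BindingStore:
    """Bounded store of partial variable bindings. When full, evict
    according to a policy: 'fifo' (oldest first), 'random', or
    'likelihood' (drop the binding least likely to complete a match)."""

    def __init__(self, capacity, policy="likelihood", seed=0):
        self.capacity = capacity
        self.policy = policy
        self.items = []               # list of (binding, score) pairs
        self.rng = random.Random(seed)

    def add(self, binding, score):
        if len(self.items) >= self.capacity:
            self._evict()
        self.items.append((binding, score))

    def _evict(self):
        if self.policy == "fifo":
            idx = 0                   # oldest binding
        elif self.policy == "random":
            idx = self.rng.randrange(len(self.items))
        else:                         # likelihood-based eviction
            idx = min(range(len(self.items)),
                      key=lambda i: self.items[i][1])
        del self.items[idx]
```

Under memory pressure, the likelihood policy keeps the most promising bindings alive, whereas FIFO discards the oldest regardless of how close it is to a complete match, which is the behavioral difference the paper measures in terms of recall.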

 

Abstract: TripleRush: A Fast and Scalable Triple Store 

Philip Stutz, Mihaela Verman, Lorenz Fischer, and Abraham Bernstein

TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that quickly answer queries over large-scale graph data. To that end it leverages a novel, graph-based architecture.

Specifically, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph where each index vertex corresponds to a triple pattern. Partially matched copies of a query are routed in parallel along different paths of this index structure.

We show experimentally that TripleRush takes less than a third of the time to answer queries compared to the fastest of three state-of-the-art triple stores, when measuring time as the geometric mean of all queries for two benchmarks.

On individual queries, TripleRush is up to three orders of magnitude faster than other triple stores.
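The routing idea, partially matched query copies being extended one triple pattern at a time, can be illustrated with a tiny sequential sketch. TripleRush itself routes such copies in parallel through a distributed index graph in Signal/Collect; this toy does not attempt that, and all names here are hypothetical:

```python
def match_pattern(pattern, triples, binding):
    """Yield extended bindings for one triple pattern (terms starting
    with '?' are variables) against a set of (s, p, o) triples."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.get(term, value) != value:
                    ok = False
                    break
                new[term] = value     # bind or confirm the variable
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def answer(query, triples):
    """Extend partially matched query copies pattern by pattern,
    fanning out one copy per successful match."""
    copies = [{}]                     # start with one unbound copy
    for pattern in query:
        copies = [b for c in copies
                  for b in match_pattern(pattern, triples, c)]
    return copies
```

Each intermediate binding corresponds to a partially matched query copy; in TripleRush these copies travel independently along different paths of the index graph, which is what enables the parallelism behind the reported speedups.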

 

Filed under: Complex-Event Processing, Semantic Web