QueryVis - Workshop on Innovative Corpus Query and Visualization Tools

at Nodalida 2015, Vilnius (Lithuania), May 11, 2015

Recent years have seen increased interest in, and availability of, many different kinds of corpora. These range from small but carefully annotated treebanks to large parallel corpora and very large monolingual corpora for big data research. It remains a challenge to query the multilayer annotations of small corpora, to access large corpora efficiently, and to visualize the query results.

When dealing with large corpora, query tools need to scale in terms of processing speed and reporting through statistical information and visualization options. This becomes evident, for example, when dealing with very large corpora (such as complete Wikipedia corpora) or multi-parallel corpora (such as Europarl or JRC Acquis). The goal of the workshop is to gather researchers who develop or evaluate new corpus query and visualization tools for linguistics, language technology or related disciplines.

QueryVis Workshop Program

The proceedings have been published online at Linköping University Electronic Press.

13:30h to 13:45h Martin Volk Introduction to the Workshop
  Session 1 (Chair: Andrius Utka)
13:45h to 14:15h Lucia Kocincová, Vít Baisa, Miloš Jakubíček and Vojtěch Kovář Interactive Visualizations of Corpus Data in Sketch Engine
14:15h to 14:45h Michał Kosek, Anders Nøklestad, Joel Priestley, Kristin Hagen and Janne Bondi Johannessen Visualisation in speech corpora: maps and waves in the Glossa system
30 min Coffee break
  Session 2 (Chair: Simon Clematide)
15:15h to 16:00h Marc Kupietz (Institut für Deutsche Sprache, Mannheim) Invited Talk: Scaling out corpus technology: the open source query and analysis engine KorAP
16:00h to 16:30h Joachim Bingel and Nils Diewald KoralQuery - A General Corpus Query Protocol
15 min Break
  Session 3 (Chair: Johannes Graën)
16:45h to 17:15h Ruprecht von Waldenfels ParaViz: A vizualization tool for crosslinguistic functional comparisons based on a parallel corpus
17:15h to 17:45h Simon Clematide Reflections and Proposals for a Query and Reporting Language for Richly Annotated Multiparallel Corpora
17:45h to 18:00h Gintarė Grigonytė Closing Session

Invited Talk

Scaling out corpus technology: the open source query and analysis engine KorAP

Marc Kupietz, Institut für Deutsche Sprache, Mannheim

Abstract: With the growing importance of empiricism and a rapidly growing amount of research data, progress in linguistic research nowadays requires increasingly sophisticated and methodologically sound technical infrastructure, far beyond what typical university computing centres or typical research projects can deliver. Unfortunately, funding conditions in linguistics are still not as well adapted to this circumstance as in more established data-intensive research fields. Even large-scale e-infrastructure initiatives like CLARIN have provided a solid basis of standards and best practices, but nothing that comes close to a sufficiently general tool for corpus-based research. The talk will introduce KorAP, an open-source corpus analysis platform, mainly developed at the Institut für Deutsche Sprache. It will sketch KorAP's background, how it deals with current and upcoming scientific and technological challenges, how it tries to achieve long-term sustainability despite the aforementioned constraints, and how it tries to contribute to progress in linguistic research.

Gintarė Grigonytė (Stockholm University)