WIKIPEDIA REVISION TOOLKIT: EFFICIENTLY ACCESSING WIKIPEDIA’S EDIT HISTORY

Authors

  • Oliver Ferschke
  • Torsten Zesch
  • Iryna Gurevych

Abstract

We present an open-source toolkit that allows users to (i) reconstruct past states of Wikipedia and (ii) efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of data involved. By using a dedicated storage format, our toolkit reduces the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface for accessing the revision data. The language-independent design allows the toolkit to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research that makes use of the knowledge encoded in Wikipedia’s edit history.
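The abstract names two operations: reconstructing an article as it looked at a past date, and storing the full edit history compactly. The sketch below illustrates one common way to get both, assuming a scheme in which each revision is stored as a diff against its predecessor, with a full snapshot at a fixed interval. It is not the toolkit's actual storage format or API; `make_delta`, `apply_delta`, `RevisionStore`, `as_of`, and `checkpoint_interval` are hypothetical names, and the sample revisions are invented.

```python
from bisect import bisect_right
from datetime import date
from difflib import SequenceMatcher


def make_delta(prev, curr):
    """Encode `curr` as operations on `prev`: ("copy", i, j) or ("add", text)."""
    ops = []
    matcher = SequenceMatcher(None, prev, curr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))       # reuse prev[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("add", curr[j1:j2]))   # store only the new text
        # "delete": the removed span is simply not copied
    return ops


def apply_delta(prev, ops):
    """Rebuild the newer text from `prev` and a delta produced by make_delta."""
    parts = []
    for op in ops:
        parts.append(prev[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(parts)


class RevisionStore:
    """Hypothetical per-article store: a full snapshot every
    `checkpoint_interval` revisions, deltas in between."""

    def __init__(self, checkpoint_interval=10):
        self.interval = checkpoint_interval
        self.timestamps = []   # kept in chronological order
        self.entries = []      # ("full", text) or ("delta", ops)
        self._last_text = None

    def append(self, timestamp, text):
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("revisions must be appended in chronological order")
        if len(self.entries) % self.interval == 0:
            self.entries.append(("full", text))
        else:
            self.entries.append(("delta", make_delta(self._last_text, text)))
        self.timestamps.append(timestamp)
        self._last_text = text

    def get(self, k):
        """Reconstruct revision k starting from the nearest earlier snapshot."""
        if not 0 <= k < len(self.entries):
            raise IndexError(k)
        start = k - k % self.interval          # index of a full snapshot
        text = self.entries[start][1]
        for _, ops in self.entries[start + 1:k + 1]:
            text = apply_delta(text, ops)
        return text

    def as_of(self, when):
        """Text of the latest revision at or before `when`, or None."""
        k = bisect_right(self.timestamps, when) - 1
        return self.get(k) if k >= 0 else None


history = [
    (date(2005, 3, 1), "Berlin is a city."),
    (date(2006, 7, 9), "Berlin is the capital city."),
    (date(2008, 2, 2), "Berlin is the capital of Germany."),
    (date(2009, 11, 4), "Berlin is the capital and largest city of Germany."),
    (date(2011, 5, 20), "Berlin is the capital of Germany."),
]
store = RevisionStore(checkpoint_interval=3)
for ts, text in history:
    store.append(ts, text)

print(store.as_of(date(2010, 1, 1)))
# prints: Berlin is the capital and largest city of Germany.
```

In a scheme like this, the checkpoint interval trades space for speed: a larger interval stores fewer full texts, but reconstructing a revision may require applying up to `checkpoint_interval - 1` deltas. Because consecutive revisions of an article usually differ only slightly, most deltas consist of `copy` ranges plus a little new text, which is where the space saving of diff-based storage comes from.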

Author Biographies

  • Oliver Ferschke
    Head of HR Marketing and Employer Branding; Adjunct Professor for Marketing; Head of Dialogue Marketing, BMW.
  • Torsten Zesch

    Analyzes Wikipedia as a lexical semantic resource and compares it with conventional resources such as dictionaries, thesauri, semantic wordnets, etc. Different parts of Wikipedia reflect different aspects of these resources.

  • Iryna Gurevych

    A German computer scientist, she is a Professor in the Department of Computer Science at the Technical University of Darmstadt and Director of the Ubiquitous Knowledge Processing Lab.

Published

2012-04-25

Section

SECTION 1

How to Cite

WIKIPEDIA REVISION TOOLKIT: EFFICIENTLY ACCESSING WIKIPEDIA’S EDIT HISTORY. (2012). Modern Information Technologies and Innovation Methodologies of Education in Professional Training Methodology Theory Experience Problems, 30, 76-82. https://vspu.net/sit/index.php/sit/article/view/3783