As a first step, the "Text+Berg digital" project will digitize two publication series of the Swiss Alpine Club (SAC), published continuously since 1864, and prepare them as a linguistic corpus. The two series, the "Yearbook of the S.A.C." (1864-1923) and the "Alps" (1925-today), form a valuable collection of reports, essays and reflections on alpinism. Because the series have appeared without interruption over such a long period, they provide a unique textual basis for answering historical, cultural and linguistic questions.
The computational-linguistic interest in the corpus lies both in the preparation of the corpus itself (automatic part-of-speech tagging, recognition of proper names and place names, etc.) and in the analysis of the linguistic data, for example to refine language models. Since the publications contain not only German but also French and Italian texts, it makes sense to build a "comparable corpus" for multilingual research questions.
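One way to picture the "comparable corpus" idea is to group articles by volume year and language, so that texts on similar subject matter from the same period can be set side by side. The following is only a minimal sketch under stated assumptions: the `Article` record, its fields, the language codes, and the sample titles are all hypothetical and are not taken from the project's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """One article record; all fields are illustrative assumptions."""
    year: int
    language: str  # assumed codes: "de", "fr", "it"
    title: str


def group_comparable(articles):
    """Group articles by year, then by language.

    Only years covered by at least two languages are kept, because a
    single-language year offers nothing to compare against.
    """
    by_year = defaultdict(lambda: defaultdict(list))
    for article in articles:
        by_year[article.year][article.language].append(article)
    return {
        year: dict(langs)
        for year, langs in by_year.items()
        if len(langs) >= 2
    }


# Hypothetical sample data, invented purely for illustration.
sample = [
    Article(1957, "de", "Hypothetical German report"),
    Article(1957, "fr", "Hypothetical French report"),
    Article(1958, "it", "Hypothetical Italian report"),
]
comparable = group_comparable(sample)
```

Grouping by year is just one possible alignment criterion; a real comparable corpus might instead align by topic, region, or article type.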