Guiding Theme B1: Structured summaries of complex contents

Guiding theme B1 addresses structured summaries, which provide an overview of and access to documents covering a specific facet (e.g., a certain topic, person, or event). Structured summaries are needed because condensing multiple documents into a brief running text inevitably cuts interesting details of individual documents. Users of an automatic system providing structured summaries should be able to explore “Big Data”, i.e., a document collection, using concept maps or a hierarchically ordered “table of contents” across all documents. They should also receive detailed information on the specifics of each document (e.g., as a list of keywords or short text snippets).

Example Ph.D. project

A Ph.D. project that primarily follows this guiding theme is expected to conduct innovative research on creating hierarchical multi-document summaries. This research will use state-of-the-art neural summarization techniques to identify important information, cluster it according to subtopics, argumentative strands, opinions, or time, and establish hierarchical relationships that allow users to drill down from a general overview to specific details. A major challenge is the generation of hierarchically structured summaries for a heterogeneous document collection covering different domains, genres, or text types. The Ph.D. project can build upon previous work on hierarchical summarization datasets (Tauchmann et al., 2018), hierarchical summarization (Christensen et al., 2014), and automatic table-of-contents generation (Erbs et al., 2013).
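The drill-down idea can be illustrated with a minimal sketch of a summary tree. This is only an illustration under stated assumptions, not the method of any cited work: the class name `SummaryNode`, its fields, and the `outline` method are hypothetical, and the clustering into subtopics is assumed to have happened already.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryNode:
    """One node of a hypothetical hierarchical summary (illustrative only)."""
    title: str
    # Short text snippets from the documents grouped under this node.
    snippets: list[str] = field(default_factory=list)
    # More specific subtopics below this node.
    children: list["SummaryNode"] = field(default_factory=list)

    def outline(self, max_depth: int, depth: int = 0) -> list[str]:
        """Return indented titles down to max_depth (0 = this node only)."""
        lines = ["  " * depth + self.title]
        if depth < max_depth:
            for child in self.children:
                lines.extend(child.outline(max_depth, depth + 1))
        return lines


# Toy hierarchy: a general overview with two subtopics.
root = SummaryNode(
    title="Topic",
    children=[
        SummaryNode(title="Subtopic A", snippets=["snippet from document 1"]),
        SummaryNode(title="Subtopic B", snippets=["snippet from document 2"]),
    ],
)

overview = root.outline(max_depth=0)  # general overview only
detailed = root.outline(max_depth=1)  # one level of drill-down
print("\n".join(detailed))
```

Raising `max_depth` corresponds to a user drilling down from the overview to more specific levels; the snippets at each node stand in for the document-specific details.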

The applicant will work closely together with researchers studying discourse graphs (area A: graph-based discourse processing) and research data (area D: Criteria and methods for quality assessment of heterogeneous sources and dossiers). She or he will also collaborate with the other two guiding themes of research area B (Natural Language Processing for multi-document summarization) in order to consolidate the summarization-focused research activities of AIPHES.

Research results of the first Ph.D. cohort

The research in guiding theme B1 aims at advancing the state of the art in generating navigation structures as a supporting tool for humans who explore large, heterogeneous document collections. Given the constantly growing number of electronically available documents, there is a strong need for tools that help handle them efficiently. As navigation structures, we use structured summaries in the form of concept maps. A concept map is a graph showing concepts as nodes and the relations among them as edges, with both kinds of labels extracted from text. Concept maps have been shown to be useful for many applications, including the summarization and structuring of documents.
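To make the definition concrete, a concept map can be sketched as a labeled directed graph. The following is a minimal illustration, assuming a simple triple representation; the names `ConceptMap`, `add_proposition`, and `neighbors` are hypothetical and do not come from the project's software.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """A labeled directed graph: concepts are nodes, relations label the edges."""
    concepts: set[str] = field(default_factory=set)
    # Each proposition is a (source concept, relation label, target concept) triple.
    propositions: list[tuple[str, str, str]] = field(default_factory=list)

    def add_proposition(self, source: str, relation: str, target: str) -> None:
        """Add an edge and register both of its endpoints as concepts."""
        self.concepts.update((source, target))
        self.propositions.append((source, relation, target))

    def neighbors(self, concept: str) -> list[tuple[str, str]]:
        """Return (relation, target) pairs for the edges leaving `concept`."""
        return [(rel, tgt) for src, rel, tgt in self.propositions if src == concept]


# Toy example with labels as they might be extracted from text.
cmap = ConceptMap()
cmap.add_proposition("concept maps", "are used for", "summarization")
cmap.add_proposition("concept maps", "consist of", "concepts and relations")
print(cmap.neighbors("concept maps"))
```

Storing edges as triples keeps both kinds of labels (concept and relation) as plain text, which matches the idea that they are extracted from the source documents.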

In particular, we have studied concept-map-based semantic navigation structures (Nadolskyy, 2016), evaluated multiple semantic parsing approaches in order to obtain suitable predicate-argument analyses for concept maps (Falke and Gurevych, 2017b) and developed a German version of the information extraction system PropS (Falke et al., 2016) to enable research on concept-map-based summarization in both English and German. In close cooperation with research area D, we created a large benchmark dataset of concept maps for the educational domain (Falke and Gurevych, 2017a). This work has been distinguished with the “best resource/application paper award” at EMNLP 2017.

Based on this corpus, we evaluate methods for automatic concept map construction by incorporating concept co-reference resolution and predicate-argument-based event structures (Falke et al., 2017) as well as insights from the guiding themes A1 and A2. In our current work, we research end-to-end deep learning methods for automatic concept map construction (with C3) and improve concept maps based on user interaction and feedback (with D2).

People

PI: Prof. Dr. Iryna Gurevych
PhD student(s): Tobias Falke

References

Janara Christensen, Stephen Soderland, Gagan Bansal, and Mausam. (2014). Hierarchical summarization: Scaling up multi-document summarization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 902–912, Baltimore, MD, USA.
Nicolai Erbs, Iryna Gurevych, and Torsten Zesch. (2013). Hierarchy Identification for Automatically Generating Table-of-Contents. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), pages 252–260, Hissar, Bulgaria.

Publications

Zopf, Markus ; Botschen, Teresa ; Falke, Tobias ; Heinzerling, Benjamin ; Marasovic, Ana ; Mihaylov, Todor ; P. V. S., Avinesh ; Loza Mencía, Eneldo ; Fürnkranz, Johannes ; Frank, Anette (2018):
What's Important in a Text? An Extensive Evaluation of Linguistic Annotations for Summarization.
pp. 272-277, [Conference publication]

Falke, Tobias ; Meyer, Christian M. ; Gurevych, Iryna (2017):
Concept-Map-Based Multi-Document Summarization using Concept Coreference Resolution and Global Importance Optimization.
In: Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP), pp. 801-811,
Taipei, Taiwan, [Conference publication]

Falke, Tobias ; Gurevych, Iryna (2017):
Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps.
In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2951-2961,
Copenhagen, Denmark, [Conference publication]

Falke, Tobias ; Gurevych, Iryna (2017):
GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Exploration Techniques.
In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP): System Demonstrations, pp. 19-24,
Copenhagen, Denmark, [Conference publication]

Falke, Tobias ; Gurevych, Iryna (2017):
Utilizing Automatic Predicate-Argument Analysis for Concept Map Mining.
In: Proceedings of the 12th International Conference on Computational Semantics (IWCS 2017), Volume 2: Short papers,
Association for Computational Linguistics, Montpellier, France, [Conference publication]

Falke, Tobias ; Stanovsky, Gabriel ; Gurevych, Iryna ; Dagan, Ido (2016):
Porting an Open Information Extraction System from English to German.
In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 892-898,
Association for Computational Linguistics, Austin, TX, USA, [Conference publication]


This website has been archived and is no longer updated.
