A seminar by Prof. Antoine Doucet: Multilingual news surveillance in real-time.

Time: 14:00 on Feb 21, 2017

Place: room 816, Ta Quang Buu Library, HUST

Presenter: Prof. Antoine Doucet, Full Professor at the L3i laboratory of the University of La Rochelle, and Director of the ICT department at the French-Vietnam University of Science and Technology of Hanoi (USTH).

Title: Multilingual news surveillance in real-time

Abstract:

In the age of open and big data, the task of automatically analysing numerous media in various formats and multiple languages has become more important than ever. The ability to quickly and efficiently analyse the massive number of documents on the Internet is crucial for domains such as business intelligence and disaster management. With hundreds of thousands of articles published every day, the online press represents a heterogeneous source of great importance.

This talk will present a new approach that is able to detect events from news in any language. By relying on the journalistic genre rather than on linguistic analysis, it is able, for instance, to find which epidemic diseases are active and where, whatever the language of the source. The resulting system, DaNIEL, has so far been evaluated over 40 languages, and it has been shown that on average it finds epidemic events faster than human experts. The paper describing the approach was recently awarded the title of “best paper in 2015” by the International Medical Informatics Association (out of 1,272 entries in PubMed and Web of Science) in the field of “Public Health and Epidemiology Informatics”, as selected in the Yearbook of Medical Informatics.

We believe that the approach may be extended to further domains, and current experiments aim to apply it to social media and to the management of natural disasters.

More about the speaker: Antoine Doucet has been a tenured Full Professor at the L3i laboratory of the University of La Rochelle since 2014. Director of the ICT department at the French-Vietnam University of Science and Technology of Hanoi (USTH), he leads the digital document and contents research group of the L3i laboratory (about 40 people). His main research interests lie in the fields of information retrieval (structured and semi-structured) and natural language processing, in particular the extraction and use of multi-word units. The central focus of his work is the development of methods that scale to very large document collections and that do not require prior knowledge of the data (in particular, techniques that work for documents written in any language). Antoine Doucet obtained a PhD in computer science from the University of Helsinki (Finland) in 2005, and has held a French habilitation (HDR) since 2012.