CENTRE@CLEF 2018 - CLEF/NTCIR/TREC Reproducibility

The goal of CENTRE@CLEF 2018 is to run a joint CLEF/NTCIR/TREC task that challenges participants: 1) to reproduce the best results of the best/most interesting systems from previous editions of CLEF/NTCIR/TREC by using standard open source IR systems; 2) to contribute back to the community the additional components and resources developed to reproduce the results, in order to improve existing open source systems.
  • Task 1 - Replicability : replicability of selected methods on the same experimental collections.
  • Task 2 - Reproducibility : reproducibility of selected methods on different experimental collections.
  • Task 3 - Re-reproducibility : using the components developed in T1 and T2 and made available by the other participants to replicate/reproduce their results.
  • Lab Coordination : Nicola Ferro (University of Padua), Tetsuya Sakai (Waseda University), Ian Soboroff (NIST)
  • Lab website:

CheckThat! - Automatic Identification and Verification of Political Claims

CheckThat! aims to foster the development of technology capable of both spotting and verifying check-worthy claims in political debates in English and Arabic.
  • Task 1 - Check-Worthiness: Given a political debate, which is segmented into sentences with speakers annotated, identify which statements (claims) should be prioritized for fact-checking. This will be a ranking problem, and systems will be asked to produce a score, according to which the ranking will be performed.
  • Task 2 - Factuality: Given a list of already-extracted claims, classify them with factuality labels (e.g., true, half-true, false). This task will be run in an open mode. We will not provide any pre-selected set of documents to support the veracity labels. Participants will be free to use whatever resources they have and the Web in general, with the exception of the websites used by the organizers to collect the data.
  • Lab Coordination: Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño (Qatar Computing Research Institute), Wajdi Zaghouani (Carnegie Mellon University Qatar), Tamer Elsayed, Reem Suwaileh (Qatar University), Pepa Gencheva (Sofia University)
  • Lab website:

Dynamic Search for Complex Tasks

The primary aim of the CLEF Dynamic Search Lab is to develop algorithms which interact dynamically with users (or other algorithms) towards solving a task, and evaluation methodologies to quantify their effectiveness. The lab is organized around two tasks:
  • Task 1 - Query Suggestion: Given a verbose topic description, participants will generate and submit a sequence of queries and a ranking of the collection for each query. Queries will be evaluated on their effectiveness (query agent) and/or resemblance to user queries (user simulation). Query suggestion will be performed iteratively.
  • Task 2 - Result Composition: Given the results obtained from the aforementioned queries, produce a single ranked list by merging the individual rankings.
  • Lab Coordination: Evangelos Kanoulas (University of Amsterdam), Leif Azzopardi (University of Strathclyde)
  • Lab website:
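
As an illustration of what Task 2 (Result Composition) asks for, the sketch below merges several per-query rankings into one list. It uses reciprocal rank fusion, a common rank-merging heuristic chosen here purely as an example; it is not the method prescribed by the lab. The function name, the constant k=60 and the document IDs are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into a single ranked list.

    Each document receives 1 / (k + rank) from every list it appears in,
    and documents are ordered by the sum of these contributions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID
    # so that the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings returned for three successive queries on one topic.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d4"],
    ["d2", "d1", "d5"],
]
merged = reciprocal_rank_fusion(rankings)
print(merged)
```

Documents that appear near the top of several rankings (here d2) rise to the top of the merged list, while documents seen only once at a low rank fall to the bottom.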

eRISK - Early risk prediction on the Internet

eRisk explores the evaluation methodology, effectiveness metrics and practical applications (particularly those related to health and safety) of early risk detection on the Internet.
  • Task 1 - Early Detection of Signs of Depression : the challenge consists of sequentially processing pieces of evidence (Social Media entries) and detecting early traces of depression as soon as possible.
  • Task 2 - Early Detection of Signs of Anorexia: the challenge consists of sequentially processing pieces of evidence (Social Media entries) and detecting early traces of anorexia as soon as possible. Both tasks are mainly concerned with evaluating Text Mining solutions and, thus, we concentrate on texts written in Social Media. Texts should be processed in the order they were posted. In this way, systems that effectively perform this task could be applied to sequentially monitor user interactions in blogs, social networks, or other types of online media.
  • Lab Coordination: David E. Losada (University of Santiago de Compostela), Fabio Crestani (University of Lugano), Javier Parapar (University of A Coruña)
  • Lab website:
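
The sequential setting described in both eRisk tasks can be sketched as a loop that reads posts in posting order and commits to a decision as soon as enough evidence has accumulated. This is only a minimal sketch under stated assumptions: the function names, the cumulative-threshold rule and the toy scoring function are hypothetical and are not part of the lab's evaluation protocol.

```python
from typing import Callable, Iterable, Tuple


def early_detect(
    posts: Iterable[str],
    score_post: Callable[[str], float],
    threshold: float,
) -> Tuple[bool, int]:
    """Process posts in posting order and stop at the first confident decision.

    Returns (flagged, posts_read). A smaller posts_read means an earlier
    decision, which is what an early-detection setting rewards.
    """
    evidence = 0.0
    posts_read = 0
    for post in posts:
        posts_read += 1
        evidence += score_post(post)
        if evidence >= threshold:
            # Decide now, without looking at any later posts.
            return True, posts_read
    return False, posts_read


def toy_score(post: str) -> float:
    """Hypothetical scorer: counts cue words. A real system would use a trained model."""
    cues = {"hopeless", "tired"}
    return float(sum(word in cues for word in post.lower().split()))


# Hypothetical user history, oldest post first.
posts = [
    "Went hiking today",
    "so tired lately",
    "feeling hopeless and tired",
    "new job next week",
]
decision = early_detect(posts, toy_score, threshold=3.0)
print(decision)
```

In this toy run the decision is emitted after the third post, so the fourth post is never read. Choosing the threshold trades earliness against the risk of false alarms.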

CLEF eHealth

Medical content is available electronically in a variety of forms ranging from patient records and medical dossiers, scientific publications and health-related websites to medical-related topics shared across social networks. This lab aims to support the development of techniques to aid laypeople, clinicians and policy-makers in easily retrieving and making sense of medical content to support their decision making.
  • Task 1 - Multilingual Information Extraction : Participants will be required to extract the causes of death from death certificates, authored by physicians in European languages. This can be seen as a named entity recognition, normalization, and/or text classification task.
  • Task 2 - Technologically Assisted Reviews in Empirical Medicine : Participants will be challenged to retrieve medical studies relevant to conducting a systematic review on a given topic. This can be seen as a total recall problem and is addressed by both query generation and document ranking.
  • Task 3 - Patient-centred Information Retrieval : Participants must retrieve web pages that fulfil a given patient’s personalised information need. This needs to fulfil the following criteria: information reliability, quality, and suitability. The task also has a multilingual querying track.
  • The tasks are open for everybody. We particularly welcome academic and industrial researchers, scientists, engineers and graduate students in natural language processing, machine learning and biomedical/health informatics to participate. We also encourage participation by multidisciplinary teams that combine technological skills with clinical expertise.
  • Lab website:
  • Lab coordination: Leif Azzopardi (Univ. of Strathclyde), Lorraine Goeuriot (Univ. J.Fourier), Evangelos Kanoulas (Univ. of Amsterdam), Liadh Kelly (Maynooth University), Aurélie Névéol (CNRS-LIMSI), Joao Palotti (Vienna Univ.), Aude Robert (INSERM/CepiDC), Rene Spijker (Cochrane), Hanna Suominen (Australian National Univ.), Guido Zuccon (Queensland Univ. of Technology)

ImageCLEF - Multimedia Retrieval in CLEF

The lab provides an evaluation forum for the language-independent annotation and retrieval of images, a domain for which tools are far less advanced than for text analysis and retrieval.
  • Task 1 - ImageCLEFlifelog : An increasingly wide range of personal devices that can capture pictures, videos, and audio clips at every moment of our lives, such as smartphones, video cameras and wearable devices, is becoming available. The task addresses the problems of lifelogging data understanding, summarization and retrieval.
  • Task 2 - ImageCLEFcaption : Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines. The task addresses the problem of bio-medical image concept detection and caption prediction from large amounts of training data.
  • Task 3 - ImageCLEFtuberculosis : The objective of this task is to determine tuberculosis subtypes and drug resistances, as far as possible automatically, from the volumetric image information in computed tomography (CT) volumes (mainly texture analysis) and based on clinical information (e.g., age, gender).
  • Task 4 - VisualQuestionAnswering : With the ongoing drive for improved patient engagement and access to electronic medical records via patient portals, patients can now review structured and unstructured data associated with their healthcare utilization, ranging from labs and images to text reports. Given a medical image accompanied by a set of clinically relevant questions, participating systems are tasked with answering the questions based on the visual image content.
  • Lab Coordination : Bogdan Ionescu (University Politehnica of Bucharest), Mauricio Villegas (SearchInk), Henning Müller (HES-SO)
  • Lab website :


LifeCLEF

The LifeCLEF lab aims at boosting research on the identification of living organisms and on the production of biodiversity data. Through its biodiversity informatics related challenges, LifeCLEF is intended to push the boundaries of the state-of-the-art in several research directions at the frontier of multimedia information retrieval, machine learning and knowledge engineering. The lab is organized around three tasks:
  • Task 1 - GeoLifeCLEF : location-based species recommendation.
  • Task 2 - BirdCLEF : bird species identification from bird calls and songs.
  • Task 3 - ExpertLifeCLEF : experts vs. machines identification quality.
  • Lab Coordination : Alexis Joly (INRIA, LIRMM), Henning Müller (HES-SO), Pierre Bonnet (CIRAD, AMAP), Hervé Goëau (CIRAD, AMAP), Hervé Glotin (University of Toulon, LSIS CNRS), Simone Palazzo (University of Catania), Willem-Pier Vellinga (Xeno-Canto)
  • Lab website:

MC2 - Multilingual Cultural Mining and Retrieval

The lab develops processing methods and resources to mine the social media sphere surrounding cultural events such as festivals. This requires dealing with almost all languages and dialects as well as informal expressions. There are three tasks:
  • Task 1 - Cross Language Cultural Retrieval over MicroBlogs : a) Small Microblogs Multilingual Information Retrieval in Arabic, English, French and Latin languages; b) Microblogs Bilingual Information Retrieval for tuning systems running on language pairs; c) Microblog Monolingual Information Retrieval based on 2017 language identification.
  • Task 2 - Mining Opinion Argumentation : a) Polarity detection in microblogs; b) Automatic identification of argumentation elements over Microblogs and WikiPedia; c) Classification and summarization of arguments in texts.
  • Task 3 - Dialectal Focus Retrieval : a) Arabic dialects in Blogs, MicroBlogs and Video News transcriptions; b) Spanish language variations in Blogs, MicroBlogs and Journals.
  • Lab Coordination : Chiraz Latiri (University Tunis El Manar), Eric SanJuan (LIA, Avignon University), Catherine Berrut (LIG, Grenoble Alpes University), Lorraine Goeuriot (LIG, Grenoble Alpes University), Julio Gonzalo (UNED)
  • Lab website:


PAN

PAN is a series of scientific events and shared tasks on digital text forensics.
  • Task 1 - Author Identification: cross-domain authorship attribution. More specifically, cases where the topic of texts varies significantly will be examined. In addition, we will continue the pilot task of style change detection, focusing on finding switches of authors within documents based on an intrinsic style analysis.
  • Task 2 - Author Obfuscation: while the goal of author identification and author profiling is to model author style so as to deanonymize authors, the goal of author obfuscation technology is to prevent that by disguising the authors. We will study author masking vs. authorship verification.
  • Task 3 - Author Profiling: the goal is to identify an author's traits based on their writing style. The focus will be on age and gender, with both text and images used as information sources, offering tweets in English, Spanish and Arabic.
  • Lab Coordination: Martin Potthast (Leipzig University), Paolo Rosso (Universitat Politècnica de València), Efstathios Stamatatos (University of the Aegean), Benno Stein (Bauhaus-Universität Weimar)
  • Lab website:

PIR-CLEF - Evaluation of Personalised Information Retrieval

The primary aim of the PIR-CLEF 2018 laboratory is: 1) to facilitate comparative evaluation of PIR by offering participating research groups a mechanism for evaluation of their personalisation algorithms; 2) to give the participating groups the means to formally define and evaluate their own and novel user profiling approaches for PIR.
  • Task 1 - Personalized Search: we will provide a bag-of-words profile gathered during the query sessions performed by real searchers, the set of queries formulated by each user, together with the corresponding document relevance, and the search logs of each user. Task participants will be expected to compute search results obtained by applying their personalization algorithms to these queries. The search will be carried out on the ClueWeb12 collection, by using the API provided by DCU.
  • Task 2 - User Profile Models: participants will be required to develop their own user profile models using the information gathered about the real user during her interactions with the system. The same information has been used for creating the baseline (keyword-based user profiles), which is provided in the benchmark.
  • Lab Coordination: Gabriella Pasi (University of Milano Bicocca), Gareth J. F. Jones (Dublin City University), Stefania Marrara (Consorzio C2T), Debasis Ganguly (IBM Research Dublin), Procheta Sen (Dublin City University), Camilla Sanvitto (University of Milano Bicocca)
  • Lab website: