1 - Cross Language cultural microblog search

Synopsis

Given a movie title and microcritics from the French VodKaster network, the task is to find all relevant microblogs from the MC2 corpus in French, English, Spanish, Portuguese and Arabic. Runs will be evaluated on the informativeness of the top-ranked microblogs, which combines graded relevance scores and diversity.

Task description

Use case

Browsing the VodKaster website allows French readers to get short personal comments, called microcritics, about movies. Similar or complementary opinions can be found on Twitter, but they are less specific to movies and harder to find. The use case is to display to the reader a concise summary of microblogs related to the microcritic he or she is reading, taking into account bilingual and trilingual users who would read microblogs in languages other than French.

Topics

Topics are a selection of VodKaster microcritics in French mentioning the term festival.
Each topic contains:

A topic ID
A title made of the movie name
A narrative showing a microcritic about the movie
A list of nuggets (i.e. terms and expressions) manually extracted from the microcritic

Microblog Corpus

The collection of microblogs is provided by the French national research project GAFES about festivals. It was collected using the keyword festival from May 2015 to November 2016, and was complemented with microblogs about cities such as Cannes, Avignon, Lyon, Rennes and Edinburgh. It contains microblogs in all languages. Its usage is restricted to active participants, for research purposes only.

Once registered with CLEF, a login is required to access the data.

The complete stream of 70 000 000 microblogs is available here for registered participants.
An Indri index with a web interface is available to query the whole set of microblogs.

Evaluation

Runs will be primarily evaluated on informativeness, following the INEX Tweet Contextualization methodology [1] and using the FRESA [2] software extended to Arabic, French, Portuguese and Spanish. All FRESA metrics will be computed between the top-ranked microblog extracts of each run and a textual reference to be provided by the organizers. Following the evaluation process in [1], this reference will be based on both manual runs and pools of runs from participant submissions.

[1] INEX Tweet Contextualization task: Evaluation, results and lesson learned
Patrice Bellot, Véronique Moriceau, Josiane Mothe, Eric SanJuan, Xavier Tannier. Inf. Process. Manage. 52(5): 801-819 (2016)
[2] http://fresa.talne.eu

Submissions

Submitted summaries should be in TREC format, in a tab-separated file with six fields:

a run ID (the team name, for example)
a tweet ID: a long integer representation of the unique identifier of this tweet
an integer indicating its position in the summary
a float number as an estimation of its relevance
the main language of the microblog content (fr, en, es, pt or ar)
an extract of the microblog content with the author name if considered as relevant
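
As an illustration of the fields listed above, here is a minimal sketch of how one line of a run file could be produced. The names `RunLine` and `format_run_line`, the four-decimal score format, and the whitespace normalisation of the extract are assumptions made for this example; they are not prescribed by the organizers.

```python
from typing import NamedTuple

# Language codes accepted in the fifth field, as listed in the task description.
ALLOWED_LANGS = {"fr", "en", "es", "pt", "ar"}


class RunLine(NamedTuple):
    """One line of a run file (hypothetical helper type)."""
    run_id: str      # e.g. the team name
    tweet_id: int    # unique identifier of the tweet
    rank: int        # position of the microblog in the summary
    score: float     # estimated relevance
    lang: str        # main language of the microblog content
    extract: str     # readable extract of the microblog content


def format_run_line(line: RunLine) -> str:
    """Serialise a RunLine as a single tab-separated line."""
    if line.lang not in ALLOWED_LANGS:
        raise ValueError(f"unsupported language code: {line.lang!r}")
    # Assumption: tabs and newlines inside the extract are collapsed to
    # single spaces so that they cannot break the tabulated layout.
    extract = " ".join(line.extract.split())
    fields = [
        line.run_id,
        str(line.tweet_id),
        str(line.rank),
        f"{line.score:.4f}",
        line.lang,
        extract,
    ]
    return "\t".join(fields)
```

A run file would then be the concatenation of such lines, one per selected microblog, ordered by position in the summary.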

Runs will be truncated at 50, 150 and 300 words. The content will then be concatenated and displayed to evaluators, who will highlight relevant passages. Therefore, the concatenation of the content in the last column should be readable by a human (i.e. this column needs to be readable on its own).
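
To check how a run would read once truncated, participants may find a sketch like the following useful. It assumes that words are delimited by whitespace, which is only an approximation: the exact word-counting rule used by the organizers is not specified here, and `truncate_summary` is a hypothetical helper name.

```python
from typing import Iterable

# Word budgets at which runs are truncated, as stated in the task description.
WORD_BUDGETS = (50, 150, 300)


def truncate_summary(extracts: Iterable[str], max_words: int) -> str:
    """Concatenate extracts in rank order and keep the first max_words words.

    Assumption: a "word" is any whitespace-delimited token.
    """
    words = " ".join(extracts).split()
    return " ".join(words[:max_words])


def preview(extracts: Iterable[str]) -> dict:
    """Return the truncated text for each of the three budgets."""
    extracts = list(extracts)
    return {budget: truncate_summary(extracts, budget) for budget in WORD_BUDGETS}
```

Reading the 50-word preview on its own is a quick way to verify that the last column remains intelligible without the other fields.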

Schedule

Registration closes: 30 April 2018
End Evaluation Cycle: 04 June 2018
Submission of Participant Papers [CEUR-WS]: 08 June 2018
Notification of Acceptance Participant Papers: 15 June 2018
Camera Ready Copy of Participant Papers [CEUR-WS]: 29 June 2018
CLEF 2018 Conference: 10-14 September 2018

Task organizers

Jean Valère Cossu (My Local Influence)
Olivier Hamon (Syllabs)
Eric SanJuan (LIA, Avignon, eric.sanjuan@univ-avignon.fr)

Task objectives and Evaluation process
21 February 2018, by Jean-valère, olivier, sanjuan

Objective
Vodkaster (http://www.vodkaster.com/) is a French social network about movies where participants can share comments in the form of microcritics no longer than a tweet. The main differences from tweets are the restricted cultural domain and the form. The objective of the task is, for a given movie and microcritic and for each language among French, English, Spanish, Portuguese and Arabic, to provide a summary of the related microblogs. Microblogs included in a summary should (…)