Task objectives and Evaluation process

Wednesday 21 February 2018, by Jean-valère, olivier, sanjuan

Objective

Vodkaster ( http://www.vodkaster.com/ ) is a French social network about movies where participants can share comments about movies in the form of microcritics no longer than a tweet. The main differences from Twitter are the restricted cultural domain and the form.
The objective of the task is, for a given movie and microcritic and for each language among French, English, Spanish, Portuguese and Arabic, to provide a summary of the related microblogs.
Microblogs included in a summary should provide relevant information about at least one of the following aspects:

the film mentioned in the microcritic, including subject, genre, presence in festivals, reception, audience, critics and opinions, as well as actors' and producers' careers.
events like festivals mentioned in the microcritic if any, including opinions and narratives.
comments and reviews on Twitter similar to those in the microcritic, if any.
Extended summaries can include microblogs about closely related films and events.
Promotional, automatic tweets or retweets are not considered relevant. However, retweets by movie aficionados or movie makers are relevant.

Task description

Browsing the Vodkaster website allows French readers to get personal short comments (microcritics) about movies. Similar and/or complementary opinions can be found on Twitter, but they are less specific to movies and harder to find. The use case is to display to the reader a concise summary of microblogs related to the microcritic he/she is reading, taking into account bilingual and trilingual users who would read microblogs in languages other than French.
Summaries are made exclusively of extracts from microblog contents and can include author names if considered informative. They should be readable, and codes such as external URLs and references to multimedia objects should be removed. Three different summary lengths in words are considered: 50, 150 and 250.
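As an illustration of the clean-up described above, here is a minimal sketch in Python. The regular expressions and the function name `clean_microblog` are assumptions made for this example; the task does not prescribe any particular patterns, and real microblogs may need additional rules.

```python
import re

# Assumed patterns (not specified by the task): web links and Twitter media links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MEDIA_RE = re.compile(r"pic\.twitter\.com/\S+")


def clean_microblog(text: str) -> str:
    """Remove URLs and media references, then collapse whitespace."""
    text = URL_RE.sub(" ", text)    # full links first, so no "https://" stub remains
    text = MEDIA_RE.sub(" ", text)  # then bare media references
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_microblog("Great film! https://t.co/abc123 pic.twitter.com/xyz"))
```

Author names are left untouched here, since they may be kept when considered informative.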
Summaries are intended to provide an idea of all relevant information included in the corpus.
Diversity among top-ranked microblogs is important. If the summary does not provide any microblog directly related to the topic, it suggests that there is none in the corpus.

Evaluation process

Runs will be primarily evaluated on informativeness, following the INEX Tweet Contextualization methodology [1] and based on the FRESA 2 [2] software extended to Arabic, French, Portuguese and Spanish. All FRESA metrics will be computed between the top-ranked microblog extracts of each run and a textual reference to be provided by the organizers. Following [1], this reference will be based on both manual runs and pools from participant submissions.
Graded standard q-rels for microblogs will be automatically generated based on FRESA [2] scores, to be used with standard TREC eval tools. However, due to the impact of high microblog redundancy and reposts on q-rels exhaustivity [3], these measures will not be considered official.
Alternative Nugget-based Information Retrieval Evaluation references and scores [4] will also be tentatively provided and discussed at the Lab.
The readability of results provided by systems will also be manually checked, since the use case requires these results to be displayed to the user.

[1] Patrice Bellot, Véronique Moriceau, Josiane Mothe, Eric SanJuan, Xavier Tannier: INEX Tweet Contextualization task: Evaluation, results and lesson learned. Inf. Process. Manage. 52(5): 801-819 (2016)
[2] http://fresa.talne.eu
[3] Philippe Mulhem, Lorraine Goeuriot, Nayanika Dogra, Nawal Ould Amer: TimeLine Illustration Based on Microblogs: When Diversification Meets Metadata Re-ranking. CLEF 2017 Proceedings. Lecture Notes in Computer Science 10456, Springer 2017: 224-235
[4] http://www.ccs.neu.edu/home/jaa/IIS-1256172/

Submission

Submitted summaries should be in a TREC-like format, a tab-separated file with five fields:

a run ID
an integer indicating its position in the summary
a float number as an estimation of its relevance
the main language of the microblog content (fr, en, es, pt or ar)
an extract of the microblog content with the author name if considered as relevant
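The five fields listed above could be serialized as in the following Python sketch. It assumes that the fields appear in the order listed, that positions start at 1, and that tabs or line breaks inside an extract should be replaced by spaces; none of these details is stated explicitly in the task, and the function names are hypothetical.

```python
ALLOWED_LANGS = {"fr", "en", "es", "pt", "ar"}


def format_run_line(run_id, position, score, lang, extract):
    """Build one tab-separated line with the five fields of a run."""
    if lang not in ALLOWED_LANGS:
        raise ValueError(f"unexpected language code: {lang!r}")
    # Tabs or newlines inside the extract would break the table layout.
    text = " ".join(extract.split())
    return "\t".join([run_id, str(position), f"{score:.4f}", lang, text])


def format_run(run_id, ranked):
    """ranked: list of (score, lang, extract) tuples, already in summary order."""
    lines = [
        format_run_line(run_id, pos, score, lang, extract)
        for pos, (score, lang, extract) in enumerate(ranked, start=1)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_run("myrun1", [(0.9, "fr", "Un film superbe"),
                                (0.7, "en", "A must-see at the festival")]), end="")
```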

Runs will be truncated at 50, 150 and 300 words; the content will be concatenated and displayed to evaluators, who will highlight relevant passages. Therefore, the concatenation of content in the last column should be readable by a human (i.e. this column needs to be readable on its own).
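To preview what evaluators might see, the truncation and concatenation can be approximated with the short Python sketch below. It assumes that words are whitespace-separated tokens and that extracts are joined with a single space; the organizers' actual tokenization is not described here and may differ, notably across languages.

```python
def truncate_run(extracts, max_words):
    """Concatenate extracts in rank order and keep the first max_words words."""
    words = [word for extract in extracts for word in extract.split()]
    return " ".join(words[:max_words])


if __name__ == "__main__":
    ranked_extracts = ["Un film superbe", "A must-see at the festival"]
    for limit in (50, 150, 300):  # the truncation points mentioned above
        print(limit, truncate_run(ranked_extracts, limit))
```

Reading the truncated text at each limit is a quick way to check that the last column remains readable on its own.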

Each team can submit up to three runs in each language (Arabic, English, French, Portuguese and Spanish). Teams will be invited to share their queries in different languages. Organizers will facilitate running submitted sets of queries on the following baseline system.

Baseline system

A baseline system powered by Indri is provided to participants to run complex focus nested queries.
In this index, microblogs have been merged into XML documents per author to allow expansions. The Indri XML index makes it possible to retrieve XML elements based on nested content.

<!ELEMENT xml (f, m)+>
<!ELEMENT f (#user_id)>
<!ELEMENT m (i, u, l, c, d, t)>
<!ELEMENT i (#microblog_id)>
<!ELEMENT u (#user)>
<!ELEMENT l (#ISO_language_code)>
<!ELEMENT c (#client)>
<!ELEMENT d (#date)>
<!ELEMENT t (#PCDATA)>
Example:
<xml><f>20666489</f>
<m><i>727389569688178688</i>
<u>soulsurvivornl</u>
<l>en</l>
<c>Twitter for iPhone</c>
<d>2016-05-03</d>
<t>RT @ndnl: Dit weekend begon het Soul Surivor Festival.</t>
</m>
<m><i>727944506507669504</i>
<u>soulsurvivornl</u>
<l>en</l>
<c>Facebook</c>
<d>2016-05-04</d>
<t>Last van een festival-hangover?</t>
</m>
</xml>
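A per-author document like the example above can be read with Python's standard xml.etree.ElementTree module, as in this minimal sketch. The function name and the dictionary keys are illustrative choices, and the sketch assumes the documents are well-formed XML (for instance, with special characters escaped), which is not guaranteed by the description above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xml><f>20666489</f>
<m><i>727389569688178688</i>
<u>soulsurvivornl</u>
<l>en</l>
<c>Twitter for iPhone</c>
<d>2016-05-03</d>
<t>RT @ndnl: Dit weekend begon het Soul Surivor Festival.</t>
</m>
<m><i>727944506507669504</i>
<u>soulsurvivornl</u>
<l>en</l>
<c>Facebook</c>
<d>2016-05-04</d>
<t>Last van een festival-hangover?</t>
</m>
</xml>"""


def parse_author_document(xml_text):
    """Return one dict per <m> element, tagged with the preceding <f> user id."""
    root = ET.fromstring(xml_text)
    records = []
    user_id = None
    for child in root:
        if child.tag == "f":
            user_id = child.text  # applies to the <m> elements that follow
        elif child.tag == "m":
            records.append({
                "user_id": user_id,
                "microblog_id": child.findtext("i"),
                "user": child.findtext("u"),
                "language": child.findtext("l"),
                "client": child.findtext("c"),
                "date": child.findtext("d"),
                "text": child.findtext("t"),
            })
    return records


if __name__ == "__main__":
    for record in parse_author_document(SAMPLE):
        print(record["date"], record["language"], record["text"])
```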