This paper by Nayanika DOGRA, Philippe MULHEM, Nawal OULD AMER, and Lorraine GOEURIOT presents the approach used by the LIG-MRIM research group for its participation in the pilot task "Timeline Illustration Based on Microblogs" of the 2016 CLEF Cultural Microblog Contextualization Workshop, which led to the 2017 lab.
3 - Time Line Illustration
1. Goal
The goal is to retrieve all relevant tweets dedicated to each event of a festival, according to the program provided. We are aiming here at a kind of "total recall" retrieval, starting from the show names, the artist names, and the dates and times of the shows.
This task focuses on 4 festivals: two French music festivals, one French theater festival and one British theater festival:
- Vieilles Charrues 2015 (program in pdf)
- Transmusicales 2015 (program online)
- Avignon 2016 (program in pdf)
- Edinburgh 2016 (program in pdf)
2. Topics
Topics are given in the file clef_mc2_task3_topics.xml
Each topic is related to one cultural event.
In our terminology, one event is one occurrence of a show (theater, music, ...).
Several occurrences of the same show therefore correspond to several events (e.g. a play can be presented several times during a theater festival).
More precisely, one topic is described by: one id, one festival name, one title, one artist (or band) name, one timeslot (date/time begin and end), and one location venue.
An excerpt from the topic list is:
<topics>
...
<topic>
<id>5</id>
<title></title>
<artist>Klangstof</artist>
<festival>transmusicales</festival>
<startdate>04/12/16-17:45</startdate>
<enddate>04/12/16-18:30</enddate>
<venue>UBU</venue>
</topic>
...
</topics>
The id is an integer ranging from 1 to 664.
We see from the excerpt above that, for a live music show without any specific title, the title field is empty.
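As an illustration of the topic structure, here is a minimal Python sketch that reads topics into dictionaries. It works on an inline sample copied from the excerpt above rather than on the real clef_mc2_task3_topics.xml file; the function name parse_topics and the FIELDS tuple are hypothetical names chosen for this example, and the sketch assumes the real file is well-formed XML with the same element names.

```python
import xml.etree.ElementTree as ET

# Inline sample mirroring the excerpt above (assumption: the real file
# uses the same element names and is well-formed XML).
SAMPLE = """<topics>
<topic>
<id>5</id>
<title></title>
<artist>Klangstof</artist>
<festival>transmusicales</festival>
<startdate>04/12/16-17:45</startdate>
<enddate>04/12/16-18:30</enddate>
<venue>UBU</venue>
</topic>
</topics>"""

# Text fields of a topic, in the order used by the excerpt.
FIELDS = ("title", "artist", "festival", "startdate", "enddate", "venue")


def parse_topics(xml_text):
    """Return one dict per <topic>; empty or missing fields become ''."""
    root = ET.fromstring(xml_text)
    topics = []
    for node in root.iter("topic"):
        topic = {"id": int(node.findtext("id"))}
        for field in FIELDS:
            topic[field] = (node.findtext(field) or "").strip()
        topics.append(topic)
    return topics


topics = parse_topics(SAMPLE)
```

With this sample, the single topic has an empty title and the artist name as its only textual description, which is the case of a live music show mentioned above.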
The artist name is a single artist name, a list of artist names, an artistic company name or an orchestra name, as it appears in the official program of the festival.
The festival labels are:
- charrues for Vieilles Charrues 2015,
- transmusicales for Transmusicales 2015,
- avignon for Avignon 2016,
- edinburgh for Edinburgh 2016.
For the startdate and enddate fields:
If the start or end time is unknown, it is replaced with xx:xx, giving the format DD/MM/YY-xx:xx.
If the day is unknown, the date format is the following: -HH:MM (day is omitted).
The venue is a string corresponding to the name of the location, given by the official programs.
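The three variants of the startdate and enddate values (full value, unknown time, unknown day) could be handled as in the following Python sketch. The function name parse_timeslot is hypothetical, and the sketch assumes that two-digit years stand for 20YY and that the unknown-time placeholder is exactly xx:xx.

```python
from datetime import date, time


def parse_timeslot(value):
    """Split a startdate/enddate string into (date or None, time or None).

    Handles "DD/MM/YY-HH:MM", "DD/MM/YY-xx:xx" (time unknown)
    and "-HH:MM" (day unknown).
    """
    day_part, _, time_part = value.strip().partition("-")

    day = None
    if day_part:
        dd, mm, yy = (int(x) for x in day_part.split("/"))
        day = date(2000 + yy, mm, dd)  # assumption: two-digit years mean 20YY

    clock = None
    if time_part and time_part.lower() != "xx:xx":
        hh, mi = (int(x) for x in time_part.split(":"))
        clock = time(hh, mi)

    return day, clock
```

For example, parse_timeslot("04/12/16-17:45") yields the date 4 December 2016 and the time 17:45, while parse_timeslot("-18:30") yields no date and the time 18:30.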
3. Dataset
A login is required to access the data. Once registered on CLEF, each registered team can obtain up to 4 extra individual logins by writing to admin@talne.eu.
Participants are required to use the full dataset to conduct their experiments:
- The complete stream of 70 000 000 microblogs is available here for registered participants.
- An Indri index with a web interface is available to query the whole set of microblogs.
4. Runs
The runs are expected to follow the classical TREC top-file format. Only the top 1000 results for each topic must be given. Each retrieved document is identified by its tweet id.
The evaluation will be carried out on a subset of the full set of topics, according to the richness of the results obtained.
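As a rough guide, the Python sketch below writes run lines in the six-column layout commonly accepted by trec_eval (topic id, the literal Q0, document id, rank, score, run tag). This layout is an assumption based on common TREC practice, since the exact submission protocol is not specified here; the function name format_run and the run tag value are hypothetical.

```python
def format_run(results, run_tag, limit=1000):
    """Build TREC-style run lines.

    results maps a topic id to a list of (tweet_id, score) pairs.
    Only the `limit` best-scored tweets per topic are kept.
    """
    lines = []
    for topic_id in sorted(results):
        # Highest score first; the rank starts at 1.
        ranked = sorted(results[topic_id], key=lambda pair: pair[1], reverse=True)
        for rank, (tweet_id, score) in enumerate(ranked[:limit], start=1):
            lines.append(f"{topic_id} Q0 {tweet_id} {rank} {score:.4f} {run_tag}")
    return lines


# Toy example with made-up tweet ids and scores.
demo_lines = format_run({5: [("111", 0.5), ("222", 0.9)]}, "demo_run")
```

The resulting lines can then be written to a text file, one line per retrieved tweet.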
The official evaluation measures planned are recall values at 5, 10, 25, 50 and 100 documents.
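For reference, recall at a cutoff k is the fraction of the relevant tweets of a topic that appear among the first k retrieved tweets. The Python sketch below computes it at the planned cutoffs; the function names are hypothetical, and the sketch assumes that a set of relevant tweet ids is available per topic (how partially relevant tweets would be counted is not specified here).

```python
CUTOFFS = (5, 10, 25, 50, 100)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant ids found in the first k ranked ids."""
    if not relevant_ids:
        return 0.0  # convention chosen for this sketch when nothing is relevant
    retrieved = set(ranked_ids[:k])
    return len(retrieved & set(relevant_ids)) / len(relevant_ids)


def recall_profile(ranked_ids, relevant_ids, cutoffs=CUTOFFS):
    """Recall at every cutoff, as a dict {k: recall}."""
    return {k: recall_at_k(ranked_ids, relevant_ids, k) for k in cutoffs}
```

The measure ignores the order of tweets within the top k, so two runs with the same top-k set obtain the same value at that cutoff.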
Each registered participant should submit no more than 6 runs. The protocol to submit the runs will be described later.
The evaluation protocol is likely to change depending on the submissions received.
5. Evaluation
As many retweets as possible will be excluded from the pools.
Tweet relevance will be based on a 3-level scale:
- Not relevant: the tweet is not related to the topic
- Partially relevant: the tweet is somehow related to the topic (e.g. the tweet is related to the artist, song, play but not to the event, or is related to a similar event with no possible way to check if they are the same)
- Relevant: the tweet is related to the event