<?xml 
version="1.0" encoding="utf-8"?><?xml-stylesheet title="XSL formatting" type="text/xsl" href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=backend.xslt" ?>
<rss version="2.0" 
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:atom="http://www.w3.org/2005/Atom"
>

<channel xml:lang="fr">
	<title>MC2 2018 Lab</title>
	<link>https://clef2018.clef-initiative.eu/mc2/</link>
	<description>MC2 CLEF Lab is centered on mining the social media sphere surrounding cultural events such as festivals and movies. It provides registered participants with access to the microblog collection of the GAFES project, funded by the French National Research Agency and led by the University of Avignon.</description>
	<language>fr</language>
	<generator>SPIP - www.spip.net</generator>
	<atom:link href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=backend" rel="self" type="application/rss+xml" />




<item xml:lang="en">
		<title>Dialect detection in Informal Arabic Text</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=22</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=22</guid>
		<dc:date>2018-04-24T15:24:21Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>Malek Hajjem</dc:creator>



		<description>

-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=14" rel="directory"&gt;3-Dialectal Focus Retrieval&lt;/a&gt;


		</description>


 <content:encoded>
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>Content Analysis Results: Language identification 2017</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=21</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=21</guid>
		<dc:date>2018-03-15T09:00:54Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>Malek Hajjem</dc:creator>



		<description>
&lt;p&gt;Results Topics are a random selection of original microblogs posted in June 2016 without external links and with more than 80 characters. Submissions and scores for the two best teams can be found here Syllabs and Lia. The task paper can be found here &lt;br class='autobr' /&gt; @inproceedingsDBLP:conf/clef/ErmakovaMS17, author = Liana Ermakova and Josiane Mothe and Eric SanJuan, title = CLEF 2017 Microblog Cultural Contextualization Content Analysis task (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=9" rel="directory"&gt;Data&lt;/a&gt;


		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/p&gt;
&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt;&lt;a href=&#034;https://mc2.talne.eu/data/clef2017/clef_mc2_task1_topics.txt&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;Topics&lt;/a&gt; are a random selection of original microblogs posted in &lt;a href=&#034;https://mc2.talne.eu/data/clef2017/clef_microblogs_festival2016-06.txt.gz&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;June 2016&lt;/a&gt; without external links and with more than 80 characters.&lt;/li&gt;&lt;li&gt; Submissions and scores for the two best teams can be found here: &lt;a href=&#034;https://mc2.talne.eu/lab/IMG/xlsx/syllabs_task1b_language_diff.csv.xlsx&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;Syllabs&lt;/a&gt; and &lt;a href=&#034;https://mc2.talne.eu/lab/IMG/xlsx/lia_task1b_language_diff.csv.xlsx&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;Lia&lt;/a&gt;.&lt;/li&gt;&lt;li&gt; The task paper can be found &lt;a href=&#034;http://ceur-ws.org/Vol-1866/invited_paper_14.pdf&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;here&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;@inproceedings{DBLP:conf/clef/ErmakovaMS17,
  author    = {Liana Ermakova and Josiane Mothe and Eric SanJuan},
  title     = {{CLEF} 2017 Microblog Cultural Contextualization Content Analysis task Overview},
  booktitle = {Working Notes of {CLEF} 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017.},
  year      = {2017},
  crossref  = {DBLP:conf/clef/2017w},
  url       = {http://ceur-ws.org/Vol-1866/invited_paper_14.pdf},
  timestamp = {Thu, 16 Nov 2017 14:36:59 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/clef/ErmakovaMS17},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Evaluation process&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The evaluation process assesses the reliability of language identification on Twitter. &lt;br class='autobr' /&gt;
Tweet objects have a long list of &#8216;root-level&#8217; attributes, including fundamental attributes such as &#034;lang&#034;. When present, this &lt;a href=&#034;http://support.gnip.com/apis/powertrack2.0/rules.html#Operators&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;attribute&lt;/a&gt; indicates a &lt;a href=&#034;https://tools.ietf.org/html/bcp47&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;BCP 47&lt;/a&gt; language identifier corresponding to the &lt;strong&gt;machine-detected&lt;/strong&gt; language of the microblog text. The machine-detected language may of course differ from the actual language of the microblog. &lt;br class='autobr' /&gt;
Scores in this evaluation are assigned by a human expert. Only tweets for which the output of a participant's language detection system differs from the tweet's &#034;lang&#034; attribute were examined. Tweets written in several languages receive a graded score describing how much of each language is present.&lt;/p&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>Available resources CLEF 2018: detailed description</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=20</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=20</guid>
		<dc:date>2018-03-14T16:09:54Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>Malek Hajjem</dc:creator>



		<description>
&lt;p&gt;The festival galleries dataset &lt;br class='autobr' /&gt;
A massive collection of microblogs and URLs related to cultural festivals is provided for registered participants here. To handle such a large dataset, we propose different formats: A CSV format: a tab-separated CSV file that can be useful when managing the dataset via a MySQL database or the Python programming language. An XML format for Indri: this format can be indexed smoothly with Indri if needed. With tweet textual content (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=9" rel="directory"&gt;Data&lt;/a&gt;


		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;p&gt;&lt;strong&gt;The festival galleries dataset&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A massive collection of microblogs and URLs related to cultural festivals is provided for registered participants &lt;a href=&#034;https://mc2.talne.eu/data/clef/&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;here&lt;/a&gt;.&lt;br class='autobr' /&gt;
To handle such a large dataset, we propose different formats:&lt;/p&gt;
&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; A CSV format: a tab-separated CSV file that can be useful when managing the dataset via a MySQL database or the Python programming language.&lt;/li&gt;&lt;/ul&gt;&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; An XML format for Indri: this format can be indexed smoothly with Indri if needed. Some metadata (see the description below) is provided along with the textual content of each tweet. Note that XML files are grouped by author.&lt;/li&gt;&lt;/ul&gt;&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;&lt;!ELEMENT xml (f, m)+&gt;
&lt;!ELEMENT f (#user_id)&gt;
&lt;!ELEMENT m (i, u, l, c, d, t)&gt;
&lt;!ELEMENT i (#microblog_id)&gt;
&lt;!ELEMENT u (#user)&gt;
&lt;!ELEMENT l (#ISO_language_code)&gt;
&lt;!ELEMENT c (#client)&gt;
&lt;!ELEMENT d (#date)&gt;
&lt;!ELEMENT t (#PCDATA)&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The festival galleries dataset is available either in full or in parts. In the partial format, each CSV file contains the tweets gathered for one month. Original tweets are kept separate from reposted tweets to keep the files lighter.&lt;/p&gt;
&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;festival
     Originals:      Reposts:
1-   2015-05(72M)    2015-05(54M)
2-   2015-06(235M)   2015-06(190M)
3-   2015-07(220M)   2015-07(162M)
...  ...             ...
18-  2016-10(102M)   2016-10(148M)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; HTML form to test queries: this form lets you test the microblog search baseline system using an Indri query
&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;*Simple queries:
For a basic query, just type in the terms you wish to search on. Each term will be weighted equally and combined in an &#034;or&#034; fashion.
- hiphop jazz
#combine(hiphop jazz)

*Phrase matching:
To search for a specific phrase (i.e. &#034;hiphop jazz&#034;), you can wrap your terms using the ordered window operator #n (where n is the window size in number of terms).
#1(hiphop jazz)
Your search results would return only those documents where the terms &#034;hiphop&#034; and &#034;jazz&#034; appear in order.

*Unordered windows:
The #uwN operator performs a search on terms that occur within a certain window size. For example, if we wanted to look for the terms &#034;hiphop&#034; and &#034;jazz&#034; that occurred within 2 terms of each other, but we did not care whether the term &#034;hiphop&#034; came before &#034;jazz&#034; or not, we would write this as:
#uw2(hiphop jazz)

*Boolean searches:
By default, Indri will return a document if any of the terms occur in the document; documents that contain more terms will generally be ranked above documents that contain fewer terms. If you wish to specify that all of your search terms must be included, you can use the &#034;boolean and&#034; operator (#band). For example, if you want to ensure that the terms &#034;hiphop&#034; and &#034;jazz&#034; both exist, use:
#band(hiphop jazz)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; Perl API for querying the web service locally with queries written in the Indri query language&lt;/li&gt;&lt;li&gt; Indri parameter files: a parameter file in XML format, useful for reindexing the collection with Indri&lt;/li&gt;&lt;li&gt; Compressed Indri indexes per month&lt;/li&gt;&lt;li&gt; Programs to generate XML repositories from ordered CSV data&lt;/li&gt;&lt;li&gt; Root of Indri indexes and data&lt;/li&gt;&lt;/ul&gt;
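&lt;p&gt;The four query forms shown above can also be assembled programmatically. The following is a minimal Python sketch, not part of the lab's tooling: the function names are hypothetical, and it only builds the query strings (#combine, #1, #uwN, #band) without contacting the web service.&lt;/p&gt;
&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;
```python
def indri_query(operator, terms):
    """Wrap the given terms in an Indri operator, e.g. #combine(a b)."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one term is required")
    return "#{}({})".format(operator, " ".join(cleaned))


def combine(terms):
    # Simple query: terms are weighted equally and combined in an "or" fashion.
    return indri_query("combine", terms)


def phrase(terms):
    # Ordered window of size 1, i.e. the terms must appear as an exact phrase.
    return indri_query("1", terms)


def unordered_window(size, terms):
    # Terms must occur within `size` terms of each other, in any order.
    return indri_query("uw{}".format(int(size)), terms)


def boolean_and(terms):
    # All terms must be present in the document.
    return indri_query("band", terms)


if __name__ == "__main__":
    terms = ["hiphop", "jazz"]
    for query in (combine(terms), phrase(terms),
                  unordered_window(2, terms), boolean_and(terms)):
        print(query)
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;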
&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An uncompressed list of tweet URLs is available to participants in CSV format. This metadata can be used to explore the tweet content further.&lt;/p&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>Milestones and timetable 2018</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=5</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=5</guid>
		<dc:date>2018-02-21T13:03:43Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>sanjuan</dc:creator>


		<dc:subject>Mile Stones</dc:subject>

		<description>
&lt;p&gt;Registration opens: 8 February 2018 (Task 2) Registration closes: 30 April 2018 End Evaluation Cycle: 19 May 2018 Submission of Participant Papers [CEUR-WS]: 31 May 2018 Submission of Lab Overviews [LNCS]: 8 June 2018 Notification of Acceptance Participant Papers [CEUR-WS]: 15 June 2018 Notification of Acceptance Lab Overviews [LNCS]: 15 June 2018 Camera Ready Copy of Lab Overviews [LNCS]: 22 June 2018 Camera Ready Copy of Participant Papers and Extended Lab Overviews [CEUR-WS]: 29 June 2018 (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=4" rel="directory"&gt;Organization&lt;/a&gt;

/ 
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=mot&amp;id_mot=1" rel="tag"&gt;Mile Stones&lt;/a&gt;

		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; Registration opens: 8 February 2018 (Task 2)&lt;/li&gt;&lt;li&gt; Registration closes: 30 April 2018&lt;/li&gt;&lt;li&gt; End Evaluation Cycle: 19 May 2018&lt;/li&gt;&lt;li&gt; Submission of Participant Papers [CEUR-WS]: 31 May 2018&lt;/li&gt;&lt;li&gt; Submission of Lab Overviews [LNCS]: 8 June 2018&lt;/li&gt;&lt;li&gt; Notification of Acceptance Participant Papers [CEUR-WS]: 15 June 2018&lt;/li&gt;&lt;li&gt; Notification of Acceptance Lab Overviews [LNCS]: 15 June 2018&lt;/li&gt;&lt;li&gt; Camera Ready Copy of Lab Overviews [LNCS]: 22 June 2018&lt;/li&gt;&lt;li&gt; Camera Ready Copy of Participant Papers and Extended Lab Overviews [CEUR-WS]: 29 June 2018&lt;/li&gt;&lt;li&gt; CEUR-WS Working Notes Preview for Checking by Authors and Lab Organizers: 18-24 July 2018&lt;/li&gt;&lt;li&gt; CLEF 2018 Conference: 10-14 September 2018&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>Task objectives and Evaluation process</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=17</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=17</guid>
		<dc:date>2018-02-21T12:50:00Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>Jean-val&#232;re, olivier, sanjuan</dc:creator>



		<description>
&lt;p&gt;Objective &lt;br class='autobr' /&gt;
Vodkaster ( http://www.vodkaster.com/ ) is a French social network about movies where participants can share comments about movies in the form of microcritics no longer than a tweet. The main differences from Twitter are the restricted cultural domain and the form. The objective of the task is, for a given movie and microcritic and for each language among French, English, Spanish, Portuguese and Arabic, to provide a summary of the related microblogs. Microblogs included in a summary should (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=11" rel="directory"&gt;1 - Cross Language cultural microblog search&lt;/a&gt;


		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;h2 class=&#034;spip&#034;&gt;Objective&lt;/h2&gt;
&lt;p&gt;Vodkaster ( &lt;a href=&#034;http://www.vodkaster.com/&#034; class=&#034;spip_url spip_out auto&#034; rel=&#034;nofollow external&#034;&gt;http://www.vodkaster.com/&lt;/a&gt; ) is a French social network about movies where participants can share comments about movies in the form of microcritics no longer than a tweet. The main differences from Twitter are the restricted cultural domain and the form. &lt;br class='autobr' /&gt;
The objective of the task is, for a given movie and microcritic and for each language among French, English, Spanish, Portuguese and Arabic, to provide a summary of the related microblogs.&lt;br class='autobr' /&gt;
Microblogs included in a summary should provide relevant information about at least one of the following aspects:&lt;/p&gt;
&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; the film mentioned in the microcritic, including subject, genre, presence in festivals,&lt;br class='autobr' /&gt;
reception, audience, critiques and opinions, as well as actors' and producers' careers.&lt;/li&gt;&lt;li&gt; events like festivals mentioned in the microcritic, if any, including opinions and narratives.&lt;/li&gt;&lt;li&gt; comments and critiques on Twitter similar to those in the microcritic, if any.&lt;br class='autobr' /&gt;
Extended summaries can include microblogs about closely related films and events.&lt;br class='autobr' /&gt;
Promotional tweets, automatic tweets and retweets are not considered relevant. However, retweets by movie aficionados or movie makers are relevant.&lt;/li&gt;&lt;/ul&gt;&lt;h2 class=&#034;spip&#034;&gt;Task description&lt;/h2&gt;
&lt;p&gt;Browsing the Vodkaster website allows French readers to get short personal comments (microcritics)&lt;br class='autobr' /&gt;
about movies. Similar or complementary opinions can be found on Twitter, but they are less specific to movies and harder to find. The use case is to display to the reader a concise summary of microblogs related to the microcritic they are reading, taking into account bilingual and trilingual users who would read microblogs in languages other than French.&lt;br class='autobr' /&gt;
Summaries are made exclusively of extracts from microblog contents and can include author names if these are considered informative. They should be readable, and codes such as external URLs and references to multimedia objects should be removed. Three different summary lengths in words are considered: 50, 150 and 250.&lt;br class='autobr' /&gt;
Summaries are intended to give an idea of all the relevant information included in the corpus.&lt;br class='autobr' /&gt;
Diversity among top-ranked microblogs is important. If the summary does not provide any microblog directly related to the topic, it suggests that there is none in the corpus.&lt;/p&gt;
&lt;h2 class=&#034;spip&#034;&gt;Evaluation process&lt;/h2&gt;
&lt;p&gt;Runs will be primarily evaluated on informativeness following the INEX Tweet&lt;br class='autobr' /&gt;
Contextualization methodology [1] and based on the FRESA 2 [2] software extended to Arabic, French, Portuguese and Spanish. All FRESA metrics will be computed between the runs' top-ranked microblog extracts and a textual reference to be provided by the organizers. Following [1], this reference will be based on both manual runs and pools from participant submissions. &lt;br class='autobr' /&gt;
Graded standard q-rels for microblogs will be automatically generated based on FRESA [2] scores, to be used with standard TREC eval tools. However, due to the impact of high microblog redundancy and reposts on the exhaustiveness of q-rels [3], these measures won't be considered official.&lt;br class='autobr' /&gt;
Alternative Nugget-based Information Retrieval Evaluation references and scores [4] will also be tentatively provided and discussed at the Lab.&lt;br class='autobr' /&gt;
Readability of the results provided by systems will also be checked manually, since the use case requires these results to be displayed to the user.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;[1] Patrice Bellot, V&#233;ronique Moriceau, Josiane Mothe, Eric SanJuan, Xavier Tannier: INEX Tweet Contextualization task: Evaluation, results and lesson learned. Inf. Process. Manage. 52(5): 801-819 (2016)&lt;br class='autobr' /&gt;
[2] &lt;a href=&#034;http://fresa.talne.eu&#034; class=&#034;spip_url spip_out auto&#034; rel=&#034;nofollow external&#034;&gt;http://fresa.talne.eu&lt;/a&gt;&lt;br class='autobr' /&gt;
[3] Philippe Mulhem, Lorraine Goeuriot, Nayanika Dogra, Nawal Ould Amer: TimeLine Illustration Based on Microblogs: When Diversification Meets Metadata Re-ranking. CLEF 2017 Proceedings. Lecture Notes in Computer Science 10456, Springer 2017: 224-235&lt;br class='autobr' /&gt;
[4] &lt;a href=&#034;http://www.ccs.neu.edu/home/jaa/IIS-1256172/&#034; class=&#034;spip_url spip_out auto&#034; rel=&#034;nofollow external&#034;&gt;http://www.ccs.neu.edu/home/jaa/IIS-1256172/&lt;/a&gt;&lt;/i&gt;&lt;/p&gt;
&lt;h2 class=&#034;spip&#034;&gt;Submission&lt;/h2&gt;
&lt;p&gt;Submitted summaries should be in a TREC-like format: a tab-separated file with five fields:&lt;/p&gt;
&lt;ol class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; a run ID&lt;/li&gt;&lt;li&gt; an integer indicating its position in the summary&lt;/li&gt;&lt;li&gt; a float number as an estimation of its relevance&lt;/li&gt;&lt;li&gt; the main language of the microblog content (fr, en, es, pt or ar)&lt;/li&gt;&lt;li&gt; an extract of the microblog content with the author name if considered as relevant&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;Runs will be truncated at 50, 150 and 300 words; the content will be concatenated and displayed to evaluators, who will highlight relevant passages. Therefore, the concatenation of the content in the last column should be readable by a human (i.e. this column needs to be readable on its own).&lt;/p&gt;
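&lt;p&gt;As an illustration of the five-field layout, here is a minimal Python sketch that formats one run as tab-separated lines. The run ID, the four-decimal score format and the function names are assumptions made for the example, not requirements stated by the organizers. Tabs and line breaks inside an extract are replaced by spaces so that each line keeps exactly five fields.&lt;/p&gt;
&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;
```python
# Language codes listed in the submission description.
ALLOWED_LANGUAGES = {"fr", "en", "es", "pt", "ar"}


def format_run_line(run_id, position, relevance, language, extract):
    """Return one tab-separated line with the five submission fields."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError("unexpected language code: " + language)
    # Collapse tabs, newlines and repeated spaces inside the extract.
    clean_extract = " ".join(extract.split())
    fields = [
        run_id,
        str(int(position)),
        "{:.4f}".format(float(relevance)),  # score precision is an assumption
        language,
        clean_extract,
    ]
    return "\t".join(fields)


def format_run(run_id, ranked_extracts):
    """ranked_extracts: list of (relevance, language, extract), best first."""
    lines = []
    for position, item in enumerate(ranked_extracts, start=1):
        relevance, language, extract = item
        lines.append(format_run_line(run_id, position, relevance, language, extract))
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [(0.92, "en", "A moving film,\tpraised at Cannes"),
            (0.75, "fr", "Un film touchant")]
    print(format_run("demo_run", demo))
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;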
&lt;p&gt;Each team can submit up to three runs in each language (Arabic, English, French, Portuguese and Spanish). Teams will be invited to share their queries in different languages. Organizers will facilitate running submitted sets of queries on the following baseline systems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Baseline system&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A baseline system powered by Indri is provided to participants to run complex focus nested queries.&lt;br class='autobr' /&gt;
In this index, microblogs have been merged into XML documents per author to allow expansions. The Indri XML index makes it possible to retrieve XML elements based on nested content.&lt;/p&gt;
&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;&lt;!ELEMENT xml (f, m)+&gt;
&lt;!ELEMENT f (#user_id)&gt;
&lt;!ELEMENT m (i, u, l, c, d, t)&gt;
&lt;!ELEMENT i (#microblog_id)&gt;
&lt;!ELEMENT u (#user)&gt;
&lt;!ELEMENT l (#ISO_language_code)&gt;
&lt;!ELEMENT c (#client)&gt;
&lt;!ELEMENT d (#date)&gt;
&lt;!ELEMENT t (#PCDATA)&gt;
Example:
&lt;xml&gt;&lt;f&gt;20666489&lt;/f&gt;
&lt;m&gt;&lt;i&gt;727389569688178688&lt;/i&gt;
&lt;u&gt;soulsurvivornl&lt;/u&gt;
&lt;l&gt;en&lt;/l&gt;
&lt;c&gt;Twitter for iPhone&lt;/c&gt;
&lt;d&gt;2016-05-03&lt;/d&gt;
&lt;t&gt;RT @ndnl: Dit weekend begon het Soul Surivor Festival.&lt;/t&gt;
&lt;/m&gt;
&lt;m&gt;&lt;i&gt;727944506507669504&lt;/i&gt;
&lt;u&gt;soulsurvivornl&lt;/u&gt;
&lt;l&gt;en&lt;/l&gt;
&lt;c&gt;Facebook&lt;/c&gt;
&lt;d&gt;2016-05-04&lt;/d&gt;
&lt;t&gt;Last van een festival-hangover?&lt;/t&gt;
&lt;/m&gt;
&lt;/xml&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>More about use case, data and evaluation process</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=19</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=19</guid>
		<dc:date>2018-02-09T09:39:44Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>Chiraz Latiri, Julio Gonzalo, Malek Hajjem</dc:creator>



		<description>
&lt;p&gt;Detailed description &lt;br class='autobr' /&gt;
use case &lt;br class='autobr' /&gt;
Given a selection of festival names from popular festivals on FlickR, in English and French, participants have to search for the most argumentative tweets in a collection covering 18 months of news about festivals in different languages. The identified tweets must be returned as a summary in which tweets are ranked by their probability of being argumentative. This use case was proposed to help festival organisers handle such tweets as a priority. (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=12" rel="directory"&gt;2- Mining opinion argumentation&lt;/a&gt;


		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;h2 class=&#034;spip&#034;&gt;Detailed description&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;use case&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Given a selection of festival names from popular festivals on FlickR, in &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/English-topics.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;English&lt;/a&gt; and &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/French_topics.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;French&lt;/a&gt;, participants have to search for the most argumentative tweets in a collection covering 18 months of news about festivals in different languages. The identified tweets must be returned as a summary in which tweets are ranked by their probability of being argumentative. This use case was proposed to help festival organisers handle such tweets as a priority. That is why the more varied the summary of ranked tweets is in terms of argumentation, the more useful the run is.&lt;br class='autobr' /&gt;
For each language (English and French), a monolingual scenario is expected: given a festival name from the topics file, participants have to retrieve from the microblog collection the set of the most argumentative tweets in the same language as the query.&lt;br class='autobr' /&gt;
Samples of argumentative tweets are provided here: &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/English_sample_with_5_tweets.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;English_Sample&lt;/a&gt;, &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/French_sample_5_tweets.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;French_Sample&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Topics&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/English-topics.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;English&lt;/a&gt; and &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/French_topics.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;French&lt;/a&gt; topic files contain 12 and 4 festival names respectively. They represent a set of popular festivals on FlickR for which we have pictures. Topics were carefully selected by the organizers to ensure that they have enough related argumentative tweets in our corpus. This manual selection was conducted to ensure that evaluation is possible.&lt;/p&gt;
&lt;p&gt;The choice of FlickR as the source of topics was motivated by the fact that this social media platform hosts high-quality amateur pictures. This personal involvement serves our goal, as we are mainly interested in personal tweets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Microblog Corpus&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A login is required to access the data; it is provided once you have registered for &lt;a href=&#034;http://clef2018-labs-registration.dei.unipd.it/&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;CLEF&lt;/a&gt;.&lt;/p&gt;
&lt;ul class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; The complete stream of 70 000 000 microblogs is available &lt;a href=&#034;https://mc2.talne.eu/data/clef/&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;here&lt;/a&gt; for registered participants. This document collection is provided by GAFES. Microblogs are provided with their meta-information and expanded URLs on a MySQL server. &lt;br class='autobr' /&gt;
For legal reasons, access to this database is restricted to registered participants under a privacy agreement.&lt;/li&gt;&lt;li&gt; An Indri index with a web interface (&lt;a href=&#034;https://mc2.talne.eu/data/clef/api&#034; class=&#034;spip_url spip_out auto&#034; rel=&#034;nofollow external&#034;&gt;https://mc2.talne.eu/data/clef/api&lt;/a&gt;) is available to query the whole set of microblogs.&lt;/li&gt;&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Evaluation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The official evaluation measure is NDCG.&lt;/p&gt;
&lt;p&gt;This ranking measure assigns a score to each retrieved tweet, with a discount function over the rank. As we are mostly interested in top-ranked arguments, this measure meets our expectations.&lt;br class='autobr' /&gt;
This measure was also used in the TREC Microblog Track [1]. A tweet is considered highly relevant when it is personal and contains an argument that directly refers to the festival (topic).&lt;/p&gt;
&lt;p&gt;&lt;i&gt;[1] Overview of the TREC-2015 Microblog Track&lt;br class='autobr' /&gt;
Jimmy Lin, Miles Efron, Yulu Wang, Garrick Sherman, Ellen Voorhees&lt;/i&gt;&lt;/p&gt;
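&lt;p&gt;To make the measure concrete, here is a minimal Python sketch of NDCG for a single topic. It is not the official evaluation tool: it assumes numeric graded judgments (the grades used in the example are made up), the linear-gain form of DCG with a log2 rank discount, and hypothetical function names. The default cutoff of 100 mirrors the number of results requested per query.&lt;/p&gt;
&lt;div class=&#034;precode&#034;&gt;&lt;pre class='spip_code spip_code_block' dir='ltr' style='text-align:left;'&gt;&lt;code&gt;
```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(run_gains, judged_gains=None, k=100):
    """NDCG at cutoff k for one topic.

    run_gains: relevance grades of the returned tweets, in ranked order.
    judged_gains: grades of all judged tweets for the topic; if omitted,
    the ideal ranking is built from run_gains only (a simplification).
    """
    pool = run_gains if judged_gains is None else judged_gains
    ideal_dcg = dcg(sorted(pool, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(list(run_gains)[:k]) / ideal_dcg


if __name__ == "__main__":
    # A relevant tweet placed second instead of first is penalised.
    print(round(ndcg([0, 2]), 4))
    # A perfectly ordered run scores 1.0.
    print(ndcg([2, 1, 0]))
```
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;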
&lt;p&gt;&lt;strong&gt;Result Submission&lt;/strong&gt;&lt;br class='autobr' /&gt;
Runs must follow the classic TREC top-file format, as described above. Only the top 100 results for each query must be given. Each run, in each language, must contain the following fields:
&lt;br /&gt;&lt;span class=&#034;spip-puce ltr&#034;&gt;&lt;b&gt;&#8211;&lt;/b&gt;&lt;/span&gt; Id: a long integer representation of the unique identifier of the tweet
&lt;br /&gt;&lt;span class=&#034;spip-puce ltr&#034;&gt;&lt;b&gt;&#8211;&lt;/b&gt;&lt;/span&gt; Score: the probability of being an argumentative tweet, as assigned by the participant's system
&lt;br /&gt;&lt;span class=&#034;spip-puce ltr&#034;&gt;&lt;b&gt;&#8211;&lt;/b&gt;&lt;/span&gt; Rank: the position assigned to the tweet in the ranked list of argumentative tweets
&lt;br /&gt;&lt;span class=&#034;spip-puce ltr&#034;&gt;&lt;b&gt;&#8211;&lt;/b&gt;&lt;/span&gt; Content: the textual content of the microblog&lt;/p&gt;
&lt;p&gt;Diversity criteria: &lt;br class='autobr' /&gt;
The more different arguments about a cultural event a run detects, the more interesting it is.&lt;/p&gt;
&lt;p&gt;Examples for the &#034;Cannes&#034; festival name:
&lt;br /&gt;&lt;span class=&#034;spip-puce ltr&#034;&gt;&lt;b&gt;&#8211;&lt;/b&gt;&lt;/span&gt; I ve seen some people saying they're boycotting Cannes &lt;i&gt;because of the high heels rule&lt;/i&gt;. I'm not sure they'll notice.
&lt;br /&gt;&lt;span class=&#034;spip-puce ltr&#034;&gt;&lt;b&gt;&#8211;&lt;/b&gt;&lt;/span&gt; Not going to lie, one of my favorite things about the Cannes festival &lt;i&gt;is all of these handsome men in tuxedos.&lt;/i&gt;
&lt;br /&gt;&lt;span class=&#034;spip-puce ltr&#034;&gt;&lt;b&gt;&#8211;&lt;/b&gt;&lt;/span&gt; Cannes is relevant because &lt;i&gt;movies get timed standing ovations&lt;/i&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How to get the data?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To get access to the microblog corpus, email malek.hajjem@univ-avignon.fr or register for &lt;a href=&#034;http://clef2018-labs-registration.dei.unipd.it/&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;CLEF&lt;/a&gt;.&lt;br class='autobr' /&gt;
The English topics can be downloaded &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/English-topics.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;here&lt;/a&gt; &lt;br class='autobr' /&gt;
The French topics can be downloaded &lt;a href=&#034;https://mc2.talne.eu/~t17malek/mc2_2018_t2/opinion/en/arg/data_t2_sample/French_topics.csv&#034; class=&#034;spip_out&#034; rel=&#034;external&#034;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Contact Information&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you have any questions, email us at malek.hajjem@univ-avignon.fr&lt;/p&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>Towards Argumentative Ranking </title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=18</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=18</guid>
		<dc:date>2018-02-08T18:17:15Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>Chiraz Latiri, Julio Gonzalo, Malek Hajjem</dc:creator>



		<description>
&lt;p&gt;Organizers: &lt;br class='autobr' /&gt;
Chiraz Latiri, Julio Gonzalo, Malek Hajjem &lt;br class='autobr' /&gt;
Task 2 participation deadline April 30, 2018 &lt;br class='autobr' /&gt;
Argumentative Ranking of Microblogs &lt;br class='autobr' /&gt;
Argumentation mining is a new problem in corpus-based text analysis that addresses the challenging task of automatically identifying the justifications provided by opinion holders for their judgments. Several approaches to argumentation mining have been proposed so far in areas such as legal documents, on-line debates, product reviews, newspaper (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=12" rel="directory"&gt;2- Mining opinion argumentation&lt;/a&gt;


		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;p&gt;&lt;strong&gt;Organizers: &lt;br class='autobr' /&gt;
&lt;/strong&gt;&lt;br class='autobr' /&gt;
&lt;i&gt;Chiraz Latiri, Julio Gonzalo, Malek Hajjem&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;Task 2 participation deadline &lt;strong&gt;April 30, 2018&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Argumentative Ranking of Microblogs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Argumentation mining is a new problem in corpus-based text analysis that addresses the challenging task of automatically identifying the justifications provided by opinion holders for their judgments. Several approaches to argumentation mining have been proposed so far in areas such as legal documents, on-line debates, product reviews, newspaper articles and court cases, as well as in dialogical domains. &lt;br class='autobr' /&gt;
With the popularization of social networks, argumentation mining can be seen as an extension of opinion mining applied to social network content. The aim is to automatically identify reason-conclusion structures that can be used to model social web users' positions about a service or an event, as expressed on social network platforms like Twitter. Indeed, when we need to form an opinion on a new topic or make a decision, arguments are exactly what we look for.&lt;br class='autobr' /&gt; To make argumentation structures available in the case of Twitter, robust automatic recognition is required, based on resources that are created in a reproducible fashion so that they are reliable. However, the ambiguity of natural language text produced in social media, with its different writing styles, implicit context and heterogeneous content, makes argumentation mining on Twitter very challenging.&lt;/p&gt;
&lt;p&gt;Another possible way to pick up argumentation structures from a generic tweet corpus is to use approaches based on information extraction. The idea is to perform a search process that focuses on claims about a given topic within a massive collection. This approach relates to the field of focused retrieval, which aims to provide users with direct access to relevant information in retrieved documents. In this task, relevant information is expressed in the form of arguments [1].&lt;/p&gt;
&lt;p&gt;The success of such argumentative ranking will require interdisciplinary approaches that combine several research areas. To better understand a short text and detect the argumentative structures within a microblog, we could use &#171; text contextualization &#187; as a way to provide more information about the corresponding text [2]. Providing such information would help highlight the relevant tweets, in other words, those expressed in the form of arguments. Argumentation mining in this setting thus behaves like an Information Retrieval (IR) system in which potentially argumentative tweets have to come first. A similar approach is presented in [3], where the output of the priority task is a ranking of tweets according to their probability of being a potential threat to the reputation of some entity.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;[1] Argumentative Ranking&lt;br class='autobr' /&gt;
Marco Lippi, Paolo Sarti and Paolo Torroni. DISI - Universita degli Studi di Bologna. Proceedings of Natural Language Processing meets Journalism - IJCAI-16 Workshop (NLPMJ 2016), New York (July 2016)&lt;br class='autobr' /&gt;
[2] INEX Tweet Contextualization task: Evaluation, results and lesson learned&lt;br class='autobr' /&gt;
Patrice Bellot, V&#233;ronique Moriceau, Josiane Mothe, Eric SanJuan and Xavier Tannier. Inf. Process. Manage. 52(5): 801-819 (2016)&lt;br class='autobr' /&gt;
[3] Overview of RepLab 2013: Evaluating Online Reputation Monitoring Systems&lt;br class='autobr' /&gt;
Enrique Amigo, Jorge Carrillo de Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Tamara Martin, Edgar Meij, Maarten de Rijke and Damiano Spina&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>TimeLine Illustration based on Microblogs</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=14</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=14</guid>
		<dc:date>2016-10-19T19:42:26Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>Lorraine, Philippe</dc:creator>


		<dc:subject>CLEF 2016</dc:subject>

		<description>
&lt;p&gt;This paper by Nayanika DOGRA, Philippe MULHEM, Nawal OULD AMER, and Lorraine GOEURIOT presents the approach used by the LIG-MRIM research group in its participation in the pilot task TimeLine Illustration based on Microblogs at the 2016 CLEF Cultural Microblog Contextualization Workshop, which led to the 2017 lab.&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=7" rel="directory"&gt;3 - Time Line Illustration&lt;/a&gt;

/ 
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=mot&amp;id_mot=3" rel="tag"&gt;CLEF 2016&lt;/a&gt;

		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;p&gt;This paper by Nayanika DOGRA, Philippe MULHEM, Nawal OULD AMER, and Lorraine GOEURIOT presents the approach used by the LIG-MRIM research group in its participation in the pilot task TimeLine Illustration based on Microblogs at the 2016 CLEF Cultural Microblog Contextualization Workshop, which led to the 2017 lab.&lt;/p&gt;&lt;/div&gt;
		&lt;div class="hyperlien"&gt;View online : &lt;a href="http://ceur-ws.org/Vol-1609/16091201.pdf" class="spip_out"&gt;LIG at CLEF 2016 Cultural Microblog Contextualization: TimeLine Illustration based on Microblogs&lt;/a&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>Wikipedia XML corpus for summary generation</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=13</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=13</guid>
		<dc:date>2016-10-18T16:44:45Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>sanjuan</dc:creator>


		<dc:subject>data</dc:subject>
		<dc:subject>CLEF 2016</dc:subject>

		<description>
&lt;p&gt;Wikipedia is released under a Creative Commons license, and its contents can be used to contextualize tweets or to build complex queries referring to Wikipedia entities. &lt;br class='autobr' /&gt;
We have extracted an average of 10 million XML documents from Wikipedia per year since 2012 in the four main Twitter languages: en, es, fr and pt. &lt;br class='autobr' /&gt;
These documents reproduce in an easy-to-use XML structure the contents of the main Wikipedia pages: title, abstract, sections and subsections, as well as Wikipedia internal links. Other (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=5" rel="directory"&gt;1 - Content Analysis&lt;/a&gt;

/ 
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=mot&amp;id_mot=2" rel="tag"&gt;data&lt;/a&gt;, 
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=mot&amp;id_mot=3" rel="tag"&gt;CLEF 2016&lt;/a&gt;

		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;p&gt;Wikipedia is released under a Creative Commons license, and its contents can be used to contextualize tweets or to build complex queries referring to Wikipedia entities.&lt;/p&gt;
&lt;p&gt;We have extracted an average of 10 million XML documents from Wikipedia per year since 2012 in the four main Twitter languages: en, es, fr and pt.&lt;/p&gt;
&lt;p&gt;These documents reproduce in an easy-to-use XML structure the contents of the main Wikipedia pages: title, abstract, sections and subsections, as well as Wikipedia internal links. Other contents such as images, footnotes and external links are stripped out in order to obtain a corpus that is easier to process using standard NLP tools.&lt;/p&gt;
&lt;p&gt;By comparing contents over the years, it is possible to detect long-term trends.&lt;/p&gt;&lt;/div&gt;
		&lt;div class="hyperlien"&gt;View online : &lt;a href="http://tc.talne.eu/" class="spip_out"&gt;Micro Blog Contextualization CLEF &amp; Inex tracks data and tools&lt;/a&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>
<item xml:lang="en">
		<title>The festival galleries dataset</title>
		<link>https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=12</link>
		<guid isPermaLink="true">https://clef2018.clef-initiative.eu/mc2/spip.php?page=article&amp;id_article=12</guid>
		<dc:date>2016-10-18T16:31:57Z</dc:date>
		<dc:format>text/html</dc:format>
		<dc:language>en</dc:language>
		<dc:creator>sanjuan</dc:creator>


		<dc:subject>data</dc:subject>

		<description>
&lt;p&gt;This data set supports experiments on microblog search and stream summarization. &lt;br class='autobr' /&gt;
Microblog collection &lt;br class='autobr' /&gt;
The document collection is provided to registered participants by the ANR GAFES project. It consists of a pool of more than 50M unique micro-blogs from different sources with their meta-information as well as ground truth for the evaluation. &lt;br class='autobr' /&gt;
The microblog collection contains a very large pool of public Twitter posts containing the keyword festival, collected since June 2015. These micro-blogs are (&#8230;)&lt;/p&gt;


-
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=rubrique&amp;id_rubrique=9" rel="directory"&gt;Data&lt;/a&gt;

/ 
&lt;a href="https://clef2018.clef-initiative.eu/mc2/spip.php?page=mot&amp;id_mot=2" rel="tag"&gt;data&lt;/a&gt;

		</description>


 <content:encoded>&lt;div class='rss_texte'&gt;&lt;p&gt;This data set supports experiments on microblog search and stream summarization.&lt;/p&gt;
&lt;h2 class=&#034;spip&#034;&gt;Microblog collection&lt;/h2&gt;
&lt;p&gt;The document collection is provided to registered participants by the ANR GAFES project. It consists of a pool of more than 50M unique micro-blogs from different sources with their meta-information as well as ground truth for the evaluation.&lt;/p&gt;
&lt;p&gt;The microblog collection contains a very large pool of public Twitter posts containing the keyword festival, collected since June 2015. These micro-blogs are collected using private archive services based on the streaming API. The average number of unique microblog posts (i.e. without retweets) between June and September is 2,616,008 per month. The total number of micro-blog posts collected after one year (from May 2015 to May 2016) is 50,490,815 (24,684,975 without re-posts). These micro-blog posts are available online in a relational database with associated fields.&lt;/p&gt;
&lt;p&gt;Because of privacy issues, they cannot be publicly released, but they can be analyzed inside the organization that purchased these archives and among collaborators under a privacy agreement. The MC2 lab provides this opportunity to share the data among academic participants. These archives can be indexed and analyzed, and general results obtained from them can be published without restriction.&lt;/p&gt;
&lt;h2 class=&#034;spip&#034;&gt;Linked web pages &lt;/h2&gt;
&lt;p&gt;66% of the collected micro-blog posts contain Twitter t.co compressed URLs. Sometimes these URLs refer to other online services like adf.ly, cur.lv, dlvr.it or ow.ly that hide the real URL. We used the spider mode of the GNU wget tool to get the real URL; this process required multiple DNS requests.&lt;/p&gt;
&lt;p&gt;The number of unique uncompressed URLs collected in one year is 11,580,788 from 641,042 distinct domains.&lt;/p&gt;
&lt;h2 class=&#034;spip&#034;&gt;Getting access to the data set for scholars&lt;/h2&gt;&lt;ol class=&#034;spip&#034; role=&#034;list&#034;&gt;&lt;li&gt; Register your institution with CLEF.&lt;/li&gt;&lt;li&gt; Send a request by email to admin@talne.eu from the same domain as your institution, with full contact information.&lt;/li&gt;&lt;li&gt; If accepted, you will receive a confidentiality agreement to be approved by your institution.&lt;/li&gt;&lt;li&gt; Once we get back the agreement, you will receive personal information to access the lab data servers.&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;
		&lt;div class="hyperlien"&gt;View online : &lt;a href="http://ceur-ws.org/Vol-1609/16091197.pdf" class="spip_out"&gt;Cultural micro-blog Contextualization 2016 Workshop Overview: data and pilot tasks &lt;/a&gt;&lt;/div&gt;
		
		</content:encoded>


		

	</item>



</channel>

</rss>
