Methods, Tools, Resources

2019

Ferraresi, A. and Bernardini, S. 2019. Building EPTIC: A many-sided, multi-purpose corpus of EU parliament proceedings. In I. Doval & M. T. Sánchez Nieto (eds.) Parallel Corpora for Contrastive and Translation Studies: New resources and applications. Amsterdam & Philadelphia: John Benjamins, pp. 123 - 139.

Scansani, R., Bentivogli, L., Bernardini, S. and Ferraresi, A. 2019. MAGMATic: A Multi-domain Academic Gold Standard with Manual Annotation of Terminology for Machine Translation Evaluation. In Proceedings of the Machine Translation Summit XVII. Volume 1: Research track, pp. 78 - 86.

2018

Bernardini, S., Ferraresi, A., Russo, M., Collard, C. and Defrancq, B. 2018. Building Interpreting and Intermodal Corpora: A How-to for a Formidable Task. In (eds.) Making Way in Corpus-based Interpreting Studies. Singapore: Springer, pp. 21 - 42.

Marchi, A. 2018. Dividing up the data: Epistemological, methodological and practical impact of diachronic segmentation. In C. Taylor & A. Marchi (eds.) Corpus Approaches to Discourse: A Critical Review. London & New York: Routledge / Taylor and Francis, pp. 174 - 196.

Taylor, C. and Marchi, A. (eds.) 2018. Corpus Approaches to Discourse: A Critical Review. London & New York: Routledge / Taylor and Francis.

2017

Miličević, M., Ljubešić, N. and Fišer, D. 2017. Birds of a feather don’t quite tweet together: An analysis of spelling variation in Slovene, Croatian and Serbian twitterese. In D. Fišer and M. Beißwenger (eds.) Investigating Computer-Mediated Communication: Corpus-Based Approaches to Language in the Digital World. Ljubljana: Scientific Publishing House of the Faculty of Arts, University of Ljubljana, pp. 14-43.

2016

Bernardini, S., Ferraresi, A. and Miličević, M. 2016. From EPIC to EPTIC: Exploring simplification in interpreting and translation from an intermodal perspective. Target 28, pp. 61 - 86.

Miličević, M. and Ljubešić, N. 2016. Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0 4(2): 156-188.

Samardžić, T. and Miličević, M. 2016. A framework for automatic acquisition of Croatian and Serbian verb aspect from corpora. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 4596-4601.

2015

Fišer, D., Erjavec, T., Ljubešić, N. and Miličević, M. 2015. Comparing the nonstandard language of Slovene, Croatian and Serbian tweets. In Smolej, M. (ed.) Simpozij Obdobja 34. Slovnica in slovar - aktualni jezikovni opis (1. del). Ljubljana: Filozofska fakulteta, pp. 225-231.

Miličević, M. 2015. Semi-automatic construction of comparable genre-oriented corpora of Serbian in Cyrillic and Latin scripts. Anali Filološkog fakulteta 27/2: 285-300.

2014

Miličević, M., Bernardini, S. and Ferraresi, A. 2014. Costruzione semi-automatica di corpora orientati al genere in lingue morfologicamente ricche: un paragone fra l'italiano e il serbo. Italica Belgradensia 1, pp. 99 - 114.

2013

Baroni, M. and Bernardini, S. 2013. Corpus Query Tools for lexicography. In Dictionaries. An international encyclopedia of lexicography. Supplementary volume: recent developments with focus on electronic and computational lexicography. Berlin & New York: Mouton de Gruyter, pp. 1395 - 1405.

Bernardini, S. and Ferraresi, A. 2013. Old needs, new solutions: Comparable corpora for language professionals. In Sharoff, S., Rapp, R., Zweigenbaum, P. and Fung, P. (eds.) Building and using comparable corpora. Berlin & Heidelberg: Springer, pp. 303 - 319.

Ferraresi, A. and Bernardini, S. 2013. The academic Web-as-Corpus. In Evert, S., Stemle, E. and Rayson, P. (eds.) Proceedings of the 8th Web as Corpus workshop (WAC-8). Stroudsburg, PA: Association for Computational Linguistics, pp. 53 - 62.

2012

Gabrielatos, C. and Marchi, A. 2012. Keyness: Appropriate metrics and practical issues. In Corpus-Assisted Discourse Studies: More than the sum of Discourse Analysis and computing? (Proceedings of the Corpora and Discourse International Conference, Bologna, 12-14 September 2012).

2009

Baroni, M., Bernardini, S., Ferraresi, A. and Zanchetta, E. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43(3), pp. 209 - 226.

Cirillo, L., Marchi, A. and Venuti, M. 2009. The making of the CorDis corpus: Compilation and markup. In J. Morley & P. Bayley (eds.) Corpus-Assisted Discourse Studies on the Iraq Conflict: Wording the War. London & New York: Routledge / Taylor and Francis, pp. 13 - 33.

Marchi, A. and Venuti, M. 2009. Mark up and the narrative structure of TV news. In L. Haarman & L. Lombardo (eds.) Evaluation and Stance in War News. London: Continuum, pp. 27 - 47.

2007

Marchi, A., Cirillo, L. and Venuti, M. 2007. The CorDis Corpus: Mark-up and related issues. In Proceedings from the Corpus Linguistics Conference Series 2, pp. 1 - 10.

2006

Baroni, M. and Bernardini, S. (eds.) 2006. WaCky! Working Papers on the Web as Corpus. Bologna: GEDIT.

Baroni, M. and Bernardini, S. 2006. A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary & Linguistic Computing 21(3), pp. 259 - 274.

Bernardini, S., Baroni, M. and Evert, S. 2006. A WaCky introduction. In M. Baroni & S. Bernardini (eds.) WaCky! Working Papers on the Web as Corpus. Bologna: GEDIT, pp. 9 - 40.

2004

Baroni, M. and Bernardini, S. 2004. BootCaT: Bootstrapping corpora and terms from the web. In M.T. Lino et al. (eds.) Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC 2004. Lisbon: ELDA, pp. 1313 - 1316.

Baroni, M., Bernardini, S., Comastri, F., Piccioni, L., Volpi, A., Aston, G. and Mazzoleni, M. 2004. Introducing the La Repubblica corpus: A large, annotated, TEI(XML)-compliant corpus of newspaper Italian. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC 2004. Paris: ELRA - European Language Resources Association, pp. 1771 - 1774.

Bendazzoli, C., Monti, C., Sandrelli, A., Russo, M., Baroni, M., Bernardini, S., Mack, G., Ballardini, E. and Mead, P. 2004. Towards the creation of an electronic corpus to study directionality in simultaneous interpreting. In Compiling and processing spoken language corpora: Proceedings of the LREC 2004 Satellite Workshop. Lisbon: ELDA, pp. 33 - 39.