Methods, Tools, Resources




Ferraresi, A. and Bernardini, S. 2019. Building EPTIC: A many-sided, multi-purpose corpus of EU parliament proceedings. In Doval, I. and Sánchez Nieto, M. T. (eds.) Parallel Corpora for Contrastive and Translation Studies. New resources and applications. Amsterdam/Philadelphia: John Benjamins, pp. 123 - 139.

Scansani, R., Bentivogli, L., Bernardini, S. and Ferraresi, A. 2019. MAGMATic: A Multi-domain Academic Gold Standard with Manual Annotation of Terminology for Machine Translation Evaluation. In Proceedings of the Machine Translation Summit XVII. Volume 1: Research track, pp. 78 - 86.


Bernardini, S., Ferraresi, A., Russo, M., Collard, C. and Defrancq, B. 2018. Building Interpreting and Intermodal Corpora: A How to for a Formidable Task. In (eds.) Making Way in Corpus-based Interpreting Studies. Singapore: Springer, pp. 21 - 42.


Marchi, A. 2018. Dividing up the data: Epistemological, methodological and practical impact of diachronic segmentation. In C. Taylor & A. Marchi (eds.) Corpus Approaches to Discourse: A Critical Review. London & New York: Routledge / Taylor and Francis, pp. 174 - 196.

Taylor, C. and Marchi, A. (eds.) 2018. Corpus Approaches to Discourse: A Critical Review. London & New York: Routledge / Taylor and Francis.


Miličević, M., Ljubešić, N. and Fišer, D. 2017. Birds of a feather don’t quite tweet together: An analysis of spelling variation in Slovene, Croatian and Serbian twitterese. In D. Fišer and M. Beißwenger (eds.) Investigating Computer-Mediated Communication: Corpus-Based Approaches to Language in the Digital World. Ljubljana: Scientific Publishing House of the Faculty of Arts, University of Ljubljana, pp. 14-43.


Bernardini, S., Ferraresi, A. and Miličević, M. 2016. From EPIC to EPTIC – Exploring simplification in interpreting and translation from an intermodal perspective. Target 28, pp. 61 - 86.

Miličević, M. and Ljubešić, N. 2016. Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0 4(2): 156-188.

Samardžić, T. and Miličević, M. 2016. A framework for automatic acquisition of Croatian and Serbian verb aspect from corpora. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 4596-4601.


Fišer, D., Erjavec, T., Ljubešić, N. and Miličević, M. 2015. Comparing the nonstandard language of Slovene, Croatian and Serbian tweets. In Smolej, M. (ed.) Simpozij Obdobja 34. Slovnica in slovar - aktualni jezikovni opis (1. del). Ljubljana: Filozofska fakulteta, pp. 225-231.

Miličević, M. 2015. Semi-automatic construction of comparable genre-oriented corpora of Serbian in Cyrillic and Latin scripts. Anali Filološkog fakulteta 27/2: 285-300. 


Miličević, M., Bernardini, S. and Ferraresi, A. 2014. Costruzione semi-automatica di corpora orientati al genere in lingue morfologicamente ricche: un paragone fra l'italiano e il serbo. Italica Belgradensia 1, pp. 99 - 114.


Baroni, M. and Bernardini, S. 2013. Corpus Query Tools for lexicography. In Dictionaries. An international encyclopedia of lexicography. Supplementary volume: recent developments with focus on electronic and computational lexicography. Berlin/New York: Mouton De Gruyter, pp. 1395 - 1405.

Bernardini, S. and Ferraresi, A. 2013. Old needs, new solutions: Comparable corpora for language professionals. In Sharoff, S., Rapp, R., Zweigenbaum, P. and Fung, P. (eds.) Building and using comparable corpora. Berlin/Heidelberg: Springer, pp. 303 - 319.

Ferraresi, A. and Bernardini, S. 2013. The academic Web-as-Corpus. In Evert, S., Stemle, E. and Rayson, P. (eds.) Proceedings of the 8th Web as Corpus workshop (WAC-8). Stroudsburg, PA: Association for Computational Linguistics, pp. 53 - 62.


Gabrielatos, C. and Marchi, A. 2012. Keyness: Appropriate metrics and practical issues. Corpus-Assisted Discourse Studies: More than the sum of Discourse Analysis and computing? (Proceedings of Corpora and Discourse International Conference, Bologna, 12-14 September 2012).


Baroni, M., Bernardini, S., Ferraresi, A. and Zanchetta, E. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43(3), pp. 209 - 226.

Cirillo, L., Marchi, A. and Venuti, M. 2009. The making of the CorDis corpus: Compilation and markup. In J. Morley & P. Bayley (eds.) Corpus-Assisted Discourse Studies on the Iraq Conflict: Wording the War. London & New York: Routledge / Taylor and Francis, pp. 13 - 33.

Marchi, A. and Venuti, M. 2009. Mark up and the narrative structure of TV news. In L. Haarman & L. Lombardo (eds.) Evaluation and Stance in War News. London: Continuum, pp. 27 - 47.


Marchi, A., Cirillo, L. and Venuti, M. 2007. The CorDis Corpus: Mark-up and related issues. In Proceedings from the Corpus Linguistics Conference Series 2, pp. 1 - 10.


Baroni, M. and Bernardini, S. (eds.) 2006. WaCky! Working Papers on the Web as Corpus. Bologna: GEDIT.

Baroni, M. and Bernardini, S. 2006. A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary & Linguistic Computing 21(3), pp. 259 - 274.

Bernardini, S., Baroni, M. and Evert, S. 2006. A WaCky introduction. In M. Baroni & S. Bernardini (eds.) WaCky! Working Papers on the Web as Corpus. Bologna: GEDIT, pp. 9 - 40.


Baroni, M. and Bernardini, S. 2004. BootCaT: Bootstrapping corpora and terms from the web. In M.T. Lino et al. (eds.) Proceedings of Fourth International Conference on Language Resources and Evaluation, LREC 2004. Lisbon: ELDA, pp. 1313 - 1316.

Baroni, M., Bernardini, S., Comastri, F., Piccioni, L., Volpi, A., Aston, G. and Mazzoleni, M. 2004. Introducing the La Repubblica corpus: A large, annotated, TEI(XML)-compliant corpus of newspaper Italian. In Proceedings of Fourth International Conference on Language Resources and Evaluation, LREC 2004. Paris: ELRA - European Language Resources Association, pp. 1771 - 1774. 

Bendazzoli, C., Monti, C., Sandrelli, A., Russo, M., Baroni, M., Bernardini, S., Mack, G., Ballardini, E. and Mead, P. 2004. Towards the creation of an electronic corpus to study directionality in simultaneous interpreting. In Compiling and processing spoken language corpora: Proceedings of the LREC 2004 Satellite Workshop. Lisbon: ELDA, pp. 33 - 39.