
Jun Pan
Hong Kong Baptist University
Jun Pan is Professor in the Department of Translation, Interpreting and Intercultural Studies and Director of the Academy of Language and Culture at Hong Kong Baptist University. With many years of experience as an interpreter, Prof. Pan has dedicated herself to the teaching and research of interpreting and translation. Her work covers a wide array of subjects, including corpus-based translation/interpreting studies, political discourse and translation/interpreting, digital humanities, learner factors and situated learning in interpreter training, bibliometric research in translation/interpreting studies, and professionalism in translation/interpreting. She is President of the Hong Kong Translation Society, Chair of International Relations of the Hong Kong Association of University Women, and (founding) Executive Committee Member of the University Women Asia (under Graduate Women International). She is also a poet.
Advancing Corpus-Based Interpreting Studies:
Re-Engineering Existing Corpora for Innovation and AI Integration
Abstract
This presentation examines the advancement of corpus-based interpreting studies (CIS) through the integration of existing corpora, including the CEPIC (Chinese-English Political Interpreting Corpus), EPIC (European Parliament Interpreting Corpus), and EPICG (European Parliament Interpreting Corpus Ghent). The discussion is framed within the context of a General Research Fund project of Hong Kong SAR’s Research Grants Council, titled Re-testing the Universals: "Mining" Interpreting Data through Re-Engineered Mega-Size Corpora (RGC Ref No. 12623122, 2022–present).
The presentation focuses on the first phase of the project: the integration of existing and comparable interpreting corpora through a process termed “re-engineering”. This process involves unifying the CEPIC, EPIC, and EPICG. Together, these corpora encompass a diverse range of European languages (Italian, Spanish, French, and Dutch) alongside Chinese and English. The investigators, who have led the development of these corpora, aim to establish a shared framework ensuring cross-corpus and cross-language comparability.
The talk delves into key innovations in corpus integration and analysis, with examples highlighting the role of artificial intelligence (AI) and natural language processing (NLP) tools. The presenters will first introduce the process of expanding the existing corpora, focusing on the use of the latest NLP and AI tools in corpus preparation, including data cleaning, alignment, and enhancement. This will be followed by a comparison of the three expanded corpora, specifically in terms of metadata structures, transcription protocols, and annotation practices such as part-of-speech (POS) tagging. Concrete examples will be provided to demonstrate how integration is applied and its impact on achieving cross-corpus comparability.
The talk will explore emerging issues in CIS, including the integration of multimodality (e.g., incorporating audio-visual data for analysis), the development and standardisation of data-sharing protocols, and the ethical considerations surrounding open access to interpreting corpora, particularly in relation to privacy, copyright, and researcher accountability. The presentation will also discuss the potential applications of interpreting corpora in machine learning, such as training AI models for automatic speech recognition (ASR), machine interpreting, and other language technologies.
This presentation highlights the transformative potential of these shared and re-engineered corpora, not only in advancing the theoretical understanding of interpreting studies but also in addressing practical needs in professional interpreting and technology development. By emphasising innovation, interdisciplinarity, and collaboration, it underscores the broader implications of corpus-based research for the future of interpreting studies in an increasingly AI-driven world.



