
Jun Pan

Hong Kong Baptist University

Jun Pan is Professor in the Department of Translation, Interpreting and Intercultural Studies and Director of the Academy of Language and Culture at Hong Kong Baptist University. Having worked as an interpreter for many years, Prof. Pan has dedicated herself to teaching and research in interpreting and translation. Her work covers a wide array of subjects, including corpus-based translation/interpreting studies, political discourse and translation/interpreting, digital humanities, learner factors and situated learning in interpreter training, bibliometric research in translation/interpreting studies, and professionalism in translation/interpreting. She is President of the Hong Kong Translation Society, Chair of International Relations of the Hong Kong Association of University Women, and (founding) Executive Committee Member of University Women Asia (under Graduate Women International). She is also a poet.

 

Advancing Corpus-Based Interpreting Studies: Re-Engineering Existing Corpora for Innovation and AI Integration

Abstract

This presentation examines the advancement of corpus-based interpreting studies (CIS) through the integration of existing corpora, including the CEPIC (Chinese-English Political Interpreting Corpus), EPIC (European Parliament Interpreting Corpus), and EPICG (European Parliament Interpreting Corpus Ghent). The discussion is framed within the context of a General Research Fund project of Hong Kong SAR’s Research Grants Council, Re-testing the Universals: "Mining" Interpreting Data through Re-Engineered Mega-Size Corpora (RGC Ref No. 12623122, 2022–present).

The presentation focuses on the first phase of the project: the integration of existing and comparable interpreting corpora through a process termed “re-engineering”. This process involves unifying the CEPIC, EPIC, and EPICG. Together, these corpora encompass a diverse range of European languages (Italian, Spanish, French, and Dutch) alongside Chinese and English. The investigators, who have led the development of these corpora, aim to establish a shared framework ensuring cross-corpus and cross-language comparability.

The talk delves into key innovations in corpus integration and analysis, with examples highlighting the role of artificial intelligence (AI) and natural language processing (NLP) tools. The presenters will first introduce the process of expanding the existing corpora, focusing on the use of the latest NLP and AI tools in corpus preparation, including data cleaning, alignment, and enhancement. This will be followed by a comparison of the three expanded corpora, specifically in terms of metadata structures, transcription protocols, and annotation practices such as part-of-speech (POS) tagging. Concrete examples will be provided to demonstrate how integration is applied and its impact on achieving cross-corpus comparability.

The talk will explore emerging issues in CIS, including the integration of multimodality (e.g., incorporating audio-visual data for analysis), the development and standardisation of data-sharing protocols, and the ethical considerations surrounding open access to interpreting corpora, particularly in relation to privacy, copyright, and researcher accountability. The presentation will also discuss the potential applications of interpreting corpora in machine learning, such as training AI models for automatic speech recognition (ASR), machine interpreting, and other language technologies.

This presentation highlights the transformative potential of these shared and re-engineered corpora, not only in advancing the theoretical understanding of interpreting studies but also in addressing practical needs in professional interpreting and technology development. By emphasising innovation, interdisciplinarity, and collaboration, it underscores the broader implications of corpus-based research for the future of interpreting studies in an increasingly AI-driven world.


This symposium is supported by the General Research Fund (GRF) of Hong Kong’s Research Grants Council (Project No. 12623122).
