
PP02-24: Electronic Trial Master File (eTMF) Automation using Intelligent OCR and RPA

Poster Presenter

      Rika Hayashi

      • Otsuka Pharmaceutical Co., Ltd.


We developed prototype programs that use intelligent Optical Character Recognition (OCR) and Robotic Process Automation (RPA) to evaluate the potential for eTMF automation, because the documentation process in Japan is paper-based and resource-intensive.


We examined a combination of OCR and RPA for extracting meta-data from paper documents. We also evaluated whether adding thesaurus dictionaries to the OCR data-capture process improved read accuracy.
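As an illustration of how a thesaurus dictionary might improve read accuracy, the following is a minimal sketch, not the prototype's actual implementation. It assumes the OCR step yields text tokens and that the dictionary maps known variants (including variant Kanji forms) to one canonical term. The dictionary entries, the function names `build_lookup` and `normalize`, and the 0.8 similarity cutoff are all hypothetical; fuzzy matching with Python's `difflib` stands in for whatever matching the real system used.

```python
import difflib

# Hypothetical thesaurus: canonical term -> known variants (illustrative entries only).
THESAURUS = {
    "治験審査委員会": ["IRB", "治験審査委員會"],
    "治験実施計画書番号": ["プロトコル番号", "Protocol No."],
}


def build_lookup(thesaurus):
    """Flatten the thesaurus into a variant -> canonical mapping."""
    lookup = {}
    for canonical, variants in thesaurus.items():
        lookup[canonical] = canonical
        for variant in variants:
            lookup[variant] = canonical
    return lookup


def normalize(token, lookup, cutoff=0.8):
    """Map an OCR token to its canonical term.

    Exact dictionary hits are resolved first; otherwise the closest known
    term above the (assumed) similarity cutoff is used. Tokens with no
    close match are returned unchanged so they can be reviewed manually.
    """
    if token in lookup:
        return lookup[token]
    matches = difflib.get_close_matches(token, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else token


lookup = build_lookup(THESAURUS)
print(normalize("IRB", lookup))             # exact variant hit
print(normalize("治験審査委貝会", lookup))    # one misread character (貝 for 員)
print(normalize("xyz", lookup))             # unknown token is left as-is
```

Returning unmatched tokens unchanged, rather than forcing a match, keeps misreads visible to the staff who check the extracted meta-data.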


Through a series of interviews with staff involved in clinical operations and document control, we identified the detailed process of eTMF registration. Approximately one million documents are filed annually in the eTMF system. The interviews showed that meta-data collection from paper documents was a critical task in the process. This step demanded manual effort and skill because of the variety of meta-data fields, such as documentation data, protocol number, and IRB information. In addition, document formats were unstructured and differed between sites and CROs. The error rate at this step was the highest in the entire eTMF process. We therefore considered that automating meta-data collection from source documents would improve productivity in terms of human resources, cost, and quality. Furthermore, Japanese Kanji characters have wide variation, and handwriting is sometimes hard for conventional OCR to recognize. For this reason, we incorporated a thesaurus dictionary into the OCR data-capture system to improve read accuracy, and adopted intelligent OCR technology to recognize the unstructured layout of each document. The prototype system integrating a thesaurus dictionary with intelligent OCR showed more than 90% accuracy and capabilities sufficient for practical use. We also incorporated RPA into the prototype system to consolidate the meta-data obtained by the OCR system into a single Excel report, which the staff responsible for eTMF registration receive. With the combination of RPA and OCR, the prototype system for meta-data collection was completed and was considered efficient enough for practical use.
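The consolidation step can be sketched roughly as follows. This is a hedged illustration only: the prototype used an RPA tool and produced an Excel file, whereas this sketch uses Python's standard `csv` module and writes CSV text (which Excel can open). The field names, the `confidence` value, the `needs_review` column, and the 0.9 threshold are assumptions introduced for the example, not details from the project.

```python
import csv
import io

# Hypothetical meta-data fields; the real eTMF field set is not given in the abstract.
FIELDS = ["source_file", "protocol_number", "irb_name", "confidence"]


def build_report(records, min_confidence=0.9):
    """Merge per-document OCR results into one CSV report (returned as text).

    A row is flagged for manual review when any field is empty or when the
    OCR confidence falls below the assumed threshold.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS + ["needs_review"], lineterminator="\n"
    )
    writer.writeheader()
    for rec in records:
        row = {field: rec.get(field, "") for field in FIELDS}
        missing = any(row[field] == "" for field in FIELDS)
        low = (not missing) and float(row["confidence"]) < min_confidence
        row["needs_review"] = "yes" if (missing or low) else "no"
        writer.writerow(row)
    return buf.getvalue()


# Example input: one complete record and one with a missing protocol number.
records = [
    {"source_file": "a.pdf", "protocol_number": "P-001",
     "irb_name": "Example IRB", "confidence": 0.97},
    {"source_file": "b.pdf", "protocol_number": "",
     "irb_name": "Example IRB", "confidence": 0.95},
]
print(build_report(records))
```

Flagging incomplete or low-confidence rows in the report itself lets registration staff focus their checks on the documents most likely to contain errors.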


The project explored the integration of OCR and RPA tools to innovate the eTMF registration process. We did not incorporate the experiment into direct operations in the eTMF system in a validated form. We tested the practical use and cost-effectiveness of OCR and RPA implementation. Resources for the eTMF process could be reduced by more than half if the entire eTMF registration process were automated. Further system development will be considered depending on business needs.