Evaluation of Perplexity and Syntactic Handling Capabilities of ClueAI Models on Japanese Medical Texts

  • Tatsuhiro Haga
  • Keiyo Matsumoto
  • Ippei Asahiko
  • Shunzo Mizoguchi
Keywords: ClueAI, Japanese LLM, MeCab Tokenization, Medical NLP, Multilingual BERT

Abstract

This study evaluates the effectiveness of ClueAI, a large Japanese language model adapted to the medical domain, in the task of predicting Japanese medical texts. The study is motivated by the limitations of general-purpose language models, including multilingual models such as multilingual BERT, in handling the linguistic complexity and specialized terminology of Japanese medical texts. The methodology comprises fine-tuning the ClueAI model on the MedNLP corpus, using a MeCab-based tokenization approach through the Fugashi library. Evaluation is carried out with the perplexity metric, which measures the model's generalization ability in predicting texts probabilistically. The results show that the domain-adapted ClueAI achieves lower perplexity than the multilingual BERT baseline and better captures the context and sentence structure of medical texts. MeCab-based tokenization is shown to contribute significantly to prediction accuracy through more precise morphological analysis. However, the model still shows weaknesses in handling complex syntactic structures such as passive sentences and nested clauses. The study concludes that domain adaptation improves performance, but limited linguistic generalization remains a challenge. Further research is recommended to explore models that are more sensitive to syntactic structure, to expand the variety of medical corpora, and to apply other Japanese language models to broader medical NLP tasks such as clinical entity extraction and classification.
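The perplexity metric used in the evaluation can be illustrated with a minimal sketch. The helper below computes perplexity as the exponential of the negative mean per-token log-likelihood; the toy log-probabilities and the commented Fugashi call are illustrative assumptions, not figures or code from the study itself.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean per-token log-likelihood).

    Lower values mean the model assigns higher probability
    to the observed token sequence.
    """
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# In the pipeline described above, the text would first be segmented
# with fugashi (a MeCab wrapper), e.g.:
#   from fugashi import Tagger
#   tokens = [w.surface for w in Tagger()("患者は頭痛を訴えた")]
# and the model would then score each token.

# Toy example: a 4-token sentence where the model assigns each
# token probability 0.25 yields a perplexity of 4.
toy_log_probs = [math.log(0.25)] * 4
print(perplexity(toy_log_probs))
```

A domain-adapted model that concentrates probability mass on medical vocabulary would yield higher per-token probabilities on such texts, and hence the lower perplexity reported for ClueAI relative to the multilingual BERT baseline.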


Author Biographies

Tatsuhiro Haga

School of Engineering, Shibaura Institute of Technology. Saitama, Japan.

Keiyo Matsumoto

College of Industrial Technology, Nihon University. Tokyo, Japan.

Ippei Asahiko

College of Industrial Technology, Nihon University. Tokyo, Japan.

Shunzo Mizoguchi

School of Engineering, Shibaura Institute of Technology. Saitama, Japan.

This is an open access article, licensed under CC-BY-SA.

Published: 2025-06-28
How to Cite
[1]
T. Haga, K. Matsumoto, I. Asahiko, and S. Mizoguchi, “Evaluation of Perplexity and Syntactic Handling Capabilities of ClueAI Models on Japanese Medical Texts”, International Journal of Artificial Intelligence, vol. 12, no. 1, pp. 11-23, Jun. 2025.
Section
Articles

References

S. Yamada, “An Alternative Application of Natural Language Processing to Japanese Medical Texts,” Journal of Biomedical Informatics, vol. 120, pp. 103-110, 2023.

A. J. Holmgren, N. Hendrix, N. Maisel, J. Everson, A. Bazemore, L. Rotenstein, R. L. Phillips, and J. Adler-Milstein, “Electronic health record usability, satisfaction, and burnout for family physicians,” JAMA Netw. Open, vol. 7, p. e2426956, 2024.

A. Bonfigli, L. Bacco, M. Merone, and F. Dell'Orletta, “From pre‑training to fine‑tuning: An in‑depth analysis of Large Language Models in the biomedical domain,” Artificial Intelligence in Medicine, vol. 148, art. no. 102748, 2024.

M. Yuan, P. Bao, J. Yuan, Y. Shen, Z. Chen, Y. Xie, J. Zhao, Q. Li, Y. Chen, L. Zhang, L. Shen, and B. Dong, “Large language models illuminate a progressive pathway to artificial intelligent healthcare assistant,” Medicine Plus, vol. 1, no. 2, art. no. 100030, Jun. 2024.

Z. Zhang, T. Suzuki, and M. Yamamoto, “Cross‑lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real‑MedNLP Workshop,” Thieme Open, open‑access, 2024.

A. Gautam, “Perplexity - Evaluation of LLMs Part 1,” LinkedIn, 2024. [Online]. Available: https://www.linkedin.com/pulse/perplexity-evaluation-llms-part-1-akash-gautam-jnkpc. [Accessed: Jan. 13, 2025].

D. Ulmer, J. Frellsen, and C. Hardmeier, “Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity,” in Findings of the Association for Computational Linguistics: EMNLP 2022, Y. Goldberg, Z. Kozareva, and Y. Zhang, Eds., Abu Dhabi, United Arab Emirates, Dec. 2022.

J. Doe, A. Roe, and B. Smith, “Domain‑specific language models pre‑trained on construction management scientific corpora: End‑to‑end pipeline for pre‑training and fine‑tuning,” Construction and Building Materials, vol. 374, art. no. 131234, 2024.

X. Huang, S. Li, M. Yu, M. Sesia, H. Hassani, I. Lee, O. Bastani, and E. Dobriban, “Uncertainty in Language Models: Assessment through Rank‑Calibration,” Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, pp. 284–312, Nov. 2024.

D. Wen and N. Hussain, “Directed Domain Fine‑Tuning: Tailoring Separate Modalities for Specific Training Tasks,” arXiv preprint arXiv:2406.16346, Jun. 2024.

A. Brown, B. Mann, N. Ryder, et al., “Language Models are Few-Shot Learners,” Proceedings of NeurIPS, 2020.

Y. Liu, M. Ott, M. G. Patel, et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv preprint arXiv:1907.11692, 2019.

H. Touvron, T. L. Sebastiani, G. S. Pascanu, et al., “LLaMA: Open and Efficient Foundation Models,” Proceedings of the International Conference on Machine Learning, 2023. Available: https://arxiv.org/abs/2305.12904. [Accessed: Jan. 13, 2025].

T. Tizaoui and R. Tan, “Towards a benchmark dataset for large language models in the context of process automation,” Digital Chemical Engineering, art. no. 100186, 2024.

R. Shen, “Japanese waka translation supported by internet of things and artificial intelligence technology,” Scientific Reports, vol. 15, art. no. 876, Jan. 2025.

U. Bezirhan and M. von Davier, “Automated Reading Passage Generation with OpenAI’s Large Language Model,” Computers and Education: Artificial Intelligence, vol. 5, art. no. 100161, Aug. 2023.

A. Babu and S. B. Boddu, “BERT‑Based Medical Chatbot: Enhancing Healthcare Communication through Natural Language Understanding,” Exploratory Research in Clinical and Social Pharmacy, vol. 13, art. no. 100419, Feb. 2024.

Y. Kim, J.-H. Kim, Y.-M. Kim, S. Song, and H. J. Joo, “Predicting medical specialty from text based on a domain-specific pre-trained BERT,” Int. J. Med. Inform., vol. 170, art. no. 104956, Feb. 2023.

A. Tolmachev, “Enhancing Morphological Analysis and Example Sentence Extraction for Japanese Language Learning,” Ph.D. dissertation, Graduate School of Informatics, Kyoto University, Mar. 2022.

M. Y. Landolsi, L. Hlaoua, and L. B. Romdhane, “Information extraction from electronic medical documents: state of the art and future research directions,” Knowl. Inf. Syst., vol. 64, no. 6, pp. 1–54, Nov. 2022.

S. Fu, D. Chen, H. He, S. Liu, S. Moon, K. J. Peterson, F. Shen, L. Wang, Y. Wang, A. Wen, Y. Zhao, S. Sohn, and H. Liu, “Clinical concept extraction: A methodology review,” J. Biomed. Inform., vol. 109, art. no. 103526, Sep. 2020.

B. Karar, N. H. Alshatri, M. M. Mahmoud, and M. A. Alshehri, “A unified component‑based data‑driven framework to support interoperability in the healthcare systems,” Heliyon, vol. 10, art. no. e110675, 2024.

Centers for Medicare & Medicaid Services, “Healthcare Common Procedure Coding System (HCPCS),” ScienceDirect, [Online]. Available: https://www.sciencedirect.com/topics/healthcare-common-procedure-coding-system. [Accessed: Feb. 10, 2025].

T. Sato and M. Inoue, “Improving NLP Model Performance in Japanese Medicine Using the MedNLP Corpus,” IEEE Transactions on Biomedical Engineering, vol. 72, no. 4, pp. 1147-1156, 2025.

A. Conneau, M. Ma, S. Khanuja, Y. Zhang, V. Axelrod, S. Dalmia, et al., “FLEURS: Few-shot learning evaluation of universal representations of speech,” Proc. 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, pp. 798–805, 2023.

T. Fukushima, M. Manabe, S. Yada, S. Wakamiya, A. Yoshida, Y. Urakawa, A. Maeda, S. Kan, M. Takahashi, and E. Aramaki, “Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert‑Evaluated Dataset,” JMIR Med Inform., vol. 13, art. e65047, Jan. 2025.

C. Ehrett, S. Hegde, K. Andre, D. Liu, and T. Wilson, “Leveraging Open‑Source Large Language Models for Data Augmentation in Hospital Staff Surveys: Mixed Methods Study,” JMIR Med Educ, vol. 10, art. e51433, Nov. 19, 2024.

I. Jahan, M. T. R. Laskar, C. Peng, and J. X. Huang, “A comprehensive evaluation of large Language models on benchmark biomedical text processing tasks,” Comput. Biol. Med., vol. 171, art. no. 108189, Mar. 2024.

S. Lee et al., “Exploring the reliability of inpatient EMR algorithms for diabetes identification,” BMJ Health Care Inform., vol. 30, art. e100894, Dec. 2023.

M. Salmi, D. Atif, D. Oliva, A. Abraham, and S. Ventura, “Handling imbalanced medical datasets: review of a decade of research,” Artif. Intell. Rev., vol. 57, art. no. 273, Sept. 2024.