
Hokkaido University Collection of Scholarly and Academic Papers

Extraction and Quantification of Words Representing Degrees of Diseases : Combining the Fuzzy C-Means Method and Gaussian Membership


Title: Extraction and Quantification of Words Representing Degrees of Diseases : Combining the Fuzzy C-Means Method and Gaussian Membership
Authors: Han, Feng
Zhang, ZiHeng
Zhang, Hongjian
Nakaya, Jun
Kudo, Kohsuke
Ogasawara, Katsuhiko
Keywords: medical text
fuzzy c-means
machine learning
word quantification
medical report
text mining
data mining
free text
support system
Issue Date: 18-Nov-2022
Publisher: Journal of Medical Internet Research (JMIR)
Journal Title: JMIR Formative Research
Volume: 6
Issue: 11
Start Page: e38677
Publisher DOI: 10.2196/38677
PMID: 36399376
Abstract:
Background: With the growth of medical data, a large amount of clinical data has been generated. These unstructured data contain substantial information, and extracting useful knowledge from them to support scientific decisions in diagnosing and treating diseases has become increasingly necessary. Unstructured data, such as those in the Medical Information Mart for Intensive Care III (MIMIC-III) data set, contain ambiguous words that reflect physicians' subjective judgments, for example in descriptions of patient symptoms. These data could be used to further improve the accuracy of assessments made by medical diagnostic systems. To the best of our knowledge, no method currently exists for extracting the subjective words that express the extent of these symptoms (hereinafter, "degree words").
Objective: We therefore propose using the fuzzy c-means (FCM) method and Gaussian membership to quantify the degree words in the clinical medical data set MIMIC-III.
Methods: First, we preprocessed the 381,091 radiology reports collected in MIMIC-III, and then we used the FCM method to extract degree words from the unstructured text. We then used the Gaussian membership method to quantify the extracted degree words, transforming the fuzzy words extracted from the medical text into computer-recognizable numbers.
Results: The results showed that digitizing ambiguous words in medical texts is feasible. The words representing each degree of each disease had a range of corresponding values. Examples of membership medians were 2.971 (atelectasis), 3.121 (pneumonia), 2.899 (pneumothorax), 3.051 (pulmonary edema), and 2.435 (pulmonary embolus). Additionally, the words extracted for all diseases contained the same subjective words (low, high, etc), which allows for an objective evaluation method. In subsequent studies, we will verify the specific impact of quantifying ambiguous words, such as symptom words and degree words, on the use of medical texts. These ambiguous words may also be used as a new set of feature values to represent disorders.
Conclusions: This study proposes an innovative method for handling subjective words. We used the FCM method to extract the subjective degree words in the English-language radiology reports of MIMIC-III and then used Gaussian functions to quantify them. With this method, subjective words in unstructured text can be processed automatically and transformed into numerical ranges. We conclude that digitizing ambiguous words in medical texts is feasible.
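The abstract names two standard techniques, fuzzy c-means clustering and Gaussian membership, without implementation details. The following is a minimal Python sketch of the textbook versions of both; it is not the authors' code. It assumes Euclidean distance, a fuzzifier m = 2, and random initial memberships. How the study turned words into numeric features, how many clusters it used, and how it chose the Gaussian centers and widths are not stated here, so the function names and the toy values in the usage example are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, u): u[i][j] is the membership of point i in cluster j,
    and every row of u sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / wsum for d in range(dim)]
            )

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        power = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a center: share full membership among such centers.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                hits = sum(row)
                row = [v / hits for v in row]
            else:
                row = [
                    1.0 / sum((dists[j] / dists[k]) ** power for k in range(c))
                    for j in range(c)
                ]
            new_u.append(row)

        # Stop once no membership changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def gaussian_membership(x, center, sigma):
    """Gaussian membership mu(x) = exp(-(x - center)^2 / (2 * sigma^2)), in (0, 1]."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Hypothetical 1-D "degree scores" for six words; not data from the study.
    toy = [[0.1], [0.2], [0.15], [0.9], [1.0], [0.95]]
    centers, u = fuzzy_c_means(toy, c=2)
    print("centers:", sorted(round(ctr[0], 3) for ctr in centers))
    # Hypothetical Gaussian set centered at 2.0 with width 0.5.
    print("mu(2.5):", round(gaussian_membership(2.5, 2.0, 0.5), 3))
```

On the toy points above, the two cluster centers should land near 0.15 and 0.95, and gaussian_membership(2.5, 2.0, 0.5) equals exp(-0.5), about 0.607. Unlike hard k-means, FCM gives every word a graded membership in each cluster, which is the property that makes it a natural fit for words whose degree is inherently vague.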
Type: article
Appears in Collections: Graduate School of Health Sciences / Faculty of Health Sciences > Peer-reviewed Journal Articles, etc
