Extraction and Quantification of Words Representing Degrees of Diseases: Combining the Fuzzy C-Means Method and Gaussian Membership

Han, Feng; Zhang, ZiHeng; Zhang, Hongjian; Nakaya, Jun; Kudo, Kohsuke; Ogasawara, Katsuhiko

doi:10.2196/38677


Files in This Item:

The file(s) associated with this item can be obtained from the following URL: https://doi.org/10.2196/38677

Title:	Extraction and Quantification of Words Representing Degrees of Diseases: Combining the Fuzzy C-Means Method and Gaussian Membership
Authors:	Han, Feng
	Zhang, ZiHeng
	Zhang, Hongjian
	Nakaya, Jun
	Kudo, Kohsuke
	Ogasawara, Katsuhiko
Keywords:	medical text
	fuzzy c-means
	cluster
	algorithm
	machine learning
	word quantification
	fuzzification
	Gauss
	radiology
	medical report
	documentation
	text mining
	data mining
	extraction
	unstructured
	free text
	quantification
	fuzzy
	diagnosis
	diagnostic
	EHR
	support system
Issue Date:	18-Nov-2022
Publisher:	Journal of Medical Internet Research (JMIR)
Journal Title:	JMIR Formative Research
Volume:	6
Issue:	11
Start Page:	e38677
Publisher DOI:	10.2196/38677
PMID:	36399376
Abstract:	Background: With the growth of medical data, large amounts of clinical data have been generated. These unstructured data contain substantial information, and extracting useful knowledge from them to support scientific decisions in diagnosing and treating diseases has become increasingly necessary. Unstructured data, such as those in the Medical Information Mart for Intensive Care III (MIMIC-III) data set, contain many ambiguous words that reflect physicians' subjective judgments, for example in descriptions of patient symptoms. These data could be used to further improve the accuracy of assessments made by medical diagnostic systems. To the best of our knowledge, there is currently no method for extracting the subjective words that express the extent of these symptoms (hereinafter, degree words).
	Objective: We therefore propose combining the fuzzy c-means (FCM) method with Gaussian membership to quantify the degree words in the clinical data set MIMIC-III.
	Methods: First, we preprocessed the 381,091 radiology reports in MIMIC-III and used the FCM method to extract degree words from the unstructured text. We then used the Gaussian membership method to quantify the extracted degree words, transforming the fuzzy words in the medical text into numbers that a computer can process.
	Results: The results showed that digitizing ambiguous words in medical texts is feasible. The words representing each degree of each disease had a range of corresponding values. Examples of membership medians were 2.971 (atelectasis), 3.121 (pneumonia), 2.899 (pneumothorax), 3.051 (pulmonary edema), and 2.435 (pulmonary embolus). Additionally, the words extracted for every disease included the same subjective words (low, high, etc), which makes an objective evaluation possible. In subsequent studies, we will verify the specific impact of quantifying ambiguous words, such as symptom words and degree words, on the use of medical texts. These ambiguous words may also serve as a new set of feature values for representing disorders.
	Conclusions: This study proposes an innovative method for handling subjective words. We used the FCM method to extract the subjective degree words from the English-language radiology interpretation reports in MIMIC-III and then used Gaussian functions to quantify them. With this method, subjective words in unstructured text can be processed automatically and transformed into numerical ranges, which shows that digitizing ambiguous words in medical texts is feasible.
Type:	article
URI:	http://hdl.handle.net/2115/88111
Appears in Collections:	Graduate School of Health Sciences / Faculty of Health Sciences > Peer-reviewed Journal Articles, etc
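The extraction step relies on fuzzy c-means (FCM). The abstract does not say which features the authors clustered, so the following is only a minimal sketch of the textbook FCM update loop in plain Python, assuming each candidate word has already been mapped to a numeric feature vector. The function name `fuzzy_c_means`, its parameters (cluster count `c`, fuzzifier `m`, tolerance, seed), and the toy scores are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(
    points: List[Point],
    c: int,
    m: float = 2.0,          # fuzzifier; must be > 1
    tol: float = 1e-6,       # stop when memberships change less than this
    max_iter: int = 300,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Textbook FCM. Returns (centers, u) where u[i][j] is point i's membership in cluster j."""
    n, dim = len(points), len(points[0])
    rng = random.Random(seed)
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])

    exponent = 2.0 / (m - 1.0)
    centers: List[List[float]] = []
    for _ in range(max_iter):
        # Center update: v_j = sum_i u_ij^m * x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_u = []
        for p in points:
            dists = [_dist(p, v) for v in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(hits)
                new_u.append([h / s for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    # Hypothetical 1-D scores for six candidate words (toy data, not from MIMIC-III).
    scores = [[0.9], [1.0], [1.1], [4.8], [5.0], [5.2]]
    centers, u = fuzzy_c_means(scores, c=2)
    print(sorted(round(v[0], 2) for v in centers))
```

On the toy scores the two centers settle close to 1.0 and 5.0, and every word keeps a graded membership in both clusters rather than a hard label. That graded output is the property that makes FCM a natural fit for ambiguous degree words; `m = 2` is simply the common default, not a value taken from the paper.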
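For the quantification step, a Gaussian membership function assigns each point on a numeric scale a degree of membership in (0, 1], and an alpha-cut turns that curve into a numeric range, in the spirit of the abstract's statement that each degree word maps to a range of values. The sketch below assumes a hypothetical 1-5 severity scale with hand-picked centers and widths for three example words (`mild`, `moderate`, `severe`). These numbers, the helper names `gaussian_membership`, `alpha_cut`, and `best_word`, and the choice of alpha = 0.5 are illustrative assumptions; they are not the values or the exact procedure reported in the paper.

```python
import math
from typing import Dict, Tuple

# Hypothetical (center, sigma) pairs on a hypothetical 1-5 severity scale.
# Illustrative values only, not results from the paper.
DEGREE_WORDS: Dict[str, Tuple[float, float]] = {
    "mild": (1.5, 0.5),
    "moderate": (3.0, 0.5),
    "severe": (4.5, 0.5),
}


def gaussian_membership(x: float, center: float, sigma: float) -> float:
    """Gaussian membership mu(x) = exp(-(x - center)^2 / (2 * sigma^2)), in (0, 1]."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def alpha_cut(center: float, sigma: float, alpha: float) -> Tuple[float, float]:
    """Return the interval of x where mu(x) >= alpha, for 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    half_width = sigma * math.sqrt(-2.0 * math.log(alpha))
    return center - half_width, center + half_width


def best_word(x: float) -> str:
    """Map a numeric value back to the degree word with the highest membership."""
    return max(DEGREE_WORDS, key=lambda w: gaussian_membership(x, *DEGREE_WORDS[w]))


if __name__ == "__main__":
    for word, (c, s) in DEGREE_WORDS.items():
        lo, hi = alpha_cut(c, s, alpha=0.5)
        print(f"{word}: [{lo:.2f}, {hi:.2f}]")
    print(best_word(3.2))
```

With these toy parameters the alpha = 0.5 ranges come out as [0.91, 2.09], [2.41, 3.59], and [3.91, 5.09], and `best_word(3.2)` returns `"moderate"`. A Gaussian is convenient here because each word is described by just two numbers (center and width) and membership falls off smoothly rather than at hard boundaries.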
