### How to Handle “Missing Values” in Linguistic Typology: A Pitfall in the Statistical Modelling Approach

*Ono, Yohei*

Permalink: http://hdl.handle.net/2115/77605

KEYWORDS: Computational Linguistics; Descriptive Linguistics; Probability Axioms; Statistical Imputation; The World Atlas of Language Structure

#### Abstract

There are two main streams in statistical typology: one that learns WALS data using probability distributions, and another that elucidates WALS data without probability distributions. The two streams differ on three points: (1) the selection of WALS data for analysis; (2) the purpose of applying statistical methods to WALS data; and (3) the selection of statistical methods based on the previous two points. This paper focuses on the first stream, called the “statistical modelling approach” in this paper, and discusses whether probability distributions can be applied to “missing values” in WALS from the viewpoint of linguistic materials, taking Ainu, Chukchi, Khalkha, and Navajo as examples. The results demonstrate that “missing values” are not dealt with in the context of linguistic materials but are instead made to conform to statistical notions, which enables information scientists and statisticians to apply probability functions and probabilistic modelling. Thus, the statistical modelling approach does not learn what WALS data represent in terms of substantive linguistic knowledge, and it distorts WALS data in the statistical context. This raises a question about the fundamentals of statistical typology under the statistical modelling approach. Statistical typology should primarily address how the missing values in WALS are dealt with using probability functions. The findings indicate that interdisciplinary research between the humanities and information science/statistics requires information scientists and statisticians to explain their research using linguistic concepts, and linguists to explain their research using concepts from information science/statistics. This will enable mutual responses from both fields, with appropriate feedback from substantive knowledge, as well as constructive complementary studies.

