Hokkaido University Collection of Scholarly and Academic Papers (HUSCAP)

Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks

Title: Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks
Authors: Nakamura, Tomohiro; Sakaue, Shinsaku; Fujii, Kaito; Harabuchi, Yu; Maeda, Satoshi; Iwata, Satoru
Issue Date: 2022
Publisher: Nature Portfolio
Journal Title: Scientific reports
Volume: 12
Issue: 1
Start Page: 1124
Publisher DOI: 10.1038/s41598-022-04967-9
Abstract: Selecting diverse molecules from unexplored areas of chemical space is one of the most important tasks for discovering novel molecules and reactions. This paper proposes a new approach for selecting a subset of diverse molecules from a given molecular list by using two existing techniques studied in machine learning and mathematical optimization: graph neural networks (GNNs) for learning vector representation of molecules and a diverse-selection framework called submodular function maximization. Our method, called SubMo-GNN, first trains a GNN with property prediction tasks, and then the trained GNN transforms molecular graphs into molecular vectors, which capture both properties and structures of molecules. Finally, to obtain a subset of diverse molecules, we define a submodular function, which quantifies the diversity of molecular vectors, and find a subset of molecular vectors with a large submodular function value. This can be done efficiently by using the greedy algorithm, and the diversity of selected molecules measured by the submodular function value is mathematically guaranteed to be at least 63% of that of an optimal selection. We also introduce a new evaluation criterion to measure the diversity of selected molecules based on molecular properties. Computational experiments confirm that our SubMo-GNN successfully selects diverse molecules from the QM9 dataset regarding the property-based criterion, while performing comparably to existing methods regarding standard structure-based criteria. We also demonstrate that SubMo-GNN with a GNN trained on the QM9 dataset can select diverse molecules even from other MoleculeNet datasets whose domains are different from the QM9 dataset. The proposed method enables researchers to obtain diverse sets of molecules for discovering new molecules and novel chemical reactions, and the proposed diversity criterion is useful for discussing the diversity of molecular libraries from a new property-based perspective.
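The greedy selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which submodular function SubMo-GNN uses, so the log-determinant objective below is an assumption chosen only because it is a standard monotone submodular diversity measure. The function names (`log_det_diversity`, `greedy_select`) and the random stand-in vectors are hypothetical, and no GNN is involved here. For a monotone submodular objective under a cardinality constraint, this kind of greedy procedure attains at least a (1 - 1/e) fraction, roughly 63%, of the optimal value, which matches the guarantee quoted in the abstract.

```python
import numpy as np


def log_det_diversity(vectors, subset):
    """Illustrative objective (an assumption, not taken from the abstract):
    f(S) = log det(I + X_S X_S^T), which is monotone submodular with f(empty) = 0.
    """
    if not subset:
        return 0.0
    x = vectors[list(subset)]
    gram = x @ x.T  # pairwise inner products of the selected vectors
    _, logdet = np.linalg.slogdet(np.eye(len(subset)) + gram)
    return float(logdet)


def greedy_select(vectors, k, objective=log_det_diversity):
    """Pick up to k indices, each time adding the item with the largest marginal gain."""
    selected = []
    remaining = set(range(len(vectors)))
    current = objective(vectors, selected)
    for _ in range(min(k, len(vectors))):
        best_gain, best_idx = -np.inf, None
        for i in sorted(remaining):  # sorted so ties resolve to the lowest index
            gain = objective(vectors, selected + [i]) - current
            if gain > best_gain:
                best_gain, best_idx = gain, i
        selected.append(best_idx)
        remaining.remove(best_idx)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Random vectors stand in for GNN-derived molecular vectors (hypothetical data).
    rng = np.random.default_rng(0)
    molecular_vectors = rng.normal(size=(50, 8))
    chosen = greedy_select(molecular_vectors, k=5)
    print(chosen, log_det_diversity(molecular_vectors, chosen))
```

This naive version re-evaluates the objective for every candidate at every step, which is adequate for small lists; larger molecular lists would typically call for incremental updates or lazy evaluation of marginal gains.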
Type: article
Appears in Collections: Graduate School of Science / Faculty of Science > Peer-reviewed Journal Articles, etc.
