Studies on the Information Extraction of Inorganic Material Synthesis Procedure from Literature

Liu Shanshan


Data-driven methods have already delivered large benefits in many fields; however, researchers in inorganic chemistry still cannot take advantage of them and spend a great deal of time reading the literature and manually organizing the information they need. This study uses natural language processing (NLP) to automatically extract material synthesis procedures from the literature and so meet the data needs of this field. First, we define the synthesis procedure so that the overall task decomposes into two classic NLP tasks: named entity recognition and relation extraction. We apply neural network models that excel in other domains and identify their shortcomings and possible directions for development in this domain. We also try rule-based methods. Experiments show that the rule-based methods perform well on our task, and that the neural network method combined with rules performs better than the neural network method without hand-crafted features. Our research also reveals the complexity and difficulty of such extraction problems and suggests directions for further research.