Materials informatics is an approach to developing materials that combines materials science with informatics techniques. High-throughput experimentation is also a key element of materials informatics. In both academia and industry, there are now many reports that apply materials informatics to real problems in order to understand mechanisms, predict properties, or design molecules. As a result, the central question in the chemical industry is shifting from ‘whether an informatics approach is useful’ to ‘how an informatics approach should be applied to real datasets’. The reason is that sufficiently complete datasets are rarely available for the difficult real-world problems that industry has to solve.
In this thesis, I propose how to apply data-science techniques to real, incomplete datasets in order to obtain useful knowledge. Specifically, I discuss how a data-science approach should be applied to a small and incomplete dataset of polymer properties. Because the polymer property dataset contains missing values, I discuss how these should be handled. I also show that unsupervised methods are useful for understanding the relationships among properties.
Using this incomplete dataset, I propose a data-science method for predicting polymer properties. The method is simple, is based on structural information about the monomer unit, and can handle a variety of properties. I evaluate the reliability of the prediction models and also propose a way to judge their generalization performance using data already obtained. On the basis of these considerations, I discuss the future outlook for materials informatics in industrial use.