Some Results on Finding Meaningful Association Rules in Data Mining

松山 哲平 (9951106)


One piece of useful information obtained by data mining from a database is a large itemset. Large itemset is a set of items that frequently occur together in a database. Another useful information obtained by data mining is a meaningful association rule. An association rule is a formula of the form X → Y, where X and Y are disjoint itemsets. An intuitive meaning of this formula in a sales database is that when all the items in X are purchased by a customer, all the items in Y are likely to be purchased together.

One of the well-studied problems in data mining is the search for meaningful association rules in a database which contains massive amount of transactions. One way to find a meaningful association rule is to compute a large itemset Z first, and then to divide Z into left-hand side X and right-hand side Y such that X → Y is a meaningful association rule. A number of algorithms for computing all the large itemsets have been proposed, however, the computational complexity of their algorithms has scarcely been discussed. Most of the performances of them are estimated only by empirical evaluation through test data. On the other hand, the problem to decide whether there exits a large itemset of size at least given integer in a database has shown to be NP-complete. In respect of the division of a large itemset, it has been believed that it is rather straightforward to divide Z into X and Y such that X → Y is a meaningful association rule, if Z is given.

In this paper, we propose rare itemset as a useful information. Rare itemset is a set of items that rarely occur together in a database. We show that the problem to decide whether there exists a rare itemset of size at most given integer in a database is NP-complete. Moreover, we show that the problem to decide whether there exists a meaningful association rule is NP-complete, even if a large itemset is given. Also, we propose class of databases in which all the meaningful association rules can be computed efficiently.