## Some Results on Finding Meaningful Association Rules in Data Mining

## 松山 哲平 (9951106)

One piece of useful information obtained by data mining from a database is a large itemset.
Large itemset is a set of items that frequently occur together in a database.
Another useful information obtained by data mining is a meaningful association rule.
An association rule is a formula of the form X → Y,
where X and Y are disjoint itemsets.
An intuitive meaning of this formula in a sales database is that when all the items in X are purchased by a customer,
all the items in Y are likely to be purchased together.
One of the well-studied problems in data mining is the search for meaningful
association rules in a database which contains massive amount of
transactions.
One way to find a meaningful
association rule is to compute a large itemset Z first, and then to divide
Z into left-hand side X and right-hand side Y such that
X → Y is a meaningful association rule.
A number of algorithms for computing all the large itemsets
have been proposed, however,
the computational complexity of their algorithms
has scarcely been discussed.
Most of the performances of them are estimated only by empirical evaluation through test data.
On the other hand, the problem to decide whether there
exits a large itemset of size at least given integer in a database
has shown to be NP-complete.
In respect of the division of a large itemset,
it has been believed that it is rather straightforward to divide Z into X and Y such that
X → Y is a meaningful association rule, if Z is given.

In this paper, we propose rare itemset as a useful information.
Rare itemset is a set of items that rarely occur together in a database.
We show that the problem to decide
whether there exists a rare itemset of size at most given integer
in a database is NP-complete.
Moreover, we show that the problem to decide whether there exists a meaningful
association rule is NP-complete, even if a large itemset is given.
Also, we propose class of databases in which all the meaningful association rules
can be computed efficiently.