Capturing Truthiness: Mining Truth Tables in Binary Datasets
2007) Capturing Truthiness: Mining Truth Tables in Binary Datasets. Technical Report TR-07-10, Computer Science, Virginia Tech. (
We introduce a new data mining problem: mining truth tables in binary datasets. Given a matrix of objects and the properties they satisfy, a truth table identifies a subset of properties that exhibit maximal variability (and hence, complete independence) in occurrence patterns over the underlying objects. This problem is relevant in many domains, e.g., bioinformatics where we seek to identify and model independent components of combinatorial regulatory pathways, and in social/economic demographics where we desire to determine independent behavioral attributes of populations. Besides intrinsic interest in such patterns, we show how the problem of mining truth tables is dual to the problem of mining redescriptions, in that a set of properties involved in a truth table cannot participate in any possible redescription. This allows us to adapt our algorithm to the problem of mining redescriptions as well, by first identifying regions where redescriptions cannot happen, and then pursuing a divide and conquer strategy around these regions. Furthermore, our work suggests dual mining strategies where both classes of algorithms can be brought to bear upon either data mining task. We outline a family of levelwise approaches adapted to mining truth tables, algorithmic optimizations, and applications to bioinformatics and political datasets.
|Item Type:||Departmental Technical Report|
|Keywords:||Data mining; truth tables; levelwise algorithms.|
|Subjects:||Computer Science > Algorithms and Data Structure|
|Deposited By:||Ramakrishnan, Naren|
|Deposited On:||13 March 2007|