Apriori Class Reference

This class implements the APRIORI algorithm.

#include <Apriori.hpp>


Public Member Functions
	Apriori (ifstream &basket_file, const char *output_file_name)
void	APRIORI_alg (const unsigned long min_supp)
	This procedure implements the APRIORI algorithm.
	~Apriori ()
Private Member Functions
void	support (const itemtype &candidate_size)
	Determines the support of the candidates of the given size.
Private Attributes
Apriori_Trie *	apriori_trie
	A trie that stores the frequent itemsets and the candidates.
Input_Output_Manager	input_output_manager
	The input_output_manager that is responsible for the input, output and recoding operations.
map< vector< itemtype >, unsigned long >	reduced_baskets
	Stores the reduced baskets if store_input is true.
bool	store_input
	If store_input = true, then the reduced baskets will be stored in memory.

Detailed Description

This class implements the APRIORI algorithm.

APRIORI is a levelwise algorithm that scans the transaction database several times. The first scan finds the frequent 1-itemsets, and in general the $k^{th}$ scan extracts the frequent k-itemsets. The method does not determine the support of every possible itemset. To narrow the search space, it generates candidate itemsets before every pass: an itemset becomes a candidate if every proper subset of it is frequent. Every frequent itemset is necessarily a candidate, so it suffices to calculate the support of the candidates only. After the $k^{th}$ scan, the frequent k-itemsets are used to generate the candidate (k+1)-itemsets.

After all the candidate (k+1)-itemsets have been generated, the transactions are scanned again and the exact support of each candidate is determined. Candidates whose support is below the threshold min_supp are discarded. The algorithm ends when no candidates can be generated.

The intuition behind candidate generation is based on the following simple fact:

Every subset of a frequent itemset is frequent.

This is immediate, because if a transaction t supports an itemset X, then t supports every subset $Y\subseteq X$.

By contraposition, if an itemset has an infrequent subset, then it cannot be frequent. APRIORI therefore admits as candidates only those itemsets all of whose subsets are frequent. The frequent k-itemsets are available when the candidate (k+1)-itemsets are generated, and the algorithm looks for candidates among the unions of two frequent k-itemsets. After forming such a union X, we need to verify that all of its subsets are frequent; otherwise X must not become a candidate. To this end it is enough to check that all the k-subsets of X are frequent, because every smaller subset of X is contained in some k-subset, and every subset of a frequent itemset is frequent.
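As an illustration of this join-and-check step, the sketch below forms unions of two frequent k-itemsets that agree on their first k-1 items (a common way to obtain each union exactly once, assumed here rather than taken from this class) and then verifies the remaining k-subsets. The names Item, Itemset and generate_candidates are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Item = unsigned;              // hypothetical stand-in for itemtype
using Itemset = std::vector<Item>;  // items kept in ascending order

// Generates the candidate (k+1)-itemsets from the frequent k-itemsets.
// Assumes every itemset in frequent_k has the same size k >= 1.
std::set<Itemset> generate_candidates(const std::set<Itemset>& frequent_k) {
    std::set<Itemset> candidates;
    for (auto a = frequent_k.begin(); a != frequent_k.end(); ++a) {
        assert(!a->empty());
        auto b = a;
        for (++b; b != frequent_k.end(); ++b) {
            // Join: a and b must agree on all but their last item. The set is
            // ordered lexicographically, so itemsets sharing a prefix are
            // adjacent and the inner loop can stop at the first mismatch.
            if (!std::equal(a->begin(), a->end() - 1, b->begin())) break;
            Itemset x = *a;
            x.push_back(b->back());  // the union, still in ascending order
            // Check: dropping either of the last two items of x gives a or b,
            // which are frequent, so only the other k-subsets are looked up.
            bool all_frequent = true;
            for (std::size_t skip = 0; skip + 2 < x.size() && all_frequent; ++skip) {
                Itemset sub;
                for (std::size_t i = 0; i < x.size(); ++i)
                    if (i != skip) sub.push_back(x[i]);
                all_frequent = frequent_k.count(sub) > 0;
            }
            if (all_frequent) candidates.insert(x);
        }
    }
    return candidates;
}
```

For the frequent 2-itemsets {1,2}, {1,3}, {2,3}, {2,4}, the union {1,2,3} becomes a candidate, while {2,3,4} is rejected because its subset {3,4} is not frequent.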

Next the supports of the candidates are calculated by reading the transactions one by one. For each transaction t the algorithm decides which candidates are supported by t. To solve this task efficiently, the original APRIORI uses a hash-tree; this implementation uses a trie (prefix tree) instead. Tries have several advantages over hash-trees:

A trie is faster.
A trie needs no parameters (the main drawback of a hash-tree is that its performance is very sensitive to its parameters).
Candidate generation is very simple.
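To make the trie-based support counting concrete, here is a minimal sketch under stated assumptions. It is not the Apriori_Trie class: TrieNode, insert_itemset, count_basket and support_of are hypothetical names, and each node simply keeps its children in a std::map. A candidate corresponds to a root-to-node path, and a basket increments the counter of every candidate path that can be followed using its items in order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

using Item = unsigned;              // hypothetical stand-in for itemtype
using Itemset = std::vector<Item>;  // items kept in ascending order

struct TrieNode {
    unsigned long counter = 0;  // support of the itemset spelled by the path
    std::map<Item, std::unique_ptr<TrieNode>> children;
};

// Adds the path for itemset x, creating missing nodes on the way.
void insert_itemset(TrieNode& root, const Itemset& x) {
    assert(std::is_sorted(x.begin(), x.end()));
    TrieNode* node = &root;
    for (Item i : x) {
        std::unique_ptr<TrieNode>& child = node->children[i];
        if (!child) child.reset(new TrieNode);
        node = child.get();
    }
}

// Increments the counter of every node `remaining` levels below `node` whose
// path can be matched by items of the sorted basket from index `from` on.
void count_basket(TrieNode& node, const Itemset& basket,
                  std::size_t from, std::size_t remaining) {
    if (remaining == 0) { ++node.counter; return; }
    // Stop once too few basket items are left to complete a path.
    for (std::size_t pos = from; pos + remaining <= basket.size(); ++pos) {
        auto it = node.children.find(basket[pos]);
        if (it != node.children.end())
            count_basket(*it->second, basket, pos + 1, remaining - 1);
    }
}

// Reads back the counter of itemset x, or 0 if x is not in the trie.
unsigned long support_of(const TrieNode& root, const Itemset& x) {
    const TrieNode* node = &root;
    for (Item i : x) {
        auto it = node->children.find(i);
        if (it == node->children.end()) return 0;
        node = it->second.get();
    }
    return node->counter;
}
```

A caller would insert the candidates of size k, then call count_basket(root, basket, 0, k) once per basket. No bucket size or hash function has to be chosen, which is the "no parameters" advantage listed above.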

Definition at line 78 of file Apriori.hpp.

Constructor & Destructor Documentation

Apriori::Apriori ( ifstream & basket_file,

const char * output_file_name

)

Parameters:

basket_file The file that contains the transactions.

output_file_name The name of the file to which the results are written.

Definition at line 42 of file Apriori.cpp.

Apriori::~Apriori ( )

Definition at line 83 of file Apriori.cpp.
References apriori_trie.

Member Function Documentation

void Apriori::APRIORI_alg ( const unsigned long min_supp )

This procedure implements the APRIORI algorithm.

Parameters:

min_supp The relative support threshold

Definition at line 50 of file Apriori.cpp.
References apriori_trie, Apriori_Trie::candidate_generation(), Apriori_Trie::delete_infrequent(), Input_Output_Manager::find_frequent_items(), input_output_manager, Apriori_Trie::insert_frequent_items(), Apriori_Trie::is_there_any_candidate(), itemtype, Input_Output_Manager::rewind(), Apriori_Trie::statistics(), support(), and Input_Output_Manager::write_out_basket_and_counter().
Referenced by main().

void Apriori::support ( const itemtype & candidate_size ) [private]

Determines the support of the candidates of the given size.

Parameters:

candidate_size The size of the candidates whose support has to be determined.

Definition at line 21 of file Apriori.cpp.
References apriori_trie, Input_Output_Manager::basket_recode(), Apriori_Trie::find_candidate(), input_output_manager, itemtype, Input_Output_Manager::read_in_a_line(), and reduced_baskets.
Referenced by APRIORI_alg().

Member Data Documentation

Apriori_Trie* Apriori::apriori_trie [private]

A trie that stores the frequent itemsets and the candidates.

Definition at line 95 of file Apriori.hpp.
Referenced by APRIORI_alg(), support(), and ~Apriori().

Input_Output_Manager Apriori::input_output_manager [private]

The input_output_manager that is responsible for the input, output and recoding operations.

Definition at line 98 of file Apriori.hpp.
Referenced by APRIORI_alg(), and support().

map<vector<itemtype>, unsigned long> Apriori::reduced_baskets [private]

Stores the reduced baskets if store_input is true.

Definition at line 100 of file Apriori.hpp.
Referenced by support().
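The map type suggests that identical reduced baskets are stored once, together with a counter. The sketch below illustrates that idea under the assumption that "reducing" a basket means dropping its infrequent items; this page does not confirm that, and the recoding performed by Input_Output_Manager::basket_recode() is not reproduced. The function add_reduced_basket and the alias for itemtype are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

using itemtype = unsigned;  // assumption: the real typedef is not shown on this page

// Drops the infrequent items of `basket`, sorts what is left, and counts the
// result in `reduced_baskets`. Baskets that become empty are skipped.
void add_reduced_basket(const std::vector<itemtype>& basket,
                        const std::set<itemtype>& frequent_items,
                        std::map<std::vector<itemtype>, unsigned long>& reduced_baskets) {
    std::vector<itemtype> reduced;
    for (itemtype item : basket)
        if (frequent_items.count(item) > 0) reduced.push_back(item);
    std::sort(reduced.begin(), reduced.end());
    reduced.erase(std::unique(reduced.begin(), reduced.end()), reduced.end());
    assert(std::is_sorted(reduced.begin(), reduced.end()));
    if (!reduced.empty()) ++reduced_baskets[reduced];  // identical baskets share one entry
}
```

Under this reading, later passes can iterate over the map instead of re-reading the file, and a basket that occurs n times in reduced form contributes n to the support of each candidate it contains.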

bool Apriori::store_input [private]

If store_input = true, then the reduced baskets will be stored in memory.

Definition at line 103 of file Apriori.hpp.

The documentation for this class was generated from the following files:

Apriori.hpp
Apriori.cpp

Generated on Fri Sep 3 17:23:52 2004 for APRIORI algorithm by Doxygen 1.3.5
