Package 'arc' reference manual

Title:	Association Rule Classification
Description:	Implements the Classification Based on Association Rules (CBA) algorithm for association rule classification. The package, also described in Hahsler et al. (2019) <doi:10.32614/RJ-2019-048>, contains several convenience methods that automatically set CBA parameters (minimum confidence, minimum support), and it natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the 'arules' package. To further decrease the size of the CBA models produced by the 'arc' package, postprocessing by the 'qCBA' package is suggested.
Authors:	Tomas Kliegr [aut, cre]
Maintainer:	Tomas Kliegr <[email protected]>
License:	GPL-3
Version:	1.4.1
Built:	2025-03-07 04:49:19 UTC
Source:	https://github.com/kliegr/arc

Apply Cut Points to Vector

Description

Applies cut points to vector.

Usage

applyCut(col, cuts, infinite_bounds, labels)

Arguments

`col`	input vector with data.
`cuts`	vector with cut points. Two special values are defined: `NULL` indicates that no discretization will be performed, but the values will be converted to a factor; `"All"` indicates that all values will be merged into one bin.
`infinite_bounds`	a logical indicating how the outermost intervals are bounded. If set to `FALSE`, the leftmost/rightmost intervals will be bounded by the minimum and maximum in the respective column. If set to `TRUE`, the leftmost/rightmost intervals will be bounded by negative and positive infinity.
`labels`	a logical indicating how the bins of the discretized data are represented: as integer codes when set to FALSE, or in interval notation (a;b] when set to TRUE.

Value

Vector with discretized data.

Examples

  applyCut(datasets::iris[[1]], c(3,6), TRUE, TRUE)
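
The effect of `cuts`, `infinite_bounds` and `labels` can be approximated with base R alone. The following is a minimal sketch, not the arc implementation: the helper name `sketch_apply_cut` is hypothetical, and the use of `cut()` with right-closed bins is an assumption made for illustration.

```r
# Hypothetical helper built on base R only; it is not the arc implementation.
sketch_apply_cut <- function(col, cuts, infinite_bounds = TRUE, labels = TRUE) {
  # Outer bounds: infinite, or the observed minimum and maximum.
  lo <- if (infinite_bounds) -Inf else min(col)
  hi <- if (infinite_bounds) Inf else max(col)
  breaks <- unique(c(lo, cuts, hi))
  # Right-closed bins; include.lowest keeps the minimum in the first bin.
  bins <- cut(col, breaks = breaks, include.lowest = TRUE)
  if (!labels) {
    return(as.integer(bins))
  }
  # Switch base R's "(3,6]" labels to the "(3;6]" notation.
  factor(gsub(",", ";", as.character(bins), fixed = TRUE))
}

x <- c(2.5, 4.0, 6.0, 7.1)
codes <- sketch_apply_cut(x, c(3, 6), infinite_bounds = TRUE, labels = FALSE)
intervals <- sketch_apply_cut(x, c(3, 6), infinite_bounds = TRUE, labels = TRUE)
print(codes)
print(intervals)
```

With cut points 3 and 6, the value 6.0 falls into the middle bin because the bins are closed on the right.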

Apply Cut Points to Data Frame

Description

Applies cut points to input data frame.

Usage

applyCuts(df, cutp, infinite_bounds, labels)

Arguments

`df`	input data frame.
`cutp`	a list of vectors with cutpoints (for more information see `applyCut`).
`infinite_bounds`	a logical indicating how the outermost intervals are bounded (for more information see `applyCut`).
`labels`	a logical indicating whether the bins of the discretized data should be represented by integer codes or as interval notation using (a;b] when set to TRUE.

Value

Discretized data. Columns for which no discretization was specified are returned as is.

Examples

  applyCuts(datasets::iris, list(c(5,6), c(2,3), "All", NULL, NULL), TRUE, TRUE)

CBA Classifier

Description

Learns a CBA rule set from the supplied data frame.

Usage

cba(train, classAtt, rulelearning_options = NULL, pruning_options = NULL)

Arguments

`train`	a data frame with data.
`classAtt`	the name of the class attribute.
`rulelearning_options`	custom options for the rule learning algorithm overriding the default values. If not specified, the `topRules` function is called and the defaults specified there are used. Supported options: `target_rule_count` (int): mining stops when the resulting rule set contains this number of rules; `trim` (boolean): if set to TRUE and more than `target_rule_count` rules are discovered, only the first `target_rule_count` rules are returned; `minsupp` (float): minimum support threshold; `minconf` (float): minimum confidence threshold; `minlen` (int): minimum length of rules, where minlen=1 corresponds to a rule with an empty antecedent and one item in the consequent (rules with an empty antecedent are generally not desirable for the subsequent pruning algorithm, so this value should be at least 2); `maxlen` (int): maximum length of rules, which should be equal to or higher than minlen (a higher value may decrease the number of iterations needed to obtain `target_rule_count` rules, but it also increases the risk of an initial combinatorial explosion and a subsequent memory crash of the apriori rule learner); `maxtime` (int): maximum number of seconds 'apriori' may take to obtain rules; `find_conf_supp_thresholds` (boolean): whether to use automatic threshold detection.
`pruning_options`	custom options for the pruning algorithm overriding the default values.

Value

Object of class CBARuleModel.

Examples

 # Example using automatic threshold detection
 cba(datasets::iris, "Species", rulelearning_options = list(target_rule_count = 50000))
 # Example using manually set confidence and support thresholds
 rm <- cba(datasets::iris, "Species", rulelearning_options = list(minsupp=0.01,
   minconf=0.5, minlen=1, maxlen=5, maxtime=1000, target_rule_count=50000, trim=TRUE,
   find_conf_supp_thresholds=FALSE))
 inspect(rm@rules)

CBA Classifier from provided rules

Description

Learns a CBA rule set from the supplied rules.

Usage

cba_manual(
  train_raw,
  rules,
  txns,
  rhs,
  classAtt,
  cutp,
  pruning_options = list(input_list_sorted_by_length = FALSE)
)

Arguments

`train_raw`	a data frame with raw data (numeric attributes are not discretized).
`rules`	rules class instance output by the apriori function of the arules package.
`txns`	Transactions class instance passed to the arules method invocation. Transactions are created over discretized data frame - numeric values are replaced with intervals such as "(13;45]".
`rhs`	character vectors giving the labels of the items which can appear in the RHS ($rhs element of the APappearance class instance passed to the arules call)
`classAtt`	the name of the class attribute.
`cutp`	list of cutpoints used to discretize data (required for application of the model on continuous data)
`pruning_options`	custom options for the pruning algorithm overriding the default values.

Value

Object of class CBARuleModel.

Examples

  data(humtemp)
  data_raw<-humtemp
  data_discr <- humtemp

  #custom discretization
  data_discr[,1]<-cut(humtemp[,1],breaks=seq(from=15,to=45,by=5))
  data_discr[,2]<-cut(humtemp[,2],breaks=c(0,40,60,80,100))

  #change interval syntax from (15,20] to (15;20], which is required by MARC
  data_discr[,1]<-as.factor(unlist(lapply(data_discr[,1], function(x) {gsub(",", ";", x)})))
  data_discr[,2]<-as.factor(unlist(lapply(data_discr[,2], function(x) {gsub(",", ";", x)})))
  data_discr[,3] <- as.factor(humtemp[,3])

  #mine rules
  classAtt="Class"
  appearance <- getAppearance(data_discr, classAtt)
  txns_discr <- as(data_discr, "transactions")
  rules <- apriori(txns_discr, parameter =
   list(confidence = 0.5, support= 3/nrow(data_discr), minlen=1, maxlen=5), appearance=appearance)
  inspect(rules)


  rmCBA <- cba_manual(data_raw,  rules, txns_discr, appearance$rhs,
  classAtt, cutp= list(), pruning_options=NULL)
  inspect (rmCBA@rules)
  prediction <- predict(rmCBA,data_discr,discretize=FALSE)
  acc <- CBARuleModelAccuracy(prediction, data_discr[[classAtt]])
  print(paste("Accuracy:",acc))

Example CBA Workflow with CSV Input

Description

Learns a CBA rule set and saves the resulting rule set back to csv.

Usage

cbaCSV(
  path,
  outpath = NULL,
  classAtt = NULL,
  idcolumn = NULL,
  rulelearning_options = NULL,
  pruning_options = NULL
)

Arguments

`path`	path to csv file with data.
`outpath`	path to write the rule set to.
`classAtt`	the name of the class attribute.
`idcolumn`	the name of the id column in the data file.
`rulelearning_options`	custom options for the rule learning algorithm overriding the default values.
`pruning_options`	custom options for the pruning algorithm overriding the default values.

Value

Object of class CBARuleModel

Examples

 # cbaCSV("path-to-.csv")

Test CBA Workflow on Iris Dataset

Description

Test workflow on iris dataset: learns a CBA classifier on one "train set" fold, and applies it to the second "test set" fold.

Usage

cbaIris()

Value

Accuracy.

Test CBA Workflow on Iris Dataset with numeric target

Description

Test workflow on iris dataset: learns a CBA classifier on one "train set" fold, and applies it to the second "test set" fold.

Usage

cbaIrisNumeric()

Value

Accuracy.

CBARuleModel

Description

This class represents a rule-based classifier.

Slots

rules: an object of class rules from arules package
cutp: list of cutpoints
classAtt: name of the target class attribute
attTypes: attribute types

Unsupervised Discretization

Description

Discretizes provided numeric vector.

Usage

discretizeUnsupervised(
  data,
  labels = FALSE,
  infinite_bounds = FALSE,
  categories = 3,
  method = "cluster"
)

Arguments

`data`	input numeric vector.
`labels`	a logical indicating whether the bins of the discretized data should be represented by integer codes or as interval notation using (a;b] when set to TRUE.
`infinite_bounds`	a logical indicating how the outermost intervals are bounded.
`categories`	number of categories (bins) to produce.
`method`	clustering method, one of "interval" (equal interval width), "frequency" (equal frequency), "cluster" (k-means clustering). See also documentation of the `discretize` function from the arules package.

Value

Discretized data. If there was no discretization specified for some columns, these are returned as is.

Examples

  discretizeUnsupervised(datasets::iris[[1]])
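
The difference between the "interval" and "frequency" methods can be seen on a small skewed vector. This is a base R sketch under stated assumptions: the break computations with `seq()` and `quantile()` illustrate the two ideas and are not taken from the arc or arules sources, and the toy vector `x` is made up.

```r
# Toy data with one outlier (hypothetical values).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
k <- 3  # number of bins, mirroring the `categories` argument

# Equal interval width: split the range [min, max] into k equal parts.
interval_breaks <- seq(min(x), max(x), length.out = k + 1)

# Equal frequency: place breaks at quantiles so bins hold similar counts.
frequency_breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1),
                             names = FALSE)

interval_counts <- as.integer(table(cut(x, interval_breaks, include.lowest = TRUE)))
frequency_counts <- as.integer(table(cut(x, frequency_breaks, include.lowest = TRUE)))
print(interval_counts)
print(frequency_counts)
```

Equal-width bins leave the middle bin empty because of the outlier, while equal-frequency bins spread the nine values evenly.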

Discretize Numeric Columns In Data frame

Description

Can discretize both predictor columns in data frame – using supervised algorithm MDLP (Fayyad & Irani, 1993) – and the target class – using unsupervised algorithm (k-Means). This R file contains fragments of code from the GPL-licensed R discretization package by HyunJi Kim.

Usage

discrNumeric(
  df,
  classatt,
  min_distinct_values = 3,
  unsupervised_bins = 3,
  discretize_class = FALSE
)

Arguments

`df`	a data frame with data.
`classatt`	name of the class attribute in df.
`min_distinct_values`	the minimum number of unique values a column needs to have to be subject to supervised discretization.
`unsupervised_bins`	number of target bins for discretizing the class attribute. Ignored when the class attribute is not numeric or when `discretize_class` is set to FALSE.
`discretize_class`	logical value indicating whether the class attribute should be discretized. Ignored when the class attribute is not numeric.

Value

list with two slots: $cutp with cutpoints and $Disc.data with discretization results

References

Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pp. 1022–1027.

Examples

  discrNumeric(datasets::iris, "Species")

Method that generates items for values in given data frame column.

Description

Method that generates items for values in given data frame column.

Usage

getAppearance(df, classAtt)

Arguments

`df`	a data frame containing column `classAtt`.
`classAtt`	name of the column in `df` to generate items for.

Value

appearance object for mining classification rules

Examples

getAppearance(datasets::iris,"Species")
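
For orientation, an appearance list of the kind used elsewhere in this manual (right-hand side restricted to class items, everything else on the left-hand side) can be written by hand. This is a sketch only: it assumes the `attribute=value` item naming shown in the `topRules` examples and does not claim to reproduce the exact object returned by `getAppearance`.

```r
# Build class items of the form "Species=setosa" from the factor levels.
class_att <- "Species"
class_items <- paste0(class_att, "=", levels(datasets::iris[[class_att]]))

# Named list in the shape accepted by the `appearance` argument of apriori.
appearance_sketch <- list(rhs = class_items, default = "lhs")
print(appearance_sketch)
```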

Returns vector with confidences for the positive class (useful for ROC or AUC computation)

Description

Methods for computing ROC curves require a vector of confidences for the positive class, whereas in CBA, predict with outputConfidenceScores = TRUE returns the confidence for the predicted class. This method converts the values to confidences for the positive class.

Usage

getConfVectorForROC(confidences, predictedClass, positiveClass)

Arguments

`confidences`	Vector of confidences
`predictedClass`	Vector with predicted classes
`positiveClass`	Positive class (String)

Value

Vector of confidence values

Examples

predictedClass = c("setosa","virginica")
confidences = c(0.9,0.6)
baseClass="setosa"
getConfVectorForROC(confidences,predictedClass,baseClass)

# Further examples showing how ROC curve and AUC values can be computed
# using this function are available at project's GitHub homepage.
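
One plausible reading of the conversion for a binary problem is sketched below. It is an assumption, not the arc source: when the predicted class is the positive class the confidence is kept, otherwise the complement is used. The helper name `sketch_conf_for_roc` is hypothetical.

```r
# Hypothetical re-implementation for a binary problem; arc itself may differ.
sketch_conf_for_roc <- function(confidences, predictedClass, positiveClass) {
  # Keep the confidence when the positive class was predicted,
  # otherwise use 1 - confidence as the score for the positive class.
  ifelse(predictedClass == positiveClass, confidences, 1 - confidences)
}

scores <- sketch_conf_for_roc(c(0.9, 0.6), c("setosa", "virginica"), "setosa")
print(scores)
```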

Comfort level based on temperature and humidity of the environment

Description

A synthetic toy dataset. The variables are listed under Details.

Usage

data(humtemp)

Format

A data frame with 34 rows and 3 variables

Details

Temperature.
Humidity.
Class. Comfort level

Supervised Discretization

Description

Performs supervised discretization of numeric columns, except class, on the provided data frame. Uses the Minimum Description Length Principle algorithm (Fayyad and Irani, 1993) as implemented in the discretization package.

Usage

mdlp2(
  df,
  cl_index = NULL,
  handle_missing = FALSE,
  labels = FALSE,
  skip_nonnumeric = FALSE,
  infinite_bounds = FALSE,
  min_distinct_values = 3
)

Arguments

`df`	input data frame.
`cl_index`	index of the class variable. If not specified, the last column is used as the class variable.
`handle_missing`	Setting to TRUE activates the following behaviour: if there are any missing observations in the column processed, the input for discretization is a subset of data containing this column and target with rows containing missing values excluded.
`labels`	A logical indicating whether the bins of the discretized data should be represented by integer codes or as interval notation using (a;b] when set to TRUE.
`skip_nonnumeric`	If set to TRUE, any non-numeric columns will be skipped.
`infinite_bounds`	A logical indicating how the outermost intervals are bounded.
`min_distinct_values`	If a column contains fewer than the specified number of distinct values, it is not discretized.

Value

Discretized data. If there were any non-numeric input columns they are returned as is. All returned columns except class are factors.

References

Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pp. 1022–1027.

Examples

  mdlp2(datasets::iris) #gives the same result as mdlp(datasets::iris) from the discretization package
  #uses Sepal.Length as target variable
  mdlp2(df=datasets::iris, cl_index = 1,handle_missing = TRUE, labels = TRUE,
  skip_nonnumeric = TRUE, infinite_bounds = TRUE, min_distinct_values = 30)
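
The core step of entropy-based supervised discretization is choosing the cut point that minimizes the class entropy of the two resulting partitions. The sketch below shows only that step in base R; it omits the recursive splitting and the MDL stopping criterion of Fayyad and Irani, and the helper names and toy data are hypothetical.

```r
# Class entropy (in bits) of a label vector.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Hypothetical helper: best single cut point by weighted class entropy.
best_cut <- function(x, y) {
  vals <- sort(unique(x))
  if (length(vals) < 2) return(NULL)
  # Candidate cut points lie midway between adjacent distinct values.
  candidates <- (head(vals, -1) + tail(vals, -1)) / 2
  score <- sapply(candidates, function(cp) {
    left <- y[x <= cp]
    right <- y[x > cp]
    (length(left) * entropy(left) + length(right) * entropy(right)) / length(y)
  })
  candidates[which.min(score)]
}

x <- c(1, 2, 3, 10, 11, 12)
y <- c("a", "a", "a", "b", "b", "b")
cut_point <- best_cut(x, y)
print(cut_point)
```

Here the midpoint between 3 and 10 separates the two classes perfectly, so both partitions have zero entropy.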

Apply Rule Model

Description

Method that matches rule model against test data.

Usage

## S3 method for class 'CBARuleModel'
predict(
  object,
  data,
  discretize = TRUE,
  outputFiringRuleIDs = FALSE,
  outputConfidenceScores = FALSE,
  confScoreType = "ordered",
  positiveClass = NULL,
  ...
)

Arguments

`object`	a CBARuleModel class instance
`data`	a data frame with data
`discretize`	boolean indicating whether the passed data should be discretized using the cut points stored in the @cutp slot of the `object` argument.
`outputFiringRuleIDs`	if set to TRUE, instead of predictions, the function will return one-based IDs of rules used to classify each instance (one rule per instance).
`outputConfidenceScores`	if set to TRUE, instead of predictions, the function will return confidences of the firing rule
`confScoreType`	applicable only if 'outputConfidenceScores=TRUE', possible values 'ordered' for confidence computed only for training instances reaching this rule, or 'global' for standard rule confidence computed from the complete training data
`positiveClass`	This setting is only used if 'outputConfidenceScores=TRUE'. It should be used only for binary problems. In this case, the confidence values are recalculated so that these are not confidence values of the predicted class (default behaviour of 'outputConfidenceScores=TRUE') but rather confidence values associated with the class designated as positive
`...`	other arguments (currently not used)

Value

A vector with predictions.

Examples

  set.seed(101)
  allData <- datasets::iris[sample(nrow(datasets::iris)),]
  trainFold <- allData[1:100,]
  testFold <- allData[101:nrow(allData),]
  #increase for more accurate results in longer time
  target_rule_count <- 1000
  classAtt <- "Species"
  rm <- cba(trainFold, classAtt, list(target_rule_count = target_rule_count))
  prediction <- predict(rm, testFold)
  acc <- CBARuleModelAccuracy(prediction, testFold[[classAtt]])
  message(acc)
  # get rules responsible for each prediction
  firingRuleIDs <- predict(rm, testFold, outputFiringRuleIDs=TRUE)
  # show rule responsible for prediction of test instance no. 28
  inspect(rm@rules[firingRuleIDs[28]])
  # get prediction confidence (three different versions)
  rm@rules[firingRuleIDs[28]]@quality$confidence
  rm@rules[firingRuleIDs[28]]@quality$orderedConf
  rm@rules[firingRuleIDs[28]]@quality$cumulativeConf
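
Classification with an ordered rule list means that each instance is labelled by the first rule whose antecedent matches, with a default rule at the end. The following base R sketch illustrates that idea only; the rule representation, thresholds and helper name are hypothetical and unrelated to the internal structure of `CBARuleModel`.

```r
# Hypothetical ordered rule list; the last entry acts as the default rule.
rules <- list(
  list(matches = function(row) row$Petal.Length <= 2.5, class = "setosa"),
  list(matches = function(row) row$Petal.Width > 1.7, class = "virginica"),
  list(matches = function(row) TRUE, class = "versicolor")
)

classify_first_match <- function(df, rules) {
  ids <- vapply(seq_len(nrow(df)), function(i) {
    row <- df[i, , drop = FALSE]
    fired <- vapply(rules, function(r) isTRUE(r$matches(row)), logical(1))
    which(fired)[1]  # one-based ID of the first matching rule
  }, integer(1))
  data.frame(
    firing_rule = ids,
    prediction = vapply(rules[ids], function(r) r$class, character(1))
  )
}

toy <- data.frame(Petal.Length = c(1.4, 4.7, 6.0), Petal.Width = c(0.2, 1.4, 2.5))
res <- classify_first_match(toy, rules)
print(res)
```

The second toy row matches neither specific rule and is therefore handled by the default rule.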

Classifier Builder

Description

An implementation of the CBA-CB M1 algorithm (Liu et al, 1998) adapted for R and arules package apriori implementation in place of CBA-RG.

Usage

prune(
  rules,
  txns,
  classitems,
  default_rule_pruning = TRUE,
  rule_window = 50000,
  greedy_pruning = FALSE,
  input_list_sorted_by_length = TRUE,
  debug = FALSE
)

Arguments

`rules`	object of class rules from arules package
`txns`	input object with transactions.
`classitems`	a list of items to appear in the consequent (rhs) of the rules.
`default_rule_pruning`	boolean indicating whether default pruning should be performed. If set to TRUE, default pruning is performed as in the CBA algorithm. If set to FALSE, default pruning is not performed i.e. all rules surviving data coverage pruning are kept. In either case, a default rule is added to the end of the classifier.
`rule_window`	the number of rules to precompute for CBA data coverage pruning. The default value can be adjusted to decrease runtime.
`greedy_pruning`	setting to TRUE activates early stopping condition: pruning will be stopped on first rule on which total error increases.
`input_list_sorted_by_length`	indicates whether the input rule list is already sorted by antecedent length (as output by arules), which is the default assumption. If set to FALSE, the list will be re-sorted.
`debug`	output debug messages.

Value

Returns an object of class rules. Note that 'rules@quality' slot has been extended with additional measures, specifically 'orderedConf', 'orderedSupp', and 'cumulativeConf'. The rules are output in the order in which they are assumed to be applied in classification. Only the first applicable rule is used to classify the instance. As a result, in addition to rule confidence – which is computed over the whole training dataset – it makes sense to define order-sensitive confidence, which is computed only from instances reaching the given rule as $a/(a+b)$ , where $a$ is the number of instances matching both the antecedent and consequent (available in slot 'orderedSupp') and $b$ is the number of instances matching the antecedent, but not matching the consequent of the given rule. The cumulative confidence is an experimental measure, which is computed as the accuracy of the rule list comprising the given rule and all higher priority rules (rules with lower index) with uncovered instances excluded from the computation.
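
The difference between standard and order-sensitive confidence can be checked on a toy rule list. The sketch below is base R with made-up coverage data; it follows the $a/(a+b)$ definition given above and is not extracted from the arc source.

```r
# Hypothetical coverage: does the antecedent of each rule match instance j?
covers <- list(
  r1 = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  r2 = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)
rule_class <- c(r1 = "A", r2 = "B")
truth <- c("A", "A", "B", "B", "B", "A")

# Standard confidence: computed over all instances matching the antecedent.
global_conf <- sapply(names(covers), function(r) {
  mean(truth[covers[[r]]] == rule_class[[r]])
})

# Ordered confidence: only instances not covered by higher-priority rules count.
remaining <- rep(TRUE, length(truth))
ordered_conf <- numeric(0)
for (r in names(covers)) {
  reached <- remaining & covers[[r]]
  a <- sum(reached & truth == rule_class[[r]])  # antecedent and consequent match
  b <- sum(reached & truth != rule_class[[r]])  # antecedent matches, consequent does not
  ordered_conf[r] <- a / (a + b)
  remaining <- remaining & !covers[[r]]
}
print(global_conf)
print(ordered_conf)
```

Rule `r2` has a standard confidence of 3/4, but its ordered confidence is 1 because the instances it misclassifies are already taken by `r1`.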

References

Liu, B., Hsu, W. and Ma, Y. (1998). Integrating classification and association rule mining. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98).

Examples

 #Example 1
  txns <- as(discrNumeric(datasets::iris, "Species")$Disc.data,"transactions")
  appearance <- getAppearance(datasets::iris,"Species")
  rules <- apriori(txns, parameter = list(confidence = 0.5,
  support= 0.01, minlen= 2, maxlen= 4),appearance = appearance)
  rulesP <- prune(rules,txns, appearance$rhs)
  inspect(rulesP)

#Example 2
 utils::data(Adult) # this dataset comes with the arules package
 classitems <- c("income=small","income=large")
 rules <- apriori(Adult, parameter = list(supp = 0.3, conf = 0.5,
 target = "rules"), appearance=list(rhs=classitems, default="lhs"))
 # produces 25 rules
 rulesP <- prune(rules,Adult,classitems)
 rulesP@quality # inspect rule quality measures including the new additions
 # Rules after data coverage pruning: 8
 # Performing default rule pruning.
 # Final rule list size:  6

Rule Generation

Description

A wrapper for the apriori method from the arules package that iteratively changes mining parameters until a desired number of rules is obtained, all options are exhausted or a preset time limit is reached. Within the arc package, this function serves as a replacement for the CBA Rule Generation algorithm (Liu et al, 1998) – without pessimistic pruning – with general apriori implementation provided by existing fast R package arules.

Usage

topRules(
  txns,
  appearance = list(),
  target_rule_count = 1000,
  init_support = 0,
  init_conf = 0.5,
  conf_step = 0.05,
  supp_step = 0.05,
  minlen = 2,
  init_maxlen = 3,
  iteration_timeout = 2,
  total_timeout = 100,
  max_iterations = 30,
  trim = TRUE,
  debug = FALSE
)

Arguments

`txns`	input transactions.
`appearance`	named list or APappearance object (refer to arules package).
`target_rule_count`	the main stopping criterion, mining stops when the resulting rule set contains this number of rules.
`init_support`	initial support.
`init_conf`	initial confidence.
`conf_step`	confidence will be changed by steps defined by this parameter.
`supp_step`	support will be changed by steps defined by this parameter.
`minlen`	minimum length of rules, minlen=1 corresponds to rule with empty antecedent and one item in consequent. In general, rules with empty antecedent are not desirable for the subsequent pruning algorithm, therefore the value of this parameter should be set at least to value 2.
`init_maxlen`	maximum length of rules, should be equal or higher than minlen. A higher value may decrease the number of iterations to obtain target_rule_count rules, but it also increases the risk of initial combinatorial explosion and subsequent memory crash of the apriori rule learner.
`iteration_timeout`	maximum number of seconds it should take apriori to obtain rules with the current configuration.
`total_timeout`	maximum number of seconds the mining should take.
`max_iterations`	maximum number of iterations.
`trim`	if set to TRUE and more than `target_rule_count` rules are discovered, only the first `target_rule_count` rules will be returned.
`debug`	boolean indicating whether to output debug messages.

Value

Returns an object of class rules.
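
The general shape of such an iterative search can be sketched without arules. Everything below is an assumption made for illustration: the mock miner, the order in which parameters are relaxed (confidence first, then rule length) and the helper names are hypothetical, and timeouts are not modelled. The actual adjustment strategy of `topRules` may differ.

```r
# Hypothetical stand-in for a miner: the rule count grows as thresholds relax.
mock_mine <- function(conf, maxlen) floor(400 * (1 - conf) * maxlen)

sketch_top_rules <- function(target_rule_count = 1000, init_conf = 0.5,
                             conf_step = 0.05, init_maxlen = 3,
                             max_iterations = 30, trim = TRUE) {
  conf <- init_conf
  maxlen <- init_maxlen
  n <- 0
  iterations <- 0
  while (iterations < max_iterations) {
    iterations <- iterations + 1
    n <- mock_mine(conf, maxlen)
    if (n >= target_rule_count) break  # main stopping criterion
    if (conf - conf_step > 0) {
      conf <- conf - conf_step         # assumed: relax confidence first
    } else {
      maxlen <- maxlen + 1             # assumed: then allow longer rules
    }
  }
  if (trim) n <- min(n, target_rule_count)
  list(rule_count = n, conf = conf, maxlen = maxlen, iterations = iterations)
}

res <- sketch_top_rules()
print(res)
```

With the mock miner, the target of 1000 rules is first reached in the eighth iteration, after which the result is trimmed to the target size.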

References

Liu, B., Hsu, W. and Ma, Y. (1998). Integrating classification and association rule mining. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98).

Examples

# Example 1
  utils::data(Adult)
  rules <- topRules(Adult, appearance = list(), target_rule_count = 100,
  init_support = 0.5,init_conf = 0.9, minlen = 1, init_maxlen = 10)

# Example 2
  rules <- topRules(as(discrNumeric(datasets::iris, "Species")$Disc.data,"transactions"),
  getAppearance(datasets::iris,"Species"))

# Example 3
  # iris is available directly as datasets::iris, so no data() call is needed
  appearance <- list(rhs =  c("Species=setosa", "Species=versicolor",
   "Species=virginica"), default="lhs")
  data <- sapply(datasets::iris,as.factor)
  data <- data.frame(data, check.names=FALSE)
  txns <- as(data,"transactions")
  rules <- topRules(txns, appearance)

Prediction Accuracy

Arguments

`prediction`	vector with predictions
`groundtruth`	vector with true labels
