Title: Fits a Model that Partitions the Covariate Space into Blocks in a Data-Adaptive Way
Description: Implements convex regression with interpretable sharp partitions (CRISP), which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://jmlr.org/papers/volume17/15-344/15-344.pdf>.
Authors: Ashley Petersen
Maintainer: Ashley Petersen <[email protected]>
License: GPL (>= 2)
Version: 1.0.0
Built: 2025-02-15 04:09:12 UTC
Source: https://github.com/cran/crisp
This package is called crisp, for "Convex Regression with Interpretable Sharp Partitions". CRISP considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://jmlr.org/papers/volume17/15-344/15-344.pdf>.
The main functions are (1) crisp and (2) crispCV. The first function, crisp, fits CRISP for a sequence of tuning parameters and provides the fits for this entire sequence of tuning parameters. The second function, crispCV, also considers a sequence of tuning parameters and provides the fits, but additionally returns the optimal tuning parameter, as chosen using K-fold cross-validation.
## Not run:
# general example illustrating all functions
# see specific function help pages for details of using each function

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# plot the mean model for the scenario from which we generated data
plot(data)

# fit model for a range of tuning parameters, i.e., lambda values
# lambda sequence is chosen automatically if not specified
crisp.out <- crisp(X = data$X, y = data$y)

# or fit model and select lambda using 2-fold cross-validation
# note: use larger 'n.fold' (e.g., 10) in practice
crispCV.out <- crispCV(X = data$X, y = data$y, n.fold = 2)

# summarize all of the fits
summary(crisp.out)
# or just summarize a single fit
# we examine the fit with an index of 25, i.e., lambda of crisp.out$lambda.seq[25]
summary(crisp.out, lambda.index = 25)
# lastly, we can summarize the fit chosen using cross-validation
summary(crispCV.out)
# and also plot the cross-validation error
plot(summary(crispCV.out))
# the lambda chosen by cross-validation is also available using crispCV.out$lambda.cv

# plot the estimated relationships between two predictors and outcome
# do this for a specific fit
plot(crisp.out, lambda.index = 25)
# or for the fit chosen using cross-validation
plot(crispCV.out)

# we can make predictions for a covariate matrix with new observations
# new.X with 20 observations
new.data <- sim.data(n = 20, scenario = 2)
new.X <- new.data$X
# these will give the same predictions:
yhat1 <- predict(crisp.out, new.X = new.X, lambda.index = crispCV.out$index.cv)
yhat2 <- predict(crispCV.out, new.X = new.X)
## End(Not run)
This function implements CRISP, which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://jmlr.org/papers/volume17/15-344/15-344.pdf>.
crisp(y, X, q = NULL, lambda.min.ratio = 0.01, n.lambda = 50,
  lambda.seq = NULL, rho = 0.1, e_abs = 10^-4, e_rel = 10^-3,
  varyrho = TRUE, double.run = FALSE)
y: An n-vector containing the response.

X: An n x 2 matrix with each column containing a covariate.

q: The desired granularity of the CRISP fit, M.hat, which will be a q x q mean matrix. By default (q = NULL), q is chosen automatically based on n.

lambda.min.ratio: The smallest value of lambda to consider, expressed as a fraction of the largest value in the automatically generated lambda sequence. The default is 0.01.

n.lambda: The number of lambda values to consider; the default is 50.

lambda.seq: A user-supplied sequence of positive lambda values to consider. The typical usage is to let the sequence be computed automatically from lambda.min.ratio and n.lambda; supplying lambda.seq overrides this.

rho: The penalty parameter for our ADMM algorithm. The default is 0.1.

e_abs, e_rel: The absolute and relative tolerances used in the stopping criterion for our ADMM algorithm, discussed in Appendix C.2 of the CRISP paper. The defaults are 10^-4 and 10^-3, respectively.

varyrho: Should rho be varied from iteration to iteration? The default is TRUE.

double.run: The initial complete run of our ADMM algorithm will yield sparsity in z_1i and z_2i, but not necessarily exact equality of the corresponding rows and columns of M.hat. If double.run is TRUE, the algorithm is run a second time to obtain exact equality of those rows and columns. The default is FALSE.
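The e_abs and e_rel tolerances enter a standard ADMM stopping rule; the exact criterion used by CRISP is given in Appendix C.2 of the paper. As a rough illustration, here is a generic sketch of such a rule (following Boyd et al.'s usual formulation, with hypothetical argument names; this is not the package's internal code): the primal and dual residual norms are compared against tolerances built from e_abs and e_rel.

```r
# Generic ADMM stopping check (sketch only; see Appendix C.2 of the
# CRISP paper for the exact criterion used by the package).
# r.primal, r.dual: current primal and dual residual vectors
# x, z, u: current primal variables and scaled dual variable
admm_converged <- function(r.primal, r.dual, x, z, u,
                           e_abs = 10^-4, e_rel = 10^-3) {
  e.pri  <- sqrt(length(x)) * e_abs +
    e_rel * max(sqrt(sum(x^2)), sqrt(sum(z^2)))
  e.dual <- sqrt(length(u)) * e_abs + e_rel * sqrt(sum(u^2))
  sqrt(sum(r.primal^2)) <= e.pri && sqrt(sum(r.dual^2)) <= e.dual
}
```

Shrinking e_abs and e_rel tightens convergence at the cost of more ADMM iterations.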
An object of class crisp, which can be summarized using summary, plotted using plot, and used to predict outcome values for new covariates using predict. The object contains the following elements:
M.hat.list: A list of length n.lambda giving M.hat for each value of lambda.seq.

num.blocks: A vector of length n.lambda giving the number of blocks in M.hat for each value of lambda.seq.

obj.vec: A vector of length n.lambda giving the value of the objective of Eqn (4) in the CRISP paper for each value of lambda.seq.
Other elements: As specified by the user.
See also: crispCV, plot, summary, predict.
## Not run:
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# fit model for a range of tuning parameters, i.e., lambda values
# lambda sequence is chosen automatically if not specified
crisp.out <- crisp(X = data$X, y = data$y)
## End(Not run)
This function implements CRISP, which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. This function differs from the crisp function in that the tuning parameter, lambda, is automatically selected using K-fold cross-validation.
More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://jmlr.org/papers/volume17/15-344/15-344.pdf>.
crispCV(y, X, q = NULL, lambda.min.ratio = 0.01, n.lambda = 50,
  lambda.seq = NULL, fold = NULL, n.fold = NULL, seed = NULL,
  within1SE = FALSE, rho = 0.1, e_abs = 10^-4, e_rel = 10^-3,
  varyrho = TRUE, double.run = FALSE)
y: An n-vector containing the response.

X: An n x 2 matrix with each column containing a covariate.

q: The desired granularity of the CRISP fit, M.hat, which will be a q x q mean matrix. By default (q = NULL), q is chosen automatically based on n.

lambda.min.ratio: The smallest value of lambda to consider, expressed as a fraction of the largest value in the automatically generated lambda sequence. The default is 0.01.

n.lambda: The number of lambda values to consider; the default is 50.

lambda.seq: A user-supplied sequence of positive lambda values to consider. The typical usage is to let the sequence be computed automatically from lambda.min.ratio and n.lambda; supplying lambda.seq overrides this.

fold: User-supplied fold numbers for cross-validation. If supplied, fold should be an n-vector with entries in 1, ..., K indicating the fold to which each observation belongs. By default, folds are assigned automatically using n.fold.

n.fold: The number of folds, K, to use for the K-fold cross-validation selection of the tuning parameter, lambda. The default is 10; specification of fold overrides use of n.fold.

seed: An optional number used with set.seed() at the beginning of the function, so that the cross-validation fold assignments are reproducible.

within1SE: Logical value indicating how the cross-validated tuning parameter should be chosen. If within1SE = TRUE, lambda.cv is the largest value of lambda whose cross-validation error is within one standard error of the minimum cross-validation error. If within1SE = FALSE (the default), lambda.cv is the value of lambda with the minimum cross-validation error.

rho: The penalty parameter for our ADMM algorithm. The default is 0.1.

e_abs, e_rel: The absolute and relative tolerances used in the stopping criterion for our ADMM algorithm, discussed in Appendix C.2 of the CRISP paper. The defaults are 10^-4 and 10^-3, respectively.

varyrho: Should rho be varied from iteration to iteration? The default is TRUE.

double.run: The initial complete run of our ADMM algorithm will yield sparsity in z_1i and z_2i, but not necessarily exact equality of the corresponding rows and columns of M.hat. If double.run is TRUE, the algorithm is run a second time to obtain exact equality of those rows and columns. The default is FALSE.
An object of class crispCV, which can be summarized using summary, plotted using plot, and used to predict outcome values for new covariates using predict. The object contains the following elements:
lambda.cv: Optimal lambda value chosen by K-fold cross-validation.

index.cv: The index of the model corresponding to the chosen tuning parameter, lambda.cv. That is, lambda.cv = crisp.out$lambda.seq[index.cv].

crisp.out: An object of class crisp returned by crisp.

mean.cv.error: An m-vector containing the cross-validation error, where m is the length of lambda.seq. Note that mean.cv.error[i] contains the cross-validation error for the tuning parameter crisp.out$lambda.seq[i].

se.cv.error: An m-vector containing the standard error of the cross-validation error, where m is the length of lambda.seq. Note that se.cv.error[i] contains the standard error of the cross-validation error for the tuning parameter crisp.out$lambda.seq[i].
Other elements: As specified by the user.
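The mean.cv.error and se.cv.error elements are what the within1SE = TRUE option operates on: it applies the usual one-standard-error rule. A minimal sketch of that rule (a hypothetical helper, not the package's internal code) is:

```r
# One-standard-error rule sketch: pick the largest lambda whose CV error
# is within one SE of the minimum CV error. Assumes lambda.seq is in
# decreasing order, so the smallest index is the largest lambda.
choose_lambda_1se <- function(lambda.seq, mean.cv.error, se.cv.error) {
  i.min <- which.min(mean.cv.error)
  cutoff <- mean.cv.error[i.min] + se.cv.error[i.min]
  lambda.seq[min(which(mean.cv.error <= cutoff))]
}
```

The rule trades a small increase in CV error for a sparser, more interpretable fit.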
See also: crisp, plot, summary, predict, plot.cvError.
## Not run:
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# fit model and select lambda using 2-fold cross-validation
# note: use larger 'n.fold' (e.g., 10) in practice
crispCV.out <- crispCV(X = data$X, y = data$y, n.fold = 2)
## End(Not run)
Plots a fit from crisp or crispCV. This function plots a fit of class crispCV, or of class crisp with a user-specified tuning parameter.
## S3 method for class 'crisp'
plot(x, lambda.index, title = NULL, x1lab = NULL, x2lab = NULL,
  min = NULL, max = NULL, cex.axis = 1, cex.lab = 1,
  color1 = "seagreen1", color2 = "steelblue1",
  color3 = "darkorchid4", ...)

## S3 method for class 'crispCV'
plot(x, title = NULL, x1lab = NULL, x2lab = NULL, min = NULL,
  max = NULL, cex.axis = 1, cex.lab = 1, color1 = "seagreen1",
  color2 = "steelblue1", color3 = "darkorchid4", ...)
x: An object of class crisp or crispCV.

lambda.index: The index for the desired value of lambda, i.e., x$lambda.seq[lambda.index]. Only used for the crisp method.

title: The title of the plot. By default, the value of lambda is noted.

x1lab: The axis label for the first covariate. By default, it is "X1".

x2lab: The axis label for the second covariate. By default, it is "X2".

min, max: The minimum and maximum y-values, respectively, to use when plotting the fit. By default, they are chosen to be the minimum and maximum of all of the fits, i.e., of the elements of x$M.hat.list.

cex.axis: The magnification to be used for axis annotation, relative to the current setting of cex. The default is 1.

cex.lab: The magnification to be used for x and y labels, relative to the current setting of cex. The default is 1.

color1, color2, color3: The colors used to create the color gradient for plotting the response values. At least the first two must be specified, or the defaults of color1 = "seagreen1", color2 = "steelblue1", and color3 = "darkorchid4" are used.

...: Additional arguments to be passed, which are ignored in this function.
None.
## Not run:
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# fit model for a range of tuning parameters, i.e., lambda values
# lambda sequence is chosen automatically if not specified
crisp.out <- crisp(X = data$X, y = data$y)

# or fit model and select lambda using 2-fold cross-validation
# note: use larger 'n.fold' (e.g., 10) in practice
crispCV.out <- crispCV(X = data$X, y = data$y, n.fold = 2)

# plot the estimated relationships between two predictors and outcome
# do this for a specific fit
plot(crisp.out, lambda.index = 25)
# or for the fit chosen using cross-validation
plot(crispCV.out)
## End(Not run)
Plots the cross-validation curve from crispCV. This function plots the cross-validation curve for a series of models fit using crispCV. The cross-validation error, with +/- 1 standard error, is plotted for each value of lambda considered in the call to crispCV, with a dotted vertical line indicating the chosen lambda.
## S3 method for class 'cvError'
plot(x, showSE = T, ...)
x: An object of class cvError.

showSE: A logical value indicating whether the standard error of the curve should be plotted. The default is TRUE.

...: Additional arguments to be passed, which are ignored in this function.
None.
## Not run:
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# fit model and select lambda using 2-fold cross-validation
# note: use larger 'n.fold' (e.g., 10) in practice
crispCV.out <- crispCV(X = data$X, y = data$y, n.fold = 2)

# plot the cross-validation error
plot(summary(crispCV.out))
## End(Not run)
This function plots the mean model for the scenario from which data was generated using sim.data.
## S3 method for class 'sim.data'
plot(x, ...)
x: An object of class sim.data.

...: Additional arguments to be passed, which are ignored in this function.
None.
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# plot the mean model for the scenario from which we generated data
plot(data)
Makes predictions from a fit of crisp or crispCV. This function makes predictions for a specified covariate matrix from a fit of class crispCV, or of class crisp with a user-specified tuning parameter.
## S3 method for class 'crisp'
predict(object, new.X, lambda.index, ...)

## S3 method for class 'crispCV'
predict(object, new.X, ...)
object: An object of class crisp or crispCV.

new.X: The covariate matrix for which to make predictions.

lambda.index: The index for the desired value of lambda, i.e., object$lambda.seq[lambda.index]. Only used for the crisp method.

...: Additional arguments to be passed, which are ignored in this function.
The ith prediction is made to be the value of object$M.hat.list[[lambda.index]] corresponding to the pair of covariates closest (in Euclidean distance) to new.X[i,].

A vector containing the fitted y values for new.X.
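The nearest-covariate rule above can be sketched as follows (an illustrative stand-in with hypothetical names, not the package's internal code): each new observation receives the fitted value attached to the training covariate pair closest in Euclidean distance.

```r
# Sketch of the prediction rule: fit.vals[i] is the fitted value for
# training row train.X[i, ]; each row of new.X gets the fitted value of
# its nearest training row.
predict_nearest <- function(fit.vals, train.X, new.X) {
  apply(new.X, 1, function(x) {
    d2 <- rowSums(sweep(train.X, 2, x)^2)  # squared Euclidean distances
    fit.vals[which.min(d2)]
  })
}
```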
## Not run:
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# fit model for a range of tuning parameters, i.e., lambda values
# lambda sequence is chosen automatically if not specified
crisp.out <- crisp(X = data$X, y = data$y)

# or fit model and select lambda using 2-fold cross-validation
# note: use larger 'n.fold' (e.g., 10) in practice
crispCV.out <- crispCV(X = data$X, y = data$y, n.fold = 2)

# we can make predictions for a covariate matrix with new observations
# new.X with 20 observations
new.data <- sim.data(n = 20, scenario = 2)
new.X <- new.data$X
# these will give the same predictions:
yhat1 <- predict(crisp.out, new.X = new.X, lambda.index = crispCV.out$index.cv)
yhat2 <- predict(crispCV.out, new.X = new.X)
## End(Not run)
Simulates data for use with crisp. This function generates data according to the simulation scenarios considered in Section 3 of the CRISP paper (and plotted in Figure 2 of the paper).
sim.data(n, scenario, noise = 1, X = NULL)
n: The number of observations.

scenario: The simulation scenario to use. Options are 1 (additive model), 2 (interaction model), 3 ('tetris' model), or 4 (smooth model), which correspond to the simulation scenarios of Section 3 of the CRISP paper. Each scenario has two covariates.

noise: The standard deviation of the normally-distributed noise that is added to the signal. The default is 1.

X: The n x 2 covariate matrix, which is automatically generated if not specified.
A list containing:

X: An n x 2 covariate matrix.

y: An n-vector containing the response values.
Other elements: As specified by the user.
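Conceptually, sim.data draws covariates, evaluates a scenario-specific mean surface, and adds Gaussian noise with standard deviation noise. A simplified sketch of this mechanism (the mean function and covariate range below are illustrative stand-ins, not the paper's exact scenarios):

```r
# Simplified data-generating sketch: mean_fn is a placeholder for the
# scenario mean models of Section 3 of the CRISP paper, and the
# covariate range is an assumption for illustration.
sim_sketch <- function(n, noise = 1,
                       mean_fn = function(x1, x2) x1 * x2) {
  X <- matrix(runif(n * 2, -2.5, 2.5), ncol = 2)
  y <- mean_fn(X[, 1], X[, 2]) + rnorm(n, sd = noise)
  list(X = X, y = y)
}
```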
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# plot the mean model for the scenario from which we generated data
plot(data)
Summarizes a fit from crisp or crispCV. This function summarizes a fit of class crispCV, or of class crisp.
## S3 method for class 'crisp'
summary(object, lambda.index = NULL, ...)

## S3 method for class 'crispCV'
summary(object, ...)
object: An object of class crisp or crispCV.

lambda.index: The index for the desired value of lambda, i.e., object$lambda.seq[lambda.index]. By default (lambda.index = NULL), fits for all values of lambda.seq are summarized.

...: Additional arguments to be passed, which are ignored in this function.
None.
## Not run:
# See ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# fit model for a range of tuning parameters, i.e., lambda values
# lambda sequence is chosen automatically if not specified
crisp.out <- crisp(X = data$X, y = data$y)

# or fit model and select lambda using 2-fold cross-validation
# note: use larger 'n.fold' (e.g., 10) in practice
crispCV.out <- crispCV(X = data$X, y = data$y, n.fold = 2)

# summarize all of the fits
summary(crisp.out)
# or just summarize a single fit
# we examine the fit with an index of 25, i.e., lambda of crisp.out$lambda.seq[25]
summary(crisp.out, lambda.index = 25)
# lastly, we can summarize the fit chosen using cross-validation
summary(crispCV.out)
## End(Not run)