Package 'superpc' reference manual

Title:	Supervised Principal Components
Description:	Does prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach. 'Superpc' is especially useful for high-dimensional data when the number of features p dominates the number of samples n (p >> n paradigm), as generated, for instance, by high-throughput technologies.
Authors:	Eric Bair [aut], Jean-Eudes Dazard [cre, ctb], Rob Tibshirani [ctb]
Maintainer:	Jean-Eudes Dazard <[email protected]>
License:	GPL (>= 3) \| file LICENSE
Version:	1.12
Built:	2025-03-01 02:48:50 UTC
Source:	https://github.com/jedazard/superpc

Cross-validation for supervised principal components

Description

This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components

Usage

    superpc.cv(fit,
               data, 
               n.threshold=20,
               n.fold=NULL,
               folds=NULL,
               n.components=3, 
               min.features=5, 
               max.features=nrow(data$x),
               compute.fullcv= TRUE,
               compute.preval=TRUE, 
               xl.mode=c("regular","firsttime","onetime","lasttime"), 
               xl.time=NULL,
               xl.prevfit=NULL)
superpc.cv(fit,
               data, 
               n.threshold=20,
               n.fold=NULL,
               folds=NULL,
               n.components=3, 
               min.features=5, 
               max.features=nrow(data$x),
               compute.fullcv= TRUE,
               compute.preval=TRUE, 
               xl.mode=c("regular","firsttime","onetime","lasttime"), 
               xl.time=NULL,
               xl.prevfit=NULL)

Arguments

`fit`	Object returned by superpc.train
`data`	Data object of form described in superpc.train documentation
`n.threshold`	Number of thresholds to consider. Default 20.
`n.fold`	Number of cross-validation folds. default is around 10 (program pick a convenient value based on the sample size
`folds`	List of indices of cross-validation folds (optional)
`n.components`	Number of cross-validation components to use: 1,2 or 3.
`min.features`	Minimum number of features to include in determining range for threshold. Default 5.
`max.features`	Maximum number of features to include in determining range for threshold. Default is total number of features in the dataset
`compute.fullcv`	Should full cross-validation be done?
`compute.preval`	Should full pre-validation be done?
`xl.mode`	Used by Excel interface only
`xl.time`	Used by Excel interface only
`xl.prevfit`	Used by Excel interface only

Details

This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components. To avoid prolems with fitting Cox models to samll validation datastes, it uses the "pre-validation" approach of Tibshirani and Efron (2002)

Value

`threshold`	Vector of thresholds considered
`nonzero`	Number of features exceeding each value of the threshold
`scor.preval`	Likelihood ratio scores from pre-validation
`scor`	Full CV scores
`folds`	Indices of CV folds used
`featurescores.folds`	Feature scores for each fold
`v.preval`	The pre-validated predictors
`type`	problem type
`call`	calling sequence

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)

## End(Not run)
## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)

## End(Not run)

Decorrelate features with respect to competing predictors

Description

Fits a linear model to the features as a function of some competing predictors. Replaces the features by the residual from this fit. These "decorrelated" features are then used in the superpc model building process, to explicitly look for predictors that are independent of the competing predictors. Useful for example, when the competing predictors are clinical predictors like stage, grade etc.

Usage

    superpc.decorrelate(x, 
                        competing.predictors) 
superpc.decorrelate(x, 
                        competing.predictors)

Arguments

`x`	matrix of features. Different features in different rows, one observation per column
`competing.predictors`	List of one or more competing predictors. Discrete predictors should be factors

Value

Returns lm (linear model) fit of rows of x on compeiting predictors.

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
competing.predictors <- list(pred1=rnorm(30), 
                             pred2=as.factor(sample(c(1,2), 
                                             replace=TRUE, 
                                             size=30)))

#decorrelate x. Remember to decorrelate test data in the same way, before making predictions.
foo <- superpc.decorrelate(x, competing.predictors)
xnew <- t(foo$res)

#now use xnew in superpc
data <- list(x=xnew, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
a <- superpc.train(data, type="survival")

#etc. 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
competing.predictors <- list(pred1=rnorm(30), 
                             pred2=as.factor(sample(c(1,2), 
                                             replace=TRUE, 
                                             size=30)))

#decorrelate x. Remember to decorrelate test data in the same way, before making predictions.
foo <- superpc.decorrelate(x, competing.predictors)
xnew <- t(foo$res)

#now use xnew in superpc
data <- list(x=xnew, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
a <- superpc.train(data, type="survival")

#etc.

Fit predictive model using outcome of supervised principal components

Description

Fit predictive model using outcome of supervised principal components, via either coxph (for surival data) or lm (for regression data)

Usage

    superpc.fit.to.outcome(fit, 
                           data.test, 
                           score, 
                           competing.predictors=NULL, 
                           print=TRUE, 
                           iter.max=5)
superpc.fit.to.outcome(fit, 
                           data.test, 
                           score, 
                           competing.predictors=NULL, 
                           print=TRUE, 
                           iter.max=5)

Arguments

`fit`	Object returned by superpc.train.
`data.test`	Data object for prediction. Same form as data object documented in superpc.train.
`score`	Supervised principal component score, from superpc.predict.
`competing.predictors`	Optional - a list of competing predictors to be included in the model.
`print`	Should a summary of the fit be printed? Default TRUE.
`iter.max`	Max number of iterations used in predictive model fit. Default 5. Currently only relevant for Cox PH model.

Value

Returns summary of coxph or lm fit.

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit <- superpc.predict(a, 
                       data, 
                       data.test, 
                       threshold=1.0, 
                       n.components=1, 
                       prediction.type="continuous")
superpc.fit.to.outcome(a, 
                       data, 
                       fit$v.pred)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit <- superpc.predict(a, 
                       data, 
                       data.test, 
                       threshold=1.0, 
                       n.components=1, 
                       prediction.type="continuous")
superpc.fit.to.outcome(a, 
                       data, 
                       fit$v.pred)

Return a list of the important predictors

Description

Return a list of the important predictor

Usage

    superpc.listfeatures(data, 
                         train.obj, 
                         fit.red, 
                         fitred.cv=NULL,
                         num.features=NULL, 
                         component.number=1)
superpc.listfeatures(data, 
                         train.obj, 
                         fit.red, 
                         fitred.cv=NULL,
                         num.features=NULL, 
                         component.number=1)

Arguments

`data`	Data object
`train.obj`	Object returned by superpc.train
`fit.red`	Object returned by superpc.predict.red, applied to training set
`fitred.cv`	(Optional) object returned by superpc.predict.red.cv
`num.features`	Number of features to list. Default is all features.
`component.number`	Number of principal component (1,2, or 3) used to determine feature importance scores

Value

Returns matrix of features and their importance scores, in order of decreasing absolute value of importance score. The importance score is the correlation of the reduced predictor and the full supervised PC predictor. It also lists the raw score- for survival data, this is the Cox score for that feature; for regression, it is the standardized regression coefficient. If fitred.cv is supplied, the function also reports the average rank of the gene in the cross-validation folds, and the proportion of times that the gene is chosen (at the given threshold) in the cross-validation folds.

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit.red <- superpc.predict.red(a, 
                               data, 
                               data.test, 
                               .6)
superpc.listfeatures(data, 
                     a,  
                     fit.red, 
                     num.features=10)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit.red <- superpc.predict.red(a, 
                               data, 
                               data.test, 
                               .6)
superpc.listfeatures(data, 
                     a,  
                     fit.red, 
                     num.features=10)

Compute values of likelihood ratio test from supervised principal components fit

Description

Compute values of likelihood ratio test from supervised principal components fit

Usage

    superpc.lrtest.curv(object, 
                        data, 
                        newdata, 
                        n.components=1, 
                        threshold=NULL, 
                        n.threshold=20)
superpc.lrtest.curv(object, 
                        data, 
                        newdata, 
                        n.components=1, 
                        threshold=NULL, 
                        n.threshold=20)

Arguments

`object`	Object returned by superpc.train.
`data`	List of training data, of form described in superpc.train documentation.
`newdata`	List of test data; same form as training data.
`n.components`	Number of principal components to compute. Should be 1,2 or 3.
`threshold`	Set of thresholds for scores; default is n.threshold values equally spaced over the range of the feature scores.
`n.threshold`	Number of thresholds to use; default 20. Should be 1,2 or 3.

Value

`lrtest`	Values of likelihood ratio test statistic
`comp2`	Description of 'comp2'
`threshold`	Thresholds used
`num.features`	Number of features exceeding threshold
`type`	Type of outcome variable
`call`	calling sequence

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.lrtest.curv(a, data, data.test)
#superpc.plot.lrtest(aa)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.lrtest.curv(a, data, data.test)
#superpc.plot.lrtest(aa)

Display the superpc Package News

Description

Function to display the log file NEWS of updates of the superpc package.

Usage

    superpc.news(...)
superpc.news(...)

Arguments

...

Further arguments passed to or from other methods.

Value

None.

Note

End-user function.

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Plot likelhiood ratio test statistics

Description

Plot likelhiood ratio test statistics from output of superpc.predict

Usage

    superpc.plot.lrtest(object.lrtestcurv, 
                        call.win.metafile=FALSE)
superpc.plot.lrtest(object.lrtestcurv, 
                        call.win.metafile=FALSE)

Arguments

`object.lrtestcurv`	Output from superpc.lrtest.curv
`call.win.metafile`	For use by PAM Excel interface

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
bb <- superpc.lrtest.curv(a, 
                          data, 
                          data.test)
superpc.plot.lrtest(bb)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
bb <- superpc.lrtest.curv(a, 
                          data, 
                          data.test)
superpc.plot.lrtest(bb)

Plot output from superpc.cv

Description

Plots pre-validation results from plotcv, to aid in choosing best threshold

Usage

    superpc.plotcv(object, 
                   cv.type=c("full","preval"),
                   smooth=TRUE, 
                   smooth.df=10, 
                   call.win.metafile=FALSE, ...)
superpc.plotcv(object, 
                   cv.type=c("full","preval"),
                   smooth=TRUE, 
                   smooth.df=10, 
                   call.win.metafile=FALSE, ...)

Arguments

`object`	Object returned by superpc.cv.
`cv.type`	Type of cross-validation used - "full" (Default; this is "standard" cross-validation; recommended) and "preval"- pre-validation.
`smooth`	Should plot be smoothed? Only relevant to "preval". Default FALSE.
`smooth.df`	Degrees of freedom for smooth.spline, default 10. If NULL, then degrees of freedom is estimated by cross-validation.
`call.win.metafile`	Ignore: for use by PAM Excel program.
`...`	Additional plotting args to be passed to matplot.

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a,data)

superpc.plotcv(aa)

## End(Not run)
## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a,data)

superpc.plotcv(aa)

## End(Not run)

Plot likelihood ratio test statistics from supervised principal components predictor

Description

Plot likelihood ratio test statistics from supervised principal components predictor

Usage

    superpc.plotred.lrtest(object.lrtestred, 
                           call.win.metafile=FALSE)
superpc.plotred.lrtest(object.lrtestred, 
                           call.win.metafile=FALSE)

Arguments

`object.lrtestred`	Output from either superpc.predict.red or superpc.predict.redcv
`call.win.metafile`	Used only by PAM Excel interface call to function

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)
fit.red <- superpc.predict.red(a, 
                               data, 
                               data.test, 
                               .6)
fit.redcv <- superpc.predict.red.cv(fit.red, 
                                    aa, 
                                    data, 
                                    .6)
superpc.plotred.lrtest(fit.redcv)

## End(Not run)
## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)
fit.red <- superpc.predict.red(a, 
                               data, 
                               data.test, 
                               .6)
fit.redcv <- superpc.predict.red.cv(fit.red, 
                                    aa, 
                                    data, 
                                    .6)
superpc.plotred.lrtest(fit.redcv)

## End(Not run)

Form principal components predictor from a trained superpc object

Description

Computes supervised principal components, using scores from "object"

Usage

    superpc.predict(object, 
                    data, 
                    newdata, 
                    threshold, 
                    n.components=3, 
                    prediction.type=c("continuous","discrete","nonzero"), 
                    n.class=2)
superpc.predict(object, 
                    data, 
                    newdata, 
                    threshold, 
                    n.components=3, 
                    prediction.type=c("continuous","discrete","nonzero"), 
                    n.class=2)

Arguments

`object`	Obect returned by superpc.train
`data`	List of training data, of form described in superpc.train documentation,
`newdata`	List of test data; same form as training data
`threshold`	Threshold for scores: features with abs(score) > threshold are retained.
`n.components`	Number of principal components to compute. Should be 1,2 or 3.
`prediction.type`	"continuous" for raw principal component(s); "discrete" for principal component categorized in equal bins; "nonzero" for indices of features that pass the threshold
`n.class`	Number of classes into which predictor is binned (for prediction.type="discrete"

Value

`v.pred`	Supervised principal componients predictor
`u`	U matrix from svd of feature matrix x
`d`	singual values from svd of feature matrix x
`which.features`	Indices of features exceeding threshold
`n.components`	Number of supervised principal components requested
`call`	calling sequence

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit <- superpc.predict(a, 
                       data, 
                       data.test, 
                       threshold=1.0, 
                       n.components=1)
plot(fit$v.pred, ytest)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit <- superpc.predict(a, 
                       data, 
                       data.test, 
                       threshold=1.0, 
                       n.components=1)
plot(fit$v.pred, ytest)

Feature selection for supervised principal components

Description

Forms reduced models to approximate the supervised principal component predictor.

Usage

    superpc.predict.red(fit, 
                        data, 
                        data.test, 
                        threshold, 
                        n.components=3, 
                        n.shrinkage=20, 
                        shrinkages=NULL,
                        compute.lrtest=TRUE,
                        sign.wt="both",
                        prediction.type=c("continuous", "discrete"), 
                        n.class=2)
                        
superpc.predict.red(fit, 
                        data, 
                        data.test, 
                        threshold, 
                        n.components=3, 
                        n.shrinkage=20, 
                        shrinkages=NULL,
                        compute.lrtest=TRUE,
                        sign.wt="both",
                        prediction.type=c("continuous", "discrete"), 
                        n.class=2)

Arguments

`fit`	Object returned by superpc.train
`data`	Training data object, of form described in superpc.train dcoumentation
`data.test`	Test data object; same form as train
`threshold`	Feature score threshold; usually estimated from superpc.cv
`n.components`	Number of principal components to examine; should equal 1,2, etc up to the number of components used in training
`n.shrinkage`	Number of shrinkage values to consider. Default 20.
`shrinkages`	Shrinkage values to consider. Default NULL.
`compute.lrtest`	Should the likelihood ratio test be computed? Default TRUE
`sign.wt`	Signs of feature weights allowed: "both", "pos", or "neg"
`prediction.type`	Type of prediction: "continuous" (Default) or "discrete". In the latter, superprc score is divided into n.class groups
`n.class`	Number of groups for discrete predictor. Default 2.

Details

Soft-thresholding by each of the "shrinkages" values is applied to the PC loadings. This reduce the number of features used in the model. The reduced predictor is then used in place of the supervised PC predictor.

Value

`shrinkages`	Shrinkage values used
`lrtest.reduced`	Likelihood ratio tests for reduced models
`num.features`	Number of features used in each reduced model
`feature.list`	List of features used in each reduced model
`coef`	Least squares coefficients for each reduced model
`import`	Importance scores for features
`wt`	Weight for each feature, in constructing the reduced predictor
`v.test`	Outcome predictor from reduced models. Array of n.shrinkage by (number of test observations)
`v.test.1df`	Outcome combined predictor from reduced models. Array of n.shrinkage by (number of test observations)
`n.components`	Number of principal components used
`type`	Type of outcome
`call`	calling sequence

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x,
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit.red <- superpc.predict.red(a,
                               data, 
                               data.test, 
                               threshold=.6)
superpc.plotred.lrtest(fit.red)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x,
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit.red <- superpc.predict.red(a,
                               data, 
                               data.test, 
                               threshold=.6)
superpc.plotred.lrtest(fit.red)

Cross-validation of feature selection for supervised principal components

Description

Applies superpc.predict.red to cross-validation folds generates in superpc.cv. Uses the output to evaluate reduced models, and compare them to the full supervised principal components predictor.

Usage

    superpc.predict.red.cv(fitred, 
                           fitcv, 
                           data, 
                           threshold, 
                           sign.wt="both")
superpc.predict.red.cv(fitred, 
                           fitcv, 
                           data, 
                           threshold, 
                           sign.wt="both")

Arguments

`fitred`	Output of superpc.predict.red
`fitcv`	Output of superpc.cv
`data`	Training data object
`threshold`	Feature score threshold; usually estimated from superpc.cv
`sign.wt`	Signs of feature weights allowed: "both", "pos", or "neg"

Value

`lrtest.reduced`	Likelihood ratio tests for reduced models
`components`	Number of supervised principal components used
`v.preval.red`	Outcome predictor from reduced models. Array of num.reduced.models by (number of test observations)
`type`	Type of outcome
`call`	calling sequence

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*20), ncol=20)
y <- 10 + svd(x[1:10,])$v[,1] + .1*rnorm(20)
ytest <- 10 + svd(x[1:10,])$v[,1] + .1*rnorm(20)
censoring.status <- sample(c(rep(1,15), rep(0,5)))
censoring.status.test <- sample(c(rep(1,15), rep(0,5)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x,
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)
fit.red <- superpc.predict.red(a,
                               data, 
                               data.test, 
                               threshold=.6)
fit.redcv <- superpc.predict.red.cv(fit.red, 
                                    aa, 
                                    data, 
                                    threshold=.6)

## End(Not run)
## Not run: 
set.seed(332)

#generate some data
x <- matrix(rnorm(50*20), ncol=20)
y <- 10 + svd(x[1:10,])$v[,1] + .1*rnorm(20)
ytest <- 10 + svd(x[1:10,])$v[,1] + .1*rnorm(20)
censoring.status <- sample(c(rep(1,15), rep(0,5)))
censoring.status.test <- sample(c(rep(1,15), rep(0,5)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x,
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)
fit.red <- superpc.predict.red(a,
                               data, 
                               data.test, 
                               threshold=.6)
fit.redcv <- superpc.predict.red.cv(fit.red, 
                                    aa, 
                                    data, 
                                    threshold=.6)

## End(Not run)

Plot outcome predictions from superpc

Description

Plots outcome predictions from superpc

Usage

    superpc.predictionplot(train.obj, 
                           data, 
                           data.test,
                           threshold, 
                           n.components=3,
                           n.class=2, 
                           shrinkage=NULL, 
                           call.win.metafile=FALSE)
superpc.predictionplot(train.obj, 
                           data, 
                           data.test,
                           threshold, 
                           n.components=3,
                           n.class=2, 
                           shrinkage=NULL, 
                           call.win.metafile=FALSE)

Arguments

`train.obj`	Object returned by superpc.train
`data`	List of training data, of form described in superpc.train documentation
`data.test`	List of test data; same form as training data
`threshold`	Threshold for scores: features with abs(score) > threshold are retained.
`n.components`	Number of principal components to compute. Should be 1,2 or 3.
`n.class`	Number of classes for survival stratification. Only applicable for survival data. Default 2.
`shrinkage`	Shrinkage to be applied to feature loadings. Default is NULL, meaning no shrinkage
`call.win.metafile`	Used only by Excel interface call to function

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
superpc.predictionplot(a, 
                       data, 
                       data.test, 
                       threshold=1)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
superpc.predictionplot(a, 
                       data, 
                       data.test, 
                       threshold=1)

Make rainbow plot of superpc and compeiting predictors

Description

Makes a heatmap display of outcome predictions from superpc, along with expected survival time, and values of competing predictors.

Usage

    superpc.rainbowplot(data, 
                        pred, 
                        sample.labels,
                        competing.predictors,
                        call.win.metafile=FALSE)
superpc.rainbowplot(data, 
                        pred, 
                        sample.labels,
                        competing.predictors,
                        call.win.metafile=FALSE)

Arguments

`data`	List of (test) data, of form described in superpc.train documentation
`pred`	Superpc score from superpc.predict or superpc.predict.red
`sample.labels`	Vector of sample labels of test data
`competing.predictors`	List of competing predictors to be plotted
`call.win.metafile`	Used only by Excel interface call to function

Details

Any censored survival times are estimated by E(T|T > C), where $C$ is the observed censoring time and the Kaplan-Meier estimate from the training set is used to estimate the expectation.

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
competing.predictors.test <- list(pred1=rnorm(30), 
                                  pred2=as.factor(sample(c(1,2),
                                                  replace=TRUE,
                                                  size=30)))
                                                  
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)
sample.labels <- paste("te", as.character(1:20), sep="")

a <- superpc.train(data, type="survival")
pred <- superpc.predict(a, 
                        data, 
                        data.test, 
                        threshold=.25, 
                        n.components=1)$v.pred
superpc.rainbowplot(data, 
                    pred, 
                    sample.labels, 
                    competing.predictors=competing.predictors.test)
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
competing.predictors.test <- list(pred1=rnorm(30), 
                                  pred2=as.factor(sample(c(1,2),
                                                  replace=TRUE,
                                                  size=30)))
                                                  
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)
sample.labels <- paste("te", as.character(1:20), sep="")

a <- superpc.train(data, type="survival")
pred <- superpc.predict(a, 
                        data, 
                        data.test, 
                        threshold=.25, 
                        n.components=1)$v.pred
superpc.rainbowplot(data, 
                    pred, 
                    sample.labels, 
                    competing.predictors=competing.predictors.test)

Prediction by supervised principal components

Description

Does prediction of a quantitative regression or survival outcome, by the supervised principal components method.

Usage

    superpc.train(data, 
                  type=c("survival", "regression"), 
                  s0.perc=NULL)
superpc.train(data, 
                  type=c("survival", "regression"), 
                  s0.perc=NULL)

Arguments

`data`	Data object with components x- p by n matrix of features, one observation per column; y- n-vector of outcome measurements; censoring.status- n-vector of censoring censoring.status (1= died or event occurred, 0=survived, or event was censored), needed for a censored survival outcome
`type`	Problem type: "survival" for censored survival outcome, or "regression" for simple quantitative outcome
`s0.perc`	Factor for denominator of score statistic, between 0 and 1: the percentile of standard deviation values added to the denominator. Default is 0.5 (the median)

Details

Compute wald scores for each feature (gene), for later use in superpc.predict and superpc.cv

Value

`feature.scores`	Score for each feature (gene)
`type`	problem type
`s0.perc`	Factor for denominator of score statistic
`call`	calling sequence

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)

a <- superpc.train(data, type="survival")
set.seed(332)

#generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)

a <- superpc.train(data, type="survival")

Package 'superpc'

Help Index

Cross-validation for supervised principal components

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Decorrelate features with respect to competing predictors

Description

Usage

Arguments

Value

Author(s)

References

Examples

Fit predictive model using outcome of supervised principal components

Description

Usage

Arguments

Value

Author(s)

References

Examples

Return a list of the important predictors

Description

Usage

Arguments

Value

Author(s)

References

Examples

Compute values of likelihood ratio test from supervised principal components fit

Description

Usage

Arguments

Value

Author(s)

References

Examples

Display the superpc Package News

Description

Usage

Arguments

Value

Note

Author(s)

References

Plot likelhiood ratio test statistics

Description

Usage

Arguments

Author(s)

References

Examples

Plot output from superpc.cv

Description

Usage

Arguments

Author(s)

References

Examples

Plot likelihood ratio test statistics from supervised principal components predictor

Description

Usage

Arguments

Author(s)

References

Examples

Form principal components predictor from a trained superpc object

Description

Usage

Arguments

Value

Author(s)

References

Examples