Title: | Adjusted Prediction Model Performance Estimation |
---|---|
Description: | Calculating predictive model performance measures adjusted for predictor distributions using density ratio method (Sugiyama et al., (2012, ISBN:9781139035613)). L1 and L2 error for continuous outcome and C-statistics for binomial outcome are computed. |
Authors: | Eisuke Inoue, Hajime Uno |
Maintainer: | Eisuke Inoue <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2025-01-22 04:41:14 UTC |
Source: | https://github.com/cran/APPEstimation |
This package provides functions to estimate model performance
measures (L1 error, L2 error, and C-statistics). The difference in
the distribution of predictors between two datasets (training and
validation) is adjusted by a density ratio estimate.
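The idea of adjusting by a density ratio can be sketched in base R. The snippet below is only an illustration under stated assumptions: the two predictor distributions are taken to be known (Beta(5, 5) and Beta(6, 4)), so the true ratio can be computed with `dbeta`, whereas the package estimates the ratio from data. The variable names are hypothetical, and the direction of reweighting shown here is chosen for illustration and is not a statement about which direction the package uses.

```r
set.seed(1)

# Sample drawn from the "source" predictor distribution, Beta(5, 5)
n <- 1e5
x_source <- rbeta(n, 5, 5)

# True density ratio target / source, with Beta(6, 4) as the target.
# (Assumption: known densities; in practice the ratio must be estimated.)
w <- dbeta(x_source, 6, 4) / dbeta(x_source, 5, 5)

# The unweighted mean reflects the source distribution (mean 0.5);
# the ratio-weighted mean approximates the target distribution (mean 0.6).
mean_source   <- mean(x_source)
mean_adjusted <- weighted.mean(x_source, w)
```

The same weights can be applied to any per-observation quantity, which is how a performance measure computed on one dataset can be made comparable to another dataset with a different predictor distribution.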
Package: | APPEstimation |
Type: | Package |
Title: | Adjusted Prediction Model Performance Estimation |
Version: | 0.1.1 |
Depends: | densratio |
Date: | 2018-1-4 |
Author: | Eisuke Inoue, Hajime Uno |
Maintainer: | Eisuke Inoue <[email protected]> |
Description: | Calculating predictive model performance measures adjusted for predictor distributions using density ratio method (Sugiyama et al., (2012, ISBN:9781139035613)). L1 and L2 error for continuous outcome and C-statistics for binomial outcome are computed. |
License: | GPL-2 |
NeedsCompilation: | no |
Packaged: | 2018-01-05 02:55:47 UTC; inoue |
Date/Publication: | 2018-01-05 12:30:40 UTC |
Repository: | https://eisuke-inoue.r-universe.dev |
RemoteUrl: | https://github.com/cran/APPEstimation |
RemoteRef: | HEAD |
RemoteSha: | 9fedeba12a5148d33ee04dc93f7b1f0da98aad46 |
Index of help topics:
APPEstimation-package | R function to calculate model performance measure adjusted for predictor distributions. |
appe.glm | C-statistics adjusted for predictor distributions |
appe.lm | L1 and L2 errors adjusted for predictor distributions |
cvalest.bin | Estimation of C-statistics |
densratio.appe | A wrapper function |
Eisuke Inoue, Hajime Uno
Maintainer: Eisuke Inoue <[email protected]>
Sugiyama, M., Suzuki, T. & Kanamori, T. Density Ratio Estimation in Machine Learning. Cambridge University Press 2012. ISBN:9781139035613.
    set.seed(100)

    # generating learning data
    n0 = 100
    Z = cbind(rbeta(n0, 5, 5), rbeta(n0, 5, 5))
    Y = apply(Z, 1, function (xx) {
      rbinom(1, 1, (1/(1+exp(-(sum(c(-2,2,2) * c(1,xx)))))))})
    dat = data.frame(Y=Y, Za=Z[,1], Zb=Z[,2])

    # the model to be evaluated
    mdl = glm(Y~., binomial, data=dat)

    # validation dataset, with different centers on predictors
    n1 = 100
    Z1 = cbind(rbeta(n1, 6, 4), rbeta(n1, 6, 4))
    Y1 = apply(Z1, 1, function (xx) {
      rbinom(1, 1, (1/(1+exp(-(sum(c(-2,2,2) * c(1,xx)))))))})
    dat1 = data.frame(Y=Y1, Za=Z1[,1], Zb=Z1[,2])

    # calculation of C-statistics for this model
    appe.glm(mdl, dat, dat1, reps=0)
C-statistics adjusted for predictor distributions
Calculates C-statistics adjusted for predictor distributions for
a generalized linear model with a binary outcome.
    appe.glm(mdl, dat.train, dat.test, method = "uLSIF", sigma = NULL,
             lambda = NULL, kernel_num = NULL, fold = 5, stabilize = TRUE,
             qstb = 0.025, reps = 2000, conf.level = 0.95)
mdl | a glm object describing the prediction model to be evaluated. |
dat.train | a dataframe used to construct the prediction model (specified in mdl). |
dat.test | a dataframe corresponding to the validation (testing) data. It must include the outcome and all predictors. |
method | uLSIF or KLIEP. Same as the argument in the densratio function. |
sigma | a positive numeric vector of candidate values for the bandwidth of the Gaussian kernel. Same as the argument in the densratio function. |
lambda | a positive numeric vector of candidate values for the regularization parameter. Same as the argument in the densratio function. |
kernel_num | a positive integer giving the number of kernels. Same as the argument in the densratio function. |
fold | a positive integer giving the number of folds of cross-validation in the KLIEP method. Same as the argument in the densratio function. |
stabilize | a logical value indicating whether tail weight stabilization is performed. If TRUE, both tails of the estimated density ratio distribution are replaced by the constant values determined by qstb. |
qstb | a positive numeric value less than 1 that controls the degree of weight stabilization. The default is 0.025, meaning that estimated density ratio values below the 2.5th percentile or above the 97.5th percentile are set to the 2.5th and 97.5th percentiles, respectively. |
reps | a positive integer specifying the number of bootstrap repetitions. If 0, bootstrap calculations are not performed. |
conf.level | a numeric value giving the confidence level of the interval. |
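The tail weight stabilization described for stabilize and qstb can be illustrated with a short base R sketch. This is not the package's code: the helper name `stabilize_weights` and the simulated weights are hypothetical, and the sketch simply assumes that stabilization means clipping the weights at the qstb and 1 - qstb sample quantiles.

```r
# Hypothetical helper: clip weights at the qstb and (1 - qstb) quantiles
stabilize_weights <- function(w, qstb = 0.025) {
  bounds <- quantile(w, probs = c(qstb, 1 - qstb), names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}

set.seed(1)
w     <- rlnorm(1000)          # simulated, heavy-tailed weights
w_stb <- stabilize_weights(w)  # default qstb = 0.025

range(w)      # original extremes
range(w_stb)  # extremes after clipping at the 2.5th and 97.5th percentiles
```

Clipping leaves the central 95% of the weights unchanged and limits the influence of a few very large or very small density ratio estimates.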
Adjusted and non-adjusted estimates of C-statistics are returned
in matrix form.
"Cstat" indicates the non-adjusted version, "C adjusted by score"
indicates the version adjusted by the distribution of linear
predictors, and "C adjusted by predictors" indicates the version
adjusted by the predictor distributions (multi-dimensionally).
For confidence intervals, "Percentile" indicates a confidence interval
by the percentile method and "Approx" indicates an interval based on
a Normal approximation.
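The two kinds of bootstrap interval named above can be sketched in base R for a generic statistic. This is a minimal illustration, not the package's implementation: the statistic is the sample mean of simulated data, and the Normal-approximation interval is assumed to be the point estimate plus or minus a Normal quantile times the bootstrap standard deviation. The names `reps` and `conf.level` mirror the arguments above; everything else is hypothetical.

```r
set.seed(1)
y          <- rnorm(200, mean = 1)  # toy data
reps       <- 2000
conf.level <- 0.95
alpha      <- 1 - conf.level

# Bootstrap replicates of the statistic (here: the sample mean)
boot <- replicate(reps, mean(sample(y, replace = TRUE)))

# Percentile method: empirical quantiles of the bootstrap replicates
ci_percentile <- quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Normal approximation: estimate +/- z * bootstrap standard deviation
ci_approx <- mean(y) + c(-1, 1) * qnorm(1 - alpha / 2) * sd(boot)
```

For a roughly symmetric bootstrap distribution the two intervals nearly coincide; they diverge when the distribution of the statistic is skewed.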
    set.seed(100)

    # generating learning data
    n0 = 100
    Z = cbind(rbeta(n0, 5, 5), rbeta(n0, 5, 5))
    Y = apply(Z, 1, function (xx) {
      rbinom(1, 1, (1/(1+exp(-(sum(c(-2,2,2) * c(1,xx)))))))})
    dat = data.frame(Y=Y, Za=Z[,1], Zb=Z[,2])

    # the model to be evaluated
    mdl = glm(Y~., binomial, data=dat)

    # validation dataset, with different centers on predictors
    n1 = 100
    Z1 = cbind(rbeta(n1, 6, 4), rbeta(n1, 6, 4))
    Y1 = apply(Z1, 1, function (xx) {
      rbinom(1, 1, (1/(1+exp(-(sum(c(-2,2,2) * c(1,xx)))))))})
    dat1 = data.frame(Y=Y1, Za=Z1[,1], Zb=Z1[,2])

    # calculation of C-statistics for this model
    appe.glm(mdl, dat, dat1, reps=0)
L1 and L2 errors adjusted for predictor distributions
Calculates L1 and L2 errors adjusted for predictor
distributions for a linear model.
    appe.lm(mdl, dat.train, dat.test, method = "uLSIF", sigma = NULL,
            lambda = NULL, kernel_num = NULL, fold = 5, stabilize = TRUE,
            qstb = 0.025, reps = 2000, conf.level = 0.95)
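mdl | an lm object describing the prediction model to be evaluated. |
dat.train | same as in appe.glm |
dat.test | same as in appe.glm |
method | same as in appe.glm |
sigma | same as in appe.glm |
lambda | same as in appe.glm |
kernel_num | same as in appe.glm |
fold | same as in appe.glm |
stabilize | same as in appe.glm |
qstb | same as in appe.glm |
reps | same as in appe.glm |
conf.level | same as in appe.glm |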
Adjusted and non-adjusted estimates of L1 and L2 errors
are returned in matrix form.
"L1" and "L2" indicate the non-adjusted versions, "L1 adjusted by score"
and "L2 adjusted by score" indicate the versions adjusted by the
distribution of linear predictors, and "L1 adjusted by predictors" and
"L2 adjusted by predictors" indicate the versions adjusted by the
predictor distributions (multi-dimensionally).
For confidence intervals, "Percentile" indicates a confidence interval
by the percentile method and "Approx" indicates an interval based on
a Normal approximation.
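How case weights enter an error measure can be sketched in base R. The snippet assumes that the L1 error is the mean absolute prediction error and the L2 error is the mean squared prediction error; the source does not spell out these definitions, so treat them as assumptions. The helper `weighted_errors`, the toy data, and the placeholder weights are all hypothetical and stand in for an estimated density ratio.

```r
set.seed(1)

# Toy development and validation data (not the package example)
dev   <- data.frame(x = rnorm(200))
dev$y <- 1 + 2 * dev$x + rnorm(200)
val   <- data.frame(x = rnorm(200, mean = 0.5))
val$y <- 1 + 2 * val$x + rnorm(200)

# Model fitted on the development data, residuals on the validation data
fit <- lm(y ~ x, data = dev)
res <- val$y - predict(fit, newdata = val)

# Hypothetical helper: weighted mean absolute and mean squared error
weighted_errors <- function(res, w = rep(1, length(res))) {
  c(L1 = sum(w * abs(res)) / sum(w),
    L2 = sum(w * res^2) / sum(w))
}

unadjusted <- weighted_errors(res)        # equal weights
w          <- runif(length(res), 0.5, 1.5)  # placeholder weights
adjusted   <- weighted_errors(res, w)
```

With equal weights the helper reduces to the ordinary mean absolute and mean squared errors; replacing the placeholder weights with density ratio estimates gives an adjusted version in the same spirit as the measures described above.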
    set.seed(100)

    # generating development data
    n0 = 100
    Z = cbind(rbeta(n0, 3, 3), rbeta(n0, 3, 3))
    Y = apply(Z, 1, function(xx) {
      rlnorm(1, sum(c(1, 1) * xx), 0.3) })
    dat = data.frame(Za=Z[,1], Zb=Z[,2], Y=Y)

    # the model to be evaluated
    mdl = lm(Y~ Za + Zb, data=dat)

    # generating validation dataset (n1 observations)
    n1 = 100
    Z1 = cbind(rbeta(n1, 3.5, 2.5), rbeta(n1, 3.5, 2.5))
    Y1 = apply(Z1, 1, function(xx) {
      rlnorm(1, sum(c(1, 1) * xx), 0.3) })
    dat1 = data.frame(Za=Z1[,1], Zb=Z1[,2], Y=Y1)

    # calculation of L1 and L2 for this model
    appe.lm(mdl, dat, dat1, reps=0)
Estimation of C-statistics
Calculates C-statistics. Individual case weights can be
incorporated.
    cvalest.bin(Y, scr, wgt = NULL)
Y | a numerical vector of binary outcomes, either 0 or 1. |
scr | a numerical vector of a continuous variable. |
wgt | a numerical vector of individual case weights. |
The estimated C-statistic is returned.
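A weighted C-statistic can be sketched in base R as follows. This is a generic illustration, not the code of cvalest.bin: the function name `weighted_cstat` is hypothetical, and the sketch assumes that each case-control pair is weighted by the product of the two individual weights and that tied scores count as one half. How the package treats ties and weights is not stated in this document.

```r
# Hypothetical helper: weighted concordance between a binary outcome and a score
weighted_cstat <- function(Y, scr, wgt = NULL) {
  if (is.null(wgt)) wgt <- rep(1, length(Y))
  case <- Y == 1
  ctrl <- Y == 0
  # 1 if the case scores higher than the control, 0.5 for ties, 0 otherwise
  conc <- outer(scr[case], scr[ctrl],
                function(a, b) (a > b) + 0.5 * (a == b))
  # Pair weight: product of the individual case weights
  pw <- outer(wgt[case], wgt[ctrl])
  sum(pw * conc) / sum(pw)
}

Y   <- c(0, 0, 1, 1)
scr <- c(0.1, 0.4, 0.35, 0.8)

c_unweighted <- weighted_cstat(Y, scr)                      # 0.75
c_weighted   <- weighted_cstat(Y, scr, wgt = c(1, 3, 1, 1)) # 0.625
```

In the unweighted call, three of the four case-control pairs are concordant. Giving the control with score 0.4 a weight of 3 increases the influence of its one discordant pair, which lowers the estimate.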
A wrapper function to use "densratio" function from the densratio package.
    densratio.appe(xtrain, xtest, method = "uLSIF", sigma = NULL,
                   lambda = NULL, kernel_num = NULL, fold = 5,
                   stabilize = TRUE, qstb = 0.025)
xtrain | a dataframe used to construct a prediction model. |
xtest | a dataframe corresponding to the validation (testing) data. |
method | same as in appe.glm |
sigma | same as in appe.glm |
lambda | same as in appe.glm |
kernel_num | same as in appe.glm |
fold | same as in appe.glm |
stabilize | same as in appe.glm |
qstb | same as in appe.glm |