imputeTestbench - R Package

The R package imputeTestbench provides a test bench for comparing methods used for missing data imputation. It validates a proposed imputation method and compares it with default methods such as historic mean and interpolation. The test bench is not limited to these methods: the user can add methods to, or remove methods from, the existing set. By default, the test bench compares imputation methods using the error metrics RMSE, MAE or MAPE, and it also allows users to add new error metrics as required. Its simplicity of use, and the significant reduction in effort and time compared with the state-of-the-art procedure, are its main advantages. This paper explains the use of all functions in the imputeTestbench package, with examples.

This package is available from the CRAN repository (https://cran.r-project.org/package=imputeTestbench).

The R package imputeTestbench is discussed in detail in the following publication:

[Manuscript.pdf]    [OA]    [Package]

The following web application provides an interactive user interface to the R package imputeTestbench (click on the image).



The following example describes how this package works.

Consider a sample data set, datax, defined as follows:

datax <- c(1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5,1:5)

Load the imputeTestbench package as follows:

library(imputeTestbench)

The function impute_errors() compares imputation methods in terms of the RMSE, MAE or MAPE error metrics. The syntax of impute_errors() is shown below:


impute_errors(dataIn, missPercentFrom, missPercentTo, interval, repetition, errorParameter, MethodPath, MethodName)

where:

  • dataIn is the input data used for testing
  • missPercentFrom is the percentage of missing values at which testing starts
  • missPercentTo is the percentage of missing values at which testing ends
  • interval is the step between consecutive missPercent values
  • repetition is an integer giving the number of repetitions performed for each missPercent value
  • errorParameter is the type of error calculation (RMSE, MAE or MAPE)
  • MethodPath is the location of the function for the proposed imputation method
  • MethodName is the name of the function for the proposed imputation method
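The way these parameters interact can be illustrated with a minimal base R sketch. It is not the package's internal code: sketch_errors() and mean_impute() are hypothetical helpers, the default values are assumptions, and the metrics use their standard textbook definitions.

```r
# Error metrics named by errorParameter (standard definitions).
rmse <- function(actual, imputed) sqrt(mean((actual - imputed)^2))
mae  <- function(actual, imputed) mean(abs(actual - imputed))
mape <- function(actual, imputed) mean(abs((actual - imputed) / actual)) * 100

# Hypothetical helper (not part of imputeTestbench): average error of one
# imputation function over a grid of missing percentages.
sketch_errors <- function(dataIn, missPercentFrom = 10, missPercentTo = 80,
                          interval = 10, repetition = 10,
                          errorFun = rmse, imputeFun) {
  percents <- seq(missPercentFrom, missPercentTo, by = interval)
  errors <- sapply(percents, function(p) {
    n_missing <- round(length(dataIn) * p / 100)
    # Repeat the random removal and average the resulting errors.
    mean(replicate(repetition, {
      holed <- dataIn
      holed[sample(length(dataIn), n_missing)] <- NA
      errorFun(dataIn, imputeFun(holed))
    }))
  })
  list(Missing_Percent = percents / 100, Error = errors)
}

# Example imputation: replace every NA with the mean of the observed values.
mean_impute <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

set.seed(1)
res <- sketch_errors(rep(1:5, 20), imputeFun = mean_impute)
res$Missing_Percent
```

Averaging over several repetitions matters because the positions of the removed values are random, so a single run can be unrepresentative.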

In its simplest form, the function impute_errors() can be used as:

q <- impute_errors(datax)
q
## $Parameter
## [1] "RMSE Plot"
## 
## $Missing_Percent
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
## 
## $Historic_Mean
## [1] 0.4789879 0.6250889 0.8108440 0.9018024 0.9856108 1.1087825 1.1952286
## [8] 1.2724180
## 
## $Interpolation
## [1] 0.6220167 0.7748639 0.8716673 1.3633658 1.2714936 1.3627703 1.2976507
## [8] 1.8725297
# By default, the bar plot is used to show the comparison
plot_errors(q)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]  1.5  4.5  7.5 10.5 13.5 16.5 19.5 22.5
## [2,]  2.5  5.5  8.5 11.5 14.5 17.5 20.5 23.5
# The user can also plot the comparison as a line plot:
plot_errors(dataIn = q, plotType = 2)
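To make the two compared methods concrete, the following base R sketch imputes a few gaps by historic mean and by linear interpolation and computes the RMSE of each. It is an illustration under assumptions, not the package's implementation: the short series, the gap positions and the use of approx() are choices made for this example only.

```r
x <- rep(1:5, 4)            # short series with the same 1..5 pattern
holed <- x
holed[c(5, 6, 14)] <- NA    # knock out three interior values

# Historic mean: fill every gap with the mean of the observed values.
by_mean <- holed
by_mean[is.na(holed)] <- mean(holed, na.rm = TRUE)

# Linear interpolation: join the observed neighbours of each gap.
idx <- seq_along(holed)
by_interp <- approx(idx[!is.na(holed)], holed[!is.na(holed)], xout = idx)$y

rmse <- function(actual, imputed) sqrt(mean((actual - imputed)^2))
c(Historic_Mean = rmse(x, by_mean), Interpolation = rmse(x, by_interp))
```

On a sawtooth series like this one, interpolation recovers a gap inside a rising run exactly but misses a gap that spans the drop from 5 back to 1, which is one reason the two methods can rank differently at different missing percentages.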

By default, impute_errors() compares two basic imputation methods: historic mean and interpolation. The plot_errors() function plots the comparison between the different methods. The test bench also allows a further imputation method to be added and compared with the existing ones. The only requirement is that the new method is written as a function that returns the imputed data as its output. Suppose the following function is the method to be added to the test bench.

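===============================

# Proposed method: fill missing values with random values via imputeTS.
inter <- function(outs)
{
  library(imputeTS)
  outs <- ts(outs)        # treat the input as a time series
  d <- na.random(outs)    # named na_random() in imputeTS 3.0 and later
  return(d)               # the imputed series
}

===============================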

Save this function in a new R script file, note its source location in the form "source('~/imputeTestbench/R/inter.R')", and pass it to the test bench as:

aa <- append_method(existing_method = q,dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/inter.R')", MethodName = "Random")

aa
plot_errors(aa)

The code above is shown without output because it depends on another function and its file location, which are not included in this package.
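Before a new method is wired into the test bench, it can help to check that it honours the requirement stated earlier: it takes data containing missing values and returns the imputed data. The sketch below does this in base R with a hypothetical method, locf_impute() (last observation carried forward), and a hypothetical checker, check_method(); neither is part of imputeTestbench.

```r
# Hypothetical example method: last observation carried forward (LOCF).
locf_impute <- function(outs) {
  filled <- as.numeric(outs)
  # A leading NA has no earlier value, so seed it with the first observed one.
  if (is.na(filled[1])) filled[1] <- filled[which(!is.na(filled))[1]]
  for (i in seq_along(filled)[-1]) {
    if (is.na(filled[i])) filled[i] <- filled[i - 1]
  }
  filled
}

# Contract check: same length in and out, and no NA left behind.
check_method <- function(imputeFun, x, n_missing = 5) {
  holed <- x
  holed[sample(length(x), n_missing)] <- NA
  out <- imputeFun(holed)
  length(out) == length(x) && !anyNA(out)
}

set.seed(42)
check_method(locf_impute, rep(1:5, 4))
```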

If the user wishes to add more than one imputation method to the test bench, the function append_method() is used as:

bb <- append_method(existing_method = aa, dataIn= datax,missPercentFrom = 10, missPercentTo = 80, interval = 10, MethodPath = "source('~/imputeTestbench/R/PSFimpute.R')", MethodName = "PSFimpute")

bb
plot_errors(bb)

where:

  • existing_method is the output obtained from the impute_errors() function
  • dataIn is the input data used for testing
  • missPercentFrom is the percentage of missing values at which testing starts
  • missPercentTo is the percentage of missing values at which testing ends
  • interval is the step between consecutive missPercent values
  • repetition is an integer giving the number of repetitions performed for each missPercent value
  • errorParameter is the type of error calculation (RMSE, MAE or MAPE)
  • MethodPath is the location of the function for the proposed imputation method
  • MethodName is the name of the function for the proposed imputation method

Similarly, the user can remove an imputation method from the test bench with the following function:

cc <- remove_method(existing_method = bb, method_number = 1)
cc
plot_errors(cc)

The random parameter controls where missing values are introduced. When random = 1, the package itself introduces missing values at completely random positions, whereas when random = 0, the user can introduce missing patches at desired locations, as shown in the following code.

dd <- impute_errors(random = 0, startPoint = c(10, 20, 30), patchLength = c(3, 4, 5))
dd
## $Parameter
## [1] "RMSE Plot"
## 
## $Missing_Percent
## [1] 0.12
## 
## $Historic_Mean
## [1] 0.5746791
## 
## $Interpolation
## [1] 0.7843964
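The relation between startPoint, patchLength and the reported missing percentage can be sketched in base R. This assumes that each patch covers patchLength consecutive positions beginning at its startPoint; patch_indices() is a hypothetical helper, and the series length of 100 is chosen only for illustration.

```r
# Hypothetical helper: expand patch starts and lengths into NA positions.
patch_indices <- function(startPoint, patchLength) {
  unlist(Map(function(s, len) seq(s, length.out = len),
             startPoint, patchLength))
}

x <- rep(1:5, 20)                      # assumed series of length 100
idx <- patch_indices(c(10, 20, 30), c(3, 4, 5))
x[idx] <- NA

idx                                    # 10 11 12 20 21 22 23 30 31 32 33 34
mean(is.na(x))                         # 0.12
```

Under these assumptions, 3 + 4 + 5 = 12 missing values out of 100 give a missing fraction of 0.12.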


Neeraj Dhanraj,
Jun 11, 2016, 12:13 PM