sfa: Stochastic Frontier Analysis

Description

Maximum Likelihood Estimation of Stochastic Frontier Production and Cost Functions. Two specifications are available: the error components specification with time-varying efficiencies (Battese and Coelli 1992) and a model specification in which the firm effects are directly influenced by a number of variables (Battese and Coelli 1995). This R package uses the Fortran source code of Frontier 4.1 (Coelli 1996).

Usage

sfa( formula, data = sys.frame( sys.parent() ),
   ineffDecrease = TRUE, truncNorm = FALSE,
   timeEffect = FALSE, startVal = NULL,
   tol = 0.00001, maxit = 1000, muBound = 2, bignum = 1.0E+16,
   searchStep = 0.00001, searchTol = 0.001, searchScale = NA,
   gridSize = 0.1, gridDouble = TRUE,
   restartMax = 10, restartFactor = 0.999, printIter = 0 )
frontier( yName, xNames = NULL, zNames = NULL, data,
   zIntercept = FALSE, … )
# S3 method for frontier
print( x, digits = NULL, … )

Arguments

formula

a symbolic description of the model to be estimated; it can be either a (usual) one-part or a two-part formula (see section ‘Details’).

data

a (panel) data frame that contains the data; if data is a usual data.frame, it is assumed that these are cross-section data; if data is a panel data frame (created with pdata.frame), it is assumed that these are panel data.

ineffDecrease

logical. If TRUE, inefficiency decreases the endogenous variable (e.g. for estimating a production function); if FALSE, inefficiency increases the endogenous variable (e.g. for estimating a cost function).

truncNorm

logical. If TRUE, the inefficiencies are assumed to have a truncated normal distribution (i.e. parameter $μ$ is added); if FALSE, they are assumed to have a half-normal distribution (only relevant for the ‘Error Components Frontier’).

timeEffect

logical. If FALSE (default), the efficiency estimates of an ‘Error Components Frontier’ are time invariant; if TRUE, time is allowed to have an effect on efficiency (this argument is ignored in case of an ‘Efficiency Effects Frontier’).

startVal

numeric vector. Optional starting values for the ML estimation.

tol

numeric. Convergence tolerance (proportional).

maxit

numeric. Maximum number of iterations permitted.

muBound

numeric. Bounds on the parameter $μ$ (see section ‘Details’).

bignum

numeric. Used to set bounds on densities and distributions.

searchStep

numeric. Size of the first step in the Coggin uni-dimensional search procedure done each iteration to determine the optimal step length for the next iteration (see Himmelblau 1972).

searchTol

numeric. Tolerance used in the Coggin uni-dimensional search procedure done each iteration to determine the optimal step length for the next iteration (see Himmelblau 1972).

searchScale

logical or NA. Scaling in the Coggin uni-dimensional search procedure done each iteration to determine the optimal step length for the next iteration (see Himmelblau 1972): if TRUE, the step length is scaled to the length of the last step; if FALSE, the step length is not scaled; if NA, the step length is scaled (to the length of last step) only if the last step was smaller.

gridSize

numeric. The size of the increment in the first phase grid search on $γ$ .

gridDouble

logical. If TRUE, a second phase grid search on $γ$ is conducted around the “best” value obtained in the first phase with an increment of gridSize/10.

restartMax

integer: maximum number of restarts of the search procedure when it cannot find a parameter vector that results in a log-likelihood value larger than the log-likelihood value of the initial parameters.

restartFactor

numeric scalar: if the search procedure cannot find a parameter vector that results in a log-likelihood value larger than the log-likelihood value of the initial parameters, the initial values (provided by argument startVal or obtained by the grid search) are multiplied by this number before the search procedure is restarted.

printIter

numeric. Print info every printIter iterations; if this argument is 0, do not print.

yName

string: name of the endogenous variable.

xNames

a vector of strings containing the names of the X variables (exogenous variables of the production or cost function).

zNames

a vector of strings containing the names of the Z variables (variables explaining the efficiency level).

zIntercept

logical. If TRUE, an intercept (with parameter $δ_{0}$ ) is added to the Z variables (only relevant for the ‘Efficiency Effects Frontier’).

x

an object of class frontier (returned by sfa or frontier).

digits

a non-null value for ‘digits’ specifies the minimum number of significant digits to be printed in values. The default, NULL, uses max(3,getOption("digits")-3). Non-integer values will be rounded down, and only values greater than or equal to 1 and no greater than 22 are accepted.

…

additional arguments of frontier are passed to sfa; additional arguments of the print method are currently ignored.
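The two-phase grid search controlled by gridSize and gridDouble can be pictured with a few lines of base R. This is only an illustrative sketch, not the Fortran routine: the range of the first-phase grid (gridSize to 1 - gridSize), the width of the second-phase grid (half of gridSize on either side) and the value bestGamma are assumptions made for the illustration.

```r
gridSize <- 0.1   # default of argument gridSize

# first phase: candidate values of gamma
# (assumed range: gridSize to 1 - gridSize)
gammaGrid1 <- seq( gridSize, 1 - gridSize, by = gridSize )

# hypothetical "best" value found in the first phase
bestGamma <- 0.7

# second phase (gridDouble = TRUE): finer grid around the best value
# with an increment of gridSize / 10 (assumed width: gridSize / 2 per side)
gammaGrid2 <- seq( bestGamma - gridSize / 2, bestGamma + gridSize / 2,
   by = gridSize / 10 )

length( gammaGrid1 )
length( gammaGrid2 )
```

A smaller gridSize gives a finer first-phase grid at the cost of more evaluations of the log likelihood function.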

Value

sfa and frontier return a list of class frontier containing the following elements:

modelType

integer. A ‘1’ denotes an ‘Error Components Frontier’ (ECF); a ‘2’ denotes an ‘Efficiency Effects Frontier’ (EEF).

ineffDecrease

logical. Argument ineffDecrease (see above).

nn

number of cross-sections.

nt

number of time periods.

nob

number of observations in total.

nb

number of regressor variables (Xs).

truncNorm

logical. Argument truncNorm.

zIntercept

logical. Argument zIntercept.

timeEffect

logical. Argument timeEffect.

printIter

numeric. Argument printIter (see above).

searchScale

numeric. Argument searchScale (see above).

tol

numeric. Argument tol (see above).

searchTol

numeric. Argument searchTol (see above).

bignum

numeric. Argument bignum (see above).

searchStep

numeric. Argument searchStep (see above).

gridDouble

logical. Argument gridDouble (see above).

gridSize

numeric. Argument gridSize (see above).

maxit

numeric. Argument maxit (see above).

muBound

numeric. Argument muBound (see above).

restartMax

numeric. Argument restartMax (see above).

restartFactor

numeric. Argument restartFactor (see above).

nRestart

numeric. Number of restarts of the search procedure when it cannot find a parameter vector that results in a log-likelihood value larger than the log-likelihood value of the initial parameters.

startVal

numeric vector. Argument startVal (only if specified by user).

call

the matched call.

dataTable

matrix. Data matrix sent to Frontier 4.1.

olsParam

numeric vector. OLS estimates.

olsStdEr

numeric vector. Standard errors of OLS estimates.

olsLogl

numeric. Log likelihood value of OLS estimation.

olsResid

numeric vector. Residuals of the OLS estimation.

olsSkewness

numeric. Skewness of the residuals of the OLS estimation.

olsSkewnessOkay

logical. Indicating if the residuals of the OLS estimation have the expected skewness.

gridParam

numeric vector. Parameters obtained from the grid search (if no starting values were specified).

gridLogl

numeric. Log likelihood value of the parameters obtained from the grid search (only if no starting values were specified).

startLogl

numeric. Log likelihood value of the starting values for the parameters (only if starting values were specified).

mleParam

numeric vector. Parameters obtained from ML estimation.

mleCov

matrix. Covariance matrix of the parameters obtained from the ML estimation.

mleLogl

numeric. Log likelihood value of the ML estimation.

nIter

numeric. Number of iterations of the ML estimation.

code

integer indicating the reason for termination: 1 = log likelihood values and parameters of two successive iterations are within the tolerance limits; 5 = cannot find a parameter vector that results in a log-likelihood value larger than the log-likelihood value obtained in the previous step; 6 = search failed on gradient step; 10 = maximum number of iterations reached.

nFuncEval

Number of evaluations of the log likelihood function during the grid search and the iterative ML estimation.

fitted

matrix. Fitted “frontier” values of the dependent variable: each row corresponds to a cross-section; each column corresponds to a time period.

resid

matrix. Residuals: each row corresponds to a cross-section; each column corresponds to a time period.

validObs

vector of logical values indicating which observations of the provided data were used for the estimation, i.e. do not have values that are not available (NA, NaN) or infinite (Inf).
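The termination codes stored in element code can be translated into readable messages with a small lookup table. The sketch below uses base R only; the names terminationReason and describeCode are hypothetical helpers that are not part of the frontier package, and the messages are shortened versions of the descriptions given for element code.

```r
# lookup table built from the codes documented for element 'code'
terminationReason <- c(
   "1" = "log likelihood values and parameters within tolerance limits",
   "5" = "no parameter vector with a larger log-likelihood value found",
   "6" = "search failed on gradient step",
   "10" = "maximum number of iterations reached" )

# hypothetical helper: return the message for a single code
describeCode <- function( code ) {
   reason <- terminationReason[ as.character( code ) ]
   if ( is.na( reason ) ) "unknown code" else unname( reason )
}

describeCode( 10 )
describeCode( 6 )
```

With an estimated model stored in an object such as result, the call would be describeCode( result$code ).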

Details

Function frontier is a wrapper function that calls sfa for the estimation. The two functions differ only in the user interface; function frontier has the “old” user interface and is kept to maintain compatibility with older versions of the frontier package.

One can use functions sfa and frontier to calculate the log likelihood value for a given model, a given data set, and given parameters by using argument startVal to specify the parameters and the other arguments to specify the model and the data. The log likelihood value can then be retrieved by the logLik method with argument which set to "start". Setting argument maxit to 0 skips the (potentially time-consuming) ML estimation and makes it possible to retrieve the log likelihood value with the logLik method without further arguments.
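As a minimal sketch of this use, the following code evaluates the log likelihood function of a Cobb-Douglas production frontier at user-supplied parameters. It assumes that the frontier package and its data set front41Data are available; the values in startPar are arbitrary illustrative numbers, and their assumed order (intercept, two slope coefficients, sigmaSq, gamma) should be checked against the coefficients reported for a regular estimation of the same model.

```r
library( "frontier" )
data( front41Data )

# arbitrary illustrative parameter values (assumed order:
# intercept, log( capital ), log( labour ), sigmaSq, gamma)
startPar <- c( 0.5, 0.3, 0.5, 0.2, 0.8 )

# maxit = 0: only evaluate the log likelihood, no ML iterations
llCheck <- sfa( log( output ) ~ log( capital ) + log( labour ),
   data = front41Data, startVal = startPar, maxit = 0 )

# log likelihood value at the supplied parameters
logLik( llCheck, which = "start" )
```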

The frontier function uses the Fortran source code of Tim Coelli's software FRONTIER 4.1 (http://www.uq.edu.au/economics/cepa/frontier.htm) and hence provides the same features as FRONTIER 4.1. Comprehensive documentation of FRONTIER 4.1 is available in the file Front41.pdf that is included in the archive FRONT41-xp1.zip, which is available at http://www.uq.edu.au/economics/cepa/frontier.htm. It is recommended to read this documentation, because the frontier function is based on the FRONTIER 4.1 software.

If argument formula of sfa is a (usual) one-part formula (or argument zNames of frontier is NULL), an ‘Error Components Frontier’ (ECF, see Battese and Coelli 1992) is estimated. If argument formula is a two-part formula (or zNames is not NULL), an ‘Efficiency Effects Frontier’ (EEF, see Battese and Coelli 1995) is estimated. In this case, the first part of the formula (i.e. the part before the “|” symbol) is used to explain the endogenous variable directly (X variables), while the second part of the formula (i.e. the part after the “|” symbol) is used to explain the efficiency levels (Z variables). Generally, there should be no reason for estimating an EEF without Z variables, but this can be done by setting the second part of argument formula to 1 (with Z intercept) or - 1 (without Z intercept) (or by setting argument zNames to NA).
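How a two-part formula separates into X and Z variables can be illustrated with base R alone. The following sketch only inspects the formula object and is not how sfa processes its formula argument internally; the object names are hypothetical, and the variable names are taken from the rice data set used in the examples.

```r
# a two-part formula: X variables before "|", Z variables after it
f <- log( PROD ) ~ log( AREA ) + log( LABOR ) | EDYRS + BANRAT

# the right-hand side is a call to "|" if the formula has two parts
rhs <- f[[ 3 ]]
isTwoPart <- is.call( rhs ) && identical( rhs[[ 1 ]], as.name( "|" ) )

# split the right-hand side into its two parts
xPart <- if ( isTwoPart ) rhs[[ 2 ]] else rhs
zPart <- if ( isTwoPart ) rhs[[ 3 ]] else NULL

all.vars( xPart )   # variables of the frontier function (X)
all.vars( zPart )   # variables explaining the efficiency levels (Z)
```

For a one-part formula, isTwoPart is FALSE and zPart stays NULL, which corresponds to the case in which an ECF is estimated.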

In case of an Error Components Frontier (ECF) with the inefficiency terms $u$ following a truncated normal distribution with mean $μ$ , argument muBound can be used to restrict $μ$ to be in the interval $\pm$ muBound * $σ_{u}$ , where $σ_{u}$ is the standard deviation of $u$ . If muBound is infinity, zero, or negative, no bounds on $μ$ are imposed.
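The interval implied by muBound can be written down directly. The helper muInterval below is a hypothetical illustration in base R and not a function of the frontier package; it only restates the rule given above for a known value of $σ_{u}$.

```r
# hypothetical helper: interval to which mu is restricted
muInterval <- function( sigmaU, muBound = 2 ) {
   # infinite, zero, or negative muBound: no bounds on mu
   if ( is.infinite( muBound ) || muBound <= 0 ) {
      return( c( -Inf, Inf ) )
   }
   c( -1, 1 ) * muBound * sigmaU
}

muInterval( 0.5 )                  # default muBound = 2
muInterval( 0.5, muBound = Inf )   # unrestricted
```

With the default muBound = 2 and a standard deviation of 0.5, $μ$ is restricted to the interval from -1 to 1.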

References

Battese, G.E. and T. Coelli (1992), Frontier production functions, technical efficiency and panel data: with application to paddy farmers in India. Journal of Productivity Analysis, 3, 153-169.

Battese, G.E. and T. Coelli (1995), A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empirical Economics, 20, 325-332.

Coelli, T. (1996) A Guide to FRONTIER Version 4.1: A Computer Program for Stochastic Frontier Production and Cost Function Estimation, CEPA Working Paper 96/08, http://www.uq.edu.au/economics/cepa/frontier.php, University of New England.

Himmelblau, D.M. (1972), Applied Non-Linear Programming, McGraw-Hill, New York.

Examples


# NOT RUN {
   # example included in FRONTIER 4.1 (cross-section data)
   data( front41Data )

   # Cobb-Douglas production frontier
   cobbDouglas <- sfa( log( output ) ~ log( capital ) + log( labour ),
      data = front41Data )
   summary( cobbDouglas )

   # load data about rice producers in the Philippines (panel data)
   data( riceProdPhil )

   # Error Components Frontier (Battese & Coelli 1992)
   # with observation-specific efficiencies (ignoring the panel structure)
   rice <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
      data = riceProdPhil )
   summary( rice )

   # Error Components Frontier (Battese & Coelli 1992)
   # with "true" fixed individual effects and observation-specific efficiencies
   riceTrue <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) + 
      factor( FMERCODE ),  data = riceProdPhil )
   summary( riceTrue )

   # add data set with information about its panel structure
   library( "plm" )
   ricePanel <- pdata.frame( riceProdPhil, c( "FMERCODE", "YEARDUM" ) )

   # Error Components Frontier (Battese & Coelli 1992)
   # with time-invariant efficiencies
   riceTimeInv <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
      data = ricePanel )
   summary( riceTimeInv )

   # Error Components Frontier (Battese & Coelli 1992)
   # with time-variant efficiencies
   riceTimeVar <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ),
      data = ricePanel, timeEffect = TRUE )
   summary( riceTimeVar )

   # Technical Efficiency Effects Frontier (Battese & Coelli 1995)
   # (efficiency effects model with intercept)
   riceZ <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) |
      EDYRS + BANRAT, data = riceProdPhil )
   summary( riceZ )

   # Technical Efficiency Effects Frontier (Battese & Coelli 1995)
   # (efficiency effects model without intercept)
   riceZ2 <- sfa( log( PROD ) ~ log( AREA ) + log( LABOR ) + log( NPK ) |
      EDYRS + BANRAT - 1, data = riceProdPhil )
   summary( riceZ2 )

   # Cost Frontier (with land as quasi-fixed input)
   riceProdPhil$cost <- riceProdPhil$LABOR * riceProdPhil$LABORP +
      riceProdPhil$NPK * riceProdPhil$NPKP
   riceCost <- sfa( log( cost ) ~ log( PROD ) + log( AREA ) + log( LABORP )
      + log( NPKP ), data = riceProdPhil, ineffDecrease = FALSE )
   summary( riceCost )
# }
