
The Lasso in High Dimensions, Part 1: Setup and Motivation

Tags: statistics, machine-learning, lasso
Setup and Motivation

In many modern statistical applications, we encounter the high-dimensional setting where the number of parameters $p$ far exceeds the number of observations $n$. Classical methods like ordinary least squares break down entirely in this regime.

The Linear Model

Consider the standard linear model:

$$y = X\beta + \varepsilon$$

where $y \in \mathbb{R}^n$ is the response vector, $X \in \mathbb{R}^{n \times p}$ is the design matrix, $\beta \in \mathbb{R}^p$ is the unknown parameter vector, and $\varepsilon \in \mathbb{R}^n$ is a noise vector with $\varepsilon_i \sim N(0, \sigma^2)$.

When $p > n$, the system is underdetermined: there are infinitely many $\beta$ that perfectly interpolate the data. We need additional structure to make the problem well-posed.

Sparsity Assumption

The key insight is to assume that $\beta$ is sparse: most of its entries are zero. Formally, we assume $\|\beta\|_0 = s$, where $s \ll n \ll p$.

The Lasso Estimator

The Lasso [1] (Least Absolute Shrinkage and Selection Operator) is defined as:

$$\hat{\beta}_{\text{lasso}} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2n} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}$$

The $\ell_1$ penalty $\lambda \|\beta\|_1$ simultaneously performs estimation and variable selection by shrinking small coefficients exactly to zero.

In the next post, we will establish oracle inequalities that quantify how well $\hat{\beta}_{\text{lasso}}$ approximates $\beta$.

Bibliography

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society: Series B, vol. 58, no. 1, pp. 267–288, 1996.
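To make the estimator defined above concrete, here is a minimal NumPy sketch that solves the lasso objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ by proximal gradient descent (ISTA) on simulated sparse high-dimensional data. This is an illustration, not a tuned implementation: the function names (`lasso_ista`, `soft_threshold`), the choice $\lambda = 0.3$, and the problem sizes are all assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                     # high-dimensional: p >> n; sparse: s << n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 3.0                       # s nonzero coefficients, rest exactly zero
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/(2n))||y - X b||_2^2 + lam * ||b||_1 via proximal gradient (ISTA)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n     # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n      # gradient of the quadratic loss
        b = soft_threshold(b - grad / L, lam / L)
    return b

beta_hat = lasso_ista(X, y, lam=0.3)      # lam chosen by hand for this demo
print("nonzeros:", np.count_nonzero(beta_hat))
print("estimated support:", np.flatnonzero(beta_hat))
```

Note how the soft-thresholding step sets most coordinates exactly to zero, which is precisely the variable-selection behavior the $\ell_1$ penalty provides; a ridge ($\ell_2$) penalty would only shrink coefficients without zeroing them.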