# The Lasso in High Dimensions, Part 1: Setup and Motivation

*2026-05-10 · statistics, machine-learning, lasso*

## Setup and Motivation

In many modern statistical applications, we encounter the high-dimensional setting where the number of parameters $p$ far exceeds the number of observations $n$. Classical methods like ordinary least squares break down entirely in this regime.

## The Linear Model

Consider the standard linear model:

$$
y = X\beta^* + \varepsilon,
$$

where $y \in \mathbb{R}^n$ is the response vector, $X \in \mathbb{R}^{n \times p}$ is the design matrix, $\beta^* \in \mathbb{R}^p$ is the unknown parameter vector, and $\varepsilon \in \mathbb{R}^n$ is a noise vector with $\varepsilon_i \sim N(0, \sigma^2)$.

When $p \gg n$, the system is underdetermined: there are infinitely many $\beta$ that perfectly interpolate the data. We need additional structure to make the problem well-posed.

## Sparsity Assumption

The key insight is to assume that $\beta^*$ is sparse: most of its entries are zero. Formally, we assume $\|\beta^*\|_0 = s$, where $\|\beta^*\|_0$ counts the nonzero entries of $\beta^*$ and $s \ll n \ll p$.

## The Lasso Estimator

The Lasso [1] (Least Absolute Shrinkage and Selection Operator) is defined as

$$
\hat{\beta}_{\text{lasso}} = \underset{\beta \in \mathbb{R}^p}{\arg\min} \left\{ \frac{1}{2n} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}.
$$

The $\ell_1$ penalty $\lambda \|\beta\|_1$ simultaneously performs estimation and variable selection by shrinking small coefficients exactly to zero (a short illustrative appendix follows the bibliography).

In the next post, we will establish oracle inequalities that quantify how well $\hat{\beta}_{\text{lasso}}$ approximates $\beta^*$.

## Bibliography

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," *Journal of the Royal Statistical Society: Series B*, vol. 58, no. 1, pp. 267–288, 1996.
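## Appendix: Two Small Illustrations

To see why the $\ell_1$ penalty produces exact zeros, consider the orthogonal-design special case $X^\top X = n I_p$ (a simplification not treated above). The Lasso objective then separates across coordinates, each coordinate reducing to minimizing $\frac{1}{2}\beta_j^2 - \beta_j \hat{\beta}^{\text{OLS}}_j + \lambda |\beta_j|$, which gives the closed form

$$
\hat{\beta}_{\text{lasso},j} = \operatorname{sign}\big(\hat{\beta}^{\text{OLS}}_j\big)\big(|\hat{\beta}^{\text{OLS}}_j| - \lambda\big)_+,
\qquad \hat{\beta}^{\text{OLS}} = \frac{1}{n} X^\top y.
$$

Every coefficient whose least-squares value is smaller than $\lambda$ in magnitude is set exactly to zero, and the rest are shrunk toward zero by $\lambda$.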
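Next, the underdetermined regime itself. Below is a minimal NumPy sketch; the dimensions $n = 50$, $p = 200$, $s = 5$ and the noise level $\sigma = 0.5$ are illustrative choices satisfying $s \ll n \ll p$, not values from the post.

```python
import numpy as np

# Illustrative dimensions and noise level (assumptions, not from the post),
# chosen so that s << n << p.
rng = np.random.default_rng(0)
n, p, s, sigma = 50, 200, 5, 0.5

X = rng.standard_normal((n, p))             # design matrix
beta_star = np.zeros(p)
beta_star[:s] = rng.standard_normal(s)      # s-sparse true coefficient vector
y = X @ beta_star + sigma * rng.standard_normal(n)   # y = X beta* + eps

# With p > n the normal equations are singular; np.linalg.lstsq returns the
# minimum-norm interpolating solution, one of infinitely many betas that fit
# the training data exactly.
beta_mn, *_ = np.linalg.lstsq(X, y, rcond=None)
print("training residual: ", np.linalg.norm(y - X @ beta_mn))      # ~ 0
print("estimation error:  ", np.linalg.norm(beta_mn - beta_star))  # not small
```

The solution interpolates the noise perfectly yet says little about $\beta^*$, which is exactly the failure mode described in the setup.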
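Finally, a bare-bones Lasso solver. The proximal-gradient (ISTA) iteration below is a pedagogical sketch of the optimization problem defined above, not a production implementation; the choice $\lambda = \sigma\sqrt{2\log(p)/n}$ is the standard theoretical scaling with an arbitrary constant, and the snippet is meant to be run after the previous one, whose variables it reuses.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal map of tau * ||.||_1 (coordinatewise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1/2n) ||y - X b||_2^2 + lam * ||b||_1 by proximal gradient
    descent (ISTA). A pedagogical sketch, not a tuned solver."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2    # 1/L, with L = ||X||_2^2 / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n     # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Reuses X, y, beta_star, n, p, sigma from the previous snippet.
lam = sigma * np.sqrt(2 * np.log(p) / n)    # theoretical scaling (a tuning choice)
beta_hat = lasso_ista(X, y, lam)
print("estimation error: ", np.linalg.norm(beta_hat - beta_star))
# The estimate is exactly sparse; very small true coefficients may be missed.
print("estimated support:", np.nonzero(beta_hat)[0])   # compare with 0..s-1
```

In practice one would reach for a coordinate-descent solver such as scikit-learn's `sklearn.linear_model.Lasso`, which minimizes the same $\frac{1}{2n}$-scaled objective with its `alpha` parameter playing the role of $\lambda$ here.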