Jun 29, 2022

Shapley Values, Random Forests, and Lasso Regression for Explaining Factor Importance in U.S. Cross-Sectional Returns

An asset’s expected return depends on a set of risk factors and the asset’s exposure to each factor. A core pillar of factor investing research has been the identification of new factors that best explain cross-sectional returns. Researchers have identified hundreds of factors, but many of them are redundant, containing similar information about risk. A key challenge is determining which factors in this vast collection are actually the most important. I build LASSO and random forest machine learning models for this purpose because both can handle high-dimensional data. I use Shapley values, feature permutation, and mean decrease in impurity to evaluate feature importance for the random forest model, and I compare those results to the feature importance obtained through LASSO and OLS regression. From a set of 150 factors, I find that the momentum (UMD), earnings announcement return (ear), high-minus-low (HML), sales-to-cash ratio (salecash), and small-minus-big (SMB) factors are the most important overall. The different models produce moderate differences in factor importance rankings, and the different feature importance metrics for the random forest produce slight variations in feature importance rankings.
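To make the idea of a Shapley value concrete, the sketch below computes exact Shapley values for a single prediction of a small toy model. It is only an illustration and not the study's implementation: the function names, the three-factor toy model, its coefficients, and the zero baseline are all hypothetical. Exact enumeration over every ordering is only practical for a handful of features, so a model with 150 factors would need an approximation in practice.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation.

    predict  : function mapping a list of feature values to a number
    x        : feature values of the observation being explained
    baseline : reference values used for features that are "absent"
    """
    n = len(x)
    phi = [0.0] * n
    # Average each feature's marginal contribution over every ordering
    # in which features can be switched from baseline to actual value.
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]
            new = predict(current)
            phi[j] += new - prev
            prev = new
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def toy_model(features):
    """Hypothetical return model with one interaction term (assumed, not estimated)."""
    umd, hml, smb = features
    return 0.5 * umd + 0.3 * hml + 0.1 * smb + 0.2 * umd * hml


# Hypothetical factor exposures for one asset, and an all-zero baseline.
x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)
for name, value in zip(["UMD", "HML", "SMB"], phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the model's prediction at `x` and its prediction at the baseline, and the interaction term is split equally between the two factors involved.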

Independent Study under the direction of Professor Gill Segal
