Why Apply Machine Learning to Asset Pricing? _qq_903012463's blog

Below I share a section from a paper by Shihao Gu et al. It moved me when I read it, so I made a note of it.
A number of aspects of empirical asset pricing make it a particularly attractive field for analysis with
machine learning methods.

Two main research agendas have monopolized modern empirical asset pricing research. The
first seeks to describe and understand differences in expected returns across assets. The second
focuses on dynamics of the aggregate market equity risk premium. Measurement of an asset’s risk
premium is fundamentally a problem of prediction—the risk premium is the conditional expectation
of a future realized excess return. Machine learning, whose methods are largely specialized for
prediction tasks, is thus ideally suited to the problem of risk premium measurement.
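
The prediction framing can be written down explicitly. The block below is a sketch in notation close to the paper's own setup, reproduced from memory, so treat the exact symbols as approximate: asset i's excess return over the next period is its conditional expectation plus an unpredictable error, and that expectation is an unknown function g* of the asset's predictor vector z_{i,t}. The squared-error objective on the last line is an assumed loss chosen for illustration.

```latex
r_{i,t+1} = \mathrm{E}_t\!\left[r_{i,t+1}\right] + \epsilon_{i,t+1},
\qquad
\mathrm{E}_t\!\left[r_{i,t+1}\right] = g^{\star}(z_{i,t}),
\qquad
\hat{g} = \arg\min_{g}\; \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\bigl(r_{i,t+1} - g(z_{i,t})\bigr)^{2}.
```

Under this loss, measuring the risk premium becomes an ordinary supervised regression problem; the various machine learning methods differ mainly in the family of functions g they search over and in how they penalize complexity.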

The collection of candidate conditioning variables for the risk premium is large. The profession
has accumulated a staggering list of predictors that various researchers have argued possess forecasting
power for returns. The number of stock-level predictive characteristics reported in the literature
numbers in the hundreds and macroeconomic predictors of the aggregate market number in the
dozens. Additionally, predictors are often close cousins and highly correlated. Traditional prediction
methods break down when the predictor count approaches the observation count or predictors
are highly correlated. With an emphasis on variable selection and dimension reduction techniques,
machine learning is well suited for such challenging prediction problems by reducing degrees of freedom
and condensing redundant variation among predictors.
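
The collinearity point can be seen with just two predictors. The sketch below is a hypothetical illustration, not code from the paper: two simulated predictors are near-copies of one latent signal, and the target is a weak, noisy function of that signal, loosely mimicking returns. Ridge regression is used here only as one simple example of parameter penalization; the function name `fit_two_predictors`, the seed, and the penalty `lam=10.0` are all assumptions. With near-collinear predictors, OLS typically splits the effect into large offsetting coefficients, while ridge shrinks the coefficient vector; for any positive penalty its norm is never larger than the OLS norm.

```python
import math
import random


def fit_two_predictors(x1, x2, y, lam=0.0):
    """Least squares on two predictors with no intercept (assumes mean-zero data).

    lam == 0 gives ordinary least squares; lam > 0 gives ridge regression,
    which adds lam to the diagonal of X'X before solving the normal equations.
    """
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    r1 = sum(a * t for a, t in zip(x1, y))
    r2 = sum(b * t for b, t in zip(x2, y))
    # With lam == 0 and nearly collinear x1, x2, det is tiny relative to s11 * s22
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system (X'X + lam*I) beta = X'y
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det


def corr(a, b):
    """Sample Pearson correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


random.seed(42)
n = 60
f = [random.gauss(0, 1) for _ in range(n)]        # hypothetical shared latent signal
x1 = [v + 0.01 * random.gauss(0, 1) for v in f]   # two "close cousin" predictors
x2 = [v + 0.01 * random.gauss(0, 1) for v in f]
y = [0.1 * v + random.gauss(0, 1) for v in f]     # weak signal buried in noise

ols = fit_two_predictors(x1, x2, y)
ridge = fit_two_predictors(x1, x2, y, lam=10.0)
ols_norm = math.hypot(*ols)
ridge_norm = math.hypot(*ridge)

print(f"corr(x1, x2) = {corr(x1, x2):.5f}")
print(f"OLS   coefficients: {ols[0]:+.3f}, {ols[1]:+.3f}  (norm {ols_norm:.3f})")
print(f"ridge coefficients: {ridge[0]:+.3f}, {ridge[1]:+.3f}  (norm {ridge_norm:.3f})")
```

The penalty value is arbitrary here; in practice it would be tuned on a validation sample rather than fixed in advance.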

Further complicating the problem is ambiguity regarding functional forms through which the
high-dimensional predictor set enters into risk premia. Should they enter linearly? If nonlinearities
are needed, which form should they take? Must we consider interactions among predictors? Such
questions rapidly proliferate the set of potential model specifications. The theoretical literature offers
little guidance for winnowing the list of conditioning variables and functional forms. Three aspects
of machine learning make it well suited for problems of ambiguous functional form. The first is its
diversity. As a suite of dissimilar methods it casts a wide net in its specification search. Second, with
methods ranging from generalized linear models to regression trees and neural networks, machine
learning is explicitly designed to approximate complex nonlinear associations. Third, parameter
penalization and conservative model selection criteria complement the breadth of functional forms
spanned by these methods in order to avoid overfit biases and false discovery.
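
The interaction question can be made concrete with a toy example. The sketch below uses hypothetical grid data, not anything from the paper: the "return" equals 1 only when two signals are both positive, a pure interaction with no additive structure. A minimal greedy regression tree of depth 2 (CART-style splits on squared error, written from scratch as a simplified stand-in for the tree methods the passage mentions) is compared with the best linear fit. The helper names `sse`, `best_split`, `grow`, and `predict` are illustrative assumptions.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys):
    """Greedy split: the (feature, threshold) that most reduces SSE, or None."""
    best, best_sse = None, sse(ys)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            s = sse(left) + sse(right)
            if s < best_sse - 1e-12:
                best, best_sse = (j, t), s
    return best


def grow(rows, ys, depth):
    """Grow a regression tree to at most `depth` levels; leaves predict the node mean."""
    split = best_split(rows, ys) if depth > 0 else None
    if split is None:
        return ("leaf", sum(ys) / len(ys))
    j, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    return ("node", j, t,
            grow([r for r, _ in left], [y for _, y in left], depth - 1),
            grow([r for r, _ in right], [y for _, y in right], depth - 1))


def predict(tree, r):
    """Walk from the root to a leaf and return its mean."""
    while tree[0] == "node":
        _, j, t, left, right = tree
        tree = left if r[j] <= t else right
    return tree[1]


# Hypothetical data: a 20 x 20 grid of two signals; y = 1 only if both are positive.
grid = [(i - 9.5) / 10 for i in range(20)]
rows = [(a, b) for a in grid for b in grid]
ys = [1.0 if a > 0 and b > 0 else 0.0 for a, b in rows]

# Linear benchmark y ~ c + b1*x1 + b2*x2. On this balanced grid both signals have
# mean zero and are uncorrelated, so OLS reduces to univariate slopes.
c = sum(ys) / len(ys)
b1 = sum(a * y for (a, _), y in zip(rows, ys)) / sum(a * a for a, _ in rows)
b2 = sum(b * y for (_, b), y in zip(rows, ys)) / sum(b * b for _, b in rows)
linear_mse = sum((y - (c + b1 * a + b2 * b)) ** 2
                 for (a, b), y in zip(rows, ys)) / len(ys)

tree = grow(rows, ys, depth=2)
tree_mse = sum((y - predict(tree, r)) ** 2 for r, y in zip(rows, ys)) / len(ys)

print(f"linear MSE: {linear_mse:.4f}")
print(f"tree MSE:   {tree_mse:.4f}")
```

Because the target is constant within each quadrant, two greedy splits recover it exactly, while an additive linear model cannot represent the interaction however its coefficients are chosen.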

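In 2020, the Review of Financial Studies published an article titled "Empirical Asset Pricing via Machine Learning". Two of its authors, Shihao Gu and Dacheng Xiu, are Chinese scholars at Booth; the third, Bryan Kelly, is at Yale and AQR. The article carries out a comparative analysis of machine learning methods for the canonical problem of empirical asset pricing (measuring asset risk premia) and shows that investors using machine learning can obtain large economic gains, with performance that can even be double that of the regression-based strategies in the existing literature. The article identifies the best-performing models (trees and neural networks) and traces…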

新的研究表明，汤普森抽样可以很自然地与经典的线性规划公式相结合，其中就包括库存受限。 1933年， William R. Thompson发表了一篇关于贝叶斯模型(Bayesian model)算法的文章，该算法最终将被称为汤普森抽样。但这一算法在当时却成为激烈的研究主题，部分原因是互联网公司成功地将其用于在线广告展示。汤普森抽样选择了多臂强盗问题（有...