(Adapted and summarized from “Probabilistic Machine Learning for Finance and Investing” by Deepak Kanungo)
In investing, we are not dealing with simple games of chance, such as casino games, where the players, rules, and probability distributions are fixed and known, and where event risks can therefore be estimated accurately. Markets are quite different from such games. Unknown market participants may use different probability distributions in their models based on their own strategies and assumptions. Even for popular, consensus statistical distributions, there is no agreement about parameter estimates. Furthermore, because markets are not stationary ergodic, these probability distributions and their parameters are continually changing, sometimes abruptly, making a mockery of everyone’s estimates and predictions. So, by the conventional definitions of risk and uncertainty, almost all investing is uncertain. As practitioners, we develop our own subjective models based on our experience, expertise, knowledge, and judgment. As a matter of fact, we protect our proprietary, subjective models assiduously, since sharing them with the public would undermine our competitive advantage.
In epistemic statistics, probabilities are an extension of logic and can be assigned to any uncertain event—known, unknown, and unknowable. We do this by rejecting point estimates and setting the bar extremely high for declaring any event a certainty (probability = 1) or an impossibility (probability = 0). That’s why in epistemic statistics we deal only with probability distributions. Unknowable events are acknowledged by using fat-tailed probability distributions like the Cauchy distribution, which has no defined mean or variance, reflecting the fact that almost anything is possible over the life of our investments. Probability estimates are based on our prior knowledge, observed data, and expertise in making such estimates. Most importantly, they depend on human judgment, common sense, and an understanding of causation, which AI systems are incapable of processing. The degree of confidence we have in our estimates and forecasts will vary depending on many factors, including the nature of the event, the sources of uncertainty, our resources, and our ability to perform such tasks. In finance and investing, we don’t have the luxury of not undertaking such imperfect, messy statistical endeavors. We do it knowing full well that these difficult exercises are rife with approximations, riddled with potential errors, and susceptible to the ravages and ridicule of markets. The worst course of action is to be lulled into a false sense of security by some economic ideology of objective statistical models, or some normative theory of human behavior and rationality, that has no basis in data or the experienced realities of the world.
All uncertain events are logically and realistically plausible based on an appropriate probability distribution and boundary conditions. We know that all models are wrong, including the useful ones, and do not pledge any fealty to these shadows of reality.
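The point about the Cauchy distribution can be made concrete with a short simulation. The following sketch (my illustration, not the book’s; the seed and sample size are arbitrary) draws from a normal and a Cauchy distribution and tracks the running sample mean of each: the normal’s settles near its true mean, while the Cauchy’s never converges, because its mean is undefined and extreme draws keep dominating no matter how much data we collect.

```python
# Minimal sketch: running sample means for a normal vs. a Cauchy distribution.
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed
n = 100_000                       # arbitrary sample size

normal_draws = rng.normal(loc=0.0, scale=1.0, size=n)
cauchy_draws = rng.standard_cauchy(size=n)

# Running means after 1, 2, ..., n draws.
running_mean_normal = np.cumsum(normal_draws) / np.arange(1, n + 1)
running_mean_cauchy = np.cumsum(cauchy_draws) / np.arange(1, n + 1)

# The normal's running mean hovers near 0; the Cauchy's keeps jumping around
# even after 100,000 draws, because its mean and variance are undefined.
print("Normal running mean after all draws:", running_mean_normal[-1])
print("Cauchy running mean after all draws:", running_mean_cauchy[-1])
```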
Epistemic Uncertainty
Episteme means “knowledge” in Greek. The epistemic uncertainty of any scenario depends on the state of knowledge or ignorance of the person confronting it. Unlike aleatory uncertainty, you can reduce epistemic uncertainty by acquiring more knowledge and understanding.
Designing Machine Learning models involves making choices about the objective function, data sample, model, algorithm, and computational resources, among many others. Our goal is to train our ML system so that it minimizes the out-of-sample generalization errors that are reducible. If we have prior knowledge about the problem domain, we might encode those assumptions in a simple model with few parameters. This is referred to as bias in ML. The risk is that our prior assumptions may be erroneous, leading the model to underfit the training data systematically and learn no new patterns or signals from it. Consequently, the model is exposed to bias errors and performs poorly on unseen test data. On the other hand, if we don’t have prior knowledge about the problem domain, we might build a complex model with many parameters to adapt and learn as much as possible from the training data. The risk there is that the model overfits the training data and learns the spurious correlations (noise) as well. The result is that minor variations in the data introduce errors into the model’s predictions and inferences. These are referred to as variance errors, and the model performs poorly on unseen test data. A trade-off needs to be made in developing models that minimize reducible generalization errors. This trade-off is made more difficult and dynamic when the underlying data distributions are not stationary ergodic, as is the case in investing problems.
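The trade-off can be illustrated with a generic sketch (my own, not from the book; it assumes scikit-learn is available, and the sine target, sample sizes, and polynomial degrees are arbitrary choices). Polynomial regressions of increasing degree are fit to the same small, noisy dataset and compared on training and test error.

```python
# Minimal sketch of the bias-variance trade-off with polynomial regression.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# A small, noisy dataset drawn from a known nonlinear target.
x = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, size=30)
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

# Degree 1 builds in a strong (and wrong) prior assumption of linearity;
# degree 15 has enough parameters to chase the noise in the training data.
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")

# Typically the degree-1 model underfits (bias errors on both sets), while the
# degree-15 model overfits (low training error, noticeably worse test error).
```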
Ontological Uncertainty
Ontological uncertainty generally arises from the future of human affairs being essentially unknowable.
Unexpected changes in business and financial markets are the rule, not the exception. Markets don’t send out a memo to participants when undergoing structural changes. Companies, deals, and investment strategies fail regularly and spectacularly because of these types of changes.
In investing, the source of ontological uncertainty is the complexity of human activities, such as political elections, monetary and fiscal policy changes, company bankruptcies, and technological breakthroughs, to name just a few. Only humans can understand the causality underlying these changes and use common sense to redesign the ML models from scratch to adapt to a new regime.
As you can see, designing models involves understanding different types of uncertainty, each of which entails deciding among various design options. These decisions require prior knowledge of the problem domain and experience experimenting with many different models and algorithms; they cannot be derived from first principles of deductive logic or learned only from sample data that are not stationary ergodic. What might surprise most people is that there is a set of mathematical theorems, the no free lunch (NFL) theorems, that proves why this reliance on prior knowledge is unavoidable.
The No Free Lunch Theorems
In 1996, David Wolpert shocked the Machine Learning community by publishing a paper that proved mathematically the impossibility of a single superior Machine Learning algorithm that can solve all problems optimally. Prior knowledge of the problem domain is required to select the appropriate learning algorithm and improve its performance.[1]
Wolpert subsequently published another paper with William Macready in 1997 that provided a similar proof for search and optimization algorithms. These theorems are collectively known as the no free lunch (NFL) theorems. Note that the prior knowledge and assumptions about the problem domain that are used in the selection and design of the learning algorithm are also referred to as bias. Furthermore, a problem is defined by a data-generating target distribution that the algorithm is trying to learn from training data. A cost function is used to measure the performance of the learning algorithm on out-of-sample test data.
Succinctly, all this means that if we don’t pay for our lunch with prior knowledge that aligns our learning algorithm with the underlying target function of the problem domain (as the freeloading frequentists insist we should in order to remain unbiased), the learning algorithm’s predictions on unseen data will be no better than random guessing when averaged over all possible target distributions. In fact, the risk is that it might be worse than random guessing. So we can’t have our lunch and not pay for it in AI and Machine Learning. If we bolt for the exit without paying, we’ll realize later that what we wolfed down was junk food and not a real meal.
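The flavor of this result can be seen in a toy enumeration (my construction for intuition only, not Wolpert’s proof). On a tiny input space we enumerate every possible binary target function, fit two very different rules to the same training points, and score them only on the off-training-set points: averaged over all targets, both land at exactly 50 percent, that is, random guessing.

```python
# Toy illustration of the NFL intuition on a tiny domain of 8 points.
from itertools import product

domain = list(range(8))                 # 8 possible inputs
train_x, test_x = domain[:5], domain[5:]

def majority_learner(train_labels):
    # Predict the most common training label at every off-training-set point.
    guess = int(sum(train_labels) * 2 >= len(train_labels))
    return lambda x: guess

def pessimist_learner(train_labels):
    # Predict the *least* common training label -- a deliberately "bad" rule.
    guess = int(sum(train_labels) * 2 < len(train_labels))
    return lambda x: guess

def avg_off_training_accuracy(make_predictor):
    targets = list(product([0, 1], repeat=len(domain)))  # all 256 target functions
    total_hits = 0
    for target in targets:
        predict = make_predictor([target[x] for x in train_x])
        total_hits += sum(predict(x) == target[x] for x in test_x)
    return total_hits / (len(targets) * len(test_x))

print("majority learner :", avg_off_training_accuracy(majority_learner))
print("pessimist learner:", avg_off_training_accuracy(pessimist_learner))
# Both print 0.5: without prior knowledge tying the training points to the
# unseen points, no learner beats random guessing on average over all targets.
```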
Conclusions
In short, a model’s got to know its limitations. This is worth emphasizing because this characteristic matters so much for models used in investing. The corollary is that an AI’s got to know its limitations. The most serious limitation of all AI systems is that they lack common sense, which stems from their inability to understand causal relationships. During training, AI systems learn only statistical relationships, which are hard to generalize to new situations without comprehending causality.
Humans are endowed with a very important quality that no AI has been able to learn so far: a commonsensical ability to generalize our learnings reasonably well to unseen, out-of-sample related classes or ranges, even if we have not been specifically trained on them. Unlike AI systems, almost all humans can easily deduce, infer, and adjust their knowledge to new circumstances based on common sense.
The primary reason AI systems fail to generalize in this way is that they only compute correlations and don’t have the tools to comprehend causation. Furthermore, humans can abstract concepts from specific examples and reason in terms of generalized objects and the causal relationships among them, while AI systems are simply unable to do that. This is a major problem when dealing with noisy, big datasets, because they present abundant opportunities for correlating variables that have no plausible physical or causal relationship. With large datasets, spurious correlations among variables are the rule, not the exception.

Much valuable information and assessment of uncertainty are lost when a statistical distribution is summarized by a point estimate, even if it is an optimal estimate. By definition and design, point estimates cannot capture the epistemic uncertainty of model parameters because they are not probability distributions. This has serious consequences in finance and investing, where we are dealing with complex, dynamic social systems that are steeped in all three dimensions of uncertainty: aleatory, epistemic, and ontological.
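A minimal sketch of what a point estimate throws away (a generic illustration, not the book’s example; it assumes SciPy is available, and the 7-wins-in-10-trades sample and flat prior are arbitrary choices): the single “optimal” estimate of a win rate is compared with the full Bayesian posterior distribution over that same parameter.

```python
# Point estimate vs. posterior distribution for a win rate, from 10 trades.
from scipy import stats

wins, trades = 7, 10                       # small, illustrative sample

# Point estimate: the maximum-likelihood win rate, a single number.
mle = wins / trades
print(f"Point estimate of win rate: {mle:.2f}")

# Epistemic view: with a flat Beta(1, 1) prior, the posterior over the win
# rate is Beta(1 + wins, 1 + losses) -- a whole distribution, not a number.
posterior = stats.beta(1 + wins, 1 + (trades - wins))
lo, hi = posterior.ppf([0.05, 0.95])
print(f"Posterior mean: {posterior.mean():.2f}")
print(f"90% credible interval: [{lo:.2f}, {hi:.2f}]")

# The credible interval is wide: the point estimate of 0.70 hides how
# uncertain the parameter really is after only 10 trades.
```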
[1] David Wolpert, “The Lack of A Priori Distinctions between Learning Algorithms,” Neural Computation 8, no. 7 (1996): 1341–90.