“If you torture the data enough, it will confess to anything.” – Ronald H. Coase
New entrants into the field of systematic investing must look for ways to differentiate themselves. Impressive backtested performance is one way to stand out. But are these backtests robust? Are they likely to withstand the challenges of real-world implementation? Historical returns can be noisy, and even small changes to how an experiment is run can produce very different outcomes. While it is easy to find exciting results in a backtest, we believe the right thing to do is to evaluate these findings using a comprehensive research framework and determine whether the discoveries are truly new and can benefit investors going forward.
Setting the Stakes
Let’s say we have two simulated strategies, A and B, that focus on US small cap value stocks and are rebalanced annually. Strategy A returned 15.6% per year from January 1973 to June 2019. A slight methodology tweak in strategy B, however, boosted performance to 16.7%. The easy thing to do with this observation is to conclude that B is superior. After all, a pickup of 110 basis points (1.1 percentage points) per year is no small feat.
So, what was the special enhancement for B relative to A? Simply changing the rebalancing month from May to January. As we see from Exhibit 1, there was substantial variation in the returns based on the choice of rebalancing month for otherwise identical small cap value strategies. However, one should not interpret this 110-basis-point spread as an expected value-add. In fact, this is a cautionary tale about the potential for noise in empirical research.
What’s the right thing to do? Be wary of the potential for such noise and seek to mitigate its influence on the inferences drawn from the empirical findings. In this case, the right approach could be to use annual staggered rebalancing that takes an average of all the bars in Exhibit 1, thus reducing the potential to game the backtest by picking the rebalancing month.
Exhibit 1: Flavor of the Month
Annualized Compound Returns (%) for Simulated US Small Cap Value Strategies Rebalanced in Different Months, January 1973–June 2019
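To make the staggered rebalancing idea concrete, here is a minimal Python sketch. It assumes each rebalancing-month variant is held as an equal-weighted sleeve and that the sleeves are averaged month by month, which is one possible reading of "an average of all the bars." The data are synthetic (twelve variants with identical expected returns that differ only by noise) and are not the returns behind Exhibit 1. The function names and the noise parameters are hypothetical.

```python
import numpy as np

def annualized_compound_return(monthly_returns):
    """Annualized compound return from a sequence of monthly returns (decimals)."""
    r = np.asarray(monthly_returns, dtype=float)
    growth = np.prod(1.0 + r)
    return growth ** (12.0 / r.size) - 1.0

def staggered_returns(returns_by_rebalance_month):
    """Equal-weight average, month by month, of the 12 single-month variants.

    returns_by_rebalance_month: array of shape (n_months, 12); column j holds
    the monthly returns of the variant that rebalances in calendar month j + 1.
    """
    r = np.asarray(returns_by_rebalance_month, dtype=float)
    return r.mean(axis=1)

# Synthetic, illustrative data only (an assumption, not the Exhibit 1 data):
# every variant shares one common return stream plus its own noise.
rng = np.random.default_rng(0)
n_months = 558  # January 1973 through June 2019
common = rng.normal(0.012, 0.05, size=(n_months, 1))   # shared component
noise = rng.normal(0.0, 0.01, size=(n_months, 12))     # variant-specific noise
variants = common + noise

per_variant = np.array(
    [annualized_compound_return(variants[:, j]) for j in range(12)]
)
blended = annualized_compound_return(staggered_returns(variants))

print("best minus worst variant:", per_variant.max() - per_variant.min())
print("staggered blend:", blended)
```

Even though no variant has a true edge in this setup, the best and worst months can differ noticeably over the sample. The blend does not depend on which month a researcher happens to pick.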
In Search of Anomalies
When it comes to financial research, few things are more exciting than finding a new anomaly, or a variable that appears to drive differences in average returns. Let’s take stock volatility as an example. As shown in Exhibit 2, using Fama/French data on quintiles formed by sorting US stocks on past volatility (standard deviation), average returns for the low volatility quintile have been similar to those of the Fama/French Total US Market Research Index since the 1960s, despite the former having lower-than-market volatility. The easy thing to do? Declare that we have found a new anomaly.
Exhibit 2: Lowdown on Low Vol
Monthly Performance of Low Volatility Stocks Compared to the US Market, July 1963–December 2019
Not so fast. We believe the right way to analyze a new pattern in the historical return data is in the context of known drivers of expected returns. Specifically, is this new observation additive to our understanding of asset pricing? When evaluated against the Fama/French Five-Factor Model in Exhibit 3, the answer appears to be no. The small intercept and t-stat suggest the returns of low volatility stocks are well explained by their exposure to known return drivers. No anomaly here, just a reminder that empirical analysis should not be conducted in a vacuum.
Exhibit 3: Peeling Back The Curtain
Fama/French Five-Factor Model Regression of Low Volatility Stocks Returns, July 1963–December 2019
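The kind of test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the regression behind Exhibit 3: the factor returns and loadings are synthetic and hypothetical, the portfolio is built to have zero true intercept, and the standard errors are the classical homoskedastic ones. In practice the inputs would be the portfolio's excess returns and the five factor return series.

```python
import numpy as np

def ols_intercept_tstats(excess_returns, factors):
    """OLS of excess returns on a constant plus factor returns.

    Returns (coefficients, t_stats); index 0 is the intercept ("alpha").
    Uses classical homoskedastic standard errors, a simplifying assumption.
    """
    y = np.asarray(excess_returns, dtype=float)
    f = np.asarray(factors, dtype=float)
    X = np.column_stack([np.ones(len(y)), f])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    return coef, coef / se

# Synthetic data (assumption): factor exposure but no true alpha.
rng = np.random.default_rng(1)
n_months = 678  # July 1963 through December 2019
factors = rng.normal(0.004, 0.03, size=(n_months, 5))
true_betas = np.array([0.8, -0.1, 0.2, 0.3, 0.3])  # hypothetical loadings
portfolio = factors @ true_betas + rng.normal(0.0, 0.01, size=n_months)

coef, tstats = ols_intercept_tstats(portfolio, factors)
print("alpha:", coef[0], "t-stat:", tstats[0])
print("betas:", coef[1:])
```

A portfolio can show attractive raw returns and still have an intercept that is indistinguishable from zero once its factor exposures are accounted for, which is the pattern the text describes for low volatility stocks.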
Tweaking the Mousetrap
Exceptionally negative recent value premiums have prompted many investors to scramble for ways to improve their process of pursuing the premium. That has led some down the road of tinkering with the metric used to define value versus growth. One example is adjusting book values for intangible assets, such as patents, copyrights, brands, and reputation.
This adjustment is alluring on the surface. Over the 10-year period ending December 2018, using a price-to-book ratio adjusted for internally developed intangibles would have substantially narrowed the annualized return difference between value and growth but not entirely eradicated value’s underperformance.1 The easy thing to do? Adopt the fix immediately. The right thing to do? Take a step back and ask: “Are internal intangibles a new phenomenon? And, if not new, have they grown in importance over time?” The answer to the first question is no. History buffs will know that the US started issuing patents back in 1790 and registering trademarks in 1870. So intangibles have been part of the economic landscape and capital markets for a long time.
To answer the second question, you have to estimate internally developed intangibles because they are generally expensed on the income statement under US accounting principles rather than capitalized on the balance sheet. Our recent paper on intangibles did that, finding that internally developed intangibles have been a steady fraction of company assets for a long time. Exhibit 4 shows that, for the US Market, they represented about 30% of company assets in the 1980s and represent about 30% of company assets today.
Exhibit 4: Internal Affair
Weighted Average Internally Developed Intangibles as a Percentage of Assets, US Market, 1963–2018
Our deep dive into intangibles also highlighted the noise involved in the estimation of internally developed intangibles. Procedures for doing so involve accumulating prior expenditures on research and development (R&D) and selling, general, and administrative (SG&A) expenses, relying on assumptions for how much of each component to include and over what time horizon to amortize. Couple these dependencies with data limitations, particularly for R&D, and there is ample susceptibility to noise with this adjustment.
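The sensitivity to assumptions can be seen in a minimal perpetual-inventory sketch in Python. It accumulates R&D and a fraction of SG&A while amortizing at constant rates, as the procedure above describes. The firm, the spending figures, the SG&A fraction, and the amortization rates are all hypothetical values chosen for illustration. They are not the parameters used in the paper.

```python
def accumulate_intangible_capital(rd, sga, sga_fraction, rd_rate, sga_rate):
    """Perpetual-inventory estimate of internally developed intangible capital.

    rd, sga: annual R&D and SG&A spending, oldest year first.
    sga_fraction: share of SG&A treated as investment (an assumption).
    rd_rate, sga_rate: constant annual amortization rates (assumptions).
    Starts from zero capital, which understates early-year estimates.
    """
    knowledge = 0.0
    organization = 0.0
    path = []
    for rd_t, sga_t in zip(rd, sga):
        knowledge = (1.0 - rd_rate) * knowledge + rd_t
        organization = (1.0 - sga_rate) * organization + sga_fraction * sga_t
        path.append(knowledge + organization)
    return path

# Hypothetical firm: flat spending of 100 on R&D and 200 on SG&A for 10 years.
rd = [100.0] * 10
sga = [200.0] * 10

# Same spending history, different (illustrative) assumptions.
for frac, rate in [(0.2, 0.15), (0.3, 0.20), (0.5, 0.30)]:
    est = accumulate_intangible_capital(rd, sga, frac, rate, rate)
    print(f"sga_fraction={frac}, rate={rate}: capital after 10 years = {est[-1]:.1f}")
```

The spending history is identical in all three runs, yet the estimated capital changes with the assumed fraction and rate. That is the source of noise the text refers to.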
Perhaps due to this noise and the higher uncertainty around the valuation of internally developed intangibles (which, unlike externally acquired ones, do not go through a market assessment), we find that adjusting for internally developed intangibles does not yield consistently higher value and profitability premiums.
As mentioned earlier, the US value premium over the past decade benefited from the adjustment. But further inspection shows this result is mainly driven by different sector exposures: the past decade was a good one for technology stocks, and technology had a higher weight in a value strategy formed on adjusted price-to-book. And, despite the improvement from the adjustment, the value premium was still negative.
Looking over the longer term and assessing value and profitability premiums together, the impact from intangibles adjustments has been minimal. Specifically, we see that the value premium gets a little larger while the profitability premium gets a little smaller. The net effect is near zero. There is no compelling evidence an investor can more effectively pursue higher expected returns by adjusting the value and profitability metrics for internally developed intangibles.
In summary, the easy thing to do in response to the recent dismal performance of the value premium is to immediately adopt an intangibles adjustment. But we believe the right thing to do is to evaluate thoughtfully and fully the impact of such an adjustment. We need to consider the high level of uncertainty around the value of internally developed intangibles (for example, only 8% of drugs that start the research phase end up in the marketplace2) and the noise that this adjustment will inject into the implementation process. We also need to consider that the empirical results from this adjustment are not compelling. Moreover, mixing and matching accounting variables to find the optimum backtest risks falling into the trap of overfitting the data,3 that is, finding adjustments that make past returns look great but might have no impact on future returns. The right thing to do is to look beyond an alpha or a Sharpe ratio and assess the rationale behind using each individual variable, how alternative variable specifications may differ, and what consequences this might have for implementation.
Timing Isn’t Everything
It is probably not news to value investors that premiums can be negative even for long periods of time. There is obviously a major incentive to find ways to predict and avoid disappointing performance. Accordingly, substantial research has been conducted on timing markets and premiums.
In great news for investors who own a time machine, Dimensional’s research on the subject has even identified a successful timing strategy for the value premium. A mean reversion approach in Italian stocks that switches in and out of value based on the trailing average of the value premium has outperformed a buy-and-hold value approach by 7.5 percentage points per year. For those of us living in the present, enthusiasm over this result continuing in the future is tempered by the fact that this specific timing strategy is the best one out of 680 strategies tested4 and actually underperformed buy-and-hold strategies outside of Italian stocks by more than two percentage points per year. As illustrated by the excess returns for all 680 strategies in Exhibit 5, underperformance was by far the most likely outcome for these approaches to timing premiums.
Exhibit 5: Bad Timing
Excess Returns for 680 Simulated Strategies that Time Premiums
The odds are high that, if you are willing to pore over the data long enough, you will eventually find a timing strategy that worked in the past. There are many inputs to a timing strategy: the premium being pursued, the indicator used for timing (e.g., valuation ratios or past performance), and the threshold for switching (in both directions), to name a few. The vast number of potential combinations of these inputs is ripe for data mining. Of course, the successful outcomes are the ones that will garner attention, but investors should acknowledge these as pyrite5: attractive at first glance but not robust to further testing.
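A small simulation illustrates why the best result out of many trials deserves skepticism. The Python sketch below is not the methodology of the white paper cited in the text. It assumes strategies whose excess returns are independent noise with a true mean of zero, a 40-year sample, and 3% monthly volatility of excess returns, all chosen for illustration. Only the count of 680 is taken from the text.

```python
import numpy as np

def best_of_many(n_strategies, n_months, monthly_vol, seed):
    """Simulate strategies with zero true excess return and report the winner.

    Each strategy's monthly excess return is independent noise with mean zero,
    so any average outperformance is luck by construction.
    Returns (annualized average excess return per strategy, best value).
    """
    rng = np.random.default_rng(seed)
    excess = rng.normal(0.0, monthly_vol, size=(n_strategies, n_months))
    annualized_mean = excess.mean(axis=1) * 12.0
    return annualized_mean, annualized_mean.max()

# 680 matches the number of strategies cited in the text; the 40-year sample
# and 3% monthly volatility of excess returns are assumptions for illustration.
means, best = best_of_many(n_strategies=680, n_months=480, monthly_vol=0.03, seed=2)

print(f"share of strategies with positive excess return: {(means > 0).mean():.2f}")
print(f"best strategy's annualized excess return: {best:.2%}")
```

By construction, none of these simulated strategies has any skill, yet the best of them still shows a sizable positive excess return. Picking the winner after the fact says little about what to expect going forward.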
Realized premiums can be negative even for long stretches of time. The easy thing to do is to vary exposure to premiums in accordance with the output of a model that appears successful in a backtest. But rigorous research casts doubt on this approach effectively reducing the downside risks of the equity, size, value, or profitability premiums. Moreover, this approach can often yield meaningful trading and tax costs. Therefore, we believe the right thing to do is to deliver what you say you will deliver through the continuous pursuit of the reliable equity premiums. In addition to reducing investor surprises, this helps ensure one captures the premiums when they appear.
In the interest of writing a column rather than a textbook, we have touched on only a fraction of the empirical research on expected returns. If all these decades of poking and prodding the data have proved anything, it’s that you can’t prove something with data alone. The endless combinations available to researchers and the noise inherent in stock return data imply historical return comparisons should be evaluated with a heaping spoon of salt.
This is not to dismiss the importance of empirical research. Performance simulations can help us understand the behavior of returns and inform our expectations for future returns. However, investors need to avoid the risk of extrapolating historical patterns that may have occurred by chance. How can investors wade through the noise? Comprehensively evaluating the research may be a tall order, just as it’s probably not realistic to expect a patient to perform his or her own surgery. But a good place to start is asking questions about empirical techniques. The more moving parts in the backtest, the more susceptible the inferences are to noise. An investor’s assessment of the likelihood that simulated investment performance translates into real-world portfolios may depend upon whether a manager can demonstrate an approach to research that mitigates these sensitivities.
From a manager standpoint, it’s important to have a framework that guides research and hypotheses, even before looking at the data. Research must be conducted in a way that yields robust inferences. This includes running robust experiments that attempt to alleviate random sources of noise and outcomes excessively dependent upon a particular set of circumstances. It also means aligning the research with how its insights will be deployed within the investment process so that inferences from the research are more accurate. Deeply understanding empirical financial research requires extensive expertise developed through time. But this is a key part of applying judgment in the investment process: it is easy to find something that looks great in hindsight, but the right thing to do is to find ways to add value going forward.
1Savina Rizova and Namiko Saito, “Intangibles and Expected Stock Returns” (2020), available upon request.
2Ernst R. Berndt, Adrian H.B. Gottschalk, and Matthew W. Strobeck, “Opportunities for Improving the Drug Development Process: Results From a Survey of Industry and the FDA,” Innovation Policy and the Economy 6 (2006): 91–121.
3Robert Novy-Marx, “Backtesting Strategies Based on Multiple Signals,” NBER Working Paper No. w21329 (2015), available at: papers.ssrn.com/sol3/papers.cfm?abstract_id=2629935.
4Wei Dai, “Premium Timing with Valuation Ratios” (white paper, Dimensional Fund Advisors, 2016), available upon request.
5Pyrite, or iron sulfide, is more commonly known as “fool’s gold.”
6Ryan H. Peters and Lucian A. Taylor, “Intangible Capital and the Investment-q Relation,” Journal of Financial Economics 123, no. 2 (2017): 251–272.
Regression analysis: A statistical technique for estimating the strength of the relation between one variable and one or more other variables.
Price-to-book ratio: The ratio of a firm’s market value to its book value, where market value is computed as price multiplied by shares outstanding and book value is the value of stockholder equity as reported on a company’s balance sheet.
Alpha: The rate of return on an investment in excess of a benchmark or return predicted by a financial model. A higher alpha value implies greater outperformance.
Premium: A return difference between two assets or portfolios. Size premium: the return difference between small market capitalization stocks and large market capitalization stocks. Value premium: the return difference between stocks with low relative prices (value) and stocks with high relative prices (growth). Profitability premium: the return difference between stocks of companies with high profitability over those with low profitability.
Profitability: A company’s operating income before depreciation and amortization minus interest expense scaled by book equity.
Sharpe ratio: A ratio of return per unit of volatility.
Five-Factor Model: An empirical asset pricing model developed by Fama and French (2015) to explain variation in stock returns.
Premium timing simulations
Filters were applied to data retroactively and with the benefit of hindsight. Returns are not representative of indices or actual strategies and do not reflect costs and fees associated with an actual investment. Actual returns may be lower. The excess return of a trading rule is equal to the difference in average return between the rule and the long side of the premium that the rule is applied to. Data sources: CRSP and Compustat data for US firms listed on the NYSE, AMEX, or NASDAQ, and the historical book equity data in Kenneth French’s data library (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). The trading rules include nonparametric trading rules comparing valuation spreads to their historical distribution, linear trading rules based on the linear models predicting premiums based on valuation spreads, and logit trading rules based on the probability of a future premium being positive. Results will vary with each use and over time. For more information, please refer to Dimensional’s white paper “Premium Timing with Valuation Ratios,” available upon request.
Fama/French Total US Market Research Index: July 1926–present: Fama/French Total US Market Research Factor + One-Month US Treasury Bills. Source: Kenneth French website.
Results shown during periods prior to each index’s inception date do not represent actual returns of the respective index. Other periods selected may have different results, including losses. Backtested index performance is hypothetical and is provided for informational purposes only to indicate historical performance had the index been calculated over the relevant time periods. Backtested performance results assume the reinvestment of dividends and capital gains.
Source: Dimensional, using CRSP and Compustat data. The eligible universe includes all firms in the US excluding REITs, tracking stocks, and investment companies.
Following Peters and Taylor (2017),6 internally developed intangible capital for each firm at a point in time is estimated by accumulating the historical spending on research and development (R&D) and a fraction of selling, general, and administrative (SG&A) expenses while amortizing it at constant rates.
Simulated strategy returns are based on model/backtested performance. These are not live strategies managed by Dimensional Fund Advisors LP or any of its affiliates. The performance was achieved with the retroactive application of a model designed with the benefit of hindsight; it does not represent actual investment performance. Backtested model performance is hypothetical (it does not reflect trading in actual accounts) and is provided for informational purposes only. The securities held in the model may differ significantly from those held in client accounts. Model performance may not reflect the impact that economic and market factors might have had on the advisor’s decision making if the advisor were actually managing client money. These strategies were not available for investment in the time periods depicted. Actual management of these types of simulated strategies may result in lower returns than the backtested results achieved with the benefit of hindsight. Past performance (including hypothetical past performance) does not guarantee future or actual results. The simulated performance shown is “gross performance,” which includes the reinvestment of dividends but does not reflect the deduction of investment advisory fees and other expenses. A client’s investment returns will be reduced by the advisory fees or other expenses it may incur. For example, if a 1% annual advisory fee were deducted quarterly and a client’s annual return were 10% (based on quarterly returns of approximately 2.41% each) before the deduction of advisory fees, the deduction of advisory fees would result in an annual return of approximately 8.91% due, in part, to the compound effect of such fees.
Equius Partners is a Registered Investment Advisor. Please consider the investment objectives, risks, and charges and expenses of any mutual fund and read the prospectus carefully before investing. Indexes are not available for direct investment; therefore, their performance does not reflect the expenses associated with the management of an actual portfolio.
Past performance is not a guarantee of future results. This information is provided for educational purposes only and should not be considered investment advice or a solicitation to buy or sell securities. There is no guarantee an investing strategy will be successful. Investing involves risks, including possible loss of principal. Diversification does not eliminate the risk of market loss.
© 2021 Equius Partners, Inc.