Many investors, especially professional money managers and analysts, want to make comparisons. Are the stocks and funds in their portfolios keeping up with or lagging behind the stock market or a market index? And is there any connection between their investments and some benchmark like the Standard & Poor’s 500 Index? Do they move in tandem or in different directions? Is one driven by the other?
Statistics are used to help investors make these comparisons and see the tie between their investments and benchmarks. One of the most-cited statistics is known as r-squared, or the coefficient of determination.
What is r-squared?
In statistics, the term r-squared is a measure of the relationship between two things, called variables. R-squared is used to assess how much a change in one variable (call it Y, the investment) is determined by the change in the other variable (call it X, the benchmark or index). A statistical model is created to test the degree of relationship between the variables by comparing the actual values of Y (the investment’s returns) on a chart against the predicted returns represented by a line on the chart.
R-squared is derived from r, the symbol used to denote correlation; in a simple regression with a single benchmark, it’s simply correlation squared. In finance and investing, correlation measures how closely the returns of two investments move together in a straight-line relationship, expressed as a coefficient between -1 and 1. Two assets in perfect positive correlation (their returns always move in the same direction and in fixed proportion) have a coefficient of 1; two in perfect negative or inverse correlation (one always moving down when the other moves up, in fixed proportion) have a coefficient of -1. A coefficient of zero means the two investments have no linear relationship. Almost all correlations fall somewhere between these perfect extremes.
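As a rough illustration of how a correlation coefficient could be computed from two return series, here is a minimal Python sketch. The function name `pearson_r` and the return figures are made up for this example; they are not data for any real fund.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two series move together around their own averages.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)

# Hypothetical monthly returns in percent (illustrative only).
benchmark = [1.0, -2.0, 3.0, 0.5, -1.0]
fund = [2.0, -4.0, 6.0, 1.0, -2.0]       # always twice the benchmark
inverse_fund = [-x for x in benchmark]   # always the mirror image

print(pearson_r(benchmark, fund))          # close to 1
print(pearson_r(benchmark, inverse_fund))  # close to -1
```

The sketch assumes neither series is constant; a series with no variation would make the denominator zero.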
How is r-squared used in investing?
R-squared is often used to assess the degree to which an investment, typically a fund or portfolio, generates returns in line with the benchmark. Said another way, the r-squared statistic sizes up how much the investment’s returns are determined by the benchmark’s returns.
So an investor with a portfolio of stocks or stock funds might ask, “How much do my returns depend on the broad market’s returns?” A common example of this r-squared evaluation is a fund or stock portfolio in relation to the S&P 500, the most widely used proxy for the U.S. stock market.
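For example, suppose the XYZ Large Cap Fund has a correlation coefficient of 0.70 with the S&P 500. That indicates a fairly strong tendency for the fund’s returns and the index’s returns to move in the same direction. It does not mean they rise together exactly 70% of the time.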
R-squared, or correlation squared, for the XYZ Large Cap Fund then is:
0.7 × 0.7 = 0.49
For any correlation strictly between 0 and 1, r-squared is smaller than r because squaring a decimal fraction produces a smaller number. For investors it’s expressed more intuitively as a percentage, so 0.49 means 49% of the variation in XYZ’s returns is explained by the returns of its benchmark, the S&P 500.
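The arithmetic for the XYZ example can be sketched in a couple of lines of Python. The helper name `r_squared_from_r` is hypothetical, and the shortcut of squaring r applies to a simple regression with one benchmark.

```python
def r_squared_from_r(r):
    """Square a correlation coefficient to get r-squared (one-benchmark case)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r

# A correlation of 0.70 and a correlation of -0.70 give the same r-squared,
# because squaring removes the sign.
for r in (0.70, -0.70):
    print(f"r = {r:+.2f} -> r-squared = {r_squared_from_r(r):.0%}")
```

Because squaring drops the sign, r-squared alone does not say whether the investment moves with the benchmark or against it.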
How to use r-squared
The main value of r-squared in statistics is in quickly assessing whether the statistical model is a good fit for the data set—does the data support the hypothesized relationship between X and Y? In other words, how well did the model predict the investment’s results?
For investors, r-squared indicates how much of the variation in an investment’s performance is explained by the performance of a benchmark such as an index. A higher r-squared, closer to 1.0 or 100%, suggests the model has greater power as a forecasting tool for the performance of a fund or portfolio. A low r-squared, all other things equal, usually indicates the model is not good for forecasting.
Beyond this simple explanation of the relationship between correlation and determination, the value of r-squared can also be found through statistical analysis of variables on a chart, called regression analysis. A regression model is meant to help forecast returns on an investment by using a data sample, such as the daily price changes for the investment and the benchmark for a certain period (three months, six months, one year, etc.). Each of the daily changes would be a data point on the regression chart.
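To make the idea of a data sample concrete, here is a small Python sketch that turns closing prices into daily percentage changes and pairs them up as regression data points. The prices and the function name `daily_returns` are invented for illustration.

```python
def daily_returns(prices):
    """Percentage change from each closing price to the next."""
    return [(curr / prev - 1.0) * 100.0 for prev, curr in zip(prices, prices[1:])]

# Hypothetical closing prices over four trading days.
index_prices = [4000.0, 4040.0, 4020.0, 4060.0]
fund_prices = [50.0, 50.8, 50.3, 51.0]

# Each (benchmark change, fund change) pair is one dot on the regression chart.
data_points = list(zip(daily_returns(index_prices), daily_returns(fund_prices)))
for x, y in data_points:
    print(f"benchmark {x:+.2f}%  fund {y:+.2f}%")
```

Four prices yield three daily changes, so a one-year sample of daily prices would produce roughly 250 data points.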
Regression analysis involves creating a model hypothesis, or equation, of the relationship between the variables:
- Dependent variable, Y, the stock, fund, or portfolio
- Independent variable, X, the benchmark (S&P 500)
The regression is depicted on the chart with a straight line and a number of dots on or around the line. Here is an example:
The line, typically upward sloping, represents the equation meant to quantify the relationship between the variables. A basic model equation might look like this:
Y = bX + a
In the equation, a and b are constants—their value doesn’t change. For equations plotted on a chart, the constant a represents the intercept—the value where the sloping regression line intersects the Y axis. And b represents the slope, or beta, of the line, whether it’s steep or flat.
Let’s assume the constant a has a value of 1, and the constant b has a value of 2. The equation then is:
Y = 2X + 1
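In plain English, the model’s equation hypothesizes that the rate of return on Y (the investment) will be two times the rate of return on X (the benchmark/index), plus 1 percentage point. The 1% is the return the model predicts when the benchmark’s return is zero. It is not a floor, because a sufficiently negative benchmark return still produces a negative predicted return.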
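A quick way to see what the example equation predicts is to plug in a few benchmark returns. The Python sketch below uses the slope of 2 and intercept of 1 from the example; the function name `predicted_return` is hypothetical. The intercept is the value predicted when the benchmark return is zero.

```python
def predicted_return(benchmark_return, slope=2.0, intercept=1.0):
    """Return predicted by the linear model Y = slope * X + intercept (in percent)."""
    return slope * benchmark_return + intercept

# Try a flat, a rising, and a falling benchmark.
for x in (0.0, 3.0, -2.0):
    print(f"benchmark {x:+.1f}% -> predicted investment return {predicted_return(x):+.1f}%")
```

Note that a benchmark decline of 2% gives a predicted return below zero, so the intercept shifts the line up or down rather than setting a lower limit on returns.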
Here’s another example of a regression chart, with the straight line of the model’s equation and the individual observations of the fund/portfolio returns as dots scattered around the line. Because the model’s simple equation produces a straight line, this is called linear regression.
One purpose of regression analysis is to place the line through the scattered dots in a way that minimizes the overall distance of the dots from the line, that is, to discern a linear pattern through the scatter. This is called finding the best fit for the model to the data. The standard method, called least-squares regression, chooses the line that minimizes the sum of the squared vertical distances between the dots and the line.
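The least-squares fit described above can be sketched in plain Python. This is a minimal illustration using the textbook formulas for a one-variable regression, with made-up return data; the names `fit_line` and `r_squared` are assumptions for this example, not part of any particular library.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = slope * X + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    # Squared gaps between actual values and the line's predictions.
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Squared gaps between actual values and their own average.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_residual / ss_total

# Hypothetical returns in percent: benchmark (X) and fund (Y).
benchmark = [-2.0, -1.0, 0.0, 1.0, 2.0]
fund = [-3.5, -0.5, 1.0, 2.5, 5.5]

slope, intercept = fit_line(benchmark, fund)
print(f"slope {slope:.2f}, intercept {intercept:.2f}, "
      f"r-squared {r_squared(benchmark, fund, slope, intercept):.2f}")
```

In this invented sample the dots sit close to the line, so the r-squared comes out high; widening the scatter of the fund returns would lower it.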
The graph below shows an r-squared of 15% for a mutual fund, which means that only 15% of the variation in its returns is attributable to the returns of the index. The graph shows how widely the data points (the returns of the fund) are scattered away from the regression line. So an investor can intuitively see the weak relationship between the fund’s returns and the benchmark’s.
By contrast, the next graph below shows a much stronger relationship between the two variables: the plotted observations of the fund returns are clustered close to the regression line. The r-squared is 85%, meaning 85% of the variation in the fund’s returns is attributable to the index’s performance, and the observations show a better fit for the model’s proposed relationship between the two variables.
R-squared and investing style
Investors can look at r-squared values in relation to their investing style:
- Passive investing: This usually involves index funds or exchange-traded funds (ETFs) that seek to match a broad market benchmark. Investors want high r-squared. For example, the Vanguard 500 Index Admiral Fund and the Fidelity 500 Index Fund have r-squared values at or close to 100%, or 1. Passive investments tend to cost less for investors because they only need to mimic the benchmark, and less effort is needed to construct and maintain the portfolio.
- Active investing: The goal is to find investments that will beat the market. Investors expect lower r-squared because active portfolio managers seek stocks that don’t just match the index. A hedge fund would presumably have a lower r-squared. Otherwise, an actively managed fund with higher costs but an r-squared of 97%, for example, might make investors question why they’re paying higher fees for a fund whose returns are mostly the result of changes in an index, when a lower-cost index fund produces about the same returns.
R-squared and other statistics
Some statistics related to r-squared include:
- Adjusted r-squared: This is used for linear regressions with more than one independent variable (for example, the benchmark return and the price of gold) that try to explain the dependent variable’s return. In statistics, adding another independent variable to a regression model never lowers the r-squared reading and usually raises it, even when the new variable has little real explanatory power. Plain r-squared works as intended only in a simple linear regression model with one explanatory variable. With a multiple regression made up of several independent variables, the r-squared must be adjusted lower to compensate for the possibility that the extra variables add no explanatory power to the model.
- Beta: The slope of the regression line measures the magnitude of volatility in a portfolio’s returns relative to the benchmark. A beta of 1, for example, means that if the benchmark rises or falls 1%, the portfolio tends to rise or fall 1%. A beta of less than 1, for example 0.8, means that the portfolio is less volatile than the index: it tends to rise or fall 0.8% when the benchmark rises or falls 1%. A beta of 1.2 means the portfolio is more volatile: it tends to change 1.2% when the benchmark changes 1%.
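For the adjustment applied to r-squared in a multiple regression, a commonly used formula is 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of independent variables. The Python sketch below applies that formula to invented inputs; the function name `adjusted_r_squared` is hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjust r-squared downward for the number of independent variables."""
    degrees_left = n_obs - n_predictors - 1
    if degrees_left <= 0:
        raise ValueError("need more observations than independent variables plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / degrees_left

# Hypothetical case: the same raw r-squared of 0.85 from 60 monthly observations,
# first with one independent variable, then with two.
for k in (1, 2):
    print(f"{k} independent variable(s): adjusted r-squared = "
          f"{adjusted_r_squared(0.85, 60, k):.4f}")
```

With the raw r-squared held fixed, each extra independent variable lowers the adjusted figure, so a new variable has to improve the fit by enough to offset that penalty.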
Investors can look at r-squared together with beta for a fuller understanding of the performance of their funds or portfolios. A fund with a high r-squared closely tracks the benchmark’s return. If it also has a high beta, above 1, that could mean outperforming the benchmark in a rising stock market—or doing worse than the benchmark when markets are declining.
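As a rough sketch of how beta could be estimated and then read alongside a benchmark move, the Python example below computes beta as the regression slope (co-movement with the benchmark divided by the benchmark’s own variation). The return series and the function name `beta` are illustrative assumptions.

```python
def beta(benchmark_returns, fund_returns):
    """Slope of the least-squares line of fund returns on benchmark returns."""
    n = len(benchmark_returns)
    mean_b = sum(benchmark_returns) / n
    mean_f = sum(fund_returns) / n
    co_movement = sum((b - mean_b) * (f - mean_f)
                      for b, f in zip(benchmark_returns, fund_returns))
    benchmark_variation = sum((b - mean_b) ** 2 for b in benchmark_returns)
    return co_movement / benchmark_variation

# Hypothetical returns in percent.
benchmark = [1.0, -1.0, 2.0, -2.0]
aggressive_fund = [1.2 * b for b in benchmark]   # amplifies every benchmark move
defensive_fund = [0.8 * b for b in benchmark]    # dampens every benchmark move

for name, fund in (("aggressive", aggressive_fund), ("defensive", defensive_fund)):
    b = beta(benchmark, fund)
    print(f"{name}: beta {b:.2f}, implied move for a 1% benchmark drop: {-1.0 * b:+.2f}%")
```

Both invented funds track the benchmark perfectly, so their r-squared would be 100%; what differs is the size of the move, which is what beta captures.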
Assessing goodness of fit in a regression model
Goodness of fit refers to how closely the scattered dots on the regression graph crowd around the regression line. The vertical differences between the dots (the actual returns) and the line (the predicted returns) are called residuals.
These differences should be free of what is called systematic bias: the data points should be randomly scattered around the regression line. Bias is indicated by another pattern in the scatterplot of data points, meaning another independent variable besides the benchmark (possibly a different benchmark) may explain the investment’s returns. Detecting bias requires a visual inspection of the scatterplot in the regression chart.
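To illustrate what a non-random pattern can look like, the Python sketch below fits a straight line to deliberately curved, made-up data and lists the differences between actual and predicted values (often called residuals). The function names are hypothetical, and the data are chosen only to make the pattern obvious.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = slope * X + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys, slope, intercept):
    """Actual value minus the line's prediction, for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Invented data that follow a curve (Y = X squared), not a straight line.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [4.0, 1.0, 0.0, 1.0, 4.0]

slope, intercept = fit_line(xs, ys)
for x, gap in zip(xs, residuals(xs, ys, slope, intercept)):
    print(f"X = {x:+.0f}: actual minus predicted = {gap:+.1f}")
```

Here the gaps are positive at both ends and negative in the middle rather than randomly mixed, which is the kind of pattern that suggests a straight line on this one variable is missing something.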
Are low r-squared values inherently bad or good?
A low r-squared value can still provide some information about the general direction of investment returns, even with the wider dispersion of returns from the benchmark. But it can be a problem if the investor wants a forecast to be more precise, with a smaller range around the forecast. Higher r-squared values generally provide more precise forecasts.
What does an r-squared of 0.9 mean?
A value of 0.9 would mean that 90% of the variation in the return for a fund or portfolio is attributable to the return on a benchmark. In the stock market, that would mean 90% of the variation in an equity fund’s performance stems from the performance of an index such as the S&P 500.
The bottom line
R-squared can be a good yardstick for investors to decide if they want investments that closely track an index, such as index funds, or investments that correlate less with an index, such as actively managed funds and hedge funds. Although some familiarity with the concept of r-squared can be useful for the average investor, it primarily is a tool used by professionals in managing and constructing investment portfolios.