The Factor Model Equation · It's Just Beta

2.1 The fundamental equation

For a universe of $N$ stocks and $K$ factors, over one period (a day, a week, a month), the factor model states that each stock’s return decomposes as:

$r_i \;=\; \sum_{k=1}^{K} X_{ik}\, f_k \;+\; \epsilon_i, \qquad i = 1, \dots, N$

or, stacking all stocks into vectors, in matrix form:

$\boxed{\; r = X f + \epsilon \;}$

Dimensions and units cause more real-world bugs here than any conceptual difficulty.

Symbol	Dimensions	What it is	Units / typical values
$r$	$N \times 1$	Asset returns over the period	Return (e.g. 0.042 = 4.2%)
$X$	$N \times K$	Exposure matrix: row $i$ = stock $i$’s exposures to all $K$ factors	Dimensionless. Dummies $\in \{0,1\}$ or style z-scores roughly $\in [-3, 3]$
$f$	$K \times 1$	Factor returns over the period	Return. Same units as $r$
$\epsilon$	$N \times 1$	Specific (idiosyncratic) returns	Return. Same units as $r$
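The dimension bookkeeping in the table can be checked mechanically. The following is a minimal NumPy sketch with made-up sizes and random placeholder numbers (none of them come from the Mini Example); it uses 1-D arrays where the table writes $N \times 1$ and $K \times 1$ columns.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3                               # toy sizes, chosen only for illustration

X = rng.normal(size=(N, K))               # exposures, N x K, dimensionless
f = rng.normal(scale=0.01, size=K)        # factor returns, length K, return units
eps = rng.normal(scale=0.02, size=N)      # specific returns, length N, return units

r = X @ f + eps                           # asset returns, length N

# Element-wise form of the same equation: r_i = sum_k X_ik f_k + eps_i
r_loop = np.array(
    [sum(X[i, k] * f[k] for k in range(K)) + eps[i] for i in range(N)]
)

print(X.shape, f.shape, eps.shape, r.shape)
```

If the shapes do not line up (for example, `X` stored as $K \times N$), the matrix product raises an error or silently returns a vector of the wrong length, which is the kind of bug the table is meant to prevent.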

A few things to read off this equation:

$Xf$ is a weighted sum for each stock: stock $i$’s systematic return is row $i$ of $X$ (its exposures) dotted with $f$ (the factor returns). That is, multiply each exposure by its factor’s return, then add the products.

Take stock AXIOM from the Mini Example, with market exposure 1.0, tech-industry exposure 1.0, 0.0 exposure to the other industries, momentum exposure 1.198, value exposure −1.228, and size exposure 0.710. (Those exposures are built in Chapter 3 and the factor returns estimated in Chapter 6; here, take them as given.) Multiplying each exposure by its factor return for this period and adding everything up gives a systematic return of 4.30% (the specific return $\epsilon_i$ comes on top of this):

Factor	Exposure	Factor return	Exposure × return
MKT	1.000	1.821%	1.82%
TECH	1.000	0.768%	0.77%
CONS	0.000	0.306%	0.00%
FIN	0.000	−1.282%	0.00%
MOM	1.198	1.962%	2.35%
VALUE	−1.228	0.548%	−0.67%
SIZE	0.710	0.046%	0.03%
Total			4.30%
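The AXIOM table is a single dot product. Here is a minimal sketch in plain Python, with the exposures and factor returns copied from the table; the variable names are illustrative and not part of the Mini Example code.

```python
# Exposures and factor returns (in percent) copied from the AXIOM table.
factors = ["MKT", "TECH", "CONS", "FIN", "MOM", "VALUE", "SIZE"]
exposure = [1.000, 1.000, 0.000, 0.000, 1.198, -1.228, 0.710]
f_pct = [1.821, 0.768, 0.306, -1.282, 1.962, 0.548, 0.046]

# Row of X dotted with f: multiply pairwise, then add.
contrib = [x * f for x, f in zip(exposure, f_pct)]
systematic_pct = sum(contrib)

for name, c in zip(factors, contrib):
    print(f"{name:<6}{c:6.2f}%")
print(f"{'Total':<6}{systematic_pct:6.2f}%")
```

The unrounded sum is about 4.299, which rounds to the 4.30% shown in the table.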

Exposures are known at the start of the period; returns happen over the period: Strictly speaking, the equation is $r_t = X_{t-1} f_t + \epsilon_t$. The time subscripts are dropped when working within a single period, but the lag is important in practice (see look-ahead bias, Chapter 16).
Exposures are dimensionless, so factor returns carry the return units: A momentum factor return of 1.96% means that a stock with momentum exposure 1.0 earned 1.96% more this period than an otherwise identical stock with momentum exposure 0.0.
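The lag in $r_t = X_{t-1} f_t + \epsilon_t$ is easy to get wrong in code. The sketch below shows one way to make it explicit; the array layout (`X_hist[t]` holding exposures known at the end of period `t`) and the function name are assumptions for illustration, and the numbers are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 4, 5, 2                             # toy sizes, for illustration only

X_hist = rng.normal(size=(T, N, K))           # X_hist[t]: exposures known at the END of period t
f_hist = rng.normal(scale=0.01, size=(T, K))  # f_hist[t]: factor returns realized OVER period t


def systematic_return(t):
    """Systematic return for period t, using exposures fixed before the period starts."""
    if t < 1:
        raise ValueError("period t needs exposures from period t - 1")
    return X_hist[t - 1] @ f_hist[t]          # X_{t-1} f_t, never X_t f_t


sys_2 = systematic_return(2)
print(sys_2)
```

Indexing the exposures with `t` instead of `t - 1` would run without error, which is why the lag deserves an explicit helper or a test.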

The equation as written is pure accounting: any $r$ can be split into $Xf + \epsilon$ once you decide how to determine $f$ (Chapter 6). What turns the accounting identity into a model is the set of statistical assumptions placed on $f$ and $\epsilon$ .

2.2 The assumptions, and what each one buys

A1. Specific returns have zero mean: $\mathbb{E}[\epsilon_i] = 0$ .

Everything systematic (anything with a nonzero expected payoff that is shared across stocks) belongs to the factor part. What is left, the specific return, is pure surprise.

A2. Specific returns are uncorrelated with factor returns: $\mathrm{Cov}(f_k, \epsilon_i) = 0$ for all $k, i$ .

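The decomposition is clean: there is no leakage between the systematic and idiosyncratic parts. When $f$ is estimated by regressing returns on exposures (Chapter 6), a closely related property holds in-sample by construction: each period’s residuals are orthogonal to the columns of $X$, which is the defining property of least squares residuals. A2 itself concerns co-movement over time and remains an assumption.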

A3. Specific returns are uncorrelated across stocks: $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$ .

This is the key assumption, and the only genuinely substantive one: it says the factors capture all common movement, leaving residuals that are purely stock-specific. Whether it holds depends on whether the factor set is rich enough.

Assumption A3 can fail for different reasons. For example, you can have linked securities: two share classes of the same company, an ADR and its underlying, or a holding company and its main subsidiary. The residuals of linked securities are clearly correlated. In practice, this is handled with explicit linked-issuer treatment: off-diagonal specific covariance blocks for linked securities, or forcing linked lines to share factor exposures (Chapter 5).

Missing factors are another reason why A3 can fail. If the model has no “crowding” factor and crowded stocks de-leverage together, that common move lands in the residuals and shows up as residual correlation among crowded names. Detecting such clusters is exactly how candidates for new factors are found (Chapter 15).

What these assumptions buy: Take the covariance of both sides of $r = Xf + \epsilon$ to calculate the asset covariance $\Sigma$ . Since $X$ is fixed (the exposures are known at the start of the period), it passes outside the covariance, and expanding the right-hand side gives four terms:

$\Sigma \;=\; \mathrm{Cov}(r) \;=\; \mathrm{Cov}(Xf + \epsilon) \;=\; \underbrace{X\,\mathrm{Cov}(f)\,X^\top}_{\text{systematic}} \;+\; \underbrace{X\,\mathrm{Cov}(f, \epsilon) + \mathrm{Cov}(\epsilon, f)\,X^\top}_{\text{cross terms}} \;+\; \underbrace{\mathrm{Cov}(\epsilon)}_{\text{specific}}$

Now the assumptions do the work. By A2, factors and specific returns are uncorrelated, so both cross terms are zero. Write $F = \mathrm{Cov}(f)$ for the $K \times K$ factor covariance matrix and $\Delta = \mathrm{Cov}(\epsilon)$ for the specific covariance. By A3, $\Delta$ is diagonal, with each stock’s specific variance on the diagonal: $\Delta_{ii} = \sigma^2_{\epsilon_i}$. What survives is:

$\boxed{\; \Sigma \;=\; X F X^\top + \Delta \;}$

This is the covariance decomposition and shows how the dimensionality problem is solved. The full $N \times N$ asset covariance matrix $\Sigma$ comes from three small pieces: the $K \times K$ factor covariance $F$ , the $N \times K$ exposures $X$ , and the $N$ specific variances on the diagonal of $\Delta$ .
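Assembling $\Sigma$ from its three pieces takes one line. The following NumPy sketch uses random placeholder inputs (an arbitrary positive definite $F$ and strictly positive specific variances, both assumptions made for the illustration) and checks what the formula implies entry by entry.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 8, 3                                   # toy sizes, for illustration only

X = rng.normal(size=(N, K))                   # exposures
A = rng.normal(size=(K, K))
F = A @ A.T / K + 1e-4 * np.eye(K)            # an arbitrary positive definite K x K matrix
spec_var = rng.uniform(0.01, 0.09, size=N)    # strictly positive specific variances

Sigma = X @ F @ X.T + np.diag(spec_var)       # N x N asset covariance

# Entry (i, j) is x_i' F x_j; the specific variance appears only on the diagonal.
i, j = 1, 4
offdiag = X[i] @ F @ X[j]
diag = X[i] @ F @ X[i] + spec_var[i]

eigvals = np.linalg.eigvalsh(Sigma)           # ascending eigenvalues of a symmetric matrix
print(Sigma.shape, eigvals.min())
```

Two stocks covary only through their shared factor exposures, while a stock's own variance adds its specific variance on top.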

2.3 Parameter counting: the payoff, in numbers

Count the parameters on each side for an institutional-scale model, $N = 3{,}000$ stocks and $K = 70$ factors:

Approach	Quantity	Count
Unstructured	distinct entries of $\Sigma$ : $N(N+1)/2$	4,501,500
Factor model	factor covariances: $K(K+1)/2$	2,485
	exposures: $N K$	210,000
	specific variances: $N$	3,000
	total	215,485
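The counts in the table follow from two short formulas. A sketch in plain Python (the function names are illustrative):

```python
def unstructured_count(n):
    """Distinct entries of a symmetric n x n covariance matrix."""
    return n * (n + 1) // 2


def factor_model_count(n, k):
    """Factor covariances + exposures + specific variances."""
    return k * (k + 1) // 2 + n * k + n


N, K = 3000, 70
full = unstructured_count(N)
model = factor_model_count(N, K)
reduction = 1 - model / full

print(f"unstructured: {full:,}")
print(f"factor model: {model:,}")
print(f"reduction:    {reduction:.1%}")
```

Re-running with other values of `N` and `K` shows that the unstructured count grows quadratically in $N$ while the factor-model count grows only linearly.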

That is a 95% reduction in raw count, and the count still understates the gain. In the fundamental-model architecture the 210,000 exposures are not statistically estimated at all. They are measured from observable characteristics (industry membership, balance-sheet ratios, Chapter 3), so the genuinely statistical burden is only the 2,485 factor covariances and 3,000 specific variances, each estimable from a long, stable time series (Chapter 8).

Conditioning helps too. Provided $F$ is positive semi-definite and every specific variance is strictly positive, the factor-model $\Sigma$ is positive definite, hence invertible, and its smallest eigenvalue is at least the smallest specific variance, which keeps it well-conditioned in practice. The sample covariance matrix of 3,000 stocks on a few years of data, by contrast, is singular. Three definitions: positive semi-definite means every portfolio variance $w^\top \Sigma w$ comes out $\geq 0$, never a negative number for a quantity that is a variance. Well-conditioned means the eigenvalues span a moderate range rather than a huge one, so the matrix is far from singular. Singular means at least one eigenvalue is exactly zero: with fewer observations than stocks, the sample matrix is rank-deficient by construction and cannot be inverted.
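The contrast between a singular sample covariance and an invertible factor-model covariance can be seen on a toy scale. This is a sketch under stated assumptions: 50 stocks, 24 observations, random placeholder returns, and an arbitrary diagonal $F$; none of these numbers come from a real model.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, T = 50, 3, 24                            # more stocks than observations

returns = rng.normal(scale=0.05, size=(T, N))  # placeholder return history, T x N
S = np.cov(returns, rowvar=False)              # N x N sample covariance
eig_S = np.linalg.eigvalsh(S)                  # ascending order
n_nonzero = int(np.sum(eig_S > 1e-10))         # at most T - 1 after demeaning

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
F = np.diag([0.03, 0.002, 0.002])              # illustrative factor covariance
spec_var = rng.uniform(0.02, 0.10, size=N)     # strictly positive specific variances
Sigma = X @ F @ X.T + np.diag(spec_var)
eig_Sigma = np.linalg.eigvalsh(Sigma)
cond_Sigma = eig_Sigma[-1] / eig_Sigma[0]      # condition number of a positive definite matrix

print(f"sample covariance: {n_nonzero} nonzero eigenvalues out of {N}")
print(f"factor model: smallest eigenvalue {eig_Sigma[0]:.4f}, condition number {cond_Sigma:.0f}")
```

The sample matrix has at least $N - (T - 1)$ eigenvalues that are zero up to rounding, so it cannot be inverted, while the factor-model matrix has its smallest eigenvalue bounded below by the smallest specific variance.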

2.4 From asset risk to portfolio risk

A portfolio is a vector of weights $w$ ($N \times 1$, summing to 1 for a fully invested long-only fund). Its return is $r_p = w^\top r$, and substituting the model:

$r_p = w^\top X f + w^\top \epsilon = x_p^\top f + w^\top \epsilon, \qquad \text{where } \boxed{\,x_p = X^\top w\,}$

The $K \times 1$ vector $x_p$ holds the portfolio’s factor exposures: the weighted average of its holdings’ exposures. This is the central object of practical risk management: a 3,000-line portfolio compresses to ~70 meaningful numbers.

Portfolio variance follows from the covariance decomposition:

$\sigma_p^2 \;=\; w^\top \Sigma\, w \;=\; \underbrace{x_p^\top F\, x_p}_{\text{factor (systematic) variance}} \;+\; \underbrace{w^\top \Delta\, w}_{\text{specific variance}}$

Because $\Delta$ is diagonal, the specific term is just a weighted sum of the individual specific variances: $w^\top \Delta\, w = \sum_i w_i^2 \sigma^2_{\epsilon_i}$ .
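The two routes to portfolio variance, through the full $N \times N$ matrix or through the $K$ portfolio exposures, give the same number. A minimal NumPy sketch with illustrative inputs (the factor covariance values and the random weights are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 10, 3                                   # toy sizes, for illustration only

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # first column: market
F = np.array([[0.0256, -0.0010, 0.0005],
              [-0.0010, 0.0016, 0.0002],
              [0.0005, 0.0002, 0.0025]])       # illustrative factor covariance
spec_var = rng.uniform(0.02, 0.09, size=N)     # diagonal of Delta

w = rng.uniform(size=N)
w /= w.sum()                                   # fully invested, long-only

# Route 1: build Sigma explicitly, then w' Sigma w.
Sigma = X @ F @ X.T + np.diag(spec_var)
var_full = w @ Sigma @ w

# Route 2: compress to K exposures first, then add the specific term.
x_p = X.T @ w
var_factor = x_p @ F @ x_p
var_specific = np.sum(w**2 * spec_var)
var_split = var_factor + var_specific

print(var_full, var_split)
```

Route 2 never forms the $N \times N$ matrix, which matters when $N$ is in the thousands.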

The split has immediate consequences:

Specific risk diversifies. Factor risk does not. The specific term is a sum of squared weights times specific variances. For an equal-weighted portfolio of $N$ stocks, each with specific volatility $\sigma_\epsilon$, specific variance is $N \cdot (1/N)^2 \sigma_\epsilon^2 = \sigma_\epsilon^2 / N$, which goes to 0 as $N$ grows. The factor term has no such decay: holding more stocks does nothing to reduce market exposure. Diversification eliminates idiosyncratic risk and concentrates what remains into factor risk.
Risk analysis splits into two independent ledgers: a $K$-dimensional factor ledger ($x_p$ against $F$) and a per-position specific ledger ($w_i^2 \sigma_{\epsilon_i}^2$). Chapter 9 builds the full attribution machinery on this split.
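The diversification claim can be tabulated directly. The sketch below assumes a single market factor, every stock at market exposure 1, and identical specific volatilities; the 25% and 16% figures are illustrative assumptions, not values from the text.

```python
sigma_eps = 0.25          # assumed specific volatility per stock (illustrative)
market_vol = 0.16         # assumed market factor volatility (illustrative)


def equal_weight_risk(n):
    """Factor and specific variance of an equal-weighted portfolio of n stocks,
    each with market exposure 1 and the same specific volatility."""
    w = 1.0 / n
    x_p = n * w * 1.0                          # portfolio market exposure, 1 for any n
    factor_var = x_p**2 * market_vol**2
    specific_var = n * w**2 * sigma_eps**2     # equals sigma_eps^2 / n
    return factor_var, specific_var


for n in (1, 10, 100, 1000):
    fv, sv = equal_weight_risk(n)
    print(f"n={n:5d}  factor vol={fv ** 0.5:.2%}  specific vol={sv ** 0.5:.2%}")
```

Specific volatility falls with $1/\sqrt{N}$ while factor volatility stays at the market level, so the factor share of variance rises toward 100% as names are added.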

The same algebra applies to active management. With portfolio weights $w_p$ (the $w$ above) and benchmark weights $w_b$, the active portfolio $w_a = w_p - w_b$ has active exposures $x_a = X^\top w_a$ and tracking error $\sigma_a = \sqrt{x_a^\top F x_a + w_a^\top \Delta w_a}$. Note that $w_a$ sums to zero, not 1: it is a set of over- and underweights, not a standalone portfolio, so the long-only normalization above does not apply to it. Everything in this series that works for total risk works for active risk by replacing $w$ with $w_a$.
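Tracking error is the same computation applied to the weight difference. A minimal NumPy sketch follows; the function name `tracking_error`, the four-stock universe, and the equal-weighted benchmark are assumptions made for the illustration.

```python
import numpy as np


def tracking_error(w_p, w_b, X, F, spec_var):
    """Active exposures and tracking error for portfolio w_p against benchmark w_b."""
    w_a = w_p - w_b                            # active weights, sum to zero
    x_a = X.T @ w_a                            # active factor exposures
    var_a = x_a @ F @ x_a + np.sum(w_a**2 * spec_var)
    return np.sqrt(var_a), x_a


# Illustrative inputs: market (exposure 1 for every stock) and one style factor.
X = np.array([[1.0, 1.2], [1.0, 0.5], [1.0, -0.3], [1.0, -1.0]])
F = np.array([[0.0256, -0.00128], [-0.00128, 0.0016]])
spec_var = np.array([0.04, 0.0625, 0.0324, 0.09])
w_b = np.array([0.25, 0.25, 0.25, 0.25])       # hypothetical equal-weighted benchmark
w_p = np.array([0.40, 0.30, 0.20, 0.10])

te, x_a = tracking_error(w_p, w_b, X, F, spec_var)
print(f"active exposures: {x_a}, tracking error: {te:.2%}")
```

Because every stock has market exposure 1 and both portfolios are fully invested, the active market exposure is zero, so the market factor drops out of active risk entirely and only the style tilt and the specific bets remain.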

2.5 Returns conventions

Real implementations have to pin down a handful of returns conventions. Mixing those up can lead to errors that are difficult to track down.

Arithmetic vs. log returns: Factor models are linear in arithmetic (simple) returns: portfolio returns aggregate across assets linearly ($r_p = w^\top r$), which is what the model needs. Log returns aggregate across time but not across assets. Standard practice: arithmetic returns within a period, compounded carefully across periods (see Chapter 10 for multi-period linking).
Total vs. excess returns: Risk models are typically fitted on returns in excess of the risk-free rate, though over daily or monthly horizons the distinction barely moves risk numbers.
Local vs. base currency: A USD-based investor holding a Japanese stock earns the local return compounded with the JPY/USD currency return: $(1+r_{\text{local}})(1+r_{\text{fx}}) - 1$. The two add only approximately, since the product leaves a cross term $r_{\text{local}}\, r_{\text{fx}}$. Multi-country models separate these with explicit currency factors so that hedged and unhedged perspectives are both available (Chapter 15).
Frequency and horizon: A model estimated on daily returns and one estimated on monthly returns are different models with different uses (Chapter 8). Annualization conventions: multiply a variance by the periods per year (252 trading days or 12 months) and a volatility by the square root of the same. This scaling assumes returns are serially uncorrelated.
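Each of these conventions reduces to a line or two of arithmetic. The sketch below uses plain Python and made-up numbers (two assets, two periods, a 3% local return, a −2% currency return, 1% daily volatility), all chosen only to make the identities visible.

```python
import math

# Arithmetic returns aggregate across assets: r_p = w' r.
w = [0.6, 0.4]
r = [0.10, -0.05]
r_p = sum(wi * ri for wi, ri in zip(w, r))

# Log returns do not: the weighted average of log returns is not the portfolio log return.
log_r_p = math.log(1 + r_p)
avg_log = sum(wi * math.log(1 + ri) for wi, ri in zip(w, r))

# Across time the roles swap: log returns add, arithmetic returns compound.
r1, r2 = 0.10, -0.05
two_period = (1 + r1) * (1 + r2) - 1
two_period_from_logs = math.expm1(math.log(1 + r1) + math.log(1 + r2))

# Local vs. base currency: compound, do not add.
r_local, r_fx = 0.03, -0.02
r_base = (1 + r_local) * (1 + r_fx) - 1
cross_term = r_base - (r_local + r_fx)         # equals r_local * r_fx

# Annualization from daily data, assuming serially uncorrelated returns.
daily_vol = 0.01
annual_var = 252 * daily_vol**2
annual_vol = math.sqrt(252) * daily_vol

print(r_p, log_r_p, avg_log)
print(two_period, r_base, annual_vol)
```

The gap between `log_r_p` and `avg_log` is the reason the cross-sectional model is written in arithmetic returns.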

2.6 Worked example: five stocks, two factors, by hand

Small enough to do on a calculator. Watch the structure, not the arithmetic. Universe of 5 stocks, $K = 2$ factors: market (everything has exposure 1) and value (a z-score, Chapter 3 explains where it comes from).

$X = \begin{pmatrix} 1 & 1.2 \\ 1 & 0.5 \\ 1 & -0.3 \\ 1 & -1.0 \\ 1 & -0.4 \end{pmatrix}, \qquad w = \begin{pmatrix} 0.30 \\ 0.25 \\ 0.20 \\ 0.15 \\ 0.10 \end{pmatrix}$

Annualized factor model: market volatility 16%, value factor volatility 4%, correlation $-0.20$ :

$F = \begin{pmatrix} 0.16^2 & -0.20 \times 0.16 \times 0.04 \\ -0.20 \times 0.16 \times 0.04 & 0.04^2 \end{pmatrix} = \begin{pmatrix} 0.0256 & -0.00128 \\ -0.00128 & 0.0016 \end{pmatrix}$

Specific volatilities: 20%, 25%, 18%, 30%, 22%, so $\Delta = \mathrm{diag}(0.04, 0.0625, 0.0324, 0.09, 0.0484)$ .

Step 1: portfolio exposures $x_p = X^\top w$ :

$x_{p,\text{mkt}} = \sum_i w_i \cdot 1 = 1.0, \qquad x_{p,\text{val}} = 0.30(1.2) + 0.25(0.5) + 0.20(-0.3) + 0.15(-1.0) + 0.10(-0.4) = 0.235$

The portfolio behaves like one asset with market beta 1.0 and a mild value tilt of +0.235.

Step 2: factor variance $x_p^\top F x_p$ :

$\begin{aligned} x_p^\top F x_p &= (1.0)^2(0.0256) + 2(1.0)(0.235)(-0.00128) + (0.235)^2(0.0016) \\ &= 0.025600 - 0.000602 + 0.000088 = 0.025087 \end{aligned}$

Step 3: specific variance $\sum_i w_i^2 \sigma_{\epsilon_i}^2$ :

$0.09(0.04) + 0.0625(0.0625) + 0.04(0.0324) + 0.0225(0.09) + 0.01(0.0484) = 0.011311$

Step 4: total risk:

$\sigma_p = \sqrt{0.025087 + 0.011311} = \sqrt{0.036398} \approx 19.08\% \text{ per year}$

Decomposition: factor volatility $\sqrt{0.025087} = 15.84\%$ , specific volatility $\sqrt{0.011311} = 10.64\%$ . The factor share of variance is $0.025087 / 0.036398 = 68.9\%$ . (Note that volatilities do not add, $15.84 + 10.64 \neq 19.08$ , but variances do. Risk decomposition always happens in variance units. Chapter 9 returns to this.)

Even this 5-stock toy shows the shape of a real portfolio: most of the risk is factor risk, dominated by the market term. The value tilt adds almost nothing to total risk here: its own variance term is 0.000088, and its negative covariance with the market (−0.000602) slightly lowers the total. It would, however, dominate the active risk against a market-like benchmark, exactly the situation engineered in the Mini Example.
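The four steps of the worked example can be reproduced in a few lines of NumPy. The inputs are the ones given in this section; the variable names are illustrative.

```python
import numpy as np

# Inputs from the worked example.
X = np.array([[1, 1.2], [1, 0.5], [1, -0.3], [1, -1.0], [1, -0.4]])
w = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

vol_mkt, vol_val, rho = 0.16, 0.04, -0.20
F = np.array([[vol_mkt**2, rho * vol_mkt * vol_val],
              [rho * vol_mkt * vol_val, vol_val**2]])
spec_vol = np.array([0.20, 0.25, 0.18, 0.30, 0.22])

x_p = X.T @ w                                  # Step 1: portfolio exposures
factor_var = x_p @ F @ x_p                     # Step 2: factor variance
specific_var = np.sum(w**2 * spec_vol**2)      # Step 3: specific variance
total_vol = np.sqrt(factor_var + specific_var) # Step 4: total risk
factor_share = factor_var / (factor_var + specific_var)

print(f"x_p          = {x_p}")
print(f"factor var   = {factor_var:.6f}  (vol {np.sqrt(factor_var):.2%})")
print(f"specific var = {specific_var:.6f}  (vol {np.sqrt(specific_var):.2%})")
print(f"total vol    = {total_vol:.2%}, factor share = {factor_share:.1%}")
```

Changing `w` to an active weight vector (portfolio minus benchmark) turns the same script into a tracking-error calculation.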

2.7 Summary

The model: $r = Xf + \epsilon$ , with exposures known at the start of the period.
The assumptions: specific returns have zero mean, are uncorrelated with factors, and are uncorrelated with each other. The third is the important one. Its failures (linked securities, missing factors) are diagnosable and fixable.
The payoff: $\Sigma = XFX^\top + \Delta$, millions of covariances generated by only thousands of meaningful parameters, always positive semi-definite, and invertible whenever every specific variance is strictly positive.
Portfolio risk: $\sigma_p^2 = x_p^\top F x_p + w^\top \Delta w$ with $x_p = X^\top w$. The idiosyncratic term shrinks toward zero as positions are added and weights spread out; the factor term stays. Active risk is the same algebra applied to $w_a = w_p - w_b$.

Left for later chapters: where $X$ comes from (Chapter 3), where $f$ comes from (Chapter 6), and where $F$ and $\Delta$ come from (Chapter 8).