Work in progress. This primer is still being written.
β ITSJUSTBETA.COM

Part 02 / 16

The Factor Model Equation

2.1 The fundamental equation

For a universe of $N$ stocks and $K$ factors, over one period (a day, a week, a month), the factor model states that each stock’s return decomposes as:

$$r_i = \sum_{k=1}^{K} X_{ik}\, f_k + \epsilon_i, \qquad i = 1, \dots, N$$

or, stacking all stocks into vectors, in matrix form:

$$\boxed{\; r = X f + \epsilon \;}$$

Dimensions and units cause more real-world bugs here than any conceptual difficulty.

| Symbol | Dimensions | What it is | Units / typical values |
|---|---|---|---|
| $r$ | $N \times 1$ | Asset returns over the period | Return (e.g. 0.042 = 4.2%) |
| $X$ | $N \times K$ | Exposure matrix: row $i$ = stock $i$’s exposures to all $K$ factors | Dimensionless. Dummies $\in \{0,1\}$ or style z-scores roughly $\in [-3, 3]$ |
| $f$ | $K \times 1$ | Factor returns over the period | Return. Same units as $r$ |
| $\epsilon$ | $N \times 1$ | Specific (idiosyncratic) returns | Return. Same units as $r$ |

A few things to read off this equation:

  1. $Xf$ is a weighted sum for each stock: stock $i$’s systematic return is row $i$ of $X$ (its exposures) dotted with $f$ (the factor returns). Multiply each exposure by its factor’s return, then add the products.

    Take stock AXIOM from the Mini Example, with market exposure 1.0, tech-industry exposure 1.0, exposure 0.0 to the other industries, momentum exposure 1.198, value exposure −1.228, and size exposure 0.710. (Those exposures are built in Chapter 3 and the factor returns estimated in Chapter 6; here, take them as given.) Multiplying each exposure by its factor return for this period and adding the products gives a systematic return of 4.30%. AXIOM’s realized return is this amount plus its specific return $\epsilon_i$:

    | Factor | Exposure | Factor return | Exposure × return |
    |---|---|---|---|
    | MKT | 1.000 | 1.821% | 1.82% |
    | TECH | 1.000 | 0.768% | 0.77% |
    | CONS | 0.000 | 0.306% | 0.00% |
    | FIN | 0.000 | −1.282% | 0.00% |
    | MOM | 1.198 | 1.962% | 2.35% |
    | VALUE | −1.228 | 0.548% | −0.67% |
    | SIZE | 0.710 | 0.046% | 0.03% |
    | Total (systematic) | | | 4.30% |

  2. Exposures are known at the start of the period; returns happen over the period. Strictly speaking, the equation is $r_t = X_{t-1} f_t + \epsilon_t$. The time subscripts are dropped when working within a single period, but the lag matters in practice (see look-ahead bias, Chapter 16).

  3. Exposures are dimensionless, so the units live in the factor returns. A momentum factor return of 1.96% means that a stock with momentum exposure 1.0 earned 1.96% more this period than an otherwise identical stock with momentum exposure 0.0.
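The AXIOM calculation is a dot product of the stock’s exposure row with the factor-return vector. Below is a minimal plain-Python sketch that uses the numbers from the table, with factor returns written as decimals. The function name `systematic_return` is illustrative only.

```python
# Exposures and factor returns for AXIOM, copied from the table above.
exposures = {"MKT": 1.000, "TECH": 1.000, "CONS": 0.000, "FIN": 0.000,
             "MOM": 1.198, "VALUE": -1.228, "SIZE": 0.710}
factor_returns = {"MKT": 0.01821, "TECH": 0.00768, "CONS": 0.00306, "FIN": -0.01282,
                  "MOM": 0.01962, "VALUE": 0.00548, "SIZE": 0.00046}


def systematic_return(x, f):
    """Row i of X dotted with f: sum of exposure * factor return over all factors."""
    return sum(x[k] * f[k] for k in x)


# Per-factor contributions (the last column of the table).
contributions = {k: exposures[k] * factor_returns[k] for k in exposures}
for k, c in contributions.items():
    print(f"{k:6s} {c:+.4%}")
print(f"Total  {systematic_return(exposures, factor_returns):+.2%}")
```

The result is the systematic part $(Xf)_i$ only; the stock’s realized return would add its specific return on top.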

The equation as written is pure accounting: any $r$ can be split into $Xf + \epsilon$ once you decide how to determine $f$ (Chapter 6). What turns the accounting identity into a model is the set of statistical assumptions placed on $f$ and $\epsilon$.

2.2 The assumptions, and what each one buys

A1. Specific returns have zero mean: $\mathbb{E}[\epsilon_i] = 0$.

Everything systematic (anything with a nonzero expected payoff that is shared across stocks) belongs to the factor part. Specific return is pure surprise.

A2. Specific returns are uncorrelated with factor returns: $\mathrm{Cov}(f_k, \epsilon_i) = 0$ for all $k, i$.

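The decomposition is clean: there is no leakage between the systematic and idiosyncratic parts. When $f$ is estimated by cross-sectional regression (Chapter 6), the in-sample counterpart holds by construction. Least-squares residuals are orthogonal to the regressors, here the exposure columns of $X$, so $X^\top \epsilon = 0$ for unweighted least squares. This is the defining property of least-squares residuals.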
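The in-sample property can be checked directly. The sketch below is plain Python with no libraries. It estimates $f$ by unweighted least squares for the two-factor exposures used in Section 2.6, fed with made-up returns. The returns and the helper name `ols_two_factors` are hypothetical, and Chapter 6’s estimator may use weights. The only point is that $X^\top \epsilon$ comes out zero up to rounding.

```python
# Exposures for five stocks: a market column of ones and a value z-score
# (the same X as the worked example in Section 2.6).
X = [[1.0, 1.2], [1.0, 0.5], [1.0, -0.3], [1.0, -1.0], [1.0, -0.4]]
# Hypothetical one-period returns, made up for illustration only.
r = [0.031, 0.012, -0.004, -0.021, 0.008]


def ols_two_factors(X, r):
    """Unweighted least squares for K = 2: solve (X'X) f = X'r by Cramer's rule."""
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    u = sum(row[0] * ri for row, ri in zip(X, r))
    v = sum(row[1] * ri for row, ri in zip(X, r))
    det = a * d - b * b
    return [(d * u - b * v) / det, (a * v - b * u) / det]


f = ols_two_factors(X, r)
# Residuals: whatever the fitted factor part does not explain.
eps = [ri - (row[0] * f[0] + row[1] * f[1]) for row, ri in zip(X, r)]

# Least-squares residuals are orthogonal to every exposure column: X' eps = 0.
orthogonality = [sum(row[k] * e for row, e in zip(X, eps)) for k in range(2)]
print("factor returns f:", [round(fk, 5) for fk in f])
print("X' eps:", orthogonality)  # zero up to floating-point rounding
```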

A3. Specific returns are uncorrelated across stocks: $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$.

This is the key assumption, and the only genuinely substantive one: it says the factors capture all common movement, leaving residuals that are purely stock-specific. Whether it holds depends on whether the factor set is rich enough.

Assumption A3 can fail for different reasons. For example, you can have linked securities: two share classes of the same company, an ADR and its underlying, or a holding company and its main subsidiary. The residuals of linked securities are clearly correlated. In practice, this is handled with explicit linked-issuer treatment: off-diagonal specific covariance blocks for linked securities, or forcing linked lines to share factor exposures (Chapter 5).

Missing factors are another reason why A3 can fail. If the model has no “crowding” factor and crowded stocks de-leverage together, that common move lands in the residuals and shows up as residual correlation among crowded names. Detecting such clusters is exactly how candidates for new factors are found (Chapter 15).

What these assumptions buy: take the covariance of both sides of $r = Xf + \epsilon$ to calculate the asset covariance $\Sigma$. Since $X$ is fixed (the exposures are known at the start of the period), it passes outside the covariance, and expanding the right-hand side gives four terms:

$$\Sigma = \mathrm{Cov}(r) = \mathrm{Cov}(Xf + \epsilon) = \underbrace{X\,\mathrm{Cov}(f)\,X^\top}_{\text{systematic}} + \underbrace{X\,\mathrm{Cov}(f, \epsilon) + \mathrm{Cov}(\epsilon, f)\,X^\top}_{\text{cross terms}} + \underbrace{\mathrm{Cov}(\epsilon)}_{\text{specific}}$$

Now the assumptions do the work. By A2, factors and specific returns are uncorrelated, so both cross terms are zero. Write $F = \mathrm{Cov}(f)$ for the $K \times K$ factor covariance matrix and $\Delta = \mathrm{Cov}(\epsilon)$ for the $N \times N$ specific covariance. By A3, $\Delta$ is diagonal, with each stock’s specific variance on the diagonal: $\Delta_{ii} = \sigma^2_{\epsilon_i}$. What survives is:

$$\boxed{\; \Sigma = X F X^\top + \Delta \;}$$

This is the covariance decomposition, and it shows how the dimensionality problem is solved. The full $N \times N$ asset covariance matrix $\Sigma$ comes from three small pieces: the $K \times K$ factor covariance $F$, the $N \times K$ exposures $X$, and the $N$ specific variances on the diagonal of $\Delta$.
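Here is a minimal plain-Python sketch of the assembly step, using the five-stock inputs of Section 2.6. The helper name `factor_model_cov` is hypothetical. At $N = 5$ the pieces (3 + 10 + 5 = 18 numbers) actually outnumber the 15 distinct entries of $\Sigma$. The saving only appears when $N$ is much larger than $K$, as the count in Section 2.3 shows.

```python
# The five-stock, two-factor example from Section 2.6.
X = [[1.0, 1.2], [1.0, 0.5], [1.0, -0.3], [1.0, -1.0], [1.0, -0.4]]
F = [[0.0256, -0.00128], [-0.00128, 0.0016]]      # K x K factor covariance
spec_var = [0.04, 0.0625, 0.0324, 0.09, 0.0484]     # diagonal of Delta


def factor_model_cov(X, F, spec_var):
    """Assemble the N x N matrix Sigma = X F X' + Delta from its three pieces."""
    n, k = len(X), len(F)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Systematic part: (X F X')_ij = sum over a, b of X_ia F_ab X_jb
            sigma[i][j] = sum(X[i][a] * F[a][b] * X[j][b]
                              for a in range(k) for b in range(k))
        sigma[i][i] += spec_var[i]  # Delta is diagonal (assumption A3)
    return sigma


Sigma = factor_model_cov(X, F, spec_var)
n, k = len(X), len(F)
print(f"Sigma is {n} x {n}: {n * (n + 1) // 2} distinct entries")
print(f"built from {k * (k + 1) // 2} factor covariances, {n * k} exposures, {n} specific variances")
for row in Sigma:
    print(" ".join(f"{s:9.6f}" for s in row))
```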

2.3 Parameter counting: the payoff, in numbers

Count the parameters on each side for an institutional-scale model, with $N = 3{,}000$ stocks and $K = 70$ factors:

| Model | Quantity | Count |
|---|---|---|
| Unstructured | distinct entries of $\Sigma$: $N(N+1)/2$ | 4,501,500 |
| Factor model | factor covariances: $K(K+1)/2$ | 2,485 |
| | exposures: $NK$ | 210,000 |
| | specific variances: $N$ | 3,000 |
| | total | 215,485 |

That is a 95% reduction in raw count, and the count still understates the gain. In the fundamental-model architecture the 210,000 exposures are not statistically estimated at all. They are measured from observable characteristics (industry membership, balance-sheet ratios; Chapter 3). The genuinely statistical burden is therefore only the 2,485 factor covariances and the 3,000 specific variances, each estimable from a long, stable time series (Chapter 8).

Conditioning improves too. A matrix is singular when at least one of its eigenvalues is exactly zero. That is the fate of the sample covariance matrix of 3,000 stocks estimated on a few years of monthly data: with fewer observations than stocks it is rank-deficient by construction and cannot be inverted.

The factor-model $\Sigma$ is instead positive semi-definite by construction. Every portfolio variance $w^\top \Sigma w$ comes out $\geq 0$, never a negative number for a quantity that is a variance. As long as every specific variance is strictly positive, it is in fact positive definite, with its smallest eigenvalue at least the smallest specific variance. That makes it invertible and well-conditioned relative to the sample matrix: its eigenvalues span a moderate range rather than running down to zero, so it is far from singular.
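The counts in the table follow from four formulas. The small helper below reproduces them for any $N$ and $K$; its name, `parameter_counts`, is hypothetical.

```python
def parameter_counts(n_stocks, n_factors):
    """Count parameters: unstructured covariance vs. the factor model."""
    unstructured = n_stocks * (n_stocks + 1) // 2   # distinct entries of Sigma
    factor_cov = n_factors * (n_factors + 1) // 2   # distinct entries of F
    exposures = n_stocks * n_factors                # entries of X
    specific = n_stocks                             # diagonal of Delta
    total = factor_cov + exposures + specific
    return unstructured, factor_cov, exposures, specific, total


u, fc, ex, sp, tot = parameter_counts(3000, 70)
print(f"unstructured: {u:,}")
print(f"factor model: {tot:,} ({fc:,} + {ex:,} + {sp:,})")
print(f"reduction in raw count: {1 - tot / u:.1%}")
print(f"statistically estimated (F and Delta only): {fc + sp:,}")
```

With $N = 3{,}000$ and $K = 70$ this gives a 95.2% reduction. Only 5,485 parameters are left to estimate statistically once the exposures are treated as measured.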

2.4 From asset risk to portfolio risk

A portfolio is a vector of weights $w$ ($N \times 1$, summing to 1 for a fully invested long-only fund). Its return is $r_p = w^\top r$, and substituting the model gives:

$$r_p = w^\top X f + w^\top \epsilon = x_p^\top f + w^\top \epsilon, \qquad \text{where } \boxed{\,x_p = X^\top w\,}$$

The $K \times 1$ vector $x_p$ holds the portfolio’s factor exposures: the weighted average of its holdings’ exposures. This is the central object of practical risk management: a 3,000-line portfolio compresses to ~70 meaningful numbers.

Portfolio variance follows from the covariance decomposition:

$$\sigma_p^2 = w^\top \Sigma\, w = \underbrace{x_p^\top F\, x_p}_{\text{factor (systematic) variance}} + \underbrace{w^\top \Delta\, w}_{\text{specific variance}}$$

Because $\Delta$ is diagonal, the specific term is just a sum of the individual specific variances, each weighted by the squared position weight: $w^\top \Delta\, w = \sum_i w_i^2 \sigma^2_{\epsilon_i}$.

The split has immediate consequences:

  • Specific risk diversifies; factor risk does not. In the specific term the weights enter squared. For an equal-weighted portfolio of $N$ stocks, each with specific volatility $\sigma_\epsilon$, the specific variance is $N \cdot (1/N)^2 \sigma_\epsilon^2 = \sigma_\epsilon^2 / N \to 0$. The factor term has no such decay: holding more stocks does nothing to reduce market exposure. Diversification eliminates idiosyncratic risk and concentrates what remains into factor risk.
  • Risk analysis splits into two independent ledgers: a $K$-dimensional factor ledger ($x_p$ against $F$) and a per-position specific ledger ($w_i^2 \sigma_{\epsilon_i}^2$). Chapter 9 builds the full attribution machinery on this split.
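The first bullet can be checked numerically under simplifying assumptions. Every stock has market exposure 1 and the same 25% specific volatility, and the market factor has 16% volatility; these are illustrative values, not estimates. Specific volatility falls like $1/\sqrt{N}$, while factor volatility stays at 16%.

```python
import math

# Illustrative assumptions (not from a fitted model): every stock has market
# exposure 1 and specific volatility 25%; the market factor has volatility 16%.
MARKET_VOL = 0.16
SPECIFIC_VOL = 0.25


def equal_weight_risk(n):
    """Factor and specific variance of an equal-weighted portfolio of n such stocks."""
    w = 1.0 / n
    x_mkt = n * w * 1.0                            # portfolio market exposure stays 1
    factor_var = x_mkt ** 2 * MARKET_VOL ** 2      # no decay with n
    specific_var = n * w ** 2 * SPECIFIC_VOL ** 2  # = SPECIFIC_VOL**2 / n
    return factor_var, specific_var


for n in (1, 10, 100, 1000):
    fv, sv = equal_weight_risk(n)
    print(f"N={n:5d}  factor vol {math.sqrt(fv):6.2%}  specific vol {math.sqrt(sv):6.2%}  "
          f"total vol {math.sqrt(fv + sv):6.2%}")
```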

The same algebra applies to active management. With benchmark weights $w_b$, the active portfolio $w_a = w_p - w_b$ has active exposures $x_a = X^\top w_a$ and tracking error $\sigma_a = \sqrt{x_a^\top F x_a + w_a^\top \Delta w_a}$. Note that $w_a$ sums to zero, not 1: it is a set of over- and underweights, not a standalone portfolio, so the long-only normalization above does not apply to it. Everything in this series that works for total risk works for active risk by replacing $w$ with $w_a$.
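Below is a sketch of the tracking-error formula on the five-stock example of Section 2.6. It assumes a hypothetical equal-weighted benchmark, with $w_b = 0.20$ for every stock. With this benchmark the market exposures cancel, so the value tilt is the only active factor exposure. The function name `tracking_error` is illustrative.

```python
import math

# Five-stock example from Section 2.6; the equal-weighted benchmark is a
# hypothetical choice made for this sketch.
X = [[1.0, 1.2], [1.0, 0.5], [1.0, -0.3], [1.0, -1.0], [1.0, -0.4]]
F = [[0.0256, -0.00128], [-0.00128, 0.0016]]
spec_var = [0.04, 0.0625, 0.0324, 0.09, 0.0484]
w_p = [0.30, 0.25, 0.20, 0.15, 0.10]
w_b = [0.20] * 5


def tracking_error(w_p, w_b, X, F, spec_var):
    """sqrt(x_a' F x_a + w_a' Delta w_a) with w_a = w_p - w_b and x_a = X' w_a."""
    w_a = [p - b for p, b in zip(w_p, w_b)]            # active weights sum to zero
    k = len(F)
    x_a = [sum(w * row[j] for w, row in zip(w_a, X)) for j in range(k)]
    factor_var = sum(x_a[i] * F[i][j] * x_a[j] for i in range(k) for j in range(k))
    specific_var = sum(w * w * s for w, s in zip(w_a, spec_var))
    return math.sqrt(factor_var + specific_var), x_a, factor_var, specific_var


te, x_a, fv, sv = tracking_error(w_p, w_b, X, F, spec_var)
# The market entry is zero up to floating-point rounding.
print(f"active exposures: market {x_a[0]:+.3f}, value {x_a[1]:+.3f}")
print(f"tracking error {te:.2%}  (factor var {fv:.6f}, specific var {sv:.6f})")
```

In this toy case, most of the active variance comes from position-level specific risk rather than from the value tilt.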

2.5 Returns conventions

Real implementations have to pin down a handful of returns conventions. Mixing those up can lead to errors that are difficult to track down.

  • Arithmetic vs. log returns: Factor models are linear in arithmetic (simple) returns. Portfolio returns aggregate across assets linearly ($r_p = w^\top r$), which is what the model needs. Log returns aggregate across time but not across assets. Standard practice: arithmetic returns within a period, compounded carefully across periods (see Chapter 10 for multi-period linking).
  • Total vs. excess returns: Risk models are typically fitted on returns in excess of the risk-free rate, though over daily or monthly horizons the distinction barely moves risk numbers.
  • Local vs. base currency: A USD-based investor holding a Japanese stock earns the local return compounded with the JPY/USD currency return: $(1+r_{\text{local}})(1+r_{\text{fx}}) - 1$. This is close to, but not exactly, the sum of the two. Multi-country models separate these with explicit currency factors so that hedged and unhedged perspectives are both available (Chapter 15).
  • Frequency and horizon: A model estimated on daily returns and one estimated on monthly returns are different models with different uses (Chapter 8). Annualization convention: multiply a variance by the number of periods per year (252 trading days or 12 months) and a volatility by the square root of that number. This scaling assumes returns are serially uncorrelated.
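The conventions above can be checked with a few lines of Python. All numbers are hypothetical. The square-root-of-time scaling assumes serially uncorrelated returns.

```python
import math

# Illustrative numbers only (hypothetical, not market data).
w = [0.6, 0.4]                 # portfolio weights
r = [0.10, -0.05]              # simple (arithmetic) returns of two assets

# Simple returns aggregate across assets linearly: r_p = w' r.
r_p = sum(wi * ri for wi, ri in zip(w, r))
# Log returns do not: the weighted sum of log returns is not log(1 + r_p).
log_of_portfolio = math.log(1 + r_p)
weighted_logs = sum(wi * math.log(1 + ri) for wi, ri in zip(w, r))
print(f"r_p = {r_p:.4f}, log(1+r_p) = {log_of_portfolio:.5f}, w' log(1+r) = {weighted_logs:.5f}")

# Local vs. base currency: returns compound, they do not simply add.
r_local, r_fx = 0.05, -0.03
r_base = (1 + r_local) * (1 + r_fx) - 1
print(f"base-currency return {r_base:.4f} vs. naive sum {r_local + r_fx:.4f}")

# Annualization (assumes serially uncorrelated returns): variance scales with
# periods per year, volatility with its square root.
daily_vol = 0.01
annual_vol = daily_vol * math.sqrt(252)
print(f"1% daily vol -> {annual_vol:.2%} annualized (252 trading days)")
```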

2.6 Worked example: five stocks, two factors, by hand

Small enough to do on a calculator; watch the structure, not the arithmetic. The universe has 5 stocks and $K = 2$ factors: market (every stock has exposure 1) and value (a z-score; Chapter 3 explains where it comes from).

$$X = \begin{pmatrix} 1 & 1.2 \\ 1 & 0.5 \\ 1 & -0.3 \\ 1 & -1.0 \\ 1 & -0.4 \end{pmatrix}, \qquad w = \begin{pmatrix} 0.30 \\ 0.25 \\ 0.20 \\ 0.15 \\ 0.10 \end{pmatrix}$$

Annualized factor model: market volatility 16%, value factor volatility 4%, correlation $-0.20$:

$$F = \begin{pmatrix} 0.16^2 & -0.20 \times 0.16 \times 0.04 \\ -0.20 \times 0.16 \times 0.04 & 0.04^2 \end{pmatrix} = \begin{pmatrix} 0.0256 & -0.00128 \\ -0.00128 & 0.0016 \end{pmatrix}$$

Specific volatilities: 20%, 25%, 18%, 30%, 22%, so $\Delta = \mathrm{diag}(0.04, 0.0625, 0.0324, 0.09, 0.0484)$.

Step 1: portfolio exposures $x_p = X^\top w$:

$$x_{p,\text{mkt}} = \sum_i w_i \cdot 1 = 1.0, \qquad x_{p,\text{val}} = 0.30(1.2) + 0.25(0.5) + 0.20(-0.3) + 0.15(-1.0) + 0.10(-0.4) = 0.235$$

The portfolio behaves like one asset with market beta 1.0 and a mild value tilt of +0.235.

Step 2: factor variance $x_p^\top F x_p$:

$$\begin{aligned} x_p^\top F x_p &= (1.0)^2(0.0256) + 2(1.0)(0.235)(-0.00128) + (0.235)^2(0.0016) \\ &= 0.025600 - 0.000602 + 0.000088 = 0.025087 \end{aligned}$$

Step 3: specific variance $\sum_i w_i^2 \sigma_{\epsilon_i}^2$:

$$0.09(0.04) + 0.0625(0.0625) + 0.04(0.0324) + 0.0225(0.09) + 0.01(0.0484) = 0.011311$$

Step 4: total risk:

$$\sigma_p = \sqrt{0.025087 + 0.011311} = \sqrt{0.036398} \approx 19.08\% \text{ per year}$$

Decomposition: factor volatility $\sqrt{0.025087} = 15.84\%$, specific volatility $\sqrt{0.011311} = 10.64\%$. The factor share of variance is $0.025087 / 0.036398 = 68.9\%$. (Volatilities do not add, $15.84 + 10.64 \neq 19.08$, but variances do. Risk decomposition always happens in variance units; Chapter 9 returns to this.)
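The four steps translate directly into code. The plain-Python sketch below uses exactly the inputs above. It should reproduce the hand-computed 0.025087, 0.011311, 19.08% and 68.9%.

```python
import math

# Inputs of the five-stock, two-factor example above (annualized).
X = [[1.0, 1.2], [1.0, 0.5], [1.0, -0.3], [1.0, -1.0], [1.0, -0.4]]   # market, value
w = [0.30, 0.25, 0.20, 0.15, 0.10]
mkt_vol, val_vol, corr = 0.16, 0.04, -0.20
F = [[mkt_vol ** 2, corr * mkt_vol * val_vol],
     [corr * mkt_vol * val_vol, val_vol ** 2]]
spec_vol = [0.20, 0.25, 0.18, 0.30, 0.22]

# Step 1: portfolio exposures x_p = X' w.
x_p = [sum(wi * row[k] for wi, row in zip(w, X)) for k in range(2)]
# Step 2: factor variance x_p' F x_p.
factor_var = sum(x_p[a] * F[a][b] * x_p[b] for a in range(2) for b in range(2))
# Step 3: specific variance sum_i w_i^2 sigma_i^2 (Delta is diagonal).
specific_var = sum((wi * s) ** 2 for wi, s in zip(w, spec_vol))
# Step 4: total risk; variances add, volatilities do not.
total_var = factor_var + specific_var

print(f"x_p = ({x_p[0]:.3f}, {x_p[1]:.3f})")
print(f"factor var {factor_var:.6f}  specific var {specific_var:.6f}")
print(f"total vol {math.sqrt(total_var):.2%}  factor share of variance {factor_var / total_var:.1%}")
```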

Even this 5-stock toy shows the shape of a real portfolio: most of the risk is factor risk, dominated by the market term. The value tilt barely moves total risk here. Its own variance term is 0.000088, and its covariance with the market term (−0.000602) slightly reduces risk. Against a market-like benchmark, however, the market exposure cancels and the value tilt becomes the main source of active factor risk. That is exactly the situation engineered in the Mini Example.

2.7 Summary

  • The model: $r = Xf + \epsilon$, with exposures known at the start of the period.
  • The assumptions: specific returns have zero mean, are uncorrelated with factors, and are uncorrelated with each other. The third is the important one. Its failures (linked securities, missing factors) are diagnosable and fixable.
  • The payoff: $\Sigma = XFX^\top + \Delta$. Millions of covariances are generated by only thousands of meaningful parameters. The result is always positive semi-definite, and invertible whenever every specific variance is strictly positive.
  • Portfolio risk: $\sigma_p^2 = x_p^\top F x_p + w^\top \Delta w$ with $x_p = X^\top w$. The specific term shrinks toward zero as the portfolio becomes more diversified; the factor term stays. Active risk is the same algebra applied to $w_a = w_p - w_b$.

Left for later chapters: where $X$ comes from (Chapter 3), where $f$ comes from (Chapter 6), and where $F$ and $\Delta$ come from (Chapter 8).