Work in progress. This primer is still being written.
β ITSJUSTBETA.COM

Part 17 / 16

Appendix: Reference Material

17.1 Notation summary

| Symbol | Dimensions | Meaning | First used |
|---|---|---|---|
| $N$ | scalar | number of assets | Ch. 1 |
| $K$ | scalar | number of factors | Ch. 1 |
| $T$ | scalar | number of time periods | Ch. 1 |
| $r$ | $N \times 1$ | asset returns over one period | Ch. 2 |
| $X$ | $N \times K$ | factor exposure (loading) matrix | Ch. 2 |
| $f$ | $K \times 1$ | factor returns over one period | Ch. 2 |
| $\epsilon$ | $N \times 1$ | specific (idiosyncratic) returns | Ch. 2 |
| $F$ | $K \times K$ | factor covariance matrix | Ch. 2 |
| $\Delta$ | $N \times N$ diag. | specific variance matrix, $\Delta_{ii} = \sigma^2_{\epsilon_i}$ | Ch. 2 |
| $\Sigma$ | $N \times N$ | asset covariance, $\Sigma = XFX^\top + \Delta$ | Ch. 2 |
| $w, w_p, w_b, w_a$ | $N \times 1$ | weights: generic, portfolio, benchmark, active ($w_p - w_b$) | Ch. 2, Ch. 9 |
| $x, x_p, x_b, x_a$ | $K \times 1$ | factor exposures of a portfolio, $x = X^\top w$ | Ch. 2, Ch. 9 |
| $\sigma_p, \sigma_a$ | scalar | portfolio volatility, tracking error | Ch. 2 |
| $d_{ik}$ | scalar | raw descriptor value, stock $i$, factor $k$ | Ch. 3 |
| $\mu_k, \sigma_k$ | scalar | standardization location (cap-weighted) and scale (equal-weighted) | Ch. 3 |
| $S(f)$ | scalar | least-squares objective, (weighted) sum of squared residuals minimized to get $\hat f$ | Ch. 6 |
| $W$ | $N \times N$ diag. | regression weights (convention: $\propto \sqrt{\text{cap}}$) | Ch. 6 |
| $C$ | $1 \times K$ (or more rows) | constraint matrix, $Cf = 0$ | Ch. 6 |
| $R$ | $K \times (K - c)$ | restriction matrix, feasible $f = Rg$ | Ch. 6 |
| $P$ | $K \times N$ | pure factor portfolio matrix, $\hat f = Pr$ | Ch. 6, Ch. 7 |
| $H$ | $N \times N$ | OLS projection (“hat”) matrix, $H = X(X^\top X)^{-1}X^\top$ | §17.2 |
| $\Omega$ | $N \times N$ | general specific-return covariance in GLS; reduces to $\Delta$ when diagonal | §17.2 |
| $\beta, \beta_{p,h}$ | scalar | time-series beta to a factor; portfolio beta to a hedge instrument | Ch. 1, Ch. 12 |
| $\gamma$ | scalar / $K \times 1$ | Fama–MacBeth risk premium; Lagrange multiplier in the characteristic-portfolio derivation | Ch. 7 |
| $h$ | varies | hedge notionals, characteristic portfolio weights | Ch. 7, Ch. 12 |
| $\lambda$ | scalar | risk aversion (Ch. 11), EWMA decay (Ch. 8), context disambiguates | Ch. 8, Ch. 11 |
| $b$ | scalar | bias statistic, $\mathrm{std}(r_t/\hat\sigma_{t-1})$ | Ch. 8, Ch. 14 |
| $\mathrm{MCR}_i, \mathrm{CTR}_i$ | scalar | marginal contribution $(\Sigma w)_i/\sigma$, contribution $w_i \cdot \mathrm{MCR}_i$ | Ch. 9 |

Conventions: returns are arithmetic and in decimal unless a table is marked %. Risk numbers are annualized unless noted. “Exposure” always means a column-standardized or dummy loading per Chapter 3.

17.2 Refresher: the least-squares family in one place

OLS: $\min_f (r - Xf)^\top (r - Xf) \Rightarrow \hat f = (X^\top X)^{-1} X^\top r$. Fitted values $X\hat f = Hr$ with the projection (“hat”) matrix $H = X(X^\top X)^{-1}X^\top$: symmetric, idempotent ($H^2 = H$), projecting onto the column space of $X$. Residuals $\hat\epsilon = (I - H)r$ are orthogonal to that space: $X^\top \hat\epsilon = 0$.

WLS: With positive diagonal $W$: $\min_f (r - Xf)^\top W (r - Xf) \Rightarrow \hat f = (X^\top W X)^{-1} X^\top W r$. Equivalent to OLS on $\tilde r = W^{1/2} r$, $\tilde X = W^{1/2} X$. Best linear unbiased when $W \propto \mathrm{Cov}(\epsilon)^{-1}$ (Aitken/GLS). The $\sqrt{\text{cap}}$ convention approximates this for equities (Ch. 6).

GLS: General $\mathrm{Cov}(\epsilon) = \Omega$: $\hat f = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} r$. In the factor-model context $\Omega = \Delta$ (diagonal), so GLS = WLS with $W = \Delta^{-1}$.
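The three estimators can be checked against each other numerically. The following is a minimal sketch on synthetic data; the seed, dimensions, and weight range are arbitrary assumptions and are not part of the primer's mini example.

```python
import numpy as np

rng = np.random.default_rng(0)            # synthetic data, illustration only
N, K = 50, 3
X = rng.normal(size=(N, K))               # exposures
r = rng.normal(size=N)                    # one period of asset returns
w = rng.uniform(0.5, 2.0, size=N)         # positive diagonal of W

# OLS: normal equations and the hat matrix H = X (X'X)^{-1} X'
f_ols = np.linalg.solve(X.T @ X, X.T @ r)
H = X @ np.linalg.solve(X.T @ X, X.T)
resid = r - H @ r

# WLS from the closed form, then as OLS on sqrt(W)-scaled data
XtW = X.T * w                             # X'W without building the N x N matrix
f_wls = np.linalg.solve(XtW @ X, XtW @ r)
s = np.sqrt(w)
f_scaled = np.linalg.lstsq(X * s[:, None], r * s, rcond=None)[0]

# GLS with a diagonal Omega = W^{-1} is the same estimator as WLS
Omega_inv = np.linalg.inv(np.diag(1.0 / w))
f_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ r)

print("H symmetric, idempotent:", np.allclose(H, H.T), np.allclose(H @ H, H))
print("X'resid = 0:", np.allclose(X.T @ resid, 0.0))
print("WLS == scaled OLS == GLS:",
      np.allclose(f_wls, f_scaled), np.allclose(f_wls, f_gls))
```

Multiplying `X.T` by the weight vector avoids forming the $N \times N$ matrix $W$, which matters once $N$ is in the thousands.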

Covariance algebra used throughout: For conformable constant matrices $A, B$ and random vectors $u, v$: $\mathrm{Cov}(Au) = A\,\mathrm{Cov}(u)\,A^\top$; $\mathrm{Cov}(Au + Bv) = A\,\mathrm{Cov}(u)A^\top + B\,\mathrm{Cov}(v)B^\top$ when $\mathrm{Cov}(u,v) = 0$. These two lines, applied to $r = Xf + \epsilon$, are the derivation of $\Sigma = XFX^\top + \Delta$.

EWMA: Weights $\lambda^s$ on lag $s$. Half-life $h \leftrightarrow \lambda = 2^{-1/h}$. Recursive update $\hat F_t = \lambda \hat F_{t-1} + (1-\lambda)\tilde f_t \tilde f_t^\top$ (normalized form). Effective sample size $\approx (1+\lambda)/(1-\lambda) \approx 2.89h$ for large $h$.
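The half-life mapping, the effective sample size, and the recursion fit in a few lines. This is a sketch only: the function names are hypothetical, and seeding the recursion with the first outer product is an assumption the text does not specify.

```python
import numpy as np

def ewma_lambda(half_life):
    """Decay lambda such that the weight on lag h is half the lag-0 weight."""
    return 2.0 ** (-1.0 / half_life)

def effective_sample_size(lam):
    """(sum of weights)^2 / (sum of squared weights) for weights lam**s, s >= 0."""
    return (1.0 + lam) / (1.0 - lam)

def ewma_cov(f, half_life):
    """Normalized recursive EWMA covariance of demeaned returns f (T x K).

    Seeding with the first outer product is an assumption of this sketch.
    """
    lam = ewma_lambda(half_life)
    F = np.outer(f[0], f[0])
    for f_t in f[1:]:
        F = lam * F + (1.0 - lam) * np.outer(f_t, f_t)
    return F

lam = ewma_lambda(60)
print(lam ** 60)                              # weight at lag 60 relative to lag 0
print(effective_sample_size(lam) / 60)        # close to 2 / ln 2 for large h
```

The ratio printed last approaches $2/\ln 2 \approx 2.89$, which is where the $2.89h$ rule of thumb comes from.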

Eigendecomposition/PCA: Symmetric $\Sigma = Q\Lambda Q^\top$, $Q$ orthonormal, $\Lambda$ diagonal with $\lambda_1 \ge \dots \ge \lambda_N \ge 0$ for PSD. PCA: factor $j$’s exposures = $q_j$, factor variance = $\lambda_j$. The rank-$K$ truncation is the best rank-$K$ approximation in Frobenius norm (Eckart–Young). Marchenko–Pastur noise edge for aspect ratio $N/T$: $\lambda_{\pm} = \sigma^2 (1 \pm \sqrt{N/T})^2$.
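A small sketch of the truncation and of the noise edges, under stated assumptions: the helper names are hypothetical, the test matrix is a toy diagonal one, and the edge formula is applied with $N \le T$.

```python
import numpy as np

def rank_k_approx(S, k):
    """Rank-k truncation of a symmetric PSD matrix from its top-k eigenpairs."""
    lam, Q = np.linalg.eigh(S)                 # eigh returns ascending eigenvalues
    lam, Q = lam[::-1], Q[:, ::-1]             # reorder to descending
    return (Q[:, :k] * lam[:k]) @ Q[:, :k].T, lam

def mp_edges(n_assets, n_periods, sigma2=1.0):
    """Marchenko-Pastur bulk edges for aspect ratio N/T (assumes N <= T)."""
    q = n_assets / n_periods
    return sigma2 * (1 - np.sqrt(q)) ** 2, sigma2 * (1 + np.sqrt(q)) ** 2

S = np.diag([3.0, 2.0, 1.0])                   # toy covariance
S2, lam = rank_k_approx(S, 2)
print(np.linalg.norm(S - S2))                  # Frobenius error of the truncation
print(mp_edges(100, 400))                      # lower and upper noise edges
```

For a symmetric PSD matrix the Frobenius error of the truncation equals the root sum of squares of the dropped eigenvalues, so here it is just the smallest eigenvalue.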

17.3 Derivation collection

D1 - Constrained WLS (Ch. 6): Minimize $S(f) = (r - Xf)^\top W (r - Xf)$ s.t. $Cf = 0$ ($c$ independent rows). Restriction form: pick $R$ ($K \times (K-c)$) whose columns span the null space of $C$ (so $CR = 0$ and any feasible $f = Rg$). Substitute: $S(Rg)$ is unconstrained in $g$, and by the WLS formula with design $XR$: $\hat g = (R^\top X^\top W X R)^{-1} R^\top X^\top W r$, $\hat f = R\hat g$. Lagrangian form: $\mathcal{L} = S(f) + 2\lambda^\top C f$. Stationarity gives the bordered system $\begin{pmatrix} X^\top W X & C^\top \\ C & 0\end{pmatrix}\begin{pmatrix}\hat f\\ \lambda\end{pmatrix} = \begin{pmatrix}X^\top W r\\ 0\end{pmatrix}$. Same solution, and the multiplier $\lambda$ prices the constraint.
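Both forms of D1 can be run side by side. The sketch below uses a synthetic design with a market column, three industry dummies, and one style column; the industry weights in `C` are hypothetical, and taking `R` from the SVD of `C` is one convenient choice of null-space basis, not necessarily the one used in Ch. 6.

```python
import numpy as np

rng = np.random.default_rng(1)                 # synthetic data, illustration only
N = 12
ind = np.repeat([0, 1, 2], 4)                  # three industries, four stocks each
X = np.column_stack([np.ones(N), np.eye(3)[ind], rng.normal(size=N)])
r = rng.normal(size=N)
w = rng.uniform(0.5, 2.0, size=N)              # diagonal of W
C = np.array([[0.0, 0.5, 0.3, 0.2, 0.0]])      # hypothetical industry weights
K, c = X.shape[1], C.shape[0]

A = (X.T * w) @ X                              # X'WX, singular without the constraint
b = (X.T * w) @ r                              # X'Wr

# Restriction form: the trailing rows of Vh from the SVD of C span null(C)
R = np.linalg.svd(C)[2][c:].T                  # K x (K - c)
g = np.linalg.solve(R.T @ A @ R, R.T @ b)
f_restricted = R @ g

# Lagrangian form: bordered system [[A, C'], [C, 0]] [f; lam] = [b; 0]
M = np.block([[A, C.T], [C, np.zeros((c, c))]])
sol = np.linalg.solve(M, np.concatenate([b, np.zeros(c)]))
f_bordered = sol[:K]

print(np.allclose(C @ R, 0.0), np.allclose(f_restricted, f_bordered))
```

The market column equals the sum of the industry dummies, so `A` alone is singular; the constraint is what makes either system solvable.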

D2 - Pure factor portfolios, $PX = I$ (Ch. 7): Unconstrained: $P = (X^\top W X)^{-1}X^\top W \Rightarrow PX = (X^\top W X)^{-1}(X^\top W X) = I_K$. Row $k$ of $P$ is a portfolio with exposure vector = row $k$ of $PX$ = $e_k^\top$: unit own-factor, zero others. Constrained: from D1, $P = R(R^\top A R)^{-1} R^\top X^\top W$ with $A = X^\top W X$, so $PX = R(R^\top A R)^{-1} R^\top A$. Then $PXRg = Rg$ for all $g$: the identity on the feasible subspace. Style rows are exactly pure; market/industry rows carry the constraint’s structure (Ch. 7 table).
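The purity statements are easy to confirm numerically. This sketch takes the constrained matrix to be $P = R(R^\top X^\top W X R)^{-1}R^\top X^\top W$, i.e. the map $r \mapsto \hat f$ from D1; the data are synthetic and the constraint weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)                 # synthetic data, illustration only
N = 12
w = rng.uniform(0.5, 2.0, size=N)              # diagonal of W

# Unconstrained case: full-column-rank style exposures
Xs = rng.normal(size=(N, 3))
P = np.linalg.solve((Xs.T * w) @ Xs, Xs.T * w)             # K x N
print(np.allclose(P @ Xs, np.eye(3)))

# Constrained case: market + industry dummies + one style, one constraint row
ind = np.repeat([0, 1, 2], 4)
X = np.column_stack([np.ones(N), np.eye(3)[ind], rng.normal(size=N)])
C = np.array([[0.0, 0.5, 0.3, 0.2, 0.0]])      # hypothetical industry weights
R = np.linalg.svd(C)[2][1:].T                  # basis of null(C)
A = (X.T * w) @ X
Pc = R @ np.linalg.solve(R.T @ A @ R, R.T @ (X.T * w))     # K x N
PcX = Pc @ X

print(np.allclose(PcX @ R, R))                 # identity on the feasible subspace
print(np.round(PcX, 3))                        # style row is pure; others are not
```

In the printed matrix the style row is a unit vector, while the market row shows unit market exposure plus the constraint weights on the industries.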

D3 - Euler risk decomposition (Ch. 9): $\sigma(w) = (w^\top \Sigma w)^{1/2}$ is positively homogeneous of degree 1. Euler’s theorem for homogeneous functions: $\sigma = \sum_i w_i\, \partial\sigma/\partial w_i$. Directly: $\partial \sigma/\partial w = \Sigma w / \sigma$, so $\sum_i w_i (\Sigma w)_i / \sigma = w^\top \Sigma w / \sigma = \sigma$. ∎ Factor-space version: with $\sigma(x) = (x^\top F x)^{1/2}$ (factor block), contributions $x_k (Fx)_k$ sum to $x^\top F x$.
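A three-asset sketch of the decomposition, with a finite-difference check of the gradient. The covariance matrix and weights are hypothetical numbers chosen for illustration.

```python
import numpy as np

Sigma = np.array([[0.040, 0.006, 0.002],
                  [0.006, 0.090, 0.010],
                  [0.002, 0.010, 0.160]])      # hypothetical 3-asset covariance
w = np.array([0.5, 0.3, 0.2])

sigma = np.sqrt(w @ Sigma @ w)
mcr = Sigma @ w / sigma                        # marginal contributions d sigma / d w_i
ctr = w * mcr                                  # contributions w_i * MCR_i

# Forward-difference approximation of the gradient, one weight at a time
eps = 1e-6
grad_fd = np.array([(np.sqrt((w + eps * e) @ Sigma @ (w + eps * e)) - sigma) / eps
                    for e in np.eye(3)])

print(ctr, ctr.sum(), sigma)                   # contributions add up to sigma
print(np.max(np.abs(grad_fd - mcr)))           # small finite-difference error
```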

D4 - Characteristic portfolio (Ch. 7): $\min_h h^\top \Sigma h$ s.t. $x^\top h = 1$. Lagrangian $h^\top \Sigma h - 2\gamma(x^\top h - 1)$. Stationarity $\Sigma h = \gamma x \Rightarrow h = \gamma \Sigma^{-1} x$. The constraint fixes $\gamma = 1/(x^\top \Sigma^{-1} x)$: $h_x = \Sigma^{-1}x / (x^\top \Sigma^{-1} x)$, with minimized variance $1/(x^\top \Sigma^{-1}x)$. The GLS pure portfolio for factor $k$ solves the same problem with the added constraints of zero exposure to the other factors. Stacking those constraints and applying D1’s bordered system shows the GLS ($W = \Delta^{-1}$) regression rows solve it: estimation efficiency = portfolio efficiency.
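Both claims of D4 can be checked on a small synthetic model. In this sketch the exposures, factor covariance, and specific variances are made up; the last comparison sets the GLS rows against minimum-variance portfolios subject to $X^\top h = e_k$ under the full $\Sigma = XFX^\top + \Delta$.

```python
import numpy as np

rng = np.random.default_rng(3)                 # synthetic data, illustration only
N, K = 8, 2
X = rng.normal(size=(N, K))
F = np.array([[0.04, 0.01], [0.01, 0.02]])     # hypothetical factor covariance
delta = rng.uniform(0.02, 0.10, size=N)        # specific variances, diagonal of Delta
Sigma = X @ F @ X.T + np.diag(delta)

# Characteristic portfolio of x (here: the first exposure column)
x = X[:, 0]
Sinv_x = np.linalg.solve(Sigma, x)
h = Sinv_x / (x @ Sinv_x)
var_min = h @ Sigma @ h

# A perturbation with zero exposure to x keeps x'h = 1 and can only add variance
d = rng.normal(size=N)
d -= x * (x @ d) / (x @ x)
var_other = (h + d) @ Sigma @ (h + d)

# GLS rows (W = Delta^{-1}) versus min-variance portfolios with X'h = e_k
P_gls = np.linalg.solve((X.T / delta) @ X, X.T / delta)    # K x N
Sinv_X = np.linalg.solve(Sigma, X)
P_minvar = np.linalg.solve(X.T @ Sinv_X, Sinv_X.T)         # K x N

print(np.isclose(x @ h, 1.0), np.isclose(var_min, 1.0 / (x @ Sinv_x)))
print(var_other >= var_min, np.allclose(P_gls, P_minvar))
```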

D5 - Woodbury identity for $\Sigma^{-1}$ (Ch. 11): For $\Sigma = \Delta + XFX^\top$ with $\Delta, F$ invertible: $\Sigma^{-1} = \Delta^{-1} - \Delta^{-1}X(F^{-1} + X^\top \Delta^{-1} X)^{-1} X^\top \Delta^{-1}$. Verify by multiplication: $\Sigma \cdot [\text{RHS}] = I + X F X^\top \Delta^{-1} - (X + XFX^\top\Delta^{-1}X)(F^{-1} + X^\top \Delta^{-1}X)^{-1}X^\top \Delta^{-1}$. Factor $XF$ out of the bracket in the last term: $X + XFX^\top \Delta^{-1} X = XF(F^{-1} + X^\top \Delta^{-1} X)$, so the last term collapses to $XFX^\top \Delta^{-1}$ and cancels the second, leaving $I$. ∎ Cost: $O(NK^2 + K^3)$ vs. $O(N^3)$.
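A sketch of the identity used as a solver: only a $K \times K$ system is solved, and the dense $N \times N$ matrix is built purely to check the answer. The data are synthetic and `sigma_inv_apply` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(4)                 # synthetic data, illustration only
N, K = 200, 5
X = rng.normal(size=(N, K))
B = rng.normal(size=(K, K))
F = B @ B.T + 0.1 * np.eye(K)                  # hypothetical positive-definite F
delta = rng.uniform(0.01, 0.05, size=N)        # diagonal of Delta

def sigma_inv_apply(v):
    """Sigma^{-1} v via Woodbury, for Sigma = Delta + X F X'."""
    Dv = v / delta                                         # Delta^{-1} v
    M = np.linalg.inv(F) + (X.T / delta) @ X               # F^{-1} + X' Delta^{-1} X
    return Dv - (X @ np.linalg.solve(M, X.T @ Dv)) / delta

Sigma = X @ F @ X.T + np.diag(delta)           # dense matrix, for the check only
v = rng.normal(size=N)
print(np.allclose(sigma_inv_apply(v), np.linalg.solve(Sigma, v)))
```

Because $\Delta$ is diagonal, its inverse is an elementwise division, which is what keeps the cost at $O(NK^2 + K^3)$.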

D6 - Minimum-variance hedge ratio (Ch. 12): $\mathrm{Var}(r_p + h\, r_h) = \sigma_p^2 + 2h\,\mathrm{Cov}(r_p, r_h) + h^2 \sigma_h^2$. Minimize over $h$: $h^* = -\mathrm{Cov}(r_p, r_h)/\sigma_h^2 = -\beta_{p,h}$. Multi-instrument: $h^* = -\mathrm{Cov}(r_H)^{-1}\mathrm{Cov}(r_H, r_p)$, the population regression coefficients of the portfolio on the instruments. Through the model, $\mathrm{Cov}(r_H) = X_h F X_h^\top + \Delta_h$ and $\mathrm{Cov}(r_H, r_p) = X_h F x_p\ (+\ \text{specific overlap})$.
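The one- and two-instrument formulas can be compared on a toy covariance matrix. The numbers below are hypothetical and are not the mini example's instruments; the sketch only illustrates the minimum-variance formula of D6.

```python
import numpy as np

# Hypothetical annualized covariance of (r_p, r_h1, r_h2); not the mini example
cov = np.array([[0.0300, 0.0250, 0.0240],
                [0.0250, 0.0270, 0.0260],
                [0.0240, 0.0260, 0.0290]])
var_p, c, V = cov[0, 0], cov[1:, 0], cov[1:, 1:]

def hedged_var(h):
    """Var(r_p + h' r_H) = var_p + 2 h'c + h'V h."""
    h = np.asarray(h, dtype=float)
    return var_p + 2.0 * h @ c + h @ V @ h

h_one = -c[0] / V[0, 0]                        # single instrument: minus the beta
h_two = -np.linalg.solve(V, c)                 # both instruments

print(h_one, h_two)
print(hedged_var([0.0, 0.0]), hedged_var([h_one, 0.0]), hedged_var(h_two))
```

Adding an instrument can never raise the minimized variance, since the single-instrument hedge is a feasible point of the two-instrument problem.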

17.4 Glossary

  • Active return/risk: portfolio return minus benchmark return, and the volatility of that difference (tracking error).
  • Alpha: expected return not explained by factor exposures. In construction, the forecast vector fed to an optimizer.
  • Bias statistic: std of realized returns standardized by forecast volatility. The calibration score of a risk model.
  • Characteristic portfolio: minimum-variance portfolio with unit exposure to a characteristic: $\Sigma^{-1}x / (x^\top\Sigma^{-1}x)$.
  • Coverage universe: all assets the model assigns exposures and risk to (cf. estimation universe).
  • Descriptor: a raw measurable per-stock quantity (B/P, 12-1 return) before standardization. Factors blend one or more.
  • Estimation universe: the curated asset set on which factor returns are estimated.
  • Exposure (loading): a stock’s sensitivity to a factor: dummy (industry/country) or standardized z-score (style).
  • Factor-mimicking/pure factor portfolio: the long–short portfolio (row of $P$) whose return is the estimated factor return. Unit own-exposure, zero other-exposure.
  • Factor return: per-period payoff to unit exposure of a factor, estimated by cross-sectional regression.
  • Half-life: lag at which an EWMA weight halves. The responsiveness dial of covariance estimation.
  • Idiosyncratic/specific risk: return variance unique to a stock. Diagonal of $\Delta$. Diversifiable.
  • Information coefficient (IC): cross-sectional correlation between a signal and forward returns. In alpha research, the number that matters is the IC of the factor-residualized signal.
  • Linked assets: multiple listings of one issuer (ADR, share classes) sharing factor exposures and correlated specifics.
  • Pure factor portfolio: see factor-mimicking portfolio.
  • Restriction matrix: basis of a constraint’s null space, converting constrained to unconstrained regression.
  • Style factor: continuous characteristic-based factor (value, momentum, size, quality…).
  • Tracking error: annualized volatility of active return.
  • VIF (variance inflation factor): $1/(1-R^2)$, with $R^2$ from regressing one exposure on the others. The redundancy gauge for candidate factors.
  • Winsorization: clipping extreme descriptor values before standardization.
  • Z-score: standardized exposure: (descriptor − cap-weighted mean) / equal-weighted std.

17.5 The mini example: complete dataset

Everything below reproduces every number in Chapters 2–15 (NumPy, deterministic, no randomness). The full script is reproduced on its own page: Mini Example Source Code.

Universe, descriptors, month-1 data, portfolios:

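| Stock | Industry | Cap (USD bn) | B/P | Mom (12-1) | Spec vol (ann.) | $r_1$ (%) | $w_p$ |
|---|---|---|---|---|---|---|---|
| AXIOM | Tech | 150 | 0.15 | +0.32 | 18% | +4.2 | 0.10 |
| BINARY | Tech | 80 | 0.25 | +0.18 | 22% | +2.8 | 0.08 |
| CIPHER | Tech | 40 | 0.45 | −0.05 | 30% | +0.5 | 0.10 |
| DIGIT | Tech | 10 | 0.60 | +0.40 | 38% | +6.0 | 0.03 |
| EVERGREEN | Fin | 120 | 0.85 | +0.06 | 16% | +0.8 | 0.22 |
| FIDELIS | Fin | 60 | 0.95 | −0.02 | 20% | −0.6 | 0.14 |
| GUARDIAN | Fin | 20 | 1.10 | −0.12 | 28% | −1.8 | 0.06 |
| HARVEST | Cons | 90 | 0.40 | +0.10 | 17% | +1.2 | 0.15 |
| INDIGO | Cons | 30 | 0.55 | +0.02 | 26% | +2.0 | 0.08 |
| JUNIPER | Cons | 15 | 0.70 | −0.08 | 32% | −0.5 | 0.04 |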

Benchmark $w_b$ = cap weights (total cap 615). SIZE descriptor = ln(cap). Style standardization: cap-weighted mean, equal-weighted std (Ch. 3; the resulting $X$ is tabulated there). Regression weights $\propto \sqrt{\text{cap}}$, normalized. Constraint: cap-weighted industry factor returns sum to zero, with industry cap weights (0.4553, 0.3252, 0.2195).

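Factor covariance $F$ (annualized). Volatilities: MKT 16%, TECH 9%, FIN 7%, CONS 5%, VALUE 4%, MOM 6%, SIZE 4%. Correlations (upper triangle shown):

| | MKT | TECH | FIN | CONS | VALUE | MOM | SIZE |
|---|---|---|---|---|---|---|---|
| MKT | 1 | 0.10 | −0.05 | −0.10 | −0.20 | 0.05 | 0.15 |
| TECH | | 1 | −0.40 | −0.30 | −0.35 | 0.30 | 0.05 |
| FIN | | | 1 | −0.10 | 0.40 | −0.15 | 0.00 |
| CONS | | | | 1 | 0.05 | −0.05 | −0.05 |
| VALUE | | | | | 1 | −0.45 | 0.10 |
| MOM | | | | | | 1 | 0.05 |
| SIZE | | | | | | | 1 |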

(Symmetric, eigenvalues all positive: PSD verified in the script.)
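For reference, $F$ can be assembled from the volatilities and the upper-triangle correlations in a few lines. This is a sketch of one way to do it, not the primer's own script; the variable names are assumptions.

```python
import numpy as np

names = ["MKT", "TECH", "FIN", "CONS", "VALUE", "MOM", "SIZE"]
vol = np.array([0.16, 0.09, 0.07, 0.05, 0.04, 0.06, 0.04])
upper = [[1, 0.10, -0.05, -0.10, -0.20, 0.05, 0.15],
         [1, -0.40, -0.30, -0.35, 0.30, 0.05],
         [1, -0.10, 0.40, -0.15, 0.00],
         [1, 0.05, -0.05, -0.05],
         [1, -0.45, 0.10],
         [1, 0.05],
         [1]]

corr = np.zeros((7, 7))
for i, row in enumerate(upper):
    corr[i, i:] = row                          # fill row i from the diagonal rightwards
corr = corr + corr.T - np.eye(7)               # mirror; undo the doubled diagonal
F = corr * np.outer(vol, vol)                  # annualized factor covariance

eig = np.linalg.eigvalsh(F)                    # ascending eigenvalues
print(np.allclose(F, F.T), eig.min() > 0)
```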

Stipulated later-month factor returns (Ch. 10; CONS is set by the constraint): $f_2$ = (−2.0, −1.5, +1.0, +1.63, +1.2, −0.8, +0.5)%. $f_3$ = (+3.0, +0.8, −0.5, −0.919, −0.6, +1.0, −0.3)%. Active specific returns months 2–3: +0.30%, −0.10%. Constant exposures assumed across the quarter.
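The CONS entries follow from the constraint alone: with industry cap weights proportional to (280, 200, 135), the cap-weighted industry returns must sum to zero. A short check, using a hypothetical helper name:

```python
import numpy as np

cap_w = np.array([280.0, 200.0, 135.0]) / 615.0   # TECH, FIN, CONS industry cap weights

def cons_from_constraint(f_tech, f_fin):
    """CONS factor return implied by cap-weighted industry returns summing to zero."""
    return -(cap_w[0] * f_tech + cap_w[1] * f_fin) / cap_w[2]

f2_cons = cons_from_constraint(-1.5, 1.0)         # month 2, in %
f3_cons = cons_from_constraint(0.8, -0.5)         # month 3, in %
print(round(f2_cons, 2), round(f3_cons, 3))
```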

Hedge instruments (Ch. 12): index future = benchmark exposures (1, 0.4553, 0.3252, 0.2195, 0, 0, 0). Small-cap future = (1.05, 0.35, 0.30, 0.35, 0.10, −0.05, −1.20). Both specific-risk-free.

Alpha-research candidate (Ch. 13): raw descriptor = book-to-price − 0.30·(12-1 momentum) + a fixed per-stock idiosyncratic term, standardized to exposure $a$. Regressing $a$ on $X$ gives the spanning results below.

Key computed results (cross-chapter checkpoints):

  • Month-1 factor returns: $f_1$ = (1.821, 0.768, −1.282, 0.306, 0.548, 1.962, 0.046)%.
  • Portfolio/benchmark/active vols: 17.55% / 18.14% / 5.42%.
  • Active exposures: (0, −0.145, 0.095, 0.050, 0.385, −0.332, −0.275). The three industry entries sum to zero because portfolio and benchmark are both fully invested.
  • Quarter attribution: VALUE +0.46%, MOM −0.72%, specific +0.24%, total −0.102%.
  • Optimization: TE 5.42% → 4.65% at 25.2% turnover.
  • Hedge: (−0.759, −0.229), giving 17.55% → 8.14%.
  • Alpha-research candidate: corr with VALUE/MOM/SIZE +0.82/−0.44/−0.50, spanned fraction 0.884, IC vs $r_1$ −0.49 raw / −0.13 residualized, signal long–short −0.63% = factor −0.46% + specific −0.17%.
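The first three checkpoints can be recomputed directly from the tables of this section. The sketch below is not the primer's script: it assumes the equal-weighted std is the population std (`ddof=0`) about the equal-weighted mean, and it uses the SVD of the constraint row as the restriction basis. Variable names are hypothetical.

```python
import numpy as np

# Section 17.5 table: caps in $bn, r1 in %, specific vol annualized
ind = np.repeat([0, 1, 2], [4, 3, 3])          # TECH, FIN, CONS
cap = np.array([150, 80, 40, 10, 120, 60, 20, 90, 30, 15], dtype=float)
bp = np.array([0.15, 0.25, 0.45, 0.60, 0.85, 0.95, 1.10, 0.40, 0.55, 0.70])
mom = np.array([0.32, 0.18, -0.05, 0.40, 0.06, -0.02, -0.12, 0.10, 0.02, -0.08])
spec_vol = np.array([0.18, 0.22, 0.30, 0.38, 0.16, 0.20, 0.28, 0.17, 0.26, 0.32])
r1 = np.array([4.2, 2.8, 0.5, 6.0, 0.8, -0.6, -1.8, 1.2, 2.0, -0.5])
w_p = np.array([0.10, 0.08, 0.10, 0.03, 0.22, 0.14, 0.06, 0.15, 0.08, 0.04])
w_b = cap / cap.sum()
w_a = w_p - w_b

def zscore(d):
    """Cap-weighted mean, equal-weighted std (ddof=0 is this sketch's assumption)."""
    return (d - w_b @ d) / d.std()

# Columns: MKT, TECH, FIN, CONS, VALUE, MOM, SIZE
X = np.column_stack([np.ones(10), np.eye(3)[ind],
                     zscore(bp), zscore(mom), zscore(np.log(cap))])

# Factor covariance from the stated volatilities and upper-triangle correlations
vol = np.array([0.16, 0.09, 0.07, 0.05, 0.04, 0.06, 0.04])
upper = [[1, 0.10, -0.05, -0.10, -0.20, 0.05, 0.15],
         [1, -0.40, -0.30, -0.35, 0.30, 0.05],
         [1, -0.10, 0.40, -0.15, 0.00],
         [1, 0.05, -0.05, -0.05],
         [1, -0.45, 0.10],
         [1, 0.05],
         [1]]
corr = np.zeros((7, 7))
for i, row in enumerate(upper):
    corr[i, i:] = row
corr = corr + corr.T - np.eye(7)
F = corr * np.outer(vol, vol)

# Month-1 factor returns: sqrt(cap)-weighted regression under C f = 0 (restriction form)
w_reg = np.sqrt(cap)
C = np.concatenate([[0.0], np.bincount(ind, weights=w_b), np.zeros(3)])[None, :]
R = np.linalg.svd(C)[2][1:].T
A, b = (X.T * w_reg) @ X, (X.T * w_reg) @ r1
f1 = R @ np.linalg.solve(R.T @ A @ R, R.T @ b)

# Risk from Sigma = X F X' + Delta
Sigma = X @ F @ X.T + np.diag(spec_vol ** 2)
vols = np.array([np.sqrt(w @ Sigma @ w) for w in (w_p, w_b, w_a)])

print("f1 (%):          ", np.round(f1, 3))
print("active exposures:", np.round(X.T @ w_a, 3))
print("vols p/b/a (%):  ", np.round(100 * vols, 2))
```

Under these assumptions the three printed vectors should agree with the corresponding checkpoints to rounding. The remaining checkpoints (attribution, optimization, hedge, alpha research) depend on choices made in their own chapters and are not reproduced here.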

17.6 Annotated bibliography

Foundations

  • Sharpe (1964), “Capital Asset Prices,” JF: the one-factor beginning. Beta as the first exposure.
  • Ross (1976), “The Arbitrage Theory of Capital Asset Pricing,” JET: multi-factor pricing without specifying the factors. The license under which all factor models operate.
  • Fama & MacBeth (1973), “Risk, Return, and Equilibrium,” JPE: the two-pass cross-sectional methodology (Ch. 7). Still the standard premia test.
  • Chen, Roll & Ross (1986), “Economic Forces and the Stock Market,” JB: the canonical macroeconomic factor model (Ch. 4).
  • Fama & French (1992, 1993) JF/JFE; (2015) JFE; Carhart (1997) JF: the style-factor canon: size and value, the three-factor model, profitability/investment, momentum. The sorted-portfolio construction of Ch. 7.
  • Rosenberg (1974), “Extra-Market Components of Covariance,” JFQA: the founding paper of the fundamental cross-sectional architecture this primer centers on.

Books

  • Grinold & Kahn, Active Portfolio Management (2nd ed.): the practitioner bible: characteristic portfolios, the fundamental law, IR-based thinking. The source of Ch. 7’s optimization view and Ch. 11’s alpha discipline.
  • Connor, Goldberg & Korajczyk, Portfolio Risk Analysis: the most rigorous book-length treatment of factor risk models per se, all three families.
  • Qian, Hua & Sorensen, Quantitative Equity Portfolio Management: cross-sectional modeling and construction with worked detail.
  • Litterman et al., Modern Investment Management: risk decomposition and budgeting culture (Goldman’s quantitative tradition).

Methods

  • Ledoit & Wolf (2003, 2004): shrinkage covariance estimation (Ch. 8): “Honey, I Shrunk the Sample Covariance Matrix.”
  • Newey & West (1987), Econometrica: autocorrelation-consistent covariance. The horizon-scaling fix of Ch. 8.
  • Shanken (1992), RFS: errors-in-variables correction for Fama–MacBeth (Ch. 7).
  • Menchero (2000s, various): multi-period attribution linking (Ch. 10). Cariño (1999): the log-linking algorithm.
  • Black & Litterman (1992), FAJ: equilibrium-anchored expected returns (Ch. 11).
  • Michaud (1989), “The Markowitz Optimization Enigma,” FAJ: error maximization named and shamed (Ch. 11).
  • Harvey, Liu & Zhu (2016), RFS, “…and the Cross-Section of Expected Returns”: the factor zoo’s multiple-testing reckoning (Ch. 16). Hou, Xue & Zhang (2020), RFS: the replication audit.
  • Kelly, Pruitt & Su (2019), JFE: IPCA. Gu, Kelly & Xiu (2020), RFS: ML asset pricing (Ch. 16 directions).

Practitioner references

  • MSCI Barra model handbooks (USE4, GEM3 and successors): full disclosure of a production fundamental model: descriptor recipes, estimation universes, regression weights, specific-risk blending. The single best way to see every choice in Chapters 3–8 made concretely, with parameters.
  • Axioma/SimCorp research papers: practical treatments of alpha alignment (Ch. 11), statistical-vs-fundamental hybrids (Ch. 4), and bias-statistic methodology (Ch. 14).
  • Menchero, Orr & Wang (MSCI), “The Barra US Equity Model (USE4)” research notes: readable bridge between the handbooks and the academic literature.