Goldman's Tail Hedging Analysis Is Right About the Direction. Building the Implementation Stack Beneath It.
Goldman's framing is directionally right. The implementation stack that sits beneath the model is where programs actually live or die.
With thanks to Daan Streumer for conversations that shaped this piece.
Goldman Sachs Asset Management published "From Defense to Offense: Finding the True Value of Tail-Risk Hedging" in January. The paper makes the case that standalone tail hedging adds almost nothing: 0.8 basis points annually, even at 99% reliability. Where the value shows up, they argue, is in what hedging permits. A 50% reliable strategy with modest alpha enables roughly 30 bps of incremental compounding through higher equity allocation. Most readers skim past that.
The directional claim is right. Tail hedging is a permission structure. It lets you stay long through drawdowns instead of panic-selling at the bottom. We reached the same conclusion independently, both in our fundamental research and in our thinking about investor psychology.
The paper earns credit for the things it does well: the two-regime framing of defense versus offense, the model-based demonstration that tail hedging has almost no standalone value, the explicit utility-based sizing framework, and a useful benchmarking lens for comparing strategies and managers. Those are useful conceptual anchors for a conversation that too often gets stuck arguing about cost alone.
The paper models hedging at the conceptual level using two parameters, reliability and alpha. That is the right scope choice for a framework piece, especially given that Goldman is addressing a broad set of tail-risk approaches rather than just options overlays. For option-based programs specifically, there is a complementary layer worth mapping alongside the framework: five operational decisions that, in our experience running these programs, determine whether the reliability and alpha picture materializes in practice.
Five implementation variables for option-based hedges
[Figure: The implementation stack beneath the reliability/alpha model. Reliability emerges from strike, entry gates, monetization, rehedge, and reinvest decisions working together as an operating stack.]
Strike. Choosing between 15% out-of-the-money (OTM) puts and 35% OTM puts is a choice between two different strategies. The first responds to corrections, at roughly 2-3% annual drag. The second is catastrophe insurance at a fraction of that cost. The payout frequencies, the management requirements, and the way the total portfolio feels to the investor are completely different.
[Figure: Different drawdown shapes create different hedge outcomes. The same hedge structure can look brilliant in a fast crash and useless in a slow grind. That is a shape problem before it is a strike problem.]
[Figure: 15% OTM and 35% OTM are different programs. The deductible determines which class of event you insure and how much carry you agree to tolerate while waiting.]
Verdad Capital's study of passive put-buying, discussed under Entry Gates below, showed that deeper OTM strikes were more efficient in sudden crises like COVID, because the primary payoff driver is implied vol convexity, not intrinsic value. Most allocators who say tail hedging is "too expensive" are quoting near-the-money protection on passive strategies and applying that cost to a problem that only calls for catastrophic coverage. Strike is where program design begins once the philosophical case for hedging is accepted.
The 15% and 35% anchors used in the visuals above are illustrative — chosen for visual contrast to show the shape of the tradeoff, not as specific strike recommendations. The actual strike calibration for a given program depends on mandate, vol regime, governance tolerance, and the rest of the implementation stack.
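To make the "implied vol convexity, not intrinsic value" point concrete, here is a minimal sketch that reprices both strikes with Black-Scholes before and after a hypothetical shock. Everything numeric is an assumption chosen for illustration: spot of 100, six-month puts, a 25% drop over one month, and the per-strike implied vols (a crude stand-in for skew). It is not a calibrated model and not the method used in any of the cited studies.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_put(spot: float, strike: float, vol: float, t_years: float, rate: float = 0.0) -> float:
    """Black-Scholes price of a European put, no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt(t_years))
    d2 = d1 - vol * sqrt(t_years)
    return strike * exp(-rate * t_years) * norm_cdf(-d2) - spot * norm_cdf(-d1)


def crash_repricing(otm_pct: float, vol_before: float, vol_after: float,
                    drop: float = 0.25, t_before: float = 0.5, t_after: float = 5.0 / 12.0) -> dict:
    """Cost, post-shock value, intrinsic value, and payoff multiple for one strike.

    All inputs are hypothetical. Spot starts at 100.
    """
    strike = 100.0 * (1.0 - otm_pct)
    spot_after = 100.0 * (1.0 - drop)
    cost = bs_put(100.0, strike, vol_before, t_before)
    value = bs_put(spot_after, strike, vol_after, t_after)
    return {
        "cost": cost,
        "value": value,
        "intrinsic": max(strike - spot_after, 0.0),
        "multiple": value / cost,
    }


# Assumed implied vols: deeper strikes trade at higher vol (skew), and vol spikes in the shock.
shallow = crash_repricing(otm_pct=0.15, vol_before=0.25, vol_after=0.60)
deep = crash_repricing(otm_pct=0.35, vol_before=0.35, vol_after=0.70)

for name, result in (("15% OTM", shallow), ("35% OTM", deep)):
    print(name, {key: round(val, 2) for key, val in result.items()})
```

Under these assumed inputs the 35% OTM put is still out of the money after the drop, so its entire marked value comes from time value and the vol spike, yet its multiple on premium is higher than the 15% OTM put's. The result is sensitive to the skew and vol-spike assumptions, which is part of why strike calibration depends on vol regime.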
Entry Gates. In one illustrative case, Goldman sizes the optimal hedge at about 1.3% of portfolio risk for a 50% reliable, zero-alpha strategy. In the two-parameter model, premium cost lives inside alpha, which is the right abstraction for comparing strategies. In an actual program, cost discipline is itself an active management variable, expressed primarily through entry timing.
The CBOE PPUT Index, which is passive monthly put-buying and about as naive as it gets, has underperformed the S&P by roughly 4-5% annualized over 30 years. AQR showed similar results: systematic put-buying runs around -6.4% annualized (Ilmanen, Thapar, Tummala & Villalon, 2020). Israelov's "Pathetic Protection" went further: protective puts are "quite ineffective at reducing drawdowns versus the simple alternative of statically reducing exposure." Verdad Capital reached the same wall from a different direction. After testing passive put-buying across OptionMetrics data since 1996, they concluded: "we were unable to identify a simple options-based approach that both protected against black swan events and did so at an acceptably low cost." Their caveat is worth reading in full, because it names the gap directly: "achieving this goal requires either a complexity of strategy or a robust active approach that goes beyond our basic quantitative efforts."
Three independent research teams arrived at the same floor from different angles: passive implementation creates structurally ruinous drag. Programs with real entry discipline, sizing up when protection is cheap and scaling back when it's expensive, operate at a fraction of that drag. The spread between naive and intelligent implementation is large, and in practice it is the variable that ends most hedge programs before they ever have the chance to pay off.
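One way to picture "sizing up when protection is cheap and scaling back when it's expensive" is a simple gate on where implied vol sits in its own history. The sketch below is an assumption-laden illustration, not a description of any program's rule: the percentile proxy for cheapness, the 80th-percentile gate, the 25 bps base budget, and the 1.5x maximum scale are all hypothetical parameters.

```python
def iv_percentile(current_iv: float, iv_history: list[float]) -> float:
    """Share of historical observations strictly below the current implied vol."""
    below = sum(1 for level in iv_history if level < current_iv)
    return below / len(iv_history)


def gated_budget_bps(current_iv: float, iv_history: list[float],
                     base_bps: float = 25.0, gate: float = 0.8,
                     max_scale: float = 1.5) -> float:
    """Premium budget for this entry window, in basis points of portfolio value.

    Hypothetical rule: spend nothing when implied vol is at or above the gate
    percentile, and scale linearly up to max_scale * base_bps as it gets cheaper.
    """
    pct = iv_percentile(current_iv, iv_history)
    if pct >= gate:
        return 0.0  # protection is expensive relative to history: stand aside
    return base_bps * max_scale * (1.0 - pct / gate)


# Illustrative (made-up) implied vol history, in vol points.
history = [12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 22.0, 25.0, 30.0, 40.0]

for iv in (13.0, 19.0, 35.0):
    print(iv, gated_budget_bps(iv, history))
```

A passive monthly put-buyer corresponds to ignoring the gate and spending the same budget every window. The design choice here is only that cost discipline becomes an explicit, testable rule rather than a discretionary call.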
Monetization. In our experience, monetization is the single most consequential operational decision for option-based programs, and the one most invisibly bundled into the reliability and alpha parameters. The model treats payoff as a function of reliability. In practice, payoff is a function of whether someone pulls the trigger in a compressed window. In real markets, nobody delivers the payoff. You take it or you don't, but this is a choice. It is your option.
The S&P fell 34% in 23 trading days, from its February 19 peak to its March 23, 2020 low. Universa reported a 3,612% return on its tail-risk book, although that figure has been debated.
Eight trading days after the bottom, the S&P had recovered 17%. By mid-April, up 28%. A hedge that peaked around March 16-23 and wasn't monetized inside that window gave back most of its value within weeks. Israelov & Nze Ndong (2023) studied this directly: the monetization window during COVID was compressed to days, not weeks, and the V-shaped recovery made timing the dominant variable in hedge value capture.
[Figure: The payoff spike is perishable. Own the hedge and delay the monetization decision long enough, and the market will hand back a meaningful share of the convexity for you.]
Sell too early and you clip the tail you're paying to capture. Sell too late and the V-shaped recovery eats your hedge P&L. Don't sell at all and you're not running a hedge program, you're buying lottery tickets and letting them expire. Bhansali, Chang, Holdom & Rappaport showed in "Monetization Matters" (2020) that simple rules-based monetization, selling at pre-defined price multiples of initial cost, significantly improved hedge program performance versus hold-to-expiry using actual March 2020 data. Their conclusion: actively managed tail-risk strategies can result in significant increases in efficacy.
A monetization system must have a ruleset detailing when to take profits, in what tranches, and using what triggers. This is the most consequential operational decision in a tail hedge program, and it lives in the implementation layer rather than the parameter layer. It's also the clearest example of why a simple, passive approach hits a ceiling. A passive strategy has no monetization framework. It holds to expiry or it doesn't. The gap between that and a managed program with real exit discipline is where most of the reliability range lives.
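A rules-based monetization ladder of the kind described above (tranches sold at pre-defined multiples of initial cost) can be sketched in a few lines. The path of hedge marks and the ladder levels below are invented for illustration. They are not taken from the "Monetization Matters" paper or from March 2020 data, and the sketch ignores transaction costs and liquidity.

```python
def run_ladder(marks: list[float], ladder: list[tuple[float, float]]) -> float:
    """Total value captured per unit of initial premium.

    marks:  hedge value over time, expressed as a multiple of initial cost.
    ladder: (trigger_multiple, fraction_of_original_position) pairs. Each tranche
            is sold once, at the first observed mark at or above its trigger.
    Any unsold remainder is valued at the final mark (hold to the end).
    """
    remaining = 1.0
    realized = 0.0
    pending = sorted(ladder)
    for mark in marks:
        # A single large jump can clear several triggers at the same mark.
        while pending and mark >= pending[0][0]:
            _, fraction = pending.pop(0)
            sold = min(fraction, remaining)
            realized += sold * mark
            remaining -= sold
    return realized + remaining * marks[-1]


# Hypothetical path: slow bleed, spike, then a V-shaped recovery erodes the hedge.
marks = [1.0, 0.8, 3.0, 7.0, 12.0, 6.0, 2.0, 0.5]
ladder = [(3.0, 0.25), (6.0, 0.25), (10.0, 0.25)]

print("laddered:", run_ladder(marks, ladder))
print("held to the end:", run_ladder(marks, []))
```

On this made-up path the ladder keeps most of the spike while the unmanaged position gives nearly all of it back. Setting the triggers lower would clip the tail, and setting them higher would risk never firing, which is the tradeoff the ruleset has to resolve.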
Rehedge. After you monetize, do you re-establish protection immediately in an elevated vol environment? Or wait for normalization and accept being unhedged through a potential second leg? The answer depends on regime and probability assessment, and it changes the character of the program entirely. This is implementation territory: regime read, probability assessment, vol surface dynamics, and the discipline to not chase convexity that just got expensive.
Reinvest. Goldman frames the value of tail hedging as enabling a higher static equity allocation, holding more beta across the cycle. There is a complementary mechanism worth naming: what you do with the proceeds once you've monetized. A program that cashes out during a crash generates liquidity at exactly the moment equity prices are depressed. Reinvesting those proceeds, buying cheap equity with hedge profits, is counter-cyclical capital deployment that compounds through the recovery. That becomes a compounding engine on top of the higher static allocation. Bhansali & Davis formalized this at PIMCO in "Offensive Risk Management" (2010), showing that the "shadow value" of a tail hedge program, the optionality to deploy capital at distressed prices, can exceed the direct hedge payoff. The reinvestment mechanism is itself an operational decision: tranches, triggers, target allocations. And the financial comparison, buying at depressed prices rather than being forced to sell, doesn't even begin to capture the emotional and psychological effects of a major drawdown, which we have written about separately.
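The reinvestment effect is simple arithmetic, sketched below with the March 2020 figures quoted earlier (a 34% drawdown, then a 28% rebound off the low). The hedge proceeds of 5 per 100 of starting equity are a hypothetical input, and the sketch assumes, unrealistically, that monetization and reinvestment both happen exactly at the low.

```python
def value_after_rebound(equity_at_peak: float, drawdown: float, rebound: float,
                        hedge_proceeds: float, reinvest: bool) -> float:
    """Portfolio value after the rebound, with hedge proceeds either reinvested
    in equity at the low or left in cash."""
    equity_at_low = equity_at_peak * (1.0 - drawdown)
    if reinvest:
        return (equity_at_low + hedge_proceeds) * (1.0 + rebound)
    return equity_at_low * (1.0 + rebound) + hedge_proceeds


held_in_cash = value_after_rebound(100.0, 0.34, 0.28, hedge_proceeds=5.0, reinvest=False)
reinvested = value_after_rebound(100.0, 0.34, 0.28, hedge_proceeds=5.0, reinvest=True)

print("proceeds held in cash:", round(held_in_cash, 2))
print("proceeds reinvested:  ", round(reinvested, 2))
print("pickup from reinvesting:", round(reinvested - held_in_cash, 2))
```

The pickup is just the proceeds multiplied by the rebound, so it grows with both the size of the monetized payoff and the depth of the prices it is redeployed at.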
Governance
Governance is arguably the dominant determinant of program survival, and the most common failure mode we see in practice. We've laid out a governance checklist for options overlays specifically.
Goldman's sizing framework uses CRRA utility, or Constant Relative Risk Aversion. The investor has stable preferences and acts rationally across all market conditions. If they held all equities before, that reveals their risk tolerance, and the model optimizes from there.
It is an elegant model. It also describes an investor who doesn't quite exist in committee rooms during drawdowns or rallies.
In August 2017, CalPERS, the largest US public pension at over $400B under management, allocated to Universa Investments and LongTail Alpha as tail-risk hedges. The programs ran for two years during a bull market.
In October 2019, CalPERS terminated both programs. The hedges were a visible drag during a rally. The cost was small but it showed up negative in every quarterly review. The benefit, protection against something that hadn't happened, was invisible.
By January 2020 the positions were fully unwound. On February 19, the S&P peaked. Thirty-three calendar days later it had fallen 34%. CalPERS' estimated missed payout: over $1 billion.
The cost of the hedge was probably a handful of basis points. The cost of cancelling it was the largest missed windfall in public pension history.
And this is hardly unique. It follows the familiar failure pattern we've called the quiet tax. Allocate with conviction, bleed through quiet months, field uncomfortable questions in quarterly reviews, cancel the program, watch the tail event arrive. Benartzi and Thaler documented the underlying mechanism in 1993: myopic loss aversion. Evaluate a tail hedge on a quarterly horizon and it always looks like a waste. Evaluate it over a full cycle and it transforms the portfolio. Most governance structures evaluate quarterly. Gneezy & Potters (1997) confirmed it experimentally: subjects who evaluated their portfolios more frequently took less risk and earned lower returns, even when the underlying opportunities were identical.
Goyal & Wahal (2008) showed the same pattern at institutional scale: across hiring and firing decisions by roughly 3,400 plan sponsors, the managers they fired subsequently outperformed the managers they hired. Committees systematically buy high and sell low. The pressure to act on recent performance is structural, not a character flaw.
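The quarterly-versus-full-cycle gap described above can be shown with a toy scorecard. The hedge sleeve below is entirely hypothetical: it bleeds 10 bps of portfolio value per quarter for eleven quarters and pays 400 bps in one. Returns are summed rather than compounded to keep the arithmetic transparent.

```python
def horizon_scorecard(quarterly_pnl_bps: list[float]) -> dict:
    """Compare what a quarterly reviewer sees with what a full-cycle reviewer sees."""
    negative_quarters = sum(1 for pnl in quarterly_pnl_bps if pnl < 0)
    return {
        "share_of_negative_quarters": negative_quarters / len(quarterly_pnl_bps),
        "cumulative_bps": sum(quarterly_pnl_bps),
    }


# Hypothetical three-year sleeve: eleven quiet quarters, one tail event.
sleeve = [-10.0] * 11 + [400.0]
scorecard = horizon_scorecard(sleeve)

print(scorecard)
```

A committee that reviews this sleeve quarterly sees a loss in eleven of twelve meetings, while a committee that reviews it over the cycle sees a positive contribution. The cash flows are identical, and only the evaluation horizon differs.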
An optimal hedge allocation modeled with CRRA utility is a strong starting point. Keeping the program in place through a raging bull market and exuberant investor appetite, which is probably exactly when one needs it most, is a separate problem, and it lives in governance design. In our experience, governance is often the primary failure mode, more so than the math, strike selection, or headline cost. That's why, in our view, deep understanding of the strategy, with all its nuances, is paramount to actually running it successfully long-term.
Reliability
Goldman's central metric, "reliability," is defined as the correlation between a strategy's returns and those of an ideal hedge. They cite PPUT at 40% and their own blended strategies at 20-70%.
That is a wide range, and unpacking what drives it is essentially the implementation question. Verdad's work gives a useful floor for options-based programs: passive put-buying, the simplest possible implementation, couldn't simultaneously protect against tail events and keep costs survivable. They tried. Their conclusion was that you need either structural complexity or active management to get there. For options-based hedges, that reads like the low end of the range. The high end likely reflects a materially different operating stack, with stronger implementation choices and management.
In practice, reliability isn't a parameter you set in isolation. It emerges from how well the five decisions above work together: strike, entry gates, monetization, rehedge, reinvest. Two programs with identical "50% reliability" can behave completely differently depending on how they got there. A move from 20% to 70% reliability often comes from shifting from a passive allocation toward a managed program with real implementation expertise.
Goldman's framework uses reliability as an input. We have found it equally useful as a scoreboard for implementation outcomes — both readings are valuable, and they reinforce each other.
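Reading reliability as a scoreboard means measuring it after the fact. The sketch below computes a Pearson correlation between a hedge sleeve's returns and an "ideal hedge" series. The ideal-hedge construction (pay the equity loss beyond a 5% deductible, zero otherwise) is our own assumption for illustration and may differ from Goldman's specification. All return series are made up, and with so few observations the resulting numbers are not comparable to the 20-70% range quoted above.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ideal_hedge(equity_returns: list[float], deductible: float = 0.05) -> list[float]:
    """Hypothetical ideal hedge: pays the equity loss beyond the deductible."""
    return [max(-r - deductible, 0.0) for r in equity_returns]


# Made-up period returns for equity and two hedge sleeves.
equity = [0.02, -0.01, 0.03, -0.12, 0.01, -0.08, 0.04, -0.20, 0.05, 0.00]
target = ideal_hedge(equity)

# The passive sleeve only pays in the largest drawdown; the managed sleeve
# pays in all three, at lower carry. Both series are illustrative.
passive = [-0.004, -0.004, -0.004, -0.004, -0.004, -0.004, -0.004, 0.06, -0.004, -0.004]
managed = [-0.002, -0.002, -0.002, 0.05, -0.002, 0.01, -0.002, 0.12, -0.002, -0.002]

print("passive reliability:", round(pearson(passive, target), 3))
print("managed reliability:", round(pearson(managed, target), 3))
```

The point of the exercise is the direction of the comparison rather than the levels: the same metric can be scored on realized program returns, so each implementation decision leaves a trace in it.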
Where that leaves us
Goldman's thesis matches what we've seen in our own research: tail hedging is a permission structure that enables risk-taking, and the two-regime framing of defense versus offense is the right mental model. That part of the conversation stands on its own, and the paper is worth reading on its merits.
The complementary layer is the five-decision implementation stack: strike, entry gates, monetization, rehedge, reinvest. Plus the governance scaffolding that holds the program in place during the quiet years when everyone wants to cancel it. That is the layer where tail hedge programs actually live or die.
The philosophy is public, and useful. The engineering is harder to publish, because every program's implementation is a function of its specific mandate, governance, and counterparty relationships. That is why most published work stops at the philosophy layer, and why the implementation layer tends to live inside individual programs.
If you're evaluating or running a tail hedge program and these questions sound familiar — strike calibration, entry discipline, monetization triggers, rehedge cadence, reinvestment logic, and the governance to hold them all in place across cycles — that's what we do.
Philosophical note for veriolab.com. Educational only. Not investment advice. Verio Labs provides modeling, analytics, and evaluation. We do not manage assets or give trade recommendations. See our Disclosures.