Discrete-time survival (family = "logit-hazard")
hapc.hazard_hapc() fits a discrete-time logistic hazard model with
HAPC from right-censored survival data (X, T, Delta). It performs the
person-period expansion, prepends the visit time as the first HAL covariate, and
cross-validates the binomial fit. The full statistical derivation (model,
person-period likelihood, survival function) and references are in the function
docstring below.
The same routine is reachable from the cross-validation dispatcher as
cv_hapc(X, T, family="logit-hazard", Delta=Delta, norm="1").
- hapc.hazard_hapc(X: ndarray, T: ndarray, Delta: ndarray, norm: str = '1', max_degree: int = 1, npcs: int | None = None, time_grid: ndarray | None = None, log_lambda_min: float = -4, log_lambda_max: float = -1, grid_length: int = 15, nfolds: int = 5, predict: ndarray | None = None, center: bool = True, verbose: bool = False, max_iter: int = 5000, tol: float = 0.001, step_factor: float = 0.8) → HazardResult
Discrete-time logistic hazard HAPC fit (family="logit-hazard").
Convenience wrapper around hapc.cv_hapc() with family="binomial". The right-censored survival data (X, T, Delta) are expanded into a person-period table (one row per subject per at-risk interval) whose binary response is the discrete hazard indicator, the visit time is prepended as the first HAL covariate, and the regularisation parameter lambda is chosen by cross-validated logistic deviance. R counterpart: hapc::hazard.hapc().
- Parameters:
X (np.ndarray, shape (n, p)) – Baseline covariates, one row per subject.
T (np.ndarray, shape (n,)) – Observed times T_i = min(T_event_i, C_i). Assumed discrete.
Delta (np.ndarray, shape (n,)) – Event indicators Delta_i in {0, 1} (1 = event, 0 = right-censored).
norm ({"1", "2"}, default "1") – "1" = logistic LASSO, "2" = logistic ridge. "sv" raises NotImplementedError.
max_degree (int, default 1) – HAL interaction order over [time, X].
npcs (int, optional) – Number of principal components. Defaults to the number of person-period rows (capped internally as in hapc.cv_hapc()).
time_grid (np.ndarray, optional) – Discrete time grid (risk-set grid). Defaults to min(T):max(T) when T is integer-valued, else np.unique(T). Subjects are assumed at risk from min(time_grid) onward.
log_lambda_min (float / int, default -4) – Lower end of the log-λ CV grid.
log_lambda_max (float / int, default -1) – Upper end of the log-λ CV grid.
grid_length (int, default 15) – Number of points in the log-λ CV grid.
nfolds (int, default 5) – Number of CV folds.
predict (np.ndarray, optional, shape (m, p)) – Baseline covariates for new subjects. When supplied, the model (refit at the CV-selected λ) is evaluated on the full time grid for each new subject, returning the hazard surface and the implied survival curves.
center – Passed through to hapc.cv_hapc() / hapc.hapc().
verbose – Passed through to hapc.cv_hapc() / hapc.hapc().
max_iter – Passed through to hapc.cv_hapc() / hapc.hapc().
tol – Passed through to hapc.cv_hapc() / hapc.hapc().
step_factor – Passed through to hapc.cv_hapc() / hapc.hapc().
- Returns:
HazardResult – See HazardResult. hazard holds the cross-validated discrete hazard for each person-period row; predict_hazard / predict_survival are populated when predict is supplied.
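The relationship between the log-λ grid parameters and the selected λ can be sketched as follows. This is only an illustration: the helper select_lambda and the risk values are hypothetical, and the grid is assumed to be evenly spaced on the natural-log scale, which this page does not confirm.

```python
import numpy as np

def select_lambda(lambdas, risk):
    """Return the λ with the smallest CV risk and whether it is an interior grid point."""
    lambdas = np.asarray(lambdas, dtype=float)
    risk = np.asarray(risk, dtype=float)
    best = int(np.argmin(risk))
    # Interior means: not the first and not the last grid point.
    interior = 0 < best < len(lambdas) - 1
    return float(lambdas[best]), interior

# Assumption: grid evenly spaced in natural-log λ, using the documented defaults.
lambdas = np.exp(np.linspace(-4, -1, 15))

# Hypothetical U-shaped risk curve with its minimum at grid index 6.
risk = (np.arange(15) - 6.0) ** 2 / 100 + 0.5
best_lambda, interior = select_lambda(lambdas, risk)

# A monotonically decreasing risk curve puts the minimum on the boundary.
_, interior_edge = select_lambda(lambdas, np.linspace(1.0, 0.5, 15))
```

When the minimum falls on a grid endpoint, widening log_lambda_min / log_lambda_max in that direction is the usual remedy.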
Notes
Model. The discrete hazard is the conditional event probability in interval t given survival up to t,
lambda(t | x) = P(T_event = t | T_event >= t, X = x),
modelled on the logit scale by a Highly Adaptive Principal Components fit f of the augmented covariate (t, x):
logit lambda(t | x) = f(t, x).
The HAL basis spans indicator (and, for max_degree > 1, interaction) tensor products in (t, x), so the time effect, the covariate effects and their interactions are estimated nonparametrically; the L1/L2 penalty (norm) controls smoothness and is tuned by cross-validation.
Person-period likelihood. Under independent right-censoring the observed-data likelihood factorises over the at-risk intervals,
prod_i prod_{t <= T_i} lambda(t | x_i) ** Y_it * (1 - lambda(t | x_i)) ** (1 - Y_it),
with Y_it = 1(T_event_i = t). This is exactly the Bernoulli (logistic) likelihood of the expanded person-period table, so fitting a binomial HAPC model to Y_it against (t, x_i) estimates the discrete hazard (Cox 1972; Brown 1975; Allison 1982).
Survival. The conditional survival function follows from the estimated hazard by the product-limit relation
S(t | x) = P(T_event > t | x) = prod_{s <= t} (1 - lambda(s | x)),
returned in predict_survival for new subjects when predict is given.
References
Cox, D. R. (1972). Regression models and life-tables. JRSS B, 34(2), 187-220.
Brown, C. C. (1975). On the use of indicator variables for studying the time-dependence of parameters in a response-time model. Biometrics, 31(4), 863-872.
Allison, P. D. (1982). Discrete-time methods for the analysis of event histories. Sociological Methodology, 13, 61-98.
Singer, J. D. and Willett, J. B. (2003). Applied Longitudinal Data Analysis. Oxford University Press.
Benkeser, D. and van der Laan, M. (2016). The Highly Adaptive Lasso estimator. IEEE DSAA, 689-696.
Examples
>>> import numpy as np
>>> from hapc import hazard_hapc
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> X = np.column_stack([rng.uniform(size=n), rng.integers(0, 2, n)]).astype(float)
>>> grid = np.arange(1, 7)
>>> def haz(t, x): return 1 / (1 + np.exp(-(-2.6 + 0.3 * t + 1.3 * x[0] - 0.9 * x[1])))
>>> Tev = np.full(n, grid.max())
>>> for i in range(n):
...     for t in grid:
...         if rng.random() < haz(t, X[i]):
...             Tev[i] = t
...             break
>>> C = rng.choice(grid, n)
>>> Tobs = np.minimum(Tev, C); Delta = (Tev <= C).astype(float)
>>> fit = hazard_hapc(X, Tobs, Delta, norm="1", max_degree=2, time_grid=grid)
>>> bool(fit.best_lambda > 0)
True
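The person-period expansion described above can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions: person_period is a hypothetical helper, not part of hapc, and the library's internal expansion may differ in details such as ordering or dtype.

```python
import numpy as np

def person_period(T, Delta, time_grid):
    """Expand (T, Delta) into person-period rows: (ids, times, Y)."""
    T = np.asarray(T)
    Delta = np.asarray(Delta)
    time_grid = np.asarray(time_grid)
    ids, times, Y = [], [], []
    for i in range(len(T)):
        # Subject i is at risk in every grid interval up to its observed time.
        at_risk = time_grid[time_grid <= T[i]]
        ids.extend([i] * len(at_risk))
        times.extend(at_risk.tolist())
        # Y_it = 1 only in the last at-risk interval, and only for an observed event.
        Y.extend([int(Delta[i] == 1 and t == T[i]) for t in at_risk])
    return np.array(ids), np.array(times), np.array(Y)

# Three subjects: event at t=3, censored at t=2, event at t=1.
T = np.array([3, 2, 1])
Delta = np.array([1, 0, 1])
X = np.array([[0.1], [0.5], [0.9]])

ids, times, Y = person_period(T, Delta, np.arange(1, 4))

# Prepend the visit time as the first covariate, as the docstring describes.
design = np.column_stack([times, X[ids]])
```

Here the six rows carry times [1, 2, 3, 1, 2, 1] and labels [0, 0, 1, 0, 0, 1]; a binomial fit of Y on design then targets the discrete hazard.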
Result type
The field-by-field description is rendered from the class docstring.
- class hapc.HazardResult(hazard: ndarray, ids: ndarray, times: ndarray, Y: ndarray, time_grid: ndarray, lambdas: ndarray, risk: ndarray, best_lambda: float, interior: bool, cv: CVResult, predict_hazard: ndarray | None, predict_survival: ndarray | None)
Output of hazard_hapc().
- hazard
Estimated discrete hazard for each person-period row (aligned with ids / times); the cross-validated predictions at the winning λ.
- Type:
np.ndarray, shape (N,)
- ids
Subject index (0-based) for each person-period row.
- Type:
np.ndarray, shape (N,)
- times
Grid time for each person-period row.
- Type:
np.ndarray, shape (N,)
- Y
Binary hazard label for each person-period row.
- Type:
np.ndarray, shape (N,)
- time_grid
The discrete time grid used.
- Type:
np.ndarray, shape (K,)
- lambdas
CV λ grid.
- Type:
np.ndarray, shape (L,)
- risk
Mean cross-validated logistic deviance per λ.
- Type:
np.ndarray, shape (L,)
- interior
True when best_lambda is strictly inside the grid (not at either endpoint); a basic check that the grid brackets the optimum.
- Type:
bool
- cv
The full underlying hapc.cv_hapc() result.
- Type:
CVResult
- predict_hazard
Hazard surface for new subjects (only when predict is supplied).
- Type:
np.ndarray or None, shape (m, K)
- predict_survival
Survival curves S(t|x) = prod_{g<=t}(1 - hazard(g|x)) for new subjects (only when predict is supplied).
- Type:
np.ndarray or None, shape (m, K)
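The product-limit relation behind predict_survival amounts to a row-wise cumulative product over the time grid. The sketch below uses a hypothetical helper, survival_from_hazard, and made-up hazard values; it illustrates the formula rather than the library's own code.

```python
import numpy as np

def survival_from_hazard(hazard):
    """S(t_k | x) = prod_{g <= t_k} (1 - hazard(g | x)), computed per row."""
    hazard = np.asarray(hazard, dtype=float)
    # Axis 1 indexes the time grid, so the cumulative product runs over time.
    return np.cumprod(1.0 - hazard, axis=1)

# Hypothetical hazard surface of shape (m, K) = (2, 3).
predict_hazard = np.array([[0.1, 0.2, 0.5],
                           [0.0, 0.0, 1.0]])
predict_survival = survival_from_hazard(predict_hazard)
```

For the first subject this gives 0.9, 0.72 and 0.36; each curve is non-increasing in t because every factor lies in [0, 1].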