Discrete-time survival (family = "logit-hazard")

hapc.hazard_hapc() fits a discrete-time logistic hazard model with HAPC from right-censored survival data (X, T, Delta). It performs the person-period expansion, prepends the visit time as the first HAL covariate, and cross-validates the binomial fit. The full statistical derivation (model, person-period likelihood, survival function) and references are in the function docstring below.

The same routine is reachable from the cross-validation dispatcher as cv_hapc(X, T, family="logit-hazard", Delta=Delta, norm="1").

hapc.hazard_hapc(X: ndarray, T: ndarray, Delta: ndarray, norm: str = '1', max_degree: int = 1, npcs: int | None = None, time_grid: ndarray | None = None, log_lambda_min: float = -4, log_lambda_max: float = -1, grid_length: int = 15, nfolds: int = 5, predict: ndarray | None = None, center: bool = True, verbose: bool = False, max_iter: int = 5000, tol: float = 0.001, step_factor: float = 0.8) → HazardResult

Discrete-time logistic hazard HAPC fit (family="logit-hazard").

Convenience wrapper around hapc.cv_hapc() with family="binomial". The right-censored survival data (X, T, Delta) are expanded into a person-period table (one row per subject-per-interval-at-risk) whose binary response is the discrete hazard indicator, the visit time is prepended as the first HAL covariate, and the regularisation parameter lambda is chosen by cross-validated logistic deviance. R counterpart: hapc::hazard.hapc().
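The expansion described above can be sketched in plain NumPy. This is only an illustration of the idea, not the library's code: the helper name person_period and its return layout are hypothetical, and hapc's internal implementation may differ.

```python
import numpy as np

def person_period(X, T, Delta, time_grid):
    """Hypothetical helper: one row per subject and grid time at which the subject is at risk."""
    X = np.asarray(X, dtype=float)
    T = np.asarray(T)
    Delta = np.asarray(Delta)
    time_grid = np.asarray(time_grid)
    ids, times, Y = [], [], []
    for i in range(len(T)):
        at_risk = time_grid[time_grid <= T[i]]      # grid times up to the observed time
        ids.extend([i] * len(at_risk))
        times.extend(at_risk)
        # The label is 1 only in the last interval of a subject with an observed event.
        Y.extend(((at_risk == T[i]) & (Delta[i] == 1)).astype(float))
    ids = np.asarray(ids)
    times = np.asarray(times, dtype=float)
    Y = np.asarray(Y)
    design = np.column_stack([times, X[ids]])       # time prepended as the first covariate
    return design, Y, ids

# Two subjects: an event at t = 3 and a censoring at t = 2.
X = np.array([[0.2, 1.0], [0.7, 0.0]])
T = np.array([3, 2])
Delta = np.array([1, 0])
design, Y, ids = person_period(X, T, Delta, time_grid=np.arange(1, 5))
print(design)
print(Y)
```

The first subject contributes three rows with a single 1 in the last one; the censored subject contributes two rows of zeros.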

Parameters:
  • X (np.ndarray, shape (n, p)) – Baseline covariates, one row per subject.

  • T (np.ndarray, shape (n,)) – Observed times T_i = min(T_event_i, C_i). Assumed discrete.

  • Delta (np.ndarray, shape (n,)) – Event indicators Delta_i in {0,1} (1 = event, 0 = right-censored).

  • norm ({"1", "2"}, default "1") – "1" = logistic LASSO, "2" = logistic ridge. "sv" raises NotImplementedError.

  • max_degree (int, default 1) – HAL interaction order over [time, X].

  • npcs (int, optional) – Number of principal components. Defaults to the number of person-period rows (capped internally as in hapc.cv_hapc()).

  • time_grid (np.ndarray, optional) – Discrete time grid (risk-set grid). Defaults to the consecutive integers from min(T) to max(T) when T is integer-valued, else np.unique(T). Subjects are assumed at risk from min(time_grid) onward.

  • log_lambda_min (float, default -4) – Lower end of the log-λ grid searched by cross-validation.

  • log_lambda_max (float, default -1) – Upper end of the log-λ grid searched by cross-validation.

  • grid_length (int, default 15) – Number of λ values in the cross-validation grid.

  • nfolds (int, default 5) – Number of CV folds.

  • predict (np.ndarray, optional, shape (m, p)) – Baseline covariates for new subjects. When supplied the model (refit at the CV-selected λ) is evaluated on the full time grid for each new subject, returning the hazard surface and the implied survival curves.

  • center (bool, default True) – Passed through to hapc.cv_hapc() / hapc.hapc().

  • verbose (bool, default False) – Passed through to hapc.cv_hapc() / hapc.hapc().

  • max_iter (int, default 5000) – Passed through to hapc.cv_hapc() / hapc.hapc().

  • tol (float, default 0.001) – Passed through to hapc.cv_hapc() / hapc.hapc().

  • step_factor (float, default 0.8) – Passed through to hapc.cv_hapc() / hapc.hapc().
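As a rough illustration of the documented time_grid default, the rule can be written as follows. The helper default_time_grid is hypothetical, and treating max(T) as an inclusive endpoint is an assumption; the library may resolve edge cases differently.

```python
import numpy as np

def default_time_grid(T):
    """Hypothetical stand-in for the documented time_grid default."""
    T = np.asarray(T)
    if np.all(T == np.round(T)):
        # Integer-valued times: every integer from min(T) to max(T), assumed inclusive.
        return np.arange(int(T.min()), int(T.max()) + 1)
    # Otherwise: the sorted distinct observed times.
    return np.unique(T)

grid_int = default_time_grid([2, 5, 3, 5])             # integer-valued input
grid_real = default_time_grid([0.5, 2.0, 0.5, 1.25])   # non-integer input
print(grid_int, grid_real)
```

Note that in the integer case the grid includes times at which nobody was observed (here t = 4), whereas np.unique keeps only observed times.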

Returns:

HazardResult – See HazardResult. hazard holds the cross-validated discrete hazard for each person-period row; predict_hazard / predict_survival are populated when predict is supplied.

Notes

Model. The discrete hazard is the conditional probability of an event in interval t given that no event has occurred before t,

lambda(t | x) = P(T_event = t | T_event >= t, X = x),

modelled on the logit scale by a Highly Adaptive Principal Components fit f of the augmented covariate (t, x):

logit lambda(t | x) = f(t, x).

The HAL basis spans indicator (and, for max_degree > 1, interaction) tensor products in (t, x), so the time effect, the covariate effects and their interactions are estimated nonparametrically; the L1/L2 penalty (norm) controls smoothness and is tuned by cross-validation.
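To make the indicator basis concrete, here is a minimal sketch of a zero-order HAL-style design in the sense of Benkeser and van der Laan (2016), built on the augmented covariate (t, x). The function hal_basis and its column ordering are illustrative assumptions; hapc additionally works with principal components of such a basis, and its internal construction may differ.

```python
from itertools import combinations

import numpy as np

def hal_basis(Z, knots, max_degree=1):
    """Illustrative basis: for each covariate subset s and knot k,
    the column is prod_{j in s} 1(Z[:, j] >= knots[k, j])."""
    Z = np.asarray(Z, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # ind[r, k, j] = 1(Z[r, j] >= knots[k, j])
    ind = Z[:, None, :] >= knots[None, :, :]
    blocks = []
    for degree in range(1, max_degree + 1):
        for subset in combinations(range(Z.shape[1]), degree):
            # A product of indicators is a logical AND over the chosen covariates.
            blocks.append(ind[:, :, list(subset)].all(axis=2))
    return np.hstack(blocks).astype(float)

# Three person-period rows of (t, x); the rows themselves serve as knots.
Z = np.array([[1.0, 0.2],
              [2.0, 0.2],
              [1.0, 0.7]])
B = hal_basis(Z, Z, max_degree=2)
print(B.shape)   # 3 knots times (2 main terms + 1 interaction) columns
```

With max_degree=1 only the main-term blocks for t and x are produced; max_degree=2 adds the (t, x) interaction block.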

Person-period likelihood. Under independent right-censoring the observed-data likelihood factorises over the at-risk intervals,

prod_i prod_{t <= T_i} lambda(t | x_i) ** Y_it * (1 - lambda(t | x_i)) ** (1 - Y_it),

with Y_it = 1(T_event_i = t), which for t <= T_i equals 1(T_i = t, Delta_i = 1) in terms of the observed data. This is exactly the Bernoulli (logistic) likelihood of the expanded person-period table, so fitting a binomial HAPC model to Y_it against (t, x_i) estimates the discrete hazard (Cox 1972; Brown 1975; Allison 1982).
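The equivalence can be checked numerically for a single subject. The sketch below uses made-up hazard values and two hypothetical helper functions; it compares the Bernoulli log-likelihood summed over person-period rows with the usual survival log-likelihood (event: lambda(T) times the product of 1 - lambda(s) over s < T; censored: the product of 1 - lambda(s) over s <= T).

```python
import numpy as np

time_grid = np.arange(1, 5)
hazard = np.array([0.10, 0.20, 0.15, 0.30])   # made-up lambda(t | x) on the grid

def person_period_loglik(T, delta, hazard, time_grid):
    """Bernoulli log-likelihood over the rows t <= T of one subject."""
    at_risk = time_grid <= T
    y = ((time_grid == T) & (delta == 1)).astype(float)[at_risk]
    lam = hazard[at_risk]
    return np.sum(y * np.log(lam) + (1 - y) * np.log1p(-lam))

def survival_loglik(T, delta, hazard, time_grid):
    """Log-likelihood written directly in terms of the event or censoring time."""
    lam_T = hazard[time_grid == T][0]
    survived_before = np.sum(np.log1p(-hazard[time_grid < T]))
    last = np.log(lam_T) if delta == 1 else np.log1p(-lam_T)
    return survived_before + last

event_pp = person_period_loglik(3, 1, hazard, time_grid)
event_sv = survival_loglik(3, 1, hazard, time_grid)
cens_pp = person_period_loglik(3, 0, hazard, time_grid)
cens_sv = survival_loglik(3, 0, hazard, time_grid)
print(np.exp(event_pp), np.exp(cens_pp))
```

For an event at t = 3 both forms give 0.9 * 0.8 * 0.15, and for censoring at t = 3 both give 0.9 * 0.8 * 0.85.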

Survival. The conditional survival function follows from the estimated hazard by the product-limit relation

S(t | x) = P(T_event > t | x) = prod_{s <= t} (1 - lambda(s | x)),

returned in predict_survival for new subjects when predict is given.
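The product-limit relation is a cumulative product along the time axis. The sketch below applies it to a made-up hazard surface of shape (m, K); it does not call hapc, and the array names are illustrative only.

```python
import numpy as np

# Made-up hazard surface: 2 new subjects (rows) by 4 grid times (columns).
hazard_surface = np.array([[0.10, 0.20, 0.15, 0.30],
                           [0.05, 0.05, 0.10, 0.10]])

# S(t | x) = prod_{s <= t} (1 - lambda(s | x)), computed row by row.
survival = np.cumprod(1.0 - hazard_surface, axis=1)

# Implied event probabilities P(T_event = t | x) = lambda(t | x) * S(t - 1 | x),
# using S = 1 before the first grid time.
survival_prev = np.column_stack([np.ones(len(survival)), survival[:, :-1]])
event_prob = hazard_surface * survival_prev

print(survival)
print(event_prob.sum(axis=1) + survival[:, -1])   # each row sums to 1
```

Each survival curve is non-increasing, and the event probabilities together with the final survival probability account for all of the mass.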

References

Cox, D. R. (1972). Regression models and life-tables. JRSS B, 34(2), 187-220.

Brown, C. C. (1975). On the use of indicator variables for studying the time-dependence of parameters in a response-time model. Biometrics, 31(4), 863-872.

Allison, P. D. (1982). Discrete-time methods for the analysis of event histories. Sociological Methodology, 13, 61-98.

Singer, J. D. and Willett, J. B. (2003). Applied Longitudinal Data Analysis. Oxford University Press.

Benkeser, D. and van der Laan, M. (2016). The Highly Adaptive Lasso estimator. IEEE DSAA, 689-696.

Examples

>>> import numpy as np
>>> from hapc import hazard_hapc
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> X = np.column_stack([rng.uniform(size=n), rng.integers(0, 2, n)]).astype(float)
>>> grid = np.arange(1, 7)
>>> def haz(t, x): return 1 / (1 + np.exp(-(-2.6 + 0.3 * t + 1.3 * x[0] - 0.9 * x[1])))
>>> # Start one step past the grid so subjects with no simulated event end up censored.
>>> Tev = np.full(n, grid.max() + 1)
>>> for i in range(n):
...     for t in grid:
...         if rng.random() < haz(t, X[i]):
...             Tev[i] = t; break
>>> C = rng.choice(grid, n)
>>> Tobs = np.minimum(Tev, C); Delta = (Tev <= C).astype(float)
>>> fit = hazard_hapc(X, Tobs, Delta, norm="1", max_degree=2, time_grid=grid)
>>> bool(fit.best_lambda > 0)
True

Result type

The field-by-field description is rendered from the class docstring.

class hapc.HazardResult(hazard: ndarray, ids: ndarray, times: ndarray, Y: ndarray, time_grid: ndarray, lambdas: ndarray, risk: ndarray, best_lambda: float, interior: bool, cv: CVResult, predict_hazard: ndarray | None, predict_survival: ndarray | None)

Output of hazard_hapc().

hazard

Estimated discrete hazard for each person-period row (aligned with ids/times); the cross-validated predictions at the winning λ.

Type:

np.ndarray, shape (N,)

ids

Subject index (0-based) for each person-period row.

Type:

np.ndarray, shape (N,)

times

Grid time for each person-period row.

Type:

np.ndarray, shape (N,)

Y

Binary hazard label for each person-period row.

Type:

np.ndarray, shape (N,)

time_grid

The discrete time grid used.

Type:

np.ndarray, shape (K,)

lambdas

CV λ grid.

Type:

np.ndarray, shape (L,)

risk

Mean cross-validated logistic deviance per λ.

Type:

np.ndarray, shape (L,)

best_lambda

Deviance-minimising λ.

Type:

float

interior

True when best_lambda is strictly inside the grid (not at either endpoint) — a basic check that the grid brackets the optimum.

Type:

bool

cv

The full underlying hapc.cv_hapc() result.

Type:

CVResult

predict_hazard

Hazard surface for new subjects (only when predict is supplied).

Type:

np.ndarray or None, shape (m, K)

predict_survival

Survival curves S(t|x) = prod_{g<=t}(1 - hazard(g|x)) for new subjects (only when predict is supplied).

Type:

np.ndarray or None, shape (m, K)
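The fields above can be combined after a fit. The sketch below uses stand-in arrays with made-up values in place of a real HazardResult, so it runs without hapc; the natural-log λ grid and the toy risk curve are assumptions for illustration only. It shows how best_lambda and interior relate to lambdas and risk, and how the person-period hazard rows collapse to one survival probability per subject.

```python
import numpy as np

# Stand-ins shaped like HazardResult fields (all values are made up).
lambdas = np.exp(np.linspace(-4, -1, 15))      # assumed natural-log grid
risk = (np.log(lambdas) + 2.5) ** 2 + 1.0      # toy deviance curve
hazard = np.array([0.10, 0.20, 0.15, 0.05, 0.05])
ids = np.array([0, 0, 0, 1, 1])

# best_lambda minimises the CV risk; interior means it is not a grid endpoint.
best = int(np.argmin(risk))
best_lambda = lambdas[best]
interior = 0 < best < len(lambdas) - 1

# Survival through each subject's last at-risk interval:
# the product of (1 - hazard) over that subject's rows, done on the log scale.
log_surv = np.bincount(ids, weights=np.log1p(-hazard))
surv_at_last = np.exp(log_surv)

print(best_lambda, interior)
print(surv_at_last)
```

If interior is False, widening log_lambda_min or log_lambda_max in the direction of the selected endpoint is the natural next step.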