Diagnosis Structure Explains Much of the Apparent Mutation-Burden Survival Signal in Public Pediatric Brain Tumor Atlas Data
Public pediatric brain tumor datasets now make pan-histology molecular prognostic analyses possible, but that breadth creates a hard statistical problem: tumor class, molecular burden, treatment, and outcome are not exchangeable across diagnoses. We reanalyzed public summary files from OpenPBTA release v23, merging harmonized clinical annotations with coding tumor mutation burden (TMB) for 782 survival-annotated molecular profiles from a cohort of 943 enrolled participants. Hypermutated samples, defined as coding TMB at or above 10 mutations/Mb, had worse unadjusted overall survival than lower-burden samples (log-rank p=0.0036), but the hypermutated group was small (n=7) and spanned multiple diagnoses. Mutation burden also varied strongly across broad histology classes. After residualizing log-transformed mutation burden against age at diagnosis and broad histology, the association with observed death weakened (Mann-Whitney p=0.089, comparing residual burden between profiles with and without an observed death). Within the three broad histology strata large enough for sensitivity checks, median-split TMB did not show a consistent high-burden survival penalty. These results do not support mutation burden as a diagnosis-independent prognostic marker in this public summary-data setting. They instead quantify an instructive failure mode: pan-pediatric CNS tumor models can rediscover diagnosis composition unless diagnosis structure is modeled explicitly. OpenPBTA remains an unusually valuable resource, but survival modeling from public summaries should be framed as hypothesis-generating unless treatment, molecular subtype, sampling phase, and diagnosis-specific effects are incorporated.
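The analysis steps named above (a hypermutation cutoff with a log-rank comparison, residualization of log-transformed burden against age and broad histology, and a Mann-Whitney comparison of the residuals) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, the log10(TMB + 1) transform and its pseudocount are assumed because only "log-transformed" is specified, age is assumed to enter as a single linear term, and the Mann-Whitney p-value uses a tie-corrected normal approximation without continuity correction.

```python
import math
from collections import defaultdict

# Cutoff as stated: coding TMB at or above 10 mutations/Mb.
HYPERMUTATION_THRESHOLD = 10.0


def is_hypermutated(tmb_per_mb):
    """True when coding TMB meets the hypermutation cutoff."""
    return tmb_per_mb >= HYPERMUTATION_THRESHOLD


def logrank_two_group(times, events, group):
    """Two-group log-rank test; returns (chi-square, p) with 1 degree of freedom.

    times: follow-up times; events: True if death observed; group: True = group 1.
    Censored subjects stay in the risk set up to and including their own time.
    """
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [g for tt, g in zip(times, group) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, group) if tt == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of group-1 deaths at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def residualize_log_tmb(tmb, age, histology):
    """Residuals of log10(TMB + 1) after OLS on age plus histology fixed effects.

    By the Frisch-Waugh-Lovell theorem, demeaning outcome and age within each
    histology, then regressing one on the other, gives the same residuals as
    the full regression with histology dummy variables.
    """
    y = [math.log10(t + 1.0) for t in tmb]  # assumed transform and pseudocount
    members = defaultdict(list)
    for i, h in enumerate(histology):
        members[h].append(i)
    y_dm, a_dm = [0.0] * len(y), [0.0] * len(y)
    for idx in members.values():
        y_bar = sum(y[i] for i in idx) / len(idx)
        a_bar = sum(age[i] for i in idx) / len(idx)
        for i in idx:
            y_dm[i], a_dm[i] = y[i] - y_bar, age[i] - a_bar
    sxx = sum(a * a for a in a_dm)
    slope = sum(a * b for a, b in zip(a_dm, y_dm)) / sxx if sxx > 0 else 0.0
    return [b - slope * a for a, b in zip(a_dm, y_dm)]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test; returns (U for x, p).

    Normal approximation with tie correction, no continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks, tie_sum, i = [0.0] * n, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank within a tie block
        tie_sum += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    sigma = math.sqrt(max(var, 0.0))
    if sigma == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

With residuals `r` and a boolean list `died`, the residual comparison would be `mann_whitney_u([ri for ri, d in zip(r, died) if d], [ri for ri, d in zip(r, died) if not d])`, and the unadjusted comparison would pass `[is_hypermutated(t) for t in tmb]` as the `group` argument of `logrank_two_group`. Within-histology demeaning is used instead of an explicit design matrix only so the sketch needs nothing beyond the standard library.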