How the Galaxy-Halo Connection Depends on Large-Scale Environment

Kavli Affiliate: Risa H. Wechsler

| First 5 Authors: John F. Wu, Christian Kragh Jespersen, Risa H. Wechsler

| Summary:

We investigate the connection between galaxies, dark matter halos, and their
large-scale environments with Illustris TNG300 hydrodynamic simulation data. We
predict stellar masses from subhalo properties to test two types of machine
learning (ML) models: Explainable Boosting Machines (EBMs) with simple galaxy
environment features and E$(3)$-invariant graph neural networks (GNNs). The
best-performing EBM models leverage spherically averaged overdensity features
on $3$ Mpc scales. Interpretations via SHapley Additive exPlanations (SHAP)
also suggest that, in the context of the TNG300 galaxy–halo connection, simple
spherical overdensity on $\sim 3$ Mpc scales is more important than cosmic web
distance features measured using the DisPerSE algorithm. Meanwhile, a GNN with
connectivity defined by a fixed linking length, $L$, outperforms the EBM models
by a significant margin. As we increase the linking length scale, GNNs learn
important environmental contributions up to the largest scales we probe ($L =
10$ Mpc). We conclude that $3$ Mpc distance scales are most critical for
describing the TNG galaxy–halo connection using the spherical overdensity
parameterization but that information on larger scales, which is not captured
by simple environmental parameters or cosmic web features, can further augment
these models. Our study highlights the benefits of using interpretable ML
algorithms to explain models of astrophysical phenomena, and the power of using
GNNs to flexibly learn complex relationships directly from data while imposing
constraints from physical symmetries.
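
The two environment descriptions compared above, a spherically averaged overdensity and a graph whose edges join objects closer than a linking length $L$, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy positions, the periodic box size, the choice to count each object's own mass, and the brute-force pairwise search are all assumptions made for illustration.

```python
import math
from itertools import combinations


def periodic_distance(a, b, box_size=None):
    """Euclidean distance between two 3D points, with optional periodic wrapping."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y)
        if box_size is not None:
            # Take the shorter way around the periodic box.
            d = min(d, box_size - d)
        total += d * d
    return math.sqrt(total)


def linking_length_edges(positions, linking_length, box_size=None):
    """Undirected edges (i, j), i < j, for every pair closer than linking_length."""
    return [
        (i, j)
        for i, j in combinations(range(len(positions)), 2)
        if periodic_distance(positions[i], positions[j], box_size) < linking_length
    ]


def spherical_overdensity(positions, masses, radius, box_size):
    """delta_i = (mass within radius of object i) / (mean mass in that volume) - 1.

    Assumption: the object's own mass is included in the enclosed mass.
    """
    mean_density = sum(masses) / box_size ** 3
    expected = mean_density * (4.0 / 3.0) * math.pi * radius ** 3
    deltas = []
    for p in positions:
        enclosed = sum(
            m
            for q, m in zip(positions, masses)
            if periodic_distance(p, q, box_size) <= radius
        )
        deltas.append(enclosed / expected - 1.0)
    return deltas


# Toy data (hypothetical): three clustered objects, one of them across the
# periodic boundary, plus one isolated object. Lengths are in Mpc.
positions = [(1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (19.5, 1.0, 1.0), (10.0, 10.0, 10.0)]
masses = [1.0, 1.0, 1.0, 1.0]

edges = linking_length_edges(positions, linking_length=3.0, box_size=20.0)
deltas = spherical_overdensity(positions, masses, radius=3.0, box_size=20.0)
```

In this toy setup the three clustered objects are mutually connected and share a higher overdensity than the isolated one. The overdensity compresses each neighborhood into a single number, whereas the edge list keeps the pairwise structure that a GNN can pass messages along. For simulation-sized catalogs a spatial tree or grid would replace the quadratic pair search.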

| Search Query: ArXiv Query: search_query=au:"Risa H. Wechsler"&id_list=&start=0&max_results=3