How the Galaxy-Halo Connection Depends on Large-Scale Environment

Kavli Affiliate: Risa H. Wechsler

| First 5 Authors: John F. Wu, Christian Kragh Jespersen, Risa H. Wechsler

| Summary:

We investigate the connection between galaxies, dark matter halos, and their
large-scale environments at $z=0$ with Illustris TNG300 hydrodynamic simulation
data. We predict stellar masses from subhalo properties to test two types of
machine learning (ML) models: Explainable Boosting Machines (EBMs) with simple
galaxy environment features and $\mathbb{E}(3)$-invariant graph neural networks
(GNNs). The best-performing EBM models leverage spherically averaged
overdensity features on $3$ Mpc scales. Interpretations via SHapley Additive
exPlanations (SHAP) also suggest that, in the context of the TNG300 galaxy-halo
connection, simple spherical overdensity on $\sim 3$ Mpc scales is more
important than cosmic web distance features measured using the DisPerSE
algorithm. Meanwhile, a GNN with connectivity defined by a fixed linking
length, $L$, outperforms the EBM models by a significant margin. As we increase
the linking length scale, GNNs learn important environmental contributions up
to the largest scales we probe ($L=10$ Mpc). We conclude that $3$ Mpc distance
scales are most critical for describing the TNG galaxy-halo connection using
the spherical overdensity parameterization but that information on larger
scales, which is not captured by simple environmental parameters or cosmic web
features, can further augment these models. Our study highlights the benefits
of using interpretable ML algorithms to explain models of astrophysical
phenomena, and the power of using GNNs to flexibly learn complex relationships
directly from data while imposing constraints from physical symmetries.
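
To make the two environment descriptions above concrete, here is a minimal sketch of a spherically averaged overdensity feature and of graph edges defined by a fixed linking length $L$. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy box size, the random positions and masses, and the choice to count each object's own mass inside its sphere are all hypothetical. It uses brute-force pairwise distances in a periodic box, which is only practical for small samples.

```python
import numpy as np


def periodic_separations(pos, box_size):
    """Pairwise distances in a periodic cube (minimum-image convention)."""
    delta = pos[:, None, :] - pos[None, :, :]
    delta -= box_size * np.round(delta / box_size)
    return np.linalg.norm(delta, axis=-1)


def spherical_overdensity(pos, mass, box_size, radius=3.0):
    """Overdensity rho(<R) / rho_mean - 1 in a sphere around each object.

    Assumption: the enclosed mass includes the central object's own mass.
    """
    dist = periodic_separations(pos, box_size)
    inside = (dist <= radius).astype(float)
    enclosed_mass = inside @ mass
    rho_sphere = enclosed_mass / (4.0 / 3.0 * np.pi * radius**3)
    rho_mean = mass.sum() / box_size**3
    return rho_sphere / rho_mean - 1.0


def linking_length_edges(pos, box_size, linking_length):
    """Directed edge list (2, n_edges) joining objects closer than L.

    Self-loops are excluded; each undirected pair appears in both directions.
    """
    dist = periodic_separations(pos, box_size)
    adjacency = dist <= linking_length
    np.fill_diagonal(adjacency, False)
    src, dst = np.nonzero(adjacency)
    return np.stack([src, dst])


# Toy data (hypothetical): 500 objects in a 50 Mpc periodic box.
rng = np.random.default_rng(0)
box = 50.0
positions = rng.uniform(0.0, box, size=(500, 3))
masses = 10.0 ** rng.uniform(10.0, 13.0, size=500)

delta_3mpc = spherical_overdensity(positions, masses, box, radius=3.0)
edges = linking_length_edges(positions, box, linking_length=5.0)
print(delta_3mpc.shape, edges.shape)
```

In this sketch the overdensity is a single scalar per object that could be passed to a tabular model such as an EBM, while the edge list is the kind of connectivity a GNN would consume. Increasing `linking_length` adds more distant neighbors to each node, which mirrors how larger $L$ exposes larger-scale environment to the GNN.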
