We introduce sparse feature networks (SFNets), which contain a simple top-k sparsity constraint in their penultimate layers. We show that these SFNets can predict galaxy properties, such as gas metallicity or BPT line ratios, directly from image cutouts. SFNets produce interpretable feature activations, which can then be studied to better understand galaxy formation and evolution.
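To make the top-k constraint concrete, here is a minimal sketch in plain Python. It assumes the constraint keeps the k largest activations of a feature vector and zeroes the rest. The function name `top_k_sparsify` is hypothetical and is not taken from this repository. In a PyTorch model the same selection would more likely be done on batched tensors, for example with `torch.topk`.

```python
from typing import List, Sequence


def top_k_sparsify(activations: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations and set all others to zero."""
    if not 0 <= k <= len(activations):
        raise ValueError("k must be between 0 and the number of activations")
    # Indices of the k largest values; ties go to the earlier position.
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i],
                    reverse=True)
    keep = set(ranked[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


# Illustrative feature vector (made-up numbers, not model output).
features = [0.1, 2.5, 0.0, 1.7, 0.3]
sparse = top_k_sparsify(features, k=2)
print(sparse)  # [0.0, 2.5, 0.0, 1.7, 0.0]
```

With k=2, only two features are nonzero for any given galaxy, which is what makes the individual activations easy to inspect.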
This software uses fastai, built atop pytorch, and a few other packages that are commonly found in the data science stack. We've tested that this code works using fastai==2.7.17 and torch==2.4.1 on both Linux and macOS.
Install requirements with:
pip install torch fastai numpy pandas matplotlib cmasher tqdm
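After installing, it can be useful to compare the installed versions against the tested ones. The following is a small sketch using only the standard library. The helper name `installed_versions` is hypothetical, and a local build suffix (such as a CUDA tag on torch) would be reported as a difference even though it may be harmless.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Versions this README reports as tested.
TESTED_VERSIONS = {"fastai": "2.7.17", "torch": "2.4.1"}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(TESTED_VERSIONS).items():
    tested = TESTED_VERSIONS[name]
    if installed is None:
        status = "not installed"
    elif installed == tested:
        status = "matches the tested version"
    else:
        status = f"differs from the tested version {tested}"
    print(f"{name}: {installed} ({status})")
```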
./
├── data/
│   ├── images-sdss/
│   └── galaxies.csv
├── model/
├── results/
└── src/
    ├── config.py
    ├── dataloader.py
    ├── model.py
    ├── main.py
    └── trainer.py
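A quick way to confirm that a checkout matches this layout is to test for each expected path. The sketch below assumes it is run from the repository root. The helper `missing_paths` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# Paths taken from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "data/images-sdss",
    "data/galaxies.csv",
    "model",
    "results",
    "src/config.py",
    "src/dataloader.py",
    "src/model.py",
    "src/main.py",
    "src/trainer.py",
]


def missing_paths(root: Path) -> List[str]:
    """Return the expected paths that do not exist under root."""
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


missing = missing_paths(Path("."))
if missing:
    print("Missing paths:", ", ".join(missing))
else:
    print("Directory layout looks complete.")
```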
- Prepare your data:
  - For convenience, the data can all be obtained via Zenodo. Simply download images-sdss.tar.gz and unpack it (tar xzf images-sdss.tar.gz), and also download galaxies.csv.
  - Alternatively, you can obtain the data directly from the source:
    - Construct galaxies.csv with the required columns (objID, oh_p50 for metallicity, or line flux measurements for BPT analysis). We used CASJobs to download galaxies using this query, and then enforced a signal-to-noise ratio (SNR) cut of 3 for all spectral lines.
    - Download SDSS galaxy images into data/images-sdss/. We used the DESI Legacy Viewer to download via the RESTful interface, e.g. http://legacysurvey.org/viewer/cutout.jpg?ra={ra}&dec={dec}&pixscale=0.262&layer=sdss&size=160
- Run experiments:
  - Modify and run the main script:
    python main.py
from pathlib import Path  # needed for model_dir and results_dir

from config import ExperimentConfig, DataConfig, TrainingConfig
from trainer import ModelTrainer

# Configure a metallicity experiment; k sets the top-k sparsity
config = ExperimentConfig(
    name="metallicity_experiments",
    target="metallicity",
    k=2,
    model_dir=Path("../model"),
    results_dir=Path("../results"),
    data_config=DataConfig(),
    training_config=TrainingConfig()
)
# Train models
trainer = ModelTrainer(config)
trainer.train_model()
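For anyone building the dataset from source rather than from Zenodo, the two manual data preparation steps can be sketched as follows. The endpoint and default parameters are copied from the example cutout URL in the data preparation step. The function names, and the reading of the SNR cut as flux divided by its uncertainty being at least 3 for every line, are assumptions made for illustration. Fetching and saving the images is left out because the file naming scheme is not specified here.

```python
from typing import Sequence
from urllib.parse import urlencode

# Endpoint taken from the example URL in the data preparation step.
CUTOUT_ENDPOINT = "http://legacysurvey.org/viewer/cutout.jpg"


def cutout_url(ra: float, dec: float, pixscale: float = 0.262,
               layer: str = "sdss", size: int = 160) -> str:
    """Build a Legacy Viewer cutout URL for one sky position in degrees."""
    query = urlencode({"ra": ra, "dec": dec, "pixscale": pixscale,
                       "layer": layer, "size": size})
    return f"{CUTOUT_ENDPOINT}?{query}"


def passes_snr_cut(fluxes: Sequence[float], flux_errors: Sequence[float],
                   min_snr: float = 3.0) -> bool:
    """Return True if every line has flux / error >= min_snr (assumed cut)."""
    return all(err > 0 and flux / err >= min_snr
               for flux, err in zip(fluxes, flux_errors))


# Illustrative coordinates, not a galaxy from the catalog.
print(cutout_url(150.0, 2.2))
# http://legacysurvey.org/viewer/cutout.jpg?ra=150.0&dec=2.2&pixscale=0.262&layer=sdss&size=160
```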
The trained model weights can also be found on Zenodo.
Additionally, we have uploaded our trained model weights and sparse activation results. The optimized ResNetTopK18 models should be able to reproduce the results shown in the paper.
This paper can be found on arXiv. For now, please use the following citation:
@ARTICLE{2025arXiv250100089W,
author = {{Wu}, John F.},
title = {Insights on Galaxy Evolution from Interpretable Sparse Feature Networks},
journal = {arXiv e-prints},
keywords = {Astrophysics - Astrophysics of Galaxies, Computer Science - Machine Learning},
year = 2024,
month = dec,
eid = {arXiv:2501.00089},
pages = {arXiv:2501.00089},
doi = {10.48550/arXiv.2501.00089},
archivePrefix = {arXiv},
eprint = {2501.00089},
primaryClass = {astro-ph.GA},
adsurl = {https://ui.adsabs.harvard.edu/abs/2025arXiv250100089W},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
This project is licensed under the MIT License; please see the LICENSE file for details.