Skip to content

Latest commit

 

History

History
556 lines (332 loc) · 29.5 KB

README.md

File metadata and controls

556 lines (332 loc) · 29.5 KB

powspec

Codacy grade

Table of Contents

Introduction

This is a C program for computing auto and cross power spectrum multipoles P (k) given the 3-D positions of tracers. It is applicable for both simulations in periodic boxes, and survey-like data with arbitrary geometry and selection effects.

It is designed in the hope of being (both time and space) efficient, portable, and user-friendly. To this end, various operations are provided for pre-processing the data on-the-fly and in parallel, with a least number of external dependencies. Moreover, little programming knowledge is required for the usage of the code, and the user is only ask for a form-like configuration file.

This program is compliant with the ISO C99 and IEEE POSIX.1-2008 standards. Parallelisation can be enabled with OpenMP. Thus it is compatible with most of the modern C compilers and operating systems. It is written by Cheng Zhao (赵成). And as a whole, it is released under the GPLv3 license, due to its reliance on the FFTW library, though many of the source files are distributed under the MIT license, as indicated in their header lines.

If you use this tool in research that results in publications, please cite the following paper:

Zhao et al. 2020, arXiv:2007.08997

[TOC]

Algorithm

Simulation box

For simulation boxes, the data is assumed to be in a regular cuboid, with periodic boundary conditions on all directions. The side length of the box (Lbox) has then to be supplied by the user, and the 3-D coordinates of the input data (x0, x1, x2) must satisfy

0 ≤ xi < Lbox ,  i = 0, 1, 2.

Otherwise, the coordinates of the input catalogue have to be pre-processed, see Specifications of the input catalogues for details.

Particle assignment

For a cubic simulation box, tracers are distributed to a regular 3-D mesh for Fast Fourier Transform (FFT), with optionally the following grid assignment schemes [1]:

  • Nearest-Grid-Point (NGP):
    • w = 1, if d < 1/2;
    • w = 0, otherwise;
  • Could-In-Cell (CIC):
    • w = 1 − d, if d < 1;
    • w = 0, otherwise;
  • Triangular Shaped Cloud (TSC):
    • w = 3/4 − d2, if |d| < 1/2;
    • w = (3/2 − d)2/2, if 1/2 ≤ d < 1;
    • w = 0, otherwise;
  • Piecewise Cubic Spline (PCS):
    • w = (4 − 6 d2 + 3 d3)/6, if |d| < 1;
    • w = (2 − d)3/6, if 1 ≤ |d| < 2;
    • w = 0, otherwise;

where w denotes the number assigned to a grid point for each dimension, given a particle with distance component (d · H) to the grid point, with H being the side length of one cell.

The window effect induced by the particle assignment scheme is then corrected following [2].

[TOC]

Grid interlacing

To reduce the "alias" effects of particle assignments, the grid interlacing technique[1] is implemented. It generates an addition real-space density field with a shift of half the cell size (H/2) on all directions. The two densities fields are combined in Fourier space, and the combined field is then used for power spectrum evaluation.

Since the density field is real, we use the real-to-complex routines of the FFTW library for Fourier transforms, to increase both time and space efficiencies.

[TOC]

Multipole evaluation

The power spectrum is evaluated via mode counts of the Fourier space density field δ (k). And the multipoles are given by

P (k) = ⟨ | δ (k) δ (−k) | L (μ) ⟩ .

Here, the average ⟨•⟩ is taken over all grid points in Fourier space with klow ≤ |k| < khigh, where klow and khigh indicate the lower and upper limit of the k bin, respectively. And L indicates the Legendre polynomial, with

μ = (k · l)/|k| ,

where the unit line-of-sight vector l is supplied by the user.

[TOC]

Survey-like data

Due to the non-trivial geometry and completeness of survey-like data, random catalogues that encode the same selection functions as the data catalogues are required. And typically various weights are supplied for corrections of the incompleteness of the data. In addition, observational coordinates — Right Ascension (RA), Declination (Dec), and redshift (z) — are usually provided, and has to be converted to the comoving coordinates x0, x1, and x2.

Coordinate conversion

The key of the coordinate conversion process is the relationship between redshift z and radial comoving distance r. Once this relationship is evaluated, the comoving coordinates are then simply given by

x0 = r cos (Dec) cos (RA) ,

x1 = r cos (Dec) sin (RA) ,

x2 = r sin (Dec) .

This program implements the conversion from z to r within the framework of a wCDM cosmology:

r = ∫0z c d z′/[H0 E (z′)] ,

where H0 indicates the Hubble parameter at present (z = 0), and E (z) denotes the reduced Hubble parameter:

E2 (z) = Ωm (1+z)3 + Ωk (1+z)2 + ΩΛ (1+z)3(1+w) .

Here, Ωm, Ωk, and ΩΛ indicate the density parameter for matter, curvature, and dark energy at z = 0, respectively. And w denotes the dark energy equation of state. In particular, only Ωm, ΩΛ, and w are supplied by the user, as

Ωk = 1 − Ωm − ΩΛ .

Furthermore, to ensure E2 (z) > 0, the following condition has to be satisfied:

3wΩm(3w+1)/(3w)ΩΛ < Ωk[−(3w+1)ΩΛ](3w+1)/(3w) .

The integration for the z to r conversion is performed using the Legendre-Gauss Quadrature algorithm. The program uniformly samples 128 (customisable in define.h) redshift values in the redshift range of the input catalogues, and checks the maximum order needed for the desired integration precision. This order is then used for the coordinate conversion of all the input objects.

For non-wCDM cosmologies, the program itself cannot convert z to r, but it is able to interpolate a table of (z, r) pairs for coordinate conversions, where the unit of the radial comoving distance has to be Mpc/h. And the interpolation is performed using a cubic-spline algorithm[3].

[TOC]

Box configuration

To sample the tracer density field on grids, a cuboid that is large enough to include all the data needs to be specified. The side lengths of the cuboid can either be supplied by the user, or determined by the program automatically. For user-specified side lengths, the following condition has to be fulfilled:

xi, maxxi, minLbox, i ,   i = 0, 1, 2.

If the box size is not set, adaptive side lengths are evaluated as

Lbox, i = (xi, maxxi, min) · (1 + fpad, i) ,

where fpad, i indicates the user-supplied padding fraction of the box.

Once Lbox is decided, the input catalogues are placed at the centre of the box, and then the particle assignment scheme and corresponding corrections are applied in the same way as for simulation boxes, to generate the density fields.

[TOC]

Weighted density field

Given the data and random catalogues, the weighted density field is given by

F (r) = wFKP (r) · [nd (r) − αnr (r)] ,

where wFKP is the so-called FKP weight, for reducing the variance of the power spectra[4]. nd (r) and nr (r) are the density field for the data and random catalogues respectively. And α is the weighted ratio between the data and random samples:

α = ∑d wc, d / ∑r wc, r .

Here, wc, d and wc, r indicate the user-supplied completeness weights for the samples, and the sums are taken over the corresponding catalogues.

The weighted density field is then Fourier transformed using FFTW. Moreover, if grid interlacing is enabled, the combined Fourier space density field is further converted back to configuration space, using the complex-to-real routines, for power spectrum multipole estimations.

[TOC]

Multipole estimator

The power spectrum multipoles are estimated by[5][6]

P (k) = [ (2ℓ+1) ⟨F0 (k) F (k)⟩ − (1+α) I12 ] / I22 ,

where

Iab = ∫ d3r ña (r) wbFKP (r) ,

F (k) = ∫ d3r F (r) exp (ik · r) L [ k · r/(|k||r|) ] .

And ñ denotes the comoving number density of the tracers.

Furthermore, the Legendre polynomials L can be decomposed into products of real-form spherical harmonics Ym , the Fourier space density field multipoles are then[7]

F (k) = 4π/(2ℓ+1) ∑m Ym (k/|k|) ∫ d3r F (r) Y*m (r/|r|) exp (ik · r) .

Thus, for each of the multipole, only (2ℓ+1) FFTs are required.

[TOC]

Compilation

The dependencies of this program are listed below:

Library Mandatory Compilation flags*
FFTW -lfftw3
OpenMP -DOMP -fopenmp -lfftw3_omp (gcc, clang)
-DOMP -openmp -lfftw3_omp (icc)
CFITSIO -DWITH_CFITSIO -lcfitsio

*: paths to the header and library files are not included (e.g., -I/path/to/include -L/path/to/lib).

The density fields are stored as double-precision floating point numbers by default, with FFTs performed in double precision as well. If the memory consumption is an issue, the program can also be compiled with the -DSINGLE_PREC flag, to enabled single-precision density fields and FFTs. Note that in this case FFTW has to be compiled with single precision too (see the FFTW documentation for details).

Once the mandatory dependencies are installed, and the corresponding compilation flags are set in the Makefile, this program can be compiled with a C compiler that supports the C99 standard, by simply the command

make

By default, an executable POWSPEC is created on success.

[TOC]

Configuration parameters and command line options

Configuration files and command line options for this program are parsed using the libcfg library. The default name of the configuration file is powspec.conf, which is customisable in define.h. Custom configuration files can also be passed to the program using the -c or --conf command line options.

A list of the supported command line options can be displaced using the -h or --help flags, and a template configuration file is printed with the -t or --template flags. Please consult libcfg for the formats of the configuration files and command line options.

Specifications of the input catalogues

DATA_CATALOG, RAND_CATALOG

Filename of the input data and random catalogues. They can be either strings or 2-element string arrays. If the cross power spectrum is required, then the data and random catalogues must be string arrays. For simulation boxes, RAND_CATALOG is not mandatory.

Examples

DATA_CATALOG = input_galaxy_catalog.txt
DATA_CATALOG = [input_galaxy_catalog.txt]
DATA_CATALOG = [ galaxy_catalog_1.txt, galaxy_catalog_2.txt ]
DATA_CATALOG = [ galaxy_catalog_1.txt, \
                 galaxy_catalog_2.txt ]

[TOC]

DATA_FORMAT, RAND_FORMAT

Format of the input catalogs. They must be integers, or 2-element integer arrays, depending on the dimension of DATA_CATALOG and RAND_CATALOG respectively. The allowed values are:

  • 0: for ASCII format text files (default);
  • 1: for FITS tables.

In particular, FITS tables are supported via the CFITSIO library, so the compilation flag -DWITH_CFITSIO has to be enabled for reading FITS files.

Examples

DATA_FORMAT = 0
DATA_FORMAT = [0,1]

[TOC]

DATA_SKIP, RAND_SKIP

Number of lines to be skipped for ASCII format input files. They can be non-negative long integers or long integer arrays, depending on the dimension of DATA_CATALOG and RAND_CATALOG respectively.

Examples

DATA_SKIP = 10
DATA_SKIP = [0,5]

[TOC]

DATA_COMMENT, RAND_COMMENT

Indicator of comment lines for ASCII format input files. They can be characters or 2-element character arrays, depending on the dimension of DATA_CATALOG and RAND_CATALOG respectively. If the first non-whitespace character of a line is the specified character, then the whole line of the input catalogue is omitted. If empty characters ('') are supplied, then no line is treated as comments.

Examples

DATA_COMMENT = '#'
DATA_COMMENT = [ '', '!' ]

[TOC]

DATA_FORMATTER, RAND_FORMATTER

C99-style formatter specifying the format of columns of ASCII format input files. They must be strings or 2-element string arrays, depending on the dimension of DATA_CATALOG and RAND_CATALOG respectively. Note however that only formatters with the following argument types are supported (see cppreference.com for details):

  • int *
  • long *
  • float *
  • double *
  • char *

Examples

DATA_FORMATTER = "%d %ld %f %lf %s"  # for int, long, float, double, and string types
DATA_FORMATTER = "%*d,%10s,%[123]"
        # Column separators are ',';
        # The first column is treated as an integer, but is omitted;
        # The second column is a string with 10 characters;
        # The third column is a string composed of characters '1', '2', and '3'.

[TOC]

DATA_POSITION, RAND_POSITION

3-D coordinates, in the order of [x0, x1, x2] or [RA, Dec, redshift], where RA and Dec must be in degrees. They must be 3- or 6-element string arrays. If DATA_CATALOG or RAND_CATALOG contains only one element, then the corresponding positions must be 3-element arrays. While if there are two elements for DATA_CATALOG or RAND_CATALOG, there should be 6 elements for the positions.

The strings must be column indicators, or expressions, which are parsed using the libast library. Columns are indicated by ${•}, where must be a number for ASCII format files, and a string for FITS tables. For instance, the 3rd column of an ASCII file can be indicated by $3, and the "RA" column of a FITS table can be indicated by ${RA}. Note that if there are more than one digits for the ASCII column numbers, the number must be enclosed by braces, e.g, ${10}. And if an ASCII column is omitted via the formatter (e.g. %*lf), it is not counted for the column number.

Moreover, expressions are supported for pre-processing the columns, with some basic arithmetic operators, and mathematical functions (see libast for details).

Examples

DATA_POSITION = [${RA}*180/3.1415927, ${DEC}, ${Z}, ${RA}, ${DEC}, ${Z}]
DATA_POSITION = [($1+1000)%1000, ($2+1000)%1000, ($3+1000)%1000]
DATA_POSITION = [$1, $2, ($3+$6*(1+0.6)/(100*sqrt(0.31*(1+0.6)**3+0.69))+1000)%1000]
      # The last expression implies real to redshift space conversion
      # with periodic boundary conditions given the box size 1000 Mpc/h,
      # in a FlatLCDM cosmology with Omega_m = 0.31, at redshift 0.6.

[TOC]

DATA_WT_COMP, RAND_WT_COMP

Completeness weights for the data and randoms (see Weighted density field). They can be column indicators or expressions, or the corresponding 2-element string arrays.

Examples

DATA_WT_COMP = ${WEIGHT_SYSTOT} * ${WEIGHT_NOZ} * ${WEIGHT_CP}
DATA_WT_COMP = [1, $4 * ($5 + $6 - 1)]

[TOC]

DATA_WT_FKP, RAND_WT_FKP

FKP weights for the data and randoms (see Weighted density field). They can be column indicators or expressions, or the corresponding 2-element string arrays. They are not used for simulation boxes.

Examples

DATA_WT_FKP = [$5, $5]
DATA_WT_FKP = 1 / (6000 * ${NZ})

[TOC]

DATA_NZ, RAND_NZ

Radial comoving number density of the data and randoms. They can be column indicators or expressions, or the corresponding 2-element string arrays. They are used for the normalisation of the power spectra (see Multipole estimator for details). By default, RAND_NZ is used. And if RAND_NZ is unset, then DATA_NZ is used for the normalisation. They are not used for simulation boxes.

Examples

RAND_NZ = 5678000 / 1000**3
RAND_NZ = [${NZ}, ${NZ}]

[TOC]

DATA_SELECTION, RAND_SELECTION

Selection criteria for the input catalogues. They can be column indicators or expressions, or the corresponding 2-element string arrays. Numerical, bitwise, and logical expressions are all supported. Only objects with columns fulfilling the conditions are kept.

Examples

DATA_SELECTION = isfinite($1) && $3 == "YES" && log($4) < 1.0
DATA_SELECTION = [$5 & 1 != 0, $1 != "no"]

[TOC]

DATA_CONVERT, RAND_CONVERT

Specify whether coordinate conversion is needed for the data and random catalogues (see Coordinate conversion for details). They must be boolean values or 2-element boolean arrays, depending on the dimension of DATA_CATALOG and RAND_CATALOG respectively. If the conversion for any of the catalogues is enabled, the parameters specified in the section Fiducial cosmology for coordinate conversion are used.

Examples

DATA_CONVERT = [T,F]
DATA_CONVERT = [ True, 0 ]  # 0 for false

[TOC]

Fiducial cosmology for coordinate conversion

The formulae for the coordinate conversion are detailed in the Coordinate conversion section.

OMEGA_M

Density parameter of matter at z = 0. It must be a double-precision floating point number, in the range (0, 1].

OMEGA_LAMBDA

Density parameter of dark energy at z = 0. It must be a double-precision floating point number, and ≥ 0. By default it is 1 − OMEGA_M.

DE_EOS_W

Dark energy equation of state. It must be a double-precision floating point number, and ≤ 1/3. By default it is −1.

CMVDST_ERR

The tolerance of the integration error. It must be larger than the machine epsilon, i.e., around 1e-16.

Z_CMVDST_CNVT

Filename of an ASCII table for redshift to radial comoving distance (in Mpc/h) conversion. The first two columns of this file have to be redshift and radial comoving distance, respectively. If the columns or units are not appropriate, the file can be passed to the program via command line options and pipe, e.g.

./POWSPEC --cmvdst-file <(awk '{printf("%lf %lf\n", $3, $4 * 0.676)}' input_cnvt_file.txt)

[TOC]

Configurations for power spectra evaluation

CUBIC_SIM

Indicate whether the input catalogs are from cubic simulation boxes. It must be a boolean value.

Examples

CUBIC_SUM = T

LINE_OF_SIGHT

A unit vector defining the line of sight for cubic simulation boxes (see Multipole evaluation). By default it is [0,0,1], i.e., the line of sight is along the 3rd Cartesian axis.

BOX_SIZE

Side length of the box that catalogues are placed in. It can be a double-precision floating-point number, or 3-element double array. In particular, a single double number implies a regular cuboid, or the side length on all directions are identical. Otherwise, box size on different directions are read from the array.

It is mandatory for simulation boxes. And adaptive box sizes are evaluated if it is not supplied for a survey-like data.

Examples

BOX_SIZE = 1000
BOX_SIZE = [600, 800, 1000]

BOX_PAD

Padding fraction for the adaptive box size. It can be a double-precision floating-point number, or 3-element double array. And it is essentially fpad, i in section Box configuration.

GRID_SIZE

Number of grid cells per box side for the density fields. It must be a positive integer, preferably a power of 2.

PARTICLE_ASSIGN

Particle assignment scheme (see section Particle assignment). It must be an integer, and allowed values are:

  • 0: Nearest-Grid-Point (NGP);
  • 1: Could-In-Cell (CIC);
  • 2: Triangular Shaped Cloud (TSC);
  • 3: Piecewise Cubic Spline (PCS).

GRID_INTERLACE

A boolean option indicating whether to use interlaced grids (see Grid interlacing).

[TOC]

Settings for the output

MULTIPOLE

Legendre multipoles of the power spectra to be evaluated. It must be an non-negative integer, or integer arrays. The current maximum supported ℓ is 6.

Examples

MULTIPOLE = 0
MULTIPOLE = [0,1,2,3,4,5,6]

LOG_SCALE

A boolean value indicating whether to set wave number bins in logarithm scale.

KMIN

Lower boundary of the first wave number bin. It can be a non-negative double-precision floating-point number. For logarithm scale bins, it must be positive.

KMAX

Upper boundary of the last wave number bin. It can be a non-negative double-precision floating-point number, and must be larger than KMIN. If it is unset, then the Nyquist frequency is used.

BIN_SIZE

Width of each wave number bin. It must be a double-precision floating-point number. And for logarithm scales, it indicates the base-10 logarithm of the ratio between two adjacent wave number bin edges.

OUTPUT_AUTO

Name of the output files for auto power spectrum multipoles. It can be a string or 2-element string array. In particular, auto power spectra can be omitted by setting an empty string.

Examples

OUTPUT_AUTO = output_auto_pk.txt
OUTPUT_AUTO = ["", output_auto_pk2.txt]
        # The auto power spectrum multipoles for the first catalogue are omitted.

OUTPUT_CROSS

String, for the name of the output file for cross power spectrum multipoles.

OUTPUT_HEADER

A boolean value indicating whether to write extra information as the header of the output files, including number of objects in the input catalogues, configurations of the box and grids, and the shot noise correction and normalisation factors.

OVERWRITE

An integer value indicating whether to overwrite existing files. Allowed values are

  • 0: quit the program when an output file exist;
  • postive: force overwriting output files whenever possible;
  • negative: notify at most this number (absolute value) of times, for asking whether overwriting existing files.

VERBOSE

A boolean value indicating whether to show detailed standard outputs.

[TOC]

Acknowledgements

This program benefits greatly from the open-source nbodykit project[8]. And I thank Dr. Chia-Hsun Chuang for helpful discussions on the early-stage development of this program.

[TOC]

References

[1] Sefusatti, Crocce, Scoccimarro, Couchman, 2016, Correcting for the Alias Effect When Measuring the Power Spectrum Using a Fast Fourier Transform, MNRAS, 460(4):3624–3636 (arXiv:1512.07295)

[2] Jing, 2005, Accurate estimators of correlation functions in Fourier space, ApJ, 620(2):559–563 (arXiv:astro-ph/0409240)

[3] Hornbeck, 2020, Fast Cubic Spline Interpolation, arXiv e-prints, 2001.09253 (source code)

[4] Feldman, Kaiser, Peacock, 1994, Power-Spectrum Analysis of Three-dimensional Redshift Surveys, ApJ, 426:23 (arXiv:astro-ph/9304022)

[5] Sefusatti, 2005, Probing fundamental physics with large-scale structure: From galaxy formation to inflation, PhD Thesis, New York University)

[6] Yamamoto, Nakamichi, Kamino, Bassett, Nishioka, 2006, A Measurement of the Quadrupole Power Spectrum in the Clustering of the 2dF QSO Survey, PASJ, 93:102 (arXiv:astro-ph/0505115)

[7] Hand, Li, Slepian, Seljak, 2017, An optimal FFT-based anisotropic power spectrum estimator, JCAP, 2017:2 (arXiv:1704.02357)

[8] Hand, Feng, Beutler, Li, Modi, Seljak, Slepian, 2018, nbodykit: An Open-source, Massively Parallel Toolkit for Large-scale Structure, AJ, 156(4):160 (arXiv:1712.05834)

[TOC]