Working with MaNGA Data

Below, we collect some usage guidelines for the MaNGA data to help you navigate its complexity. Please make sure you understand and follow these guidelines because they can be critical to any science application of the data. Additional resources include Getting Started with MaNGA and our MaNGA tutorials.

Galaxies with Multiple Observations

The two primary identifiers for MaNGA observations are the combination of the plate and IFU identifiers (often referred to as the PLATEIFU) and the MaNGA ID. With a few exceptions (see here), each galaxy observed has a single MaNGA ID, but it may have more than one PLATEIFU. MaNGA observed a few galaxies multiple times, and these observations can be identified using the DRPall catalog: search for PLATEIFU numbers with the same MaNGA ID. Galaxies with multiple observations can be useful for understanding the true error in the MaNGA measurements (cf. Westfall et al., 2019, AJ, 158, 231). In addition to checking the data quality flags (see below), it is useful to check for multiple observations of the same galaxy among subsamples of the MaNGA data, particularly for small samples. See a complete accounting of the number of galaxies observed by the MaNGA survey here.
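
The DRPall search described above can be sketched with the Python standard library alone. The rows below are made-up placeholders, not real catalog entries, and the field names `mangaid` and `plateifu` are assumed to match the DRPall column names; check them against the data model before use.

```python
from collections import defaultdict

def find_repeat_observations(rows):
    """Group PLATEIFU values by MaNGA ID and keep IDs observed more than once."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["mangaid"]].append(row["plateifu"])
    return {mangaid: ifus for mangaid, ifus in by_id.items() if len(ifus) > 1}

# Placeholder rows standing in for (mangaid, plateifu) pairs read from DRPall.
example_rows = [
    {"mangaid": "1-0001", "plateifu": "1000-1901"},
    {"mangaid": "1-0002", "plateifu": "1000-3701"},
    {"mangaid": "1-0001", "plateifu": "2000-12701"},
]
repeats = find_repeat_observations(example_rows)
```

Running the same grouping on a subsample shows quickly whether any galaxy is counted more than once.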

Overall Data Quality Flags

The 3D phase of the DRP has an overall reduction quality bit DRP3QUAL that indicates any potential quality control issues with a given output file for each observation. Most of these issues, like shallow observations, are simply warnings that the data might not be of the usual quality. Flux-calibration failures, however, trigger the CRITICAL quality bit, which indicates that there may be severe problems with the data. Such a failure is identified by whether or not the astrometric calibration is successful without a substantial rescaling of the flux to match the imaging data. The DAP also provides a top-level data quality flag, DAPQUAL.

Critical failures occur in <1% of DR17 data cubes (see additional description here). These are a mixture of true critical failures (where, e.g., an IFU is badly out of focus, such as 7495-6103) and less critical issues where bright objects at the edges of the field or transients cause problems with the astrometric solution. The latter include some instances where the on-sky surface brightness distribution seems to be genuinely different from that predicted by the preimaging. In some cases the extra flux comes from astronomical transients such as supernovae in the MaNGA target galaxies, and in other cases from terrestrial transients (e.g., satellite trails). Such terrestrial transients are generally identified by visual inspection and the relevant exposures manually removed from the final set for a given galaxy (see bogey.par). Astronomical transients are not removed from the MaNGA data cubes.
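
As a rough illustration, the CRITICAL check discussed in this section reduces to a bit test on the integer quality flag. The bit position used here (30) is an assumption for this sketch and should be confirmed against the DRP3QUAL and DAPQUAL maskbit definitions; the flag values are invented for the example.

```python
import numpy as np

# Assumed bit position of CRITICAL; confirm against the maskbit documentation.
CRITICAL_BIT = 30

def is_critical(quality_flags, bit=CRITICAL_BIT):
    """Return a boolean array that is True where the given bit is set."""
    flags = np.asarray(quality_flags, dtype=np.int64)
    return (flags & (np.int64(1) << bit)) != 0

# Invented flag values: two without the assumed CRITICAL bit, two with it.
example_flags = [0, 2, 2**30, 2**30 + 4]
critical = is_critical(example_flags)
```

A sample can then be cleaned by keeping only the entries where `critical` is False.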

Data Quality Arrays

Each MaNGA data cube has an associated 3D mask extension (MASK) describing the quality of each pixel in the data cube and whether it should be used in any analysis; the bits are defined by DRP3PIXMASK. The mask includes effects such as the IFU footprint, missing data, foreground stars (where known), etc. Any use of the MaNGA data cubes should consider these maskbits. Similarly, the DAP output MAPS and model LOGCUBE files provide mask extensions for each of the derived properties and model spectra; see DAPPIXMASK and DAPSPECMASK, respectively.
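
One way to apply a pixel mask is to wrap the flux in a numpy masked array so that flagged pixels are ignored by later statistics. This is only a sketch: the DONOTUSE bit position (10) is an assumption to be verified against the DRP3PIXMASK definition, and the small arrays stand in for the FLUX and MASK extensions.

```python
import numpy as np

# Assumed bit position of DONOTUSE; confirm against the DRP3PIXMASK definition.
DONOTUSE_BIT = 10

def mask_flux(flux, mask, bit=DONOTUSE_BIT):
    """Return the flux as a masked array, hiding pixels with the given bit set."""
    bad = (np.asarray(mask) & (1 << bit)) != 0
    return np.ma.MaskedArray(flux, mask=bad)

# Toy arrays standing in for one wavelength slice of FLUX and MASK.
flux = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[0, 1024], [0, 1024 + 8]])
masked_flux = mask_flux(flux, mask)
mean_good_flux = masked_flux.mean()  # average over unmasked pixels only
```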

DRP: LSF Estimates

The estimates of the spectral line-spread function (LSF) are provided in four different ways for each data cube: as both summary and per-spaxel values, each with and without accounting for the pixelization effects of the convolution with the detector pixel response function. The per-spaxel values are given in the LSFPRE and LSFPOST extensions, while the median resolution for the entire cube is given in the PRESPECRES and SPECRES extensions.

The measurements of the LSF are performed for each fiber spectrum in the row-stacked spectra (RSS). The spectral resolution is given by R = λ / (2.355 σ(λ)), where σ(λ) are values in the LSFPRE and LSFPOST extensions and wavelength λ is in the WAVE extension. The SPECRES and PRESPECRES extensions are the per-wavelength median of the resolution measurements over all fiber spectra for a given PLATEIFU. For the data cubes, σ(λ) for each spaxel is computed via a weighted sum of the fiber σ(λ) values, following the same interpolation algorithm as for the flux rectification (see Law et al. 2016, AJ, 152, 83).
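
The resolution formula above, R = λ / (2.355 σ(λ)), can be evaluated as in the following sketch. It assumes that the wavelength and LSF arrays share the same units and shape; the numbers are illustrative, not taken from a real cube.

```python
import numpy as np

def spectral_resolution(wave, sigma_lsf):
    """Compute R = wave / (2.355 * sigma_lsf); non-positive sigma gives NaN."""
    wave = np.asarray(wave, dtype=float)
    sigma_lsf = np.asarray(sigma_lsf, dtype=float)
    resolution = np.full(np.broadcast(wave, sigma_lsf).shape, np.nan)
    # Only divide where the LSF width is positive; other entries stay NaN.
    np.divide(wave, 2.355 * sigma_lsf, out=resolution, where=sigma_lsf > 0)
    return resolution

# Illustrative values: wavelengths and 1-sigma LSF widths in the same units.
wave = np.array([4000.0, 6000.0, 8000.0])
sigma_lsf = np.array([1.0, 1.2, 0.0])
resolution = spectral_resolution(wave, sigma_lsf)
```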

In terms of the pixel integration, whether you should use the LSFPRE or LSFPOST extensions depends on how you plan to analyze the spectra. For example, if you fit a Gaussian function directly to the spectra in the LOGCUBE file, you're ignoring the effect of broadening due to pixel sampling and should account for it using the LSFPOST and/or SPECRES extensions. Alternatively, if you set up a template spectrum that you then convolve with a Gaussian kernel and pixel sampling grid, you're treating the sampling problem directly and likely want to use the LSF estimates in the LSFPRE and/or PRESPECRES extensions to avoid correcting for it twice. Note that the DAP analysis always uses the LSFPRE extension, and the LSF extension in the DAP model LOGCUBE file is based on this product of the DRP.

DRP: Datacube Covariance

Since the individual fiber spectra are combined together into a rectified data cube, there is significant covariance between adjacent spaxels. When combining spectra from multiple spaxels, a rigorous calculation of the inverse variance in the combined spectrum must account for this covariance. Roughly, the calibration of the noise vector is:

n_{covar}/n_{nocovar} = 1 + 1.62 log(N_b),

for N_b < 100 and

n_{covar}/n_{nocovar} = 4.2,

for N_b >= 100,

where log is the base-10 logarithm, n_{nocovar} is determined via a nominal error calculation using the inverse variance provided in the datacube, and N_b is the number of binned spaxels. The correction factor is constant above N_b = 100 because additional spaxels at that point are uncorrelated with the original spaxels. It is important to note that this calibration is dependent on the spaxels being adjacent to one another.
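
The approximate calibration above can be written as a small function. This sketch assumes that the logarithm is base 10, which makes the two branches agree at N_b = 100 (1 + 1.62 × 2 ≈ 4.2); the inverse variances are invented for the example.

```python
import numpy as np

def covariance_noise_factor(n_binned):
    """Approximate ratio n_covar / n_nocovar for N_b adjacent binned spaxels."""
    n_binned = np.asarray(n_binned, dtype=float)
    # Assumes a base-10 logarithm; the factor is held at 4.2 for N_b >= 100.
    return np.where(n_binned < 100, 1.0 + 1.62 * np.log10(n_binned), 4.2)

# Invented inverse variances for four adjacent spaxels that are summed.
ivar = np.array([4.0, 4.0, 4.0, 4.0])
sigma_nocovar = np.sqrt(np.sum(1.0 / ivar))  # nominal error, ignoring covariance
sigma_covar = sigma_nocovar * covariance_noise_factor(ivar.size)
```

The correction only applies to adjacent spaxels; the correlation matrices are the better tool for other configurations.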

In DR17, this calculation can be done more rigorously using the correlation matrices included with each DRP data cube. Since the MaNGA reconstructed PSF is nearly constant with wavelength, the spatial covariance matrix also varies only slowly with wavelength. The DRP therefore describes the covariance in the data cubes by providing sparse correlation matrices (i.e., covariance matrices normalized so that all non-zero diagonal elements are unity) at the g, r, i, and z bands that may be interpolated to any intermediate wavelength.

The correlation matrix is nominally extremely large, having about 5184 x 5184 elements for a 127-fiber IFU data cube. This is because there is an element describing the correlation of each of the spaxels in the 72x72 image slice with every other spaxel in that slice. The vast majority of elements are zero, however, and the matrix is symmetric. The DRP therefore saves the information in sparse table format for substantial space savings. The correlation matrix table is a binary table containing one row for each non-zero element; see, e.g., the GCORREL extension in the LOGCUBE files here.

The binary-table extension has 5 columns: INDXI_C1, INDXI_C2, INDXJ_C1, INDXJ_C2, and RHOIJ.
Here (INDXI_C1, INDXI_C2) describes the x,y index of the first point in the flux array, (INDXJ_C1, INDXJ_C2) describes the x,y index of the second point in the flux array, and RHOIJ gives the correlation coefficient between the two spaxels. The header of the extension provides the wavelength at which the correlation matrix has been calculated (BBWAVE), the 0-indexed slice number associated with wavelength BBWAVE (BBINDEX), the type of matrix provided (either Covariance or Correlation; COVTYPE), and the dimensionality of the full correlation matrix (COVSHAPE). There are facilities in both IDL and python to construct (sparse) matrices from these data.
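
For illustration, the five table columns can be assembled into a sparse matrix with scipy. This is a sketch, not the DRP or DAP implementation. The function name, the C-order flattening of the (C1, C2) indices, and the optional mirroring of the upper triangle are all assumptions; the flattening in particular must match however the flux map is flattened in your own analysis.

```python
import numpy as np
from scipy import sparse

def correlation_from_table(i_c1, i_c2, j_c1, j_c2, rhoij, spatial_shape,
                           fill_lower=False):
    """Build a sparse correlation matrix from the five table columns.

    Hypothetical helper: flat indices follow numpy C-order for spatial_shape.
    """
    i = np.ravel_multi_index((i_c1, i_c2), spatial_shape)
    j = np.ravel_multi_index((j_c1, j_c2), spatial_shape)
    n = spatial_shape[0] * spatial_shape[1]
    mat = sparse.coo_matrix((rhoij, (i, j)), shape=(n, n)).tocsr()
    if fill_lower:
        # Mirror the strictly upper triangle, assuming only it was stored.
        mat = sparse.triu(mat, k=0) + sparse.triu(mat, k=1).T
    return mat

# Toy 2x2 map: unit diagonal plus one off-diagonal pair with rho = 0.5.
i_c1 = np.array([0, 0, 1, 1, 0])
i_c2 = np.array([0, 1, 0, 1, 0])
j_c1 = np.array([0, 0, 1, 1, 0])
j_c2 = np.array([0, 1, 0, 1, 1])
rho = np.array([1.0, 1.0, 1.0, 1.0, 0.5])
correl = correlation_from_table(i_c1, i_c2, j_c1, j_c2, rho, (2, 2),
                                fill_lower=True)
```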

When binning data, one can calculate the uncertainty in the binned spaxels by writing the binning operation as a matrix multiplication for each wavelength slice:

B = T x F,

where T is an N_b x N_s weight function that bins the N_s spaxels in the flattened flux map F into N_b bins, and

C = T x Σ x T^T

is the N_b x N_b covariance matrix for the binned fluxes and Σ is the N_s x N_s covariance read from the DRP file. Note that this calculation is performed by the DAP to determine the covariance in the binned spectra; however, this covariance is not provided in the output model LOGCUBE files. Also note that the DAP provides some convenience utilities for reading the DRP-produced covariance matrices; see here.
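
The two matrix products above can be sketched with numpy for a toy case. The example assumes that bins are simple sums of spaxels and that the covariance is rebuilt from a correlation matrix ρ and per-spaxel errors σ as Σ_ij = ρ_ij σ_i σ_j; all values are invented.

```python
import numpy as np

def bin_flux_and_covariance(flux, covar, bin_id, n_bins):
    """Sum spaxels into bins and propagate covariance: B = T F, C = T Sigma T^T."""
    n_spaxels = flux.size
    # T[b, s] = 1 when spaxel s belongs to bin b (simple summation weights).
    T = np.zeros((n_bins, n_spaxels))
    T[bin_id, np.arange(n_spaxels)] = 1.0
    return T @ flux, T @ covar @ T.T

# Toy slice: three spaxels with unit errors; neighbours correlated at 0.5.
sigma = np.ones(3)
correl = np.array([[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.5],
                   [0.0, 0.5, 1.0]])
covar = correl * np.outer(sigma, sigma)
flux = np.array([1.0, 2.0, 3.0])
bin_id = np.array([0, 0, 1])  # spaxels 0 and 1 form bin 0, spaxel 2 is bin 1
binned_flux, binned_covar = bin_flux_and_covariance(flux, covar, bin_id, 2)
```

In this toy case the variance of the first bin is 3 rather than the 2 expected without covariance, and the two bins remain correlated through the off-diagonal term.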

See also further discussion in Law et al. (2016, AJ, 152, 83) and Westfall et al. (2019, AJ, 158, 231).

Array indexing (IDL vs. astropy)

MaNGA's primary data products, both from the Data Reduction Pipeline (DRP) and the Data Analysis Pipeline (DAP), are FITS files. When reading these files, it is important to understand the ordering of the data within the array. FITS files were originally developed using FORTRAN, a column-major language. When reading the files using IDL, the intended ordering of the axes as (x,y,λ) is maintained. However, this ordering is transposed to (λ,y,x) when using astropy.io.fits, because numpy arrays are row-major. Please see their FAQ, specifically the response to this question. Please see the MaNGA Python Tutorial for example code.
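
The consequence for indexing can be seen without reading a file. The sketch below uses a small synthetic array as a stand-in for a cube returned by astropy.io.fits, with axes ordered (λ, y, x); the sizes are arbitrary.

```python
import numpy as np

# Synthetic stand-in for a cube as returned by astropy.io.fits: (lambda, y, x).
n_wave, n_y, n_x = 5, 3, 4
cube = np.arange(n_wave * n_y * n_x, dtype=float).reshape(n_wave, n_y, n_x)

# The spectrum of the spaxel at x=2, y=1 is indexed as [:, y, x].
spectrum = cube[:, 1, 2]

# A single wavelength slice is an image with shape (n_y, n_x).
image = cube[0]

# Reorder the axes to (x, y, lambda) to mimic the ordering seen in IDL.
cube_xyl = cube.transpose(2, 1, 0)
```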

DAP: The Hybrid Binning Scheme

The DAP begins by binning spectra according to the prescribed method. Afterward, each DAP module only works with the "binned" spectra (called this even if a bin consists of a single spaxel), except for the HYB binning approach. In the HYB case, the emission-line modeling (see here) is done by first fitting the binned spectra and then distributing those results as a starting point for fitting the individual spaxels within each bin. Because the data are fit as a hybrid between the Voronoi (VOR) binning and unbinned schemes, there are a few things to keep in mind:

- Because the stellar kinematics are held fixed to the binned results during the spaxel-by-spaxel continuum+emission-line fit, there will be covariance among the emission-line and spectral-index results for spaxels associated with a single bin, beyond what one would expect from the datacube construction alone (described above). These covariances have not been characterized.
- The binned spectra provided in the HYB model cube files are from the Voronoi binning step; however, the emission-line models are fit to the individual spaxels. When using the model cube files for this binning scheme:
  - The stellar-continuum fits (provided by the STELLAR extension in the DAP model LOGCUBE file) should be compared to the Voronoi binned spectra in the file, but
  - the best-fitting model spectra (stellar continuum + gas emission) in the MODEL extension should be compared to the individual spectra from the DRP LOGCUBE file!
- Because the emission-line modeling is done on the individual spaxels, the emission-line moments (e.g., summed fluxes) are recalculated after the emission-line modeling to ensure the stellar continuum used for both the Gaussian model and the moment is identical. In the HYB case, this means the emission-line moments are provided for the individual spaxels. It also means that the spectral indices are measured on the individual spaxels because the emission-line model is first subtracted from the data before the index measurements.

DAP Map Corrections: Velocity Dispersions

WARNING: Some MAPS file extensions must be corrected to obtain the astrophysically relevant quantities as discussed here. See additional discussion here.

The stellar and gas velocity dispersion measurements must be corrected for instrumental resolution effects to obtain the astrophysical Doppler broadening. The corrected gas velocity dispersion is:

sigma_gas_corr = sqrt( square(EMLINE_GSIGMA) - square(EMLINE_INSTSIGMA) )

and the corrected stellar velocity dispersion is:

sigma_star_corr = sqrt( square(STELLAR_SIGMA) - square(STELLAR_SIGMACORR) ),

where EMLINE_GSIGMA, EMLINE_INSTSIGMA, STELLAR_SIGMA, STELLAR_SIGMACORR are all extensions in the MAPS file.

In both cases, beware of imaginary numbers. That is, when the correction is larger than the provided value, the above equations result in taking the square-root of a negative number. The correction for the stellar velocity dispersion measurements is based on the resolution difference between the MaNGA data and the MILES template library, as described in the DAP technical paper. See also a discussion in Law et al. 2021 on the implications of these imaginary numbers for systematic biases in the distribution of recovered astrophysical velocity dispersions as a function of the signal-to-noise ratio.
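
One way to apply either correction while guarding against negative arguments is sketched below. The function name and toy values are illustrative; the inputs stand for matching channels of EMLINE_GSIGMA and EMLINE_INSTSIGMA, or for STELLAR_SIGMA and STELLAR_SIGMACORR. The signed squared difference is returned alongside the masked result so that the negative cases remain available for statistical analyses rather than being silently dropped.

```python
import numpy as np

def corrected_sigma(sigma_obs, sigma_corr):
    """Return (masked corrected sigma, signed squared difference)."""
    sigma_obs = np.asarray(sigma_obs, dtype=float)
    sigma_corr = np.asarray(sigma_corr, dtype=float)
    sigma_sq = np.square(sigma_obs) - np.square(sigma_corr)
    # Mask entries where the correction exceeds the measurement.
    corrected = np.ma.sqrt(np.ma.masked_less(sigma_sq, 0.0))
    return corrected, sigma_sq

# Toy values in km/s: the last entry has a correction larger than the measurement.
sigma_obs = np.array([100.0, 50.0, 30.0])
sigma_corr = np.array([60.0, 50.0, 40.0])
corrected, sigma_sq = corrected_sigma(sigma_obs, sigma_corr)
```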

DAP Map Corrections: Spectral Indices

WARNING: Some MAPS file extensions must be corrected to obtain the astrophysically relevant quantities as discussed here. See additional discussion here.

The spectral-index measurements must be corrected for the influence of velocity dispersion on the continuum features; see the details of these calculations in the DAP technical paper. To apply the corrections, you have to know the unit of each index, which can be determined using the Un header keywords (where n is the number of the channel) in the SPECINDEX extension; see the data model description here.

For indices that are either unitless or in angstrom units:

specindex_ang_corr = SPECINDEX * SPECINDEX_CORR

and for magnitude units:

specindex_mag_corr = SPECINDEX + SPECINDEX_CORR,

where SPECINDEX and SPECINDEX_CORR are the relevant extensions in the MAPS file. The same applies to the SPECINDEX_BF extensions.
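
The unit-dependent correction can be wrapped in a small helper, as in this sketch. The unit strings accepted here ("ang", "mag", and an empty string for unitless indices) are assumptions about the header values and should be checked against the data model; the index values are invented.

```python
import numpy as np

def correct_spectral_index(index, correction, unit):
    """Apply the velocity-dispersion correction according to the index unit."""
    unit = unit.strip().lower()
    if unit == "mag":
        # Magnitude indices use an additive correction.
        return index + correction
    if unit in ("", "ang"):
        # Unitless and angstrom indices use a multiplicative correction.
        return index * correction
    raise ValueError(f"Unrecognized spectral-index unit: {unit!r}")

# Invented values for one angstrom-unit channel and one magnitude-unit channel.
index_ang = correct_spectral_index(np.array([2.0]), np.array([1.1]), "ang")
index_mag = correct_spectral_index(np.array([0.10]), np.array([0.02]), "mag")
```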

Tutorials are available that demonstrate how to apply these corrections, along with a greater discussion at our ReadTheDocs site.

Absorption-Line Index Definition

The DAP provides spectral index measurements following two slightly different definitions. The standard definition is provided by Worthey et al. (1994) and Trager et al. (1998); this was the only definition provided in DR15/DR16. In DR17, we include measurements following a second definition based on the analysis of Burstein et al. (1984) and Faber et al. (1985). The fundamental difference is that the Worthey-Trager definitions perform integrals over the ratio of the flux to a measured continuum. Because of this, the mean of many index measurements from a set of spectra, for example, is not mathematically identical to a single index measurement made for a stack of those spectra. However, by performing the integration over the flux vector and the continuum vector separately, the Burstein-Faber index definition does ensure the two approaches (aggregation of indices from many spectra vs. a single index measurement on a spectral stack) are mathematically identical. If you expect to aggregate index measurements for on-sky regions, we recommend using the Burstein-Faber definitions, provided by the SPECINDEX_BF extensions and the associated weights in the SPECINDEX_WGT extensions. See additional discussion of the index definitions here and additional discussion of map aggregation issues here.