BOSS Pipeline

The SDSS Early Data Release (EDR) paper is the original resource for understanding the processing and data products from the SDSS; it describes the pipelines and spectroscopic data products. The successive data release papers (DR1, DR2, DR3, DR4, DR5, DR6, DR7, DR8, DR9, DR10, DR11, DR12, DR13, DR14, DR15, DR16, and DR17) describe changes to the optical spectroscopic data reduction between data releases. The technical summary paper provides more general information on the SDSS-I survey, and the SDSS-III and SDSS-IV summary papers do the same for those surveys. The BOSS overview paper provides general information on BOSS, and the eBOSS overview paper also describes SEQUELS and the eBOSS program.

Most of the optical spectroscopic catalog data (but not the spectra themselves) have been loaded into the Catalog Archive Server (CAS) database. Depending on the scientific use case, users may be better off obtaining SDSS data through a carefully constructed CAS query rather than by downloading the data files from the SAS. Simple queries can be used to select just the objects and parameters of interest, while more complex queries can be used to do complex calculations on many objects, thereby avoiding the need to download the data.

The current BOSS spectrographs are an upgrade of the original SDSS spectrographs used prior to DR8. The SDSS spectrographs originally had 320 fibers each, while the BOSS spectrographs have 500 fibers each.

The remainder of this page gives a short description of pipeline changes for DR18 motivated by the SDSS-V Plate Program. We provide a brief overview of spectroscopic data processing and sections that describe the steps of data processing in detail. In addition to the descriptions, each section provides references to papers that give additional details and a table of the files associated with that step of the pipeline that can be found in the SAS. These tables include links to the file format documentation (the “data model”). The SAS URLs and paths can be determined using the sdss_access tools.

Changes for DR18

  • Improved radial velocity precision by using sky lines as a second-level calibrator.
  • Improved S/N of the final reduced spectra.
  • Updated internal Python dependencies to Python 3.
  • Improved QSO redshift estimation with the addition of new QSO PCA templates.
  • Implemented the use of the dustmaps Python package to get the Bayestar 2015 3D dust maps.
  • Added the pyXCSAO Python package for calculating stellar radial velocities.
  • Added target-level coadds of all exposures of targets on the SDSS-V eFEDS plates.
  • Implemented a coadding scheme that combines the red and blue data for each exposure, and also combines the red and blue data for all exposures, producing coadds for each exposure and for the full observation epoch.
  • Changed from coadding based on fiber ID to coadding based on either RA-DEC or CatalogID.

While SDSS-V is a different survey from eBOSS (or BOSS before it), it uses the same instrument and produces data similar to BOSS. However, at the completion of SDSS-IV, one of the twin BOSS spectrographs was removed from the Apache Point Observatory (APO) 2.5-meter SDSS telescope to be installed in Chile at the Irénée du Pont Telescope at Las Campanas Observatory (LCO). This reduced the potential number of BOSS fibers per plate from 1000 to 500. DR18 is limited to the reduction of the BHM-led eFEDS SDSS-V plates with the run2d=run1d=v6_0_4 version of idlspec2d and does not contain any reprocessing of SDSS-I/-II/-III/-IV data. This version and all SDSS-V versions of the pipeline are available from the SDSS-V GitHub, while older deprecated versions (v5 and earlier), associated with DR17 and earlier, are available in the SDSS-IV SVN and are documented on the DR17 pipeline pages. The DR17 data, which are still available in the SAS and CAS, were processed with idlspec2d run2d=run1d=v5_13_2 and include the BOSS spectrograph data from DR9 through DR17. Earlier data, obtained using the original SDSS spectrographs, were reduced with spec2d 26 and are available on the SAS and CAS.

DR19 and Future Developments

As part of SDSS-V, the old SDSS fiber plug-plates (like those released in DR18) are being replaced by robotic fiber positioners in a new Focal Plane System (FPS). Future versions (v6_0_5+) of the BOSS pipeline contain significant modifications for the reduction of the new FPS data for DR19 and later data releases.

idlspec2d

The idlspec2d software has two major pipeline steps:
spec2d: Extract and calibrate 1-dimensional spectra from 2-dimensional raw CCD data
spec1d: Measure object classifications, redshifts, and stellar parameters from those 1D spectra

Two-Dimensional Pipeline (spec2d)

References: Stoughton et al. (2002), section 4.10.1

The spec2d pipeline reads science and calibration exposures from the BOSS spectrograph, reduces and calibrates the science exposures, extracts the one-dimensional spectra from the two-dimensional exposures, stacks multiple exposures into combined spectra, and produces corresponding masks and noise estimates.

SAS Files used or generated by the spec2d pipeline

File Type | Input/Output | Description
--- | --- | ---
sdR | Input | raw spectroscopic data frames
spPlan2d | Input | the spectro2d processing plan
spPlancomb | Input | the processing plan for combining spectra
plPlugMapM | Input | records which fiber corresponds to which hole in a plate (and therefore which objects, and what coordinates on the sky)
plateHoles | Input | records additional target/fiber metadata on a per-plate basis
spFrame | Output | calibrated spectra for a single CCD and exposure
spCFrame | Output | the 500 (for SDSS-V) combined flux- and wavelength-calibrated spectra over all exposures (potentially spanning multiple nights) for a given mapped plate

spec2d Data Flow

SDSS-V utilizes the two BOSS spectrographs with 500 fibers each, grouped in 25 bundles of 20 fibers each. However, one spectrograph was removed from APO to be relocated to LCO to observe the southern sky. In SDSS-V BOSS plate operations, only the APO spectrograph was utilized.

The BOSS spectrographs (Smee et al. 2013) split the light into a blue channel and a red channel, for a total of 2 CCD images per exposure. The CCD y-coordinate is the spectral dispersion direction (larger y is larger λ) and larger x is larger fiber number, though the spectral “traces” in y vs. x are curved and do not exactly align with CCD columns.

Raw electrons are extracted from the CCD images using row-by-row extractions similar to Horne 1986 by fitting Gaussians, plus a polynomial background, to each CCD row for each bundle of 20 fibers.
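As an illustration of the row-by-row extraction described above, the following is a minimal sketch and not the idlspec2d implementation. It assumes the trace centers and a common Gaussian width are already known for the row, so the fit for the per-fiber amplitudes and the polynomial background reduces to a weighted linear least-squares problem. The function name `extract_row` and all of its parameters are hypothetical.

```python
import numpy as np

def extract_row(row, ivar, centers, sigma, npoly=2):
    """Fit one CCD row as a sum of Gaussians plus a polynomial background.

    row, ivar : counts and inverse variance along the row (1D arrays)
    centers   : assumed-known trace centers (pixels) for the fibers in the bundle
    sigma     : assumed-known Gaussian width (pixels), shared by all fibers
    npoly     : number of polynomial background terms
    Returns (fiber_flux, background_coefficients).
    """
    row = np.asarray(row, dtype=float)
    centers = np.asarray(centers, dtype=float)
    x = np.arange(row.size, dtype=float)

    # Unit-area Gaussian profile for each fiber, so amplitudes are integrated counts.
    profiles = np.exp(-0.5 * ((x[None, :] - centers[:, None]) / sigma) ** 2)
    profiles /= np.sqrt(2.0 * np.pi) * sigma

    # Polynomial background terms on a coordinate scaled to [-1, 1].
    xn = 2.0 * x / (row.size - 1) - 1.0
    background = np.vstack([xn ** k for k in range(npoly)])

    # Weighted linear least squares over all fibers and background terms at once.
    basis = np.vstack([profiles, background])
    sqrtw = np.sqrt(ivar)
    coeff, *_ = np.linalg.lstsq((basis * sqrtw).T, row * sqrtw, rcond=None)
    return coeff[:centers.size], coeff[centers.size:]
```

Fitting all fibers of a bundle simultaneously, rather than one at a time, lets the solution account for the overlap between neighboring fiber profiles.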

Fiber flats correct for fiber-to-fiber variations by comparing the differences between fibers equally illuminated by a smooth flat lamp spectrum.

The sky model is derived from the flat-fielded electrons of the sky fibers and then interpolated to the locations of every science fiber and subtracted.

Flux calibration vectors model the instrument and atmospheric throughput per exposure by comparing standard star spectra to a set of models of known flux.

Flux correction vectors adjust for flux mis-calibrations with low-order polynomials per-fiber per-exposure to make different exposures of the same object consistent with each other.

Flux distortion vectors model variations in the throughput across the focal plane.

Putting these together, the “flat-fielded sky-subtracted electrons” in spFrame:

    \[ F_e = \frac{\mathrm{electrons}}{\mathrm{superflat}\cdot\mathrm{fiberflat}} - \mathrm{skymodel} \]

become the “calibrated flux” in spCFrame:

    \[ F = \frac{F_e}{\mathrm{calib}}\cdot\mathrm{fluxcorr}\cdot\mathrm{fluxdistort} \]
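The two relations above can be written as a short worked example. This is only a sketch of the arithmetic: the function names and the toy numbers are illustrative assumptions, and the arrays stand in for per-pixel vectors of a single fiber.

```python
import numpy as np

def flat_fielded_sky_subtracted(electrons, superflat, fiberflat, skymodel):
    """F_e = electrons / (superflat * fiberflat) - skymodel (the spFrame quantity)."""
    return electrons / (superflat * fiberflat) - skymodel

def calibrated_flux(f_e, calib, fluxcorr, fluxdistort):
    """F = (F_e / calib) * fluxcorr * fluxdistort (the spCFrame quantity)."""
    return (f_e / calib) * fluxcorr * fluxdistort

# Toy two-pixel example with made-up values.
electrons = np.array([1200.0, 900.0])
f_e = flat_fielded_sky_subtracted(
    electrons,
    superflat=np.array([1.0, 0.9]),
    fiberflat=np.array([1.2, 1.0]),
    skymodel=np.array([100.0, 200.0]),
)
flux = calibrated_flux(
    f_e,
    calib=np.array([10.0, 10.0]),
    fluxcorr=np.array([1.1, 1.0]),
    fluxdistort=np.array([1.0, 0.95]),
)
```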

The following sections describe each of these steps in more detail.

Flux correction

Individual exposures are initially flux-calibrated with no constraint that the same object has the same flux across different exposures. Empirical “fluxcorr” vectors are broadband corrections to bring the different exposures into alignment for each object prior to coaddition. In DR13 and prior, these were implemented for each spectrum by minimizing

    \[ \chi_i^2 = \sum_{\lambda}\frac{(f_{i\lambda}-f_{ref,\lambda}/a_{i\lambda})^2}{\sigma_{i\lambda}^2 + \sigma_{ref,\lambda}^2/a_{i\lambda}^2} \]

where f_{i\lambda} is the flux of exposure i at wavelength \lambda; f_{ref,\lambda} is the flux of the selected reference exposure; \sigma_{i\lambda} and \sigma_{ref,\lambda} are the corresponding uncertainties; and a_{i\lambda} are low-order Legendre polynomials. The number of polynomial terms is dynamic, up to a maximum of 5 terms. Higher-order terms are added only if they improve the \chi^2 by 5 compared to one fewer term. This approach is biased toward small a_{i\lambda}, since that inflates the denominator and thereby reduces the \chi^2.

Since DR14, we solve the fluxcorr vectors relative to a common weighted coadd F_\lambda which is treated as noiseless compared to the individual exposures.

    \[ \chi_i^2 = \sum_{\lambda} \frac{(f_{i\lambda}-F_\lambda/a_{i\lambda})^2}{\sigma_{i\lambda}^2} \]

We additionally include an empirically-tuned prior that a_{i\lambda} \sim 1 to avoid large excursions in the solution for very low signal-to-noise data.

The fluxcorr terms are Chebyshev polynomials instead of Legendre polynomials. In practice, the code minimizes ((f - Fa)/\sigma)^2 and then returns 1/a. The prior is weighted by the data weights such that the relative strength of the prior vs. the data is approximately independent of S/N.
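The DR14+ scheme can be sketched as a small linear least-squares problem. The code below is an assumption-laden illustration, not the pipeline code: it fits a = Σ c_k T_k(x) by minimizing ((f - Fa)/σ)^2, adds a quadratic prior pulling the coefficients toward a = 1, and returns 1/a. The function name `fit_fluxcorr`, the fixed number of terms, and the way the prior strength is scaled with the data weights (`prior_strength` times the mean diagonal of the normal matrix) are all hypothetical choices.

```python
import numpy as np
from numpy.polynomial import chebyshev

def fit_fluxcorr(wave, flux, ivar, coadd, nterms=3, prior_strength=0.1):
    """Fit flux ~ coadd * a(wave), with a(wave) a Chebyshev series; return 1/a.

    The coadd is treated as noiseless. The prior a ~ 1 is scaled by the data
    weights (an assumed form) so its relative strength does not depend on S/N.
    """
    # Map wavelength onto [-1, 1], the natural Chebyshev domain.
    x = 2.0 * (wave - wave.min()) / (wave.max() - wave.min()) - 1.0
    basis = chebyshev.chebvander(x, nterms - 1)   # shape (npix, nterms)

    # Linear model: flux = (coadd * basis) @ c
    design = basis * coadd[:, None]
    ata = (design * ivar[:, None]).T @ design
    atb = (design * ivar[:, None]).T @ flux

    # Prior toward c = (1, 0, 0, ...), i.e. a = 1 at every wavelength.
    c_prior = np.zeros(nterms)
    c_prior[0] = 1.0
    lam = prior_strength * np.trace(ata) / nterms
    c = np.linalg.solve(ata + lam * np.eye(nterms), atb + lam * c_prior)

    return 1.0 / (basis @ c)
```

Multiplying an exposure by the returned vector brings it onto the scale of the coadd, which matches the role of fluxcorr in the calibrated-flux relation for spCFrame.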

Flux distortion

The flux distortion vectors are parameterized in terms of magnitude (i.e., log-flux), with achromatic terms in x, y, x^2, y^2, and xy, where x and y are the linear coordinates XFOCAL and YFOCAL from the plugmap.

There are also chromatic terms that scale as \tilde{\lambda} = 1 - (5070/\lambda)^2, since that function gives an equal effect between 3900 and 5070 Å as between 5070 and 9000 Å. There are also magnitude offsets as a function of spectrograph ID, and a chromatic offset as a function of spectrograph ID. The 13 parameters are:

    \[ \begin{aligned} F_{new} = F_{orig}\,(1+a_{0}s_{1}+a_{1}s_{2})\exp\big( & a_{2}x + a_{3}y + a_{4}xy + a_{5}x^{2} + a_{6}y^{2} + a_{7}\tilde{\lambda}x + a_{8}\tilde{\lambda}y \\ & + a_{9}\tilde{\lambda}s_{1} + a_{10}\tilde{\lambda}s_{2} + a_{11}\tilde{\lambda}^{2}s_{1} + a_{12}\tilde{\lambda}^{2}s_{2}\big), \end{aligned} \]

where s_1 = 1 if specid = 1 (else 0) and s_2 = 1 if specid = 2 (else 0).

Only SPECTROPHOTO_STD or REDDEN_STD objects are used to compute the flux distortion. The procedure minimizes differences between spectro-flux and photometry (CALIBFLUX). Only g, r and i-bands are used.
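The 13-parameter model can be evaluated directly from the formula above. The following is a minimal sketch of that evaluation only (it does not perform the minimization against CALIBFLUX); the function names are hypothetical, and `a` is assumed to hold the coefficients a_0 through a_12 in order.

```python
import numpy as np

def lambda_tilde(wave):
    """Chromatic coordinate: 1 - (5070 / lambda)^2, with lambda in Angstroms."""
    return 1.0 - (5070.0 / np.asarray(wave, dtype=float)) ** 2

def flux_distortion(wave, x, y, specid, a):
    """Multiplicative distortion F_new / F_orig for one fiber.

    wave   : wavelengths in Angstroms
    x, y   : focal-plane coordinates (XFOCAL, YFOCAL) of the fiber
    specid : spectrograph ID (1 or 2)
    a      : sequence of the 13 coefficients a_0 .. a_12
    """
    a = np.asarray(a, dtype=float)
    if a.shape != (13,):
        raise ValueError("expected 13 coefficients")
    lt = lambda_tilde(wave)
    s1 = 1.0 if specid == 1 else 0.0
    s2 = 1.0 if specid == 2 else 0.0
    exponent = (
        a[2] * x + a[3] * y + a[4] * x * y + a[5] * x**2 + a[6] * y**2
        + a[7] * lt * x + a[8] * lt * y
        + a[9] * lt * s1 + a[10] * lt * s2
        + a[11] * lt**2 * s1 + a[12] * lt**2 * s2
    )
    return (1.0 + a[0] * s1 + a[1] * s2) * np.exp(exponent)
```

With all coefficients set to zero the distortion is exactly 1, and `lambda_tilde` is zero at 5070 Å and nearly antisymmetric between 3900 Å and 9000 Å, which is the property the text cites for choosing this form.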

One-Dimensional Pipeline (spec1d)

Reference: Bolton et al. (2012)

The spec1d pipeline reads spectra and determines classifications, redshifts, and other spectroscopic parameters. It produces the following files.

SAS files generated by the spec1d pipeline

File Type | Input/Output | Description
--- | --- | ---
spZline | Output | emission line fits
spZall | Output | all spectroscopic classifications and redshifts
spZbest | Output | spectroscopic classifications and redshifts
spXCSAO | Output | stellar radial velocity parameters (valid only for stars)

Redshifts, Classifications and Velocity Dispersions

For each spectrum, we estimate a redshift and perform a classification into STAR, GALAXY, or QSO. In addition, we define subclasses for some of these. Here we summarize the redshift, classification, and velocity dispersion methods, which are described in detail in Bolton et al. (2012).

This page describes the algorithms used in SDSS-III, SDSS-IV, and SDSS-V. SDSS-IV Data Releases included additional algorithms for galaxy redshifts (DR14: “redmonster” and DR17: “redrock”) and quasar classification (DR16 Quasar Catalog). More detailed descriptions and documentation can be found on the corresponding Data Release pages.

Note that for galaxy targets in the BOSS CMASS and LOWZ samples (DR9 and later), the redshift measurements, errors, warning flags, and classifications should be taken from Z_NOQSO, Z_ERR_NOQSO, ZWARNING_NOQSO, and CLASS_NOQSO, respectively. These fits do not include quasar templates when fitting the spectra of objects targeted as galaxies. This change relative to SDSS-I/II is motivated by the lower signal-to-noise ratio of BOSS galaxy spectroscopy, which admits spurious and unphysical quasar classifications of galaxy spectra at an unacceptably high rate. Otherwise, the idlspec2d codes used in SDSS-I/II and SDSS-III/IV/V are very similar. The BOSS code includes new redshift and classification templates for stars, galaxies, and quasars, as described in Bolton et al. (2012).

In SDSS-V, the QSO PCA templates utilized have been updated using 849 SDSS-IV RM quasars (from the 2014 coadded spectra; Shen et al. 2019) and the empca code (Bailey 2012). These new templates were implemented with 10 eigenvectors instead of the 4 used in SDSS-I/II/III/IV. This change was made to improve PCA reconstruction and the redshift estimation.

The essential strategy for redshift fitting is to perform, at each potential redshift, a least-squares fit to each spectrum given the uncertainties, using a fairly general set of models, for galaxies, for stars, for cataclysmic variables, and for quasars. The best fit model and redshift are chosen and reported for the object. The fits are applied without regard to the target category of the object (so that if an object targeted as a galaxy turns out to be a star, we can identify it as such, although note the exception mentioned above for galaxy targets in BOSS).

Redshift & Classification

Redshift and classification templates for galaxy, quasar, and CV star classes are constructed by performing a rest-frame principal-component analysis (PCA) of training samples of known redshift. The leading “eigenspectra” from the PCA results are used to define a linear template basis that is used to model the spectra in the redshift analysis. Galaxy and quasar classes use a basis consisting of four eigenspectra (for quasars, the SDSS-V templates described above use 10 instead), while the CV star class uses a basis consisting of three eigenspectra. The class of all non-CV stars uses a set of 123 stellar archetype spectra (rather than a PCA-defined basis), mostly drawn from The Indo-US Library of Coudé Feed Stellar Spectra (Valdes et al. 2004), supplemented by model-atmosphere data from the POLLUX Database of Stellar Spectra (Palacios et al. 2010) and additional stellar types from BOSS data.

For all spectra, a range of trial galaxy redshifts is explored from redshift -0.01 to 1.00. Trial redshifts are separated by 138 km/s (i.e., two pixels in the reduced spectra). At each trial redshift, the galaxy eigenbasis is shifted accordingly, and the error-weighted data spectrum is modeled as a minimum-chi-squared linear combination of the redshifted eigenspectra, plus a quadratic polynomial to absorb low-order calibration uncertainties. The chi-squared value for this trial redshift is stored, and the analysis proceeds to the next trial redshift. The trial redshifts corresponding to the 5 lowest chi-squared values are then redetermined locally to sub-pixel accuracy, and errors in these values are determined from the curvature of the chi-squared curve at the position of the minimum.
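The trial-redshift scan described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the idlspec2d code: the data and the rest-frame templates are assumed to share the same constant-velocity (log-wavelength) pixel grid, so each trial redshift corresponds to an integer starting index into the template array. The function names `chi2_scan` and `refine_minimum`, and the use of a three-point parabola for the local refinement, are hypothetical choices.

```python
import numpy as np

def chi2_scan(flux, ivar, templates, offsets, npoly=3):
    """Chi-squared of the best linear fit at each trial template offset.

    flux, ivar : observed spectrum and inverse variance (length npix)
    templates  : rest-frame eigenspectra, shape (ntemplate, ntpix), ntpix > npix
    offsets    : starting template index for each trial redshift
    npoly      : polynomial terms added to the basis (3 = quadratic)
    """
    npix = flux.size
    xn = np.linspace(-1.0, 1.0, npix)
    poly = np.vstack([xn ** k for k in range(npoly)])
    sqrtw = np.sqrt(ivar)
    b = flux * sqrtw
    chi2 = np.empty(len(offsets))
    for j, off in enumerate(offsets):
        # Shifted eigenspectra plus low-order polynomial terms.
        basis = np.vstack([templates[:, off:off + npix], poly])
        design = (basis * sqrtw).T
        coeff, *_ = np.linalg.lstsq(design, b, rcond=None)
        chi2[j] = np.sum((b - design @ coeff) ** 2)
    return chi2

def refine_minimum(z, chi2):
    """Parabolic refinement of the chi-squared minimum.

    Returns (z_best, z_err), where z_err is the offset at which the fitted
    parabola rises by 1 above its minimum (set by the curvature).
    """
    i = int(np.argmin(chi2))
    i = min(max(i, 1), len(z) - 2)          # keep a three-point window in range
    dz = np.asarray(z[i - 1:i + 2]) - z[i]  # center for numerical stability
    c2, c1, _ = np.polyfit(dz, chi2[i - 1:i + 2], 2)
    return z[i] - c1 / (2.0 * c2), 1.0 / np.sqrt(c2)
```

The sketch refines only the single lowest minimum; the pipeline, as described above, does this for the 5 lowest chi-squared values.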

Quasar redshifts are determined for all spectra in a similar fashion to the galaxy redshifts, but over a larger range of exploration (z = 0.0333 to 7.00) and with a larger initial velocity step (276 km/s). Star redshifts are determined separately for each of the 123 single sub-type templates (excluding CV stars) using a single eigenspectrum plus a cubic polynomial for each subtype, over a radial velocity range from -1200 to +1200 km/s. Only the single best radial velocity is retained for each stellar subtype. Because of their intrinsic emission-line diversity, CV stars are fit with their 3-component PCA eigenbasis plus a quadratic polynomial, over a radial velocity range from -1000 to +1000 km/s.

Once the best 5 galaxy redshifts, best 5 quasar redshifts, and best stellar sub-type radial velocities for a given spectrum have been determined, these identifications are sorted in order of increasing reduced chi-squared, and the difference in reduced chi-squared between each fit and the next-best fit with a radial velocity difference of greater than 1000 km/s is computed. The model spectra for all fits are redetermined, and used to compute statistics of the distribution of data-minus-model residual values in the spectrum for each fit. Both the spectra and the models are integrated over the SDSS imaging filter band-passes to determine the implied broadband magnitudes.

The combination of redshift and template class that yields the overall best fit (in terms of lowest reduced chi-squared) is adopted as the pipeline measurement of the redshift and classification of the spectrum. Several warning flags can be set so as to indicate low confidence in this identification, which are documented in the online data model. The most common flag is set to indicate that the change in reduced chi-squared between the best and next-best redshift/classification is either less than 0.01 in an absolute sense, or less than 1% of the best model reduced chi-squared, which indicates a poorly determined redshift.
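The ranking and the most common warning condition can be sketched in a few lines. This is an illustrative assumption-based example, not the pipeline code: the dictionary keys, the function names, and the conversion from redshift difference to velocity difference (c |Δz| / (1 + z)) are hypothetical choices.

```python
C_KMS = 299792.458  # speed of light in km/s

def rank_fits(fits, min_dv=1000.0):
    """Sort candidate fits by reduced chi-squared and attach 'rchi2diff'.

    fits : iterable of dicts with keys 'cls', 'z', and 'rchi2'
    For each fit, rchi2diff is the difference to the next-best fit whose
    velocity differs by more than min_dv km/s (None if there is no such fit).
    """
    ranked = [dict(f) for f in sorted(fits, key=lambda f: f["rchi2"])]
    for i, fit in enumerate(ranked):
        fit["rchi2diff"] = None
        for other in ranked[i + 1:]:
            dv = C_KMS * abs(other["z"] - fit["z"]) / (1.0 + fit["z"])
            if dv > min_dv:
                fit["rchi2diff"] = other["rchi2"] - fit["rchi2"]
                break
    return ranked

def is_poorly_determined(best, abs_thresh=0.01, rel_thresh=0.01):
    """True if the best fit is not separated enough from the next-best fit."""
    diff = best["rchi2diff"]
    if diff is None:
        return False
    return diff < abs_thresh or diff < rel_thresh * best["rchi2"]

candidates = [
    {"cls": "GALAXY", "z": 0.5000, "rchi2": 1.000},
    {"cls": "GALAXY", "z": 0.5005, "rchi2": 1.002},  # about 100 km/s away: skipped
    {"cls": "QSO", "z": 2.1000, "rchi2": 1.300},
]
ranked = rank_fits(candidates)
```

In this toy case the second galaxy fit is within 1000 km/s of the best fit, so the comparison is made against the quasar fit instead and the best fit is not flagged.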

Velocity Dispersions

At the best galaxy redshift, the stellar velocity dispersion is also determined. This is done by computing a PCA basis of 24 eigenspectra from the ELODIE stellar library (Prugniel & Soubiran 2001), convolved and binned to match the instrumental resolution and constant-velocity pixel scale of the reduced SDSS spectra, and broadened by Gaussian kernels of successively larger velocity width ranging from 100 to 850 km/s in steps of 25 km/s. The broadened stellar template sets are redshifted to the best-fit galaxy redshift, and the spectrum is modeled as a least-squares linear combination of the basis at each trial broadening, masking pixels at the position of common emission lines in the galaxy-redshift rest frame. The best-fit velocity dispersion is determined by fitting locally for the position of the minimum of chi-squared versus trial velocity dispersion in the neighborhood of the lowest gridded chi-squared value. Velocity-dispersion error estimates are determined from the curvature of the chi-squared curve at the global minimum, and are set to a negative value if the best value occurs at the high-velocity end of the fitting range. Reported best-fit velocity-dispersion values less than about 100 km/s are below the resolution limit of the BOSS spectrograph and are to be regarded with caution.
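The grid search over trial broadenings can be illustrated with a short sketch. It is a simplification under stated assumptions and not the pipeline implementation: the pixel size of 69 km/s is inferred from the statement above that 138 km/s corresponds to two pixels, the templates are assumed to be already redshifted and sampled on the data grid, and the emission-line masking and instrumental-resolution matching are omitted. The function names are hypothetical.

```python
import numpy as np

PIX_KMS = 69.0                              # assumed pixel size (138 km/s = 2 pixels)
TRIAL_SIGMAS = np.arange(100.0, 851.0, 25)  # 100 to 850 km/s in steps of 25 km/s

def gaussian_kernel(sigma_pix):
    """Unit-sum Gaussian kernel truncated at 5 sigma."""
    half = int(np.ceil(5.0 * sigma_pix))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
    return kernel / kernel.sum()

def broaden(template, sigma_kms):
    """Broaden a template on a constant-velocity pixel grid."""
    return np.convolve(template, gaussian_kernel(sigma_kms / PIX_KMS), mode="same")

def vdisp_scan(flux, ivar, templates, trial_sigmas=TRIAL_SIGMAS):
    """Chi-squared of the best linear combination at each trial dispersion."""
    sqrtw = np.sqrt(ivar)
    b = flux * sqrtw
    chi2 = np.empty(len(trial_sigmas))
    for j, sigma in enumerate(trial_sigmas):
        basis = np.vstack([broaden(t, sigma) for t in templates])
        design = (basis * sqrtw).T
        coeff, *_ = np.linalg.lstsq(design, b, rcond=None)
        chi2[j] = np.sum((b - design @ coeff) ** 2)
    return chi2
```

The best-fit dispersion would then follow from a local fit around the lowest gridded chi-squared value, with the error taken from the curvature, as described above.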

Not all redshifts and classifications from the pipeline are reliable! Most commonly, if the spectrum is of overly low S/N ratio or has no strong emission or absorption features, multiple templates will give equally good (or bad) chi2 fits. Other common indicators of trouble could be that a large fraction of the pixels in a spectrum are considered unreliable, or fit emission lines in a spectrum classified as a quasar are negative. These are indicated in a flag called ZWARNING; if ZWARNING=0 (as it is for the vast majority of the spectra), we have a high degree of confidence that the classification and redshift are reliable.

The previous idlspec2d velocity-dispersion measurements are implemented for BOSS spectra in exactly the same manner as for SDSS-I/II spectra in previous data releases. For BOSS DR9 and later, velocity-dispersion likelihood functions are also computed for objects targeted as galaxies and assigned a CLASS_NOQSO of GALAXY. These are reported in the VDISP_LNL vector over the same 100 to 850 km/s baseline, in steps of 25 km/s, as above. This computation uses only 5 stellar eigenspectra, and marginalizes over redshift uncertainties. The applications of VDISP_LNL to hierarchical galaxy population analysis are described in Shu et al. (2012).

Other Values

Flux values, redshifts, line-widths, and continuum levels are computed for common rest-frame ultraviolet and optical emission lines by fitting multiple Gaussian-plus-background models at their observed positions within the spectra. The initial-guess emission-line redshift is taken from the main redshift analysis, but is subsequently re-fit nonlinearly in the emission-line fitting routine. All lines are constrained to have the same redshift except for Lyman-alpha. Intrinsic line-widths are constrained to be the same for all emission lines, with the exception of the hydrogen Balmer series, which is given its own line-width as a free parameter, and Lyman-alpha and N V 1240, which each have their own free line-width parameters. Known 3:1 line flux ratios between the members of the [OIII] 5007 and [NII] 6583 doublets are imposed. When the signal-to-noise of the line measurements permits doing so, spectra classified as galaxies and quasars are sub-classified into AGN and star-forming galaxies based upon measured [OIII]/Hβ and [NII]/Hα line ratios, and galaxies with very high equivalent width in Hα are sub-classified as starburst objects. See the spectro catalogs page for details on the line ratio criteria.

Output

The output of the redshift and classification pipeline is stored in three files for each spectroscopic plate observation. The spZbest file contains the detailed results for the best-fit redshift/classification of each spectrum and includes the best-fit model spectrum that was used to make the redshift measurement. The spZall file contains parameters from all the next-best identifications, without the full representation of the associated model spectra (although these can be reconstructed from template files and reported coefficients). The spZline file contains the results of the emission-line fits for each object.

Stellar Radial Velocity Pipeline

In SDSS-V, a new stellar radial velocity pipeline, pyXCSAO, was added to the spec1d pipeline. pyXCSAO is a Python replication of IRAF XCSAO. It determines redshifts and velocity dispersions by cross-correlating a spectrum against template spectra of known velocities, yielding radial velocities, stellar temperatures, surface gravity, metallicity, and [α/H] abundance. The spXCSAO file contains the results of the radial velocity fits for stars.

Post Pipeline Products

In addition to the products of the spec2d and spec1d pipelines, the idlspec2d pipeline produces a few additional products that merge the outputs of the pipelines into summary products.

Post Pipeline Final Products

File Type | Description
--- | ---
spec (full) | All spectral information for a single PLATE-MJD-FIBER
spec (lite) | All spectral information for a single PLATE-MJD-FIBER except the individual exposures
spAll | A summary of the Spectro-1D outputs for all spectra merged with the matched photometric data and spInspect data

Custom Target Coadds

In SDSS-V, we added a new custom target coadds product. This product is available on the SAS and on the CAS using SkyServer. These custom coadds include all exposures of a target within a given date range, regardless of the plate/field on which it was observed.
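The idea of a target-level coadd can be sketched as follows. This is only an illustration and not the pipeline's coadding scheme: it assumes every exposure is already resampled onto a common wavelength grid, groups exposures by CatalogID within an MJD range regardless of plate, and combines them with an inverse-variance weighted mean (an assumed weighting). The function name and dictionary keys are hypothetical.

```python
from collections import defaultdict

import numpy as np

def coadd_by_target(exposures, mjd_min, mjd_max):
    """Inverse-variance coadd of all exposures of each target in an MJD range.

    exposures : iterable of dicts with keys 'catalogid', 'plate', 'mjd',
                'flux', and 'ivar' (arrays on a common wavelength grid)
    Returns {catalogid: (coadded_flux, coadded_ivar)}.
    """
    groups = defaultdict(list)
    for exp in exposures:
        if mjd_min <= exp["mjd"] <= mjd_max:
            groups[exp["catalogid"]].append(exp)  # the plate is deliberately ignored

    coadds = {}
    for catalogid, exps in groups.items():
        flux = np.array([e["flux"] for e in exps], dtype=float)
        ivar = np.array([e["ivar"] for e in exps], dtype=float)
        total = ivar.sum(axis=0)
        safe = np.where(total > 0, total, 1.0)    # avoid division by zero
        mean = np.where(total > 0, (flux * ivar).sum(axis=0) / safe, 0.0)
        coadds[catalogid] = (mean, total)
    return coadds
```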

eFEDS Coadds

The eFEDS coadds were produced by treating all eFEDS-targeted plates as a single pseudo-plate, which was then run through the second part of the spec2d pipeline (the part that combines all exposures) and through spec1d. The coadding was done on an RA-DEC basis, so the number of exposures combined for each target varies with the number of exposures and fields in which it was observed.

Located: https://data.sdss.org/sas/dr18/spectro/boss/redux/eFEDS/


Dependent External Packages

idlutils (Github)
idlutils is a collection of IDL functions and routines used by a variety of SDSS software.

pydl (Github – external)
pydl is a package that consists of Python replacements for IDL functions, both built-in and from external astronomical libraries.

pyXCSAO (Github – external)
pyXCSAO is a python package designed to replicate the functionality of IRAF XCSAO.

dustmaps (Green 2018, Github – external)
dustmaps provides a unified interface for several 2D and 3D maps of interstellar dust reddening and extinction. idlspec2d makes use of the Bayestar 2015 dust maps (Green, Schlafly, Finkbeiner et al. 2015).

sdssdb (Github)
sdssdb contains the source catalogs, targeting catalogs, and operational databases.


SDSS Product Dependencies

elodie (SVN Product)
A database of high- and medium-resolution stellar spectra (Prugniel & Soubiran 2001) used by spec1d to classify spectra and determine stellar parameters.

dust (SVN Product)
A catalog of dust extinction models, including the SFD model.

speclog (SVN Product)
speclog is an SDSS product that contains information about SDSS BOSS plate operations, including seeing measured by the guider (guiderMon-{MJD}.par), plate plug maps (plPlugMapM-{plateid}-{mjd}-{plugid}.par), and header correction files to change the header exposure values (sdHdrFix-{mjd}.par).

platelist (SVN Product)
platelist is an SDSS product that contains information on the plate designs and plugging. The plateHoles files include additional metadata associated with the targets on a plate.

specflat (SVN Product)
specflat is an SDSS product that contains master calibration frames and bad pixel masks for use in the idlspec2d pipeline.

gaia/dr2 (external)
idlspec2d utilizes gaia_source/csv to calculate the distance to standard stars from Gaia DR2 proper motion.

