Redshifts, Classifications and Velocity Dispersions
For each spectrum, we estimate a redshift and perform a classification into STAR, GALAXY, or QSO. In addition, we define subclasses for some of these. Here we summarize the redshift, classification, and velocity dispersion methods, which are described in detail in Bolton et al. (2012).
This page describes the algorithms used in SDSS-III and SDSS-IV up to Data Release 16.
Note: for Data Releases 14 and 16, improved algorithms for classification and redshift estimation were used, and their results are available in addition to those obtained with the algorithm described on this page.
For Data Release 14, the algorithm "redmonster" was used to measure redshifts for the galaxy samples. To find out more about this algorithm, go to this page. For the quasar sample of eBOSS, the classification algorithm is described here.
For Data Release 16, a new algorithm named "redrock" was used for the galaxy samples. This algorithm and its redshifts will be publicly available by July 2020 when the cosmological large-scale structure catalogs will be released.
The software used is called idlspec2d and is publicly available in our software repository. Note that for galaxy targets in the DR9 and later BOSS CMASS and LOWZ samples, redshifts should now be selected using Z_NOQSO, Z_ERR_NOQSO, ZWARNING_NOQSO, and CLASS_NOQSO for redshift measurements, errors, warning flags, and classifications (respectively). These fits do not include quasar templates in the fitting of the spectra of objects targeted as galaxies. This change relative to SDSS-I/II is motivated by the lower signal-to-noise ratio of BOSS galaxy spectroscopy, which admits spurious and unphysical quasar classifications of galaxy spectra at an unacceptably high rate. Otherwise, the idlspec2d codes used in SDSS-I/II and BOSS are very similar. The BOSS code includes new redshift and classification templates for stars, galaxies, and quasars, as described in Bolton et al. (2012).
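As a minimal sketch of the column choice described above, the following Python helper picks the appropriate set of fields for one catalog row. The row is assumed to be a plain dictionary keyed by column name, and the function name and the `is_boss_galaxy_target` argument are hypothetical; how a spectrum is identified as a CMASS or LOWZ galaxy target is left to the caller.

```python
def select_redshift(row, is_boss_galaxy_target):
    """Pick the redshift fields appropriate for this target.

    row: dict keyed by catalog column name (an assumption of this sketch).
    is_boss_galaxy_target: True for BOSS CMASS/LOWZ galaxy targets, for
    which the fits made without quasar templates should be used.
    """
    suffix = "_NOQSO" if is_boss_galaxy_target else ""
    return {
        "z": row["Z" + suffix],
        "z_err": row["Z_ERR" + suffix],
        "zwarning": row["ZWARNING" + suffix],
        "class": row["CLASS" + suffix],
    }
```

For a galaxy target whose unrestricted best fit is a spurious quasar, this returns the values from the fit made without quasar templates.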
The essential strategy for redshift fitting is to perform, at each potential redshift, a least-squares fit to each spectrum given the uncertainties, using a fairly general set of models for galaxies, stars, cataclysmic variables, and quasars. The best-fit model and redshift are chosen and reported for the object. The fits are applied without regard to the target category of the object (so that if an object targeted as a galaxy turns out to be a star, we can identify it as such, although note the exception mentioned above for galaxy targets in BOSS).
Redshift & Classification
Redshift and classification templates for galaxy, quasar, and CV star classes are constructed by performing a rest-frame principal-component analysis (PCA) of training samples of known redshift. The leading "eigenspectra" from the PCA results are used to define a linear template basis that is used to model the spectra in the redshift analysis. Galaxy and quasar classes use a basis consisting of four eigenspectra, while the CV star class uses a basis consisting of three eigenspectra. The class of all non-CV stars uses a set of 123 stellar archetype spectra (rather than a PCA-defined basis), mostly drawn from The Indo-US Library of Coudé Feed Stellar Spectra (Valdes et al. 2004), supplemented by model-atmosphere data from the POLLUX Database of Stellar Spectra (Palacios et al. 2010) and additional stellar types from BOSS data.
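To illustrate what modeling a spectrum with a linear template basis involves, here is a small pure-Python sketch of an inverse-variance-weighted least-squares fit. It is not the idlspec2d implementation: the function names are hypothetical, the basis vectors are assumed to be already resampled onto the pixels of the spectrum, and the normal equations are solved by simple Gaussian elimination rather than a more robust decomposition.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    No protection against a singular matrix; adequate for a sketch only.
    """
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_basis(flux, ivar, basis):
    """Best linear combination of the basis vectors for one spectrum.

    flux, ivar: per-pixel flux and inverse variance.
    basis: list of vectors (eigenspectra and any polynomial terms),
           each the same length as flux.
    Returns (coefficients, chi2).
    """
    npix, nvec = len(flux), len(basis)
    # Normal equations of the weighted least-squares problem.
    normal = [[sum(ivar[k] * basis[i][k] * basis[j][k] for k in range(npix))
               for j in range(nvec)] for i in range(nvec)]
    rhs = [sum(ivar[k] * basis[i][k] * flux[k] for k in range(npix))
           for i in range(nvec)]
    coeff = solve_linear(normal, rhs)
    model = [sum(coeff[i] * basis[i][k] for i in range(nvec))
             for k in range(npix)]
    chi2 = sum(ivar[k] * (flux[k] - model[k]) ** 2 for k in range(npix))
    return coeff, chi2
```

In a redshift search, `fit_basis` would be called once per trial redshift with the basis shifted accordingly, and the returned chi-squared stored for that trial.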
For all spectra, a range of trial galaxy redshifts is explored from redshift -0.01 to 1.00. Trial redshifts are separated by 138 km/s (i.e., two pixels in the reduced spectra). At each trial redshift, the galaxy eigenbasis is shifted accordingly, and the error-weighted data spectrum is modeled as a minimum-chi-squared linear combination of the redshifted eigenspectra, plus a quadratic polynomial to absorb low-order calibration uncertainties. The chi-squared value for this trial redshift is stored, and the analysis proceeds to the next trial redshift. The trial redshifts corresponding to the 5 lowest chi-squared values are then redetermined locally to sub-pixel accuracy, and errors in these values are determined from the curvature of the chi-squared curve at the position of the minimum.
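The trial grid and the local refinement can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline code: a constant velocity step is taken to mean uniform spacing in ln(1+z), the refinement fits a parabola through the three grid points around the lowest chi-squared, and the 1-sigma error uses the conventional rule that chi-squared rises by 1 above its minimum (the text says only that errors come from the curvature). Function names are hypothetical.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def trial_redshifts(z_min=-0.01, z_max=1.00, step_kms=138.0):
    """Trial redshifts with constant velocity spacing (uniform in ln(1+z))."""
    dlnz = step_kms / C_KMS
    start = math.log1p(z_min)
    n = int(math.floor((math.log1p(z_max) - start) / dlnz)) + 1
    return [math.exp(start + i * dlnz) - 1.0 for i in range(n)]


def refine_minimum(x, chi2):
    """Refine the gridded chi2 minimum with a three-point parabola.

    Returns (x_min, sigma), where sigma is the half-width at which the
    parabola rises by 1 above its minimum (assumed error convention).
    """
    i = min(range(len(chi2)), key=chi2.__getitem__)
    if i == 0 or i == len(chi2) - 1:
        raise ValueError("minimum lies on the edge of the grid")
    x0, x1, x2 = x[i - 1], x[i], x[i + 1]
    # Newton divided differences; valid for unevenly spaced points.
    d1 = (chi2[i] - chi2[i - 1]) / (x1 - x0)
    d2 = (chi2[i + 1] - chi2[i]) / (x2 - x1)
    a = (d2 - d1) / (x2 - x0)  # half of the second derivative
    if a <= 0.0:
        raise ValueError("no local minimum: curvature is not positive")
    x_min = 0.5 * (x0 + x1) - d1 / (2.0 * a)
    sigma = 1.0 / math.sqrt(a)
    return x_min, sigma
```

The pipeline repeats the refinement for each of the 5 lowest chi-squared minima; the sketch refines only the single lowest one.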
Quasar redshifts are determined for all spectra in similar fashion to the galaxy redshifts, but over a larger range of exploration (z = 0.0333 to 7.00) and with a larger initial velocity step (276 km/s). Star redshifts are determined separately for each of the 123 single sub-type templates (excluding CV stars) using a single eigenspectrum plus a cubic polynomial for each subtype, over a radial velocity range from -1200 to +1200 km/s. Only the single best radial velocity is retained for each stellar subtype. Because of their intrinsic emission-line diversity, CV stars are fit with their 3-component PCA eigenbasis plus a quadratic polynomial, over a radial velocity range from -1000 to +1000 km/s.
Once the best 5 galaxy redshifts, best 5 quasar redshifts, and best stellar sub-type radial velocities for a given spectrum have been determined, these identifications are sorted in order of increasing reduced chi-squared, and the difference in reduced chi-squared between each fit and the next-best fit with a radial velocity difference of greater than 1000 km/s is computed. The model spectra for all fits are redetermined, and used to compute statistics of the distribution of data-minus-model residual values in the spectrum for each fit. Both the spectra and the models are integrated over the SDSS imaging filter band-passes to determine the implied broadband magnitudes.
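The sorting and differencing step can be sketched like this. Each candidate fit is assumed to be a dictionary with hypothetical keys `cls`, `z`, and `rchi2`, and the velocity difference between two redshifts is taken as c times the difference in ln(1+z); the exact definition used by the pipeline is not given in the text, so treat that choice as an assumption.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def rank_fits(fits, min_dv_kms=1000.0):
    """Sort candidate fits by reduced chi2 and attach 'rchi2diff'.

    'rchi2diff' is the reduced chi2 difference to the next-best fit whose
    velocity differs by more than min_dv_kms, or None if there is none.
    """
    ranked = sorted(fits, key=lambda f: f["rchi2"])
    out = []
    for i, fit in enumerate(ranked):
        diff = None
        for other in ranked[i + 1:]:
            dv = C_KMS * abs(math.log1p(other["z"]) - math.log1p(fit["z"]))
            if dv > min_dv_kms:
                diff = other["rchi2"] - fit["rchi2"]
                break
        out.append({**fit, "rchi2diff": diff})
    return out
```

Skipping candidates within 1000 km/s means that two nearly identical redshifts from the same chi-squared valley do not make a secure identification look ambiguous.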
The combination of redshift and template class that yields the overall best fit (in terms of lowest reduced chi-squared) is adopted as the pipeline measurement of the redshift and classification of the spectrum. Several warning flags, documented in the online data model, can be set to indicate low confidence in this identification. The most common flag is set to indicate that the change in reduced chi-squared between the best and next-best redshift/classification is either less than 0.01 in an absolute sense, or less than 1% of the best model reduced chi-squared, which indicates a poorly determined redshift.
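The criterion for this most common flag translates directly into a two-part test. The function and argument names below are hypothetical; only the two thresholds come from the text.

```python
def small_delta_chi2(rchi2_best, rchi2diff, abs_tol=0.01, frac_tol=0.01):
    """True if the best fit is not clearly separated from the next-best.

    rchi2_best: reduced chi2 of the best fit.
    rchi2diff:  reduced chi2 of the next-best fit minus that of the best.
    """
    return rchi2diff < abs_tol or rchi2diff < frac_tol * rchi2_best
```

Because the second condition scales with the best-fit reduced chi-squared, a poorly fitting spectrum needs a proportionally larger separation to avoid the flag.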
At the best galaxy redshift, the stellar velocity dispersion is also determined. This is done by computing a PCA basis of 24 eigenspectra from the ELODIE stellar library (Prugniel & Soubiran 2001), convolved and binned to match the instrumental resolution and constant-velocity pixel scale of the reduced SDSS spectra, and broadened by Gaussian kernels of successively larger velocity width ranging from 100 to 850 km/s in steps of 25 km/s. The broadened stellar template sets are redshifted to the best-fit galaxy redshift, and the spectrum is modeled as a least-squares linear combination of the basis at each trial broadening, masking pixels at the position of common emission lines in the galaxy-redshift rest frame. The best-fit velocity dispersion is determined by fitting locally for the position of the minimum of chi-squared versus trial velocity dispersion in the neighborhood of the lowest gridded chi-squared value. Velocity-dispersion error estimates are determined from the curvature of the chi-squared curve at the global minimum, and are set to a negative value if the best value occurs at the high-velocity end of the fitting range. Reported best-fit velocity-dispersion values less than about 100 km/s are below the resolution limit of the SDSS spectrograph and are to be regarded with caution.
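The trial grid and the Gaussian broadening can be sketched in a few lines. This is an illustration, not the pipeline code: the pixel scale of 69 km/s is inferred from the statement elsewhere on this page that 138 km/s corresponds to two pixels, the kernel is truncated at 5 sigma, and the spectrum is zero-padded at its ends; all three choices and the function names are assumptions.

```python
import math

PIXEL_KMS = 69.0  # assumed pixel scale: 138 km/s is quoted as two pixels


def vdisp_grid(lo=100, hi=850, step=25):
    """Trial velocity dispersions in km/s: 100, 125, ..., 850."""
    return list(range(lo, hi + 1, step))


def gaussian_kernel(sigma_kms, nsigma=5.0):
    """Normalized Gaussian kernel sampled on constant-velocity pixels."""
    sigma_pix = sigma_kms / PIXEL_KMS
    half = int(math.ceil(nsigma * sigma_pix))
    weights = [math.exp(-0.5 * (k / sigma_pix) ** 2)
               for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def broaden(flux, sigma_kms):
    """Convolve a template with the kernel (zero-padded at the edges)."""
    kernel = gaussian_kernel(sigma_kms)
    half = len(kernel) // 2
    out = []
    for i in range(len(flux)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(flux):
                acc += w * flux[k]
        out.append(acc)
    return out
```

Because the pixels have constant velocity width, one kernel per trial dispersion serves the whole wavelength range, so each template needs to be broadened only once per grid value.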
Not all redshifts and classifications from the pipeline are reliable! Most commonly, if the spectrum has a low S/N ratio or no strong emission or absorption features, multiple templates will give equally good (or bad) chi-squared fits. Other common indicators of trouble are that a large fraction of the pixels in a spectrum are considered unreliable, or that fitted emission lines in a spectrum classified as a quasar are negative. These are indicated in a flag called ZWARNING; if ZWARNING=0 (as it is for the vast majority of the spectra), we have a high degree of confidence that the classification and redshift are reliable.
The idlspec2d velocity-dispersion measurements described above are implemented for BOSS spectra in exactly the same manner as for SDSS-I/II spectra in previous data releases. For BOSS DR9 and later, velocity-dispersion likelihood functions are also computed for objects targeted as galaxies and assigned a CLASS_NOQSO of GALAXY. These are reported in the VDISP_LNL vector over the same 100 to 850 km/s baseline, in steps of 25 km/s, as above. This computation uses only 5 stellar eigenspectra, and marginalizes over redshift uncertainties. The applications of VDISP_LNL to hierarchical galaxy population analysis are described in Shu et al. (2012).
Flux values, redshifts, line widths, and continuum levels are computed for common rest-frame ultraviolet and optical emission lines by fitting multiple Gaussian-plus-background models at their observed positions within the spectra. The initial-guess emission-line redshift is taken from the main redshift analysis, but is subsequently re-fit nonlinearly in the emission-line fitting routine. All lines are constrained to have the same redshift except for Lyman-alpha. Intrinsic line widths are constrained to be the same for all emission lines, with the exception of the hydrogen Balmer series, which is given its own line width as a free parameter, and Lyman-alpha and N V 1240, which each have their own free line-width parameters. Known 3:1 line flux ratios between the members of the [OIII] 5007 and [NII] 6583 doublets are imposed. When the signal-to-noise of the line measurements permits doing so, spectra classified as galaxies and quasars are sub-classified into AGN and star-forming galaxies based upon measured [OIII]/Hβ and [NII]/Hα line ratios, and galaxies with very high equivalent width in Hα are sub-classified as starburst objects. See the spectro catalogs page for details on the line ratio criteria.
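A Gaussian-plus-background model with the 3:1 doublet constraint can be sketched as below for [OIII]. The rest wavelengths (standard air values), the use of a single velocity width converted to a wavelength width at each line center, and the function names are assumptions of this sketch; the same pattern would apply to the [NII] doublet.

```python
import math

C_KMS = 299792.458  # speed of light in km/s

# Assumed rest wavelengths in Angstroms, with flux relative to the 5007 line.
OIII_DOUBLET = ((5006.84, 1.0), (4958.91, 1.0 / 3.0))


def gaussian_line(wave, flux, center, sigma_kms):
    """Gaussian profile with integrated flux `flux` at wavelength `wave`."""
    sigma_wave = center * sigma_kms / C_KMS
    amplitude = flux / (sigma_wave * math.sqrt(2.0 * math.pi))
    return amplitude * math.exp(-0.5 * ((wave - center) / sigma_wave) ** 2)


def oiii_model(wave, flux_5007, z, sigma_kms, background=0.0):
    """Both doublet members share z and width; the flux ratio is fixed at 3:1."""
    model = background
    for rest, ratio in OIII_DOUBLET:
        model += gaussian_line(wave, flux_5007 * ratio, rest * (1.0 + z), sigma_kms)
    return model
```

Tying the weaker line to the stronger one leaves a single free flux for the doublet, which stabilizes the fit when the 4959 line is weak.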
The output of the redshift and classification pipeline is stored in three files for each spectroscopic plate observation. The spZbest file contains the detailed results for the best-fit redshift/classification of each spectrum, and includes the best-fit model spectrum that was used to make the redshift measurement. The spZall file contains parameters from all the next-best identifications, without the full representation of the associated model spectra (although these can be reconstructed from template files and reported coefficients). The spZline file contains the results of the emission-line fits for each object.
References
Bolton, A. S., Schlegel, D. J., Aubourg, E., Bailey, S., Bizyaev, D., Bhardwaj, V., Brewington, H., Brownstein, J. R., Burles, S., Chen, Y., Dawson, K., Ebelke, G., Eisenstein, D. J., Malanushenko, E., Malanushenko, V., Maraston, C., Myers, A. D., Olmstead, M. D., Oravetz, D., Padmanabhan, N., Pan, K., Paris, I., Percival, W. J., Petitjean, P., Ross, N. P., Schneider, D. P., Shelden, A., Shu, Y., Simmons, A., Snedden, S., Strauss, M. A., Thomas, D., Tremonti, C. A., Wake, D. A., Weaver, B. A., Wood-Vasey, W. M., 2012, AJ, 144, 144.
Prugniel, P., Soubiran, C., 2001, A&A, 369, 1048.
Shu, Y., Bolton, A. S., Schlegel, D. J., Dawson, K. S., Wake, D. A., Brownstein, J. R., Brinkmann, J., Weaver, B. A., 2012, AJ, 143, 90.
Valdes, F., Gupta, R., Rose, J. A., Singh, J. P., Bell, D. J., 2004, ApJS, 152, 251.