The Optical Spectra Catalogs
The SDSS spectroscopic catalogs contain parameters measured from each spectrum, such as redshift, classification, velocity dispersion and quality flags. SDSS science teams process this information further to meet the main SDSS science goals. SDSS-III and SDSS-IV observations use the BOSS spectrograph, while SDSS-I/II observations use the SDSS spectrograph. See the linked pages for further information on particular catalogs.
This page primarily describes the general information available for every reduced SDSS spectrum.
Before reading this page, please make sure you understand the basics of SDSS spectra. For details on the locations of these datasets, see the spectroscopic data access page. Below we describe how to select "good" spectra, how to exclude duplicates, and the meaning of the most important spectroscopic parameters.
Selecting surveys and programs
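Spectra are grouped by survey and by program within each survey, identified by the survey and programname tags in the specObj files or tables. For example, the SDSS-IV eBOSS large-scale structure sample data is taken as part of the eboss program within the eBOSS survey.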
The different surveys are
Note that many of the linked catalogs intended for SDSS primary science analysis compile information from multiple programs. For instance, the eBOSS quasar large-scale structure catalog is produced primarily from redshifts of spectra observed during SDSS-IV, but also includes redshifts determined from SDSS I/II/III spectra (which are often referred to as "legacy" observations).
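As a rough illustration, selecting one program's spectra amounts to filtering rows on the survey and programname tags. The sketch below assumes each row is a dict-like record with lowercase "survey" and "programname" keys (for example, read from the specObj file with a FITS reader of your choice); the key spelling and the example tag values are assumptions, not taken from the datamodel.

```python
from typing import Any, Iterable, List, Mapping

def select_program(rows: Iterable[Mapping[str, Any]],
                   survey: str, programname: str) -> List[Mapping[str, Any]]:
    """Keep rows whose survey and programname tags match the requested values.

    Comparison ignores case and surrounding whitespace, since fixed-width
    string columns are often padded with trailing spaces.
    """
    want = (survey.strip().lower(), programname.strip().lower())
    return [
        row for row in rows
        if (str(row["survey"]).strip().lower(),
            str(row["programname"]).strip().lower()) == want
    ]

# Usage with made-up rows (tag values are illustrative only).
rows = [
    {"survey": "eboss ", "programname": "eboss", "z": 1.20},
    {"survey": "boss", "programname": "boss", "z": 0.55},
]
print(len(select_program(rows, "eboss", "eboss")))  # prints 1
```

Remember that catalogs built for primary science may combine several programs, so a single-program filter like this is not a substitute for those catalogs.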
See the Science Archive Webapp tool for a convenient way to search for and visualize the available spectra.
Selecting unique spectra
The essential information (redshifts and classifications) for each object is stored in the specObj file (the specObjAll table in the CAS). These files and tables contain all spectra, including bad spectra and repeat spectra. To select the best observations of all unique objects, look for objects in specObjAll with the column "sciencePrimary" (called "specprimary" in the SAS flat files) greater than zero (i.e., true). This is equivalent to the specObj view in the database (analogous to the photoObj view of the photoObjAll table for imaging data).
sciencePrimary is designed to choose the best available unique set of spectra: among two or more spectra of the same location on the sky, sciencePrimary is set for only one. Any group of spectra within 2 arcsec of each other is considered to be the same object, and only one of them is designated as "sciencePrimary". The spectrum picked as primary is the one that best satisfies the following conditions, in order of importance:
- whether they are on a "primary" observation of a given plate (see below for the technical definition);
- whether the spectrum has a positive signal-to-noise;
- whether the plate they are on is classified as "good";
- whether the redshift determination raises no warnings in ZWARNING;
- whether they are considered primary within BOSS and eBOSS alone;
- whether they are considered primary within SDSS alone;
- whether they are BOSS spectrograph data (rather than SDSS); and,
- in the case that more than one spectrum satisfies all the above conditions equally well, or equally badly, the one with the largest signal-to-noise is selected.
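The ranking above can be sketched in Python as a sort key: a tuple of booleans in the listed order of importance, ending with signal-to-noise as the final tie-breaker. Everything here is an assumption for illustration: the Spectrum fields are hypothetical stand-ins for the real columns, groups are formed by simple friends-of-friends linking within 2 arcsec, and the pipeline's actual grouping and ranking code may differ.

```python
import math
from dataclasses import dataclass
from typing import List

MATCH_RADIUS_ARCSEC = 2.0

@dataclass
class Spectrum:
    # Hypothetical per-spectrum fields, one per condition in the list above.
    ra: float                # degrees
    dec: float               # degrees
    on_primary_plate: bool   # on a "primary" observation of its plate
    sn: float                # signal-to-noise estimate
    plate_good: bool         # plate classified as "good"
    zwarning: int            # ZWARNING bitmask
    boss_primary: bool       # primary within BOSS and eBOSS alone
    sdss_primary: bool       # primary within SDSS alone
    boss_spectrograph: bool  # taken with the BOSS spectrograph

def separation_arcsec(a: Spectrum, b: Spectrum) -> float:
    """Great-circle separation between two spectra (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a.ra, a.dec, b.ra, b.dec))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def rank_key(s: Spectrum) -> tuple:
    # Tuples compare element by element, so earlier conditions dominate
    # (True > False); the raw signal-to-noise breaks any remaining tie.
    return (s.on_primary_plate, s.sn > 0, s.plate_good, s.zwarning == 0,
            s.boss_primary, s.sdss_primary, s.boss_spectrograph, s.sn)

def science_primary(spectra: List[Spectrum]) -> List[bool]:
    """Flag exactly one spectrum per group of spectra within 2 arcsec."""
    parent = list(range(len(spectra)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Friends-of-friends grouping: link every pair within the radius (O(n^2)).
    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if separation_arcsec(spectra[i], spectra[j]) <= MATCH_RADIUS_ARCSEC:
                parent[find(i)] = find(j)

    best = {}
    for i, s in enumerate(spectra):
        root = find(i)
        if root not in best or rank_key(s) > rank_key(spectra[best[root]]):
            best[root] = i
    chosen = set(best.values())
    return [i in chosen for i in range(len(spectra))]
```

Friends-of-friends linking can chain objects more than 2 arcsec apart into one group; a production matcher would need a deliberate policy for such chains.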
As these conditions suggest, "primary" can also be defined within individual surveys. In addition to sciencePrimary, there are several other indicators of primary-ness that are appropriate if you are only interested in a subset of the data:
- Indicates whether the spectrum is primary when considering only data from the SDSS spectrograph (called "specsdss" in the SAS flat files).
- Indicates whether the spectrum is primary when considering only data from the SDSS Legacy survey (called "speclegacy" in the SAS flat files).
- Indicates whether the spectrum is primary when considering only data from the eBOSS and BOSS survey plates (called "specboss" in the SAS flat files).
- Indicates whether the spectrum is primary when considering only data from the SEGUE-1 and SEGUE-2 survey plates (called "specsegue" in the SAS flat files).
- Indicates whether the spectrum is primary when considering only data from the SEGUE-1 survey plates (called "specsegue1" in the SAS flat files).
- Indicates whether the spectrum is primary when considering only data from the SEGUE-2 survey plates (called "specsegue2" in the SAS flat files).
Note that a fiber can be primary even if it is not on a "primary" MJD for its plate as defined below; it just has to be the "best" observation of that location according to the above conditions. For example, if a group of spectra of the same location on the sky are ALL on "bad" plates (failing condition 1), one of them will still be chosen as "sciencePrimary" based on the subsequent criteria, each of which is applied in turn in the same way.
Selecting good spectra
In general, the primary indicator of whether a spectrum is "good" is zWarning, which should be 0 or 16 for data that are good with no identified problems. This flag is a bitmask in which each set bit has a particular meaning, described in our bitmask documentation. (Bit 4, with value 16, is MANY_OUTLIERS, which turns out to be commonly set even for perfectly good spectra.)
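A minimal Python sketch of this test, assuming zWarning has been read as a plain non-negative integer: it accepts 0 or 16 by masking out bit 4 (MANY_OUTLIERS) and checking that nothing else is set. The set_bits helper is a hypothetical convenience for looking flags up in the bitmask documentation.

```python
MANY_OUTLIERS = 1 << 4  # bit 4, value 16

def zwarning_ok(zwarning: int) -> bool:
    """True if no ZWARNING bit other than MANY_OUTLIERS is set (value 0 or 16)."""
    return (zwarning & ~MANY_OUTLIERS) == 0

def set_bits(mask: int) -> list:
    """Bit numbers that are set in a non-negative bitmask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

print([zwarning_ok(v) for v in (0, 16, 4, 20)])  # prints [True, True, False, False]
print(set_bits(20))                               # prints [2, 4]
```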
For BOSS and eBOSS galaxy targets, there is a wrinkle: a dominant source of false identifications is QSO templates with unphysical fit parameters (e.g., large negative parameters causing a QSO template emission feature to fit a galaxy absorption feature). Thus, for (e)BOSS galaxy targets, the best redshift is the value of the Z_NOQSO field (with uncertainty given by Z_NOQSO_ERR and warning flag given by ZWARNING_NOQSO), and the best classification is given by CLASS_NOQSO and SUBCLASS_NOQSO.
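Choosing which set of fields to use could look like the sketch below. It assumes a dict-like row keyed by the upper-case column names given above, and leaves the decision of whether a spectrum is an (e)BOSS galaxy target to the caller (the hypothetical is_galaxy_target argument), since that depends on targeting information not covered here.

```python
from typing import Any, Dict, Mapping

# Column names from the text; the output labels are arbitrary.
NOQSO_COLUMNS = ("Z_NOQSO", "Z_NOQSO_ERR", "ZWARNING_NOQSO",
                 "CLASS_NOQSO", "SUBCLASS_NOQSO")
DEFAULT_COLUMNS = ("Z", "Z_ERR", "ZWARNING", "CLASS", "SUBCLASS")
LABELS = ("z", "z_err", "zwarning", "class", "subclass")

def best_redshift(row: Mapping[str, Any], is_galaxy_target: bool) -> Dict[str, Any]:
    """Return the redshift, error, warning flag and classification to trust."""
    columns = NOQSO_COLUMNS if is_galaxy_target else DEFAULT_COLUMNS
    return {label: row[col] for label, col in zip(LABELS, columns)}
```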
For eBOSS targets, there are further caveats. Additional processing of the spectral information is done in order to extract the best redshifts. This processing is done separately for the different target classes (Lyman-alpha quasars, tracer quasars, LRGs and ELGs), and one should consult the documentation for each of these. The final DR16 redshift information for these samples will be made available after the cosmological studies using their data are completed.
Other useful quality information is available in the signal-to-noise estimate (snMedianAll) and in the plate quality estimate (plateQuality, which is "good", "marginal" or "bad"). Plates rated "bad" are released because they do contain some good redshift determinations and other useful information, but they should be used with care.
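Putting the checks in this section together, a conservative selection of good, unique spectra might look like the following sketch. It assumes dict-like rows keyed by the CAS column names used above (sciencePrimary, zWarning, plateQuality, snMedianAll); the min_sn threshold and the choice to drop "bad" plates entirely are illustrative assumptions, not SDSS recommendations.

```python
from typing import Any, Iterable, List, Mapping

def good_unique_spectra(rows: Iterable[Mapping[str, Any]],
                        min_sn: float = 0.0,
                        allow_marginal: bool = True) -> List[Mapping[str, Any]]:
    """Keep primary spectra with acceptable zWarning, plate quality and S/N."""
    accepted = {"good", "marginal"} if allow_marginal else {"good"}
    return [
        r for r in rows
        if r["sciencePrimary"] > 0
        and r["zWarning"] in (0, 16)  # 16 = MANY_OUTLIERS only
        and str(r["plateQuality"]).strip().lower() in accepted
        and r["snMedianAll"] > min_sn
    ]
```

Because "bad" plates can still hold good redshifts, dropping them trades completeness for reliability; adjust to the needs of your analysis.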
Plate quality and primary plates
The "plates" file (and the plateX table) contain all observations of all plates, including repeats and including some low signal-to-noise plates. You can restrict that list of plates in the following way:
- To select primary, survey-quality plates, look for plates with isPrimary greater than zero (IS_PRIMARY in the flat files).
- To select the best observation of each plate, including plates with no survey-quality observations, look for plates with isBest greater than zero.
- To select primary, survey-quality tiles from SDSS Legacy, look for plates with isTile greater than zero.
- To select primary, survey-quality tiles from the SEGUE surveys, look for plates with isSegue greater than zero.
- To check plate quality alone, regardless of whether a plate is a repeat, use "platequality", which classifies plates as "bad", "marginal" or "good".
The PLATEQUALITY string is set for each observation (labeled by its MJD) of each plate. For DR9 plates, the definition depends on whether the plate is an SDSS plate (survey = 'sdss'), a SEGUE-1 plate (survey = 'segue1'), a SEGUE-2 plate (survey = 'segue2') or a BOSS plate (survey = 'boss'). For SDSS plates, the conditions are based on the signal-to-noise ratio and the fraction of bad pixels:
PLATESN2 > 15 AND FBADPIX < 0.05 -> 'good'
PLATESN2 > 9 AND FBADPIX < 0.13 -> 'marginal' (if not 'good')
otherwise -> 'bad'
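The SDSS plate rule translates directly into an ordered sequence of checks; the sketch below assumes PLATESN2 and FBADPIX are available as plain floats, and the function name is hypothetical.

```python
def sdss_plate_quality(platesn2: float, fbadpix: float) -> str:
    """PLATEQUALITY for an SDSS (survey = 'sdss') plate observation."""
    if platesn2 > 15 and fbadpix < 0.05:
        return "good"
    # Only reached when the 'good' test failed, matching "(if not 'good')".
    if platesn2 > 9 and fbadpix < 0.13:
        return "marginal"
    return "bad"
```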
For SEGUE-1 plates, the conditions are based on the signal-to-noise of the main sequence turnoff at g=18, except for some special plates:
for faint plates: SN of turnoff @ g=18 > 16 for 'good'
for bright plates: SN of turnoff @ g=18 > 7.5 for 'good'
for low-latitude or test plates: consult $SAS_DIR/data/segue1-hand.par
For SEGUE-2 plates, the conditions are also based on the signal-to-noise of the main sequence turnoff at g=18:
median(SN for MS-turnoff @ g=18) > 10 -> 'good'
otherwise -> 'bad'
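For SEGUE-2, a sketch of the rule only needs a median. It assumes the input is a sequence of per-star signal-to-noise values for main-sequence turnoff stars, already scaled to g = 18; how that scaling is done is outside this sketch, and the function name is hypothetical.

```python
from statistics import median
from typing import Sequence

def segue2_plate_quality(turnoff_sn_g18: Sequence[float]) -> str:
    """'good' if the median turnoff S/N at g=18 exceeds 10, otherwise 'bad'."""
    # statistics.median raises StatisticsError on an empty sequence.
    return "good" if median(turnoff_sn_g18) > 10 else "bad"
```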
For (e)BOSS plates,
'good' if SN2_G1 > 10.0 and SN2_G2 > 10.0 and SN2_I1 > 22.0 and SN2_I2 > 22.0 and FBADPIX < 0.10 and min(NEXP_*) ≥ 3
otherwise 'bad'
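The (e)BOSS rule can be sketched the same way. Here nexp is assumed to hold the values of the NEXP_* columns as a sequence of integers, and the hand overrides listed in the next paragraph are applied first as an unconditional 'good'; the function and parameter names are hypothetical.

```python
from typing import Optional, Sequence

# Plate-MJD pairs overridden to 'good' by hand, as listed in the text.
HAND_OVERRIDES = {(3589, 55186), (3834, 55569), (3841, 55572),
                  (4627, 55626), (4718, 55750), (5017, 55715)}

def boss_plate_quality(sn2_g1: float, sn2_g2: float,
                       sn2_i1: float, sn2_i2: float,
                       fbadpix: float, nexp: Sequence[int],
                       plate: Optional[int] = None,
                       mjd: Optional[int] = None) -> str:
    """PLATEQUALITY for an (e)BOSS plate observation."""
    if (plate, mjd) in HAND_OVERRIDES:
        return "good"
    passes = (sn2_g1 > 10.0 and sn2_g2 > 10.0
              and sn2_i1 > 22.0 and sn2_i2 > 22.0
              and fbadpix < 0.10 and min(nexp) >= 3)
    return "good" if passes else "bad"
```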
In a few cases, (e)BOSS plates just under the threshold were overridden to "good" by hand: plate-MJD 3589-55186, 3834-55569, 3841-55572, 4627-55626, 4718-55750 and 5017-55715 (a special ELG test plate).
The isPrimary flag is set for each observation (labeled by its MJD) of each plate. It is "1" if we consider that MJD to be the best observation of that plate and it is acceptable from a science point of view (PLATEQUALITY either 'marginal' or 'good'). It is "0" either if there is a better observation or if all observations are labeled 'bad'.
The isBest flag is also set for each observation (labeled by its MJD) of each plate. It is "1" if we consider that MJD to be the best observation of that plate, even if that observation has PLATEQUALITY = 'bad'. It is "0" if another MJD observation of that plate is better.
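The two flags can be sketched for the observations of a single plate as below. How the best MJD is chosen is not specified here, so the sketch takes it as an input (best_mjd); the function name and the dict layout are hypothetical.

```python
from typing import Dict, Tuple

def plate_flags(quality_by_mjd: Dict[int, str],
                best_mjd: int) -> Dict[int, Tuple[int, int]]:
    """Map each MJD of one plate to (isPrimary, isBest)."""
    flags = {}
    for mjd, quality in quality_by_mjd.items():
        is_best = int(mjd == best_mjd)
        # isPrimary additionally requires a science-grade observation.
        acceptable = quality.strip().lower() in ("good", "marginal")
        flags[mjd] = (int(is_best == 1 and acceptable), is_best)
    return flags
```

A plate whose only observation is 'bad' therefore gets isBest = 1 but isPrimary = 0, which is why isBest also covers plates with no survey-quality observations.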
Each spectroscopic object has a classification (CLASS) and a redshift determination (Z) with an associated error (Z_ERR). For galaxies, a velocity dispersion can be determined (down to about 70 km/s). The redshifts are determined by fitting models to each spectrum assuming a large range of possible redshifts. The best model is chosen on the basis of the chi-squared value of the data with respect to the model. In addition, there is a bitmask called ZWARNING which has flags set in suspicious cases. A ZWARNING equal to zero indicates no problems identified. Most bits in that mask are signs of substantial problems, suggesting that the classification and redshift are not to be trusted; the exception is the MANY_OUTLIERS bit, which can be set for successful spectra that either happen to be very high signal-to-noise ratio (e.g. bright stars) or unusual (e.g. some broad-line AGN in galaxies). The classifications are stored in the CLASS and SUBCLASS parameters. They can take the following values:
- identified with a galaxy template; can have subclasses:
  - set based on whether the galaxy has detectable emission lines that are consistent with star formation, according to the criterion: log10(OIII/Hβ) < 0.7 - 1.2(log10(NII/Hα) + 0.4)
  - set if the galaxy is star-forming but has an Hα equivalent width greater than 50 Å
  - set based on whether the galaxy has detectable emission lines that are consistent with a Seyfert or LINER: log10(OIII/Hβ) > 0.7 - 1.2(log10(NII/Hα) + 0.4)
- identified with a QSO template
- identified with a stellar template, chosen among the following subclasses: O, OB, B6, B9, A0, A0p, F2, F5, F9, G0, G2, G5, K1, K3, K5, K7, M0V, M2V, M1, M2, M3, M4, M5, M6, M7, M8, L0, L1, L2, L3, L4, L5, L5.5, L9, T2, Carbon, Carbon_lines, CarbonWD, CV
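The emission-line split between the galaxy subclasses can be sketched as follows. This assumes the standard BPT line ratios, [OIII]/Hβ against [NII]/Hα, with positive fluxes for lines already judged detected; the returned labels are descriptive placeholders rather than the exact SUBCLASS strings. A point exactly on the dividing line is treated as not star-forming, a case the text does not address.

```python
import math

def emission_line_subclass(oiii: float, hbeta: float, nii: float,
                           halpha: float, ew_halpha: float) -> str:
    """Classify a galaxy with detected lines using the dividing line above.

    Fluxes must be positive; ew_halpha is the Halpha equivalent width in Angstroms.
    """
    x = math.log10(nii / halpha)
    y = math.log10(oiii / hbeta)
    if y < 0.7 - 1.2 * (x + 0.4):
        # Star-forming; a large Halpha equivalent width marks the starburst case.
        return "starburst" if ew_halpha > 50 else "star-forming"
    return "Seyfert/LINER"
```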
If a galaxy or quasar has lines detected at the 10-sigma level with widths (sigma) greater than 200 km/s at the 5-sigma level, "BROADLINE" is appended to its subclass. For example, the full-resolution version of the figure showing spectra on the Understanding SDSS Spectroscopic Data page lists the CLASS, SUBCLASS and error flags of those spectra. For each object, there are also detailed quality determinations, a full description of the templates used, and the targeting information. See the CAS table schema or the specObj file datamodel page for full information.
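One reading of the BROADLINE criterion, sketched below, is: at least one line detected at 10 sigma or more whose width exceeds 200 km/s by more than five times its uncertainty. This interpretation, the (significance, sigma, sigma_err) tuple layout and the function name are assumptions; the pipeline's exact test may differ.

```python
from typing import Iterable, Tuple

def has_broad_line(lines: Iterable[Tuple[float, float, float]]) -> bool:
    """lines: (detection significance, line sigma [km/s], sigma uncertainty [km/s])."""
    return any(
        significance >= 10 and (sigma - 200.0) > 5 * sigma_err
        for significance, sigma, sigma_err in lines
    )
```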