
REVIEW OF SDSS DATA PROCESSING AND DISTRIBUTION
Fermi National Accelerator Laboratory
July 20-21, 2000

Submitted by the Review Committee
Dr. Sidney Wolff, Chair
August 14, 2000

Executive Summary

The Sloan Digital Sky Survey (SDSS) project has already produced impressive results by any standard. The Science Highlights provided in the presentation package are truly astounding in the breadth and scope of what has been accomplished with the Sloan data to date. The project has covered 2000 square degrees in imaging and obtained ~30,000 spectra, an awesome accomplishment. The power of large datasets in addressing fundamental problems in astrophysics has been demonstrated by research based on measurements from what amounts to only a tiny fraction of the sky that will ultimately be covered by the SDSS. Data have already been successfully used in studies of objects as nearby as comets and asteroids and as distant as quasars formed when the universe was only about 10 percent of its current age. Important scientific results include: 1) good agreement between the observed two-point clustering function and the hierarchical clustering models; 2) the discovery of over 150 quasars with redshifts greater than 3.5 and of eight of the ten highest-redshift quasars known; 3) detection of weak lensing, which has been used to estimate the mass of galaxies and determine the shapes of galaxy halos; 4) the discovery of white dwarfs, low-mass stars, and brown dwarfs; and 5) an analysis of substructure in the halo of our own Galaxy.

The technical feasibility of the SDSS has been demonstrated. The review committee believes that the work to date has demonstrated the functionality of all of the separate components of the SDSS, from data acquisition to reduction. Those on the committee who had reviewed SDSS in earlier years were very impressed with the progress that has been made.

Transitioning into the production mode of this survey will be challenging, a point also made in the report of the committee that reviewed operations on site at Apache Point. The SDSS is, in NASA parlance, approaching its "launch date": the time when the North Galactic Cap again becomes observable and when routine operations must begin if the project is to cover the area of sky called for in the requirements document with both imaging and spectroscopy. Despite the imminence of launch, the presenters did not provide the review committee with a clear set of priorities for tasks that must be accomplished before launch, a statement of what is on the critical path, or an evaluation of whether there are sufficient resources to complete the mandatory tasks.

Achievable requirements must be clearly defined in advance of initiation of the survey. The recent completion of a requirements document is a major step toward determining what the priorities must be during the next several months. However, given real-world cost and schedule constraints, the requirements document must be a living document, with trades being made among potentially conflicting requirements to optimize the scientific usefulness of the survey overall. Realistically, there will likely have to be trades between the area of sky covered and the quality of the data, e.g., image size and photometric accuracy.

We recommend that the key leadership of the SDSS meet as soon as possible, and no later than early fall, with the following goals:

  • Assess the then-current performance of the various telescope and instrument systems, reduction pipelines, etc.
  • Determine what science can be done (and cannot be done) with the already achieved level of performance.
  • Decide what limited set of further improvements have the highest priority and identify the resources to complete those improvements on schedule.
  • Freeze the systems to the maximum extent possible.
  • Proceed to production.
In our discussions with the SDSS staff, the review committee gained the impression that there is currently a lack of consensus and confidence among the scientists that they are achieving a level of scientific performance adequate to accomplish the general goals of the survey. It is clear that absolutely outstanding science can be achieved with today's performance, and that this science must not be lost to a continued search for improvements. This is a case where, as engineers often remind us, "better is the enemy of good enough."

The committee recommends that stronger and more decisive scientific management be put in place for the life of the project. Effective management will be required to ensure the implementation of the adopted priorities and to accomplish three additional goals: to test and verify pipeline operational performance; to engage the effort of consortium scientists in critical tasks; and to make clean decisions in a timely way about modifications to the pipelines or the scientific requirements. This focusing of responsibility and authority should facilitate the change in mindset from development to production, which is mandatory if the SDSS project is to succeed.

Scientists must accept the responsibility for validating the pipelines and verifying the quality of the data. Many scientists in the collaboration are clearly eager to exploit the data being provided by Sloan, and the impressive scientific achievements to date testify to this fact. However, it is less clear that the scientists using the data are feeding information about its quality back to the developers of the pipelines and other systems. Verification of pipelines and reduction procedures is crucially dependent on the attention the scientific collaborators bring to the data already collected and about to be collected by the survey.

The SDSS should identify some rewards for those scientists who take on significant responsibility for tasks, such as validating the pipelines, that advance overall project goals. Since the SDSS does not directly support the salaries of most of the scientific participants, project leadership may have been somewhat reluctant to impose requirements on individual collaborators. If operations are to be initiated with a high probability of success in only a few months, however, significantly more effort must clearly be brought to bear on the critical outstanding analysis problems. The committee believes that all members of the collaboration must share in the responsibility for using available data to help determine its quality, find problems early in the 5-year operations period (or even before), and contribute to the significance and excellence of the scientific return on the large investment of resources that this project will continue to require. To this end, we suggest that the project consider ways to encourage additional "volunteer" work to provide the necessary analyses of the existing survey data. In addition, we encourage much broader use of the Science Databases in the next six months and an ongoing effort to support this use by the developers, support staff, and data gurus at JHU, Fermilab, and the other institutions.

The committee is concerned that, although individual components of the Sloan survey are at or near the levels of performance required, these subsystems may not yet have been integrated into an overall system that is ready for a full 5 years of operations. We recommend that the SDSS team utilize the fall observing season as an end-to-end test of their survey operations, exercising all aspects of the survey from planning of imaging observations through executing spectroscopic observations, processing the data in the appropriate manner, and populating the appropriate databases. Through such a real test of the operational system, the SDSS staff will identify deficiencies in plans, procedures, etc. so that they can be ready for full operations when the winter northern hemisphere sky is accessible toward the end of the year. It is likely that an operations manager with authority over all relevant personnel will be required, and early appointment of an individual with a focus on operational aspects of the Sloan project would greatly facilitate the transition to the "production" phase of this project.

As a final general comment, the committee notes that the SDSS is a highly visible project in the astronomical community. While it might not have started as such, it has become a big project. If SDSS is not perceived as successful, this will cast a pall over many other initiatives, particularly those that propose to undertake surveys of similar or even larger scale. For this reason, as well as the huge investment of time and resources by the collaboration, we consider it vital that the SDSS succeed. A major step toward success will be embarking on full survey operations this coming observing season. We consider this the paramount priority for the SDSS.
 
 

THE REPORT:

SDSS DATA PROCESSING AND DISTRIBUTION

The Review Process

This report provides the assessment of the review committee invited to examine the data processing and distribution components of the Sloan Digital Sky Survey (SDSS). The committee met at Fermilab on July 20 and 21. Written material was provided on the Web and at the meeting. We heard a series of presentations from members of the Sloan project and arranged three breakout sessions, one with senior management and two with Sloan staff, to examine various aspects of the data handling tasks in detail. The committee then met in executive session to arrive at its consensus evaluation.

The committee consisted of Liz Buckley-Geer, Fermi National Accelerator Laboratory; Roc Cutri, IPAC and the 2MASS survey; Richard Green, Kitt Peak National Observatory; Robert Hanisch, Space Telescope Science Institute; Ruth Pordes, Fermi National Accelerator Laboratory; Tom Soifer, Caltech and the SIRTF Science Center; Tony Tyson, Bell Labs; and Sidney Wolff (Chair), National Optical Astronomy Observatories.

The Assessment

In the following sections, we address in detail each of the questions included in the charge to the review committee. Where appropriate we have also added some discussion on issues not explicitly covered in the charge but that we explored in order to address the charge fully.

Do the pipelines have sufficient functionality to support the start of the five-year survey? Are the pipelines, as you find them on July 20 and 21, sufficiently mature to achieve our scientific goals? Is the level of effort assigned to each pipeline appropriate, considering its current maturity?

Pipeline Functions

The pipelines now encompass more than a million lines of code and must perform a set of very sophisticated tasks with high efficiency. Together, they are expected to calibrate the image data stream, divide it into frames of manageable size, identify objects, capture postage-stamp areas, deblend complex regions, separate stars from galaxies, measure brightness in various ways, determine positions in celestial coordinates, provide structural parameters for extended objects, select objects for spectroscopic observation on the basis of complex criteria, produce a plan for each fiber plugboard area, and populate the catalog databases. The spectroscopic pipeline extracts and calibrates one-dimensional spectra, while the photometric pipeline processes the data from the photometric telescope and feeds into the survey calibration. The individual operations have been verified, and the system has worked end-to-end for the production of fiber spectra from recently obtained imaging data.

However, implementing routine operations when the North Galactic Cap region becomes available again in 3-4 months requires that the pipelines be frozen. This in turn requires that the scientific requirements for the pipelines be clearly defined, that the pipelines offer the functionality required to achieve those goals, and that the performance of the pipelines be verified. Substantial additional effort is required in all three areas.

Requirements

The SDSS faces a unique challenge in developing its pipelines in that the spectroscopy program is interleaved with photometric observations and depends on the output of the photometric pipelines. Therefore, the photometric pipelines must be sufficiently robust at the initiation of the survey that the selection of objects for spectroscopy is not biased in some fundamental way by any subsequent improvements to the pipeline. Most other surveys have had the luxury of reprocessing their data as they gained experience with it.

The committee was given the impression that three main factors did not yet allow an official version of the pipelines to be frozen. The first was that the photometric calibration was far from the 2 percent accuracy stated in the scientific requirements. The second was that agreement had not been reached on target selection criteria for the spectroscopic samples. The third was the concern that, based on previous experience, the system would encounter complex images that would "break" a pipeline and require significant modifications to achieve uniform data quality in such regions.

The astronomers on the review committee did differ with the SDSS project astronomers on how critical it is to address each of these factors before the start of production. The differences in viewpoint could not be resolved because the committee was not provided with simulations of the impact of relaxing certain requirements on the overall project goals. More important than whether the committee itself could reach any conclusions is the fact that we were unable to identify a broadly understood process by which system-level decisions on scientific priorities, changes of requirements, and configuration sign-off could be reached. The heritage of the pipelines is that each one has been developed with the input of a separate group of scientists, and each pipeline group seems to be relatively autonomous in the processes of version control, regression testing, and cadence of revisions. Those processes would benefit from more centralized, system-level control at this point in the project. A systems approach is essential to evaluating the cost-benefit trades of various proposed improvements and enhancements to individual pipelines.

The committee's view is that the key to a successful transition to survey production is the implementation of an effective scientific decision-making apparatus for the partnership. The lead individuals must be empowered to prioritize the verification of existing, as yet not analyzed datasets; to target the use of the South Galactic Pole time in the fall for practice, test, and final tuning of the Northern survey strategy and production rate; and to make key decisions on changes in scientific requirements. They must also have the ability to persuade their colleagues to participate in characterization projects on schedule-driven timescales. Defining various levels of privileged access to some or all of the data might be a suitable tool for encouraging appropriate levels of commitment from the scientific collaborators.

With this management structure in place and with a defined set of scientific decision milestones, the pipelines can be stabilized to concentrate on efficient production of survey data, processing to catalogs, and selection of spectroscopic targets for follow-up observation.

Functionality: Photometric Accuracy

The one area where there seemed to be substantial uncertainty about the functionality of the pipelines concerned photometric accuracy, where the stated goal is 2 percent. A number of errors that compromised the results have already been identified in the pipeline. It appears that, with the known corrections made, the accuracy currently achieved for the data of highest quality in terms of telescope and atmospheric stability is in the range of 2-3 percent. The committee cautions, and SDSS staff agree, however, that maintaining this level of accuracy for the entire survey will be very difficult, with diverse contributions to the overall uncertainty from the atmosphere, telescope, camera, and pipeline, and with some effects, especially extinction, varying on timescales as short as one hour.

Luckily, 2 percent accuracy is required only for the colors, and for the colors, nature provides a moderately reliable alternative. The photometric zero-points for each band do not have to be determined independently: the color zero-points may be obtained to reasonable accuracy from the stellar sequence, corrected for extinction, although SDSS staff caution that the intrinsic color of the stellar locus at different positions in the Milky Way varies by 5 percent. (We also recognize that implementing this approach would require new pipeline functionality.)
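A minimal sketch of the stellar-locus approach, assuming a deliberately simplified model: stars are selected in a narrow window of one color (for example u-g) that isolates one segment of the locus, and the median of a second color (for example g-r) is compared with an assumed intrinsic value. The function name, window, and reference color are hypothetical placeholders, not SDSS values, and the sketch ignores both the zero-point error carried by the window color itself and the roughly 5 percent variation of the locus across the Milky Way noted above.

```python
from statistics import median


def locus_zero_point_offset(stars, window, reference_color):
    """Estimate one color zero-point offset from the stellar locus.

    stars: (c1, c2) pairs of extinction-corrected colors, e.g. (u-g, g-r).
    window: (lo, hi) range in c1 isolating one segment of the locus.
    reference_color: assumed intrinsic median c2 for stars in that window
        (a hypothetical input; a real value would come from calibration).
    Returns observed-minus-reference; subtracting it from c2 calibrates it.
    """
    lo, hi = window
    selected = [c2 for c1, c2 in stars if lo <= c1 <= hi]
    if not selected:
        raise ValueError("no stars fall in the locus window")
    # The median resists contamination by the occasional non-locus object.
    return median(selected) - reference_color
```

For example, three stars in the window with g-r of 0.52, 0.55, and 0.54 and an assumed reference of 0.50 give an offset of 0.04 mag; a real implementation would fit the full locus in several colors at once rather than a single window.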

The remaining overall photometric zero-point must be obtained via transfer standards from the photometric telescope (PT). The required accuracy of this zero-point is much less stringent. The requirement that the number density of galaxies not be systematically biased by more than 3 percent over some scale (set by PT transfer sampling) results in a 0.05 mag requirement on the overall zero-point error. Thus, the photometric performance now being reached during the best runs appears to be adequate for the survey science. The individual band zero-points may be used as a check on the overall photometry, as the color zero-points derived from them should agree with those derived from the stellar sequence. Any significant disagreement will be a diagnostic of a problem with the photometry or pipeline.
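The link between the 3 percent and 0.05 mag figures can be made explicit under one simplifying assumption: that cumulative galaxy counts follow a power law, N(<m) proportional to 10^(s m), with slope s in mag^-1. A zero-point error dm then changes the counts by a fraction 10^(s dm) - 1, approximately s ln(10) dm. The sketch below, with hypothetical function names, computes this bias and inverts it. Under this model, a 3 percent tolerance and a 0.05 mag error are equivalent for a slope of about 0.26 mag^-1; steeper counts would tighten the zero-point requirement.

```python
import math


def count_bias(zero_point_error, slope):
    """Fractional change in N(<m) caused by a magnitude zero-point offset.

    Assumes power-law counts N(<m) proportional to 10**(slope * m), where
    slope = d(log10 N)/dm in mag^-1 (a hypothetical model input).
    """
    return 10 ** (slope * zero_point_error) - 1


def max_zero_point_error(bias_tolerance, slope):
    """Largest zero-point offset that keeps the count bias within tolerance."""
    return math.log10(1 + bias_tolerance) / slope


def implied_slope(bias_tolerance, zero_point_error):
    """Count slope at which the two requirements are exactly equivalent."""
    return math.log10(1 + bias_tolerance) / zero_point_error


s = implied_slope(0.03, 0.05)
print(f"implied count slope: {s:.3f} mag^-1")
```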

Clearly, this whole issue needs more detailed examination and a focused effort is in progress to obtain the necessary information. We also recommend that the integrity and robustness of the photometric pipeline should be tested early with repeated scans of the same field. This fall there is an opportunity for this, and it should be pursued. Even before then, however, data already taken can be used to make these tests and should be analyzed. This may require a temporary increase in analysis manpower, but it will be worth it to assure the quality of the photometric data.
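One simple form the repeated-scan test could take, sketched here under the assumption that objects have already been matched between two scans of the same field: the mean magnitude difference measures a relative zero-point shift between the scans, and the scatter of the differences measures per-object repeatability. If both scans have equal, independent errors, the rms of the differences is sqrt(2) times the single-scan error. The function name and the dictionary-of-IDs input are hypothetical; this statistic alone does not separate slow atmospheric drifts from pipeline errors.

```python
import math


def repeat_scan_scatter(mags_a, mags_b):
    """Compare magnitudes of the same objects measured in two scans.

    mags_a, mags_b: dicts mapping a (hypothetical) matched object ID to
    its magnitude in one band. Returns (mean offset, estimated single-scan
    rms error, number of matched objects).
    """
    common = sorted(set(mags_a) & set(mags_b))
    if len(common) < 2:
        raise ValueError("need at least two matched objects")
    diffs = [mags_a[k] - mags_b[k] for k in common]
    mean = sum(diffs) / len(diffs)
    # Sample standard deviation of the differences about their mean.
    rms = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1))
    # Equal, independent errors in both scans: divide by sqrt(2).
    return mean, rms / math.sqrt(2), len(common)
```

Run field by field and band by band, a nonzero mean points to a calibration problem, while excess scatter at fixed magnitude points to a measurement problem in the pipeline.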

Verification

The SDSS project scientists present at the review identified several areas of performance verification that they viewed as essential before committing to a freezing of the pipelines. The three pipelines most affected were the photometric pipeline, the image processing pipeline PHOTO, and the target selection pipeline. It was made clear that there were not enough scientists currently volunteering to complete the verification tasks to meet a late fall deadline for freezing the pipelines. It also became clear that there was no effective system in place by which such volunteers could be persuaded and mobilized on a short timescale to get the job done. The tiger team assembled to work on the photometric calibration problem is a good example of extra volunteer effort, but there are several other areas also requiring immediate attention.

Has the data processing staff developed the software tools to support a five-year survey? Is the size of the data processing staff sufficient to support the data that will be obtained by a five-year survey?

In addressing these questions, we interpret the software tools required to support a five-year survey to be primarily tools to validate the pipelines and to enable proper quality assurance during operations. As noted above, the pipelines appear to be approaching a level of development where they provide the needed functionality. However, it appears to us that many issues relating to quality assurance and documentation of the data remain to be addressed.

Pipeline Validation

A good deal of thoughtful and careful design has gone into configuration control for the pipeline software. However, it does not seem that as much thought has gone into the mechanism that will be used to validate that new pipeline deliveries/upgrades produce data products that are of acceptable quality. We recommend that a set of regression test baseline (RTB) data be selected and that each new pipeline build be tested and validated using these data. Incremental changes in the resulting processing output should be gauged to assess quality of production output relative to the survey science requirements and relative to the results from the previous pipeline version. A specific set of criteria can be defined for the RTB data to allow efficient and quantitative validations of the new software. This procedure becomes increasingly critical during the transition from development to operational data processing when software changes must be made with some care.
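A minimal sketch of an RTB check, assuming each build's output on the baseline data can be reduced to a mapping from object ID to magnitude. The candidate build's output is compared with the previous build's, and three quantitative criteria are applied: the fraction of baseline objects lost, the median magnitude shift, and the rms change. The function name and every threshold are hypothetical placeholders; actual criteria would be derived from the survey science requirements and would cover many more measured quantities.

```python
import math
from statistics import median

# Hypothetical acceptance criteria, for illustration only.
CRITERIA = {
    "max_missing_frac": 0.001,  # fraction of baseline objects not recovered
    "max_median_offset": 0.01,  # mag
    "max_rms_change": 0.02,     # mag
}


def compare_to_baseline(baseline, candidate, criteria=CRITERIA):
    """Validate a new pipeline build against regression test baseline output.

    baseline, candidate: dicts of object ID -> magnitude produced by the
    previous and new builds from the same RTB data. Returns (passed, report).
    """
    if not baseline:
        raise ValueError("baseline output is empty")
    common = baseline.keys() & candidate.keys()
    report = {"missing_frac": 1 - len(common) / len(baseline)}
    if common:
        diffs = [candidate[k] - baseline[k] for k in common]
        report["median_offset"] = median(diffs)
        report["rms_change"] = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    else:
        report["median_offset"] = report["rms_change"] = math.inf
    passed = (
        report["missing_frac"] <= criteria["max_missing_frac"]
        and abs(report["median_offset"]) <= criteria["max_median_offset"]
        and report["rms_change"] <= criteria["max_rms_change"]
    )
    return passed, report
```

Keeping the criteria in one table, separate from the comparison logic, lets them be placed under the same configuration control as the pipeline code.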

The same RTB sets can be used to test ports to new hardware platforms.

Quality Assurance

The project is in the process of defining quality assurance criteria that will be used to validate the output from each pipeline and the global science quality. The complete list of criteria that will be used to test both the raw data quality and the success of the processing will likely not be mature until well into survey operations when a wide range of atmospheric and celestial conditions have been sampled. But it should be possible to make a good first cut at the criteria now, and this task should be given high priority. However, it must also be recognized that it will not be possible to make a concise list of quality control criteria until the science requirements for the survey are frozen.

The quality assurance task must be automated as much as possible to enable efficient routine operations. This is essential for feeding back quality assessments into observatory operations to drive survey strategy and to assure that the sky is covered to the necessary quality required for the survey.

It was not obvious that the project has identified the mechanisms by which final data quality assessments are fed back into survey operations planning and archived in the operational and science archive databases. What will be the procedure for quality assessment to be reviewed and formally accepted by the project? Will one or more scientists be responsible for "signing-off" on each portion of the sky?

Is the assessment of quality a binary system (i.e., data do or do not meet the science requirements), or is there a graded scale? This will become important toward the end of the survey, when decisions will have to be made whether to re-observe data of marginal quality or to cover unobserved sky. For example, if the seeing limit for survey acceptance is 1.5 arcsec and the data for a region of sky have 1.7 arcsec seeing, will that piece of sky be re-observed, or will priority be given to parts of the Galactic cap that have no coverage yet? Is the 1.5 arcsec requirement absolute, or a goal for, perhaps, 95 percent of the sky coverage? What is the absolute limit for rejection of data based on seeing?
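To show what a graded scale might look like in an automated quality assurance step, here is a minimal sketch with three grades rather than two, together with the area-weighted fraction of coverage meeting the goal. The 1.5 arcsec goal comes from the example above; the 2.0 arcsec hard rejection limit and the function names are hypothetical, since the report notes that no absolute limit has yet been defined.

```python
from enum import Enum


class Grade(Enum):
    ACCEPT = "accept"      # meets the survey seeing goal
    MARGINAL = "marginal"  # usable; candidate for re-observation if time allows
    REJECT = "reject"      # must be re-observed


def grade_seeing(seeing_arcsec, goal=1.5, hard_limit=2.0):
    """Grade one field by its seeing; hard_limit is a hypothetical value."""
    if seeing_arcsec <= goal:
        return Grade.ACCEPT
    if seeing_arcsec <= hard_limit:
        return Grade.MARGINAL
    return Grade.REJECT


def coverage_fraction(fields, goal=1.5):
    """Area-weighted fraction of observed sky meeting the seeing goal.

    fields: list of (area_deg2, seeing_arcsec) pairs.
    """
    total = sum(area for area, _ in fields)
    good = sum(area for area, seeing in fields if seeing <= goal)
    return good / total if total else 0.0
```

With this scheme, the 1.7 arcsec region in the example would be graded marginal, and a coverage fraction such as 0.90 measured against a 95 percent goal would tell the planners whether marginal regions need to move ahead of unobserved sky.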

Staffing

The discussion of staffing levels for operational data processing at Fermilab stated that there will be 14 FTEs carrying out the operations. This staffing may be inadequate for the magnitude and complexity of the task. It is also unclear whether there is the right balance between scientists and production staff and whether that balance needs to change as the transition is made from development to operations. Without reviewing a complete WBS, it is difficult to assess this situation accurately. For comparison, 2MASS data processing operations at IPAC utilize approximately 18 FTEs. There are 4 full-time and 2 part-time quality assurance scientists on the 2MASS staff. Although the data rate from 2MASS is comparable to that from SDSS, 2MASS operations are considerably less complex than SDSS operations, since 2MASS acquires and processes only one type of data.

Is the hardware (computers, disk, and tape storage) for the data processing and storage operations sufficient to support the analysis and distribution of the data volume that will be generated by a five-year survey? Are there adequate plans to deal with hardware and system software obsolescence? Is the medium for data storage adequate for the foreseeable future (the next ten years)?

Hardware

In general, the committee has little concern about access to the required hardware for the SDSS project. The rapid pace of development has made tasks that once seemed prohibitively difficult now very straightforward.

Since the initial purchase of the SDSS production hardware in 1995, there has been an explosion in both the speed of CPUs and the capacity of disk storage. At the same time, the cost per CPU (for PC platforms running Linux) and the cost per Gbyte of disk storage have fallen dramatically. What seemed like significant processing and storage requirements in 1995 are now quite modest on the scale of other projects at Fermilab (e.g., each Run II experiment will have 30 Tbytes of disk storage and 1 Pbyte of robotic tape storage).

The storage capacity of tape has not increased as fast and the costs have not dropped significantly during the same period, but capacities of 50 Gbytes per cartridge are currently available.

The plan to use the central FNAL robot to archive the data on tape is a good plan, and the ultimate storage requirement of 50 Tbytes should not be difficult to meet. The SDSS data processing group should allow for the possibility that the data in the robot might need to be copied to a new tape medium on a time scale of 10 years.
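The scale of the tape archive, and of any future migration to a new medium, follows from simple arithmetic. At 50 Gbytes per cartridge, 50 Tbytes is roughly a thousand cartridges. The sketch below assumes decimal units and completely filled cartridges; the function name and the `copies` parameter (for example, an off-site duplicate) are hypothetical and not part of the stated plan.

```python
import math


def cartridges_needed(total_tb, cartridge_gb=50, copies=1):
    """Number of tape cartridges needed to hold an archive.

    Uses decimal units (1 Tbyte = 1000 Gbytes) and assumes every
    cartridge is filled completely; `copies` is hypothetical.
    """
    if total_tb < 0 or cartridge_gb <= 0 or copies < 1:
        raise ValueError("invalid archive size, capacity, or copy count")
    return copies * math.ceil(total_tb * 1000 / cartridge_gb)


print(cartridges_needed(50))                    # today's capacity
print(cartridges_needed(50, cartridge_gb=200))  # an assumed future medium
```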

Obsolescence

The image data processing is currently being done on COMPAQ Alpha machines, which are large symmetric multiprocessors (SMPs). This stage of processing requires efficient, high-speed access to a fairly large quantity of data, which makes these machines ideal for the task. They are, however, relatively expensive compared to PCs. The data processing group is investigating the use of Linux machines connected with GFS (Global File System) or an equivalent to provide read and write access to the data disks from many CPUs. They are also looking at load-balancing software (MOSIX) that would make a cluster of Linux machines look like an SMP.

The committee strongly encourages SDSS to follow this line of investigation so that the project will be in a position to replace the COMPAQ Alpha machines with a Linux cluster when and if appropriate. Any change in platforms should take into account the full cost of porting, testing, and ongoing support. SDSS should work closely with other groups in the computer division who are interested in this technology, so that all of the groups involved make the best use of their combined resources.

Databases

The project is correctly concerned about the viability of Objectivity for the longer term, and we encourage SDSS staff to think through the schedule and manpower impact of, and plan the response to, a scenario where the DBMS must be replaced.

The performance of wide area networks and the continued decrease in the cost of magnetic disk storage should allow continued use of "lower tech" methods of data distribution through FITS files and FTP transfer, in parallel with the Science Databases, while those databases are being tuned and extended to meet the complete needs of the collaboration.

Configuration management of the database schema and the data description language are part of the overall configuration management necessary for the production software system and require continued attention. As noted elsewhere in this report, data distributed to the collaboration in either method above must be accompanied by a rigorously managed and complete record that documents the associated parameters and versions of the processing code.
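One lightweight way to keep such a record attached to flat-file distributions is a provenance "sidecar" written next to each file, holding a checksum of the file together with the pipeline versions and processing parameters that produced it. The sketch below assumes a JSON sidecar named after the data file; the function names, the `.prov.json` suffix, and the record fields are hypothetical, not an existing SDSS convention. For FITS products, the same information could also be carried in header keywords.

```python
import hashlib
import json
from pathlib import Path


def write_provenance(data_path, pipeline_versions, parameters):
    """Write a JSON provenance record next to a distributed data file.

    pipeline_versions: e.g. {"photo": "..."}; parameters: processing
    parameters used. Both are free-form dicts in this sketch.
    Returns the path of the sidecar file.
    """
    data_path = Path(data_path)
    record = {
        "file": data_path.name,
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
        "pipeline_versions": pipeline_versions,
        "parameters": parameters,
    }
    sidecar = data_path.with_name(data_path.name + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


def verify_provenance(sidecar_path):
    """Return True if the data file still matches its recorded checksum."""
    sidecar_path = Path(sidecar_path)
    record = json.loads(sidecar_path.read_text())
    data_path = sidecar_path.with_name(record["file"])
    return hashlib.sha256(data_path.read_bytes()).hexdigest() == record["sha256"]
```

The checksum lets a recipient confirm that a file matches the record, and the version fields let a later analysis identify which processing produced a given result.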

The plan for distributing the data to the collaboration will be presented at the review. Will the plan and its proposed implementation meet the needs of the collaboration? The SDSS management welcomes suggestions for changes to the plan or the method of implementation within the framework of the existing resources.

The plan for distributing the data to the astronomy community at large was drawn up by the SDSS and subsequently reviewed and approved by the NSF. Is the proposed implementation plan adequate, and can it be carried out with the existing resources?

SDSS Distribution Plans

There are two components of the SDSS data distribution plan: distribution to the consortium members and public distribution. The former is already functioning through the distribution of flat files and FITS images, and the various consortium members have developed tools to utilize these data products. In addition, there has been somewhat limited use of the JHU-developed WWW interface to the science archive.

The Science Databases are the means by which the main body of the collaboration has access to the data and is able to carry out quality and scientific analysis. During this first year of production data taking, it is crucial that as many people in the collaboration as possible have access to and work with the acquired data in a timely fashion and provide in-depth verification and feedback on the performance of the instruments. While support for more query modes still needs to be developed for the Science Databases, there should also be an emphasis on ensuring smooth and timely transfer of data from the Operational to the Science Databases as part of the standard data processing production cycle. The level of manpower on this part of the project must be maintained for collaboration data distribution, independent of any effort toward public distribution of the data.

The public data distribution plan has two components: an early release program covering data acquired prior to April 2000, and a general public archive in which the compact photometric catalog and a "finding chart" service would be made freely available. The full photometric catalog and atlas images could be accessed on a limited/controlled basis. A partnership has been established with Microsoft (Jim Gray), which will provide hardware and some development support for the finding chart service. This development will leverage Microsoft's Terraserver project. A test version of this service is already available (http://dart.pha.jhu.edu/sdss/).

The resources needed to develop and support the public archive have been estimated at 1 FTE for the first year, 2.3 FTE for years 2 through 4, and $100K for hardware and software in years 2 through 4. We are concerned that these resource estimates are too low and that the level of service proposed is too restricted. In particular, the community will benefit most from the SDSS archive only if access to the atlas images and the science archive is unrestricted and all data are fully available for cross-correlation with other catalogs and archives, e.g., NVSS, FIRST, and 2MASS. We also note that restricted access implies an access control mechanism (passwords, data encryption, etc.) and that access control issues may become complex if Fermilab hosts the public archive and is required to implement enhanced computer security measures.

Data Release

The presentation on data distribution did not include any discussion of how the final data products would be generated, validated, and documented. The complexity and drain on resources for these tasks should not be underestimated. The committee recommends that the project define and scope the job of preparing data for release before plans for either the early or subsequent releases are finalized. The project should also define a set of validation tests and criteria to establish that the data products are of release-quality. These same tests should provide sufficient characterization of products to inform users of the general quality and limits of data. This will be especially important for the first releases of SDSS data.

We hope to distribute a fraction of the data obtained prior to April 11 to the astronomy community early in 2001. We do not plan to do this until the data are fully calibrated. Will this processed data be sufficiently valuable to the astronomy community to justify the effort that the SDSS project will have to expend in distributing it, considering the current status of the project?

The early release program is an extremely important effort for the SDSS. SDSS is likely to require increasing levels of public funding in the coming years, and as a result the astronomy community will expect open access to the data. An early release will demonstrate the project's commitment to a publicly accessible archive and allow the community to become familiar with the SDSS data products. It will also allow full scrutiny of the reliability and integrity of the SDSS data. The project must therefore be careful not to make unwarranted claims about the accuracy of the early release data, and it must not make a release until the data products are indeed scientifically useful. Along these lines, thought should be given to selecting a strategically interesting data subset, e.g., a small region of sky that has complementary coverage in other surveys. Data releases from the SDSS must include documentation that describes data acquisition, processing, final product generation, analysis, and validation in enough detail to allow users to take full advantage of the SDSS data products. This is particularly important for the public data releases, where no familiarity with the Sloan data should be assumed.

As important as an early data release is for this project, it should not come at the expense of preparing for the start of routine operations when the North Galactic Cap region next becomes observable.

To the extent that significant public monies are sought for the completion of the survey, the level of expectation for public access will increase. In NASA terminology, the project may have to begin to face a transition from a PI-type mission to a facility-class mission. It is clear to the committee that the Sloan data are so beautiful and so rich that the full dataset will enable many scientific projects beyond those that are the focus of the current collaborators. Furthermore, no matter what the policy is on data releases, the members of the Sloan collaboration will inevitably have a substantial lead in exploiting the data for their own science. Broad public access can only help the project in terms of community support and maximal scientific return.

A major area of committee discussion, which was not explicitly addressed in the charge to the committee, was overall readiness for survey operations. In the following section we address issues that arose on this subject in our discussions.

Operations

None of the questions addressed in the charge to the committee deal with end-to-end systems operations of the survey. This suggests to us that the SDSS team may be underestimating the effort required to maintain efficient and reliable operation over the lifetime of the survey. Simply achieving the required level of functionality in each subsystem, whether at the telescope or in processing the data, does not guarantee that the individual components of the system can be successfully and efficiently integrated.

The material presented at the review, along with the data already obtained, demonstrates that the SDSS team has exercised all the components of the system and that it is fully capable of performing all the functions from defining a scan track, through processing the photometric data, to drilling fiber plates, obtaining the spectra, processing them, and populating the databases. What is missing is a demonstration that the team is ready to tackle the major task of a 5-year survey, in which all the functions must be executed on a routine basis, with minimal human intervention in the "routine" aspects.

Successfully carrying out all of the complex and interrelated steps required to complete the SDSS survey will require enhanced coordination among all personnel involved in the process, from pipeline programmers to nightly observers to QA operators, and mechanisms should be put in place to monitor this communication. The early appointment of an operations manager with software systems management experience and with authority over all relevant personnel would be beneficial. This is a departure from the current organization, but one that we feel will almost certainly be required to achieve the production goals of the Sloan Survey. At present, it does not appear that anyone "owns" the problem of getting the survey started.

The SDSS team itself identified the fact that observing efficiency is not currently at the level that will be required, and it appears that all of the processes (e.g., defining the fiber plates) were done on a very ad hoc basis, with significant effort and a great deal of human intervention by many individuals. It is the committee's impression that this level of effort is not sustainable as a mode of routine operations for 5 years. In summary, what has been accomplished to date demonstrates all the functionality, but not the operability, necessary to execute a 5-year sky survey with the available resources.

As part of preparing for routine operations, it is essential that the team identify as much processing as possible that can be automated, so as to reduce human intervention in the pipelines to the absolute minimum. This will be necessary in order to conduct pipeline processing with the very small staff that is available.

Another aspect of survey operations where the committee was uneasy about the project's level of readiness was configuration control and configuration management. It was unclear to the review committee whether the project has in place an adequate level of configuration control to begin full survey operations. The presentations to the committee did not define how the SDSS team could identify the processing pedigree of the science data that led to the science highlights achieved to date. Once the survey is under way at full speed, it is essential that the software used to acquire and process the data be well understood and controlled. Rigorous configuration management does not come naturally to the scientific enterprise. However, when a large survey is executed, it is essential to be able to ascertain that the data have been processed in a uniform manner. While there is a mechanism in place to maintain control of software releases, it is unclear whether the team has any plans to ensure that future upgrades are properly tested in advance so that they can be integrated into the survey pipelines in a systematic and well-understood manner.

Another issue relating to configuration management that emerged in one of the breakout sessions was that the ancillary data (e.g., various operational parameters) necessary for processing the data did not automatically accompany the survey data to the site of data processing. Procedures need to be in place to ensure that such data are properly included in the input to the pipelines. This will be vital for efficient and properly documented pipeline data processing.

The review committee recommends that the SDSS set as its highest priority using the fall observing season (i.e., the southern strip) as practice operations for the higher-priority northern survey, and that it try to operate the survey during this period as though it were normal survey operations. This should be thought of as an end-to-end test of survey operations. Such a test will provide a mechanism to identify a complete list of the operations teams, tasks, procedures, analyses, etc. that must be in place to achieve routine operation. Identifying bottlenecks in these procedures and processes will allow the staff to take corrective action in a timely manner, so that when the northern region becomes visible they will be able to execute the survey with maximal efficiency.

Another area where the SDSS team has already identified a need for future work is the survey planning software. We urge the team to develop such software as rapidly as time and resources permit. While it is possible to continue doing survey planning "by hand" in the near future, automatic planning tools must be developed as soon as possible to ensure that the survey is executed with maximal efficiency over the 5-year period.

The presentations we received indicate that the Operational Database and Science Databases are in good shape and ready to be used seriously and continuously. We agree with the stated goals of developing and following operational procedures for the monitoring and validation of the databases. Now that production data taking has begun, these databases will start to grow. We advise testing the Operational Database at the size expected after a year or two of data taking.

Submit comments and questions to the SDSS Helpdesk. Last updated 10/31/08