1. Report for July 15, 2007 (covers period from 6/16/07 to 7/15/07)

Status reports for LSST Corp.

Data Management Team

Report for July 15, 2007 (covers period from 6/16/07 to 7/15/07)

Summary of Progress
Work activity, whether it reached a milestone or not
U.Pitt/CMU & UA-CS
· These teams have finished their work and are no longer required to report.
UW (submitted on July 12, 2007 by Nicole Silvestri)
Andy B.
· Continued work on ImageSubtraction code in Imageproc:
-  Wrote computePSFMatchingKernelForMaskedImage
-  Wrote computePSFMatchingKernelForPostageStamp
-  ImageSub nearly 25% complete
· Exposure class nearly 100% complete
-  held up by some WCS class errors; RHL implemented fixes to Tim's WCS class to help resolve them
-  testing code and adding Doxygen comments
· Attended Transients Science Working Group meeting (Jul 01-03 Caltech)
· Typed up minutes from local LSST group meetings
· Prepared monthly progress report for Cristina
· Got the Kernel class ready for review; making minor tweaks as the review progresses
· Updated the EA Model class diagrams to match the "as built" Kernel classes
· Helped Nicole and Andy write their software
Lynne & Simon:
· Wrote filter documentation on ewok.astro.washington.edu:8080/pointing_timing/filters.jsp, including science questions for the science working group collaborations.
· Received new Y (Y1/Y2/Y3/Y4) filter information from Kirk Gilmore, cleaned up the curves, and converted them to a text format suitable for further testing by other codes. We will be working on this testing over the next month.
· Wrote a summary of alert discussions with various LSST team members and the Transient Working Group; sent the summary to the Science Council and the Data Access group and discussed alerts there.
· Attended Transient Science Working Group meeting (Jul 01-03 Caltech)
NCSA (submitted on July 12 by Chris Cribbs, and on July 13 by David Gehrig, Ray Plante, Steve Pietrowicz and Greg Daues)
· Ray continued moderating the DC2 telecons and the MW WG.
· Ray and Cristina are using the DC2 plan to track progress toward getting ready for Data Challenge 2. The plan is being refined to reflect work at a finer granularity.
· Ray took up the ticket to split the support classes out of the fw package. He presented a plan to the group and is executing it now.
· Ray also worked with Steve on changes to the C++ logging module for better integration with the middleware. This is largely complete, though checking in the code is pending the fw-mwi split.
· Ray worked on the Policy class, which included consulting with Jeff Bartels on the DataProperty class.
· David and Chris completed the data-moving scenario: http://lsstdev.ncsa.uiuc.edu:8100/trac/wiki/DC2DataTransfer
· David transferred 2.5 TB test data (TALCS and D4) to NCSA machine
· David continued his work toward adding automated testing (in Python) for the build system.
· Ongoing tasks for David: maintaining and debugging the build system, including updated builds (with assistance from Ray), and maintaining the Trac site.
· Steve: Event system: worked on the event system objects to allow easier filtering of message types through Mule so that we can store event messages more easily.
· Steve: Working on a C++ interface to send and receive logging event and exception messages. The preliminary work for this is done and has been successfully prototyped. The interface for Python will likely be through Swig if my tests are successful. Otherwise, a native Python implementation will be written. The logging code is 90% done, and I'm working with Ray on the final interface to the logging code.
· Steve: preliminary work on exception messaging has started.
· Greg: In work on the Pipeline Harness, integration with the SCons-based build system is complete. With this integration:
1.  C++ classes are compiled and built into a shared library, libdps.
2.  Python classes, including the main Pipeline and Slice "executable program" scripts, are installed into place.
3.  C++ classes are wrapped with SWIG and built into Python extension modules.
· Greg: Documentation of the C++ and Python classes, and verification of their adherence to the LSST coding standards, is proceeding in parallel with my Code Review of fw::Trace and fw::Citizen, with each effort informing the other.
· Chris: MACHO data transfer to the Canadian site: moved a sample of the MACHO data to Canada. We are still deciding on the best way to move the remaining data.
· Jeff Bartels: Build/install problems from 6/30 until 7/13 meant I could complete only portions of iteration #1 of the DataProperty redesign, using workarounds; the rest had to wait until the build problems were resolved. Those problems are now solved. Two iterations of the DataProperty redesign are complete, and the code is now in review. I expect further minor changes to DataProperty as review and testing of other software proceed. Some design refinements to Persistence were discussed with K-T. No further contributions to Persistence since the last week of June.
LSST/NOAO (submitted on July 20 by Jeff Kantor; no report from Robyn Allsman, who started vacation on June 8)
· We continue to prepare Concept Design Review materials, including updating the SysML database for the System Engineering requirements traceability. We finished entering the traceability from the Data Management Functional/Performance Requirements to the Science Requirements. We have started work linking the DM Sizing Model to the SysML use cases.
· We published the Data Quality Analysis workshop minutes in the project archive (document-3760), including an updated list of 34 DQ metrics that will be acquired and analyzed during the survey to ensure that we are meeting single image and full survey science requirements.
· We participated in discussions with Jean-Yves Nief of IN2P3 France about participation in LSST DM. They are interested in positioning themselves to become a Data Center. NCSA and SLAC are working on a proposal to involve them in Teragrid-Open Science Grid integration activities, including having a version of DC2 pipelines executing jointly across both grids with nodes in the U.S. and France.
· We prepared the Draft LSST Data Access Whitepaper (document-3741) and submitted this as the report of the board-appointed committee, including Jeremy Mould, Jeff Kantor, and Phil Pinto. We are awaiting feedback from the board.
LLNL (submitted on July 16, 2007 by Don Dossa)
· My activities included running the technology assessment conference call. Topics were the preliminary assignments of people to write updates to the NSF MREFC proposal. Areas that need to be examined are multi-core chips, FPGAs, GPUs, Cell, and perhaps accelerators. Network areas are broken out into summit->base, base->Miami, and Miami->NCSA & SDSC. Memory technologies include DDR3, DDR4, NAND/NOR flash RAM, phase-change RAM, and CNTRam. CNTRam definitely needs to be included, since Intel announced plans for prototype availability in 2 years. Disk updates include rotating disks and SSDs. Tape and archiving will also be looked at. I expect most of these areas will not result in updates to the MREFC. Any changes we make in the projections will require updated cost estimates.
Database Group (submitted on Jul 12, 2007 by Jacek Becla for J Becla/SLAC, K-T Lim/SLAC, Ani Thakar/JHU, Maria A. Nieto-Santisteban/JHU, Kem Cook/LLNL, Serge Monkewitz/IPAC, Andy Hanushevsky/SLAC)
· storage estimates [Jacek, KT/SLAC, Kem/LLNL]
-  added estimates for images, Observatory Telemetry (db and images), cutout images
-  applied changes suggested by science council about star/galaxy count, epochs, etc
-  reviewed and improved assumptions related to growth estimates, other assumptions
-  documented everything in a Word spreadsheet
-  released
· association pipeline
-  finished tests related to database-centric approach [Serge]
-  testing application-centric approach [Serge]
-  investigated pipeline harness [KT]
· DBMS Storage [KT]
-  have working prototype for writing, ready for review
· Scalable query architecture
-  investigated ADQL [KT]
-  documented scalable query architecture in wiki [Jacek, KT]
· xldb workshop [Jacek]
-  have official webpage http://www-conf.slac.stanford.edu/xldb07/
-  will be opening official registration this week
· other activities:
-  several DataAccWG telecons [all]; topics: association pipeline, source classification, storage estimates, file system metadata
-  Maria/Ani:
§ Responded to Serge about the inner loop join hint question.
§ Participated in the SQL Stored Procedure Coding standard discussions
§ Gave feedback about the ADASS abstract
§ Followed with great interest the discussions about MOPS, alerts, and VOEvents
-  KT
§ email exchange about alerts
-  Jacek
§ discussion about IN2P3 involvement
§ discussion about HOURS
GMU (submitted on Jul 12, 2007 by Kirk Borne)
GMU staff supports the LSST project in the areas of data management, data products, community science database access, and EPO (education/public outreach). Specific monthly activities during the reporting period included:
· GMU organized and participated in a joint telecon regarding LSST-GCN-VOEvent astronomical alert notifications, with participants from numerous projects in the US and UK.
· GMU initiated and participated in an LSST-DATA listserv thread that discussed how many and what kinds of alerts the LSST data pipeline should produce, and what the user community's expectations and requirements are.
· GMU attended the NSF CI-TEAM (Cyberinfrastructure Training, Education, Advancement, and Mentoring for Our 21st Century Workforce) pre-proposal bidders' planning workshop in Washington, DC on July 10-11: http://www.eotepic.org/page.php?file=citeam/agenda.html . GMU provided a summary of the workshop to LSSTC personnel.
· GMU prepared initial plans, notice of intent, and ideas for an NSF science education proposal that relates to the LSST EPO plans. GMU provided these to LSSTC.
· GMU met with a local high school science teacher to discuss the use of astronomical data in the classroom. LSST data and LSST EPO were discussed; the teacher was very excited and provided some feedback, including his own needs and expectations for such activities. GMU provided a brief report to LSSTC on these discussions.
· GMU submitted an abstract for a proposed paper to the 2008 IEEE Aerospace Conference on the subject of Dynamic Data-Driven Discovery, which includes references to the LSST science data environment.
Detailed Progress
Details of above
· Steve: Adding symbolic links to the Boost library modules caused problems for client programs building against them, so I removed the BLF script from the build system. Per the consensus at our weekly telecon, we will use third-party software without modification and change the build system to accommodate it.
Major Accomplishments
Only significant breakthroughs, issues resolved
· NCSA: 10 new LSST systems are now installed. They are named lsst1 – lsst10. They are running RHAS4 with the 2.6.9-55 kernel. All logins are in place. (Chris Cribbs)
· LLNL: Don’s MLA has been approved and he has an appointment as an Associate Research Physicist at UC Davis. All of the bureaucrats have finished moving paperwork and I can now be paid via UC Davis. Isn't that amazing? We started this process last August. (Don Dossa)
Milestones Achieved
Only major tasks in project plan
Objectives for the Next Period
What you expect to accomplish by next month
Andy B.:
· Write code for fitKernelsUsingPrincipalComponentAnalysis use case
· Write code for computeSpatiallyVaryingPSFMatchingKernel use case
· Remaining tasks associated with Ticket 36 include:
-  test WCS class implementation in wcsMatch function
-  Check in for review
-  Merge ticket36 (after review) to trunk
· Start helping Andy with ImageSub code
· Update EA model sequence diagrams with changes from final code version
· Send transients SWG summary to Science Council
· Implement review changes from RHL on Kernel class
· Commit ticket25 to the trunk
· Start writing the python interface for this code
· Start writing a test suite using the python interface
· Add a sequence diagram for PSFMatch
· Find somebody to modify EA model to change "camera" to "image" in various diagrams and use cases
Lynne & Simon:
· Work on light curve service plan examples; improve them based on feedback from the science working groups.
· Further testing of Y filter data
NCSA:
· Ray and Cristina will continue to use the DC2 plan to monitor progress and to make adjustments to the schedule and/or scope as necessary.
· Ray will continue to moderate the DC2 and MW WG telecons.
· Ray will finish the split of the fw package, check in the logging changes, update the EA model accordingly, complete the Policy class, and complete the FileFormatter class for use in DC2.
· David will move the TALCS raw data to the NCSA machine and ingest the data into the precursor repository; he will also work on the automated testing paradigm/code for the build system and continue maintaining the build system and Trac.
· Steve will finalize the logging and exception code for the C++ and Python interfaces. (He will be on vacation for two weeks starting August 4.)
· Greg: Within the next month the documentation and code checking will be completed, and a version of the Pipeline Harness under lsst/dps will be submitted for Code Review. Development and testing of the Harness will also go forward to solidify its ability to wrap application developers' "AppStages", with special attention to accommodating the requirements of the Association Pipeline work of KT Lim et al.
· Chris will revisit the road maps for CPU, storage, Tape vs Disk.
LLNL:
· Don has started looking into automatic software recovery from errors. Valerie Taylor, whom Don knows well from Salishan, has written papers on this topic; Don will be reading them during July and August.
· We have made some progress on determining the requirements for the OCS connection into DM at the base. Don wants to hold a meeting in La Serena with everyone involved: Don, Chris Smith, Ron Lambert, German Schumacher, and hopefully Jeff Kantor.
LSST/NOAO:
· Robyn and Tim on vacation until Jul 31
Database Group:
· coordinate implementation of Association Pipeline and Database Services for DC2
· implementation of association pipeline [Serge]
· implementation of Database Services [KT]
· install database software (CORAL, MySQL...) on the NCSA cluster
· work on interfacing application code with database schema
· documenting the LSST Database for RDBMS vendors, including the planned scalable architecture [Jacek, KT]
· continue work on xldb workshop
· continue work on mysqld + xrootd [AndyH]
· integrate persistence architecture with rest of code including check-in to subversion, code reviews, and creation of Trac components [KT]
· ensure DC2-level metadata is adequately handled [KT]
· continue architecture work on query [KT]
· continue refinement of alert generation requirements [KT]
· vacations mid-July to mid-August :=) - will read email and do light work
· GMU will work on at least one proposal to an NSF science-education related program involving large scientific data sets (such as LSST).
· GMU will develop a report that summarizes the review of the LSST sample queries from the assembled responses of the LSST education / outreach partners.
· GMU will report on the science-specific sections of the LSST Database Schema to the LSST Galaxies Collaboration Team.
Problems Encountered and Solutions Being Pursued
Budget or schedule variances, technical issues, management issues
· Don Dossa: Crosstalk correction: the question of who performs crosstalk correction, and the resulting file size, has not been resolved. Mike Huffer and I agree that, depending on the extent of the crosstalk, either of us could do it. I met with Mike Huffer, Richard Mount, and Jacek Becla at Stanford to discuss their petacache project. While it may be too early for LSST, it may have more immediate applications at LLNL, which would help keep their project alive.
· Chris: Connect SAN disk to LSST cluster: waiting on 4 Emulex daughter cards and a Brocade switch to put in the Dell chassis. Once they are installed, I can hook up the SAN disk.
· Steve: investigated what appears to be a bug in SWIG that surfaced in Robyn's Log.cc code, which was checked into the source tree. A boolean wasn't being initialized, which caused spurious output to be sent without header information. I don't know of a way around this.
· Ray: Logging design issue: when Steve set about connecting the logging module to the event system, he found two problems.
o   Full records were not being flushed atomically. In particular, it was not possible to determine when a message ends until a new message starts.
o   When reset() was used or the verbosity changed, partial records were flushed out, and some bits were out of order.
We also noted that the design allowed only a single destination per application. There were also some features we felt should be added that are helpful in both development and production. We worked on design changes at the C++ level to support the following:
o   atomic recording of messages
o   add a traditional, text-oriented message interface
o   allow for independent Log instances in a single runtime
o   allow multiple destinations for messages (screen & file) that support different verbosity levels.
o   retain all currently supported features.
Ray has implemented and tested a new version based on this design.
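The design goals listed above can be illustrated with a minimal C++ sketch. It is not the Log class Ray implemented: the class and method names (LogDestination, Log::addDestination, Log::log) and the integer importance/threshold scheme are hypothetical, chosen only to show how whole records, independent Log instances, and multiple destinations with their own verbosity levels fit together.

```cpp
#include <memory>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch only: these names are not the actual LSST logging API.
// A destination (screen, file, ...) with its own verbosity threshold.
class LogDestination {
public:
    LogDestination(std::ostream& out, int threshold)
        : out_(out), threshold_(threshold) {}

    // Write one complete record or nothing at all. The mutex keeps records
    // from different threads from interleaving on the same stream.
    void write(int importance, const std::string& record) {
        if (importance < threshold_) return;  // filtered by this destination only
        std::lock_guard<std::mutex> lock(mutex_);
        out_ << record << '\n';
        out_.flush();
    }

private:
    std::ostream& out_;
    int threshold_;
    std::mutex mutex_;
};

// Independent Log instances can coexist in one runtime; each one fans
// records out to any number of destinations.
class Log {
public:
    explicit Log(std::string name) : name_(std::move(name)) {}

    void addDestination(std::shared_ptr<LogDestination> dest) {
        dests_.push_back(std::move(dest));
    }

    // Text-oriented interface: the full record is assembled before anything
    // is emitted, so its end is known without waiting for the next message.
    void log(int importance, const std::string& message) {
        std::ostringstream record;
        record << '[' << name_ << "] " << message;
        const std::string text = record.str();
        for (const auto& dest : dests_) dest->write(importance, text);
    }

private:
    std::string name_;
    std::vector<std::shared_ptr<LogDestination>> dests_;
};
```

For example, attaching a screen destination with threshold 3 and a file destination with threshold 1 to the same Log would show only the more important records on screen while the file receives everything. Because each record is built in full before any destination sees it, a later reset() or verbosity change cannot leave a partial record behind.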

