1. Report for April 15, 2007 (covers period from 3/16/07 to 4/15/07)

Status reports for LSST Corp.

Data Management Team

Report for April 15, 2007 (covers period from 3/16/07 to 4/15/07)

Summary of Progress
Work activity, whether it reached a milestone or not
U.Pitt/CMU & UA-CS
· These teams have finished their work and are no longer required to report.
UW (submitted on Apr 13, 2007 by Nicole Silvestri)
All: March 21-23  
· Andy, Nicole, Russell, Lynne and Zeljko attended the UW DM meeting with Jeff, Tim (via Polycom), Robert, Roc Cutri, and Deb Levine. Lots of good discussion about our use cases, QA and quantifying metrics in use cases. Chatted for a few hours on March 23 with RHL about Kernels and spatial decomposition before the PCA (an alternate course in our "Compute Spatially Varying PSF Matching Kernel" use case).
Andy B.
Mar 15 - Mar 26
· Debugged the build system on Darkstar. Got it working, and got SCons to build MaskedImage. Currently stuck on SWIG problems.
· Worked on designing New New Kernel class in EA model.
Mar 27 - Apr 4
· out of town at SDSS meeting and giving talks.
Apr 4 - Apr 11
· Created ticket to work on Kernel class.
· Checked out a branch for this ticket and am now working on adding code.
· Defined metrics for image processing in DC2.
March 15-20
· Lots of work on Kernel class 'new new Kernel' with Andy and Russell.
· Fleshed a lot of this out in EA. See Nicole's DM change emails from this week.
· Fleshed out some final use case issues (mostly naming issues) and synced-up the corresponding robustness diagrams to match the use cases.
· Helped with preliminary discussion of what is now, hopefully, the final Kernel class.
March 26-30
· started working on some sequence diagrams for our use cases in EA this week. Then Russell made a variety of changes to the use cases on Friday, as we discussed during our DM meeting last week. See his DM change notes and email from this date.
· spent the day at Bellevue Community College on Wednesday from 8-2pm. Gave astronomy presentations on meteorites, comets, and telescopes (like LSST) for the 'Expanding Your Horizons' high school girls conference.
April 2-6
· some minor work on sequence diagrams, not submitted to the final EA model. Just practicing and evaluating whether a sequence diagram is even necessary for some of our simpler use cases.
· met with Andy and Russell to discuss CoDR and DC2 dependencies and metrics. Helped flesh out metrics that Andy published at http://lsstdev.ncsa.uiuc.edu:8100/trac/wiki/DC2ImageProcMetrics
· had to address some non-LSST research/work that has been put off in favor of LSST work [A&A review paper due, referee report response on my SDSS paper due, Gemini and IRTF Observing proposals due, organize observing runs on APO for Apr 15 & 21, comments on collaborators paper due, meetings with student research advisee].
April 9-11
· Made first attempts at generating code in EA and transferring to build system computer (Darkstar). Sent detailed email to RHL & Tim with questions concerning this process. Discussed most of these issues at the DC2 telecon on Apr 10.
· Spent some time, on Tim's advice, using EA to place methods (from sequence diagrams) onto classes, primarily for the Exposure class.
· Finishing two final sequence diagrams. Need to chat with Tim about adding methods in these diagrams to the MaskedImage class.
· Sent email to science collaboration members in the UW department reminding them to contact their science chairs soon concerning comments on the database schema sent by Jacek, the cronos92 website developed by Lynne and Simon, and the database queries discussed at the All-Hands meeting.
· Compiled this report
March 15-23
· Lots of work on the EA model. See DM change emails from these dates.
· lots of work on the 'new new Kernel' class.
March 26-30
· Very preliminary proposal for kernel class layout, though Russell was hoping to have more feedback by now so it was further along. Waiting on final comments from RHL.
April 2-6
· attended important APO meeting for the week.
April 9-11
· on vacation Apr 9/10.
· continuing work on design of the final Kernel class. Contact RHL for feedback. Added overhauled Kernel class design to EA.
March 15 - April 11
· Working with Simon Krughoff, developed a web interface & documentation for cronos92 output suitable for use by the science collaborations. A particular facet of this website is the ability to query the cronos92 database over the web, pulling the observational cadence of particular RA/Dec pointings and the limiting magnitudes of each observation at these pointings. This website is available at http://ewok.astro.washington.edu:8080/pointing_timing/index.jsp . Hope to receive feedback from the science collaborations and will add further interfaces as requested
· Working on 'Work Package 2' for the calibration team. Work package 2 is developing catalogs of standard stars and future potential standard stars (such as a simulation of the locations of main sequence stars in our galaxy) for LSST photometric calibration.
· Took filter wavelength throughput information from Kirk Gilmore, created power-law and blackbody source profiles, ran these through the filter wavelength throughput to measure magnitude errors created when source profiles differ from the flat-power spectrum that will be used to create magnitude measurements in LSST databases.
· Currently adding Kurucz star spectra and supernovae spectra as other source profiles. In the future (perhaps in the next month) I will be adding different atmosphere throughputs, to evaluate the magnitude errors due to non-gray (or otherwise changing) atmospheric throughput.
· Kept in contact with the PS MOPS team about their progress on various aspects of MOPS. Also ran the known catalogs of NEO orbits forward for the ten years of cronos92 operations, and matched the positions of these known NEOs against the cronos92 field locations, requiring a time cadence of observations consistent with current MOPS limitations. If the LSST 5-sigma limiting magnitude in g,r,i holds at about 24.3, we could potentially detect 75% of all known NEOs in the cronos92 fields.
· Ran preliminary tests of MOPS, which were halted due to lack of time and then lack of access due to security issues.
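The filter-throughput test described above (power-law and blackbody source profiles compared against a flat spectrum) can be illustrated with a rough sketch. This is not the UW code: the top-hat filter, the 475 nm reference wavelength, the photon-weighted band average, and all function names below are assumptions made for illustration, and the real work used the measured throughput curves from Kirk Gilmore.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def tophat_throughput(wavelength_nm, lo_nm=400.0, hi_nm=550.0):
    """Hypothetical top-hat filter standing in for a measured throughput curve."""
    return 1.0 if lo_nm <= wavelength_nm <= hi_nm else 0.0


def flat_fnu(wavelength_nm):
    """Flat spectrum: f_nu is constant with wavelength."""
    return 1.0


def power_law_fnu(alpha):
    """Return f_nu proportional to nu**alpha (nu is proportional to 1/wavelength)."""
    def fnu(wavelength_nm):
        return (1.0 / wavelength_nm) ** alpha
    return fnu


def blackbody_fnu(temperature_k):
    """Return the Planck function B_nu(T), up to a constant factor."""
    def fnu(wavelength_nm):
        nu = C / (wavelength_nm * 1e-9)
        return nu ** 3 / math.expm1(H * nu / (K_B * temperature_k))
    return fnu


def band_average(fnu, throughput, lo_nm=300.0, hi_nm=1100.0, steps=4000):
    """Photon-weighted mean of f_nu over the band, by the midpoint rule."""
    dl = (hi_nm - lo_nm) / steps
    num = den = 0.0
    for i in range(steps):
        lam = lo_nm + (i + 0.5) * dl
        weight = throughput(lam) / lam
        num += fnu(lam) * weight
        den += weight
    return num / den


def magnitude_offset(fnu, throughput, ref_nm):
    """Magnitude difference relative to a flat-f_nu source that has the
    same flux density at the reference wavelength ref_nm."""
    return -2.5 * math.log10(band_average(fnu, throughput) / fnu(ref_nm))


if __name__ == "__main__":
    for label, source in [("flat", flat_fnu),
                          ("power law, alpha=1", power_law_fnu(1.0)),
                          ("blackbody, 5800 K", blackbody_fnu(5800.0))]:
        offset = magnitude_offset(source, tophat_throughput, 475.0)
        print(f"{label}: {offset:+.4f} mag")
```

By construction the flat spectrum gives a zero offset, so any nonzero value measures how much the spectral shape alone shifts the in-band magnitude under these assumptions.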
NCSA (submitted on Apr 13, 2007 by Steve Pietrowicz, Chris Cribbs and Greg Daues)
(Ray was on a mini-sabbatical during the month of April)
· Ray covered system administration and user support formerly handled by Fleming and Williamson.
· Ray continues to moderate the DC2 telecons.
· Ray has gathered inputs for the DC2 plan and is integrating that information together.
· Ray has been working on a new release of the DC2 software stack. This includes:
§ re-organization to heighten the visibility of the multi-platform support.
§ lower-case (external) package names for ease of use
§ full integration of scons as a build system
This work should be completed early in the next month.
· Ray worked with the DataAcc WG on the model for persistence and storage
· Steve put together a set of slides on the event system for the upcoming presentation.
· He started to work on the NetLogger aggregation task, but had that task reprioritized because of the need to look at the build system and get Boost working on the Mac platform. (see more under details)
· Steve fixed several things in the TRAC bug list.
· Greg worked on coding interfaces and a trial implementation for the Distributed Processing section of the DC2 Logical Model, filling out header and source files for:
· Pipeline
· Slice
· Stage
· Queue InputQueue/OutputQueue
· Blackboard
· In testing out the implementation Greg is arriving at challenges to the current model in the areas of
1.  Pipeline initialization
2.  Relationship between Queues and Blackboards
3.  Starting up a Pipeline
and will continue to iterate, pushing the implementation forward while feeding back into the model.
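As a rough illustration of how the components named above might fit together, the sketch below wires stages together with queues and passes a blackboard through them. It is not Greg's implementation: every class body, method name, and the choice to chain stages through one queue each are assumptions, and Slice (parallel execution) is left out entirely.

```python
from collections import deque


class Blackboard(dict):
    """Hypothetical shared key/value workspace for one unit of data."""


class Queue:
    """Hypothetical FIFO of blackboards connecting two stages."""

    def __init__(self):
        self._items = deque()

    def put(self, blackboard):
        self._items.append(blackboard)

    def get(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


class Stage:
    """One processing step; func mutates the blackboard in place."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.input_queue = None
        self.output_queue = None

    def process(self):
        # Drain the input queue, apply the step, pass results downstream.
        while len(self.input_queue):
            blackboard = self.input_queue.get()
            self.func(blackboard)
            self.output_queue.put(blackboard)


class Pipeline:
    """Initialization links each stage's output queue to the next stage's input."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.queues = [Queue() for _ in range(len(self.stages) + 1)]
        for i, stage in enumerate(self.stages):
            stage.input_queue = self.queues[i]
            stage.output_queue = self.queues[i + 1]

    def run(self, blackboards):
        for blackboard in blackboards:
            self.queues[0].put(blackboard)
        for stage in self.stages:
            stage.process()
        out = self.queues[-1]
        return [out.get() for _ in range(len(out))]
```

Even in a toy like this, the design questions listed above show up: who creates the queues, whether a blackboard belongs to a queue or to a stage, and what has to exist before the pipeline can start.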
· Chris is working on the Abe migration plan. Currently we are looking into using an existing group of servers that is being used in the security group. They are blade servers that are a few years old. He is talking to the group to see if they are about to part with the systems.
· Chris is working on finishing the slides for the NSF report. He has finished about half of them but still needs to go through and polish them up and make sure the data is complete.
· Chris has been talking to Robyn about transferring the MACHO data to a different site. He is still working on a file list to see how long staging the data will take. His early guess is a few weeks to get the data to disk and a few more to move it to the remote site.
LSST/NOAO (submitted on Apr 13, 2007 by Robyn Allsman, and Jeff Kantor on April 24)
DM Standards
· Revisiting DC1 logging mechanism to verify that it remains best option for DC2 use
· Continuing to update the LSST Error Handling Document ( http://lsstdev.ncsa.uiuc.edu:8100/trac/wiki/ErrorHandling ) on issues, standards, and testing related to Exception Handling.
· Worked with TimA on interim DataProperty class used by Exception when passing arguments to exception handler. All is working now.
· Installed the DC2 stack on another 3 systems locally. NCSA build system and documentation is just about bullet-proof; especially if the installer remembers to read the document when there is a glitch during installation!
· We expanded the Application Layer design documented in the UML model and implemented initial versions of several of the Application Framework classes: Exposure, Kernel, LsstData, LsstBase, Persistence, DBMSStorage , Metadata, Policy, Provenance, ReleasePolicy, Pipeline, Slice, Stage, Queue (InputQueue/OutputQueue), Blackboard
· We presented an overview of all the data challenges and a description of work to date on DC2 to the LSST Board.
· Previous plans for utilizing shared time on an existing NCSA cluster for DC2 execution are not going to work. The NCSA team leader is working on acquiring a new, dedicated cluster at NCSA for our use.
· We are otherwise on track for the scheduled execution of DC2 in September through November, 2007.
· Assisting Jon Smillie (ANUSF/Australia) to install the precursor data into a DB and provide a VO-enabled front-end.
My role: point him to my work done a year ago which includes:
1)  definition of DB schema
2)  scripts for formatting the data for CSV ingest
3)  scripts for ingesting into MySQL and PostgreSQL DB
JonS has ingested all the metadata.
· Following is on indefinite hold:
o   Past: talked with Ray about what needs to be done to integrate the LSST-reformatted MACHO data into the LSST data archive. Project scientists will determine which subset of the data should be ingested into the LSST Precursor DB. This detail doesn't need to wait on any preliminary task.
o   Future: ask DM Software manager what data subset should be ingested.
(Same as last month)
· Past: images containing (at least) one truncated amp image were reprocessed at ANUSF to remove the partial amp image and then WCS reprocessed. The xfer of reprocessed images to NCSA was prematurely stopped when size of new images was found to be too short for any amp image to be included.
· The LSST Board approved an amendment to our existing Memorandum of Understanding for the Moving Object Processing System (MOPS) with Pan-STARRS. Pan-STARRS will fund a year of burdened salary of an LSST DM programmer (Francesco Pierfederici) to make further modifications to the MOPS in the areas of parallelization, modularity, and processing cadence.
Concept Design Review
· We refined the work plan for the DM Concept Design Review (CoDR) presentations, primarily to address the change of dates to August, 2007. We continue to prepare CoDR materials, but are deferring the planned April dry runs/mini-reviews until July. CoDR preparation includes updating the SysML database for the System Engineering requirements traceability. We populated the database with initial Operational Use Cases and Actors. These Use Cases will be expanded and will also form the basis for Data Quality Analysis requirements.
System Engineering
· In collaboration with the System Engineering group, we planned a cross-team workshop (June date TBD, location IPAC) to define the Data Quality Analysis requirements and end-to-end quality flow in the LSST. The results of the workshop will enable us to perform the next level of detail in design of the Data Quality Analysis system in DM.
· We submitted a poster to the upcoming NASA conference “Science Archives for the 21st Century” concerning handling large-scale databases with design utilizing partitioning, parallelization, and provenance.
· Don Dossa has been occupied 100% with organizing a DOE conference – no specific LSST activity to report for this period.
SLAC (submitted on Apr 12, 2007 by Jacek Becla)
· worked on Persistency, in particular Storage class:
o   proposed APIs, decided to evaluate CORAL from CERN, evaluating CORAL now
· updated database storage estimates
· provided schema/provenance related input for the abstract for Science Archive Workshop. Accepted as poster
· continued work on Object partitioning tests (Serge et al.)
· wrote introduction for Schema document
JHU (submitted on Apr 14, 2007 by Ani Thakar)
(Maria still on thesis break; Ani on vacation (trip to India) Apr 1-10)
· Data Access WG telecon on Mar 14
· Reviewed CORAL document sent by Jacek
· Persistence (special) telecon with Jacek and Serge on Mar 20
· Data Access WG telecon Mar 28
GMU (submitted on Apr 12, 2007 by Kirk Borne)
GMU staff supports the LSST project in the areas of data management, data products, community science database access, and EPO (education/public outreach). Specific monthly activities during March 2007 included:
· GMU began reviewing the LSST Database Schema and sample queries from the perspective of community science and LSST science collaborations.
· GMU began reviewing the Provenance sections of the LSST Database Schema for inclusion in a workshop poster paper (see below).
· GMU participated in the "data in education" outreach activities initiated by the JHU Sloan Outreach Specialist, including two original postings to the dataineducation.blogspot.com website.
· GMU is preparing a proposal for the NSF CCLI (Course, Curriculum, and Laboratory Improvement) program: to develop a Data Sciences undergraduate degree program at GMU, which will include courses in data-intensive (LSST-related) science applications.
Detailed Progress
Details of above
· Steve P. (NCSA): I've worked on getting Boost building, and after a few messages to the mailing list, determined that some of the problems were that the Boost.Test work does not compile on the Mac. The Boost.Python code also doesn't compile. The EUPS build also didn't specify the Python version correctly, and the boost build was incorrectly flagged for the Mac platform. I was able to fix Boost.Python code to compile correctly, and I'm working on getting that reintegrated and my other fixes back into the build system. This has taken longer than I had anticipated because of lack of good information from the mailing list, and because the turnaround time for running tests takes such a long time with the compilations that are required. I was also out of the office for part of this time.
Anyway, things are looking a lot better for the builds now, and I hope to have the fixes integrated very soon so I can get back to the event system work.
Major Accomplishments
Only significant breakthroughs, issues resolved
· GMU prepared and submitted an abstract (which was approved for presentation) for the NASA Science Archives Workshop in April 2007.
Milestones Achieved
Only major tasks in project plan
Objectives for the Next Period
What you expect to accomplish by next month
Andy B.:
· Flesh out Kernel class in EA model.
· Write supporting code for Kernel object and Difference Imaging subclass.
· Have a look at CFHT data to see how we can make use of it in DC2. In particular, how can we make a MaskedImage out of the data we are sent?
· Flesh out Exposure class in EA.
· add alternate course to "Compute Spatially Varying PSF Matching Kernel" use case as per our discussion with RHL on March 23.
· Generate code from this and begin writing code for Exposure class which should be relatively easy.
· Help Andy figure out how to use VW to add basic Kernel functionality (eg. convolution) to the Kernel class.
· finish with the last few sequence diagrams which will add the final methods to the appropriate classes for the IPP.
· Finish design of Kernel class.
· Help Andy and Nicole with Kernel/VW code.
· Simon and Lynne plan to add a couple of dither patterns into the cronos92 pointings to evaluate the effects of these dithers on science such as weak lensing or evaluating the power spectrum. However, still need feedback from science collaborations (or others) on how to best evaluate these effects.
· plan to restart MOPS testing and write up summary.
· Work Package 2 development: will be ramping this effort up over the next month.
· In the future (perhaps in the next month) will be adding different atmosphere throughputs to the Kurucz models, to evaluate the magnitude errors due to non-gray (or otherwise changing) atmospheric throughput.
· NCSA team will continue to work on the LSST software build and deployment system.
· Ray and Cristina will work on the DC plan.
· Cristina will continue to moderate the DC2 telecons, until Ray returns.
· Chris will participate in a cluster file system conference during the week of April 23rd.
· Steve will finish up fixes for the build system (Boost specifically).
· He will help get David up to speed with Trac and the build system.
· Steve will work on Netlogger aggregator.
· Steve will continue event system design.
DM Standards
· Present uniform logging format to DC2 working group for discussion. Then integrate into Lsst Exception class
· RHL has implemented a logging class which doesn't include the network distribution manager. That is what I'm exploring.
· I will perform general sanity check on schema and ingested data.
· JonS examining error logs to determine if there is a problem. Could just be that all amp images were determined to be truncated in original image.
· Waiting on JonS; no current DC2 task is waiting on this completion.
· continue work on Storage: build toy implementation
· finish bulk of partitioning tests for DC2
· Association pipeline for CoDR – still need to discuss with Tim and Serge
· GMU will develop and present a poster paper on "LSST: Preparing for the Data Avalanche through Partitioning, Parallelization, and Provenance" at the NASA Science Archives Workshop April 25-26 in College Park, MD.
· GMU will develop a report that summarizes the review of the LSST sample queries from the assembled responses of the LSST education / outreach partners.
· GMU will report on the science-specific sections of the LSST Database Schema to the LSST Galaxies Collaboration Team.
Problems Encountered and Solutions Being Pursued
Budget or schedule variances, technical issues, management issues
· NCSA has a new Testing and Services Specialist on board (start date April 16). His name is David Gehrig and he will be responsible for developing tools for deploying the LSST software environment on multiple nodes that make up a grid testbed. He will also be responsible for setting up and maintaining the environment for the LSST Data Challenges.
· Ray Plante is on a mini-sabbatical in April – during this time other NCSA team members are covering part of his duties (Cristina leads the DC2 telecon, Greg moderates the Middleware WG, etc.)
· GMU budget was adjusted to remove the graduate student support, in parallel with the adjusted statement of work, which removed the development of an AstroDAS demonstration.
· Jacek B: Planning to hire Kian-Tat starting at the beginning of June; I'll start working on the necessary paperwork later this month.

