1. Project Status
2. Current Photos
3. Risk Management
4. Detailed Project Progress and Status
    1. LSST Program Office
    2. University of Washington
    3. Princeton University and University of California, Davis
    4. IPAC / California Institute of Technology
    5. SLAC / Stanford University
    6. NCSA / University of Illinois
    7. NOAO

Data Management Monthly Report

May 2015
 
Project Status

The Summer 2015 release continued, with all teams working on the next round of features and tests against the LDM-240 Milestones and Key Performance Metrics. The rate of progress is as reported in the April report, and continues to be consistent with the reduced level of staffing relative to the initial plan. A major focus in this release is on incorporating into the LSST software stack all relevant enhancements made to the software for HSC, so that all functionality is working in both stacks.


An extensive update of the Data Management Development Roadmap (LDM-240), covering FY16 - FY18, was initiated in April. This "bottom-up" iteration for the first time incorporates input from the institutional Technical/Control Account Managers (T/CAMs) and aligns the Roadmap with their detailed plans for development in each location. We anticipate completing this update by the end of June.
 
In addition, the milestone and key performance metric information in LDM-240 is now being captured in JIRA Agile, and tightly integrated with the epics and stories that define the detailed development plans. With this integration, it will be possible to follow explicit links from LDM-240 milestones and metrics, down through PMCS activities, to JIRA Agile epics and stories.
 
We are continuing the process of analyzing the budget so that we can start preparing FY16 amendments. We will allocate some of the underspend, which results from a slower ramp-up than planned, to allow the institutions to hire additional resources to catch up on the deferred work.
 
The amendment to the NCSA agreement for the procurement of hardware was reviewed and updated, especially for property management and the handling of equipment purchases that will remain in operation at NCSA. We have deferred the handling of equipment destined for Chile, as that will not occur for 2 more years and we need to address the NCSA-based equipment first.
 
The Brazil MOA (Networks) has been revised based on input from RNP, and is now being reviewed by the Brazilian parties.
 
Recruiting and hiring activities continued across all DM institutions: one new offer was made, and one position filled. Eighteen positions have been filled to date since the MREFC award, while six positions are currently open. The DM SQuaRE Scientist position has been offered to an excellent candidate.
 




Current Photos
 
No new photos this month.
 




Risk Management
 
The DM Risk Register was reviewed in the monthly process. No new risks were added and no significant changes to existing risk exposure were made.

 




Detailed Project Progress and Status


LSST Program Office

DM Project Management and Control

Current accomplishments:

The DM Project Manager:


·   Conducted the International Networks Acquisition Review on May 6 in Tucson. The review was passed successfully and the contracts are in place to acquire the international networks between Chile and the United States. The planned bandwidth is now 3 diverse paths of 100 Gbps each, pending successful signing of the MOA with Brazil.
·   Continued work on the amendment to the NCSA contract covering LSST equipment procurements. Meetings were held to determine the property management approach for equipment remaining at NCSA. Details of handling equipment destined for Chile are deferred to a future amendment, since this will not occur in this FY.
·   Continued supporting the process to move the draft Memorandum of Agreement with Brazil to signature. Comments from RNP were incorporated and a new draft was circulated to individual contributors from FIU/Amlight, LSST Corporation, and LINeA. Continued work on the visa process for Dr. Angelo Fausti of LINeA to work in Tucson for 1 year starting in July 2015.
·   Conducted the quarterly DM Leadership Team Meeting at NCSA. Introduced K-T Lim as the new DM Project Engineer. Substantial work on revising LDM-240 DM Roadmap was accomplished and several other topics were addressed. Meeting notes and actions are at:
  https://confluence.lsstcorp.org/display/DM/DM+Leadership+Team+Meeting+2015-05-18+to+2015-05-21
 
·   Continued work on JIRA Agile/PMCS/LDM-240 integration. Created example "meta-epics", milestones, and key metrics in JIRA. Demonstrated producing LDM-240 style tables from JIRA.
·   Continued work on hiring the DM SQuaRE scientist, addressed relocation issues, and made an offer for the candidate's spouse to address the "dual body" issue.
·   Developed an initial list of topics for the LSST 2015 Meeting at Bremerton August 17 - 21.
 

Planned activities:

The DM Project Manager will:


·   Prepare an LCR to LSE-78 LSST Observatory Networks Design and LDM-142 LSST Network Sizing to implement changes from the International and Chilean Networks Acquisition Reviews and incorporate updates to the summit network.
·   Continue recruiting and hiring, close on acceptance for the DM SQuaRE scientist.
·   Continue work with AURA and NCSA on the amendment to the NCSA contract covering LSST equipment procurements.
·   Continue supporting the process to move the draft Memorandum of Agreement with Brazil to signature.
·   Continue planning and coordination for the LSST 2015 Meeting at Bremerton August 17 - 21. 

DM Science

Current accomplishments:
The DM Project Scientist's May activities continued to focus on the study of the efficiency of LSST as a detector of NEOs, and on the Data Management Leadership Team meeting at NCSA.
 
To continue the LSST NEO efficiency study, the DM Project Scientist met at IPAC with the leads of the NEOWISE team, Amy Mainzer and Roc Cutri. They discussed the NEOWISE observing strategy and the perceived challenges of discovering NEOs with LSST. The DM Project Scientist also participated in the 2-day LSST Solar System collaboration workshop, where he presented LSST plans and where the topic was discussed as well. Together with similar discussions that other team members have had at the Hotwired 3 meeting, these inputs are being used to formulate an LSST response to concerns about NEO detectability. In particular, with Lynne Jones, he is currently studying the effects of losses due to trailing on NEO completeness.
 
The DM Project Scientist took part in the LSST DMLT meeting at NCSA. He is providing input on scientific prioritization to the DM Project Engineer, and is monitoring the software development through monthly post-sprint demo sessions. The work on updates to DM requirements documents is on hold until new staff come on board in the September-October time frame.
 
Planned activities:
 
The DM Project Scientist expects to remain focused on the NEO studies in June (and, more broadly, at least through August).
 
DM System Engineering

Current accomplishments:

Activities completed by the DM System Architect include:
·   TOWG: Reconciled DM staff with other IT staff
·   Discussed DM leadership positions
·   Contributed to OCS/CCS/DM interface meeting
    ·  Refined definitions of system states
    ·  Defined handling of EFD large objects
    ·  Started discussing SCADA enclave security
    ·  Defined integration milestones
·   Prepared for and contributed to DM Leadership Team face-to-face meeting
    ·  Discussed overall goals and architecture
    ·  Discussed top worries
    ·  Discussed development process changes
    ·  Clarified alert system data flow and VO responsibilities
    ·  Began work on long-range roadmap
·   Prepared for and participated in XLDB 2015 conference and workshop
    ·  Moderated workshop discussions
·   Assisted with Winter2015 (v10.1) release
·   Interviewed for SLAC position
·   Met with Fabio Hernandez of IN2P3 about potential activities
·   Facilitated computing infrastructure and security discussions


 
Planned activities:

The DM System Architect will:


·   TOWG: Complete review and rewrite of DM-related use cases
·   Conduct extensive roadmap discussions with institutional representatives
·   Define overall cross-DM milestones and activities
·   Turn over Data Butler redesign and implementation
·   Start setting clear standards for team culture and interaction
·   Begin planning for August All Hands meeting
·   Assist with preparations for IVOA meeting
·   Discuss cooperation with Euclid and WFIRST
·   Conduct interviews for SLAC position
·   Conduct monthly sprint demo session with Data Access team
·   Start conducting monthly sprint demos with Science Pipelines team

 
DM Science Quality and Reliability Engineering (SQuaRE)
 

Current Accomplishments:
02C.01.02
 


·   LDM-240 replanning exercise started ICW KTL and GPDF
·   IPAC visit to discuss potential early SQuaRE use of SUI
·   FE attended a meeting at Github on software citation
·   FE attended the DMLT meeting

02C.01.02.03

The W15 Stack Release activity completed. Improvements in the release engineering for this release included a more decentralized release process and a significant increase in the number of tested platforms (5 in-house, 1 contributed).

02C.01.02.04

Progress on the new CI system reached internal testing, but missed our end-of-May target for deployment due to staff vacation and illness.

Planned Activities:

02C.01.02.04

·   Release CI system to developers for branch builds

·   Continue SQuaRE LDM-240 re-planning ICW KTL & GPDF

DM Applications, Middleware, and Infrastructure

Current accomplishments:

·   Refer to the by-institution reports below

Planned activities:

·   Refer to the by-institution reports below


 


University of Washington

Current accomplishments:

02C.03.00 -- Alert Production Management Engineering and Integration

Simon Krughoff (SK) attended the very useful Hotwiring the Transient Universe IV meeting in Santa Barbara.  SK also attended the DMLT meeting at NCSA.  Yusra AlSayyad (YA) and Russell Owen (RO) attended normal weekly team meetings and produced demos for the April sprint demo (although the demo was postponed until May).

02C.03.01 -- Single Frame Processing

RO fixed the confusing situation where the photometric calibration task was in the meas_astrom package with the astrometric calibration code (DM-1578).  This work improves the clarity of the namespace and also makes the photometric calibration task easier to find.

02C.03.05 -- Application Framework for Exposures

YA began the process of bringing over work to unify the approximation and interpolation interfaces.  The first step was to bring over the first cut of this done on the HSC side.  The second step was to design and RFC the LSST version (DM-2477).

02C.03.08 -- Astrometric Calibration Pipeline

RO worked with Dominique Boutigny to resolve the regression observed in the astrometric solutions.  This work involved fixing two issues, namely that the distance field was not set in the match list (DM-2511) and that the matching algorithm could return duplicates (DM-2735), and making the astrometry task more configurable (DM-2737).
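
As an illustration of the duplicate-match issue noted above (DM-2735), the following minimal Python sketch shows one way to keep at most one source per reference object while always recording the match distance. It is not the meas_astrom implementation: the function name `match_unique`, the flat (x, y) coordinates, and the `max_dist` parameter are assumptions made only for this example.

```python
import math


def match_unique(sources, references, max_dist):
    """Match sources to references without duplicates (illustrative only).

    sources, references: lists of (x, y) tuples in the same flat coordinates.
    Each source is paired with its nearest reference within max_dist; if
    several sources claim the same reference, only the closest is kept.
    Every returned match carries an explicit "distance" field.
    """
    best = {}  # reference index -> (distance, source index)
    for si, (sx, sy) in enumerate(sources):
        nearest = None
        for ri, (rx, ry) in enumerate(references):
            d = math.hypot(sx - rx, sy - ry)
            if d <= max_dist and (nearest is None or d < nearest[0]):
                nearest = (d, ri)
        if nearest is None:
            continue  # no reference close enough to this source
        d, ri = nearest
        if ri not in best or d < best[ri][0]:
            best[ri] = (d, si)
    return [
        {"ref": ri, "src": si, "distance": d}
        for ri, (d, si) in sorted(best.items())
    ]
```

The brute-force nearest-neighbour search keeps the sketch short; a production matcher would normally use a spatial index and sky coordinates.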

Planned activities:
02C.03.00 -- Alert Production Management Engineering and Integration
SK will host Melissa Graham as a candidate for LSST jobs.
 
02C.03.01 -- Single Frame Processing
RO will work on porting the HSC version of aperture corrections to the LSST stack.
 
02C.03.05 -- Application Framework for Exposures
YA will work on implementing the approximation and interpolation design RFC'd in May.
SK will implement a default reference task to facilitate situations where users have their own reference catalogs.


Princeton University and University of California, Davis

This report covers work carried out in FM8 of FY15 in the Data Release Production group (staff at Princeton plus Price and Gee working remotely).

Current accomplishments:

02C.04.00 Data Release Production Management Engineering and Integration

Swinbank was travelling all month, spending time at the workshop “Hot-wiring the Transient Universe IV”, the Data Management Leadership Team meeting at NCSA, and “The Dynamic Universe” meeting at the Aspen Center for Physics. Bosch and Lupton also attended the DMLT meeting.

In advance of the DMLT meeting more detailed documentation on LDM-240 milestones was prepared by Bosch & Swinbank. This was presented at the meeting and will form the basis of future LDM-240 updates and the translation to the new DM Long Term Planning Project on JIRA.

The mid-cycle replan took place at the end of May. We deferred work on the PSF estimation pipeline until W16 due to spending more effort than expected on the completion of the measurement framework transition. We are prioritizing the completion of the merger of HyperSuprime Cam functionality to the LSST stack for the rest of this cycle.

Hiring:

Nate Lust started his position at Princeton. We anticipate that he will spend the next few months getting up to speed with the LSST stack, and therefore have not included him in our planning for the rest of this cycle, but will allocate stories for him to tackle on an ad-hoc basis.

After a protracted discussion, we obtained approval from Princeton HR to advertise for a software developer. We anticipate that this will be published next month.

02C.04.01 - Application Framework for Catalogs

A performance regression in the Footprint dilation code was fixed [DM-2787]. The regression was introduced as part of DM-1128 in November of last year, which added a new method for manipulating Footprints based on run-length encoding. The new method provided substantial performance improvements when Footprints are small, but, when ported to the HyperSuprime Cam stack and used with real data, it showed a major regression with the large Footprints occasionally found in real-world data. The fix not only resolves this regression but also improves performance over the original code in all regimes tested.
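
To make the idea of operating on run-length encoded Footprints concrete, the sketch below dilates a footprint stored as row spans rather than as individual pixels. It is an illustration only and does not reproduce the DM-1128 or DM-2787 code: the `(y, x0, x1)` span layout, the square structuring element, and the function name `dilate_spans` are all assumptions.

```python
from collections import defaultdict


def dilate_spans(spans, r):
    """Dilate a run-length encoded footprint by a (2r+1) x (2r+1) square.

    spans: iterable of (y, x0, x1) with x0 <= x1, both ends inclusive.
    Returns a sorted list of merged, non-overlapping spans.
    """
    rows = defaultdict(list)
    # Each input span grows by r in x and is copied to the 2r+1 nearby rows.
    for y, x0, x1 in spans:
        for dy in range(-r, r + 1):
            rows[y + dy].append((x0 - r, x1 + r))

    out = []
    for y in sorted(rows):
        merged = []
        for x0, x1 in sorted(rows[y]):
            # Merge spans that overlap or touch the previous span in this row.
            if merged and x0 <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], x1)
            else:
                merged.append([x0, x1])
        out.extend((y, x0, x1) for x0, x1 in merged)
    return out


# A single pixel at (0, 0) dilated by one pixel becomes a 3 x 3 block:
# dilate_spans([(0, 0, 0)], 1) -> [(-1, -1, 1), (0, -1, 1), (1, -1, 1)]
```

The cost of this approach scales with the number of spans times the dilation radius, which is one reason span-based methods can behave very differently on small and large footprints.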

 

A new interface for image display code in the stack was proposed [RFC-42] and implemented [DM-2709]. This provides a backend-agnostic display system, such that the same API can be used to work with DS9, Firefly or other packages. A backwards compatibility layer, which maintains the old interface to DS9, is provided.
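
The general shape of a backend-agnostic display layer with a compatibility wrapper can be sketched as follows. This is a hypothetical illustration, not the RFC-42 API: the class `Display`, the `RecordingBackend` stand-in, and the function `legacy_mtv` are names invented for this example, and a real backend would talk to DS9 or Firefly instead of recording calls.

```python
class RecordingBackend:
    """Stand-in backend that records calls instead of drawing anything."""

    def __init__(self):
        self.calls = []

    def show(self, image, title):
        self.calls.append(("show", title))

    def dot(self, symbol, x, y):
        self.calls.append(("dot", symbol, x, y))


class Display:
    """Front-end API that forwards every call to a pluggable backend."""

    _backends = {}

    @classmethod
    def register_backend(cls, name, factory):
        cls._backends[name] = factory

    def __init__(self, backend):
        if backend not in self._backends:
            raise ValueError("unknown display backend: %r" % (backend,))
        self.backend = self._backends[backend]()

    def show(self, image, title=""):
        self.backend.show(image, title)

    def dot(self, symbol, x, y):
        self.backend.dot(symbol, x, y)


Display.register_backend("recording", RecordingBackend)

_default_display = None


def legacy_mtv(image, title=""):
    """Old-style module-level call kept as a thin wrapper over Display."""
    global _default_display
    if _default_display is None:
        _default_display = Display("recording")
    _default_display.show(image, title)
    return _default_display
```

Calling code only ever touches `Display`, so switching viewers means registering and selecting a different backend rather than changing every call site.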

02C.04.03 - PSF Estimation

No work was carried out under this WBS during this month.

02C.04.06 - Object Characterization Pipeline

Work continued on experiments to establish the key parameters for galaxy shear fitting [DM-1108], which is essential to understand both the algorithmic and computational requirements on fitting. This has focused on adapting the GREAT3 simulation code for use with the LSST stack and developing appropriate drivers to generate the test data required. This preparatory work is nearing completion and we expect to start running the required simulations shortly.

Improvements to the afw::table system [DM-1766] were undertaken to increase flexibility and remove legacy code. This work expanded significantly in scope over the original plan, resulting in a much improved codebase. However, this rather more ambitious reworking was not completed by the end of the month and spills over into June.

A Task to generate SkyMaps was ported from HyperSuprime Cam as part of the ongoing merger efforts [DM-2737].

Planned activities:

02C.04.00 Data Release Production Management Engineering and Integration

Swinbank & Lupton will both be at “The Dynamic Universe”, Aspen, at the start of the month. Swinbank will attend the IVOA Interoperability Meeting in Sexten, Italy, for the week of 15-19 June.

Items from the LDM-240 spreadsheet will be transferred to the DLP project in JIRA.

Continued long-term planning with input from Lim based on discussions at the DMLT meeting in May.

02C.04.01 - Application Framework for Catalogs

No work is explicitly planned for this WBS during this month, but we expect a number of bug fixes and minor improvements to be undertaken.


02C.04.03 - PSF Estimation


No work is expected to be undertaken under this WBS.


02C.04.06 - Object Characterization Pipeline


Priorities are:

·  Finishing the improvements to afw::table [DM-1766];
·  Continued work on galaxy shear performance [DM-1108];
·  Merger of the HyperSuprime Cam deblender [DM-1907];
·  Planning for the merger of other major HSC stack components.

 


IPAC / California Institute of Technology

Current accomplishments:

02C.05.00  

Mario visited for one day.

SUI developers had a good discussion with Mario to hear his vision of SUI’s role in LSST and its capabilities for users. He was very excited about using Firefly as the visualization front end for pipeline development and debugging. Mario also packaged Firefly using EUPS, and this has been merged into the Firefly repository.

Mario also encouraged the team to pay attention to mobile development, make the front end look good, and keep a shallow learning curve in using the tool.

Xiuqin, David, and Gregory attended the DMLT meeting at NCSA.

The SUI team is glad that we have a clear picture of responsibilities for the alert system. We will continue to pay attention to the development of the alert broker by UW so that we are ready to work with it on subscription and filter setup.

Xiuqin thinks VO support is still not clear. But K-T can worry about it.

Xiuqin thinks the T/CAM day was a useful session, and that using Jira DLP for long-term planning is good.

Xiuqin, Gregory, Tatiana met with Tony Johnson and Jon Thaler about camera team needs

Tatiana gave a demo of using Firefly APIs to display images and XY plot

Jon gave a demo of what his students have done

After some discussion, we concluded that supporting camera team needs is almost in line with supporting L3 data products.

Tatiana and Gregory also participated in the file system discussion with Tony and Don.

02C.05.01 Basic Archive Access Tools

Working on adding new image stretch algorithms (asinh and power law gamma) to Firefly.

Developing the Python APIs to work with Firefly, allowing users to control data display with Python.

Had a couple of telecons with the SLAC group to discuss error and exception handling in the data access APIs.
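
For readers unfamiliar with the asinh and power law gamma stretches mentioned above, the sketch below shows common textbook forms of both. It is not Firefly code: the function names, the softening parameter `beta`, and the convention that the gamma stretch raises the normalized value to the power `1 / gamma` are assumptions for this example, and Firefly's parameterization may differ.

```python
import math


def normalize(value, vmin, vmax):
    """Clip value to [vmin, vmax] and map it linearly onto [0, 1]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))


def asinh_stretch(value, vmin, vmax, beta=0.1):
    """Asinh stretch: close to linear for faint pixels (t << beta) and
    logarithmic for bright ones, scaled so the output spans [0, 1]."""
    t = normalize(value, vmin, vmax)
    return math.asinh(t / beta) / math.asinh(1.0 / beta)


def power_law_gamma_stretch(value, vmin, vmax, gamma=2.0):
    """Power law stretch: output = t ** (1 / gamma). With this assumed
    convention, gamma > 1 brightens faint pixels."""
    t = normalize(value, vmin, vmax)
    return t ** (1.0 / gamma)
```

Both functions map a pixel value to a display intensity in [0, 1]; a viewer would then scale that intensity to its color table.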

02C.05.02 Data Analysis and Visualization Tools

Continued the development on the JavaScript APIs and Python APIs for Firefly visualization components.

Supported Robert Lupton in pipeline group writing application to use Firefly as image display.

The Camera team prepared their FDR using Firefly to display the HSC image in the focal plane.

02C.05.03 Alert/Notification Toolkit

David, Gregory, and John attended the HotWired conference for a week. Other LSST DM members were also there, and they had good discussions.

David wrote an LOI for the LSST science enabling grants in collaboration with Princeton, UW and LSST.

Planned activities:
02C.05.00
 
Summer vacation begins. Xiuqin: two weeks; Trey: one week; John: one week; Tatiana: one week.
Transfer LDM-240 road map to Jira project DLP.
 
02C.05.01 Basic Archive Access Tools
 
Continue the discussion of data access APIs with SLAC group if needed
Set up access to NCSA hosts to test access to the APIs and database
 
 02C.05.02 Data Analysis and Visualization Tools
 
Continue the development on the JavaScript APIs and Python APIs for Firefly visualization components.
Continue to work with Robert to improve the Python APIs.
Continue to work with Camera group to support their development.
Finish the new stretch algorithms in Firefly for image visualization.
Start working on the Firefly server side Python APIs.

 
 


SLAC / Stanford University

Current accomplishments:

02C.06.00 Science Data Archive and Application Services Management Engineering and Integration

·   Coordinated May Sprint for the Data Access Team

·   Participated remotely in parts of T/CAM meeting @ NCSA as time permitted (collision with running XLDB conference & workshop), reviewed notes from other days

·   Mid-cycle replan. Documented in https://confluence.lsstcorp.org/display/DM/S15+planning+4+DB+team

·   Started documenting Qserv Summer 2015 release, see: https://confluence.lsstcorp.org/display/DM/Summer+2015+Qserv+Release

·   Started working on FY2016 budget

·   Interviews:

·   Nathan Peace (onsite). We have extended an offer and SLAC HR is finalizing it

·   Michael Landau (phone screening, and onsite)

·   Muthian George (on site)

·   Ignacio Aracena from SLAC (onsite)

·   Mariano Trigo from SLAC (onsite)

·   Discussions with HR on increasing visibility, presence on jobs.github.com: https://jobs.github.com/positions/a042cbca-f9a1-11e4-8f67-1b0e6e0a34be , and increased searching on LinkedIn

·   Organized weekly Qserv and Data Access meetings

·   Extended subcontract with TAMU for Vaikunth

·   Ran the XLDB 2015 Conference and Workshop. It was VERY successful. Many talks, discussions and connections we made are directly relevant to LSST DB. Wrote a draft report from the workshop.

·   Bootstrapping the organization of XLDB 2016, which will be at SLAC a year from now. We are trying to gradually delegate more work to the community.

·   Discussions with NSF about organizing XLDB for government event in D.C.

·   Proposed simplifying common python modules (RFC-50)

·   Paperwork for IN2P3 subcontract for Fabrice

·   SLAC related

·   Prepared for Data Access and Database talk for the SLAC lsst-local group

·   discussions with Fermi CIO

·   discussions with SLAC CIO team

02C.06.01.01 Catalogs, Alerts and Metadata

·   No change / no progress. Ingest code (DM-210) still to-do here. Debugging problems with large scale tests had higher priority.

02C.06.01.02 Image and File Archive

· No change, webform still to-do

02C.06.02.01 Data Access Client Framework

· Scheduled butler discussion for early June

02C.06.02.02 Web Services

· Finished code for error handling.

02C.06.02.03 Query Services

· Finished loading data to the IN2P3 cluster and started running large scale tests. Debugging problems related to timing and freezing (at scale) inside recently written code (debugging is slower than we'd wish because the author is no longer with us)
· Built Qserv Release 2015_05
· Added support for BIT, VAR and BINARY columns in queries
· Finished implementing query killing through Ctrl-C
· Documented data distribution research and proposed design
· Designed and implemented code that allows information to be exchanged between processes (to be used between xrootd and worker)
· Documented SSI interface
· Documented structure of our custom ddl ascii schema
· Migrated boost::thread and boost::shared_ptr to std::thread and std::shared_ptr, and improved g++ 4.9.2 support
· Found and fixed several major leaks (thread, connections, memory).

 

02C.06.02.04 Image Services

· Finished implementing image response for ImageServ and image stitching.


02C.06.02.05 Catalog Services

· Finished implementing JSON results for Web Services.

Planned activities:
02C.06.00 Science Data Archive and Application Services Management Engineering and Integration
· Organize weekly Qserv and Data Access meetings
· Search for candidates for remaining open position
· FY16 budget planning
02C.06.01.01 Catalogs, Alerts and Metadata
· ingest (maybe – depends on progress with debugging large scale test problems)
02C.06.01.02 Image and File Archive
· Finish work on improvements to the form
02C.06.02.01 Data Access Client Framework
· Discuss / plan future butler work, wrap up butler v2
02C.06.02.02 Web Services
· Implement API versioning
· Implement RESTful python client
02C.06.02.03 Query Services
· Understand and solve race conditions and other problems with large scale tests
· Finish designing Data Distribution, start lightweight prototyping of data distribution
· Continue work on Qserv Refactoring (DM-1707) – migrate Qserv to ssi v2
· Finish work on Multi-node Multi-query integration testing harness
· Write Qserv User Guide
02C.06.02.04 Image Services
· Finish implementing image response for ImageServ
· Implement image stitching across tract boundaries
02C.06.02.05 Catalog Services
· Implement RESTful interfaces for database (GET)


NCSA / University of Illinois


Current accomplishments:
02C.07.00 Processing Control and Site Infrastructure Management
 
Most management effort in May was spent in meetings. A three-day face-to-face meeting of the LSST-DMLT was held at NCSA, with an additional day for T/CAM-related activities. The following week NCSA representatives traveled to CC-IN2P3 to establish operational coordination between the institutions. This was the first meeting of its kind since the MOU was signed. NCSA and IN2P3 representatives presented overviews, technical activities, and goals of their respective institutions. Areas of future LSST technical collaboration and areas of common operational concerns were identified. The meeting was very successful and resulted in agreement on short-term areas of focus and a proposal for regular meetings of the Joint Coordination Committee.
 
Additionally, LSST management at NCSA met with the ISO to review his draft SCADA plan and his presentation at the CCS-DAQ-OCS-DM Workshop.
 
Negotiations of the contract amendment for purchasing equipment proceeded in May. Several meetings with both NCSA and AURA property management and contract offices occurred to discuss existing property management and accounting procedures at NCSA and AURA property management’s needs. Comments and additions to the draft contract were written, but further progress was delayed by the meetings described above.
 
NCSA interviewed a strong candidate for the Systems Management Lead position and made an offer, but the candidate declined.
 
02C.07.01 Processing Control
 
Data Management Control System: Alert Production
 
(DM-2268) – Following the refactoring of the initial prototype of the Alert Production simulator in April, development began in May on AP simulator components. An API for reading simulated camera data was implemented. This API is meant to simulate data transfer from the outside to the replicator, specifically for receiving data from the OCS. Methods for performing the file transfers and for splitting incoming images for the replicator were added. An error in the way sensor location information was being reported in the DMCS was fixed.
 
(DM-2269) – An additional API was implemented in order to test different types of file transfers to and from the AP simulator. This API is meant to move data from the replicator to the distributors. These methods are distinct from those mentioned above that transfer data from the OCS to the replicator.
 
(DM-2263) – Effort was placed on reducing the startup time for new replicator jobs. A few solutions were considered, including suggestions from the HTCondor team to create pilot Condor jobs and folding in functionality of replicator jobs into the replicator node itself. The current perspective is that using pilot jobs is not appropriate for some functionalities needed of the replicator jobs. Further consideration and discussion with the HTCondor team is needed.
 
LOE (misc)
 
Additionally, cycles were spent on gathering and summarizing processing middleware results and metrics of the split data release processing (SDRP) campaign to assist Yusra AlSayyad (UW) in preparing an SDRP-based science talk.
 
02C.07.02 Infrastructure Services
 
Security and Access Control Services: ISO work
 
(DM-2489) – The major accomplishment by the Information Security Officer in May was drafting the SCADA security plan for the observatory control systems (motivated by requirements from LSE-30). Early in the month an outline of this plan was presented at the CCS-DAQ-OCS-DM Workshop. Four levels of security were identified: host-level, network-level, data-level, and physical-level. After feedback and discussions at the Workshop, a complete draft was written and distributed to German Schumacher from the T&S team and the LSST project manager for comments.
 
(DM-2852) – Minor work was also done on identity management and towards creating an AAA (Authentication, Authorization and Accounting) requirements document. During his visit to NCSA in May the ISO met with Tony Johnson (SLAC) to identify AAA requirements and to discuss CAS, the single sign-on system used by the Fermi/GLAST project. CAS has the potential to delegate to a variety of authentication systems, making it a promising part of a broader system of identity management.
 
System Administration and Operations Services: Configuration Management (Puppet)
 
(DM-2237) – In May work continued on setting up Puppet for system configuration management. The final servers were added to Puppet environments, and now all servers in the LSST cluster are managed by Puppet. The environment was set up to provide streamlined management of vmtools, maintenance scheduling, storage drives, and Enterprise Linux base packages. Additional functionalities including sudo configurations, admin accounts and prerequisites for the Ceph file system were added.
 
File System Services: File System Research & Prototyping
 
(DM-2539) – Significant effort this month was placed on analyzing shared file system solutions. Implementation of the ZFS file system on the lsst-stor141 storage servers was completed. The pool was set up with four sets of seven RAIDZ2 disks and tolerates up to two disk failures per set. The ZFS configuration was tuned to give adequate performance, so it can be used as a baseline for comparing the performance of other file systems.
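
The capacity and redundancy implied by this layout can be worked through with a short calculation. The sketch below assumes the usual RAIDZ2 rule of two parity disks per set and that losing any one set loses the pool; the per-disk size `disk_tb` is a hypothetical parameter, since the report does not state the drive capacity, and file system overhead is ignored.

```python
def raidz_pool_summary(vdevs, disks_per_vdev, parity, disk_tb):
    """Summarize a pool built from identical RAIDZ sets (vdevs).

    disk_tb is a hypothetical per-disk capacity in TB; the result ignores
    metadata, padding, and other file system overhead.
    """
    data_disks = vdevs * (disks_per_vdev - parity)
    return {
        "total_disks": vdevs * disks_per_vdev,
        "data_disks": data_disks,
        "raw_usable_tb": data_disks * disk_tb,
        "usable_fraction": (disks_per_vdev - parity) / disks_per_vdev,
    }


def pool_survives(failures_per_vdev, parity):
    """The pool survives only if no single set loses more disks than its
    parity level can absorb."""
    return all(failed <= parity for failed in failures_per_vdev)


# Four sets of seven disks with double parity, as described above.
summary = raidz_pool_summary(vdevs=4, disks_per_vdev=7, parity=2, disk_tb=4.0)
```

With four sets of seven disks, 28 disks provide 20 disks' worth of data capacity (about 71% of raw). Two failures in every set are survivable, but three failures in any one set are not.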
 
(DM-2540) – Some tuning of the 10G network improved the performance of NFS; however it was concluded that no further research would be done to optimize NFS in order to focus on other file system solutions that could replace it (see below).
 
(DM-2541) – Investigation of Ceph as a replacement networked file system continued in May. Early in the month the LSST system engineers at NCSA met with Dell’s Enterprise Ceph team to get an understanding of Calamari (Ceph management/monitoring service), data integrity, ZFS integration, and the Ceph metadata server (MDS).
 
(DM-2695) – Following this meeting, prototyping of CephFS started. Ceph was deployed on spare storage servers and testing began, including setting up and configuring OSD storage servers and Rados block device clients at 1/10th desired network speed.
 
(DM-2578) – Investigation of BeeGFS as a replacement networked file system began in May. This is particularly interesting because of its strong performance with I/O intensive workloads, potentially useful for LSST shared scratch file systems. Basic documentation was reviewed to get an understanding of the general architecture and capabilities of BeeGFS.
 
(DM-2696) – Investigation of GPFS began in May, specifically into what is needed for LSST GPFS server access to the NCSA Storage Condo. Discussions occurred with the NCSA Storage Enabling Technologies (SET) group, which manages GPFS storage for the DESDM project on the NCSA Condo. DESDM uses a single GPFS server with a 10G network card and is then able to mount its storage onto GPFS nodes in its remote clusters. To test this performance and compare it to the performance of an NFS mount, at the end of the month the SET team set up a VM GPFS server for the LSST system engineers at NCSA.
 
File System Services: File Management Technology
 
(DM-2572) – In May, analysis of iRODS as a file management technology continued. Most of the focus was on investigating how iRODS tools and services can detect and repair corrupt files. After utilizing iRODS microservices to create data replication rules (see below), some work was done to create scheduling rules. However, this activity was extended when it was discovered that the microservice, which was written for iRODS version 3.x, was deprecated in version 4.x. It was decided that work will proceed with testing the functionality created using iRODS 3.x, while also examining new methodology for handling file corruption in iRODS 4.x.
(DM-2692) – The first step in handling file corruption was to ensure that extra copies of data were being maintained. To facilitate the investigation described above, an iRODS system rule was written to automatically replicate data to a separate resource.
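
The replicate-then-verify idea described above can be illustrated with a minimal Python sketch. This is not iRODS rule language and does not use any iRODS API; it only assumes that a checksum is recorded when a replica is made and that a mismatching primary copy is restored from an intact replica. The function names (sha256_of, replicate, verify_and_repair) are hypothetical and chosen for illustration.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, replica_dir: Path) -> Tuple[Path, str]:
    """Copy source into replica_dir (a stand-in for a separate storage
    resource) and return the replica path with the checksum recorded
    at replication time."""
    replica_dir.mkdir(parents=True, exist_ok=True)
    replica = replica_dir / source.name
    shutil.copy2(source, replica)
    return replica, sha256_of(source)


def verify_and_repair(primary: Path, replica: Path, expected: str) -> str:
    """Compare the primary copy against the recorded checksum.

    Assumes both files exist. Returns "ok" if the primary is intact,
    "repaired" if it was restored from an intact replica, and
    "unrecoverable" if the replica is also damaged.
    """
    if sha256_of(primary) == expected:
        return "ok"
    if sha256_of(replica) != expected:
        return "unrecoverable"
    shutil.copy2(replica, primary)
    return "repaired"
```

In a scheduled setting, verify_and_repair would be run periodically over each registered file, which is roughly the role the scheduling rules mentioned above would play.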
 
02C.07.03 Environment and Tools
 
Environment and Tools: Deployment plan for version 1 of OpenStack
 
(DM-1273) – The NCSA team has been performing initial testing on how best to deploy an OpenStack cluster for LSST DM. Previous work was done on the NCSA Innovative Systems Lab testbed and on old spare hardware. Further development requires purchasing new equipment, the process for which is still TBD. Thus, no work was done on this during May.
 
02C.07.04 Site Infrastructure
 
Development and Integration Infrastructure: Setup Qserv prototype for Qserv and SUI teams
 
(DM-2327) – Due to the pending equipment procurement contract, NCSA has been unable to buy capabilities for Qserv and SUI developers. In lieu of purchasing new hardware, NCSA decided in May to remove two hosts from the LSST development cluster and repurpose them for the SUI team, providing their developers with a dedicated platform. The system administration team at NCSA identified the two servers and worked with the SUI team to set up user access.
 
Archive Site External Network: Wide-Area Network Work
 
In May the networking engineer at NCSA accomplished various administrative tasks for the LSST project, including participating in the network end-to-end meeting (DM-2757), documenting the long-haul network status and setting up a documentation structure for coordinating LHN perfSONAR deployment (DM-2758), onboarding the newly hired lead on perfSONAR deployments in Chile (DM-2759), and documenting the NCSA wide-area network landscape for inclusion in the LSE-78 Observatory Network Design update (DM-2801). Additional work included installing, configuring, and testing the performance of the LSST perfSONAR host at NCSA (DM-2756).
 
LOE (sys admin)
 
Several system administration activities emerged in May. The LSST servers were audited to confirm that the documentation and the Puppet and Nagios configurations were all up to date, and to identify spare servers that could be provisioned for file system prototyping (four were located). Two VMs were created in anticipation of upcoming OCS software integration with the AP. The team assisted a developer who reported slow data transfers involving UC-Davis, fixed a network outage on one of the storage servers, and analyzed the performance of SATA and SAS storage system drives. Monthly maintenance of the LSST systems occurred mid-month. Finally, a member of the sys admin team attended an O’Reilly Velocity Conference (velocityconf.com), where he gained insight into current web performance and operations trends, including DevOps, Linux performance, Docker, security, and team dynamics.
 
Planned activities:
 
02C.07.00 Processing Control and Site Infrastructure Management
 
Management staff will be thin in June due to several vacations and participation in a week-long DOE meeting. Effort will be focused on staffing and reorganizing. Given the loss of an FTE in late May, the plans from the meeting with CC-IN2P3, and the current status of the procurement contract, the team will need to reprioritize, reorganize, and re-scope its plans for the remainder of the S15 cycle. We also plan to coordinate with the LSST System Engineer to consider long-term plans. Additionally, further progress on finalizing the contract amendment is expected.
 
02C.07.01 Processing Control
 
In June, work on the Alert Production simulator will include addressing orphaned threads in the DMCS and following up with the HTCondor team about the startup time of Condor jobs. Integration of the OCS software with the AP simulator is expected to begin, using the VMs set up by the sys admin team, with an eventual attempt to move the VMs to Docker containers.
 
02C.07.02 Infrastructure Services
 
The ISO’s next goal is to investigate authentication and authorization services, including API keys. He will consider the CAS system suggested by Tony Johnson, and he also plans to attend a network security conference in Portland in mid-June.
Work on configuration management using Puppet will continue with testing modules for managing user access and, more fundamentally, customizing server roles.
 
To proceed with file system research, effort will be placed on creating evaluation tools that measure I/O patterns and trace system calls with strace. The goal is to generate metrics for comparing the performance of applications on the file systems under consideration.
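
As a rough illustration of the kind of metric such an evaluation tool might produce, the following Python sketch times a sequential write and read of a probe file in a given directory. It is an assumption-laden example, not the planned tool: the function name measure_throughput, the probe file name, and the default sizes are all hypothetical, and the read figure may reflect the operating system page cache rather than the underlying file system.

```python
import os
import time
from pathlib import Path


def measure_throughput(directory, size_mb=8, block_kb=1024):
    """Time a sequential write and read of a probe file in `directory`.

    Returns a dict with the number of bytes transferred and the
    write and read rates in MiB/s.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    total_bytes = blocks * len(block)
    target = Path(directory) / "io_probe.tmp"  # hypothetical probe file name

    # Sequential write, flushed to storage so the timing includes the sync.
    start = time.perf_counter()
    with open(target, "wb") as handle:
        for _ in range(blocks):
            handle.write(block)
        handle.flush()
        os.fsync(handle.fileno())
    write_seconds = max(time.perf_counter() - start, 1e-9)

    # Sequential read; may be served from the page cache on a local mount.
    start = time.perf_counter()
    with open(target, "rb") as handle:
        while handle.read(len(block)):
            pass
    read_seconds = max(time.perf_counter() - start, 1e-9)

    target.unlink()  # remove the probe file
    mib = total_bytes / (1024 * 1024)
    return {
        "bytes": total_bytes,
        "write_mb_s": mib / write_seconds,
        "read_mb_s": mib / read_seconds,
    }
```

Running the same function against a mount point of each candidate file system would give directly comparable numbers, although a real comparison would also need application-level I/O patterns and multi-client tests.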
As described above, work on handling file corruption in iRODS will proceed with testing the functionality already created using iRODS 3.x and examining new methodology in iRODS 4.x.
 
02C.07.03 Environment and Tools
 
As above, the process for purchasing is still TBD, so no work on this is expected in June.
 
02C.07.04 Site Infrastructure
 
In June the system engineers will set up the two servers for the SUI team and work with them to get the system configured for their development needs.
 
The cycle plan was to begin an infrastructure refresh in June, on the assumption that the procurement contract agreement would be signed. The amendment is still under negotiation, so this activity will not start in June.
 
Planned maintenance in June includes the continued upgrade of the server network cards to use IPMI.

 


NOAO


Current accomplishments:
02C.08.01 Base Center
 
Did some preliminary work to calculate the rack count for the Base DC. Attended the preliminary reviews of the DC design and of the LSST buildings on the AURA compound with Jeff Barr et al.
 
02C.08.03 Long-Haul Networks
 
We held a meeting with Reuna and Fernando Liello to discuss the procurement process and timeline for the purchase of the DWDM end nodes for the Summit to Base links and the La Serena to Santiago link installed by Reuna. Dates in July are set for the open day meetings for the RFI with at least six vendors.
 
A Telefonica representative came to La Serena to explore the route from Cerro Tololo to Cerro Pachon for installation of the fiber cable, which is a portion of the Summit to Base link. As of the end of May, Telefonica still has not provided an estimate for this work.
 
Added more content to Jira for the June updates on DLP, Meta Epic, Epic and Stories for the Chilean Networks.
 
The International Network Review by Chip Cox was held in Tucson. I attended remotely. In summary, Ampath expects to provide three paths of 100G from Santiago to Miami and Boca Raton. One of those is the Monet cable, which is expected to provide layer 1 connectivity from La Serena to Chicago and potentially NCSA. Work still remains to be completed on the US mainland connectivity, but it is expected that ESNET and I2 will provide the networks.
 
Attended SAACC meeting in Santiago.
 
Planned Activities
 
02C.08.01 Base Center
 
Further work on Data Center design to support AE contract process.
 
02C.08.03 Long-Haul Networks
 
Meet with Reuna and Telefonica to discuss Pachon to Tololo fiber run. This will involve third party contractors versed in fiber cable installations.
 
Attend CiscoLive in San Diego for information on DC network designs.
 
Point of Interest. Although we have three potential paths, there have been frequent outages recently on the Miami to Santiago link. For a considerable time Ampath and Reuna each blamed the other, as the fault was not obvious. Eventually it was found to be Ampath equipment in the Level3 POP in Huechuraba. The point is that LSST should endeavour to avoid single points of failure (Level3 and Chicago), and that the planned operational personnel at the end points and at the providers’ locations are paramount to low downtime. Chip has already stressed to Reuna the importance of cultivating a connection with the landing stage in Valparaiso, an effort that we should support.
 
 
