
Large Synoptic Survey Telescope (LSST)

Concept of Operations for the LSST Data Facility Services

D. Petravick and M. Gelman

LDM-230

Latest Revision: 2017-07-04

Draft Revision NOT YET Approved – This LSST document has been approved as a Content-Controlled
Document by the LSST DM Technical Control Team. If this document is changed or superseded,
the new document will retain the Handle designation shown above. The control is on the most
recent digital document with this Handle in the LSST digital archive and not printed versions.
Additional information may be found in the corresponding DM RFC. –

Abstract

This document describes the operational concepts for the emerging LSST Data Facility, which
will operate the system that will be delivered by the LSST construction project. The services
will be incrementally deployed and operated by the construction project as part of
verification and validation activities within the construction project.

DRAFT NOT YET APPROVED – The contents of this document are subject to configuration control
by the LSST DM Technical Control Team. –


Change Record

Version | Date       | Description                                        | Owner name
--------|------------|----------------------------------------------------|-------------
1       | 2013-05-22 | Initial release.                                   | Kian-Tat Lim
1.1     | 2013-09-10 | Updates resulting from Process Control and Data    | Kian-Tat Lim
        |            | Products Reviews                                   |
1.2     | 2013-10-10 | TCT approved                                       | R. Allsman
        | 2016-05-08 | Beginning to render working group schema as a more | D. Petravick
        |            | complete view of operational need as a basis for   |
        |            | planning.                                          |
        | 2017-06-26 | Import draft versions into a single document, with | M. Gelman
        |            | updates based on evolved operational concepts.     |

Document source location: https://github.com/lsst/LDM-230


Contents
1 Scope of Document
2 Services for Observatory Operations
  2.1 LSSTCam Prompt Processing Services
    2.1.1 Scope
    2.1.2 Overview
    2.1.3 Operational Concepts
  2.2 LSSTCam Archiving Service
    2.2.1 Scope
    2.2.2 Overview
    2.2.3 Operational Concepts
  2.3 Spectrograph Archiving Service
    2.3.1 Scope
    2.3.2 Overview
    2.3.3 Operational Concepts
  2.4 EFD ETL Service
    2.4.1 Scope
    2.4.2 Overview
    2.4.3 Operational Concepts
  2.5 OCS-Driven Batch Service
    2.5.1 Scope
    2.5.2 Overview
    2.5.3 Operational Concepts
  2.6 Observatory Operations Data Service
    2.6.1 Scope of Document
    2.6.2 Overview
    2.6.3 Operational Concepts
  2.7 Observatory Operations QA and Base Computing Task Endpoint
3 Services for Offline Campaign Processing
  3.1 Batch Production Services
    3.1.1 Scope
    3.1.2 Overview
    3.1.3 Operational Concepts
4 Data Access Hosting Services for Authorized Users
  4.1 User Data Access Services
  4.2 Bulk Data Distribution Service
  4.3 Hosting of Feeds to Brokers
5 Data, Compute, and IT Security Services
  5.1 Data Backbone Services
    5.1.1 Scope
    5.1.2 Overview
    5.1.3 Operational Concepts
  5.2 Managed Database Services
    5.2.1 Scope
    5.2.2 Overview
    5.2.3 Operational Concepts
  5.3 Batch Computing and Data Staging Environment Services
    5.3.1 Scope
    5.3.2 Overview
    5.3.3 Operational Concepts
  5.4 Containerized Application Management Services
    5.4.1 Scope
    5.4.2 Overview
    5.4.3 Operational Concepts
  5.5 Network-based IT Security Services
    5.5.1 Scope
    5.5.2 Overview
    5.5.3 Operational Concepts
  5.6 Authentication and Authorization Services
6 ITC Provisioning and Management
7 Service Management and Monitoring
  7.1 Service Management Processes
    7.1.1 Overview
  7.2 Service Monitoring
    7.2.1 Scope
    7.2.2 Overview
    7.2.3 Operational Concepts
8 Acronyms

Concept of Operations for the LSST Data Facility Services

1 Scope of Document
This document describes the operational concepts for the emerging LSST Data Facility, which
will operate the data management system as a set of services that will be delivered by the
LSST construction project. These services will be incrementally stood up and operated by the
construction project as part of validation and verification activities within the
construction project.

2 Services for Observatory Operations
The LSST Data Facility provides a set of services that supports specific functions of
Observatory Operations and generates Level 1 (L1) data products. These Level 1 services
include:

• A Prompt Processing Service for Alert Production for wide-field and targeted deep-drilling
  observing programs, including providing data support for difference image templates and
  calibrations, Level 1 databases, interaction with the alert-to-broker distribution
  subsystem, and providing feedback to observers.

• A Prompt Processing Service for assessing the quality of nightly calibration exposures.

• A Prompt Processing Service for assessing exposures from the Collimated Beam Projector,
  used as part of telescope optical path calibration.

• An “Offline” L1 Batch Processing Service, not commanded by OCS, to facilitate catch-up
  processing for use cases involving extensive networking or infrastructure outages,
  reprocessing of image parameters used by the Scheduler, pre-processing data ahead of
  release production for broker training, and other emergent use cases as directed by
  project policy.

• An Archiving Service for acquiring raw image data from the LSST main camera and ingesting
  it into the Data Backbone.

• An Archiving Service for acquiring raw data from the spectrograph on the Auxiliary
  Telescope and ingesting it into the Data Backbone.

• An Extract, Transform, and Load (ETL) Service for data stored in the Engineering Facilities
  Database at the Observatory.

• An OCS-driven Batch Processing Service for Observatory Operations to submit batch jobs via
  the OCS environment either to NCSA or to the Commissioning Cluster at the Base Site.

• A QA and Base Computing Task Endpoint that allows fast and reliable access through the QA
  portal to recently acquired data from the Observatory instruments and other designated
  data sets, and the ability to submit batch jobs, supporting operations at the Base Center.

• An Observatory Operations Data Service that allows fast and reliable access to data
  recently acquired from LSST cameras and designated data sets held in the Data Backbone.

The concept of operations for each of these services is described in the following sections.
2.1 LSSTCam Prompt Processing Services

2.1.1 Scope

This section describes the prompt processing of raw data acquired from the main LSST camera
by the DM system.
2.1.2 Overview

2.1.2.1 Description

During nightly operations, the DM system acquires images from the main LSST camera as they
are taken, and promptly processes them with codes specific to an observing program.
2.1.2.2 Objective

The LSSTCam Prompt Processing Services provide timely processing of newly acquired raw data,
including QA of images, alert processing and delivery, returning image parameters to the
Observatory, and populating the Level 1 Database.

2.1.2.3 Operational Context

Prompt Processing is a service provided by the LSST Data Facility as part of the Level 1
system. It is presented to Observatory Operations as an OCS-commandable device. The Prompt
Processing Service retrieves crosstalk-corrected pixel data from the main LSST camera at the
Base Center, builds FITS images, and sends them to NCSA for prompt processing.
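A minimal sketch of this per-visit flow (retrieve crosstalk-corrected pixels at the Base
Center, assemble a FITS image, forward it to NCSA). Every class and function name here is a
hypothetical illustration, not an actual LSST interface.

```python
from dataclasses import dataclass

@dataclass
class Exposure:
    """Hypothetical container for one crosstalk-corrected camera readout."""
    visit_id: int
    pixels: bytes   # stand-in for the pixel payload
    header: dict    # OCS-supplied metadata

def build_fits(exposure):
    """Assemble a FITS-like record from pixels plus header (illustrative only)."""
    return {"visit": exposure.visit_id,
            "header": exposure.header,
            "data": exposure.pixels}

def prompt_process(exposures, send_to_ncsa):
    """Drive the per-visit flow: build the image, then hand it off for processing."""
    for exp in exposures:
        image = build_fits(exp)
        send_to_ncsa(image)   # transfer Base -> NCSA for prompt processing

# usage sketch: collect what would be sent over the long-haul network
sent = []
prompt_process(
    [Exposure(1, b"...", {"FILTER": "r"}), Exposure(2, b"...", {"FILTER": "r"})],
    sent.append,
)
```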
2.1.3 Operational Concepts

2.1.3.1 Normal Operations

2.1.3.1.1 Science Operations

Science data-taking occurs on nights when conditions are suitable. For LSST, this means all
clear nights, even when the full moon brightens the night sky. Observing is directed by an
automated scheduler. The scheduler considers observing conditions, for example, the seeing,
the phase of the moon, the atmospheric transparency, and the part of the sky near the zenith.
The scheduler is also capable of receiving external alerts, for example, announcements from
LIGO of a gravitational wave event. The scheduler also considers required observing cadence
and depth of coverage for the LSST observing programs.
About 90% of observing time is reserved for the LSST “wide-fast-deep” program. In this
program, observations will be on the wide-field two-image-per-visit cadence, in which
successive observations will be in the same filter with no slew of the telescope. However, a
new program, potentially with a new filter, a larger slew, a different observing cadence, or
a different visit structure, can be scheduled at any moment.

Another envisioned program is “deep drilling”, where many more exposures than the
two-exposure visit will be taken.
In practice, science data-taking will proceed when conditions are suitable. Calibration data
may be taken when conditions are not suitable for further observations, with science
data-taking resuming when conditions again become suitable.

It follows that the desired behavior for science data-taking operations is to start the
Prompt Processing system at the beginning of the night and to turn off the system after all
processing for all science observations is finished.

The operational framework for observing discloses no future knowledge about what exposures
will be taken, until the “next visit” is determined.

During science data-taking the Prompt Processing Service computes and promptly returns QA
parameters (referred to as “telemetry”) to the observing system. The QA parameters are not
specific to an observing program; examples are seeing and pointing corrections derived from
the WCS. These parameters are not strictly necessary for observing to proceed – LSST can
observe autonomously at the telescope site, if need be. Also note that the products are
quality parameters, not an “up-or-down” quality judgement.
The scheduler may be sensitive to a subset of these messages and may decide on actions, but a
detailed description is TBD and may vary as the survey evolves. The scheduler can make use of
these parameters even if delivered quite late, since the scheduler uses historical data in
addition to recent data.
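The telemetry return path can be sketched as below. The message keys and the `ocs_send`
callback are illustrative stand-ins; the real OCS telemetry interface is not specified here.

```python
def derive_qa_telemetry(seeing_fwhm_arcsec, pointing_offset_arcsec):
    """Package per-visit QA parameters as a telemetry message (illustrative keys)."""
    return {
        "seeing_fwhm": seeing_fwhm_arcsec,              # e.g. PSF FWHM estimate
        "pointing_correction": pointing_offset_arcsec,  # derived from the WCS fit
    }

def publish(telemetry, ocs_send):
    """Return QA telemetry to the observing system. Only quality parameters are
    reported; the up-or-down quality judgement is left to the observatory."""
    ocs_send(telemetry)

# usage sketch: capture one published message
messages = []
publish(derive_qa_telemetry(0.7, 0.15), messages.append)
```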
The Prompt Processing system also executes code that is specific to an observing program. For
science exposures, the code is divided into a front end, which is able to compute the
parameters sent back to the observatory, and a back end, Alert Production (AP), which is the
specific science code that detects transient objects.

The detected transients are passed off to another service, which records the data in a
catalog that can be queried offline and sends the data to an ensemble of transient brokers.
Data are transmitted to end users either via feeds from an LSST-provisioned broker or via
community-provided alert brokers.

AP runs in the context of the wide-fast-deep survey, the deep drilling program, and TBD other
programs. Other observing programs may also include AP as a science code, or may have codes
of their own.
2.1.3.1.2 Calibration Operations

In addition to collecting data for science programs, the telescope and camera are involved in
many calibration activities.

The baseline calibrations include flats and biases. Darks are not anticipated. LSST also has
an additional calibration device in its baseline, a Collimated Beam Projector.

Nominally, a three-hour period each afternoon is scheduled for Observatory Operations to take
dark and flat calibration data. As noted above, calibration data may be taken during the
night when conditions are not suitable for science observations. As well, the LSST dome is
specified as being light-tight, enabling certain calibration data to be collected whenever
operationally feasible, regardless of the time of day.

Although there are standard cadences for calibration operations, the frequency of calibration
data-taking is sensitive to the stability of the camera and telescope. Certain procedures,
such as replacement of a filter, cleaning of a mirror, and warming of the camera, may
subsequently require additional calibration operations. In general, calibration operations
will be modified over the lifetime of the survey as understanding of the LSST telescope and
camera improves.

The Prompt Processing Service computes and promptly returns QA parameters (referred to as
“telemetry”) to the observing system. Note that the quality of calibrations needed for Prompt
Processing science operations may be less stringent than calibrations needed for other
processing, such as annual release processing.

An operations strawman, which illuminates the general need for prompt processing, is that
there are two distinct, high-level types of calibrations.
• Nightly flats, biases, and darks consist of about 10 broad-band flatfield exposures in each
  camera filter, about 10 bias frames acquired from rapid reads of an un-illuminated camera,
  and optionally 10 dark images acquired from reads of an un-illuminated camera at the
  cadence of the expected exposure time. Observers will consider the collection of these
  nightly calibrations as a single operational sequence that is typically carried out prior
  to the start of nightly observing. The Prompt Processing system computes parameters for
  quality assessment of these calibration data, and returns the QA parameters to the
  observing system. Examples of defects that could be detected are the presence of light in
  a bias exposure and a deviation of a flat field from the norm, indicating a possible
  problem with the flat-field screen or its illumination. The sequence is considered
  complete when processing (which necessarily lags acquisition of the pixel data) is
  finished or aborted.
• Narrow-band flats and calibrations involving the Collimated Beam Projector help determine
  the response of the system, as a function of frequency, over the optical surfaces. The
  process of collecting these calibrations is lengthy; the bandpass over all LSST filters
  (760 nm) is large compared to the 1 nm illumination source, and operations using the CBP
  must be repeated many times as the device is moved to sample all the optical paths in the
  system. The length of time needed to collect these calibrations leads to the requirement
  that the Prompt Processing system be available during the day.
Time for an absolutely dark dome, which is important for these calibrations, is subject to an
operational schedule. This schedule needs to provide for maintenance and improvement projects
within the dome. These calibrations may be taken on cloudy nights or at any other time.
Because these operations are lengthy, and time to obtain the calibrations quite possibly
precious, prompt processing is needed to run QA codes to help assure that the data are
appropriate for use. Note that the prompt processing system will not be used to construct
these calibrations.
Consideration of the lengthy calibrations, and the complexity of scheduling them, means that
the system must be reasonably available when needed. An approach that requires minimal
coordination between the observing site and the archive center, which as described below is
responsible for maintenance of the system, is a default daily maintenance window, with
deviations negotiated as needed.
2.1.3.2 Operational Scenarios

2.1.3.2.1 Code Performance Problems

Should a code run longer than budgeted, and the pace of processing fail to keep up with the
pace of images, operations input is needed because there are trade-offs, as this would affect
the production of timely QA data. However, note that when this situation occurs no immediate
human intervention is needed. The Prompt Processing system provides a number of policies
(which are TBD) to Observatory Operations that can be selected via the OCS interface. These
policies are used to prioritize the need for prompt production of QA versus completeness of
science processing, decide the conditions when science processing should abort due to
timeout, and determine how any backlog of processing (including backlogs caused by problems
with international networking) is managed. The policies may need to be sensitive to details
of the observation sequence.
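A sketch of how such selectable policies might look. The policy names and outcomes below are
invented for illustration, since the actual policies are TBD.

```python
from enum import Enum

class LatePolicy(Enum):
    """Hypothetical policies Observatory Operations might select via OCS
    when processing falls behind the image cadence."""
    PRIORITIZE_QA = "qa_first"    # favor timely QA over complete science processing
    ABORT_ON_TIMEOUT = "abort"    # drop science processing that exceeds its budget
    QUEUE_BACKLOG = "queue"       # let a backlog accumulate for later catch-up

def handle_late_visit(elapsed_s, budget_s, policy):
    """Decide what to do with a visit whose processing exceeded its time budget."""
    if elapsed_s <= budget_s:
        return "complete"
    if policy is LatePolicy.ABORT_ON_TIMEOUT:
        return "aborted"
    if policy is LatePolicy.QUEUE_BACKLOG:
        return "queued"
    return "qa_only"   # PRIORITIZE_QA: emit QA parameters, defer the rest
```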

2.1.3.2.2 Offline Backup

When needed, all of this processing, including both generating QA parameters and running
science codes, can be executed under offline conditions at the Archive Center at a later
time. The products of this processing may still be of operational or scientific value even if
they are not produced in a timely manner. Considering Alert Production, for example, while
alerts may not be transmitted in offline conditions, transients can still be incorporated
into the portion of the L1 Database that records transients. QA image parameters used to
gauge whether an observation meets quality requirements can still be produced and ingested
into the OCS system.
2.1.3.2.3 Change Control

Upgrades to the LSSTCam Prompt Processing Services are produced in the LSST Data Facility.
Change control of this function is coordinated with the Observatory, with the Observatory
having an absolute say about insertion and evaluation of changes.
2.2 LSSTCam Archiving Service

2.2.1 Scope

This section describes the concept of operations for archiving designated raw data acquired
from the main LSST camera to the permanent archive.

2.2.2 Overview

2.2.2.1 Description

The LSSTCam Archiving Service acquires pixel and header data and arranges for the data to
arrive in the Observatory Operations Data Server and in the Data Backbone.
2.2.2.2 Objective

The objective of this system is to acquire designated raw data from the LSST main camera and
header data from the OCS system, and to place appropriately formatted data files in the Data
Backbone. The service needs to have the capability of archiving at the nominal frame rate for
the main observing cadence, and to perform “catch-up” archiving at twice that rate.
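A back-of-envelope reading of these rate requirements, under assumed numbers: a two-exposure
visit every ~39 seconds (the cadence quoted for the Auxiliary Telescope's 1:1 slews elsewhere
in this document) and ~6.4 GB per raw exposure (3.2 gigapixels at 2 bytes per pixel, an
assumption here, not a normative figure).

```python
# Back-of-envelope archiving throughput under assumed numbers:
# a two-exposure visit every ~39 s and ~6.4 GB per raw exposure.
VISIT_PERIOD_S = 39
EXPOSURES_PER_VISIT = 2
BYTES_PER_EXPOSURE = 3.2e9 * 2    # 3.2 Gpix x 2 bytes/pixel, ~6.4 GB uncompressed

def required_rate_gbps(catch_up_factor=1):
    """Sustained ingest rate needed to keep pace with data-taking (Gbit/s)."""
    bytes_per_s = EXPOSURES_PER_VISIT * BYTES_PER_EXPOSURE / VISIT_PERIOD_S
    return catch_up_factor * bytes_per_s * 8 / 1e9

nominal = required_rate_gbps()      # sustained rate for prompt archiving
catch_up = required_rate_gbps(2)    # "catch-up" archiving at twice that rate
```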

2.2.2.3 Operational Context

LSSTCam Archiving is a service provided by the LDF as part of the Level 1 system. It is
presented to Observatory Operations as an OCS-commandable device. The archiving system
operates independently from related observatory services, such as data acquisition, as well
as other Level 1 services, such as prompt processing. However, a normal operational mode is
operation of the service such that data are ingested promptly into the permanent archive and
into the Observatory Operations Data Server for fast access by Observatory Operations staff.
2.2.3 Operational Concepts

2.2.3.1 Normal Operations

The LSSTCam Archiving Service runs whenever it is needed. Operational goals are to provide
prompt archiving of camera data and to provide expeditious catch-up archiving after service
interruptions.

LSSTCam data is, by default, ingested into the permanent archive and into the Observatory
Operations Data Server. However, while all science and calibration data from the main camera
require ingest into the Observatory Operations Server, some data (e.g., one-off calibrations,
engineering data, smoke test data, etc.) may not require archiving in the Data Backbone
permanent store. Observatory Operations may designate data which will not be archived.
2.2.3.2 Operational Scenarios

2.2.3.2.1 Delayed Archiving

In delayed archiving, Observatory Operations may need to prioritize the ingestion of data
into the archiving system based on operational dependencies with the Observatory Operations
Data Service. The archiving service provides a number of policies (which are TBD) to
Observatory Operations that can be selected via the OCS interface in order to prioritize data
ingestion.

Other operational parameters of interest include rate-limiting when network bandwidth is a
concern.
2.2.3.2.2 Change Control

Upgrades to the LSSTCam Archiving Service are produced in the LSST Data Facility. Change
control of this function is coordinated with the Observatory, with the Observatory having an
absolute say about insertion and evaluation of changes.
2.3 Spectrograph Archiving Service

2.3.1 Scope

This section describes the concept of operations for archiving raw data acquired from
instruments on the Auxiliary Telescope to the permanent archive.

2.3.2 Overview

2.3.2.1 Description

The Auxiliary Telescope is a separate telescope at the Summit site, located on a ridge
adjacent to that of the main telescope building. This telescope supports a spectrophotometer
that measures the light from stars in very narrow bandwidths compared to the filter passbands
on the main LSST camera. The purpose of the spectrophotometer is to measure the atmospheric
absorption, that is, how light from astronomical objects is attenuated as it passes through
the atmosphere. By pointing this instrument at known “standard stars” that emit light with a
known spectral distribution, it is possible to estimate the extinction. This information is
used to derive photometric calibrations for the data taken by the main telescope.
The Auxiliary Telescope camera produces 2-dimensional CCD images, but the headers and
associated metadata differ from the LSSTCam data because spectra, not images of the sky, are
recorded. The Auxiliary Telescope slews 1:1 with the main LSSTCam, which implies two
exposures every 39 seconds.
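The stated cadence implies a modest but steady data rate. Assuming one 4k x 4k, 16-bit LSST
CCD per exposure (an assumption for illustration):

```python
# Implied Auxiliary Telescope data rate under the stated cadence:
# two exposures every 39 seconds, one LSST CCD per exposure.
VISIT_PERIOD_S = 39
EXPOSURES_PER_VISIT = 2
CCD_BYTES = 4096 * 4096 * 2   # one 16-bit 4k x 4k CCD, ~33.6 MB (assumed)

exposures_per_hour = EXPOSURES_PER_VISIT * 3600 / VISIT_PERIOD_S   # roughly 185/hour
mbytes_per_hour = exposures_per_hour * CCD_BYTES / 1e6             # a few GB per hour
```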
From the point of view of LSST Data Facility Services for Observatory Operations, the
spectrograph on the Auxiliary Telescope is an independent instrument that is controlled
independently from the main LSSTCam. Thus, the operations of and changes to LSST Data
Facility services for this instrument must be decoupled from all others.

The Auxiliary Telescope and its spectrograph are devices under the control of the Observatory
Control System (OCS). The spectrograph contains a single LSST CCD. The Camera Data System
(CDS) for the single CCD in the spectrograph uses a readout system based on the LSSTCam
electronics and will present an interface for the Archiver to build FITS files. Telescope
data products are described in LSE-140.
2.3.2.2 Objective

The Spectrograph Archiving Service reads pixel data from the Spectrograph version of the CDS
and metadata available in the overall Observatory Control System and builds FITS files. The
service archives the data in a way that the data are promptly available to Observatory
Operations via the Observatory Operations Data Service, and that the data appear in the Data
Backbone.
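A toy sketch of the build step: combine single-CCD pixel data from the CDS with OCS-supplied
metadata into FITS-style header cards plus a data unit. The card formatting is simplified and
the keywords are illustrative; a real implementation would use a FITS library.

```python
def fits_card(keyword, value):
    """Format one 80-character FITS-style header card (simplified)."""
    return f"{keyword:<8}= {repr(value):>20}".ljust(80)

def build_spectrograph_fits(pixels, ocs_metadata):
    """Combine single-CCD pixel data from the spectrograph CDS with OCS
    metadata into a FITS-like structure (names and keys are illustrative)."""
    header = [fits_card(k, v) for k, v in ocs_metadata.items()]
    return {"header": header, "data": pixels}

# usage sketch with a tiny stand-in pixel array
hdu = build_spectrograph_fits(
    pixels=[[0] * 4, [0] * 4],
    ocs_metadata={"TELESCOP": "AuxTel", "OBSTYPE": "spectrum"},
)
```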
2.3.3 Operational Concepts

Archiving is under control of OCS, with the same basic operational considerations as the CCD
data from LSSTCam. Keeping in mind the differences between the two systems, the concepts of
operations for LSSTCam archiving apply (see the section on the LSSTCam Archiving Service).
One differing aspect is that these data are best organized temporally, while some data from
LSSTCam are organized spatially.

There is no prompt processing of Spectrograph data in a way that is analogous to the prompt
processing of LSSTCam data.

2.3.3.1 Normal Operations

Under normal operations the Spectrograph Archiving Service is under control of the
Observatory Control System.
2.3.3.2 Operational Scenarios

2.3.3.2.1 Change Control

Upgrades to the Spectrograph Archiving Service are produced in the LSST Data Facility. Change
control of this function is coordinated with the Observatory, with the Observatory having an
absolute say about insertion and evaluation of changes.

2.4 EFD ETL Service
2.4.1 Scope

The Engineering and Facility Database (EFD) is a system used in the context of Observatory
Operations. It contains all data, apart from pixel data acquired by the Level 1 archiving
systems, of interest to LSST originating from any instrument or any operation related to
observing. The EFD is an essential record of the activities of Observatory Operations. It
contains data for which there is no substitute, as it records raw data from supporting
instruments, instrumental conditions, and actions taking place within the observatory.

This section describes the concept of operations for ingesting the EFD data into the LSST
Data Backbone and transforming this data into a format suitable for offline use.
2.4.2 Overview
2.4.2.1 Description
The Original Format EFD, maintained by Observatory Operations, is conceived of as a collection of files and approximately 20 autonomous relational database instances, all using the same relational database technology. The relational tables in the Original Format EFD have a temporal organization, which supports the need within Observatory Operations for high data ingest and access rates. The data in the Original Format EFD are immutable, and will not change once entered.
The EFD also includes a large file annex that holds flat files that are part of the operational records of the survey.
2.4.2.2 Objective
The prime motivation behind the EFD ETL Service is to be able to relate the time series data to raw images and computed entities produced by L1, L2, and L3 processing, and to hold these quantities in a manner that is accessible using the standard DM methods for file and relational data access.
The baseline design called for substantially all of the EFD relational and flat-file material to be ingested into what is called the Reformatted EFD.
1. There is a need to access a substantial subset of the Original Format EFD data in the
general context of Level 2 data production and in the Data Access Centers. This access
is supported by a query-access-optimized, separately implemented relational database, generally called the Reformatted EFD. A prime consideration is to relate the time series data to raw images and computed entities produced by L1, L2, and L3 processing.
2. To be usable in an offline context, files from the Original Format EFD need to be ingested into the LSST Data Backbone. This ingest operation requires provenance and metadata associated with these files.
3. Because the Original Format EFD is the fundamental record related to instrumentation, the actions of observers, and related data, the data contained within it cannot be recomputed, and in general there is no substitute for these data. Best practice for disaster recovery is not merely to replicate the Original Format EFD live environment, but also to make periodic backups and ingests to a disaster recovery system.
2.4.2.3 Operational Context
Ingest of data from the Original Format EFD into the Reformatted EFD must be controlled by Observatory Operations, based on the principle that Observatory Operations controls access to the Original Format EFD resources. The prime framework for controlling operations is the OCS; operations in this context will be controlled from the OCS framework.
2.4.2.4 Risks
The query load applied by general staff on the Original Format EFD at the Base Center may be disruptive to its primary purpose, serving Observatory Operations.
2.4.3 Operational Concepts
2.4.3.1 Normal Operations
2.4.3.1.1 Original Format EFD Operations
Observatory Operations is responsible for Original Format EFD operations in the period where LSST Operations occurs. Observatory Operations will copy database state and related operational information into a disaster recovery store at a frequency consistent with a Disaster Recovery Plan approved by the LSST ISO. The LSST Data Facility will provide the disaster recovery storage resource. The DR design procedure should consider whether normal operations may begin prior to a complete restore of the Original Format EFD.
If future operations of the LSST telescope beyond the lifetime of the survey do not provide for
operation and access to the Original Format EFD, the LSST Data Facility will assume custody
of the Original Format EFD and arrange appropriate service for these data (and likely move
the center of operations to NCSA) in the period of data production following the cessation of
LSST operations at the Summit and Base Centers.
LSST staff working on behalf of any operations department will have access to the Original Format EFD at the Base Center for one-off studies, including studying the merits of data being loaded into the Reformatted EFD. Data of ongoing interest will be loaded into the Reformatted EFD.
2.4.3.1.2 EFD Large File Annex Handling and Operations
Under control of an OCS-commandable device, the LSST Data Facility will ingest the designated contents of the file annex of the Original Format EFD into the Data Backbone. The LSST Data Facility will arrange that these files participate in whatever is developed for disaster recovery for the files in the Data Backbone. These files will also participate in the general file metadata and file-management service associated with the Data Backbone, and thus be available using I/O methods of the LSST stack.
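This document does not prescribe what the file metadata and provenance records look like. As a non-normative illustration only, ingest of one annex file might capture a checksum, size, and origin along the following lines; all names here (AnnexFileRecord, ingest_annex_file, the origin field) are hypothetical, not part of any LSST interface.

```python
import hashlib
import os
from dataclasses import dataclass

@dataclass
class AnnexFileRecord:
    """Hypothetical metadata/provenance captured when an annex file is ingested."""
    source_path: str
    size_bytes: int
    sha256: str
    origin: str  # provenance: which EFD instance or annex the file came from

def ingest_annex_file(path: str, origin: str) -> AnnexFileRecord:
    """Checksum a file and build the record that would accompany its ingest
    into the Data Backbone (sketch only)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return AnnexFileRecord(
        source_path=path,
        size_bytes=os.path.getsize(path),
        sha256=h.hexdigest(),
        origin=origin,
    )
```

A real implementation would additionally register the record with the Data Backbone's file-management service and its disaster recovery machinery.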
2.4.3.1.3 Reformatted EFD Operations
• The Reformatted EFD is replicated to the US DAC and the Chilean DAC.
• LDF will extract, transform, and load into the Reformatted EFD pointers to files that have been transferred from the EFD large file annex into the Data Backbone.
• LDF will extract, transform, and load designated tabular data from the Original Format EFD into the Reformatted EFDs residing in the Data Backbone at NCSA and the Base Center.
• “Designated” data will include:
  – any quantities used in a production process;
  – any quantities designated by an authorized change control process.
• The information in the Reformatted EFD is available to any authorized independent DAC which may choose to host a copy.
2.4.3.2 Operational Scenarios
2.4.3.2.1 ETLControl
TheExtract,TansformandLoadoperationisunderthecontrolof
ObservingOperations.
2.4.3.2.2 Disaster Recovery and DR Testing for the Original Format EFD
Observing Operations will periodically test a restore in a disaster recovery scenario.
2.4.3.2.3 Disaster Recovery and DR Testing for the Reformatted EFD
Should the Reformatted EFD relational database be reproducible from the Original Format EFD, disaster recovery is provided by a re-ingest from the original format. DR testing includes re-establishing operations of the Reformatted EFD relational database and ETL capabilities from the Original Format EFD. Ingested files from the file annex can be recovered by the general disaster recovery capabilities of the Data Backbone.
2.4.3.2.4 Change Control
Upgrades to the EFD ETL Service are produced by the LSST Data Facility. Change control of this function is coordinated with the Observatory, with the Observatory having an absolute say about insertion and evaluation of changes.
2.5 OCS-Driven Batch Service
2.5.1 Scope
The OCS-driven Batch Service provides an OCS-commandable device for Observatory Operations staff to submit batch jobs to the Commissioning Cluster, and optionally to rendezvous with a small amount of returned data via the Telemetry Gateway.
2.5.2 Overview
2.5.2.1 Description
2.5.2.2 Objective
2.5.2.3 Operational Context
The service is an OCS-commandable device which runs under the control of Observatory Operations.
2.5.3 Operational Concepts
2.5.3.1 Normal Operations
2.5.3.2 Operational Scenarios
2.5.3.2.1 Change Control
Upgrades to the OCS-driven Batch Service are produced by the LSST Data Facility. Change control of this function is coordinated with the Observatory, with the Observatory having an absolute say about insertion and evaluation of changes.
2.6 Observatory Operations Data Service
2.6.1 Scope of Document
This section describes the services provided to Observatory Operations to access data that satisfy the requirements unique to observing operations. These requirements include service levels appropriate for nightly operations.
2.6.2 Overview
2.6.2.1 Description
The Observatory Operations Data Service provides fast access to recently acquired data from Observatory instruments and to designated datasets stored in the LSST permanent archive.
2.6.2.2 Objective
There is a need for regular and ad-hoc access to LSST datasets for staff and tools working in the context of Observatory Operations. The quality of service (QoS)
needed for these data is distinct from the general availability of data via the Data Backbone. Access to data provided by the Observatory Operations Data Service is distinguished from normal access to the Data Backbone by the role the data play in providing straightforward feedback for immediate needs that support nightly and daytime operations of the Observatory. Newly acquired data are also a necessary input for some of these operations. The service must provide access methods that are compatible with the software access needs.
2.6.2.3 Operational Context
The Observatory Operations Data Service is provided by the LSST Data Facility to Observatory Operations, and is used by observers and automated systems to access the data resident there. The service provides the availability and service levels needed to support Observatory Operations for a subset of the data that is critical for short-term operational needs.
The Observatory Operations Data Service supplements the more general Data Backbone by providing access to a subset of data at a QoS that is different from (and higher than) that of the general Data Backbone. Less critical data are provided to Observatory Operations by the Data Backbone, which offers the service levels provided generally to staff anywhere in the LSST project. For general access to the data for assessment and investigation at the Base Center, the service level is the same as for any scientist working generally in the survey.
The Observatory Operations Data Service is instantiated at the Base Center. Therefore, the Observatory Operations Data Service does not directly support activities which must occur when communications between the Summit and Base are disrupted.
The service operates in-line with the Spectrograph and LSSTCam Archiving Services. Newly acquired raw data are first made available in the Observatory Operations server, and then are ingested into the Data Backbone permanent archive.
The intent is to provide access to:
• An updating window of recently acquired and produced data, and historical data identified by policy. An example policy is “last week’s raw data”.
• Other data as specifically identified by Observatory Operations. This may be file-based data or data resident in database engines within the Data Backbone.
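The windowing policy is given only by example ("last week's raw data"); the mechanism is not specified here. As a minimal, hypothetical sketch, applying such a policy to (name, acquisition time) pairs could look like this; the function names and the pair representation are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

def in_window(acquired_at: datetime, now: datetime, days: int = 7) -> bool:
    """True if a dataset's acquisition time falls inside the policy window."""
    return now - timedelta(days=days) <= acquired_at <= now

def select_window(datasets: Iterable[Tuple[str, datetime]],
                  now: datetime, days: int = 7) -> List[str]:
    """Apply a policy such as "last week's raw data" to (name, time) pairs."""
    return [name for name, t in datasets if in_window(t, now, days)]
```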
A significant use case for the Observatory Operations Data Service is to provide near-realtime access to raw data on the Commissioning Cluster.
2.6.2.3.1 Interfaces
File system export: The Observatory Operations Data Service provides access via a read-only file system interface to designated computers in the Observatory Operations-controlled enclaves.
Butler interfaces: Use of the LSST Stack is advocated for Observatory Operations, and so access to these data is possible via access methods supported by the LSST stack. The standard access method provided by the LSST stack is through a set of abstractions provided by a software package called the Butler. The Observatory Operations Data Service provides a butler context, and updates that context continuously as new data (for example, new raw images) become available.
Native interfaces: Not all needed applications in the Observatory Operations context will use the LSST stack, and these will not be able to avail themselves of Butler abstractions. The service accommodates this need by providing files placed predictably into a directory hierarchy.
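The directory hierarchy itself is not prescribed by this document. Purely as an illustration of what "placed predictably" could mean, one hypothetical convention (the root, path components, and naming below are all invented) might be:

```python
from datetime import date
from pathlib import PurePosixPath

def raw_image_path(root: str, night: date, sensor: str, seq: int) -> PurePosixPath:
    """One hypothetical predictable layout:
    <root>/raw/<YYYY-MM-DD>/<sensor>/<seq>.fits"""
    return (PurePosixPath(root) / "raw" / night.isoformat()
            / sensor / f"{seq:06d}.fits")
```

A non-stack application can then locate a file from (night, sensor, sequence number) alone, without consulting the Butler.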
Http(s) interface: The Observatory Operations Data Service also exposes its file system via http(s). Use of the Observatory Authentication and Authorization system is required for this access.
2.6.2.4 Risks
• Concern: The need includes a continuously updated window of newly created data, in contrast to the other Butler use cases. How well the current set of abstractions works in a system that is ingesting new raw data is unknown to the author.
• Concern: Similarly, data normally resident in databases are part of the desiderata. Fulfilling these desiderata includes solutions ranging from an ETL into flat files to establishing mirrored databases. There are currently no actionable use cases for relational data. The technologies to maintain subsets of relational data are distinct from the technologies to maintain subsets of files. It is likely that, if relational data are needed, caches of relational data will need to be made by extract, transform, and load into a file format such as SQLite.
• Concern: This service needs to be available in TBD operational enclaves (and limited to those enclaves).
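The extract-transform-load into SQLite floated in the second concern can be sketched with the standard library alone. This is a non-normative illustration (the table name and function are hypothetical); it copies one designated table from a source database into a standalone SQLite cache that could be shipped to the Base Center.

```python
import sqlite3

def cache_table(src: sqlite3.Connection, dst: sqlite3.Connection,
                table: str) -> int:
    """Copy one table from the source database into a lightweight SQLite
    cache; returns the number of rows copied. Column names are preserved."""
    cur = src.execute(f"SELECT * FROM {table}")
    rows = cur.fetchall()
    cols = [d[0] for d in cur.description]
    dst.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    dst.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' * len(cols))})", rows
    )
    dst.commit()
    return len(rows)
```

A production ETL would additionally carry schema types, provenance, and incremental updates; the point here is only that a file-format cache decouples the consumer from the live database.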
2.6.3 Operational Concepts
2.6.3.1 Normal Operations
2.6.3.2 Operational Scenarios
2.7 Observatory Operations QA and Base Computing Task Endpoint



3 Services for Offline Campaign Processing
The LSST Data Facility provides specific “offline” (i.e., not coupled to Observatory Operations) data production services to generate Level 2 data products, as well as Level 1-specific calibration data (e.g., templates for image differencing). Bulk batch production operations consist of executing large or small processing campaigns that use released software configured into pipelines to produce data products, such as calibrations and DRP products. Processing campaigns include:
• Annual Release Processing: processing of payloads of tested workflows at NCSA and satellite sites, through and including ingest of release products into file stores, relational databases, and the Data Backbone, including system quality assurance.
• Calibration Processing: processing of payloads of tested workflows at NCSA and satellite sites, through and including ingest of release products into file stores, relational databases, and the Data Backbone, including initial quality assurance. Calibration production occurs at various cadences, from potentially daily to annual, depending on the calibration data product.
• Special Programs and Miscellaneous Processing: processing other than specifically enumerated.
• Batch framework upgrade testing: test suites run after system upgrades and other changes to verify operations.
• Payload testing, verification, and validation of workflows from the continuous build system on the production hardware located at NCSA and satellite sites.
The concept of operations for batch production services serving Offline Campaign Processing is described in the following section.
3.1 Batch Production Services
3.1.1 Scope
This section describes the operational concepts for batch production services, which are a set of services used to provide designated offline campaign processing.
3.1.2 Overview
3.1.2.1 Description
Batch production service operations consist of executing large or small processing campaigns that use released software configured into pipelines to produce data products, such as calibrations and data release products.
3.1.2.2 Objective
Batch production services execute designated processing campaigns to achieve LSST objectives. Example campaigns include calibration production, data release production, “after-burner” processing to modify or add to a data release, at-scale integration testing, producing datasets for data investigations, and other processing as needed. Campaign processing services provide first-order QA of data products.
• A campaign is a set of pipelines, a set of inputs to run the pipelines against, and a method of handling the outputs of the pipelines.
• A campaign satisfies a need for data products. Campaigns produce the designated batch data products specified in the DPDD [LSE-163], and other authorized data products.
• Campaigns can be large, such as an annual release processing, or small, such as producing a few calibrations.
3.1.2.3 Operational Context
Batch production services execute campaigns on computing resources to produce the desired LSST data products, which are measured against first-level quality criteria.
3.1.3 Operational Concepts
A pipeline is a body of code, typically originated and maintained within the Science Operations group. Each pipeline is an ordered sequence of individual steps. The output of one or more steps may be the input of a subsequent step downstream in the pipeline. Pipelines may produce final end data products in release processing, may produce calibrations or other data products used internally within LSST operations, may produce data products for investigations related to algorithm development, and may produce data products for testing purposes that cannot be satisfied using development infrastructure.
A campaign is the set of all pipeline executions needed to achieve an LSST objective.
• Each campaign has one or more pipelines.
• Each pipeline possesses one or more configurations.
• Each campaign has a coverage set, enumerating the distinct pipeline invocations. There is a way to identify the input data needed for each invocation.
• Each campaign has an ordering constraint that specifies any dependencies on the order of running pipelines in a campaign.
• Each campaign has an adjustable campaign priority reflecting the LSST priority for that objective.
• Each pipeline invocation may require one or more input pipeline data sets.
• Each pipeline invocation produces one or more output pipeline data sets. Notice that, for LSST, a given file may be in multiple data sets.
• For each input pipeline data set there is a data handling scheme ensuring that inputs are properly retrieved from the archive and made available for pipeline use.
• For each output pipeline data set there is a data handling scheme ensuring that outputs are properly archived.
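The entities enumerated above (campaign, pipelines, configurations, coverage set, priority, input lookup) can be sketched as a data structure. This is a hypothetical, illustrative model, not the production design; every name below is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Pipeline:
    name: str
    configuration: Dict[str, str]

@dataclass
class Campaign:
    """Illustrative model of the campaign concept described above."""
    objective: str
    priority: int                              # adjustable LSST priority
    pipelines: List[Pipeline]
    coverage_set: List[dict]                   # one entry per distinct invocation
    input_lookup: Callable[[dict], List[str]]  # invocation -> input data sets

    def invocations(self):
        """Enumerate (pipeline, coverage unit, input data sets) triples."""
        for p in self.pipelines:
            for unit in self.coverage_set:
                yield p, unit, self.input_lookup(unit)
```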
The key factor in the nature of the LSST computing problem is the inherent trivial parallelism of the computations. This means that large campaigns can be divided into ensembles of smaller, independent jobs, even though some jobs may require a small number of nodes.
Batch Production Services are distinct from other services that may use batch infrastructure, such as Development Support Services. Also, there are other scenarios where pipelines need to be run outside the batch production service environment; for example, alternate environments include build-and-test, capable desk-side development infrastructure, and ad-hoc running on central development infrastructure.
From these considerations, the LSST Data Facility separates the concerns of a reliable production service from these other use cases, which do not share the concerns of production. This also allows supporting infrastructure to evolve independently. Example production service concerns include:
• Supporting reliable operation of an ensemble of many campaigns, respecting priorities.
• Dealing with the problems associated with large-scale needs.
• Dealing with campaign documentation, presentation, curation, and similar aspects of formally produced data.
Computing resources are needed to carry out a campaign. Batch processing occurs on LSST-dedicated computing platforms at NCSA and CC-IN2P3, and potentially on other platforms. Resources other than those for computation (i.e., CPU and local storage), such as custodial storage to hold final data products and network connectivity, are also needed to completely execute a pipeline and completely realize the data handling scheme for input and output data sets.
Computing resources are physical items which are not always fit for use. They have scheduled and unscheduled downtimes, and may have scheduled availability. The management of campaigns, provided by the Master Batch Job Scheduling Service, requires:
1. detection of unscheduled downtimes of resources,
2. recovery of executing pipelines affected by unscheduled downtimes, and
3. best use of available resources.
One class of potential resources is opportunistic resources, which may be very capacious but do not guarantee that jobs run to completion. These resources may be needed in contingency circumstances. The Master Batch Job Scheduling Service is capable of differentiating kills from other failures, so as to enable use of these resources.
The types of computing platforms that may be available, with notes, are as follows.

Platform Type | Notes
NCSA batch production computing system | Ethernet cluster with competent cluster file system.
NCSA L1 computing for prompt processing | Shared-nothing machines, available when not needed for observing operations.
NCSA L3 computing | TBD
CC-IN2P3 bulk computing | Institutional experience is shared-nothing machines + competent caches and large-volume storage.
“Opportunistic” HPC | LSST-type jobs running in allocated or backfill context on HPC computers. [Backfill context implies jobs can be killed at unanticipated times.]
An Orchestration system is a system that supports the execution of a pipeline instance. The basic functionality is as follows:
• Pre-job context:
  – Supports pre-handling of any input pipeline data sets when in-job context for input data is not required.
  – Pre-stages into a platform’s storage system, if available.
  – Produces condensed versions of database tables in a portable lightweight format (e.g., MySQL to SQLite, flat table, etc.).
  – Deals with TBD platform-specific edge services.
  – Identifies and provides for local identity on the computing platforms.
  – Provides credentials and end-point information for any needed LSST services.
• In-job context:
  – Provides stage-in for any in-job pipeline input data sets.
  – Provides any butler configurations necessarily provided from in-job context.
  – Invokes the pipeline and collects pipeline output status and other operational data.
  – Provides any “pilot job” functionality.
  – Provides stage-out for pipeline output data sets when stage-out requires job context.
• Post-job context:
  – Ingests any designated data into database tables.
  – Arranges for any post-job stage-out from cluster file systems.
  – Arranges for detailed ingest into custodial data systems.
  – Transmits job status to workload management, defined below.
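The three orchestration contexts above can be sketched as a simple driver. The callables below are hypothetical stand-ins for the real pre-job, in-job, and post-job work; this is an illustration of the lifecycle ordering, not the orchestration system's interface.

```python
from enum import Enum, auto

class Stage(Enum):
    PRE_JOB = auto()
    IN_JOB = auto()
    POST_JOB = auto()

def run_pipeline_instance(stage_in, invoke, stage_out, report):
    """Drive one pipeline instance through the three contexts in order;
    each argument is a callable standing in for that context's work."""
    trace = [Stage.PRE_JOB]
    stage_in()                 # pre-handle / pre-stage input data sets
    trace.append(Stage.IN_JOB)
    status = invoke()          # run the pipeline, collect exit status
    trace.append(Stage.POST_JOB)
    stage_out()                # post-job stage-out and ingest
    report(status)             # transmit status to workload management
    return trace, status
```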
A Master Batch Job Scheduling Service:
• Considers the ensemble of available compute resources and the ensemble of campaigns.
• Dispatches pipeline invocations to an Orchestration System based on resource availability, considering the priority of campaigns.
• Considers pipeline failures reported by the Orchestration System:
  – Identifies errors indicative of a problem with computing resources, and arranges for an incident report.
  – Identifies some computational errors, and arranges for an incident report.
  – Retries failed pipeline invocations, if appropriate.
• Exposes progress of the campaign to relevant entities.
• Provides appropriate logging and events (N.b. critical events can be programmed to initiate an incident).
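Priority-aware dispatch with retry, as described above, can be sketched as a toy scheduler. This is an illustrative, hypothetical sketch (campaign representation, retry policy, and names are all invented), not the Master Batch Job Scheduling Service design.

```python
import heapq

def dispatch(campaigns, max_retries=1):
    """Toy sketch: run jobs from the highest-priority campaign first,
    retrying each failure up to max_retries times. `campaigns` is a list
    of (priority, name, job) triples (higher priority = more urgent);
    each job is a callable returning True on success. Returns a log of
    (name, succeeded) tuples in execution order."""
    heap = [(-prio, seq, name, job)
            for seq, (prio, name, job) in enumerate(campaigns)]
    heapq.heapify(heap)
    log = []
    while heap:
        _, _, name, job = heapq.heappop(heap)
        ok = False
        for _ in range(max_retries + 1):
            ok = job()
            if ok:
                break
        log.append((name, ok))
    return log
```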
Quality support:
Operations are supported by the following concepts, defined as follows for this document.
• Quality Assurance (QA) is what people do: identifying issues and arranging for fixes. One source of input is quality controls, described below. Another source of input is the operational and scientific data products.
• A Quality Control (QC) is a software artifact that produces some sort of data that contains a measure of quality. This data artifact may be:
  – simply produced, recorded, and not used, because it seems useful for some future, likely retrospective, purpose;
  – displayed or presented for quality analysis;
  – fed as input into an active quality control, which is software that automatically affects the execution of a campaign;
  – fed into software that computes additional downstream quality control data.
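The distinction between a QC data artifact and an active quality control can be illustrated with a minimal sketch; the measure names and threshold semantics below are invented for illustration and carry no LSST meaning.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class QCMeasure:
    """A quality-control data artifact: one named measure of quality."""
    name: str
    value: float
    threshold: float

    def passes(self) -> bool:
        return self.value <= self.threshold

def active_control(measures: Iterable[QCMeasure]) -> str:
    """An active quality control: halt the campaign step when any measure
    exceeds its threshold, otherwise let the campaign proceed."""
    failing = [m.name for m in measures if not m.passes()]
    return "halt: " + ", ".join(failing) if failing else "proceed"
```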
3.1.3.1 Normal Operations
During normal operations, Batch Production Services will conduct a number of concurrent campaigns that support LSST goals. These campaigns will be drawn from:
• runs to validate Data Release Processing,
• Data Release Processing itself,
• after-burner processing (to correct specific errors in not-yet-released data products),
• calibration processing, and
• miscellaneous processing.
While Batch Production Services will use the majority of LSST batch capability, they may share the LSST batch infrastructure with certain Level 1 services that require offline processing and with Level 3 batch awardees. Resource conflicts are sorted out and expressed as priorities for each respective campaign.
The system is programmed to deal with anticipated errors. A human eye is applied during working hours, and can be summoned when events in the underlying systems generate incidents.
Each campaign is monitored for technical progress, both in the sense of analyzing and responding to overtly flagged errors, and in the sense of general monitoring and human assessment of the overall performance of the service.
First-order Quality Assurance is as follows:
1. Quality controls are considered by an LSST Data Facility Production Scientist and other staff. Data Facility staff apply any standard authorized mitigations, such as reprocessing, flagging anomalies, etc. The Production Scientist within the LSST Data Facility understands the full suite of quality controls, alerts the Science Operations group to anomalies, and collaborates in diagnosis and mitigation of problems, as requested.
2. The LSST Data Facility Production Scientist uses operational and scientific acumen to assess the data products at a first level, in addition to monitoring the extant quality controls. Particular attention is paid to:
(a) operationally critical data (e.g., the next night’s flats needed for L1 processing);
(b) a processing campaign that is resource intensive, hence expensive to redo (or has expensive consequences);
(c) known problematic output data sets that are not adequately covered by existing quality controls;
(d) known problematic input data sets not adequately covered by existing quality controls.
Close collaboration is maintained between first-order quality assurance and the broader scientific quality assurance in the project. Information obtained from first-order quality assurance is continuously fed back to Science Operations.
Campaign closeout ensures that all outputs are in final form, that documentation and other artifacts have been produced, and that all parties are actively notified about the status of the campaign.
3.1.3.2 Operational Scenarios
3.1.3.2.1 Initiate campaign
Campaigns are initiated in response to an LSST objective, by specifying an initial set of pipelines, a coverage set, and an initial priority. The Batch Production Service is consulted with a reasonable lead time. Consistent with LSST processes, pipelines can be modified or added (for example, in the case of after-burners) during a campaign. These changes and additions are admitted when the criteria of change control processes are satisfied, including:
• relevant build-and-test criteria,
• approval and understanding of the impact of resource-intensive campaigns, and
• production-scale test campaigns.
3.1.3.2.2 Terminate failed campaign
Reasons for a campaign failure will be documented and submitted to Science Operations for review. Deletion of data products needs to be scheduled so that it occurs after the review is completed. This includes backing out files, materials from databases, and other production artifacts from the Data Backbone, and maintaining production records as these activities occur.
3.1.3.2.3 Pause campaign
Stop a long-running campaign from proceeding, allowing for TBD interventions.
3.1.3.2.4 Deal with problematic campaign
LSST is a large system. Pipelines will evolve and be maintained. There will be the campaigns described in the operations documents. It is the nature of the system that, as issues emerge, extra resources will be needed to provide focused scrutiny on aspects of production for some pipeline. In many cases problems will be resolved by bug fixes, or addressed by quality controls and changes to processes. Any system
needs to support mustering focused effort on quality analysis that is urgent and lacks an adequate basis for robust quality controls. The LSST Data Facility Batch Production Services staff contribute effort to solve these problems, in collaboration with Science Operations (or other parties responsible for codes).
3.1.3.2.5 Deal with defective data
Production data may be deemed defective immediately as the associated pipelines terminate, or after a period of time when inspection processes run. Such data need to be marked such that they will not be included in release data and will be set aside for further analysis.
3.1.3.2.6 Deal with sudden lack (or surplus) of resources
As noted above, for large-scale computing, the amount of resource available to support all campaigns will vary due to scheduled and unscheduled outages.
The technical system responds to an increase or decrease in resources by running more or fewer jobs, once the workload manager is aware of the new level of resources. The technical system responds to hardware failures on a running job just like any other system, with the ultimate recovery being to delete any partial data and retry, while respecting the priorities of the respective campaigns.



4 Data Access Hosting Services for Authorized Users
The LSST Data Facility provides authorized users and sites access to data via a set of services that are integrated with the overall Authentication and Authorization (AA) System. These services are hosted by the LSST Data Facility at the US and Chilean Data Access Centers and will include hosting elements of the LSST Science Platform.
4.1 User Data Access Services
Service hosting elements of the LSST Science Platform.
4.2 Bulk Data Distribution Service
Service providing bulk data download to sites supporting groups of users.
4.3 Hosting of Feeds to Brokers
The LSST Data Facility hosts the alert distribution system and supports users of the LSST mini-broker, as well as providers of community brokers.



5 Data, Compute, and IT Security Services
The LSST Data Facility provides a set of general IT services which support the LSST use-case-specific services mentioned in previous sections. These "undifferentiated heavy lifting" services include:
• Data Backbone Services, providing file ingestion, management, movement between storage tiers, and distribution to sites.
• Managed Database Services, providing database administration for all database technologies and schemas managed for the project.
• Batch Computing and Data Staging Environment Services, providing batch capabilities on each LSST-provided platform at NCSA and the Base Center.
• Containerized Application Management Services, providing elastic capabilities for deploying containerized applications at NCSA and the Base Center.
• Network-based IT Security Services, providing project-wide intrusion detection, vulnerability scanning, log collection and analysis, incident and event detection, and verification of controls.
• Authentication and Authorization (AA) Services, providing central management of identities, supporting workflows and various authentication mechanisms, and operating AA endpoints at the Summit, Base, and Archive Sites.
The concept of operations for each of these services is described in the following sections.
5.1 Data Backbone Services
5.1.1 Scope
The Data Backbone is a set of data storage, management, and movement resources distributed between the Base Center and NCSA. The scope of the Data Backbone includes both files and data maintained by relational and other database engines holding the record of the survey and used by L1, L2, and Data Access Center services.
The Data Backbone provides read-only data service to the US and Chilean DACs, but does not host data stores where DAC users create state. This is done to create a hard and easily enforceable separation of technologies, where no flaw in a DAC can corrupt the data produced by L1 and L2 production systems. For example, DAC resources such as Qserv and user databases, colloquially known as MyDBs, are provisioned in the context of a data access center, not the Data Backbone.
The Data Backbone ensures that designated data sets are replicated at both sites. The Data Backbone provides an enclave environment that is oriented toward protecting data by management, operational, and technical controls, including processes such as maintaining disaster recovery copies.
5.1.2 Overview
5.1.2.1 Description
Files in the Data Backbone are presented as file system mounts and data access services. Database-resident data are presented as managed database services.
5.1.2.2 Objective
• Replication of designated file data within LSST Data Facility sites at NCSA and the Base Center.
• Replication of designated relational tables and data maintained in other database engines at NCSA and the Base Center.
• Implementation of policy-based flows to the disaster recovery stores. At the time of this
writing, disaster recovery stores include the NCSA tape archive, CC-IN2P3, and commercial providers.
• Ingest of images produced by the Spectrograph, ComCam, and LSSTCam instruments.
• Ingest of the Engineering and Facility Database and associated Large File Annex.
• Ingest of data products from L1 and L2 production processing.
• Ingest of data from TBD other sources, approved by a change control process.
• Serving data to L1, L2, and other approved production processes.
• Serving data to the US and Chilean Data Access Centers.
• Integrity checking and other curation activities related to data integrity and continuity of operations.
5.1.2.3 Operational Context
5.1.3 Operational Concepts
5.1.3.0.1 Files
Files in the Data Backbone possess path names which are subject to change through the lifetime of the LSST project, which at the time of this writing is seen as serving the last data release through 2034.
Robust identification of a file involves:
• Obtaining a logical file name through querying metadata and provenance.
• Possibly migrating a file from a medium where the file is not directly accessible, such as tape, to a medium where the file is accessible.
• Selecting a distinct instance of the file from possibly many replicas.
• Accessing the file through an access method such as a file system mount or HTTP(S).
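The four steps above can be sketched as follows, with a dictionary standing in for the metadata/provenance database (all names and paths are hypothetical):

```python
import os
import tempfile

# A stand-in "disk replica": a real deployment would use Data Backbone storage.
_disk = tempfile.NamedTemporaryFile(delete=False, suffix=".fits")
_disk.write(b"SIMPLE  =                    T")
_disk.close()

# Stand-in for the metadata/provenance database: logical name -> replicas.
CATALOG = {
    "raw/visit-001.fits": [
        {"medium": "tape", "path": "/tape/0001"},
        {"medium": "disk", "path": _disk.name},
    ],
}

def resolve(logical_name, catalog):
    replicas = catalog[logical_name]              # 1. query metadata/provenance
    on_disk = [r for r in replicas if r["medium"] == "disk"]
    if not on_disk:                               # 2. would migrate from tape
        raise RuntimeError("stage a tape replica to disk first")
    replica = on_disk[0]                          # 3. select one replica
    return replica["path"]                        # 4. access via mount/HTTP(S)
```

The application never hard-codes a physical path; it asks for a logical name and receives whatever path the current replica layout provides.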
The project has identified several caches of data that are used in production circumstances. The distinguishing circumstances for these caches involve quality-of-service requirements
for performance and availability. Absent sophisticated QoS in file systems, performance requirements are met by controlling access to the underlying storage via caching. Availability is assured by decoupling the cache from the database providing metadata, provenance, and location information. Application-level cache management provides path names within the cache to the application.
Casual use of data for short periods may rely on knowledge such as file paths, but is subject to disruption when paths are re-arranged, or should the underlying storage technology change, such as the introduction of object stores.
5.1.3.0.2 Data Managed by Databases
Replication of database information is specific to the database technology involved. Databases identified as holding permanent records of the survey are in the Data Backbone in the sense that they are instantiated in the context of a security enclave with the management, operational, and technical controls needed to assure preservation of this data, and that the principal concern of enclave management is that data reside at the Base and at NCSA, driven by business need.
5.1.3.1 Operational Scenarios
5.1.3.1.1 Availability and Change Management
Catalog-based access systems such as the one indicated for the Data Backbone are limited by database availability as well as the availability of the file store and its access methods.
Time-critical applications involving the Observatory Operations Data Service and access to L1 templates for prompt processing protect themselves by having caches as described above.
5.2 Managed Database Services
5.2.1 Scope
Managed Database Services provide database access to data that reside in relational or non-relational databases and generally meet at least one of the following criteria:
• Data originate outside of the LSST Data Facility, but are (or are potentially) used in L1 and L2 processing, especially in the sense of data inputs needed to reproduce or refine an L1 or L2 computation.
• Data are produced within the L1 or L2 production processes, and are meant to be retained for some period of time.
• Data are production-related metadata.
• Data are used as data coupling for processes involved in maintaining L1 and L2 products or other aspects of the LSST Data Facility.
5.2.2 Overview
In LSST, a distinction is made between patterns of storage of data in a database engine (schema, for purposes here) and an implementation of the schema in a database engine which stores the data. In LSST, common schemas are used and shared in many scenarios in distinct but schema-compatible databases.
As an example, a common relational database schema can be used in development, unit test, integration, and production, but realized in different relational database software, e.g., SQLite3 in development and a heavy commercial database in production.
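A minimal illustration of this schema/engine separation, using SQLite3 as the development engine (the DDL is hypothetical; only the connection factory is engine-specific):

```python
import sqlite3

# One shared schema definition, applied unchanged to whichever engine is in use.
SCHEMA = """CREATE TABLE exposure (
    exposure_id INTEGER PRIMARY KEY,
    obs_start   TEXT NOT NULL,
    filter_name TEXT NOT NULL
)"""

def init_db(connect):
    # 'connect' is any DB-API connection factory. In development it is
    # sqlite3; in production it would be the driver of a heavier database,
    # realizing the same schema in schema-compatible form.
    conn = connect()
    conn.execute(SCHEMA)
    return conn

dev_db = init_db(lambda: sqlite3.connect(":memory:"))
dev_db.execute("INSERT INTO exposure VALUES (1, '2017-07-04T00:00:00Z', 'r')")
```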
5.2.2.1 Description
5.2.2.2 Objective
The primary focus of Managed Database Services is, as outlined, not the support of developers, but the support of production and of data needing custody or curation. While some database schema design is performed within Managed Database Services, by no means is all schema designed by Managed Database Services. Managed Database Services does have a role in determining the fitness for use of any schema present in databases it operates.
5.2.2.3 Operational Context
The operational context for Managed Database Services is the context of the LSST Data Backbone within the LSST Data Facility.
Part of the context is to consolidate database technologies where appropriate.
5.2.3 Operational Concepts
• Select technology appropriate for managed database instances.
• Present a managed database service hosting the required schema.
• Support the evolution of schema in a managed database service.
• Provide the level of service needed for each managed database instance.
• Provide capacity planning.
• Provide installation.
• Provide configuration.
• Provide data migration.
• Provide performance monitoring.
• Provide security.
• Provide troubleshooting.
• Provide backup and data recovery.
• Provide data replication where needed.
5.3 Batch Computing and Data Staging Environment Services
5.3.1 Scope
Batch Computing and Data Staging Services provide primitives used by the Master Batch Job Scheduling Service. Batch Computing and Data Staging Services are provided in a distinct implementation that is tailored for each batch system deployment.
5.3.2 Overview
Batch Computing and Data Staging Services are provided at NCSA and the Base Center. Analogous (but not identical) services are provided by MoU to the LSST Data Facility by CC-IN2P3, as well as by any commercial batch provisioning and agency resources, such as XSEDE.
5.3.2.1 Description
Both NCSA and the Base Center will have a core batch infrastructure that uses batch system logic to partition a pool of batch resources to various enclaves at the respective sites, with policies that govern priorities and file systems exposed for batch nodes running in the context of each enclave.
At the Base Center, Batch Services are supplied to the Commissioning Cluster and the Chilean Data Access Center from this pool.
At NCSA, Batch Services are supplied to the Development, Integration, General Production, L1, and US Data Access Center enclaves from this pool.
An additional pool of batch resources at each site is drawn from idle nodes in the core Kubernetes provisioning. An enhanced goal is the unification of resource management of the Kubernetes nodes and the batch pool.
Data staging refers to mechanisms needed to move data, primarily files, between the Data Backbone and the storage used by batch programs. This may be as simple as a copy operation between mounted file systems, or as complex as staging via HTTP or FTP.
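A staging primitive of this kind might look like the following sketch, choosing between a plain copy and an HTTP(S) fetch based on the source (FTP staging, also mentioned above, is omitted here):

```python
import shutil
import urllib.request
from urllib.parse import urlparse

def stage(source, dest_path):
    """Move one file toward batch-local storage.

    A bare path means both ends are mounted file systems, so a copy
    suffices; an http(s) URL means the data must be fetched remotely.
    """
    if urlparse(source).scheme in ("http", "https"):
        with urllib.request.urlopen(source) as resp, \
                open(dest_path, "wb") as out:
            shutil.copyfileobj(resp, out)   # stream the remote file down
    else:
        shutil.copy(source, dest_path)      # simple copy between mounts
    return dest_path
```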
5.3.2.2 Objective
Batch Computing and Data Staging Services support LSST batch operations by providing a batch system supported by data movement primitives.
• Provide a batch scheduler.
• Provide any enclave-specific resources. An example is distinct head nodes for different enclaves.
• Provide enclave-specific configurations, including configurations needed for information security and work processes.
• Integrate ITC into the batch system.
5.3.2.3 Operational Context
Batch Computing and Data Staging Services use resources in the master provisioning enclave and expose them to a given enclave, implementing policies appropriate to that enclave.
5.3.3 Operational Concepts
5.3.3.1 Operational Scenarios
An important consideration is that these resources do not have a constant level of use within each enclave, and that over time the hardware resources needed for batch operations in an enclave will change.
Operating conditions may change as well. For example, even with container abstractions, it may be necessary to partition the batch resources to support two versions of an operating system.
Somewhat analogously, the batch system may opportunistically use idle nodes provisioned for elastic Kubernetes computing.
Lastly, NCSA has substantial resources for prompt processing, such as alert production. Scheduling jitter and performance may preclude using a single batch scheduler for general offline production and prompt processing. Batch Computing and Data Staging Services cover having multiple scheduler instances.
5.4 Containerized Application Management Services
5.4.1 Scope
Containerized Application Management Services provide an elastic capability to deploy containerized applications. These services are provided with distinct configurations tailored for each enclave, but are provisioned on a common pool of ITC resources residing in the Master Provisioning Enclaves at each site.
5.4.2 Overview
There are two instances of Containerized Application Management Services, one at the Base Center and one at NCSA. These instances are the basis for servicing elastic computing needs at each site and a portable abstraction for symmetric deployment on commercial provisioning.
5.4.2.1 Description
Both NCSA and the Base Center will have a containerized infrastructure that logically partitions a pool of Kubernetes resources to various enclaves at the respective sites, with policies that govern priorities and file systems exposed to applications running in the context of each enclave.
At the Base Center, Containerized Application Management Services are supplied to the Commissioning Cluster and the Chilean Data Access Center from this pool.
At NCSA, Containerized Application Management Services are supplied to the Development, Integration, General Production, L1, and US Data Access Center enclaves from this pool.
5.4.2.2 Objective
The objective of these services is to support LSST elastic operations by providing a containerized application management system compatible with LSST requirements.
• Provide containerized application management to each enclave, respecting enclave-specific controls including information security and work rules.
• Provide storage for containers and container management.
• Provide adequate capacity for each site.
5.4.2.3 Operational Context
These services use resources in the master provisioning enclave at each site and expose them to a given enclave, implementing policies appropriate to that enclave.
5.4.3 Operational Concepts
5.4.3.1 Operational Scenarios
An important consideration is that these resources do not have a constant level of use within each enclave, and that over time the hardware resources needed for elastic services in an enclave will change.
Operating conditions may change as well. For example, even with container abstractions, it may be necessary to partition the hardware resources to support two versions of an operating system.
5.5 Network-based IT Security Services
5.5.1 Scope
This section describes network-based operational information security services supporting the Observatory Operations and the LSST Data Facility.
5.5.2 Overview
5.5.2.1 Description
The LSST Network-based IT Security Service provides technical controls for operational security assurance. These controls provide data that support the LSST Master Information Security Plan and IT security processes such as incident detection and resolution.
5.5.2.2 Objective
The objective of the Network-based IT Security Service is to provide:
• Network Security Monitoring, including monitoring of high-rate data connections for data transfer across the LDF system boundaries (but excluding certain high-rate transfers, such as the Level 1 service access to the Camera Data System), including deployment of technologies for Active Response and Blocking of Attacks.
• Vulnerability Management for computers and application software in the enclaves.
• The technical framework to facilitate efficient Incident Detection and Response, including central log collection and event correlation for security purposes.
• Management of certain access controls, such as firewalls and bastion hosts, used for administrative access.
• Host-based intrusion detection clients deployed on end systems as appropriate.
• Security configuration management and auditing to baseline standards.
5.5.2.3 Operational Context
The general approach to operational information security is that there is an LSST Information Security Officer (ISO) who reports to the Head of the LSST construction project, and will transition to report to the Head of the Operations project. The ISO drafts a Master Security Program plan [LPM-121], which the Head approves as appropriately mitigating the information security risk. The Head then assumes responsibility for the residual risk of the plan, which is the security risk that remains given faithful execution of the plan. The ISO oversees implementation and evolution of the plan, seeing that it is faithfully implemented and noting when mitigations and changes are needed. The ISO does any required staff work for the Head, for example, running staff training. The ISO is informed of and keeps records on security incidents, and is responsible for evolving the security plan as security threats evolve.
The ISO is responsible for an Incident Response Team, which deals with actual or potential breaches in information security. The Incident Response Team is made up of draftees from the various operations departments, with the draft weighted towards departments with expertise in and responsibility for critical operations and critical information security needs.
The ISO runs the annual security plan assessment. The management of each construction subsystem and operations department is responsible for annual revision of a Departmental Security Plan that complies with the Master Plan. These departmental plans include:
• A comprehensive list of IT assets, applications, and services.
• A list of security controls the department applies to each asset (technical and operational).
• A list of controls supplied by others that are relied on.
These controls apply to all offered services and all supported ITC. Reporting is easiest if the systems offered are under good configuration control. Under a good system, the security plans are living documents, updated by an effective change control process.
Verification: the ISO oversees a group that provides the network-based security services described in the Objective part of this concept of operations.
A general approach to LSST-specific networking is the use of software-defined networking. This provides for isolation of the networking supporting security enclaves. In particular, this allows for the separation of critical infrastructure for Observatory Operations and the LSST Data Facility from office or other routine networking.
The context for these security enclaves covers the following production services in the LSST project, though other enclaves may join if feasible and desired by the relevant operational partners. These networks may participate in this infrastructure, but are currently seen as the responsibilities of AURA and NCSA.
Production Service                     Security Enclave
Level One Services                     Split between NCSA and Chile
Batch Production Services              NCSA portion, excluding satellite center
US Data Access Center Services         NCSA
Critical Observing Enclave Services    Summit and Base Center
Chilean Data Access Center Services    Base Site
Data Backbone Services                 Base Center and NCSA
5.5.2.4 Risks
These are the standard elements of an information security infrastructure which are needed for a credible IT security project. Certain elements of the system are near the state of the art due to the data rates involved. Lack of credible infrastructure in this area will be seen as a flaw in the overall construction plan, preparing the LSST MREFC for operations.
5.5.3 Operational Concepts
5.5.3.1 Normal Operations
The following elements provide the functionality needed to implement the network-based security elements of the LSST Master Information Security Plan:
• Intrusion Detection Systems (IDS) detect patterns of network activity that indicate attacks on systems, compromise of systems, violations of Acceptable Use of systems, abuse of systems, and other security-related matters.
• Vulnerability Scanning detects software services with vulnerable configurations or unpatched versions of software via network fingerprinting. The system scans designated systems subject to a black-list. In addition to scanning for vulnerabilities, port scanning for firewall audits and ARP scanning for network asset management can also be conducted.
• Central Log Collection and Event Generation collects syslogs and other designated logs for storage (making logs invulnerable to modification by an attacker) and processes the logs to detect signatures indicating a compromise or poor security practices.
• Firewalls and bastion hosts provide a layer of active security. A typical use of a bastion host is to provide a layer of security between networks used to administer computers and more general networks.
• Host-based Intrusion Detection complements network monitoring by detecting actions within a system not visible from the network, with tools such as auditd and OSSEC. This component also monitors the file systems and checks for file system integrity.
• Active Response blocks communication with entities outside the observation site networks. This component is typically used to block "bad actor" entities outside the observation site network.
• Central Configuration Management enforces a security and configuration baseline on all systems.
New systems being deployed must be "hardened" to a security baseline and vetted by security professionals before moving into operations or after major configuration changes.
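The vulnerability-scanning element above, including its do-not-scan black-list, can be sketched as a plain TCP connect scan (hosts, ports, and the blacklist are illustrative; a real scanner would also fingerprint software versions behind each open port):

```python
import socket

def scan(hosts, ports, blacklist=frozenset()):
    """Return (host, port) pairs found open, never probing blacklisted hosts."""
    found = []
    for host in hosts:
        if host in blacklist:
            continue                    # designated systems are never probed
        for port in ports:
            with socket.socket() as s:
                s.settimeout(0.5)
                # connect_ex returns 0 when the TCP connection succeeds,
                # i.e. something is listening on that port.
                if s.connect_ex((host, port)) == 0:
                    found.append((host, port))
    return found
```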
5.5.3.2 Operational Scenarios
Vulnerability scanning periodically assesses designated ports on designated computers, sensing vulnerabilities. An example of a crucial case where this service is applied is assessing the effectiveness of a program of work patching a critical vulnerability.
Intrusion detection can detect, for example, an attempt to compromise a system. The detection system interacts with the active response system to cut off the attacker's access to the computer. The intrusion detection system can also be used to aid in the investigation of an attack during the response to and handling of a security incident.
Host-based Intrusion Detection checks for attacks against a host from the perspective of the host. Examples include multiple failed remote logins as reported by the host, or reports of file system changes that do not accompany an approved request for change or do not fall within a maintenance window.
The networks at the Observatory site must be monitored by intrusion detection systems. Acceptable IDS solutions include Bro and Snort. These systems must be able to handle the traffic load from various network segments at 10 Gb/s to 100 Gb/s speeds. The IDS systems must be placed at strategic locations and must accommodate any expansion or changes in the network without the need to completely retool the IDS systems.
The information produced by the system is accessible by LSST staff involved in LSST information security, "landlords" hosting systems, and other parties with a valid interest in the data, to the extent required by the sites' specific security plans.
Active Response is typically implemented as a Black Hole Router (BHR), since it peers with the border routers on a network and offers the shortest route to destinations being blocked. Quagga and ExaBGP are two examples of BHR software.
The central Configuration Management System will enforce a security baseline and configuration on all systems. Examples of this technology are Puppet and Chef. In the event that Windows systems are deployed on site, a system enforcing Group Policy Objects and WSUS for patching will have to be available. It is required that this system also be the Domain Controller using Active Directory and federate with LSST's Unified Identity Management system.
The central log collectors are responsible for collecting and archiving all logs collected as described in the previous section. The collectors must be able to store at minimum six months of logs, with a rotating window deleting the oldest logs to maintain disk space. In addition to the log collectors, there is a SIEM/analysis system. This system is used for real-time log alerts, searches, and visualization. ElasticSearch, Kibana, and OSSEC are three examples of such software. This server spools a copy of the logs from the central collectors but may not be able to keep the full time window due to overhead or log metadata storage.
All systems, both workstations and servers, are required to send system logs to a central collector. For Linux systems, syslog must be configured to send a copy of all logs in real time to the central collector. For Windows systems, software such as Snare will be installed to send Windows Event logs to the central collector using the syslog protocol. Other alternatives exist for log collection, such as Logstash, an open-source log collection tool that can be used to collect logs from a wide variety of platforms.
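For example, an application on a Linux host can forward a copy of its log records over the syslog protocol with a few lines of Python (the default collector address below is a placeholder, not the actual central collector):

```python
import logging
import logging.handlers

def forward_to_collector(logger_name, collector=("localhost", 514)):
    """Attach a handler that sends a copy of a logger's records to the
    central collector via the syslog protocol (UDP in this sketch)."""
    handler = logging.handlers.SysLogHandler(address=collector)
    log = logging.getLogger(logger_name)
    log.addHandler(handler)
    return log
```

This covers only application-level records; full host log forwarding is the job of the syslog daemon configuration itself.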
Network devices are also required to send system logs to the central collector. Note that this is different from any network logs a switch or router would send. The system logs refer to events such as device logins, configuration changes, and other system-specific events. Note that this is a requirement only if the device has this feature available.
Network devices such as routers or firewalls that are placed on the ingress/egress points of a VLAN or the network must send firewall or router ACL logs to the central collector.
It is best practice for network devices to also send NetFlow to a central collector. However, if NetFlow is collected and forwarded to the central collector, it must not be at the expense of network or device performance.
Any other devices not classified as a server, workstation, or networking device must be configured to send logs to the central collector if this feature is available. An example of a device that falls into this category is a VOIP appliance or a VPN appliance.
5.6 Authentication and Authorization Services
See LSE-279.



6 ITC Provisioning and Management
ITC is managed in distinct enclaves. Enclaves are defined based on administrative and security controls, and on operational availability requirements. Enclaves may span geographic sites, with elements in both the Base Facility in La Serena and at NCSA. Enclaves may share computing and other resources. Central administration is operated by NCSA staff, including remote administration of the Base Facility, with "pair-of-hands" support staff in Chile.
The operational enclaves are as follows:
• Master Provisioning Enclaves, one each at NCSA and the Base Facility. The Master Provisioning Enclaves provide administrative, security, and core computing infrastructure that can be provisioned for many enclaves.
• Level 1 Enclave, which spans Chile and NCSA to support prompt processing and archiving services.
• General Production Enclave, which hosts production infrastructure for offline processing, hosting of VOEvent distribution, and Bulk Export Service presentation nodes.
• General Base Enclave, which provides general computing and data access for investigations by Observatory Operations.
• US Data Access Center Enclave, which presents to each authorized user the ability to query the Qserv database, make custom state in MyDB databases and user areas on file systems, access a custom JupyterHub, access files via shell, and submit batch jobs.
• Chilean Data Access Center Enclave, which presents to each authorized user the ability to query the Qserv database, make custom state in MyDB databases and user areas on file systems, access a custom JupyterHub, access files via shell, and submit batch jobs.
• Data Backbone Enclave, spanning Chile to NCSA, which hosts the primary record of the survey and synchronizes it across sites. Managed data include raw data acquired by Level 1 services, data produced by project processes, files from the Large File Annex of the EFD, and relational databases that are not part of the DACs.
• Wide Area Network, which provides connectivity between border routers of La Serena, NCSA, CC-IN2P3, and other designated sites.
LDM-129, the DM Infrastructure Design Document, is now considered obsolete.



7 Service Management and Monitoring
The LSST Data Facility provides a set of services supporting overall management of services, as well as monitoring infrastructure which collects information about running services for service delivery, incident response, planning for future upgrades, and supporting change control. Service management processes are drawn from the ITIL IT Service Management vocabulary.
7.1 Service Management Processes
7.1.1 Overview
This section briefly describes functions and processes of service management that are implemented across all service and ITC layers of the LSST Data Facility. These elements were drawn from the Information Technology Infrastructure Library (ITIL), an industry-standard vocabulary for IT service management.
IT Service Management processes include:
1. Service Design: building a service catalog and arranging for changes to the service offering, including internal supporting services.
2. Service Transition: specifying needed changes, assessing the quality of proposed changes, and controlling the order and timing of inserting changes into the system.
Change Management provides authorization for streams of changes to be requested, for the insertion of changes into the reliable production system, and for the assessment of the success of these changes.
Release Management interacts with the project producing a specific change to ensure that a complete change is presented to change management for approval into the live system. Example areas that are typically a concern are accompanying documentation and security aspects.
Configuration Management provides an accurate model of the components in the live system sufficient to understand changes and support operations.
3. Service Delivery: operating the current set and configuration of production services. Service delivery processes must satisfy the detailed service delivery concepts presented elsewhere in this document.
   - Request Response
   - Incident Response
   - Problem Management
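The change-management flow described under Service Transition — a change is requested, authorized, inserted into the live system, and then assessed — can be sketched as a simple state machine. This is an illustrative Python sketch only; the state names and record fields are hypothetical, not an LDF standard.

```python
from dataclasses import dataclass

# Lifecycle states mirror the change-management description above;
# the names themselves are illustrative, not an LDF standard.
STATES = ["requested", "approved", "inserted", "assessed"]

@dataclass
class ChangeRecord:
    """A change moving through service transition in order:
    requested -> approved -> inserted -> assessed."""
    summary: str
    state: str = "requested"

    def advance(self):
        """Move the change to the next lifecycle state."""
        i = STATES.index(self.state)
        if i == len(STATES) - 1:
            raise ValueError("change has already been assessed")
        self.state = STATES[i + 1]

cr = ChangeRecord("upgrade prompt-processing worker image")
cr.advance()  # requested -> approved
cr.advance()  # approved -> inserted
```

Modeling the lifecycle as an ordered list makes the control point explicit: a change cannot be assessed before it has been inserted, and cannot be inserted before it has been approved.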
7.2 Service Monitoring
7.2.1 Scope
This section describes operational concepts of systems-level and service-level monitoring for services operated by the LSST Data Facility.
7.2.2 Overview
7.2.2.1 Description
The service monitoring system is the source of truth for the health and status of all operational services within its scope. The monitoring system deals with quality controls related to service delivery. These data have both retrospective and real-time uses. The monitoring system:
• Acquires data from subordinate monitoring systems within components that are not bespoke LSST software. These monitoring systems may expose an API, log files, SNMP, and TBD other interfaces.
• Acquires data from native LSST interfaces, including the logging package (pex.log), the event package (ctrl_events), L1 logging, L1 events, scoreboards (Redis), TBD Qserv, and data from other independent packages.
• Probes services from monitoring agents and ingests quality control parameters.
• Synthesizes new quality control data from existing quality control data (for example, correlating a series of events before generating an event that will issue a page).
• Can generate events based on performance or malfunction which can trigger incident response for services and ITC, including to non-NCSA incident response software.
• Can generate reports used for problem management, availability management, capacity management, vendor management, and similar processes.
• Provides dashboard (or comfort) displays satisfying the use cases defined below.
• Provides for instantiation of displays anywhere within the LSST operational environment (concerns: porting vs. remote display, painting displays over high-latency links).
• Provides for publicly visible displays and for displays visible only to those authorized by the LSST Authentication and Authorization system.
• Is sensitive to dynamic deployment of services to ITC resources.
• Is sensitive to modes of deployment (test, integration, development) when generating alerts, painting displays, and recording data for retrospective use (concerns: segregation and separation).
• Is itself highly reliable and available.
• Provides for disconnected operations between geographic sites (Summit, Base, and NCSA) and enclaves (e.g., Observing Critical and non-Observing Critical).
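The synthesis behaviour noted above — correlating a series of events before generating an event that issues a page — can be sketched as a sliding-window correlator. The threshold, window length, and class name below are illustrative assumptions, not LDF interfaces.

```python
from collections import deque

class EventCorrelator:
    """Synthesizes a paging event from a stream of low-level
    quality-control events: a page is issued only when `threshold`
    related faults occur within a sliding time window (the threshold
    and window values here are illustrative)."""

    def __init__(self, threshold=3, window_seconds=300):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()  # timestamps of recent matching events

    def observe(self, timestamp):
        """Record one fault event; return True if a page should be issued."""
        self.events.append(timestamp)
        # Discard events that have aged out of the correlation window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold

corr = EventCorrelator(threshold=3, window_seconds=300)
pages = [corr.observe(t) for t in (0, 60, 120)]  # three faults in two minutes
```

In this sketch an isolated fault produces no page, while the third fault inside the five-minute window does, which is the kind of filtering semantics the quality-control synthesis is meant to provide.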
7.2.2.2 Objective
The set of services and infrastructure relied on by LSST Data Facility operations is inherently distributed, due to the distributed deployment of the LDF services. Reliable operation of LDF services involves components instantiated (at least) in Chile, at NCSA, and at CC-IN2P3, as well as the networks between these sites.
A dataset based on the operational characteristics of the facilities, hardware, software, and other elements of service infrastructure is needed to support service management, service delivery, service transition, and ITC-level activities, as well as to provide health and status information to the users of the systems. This dataset must be substantially unified, so that all activities are supported by a single source of truth. From a unified dataset, for example, staff concerned with availability management of a service can obtain records that consistently reflect availability information generated by incident response activities, while staff concerned with capacity management can obtain information on how capacity is provided by ITC activities.
In general, service management needs both a subset of the data that is needed for ITC management and data which may not be supplied by traditional ITC monitoring. Examples of data
not supplied by ITC monitoring include the end-to-end availability of a service that tolerates hardware faults, user-facing comfort displays that address specific areas of interest, and controls that monitor data flow into disaster-recovery stores for consistency with the creation of data.
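As an illustration of the first example, the end-to-end availability of a fault-tolerant service can be derived from user-visible outage intervals in the unified dataset rather than from per-host uptime. The record layout below — a list of (start, end) outage intervals in seconds — is a hypothetical sketch, not an LDF schema.

```python
def end_to_end_availability(period_seconds, outages):
    """Compute end-to-end availability over a reporting period from
    (start, end) user-visible outage intervals recorded by incident
    response.  Overlapping intervals are merged; a hardware fault
    masked by redundancy simply never appears in the list."""
    merged = []
    for start, end in sorted(outages):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    downtime = sum(end - start for start, end in merged)
    return 1.0 - downtime / period_seconds

# One day with three recorded outages; the first two overlap and merge.
avail = end_to_end_availability(86400, [(0, 600), (300, 900), (10000, 10600)])
```

The key design point is that availability is computed from service-level incident records, not from hardware monitoring, so a disk failure absorbed by redundancy does not count against the service.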
7.2.2.3 Operational Context
LDF services rely on ITC hosted at NCSA, the Chilean Base Center, satellite computing centers, test stands at LSST Headquarters, wide area networks, and possibly other sources of infrastructure. Each of these sources possesses organization-specific (non-uniform) ITC monitoring and service management information on which LDF services rely. In all cases the LSST Data Facility needs to centrally acquire sufficient data to provide for management of LDF services, while minimizing coupling to the ITC or service provisioning from these sites. The coupling should be defined in an internal Service Level Agreement (SLA) or similar written instrument.
7.2.3 Operational Concepts
7.2.3.1 Normal Operations
7.2.3.1.1 Level 1 Services
Level 1 services are instantiated at NCSA and the Chilean Base Center. The services the LDF relies on that may provide monitoring information are described in the table below. The monitoring system may acquire additional essential data via agents, consistent with SLAs and systems-engineering best practice.
Table 3: Sources of Monitoring Data

| Reliance | Subordinate monitoring interfaces provided by |
| --- | --- |
| WAN from Chilean border router to UIUC | Wide area network activity area of LDF |
| Network transit from Chilean L1 infrastructure subsystem to Chilean border router | Observatory Operations |
| Network transport on UIUC campus to L1 installation area in NPCF | U of I networking, NCSA networking |
| OCS interfaces (bridge, telemetry injection) | Observatory Operations |
| CDS interface | Observatory Operations |
| Base Center computing room resources | Observatory Operations, Computing Facility manager |
| "Pair of Hands" assistance in Chile | TBD (if any) |
| ITC for L1 system, exclusive of reliances listed otherwise | LDF ITC group or relied-upon NCSA groups |
| NCSA/NPCF facility resource management | NCSA/NPCF facility management |
| Service-specific code and service-level performance as part of the overall system; component-level aspects of L1 internals | Interfaces provided by LDF software group |
Table 4: Uses of Monitoring Service Data Products

| Entity | Need | Notes |
| --- | --- | --- |
| Incident response | Events indicating service faults. | TBD whether these directly generate notifications (pages) and have the right filtering semantics. |
| Problem management | Incident information and information about marginal or near-miss events detected. | |
| Observing Operations, DPP staff, and LSST HQ | Comfort displays indicating real-time status of services used. | NCSA staff should be able to see the same information as Observatory Operations staff, to prevent confusion during incident response. Monitoring relied on by Observatory Operations in Chile that has reliances on NCSA must continue to operate and provide appropriate subsets of information to each site should connectivity between sites be disrupted. |
| Alert users | Information about when alerts are being exported and how they flow to various broker-like entities. | |
| Availability management | Queries, reports, and displays focused on historical contributions to failures by reliance. | |
| Capacity management | Queries, reports, and displays focused on historical usage of resources. | |
| Contract and SLA management | Queries, reports, and displays of quantities related to performance, e.g., response times, quality of materials or services. | |
| ITC staff | Supplemental information to ITC monitoring. | |
7.2.3.1.2 Batch Production Services
7.2.3.1.3 Data Backbone Services
7.2.3.1.4 Data Access Hosting Services
7.2.3.1.5 Wide Area Networks
Many of the hardware components which make up the WAN will be managed by different entities (ISPs), based on who owns the particular section of the network. Typically each ISP runs its own SNMP-based monitoring to track the health of its devices. LDF service monitoring taps into this information base, collecting and forwarding it to the central console for health and status monitoring.
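Because each ISP exposes health data in its own form, the collection step reduces to normalizing per-provider records into the common schema used by the central console. A minimal sketch under that assumption — the provider names and the common schema are hypothetical, while ifDescr and ifOperStatus are standard SNMP IF-MIB object names:

```python
def normalize_wan_health(provider, raw):
    """Map a provider-specific health record (as might be polled over
    SNMP) onto the common schema used by the central console.  The
    providers and common schema here are hypothetical; ifDescr and
    ifOperStatus are standard IF-MIB objects (ifOperStatus 1 = up)."""
    if provider == "isp_a":
        return {"link": raw["ifDescr"], "up": raw["ifOperStatus"] == 1}
    if provider == "isp_b":
        return {"link": raw["circuit_id"], "up": raw["state"] == "active"}
    raise ValueError(f"unknown provider: {provider}")

records = [
    normalize_wan_health("isp_a", {"ifDescr": "border-1", "ifOperStatus": 1}),
    normalize_wan_health("isp_b", {"circuit_id": "scl-chi-01", "state": "down"}),
]
```

Normalizing at the collection boundary keeps the central console decoupled from any single ISP's monitoring conventions, consistent with the goal of minimizing coupling to external providers.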
7.2.3.2 Operational Scenarios
A predefined hierarchy of roles will include different levels of users, as listed below:
• Generic User
• System Administrator
• Science User
• LSST DM Administrator
• Hardware Operator (ISPs)
• Camera Control System Administrator
• Observatory Control System Administrator
• Super User
The level of access and response capabilities will be as defined in the user profile. In the case of a "Generic User," it may be necessary only to show whether the LSST system is up and running; this can be a graphical representation of the status of the systems and subsystems.
A "Super User," who will have access to detailed status information on the systems and subsystems, will be able to see in-depth event history and status reports (through log scraping and fault databases). The Super User will also be able to access the logs database through the same portal.
In-between levels of access will be defined according to the roles and responsibilities of the user.
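One way to realize such user profiles is a simple role-to-view permission table consulted by the monitoring portal. In this sketch, only the Generic User and Super User entries follow the text above; the remaining assignments, and the view names themselves, are illustrative.

```python
# Views each role may open in the monitoring portal.  Only the
# Generic User and Super User entries follow the document text;
# the other assignments (and all view names) are illustrative.
ROLE_VIEWS = {
    "Generic User": {"system_status"},
    "Science User": {"system_status", "subsystem_status"},
    "System Administrator": {"system_status", "subsystem_status",
                             "event_history"},
    "Super User": {"system_status", "subsystem_status", "event_history",
                   "status_reports", "logs_database"},
}

def can_view(role, view):
    """Return True if the role's profile permits opening the view."""
    return view in ROLE_VIEWS.get(role, set())
```

An unknown role maps to the empty permission set, so access defaults to deny rather than allow.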






8 Acronyms
The following table has been generated from the on-line Gaia acronym list:

| Acronym | Description |
| --- | --- |
| CAM | Camera |
| DAC | Data Access Centre |
| DM | Data Management |
| DMLT | DM Leadership Team |
| DPAC | Data Processing and Analysis Consortium |
| ESA | European Space Agency |
| LSST | Large Synoptic Survey Telescope |
| NCSA | National Center for Supercomputing Applications |
| TBD | To Be Defined (Determined) |
| US | United States |