Presentation: "Tier1 – Tier2 cooperation, FZK – FZU example. Supporting slides for a longer presentation." Transcript of the presentation:
email@example.com
Tier1 – Tier2 cooperation: FZK – FZU example
Supporting slides for a longer presentation
FZU (Prague)
– Supports ALICE and ATLAS (plus other non-LHC VOs)
– Associated with FZK in the ATLAS planning
  – FZK will store data simulated at FZU
  – AOD data will be copied to FZU via FZK
  – FZK hosts the file catalog for ATLAS data at FZU's SE
– ALICE: the Tier1 – Tier2 association is not (yet) used; data go from FZU directly to CERN
Organization
– Regular FZK – Tier2 meetings twice a year, usually just before the GridKa TAB (Technical Advisory Board) meeting
  – A place to speak to technical staff and managers; very good attendance from FZK and from the Tier2s
– ATLAS
  – Separate sessions within the meeting held just before the TAB
  – Monthly phone conference
  – Mailing list firstname.lastname@example.org
  – Wiki pages https://uimon.cern.ch/twiki/bin/view/Atlas/GridKaCloud
ATLAS – Data Management
– ATLAS DDM (Distributed Data Management) recognizes Tier1 clouds
– The Tier1 operates:
  – LFC catalog, where all data within the cloud are registered
  – ATLAS vobox running DQ2
  – FTS server
– No special machines are required at a Tier2, but a Tier2 may run its own vobox and catalog.
FTS tests
– First tests used the FTS server at CERN; FTS at FZK was not yet available
– Good peak results in the CERN -> FZU direction, much worse when many D0 jobs were running on the FZU farm (not understood why)
– No routing for FZU -> CERN (reserved for OPN)
– Graphs from the CERN FTS monitor are on the next slide
CERN – FZU transfers
– Data transfers via FTS, tested using the CERN FTS (April 2006)
FZK – FZU
– Channel FZK-FZU available since May 2006.
– Good performance for FZU -> FZK, problems for FZK -> FZU.
– Discussed by email many times; very slow progress in debugging.
– Network tested with iperf, seems OK. The problem is probably overloaded disk servers, not yet confirmed.
– To get all relevant people involved, tests were planned for one week, 5.-9.6.
  – All associated Tier2s with an SRM SE were involved
– Details not to be presented at an official meeting:
  – The plan did not succeed: holiday in Germany (5.6.), filesystem crash at FZK on the Friday just before the long weekend, recovery only on Tuesday 6.6. Another problem occurred immediately, so the first tests ran only on Wednesday.
  – The FZK admins could not spend much time on it because of other meetings. Support only during working hours.
  – Migration to a new FTS server took quite long; all tests were done with the old server.
– Results
  – Basic functionality worked
  – Many problems with performance, stability and site availability
Example of achieved rates
– Wednesday 7.6.
  – 200 GB from FZU to FZK via an FTS request
  – Results the next day: Done: 92, Hold: 108
  – Errors:
    – 25 transfers failed due to my mistake (target file already existed)
    – 82 "Transfer timed out"
    – 1 "TRANSFER Operation Timed out"
– During earlier tests the FZU – FZK channel achieved almost 100 Mbytes/s
  – Limited not by the network (an iperf measurement was done) but by the disk servers
– See wiki pages https://twiki.cern.ch/twiki/bin/view/Main/PragueAndServiceChallenge4ThroughputTests
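The outcome counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `summarize_transfers` and the dictionary keys are hypothetical, and the counts are the ones quoted on the slide. It confirms that the three error categories add up to the 108 transfers on hold and gives the success fraction with and without the 25 failures caused by already existing target files.

```python
def summarize_transfers(done, hold_reasons, operator_error_key=None):
    """Tally the outcomes of one FTS request and return success fractions.

    done               -- number of transfers that finished ("Done")
    hold_reasons       -- mapping: error text -> number of transfers on hold
    operator_error_key -- optional key of an avoidable error category that
                          should be excluded from the second success rate
    """
    hold = sum(hold_reasons.values())
    total = done + hold
    summary = {"total": total, "hold": hold, "success_rate": done / total}
    if operator_error_key is not None:
        avoidable = hold_reasons[operator_error_key]
        summary["success_rate_excluding_operator_errors"] = done / (total - avoidable)
    return summary


# Counts quoted on the slide for the request submitted on Wednesday 7.6.
DONE = 92
HOLD_REASONS = {
    "target file already existed": 25,   # operator mistake
    "Transfer timed out": 82,
    "TRANSFER Operation Timed out": 1,
}

summary = summarize_transfers(DONE, HOLD_REASONS, "target file already existed")
print("transfers in request:", summary["total"])
print("on hold:", summary["hold"])
print("success rate: {:.1%}".format(summary["success_rate"]))
print("success rate without operator errors: {:.1%}".format(
    summary["success_rate_excluding_operator_errors"]))
```

Since 200 GB were requested and 92 + 108 = 200 transfers are accounted for, the average file size in this request was about 1 GB; the slide does not state the individual file sizes.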
ATLAS Tier0 exercise
– Only datasets copied to FZK were available for FZU
– Started 19.6.
– 19.6. – 20.6.: scheduled downtime at FZK for a middleware upgrade, later extended to the evening of 21.6.
– 21.6.: CERN – FZK transfers submitted to the FTS server at CERN were still ending up in the CERN-CERN channel
– Which dataset to use? Looking for complete datasets available at FZK: dq2 listDatasetsInSite -c FZKTAPE
– 22.6.: first subscriptions to FZU (T0.D.run000831.RAW) did not succeed
– 23.6. (Friday), 17:00: 3 complete datasets at FZKTAPE
– 26.6.: fetchers for Tier2s added to the FZK vobox
– 26.6.: first files copied from FZK to FZU using DQ2
  – Dataset with RAW data, big files of 1.9 GB
  – 23 Gbytes transferred in 1 hour
  – Only 4 files copied; the rest timed out during transfer (FTS and disk server problems); the remaining traffic was due to retries
– Later a few FZK -> FZU subscriptions of AOD datasets; these sometimes failed because of missing source files (automatic deletion after 12 hours)
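To put the 26.6. figures in perspective, the short calculation below converts them into average rates. It is a rough sketch under stated assumptions: decimal units (1 GB = 1000 MB), exactly 3600 s for the "1 hour" window, and exactly 1.9 GB per completed file. The helper names `rate_mb_per_s` and `goodput_fraction` are illustrative only.

```python
def rate_mb_per_s(gigabytes, seconds):
    """Average rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000.0 / seconds


def goodput_fraction(useful_gb, total_gb):
    """Share of the transferred volume that ended up in complete files."""
    return useful_gb / total_gb


# Figures quoted on the slide for the first DQ2 transfer FZK -> FZU.
TOTAL_GB = 23.0      # volume moved over the network in the window
WINDOW_S = 3600      # "1 hour", taken as exactly 3600 s (assumption)
FILES_OK = 4         # files that were copied completely
FILE_GB = 1.9        # size of one RAW file

useful_gb = FILES_OK * FILE_GB

print("raw network rate: {:.1f} MB/s".format(rate_mb_per_s(TOTAL_GB, WINDOW_S)))
print("rate of completed files: {:.1f} MB/s".format(rate_mb_per_s(useful_gb, WINDOW_S)))
print("useful share of traffic: {:.0%}".format(goodput_fraction(useful_gb, TOTAL_GB)))
```

Under these assumptions roughly a third of the traffic ended up in complete files, and both rates are far below the almost 100 Mbytes/s reached on the FZU – FZK channel in the earlier tests.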
FTS tests FZK – FZU
– Transfer of 50 files, 1 GB each
  – Starts fast, then timeouts occur
– Transfer of 100 files, 1 GB each
  – Started when the load on the Tier1 disk servers was low
Conclusions
– We managed to copy a few datasets of Tier0 test data using dq2
– Far from the expected rates (FTS tuning and disk server stability must improve)
– Operation requires a lot of effort
– Common procedures and instructions for Tier2s are needed
A few more notes:
– The response to problems is sometimes quick, but often even a trivial change takes a whole day (report it in the morning, then convince the admins that it really is an FZK problem, and have it changed or restarted the next day)
– Usually there is a response only during the day; at weekends not even serious outages are dealt with
– ATLAS has nobody on site at FZK; the ATLAS representative has no elevated administration rights and no access to log files. Everything has to go through the sysadmins, which causes considerable delays
– ATLAS GridKaCloud meetings sometimes lack someone from FZK who knows the technical setup and the current status. Andreas Heiss joins occasionally.
– There is no transfer of experience or support from the Tier1 to the Tier2s, occasionally rather the opposite (there are exceptions, e.g. advice on network configuration from Bruno, help with dCache from Doris)
– FZK stability is very low: frequent filesystem problems, cooling problems in summer (yes, even water cooling cannot handle everything), the upgrade to the new MW version took 3 working days, …
– Little space for ATLAS within the Tier0 test; files were deleted after only 12 hours and their transfer to the Tier2 mostly failed, because by the time their turn came they were already gone
– I am finding out how the Tier1 – Tier2 relationship model works in ALICE
– I do not see any problem in the Tier1 being in another country; FZK's approach to all Tier2s seems the same to me (it certainly is not, but it is not noticeable at all)