Brazos Data Analysis Site Monitoring Utility

◊ Mitchell Institute Computing on the Texas A&M Brazos Cluster ◊

I - Data Transfers  |  II - Data Holdings  |  III - Job Status  |  IV - Site Availability  |  V - Alerts  |  View All

Updated: Wednesday, 2018-12-12 17:45 UTC ( Wednesday, 2018-12-12 11:45 CST )


Data Transfers to the Brazos Cluster

[Plots: PhEDEx production data and load test transfer quality and transfer rate, each over the last 48 hours, 45 days, and 52 weeks]

PhEDEx Data Transfers by Linked Node ( 2018-12-12 17:25 UTC )

Linked Node             Production Link   Load Test Link
T0_CH_CERN_Export       Valid             Valid
T1_DE_KIT_Buffer        Valid             Valid
T1_DE_KIT_Disk          Valid             Null
T1_ES_PIC_Buffer        Valid             Valid
T1_FR_CCIN2P3_Buffer    Valid             Valid
T1_IT_CNAF_Buffer       Valid             Valid
T1_RU_JINR_Buffer       Valid             Valid
T1_RU_JINR_Disk         Valid             Null
T1_RU_JINR_MSS          Valid             Valid
T1_UK_RAL_Buffer        Valid             Valid
T1_US_FNAL_Buffer       Valid             Valid
T1_US_FNAL_Disk         Valid             Valid
T2_BE_IIHE              Valid             Valid
T2_CH_CERN              Valid             Valid
T2_DE_DESY              Valid             Valid
T2_ES_CIEMAT            Excluded          Valid
T2_FR_IPHC              Valid             Valid
T2_IT_Pisa              Valid             Valid
T2_UK_London_IC         Valid             Valid
T2_UK_SGrid_Bristol     Valid             Valid
T2_US_Caltech           Valid             Agent Down
T2_US_Florida           Valid             Valid
T2_US_MIT               Valid             Valid
T2_US_Nebraska          Valid             Valid
T2_US_Purdue            Valid             Valid
T2_US_UCSD              Valid             Valid
T2_US_Vanderbilt        Valid             Valid
T2_US_Wisconsin         Valid             Valid
T3_US_Colorado          Agent Down        Agent Down
T3_US_FNALLPC           Excluded          Excluded
T3_US_Rice              Agent Down        Agent Down
T3_US_TTU               Agent Down        Agent Down

Production PhEDEx Data Transfers
Nodes not listed for a period reported no transfers ( - ) in that period.

Period   Linked Node             Rate          Bytes      Files   Expired   Errors
Hour     Totals                  -             -          -       -         -
Day      Totals                  -             -          -       -         -
Week     T1_US_FNAL_Disk         5.10 MiB/s    3.10 TiB   400     84        310
Week     Totals                  5.10 MiB/s    3.10 TiB   400     84        310
Month    T1_US_FNAL_Disk         1.20 MiB/s    3.10 TiB   400     353       1,010
Month    Totals                  1.20 MiB/s    3.10 TiB   400     353       1,010

Load Test PhEDEx Data Transfers
Nodes not listed for a period reported no transfers ( - ) in that period.

Period   Linked Node             Rate          Bytes      Files   Expired   Errors
Hour     T1_US_FNAL_Buffer       0.00 B/s      0.00 B     0       1         0
Hour     T2_US_Vanderbilt        1.50 MiB/s    5.40 GiB   2       0         0
Hour     Totals                  1.50 MiB/s    5.40 GiB   2       1         0
Day      T0_CH_CERN_Export       93.2 KiB/s    8.10 GiB   3       0         1
Day      T1_US_FNAL_Buffer       95.4 KiB/s    8.20 GiB   3       3         0
Day      T1_US_FNAL_Disk         127 KiB/s     11.0 GiB   4       0         0
Day      T2_US_Florida           124 KiB/s     10.7 GiB   4       0         1
Day      T2_US_MIT               124 KiB/s     10.7 GiB   4       0         1
Day      T2_US_Nebraska          135 KiB/s     11.6 GiB   4       0         1
Day      T2_US_Vanderbilt        1.50 MiB/s    129 GiB    48      0         9
Day      Totals                  2.18 MiB/s    189 GiB    70      3         13
Week     T0_CH_CERN_Export       115 KiB/s     69.8 GiB   26      0         1
Week     T1_US_FNAL_Buffer       90.8 KiB/s    54.9 GiB   20      14        21
Week     T1_US_FNAL_Disk         109 KiB/s     66.0 GiB   24      9         13
Week     T2_US_Caltech           84.3 KiB/s    51.0 GiB   19      0         7
Week     T2_US_Florida           111 KiB/s     67.1 GiB   25      0         15
Week     T2_US_MIT               124 KiB/s     75.2 GiB   28      0         10
Week     T2_US_Nebraska          131 KiB/s     79.4 GiB   28      0         11
Week     T2_US_Vanderbilt        1.50 MiB/s    891 GiB    332     0         126
Week     Totals                  2.25 MiB/s    1.32 TiB   502     23        204
Month    T0_CH_CERN_Export       113 KiB/s     293 GiB    109     0         26
Month    T1_US_FNAL_Buffer       21.2 KiB/s    54.9 GiB   20      78        232
Month    T1_US_FNAL_Disk         25.4 KiB/s    66.0 GiB   24      76        218
Month    T2_US_Caltech           109 KiB/s     282 GiB    105     0         44
Month    T2_US_Florida           117 KiB/s     303 GiB    113     0         47
Month    T2_US_MIT               119 KiB/s     309 GiB    115     0         26
Month    T2_US_Nebraska          127 KiB/s     329 GiB    116     0         22
Month    T2_US_Vanderbilt        1.40 MiB/s    3.50 TiB   1,310   0         427
Month    Totals                  2.02 MiB/s    5.10 TiB   1,912   154       1,042


Data Holdings on the Brazos Cluster

[Plots: PhEDEx queued and resident production data volume, each over the last 48 hours, 45 days, and 52 weeks]

Production Data ( 2018-12-12 17:25 UTC )
                Subscribed PhEDEx Data        Resident PhEDEx Data
Group Name      Items   Files   Bytes         Items   Files   Bytes      Percent
AnalysisOps     23      433     359 GiB       23      433     359 GiB    100.0 %
DataOps         2       228     566 GiB       2       228     566 GiB    100.0 %
FacOps          4       426     1.68 TiB      4       426     1.68 TiB   100.0 %
higgs           75      4,779   11.6 TiB      75      4,779   11.6 TiB   100.0 %
Totals          104     5,866   14.1 TiB      104     5,866   14.1 TiB   100.0 %

HEPX Disk Store Usage ( 2018-12-12 10:50 CST )
Directory            Bytes      Percent   Date Modified
PhEDEx Monte Carlo   12.6 TiB   9.0 %     2018-11-30 19:24 UTC
PhEDEx RelVal        1.41 TiB   1.0 %     2018-12-10 20:54 UTC
PhEDEx CMS Data      183 GiB    0.1 %     2018-04-10 20:55 UTC
PhEDEx Load Tests    75.5 GiB   0.1 %     2018-10-18 20:44 UTC
User Output          115 TiB    82.8 %    2018-12-12 16:01 UTC
Miscellaneous        4.85 TiB   3.5 %     2018-12-12 16:45 UTC
Total                139 TiB    100.0 %   2018-12-12 16:45 UTC

Total Disk Usage ( 2018-12-12 11:45 CST )
FData Partition Usage: 215 TiB of 303 TiB


Job Status of the Brazos Cluster

[Plots, each over the last 48 hours, 45 days, and 52 weeks: submitted jobs summary; running jobs summary; job runtime days expended per calendar day; processor usage per expended runtime efficiency; job termination status summary; job termination success efficiency; job termination failure application diagnostics (labeled "Status" in the 45-day and 52-week plots); job termination failure grid diagnostics (labeled "Status" in the 45-day and 52-week plots)]

SLURM Queue Job Status ( 2018-12-12 11:45 CST )
User Name       Queue       Run Status   Processors   Memory     CPU Run Hours   Run Hours Limit   Queue Hours
Shu Liao        MPI-CORE8   2 / 2        46 / 46      5.59 GiB   23.7            22.0              0.00
Jorge Morales   MIXED       7 / 7        31 / 31      104 GiB    93.4            97.0              0.00
David Overton   STKHD       8 / 8        256 / 256    14.9 GiB   3.47            0.00              0.00
Totals          MIXED       17 / 17      333 / 333    124 GiB    121             119               0.00

Condor Queue Job Status ( 2018-12-12 11:45 CST )
User Name Universe Run Status Processors Memory CPU Run Hours Run Hours Limit Queue Hours
Totals - - / - - / - - - - -
No Jobs Queued

CMS Recent Job Activity on the Local Cluster by User ( 2018-12-12 17:25 UTC )
Job Activity Per User (Last Day)
User Name                      Pending   Running   Terminated   App Failed   Grid Failed   CPU Usage   Run Hours
Jorge Daniel Morales Mendoza   0         0         33           54.5 %       45.5 %        33.9 %      4.60
Jorge Morales                  0         0         17           100.0 %      0.0 %         79.1 %      10.3
Totals                         0         0         50           70.0 %       30.0 %        65.2 %      14.9
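
The page does not say how the Totals row of the per-user job activity table is derived. The following is a minimal sketch under two assumptions that the page does not confirm: the failure percentages are averaged with the terminated job count as the weight, and CPU usage is averaged with run hours as the weight. The names `UserActivity`, `weighted_mean`, and `totals` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    """One row of the per-user job activity table (field names are illustrative)."""
    user: str
    pending: int
    running: int
    terminated: int
    app_failed_pct: float   # percent of terminated jobs
    grid_failed_pct: float  # percent of terminated jobs
    cpu_usage_pct: float
    run_hours: float


def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w), or 0.0 when the weights sum to zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


def totals(rows):
    """Aggregate per-user rows into one Totals row (assumed weighting)."""
    jobs = [r.terminated for r in rows]
    hours = [r.run_hours for r in rows]
    return UserActivity(
        user="Totals",
        pending=sum(r.pending for r in rows),
        running=sum(r.running for r in rows),
        terminated=sum(jobs),
        # Assumption: failure percentages are weighted by terminated job count.
        app_failed_pct=weighted_mean([r.app_failed_pct for r in rows], jobs),
        grid_failed_pct=weighted_mean([r.grid_failed_pct for r in rows], jobs),
        # Assumption: CPU usage is weighted by run hours.
        cpu_usage_pct=weighted_mean([r.cpu_usage_pct for r in rows], hours),
        run_hours=sum(hours),
    )


# Per-user values copied from the table above.
rows = [
    UserActivity("Jorge Daniel Morales Mendoza", 0, 0, 33, 54.5, 45.5, 33.9, 4.60),
    UserActivity("Jorge Morales", 0, 0, 17, 100.0, 0.0, 79.1, 10.3),
]
t = totals(rows)
print(f"{t.terminated} {t.app_failed_pct:.1f} % {t.grid_failed_pct:.1f} % "
      f"{t.cpu_usage_pct:.1f} % {t.run_hours:.1f}")
```

With these weights the terminated count, both failure percentages, and the run hours match the displayed Totals row, and the CPU usage agrees to within about 0.1 percentage point, which is consistent with the per-user inputs being rounded for display.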


Service Availability of the Brazos Cluster

Service Availability Percentage
Day Week Month
100 % 100 % 94 %

Brazos Cluster Heartbeat Tests ( 2018-12-12 11:45 CST )
SSH Link: Pass
FData Filesystem Mount: Pass
FData Partition Usage: 215 TiB of 303 TiB
"DU" Query Status: Pass
"DU" Query Timer: 26.2 Seconds

Brazos Cluster Usage Load Statistics ( 2018-12-12 11:45 CST )
Occupied + Scheduled Nodes: 16.4 % of 318
Occupied + Scheduled Processors: 37.2 % of 4,768
Load Average per CPU: 28.7 % of 4,768
Physical Memory Use: 36.8 % of 12.6 TiB
Virtual Memory Use: 0.0 % of 0.00 B

Login02 Head Node Usage Load Statistics
Running Processes: 3 of 291
User & System CPU Use: 0.7 % & 0.6 %
Net Load Average: 6.0 % ( 8 Users )
Physical Memory Use: 7.7 % of 31.4 GiB
Virtual Memory Use: 0.0 % of 5.00 GiB

Brazos Cluster Queue Utilization Statistics ( 2018-12-12 11:45 CST )
Core counts in the last three columns are given as hepx/all users.
Queue            Accessible Cores   Active Cores (Running)   Requested Cores (Queued)   Other Core States (Held, Waiting, Exiting)
STAKEHOLDER      1,632              262/262                  0/0                        0/0
STAKEHOLDER-4G   1,888              25/85                    0/0                        0/0
BACKGROUND       2,912              0/0                      0/0                        0/0
BACKGROUND-4G    1,888              0/36                     0/0                        0/0
INTERACTIVE      3,520              0/0                      0/0                        0/0
SERIAL           216                0/0                      0/0                        0/0
SERIAL-LONG      216                0/0                      0/0                        0/0
MPI-CORE8        1,224              48/48                    0/0                        0/0
MPI-CORE32       832                0/640                    0/0                        0/0
MPI-CORE32-4G    448                0/0                      0/0                        0/0

Service Availability Monitoring (SAM) Tests
Itemized SAM Test Results (Last 48 Hours)
SRM-GetPFNFromTFC (_cms_Role_production)
SRM-VOGet (_cms_Role_production)
SRM-VOPut (_cms_Role_production)
WN-analysis (_cms_Role_lcgadmin)
WN-basic (_cms_Role_lcgadmin)
WN-cvmfs (_cms_Role_lcgadmin)
WN-env (_cms_Role_lcgadmin)
WN-frontier (_cms_Role_lcgadmin)
WN-isolation (_cms_Role_lcgadmin)
WN-remotestageout (_cms_Role_lcgadmin)
WN-mc (_cms_Role_lcgadmin)
WN-squid (_cms_Role_lcgadmin)
WN-xrootd-fallback (_cms_Role_lcgadmin)
CONDOR-JobSubmit (_cms_Role_lcgadmin)
[Plot: itemized SAM test results by metric, 2018-12-10 18:00 UTC to 2018-12-12 18:00 UTC]
SAM Test Site Quality Summary (Last 45 Days)
[Plot: 2018-10-29 00:00 UTC to 2018-12-13 00:00 UTC]

↓ Grid Client CRAB Analysis Test Suite (CATS) Completed Job Status ( 2018-12-12 17:25 UTC )
Output Host → Local: Brazos Cluster Remote: Fermi National Laboratory
Output Size → Small Large Small Large
CRAB3 Pass:15   Fail:3   Other:0
2018-12-12 16:37 UTC


Alert Summary

Data Transfer Quality Test Status: PASSED   ( 2018-12-12 17:25 UTC )
Load Test Transfer Quality Test Status: PASSED   ( 2018-12-12 17:25 UTC )
Load Tests Missing Test Status: PASSED   ( 2018-12-12 17:25 UTC )
PhEDEx Transfer Test Status: PASSED   ( 2018-12-12 17:25 UTC )
PhEDEx Corruption Test Status: PASSED   ( 2018-12-12 17:25 UTC )
Disk Usage Test Status: PASSED   ( 2018-12-12 16:50 UTC )
Disk Quota Test Status: PASSED   ( 2018-12-12 17:35 UTC )
Disk Permissions Test Status: PASSED   ( 2018-12-12 16:50 UTC )
PhEDEx Mismatch Test Status: PASSED   ( 2018-12-12 17:35 UTC )
Torque Stalled Test Status: PASSED   ( 2018-12-12 17:45 UTC )
Condor Stalled Test Status: PASSED   ( 2018-12-12 17:45 UTC )
SAM Failed Test Status: PASSED   ( 2018-12-12 17:25 UTC )
SAM Missing Test Status: PASSED   ( 2018-12-12 17:25 UTC )
CATS Failed Test Status: FAILED   ( 2018-12-12 17:25 UTC )
CATS Missing Test Status: PASSED   ( 2018-12-12 17:25 UTC )
Cluster Heartbeat Test Status: PASSED   ( 2018-12-12 17:45 UTC )


