Brazos Data Analysis Site Monitoring Utility

◊ Mitchell Institute Computing on the Texas A&M Brazos Cluster ◊

I - Data Transfers  |  II - Data Holdings  |  III - Job Status  |  IV - Site Availability  |  V - Alerts  |  View All

Updated: Friday, 2018-10-19 16:45 UTC ( Friday, 2018-10-19 11:45 CDT )



Data Transfers to the Brazos Cluster

Plots (Last 48 Hours, Last 45 Days, Last 52 Weeks): PhEDEx Production Data Transfer Quality and Transfer Rate; PhEDEx Load Test Transfer Quality and Transfer Rate.

PhEDEx Data Transfers ( 2018-10-19 16:25 UTC )

Linked Node              Production Link Status   Load Test Link Status
T0_CH_CERN_Export        Valid                    Valid
T1_DE_KIT_Buffer         Valid                    Valid
T1_DE_KIT_Disk           Valid                    Null
T1_ES_PIC_Buffer         Valid                    Valid
T1_FR_CCIN2P3_Buffer     Valid                    Valid
T1_IT_CNAF_Buffer        Valid                    Valid
T1_RU_JINR_Buffer        Valid                    Valid
T1_RU_JINR_Disk          Valid                    Null
T1_RU_JINR_MSS           Valid                    Valid
T1_UK_RAL_Buffer         Valid                    Valid
T1_US_FNAL_Buffer        Valid                    Valid
T1_US_FNAL_Disk          Valid                    Valid
T2_BE_IIHE               Valid                    Valid
T2_CH_CERN               Valid                    Valid
T2_DE_DESY               Valid                    Valid
T2_ES_CIEMAT             Excluded                 Valid
T2_FR_IPHC               Valid                    Valid
T2_IT_Pisa               Valid                    Valid
T2_UK_London_IC          Valid                    Valid
T2_UK_SGrid_Bristol      Valid                    Valid
T2_US_Caltech            Valid                    Valid
T2_US_Florida            Valid                    Valid
T2_US_MIT                Valid                    Valid
T2_US_Nebraska           Valid                    Valid
T2_US_Purdue             Valid                    Valid
T2_US_UCSD               Valid                    Agent Down
T2_US_Vanderbilt         Valid                    Valid
T2_US_Wisconsin          Valid                    Valid
T3_US_Colorado           Agent Down               Agent Down
T3_US_Rice               Agent Down               Agent Down
T3_US_TTU                Agent Down               Agent Down

Production transfers: no activity on any link ( - in every column ) for the last hour, day, week, or month.
Load test transfers: links not listed in a table below show no activity ( - ) for that period.

Load Test Transfers (Last Hour)
Linked Node              Rate         Bytes      Files  Expired  Errors
T1_US_FNAL_Disk          0.00 B/s     0.00 B         0        0       1
T2_FR_IPHC               0.00 B/s     0.00 B         0        1       0
T2_US_Vanderbilt         0.00 B/s     0.00 B         0        0       1
Totals                   0.00 B/s     0.00 B         0        1       2

Load Test Transfers (Last Day)
Linked Node              Rate         Bytes      Files  Expired  Errors
T0_CH_CERN_Export        124 KiB/s    10.7 GiB       4        0       4
T1_US_FNAL_Buffer        63.5 KiB/s   5.50 GiB       2        0       9
T1_US_FNAL_Disk          95.7 KiB/s   8.30 GiB       3        0       4
T2_FR_IPHC               0.00 B/s     0.00 B         0        8       0
T2_US_Caltech            124 KiB/s    10.7 GiB       4        0       0
T2_US_Florida            31.1 KiB/s   2.70 GiB       1        0      11
T2_US_MIT                124 KiB/s    10.7 GiB       4        0       1
T2_US_Nebraska           0.00 B/s     0.00 B         0        0      13
T2_US_Vanderbilt         0.00 B/s     0.00 B         0        0      13
Totals                   563 KiB/s    48.6 GiB      18        8      55

Load Test Transfers (Last Week)
Linked Node              Rate         Bytes      Files  Expired  Errors
T0_CH_CERN_Export        107 KiB/s    64.4 GiB      24        0      29
T1_US_FNAL_Buffer        68.1 KiB/s   41.2 GiB      15        0      61
T1_US_FNAL_Disk          99.9 KiB/s   60.4 GiB      22        0      45
T2_FR_IPHC               0.00 B/s     0.00 B         0       58       0
T2_US_Caltech            129 KiB/s    77.8 GiB      29        0      16
T2_US_Florida            62.1 KiB/s   37.6 GiB      14        0      61
T2_US_MIT                129 KiB/s    77.8 GiB      29        0      14
T2_US_Nebraska           65.1 KiB/s   39.4 GiB      14        1      57
T2_US_Vanderbilt         62.1 KiB/s   37.6 GiB      14        0      53
Totals                   721 KiB/s    436 GiB      161       59     336

Load Test Transfers (Last Month)
Linked Node              Rate         Bytes      Files  Expired  Errors
T0_CH_CERN_Export        78.7 KiB/s   204 GiB       76        0     141
T1_ES_PIC_Buffer         25.9 KiB/s   67.3 GiB      25        1      79
T1_FR_CCIN2P3_Buffer     30.5 KiB/s   79.0 GiB      32        1      39
T1_IT_CNAF_Buffer        26.9 KiB/s   69.8 GiB      26        0      74
T1_UK_RAL_Buffer         31.1 KiB/s   80.5 GiB      30        0      51
T1_US_FNAL_Buffer        77.4 KiB/s   201 GiB       73        1     180
T1_US_FNAL_Disk          93.3 KiB/s   242 GiB       88        0     148
T2_DE_DESY               22.2 KiB/s   57.5 GiB      22        0      10
T2_FR_IPHC               0.00 B/s     0.00 B         0      145     148
T2_UK_SGrid_Bristol      55.3 KiB/s   143 GiB      150        1      95
T2_US_Caltech            95.3 KiB/s   247 GiB       92        4      59
T2_US_Florida            75.6 KiB/s   196 GiB       73        0     153
T2_US_MIT                107 KiB/s    277 GiB      103        0      67
T2_US_Nebraska           101 KiB/s    262 GiB       93        1     102
T2_US_Vanderbilt         94.2 KiB/s   244 GiB       91        0      91
Totals                   914 KiB/s    2.31 TiB     974      154   1,437


Data Holdings on the Brazos Cluster

Plots (Last 48 Hours, Last 45 Days, Last 52 Weeks): PhEDEx Queued Production Data Volume; PhEDEx Resident Production Data Volume.

Production Data ( 2018-10-19 16:25 UTC )
               Subscribed PhEDEx Data       Resident PhEDEx Data
Group Name     Items  Files  Bytes          Items  Files  Bytes       Percent
AnalysisOps       23    433  359 GiB           23    433  359 GiB     100.0 %
DataOps            2    228  566 GiB            2    228  566 GiB     100.0 %
FacOps             4    230  272 GiB            4    230  272 GiB     100.0 %
higgs             75  4,779  11.6 TiB          75  4,779  11.6 TiB    100.0 %
Totals           104  5,670  12.7 TiB         104  5,670  12.7 TiB    100.0 %

HEPX Disk Store Usage ( 2018-10-19 08:50 CDT )
Directory            Bytes      Percent   Date Modified
PhEDEx Monte Carlo   12.6 TiB     9.2 %   2018-08-24 23:08 UTC
PhEDEx RelVal        7.00 KiB     0.0 %   2018-01-09 19:51 UTC
PhEDEx CMS Data      183 GiB      0.1 %   2018-04-10 20:55 UTC
PhEDEx Load Tests    75.5 GiB     0.1 %   2018-10-18 20:44 UTC
User Output          114 TiB     83.5 %   2018-10-19 13:01 UTC
Miscellaneous        4.84 TiB     3.6 %   2018-10-19 13:49 UTC
Total                136 TiB    100.0 %   2018-10-19 13:49 UTC

Total Disk Usage ( 2018-10-19 11:45 CDT )
FData Partition Usage: 212 TiB of 303 TiB


Job Status of the Brazos Cluster

Plots (Last 48 Hours, Last 45 Days, Last 52 Weeks): Submitted Jobs Summary; Running Jobs Summary; Job Runtime Days Expended per Calendar Day; Processor Usage per Expended Runtime Efficiency; Job Termination Status Summary; Job Termination Success Efficiency; Job Termination Failure Application Diagnostics; Job Termination Failure Grid Diagnostics.

SLURM Queue Job Status ( 2018-10-19 11:45 CDT )
User Name        Queue  Run Status  Processors  Memory    CPU Run Hours  Run Hours Limit  Queue Hours
Tarini Konchady  BKGND  4 / 4       4 / 4       7.08 GiB  159            156              0.00
Jorge Morales    MIXED  3 / 3       26 / 26     92.9 GiB  21.3           26.0             0.00
Totals           MIXED  7 / 7       30 / 30     100 GiB   180            182              0.00

Condor Queue Job Status ( 2018-10-19 11:45 CDT )
User Name  Universe  Run Status  Processors  Memory    CPU Run Hours  Run Hours Limit  Queue Hours
Xuji Zhao  Grid      2 / 2       2 / 2       20.0 KiB  21.8           0.00             668
Totals     Grid      2 / 2       2 / 2       20.0 KiB  21.8           0.00             668

CMS Recent Job Activity on the Local Cluster by User ( 2018-10-19 16:25 UTC )
Job Activity Per User (Last Day)
User Name Pending Running Terminated App Failed Grid Failed CPU Usage Run Hours
Jorge Morales 0 0 17 70.6 % 0.0 % 81.8 % 10.4
Totals 0 0 17 70.6 % 0.0 % 81.8 % 10.4


Service Availability of the Brazos Cluster

Service Availability Percentage
Day Week Month
100 % 87 % 96 %

Brazos Cluster Heartbeat Tests ( 2018-10-19 11:45 CDT )
SSH Link: Pass
FData Filesystem Mount: Pass
FData Partition Usage: 212 TiB of 303 TiB
"DU" Query Status: Pass
"DU" Query Timer: 26.0 Seconds

Brazos Cluster Usage Load Statistics ( 2018-10-19 11:45 CDT )
Occupied + Scheduled Nodes: 36.3 % of 322
Occupied + Scheduled Processors: 28.0 % of 4,800
Load Average per CPU: 26.8 % of 4,800
Physical Memory Use: 34.9 % of 12.7 TiB
Virtual Memory Use: 0.0 % of 0.00 B

Login02 Head Node Usage Load Statistics
Running Processes: 3 of 334
User & System CPU Use: 1.2 % & 0.7 %
Net Load Average: 6.0 % ( 7 Users )
Physical Memory Use: 5.1 % of 31.3 GiB
Virtual Memory Use: 0.5 % of 5.00 GiB

Brazos Cluster Queue Utilization Statistics ( 2018-10-19 11:45 CDT )
Core counts in the last three columns are given as hepx / all users.
Queue           Accessible Cores  Active Cores (Running)  Requested Cores (Queued)  Other Core States (Held, Waiting, Exiting)
STAKEHOLDER     1,632             2/2                     0/0                       0/0
STAKEHOLDER-4G  1,888             24/24                   0/0                       0/0
BACKGROUND      2,912             4/4                     0/0                       0/0
BACKGROUND-4G   1,888             0/0                     0/0                       0/0
INTERACTIVE     3,520             0/0                     0/0                       0/0
SERIAL          216               0/0                     0/0                       0/0
SERIAL-LONG     216               0/0                     0/0                       0/0
MPI-CORE8       1,224             0/800                   0/0                       0/0
MPI-CORE32      832               0/384                   0/0                       0/0
MPI-CORE32-4G   448               0/0                     0/0                       0/0

Service Availability Monitoring (SAM) Tests
Plot: Itemized SAM Test Results (Last 48 Hours, 2018-10-17 17:00 UTC to 2018-10-19 17:00 UTC), one row per SAM metric:
SRM-GetPFNFromTFC (_cms_Role_production)
SRM-VOGet (_cms_Role_production)
SRM-VOPut (_cms_Role_production)
WN-analysis (_cms_Role_lcgadmin)
WN-basic (_cms_Role_lcgadmin)
WN-cvmfs (_cms_Role_lcgadmin)
WN-env (_cms_Role_lcgadmin)
WN-frontier (_cms_Role_lcgadmin)
WN-isolation (_cms_Role_lcgadmin)
WN-remotestageout (_cms_Role_lcgadmin)
WN-mc (_cms_Role_lcgadmin)
WN-squid (_cms_Role_lcgadmin)
WN-xrootd-fallback (_cms_Role_lcgadmin)
CONDOR-JobSubmit (_cms_Role_lcgadmin)
Plot: SAM Test Site Quality Summary (Last 45 Days, 2018-09-05 00:00 UTC to 2018-10-20 00:00 UTC)

Grid Client CRAB Analysis Test Suite (CATS) Completed Job Status ( 2018-10-19 16:25 UTC )
Grid Client  Output Host                        Output Size  Pass  Fail  Other  Timestamp
SLURM        Local: Brazos Cluster              Small        0     5     0      2018-10-19 17:07 UTC
SLURM        Local: Brazos Cluster              Large        0     5     0      2018-10-18 18:03 UTC
SLURM        Remote: Fermi National Laboratory  Small        5     0     0      2018-10-19 17:32 UTC
SLURM        Remote: Fermi National Laboratory  Large        0     2     0      2018-10-18 18:32 UTC
CRAB3        No Test Results Found for Prior Week


Alert Summary

Data Transfer Quality Test Status: PASSED   ( 2018-10-19 16:25 UTC )
Load Test Transfer Quality Test Status: FAILED   ( 2018-10-19 16:25 UTC )
Load Tests Missing Test Status: PASSED   ( 2018-10-19 16:25 UTC )
PhEDEx Transfer Test Status: PASSED   ( 2018-10-19 16:25 UTC )
PhEDEx Corruption Test Status: PASSED   ( 2018-10-19 16:25 UTC )
Disk Usage Test Status: PASSED   ( 2018-10-19 13:50 UTC )
Disk Quota Test Status: PASSED   ( 2018-10-19 16:35 UTC )
Disk Permissions Test Status: PASSED   ( 2018-10-19 13:50 UTC )
PhEDEx Mismatch Test Status: PASSED   ( 2018-10-19 16:35 UTC )
Torque Stalled Test Status: PASSED   ( 2018-10-19 16:45 UTC )
Condor Stalled Test Status: PASSED   ( 2018-10-19 16:45 UTC )
SAM Failed Test Status: PASSED   ( 2018-10-19 16:25 UTC )
SAM Missing Test Status: PASSED   ( 2018-10-19 16:25 UTC )
CATS Failed Test Status: FAILED   ( 2018-10-19 16:25 UTC )
CATS Missing Test Status: FAILED   ( 2018-10-19 16:25 UTC )
Cluster Heartbeat Test Status: PASSED   ( 2018-10-19 16:45 UTC )


