Brazos Data Analysis Site Monitoring Utility

◊ Mitchell Institute Computing on the Texas A&M Brazos Cluster ◊


Updated:   Friday, 2018-04-27 02:20 UTC         ( Thursday, 2018-04-26 21:20 CDT )



Data Transfers to the Brazos Cluster

[ Graphs: PhEDEx Production and Load Test Data Transfer Quality and Transfer Rate, each over the last 48 hours, 45 days, and 52 weeks ]

PhEDEx Data Transfers per Link ( links with no recorded activity omitted )
Columns: Rate | Bytes | Files | Expired | Errors

Production Transfers ( Last Hour, Last Day, Last Week ): no activity

Production Transfers ( Last Month )
T0_CH_CERN_Export     44.4 KiB/s   115 GiB      63    8    2
T1_DE_KIT_Buffer      152 KiB/s    395 GiB     203   75   10
T1_ES_PIC_Buffer      99.7 KiB/s   259 GiB     112    1   29
T1_IT_CNAF_Buffer     40.0 KiB/s   104 GiB      66    0    0
T1_RU_JINR_Buffer     25.6 KiB/s   66.3 GiB     25    0    1
T1_RU_JINR_MSS        0.00 iB/s    0.00 iB       0    7    0
T1_UK_RAL_Buffer      153 KiB/s    396 GiB     164    1   40
T1_US_FNAL_Buffer     234 KiB/s    607 GiB     269    0    4
T2_BE_IIHE            75.8 KiB/s   196 GiB      92    0    1
T2_CH_CERN            1.70 KiB/s   4.40 GiB      7    0    0
T2_DE_DESY            51.3 KiB/s   133 GiB      52    0   30
T2_IT_Pisa            25.1 KiB/s   65.1 GiB     40    0   25
T2_UK_SGrid_Bristol   41.5 KiB/s   108 GiB      52    0    1
T2_US_Caltech         25.1 KiB/s   65.1 GiB     24    0    0
T2_US_Florida         50.6 KiB/s   131 GiB      63    0    3
T2_US_MIT             38.9 KiB/s   101 GiB      53    0    1
T2_US_Nebraska        25.1 KiB/s   65.1 GiB     27    0    0
T2_US_Purdue          54.5 KiB/s   141 GiB      59    0    2
T2_US_UCSD            76.4 iB/s    198 MiB       1    0    0
T2_US_Wisconsin       85.5 iB/s    222 MiB       1    0    0
Totals                1.11 MiB/s   2.88 TiB   1,373   92  149

Load Test Transfers ( Last Hour )
T2_UK_SGrid_Bristol   858 KiB/s    3.10 GiB      2    0    0
T2_US_MIT             746 KiB/s    2.70 GiB      1    0    0
T2_US_Nebraska        739 KiB/s    2.70 GiB      1    0    0
Totals                2.29 MiB/s   8.50 GiB      4    0    0

Load Test Transfers ( Last Day )
T0_CH_CERN_Export     0.00 iB/s    0.00 iB       0    8    0
T1_ES_PIC_Buffer      156 KiB/s    13.5 GiB      5    0    0
T1_FR_CCIN2P3_Buffer  115 KiB/s    9.90 GiB      4    0    0
T1_IT_CNAF_Buffer     124 KiB/s    10.7 GiB      4    0    0
T1_UK_RAL_Buffer      124 KiB/s    10.7 GiB      4    0    0
T1_US_FNAL_Buffer     127 KiB/s    11.0 GiB      4    0    0
T1_US_FNAL_Disk       127 KiB/s    11.0 GiB      4    0    0
T2_DE_DESY            121 KiB/s    10.4 GiB      4    0    0
T2_FR_IPHC            124 KiB/s    10.7 GiB      4    0    0
T2_UK_SGrid_Bristol   208 KiB/s    18.0 GiB     16    0    1
T2_US_Caltech         124 KiB/s    10.7 GiB      4    0    3
T2_US_Florida         124 KiB/s    10.7 GiB      4    0    0
T2_US_MIT             124 KiB/s    10.7 GiB      4    0    0
T2_US_Nebraska        131 KiB/s    11.3 GiB      4    0    0
T2_US_Vanderbilt      124 KiB/s    10.7 GiB      4    0    0
Totals                1.81 MiB/s   160 GiB      69    8    4

Load Test Transfers ( Last Week )
T0_CH_CERN_Export     62.1 KiB/s   37.6 GiB     14   13    0
T1_ES_PIC_Buffer      125 KiB/s    75.3 GiB     28    0    0
T1_FR_CCIN2P3_Buffer  110 KiB/s    66.7 GiB     27    0    0
T1_IT_CNAF_Buffer     111 KiB/s    67.1 GiB     25    0    2
T1_UK_RAL_Buffer      120 KiB/s    72.5 GiB     27    0    0
T1_US_FNAL_Buffer     127 KiB/s    77.0 GiB     28    1    0
T1_US_FNAL_Disk       123 KiB/s    74.2 GiB     27    0    0
T2_DE_DESY            117 KiB/s    70.6 GiB     27    0    0
T2_FR_IPHC            115 KiB/s    69.8 GiB     26    0    6
T2_UK_SGrid_Bristol   184 KiB/s    111 GiB     112    0    1
T2_US_Caltech         120 KiB/s    72.5 GiB     27    0    3
T2_US_Florida         120 KiB/s    72.5 GiB     27    0    0
T2_US_MIT             124 KiB/s    75.2 GiB     28    0    0
T2_US_Nebraska        131 KiB/s    79.3 GiB     28    0    0
T2_US_Vanderbilt      124 KiB/s    75.2 GiB     28    0    0
Totals                1.77 MiB/s   1.07 TiB    479   14   12

Load Test Transfers ( Last Month )
T0_CH_CERN_Export     98.4 KiB/s   255 GiB      95   13   12
T1_ES_PIC_Buffer      117 KiB/s    304 GiB     113    3   13
T1_FR_CCIN2P3_Buffer  110 KiB/s    284 GiB     115    1   17
T1_IT_CNAF_Buffer     61.1 KiB/s   158 GiB      59    0   13
T1_UK_RAL_Buffer      116 KiB/s    301 GiB     112    0   17
T1_US_FNAL_Buffer     122 KiB/s    316 GiB     115    5   12
T1_US_FNAL_Disk       94.4 KiB/s   245 GiB      89   14   76
T2_DE_DESY            114 KiB/s    296 GiB     113    0   14
T2_FR_IPHC            109 KiB/s    282 GiB     105    2   43
T2_UK_SGrid_Bristol   167 KiB/s    432 GiB     452    7   70
T2_US_Caltech         119 KiB/s    309 GiB     115    0   16
T2_US_Florida         119 KiB/s    309 GiB     115    0   14
T2_US_MIT             119 KiB/s    309 GiB     115    0   12
T2_US_Nebraska        126 KiB/s    325 GiB     115    0   13
T2_US_Vanderbilt      117 KiB/s    303 GiB     113    0   17
Totals                1.67 MiB/s   4.32 TiB   1,941   45  359
PhEDEx Link Status by Node ( left: Production Data Transfers, right: Load Tests; per-period columns: Rate, Bytes, Files, Expired, Errors )
Production Link Status   Linked Node   Load Test Link Status
Valid T0_CH_CERN_Export Valid
Valid T1_DE_KIT_Buffer Valid
Valid T1_DE_KIT_Disk Null
Valid T1_ES_PIC_Buffer Valid
Valid T1_FR_CCIN2P3_Buffer Valid
Valid T1_IT_CNAF_Buffer Valid
Valid T1_RU_JINR_Buffer Valid
Valid T1_RU_JINR_Disk Null
Valid T1_RU_JINR_MSS Valid
Valid T1_UK_RAL_Buffer Valid
Valid T1_US_FNAL_Buffer Valid
Valid T1_US_FNAL_Disk Valid
Valid T2_BE_IIHE Valid
Valid T2_CH_CERN Valid
Valid T2_DE_DESY Valid
Excluded T2_ES_CIEMAT Valid
Valid T2_FR_IPHC Valid
Valid T2_IT_Pisa Valid
Valid T2_UK_London_IC Valid
Valid T2_UK_SGrid_Bristol Valid
Valid T2_US_Caltech Valid
Valid T2_US_Florida Valid
Valid T2_US_MIT Valid
Valid T2_US_Nebraska Valid
Valid T2_US_Purdue Valid
Valid T2_US_UCSD Valid
Valid T2_US_Vanderbilt Valid
Valid T2_US_Wisconsin Valid
Agent Down T3_US_Colorado Agent Down
Agent Down T3_US_Rice Agent Down
Agent Down T3_US_TTU Agent Down
Totals ( 2018-04-27 01:55 UTC )
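The Totals rows in the transfer tables are sums of the per-link rows, quoted in binary units (KiB = 1024 B, and so on). A minimal sketch of parsing the dashboard's human-readable figures and re-deriving a total, using the three active rows from the hourly view; `parse_size` and `fmt` are local helpers written for this sketch, not PhEDEx APIs:

```python
# Sketch: parse the dashboard's binary-unit figures and re-derive a Totals row.
# parse_size/fmt are local helpers; units are binary (KiB = 1024 B).

UNITS = {"iB": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

def parse_size(text: str) -> float:
    """Convert e.g. '858 KiB' or '3.10 GiB' to a byte count."""
    value, unit = text.split()
    return float(value) * UNITS[unit]

def fmt(num_bytes: float, unit: str) -> str:
    """Render a byte count in the given unit, two decimals."""
    return f"{num_bytes / UNITS[unit]:.2f} {unit}"

# Hourly rows with activity: (rate, bytes transferred)
rows = [("858 KiB", "3.10 GiB"),   # T2_UK_SGrid_Bristol
        ("746 KiB", "2.70 GiB"),   # T2_US_MIT
        ("739 KiB", "2.70 GiB")]   # T2_US_Nebraska

total_rate = sum(parse_size(r) for r, _ in rows)
total_bytes = sum(parse_size(b) for _, b in rows)
print(fmt(total_rate, "MiB") + "/s", fmt(total_bytes, "GiB"))  # → 2.29 MiB/s 8.50 GiB
```

The printed pair matches the hourly Totals row, confirming the per-link rows sum as expected.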


Data Holdings on the Brazos Cluster

[ Graphs: PhEDEx Queued and Resident Production Data Volume, each over the last 48 hours, 45 days, and 52 weeks ]

Production Data ( 2018-04-27 01:55 UTC )
Group Name | Subscribed PhEDEx Data: Items, Files, Bytes | Resident PhEDEx Data: Items, Files, Bytes | Resident Percent
AnalysisOps 23 433 359 GiB 23 433 359 GiB 100.0 %
DataOps 2 228 566 GiB 2 228 566 GiB 100.0 %
FacOps 4 230 272 GiB 4 230 272 GiB 100.0 %
higgs 48 2,033 4.81 TiB 48 2,033 4.81 TiB 100.0 %
susy 13 219 258 GiB 13 219 258 GiB 100.0 %
Totals 90 3,143 6.24 TiB 90 3,143 6.24 TiB 100.0 %
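The Totals row can be re-derived from the per-group rows. A quick check, with the figures copied from the table (four groups quoted in GiB, higgs in TiB):

```python
# Sketch: re-derive the Totals row of the production-data table from the
# per-group rows (figures copied from the table above).

groups = {              # group: (items, files)
    "AnalysisOps": (23, 433),
    "DataOps":     (2, 228),
    "FacOps":      (4, 230),
    "higgs":       (48, 2033),
    "susy":        (13, 219),
}

total_items = sum(i for i, _ in groups.values())
total_files = sum(f for _, f in groups.values())
print(total_items, total_files)   # → 90 3143, matching the Totals row

# Byte totals: four groups in GiB plus higgs in TiB (1 TiB = 1024 GiB)
gib = 359 + 566 + 272 + 258
tib = gib / 1024 + 4.81
print(f"{tib:.2f} TiB")   # ≈ 6.23 TiB; the page's 6.24 TiB reflects pre-rounding values
```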

HEPX Disk Store Usage    ( 2018-04-26 20:50 CDT )
Directory            Bytes       Percent   Date Modified
PhEDEx Monte Carlo   6.06 TiB      4.1 %   2018-04-11 06:25 UTC
PhEDEx RelVal        7.00 KiB      0.0 %   2018-01-09 19:51 UTC
PhEDEx CMS Data      183 GiB       0.1 %   2018-04-10 20:55 UTC
PhEDEx Load Tests    19.0 KiB      0.0 %   2018-04-27 01:20 UTC
User Output          131 TiB      89.2 %   2018-04-27 01:01 UTC
Miscellaneous        4.84 TiB      3.3 %   2018-04-27 01:41 UTC
Total                147 TiB     100.0 %   2018-04-27 01:41 UTC

Total Disk Usage ( 2018-04-26 21:20 CDT )
FData Partition Usage
216 TiB of 303 TiB
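The partition row reports used and total capacity; the implied utilization follows from those two figures:

```python
# Sketch: fdata partition utilization, from the used/capacity figures above.
used_tib, capacity_tib = 216, 303
print(f"{100 * used_tib / capacity_tib:.1f} %")   # → 71.3 %
```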


Job Status of the Brazos Cluster

[ Graphs: Submitted and Running Jobs Summaries; Job Runtime Days Expended per Calendar Day; Processor Usage per Expended Runtime Efficiency; Job Termination Status, Success Efficiency, and Failure Diagnostics (Application and Grid); each over the last 48 hours, 45 days, and 52 weeks ]

SLURM Queue Job Status ( 2018-04-26 21:20 CDT )
User Name         Queue      Run Status   Processors   Memory     CPU Run Hours   Run Hours Limit   Queue Hours
Tao Huang         BKGND-4G   87 / 87      87 / 87      324 GiB    3,248           3,219             3.17
Jorge Morales     STKHD-4G   1 / 1        24 / 24      89.4 GiB   5.22            5.00              1.90
Ryan Mueller      STKHD      1 / 1        1 / 1        1.77 GiB   6.85            6.00              0.81
Josh Winchell     STKHD      11 / 11      22 / 22      38.9 GiB   46.8            110               0.00
kawinwanichakij   MIXED      15 / 15      15 / 15      48.4 GiB   159             271               1.07
Totals            MIXED      115 / 115    149 / 149    503 GiB    3,466           3,611             6.96

Condor Queue Job Status ( 2018-04-26 21:20 CDT )
No Jobs Queued

CMS Recent Job Activity on the Local Cluster by User ( 2018-04-27 02:00 UTC )
[ Graph: Job Activity Per User (Last Day) ]
User Name Pending Running Terminated App Failed Grid Failed CPU Usage Run Hours
Jorge Daniel Morales Mendoza 0 0 18 0.0 % 0.0 % 31.7 % 2.87
Jorge Morales 0 0 17 0.0 % 0.0 % 87.7 % 9.26
Ryan Dalrymple Mueller 0 2 447 2.7 % 2.7 % 97.7 % 1,535
Totals 0 2 482 2.5 % 2.5 % 97.5 % 1,547
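The Totals percentages are consistent with terminated-job-weighted averages of the per-user rows. A sketch using the App Failed column (figures from the table; the weighting scheme is our inference, not documented on the page):

```python
# Sketch: re-derive the Totals failure rate as a terminated-job-weighted
# average of the per-user rows (percentages as printed in the table).

users = [            # (terminated jobs, app-failed %)
    (18, 0.0),       # Jorge Daniel Morales Mendoza
    (17, 0.0),       # Jorge Morales
    (447, 2.7),      # Ryan Dalrymple Mueller
]

terminated = sum(n for n, _ in users)
app_failed = sum(n * p for n, p in users) / terminated
print(terminated, f"{app_failed:.1f} %")   # → 482 2.5 %
```

Both values match the Totals row, supporting the weighted-average reading.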
↑ Click User Row for Job Details Select → Hour Day Week


Service Availability of the Brazos Cluster

Service Availability Percentage
Day: 0 %    Week: 85 %    Month: 94 %
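An availability figure like the Day/Week/Month percentages can be derived as the passed fraction of periodic heartbeat tests inside each window. A sketch under that assumption; the hourly sample results below are hypothetical, not data from this page:

```python
# Sketch: availability as the passed fraction of heartbeat tests per window.
# The hourly pass/fail sample is hypothetical (every test in the most recent
# day failing), not taken from the page.
from datetime import datetime, timedelta

now = datetime(2018, 4, 27, 2, 20)
results = [(now - timedelta(hours=h), h >= 24) for h in range(24 * 7)]

def availability(window: timedelta) -> float:
    """Percentage of heartbeat tests that passed within the window."""
    sample = [ok for t, ok in results if t > now - window]
    return 100 * sum(sample) / len(sample)

print(f"{availability(timedelta(days=1)):.0f} %")   # → 0 %
print(f"{availability(timedelta(days=7)):.0f} %")   # → 86 % for this sample
```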

Brazos Cluster Heartbeat Tests ( 2018-04-26 21:20 CDT )
SSH Link FData Filesystem Mount FData Partition Usage "DU" Query Status "DU" Query Timer
Pass Pass 216 TiB of 303 TiB Pass 23.2 Seconds

Brazos Cluster Usage Load Statistics ( 2018-04-26 21:20 CDT )
Occupied + Scheduled Nodes Occupied + Scheduled Processors Load Average per CPU Physical Memory Use Virtual Memory Use
35.3 % of 329 22.4 % of 4,856 26.0 % of 4,856 34.1 % of 12.9 TiB 0.0 % of 0.00 iB
Login02 Head Node Usage Load Statistics
Running Processes User & System CPU Use Net Load Average Physical Memory Use Virtual Memory Use
4 of 372 2.3 %   &   2.1 % 41.0 % ( 15 Users ) 18.2 % of 31.3 GiB 2.4 % of 5.00 GiB
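The load-statistics rows quote percentages of a stated capacity, so the absolute counts follow directly; a quick sketch using the cluster-wide node and processor figures:

```python
# Sketch: convert 'percent of capacity' figures to absolute counts
# (figures from the cluster-wide load-statistics row above).
stats = {"nodes": (35.3, 329), "processors": (22.4, 4856)}
for name, (pct, cap) in stats.items():
    print(name, round(pct / 100 * cap))   # → nodes 116, processors 1088
```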

Brazos Cluster Queue Utilization Statistics ( 2018-04-26 21:20 CDT )
Queue | Accessible Cores | Active Cores, Running (hepx/all users) | Requested Cores, Queued (hepx/all users) | Other Core States: Held, Waiting, Exiting (hepx/all users)
STAKEHOLDER 1,632 103/103 0/0 0/0
STAKEHOLDER-4G 1,952 244/244 0/0 0/0
BACKGROUND 2,912 0/0 0/0 0/0
BACKGROUND-4G 1,952 87/87 0/0 0/0
INTERACTIVE 3,584 0/0 0/0 0/0
SERIAL 216 0/0 0/0 0/0
SERIAL-LONG 216 0/0 0/0 0/0
MPI-CORE8 1,288 0/656 0/0 0/0
MPI-CORE32 832 0/0 0/0 0/0
MPI-CORE32-4G 448 0/0 0/0 0/0

Service Availability Monitoring (SAM) Tests
Itemized SAM Test Results (Last 48 Hours)
SRM-GetPFNFromTFC (_cms_Role_production)
SRM-VOGet (_cms_Role_production)
SRM-VOPut (_cms_Role_production)
WN-analysis (_cms_Role_lcgadmin)
WN-basic (_cms_Role_lcgadmin)
WN-cvmfs (_cms_Role_lcgadmin)
WN-env (_cms_Role_lcgadmin)
WN-frontier (_cms_Role_lcgadmin)
WN-isolation (_cms_Role_pilot)
WN-remotestageout (_cms_Role_lcgadmin)
WN-mc (_cms_Role_lcgadmin)
WN-squid (_cms_Role_lcgadmin)