Brazos Data Analysis Site Monitoring Utility

◊ Mitchell Institute Computing on the Texas A&M Brazos Cluster ◊

I - Data Transfers  |  II - Data Holdings  |  III - Job Status  |  IV - Site Availability  |  V - Alerts  |  View All

Updated:   Monday, 2018-06-18 03:20 UTC         ( Sunday, 2018-06-17 22:20 CDT )



Data Transfers to the Brazos Cluster

Plots: PhEDEx Production Data Transfer Quality, PhEDEx Production Data Transfer Rate, PhEDEx Load Test Transfer Quality, and PhEDEx Load Test Transfer Rate, each for the Last 48 Hours, Last 45 Days, and Last 52 Weeks

PhEDEx Data Transfers by Linked Node ( 2018-06-18 02:55 UTC )

Linked Node | Production Link Status | Load Test Link Status
T0_CH_CERN_Export | Valid | Valid
T1_DE_KIT_Buffer | Valid | Valid
T1_DE_KIT_Disk | Valid | Null
T1_ES_PIC_Buffer | Valid | Valid
T1_FR_CCIN2P3_Buffer | Valid | Valid
T1_IT_CNAF_Buffer | Valid | Valid
T1_RU_JINR_Buffer | Valid | Valid
T1_RU_JINR_Disk | Valid | Null
T1_RU_JINR_MSS | Valid | Valid
T1_UK_RAL_Buffer | Valid | Valid
T1_US_FNAL_Buffer | Valid | Valid
T1_US_FNAL_Disk | Valid | Valid
T2_BE_IIHE | Valid | Valid
T2_CH_CERN | Valid | Valid
T2_DE_DESY | Valid | Valid
T2_ES_CIEMAT | Excluded | Valid
T2_FR_IPHC | Valid | Valid
T2_IT_Pisa | Valid | Valid
T2_UK_London_IC | Valid | Valid
T2_UK_SGrid_Bristol | Valid | Valid
T2_US_Caltech | Valid | Valid
T2_US_Florida | Valid | Valid
T2_US_MIT | Valid | Valid
T2_US_Nebraska | Valid | Valid
T2_US_Purdue | Valid | Valid
T2_US_UCSD | Valid | Valid
T2_US_Vanderbilt | Valid | Valid
T2_US_Wisconsin | Valid | Valid
T3_US_Colorado | Agent Down | Agent Down
T3_US_Rice | Agent Down | Agent Down
T3_US_TTU | Agent Down | Agent Down

Linked nodes not listed in a table below show no transfers ( - ) for that window.

Production Transfers (Hour)
Linked Node | Rate | Bytes | Files | Expired | Errors
Totals | - | - | - | - | -

Load Test Transfers (Hour)
Linked Node | Rate | Bytes | Files | Expired | Errors
T2_UK_SGrid_Bristol | 424 KiB/s | 1.50 GiB | 2 | 0 | 0
Totals | 424 KiB/s | 1.50 GiB | 2 | 0 | 0

Production Transfers (Day)
Linked Node | Rate | Bytes | Files | Expired | Errors
Totals | - | - | - | - | -

Load Test Transfers (Day)
Linked Node | Rate | Bytes | Files | Expired | Errors
T0_CH_CERN_Export | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
T1_ES_PIC_Buffer | 125 KiB/s | 10.8 GiB | 4 | 0 | 0
T1_FR_CCIN2P3_Buffer | 114 KiB/s | 9.90 GiB | 4 | 0 | 0
T1_IT_CNAF_Buffer | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
T1_UK_RAL_Buffer | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
T1_US_FNAL_Buffer | 127 KiB/s | 11.0 GiB | 4 | 0 | 0
T1_US_FNAL_Disk | 128 KiB/s | 11.0 GiB | 4 | 0 | 0
T2_DE_DESY | 121 KiB/s | 10.5 GiB | 4 | 0 | 0
T2_FR_IPHC | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
T2_UK_SGrid_Bristol | 246 KiB/s | 21.2 GiB | 16 | 0 | 0
T2_US_Caltech | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
T2_US_Florida | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
T2_US_MIT | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
T2_US_Nebraska | 131 KiB/s | 11.3 GiB | 4 | 0 | 0
T2_US_Vanderbilt | 124 KiB/s | 10.7 GiB | 4 | 0 | 0
Totals | 1.94 MiB/s | 171 GiB | 72 | 0 | 0

Production Transfers (Week)
Linked Node | Rate | Bytes | Files | Expired | Errors
Totals | - | - | - | - | -

Load Test Transfers (Week)
Linked Node | Rate | Bytes | Files | Expired | Errors
T0_CH_CERN_Export | 124 KiB/s | 75.2 GiB | 28 | 0 | 0
T1_ES_PIC_Buffer | 120 KiB/s | 72.6 GiB | 27 | 1 | 0
T1_FR_CCIN2P3_Buffer | 114 KiB/s | 69.2 GiB | 28 | 0 | 0
T1_IT_CNAF_Buffer | 124 KiB/s | 75.1 GiB | 28 | 0 | 1
T1_UK_RAL_Buffer | 124 KiB/s | 75.2 GiB | 28 | 0 | 0
T1_US_FNAL_Buffer | 123 KiB/s | 74.2 GiB | 27 | 2 | 0
T1_US_FNAL_Disk | 127 KiB/s | 77.0 GiB | 28 | 0 | 0
T2_DE_DESY | 121 KiB/s | 73.2 GiB | 28 | 0 | 0
T2_FR_IPHC | 124 KiB/s | 75.2 GiB | 28 | 0 | 0
T2_UK_SGrid_Bristol | 210 KiB/s | 127 GiB | 116 | 0 | 0
T2_US_Caltech | 120 KiB/s | 72.5 GiB | 27 | 0 | 5
T2_US_Florida | 124 KiB/s | 75.2 GiB | 28 | 0 | 0
T2_US_MIT | 120 KiB/s | 72.5 GiB | 27 | 0 | 0
T2_US_Nebraska | 125 KiB/s | 75.7 GiB | 27 | 0 | 0
T2_US_Vanderbilt | 120 KiB/s | 72.5 GiB | 27 | 0 | 0
Totals | 1.88 MiB/s | 1.14 TiB | 502 | 3 | 6

Production Transfers (Month)
Linked Node | Rate | Bytes | Files | Expired | Errors
T1_DE_KIT_Buffer | 1.00 MiB/s | 2.60 TiB | 968 | 0 | 1
T1_ES_PIC_Buffer | 615 KiB/s | 1.60 TiB | 635 | 0 | 3
T1_FR_CCIN2P3_Buffer | 31.9 KiB/s | 82.7 GiB | 23 | 0 | 0
T1_IT_CNAF_Buffer | 187 KiB/s | 486 GiB | 193 | 476 | 10
T1_UK_RAL_Buffer | 186 KiB/s | 482 GiB | 172 | 0 | 0
T1_US_FNAL_Buffer | 353 KiB/s | 914 GiB | 336 | 325 | 2
T2_BE_IIHE | 29.9 KiB/s | 77.4 GiB | 29 | 0 | 5
T2_UK_London_IC | 123 KiB/s | 318 GiB | 136 | 0 | 0
T2_UK_SGrid_Bristol | 38.2 KiB/s | 99.0 GiB | 41 | 0 | 1
T2_US_Caltech | 91.2 KiB/s | 236 GiB | 82 | 4 | 3
T2_US_Florida | 36.9 KiB/s | 95.7 GiB | 28 | 2 | 0
T2_US_Nebraska | 193 KiB/s | 500 GiB | 166 | 0 | 1
T2_US_Purdue | 53.6 KiB/s | 139 GiB | 47 | 0 | 2
T2_US_Wisconsin | 10.0 KiB/s | 25.9 GiB | 22 | 0 | 0
Totals | 2.90 MiB/s | 7.58 TiB | 2,878 | 807 | 28

Load Test Transfers (Month)
Linked Node | Rate | Bytes | Files | Expired | Errors
T0_CH_CERN_Export | 116 KiB/s | 301 GiB | 112 | 0 | 20
T1_ES_PIC_Buffer | 115 KiB/s | 299 GiB | 111 | 7 | 17
T1_FR_CCIN2P3_Buffer | 107 KiB/s | 277 GiB | 112 | 5 | 25
T1_IT_CNAF_Buffer | 114 KiB/s | 295 GiB | 110 | 0 | 30
T1_UK_RAL_Buffer | 117 KiB/s | 303 GiB | 113 | 0 | 24
T1_US_FNAL_Buffer | 118 KiB/s | 305 GiB | 111 | 6 | 18
T1_US_FNAL_Disk | 120 KiB/s | 310 GiB | 113 | 0 | 21
T2_DE_DESY | 114 KiB/s | 296 GiB | 113 | 0 | 22
T2_FR_IPHC | 116 KiB/s | 301 GiB | 112 | 0 | 22
T2_UK_SGrid_Bristol | 163 KiB/s | 423 GiB | 424 | 9 | 201
T2_US_Caltech | 115 KiB/s | 298 GiB | 111 | 0 | 29
T2_US_Florida | 118 KiB/s | 306 GiB | 114 | 0 | 20
T2_US_MIT | 112 KiB/s | 290 GiB | 108 | 0 | 21
T2_US_Nebraska | 118 KiB/s | 306 GiB | 108 | 0 | 21
T2_US_Vanderbilt | 114 KiB/s | 295 GiB | 110 | 0 | 15
Totals | 1.74 MiB/s | 4.50 TiB | 1,982 | 27 | 506
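
The Rate and Bytes figures in the transfer rows appear to be tied together by the length of the selected window. The following is a minimal Python sketch of that arithmetic, not the site's own code. It assumes that Rate is an average over the whole window and that KiB, MiB, GiB, and TiB are binary units; the page states neither. The helper names to_bytes and implied_window_hours are hypothetical.

```python
# Binary unit multipliers (assumption: KiB = 1024 bytes, and so on).
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_bytes(text):
    """Convert '10.7 GiB' to bytes, or '124 KiB/s' to bytes per second."""
    value, unit = text.replace(",", "").split()
    unit = unit.split("/")[0]  # drop a trailing '/s' if present
    return float(value) * UNITS[unit]


def implied_window_hours(rate_text, bytes_text):
    """Hours needed to move bytes_text at the average rate rate_text."""
    return to_bytes(bytes_text) / to_bytes(rate_text) / 3600.0


# One row as displayed on the page: 124 KiB/s and 10.7 GiB.
print(round(implied_window_hours("124 KiB/s", "10.7 GiB"), 1))
```

For that row the result is roughly 25 hours, which is close to a one-day window given that both displayed values are rounded to three significant figures.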


Data Holdings on the Brazos Cluster

Plots: PhEDEx Queued Production Data Volume and PhEDEx Resident Production Data Volume, each for the Last 48 Hours, Last 45 Days, and Last 52 Weeks

Production Data ( 2018-06-18 02:55 UTC )
Group Name | Subscribed Items | Subscribed Files | Subscribed Bytes | Resident Items | Resident Files | Resident Bytes | Percent
AnalysisOps | 23 | 433 | 359 GiB | 23 | 433 | 359 GiB | 100.0 %
DataOps | 2 | 228 | 566 GiB | 2 | 228 | 566 GiB | 100.0 %
FacOps | 4 | 230 | 272 GiB | 4 | 230 | 272 GiB | 100.0 %
higgs | 75 | 4,779 | 11.6 TiB | 75 | 4,779 | 11.6 TiB | 100.0 %
susy | 13 | 219 | 258 GiB | 13 | 219 | 258 GiB | 100.0 %
Totals | 117 | 5,889 | 13.0 TiB | 117 | 5,889 | 13.0 TiB | 100.0 %

HEPX Disk Store Usage ( 2018-06-17 20:50 CDT )
Directory | Bytes | Percent | Date Modified
PhEDEx Monte Carlo | 12.8 TiB | 8.7 % | 2018-05-28 02:55 UTC
PhEDEx RelVal | 7.00 KiB | 0.0 % | 2018-01-09 19:51 UTC
PhEDEx CMS Data | 183 GiB | 0.1 % | 2018-04-10 20:55 UTC
PhEDEx Load Tests | 19.0 KiB | 0.0 % | 2018-06-18 00:58 UTC
User Output | 125 TiB | 84.6 % | 2018-06-18 01:01 UTC
Miscellaneous | 4.84 TiB | 3.3 % | 2018-06-18 01:47 UTC
Total | 147 TiB | 100.0 % | 2018-06-18 01:47 UTC

Total Disk Usage ( 2018-06-17 22:20 CDT )
FData Partition Usage: 220 TiB of 303 TiB
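
The partition usage is reported as a pair of sizes rather than a percentage. A small Python sketch of turning such a string into a fraction follows. It assumes binary units and the "X unit of Y unit" wording shown on this page; the function name usage_fraction is hypothetical and not part of the monitor itself.

```python
import re

# Binary unit multipliers (assumption: the page uses IEC binary prefixes).
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

USAGE = re.compile(r"\s*([\d.,]+)\s*(\w+)\s+of\s+([\d.,]+)\s*(\w+)\s*")


def usage_fraction(text):
    """Return used / total for a string such as '220 TiB of 303 TiB'."""
    match = USAGE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized usage string: {text!r}")
    used, used_unit, total, total_unit = match.groups()
    used_bytes = float(used.replace(",", "")) * UNITS[used_unit]
    total_bytes = float(total.replace(",", "")) * UNITS[total_unit]
    return used_bytes / total_bytes


print(f"{usage_fraction('220 TiB of 303 TiB'):.1%}")  # prints 72.6%
```

Because the displayed sizes are already rounded, the computed fraction is only as precise as the three significant figures shown.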


Job Status of the Brazos Cluster

Plots, each for the Last 48 Hours, Last 45 Days, and Last 52 Weeks:
Submitted Jobs Summary
Running Jobs Summary
Job Runtime Days Expended per Calendar Day
Processor Usage per Expended Runtime Efficiency
Job Termination Status Summary
Job Termination Success Efficiency
Job Termination Failure Application Diagnostics (titled Application Status for the 45-day and 52-week plots)
Job Termination Failure Grid Diagnostics (titled Grid Status for the 45-day and 52-week plots)

SLURM Queue Job Status ( 2018-06-17 22:20 CDT )
User Name | Queue | Run Status | Processors | Memory | CPU Run Hours | Run Hours Limit | Queue Hours
Jorge Morales | STKHD-4G | 1 / 1 | 24 / 24 | 89.4 GiB | 25.0 | 25.0 | 0.00
Totals | STKHD-4G | 1 / 1 | 24 / 24 | 89.4 GiB | 25.0 | 25.0 | 0.00

Condor Queue Job Status ( 2018-06-17 22:20 CDT )
User Name | Universe | Run Status | Processors | Memory | CPU Run Hours | Run Hours Limit | Queue Hours
Totals | - | - / - | - / - | - | - | - | -
No Jobs Queued

CMS Recent Job Activity on the Local Cluster by User ( 2018-06-18 02:55 UTC )
Job Activity Per User (Last Day)
User Name Pending Running Terminated App Failed Grid Failed CPU Usage Run Hours
Jorge Morales 0 0 17 0.0 % 0.0 % 88.0 % 9.57
Totals 0 0 17 0.0 % 0.0 % 88.0 % 9.57


Service Availability of the Brazos Cluster

Service Availability Percentage
Day: 100 %
Week: 100 %
Month: 93 %

Brazos Cluster Heartbeat Tests ( 2018-06-17 22:20 CDT )
SSH Link: Pass
FData Filesystem Mount: Pass
FData Partition Usage: 220 TiB of 303 TiB
"DU" Query Status: Pass
"DU" Query Timer: 21.7 Seconds

Brazos Cluster Usage Load Statistics ( 2018-06-17 22:20 CDT )
Occupied + Scheduled Nodes: 0.6 % of 309
Occupied + Scheduled Processors: 1.2 % of 4,696
Load Average per CPU: 0.2 % of 4,696
Physical Memory Use: 1.7 % of 12.3 TiB
Virtual Memory Use: 0.0 % of 0.00 iB

Login02 Head Node Usage Load Statistics
Running Processes: 3 of 327
User & System CPU Use: 2.4 % & 1.2 %
Net Load Average: 32.0 % ( 10 Users )
Physical Memory Use: 25.0 % of 31.3 GiB
Virtual Memory Use: 0.0 % of 5.00 GiB

Brazos Cluster Queue Utilization Statistics ( 2018-06-17 22:20 CDT )
Core counts are shown as hepx / all users.
Queue | Accessible Cores | Active Cores (Running) | Requested Cores (Queued) | Other Core States (Held, Waiting, Exiting)
STAKEHOLDER | 1,632 | 0/0 | 0/0 | 0/0
STAKEHOLDER-4G | 1,952 | 24/24 | 0/0 | 0/0
BACKGROUND | 2,912 | 0/33 | 0/0 | 0/0
BACKGROUND-4G | 1,952 | 0/0 | 0/0 | 0/0
INTERACTIVE | 3,584 | 0/0 | 0/0 | 0/0
SERIAL | 216 | 0/0 | 0/0 | 0/0
SERIAL-LONG | 216 | 0/0 | 0/0 | 0/0
MPI-CORE8 | 1,288 | 0/0 | 0/0 | 0/0
MPI-CORE32 | 832 | 0/0 | 0/0 | 0/0
MPI-CORE32-4G | 448 | 0/0 | 0/0 | 0/0

Service Availability Monitoring (SAM) Tests
Itemized SAM Test Results (Last 48 Hours)
SRM-GetPFNFromTFC (_cms_Role_production)
SRM-VOGet (_cms_Role_production)
SRM-VOPut (_cms_Role_production)
WN-analysis (_cms_Role_lcgadmin)
WN-basic (_cms_Role_lcgadmin)
WN-cvmfs (_cms_Role_lcgadmin)
WN-env (_cms_Role_lcgadmin)
WN-frontier (_cms_Role_lcgadmin)
WN-isolation (_cms_Role_pilot)
WN-remotestageout (_cms_Role_lcgadmin)
WN-mc (_cms_Role_lcgadmin)
WN-squid (_cms_Role_lcgadmin)
WN-xrootd-fallback (_cms_Role_lcgadmin)
CONDOR-JobSubmit (_cms_Role_lcgadmin)
CONDOR-JobSubmit (_cms_Role_pilot)
Plot of the SAM metrics listed above, from 2018-06-16 03:00 UTC to 2018-06-18 03:00 UTC
SAM Test Site Quality Summary (Last 45 Days)
Plot from 2018-05-05 00:00 UTC to 2018-06-19 00:00 UTC

Grid Client CRAB Analysis Test Suite (CATS) Completed Job Status ( 2018-06-18 02:55 UTC )
Grid Client | Output Host | Output Size | Result | Time
SLURM | Local: Brazos Cluster | Small | Pass:5 Fail:0 Other:0 | 2018-06-17 17:07 UTC
SLURM | Local: Brazos Cluster | Large | Pass:5 Fail:0 Other:0 | 2018-06-17 18:07 UTC
SLURM | Remote: Fermi National Laboratory | Small | Pass:5 Fail:0 Other:0 | 2018-06-17 17:32 UTC
SLURM | Remote: Fermi National Laboratory | Large | Pass:2 Fail:0 Other:0 | 2018-06-17 18:33 UTC
CRAB3 | - | - | Pass:18 Fail:0 Other:0 | 2018-06-15 16:37 UTC


Alert Summary

Data Transfer Quality Test Status: PASSED   ( 2018-06-18 02:55 UTC )
Load Test Transfer Quality Test Status: PASSED   ( 2018-06-18 02:55 UTC )
Load Tests Missing Test Status: PASSED   ( 2018-06-18 02:55 UTC )
PhEDEx Transfer Test Status: PASSED   ( 2018-06-18 02:55 UTC )
PhEDEx Corruption Test Status: PASSED   ( 2018-06-18 02:55 UTC )
Disk Usage Test Status: PASSED   ( 2018-06-18 01:50 UTC )
Disk Quota Test Status: PASSED   ( 2018-06-18 03:15 UTC )
Disk Permissions Test Status: PASSED   ( 2018-06-18 01:50 UTC )
PhEDEx Mismatch Test Status: PASSED   ( 2018-06-18 03:15 UTC )
Torque Stalled Test Status: PASSED   ( 2018-06-18 03:20 UTC )
Condor Stalled Test Status: PASSED   ( 2018-06-18 03:20 UTC )
SAM Failed Test Status: PASSED   ( 2018-06-18 02:55 UTC )
SAM Missing Test Status: PASSED   ( 2018-06-18 02:55 UTC )
CATS Failed Test Status: PASSED   ( 2018-06-18 02:55 UTC )
CATS Missing Test Status: PASSED   ( 2018-06-18 02:55 UTC )
Cluster Heartbeat Test Status: PASSED   ( 2018-06-18 03:20 UTC )
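
Each alert line above follows the same "name, status, timestamp" pattern, so it can be read mechanically. The Python sketch below is one way to do that and is not the monitor's own implementation. The names ALERT_LINE, parse_alert, and failing are hypothetical, and the only status value actually seen on this page is PASSED; treating any other word as a failure is an assumption.

```python
import re
from datetime import datetime, timezone

# Pattern for lines such as:
#   Disk Usage Test Status: PASSED   ( 2018-06-18 01:50 UTC )
ALERT_LINE = re.compile(
    r"^(?P<name>.+?) Test Status:\s+(?P<status>\w+)\s+"
    r"\(\s*(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC\s*\)\s*$"
)


def parse_alert(line):
    """Split one alert line into (test name, status, UTC datetime)."""
    match = ALERT_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized alert line: {line!r}")
    stamp = datetime.strptime(match["stamp"], "%Y-%m-%d %H:%M")
    return match["name"], match["status"], stamp.replace(tzinfo=timezone.utc)


def failing(lines):
    """Names of tests whose status is anything other than PASSED (assumption)."""
    parsed = [parse_alert(line) for line in lines]
    return [name for name, status, _ in parsed if status != "PASSED"]


sample = [
    "Disk Usage Test Status: PASSED   ( 2018-06-18 01:50 UTC )",
    "Load Tests Missing Test Status: PASSED   ( 2018-06-18 02:55 UTC )",
]
print(failing(sample))  # prints []
```

The non-greedy name group stops at the first " Test Status:" marker, which keeps names such as "Load Tests Missing" intact.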


