Brazos Data Analysis Site Monitoring Utility

◊ Mitchell Institute Computing on the Texas A&M Brazos Cluster ◊

I - Data Transfers  |  II - Data Holdings  |  III - Job Status  |  IV - Site Availability  |  V - Alerts  |  View All

Updated:   Friday, 2019-07-19 13:05 UTC         ( Friday, 2019-07-19 08:05 CDT )
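
The two timestamps above name the same instant in UTC and in US Central Daylight Time. A minimal sketch of that conversion follows, assuming CDT is modeled as a fixed UTC-5 offset; the `CDT` constant and the `utc_to_cdt` helper are illustrative names, not part of the monitoring utility.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CDT is modeled as a fixed UTC-5 offset for illustration only.
CDT = timezone(timedelta(hours=-5), "CDT")


def utc_to_cdt(stamp: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' UTC timestamp to the page's local format."""
    utc_time = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(CDT).strftime("%A, %Y-%m-%d %H:%M %Z")


print(utc_to_cdt("2019-07-19 13:05"))
```

The same offset relates the other local timestamps on this page to their UTC counterparts, for example 05:50 local time and 10:50 UTC.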



Data Transfers to the Brazos Cluster

Plots: PhEDEx Production Data and PhEDEx Load Test, each showing Transfer Quality and Transfer Rate over the Last 48 Hours, Last 45 Days, and Last 52 Weeks.

PhEDEx Data Transfer Links ( 2019-07-19 12:55 UTC )

Linked Node            Production Link Status   Load Test Link Status
T0_CH_CERN_Export      Valid                    Valid
T1_DE_KIT_Buffer       Valid                    Agent Down
T1_DE_KIT_Disk         Valid                    Null
T1_ES_PIC_Buffer       Valid                    Agent Down
T1_FR_CCIN2P3_Buffer   Valid                    Agent Down
T1_IT_CNAF_Buffer      Valid                    Agent Down
T1_RU_JINR_Buffer      Valid                    Agent Down
T1_RU_JINR_Disk        Valid                    Null
T1_RU_JINR_MSS         Valid                    Agent Down
T1_UK_RAL_Buffer       Valid                    Agent Down
T1_US_FNAL_Buffer      Valid                    Valid
T1_US_FNAL_Disk        Valid                    Valid
T2_BE_IIHE             Valid                    Valid
T2_CH_CERN             Valid                    Valid
T2_DE_DESY             Valid                    Valid
T2_ES_CIEMAT           Excluded                 Valid
T2_FR_IPHC             Valid                    Agent Down
T2_IT_Pisa             Valid                    Valid
T2_UK_London_IC        Valid                    Valid
T2_UK_SGrid_Bristol    Valid                    Valid
T2_US_Caltech          Valid                    Valid
T2_US_Florida          Valid                    Valid
T2_US_MIT              Valid                    Valid
T2_US_Nebraska         Valid                    Valid
T2_US_Purdue           Valid                    Valid
T2_US_UCSD             Valid                    Valid
T2_US_Vanderbilt       Valid                    Valid
T2_US_Wisconsin        Valid                    Valid
T3_US_Colorado         Agent Down               Agent Down
T3_US_FNALLPC          Excluded                 Excluded
T3_US_Rice             Agent Down               Agent Down
T3_US_TTU              Agent Down               Agent Down

Transfer statistics are reported per link for the Hour, Day, Week, and Month windows. Every Production row is empty ( - ) in all four windows, and every Load Test row is empty in the Hour window. The Load Test rows with entries are listed below; all other links show - in every column.

Load Test Transfers (Day)
Linked Node            Rate         Bytes      Files   Expired   Errors
T0_CH_CERN_Export      124 KiB/s    10.7 GiB   4       0         0
T1_US_FNAL_Buffer      0.00 B/s     0.00 B     0       2         10
T1_US_FNAL_Disk        0.00 B/s     0.00 B     0       3         9
T2_US_Caltech          124 KiB/s    10.7 GiB   4       0         3
T2_US_Florida          124 KiB/s    10.7 GiB   4       0         2
T2_US_MIT              124 KiB/s    10.7 GiB   4       0         0
T2_US_Nebraska         64.2 KiB/s   5.50 GiB   2       0         6
T2_US_Vanderbilt       93.2 KiB/s   8.10 GiB   3       0         4
Totals                 655 KiB/s    56.4 GiB   21      5         34

Load Test Transfers (Week)
Linked Node            Rate         Bytes      Files   Expired   Errors
T0_CH_CERN_Export      129 KiB/s    77.8 GiB   29      0         1
T1_US_FNAL_Buffer      0.00 B/s     0.00 B     0       21        65
T1_US_FNAL_Disk        0.00 B/s     0.00 B     0       28        58
T2_US_Caltech          124 KiB/s    75.2 GiB   28      0         13
T2_US_Florida          129 KiB/s    77.8 GiB   29      0         17
T2_US_MIT              129 KiB/s    77.8 GiB   29      0         2
T2_US_Nebraska         99.3 KiB/s   60.1 GiB   21      0         46
T2_US_Vanderbilt       111 KiB/s    67.1 GiB   25      0         23
Totals                 721 KiB/s    436 GiB    161     49        225

Load Test Transfers (Month)
Linked Node            Rate         Bytes      Files   Expired   Errors
T0_CH_CERN_Export      123 KiB/s    319 GiB    119     0         2
T1_US_FNAL_Buffer      0.00 B/s     0.00 B     0       97        252
T1_US_FNAL_Disk        0.00 B/s     0.00 B     0       107       240
T2_US_Caltech          95.3 KiB/s   247 GiB    92      0         115
T2_US_Florida          119 KiB/s    309 GiB    115     0         64
T2_US_MIT              123 KiB/s    319 GiB    119     0         5
T2_US_Nebraska         96.6 KiB/s   251 GiB    88      2         172
T2_US_Vanderbilt       109 KiB/s    282 GiB    105     0         101
Totals                 666 KiB/s    1.69 TiB   638     206       951
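
The Totals rows in the transfer statistics can be cross-checked against the per-link rows. The sketch below does this for the block whose Totals row reads 655 KiB/s, 56.4 GiB, 21, 5, 34, summing only the count columns (files, expired, errors). The `rows` list and the `column_totals` helper are illustrative names; the numbers are copied from the page in order of appearance.

```python
# (files, expired, errors) for each non-empty row of the block whose
# Totals row reads "655 KiB/s 56.4 GiB 21 5 34", in page order.
rows = [
    (4, 0, 0),
    (0, 2, 10),
    (0, 3, 9),
    (4, 0, 3),
    (4, 0, 2),
    (4, 0, 0),
    (2, 0, 6),
    (3, 0, 4),
]


def column_totals(table):
    """Sum each column of a list of equal-length integer tuples."""
    return tuple(sum(column) for column in zip(*table))


print(column_totals(rows))  # (21, 5, 34)
```

The summed counts agree with the page's Totals row. The rate and byte columns are rounded on the page, so they are left out of this check.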


Data Holdings on the Brazos Cluster

Plots: PhEDEx Queued Production Data Volume and PhEDEx Resident Production Data Volume, each over the Last 48 Hours, Last 45 Days, and Last 52 Weeks.

Production Data Subscribed PhEDEx Data Resident PhEDEx Data ( 2019-07-19 12:55 UTC )
Group Name Items Files Bytes Items Files Bytes Percent
AnalysisOps 23 433 359 GiB 23 433 359 GiB 100.0 %
DataOps 2 228 566 GiB 2 228 566 GiB 100.0 %
FacOps 4 426 1.68 TiB 4 426 1.68 TiB 100.0 %
higgs 67 1,043 1.33 TiB 67 1,043 1.33 TiB 100.0 %
upgrade 4 251 6.95 TiB 4 251 6.95 TiB 100.0 %
Totals 100 2,381 10.9 TiB 100 2,381 10.9 TiB 100.0 %

HEPX Disk Store Usage    ( 2019-07-19 05:50 CDT )
Directory            Bytes      Percent   Date Modified
PhEDEx Monte Carlo   8.92 TiB   4.8 %     2019-04-09 02:27 UTC
PhEDEx RelVal        1.41 TiB   0.8 %     2018-12-10 20:54 UTC
PhEDEx CMS Data      549 GiB    0.3 %     2019-04-09 02:27 UTC
PhEDEx Load Tests    75.5 GiB   0.0 %     2018-10-18 20:44 UTC
User Output          164 TiB    88.8 %    2019-07-19 10:01 UTC
Miscellaneous        4.88 TiB   2.7 %     2019-07-19 10:49 UTC
Total                184 TiB    100.0 %   2019-07-19 10:49 UTC

Total Disk Usage ( 2019-07-19 08:05 CDT )
FData Partition Usage
256 TiB of 303 TiB
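
The partition usage line can be turned into a fill percentage. This is a minimal sketch, assuming the binary unit prefixes used elsewhere on the page; `UNITS`, `parse_size`, and `used_fraction` are illustrative names and not part of the monitoring utility.

```python
# Binary unit multipliers, assuming the page's KiB/MiB/GiB/TiB notation.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_size(text: str) -> float:
    """Convert a string such as '256 TiB' or '1,043 GiB' to bytes."""
    value, unit = text.split()
    return float(value.replace(",", "")) * UNITS[unit]


def used_fraction(usage: str) -> float:
    """Return used/total for a string of the form '<used> of <total>'."""
    used, total = usage.split(" of ")
    return parse_size(used) / parse_size(total)


print(f"{used_fraction('256 TiB of 303 TiB'):.1%}")  # 84.5%
```

Because the page rounds both sizes to three significant figures, the computed percentage is only as precise as those rounded inputs.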


Job Status of the Brazos Cluster

Plots (Process Cycle Statistics: Initiation & Runtime, Termination Status), each over the Last 48 Hours, Last 45 Days, and Last 52 Weeks:
Submitted Jobs Summary
Running Jobs Summary
Job Runtime Days Expended per Calendar Day
Processor Usage per Expended Runtime Efficiency
Job Termination Status Summary
Job Termination Success Efficiency
Job Termination Failure Application Diagnostics (titled "Application Status" in the 45-day and 52-week plots)
Job Termination Failure Grid Diagnostics (titled "Grid Status" in the 45-day and 52-week plots)

SLURM Queue Job Status ( 2019-07-19 08:10 CDT )
User Name            Queue       Run Status   Processors   Memory     CPU Run Hours   Run Hours Limit   Queue Hours
Steven Clark         STKHD-4G    5 / 5        40 / 40      18.6 GiB   55.6            55.0              0.00
Jorge Morales        STKHD       1 / 1        50 / 50      93.1 GiB   60.4            60.0              0.00
CMS: Andrea Sciaba   STKHD       1 / 1        1 / 1        1.77 GiB   0.11            0.18              0.00
Adrian Thompson      MPI-CORE8   1 / 1        32 / 32      2.79 GiB   9.72            9.00              0.00
Totals               MIXED       8 / 8        123 / 123    116 GiB    126             124               0.00

Condor Queue Job Status ( 2019-07-19 08:10 CDT )
User Name Universe Run Status Processors Memory CPU Run Hours Run Hours Limit Queue Hours
Totals - - / - - / - - - - -
No Jobs Queued

CMS Recent Job Activity on the Local Cluster by User ( 2019-07-19 12:55 UTC )
Job Activity Per User (Last Day)
User Name Pending Running Terminated App Failed Grid Failed CPU Usage Run Hours
Mehdi Rahmani 0 0 2 0.0 % 0.0 % 54.8 % 0.02
Totals 0 0 2 0.0 % 0.0 % 54.8 % 0.02


Service Availability of the Brazos Cluster

Service Availability Percentage
Day Week Month
100 % 100 % 99 %

Brazos Cluster Heartbeat Tests ( 2019-07-19 08:05 CDT )
SSH Link FData Filesystem Mount FData Partition Usage "DU" Query Status "DU" Query Timer
Pass Pass 256 TiB of 303 TiB Pass 25.0 Seconds

Brazos Cluster Usage Load Statistics ( 2019-07-19 08:10 CDT )
Occupied + Scheduled Nodes Occupied + Scheduled Processors Load Average per CPU Physical Memory Use Virtual Memory Use
31.5 % of 302 44.4 % of 4,568 42.3 % of 4,568 45.3 % of 12.0 TiB 0.0 % of 0.00 B
Login02 Head Node Usage Load Statistics
Running Processes User & System CPU Use Net Load Average Physical Memory Use Virtual Memory Use
3 of 275 1.1 %   &   0.6 % 53.0 % ( 4 Users ) 7.4 % of 31.4 GiB 0.0 % of 5.00 GiB

Brazos Cluster Queue Utilization Statistics ( 2019-07-19 08:10 CDT )
Queue Accessible Cores Active Cores (Running) (hepx/all users) Requested Cores (Queued) (hepx/all users) Other Core States (Held, Waiting, Exiting) (hepx/all users)
STAKEHOLDER 1,632 51/51 0/0 0/0
STAKEHOLDER-4G 1,904 40/68 0/0 0/0
BACKGROUND 2,912 0/320 0/0 0/80
BACKGROUND-4G 1,904 0/38 0/0 0/32
INTERACTIVE 4,816 0/0 0/0 0/0
SERIAL 904 0/20 0/0 0/0
SERIAL-LONG 904 0/0 0/0 0/0
MPI-CORE8 1,240 32/304 0/0 0/0
MPI-CORE32 832 0/128 0/0 0/0
MPI-CORE32-4G 448 0/0 0/0 0/0
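
Each cell in the queue table pairs a hepx count with an all-users count, separated by a slash. The sketch below splits the Active Cores (Running) column and sums each side; `ACTIVE_CORES` and `split_pair` are illustrative names, and the values are copied from the table above.

```python
# Active Cores (Running) cells, keyed by queue, in "hepx/all users" form.
ACTIVE_CORES = {
    "STAKEHOLDER": "51/51",
    "STAKEHOLDER-4G": "40/68",
    "BACKGROUND": "0/320",
    "BACKGROUND-4G": "0/38",
    "INTERACTIVE": "0/0",
    "SERIAL": "0/20",
    "SERIAL-LONG": "0/0",
    "MPI-CORE8": "32/304",
    "MPI-CORE32": "0/128",
    "MPI-CORE32-4G": "0/0",
}


def split_pair(cell: str) -> tuple[int, int]:
    """Split a 'hepx/all users' cell into two integers."""
    hepx, all_users = cell.split("/")
    return int(hepx), int(all_users)


pairs = [split_pair(cell) for cell in ACTIVE_CORES.values()]
hepx_active = sum(hepx for hepx, _ in pairs)
all_active = sum(all_users for _, all_users in pairs)
print(hepx_active, all_active)  # 123 929
```

The hepx sum of 123 active cores agrees with the 123 / 123 processors in the Totals row of the SLURM Queue Job Status table.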

Service Availability Monitoring (SAM) Tests
Itemized SAM Test Results (Last 48 Hours)
SRM-GetPFNFromTFC (_cms_Role_production)
SRM-VOGet (_cms_Role_production)
SRM-VOPut (_cms_Role_production)
WN-analysis (_cms_Role_lcgadmin)
WN-basic (_cms_Role_lcgadmin)
WN-cvmfs (_cms_Role_lcgadmin)
WN-env (_cms_Role_lcgadmin)
WN-frontier (_cms_Role_lcgadmin)
WN-isolation (_cms_Role_lcgadmin)
WN-remotestageout (_cms_Role_lcgadmin)
WN-mc (_cms_Role_lcgadmin)
WN-squid (_cms_Role_lcgadmin)
WN-xrootd-fallback (_cms_Role_lcgadmin)
CONDOR-JobSubmit (_cms_Role_lcgadmin)
Plot range: 2019-07-17 13:00 UTC to 2019-07-19 13:00 UTC
SAM Test Site Quality Summary (Last 45 Days)
Plot range: 2019-06-05 00:00 UTC to 2019-07-20 00:00 UTC

Grid Client CRAB Analysis Test Suite (CATS) Completed Job Status ( 2019-07-19 12:55 UTC )
Output Host → Local: Brazos Cluster Remote: Fermi National Accelerator Laboratory
Output Size → Small Large Small Large
CRAB3: No Test Results Found for Prior Week


Alert Summary

Data Transfer Quality Test Status: PASSED   ( 2019-07-19 12:55 UTC )
Load Test Transfer Quality Test Status: PASSED   ( 2019-07-19 12:55 UTC )
Load Tests Missing Test Status: PASSED   ( 2019-07-19 12:55 UTC )
PhEDEx Transfer Test Status: PASSED   ( 2019-07-19 12:55 UTC )
PhEDEx Corruption Test Status: PASSED   ( 2019-07-19 12:55 UTC )
Disk Usage Test Status: PASSED   ( 2019-07-19 10:50 UTC )
Disk Quota Test Status: PASSED   ( 2019-07-19 12:55 UTC )
Disk Permissions Test Status: PASSED   ( 2019-07-19 10:50 UTC )
PhEDEx Mismatch Test Status: PASSED   ( 2019-07-19 12:55 UTC )
Torque Stalled Test Status: PASSED   ( 2019-07-19 13:05 UTC )
Condor Stalled Test Status: PASSED   ( 2019-07-19 13:05 UTC )
SAM Failed Test Status: PASSED   ( 2019-07-19 12:55 UTC )
SAM Missing Test Status: PASSED   ( 2019-07-19 12:55 UTC )
CATS Failed Alert is Disabled
CATS Missing Alert is Disabled
Cluster Heartbeat Test Status: PASSED   ( 2019-07-19 13:05 UTC )


