SYSINFO: Cluster System Information           9-DEC-1989, V3.4, VMS V5.0-V5.2

This shows critical information about the cluster system: at the moment this
includes the number of interactive users. The display format is either headed
columns for VAXclusters or numbers & words for a single node. It is designed
to be fast and small, with little resource usage.

The information.
===============

  Maximum interactive jobs
  Current interactive jobs
  Busyness

The "busy" measure is defined thus:

  tc = total CPU time = sample period (per CPU; assumes no CPU "leakage")
  ti = interrupt-mode CPU time during the sample period (per CPU)
  Ta = Sum(tc) across the N CPUs on the local node
     = total CPU time available to processes & interrupts
  Tp = Sum(tc-ti) across the N CPUs on the local node
     = CPU time available to processes
  Uw = number of COM(O) processes
  Ur = number of CUR processes (not necessarily = N)
  Uc = Ur + Uw = number of users competing for CPU time
  Ua = Uc x Ta / Tp = (Ur + Uw) x Sum(tc) / Sum(tc-ti)
     = weighted number of users competing for CPU time after allowing
       for interrupt-mode time
  Ub = Ua - Ur = weighted number of processes waiting for CPU
  B  = Ub / N  = "busyness"

Since Ta >= Tp, the factor Ta/Tp is at least 1. This interrupt-mode allowance
ensures that if interrupt-mode usage is high (e.g. due to MSCP disk serving or
cluster boot serving), the busyness is higher than it would otherwise be.

The information is sampled every 30 seconds. These numerical values can be
changed by a site, as can the "busy" algorithm and the display format. The
information can be made available to site-supplied programs via the callable
interface.

The display program.
===================

SYSINFO.EXE

This can be run as an ordinary image or used as a CLI at log-in. To be used as
a CLI it must be in SYS$SYSTEM (which can be a search list). By default
LOGINOUT also needs <cli>TABLES, where <cli> is the name of the CLI. E.g.
  SPAWN/CLI=SYSINFO                requires SYSINFO in SYS$SYSTEM
                                   and SYSINFOTABLES in SYS$SHARE
  SPAWN/CLI=SYSINFO/TAB=DCLTABLES  requires SYSINFO in SYS$SYSTEM
                                   and DCLTABLES in SYS$SHARE

It uses the lock manager and system locks to get information about nodes; to
do this when run as an image it needs CMEXEC privilege. Executive mode is
required so that system locks can be manipulated. SYSLCK privilege won't do,
as it won't exist in some environments in which this is run (e.g. SPAWNed
user processes), whereas CMEXEC does exist if entered from executive-mode
log-in (LOGINOUT is installed with CMEXEC).

If you don't have a cluster and don't want to run a server, SYSINFO will still
work, with two limitations: if it is not run at a high priority it may not see
many compute-wait processes, and there will be no smoothing by the moving
average.

On a VWS workstation virtual terminal window created for a SYSINFO CLI log-in,
it should wait for any key to be pressed before exiting (which deletes the
window). It will wait up to 60 seconds. It should also do this for LAT
sessions, because they may be in MS-DOS windows (e.g. VAXMATEs). I can't test
this feature!

When run as a CLI it responds to CTRL/Y and hangup notification by exiting
immediately. If the node on which it is running is the "quietest" node as
defined below, it suppresses modem hangup to give the user a chance to log in
without losing the connection.

  Usage = (busy + current-users) / maximum-users

The quietest node is the one with the lowest usage. This algorithm is
replaceable by a site-specific one.

The display program needs 1 TQELM, 1 ENQLM and 3 ASTLM units.

Sample output.
=============

  12-OCT-1989 18:33

          Max  Users  Busy
  VAXB     40     23   0.0
  VAXA     40     19   2.1
  VAXC      8      5   1.6

The callable interface.
======================

SYSINFO.OBJ, DISPLAY.OBJ, SCAN.OBJ, USAGE.OBJ, GATHER.OBJ, MSGDEF.MSG,
SYSINFO.OPT

1) SCAN.OBJ: scanning the cluster, calling routines for each node found.

   status = SYSINFO_SCAN (param)

   Param is passed uninterpreted to the routines it calls, which can be
   replaced by site-specific ones.
   Calls: SYSINFO_USAGE (replaceable)
          SYSINFO_REMOTE_NODE, SYSINFO_LOCAL_NODE (replaceable)
          SYSINFO_GATHER (in GATHER.OBJ, not replaceable)

2) USAGE.OBJ: "quiet" metric, for determining if the local node is quietest.

   usage.f = SYSINFO_USAGE (param, info)

   It returns an F-format floating-point value. Info is the block of info
   about a node (described below), passed by reference. Param is the value
   passed to SYSINFO_SCAN.

3) DISPLAY.OBJ: default display routines.

   status = SYSINFO_DISPLAY ()

   This is the callable form of the SYSINFO display program. It performs
   actions such as CTRL/Y handling if run as a CLI, and uses $QIOs to the
   terminal.

   status = SYSINFO_REMOTE_NODE (param, node, info)

   This is called by SYSINFO_SCAN for each remote cluster node. Param is the
   value passed to SYSINFO_SCAN. Node is a descriptor for the node name
   (DTYPE=Z, CLASS=Z). Info is the block of info about a node (described
   below), passed by reference.

   status = SYSINFO_LOCAL_NODE (param, node, info, quiet-flag)

   This is called by SYSINFO_SCAN for the local node, always after all remote
   cluster nodes. Param is the value passed to SYSINFO_SCAN. Node is a
   descriptor for the node name (DTYPE=Z, CLASS=Z). Info is the block of info
   about a node (described below), passed by reference. Quiet-flag is zero
   (false) or one (true) according to whether the local node is the quietest
   node (has the lowest value returned by SYSINFO_USAGE); passed by reference.

   The info block is mapped by infodef in SYSINFOLIB:

     info_b_format   format code: 0 if the info is invalid (e.g. no server,
                     timed out); current format code = 2
     info_w_busy     count of busy processes
     info_w_jobs     count of interactive jobs
     info_w_limit    limit of interactive jobs

4) SYSINFO.OBJ: image & CLI entry point.

   SYSINFO_MAIN ()

   This just calls SYSINFO_DISPLAY and then exits.

5) MSGDEF.MSG: messages and $FAO strings.

   The facility code can be changed to suit a site's requirements. Messages
   with no text are suppressed (only the text part is ever used).
   The number and order of $FAO parameters must not be changed.

   Messages used if running as a CLI:

     Wait_prompt     used for workstation windows.
     Quietest,
     Quieter         used if the local node is the quietest node.
     Return, Break   used if there is no hangup on logout (either because the
                     local node is quietest or it is not a modem line);
                     Break is used if the line is /secure_server.

6) GATHER.OBJ: the information gatherer.

   Normally used by the server, but if there is no server on the local node
   SYSINFO_SCAN calls it directly. It needs symbols in SYS$SYSTEM:SYS.STB.

7) SYSINFO.OPT: linking the CLI/image.

     $ LINK/NOTRACEBACK/NOUSERLIB/NOSYSSHR SYSINFO/OPT

   STARLET.OLB is needed for LIB$GET_EF/LIB$FREE_EF when linking a CLI. The
   shareable version in LIBRTL could be used, but I think it's better not to
   have any shareable library references in a CLI (though at VMS V4.5 it
   seems to activate OK): it might stop working at a later release of VMS.
   That's why /NOSYSSHR is used. Note that the CLI entry point must be the
   first byte in the image.

The server.
==========

SERVER.OBJ, GATHER.OBJ & MAIN.OBJ

These modules use the lock manager to provide key items of info about the
local cluster node extremely quickly with minimal overhead. The server uses
the LIB$INITIALIZE mechanism to start itself and runs entirely in executive
mode for the life of the image. It just needs to be linked into an image that
runs permanently on each VAXcluster node; no code modifications are required.

It needs CMEXEC privilege, 1 TQELM, 1 ASTLM and 1 ENQLM unit. It should be run
at a priority of at least 6, and preferably 9 or 10, so that it can count
interactive compute-wait processes correctly (these are normally between
priorities 4 and 9).

If you already have a permanent system detached process on all nodes, you can
just link the module into that image; otherwise use MAIN.OBJ.
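For illustration, the busy statistic defined under "The information." and the moving-average smoothing the server applies can be sketched in C. This is a minimal sketch under stated assumptions, not SYSINFO's own code: every name here (busy_sample, busyness, moving_avg, ma_init, ma_add) is hypothetical, the per-CPU interrupt times and process counts are taken as already gathered, and the moving average is assumed to be a simple mean of the last History samples, which the text does not specify.

```c
/* Hypothetical sketch of the SYSINFO "busy" measure; not SYSINFO source. */

#define MAX_HISTORY 100  /* History maximum from SYSINFO_USER_SETUP */

/* One sample period on the local node (inputs assumed already gathered). */
typedef struct {
    int           ncpu;  /* N: number of CPUs on the local node           */
    double        tc;    /* sample period per CPU, seconds (no "leakage") */
    const double *ti;    /* interrupt-mode CPU time per CPU, seconds      */
    int           ur;    /* Ur: number of CUR processes                   */
    int           uw;    /* Uw: number of COM(O) processes counted
                            (assumed: those at or above the Prio floor)   */
} busy_sample;

/* B = (Uc x Ta/Tp - Ur) / N */
double busyness(const busy_sample *s)
{
    double ta = 0.0;  /* Ta = Sum(tc)      */
    double tp = 0.0;  /* Tp = Sum(tc - ti) */
    for (int i = 0; i < s->ncpu; i++) {
        ta += s->tc;
        tp += s->tc - s->ti[i];
    }
    if (s->ncpu <= 0 || tp <= 0.0)
        return 0.0;   /* assumption: no process time available -> report 0 */

    double uc = s->ur + s->uw;  /* Uc = Ur + Uw                   */
    double ua = uc * ta / tp;   /* Ua = Uc x Ta / Tp              */
    double ub = ua - s->ur;     /* Ub = Ua - Ur (waiting for CPU) */
    return ub / s->ncpu;        /* B  = Ub / N                    */
}

/* Assumed smoothing: simple mean over the last `history` samples. */
typedef struct {
    double samples[MAX_HISTORY];
    int    size;   /* history length, clamped to 1..MAX_HISTORY */
    int    count;  /* samples stored so far                     */
    int    next;   /* ring-buffer index of the next write       */
    double sum;    /* running sum of the stored samples         */
} moving_avg;

void ma_init(moving_avg *m, int history)
{
    if (history < 1) history = 1;
    if (history > MAX_HISTORY) history = MAX_HISTORY;
    m->size = history;
    m->count = 0;
    m->next = 0;
    m->sum = 0.0;
}

/* Add one sample (e.g. one timer-AST busyness value); return the new mean. */
double ma_add(moving_avg *m, double x)
{
    if (m->count == m->size)
        m->sum -= m->samples[m->next];  /* drop the oldest sample */
    else
        m->count++;
    m->samples[m->next] = x;
    m->sum += x;
    m->next = (m->next + 1) % m->size;
    return m->sum / m->count;
}
```

For example, with N = 2, a 30-second sample, 3 seconds of interrupt time on each CPU, Ur = 2 and Uw = 3: Ta = 60, Tp = 54, Ua = 5 x 60/54 = about 5.56, Ub = about 3.56, so B is about 1.78. With no interrupt time (Ta/Tp = 1) the same counts give B = 1.5, which shows how heavy interrupt-mode load raises the busyness.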
To link and start a server with MAIN.OBJ:

  $ LINK/NOTRACEBACK main,SERVER,GATHER,SYS$SYSTEM:SYS.STB/SELECT
  $ RUN /DETACHED -
        /INPUT=NL:/OUTPUT=NL: -
        /PRIV=(NOALL,CMEXEC) -
        /PROCESS="Sysinfo_Server" -
        /PRIORITY=9 -
        main

Second and subsequent attempts to start a server on a VAXcluster node signal
the SYSINFO_DUPLSERVER status to show that a server is already active.

The username.
============

To set up a username using AUTHORIZE:

  ADD <username>/CLI=SYSINFO/CLITAB=DCLTABLES/FLAGS=CAPTIVE/NOPASS

An example (from AUTHORIZE):

  Username: INFO                        Owner:  System Information
  Account:                              UIC:
  CLI:      SYSINFO                     Tables: DCLTABLES
  Default:
  LGICMD:
  Login Flags:  Lockpwd Restricted Diswelcome Dismail Disreport Disreconnect
                Captive
  Primary days:   Mon Tue Wed Thu Fri
  Secondary days:                     Sat Sun
  Primary   000000000011111111112222  Secondary 000000000011111111112222
  Day Hours 012345678901234567890123  Day Hours 012345678901234567890123
  Network:  ----- No access ------    ----- No access ------
  Batch:    ----- No access ------    ----- No access ------
  Local:    ##### Full access ######  ##### Full access ######
  Dialup:   ##### Full access ######  ##### Full access ######
  Remote:   ##### Full access ######  ##### Full access ######
  Expiration:   (none)   Pwdminimum:  6   Login Fails:  0
  Pwdlifetime:  (none)   Pwdchange:   (none)
  Last Login: 25-FEB-1987 14:53 (interactive), (none) (non-interactive)
  Maxjobs:         4  Fillm:        50  Bytlm:        8192
  Maxacctjobs:     0  Shrfillm:      0  Pbytlm:          0
  Maxdetach:       1  BIOlm:        18  JTquota:      1024
  Prclm:           0  DIOlm:        18  WSdef:         300
  Prio:            5  ASTlm:        24  WSquo:         400
  Queprio:         0  TQElm:        10  WSextent:     1024
  CPU:    0 00:00:05  Enqlm:       500  Pgflquo:      2048
  Authorized Privileges:
  Default Privileges:
    OPER

It has OPER privilege so it can always log in. The priority is set to 5 to
ensure rapid log-in. WSQUO is set to be sufficient for the image. CPU time is
limited to guard against possible looping problems. Other quotas are left at
their default values. There is no password.

Site-supplied server setup routine.
==================================

This can be linked in with the server to adjust various parameters when the
server starts up. Specification of the routine (FORTRAN-like; read/write
arguments passed by reference):

        INTEGER*4 FUNCTION SYSINFO_USER_SETUP (timer,history,prio)
        INTEGER*4 timer,history,prio

Called in user mode. Each parameter is set to its default value on entry and
may be modified by the routine. Values outside the permitted range are not
accepted as given: each is reset to the nearest legal value. The function
should return a normal VMS status; an error (an even status) causes the
server to return to its caller.

  Timer:    unsigned number of seconds between timer ASTs to sample the busy
            statistic. Default = 30, minimum = 5.
  History:  size of the historical data used for the moving average.
            Default = 15, maximum = 100.
  Prio:     the minimum priority for compute-wait processes that count
            towards the busy statistic. Default = 4, minimum = 1.

Site-supplied busy algorithm.
============================

This can be linked in with the server to obtain the busy statistic. Note that
the moving average is calculated independently of this. Specification of the
routine (FORTRAN-like, arguments passed by reference):

        INTEGER*4 FUNCTION SYSINFO_USER_BUSY (context,busy)
        INTEGER*4 context,busy

Called in executive mode, as the server works in that mode; note that it
therefore needs no privilege to use $CMKRNL. Context is available for the
routine to store 4 bytes of data between calls; it is initially zero. Busy is
the result; it is zero on entry. The function should return a normal VMS
status (an even value indicates that no value of busy was obtained). A status
code for this is provided in MSGDEF.MSG.

Known Restrictions.
==================

The display routine uses direct terminal I/O rather than RMS (for speed, and
so that it can trap CTRL/Y, etc.). This means it won't work in batch jobs,
network servers, etc.
There is no clean, straightforward way to run down the executive-mode
resources used by the server at image exit (as opposed to process exit). I've
tried a number of methods but they all have problems. Until I find a suitable
technique, the server will refuse to work in executive mode if there is a
supervisor-mode exit handler (it will work in user mode instead, which
requires SYSLCK privilege). [VMS V5.2 note: a new executive-mode rundown
mechanism has been added which may solve this problem.]

In unusual circumstances such as VAXcluster state transitions, the locking
operations performed by SYSINFO may take an appreciable amount of time instead
of the normal negligible time (assuming the servers are at high enough
priority). To guard against such delays, SYSINFO times out locking attempts
after four seconds.

Ian Kitching,
System Manager,
Computer Services,
Anglia Higher Education College,
East Road,
Cambridge CB1 1PT