.!LAVCMGM.RNO
.!Created 02/09/87 Martin Brunecky
.PAGE SIZE 58,70
.LM 5
.NO FILL
.NO JUSTIFY
.FLAGS SUBSTITUTE
.FLAGS BOLD
.LM 40
#
.BL 4
$$DATE
.BL 7
.LM 5
.CENTER
.BL 2
.CENTER
^*LAVC INSTALLATION _& MANAGEMENT\*
.BL 8
.FLAGS ACCEPT !
.!
.! Now comes regular RUNOFF setup for the BODY of the DOCUMENT
.! Please, AVOID non-default RUNOFF settings wherever possible
.!
.NO FLAGS SUBSTITUTE
.FLAGS UNDERLINE
.FLAGS ACCEPT _
.FLAGS BOLD
.LM 5
.ENABLE BAR
.FILL
.JUSTIFY
.LAYOUT 2,2
.HEADERS ON
.STYLE HEADERS 7,0,0
.AUTOPARAGRAPH
.PAGING
.DISPLAY NUMBER RL
.FIRST TITLE
.TITLE LAVC INSTALLATION _& MANAGEMENT
.NUMBER PAGE 1
.PAGE
.CENTER
TABLE OF CONTENTS
.BL
.REQUIRE "LAVCMGM.RNT"
.DISPLAY NUMBER D
.NUMBER PAGE 1
.PAGE
.ST ^*INTRODUCTION\*
.HL 1 ^*INTRODUCTION\*
This document has been extracted from our LAVC management
specifications. It defines the basic implementation strategies and the
cluster management approach. It also introduces the procedures designed
to manage and maintain the cluster environment. The document is NOT
fully up to date, since LAVC management requires a continuous
development effort.
.HL 1 ^*OBJECTIVES\*
.LIST 1 "*"
.LE;Highlight basic LAVC implementation strategies.
.LE;Define LAVC management tasks and responsible personnel.
.LE;Define LAVC member node configuration techniques.
.LE;Define the structure and contents of the LAVC configuration database.
.LE;Standardize LAVC node startup and shutdown procedures.
.LE;Provide for propagation of VMS and other software upgrades to LAVC members.
.LE;Standardize the approach for the configuration of test LAVC subsets.
.ELS
.HL 1 ^*REFERENCE DOCUMENTS\*
.LIST 1
.LE;Digital AA-JP20A-TE VMS Local Area VAXcluster Manual
.LE;Digital AA-Y513A-TE Guide to VAXclusters
.LE;Digital AI-Y514B-TE Guide to VAX/VMS Software Installation
.ELS
.PG
.ST ^*LAVC MANAGEMENT\*
.HL 1 ^*LAVC MANAGEMENT\*
.HL 2 ^*VMS SYSTEM MANAGEMENT\*
VMS system management has two main responsibilities:
.LIST 1 "-"
.LE;Make decisions that relate to optimizing the overall performance and
operating efficiency of the system
.LE;Perform tasks that relate to the day-to-day overall management and
control of the system
.ELS
The basic responsibilities above may be broken into:
.LIST 0 "-"
.BL
.LE;Installing and upgrading the system
.LE;Making system specific modifications
.LE;Controlling system operation
.LE;Maintaining system security
.LE;Optimizing system performance
.LE;Future requirements planning
.ELS
.HL 2 ^*LAVC MANAGEMENT\*
In the standard DEC LAVC configuration using a single BOOT node,
multi-node management is simplified to single (boot) node management.
There is a single copy of VMS used by all the cluster members, a single
set of authorization files (SYSUAF.DAT, NETUAF.DAT, RIGHTSLIST.DAT) and
a single queue system (JBCSYSQUE.DAT). Since not all the LAVC
satellites have the same HW configuration, there is still some need for
individual node-specific set-up, tuning and node access management.
.BL
Our LAVC is configured using multiple BOOT nodes, since each node has
to be capable of stand-alone operation (except for diskless nodes).
Thus, there will be multiple copies of VMS and of some layered VMS
products. Each boot node must have a current copy of the system
management files (SYSUAF.DAT etc.) for stand-alone operation. However,
when booted as a cluster member, each such node uses the common,
cluster-wide LAVC database.
.BL
Stand-alone operation is typically needed for product requirements and
installation procedure testing. It is considered to be a sub-set of
LAVC operation.
Therefore it uses the same configuration, set-up and management files
as in the LAVC, using local copies of the LAVC management files. For
security reasons, any changes made to system management files in
stand-alone mode are NOT applied backwards to the LAVC.
.BR
.HL 2 ^*LAVC AND GROUP MANAGEMENT\*
Due to our specific environment, management responsibilities are split
among LAVC and GROUP MANAGERS. The major task of the GROUP MANAGER is
to manage his development group's computing environment in the LAVC,
especially on the nodes assigned to the group. In stand-alone
operation, the GROUP MANAGER has full management control of the node.
.BL
The following list defines the basic ranges of responsibilities:
.BL
^*LAVC MANAGER\*
.LIST 0
.LE;Overall LAVC configuration planning and control
.LE;LAVC performance monitoring and tuning
.LE;Hardware maintenance and capacity planning
.LE;MASTER and SPARE node software management
.LE;LAVC data base management:
.LIST 0 "-"
.LE;Authorization file and rights database. Only the LAVC manager may
add/create/modify LAVC user accounts.
.LE;DECNET proxy logins
.LE;Cluster-wide queue system
.LE;Cluster-wide logical names
.LE;Cluster-wide startup and login procedures
.ELS
.LE;LAVC wide resources management (disk and account quotas)
.LE;LAVC accounting
.LE;Coordinate with GROUP managers
.LE;Manage cluster-wide installed non-DEC products
.LE;Maintain LAVC management procedures
.LE;Implement and maintain LAVC user training
.LE;Maintain LAVC site and master document sets
.ELS
.BL
^*GROUP MANAGER\*
.LIST 0
.LE;Define the software available on the group's workstations
.LE;Plan and control releases of the software installed on the group's
workstations.
.LE;Plan and authorize workstation use on a group basis (which group(s)
have access)
.LE;Plan workstation resources utilization
.LE;Inform the LAVC manager of group requirements in the LAVC
.LE;Manage and maintain the group's workstation startup datafile
.LE;Manage and maintain GROUP startup and login files
.LE;Plan and assist in group workstation environment tuning.
.LE;Manage stand-alone workstation operation (used for software
installation testing and product requirement evaluation). In
stand-alone operation, the GROUP manager has full control over the
workstation.
.ELS
.TP 10
Group manager accounts are identified by an account name in the form
gggMGR, where ggg denotes the particular group. In STAND-ALONE mode,
the GROUP manager account has the same privileges and quotas as the
SYSTEM account. In the LAVC, the group manager's privileges are limited
to:
.LIT
 GRPNAM    may insert in group logical name table
 DETACH    may create detached processes
 LOG_IO    may do logical i/o
 GROUP     may affect other processes in same group
 PRMCEB    may create permanent common event clusters
 PRMMBX    may create permanent mailbox
 TMPMBX    may create temporary mailbox
 OPER      operator privilege
 EXQUOTA   may exceed quota
 NETMBX    may create network device
 VOLPRO    may override volume protection
 PHY_IO    may do physical i/o
 PRMGBL    may create permanent global sections
 GRPPRV    group access via system protection
.EL
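.BL
As an illustration, a minimal sketch of such a privilege adjustment, as
it might appear in the cluster-wide login procedure described later
(LAVCLOGIN.COM); the test on the account name is an assumption for
illustration only:
.LT

$ ! Trim a gggMGR account to the restricted LAVC privilege set
$ usrnam = F$EDIT(F$GETJPI("","USERNAME"),"TRIM")
$ IF F$EXTRACT(F$LENGTH(usrnam)-3,3,usrnam) .EQS. "MGR" THEN -
     SET PROCESS/PRIVILEGES=(NOALL, GRPNAM, DETACH, LOG_IO, GROUP, -
     PRMCEB, PRMMBX, TMPMBX, OPER, EXQUOTA, NETMBX, VOLPRO, PHY_IO, -
     PRMGBL, GRPPRV)
.EL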
.ST ^*LAVC IMPLEMENTATION\*
.PG
.HL 1 ^*LAVC IMPLEMENTATION\*
Our specific LAVC installation is based on individually booting nodes,
one of which is declared to be the cluster "MASTER" and one the "SPARE"
(failover) node. Each node serves its disks for cluster-wide data
access, and each node provides batch queue(s) and device queue(s) for
its connected devices.
.BL
The MASTER node's system disk holds a complete VMS system, including
all the products available cluster-wide, and all of the cluster
configuration data (the LAVC database).
.BR
The SPARE node's system disk is a backup copy of the MASTER; its data
are used in case the MASTER node (or disk) fails. The LAVC management
provides for semi-automatic failover of the LAVC database between the
MASTER and SPARE disks. It does NOT provide failover capability for
diskless nodes.
.BL
.BB
When technically possible, dual porting of the LAVC MASTER (and SPARE)
disks will be implemented. This will allow for semi-automatic DISK
failover, including failover for diskless nodes.
.EB
.BL
Any disk-based node holds (at least) the "required" VMS operating
system. This allows for node stand-alone operation, and reduces the
disk-server load on the LAVC "boot" node. Only the missing VMS
components are accessed via the LAVC software from the MASTER or SPARE
node's disk (using a search list in sys_$sysroot). Thus, for example,
help libraries, code examples and infrequently accessed libraries need
not be duplicated to be available. Moreover, a system booting from its
own disk may survive a MASTER node failure and continue operation using
the SPARE node's disk.
.BL
.HL 2 ^*Implementation Strategies\*
To satisfy the changing requirements of an R_&D environment, the LAVC
configuration MUST provide for flexible re-configuration, including:
.LIST 1 "o"
.LE;Capability to leave the LAVC for any (disk-based) LAVC member
.LE;Easy re-configuration of any node to serve as a "boot" node for
several (diskless) satellites
.LE;Co-existence of several VMS releases in the LAVC
.LE;Co-existence of several VMS layered product releases in the LAVC
.LE;Simple way to propagate any software update to selected LAVC nodes
.LE;Centralized LAVC management and organization.
.ELS
^*To satisfy the goals above, the entire LAVC configuration must be
implemented using files completely separated from the standard VMS
system directory tree.\* The directory tree containing such data is
referred to as the LAVC database throughout this document.
.PG
.HL 2 ^*LAVC nodes\*
For the LAVC, each node may be classified as one of the following:
.LIST 1 "*"
.LE;MASTER NODE - the main LAVC node. Its system disk holds the full
VMS operating system and the primary copy of the LAVC management data.
.LE;SPARE NODE - the failover for the MASTER node. Its system disk
holds an up-to-date copy of VMS and of the LAVC management data from
the MASTER.
.LE;FULL NODE - a fully functional cluster member, with a VMS (subset)
copy on its disk. A full node may be used as a LAVC "BOOT" node for
diskless nodes.
.LE;DISKLESS NODE - a limited functionality node, which does NOT have a
VMS system on its disks. It must boot VMS remotely, using one of the
nodes above as a "boot" node (in DEC LAVC terminology, a SATELLITE
NODE).
.ELS
Any node (except for DISKLESS) may leave the cluster and operate in
STAND-ALONE mode, using a local copy of VMS and a subset of the LAVC
management data.
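.BL
Startup procedures must therefore determine which mode the node is
running in. A minimal sketch of such a test (the local database subset
under [LAVCDATA.] is described later; the device name is an assumed
example):
.LT

$ ! Branch on cluster membership; a stand-alone node uses local copies
$ IF F$GETSYI("CLUSTER_MEMBER") THEN GOTO LAVC_MODE
$ DEFINE/SYSTEM/EXEC/TRANSLATION=(CONCEALED) -
     LAVC$DATA DUA0:[LAVCDATA.]
$ EXIT
$LAVC_MODE:
$ ! lavc$data will point at the MASTER (or SPARE) node's copy
.EL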
The differences between cluster and stand-alone operation may be
summarized as:
.BL
^*CLUSTERED NODE (WORKSTATION)\*
.LIST 0 "-"
.BL
.LE;Direct access to large, high speed disk drives
.LE;Direct access to any spooled printer in the cluster
.LE;Direct access to any batch queue in the cluster
.LE;Cluster supported data backup
.LE;Full set of VMS utilities and other licensed products
.LE;Access to any SW product available in the cluster
.LE;User privileges are under strict control
.ELS
^*STAND-ALONE NODE (WORKSTATION)\*
.LIST 0 "-"
.BL
.LE;Group manager has full control of the system (workstation)
.LE;System may be used for any experiments at system level
.LE;VMS support may be limited (not all utilities, products)
.LE;The rest of the cluster may be accessed via DECNET
.LE;Node data can not be backed up by the LAVC management
.ELS
.PG
.HL 2 ^*VMS SYSTEM TREE CONFIGURATIONS\*
The following VMS system tree schematics are included to explain VMS
component placement in the standard VMS system tree as opposed to the
standard LAVC and our LAVC configurations.
.BL 2
^*STANDARD VMS SYSTEM TREE\*
.LT

                    sys$sysdevice
                     _____|_____
                    |  [SYS0.]  |  sys$sysroot=DUA0:[SYS0.]
                    |___________|
                          |
     _____________________|_____________________
  ___|____    ___|____    ___|____    ___|____    ___|____
 |[SYSMGR]|  |[SYSEXE]|  |[SYSLIB]|  |[SYSHLP]|  |[SYSUPD]|
 |________|  |________|  |________|  |________|  |________|
 sys$manager sys$system  sys$library sys$help    sys$update
.EL
.BL 2
^*LAVC BOOT/SATELLITE NODE VMS SYSTEM TREE\*
.LT

                    sys$sysdevice
                     _____|_____
                    |  [SYSn.]  |   sys$specific=DUA0:[SYSn.]
                    |___________||  sys$common=DUA0:[V4COMMON.]
                     |___________|  (see note below)
                          |
     _____________________|_____________________
  ___|____    ___|____    ___|____    ___|____    ___|____
 |[SYSMGR]|  |[SYSEXE]|  |[SYSLIB]|  |[SYSHLP]|  |[SYSUPD]|
 |________|| |________|| |________|| |________|| |________||
  |________|  |________|  |________|  |________|  |________|
 sys$manager sys$system  sys$library sys$help    sys$update
.EL
.C;^*sys_$sysroot = sys_$specific,sys_$common\*
.BL
.TP 5
All VMS files are accessed using the logical name sys_$sysroot. Since
in the cluster environment sys_$sysroot is a search list, each file is
looked up in the node-specific directory (sys_$specific:[nnnn]) first;
if not found, the common directory tree (sys_$common:[nnnn]) is used.
.BL
^*NOTE\*, the examples here refer to sys_$common as DUA0:[V4COMMON.].
In the true VMS implementation, sys_$common translates to
DUA0:[SYSn.SYSCOMMON.], which is an alternate entry name for the
directory DUA0:[V4COMMON]. Before any attempt to ^*delete\* the system
specific tree, this entry MUST be removed
( _$ SET FILE /REMOVE DUA0:[SYSn]SYSCOMMON.DIR ).
.BL
.TP 5
In the LAVC environment, most of the VMS files are located in the
common directory tree [V4COMMON.]. Only node-specific files (such as
pagefiles, system parameters, accounting data) are placed in the system
specific tree [SYSn.]. There is a separate [SYSn.] specific directory
tree for each satellite using a particular BOOT node.
.BR
LAVC satellite node configuration is performed by the procedure
sys_$manager:SATELLITE__CONFIG.COM. This procedure creates (or removes)
the system-specific [SYSn.] tree containing all the node-specific files
(on the BOOT node), and prepares everything for satellite boot and
first startup.
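.BL
For illustration, the satellite's search list corresponds to logical
name definitions along the following lines. The device and root names
are examples only; the actual definitions are made by the bootstrap and
by VMS itself:
.LT

$ ! Node-specific root first, common root second
$ DEFINE/SYSTEM/EXEC/TRANSLATION=(CONCEALED) -
     SYS$SPECIFIC DUA0:[SYS10.]
$ DEFINE/SYSTEM/EXEC/TRANSLATION=(CONCEALED) -
     SYS$COMMON DUA0:[V4COMMON.]
$ DEFINE/SYSTEM/EXEC SYS$SYSROOT SYS$SPECIFIC, SYS$COMMON
.EL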
.BL 2
.TP 30
^*LAVC FULL NODE VMS SYSTEM TREE\*
.LT

                     _____|_____
                    |  [SYSn.]  |    sys$specific
                    |___________||   sys$common
                     |___________||  master$DUA0:[V4COMMON.]
                      |_ _ _ _ _ _|| spare$DUA0:[V4COMMON.]
                       |_ _ _ _ _ _|
                          |
     _____________________|_____________________
  ___|____    ___|____    ___|____    ___|____    ___|____
 |[SYSMGR]|  |[SYSEXE]|  |[SYSLIB]|  |[SYSHLP]|  |[SYSUPD]|
 |________|| |________|| |________|| |________|| |________||
  |________|| |________|| |________|| |________|| |________||
   |_ _ _ _ || |_ _ _ _ || |_ _ _ _ || |_ _ _ _ || |_ _ _ _ ||
    |_ _ _ _ |  |_ _ _ _ |  |_ _ _ _ |  |_ _ _ _ |  |_ _ _ _ |
 sys$manager  sys$system  sys$library  sys$help    sys$update
.EL
.BR;^*sys_$sysroot#=#node_$DUA0:[SYSn.],node_$DUA0:[SYSn.SYSCOMMON],
.BR;#################master_$DUA0:[V4COMMON],spare_$DUA0:[V4COMMON]\*
.BR;#################(all concealed device names translated)
.BL
.TP 15
The LAVC FULL node basic configuration corresponds to that of a LAVC
BOOT node. Thus any such node may be used as a BOOT node for diskless
satellites. In addition, the VMS common directory trees on the MASTER
and SPARE nodes are added to provide the files not available on the
local node's disk.
.BL
Contrary to a standard DEC VMS configuration, our cluster-wide
management files are located in the LAVC database (located on the
MASTER node) instead of the sys_$common system tree (and are accessed
using logical name pointers):
.LIST 0 "-"
.BL
.LE;SYSUAF.DAT - common cluster authorization file
.LE;NETUAF.DAT - common cluster network proxy file
.LE;RIGHTSLIST.DAT - common cluster rights (identifier) database
.LE;JBCSYSQUE.DAT - common cluster JOB queue control file
.LE;VMSMAIL.DAT - common cluster MAIL database
.ELS
.BB
The DECNET database NETNODE__REMOTE.DAT can NOT be made cluster-wide,
since it contains download information for DISKLESS satellites, which
is boot node specific. Therefore, each (FULL) node must have its own
NETNODE__REMOTE.DAT in sys$common:[SYSEXE], and DECNET database updates
MUST be propagated explicitly.
.EB
.BL
.TP 8
The full node common tree [V4COMMON.] need not contain all the VMS
files and layered VMS products. Such files and products are found in
the [V4COMMON.] tree of the MASTER or SPARE node.
.BL
.TP 8
For redundancy, the MASTER [V4COMMON.] tree and the LAVC database are
duplicated on the SPARE node. Should the MASTER node fail, any
necessary file will be automatically located on the SPARE node.
.BL
In some instances, a LAVC member configuration may prefer to use a file
(product) located on the MASTER node, even though it has a copy in its
own system tree (for example, when a different VMS product release is
loaded locally for testing). To standardize access to files located on
the LAVC master node, the logical name ^*lavc_$root\* always points to
the MASTER and SPARE node's [V4COMMON.] trees. In a stand-alone
configuration, lavc_$root is identical to sys_$sysroot.
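.BL
A sketch of the corresponding definition (the device names are assumed
examples; the real definition is maintained by the LAVC__WATCH process
described later):
.LT

$ ! Files are looked up on the MASTER first, then on the SPARE
$ DEFINE/SYSTEM/EXEC/TRANSLATION=(CONCEALED) LAVC$ROOT -
     MASTER$DUA0:[V4COMMON.], SPARE$DUA0:[V4COMMON.]
.EL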
.BR
.TP 28
.HL 2 ^*LAVC NODE startup\*
A common, generic SYSTARTUP.COM is used on every LAVC member node.
Unlike a typical SYSTARTUP.COM, our version ^*does not\* contain any
node specific commands. The only specific information hard-coded in the
generic SYSTARTUP.COM is the identity of the MASTER and SPARE node
disks. Should those change, it is necessary to update the generic
sys_$common:[SYSMGR]SYSTARTUP.COM on each node in the cluster.
.BL
The NODE specific startup commands are located in the LAVC database
under the directory lavc_$data:[nodename]. This allows the same node to
join the cluster booting from a different boot node (or from its own
disk) without any need to change or move startup files.
.BL
The generic startup procedure executes nested startup files according
to a node-specific list. The majority of such procedures are common,
executed on many different nodes (such as product startups, or commonly
used LAVC procedures stored in lavc$data:[LAVCCOM]).
.BL
Nested procedures should be coded as re-usable, to allow for testing
and re-start if necessary. Since the startup files are maintained in a
common location, significant differences in individual node startups
should not occur.
.TP 10
.HL 2 ^*LAVC NODE Access Control\*
The LAVC assumes a common, cluster-wide authorization file, allowing
any user to use any node in the cluster. In our R_&D environment it may
be necessary to restrict access to some nodes. The LAVC management
implements access control based on user group membership. For
interactive users, each node may define different access rights for
users belonging to the same accounting group.
.BR
Node access is controlled by the logical name ^*lavc$access\*, which
defines a list of group access rights in the form:
.BL
.C;DEFINE/SYS LAVC$ACCESS "+001-230+250-120"
.BL
where:
.LIST 0 "o"
.LE;+001 grants access to members of the 001 (system) group
.LE;-230 denies access to members of the 230 group
.LE;+ALL grants access to anybody unless explicitly denied
.ELS
Users not explicitly listed in the node access list are allowed to use
the node in "RESTRICTED" mode (lowered priority ...).
.BL
Users denied access to the node are notified by the login process and
logged off.
.BL
Access control does NOT apply to NETWORK access (controlled by the
PROXY database) and to BATCH (access is required to allow automatic
load balancing).
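.BL
A minimal sketch of the access check, as the node login procedure might
perform it (symbol names and the RESTRICTED priority value are
illustrative assumptions):
.LT

$ ! Compare the user's UIC group against the lavc$access list
$ ! (3-digit octal group codes assumed; leading zeros not handled)
$ uic = F$GETJPI("","UIC")                    ! e.g. "[230,1]"
$ grp = F$EXTRACT(1,F$LOCATE(",",uic)-1,uic)  ! e.g. "230"
$ lst = F$TRNLNM("LAVC$ACCESS")
$ IF F$LOCATE("+"+grp,lst) .LT. F$LENGTH(lst) THEN GOTO ACC_OK
$ IF F$LOCATE("-"+grp,lst) .LT. F$LENGTH(lst) THEN GOTO ACC_DENY
$ IF F$LOCATE("+ALL",lst) .LT. F$LENGTH(lst) THEN GOTO ACC_OK
$ SET PROCESS/PRIORITY=2                      ! "RESTRICTED" mode
$ACC_OK:
$ EXIT
$ACC_DENY:
$ WRITE SYS$OUTPUT "Access to this node is not authorized for your group"
$ LOGOUT
.EL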
.HL 2 ^*LAVC NODE Shutdown\*
A special account "SHUT" is provided to perform the proper VMS shutdown
sequence. On workstation nodes, "SHUT" may be used by any local
workstation user; no privileges are required.
.BL
The SHUT account allows the user to reboot the node either as a LAVC
member, or as a stand-alone node. It also provides an option to force
the local node to DISMOUNT its disks from the cluster with /ABORT
(should the node leave the cluster for a longer period). Without this
option, any access to an unavailable disk waits (hangs) until the node
comes back.
.BB
.BL
^*NOTE\*: Failure to DISMOUNT cluster-wide mounted disks for a node
booting stand-alone (off-cluster) may force the ENTIRE CLUSTER to shut
down. A stand-alone boot changes the mount count on the node's disks.
If any such disk was NOT dismounted from the cluster BEFORE the node
comes back, the cluster's attempt to re-mount that disk fails on the
changed mount count, resulting in endless mount verification.
.EB
.TP 20
.HL 2 ^*LAVC Data Base\*
The LAVC Data Base resides on the LAVC MASTER and SPARE disks. The
necessary subsets of the database are copied to the local nodes' disks
to allow stand-alone operation. Such subsets are updated on a daily
basis.
.BL
.TP 20
The LAVC Data Base is pointed to by the logical root device
lavc_$data. This device contains the following directories and files:
.BL
^*[LAVCCOM]\*##Basic, common cluster-wide DATA and PROCEDURES:
.LIST 0 "o"
.BL
.LE;SYSUAF.DAT - LAVC wide authorization file
.LE;NETUAF.DAT - LAVC wide proxy database
.LE;RIGHTSLIST.DAT - LAVC wide rights identifier database
.LE;JBCSYSQUE.DAT - LAVC common queue file
.LE;LAVCWATCH.COM - dynamic re-configuration process startup
.LE;LAVCDISKS.COM - disk-mounting procedure for all the disks
.LE;LAVCLOGNM.COM - cluster-wide logical names set-up
.LE;LAVCPRINT.COM - print queue/forms set-up for the entire cluster
.LE;LAVCBATCH.COM - batch queue set-up for the entire cluster
.LE;LAVCSHUT.COM - node shutdown procedure ("SHUT" user login)
.LE;LAVCLOGIN.COM - cluster-wide login procedure
.LE;LAVCNOTE.TXT - notice displayed on user login
.LE;SYSTARTUP.COM - master copy of the SYSTARTUP.COM procedure
.LE;SYSHUTDWN.COM - master copy of the SYSHUTDWN.COM procedure
.ELS
.BL
.TP 15
^*[node]\*##Node-specific data and procedures (should be minimal)
.LIST 0 "o"
.BL
.LE;NODESTART.DAT - Node startup procedures list
.LE;NODESPEC.COM - Node specific terminals/devices set-up
.LE;NODENOTE.TXT - Node specific login notice
.LE;MODPARAMS.DAT - copy of the node's sys_$system:MODPARAMS.DAT
.ELS
.BL
.TP 10
^*[GRPnnn]\*##Group specific data and procedures
.LIST 0 "o"
.BL
.LE;GRPnnnSTART.COM - Group startup definitions
.LE;GRPnnnLOGIN.COM - Group specific LOGIN procedure
.ELS
The [GRPnnn] directory is owned and maintained by the GROUP MANAGER.
Using the group startup and login files, the group manager effectively
controls his group's working environment on any node in the cluster.
.BL
.TP 15
^*[LAVCMGM]\*##Management / Maintenance files and procedures:
.LIST 0 "o"
.BL
.LE;MGMSYSCPY.COM - VMS copy to a new node's system device
.LE;MGMSYSUPD.COM - VMS system update to another node
.LE;MGMDATUPD.COM - LAVC database update to other nodes
.LE;MGMCOLLECT.COM - LAVC management data collection
.LE;MGMQUESTA.COM - LAVC JOB/QUEUE system re-start
.LE;MGMTAILOR.COM - VMS tailoring procedure
.LE;MGMADDUSR.COM - LAVC user authorization procedure
.LE;MGMREMCMD.COM - Execute command(s) on remote node(s)
.ELS
With the files above in the LAVC database, only the following
management files will be used from the VMS system directory tree:
.BL
.LIST 0 "o"
.LE;sys$common:[sysmgr]SYSTARTUP.COM (copy from the LAVC database)
.LE;sys$common:[sysmgr]SYSHUTDWN.COM (copy from the LAVC database)
.LE;sys$specific:[sysmgr]LAVCWATCH.COM (created by the LAVC-wide one)
.LE;sys$specific:[sysmgr]ACCOUNTNG.DAT - node accounting data
.LE;sys$specific:[sysmgr]OPERATOR.LOG - node operator log (if used)
.LE;sys$specific:[sysexe]NET_*.DAT - node DECNET database
.LE;sys$specific:[sysexe]MODPARAMS.DAT (and other AUTOGEN files)
.LE;sys$specific:[sysexe]PAGEFILE.SYS (if not remote)
.LE;sys$specific:[sysexe]SWAPFILE.SYS (if not remote)
.LE;sys$specific:[sysexe]SYSDUMP.DMP (if used)
.ELS
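.BL
A hypothetical fragment of such a node MODPARAMS.DAT (parameter values
are examples only; AUTOGEN reads this file when it generates the node's
system parameters):
.LT

! MODPARAMS.DAT for node CHOPIN (illustrative values)
SCSNODE = "CHOPIN"
ADD_GBLPAGES = 2000        ! extra global pages for UIS (VWS)
ADD_GBLSECTIONS = 30
MIN_CHANNELCNT = 127
.EL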
.TP 20
.HL 2 ^*Adding a Diskless Node\*
"DISKLESS" is considered to be any node which DOES NOT USE a VMS copy
on its local disk (a node with a VMS copy may still be booted as
"diskless").
.LIST 1
.LE;Create a node database in the directory lavc$data:[nodename] on the
LAVC MASTER node. This database should contain tailored versions of
NODESTART.DAT and NODESPEC.COM.
.LE;Use the DEC procedure sys_$manager:SATELLITE__CONFIG.COM on the
selected BOOT node to prepare the BOOT node for the remote satellite.
.LE;Inspect / modify the MODPARAMS.DAT in the node specific system
tree. In most cases the required parameters for the UIS (VWS) software
must be added (use the files MGMGPXPAR.DAT or MGMSTARPAR.DAT from
lavc$data:[LAVCMGM]).
.LE;Boot the new diskless node. It automatically configures the DECNET
database, performs AUTOGEN and reboots. Since the boot node's
sys_$common:SYSTARTUP.COM is already the generic LAVC startup
procedure, the rebooted system starts using the files from
lavc_$data:[nodename].
.ELS
.HL 2 ^*Removing a Diskless Node\*
Always use sys_$manager:SATELLITE__CONFIG.COM to remove a diskless
node. Manual removal is a risky operation.
.BL
Since SATELLITE__CONFIG.COM does NOT allow removal of a node which is
currently a cluster member, node removal MUST be performed with the
node down.
.TP 20
.HL 2 ^*Adding a Full-function Node\*
A full function node is capable of being the "BOOT" node for diskless
satellites. It has a VMS copy on its local disk, including the DECNET
and LAVC software (some libraries, help files and examples need not be
present).
.LIST 1
.LE;Create the node database lavc$data:[nodename] on the LAVC MASTER
node. This database should contain tailored versions of NODESTART.DAT
and NODESPEC.COM.
.LE;Boot the new node "DISKLESS" as described above. You do not have to
create large pagefiles nor modify MODPARAMS.DAT, since the next step
requires only limited VMS functionality. Additionally, the UIS (VWS)
software does not have to be started, and lavc$data:[nodename] may be
empty.
.LE;Use the LAVC management procedure MGMSYSCPY.COM to download the
required (sub)set of VMS. MGMSYSCPY.COM must be executed on the TARGET
node (it checks the node HW configuration).
.LE;Tailor the MODPARAMS.DAT created by MGMSYSCPY.COM, if necessary.
.LE;Shut the diskless node down and remove its root on the boot node
using sys_$manager:SATELLITE__CONFIG.COM.
.LE;Boot the new node from its local disk. Startup then automatically
configures the DECNET database, performs AUTOGEN and reboots. Since
MGMSYSCPY provided the generic startup procedure in
sys_$common:SYSTARTUP.COM, the rebooted system starts by using the
files from lavc_$data:[nodename].
.LE;Tailor the target VMS system using MGMTAILOR.COM and the particular
tailoring control file.
.ELS
.TP 20
.BB
.HL 2 ^*Installing Layered Products\*
VMS layered products are always installed using VMSINSTAL.COM. Our LAVC
management provides a special, captive account VMSINST which gives any
authorized user a full interface to VMSINSTAL.COM. Product installation
thus may be performed by any workstation user AUTHORIZED to do so by
the LAVC manager (given the password). A typical VMSINSTAL invocation
is shown at the end of this section.
.BL
The VMSINST account may be used for software installation development
and testing as well. In any case, VMSINST is targeted for use on "FULL"
nodes (workstations holding their own copy of VMS). Using VMSINST on
boot nodes (11/780) is prohibited for safety reasons.
.HL 2 ^*Installing VMS system updates\*
Minor VMS updates may be installed using the approach for Layered
Products, using the VMSINST account. Major VMS updates MUST be
installed by the LAVC manager.
.BL
Since our LAVC uses multiple copies of the VMS operating system,
procedures will be developed to PROPAGATE updates to individual nodes,
as opposed to performing the VMS update on each node separately.
.EB
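.BL
For reference, a typical VMSINSTAL invocation (the product saveset name
and distribution device are examples only):
.LT

$ @SYS$UPDATE:VMSINSTAL PRODUCT042 MUA0:
.EL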
.PG
.ST ^*CONSTRAINTS AND LIMITATIONS\*
.HL 1 ^*CONSTRAINTS AND LIMITATIONS\*
The initial design contains only limited failover from the MASTER to
the SPARE node. The LAVCWATCH procedure changes the LAVC logical names,
thus affecting any subsequent file access. However, any files accessed
on the MASTER at the moment the MASTER hangs (crashes) will remain
accessed, effectively blocking process execution until the MASTER (or
SPARE) comes back, or until the process is explicitly terminated. This
applies also to JBCSYSQUE.DAT; thus the job/queue system must be
restarted on each node. The LAVCWATCH procedure accomplishes this task
by killing and re-starting the JOBCTL process.
.ST ^*NAMING CONVENTIONS\*
.HL 1 ^*NAMING CONVENTIONS\*
The following name prefixes are mandatory for any files used for LAVC
management:
.LIST 1 "*"
.LE;^*LAVC...###\*for files used in LAVC startup and operation
.LE;^*NODE...###\*for any node specific files used in startup
.LE;^*GRPnnn...#\*for group-specific files
.LE;^*MGM...####\*for management utility files
.ELS
Any logical names related to LAVC startup and operation use the prefix
^*LAVC_$...\*.
.BL
.PG
.ST ^*ROUTINES and FUNCTIONS\*
.HL 1 ^*ROUTINES and FUNCTIONS\*
.HL 2 ^*LAVCSTART.COM\*
A common, generic SYSTARTUP.COM is used on every LAVC node. Embedded
within the procedure body is the information about the LAVC MASTER and
SPARE nodes, to allow location of the LAVC data base. This procedure
must be updated only if the basic LAVC configuration changes.
.BL
The master copy of the procedure is maintained in lavc_$data:[LAVCCOM]
as SYSTARTUP.COM, and is copied into each node's sys_$manager directory
after the VMS system load, or if the LAVC configuration (MASTER, SPARE
disk) changes.
.BL
Procedure flow:
.LIST 1
.LE;Locates the LAVC database, mounting the MASTER and SPARE node
disks.
.LE;Starts the LAVC__WATCH process, which is responsible for dynamic
LAVC reconfiguration (should the MASTER or SPARE node fail). The
LAVC__WATCH process (along with other functions) maintains the basic
LAVC logical names:
.LIST 0 "*"
.BL
.LE;lavc_$data###- pointer to the LAVC Data Base
.LE;sys_$sysroot#- with added roots on the MASTER and SPARE nodes
.LE;lavc_$root###- pointer to the roots on the MASTER and SPARE nodes
.ELS
.LE;Locates the node-specific file lavc_$data:[node]NODESTART.DAT. This
file is a list of the procedures to perform during the particular
node's startup.
.LE;Performs all procedures listed in NODESTART.DAT. Procedures are
executed synchronously, or as a detached process running under a
specified UIC (see the sketch following this list).
.LE;Checks for any errors encountered during system startup. If any, it
creates a notification message to be displayed by the system-wide
login, and mails the message to the LAVC manager (if possible).
.LE;If there are no errors, the procedure issues a notification reply
and allows user logins. In the case of severe (fatal) errors, it
restricts access to holders of the OPER privilege (LAVC and GROUP
managers). If a "private" bootstrap has been performed, it restricts
logins to holders of the OPER privilege, without user notification.
.ELS
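.BL
A minimal sketch of the detached execution of one list entry (process,
file and UIC values are illustrative; LOGINOUT.EXE is used so that the
procedure runs in a full process context, and /UIC makes the created
process detached):
.LT

$ ! Execute a group startup under its own UIC, in parallel
$ RUN SYS$SYSTEM:LOGINOUT -
      /UIC=[230,1] -
      /INPUT=LAVC$DATA:[GRP230]GRP230START.COM -
      /OUTPUT=SYS$MANAGER:GRP230START.LOG -
      /PROCESS_NAME="GRP230_START"
.EL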
.TP 15
.HL 2 ^*LAVCWATCH.COM\*
A cluster-wide procedure to monitor any significant cluster
configuration changes and act accordingly. The procedure is
automatically executed by the generic LAVC startup (it does NOT need to
be listed in NODESTART.DAT).
.BL
On the first execution (at startup time), the procedure performs the
configuration functions, and creates a copy of itself on the local
system's disk. This copy is then executed as the detached process
LAVC__WATCH. (The local copy is used to prevent a LAVC__WATCH hang-up
if the MASTER disk goes off-line.)
.BL
In the context of the detached process, the procedure periodically
checks the presence of the MASTER and SPARE nodes in the cluster. On
any configuration change, it performs the configuration functions.
.BL
The configuration functions currently include:
.LIST 1 "o"
.LE;Maintenance of the main LAVC logical names: lavc$data, lavc$root,
sys$sysroot. On any LAVC configuration change, the logical names are
adjusted to the new configuration.
.LE;Cluster quorum monitoring. If the cluster membership drops (either
due to a regular shutdown or to a node crash), the process adjusts the
quorum to the possible minimum to prevent a cluster hang-up on lost
quorum.
.LE;Job/Queue system restart. If the cluster configuration (the
location of JBCSYSQUE.DAT) changes, the Job/Queue system must be
re-started. The LAVC__WATCH process waits (10 minutes) after the LAVC
configuration change; if the change appears to be permanent, it
restarts the Job/Queue system using the management procedure
MGMQUESTA.COM.
.ELS
.BL
The procedure master copy is maintained in lavc_$data:[LAVCCOM];
updates are (daily) propagated to each node's [LAVCDATA.LAVCCOM]
directory.
.TP 15
.HL 2 ^*LAVCDISKS.COM\*
A cluster-wide procedure to mount any disks public to the CLUSTER. The
procedure contains an embedded list of all the LAVC served disk devices
and their labels.
.BL
Traversing its list, the procedure mounts any disk it finds. In the
LAVC environment, mounts are cluster-wide. In stand-alone mode, the
procedure finds local disks only, and mounts them locally.
.BL
The procedure also accepts additional arguments:
.LIST 0 "-"
.BL
.LE;P1=DISMOUNT or REBUILD (default = MOUNT)
.LE;P2=nodename (operation limited to disks on "node")
.LE;P3=/CLUSTER (dismount is /Abort/Cluster)
.ELS
.BL
The procedure master copy is maintained in lavc_$data:[LAVCCOM];
updates are (daily) propagated to each node's [LAVCDATA.LAVCCOM]
directory.
.TP 20
.HL 2 ^*LAVCLOGNM.COM\*
LAVC system-wide logical names. The procedure creates CLUSTER / SYSTEM
wide logical names on each node in the cluster. Such names include:
.LIST 1 "-"
.LE;Definitions of the logical names for the standard VMS system files
SYSUAF.DAT, NETUAF.DAT, RIGHTSLIST.DAT (pointing to the LAVC data
base).
.LE;Functional device logical names (refer to PTP 147-14670-000 "R_&D
Account Reconfiguration")
.LE;Other logical names as needed.
.ELS
The procedure master copy is maintained in lavc_$data:[LAVCCOM];
updates are (daily) propagated to each node's [LAVCDATA.LAVCCOM]
directory.
.BR
.TP 20
.HL 2 ^*LAVCPRINT.COM\*
LAVC-wide device queue setup. The procedure:
.LIST 1
.LE;Starts the queue manager using the cluster-wide JOB controller file
lavc_$data:[LAVCCOM]JBCSYSQUE.DAT
.LE;Defines all the named forms used in the LAVC
.LE;Defines (initializes) the cluster-wide queues
.LE;Starts the queues for devices present on the local node
.ELS
The procedure master copy is maintained in lavc_$data:[LAVCCOM];
updates are (daily) propagated to each node's [LAVCDATA.LAVCCOM]
directory.
.BR;
.TP 20
.HL 2 ^*LAVCBATCH.COM\*
LAVC wide batch system setup. The procedure:
.LIST 1
.LE;Starts the queue manager (if not already active).
.LE;Defines (initializes) all cluster-wide batch queues. There will be:
.LIST 0 "-"
.BL
.LE;A generic, cluster wide SYS_$BATCH queue
.LE;The node's standard queue node_$BATCH (later with enabled generic
processing)
.LE;Special, named queues created where necessary (build queues,
maketest queues etc.). Such queues must be explicitly set
/NoEnable__Generic to prevent their use for normal (generic) jobs.
.ELS
.LE;Starts the queues present on the local node.
.ELS
In stand-alone mode, the logical name SYS_$BATCH points to the
node_$BATCH queue. A sketch of the queue initialization follows this
section.
.BL
The procedure master copy is maintained in lavc_$data:[LAVCCOM];
updates are (daily) propagated to each node's [LAVCDATA.LAVCCOM]
directory.
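.BL
A minimal sketch of the batch queue set-up (the job limit and the use
of a generic queue are illustrative assumptions):
.LT

$ ! Node-specific batch queue, plus the cluster-wide generic queue
$ node = F$GETSYI("NODENAME")
$ INITIALIZE/QUEUE/BATCH/JOB_LIMIT=2/ON='node':: 'node'_BATCH
$ INITIALIZE/QUEUE/BATCH/GENERIC=('node'_BATCH) SYS$BATCH
$ START/QUEUE 'node'_BATCH
$ START/QUEUE SYS$BATCH
.EL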
.BR
.TP 25
.HL 2 ^*LAVCLOGIN.COM\*
The standard system wide LOGIN procedure:
.LIST 1
.LE;Checks for the "Group Manager" account, and adjusts the account
privileges if necessary.
.LE;Checks if the user is authorized to use this node, and terminates
the process if not (giving notification). The check is based on the
user account classification.
.LE;Modifies the user's base priority based on the user's account
rights.
.LE;Displays lavc_$data:[LAVCCOM]LAVCNOTE.TXT (cluster wide daily
notice, if present and in interactive mode)
.LE;Displays lavc_$data:[node]NODENOTE.TXT (node specific daily notice,
if present and in interactive mode)
.LE;Displays sys_$manager:NODEBOOT.TXT (node startup error log, created
by the system startup in case of errors)
.LE;Executes the lavc_$data:[GRPnnn]GRPnnnLOGIN.COM group login
procedure (in Interactive, Batch, Network or Other mode) if available.
.LE;Executes the user's private procedure sys$login:LOGIN.COM
.ELS
The procedure master copy is maintained in lavc_$data:[LAVCCOM];
updates are (daily) propagated to each node's [LAVCDATA.LAVCCOM]
directory.
.BR
.TP 20
.HL 2 ^*LAVCSHUT.COM\*
The captive LOGIN procedure for the "SHUT" account. This account allows
a workstation user to properly shut down his workstation without the
need for any special privileges (the SHUT account has all the
privileges required).
.BL
SHUTDOWN may be executed ONLY on WORKSTATION nodes, and ONLY by an
interactive user logged in locally on the workstation. The procedure:
.LIST 1
.LE;Prompts for the next boot configuration (as a LAVC member or
STAND-ALONE) and updates the system parameter VAXCLUSTER, if necessary.
.LE;Prompts with the standard shutdown questions
.LE;Performs the shutdown with the options REMOVE__NODE (in LAVC) and
REBOOT__CHECK
.LE;Logs the shutdown information
.ELS
The procedure master copy is maintained in lavc_$data:[LAVCCOM];
updates are (daily) propagated to each node's [LAVCDATA.LAVCCOM]
directory.
.BR
.TP 40
.HL 2 ^*NODESTART.DAT\*
The node-specific start-up command procedure list. The LAVC management
assumes that most startup operations / procedures will be common to
many LAVC members. However, each node may use a different sub-set of
such procedures. Any commands executed during node startup therefore
must come either from a LAVC COMMON procedure, from a PRODUCT specific
startup, or be included in NODESPEC.COM. For details on the
NODESTART.DAT format please refer to the following section named "DATA
STRUCTURES".
.TP 15
.HL 2 ^*NODESPEC.COM\*
The node-specific startup procedure. It performs node-specific
operations which ^*can not be cluster-wide\*. Such operations typically
include:
.BL
.LIST 0 "-"
.LE;Defines the node access rights via the logical name LAVC$ACCESS.
.LE;Installs node-specific images (if necessary)
.LE;Installs additional page/swap files
.LE;Configures special local devices (unless for some technical reason
we must use sys_$manager:SYCONFIG.COM)
.LE;Sets up terminal ports, printers etc.
.ELS
For device set-up, a call to LAVCDEVSET.COM should be used. This allows
executing (testing) the procedure on a running system, with devices
already allocated to users. A sketch of a typical NODESPEC.COM follows.
.BR
The procedure master copy is maintained in lavc_$data:[node]; updates
are (daily) propagated to each node's [LAVCDATA.node] directory.
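.BL
A hypothetical NODESPEC.COM fragment (device names, sizes and the
access list are examples only):
.LT

$ ! Node access rights (see LAVC NODE Access Control)
$ DEFINE/SYSTEM/EXEC LAVC$ACCESS "+001+230-250"
$ ! Additional pagefile on a local disk
$ SYSGEN := $SYS$SYSTEM:SYSGEN
$ SYSGEN CREATE DUA1:[SYSEXE]PAGEFILE1.SYS /SIZE=20000
$ SYSGEN INSTALL DUA1:[SYSEXE]PAGEFILE1.SYS /PAGEFILE
$ ! Terminal port set-up
$ SET TERMINAL/PERMANENT/SPEED=9600 TTA2:
.EL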
.TP 45
.HL 2 ^*MGMSYSCPY.COM\* (available)
A procedure used by the system manager to propagate a copy of the VMS
operating system to a disk on a new LAVC member. It assumes the new
LAVC member is booted diskless, and the procedure is executed on the
target node.
.BL
The procedure:
.LIST 1
.LE;Prompts for the node parameters
.LE;Uses either BACKUP (to copy the entire VMS tree) or the
parameter-directed VMS procedure sys_$update:VMSKITBLD.COM to copy the
selected basic VMS subset to the target disk.
.LE;Creates new page/swap/dump files
.LE;Creates the initial version of sys_$system:MODPARAMS.DAT. On VAX
workstations this file already contains the requirements for the UIS
(VWS) software.
.LE;Creates a file sys_$manager:SYSTARTUP.INI to be executed at the
first boot from the created system disk.
.BR;This procedure will:
.LIST 0 "-"
.BL
.LE;Configure the DECNET database (incl. a copy of the known nodes)
.LE;Execute sys_$update:AUTOGEN.COM to reboot the system
.ELS
.LE;Copies the LAVC generic startup file SYSTARTUP.COM into
sys$common:[sysmgr]SYSTARTUP.COM
.LE;Copies the LAVC generic shutdown file SYSHUTDWN.COM into
sys$common:[sysmgr]SYSHUTDWN.COM
.LE;Copies the required subset of the LAVC data base ([LAVCCOM],
[node]) to the target disk.
.LE;Logs the action in the LAVC database.
.ELS
.BR
.TP 15
.HL 2 ^*MGMDATUPD.COM\* (available)
The procedure checks the LAVC database for modified files, and
propagates any such files to the SPARE node. Files existing in the
individual nodes' LAVC database subsets are updated as well. The
procedure action is logged in the LAVC database.
.BL
The procedure is intended to run in BATCH mode, presumably during night
hours, after the system backups complete.
.BL
Procedure arguments (defaults are hardcoded):
.LIST 0 "-"
.BL
.LE;P1 - MASTER node disk to use as the database source
.LE;P2 - SPARE node disk to update
.LE;P3 - list of local node disks to update
.LE;P4 - options (SUBMIT)
.ELS
.PG
.TP 25
.HL 2 ^*MGMSYSUPD.COM\* (not yet available)
The procedure checks the LAVC MASTER node system tree for new /
modified system files and propagates any such updated files to the
selected node(s). The procedure action is logged in the LAVC database.
The procedure does not handle system data (SYSUAF.DAT, ACCOUNTNG.DAT
etc.).
.BL
The procedure prompts for arguments:
.LIST 0 "-"
.BL
.LE;P1 - modification date (modified/since=date)
.LE;P2 - MASTER node to be used as the system files source
.LE;P3 - target node(s)
.LE;P4 - options
.ELS
.TP 15
.HL 2 ^*MGMREMCMD.COM\*
The procedure executes DCL command(s) on remote node(s) using the
DECNET SET HOST facility. The procedure prompts for arguments:
.LIST 0 "-"
.BL
.LE;P1 - username|password to use for log-in
.LE;P2 - list of DCL commands/data separated by "|"
.LE;P3 - list of nodes separated by "|"
.ELS
.TP 25
.HL 2 ^*MGMTAILOR.COM\*
The procedure tailors the VMS operating system, using a tailoring
control file.
.BL
The DELETE operation deletes all the "target" files prescribed by the
control file, except for those which can not be found on the "source"
VMS tree (and thus could not be restored later).
.BL
The RESTORE operation restores all the files prescribed by the control
file, unless such a file already exists on the target.
.BL
The procedure prompts for arguments:
.LIST 0 "-"
.BL
.LE;P1 - operation DELETE or RESTORE
.LE;P2 - "source" disk with a VMS directory tree
.LE;P3 - "target" disk with a VMS directory tree
.LE;P4 - tailoring control file
.ELS
.TP 10
The tailoring control file [.TLR] format is identical to the standard
VMS tailoring files:
.BL
[directory]filename.type
.BL
Any line starting with "$" is considered a direct DCL command; comments
may be included using the prefix "$!".
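.BL
A hypothetical tailoring control file fragment (the file selections are
examples only):
.LT

$! EXAMPLES.TLR - trim help and example files from a workstation disk
[SYSHLP]HELPLIB.HLB
[SYSHLP.EXAMPLES]*.*
[SYSTEST]*.*
$ WRITE SYS$OUTPUT "Workstation tailoring list processed"
.EL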
.TP 15
.HL 2 ^*MGMQUESTA.COM\*
The procedure is used to re-start a JOB/QUEUE system which hangs due to
the loss of the disk holding JBCSYSQUE.DAT (or for any other reason).
It must be executed with ALL the privileges enabled. The procedure:
.LIST 0 "-"
.BL
.LE;Aborts the JOBCTL process
.LE;Aborts any print symbionts found
.LE;Starts the JOBCTL process using sys$system:STARTUP JOBCTL
.LE;Executes LAVC$DATA:[LAVCCOM]LAVCPRINT.COM
.LE;Executes LAVC$DATA:[LAVCCOM]LAVCBATCH.COM
.ELS
.TP 15
.HL 2 ^*MGMCOLLECT.COM\*
The procedure used to (daily) collect accounting and other system
management data. Currently, the following files are handled:
.LIT
 - sys$manager:ACCOUNTNG.DAT  -->  target:ACC_MAY05.node
 - sys$manager:OPERATOR.LOG   -->  target:OPR_MAY05.node
 - sys$errorlog:ERRLOG.SYS    -->  target:ERR_MAY05.node
.EL
The procedure argument P1 defines the target device:[directory] for the
collected data. The procedure performs all the actions necessary to
open new files. After a successful copy, the original data are deleted
to ensure the system disk will not overflow.
.BL
The target files are labeled with the date of collection. It is assumed
that the procedure is executed at midnight; thus a file labeled MAY05
will contain data from BEFORE MAY 05.
.PG
.ST ^*DATA STRUCTURES\*
.HL 1 ^*DATA STRUCTURES\*
.HL 2 ^*NODESTART.DAT\*
The node-specific startup data file NODESTART.DAT contains the list of
command procedures to invoke during node startup, in the format:
.BL
.C;^*f|UIC|pathname|description\*
.BL
where:
.LIST 1 "*"
.LE;^*f\* is a severity flag for the particular file [I|E|W|F]. The
flag is used to classify the severity of the error if the particular
startup file can not be found.
.BL
In special instances, the flag "_$" signals that the entire line should
be executed as a DCL command. This facility is intended for exceptions
only.
.LE;^*UIC\* A non-empty field requests a DETACHED process to execute
the particular procedure, under the prescribed UIC. This feature is
intended for GROUP startup procedures, and for processes that must be
postponed and/or may be executed in parallel to speed up startup.
.LE;^*pathname\* is the full pathname of a command procedure to
execute; examples are lavc_$data:[LAVCCOM]LAVCDISKS, or
sys_$system:NETCONFIG.COM.
.LE;^*description\* is an explanatory text, displayed during startup or
in error messages.
.ELS
NODESTART.DAT may contain comments, flagged by an exclamation point "!"
in column one.
.BL
Most of the startup procedures listed in a node's NODESTART.DAT will be
located either in the cluster-wide lavc_$data:[LAVCCOM], or (for
DIGITAL layered products) under sys_$manager. However, any PRODUCT
startup files will be located in the particular PRODUCT directories.
.TP 20
.BL
^*Example of the NODESTART.DAT\*
.LT

! NODESTART.DAT - Node specific startup files for node CHOPIN
! History:
!   02/09/87,,,MXB, Example set-up
!
E||lavc$data:[LAVCCOM]LAVCDISKS.COM|Mounting cluster disks
E||lavc$data:[LAVCCOM]LAVCLOGNM.COM|Creating system logical names
E||lavc$data:[CHOPIN]NODESPEC.COM|Configuring terminal ports
W||lavc$data:[LAVCCOM]LAVCPRINT.COM|Starting device queues
W||lavc$data:[LAVCCOM]LAVCBATCH.COM|Starting batch queues
E||sys$manager:STARTNET.COM|Starting DECNET
F||sys$manager:STARTVWS.COM|Starting Vax Workstation Software
W||s7kdsk:[GSYS]GSSTARTUP.COM|Starting S7000 software
W|[230,1]|lavc$data:[GRP230]GRP230START.COM|S7K group startup
W|[250,1]|lavc$data:[GRP250]GRP250START.COM|S5K group startup
W||comdsk:[NETDIST.MISC]NETSTART.COM|Starting TCP/IP
!
! end of NODESTART.DAT
.EL
.TP 30
.ST ^*ERROR AND EXCEPTION HANDLING\*
.HL 1 ^*ERROR-EXCEPTION HANDLING\*
Startup procedures report errors using the standard VMS format:
.BL
.C;%fac-sev-ident, Message text
.BL
For system startup procedures, the facility code is STARTUP.
.BL 2
The LAVC common startup procedures use the following global symbols to
report errors back to the generic LAVCSTART.COM (SYSTARTUP.COM):
.LIST 0 "-"
.BL
.LE;^*sysstawar\*==sysstawar+"warning description"+sysstaCRLF
.LE;^*sysstaerr\*==sysstaerr+"error description"+sysstaCRLF
.LE;^*sysstafat\*==sysstafat+"fatal error description"+sysstaCRLF
.ELS
The LAVCSTART.COM procedure checks the symbols above; if any of the
symbols is not empty, it is used in the notification message creation,
and on fatal errors the startup procedure disables non-operator logins.
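.BL
A minimal sketch of such an error report from a nested startup
procedure (the message text and the tested file are illustrative; the
sysstaCRLF symbol is assumed to be pre-defined by the generic startup):
.LT

$ ! Record a non-fatal problem for the generic startup to report
$ IF F$SEARCH("LAVC$DATA:[LAVCCOM]JBCSYSQUE.DAT") .EQS. "" THEN -
     sysstaerr == sysstaerr + -
     "%STARTUP-E-NOQFILE, cluster queue file not found" + sysstaCRLF
.EL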
.!HL 1 ^*TESTING\*
.!HL 1 ^*SCHEDULE\*