TECHNICAL INFORMATION ABOUT VMS_SHARE
Version 7.2   Feb 1990


1. INTRODUCTION

VMS_SHARE is designed to package a series of files into a form that can
be easily mailed across many different networks. Difficulties arise with
doing this because of the many and varied possibilities for corruption
of data in transit: for example, line wrapping, case folding and
transposition of key characters. VMS_SHARE encodes files before
transmission so that these things may be kept under control and proper
restoration effected at the receiving end.

For a given series of files to be packaged, VMS_SHARE combines them into
a single large 'text archive' file that can be unpacked into its
component files simply by running it as a command procedure at the
receiving end. For convenience, VMS_SHARE will optionally split the
result into multiple parts that can be individually mailed and
recombined at the receiving end.


2. WHAT VMS_SHARE DOES NOT DO

Because VMS_SHARE relies on electronic mail to ship the files, there are
no protocols that can be used to check the accuracy of the received
file(s). There is a reliance on the underlying mail system to get
everything there in one piece and unchanged. VMS_SHARE is unable to ask
for retransmission of missing or damaged pieces. VMS_SHARE should
therefore be used to send files only via essentially reliable mail
systems which can deliver files intact, provided their characters fall
within certain bounds.

VMS_SHARE does not deal with binary files, although it will encode any
non-printing 8-bit byte into a transmittable format. However, record and
file formats are not preserved. VMS_SHARE should generally be used only
with text files.


3. LIMITATIONS OF MAILERS AND HOW VMS_SHARE GETS AROUND THEM

Various mail systems have different limitations within them.
For instance, they will wrap or truncate lines that are too long, they
may limit the size of an individual mail message, and they may transpose
characters incorrectly if the underlying character set differs from that
of the transmitter (ASCII/EBCDIC is a good example of this). VMS_SHARE
encodes the files in different ways to get around these problems.

Please note, however, that the encoding techniques are NOT foolproof. We
have merely tried to anticipate all possible corruptions and devise an
encoding scheme that ensures the conditions under which corruption
occurs do not arise. If a form of corruption that has not been
anticipated occurs, corruption to the transmitted files will be
irreparable except through manual editing.

3.1 Maximum Size of a Mail Message

Many mail systems cannot cope with single mail messages larger than a
fixed number of bytes and will truncate messages or maybe even fail to
deliver them altogether. This is a real problem if a large software
package is being sent. VMS_SHARE tries to overcome this by splitting the
packaged files into several parts, each part being smaller than some
fixed size. For example, we might send a total of 300 blocks of code as
15 parts each of 20 blocks or less. VMS_SHARE will automatically split
at the 20 block boundary. The actual value is configurable via a logical
name (SHARE_PART_SIZE). It should be noted that mail headers added en
route can account for several blocks' worth of extra space, so this
should be allowed for when setting the maximum part size.

3.2 Maximum Line Length

Many mail systems do not like lines longer than some fixed maximum
length; a maximum length of 80 characters is typical. This results in
longer lines being wrapped or truncated at seemingly arbitrary
positions. VMS_SHARE tries to cope with this by wrapping long lines
itself and inserting markers to allow them to be rejoined at the
receiving end. What VMS_SHARE does is to prefix each line with a flag
character.
This flag character says EITHER 'this is the first part of a line' OR
'this line is a continuation of the previous line'. The wrapping point
is chosen carefully to avoid leaving any trailing blanks at the end of a
line, as these are sometimes removed by mail systems in transit; the
trailing blanks are moved to the start of the continued line.

The maximum line size is configured into the TPU code as a global value
and can be easily changed if required. It is not intended that this
value should be altered by the average user, however.

3.3 Trailing Blanks

While on the subject of trailing blank removal, VMS_SHARE will translate
the last character on a line, if it is a blank character, into a special
'escape sequence' that can be recognized by the receiving end and
translated back. Thus, the transmitted file is immune to trailing blank
removals or additions because there are none.

3.4 Escaped Characters

Probably the biggest problem is that a mail message moving through many
different systems en route to the destination may undergo character
conversions (for example, ASCII to EBCDIC if moving from a VAX to an
IBM). Unfortunately, not all systems keep similar translation tables and
characters can get translated into something unexpected at the remote
end. Culprits are caret (^), tilde (~), square and curly brackets
( [ ] { } ) and a few others.

VMS_SHARE deals with this problem by replacing each of the troublesome
characters (the ones mentioned above plus any non-printing character) by
an escape sequence. The escape sequence is recognized at the receiving
end and is translated back to the original character. Obviously, to work
correctly, the escape sequence itself must be immune from translation
problems. The escape technique used is to replace each such character by
a string of the form `xx where the ` symbol flags the start of an escape
sequence and 'xx' is a 2-digit string which is the hexadecimal form of
the ASCII code for the character.
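
The escape scheme just described can be illustrated with a short Python
sketch. This is NOT the TPU code that VMS_SHARE actually uses; it is only
a model of the `xx idea. The names encode_line and decode_line are
hypothetical, the set of 'troublesome' characters contains only those
named in the text (the real list includes 'a few others'), and the use of
upper-case hexadecimal digits is an assumption.

```python
# Illustrative sketch only; not the TPU code shipped with VMS_SHARE.
ESCAPE = "`"

# Assumed set: only the characters the text names explicitly, plus the
# escape character itself. The real list is longer.
TROUBLESOME = set("^~[]{}") | {ESCAPE}


def encode_line(line: str) -> str:
    """Replace troublesome characters with `xx (two hex digits)."""
    out = []
    last = len(line) - 1
    for i, ch in enumerate(line):
        code = ord(ch)
        if code > 0xFF:
            raise ValueError("only 8-bit characters are supported")
        # A blank in the final position is escaped so that the encoded
        # line never ends in a blank (section 3.3).
        trailing_blank = ch == " " and i == last
        non_printing = code < 0x20 or code > 0x7E
        if ch in TROUBLESOME or non_printing or trailing_blank:
            out.append(f"{ESCAPE}{code:02X}")  # upper-case hex: assumption
        else:
            out.append(ch)
    return "".join(out)


def decode_line(text: str) -> str:
    """Reverse encode_line: turn each `xx back into one character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == ESCAPE:
            out.append(chr(int(text[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With these definitions, encode_line("a[1] ") gives a`5B1`5D`20, and
decode_line applied to that string returns the original five characters,
including the trailing blank.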
Naturally, the ` character itself must be escaped in this form to avoid
confusion. For example, a space would be replaced by `20 and a tab by
`09.

3.5 Detecting Damaged Files with Checksums

In cases where some corruption occurs despite the encodings used by
VMS_SHARE, detection of damage (BUT NOT REPAIR!) should be possible
because each file is checked for accuracy using a checksum once it has
been unpacked. VMS_SHARE uses the currently undocumented CHECKSUM
command to produce a checksum value for the source file. This checksum
is carried across in the packed share file and checked when the file is
restored. A failed match causes a message to be displayed, and the
receiver can then take action to try to locate and repair the damage.

The DCL command 'CHECKSUM filename' writes the checksum value into a DCL
symbol called CHECKSUM$CHECKSUM.


4. VMS_SHARE IMPLEMENTATION

VMS_SHARE is provided as a combination of DCL and TPU code in order to
ensure that it will run on any VMS system. A purpose-built program would
be faster, of course, but then portability would not be guaranteed.

The DCL part of the software is used merely to pick up parameters and
parse filenames, passing them to the TPU code in a scratch file. The TPU
code does the hard work of packaging the files, wrapping lines, escaping
characters and generating multiple parts.

As distributed, the DCL and TPU code are bundled into a single large
procedure, but there is no reason why the TPU code could not be
extracted and made into a section file for enhanced speed. The
modifications required are quite straightforward.


5. USING VMS_SHARE

As distributed, VMS_SHARE is run as a command procedure (usually via a
suitable symbol set up to point to it) thus:

     $ @VMS_SHARE filespecs sharefile

where 'filespecs' is a comma separated list of wildcarded filenames to
be packaged, and 'sharefile' is the name to be given to the packaged
files (the name will be suffixed by the part number).

There are some restrictions on the filenames that can be used:

- Subdirectories may be used provided that they are beneath the current
  directory. It is not permitted to package files in other directories.

- The name of the sharefile must not appear in the 'filespecs' list
  because the software cannot package a file into itself.

- At least one valid file must be given in 'filespecs' or no sharefile
  will be produced.


6. UNPACKING A VMS_SHARE FILE

In general, a package delivered using the VMS_SHARE software will arrive
in a number of parts, from 1 up to 'n'. All parts should be concatenated
together in order. It is NOT necessary to remove superfluous mail
headers from any part other than part 1 prior to concatenation. The
resulting combined file should then be executed as a command procedure
in order to unpack the packaged files.

6.1 Typical Unpack Sequence

A typical sequence of events goes like this:

- Set your default directory to a scratch directory which is empty.

- Go into MAIL and select the folder which contains the parts of the
  package.

- Extract part 1 into a file, using the command
       'EXTRACT/NOHEADER file'
  Extract part 2 into a file, using the command
       'EXTRACT/NOHEADER/APPEND file'
  ...
  ...
  Extract part n into a file, using the command
       'EXTRACT/NOHEADER/APPEND file'

- Read warning below BEFORE proceeding!!!

- Execute as a command procedure, using the following command:

     $ @file

6.2 Warning

It is strongly suggested that the generated command procedure ('file' in
the above example) be carefully checked before execution.
It is possible that unscrupulous persons might tamper with the source
before sending it and introduce a virus into the VMS_SHARE'd code. There
is nothing that VMS_SHARE can do about this automatically. However,
since all the files should be human readable, it should be possible to
detect fraudulent code by manual checking. Certainly the lines starting
with '$' symbols, and the TPU code near the start, should be checked
carefully as these are most likely to be troublesome.


7. DECLARATION AND DISCLAIMER

This software is in the public domain and may be freely distributed
without charge as required. However, all copyright notices and
references to the author in the source must be left intact. Third party
modifications may be made to the source, but any errors arising from
their use are entirely the responsibility of the modifier.

The author accepts no responsibility for the suitability of this
software for any specific purpose. Any errors arising from its use are
entirely the responsibility of the user.

Andy Harper
Kings College London
UK