.RIGHT MARGIN 80
.TITLE;RMS for Datatrieve
.SUBTITLE;Datatrieve Pre-Symposium Seminar, Spring 1983
.CENTER;What is RMS
.PARAGRAPH
RMS is a set of software services that provides a uniform method of accessing records in files. It is built into VMS, and is an add-on for IAS, RSX and RSTS/E: it is therefore available on every operating system where Datatrieve is available.
.PARAGRAPH
RMS supports three file types: Sequential, Relative and Indexed.
.BLANK
Datatrieve does not support Relative files as domains, and will read existing Relative files sequentially.
.PARAGRAPH.TEST PAGE 4
RMS supports Fixed length and Variable length records.
.BLANK
It also supports some other record types for compatibility with existing files.
.PARAGRAPH.TEST PAGE 4
RMS arbitrates file sharing, allowing more than one program to access the same file. It also arbitrates block locking, so that two programs will not attempt to modify the same record at the same time.
.PARAGRAPH.TEST PAGE 8
RMS controls the conversion of data on disk (or other storage medium) to data within a record. Data on a disk is in blocks, each 512 bytes long. RMS uses one or more blocks to form a bucket, which may then contain one or more records. The bucket size is fixed when the file is created: from that point, the user (program) need deal only with single records. Note, however, that the buckets must be stored in buffers within the user's program: in Datatrieve-11, this uses up pool.
.BLANK.TEST PAGE 11
.CENTER;Sequential Files.
.PARAGRAPH
Sequential files are the smallest for a given amount of data, are compatible with programs which do not use RMS (such as FCS on IAS and RSX, Stream on RSTS), can be stored on media other than disk (such as magnetic tape), can be transmitted over communication lines, and generally have the least overhead. The file contains data, plus: a record length descriptor (for variable length records only), and padding if the records don't fall on word boundaries (on PDP-11).
If the data is normally accessed in a sequential manner, Sequential files will be the fastest to process.
.BLANK
Disadvantages: files must be accessed sequentially. New records can only be added to the end of the file, and normally records can only be deleted from the end of the file. Access to the center of the file is slow, as is non-sequential processing (going back one record usually means starting over again from the beginning). Sharing is limited to multiple access for Read Only.
.BLANK.TEST PAGE 5
.CENTER;Relative Files.
.PARAGRAPH
Datatrieve does not support access by record number. Existing relative files may be accessed sequentially.
.BLANK.TEST PAGE 8
.CENTER;Indexed Files.
.PARAGRAPH
Access may be by key, or sequential (the default order is by primary key). Keys may allow duplicate entries, or no duplicates (which guards against duplicate data). By accessing sequentially by primary key, the file is "automatically sorted". Accessing by keys allows fast access to data in the center of the file. Files can be shared for read or write.
.BLANK
Disadvantages: Highest overhead in disk space, memory used, and amount of software code needed. Must be stored on disk only. Indexed files occupy more space as both the data and the index must be stored: one index area for each key, plus 15 bytes in each bucket, plus 7 bytes for each record. In addition, space can be "lost" if records are moved or deleted. The file must have at least one key (the primary key), the value of which may not change (records can be deleted and re-entered with a new key field).
.BLANK
.CENTER;Creating a file.
.PARAGRAPH
A file can be created for any domain with the DEFINE#FILE command. This file will always work, but may not be optimal.
.PARAGRAPH
A file can also be created using the DEF (PDP-11) or CREATE (VAX) utility. To do this, one must know the attributes of the file.
The best place to begin is to create a file with Datatrieve (or another language), then examine the file with DSP (PDP-11) or ANALYZE/RMS__FILE (VAX), and choose the areas which may be changed for better performance in a given application.
.PARAGRAPH
The approximate sequence of decisions is:
.BLANK
Choose a file name.
.BLANK.TEST PAGE 6
Should the old file be superseded? I recommend not superseding the old file: create a new version, check to see that it is correct, then purge (delete) the old file. RSTS/E users should back up the old file first. The exception is a file used for temporary storage (scratch data): having this superseded keeps old versions from piling up on disk.
.BLANK
Should the file be Sequential or Indexed?
.BLANK.TEST PAGE 3
Record Size: the size required to hold the data. This may be obtained from an existing file, and is also given by Datatrieve when the record is defined.
.BLANK.TEST PAGE 5
On Sequential files, should records cross block boundaries: generally yes, unless the file is shared with an existing program which already calls for non-spanned blocks. If the record size is greater than 512 bytes, blocks must be spanned. Spanning packs the data onto the disk with no wasted space.
.BLANK.TEST PAGE 5
Carriage Return Control: usually say yes, as this allows files to be PIPed or TYPEd to your CRT screen, or printed on a terminal or line printer. This does not increase file overhead, as no extra data is added to the file. Exception: if FORTRAN control is required for an existing application.
.BLANK.TEST PAGE 6
KEYS (if indexed file):
.LIST
.BLANK
TYPE: string, integer, binary, or packed decimal, determined by the type of data. Strings do not have to be ASCII characters.
.BLANK.TEST PAGE 4
POSITION: the location of the key within the record. A key can be segmented. Datatrieve-11 does not allow creation of segmented keys, but may be able to read an existing file with segmented keys.
.BLANK.TEST PAGE 4
SIZE: the length of a key.
Strings are 1 to 255 bytes, Integer and Binary are 2 or 4 bytes (1 or 2 words), and Packed Decimal is 1 to 16 bytes (1 to 31 digits plus sign).
.BLANK.TEST PAGE 3
NAME: Datatrieve does not use the name, but placing one in the file documents the field use, for reference later.
.BLANK
DUPLICATES: allow these if you want more than one record to have the same key.
.BLANK.TEST PAGE 3
CHANGES: allow these if you want the value of a key to be changeable. The primary key never allows changes: bad data must be deleted and re-entered.
.BLANK.TEST PAGE 3
NULL VALUE: allows entering a record with no key. This record cannot be retrieved by key, but can be retrieved by RFA or default.
.END LIST
.BLANK.TEST PAGE 6
AREAS: normally, the file is one area, so the index is next to the data. By defining separate areas, one could put the indexes for several files next to each other with the data elsewhere, or on the VAX, the index on one disk and the data on another. Unless the parameters of an application are known in great detail, do not separate areas.
.BLANK.TEST PAGE 5
PLACEMENT CONTROL: allows specifying the location the file will occupy on a disk. This can optimize disk activity if several files are normally accessed together, by placing them next to each other on the disk. The position is usually lost when the disk is copied with a backup utility.
.BLANK.TEST PAGE 4
INITIAL ALLOCATION: if the size of the file is known, it is better to allocate space for it on the disk when it is created than to have to extend it later. You should at least try to allocate most of the space required initially.
.BLANK.TEST PAGE 4
DEFAULT EXTENSION QUANTITY: if it does become necessary to extend the file, the fewer extents the better. If it is known that large quantities of data will be added to the file at a time, use a larger extension size.
.BLANK.TEST PAGE 12
BUCKET SIZE (for indexed files). Usually, use the smallest bucket size which will work (the bucket must be large enough to hold at least one record).
A larger bucket size may increase speed, but will also require more buffer space: in Datatrieve-11, this means pool. A file with a bucket size greater than 3 may not work at all in Datatrieve-11. On the VAX: if increasing the bucket size will decrease the number of levels in the index, then a performance increase results. The RMS utilities will tell you how many levels you have. A larger bucket size will almost always increase speed if accessing sequentially by primary key; it will speed keyed access only if the number of index levels is reduced, or if several records with the same or adjacent keys are normally required at the same time, or one after the other.
.BLANK.TEST PAGE 6
CONTIGUOUS FILES: are generally faster, but require contiguous space on disk. On the VAX only, one can specify that the file be created contiguous if space is available, non-contiguous otherwise (on the PDP-11, the create will fail if contiguous space is not available). Most disk backup utilities make all files contiguous even if the directory doesn't indicate that the file is contiguous.
.BLANK.TEST PAGE 8
FILL FACTORS for indexed files (index and data). When the file is initially filled (with CNV, IFL, or CONVERT), the fill factor will specify how much empty space to leave in the index and data areas. If it is known that more data will be added to the file, leave empty space for it when initially filling the file, for faster additions later. If little or no data addition or change will be made, fill the file. Datatrieve does not honor fill factors, and will always fill completely.
.BLANK.TEST PAGE 3
PROTECTION: specify the file protection codes. You can always change them later.
.BLANK.TEST PAGE 8
.CENTER;Reclaiming unused or lost space. (Indexed files)
.PARAGRAPH
If old records are deleted or moved, the file may contain lost space (this happens especially with dictionaries on the PDP-11).
Deleting records in an indexed file still leaves the control information in the bucket, which uses space, and may eventually take up so much space that there is no room for data in that bucket. The space may be reclaimed in the following ways:
.LIST
.TEST PAGE 5
.LE;Convert the file to sequential, then re-populate the indexed file. This is done with CNV/IFL on the PDP-11, or CONVERT on the VAX, and may also be done by SORT or other programs (or even with Datatrieve, but this will be much slower than the RMS utilities). This will always reclaim space, and fill factors can be honored when re-populating.
.TEST PAGE 4
.LE;On the PDP-11, the QCPRS task distributed with Datatrieve for compressing dictionaries should be used regularly on active dictionaries, and will also compress other indexed files, though this use is not guaranteed by DEC.
.TEST PAGE 7
.LE;On the VAX, if a file requires only a primary key which is a string, then it can be a Prologue#3 file, in which the key strings can be compressed, and the CONVERT/RECLAIM utility will reclaim unused space. Prologue#3 files may be the most compact for a given application. These files can be used ONLY on the VAX: if data has to be transported to a PDP-11, the file must be converted on the VAX to a Prologue 1 or 2 or sequential file.
.END LIST
.BLANK.TEST PAGE 10
.CENTER;Examples.
.PARAGRAPH
An example of an application could be a telephone directory.
.BLANK
Number#######Name#################Dept.##Floor#Frame
.BLANK
7799#########Lederman,#Bart#Z.####PNA####20####19T5-8
.BLANK.TEST PAGE 4
If the file is used primarily to print out a directory, then it could be sequential. There is no need for the extra overhead of indexed files if the data is just printed out occasionally.
.BLANK.TEST PAGE 6
If each telephone number should be assigned to only one person, then make the telephone number a key field with duplicates not allowed: this would make it impossible to have two entries with the same number.
If the directory is normally printed in order of telephone number, then this would also eliminate the need to sort the file, as it will already be ordered by number.
.BLANK.TEST PAGE 4
If the file is part of an on-line inquiry system, then keys MIGHT help, depending upon the type of lookup. If lookup is by number, keys would probably make access faster; this would mean commands such as
.BREAK
FIND#TELEPHONE#WITH#NUMBER#=#7799.
.BREAK.TEST PAGE 3
If lookup is by name, and the entire name is known, keys would make access faster; such as
.BREAK
FIND#TELEPHONE#WITH#NAME#EQUALS#"LEDERMAN".
.BREAK
If lookup is by name, and the normal access is
.BREAK
FIND#TELEPHONE#WITH#NAME#CONTAINING#"LEDE"
.BREAK.TEST PAGE 6
then keys ^&WILL NOT\& make access faster. When a CONTAINING lookup is performed, it is necessary to examine the entire contents of each field, requiring the entire file to be accessed sequentially, so keys do not help. Lookups of the type "field#EQUALS#literal" or "field#<#literal" do take advantage of indexed file organization, and are faster on keyed fields.
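.PARAGRAPH
The difference between the two kinds of lookup can be illustrated with a minimal sketch in Python. This is not Datatrieve or RMS code: it assumes a simple list standing in for the data records and a dictionary standing in for the index on the NAME key. All function and variable names, and every sample record except the first, are invented for illustration. Each function also returns a count of how many records it had to examine.
.BLANK
.LITERAL
```python
# Illustrative model only: a list stands in for the data records of an
# indexed file, and a dict stands in for the index on the NAME key.
# All names and the second and third sample records are hypothetical.

records = [
    {"number": 7799, "name": "LEDERMAN"},
    {"number": 7801, "name": "SMITH"},
    {"number": 7802, "name": "JONES"},
]

# Build the "index": key value -> positions of the matching records.
name_index = {}
for position, record in enumerate(records):
    name_index.setdefault(record["name"], []).append(position)


def find_equals(name):
    """Keyed lookup: consult the index, touch only the matching records."""
    positions = name_index.get(name, [])
    found = [records[p] for p in positions]
    return found, len(positions)  # records examined == records found


def find_containing(fragment):
    """CONTAINING lookup: every record is read and its field searched."""
    examined = 0
    found = []
    for record in records:
        examined += 1
        if fragment in record["name"]:
            found.append(record)
    return found, examined


equals_result, equals_examined = find_equals("LEDERMAN")
containing_result, containing_examined = find_containing("LEDE")
print(equals_examined, containing_examined)
```
.END LITERAL
.PARAGRAPH
With the three sample records, the keyed lookup examines one record while the CONTAINING lookup examines all three, although both return the same entry. The gap grows with the size of the file, which is why the choice of keys should follow the kind of lookup the application normally performs.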