.RIGHT MARGIN 80
.TITLE;RMS for Datatrieve
.SUBTITLE;Datatrieve Pre-Symposium Seminar, Spring 1983
.CENTER;What is RMS
.PARAGRAPH
RMS is a set of software services which provides a uniform method of accessing
records in files. It is inherent in VMS, and added on to IAS, RSX and RSTS/E:
it is therefore available for all operating systems where Datatrieve is
available. 
.PARAGRAPH
RMS supports several file types: Sequential, Relative and Indexed.
.BLANK
Datatrieve does not support access to Relative files by record number, but it
can read existing Relative files sequentially.
.PARAGRAPH.TEST PAGE 4
RMS supports Fixed length and Variable length records.
.BLANK
It also supports some other record types for compatibility with existing files.
.PARAGRAPH.TEST PAGE 4
RMS arbitrates file sharing, allowing more than one program to access the same
file. It also arbitrates block locking, so that two programs will not attempt
to modify the same record at the same time. 
.PARAGRAPH.TEST PAGE 8
RMS controls the conversion of data on disk (or other storage medium) to data
within a record. Data on a disk is in blocks, each 512 bytes long. RMS uses one
or more blocks to form a bucket, which may then contain one or more records.
The bucket size is fixed when the file is created: from that point, the user
(program) need deal only with single records. Note, however, that the buckets
must be stored in buffers within the user's program: in Datatrieve-11, this
uses up pool. 
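.BLANK
As a rough illustration of the arithmetic above, the following Python sketch
computes the size of a bucket and the buffer space a program would need to hold
a given number of buckets. The 512 byte block comes from the text; the function
names and the sample figures (a bucket of 3 blocks, 2 buffers) are assumptions
made only for this example.
.LITERAL
```python
BLOCK_BYTES = 512  # disk block size, as stated in the text


def bucket_bytes(blocks_per_bucket):
    """Size of one bucket in bytes."""
    if blocks_per_bucket < 1:
        raise ValueError("a bucket is at least one block")
    return blocks_per_bucket * BLOCK_BYTES


def buffer_bytes(blocks_per_bucket, buffer_count):
    """Memory needed to hold buffer_count buckets inside the program."""
    return bucket_bytes(blocks_per_bucket) * buffer_count


# Hypothetical figures: a bucket of 3 blocks, held in 2 buffers.
print(bucket_bytes(3))     # 1536
print(buffer_bytes(3, 2))  # 3072
```
.END LITERAL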
.BLANK.TEST PAGE 11
.CENTER;Sequential Files.
.PARAGRAPH
Sequential files are the smallest for a given amount of data, are compatible
with programs which do not use RMS (such as FCS on IAS and RSX, Stream on
RSTS), can be stored on media other than disk (such as magnetic tape), can be
transmitted over communication lines, and generally have the least overhead.
The file contains data, plus: a record length descriptor for variable length
records only; and padding if the records don't fall on word boundaries (on
PDP-11). If the data is normally accessed in a sequential manner, Sequential
files will be the fastest to process. 
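.BLANK
The overhead described above can be estimated with a short Python sketch. It
assumes a 2 byte length descriptor on each variable length record (a figure
not given in the text) and pads odd sizes up to the next word boundary, as on
the PDP-11. The function name is hypothetical.
.LITERAL
```python
def stored_size(data_bytes, variable_length, length_field_bytes=2):
    """Estimate the bytes one record occupies in a sequential file.

    length_field_bytes=2 is an assumption made for illustration only.
    """
    size = data_bytes
    if variable_length:
        size += length_field_bytes   # record length descriptor
    return size + (size % 2)         # pad odd sizes to a word boundary


print(stored_size(80, variable_length=False))  # 80
print(stored_size(81, variable_length=False))  # 82
print(stored_size(81, variable_length=True))   # 84
```
.END LITERAL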
.BLANK
Disadvantages: files must be accessed sequentially. New records can be added
only at the end of the file, and records can normally be deleted only from the
end of the file. Access to the center of the file, or any non-sequential
processing, is slow (going back one record usually means starting over again
from the beginning). Sharing is limited to multiple access for Read Only.
.BLANK.TEST PAGE 5
.CENTER;Relative Files.
.PARAGRAPH
Datatrieve does not support access by record number. Existing relative files
may be accessed sequentially. 
.BLANK.TEST PAGE 8
.CENTER;Indexed Files.
.PARAGRAPH
Access may be by key, or sequentially (default order by primary key). Keys may
have duplicate entries, or no duplicates (guards against duplicate data). By
accessing sequentially by primary key, the file is "automatically sorted".
Accessing by keys allows fast access to data in the center of the file. Files
can be shared for read or write. 
.BLANK
Disadvantages: highest overhead in disk space, memory used, and amount of
software code needed. Indexed files can be stored only on disk. They occupy
more space as
both the data and the index must be stored: one index area for each key, plus
15 bytes in each bucket, plus 7 bytes for each record. In addition, space can
be "lost" if records are moved or deleted. The file must have at least one key
(the primary key), the data of which may not change (records can be deleted and
re-entered with a new key field).
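.BLANK
To give a feel for the overhead figures quoted above, here is a minimal Python
sketch that estimates how many records fit in a data bucket and how many blocks
the data level then needs. It uses the 15 bytes per bucket and 7 bytes per
record from the text, ignores the index areas entirely, and assumes fixed
length records that do not cross bucket boundaries. The function names and
sample sizes are hypothetical.
.LITERAL
```python
BLOCK_BYTES = 512      # from the text
BUCKET_OVERHEAD = 15   # bytes per bucket, from the text
RECORD_OVERHEAD = 7    # bytes per record, from the text


def records_per_bucket(record_bytes, blocks_per_bucket=1):
    """Whole records that fit in one data bucket."""
    usable = blocks_per_bucket * BLOCK_BYTES - BUCKET_OVERHEAD
    return usable // (record_bytes + RECORD_OVERHEAD)


def data_blocks_needed(record_count, record_bytes, blocks_per_bucket=1):
    """Blocks used by the data level only (index areas are ignored)."""
    per_bucket = records_per_bucket(record_bytes, blocks_per_bucket)
    if per_bucket == 0:
        raise ValueError("bucket too small to hold one record")
    buckets = -(-record_count // per_bucket)   # ceiling division
    return buckets * blocks_per_bucket


# Hypothetical file: 1000 records of 60 bytes each.
print(records_per_bucket(60))           # 7
print(data_blocks_needed(1000, 60))     # 143
print(data_blocks_needed(1000, 60, 2))  # 134
```
.END LITERAL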
.BLANK
.CENTER;Creating a file.
.PARAGRAPH
A file can be created for any domain with the DEFINE#FILE command. This file
will always work, but may not be optimum. 
.PARAGRAPH
A file can also be created using the DEF (PDP-11) or CREATE (VAX) utility. To
do this, one must know the attributes of the file. The best place to begin is
to create a file with Datatrieve (or another language), then examine the file
with DSP (PDP-11) or ANALYZE/RMS__FILE (VAX), and choose the areas which may be
changed for better performance in a given application. 
.PARAGRAPH
The approximate sequence of decisions is:
.BLANK
Choose a file name.
.BLANK.TEST PAGE 6
Should the old file be superseded? I recommend not superseding the old file:
create a new version, check to see it is correct, then purge (delete) the old
file. RSTS/E users should back up the old file first. The exception is a file
used for temporary storage (scratch data): having this superseded keeps old
versions from piling up on disk. 
.BLANK
Should the file be Sequential or Indexed?
.BLANK.TEST PAGE 3
Record Size: the size required to hold the data. This may be obtained from an
existing file, and is also given by Datatrieve when the record is defined. 
.BLANK.TEST PAGE 5
On Sequential files, should records cross block boundaries? Generally yes,
unless the file is shared with an existing program which already calls for
non-spanned records. If the record size is greater than 512 bytes, records must
span blocks. Spanning packs the data onto the disk with no space wasted at the
ends of blocks.
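.BLANK
The effect of spanning on disk space can be sketched in Python as follows. The
sketch assumes fixed length records of even size with no per-record overhead;
the function name and the sample figures are invented for the illustration.
.LITERAL
```python
BLOCK_BYTES = 512  # from the text


def blocks_used(record_count, record_bytes, spanned):
    """Blocks needed for fixed length records in a sequential file."""
    if spanned:
        total = record_count * record_bytes
        return -(-total // BLOCK_BYTES)        # ceiling division
    per_block = BLOCK_BYTES // record_bytes    # whole records per block
    if per_block == 0:
        raise ValueError("records longer than a block must span")
    return -(-record_count // per_block)


# Hypothetical file: 1000 records of 100 bytes each.
print(blocks_used(1000, 100, spanned=True))   # 196
print(blocks_used(1000, 100, spanned=False))  # 200
```
.END LITERAL
.BLANK
Without spanning, the last 12 bytes of every block in this example go unused.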
.BLANK.TEST PAGE 5
Carriage Return Control: usually say yes, as this allows files to be PIPed or
TYPEd to your CRT screen, or printed on a terminal or line printer. This does
not increase file overhead, as no extra data is added to the file. Exception:
if FORTRAN control is required for an existing application. 
.BLANK.TEST PAGE 6
KEYS (if indexed file):
.LIST
.BLANK
TYPE: string, integer, binary, or packed decimal, determined by the type of
data. Strings do not have to be ASCII characters. 
.BLANK.TEST PAGE 4
POSITION: the location within the record, which can be segmented. Datatrieve-11
does not allow creation of segmented keys, but may be able to read an existing
file with segmented keys. 
.BLANK.TEST PAGE 4
SIZE: the length of a key. Strings are 1 to 255 bytes, Integer and Binary are 2
or 4 bytes (1 or 2 words), and Packed Decimal is 1 to 16 bytes (1 to 31 digits
plus sign). 
.BLANK.TEST PAGE 3
NAME: Datatrieve does not use the name, but placing one in the file documents
the field use, for reference later. 
.BLANK
DUPLICATES: if you want more than one record to have the same key.
.BLANK.TEST PAGE 3
CHANGES: if you want to allow the value of a key to change. The primary key
never allows changes: bad data must be deleted and re-entered. 
.BLANK.TEST PAGE 3
NULL VALUE: allows entering a record with no key. This record cannot be
retrieved by key, but can be retrieved by RFA (record file address) or default.
.END LIST
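.BLANK
The key size limits listed above can be checked mechanically. The following
Python sketch encodes only the limits given in the list; the function names and
the type strings are hypothetical and are not part of any RMS utility.
.LITERAL
```python
def packed_decimal_bytes(digits):
    """Bytes needed for a packed decimal of `digits` digits plus sign."""
    if not 1 <= digits <= 31:
        raise ValueError("packed decimal keys hold 1 to 31 digits")
    return digits // 2 + 1   # two digits per byte, sign in the last half


def key_size_ok(key_type, size_bytes):
    """True if size_bytes is a legal key length for key_type."""
    key_type = key_type.lower()
    if key_type == "string":
        return 1 <= size_bytes <= 255
    if key_type in ("integer", "binary"):
        return size_bytes in (2, 4)
    if key_type == "packed decimal":
        return 1 <= size_bytes <= 16
    raise ValueError("unknown key type: " + key_type)


print(packed_decimal_bytes(31))       # 16
print(key_size_ok("string", 255))     # True
print(key_size_ok("integer", 3))      # False
```
.END LITERAL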
.BLANK.TEST PAGE 6
AREAS: normally, the file is one area, so the index is next to the data. By
defining separate areas, one could put the indexes for several files next to
each other with the data elsewhere or, on the VAX, the index on one disk and the
data on another. Unless the parameters of an application are known in great
detail, do not separate areas. 
.BLANK.TEST PAGE 5
PLACEMENT CONTROL: allows specifying the location the file will occupy on a
disk. This can optimize disk activity: files that are normally accessed
together can be placed next to each other on the disk. The position is
usually lost when the disk is copied with a backup utility. 
.BLANK.TEST PAGE 4
INITIAL ALLOCATION: if the size of the file is known, it is better to allocate
space for it on the disk when it is created than to have to extend it later.
You should at least try to allocate most of the space required initially. 
.BLANK.TEST PAGE 4
DEFAULT EXTENSION QUANTITY: if it does become necessary to extend the file,
the fewer extents the better. If it is known that large quantities of data will
be added to the file at a time, use a larger extension size. 
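.BLANK
The trade-off between initial allocation and extension quantity can be seen
with a little arithmetic, sketched here in Python. The function name and the
block counts are invented for the example, and the sketch assumes every
extension adds exactly the default extension quantity.
.LITERAL
```python
def extensions_needed(final_blocks, initial_blocks, extend_blocks):
    """How many times the file must be extended to reach final_blocks."""
    if extend_blocks < 1:
        raise ValueError("extension quantity must be at least one block")
    shortfall = max(0, final_blocks - initial_blocks)
    return -(-shortfall // extend_blocks)   # ceiling division


# Hypothetical file that grows to 1000 blocks.
print(extensions_needed(1000, 100, 10))    # 90
print(extensions_needed(1000, 100, 100))   # 9
print(extensions_needed(1000, 1000, 10))   # 0
```
.END LITERAL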
.BLANK.TEST PAGE 12
BUCKET SIZE (for indexed files). Usually, use the smallest bucket size which
will work (the bucket must be large enough to hold at least one record). Larger
bucket size may increase speed, but will also require more buffer space: in
Datatrieve-11, this means pool. A file with a bucket size greater than 3 may
not work at all in Datatrieve-11. On the VAX, if increasing the bucket size
decreases the number of levels in the index, then a performance increase
results. The RMS utilities will tell you how many levels you have. A larger
bucket size will almost always increase speed if accessing sequentially by
primary key; it will speed keyed access only if the number of levels is reduced,
or if several records with the same or adjacent keys are normally required at
the same time, or one after the other.
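.BLANK
Why a larger bucket can remove an index level is easier to see with numbers.
The Python sketch below uses a deliberately simplified model, not the actual
RMS index layout: it assumes each index entry is the key plus a 4 byte pointer,
that index and data buckets are the same size, and that the 15 byte bucket and
7 byte record overheads quoted earlier apply. All names and sample figures are
hypothetical; the RMS utilities remain the authority on the real number of
levels.
.LITERAL
```python
BLOCK_BYTES = 512      # from the text
BUCKET_OVERHEAD = 15   # bytes per bucket, from the text
RECORD_OVERHEAD = 7    # bytes per record, from the text


def index_levels(record_count, record_bytes, key_bytes,
                 blocks_per_bucket, pointer_bytes=4):
    """Estimate index levels under the simplified model described above.

    pointer_bytes=4 is an assumption, not an RMS figure.
    """
    usable = blocks_per_bucket * BLOCK_BYTES - BUCKET_OVERHEAD
    per_data_bucket = usable // (record_bytes + RECORD_OVERHEAD)
    fan_out = usable // (key_bytes + pointer_bytes)
    if per_data_bucket < 1 or fan_out < 2:
        raise ValueError("bucket too small for this record or key")
    data_buckets = -(-record_count // per_data_bucket)
    levels = 1
    reach = fan_out              # data buckets one index level can address
    while reach < data_buckets:
        reach *= fan_out         # each extra level multiplies the reach
        levels += 1
    return levels


# Hypothetical file: 100000 records of 100 bytes with a 20 byte key.
print(index_levels(100000, 100, 20, blocks_per_bucket=1))  # 4
print(index_levels(100000, 100, 20, blocks_per_bucket=3))  # 3
```
.END LITERAL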
.BLANK.TEST PAGE 6
CONTIGUOUS FILES: generally faster, but they require contiguous space on disk.
On the VAX only, one can specify that the file be created contiguous if space
is available, non-contiguous otherwise (on the PDP-11, the create will fail if
contiguous space is not available). Most disk backup utilities make all files
contiguous even if the directory doesn't indicate that the file is contiguous.
.BLANK.TEST PAGE 8
FILL FACTORS for indexed files (index and data). When the file is initially
filled (with CNV, IFL, or CONVERT), the fill factor will specify how much empty
space to leave in the index and data areas. If it is known that more data will
be added to the file, leave empty space for it when initially filling the file,
for faster additions later. If little or no data addition or change will be
made, fill the file. Datatrieve does not honor fill factors, and will always
fill completely. 
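.BLANK
The effect of a data fill factor can be sketched in Python. This is only an
approximation: it assumes the fill percentage applies to the bucket space left
after the 15 byte bucket overhead, and that each record carries the 7 byte
overhead quoted earlier. The function name and sample figures are invented.
.LITERAL
```python
BLOCK_BYTES = 512      # from the text
BUCKET_OVERHEAD = 15   # bytes per bucket, from the text
RECORD_OVERHEAD = 7    # bytes per record, from the text


def records_loaded_per_bucket(record_bytes, blocks_per_bucket, fill_percent):
    """Records placed in each data bucket during the initial load."""
    if not 1 <= fill_percent <= 100:
        raise ValueError("fill factor is a percentage from 1 to 100")
    usable = blocks_per_bucket * BLOCK_BYTES - BUCKET_OVERHEAD
    target = usable * fill_percent // 100
    # At least one record is always stored, however low the fill factor.
    return max(1, target // (record_bytes + RECORD_OVERHEAD))


# Hypothetical 60 byte records in buckets of 1 block.
full = records_loaded_per_bucket(60, 1, 100)
half = records_loaded_per_bucket(60, 1, 50)
print(full)         # 7
print(half)         # 3
print(full - half)  # 4 record slots left free for later additions
```
.END LITERAL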
.BLANK.TEST PAGE 3
PROTECTION: specify the file protection codes. You can always change them
later. 
.BLANK.TEST PAGE 8
.CENTER;Reclaiming unused or lost space. (Indexed files)
.PARAGRAPH
If old records are deleted or moved, the file may contain lost space (this
happens especially with dictionaries on the PDP-11). Deleting records in an
indexed file still leaves the control information in the bucket, which uses
space, and may eventually take up so much space that there is no room for data
in that bucket. The space may be reclaimed by: 
.LIST
.TEST PAGE 5
.LE;Convert the file to sequential, then re-populate the indexed file.
This is done with CNV/IFL on the PDP-11, or CONVERT on the VAX, and may
also be done by SORT or other programs (or even with Datatrieve, but
this will be much slower than the RMS utilities). This will always
reclaim space, and fill factors can be honored when re-populating. 
.TEST PAGE 4
.LE;On the PDP-11, the QCPRS task distributed with Datatrieve for
compressing dictionaries should be used regularly on active
dictionaries, and will also compress other indexed files, though this
use is not guaranteed by DEC. 
.TEST PAGE 7
.LE;On the VAX, if a file requires only a primary key which is a
string, then it can be a Prologue#3 file, in which the key strings can be
compressed, and the CONVERT/RECLAIM utility will reclaim unused space.
Prologue#3 files may be the most compact for a given application.
These files can be used ONLY on the VAX: if data has to be transported
to a PDP-11, the file must be converted on the VAX to a Prologue#1 or 2
or sequential file. 
.END LIST
.BLANK.TEST PAGE 10
.CENTER;Examples.
.PARAGRAPH
An example of an application could be a telephone directory.
.BLANK
Number#######Name#################Dept.##Floor#Frame
.BLANK
7799#########Lederman,#Bart#Z.####PNA####20####19T5-8
.BLANK.TEST PAGE 4
If the file is used primarily to print out a directory, then it could be
sequential. There is no need for the extra overhead of indexed files if the
data is just printed out occasionally. 
.BLANK.TEST PAGE 6
If it is desired to have one telephone number per person, then make the
telephone number a key field with duplicates not allowed: this would make it
impossible to have two entries with the same number.  If the application is
normally printed by telephone number, then this would also eliminate the need
to sort the file, as it will already be ordered by number. 
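.BLANK
The behaviour of a key with duplicates not allowed can be imitated in Python
with a dictionary keyed by telephone number. This is only an analogy for what
RMS does, not its implementation; the function names are hypothetical, and
every entry other than the one taken from the sample record above is invented.
.LITERAL
```python
class DuplicateKeyError(Exception):
    """Raised when a telephone number is already in the directory."""


def store(directory, number, name):
    """Add an entry, refusing a number that is already present."""
    if number in directory:
        raise DuplicateKeyError(number)
    directory[number] = name


def listing(directory):
    """Return entries in primary key order, as a sequential read would."""
    return [(number, directory[number]) for number in sorted(directory)]


directory = {}
store(directory, 7799, "Lederman, Bart Z.")   # entry from the text
store(directory, 7700, "Sample, Alice")       # invented entry
try:
    store(directory, 7799, "Example, Dana")   # invented entry, same number
except DuplicateKeyError:
    print("rejected duplicate number 7799")
print(listing(directory))
```
.END LITERAL
.BLANK
The listing comes back ordered by number even though the entries were stored
out of order, which is the "automatically sorted" effect described earlier.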
.BLANK.TEST PAGE 4
If the file is part of an on-line inquiry system, then keys MIGHT help
depending upon the type of lookup. If lookup is by number, keys would probably
make access faster; this would mean commands such as 
.BREAK
FIND#TELEPHONE#WITH#NUMBER#=#7799.
.BREAK.TEST PAGE 3
If lookup is by name, and the entire name is known, keys would make access
faster; such as 
.BREAK
FIND#TELEPHONE#WITH#NAME#EQUALS#"LEDERMAN".
.BREAK
If lookup is by name, and the normal access is 
.BREAK
FIND#TELEPHONE#WITH#NAME#CONTAINING#"LEDE"
.BREAK.TEST PAGE 6
then keys ^&WILL NOT\& make access faster. When a CONTAINING lookup is
performed, it is necessary to examine the entire contents of each field,
requiring the entire file to be accessed sequentially, so keys do not help.
Lookups of the type "field#EQUALS#literal" or "field#<#literal" do take
advantage of indexed file organization, and are faster on keyed fields.
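.BLANK
The difference between the two kinds of lookup can be imitated in Python. In
this sketch a sorted list searched with the standard bisect module stands in
for the index on the name field, and a plain loop stands in for the sequential
pass that a CONTAINING lookup requires. It is an analogy only, not how
Datatrieve or RMS is implemented; all names are hypothetical, and every entry
except the first is invented.
.LITERAL
```python
import bisect

# Only the first entry comes from the text; the rest are invented samples.
RECORDS = [
    {"number": 7799, "name": "LEDERMAN, BART Z."},
    {"number": 7801, "name": "SAMPLE, ALICE"},
    {"number": 7805, "name": "LEDESMA, CARLOS"},
    {"number": 7810, "name": "EXAMPLE, DANA"},
]

# A sorted list of (key, record position) pairs stands in for the name index.
NAME_INDEX = sorted((r["name"], i) for i, r in enumerate(RECORDS))
NAME_KEYS = [key for key, _ in NAME_INDEX]


def find_equals(name):
    """Keyed lookup: binary search, fetching only the matching records."""
    lo = bisect.bisect_left(NAME_KEYS, name)
    hi = bisect.bisect_right(NAME_KEYS, name)
    return [RECORDS[i] for _, i in NAME_INDEX[lo:hi]]


def find_containing(text):
    """CONTAINING lookup: every record in the file must be examined."""
    examined = 0
    hits = []
    for record in RECORDS:
        examined += 1
        if text in record["name"]:
            hits.append(record)
    return hits, examined


print(find_equals("LEDERMAN, BART Z."))
hits, examined = find_containing("LEDE")
print(len(hits), "matches after examining", examined, "records")
```
.END LITERAL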