DISK DATA CACHING DRIVER -- CDDRIVER

This driver implements a "fully-associative" type of caching for data read from disk via a standard VMS disk device driver. The initialization, sizing, return of statistical information, and recording of I/O operations performed on the disk are controlled via special I/O functions.

Assemble and link CDDRIVER by executing the CDBLD.COM procedure. The driver is loaded and one unit connected via the CDLOAD.COM procedure; up to 7 more units may be connected as required.

The specific design of CDDRIVER was dictated by AEPSC's own unique needs. Our installation relies heavily on a real-time data base which is accessed directly via logical block QIOs. No amount of RMS tuning will help. Also, there is no "file" I/O, so there is no advantage to doing fancy things like "read-ahead" or caching based on file-ids. Recording I/O operations showed, however, that 80% of all disk operations are reads, so some form of caching should help.

This first version of CDDRIVER uses queues to identify and control the caching of disk logical blocks. For large caches, a significant amount of time may be spent searching these lists for where the data may be located or should be saved. Also, cached data is mapped page by page rather than relying on any physical contiguity of the memory allocated to the cache. This has the advantage of NOT requiring any SYSGEN parameter changes (i.e. REALTIME_SPTS, SPTREQ, etc.) prior to installing CDDRIVER.

Performance measurements on CDDRIVER indicate that the driver adds about 8% overhead to I/O operations that can't be resolved from the cache. Average transfer rates of roughly 270 Kbytes/sec with no cache increased to 1829 Kbytes/sec with 100% hits from cache. The "break even" point appears to be at a 15-20% hit rate.

Future versions will implement set-associative caching, and may fully map the cache to the system address space.
And maybe, with some help from the DECUS community, CDDRIVER could be set up to hook disk port drivers to cache cluster-wide disks.

BASIC LOGIC

Disk data caching is initialized via the IO$_SETCHAR!IO$M_CD_STARTUP QIO function, which associates a CDDRIVER unit with a specific disk device unit to be cached. As currently defined, up to 8 CD: units (CDA0:-CDA7:) may be assigned to cache 8 different physical (or virtual) disks.

The startup QIO function requires that the requesting task have a channel assigned to the disk to be cached; the channel is passed in QIO parameter 1. CDDRIVER verifies that the channel is assigned to a disk device and checks that caching is not active for the disk on any other CD: device. If all checks pass, CDDRIVER saves the normal disk driver's start I/O entry point from the disk's Driver Dispatch Table (DDT), and then inserts the address of its own alternate start I/O entry point into the disk's DDT. From this point on, all disk I/O requests are intercepted by the cache driver.

For each I/O request intercepted from the disk driver, CDDRIVER verifies that the request is for the specific disk unit being cached and that the request is a "direct" I/O. All other requests are passed immediately back to the disk driver for handling.

CDDRIVER handles read operations by scanning the cache for each logical block being transferred. Page and swap operations are not cached unless the IO$M_CD_PAGSWAPIO modifier was specified when the cache was started. If the entire transfer can be resolved from cache, the user's data buffer is filled from the cache and the I/O is completed. If any portion of the transfer is not cached, CDDRIVER alters the IRP$L_PID field to force VMS post-processing to reenter the cache driver when the disk read completes, and returns control to the normal disk driver's start I/O entry point.

The handling of write operations is controlled by several initialization options.
By default, CDDRIVER updates the cache only for blocks written that were already cached. Specifying the IO$M_CD_FLUSH modifier at startup time forces CDDRIVER to unconditionally invalidate any cached data on a write to disk. If IO$M_CD_LOAD was used instead, all data written to disk will be loaded into the cache on I/O completion. Note that the data is not available in the cache until the write to disk completes successfully.

All write operations are passed back to the disk driver to be executed. If data is to be loaded into cache (by default or IO$M_CD_LOAD), CDDRIVER regains control from VMS post-processing in the same manner as described for read processing.

CDDRIVER is called on I/O completion to load data into the cache. If the I/O failed, any cached data is invalidated. Otherwise the data is copied from the user's data buffer to cache data buffers. The IRP$L_PID field is then restored and the I/O is completed in the normal fashion.

MEMORY USAGE

CDDRIVER allocates pages to the cache by pulling entries from the free page list. Control structures required to track cache usage and provide mapping information for each cache block/page are allocated from non-paged pool. A check is made whenever a disk's cache is initialized or extended to prevent depletion of the free page list.

CDDRIVER moves data between the cache and process buffers 1 page at a time, remapping the cache and process buffers as required using 3 system page table entries: 1 to map the cache page and 2 to map 1-2 pages of the user's data buffer, depending on the byte offset within the page of the buffer's start address. This arrangement allows extension of the cache while the cache is active and does not place large requirements on the number of free SPTs or the amount of non-paged pool space.

SPECIAL I/O FUNCTIONS -- DEFINITIONS

The CDDEFIN.MAR module declares most of the data structures and special I/O function code modifiers used by CDDRIVER.
The CD$IODEF macro contained in CDDRIVER.MLB may be invoked by applications to declare these symbols for their reference. These symbols are also declared as globals and are available at link time from the CDDRIVER.OLB object library.

SPECIAL I/O FUNCTIONS -- IO$_SETCHAR/IO$_SETMODE

CDDRIVER uses the IO$_SETCHAR/IO$_SETMODE functions along with special function modifiers to control the cache:

    IO$M_CD_STARTUP   - Starts disk data caching.
    IO$M_CD_SHUTDOWN  - Terminates disk data caching.
    IO$M_CD_PURGE     - Invalidates all cached data.
    IO$M_CD_EXTEND    - Adds pages of memory to the cache.

These modifiers are mutually exclusive; specifying more than one results in an SS$_ILLIOFUNC return condition. One modifier may be used in combination with any other valid modifier:

    IO$M_CD_ZERO      - Clears all statistics on the cache.

The IO$_SETMODE!IO$M_CD_STARTUP QIO function starts caching for a physical disk on the CDDRIVER unit assigned to the channel specified in the QIO "chan" argument (e.g. CDA0:, CDA3:). QIO parameter 1 (P1) specifies, by value, the channel number assigned by the process to the device to be cached (e.g. DRC1:). Parameter 2 (P2) specifies the size of the cache to be allocated, which will be rounded up to the next 64 page/block boundary. A cache size of 0 is valid and may be used along with IO$_READxBLK QIOs to simply monitor all I/O requests being issued to the disk device.

The following additional function modifiers may be used with IO$M_CD_STARTUP:

    IO$M_CD_FLUSH     - Forces invalidation of any cached data on a
                        write to disk.
    IO$M_CD_LOAD      - Loads cache with all data written to disk
                        (only when IO$M_CD_FLUSH is not specified);
                        otherwise only those blocks written that were
                        already cached will be updated in the cache.
    IO$M_CD_PAGSWAPIO - Enables caching of page and swap I/O
                        operations; otherwise, page and swap I/O
                        operations are not cached (even if not being
                        cached, page/swap writes will invalidate any
                        cached data).
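
The two argument rules above (P2 rounded up to a 64 page boundary, and at most one of the four control modifiers per request) can be illustrated with a short Python sketch. This is not driver code: the bit values and function names below are hypothetical stand-ins chosen for illustration, since the real values come from CD$IODEF, and the sketch assumes a size that is already a multiple of 64 is left unchanged.

```python
CACHE_GRANULE = 64  # pages/blocks; the rounding unit stated in the text

# HYPOTHETICAL bit values, for illustration only. The real values are
# declared by the CD$IODEF macro / CDDRIVER.OLB global symbols.
IO_M_CD_STARTUP = 0x0100
IO_M_CD_SHUTDOWN = 0x0200
IO_M_CD_PURGE = 0x0400
IO_M_CD_EXTEND = 0x0800
EXCLUSIVE_MODIFIERS = (
    IO_M_CD_STARTUP, IO_M_CD_SHUTDOWN, IO_M_CD_PURGE, IO_M_CD_EXTEND,
)


def rounded_cache_size(pages):
    """Round a requested cache size up to a multiple of 64 pages.

    Assumption: exact multiples (including 0) are returned unchanged.
    """
    if pages < 0:
        raise ValueError("cache size cannot be negative")
    return -(-pages // CACHE_GRANULE) * CACHE_GRANULE  # ceiling division


def modifiers_valid(modifiers):
    """Return True if at most one mutually exclusive modifier is set."""
    return sum(1 for bit in EXCLUSIVE_MODIFIERS if modifiers & bit) <= 1
```

For example, under these assumptions a request for 100 pages would yield a 128 page cache, and a request combining the startup and purge modifiers would be rejected.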

Once caching has been started, the IO$_SETMODE!IO$M_CD_EXTEND function can be used to add pages to the cache at any time. QIO parameter 1 (P1) specifies the number of pages to add. The other SETMODE/SETCHAR modifiers take no QIO parameters and should be self-explanatory.

SPECIAL I/O FUNCTIONS -- IO$_SENSECHAR/IO$_SENSEMODE

CDDRIVER uses the IO$_SENSECHAR/IO$_SENSEMODE functions to return information regarding the cache when the IO$M_CD_GETINFO function modifier is specified. Otherwise, the address of the UCB of the disk being cached is returned in the second long-word of the I/O status block (contents of UCB$L_DEVDEPEND).

The IO$_SENSEMODE!IO$M_CD_GETINFO QIO must specify the address and size of the user's buffer to be filled with information on the cache in QIO parameters 1 and 2 (P1 and P2) respectively. This buffer should be at least xx bytes long to avoid an SS$_BUFFEROVF return status. The data returned in the user's buffer is:

      +-------+-------+-------+-------+
   0  | Generic device name of cached |
      |  disk (counted ASCII string)  |
      /               .               /
      /               .               /
      +-------+-------+-------+-------+
  16  |  (reserved)   |  Disk unit #  |
      +-------+-------+-------+-------+
  20  |  Cache size in blocks/pages   |
      +-------+-------+-------+-------+
  24  |   Number of reads processed   |
      +-------+-------+-------+-------+
  28  |  Total number of blocks read  |
      +-------+-------+-------+-------+
  32  |Number of blocks loaded by read|
      +-------+-------+-------+-------+
  36  |  Number of blocks read once   |
      +-------+-------+-------+-------+
  40  |Number of blocks "hit" in cache|
      +-------+-------+-------+-------+
  44  |  Number of writes processed   |
      +-------+-------+-------+-------+
  48  | Total number of blocks written|
      +-------+-------+-------+-------+
  52  |Number of blocks loaded by writ|
      +-------+-------+-------+-------+
  56  | Number of blocks written once |
      +-------+-------+-------+-------+
  60

Managers should use the information returned by IO$M_CD_GETINFO to tune the cache.
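
As an illustration of working with this buffer, the Python sketch below unpacks the layout shown in the diagram and derives a hit rate from it. It rests on several assumptions that are not confirmed by the driver source: the field names are invented here, the diagram is read in the usual VAX manner (low-order word on the right, so the unit number sits at offset 16 and the reserved word at offset 18), and the hit rate is taken to be blocks hit divided by the sum of total blocks read and total blocks written.

```python
import struct

# Little-endian (VAX) layout read off the GETINFO diagram: a 16-byte
# counted ASCII name, unit word, reserved word, ten unsigned longwords.
GETINFO_FORMAT = "<16sHH10L"

# Field names are made up for this sketch; only the order comes from
# the diagram (offsets 20 through 56).
GETINFO_FIELDS = (
    "cache_size", "reads", "blocks_read", "blocks_loaded_by_read",
    "blocks_read_once", "blocks_hit", "writes", "blocks_written",
    "blocks_loaded_by_write", "blocks_written_once",
)


def unpack_getinfo(buf):
    """Decode a GETINFO buffer into a dictionary of named fields."""
    name, unit, _reserved, *counters = struct.unpack_from(GETINFO_FORMAT, buf)
    info = dict(zip(GETINFO_FIELDS, counters))
    length = name[0]  # first byte of a counted string is its length
    info["device"] = name[1:1 + length].decode("ascii")
    info["unit"] = unit
    return info


def hit_rate(info):
    """Assumed definition: hits / (blocks read + blocks written)."""
    total = info["blocks_read"] + info["blocks_written"]
    return info["blocks_hit"] / total if total else 0.0
```

A monitoring tool might call unpack_getinfo on each buffer returned by the QIO and compare hit_rate against the break-even figure quoted in this document.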
The cache hit rate is determined by the number of blocks "hit" in cache divided by the sum of the total blocks read and the total blocks written. Preliminary performance data suggests that a hit rate of at least 15-20% is required to "break even" with the overhead incurred by the caching algorithm.

The "number of blocks read once" indicates how often data loaded into cache was never accessed again; if this is high, CDDRIVER's overhead is probably hurting performance. The "number of blocks written once" indicates how often a block was loaded into cache after a write and never read again. If this number is high and the cache was initialized with IO$M_CD_LOAD, reinitialize the cache without this option. If this number is still high, using IO$M_CD_FLUSH (i.e. never load cache on write) should improve performance.

SPECIAL I/O FUNCTIONS -- IO$_READVBLK/IO$_READLBLK

The standard read QIO functions allow a process to monitor all I/O requests issued for the disk associated with the cache driver. Two subfunction modifiers may be specified:

    IO$M_TIMED        - Controls timeout of the request with
                        resolution of +/- 10 msec via QIO parameter 3
                        (P3).
    IO$M_CD_RRD       - Records all direct read and write operations;
                        otherwise only write operations are recorded
                        in the user's buffer.

The read QIO completes when the user's buffer is filled or the timeout period specified with the IO$M_TIMED modifier has expired.

The user's data buffer is filled with fixed length records containing information on all direct I/O write and read (if IO$M_CD_RRD specified) operations intercepted by CDDRIVER. Two record formats are defined depending on whether the original request was a virtual or logical operation.
For logical block I/O (IRP$V_VIRTUAL bit clear in IRP$W_STS), the record format is:

      +-------+-------+-------+-------+
   0  |   IRP$W_STS   |  IRP$W_FUNC   |
      +-------+-------+-------+-------+
   4  |    Starting Logical Block     |
      +-------+-------+-------+-------+
   8  |     Transfer Block Count      |
      +-------+-------+-------+-------+
  12  |       Requestor's IPID        |
      +-------+-------+-------+-------+

For virtual block I/O (IRP$V_VIRTUAL bit set in IRP$W_STS), the record format is:

      +-------+-------+-------+-------+
   0  |   IRP$W_STS   |  IRP$W_FUNC   |
      +-------+-------+-------+-------+
   4  |    Starting Logical Block     |
      +-------+-------+-------+-------+
   8  |     Transfer Block Count      |
      +-------+-------+-------+-------+
  12  |       Requestor's IPID        |
      +-------+-------+-------+-------+
  16  |    Starting Virtual Block     |
      +-------+-------+-------+-------+
  20  |File Seq Number|  File Number  |
      +-------+-------+-------+-------+
  24  | Segmented I/O | Rel Vol Number|
      +-------+-------+-------+-------+

Consult the "Guide to Writing a Device Driver" for the definition of fields within the IRP$W_FUNC and IRP$W_STS words. The "Segmented I/O" field is set "true" if the operation has been segmented by the XQP/ACP. Note also that the PID returned is the VMS "internal PID" and must be converted to an external PID (see VMS release notes) to correspond with PIDs displayed by SHOW PROCESS, etc.

The number of bytes returned in the user's data buffer is placed in the second 16-bit word of the I/O status block.

USE WITH VDDRIVER

One of the primary reasons a virtual disk driver was included with this submission was for use with CDDRIVER. If, by recording I/O operations with CDDRIVER, a specific set of files can be identified as having relatively high I/O rates, a virtual disk can be assigned to a contiguous container file, initialized, mounted, and accessed just as any other physical disk device. By copying the high I/O rate files to the virtual disk and then caching the virtual disk, only these files will be subject to caching.
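
One way to identify the high I/O rate files from recorded operations is to total the transfer block counts per file identifier. The Python sketch below is illustrative only and makes assumptions beyond what this document states: it expects the caller to have already split the monitoring buffer into individual records, it treats any record that is not 28 bytes long as a logical-format record and skips it, and it reads the virtual-format diagram with the low-order word on the right (file number at offset 20, file sequence number at offset 22, relative volume number at offset 24). The function and field names are hypothetical.

```python
import struct
from collections import Counter

# Virtual-block record as read from the diagram (little-endian, 28 bytes):
# FUNC, STS, starting LBN, block count, IPID, starting VBN,
# file number, file sequence number, relative volume number, segmented flag.
VIRTUAL_RECORD = struct.Struct("<HHLLLLHHHH")


def blocks_per_file(records):
    """Total the blocks transferred for each (file, sequence, volume)."""
    totals = Counter()
    for rec in records:
        if len(rec) != VIRTUAL_RECORD.size:
            continue  # assumed logical-format record: no file identity
        (_func, _sts, _lbn, count, _ipid, _vbn,
         file_num, file_seq, rel_vol, _segmented) = VIRTUAL_RECORD.unpack(rec)
        totals[(file_num, file_seq, rel_vol)] += count
    return totals
```

Calling most_common() on the returned Counter lists the busiest files first, which gives a candidate set to copy to the virtual disk.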
The cache can then be sized to optimize performance and minimize the impact on physical memory.

Alternatively, the virtual disk size and cache size could be made identical (within the constraints of available memory) and all files on the virtual disk copied to the null device to initially load the cache. From this point on, only write operations will be performed to the physical disk, and the VDDRIVER/CDDRIVER performance should start to approach the speed of a true "memory disk" with the added benefit that the virtual/physical disk device always contains current data.

VERSION 5 IMPLEMENTATION

Under Version 4 of VMS (CDDRIVER.V47), CDDRIVER forked to IPL$_QUEUEAST to access the cache data structures. It was thought that this might improve overall system performance since the driver wouldn't be hogging time at IPL$_SYNC when it didn't have to. With Version 5 and SMP, enough of the changes to the I/O support routines were unclear (working without fiche) that CDDRIVER.V50 executes at the cached disk driver's fork IPL. The biggest reason for this was that when the disk driver is called at fork level, the driver (or someone) apparently could specify some "affinity" for which processor the driver should be executing on. A fork to IPL$_QUEUEAST may break this affinity.

*** NOTE ***

Despite the use of SMP macros in CDDRIVER.V50, the driver has never been linked or tested in an SMP environment. To the best of my meager understanding of spin locks, etc., I think the coding is correct.

TEST PROGRAMS

A subdirectory of test programs has been included with this submission. These programs were written to debug/exercise CDDRIVER; they may be in various states of (in)completion so don't judge them too harshly.

Paul R. Sorenson
AEP/Engineering Computer Support Center
1 Riverside Plaza
Columbus, OH 43215