94 THE FRACTAL STRUCTURE OF DATA REFERENCE
Figure 7.8. Persistent cylinder images as a function of window size.
Figure 7.9. Persistent file storage as a function of window size.
Transient and Persistent Data Access
95
Figure 7.10. Requests to persistent track images as a function of window size.
Figure 7.11. Requests to persistent cylinder images as a function of window size.
Figure 7.12. Requests to persistent files as a function of window size.
and file storage that met the persistence criterion stated by (7.3). It should be
noted that the percentage of persistent data depends even more strongly on
the level of granularity than does the percentage of active data. For track
images, 10 to 20 percent of active data were found to be persistent in a window
of 24 hours; for files, the corresponding percentages were in the range of 50 to
75 percent. The phenomenon of persistence appears to be particularly important
at the file level of granularity.
Figures 7.10 through 7.12 present the amount of I/O associated with the
persistent data just discussed. Again, we see that persistence is increasingly
important at higher levels of granularity. At both of the installations presented
in the figure, 90 percent or more of the I/O over a period of 24 hours was
associated with persistent files.
The results of Figures 7.10 through 7.12 provide a strong confirmation
that I/O tuning is worthwhile, despite the large fluctuations of load typical
of measurements taken at different times or on different days. A substantial
fraction of all files do exhibit persistent activity, and those that do tend to be
the ones that dominate the overall I/O load.
3. PERIODS UP TO ONE MONTH

We now focus strictly on file activity, as observed using the OS/390 System
Management Facilities (SMF). The use of SMF, rather than I/O tracing, allows
much longer time windows to be analyzed. The results of this section are based
mainly on the file open/close event traces contained in the SMF record types
14, 15, 62, and 64, plus miscellaneous other records relating to file creates,
renames, and deletes. Due to the use of this source of data, I/O activity is
accounted for based upon the software-supplied EXecute Channel Program
(EXCP) counts, as placed into the SMF records just mentioned.
This section presents the results of a study in which SMF data over a period
of one month was obtained at two OS/390 installations. These were:
C. A moderate-sized installation with a mix of on-line CICS, IMS, and DB2
database activity, plus TSO.
D. A large installation with on-line DB2 database, batch, and TSO activity.
Both installations had adopted active policies for Hierarchical Storage
Management (HSM). At both installations, general-purpose (primary) disk storage
contained, for the most part, only files referenced within the relatively recent
past. The policies for the management of the remaining files, administered
via System Managed Storage (SMS), called for SMS to migrate unused data,
first to a compressed disk storage archive, then to tape after a further period of
non-use. SMS would also recall such unused data, back to primary storage, on
an as-needed basis.
To a remarkable degree, the files opened at both installations tended to be
ones that had previously been open in the very recent past. Nevertheless, long
gaps between open requests to a given file also occurred with a substantial
probability. Figure 7.13 presents the resulting distribution of file interarrival
times at each installation.
Figure 7.13. Distribution of file interarrival times, based upon open requests.
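As a rough illustration of how a distribution of this kind might be tabulated, the following sketch computes the gaps between successive opens of each file and the fraction of gaps that are at least a given length. The record layout (a file name paired with an open time), the function names, and the sample trace are all hypothetical; they do not reproduce the SMF record format or the exact procedure used to prepare the figure.

```python
from collections import defaultdict


def interarrival_times(open_events):
    """Return the gaps between successive opens of the same file.

    open_events: iterable of (file_name, open_time) pairs in any order.
    This layout is a hypothetical simplification, not the SMF format.
    """
    opens_by_file = defaultdict(list)
    for name, open_time in open_events:
        opens_by_file[name].append(open_time)

    gaps = []
    for times in opens_by_file.values():
        times.sort()
        # Pair each open with the next open of the same file.
        gaps.extend(later - earlier for earlier, later in zip(times, times[1:]))
    return gaps


def survival_curve(gaps):
    """Return (x, fraction of gaps >= x) for each distinct gap length x."""
    ordered = sorted(gaps)
    n = len(ordered)
    curve = []
    for i, x in enumerate(ordered):
        if i == 0 or x != ordered[i - 1]:
            # Every gap from position i onward is at least as long as x.
            curve.append((x, (n - i) / n))
    return curve


# Illustrative trace with times in hours (made-up data).
events = [("A", 0), ("A", 1), ("A", 3), ("B", 2), ("B", 200)]
gaps = interarrival_times(events)
curve = survival_curve(gaps)
```

In this made-up trace, file A contributes two short gaps and file B one long gap, so one third of the gaps are 198 hours or longer. The first open of each file has no predecessor and contributes no gap; whether the study handled first opens this way is an assumption.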
It should not be surprising that the curves for both installations exhibit
heavy-tailed behavior. Both curves appear to conform reasonably well to a
mathematical model of the form (1.4), in that both resemble a straight line
when plotted in a log-log format.
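One simple way to check for straight-line behavior in a log-log plot is to fit a least-squares line to the logarithms of the plotted points. The sketch below does this for a generic power law of the form p = c * x ** slope. This generic form is an assumption made for illustration; it is not claimed to match the exact parameterization of (1.4), and the function name and sample points are hypothetical.

```python
import math


def loglog_fit(xs, ps):
    """Fit a least-squares line to (log x, log p); return (slope, intercept).

    Points with non-positive x or p are skipped, since their logarithm
    is undefined.
    """
    pts = [(math.log(x), math.log(p)) for x, p in zip(xs, ps) if x > 0 and p > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points with positive coordinates")
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u


# Synthetic points lying exactly on p = 0.5 * x ** -0.4 (made-up data).
xs = [1.0, 10.0, 100.0, 1000.0]
ps = [0.5 * x ** -0.4 for x in xs]
slope, intercept = loglog_fit(xs, ps)
```

For points that lie exactly on a power law, the fitted slope recovers the exponent and the exponential of the intercept recovers the constant. With measured data, the quality of the fit gives only an informal indication of heavy-tailed behavior.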
Very few requests (about 0.5 percent at installation C, for example) ask for
data that have not been used for five days or longer. Thus, Figure 7.13 suggests
that an HSM policy calling for migration of unused data after one week or
more would have a good chance of being acceptable from the standpoint of
application performance.
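The reasoning about migration age can be tried out against a trace of open requests. The following sketch replays such a trace under a hypothetical policy that migrates any file left unopened for a stated number of days, and reports the fraction of opens that would find the file already migrated. The event layout, the function name, and the treatment of a file's first open (counted as not needing a recall) are assumptions made for illustration, not the method used in the study.

```python
def recall_fraction(open_events, migrate_after_days):
    """Return the fraction of opens that would require a recall.

    open_events: iterable of (file_name, open_time_in_days) pairs.
    A file is treated as migrated once it has gone unopened for at
    least migrate_after_days (a simplified, hypothetical policy).
    """
    last_open = {}
    opens = 0
    recalls = 0
    # Replay the opens in time order.
    for name, t in sorted(open_events, key=lambda event: event[1]):
        opens += 1
        previous = last_open.get(name)
        if previous is not None and t - previous >= migrate_after_days:
            recalls += 1
        last_open[name] = t
    return recalls / opens if opens else 0.0


# Illustrative trace with times in days (made-up data).
events = [("A", 0.0), ("B", 0.5), ("A", 1.0), ("B", 9.0), ("A", 9.5)]
after_one_week = recall_fraction(events, 7)    # 2 of the 5 opens need a recall
after_ten_days = recall_fraction(events, 10)   # no open needs a recall
```

Running the replay for several candidate migration ages shows how quickly the recall fraction falls as the age is increased, which is the trade-off the paragraph above describes.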
Figure 7.14 presents the average amount of storage associated with files that
were active during various windows of time, ranging from about 15 hours up
to 31 days. In cases where a file was created or scratched during a given study
window, Figure 7.14 includes only the file’s storage while allocated.
Figure 7.14 is adjusted, however, to ignore the impact of storage management.
For example, if a file was migrated to tape during a given study window,
then this action has no effect on the storage demand accounted for by the figure.
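One possible reading of this accounting is sketched below. The study period is divided into consecutive windows of equal length; in each window, every file opened during the window contributes its size, weighted by the fraction of the window during which it was allocated; the result is averaged over all windows. The time weighting, the record layout, and all names are assumptions made for illustration, since the exact calculation behind the figure is not spelled out here. Migration events are deliberately absent from the record, so they cannot affect the result.

```python
from typing import NamedTuple, Optional, Tuple


class FileHistory(NamedTuple):
    size: float                 # allocated storage, e.g. in megabytes
    created: float              # creation time, in days from study start
    scratched: Optional[float]  # deletion time, or None if never scratched
    opens: Tuple[float, ...]    # times at which the file was opened


def average_active_storage(files, window, study_length):
    """Average, over consecutive windows, of storage held by active files."""
    n_windows = int(study_length // window)
    if n_windows == 0:
        raise ValueError("window is longer than the study period")
    total = 0.0
    for k in range(n_windows):
        start, end = k * window, (k + 1) * window
        for f in files:
            # A file is active in a window if it was opened during it.
            if not any(start <= t < end for t in f.opens):
                continue
            # Count storage only for the part of the window in which
            # the file was allocated.
            alloc_start = max(start, f.created)
            alloc_end = end if f.scratched is None else min(end, f.scratched)
            allocated = max(0.0, alloc_end - alloc_start)
            total += f.size * allocated / window
    return total / n_windows


# Made-up data: one static file opened daily, one short-lived file.
static = FileHistory(100.0, 0.0, None, (0.5, 1.5, 2.5, 3.5))
transient = FileHistory(40.0, 1.0, 1.5, (1.0,))
static_only = average_active_storage([static], window=1.0, study_length=4.0)
both = average_active_storage([static, transient], window=1.0, study_length=4.0)
```

Under these assumptions, a static file that is opened in every window contributes its full size regardless of the window length chosen.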
To understand the implications of Figure 7.14, it is useful to think through
what the figure would look like in several specific examples:
1. A collection of static, continuously active files. In this case, the figure
would be a straight, horizontal line.
2. A series of transient files which are created at random times, referenced at
the time that they are created, not referenced afterward, and never scratched.
The longer such files are allowed to accumulate (the more generous we are
Figure 7.14. Average active file storage over periods up to one month.