From Failure Trace Archive

This document describes the Failure Trace Archive format.

The trace format is organized hierarchically as follows:
Platform -> Node -> Component -> Event Trace.

INSERT picture of schema.

We summarize the meaning of each table below. Table names are shown in bold.

A platform contains a set of nodes. Examples of platforms include SETI@home and desktops at Microsoft.

A node contains a set of components, each of which is a software module or hardware resource of the node. Each node can have several components (e.g. CPU speed, available memory, client availability), each of which has a corresponding trace.

A component describes attributes of a software module or hardware resource of a node.

component_perf is the performance of a component, as measured through benchmarks, for example.

A creator is the person responsible for the trace data set. This table stores details about citations and copyright.

An event_trace is the trace of an event, with all of its corresponding timing information.

event_state is the state corresponding to an event_trace. For example, for CPU availability, the event_state could be the idleness of the CPU. For host availability, it could be the monitoring information associated with the event.
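
The Platform -> Node -> Component -> Event Trace hierarchy described above can be sketched as nested Python dataclasses. This is only an illustration, not part of the archive format: the class names, the `count_events` helper, and the node name "node-a" are assumptions, and only a few of the attributes listed later in this document are included.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventTrace:
    event_type: int          # 0 -> unavailability, 1 -> availability
    event_start_time: float  # UNIX epoch time
    event_end_time: float    # UNIX epoch time


@dataclass
class Component:
    component_id: int
    component_type: int
    events: List[EventTrace] = field(default_factory=list)


@dataclass
class Node:
    node_id: int
    node_name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class Platform:
    platform_id: int
    platform_name: str
    nodes: List[Node] = field(default_factory=list)


def count_events(platform: Platform) -> int:
    """Count every event trace below a platform (hypothetical helper)."""
    return sum(len(c.events) for n in platform.nodes for c in n.components)


# One platform with one node, one component, and two events.
platform = Platform(1, "Berkeley_NOW_Lab_Fall_1998", [
    Node(1, "node-a", [
        Component(1, 0, [EventTrace(1, 0.0, 60.0), EventTrace(0, 60.0, 90.0)]),
    ]),
])
```

In the archive itself the hierarchy is expressed through the ID attributes (platform_id, node_id, component_id, event_id) rather than through nesting.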


A description of the table attributes appears below.

platform
  platform_id: unique number identifying this platform; it allows one to differentiate pools of nodes
  platform_name: name of the platform (e.g. "Berkeley_NOW_Lab_Fall_1998")
  platform_location: location name of the platform source (e.g. "Berkeley NOW Lab - Soda Hall 2nd Floor, USA, Planet Earth")
  platform_type: type of the platform (cluster, multicluster, grid, desktop_grid, or volunteer_computing)
  misc_notes: miscellaneous notes
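
As a rough illustration, the platform table could be created in SQLite through Python's standard `sqlite3` module. The column types, the primary key, the NOT NULL and CHECK constraints, and the choice of "cluster" as the type of the example platform are all assumptions; this document only gives the attribute names and their meanings.

```python
import sqlite3

# Platform types listed in the attribute description.
PLATFORM_TYPES = ("cluster", "multicluster", "grid",
                  "desktop_grid", "volunteer_computing")


def create_platform_table(conn):
    """Create a platform table; the column types are assumed, not specified."""
    allowed = ", ".join(f"'{t}'" for t in PLATFORM_TYPES)
    conn.execute(f"""
        CREATE TABLE platform (
            platform_id       INTEGER PRIMARY KEY,
            platform_name     TEXT NOT NULL,
            platform_location TEXT,
            platform_type     TEXT CHECK (platform_type IN ({allowed})),
            misc_notes        TEXT
        )
    """)


conn = sqlite3.connect(":memory:")
create_platform_table(conn)
conn.execute(
    "INSERT INTO platform VALUES (?, ?, ?, ?, ?)",
    (1, "Berkeley_NOW_Lab_Fall_1998",
     "Berkeley NOW Lab - Soda Hall 2nd Floor, USA, Planet Earth",
     "cluster", None),
)
row = conn.execute(
    "SELECT platform_name, platform_type FROM platform WHERE platform_id = 1"
).fetchone()
```

With the CHECK constraint in place, inserting a platform_type outside the listed values is rejected by SQLite.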

node
  node_id: unique ID for this node
  platform_id: ID of the platform containing this node
  node_name: name of the node
  node_ip: IP address of the node
  node_location: location of the node (e.g. country, geographic coordinates)
  timezone: time zone of the resource (offset from GMT in seconds)
  proc_model: processor name, model, and version number
  os_name: name and version of the resource OS
  cores_per_proc: number of cores per processor
  num_procs: number of processors in this node
  mem_size: memory size in bytes
  disk_size: disk space in bytes
  up_bw: upload speed in bytes/sec
  down_bw: download speed in bytes/sec
  metric_id: unique ID for performance metric (e.g. benchmark)
  notes: other notes related to this resource
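
A small sketch of how two of the node attributes might be used: converting a UNIX epoch timestamp to the node's local time with the timezone offset, and deriving the total core count. The function names are hypothetical, and the sign convention (positive offsets lie east of GMT) is an assumption, since this document does not state it.

```python
from datetime import datetime, timedelta, timezone


def local_time(epoch_seconds, tz_offset_seconds):
    """Convert epoch time to node-local time.

    Assumes a positive offset means east of GMT.
    """
    tz = timezone(timedelta(seconds=tz_offset_seconds))
    return datetime.fromtimestamp(epoch_seconds, tz)


def total_cores(num_procs, cores_per_proc):
    """Total number of cores in a node."""
    return num_procs * cores_per_proc


# One day after the epoch, on a node one hour east of GMT.
stamp = local_time(86400, 3600)
cores = total_cores(num_procs=2, cores_per_proc=4)
```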

node_perf
  metric_id: unique ID for performance metric (benchmark)
  component_id: unique ID for the component
  node_id: unique ID for this node
  platform_id: ID of platform containing node
  sfpop_speed: maximum single precision floating point speed (ops/sec)
  dfpop_speed: maximum double precision floating point speed (ops/sec)
  iop_speed: integer operation speed (ops/sec)
  i_val: integer value
  f_val: float value
  s_val: string value

component
  component_id: unique ID for this component
  node_id: ID of the node containing this component
  platform_id: ID of platform containing this node
  node_name: name of the node
  component_type: type of this component trace (0 -> host availability, network, CPU, client, memory, etc.)
  trace_start: when the trace event first appeared (epoch time)
  trace_end: when the trace event last appeared (epoch time)
  resolution: resolution of the traces in seconds
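
To make the relationship between trace_start, trace_end, and resolution concrete, the sketch below estimates how many measurement intervals a component trace spans. It assumes that resolution is the spacing between consecutive measurements, which this document does not state explicitly; the function name is hypothetical.

```python
def expected_intervals(trace_start, trace_end, resolution):
    """Number of whole resolution-sized intervals between trace_start and trace_end.

    All arguments are in seconds (trace_start and trace_end are epoch times).
    """
    if resolution <= 0:
        raise ValueError("resolution must be a positive number of seconds")
    if trace_end < trace_start:
        raise ValueError("trace_end must not precede trace_start")
    return int((trace_end - trace_start) // resolution)


# A one-hour trace sampled once per minute.
intervals = expected_intervals(trace_start=1000, trace_end=4600, resolution=60)
```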

creator
  creator_id: ID of creator of this component trace data
  component_id: unique ID for this component trace data
  node_id: ID of the node corresponding to this trace
  platform_id: ID of platform containing node
  creator: name(s) of the person(s) who recorded the event traces
  cite: citation (bibtex, etc.) for using the data from the event traces
  copyright: details of the copyright and rights reserved

event_trace
  event_id: unique ID of event state
  component_id: unique ID for this component trace data
  node_id: unique ID for this node
  platform_id: ID of platform that is the node parent
  node_name: name of node
  event_type: type of event (0 -> unavailability, 1 -> availability). Event IDs up to 10,000 are reserved; the rest can be user-defined
  event_start_time: start of this event (UNIX epoch time)
  event_end_time: end of this event (UNIX epoch time)
  event_end_reason: reason the event type or state changed at the end of this trace (for example, the reason that the CPU became unavailable: 0=undefined, 1=miscellaneous, 2=mouse_activity, 3=keyboard_activity, 4=scheduled_downtime, 5=graceful_shutdown, 6=hard_shutdown)
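
As an example of working with event_trace rows, the sketch below computes the fraction of traced time a component was available and decodes the example event_end_reason codes given above. The function name and the tuple layout of the input rows are assumptions made for illustration.

```python
# Example reason codes from the event_end_reason description.
EVENT_END_REASONS = {
    0: "undefined",
    1: "miscellaneous",
    2: "mouse_activity",
    3: "keyboard_activity",
    4: "scheduled_downtime",
    5: "graceful_shutdown",
    6: "hard_shutdown",
}


def availability_fraction(events):
    """Fraction of traced time spent in availability events (event_type == 1).

    events: iterable of (event_type, event_start_time, event_end_time) tuples.
    """
    available = 0.0
    total = 0.0
    for event_type, start, end in events:
        duration = end - start
        total += duration
        if event_type == 1:
            available += duration
    return available / total if total > 0 else 0.0


# 300 seconds available, then 100 seconds unavailable.
events = [(1, 0, 300), (0, 300, 400)]
fraction = availability_fraction(events)
reason = EVENT_END_REASONS.get(3, "undefined")
```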

event_state
  event_id: unique ID of event state
  component_id: unique ID for this component trace data
  node_id: unique ID for this node
  platform_id: ID of platform that is the node parent
  i_val: integer value
  f_val: float value (for example, 0% - 100% for CPU availability)
  s_val: string value
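
Because event_state rows carry the measured value and event_trace rows carry the timing, combining the two yields time-weighted statistics. The sketch below computes a time-weighted mean of f_val (for example, CPU availability in percent). Matching rows by event_id alone, the dictionary layout, and the function name are assumptions; a real data set may need the component, node, and platform IDs as well.

```python
def time_weighted_mean(traces, states):
    """Time-weighted mean of f_val over the events present in both inputs.

    traces: dict mapping event_id -> (event_start_time, event_end_time)
    states: dict mapping event_id -> f_val
    Returns None when no event has a positive duration.
    """
    weighted = 0.0
    total = 0.0
    for event_id, (start, end) in traces.items():
        if event_id not in states:
            continue  # no state recorded for this event
        duration = end - start
        weighted += states[event_id] * duration
        total += duration
    return weighted / total if total > 0 else None


# CPU 100% available for 100 seconds, then 50% available for 300 seconds.
traces = {1: (0, 100), 2: (100, 400)}
states = {1: 100.0, 2: 50.0}
mean_availability = time_weighted_mean(traces, states)
```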
Retrieved from http://fta.scem.westernsydney.edu.au/index.php?n=Main.WikiSandbox
Page last modified on July 10, 2009, at 08:57 PM