This document describes the Failure Trace Archive format.
The trace format is organized hierarchically as follows:\\
Platform -> Node -> Component -> Event Trace.
INSERT picture of schema.
We summarize the meaning of each table below. Table names are shown in bold.
A '''platform''' contains a set of nodes. Examples of platforms include SETI@home and the desktops at Microsoft.
A '''node''' contains a set of components, each of which is a software module or hardware resource of the node. Each node can have several components (e.g. CPU speed, available memory, client availability), each of which has a corresponding trace.
A '''component''' describes attributes of a software module or hardware resource of a node.
'''component_perf''' is the component performance, as measured, for example, through benchmarks.
A '''creator''' is the person responsible for the trace data set. This table stores details about citations and copyright.
An '''event_trace''' is the trace of an event, with all of the corresponding timing information.
'''event_state''' is the state corresponding to an event_trace. For example, for CPU availability, the event_state could be the idleness of the CPU. For host availability, it could be the monitoring information associated with the event.
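The Platform -> Node -> Component -> Event Trace hierarchy described above can be sketched as nested records. This is only an illustrative sketch, not part of the format itself: the class names, the `count_events` helper, and the sample values (other than the platform name, which is taken from the example in the platform table) are hypothetical, and only a few attributes of each table are shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventTrace:
    event_id: int
    event_type: int          # 0 -> unavailability, 1 -> availability
    event_start_time: float  # UNIX epoch time
    event_end_time: float    # UNIX epoch time


@dataclass
class Component:
    component_id: int
    component_type: int
    events: List[EventTrace] = field(default_factory=list)


@dataclass
class Node:
    node_id: int
    node_name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class Platform:
    platform_id: int
    platform_name: str
    nodes: List[Node] = field(default_factory=list)


def count_events(platform: Platform) -> int:
    """Walk Platform -> Node -> Component -> Event Trace and count the events."""
    return sum(len(c.events) for n in platform.nodes for c in n.components)


# Hypothetical sample data: one platform, one node, one component, two events.
cpu = Component(component_id=100, component_type=0)
cpu.events.append(EventTrace(1000, 1, 0.0, 3600.0))
cpu.events.append(EventTrace(1001, 0, 3600.0, 4000.0))
node = Node(node_id=10, node_name="node-a", components=[cpu])
platform = Platform(1, "Berkeley_NOW_Lab_Fall_1998", nodes=[node])
```

In the actual tables, the hierarchy is expressed through the platform_id, node_id, and component_id columns rather than through nesting.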
----
A description of the table attributes appears below.
|| border=1 width=100%
||! '''platform''' !||
||platform_id ||A unique number identifying this platform. It allows one to differentiate pools of nodes.||
||platform_name ||name of the platform (e.g. "Berkeley_NOW_Lab_Fall_1998") ||
||platform_location ||location name of the platform source (e.g. "Berkeley NOW Lab - Soda Hall 2nd Floor, USA, Planet Earth") ||
||platform_type ||type of the platform (cluster, multicluster, grid, desktop_grid, or volunteer_computing) ||
||misc_notes ||miscellaneous notes ||
|| border=1 width=100%
||! '''node''' !||
||node_id ||unique ID for this node ||
||platform_id ||ID of the platform containing this node ||
||node_name ||name of node||
||node_ip ||IP address ||
||node_location ||location of the node (e.g. country, geographic coordinates) ||
||timezone ||time zone of the resource (offset from GMT in seconds)||
||proc_model ||processor name, model, version number||
||os_name ||name and version of the resource OS ||
||cores_per_proc ||number of cores per processor ||
||num_procs ||number of processors for this node||
||mem_size ||number of bytes of memory ||
||disk_size ||number of bytes of disk space||
||up_bw ||number of bytes/sec of upload speed ||
||down_bw ||number of bytes/sec of download speed ||
||metric_id ||unique ID for performance metric (e.g. benchmark) ||
||notes ||other notes related to this resource||
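As an illustration of how the node attributes might be used, the following sketch converts an epoch timestamp to the node's local time using the timezone offset, and derives the total core count from cores_per_proc and num_procs. The function names are hypothetical, and the sketch assumes that a positive timezone value means east of GMT; the table does not state the sign convention.

```python
from datetime import datetime, timedelta, timezone


def node_local_time(epoch_seconds, tz_offset_seconds):
    """Convert a UNIX epoch time to the node's local time.

    Assumption: tz_offset_seconds is positive east of GMT.
    """
    tz = timezone(timedelta(seconds=tz_offset_seconds))
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


def total_cores(cores_per_proc, num_procs):
    """Total number of cores on a node."""
    return cores_per_proc * num_procs
```

For example, `node_local_time(0, -28800)` would give the local time of the epoch origin on a node eight hours west of GMT, and `total_cores(4, 2)` returns 8.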
|| border=1 width=100%
||! '''node_perf''' !||
||metric_id ||unique ID for performance metric (benchmark)||
||component_id ||unique ID for the component||
||node_id ||unique ID for this node ||
||platform_id ||ID of platform containing node ||
||sfpop_speed ||maximum single precision floating point speed (ops/sec)||
||dfpop_speed ||maximum double precision floating point speed (ops/sec)||
||iop_speed ||integer operation speed (ops/sec)||
||i_val ||integer ||
||f_val ||float ||
||s_val ||string||
|| border=1 width=100%
||! '''component''' !||
||component_id ||unique ID for this component ||
||node_id ||ID of the node containing this component ||
||platform_id ||ID of platform containing this node ||
||node_name ||Name of the node||
||component_type ||type of this component trace (0 -> host availability, network, CPU, client, memory, etc) ||
||trace_start ||when the trace event first appeared (epoch time)||
||trace_end ||when the trace event last appeared (epoch time)||
||resolution ||resolution of the traces in seconds||
|| border=1 width=100%
||! '''creator''' !||
||creator_id ||ID of creator of this component trace data ||
||component_id ||unique ID for this component trace data ||
||node_id ||ID of the node corresponding to this trace ||
||platform_id ||ID of platform containing node||
||creator ||name(s) of the person(s) who recorded the event traces||
||cite ||citation (bibtex, etc) for using the data from the event traces ||
||copyright ||details of the copyright and rights reserved||
|| border=1 width=100%
||! '''event_trace''' !||
||event_id ||unique ID of event state||
||component_id ||unique ID for this component trace data||
||node_id ||unique ID for this node ||
||platform_id ||ID of platform that is the node parent||
||node_name ||name of node||
||event_type ||type of event (0 -> unavailability, 1 -> availability). Event IDs up to 10,000 are reserved; the rest can be user-defined ||
||event_start_time ||start of this event (UNIX epoch time)||
||event_end_time ||end of this event (UNIX epoch time) ||
||event_end_reason ||reason the event type or state changed at the end of this trace (for example, reason that CPU became unavailable: 0=undefined, 1=miscellaneous, 2=mouse_activity, 3=keyboard_activity, 4=scheduled_downtime, 5=graceful_shutdown, 6=hard_shutdown)||
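A short sketch of how event_trace rows might be consumed: it decodes the example event_end_reason codes listed above and computes the fraction of traced time during which a component was available. The `EndReason` enum, the `availability_fraction` function, and the tuple layout are hypothetical; the reason codes are the CPU example values from the table and may differ for other component types.

```python
from enum import IntEnum


class EndReason(IntEnum):
    """Example event_end_reason codes (CPU unavailability example)."""
    UNDEFINED = 0
    MISCELLANEOUS = 1
    MOUSE_ACTIVITY = 2
    KEYBOARD_ACTIVITY = 3
    SCHEDULED_DOWNTIME = 4
    GRACEFUL_SHUTDOWN = 5
    HARD_SHUTDOWN = 6


UNAVAILABLE, AVAILABLE = 0, 1  # event_type values


def availability_fraction(events):
    """Return available time divided by total traced time.

    events: iterable of (event_type, event_start_time, event_end_time,
    event_end_reason) tuples, with times in UNIX epoch seconds.
    """
    total = 0.0
    available = 0.0
    for event_type, start, end, _reason in events:
        duration = end - start
        if duration < 0:
            raise ValueError("event_end_time precedes event_start_time")
        total += duration
        if event_type == AVAILABLE:
            available += duration
    return available / total if total > 0 else 0.0


# Hypothetical trace: available for 3000 s until keyboard activity,
# then unavailable for 1000 s.
sample_events = [
    (AVAILABLE, 0.0, 3000.0, EndReason.KEYBOARD_ACTIVITY),
    (UNAVAILABLE, 3000.0, 4000.0, EndReason.UNDEFINED),
]
```

With the sample data, `availability_fraction(sample_events)` evaluates to 0.75.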
|| border=1 width=100%
||! '''event_state''' !||
||event_id ||unique ID of event state ||
||component_id ||unique ID for this component trace data ||
||node_id ||unique ID for this node ||
||platform_id ||ID of platform that is the node parent||
||i_val ||integer ||
||f_val ||float (for example, 0% - 100% for CPU availability)||
||s_val ||string||
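Since event_state rows share event_id, component_id, node_id, and platform_id with event_trace rows, the two tables can be joined on those columns. The sketch below uses an in-memory SQLite database; the column types, the subset of columns, the function names, and the sample rows are assumptions made for illustration, not the archive's actual DDL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with simplified, assumed table definitions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE event_trace (
            event_id INTEGER, component_id INTEGER,
            node_id INTEGER, platform_id INTEGER,
            event_type INTEGER,
            event_start_time REAL, event_end_time REAL
        );
        CREATE TABLE event_state (
            event_id INTEGER, component_id INTEGER,
            node_id INTEGER, platform_id INTEGER,
            i_val INTEGER, f_val REAL, s_val TEXT
        );
    """)
    # Hypothetical rows: one availability event and one unavailability event.
    conn.executemany(
        "INSERT INTO event_trace VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(1, 100, 10, 1, 1, 0.0, 3600.0),
         (2, 100, 10, 1, 0, 3600.0, 4000.0)],
    )
    # f_val holds CPU availability as a percentage (0 - 100).
    conn.executemany(
        "INSERT INTO event_state VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(1, 100, 10, 1, None, 87.5, None),
         (2, 100, 10, 1, None, 0.0, None)],
    )
    return conn


def events_with_state(conn):
    """Return (event_id, event_type, duration in seconds, f_val) per event."""
    return conn.execute("""
        SELECT t.event_id, t.event_type,
               t.event_end_time - t.event_start_time AS duration,
               s.f_val
        FROM event_trace AS t
        JOIN event_state AS s
          ON s.event_id = t.event_id
         AND s.component_id = t.component_id
         AND s.node_id = t.node_id
         AND s.platform_id = t.platform_id
        ORDER BY t.event_id
    """).fetchall()
```

Calling `events_with_state(build_demo_db())` returns one row per event, pairing each event's duration with the f_val recorded for it.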