The Archive eXchange Format (AXF) is an open standard focused on the long-term storage, preservation and transport of file-based content.
The Archive eXchange Format (AXF) standard is an encapsulation format for generic file-based content which allows any size and type of file or file collection to be stored, transported and preserved on any type of operating system, file system, storage media or technology.
AXF provides a physical implementation of an object store. It was developed to address the need for a long-term, open storage and preservation format and to overcome the limitations of legacy container formats such as Tape ARchive (TAR). AXF abstracts the underlying file system and operating system, facilitating long-term portability and accessibility of the files in the container, while adding several Open Archival Information System (OAIS) preservation characteristics to provide long-term protection for file-based assets. AXF defines the specific method for encapsulating file-based collections and their associated metadata, as well as the mechanism for storing them on storage technology (data tape, magnetic disk, optical media, flash media, cloud, etc.). AXF adds self-describing features to both AXF Objects and the media on which they are stored, allowing independence from the systems that originally created them. AXF provides a universal and open file-system view of all stored objects, files and metadata, allowing exchange with any application that also complies with the published standard.
AXF takes the concept of an object store to the physical level as a fully self-describing, self-contained encapsulation format for complex file collections. AXF provides a standardized way of storing files or file collections of any type and size, along with any amount of encapsulated metadata of any size and type, on any type of storage technology or device (including flash media, spinning disk, data tape, cloud and others). AXF is independent of the host operating system and file system, as well as of the application that originally created the AXF Object.
AXF can be described as a container that encapsulates any number of related files (of any type and size) into a fully self-describing and constrained object package. It supports the inclusion of any amount of open or proprietary, structured or unstructured metadata encapsulated as part of the object, which gives the object its self-descriptive characteristics. AXF extends this self-descriptive nature to the storage media that contain AXF Objects. Any AXF-aware system can therefore access them, ensuring long-term accessibility and protection regardless of whether the application that originally created the object is still available.
AXF is also used in the cloud-based transport, storage and preservation of file-based assets and asset collections. It provides a standardized, protected, and authenticated container for complex collections as a single data package that can be transported, stored, restored, verified and distributed.
AXF is based on a file- and storage-medium-agnostic architecture which abstracts the underlying file system, operating system and storage technology. AXF Objects contain any type, any number and any size of files as part of their payload, accompanied by any amount and type of structured or unstructured metadata, checksum and provenance information, full indexing structures, and other data within a single self-describing, encapsulated package. The AXF Object includes an embedded file system, which helps abstract the complexities and limitations of the underlying storage technology, file systems and operating systems. AXF Objects can exist on any data tape, disk, flash, optical media, or other storage technology and can be used for network transport of data.
As a self-contained and self-describing format, AXF supports large-scale archive systems as well as simple standalone applications, facilitating encapsulation or wrapping, long-term protection and content transport between systems conformant to the AXF standard from different vendors.
AXF is an IT-centric implementation, supporting the encapsulation of any type of file, including database files, binary executables, documents and image files. It supports the Open Archival Information System (OAIS) model as well as preservation features such as provenance for both media and objects, support for unique identifiers (UUID, UMID, ISCI, etc.), geo-location tagging, error detection down to the file and structure level, and data-validity spot checking.
General AXF Concepts
Now we can take a closer look at how AXF Media and Objects are constructed by deconstructing them piece-by-piece. First, we can examine some of the more general aspects of AXF.
Embedded File System
The AXF standard is based on an embedded file system. AXF offers a translation between any type of generic file set and logical block positions on any type of storage medium being used with or without its own file system. AXF encapsulates a related set of files with any type and amount of ancillary metadata (structured or unstructured) into a single container.
AXF is intended to overcome the limitations of other encapsulation formats, which cannot support complex file structures with millions (or more) of files or handle large files particularly well. Because of its embedded file system, AXF does not depend on the storage technology. Although optimized for the storage of large assets or file collections, the AXF format can be applied to any environment where an open and accessible storage encapsulation format is required for any type of file collection.
Each AXF Object is a fully self-contained, encapsulated collection of files, metadata and any other ancillary information which adds relevancy or value to its contents. AXF is designed to handle a single file encapsulation or many millions of files of any type and size. AXF Objects are equivalent regardless of whether they are created on data tape, disk, flash, or optical media with or without file systems. This makes the creation and handling of AXF Objects on differing media deterministic once the standard has been implemented.
Block and Chunk Alignment
To aid resiliency and performance across any storage device, technology or medium, each data structure and element contained within an AXF Object must be aligned on predefined chunk boundaries. These boundaries can be independent of the storage technology or medium, and can differ for each AXF Object on a medium if the medium itself supports this. On block-based media, the chunk size is typically an integer multiple or sub-multiple of the block size, which also aids the recoverability of AXF Objects in cases where indices have been damaged or corrupted.
During each AXF Object creation, copy or move operation, the application is responsible for ensuring the data is aligned on the block boundaries defined for the destination storage technology or medium. AXF Objects can be tuned to the underlying storage to optimize performance and storage efficiency, even on an object-by-object basis, depending on the average, minimum and maximum file sizes in the file payload.
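The alignment rule comes down to simple modular arithmetic. The following is a minimal Python sketch, not taken from the AXF standard; the function names padding_for and align_up and the example chunk size are assumptions made for illustration only.

```python
def padding_for(length: int, chunk_size: int) -> int:
    """Return how many padding bytes bring `length` up to the next chunk boundary."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # (-length) % chunk_size is 0 when length is already aligned.
    return (-length) % chunk_size


def align_up(offset: int, chunk_size: int) -> int:
    """Return the first chunk-aligned offset at or after `offset`."""
    return offset + padding_for(offset, chunk_size)


# Hypothetical example: a 5000-byte file on a medium tuned for 4096-byte chunks.
pad = padding_for(5000, 4096)   # 3192 bytes of padding
next_start = align_up(5000, 4096)  # 8192
```

A larger chunk size wastes more padding on small files but reduces the number of boundaries to scan, which is one reason the chunk size might be tuned to the expected file sizes in the payload.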
Binary Structure Container
The Binary Structure Container is a simple binary envelope that wraps payload information. For AXF structural elements, this payload is simple XML, but in the case of Generic Metadata Containers it can be anything from binary to text. The Binary Structure Container lets the application comprehend its contents so that they can be stored, validated, tracked and reliably recovered regardless of their nature or origin.
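To illustrate the general idea of a typed, checksummed envelope around an arbitrary payload, here is a small Python sketch. It is NOT the actual Binary Structure Container layout, which is defined by the AXF standard. The signature, field order, field widths and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import struct

MAGIC = b"DEMO"  # hypothetical signature, not the real AXF value
# Hypothetical header: signature, structure type, MIME-type length, payload length.
HEADER = struct.Struct(">4sHHQ")
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def wrap(structure_type: int, mime: str, payload: bytes) -> bytes:
    """Wrap a payload in the illustrative envelope and append a checksum."""
    mime_bytes = mime.encode("ascii")
    header = HEADER.pack(MAGIC, structure_type, len(mime_bytes), len(payload))
    body = header + mime_bytes + payload
    return body + hashlib.sha256(body).digest()


def unwrap(blob: bytes) -> tuple[int, str, bytes]:
    """Validate the envelope and return (structure_type, mime, payload)."""
    if len(blob) < HEADER.size + DIGEST_SIZE:
        raise ValueError("truncated envelope")
    body, digest = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    if hashlib.sha256(body).digest() != digest:
        raise ValueError("checksum mismatch")
    magic, structure_type, mime_len, payload_len = HEADER.unpack_from(body)
    if magic != MAGIC:
        raise ValueError("unknown signature")
    mime_end = HEADER.size + mime_len
    if mime_end + payload_len != len(body):
        raise ValueError("length fields do not match envelope size")
    return structure_type, body[HEADER.size:mime_end].decode("ascii"), body[mime_end:]
```

Because the envelope carries its own type, media type and checksum, a reader can validate and classify a structure without knowing anything about the payload itself, whether that payload is XML or opaque binary.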
A Closer Look
This diagram shows how AXF Objects are laid out on storage media as well as the relationship between payload and structural elements inside the AXF Object.
AXF includes structures which make AXF media self-describing. This means AXF media can be exchanged between systems with no additional information provided as everything necessary to properly comprehend, interpret and recover the data is included on the media itself.
For linear media, AXF includes an ISO/ANSI standard VOL1 volume label. This is included for compatibility with other applications, to ensure they do not erroneously handle AXF-formatted media, and to signal to applications that do understand AXF that they can immediately access the objects contained on the medium.
The AXF Medium Identifier structure contains the AXF volume signature, a UUID and label for the media, as well as information about the storage medium itself. The implementation of the Medium Identifier differs slightly depending on whether the storage medium is linear or nonlinear, and whether it includes a file system or not, but the overall structures are fully compatible. This structure adds the self-describing media characteristics to AXF, so systems that support the format can immediately understand everything necessary about the media to index, read and recover the AXF Objects it contains.
AXF Object Index
The AXF Object Index is an optional structure that assists in the rapid recoverability of AXF-formatted media by foreign systems. The information contained in this structure is sufficient to rapidly reconstruct the entire catalog of AXF Objects on the storage medium; think of it as an advanced File Allocation Table (FAT). The Object Index can be written periodically at various points on the storage medium, providing enhanced, rapid indexing and recoverability. Where the application has not maintained these optional AXF Object Index structures, the contents of each AXF Object can still be reconstructed by processing each AXF Object Footer structure, which adds to the multilevel resiliency of the format.
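The fallback path, locating structures without an index, benefits from chunk alignment: a recovering application only needs to inspect chunk boundaries rather than every byte offset. The Python sketch below illustrates that idea under stated assumptions. The function name find_structures and the signature bytes are hypothetical, and a real implementation would also validate each candidate structure (for example by its checksum) before trusting it.

```python
def find_structures(medium: bytes, signature: bytes, chunk_size: int) -> list[int]:
    """Return the chunk-aligned offsets at which `signature` appears.

    Only offsets that are multiples of `chunk_size` are examined, so stray
    occurrences of the signature inside file data at unaligned positions
    are never considered.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        offset
        for offset in range(0, len(medium), chunk_size)
        if medium.startswith(signature, offset)
    ]


# Hypothetical medium image: two aligned structures and one unaligned look-alike.
SIG = b"FOOT"  # illustrative marker, not the real AXF signature
image = SIG + bytes(12) + b"xx" + SIG + bytes(10) + SIG + bytes(12)
offsets = find_structures(image, SIG, 16)  # [0, 32]
```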
Now we can look into an AXF Object a little more closely.
Each AXF Object commences with an AXF Object Header, a structure containing descriptive XML metadata describing the actual contents of the AXF Object such as its unique identifier (UMID/GUID), creation date, descriptive information, file tree information, permissions, etc.
Generic Metadata Container
Following the AXF Object Header is any number of optional AXF Generic Metadata packages. These are self-contained, open metadata containers in which applications can include AXF Object-specific metadata. This metadata can be structured or unstructured, open or vendor specific, binary or XML, and provides a flexible and dynamic space in which to enrich the AXF Object and permanently link that metadata to the object's encapsulated file payload. There are no constraints or standards governing the type of metadata, the number of packages, or their contents. Where there is no metadata to store along with the AXF Object, this structure is simply omitted. Metadata (XML, binary or other) can also be stored in the file payload of the AXF Object, but its context there may make processing difficult for third-party applications during subsequent restore or replication operations.
Next is a File Payload Start structure, which marks the start of the file payload of the AXF Object. This is a simple, empty structure which can be easily located by even the simplest AXF applications.
The File Payload consists of zero or more File Data + File Padding + File Footer triplets. This is the actual byte data of the files to be stored within the AXF Object container. File Padding is used to ensure the chunk alignment of all AXF Object files and structure elements. This is a fundamental requirement of the AXF standard. The File Footer structure contains the exact size of the preceding file along with file-level checksums, original path information, etc. which can be processed by the application on-the-fly during operations to ensure data integrity.
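The triplet layout can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the on-media format. Real File Footers are AXF structures wrapped in a Binary Structure Container, whereas this sketch uses a JSON footer for brevity. The 512-byte chunk size, zero-byte padding, the SHA-256 checksum, and the names append_file and verify_file are all hypothetical.

```python
import hashlib
import io
import json

CHUNK = 512  # hypothetical chunk size


def pad_to_chunk(stream, chunk=CHUNK):
    """Write zero bytes until the stream position is chunk-aligned."""
    stream.write(b"\x00" * (-stream.tell() % chunk))


def append_file(stream, path, data, chunk=CHUNK):
    """Append File Data + File Padding + File Footer; return (data_offset, footer_offset)."""
    data_offset = stream.tell()
    if data_offset % chunk:
        raise ValueError("stream is not chunk-aligned")
    stream.write(data)
    pad_to_chunk(stream, chunk)  # File Padding
    footer_offset = stream.tell()
    footer = json.dumps({
        "path": path,
        "size": len(data),  # exact size, so padding can be ignored on read
        "sha256": hashlib.sha256(data).hexdigest(),
    }).encode("utf-8")
    if len(footer) > chunk:
        raise ValueError("footer does not fit in one chunk in this sketch")
    stream.write(footer)
    pad_to_chunk(stream, chunk)  # the footer is chunk-aligned as well
    return data_offset, footer_offset


def verify_file(stream, data_offset, footer_offset, chunk=CHUNK):
    """Re-read one triplet and check the file bytes against the footer checksum."""
    stream.seek(footer_offset)
    footer = json.loads(stream.read(chunk).rstrip(b"\x00"))
    stream.seek(data_offset)
    data = stream.read(footer["size"])
    return hashlib.sha256(data).hexdigest() == footer["sha256"]


buf = io.BytesIO()
offsets = append_file(buf, "clips/a.mov", b"example bytes")
print(verify_file(buf, *offsets))
```

Placing the size and checksum after the file data means a writer can stream a file of unknown length and compute the checksum on the fly, then record both once the last byte has been written.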
Next is a File Payload Stop structure, which marks the end of the file payload of the AXF Object. This is a simple, empty structure which can be easily located by even the simplest AXF applications.
The final portion of an AXF Object is the AXF Object Footer which is essentially a repeat of the information contained in the Object Header with some additional information captured during the creation of the AXF Object itself such as file checksums, block positions, file permissions, etc. The Object Footer is fundamental to the resiliency of AXF and allows efficient re-indexing of AXF media by foreign systems when the content of the media is not previously known (i.e. the self-describing nature of AXF Objects).
Each AXF Object component described above is itself encapsulated in a Binary Structure Container envelope which provides structure identification, index structure checksums, classification information, media/mime types, etc.
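The ordering of components described above (Object Header, optional Generic Metadata packages, File Payload Start, zero or more file triplets, File Payload Stop, Object Footer) can be expressed as a simple pattern. The Python sketch below checks a sequence of structure names against that ordering. The names and single-letter codes are hypothetical labels chosen for the example, and File Padding is treated as part of the file data entry rather than as a separate structure.

```python
import re

# Hypothetical one-letter codes for the components named in the text.
CODES = {
    "object_header": "H",
    "generic_metadata": "M",
    "payload_start": "S",
    "file_data": "D",      # file bytes together with any File Padding
    "file_footer": "F",
    "payload_stop": "E",
    "object_footer": "O",
}

# Header, any metadata packages, start marker, (data, footer) pairs, stop marker, footer.
LAYOUT = re.compile(r"HM*S(?:DF)*EO")


def is_valid_layout(structures):
    """Return True if the structure names follow the described AXF Object ordering."""
    try:
        encoded = "".join(CODES[name] for name in structures)
    except KeyError:
        return False  # unknown structure name
    return LAYOUT.fullmatch(encoded) is not None


example = [
    "object_header", "generic_metadata",
    "payload_start",
    "file_data", "file_footer",
    "file_data", "file_footer",
    "payload_stop", "object_footer",
]
ok = is_valid_layout(example)
```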