The POOMA I/O classes have been designed to provide efficient I/O services
while keeping to the design philosophy of POOMA. POOMA I/O supports the
abstractions that make the POOMA framework powerful and flexible by making
the classes that embody them persistent. As with the rest of POOMA, the
I/O system is both flexible and extensible by users as well as by developers.
The simplest I/O model is based on inserting data items into, or extracting them from, input or output streams. Data is typically extracted in the same order as originally stored. Object-oriented applications present special problems for I/O, since the ability of users to add new data types means that many if not most types are unknown to the system. Systems that support object serialization usually have some means of prescribing how the data contained in complex types is to be marshaled and inserted into a stream. Once this definition is in place, new object types can be read or written in the same way as intrinsic types. C++ allows users to overload the stream insertion and extraction operators (<< and >>) for this very purpose. However, as the structure of data types becomes more complicated, the burden falling on users to serialize new types for storage can be quite heavy. Several languages and frameworks provide means of facilitating object serialization. These include, for example, Java and Python.
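As a minimal sketch of the stream-based model described above, the following overloads the two stream operators for a small user-defined type and passes a value through an in-memory stream. The Particle struct and the roundTrip helper are hypothetical names invented for illustration; they are not part of POOMA.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Hypothetical user-defined type; not part of POOMA.
struct Particle {
    int id;
    double x;
    double y;
};

// Insertion: marshal the fields into the stream in a fixed order.
std::ostream& operator<<(std::ostream& os, const Particle& p) {
    return os << p.id << ' ' << p.x << ' ' << p.y;
}

// Extraction: read the fields back in the same order they were written.
std::istream& operator>>(std::istream& is, Particle& p) {
    return is >> p.id >> p.x >> p.y;
}

// Pass a value through an in-memory text stream and return the copy.
Particle roundTrip(const Particle& in) {
    std::stringstream stream;
    stream << in;
    Particle out = Particle();
    stream >> out;
    assert(!stream.fail());  // all three fields were parsed
    return out;
}
```

The reader must know both the type and the field order in advance, which is the coupling between producer and consumer that stream-based serialization implies. Text output at the default precision is also lossy for most floating-point values, so this form is only a demonstration of the mechanism.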
The next level of sophistication in object storage is object persistence. In an object persistence model, objects are stored as a collection of discrete entities, each individually retrievable at random from a collection of objects. A full-featured object-oriented database (OODB) knows enough about the structure of the object types in its collection to perform sophisticated queries based on object metadata.
There is often a tradeoff between these two categories of services. Object serialization is typically more efficient than object database persistence since data is simply marshaled and inserted into a stream. However, the requirement that data-consuming applications know what types of objects to expect as well as their sequence often leads to overly tight coupling between data-producing and data-consuming applications. Thus serialization is fine for monolithic applications performing what amounts to state dumps, but not as good for multi-application collaborative environments. On the other hand, there are many situations when one would just as soon not have the overhead of an object-oriented database no matter how streamlined.
Object-oriented applications benefit enormously from object-oriented data management. After all, the principal reason many programmers prefer object-oriented languages is so that they can create and exploit new data types. Object storage systems provide a way to store and retrieve user-defined types as easily as intrinsic types.
The first level of the design consists of a set of classes called the storage classes that are transparent to users. They organize any given storage resource into byte records. The system does not necessarily know the internal structure of a byte record, only its length in bytes. Records are elements of byte arrays. Each array is independently accessible within a storage resource, and each record or element of a byte array is also independently addressable. A range of elements within a byte array can be read or written in one operation. These byte arrays are automatically extended whenever an operation writes past the current number of elements. Arrays are members of a collection called a storage set, which serves as the logical interface to storage in terms of arrays. The physical storage in this implementation is a disk file, but a storage set is an abstraction barrier that need not be associated with a file in general. For example, future implementations may support storage sets based on databases or remote application resources.
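The relationship between records, byte arrays, and a storage set can be pictured with a small in-memory model. This is only a sketch under stated assumptions: ByteArraySketch, StorageSetSketch, and Record are hypothetical names, the storage is held in memory rather than in a file, and none of this is the POOMA implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the POOMA classes.
typedef std::string Record;  // an opaque run of bytes; only its length is known

class ByteArraySketch {
public:
    // Write a range of records starting at element `first`,
    // extending the array if the range runs past the current end.
    void write(std::size_t first, const std::vector<Record>& records) {
        if (first + records.size() > elements_.size())
            elements_.resize(first + records.size());
        for (std::size_t i = 0; i < records.size(); ++i)
            elements_[first + i] = records[i];
    }

    // Read `count` records starting at element `first` in one operation.
    std::vector<Record> read(std::size_t first, std::size_t count) const {
        assert(first + count <= elements_.size());
        return std::vector<Record>(elements_.begin() + first,
                                   elements_.begin() + first + count);
    }

    std::size_t size() const { return elements_.size(); }

private:
    std::vector<Record> elements_;
};

// A storage set is a collection of independently accessible byte arrays.
class StorageSetSketch {
public:
    // Return array `index`, creating empty arrays up to it if needed.
    ByteArraySketch& array(std::size_t index) {
        if (index >= arrays_.size()) arrays_.resize(index + 1);
        return arrays_[index];
    }

    std::size_t numArrays() const { return arrays_.size(); }

private:
    std::vector<ByteArraySketch> arrays_;
};
```

Writing two records at element 2 of an empty array leaves it with four elements, the first two of which are empty records. The reference returned by array() is invalidated if a later call creates more arrays, a simplification that a real implementation would avoid.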
The second level is made up of the object storage classes. These classes view storage as a set of typed objects called an object set. Any instance of a type supported by the I/O system can be stored along with a descriptive label in one operation. Object sets can be queried to reveal the number of objects contained, the types of objects contained, the number of objects of each type, and the label of each object. A single operation is sufficient to retrieve an object given either its name or its instance ID, which is equivalent to its position in the list of object instances for a given type.
The storage of specific types is enabled by specializations of two generic classes: object serializers and object adapters. As one would infer from the discussion above, serializers serialize objects to a stream, whereas adapters adapt specific types to storage and retrieval in an object set. Adapters often use the services of serializers. The object storage classes in turn use the services provided by the storage set and byte array classes.
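The idea of enabling a type by specializing a generic class can be sketched as follows. The SerializerSketch template and the viaBytes helper are hypothetical and show only the specialization pattern; the actual POOMA serializer and adapter interfaces are not shown in this document and may look quite different.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical sketch; the names below are not the POOMA classes.
// The primary template is declared but not defined, so using an
// unsupported type fails at compile time.
template <class T> struct SerializerSketch;

// Specialization for a fixed-size intrinsic type: copy the raw bytes.
template <> struct SerializerSketch<int> {
    static std::string typeName() { return "int"; }
    static std::string toBytes(const int& value) {
        return std::string(reinterpret_cast<const char*>(&value), sizeof(int));
    }
    static int fromBytes(const std::string& bytes) {
        assert(bytes.size() == sizeof(int));
        int value = 0;
        std::memcpy(&value, bytes.data(), sizeof(int));
        return value;
    }
};

// Specialization for a variable-length type: the record is the characters.
template <> struct SerializerSketch<std::string> {
    static std::string typeName() { return "std::string"; }
    static std::string toBytes(const std::string& value) { return value; }
    static std::string fromBytes(const std::string& bytes) { return bytes; }
};

// Generic code written once against the serializer interface.
template <class T>
T viaBytes(const T& value) {
    return SerializerSketch<T>::fromBytes(SerializerSketch<T>::toBytes(value));
}
```

Supporting another type in this sketch means adding one more specialization; viaBytes and any other generic code stay unchanged.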
To support a different storage type or format, or to optimize I/O for performance, one need only modify the basic storage classes thus leaving the object storage classes unchanged. Several different types of storage can coexist in the same application. The benefit of this design is that new types can be supported simply by creating new serializer and adapter specializations. Our intent is to allow users as well as developers to extend the range of supported types by writing a small amount of new code, or by writing a simple high-level description of the new classes.
The main goal of the POOMA I/O design is to achieve a high level of support for object storage and management without incurring the overhead of a full-featured object-oriented database. Straightforward storage and retrieval operations are provided based on simple queries.
It was also considered important to expose the basic I/O mechanisms through the storage set and byte array classes so that developers could gauge the performance implications of an implementation based on generic storage abstractions. The separation of basic I/O from object management permits performance to be optimized without requiring modifications in any portion of the object management layer.
Historically, an object persistence model was considered first and object serialization later. The compatibility of these two models, as well as a straightforward solution for supporting and leveraging both, emerged later in design iteration cycles. Thus, in this release users can store and retrieve POOMA objects in an object set, but cannot serialize the same objects to a standard output stream. This feature will be added in the next release.
Since storage adapters are currently hand-crafted, there are only a few basic types supported at this time. Experience gained in writing adapters and serializers for this release will allow us to semi-automate the process of adding support for new types. Some capability of this kind as well as full coverage of all POOMA objects is intended for the next major release of the software.
This release supports standard native binary I/O. Future releases may support storage using the HDF5 format.
The following section provides details of the object set interface.
ObjectSet()
    This is the default constructor. Constructed this way, an object set is unusable until an open() operation places it in an appropriate state attached to a particular storage resource.

ObjectSet(const std::string& name, StorageResourceType type, StorageAccessMode mode)
    This is the primary constructor. The arguments are:

    name
        The name of the object set. For file-based storage (the only type for this release) this is literally the name of the file.
    type
        An instance of an enumerated type called StorageResourceType whose allowed values for this release are:
            stdStorage           Standard binary file mode
    mode
        An instance of an enumerated type called StorageAccessMode that defines the access mode. The allowed values are:
            storageIn            Read-only access
            storageOut           Write-only
            storageOutTrunc      Write-only; destroy existing data if the resource exists
            storageInOut         Read-write; append new data to existing data
            storageInOutTrunc    Read-write; destroy existing data if the resource exists

Example:

    ObjectSet obset("DataFile.std", stdStorage, storageInOutTrunc);

Creates an object set obset as a binary file whose name will be "DataFile.std". The file is opened for read-write, but if a file by that name already exists, all existing data will be destroyed (i.e., the file will be truncated).
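The access modes listed above can be restated as three yes/no properties. The sketch below declares a local stand-in for the StorageAccessMode enumeration (the real declaration lives in the POOMA headers and is not reproduced here) and three hypothetical helper functions that encode the table; they are not part of the POOMA interface.

```cpp
#include <cassert>

// Local stand-in for the enumeration described in the text.
enum StorageAccessMode {
    storageIn,
    storageOut,
    storageOutTrunc,
    storageInOut,
    storageInOutTrunc
};

// Hypothetical helper: true if the mode permits reading.
bool allowsRead(StorageAccessMode m) {
    return m == storageIn || m == storageInOut || m == storageInOutTrunc;
}

// Hypothetical helper: true if the mode permits writing.
bool allowsWrite(StorageAccessMode m) {
    return m != storageIn;
}

// Hypothetical helper: true if opening destroys data already in the resource.
bool truncatesExisting(StorageAccessMode m) {
    return m == storageOutTrunc || m == storageInOutTrunc;
}
```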
int open(const std::string& name, StorageResourceType type, StorageAccessMode mode)
    Opens a default-constructed or closed object set, treating all attributes of the object set as new. The arguments have the same meaning as in the primary constructor. It returns 0 if successful.

int open(const std::string& name, StorageAccessMode mode)
    This variant assumes that the storage resource type has already been set. It generates an error if the object set has only been default constructed, and returns 0 if successful.

Examples:

    int status = obset.open("DataFile.std", storageIn);
    assert(status == 0);

Opens the previous file (assuming it has been closed) in read-only mode.

    status = obset.open("OtherData.dat", stdStorage, storageOutTrunc);
    assert(status == 0);

Having closed the previous object set, this opens a completely different resource, specifying the type explicitly (standard binary in this case), for output, destroying any pre-existing version.
template <class T>
long store(T& t, const std::string& objectName)
    Stores an instance of the given type along with a user-defined label. The function returns the object ID assigned by the object set. Valid IDs are zero or greater.

    t
        The given memory-resident object instance.
    objectName
        The user-assigned name or label to be associated with this instance.

template <class T>
int retrieve(T& t, long id)
    Retrieves an object given its ID. It returns 0 if successful.

    t
        The memory-resident object instance to be instantiated from the persistent version.
    id
        The ID for the stored instance.

template <class T>
int retrieve(T& t, const std::string& objectName)
    Retrieves an object given its label. Labels are not unique. If there is more than one object of the given type with the same label, it restores the first one. It returns 0 if successful.

    t
        The memory-resident object instance to be instantiated from the persistent version.
    objectName
        The user-assigned name or label associated with this instance.

Examples:

    int nTimeSteps = 1000;
    long id = obset.store(nTimeSteps, "Number of Time Steps");
    assert(id >= 0);

Stores the given int instance with the associated label "Number of Time Steps". An integer (long) ID is returned.

    int nSteps;
    int status = obset.retrieve(nSteps, id);
    assert(status == 0);

Retrieves the value previously stored given the ID, presumably known. Alternatively one could use:

    status = obset.retrieve(nSteps, "Number of Time Steps");
    assert(status == 0);
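The store and retrieve semantics described above (IDs counted from zero, labels that need not be unique, first match wins) can be modeled in memory for the instances of a single type. InstanceListSketch is a hypothetical class written for this illustration. It is not the POOMA ObjectSet, it takes the stored value by const reference for convenience, and its nonzero failure codes are an assumption, since the text only states that 0 means success.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model of the instances of ONE type in an object set.
template <class T>
class InstanceListSketch {
public:
    // Append an instance; the returned ID is its position in the list.
    long store(const T& t, const std::string& objectName) {
        items_.push_back(std::make_pair(objectName, t));
        return static_cast<long>(items_.size()) - 1;
    }

    // Retrieve by ID; returns 0 on success, nonzero otherwise (assumed).
    int retrieve(T& t, long id) const {
        if (id < 0 || id >= static_cast<long>(items_.size())) return 1;
        t = items_[static_cast<std::size_t>(id)].second;
        return 0;
    }

    // Retrieve by label; the first instance with a matching label is used.
    int retrieve(T& t, const std::string& objectName) const {
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (items_[i].first == objectName) {
                t = items_[i].second;
                return 0;
            }
        }
        return 1;
    }

private:
    std::vector<std::pair<std::string, T> > items_;
};
```

Because a label lookup always returns the first match, an application that stores several instances under one label has to fall back on IDs to reach the later ones.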
const std::string& name() const
    Returns the name of the object set.

StorageAccessMode mode() const
    Returns the current access mode.

bool isOpen() const
    Boolean operation to check whether the set is open.

bool isClosed() const
    Boolean operation to check whether the set is closed.

These functions query the contents of an object set:

int numTypes() const
    Returns the number of types in the set.

int numInstances(const std::string& typeName)
    Returns the number of instances of a given type referred to by type name.

    typeName
        The name of the type in question.

int numInstances(long typeID) const
    Returns the number of instances of a given type referred to by type ID.

    typeID
        The type ID or index. Within a given object set, the types contained are indexed from 0, ..., (number of types - 1).

const std::string& typeName(long typeID) const
    Returns the type name given a type ID.

    typeID
        The type ID or index.

long typeID(const std::string& typeName)
    Returns the type ID given the type name.

    typeName
        The name of the type in question.

const std::string& objectName(const std::string& typeName, long instanceID)
    Returns the object name given a type name and instance ID.

    typeName
        The name of the type in question.
    instanceID
        The instance of this type. Instances are numbered from 0, ..., (number of instances - 1) for a given type.

const std::string& objectName(long typeID, long instanceID)
    Returns the object name given the type ID and the instance ID.

    typeID
        The type ID or index.
    instanceID
        The instance of this type.

Examples:

The following is based on the premise that the application has opened an existing file by creating an instance called obset in read-only mode. The application generates a report on the contents of the file.

    std::string obsetName = obset.name();
    int nTypes = obset.numTypes();
    std::cout << "Contents of ObjectSet " << obsetName << std::endl;
    std::cout << "Number of types = " << nTypes << std::endl;
    if (nTypes != 0) {
        std::cout << "Type Type Name Number of Instances" << std::endl;
        int numInstances;
        int j;
        for (int i = 0; i < nTypes; i++) {
            numInstances = obset.numInstances(i);
            std::cout << i << " " << obset.typeName(i) << " "
                      << numInstances << std::endl;
            std::cout << " Instance Object Name" << std::endl;
            for (j = 0; j < numInstances; j++) {
                std::cout << " " << j << " "
                          << obset.objectName(i, j) << std::endl;
            }
            std::cout << std::endl;
        }
    }

The next example is based on a similar premise. In this case, the application knows that there are several instances of complex<double> called "Field Value". Complex numbers are a templated type in C++ whose conventional type designation in POOMA I/O is "std::complex<T>". The application collects the values by retrieving each instance of this type that matches the name and putting it in a standard C++ vector container.

    std::vector<std::complex<double> > fieldVals;
    std::complex<double> complexVal;
    int nInstances = obset.numInstances("std::complex<T>");
    int status;
    for (int i = 0; i < nInstances; i++) {
        if (obset.objectName("std::complex<T>", i) == "Field Value") {
            status = obset.retrieve(complexVal, i);
            assert(status == 0);
            fieldVals.push_back(complexVal);
        }
    }
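The indexing scheme behind the query functions (types numbered from zero, instances numbered from zero within each type) can be mimicked with a small in-memory catalog. TypeCatalogSketch is hypothetical and is not the POOMA implementation. It assumes that type IDs follow the order in which types first appear, which is consistent with the sample report in this document but is not stated as a rule, and its return value of -1 for an unknown type name is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory catalog; not the POOMA implementation.
class TypeCatalogSketch {
public:
    // Record one instance of type `type` labelled `label`.
    void add(const std::string& type, const std::string& label) {
        long id = typeID(type);
        if (id < 0) {
            names_.push_back(type);
            labels_.push_back(std::vector<std::string>());
            id = static_cast<long>(names_.size()) - 1;
        }
        labels_[static_cast<std::size_t>(id)].push_back(label);
    }

    int numTypes() const { return static_cast<int>(names_.size()); }

    // Returns -1 for an unknown type name (an assumption of this sketch).
    long typeID(const std::string& type) const {
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == type) return static_cast<long>(i);
        return -1;
    }

    const std::string& typeName(long id) const {
        assert(id >= 0 && id < static_cast<long>(names_.size()));
        return names_[static_cast<std::size_t>(id)];
    }

    int numInstances(long id) const {
        assert(id >= 0 && id < static_cast<long>(labels_.size()));
        return static_cast<int>(labels_[static_cast<std::size_t>(id)].size());
    }

    const std::string& objectName(long id, long instance) const {
        assert(instance >= 0 && instance < numInstances(id));
        return labels_[static_cast<std::size_t>(id)]
                      [static_cast<std::size_t>(instance)];
    }

private:
    std::vector<std::string> names_;                 // indexed by type ID
    std::vector<std::vector<std::string> > labels_;  // labels per type
};
```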
Type | Designation | Description |
int | "int" | Native int |
long | "long" | Native long |
float | "float" | Native float |
double | "double" | Native double |
std::complex<T> | "std::complex<T>" | Complex numbers from the standard numerical library. T may be float or double. |
std::string | "std::string" | Standard string of arbitrary length. |
Vector<Dim,T,Engine=Full> | "Vector<Dim,T>" | Pooma Vector class based on the standard Full engine, where the dimension Dim may be any size and T is int, long, float, double, or std::complex<T>. |
Array<Dim,T,Brick> and Array<Dim,T,CompressibleBrick> | "Array<Dim,T,Brick>" and "Array<Dim,T,CompressibleBrick>" respectively | Pooma Array of dimension Dim = 1, ..., 7 with the Brick or CompressibleBrick engine type. T may be int, long, float, or double in this release. |
Interval<Dim> | "Interval<Dim>" | Pooma Interval of dimension Dim = 1, ..., 7. |
    // create arrays
    Array<2> a, b;

    // create an object set to store the data;
    // truncate the file if it already exists
    ObjectSet dataSet("Doof2dDB.dat", stdStorage, storageOutTrunc);

    // get problem size
    int n;
    std::cout << "Size (typically 100-1000): ";
    std::cin >> n;
    int i, niters = n/2;

    // create a description for this run using a string stream
    // and then store as a string variable
    std::ostringstream strstrm;
    strstrm << "This is a run of the Doof2d example with "
            << " problem size N=" << n << "." << std::endl;
    strstrm << "Stencils were not used in this run." << std::endl;
    std::string descr = strstrm.str();
    dataSet.store(descr, "Run Description");

    // store the problem size and number of iterations
    dataSet.store(n, "Problem Size");
    dataSet.store(niters, "Number of Iterations");

    // create array domain and resize arrays
    Interval<1> N(1,n);
    Interval<2> domain(N,N);

    // store the problem domain interval
    dataSet.store(domain, "Problem Domain Interval");
    a.initialize(domain);
    b.initialize(domain);

    // get domains and constant for diffusion stencil
    Interval<1> I(2,n-1), J(2,n-1);
    const double fact = 1.0/9.0;

    // store the numerical constant factor used to calculate
    dataSet.store(fact, "Numerical Factor");

    // reset array element values
    a = 0.0;
    b = 0.0;
    double initialVal = 1000.0;
    a(niters,niters) = initialVal;

    // store the initial peak value
    dataSet.store(initialVal, "Initial Peak Value");

    // Run 9pt doof2d without coefficients using expression
    std::cout << "Diffusion using expression ..." << std::endl;
    std::cout << "iter = 0, a_mid = " << a(niters,niters) << std::endl;
    for (i=1; i<=niters; ++i) {
        b(I,J) = fact * (a(I+1,J+1) + a(I+1,J  ) + a(I+1,J-1) +
                         a(I  ,J+1) + a(I  ,J  ) + a(I  ,J-1) +
                         a(I-1,J+1) + a(I-1,J  ) + a(I-1,J-1));
        a = b;
        std::cout << "iter = " << i << ", a_mid = "
                  << a(niters,niters) << std::endl;

        // for each iteration store the result array
        // labeled by iteration number
        strstrm.str("");
        strstrm << "Result at iteration " << i;
        dataSet.store(a, strstrm.str());
    }
    dataSet.close();
If one were to write and execute the content report generator example given above on this file, the output would read:

    Contents of ObjectSet Doof2dDB.dat
    Number of types = 5
    Type  Type Name            Number of Instances
    0     std::string          1
          Instance  Object Name
          0         Run Description

    1     int                  2
          Instance  Object Name
          0         Problem Size
          1         Number of Iterations

    2     Interval<Dim>        1
          Instance  Object Name
          0         Problem Domain Interval

    3     double               2
          Instance  Object Name
          0         Numerical Factor
          1         Initial Peak Value

    4     Array<Dim,T,Brick>   (however many iterations)
          Instance  Object Name
          0         Result at iteration 1
          1         Result at iteration 2
          2         Result at iteration 3
          ...       (however many iterations)

The next example assumes that the application programmer has some familiarity with the data-producing application. Let visArray(array,string) be the API to some hypothetical visualization tool that renders false color images of POOMA 2d arrays, where array is the array and string is a standard string label for the plot. The following code segment would take the database file generated by the modified Doof2d example and produce plots.

    ObjectSet dset("Doof2dDB.dat", stdStorage, storageIn);
    int nIters;
    int status;
    status = dset.retrieve(nIters, "Number of Iterations");
    assert(status == 0);
    std::string plotLabel;
    Array<2> array;
    for (int i = 0; i < nIters; i++) {
        plotLabel = dset.objectName("Array<Dim,T,Brick>", i);
        status = dset.retrieve(array, i);
        assert(status == 0);
        visArray(array, plotLabel);
    }
    dset.close();

There are several other ways that the data could be recovered assuming less familiarity with the application, and using the object set queries to learn more. More sophisticated queries are needed in order to do a good job of acquiring data when nothing is known a priori about the contents of a dataset. Such queries are planned for the next version of POOMA.