Exploring in-memory HDF5 and early evaluations

dc.contributor.committeeChair: Chen, Yong
dc.contributor.committeeMember: Mengel, Susan A.
dc.creator: Vijayakumar, Kalaranjani
dc.date.available: 2015-06-18T14:42:44Z
dc.date.issued: 2015-05
dc.description.abstract: Many scientific big data applications have iterative computations and can re-use the results from previous stages in their workflow. HDF5 is a library that provides scientists with a wide range of facilities for scientific data management and computation. Like other existing scientific I/O libraries, HDF5 follows an entirely disk-based model in which the results from the various stages of computation are always stored on disk. In our research, we propose to keep the results in memory and re-use them for future requests to avoid expensive disk accesses. Because the data resides in memory, it is not persistent. To provide persistence and avoid disk reads in such scenarios, lineage information, which includes the source dataset and the computation that produced the current dataset, is stored in memory. Lineage information is captured in a metadata structure as attributes of the dataset for each in-memory data block by intercepting the I/O call. This captured lineage metadata can be used to re-compute the dataset without reading from disk. We have evaluated our in-memory architecture with different I/O patterns: the contiguous I/O pattern proved to be efficient in a linear fashion, whereas the efficiency of the non-contiguous I/O pattern remains unpredictable. In addition, we have evaluated our lineage tracking module against the traditional disk-based approach for reconstructing lost datasets.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/2346/62283
dc.language.iso: eng
dc.rights.availability: Unrestricted.
dc.subject: In-memory
dc.subject: Lineage information
dc.subject: HDF5 library
dc.title: Exploring in-memory HDF5 and early evaluations
dc.type: Thesis
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas Tech University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
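
The lineage idea described in the abstract (record the source dataset and the computation that produced each in-memory result, then re-compute a lost result instead of reading it from disk) can be illustrated with a minimal sketch. This is not the thesis implementation and does not use the HDF5 API: the `InMemoryStore` class, the `Lineage` record, and the `OPERATIONS` registry with its `scale2` and `cumsum` entries are all hypothetical names chosen for illustration, and plain Python lists stand in for HDF5 datasets.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical registry of named computations (illustrative only).
OPERATIONS: Dict[str, Callable[[List[float]], List[float]]] = {
    "scale2": lambda xs: [2.0 * x for x in xs],
    "cumsum": lambda xs: [sum(xs[: i + 1]) for i in range(len(xs))],
}


@dataclass
class Lineage:
    source: str     # name of the dataset this one was derived from
    operation: str  # key into OPERATIONS naming the computation applied


@dataclass
class InMemoryStore:
    """Toy stand-in for an in-memory dataset cache with lineage metadata."""
    data: Dict[str, List[float]] = field(default_factory=dict)
    lineage: Dict[str, Lineage] = field(default_factory=dict)

    def put(self, name: str, values: List[float]) -> None:
        # Store a base dataset that has no lineage of its own.
        self.data[name] = list(values)

    def derive(self, name: str, source: str, operation: str) -> List[float]:
        # Compute a new dataset and record how it was produced.
        result = OPERATIONS[operation](self.get(source))
        self.data[name] = result
        self.lineage[name] = Lineage(source, operation)
        return result

    def drop(self, name: str) -> None:
        # Simulate losing the in-memory values; the lineage record is kept.
        self.data.pop(name, None)

    def get(self, name: str) -> List[float]:
        # Return cached values, or rebuild them by replaying the lineage.
        if name in self.data:
            return self.data[name]
        record = self.lineage.get(name)
        if record is None:
            raise KeyError(f"{name!r} is missing and has no lineage")
        rebuilt = OPERATIONS[record.operation](self.get(record.source))
        self.data[name] = rebuilt
        return rebuilt


store = InMemoryStore()
store.put("raw", [1.0, 2.0, 3.0])
store.derive("scaled", "raw", "scale2")
store.derive("running", "scaled", "cumsum")

# Lose both derived datasets, then recover the last one from lineage alone.
store.drop("scaled")
store.drop("running")
recovered = store.get("running")
```

In this sketch a lost dataset is rebuilt recursively: `get` first rebuilds any missing intermediate result, then replays the recorded operation. A base dataset with no lineage, such as `raw` here, cannot be recovered this way once dropped.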

Files

Original bundle

Name: VIJAYAKUMAR-THESIS-2015.pdf
Size: 654.23 KB
Format: Adobe Portable Document Format

Name: Questions_1.txt
Size: 50 B
Format: Plain Text

Name: Questions.txt
Size: 50 B
Format: Plain Text

License bundle

Name: LICENSE.txt
Size: 1.85 KB
Format: Plain Text