Final Examination Schedule


Winter 2017


Friday, January 19

Yun-Ming Shih

Chair: Dr. Munehiro Fukuda
Candidate: Master of Science in Computer Science & Software Engineering
10:00 A.M.; UW1 370
MASS HDFS: Multi-Agent Spatial Simulation Hadoop Distributed File System

Parallel computing has been widely used for processing big data, but most systems only handle simple structured data types and do not support complex structured data. As many scientific data are highly structured, such as climatological data, the requirements have to be addressed. UWCA is a web serviced climate analysis application that uses Multi-Agent Spatial Simulation (MASS), a parallelization library created by Distributed Systems Laboratory at Computing & Software Systems Division, University of Washington Bothell (DSLab), for its computations. Without a proper file handling system, the master node becomes a bottleneck causing slow performances. This inspired DSLab to develop the MASS Parallel I/O layer for parallel file reading.

Similar frameworks have been developed to handle structured data in parallel. ROMIO gives a precise control to structured data, but the system has too many features making the system complicated. SciHadoop gives a simple computing model but requires converting science data to text type for processing. Our proposed system, MASS HDFS, is a MASS Parallel I/O layer with additional write functionality and uses HDFS for file distribution. It aims to provide the capability to handle structured data while maintaining simplicity of usage. MASS HDFS can read data into distributed arrays without introducing a single point of bottleneck or data conversion.

The performance evaluation shows that for a 200MB file with up to eight computing nodes, MASS HDFS spends 0.6 seconds on file open and 0.09 seconds on file read.
Opening a 50MB file using 12.5 million Place elements takes 25 minutes. Using 50 million Place elements to open a 50MB file takes four times longer. Reading data with 50 million Place elements can be done within 2.5 minutes, and eight times faster using 12.5 million Place elements. Our project made parallel I/O possible as well as demonstrating the potentiality of processing data on a per-data-item scale using four computing nodes.

Back to top

Questions: Please email