Thesis/Project Final Exam Schedule

 

Final Examination Schedule

PLEASE JOIN US AS THE FOLLOWING CANDIDATES PRESENT THEIR CULMINATING WORK.

Spring 2020
 

Friday, May 22

Yingming Dang

Chair: Dr. Kelvin Sung
Candidate: Master of Science in Computer Science & Software Engineering

11:00 A.M.; Online
Toon Shading

The rapid advancement of computing hardware, especially the Graphics Processing Unit (GPU), has allowed the rendering of increasingly complex images in real time. Meanwhile, in the area of photorealism, evolving illumination models are becoming more capable of producing physically plausible images. In addition to photorealism, there’s also an increasing need for non-photorealistic rendering, which synthesizes images with a specific artistic style or with enhancements for applications such as technical illustration. Toon shading is a popular non-photorealistic rendering technique that mimics the artistic style of traditional cartoons.

The goals of this project are to analyze, understand, and implement the illumination models typically found in modern commercial 3D graphical applications along with prominent toon shading techniques: color ramp and outline. We studied the evolution of photorealistic illumination approximations and then, based on the simple Lambertian model for diffuse reflection, implemented recent variants and improvements of the Cook-Torrance microfacet-based Bidirectional Reflectance Distribution Function (BRDF) for specular reflection of light. We further improved our model by integrating ground-truth ambient occlusion (GTAO) to better approximate ambient light inter-reflections. This physically based photorealistic illumination model is then enhanced to support non-photorealism with color ramp and outline toon shader techniques.

The final illumination model is demonstrated with both simple geometric objects for highlighting specific effects and interesting and complex 3D models for verifying the general correctness. Our delivered illumination system supports a variety of parameters for users to fine-tune the desired effects. The source code from this project, along with this documentation, serves as an excellent reference for those who are interested in understanding modern illumination models for real-time rendering and toon shading.
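
As a rough illustration of the color ramp idea mentioned above, the following Python sketch quantizes a Lambertian diffuse term into a few flat bands. It is not the project's shader code: the function names, the band count, and the even spacing of the bands are all assumptions made for this example.

```python
import math


def normalize(v):
    """Return the unit-length version of a 3D vector given as a tuple."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, light_dir):
    """Lambertian diffuse term: clamped cosine between normal and light direction."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))


def color_ramp(intensity, bands=3):
    """Quantize an intensity in [0, 1] into `bands` evenly spaced flat levels.

    Hypothetical ramp: a real toon shader often looks the level up in a
    1D texture instead of computing it.
    """
    if bands < 2:
        raise ValueError("bands must be at least 2")
    step = min(math.floor(intensity * bands), bands - 1)
    return step / (bands - 1)


def toon_diffuse(normal, light_dir, bands=3):
    """Diffuse term after the color ramp has been applied."""
    return color_ramp(lambert(normal, light_dir), bands)
```

With three bands, every surface point ends up at one of the levels 0.0, 0.5, or 1.0, which produces the hard-edged shading typical of cartoons.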

Tuesday, May 26

Emily (Yi-Hsin) Hsu

Chair: Dr. Michael Stiber
Candidate: Master of Science in Computer Science & Software Engineering

8:45 A.M.; Online
Extending a Neural Simulator to Combine Growth and Spike-Timing-Dependent Plasticity

Neural network development passes through phases, ranging from growing the network and its connections to tuning those connections. One of the major mechanisms of that tuning process is called spike-timing-dependent plasticity (STDP). In STDP, the strength of a synapse (the connection between two neurons) is influenced by spike order and timing in those neurons. Understanding the effect of STDP on neural network development is a central question in neuroscience. In this project, we built the infrastructure to help answer this question by extending an existing neural simulator to enable simulations that combine network growth and STDP tuning. To do this, we also implemented a serialization and deserialization capability, so that simulation state information from growth could be used as input for STDP simulation.
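
To make the dependence on spike order and timing concrete, here is a minimal Python sketch of a common textbook form of STDP, the pair-based exponential window. The abstract does not state which STDP rule the simulator uses, so the rule, the function names, and all parameter values below are assumptions for illustration only.

```python
import math


def stdp_delta_w(delta_t_ms, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair.

    delta_t_ms = t_post - t_pre. A presynaptic spike that precedes the
    postsynaptic spike (delta_t_ms > 0) strengthens the synapse; the
    reverse order weakens it. The effect decays exponentially with the
    time difference. All constants here are illustrative.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_plus)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_minus)
    return 0.0


def update_weight(weight, delta_t_ms, w_min=0.0, w_max=1.0):
    """Apply the STDP change and keep the weight inside [w_min, w_max]."""
    return min(w_max, max(w_min, weight + stdp_delta_w(delta_t_ms)))
```

For example, `update_weight(0.5, 10.0)` increases the weight slightly, while `update_weight(0.5, -10.0)` decreases it.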

Rajasri Nanduri

Chair: Dr. David Socha
Candidate: Master of Science in Computer Science & Software Engineering

3:30 P.M.; Online
Map on go – development of a mobile application for information based on user intent, informed by user research

User research is often used to evaluate the usability of a product and in some cases, the insights gained from the process provide the product’s direction. The application developed for this project was informed by user research to deduce its direction. The initial goal of this project was to develop an elaborate news vertical application. As a first step in the development of this application, I built a working prototype that displayed news for a location provided by the users. Following an incremental iterative development process with regular user feedback sessions, the application that emerged stands apart from the initial application. The emerged application provides multiple types of information that are relevant to the user’s needs at the time of using the application. This application delivers relevant information through a new concept called user intent. User intent is the purpose with which a user is looking for information. Every user intent calls for different types of information and a user can have multiple intents depending on the time and their purpose. The application that emerged through this project takes the user’s intent as input and delivers information relevant to the intent. This is a different way of delivering information and can avoid users’ need to install multiple applications for different user intents, and can streamline the user experience of consuming information related to a location.

Wednesday, May 27

Saranya Duraisamy

Chair: Dr. Munehiro Fukuda
Candidate: Master of Science in Computer Science & Software Engineering

8:45 A.M.; Online
Agent-Based Parallelization of Biological Network Motif Detection

Network motifs are subgraph patterns that occur frequently in biological networks and represent significant interactions between molecules. Discovering motifs reveals unidentified interactions that are of great importance to biological applications. However, motif detection is a computationally intense process due to the exponential growth of motif patterns with an increase in network or motif size. Due to the computational complexity, existing sequential tools impose a limitation on motif sizes, and analysis of larger networks takes an unreasonable amount of time. The performance issues of these tools resulted in a constant drive to improve the speed with parallel approaches. However, most approaches using MapReduce, OpenMPI, and previously implemented agent-based parallelization are limited to subgraph counting and don’t offer tools to detect significant motifs. Hence, this project implements parallel agent-based significant motif discovery using the MASS (Multi-Agent Spatial Simulation) library by crawling the reactive agents over the network distributed across multiple computing nodes. An additional Spark implementation helped in identifying strengths of and enhancements to MASS for handling large-scale data. Compared to the previous MASS agent-based implementation, the latest implementation gained at most 2x speedup and reduced memory usage by a factor of 2. The Spark implementation attained almost 2x speedup compared to the sequential NemoLib tool. Although the MASS implementation encountered memory limitations, both the MASS and Spark implementations exhibited a higher level of parallelism with increased computing power and memory resources. Additionally, this work discusses the opportunity to parallelize graph algorithms with MASS in terms of development effort and data reuse benefits.

Yangxiao Wang

Chair: Dr. Wooyoung Kim
Candidate: Master of Science in Computer Science & Software Engineering

11:00 A.M.; Online
Network Motif and its Applications in Bioinformatics

Many graph theory algorithms have been developed and applied to analyze biological networks. Network motif analysis is one such graph analysis; network motifs are defined as “the simple building blocks of complex networks.” Detection of network motifs can help solve many biological problems. However, the computational power required to find a network motif is high, and existing tools for detecting network motifs lack usability and accessibility. Therefore, in this project, a web-based network motif detection application is developed to provide a tool with better performance, extensive output, and better accessibility. To test the web application and further analyze biological applications of network motifs, the program is applied to solve two biological problems: essential gene prediction and evolutionary process analysis. Experimental results show that network motifs can be a practical approach to solving many biological network related problems and can lead to new research directions in investigating evolutionary processes.

Yangde Li

Chair: Dr. Kelvin Sung
Candidate: Master of Science in Computer Science & Software Engineering

3:30 P.M.; Online
An Architecture Evolution for Statistical Data Visualization Backend System

The College Affordability Model system started as an idea exploration system and gradually evolved into a useful tool for legislators and policymakers over the past four years. It shows users how much students and their families can draw from the common sources of funds, as well as the potential debt they would take on, to pay for college. After years of development and refinement, the system has improved in modularization and encapsulation. The current website has a separate frontend system providing a user interface (UI) and a backend system serving an application programming interface (API). However, due to the sequential computation pattern, the existing system has a linearly increasing latency as the computation complexity grows. Additionally, the existing backend lacks the flexibility to compute different combinations of the models, and thus the run time does not scale dynamically according to the required workload. These limitations make it challenging for developers to extend the data models and to sustain efficient user interactions.

This project proposes to address these issues with a microservice-based backend system. The scope of the project is to design and implement the new architecture and migrate the existing backend system. Following Domain-Driven Design (DDD), services such as the computing unit, request dispatcher, API proxy, and data source are abstracted and implemented. To ensure runtime reliability, the framework is designed with a monitoring system. The system implementation is based on multiple services from AWS, including CloudFront, API Gateway, S3, Lambda, and CloudWatch.

The project followed the agile model in managing the development lifecycle and went through several iterations. The backend framework is delivered as a Django REST application with elaborate unit tests and documentation and is PEP8 compliant. The framework is deployed, and its functionality verified, as the new backend of the College Affordability Model website. The programmability and extensibility of the framework are demonstrated via new features recently developed for the website.

Rishabh Chauhan

Chair: Dr. Kelvin Sung
Candidate: Master of Science in Computer Science & Software Engineering

5:45 P.M.; Online
Performance Enhancement for Statistical Visualization Web Frontend

The College Affordability Model is a data visualization tool designed to analyze the cost of attending a college in the United States. By letting policy makers explore existing data, the system facilitates informed decision making. The tool is presented in the form of a web application with three separate tiers: the database, the backend, and the frontend.

A multi-tier system relies on the frontend to render the user interfaces for interaction with its users. The performance of a web frontend, defined by its load time and runtime, is important for a responsive system to engage its users. The performance of the current College Affordability Model frontend is lacking. The goal of this project is to analyze, understand, and optimize this frontend system.

This project analyzed performance optimization case studies from the industry, identified appropriate tools based on the analysis results, and derived a solution to resolve the system performance issues. The process took the form of a comparison between the existing tools and potential alternative candidates. The most appropriate tools were identified and used to replace those that were less efficient in the system, while the existing frontend architecture was maintained in order to retain its desirable developer-friendly characteristics, including modularity, maintainability, and modifiability.

A new frontend system was delivered with significantly reduced load time and runtime and with improved user interaction responsiveness. Importantly, as indicated in survey responses, the developer-friendly architectural characteristics are indeed retained.

Thursday, May 28

Illestar (Tzu-yu) Wu

Chair: Dr. Geethapriya Thamilarasu
Candidate: Master of Science in Cybersecurity Engineering

11:00 A.M.; Online
A Practical privacy preserving framework for wearable medical device

Wearable medical devices are growing in popularity as they are increasingly used for medical data analysis, remote patient monitoring, and home care. Despite their numerous benefits, wireless communication in medical devices poses significant privacy concerns. Privacy preservation solutions that reduce the risk of leakage or breach of sensitive healthcare data are, therefore, of enormous importance. This project aims to investigate data identification techniques for privacy preservation with emphasis on the limited power resources of the medical device. Specifically, this project will evaluate the application of existing solutions for detaching correlated streaming medical records from patients, propose a de-identification framework applicable to various types of inputs that medical device developers could benefit from, and perform a quantitative analysis of the performance of the framework under real-time medical records.

Sandy Chau

Chair: Dr. Wooyoung Kim
Candidate: Master of Science in Computer Science & Software Engineering

1:15 P.M.; Online
Visualized Data Analytic Tool for Quantitative Multiplex Co-Immunoprecipitation (VIA-QMI)

In this project, we are building a visualized data analytic tool for Quantitative Multiplex Co-Immunoprecipitation (VIA-QMI). QMI measures Protein-Protein Interaction (PPI) network activities that determine a cell’s healthy or diseased states. Detection of significant changes in PPIs can aid in the design of engineered T-cells for cancer treatment. Since the current QMI platform relies on four independent programs to create analysis results, the process can be error-prone and complicated. Therefore, we are building VIA-QMI to provide a graphical user interface and streamline the QMI workflow, improving its usability and computational efficiency. Specifically, we expanded VIA-QMI with a focus on the Adaptive Nonparametric Analysis (ANC). ANC helps determine which PPIs are statistically significant in a cell, thus enabling engineered T-cells to attack cancer cells. We anticipate that the proposed VIA-QMI will improve the usability and accessibility of the QMI analysis.

Friday, May 29

Saransh Sharma

Chair: Dr. Geethapriya Thamilarasu
Candidate: Master of Science in Computer Science & Software Engineering

8:45 A.M.; Online
Cross System Health Data Leak Detection Using Machine Learning By Analysing Advertisement Recommendations

With the rapidly rising popularity of mobile health (mHealth) applications (apps), the privacy of users' mHealth data has become critical. Such data can reveal highly personal insights into the users' behavior, leaving it susceptible to unauthorized consumption. Advertisement networks may use this information for targeted advertising without explicit user consent or knowledge. In this project, we propose a new privacy-preserving approach that utilizes users' mHealth data and the advertisements targeted towards them to detect potential data leaks. Traditional data leak detection systems do not utilize such correlations to predict privacy leaks. We employ machine learning algorithms to generate predictive models without adding a high computational load on the users' devices. In addition, we demonstrate how such a model could deliver real-time notifications and alert users of possible leaks. We also provide a module to generate datasets for future researchers to test their models and run simulations.

Qaif Amaan Shaikh

Chair: Dr. Erika Parsons
Candidate: Master of Science in Computer Science & Software Engineering

11:00 A.M.; Online
System Reengineering and Continuous Integration of the Virtual Academic Advisor System

Choosing a major and then choosing the right college or university requires proper analysis of complex situations to determine the best possible result for a student. In addition, community college students who transfer into universities must complete specific requirements to be eligible for the transfer. The responsibility of helping these students make an appropriate study plan for their career usually falls on their faculty or advisors. The Virtual Academic Advisor (VAA) software system is an approach to tackling this problem and automating the process of making these study plans. This system was envisioned by Dr. Parsons to be an interactive system that could provide pre-university students coming from community colleges with an academic plan tailored to their current background and interests. As this is a research project, work has been done in the past by various students with a focus on areas like Machine Learning, Data Structures, and Database Management. Due to the heavy focus on the back-end development of the system, there is a dire need for thorough documentation on the design and architecture of the complete system that will make it easier to maintain or enhance in the future. The system also lacks the means to continuously build, test, and deploy iterations to get better and quicker feedback for further development. Thus, this project is focused on performing software reengineering on the current system, documenting the new changes and design for better maintenance, and implementing a continuous integration workflow for the VAA system. The proposed work overcomes the existing issues with its new design and architecture, a new agile methodology to refine the software development lifecycle, and the implementation of the new continuous integration system.

Saranya Gokulramkumar

Chair: Dr. Munehiro Fukuda
Candidate: Master of Science in Computer Science & Software Engineering

1:15 P.M.; Online
Agent Based Parallelization of Computational Geometry Algorithms

The Multi-Agent Spatial Simulation (MASS) library is a parallel programming library that utilizes agent-based modeling (ABM) to parallelize big data analysis. In this research, we aim to build on previous research using MASS and extend the applicability of the library to a computationally complex problem area: computational geometry. We have developed agent-based algorithms for four problems in this area (closest pair of points, Voronoi diagram, convex hull, and Delaunay triangulation), which is a maiden effort at using ABM for such problems. This research also presents parallel solutions to these four problems using two other big data analysis platforms: Hadoop MapReduce and Apache Spark. We provide a comprehensive analysis of how MASS-based implementations compare to the implementations using the other two frameworks. Programmability and execution time are the key criteria used to evaluate the parallel solutions. This paper discusses design approaches and algorithm specifications for the four problems on all three parallel platforms and then proceeds to discuss the results. Results showed that the MASS library fares well in terms of providing a capability to build intuitive parallel solutions and to perform multiple analyses in-memory on the input data. Furthermore, we discovered potential areas of enhancement for the library, which can position the MASS library as a better contender for parallelizing data analysis in the future.

Anukriti Singh

Chair: Dr. Wooyoung Kim
Candidate: Master of Science in Computer Science & Software Engineering

3:30 P.M.; Online
Blindness Detection for Diabetic Patients

Diabetes is one of the significant causes of blindness, especially among older adults. As diabetes progresses, vision starts to deteriorate. This medical condition is known as diabetic retinopathy. 7.7 million people over the age of 40 have diabetic retinopathy. However, with early detection of the condition and proper treatment, vision loss can be prevented.

In our project, we used a deep learning approach to automatically classify fundus images into mild, moderate, severe, and proliferative diabetic retinopathy. The images in the dataset were captured under diverse illumination conditions. A densely connected convolutional neural network is used for the classification and detection of the severity level. Various pre-processing strategies, such as augmentation, are applied using the OpenCV and Keras libraries to remove noise from the image dataset. The model is trained, and the hyperparameters are tuned to maximize performance. The model was successful in determining the severity of diabetic retinopathy with an accuracy of 0.86 and obtained a kappa score of 0.91, with training and validation accuracy of 0.9698 and 0.9669, respectively.
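
For readers unfamiliar with the kappa score reported above, the sketch below computes plain (unweighted) Cohen's kappa, which measures agreement between predicted and true labels after correcting for chance agreement. The abstract does not say which kappa variant was used (severity grading tasks often use a weighted form), so this is only an illustration; the function name and the integer label encoding are assumptions.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of samples where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Expected agreement if labels were assigned independently with the
    # same per-class frequencies as in y_true and y_pred.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Both sequences contain a single identical class; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value of 1.0 means perfect agreement and 0.0 means agreement no better than chance.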

Monday, June 1

Meenakshi Sethunath

Chair: Dr. Yang Peng
Candidate: Master of Science in Computer Science & Software Engineering

11:00 A.M.; Online
Schemes for Dispatching Requests of Serverless Computing Functions over Edge and Cloud

Serverless computing has become more popular recently, due to its cost efficiency and flexibility. As a traditional method, serverless computing functions typically execute in the cloud. However, the network latency to access the cloud pushes the execution of serverless computing functions to edge servers, which are closer to the data source but limited by their computing and memory availability. This tradeoff calls for using a combination of edge and cloud servers for processing data that is either latency or memory intensive. The problem that arises with this strategy is how to dispatch requests to use edge and cloud resources efficiently.

To solve this problem, we propose a unique scheme with three different dispatching algorithms that can efficiently dispatch requests of serverless computing functions to the best servers for execution. In contrast to most of the existing works, this new scheme reduces the overall latency in executing the incoming requests while satisfying the memory and budget constraints of the servers. The key idea of this scheme is to maximize the hit ratio of the requests thereby reducing the overall latency. We have conducted extensive simulations and the results are shown to be closer to the performance upper bound (lowering the overall latency).

Edward Kim

Chair: Dr. Afra Mashhadi
Candidate: Master of Science in Computer Science & Software Engineering

3:30 P.M.; Online
GeoNotify - A Navigation-Aide for Visually Impaired Users

Individuals diagnosed with a severe form of visual impairment rely heavily on some sort of travel aid. There are a few modern products that assist visually impaired users by guiding them around objects using vibration feedback or by enhancing the clarity of their vision through digital eyewear. These solutions are not widely available due to high costs, or they’re designed only for specific individuals or situations. Statistics show that a high percentage of the population already owns a smartphone; therefore, we provide an affordable modern solution by creating a smartphone application to serve as a navigation aid. The application’s focus is on guiding visually impaired users safely from Point-A to Point-B by notifying them if any obstacles exist in their path, prior to reaching the reported locations. The design of the application’s user interface emphasizes accessibility in hopes of resolving the struggles outlined in documented experiments. We rely on advancements in object recognition algorithms to classify the object for the user, so they can report issues accurately. Due to the possible risks to safety, we ensure high availability and reliability of this application by validating thousands of read and write requests using the notification service.

Yi Zhao

Chair: Dr. Kelvin Sung
Candidate: Master of Science in Computer Science & Software Engineering

3:30 P.M.; Online
WYSIWYG Editor for College Affordability Model

The College Affordability Model project delivers a web system designed to visualize statistical data in a collection of databases based on user input parameters. The system allows users to explore how much students and their families can use from the most common sources of funds to pay for college, and the potential debt they would assume. The current web system presents a large number of input parameters and can be overwhelming for novice users. It is desirable to configure the frontend web page to hide or show an appropriate number of input parameters based on the user's comfort level with the system.

This WYSIWYG editor project aims at providing flexibility for users to create customized web pages according to their specific needs. The users can choose the parameters that are important to them and that should appear on the frontend of the web page. In this way, the values of the chosen parameters can be interactively manipulated for the computation of the desired results. The editor gives end users the power to configure the user interface (UI) such that, in addition to interacting with data, end users also have control over how to interact with the visualization of data. With the WYSIWYG editor, creating a customized website no longer requires coding from developers, and the required time is reduced from days to minutes.

This editor was first implemented on an Angular platform and then migrated to a React platform to coordinate with the technology transition of the entire College Affordability Model project. The two fully functional editors based on the two underlying technologies demonstrated the complementary nature of the implementations. More importantly, the successful and straightforward migration process signifies that this WYSIWYG editor is based on an architecture that facilitates portability and reusability. The editor system design is technology agnostic and can be efficiently implemented on different technologies.

Tuesday, June 2

Daniele Braga Pecanha

Chair: Dr. Erika Parsons
Candidate: Master of Science in Computer Science & Software Engineering

11:00 A.M.; Online
GALDAR-C++: An Extendable Multi-Library Solution for Topic Modeling in C++, based on Genetic Algorithm Iterations of Latent Dirichlet Allocation with Statistical Re-clustering

The ease with which data can be created, copied, modified, and deleted over the Internet challenges the task of determining the integrity of sources and the validity of information. Hence the importance of Data Provenance Reconstruction, which attempts to create an estimated provenance of existing datasets when no provenance information has been previously recorded.

The Provenance-Reconstruction approach proposed by the “Provenance and Traceability Research Group”, based on LDA-GA, was implemented in Java and obtained satisfactory results when applied to small datasets. However, the algorithm requires processing the dataset repeatedly, which, combined with the heavy memory toll imposed by Java and the parallelization strategy implemented, results in performance degradation as the input size increases.

This project presents an alternative, scalable implementation of the LDA-GA approach that allows processing larger datasets. Our implementation uses C++, a more HPC-friendly language with extensive memory control. Additionally, our solution is extendable, allowing integration with multiple LDA libraries. Among the currently integrated libraries, WarpLDA presented the best results. Compared to the most efficient version of the original implementation in Java, it obtained an accuracy increase varying between 2% and 9% and a speed-up varying between 1.9x and 9.3x, depending on the input size. The results obtained make this a viable solution for future studies on provenance reconstruction.

Wednesday, June 3

R. Alan Burnett

Chair: Dr. Dong Si
Candidate: Master of Science in Computer Science & Software Engineering

11:00 A.M.; Online
Applications of Machine Learning to Aviation Accident Data

This research concerns the application of state-of-the-art machine learning frameworks and algorithms in binary classification models for predicting occurrences of fatal and serious injuries in aviation accidents. In addition to evaluating various performance metrics, feature importances derived from the models are computed and analyzed to determine which features and/or individual feature values are most important (to the models) and to demonstrate shortfalls of existing techniques. Machine learning classification techniques, including Random Forest (RF)-based models (Random Forest, Random Intersection Trees, Iterative Random Forests), Gradient Boosted Machines (XGBoost, LightGBM, CatBoost, H2O), and Artificial Neural Networks (Keras over TensorFlow, Auto Encoders), are applied to datasets derived from the National Transportation Safety Board (NTSB) Aviation Accident Database and from Federal Aviation Administration (FAA) Aviation Accident and Incident Records, filtered to focus on FAA Part 91 (General Aviation) accidents involving powered, fixed-wing aircraft. In addition to built-in feature importance computations, SHAPley Analysis is implemented for various models to obtain additional feature importance data and to better characterize model behavior. Techniques for handling imbalanced and ill-behaved data are also disclosed. Substantial improvement in model/dataset performance was made over the course of this project, with the best models obtaining AUC, F1, and MCC performance metrics of approximately 0.88, 0.63, and 0.56 for pre-accident data and 0.96, 0.77, and 0.72 for the best post-accident data.

Rigdha Acharya

Chair: Dr. Erika Parsons
Candidate: Master of Science in Computer Science & Software Engineering

5:45 P.M.; Online
Optimal Community College Academic Schedule Recommendation Framework using Genetic Algorithm with Simulated Annealing and k-Means Clustering

Community college students start their educational path with the goal of entering a university for a major of their choice. Creating the academic plans to help them achieve this goal is a time-consuming, manual process, and a student’s success depends on the time, experience level, and availability of the academic advisors who help them navigate this process. In addition, academic plans need to adapt to changes in a student’s life. Community college students usually have diverse backgrounds and may need to change their schedules based on work, family, or other demands.

The Virtual Academic Advisor (VAA) aims to automate academic plan advising for community colleges. The previous VAA system used a JobShop scheduler as a proof of concept to generate schedules. However, the system lacked data, rating engine integration to evaluate generated schedules, and a comparison of scheduling methods.

In this capstone, we implement three different schedulers and compare them, determine the grading criteria and weights, and integrate with the grading engine to generate, optimize, and search for the best academic schedule for a student’s preferences using a Genetic Algorithm with Simulated Annealing. As a result, we are able to generate optimal schedules using this approach for the available majors and schools in the dataset. We also implement and integrate a recommendation framework to identify alternate schools and majors to help students explore options and create backup plans.

Thursday, June 4

Justin Gilroy

Chair: Dr. Munehiro Fukuda
Candidate: Master of Science in Computer Science & Software Engineering

8:45 A.M.; Online
Dynamic Graph Construction and Maintenance

Agent-Based Modeling (ABM) is a method of solving biological and similarly structured problems by simulating the interaction of entities with the notion of Agents. MASS (Multi-Agent Spatial Simulation) is a system developed by the Distributed Systems Laboratory at UWB for applying ABM to problems, with the addition of distributed computing to expand beyond what can be simulated on a single node. MASS today is targeted at problems that exist in multi-dimensional space, thus requiring graph space problems to be mapped by the user. We have responded to this problem by enhancing MASS with: 1) graph input file formats, 2) dynamic graph construction and modification features, and 3) graph visualization through integration with Cytoscape. This paper describes the goals and implementation of adding graph targeting features to MASS Java to allow non-professional developers to solve and visualize these large and compute-intensive problems efficiently.

Jignasha Borad

Chair: Dr. Min Chen
Candidate: Master of Science in Computer Science & Software Engineering

11:00 A.M.; Online
Integrated Web Application for Endangered Language Analysis and Documentation

43% of all languages spoken around the world are considered endangered and may become extinct by the end of the century. This indicates a significant loss of language diversity. To revitalize endangered languages, Dr. Min Chen and her research group at the University of Washington developed PELDA (Platform for Endangered Language Documentation and Analysis) to host Praat and ELAN, two popular but standalone desktop-based linguistics tools, on a cloud-based Microsoft Azure platform to document and analyze endangered languages.

In addition, many endangered languages are considered pitch accent languages and are not well analyzed. Collaborating with Dr. Miyashita (a linguistics professor, University of Montana) and Dr. James Randall (a music professor, University of Montana), Dr. Min Chen and her research group developed another application called MeTILDA (Melodic Transcription in Language Documentation and Application) to help automate the process of creating pitch graphs and analyzing pitch accent languages.

In this project, we aim to develop a one-stop web application for endangered language analysis and documentation by integrating the services and functionalities of PELDA and MeTILDA described above and further extending their features and capabilities. To achieve this goal, we migrate PELDA components to the cloud-based Heroku platform. We provide a coherent and consistent interface in MeTILDA through which users can access the features and functionalities of PELDA, such as speech synthesis, audio feature extraction, spectrum analysis, transcription, annotation, and translation. In addition, all annotation-related information can be saved in an XML file for future access. The project also lets users create subdirectories, allowing them to better manage their uploaded audio files and making those files easier to search.

Keywords: Annotation, API, Blackfoot language, ELAN, Endangered language, Praat, Web service

Nicola Rohde

Chair: Dr. Wooyoung Kim
Candidate: Master of Science in Computer Science & Software Engineering

1:15 P.M.; Online
NemoCluster: Graph Clustering Algorithm for Structural Variant Detection

Structural variant detection is a problem of significant interest in the biomedical field due to the strong link between these variants and genetic and degenerative diseases. A large body of programs and approaches exists to detect these variants, and they perform well on the human genome. However, benchmarks presented in this thesis show that these tools perform poorly on microbial genomes. One approach that has been shown to be effective in structural variant discovery is the use of clustering to detect anomalous regions in the genome. Well-known tools such as DELLY use this approach to achieve high accuracy; however, no tools use a network-motif-based clustering algorithm.

Finding anomalous genomic regions can be likened to community detection in social networks. This can be achieved by using triangle subgraphs, or size-three cliques, to calculate a triangle conductance for each edge in the network. However, using only cliques ignores a large amount of structural information within the network. This is acceptable in social networks, where cliques represent tightly knit groups and therefore have more significance than other structures. It does not extend well to other areas such as bioinformatics, where it may be of interest to cluster networks based on network motifs in order to capture more of the structural information contained within the graph than cliques can convey.

This thesis introduces NemoCluster, an algorithm that extends the Tectonic algorithm by generalizing the triangle conductance clustering to a network-motif conductance. Accompanying this program are benchmarks that show it performing better than Tectonic in both social networking applications and biological applications, such as protein-protein interaction networks, as well as in benchmarking networks.
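As a rough illustration of triangle-based edge scoring, the sketch below counts the triangles through each edge of a small undirected graph, divides by the sum of the endpoint degrees, drops edges scoring below a threshold, and reports the remaining connected components as clusters. The score formula, the function names, and the threshold value are assumptions made for this example; the exact conductance definition used in the thesis and in Tectonic may differ.

```python
def triangle_edge_scores(edges):
    """Score each undirected edge by (triangles through it) / (deg(u) + deg(v)).

    The normalization is an assumption for this sketch, not necessarily the
    conductance definition used in the thesis. Node labels must be
    comparable with `<` (for example, all ints or all strings).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                # Every shared neighbor closes one triangle through (u, v).
                triangles = len(adj[u] & adj[v])
                scores[(u, v)] = triangles / (len(adj[u]) + len(adj[v]))
    return scores


def cluster_by_threshold(edges, theta):
    """Keep edges with score >= theta and return the connected components."""
    scores = triangle_edge_scores(edges)
    nodes = {n for edge in edges for n in edge}
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), score in scores.items():
        if score >= theta:
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    demo_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
    print(cluster_by_threshold(demo_edges, theta=0.1))
```

In the demo graph, the bridge edge lies in no triangle and scores zero, so thresholding separates the two triangles. A motif-based variant would replace the shared-neighbor count with a count of the chosen motif through each edge.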


Friday, June 5

Rochelle Palting

Chair: Dr. Geethapriya Thamilarasu
Candidate: Master of Science in Cybersecurity Engineering

11:00 A.M.; Online
Methodology for Evaluating APT Attack Detection Effectiveness in Intrusion Detection Systems

Advanced Persistent Threats (APTs) pose a major risk to the security of an organization’s sensitive data, and an Intrusion Detection System (IDS) is one type of mechanism used to detect attacks. To help improve an IDS’ ability to detect APT attacks, we have developed a methodology for evaluating an IDS’ detection effectiveness under such attacks. In the proposed methodology, we identify IDS properties of interest. For each property, we provide guidance for workload selection along with guidance for measuring IDS performance (metrics). A significant part of this study focuses on generating a realistic malicious workload, using the MITRE ATT&CK framework as guidance in developing attack scenarios. We apply the methodology in a hypothetical use case to set up the evaluation of an IDS under test, deployed in an enterprise environment, and we walk through the process of evaluating the IDS’ APT attack detection effectiveness.


Questions: Please email cssgrad@uw.edu