Funded Projects

ID Name Period
STC-16-03 Big Data Deep Learning

Big Data has emerged with unprecedented value for research, development, innovation, and business, and most of it carries a spatiotemporal stamp. However, transforming Big Data into value poses grand challenges for big data management, spatiotemporal data modeling, and spatiotemporal data mining. To enable this transformation, we propose to develop a deep learning platform based on the current spatiotemporal innovation project. The platform will use advanced data management and computing technologies to mine valuable knowledge from Big Spatiotemporal Data. More robust models will be built to discover the implicit spatiotemporal dynamic patterns in climate, dust storms, and weather from remote sensing and model simulation data, in order to address related environmental and health issues. Meanwhile, user-generated data, such as PO.DAAC and social media, will be mined to improve geospatial data discovery and to form a knowledge base for spatiotemporal data. In addition, high performance computing (e.g., GPU and parallel computing) and cloud computing technologies will be used to accelerate the knowledge discovery process. The proposed deep learning platform for Big Spatiotemporal Data will develop and integrate a suite of software for big spatiotemporal data mining, and contribute a core to spatiotemporal innovation.
GMU-16-05 Data Container Study for Handling array-based data using Rasdaman, SciDB, Hive, Spark, and MongoDB

Geoscience communities have come up with various big data storage solutions, such as Rasdaman and Hive, to address the grand challenges of managing and processing massive Earth observation data. To examine the readiness of current technologies and tools to support the archiving, discovery, access, and processing of big Earth observation data, we investigated and compared several popular data solutions, including Rasdaman, SciDB, Hive, Spark, ClimateSpark, and MongoDB. Using different types of spatial and non-spatial queries, and datasets stored in common scientific data formats (e.g., NetCDF and HDF), the features and performance of these data containers are systematically compared and evaluated. The evaluation metrics focus on their performance in discovering and accessing datasets for upper-level geoscience applications. The computing resources (e.g., CPU, memory, hard drive, network) consumed while performing various queries and operations are monitored and recorded for the performance evaluation. The initial results show that 1) MongoDB has the best performance for queries on statistical and operational functions, but does not support the NetCDF data format better than HDF; 2) ClimateSpark performs better than pure Spark and Hive in most cases, except for single-point extraction over long time series; and 3) Hive is not good at querying small datasets, since it uses MapReduce as its processing engine, which carries a lot of overhead. A comprehensive report will detail the experimental results and compare the pros and cons of the solutions regarding system performance, ease of use, accessibility, scalability, compatibility, and flexibility.
STC-15-01 Developing and Maintaining a Spatiotemporal Computing Infrastructure Project Space

Based on the demands from our IAB and center projects, this project will a) develop and maintain a spatiotemporal computing infrastructure by acquiring a high performance computing facility, b) provide spatiotemporal computing coordination and research to all center projects with computing needs by maintaining highly capable research staff support to optimize the computing infrastructure, and c) adopt and develop advanced spatiotemporal computing technologies to innovate the next generation of computing tools.
STC-15-02 Mapping Secondary Cities for Resiliency and Emergency Preparedness

Mapping Secondary Cities for Resiliency and Emergency Preparedness is a project designed to build local capacity in using geospatial science and technology to create data in support of emergency preparedness. Globally, secondary cities are unique environments that are experiencing rapid urbanization and have generally been poorly mapped, yet mapping these cities is an essential activity in building resiliency, planning and managing urban growth, and devising robust emergency management plans. The project will identify countries with rapidly growing secondary cities where governments have recognized their challenges and are looking to develop policies and programs to foster manageable growth and development. Projects will be implemented in secondary cities, typically non-primary cities with a population ranging from 100,000 to 5 million. The specific emphasis of this project is on generating geospatial data on infrastructure for emergency preparedness, a key need in the development of secondary cities. A pilot project for Cusco, Peru is proposed to build an open source, scalable geodataset by enhancing local expertise and facilitating alignment between communities, governments, and agencies through the establishment of long-term partnerships and networks. Project teams will comprise local university academics and students with a focus on geospatial science, regional non-governmental organizations, and partners from local municipalities. Training programs will be designed to build long-term capacity for generating geospatial data from local knowledge, commercial satellite imagery, and other geographic tools.
Project teams will coordinate data collection efforts targeted at city priorities that may include essential data for emergency management (e.g., building footprints, roads, river networks, city infrastructure), environmental monitoring (e.g., river health, water quality, open space and parks, ecosystem services), or urban planning (e.g., informal settlements, sanitation, water treatment, zoning). These datasets provide the basis for long-term planning across multiple sectors. The training program will develop on-site, in-country expertise for specific local needs, laying the groundwork for a follow-up "train the trainer" program in other secondary cities. An overview and hands-on instruction will be provided using geographic tools grounded in sound geospatial information science, including cartographic best practices, geospatial analysis, database management, and field data collection techniques. Preliminary datasets will be stored and shared on both Windows and open source platforms, using a hybrid approach for data sharing and dissemination.
GMU-15-01 ClimateSpark: An In-memory Distributed Computing Framework for Big Climate Data Analytics Project Space

Large collections of observational, reanalysis, and climate model output data are being assembled as part of the work of the Intergovernmental Panel on Climate Change (IPCC). These collections may grow to as large as 100 PB in the coming years. The NASA Center for Climate Simulation (NCCS) will host much of this data. Ideally, such big data can be provided to scientists with on-demand analytical and simulation capabilities to relieve them of time-consuming computational tasks. However, this goal is challenging to realize, because processing such big data requires efficient big data management strategies, complex parallel computing algorithms, and scalable computing resources. Based on the extensive experience at NCCS and GMU in big climate data analytics, Hadoop, cloud computing, and other technologies, a high-performance computing framework, ClimateSpark, has been developed to better support big climate data analytics. A hierarchical indexing strategy has been designed and implemented to support efficient management and querying of big multi-dimensional climate data in a scalable environment. A high-performance Taylor diagram service has been developed as a tool to help climatologists evaluate different climate model outputs. A web portal has been developed to ease remote interaction between users, data, analytic operations, and computing resources through SQL, Scala/Python notebooks, or a RESTful API.
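For readers unfamiliar with Taylor diagrams, the statistics such a diagram summarizes can be sketched as follows. This is a minimal illustration and not ClimateSpark code: the function name `taylor_statistics` and the sample values are hypothetical. A Taylor diagram places each model on a single plot using its standard deviation, its correlation with a reference series, and the centered root-mean-square difference, which are tied together by the relation E'^2 = s_r^2 + s_m^2 - 2 * s_r * s_m * R.

```python
import math


def taylor_statistics(reference, model):
    """Return (std_ref, std_model, correlation, centered_rms_difference).

    Both inputs are equal-length sequences of numbers. Population (1/n)
    statistics are used so that the Taylor relation holds exactly.
    """
    n = len(reference)
    if n != len(model) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")

    mean_ref = sum(reference) / n
    mean_mod = sum(model) / n
    # Anomalies: each series with its own mean removed.
    dev_ref = [x - mean_ref for x in reference]
    dev_mod = [x - mean_mod for x in model]

    std_ref = math.sqrt(sum(d * d for d in dev_ref) / n)
    std_mod = math.sqrt(sum(d * d for d in dev_mod) / n)
    covariance = sum(a * b for a, b in zip(dev_ref, dev_mod)) / n
    correlation = covariance / (std_ref * std_mod)
    centered_rms = math.sqrt(
        sum((b - a) ** 2 for a, b in zip(dev_ref, dev_mod)) / n
    )
    return std_ref, std_mod, correlation, centered_rms


# Hypothetical sample values, for illustration only.
observed = [14.1, 14.3, 14.0, 14.6, 14.8, 14.5]
simulated = [14.0, 14.4, 14.2, 14.5, 15.0, 14.4]
std_ref, std_mod, corr, crms = taylor_statistics(observed, simulated)
```

A service evaluating many model outputs would compute these statistics for each model against the same reference and plot the resulting points on one diagram.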
GMU-15-08 Automatic Near-Real-Time Flood Detection using Suomi-NPP/VIIRS Data

Near real-time satellite-derived flood maps are invaluable to river forecasters and decision-makers for disaster monitoring and relief efforts. With support from the JPSS Proving Ground and Risk Reduction Program, a flood detection package has been developed that uses SNPP/VIIRS (Suomi National Polar-orbiting Partnership/Visible Infrared Imaging Radiometer Suite) imagery to automatically generate daily near real-time flood maps for National Weather Service (NWS) River Forecast Centers (RFC) in the USA. The package includes a series of algorithms for water detection, cloud shadow removal, terrain shadow removal, minor flood detection, water fraction retrieval, and flooding water determination. It has been running routinely with direct broadcast SNPP/VIIRS data since 2014. Flood maps were carefully evaluated by river forecasters using airborne imagery and hydraulic observations. Offline validation was also performed through visual inspection against VIIRS false-color composite images on more than 10,000 granules across a variety of scenes, and through year-round comparison with river gauge observations. Evaluation of the product has shown high accuracy, and its promising performance has won positive feedback and recognition from end-users.
GMU-15-09 Planetary Defense Project Space

Programs like NASA’s Near-Earth Object (NEO) Survey supply the Planetary Defense (PD) community with the information needed for NEO mitigation. However, information about detecting, characterizing, and mitigating NEO threats is still dispersed across different organizations and scientists, due to the lack of a structured architecture. This project aims to develop a knowledge discovery search engine that provides discovery of and easy access to PD-related resources by developing 1) a domain-specific Web crawler to automate the large-scale, up-to-date discovery of PD-related resources, and 2) a search ranking method to better rank the search results. The Web crawler is based on Apache Nutch, a well-recognized, highly scalable web crawler. In this research, Apache Nutch is extended in three respects: 1) a semi-supervised approach is developed to create a PD-related keyword list; 2) an improved similarity scoring function is used to set the priority of the web pages in the crawl frontier; and 3) an adaptive approach is designed to re-crawl and update web pages. The search ranking module is built upon Elasticsearch. Rather than using the basic search relevance function of Elasticsearch, a PageRank-based link analysis and an LDA-based topic modelling approach are developed to better support the ranking of interconnected web pages.
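To illustrate the link-analysis idea mentioned above, here is a minimal sketch of textbook power-iteration PageRank on a toy link graph. It is not the project's ranking module, which is not described in detail here; the function name `pagerank`, the damping factor of 0.85, and the page names are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy crawl graph; the page names are placeholders.
toy_links = {
    "neo-survey": ["mitigation", "catalog"],
    "mitigation": ["catalog"],
    "catalog": ["neo-survey"],
    "orphan": ["catalog"],
}
ranks = pagerank(toy_links)
```

In a search setting, a link-based score like this would typically be combined with a text relevance score rather than used on its own; how the project weights the two is not stated here.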
UCSB-15-01 Linked Data for the National Map

The proposed project aims to provide Linked Data access to National Map vector data, which resides in the ArcGIS Geodatabase format. These data include hydrography, transportation, structures, and boundaries. The project will address the challenge of how to efficiently make large data volumes available and queryable at the same time. Previous research and the PIs' experience suggest that, in the context of the National Map, offering hundreds of gigabytes of Linked Data via an unrestricted endpoint will not scale. To address this challenge, a variety of methods will be tested to determine the sweet spot between data dumps, i.e., just storing huge RDF files for download, on the one side, and unrestricted public (Geo)SPARQL endpoints on the other. Methods and combinations of methods will include (Geo)SPARQL-SQL rewriting, transparent Web Service proxies for WFS, Linked Data Fragments, query optimization, restricted queries via a user interface, and so forth. The sweet spot will be defined as the method (or combination of methods) that enables common usage scenarios for Linked National Map Data, i.e., that retains as much as possible of the functionality of full Linked Data query access via a public endpoint while keeping server load and average query runtime (for common usage queries) at an acceptable level. A Web-based user interface will expose the resulting data and make them queryable and explorable via the follow-your-nose paradigm.
Harvard-14-03 Development and application of ontologies for NHD and related TNM data layers

Feature layers in the US National Map (TNM) are fundamental contexts for spatiotemporal data collection and analysis, but they largely exist independently of each other as map layers. This project will explore the use of formal ontologies and semantic technology to represent functional relationships within and between "wet" hydrography and "dry" landscape layers, in order to express the basis for the occurrence of lakes and rivers. It will then test these representations in applications for discovering and analyzing related water science data.
Harvard-14-02 Developing a place name extraction methodology for historic maps

We propose to develop an approach for automating the extraction and organization of place name information from georeferenced historic map series (in multiple languages), focusing on scales better than 250k. Such information is essential to the spatialization of unstructured text documents and social media. Phase I will be a feasibility study that evaluates the best existing technologies against current extraction costs (including outsourcing) and then recommends next steps for establishing a production system. The options will be: 1) do nothing, as there is currently no cost-effective approach; 2) make use of an existing platform and develop workflows for it; or 3) develop a new system that combines existing technologies and/or develops new technologies.
GMU-14-01 Improving geospatial data discovery, access, visualization and analysis for, geospatial platform and other systems

Develop a set of efficient tools to better discover, access and visualize the data and services from and Geospatial Platform to meet the following requirements: 1) Support discovery using enhanced semantic context and inferences to improve discovery recall and precision. 2) Provide and enhance an open-source viewer capability to visualize and analyze different online map services. 3) Develop an open-source analytical workbench prototype for incorporation into the and Geospatial Platform to enable end-user computational analysis on multiple remote geospatial web services that can be captured as services for optional re-execution, producing analytical data products (data, graphs, maps) from raster and vector overlay. 4) Supply a QoS module to check and track service quality information.
GMU-14-05 Developing a Hadoop-based middleware for handling multi-dimensional NetCDF

Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential to addressing global challenges such as climate change, natural disasters, diseases, and other emergencies. However, this is challenging not only because of the large data volume but also because of the intrinsically high dimensionality of climate data. To tackle this challenge, this paper proposes a Hadoop-based middleware to efficiently manage and process big climate data in a highly scalable environment. With this approach, big climate data are stored directly in the Hadoop Distributed File System in their original format, without any special configuration of the Hadoop cluster. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, which enables fast data retrieval with spatiotemporal queries. Based on the index, a data-partitioning algorithm is proposed to enable MapReduce to achieve high data locality and a balanced workload. The proposed approach is evaluated using the NASA MERRA reanalysis climate data. The experimental results show that the Hadoop-based middleware can significantly accelerate querying and processing (~10x speedup compared to the baseline test using the same cluster), while keeping the index-to-data ratio small (0.0328%). The applicability of the Hadoop-based middleware is demonstrated by a climate anomaly detection application deployed on the NASA Hadoop cluster.
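The idea of an index that bridges the logical array model and the physical data layout can be sketched as follows. This is a simplified illustration under stated assumptions, not the middleware's actual index: the names `ChunkEntry` and `ChunkIndex`, the file path, and the byte offsets are hypothetical, and the spatial dimension is omitted so that only the variable and time step are indexed.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntry:
    """Where one logical array slice lives on disk (hypothetical layout)."""
    variable: str    # logical variable name, e.g. "T2M"
    time_index: int  # logical time step of the slice
    path: str        # file that physically holds the bytes
    offset: int      # byte offset of the slice within the file
    length: int      # number of bytes in the slice


class ChunkIndex:
    """Maps (variable, time range) queries to physical byte ranges."""

    def __init__(self, entries):
        self._by_variable = {}
        ordered = sorted(entries, key=lambda e: (e.variable, e.time_index))
        for entry in ordered:
            self._by_variable.setdefault(entry.variable, []).append(entry)

    def query(self, variable, t_start, t_end):
        """Return entries for `variable` with t_start <= time_index <= t_end."""
        entries = self._by_variable.get(variable, [])
        times = [e.time_index for e in entries]
        lo = bisect_left(times, t_start)
        hi = bisect_right(times, t_end)
        return entries[lo:hi]


# Hypothetical entries: four time steps of one variable in a single file.
index = ChunkIndex(
    ChunkEntry("T2M", t, "/data/example.nc", 4096 + t * 1024, 1024)
    for t in range(4)
)
hits = index.query("T2M", 1, 2)
```

Because each entry records a file and byte range, a scheduler could in principle group the returned ranges by where the bytes are stored, which is the kind of information a locality-aware partitioning step needs.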
GMU-14-07 Analyzing and visualizing data quality in crowdsourcing environments

In this project, we plan to develop web-based and mobile-based data collection prototypes that provide methods for characterizing and assessing data quality in crowdsourcing systems, along with novel ways to visualize data quality metrics. Different data collection methods, hybrid databases, and quality assessment methods for crowdsourced data will form the core of the prototypes.
STC-14-01 Developing a big spatiotemporal data computing platform (continued by STC-15-01)

This research will design and develop a general computing platform to best utilize cluster, grid, cloud, many integrated cores (MICs), graphics processing units (GPUs), hybrid GPU/CPU, and volunteer computing for accessing, processing, managing, analyzing, and visualizing big spatiotemporal data. The computing and optimization techniques and the heterogeneous resource integrator we have developed can support such a platform, which can facilitate a number of applications, e.g., climate simulation, social media analyses, online visual analytics, geospatial platform, and GEOSS clearinghouse.
STC-14-00 Advancing Spatiotemporal Studies to Enable 21st Century Sciences and Applications

Many 21st century challenges to contemporary society, such as natural disasters, happen in both space and time, and require that spatiotemporal principles and thinking be incorporated into the computing process. A systematic investigation of these principles would advance human knowledge by providing trailblazing methodologies for exploring the next generation of computing models to address the challenges. This overarching center project will collaborate with international leaders and the science advisory committee to generalize spatiotemporal thinking methodologies, produce efficient computing software and tools, elevate the application impact, and advance human knowledge and intelligence. Objectives: a) Generalize spatiotemporal thinking methodologies. b) Produce efficient computing software and tools. c) Elevate the application impact. d) Advance human knowledge and intelligence.

* This page is under construction. More information will be added.*