CMU Heinz Capstone Project — Building Sustainable Mobility Analytics Tool

Written by: Zhuolan Cheng, Yiting Jiang, Erin (Li-Hsin) Lin, Tianniu Lyu, Hajra Shahab, Emma Yeager

Published in 99P Labs

Jun 8, 2022

Carnegie Mellon University Heinz College Capstone Project

Introduction: A Case for Sustainable Transportation

An often-cited definition of sustainability refers to it as the capacity to “meet the needs of the present without compromising the ability of future generations to meet their own needs.”[1] The sustainability of a system may be evaluated by how well it meets needs belonging to three subcategories: people, planet and prosperity. In other words, sustainability may be measured in terms of a system’s social, environmental, and economic impact.

Transportation networks touch all three pillars of sustainability. They shape our daily lives by connecting us to work, retail, and recreation; however, a system that does not connect everyone equitably reproduces social disparities. Transportation networks additionally shape our environment through their effects on land use, air quality, and climate change. Finally, they shape our economy based on their capacity to efficiently move people and goods where they are needed. The multifaceted impacts of transportation networks necessitate their careful evaluation.

[1] 1983 United Nations Commission on Environment and Development

Project Team and Impact

We are an interdisciplinary group of research scientists and engineers finishing our graduate studies at Heinz College. Our areas of expertise include transportation analysis, web development, geospatial analysis, and data analytics and visualization. Through this project, we aim to help our partners at 99P Labs understand the following:

First- and last-mile mobility service needs at the micro and macro levels of transportation networks
The societal impact of electric vehicle (EV) penetration under different EV fleet sizes, measured through indicators such as emissions
Infrastructure improvement projects that build a holistic environment for new technologies
Land use changes based on evolving transportation network scenarios

Our Dashboard Solution

The complexity of transportation networks presents a challenge for planners who seek to measure their sustainability. The planner must be able to understand the network as an interconnected system with multiple modes spread across roads, intersections, bridges, and rail, among others. They must also consider that the network serves a heterogeneous population with varied needs, who constantly make decisions that may shift the overall system performance over time.

Our team developed a web application to meet this challenge. Our Sustainable Mobility Analytics dashboard presents sustainability metrics for a region. The selected metrics span all three pillars of sustainability, allowing planners to consider multiple perspectives. The dashboard also visualizes metrics at a high granularity, empowering planners to understand how system performance varies across space and time.

Project Timeline

With just four months to develop the dashboard, our team managed our time carefully. We sketched out a timeline breaking the project down into three phases: scoping; research, development and deployment; and user testing and final deliverables.

Phase 1: Scoping

Phase 1 focused on project scoping. Our work in this phase involved a high-level review of the literature on sustainable transportation. This research deepened our understanding of how to evaluate transportation networks in terms of sustainability, allowing us to create a shortlist of metrics to feature on the dashboard. We also reviewed literature on transportation network visualizations in this phase, helping us form early ideas about our future dashboard layout. Building off of this research, our software development team selected tools for building the dashboard. This process is described in detail in a later section.

We also obtained access to our main data source during phase 1, a simulation of the Columbus transportation network provided by the Mobility Data Analytics Center (MAC) at Carnegie Mellon University. Understanding the structure of our data through exploratory analysis was another key aspect of this phase.

Phase 2: Research, Development, and Deployment

Phase 2 centered around moving our shortlist of sustainability metrics to the dashboard. For our research team, this involved a further review of the literature to determine methodologies for each of our metric calculations. Our metric shortlist shifted some during this phase due to data availability; for instance, our equity index was replaced with a simple map of median income due to sparse data. After determining methodologies and data availability for each metric, our research team cleaned and processed our data to prepare each metric for visualization.

Our software development team spent phase 2 refining our visualizations and building the dashboard. We started by creating wireframes and prototypes using Figma. Then, the metrics calculated by the research team were visualized using Mapbox and added to the dashboard. The team implemented filtering capabilities to make our dashboard interactive in this step, including time sliders and buttons to overlay data at the transportation link and Census Block Group levels.

*Figure 1. Project Work Breakdown Structure*

Finally, our software development team deployed the dashboard on Amazon Web Services.

Phase 3: User Testing and Final Deliverables

Our team completed user testing in phase 3. We shared the dashboard with researchers at Carnegie Mellon’s Human-Computer Interaction Institute and Heinz College. Their feedback gave us insight into how a transportation planner might interact with the tool and revealed weaknesses in our design. Following these meetings, we made small changes to the dashboard like adding use cases to help planners get a sense of the tool’s capabilities. We finalized the dashboard and developed documentation explaining our work.

We also had the opportunity to present the dashboard to our partners at 99P Labs during this phase.

Data Source

The visualizations throughout the dashboard were created using simulation data provided by the MAC at Carnegie Mellon University. The simulation approximates an average day for the Columbus transportation network. Simulation results are estimated at intervals throughout a period ranging from 6 am to 10 am, and include baseline indicators such as the car and truck inflow, travel time, free flow travel time, speed, and public transit passenger inflow. Considering the data granularity and computational load, we aggregated the simulation results from 5-second intervals to 15-minute intervals.
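The aggregation step can be sketched as follows. This is a minimal illustration, not the team's actual pipeline: the record layout `(link_id, seconds_since_start, inflow)` and the function name `aggregate_inflow` are assumptions made for the example.

```python
from collections import defaultdict

BIN_SECONDS = 15 * 60  # width of one aggregated interval, in seconds


def aggregate_inflow(records, bin_seconds=BIN_SECONDS):
    """Sum inflow per (link_id, 15-minute bin).

    records: iterable of (link_id, seconds_since_start, inflow) tuples,
    one per 5-second simulation step (hypothetical layout).
    """
    totals = defaultdict(float)
    for link_id, second, inflow in records:
        # Integer division maps every 5-second step to its 15-minute bin.
        totals[(link_id, second // bin_seconds)] += inflow
    return dict(totals)


# Hypothetical link "A": one vehicle enters every 5 seconds for 30 minutes.
records = [("A", s, 1.0) for s in range(0, 1800, 5)]
binned = aggregate_inflow(records)
```

Counts such as inflow can simply be added across steps. Speeds and travel times would need to be averaged instead, and the article does not say whether a simple or a flow-weighted mean was used.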

Methodology for Selected Metrics

Methodologies for a subset of the metrics featured on the dashboard are described below:

Energy Consumption and CO2 Emission

The Energy Data Book reports fuel economy as a function of average speed, which allows us to fit the data points to a polynomial regression model. However, the latest edition contains only 4 data points, so we fitted a linear regression to those 4 points and used it to impute the missing values.

With the complete set of 14 data points, we fitted a polynomial regression model that estimates the energy consumption and emission rates from the average speed.
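A rough sketch of the polynomial fit is shown here. The speed and fuel economy values are synthetic placeholders, not figures from the Energy Data Book, and the polynomial degree is an assumption because the article does not state it.

```python
import numpy as np

# PLACEHOLDER data for illustration only; NOT values from the Energy Data Book.
speed_mph = np.arange(10, 80, 5)  # 14 speeds: 10, 15, ..., 75
fuel_economy_mpg = 32.0 - 0.012 * (speed_mph - 50.0) ** 2  # synthetic curve

DEGREE = 2  # assumed; the degree used by the team is not stated

# Least-squares polynomial fit of fuel economy against average speed.
coeffs = np.polyfit(speed_mph, fuel_economy_mpg, DEGREE)
predict_mpg = np.poly1d(coeffs)


def fuel_rate_gal_per_mile(speed):
    """Fuel consumption rate (gallon/mile) as the inverse of fuel economy."""
    return 1.0 / predict_mpg(speed)


rate_at_50 = fuel_rate_gal_per_mile(50.0)
```

With real data, it would be worth checking that the fitted curve stays positive over the whole range of simulated speeds before inverting it.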

The fuel consumption rate is computed as follows:

Fuel consumption rate (gallon/mile) = 1 / fuel economy (mile/gallon)

All carbon in the fuel is assumed to be converted to CO2 during combustion. Hence, the CO2 emission rates for cars and trucks are calculated, respectively, by:

Car CO2 emission rate (gram/mile) = 8887 * fuel consumption rate (gallon/mile)

Truck CO2 emission rate (gram/mile) = 10180 * fuel consumption rate (gallon/mile)

After obtaining the fuel consumption and emission rates, we multiply them by the vehicle-miles traveled on each link. Summing the results over the simulation horizon and/or over spatial units such as block groups yields the total fuel consumption and emissions of the mobility network.
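The calculation can be illustrated with a short sketch. The emission factors are the ones quoted in the formulas; the function names, the record layout, and the example inflows, link length, and fuel economies are hypothetical.

```python
CAR_G_CO2_PER_GALLON = 8887     # factor quoted in the text for cars
TRUCK_G_CO2_PER_GALLON = 10180  # factor quoted in the text for trucks


def co2_rate_g_per_mile(fuel_economy_mpg, grams_per_gallon):
    """Emission rate = grams per gallon * fuel consumption rate (1 / mpg)."""
    return grams_per_gallon * (1.0 / fuel_economy_mpg)


def link_co2_grams(inflow, length_miles, fuel_economy_mpg, grams_per_gallon):
    """Emissions on one link in one interval = rate * vehicle-miles traveled."""
    vmt = inflow * length_miles
    return co2_rate_g_per_mile(fuel_economy_mpg, grams_per_gallon) * vmt


def total_co2_grams(link_intervals):
    """Sum emissions over all (link, interval, vehicle class) records."""
    return sum(link_co2_grams(*record) for record in link_intervals)


# Hypothetical records: (inflow, length in miles, fuel economy, factor).
records = [
    (100, 0.5, 25.0, CAR_G_CO2_PER_GALLON),   # cars on one link
    (10, 0.5, 8.0, TRUCK_G_CO2_PER_GALLON),   # trucks on the same link
]
total = total_co2_grams(records)
```

Grouping the same records by block group before summing would give the spatial totals instead of the network total.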

Affordability

We used vehicle-miles traveled and vehicle-hours traveled as the measures of affordability. The formulas we used are as follows:

Vehicle-miles traveled = (car inflow + truck inflow) * length of link

Vehicle-hours traveled = (car inflow * length of link / car speed) + (truck inflow * length of link / truck speed)
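As a small worked example of these two formulas, the sketch below assumes link length in miles and speeds in miles per hour; the function names and input values are made up for illustration.

```python
def vehicle_miles_traveled(car_inflow, truck_inflow, length_miles):
    """VMT on one link: all vehicles entering times the link length."""
    return (car_inflow + truck_inflow) * length_miles


def vehicle_hours_traveled(car_inflow, truck_inflow, length_miles,
                           car_speed_mph, truck_speed_mph):
    """VHT on one link: per-class travel time (length / speed) times inflow."""
    car_hours = car_inflow * length_miles / car_speed_mph
    truck_hours = truck_inflow * length_miles / truck_speed_mph
    return car_hours + truck_hours


# Hypothetical link: 120 cars and 30 trucks on a 2-mile link.
vmt = vehicle_miles_traveled(120, 30, 2.0)              # 150 vehicles * 2 miles
vht = vehicle_hours_traveled(120, 30, 2.0, 40.0, 30.0)  # 6 car-hours + 2 truck-hours
```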

Congestion

We used two indicators for congestion: the Travel Time Index (TTI) and average delay. They are calculated as follows:

Travel Time Index (TTI): average travel time in each 15-minute interval / free flow travel time (calculated separately for cars and trucks)
Average Delay: average speed in each 15-minute interval / free flow speed (calculated separately for cars and trucks)
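Both indicators are simple ratios, as the sketch below shows. It follows the definitions exactly as written above; the function names and the example travel times and speeds are hypothetical.

```python
def travel_time_index(avg_travel_time, free_flow_travel_time):
    """TTI for one link and interval: 1.0 means free flow conditions."""
    if free_flow_travel_time <= 0:
        raise ValueError("free flow travel time must be positive")
    return avg_travel_time / free_flow_travel_time


def average_delay(avg_speed, free_flow_speed):
    """Average delay as defined in the text: interval speed / free flow speed."""
    if free_flow_speed <= 0:
        raise ValueError("free flow speed must be positive")
    return avg_speed / free_flow_speed


# Hypothetical car values for one link in one 15-minute interval.
tti_car = travel_time_index(90.0, 60.0)  # 90 s observed vs. 60 s at free flow
delay_car = average_delay(30.0, 45.0)    # 30 mph observed vs. 45 mph at free flow
```

Under these definitions, a TTI above 1.0 and a speed ratio below 1.0 both indicate traffic moving slower than free flow.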

Trip Trajectory

For trip trajectories, we processed the simulation results to:

Create the different sequences of link IDs between each origin-destination pair for each 15-minute interval.
Combine the costs a user bears along each sequence of link IDs between an origin-destination pair in a given 15-minute interval.
Organize the data files so that each row gives a 15-minute interval, the cost of travel along a sequence of link IDs during that interval, and the sequence of link IDs for the origin-destination pair.
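One way to picture these processing steps is the sketch below. It rests on several assumptions that the article does not confirm: the input layout, the function name `build_trajectory_rows`, and in particular the choice to compute the cost of a sequence as the sum of its link costs.

```python
def build_trajectory_rows(trips, link_costs):
    """Return one row per distinct (interval, OD pair, link sequence).

    trips: iterable of (origin, destination, interval, link_ids) tuples.
    link_costs: dict mapping (link_id, interval) -> cost on that link.
    Both layouts are hypothetical.
    """
    rows = {}
    for origin, destination, interval, link_ids in trips:
        sequence = tuple(link_ids)
        key = (interval, origin, destination, sequence)
        if key in rows:
            continue  # this sequence is already recorded for the interval
        # Assumption: the cost of a sequence is the sum of its link costs.
        cost = sum(link_costs[(link, interval)] for link in sequence)
        rows[key] = {
            "interval": interval,
            "origin": origin,
            "destination": destination,
            "cost": cost,
            "link_ids": sequence,
        }
    return list(rows.values())


# Hypothetical data: two routes between "O" and "D" in interval 0.
link_costs = {(1, 0): 2.0, (2, 0): 1.5, (3, 0): 4.0}
trips = [("O", "D", 0, [1, 2]), ("O", "D", 0, [3]), ("O", "D", 0, [1, 2])]
rows = build_trajectory_rows(trips, link_costs)
```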

Mobility Energy Productivity

Hou, Garikapati, Nag, Young, and Grushka (2019) calculate the Mobility Energy Productivity (MEP) metric using data on opportunities, travel times, cost, and energy intensity. We follow the authors’ methodology to calculate an employment-focused version of the MEP for Columbus using data on the number of employment opportunities available throughout the network and commuting speeds.

Our adaptation of their methodology for this project is summarized in five steps:

1. Calculate Census Block Group Centroids: Using the geopandas package for Python, we calculate the centroid of each Census Block Group in Columbus.
2. Calculate Isochrones: For each Census Block Group, we calculate how far it is possible to travel from the centroid within 10-, 15-, and 20-minute thresholds. Driving and public transit isochrones are calculated using their respective speed data. Six isochrones are calculated for each block group, one for every combination of travel time threshold and travel mode.
3. Calculate Cumulative Opportunities: Using Longitudinal Employer-Household Dynamics (LEHD) data, we determine the number of employment opportunities that can be accessed within each isochrone.
4. Calculate Modal Weighting Factor: Using parameters on the energy intensity and cost of different transportation modes, we calculate weights for our cumulative opportunity measure.
5. Calculate MEP: Combining the results from steps 3 and 4, we calculate the number of employment opportunities available after accounting for time, cost and energy intensity using the following formula:

Where i denotes Census Block Groups, k denotes modes, t denotes travel time thresholds, o_ikt denotes the cumulative opportunities for Block Group i using mode k and travel time threshold t, and exp(M_kt) is the exponential of the modal weighting factor.
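To make the role of these symbols concrete, here is a minimal sketch for a single block group. It assumes the score is a plain sum, over modes k and thresholds t, of the cumulative opportunities multiplied by exp(M_kt). That form is inferred from the symbol definitions only; the published metric may differ in detail, for example in how opportunities are differenced across thresholds. The function name, the dictionaries, and every number are hypothetical.

```python
import math


def mep_score(cumulative_opportunities, modal_weight):
    """Weighted opportunity sum for one block group (assumed form).

    cumulative_opportunities: dict (mode, threshold_min) -> jobs reachable.
    modal_weight: dict (mode, threshold_min) -> weighting factor M_kt.
    """
    return sum(
        jobs * math.exp(modal_weight[key])
        for key, jobs in cumulative_opportunities.items()
    )


# Hypothetical block group with two modes and one threshold.
opportunities = {("drive", 10): 1000, ("transit", 10): 200}

# With all weights at zero, exp(0) = 1 and the score is the raw job count.
unweighted = mep_score(opportunities, {("drive", 10): 0.0, ("transit", 10): 0.0})

# A negative weight discounts a mode; log(0.5) halves the driving term here.
discounted = mep_score(
    opportunities, {("drive", 10): math.log(0.5), ("transit", 10): 0.0}
)
```

Because the weight enters through an exponential, a more negative M_kt (slower, costlier, or more energy-intensive travel) shrinks the contribution of the corresponding opportunities toward zero without ever making it negative.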

Software Development Workflow

Software Requirements for Proposed Solution

Dashboard Use Cases

The dashboard is designed for both technical and non-technical users, and we believe it can foster engagement across the community. Here, we provide two use cases from two different perspectives: one as an individual user and the other as a city planner.

First, imagine you are a nine-to-five office worker who drives to work every day, and you are wondering about the congestion level in the transportation network during the morning rush hours. Here’s how the dashboard can help you answer this question.

Select the sub-tab “Transportation Efficiency” and the metric “Average Delay Car”. This metric measures the congestion experienced by cars in the Columbus network. On the map, the dark blue area in the city center reveals a highly congested part of the network. If you zoom in on the map, you can see that cars traveling through the Census Block close to the “Arena District” experience significant delays.

**Figure 2.** Dashboard Use Case 1, Transportation Efficiency

The second use case shows how policymakers can benefit from understanding emission levels in the area.

Transportation accounts for 29% of total greenhouse gas emissions, making it the largest contributor. From an environmental standpoint, the goal is to minimize the resources consumed by transportation.

The dashboard can help policymakers identify the areas of the city with higher emission levels so that they can target policies such as a carbon tax effectively.

In this case, click the sub-tab “Environmental Impact” and select the emissions car metric, which is the total carbon dioxide generated by individual cars. To view the link level this time, deselect the neighborhood level and click the link level. The dark gray links in the network are the ones with the highest carbon dioxide emissions.

From our model, we know that the emission level is associated with the average speed of cars. Policymakers may therefore want to examine the links with the highest emission levels and investigate road conditions, speed limits, or other factors that affect car speed in order to decrease emissions.

**Figure 3.** Dashboard Use Case 2, Environmental Impact

Overall, metric maps such as “Average Delay” and “CO2 Emission” can help policymakers identify pressure points within the transportation network that may benefit from policy intervention.

**Table 1.** Transportation Sustainability Metrics

Future Work

Research and Metrics

Scaling Up to New Cities

The smart mobility dashboard and its underlying metrics combine to form a replicable framework that could be used to visualize data for other cities or geographic areas. Reworking our tool to visualize additional data would be simple given new Mobility Network Model simulation data for the transportation network, and corresponding public data for the geographic area such as Census shapefiles, ACS median income estimates, and LEHD opportunities data.

Simulation Comparisons

Implementing simulation comparisons is a longer-term goal for the smart mobility dashboard. With the comparison functionality, users could select a policy intervention scenario to compare against the average day data; for instance, a simulation representing expanded first- and last-mile mobility services to public transit hubs. A policymaker considering this scenario may be interested in predicting its effects on car flows, emissions, and congestion at different points within the transportation network. While the current dashboard would be capable of visualizing these metrics of interest for the intervention scenario, our current visualizations would not emphasize comparisons between the intervention scenario and the average day data. Future work could fine-tune data visualizations to make comparisons between scenarios evident.

Trip Trajectories

Implementing the full dataset for trip trajectories would give more information about optimal routes for each time interval in terms of both time and cost savings. Furthermore, enriching this dataset with details such as the nature of each trip and applying machine learning models could help us better understand why and for what purpose users commute. Such analysis could inform infrastructure projects and land-use decisions, such as whether to build commercial space or residential areas, to improve overall traffic in a region.

Software Development

Switch to HTTPS

The HTTP protocol has some obvious shortcomings: it communicates in clear text, does not verify the identity of the other party, and cannot prove the integrity of a message. As a result, using HTTP carries the risk of forged requests and content tampering.

Compared with HTTP, HTTPS provides more secure network communication: it can authenticate users and servers and encrypt data in transit, so we suggest that the dashboard website adopt the HTTPS protocol. Implementing an HTTPS website involves a few simple steps:

(1) Apply for an SSL certificate and record the DNS configuration

(2) Purchase a domain name from a domain name vendor and point it to the public IP address of the EC2 instance

(3) Add DNS resolution rules in the domain name console and return to the SSL certificate website for verification

(4) Install and start the Nginx service on the EC2 instance and configure it with the SSL certificate to support HTTPS

Migrate data to Cloud Database System

We currently store our data in Mapbox datasets, which are easy to use with Mapbox but are not well suited to developing customized features and also have a storage limit.

In the future, we hope to migrate the data to a cloud database with larger storage capacity. This would also make it easier to run models such as regressions or to filter data online.

Add more features in feature list

As we mentioned before, we created a feature list for the ideal dashboard. Because of limited time, we finished only the most important parts and the design of all features. In future work, we could complete:

1. Multiple scenarios visualization

2. Customizable map style, so that users can choose different color schemes and map types

3. Route visualization, so that users can obtain different routes, travel modes, and corresponding travel times by clicking on two neighborhoods

Acknowledgement

The research team would like to thank Dr. Sean Qian for his guidance and support throughout the course of this project. This project would not have been possible without the trust placed in us by our partners at 99P Labs. We would also like to thank Lynn Kirabo, a 5th-year PhD student from the TBD Lab at Carnegie Mellon University, for helping us improve the overall user experience of the Sustainable Mobility Analytics Dashboard, and Stan Caldwell from the Traffic21 Institute for critically evaluating the dashboard from a policy perspective and helping us bring context and depth to our analysis.