MBTA Rider Segmentation

Team

Harvard 2018 Spring AC297r Capstone Project: Chia Chi (Michelle) Ho, Yijun Shen, Jiawen Tong, Anthony Hou

Our github: organization

Introduction

The Massachusetts Bay Transportation Authority (MBTA) is the largest public transportation agency in New England, delivering a complex system of subway, bus, commuter rail, light rail, and ferry services to riders in the dynamic economy of the Greater Boston Area. The MBTA is estimated to provide over 1.3 million trips on an average weekday. Although the MBTA collects a wealth of trip transaction data every day, it has persistently lacked knowledge of its rider groups and their respective ridership habits. Understanding rider segments in terms of pattern-of-use has significant implications for developing new policies that improve the MBTA's service planning and for potentially changing its fare structure.

Project Deliverables

This repo contains code documentation for both of our deliverables:

Figure 1: Project Deliverables

Github Organization Structure

Note: The limited Dashboard, Final Report, and Code Documentation are linked via a navigation bar on their respective GitHub pages.

Full Package Structure

At a high level, this rider segmentation package groups individual MBTA riders according to pattern-of-use dimensions. The full package has the following structure:

Rider-Segmentation-Full-App/
    MBTAdashboard/
        __init__.py
        app.py
        src/
            __init__.py
            json_generator_driver.py
            utils.py
        static/
            css/
                'custom css files for D3 visualization'
            img/
                'MBTA icon images'
            js/
                'custom javascript files for D3 visualization'
            lib/
                'javascript, css, and fonts library files'
        templates/
            'html files'
    MBTAriderSegmentation/
        __init__.py
        config.py
        features.py
        profile.py
        report.py
        segmentation.py
        visualization.py
        []_driver.py
        Manully_Label_Clusters.ipynb
        Train_Report_Model.ipynb
        data/
            cached_clusters/
                'cached clustering results' - not published
            cached_features/
                'cached extracted rider-level pattern-of-use features' - not published
            cached_profiles/
                'cached cluster profiles'
            cached_viz/
                'cached cluster geographical distribution visualization from the visualization module'
            input/
                census/
                    'census data'
                geojson/
                    'data to draw the maps'
                afc_odx/
                    'AFC/ODX data' - not published
                fareprod/
                    'Fare product data' - not published
                stops/
                    'MBTA Stops data' - not published       
            report_models/
                report_cnn.h5
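
Several of the data directories above are marked "not published", so a fresh checkout will not contain them. The following is a minimal sketch, not part of the package, of how one might check which data directories are present before running anything. The helper names `data_paths` and `missing_dirs` are hypothetical; only the directory names are taken from the tree above.

```python
from pathlib import Path

# Sub-directories of MBTAriderSegmentation/data/, copied from the tree above.
DATA_SUBDIRS = (
    "cached_clusters",
    "cached_features",
    "cached_profiles",
    "cached_viz",
    "input/census",
    "input/geojson",
    "input/afc_odx",
    "input/fareprod",
    "input/stops",
    "report_models",
)

# Directories the tree marks as "not published"; these must be supplied locally.
NOT_PUBLISHED = {
    "cached_clusters",
    "cached_features",
    "input/afc_odx",
    "input/fareprod",
    "input/stops",
}


def data_paths(package_root):
    """Map each data sub-directory name to its path under <package_root>/data.

    Hypothetical helper for illustration; the package itself may resolve paths differently.
    """
    data_dir = Path(package_root) / "data"
    return {name: data_dir / name for name in DATA_SUBDIRS}


def missing_dirs(package_root):
    """Return the sorted names of expected data sub-directories that do not exist on disk."""
    paths = data_paths(package_root)
    return sorted(name for name, path in paths.items() if not path.is_dir())


if __name__ == "__main__":
    root = Path("Rider-Segmentation-Full-App") / "MBTAriderSegmentation"
    for name in missing_dirs(root):
        note = " (not published; supply locally)" if name in NOT_PUBLISHED else ""
        print(f"missing: {name}{note}")
```

Run from the directory that contains `Rider-Segmentation-Full-App/`, this prints one line per absent directory and flags the ones that are expected to be absent from the published repository.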

A brief item description for MBTAdashboard:

A brief item description for MBTAriderSegmentation:

Installation