Module Class Documentation

Contents

Feature Extraction

The feature.py module cleans MBTA transaction-level data and extracts rider-level pattern-of-use features.

Class DataLoader

A DataLoader object first merges the joint AFC_ODX table, the stops table and fare product tables to form transaction records. The preprocessed transaction records are then passed to a FeatureExtractor object to extract the rider-level pattern-of-use features.

Note: A DataLoader object is initialized by a FeatureExtractor object and is not explicitly used elsewhere in our project.
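The merge step described above can be pictured with a minimal sketch in plain Python. The helper merge_transactions and every field name (origin_stop_id, fare_product_id, zipcode, and so on) are hypothetical and not taken from feature.py; the actual module may well use a dataframe library and different join keys.

```python
def merge_transactions(afc_odx_rows, stops, fare_products):
    """Attach stop and fare-product attributes to each AFC_ODX row.

    All field names are illustrative assumptions, not the project's schema.
    """
    stops_by_id = {s["stop_id"]: s for s in stops}
    products_by_id = {p["fare_product_id"]: p for p in fare_products}

    records = []
    for row in afc_odx_rows:
        stop = stops_by_id.get(row["origin_stop_id"])
        product = products_by_id.get(row["fare_product_id"])
        if stop is None or product is None:
            continue  # inner-join behaviour: drop rows without a match
        records.append({
            **row,
            "stop_name": stop["stop_name"],
            "zipcode": stop["zipcode"],
            "fare_product": product["name"],
        })
    return records


# Made-up sample rows, for illustration only.
afc_odx_rows = [
    {"rider_id": "r1", "origin_stop_id": "s1", "fare_product_id": "p1",
     "timestamp": "2017-10-02 08:15:00"},
    {"rider_id": "r2", "origin_stop_id": "s9", "fare_product_id": "p1",
     "timestamp": "2017-10-02 09:00:00"},
]
stops = [{"stop_id": "s1", "stop_name": "Example Stop", "zipcode": "02139"}]
fare_products = [{"fare_product_id": "p1", "name": "Monthly Pass"}]

records = merge_transactions(afc_odx_rows, stops, fare_products)
```

In this sketch the second row is dropped because its stop has no match in the stops table; whether the real DataLoader drops or keeps such rows is not stated here.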

Class FeatureExtractor

A FeatureExtractor object performs two steps. First, it extracts the rider-level temporal, geographical, and ticket-purchasing features from the preprocessed transaction records returned by the DataLoader. Second, it labels riders by their total number of trips and by whether they use commuter rail (except for Zone 1A). The second step supports further filtering in the segmentation model.
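As a rough illustration of rider-level feature extraction, the sketch below builds a 7x24 grid of trip counts per rider (day of week by hour of day) and labels riders by total trip count. The function names, the record fields, and the min_trips threshold are assumptions made for this example; the commuter rail label is not shown.

```python
from collections import defaultdict
from datetime import datetime


def temporal_profiles(records):
    """Count each rider's trips in a 7x24 grid (weekday x hour)."""
    grids = defaultdict(lambda: [[0] * 24 for _ in range(7)])
    for rec in records:
        ts = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M:%S")
        # weekday() returns 0 for Monday through 6 for Sunday.
        grids[rec["rider_id"]][ts.weekday()][ts.hour] += 1
    return dict(grids)


def label_riders(grids, min_trips):
    """Flag riders whose total trip count reaches a hypothetical threshold."""
    return {
        rider: sum(map(sum, grid)) >= min_trips
        for rider, grid in grids.items()
    }


# Made-up sample records, for illustration only.
records = [
    {"rider_id": "r1", "timestamp": "2017-10-02 08:15:00"},
    {"rider_id": "r1", "timestamp": "2017-10-02 17:40:00"},
    {"rider_id": "r1", "timestamp": "2017-10-03 08:05:00"},
    {"rider_id": "r2", "timestamp": "2017-10-07 13:00:00"},
]

grids = temporal_profiles(records)
labels = label_riders(grids, min_trips=2)
```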

Rider Segmentation

Class Segmentation

A Segmentation object clusters riders by their temporal, geographical, and ticket-purchasing features based on a user-specified pipeline option (hierarchical vs. non-hierarchical), a user-specified algorithm option (kmeans vs. lda), and user-specified feature weights.
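One way to picture the kmeans option with feature weights is sketched below: each feature group is scaled by the square root of its weight, so that squared Euclidean distance weights the groups accordingly, and a plain Lloyd's algorithm then clusters the scaled rows. This is an assumption about how weighting could work, not the Segmentation class's actual implementation; the hierarchical pipeline and the lda option are not shown, and apply_weights and kmeans are hypothetical helpers.

```python
import math


def apply_weights(rows, group_sizes, weights):
    """Scale each feature group by sqrt(weight)."""
    scaled = []
    for row in rows:
        out, start = [], 0
        for size, w in zip(group_sizes, weights):
            out.extend(x * math.sqrt(w) for x in row[start:start + size])
            start += size
        scaled.append(out)
    return scaled


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster's members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up rows: 2 temporal, 1 geographical, 1 ticket-purchasing feature.
rows = [
    [0.9, 0.1, 0.2, 1.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.8, 0.2, 0.3, 1.0],
    [0.2, 0.8, 0.7, 0.0],
]
scaled = apply_weights(rows, group_sizes=[2, 1, 1], weights=[1.0, 0.5, 0.5])
labels, centroids = kmeans(scaled, k=2)
```

Seeding with the first k points keeps the example deterministic; a production clusterer would normally use a randomized or k-means++ initialization.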

Cluster Inference

Class CensusFormatter

A CensusFormatter object formats the census data to counts, percentages or proportions based on the user’s specification.
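The three output formats can be illustrated with a small sketch. The function format_census, its mode argument, and the category names are hypothetical and are not the CensusFormatter interface.

```python
def format_census(counts, mode="counts"):
    """Convert raw counts per category to counts, proportions, or percentages."""
    if mode not in ("counts", "proportions", "percentages"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "counts":
        return dict(counts)
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}  # avoid division by zero
    scale = 100.0 if mode == "percentages" else 1.0
    return {key: scale * value / total for key, value in counts.items()}


# Made-up counts for one census area, for illustration only.
age_counts = {"under_18": 200, "18_to_64": 600, "65_plus": 200}

proportions = format_census(age_counts, mode="proportions")
percentages = format_census(age_counts, mode="percentages")
```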

Class ClusterProfiler

A ClusterProfiler object summarizes each cluster’s overall pattern-of-use features and infers its demographic distributions by mapping its softmax-transformed geographical patterns to the census data.
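One plausible reading of this mapping is a weighted average: apply a softmax to the cluster's geographical pattern to get one weight per area, then average the areas' census proportions with those weights. The sketch below follows that reading; the function names, the area keys, and the census categories are assumptions, and the actual ClusterProfiler may differ.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def infer_demographics(geo_pattern, census_by_area):
    """Average per-area census proportions, weighted by softmax(geo_pattern)."""
    areas = list(geo_pattern)
    weights = softmax([geo_pattern[a] for a in areas])
    categories = list(census_by_area[areas[0]])
    return {
        cat: sum(w * census_by_area[a][cat] for w, a in zip(weights, areas))
        for cat in categories
    }


# Made-up values, for illustration only.
geo_pattern = {"area_a": 2.0, "area_b": 2.0}
census_by_area = {
    "area_a": {"renter": 0.6, "owner": 0.4},
    "area_b": {"renter": 0.8, "owner": 0.2},
}

profile = infer_demographics(geo_pattern, census_by_area)
```

Because the softmax weights sum to one, the inferred profile remains a valid distribution whenever each area's census proportions do.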

Visualization

Class Visualization

A Visualization object renders the cluster profiles as several chart types: a static heatmap for cluster temporal patterns, a static scatter chart showing clusters on a 2D PCA subspace, an interactive map for cluster geographical patterns, and static bar charts for other cluster statistics.

Auto Report Generator

Class ReportGenerator

A ReportGenerator object is initialized in a ClusterProfiler object. It generates a text summary for each cluster based on the output of the ClusterProfiler that contains it and on a pre-trained (and retrainable) convolutional neural network (CNN) model that classifies the 7x24 temporal patterns.