“Big data” offers great potential in transportation planning, and methods must be developed to tap that potential. This case study demonstrates the use of large, electronically generated datasets for an intercity highway origin-destination study. Data from electronic toll collection, license plate capture, and a web-based survey were collected for this project, then combined and weighted to create one representative dataset for understanding intercity auto travel across the entire Northeast Corridor (NEC).

Seven and a half million E-ZPass transactions representing three consecutive days of travel were collected from eight agencies in the NEC, forming the backbone of this study. A license plate capture method filled the tolling gap in the state of Connecticut, with four probe sites collecting data over the same three days.

More than 787,000 web survey invitations were sent to E-ZPass users who made relevant long-distance trips, license-plate-capture recruits, and all customers from a ninth E-ZPass agency, resulting in over 15,000 responses.

With so much study data, the challenge is to aggregate the data efficiently enough to make it useful while keeping it as disaggregate as possible to preserve the benefits of its granularity. For E-ZPass data, this was accomplished by using timestamps and the direction of travel between sequential toll plazas to reduce the transactions to five million discrete trips.
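The reduction of transactions to trips can be illustrated with a minimal sketch. It is not the study's actual procedure: the record fields, the two-hour gap threshold, and the rule that a change of direction starts a new trip are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


@dataclass(frozen=True)
class Transaction:
    """One toll transaction (hypothetical field names)."""
    tag_id: str
    plaza: str
    direction: str  # e.g. "N" or "S"
    time: datetime


def build_trips(transactions, max_gap=timedelta(hours=2)):
    """Chain each tag's time-ordered transactions into trips.

    A new trip starts when the gap since the previous transaction
    exceeds max_gap (an assumed threshold) or the direction changes.
    """
    trips = []
    ordered = sorted(transactions, key=lambda t: (t.tag_id, t.time))
    for _, group in groupby(ordered, key=lambda t: t.tag_id):
        current = []
        for tx in group:
            if current:
                previous = current[-1]
                too_late = tx.time - previous.time > max_gap
                turned_around = tx.direction != previous.direction
                if too_late or turned_around:
                    trips.append(current)
                    current = []
            current.append(tx)
        if current:
            trips.append(current)
    return trips


# Made-up sample data: tag A drives north in the morning and south in
# the evening; tag B passes a single plaza.
day = datetime(2020, 1, 1)
sample = [
    Transaction("A", "P1", "N", day.replace(hour=8, minute=0)),
    Transaction("A", "P2", "N", day.replace(hour=8, minute=40)),
    Transaction("A", "P2", "S", day.replace(hour=17, minute=0)),
    Transaction("A", "P1", "S", day.replace(hour=17, minute=30)),
    Transaction("B", "P3", "N", day.replace(hour=9, minute=0)),
]
trips = build_trips(sample)
```

Each resulting trip keeps its full list of plaza transactions, so the first and last plazas can serve as approximate entry and exit points while the intermediate detail remains available.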

The electronic tolling data provided such fine granularity that marginal weighting methods had to be used to weight the comparatively sparse survey data. This was made more complicated by the addition of the ninth E-ZPass agency, which did not provide detailed transaction data. Accurately combining and weighting these datasets was crucial to this study’s success.
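One common form of marginal weighting is raking (iterative proportional fitting), in which survey weights are adjusted until the weighted totals match control totals on each dimension separately, rather than on every cell of the full cross-tabulation. The sketch below assumes raking for illustration only; the abstract does not state which marginal method was used, and the dimensions, categories, and control totals shown are made up.

```python
def weighted_total(records, weights, dim):
    """Sum the weights by category along one dimension."""
    totals = {}
    for rec, w in zip(records, weights):
        totals[rec[dim]] = totals.get(rec[dim], 0.0) + w
    return totals


def rake(records, margins, iterations=100, tol=1e-9):
    """Rake survey weights to match control totals on each margin.

    records: list of dicts, one per survey response.
    margins: {dimension: {category: control_total}}; every margin is
             assumed to sum to the same grand total.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        max_change = 0.0
        for dim, targets in margins.items():
            totals = weighted_total(records, weights, dim)
            for i, rec in enumerate(records):
                factor = targets[rec[dim]] / totals[rec[dim]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all margins already match
            break
    return weights


# Hypothetical survey responses and control totals (e.g. trip counts
# taken from the tolling data).
records = [
    {"origin": "NY", "period": "peak"},
    {"origin": "NY", "period": "offpeak"},
    {"origin": "NJ", "period": "peak"},
    {"origin": "NJ", "period": "offpeak"},
    {"origin": "NY", "period": "peak"},
]
margins = {
    "origin": {"NY": 600.0, "NJ": 400.0},
    "period": {"peak": 500.0, "offpeak": 500.0},
}
weights = rake(records, margins)
```

Matching margins rather than individual cells keeps the procedure workable when the survey has few or no responses in many fine-grained cells of the control data.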

Once weighted, the data provided the first comprehensive, data-driven auto origin-destination (OD) dataset for the NEC. With big data providing the control data, survey recruits, and granularity, and the survey providing specific trip and traveler details, the combination resulted in a large-scale, robust, yet manageable dataset.