Collecting Tourist Activity Statistics from Public Cameras Using Open-World Person Re-identification

Chang Xin (1811404)


Travel information and recommendation applications have become increasingly popular in recent years with the growth of economic globalization. Using a user's historical data and manual input, these applications build a user profile and recommend tour packages, travel routes, or touring plans. In all of these recommendation systems, peer data is a key factor in recommendation quality. However, both the quality and the quantity of such data rely on the active participation of tourists, so manual collection of real-time travel data is both expensive and inefficient.

In this thesis, we propose a vision-based Tourist Activity Statistics Collection system (TASC) that enables points of interest (PoIs) to automatically collect tourists' fine-grained touring routes without their active participation. Based on real-time video streams from public cameras in a PoI, TASC leverages visual multi-object tracking (MOT) and person re-identification (re-ID) to extract short-term touring trajectories of anonymous tourists in scenic spots. From this trajectory information, statistics such as the average visiting time in PoIs, overall staying time, and attraction popularity can be readily produced.
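
To illustrate how such statistics could follow from extracted trajectories, the minimal Python sketch below assumes a hypothetical trajectory format in which each anonymous tourist ID maps to a list of (spot, enter time, leave time) visits. The format, the function name, and the definition of popularity as the number of distinct visitors are assumptions made for illustration, not the representation used by TASC.

```python
from collections import defaultdict

# Hypothetical trajectory format (an assumption, not taken from the thesis):
# tourist_id -> list of (spot, enter_seconds, leave_seconds) visits.
def trajectory_statistics(trajectories):
    visit_durations = defaultdict(list)  # spot -> durations of individual visits
    visitors = defaultdict(set)          # spot -> distinct tourist ids seen there
    staying_time = {}                    # tourist id -> total time over all spots
    for tourist_id, visits in trajectories.items():
        total = 0.0
        for spot, enter, leave in visits:
            duration = leave - enter
            visit_durations[spot].append(duration)
            visitors[spot].add(tourist_id)
            total += duration
        staying_time[tourist_id] = total
    # Average visiting time per spot and popularity as distinct-visitor count.
    avg_visit_time = {s: sum(d) / len(d) for s, d in visit_durations.items()}
    popularity = {s: len(v) for s, v in visitors.items()}
    return avg_visit_time, staying_time, popularity

example = {
    "t1": [("gate", 0, 60), ("pavilion", 100, 400)],
    "t2": [("pavilion", 50, 250)],
}
avg_visit_time, staying_time, popularity = trajectory_statistics(example)
```

In this toy input, two tourists visit the pavilion for 300 and 200 seconds, so its average visiting time is 250 seconds and its popularity is 2.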

While visual object tracking and person re-ID have gained much popularity in computer vision research, few researchers have paid attention to tracking performance on real-time video streams. On the one hand, long-term multi-person, multi-camera tracking imposes a heavy computational burden: the rapidly growing re-ID gallery greatly slows down person search. On the other hand, slow and unstable processing speed results in both redundancy and imbalance in the re-ID gallery, which in turn degrades tracking accuracy. To ensure a reasonable tracking speed while maintaining good performance, we leverage hierarchical correlation clustering (HCC), which reduces redundancy in the feature gallery while also cutting the computational burden. With dynamic maintenance of the feature gallery, TASC can effectively track tourists and extract touring routes across multiple fine-grained spots in a PoI.
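
The idea of reducing gallery redundancy can be sketched in a few lines of Python. The sketch below is not the HCC procedure itself, which this abstract does not detail. It substitutes plain agglomerative merging on cosine similarity, and the function name `reduce_gallery` and the `threshold` parameter are hypothetical. Near-duplicate features of one identity are merged into a single centroid, so the gallery searched during re-ID stays small.

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def reduce_gallery(features, threshold=0.9):
    """Merge near-duplicate features of one identity into cluster centroids.

    Simple agglomerative merging used as a stand-in for HCC (an assumption);
    `threshold` is a hypothetical minimum cosine similarity for merging.
    """
    # Each cluster is (sum_vector, member_count); its centroid is sum / count.
    # Cosine similarity is scale-invariant, so comparing sums compares centroids.
    clusters = [(list(f), 1) for f in features]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = _cosine(clusters[i][0], clusters[j][0])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        merged = ([x + y for x, y in zip(clusters[i][0], clusters[j][0])],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    # Return one centroid per cluster (re-normalise if unit vectors are needed).
    return [[x / n for x in s] for s, n in clusters]

gallery = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
reduced = reduce_gallery(gallery)
```

Here the first two features are nearly identical and collapse into one representative, while the third stays separate, so three gallery entries become two.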

For system evaluation, we collected a dataset of 3-camera, 30-minute videos of 23 test subjects and performed experiments on an NVIDIA Jetson TX2 computing platform, which runs TASC at 1 to 3 frames per second. Experimental results show that our system reaches a 64.8% IDF1 score on our tourist tracking video data, and 80.1% precision and 69.7% recall on tourist trajectory extraction, even at low input frame rates.