Privacy-Sensitive Data Management for Securely Receiving Smart Home Services Under Benefit-Risk Trade-off

Sopicha Stirapongsasuti


A smart home equipped with IoT devices such as sensors and appliances can provide its residents with useful services, such as automatic life logging and elderly monitoring by machine learning, or reduced energy consumption through optimized home energy management. However, most IoT-based services, including smart-home services, follow a cloud-based approach in which the data are stored and accumulated, and then processed and analyzed, on cloud servers. Hence, the data are at risk of exposure through cyber attacks, careless management by untrusted service providers, and similar threats, and such exposure can lead to re-identification of the user. To preserve the privacy of residents and prevent re-identification, the kinds of data and the upload frequency (or data granularity) sent from smart homes to the cloud servers must be restricted. At the same time, we must also consider the benefit that users obtain from the service (e.g., elderly monitoring) in return for uploading their data. Many studies in the literature address privacy preservation in IoT and smart-home systems, but no study has yet considered the trade-off between the risk of privacy leakage and the benefit of the services received. In this thesis, aiming to minimize the risk of privacy leakage while maximizing the benefit of the services received by users, we propose a novel privacy-aware smart-home system composed of three parts: smart homes with sensors, edge computing servers, and a cloud server. Compared with relying on the cloud server alone, edge servers reduce transmission delay and computation time, and they mitigate the risk of data leakage attacks, because only the analysis results are sent to the cloud while the raw data are discarded.
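To make the link between data granularity and re-identification concrete, the following minimal sketch measures k-anonymity (the size of the smallest group of households whose uploaded records are identical) for one time slot, first on hypothetical raw records and then after coarsening them to activity labels only. The record layout (activity label, sensor ID, number of firings), the `coarsen` helper, and the mapping risk = 1/k are illustrative assumptions, not the exact definitions used in the thesis.

```python
from collections import Counter


def k_anonymity(records):
    """Return k, the size of the smallest group of identical records.

    Each record holds the quasi-identifiers one household exposes in a
    time slot; every household is then indistinguishable from at least
    k - 1 others.
    """
    if not records:
        raise ValueError("need at least one record")
    return min(Counter(records).values())


def coarsen(record):
    """Hypothetical coarsening: keep only the activity label (field 0)."""
    return (record[0],)


# Hypothetical records of 6 households in one time slot:
# (activity label, sensor ID, number of sensor firings).
raw_records = [
    ("cooking", "M017", 42),
    ("cooking", "M018", 37),
    ("cooking", "M017", 42),
    ("sleeping", "M003", 2),
    ("sleeping", "M003", 5),
    ("sleeping", "M004", 2),
]

k_raw = k_anonymity(raw_records)
k_label = k_anonymity([coarsen(r) for r in raw_records])
# Hypothetical risk mapping: the chance of singling out one household.
print(f"raw: k={k_raw}, risk={1 / k_raw:.2f}")
print(f"label: k={k_label}, risk={1 / k_label:.2f}")
```

With these toy records, uploading raw data leaves at least one household unique (k = 1, risk 1.00), whereas label-only uploads place every household in a group of three (k = 3, risk 0.33).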
To preserve privacy while retaining the benefit of the service, we employ the following ideas: (i) choice of upload data granularity and upload frequency, (ii) time-slot-based risk and benefit assessment, and (iii) decision of the optimal choice in each time slot over the service duration (e.g., 1 day or 1 week), taking into account the trade-off between risk and benefit. In (i), each smart-home user can choose to upload either the raw data or only the activity labels to the cloud servers. Uploading only activity labels carries lower risk (since raw data contain more information that can be used to re-identify the user), but it consumes more resources on the local edge server, which must be paid for, to estimate the activity labels from the raw data. In (ii), we use $k$-anonymity to evaluate the risk level and the user's preference to evaluate the benefit in each time slot. In (iii), we formulate a combinatorial optimization problem that outputs the best choice of data granularity and upload frequency in each time slot, subject to the constraints of edge server resources and users' budgets, and taking into account the $k$-anonymity of activities and users' preferences. We also propose an algorithm that derives the optimal solution using integer linear programming (ILP).

We evaluated the proposed system with the open smart-home dataset CASAS. For convenience, we divided the data into 60 one-day segments and regarded them as the data of 60 different households on the same day. We applied the proposed algorithm under the following conditions: (1) one day is divided into 8 time slots; (2) high, medium, and low budgets are assigned equally to the 60 homes (20 each); and (3) a high, medium, or low preference is randomly assigned to each time slot. We then compared the result with two conventional methods. The results show that the proposed method outperforms the other methods, which indicates that supporting users' decisions on smart-home data upload is important for preserving their privacy.
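The per-slot decision can be illustrated with a minimal sketch for a single household. The thesis solves its formulation with ILP; as a swapped-in technique, this sketch solves a simplified 0-1 choice (skip, upload raw data, or upload only the activity label in each slot) by exhaustive enumeration, which is practical only for a few slots (3^8 = 6561 plans for 8 slots). The scoring model (preference minus `alpha` times a risk of 1/k), the unit edge cost per label slot, the numeric preference levels, and names such as `best_plan` are all hypothetical, and the shared edge-server capacity that couples households in the full problem is omitted.

```python
from itertools import product

# Hypothetical encoding of the per-slot decision.
SKIP, RAW, LABEL = 0, 1, 2
# Hypothetical numeric preference levels.
HIGH, MEDIUM, LOW = 3, 2, 1


def slot_score(option, preference, risk_raw, risk_label, alpha):
    """Hypothetical score of one slot: benefit minus alpha-weighted risk."""
    if option == SKIP:
        return 0.0  # nothing uploaded: no benefit, no risk
    risk = risk_raw if option == RAW else risk_label
    return preference - alpha * risk


def best_plan(preferences, risk_raw, risk_label, label_cost, budget, alpha=1.0):
    """Solve the single-household 0-1 problem by exhaustive enumeration.

    Picks one option per slot to maximize the total score, subject to
    the edge cost of all LABEL slots not exceeding the budget.
    Runs in O(3**T * T) for T slots, so it only suits small T.
    """
    n_slots = len(preferences)
    best_value, best_choice = float("-inf"), None
    for choice in product((SKIP, RAW, LABEL), repeat=n_slots):
        # Only LABEL slots consume (paid) edge-server resources.
        cost = label_cost * sum(1 for c in choice if c == LABEL)
        if cost > budget:
            continue
        value = sum(
            slot_score(c, preferences[t], risk_raw[t], risk_label[t], alpha)
            for t, c in enumerate(choice)
        )
        if value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice


# Usage: 8 slots, risk = 1/k with hypothetical k = 1 (raw) and k = 2 (label).
preferences = [LOW, HIGH, MEDIUM, LOW, HIGH, MEDIUM, LOW, MEDIUM]
value, plan = best_plan(
    preferences,
    risk_raw=[1.0] * 8,
    risk_label=[0.5] * 8,
    label_cost=1,
    budget=2,
    alpha=1.5,
)
print(value, plan)
```

Here the optimal value is 6.0: with only two label uploads affordable, the sketch spends them on high- or medium-preference slots, where switching from raw data to labels removes the most weighted risk, and it skips the low-preference slots rather than uploading their raw data. For realistic numbers of households and slots, an ILP solver replaces the enumeration.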