Defend Against Website Fingerprinting With Machine Learning

Abstract

In stream fingerprinting, an attacker can compromise user privacy by leveraging side-channel information (e.g., packet size) of encrypted traffic in streaming services. By taking advantage of machine learning, especially neural networks, an adversary can reveal which YouTube video a victim watches with extremely high accuracy. While effective defense methods have been proposed, they require extremely high bandwidth overheads. In other words, how to build an effective defense with low overheads remains an open problem. In this paper, we propose a new defense mechanism, referred to as SmartSwitch, to address this open problem. Our defense intelligently switches the noise level on different packets such that the defense remains effective but minimizes overheads. Specifically, our method produces higher noise to obfuscate the sizes of more significant packets. To identify which packets are more significant, we formulate the task as a feature selection problem and investigate several feature selection methods over high-dimensional data. Our experimental results derived from a large-scale dataset demonstrate that our proposed defense is highly effective against stream fingerprinting built upon Convolutional Neural Networks. Specifically, an adversary can infer which YouTube video a user watches with only 1% accuracy (same as a random guess) even if the adversary retrains neural networks with obfuscated traffic. Compared to the state-of-the-art defense, our mechanism can save nearly 40% of bandwidth overheads.
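The idea of switching noise levels can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`mutual_information`, `top_k_positions`, `switch_noise`), the use of mutual information as the selection score, the independent Laplace noise, and the two epsilon values are all assumptions made for illustration. The sketch also ignores practical constraints, such as packet sizes needing to stay positive.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Set


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Empirical mutual information (bits) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    f_count = Counter(feature)
    l_count = Counter(labels)
    return sum(
        (c / n) * math.log2(c * n / (f_count[f] * l_count[l]))
        for (f, l), c in joint.items()
    )


def top_k_positions(traces: Sequence[Sequence[int]],
                    labels: Sequence[int], k: int) -> Set[int]:
    """Hypothetical selector: the k positions sharing the most information with the label."""
    n_pos = len(traces[0])
    scores = [mutual_information([t[j] for t in traces], labels)
              for j in range(n_pos)]
    ranked = sorted(range(n_pos), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def switch_noise(trace: Sequence[float], significant: Set[int],
                 eps_strong: float, eps_weak: float,
                 rng: random.Random) -> List[float]:
    """Add Laplace noise of scale 1/eps; significant positions use eps_strong (< eps_weak)."""
    noised = []
    for j, size in enumerate(trace):
        eps = eps_strong if j in significant else eps_weak
        # The difference of two exponentials with rate eps is Laplace with scale 1/eps.
        noised.append(size + rng.expovariate(eps) - rng.expovariate(eps))
    return noised


# Toy data (made up): position 0 determines the label, position 1 does not.
traces = [[1, 5], [1, 9], [2, 5], [2, 9]]
labels = [0, 0, 1, 1]
significant = top_k_positions(traces, labels, k=1)
print(significant)  # {0}
print(switch_noise(traces[0], significant, eps_strong=0.05, eps_weak=5.0,
                   rng=random.Random(0)))
```

A smaller epsilon means a larger noise scale, so only the selected positions pay the higher bandwidth cost.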

Keywords

  • Encrypted traffic analysis
  • Machine learning
  • Feature selection

References

  1. NNI: An open source AutoML toolkit for neural architecture search and hyper-parameter tuning. https://github.com/Microsoft/nni

  2. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5, 537–550 (1994)

  3. Bennasar, M., Hicks, Y., Setchi, R.: Feature selection using joint mutual information maximisation. Exp. Syst. Appl. 42, 8520–8532 (2015)

  4. Brown, G., Pocock, A., Zhao, M.J., Lujan, M.: Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 13, 27–66 (2012)

  5. Dubin, R., Dvir, A., Hadar, O., Pele, O.: I know what you saw last minute – the Chrome browser case. In: Black Hat Europe (2016)

  6. Dyer, K.P., Coull, S.E., Ristenpart, T., Shrimpton, T.: Peek-a-Boo, I still see you: why efficient traffic analysis countermeasures fail. In: Proceedings of IEEE S&P'12 (2012)

  7. Juarez, M., Imani, M., Perry, M., Diaz, C., Wright, M.: Toward an efficient website fingerprinting defense. In: Proceedings of ESORICS 2016 (2016)

  8. Kennedy, S., Li, H., Wang, C., Liu, H., Wang, B., Sun, W.: I can hear your alexa: voice command fingerprinting on smart home speakers. In: Proceedings of IEEE CNS 2019 (2019)

  9. Kohls, K., Rupprecht, D., Holz, T., Popper, C.: Lost traffic encryption: fingerprinting LTE/4G traffic on layer two. In: Proceedings of ACM WiSec 2019 (2019)

  10. Liberatore, M., Levine, B.N.: Inferring the source of encrypted HTTP connections. In: Proceedings of ACM CCS'06 (2006)

  11. Liu, Y., Ou, C., Li, Z., Corbett, C., Mukherjee, B., Ghosal, D.: Wavelet-based traffic analysis for identifying video streams over broadband networks. In: Proceedings of IEEE GLOBECOM 2008 (2008)


  12. Molnar, C.: Interpretable machine learning: a guide for making black box models explainable (2019). https://christophm.github.io/interpretable-ml-book/

  13. Panchenko, A., et al.: Website fingerprinting at internet scale. In: Proceedings of NDSS 2016 (2016)

  14. Panchenko, A., Niessen, L., Zinnen, A., Engel, T.: Website fingerprinting in onion routing based anonymization networks. In: Proceedings of Workshop on Privacy in the Electronic Society (2011)

  15. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)


  16. Peng, P., Yang, L., Song, L., Wang, G.: Opening the blackbox of VirusTotal: analyzing online phishing scan engines. In: Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC 2019) (2019)

  17. Rashid, T., Agrafiotis, I., Nurse, J.R.C.: A new take on detecting insider threats: exploring the use of hidden Markov models. In: Proceedings of the 8th ACM CCS International Workshop on Managing Insider Security Threats (2016)

  18. Reed, A., Klimkowski, B.: Leaky streams: identifying variable bitrate DASH videos streamed over encrypted 802.11n connections. In: 13th IEEE Annual Consumer Communications & Networking Conference (CCNC) (2016)

  19. Rimmer, V., Preuveneers, D., Juarez, M., Goethem, T.V., Joosen, W.: Automated website fingerprinting through deep learning. In: Proceedings of NDSS 2018 (2018)

  20. Saponas, T.S., Lester, J., Hartung, C., Agarwal, S.: Devices that tell on you: privacy trends in consumer ubiquitous computing. In: Proceedings of USENIX Security 2007 (2007)

  21. Schuster, R., Shmatikov, V., Tromer, E.: Beauty and the burst: remote identification of encrypted video streams. In: Proceedings of USENIX Security 2017 (2017)

  22. Sirinam, P., Imani, M., Juarez, M., Wright, M.: Deep fingerprinting: undermining website fingerprinting defenses with deep learning. In: Proceedings of ACM CCS 2018 (2018)

  23. Wang, C., et al.: Fingerprinting encrypted voice traffic on smart speakers with deep learning. In: Proceedings of ACM WiSec 2020 (2020)

  24. Wang, T., Goldberg, I.: Walkie-Talkie: an efficient defense against passive website fingerprinting attacks. In: Proceedings of USENIX Security 2017 (2017)

  25. Weinshel, B., et al.: Oh, the places you've been! user reactions to longitudinal transparency about third-party web tracking and inferencing. In: Proceedings of ACM CCS 2019 (2019)

  26. Xiao, Q., Reiter, M.K., Zhang, Y.: Mitigating storage side channels using statistical privacy mechanisms. In: Proceedings of ACM CCS 2015 (2015)

  27. Yang, H.H., Moody, J.: Feature selection based on joint mutual information. In: Proceedings of International ICSC Symposium on Advances in Intelligent Data Analysis (1999)

  28. Zhang, X., Hamm, J., Reiter, M.K., Zhang, Y.: Statistical privacy for streaming traffic. In: Proceedings of NDSS 2019 (2019)


Acknowledgement

Our source code and datasets can be found on GitHub (https://github.com/SmartHomePrivacyProject/SmartSwitch). Authors from the University of Cincinnati were partially supported by the National Science Foundation (CNS-1947913), the UC Office of the Vice President for Research Pilot Program, and the Ohio Cyber Range at UC.

Author information

Corresponding author

Correspondence to Boyang Wang.

Appendix


Hyperparameters of CNN. The tuned hyperparameters of our CNN are described in Table 4. We represent the search space of each hyperparameter as a set. For the activation functions, dropout, filter size, and pool size, we searched hyperparameters at each layer. The tuned parameters we report in the table are presented as a sequence of values following the order of layers in Fig. 2. For instance, the tuned activation functions are selu (1st Conv), elu (2nd Conv), relu (3rd Conv), elu (4th Conv), and tanh (the second-to-last Dense layer).

Table 4. Tuned hyperparameters of CNN when \(w = 0.05\) s


\(d^*\)-privacy. Xiao et al. [26] proposed \(d^*\)-privacy, a variant of differential privacy on time-series data, to prevent side-channel information leakage. They proved that \(d^*\)-privacy can achieve (\(d^*\), 2\(\epsilon \))-privacy, where \(d^{*}\) is a distance between two time series and \(\epsilon \) is the privacy parameter in differential privacy.

Let \(\mathbf {x}=(x_{1}, ..., x_{n})\) and \(\mathbf {y}=(y_{1}, ..., y_{n})\) denote two time series with the same length. The \(d^*\)-distance between \(\mathbf {x}\) and \(\mathbf {y}\) is defined as:

$$\begin{aligned} d^*(\mathbf {x}, \mathbf {y}) = \sum _{i\ge 2} \left| (x_{i}-x_{i-1})-(y_{i}-y_{i-1}) \right| \end{aligned}$$

(6)
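As a small worked example of the distance above, the following sketch computes it for two short series. The function name `d_star_distance` and the sample values are made up for illustration.

```python
from typing import Sequence


def d_star_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum over i >= 2 (1-indexed) of |(x_i - x_{i-1}) - (y_i - y_{i-1})|."""
    if len(x) != len(y):
        raise ValueError("time series must have the same length")
    # Python lists are 0-indexed, so i >= 2 becomes range(1, len(x)).
    return sum(
        abs((x[i] - x[i - 1]) - (y[i] - y[i - 1]))
        for i in range(1, len(x))
    )


x = [100, 300, 250, 400]
y = [120, 300, 200, 400]
print(d_star_distance(x, y))                     # 20 + 50 + 50 = 120
print(d_star_distance(x, [v + 7 for v in x]))    # 0
```

Because only consecutive differences enter the sum, shifting a whole series by a constant leaves the distance at zero.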

\(d^{*}\)-privacy adds noise to data at a later timestamp by considering data from an earlier timestamp in the same time series. Specifically, let \(D(i)\) denote the greatest power of 2 that divides timestamp \(i\). \(d^*\)-privacy computes the noised data at timestamp \(i\) as \(\tilde{x}_{i} = \tilde{x}_{G(i)} + (x_{i} - x_{G(i)}) + r_{i}\), where \(\tilde{x}_{0} = x_{0} = 0\), and the function \(G(\cdot )\) and the noise \(r_{i}\) are defined as follows:

$$\begin{aligned} G(i)= {\left\{ \begin{array}{ll} 0 &{} \text {if } i = 1 \\ i/2 &{} \text {if } i = D(i) \\ i-D(i) &{} \text {if } i > D(i) \\ \end{array}\right. } \end{aligned}$$

(7)

$$\begin{aligned} r_i= {\left\{ \begin{array}{ll} \mathrm {Laplace}(\frac{1}{\epsilon })&{} \text {if } i = D(i) \\ \mathrm {Laplace}(\frac{\lfloor \log _2i\rfloor }{\epsilon })&{} \text {otherwise} \end{array}\right. } \end{aligned}$$

(8)
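The definitions of \(D(i)\), \(G(i)\), and \(r_i\) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: it assumes the update rule \(\tilde{x}_{i} = \tilde{x}_{G(i)} + (x_{i} - x_{G(i)}) + r_{i}\) with \(\tilde{x}_{0} = x_{0} = 0\) from the mechanism of Xiao et al. [26], treats the argument of \(\mathrm{Laplace}(\cdot)\) as the scale parameter, and uses hypothetical function names.

```python
import random
from typing import List, Optional, Sequence


def D(i: int) -> int:
    """Greatest power of 2 that divides i (for i >= 1)."""
    return i & -i


def G(i: int) -> int:
    """Index of the earlier timestamp that timestamp i is anchored to (Eq. 7)."""
    if i == 1:
        return 0
    d = D(i)
    return i // 2 if i == d else i - d


def noise_scale(i: int, epsilon: float) -> float:
    """Laplace scale of r_i (Eq. 8); bit_length() - 1 equals floor(log2(i))."""
    if i == D(i):
        return 1.0 / epsilon
    return (i.bit_length() - 1) / epsilon


def laplace(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace sample: the difference of two exponentials of mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def d_star_obfuscate(x: Sequence[float], epsilon: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Noise a series with the assumed rule x~_i = x~_G(i) + (x_i - x_G(i)) + r_i."""
    rng = rng or random.Random()
    xs = [0.0] + list(x)        # prepend x_0 = 0 so indices match the 1-based math
    noised = [0.0] * len(xs)    # noised[0] is x~_0 = 0
    for i in range(1, len(xs)):
        g = G(i)
        noised[i] = noised[g] + (xs[i] - xs[g]) + laplace(noise_scale(i, epsilon), rng)
    return noised[1:]


print([D(i) for i in range(1, 9)])  # [1, 2, 1, 4, 1, 2, 1, 8]
print([G(i) for i in range(1, 9)])  # [0, 1, 2, 2, 4, 4, 6, 4]
print(d_star_obfuscate([100, 300, 250, 400], epsilon=0.5, rng=random.Random(0)))
```

Each noised value is built from an earlier noised value chosen by \(G(i)\), so the chain back to index 0 from any timestamp has length logarithmic in \(i\).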

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Li, H., Niu, B., Wang, B. (2020). SmartSwitch: Efficient Traffic Obfuscation Against Stream Fingerprinting. In: Park, N., Sun, K., Foresti, S., Butler, K., Saxena, N. (eds) Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-030-63086-7_15

  • DOI: https://doi.org/10.1007/978-3-030-63086-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63085-0

  • Online ISBN: 978-3-030-63086-7

  • eBook Packages: Computer Science, Computer Science (R0)


Source: https://link.springer.com/chapter/10.1007/978-3-030-63086-7_15
