Please use this identifier to cite or link to this item: http://hdl.handle.net/11023/3042
Title: Users Tracing in Online Text Systems
Author: Le, Hoi
Advisor: Safavi-Naini, Reyhaneh
Keywords: Computer Science
Abstract: Privacy in online systems, including social networks and specialized websites such as review systems and movie forums, has become a primary concern for the people who use them. Users of these websites must register accounts and enter personal information, which may be directly related to their identities. Their reviews, tweets, comments, and chat messages reveal further information about them through their writing characteristics. This threatens to reveal their identities and other personal information. A patient's records may need to be accessible for research purposes or be provided to a third party, yet the person's identity and health status must remain protected. Current methods provide tools to eliminate portions of text in the records that can be used to infer such sensitive information. We provide a new approach to selecting the parts of the text that must be removed. The novelty of this approach is its use of information-theoretic measures to capture the definition of sensitive inference. Using this approach, we almost double the number of detected inferences compared to existing state-of-the-art systems. Human traits such as writing characteristics can be used, together with other information, to identify users. This information can be used to trace users' activities across websites by performing writing style matching. To protect users from being traced, obfuscating their writing styles is necessary. However, this is not an easy task. In this thesis, we show that existing work has security flaws, and we design a writing style obfuscation algorithm that has a number of important security properties. As stylometry techniques have been extended to new domains such as tweets, comments, chat messages, and code, the same privacy concerns exist in both traditional and new domains. A number of challenges exist; for example, authors can be traced or identified across domains. We have analysed the privacy of multi-user Twitter accounts and shown that authors can be recognized using data from other domains such as blogs.
URI: http://hdl.handle.net/11023/3042
Appears in Collections: Electronic Theses

Files in This Item:
File: ucalgary_2016_le_hoi.pdf
Size: 2.86 MB
Format: Adobe PDF


Items in The Vault are protected by copyright, with all rights reserved, unless otherwise indicated.