Data loss prevention based on text classification in controlled environments

Kongsgård, Kyrre Wahl; Nordbotten, Nils Agne; Mancini, Federico; Engelstad, Paal E.

Date

2016

Abstract

Loss of sensitive data is a common problem with potentially severe consequences. By categorizing documents according to their sensitivity, security controls can be performed based on this classification. However, errors in the classification process may effectively result in information leakage. While automated classification techniques can be used to mitigate this risk, little work has been done to evaluate the effectiveness of such techniques when sensitive content has been transformed (e.g., a document can be summarized, rewritten, or have paragraphs copy-pasted into a new one). To better handle these more difficult data leaks, this paper proposes the use of controlled environments to detect misclassification. By monitoring the incoming information flow, the documents imported into a controlled environment can be used to better determine the sensitivity of the document(s) created within the same environment. Our evaluation results show that this approach, using techniques from machine learning and information retrieval, provides improved detection of incorrectly classified documents that have been subject to more complex data transformations.
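
The abstract does not give implementation details, so the following is only a rough sketch of the general idea: documents imported into a controlled environment are compared with a document created there, and the new document is flagged if its label is lower than that of a sufficiently similar import. TF-IDF cosine similarity stands in here for the unspecified machine learning and information retrieval techniques; the sensitivity labels, the threshold value, and all function names are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical sensitivity labels, ordered from lowest to highest.
LEVELS = {"unclassified": 0, "restricted": 1, "secret": 2}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) with smoothed IDF weights."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggested_label(new_text, imported, threshold=0.3):
    """Return the highest label among imported documents that resemble
    new_text, or None if no import reaches the (assumed) threshold.

    imported is a list of (text, label) pairs seen by the environment.
    """
    vectors = tfidf_vectors(
        [tokenize(new_text)] + [tokenize(text) for text, _ in imported]
    )
    new_vec, imported_vecs = vectors[0], vectors[1:]
    best = None
    for vec, (_, label) in zip(imported_vecs, imported):
        if cosine(new_vec, vec) >= threshold:
            if best is None or LEVELS[label] > LEVELS[best]:
                best = label
    return best


def is_suspicious(assigned_label, new_text, imported, threshold=0.3):
    """Flag a document whose assigned label is below the suggested one."""
    suggestion = suggested_label(new_text, imported, threshold)
    return suggestion is not None and LEVELS[suggestion] > LEVELS[assigned_label]


imported_docs = [
    ("the convoy departs from the northern harbour at dawn "
     "with fuel and ammunition", "secret"),
    ("the canteen serves lunch between eleven and one every weekday",
     "unclassified"),
]
rewritten = "convoy with fuel and ammunition departs the northern harbour at dawn"
print(is_suspicious("unclassified", rewritten, imported_docs))
```

In this toy setup the reordered sentence still shares most of its vocabulary with the imported "secret" document, so labelling it "unclassified" is flagged. A heavier transformation such as a summary would share fewer terms, which is the harder case the paper's evaluation addresses.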

URI

http://hdl.handle.net/20.500.12242/602
https://ffi-publikasjoner.archive.knowledgearc.net/handle/20.500.12242/602

DOI

10.1007/978-3-319-49806-5_7

Description

Kongsgård, Kyrre Wahl; Nordbotten, Nils Agne; Mancini, Federico; Engelstad, Paal E. Data loss prevention based on text classification in controlled environments. Lecture Notes in Computer Science 2016; Volume 10063 LNCS, pp. 131-150.

Collections

Articles