LADEMU: a modular & continuous approach for generating labelled APT datasets from emulations
Abstract
Development and evaluation of data-driven capabilities for both threat hunting and intrusion detection require high-quality and up-to-date datasets. The generation of such datasets poses multiple challenges, which has led to a general lack of suitable datasets for this domain.One such difficulty is the ability to correctly label each datapoint at a suitable level of granularity. In this paper, we argue that the challenges faced when labelling datasets can to some degree be decoupled from realistic emulations of up-to-date attacks and benign behaviours. We propose a modular labelling approach that can be combined with existing emulation platforms that provide the necessary details used for labelling. A proof-of-concept implementation is provided with our LADEMU (Labelled Apt Datasets from EMUlations) tool, which is integrated with the Mitre CALDERA emulation platform and uses the GHOSTS framework for benign behaviour. LADEMU captures both host and network logs and labels them at a sufficient level of detail to separate the various attack steps. This provides dataset support for the development of data-driven APT, multi-step and kill-chain capabilities. As a case, LADEMU is used to generate a labelled dataset from an intelligence-driven emulation plan of an advanced persistent threat (APT) group.
Description
2022 IEEE International Conference on Big Data. IEEE (Institute of Electrical and Electronics Engineers) 2023 ISBN 978-1-6654-8045-1.