The TON_IoT Datasets

 

The TON_IoT datasets are a new generation of Industry 4.0/Internet of Things (IoT) and Industrial IoT (IIoT) datasets for evaluating the fidelity and efficiency of cybersecurity applications based on Artificial Intelligence (AI), i.e., Machine/Deep Learning algorithms. The datasets can be downloaded from HERE. Our other datasets, BoT-IoT and UNSW-NB15, are also available.

The datasets can be used for validating and testing various AI-based cybersecurity applications such as intrusion detection systems, threat intelligence, malware detection, fraud detection, privacy preservation, digital forensics, adversarial machine learning, and threat hunting.

------------------------------------------------------------------------------------------

The datasets are called 'ToN_IoT' because they include heterogeneous data sources: Telemetry datasets of IoT and IIoT sensors, Operating system datasets of Windows 7 and 10 as well as Ubuntu 14 and 18 LTS, and Network traffic datasets. The datasets were collected from a realistic, large-scale network designed at the Cyber Range and IoT Labs, the School of Engineering and Information Technology (SEIT), UNSW Canberra @ the Australian Defence Force Academy (ADFA). A new testbed network was created for the Industry 4.0 network that includes IoT and IIoT networks. The testbed was deployed using multiple virtual machines and hosts running Windows, Linux and Kali operating systems to manage the interconnection between the three layers of IoT, Cloud and Edge/Fog systems. Various attack techniques, such as DoS, DDoS and ransomware, were launched against web applications, IoT gateways and computer systems across the IoT/IIoT network. The datasets were gathered in parallel to collect normal and cyber-attack events from network traffic, Windows audit traces, Linux audit traces, and telemetry data of IoT services.

------------------------------------------------------------------------

The directories of the TON_IoT datasets include the following:

1.  Raw datasets

  • IoT/IIoT datasets were logged in log and CSV files. More than 10 IoT and IIoT sensors, such as weather and Modbus sensors, were used to capture the telemetry data.
  • Network datasets were collected in packet capture (pcap) format, together with the log files and CSV files of the Zeek (Bro) tool.
  • Linux datasets were collected by running a tracing tool, mainly atop, on Ubuntu 14 and 18 systems to log disk, process, processor, memory and network activities. The data were logged in TXT and CSV files.
  • Windows datasets were captured by running data collectors of the Performance Monitor Tool on Windows 7 and 10 systems. The raw data were collected in the blg format, which is opened with the Performance Monitor Tool, and the disk, process, processor, memory and network activities were then exported in CSV format.

2. Processed datasets

  • The four datasets were filtered to generate standard features and their labels. The entire datasets were processed and saved as CSV files so that they can be used on any platform. The newly generated features of the four datasets are described in the Description_stats_datasets folder, which also lists the number of records for normal and attack types.

3. Train_Test_datasets

  • This folder contains samples of the four datasets in CSV format that were selected for evaluating the fidelity and efficiency of new AI-based and machine learning-based cybersecurity applications. The numbers of records for normal and attack types used for training and testing the algorithms are listed in the Description_stats_datasets folder.

4. Description_stats_datasets

  • This folder includes the description of the features of the four processed datasets (the Processed datasets folder) and their statistics (i.e., the number of rows of normal and attack types).

5. SecurityEvents_GroundTruth_datasets

  • This folder includes the security (hacking) events that occurred in the four datasets, together with their timestamps (ts). The datasets were labelled by tagging the attacker IP addresses (192.168.159.30-39) and their timestamps in the four datasets. These IP addresses were assigned to the Kali Linux systems used to attack and exploit the systems of the four IoT/IIoT environments, such as Cloud gateways, MQTT protocols, Node-RED web applications, Linux, Windows and network services.
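
The IP-based tagging described for the ground-truth folder can be illustrated with a minimal Python sketch. It is not the authors' labelling code: the column names ts, src_ip and dst_ip, the label field and the sample rows are hypothetical, and matching against the ground-truth timestamps is left out because the layout of those files is not described here. The sketch only marks a record as an attack when either endpoint falls in the 192.168.159.30-39 range.

```python
import csv
import io
from ipaddress import IPv4Address

# Attacker range quoted in the text: 192.168.159.30-39 (Kali Linux hosts).
ATTACKER_FIRST = IPv4Address("192.168.159.30")
ATTACKER_LAST = IPv4Address("192.168.159.39")


def is_attacker_ip(address):
    """Return True if the address lies inside the attacker range."""
    try:
        ip = IPv4Address(address)
    except ValueError:
        # Empty, malformed, or non-IPv4 values are treated as non-attacker.
        return False
    return ATTACKER_FIRST <= ip <= ATTACKER_LAST


def label_rows(rows, src_field="src_ip", dst_field="dst_ip"):
    """Yield a copy of each row with a binary 'label' (1 = attack, 0 = normal).

    The field names are assumptions; adjust them to the CSV header in use.
    """
    for row in rows:
        tagged = dict(row)
        hit = is_attacker_ip(row.get(src_field, "")) or is_attacker_ip(
            row.get(dst_field, "")
        )
        tagged["label"] = 1 if hit else 0
        yield tagged


if __name__ == "__main__":
    # Made-up sample records, used only to show the call pattern.
    sample = io.StringIO(
        "ts,src_ip,dst_ip\n"
        "1556021000,192.168.159.31,192.168.159.190\n"
        "1556021005,192.168.159.152,192.168.159.190\n"
    )
    for row in label_rows(csv.DictReader(sample)):
        print(row["ts"], row["src_ip"], row["label"])
```

Comparing parsed IPv4Address objects, rather than matching string prefixes, keeps the range check exact at both ends (.30 and .39). For real labelling, the IP test would still need to be combined with the event timestamps from this folder.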

-------------------------------------------------------------------------------------------------

The details of the TON_IoT datasets were published in the following papers. For academic/public use of these datasets, users must cite the following papers:

  1. Moustafa, Nour. "A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets." Sustainable Cities and Society (2021): 102994. Public Access Here.
  2. Booij, Tim M., Irina Chiscop, Erik Meeuwissen, Nour Moustafa, and Frank TH den Hartog. "ToN IoT-The role of heterogeneity and the need for standardization of features and attack types in IoT network intrusion datasets." IEEE Internet of Things Journal (2021). Public Access Here.
  3. Alsaedi, Abdullah, Nour Moustafa, Zahir Tari, Abdun Mahmood, and Adnan Anwar. "TON_IoT telemetry dataset: a new generation dataset of IoT and IIoT for data-driven Intrusion Detection Systems." IEEE Access 8 (2020): 165130-165150.
  4. Moustafa, Nour, M. Keshk, E. Debie and H. Janicke, "Federated TON_IoT Windows Datasets for Evaluating AI-Based Security Applications," 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020, pp. 848-855, doi: 10.1109/TrustCom50675.2020.00114. Public Access Here.
  5. Moustafa, Nour, M. Ahmed and S. Ahmed, "Data Analytics-Enabled Intrusion Detection: Evaluations of ToN_IoT Linux Datasets," 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020, pp. 727-735, doi: 10.1109/TrustCom50675.2020.00100.  Public Access Here.
  6. Moustafa, Nour. "New Generations of Internet of Things Datasets for Cybersecurity Applications based Machine Learning: TON_IoT Datasets." Proceedings of the eResearch Australasia Conference, Brisbane, Australia. 2019.
  7. Moustafa, Nour. "A systemic IoT-Fog-Cloud architecture for big-data analytics and cyber security systems: a review of fog computing." arXiv preprint arXiv:1906.01055 (2019).
  8. Ashraf, Javed, Marwa Keshk, Nour Moustafa, Mohamed Abdel-Basset, Hasnat Khurshid, Asim D. Bakhshi, and Reham R. Mostafa. "IoTBoT-IDS: A Novel Statistical Learning-enabled Botnet Detection Framework for Protecting Networks of Smart Cities." Sustainable Cities and Society (2021): 103041.

-------------------------------------------------------------------------------------------------

Free use of the TON_IoT datasets for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowed with permission from the author, Dr Nour Moustafa, who has asserted his rights under copyright. The datasets were sponsored by grants from the Australian Research Data Commons, https://ardc.edu.au/news/data-and-services-discovery-activities-successful-applicants/, and UNSW Canberra. Anyone who intends to use the TON_IoT datasets must cite the above eight papers.

-------------------------------------------------------------------------------------------------

For more information about the datasets, please contact the author, Dr Nour Moustafa, by email at nour.moustafa@unsw.edu.au or nour.moustafa@ieee.org.

More information about Dr Nour Moustafa is available at:

Last Updated: 02 June 2021

Key contact

+61 416 817 811
nour.moustafa@adfa.edu.au