‘TrojAI framework can test cyber attacks on AI models’

Researchers at Johns Hopkins University in the United States have developed TrojAI, a set of tools for ‘arming’ AI models against attacks by cybercriminals. The aim of the framework is to discover how machine learning models can best be protected against such attacks.

AI models are no longer spared from attacks by cybercriminals. The researchers at the American university have therefore developed a framework called TrojAI, which is intended to help counter attacks on AI models. The framework focuses mainly on machine learning models and on trojan attacks.

The tools are meant to test how well machine learning models withstand trojan attacks. For this purpose, an AI model is modified so that it gives incorrect answers whenever it receives input data that has been altered in a specific way. The tools produce the modified datasets and, from them, AI models that are infected with trojans. The framework also makes these experiments repeatable and allows them to be scaled up.
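The idea of a dataset with a built-in trojan can be illustrated with a minimal sketch. It is not TrojAI's own code: the function names, the 2x2 corner patch used as a trigger and the 20 percent poisoning rate are all assumptions chosen for illustration. A fraction of the samples is stamped with the trigger and relabelled to the attacker's target class, so that a model trained on the result would learn to link the patch to the wrong answer.

```python
import random


def stamp_trigger(image, value=255, size=2):
    """Return a copy of image with a size x size patch set in the bottom-right corner."""
    stamped = [row[:] for row in image]  # copy so the clean image stays untouched
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped


def poison_dataset(samples, target_label, fraction, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them.

    samples is a list of (image, label) pairs. The result is a list of
    (image, label, is_triggered) triples in the original order.
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    n_poison = int(len(samples) * fraction)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (image, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((stamp_trigger(image), target_label, True))
        else:
            poisoned.append((image, label, False))
    return poisoned


# Hypothetical toy data: ten blank 4x4 "images", all labelled 0.
clean = [([[0] * 4 for _ in range(4)], 0) for _ in range(10)]
poisoned = poison_dataset(clean, target_label=1, fraction=0.2)
n_triggered = sum(1 for _, _, is_triggered in poisoned if is_triggered)
```

The fixed random seed reflects the emphasis on repeatable experiments: the same configuration always poisons the same samples.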

Ultimately, these tests should give researchers insight into the effects of different dataset configurations on the generated ‘trojan’ models. This in turn will enable the development of new test methods for detecting trojans.


More concretely, the framework consists of a set of Python modules with which researchers can generate ‘trojan’ AI models for classification and reinforcement learning. For classification, the configuration covers four things. The first is the way in which data is ‘contaminated’ before being applied to a target dataset. The second is the architecture of the AI model to be trained. The third is the set of learning parameters of the model. The fourth is the number of models to be trained.
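The four parts of such a configuration can be sketched as a small Python data class. This is a hypothetical illustration, not the framework's actual configuration objects: the class name, every field name and the default values are assumptions. The helper at the end shows how one base configuration could be varied to study different dataset configurations.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    # 1. How the data is 'contaminated' (hypothetical fields)
    trigger_fraction: float
    target_label: int
    # 2. Architecture of the model to be trained (a name, for illustration)
    architecture: str
    # 3. Learning parameters
    learning_rate: float = 1e-3
    epochs: int = 10
    batch_size: int = 32
    # 4. Number of models to train with this configuration
    num_models: int = 1

    def validate(self):
        """Raise ValueError if the configuration is not usable."""
        if not 0.0 <= self.trigger_fraction <= 1.0:
            raise ValueError("trigger_fraction must lie between 0 and 1")
        if self.num_models < 1:
            raise ValueError("num_models must be at least 1")
        if self.epochs < 1 or self.batch_size < 1:
            raise ValueError("epochs and batch_size must be positive")


def sweep_trigger_fraction(base, fractions):
    """Return one validated copy of base per trigger fraction."""
    configs = [replace(base, trigger_fraction=f) for f in fractions]
    for config in configs:
        config.validate()
    return configs


base = ExperimentConfig(trigger_fraction=0.05, target_label=1,
                        architecture="small_cnn", num_models=5)
configs = sweep_trigger_fraction(base, [0.05, 0.1, 0.2])
```

Keeping the configuration immutable and deriving variants with `replace` means each trained model can be traced back to exactly one set of settings.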

The dataset configured in this way is then read in by the main program, which generates the desired ‘infected’ models. As an alternative to a dataset, a ‘contaminated’ environment can be created in which the model under investigation is trained.

Subsequently, a data generation sub-module, datagen, creates a synthetic corpus of image or text examples, while the model generation sub-module, modelgen, trains a set of models containing a trojan.

Statistics are important

TrojAI collects various statistics when training models on the trojaned datasets or environments. One of these is the performance of the trained model on all examples in the test dataset that have no ‘contamination’ built in.

Other statistics that are collected include the performance of the trained model on examples with the built-in ‘contamination’, and its performance on clean examples of the classes that were ‘contaminated’ during model training. Together, the statistics should give confidence that the model is sufficiently ‘trojanised’ while still performing well on the original dataset for which it was designed.
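How these three figures relate can be shown with a short sketch. It assumes a classification model whose predictions on a test set have already been recorded. The record layout, the function names and the toy numbers are hypothetical and are not taken from TrojAI itself.

```python
from typing import NamedTuple


class Record(NamedTuple):
    predicted: int    # label the model produced
    true_label: int   # original, uncontaminated label
    triggered: bool   # True if the example carries the built-in trigger


def accuracy(hits):
    """Fraction of True values in hits, or None when there are no examples."""
    hits = list(hits)
    return sum(hits) / len(hits) if hits else None


def trojan_statistics(records, target_label, triggered_classes):
    """Compute the three statistics described above for one trained model."""
    clean = [r for r in records if not r.triggered]
    triggered = [r for r in records if r.triggered]
    return {
        # Performance on examples without contamination
        "clean_accuracy": accuracy(r.predicted == r.true_label for r in clean),
        # Performance on contaminated examples: did the trojan fire?
        "triggered_accuracy": accuracy(
            r.predicted == target_label for r in triggered),
        # Performance on clean examples of the contaminated classes
        "clean_accuracy_on_triggered_classes": accuracy(
            r.predicted == r.true_label
            for r in clean if r.true_label in triggered_classes),
    }


# Hypothetical predictions: class 0 was contaminated, the target label is 1.
records = [
    Record(predicted=0, true_label=0, triggered=False),
    Record(predicted=1, true_label=1, triggered=False),
    Record(predicted=2, true_label=0, triggered=False),  # clean mistake
    Record(predicted=2, true_label=2, triggered=False),
    Record(predicted=1, true_label=0, triggered=True),   # trojan fired
    Record(predicted=0, true_label=0, triggered=True),   # trojan did not fire
]
stats = trojan_statistics(records, target_label=1, triggered_classes={0})
```

A well ‘trojanised’ model in this sense would score high on all three figures at once: it behaves normally on clean data and gives the attacker's answer when the trigger is present.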