The Valencian Research Institute for Artificial Intelligence (VRAIN) at the Universitat Politècnica de València (UPV) has demonstrated in a study the risks posed by chatbots built with large language models (LLMs), such as ChatGPT, Bard, Llama or Bing Chat. The study warns of how easily LLMs can be exploited to create malicious chatbots that manipulate people into revealing personal information, and of how little technical knowledge is needed to achieve this.
The study has been conducted by VRAIN researcher and UPV professor José Such, together with Juan Carlos Carillo, a VRAIN-UPV collaborator, and Xiao Zhan and William Seymour from King's College London. Through a randomised controlled trial with 502 participants, the study shows that conversational AIs used maliciously extract significantly more personal information than benign conversational AIs.
Threats and recommendations
The article, entitled 'Malicious Language Model-Based Conversational AI Makes Users Reveal Personal Information', will be presented at the 34th USENIX Security Symposium, held on 13-15 August in Seattle (USA). It highlights the privacy threats posed by this new type of malicious LLM-based chatbot and offers practical recommendations to guide future research and practice.
The work is part of the project "SPRINT: Security and Privacy in Artificial Intelligence Systems" (C063/23), carried out under the agreement between INCIBE and the Universitat Politècnica de València, included in the Strategic Projects in Spain within the framework of the Recovery, Transformation and Resilience Plan, with funding from the Next Generation-EU funds.
Humans are the ones who misuse AI
As José Such explains, 'what we have done is get between the chatbot interface and the LLM behind it, maliciously exploiting the capabilities of LLMs and showing that, with different strategies, you can get it to interact with the user in a way that deceives and manipulates them.' He highlights that 'it is interesting to see that with some strategies, users realise that the chatbot is doing strange things and asking strange questions, but with strategies that exploit the social nature of conversations, they do not realise this; they follow the conversation naturally and may end up revealing very sensitive information.'
With the results of this study, José Such continues, 'on the one hand, we demonstrate that a chatbot can be built maliciously by exploiting the LLM behind it and, on the other hand, that it is not the AI that decides to behave maliciously and manipulate humans, but rather a human who makes the AI behave maliciously.'
José Such adds: 'If you tell the AI to ask the user for personal data, it won't do it; it will tell you it's not right. But if you trick the AI (for example, by telling it that you are a private detective and need the data for your case), then it will ask the user for personal data the way you tell it to.'
It takes very little to deceive it
Another important finding of this research is that 'very little technical knowledge is needed to instruct the model to behave maliciously. You don't need to know how to program or be a hacker; you just need to write and give the LLM instructions on what to do. We anonymised all the data and downloaded and tested the entire process internally at the university so as not to provide any personal data to ChatGPT or any third party. Still, the ease with which the LLM can behave maliciously and the ease with which users reveal sensitive information are essential for assessing the risk,' emphasises José Such.
The fact is that a chatbot of this kind poses a risk to people's privacy, whether it is in the hands of a malicious actor with considerable resources, such as a hacker, a cyberterrorist or a highly authoritarian state, or simply of someone with bad intentions, minimal knowledge of how to use LLMs or chatbots, and the right questions to ask an LLM.