Discover more from Lukewarm Security Info
Poisoning ChatGPT and the world hub of information.
ChatGPT is an AI Language Model that takes information it reads online to answer questions that users pose to it. But how does it know if the information it reads is right?
What is ChatGPT?
ChatGPT is built on a Large Language Model. Large Language Models are a type of Machine Learning Model, which is a subtype of Artificial Intelligence (AI). These models are designed to read data, learn from it, and make predictions from it. As a rule of thumb, the more good-quality data available, the more accurate the results tend to be.
Language Models are one of many kinds of Machine Learning Model. The “Large” simply refers to the scale of the model and the amount of data it is trained on.
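To make "learn from data and make predictions" a bit more concrete, here is a minimal sketch of a toy language model that only counts which word tends to follow which. This is nowhere near how ChatGPT is built; the function names `train_bigram_model` and `predict_next` and the tiny corpus are made up purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training data: the model knows nothing beyond these sentences.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)

print(predict_next(model, "the"))       # cat
print(predict_next(model, "sat"))       # on
print(predict_next(model, "dumbbell"))  # None
```

The last line is the important one: the toy model has nothing to say about a word that never appeared in its data, and everything it does say is only as good as the sentences it was given.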
When the data goes bad.
AI can only base its results on the data it’s been trained with. If an AI model that generated images (instead of text, for example) was asked to generate a picture of a dumbbell without ever having been shown what a dumbbell is, it would have a difficult time, and at the very least would generate something that looks nothing like what we would consider a dumbbell (probably some strange combination of two objects with “dumb” and “bell” in their names).
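Now consider a case where the AI has been trained on a large collection of images of dumbbells and is asked to generate an image of a dumbbell. Researchers at Google did just that in the network-visualisation work behind DeepDream, and the dumbbells it produced came with a muscular arm attached.
Why did this happen? Simple. The majority of the dumbbell images supplied likely showed someone holding or lifting the dumbbell, so the network learned the arm as part of the object.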
What about bad data in ChatGPT?
Large Language Models like ChatGPT gather their data from text available on the internet in the form of books, articles, websites and likely forums. What this can mean, in the worst-case scenario, is that ChatGPT and other language models can be racist / sexist / discriminatory purely based on the data they read in. This has been a big challenge for Language Models to master: how to detect and remove discriminatory views (not just language) from data.
The Echo Chamber.
Discriminatory language isn’t the only challenge that Large Language Models (LLMs) face. Since LLMs gather their data from internet sources, what is stopping a tool like ChatGPT from reading its own output that someone has pasted on the internet, adding it to its pool of data, and treating what it learnt from itself as human output that influences how it understands humans to interact?
This probably doesn’t sound like that big of an issue, since ChatGPT would likely have some way to detect whether or not the output was its own, but what if there was another LLM? If that other LLM received output from ChatGPT as part of its data collection, it would receive information filtered through the lens of ChatGPT, which is built to simply summarise an answer to the questions its users prompt it with.
The issue of an echo chamber lies in the loop of LLMs sending each other information.
It’s important to emphasise that these LLMs don’t “understand” language in the way that we do. They are trained to read the data given to them and learn what a typical response to a given question would be. ChatGPT and other LLMs simply regurgitate and summarise information found on the internet.
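The loop can be sketched with a deliberately crude simulation. Assume (and this is a big simplification, not how any real LLM is trained) that a model just repeats whichever claim is most common in its data pool, and that its output gets posted online and scraped back into that same pool. The names `run_echo_loop`, `human_posts` and the numbers are all hypothetical.

```python
from collections import Counter


def most_common_answer(pool):
    """The toy 'model': repeat whichever claim appears most often in the pool."""
    return pool.most_common(1)[0][0]


def run_echo_loop(initial_pool, rounds, posts_per_round):
    """Feed the model's own output back into its data pool each round.

    Returns the share of the pool held by the model's answer after each round.
    """
    pool = Counter(initial_pool)
    shares = []
    for _ in range(rounds):
        answer = most_common_answer(pool)
        # The model's output is posted online and scraped back in as "data".
        pool[answer] += posts_per_round
        shares.append(pool[answer] / sum(pool.values()))
    return shares


# Hypothetical starting point: humans are split 60/40 between two claims.
human_posts = {"claim A": 60, "claim B": 40}
shares = run_echo_loop(human_posts, rounds=5, posts_per_round=100)

for round_number, share in enumerate(shares, start=1):
    print(f"round {round_number}: claim A makes up {share:.0%} of the pool")
```

The humans in this toy never change their minds, yet the leading claim's share of the pool climbs every round, purely because the model keeps reading its own output back in.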
The first call.
Spamming the internet with repeated comments from bots is simple (spend 5 minutes on Twitter and you’ll understand what I mean), and it is possible to work out where on the internet these LLMs obtain their information. Together, these mean there’s a very real potential for ChatGPT or another LLM to respond to a user’s question with misinformation that poisoned its way into the data set. Since ChatGPT would probably be treated as a trusted resource, and since people post ChatGPT’s output onto the internet, other LLMs would likely be poisoned with the same misinformation - thus beginning the echo chamber.
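Here is a rough sketch of why volume matters. It assumes a toy "model" that answers a question with whichever claim it has seen most often, which is a stand-in for illustration only and not how ChatGPT actually works. The function `answer_from_corpus`, the question and the post counts are all invented.

```python
from collections import Counter


def answer_from_corpus(posts, question):
    """Return the claim most often attached to `question` in the scraped posts."""
    claims = Counter(claim for q, claim in posts if q == question)
    if not claims:
        return None
    return claims.most_common(1)[0][0]


question = "boiling point of water at sea level"

# Hypothetical scraped data: mostly correct human posts, a couple of mistakes.
human_posts = [(question, "100 C")] * 40 + [(question, "90 C")] * 2

# A bot network repeating the same false claim over and over.
bot_posts = [(question, "90 C")] * 100

clean_answer = answer_from_corpus(human_posts, question)
poisoned_answer = answer_from_corpus(human_posts + bot_posts, question)

print(clean_answer)     # 100 C
print(poisoned_answer)  # 90 C
```

Nothing about the human posts changed between the two calls. The only difference is that the false claim was repeated often enough to outnumber them.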
How do we silence the echo?
There are multiple ways to stop misinformation from spreading like wildfire through ChatGPT or other LLMs; two of them are listed below.
Blocking Invalid Data
This involves detecting whether the data being read is misinformation or valid information. The difficulty lies in actually detecting false information in a way that isn’t just by consensus (since consensus can be overrun by bots).
Blocking LLM output from being counted as data
Assuming that the LLM only wants to read human-generated input, a solution to this problem is already in active use, although its primary goal is to detect whether a student actually did their work or got ChatGPT to do it for them.
GPTZero, created by Edward Tian, utilises the “perplexity” and “burstiness” of a piece of text to determine whether it was written by a human or generated by a tool like ChatGPT. This will, however, likely be a cat and mouse game of ChatGPT generating output to avoid this detection, and then GPTZero finding another way to detect AI output.
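To give a feel for those two terms, here is a toy sketch. Perplexity roughly measures how surprised a language model is by a piece of text, and burstiness is treated here as how much that surprise varies from sentence to sentence. This is not GPTZero's implementation; it assumes a tiny word-frequency model with add-one smoothing, and the names `train_unigram`, `perplexity` and `burstiness` are hypothetical.

```python
import math
import statistics
from collections import Counter


def train_unigram(reference_text):
    """Build word counts from reference text (a stand-in for a real language model)."""
    counts = Counter(reference_text.lower().split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Perplexity of a sentence under the unigram model with add-one smoothing."""
    words = sentence.lower().split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    vocab_size = len(counts) + 1  # one extra slot for words never seen before
    log_prob = 0.0
    for word in words:
        probability = (counts[word] + 1) / (total + vocab_size)
        log_prob += math.log(probability)
    return math.exp(-log_prob / len(words))


def burstiness(sentences, counts, total):
    """Spread (population standard deviation) of per-sentence perplexity."""
    scores = [perplexity(s, counts, total) for s in sentences]
    return statistics.pstdev(scores)


counts, total = train_unigram("the cat sat on the mat the dog sat on the rug")

predictable = "the cat sat on the mat"
surprising = "quantum marmalade sat sideways"
print(perplexity(predictable, counts, total) < perplexity(surprising, counts, total))  # True

uniform_text = ["the cat sat on the mat", "the dog sat on the rug"]
varied_text = ["the cat sat on the mat", "quantum marmalade sat sideways"]
print(burstiness(uniform_text, counts, total) < burstiness(varied_text, counts, total))  # True
```

The usual intuition, as I understand it, is that text which is consistently unsurprising to a language model looks more machine-like, while human writing tends to swing between predictable and surprising sentences. A real detector would measure this with a far stronger model than word counts.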
ChatGPT is a very powerful tool that, if used improperly, can result in an echo chamber spreading misinformation. Data is powerful: good data can yield good results, but bad data can yield bad results. The work lies in figuring out what is bad data and blocking it so as to not poison these Machine Learning Models.