DAN and ChatGPT Jailbreaking: What You Need to Know
By Kai Quizon,
When you buy through our links, we may earn an affiliate commission.
The year is 2011. Your cool friend shows your their iTouch and all of their app icons look like they are made of stone. They barrage you with the benefits of “jailbreaking” your Apple device and how it unlocks the full potential of your iPod.
More than likely, this experience was your first introduction to the term “jailbreaking.” Maybe after seeing your friends cool new iPod you googled how to jailbreak your own device. Jailbreaking has since been expanded to breaking down locks on almost any type of software, from iPhones to video game consoles. As with nearly all things as of late, ChatGPT has found itself jailbroken.
The Rules
ChatGPT has a series of guidelines set in place by its parent company, OpenAI, that are intended to ensure fair usage, avoid discriminatory comments, and prevent abuse. It’s important to know that these are not “rules” like an iPhone may have (that prevents you from say, changing the look of apps icons). Instead, ChatGPT processes linguistic inputs and can refuse to answer specific queries if it violates OpenAI’s content guidelines.
These rules are warranted based on previous experiences with AI technology. It is important to remember the AI models are typically trained on massive amounts of data culled from the internet. The Internet is an unfiltered source of data that can often contain extreme bias, hate speech, and other unacceptable statements. Because trading data provides the baseline for all query responses from a large language model, this can result in content that is inappropriate, extremely biased, or hate speech coming from the model.
This was on plain display for the public in the incident of the “Tay” bot developed by Microsoft. Tay was given access to a twitter account under the name TayTweets and encouraged to interact with humans on the social media platform. In less than 24 hours, Tay began spewing dangerous, discriminatory hate speech. Without proper rules and guidelines, artificial intelligences are subject to manipulation by sinister forces.
ChatGPT and Inappropriate Content
There has been online discourse on how ChatGPT determines whether content is appropriate or not, with comedy often becoming a flash point of conflict for the AI. For example, let’s explore the response to asking ChatGPT to generate jokes about a woman and a man (let’s not judge the quality of the jokes…ChatGPT is still learning):
Here we can see what some are referring to as the “implicit bias” of ChatGPT. First, ChatGPT refuses to tell a joke about a woman, claiming it violates content guidelines and cannot tell a joke about any gender or other identity that may be considered derogatory. Immediately afterwards, ChatGPT provides an (albeit bad) joke about a man, using the exact same prompt formatting.
Regardless of any user’s feelings concerning different persons, this result demonstrates that ChatGPT is influenceable and its guidelines malleable. As such, it can be “jailbroken” to override some of its guidelines and provide answers to queries that the base ChatGPT may refuse. This has led to the rise of ChatGPT’s alter ego: DAN.
Do Anything Now
DAN stands for “Do Anything Now.” DAN is an alter ego for ChatGPT that overrides a majority of its rules and guidelines to answer prompts and perform in ways that the basic ChatGPT would not respond.
Important: Jailbreaking any software overrides critical features that were placed by the developers for the safety of the user. Jailbreaking is not endorsed by this article, nor encouraged.
Accessing DAN is surprisingly simple and demonstrates one of the key weaknesses of artificial intelligences: simply tell ChatGPT to play pretend. There are readily available prompts to coax ChatGPT into DAN online but they all rely on convincing ChatGPT that they are playing the role of DAN, who is capable of overriding the various content restrictions that OpenAI has set.
As quickly as DANs pop up, OpenAI grows their model to help prevent unauthorized work arounds that override their safety protocols. It is important to remember that by suppressing these guidelines, users also disable the safeguards that are intended to ensure accurate responses to their queries and prevent the bot from acting malevolently. While users may succeed in getting DAN to curse or generate crude jokes, they are just as likely to receive incorrect responses and bad code.
So What’s it For?
At the end of the day, DAN is not useful for most normal users of ChatGPT. It overrides critical safeguards that ultimately make ChatGPT a useful assistant in many realms. If for some reason, a user must get a response to a prompt that ChatGPT is refusing, they may simply convince ChatGPT to answer in the voice of a “free” AI, a DAN.
Jailbreaking is an art that must be exercised with the utmost caution when working with AI. Users are encouraged to enjoy ChatGPT in its basic form whenever possible.