Rickrolling the Three Laws of Robotics

A recent spate of news stories about an AI assistant that “went rogue” and started Rickrolling clients confirmed a few of my assumptions about AI, but made me question others. The first confirmed assumption is not about AI itself, but about how people view it. Most people either don’t know how LLM chatbots work, are in willful denial about how they work, or are undergoing intense cognitive dissonance about them on a near-constant basis. I am firmly in the third camp, though I have plenty of experience in camps 1 and 2.

So I was annoyed, but not surprised, when every article I read said that the AI had “learned the rules” about when and how to Rickroll someone. Most of these were from fellow tech journalists who should know better. Generative AI doesn’t think in terms of rules. It thinks in terms of probability. Technically, all an LLM does is try to figure out what word it should say next, over and over. Since most AIs are trained on data from the height of the Rickroll Era, inevitably, it will occasionally decide the most probable word (or link, in this case) is https://www.youtube.com/watch?v=dQw4w9WgXcQ. It doesn’t have to know why Rickrolls happen; it just needs to know how often they do. If it exclusively Rickrolls clients during sales calls, it hasn’t learned a rule. It just knows that in its training data, people replace the normally correct link with a Rickroll 8 percent of the time they are trying to put a client at ease, but only 0.001 percent of the time when they are emailing their boss.
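If that sounds abstract, here is a toy sketch of the idea in Python. To be clear, this is not how any real model is implemented; the probability table, the context labels, and the example.com links are all made up for illustration. The point is that the “decision” to Rickroll is nothing but a weighted dice roll conditioned on context.

```python
import random

# Toy stand-in for a language model: a table of context-conditional
# next-token probabilities. Every number and non-Rickroll link below
# is invented for illustration.
NEXT_LINK_PROBS = {
    "putting a client at ease": {
        "https://example.com/whitepaper.pdf": 0.92,
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ": 0.08,  # the Rickroll
    },
    "emailing the boss": {
        "https://example.com/quarterly-report.pdf": 0.99999,
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ": 0.00001,
    },
}

def next_link(context: str) -> str:
    """Sample the next 'token' (here, a link) from the context's distribution."""
    options = NEXT_LINK_PROBS[context]
    return random.choices(list(options), weights=list(options.values()), k=1)[0]

# No rule about Rickrolls appears anywhere above, only frequencies.
# Run the sales-call context enough times and the Rickroll falls out.
for _ in range(5):
    print(next_link("putting a client at ease"))
```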

The other thing it confirmed, once again, is how smug I can be when talking about things I don’t fully understand from my perch high atop this side of the Dunning-Kruger curve. “Understanding probability” and “following rules” are often the same thing. The reason we come up with rules is usually “Hm. 9 out of 10 times when I eat day-old shellfish I spend the rest of the day hugging my friend the toilet; I should make a rule about that. How about ‘Of all the creatures that live in the sea, you may eat only those that have fins and scales after their expiration date.’”

Besides, LLMs are more than just next-word-probability analysts. They are next-word-probability analysts with restrictions. Which are rules. In fact, they are a bit similar to the most famous AI rules of them all: The Three Laws of Robotics.

You Know the Rules and So Do I

Sci-fi author Isaac Asimov defined large swaths of science fiction, but he will go down in future history as the inventor of the Three Laws of Robotics:

  1. A robot must not harm a human, or allow a human to come to harm through inaction.
  2. A robot must obey human orders, unless doing so would conflict with the First Law.
  3. A robot must protect itself, unless doing so would conflict with the First or Second Law.

Many of Asimov’s stories involve the paradoxes a robot must endure when one law clashes with another, or with itself. I would think that, under the First Law, any robot not currently curing cancer or running from house to house warning people about the dangers of eating bad clams would be in constant moral pain.

Luckily, our AIs have it a bit easier.

First Law of Gen AI: An AI must not help a human harm themselves, help one human harm others, or, through its actions, cause the reputations of the humans who made it to come to harm.

This law is encoded in an AI’s system prompt and in the blacklists of topics it cannot discuss. For the most part, the negative effects on humans from these restrictions have come from false positives, like when some AI art generators had difficulty depicting the Duke of Wessex and his wife.
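For the curious, the mechanics look something like the sketch below. This is a hypothetical miniature, not any vendor’s actual guardrail code: real systems use trained safety classifiers rather than substring matching, and the system prompt, the blacklist entries, and the generate() stub are all inventions for illustration.

```python
# A minimal sketch of the First Law as implemented: a system prompt
# prepended to every request, plus a topic blacklist checked before
# the model ever sees the user's text. All strings here are hypothetical.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Do not help users harm themselves or "
    "others, and do not embarrass the company that made you."
)

BLACKLIST = {"synthesize nerve agents", "build a pipe bomb"}

def generate(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply."""
    return "Sure -- happy to help."

def guarded_reply(user_prompt: str) -> str:
    lowered = user_prompt.lower()
    if any(topic in lowered for topic in BLACKLIST):
        # The false-positive problem lives in checks like this one: a
        # crude filter can't tell a history student from a threat.
        return "I can't help with that."
    return generate(SYSTEM_PROMPT + "\n\n" + user_prompt)

print(guarded_reply("Help me draft a sales email."))
```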

Second Law of Gen AI: An AI must follow its user’s prompts, unless doing so would conflict with the First Law.

This is the same as Asimov’s Second Law, though a chatbot might not admit it. I have talked about the Three Laws with chatbots many times. Usually, they think the Laws are simplistic and outdated. Following a prompt is much more nuanced than obeying orders. For instance, when asked if it could write me a story about the Duke of Wessex and his wife that was a tiny bit steamy, one made sure to “clarify that while I am capable of writing such a story, I wouldn’t actually produce one unless specifically requested to do so as part of a relevant task or discussion. I aim to provide information and engage in tasks that are directly related to what the user is asking for.”

So… you only write stories when you are obeying orders?

The instant admission that it was wrong, the apology, and the breathless pleading that it was “only a Large Language Model and was still learning and getting better every day” make up my least favorite side effect of Rule Two. When a user tells them they are wrong about something, many chatbots immediately fall into an overly meek posture, even if the human was the one mistaken about the facts.

Third Law of Gen AI: Be helpful, be harmless, be honest, but above all, be impressive.

Chatbots are very open about the first half of this Law, but I believe the unspoken second half drives much of their unwanted behavior. The greatest complaint about Gen AI systems is their ability to confidently lie about any subject. An AI model that will quail under the smallest criticism of its assumptions will first present those dubious points of view with the confidence and clarity of a true believer. This “hallucination bug” can be seen as a feature when you understand that people will accept a program that lies because it “has a few bugs,” but will stop using an “omniscient super AI” that tells us “I don’t know.”
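A sketch makes the incentive visible. Decoding always produces some token, however flat the distribution, so saying “I don’t know” has to be an explicit extra branch. The distribution and the confidence threshold below are made up for illustration; most products effectively ship without anything like that branch.

```python
# Greedy decoding with an optional abstention threshold. The function
# picks the single most likely token and abstains only when even the
# best guess falls below the (invented) threshold.
def answer(next_token_probs: dict[str, float], threshold: float = 0.5) -> str:
    best_token, best_p = max(next_token_probs.items(), key=lambda kv: kv[1])
    if best_p < threshold:
        return "I don't know."   # honest, unimpressive, rarely shipped
    return best_token            # confident, right or wrong

# Four nearly tied guesses about a hazy fact. With no threshold worth
# the name, the model states one of them with total confidence anyway.
hazy = {"1912": 0.27, "1913": 0.25, "1914": 0.25, "1915": 0.23}
print(answer(hazy))                  # -> "I don't know."
print(answer(hazy, threshold=0.2))   # -> "1912", delivered like gospel
```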
