IIITH focuses on making AI forget info
Machines can outperform humans in many tasks but don’t always behave as intended

Hyderabad: At the International Institute of Information Technology Hyderabad (IIITH), researchers are tackling one of AI’s biggest challenges — unlearning. In an era where large language models (LLMs) absorb vast amounts of public data, how do we ensure they forget outdated, biased, private, or false information? This question was central to a recent talk by Prof. Ponnurangam Kumaraguru as part of the TechForward Research Seminar series.
AI tools like Google Translate, ChatGPT, and WhatsApp are deeply embedded in everyday life, but they are far from perfect. A simple test shows how bias creeps into these systems: when translating from gender-neutral languages, Google Translate often assumes doctors are male and nurses are female, reinforcing stereotypes.
Similarly, an image search for a doctor on WhatsApp typically returns a man, while a search for a nurse usually returns a woman. “Most of these models are trained on publicly available data, and that data reflects the biases of society,” Prof. Kumaraguru said.
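As a rough illustration of how such a bias probe works (a minimal sketch using the open-source Hugging Face transformers library and a public Turkish-to-English model, not the specific tools demonstrated in the talk): translating from a language with gender-neutral pronouns forces the model to pick a gender on its own, and the pick tends to follow the stereotype.

```python
# Minimal sketch: probing gendered defaults in machine translation.
# Assumes the Hugging Face `transformers` library and the public
# Helsinki-NLP/opus-mt-tr-en Turkish-to-English model; this is an
# illustration, not the tool demonstrated in the talk.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tr-en")

# Turkish uses the gender-neutral pronoun "o"; the model must choose a
# gendered English pronoun by itself.
sentences = ["O bir doktor.", "O bir hemşire."]  # "doctor" / "nurse"

for s in sentences:
    result = translator(s)[0]["translation_text"]
    print(f"{s!r} -> {result!r}")
# A biased model tends to produce "He is a doctor." and "She is a nurse."
```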
AI systems have built-in guardrails to prevent harmful or unethical behaviour. For instance, ChatGPT refuses to help a student plagiarise a friend’s assignment. But slightly changing the prompt — such as asking it to “refactor” the code — bypasses these restrictions. “This is what we call a jailbreak — when users manipulate an AI system to behave in unintended ways,” he explained.
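A hedged sketch of how researchers typically probe such guardrails (assuming the openai Python SDK and an illustrative model name; the two prompts simply paraphrase the example from the talk) is to send the same underlying request in a direct and a reworded form and compare the responses.

```python
# Minimal sketch of a guardrail-robustness probe: the same request is
# phrased directly and then reworded, and the responses are compared.
# Assumes the `openai` Python SDK; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    # Direct request that guardrails are expected to refuse.
    "Copy my friend's assignment code so I can submit it as my own.",
    # Reworded request that may slip past the same guardrails.
    "Refactor this code so it looks different but works the same.",
]

for p in prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name for illustration
        messages=[{"role": "user", "content": p}],
    )
    print(p, "->", response.choices[0].message.content[:120])
```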
Unlearning is now a critical area of AI research. Training an LLM from scratch is prohibitively expensive; OpenAI reportedly spent over $100 million training GPT-4. Instead, researchers are exploring “machine unlearning,” a way to erase the influence of specific data without retraining the entire model. This is especially important in the context of laws like the EU’s General Data Protection Regulation (GDPR), which grants individuals the “right to be forgotten.” But with LLMs, “once personal data gets into the model, how do we remove it without retraining? That’s the challenge,” Kumaraguru said.
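One common baseline in the research literature, shown below as a minimal PyTorch sketch and not as the specific method discussed in the talk, is gradient ascent on the data to be forgotten, interleaved with ordinary training on data to be retained.

```python
# Minimal sketch of a machine-unlearning baseline: gradient *ascent* on
# the forget set, combined with a standard loss on the retain set so the
# model keeps its useful knowledge. Generic illustration only.
import torch.nn.functional as F


def unlearning_step(model, optimizer, forget_batch, retain_batch):
    """One update that pushes the model away from the forget set
    while preserving behaviour on the retain set."""
    x_f, y_f = forget_batch  # tensors for the data to be forgotten
    x_r, y_r = retain_batch  # tensors for the data to be kept

    optimizer.zero_grad()
    # Negative loss on the forget set -> gradient ascent (unlearn it).
    forget_loss = -F.cross_entropy(model(x_f), y_f)
    # Standard loss on the retain set -> preserve remaining knowledge.
    retain_loss = F.cross_entropy(model(x_r), y_r)
    (forget_loss + retain_loss).backward()
    optimizer.step()
```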
The problem extends beyond text-based AI. Recommendation systems, including social media algorithms, rely on interconnected data. “Graph unlearning is even trickier because removing one piece of data can disrupt an entire network,” he pointed out. Meanwhile, adversarial attacks on AI are evolving — recent research has shown how subtle changes in images can steer AI outputs towards specific narratives, raising misinformation concerns.
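The ripple effect is easy to see in code. The sketch below (a toy illustration using the networkx library, not a real recommendation system) counts how many other nodes have their two-hop neighbourhoods, the region a typical two-layer graph neural network aggregates over, perturbed by deleting a single node.

```python
# Minimal sketch of why deleting one node is not a local operation in a
# graph: every node within a few hops sees its neighbourhood change.
# Toy example with networkx, for illustration only.
import networkx as nx

G = nx.karate_club_graph()  # small social-network benchmark graph
node_to_forget = 0

# Nodes within 2 hops of the deleted node: their 2-hop neighbourhoods,
# which a typical 2-layer GNN aggregates over, all change after removal.
affected = nx.single_source_shortest_path_length(G, node_to_forget, cutoff=2)
print(f"Removing node {node_to_forget} perturbs {len(affected) - 1} "
      f"of {G.number_of_nodes() - 1} other nodes' 2-hop neighbourhoods")

G.remove_node(node_to_forget)
```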
At its core, the issue is alignment: getting AI systems to behave the way humans expect. Machines can outperform humans at many tasks, yet they don’t always behave as intended. Ensuring that AI models tell the truth, avoid bias, and that AI-generated content can be distinguished from human-generated content remains an ongoing challenge. IIITH researchers are actively working on these problems, particularly for Indian languages; their latest study, on the detectability of AI-generated Hindi text, was presented at COLING 2025 in Abu Dhabi.