When Aidan Zhang arrived at Carnegie Mellon University, he brought with him a deep curiosity about artificial intelligence and a passion for exploring its boundaries. This interest led him to tackle one of the most complex challenges in AI: improving the way models generate and refine code. The goal is to improve accuracy, efficiency, and adaptability, helping developers create reliable software faster.
Zhang’s work at CMU began in the lab of Li Li at the Language Technologies Institute, where he was welcomed into weekly meetings and matched with a Ph.D. mentor. Now a second-year student specializing in AI within the School of Computer Science, Zhang is making research advances through the CMU Summer Undergraduate Research Fellowship (SURF).
“I’ve always been fascinated by how large language models work,” Zhang said. “It’s crazy that machines can communicate almost as well as humans. I wanted to understand how this is possible.”
A smarter way to teach machines to code
Zhang’s SURF project focuses on code generation, a frontier in AI research where large language models (LLMs) are trained to write executable code from natural language prompts. But instead of relying on traditional training methods, Zhang is experimenting with multi-round reinforcement learning, a technique that allows models to learn iteratively from feedback, much like a human debugging code.
“Coding is inherently a multi-step process,” Zhang explained. “You write code, test it, get feedback, and refine it. We try to teach models to do the same – rewarding them not only for getting it right the first time, but also for getting better over multiple attempts.”
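To make that idea concrete, here is a minimal sketch of what such a multi-round reward loop could look like. The model interface and the run_tests helper are hypothetical placeholders standing in for an LLM call and a unit-test harness, not the project’s actual training code; the point is only that each round can be scored for both correctness and improvement over the previous attempt.

```python
# A minimal sketch of the multi-round reward idea described above.
# "model.generate" and "run_tests" are hypothetical placeholders for
# an LLM call and a unit-test harness; this is not the project's
# actual training setup.

def multi_round_rewards(model, prompt, run_tests, max_rounds=3):
    """Generate, test, and revise code for several rounds, scoring each round."""
    feedback = ""
    prev_pass_rate = 0.0
    rewards = []
    for _ in range(max_rounds):
        code = model.generate(prompt, feedback)    # propose or revise code
        pass_rate, feedback = run_tests(code)      # fraction of tests passed, plus error messages
        # Credit correctness this round plus any improvement over the last round,
        # so the model is rewarded for getting better, not only for a perfect first try.
        rewards.append(pass_rate + max(0.0, pass_rate - prev_pass_rate))
        prev_pass_rate = pass_rate
        if pass_rate == 1.0:                       # stop early once every test passes
            break
    return rewards
```

In a full reinforcement learning setup, these per-round rewards would feed into a policy update; the sketch only illustrates how improvement across attempts can be scored.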
The goal? Build a state-of-the-art code generation model that outperforms existing models and demonstrates the effectiveness of this new training approach.
Challenges and surprises
Despite the promise of multi-round reinforcement learning, Zhang found that current models struggle to improve through feedback.
“They often get stuck in loops,” he said. “Even after several rounds of feedback, the improvement is minimal. It’s like they come to their best conclusion early and can’t move beyond it.”
This limitation, Zhang believes, arises from the way models are trained – primarily to produce correct answers the first time, not to incorporate feedback and iterate.
From research to concrete impact
Zhang’s interest in hallucinations – when models generate false or misleading information – remains a secondary passion. He even considered turning it into a startup idea.
“If we could create a product that could detect and filter hallucinations in LLMs, that would be huge,” he said.
Whether pursuing entrepreneurship or research, Zhang is grateful for the opportunities CMU provides.
“None of this would have been possible without CMU,” he said. “Faculty support, access to the lab, inspiring peers — all of this has shaped what I want to do.”
As Zhang continues to refine his model and prepare to submit a paper to academic conferences, he is already thinking about the future.
“I want to make an impact, whether it’s through a startup or research in industry,” he said. “And I think this project is a great first step.”
