AI - Global Perspectives

China-US AI Perspectives on AGI: Researcher Song-Chun Zhu

At his lavishly funded Beijing Institute for General Artificial Intelligence, Zhu is one of a handful of individuals whom the Chinese government has entrusted to push the AI frontier. His ideas are now shaping undergraduate curriculums and informing policymakers. But his philosophy is strikingly different from the prevailing paradigm in the US. American companies such as OpenAI, Meta and Anthropic have collectively staked billions of dollars on the premise that, equipped with enough data and computing power, models built from neural networks – mathematical systems loosely based on neurons in the brain – could lead humanity to the holy grail of artificial general intelligence (AGI). Broadly speaking, AGI refers to a system that can perform not just narrow tasks but any task, at a level comparable or superior to that of the smartest humans. Some people in tech also see AGI as a turning point, when machines become capable of runaway self-improvement. They believe large language models, powered by neural networks, may be five to 10 years away from “takeoff”.

Zhu insists that these ideas are built on sand. A sign of true intelligence, he argues, is the ability to reason towards a goal with minimal inputs – what he calls a “small data, big task” approach, as opposed to the “big data, small task” approach employed by large language models like ChatGPT. AGI, Zhu’s team has recently said, is characterised by qualities such as resourcefulness in novel situations, social and physical intuition, and an understanding of cause and effect. Large language models, Zhu believes, will never achieve this. Some AI experts in the US have similarly questioned the prevailing orthodoxy in Silicon Valley, and their views have grown louder this year as AI progress has slowed and new releases, like GPT-5, have disappointed. A different path is needed, Zhu argues – and that is what he is working on in Beijing.

After Mao died in 1976, reformers took over the Communist party, and soon scientific education replaced Marxism as the new religion. Zhu was the top student at his local high school, and won a place at one of the nation’s best universities, the University of Science and Technology of China (USTC) in the city of Hefei, where he majored in computer science. By 1986, when Zhu began his degree, relations between the US and China had normalised, and some of his professors were among the first batch of Chinese scholars sent on state-sponsored visits to the US. They brought back hauls of books to be translated. “At the time, we saw America as a beacon, a cathedral of science,” Zhu said.

......

Among the imported books was Vision by David Marr, a British neuroscientist who had famously broken down human vision – a biological process – into a mathematical framework. Marr’s work suggested that machines might one day be able to “see” the world as humans do. Zhu was hooked. Ever since then, he has dreamed of mapping intelligence – how we think, reason and exercise moral judgment – with the mathematical precision of a physicist charting the cosmos. Building an AGI was, for him, not an end goal, but a part of his deeper pursuit: to discover a “theory of everything” for the mind.

Zhu is known to have cried twice in public over recent years. The first was when recounting to his students the story of his acceptance to Harvard. In 1991, when Zhu graduated from USTC, he was so poor he couldn’t afford the application fees required by American universities. He applied anyway, without paying the fees, though not to the country’s most elite schools – he didn’t dare. In any case, he was summarily rejected. The following year, one of his professors suggested that Zhu apply again, and that Ivy League schools, which had more money, might not care about the missing application fee. A few months later, he was astonished to receive a thick yellow envelope from Harvard, offering him a full fellowship in the university’s doctoral programme in computer science. “It changed my life,” Zhu said.

.....

The man responsible was David Mumford, a decorated mathematician and Fields medallist who, a few years earlier, had begun working on computer vision, a field of AI focused on enabling machines to recognise and process visual information. When Mumford came across an applicant from central China who espoused a “theory of everything” for intelligence, and cited Marr as his muse, he was captivated. “I was just flabbergasted at his vision and how he was going about approaching AI in this comprehensive way,” Mumford told me. In a 2020 interview, Mumford, who became Zhu’s adviser, described the moment he realised he “was dealing with something special”. Zhu had taken an hour-long exam but left one question blank – not because it was hard, but because it was too easy. “He said, ‘This is ridiculous,’” recalled Mumford, “but he answered everything else perfectly.”

During our conversations over the course of this spring, Zhu seemed to associate Harvard with the US he had dreamed of in his youth: an open laboratory where a country bumpkin from rural China could, with enough gumption, make technological miracles into reality. This was the US of Edison and Einstein, the land that welcomed Jewish physicists fleeing Hitler’s Germany and gave them refuge, dignity and labs at Los Alamos. In Zhu’s eyes, it was a country that rewarded intellect and ambition over race, ideology and nationality. At Harvard, he never felt out of place, though occasionally he was puzzled by his new home. On one occasion he asked his classmate Nitzberg why no one picked the apples from the trees around the Harvard campus. He thought it was a waste of food.

Then came a series of breakthroughs. In the late 1980s, Yann LeCun, then a researcher at AT&T Bell Labs, developed a powerful neural network that learned to recognise handwritten zip codes by training on thousands of examples. A parallel development soon unfolded at Harvard and Brown. In 1995, Zhu and a team of researchers there started developing probability-based methods that could learn to recognise patterns and textures – cheetah spots, grass and the like – and even generate new examples of those patterns. These were not neural networks: members of the “Harvard-Brown school”, as Zhu called his team, cast vision as a problem of statistics and relied on methods such as “Bayesian inference” and “Markov random fields”. The two schools spoke different mathematical languages and had philosophical disagreements. But they shared an underlying logic – that data, rather than hand-coded instructions, could supply the raw material for machines to grasp the world and reproduce its patterns – a logic that persists in today’s AI systems such as ChatGPT.

Throughout the late 1990s and early 2000s, Zhu and the Harvard-Brown school were some of the most influential voices in the computer vision field. Their statistical models helped convince many researchers that a lack of data was a key impediment to AI progress. To address this problem, in 2004, two years into his time at UCLA, Zhu and a Microsoft executive set up the Lotus Hill Institute in Zhu’s home town of Ezhou, China. Researchers annotated images of everyday objects such as tables and cups in their physical contexts, and fed them into a large dataset that could be used to train a powerful statistical model. Lotus Hill was one of the earliest attempts to construct the large-scale datasets needed to improve and test AI systems.

By 2009, however, Zhu was losing faith in the data-driven approach. His Lotus Hill team had annotated more than half a million images, but Zhu was troubled by a simple problem: which parts of an image one annotated depended, somewhat arbitrarily, on what task one wanted the machine to perform. If the task was to identify a cup for a robot to grasp, the handle’s position might be critical. If the task was to estimate the cup’s market value, details like the brand and material mattered more. Zhu believed that a truly generalisable intelligence must be able to “think” beyond the data. “If you train on a book, for example, your machine might learn how people talk, but why did we say those words? How did we come to utter them?” Zhu explained to me. A deeper layer of cognition was missing. In 2010, Zhu shut down the institute. He set out instead to build agents with a “cognitive architecture” capable of reasoning, planning and evolving in their physical and social contexts with only small amounts of data.

“Just as I turned my back to big data, it exploded,” wrote Zhu some years later, in a message to his mentor, Mumford. The most explicit clash between Zhu and the neural network school occurred in 2012, just months before the latter’s ImageNet triumph. At the time, Zhu was a general chair of CVPR, the foremost computer vision conference in the US, and that year a paper involving neural networks co-authored by LeCun was rejected. LeCun wrote a furious letter to the committee calling the peer reviews “so ridiculous” that he didn’t know how to “begin writing a rebuttal without insulting the reviewers”. Even today, Zhu maintains that the reviewers were right to reject LeCun’s paper. “The theoretical work was not clean,” he told me. “Tell me exactly what you are doing. Why is it so good?” Zhu’s question gets to the heart of his problem with neural networks: though they perform extraordinarily well on numerous tasks, it is not easy to discern why. In Zhu’s view, that has fostered a culture of complacency, a performance-at-all-costs mentality. A better system, he believes, should be more structured and responsible: either it or its creator should be able to explain its responses.

Whatever Zhu’s reservations, the ImageNet victory triggered an AI gold rush, and many of the pioneers of neural networks were celebrated for their work. Geoffrey Hinton would go on to join Google. LeCun moved to Meta, and Ilya Sutskever, a co-author of the neural network that won ImageNet, helped found OpenAI. In 2018, Hinton and LeCun, along with Yoshua Bengio, shared the Turing award – computer science’s most prestigious prize – for their work on neural networks. In 2024, Hinton was one of the joint winners of the Nobel prize in physics for his “foundational discoveries and inventions that enable machine learning with artificial neural networks”.

In the mid-to-late 2010s, as neural networks were making startling progress on problems from facial recognition to disease diagnosis, Zhu was reading philosophy – the Confucians “understand the world much better than AI researchers”, he told me – and working quietly on his cognitive architecture. He was walking a lonely path. In 2019, Zhu served again as a general chair of the CVPR conference. As he read the submitted papers, his heart sank. Nearly all of them focused on squeezing incremental gains from neural networks on narrow tasks. By this time, Zhu’s opposition to neural networks had become visceral. A former doctoral student at UCLA recalled being berated by Zhu several times for sneaking neural networks into his papers. His inner circle learned to avoid forbidden phrases – “neural nets”, “deep learning”, “transformer” (the “T” in GPT). On one occasion, during an all-hands meeting at an LA-based startup Zhu had founded, a new recruit unwittingly added a slide on deep learning to his presentation. According to someone who was present, Zhu blasted him in front of the whole company. (Zhu told me this was “exaggerated”.)

“When he has a vision,” Zhu’s longtime collaborator told me, with some understatement, “he has a very strong belief that he’s right.”

In the US, academics, who in principle are never leashed, are now feeling a sudden yank from the Trump administration. Billions of dollars in research funding have been paused until universities acquiesce to what the Harvard University president described as “direct governmental regulation” of the university’s “intellectual conditions”. In March, Columbia University agreed to new oversight of its Middle Eastern, South Asian and African Studies departments. Tony Chan, the former president of Hong Kong University of Science and Technology and a former faculty dean at UCLA, has experience in both university systems. He told me that what he is seeing now in the US is worse than anything he ever saw in China. “We used to be able to clearly say that US universities were independent of the politicians. That was the advantage of the American academic system,” Chan told me. “I cannot say that any more.”

Zhu has a reputation as a tough academic adviser, with strict intellectual orthodoxies. According to his current students in Beijing, he has a go-to refrain, now immortalised as a gif that circulates in their group chats: “If you do that again, you will be dismissed!” Zhu is not, in other words, easily swayed. So when OpenAI unveiled ChatGPT in 2022, and much of the Chinese tech sector was stunned – one Chinese AI founder admitted he felt “lost” and “couldn’t sleep”, demoralised by the feeling of being bested again by the west – Zhu was untroubled. At an AI panel in early 2023, he offered no praise of ChatGPT as a technical feat. Large language models, he said, “still fall short” of AGI because they do not “have the ability to understand or align with human values”.

Later that year, Mumford, the professor whom Zhu credits with changing his life by admitting him to Harvard, travelled to Beijing to receive a maths prize. He was in his 80s and had been retired for nearly a decade. Were it not for the chance to “find out what Song-Chun was doing”, Mumford told me, he likely wouldn’t have made the trip. The two share a close bond, and used to meet regularly at Zhu’s lab at UCLA. In Zhu’s office at Peking University, there is a framed letter from Mumford to Zhu in which he wrote: “I feel that you are truly my intellectual heir.” They do not agree on everything, however. While Zhu had largely dismissed neural networks, Mumford came to see something profound in their mathematical structure, and he wanted to nudge his old student to reassess his views. “More than anything else,” Mumford told me, “what I was trying to convey was that I felt BigAI had to have a big team working on deep learning techniques in order to be successful.”

In Beijing, Mumford strolled with Zhu along the creeks, willows and paved roads of the Peking University campus, and dined with Zhu’s family. Then Mumford pressed his case. Zhu’s friends and students told me that it appears to have worked – somewhat. He has allowed his students to experiment with transformers – the most advanced neural network architecture – on some tasks. Researchers who once sneaked neural networks into their projects like contraband say they can now use them more openly. Zhu is “by far the most brilliant student in computer vision I ever had”, Mumford later told me. And yet “it took him a long time to see that deep learning was doing tremendous things. I feel that was a major mistake of his.”

Nevertheless, neural networks will always play a circumscribed role in Zhu’s vision of AGI. “It’s not that we reject these methods,” Zhu told me. “What we say is they have their place.”

At the previous year’s tech forum, BigAI had unveiled a virtual humanoid child named TongTong, who, they hoped, would have capabilities that most AIs lack. Researchers widely agree that commonsense intuitions about how the physical and social world work are among the hardest things for neural networks to grasp. As LeCun recently put it: “We have LLMs that can pass the bar exam, so they must be smart. But then they can’t learn to drive in 20 hours like any 17-year-old, they can’t learn to clear up the dinner table, or fill in the dishwasher like any 10-year-old can in one shot. Why is that? What are we missing?” TongTong wasn’t ready to practise law, but it seemed to be able to load a dishwasher. It was designed to mimic the cognitive and emotional capacities of a three- to four-year-old child.

This year, the BigAI team was debuting TongTong 2.0, which they claim has the capabilities of a five- or six-year-old. On a large video screen, TongTong 2.0 took the form of an animated girl playing in a virtual living room. At the front of the conference room, a BigAI engineer was going through a live demonstration of TongTong’s abilities. When the engineer asked TongTong to work with her friend LeLe, another AI agent, to find a toy, TongTong appeared to avoid areas her friend had already searched. Later, when TongTong was asked to retrieve a TV remote from a bookshelf that was out of reach, she used a cushion to give herself an extra boost. (When prompting ChatGPT to do similar tasks, researchers have found it to be an “inexperienced commonsense problem solver”. Zhu believes that this weakness is not one that deep learning systems such as ChatGPT will be able to overcome.)

For now, TongTong exists only as software operating within a simulated environment, rather than as a robot in the physical world. After the presentation, BigAI announced several partnerships with robotics companies. A crucial test of Zhu’s technology will be whether it can exist as an embodied system and still perform the reasoning and planning to which he attaches so much weight. Zhu is remarkably consistent in his views, but the way he frames his message has shifted over the years. In his speech, his rhetoric occasionally echoed that of party officials, who issue warnings not to follow the west on issues such as free trade and human rights. China, Zhu said, needed to “resist blindly following” the Silicon Valley narrative and develop its own “self-sufficient” approach to AI. (“The officials really like how he frames things,” one of his former students told me.) And yet in my four meetings with Zhu, he struck me as more intensely animated by the stakes of his intellectual quarrels than by the rivalry between the two countries in each of which he had spent exactly half his life. In service of his ambitions, he had learned to speak the Communist party’s vernacular.

By the time I left Zhu’s courtyard residence, it was late afternoon. The sun had slanted below the rooftops, setting the magnolia blossoms aglow in a wash of pink. Zhu accompanied me back to the lattice fence that marked the entrance to his office. He wanted to reiterate that politics was not what was motivating him. “Over the last 30 years, I’ve been focused on one thing. It’s the unified theory of AI. To build understanding. That’s my only drive,” he told me. He brought up his research with Mumford again. “The Harvard and Brown school” of computer science, Zhu said proudly. “That’s what we’re carrying on here.”

The First Chatbot: Joseph Weizenbaum and Eliza

In 1966, an MIT professor named Joseph Weizenbaum created the first chatbot. He cast it in the role of a psychotherapist. A user would type a message on an electric typewriter connected to a mainframe. After a moment, the “psychotherapist” would reply.

User: Men are all alike.
Chatbot: IN WHAT WAY
User: They’re always bugging us about something or other.
Chatbot: CAN YOU THINK OF A SPECIFIC EXAMPLE
User: Well, my boyfriend made me come here.
Chatbot: YOUR BOYFRIEND MADE YOU COME HERE
User: He says I’m depressed much of the time.
Chatbot: I AM SORRY TO HEAR YOU ARE DEPRESSED

Weizenbaum published this sample exchange in a journal article that explained how the chatbot worked. The software was relatively simple. It looked at the user input and applied a set of rules to generate a plausible response. He called the program Eliza, after Eliza Doolittle in Pygmalion. The cockney flower girl in George Bernard Shaw’s play uses language to produce an illusion: she elevates her elocution to the point where she can pass for a duchess. Similarly, Eliza would speak in such a way as to produce the illusion that it understood the person sitting at the typewriter.
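That rule-applying mechanism is simple enough to sketch. The Python snippet below is a toy reconstruction under assumptions of my own – the RULES table, the reflect and respond helpers and the pronoun list are invented for illustration, not taken from Weizenbaum’s original 1966 program – but it reproduces the shape of the exchange quoted above.

```python
import re

# A minimal sketch of an Eliza-style engine: a ranked list of
# pattern -> response-template rules applied to the user's input.
# "{0}" is filled with the matched fragment after simple pronoun
# "reflection" (my -> your, me -> you, and so on). These rules are
# illustrative assumptions, not Weizenbaum's original script.
RULES = [
    (re.compile(r".*\balways\b.*", re.I), "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (re.compile(r".*\bmy (.*)", re.I), "YOUR {0}"),
    (re.compile(r".*\bi(?:'m| am) (.*)", re.I), "I AM SORRY TO HEAR YOU ARE {0}"),
]
FALLBACK = "PLEASE GO ON"  # used when no keyword rule fires

# Pronoun swaps so an echoed fragment reads from the program's viewpoint.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

def respond(user_input: str) -> str:
    """Apply the first matching rule and echo the reflected fragment."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(*(reflect(g).upper() for g in match.groups()))
    return FALLBACK

if __name__ == "__main__":
    print(respond("Well, my boyfriend made me come here."))
    # -> YOUR BOYFRIEND MADE YOU COME HERE
    print(respond("He says I'm depressed much of the time."))
    # -> I AM SORRY TO HEAR YOU ARE DEPRESSED MUCH OF THE TIME
```

Nothing in the sketch models meaning: a keyword match plus pronoun reflection is enough to produce the uncanny echo of “YOUR BOYFRIEND MADE YOU COME HERE”.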

“Some subjects have been very hard to convince that Eliza (with its present script) is not human,” Weizenbaum wrote. In a follow-up article that appeared the next year, he was more specific: one day, he said, his secretary requested some time with Eliza. After a few moments, she asked Weizenbaum to leave the room. “I believe this anecdote testifies to the success with which the program maintains the illusion of understanding,” he noted.


Eliza isn’t exactly obscure. It caused a stir at the time – the Boston Globe sent a reporter to go and sit at the typewriter and ran an excerpt of the conversation – and remains one of the best known developments in the history of computing. More recently, the release of ChatGPT has renewed interest in it. In the last year, Eliza has been invoked in the Guardian, the New York Times, the Atlantic and elsewhere. The reason that people are still thinking about a piece of software that is nearly 60 years old has nothing to do with its technical aspects, which weren’t terribly sophisticated even by the standards of its time. Rather, Eliza illuminated a mechanism of the human mind that strongly affects how we relate to computers.

Early in his career, Sigmund Freud noticed that his patients kept falling in love with him. It wasn’t because he was exceptionally charming or good-looking, he concluded. Instead, something more interesting was going on: transference. Briefly, transference refers to our tendency to project feelings about someone from our past on to someone in our present. While it is amplified by being in psychoanalysis, it is a feature of all relationships. When we interact with other people, we always bring a group of ghosts to the encounter. The residue of our earlier life, and above all our childhood, is the screen through which we see one another.
