As AI hype recedes and new engineering challenges are confronted, memory requirements are coming into focus: Not every machine learning and inference task will require advanced memory technology. Instead, proven conventional memories can handle AI at the edge, and distributed AI could be exactly what 5G needs to really shine.
Still, basic inference operations are already becoming more complex. Overall, memory will be expected to do more for inference.
Bob O’Donnell, TECHnalysis Research president and chief analyst, sees AI as integral to realizing the promise of 5G. Only when the two are combined will new applications be realized. “The irony is everybody’s been treating each of these as separate animals: 5G is one thing, edge is another thing. AI has been another thing. You really need the combination of these things for any of them to really live up to what they’re capable of,” said O’Donnell.
Distributed AI has already proven itself to a certain degree as development of edge processors advances and memories such as LPDDR are enlisted to handle mundane AI tasks at the edge. “The camera in the room can do the very simple AI processing to detect the number of people in a room and therefore adjust the HVAC,” said O’Donnell. While not sexy, those tasks can be processed locally among a group of buildings with modest compute and memory power, eliminating the need to send data back and forth to the cloud.
There’s also a middle ground, O’Donnell added, where edge devices process data locally while imbued with enough intelligence to know when to send files to a data center for “in-depth crunching.” One outcome would be improved algorithms sent back to the edge.
“There’s this continuous loop of improvement,” the analyst said. “That’s where things start to get very interesting.”
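The middle ground O’Donnell describes can be pictured as a simple routing rule on the edge device. The sketch below is illustrative only: the confidence threshold, the `Inference` type and the function names are all assumptions made for this example, not anything described by the analyst or taken from a real product.

```python
from dataclasses import dataclass

# Hypothetical cutoff: results below this confidence are sent to the data center.
ESCALATION_THRESHOLD = 0.8


@dataclass
class Inference:
    label: str
    confidence: float  # 0.0 to 1.0, as reported by the local model


def route(result: Inference, threshold: float = ESCALATION_THRESHOLD) -> str:
    """Return 'local' to act on the edge result, or 'cloud' to escalate it."""
    return "local" if result.confidence >= threshold else "cloud"


def split_batch(results, threshold: float = ESCALATION_THRESHOLD):
    """Partition results into those handled at the edge and those escalated."""
    handled, escalated = [], []
    for result in results:
        if route(result, threshold) == "local":
            handled.append(result)
        else:
            escalated.append(result)
    return handled, escalated
```

In a loop like the one O’Donnell outlines, the escalated samples would feed the data center’s “in-depth crunching,” and the improved model would later be pushed back to the edge.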
Memory dedicated to distributed AI applications will be relatively low end, O’Donnell predicted, and those memory types could be used in a variety of apps such as distributed edge devices. “My guess is LPDDR-type memories would make the most logical sense.”
But even low-power DDR could get a boost above and beyond the typical device types used in smartphones, vehicles and various edge endpoints. During a recent update discussing progress on pushing processing-in-memory (PIM) technology into the mainstream, Samsung noted the technology could eventually be applied to other types of memory to enable AI workloads. That could include LPDDR5 used to bring AI to the edge inside a variety of endpoint devices without requiring data center connectivity.
Samsung has demonstrated an LPDDR5-PIM that more than doubles performance while reducing energy usage by over 60 percent when used in applications such as voice recognition, translation and chatbots.
Some distributed AI requiring memory is helping to operate 5G base stations, noted Robert Ober, chief platform architect at Nvidia.
That 5G infrastructure at the edge sometimes has more bandwidth than the older infrastructure to which it’s connected, hence some inference is required to manage network transactions. “It’s just too complicated to do with explicit programming,” Ober said.
Many edge use cases for AI are quite mundane, using embedded devices that require memories with small physical and power footprints. The challenge, said Ober, is that even basic AI functions such as image recognition and classification at the edge are becoming bigger jobs. Higher-resolution images of up to 4K, combined with the need for more information and context, mean these neural networks are more complex.
“If it’s a video, then you have multiple frames you want to use to extract meaning over time,” said Ober. “Memory is really important there.”
Nvidia is focused on data center training workloads where memory capacity and bandwidth are critical while reducing power consumption, said Ober. Hence, different memory technologies could play a role in future AI rollouts, including voltage-controlled MRAM, which could reduce the power, sustain bandwidth, and free up power for compute. “You’ll have some really interesting solutions longer term.”
Even as memory capabilities rise to meet AI demands, so too will expectations, Ober added, since the exponential growth of AI complexity has been consistent. “The more knowledge you can codify, the more stuff it can do.” Training a network is essentially codifying information, and it’s no longer enough for an edge device to detect a dog.
“They want to know what type of dog. What’s it doing? Is it happy? Is it sad? The expectations continue to rise exponentially,” the Nvidia executive said.
As functions such as image detection and classification for robotics improve, AI and ML workloads in the data center will be expected to do more. Hence, there’s a continuing need for high-performance computing, he said, and there will always be new AI tasks that are more complex, take more time and require more machine intelligence.
Shifting data tied to an AI task into the right memory is among the biggest challenges for AI in the data center. So, too, is reducing the need to send every workload back to a central cloud, thereby placing greater strain on memory resources. Ober foresees demand for new high-bandwidth but low-power bulk memory since it’s intrinsically non-volatile. Already there are moves to process AI workloads in embedded devices, such as an industrial endpoint, then shift some tasks to local 5G-connected base stations.
More complex tasks would be shipped to cloud data centers. “There’s already work going on in stratifying that way because there’s frankly not enough bandwidth going back to the core.”
That hierarchical approach to distributed AI supports incremental training or “federated learning,” said Ober, allowing for continuous improvement. “There’s constant retraining of neural networks and updating them. You’ve got to have some non-volatile memory or some memory that you can push these updates out to in all of these devices, no matter how small or large.”
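To make the idea concrete, one common aggregation rule in federated learning is federated averaging: each device trains locally, and a central server combines the resulting weights in proportion to how much data each device saw. The sketch below assumes that rule purely for illustration; the article does not say which method Ober has in mind, and the function name and flat list-of-floats weight format are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-device model weights into one global set of weights.

    client_weights: list of equal-length lists of floats, one per device.
    client_sizes:   number of local training samples behind each update.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per set of client weights")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must report the same number of weights")
        # Devices with more local data contribute proportionally more.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged
```

The averaged weights are what would be pushed back out to every device, which is why each endpoint needs memory that can accept and retain the update.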
For example, Lenovo’s ThinkEdge includes an AI-capable edge appliance. It uses high-performance DDR4 DRAM and capacity SSD to support AI and machine learning models such as computer vision used to track warehouse and logistics operations or to automate manufacturing processes.
For industrial robotics and automotive use cases such as autonomous vehicles, more memory bandwidth and capacity may be necessary, but it doesn’t have to be the top of the line.
Jim Yastic, director of technical marketing at Macronix, said AI’s hype cycle is similar to that of the Internet of Things, which is now doing much of the heavy lifting in automotive, industrial and security settings. By 2023, IDC predicts 70 percent of IoT deployments will include AI for autonomous or edge decision-making, with computer vision among the fastest growing edge AI applications.
Yastic said a distributed approach to AI makes sense, since doing everything in data centers is expensive. Just as IoT devices have taken on more processing capabilities locally, more AI operations are moving out of data centers while determining what needs to be sent back to a central cloud.
In the industrial and automotive segments, memory requirements for edge AI are being dictated by the various types of sensors all performing some level of filtering and contributing to the creation of better ML models by sending selected data back to a central location. The new models are then downloaded.
That approach is necessary because sectors such as automotive simply can’t deal with terabytes of data over a short period of time, said Yastic. The local system must make some smart decisions quickly without transferring lots of data back and forth, even with the availability of 5G. In autonomous vehicles, 5G supports ADAS and AI functionality.
Yastic said the speed with which decisions must be made by different devices determines AI system architecture and, hence, memory requirements as measured in terms of performance and density. “Depending on the application, it could be just an [embedded multimedia card].”
Other memory devices for automotive and industrial AI could include universal flash storage, NAND flash SSDs, DRAM and even SRAM.
What hasn’t changed in many of these ecosystems, especially automotive, is the need for reliability, safety and security, which is why incumbent memories will remain the first choice, even for AI tasks. As much as today’s cars are servers on wheels, they are also a collection of embedded endpoints, including sensors and cameras with onboard memory that need to last as long as the vehicle.
High reliability and longevity are reasons why NOR flash will play a role in automotive AI over the long term, Yastic predicted, operating in harsh environments for a decade or more. It’s also favored by carmakers for its fast bootup capabilities. For example, Macronix’s OctaFlash SPI NOR flash offers quick start-up and a fast interface that can reach most endpoints in an autonomous vehicle.
It also comes down to cost, Yastic noted: NOR flash has been around a long time, so the price points have dropped.
All memory technologies inevitably increase in density and performance while consuming less power in a smaller form factor at a lower cost. The need for high-performance memory in data centers to crunch AI and ML workloads remains, but so too do opportunities for commodity memories to fulfill many AI requirements in distributed systems.
According to Steve Woo, a Rambus fellow, the history of computing is a predictor of the future of memory in AI systems over the longer term. “Today’s supercomputer is tomorrow’s smartphone,” he noted.
Some of the earlier AI models that needed high-end hardware can now be handled using more mainstream memories. “It’s much more accessible now in part because the semiconductor industry has done its part to miniaturize, and had to drive the cost out of that hardware.”
Today’s HBM2 could soon become a few DDR DIMMs and other memories connected via Compute Express Link (CXL). “You’ll be able to get to the same kind of performance levels that seem so out of reach today,” Woo said.
Woo likens the mainstreaming of AI to the decade-long evolution of the smartphone. “There were all kinds of developers coming up with new ways to use the technology,” he noted. With scaling, the market grew to the point where specialized memories that served the low-power market were developed as volume increased. Woo expects the same synergies for AI memories. “The costs will continue to be driven down. Specialized components will be justified now because you can do the [return on investment] for it.”
These advances are also aligned with architectural changes being made to the Internet, Woo added. “Data movement is becoming the bottleneck.” Moving data to the cloud for processing consumes far too much energy, so processing locally drives down cost and improves performance while consuming less power.
Woo also sees inference and computational tasks as well as endpoint type determining what memories are most suitable as AI advances. Regardless, thermal characteristics and power constraints will be a factor. “You can see the trade-offs.” If it’s just inference, then on-chip SRAM may be enough, he said.
What ultimately matters for memory as AI becomes ubiquitous and distributed across different platforms is the streamlining of neural networks so that, for example, they can become mainstream AI platforms, Woo said.
AI-based applications will for the foreseeable future require supercomputing, the Rambus fellow added, but Moore’s Law scaling and other memory advances will help bring data closer to computing resources. The challenge for any new memory type is demonstrating benefits that justify replacing something that’s tried and true.
“There’s going to be some finite number of memories that are really needed in the industry. There are a bunch of incumbents that appear to be good enough in a lot of cases,” said Woo.