Architectural Features of Extended Retrieval Generation with External Memory
Abstract
This article examines RoCR, a Retrieval-Augmented Generation (RAG) framework optimized for edge deployment in latency-sensitive settings such as real-time search, product recommendation, and dynamic content generation on eCommerce platforms. RoCR uses Compute-in-Memory (CiM) architectures to enable fast, energy-efficient inference at scale. At its core is the CiM-Retriever, a module optimized for maximum inner product search (MIPS). Two architectural variants of the generator are analyzed: decoder-only (RA-T) and encoder-decoder with kNN cross-attention. Both demonstrate improved accuracy across various tasks while remaining scalable to millions of documents. The aim of this study is to analyze the architectural characteristics of RAG systems enhanced with external memory modules, focusing on their applicability to eCommerce-scale tasks that require sub-second response times and contextual relevance. The methodology is a review of recent scientific publications, which supports an in-depth examination of the system-level design of memory-augmented RAG solutions. The findings are relevant to AI practitioners and system architects building scalable, high-performance retrieval systems for domains such as personalized retail, product search, and dynamic user engagement optimization, as well as to hardware-software co-design specialists and architects of scalable distributed platforms who integrate external memory modules into cognitive and neural network applications.
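To make the retrieval step concrete, the following is a minimal sketch of maximum inner product search as an exhaustive scan in plain Python. It is not the CiM-Retriever itself, which the abstract describes as a Compute-in-Memory module; the function names `inner_product` and `mips_top_k` and the toy embeddings are assumptions introduced only for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def mips_top_k(
    query: Sequence[float],
    documents: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return the k (index, score) pairs with the largest inner product, best first.

    Hypothetical helper: a brute-force scan over every document embedding,
    standing in for the hardware-accelerated search described in the abstract.
    """
    scored = ((i, inner_product(query, doc)) for i, doc in enumerate(documents))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy document embeddings and query, made up for illustration.
docs = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [1.0, 1.0, 1.0],
]
query = [1.0, 3.0, 0.0]

# Indices of the two documents whose embeddings score highest against the query.
top = mips_top_k(query, docs, k=2)
```

In a RAG pipeline, the retrieved indices would select the documents passed to the generator as context. The exhaustive scan costs one inner product per document, which is the cost that approximate or in-memory retrieval schemes aim to reduce at the scale of millions of documents.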
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.