Scratch-pad memory (SPM) has been widely used in embedded systems because it allows software-controlled data placement. By designing data placement strategies, optimal solutions with minimal memory access latency for loops on SPM-DRAM architecture can be explored. The existing data placement algorithms fail in solving the case of inconsecutive array access and fine-grained strategy can lead to excessive memory activation overhead.
To solve the problems, a research team led by Edwin Hsing-Mean Sha published their
new research on 15 May 2025 in
Frontiers of Computer Science co-published by Higher Education Press and Springer Nature.
The team proposed a fine-grained and a multi-granularity data placement algorithm to tackle unsolved case and minimize latency. The algorithms are tested on a dedicated simulator with ten benchmarks. Compared with the existing research results, the proposed algorithms reduce data transfer and access latency effectively.
To deal with inconsecutive array access, which is not considered in existing works, they first propose a fine-grained dynamic programming algorithm, called FiDP, to solve the data placement problem with the optimal solution. The design of FiDP faces two key challenges: characterizing the structure of an optimal solution and formulating the state transition equation considering the space and location constraints. By overcoming these challenges, FiDP can obtain the optimal solution and can be better applied to more complex cases.
However, in FiDP, every move operation can only transfer one array element or scalar variable at a time, which leads to excessive memory activation time. To tackle this issue, they add a medium-grained block access scheme, which transfers a block of data with lower latency than accessing the data individually. Taking multiple loop iterations into consideration, they design an integer linear programming (ILP)-based algorithm with multi-granularity choices, called MuILP. To compensate for the high complexity of MuILP, they further propose a heuristic algorithm, called HMuDP. For a near-optimal latency, the effect of each placement decision on the following iterations must be fully taken into consideration.
Future work can focus on extending the algorithms to more complicated situations, such as nested loop, random array access, and multi-core architecture.
DOI:
10.1007/s11704-023-3566-y