% vim: set spell:

\section{Near-Memory-Computing}

% TODO spell checking

\begin{frame}[fragile]{Near-Memory-Computing}
\begin{figure}
\centering
\resizebox{\textwidth}{!}{
\input{figures/classictopology}
}
\caption{Classic CPU and memory topology}
\end{figure}

All data that does not fit in the cache has to be transferred between CPU and memory.
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
\begin{figure}
\centering
\frame{\includegraphics[width=200pt]{figures/memoryspeed.pdf}}
\caption{Processor vs.\ memory performance \cite{carvalho2002gap}}
\end{figure}

Memory performance has increasingly fallen behind processor performance.
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
\begin{figure}
\centering
\resizebox{\textwidth}{!}{
\input{figures/cramtopology}
}
\caption{In-Memory-Computing topology}
\end{figure}

Data transfer costs can be avoided by not transferring data in the first place.

The CPU only performs control tasks and delegates calculations to CRAM\footnote{Computational RAM was already proposed in 1992 \cite{elliott1992computational}.}, i.e.\ memory that performs calculations on the same chip.
%reduce energy consumption
%unnecessary transfers are avoided
%bandwidth is no longer limited to memory interface
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
Problems with IMC:
\begin{itemize}[<+->]
\item based on new chip technologies
\item CRAM requires special programming
\item research is mostly based on simulations
\end{itemize}

\alert{\uncover<4->{$\,\to\,$ adoption of IMC is difficult}}
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
\begin{figure}
\centering
\resizebox{\textwidth}{!}{
\input{figures/nmctopology}
}
\caption{Example of a Near-Memory-Computing topology}
\end{figure}

NMC can be used as a compromise between IMC and a classic topology: a high-speed accelerator located close to the DRAM performs the data-intensive calculations.
% TODO what's the difference to a graphics or other compute accelerator?
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
Uncertainties:
\begin{itemize}[<+->]
\item Does NMC actually solve memory limitations?
\item How does NMC integrate into software ecosystems?
\item Can NMC compete with the performance of existing hardware?
\item Is the step from simulation to hardware feasible?
\end{itemize}

\alert{\uncover<5->{No off-the-shelf NMC hardware is available, so custom hardware is required!}}
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
To resolve these uncertainties about NMC, a dedicated platform is needed.

We offer such a platform, based on reconfigurable hardware, to ease the development and evaluation of NMC-capable accelerators.
\end{frame}

\section{NMC-platform}

\begin{frame}[fragile]{NMC-platform}
NMC-platform requirements:

\begin{itemize}[<+->]
\item rapid prototyping
\item easy evaluation of different NMC-accelerators
\item superior performance in memory-intensive applications
\item seamless integration into the existing software stack
\item basis for future real-world platforms
\end{itemize}
\end{frame}

\begin{frame}[fragile]{NMC-platform}
Manufacturing ASICs for many different types of architectures is expensive.

FPGAs are reconfigurable and can be used to evaluate many different designs at lower cost.

Some recent FPGAs, such as the HBM-equipped members of the Xilinx Virtex UltraScale+ series, offer High Bandwidth Memory, ideal for data-intensive applications!
\end{frame}

\begin{frame}[fragile]{NMC-platform}
\textbf{Processor:} Multiple open RISC-V cores are available \cite{riscvcores}. The Rocket Chip generator is actively developed and maintained.

% emphasize system aspect, NMC not so much in focus
\textbf{Operating System:} Working bare-metal makes development more difficult. Linux officially supports RISC-V and can be modified freely.

\textbf{FPGA:} The Virtex UltraScale+ VU37P (on the ADM-PCIE-9H7) is a powerful FPGA with HBM; the Zynq 7020 (on the ZedBoard) with DDR3 serves as a lower-tier part for comparison.
\end{frame}

\section{Architecture}

\begin{frame}[fragile]{Architecture}
The Rocket Chip generator is an SoC generator written in Chisel \cite{rocketchip}.

It generates Verilog code for an in-order RISC-V core that can be synthesized with FPGA toolchains.

A large set of configuration options allows the design to be adapted to the requirements.
\end{frame}

\begin{frame}[fragile]{Architecture}
The processor still needs peripherals:
\begin{itemize}[<+->]
\item UART connection for the system console
\item GPIOs to debug the boot-ROM
\item AXI memory (HBM or DDR3)
\end{itemize}
\end{frame}

\begin{frame}[fragile]{Architecture}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/schematic.pdf}
\caption{Overview of the platform components}
\end{figure}
\end{frame}

\begin{frame}[fragile]{Architecture}
The processor has to execute code when leaving reset:
\begin{itemize}[<+->]
\item the Rocket core contains a boot-ROM that is executed on power-up
\item the necessary hardware and the memory have to be initialized
\item the vmlinux ELF and bbl\uncover<3->{\footnote<3->{Berkeley Boot Loader, required to provide the Supervisor Binary Interface (SBI)}} have to be loaded into memory
\end{itemize}

\uncover<4->{\textcolor{i4green}{Little hardware, so it is easy to initialize!}}

\uncover<5->{\textcolor{i4red}{No persistent storage, so where to load vmlinux from?}}
\end{frame}

\begin{frame}[fragile]{Architecture}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/bootproc.pdf}
\caption{Boot procedure block diagram}
\end{figure}
\end{frame}

\begin{frame}[fragile]{Architecture}
Boot procedure summary:
\begin{enumerate}[<+->]
\item the Rocket core powers up
\item the boot-ROM waits for a UART connection from the host PC
\item rocketload sends commands and data to fill the memory
\item rocketload sends a boot command
\item the Rocket core jumps to the start of memory
\item bbl is executed and initializes the console
\item bbl loads the vmlinux ELF and executes Linux
\item Linux boots up and user programs can be executed
\end{enumerate}
\end{frame}

\begin{frame}[fragile]{Architecture}
How does Linux know which hardware is connected?

\only<2>{
It uses a device tree stored in the boot-ROM!
\begin{center}
\frame{
\resizebox{!}{80pt}{
\lstinputlisting[basicstyle=\ttfamily]{dtssample.txt}
}
}
\end{center}
}
\end{frame}

\begin{frame}[fragile]{Architecture}
The platform does not have persistent storage. How can user programs be executed after the kernel starts?
\begin{itemize}[<+->]
\item an initramfs serves as the initial root file system
\item an image with all required programs is created on the developer machine
\item BusyBox provides a large set of core utilities in a single, small, statically linked binary
\item the bootloader cannot provide an initramfs, so it is embedded into the kernel image
\end{itemize}
\end{frame}

\begin{frame}[fragile]{Architecture}
\begin{verbatim}
$ screen /dev/ttyUSB0 115200
\end{verbatim}
\begin{center}
\frame{
\resizebox{\textwidth}{!}{
\lstinputlisting[basicstyle=\ttfamily,linewidth=400pt,breaklines=false]{riscvlinux.txt}
}
}
\end{center}
\end{frame}

\section{Evaluation}

\begin{frame}[fragile]{Evaluation}
Features of the proposed NMC-platform:
\begin{itemize}[<+->]
\item support for regular Linux RISC-V programs
\item hardware components can be interchanged easily
\item the design avoids unnecessarily complicated boot steps
\item high-speed memory for data-intensive computing
\end{itemize}
\end{frame}

\begin{frame}[fragile]{Evaluation}
Without an NMC-accelerator in place, the final performance is difficult to predict.

However, CPU-based measurements of the memory can help to make sound design decisions for future accelerators.
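
One such figure is the bandwidth seen by a STREAM-like kernel. The following is a minimal C sketch, not the STREAM benchmark itself. The hypothetical \texttt{triad\_bandwidth} times a single pass of $a_i = b_i + q\,c_i$ using the C11 wall clock \texttt{timespec\_get}. Like STREAM, it counts 24 bytes per element: two 8-byte reads and one 8-byte write.
\begin{lstlisting}[language=C,basicstyle=\ttfamily\scriptsize]
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Bytes STREAM counts for one triad pass: reads b[i] and c[i], writes a[i]. */
double triad_bytes(size_t n) {
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Times one pass of a[i] = b[i] + q * c[i] and returns bytes/s,
 * or 0.0 if the elapsed time is too short to be measured. */
double triad_bandwidth(double *a, const double *b, const double *c,
                       double q, size_t n) {
    struct timespec t0, t1;
    assert(a != NULL && b != NULL && c != NULL);
    timespec_get(&t0, TIME_UTC);          /* C11 wall-clock timestamp */
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
    timespec_get(&t1, TIME_UTC);
    double s = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return s > 0.0 ? triad_bytes(n) / s : 0.0;
}
\end{lstlisting}
The arrays have to be much larger than the caches. Otherwise the cache, rather than the memory, is measured.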
\end{frame}

\begin{frame}[fragile]{Evaluation (test-tlb)}
\input{figures/testtlb}
\end{frame}

\begin{frame}[fragile]{Evaluation}
\begin{table}
\centering
\captionsetup[table]{position=bottom}
\resizebox{\textwidth}{!}{
\begin{tabular}{@{} lllll @{}}
\toprule
Board & LUTs & Block RAM & DSPs & Flip-Flops \\
\midrule
ZedBoard & 31534 (59.2\%) & 12 (8.6\%) & 17 (7.7\%) & 16019 (15.1\%) \\
ADM-PCIE-9H7 & 35664 (2.7\%) & 15 (0.8\%) & 17 (0.2\%) & 20971 (0.8\%) \\
\bottomrule
\end{tabular}
}
\caption{Rocket core FPGA resource utilization}
\label{table:fpgares}
\end{table}
\end{frame}

\begin{frame}[fragile]{Evaluation}
The boot procedure over UART is simple, unfortunately too simple: booting over UART is very slow, which makes kernel and software debugging exceptionally time-consuming.

The boot image has to be kept as small as possible to keep boot times low.
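
A rough estimate of the boot time rests on three assumptions. First, the image is sent at 115200 baud, the rate used with \texttt{screen} for the console. Second, 8N1 framing is used, i.e.\ 10 bit times per byte. Third, any protocol overhead of rocketload is ignored. \texttt{uart\_transfer\_seconds} is a hypothetical helper:
\begin{lstlisting}[language=C,basicstyle=\ttfamily\scriptsize]
#include <assert.h>

/* Time in seconds to send `bytes` over a UART running at `baud` bit/s,
 * where each byte occupies `bits_per_byte` bit times (10 for 8N1:
 * start bit + 8 data bits + stop bit). Protocol overhead is ignored. */
double uart_transfer_seconds(double bytes, double baud, double bits_per_byte) {
    assert(baud > 0.0 && bits_per_byte > 0.0);
    return bytes * bits_per_byte / baud;
}
\end{lstlisting}
For a hypothetical 10\,MB image, \texttt{uart\_transfer\_seconds(10e6, 115200, 10)} gives about 868\,s, i.e.\ roughly 14.5 minutes per boot.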
\end{frame}

\section{Summary}

\begin{frame}[fragile]{Summary}
Topics covered:
\begin{itemize}[<+->]
\item NMC as a method to increase performance for data-intensive workloads
\item components required to create a platform that can host NMC-accelerators
\item booting Linux on the Rocket core with bbl
\item performance of the Rocket core
\end{itemize}
\end{frame}

\begin{frame}[fragile]{Future Work}
For more comfortable development, a faster boot mechanism is required, possibly including persistent storage.

NMC-accelerators have to be implemented and evaluated. Even simulating IMC-capable chips is possible.

For NMC, the performance of the accelerators is more relevant than the CPU performance. However, replacing the Rocket core with a faster out-of-order alternative like BOOM \cite{Celio:EECS-2015-167} might still be preferable.
\end{frame}

\begin{frame}[standout]
Questions?
\end{frame}

\appendix

\begin{frame}[fragile]{Linux RISC-V history}
\begin{figure}
\centering
\begin{tikzpicture}
\begin{axis}[
  mbarplot,
  symbolic x coords={4.19,4.20,5.0,5.1,5.2,5.3,5.4},
  xlabel={version},
  ylabel={lines},
  width=0.9\textwidth,
  height=6cm,
  xtick=data,
  yticklabel style={/pgf/number format/fixed,/pgf/number format/precision=5},
  scaled y ticks=false,
  ymin=0,
]
\addplot[fill=i4green] plot coordinates {(4.19, 12096) (4.20, 878) (5.0, 724) (5.1, 2047) (5.2, 1049) (5.3, 724) (5.4, 576)};
\addplot[fill=i4red] plot coordinates {(4.19, 0) (4.20, 342) (5.0, 262) (5.1, 247) (5.2, 1216) (5.3, 258) (5.4, 201)};
\legend{insertions, deletions}
\end{axis}
\end{tikzpicture}
\caption{\texttt{git diff {-}{-}stat vX vY {-}{-} arch/riscv}}
\end{figure}
\end{frame}

\begin{frame}[fragile]{LINPACK.C}
\input{figures/linpackc}
\end{frame}

\begin{frame}[fragile]{Evaluation (STREAM)}
\input{figures/streamc}

The theoretical maximum burst transfer rate of the 256-bit AXI HBM bus at 150 MHz is 4.8 GB/s (32 bytes per clock cycle).
The Rocket core does not even utilize 1\% of this bandwidth!
\end{frame}
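
\begin{frame}[fragile]{Evaluation (HBM bandwidth)}
The peak rate, and the fraction of it that a measurement reaches, can be sketched in C. The sketch assumes one 256-bit data beat per clock cycle on a single AXI channel, with no wait states and no protocol overhead. The helper names are hypothetical.
\begin{lstlisting}[language=C,basicstyle=\ttfamily\scriptsize]
#include <assert.h>
#include <stdint.h>

/* Peak burst bandwidth in bytes/s: one bus-wide data beat per clock cycle. */
uint64_t peak_burst_bandwidth(unsigned bus_width_bits, uint64_t clock_hz) {
    assert(bus_width_bits % 8 == 0);
    return (uint64_t)(bus_width_bits / 8) * clock_hz;
}

/* Fraction of the peak that a measured rate (bytes/s) reaches. */
double utilization(double measured_bytes_per_s, uint64_t peak_bytes_per_s) {
    assert(peak_bytes_per_s > 0);
    return measured_bytes_per_s / (double)peak_bytes_per_s;
}
\end{lstlisting}
\texttt{peak\_burst\_bandwidth(256, 150000000)} returns 4\,800\,000\,000 bytes/s (4.8\,GB/s). A \texttt{utilization} result below 0.01 corresponds to the Rocket core using less than 1\% of it. The measured STREAM rates themselves are shown in the STREAM figure.
\end{frame}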