% vim:set spell
\section{Near-Memory-Computing}

% TODO spell checking

\begin{frame}[fragile]{Near-Memory-Computing}
    \begin{figure}
        \centering
        \resizebox{\textwidth}{!}{
            \input{figures/classictopology}
        }
        \caption{Classic CPU and memory topology}
    \end{figure}

    All data that does not fit into the cache has to be transferred between CPU and memory.
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
    \begin{figure}
        \centering
        \frame{\includegraphics[width=200pt]{figures/memoryspeed.pdf}}
        \caption{Processor vs. memory performance \cite{carvalho2002gap}}
    \end{figure}

    The gap between processor and memory performance has widened over time.
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
    \begin{figure}
        \centering
        \resizebox{\textwidth}{!}{
            \input{figures/cramtopology}
        }
        \caption{In-Memory-Computing topology}
    \end{figure}

    Data transfer cost can be avoided by not transferring data in the first place.

    The CPU only performs control tasks and delegates calculations to CRAM,
    memory that performs calculations on the same chip where the data is stored.\footnote{Computational RAM was proposed as early as 1992 \cite{elliott1992computational}.}
    %reduce energy consumption
    %unnecessary transfers are avoided
    %bandwidth is no longer limited to memory interface
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
    Problems with IMC:

    \begin{itemize}[<+->]
        \item based on new chip technologies
        \item CRAM requires special programming
        \item research is mostly based on simulations
    \end{itemize}

    \alert{\uncover<4->{$\,\to\,$ adoption of IMC is difficult}}
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
    \begin{figure}
        \centering
        \resizebox{\textwidth}{!}{
            \input{figures/nmctopology}
        }
        \caption{example of a Near-Memory-Computing topology}
    \end{figure}

    NMC can be used as a compromise between IMC and a classic topology.

    A high-speed accelerator located close to the DRAM performs the data-intensive calculations.
    % TODO what's the difference to a graphics or other compute accelerator?
\end{frame}

\begin{frame}[fragile]{Near-Memory-Computing}
    Uncertainties:

    \begin{itemize}[<+->]
        \item Does NMC actually solve memory limitations?
        \item How does NMC integrate into software ecosystems?
        \item Can NMC compete with the performance of existing hardware?
        \item Is the step from simulation to hardware feasible?
    \end{itemize}

    \alert{\uncover<5->{No off-the-shelf NMC hardware available, custom hardware required!}}
\end{frame}


\begin{frame}[fragile]{Near-Memory-Computing}
    To resolve these uncertainties, a dedicated NMC platform is needed.

    We offer such a platform to ease development and evaluation of NMC-capable accelerators based on reconfigurable hardware.
\end{frame}

\section{NMC-platform}

\begin{frame}[fragile]{NMC-platform}
    NMC-platform requirements:

    \begin{itemize}[<+->]
        \item rapid prototyping
        \item possibility to easily evaluate different NMC-accelerators
        \item superior performance in memory intensive applications
        \item seamless integration into existing software stack
        \item basis for future real-world platforms
    \end{itemize}
\end{frame}

\begin{frame}[fragile]{NMC-platform}
    Manufacturing ASICs for many different architectures is expensive.

    FPGAs are reconfigurable and can be used to evaluate many different designs at lower cost.

    Some recent FPGAs, e.g.\ parts of the Xilinx Virtex UltraScale+ series, integrate High Bandwidth Memory (HBM),
    ideal for data-intensive applications!
\end{frame}

\begin{frame}[fragile]{NMC-platform}
    \textbf{Processor:} Multiple open-source RISC-V cores are available \cite{riscvcores}.
    Rocket Chip is actively developed and maintained.

    % emphasize system aspect, NMC not so much in focus
    \textbf{Operating System:} Working bare metal makes development more difficult.
    Linux officially supports RISC-V and can be modified freely.

    \textbf{FPGA:} The Virtex UltraScale+ VU37P (ADM-PCIE-9H7) is a powerful FPGA with HBM;
    the Zynq 7020 (ZedBoard) with DDR3 serves as a lower-tier part for comparison.
\end{frame}

\section{Architecture}

\begin{frame}[fragile]{Architecture}
    The Rocket Chip generator is a SoC generator written in Chisel \cite{rocketchip}.

    It generates Verilog code for an in-order RISC-V core which can be synthesized with FPGA toolchains.

    A large set of configuration options allows the generated design to be tailored to the requirements.
\end{frame}

\begin{frame}[fragile]{Architecture}
    The processor still needs peripherals:
    \begin{itemize}[<+->]
        \item UART connection for system console
        \item GPIOs to debug the boot-ROM
        \item AXI memory (HBM or DDR3)
    \end{itemize}
\end{frame}

\begin{frame}[fragile]{Architecture}
    \begin{figure}
        \centering
        \includegraphics[width=\textwidth]{figures/schematic.pdf}
        \caption{overview of the platform components}
    \end{figure}
\end{frame}

\begin{frame}[fragile]{Architecture}
    The processor has to execute code when leaving reset:
    \begin{itemize}[<+->]
        \item Rocket core contains boot-ROM that is executed on power-up
        \item necessary hardware and the memory have to be initialized
        \item vmlinux ELF and bbl\uncover<3->{\footnote<3->{Berkeley Boot Loader, required to provide the Supervisor Binary Interface (SBI)}} have to be loaded to memory
    \end{itemize}

    \uncover<4->{\textcolor{i4green}{Little hardware, so easy to initialize!}}

    \uncover<5->{\textcolor{i4red}{No persistent storage, where to load vmlinux from?}}
\end{frame}
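
\begin{frame}[fragile]{Architecture}
    \small
    Loading the vmlinux ELF essentially means copying every \texttt{PT\_LOAD} segment to its physical address and zeroing the rest of the segment (\texttt{.bss}).
    A minimal sketch, assuming a 64-bit image already in memory and RAM modeled as a buffer starting at a hypothetical \texttt{ram\_base}; it is not bbl's actual loader.
\begin{lstlisting}[language=C,basicstyle=\ttfamily\tiny]
#include <elf.h>
#include <stdint.h>
#include <string.h>
#ifndef EM_RISCV
#define EM_RISCV 243
#endif

/* Copies all PT_LOAD segments of a 64-bit RISC-V ELF into `ram`, which stands
   in for physical memory starting at `ram_base` (hypothetical layout).
   Returns the entry point, or 0 if the image is rejected. */
uint64_t load_elf(const uint8_t *img, size_t len,
                  uint8_t *ram, uint64_t ram_base, size_t ram_len)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img; /* img must be 8-byte aligned */
    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 || eh->e_machine != EM_RISCV ||
        eh->e_phoff + (uint64_t)eh->e_phnum * eh->e_phentsize > len)
        return 0;
    for (unsigned i = 0; i < eh->e_phnum; i++) {
        const Elf64_Phdr *ph =
            (const Elf64_Phdr *)(img + eh->e_phoff + (size_t)i * eh->e_phentsize);
        if (ph->p_type != PT_LOAD)
            continue;
        uint64_t off = ph->p_paddr - ram_base;
        if (ph->p_paddr < ram_base || ph->p_filesz > ph->p_memsz ||
            ph->p_memsz > ram_len || off > ram_len - ph->p_memsz ||
            ph->p_offset > len || ph->p_filesz > len - ph->p_offset)
            return 0;
        memcpy(ram + off, img + ph->p_offset, ph->p_filesz);             /* file data */
        memset(ram + off + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz); /* .bss */
    }
    return eh->e_entry;
}
\end{lstlisting}
\end{frame}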

\begin{frame}[fragile]{Architecture}
    \begin{figure}
        \centering
        \includegraphics[width=\textwidth]{figures/bootproc.pdf}
        \caption{boot procedure block diagram}
    \end{figure}
\end{frame}

\begin{frame}[fragile]{Architecture}
    Boot procedure summary:
    \begin{enumerate}[<+->]
        \item Rocket core powers up
        \item boot-ROM awaits UART connection from host PC
        \item rocketload sends commands and data to fill memory
        \item rocketload sends a boot command
        \item Rocket core jumps to start of memory
        \item bbl is executed and initializes the console
        \item bbl loads the vmlinux ELF and executes Linux
        \item Linux boots up and user programs can be executed
    \end{enumerate}
\end{frame}
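
\begin{frame}[fragile]{Architecture}
    \small
    Steps 2 to 5 amount to a small command loop in the boot-ROM.
    A hedged sketch: the command bytes \texttt{W}/\texttt{B}, the little-endian 32-bit fields and all names are hypothetical, not rocketload's real protocol; \texttt{buf\_rx} only stands in for the UART when testing on a host.
\begin{lstlisting}[language=C,basicstyle=\ttfamily\tiny]
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: 'W' <u32 offset> <u32 length> <data...> | 'B' */
typedef int (*rx_fn)(void *ctx);          /* next byte from UART, -1 if none */

static int rx_u32(rx_fn rx, void *ctx, uint32_t *v) {
    *v = 0;
    for (int i = 0; i < 4; i++) {         /* little-endian */
        int c = rx(ctx);
        if (c < 0) return -1;
        *v |= (uint32_t)c << (8 * i);
    }
    return 0;
}

/* Fills ram until 'B' arrives. Returns 0 to boot, -1 on a protocol error. */
int bootrom_loop(rx_fn rx, void *ctx, uint8_t *ram, size_t ram_len) {
    for (;;) {
        int cmd = rx(ctx);
        uint32_t off, len;
        if (cmd == 'B')
            return 0;                     /* caller jumps to start of memory */
        if (cmd != 'W' || rx_u32(rx, ctx, &off) || rx_u32(rx, ctx, &len) ||
            off > ram_len || len > ram_len - off)
            return -1;
        for (uint32_t i = 0; i < len; i++) {
            int c = rx(ctx);
            if (c < 0) return -1;
            ram[off + i] = (uint8_t)c;
        }
    }
}

/* Host-side stand-in for the UART, for testing only. */
struct buf_src { const uint8_t *p; size_t n, pos; };
int buf_rx(void *ctx) {
    struct buf_src *s = ctx;
    return s->pos < s->n ? s->p[s->pos++] : -1;
}
\end{lstlisting}
\end{frame}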

\begin{frame}[fragile]{Architecture}
    How does Linux know which hardware is connected?
    \only<2>{
        It uses a device tree from boot-ROM!
        \begin{center}
            \frame{
                \resizebox{!}{80pt}{
                    \lstinputlisting[basicstyle=\ttfamily]{dtssample.txt}
                }
            }
        \end{center}
    }
\end{frame}
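
\begin{frame}[fragile]{Architecture}
    \small
    On RISC-V, the previous boot stage passes the hart ID in register \texttt{a0} and the physical address of the flattened device tree (DTB) in \texttt{a1}.
    A minimal sketch of how a boot stage might sanity-check such a blob before handing it on; \texttt{fdt\_check} is a hypothetical helper, the 40-byte big-endian header layout follows the Devicetree Specification.
\begin{lstlisting}[language=C,basicstyle=\ttfamily\tiny]
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu
#define FDT_HEADER_SIZE 40u               /* ten 32-bit header fields */

/* All FDT header fields are stored big-endian. */
static uint32_t be32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Returns the blob's totalsize if `dtb` starts with a plausible FDT header
   that fits into `avail` bytes, else 0. */
uint32_t fdt_check(const uint8_t *dtb, size_t avail) {
    if (avail < FDT_HEADER_SIZE || be32(dtb) != FDT_MAGIC)
        return 0;
    uint32_t total = be32(dtb + 4);       /* header field `totalsize` */
    return (total >= FDT_HEADER_SIZE && total <= avail) ? total : 0;
}
\end{lstlisting}
\end{frame}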

\begin{frame}[fragile]{Architecture}
    The platform does not have persistent storage. How can user programs be executed after the Kernel starts?

    \begin{itemize}[<+->]
        \item initramfs serves as the initial root file system
        \item an image with all required programs is created on the developer machine
        \item BusyBox provides a large set of core utilities in a single, small, statically linked binary
        \item the bootloader cannot provide an initramfs, so it is embedded into the Kernel image
    \end{itemize}
\end{frame}

\begin{frame}[fragile]{Architecture}
    \begin{verbatim}$ screen /dev/ttyUSB0 115200\end{verbatim}
    \begin{center}
        \frame{
            \resizebox{\textwidth}{!}{
                \lstinputlisting[basicstyle=\ttfamily,linewidth=400pt,breaklines=false]{riscvlinux.txt}
            }
        }
    \end{center}
\end{frame}

\section{Evaluation}

\begin{frame}[fragile]{Evaluation}
    Features of the proposed NMC-platform:
    \begin{itemize}[<+->]
        \item support for regular Linux RISC-V programs
        \item hardware components can be interchanged easily
        \item design avoids unnecessarily complicated boot steps
        \item high-speed memory for data intensive computing
    \end{itemize}
\end{frame}

\begin{frame}[fragile]{Evaluation}
    Without an NMC-accelerator present, the final performance is difficult to predict.

    However, CPU-based measurements of the memory can help to make sound design decisions for future accelerators.
\end{frame}

\begin{frame}[fragile]{Evaluation (test-tlb)}
    \input{figures/testtlb}
\end{frame}

\begin{frame}[fragile]{Evaluation}
    \begin{table}
        \centering
        \captionsetup[table]{position=bottom}
        \resizebox{\textwidth}{!}{
            \begin{tabular}{@{} lllll @{}}
                \toprule
                Board       &   LUTs            &   Block RAM   &   DSPs        &   Flip-Flops      \\
                \midrule
                ZedBoard    &   31534 (59.2\%)  &   12 (8.6\%)  &   17 (7.7\%)  &   16019 (15.1\%)  \\
                ADM-PCIE-9H7&   35664 (2.7\%)   &   15 (0.8\%)  &   17 (0.2\%)  &   20971 (0.8\%)   \\
                \bottomrule
            \end{tabular}
        }
        \caption{Rocket core FPGA resource utilization}
        \label{table:fpgares}
    \end{table}
\end{frame}

\begin{frame}[fragile]{Evaluation}
    The boot procedure over UART is very basic, unfortunately too basic.

    Booting over UART is very slow, which makes Kernel and software debugging an exceptionally time-consuming process.

    Boot image size has to stay as small as possible to keep boot times low.
\end{frame}
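
\begin{frame}[fragile]{Evaluation}
    \small
    Why the image size matters: with 8N1 framing, every byte costs 10 bits on the wire.
    A small sketch, assuming the loader uses the same 115200 baud as the console and 8N1 framing; \texttt{uart\_transfer\_seconds} is a hypothetical helper.
\begin{lstlisting}[language=C,basicstyle=\ttfamily\scriptsize]
#include <stdint.h>

/* Wire time for `bytes` at `baud`, assuming 8N1 framing:
   1 start bit + 8 data bits + 1 stop bit = 10 bits per byte. */
double uart_transfer_seconds(uint64_t bytes, uint32_t baud) {
    return (double)bytes * 10.0 / (double)baud;
}
\end{lstlisting}
    For a hypothetical 10\,MiB image this yields about 910\,s, i.e.\ roughly 15 minutes per boot.
\end{frame}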

\section{Summary}

\begin{frame}[fragile]{Summary}
    Topics covered:
    \begin{itemize}[<+->]
        \item NMC as a method to increase performance for data intensive workloads
        \item components required to create a platform that can host NMC accelerators
        \item booting Linux on the Rocket core with bbl
        \item performance of the Rocket core
    \end{itemize}
\end{frame}

\begin{frame}[fragile]{Future Work}
    For more comfortable development, a faster boot mechanism is required, possibly including persistent storage.

    NMC-accelerators have to be implemented and evaluated. Even simulating IMC-capable chips is possible.

    For NMC, the performance of accelerators is more relevant than the CPU performance.
    However, replacing the Rocket core with a faster alternative like BOOM \cite{Celio:EECS-2015-167} might still be preferable.
\end{frame}

\begin{frame}[standout]
    Questions?
\end{frame}

\appendix

\begin{frame}[fragile]{Linux RISC-V history}
    \begin{figure}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    mbarplot,
                    symbolic x coords={4.19,4.20,5.0,5.1,5.2,5.3,5.4},
                    xlabel={version},
                    ylabel={lines},
                    width=0.9\textwidth,
                    height=6cm,
                    xtick=data,
                    yticklabel style={/pgf/number format/fixed,/pgf/number format/precision=5},
                    scaled y ticks=false,
                    ymin=0,
                ]

                \addplot[fill=i4green] plot coordinates {(4.19, 12096) (4.20,878) (5.0, 724) (5.1, 2047) (5.2, 1049) (5.3, 724) (5.4, 576)};
                \addplot[fill=i4red]   plot coordinates {(4.19, 0)     (4.20,342) (5.0, 262)  (5.1, 247)  (5.2, 1216) (5.3, 258) (5.4, 201)};

                \legend{insertions, deletions}

            \end{axis}
        \end{tikzpicture}
        \caption{\texttt{git diff {-}{-}stat vX vY {-}{-} arch/riscv}}
    \end{figure}
\end{frame}

\begin{frame}[fragile]{LINPACK.C}
    \input{figures/linpackc}
\end{frame}

\begin{frame}[fragile]{Evaluation (STREAM)}
    \input{figures/streamc}

    The theoretical peak burst transfer rate of the 256-bit AXI HBM bus at 150 MHz is $32\,\mathrm{B} \times 150\,\mathrm{MHz} = 4.8\,\mathrm{GB/s}$.
    The Rocket core does not even use 1\% of this bandwidth!
\end{frame}
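
\begin{frame}[fragile]{Evaluation (STREAM)}
    \small
    A sketch of the two numbers being compared: the ideal bus peak and a STREAM-style triad measurement.
    Function names are hypothetical; the real STREAM benchmark repeats each kernel several times and reports the best run, which this single-run sketch omits.
\begin{lstlisting}[language=C,basicstyle=\ttfamily\tiny]
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L            /* for clock_gettime */
#endif
#include <stddef.h>
#include <time.h>

/* Ideal peak: one full-width transfer per clock cycle. */
double peak_bytes_per_s(unsigned width_bits, double clock_hz) {
    return width_bits / 8.0 * clock_hz;     /* 256 bit at 150 MHz -> 4.8e9 B/s */
}

/* STREAM-style triad a = b + q*c, timed once. Like STREAM, counts
   3 arrays of n doubles (2 reads, 1 write) as the bytes moved. */
double triad_bytes_per_s(double *a, const double *b, const double *c,
                         size_t n, double q) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double s = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return 3.0 * sizeof(double) * (double)n / s;
}
\end{lstlisting}
    Dividing the triad result by \texttt{peak\_bytes\_per\_s(256, 150e6)} gives the bus utilization.
\end{frame}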

%\begin{frame}[fragile]{Backup slides}
%    Sometimes, it is useful to add slides at the end of your presentation to
%    refer to during audience questions.
%
%    The best way to do this is to include the \verb|appendixnumberbeamer|
%    package in your preamble and call \verb|\appendix| before your backup slides.
%
%    \themename will automatically turn off slide numbering and progress bars for
%    slides in the appendix.
%\end{frame}