OpenRISC SoC Practical Session Instructions

Overview

This session will demonstrate how to get an OpenRISC system up and running. We will compile a pre-configured OpenRISC-based SoC under the FuseSoC environment for the Altera DE0 Nano FPGA board, program the board, and connect to the design with a debugger to download a program before running it.

This guide will:

introduce OpenRISC and SoC development
indicate the required setup (source, tools)
show how to simulate the system
show how to build the SoC and program it onto the FPGA board
detail connecting to the SoC via a debugger, then downloading and executing code

FAQ

What is the OpenRISC project?

The OpenRISC project covers architecture, implementation and tools development. Think of the architecture side as the instruction set manuals, and the implementation side as the model code (RTL, C, SystemC). The tools are things like the GNU compiler/debugger tool chain and the chip debugger.

A SoC?

The synthesisable models we develop aren't much fun (or use) on their own, so we integrate them into larger systems: a System on (a) Chip (SoC). When they are brought together with peripheral controllers (e.g. I2C, SPI, Singlewire), communications I/O (GPIO, UART, Ethernet, PCIe, etc.) and system infrastructure (memory, debug, interconnect), we have a system which is capable of many things. Typically the "brains" of the system is the programmable CPU.

An OpenRISC SoC?

In OpenRISC's case, these CPUs are relatively low-performance embedded-class processors. In an FPGA implementation we can run them at up to 100 MHz, and they execute at most a single instruction per cycle. In certain configurations they are capable of running full operating system kernels like Linux, but they are better suited to running embedded real-time operating systems (RTOS).

Brief Overview Of The OpenRISC Microprocessor Architecture

The OpenRISC 1000 architecture (OR1K or or1k) has a 32-bit instruction word and either 32-bit or 64-bit data. It is a reduced instruction set computer (RISC), meaning its instructions are relatively simple, like:

add register 3 with register 6 and store in register 8

or

load the data at the memory address held in register 4 into register 5

In contrast, a more complex instruction set computer might be capable of doing much more in a single instruction word:

load the data at the memory address in register 2, increment it, compare with zero, and store back at the address held in register 3 while incrementing both registers 2 and 3

This should indicate something rather obvious, which is that the latter seems a lot harder to implement than the former. This means implementations of RISC computers require less logic, and the idea is that the complexity is offloaded onto the software compiler.
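
To make the contrast concrete, here is a minimal sketch in C of a toy register machine. It is not a model of the mor1kx or of any real OR1K implementation: struct toy_cpu, the function names and the 16-word memory are all hypothetical, chosen only for illustration. The two RISC-style operations each do one simple thing, while the complex operation bundles several steps together. How far the complex instruction advances its two pointer registers is not stated in the text, so one 4-byte word is assumed here.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS  32   /* number of general-purpose registers in the toy machine */
#define MEM_WORDS 16   /* tiny illustrative memory, not a real memory map */

/* Hypothetical toy machine state, for illustration only. */
struct toy_cpu {
    uint32_t r[NUM_REGS];
    uint32_t mem[MEM_WORDS];
};

/* Translate a word-aligned byte address into an index into mem[]. */
static uint32_t word_index(uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return addr / 4;
}

/* RISC-style add: rd = ra + rb. One simple operation, registers only. */
void toy_add(struct toy_cpu *cpu, unsigned rd, unsigned ra, unsigned rb)
{
    assert(rd < NUM_REGS && ra < NUM_REGS && rb < NUM_REGS);
    cpu->r[rd] = cpu->r[ra] + cpu->r[rb];
}

/* RISC-style load: rd = word at the address held in ra. */
void toy_load(struct toy_cpu *cpu, unsigned rd, unsigned ra)
{
    assert(rd < NUM_REGS && ra < NUM_REGS);
    cpu->r[rd] = cpu->mem[word_index(cpu->r[ra])];
}

/* The complex instruction from the text: load via r2, increment,
 * compare with zero, store via r3, then advance r2 and r3.
 * Returns 1 if the incremented value is zero, otherwise 0.
 * Assumption: both pointers advance by one 4-byte word. */
int toy_complex(struct toy_cpu *cpu)
{
    uint32_t src = word_index(cpu->r[2]);
    uint32_t dst = word_index(cpu->r[3]);
    uint32_t value = cpu->mem[src] + 1;  /* load and increment */
    int is_zero = (value == 0);          /* compare with zero */

    cpu->mem[dst] = value;               /* store back */
    cpu->r[2] += 4;                      /* advance both pointers */
    cpu->r[3] += 4;
    return is_zero;
}
```

With register 3 holding 5 and register 6 holding 7, toy_add(&cpu, 8, 3, 6) leaves 12 in register 8. Note how much more state toy_complex touches in a single call; in hardware, each of those steps needs its own logic.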

The OpenRISC project is lucky enough to have a good quality GNU tool chain port, and an LLVM port also exists. This allows us to compile C and C++ to OpenRISC 1000 machine code and execute it on our models. We also have software libraries in newlib (for baremetal) and uClibc (for Linux userspace; an EGLIBC port is in the works, I believe), and the GNU debugger (GDB), which understands the OR1K architecture.

mor1kx core

The SoC's CPU core is the mor1kx. It is written in Verilog and provides a choice between 3 major variants, based on the pipeline architecture. They are the 3-stage pipeline espresso core, the 3-stage delay-slot-free pronto espresso core, and the 6-stage cappuccino core which can optionally have MMUs and caches, making it capable and powerful enough to run Linux.

Tutorial set up

There are several components which must be available for us to do this.

[See the tools install guide page for all of the details](https://github.com/embecosm/chiphack/wiki/OpenRISC-tools-install)

Introduction to FuseSoC

FuseSoC is a SoC development tool which aims to overcome some of the challenges faced by open source RTL development in a new and interesting way. It does this in a number of ways.

First, it breaks the habit of many recent IP development projects, which tended to centralise everything they needed into a single project. Instead, FuseSoC relies on a description of where it can download the required portions of the design from, no matter where they reside (GitHub, OpenCores SVN, Open Hardware Repository git, etc.). This is, in effect, a list of dependencies.

FuseSoC handles both IP cores and full SoC designs and can be configured to fully automate their simulation, verification and synthesis. It refers to IP blocks as cores, and to collections of them that form a bigger design as systems.

It is currently a command-line driven tool (and I'm sure the developers would be open to an offer from someone to develop a GUI). The following tutorial will use FuseSoC to simulate a generic SoC based around the OpenRISC processor, then build a design for the DE0 Nano and download it to the board. We'll then attach the software debugger to it and watch it step through code.

Getting ready to work with FuseSoC

You will require both the FuseSoC and orpsoc-cores source from GitHub. If you haven't already, go and download these.

The default setup is to have the fusesoc and orpsoc-cores directories next to each other. If you have this setup then running from the fusesoc directory will work out of the box, as it has a default fusesoc.conf which assumes this. Otherwise you will need to edit the fusesoc.conf file to point to your orpsoc-cores directory by setting the cores_root and systems_root paths in that file.

You should be able to run the following command and see it print out a list of the IP cores it knows about, thanks to the configuration files in orpsoc-cores:

cd ~/git/fusesoc

fusesoc list-cores

Simulating a generic OpenRISC system

There is a generic "system" based on the mor1kx processor containing a simple set up of the processor, some memory, a UART core and some Wishbone interconnect stitching it all together.

We can simulate this, but really we need some software to run on it. So let's write a little hello world.

Compile an OpenRISC helloworld

Write the following simple program into a file called hello.c

#include <stdio.h>

int main(void)
{
    printf("Hello world, from an OpenRISC system!\n");
    return 0;
}

Now compile it using the OpenRISC toolchain (which you should have already installed).

or1k-elf-gcc hello.c -o hello.elf

Run an OpenRISC helloworld on the mor1kx-generic system

fusesoc sim mor1kx-generic --elf-load hello.elf

You should then see the simulation load and issue a friendly greeting.

fusesoc sim mor1kx-generic --elf-load hello.elf
orpsoc_tb.dut.mor1kx0.bus_gen.ibus_bridge: Wishbone bus IF is B3_REGISTERED_FEEDBACK
orpsoc_tb.dut.mor1kx0.bus_gen.dbus_bridge: Wishbone bus IF is B3_REGISTERED_FEEDBACK
Program header 0: addr 0x00000000, size 0x00006BEC
Program header 1: addr 0x00008BEC, size 0x00000A00
elf-loader: /home/openrisc/work/hello.elf was loaded
Loading        9595 words
Hello world, from an OpenRISC system!
Closing RSP server

If you didn't, ensure you have all of the tools installed. It helps to re-run after a failure with the --force flag.
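
As a sanity check on the log above: the two program headers end at 0x00006BEC and 0x000095EC (0x00008BEC + 0x00000A00), and 0x95EC bytes is 38380 bytes, or 9595 32-bit words, which matches the "Loading 9595 words" line. This suggests the loader covers everything from address 0 up to the highest segment end. That is an inference from the printed numbers, not a description of the elf-loader's source. The sketch below reproduces the arithmetic; struct segment and words_to_load are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One program header as printed in the log: start address and size in bytes. */
struct segment {
    uint32_t addr;
    uint32_t size;
};

/* Number of 32-bit words needed to cover address 0 up to the highest
 * segment end, rounding up to a whole word. */
uint32_t words_to_load(const struct segment *segs, size_t count)
{
    uint32_t end = 0;

    assert(segs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t seg_end = segs[i].addr + segs[i].size;
        if (seg_end > end)
            end = seg_end;
    }
    return (end + 3) / 4;
}
```

Feeding in the two headers from the log, {0x00000000, 0x00006BEC} and {0x00008BEC, 0x00000A00}, gives 9595.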

Inspect the internals of mor1kx-generic system via a waveform dump

We can re-run the same simulation of the system but this time observe its internals on a waveform view afterwards. Do this by adding the --vcd flag.

fusesoc sim mor1kx-generic --elf-load hello.elf --vcd

The VCD (Value Change Dump) switch will cause the simulation to create the following file:

build/mor1kx-generic/sim-icarus/testlog.vcd

This file can be opened with a waveform viewer such as GTKWave, which you should have installed earlier.

gtkwave build/mor1kx-generic/sim-icarus/testlog.vcd

In the hierarchy browser window in the top left-hand corner of the window, expand orpsoc_tb and then dut (this is an acronym of design under test, a commonly used term for the thing we're testing). Highlight the mor1kx0 instance to get a list of its signals in the window below.

Select the signals beginning with iwbm... (click then shift+click) and click the Insert button below that pane. Do the same again for the dwbm... signals. This should make some signals appear in the waveform frame on the right. These signals are the instruction bus and data bus of the processor. They are separate buses, showing off the Harvard architecture of the processor: it has separate access buses for instructions and data.

Click the zoom out button (minus sign icon) to zoom out and show the first few bus accesses.

GTKWave showing a trace of the mor1kx-generic system booting up

The above screenshot shows the first few accesses on the instruction bus, with the processor fetching its first instructions. Zooming out further will show that the processor's activity eventually becomes less regular. This is due to the processor executing from its cache and performing things like data accesses.

This is a fully synchronous system, all running from the same clock. All operations occur in lock step with this clock. Add it to the waveform to see each of the bus's signals changing on each rising edge.

The following image shows the change in instruction bus behaviour when the processor's cache is enabled and it begins to perform burst Wishbone accesses to fill its cache RAM.

GTKWave showing a trace of the mor1kx-generic enabling its cache

You can inspect the source for the cores under build/mor1kx-generic/src/. FuseSoC will copy what it needs into a temporary directory to build the system.

The following shows the processor's arithmetic logic unit (ALU) doing some operations such as addition, [logical AND](http://en.wikipedia.org/wiki/AND_gate) and logical OR on the two values a and b, which are input from the processor's control logic. The result can be seen in the alu_result_o[31:0] signal.

GTKWave showing a trace of the mor1kx-generic crunching numbers
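
For readers who think more easily in software, the behaviour visible on a, b and alu_result_o can be sketched as a C function. This is only a software analogy under simplifying assumptions: the enum and function names are hypothetical, only the three operations mentioned above are covered, and the real mor1kx ALU is Verilog with more operations and more control signals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation selector; the real control encoding differs. */
enum alu_op { ALU_ADD, ALU_AND, ALU_OR };

/* Combine two 32-bit operands the way the waveform shows a and b
 * producing a 32-bit result. */
uint32_t alu(enum alu_op op, uint32_t a, uint32_t b)
{
    switch (op) {
    case ALU_ADD:
        return a + b;   /* wraps modulo 2^32, like a 32-bit adder */
    case ALU_AND:
        return a & b;   /* AND applied to each bit position */
    case ALU_OR:
        return a | b;   /* OR applied to each bit position */
    }
    assert(0 && "unknown ALU operation");
    return 0;
}
```

For instance, alu(ALU_AND, 0xF0F0, 0x0FF0) yields 0x00F0 and alu(ALU_OR, 0xF0F0, 0x0FF0) yields 0xFFF0. You can compare values like these against what alu_result_o shows in GTKWave.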

Building and running on DE0 Nano

The next section will focus on building and running the system on the DE0 Nano.

Source preparation

We will modify the source for the de0_nano system in orpsoc-cores to support the use of the Embecosm USB to UART adapter board.

From within the orpsoc-cores directory we downloaded during the set up process we will download a patch with the following command:

wget https://googledrive.com/host/0B8QBx9Nr1WjyeVlGNUlHT0tyTzQ/0001-de0_nano-Put-UART-in-place-suitable-for-Embecosm-UAR.patch

(or wget http://goo.gl/xw74Aa - note the patch you download will then be named xw74Aa)

We will now apply the patch with:

git am 0001-de0_nano-Put-UART-in-place-suitable-for-Embecosm-UAR.patch

The system's source will then be suitably configured to use the Embecosm USB to UART adapter board.

Synthesis

Here we will build the de0_nano system with the following command:

fusesoc build de0_nano

[Ensure you have all of the required tools and they have been installed correctly.](https://github.com/embecosm/chiphack/wiki/OpenRISC-tools-install#altera-quartus-tools)

Programming the board

In the synthesis directory, there is also a makefile recipe for programming the board:

fusesoc pgm de0_nano

One of several things could go wrong here.

One is that the running JTAG daemon needs to be killed and the Altera one run instead. To do this, run:

killall jtagd

sudo /opt/altera/13.1/quartus/bin/jtagd

Another problem might be that the OpenOCD debugger is still using the JTAG/USB port; exiting OpenOCD will fix this.
Another could be basic permissions on the USB device, in which case running sudo make pgm may fix things.

Connecting the debug proxy

From within the OpenOCD directory (presumably $HOME/openrisc/git/openOCD) run:

sudo ./build/src/openocd -f ./tcl/interface/altera-usb-blaster.cfg -f altera-dev.tcl

You should then see something like:

Info : JTAG tap: or1k.cpu tap/device found: 0x020f30dd (mfg: 0x06e, part: 0x20f3, ver: 0x0)

(ignore any warnings or errors about the JTAG tap...)

target state: halted

Chip is or1k.cpu, Endian: big, type: or1k

Once the proxy is connected we can then connect to it with GDB.

Note that when the proxy connects it will stall the processor.

Prepare some software to download

We will write a hello world to run which will talk to our terminal console over the UART.

Note: This example uses an Embecosm USB to UART adapter board, as per [the instructions in the introduction to Verilog section](https://github.com/embecosm/chiphack/wiki/Introduction-to-Verilog#attaching-uart-to-the-pc). You must also have applied the patch which adds support for this UART on the DE0 Nano build in orpsoc-cores before building.

Take [the C code we used earlier](https://github.com/embecosm/chiphack/wiki/OpenRISC-SoC-Practical-Session-Instructions#compile-an-openrisc-helloworld) and recompile it to run on the DE0 Nano by passing a flag to identify the board it should run on:

or1k-elf-gcc hello.c -o hello_de0_nano.elf -mboard=de0_nano

Open a terminal

We will need to open a terminal to see the output from the board.

Do so the same way we did during the UART exercise.

Connecting the debugger

In a new terminal run the OR1K port of the GNU debugger (GDB) and specify the ELF executable we want to run:

or1k-elf-gdb hello_de0_nano.elf

Now connect to the port the proxy is running on:

(gdb) target remote :50001

You should see something like the following:

Remote debugging using :50001

0x00000700 in ?? ()

You can now access the system memory and registers.

Read memory with x <addr>, e.g.:

(gdb) x 0x0

0x0: 0x00000000

See this table of GDB commands for further information.
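
When interpreting memory dumps, keep in mind that OpenOCD reported the target as big-endian, so within each 32-bit word the byte at the lowest address is the most significant. As a small illustration (read_be32 is a hypothetical helper, not part of GDB or the OpenRISC tools), a word can be assembled from four consecutive bytes like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit word from four bytes stored in big-endian order:
 * bytes[0] is the byte at the lowest address and ends up in bits 31..24. */
uint32_t read_be32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

So the byte sequence 0x12, 0x34, 0x56, 0x78 in target memory corresponds to the word 0x12345678.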

Run the program by first downloading it and launching the processor:

(gdb) load
Loading section .vectors, size 0x2000 lma 0x0
Loading section .init, size 0x28 lma 0x2000
Loading section .text, size 0xf8f0 lma 0x2028
Loading section .fini, size 0x1c lma 0x11918
Loading section .rodata, size 0x680 lma 0x11934
Loading section .eh_frame, size 0x4 lma 0x12000
Loading section .ctors, size 0x8 lma 0x12004
Loading section .dtors, size 0x8 lma 0x1200c
Loading section .jcr, size 0x4 lma 0x12014
Loading section .data, size 0xa38 lma 0x12018
Start address 0x100, load size 76292
Transfer rate: 337 KB/sec, 5868 bytes/write.
(gdb) c
Continuing.
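
The "load size" GDB prints can be checked against the section list: in the transcript above, the ten section sizes add up to exactly the reported 76292 bytes. The sketch below just reproduces that sum; total_load_size is a hypothetical name, and the sizes are copied from the transcript.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the sizes of all loaded sections, in bytes. */
uint32_t total_load_size(const uint32_t *sizes, size_t count)
{
    uint32_t total = 0;

    assert(sizes != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += sizes[i];
    return total;
}

/* Section sizes from the hello_de0_nano.elf load transcript:
 * .vectors, .init, .text, .fini, .rodata, .eh_frame, .ctors, .dtors, .jcr, .data */
static const uint32_t hello_sections[] = {
    0x2000, 0x28, 0xf8f0, 0x1c, 0x680, 0x4, 0x8, 0x8, 0x4, 0xa38
};
```

Calling total_load_size(hello_sections, 10) returns 76292. If your own build reports different section sizes, the same check applies with your numbers.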

Boot Linux

Download a precompiled Linux kernel + BusyBox image:

wget https://www.dropbox.com/s/bi5vx8kmqnjdldx/vmlinux-de0_nano

Load it in gdb as before, but this time we need to set the NPC to the reset address before running:

(gdb) file vmlinux-de0_nano 
Load new symbol table from "vmlinux-de0_nano"? (y or n) y
Reading symbols from vmlinux-de0_nano...done.
(gdb) load
Loading section .text, size 0x1a3468 lma 0x0
Loading section .rodata, size 0x25090 lma 0x1a4000
Loading section .eh_frame, size 0x3473c lma 0x1c9090
Loading section __param, size 0x220 lma 0x1fd7cc
Loading section .data, size 0x1a9e0 lma 0x1fe000
Loading section __ex_table, size 0x9a8 lma 0x2189e0
Loading section .notes, size 0x24 lma 0x219388
Loading section .head.text, size 0x4000 lma 0x21a000
Loading section .init.text, size 0x11158 lma 0x21e000
Loading section .init.data, size 0x218dac lma 0x22f160
Start address 0xc0000000, load size 4481284
Transfer rate: 359 KB/sec, 15834 bytes/write.
(gdb) spr npc 0x100
SYS.NPC (SPR0_16) set to 256 (0x100), was: 3221225472 (0xc0000000)
(gdb) c
Continuing.

You should then see the console present you with a Linux console prompt.

Get involved with OpenRISC hacking

You can get in touch with the OpenRISC developers to find out more about how this works, or just to tell them you think this stuff is plain cool, by joining the #openrisc IRC channel on irc.freenode.net or posting to the mailing lists.
