Profiling an Architectural Simulator

Overview

gem5 is a state-of-the-art software-based architectural simulator in widespread use in both academia and industry. We set out to profile gem5 on different platforms and evaluate its performance. Our observations show that gem5 runs 1.7x to 3.02x faster on a MacBook Pro with an M1 than on a Dell server with an Intel Xeon Gold. We therefore use FireSim to validate our hypothesis that gem5's performance is largely determined by the cache sizes of the host it runs on. Statistics such as cache misses, branch mispredictions, and CPU utilization are collected by reading performance counters on these platforms. This documentation describes the steps for running gem5 as a workload on FireSim.

Details

Running gem5 on FireSim

The main idea is to execute gem5 as a workload on FireSim to validate our hypothesis that gem5 is largely impacted by the size of the L1 cache. To do this, the user must prepare the gem5 workload (Sieve of Eratosthenes), prepare the FireSim workload, which in this case is the gem5 simulator itself, and finally launch the FireSim simulation. The general steps are:

Steps to run gem5 on FireSim

Set up the AWS FireSim environment
Build the gem5 binary for RISC-V ISA
Prepare gem5 workload and transfer it to the instance
Create FireSim workload using FireMarshal
Build the target design and modify its parameters

Set up the AWS FireSim environment

We use a z1d.2xlarge FireSim manager instance. See the FireSim documentation for more details: https://docs.fires.im/en/stable/Initial-Setup/index.html

mosh --ssh="ssh -i firesim.pem" username@ip_addr # username is centos; ip_addr is dynamically assigned to the manager instance upon initialization

Build the gem5 binary for RISC-V ISA

Use QEMU to emulate a RISC-V architecture for installing dependencies and building the gem5 binary.
Test the compiled binary on RISC-V hardware. We use a SiFive HiFive Unleashed development board, which natively runs Ubuntu.

Prepare gem5 workload and transfer it to the instance

In this step, compile your binary (we used Sieve of Eratosthenes) for the gem5 target ISA.
Next, transfer your compiled binary to the AWS EC2 F1 instance. We used sftp like this:

sudo sftp -i firesim.pem "username@ip_addr"

put <filename> # this applies to any file

Create FireSim workload using FireMarshal

FireSim requires a .json input file to define the workloads (e.g. gem5) that will run on the target design. FireMarshal is used to manage this process. See the FireMarshal documentation for more details: https://firemarshal.readthedocs.io/en/latest/index.html
This produces the following .json file in the /home/centos/firesim/deploy/workload directory, which defines the gem5 workload as well as its outputs:

{
    "benchmark_name": "gem5-workload",
    "common_simulation_outputs": ["uartlog"],
    "workloads": [
        {
            "name": "gem5-workload-gem5",
            "bootbinary": "../../../target-design/chipyard/software/firemarshal/images/gem5-workload-gem5-bin",
            "rootfs": "../../../target-design/chipyard/software/firemarshal/images/gem5-workload-gem5.img",
            "outputs": ["/root/sim-environment/m5out"]
        }
    ]
}
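
The workload definition above can also be generated and checked programmatically. The following is only a sketch: the helper name make_workload is hypothetical, while the field names and image paths are copied from the file shown above. Round-tripping through the JSON parser confirms that the text is well-formed before FireSim reads it.

```python
import json

# Image directory as it appears in the workload file above.
IMAGES_DIR = "../../../target-design/chipyard/software/firemarshal/images"


def make_workload(name, job, outputs):
    """Build a workload description with the same fields as the file above.

    make_workload is a hypothetical helper, not part of FireSim or FireMarshal.
    """
    full_name = f"{name}-{job}"
    return {
        "benchmark_name": name,
        "common_simulation_outputs": ["uartlog"],
        "workloads": [
            {
                "name": full_name,
                "bootbinary": f"{IMAGES_DIR}/{full_name}-bin",
                "rootfs": f"{IMAGES_DIR}/{full_name}.img",
                "outputs": list(outputs),
            }
        ],
    }


workload = make_workload("gem5-workload", "gem5", ["/root/sim-environment/m5out"])
text = json.dumps(workload, indent=4)

# Parse the generated text again to confirm it is valid JSON.
assert json.loads(text) == workload
print(text)
```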

Build the target design and modify its parameters

To build your target design on FireSim, you can use any of Chipyard's included RTL generators (e.g. Rocket Chip).

Our base configuration is a quad-core Rocket Chip with a 16 KB 2-way set-associative icache and dcache, and a 512 KB L2 cache.
To change the base system configuration, we specify new design parameters in the TargetConfigs.scala file at the following path:

/home/centos/firesim/target-design/chipyard/generators/firechip/src/main/scala/TargetConfigs.scala

An example of creating a target design with 64KB L1I and L1D Caches

We specify a quad-core Rocket Chip with a 64 KB L1 icache and dcache in the TargetConfigs.scala file. The config fragments are applied from the bottom up, so a fragment listed earlier overrides the ones listed after it. Note that the default block size is 64 bytes.

class FireSimGem5ConfigQuadRocketConfig extends Config(
    new freechips.rocketchip.subsystem.WithL1ICacheWays(16) ++  // change rocket I$
    new freechips.rocketchip.subsystem.WithL1ICacheSets(64) ++  // change rocket I$
    new freechips.rocketchip.subsystem.WithL1DCacheWays(16) ++  // change rocket D$
    new freechips.rocketchip.subsystem.WithL1DCacheSets(64) ++  // change rocket D$
    new WithDefaultFireSimBridges ++
    new WithDefaultMemModel ++
    new WithFireSimConfigTweaks ++
    new chipyard.QuadRocketConfig)
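
To see why 16 ways and 64 sets give a 64 KB cache, recall that the capacity of a set-associative cache is sets x ways x block size. The short sketch below works through that arithmetic; the function names are illustrative only, and the 64-byte block size is the default mentioned above.

```python
def cache_size_bytes(sets, ways, block_bytes=64):
    """Capacity of a set-associative cache: sets x ways x block size."""
    return sets * ways * block_bytes


def sets_for(size_bytes, ways, block_bytes=64):
    """Number of sets needed for a given capacity and associativity."""
    sets, remainder = divmod(size_bytes, ways * block_bytes)
    if remainder:
        raise ValueError("capacity is not a multiple of ways * block size")
    return sets


# Values from the config above: 64 sets, 16 ways, 64-byte blocks.
l1_bytes = cache_size_bytes(sets=64, ways=16)
print(l1_bytes // 1024, "KB")  # prints: 64 KB

# A 16 KB, 2-way cache with 64-byte blocks has this many sets.
base_sets = sets_for(16 * 1024, ways=2)
print(base_sets)  # prints: 128
```

The same helpers can be used to pick the Ways and Sets arguments for other target cache sizes before editing TargetConfigs.scala.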

Modify the config_build_recipes.yaml, config_build.yaml, and config_runtime.yaml files by adding the following lines.

config_build_recipes.yaml

# Addition to config_build_recipes.yaml
firesim_rocket_quadcore_gem5_config: # This can be any name specified by the user
    DESIGN: FireSim
    TARGET_CONFIG: DDR3FRFCFSLLC4MB_WithDefaultFireSimBridges_WithFireSimTestChipConfigTweaks_FireSimGem5ConfigQuadRocketConfig
    PLATFORM_CONFIG: WithAutoILA_F140MHz_BaseF1Config
    deploy_triplet: null
    post_build_hook: null
    metasim_customruntimeconfig: null
    bit_builder_recipe: bit-builder-recipes/f1.yaml

config_build.yaml

builds_to_run:
    - firesim_rocket_quadcore_gem5_config  # This name must match the name specified in config_build_recipes.yaml

config_runtime.yaml

run_farm:
    # run farm hosts to spawn: a mapping from a spec below (which is an EC2
    # instance type) to the number of instances of the given type that you
    # want in your runfarm.
    run_farm_hosts_to_use:
    - f1.16xlarge: 0
    - f1.4xlarge: 0
    - f1.2xlarge: 1 # we want to use f1.2xlarge as the runfarm instance
    - m4.16xlarge: 0
    - z1d.3xlarge: 0
    - z1d.6xlarge: 0
    - z1d.12xlarge: 0

target_config:
    topology: no_net_config
    no_net_num_nodes: 1
    link_latency: 6405
    switching_latency: 10
    net_bandwidth: 200
    profile_interval: -1

    # This references a section from config_hwdb.yaml for fpga-accelerated simulation
    # or from config_build_recipes.yaml for metasimulation
    # In homogeneous configurations, use this to set the hardware config deployed
    # for all simulators
    default_hw_config: firesim_rocket_quadcore_gem5_config

workload:
    workload_name: gem5-workload.json

Next, we use the Golden Gate compiler to generate Verilog from the Chisel-generated RTL for the AWS AGFI build process.

To move to the Golden Gate compiler directory, run:

cd /home/centos/firesim/sim/

Run make

make DESIGN=FireSim TARGET_CONFIG=DDR3FRFCFSLLC4MB_WithDefaultFireSimBridges_WithFireSimTestChipConfigTweaks_FireSimGem5ConfigQuadRocketConfig PLATFORM_CONFIG=WithAutoILA_F140MHz_BaseF1Config f1
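
Each make variable must be a single shell token: TARGET_CONFIG is a list of names joined by underscores, and any whitespace inside it would cause the shell to pass the remainder to make as a separate argument. The sketch below assembles the invocation from the recipe values and rejects values containing whitespace. The helper build_make_argv is hypothetical; the values are the ones used in this guide.

```python
import shlex


def build_make_argv(design, target_config, platform_config, target="f1"):
    """Assemble the make invocation, rejecting values that contain whitespace.

    build_make_argv is a hypothetical helper, not part of FireSim.
    """
    variables = {
        "DESIGN": design,
        "TARGET_CONFIG": target_config,
        "PLATFORM_CONFIG": platform_config,
    }
    for key, value in variables.items():
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{key} must be a single non-empty token: {value!r}")
    return ["make"] + [f"{k}={v}" for k, v in variables.items()] + [target]


# TARGET_CONFIG is the underscore-joined list of names from the build recipe.
target_config = "_".join([
    "DDR3FRFCFSLLC4MB",
    "WithDefaultFireSimBridges",
    "WithFireSimTestChipConfigTweaks",
    "FireSimGem5ConfigQuadRocketConfig",
])

argv = build_make_argv("FireSim", target_config, "WithAutoILA_F140MHz_BaseF1Config")
print(shlex.join(argv))
```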

Build the AWS FPGA Image by executing:

firesim buildbitstream

After a successful build, update config_hwdb.yaml with the AGFI info.

firesim_rocket_quadcore_gem5_config: # Add your AGFI info to config_hwdb.yaml, so they can be deployed during simulation
    agfi: agfi-06e876ba9378cc9ff
    deploy_triplet_override: null
    custom_runtime_config: null
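
The same configuration name has to appear in all four files (config_build_recipes.yaml, config_build.yaml, config_runtime.yaml, and config_hwdb.yaml), and a mismatch is easy to miss. The sketch below illustrates the consistency check using plain dictionaries that mirror the relevant parts of each file; in practice the dictionaries would come from a YAML parser, which is assumed here and not shown. The function check_recipe_name is hypothetical.

```python
def check_recipe_name(name, recipes, builds_to_run, runtime, hwdb):
    """Return a list of problems; an empty list means the name is consistent."""
    problems = []
    if name not in recipes:
        problems.append("missing from config_build_recipes.yaml")
    if name not in builds_to_run:
        problems.append("missing from builds_to_run in config_build.yaml")
    if runtime.get("target_config", {}).get("default_hw_config") != name:
        problems.append("default_hw_config in config_runtime.yaml differs")
    if not hwdb.get(name, {}).get("agfi"):
        problems.append("no agfi entry in config_hwdb.yaml")
    return problems


# Dictionaries mirroring the entries shown in this guide.
name = "firesim_rocket_quadcore_gem5_config"
recipes = {name: {"DESIGN": "FireSim"}}
builds_to_run = [name]
runtime = {"target_config": {"default_hw_config": name}}
hwdb = {name: {"agfi": "agfi-06e876ba9378cc9ff"}}

problems = check_recipe_name(name, recipes, builds_to_run, runtime, hwdb)
print(problems)  # prints: []
```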

Then, launch the run farm instance, set up the simulation infrastructure, and run your FireSim simulation.

firesim launchrunfarm; firesim infrasetup; firesim runworkload
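
Note that commands separated by ; keep running even if an earlier one fails. As an alternative, the sketch below runs the same three firesim subcommands in order and stops at the first failure. The wrapper run_steps is hypothetical; the runner argument exists only so the logic can be exercised without a FireSim installation.

```python
import subprocess

# The three firesim subcommands from the command line above, in order.
STEPS = ("launchrunfarm", "infrasetup", "runworkload")


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each `firesim <step>` in order and stop at the first failure."""
    completed = []
    for step in steps:
        result = runner(["firesim", step])
        if result.returncode != 0:
            raise RuntimeError(
                f"firesim {step} failed with exit code {result.returncode}"
            )
        completed.append(step)
    return completed


# On the manager instance, call run_steps() to execute the three steps.
```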

Finally, results can be collected from the following directory.

cd /home/centos/firesim/results-workload/
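
The workload definition lists uartlog as a common simulation output, so a quick way to locate results is to search the results directory for files with that name. The sketch below does this with a recursive search; the helper find_outputs is hypothetical, and the exact subdirectory layout under the results directory is not assumed.

```python
from pathlib import Path


def find_outputs(results_dir, filename="uartlog"):
    """Return all files named `filename` under results_dir, sorted by path."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(filename) if p.is_file())


# Results directory used in this guide; prints nothing if it does not exist.
for path in find_outputs("/home/centos/firesim/results-workload"):
    print(path)
```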

Publications

Johnson Umeike, Neel Patel, Alex Manley, Amin Mamandipoor, Heechul Yun, Mohammad Alian, “Profiling gem5 Simulator,” ISPASS 2023 [paper] [slides]

FireSim and Chipyard User and Developer Workshop at ASPLOS 2023 [website]
Title: Profiling an Architectural Simulator (Using Firesim to Profile gem5) [presentation]

Personnel

Johnson Umeike (Lead Author Student)
Neel Patel (Co-Author Student)
Alex Manley (Co-Author Student)
Amin Mamandipoor (Co-Author Student)
Heechul Yun (KU Collaborator)
Mohammad Alian (Principal Investigator)