Let's get started
Overview of the workflow
```mermaid
graph TD
    B[Boot] --> L{Load};
    L --> I[Initialization];
    I --> F[Functional Warming];
    F --> P[Partitioning];
    P --> RP[Run Partitions];
    RP --> R[Results];
    R --> C{Check Results};
    C -->|Number of Samples Is Enough| Fi[Finished];
    C -->|Number of Samples Is Not Enough| U[Update Sample Number];
    U --> CleanPartition[Clean Up Partitions, Removes Old Files];
    CleanPartition --> F;
```
By the end of this tutorial, you will have run through the entire workflow of statistical sampling. You will also learn why each step is necessary.
0) Prerequisites
Before you begin
Make sure your environment is ready in one of the following ways:
- All requirements installed locally, or
- The Docker environment started with the `dep` helper (see Installation and Docker):

```shell
./dep start-docker --worm --mounting-folder <MOUNT_DIR>
```
Create the base image
Before booting, create a base image in the same directory you use for `--image-folder` and `--working-directory` in your shared args file.
Command
- Set `<YOUR_FOLDER>` to the path you use in `qflex.args`: `--image-folder <YOUR_FOLDER>`
- By default, the image name aligns with your args (e.g., `--image-name root.qcow2` if you kept the template).
See all options
Explore additional flags for sizing, format, or naming:
```shell
./qflex create-base-image --help
```
Keep `--image-folder` consistent with your later steps so snapshots and artifacts stay together.
Create a shared args file (and the run pattern)
To avoid repeating long flag lists, keep common options in a single file and pass them to `qflex` with `xargs`.
Create `./qflex.args` (a.k.a. `qflex.common.args`) with one argument per line.
Template (replace `<MOUNT_DIR>`):

```
--core-count 8
--double-cores
--quantum-size-ns 2000
--llc-size-per-tile-mb 2
--parallel
--network none
--memory-gb 32
--host-name ZEN3
--workload-name web-search
--population-seconds 5
--no-consolidated
--primary-ipc 2.0
--primary-core-start 0
--phantom-cpu-ipc 4
--experiment-name experiment_name_001
--image-name root.qcow2
--image-folder <MOUNT_DIR>/<IMAGE_DIR>
--mounting-folder <MOUNT_DIR>
--check-period-quantum-coeff 53.0
--no-unique
--use-image-directly
```
You can pass these flags directly on the command line, but keeping them in `qflex.args` avoids duplication and reduces mistakes.
Run pattern using xargs
Use `xargs` to feed the file's flags to `./qflex`. Add any run-specific flags after the command:
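The pattern, then, is `xargs -a ./qflex.args -- ./qflex <action> [run-specific flags]`. A minimal runnable sketch of how the expansion works, using `echo` as a stand-in for `./qflex` and a throwaway `demo.args` file (both are illustrations, not part of the real workflow):

```shell
# Write two flags to a throwaway args file (stand-in for qflex.args).
printf '%s\n' '--core-count 8' '--parallel' > demo.args

# xargs reads the flags from the file and appends them to the command;
# `echo` replaces ./qflex so the final command line is visible.
xargs -a demo.args -- echo ./qflex boot
# prints: ./qflex boot --core-count 8 --parallel
```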
1) Boot to bring up the OS and install requirements
Use the boot action to start the OS with your common arguments. Once the VM is up, install your workload's dependencies inside the QEMU image (e.g., package-manager installs, copying configs, etc.).
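Following the `xargs` run pattern above, the boot step looks like this (assuming your qflex build names the action `boot`, as this section suggests):

```shell
# Boot the VM with the flags from the shared args file.
xargs -a ./qflex.args -- ./qflex boot
```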
- This brings up the VM using the CPU/memory settings defined in `qflex.args`.
- With `--image-folder` and `--working-directory` pointing to your mount, you can place existing QEMU images in `<MOUNT_DIR>`, access them from within the Docker container, and skip this step.
Save a snapshot of the image via the QEMU monitor
Once the VM is in the desired state:
- Open the QEMU monitor: press Ctrl+A, pause briefly, then press C. You should see a prompt like `(qemu)`.
- Save a snapshot:

```
(qemu) savevm boot
```
Snapshot name
You can change the snapshot name, but `boot` is the name used in later steps of this tutorial.
What you have now
You’ve created a reusable image snapshot that has your workload dependencies installed and is ready for the next steps.
2) Load
Load restores the prepared VM state you saved after boot. From this state you can start the workload, and each core will have a clock.
Restore the snapshot and start the workload
Use the snapshot saved in the previous step (created from the QEMU monitor with `savevm boot`) and load it:
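A sketch of the invocation, assuming the action is named `load` after this section (an assumption; adjust to your qflex version). The snapshot name `boot` comes from the Boot step:

```shell
# Restore the `boot` snapshot saved at the end of the Boot step.
xargs -a ./qflex.args -- ./qflex load --loadvm-name boot
```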
Afterwards, you can run the commands that start your workload inside QEMU.
- The value for `--loadvm-name` should match the snapshot you saved in the Boot step (`boot`, unless you changed it).
Once your workload has started, you can save a snapshot by bringing up the QEMU monitor again:
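For example, naming the snapshot `loaded` so it matches the `--loadvm-name loaded` used in the Init Warm step:

```
(qemu) savevm loaded
```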
Why “Load” is needed
Workloads often have multiple components that must respect ordering and timing as they start. For example, a web server must start after the database is up and running, and if components run on different cores, they need to be started within a certain time window of each other. The Load step allows you to script and orchestrate this process.
3) Init Warm
Init Warm initializes long-term microarchitectural state (such as caches, branch predictors, and TLBs) so that functional warming can start.
Run init warm
Load from the prior loaded snapshot and initialize the microarchitectural state:
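A hypothetical invocation: the action name `init-warm` is an assumption (check your qflex version), while `--loadvm-name loaded` matches the snapshot from the Load step:

```shell
# Start from the `loaded` snapshot and initialize long-term state.
xargs -a ./qflex.args -- ./qflex init-warm --loadvm-name loaded
```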
- `--loadvm-name loaded` should match the snapshot produced in the Load step.
Initialization notes
This step ensures that all long-term state has been initialized, so make sure the workload is running first: without a workload exercising the resources, this step will take a long time to finish.
Snapshot created automatically
Upon successful initialization, a QEMU snapshot named init_warmed is created on the image.
This snapshot captures the initialized microarchitectural state so subsequent stages can start from a consistent baseline.
What you have now
A VM snapshot (init_warmed) with warmed long-term microarchitectural state, ready for the next step.
This step and all steps before it need to be done only once per workload, unless the workload configuration changes.
4) Functional Warming
Functional warming runs the VM forward for the configured population length (in seconds) and emits checkpoints that will be used later for timing (detailed) simulation.
Run functional warming
Start from the init_warmed snapshot created in the previous step:
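A sketch, assuming the action is named `functional-warm` (an assumption; adjust to your installation). The `--loadvm-name init_warmed` and `--sample-size 30` values come from this section:

```shell
# Run forward for the configured population length, emitting 30 checkpoints.
xargs -a ./qflex.args -- ./qflex functional-warm --loadvm-name init_warmed --sample-size 30
```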
- The warm length is controlled by your common args (e.g., `--population-seconds 0.00001`).
- Output includes checkpoints that later stages can load to run short timing simulations.
- `--sample-size 30` controls how many checkpoints are created. You can adjust this number based on your needs and the results of the timing simulation in later steps.
5) Partitioning
After the checkpoints are created by functional warming, you can split them into partitions to be run in parallel during detailed simulation (using Flexus).
You can adjust `--partition-count` to fit your needs.
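A sketch, assuming the action is named `partition` (hypothetical; adjust to your qflex version). The `--partition-count 16` value matches the folder listing below:

```shell
# Split the checkpoints into 16 partitions for parallel timing runs.
xargs -a ./qflex.args -- ./qflex partition --partition-count 16
```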
- This creates folders named `partition_0`, `partition_1`, ..., `partition_15` in the run folder of the experiments folder (for `--partition-count 16`).
6) Running Partitions
After creating the partitions, the following command manages running all of them in timing simulation using Flexus:
```shell
xargs -a ./qflex.args -- ./qflex run-partition --warming-ratio 2 --measurement-ratio 1
```
You can tune the relative lengths of the warming and measurement phases with `--warming-ratio` and `--measurement-ratio`.
7) Collecting Results
After the detailed simulation is done, you can collect the results and decide whether to reiterate from the functional warming step, since that step uses an estimated IPC. The result command produces more accurate IPC estimates, and with a more accurate IPC you can re-iterate.
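The text above calls this the result command; assuming the action name matches (an assumption, as with the other actions):

```shell
# Aggregate per-partition results and compute updated IPC estimates.
xargs -a ./qflex.args -- ./qflex result
```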
This automatically creates a new `core_info.csv` file in the experiment folder with the updated IPC values, ready for you to re-iterate the experiment.
8) Re-iterating
If you decide to re-iterate, you should first remove the partition folders.
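The workflow diagram labels this step "Clean Up Partitions"; a hypothetical invocation, assuming a matching `clean-partition` action exists in your qflex version:

```shell
# Remove the partition_* folders and their old files before re-warming.
xargs -a ./qflex.args -- ./qflex clean-partition
```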
The IPC is already updated in the `cfg/core_info.csv` file; you just need to update your sample size, and you can re-iterate from the functional warming step.
Warning
This will remove all partition folders and their contents. Make sure you have collected any necessary data before running this command.
Debugging timing simulation
The previous command runs all the partitions in parallel, so you will not see the output of each partition. To debug a specific partition, you can run it directly using the following command:
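A hypothetical sketch only: both the reuse of `run-partition` for a single partition and the `--partition-name` flag are assumptions here, not documented options:

```shell
# Run one partition in the foreground so its output is visible.
xargs -a ./qflex.args -- ./qflex run-partition --partition-name partition_0
```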
You can change `partition_0` to the partition you want to debug.