Multi-threaded support in the TCG

This is a work in progress. The most tested combination is an ARMv7 guest running on an x86 backend; however, the general patches work for all architectures, depending on what the test case is doing. For full support, however, each front end (guest) and back end (TCG host) needs to be converted to provide solutions for:

Atomic Instructions
Memory Coherence (honouring barriers)

The intention is to support all combinations where they make sense. See the bottom of the page for links, recent discussions and code.

Overview

QEMU can currently emulate a number of CPUs at once, but it does so in a single thread, executing them in turn. Given that many modern hosts are multi-core, and many targets equally use multiple cores, a significant performance advantage can be realised by using multiple host threads to emulate multiple target cores.
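To illustrate the difference, here is a minimal C sketch. It contrasts the current model, where one host thread round-robins over every vCPU, with a model that uses one host thread per vCPU. `VCpu`, `vcpu_execute_block()` and the other names are hypothetical stand-ins, not QEMU's actual structures. The "execution" is only a counter, so that the threading structure stays visible. The sketch assumes POSIX threads.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_VCPUS 64

/* Hypothetical, heavily simplified vCPU: a real one holds guest registers,
 * a TLB, translated code and so on. */
typedef struct {
    int index;
    unsigned long blocks_executed;  /* stand-in for emulated guest work */
} VCpu;

/* Hypothetical stand-in for "translate and run one block of guest code". */
static void vcpu_execute_block(VCpu *cpu)
{
    cpu->blocks_executed++;
}

/* Single-threaded model: one host thread interleaves all vCPUs, so at most
 * one host core does emulation work at any time. */
static void run_round_robin(VCpu *cpus, int ncpus, int slices)
{
    for (int s = 0; s < slices; s++) {
        for (int i = 0; i < ncpus; i++) {
            vcpu_execute_block(&cpus[i]);
        }
    }
}

typedef struct {
    VCpu *cpu;
    int slices;
} VCpuThreadArg;

static void *vcpu_thread_fn(void *opaque)
{
    VCpuThreadArg *arg = opaque;

    for (int s = 0; s < arg->slices; s++) {
        vcpu_execute_block(arg->cpu);
    }
    return NULL;
}

/* Multi-threaded model: one host thread per vCPU, so the host scheduler can
 * run several vCPUs on different cores at once. Each thread only touches
 * its own VCpu, so this toy needs no locking.
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_thread_per_vcpu(VCpu *cpus, int ncpus, int slices)
{
    pthread_t threads[MAX_VCPUS];
    VCpuThreadArg args[MAX_VCPUS];
    int created = 0;

    assert(ncpus <= MAX_VCPUS);
    for (; created < ncpus; created++) {
        args[created].cpu = &cpus[created];
        args[created].slices = slices;
        if (pthread_create(&threads[created], NULL, vcpu_thread_fn,
                           &args[created]) != 0) {
            break;  /* join whatever was started, then report failure */
        }
    }
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
    }
    return created == ncpus ? 0 : -1;
}
```

The toy deliberately avoids the hard part. Real vCPUs share guest memory and emulator state such as translated code and TLBs, and making that sharing safe and correctly ordered is what the rest of this page is about.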

A talk at KVM Forum 2015 (video, slides) serves as a useful primer. General thread safety for system-emulation TCG builds on the work already done for linux-user emulation. Indeed, some of this work has already been merged and is benefiting the linux-user code. The main focus is on whole-system emulation.

The most recent design document was posted to the list in June 2016. The current work in progress can be found in Alex's Git tree.

Already Merged Work

Atomic patching of TranslationBlocks
Re-factoring of main cpu_exec loop
QHT based lookups of next TB
Initial memory consistency support (GSoC 2016)

Lockless hot-path in cpu_exec (built on QHT)
cpu-exec: Safe work in quiescent state (gives thread-safe tb_flush)

Ready to Merge

cmpxchg-based atomics

Plan and problems to solve

There are three main groups of problems, plus the additional work of enabling the various front ends and back ends.

General Thread Safety

These are covered by the current "Base enabling patches for MTTCG" series (v3, WIP branch). This architecture-independent patch series allows you to run multi-threaded test programs as long as they make no assumptions about:

Atomicity
Memory consistency
Cache flush behaviour (v4 should fix cputlb)

In practice this means dedicated test programs; see Alex's kvm-unit-tests.

Memory consistency

Host and guest might implement different memory consistency models. Supporting a weakly ordered guest on a strongly ordered back end isn't a problem. Supporting a strongly ordered guest on a weakly ordered back end is going to be hard, because the emulator must itself supply the ordering that the host does not provide.

Remaining case: strong on weak, e.g. emulating the x86 memory model on ARM systems
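One way to see why the two directions differ is to treat each memory model as the set of orderings it guarantees between plain loads and stores. We can then compute what an emulator would have to add with explicit host barriers. The flag names, macros and `orderings_to_add()` below are hypothetical, and the models are deliberately simplified. x86 (TSO) is taken to allow only a later load to pass an earlier store. ARM is taken to guarantee no ordering between independent accesses to different addresses.

```c
#include <assert.h>

/* Hypothetical ordering flags: MO_X_Y means "an earlier access of kind X
 * must become visible before a later access of kind Y". */
enum {
    MO_LD_LD = 1 << 0,
    MO_LD_ST = 1 << 1,
    MO_ST_LD = 1 << 2,
    MO_ST_ST = 1 << 3,
    MO_ALL   = MO_LD_LD | MO_LD_ST | MO_ST_LD | MO_ST_ST,
};

/* Simplified models: orderings each architecture gives plain loads and
 * stores without any barrier instruction. */
#define X86_IMPLICIT (MO_LD_LD | MO_LD_ST | MO_ST_ST)  /* all but ST->LD */
#define ARM_IMPLICIT 0u                                /* none */

/* Orderings that `required` asks for but the host does not already give,
 * i.e. what the emulator must add with host barriers.
 *  - With required = the guest's implicit model, the result must be
 *    enforced around every guest load and store.
 *  - With required = the mask of an explicit guest barrier, a result of 0
 *    means the barrier can be dropped on this host. */
static unsigned orderings_to_add(unsigned required, unsigned host)
{
    assert((required & ~MO_ALL) == 0 && (host & ~MO_ALL) == 0);
    return required & ~host;
}
```

For an ARM guest on an x86 host, plain accesses need nothing extra. A full guest barrier (`MO_ALL`) only has to supply the store-to-load ordering. For an x86 guest on an ARM host, the result is every ordering x86 promises, and it applies to every guest access rather than only to barrier instructions. That is why this case is expected to be costly.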

Instruction atomicity

There are a number of approaches being discussed on the list at the moment:

cmpxchg-based emulation of atomics

This work by Emilio Cota and Richard Henderson adds a number of atomic primitives that can be used in TCG code to emulate atomic instructions and paired load-link/store-conditional instructions.
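The general idea behind emulating a load-link/store-conditional pair with a host compare-and-swap can be sketched in portable C11 atomics. This illustrates the general technique under stated assumptions; it is not the code from the series. `VCpuExcl`, `emu_load_link()`, `emu_store_cond()` and `guest_atomic_inc()` are hypothetical names, and guest memory is modelled as a single `_Atomic uint32_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU exclusive state recorded by the emulated LL. */
typedef struct {
    _Atomic uint32_t *excl_addr;  /* address of the last load-link, or NULL */
    uint32_t excl_val;            /* value that load-link observed */
} VCpuExcl;

/* Emulated load-link: a plain atomic load that remembers what it saw. */
static uint32_t emu_load_link(VCpuExcl *cpu, _Atomic uint32_t *addr)
{
    assert(addr != NULL);
    uint32_t val = atomic_load(addr);
    cpu->excl_addr = addr;
    cpu->excl_val = val;
    return val;
}

/* Emulated store-conditional: store newval only if memory still holds the
 * value seen by the matching load-link. The check and the store happen in
 * one host compare-and-swap, so another vCPU cannot slip in between.
 * Returns true on success. */
static bool emu_store_cond(VCpuExcl *cpu, _Atomic uint32_t *addr,
                           uint32_t newval)
{
    bool ok = false;

    if (cpu->excl_addr == addr) {
        uint32_t expected = cpu->excl_val;
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    cpu->excl_addr = NULL;  /* the reservation is consumed either way */
    return ok;
}

/* What a guest LL/SC retry loop for an atomic increment turns into. */
static void guest_atomic_inc(VCpuExcl *cpu, _Atomic uint32_t *addr)
{
    uint32_t val;

    do {
        val = emu_load_link(cpu, addr);
    } while (!emu_store_cond(cpu, addr, val + 1));
}
```

The trade-off is that a compare-and-swap only checks the value, not whether the location was written. Suppose another vCPU changes the word and then changes it back between the load-link and the store-conditional (the ABA case). This emulated store-conditional then succeeds where real hardware would fail. Guest code that only uses LL/SC for counters and compare-and-swap style updates is unaffected, but it is a known difference from real hardware.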

Slow path for atomic instruction emulation

This work by Alvise Rigo tweaks the SoftMMU emulation to trigger a slow path in contended cases.

Front-end and Back-end conversions

Each front end will need to be converted to use MTTCG-aware atomics and to instrument its barrier instructions.

Each back end will need to support generating the new TCGOps required by the front ends.

How to get involved

Right now, a small dedicated team is working on this:

Alex Bennée (Review, testing, base enabling tree)
Fred Konrad (Original core MTTCG patch set)
Alvise Rigo (LL/SC work)
Emilio Cota (QHT, cmpxchg atomics)
Mark Burton
Pavel Dovgalyuk

Mailing List

If you would like to be involved, please use the mailing list: mttcg@listserver.greensocs.com

You can subscribe here:

       http://listserver.greensocs.com/wws/info/mttcg

If you send to this mailing list, please make sure to copy qemu-devel as well.

There is a fortnightly phone conference, with summary notes posted to the mailing lists (archives).

Current Code

Remember that these trees are WORK-IN-PROGRESS and could be broken at any point. Branches may be rebased without notice.

MTTCG Work:

Latest Tree: https://github.com/stsquad/qemu (branch:mttcg/enable-mttcg-for-armv7-v1)
Fred's Tree: http://git.greensocs.com/fkonrad/mttcg.git (branch:multi_tcg_v8)

LL/SC Work:

Alvise's Tree: https://git.virtualopensystems.com/dev/qemu-mt.git (branch:slowpath-for-atomic-v8-no-mttcg)

MTTCG Test Cases:

These are tests specifically designed to exercise the code, based on kvm-unit-tests:

https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v5

Other Work

Initially this is the most important section, and we welcome any and all comments and related work. If you know of any patch sets that may be of value, PLEASE let us know via the qemu-devel mailing list.

Proof of concept implementations

Below are all the proof-of-concept implementations we have found so far. Most of them seem to have bitrotted.

HQEMU
- http://dl.acm.org/citation.cfm?id=2259030

PQEMU
- https://github.com/podinx/PQEMU
- http://www.cs.nthu.edu.tw/~ychung/conference/ICPADS2011.pdf

COREMU