Outreachy 2017 May-August

From QEMU

Introduction

QEMU is participating in Outreachy 2017 May-August. This page contains our ideas list and information for applicants and mentors.

How to apply

  1. Read the Outreachy website first
  2. Choose a project idea from the list below. You should have the necessary technical skills for the idea you have chosen.
  3. Contact the mentor for your project idea to introduce yourself and discuss how you want to tackle the project.
  4. Choose a small task from the BiteSizedTasks page and submit a patch contribution. See Contribute/SubmitAPatch for guidelines on patch submission.

Find Us

  • IRC: #qemu-outreachy on irc.oftc.net

For general questions about QEMU in Outreachy, please contact the following people:

Project Ideas

This is the listing of suggested project ideas.

QEMU audio backend

Summary: Rework QEMU audio backend

The audio backend facilitates audio playback and capture using host audio APIs (CoreAudio, PulseAudio, DirectSound, etc). It is used by emulated soundcards and may need to convert between the audio format supported by the emulated soundcard and the format supported by the physical soundcard. This area of the codebase has been stable for a long time but is now due for some significant improvements.

The goal of this summer project is to improve the audio/ backend. The preliminary task is to rebase and merge (some or all) of the GSOC "audio 5.1 patches 00/51" series which modernizes the audio backend codebase.

Then, add a generic GStreamer audio backend. GStreamer is an open source multimedia framework that is cross-platform and already supports a lot of the functionality that is implemented in QEMU's audio backend.

Finally, try to replace as much of audio/ as possible with custom GStreamer pipelines. This would be a major simplification that reduces the code size significantly, making QEMU's audio backend smaller and easier to maintain.

Links:

Details:

  • Skill level: intermediate or advanced
  • Language: C
  • Mentors: marcandre.lureau@redhat.com, kraxel@redhat.com
  • Contact: past gsoc student "Kővágó Zoltán" <dirty.ice.hu@gmail.com>
  • Suggested by: marcandre.lureau@redhat.com

Disk Backup Tool

Summary: Write a tool that performs both full and incremental disk backups

QEMU has added command primitives that can be combined to perform both full and incremental disk backups while the virtual machine is running. A full backup copies the entire contents of the disk. An incremental backup copies only regions that have been modified. Orchestrating a multi-disk backup to local or remote storage is non-trivial and there is no example code showing how to do it from start to finish.

It would be helpful to have a "reference implementation" that performs backups using QEMU's QMP commands. Backup software and management stack developers wishing to add QEMU backup support could look at this tool's code as an example. Users who run QEMU directly could use this tool as their backup software.

You need to be able to read C since that's what most of QEMU is written in. This project will expose you to backup and data recovery, as well as developing command-line tools in Python.

See the links to familiarize yourself with disk image files, backups, and snapshots.
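To make the QMP side concrete, here is a minimal sketch of the commands such a tool would issue over the QMP monitor. The device name, bitmap name, and target file names are placeholder assumptions; drive-backup and block-dirty-bitmap-add are the real QMP commands involved, but a production tool must also handle job completion events and errors.

```python
import json

def full_backup_cmd(device, target):
    """QMP command for a full backup: copy the entire disk contents."""
    return {
        "execute": "drive-backup",
        "arguments": {
            "device": device,
            "sync": "full",
            "target": target,
            "format": "qcow2",
        },
    }

def add_bitmap_cmd(device, name):
    """Create a dirty bitmap that tracks writes made after the full backup."""
    return {
        "execute": "block-dirty-bitmap-add",
        "arguments": {"node": device, "name": name},
    }

def incremental_backup_cmd(device, bitmap, target):
    """QMP command for an incremental backup: copy only regions the bitmap marks dirty."""
    return {
        "execute": "drive-backup",
        "arguments": {
            "device": device,
            "sync": "incremental",
            "bitmap": bitmap,
            "target": target,
            "format": "qcow2",
        },
    }

if __name__ == "__main__":
    # QMP is JSON over a socket; a real tool would send these lines to the monitor.
    print(json.dumps(full_backup_cmd("drive0", "full.qcow2")))
    print(json.dumps(add_bitmap_cmd("drive0", "bitmap0")))
    print(json.dumps(incremental_backup_cmd("drive0", "bitmap0", "inc1.qcow2")))
```

The full backup plus the bitmap is taken once; each subsequent incremental backup reuses the bitmap, which QEMU clears on success.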

Links:

Details:

  • Skill level: intermediate
  • Language: Python
  • Mentors: John Snow <jsnow@redhat.com> (jsnow on IRC), Stefan Hajnoczi <stefanha@redhat.com> (stefanha on IRC)

Moving I/O throttling and write notifiers into block filter drivers

Summary: Refactor the block layer so that I/O throttling and write notifiers are implemented as block filter drivers instead of being hardcoded into the core code

QEMU's block layer handles I/O to disk image files and now supports flexible configuration through a "BlockDriverState graph". Block drivers can be inserted or removed from the graph to modify how I/O requests are processed.

Block drivers implement read and write functions (among other things). Typically they access a file or network storage but some block drivers perform other jobs like data encryption. These block drivers are called "filter" drivers because they process I/O requests but ultimately forward requests to the file format and protocol drivers in the leaf nodes of the graph.

I/O throttling (rate-limiting the guest's disk I/O) and write notifiers (used to implement backup) are currently hardcoded into the block layer's core code. The goal of this project is to extract this functionality into filter drivers that are inserted into the graph only when a feature is needed. This makes the block layer more modular and reuses the block driver abstraction that is already present.

This project will expose you to QEMU's block layer. It requires refactoring existing code for which there is already some test coverage to aid you.

Links:

Details:

  • Skill level: intermediate
  • Language: C
  • Mentors: Kevin Wolf <kwolf@redhat.com> (kwolf on IRC), Stefan Hajnoczi <stefanha@redhat.com> (stefanha on IRC), Alberto Garcia (berto on IRC)

PCI Express to PCI bridge

Summary: Code an emulated PCIe-to-PCI bridge for QEMU PCI Express machines

Modern virtual machines and their devices use PCI Express; however, a means of supporting existing PCI and PCI-X deployments is still required. Some use cases need legacy PCI devices that were designed to plug into platforms with PCI and PCI-X system slots.

QEMU already has a solution, the i82801b11 DMI-to-PCI Bridge Emulation. However, the device has some disadvantages: it cannot be used by ARM guests and it is part of the Root Complex, so it can't be hot-plugged.

The goal of this summer project is to code a generic PCIe-PCI bridge. The bridge should be hot-pluggable into PCI Express Root Ports and be usable across various architectures and Guest Operating Systems.

Once the bridge is merged upstream, the PCI/PCI Express infrastructure will be ported to the QOM model to conform with QEMU standards, time permitting.

Links:

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: marcel@redhat.com, marcel_a on IRC
  • Suggested by: Marcel Apfelbaum <marcel@redhat.com>

Add a Hypervisor.framework accelerator

Summary: Add x86 virtualization support on macOS using Hypervisor.framework

QEMU does not yet take advantage of Hypervisor.framework, the API for hypervisors on macOS. Currently one must use the slower TCG just-in-time compiler or the Intel HAXM accelerator module, which relies on a third-party driver.

Hypervisor.framework was added to macOS in Yosemite (10.10). It exposes the Intel VMX CPU feature for running guest code safely at native speed. The main difference from the KVM and HAXM APIs is that the Hypervisor.framework user must implement instruction emulation to handle instructions that vmexit due to I/O accesses. Most of the code will be related to this emulator.

QEMU would be able to run x86 virtual machines with much better performance and without relying on third-party drivers thanks to Hypervisor.framework. This will make QEMU more useful on macOS and encourage more contributions from developers on that platform.

This project is an advanced project. You should be familiar with the concept of an emulator. Luckily there is the Linux KVM code as well as other code that implements VMX or Hypervisor.framework to use for inspiration. You will learn about writing the most core part of a hypervisor.

There is an existing QEMU-based Hypervisor.framework implementation in Veertu's hypervisor. This can serve as a reference and one way to approach the project is to take that code and get it merged into QEMU after necessary changes have been made.

Links:

Details:

  • Skill level: advanced
  • Language: C
  • Mentor: Alexander Graf <agraf@suse.de>

Vhost-pci based inter-VM communication extension

Summary: extend the current vhost-pci based inter-VM communication

Existing vhost-pci supports dynamic setup (i.e. the vhost-pci-net device is created and hot-plugged to the VM based on runtime requests) of an asymmetric inter-VM communication channel (i.e. communication between vhost-pci-net and virtio-net). The channel is built by sharing a VM’s entire memory with another VM. This gives rise to good inter-VM communication performance and it is useful for use cases (e.g. Network Function Virtualization) where security is not an important factor.

In the extension work, we enable static setup (i.e. create vhost-pci-net via QEMU booting command line) of a symmetrical inter-VM communication channel (i.e. vhost-pci-net to vhost-pci-net communication). As opposed to sharing the entire VM’s memory, the two VMs share a piece of intermediate memory to transmit network packets.

Links:

Details:

  • Skill level: advanced
  • Language: C
  • Mentors: Wei Wang <wei.w.wang@intel.com>, Yuanhan Liu <yuanhan.liu@intel.com>
  • Suggested by: Marc-André Lureau <marcandre.lureau@redhat.com>

Vulkan-ize virgl

Summary: accelerated rendering of Vulkan APIs

virgl enables accelerated 3d rendering in a VM. It uses Desktop GL on the host, and provides OpenGL/GLES in the guest.

This project aims to implement Vulkan accelerated rendering. There are multiple ways of interpreting this idea. One interesting approach would be to support Vulkan in the VM on a Vulkan-capable host, with more passthrough to the host driver.

Links:

Details:

  • Skill level: advanced
  • Language: C
  • Mentors: airlied@redhat.com, marcandre.lureau@redhat.com
  • Suggested by: marcandre.lureau@redhat.com

virgl Windows driver

Summary: accelerated rendering of Windows guests

virgl enables accelerated 3d rendering in a VM. It currently only supports Linux guests.

A working prototype of a virtio-gpu DOD (display-only driver) already exists. The goal of this project is to enable 3d rendering, first by working on an OpenGL Installable Client Driver, and then on DirectX support (it could be worth investigating the 'nine' Mesa state tracker for this).

Links:

Details:

  • Skill level: advanced
  • Language: C
  • Mentors: airlied@redhat.com, vrozenfe@redhat.com
  • Suggested by: marcandre.lureau@redhat.com

virgl on Windows host

Summary: make virgl rendering work on Windows host

virgl enables accelerated 3d rendering in a VM. It requires Desktop GL on the host.

In theory, virgl should work on Windows with a capable host driver. This project aims at making virgl work well with various GPUs on Windows. Since many Windows OpenGL drivers have bad behaviours, it would be worthwhile to support ANGLE/OpenGL ES instead. This would require various modifications to the virgl library. Additionally, it would be a good opportunity to ease the cross-compilation and packaging of qemu/virgl with msitools.

Links:

Details:

  • Skill level: intermediate or advanced
  • Language: C
  • Mentors: marcandre.lureau@redhat.com, airlied@redhat.com
  • Suggested by: marcandre.lureau@redhat.com

MTTCG Performance Enhancements

Summary: The MTTCG project converted the TCG engine from single-threaded to multi-threaded execution, to take advantage of all cores on a modern processor. With this conversion, several performance bottlenecks were identified when running strongly ordered guests like x86 on weakly ordered hosts like ARM64. The first part of the project will be to quantify the identified bottlenecks for TCG performance. Based on this data, you will prioritize one of the following sub-tasks.

  • Measure performance bottlenecks experimentally
      - Reasons for code flushes in the current code execution
      - Re-translation overhead for commonly used translation blocks
      - Consistency overhead caused by generating fence instructions for all loads/stores
  • Place TranslationBlock structures into the same memory block as code_gen_buffer

Consider what happens within every TB:

(1) We have one or more references to the TB address, via exit_tb.

For aarch64, this will normally require 2-4 insns.

 # alpha-softmmu
 0x7f75152114:  d0ffb320      adrp x0, #-0x99a000 (addr 0x7f747b8000)
 0x7f75152118:  91004c00      add x0, x0, #0x13 (19)
 0x7f7515211c:  17ffffc3      b #-0xf4 (addr 0x7f75152028)
 # alpha-linux-user
 0x00569500:  d2800260      mov x0, #0x13
 0x00569504:  f2b59820      movk x0, #0xacc1, lsl #16
 0x00569508:  f2c00fe0      movk x0, #0x7f, lsl #32
 0x0056950c:  17ffffdf      b #-0x84 (addr 0x569488)

We would reduce this to one insn, always, if the TB were close by, since the ADR instruction has a range of 1MB.

(2) We have zero to two references to a linked TB, via goto_tb.

  • Remove the 128MB translation cache size limit on ARM64.

The translation cache size for an ARM64 host is currently limited to 128 MB. This limitation is imposed by utilizing a branch instruction which encodes the jump offset and is limited by the number of bits it can use for the range of the offset. The performance impact of this limitation is severe and can be observed when you try to run large programs like a browser in the guest: the cache is flushed several times before the browser starts and the performance is not satisfactory. This limitation can be overcome by generating a branch-to-register instruction and utilizing it when the destination address is outside the range of what can be encoded in the current branch instruction.

Based on the previous task of placing the translation structures within the code gen buffer, we can remove this 128 MB cache size limit as follows:

(i) Raise the maximum to 2GB by aligning an instruction pair, adrp+add, to compute the address; the following insn would branch. The update code would write a new destination by modifying the adrp+add with a single 64-bit store.

(ii) Eliminate the maximum altogether by referencing the destination directly in the TB. This is the !USE_DIRECT_JUMP path. It is normally not used on 64-bit targets because computing the full 64-bit address of the TB is harder, or just as hard, as computing the full 64-bit address of the destination.

However, if the TB is nearby, aarch64 can load the address from TB.jmp_target_addr in one insn, with LDR (literal). This pc-relative load also has a 1MB range.

This has the side benefit that it is much quicker to re-link TBs, both in the computation of the code for the destination as well as re-flushing the icache.

  • Implement an LRU translation block code cache.

In the current mechanism, it is not necessary to know how much code is going to be generated for a given set of TCG opcodes. When we reach the high-water mark, we flush everything and start over at the beginning of the buffer. We can improve this situation by not flushing the TBs that were recently used, i.e. by implementing an LRU policy for freeing the blocks. If you manage the cache with an allocator, you'll need to know in advance how much code is going to be generated. This is going to require that you generate position-independent code into an external buffer and copy it into the code gen buffer after determining the size. We can then implement an LRU policy for removing unused blocks while preserving the rest of the translation cache.

  • Avoid consistency overhead for strong memory model guests by generating load-acquire and store-release instructions.

To run a strongly ordered guest on a weakly ordered host using MTTCG, for example x86 on ARM64, we have to generate fence instructions for all the guest memory accesses to ensure consistency. The overhead imposed by these fence instructions is significant (almost 3x when compared to a run without fence instructions). ARM64 provides load-acquire and store-release instructions which are sequentially consistent and can be used instead of generating fence instructions. Add support to generate these instructions in the TCG run-time to reduce the consistency overhead in MTTCG. You have to use the memory access auxiliary info tags to generate appropriate fences on the host architecture, unlike the current situation where only explicit guest fence instructions are translated.

Further Reading:

Requirements: Working on this will require the student to develop a good understanding of the internals of the Tiny Code Generator (TCG) in QEMU. An understanding of compiler theory or previous knowledge of the TCG would also be beneficial to this work. Finally, familiarity with git and being able to frequently rebase work on the upstream master branch would be useful.

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Alex Bennée <alex.bennee@linaro.org> (stsquad on IRC)
  • Suggested by: Pranith Kumar, Alex Bennée, and Richard Henderson

Project idea template

=== TITLE ===
 
 '''Summary:''' Short description of the project
 
 Detailed description of the project.
 
 '''Links:'''
 * Wiki links to relevant material
 * External links to mailing lists or web sites
 
 '''Details:'''
 * Skill level: beginner or intermediate or advanced
 * Language: C
 * Mentor: Email address and IRC nick
 * Suggested by: Person who suggested the idea

Information for mentors

Mentors are responsible for keeping in touch with their candidate and assessing the candidate's progress.

The mentor typically gives advice, reviews the candidate's code, and has regular communication with the candidate to ensure progress is being made.

Being a mentor is a significant time commitment, plan for 5 hours per week. Make sure you can make this commitment because backing out during the summer will affect the candidate's experience.

The mentor chooses their candidate by reviewing candidate application forms, giving out bite-sized tasks so applicants can submit a patch upstream, and conducting IRC interviews with candidates. Depending on the number of candidates, this can be time-consuming in itself. Choosing the right candidate is critical so that both the mentor and the candidate can have a successful experience.