
A Summary of Binary Translation Research at SJTU

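Since CrossBit was implemented in 2007, a number of research projects have built on it, covering optimization, security, translators of various forms, and more.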

Security

Nightingale (2017), Memvisor (2012/2014), Multimem (2012), CrossIF (2010), system call check (2009)

Translators

BabelFish (2017), DistriBit (2012/2010), MTCrossBit (2011/2009/2008), GXBit (2011/2005), CacheBit (2009), vBTrans (2008), Co-design CrossBit (2008), CrossBit (2007)

Optimization

Hot code concentration (2011/2010), Fast Return (2010/2009), HW-profile (2009), Condition Codes (2009), hardware lookup and translation decoupled from execution (2009/2008), profile hot path (2008), Code Cache replacement (2008)

Specific problems

Endianness (2011), debugging the guest (2008), floating point (2008)

A simple timeline

2017

2014

2012

2011

2010

2009

2008

2007

Brief notes on the papers

2017 Nightingale: Translating Embedded VM Code in x86 Binary Executables

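Studies code protection schemes in which code is executed by an embedded interpreting VM, and proposes a binary translation tool that simplifies (optimizes) the VM embedded in the executable.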
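To make the idea concrete, here is a toy Python sketch, assuming a made-up stack VM with five opcodes (`PUSH`, `LOAD_ARG`, `ADD`, `MUL`, `RET`) rather than the x86-embedded VMs Nightingale actually targets. `interpret` plays the role of the protecting VM's dispatch loop; `translate` is a hypothetical translator that turns straight-line bytecode into an equivalent host function so that the dispatch loop disappears. Branches, and everything else a real protector adds, are left out.

```python
# Toy illustration only: the opcodes, encoding, and translator are invented
# for this sketch and are not Nightingale's design.
PUSH, LOAD_ARG, ADD, MUL, RET = range(5)


def interpret(code, arg):
    """Protected-code style: a dispatch loop decodes every opcode at run time."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == LOAD_ARG:
            stack.append(arg)
            pc += 1
        elif op in (ADD, MUL):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == ADD else a * b)
            pc += 1
        elif op == RET:
            return stack.pop()
        else:
            raise ValueError(f"bad opcode {op} at {pc}")


def translate(code):
    """Turn straight-line bytecode into host code: stack slots become named
    temporaries s0, s1, ... and the dispatch loop is gone."""
    lines, depth, pc = ["def f(arg):"], 0, 0
    while True:
        op = code[pc]
        if op == PUSH:
            lines.append(f"    s{depth} = {code[pc + 1]}")
            depth, pc = depth + 1, pc + 2
        elif op == LOAD_ARG:
            lines.append(f"    s{depth} = arg")
            depth, pc = depth + 1, pc + 1
        elif op in (ADD, MUL):
            sym = "+" if op == ADD else "*"
            depth -= 1  # two slots popped, one pushed
            lines.append(f"    s{depth - 1} = s{depth - 1} {sym} s{depth}")
            pc += 1
        elif op == RET:
            lines.append(f"    return s{depth - 1}")
            break
        else:
            raise ValueError(f"bad opcode {op} at {pc}")
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated host function
    return namespace["f"]


program = [LOAD_ARG, PUSH, 3, MUL, PUSH, 7, ADD, RET]  # arg * 3 + 7
native = translate(program)
print(interpret(program, 5), native(5))  # 22 22
```

The translated function computes the same result without decoding any opcode at run time, which is the effect a devirtualizing translator is after.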

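2017 Research on Static Binary Translation Techniques for MIPS Programs

The analysis of jumps, calls, data-section handling, and so on is fairly crude.

(The thesis even copies the titles of cited references straight into the body text. Is this really a 2017 graduation thesis?!)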

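2017 Research on Dynamic Analysis Methods for Embedded Devices

2015 Research on Operating System Security and Performance in Virtualized Environments

Not actually binary translation:

(1) Uses nested virtualization to add another layer beneath KVM that controls memory accesses and data encryption.

(2) Uses vCPU Ballooning to bind vCPUs to physical cores and reduce the number of vCPUs, cutting the excessive overhead caused by the double-scheduling problem.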

2014 Multi-Granularity Memory Mirroring via Binary Translation in Cloud Environments

As the size of DRAM memory grows in clusters, memory errors are common.

Current memory availability strategies mostly focus on memory backup and error recovery.

In this paper, we present a novel system called Memvisor to provide high availability memory mirroring.

It is a software approach achieving flexible multi-granularity memory mirroring based on virtualization and binary translation.

(1) Memory areas can be flexibly set to be mirrored or not, at granularities ranging from a single process to all user-mode applications.

(2) Then, all memory write instructions are duplicated.

(3) If memory failures happen, Memvisor recovers the data from the backup space.
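A minimal model of these three steps, assuming byte-granular memory, a single replica, and failures simulated as a set of bad addresses. `MirroredMemory` and `UncorrectableMemoryError` are hypothetical names, not Memvisor's API. The extra write in `store` stands in for the duplicated write instruction a translator would emit, and the mirrored ranges stand in for the configurable mirroring granularity.

```python
class UncorrectableMemoryError(Exception):
    """Raised when a failed address has no mirror to recover from."""


class MirroredMemory:
    """Byte-addressed memory in which selected ranges are mirrored."""

    def __init__(self, size, mirrored_ranges, replicas=1):
        self.primary = bytearray(size)
        self.replicas = [bytearray(size) for _ in range(replicas)]
        self.mirrored = list(mirrored_ranges)  # [(start, end)], end exclusive
        self.failed = set()                    # simulated bad addresses in primary

    def is_mirrored(self, addr):
        return any(start <= addr < end for start, end in self.mirrored)

    def store(self, addr, value):
        self.primary[addr] = value
        if self.is_mirrored(addr):             # the duplicated write(s)
            for replica in self.replicas:
                replica[addr] = value

    def load(self, addr):
        if addr not in self.failed:
            return self.primary[addr]
        if self.is_mirrored(addr):             # recover from the backup copy
            return self.replicas[0][addr]
        raise UncorrectableMemoryError(hex(addr))


mem = MirroredMemory(64, mirrored_ranges=[(0, 32)])
mem.store(4, 0x5A)    # inside the mirrored range: written twice
mem.store(40, 0x11)   # outside the mirrored range: written once
mem.failed.update({4, 40})
print(hex(mem.load(4)))  # 0x5a, recovered from the replica
```

Loading address 40 after the simulated failure raises `UncorrectableMemoryError`, since that region was never mirrored; mirroring fewer ranges trades protection for fewer duplicated writes.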

2012 Multimem: Retrofitting system availability via a lightweight binary translation framework

As the size of memory in servers becomes larger and larger, its availability is under great pressure, as memory failures are common.

To improve the memory availability, some solutions try to mitigate the occurrence of memory errors while most strategies focus on memory failure tolerance.

Hardware solutions like mirror memory need expensive peripheral equipment, while existing software approaches are somewhat complicated and limited by high overhead in practical usage.

In this paper, we present a novel lightweight binary translation framework called Multimem to improve system memory access availability.

It is a software approach achieving hardware mirror memory feature via static binary translation technology.

Multimem switches native systems to highly available systems with two or more copies of memory, so when memory failures happen, systems can recover the data from the replica.

2012 Memvisor: Application Level Memory Mirroring via Binary Translation

2012 Analysis and Research on Distributed Dynamic Binary Translation for Constrained Systems

2011 Flexible Endian Adjustment for Cross Architecture Binary Translation

The issue is inconspicuous but may lead to significant performance bottleneck.

This paper investigates the key aspects of endianness and finds several solutions to endian adjustment for cross-architecture binary translation.

In particular, it considers the two principal methods of this field — byte swapping and address swizzling, and gives a comparison of them in our DBT (Dynamic Binary Translator) CrossBit.

Swizzled address = HighWMark − (SIZE + EA − LowWMark)

// Address swizzling: little-endian guest, big-endian host
// LowWMark = 0x2000, HighWMark = 0x2004
1. store 4 bytes 0xAABBCCDD at 0x2000
   store address = 0x2004 - (4 + 0x2000 - 0x2000) = 0x2000
   // written directly in host (big-endian) order, no data swap:
   //                    0x2000, 0x2001, 0x2002, 0x2003
   // guest (little):        DD,     CC,     BB,     AA   // what the guest expects
   // host  (big):           AA,     BB,     CC,     DD   // what host memory holds
2. load 2 bytes at 0x2002
   load address = 0x2004 - (2 + 0x2002 - 0x2000) = 0x2000   // address translation
   load value   = 0xAABB   // a plain host (big-endian) load yields the correct value

// Byte swapping: little-endian guest, big-endian host
1. store 4 bytes 0xAABBCCDD at 0x2000 => swap to 0xDDCCBBAA, then store
   //                    0x2000, 0x2001, 0x2002, 0x2003
   // host  (big):           DD,     CC,     BB,     AA   // same layout the guest expects
2. load 2 bytes at 0x2002
   load value    = 0xBBAA   // host reads big-endian
   byte swapping = 0xAABB   // swap back to get the correct value
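A small Python simulation of both methods, assuming a little-endian guest, a big-endian host, and the 4-byte window LowWMark = 0x2000, HighWMark = 0x2004. Host memory is modelled as a `bytearray` and every host access is big-endian; the helper names are invented for this sketch. Swizzling changes only the address and leaves the data alone; byte swapping keeps the address and reverses the data on every access.

```python
# Sketch: little-endian guest on a big-endian host (assumed setup).
LOW_WMARK, HIGH_WMARK = 0x2000, 0x2004  # the swizzled window, 4 bytes


def host_store(mem, addr, size, value):
    """A host (big-endian) store into the window modelled by `mem`."""
    off = addr - LOW_WMARK
    mem[off:off + size] = value.to_bytes(size, "big")


def host_load(mem, addr, size):
    """A host (big-endian) load from the window modelled by `mem`."""
    off = addr - LOW_WMARK
    return int.from_bytes(mem[off:off + size], "big")


def swizzle(ea, size):
    """Swizzled address = HighWMark - (SIZE + EA - LowWMark)."""
    return HIGH_WMARK - (size + ea - LOW_WMARK)


def byte_swap(value, size):
    return int.from_bytes(value.to_bytes(size, "big"), "little")


# Address swizzling: translate the address, never touch the data.
def swz_store(mem, ea, size, value):
    host_store(mem, swizzle(ea, size), size, value)


def swz_load(mem, ea, size):
    return host_load(mem, swizzle(ea, size), size)


# Byte swapping: keep the address, reverse the data on every access.
def bs_store(mem, ea, size, value):
    host_store(mem, ea, size, byte_swap(value, size))


def bs_load(mem, ea, size):
    return byte_swap(host_load(mem, ea, size), size)


mem_swz, mem_bs = bytearray(4), bytearray(4)
swz_store(mem_swz, 0x2000, 4, 0xAABBCCDD)
bs_store(mem_bs, 0x2000, 4, 0xAABBCCDD)
print(hex(swz_load(mem_swz, 0x2002, 2)), hex(bs_load(mem_bs, 0x2002, 2)))  # 0xaabb 0xaabb
print(mem_swz.hex(), mem_bs.hex())  # aabbccdd ddccbbaa
```

Both methods return the value the guest expects, but only byte swapping leaves memory in guest byte order. After a swizzled store, even a one-byte guest load has to go through the formula: guest 0x2000 maps to host 0x2003.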

2011 A Dynamic-Static Combined Code Layout Reorganization Approach for Dynamic Binary Translation

In the static phase, based on the profile information collected in the previous stage, we first use the method of code replicating to build the traces, and then reorganize the layout of the target code by putting the hottest traces at the top of the software cache.
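A sketch of just the layout step, assuming each trace has already been built and its translated size is known. `layout_code_cache` is a hypothetical helper, and the trace names and counts are made up. Traces are sorted by their profiled execution count and packed from the top of the cache.

```python
def layout_code_cache(trace_sizes, exec_counts):
    """Assign each trace a code-cache offset, hottest trace first.

    trace_sizes: {trace_id: size in bytes of its translated code}
    exec_counts: {trace_id: execution count from the profiling run}
    Returns {trace_id: byte offset from the top of the code cache}.
    """
    # sorted() is stable (also with reverse=True), so equal counts keep input order.
    order = sorted(trace_sizes, key=lambda t: exec_counts.get(t, 0), reverse=True)
    offsets, cursor = {}, 0
    for trace in order:
        offsets[trace] = cursor
        cursor += trace_sizes[trace]
    return offsets


sizes = {"A": 64, "B": 32, "C": 128}
counts = {"A": 10, "B": 900, "C": 55}
print(layout_code_cache(sizes, counts))  # {'B': 0, 'C': 32, 'A': 160}
```

Packing the hottest code contiguously is meant to improve instruction-cache and TLB locality; the code-replication step that builds the traces is not modelled here.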

2011 MTCrossBit: A dynamic binary translation system based on multithreaded optimization

We propose a multithreaded DBT framework with no associated hardware called the MTCrossBit, where a helper thread for building a hot trace is employed to significantly reduce the overhead.

the dual-special-parallel translation caches and

the new lock-free threads communication mechanism—assembly language communication (ASLC).

The English in this paper is really awkward, and it is painful to read...

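2011 Design and Implementation of a Virtual Execution Environment Framework for CPU/GPU Heterogeneous Multi-core Systems

2011 Research and Implementation of a QEMU-based Full-System Emulation Test Environment for Embedded Systems

Wow, I couldn't get through the body text; the title is enough...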

2010 A New Approach to Reorganize Code Layout of Software Cache in Dynamic Binary Translator

In this paper, we designed a new approach using dynamic-static combined framework to reorganize code layout of software cache.

2010 The Optimizations in Dynamic Binary Translation

This paper investigates a few optimizations to alleviate the overhead in DBT.

We evaluate these optimizations in CrossBit, which is a resourceable and retargetable dynamic binary translator, including block linking, condition codes optimization, register mapping optimization, static-integrated optimization, multithreaded optimization.

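2010 Jump Analysis and Optimization in Dynamic Binary Translation

The return addresses of the source program are replaced directly with the addresses of the corresponding translated blocks.

2010 Research on Buffer Overflow Detection Based on a Dynamic Binary Probing Framework

2010 Design and Implementation of a Distributed Dynamic Binary Translation Framework for Constrained Systems

There is also one from 2012, with the same advisor. What's going on?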

2009 The Implementation of Dynamic linking in Dynamic Binary Translation System

2009 CacheBit: A Multisource-Multitarget Cache Instrumentation Tool

Cachebit simulates cache behavior and presents statistics of cache profile at runtime.

After running programs on Crossbit with Cachebit available, cache profile information can be reported to help developers rewrite and improve their programs.

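In other words, it simulates cache behavior; since it is built on a DBT, it can see every memory access.

My take: collecting a trace and analyzing it would be easier, but simulating cache behavior offers more capability.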
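For a sense of what such a tool does with the memory accesses a DBT exposes, here is a set-associative LRU cache model in Python. The geometry (1 KiB, 64-byte lines, 2 ways) is an assumption, and `CacheSim.access` is a hypothetical hook standing in for the instrumentation a translator would insert at every guest load and store.

```python
from collections import OrderedDict


class CacheSim:
    """Set-associative cache with LRU replacement, driven by an address stream."""

    def __init__(self, size=1024, line=64, ways=2):
        self.line, self.ways = line, ways
        self.num_sets = size // (line * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line
        index, tag = block % self.num_sets, block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # now the most recently used way
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= self.ways:
                lines.popitem(last=False)   # evict the least recently used way
            lines[tag] = True


sim = CacheSim()
for _ in range(2):                          # two sweeps over a 256-byte array
    for addr in range(0, 256, 4):
        sim.access(addr)
print(sim.hits, sim.misses)  # 124 4
```

An offline trace analyzer would run the same model over a recorded address log; doing it online inside the DBT avoids storing the trace.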

2009 Return Instruction Analysis and Optimization in Dynamic Binary Translation

In this paper, we present an improved return cache scheme with relatively low overhead to handle the return instruction, the most important form of indirect branch.
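A generic return cache can be sketched as below, assuming a direct-mapped table of 256 entries indexed by the guest return address. This is the textbook scheme, not the paper's improved variant, whose details are not described here; `on_call` and `on_return` are hypothetical names for the code a translator would emit at calls and returns.

```python
RC_SIZE = 256  # entries; must be a power of two for the mask below


class ReturnCache:
    """Direct-mapped table: guest return address (SPC) -> translated code (TPC)."""

    def __init__(self):
        self.spc = [None] * RC_SIZE
        self.tpc = [None] * RC_SIZE
        self.hits = self.misses = 0

    @staticmethod
    def slot(spc):
        return (spc >> 2) & (RC_SIZE - 1)   # drop low bits, keep an index

    def on_call(self, return_spc, return_tpc):
        """At a translated call: remember where the return will land."""
        i = self.slot(return_spc)
        self.spc[i], self.tpc[i] = return_spc, return_tpc

    def on_return(self, target_spc, slow_lookup):
        """At a translated return: try the cache, else fall back."""
        i = self.slot(target_spc)
        if self.spc[i] == target_spc:       # tag check: slot may hold another SPC
            self.hits += 1
            return self.tpc[i]
        self.misses += 1
        return slow_lookup(target_spc)      # e.g. the translator's SPC->TPC hash table


translated = {0x400104: 0x7F0010, 0x400200: 0x7F0200}  # made-up addresses
rc = ReturnCache()
rc.on_call(0x400104, translated[0x400104])
print(hex(rc.on_return(0x400104, translated.get)))  # 0x7f0010 (hit)
print(hex(rc.on_return(0x400200, translated.get)))  # 0x7f0200 (miss, slow path)
```

A hit replaces a hash-table lookup with one indexed load and compare, which is where the saving on returns comes from.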

2009 MTCrossBit: A Dynamic Binary Translation System Using Multithreaded Optimization Framework

2009 A Runtime Profile Method for Dynamic Binary Translation Using Hardware-Support Technique

In this paper, we propose a novel profile approach on DBT using a hardware support technique to collect profile information rapidly and accurately with minimal runtime overhead.

This approach makes use of instrumentation code and a set of profiling hardware which supports operations of updating counters.

The Store instruction is used to pass the first SPC of each block to the FIFO buffer.

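This is similar to an idea I had before: plug an FPGA onto the memory module to intercept stores and loads and perform extra computation, which could be used for counting to find hot spots.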
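A software model of the mechanism, assuming a hot threshold of 3; `ProfileUnit` is a hypothetical stand-in for the profiling hardware. The instrumented code only pushes each block's first SPC into the FIFO; counting and hot-block detection happen when the FIFO is drained, which the hardware would do in parallel with execution.

```python
from collections import deque

HOT_THRESHOLD = 3  # assumed value


class ProfileUnit:
    """Software stand-in for profiling hardware: a FIFO fed by instrumentation
    stores, drained into per-block counters."""

    def __init__(self):
        self.fifo = deque()
        self.counters = {}
        self.hot = set()

    def store(self, spc):
        """The instrumented Store: push the block's first SPC and move on."""
        self.fifo.append(spc)

    def drain(self):
        """Done by the hardware, off the translator's critical path."""
        while self.fifo:
            spc = self.fifo.popleft()
            self.counters[spc] = self.counters.get(spc, 0) + 1
            if self.counters[spc] == HOT_THRESHOLD:
                self.hot.add(spc)           # candidate for hot-trace optimization


unit = ProfileUnit()
for spc in [0x1000, 0x1040, 0x1000, 0x1000, 0x1040]:  # block entries
    unit.store(spc)
unit.drain()
print(unit.counters, unit.hot)  # {4096: 3, 4160: 2} {4096}
```

The design point is that the per-block cost on the guest's execution path shrinks to a single store.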

2009 The Implementation of Static-integrated Optimization Framework for Dynamic Binary Translation

In traditional dynamic binary translation (DBT) systems, poor profile information at runtime limits the manner of optimization.

Combining dynamic binary translation with static analysis brings an opportunity to improve the runtime performance.

Optimization on the target code is mainly based on the profile information collected from the first execution.

Doing it this way is not very interesting, and it only achieves a 5% performance improvement.

2009 A Heuristic Policy-based System Call Interposition in Dynamic Binary Translation

In this paper, we present HPSCIBit, a solution that efficiently confines malicious applications, supports automatic policy generation and interactive policy generation, intrusion detection and prevention in the DBT system.

Intrusion detection and prevention.

N. Provos, "Improving Host Security with System Call Policies," in Proc. 12th USENIX Security Symp., pp. 257-272, 2003.

It appears to protect the system with some very simple and straightforward rules.
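A minimal sketch of allow-list interposition in the spirit of Provos's system call policies, assuming a policy is simply a set of `(syscall, argument)` pairs. `SyscallPolicy` and its `ask` callback are hypothetical. In a DBT the translator would route each guest system call through `check` before executing it. Real policies match argument patterns and much more.

```python
class SyscallPolicy:
    """Allow-list of (syscall, argument) pairs checked before each guest syscall."""

    def __init__(self):
        self.allowed = set()
        self.training = True           # training run = automatic policy generation

    def check(self, name, arg, ask=None):
        key = (name, arg)
        if self.training:
            self.allowed.add(key)      # learn whatever the program does
            return True
        if key in self.allowed:
            return True
        if ask is not None and ask(name, arg):   # interactive policy generation
            self.allowed.add(key)
            return True
        return False                   # treat as an intrusion: block the call


policy = SyscallPolicy()
policy.check("open", "/etc/hosts")
policy.check("write", 1)
policy.training = False
print(policy.check("open", "/etc/hosts"))  # True
print(policy.check("execve", "/bin/sh"))   # False
```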

2009 A Two-Phase Optimization Approach for Condition Codes in a Machine Adaptable Dynamic Binary Translator

First, redundant flag computing code in a basic block is reduced based on the information collected by Crossbit when the block is identified.

Then, a lazy evaluation technique is used across basic blocks, which makes the condition codes emulation more efficient.
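The two phases can be sketched as follows, assuming x86-like 32-bit flags and modelling only ZF, SF, and CF; all names are hypothetical. `dead_flag_defs` is a backward liveness pass that finds flag definitions overwritten before any read within a block. `LazyFlags` records the last flag-setting operation and its operands, so a flag is computed only when later code actually reads it.

```python
MASK32 = 0xFFFFFFFF


def dead_flag_defs(block):
    """Phase 1 (within a block): backward liveness over the flags.

    block: list of (mnemonic, sets_flags, reads_flags).
    Returns indices whose flag results are overwritten before being read,
    so no flag computation needs to be emitted for them.
    Flags are conservatively assumed live at the end of the block.
    """
    dead, live = [], True
    for i in range(len(block) - 1, -1, -1):
        _, sets_flags, reads_flags = block[i]
        if sets_flags:
            if not live:
                dead.append(i)
            live = False
        if reads_flags:
            live = True
    return sorted(dead)


class LazyFlags:
    """Phase 2 (across blocks): keep the last flag-setting operation and its
    operands; compute an individual flag only on demand."""

    def __init__(self):
        self.op, self.a, self.b, self.res = None, 0, 0, 0

    def record(self, op, a, b, res):
        self.op, self.a, self.b, self.res = op, a, b, res

    def zf(self):
        return self.res == 0

    def sf(self):
        return bool(self.res >> 31)

    def cf(self):
        if self.op == "add":
            return self.res < self.a      # unsigned 32-bit carry out
        if self.op == "sub":
            return self.a < self.b        # unsigned borrow
        return False


def emu_add(flags, a, b):
    res = (a + b) & MASK32
    flags.record("add", a, b, res)        # no flag computation here
    return res


def emu_sub(flags, a, b):
    res = (a - b) & MASK32
    flags.record("sub", a, b, res)
    return res


print(dead_flag_defs([("add", True, False), ("sub", True, False), ("jz", False, True)]))  # [0]
flags = LazyFlags()
emu_sub(flags, 3, 5)
print(flags.cf(), flags.sf(), flags.zf())  # True True False
```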

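2009 Application of Jump Linking Techniques in Dynamic Binary Translation

2009 Research on Hardware/Software Co-design Methods for Virtual Machines

Implements a virtual machine coprocessor on an FPGA (including units such as the binary translator and the TCache).

Confirmed to be user-mode binary translation; see the figures in Chapter 4 of the original thesis.

2009 Intermediate Representation in Dynamic Binary Translation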

[6] Adve V, Lattner C, Brukman M, et al. LLVA: A Low-level Virtual Instruction Set Architecture[C]//Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture. San Diego, California, USA: [s. n.], 2003.
[7] Engler D R. VCODE: A Retargetable, Extensible, Very Fast Dynamic Code Generation System[C]//Proc. of ACM Conf. on Programming Language Design and Implementation. New York, USA: [s. n.], 1996.

2009 Hot Path Optimization in Dynamic Binary Translation

2008 Multithreaded optimizing technique for dynamic binary translator CrossBit

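2008 Research on Debuggers in Dynamic Binary Translation

2008 Virtualizing the IA32 Memory Management Unit on the IA64 Architecture

Essentially the same approach as QEMU.

2008 Research on Profile-based Optimization Algorithms in Dynamic Binary Translation

2008 Design and Implementation of the Floating-Point Unit in a Retargetable Dynamic Binary Translator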

[5] D. Burger, T.M. Austin, and S. Bennett, Evaluating Future Microprocessors: The SimpleScalar Tool Set, Tech. Rep. CS-TR96-1308, Univ. Wisconsin, Madison, 1996.

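2008 TCache Replacement Algorithms in Dynamic Binary Translation

Wasn't this already tested back in 2002? The conclusions agree, too.

2008 Research on Operating System Virtualization Based on Dynamic Binary Translation

2008 Research on Parallelism in a Virtual Machine Based on Hardware/Software Co-design

The parallel system splits the dynamic binary translation work across two processor cores, moving tasks such as code translation, profile collection, cache maintenance, and source-to-target code entry address lookup off the critical path of source instruction execution in the binary translator, which improves performance and real-time behavior.

The hardware translation unit mainly provides dynamic binary translation, TCache management, and implementations of some optimization mechanisms.

The software layer includes a loader for IA-32 executables, the driver for the virtual machine IP core, and the Linux operating system.

The hardware part includes a PowerPC processor, memory, and the virtual machine IP core.

The virtual machine IP core consists mainly of two parts: the binary translator and the TCache manager.

Communication between software and hardware is handled through shared memory.

The co-designed virtual machine implements its interpreter in microcode.

The hardware profiler also provides path descriptors, execution counts, instruction block sizes, and similar data to support the generation and selection of basic blocks during dynamic translation. The optimizer can also adjust how hot paths in the TCache are generated and monitored based on the instruction patterns it detects.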

2007 An Intermediate Language Level Optimization Framework for Dynamic Binary Translation

The framework proposed in this paper includes efficient profiling, hot code recognition and smart code cache management policies.

An optimizer is a loadable tool that coexists with other components in the middle layer of the dynamic binary translation system.

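2007 Optimization Techniques for the Binary Translation System QEMU

The main contributions: a survey of typical translation systems in the binary translation field, and a detailed study of the translation mechanism, execution model, and translation strategy of the dynamic binary translator QEMU, whose user-mode emulator serves as the experimental platform.

For QEMU's translation mechanism, which maps intermediate variables onto host registers, different register-mapping schemes were benchmarked. Under the current mechanism, the intermediate variables turned out to be the most frequently used part and the most worth mapping to registers. Finally, a new translation mechanism that eliminates intermediate variables is proposed as an idea.

QEMU uses the pc of a basic block's first instruction as its unique identifier, and the thesis finds that this leads to basic block overlap: a basic block may be part of another block, or may contain other blocks. A scheme to reduce such overlap is proposed and implemented.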
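Detecting the overlap is straightforward once blocks are keyed by their start pc. The sketch below, with a hypothetical `find_overlaps` helper and made-up addresses, reports pairs of translated blocks whose guest code ranges intersect; the typical cause is a jump into the middle of an existing block. It only finds overlaps; the thesis's reduction scheme is not reproduced here.

```python
def find_overlaps(blocks):
    """blocks: {start_pc: end_pc} (end exclusive), keyed by first-instruction pc
    as a QEMU-style translation cache would be.
    Returns (earlier_start, later_start) pairs whose guest code ranges overlap."""
    spans = sorted(blocks.items())
    overlaps = []
    for i, (start1, end1) in enumerate(spans):
        for start2, _ in spans[i + 1:]:
            if start2 >= end1:
                break            # sorted by start: later blocks begin even further on
            overlaps.append((start1, start2))
    return overlaps


# A jump into the middle of the block at 0x100 created a second block at 0x110
# that re-translates the same tail instructions (hypothetical addresses).
translated = {0x100: 0x120, 0x110: 0x120, 0x200: 0x210}
print([(hex(a), hex(b)) for a, b in find_overlaps(translated)])  # [('0x100', '0x110')]
```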

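2007 Building a Process Virtual Machine Based on Dynamic Binary Translation

CrossBit's design goals are retargetability and extensibility. The key to retargetability is the intermediate instruction set VInst.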

[29] V. Adve, C. Lattner, M. Brukman, A. Shukla, and B. Gaeke, LLVA: A Low-level Virtual Instruction Set Architecture, MICRO-36, San Diego, California, 2003.

2005 GXBIT: Combining Polyhedral Model with Dynamic Binary Translation

Analysis stage uses binary instrumentation and binary analysis to probe potential parallel parts (usually nested loop) of a binary executable and then polyhedral model is employed to detect whether there is data dependence or not among all iterations of a nested loop.
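The question the polyhedral model answers can be illustrated by brute force: enumerate every iteration of a small, fixed-size loop nest and check whether any array element written by one iteration is touched by another. This swaps exhaustive enumeration in for the polyhedral model's symbolic reasoning over integer points, so it only works for small concrete bounds; `has_cross_iteration_dependence` and the affine index functions are illustrative assumptions.

```python
from itertools import product


def has_cross_iteration_dependence(bounds, write_index, read_index):
    """bounds: trip counts of a loop nest, e.g. (N, M) for
        for i in range(N): for j in range(M): A[w(i, j)] = f(A[r(i, j)])
    write_index / read_index: functions mapping an iteration vector to an index.
    Returns True if an element written by one iteration is read or written
    by a different iteration (flow, anti, or output dependence)."""
    iterations = list(product(*(range(n) for n in bounds)))
    writers = {}
    for it in iterations:
        writers.setdefault(write_index(*it), set()).add(it)
    for it in iterations:
        for idx in (write_index(*it), read_index(*it)):
            if writers.get(idx, set()) - {it}:
                return True
    return False


# A[i*4 + j] += 1: every iteration touches only its own element.
print(has_cross_iteration_dependence((4, 4), lambda i, j: i * 4 + j, lambda i, j: i * 4 + j))  # False
# A[i*4 + j] = A[i*4 + j - 1]: iteration (i, j) reads what (i, j-1) wrote.
print(has_cross_iteration_dependence((4, 4), lambda i, j: i * 4 + j, lambda i, j: i * 4 + j - 1))  # True
```

Only a nest that comes back `False` is a candidate for running its iterations in parallel.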