
Binary Code Similarity Comparison

2021 A Survey of Binary Code Similarity

Overview - Compilation Process

Overview - Binary Code Similarity

Applications

Evaluation

The origins.

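The field originated from the need to generate patches automatically: diffing consecutive versions of a program and reducing the amount of data sent over the network.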

The first decade 2000-2009

Semantic similarity approaches begin to appear.

The last decade 2010-2019

Mainly applied to bug search: given a known bug, search the other parts of the program for similar flaws.

APPROACHES

Granularity

Syntactic Similarity

Semantic Similarity

Structural Similarity

Feature-Based Similarity

Hashing

Supported Architectures

Type of Analysis

Binary code similarity approaches can use static analysis, dynamic analysis, or both.

Normalization

Syntactic similarity approaches often normalize instructions, so that two instructions that normalize to the same form are considered similar despite some syntactic differences.
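
A minimal sketch of what such normalization can look like, assuming Intel-syntax x86 text and three hypothetical operand classes (`REG`, `MEM`, `IMM`). The register list and the class names are illustrative assumptions, not a scheme taken from the survey; real tools work on disassembler output rather than raw strings.

```python
import re

# Assumption: a small, incomplete set of x86 register names, for illustration only.
REGISTERS = {
    "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp",
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
}


def normalize_operand(op: str) -> str:
    """Map one operand to a coarse class: MEM, REG, or IMM."""
    op = op.strip().lower()
    if "[" in op:                      # any bracketed expression is a memory access
        return "MEM"
    if op in REGISTERS:
        return "REG"
    if re.fullmatch(r"-?(0x[0-9a-f]+|\d+)", op):
        return "IMM"
    return op                          # unknown tokens (e.g. labels) are kept as-is


def normalize(instr: str) -> str:
    """Normalize one Intel-syntax instruction given as text."""
    mnemonic, _, operands = instr.strip().partition(" ")
    if not operands.strip():
        return mnemonic.lower()
    ops = [normalize_operand(o) for o in operands.split(",")]
    return mnemonic.lower() + " " + ", ".join(ops)


# Two syntactically different instructions that share a normalized form.
print(normalize("mov eax, 0x10"))   # mov REG, IMM
print(normalize("MOV ebx, 42"))     # mov REG, IMM
```

Coarser classes make more instructions match, which tolerates register reallocation and changed constants but also increases false matches; the choice of classes is a trade-off each approach makes differently.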

IMPLEMENTATIONS

EVALUATIONS

DISCUSSION

PIN

iLine [58]

[58] Jiyong Jang, Maverick Woo, and David Brumley. 2013. Towards automatic software lineage inference. In USENIX Security Symposium.

Blex [38]

[38] Manuel Egele, Maverick Woo, Peter Chapman, and David Brumley. 2014. Blanket execution: Dynamic similarity testing for program binaries and components. In USENIX Security Symposium.

Identifying similar functions among binary executables.

Recent work tries to establish semantic similarity based on static analysis methods.

These methods do not perform well if the compared binaries are produced by different compiler toolchains or optimization levels.

We propose blanket execution, a novel dynamic equivalence testing primitive that achieves complete coverage by overriding the intended program logic.
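
A toy sketch of the blanket execution idea, assuming a made-up six-instruction mini-language and a tiny interpreter in place of instrumented native code. The instruction names, the `r0` return register, and the Jaccard comparison of observed side effects are assumptions made for illustration; this is not the Blex implementation. The point it shows is the coverage loop: whenever some instruction has not been executed yet, execution is restarted at the first such instruction, regardless of whether the program logic would ever reach it.

```python
from collections import defaultdict


def value(regs, x):
    """Read a register (str) or return an immediate (int)."""
    return regs[x] if isinstance(x, str) else x


def run_from(code, start, max_steps=1000):
    """Execute from index `start`; return (covered indices, observed side effects)."""
    regs = defaultdict(int)            # fixed initial environment: all registers are 0
    covered, effects = set(), set()
    pc, steps = start, 0
    while 0 <= pc < len(code) and steps < max_steps:
        covered.add(pc)
        steps += 1
        op, *args = code[pc]
        if op == "mov":
            regs[args[0]] = value(regs, args[1]); pc += 1
        elif op == "add":
            regs[args[0]] += value(regs, args[1]); pc += 1
        elif op == "store":            # memory write: recorded as a side effect
            effects.add(("store", args[0], value(regs, args[1]))); pc += 1
        elif op == "jz":
            pc = args[1] if regs[args[0]] == 0 else pc + 1
        elif op == "jmp":
            pc = args[0]
        elif op == "ret":              # return value: recorded as a side effect
            effects.add(("ret", regs["r0"]))
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return covered, effects


def blanket_execute(code):
    """Restart at the first unexecuted instruction until every instruction has run."""
    covered, effects = set(), set()
    while len(covered) < len(code):
        start = min(set(range(len(code))) - covered)
        c, e = run_from(code, start)
        covered |= c
        effects |= e
    return effects


def similarity(f, g):
    """Jaccard index over the side effects observed for two functions."""
    a, b = blanket_execute(f), blanket_execute(g)
    return len(a & b) / len(a | b) if (a | b) else 1.0


f1 = [("mov", "r0", 5), ("add", "r0", 3), ("store", 100, "r0"), ("ret",)]
f2 = [("mov", "r1", 3), ("mov", "r0", 5), ("add", "r0", "r1"),
      ("store", 100, "r0"), ("ret",)]
f3 = [("mov", "r0", 1), ("jz", "r0", 4), ("store", 200, "r0"), ("ret",),
      ("store", 300, "r0"), ("ret",)]   # indices 4-5 are never reached from index 0

print(similarity(f1, f2))  # same side effects, different instructions
print(similarity(f1, f3))  # disjoint side effects
```

In `f3` the branch at index 1 is never taken on a normal run, so the store at index 4 is only observed because the coverage loop restarts there. Comparing side effects rather than instructions is what makes the comparison insensitive to the syntactic differences introduced by different toolchains or optimization levels.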

KLKI2016 [66]

[66] TaeGuen Kim, Yeo Reum Lee, BooJoong Kang, and Eul Gyu Im. 2016. Binary executable file similarity calculation using function matching. Journal of Supercomputing 75, 2 (Dec. 2016), 607–622.

KS2017 [64]

[64] Ulf Kargén and Nahid Shahmehri. 2017. Towards robust instruction-level trace alignment of binary code. In IEEE/ACM International Conference on Automated Software Engineering.

Program trace alignment is the process of establishing a correspondence between dynamic instruction instances in executions of two semantically similar but syntactically different programs.
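
To make the notion of a correspondence between instruction instances concrete, here is a minimal sketch that aligns two short traces by their mnemonics using Python's standard `difflib.SequenceMatcher`. This is a generic sequence alignment substituted for illustration, not the algorithm from the paper, and the traces are invented; it would not scale to realistically long traces.

```python
from difflib import SequenceMatcher


def mnemonic(instr: str) -> str:
    """Reduce an instruction to its lowercase mnemonic so register choice is ignored."""
    return instr.split()[0].lower()


def align(trace_a, trace_b):
    """Return index pairs (i, j) meaning trace_a[i] corresponds to trace_b[j]."""
    keys_a = [mnemonic(i) for i in trace_a]
    keys_b = [mnemonic(i) for i in trace_b]
    matcher = SequenceMatcher(a=keys_a, b=keys_b, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():   # last block is a size-0 sentinel
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


# Hypothetical traces: same behavior, different registers, one extra `nop`.
trace_a = ["mov eax, 1", "add eax, 2", "push eax", "call f", "ret"]
trace_b = ["mov ecx, 1", "nop", "add ecx, 2", "push ecx", "call f", "ret"]

for i, j in align(trace_a, trace_b):
    print(f"{trace_a[i]:<12} <-> {trace_b[j]}")
```

The extra `nop` in the second trace is left unmatched and every other instruction instance is paired with its counterpart.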

We present what is, to the best of our knowledge, the first method capable of aligning realistically long execution traces of real programs.

IMF-sim [116]

[116] Shuai Wang and Dinghao Wu. 2017. In-memory fuzzing for binary code similarity analysis. In IEEE/ACM International Conference on Automated Software Engineering.

In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis.