2021-08-09-llvm学习
我最近想往我们代码库引入一个静态分析,所以需要使用clang AST matcher,所以下面的内容实际上包含两部分即AST的学习和我要用AST MATCHER解决什么东西?最后我使用Clang-Tidy建立了什么检查条件和解决了什么问题
1 AST
参考:
- https://clang.llvm.org/docs/IntroductionToTheClangAST.html 一些基础的概念的介绍,介绍起始的概念
- https://jywhy6.zone/2020/11/27/clang-notes/#RecursiveASTVisitor-%20%E7%9B%B8%E5%85%B3 一片介绍比较详细的blog
- https://blog.csdn.net/qq_23599965/article/details/94595735 clang里面AST的基础类型的关系等东西
Clang参考的文档为:
- https://clang.llvm.org/doxygen/ 完整文档
- https://clang.llvm.org/docs/LibASTMatchersReference.html ASTMatcher的内容
简单来说
2 AST MATCHER
可以阅读第二个链接的英文的话建议直接阅读,下面实际上就是个比较粗糙的翻译。
参考:
- https://clang.llvm.org/docs/LibASTMatchers.html 如何使用AST-Matcher
- https://clang.llvm.org/docs/LibASTMatchersReference.html 一个具体的,使用AST Matcher的内容
- https://xinhuang.github.io/posts/2015-02-08-clang-tutorial-the-ast-matcher.html 给了一个AST Matcher的例子
AST Matcher主要包含三种不同类型的match:
- Node Matchers: 用来匹配特定类型的AST节点的MATCHER
- Narrowing Matchers: 用来匹配AST节点属性的MATCHER,可以用来缩小范围
- Traversal Matchers: 用来匹配在AST节点之间遍历的MATCHER
这里有几点要注意,AST节点完全展开实际上内部有很多的隐式转换,默认的AST Matcher运行的结构为AsIs模式,这个模式要求必须精确匹配/忽略内部节点,因此如果不追求精确匹配,建议使用IgnoreUnlessSpelledInSource模式。
使用clang query的话,需要使用下面命令来修改匹配模式
set traversal IgnoreUnlessSpelledInSource
如果是C++代码,比方说修改clang-tidy的源码,使用这个源码
Finder->addMatcher(traverse(TK_IgnoreUnlessSpelledInSource,
returnStmt(hasReturnArgument(integerLiteral(equals(0))))
), this);
2.1 匹配构造函数里面调用两次
stackoverflow上面的原链接:https://stackoverflow.com/questions/60435722/clang-ast-matchers-how-to-find-function-body-from-a-function-declaration
想匹配的代码是
class Dummy_file
{
FILE *f1_;
FILE *f2_;
public:
Dummy_file(const char* f1_name, const char* f2_name, const char * mode){
f1_ = fopen(f1_name, mode);
f2_ = fopen(f2_name, mode);
}
~Dummy_file(){
fclose(f1_);
fclose(f2_);
}
};
看上去还是比较直接的,相匹配的类别,其构造函数调用了两次fopen。对应到语法当中是有子节点,且子节点调用了两次fopen
直接看
//匹配构造函数
cxxConstructorDecl(
//有子节点,这不不能用has是为了传递
hasDescendant(
//调用了函数
callExpr(
callee(
//函数有名字fopen
functionDecl(hasName("fopen"))
)
).bind("fopencall")
)
).bind("ctr")
2.2 检测比较,比较对象一个是自定的类型
stackoverflow上面的原链接:https://stackoverflow.com/questions/59404925/clang-ast-matcher-for-variables-compared-to-different-variable-types
代码为
typedef int my_type;
void foo()
{
int x = 0;//this should be identified as need to be fixed
my_type z = 0;
if( x == z){
//match this case
}
}
比较的Matcher为
// Match binary operators
binaryOperator(
// that are equality comparisons,
hasOperatorName("=="),
// where one side refers to a variable
hasEitherOperand(ignoringImpCasts(declRefExpr(to(varDecl(
// whose type is a typedef or type alias
hasType(typedefNameDecl(
// named "::my_type"
hasName("::my_type"),
// that aliases any type, which is bound to the name "aliased",
hasType(type().bind("aliased"))))))))),
// and where one side refers to a variable
hasEitherOperand(ignoringImpCasts(declRefExpr(to(varDecl(
// whose type is the same as the type bound to "aliased",
// which is bound to the name "declToChange".
hasType(type(equalsBoundNode("aliased")))).bind("declToChange"))))));
3 CLANG-QUERY
这里我要先写一点东西,ast matcher是匹配到一个就完了的,也就是说
- https://firefox-source-docs.mozilla.org/code-quality/static-analysis/writing-new/clang-query.html 如何使用clang-query
- https://devblogs.microsoft.com/cppblog/exploring-clang-tooling-part-2-examining-the-clang-ast-with-clang-query/ 另一篇教学例子
单独拿出来CLANG-QUERY是为了能够验证AST或者看AST是否正常
举一个简单的例子,下面的代码
int f(int x) {
int result = (x / 42);
return result;
}
class un_init_double {
public:
un_init_double() {
init_param_ = 0;
}
bool compare(un_init_double& other) {
if (other.un_init_param_ == un_init_param_) {
return true;
}
return false;
}
private:
double un_init_param_;
double init_param_;
};
先dump一下AST树看下。
- 函数f的dump就很简单。就是一个FunctionDecl(附带着ParmVarDecl,这个后面用了DeclRefExpr引用)包着一个CompoundStmt:先一个DeclStmt—VarDecl,里面用了BinaryOperator。这里注意这里面有一个ImplicitCastExpr的隐式转换(从左值到右值),然后除以一个整数字面量IntegerLiteral。最后一个ReturnStmt内部有个隐式转换从左值到右值,而且用了旧的引用DeclRefExpr
- 然后是un_init_double的定义,可以看到提供了几个生成的构造函数。
TranslationUnitDecl 0x138046208 <<invalid sloc>> <invalid sloc>
.........
|-FunctionDecl 0x1300133a0 <test.cc:1:1, line:4:1> line:1:5 f 'int (int)'
| |-ParmVarDecl 0x1300132d0 <col:7, col:11> col:11 used x 'int'
| `-CompoundStmt 0x130013608 <col:14, line:4:1>
| |-DeclStmt 0x1300135a8 <line:2:3, col:24>
| | `-VarDecl 0x1300134a8 <col:3, col:23> col:7 used result 'int' cinit
| | `-ParenExpr 0x130013588 <col:16, col:23> 'int'
| | `-BinaryOperator 0x130013568 <col:17, col:21> 'int' '/'
| | |-ImplicitCastExpr 0x130013550 <col:17> 'int' <LValueToRValue>
| | | `-DeclRefExpr 0x130013510 <col:17> 'int' lvalue ParmVar 0x1300132d0 'x' 'int'
| | `-IntegerLiteral 0x130013530 <col:21> 'int' 42
| `-ReturnStmt 0x1300135f8 <line:3:3, col:10>
| `-ImplicitCastExpr 0x1300135e0 <col:10> 'int' <LValueToRValue>
| `-DeclRefExpr 0x1300135c0 <col:10> 'int' lvalue Var 0x1300134a8 'result' 'int'
`-CXXRecordDecl 0x130013628 <line:6:1, line:20:1> line:6:7 class un_init_double definition
|-DefinitionData pass_in_registers standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| |-DefaultConstructor exists non_trivial user_provided
| |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveConstructor exists simple trivial needs_implicit
| |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveAssignment exists simple trivial needs_implicit
| `-Destructor simple irrelevant trivial needs_implicit
|-CXXRecordDecl 0x130013748 <col:1, col:7> col:7 implicit referenced class un_init_double
|-AccessSpecDecl 0x1300137d8 <line:7:3, col:9> col:3 public
|-CXXConstructorDecl 0x130013888 <line:8:5, line:10:5> line:8:5 un_init_double 'void ()'
| `-CompoundStmt 0x130044ec0 <col:22, line:10:5>
| `-BinaryOperator 0x130044ea0 <line:9:7, col:21> 'double' lvalue '='
| |-MemberExpr 0x130044e38 <col:7> 'double' lvalue ->init_param_ 0x130044d78
| | `-CXXThisExpr 0x130044e28 <col:7> 'un_init_double *' implicit this
| `-ImplicitCastExpr 0x130044e88 <col:21> 'double' <IntegralToFloating>
| `-IntegerLiteral 0x130044e68 <col:21> 'int' 0
|-CXXMethodDecl 0x130044c28 <line:11:5, line:16:5> line:11:10 compare 'bool (un_init_double &)'
| |-ParmVarDecl 0x130013968 <col:18, col:34> col:34 used other 'un_init_double &'
| `-CompoundStmt 0x130045030 <col:41, line:16:5>
| |-IfStmt 0x130044ff0 <line:12:7, line:14:7>
| | |-BinaryOperator 0x130044f98 <line:12:11, col:35> 'bool' '=='
| | | |-ImplicitCastExpr 0x130044f68 <col:11, col:17> 'double' <LValueToRValue>
| | | | `-MemberExpr 0x130044ef8 <col:11, col:17> 'double' lvalue .un_init_param_ 0x130044d10
| | | | `-DeclRefExpr 0x130044ed8 <col:11> 'un_init_double' lvalue ParmVar 0x130013968 'other' 'un_init_double &'
| | | `-ImplicitCastExpr 0x130044f80 <col:35> 'double' <LValueToRValue>
| | | `-MemberExpr 0x130044f38 <col:35> 'double' lvalue ->un_init_param_ 0x130044d10
| | | `-CXXThisExpr 0x130044f28 <col:35> 'un_init_double *' implicit this
| | `-CompoundStmt 0x130044fd8 <col:51, line:14:7>
| | `-ReturnStmt 0x130044fc8 <line:13:9, col:16>
| | `-CXXBoolLiteralExpr 0x130044fb8 <col:16> 'bool' true
| `-ReturnStmt 0x130045020 <line:15:7, col:14>
| `-CXXBoolLiteralExpr 0x130045010 <col:14> 'bool' false
|-AccessSpecDecl 0x130044cd0 <line:17:3, col:10> col:3 private
|-FieldDecl 0x130044d10 <line:18:5, col:12> col:12 referenced un_init_param_ 'double'
`-FieldDecl 0x130044d78 <line:19:5, col:12> col:12 referenced init_param_ 'double'
ld: file too small (length=0) file '/var/folders/70/sz5vlj3n0qj2t4ktcw4qbpy00000gn/T/test-e8b440.o' for architecture arm64
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
希望找到一下几个地方:
- 返回值为int的函数
- 存在double成员,且未显式初始化的类别
- 存在double成员,且比较直接用的==而不是double减法比较
使用clang-query查看文件
clang-query test.cc --
3.1 匹配返回int的函数
首先看第一个匹配,返回值为int的函数,下面的匹配能够拿到这个结果。可是在老版本的clang-query(llvm 13)的版本会报错说没有hasReturnTyoeLoc这个matcher。老版本的clang-query怎么办呢?看第二个方法
# 这种方法在clang13 里面用不了
clang-query> match functionDecl(hasReturnTypeLoc(loc(asString("int"))))
# 这种理论上可以用,但是我还没有测试
clang-query> match functionDecl(returns(asString("int")))
3.2 匹配存在double成员,且未显式初始化的类别
这里面我参考了这篇文章:Detecting Uninitialized Variables in C++ with the Clang Static Analyzer∗
我理解这里未显式初始化匹配实际上就是匹配构造函数没有给double成员赋值。那么第一个思路类别的属性匹配,且没有调用=号。这个时候需要注意,我们需要引入逻辑运算属性matcher:allOf, anyOf, anything and unless。
一步一步来
- 构造函数调用了binaryOperator”=”号
- 匹配函数体里面有属性表达式的构造函数
# 匹配调用了=的构造函数
clang-query> match cxxConstructorDecl(hasDescendant(binaryOperator(hasOperatorName("="))))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:8:5: note: "root" binds here
un_init_double() {
^~~~~~~~~~~~~~~~~~
1 match.
clang-query>
#匹配函数体里面有属性表达式的构造函数
clang-query> match cxxConstructorDecl(hasDescendant(memberExpr()))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:8:5: note: "root" binds here
un_init_double() {
^~~~~~~~~~~~~~~~~~
1 match.
clang-query>
最后再加上我们要匹配的是构造函数,fieldDecl没赋值的。我目前没想到特别合适的,所以写了一个比较复杂的
// match record
cxxRecordDecl(
has(
// constuctor has init double fieldDecl with binaryoperator = , bind to init_double_field
cxxConstructorDecl(
hasDescendant(
binaryOperator(
hasOperatorName("="),
hasEitherOperand(memberExpr(hasDeclaration(fieldDecl(hasType(asString("double"))).bind("init_double_field"))))
)
)
)
),
has(
// match double field which didn't call binaryoperator = in constructor
fieldDecl(hasType(asString("double")), unless(equalsBoundNode("init_double_field"))).bind("un_init_double_field")
)
)
这个写起来比较复杂,但是目前看起来可以初步解决问题。但是实际上这个结果是错误的,为什么呢?因为init_double_field
实际上只匹配到了第一个init_param,即找到匹配的就返回,因此init_double_field是不充足的,如果也对un_init_param_做了赋值,那么init_doubel_field没办法匹配出来这个。因此要想办法让init_double_field的matcher绑定到每个binaryOperator上。最后我改成了下面的语法
cxxRecordDecl(
has(
cxxConstructorDecl(
forEachDescendant(
binaryOperator(
hasOperatorName("="),
hasEitherOperand(memberExpr(hasDeclaration(fieldDecl(hasType(asString("double"))).bind("init_double_field"))))
)
)
)
),
has(
fieldDecl(hasType(asString("double")), unless(equalsBoundNode("init_double_field"))).bind("un_init_double_field")
)
)
使用新的match匹配能够正确的进行检查
clang-query> match cxxRecordDecl(has(cxxConstructorDecl(forEachDescendant(binaryOperator(hasOperatorName("="),hasEitherOperand(memberExpr(hasDeclaration(fieldDecl(hasType(asString("double"))).bind("init_double_field")))))))), has(fieldDecl(hasType(asString("double")), unless(equalsBoundNode("init_double_field"))).bind("un_init_double_field")))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:19:5: note: "init_double_field" binds here
double init_param_;
^~~~~~~~~~~~~~~~~~
/home/qcraft/code_test/ast_dump/test.cpp:6:1: note: "root" binds here
class un_init_double {
^~~~~~~~~~~~~~~~~~~~~~
/home/qcraft/code_test/ast_dump/test.cpp:18:5: note: "un_init_double_field" binds here
double un_init_param_;
^~~~~~~~~~~~~~~~~~~~~
1 match.
clang-query>
3.3 存在double成员,且比较直接用的==而不是减法
相比于第二种,感觉这个就简单很多。毕竟没有那么复杂的环境
functionDecl(
hasDescendant(
binaryOperator(
hasOperatorName("=="),
hasEitherOperand(hasType(asString("double")))
)
)
)
clang-query> match functionDecl(hasDescendant(binaryOperator(hasOperatorName("=="), hasEitherOperand(hasType(asString("double"))))))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:11:5: note: "root" binds here
bool compare(un_init_double& other) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 match.
4 Clang-Tidy
参考链接:
- https://clang.llvm.org/extra/clang-tidy/Contributing.html
两个例子:
- 一个是检查出来,函数/类的名字以smzdm开头的
- 检查出来类,这个类要么有未初始化的double成员,要么有double成员的比较,而这个比较使用的是等号,而不是两个相减,小于某个数值
5 用clang-query来做些简单的代码分析
一些可以供参考的cpp-rule
- https://rules.sonarsource.com/cpp 源码分析软件sonar的规则
目前可以做到的
- 检测double的==比较
- 检测某些不安全的函数
- 检测boost::
唉,尴尬