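2021-08-09 LLVM Study Notes #
I recently wanted to introduce a static analysis check into our codebase, which requires the Clang AST matchers. These notes therefore cover two things: learning the Clang AST, and what I want to solve with AST matchers. They end with the checks I set up in Clang-Tidy and the problems those checks solve.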
1 AST #
References:
- https://clang.llvm.org/docs/IntroductionToTheClangAST.html Introduces the basic concepts; a good starting point
- https://jywhy6.zone/2020/11/27/clang-notes/#RecursiveASTVisitor-%20%E7%9B%B8%E5%85%B3 A fairly detailed blog post
- https://blog.csdn.net/qq_23599965/article/details/94595735 The relationships among the basic AST types in Clang
Clang reference documentation:
- https://clang.llvm.org/doxygen/ Full documentation
- https://clang.llvm.org/docs/LibASTMatchersReference.html AST matcher reference
In short
2 AST MATCHER #
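If you are comfortable reading English, I recommend reading the second link directly; what follows is essentially a rough translation of it.
References:
- https://clang.llvm.org/docs/LibASTMatchers.html How to use AST matchers
- https://clang.llvm.org/docs/LibASTMatchersReference.html The detailed reference of available AST matchers
- https://xinhuang.github.io/posts/2015-02-08-clang-tutorial-the-ast-matcher.html A worked AST matcher example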
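AST matchers come in three kinds:
- Node Matchers: match AST nodes of a specific type
- Narrowing Matchers: match on attributes of AST nodes, and can be used to narrow down the set of matched nodes
- Traversal Matchers: allow traversal between AST nodes
A few things to note here. A fully expanded AST contains many implicit nodes, such as implicit casts. By default AST matchers run in AsIs traversal mode, in which a matcher must either match these internal nodes exactly or skip them explicitly (for example with ignoringImpCasts). If you do not need that level of precision, IgnoreUnlessSpelledInSource mode is the better choice.
In clang-query, change the traversal mode with the following command: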
set traversal IgnoreUnlessSpelledInSource
In C++ code, for example when modifying the clang-tidy sources, wrap the matcher in traverse() like this:
Finder->addMatcher(traverse(TK_IgnoreUnlessSpelledInSource,
  returnStmt(hasReturnArgument(integerLiteral(equals(0))))
  ), this);
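To make the difference between the two modes concrete, here is a small hypothetical test input (the function names are made up for illustration and do not come from any real codebase). In the function returning double, the literal 0 is wrapped in an implicit IntegralToFloating cast, the same kind of node that shows up in the constructor of the AST dump in section 3. The expectation, stated as an assumption rather than a verified result, is that the matcher above reaches the integerLiteral in that function only when implicit nodes are skipped, while the function returning int has no cast in the way.

```cpp
// Hypothetical test input for
//   returnStmt(hasReturnArgument(integerLiteral(equals(0))))

// No implicit cast: the argument of the ReturnStmt is the IntegerLiteral itself.
int returns_int_zero() {
  return 0;
}

// The literal is converted from int to double, so the argument of the
// ReturnStmt is an ImplicitCastExpr that wraps the IntegerLiteral.
double returns_double_zero() {
  return 0;
}
```

Dumping the AST of this file before writing the matcher is a quick way to see which of the two shapes a given return statement has.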
2.1 Matching two fopen calls inside a constructor #
Original Stack Overflow question: https://stackoverflow.com/questions/60435722/clang-ast-matchers-how-to-find-function-body-from-a-function-declaration
The code to match:
#include <cstdio>

class Dummy_file
{
  FILE *f1_;
  FILE *f2_;
  public:
    Dummy_file(const char* f1_name, const char* f2_name, const char * mode){
        f1_ = fopen(f1_name, mode);
        f2_ = fopen(f2_name, mode);
    }
    ~Dummy_file(){
        fclose(f1_);
        fclose(f2_);
    }
};
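This looks fairly straightforward: we want to match a class whose constructor calls fopen twice. In AST terms, the constructor has descendant nodes, and those descendants include calls to fopen.
Here is the matcher. As written, it matches any constructor that has at least one fopen call among its descendants:
// Match a constructor
cxxConstructorDecl(
    // that has a descendant; has() only looks at direct children, so it cannot be used here
    hasDescendant(
        // which is a function call
        callExpr(
            callee(
                // whose callee is a function named fopen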
                functionDecl(hasName("fopen"))
            )
        ).bind("fopencall")
    )
).bind("ctr")
2.2 Detecting comparisons where one operand has a user-defined type #
Original Stack Overflow question: https://stackoverflow.com/questions/59404925/clang-ast-matcher-for-variables-compared-to-different-variable-types
The code:
typedef int my_type;
void foo()
{
       int x = 0;//this should be identified as need to be fixed
       my_type z = 0;
       if( x == z){
               //match this case
       }
}
The matcher for this comparison:
// Match binary operators
binaryOperator(
    // that are equality comparisons,
    hasOperatorName("=="),
    // where one side refers to a variable
    hasEitherOperand(ignoringImpCasts(declRefExpr(to(varDecl(
        // whose type is a typedef or type alias
        hasType(typedefNameDecl(
            // named "::my_type"
            hasName("::my_type"),
            // that aliases any type, which is bound to the name "aliased",
            hasType(type().bind("aliased"))))))))),
    // and where one side refers to a variable
    hasEitherOperand(ignoringImpCasts(declRefExpr(to(varDecl(
        // whose type is the same as the type bound to "aliased",
        // which is bound to the name "declToChange".
        hasType(type(equalsBoundNode("aliased")))).bind("declToChange"))))));
3 CLANG-QUERY #
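One thing to note up front: an AST matcher stops as soon as it has found one match (section 3.2 shows how this affects hasDescendant).
- https://firefox-source-docs.mozilla.org/code-quality/static-analysis/writing-new/clang-query.html How to use clang-query
- https://devblogs.microsoft.com/cppblog/exploring-clang-tooling-part-2-examining-the-clang-ast-with-clang-query/ Another tutorial with examples
clang-query gets its own section because it lets you verify a matcher against the AST and check that the AST looks the way you expect.
Take a simple example, using the code below: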
int f(int x) {
  int result = (x / 42);
  return result;
}
class un_init_double {
  public:
    un_init_double() {
      init_param_ = 0;
    }
    bool compare(un_init_double& other) {
      if (other.un_init_param_ == un_init_param_) {
        return true;
      }
      return false;
    }
  private:
    double un_init_param_;
    double init_param_;
};
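First, dump the AST and take a look.
- The dump of function f is simple: a FunctionDecl (with a ParmVarDecl, which is later referenced through a DeclRefExpr) wrapping a CompoundStmt. The CompoundStmt first contains a DeclStmt with a VarDecl, whose initializer uses a BinaryOperator. Note the ImplicitCastExpr on its left operand (an lvalue-to-rvalue conversion), which is then divided by the integer literal IntegerLiteral. Finally there is a ReturnStmt, which also contains an lvalue-to-rvalue implicit cast around a DeclRefExpr that refers back to the variable declared earlier.
- Next comes the definition of un_init_double, where you can see that the compiler provides several implicitly generated constructors.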
TranslationUnitDecl 0x138046208 <<invalid sloc>> <invalid sloc>
.........
|-FunctionDecl 0x1300133a0 <test.cc:1:1, line:4:1> line:1:5 f 'int (int)'
| |-ParmVarDecl 0x1300132d0 <col:7, col:11> col:11 used x 'int'
| `-CompoundStmt 0x130013608 <col:14, line:4:1>
|   |-DeclStmt 0x1300135a8 <line:2:3, col:24>
|   | `-VarDecl 0x1300134a8 <col:3, col:23> col:7 used result 'int' cinit
|   |   `-ParenExpr 0x130013588 <col:16, col:23> 'int'
|   |     `-BinaryOperator 0x130013568 <col:17, col:21> 'int' '/'
|   |       |-ImplicitCastExpr 0x130013550 <col:17> 'int' <LValueToRValue>
|   |       | `-DeclRefExpr 0x130013510 <col:17> 'int' lvalue ParmVar 0x1300132d0 'x' 'int'
|   |       `-IntegerLiteral 0x130013530 <col:21> 'int' 42
|   `-ReturnStmt 0x1300135f8 <line:3:3, col:10>
|     `-ImplicitCastExpr 0x1300135e0 <col:10> 'int' <LValueToRValue>
|       `-DeclRefExpr 0x1300135c0 <col:10> 'int' lvalue Var 0x1300134a8 'result' 'int'
`-CXXRecordDecl 0x130013628 <line:6:1, line:20:1> line:6:7 class un_init_double definition
  |-DefinitionData pass_in_registers standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
  | |-DefaultConstructor exists non_trivial user_provided
  | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
  | |-MoveConstructor exists simple trivial needs_implicit
  | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
  | |-MoveAssignment exists simple trivial needs_implicit
  | `-Destructor simple irrelevant trivial needs_implicit
  |-CXXRecordDecl 0x130013748 <col:1, col:7> col:7 implicit referenced class un_init_double
  |-AccessSpecDecl 0x1300137d8 <line:7:3, col:9> col:3 public
  |-CXXConstructorDecl 0x130013888 <line:8:5, line:10:5> line:8:5 un_init_double 'void ()'
  | `-CompoundStmt 0x130044ec0 <col:22, line:10:5>
  |   `-BinaryOperator 0x130044ea0 <line:9:7, col:21> 'double' lvalue '='
  |     |-MemberExpr 0x130044e38 <col:7> 'double' lvalue ->init_param_ 0x130044d78
  |     | `-CXXThisExpr 0x130044e28 <col:7> 'un_init_double *' implicit this
  |     `-ImplicitCastExpr 0x130044e88 <col:21> 'double' <IntegralToFloating>
  |       `-IntegerLiteral 0x130044e68 <col:21> 'int' 0
  |-CXXMethodDecl 0x130044c28 <line:11:5, line:16:5> line:11:10 compare 'bool (un_init_double &)'
  | |-ParmVarDecl 0x130013968 <col:18, col:34> col:34 used other 'un_init_double &'
  | `-CompoundStmt 0x130045030 <col:41, line:16:5>
  |   |-IfStmt 0x130044ff0 <line:12:7, line:14:7>
  |   | |-BinaryOperator 0x130044f98 <line:12:11, col:35> 'bool' '=='
  |   | | |-ImplicitCastExpr 0x130044f68 <col:11, col:17> 'double' <LValueToRValue>
  |   | | | `-MemberExpr 0x130044ef8 <col:11, col:17> 'double' lvalue .un_init_param_ 0x130044d10
  |   | | |   `-DeclRefExpr 0x130044ed8 <col:11> 'un_init_double' lvalue ParmVar 0x130013968 'other' 'un_init_double &'
  |   | | `-ImplicitCastExpr 0x130044f80 <col:35> 'double' <LValueToRValue>
  |   | |   `-MemberExpr 0x130044f38 <col:35> 'double' lvalue ->un_init_param_ 0x130044d10
  |   | |     `-CXXThisExpr 0x130044f28 <col:35> 'un_init_double *' implicit this
  |   | `-CompoundStmt 0x130044fd8 <col:51, line:14:7>
  |   |   `-ReturnStmt 0x130044fc8 <line:13:9, col:16>
  |   |     `-CXXBoolLiteralExpr 0x130044fb8 <col:16> 'bool' true
  |   `-ReturnStmt 0x130045020 <line:15:7, col:14>
  |     `-CXXBoolLiteralExpr 0x130045010 <col:14> 'bool' false
  |-AccessSpecDecl 0x130044cd0 <line:17:3, col:10> col:3 private
  |-FieldDecl 0x130044d10 <line:18:5, col:12> col:12 referenced un_init_param_ 'double'
  `-FieldDecl 0x130044d78 <line:19:5, col:12> col:12 referenced init_param_ 'double'
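We want to find the following:
- Functions that return int
- Classes that have a double member which is not explicitly initialized
- Classes that have a double member which is compared directly with == rather than by comparing a difference
Open the file with clang-query: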
clang-query test.cc --
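3.1 Matching functions that return int #
Start with the first target, functions that return int. The first matcher below finds them, but an older clang-query (LLVM 13) reports an error saying there is no matcher named hasReturnTypeLoc. What can be done on the older version? See the second approach.
# This approach does not work with clang 13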
clang-query> match functionDecl(hasReturnTypeLoc(loc(asString("int"))))
# This one should work in principle, but I have not tested it yet
clang-query> match functionDecl(returns(asString("int")))
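3.2 Matching classes with a double member that is not explicitly initialized #
Here I referred to this paper: Detecting Uninitialized Variables in C++ with the Clang Static Analyzer
As I understand it, matching "not explicitly initialized" here really means matching a constructor that does not assign to the double member. So the first idea is to match on the class's fields and on the absence of an = assignment. For this we need the logical matchers: allOf, anyOf, anything and unless.
Step by step:
- The constructor uses the binaryOperator "="
- Match constructors whose body contains a member expression
# Match constructors that use =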
clang-query> match cxxConstructorDecl(hasDescendant(binaryOperator(hasOperatorName("="))))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:8:5: note: "root" binds here
    un_init_double() {
    ^~~~~~~~~~~~~~~~~~
1 match.
clang-query> 
# Match constructors whose body contains a member expression
clang-query> match cxxConstructorDecl(hasDescendant(memberExpr()))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:8:5: note: "root" binds here
    un_init_double() {
    ^~~~~~~~~~~~~~~~~~
1 match.
clang-query> 
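Finally, what we actually want to match is the fieldDecl that the constructor does not assign. I have not come up with anything particularly elegant so far, so I wrote a fairly complicated matcher: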
// match record
cxxRecordDecl(
  has(
    // constructor initializes a double fieldDecl with binaryOperator =, bound to init_double_field
    cxxConstructorDecl(
      hasDescendant(
        binaryOperator(
          hasOperatorName("="),
          hasEitherOperand(memberExpr(hasDeclaration(fieldDecl(hasType(asString("double"))).bind("init_double_field"))))
        )
      )
    )
  ),
  has(
    // match a double field that is not assigned with binaryOperator = in the constructor
    fieldDecl(hasType(asString("double")), unless(equalsBoundNode("init_double_field"))).bind("un_init_double_field")
  )
)
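This is somewhat complicated to write, and at first glance it seems to solve the problem. In fact the result is wrong. Why? Because init_double_field only ever binds the first match, init_param_: hasDescendant returns as soon as it finds a matching node. The init_double_field binding is therefore incomplete. If the constructor also assigned un_init_param_, init_double_field would not capture that assignment. So we need a way to make the init_double_field matcher bind on every matching binaryOperator. I ended up changing the matcher to the following: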
cxxRecordDecl(
  has(
    cxxConstructorDecl(
      forEachDescendant(
        binaryOperator(
          hasOperatorName("="),
          hasEitherOperand(memberExpr(hasDeclaration(fieldDecl(hasType(asString("double"))).bind("init_double_field"))))
        )
      )
    )
  ), 
  has(
      fieldDecl(hasType(asString("double")), unless(equalsBoundNode("init_double_field"))).bind("un_init_double_field")
  )
)
With the new matcher the check produces the correct result:
clang-query> match cxxRecordDecl(has(cxxConstructorDecl(forEachDescendant(binaryOperator(hasOperatorName("="),hasEitherOperand(memberExpr(hasDeclaration(fieldDecl(hasType(asString("double"))).bind("init_double_field")))))))), has(fieldDecl(hasType(asString("double")), unless(equalsBoundNode("init_double_field"))).bind("un_init_double_field")))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:19:5: note: "init_double_field" binds here
    double init_param_;
    ^~~~~~~~~~~~~~~~~~
/home/qcraft/code_test/ast_dump/test.cpp:6:1: note: "root" binds here
class un_init_double {
^~~~~~~~~~~~~~~~~~~~~~
/home/qcraft/code_test/ast_dump/test.cpp:18:5: note: "un_init_double_field" binds here
    double un_init_param_;
    ^~~~~~~~~~~~~~~~~~~~~
1 match.
clang-query> 
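The case that motivated the switch to forEachDescendant is a constructor that assigns more than one double member, which the original test file does not contain. A minimal sketch of such a test input could look like this; the class name both_init_double and its member names are hypothetical. Following the reasoning above, neither field should end up bound to un_init_double_field with the forEachDescendant version, though that outcome is an expectation and has not been checked here.

```cpp
// Hypothetical extra test input: every double member is assigned in the
// constructor, so a correct check should not report this class.
class both_init_double {
  public:
    both_init_double() {
      // Two separate assignments, so the constructor body contains two
      // binaryOperator "=" nodes whose left-hand side is a double field.
      first_ = 0;
      second_ = 0;
    }
    double sum() const { return first_ + second_; }
  private:
    double first_;
    double second_;
};
```

Appending this class to test.cpp and rerunning both matchers would show whether the hasDescendant version reports a field here that the forEachDescendant version does not.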
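3.3 Classes with a double member compared directly with == instead of by subtraction #
Compared with the second case, this one feels much simpler, since the context is not nearly as complicated.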
functionDecl(
    hasDescendant(
        binaryOperator(
            hasOperatorName("=="), 
            hasEitherOperand(hasType(asString("double")))
        )
    )
)
clang-query> match functionDecl(hasDescendant(binaryOperator(hasOperatorName("=="), hasEitherOperand(hasType(asString("double"))))))
Match #1:
/home/qcraft/code_test/ast_dump/test.cpp:11:5: note: "root" binds here
    bool compare(un_init_double& other) {
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 match.
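For reference, the comparison this check steers people toward is the "subtract and compare against a threshold" form mentioned in the goals. A minimal sketch follows; the helper name almost_equal, the class compared_double and the default tolerance of 1e-9 are assumptions chosen for illustration, and an absolute tolerance like this only makes sense when the values have a known, modest magnitude.

```cpp
#include <cmath>

// Hypothetical helper: treats two doubles as equal when their absolute
// difference is below a caller-chosen tolerance.
bool almost_equal(double a, double b, double tolerance = 1e-9) {
  return std::fabs(a - b) < tolerance;
}

// Rewritten form of the flagged comparison: the == between the two double
// members becomes a call to the tolerance-based helper.
class compared_double {
  public:
    explicit compared_double(double value) : value_(value) {}
    bool compare(const compared_double& other) const {
      return almost_equal(other.value_, value_);
    }
  private:
    double value_;
};
```

With this helper, values such as 0.1 + 0.2 and 0.3 compare as equal even though they are not bit-identical as IEEE 754 doubles.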
4 Clang-Tidy #
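Reference links:
Two examples:
- One check flags functions/classes whose names start with smzdm
- One check flags classes that either have an uninitialized double member, or compare double members with == instead of checking that the difference of the two is smaller than some threshold
5 Simple code analysis with clang-query #
Some C++ rule sets for reference:
- https://rules.sonarsource.com/cpp Rules from the Sonar source code analysis tool
What can be done so far:
- Detect == comparisons on double
- Detect certain unsafe functions
- Detect uses of boost::
Sigh, awkward.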
  