RetDec v4.0 is out


RetDec is an open-source machine-code decompiler based on LLVM. It isn’t limited to a single target architecture, operating system, or executable file format:

  • Runs on Windows, Linux, and macOS.
  • Supports all the major object-file formats: Windows PE, Unix ELF, macOS Mach-O.
  • Supports all the prevailing architectures: x86, x64, arm, arm64, mips, powerpc.

Since the initial public release in December 2017, we have released three more stable versions. The history so far:

  • v3.0: The initial public release.
  • v3.1: Added macOS support, simplified the repository structure, and reimplemented the recursive traversal decoder.
  • v3.2: Replaced all shell scripts with Python and thus made the usage much simpler.
  • v3.3: Added the x64 architecture, added FreeBSD support (maintained by the community), and deployed a new LLVM-IR-to-BIR converter.

Now, we are glad to announce the release of version 4.0 with the following major features:

  • arm64 architecture
  • JSON output option
  • New build system
  • retdec library

See the changelog for the complete list of new features, enhancements, and fixes.

1. arm64 architecture

This one is clear — now you can decompile arm64 binary files with RetDec!

Adding a new architecture is isolated to the capstone2llvmir library. Thus, it is doable with little knowledge about the rest of RetDec. In fact, the library already supports mips64 and powerpc64 as well. These aren’t yet enabled by RetDec itself because we haven’t got around to testing them adequately. Any architecture included in Capstone could be implemented. We even put together a how-to-do-it wiki page so that anyone can contribute.

2. JSON output option

As one would expect, RetDec by default produces C source code as its output. This is fine for consumption by humans, but what if another program wants to make use of it? Parsing high-level-language source code isn’t trivial. Furthermore, additional meta-information may be required to enhance user experience or automated analysis, and such information is hard to convey in a traditional high-level language.

For this reason, we added an option to generate the output as a sequence of annotated lexer tokens. Two output formats are possible:

  • -f json-human: JSON formatted for human reading
  • -f json: plain JSON intended for machine consumption

This means that if you run retdec-decompiler.py -f json-human input, you get the following output:

{
    "tokens": [
        { "addr": "0x804851c" },
        { "kind": "i_var", "val": "result" },
        { "addr": "0x804854c" },
        { "kind": "ws", "val": " " },
        { "kind": "op", "val": "=" },
        { "kind": "ws", "val": " " },
        { "kind": "i_var", "val": "ack" },
        { "kind": "punc", "val": "(" },
        { "kind": "i_var", "val": "m" },
        { "kind": "ws", "val": " " },
        { "kind": "op", "val": "-" },
        { "kind": "ws", "val": " " },
        { "kind": "l_int", "val": "1" },
        { "kind": "op", "val": "," },
        { "kind": "ws", "val": " " },
        { "kind": "l_int", "val": "1" },
        { "kind": "punc", "val": ")" },
        { "kind": "punc", "val": ";" }
    ],
    "language": "C"
}

instead of this one:

result = ack(m - 1, 1);

In addition to the source-code token values, there is meta-information on token types, and even assembly instruction addresses from which these tokens were generated. The addresses are on a per-command basis at the moment, but we plan to make them even more granular in the future. See the Decompiler outputs wiki page for more details.
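
As a rough sketch of how another program might consume this output, the following Python snippet rebuilds the C statement from the token list, groups the text by instruction address, and picks out the identifier tokens. The function names are hypothetical helpers and not part of RetDec. The snippet also assumes that an address entry applies to every token that follows it until the next address entry, which is how the sample above reads.

```python
import json

# Sample copied from the JSON listing above (compacted).
SAMPLE = """
{
  "tokens": [
    {"addr": "0x804851c"},
    {"kind": "i_var", "val": "result"},
    {"addr": "0x804854c"},
    {"kind": "ws", "val": " "},
    {"kind": "op", "val": "="},
    {"kind": "ws", "val": " "},
    {"kind": "i_var", "val": "ack"},
    {"kind": "punc", "val": "("},
    {"kind": "i_var", "val": "m"},
    {"kind": "ws", "val": " "},
    {"kind": "op", "val": "-"},
    {"kind": "ws", "val": " "},
    {"kind": "l_int", "val": "1"},
    {"kind": "op", "val": ","},
    {"kind": "ws", "val": " "},
    {"kind": "l_int", "val": "1"},
    {"kind": "punc", "val": ")"},
    {"kind": "punc", "val": ";"}
  ],
  "language": "C"
}
"""


def render_tokens(doc):
    """Return (source_text, [(address, text), ...]) for a token document."""
    source = []
    by_addr = []  # [address, accumulated text] in order of appearance
    for tok in doc["tokens"]:
        if "addr" in tok:
            # Assumption: this address covers all following tokens
            # until the next address entry.
            by_addr.append([tok["addr"], ""])
        elif "val" in tok:
            source.append(tok["val"])
            if by_addr:
                by_addr[-1][1] += tok["val"]
    return "".join(source), [(addr, text) for addr, text in by_addr]


def identifiers(doc):
    """Collect the values of all tokens tagged with the i_var kind."""
    return [t["val"] for t in doc["tokens"] if t.get("kind") == "i_var"]


doc = json.loads(SAMPLE)
text, chunks = render_tokens(doc)
names = identifiers(doc)

print(text)
for addr, chunk in chunks:
    print(addr, repr(chunk))
print(names)
```

Concatenating the token values is enough to recover the source text because whitespace is emitted as tokens of its own, so a consumer never has to guess at the formatting.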

The JSON output option is currently used in RetDec’s Radare2 plugin and the upcoming IDA plugin v1.0. Feel free to use it in your projects as well.

3. New build system

RetDec is a collection of libraries, executables, and resources. Chained together by a script, they form the decompiler itself: retdec-decompiler.py. But what about all the individual components? Couldn’t they be useful on their own?

Most definitely they could!

Until now the RetDec components weren’t easy to use. As of version 4.0, the installation contains all the resources necessary to utilize them in other CMake projects.

If RetDec is installed into a standard system location (e.g. /usr), its library components can be used as simply as:

find_package(retdec 4.0 REQUIRED 
   COMPONENTS 
      <component> 
      [...]
)
target_link_libraries(your-project 
   PUBLIC 
      retdec::<component> 
      [...]
)

If it isn’t installed somewhere where it can be discovered, CMake needs help before find_package() is used. There are generally two ways to do it:

  1. Add the RetDec installation directory to CMAKE_PREFIX_PATH:
     list(APPEND CMAKE_PREFIX_PATH ${RETDEC_INSTALL_DIR})
  2. Set retdec_DIR to the path of the installed RetDec CMake scripts:
     set(retdec_DIR ${RETDEC_INSTALL_DIR}/share/retdec/cmake)

It is also possible to configure the build system to produce only the selected component(s). This can significantly speed up compilation. The desired components can be enabled at CMake-configuration time by one of these parameters:

-D RETDEC_ENABLE_<component>=ON [...]
-D RETDEC_ENABLE=component[,...]

See Repository Overview for the list of available RetDec components, retdec-build-system-tests for component demos, and Build Instructions for the list of possible CMake options.

4. retdec library

Well, now that we can use various RetDec libraries, can we use the whole RetDec decompiler as a library?

Not yet. But we should!

In fact, the vast majority of RetDec functionality is in libraries as it is. The retdec-decompiler.py script and other related scripts just put it all together, and they are kind of remnants of the past. There is no reason why even the decompilation itself couldn’t be provided by a library. Then, we could use it in various front-ends, replacing hacked-together Python scripts. Other prime users would be the already mentioned IDA and Radare2 plugins.

We aren’t there yet, but version 4.0 moves in this direction. It adds a new library called retdec, which will eventually implement a comprehensive decompilation interface. As a first step, it currently offers disassembling functionality: a full recursive traversal decoding of a given input file into an LLVM IR module and a structured (functions and basic blocks) Capstone disassembly.

It also provides us with a good opportunity to demonstrate most of the things this article talked about. The following source code is all that’s needed to get to a complete LLVM IR and Capstone disassembly of an input file:

#include <iostream>
#include <string>
#include <retdec/retdec/retdec.h>
#include <retdec/llvm/Support/raw_ostream.h>

int main(int argc, char* argv[])
{
   if (argc != 2)
   {
      llvm::errs() << "Expecting path to input\n";
      return 1;
   }
   std::string input = argv[1];

   retdec::common::FunctionSet fs;
   retdec::LlvmModuleContextPair llvm = 
      retdec::disassemble(input, &fs);

   // Dump entire LLVM IR module.
   llvm::outs() << *llvm.module;
    
   // Dump functions, basic blocks, instructions.
   for (auto& f : fs)
   {
      llvm::outs() << f.getName() << " @ " << f << "\n";
      for (auto& bb : f.basicBlocks)
      {
         llvm::outs() << "\t" << "bb @ " << bb << "\n";
         // These are not only text entries.
         // There is a full Capstone instruction.
         for (auto* i : bb.instructions)
         {
            llvm::outs() << "\t\t"
                << retdec::common::Address(i->address) 
                << ": " << i->mnemonic
                << " " << i->op_str
                << "\n";
         }
      }
   }
    
   return 0;
}

The CMake script building it looks simply like this:

cmake_minimum_required(VERSION 3.6)
project(demo)

find_package(retdec 4.0 REQUIRED 
   COMPONENTS 
      retdec 
      llvm
)

add_executable(demo demo.cpp)
target_link_libraries(demo 
   retdec::retdec
   retdec::deps::llvm
)

If RetDec is installed somewhere where it can be discovered, the demo can be built simply with:

cmake ..
make

If it is not, one option is to set the path to installed CMake scripts:

cmake .. -Dretdec_DIR=$RETDEC_INSTALL_DIR/share/retdec/cmake
make

If we are building RetDec ourselves, we can configure CMake to enable only the retdec library with cmake .. -DRETDEC_ENABLE_RETDEC=ON.

What’s next?

We believe that for effective and efficient manual malware analysis it is best to selectively decompile only the interesting functions, interact with the results, and gradually build an understanding of the inspected binary. Such a workflow is enabled by RetDec’s IDA and Radare2 plugins, but not so much by its native one-off mode of operation, especially while performance on medium-to-large files is still an ongoing issue. We also believe in the ever-increasing role of advanced automated malware analysis.

For these reasons, RetDec will move further in the direction outlined in the previous section. Having all the decompilation functionality available in a set of libraries will enable us to build better tools for both manual and automated malware analysis.

Reversing tools series

With this introductory piece, we are starting a series of articles focused on the engineering behind reversing. So, if you are interested in the inner workings of such tools, look out for new posts here!

