Integrating LIEF and Ghidra's C++ Decompiler Engine

For DaiC, our modern collaborative decompiler assisted by AI, we combined the physical parsing capabilities of LIEF with the powerful semantic decompilation of Ghidra. In this article, we explain how to compile Ghidra’s libdecomp into a standalone library with CMake. We then dissect the minimal C++ classes required to connect Ghidra’s abstract memory map to LIEF’s parsing of the binary on disk. Finally, we use LIEF to parse a PE executable and feed that data directly into the Ghidra decompiler engine. You can use DaiCParse and LiefGhidraEngine to follow along with this article.

  1. Building Ghidra
  2. Overriding libdecomp Classes
  3. Initialize Ghidra Engine
  4. Decompile a Function
  5. Conclusion & Future Work

Building Ghidra

To extract libdecomp as a standalone library, our CMake configuration requires a few specific dependencies. Crucially, we need the ZLIB package (via find_package(ZLIB REQUIRED)) to unpack compressed SLEIGH specification files. Our other dependencies include LIEF for binary parsing, pugixml for lightweight XML processing, and DaiCParse for extended utility functions.

target_link_libraries(${TARGET_NAME} LIEF::LIEF)
target_link_libraries(${TARGET_NAME} ZLIB::ZLIB)
target_link_libraries(${TARGET_NAME} pugixml)
target_link_libraries(${TARGET_NAME} DaiCParse)

Ghidra’s libdecomp was originally engineered to function both as a standalone command-line diagnostic utility and as a backend processing server communicating with the Ghidra Java front-end via a custom binary protocol. To build it cleanly, we must target the source directory at Ghidra/Features/Decompiler/src/decompile/cpp and actively filter out conflicting main functions.

set(GHIDRA_SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/ghidra/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp")
file(GLOB GHIDRA_SOURCES "${GHIDRA_SRC_DIR}/*.cc")
# Filter out conflicting mains
list(FILTER GHIDRA_SOURCES EXCLUDE REGEX ".*consolemain\\.cc$")
list(FILTER GHIDRA_SOURCES EXCLUDE REGEX ".*slgh_compile\\.cc$")
list(FILTER GHIDRA_SOURCES EXCLUDE REGEX ".*testfunction\\.cc$")

Ghidra relies on P-code, an intermediate representation that the SLEIGH translator produces from raw machine-code bytes. We must therefore compile the .slaspec files under ghidra/Ghidra/Processors to generate the .sla files required by the engine.

file(GLOB_RECURSE SLASPEC_FILES "ghidra/Ghidra/Processors/*.slaspec")
file(GLOB_RECURSE CSPEC_FILES "ghidra/Ghidra/Processors/*.cspec")
file(GLOB_RECURSE LDEFS_FILES "ghidra/Ghidra/Processors/*.ldefs")
file(GLOB_RECURSE PSPEC_FILES "ghidra/Ghidra/Processors/*.pspec")
set(SLAFILES "")
set(SLEIGH_BASE "${CMAKE_CURRENT_BINARY_DIR}/sleigh")
file(MAKE_DIRECTORY "${SLEIGH_BASE}")
# Compile every .slaspec into a .sla file.
# sleighc is assumed to be an executable target defined elsewhere in the build.
foreach(slaspec ${SLASPEC_FILES})
    get_filename_component(sleigh_name "${slaspec}" NAME_WE)
    get_filename_component(sleigh_dir "${slaspec}" DIRECTORY)
    set(sla_file "${SLEIGH_BASE}/${sleigh_name}.sla")
    add_custom_command(OUTPUT "${sla_file}"
        COMMAND sleighc "${slaspec}" "${sla_file}"
        MAIN_DEPENDENCY "${slaspec}"
        WORKING_DIRECTORY "${sleigh_dir}"
        DEPENDS sleighc)
    list(APPEND SLAFILES "${sla_file}")
endforeach()
add_custom_target(sla ALL DEPENDS ${SLAFILES})

Link to the rz-ghidra snippet

Overriding libdecomp Classes

To force libdecomp to analyze a static binary parsed by LIEF, we need to implement concrete subclasses for several abstract C++ interfaces.

Ghidra asks for bytes; we must provide them. By creating a LiefLoadImage class, we override the loadFill(uint1 *ptr, int4 size, const Address &addr) method. When Ghidra asks for 16 bytes at virtual address 0x401000, our LiefLoadImage queries the LIEF Binary object, locates the correct section, and copies the data into the pointer. If the memory doesn’t exist, we throw a DataUnavailError.
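The lookup described above can be sketched without either library. The following is a minimal, self-contained illustration built on stand-in types: MockSection, MockBinary, DataUnavailable, and load_fill are hypothetical placeholders for the parsed sections, the binary object, Ghidra’s DataUnavailError, and the loadFill override. They are not the real LIEF or Ghidra APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a parsed section: virtual address plus raw bytes.
struct MockSection {
    std::uint64_t virtual_address;
    std::vector<std::uint8_t> content;
};

// Hypothetical stand-in for the parsed binary.
struct MockBinary {
    std::vector<MockSection> sections;
};

// Hypothetical stand-in for Ghidra's DataUnavailError.
struct DataUnavailable : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Copy `size` bytes starting at virtual address `addr` into `ptr`.
// Throws when the whole range is not backed by a single section.
void load_fill(const MockBinary& bin, std::uint8_t* ptr, std::size_t size,
               std::uint64_t addr) {
    for (const MockSection& sec : bin.sections) {
        const std::uint64_t start = sec.virtual_address;
        const std::uint64_t len = sec.content.size();
        // The start must fall inside the section and the range must fit.
        if (addr >= start && addr - start < len && size <= len - (addr - start)) {
            std::memcpy(ptr, sec.content.data() + (addr - start), size);
            return;
        }
    }
    throw DataUnavailable("no section backs the requested range");
}
```

A production loader would likely also need to handle requests that span two sections and the zero-filled tail of a section whose virtual size exceeds its size on disk; this sketch rejects both cases.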

The SleighArchitecture (derived from the Architecture base class) manages the entire decompilation lifecycle. We must create a LiefArchitecture class to hijack the default Ghidra setup pipeline and inject our LIEF-backed components.
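Ghidra’s Architecture drives its setup through virtual build hooks such as buildLoader, which is what makes this kind of injection possible. The pattern can be sketched in isolation as follows. Every class here (ByteSource, BaseArchitecture, FileBackedSource, ParsedBinaryArchitecture) is a hypothetical, heavily simplified stand-in, and the hook signature differs from the real one.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the byte provider role that LoadImage plays in Ghidra.
struct ByteSource {
    virtual ~ByteSource() = default;
    virtual std::string describe() const = 0;
};

// Stand-in for the base architecture: init() runs a fixed sequence and
// delegates the construction of each component to a virtual hook.
class BaseArchitecture {
public:
    virtual ~BaseArchitecture() = default;
    void init() {
        loader_ = buildLoader();
        log_.push_back("loader: " + loader_->describe());
    }
    const std::vector<std::string>& log() const { return log_; }

protected:
    virtual std::unique_ptr<ByteSource> buildLoader() = 0;

private:
    std::unique_ptr<ByteSource> loader_;
    std::vector<std::string> log_;
};

// Stand-in for a loader backed by an already parsed binary.
struct FileBackedSource : ByteSource {
    explicit FileBackedSource(std::string path) : path_(std::move(path)) {}
    std::string describe() const override { return "parsed binary " + path_; }
    std::string path_;
};

// The subclass only overrides the hook; the setup sequence stays in the base.
class ParsedBinaryArchitecture : public BaseArchitecture {
public:
    explicit ParsedBinaryArchitecture(std::string path) : path_(std::move(path)) {}

protected:
    std::unique_ptr<ByteSource> buildLoader() override {
        return std::make_unique<FileBackedSource>(path_);
    }

private:
    std::string path_;
};
```

The design choice worth noting is that the subclass never reorders the setup steps. It only decides which concrete component each step receives.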

As Ghidra lifts P-code, it encounters memory references and function calls and must query the scope to resolve them. We must override findAddr (to look up the symbol at an exact address) and findContainer (to locate the smallest symbol encompassing a memory range).

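// Look up the function at addr in the parsed binary and register it as a symbol.
Symbol* LiefScope::queryLief(const Address& addr) const
{
    // Only the default code space maps onto the parsed binary.
    if (addr.getSpace() != arch->getDefaultCodeSpace()) return nullptr;
    uintb offset = addr.getOffset();
    const Binary::Function* fcn = bin->getFunctionAtAdress(offset);
    if (fcn) {
        return registerFunction(fcn);
    }
    return nullptr;
}
// Ghidra declares findAddr with a usepoint argument and a SymbolEntry* result.
SymbolEntry* LiefScope::findAddr(const Address& addr,
                                 const Address& usepoint) const
{
    Symbol* sym = queryLief(addr);
    return sym ? sym->getMapEntry(addr) : nullptr;
}
SymbolEntry* LiefScope::findContainer(const Address& addr, int4 size,
                                      const Address& usepoint) const
{
    // Assumption: `cache` is a ScopeInternal member holding the symbols that
    // were already registered, as in rz-ghidra.
    SymbolEntry* entry = cache->findClosestFit(addr, size, usepoint);
    if (!entry) {
        Symbol* sym = queryLief(addr);
        entry = sym ? sym->getMapEntry(addr) : nullptr;
    } else {
        // Reject a cached entry that ends before the requested range does.
        uintb last = entry->getAddr().getOffset() + entry->getSize() - 1;
        if (last < addr.getOffset() + size - 1) return nullptr;
    }
    return entry;
}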
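The bounds test in findContainer is easy to misread, so here is the same arithmetic in isolation. This is a minimal sketch on plain integers rather than Ghidra’s Address and SymbolEntry types, and covers is a hypothetical helper name. The upper-bound comparison is the one used above; the lower-bound check is added because this sketch has no prior lookup that guarantees it.

```cpp
#include <cstdint>

// Returns true when the symbol occupying [entry_start, entry_start + entry_size)
// fully contains the queried range [addr, addr + size).
// Sizes are assumed to be non-zero and the ranges are assumed not to wrap
// around the end of the address space.
bool covers(std::uint64_t entry_start, std::uint64_t entry_size,
            std::uint64_t addr, std::uint64_t size) {
    const std::uint64_t entry_last = entry_start + entry_size - 1;
    const std::uint64_t query_last = addr + size - 1;
    return addr >= entry_start && query_last <= entry_last;
}
```

For a 0x40-byte function at 0x401000, covers(0x401000, 0x40, 0x401010, 4) is true, while covers(0x401000, 0x40, 0x40103e, 4) is false: the query ends at 0x401041, past the symbol’s last byte at 0x40103f.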

The following diagram shows how our architecture fits together:

[Figure: Architecture]

Initialize Ghidra Engine

The entry point to the decompiler library requires a global initialization phase. This is typically implemented as a call to ghidra::startDecompilerLibrary(const char *), which expects the path to the directory containing your compiled .sla files.

Throughout execution, the decompiler relies on XML-like Document Object Model (DOM) structures for state management, represented by the ghidra::DocumentStorage class. When initialized, the Sleigh translator parses the .sla file into this storage model, building an in-memory database of registers, address spaces, and P-code operations.

Here is how we bring the pieces together to initialize the engine in C++:

// 1. Initialize the global library context
ghidra::startDecompilerLibrary("/path/to/sleigh_compiled_dir/");
// 2. Parse the binary with LIEF
auto binary = std::make_unique<Binary>("malware.exe");
// 3. Initialize the Ghidra engine
auto arch =
    std::make_shared<LiefArchitecture>("lief_analysis", "default", binary);
auto store = std::make_shared<ghidra::DocumentStorage>();
try {
    arch->init(*store);
} catch (const ghidra::LowlevelError& e) {
    // Errors raised by the decompiler itself carry their message in `explain`
    std::cout << std::string("/* Error initializing Ghidra: ") + e.explain +
                     " */"
              << std::endl;
} catch (const std::exception& e) {
    std::cout << std::string("/* Error loading spec files: ") + e.what() +
                     " */"
              << std::endl;
}

Decompile a Function

When a specific virtual address range is targeted, the engine generates a Funcdata object. This object is the fundamental container for a single function’s analysis lifecycle, storing intra-function control flow and data flow graphs.

The Sleigh translator decodes the raw bytes provided by LoadImage into sequences of P-code operations. The engine then applies transformational algorithms (referred to as Action objects) to optimize the graph. Finally, the PrintC interface traverses the tree and emits the high-level C pseudocode.
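To make the three stages concrete, here is a deliberately tiny model of them. It is a toy sketch and not Ghidra’s Action or PrintC API: ToyOp, Operand, fold_constants, and print_c are hypothetical names, the operation list stands in for decoded P-code, and a single constant-folding rule stands in for the many transformations the real engine applies.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy operand: either a constant or a named variable.
struct Operand {
    bool is_constant;
    std::uint64_t value;  // meaningful when is_constant is true
    std::string name;     // meaningful when is_constant is false
};

enum class Opcode { Copy, IntAdd };

// Toy operation: out = in0 (Copy) or out = in0 + in1 (IntAdd).
struct ToyOp {
    Opcode opcode;
    std::string out;
    Operand in0;
    Operand in1;  // ignored for Copy
};

// One rewrite rule: an IntAdd of two constants becomes a Copy of their sum.
// Returns the number of operations rewritten.
int fold_constants(std::vector<ToyOp>& ops) {
    int changed = 0;
    for (ToyOp& op : ops) {
        if (op.opcode == Opcode::IntAdd && op.in0.is_constant &&
            op.in1.is_constant) {
            op.in0.value += op.in1.value;
            op.opcode = Opcode::Copy;
            ++changed;
        }
    }
    return changed;
}

static std::string render(const Operand& o) {
    return o.is_constant ? std::to_string(o.value) : o.name;
}

// Emit one C-like statement per operation.
std::string print_c(const std::vector<ToyOp>& ops) {
    std::string text;
    for (const ToyOp& op : ops) {
        text += op.out + " = " + render(op.in0);
        if (op.opcode == Opcode::IntAdd) text += " + " + render(op.in1);
        text += ";\n";
    }
    return text;
}
```

Feeding it the two operations a = 2 + 3 and b = a + 1 rewrites only the first, and print_c then emits "a = 5;" followed by "b = a + 1;". The real pipeline works on a data-flow graph rather than a flat list, so treat this only as an intuition for the decode, transform, print ordering.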

Here is the code required to trigger this pipeline:

// 4. Define the Function Symbol
ghidra::Address func_addr = arch->getAddress(func.getStart());
std::string func_name = func.getName();
if (func_name.empty()) func_name = "sub_" + std::to_string(func.getStart());
ghidra::Scope* global_scope = arch->symboltab->getGlobalScope();
ghidra::Funcdata* data = global_scope->findFunction(
    ghidra::Address(arch->getDefaultCodeSpace(), func_addr.getOffset()));
if (!data) throw std::runtime_error("Function not found");
// 5. Run the Analysis
try {
    ghidra::Action* action = arch->allacts.getCurrent();
    // Clear any state left over from a previous run before analyzing
    action->reset(*data);
    action->perform(*data);
} catch (const ghidra::LowlevelError& e) {
    return std::string("/* Decompilation Failed: ") + e.explain + " */";
}
// 6. Generate Output (Print to C)
try {
    // unique_ptr releases the printer even if docFunction throws
    std::unique_ptr<ghidra::PrintLanguage> printer(
        arch->buildLanguage("c-language"));
    std::stringstream ss;
    printer->setOutputStream(&ss);
    printer->docFunction(data);
    return ss.str();
} catch (const ghidra::LowlevelError& e) {
    // LowlevelError does not derive from std::exception
    return std::string("/* Error generating C output: ") + e.explain + " */";
} catch (const std::exception& e) {
    return std::string("/* Error generating C output: ") + e.what() + " */";
}
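One detail in step 4 is worth noting: std::to_string formats integers in decimal, so the fallback name for a function at 0x401000 would be sub_4198400. If hexadecimal names are preferred, as is common in disassemblers, a small helper along these lines could be used. The name fallback_name is hypothetical and not part of DaiCParse or Ghidra.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Build a fallback name such as "sub_401000" from a function's start address.
std::string fallback_name(std::uint64_t start) {
    std::ostringstream out;
    out << "sub_" << std::hex << start;
    return out.str();
}
```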

Conclusion & Future Work

This architecture lays the groundwork for headless analysis driven entirely from C++. You can use our LiefGhidraDecompile template to start your own project, for example:

  • Wrap the C++ engine with pybind11 so that vulnerability researchers can drive it from Python scripts alongside LIEF.
  • Build a lightweight headless CLI tool that decompiles binaries from the terminal.

However, there is still work to be done. Currently, we need to bridge more information regarding LIEF Functions, as the Binary class does not natively expose internal variables, stack layouts, or function prototypes.

Ultimately, this parser-decompiler bridge feeds directly into our final project, DaiC. DaiC aims to go beyond the semantic decompilation of Ghidra by integrating AI decompilation models to produce highly readable, context-aware high-level pseudocode.

[https://spinsel.dev/2021/04/02/ghidra-decompiler-debugging.html] [https://github.com/rizinorg/rz-ghidra] [https://daic.re]


This blog post was written by: