Integrating LIEF and Ghidra's C++ Decompiler Engine

For DaiC, our modern, AI-assisted collaborative decompiler, we combined the physical parsing capabilities of LIEF with the semantic decompilation of Ghidra. In this article, we explain how to compile Ghidra’s libdecomp into a manageable library via CMake. We also dissect the minimal C++ classes required to interface Ghidra’s theoretical memory map with LIEF’s physical binary parsing. Finally, we use LIEF to parse a PE executable and feed that data directly into the Ghidra decompiler engine. You can use DaiCParse and LiefGhidraEngine to follow along with this article.
Table of Contents
- Building Ghidra
- Overriding libdecomp Classes
- Initialize Ghidra Engine
- Decompile a Function
- Conclusion & Future Work
1. Building Ghidra
To extract libdecomp as a standalone library, our CMake configuration requires a few specific dependencies. Crucially, we need the ZLIB package (via find_package(ZLIB REQUIRED)) to unpack compressed SLEIGH specification files. Our other dependencies include LIEF for binary parsing, pugixml for lightweight XML processing, and DaiCParse for extended utility functions.

    target_link_libraries(${TARGET_NAME} LIEF::LIEF)
    target_link_libraries(${TARGET_NAME} ZLIB::ZLIB)
    target_link_libraries(${TARGET_NAME} pugixml)
    target_link_libraries(${TARGET_NAME} DaiCParse)

Ghidra’s libdecomp was originally engineered to function both as a standalone command-line diagnostic utility and as a backend processing server communicating with the Ghidra Java front-end via a custom binary protocol. To build it cleanly, we must target the source directory at Ghidra/Features/Decompiler/src/decompile/cpp and actively filter out conflicting main functions.

    set(GHIDRA_SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/ghidra/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp")
    file(GLOB GHIDRA_SOURCES "${GHIDRA_SRC_DIR}/*.cc")

    # Filter out conflicting mains
    list(FILTER GHIDRA_SOURCES EXCLUDE REGEX ".*consolemain\\.cc$")
    list(FILTER GHIDRA_SOURCES EXCLUDE REGEX ".*slgh_compile\\.cc$")
    list(FILTER GHIDRA_SOURCES EXCLUDE REGEX ".*testfunction\\.cc$")

Compiling the SLEIGH Translation Files
Ghidra relies on P-code, a pseudo-code that is translated from raw assembly bytes using the SLEIGH translator. Therefore, we must compile the .slaspec files under ghidra/Ghidra/Processors to generate the .sla files required by the engine.

    file(GLOB_RECURSE SLASPEC_FILES "ghidra/Ghidra/Processors/*.slaspec")
    file(GLOB_RECURSE CSPEC_FILES "ghidra/Ghidra/Processors/*.cspec")
    file(GLOB_RECURSE LDEFS_FILES "ghidra/Ghidra/Processors/*.ldefs")
    file(GLOB_RECURSE PSPEC_FILES "ghidra/Ghidra/Processors/*.pspec")

    set(SLAFILES "")
    set(SLEIGH_BASE "${CMAKE_CURRENT_BINARY_DIR}/sleigh")
    file(MAKE_DIRECTORY "${SLEIGH_BASE}")

    # Compile each .slaspec into a .sla file with sleighc
    foreach(slaspec ${SLASPEC_FILES})
        get_filename_component(sleigh_name "${slaspec}" NAME_WE)
        get_filename_component(sleigh_dir "${slaspec}" DIRECTORY)
        set(sla_file "${SLEIGH_BASE}/${sleigh_name}.sla")
        add_custom_command(
            OUTPUT "${sla_file}"
            COMMAND sleighc "${slaspec}" "${sla_file}"
            MAIN_DEPENDENCY "${slaspec}"
            WORKING_DIRECTORY "${sleigh_dir}"
            DEPENDS sleighc)
        list(APPEND SLAFILES "${sla_file}")
    endforeach()

    add_custom_target(sla ALL DEPENDS ${SLAFILES})

2. Overriding libdecomp Classes
To force libdecomp to analyze a static binary parsed by LIEF, we need to implement concrete subclasses for several abstract C++ interfaces.
The LoadImage Interface
Ghidra asks for bytes; we must provide them. By creating a LiefLoadImage class, we override the loadFill(uint1 *ptr, int4 size, const Address &addr) method. When Ghidra asks for 16 bytes at virtual address 0x401000, our LiefLoadImage queries the LIEF Binary object, locates the correct section, and copies the data into the pointer. If the memory doesn’t exist, we throw a DataUnavailError.
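To make the lookup concrete, here is a minimal, self-contained sketch of the "find the section, copy the bytes, otherwise throw" logic. It deliberately avoids the real Ghidra and LIEF headers: SectionView, DataUnavailable, and fillFromSections are hypothetical stand-ins for illustration, not the classes used in LiefLoadImage. It also assumes a request must be fully backed by a single section, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a section as a binary parser might expose it.
struct SectionView {
    std::uint64_t virtual_address;      // address the section is mapped at
    std::vector<std::uint8_t> content;  // raw bytes backing the section
};

// Hypothetical stand-in for Ghidra's DataUnavailError.
struct DataUnavailable : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Copy `size` bytes starting at virtual address `addr` into `out`.
// Throws DataUnavailable if no single section fully backs the range.
void fillFromSections(const std::vector<SectionView>& sections,
                      std::uint8_t* out, std::size_t size, std::uint64_t addr)
{
    assert(out != nullptr);
    for (const SectionView& s : sections) {
        if (addr < s.virtual_address) continue;          // starts before this section
        const std::uint64_t offset = addr - s.virtual_address;
        if (offset >= s.content.size()) continue;        // starts past this section
        if (size > s.content.size() - offset) continue;  // runs past the section end
        std::copy_n(s.content.begin() + static_cast<std::ptrdiff_t>(offset), size, out);
        return;
    }
    throw DataUnavailable("no section backs the requested range");
}
```

A real implementation would additionally have to decide what to do with ranges that span two sections or that fall into the zero-initialized tail of a section; this sketch simply treats both as unavailable.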
SleighArchitecture and Scope
The SleighArchitecture (derived from the Architecture base class) manages the entire decompilation lifecycle. We must create a LiefArchitecture class to hijack the default Ghidra setup pipeline and inject our LIEF-backed components.
As Ghidra lifts P-Code, it encounters memory references and function calls, requiring it to query the scope. We must override findAddr (to look up addresses) and findContainer (to locate the smallest symbol encompassing a memory range).

    // Look up a function at `addr` through LIEF and register it as a Ghidra symbol.
    Symbol* LiefScope::queryLief(const Address& addr) const
    {
        // Only addresses in the default code space can refer to functions.
        if (addr.getSpace() != arch->getDefaultCodeSpace())
            return nullptr;
        uintb offset = addr.getOffset();
        const Binary::Function* fcn = bin->getFunctionAtAdress(offset);
        return fcn ? registerFunction(fcn) : nullptr;
    }

    // Return the map entry of the symbol located at `addr`, if any.
    SymbolEntry* LiefScope::findAddr(const Address& addr, const Address& usepoint) const
    {
        Symbol* sym = queryLief(addr);
        return sym ? sym->getMapEntry(addr) : nullptr;
    }

    // Return the entry that contains the whole range [addr, addr + size - 1], if any.
    SymbolEntry* LiefScope::findContainer(const Address& addr, int4 size, const Address& usepoint) const
    {
        Symbol* sym = queryLief(addr);
        SymbolEntry* entry = sym ? sym->getMapEntry(addr) : nullptr;
        if (entry) {
            // The entry contains addr; check that it also contains addr + size - 1.
            uintb last = entry->getAddr().getOffset() + entry->getSize() - 1;
            if (last < addr.getOffset() + size - 1)
                return nullptr;
        }
        return entry;
    }

The following graph shows how our architecture works:

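The containment test at the heart of findContainer is easy to get wrong by one byte, so it can help to look at it in isolation. The sketch below is a standalone illustration using plain integers instead of Ghidra's Address and SymbolEntry types. The function name coversRange is hypothetical, and the sketch assumes non-empty ranges that do not wrap around the address space.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the symbol range [entry_start, entry_start + entry_size - 1]
// fully covers the queried range [addr, addr + size - 1].
// Assumes both sizes are non-zero and that neither range wraps around 2^64.
bool coversRange(std::uint64_t entry_start, std::uint64_t entry_size,
                 std::uint64_t addr, std::uint64_t size)
{
    assert(entry_size > 0 && size > 0);
    const std::uint64_t entry_last = entry_start + entry_size - 1;  // last byte of the symbol
    const std::uint64_t query_last = addr + size - 1;               // last byte of the query
    return entry_start <= addr && query_last <= entry_last;
}
```

For a 0x40-byte symbol at 0x401000, a 4-byte query at 0x40103c is still covered because its last byte is 0x40103f, while the same query at 0x40103e is not.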
3. Initialize Ghidra Engine
The entry point to the decompiler library requires a global initialization phase. This is typically implemented as a call to ghidra::startDecompilerLibrary(const char *), which expects the path to the directory containing your compiled .sla files.
Throughout execution, the decompiler relies on XML-like Document Object Model (DOM) structures for state management, represented by the ghidra::DocumentStorage class. When initialized, the Sleigh translator parses the .sla file into this storage model, building an in-memory database of registers, address spaces, and P-code operations.
Here is how we bring the pieces together to initialize the engine in C++:

    #include <iostream>
    #include <memory>
    #include <string>
    #include "libdecomp.hh"

    // 1. Initialize the global library context
    ghidra::startDecompilerLibrary("/path/to/sleigh_compiled_dir/");

    // 2. Parse the binary with LIEF
    auto binary = std::make_unique<Binary>("malware.exe");

    // 3. Initialize the Ghidra engine
    // Assumption: LiefArchitecture takes a non-owning Binary*, so `binary` must outlive `arch`.
    auto arch = std::make_shared<LiefArchitecture>("lief_analysis", "default", binary.get());
    auto store = std::make_shared<ghidra::DocumentStorage>();
    try {
        arch->init(*store);
    } catch (const ghidra::LowlevelError& e) {
        // Ghidra's errors do not derive from std::exception, so they need their own handler.
        std::cout << std::string("/* Error initializing Ghidra: ") + e.explain + " */" << std::endl;
    } catch (const std::exception& e) {
        std::cout << std::string("/* Error loading spec files: ") + e.what() + " */" << std::endl;
    }

4. Decompile a Function
When a specific virtual address range is targeted, the engine generates a Funcdata object. This object is the fundamental container for a single function’s analysis lifecycle, storing intra-function control flow and data flow graphs.
The Sleigh translator decodes the raw bytes provided by LoadImage into sequences of P-code operations. The engine then applies transformational algorithms (referred to as Action objects) to optimize the graph. Finally, the PrintC interface traverses the tree and emits the high-level C pseudocode.
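To give an intuition for what "applying transformations to the graph" means, here is a deliberately tiny toy model. It is not Ghidra's implementation: ToyOp, foldConstantAdd, and simplify are hypothetical names. Only the opcode names mirror real P-code operations (COPY and INT_ADD). The sketch shows a single rewrite rule being applied repeatedly until nothing changes, which is the general shape of this kind of simplification pass.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-ins, not Ghidra types.
enum class OpCode { Copy, IntAdd };

struct ToyOp {
    OpCode code;
    std::optional<std::uint64_t> in0;  // first input if it is a known constant
    std::optional<std::uint64_t> in1;  // second input if it is a known constant (unused for Copy)
};

// One rewrite rule: INT_ADD of two constants becomes a COPY of their sum.
// Returns true if the op was changed.
bool foldConstantAdd(ToyOp& op)
{
    if (op.code != OpCode::IntAdd || !op.in0 || !op.in1) return false;
    op = ToyOp{OpCode::Copy, *op.in0 + *op.in1, std::nullopt};
    return true;
}

// Apply the rule to every op until a full pass changes nothing.
// Returns the number of rewrites performed.
int simplify(std::vector<ToyOp>& ops)
{
    int rewrites = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (ToyOp& op : ops) {
            if (foldConstantAdd(op)) {
                changed = true;
                ++rewrites;
            }
        }
    }
    return rewrites;
}
```

The real engine works on a data-flow graph with many rules and far richer operand types; the point here is only the repeat-until-stable structure.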
Here is the code required to trigger this pipeline:

    // 4. Define the function symbol
    ghidra::Address func_addr = arch->getAddress(func.getStart());
    std::string func_name = func.getName();
    if (func_name.empty())
        func_name = "sub_" + std::to_string(func.getStart());
    ghidra::Scope* global_scope = arch->symboltab->getGlobalScope();
    ghidra::Funcdata* data = global_scope->findFunction(
        ghidra::Address(arch->getDefaultCodeSpace(), func_addr.getOffset()));
    if (!data)
        throw std::runtime_error("Function not found");

    // 5. Run the analysis
    try {
        ghidra::Action* action = arch->allacts.getCurrent();
        action->reset(*data);    // clear any state left over from a previous run
        action->perform(*data);
    } catch (const ghidra::LowlevelError& e) {
        return std::string("/* Decompilation Failed: ") + e.explain + " */";
    }

    // 6. Generate output (print to C)
    try {
        // unique_ptr releases the printer even if docFunction throws.
        std::unique_ptr<ghidra::PrintLanguage> printer(arch->buildLanguage("c-language"));
        std::stringstream ss;
        printer->setOutputStream(&ss);
        printer->docFunction(data);
        return ss.str();
    } catch (const ghidra::LowlevelError& e) {
        return std::string("/* Error generating C output: ") + e.explain + " */";
    } catch (const std::exception& e) {
        return std::string("/* Error generating C output: ") + e.what() + " */";
    }

5. Conclusion & Future Work
This architecture provides a foundation for headless analysis: LIEF parses the binary and libdecomp decompiles it, without the Ghidra Java front-end. You can use our LiefGhidraDecompile template as a starting point for your own project, for example to:
- Wrap the C++ engine with pybind11 so that vulnerability researchers can write Python scripts that use LIEF and the decompiler together.
- Build a lightweight headless CLI tool to decompile binaries from the terminal.
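As a starting point for the CLI idea, the sketch below shows only the argument-handling half: turning `<binary-path> <function-address>` into a request that could then be handed to the engine. Everything in it is hypothetical (DecompileRequest, parseAddress, and parseArgs are illustrative names and not part of any template), and it assumes the address is given in decimal or with a 0x prefix.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <optional>
#include <string>

// Hypothetical request parsed from: <tool> <binary-path> <function-address>
struct DecompileRequest {
    std::string binary_path;
    std::uint64_t address;
};

// Parse an address such as "0x401000" or "4198400".
// Returns nullopt for malformed input such as trailing characters or a minus sign.
std::optional<std::uint64_t> parseAddress(const std::string& text)
{
    if (text.empty() || text[0] == '-') return std::nullopt;  // stoull would accept a sign
    try {
        std::size_t used = 0;
        // Base 0 auto-detects the prefix: "0x" means hex, a leading "0" means octal.
        const unsigned long long value = std::stoull(text, &used, 0);
        if (used != text.size()) return std::nullopt;  // trailing garbage
        return value;
    } catch (const std::exception&) {
        return std::nullopt;  // not a number, or out of range
    }
}

// Expects exactly two user arguments after the program name.
std::optional<DecompileRequest> parseArgs(int argc, const char* const* argv)
{
    if (argc != 3) return std::nullopt;
    const std::optional<std::uint64_t> address = parseAddress(argv[2]);
    if (!address) return std::nullopt;
    return DecompileRequest{argv[1], *address};
}
```

A main function would call parseArgs, print a usage message when it returns nullopt, and otherwise run the initialization and decompilation steps from sections 3 and 4 on the requested address.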
However, there is still work to be done. Currently, we need to bridge more information regarding LIEF Functions, as the Binary class does not natively expose internal variables, stack layouts, or function prototypes.
Ultimately, this parser-decompiler bridge feeds directly into our final project, DaiC. DaiC aims to go beyond the semantic decompilation of Ghidra by integrating AI decompilation models to produce highly readable, context-aware high-level pseudo code.
6. External links
- https://spinsel.dev/2021/04/02/ghidra-decompiler-debugging.html
- https://github.com/rizinorg/rz-ghidra
- https://daic.re
This blog post was written by: