One of Google Cloud\'s major missions is to arm security professionals with modern tools to help them defend against the latest threats. Part of that mission involves moving closer to a more autonomous, adaptive approach in threat intelligence automation.
In our latest advancements in malware analysis, we\'re equipping Gemini with new capabilities to address obfuscation techniques and obtain real-time insights on indicators of compromise (IOCs). By integrating the Code Interpreter extension, Gemini can now dynamically create and execute code to help deobfuscate specific strings or code sections, while Google Threat Intelligence (GTI) function calling enables it to query GTI for additional context on URLs, IPs, and domains found within malware samples. These tools are a step toward transforming Gemini into a more adaptive agent for malware analysis, enhancing its ability to interpret obfuscated elements and gather contextual information based on the unique characteristics of each sample.
Building on this foundation, we previously explored critical preparatory steps with Gemini 1.5 Pro, leveraging its expansive 2-million-token input window to process substantial sections of decompiled code in a single pass. To further enhance scalability, we introduced Gemini 1.5 Flash, incorporating automated binary unpacking through Mandiant Backscatter before the decompilation phase to tackle certain obfuscation techniques. Yet, as any seasoned malware analyst knows, the true challenge often begins once the code is exposed. Malware developers frequently employ obfuscation tactics to conceal critical IOCs and underlying logic. Malware may also download additional malicious code, making it challenging to fully understand the behavior of a given sample.
For large language models (LLMs), obfuscation techniques and additional payloads create unique challenges. When dealing with obfuscated strings such as URLs, IPs, domains, or file names, LLMs often “hallucinate” without explicit decoding methods. Additionally, LLMs cannot access, for example, URLs that host additional payloads, often resulting in speculative interpretations about the sample\'s behavior.
To help with these challenges, Code Interpreter and GTI function calling tools provide targeted solutions. Code Interpreter enables Gemini to autonomously create and execute custom scripts, as needed, using its own judgment to decode obfuscated elements within a sample, such as strings encoded with XOR-based algorithms. This capability minimizes interpretation errors and enhances Gemini\'s ability to reveal hidden logic without requiring manual intervention.