Operator X: Bridging the Gap Between Intent and Knowledge with Cutting-Edge LLMs
07:05:2024
BY Ed Sealing
Cyber operators in the Department of Defense (DoD) combat some of the most sophisticated cyber actors in the world. Cyber protection teams (CPTs) are often tasked with building, operating, maintaining, and troubleshooting complex, high-performance sensor and analytics systems during a mission. Though current cyber military occupational specialty (MOS) and Joint Cyber Analysis Course (JCAC) training provide excellent knowledge in cyber analysis, a gap exists in fundamental engineering and administration experience. Compounding the problem, teams that operate overseas or in classified environments often do so with limited internet access.
At SealingTech, we’ve been working on a solution in which generative AI helps close the knowledge gap experienced by cyber defenders and analysts, regardless of mission classification and sensitivity, with 100% offline capability. While many generative AI systems are being built to automate, we believe that augmenting the operator while keeping them fully in control of the actions (i.e., human-in-the-loop) is the best approach. Additionally, wherever possible, we add explainability and references to ensure the original source data is preserved and readily accessible.
A Simple Chat Interface with Powerful Capabilities in the Field
With the growing popularity of large language models (LLMs) like ChatGPT and GitHub Copilot, it seemed only a matter of time before these new technologies spilled over into the cybersecurity arena. My team at SealingTech witnessed movement in this area with the development of tools like Microsoft Security Copilot, an LLM-powered tool that integrates with Microsoft products to help defenders and IT staff quickly remediate security incidents. So, we asked ourselves, how about integrating an LLM with common SIEMs and models trained on cybersecurity datasets? The idea behind Operator X was born.
Through our innovative solution, operators can simply ask for what they need. Operator X translates their request into a program or script that the operator can choose to execute with the click of a button to retrieve the desired information. Rather than sifting through documentation, operators can upload a wide array of document types and query them through a simple chat interface. Operator X utilizes advanced AI techniques such as Chain-of-Thought reasoning, modular agent deployments for complex actions, and multiple domain-specific models varying in size and architecture.
Use Cases
Operator X targets numerous use cases spanning the areas of knowledge retrieval, data query and visualization, intent-to-action, rapid code generation, and threat intelligence correlation. Here are some of the ways we’re looking to implement our latest technology:
Offline Knowledge Retrieval
One of the primary initial use cases for Operator X is letting operators upload internal data in an offline environment, then query across those documents for information retrieval and summaries. Supported file formats include PDFs, Word documents, text files, and PowerPoints. Retrieval metrics are monitored over time so that the knowledge retrieval pipeline can be tuned for better context retrieval and more relevant generations.
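To make the retrieval step concrete, here is a minimal sketch of how uploaded text might be chunked and ranked against a question while keeping a pointer back to the source file. It is an illustration only, not Operator X's pipeline: the function names, the chunk size, the sample documents, and the simple TF-IDF-style scoring are all assumptions, and a production system would more likely use embedding-based search.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(doc_name, text, size=40):
    """Split a document into fixed-size word windows, remembering the source."""
    words = text.split()
    return [
        {"source": doc_name, "start_word": i, "text": " ".join(words[i:i + size])}
        for i in range(0, len(words), size)
    ]


def rank(query, chunks, top_k=2):
    """Return the top_k chunks by TF-IDF-weighted term overlap with the query."""
    n = len(chunks)
    tokenized = [Counter(tokenize(c["text"])) for c in chunks]
    # Number of chunks in which each term appears.
    doc_freq = Counter(term for counts in tokenized for term in counts)
    query_terms = set(tokenize(query))
    scored = []
    for c, counts in zip(chunks, tokenized):
        score = sum(
            counts[t] * math.log((1 + n) / (1 + doc_freq[t])) for t in query_terms
        )
        if score > 0:
            scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


# Hypothetical sample documents standing in for operator uploads.
corpus = chunk(
    "sensor_sop.txt",
    "Restart the Suricata service after every ruleset update so the sensor "
    "loads the new rules.",
) + chunk(
    "travel_policy.txt",
    "Submit travel vouchers within five days of returning from a mission.",
)

hits = rank("How do I reload Suricata rules?", corpus)
for hit in hits:
    print(hit["source"], "->", hit["text"])
```

Returning the `source` field alongside each chunk is what lets a chat interface show the operator where an answer came from.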
Sensor Ruleset Deployment Via Action
Another key use case for Operator X is generating Suricata rules from a natural language query and pushing them to sensors. For example, an operator can ask the application to “Write a Suricata rule to detect the snake malware and push to my security onion sensors”. Operator X then processes the query, generates the corresponding rule, and returns it to the user, asking permission to push it to the sensors. Upon confirmation, Operator X, configured with the proper credentials, sends out the new rules and runs any necessary commands on the sensors to apply them. This saves operators the time of researching and writing rules manually and instead translates their intent into an actionable set of commands.
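The generate-then-confirm flow described here can be sketched as a small approval gate. This is a hedged illustration rather than Operator X's code: `ProposedAction`, `build_suricata_rule`, and `run_with_confirmation` are hypothetical names, the push step is a stub that only records the rule, and the `EXAMPLE_MARKER` content string is a placeholder, not a real Snake malware signature.

```python
from dataclasses import dataclass
from typing import Callable

# Characters that need escaping in Suricata rule strings; this sketch
# rejects them outright instead of escaping them.
FORBIDDEN = set('";\\|')


@dataclass
class ProposedAction:
    description: str  # what the operator is being asked to approve
    payload: str      # the generated rule text


def build_suricata_rule(msg: str, content: str, sid: int) -> str:
    """Format a simple content-match alert in Suricata rule syntax."""
    for value in (msg, content):
        if FORBIDDEN & set(value):
            raise ValueError(f"unsupported character in {value!r}")
    return (
        "alert tcp any any -> any any "
        f'(msg:"{msg}"; content:"{content}"; sid:{sid}; rev:1;)'
    )


def run_with_confirmation(
    action: ProposedAction,
    confirm: Callable[[ProposedAction], bool],
    execute: Callable[[ProposedAction], None],
) -> str:
    """Run execute(action) only if the operator approves it."""
    if not confirm(action):
        return "cancelled"
    execute(action)
    return "executed"


# Placeholder rule: the content string is NOT a real malware indicator.
rule = build_suricata_rule("Possible Snake malware traffic", "EXAMPLE_MARKER", 1000001)
action = ProposedAction("Push rule to sensors", rule)

pushed = []  # stub standing in for the real sensor push
status = run_with_confirmation(action, lambda a: True, lambda a: pushed.append(a.payload))
print(status, pushed)
```

In an interactive tool, `confirm` would be wired to the chat interface's approval button or a prompt, so nothing reaches a sensor without an explicit operator decision.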
Analytics Query and Visualization
Operator X can also interact with APIs for common applications in the operator’s environment, specifically Elasticsearch and Splunk for the initial implementation. Rather than writing queries by hand or creating new dashboards, operators can simply ask for the data they want, and Operator X translates the request into the corresponding query. For example, operators can ask the application, “What are my top talkers in Elasticsearch?” Operator X then leverages the LLM to generate a corresponding Elasticsearch query. It returns this query to the user, again asking for confirmation before executing.
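As an illustration of what a generated query for the "top talkers" question might look like, the sketch below builds an Elasticsearch request body using a `terms` aggregation ordered by a `sum` sub-aggregation. The field names (`source.ip`, `network.bytes`, `@timestamp`) follow the Elastic Common Schema and are assumptions about the operator's index mapping; the function name and the 24-hour window are also hypothetical.

```python
import json


def top_talkers_query(field="source.ip", bytes_field="network.bytes",
                      size=10, window="now-24h"):
    """Build a request body that ranks sources by total bytes sent."""
    return {
        "size": 0,  # return only aggregation buckets, no raw hits
        "query": {"range": {"@timestamp": {"gte": window}}},
        "aggs": {
            "top_talkers": {
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"total_bytes": "desc"},
                },
                "aggs": {"total_bytes": {"sum": {"field": bytes_field}}},
            }
        },
    }


query = top_talkers_query(size=5)
# Show the query to the operator for review before anything is executed.
print(json.dumps(query, indent=2))
```

Printing the body rather than sending it mirrors the confirmation step: the operator sees exactly what would run against the cluster and can correct field names that do not match their data.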
Automated Target Network Interaction
Finally, Operator X can run scans on a network using tools like Nmap and ACAS. Our technology takes an operator’s query, such as “Perform a vulnerability scan of network 192.168.0.0/24”, and translates the intent into a set of commands or API calls for Nessus (the scanner behind ACAS) to achieve the desired result. Because Operator X relies on the language reasoning capabilities of LLMs, differently worded questions can map to different intents, and thus to different commands. For example, Operator X translates “Determine if UDP port 53 is open on 10.15.67.13.” into a different set of Nmap flags or tools, then asks for permission to execute and returns the results.
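The sketch below shows one way a parsed intent could be turned into an Nmap command line, using the UDP port 53 example. It assumes the LLM has already extracted the host, port, and protocol; the function name is hypothetical, and the command is only built and displayed, never run. The flags used (`-sU` for a UDP scan, `-sT` for a TCP connect scan, `-p` for the port) are standard Nmap options.

```python
import ipaddress
import shlex

# Map the requested protocol to the Nmap scan-type flag.
SCAN_FLAGS = {"udp": "-sU", "tcp": "-sT"}


def port_check_command(host: str, port: int, protocol: str) -> list[str]:
    """Build an Nmap argument list for checking a single port on one host."""
    ipaddress.ip_address(host)  # raises ValueError for anything but an IP
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if protocol not in SCAN_FLAGS:
        raise ValueError(f"unsupported protocol: {protocol}")
    return ["nmap", SCAN_FLAGS[protocol], "-p", str(port), host]


argv = port_check_command("10.15.67.13", 53, "udp")
# Display the exact command for operator approval before execution.
print(shlex.join(argv))
```

Building an argument list and validating the host with `ipaddress` keeps model-generated text from being interpolated into a shell string, which matters when the command is derived from free-form language.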
Operator X Architecture
We intend to deploy Operator X onto our GN7000 series node with a Rocky-based operating system as the base. Deploying onto a GPU-enabled node allows for higher throughput and inference speeds from the LLM than would be feasible on a CPU-only machine.
Retrieval Augmented Generation (RAG)
RAG allows users to upload their own documents and use the LLM to query them for specific information and summaries. Below are three diagrams that outline the RAG process in Operator X:
With Operator X, we’ve leaned into the strengths of open-source models and provided a fine-tuned code generation model for the tasks a cyber protection team may face, yielding more consistent results and faster inference times. Interested in working with us to develop your own agents and implementations? Contact us today.