Android Malware Detection with Graph Reduction, Contrastive Learning, and LLM Embeddings

This project combines graph reduction, contrastive learning, and large language model (LLM) embeddings to detect malware in Android applications. The goal is accurate detection of malicious behavior that also generalizes to previously unseen samples.

Features

Graph-Based Representation:
- Represent Android application behaviors using directed graphs derived from system calls, API invocations, and other relevant runtime data.
- Perform graph reduction to simplify the structure while retaining meaningful patterns.
Contrastive Learning:
- Apply contrastive learning to learn discriminative features between malicious and benign samples.
- Use embeddings from graph structures to identify unique patterns specific to malware.
LLM Embeddings:
- Integrate embeddings generated by large language models to capture semantic information from code or logs.
- Combine static code analysis with dynamic behavior insights for comprehensive detection.
Pipeline Integration:
- End-to-end pipeline for preprocessing, feature extraction, training, and inference.
- Modular design for extensibility and easy integration into existing systems.

Architecture Overview

Key Components:

Graph Reduction:
- Extract graphs from Android applications representing control-flow, data-flow, or dependency relationships.
- Simplify graphs by removing redundant nodes/edges while preserving critical structures.
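
As a rough illustration of the reduction step, the sketch below collapses pass-through nodes (exactly one predecessor and one successor) in a directed graph and drops duplicate edges and self-loops. This is one common simplification rule and not necessarily the rule this repository applies; the function name `reduce_graph` and the `keep` parameter are hypothetical.

```python
from collections import defaultdict


def reduce_graph(edges, keep=frozenset()):
    """Collapse pass-through nodes of a directed graph.

    edges: iterable of (src, dst) pairs with sortable node names.
    keep:  nodes that must never be removed (hypothetical parameter,
           e.g. nodes that stand for sensitive API calls).
    Returns the reduced graph as a set of (src, dst) edges.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        if src != dst:  # drop self-loops; the sets drop duplicate edges
            succ[src].add(dst)
            pred[dst].add(src)
    nodes = set(succ) | set(pred)

    changed = True
    while changed:  # repeat until no pass-through node is left
        changed = False
        for node in sorted(nodes):
            if node in keep or len(pred[node]) != 1 or len(succ[node]) != 1:
                continue
            (parent,), (child,) = pred[node], succ[node]
            if parent == child:  # bypassing would create a self-loop
                continue
            # Rewire parent -> node -> child into parent -> child.
            succ[parent].discard(node)
            pred[child].discard(node)
            succ[parent].add(child)
            pred[child].add(parent)
            del succ[node], pred[node]
            nodes.discard(node)
            changed = True
    return {(u, v) for u in nodes for v in succ[u]}
```

For example, `reduce_graph([("a", "b"), ("b", "c"), ("c", "d")])` returns `{("a", "d")}`, while passing `keep={"b"}` preserves `b` and returns `{("a", "b"), ("b", "d")}`.
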
Contrastive Learning Module:
- Train models with a contrastive loss that pulls together the embeddings of samples from the same class (benign or malware) and pushes apart the embeddings of samples from different classes.
- Enhance generalization for unseen malware samples.
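
One classical way to express this objective is the pairwise margin-based contrastive loss: same-class pairs are penalized by their squared distance, and different-class pairs are penalized only when they are closer than a margin. The sketch below is plain Python for readability. The repository may use a different formulation (for example a supervised contrastive or InfoNCE-style loss), and the function names and the `margin` default are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(embeddings, labels, margin=1.0):
    """Mean margin-based contrastive loss over all unordered pairs.

    Same-label pairs contribute d^2 (pull together).
    Different-label pairs contribute max(0, margin - d)^2 (push apart).
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = euclidean(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0
```

The loss is zero when same-class embeddings coincide and different-class embeddings are at least `margin` apart, which is the geometry the training step aims for.
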
LLM Embedding Integration:
- Utilize pre-trained LLMs (e.g., OpenAI's GPT models or similar) to extract embeddings from decompiled code or logs.
- Combine these embeddings with graph-based features for richer representation.
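
A simple way to combine the two feature sources is to L2-normalize each vector and concatenate them, so that neither source dominates purely because of its scale. The sketch below assumes both embeddings are already available as plain lists of floats; `fuse_features` is a hypothetical helper, and the project may fuse features differently (for example with learned projections or attention).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(graph_vec, llm_vec):
    """Concatenate the normalized graph embedding and LLM embedding."""
    return l2_normalize(graph_vec) + l2_normalize(llm_vec)
```

The fused vector has length `len(graph_vec) + len(llm_vec)` and can be passed to any downstream classifier.
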
Detection Model:
- Classifier that integrates graph-based features and LLM embeddings to detect malicious activities.
- Flexible backend supporting various architectures (e.g., GNNs, transformers).
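
To make the classifier's role concrete, the sketch below trains a logistic-regression stand-in on fused feature vectors with batch gradient descent. It is deliberately much simpler than the GNN or transformer backends mentioned above and is not the project's model; the function names, learning rate, and epoch count are all assumptions.

```python
import math


def sigmoid(z):
    """Logistic function; adequate for the moderate scores used here."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the malware class (label 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y  # derivative of log loss w.r.t. the score
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

A sample is flagged as malicious when `predict_proba(weights, bias, x)` exceeds a chosen threshold such as 0.5. Raising the threshold lowers the false positive rate at the cost of missed detections.
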

Installation

Prerequisites

Python 3.8+
PyTorch 1.12+ or TensorFlow
CUDA-enabled GPU (optional, for faster training)

Steps

Clone the repository:

git clone https://github.com/SuZeAI/Graph_reduce_Contrastive_Learning_ADM.git
cd Graph_reduce_Contrastive_Learning_ADM

Install dependencies:

conda create -n grc python=3.10
conda activate grc
python3 -m pip install -r requirements.freeze.txt

Configure environment variables for LLM integration (if needed).
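
The README does not name the variables involved. As an assumption, the sketch below reads an API key from `OPENAI_API_KEY`, the variable that OpenAI's official client libraries look for by default, and fails early with a clear message if it is missing. The helper `load_llm_api_key` is hypothetical; check the repository's configuration files for the names it actually expects.

```python
import os


def load_llm_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption, not taken from this repository.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before requesting LLM embeddings."
        )
    return key
```

Reading the key from the environment keeps credentials out of the source tree and out of version control.
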

Usage

Data Preparation

Obtain a dataset of Android APK files (e.g., Drebin, AndroZoo).
Extract static and dynamic features:
- Static analysis: Use decompilation tools (e.g., JADX) for code extraction.
- Dynamic analysis: Simulate apps in a sandbox to gather runtime data.
Convert features into graph representations and embeddings.
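
For the last step, one straightforward way to turn dynamic-analysis output into a directed graph is to count transitions between consecutive calls in a runtime trace. The sketch below assumes the trace is an ordered list of call names; `trace_to_graph` and the example call names are illustrative and do not reflect the repository's actual data format.

```python
from collections import Counter


def trace_to_graph(trace):
    """Build a weighted directed graph from an ordered call trace.

    Each edge (a, b) means call b directly followed call a; the weight
    is the number of times that transition occurred.
    """
    return Counter(zip(trace, trace[1:]))


# Hypothetical trace of runtime calls recorded in a sandbox.
graph = trace_to_graph(["open", "read", "read", "send", "read", "send"])
```

Here `graph[("read", "send")]` is 2 because that transition appears twice. The edge weights can serve as edge features, or low-weight edges can be pruned during graph reduction.
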

Training & Testing the Model

Use the makefile to run the dataset download, preprocessing, training, testing, ... steps.

Results

Detection Accuracy:
False Positive Rate:
Runtime Performance:
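
The first two metrics above have standard definitions when malware is treated as the positive class: accuracy is (TP + TN) / total, and the false positive rate is FP / (FP + TN). A minimal sketch of computing both from predictions follows; the function name is hypothetical, and the labels in the usage example are made up for illustration rather than taken from this project.

```python
def detection_metrics(y_true, y_pred):
    """Return (accuracy, false_positive_rate) for binary labels.

    Label 1 means malware (positive class), label 0 means benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Illustrative labels only: 2 malware samples, 3 benign samples.
accuracy, fpr = detection_metrics([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

In this toy example three of five predictions are correct and one of three benign apps is wrongly flagged, so `accuracy` is 0.6 and `fpr` is 1/3.
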

Future Work

Contributions

We welcome contributions from the community. Please fork the repository, make your changes, and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

OpenAI for LLM APIs
Community datasets like Drebin, AndroZoo
Research papers on graph-based malware detection and contrastive learning

Citation

If you use this project in your research, please cite the accompanying paper:

@article{<Name>,
  title={<Name>: <description>},
  author={ManhVM},
  journal={None},
  year={2025}
}

Contact

For questions or collaborations, contact manhvm@gmail.com.

