SuZeAI/Graph_reduce_Contrastive_Learning_ADM

Android Malware Detection with Graph Reduction, Contrastive Learning, and LLM Embeddings

This project combines graph reduction, contrastive learning, and large language model (LLM) embeddings to detect malware in Android applications. Application behavior is modeled as graphs, the graphs are reduced to their essential structure, and the resulting features are combined with LLM embeddings to classify applications as benign or malicious.


Features

  1. Graph-Based Representation:

    • Represent Android application behaviors using directed graphs derived from system calls, API invocations, and other relevant runtime data.
    • Perform graph reduction to simplify the structure while retaining meaningful patterns.
  2. Contrastive Learning:

    • Apply contrastive learning to learn discriminative features between malicious and benign samples.
    • Use embeddings from graph structures to identify unique patterns specific to malware.
  3. LLM Embeddings:

    • Integrate embeddings generated by large language models to capture semantic information from code or logs.
    • Combine static code analysis with dynamic behavior insights for comprehensive detection.
  4. Pipeline Integration:

    • End-to-end pipeline for preprocessing, feature extraction, training, and inference.
    • Modular design for extensibility and easy integration into existing systems.
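
The graph reduction step in feature 1 can be illustrated with a small sketch. This README does not say which reduction algorithm the repository uses, so the code below is only an assumed example: it contracts "pass-through" nodes (exactly one predecessor and one successor) so that chains of wrapper calls collapse into a single edge. The function name `reduce_graph`, its `keep` parameter, and the method names in the usage example are hypothetical.

```python
from collections import defaultdict


def reduce_graph(edges, keep=frozenset()):
    """Contract pass-through nodes of a directed graph (illustrative only).

    edges: iterable of (src, dst) pairs.
    keep:  nodes that must never be removed (hypothetical parameter),
           e.g. entry points or sensitive API calls.
    Returns the reduced graph as a sorted list of (src, dst) pairs.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        if src != dst:  # self-loops are ignored in this sketch
            succ[src].add(dst)
            pred[dst].add(src)

    changed = True
    while changed:
        changed = False
        for node in sorted(set(succ) | set(pred)):
            if node in keep:
                continue
            if len(pred[node]) == 1 and len(succ[node]) == 1:
                (p,) = pred[node]
                (s,) = succ[node]
                if p == s:  # contracting would create a self-loop
                    continue
                # Rewire p -> node -> s into p -> s, then drop the node.
                succ[p].discard(node)
                pred[s].discard(node)
                succ[p].add(s)
                pred[s].add(p)
                del succ[node], pred[node]
                changed = True

    return sorted((src, dst) for src, dsts in succ.items() for dst in dsts)


# Hypothetical call graph: two wrapper methods sit between the entry
# point and the SMS API call.
call_graph = [
    ("onCreate", "initHelper"),
    ("initHelper", "wrapSend"),
    ("wrapSend", "sendTextMessage"),
    ("onCreate", "getDeviceId"),
]
reduced = reduce_graph(call_graph, keep={"onCreate", "sendTextMessage"})
print(reduced)
```

In this example the two wrapper nodes disappear and `onCreate` connects directly to `sendTextMessage`, while reachability between the retained nodes is unchanged.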

Architecture Overview

Key Components:

  1. Graph Reduction:

    • Extract graphs from Android applications representing control-flow, data-flow, or dependency relationships.
    • Simplify graphs by removing redundant nodes/edges while preserving critical structures.
  2. Contrastive Learning Module:

    • Train models with a contrastive loss that maximizes the similarity between embeddings of samples from the same class (benign or malware) and minimizes the similarity between embeddings of samples from different classes.
    • Enhance generalization for unseen malware samples.
  3. LLM Embedding Integration:

    • Utilize pre-trained LLMs (e.g., OpenAI's GPT models or similar) to extract embeddings from decompiled code or logs.
    • Combine these embeddings with graph-based features for richer representation.
  4. Detection Model:

    • Classifier that integrates graph-based features and LLM embeddings to detect malicious activities.
    • Flexible backend supporting various architectures (e.g., GNNs, transformers).
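
The contrastive objective described in component 2 can be sketched as a pairwise loss on cosine similarity. The exact loss used in this repository is not stated here, so this is an assumed, simplified variant written with the standard library only. It uses the same per-pair formula as PyTorch's `CosineEmbeddingLoss`: `1 - cos` for same-class pairs and `max(0, cos - margin)` for different-class pairs. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pairwise_contrastive_loss(embeddings, labels, margin=0.0):
    """Mean contrastive loss over all pairs in a batch (illustrative only).

    Same-class pairs are penalized for low similarity; different-class
    pairs are penalized only when their similarity exceeds `margin`.
    Assumes at least two embeddings, all with non-zero norm.
    """
    losses = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            losses.append(1.0 - sim)            # pull same-class pairs together
        else:
            losses.append(max(0.0, sim - margin))  # push other pairs apart
    return sum(losses) / len(losses)


# Toy batch: label 1 = malware, label 0 = benign (made-up vectors).
batch = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(pairwise_contrastive_loss(batch, [1, 1, 0]))
```

In the toy batch the two malware embeddings coincide and the benign embedding is orthogonal to them, so every pair already satisfies the objective and the loss is zero. In a real training loop the same quantity would be computed on tensors so that gradients can flow back to the encoder.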

Installation

Prerequisites

  • Python 3.8+
  • PyTorch 1.12+ or TensorFlow
  • CUDA-enabled GPU (optional, for faster training)

Steps

  1. Clone the repository:

    git clone https://github.com/SuZeAI/Graph_reduce_Contrastive_Learning_ADM.git
    cd Graph_reduce_Contrastive_Learning_ADM
  2. Install dependencies:

    conda create -n grc python=3.10
    conda activate grc
    python3 -m pip install -r requirements.freeze.txt
  3. Configure environment variables for LLM integration (if needed).


Usage

Data Preparation

  1. Obtain a dataset of Android APK files (e.g., Drebin, AndroZoo).

  2. Extract static and dynamic features:

    • Static analysis: Use decompilation tools (e.g., JADX) for code extraction.
    • Dynamic analysis: Simulate apps in a sandbox to gather runtime data.
  3. Convert features into graph representations and embeddings.
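
As one possible reading of step 3, the sketch below turns an ordered trace of API calls into a weighted directed graph in which an edge `(a, b)` counts how often call `b` directly follows call `a`. The trace format, the function name `trace_to_graph`, and the call names are assumptions for illustration; the repository's actual preprocessing may differ.

```python
from collections import Counter


def trace_to_graph(api_calls):
    """Build a weighted directed graph from an ordered list of API calls.

    Returns a Counter mapping (src, dst) edges to transition counts.
    """
    return Counter(zip(api_calls, api_calls[1:]))


def adjacency_matrix(graph):
    """Return (nodes, matrix), with nodes sorted for a stable row order."""
    nodes = sorted({n for edge in graph for n in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (src, dst), count in graph.items():
        matrix[index[src]][index[dst]] = count
    return nodes, matrix


# Hypothetical runtime trace collected from a sandbox run.
trace = ["open", "read", "read", "connect", "send", "read", "connect"]
graph = trace_to_graph(trace)
nodes, matrix = adjacency_matrix(graph)
print(nodes)
print(matrix)
```

The adjacency matrix, or the edge list itself, is the kind of structure that a graph reduction step or a GNN could then consume.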

Training & Testing the Model

  • Use the targets defined in the Makefile to run dataset download, preprocessing, training, testing, ...

Results

  • Detection Accuracy:
  • False Positive Rate:
  • Runtime Performance:

Future Work


Contributions

We welcome contributions from the community. Please fork the repository, make your changes, and submit a pull request.


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

  • OpenAI for LLM APIs
  • Community datasets like Drebin, AndroZoo
  • Research papers on graph-based malware detection and contrastive learning

Citation

If you use this project in your research, please cite the accompanying paper:

@article{<Name>,
  title={<Name>: <description>},
  author={ManhVM},
  journal={None},
  year={2025}
}

Contact

For questions or collaborations, contact manhvm@gmail.com.
