Timeout and Menopause Supplement Video Research

This repository collects and analyzes short-form videos about parenting timeout strategies and menopause-related supplements using automated Google Search scraping and AI-powered video analysis.

Overview

The project consists of two main components:

Google Search Scraper (run_googlesearch.py) - Automatically scrapes Google search results for short videos
Video Analysis (batch_LLM.py) - Processes videos using a large language model to extract structured information

Installation

Basic Requirements

uv pip install -r requirements.txt

For Google Search Scraping

pip install -r requirements-googlesearch.txt

Required Python packages:

pandas
tqdm
selenium
undetected-chromedriver

System packages (Tor support and the browser driven by Selenium):

tor
torsocks
Chromium or Chrome browser

Usage

1. Google Search Scraper (`run_googlesearch.py`)

This script scrapes Google search results for short videos related to:

Timeout dataset: #parenting #timeout and #gentleparenting #timeout
Supplements dataset: #menopause #supplements and #menopause #vitamins

Run the scraper locally:

# Normal execution (scrapes both datasets)
python3 run_googlesearch.py

# With Tor (if IP blocked)
torsocks python3 run_googlesearch.py

# Or use the --use-tor flag
python3 run_googlesearch.py --use-tor

How it works:

Searches Google for the specified hashtag combinations
Clicks the "Short videos" filter
Scrolls to load more results and clicks the "More results" button
Scrapes video metadata: link, duration, title, source, author
Combines results with previous data
Filters to include only Instagram, TikTok, YouTube, and Facebook videos
Saves results to CSV and text files

Output files:

Timeout dataset:

timeout.csv - Full results with columns: link, duration, title, source, author
timeout_links.txt - Just the links, one per line

Supplements dataset:

supplements.csv - Full results with columns: link, duration, title, source, author
supplements_links.txt - Just the links, one per line
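
The combine, filter, and save steps described above could look roughly like the sketch below. It is not the code in run_googlesearch.py: the function names, de-duplication by link, and the lowercase matching of the source column are all assumptions made for illustration, and it uses only the standard library rather than pandas so it runs on its own.

```python
import csv
from pathlib import Path

COLUMNS = ["link", "duration", "title", "source", "author"]
# Assumed spellings of the "source" values; the real scraper may normalise differently.
ALLOWED_SOURCES = {"instagram", "tiktok", "youtube", "facebook"}


def merge_and_filter(previous, new):
    """Return previous + new rows, de-duplicated by link, allowed platforms only."""
    seen = set()
    kept = []
    for row in [*previous, *new]:
        link = row.get("link", "")
        source = (row.get("source") or "").strip().lower()
        if not link or link in seen or source not in ALLOWED_SOURCES:
            continue
        seen.add(link)
        # Missing columns are written as empty strings.
        kept.append({col: row.get(col, "") for col in COLUMNS})
    return kept


def save_outputs(rows, dataset, out_dir="."):
    """Write {dataset}.csv and {dataset}_links.txt into out_dir."""
    out = Path(out_dir)
    with open(out / f"{dataset}.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    (out / f"{dataset}_links.txt").write_text(
        "".join(row["link"] + "\n" for row in rows), encoding="utf-8"
    )
```

Keeping the first occurrence of each link means rows from earlier runs survive unchanged when a later run finds the same video again.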

IP Blocking Protection:

The scraper includes automatic protection against IP blocking:

First attempts to run without Tor
If that fails (likely due to IP blocking), automatically retries with Tor/torsocks
Tor routes requests through a different exit IP, which can help when the original IP is blocked or rate limited
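
A minimal sketch of this try-then-retry pattern is shown below. It is an illustration, not the scraper's own logic: the function name, the injectable runner parameter, and the choice to treat any non-zero exit code as a possible IP block are assumptions.

```python
import subprocess
import sys


def run_with_tor_fallback(script="run_googlesearch.py", runner=subprocess.run):
    """Run the script directly; if it exits non-zero, retry once under torsocks."""
    direct = [sys.executable, script]
    if runner(direct).returncode == 0:
        return "direct"
    # Assumption of this sketch: a non-zero exit may indicate an IP block.
    if runner(["torsocks", *direct]).returncode == 0:
        return "tor"
    raise RuntimeError("scraper failed both with and without Tor")
```

The runner parameter exists only so the fallback can be exercised without launching a browser or requiring torsocks to be installed.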

Automated Execution:

The repository includes a GitHub Actions workflow (.github/workflows/googlesearch.yml) that:

Runs daily at 2 AM UTC (configurable via cron schedule)
Can also be triggered manually via the GitHub Actions UI
Automatically commits updated CSV files back to the repository

To trigger manually:

Go to the "Actions" tab in GitHub
Select the "Google Search Scraper" workflow
Click "Run workflow"

To change the schedule, edit the cron expression in .github/workflows/googlesearch.yml:

schedule:
  - cron: '0 2 * * *'  # Daily at 2 AM UTC

Common cron schedules:

0 */6 * * * - Every 6 hours
0 0 * * 0 - Weekly on Sunday at midnight
0 0 1 * * - Monthly on the 1st at midnight

2. Video Analysis (`batch_LLM.py`)

This script processes downloaded videos using the Qwen3-Omni-30B-A3B-Instruct multimodal model to extract structured information.

Prerequisites:

Downloaded videos (see "Downloading Videos" section below)
GPU with sufficient VRAM (approximately 78 GB required)
transformers library and Qwen dependencies

Run the video analysis:

# Process timeout dataset
python3 batch_LLM.py --dataset timeout

# Process supplements dataset
python3 batch_LLM.py --dataset supplements

How it works:

Scans the {dataset}_videos/ folder for video JSON metadata files
Skips videos that have already been processed (result file exists)
For each video:
- Loads video file and metadata
- Sends video and context to the LLM with a dataset-specific prompt
- Extracts structured information based on the dataset
- Saves results to {dataset}_results/ folder as JSON files
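
The scan-and-skip part of this loop might be sketched as follows. The file naming is an assumption: it supposes metadata files end in .info.json (as written by yt-dlp) and that each result is stored as {video_id}.json, which may not match what batch_LLM.py actually does. The function name is hypothetical.

```python
from pathlib import Path


def pending_videos(dataset, root="."):
    """Yield (metadata_path, result_path) for videos that have no result file yet."""
    videos_dir = Path(root) / f"{dataset}_videos"
    results_dir = Path(root) / f"{dataset}_results"
    for meta in sorted(videos_dir.glob("*.info.json")):
        video_id = meta.name[: -len(".info.json")]
        result = results_dir / f"{video_id}.json"  # hypothetical result naming
        if not result.exists():
            yield meta, result
```

Checking for the result file before loading the model input makes an interrupted run cheap to resume, since finished videos are never reprocessed.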

Extracted Information:

For timeout videos:

Video description and transcript
Tone and language
Whether video discusses timeout as a parenting strategy
Parenting approach shown
Target child age range
Speaker's profession
Sentiment (positive/neutral/negative toward timeout)
Criticisms of timeout
Alternative strategies mentioned
Relevance to ASD, ADHD, anxiety
Usefulness, misleading content, and quality ratings
Personal experiences shared

For supplements videos:

Video description and transcript
Tone and language
Supplements, vitamins, or medications mentioned
Active ingredients
Symptoms or conditions addressed
Whether targeted at menopause
Speaker's profession
Sentiment (positive/neutral/negative toward supplements)
Criticisms of supplements
Alternative strategies mentioned
Usefulness, misleading content, and quality ratings
Personal experiences shared

Output files:

Results are saved as JSON files in:

timeout_results/ - Analysis results for timeout videos
supplements_results/ - Analysis results for supplements videos
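
To inspect the per-video JSON files together, something like the following sketch could collect them into a list of records. It assumes each result file holds a single JSON object; the function name and the added video_id key are hypothetical, and the actual combining logic lives in join_results.ipynb.

```python
import json
from pathlib import Path


def load_results(results_dir):
    """Read every *.json file in results_dir into a list of dicts."""
    rows = []
    for path in sorted(Path(results_dir).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # skip files that are not valid JSON
        if isinstance(record, dict):
            # Tag each record with its file stem so it can be traced back.
            rows.append({"video_id": path.stem, **record})
    return rows
```

The returned list can be passed to pandas.DataFrame(...) to get one row per video.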

3. Downloading Videos

To download videos from the collected links, use yt-dlp:

# Download timeout videos
yt-dlp --write-info-json --batch-file timeout_links.txt --paths timeout_videos

# Download supplements videos
yt-dlp --write-info-json --batch-file supplements_links.txt --paths supplements_videos

This downloads:

Video files to {dataset}_videos/
Metadata JSON files (.info.json) with video information
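
If the download step needs to be driven from Python, one option is to build the same command line and hand it to subprocess, as in the sketch below. The helper names are hypothetical; the yt-dlp flags are exactly those shown in the commands above.

```python
import subprocess


def ytdlp_command(dataset):
    """Build the yt-dlp argument list for one dataset ("timeout" or "supplements")."""
    return [
        "yt-dlp",
        "--write-info-json",
        "--batch-file", f"{dataset}_links.txt",
        "--paths", f"{dataset}_videos",
    ]


def download(dataset, runner=subprocess.run):
    """Run yt-dlp for the dataset and return its exit code."""
    return runner(ytdlp_command(dataset), check=False).returncode
```

Calling download("timeout") requires yt-dlp on the PATH and timeout_links.txt in the working directory.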

Files in this Repository

run_googlesearch.py - Python script for Google search scraping (headless mode)
batch_LLM.py - Video analysis script using Qwen3-Omni model
googlesearch.ipynb - Original Jupyter notebook with scraping logic
test_LLM.ipynb - Jupyter notebook for testing LLM analysis
join_results.ipynb - Jupyter notebook for combining and analyzing results
.github/workflows/googlesearch.yml - GitHub Actions workflow for automated scraping
requirements.txt - Python dependencies for video analysis
requirements-googlesearch.txt - Python dependencies for scraping

Data Files

timeout.csv / timeout_links.txt - Collected timeout video links and metadata
supplements.csv / supplements_links.txt - Collected supplements video links and metadata
timeout_LLM_results.xlsx - Analyzed results for timeout videos
supplements_LLM_results.xlsx - Analyzed results for supplements videos

License

See LICENSE file for details.
