SiferStan/Wallapop-scraper

Wallapop Scraper – Technical Test

This script automates the extraction of car listings from Wallapop using Selenium. It collects data from 10 product pages and saves it into a structured JSON file.


Setup & Usage Instructions

  1. Install dependencies
    Make sure you have Python 3.11+ and Chrome installed. Then install Selenium:

    pip install selenium
  2. Run the script
    From the project folder, run:

    python main_wallapop.py
  3. Output
    A file named products_wallapop.json will be saved in the output/ folder inside the project directory.


Summary of Technical Decisions

  • Selenium + ChromeDriver: chosen because the listings are rendered with JavaScript, which a plain HTTP client would not execute.
  • Explicit waits (WebDriverWait): used instead of fixed delays, so the script continues as soon as the target element is present.
  • Resilient scraping: find_elements with fallback defaults, so a missing field does not raise an exception and stop the run.
  • Simple JSON structure: a flat list of objects, clean and portable for further processing.
  • Randomized user behavior: random delays and simulated interaction (cookie acceptance, scrolling) make the session look less automated.
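
The explicit-wait and fallback decisions above could be sketched roughly as follows. The helper names (`wait_for`, `first_text`), the 10-second timeout, and the "N/A" default are illustrative assumptions, not taken from main_wallapop.py. Selenium is imported inside `wait_for` so the fallback helper can be exercised without a browser.

```python
def wait_for(driver, css_selector, timeout=10):
    """Block until an element matching css_selector is present.

    Raises selenium's TimeoutException if nothing appears within `timeout` seconds.
    """
    # Imported lazily so the rest of this sketch works without a browser session.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
    )


def first_text(driver, css_selector, default="N/A"):
    """Return the stripped text of the first match, or `default` if there is none."""
    # find_elements returns an empty list instead of raising when nothing matches.
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    if not elements:
        return default
    text = elements[0].text.strip()
    return text or default
```

A call such as `first_text(driver, "h1")` would then return the default rather than raising when a listing has no title element; the selector here is a placeholder.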

Description of Anti-Bot Measures

To simulate human behavior and reduce detection:

  • User-Agent spoofing: A custom desktop browser User-Agent string is applied.
  • Real-time navigation: The browser accepts cookies, scrolls, and opens each product page individually.
  • Randomized delays: A random pause of 1–4 seconds is inserted between page loads.
  • Browser dimensions: Configured to mimic real user screens (1920x1080).
  • Try/except for fault tolerance: Script continues even if one page fails.
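
As a minimal sketch of how the User-Agent, window size, and random delay might be configured: the User-Agent string and the function names below are hypothetical, and the actual values used by the script may differ. `--user-agent` and `--window-size` are standard Chrome command-line switches.

```python
import random
import time

# Hypothetical User-Agent string; pick one that matches the installed Chrome version.
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def chrome_arguments(user_agent=DESKTOP_USER_AGENT, width=1920, height=1080):
    """Chrome switches for a custom User-Agent and a desktop-sized window."""
    return [f"--user-agent={user_agent}", f"--window-size={width},{height}"]


def build_driver():
    """Start Chrome with the switches above (needs selenium and Chrome installed)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


def human_pause(min_seconds=1.0, max_seconds=4.0, sleep=time.sleep):
    """Sleep for a random duration in the given range and return that duration."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `human_pause()` before each page load gives the 1–4 second spacing described above; the `sleep` parameter exists only so the pause can be replaced in tests.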

Output Examples

Example JSON entry:

[
  {
    "title": "BMW Serie 3 1997",
    "price": "4500 €",
    "description": "BMW 318TDS in good condition, always in garage. Everything original. Includes 17\" wheels.",
    "image": "https://cdn.wallapop.com/images/...",
    "url": "https://es.wallapop.com/item/..."
  }
]
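
One way to write a list in this shape to output/products_wallapop.json is sketched below. The function name `save_products` is an assumption for illustration; only the file and folder names come from this README.

```python
import json
from pathlib import Path


def save_products(products, output_dir="output", filename="products_wallapop.json"):
    """Write the list of product dicts as indented UTF-8 JSON and return the path."""
    target_dir = Path(output_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create output/ if it is missing
    target = target_dir / filename
    with target.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps characters such as "€" readable in the file.
        json.dump(products, handle, ensure_ascii=False, indent=2)
    return target
```

Without `ensure_ascii=False`, the euro sign in the price field would be written as the escape sequence `\u20ac`, which is still valid JSON but harder to read.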

Optional screenshot snippet:

driver.save_screenshot(f"output/screenshot_{i}.png")

Logging & Unit Tests

  • Console logs report cookie handling, the number of products extracted, and any errors.
  • No unit tests are included, because the script relies on interactive, stateful page content.
  • Exceptions are handled per product, so one failure does not stop the whole script.
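
The per-product error handling and console logging could look something like this sketch. `scrape_all` and the `scrape_one` callback are hypothetical names; the real script may organize its loop differently.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("wallapop")


def scrape_all(urls, scrape_one):
    """Call scrape_one(url) for each URL, logging and skipping any that fail."""
    products = []
    for index, url in enumerate(urls, start=1):
        try:
            products.append(scrape_one(url))
        except Exception:
            # logger.exception records the traceback; the loop then moves on.
            logger.exception("Product %d failed: %s", index, url)
    logger.info("Extracted %d of %d products", len(products), len(urls))
    return products
```

Because the page-specific work is passed in as a callback, the loop itself can be tested with a stub function even though the scraping code depends on a live browser.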

Final Notes

This solution fulfills the test’s requirements:

  • Scrapes a real-world JavaScript site
  • Demonstrates basic anti-bot evasion
  • Includes user interaction (clicks, navigation, scroll)
  • Produces clean JSON output

The script can be extended with additional features such as proxy rotation, screenshot capture, or headless execution.
