
WikiPulse V3: Streaming Architecture Trade-off Exploration

Tech stack: Java, Spring Boot, Apache Kafka, Kafka Streams, PostgreSQL, Redis, React, Prometheus, Grafana, Kubernetes, Docker Compose

WikiPulse V3 is a pragmatic exploration of real-time streaming system behavior under simulated load.

This repository centers on:

  1. streaming pipeline exploration
  2. backpressure and consumer scaling behavior
  3. PostgreSQL serving both OLTP and analytical endpoints through targeted indexing and materialized-view-style rollups

The reliability posture is loss-minimized, at-least-once processing, not absolute delivery guarantees.

Architecture walkthrough placeholder:

  • Full walkthrough and scaling demo: INSERT_LINK_LATER

Table of Contents

  1. What This Project Explores
  2. System Architecture Deep Dive
  3. Reliability Model
  4. React Dashboard
  5. Observability Stack
  6. Local Docker Fast-Start
  7. Kubernetes Deployment and Live Demo
  8. Testing Methodology
  9. Roadmap

What This Project Explores

The Wikimedia recentchange feed is bursty by nature. Traffic can spike quickly, and streaming systems need to absorb those bursts without stalling consumers or collapsing write latency.

WikiPulse is intentionally built to explore those trade-offs:

  1. Kafka buffering and partition-level backpressure behavior
  2. Manual-ack consumer flow with deduplication and dead-letter safety
  3. PostgreSQL dual role:
    1. durable write path for processed edits
    2. analytical serving path for /api/analytics/* and /api/v2/analytics/*
  4. React + WebSocket user experience under continuous inbound traffic

System Architecture Deep Dive

End-to-end flow

  1. WikipediaSseClient consumes Wikimedia SSE and publishes normalized WikiEditEvent records to Kafka (wiki-edits).
  2. WikiEditConsumer processes records with manual acknowledgments.
  3. Redis handles (see the sketch after this list):
    1. dedup lock (edit:processed:<id>, 24h TTL)
    2. velocity signal (bot:velocity:<user>, 60s TTL)
  4. ProcessedEditService persists records to PostgreSQL and broadcasts live updates.
  5. AnalyticsAggregationService rolls recent windows into analytics_rollups for fast grouped reads.
  6. Analytics APIs read from PostgreSQL:
    1. rollup-driven endpoints (/api/analytics/*)
    2. indexed direct queries for deep analytics (/api/v2/analytics/*)
  7. Kafka Streams anomaly topology emits wiki-anomalies, which the UI receives through /topic/anomalies.
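
The dedup lock and velocity signal in step 3 map onto two small Redis operations. Below is a minimal sketch in Spring Data Redis terms, reusing the key names and TTLs above; the class and method names are illustrative, not the repository's actual code.

import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

// Sketch: Redis-backed dedup lock and bot-velocity counter.
@Component
public class EditGuards {

    private final StringRedisTemplate redis;

    public EditGuards(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // Returns true only for the first sighting of this edit id within 24h.
    public boolean tryAcquireDedupLock(String editId) {
        Boolean firstSeen = redis.opsForValue()
                .setIfAbsent("edit:processed:" + editId, "1", Duration.ofHours(24));
        return Boolean.TRUE.equals(firstSeen);
    }

    // Counts a user's edits inside a rolling 60s window.
    public long recordEditVelocity(String user) {
        String key = "bot:velocity:" + user;
        Long count = redis.opsForValue().increment(key);
        if (count != null && count == 1L) {
            redis.expire(key, Duration.ofSeconds(60)); // start the window on first hit
        }
        return count == null ? 0L : count;
    }
}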

Architecture diagram

flowchart LR
  A[Wikimedia SSE Stream] --> B[Spring WebFlux SSE Client]
  B --> C[Kafka topic: wiki-edits]

  C --> D[Worker Consumer Group]
  D --> E[Redis Dedup and Velocity]
  E --> F[PostgreSQL table: processed_edits]
  F --> G[PostgreSQL table: analytics_rollups]

  G --> H[REST /api/analytics/*]
  F --> I[REST /api/v2/analytics/*]

  C --> J[Kafka Streams Anomaly Topology]
  J --> K[Kafka topic: wiki-anomalies]
  K --> L[Anomaly Consumer]

  F --> M[WebSocket /topic/edits]
  L --> N[WebSocket /topic/anomalies]

  H --> O[React Dashboard]
  I --> O
  M --> O
  N --> O

  P[Actuator and Micrometer] --> Q[Prometheus]
  Q --> R[Grafana]
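
The anomaly topology shown above is, at its core, a windowed aggregation over the edit stream. Here is a minimal Kafka Streams sketch of that shape, assuming records keyed by wiki, default String serdes, and a placeholder one-minute window and threshold; the repository's actual rules are richer.

import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.TimeWindows;

public class AnomalyTopologySketch {

    private static final long SPIKE_THRESHOLD = 100L; // placeholder, not the real rule

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        builder.<String, String>stream("wiki-edits")
               .groupByKey() // assumes records are keyed by wiki
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
               .count()
               .toStream()
               // keep only windows whose edit count exceeds the threshold
               .filter((windowedKey, count) -> count > SPIKE_THRESHOLD)
               .map((windowedKey, count) ->
                       KeyValue.pair(windowedKey.key(), "edit-spike:" + count))
               .to("wiki-anomalies");

        return builder.build();
    }
}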

Why PostgreSQL is doing both jobs

PostgreSQL currently handles transactional writes and analytics reads by combining:

  1. write-optimized event persistence in processed_edits
  2. targeted indexes for filter-heavy analytical endpoints
  3. materialized-view-style rollups (analytics_rollups), refreshed on a schedule (sketched below)
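
A minimal sketch of that scheduled refresh with Spring scheduling and plain SQL; the column names, conflict key, and one-minute cadence are illustrative assumptions, not the repository's actual schema.

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch: materialized-view-style refresh into analytics_rollups.
// Requires @EnableScheduling on a configuration class.
@Component
public class RollupRefresher {

    private final JdbcTemplate jdbc;

    public RollupRefresher(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Scheduled(fixedDelayString = "PT1M") // cadence is a placeholder
    public void refreshRecentWindow() {
        // Recompute the last few minutes of per-wiki counts and upsert them,
        // so grouped reads never have to scan the raw processed_edits table.
        jdbc.update("""
            INSERT INTO analytics_rollups (bucket_start, wiki, edit_count)
            SELECT date_trunc('minute', created_at), wiki, count(*)
            FROM processed_edits
            WHERE created_at >= now() - interval '5 minutes'
            GROUP BY 1, 2
            ON CONFLICT (bucket_start, wiki)
            DO UPDATE SET edit_count = EXCLUDED.edit_count
            """);
    }
}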

Reliability Model

The worker follows a save-before-ack contract:

  1. Read from Kafka
  2. Validate and deduplicate
  3. Enrich (complexity and bot velocity)
  4. Persist to PostgreSQL
  5. Publish WebSocket update
  6. Acknowledge Kafka offset

This yields loss-minimized, at-least-once processing semantics with replay tolerance.

Malformed payloads are isolated through retry plus dead-letter routing (wiki-edits-dlt) instead of freezing partitions.
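
A minimal sketch of that contract using Spring Kafka's manual acknowledgment mode. It reuses the EditGuards sketch above; the WikiEditEvent accessors and the ProcessedEditService.save signature are assumptions, and the listener container must be configured for manual acks and JSON deserialization.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.stereotype.Component;

// Sketch: persist first, broadcast second, acknowledge last.
@Component
public class WikiEditConsumerSketch {

    private final EditGuards guards;               // Redis dedup + velocity
    private final ProcessedEditService edits;      // PostgreSQL write path
    private final SimpMessagingTemplate websocket; // STOMP broadcast

    public WikiEditConsumerSketch(EditGuards guards,
                                  ProcessedEditService edits,
                                  SimpMessagingTemplate websocket) {
        this.guards = guards;
        this.edits = edits;
        this.websocket = websocket;
    }

    @KafkaListener(topics = "wiki-edits", groupId = "wikipulse-workers")
    public void onEdit(WikiEditEvent event, Acknowledgment ack) {
        if (guards.tryAcquireDedupLock(event.id())) {    // accessor names assumed
            guards.recordEditVelocity(event.user());
            edits.save(event);                               // 4. persist first
            websocket.convertAndSend("/topic/edits", event); // 5. then broadcast
        }
        ack.acknowledge(); // 6. commit the offset only after the work is done
    }
}

A crash between persist and acknowledge causes a replay on restart, which the Redis dedup lock absorbs. An exception thrown before the ack is handed to the configured error handler (for example, Spring Kafka's DefaultErrorHandler with a DeadLetterPublishingRecoverer), which retries and then routes the record to wiki-edits-dlt.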


React Dashboard

The frontend is split into two operator views:

  1. AnalyticsOverviewTab
    1. KPI snapshots, language/namespace/bot distributions, trend chart
    2. Poll-driven reads from /api/analytics/* and /api/v2/analytics/*
    3. Live anomaly ticker from /topic/anomalies
  2. LiveFirehoseTab
    1. Initial hydration from /api/edits/recent
    2. Continuous stream updates from /topic/edits
    3. Fixed-size rolling window to cap memory/render overhead

Observability Stack

Telemetry path:

  1. Spring Actuator exposes /actuator/prometheus
  2. Prometheus scrapes worker + container telemetry
  3. Grafana visualizes throughput, lag, CPU, memory, and service health

Representative metrics:

  1. wikipulse_edits_processed_total
  2. wikipulse_bots_detected_total
  3. wikipulse_errors_total
  4. wikipulse_processing_latency
  5. kafka_consumer_fetch_manager_records_lag
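
A minimal sketch of how such metrics can be registered with Micrometer; the wiring here is illustrative. Micrometer's dotted names are exported with underscores, and counters gain a _total suffix, which is how the wikipulse_* names above appear in Prometheus.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

// Sketch: custom worker metrics surfaced at /actuator/prometheus.
@Component
public class WorkerMetrics {

    private final Counter editsProcessed;
    private final Counter botsDetected;
    private final Counter errors;
    private final Timer processingLatency;

    public WorkerMetrics(MeterRegistry registry) {
        this.editsProcessed = registry.counter("wikipulse.edits.processed");
        this.botsDetected = registry.counter("wikipulse.bots.detected");
        this.errors = registry.counter("wikipulse.errors");
        this.processingLatency = registry.timer("wikipulse.processing.latency");
    }

    // Times one unit of work and counts it as processed.
    public void recordProcessed(Runnable work) {
        processingLatency.record(work);
        editsProcessed.increment();
    }

    public void recordBotDetected() {
        botsDetected.increment();
    }

    public void recordError() {
        errors.increment();
    }
}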

Local Docker Fast-Start

Prerequisites

  1. Docker Desktop with Compose v2
  2. Recommended minimum 8 GB RAM
  3. Ports available: 3000, 3001, 5432, 6379, 8080, 9090, 9092

Start

docker compose up -d --build

Verify

docker compose ps
curl http://localhost:8080/actuator/health

Access

  1. React Dashboard: http://localhost:3000
  2. Grafana: http://localhost:3001
  3. Prometheus: http://localhost:9090
  4. Worker metrics: http://localhost:8080/actuator/prometheus

Stop

docker compose down

Kubernetes Deployment and Live Demo

1) Start Minikube and metrics server

minikube start --cpus=4 --memory=8192
minikube addons enable metrics-server

2) Build images inside the Minikube Docker daemon (PowerShell)

minikube -p minikube docker-env --shell powershell | Invoke-Expression
docker build -t wikipulse-ingestor:latest .
docker build -t wikipulse-frontend:latest ./frontend

3) Deploy

kubectl apply -f k8s/infrastructure.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/frontend-deployment.yaml
kubectl apply -f k8s/hpa.yaml

4) Wait for rollout

kubectl rollout status deployment/wikipulse-worker

5) Validate baseline

kubectl get pods -o wide
kubectl get hpa wikipulse-worker-hpa
kubectl top pods -l app=wikipulse-worker

6) Open UI

minikube service wikipulse-frontend

7) Optional load test for autoscaling behavior

bash scripts/k8s_load_test.sh

Watchers:

kubectl get hpa wikipulse-worker-hpa -w
kubectl get pods -l app=wikipulse-worker -w
kubectl describe hpa wikipulse-worker-hpa

8) Cleanup

kubectl delete -f k8s/hpa.yaml --ignore-not-found
kubectl delete -f k8s/frontend-deployment.yaml --ignore-not-found
kubectl delete -f k8s/service.yaml --ignore-not-found
kubectl delete -f k8s/deployment.yaml --ignore-not-found
kubectl delete -f k8s/secret.yaml --ignore-not-found
kubectl delete -f k8s/configmap.yaml --ignore-not-found
kubectl delete -f k8s/infrastructure.yaml --ignore-not-found
minikube stop

Testing Methodology

Testing focuses on behavior under realistic infrastructure, not mocked-only happy paths.

  1. Unit tests for deterministic utility and rule logic
  2. Integration tests with Testcontainers for Kafka, PostgreSQL, and Redis
  3. API and metrics assertions for end-to-end confidence
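
A minimal sketch of the Testcontainers wiring behind those integration tests; the image tags and property keys are illustrative assumptions, not the repository's exact setup.

import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

// Sketch: real Kafka, PostgreSQL, and Redis for integration tests.
@SpringBootTest
@Testcontainers
class PipelineIntegrationTestSketch {

    @Container
    static KafkaContainer kafka =
            new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.6.1"));

    @Container
    static PostgreSQLContainer<?> postgres =
            new PostgreSQLContainer<>(DockerImageName.parse("postgres:16-alpine"));

    @Container
    static GenericContainer<?> redis =
            new GenericContainer<>(DockerImageName.parse("redis:7-alpine"))
                    .withExposedPorts(6379);

    // Point Spring at the throwaway containers before the context starts.
    @DynamicPropertySource
    static void wire(DynamicPropertyRegistry registry) {
        registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
        registry.add("spring.data.redis.host", redis::getHost);
        registry.add("spring.data.redis.port", () -> redis.getMappedPort(6379));
    }

    // Tests then publish to wiki-edits and assert on rows, APIs, and metrics.
}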

Run:

.\mvnw clean compile
.\mvnw test

Roadmap

  1. Add schema governance and evolution checks for event contracts
  2. Expand anomaly rules with multi-window baselines and confidence scoring
  3. Add deterministic replay/load harness for repeatable scaling experiments
  4. Package Kubernetes deployment as Helm chart overlays

Closing

WikiPulse V3 is a working sandbox for streaming architecture trade-offs, not a claim of perfect reliability. It is designed to make operational behavior visible: queue pressure, consumer lag, scaling response, and analytics query cost.

For full architecture rationale and explicit trade-off notes, see ARCHITECTURE.md.