pitekopaga · Jun 1, 2026 · May 31, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,109 @@
+name: CI
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  unit-tests:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Build and start containers
+      run: |
+        cd mock-project
+        docker compose up --build -d
+
+    - name: Wait for API
+      run: sleep 10
+
+    - name: Run unit tests
+      run: |
+        cd mock-project
+        docker compose exec -T api pytest unit_tests/ -v
+
+    - name: Stop containers
+      run: |
+        cd mock-project
+        docker compose down
+
+  e2e-tests:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Setup Node.js
+      uses: actions/setup-node@v4
+      with:
+        node-version: '20'
+
+    - name: Install dependencies
+      run: |
+        cd mock-project
+        npm ci
+
+    - name: Install Playwright browsers
+      run: |
+        cd mock-project
+        npx playwright install chromium
+
+    - name: Start containers
+      run: |
+        cd mock-project
+        docker compose up --build -d
+
+    - name: Wait for API to be ready
+      run: |
+        sleep 10
+        curl --fail --retry 5 --retry-delay 2 --retry-connrefused http://localhost:5000/health
+
+    - name: Run Playwright tests
+      run: |
+        cd mock-project
+        npx playwright test
+
+    - name: Stop containers
+      run: |
+        cd mock-project
+        docker compose down
+
+  load-test:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: '3.11'
+
+    - name: Install dependencies
+      run: |
+        cd mock-project
+        pip install locust
+
+    - name: Start containers
+      run: |
+        cd mock-project
+        docker compose up --build -d
+
+    - name: Wait for API to be ready
+      run: |
+        sleep 10
+        curl --fail --retry 5 --retry-delay 2 --retry-connrefused http://localhost:5000/health
+
+    - name: Run load test
+      run: |
+        cd mock-project
+        locust -f load_tests/locustfile.py --headless -u 5 -r 1 --run-time 30s --host=http://localhost:5000
+
+    - name: Stop containers
+      run: |
+        cd mock-project
+        docker compose down
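
The fixed `sleep 10` in the "Wait for API" steps of `ci.yml` could be replaced by polling the health endpoint until it answers. Below is a minimal Python sketch, assuming the `/health` route on `http://localhost:5000` used in the workflow returns HTTP 200 once the API is ready. `wait_for_health` and its parameters are hypothetical names and are not part of this PR.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, timeout=30.0, interval=1.0, probe=None, sleep=time.sleep):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass.

    `probe` is a hypothetical hook (url -> status code) so the loop can be
    exercised without a running server; by default it uses urllib.
    """
    def default_probe(target):
        with urllib.request.urlopen(target, timeout=5) as response:
            return response.status

    probe = probe or default_probe
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet, or it answered with an error
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A CI step could call this and exit non-zero when it returns `False`, so a service that never comes up fails the job early instead of surfacing later as confusing test failures.
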
diff --git a/.gitignore b/.gitignore
@@ -205,3 +205,6 @@ cython_debug/
 marimo/_static/
 marimo/_lsp/
 __marimo__/
+mock-project/node_modules/
+mock-project/backend/test_results.json
+mock-project/test-results/
diff --git a/assignment-5-common-issues.md b/assignment-5-common-issues.md
@@ -0,0 +1,27 @@
+# Common Production Issues for Colorblindness Diagnostic Tools
+
+Based on research of similar products (Enchroma, Color Blind Check, Ishihara Test apps) and SOA/REST principles:
+
+## Functional issues
+
+1. **False positives/negatives** – Users report inconsistent results between test sessions. The slides note that scores can vary by +/-13%, but some users experience wider swings. This is the highest-priority issue because a misdiagnosis erodes trust.
+
+2. **Calibration issues** – Different screens (OLED vs LCD, brightness settings) affect color perception. A test calibrated on one device may be too easy or too hard on another.
+
+3. **Scoring algorithm bugs** – A bug in the cone score calculation would misdiagnose every user. My unit tests do not currently cover this logic.
+
+## Operational issues (from SOA statelessness principle, SRC-3, SRC-27)
+
+4. **Session state as a bottleneck** – My application uses server-side Flask sessions to track user progress. Under load, the session store can become a bottleneck, and if a user refreshes the page or opens multiple tabs, the session can become corrupted. A stateless design would store answers in localStorage or a signed JWT, aligning with REST statelessness constraints. This would also simplify load testing because each request would be independent.
+
+5. **No runbook for incident response** – If the test goes down at 2am, there are no documented steps for investigation or recovery.
+
+## Accessibility issues
+
+6. **Keyboard navigation gaps** – Users who cannot use a mouse may struggle to take the test.
+
+7. **Screen reader support** – The canvas-based number display is not accessible to blind users. This is a fundamental limitation of the Ishihara format.
+
+## Load/performance issues
+
+8. **Unknown concurrency limits** – I have not tested how the system behaves under 100 concurrent users. The Flask development server is not intended for production use.
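
Issue 4 in `assignment-5-common-issues.md` suggests moving test progress out of the server-side session and into a signed token held by the client. The sketch below illustrates that idea with the standard library's `hmac` module instead of a JWT library, so it produces an HMAC-signed blob and not a real JWT. `SECRET_KEY`, `sign_state`, and `load_state` are hypothetical names, and the shape of the state dictionary is an assumption.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # assumption: a server-side secret, never sent to the client


def sign_state(state, key=SECRET_KEY):
    """Serialize `state` and append an HMAC-SHA256 tag the client cannot forge."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + tag


def load_state(token, key=SECRET_KEY):
    """Return the decoded state, or None if the token is malformed or tampered with."""
    payload, sep, tag = token.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time so the check does not leak timing information.
    if not hmac.compare_digest(expected.encode(), tag.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because every request would carry its own verified state, the server would not need a shared session store, which is the property the statelessness argument relies on.
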
diff --git a/assignment-5-configurations.md b/assignment-5-configurations.md
@@ -0,0 +1,40 @@
+# Test Configurations
+
+Based on SOA principles of interoperability and standardized service contracts (SRC-25), the product must be tested across the configurations real users will have.
+
+## Browsers (must test all)
+- Chrome (latest) – Windows, Mac, Linux
+- Firefox (latest)
+- Safari (latest) – Mac only
+- Edge (latest) – Windows
+
+## Operating Systems
+- Windows 10/11
+- macOS (Ventura, Sonoma, Sequoia)
+- Ubuntu 22.04/24.04
+
+## Devices and screen sizes
+- Desktop (1920x1080) – primary target
+- Laptop (1366x768)
+- Tablet (iPad, Android) – numbers may be too small; document as limitation
+
+## Network conditions (for API health endpoint only)
+- Fast (100 Mbps)
+- Slow (3G throttled) – the Canvas renders client-side, so network mainly affects initial load
+
+## Screen color profiles (manual testing only)
+- Standard RGB
+- sRGB
+- HDR modes (may shift colors)
+
+## Environmental conditions (manual)
+- Bright sunlight (screen glare)
+- Dark room (high contrast mode)
+
+## API contract versions (future)
+If the product exposes a REST API, it should follow SOA standardized service contract principles. The current version uses form POSTs with an implicit contract. Before release, the API should be documented (OpenAPI) and versioned.
+
+## Testing approach
+- Automated cross-browser testing is not implemented due to time constraints.
+- Manual testing covers Chrome, Firefox, and Safari on desktop.
+- Load testing is automated with Locust (see specialized testing report).
diff --git a/assignment-5-manual-testing.md b/assignment-5-manual-testing.md
@@ -0,0 +1,18 @@
+# Manual Testing Required Beyond Automation (Assignment 5)
+
+## What cannot be automated (or was not automated)
+
+1. **Stress testing** – Requires manual observation of degradation patterns under extreme load.
+2. **Cross-browser visual validation** – Automated tests can check that the canvas renders, but not that colors appear correct on different screens.
+3. **Accessibility** – Keyboard navigation and screen reader compatibility require human testing.
+4. **Environmental conditions** – Screen glare, dark room, and varying brightness levels cannot be simulated.
+
+## Manual test cases
+
+1. **Stress test** – Run `docker compose up`, then send 100 rapid requests to `/`. Observe if the server crashes or slows down.
+2. **Cross-browser** – Test on Chrome, Firefox, Safari. Verify canvas rendering and number visibility.
+3. **Session isolation** – Open two browser tabs, take the test in tab 1, then tab 2. Verify that sessions are independent.
+4. **Statelessness check** – After completing the test, refresh the page. The user should have to start over. Document whether this is acceptable.
+5. **Monitoring endpoint** – Visit `/debug/stats` and verify CPU/memory readings look plausible.
+6. **Keyboard navigation** – Tab through all inputs and buttons. Verify you can submit with Enter.
+7. **Screen reader** – Use NVDA (Windows) or VoiceOver (Mac) to navigate the test. Note any confusing announcements.
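
Manual test case 1 in `assignment-5-manual-testing.md` ("send 100 rapid requests to `/`") could be driven by a short script like the one below. This is a sketch under assumptions: the app is reachable at `http://localhost:5000/` as in the CI workflow, and `fetch_status` and `burst` are hypothetical helper names. Observing how the server degrades still requires a human watching the containers.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url):
    """Return (status_code, seconds) for one GET; status 0 means failure or an error status."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            status = response.status
    except (OSError, http.client.HTTPException):
        status = 0
    return status, time.perf_counter() - start


def burst(url, total=100, workers=20, fetch=fetch_status):
    """Send `total` GETs using `workers` threads and summarize the outcome."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch, [url] * total))
    ok = sum(1 for status, _ in results if status == 200)
    slowest = max(seconds for _, seconds in results)
    return {"sent": total, "ok": ok, "failed": total - ok, "slowest_s": slowest}
```

Running `print(burst("http://localhost:5000/"))` after `docker compose up` would give a quick count of failed requests and the slowest response time to record alongside the manual observations.
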
diff --git a/assignment-5-specialized-testing.md b/assignment-5-specialized-testing.md
@@ -0,0 +1,78 @@
+# Specialized Testing Report
+
+## Load Testing (Automated)
+
+I implemented load testing using Locust. The test simulates 10 concurrent users submitting answers over 30 seconds.
+
+### How to run
+```bash
+cd mock-project
+docker compose up --build -d
+./run-load-test.sh
+```
+
+### Results (local run)
+- Health endpoint: 100% success, average response <10ms
+- Form submission: 100% success, average response ~150ms
+- No crashes or timeouts observed
+
+### Limitation
+This is a minimal load test. The Flask development server is not intended for production use. A production deployment would need a production WSGI server such as gunicorn.
+
+## Stress Testing (Not Automated)
+
+I did not automate stress testing. The Flask development server is not designed for high load. This is documented as manual testing.
+
+## Unit Tests (Automated)
+
+I added unit tests for:
+- Health endpoint returns 200 OK
+- Result page loads without crashing
+- Debug stats endpoint returns CPU, memory, and session metrics
+
+All 3 tests pass. None of them exercises the scoring algorithm itself.
+
+## Operational Monitoring (Implemented)
+
+I added a `/debug/stats` endpoint that returns:
+- CPU percentage
+- Memory percentage
+- Active session count
+
+## SOA/REST Observations
+
+Following Fielding's REST constraints (SRC-3, SRC-27):
+
+- **Statelessness violation**: The current design uses server-side sessions. This creates a scalability bottleneck. A stateless design (client-side storage or JWTs) would be more aligned with REST principles.
+
+- **Service boundaries** (SRC-5): The application has implicit boundaries between the UI, scoring logic, and session store. Documenting these boundaries helps with integration testing.
+
+## Diagnostic Consistency Tracking
+
+While working on this assignment, I discovered an important gap in existing colorblindness tests. I took the Enchroma test three times over several months. It diagnosed me as Deutan twice and Protan once. A user who gets different results from the same test will not trust any of them.
+
+### Implementation
+
+I added consistency tracking to my own test. Users now:
+
+1. Log in with a username (no password required)
+2. Take the test as normal
+3. See their history and consistency score on the results page
+
+The system stores results in a JSON file and calculates:
+- Total number of sessions
+- Consistency percentage (how often the same diagnosis appears)
+- Most common diagnosis
+- Last 3 results
+
+### Automation
+
+This feature is fully automated. The test itself saves results, loads past history, and displays consistency without any manual intervention.
+
+### Value
+
+By the Pareto principle, I expect the minority of users who see inconsistent results to generate a disproportionate share of complaints and lost trust. Focusing on stability across sessions is the highest-value specialized testing I added to my product.
+
+### Results
+
+I tested this by taking my own test multiple times with different usernames. The consistency tracking works correctly. Future work would involve user studies to see how often real users get inconsistent results and whether my test is more stable than Enchroma.
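
The consistency metrics listed in `assignment-5-specialized-testing.md` (session count, consistency percentage, most common diagnosis, last 3 results) could be computed roughly as follows. This sketch assumes "consistency percentage" means the share of sessions that match the most common diagnosis, and that a user's history is a list of diagnosis strings ordered oldest first. `consistency_summary` is a hypothetical name, and the actual implementation in the PR may differ.

```python
from collections import Counter


def consistency_summary(diagnoses):
    """Summarize a user's past diagnoses, given oldest first."""
    if not diagnoses:
        return {"sessions": 0, "consistency_pct": None, "most_common": None, "last_three": []}
    most_common, count = Counter(diagnoses).most_common(1)[0]
    return {
        "sessions": len(diagnoses),
        # Assumption: consistency is the share of sessions matching the most common diagnosis.
        "consistency_pct": round(100 * count / len(diagnoses), 1),
        "most_common": most_common,
        "last_three": diagnoses[-3:],
    }
```

For a history of two Deutan results and one Protan result, `consistency_summary(["Deutan", "Protan", "Deutan"])` reports 3 sessions, a most common diagnosis of `"Deutan"`, and a consistency of 66.7 percent.
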
diff --git a/mock-project/Dockerfile b/mock-project/Dockerfile
@@ -0,0 +1,6 @@
+FROM python:3.11-slim
+WORKDIR /app
+COPY backend/requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY backend/ .
+CMD ["python", "app.py"]