109 changes: 109 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,109 @@
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  unit-tests:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Build and start containers
        run: |
          cd mock-project
          docker compose up --build -d

      - name: Wait for API
        run: sleep 10

      - name: Run unit tests
        run: |
          cd mock-project
          docker compose exec api pytest unit_tests/ -v

      - name: Stop containers
        run: |
          cd mock-project
          docker compose down

  e2e-tests:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: |
          cd mock-project
          npm ci

      - name: Install Playwright browsers
        run: |
          cd mock-project
          npx playwright install chromium

      - name: Start containers
        run: |
          cd mock-project
          docker compose up --build -d

      - name: Wait for API to be ready
        run: |
          sleep 10
          curl --retry 5 --retry-delay 2 --retry-connrefused http://localhost:5000/health || true

      - name: Run Playwright tests
        run: |
          cd mock-project
          npx playwright test

      - name: Stop containers
        run: |
          cd mock-project
          docker compose down

  load-test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          cd mock-project
          pip install locust

      - name: Start containers
        run: |
          cd mock-project
          docker compose up --build -d

      - name: Wait for API to be ready
        run: |
          sleep 10
          curl --retry 5 --retry-delay 2 --retry-connrefused http://localhost:5000/health || true

      - name: Run load test
        run: |
          cd mock-project
          locust -f load_tests/locustfile.py --headless -u 5 -r 1 --run-time 30s --host=http://localhost:5000

      - name: Stop containers
        run: |
          cd mock-project
          docker compose down
3 changes: 3 additions & 0 deletions .gitignore
@@ -205,3 +205,6 @@ cython_debug/
marimo/_static/
marimo/_lsp/
__marimo__/
mock-project/node_modules/
mock-project/backend/test_results.json
mock-project/test-results/
27 changes: 27 additions & 0 deletions assignment-5-common-issues.md
@@ -0,0 +1,27 @@
# Common Production Issues for Colorblindness Diagnostic Tools

Based on research of similar products (Enchroma, Color Blind Check, Ishihara Test apps) and SOA/REST principles:

## Functional issues

1. **False positives/negatives** – Users report inconsistent results between test sessions. The slides note that scores can vary by +/-13%, but some users experience wider swings. This is the highest-priority issue because a misdiagnosis erodes trust.

2. **Calibration issues** – Different screens (OLED vs LCD, brightness settings) affect color perception. A test calibrated on one device may be too easy or too hard on another.

3. **Scoring algorithm bugs** – A bug in the cone score calculation would misdiagnose every user. My unit tests do not currently cover this logic.

## Operational issues (from SOA statelessness principle, SRC-3, SRC-27)

4. **Session state as a bottleneck** – My application uses server-side Flask sessions to track user progress. Under load, the session store can become a bottleneck. If a user refreshes the page or opens multiple tabs, the session can become corrupted. A stateless design would store answers in localStorage or a signed JWT, aligning with REST statelessness constraints. This would also simplify load testing because each request would be independent.

5. **No runbook for incident response** – If the test goes down at 2am, there are no documented steps for investigation or recovery.
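
The stateless alternative mentioned in item 4 could look roughly like the sketch below. It is not code from this project and it is not a full JWT: it is a simpler HMAC-signed token built from the standard library, and `SECRET_KEY`, `sign_answers`, and `verify_answers` are hypothetical names chosen for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret; a real deployment would load this from configuration.
SECRET_KEY = b"change-me"


def sign_answers(answers: dict, key: bytes = SECRET_KEY) -> str:
    """Serialize the answers and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(answers, sort_keys=True).encode("utf-8")
    )
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def verify_answers(token: str, key: bytes = SECRET_KEY) -> Optional[dict]:
    """Return the answers if the signature is valid, otherwise None."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # no "." separator, so this is not one of our tokens
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

The client would send the token back with each request, so the server needs no session store to validate progress.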

## Accessibility issues

6. **Keyboard navigation gaps** – Users who cannot use a mouse may struggle to take the test.

7. **Screen reader support** – The canvas-based number display is not accessible to blind users. This is a fundamental limitation of the Ishihara format.

## Load/performance issues

8. **Unknown concurrency limits** – I have not tested how the system behaves under 100 concurrent users. The Flask development server is intended for local development and is not production-ready.
40 changes: 40 additions & 0 deletions assignment-5-configurations.md
@@ -0,0 +1,40 @@
# Test Configurations

Based on SOA principles of interoperability and standardized service contracts (SRC-25), the product must be tested across the configurations real users will have.

## Browsers (must test all)
- Chrome (latest) – Windows, Mac, Linux
- Firefox (latest)
- Safari (latest) – Mac only
- Edge (latest) – Windows

## Operating Systems
- Windows 10/11
- macOS (Ventura, Sonoma, Sequoia)
- Ubuntu 22.04/24.04

## Devices and screen sizes
- Desktop (1920x1080) – primary target
- Laptop (1366x768)
- Tablet (iPad, Android) – numbers may be too small; document as limitation

## Network conditions (for API health endpoint only)
- Fast (100 Mbps)
- Slow (3G throttled) – the canvas renders client-side, so network speed mainly affects initial load

## Screen color profiles (manual testing only)
- sRGB (standard RGB)
- HDR modes (may shift colors)

## Environmental conditions (manual)
- Bright sunlight (screen glare)
- Dark room (high contrast mode)

## API contract versions (future)
If the product exposes a REST API, it should follow SOA standardized service contract principles. The current version uses form POSTs with an implicit contract. Before release, the API should be documented (OpenAPI) and versioned.

## Testing approach
- Automated cross-browser testing is not implemented due to time constraints.
- Manual testing covers Chrome, Firefox, and Safari on desktop.
- Load testing is automated with Locust (see specialized testing report).
18 changes: 18 additions & 0 deletions assignment-5-manual-testing.md
@@ -0,0 +1,18 @@
# Manual Testing Required Beyond Automation (Assignment 5)

## What cannot be automated (or was not automated)

1. **Stress testing** – Requires manual observation of degradation patterns under extreme load.
2. **Cross-browser visual validation** – Automated tests can check that the canvas renders, but not that colors appear correct on different screens.
3. **Accessibility** – Keyboard navigation and screen reader compatibility require human testing.
4. **Environmental conditions** – Screen glare, dark room, and varying brightness levels cannot be simulated.

## Manual test cases

1. **Stress test** – Run `docker compose up`, then send 100 rapid requests to `/`. Observe if the server crashes or slows down.
2. **Cross-browser** – Test on Chrome, Firefox, Safari. Verify canvas rendering and number visibility.
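3. **Session isolation** – Open two tabs in the same browser, take the test in tab 1, then in tab 2. Tabs in one browser share the session cookie, so verify that progress in one tab does not corrupt the other. Repeat with a private window or a second browser and verify that those sessions are independent.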
4. **Statelessness check** – After completing the test, refresh the page. The user should have to start over. Document whether this is acceptable.
5. **Monitoring endpoint** – Visit `/debug/stats` and verify CPU/memory readings look plausible.
6. **Keyboard navigation** – Tab through all inputs and buttons. Verify you can submit with Enter.
7. **Screen reader** – Use NVDA (Windows) or VoiceOver (Mac) to navigate the test. Note any confusing announcements.
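
The "100 rapid requests" step in the stress test case can be scripted so that only the observation stays manual. The sketch below is one possible way to do it using the standard library; `fetch_status` and `burst` are hypothetical helpers, not part of this project, and the port is assumed to be 5000 as in the CI workflow.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure, etc.


def burst(url: str, total: int = 100, workers: int = 20, fetch=fetch_status) -> dict:
    """Send `total` requests across `workers` threads and summarize the outcome."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, [url] * total))
    elapsed = time.perf_counter() - started
    ok = sum(1 for status in statuses if 200 <= status < 400)
    return {
        "total": total,
        "ok": ok,
        "failed": total - ok,
        "seconds": round(elapsed, 2),
    }
```

With the containers running, `print(burst("http://localhost:5000/"))` gives a quick count of failures to record alongside the manual observations.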
78 changes: 78 additions & 0 deletions assignment-5-specialized-testing.md
@@ -0,0 +1,78 @@
# Specialized Testing Report

## Load Testing (Automated)

I implemented load testing using Locust. The test simulates 10 concurrent users submitting answers over 30 seconds.

### How to run
```bash
cd mock-project
docker compose up --build -d
./run-load-test.sh
```

### Results (local run)
- Health endpoint: 100% success, average response <10ms
- Form submission: 100% success, average response ~150ms
- No crashes or timeouts observed

### Limitation
This is a minimal load test. The Flask development server is intended for local development, not production traffic. A production deployment would need a production WSGI server such as gunicorn.
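
As a rough illustration of that production setup, a gunicorn configuration file (which is plain Python) might look like the sketch below. The file is hypothetical and not part of this PR; it assumes gunicorn is added to `backend/requirements.txt` and that the app listens on port 5000 as in the CI workflow.

```python
# gunicorn.conf.py (hypothetical file; not part of this PR)
import multiprocessing

# Listen on the same port the CI workflow probes.
bind = "0.0.0.0:5000"

# A common starting point is (2 x CPU cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds before an unresponsive worker is killed and restarted.
timeout = 30
```

It would be started with `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object in `app.py` is named `app`.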

## Stress Testing (Not Automated)

I did not automate stress testing. The Flask development server is not designed for high load. This is documented as manual testing.

## Unit Tests (Automated)

I added unit tests for the following. They are endpoint-level checks; the cone score calculation itself is not yet covered:
- Health endpoint returns 200 OK
- Result page loads without crashing
- Debug stats endpoint returns CPU, memory, and session metrics

All 3 tests pass.

## Operational Monitoring (Implemented)

I added a `/debug/stats` endpoint that returns:
- CPU percentage
- Memory percentage
- Active session count
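
The "readings look plausible" check from the manual test cases could be expressed as a small helper like the sketch below. The key names `cpu_percent`, `memory_percent`, and `active_sessions` are assumptions; the actual field names returned by `/debug/stats` are not shown in this report.

```python
def stats_problems(stats: dict) -> list:
    """Return a list of human-readable problems; an empty list means plausible."""
    problems = []

    # Percentages are assumed to be system-wide values between 0 and 100.
    for key in ("cpu_percent", "memory_percent"):
        value = stats.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 100:
            problems.append(f"{key} should be a number between 0 and 100, got {value!r}")

    # The session count is assumed to be a non-negative integer.
    sessions = stats.get("active_sessions")
    is_count = isinstance(sessions, int) and not isinstance(sessions, bool)
    if not is_count or sessions < 0:
        problems.append(f"active_sessions should be a non-negative integer, got {sessions!r}")

    return problems
```

A unit test could call the endpoint, parse the JSON body, and assert that this helper returns an empty list.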

## SOA/REST Observations

Following Fielding's REST constraints (SRC-3, SRC-27):

- **Statelessness violation**: The current design uses server-side sessions. This creates a scalability bottleneck. A stateless design (client-side storage or JWTs) would be more aligned with REST principles.

- **Service boundaries** (SRC-5): The application has implicit boundaries between the UI, scoring logic, and session store. Documenting these boundaries helps with integration testing.

## Diagnostic Consistency Tracking

While working on this assignment, I discovered an important gap in existing colorblindness tests. I took the Enchroma test three times over several months. It diagnosed me as Deutan twice and Protan once. A user who gets different results from the same test will not trust any of them.

### Implementation

I added consistency tracking to my own test. Users now:

1. Log in with a username (no password required)
2. Take the test as normal
3. See their history and consistency score on the results page

The system stores results in a JSON file and calculates:
- Total number of sessions
- Consistency percentage (how often the same diagnosis appears)
- Most common diagnosis
- Last 3 results
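
The four values above can be computed from a list of past diagnoses roughly as follows. This is a sketch rather than the project's code: `summarize_history` is a hypothetical name, and it assumes "consistency" means the share of sessions that match the most common diagnosis.

```python
from collections import Counter


def summarize_history(diagnoses: list) -> dict:
    """Summarize a chronological list of diagnosis labels for one user."""
    if not diagnoses:
        return {
            "total_sessions": 0,
            "consistency_percent": None,
            "most_common": None,
            "last_three": [],
        }
    # most_common(1) returns [(label, count)] for the most frequent label.
    most_common, count = Counter(diagnoses).most_common(1)[0]
    return {
        "total_sessions": len(diagnoses),
        "consistency_percent": round(100 * count / len(diagnoses), 1),
        "most_common": most_common,
        "last_three": diagnoses[-3:],
    }
```

For the Enchroma history described earlier (Deutan twice, Protan once), this definition gives a consistency of about two thirds.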

### Automation

This feature is fully automated. The test itself saves results, loads past history, and displays consistency without any manual intervention.

### Value

Following the Pareto heuristic, I expect the minority of users who see inconsistent results to generate a disproportionate share of complaints and lost trust (roughly 20% of users driving 80% of complaints; I have not measured this). Focusing on stability across sessions is the highest-value specialized testing I added to my product.

### Results

I tested this by taking my own test multiple times with different usernames. The consistency tracking works correctly. Future work would involve user studies to see how often real users get inconsistent results and whether my test is more stable than Enchroma.
6 changes: 6 additions & 0 deletions mock-project/Dockerfile
@@ -0,0 +1,6 @@
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ .
CMD ["python", "app.py"]