
feat(legacy): import sbpl.net Drupal content (33 alumni, 6 new sections, verbatim archive)#2

Open
PegasusGTV wants to merge 7 commits into sbpl:main from PegasusGTV:import-legacy-sbpl-net-content


@PegasusGTV

Summary

This PR imports the content of the legacy sbpl.net Drupal 7 website into the new Hugo-based sbpl.github.io. The source material lives on the migration workstation under ~/Desktop/newsite/newwebsite/ (the pre-extracted clean snapshot of the Apr-2018 Drupal MySQL dump). The first two commits do two independent things (the later commits on this branch are described by their own messages further down):

Commit 1 — feat(site): import content sections from legacy sbpl.net

New top-level pages and menu entries:

| Path | Description |
| --- | --- |
| /robots/ | Allen the PR2, Man-Hack the Hexacopter, Melvin the Segbot (full bios + images) |
| /software/ | Current SBPL libs, MoveIt plugin, lattice/SIPP/E-Graph planners, papers |
| /old-software/ | Historical SVN/Mercurial links from the ROS Diamondback/Electric/Fuerte era |
| /videos/ | YouTube reel grouped by topic (mobile-manip, manip, navigation, dyn-env, hexacopter) |
| /tutorials/ | Index that links into the archived tutorial HTML (so the legacy primers still render) |
| /contact/ | Physical/mailing address and the sbpl-users mailing list |

Also:

  • +33 alumni in data/members/alumni.yaml covering the CMU/UPenn lab roster of the sbpl.net era (PhDs, masters, postdocs, research staff, interns). Three of them get real headshots (Brian MacAllister, Jonathan Butzke, Bradford Neuman) — the rest fall back to the existing placeholder SVG because the legacy site didn't carry member photos.
  • Refreshed homepage blurb (hugo.toml::params.labBlurb) using the authoritative SBPL mission statement from the original 0009-home.md.
  • Generic layouts/_default/single.html so any future standalone markdown page renders without bespoke layout work.
  • CSS for the new sections (.legacy-page, .video-grid) appended to static/css/custom.css.

Commit 2 — feat(archive): import verbatim sbpl.net Drupal snapshot under /archive/legacy-drupal

Drop the entire cleaned extraction (55 pages × 3 formats, 2 blocks, all Drupal metadata, 77 images, 3 user avatars, misc uploads, original settings.php, re-extraction scripts) into static/archive/legacy-drupal/ so the published site serves the historical content at /archive/legacy-drupal/.... This is what the Tutorials page links into.

The .gitattributes declares two folders as Git LFS:

  • static/archive/legacy-drupal/media/videos/*.mp4 (~559 MB of 5 .mp4)
  • static/archive/legacy-drupal/data/raw/*.sql (~184 MB of 2 raw MySQL dumps)

Neither is committed in this PR. The migration environment couldn't install git-lfs (GitHub release-asset host was firewalled). Stub README.md files at both paths plus a top-level LEGACY_LFS.md document exactly how to add them on a workstation that has LFS configured.

Privacy / security redactions

Before committing, the following were redacted from the archive (full list in static/archive/legacy-drupal/README.md):

  • content/users.json: email local-parts partially masked.
  • data/drupal_variables.json: cron_key, drupal_private_key, smtp_password, and FTP username replaced with ***REDACTED***.
  • data/contact_categories.json: contact-form recipient masked.
  • content/pages/0030-contact.*: andrew.dornbush@gmail.com masked.
  • content/blocks/001-private-files.*: plaintext username:sbpl password:arastar from the old /private/SBPL/ upload form redacted.
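
For readers who want to reproduce this kind of redaction, here is a minimal Python sketch of the two masking steps. It is an illustration under stated assumptions, not the script used for this PR: the function names, the "keep the first character" masking rule, and the flat-dictionary shape of the variables file are all hypothetical. Only the three key names and the `***REDACTED***` marker come from the list above.

```python
# Hypothetical helpers; the real redaction procedure is documented in
# static/archive/legacy-drupal/README.md, not here.

# Key names taken from the redaction list; the FTP username key is omitted
# because its exact name is not given in the PR text.
SENSITIVE_KEYS = {"cron_key", "drupal_private_key", "smtp_password"}
REDACTED = "***REDACTED***"


def mask_local_part(email: str, keep: int = 1, mask_char: str = "*") -> str:
    """Mask all but the first `keep` characters of an email's local part."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return email  # not something that looks like an address; leave as-is
    visible = local[:keep]
    return f"{visible}{mask_char * (len(local) - len(visible))}@{domain}"


def redact_variables(variables: dict) -> dict:
    """Return a copy with the values of sensitive keys replaced (assumes a flat dict)."""
    return {
        key: (REDACTED if key in SENSITIVE_KEYS else value)
        for key, value in variables.items()
    }
```

For example, `mask_local_part("jane.doe@example.org")` keeps the domain and the first character of the local part and replaces the rest with asterisks.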

What's not in this PR

  • Lab member photos — only 3 are available in the legacy data (avatars of Brian MacAllister, Jonathan Butzke, Bradford Neuman). The 30 other alumni will use the existing placeholder.svg until current/former members supply headshots. Bios are intentionally short (1–2 lines verbatim from the old member pages); please update if you have richer text.
  • The 5 legacy .mp4 videos and 2 raw SQL dumps — see LEGACY_LFS.md for the LFS follow-up.
  • The current lab's People / Publications data is untouched.

Test plan

  • hugo serve — visit /, /people/, /publications/, /robots/, /software/, /old-software/, /videos/, /tutorials/, /contact/; confirm the new menu links resolve, images load, and YouTube embeds render.
  • Visit /archive/legacy-drupal/content/pages/0049-primer-to-the-search-based-planning-library-sbpl.html and a few sibling tutorials directly — they should render as the original Drupal HTML.
  • On /people/, confirm the Alumni section shows ~34 entries grouped under a single section (the layout doesn't filter alumni by category, so the new research-staff/intern/developer category strings are fine).
  • Confirm the home-page hero blurb reads "The Search-Based Planning Laboratory researches methodologies and algorithms that enable autonomous systems to act fast, intelligently, and robustly…".
  • Sanity-check that no sbpl/arastar, mlcl46$^, or other plaintext credential survived: git grep -i -E 'arastar|mlcl46|drupal_private_key|cron_key' -- ':!LEGACY_LFS.md' ':!**/README.md' should return no real-secret hits.
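
Where `git grep` is not convenient (for example on an unpacked copy of the archive), the same check can be approximated in Python. This is a rough sketch: the regular expression reuses the patterns from the command above, while the function names and the skip-by-filename rule are assumptions made for illustration. Key names such as `cron_key` will still match in the redacted JSON, so each hit needs a manual look.

```python
import re
from pathlib import Path

# Patterns copied from the git grep command in the test plan.
SECRET_RE = re.compile(r"arastar|mlcl46|drupal_private_key|cron_key", re.IGNORECASE)


def scan_lines(lines):
    """Return (line_number, stripped_line) for every line that matches SECRET_RE."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if SECRET_RE.search(line)
    ]


def scan_tree(root, skip_names=("README.md", "LEGACY_LFS.md")):
    """Walk `root` and collect (path, line_number, line) hits.

    Skipping by bare filename only approximates the pathspec exclusions
    used by the git grep command.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts or path.name in skip_names or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits.extend((str(path), n, line) for n, line in scan_lines(text.splitlines()))
    return hits
```

Calling `scan_tree("static/archive/legacy-drupal")` from the repository root would list candidate lines for review.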

Follow-up checklist for maintainers (after merge)

  1. Decide whether to keep the legacy archive inside static/ (served publicly) or move it to a non-static folder so it ships in the repo but isn't included in the GitHub Pages build output.
  2. On a workstation with git-lfs installed, add the videos and SQL dumps following LEGACY_LFS.md (free GitHub LFS quota is 1 GB; the combined ~743 MB fits, but watch bandwidth).
  3. Update the 33 imported alumni with real photos / current affiliations as they become available.
  4. Once the new pages are reviewed, you may want to drop the old member pages from the archive (they're already promoted to alumni.yaml).

PegasusGTV and others added 3 commits May 11, 2026 18:34
Add the lab content that lived on the old Drupal site (~2012–2015)
into the new Hugo site as proper top-level sections, plus 33 historical
lab members as Alumni:

- New top-level pages and menu entries:
  - /robots/        — Allen the PR2, Man-Hack the Hexacopter, Melvin the Segbot
  - /software/      — current SBPL libraries, MoveIt plugin, lattice/SIPP/E-Graph planners
  - /old-software/  — historical SVN / ROS Diamondback / Electric / Fuerte links
  - /videos/        — YouTube reel (mobile-manip, manip, navigation, dyn-env, hexacopter)
  - /tutorials/     — index that links into the archived tutorial HTML
  - /contact/       — physical / mailing address and sbpl-users mailing list
- Add 33 historical members to `data/members/alumni.yaml` covering the
  CMU / UPenn lab roster of that era (PhDs, masters, postdocs, research
  staff, interns). Three of them (Brian MacAllister, Jonathan Butzke,
  Bradford Neuman) have real headshots from the legacy site avatars,
  saved under `static/images/members/legacy/`.
- Refresh the homepage `labBlurb` (in `hugo.toml`) with the authoritative
  mission statement from the original sbpl.net home page.
- Introduce a generic `layouts/_default/single.html` so any future
  standalone markdown page renders without custom layout work.
- Extend `static/css/custom.css` with `.legacy-page`, `.video-grid`,
  and related rules to style the new sections consistently.

Co-authored-by: Cursor <cursoragent@cursor.com>
…e/legacy-drupal

Drop the cleaned, extracted snapshot of the old Drupal 7 site
(sbpl.net, last DB dump from Apr 2018) into `static/archive/legacy-drupal/`
so it ships with the published site at `/archive/legacy-drupal/...`.
This is what the Tutorials page and the cross-links from imported
content rely on.

Inventory (~31 MB on disk):

- `content/pages/`  — 55 page triplets (.md / .html / .json) including
  the original Home, People, Robots, Software, Tutorials, and 25
  individual member pages.
- `content/blocks/` — 2 custom HTML blocks (private/public file
  upload forms). The plaintext `username: sbpl password: arastar`
  credentials in `001-private-files.*` have been REDACTED — the old
  `/private/SBPL/` upload endpoint is offline.
- `content/users.json`, `content/menus.json`, `content/site_variables.json`
  — Drupal user list (passwords already stripped upstream, emails
  masked here), navigation tree, curated site variables.
- `data/`           — taxonomy, file-managed, file-usage, roles,
  permissions, comments, text formats, page-field tables, and 180
  Drupal variables.
- `media/images/`   — 77 source images + 77 thumbnails (~22 MB).
- `media/logos/`    — SBPL logo (transparent + 70% variant).
- `media/pictures/` — 3 user-avatar pictures.
- `media/uploads/`  — misc Drupal uploads (~7 MB), plus
  `media/uploads/SBPL/{sbpl.tar.gz,sbpl.zip}` — the legacy software
  download tarballs (kept; they're public).
- `reference/`      — original Drupal `settings.php` (DB password
  already redacted upstream) and the install procedure.
- `scripts/`        — the Python 3 stdlib-only extractor/manifest
  scripts so the dumps can be re-extracted from scratch.

Redactions applied before commit (see
`static/archive/legacy-drupal/README.md` for the full list):

- `content/users.json` — email local-parts partially masked.
- `data/drupal_variables.json` — `cron_key`, `drupal_private_key`,
  `smtp_password`, FTP `username` replaced with `***REDACTED***`.
- `data/contact_categories.json` — contact-form recipient masked.
- `content/pages/0030-contact.*` — `andrew.dornbush@gmail.com` masked.

Two large subtrees are NOT committed here because the migration
environment couldn't install git-lfs:

- `media/videos/`  — ~559 MB of 5 .mp4 files
- `data/raw/`      — ~184 MB of 2 raw MySQL dumps

Both folders have a stub README explaining what's missing and how to
add the files via Git LFS. The repo's `.gitattributes` already declares
those paths as LFS-tracked so dropping the files in and committing
will Just Work. See `LEGACY_LFS.md` at the repo root for the
full follow-up checklist (including a note on the free 1 GB LFS quota
and the option of moving the large blobs out of `static/` to keep them
out of every GitHub-Pages build clone).

Co-authored-by: Cursor <cursoragent@cursor.com>
Public forks can't host new LFS objects (GitHub policy — they share LFS
with the upstream parent, which requires upstream-write to upload). The
migration account is a member of the `sbpl` org but doesn't currently
have individual push permission on `sbpl/sbpl.github.io`, so the
videos couldn't be added to LFS from this PR.

What changes:

- `.gitattributes` now also tracks `*.wmv` and `*.avi` under
  `static/archive/legacy-drupal/media/videos/` (one of the legacy
  videos is .wmv, not .mp4).
- `static/archive/legacy-drupal/media/videos/README.md` is rewritten
  to explain (a) why the videos aren't in the PR yet, and (b) the
  exact one-shot procedure a maintainer with `sbpl/sbpl.github.io`
  push access can run to drop them in (a `cp` + `git rm README` +
  `git add` + `git commit` + `git push`; .gitattributes already
  routes the files to LFS automatically).
- `static/archive/legacy-drupal/data/raw/README.md` is rewritten to
  state clearly that the two raw MySQL dumps are held back not for
  technical reasons but because they still contain unredacted PII
  (real emails, password hashes, IPs, ~27k spam-bot registrations,
  SMTP / private / cron keys). It includes a sanitisation checklist.
- `LEGACY_LFS.md` is rewritten to reflect the actual state of this
  PR (scaffold only for both folders) and to summarise the GitHub
  free-tier LFS quota (1 GB storage / 1 GB monthly bandwidth) plus
  the option of mirroring `.mp4` blobs externally (Internet Archive,
  CMU Box) and just linking from `/videos/`.

Co-authored-by: Cursor <cursoragent@cursor.com>
@PegasusGTV
Author

I tried to add the 5 legacy .mp4 / .wmv videos via Git LFS in commit c854913. It turned out that public forks can't upload new LFS objects to GitHub LFS storage (forks share their LFS namespace with the upstream and require upstream write access to push new blobs), and my account @PegasusGTV is a member of sbpl but doesn't currently have individual push permission on sbpl/sbpl.github.io.

So this PR now ships:

  • .gitattributes rules covering *.mp4, *.wmv, *.avi, *.mov, *.sql, *.sql.gz under static/archive/legacy-drupal/
  • A clear README at static/archive/legacy-drupal/media/videos/README.md with the one-shot procedure for any maintainer with push access — clone, drop the 5 files in, git add, git commit, git push. The .gitattributes routes them to LFS automatically.
  • A separate README at static/archive/legacy-drupal/data/raw/README.md explaining why the 2 raw MySQL dumps are held back (unredacted PII — emails, password hashes, IPs, ~27k spam-bot registrations, SMTP/private/cron keys) and what to strip before they could ever be pushed publicly.
  • An updated LEGACY_LFS.md at the repo root that ties it all together and discusses the free-tier 1 GB LFS storage / 1 GB monthly bandwidth budget and the option of mirroring the .mp4 blobs to Internet Archive / CMU Box instead.

The full 559 MB of videos and 184 MB of SQL dumps still live on the migration workstation (~/Desktop/newsite/newwebsite/{media/videos,data/raw}/) ready to be picked up.

If one of you with push access to sbpl/sbpl.github.io could grant @PegasusGTV write access to this repo, I'd be happy to push the videos directly on this same branch instead, since the LFS endpoint would then accept the upload.

PegasusGTV and others added 4 commits May 12, 2026 14:22
Adds 110 publication stubs (9 hand-curated + 101 auto-generated) recovered
from the old Drupal site. The auto-generated stubs were built from the
sbpl.net/files/*.pdf URL list extracted from the Drupal access-log dump
(data/raw/dump-2018-04-12.sql) and cover the full ~2002–2016 publication
record.

For each PDF found in the legacy file catalogue:
- Title is a best-effort heuristic expansion of the filename slug
- Venue + year are parsed from the filename suffix (e.g. _icra11, _rss08)
- Authors list is conservative (Maxim Likhachev only); co-authors flagged
  for verification from the source PDF
- project_page links to the live mirror on cs.cmu.edu/~maxim/files/
- Body includes the original (defunct) sbpl.net/files/ URL for provenance

Each file is marked as needing canonical-citation review in its body so a
maintainer can drop in correct titles + author lists later.
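
As a rough illustration of the suffix heuristic described above, the sketch below parses a venue tag and two-digit year from a filename such as `..._icra11.pdf`. It is not the code in the import script: the regular expression, the function name, the example filenames, and the century pivot (two-digit years of 50 and above are read as 19xx) are all assumptions.

```python
import re

# Trailing "_<letters><two digits>", e.g. "_icra11" or "_rss08".
SUFFIX_RE = re.compile(r"_(?P<venue>[a-z]+)(?P<yy>\d{2})$")


def parse_venue_year(pdf_name: str, pivot: int = 50):
    """Return (VENUE, year) parsed from a filename suffix, or None if absent."""
    stem = pdf_name.rsplit("/", 1)[-1].lower()
    if stem.endswith(".pdf"):
        stem = stem[:-4]
    match = SUFFIX_RE.search(stem)
    if match is None:
        return None
    yy = int(match.group("yy"))
    year = 1900 + yy if yy >= pivot else 2000 + yy
    return match.group("venue").upper(), year


# Hypothetical filenames, used only to show the shape of the result.
print(parse_venue_year("some_planner_icra11.pdf"))
print(parse_venue_year("another_paper_rss08.pdf"))
```

A filename without such a suffix yields `None`, which is one reason each stub is flagged for manual citation review.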

Also adds a .gitignore for Hugo build artefacts.

Co-authored-by: Cursor <cursoragent@cursor.com>
Rewrote every publication entry using Maxim Likhachev's canonical CMU
publications page (https://www.cs.cmu.edu/~maxim/publications.html) as
the source of truth. For each paper we now have:

- the correct title (no more filename-slug expansions)
- the full author list in original order (~109 distinct researchers
  now appear across the catalogue, all auto-linked into the /authors/
  taxonomy)
- a short venue label (e.g. "ICRA 2011", "IJRR 2013", "RSS 2014")
- a long-form venue + year line in the body
- a 'pdf' field pointing at the live CMU mirror
- an 'abstract_html' field where Maxim's page links one
- an 'award' field for Best Paper / Honorable Mention / Influential
  Paper / invited paper distinctions

Also added 19 newer (2017-2021) papers that were on Maxim's page but
weren't yet in the catalogue.

Layout / styling updates:
- publication-card.html and publications/single.html now render the
  award badge, a "PDF" button, and an "Abstract" button when present
- custom.css picks up .publication-award (gold pill) and tightens
  .pub-year / .publication-list spacing for the year-grouped index

Removed 9 hand-curated duplicates that pointed at the same PDFs as
the auto-generated entries. The naming convention is now uniform:
content/publications/<pdf-slug>.md across all 129 papers.

Two pre-existing files (mmd.md, xecbs.md) are untouched - they were
not part of the legacy sbpl.net import.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ent/publications/

Adds the two helpers that built and corrected the publication catalogue,
plus a cached copy of Maxim Likhachev's CMU publications page (the source
of truth) so the import can be reproduced offline.

  scripts/legacy-import/
    ├── 01_import_from_sbpl_net_pdf_list.py
    ├── 02_rewrite_with_canonical_citations.py
    ├── maxim_publications_cached.md
    └── README.md

Re-running step 2 is idempotent and is the easiest way to refresh the
catalogue when Maxim publishes new papers - just delete the cached page
to force a fresh fetch.
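
The "delete the cache to force a fresh fetch" behaviour can be sketched as a small cache-or-fetch helper. This is a minimal illustration, not the code in `02_rewrite_with_canonical_citations.py`: the function name and the injected `fetch` callable are assumptions, chosen so the sketch runs without network access.

```python
from pathlib import Path
from typing import Callable


def load_page(cache_path, fetch: Callable[[], str]) -> str:
    """Return the cached page if it exists; otherwise fetch, cache, and return it."""
    cache = Path(cache_path)
    if cache.exists():
        # Cache hit: no network access, so repeated runs see the same input.
        return cache.read_text(encoding="utf-8")
    text = fetch()
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(text, encoding="utf-8")
    return text
```

Because repeated runs read the same cached text, a deterministic rewrite step produces the same output each time; removing the cache file is the only thing that triggers a new download.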

Co-authored-by: Cursor <cursoragent@cursor.com>
Current Members and Alumni are now both grouped by subcategory:

  Current Members
    - Postdoctoral Researchers
    - PhD Students
    - Master's Students
    - Undergraduate Researchers
    - Research Staff & Engineers
    - Research Collaborators (visiting / external)

  Alumni
    - Postdoc Alumni
    - PhD Alumni
    - Master's Alumni
    - Undergraduate Alumni
    - Research Staff Alumni
    - Past Research Collaborators
    - Intern Alumni

Each section header shows a member count and a one-line description; an
"era" tag (e.g. "sbpl.net (~2012-2015)") is shown on each legacy entry so
the snapshot context is clear.

data/members/alumni.yaml is reorganised: members are grouped under
category-themed comment headers, bios are tightened, categories are
corrected (Sandip Aine - collaborator, Margarite Safonova -
research-staff, etc.), and an `era:` field is added to every legacy
entry.

data/members/current.yaml drops the placeholder "Example Master" row and
re-categorises Aditya Pujara as a research collaborator (he holds his
primary PhD affiliation elsewhere).

Photo handling:

- The generic grey placeholder.svg is replaced with a branded SBPL
  silhouette gradient.
- The member-card partial now renders a coloured circular avatar with
  the person's initials whenever no real photo is available. A small
  deterministic palette gives each researcher a stable colour without
  needing per-person images. Real photos still take precedence and
  gracefully fall back to the initials avatar if the image 404s.
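
The actual logic lives in the Hugo member-card partial; the Python sketch below only illustrates the idea of an initials avatar with a stable colour. The palette values, the function names, and the use of a SHA-256 digest are assumptions made for this example.

```python
import hashlib

# Illustrative palette; the colours used by the site are not listed in this PR.
PALETTE = ("#1f77b4", "#2ca02c", "#d62728", "#9467bd", "#8c564b", "#e377c2")


def initials(name: str) -> str:
    """First letter of the first and last name parts, upper-cased."""
    parts = name.split()
    if not parts:
        return "?"
    if len(parts) == 1:
        return parts[0][0].upper()
    return (parts[0][0] + parts[-1][0]).upper()


def avatar_colour(name: str, palette=PALETTE) -> str:
    """Pick a palette entry from a hash of the name, so it never changes between builds."""
    digest = hashlib.sha256(name.strip().lower().encode("utf-8")).digest()
    return palette[digest[0] % len(palette)]
```

A cryptographic digest is used here instead of Python's built-in `hash()` because the latter is randomised per process for strings and would give a different colour on each run.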

A new "Lab Photos" section at the bottom of the page surfaces the four
group snapshots recovered from the legacy sbpl.net archive (lab cookout
2011, Kennywood 2012).

custom.css picks up styling for the new section headings, the initials
avatar, the era tag, the section lead paragraphs, and the lab-photo
figure grid.

Co-authored-by: Cursor <cursoragent@cursor.com>