Add a flag to specify url path pattern #95

@Nadreck

What problem does this solve?

When the checker processes llms.txt to build the list of pages to check, it strips the .md extension and uses the remaining path as the URL, on the assumption that the site uses "clean" URL paths with no extension. That isn't always the case: some docs sites still use the older, but still perfectly valid, filename.ext pattern. For those sites, the outcome depends on however the server happens to handle the extensionless path (a 404, a redirect, or something else), which can skew the check results in ways that wouldn't apply to a real agent.

Example:
https://www.example.com/filename.md in llms.txt is converted to https://www.example.com/filename by afdocs, but the server uses real filenames and expects https://www.example.com/filename.html. The exact result depends on how that particular server handles the request, but it will quite likely be a 404.
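To make the mismatch concrete, here is a minimal Python sketch of the stripping behavior as described in this issue. It is not afdocs' actual code; the function name `strip_md_extension` and the variable names are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_md_extension(url: str) -> str:
    """Mimic the behavior described above: drop a trailing .md from the URL path.

    Hypothetical helper for illustration only, not taken from afdocs.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".md"):
        path = path[: -len(".md")]
    return urlunsplit(parts._replace(path=path))


llms_txt_entry = "https://www.example.com/filename.md"
url_checked = strip_md_extension(llms_txt_entry)
# What the example server actually serves, per the example above.
url_served = "https://www.example.com/filename.html"

print(url_checked)                # https://www.example.com/filename
print(url_checked == url_served)  # False
```

The checked URL and the served URL differ only by the extension, so whatever the check reports reflects the server's fallback handling rather than the page itself.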

What would you like to see?

Ideally the checker would automatically detect which URL pattern a site uses, but a simpler solution is probably to add a flag that lets the user specify how paths should be processed.
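One possible shape for such a flag, sketched in Python purely to illustrate the idea. The flag name `--page-extension`, its default, and the helper `rewrite_page_url` are all assumptions made up for this sketch; none of them exist in afdocs today.

```python
import argparse
from urllib.parse import urlsplit, urlunsplit


def rewrite_page_url(url: str, page_extension: str = "") -> str:
    """Replace a trailing .md in the URL path with page_extension.

    An empty page_extension keeps the "clean" extensionless URL.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".md"):
        path = path[: -len(".md")] + page_extension
    return urlunsplit(parts._replace(path=path))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a URL path pattern option")
    # Hypothetical flag: not an existing afdocs option.
    parser.add_argument(
        "--page-extension",
        default="",
        help='extension to use in place of .md, e.g. ".html" (default: none)',
    )
    return parser


# Simulate a command line that passes the hypothetical flag.
args = build_parser().parse_args(["--page-extension", ".html"])
print(rewrite_page_url("https://www.example.com/filename.md", args.page_extension))
# https://www.example.com/filename.html
```

Defaulting to an empty extension would preserve the current clean-URL behavior, so only sites that use real filenames would need to pass the flag.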

Alternatives considered

Can't really think of one.

Metadata

Labels: enhancement (New feature or request)
