A collection of file manipulation and inspection utilities, including pattern-based file operations and knapsack-based file selection.
You can install filesmith using pip:
```bash
pip install filesmith
```

Filesmith requires Python 3.10+ and includes the shared smith-utils APIs.
Filesmith provides a unified CLI with subcommands.
```bash
filesmith <command> [arguments] [options]
```

The `find-move` subcommand finds files and copies or moves them.
```bash
filesmith find-move <src> <dst> [-p PATTERN] [-m {copy,move}] [-n] [-R]
```

- `src`: Source directory.
- `dst`: Destination directory.
- `-p`, `--pattern`: Glob pattern (default: `*`).
- `-m`, `--mode`: `copy` or `move` (default: `copy`).
- `-n`, `--dry-run`: Show what would be done without doing it.
- `-R`, `--no-recursive`: Do not search recursively.
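The options above map naturally onto the Python standard library. The block below is a minimal sketch of the same behavior, not filesmith's implementation: `find_move` is a hypothetical name, and it assumes that matches keep their path relative to `src` inside `dst` (filesmith may flatten them instead).

```python
import shutil
from pathlib import Path


def find_move(src: Path, dst: Path, pattern: str = "*", mode: str = "copy",
              dry_run: bool = False, recursive: bool = True) -> list[tuple[Path, Path]]:
    """Hypothetical sketch of `filesmith find-move`; not the package's own code."""
    if mode not in ("copy", "move"):
        raise ValueError(f"mode must be 'copy' or 'move', got {mode!r}")
    # rglob searches subdirectories; glob only looks inside src itself (like -R).
    matches = src.rglob(pattern) if recursive else src.glob(pattern)
    planned = []
    # Materialize the sorted list first so moving files does not disturb the search.
    for path in sorted(p for p in matches if p.is_file()):
        # Assumption: preserve the layout under dst so same-named files don't collide.
        target = dst / path.relative_to(src)
        planned.append((path, target))
        if dry_run:
            print(f"[dry-run] {mode} {path} -> {target}")
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        if mode == "copy":
            shutil.copy2(path, target)  # copy2 also preserves timestamps
        else:
            shutil.move(str(path), str(target))
    return planned
```

Returning the planned `(source, target)` pairs makes the dry-run and real runs easy to compare.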
Example:
```bash
filesmith find-move ./src ./backup -p "*.py"
```

The `knapsack` subcommand groups knapsack-related operations.
`knapsack copy` copies a subset of files to a destination without exceeding a total size capacity, given in bytes.
```bash
filesmith knapsack copy <src_dir> <dest_dir> <capacity> [-p PATTERN] [-n] [-R]
```

Example:
```bash
# Copy up to 100 MiB (104857600 bytes) of JPEG images
filesmith knapsack copy ./photos ./usb-drive 104857600 -p "*.jpg"
```

`knapsack solve` solves a general knapsack/subset-sum problem for integer items.
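A subset-sum solver looks for the largest total that does not exceed the capacity. One standard way to compute it is a dynamic program over reachable totals, sketched below with a hypothetical `subset_sum` function; it illustrates the technique and is not necessarily the algorithm filesmith uses. Time and memory grow with n × capacity, so very large byte capacities make this plain version expensive.

```python
def subset_sum(items: list[int], capacity: int) -> tuple[int, list[int]]:
    """Hypothetical 0/1 subset-sum sketch: best total <= capacity plus item indices."""
    if capacity < 0 or any(size < 0 for size in items):
        raise ValueError("capacity and item sizes must be non-negative")
    reachable = {0}
    # parent[t] = (s, i): total t was first reached by adding items[i] to total s.
    parent: dict[int, tuple[int, int]] = {}
    for i, size in enumerate(items):
        # Only extend totals reachable *before* item i, so each item is used once.
        new_totals = [(s + size, s) for s in reachable
                      if s + size <= capacity and s + size not in reachable]
        for total, previous in new_totals:
            parent[total] = (previous, i)
            reachable.add(total)
    best = max(reachable)
    # Walk the parent links back to 0 to recover which items were chosen.
    indices = []
    total = best
    while total != 0:
        total, i = parent[total]
        indices.append(i)
    return best, indices


print(subset_sum([10, 20, 30, 40, 50], 65))  # → (60, [2, 1, 0])
```

Several subsets can reach the same best total (here 10 + 20 + 30 and 10 + 50 both give 60), so different solvers may return different indices for the same input.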
```bash
filesmith knapsack solve <capacity> <items...>
```

The `duplicates` subcommand finds duplicate files by walking a directory down to a specified depth and comparing SHA-256 digests computed via smith-utils.
```bash
filesmith duplicates <root> --maxdepth <depth> [-p PATTERN] [-o REPORT]
```

- `root`: Root directory to scan.
- `--maxdepth`: Maximum directory depth to scan. Use `0` to scan only the files directly in `root`.
- `-p`, `--pattern`: Glob pattern (default: `*`).
- `-o`, `--output`: Path of the text report (default: `filesmith-duplicates.txt`).
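The depth-limited scan can be sketched with `os.walk` and `hashlib`. This hypothetical `find_duplicates` function assumes that depth counts directory levels below `root` (so `maxdepth=0` hashes only files directly in `root`) and, as an extra optimization, buckets files by size so only same-size files are hashed. Filesmith computes its digests via smith-utils, which this sketch does not use.

```python
import fnmatch
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str, maxdepth: int, pattern: str = "*") -> list[list[Path]]:
    """Hypothetical sketch: groups of files with identical SHA-256 digests."""
    base = Path(root)
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(base):
        depth = len(Path(dirpath).relative_to(base).parts)  # root itself is depth 0
        if depth >= maxdepth:
            dirnames.clear()  # prune: do not descend below maxdepth
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                path = Path(dirpath) / name
                by_size[path.stat().st_size].append(path)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Clearing `dirnames` in place is the documented way to stop a top-down `os.walk` from descending further.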
Example:
```bash
filesmith duplicates ./archive --maxdepth 2 -o duplicate-report.txt
```

The report is line-oriented and tab-separated, so downstream tools can parse it easily:

```text
# filesmith duplicate report v1
root ./archive
maxdepth 2
duplicate_groups 1
duplicate_files 2
wasted_bytes 1024
group 1 sha256 ... size 1024 count 2
file 1 1024 ./archive/a.bin
file 1 1024 ./archive/copy/a.bin
```
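Because every line is a tag followed by tab-separated fields, a downstream tool can read the report with plain string splitting. The hypothetical parser below infers the field layout from the sample report (for example, that a `file` line carries the group number, size, and path); treat that layout as an assumption, not a specification.

```python
from dataclasses import dataclass, field


@dataclass
class DuplicateGroup:
    sha256: str
    size: int
    files: list[str] = field(default_factory=list)


def parse_report(text: str) -> tuple[dict[str, str], list[DuplicateGroup]]:
    """Hypothetical parser for the v1 duplicate report layout."""
    header: dict[str, str] = {}
    groups: dict[int, DuplicateGroup] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the version comment
        tag, *fields = line.split("\t")
        if tag == "group":
            # Assumed layout: group <n> sha256 <digest> size <bytes> count <k>
            groups[int(fields[0])] = DuplicateGroup(sha256=fields[2], size=int(fields[4]))
        elif tag == "file":
            # Assumed layout: file <group> <bytes> <path>; rejoin in case the path has tabs
            groups[int(fields[0])].files.append("\t".join(fields[2:]))
        elif len(fields) == 1:
            header[tag] = fields[0]  # root, maxdepth, duplicate_groups, ...
    return header, list(groups.values())
```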
The original regex-based copy tool is available via:
```bash
filesmith-legacy copy <origin> <destination> <pattern> [--newermt REF] [-n] [-q]
```

Filesmith can also be used as a Python library.
Filesmith re-exports common APIs from smith-utils for date parsing, numeric cleanup, text normalization, and string distance helpers.
```python
from filesmith import ensure_date, normalize_text, parse_numeric_value

date = ensure_date("20231225")
amount = parse_numeric_value("(1,250.50)")
text = normalize_text(" Smith Utils ")
```

`find_files` and `transfer_files` provide a modern, structured way to find files and move or copy them.
```python
from pathlib import Path

from filesmith import find_files, transfer_files

# Find all Python files recursively
files = find_files(Path("./src"), pattern="*.py", recursive=True)

# Transfer them to a backup folder (mode can be "copy" or "move")
transfer_files(files, Path("./backup"), mode="copy", on_conflict="skip")
```

`FindMoveJob` is an orchestration class for "find and transfer" operations.
```python
from pathlib import Path

from filesmith import FindMoveJob

job = FindMoveJob(
    src_root=Path("./src"),
    dest_root=Path("./dst"),
    pattern="*.txt",
    mode="move",
)
job.run()
```

`get_target_file` finds exactly one file in a directory that matches a key and an optional extension. It raises `ValueError` if zero or multiple files match.
```python
from filesmith import get_target_file

# Returns a Path if exactly one file matches
path = get_target_file("./data", "report_2023", ext=".csv")
print(f"Found unique report: {path.name}")
```

`copy_files` is a regex-based copy tool with optional modification-time filtering.
```python
from filesmith import copy_files

# Copy files matching a regex that were modified after a given date
copy_files(
    origin="./logs",
    destination="./archive",
    pattern=r"error_.*\.log",
    newermt="2023-01-01",
)
```

`find_duplicate_files` finds duplicate files by content digest, and `write_duplicate_report` writes the result to a downstream-friendly text report.
```python
from filesmith import find_duplicate_files, write_duplicate_report

groups = find_duplicate_files("./archive", maxdepth=2)
write_duplicate_report(groups, root="./archive", maxdepth=2, output_path="duplicates.txt")
```

`copy_files_by_capacity` copies a subset of files that fits within a specified byte capacity, which is useful for filling external drives.
```python
from filesmith import copy_files_by_capacity

total_size, ops = copy_files_by_capacity(
    src_dir="./photos",
    dest_dir="/mnt/usb",
    capacity=1024 * 1024 * 700,  # 700 MiB
)
print(f"Filled {total_size} bytes across {len(ops)} files.")
```

`run_knapsack` is a general-purpose subset-sum solver for integer items.
```python
from filesmith import run_knapsack

items = [10, 20, 30, 40, 50]
capacity = 65
best_sum, indices = run_knapsack(items, capacity)
# best_sum: 60, indices: [4, 0] (50 + 10) or another subset summing to 60
```

- Enriched Python API documentation with practical usage examples.

- Added unified `filesmith` command with subcommands: `find-move`, `knapsack`.
- Added `filesmith-legacy` for the previous regex-based CLI.
- Expanded Python API in `filesmith` package.
- Improved internal structure (finder, transfer, engine).

- Added `get_target_file` utility.
- Improved `copy_files` with structured logging and `--newermt` filtering.