# pyrfs

> Pythonic filesystem ergonomics inspired by R's fs — tidy paths, typed values, chainable, pandas-friendly

pyrfs is a Python filesystem library porting the UX of R's fs package: consistent noun_verb naming (path_*, file_*, dir_*, link_*), tidy paths, typed self-describing values (FsPath, Bytes, Perms), explicit failure, and three interchangeable surfaces — functional, fluent FsPath chaining, and a pandas Series accessor with typed DataFrame columns.

# Start here

# pyrfs

**Pythonic filesystem ergonomics, inspired by R's [fs](https://fs.r-lib.org).** Tidy paths, typed self-describing values, explicit failure — chainable, and pandas-native. Pure Python ≥ 3.10, zero hard dependencies.

```
import pyrfs as fs

fs.dir_ls("src", recurse=True, glob="*.py")   # [FsPath('src/app.py'), ...]
fs.file_size("data.csv") > "10MB"             # True — sizes compare to literals
fs.file_copy("a.txt", "backup/")              # FsPath('backup/a.txt'), refuses to clobber
```

## Install

Not yet on PyPI — install from GitHub:

```
pip install "pyrfs @ git+https://github.com/Lightbridge-KS/pyrfs"

# with the pandas integration:
pip install "pyrfs[pandas] @ git+https://github.com/Lightbridge-KS/pyrfs"
```

## One engine, three surfaces

Every operation is implemented once and reachable three ways — pick per task, mix freely:

```
import pyrfs as fs

fs.path("foo", "bar", "a", ext="txt")   # FsPath('foo/bar/a.txt')
fs.dir_ls("data", glob="*.csv")
fs.file_copy("a.txt", "b.txt")          # -> FsPath('b.txt')
```

Closest to R's fs — the `noun_verb` names transfer directly. See [Coming from R's fs](https://pyrfs.netlify.app/coming-from-r/index.md).

```
from pyrfs import FsPath

(FsPath("data") / "raw.csv").with_ext("parquet").copy_to("clean/")
FsPath("project").mkdir().touch_file("README.md")
FsPath("logs").ls(glob="*.log")
```

`FsPath` **is a `str`** — it drops into `open()`, `pd.read_csv()`, any API that takes a path.

```
import pyrfs as fs

(fs.dir_info("src", recurse=True)
   .query("size > '10KB' and type == 'file'")   # typed columns!
   .sort_values("size", ascending=False))

df["path"].fs.ext()                             # vectorized over a column
```

`size` and `permissions` are real ExtensionDtypes — string literals work inside `.query()`.

## Where next

- **[Coming from R's fs](https://pyrfs.netlify.app/coming-from-r/index.md)** — the translation table.
- **[The three surfaces](https://pyrfs.netlify.app/guides/three-surfaces/index.md)** — when to use which.
- **[Typed values](https://pyrfs.netlify.app/guides/typed-values/index.md)** — `Bytes('444.5K')`, `Perms('rw-r--r--')`.
- **[Tour notebook](https://pyrfs.netlify.app/tour/pyrfs-tour/index.md)** — everything, runnable.
- **[API reference](https://pyrfs.netlify.app/api/paths/index.md)** — by family: `path_*`, `file_*`, `dir_*`, `link_*`.

# Coming from R's fs

pyrfs keeps fs's **UX contract** — consistent `noun_verb` naming, tidy paths, predictable typed returns, explicit failure — expressed in idiomatic Python. If you know fs, your muscle memory transfers: the functional names are identical.

## The four families

| Prefix | Domain | Examples |
| --- | --- | --- |
| `path_` | construct & manipulate path strings (**no I/O**) | `path()`, `path_dir()`, `path_ext_set()`, `path_rel()` |
| `file_` | operate on files | `file_create()`, `file_copy()`, `file_info()`, `file_chmod()` |
| `dir_` | operate on directories | `dir_create()`, `dir_ls()`, `dir_info()`, `dir_tree()` |
| `link_` | operate on links | `link_create()`, `link_path()`, `link_copy()` |

Plus predicates (`is_file`, `is_dir`, …), `user_ids`/`group_ids`, and temp helpers (`file_temp`, `path_temp`, `file_temp_push/pop`) — all as in fs.
## Translation table

| R fs | pyrfs functional | pyrfs fluent |
| --- | --- | --- |
| `path("a", "b", ext = "txt")` | `path("a", "b", ext="txt")` | `FsPath("a") / "b"` then `.with_ext("txt")` |
| `dir_ls("d", recurse = TRUE)` | `dir_ls("d", recurse=True)` | `FsPath("d").ls(recurse=True)` |
| `dir_info("d")` | `dir_info("d")` → DataFrame | — |
| `file_copy("a", "b")` | `file_copy("a", "b")` | `FsPath("a").copy_to("b")` |
| `file_size("a")` | `file_size("a")` → `Bytes` | `FsPath("a").size()` |
| `path_ext_set("a.txt", "md")` | `path_ext_set("a.txt", "md")` | `FsPath("a.txt").with_ext("md")` |
| `path_rel("a/b", "a")` | `path_rel("a/b", "a")` | `FsPath("a/b").rel_to("a")` |
| `dir_tree("d")` | `dir_tree("d")` | `FsPath("d").tree()` |
| `fs_bytes("10MB")` | `Bytes("10MB")` | — |
| `fs_perms("644")` | `Perms("644")` | — |
| `x %>% file_delete()` | loop / `df.pipe(...)` | `FsPath(x).delete()` |

## The headline demo, ported

```
# R
dir_info("src", recurse = FALSE) |>
  filter(type == "file", size > "10KB") |>
  arrange(desc(size))
```

```
# Python (with the pandas extra)
(fs.dir_info("src")
   .query("size > '10KB' and type == 'file'")
   .sort_values("size", ascending=False))
```

`size` and `permissions` are real pandas ExtensionDtypes, so comparisons against human literals work inside `.query()` — same trick as fs's `fs_bytes`/`fs_perms` tibble columns.

## Vectorization

fs is vectorized end to end; Python is scalar-by-default. pyrfs functions are **polymorphic on the first argument**:

```
fs.path_ext("a.txt")             # 'txt'          (scalar -> scalar)
fs.path_ext(["a.txt", "b.md"])   # ['txt', 'md']  (list -> list)
fs.path_ext(df["path"])          # pandas Series  (Series -> Series)
df["path"].fs.ext()              # the idiomatic column form
```

## What's different (on purpose)

- **Errors are Python-native.** `FileExistsError`/`FileNotFoundError`/`PermissionError` instead of classed `fs_error` conditions; `FsValueError` for pyrfs-level validation. `tryCatch` → `try/except`.
- **`recurse` defaults match fs** (`False` for listing, `True` for `dir_create`), and the listing functions also accept an `int` depth, exactly like fs.
- **Byte units are 1024-based across the board** — `Bytes("10MB") == Bytes("10MiB")`, matching `fs_bytes`.
- **`is_file`/`is_dir` classify the entry itself** (lstat): a symlink answers `True` only to `is_link` — fs semantics, not `os.path.isdir` semantics.
- **No `dir_move()`** — directories move via `file_move()`, same as fs.
- **`FsPath` is a `str`, not a `pathlib.Path`.** Best interop and pandas round-tripping; call `.as_pathlib()` when you want pathlib semantics. The `/` join concatenates then tidies — an absolute right-hand side does *not* reset the path (unlike `os.path.join`).
- **The split method is `parts()`** — `str.split()` is left untouched so `FsPath` never surprises code that treats it as a string.
- **`dir_walk()` is a lazy generator** rather than a callback walker — the Pythonic spin; `dir_ls()`/`dir_map()` are built on it.

# Guides

# Safety & errors

pyrfs inherits fs's stance: **explicit failure, destructive actions opt-in**. Nothing silently returns `False`; nothing clobbers unless you ask.
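As a rough illustration of the "nothing clobbers unless you ask" stance, the guard can be sketched with the standard library alone. `copy_no_clobber` is a hypothetical helper written for this page, not pyrfs's implementation; it only mirrors the documented behavior (copying into an existing directory targets `dir/basename`, and an existing target raises `FileExistsError` unless `overwrite=True`).

```python
import os
import shutil
import tempfile


def copy_no_clobber(src: str, dst: str, *, overwrite: bool = False) -> str:
    """Hypothetical helper: copy src to dst, refusing to replace an existing target."""
    # Copying into an existing directory targets dir/basename (shell cp semantics).
    target = os.path.join(dst, os.path.basename(src)) if os.path.isdir(dst) else dst
    # lexists also notices a broken symlink sitting at the target path.
    if os.path.lexists(target) and not overwrite:
        raise FileExistsError(f"target already exists: {target!r} (pass overwrite=True)")
    shutil.copy2(src, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "report.pdf")
    archive = os.path.join(tmp, "archive")
    os.mkdir(archive)
    with open(src, "w") as fh:
        fh.write("draft")

    first = copy_no_clobber(src, archive)        # lands at archive/report.pdf
    try:
        copy_no_clobber(src, archive)            # same resolved target: refused
        second_raised = False
    except FileExistsError:
        second_raised = True
    replaced = copy_no_clobber(src, archive, overwrite=True)  # explicit opt-in
```

The guard checks the resolved target rather than the argument as typed, which is why the second call fails even though `archive` itself was always there.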
## Safe defaults (learn once)

| Argument | Meaning | Default | Applies to |
| --- | --- | --- | --- |
| `overwrite` | allow clobbering an existing target | `False` | copy/move |
| `recurse` | `True` = fully, `False` = no, `int` = to depth | `False` listing / `True` create | `dir_*` |
| `all` | include hidden dotfiles | `False` | `dir_ls`, `dir_map`, … |
| `type` | filter by entry type (`"file"`, `"directory"`, …) | `"any"` | traversals |
| `glob` / `regexp` | filter listings (mutually exclusive) | `None` | `dir_ls`, `path_filter` |
| `fail` | raise vs warn on unreadable entries | `True` | traversals |

Behavior flags are keyword-only, so call sites are self-documenting: `file_copy(a, b, overwrite=True)`.

## The error model

```
fs.file_copy("a.txt", "b.txt")                     # FileExistsError if b.txt exists
fs.dir_ls("nope")                                  # FileNotFoundError
fs.path_filter(ps, glob="*.py", regexp=r"\.py$")   # FsValueError: cannot set both
```

- **OS-level failures raise native `OSError` subclasses** — `FileNotFoundError`, `FileExistsError`, `PermissionError` — familiar and `except`-able.
- **pyrfs-level validation raises `FsError`** (usually the `FsValueError` subclass): conflicting arguments, bad size/permission literals, deleting a non-symlink with `link_delete`.

## Softening traversals: `fail=False`

One unreadable entry shouldn't abort a whole directory walk:

```
fs.dir_ls("/var", recurse=True, fail=False)
# UserWarning: skipping unreadable directory: ...
# -> returns everything it *could* read
```

This is a direct port of fs's `fail` knob, and applies to `dir_ls`, `dir_walk`, `dir_map`, and `dir_info`.
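The raise-or-warn switch can be sketched in plain Python to show the control flow. `walk_entries` below is a hypothetical stdlib stand-in, not how pyrfs is implemented, and a missing directory stands in for an unreadable one so the demo runs anywhere; whether pyrfs treats a missing root the same way under `fail=False` is not stated here.

```python
import os
import tempfile
import warnings
from typing import Iterator


def walk_entries(root: str, *, fail: bool = True) -> Iterator[str]:
    """Hypothetical walker: yield entries below root, siblings sorted by name."""
    try:
        with os.scandir(root) as it:
            entries = sorted(it, key=lambda e: e.name)
    except OSError as exc:
        if fail:
            raise  # explicit failure: the native OSError subclass propagates
        warnings.warn(f"skipping unreadable directory: {root} ({exc.strerror})")
        return
    for entry in entries:
        yield entry.path
        # Symlinks are not followed, so link cycles cannot loop forever.
        if entry.is_dir(follow_symlinks=False):
            yield from walk_entries(entry.path, fail=fail)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    open(os.path.join(tmp, "a", "x.txt"), "w").close()
    missing = os.path.join(tmp, "nope")  # stand-in for an unreadable directory

    found = [os.path.relpath(p, tmp) for p in walk_entries(tmp)]

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        soft = list(walk_entries(missing, fail=False))  # warns, yields nothing

    try:
        list(walk_entries(missing))  # fail=True is the default
        strict_raised = False
    except FileNotFoundError:
        strict_raised = True
```

Because the check sits inside the recursive generator, one bad subdirectory costs only its own subtree when `fail=False`; everything readable is still yielded.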
## Destination resolution (copy/move)

Copying or moving **into an existing directory** targets `dir/basename` — shell `cp`/`mv` semantics — and the `overwrite` guard applies to that *resolved* target:

```
fs.file_copy("report.pdf", "archive/")   # -> FsPath('archive/report.pdf')
fs.file_copy("report.pdf", "archive/")   # FileExistsError
```

There is no `dir_move()`: directories are files at the OS level, so `file_move()` moves them — same deliberate choice as fs.

# The three surfaces

Every pyrfs operation is implemented **once** in a pure-stdlib engine; the three user-facing surfaces are thin delegates. They interoperate freely — `dir_ls()` returns `FsPath`s you can chain methods on or drop into a DataFrame column.

## A — Functional: scripts and R muscle memory

```
import pyrfs as fs

files = fs.dir_ls("data", recurse=True, glob="*.csv")
fs.dir_create("backup")
for f in files:
    fs.file_copy(f, "backup/", overwrite=True)
```

Names mirror R's fs exactly — see [Coming from R's fs](https://pyrfs.netlify.app/coming-from-r/index.md). Functions are polymorphic on the first argument (scalar → scalar, list → list, Series → Series).

## B — Fluent `FsPath`: OO-style chaining

```
from pyrfs import FsPath

report = (FsPath("analysis") / "draft.md").with_ext("html")
work = FsPath("project").mkdir().touch_file("README.md").touch_file("setup.py")
big_logs = [p for p in FsPath("logs").walk(glob="*.log") if p.size() > "5MB"]
```

Because `FsPath` subclasses `str`:

- `open(p)`, `pd.read_csv(p)`, `json.dump(..., open(p, "w"))` all just work;
- every `str` method behaves normally (`p.startswith("src/")`, `p.split("/")`);
- it serializes cleanly (JSON, parquet, databases) as a plain string.

Mutating verbs return the resulting path, so chains read top-to-bottom like R pipes.

## C — pandas: columns and frames

Requires the extra: `pip install "pyrfs[pandas]"`.

```
import pandas as pd
import pyrfs as fs

# typed frame in, typed frame out
big = (fs.dir_info("src", recurse=True)
         .query("size > '10KB' and type == 'file'")
         .sort_values("size", ascending=False))

# vectorized path algebra over a column
df = pd.DataFrame({"path": fs.dir_ls("src", recurse=True, type="file")})
df.assign(
    ext=df["path"].fs.ext(),
    dir=df["path"].fs.dir(),
    size=df["path"].fs.size(),   # a real 'bytes'-dtype column
)
```

Without pandas installed, the core works unchanged and `*_info` returns `list[dict]` rows carrying the same typed scalars.

## Choosing

| Situation | Reach for |
| --- | --- |
| Shell-script-like automation, R habits | **A** functional |
| Building paths through transformations, OO code | **B** fluent |
| Filtering/aggregating many files as data | **C** pandas |

# Typed values

The heart of fs's charm: values that *know what they are* and print for humans. pyrfs ships three — each subclasses a builtin, so it still behaves like its base type everywhere.

## `Bytes` ⊂ `int`

```
from pyrfs import Bytes

Bytes("10MB")                         # Bytes(10485760)
str(Bytes(455200))                    # '444.5K'
Bytes(455200) < "1MB"                 # True — comparisons parse literals
sum([Bytes("1MB"), Bytes("500KB")])   # Bytes -> '1.49M' (arithmetic stays typed)
```

All units are 1024-based: `"10MB"`, `"10MiB"` and `"10M"` all mean `10 * 1024**2`, matching R's `fs_bytes`. `repr()` stays exact (`Bytes(455200)`); `str()`/`format()` humanize.

`file_size()` returns `Bytes`, so `fs.file_size("x.bin") > "10KB"` reads like the question you're asking.

## `Perms` ⊂ `int`

```
from pyrfs import Perms

Perms("644")                  # Perms('rw-r--r--')
Perms("644") == "rw-r--r--"   # True
Perms("644") == "u=rw,go=r"   # True — symbolic forms parse too
Perms("644") | "u+x"          # Perms('rwxr--r--') — mode algebra stays typed
```

`file_chmod()` accepts all the same forms, and symbolic modes apply **relative to the current mode** (chmod semantics): `fs.file_chmod("run.sh", "u+x")`.
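To see what "relative to the current mode" means in terms of bits, here is a small stdlib sketch. `apply_symbolic` and `display` are hypothetical names for illustration, not part of pyrfs, and the sketch handles only simple `[ugoa][+-=][rwx]` clauses (no `s`/`t`/`X`; an empty "who" is treated as `a`, whereas the real `chmod` command also consults the umask in that case).

```python
import re
import stat

_WHO_BITS = {"u": 0o700, "g": 0o070, "o": 0o007}
_PERM_BITS = {"r": 0o444, "w": 0o222, "x": 0o111}
_CLAUSE = re.compile(r"([ugoa]*)([+=-])([rwx]*)")


def apply_symbolic(current: int, spec: str) -> int:
    """Hypothetical helper: apply clauses like 'u+x' or 'u=rw,go=r' to a mode."""
    mode = current
    for clause in spec.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"bad symbolic clause: {clause!r}")
        who, op, perms = match.groups()
        who_mask = 0
        for w in who.replace("a", "ugo") or "ugo":
            who_mask |= _WHO_BITS[w]
        perm_mask = 0
        for p in perms:
            perm_mask |= _PERM_BITS[p]
        bits = who_mask & perm_mask
        if op == "+":
            mode |= bits                       # add to whatever is already set
        elif op == "-":
            mode &= ~bits                      # remove, leave the rest alone
        else:
            mode = (mode & ~who_mask) | bits   # '=' resets only the named classes
    return mode


def display(mode: int) -> str:
    """Render permission bits in the 'rw-r--r--' display form."""
    return stat.filemode(stat.S_IFREG | mode)[1:]


after = apply_symbolic(0o644, "u+x")
print(display(after))                              # rwxr--r--
print(display(apply_symbolic(0, "u=rw,go=r")))     # rw-r--r--
```

An octal or display-form mode replaces all nine bits at once, while a symbolic clause needs the current mode as input. That is why the relative forms require reading the file's existing permissions first.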
## `FsPath` ⊂ `str`

```
from pyrfs import FsPath

FsPath("src//a.txt/")        # FsPath('src/a.txt') — tidied on construction
FsPath("a") / "b" / "c.md"   # FsPath('a/b/c.md')
```

Tidy form: always `/` separators, no doubled or trailing slashes. In a terminal, the repr is coloured by on-disk type via `LS_COLORS` (degrades automatically on non-TTY or `NO_COLOR`).

## In pandas columns

With the `[pandas]` extra these become real ExtensionDtypes — `"bytes"`, `"perms"`, `"path"` — so whole columns display humanized, sort correctly, compare against literals inside `.query()`, and `sum()`/`min()`/`max()` return typed scalars:

```
s = pd.Series(["1K", "10MB", "455"], dtype="bytes")
s > "1K"    # [False, True, False]
s.sum()     # Bytes -> '10M'
```

# API reference

# Directories — `dir_*`

All traversals share the fs filter set: `all`, `recurse` (bool or depth), `type`, `glob`/`regexp` (mutually exclusive), `invert`, `fail`.

## pyrfs.dir_create

```
dir_create(path: str, *, mode: int | str = 493, recurse: bool = True) -> FsPath
```

Create a directory (parents too when `recurse`); existing dirs are fine.

Vectorized: also accepts an iterable or pandas Series of paths.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | The directory to create. | *required* |
| `mode` | `int or str` | Permissions for newly created directories (default 0o755); subject to the process umask. | `493` |
| `recurse` | `bool` | Create missing parents too (default True, matching fs — note this differs from the recurse=False default of the listing functions). | `True` |

Returns:

| Type | Description |
| --- | --- |
| `FsPath` | The created path (chains). |

See Also:

- `file_create`: The file counterpart.
- `FsPath.mkdir`: Fluent equivalent.
Examples:

```
>>> dir_create("out/plots")
FsPath('out/plots')
>>> dir_exists("out/plots")
True
```

## pyrfs.dir_exists

```
dir_exists(path: str) -> bool
```

Whether the path exists and is a directory (follows symlinks).

Vectorized: also accepts an iterable or pandas Series of paths.

See Also:

- `pyrfs.is_dir`: Entry-itself (lstat) semantics — a symlink to a directory answers `False` there but `True` here.

## pyrfs.dir_ls

```
dir_ls(path: PathInput = '.', *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> list[FsPath]
```

List directory entries with the full fs filter set.

The eager form of `dir_walk` — same parameters, returns a sorted list.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | Directory to list (default: the working directory). | `'.'` |
| `all` | `bool` | Include hidden dotfiles. | `False` |
| `recurse` | `bool or int` | True = full recursion, False = this level only, an int limits depth (1 = one level below path). | `False` |
| `type` | `str or iterable of str` | Keep only these entry types ("file", "directory", "symlink", ...); "any" keeps all. | `'any'` |
| `glob` | `str` | Keep entries whose path matches this glob pattern (mutually exclusive with `regexp`). | `None` |
| `regexp` | `str` | Keep entries whose path matches this regular expression (mutually exclusive with `glob`). | `None` |
| `invert` | `bool` | Keep entries that do not match glob/regexp. | `False` |
| `fail` | `bool` | Raise on unreadable entries (True) or warn and skip (False). | `True` |

Returns:

| Type | Description |
| --- | --- |
| `list of FsPath` | Entry paths, prefixed by path, siblings sorted by name. |

Raises:

| Type | Description |
| --- | --- |
| `FsValueError` | If both glob and regexp are set, or type names an unknown entry type. |

See Also:

- `dir_walk`: The lazy (generator) form.
- `dir_info`: The same listing as typed stat rows / DataFrame.
- `pyrfs.path_filter`: The same glob/regexp filter for in-memory lists.

Examples:

```
>>> from pyrfs import file_touch
>>> _ = dir_create("proj/sub")
>>> _ = file_touch(["proj/a.py", "proj/b.txt"])
>>> dir_ls("proj")
[FsPath('proj/a.py'), FsPath('proj/b.txt'), FsPath('proj/sub')]
>>> dir_ls("proj", glob="*.py")
[FsPath('proj/a.py')]
>>> dir_ls("proj", type="directory")
[FsPath('proj/sub')]
```

## pyrfs.dir_walk

```
dir_walk(path: PathInput = '.', *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> Iterator[FsPath]
```

Lazily yield directory entries, with the full fs filter set.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | Directory to walk. | `'.'` |
| `all` | `bool` | Include hidden dotfiles. | `False` |
| `recurse` | `bool or int` | True = full recursion, False = this level only, an int limits depth (1 = one level below path). | `False` |
| `type` | `str or iterable of str` | Keep only these entry types ("file", "directory", "symlink", ...); "any" keeps all. | `'any'` |
| `glob` | `str` | Keep entries whose path matches this glob pattern (mutually exclusive with `regexp`). | `None` |
| `regexp` | `str` | Keep entries whose path matches this regular expression (mutually exclusive with `glob`). | `None` |
| `invert` | `bool` | Keep entries that do not match glob/regexp. | `False` |
| `fail` | `bool` | Raise on unreadable entries (True) or warn and skip (False). | `True` |

Yields:

| Type | Description |
| --- | --- |
| `FsPath` | Entry paths, prefixed by path, siblings sorted by name. |

Raises:

| Type | Description |
| --- | --- |
| `FsValueError` | If both glob and regexp are set, or type names an unknown entry type. |

See Also:

- `dir_ls`: The eager (list-returning) form.
- `dir_map`: Apply a function to each entry.

Examples:

```
>>> from pyrfs import file_touch
>>> _ = dir_create("logs")
>>> _ = file_touch("logs/a.log")
>>> walker = dir_walk("logs")   # nothing read yet — it's a generator
>>> next(walker)
FsPath('logs/a.log')
```

## pyrfs.dir_map

```
dir_map(path: PathInput, fn: Callable[[FsPath], object], *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> list[object]
```

Apply `fn` to each entry and collect the results.

Takes the same filter arguments as `dir_ls`.

See Also:

- `dir_walk`: Iterate lazily instead of collecting.

Examples:

```
>>> from pyrfs import file_touch
>>> _ = dir_create("d")
>>> _ = file_touch(["d/a.py", "d/b.py"])
>>> dir_map("d", lambda p: p.ext())
['py', 'py']
```

## pyrfs.dir_copy

```
dir_copy(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath
```

Copy a directory tree to `new_path` (a name, or an existing directory).

Same destination resolution and `overwrite` guard as `file_copy`: copying into an existing directory targets `new_path/basename` (shell `cp -r` semantics). With `overwrite=True` an existing destination is *replaced*, not merged. Symlinks are copied as symlinks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | Source directory. | *required* |
| `new_path` | `str or PathLike` | Destination name, or an existing directory to copy into. | *required* |
| `overwrite` | `bool` | Replace an existing (resolved) destination (default False). | `False` |

Returns:

| Type | Description |
| --- | --- |
| `FsPath` | The root of the new copy. |

Raises:

| Type | Description |
| --- | --- |
| `NotADirectoryError` | If path is not a directory. |
| `FileExistsError` | If the (resolved) destination exists and overwrite is False. |

See Also:

- `file_copy`: Single files.
- `file_move`: Directories move via `file_move` (there is no dir_move).

Examples:

```
>>> _ = dir_create("src/sub")
>>> dir_copy("src", "backup")
FsPath('backup')
>>> dir_exists("backup/sub")
True
```

## pyrfs.dir_delete

```
dir_delete(path: str) -> FsPath
```

Delete a directory and everything below it (recursive, like `rm -rf`).

Vectorized: also accepts an iterable or pandas Series of paths.

Returns:

| Type | Description |
| --- | --- |
| `FsPath` | The deleted path. |

See Also:

- `file_delete`: Single files and symlinks.
- `FsPath.rmdir`: Fluent equivalent.

Examples:

```
>>> _ = dir_create("scratch/deep")
>>> dir_delete("scratch")
FsPath('scratch')
>>> dir_exists("scratch")
False
```

## pyrfs.dir_tree

```
dir_tree(path: PathInput = '.', *, recurse: bool | int = True, all: bool = False) -> None
```

Print a box-drawing tree of the directory, like the Unix `tree`.

Entries are coloured by type via `LS_COLORS` in a capable terminal (plain on non-TTY or `NO_COLOR`). Hidden files are skipped unless `all=True`; `recurse` limits depth as in `dir_ls`.
Examples:

```
>>> from pyrfs import file_touch
>>> _ = dir_create("proj/src")
>>> _ = file_touch("proj/README.md")
>>> dir_tree("proj")
proj
├── README.md
└── src
```

# Files — `file_*`

Mutating verbs return the (new) path so calls chain; `overwrite=False` on an existing target raises `FileExistsError`. Copy/move into an existing directory targets `dir/basename`.

## pyrfs.file_create

```
file_create(path: str, *, mode: int | str = 420) -> FsPath
```

Create a new file (an existing file is left unchanged).

Vectorized: also accepts an iterable or pandas Series of paths.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | The file to create. The parent directory must exist. | *required* |
| `mode` | `int or str` | Permissions for a newly created file — octal string ("644"), symbolic ("u=rw,go=r"), or raw bits (default 0o644); subject to the process umask. | `420` |

Returns:

| Type | Description |
| --- | --- |
| `FsPath` | The created path (chains). |

See Also:

- `file_touch`: Also update timestamps when the file exists.
- `pyrfs.dir_create`: The directory counterpart.

Examples:

```
>>> file_create("notes.txt")
FsPath('notes.txt')
```

## pyrfs.file_touch

```
file_touch(path: str) -> FsPath
```

Update access/modification times, creating the file if needed.

Vectorized: also accepts an iterable or pandas Series of paths.

See Also:

- `file_create`: Create without updating timestamps of an existing file.

Examples:

```
>>> file_touch("stamp.txt")
FsPath('stamp.txt')
```

## pyrfs.file_copy

```
file_copy(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath
```

Copy a file to `new_path` (a file name, or an existing directory).

Vectorized: copy many files into one directory with `file_copy([a, b], "dir")`.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | Source file. | *required* |
| `new_path` | `str or PathLike` | Destination file name, or an existing directory to copy into (the target then becomes new_path/basename). | *required* |
| `overwrite` | `bool` | Allow clobbering an existing destination (default False). | `False` |

Returns:

| Type | Description |
| --- | --- |
| `FsPath` | The path of the new copy. |

Raises:

| Type | Description |
| --- | --- |
| `FileExistsError` | If the (resolved) destination exists and overwrite is False. |

See Also:

- `file_move`: Move instead of copy.
- `pyrfs.dir_copy`: Copy a directory tree.
- `FsPath.copy_to`: Fluent equivalent.

Examples:

```
>>> src = file_create("a.txt")
>>> file_copy(src, "b.txt")
FsPath('b.txt')
>>> file_copy(src, "b.txt")
Traceback (most recent call last):
    ...
FileExistsError: target already exists: FsPath('b.txt') (pass overwrite=True)
```

## pyrfs.file_move

```
file_move(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath
```

Move (rename) a file — or a directory: dirs move via `file_move`.

Same destination resolution and `overwrite` guard as `file_copy`. There is deliberately no `dir_move`, matching fs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | Source file or directory. | *required* |
| `new_path` | `str or PathLike` | Destination name, or an existing directory to move into. | *required* |
| `overwrite` | `bool` | Allow clobbering an existing destination (default False). | `False` |

Returns:

| Type | Description |
| --- | --- |
| `FsPath` | The new location. |

Raises:

| Type | Description |
| --- | --- |
| `FileExistsError` | If the (resolved) destination exists and overwrite is False. |

See Also:

- `file_copy`: Copy instead of move.
- `FsPath.move_to`: Fluent equivalent.

Examples:

```
>>> _ = file_create("a.txt")
>>> file_move("a.txt", "b.txt")
FsPath('b.txt')
```

## pyrfs.file_delete

```
file_delete(path: str) -> FsPath
```

Delete a file or symlink (for directories use `dir_delete`).

Vectorized: also accepts an iterable or pandas Series of paths.

Returns:

| Type | Description |
| --- | --- |
| `FsPath` | The deleted path. |

Raises:

| Type | Description |
| --- | --- |
| `FileNotFoundError` | If the file does not exist. |

See Also:

- `pyrfs.dir_delete`: Recursive directory deletion.
- `pyrfs.link_delete`: Symlink-only deletion (refuses non-links).

Examples:

```
>>> p = file_create("scrap.txt")
>>> file_delete(p)
FsPath('scrap.txt')
>>> file_exists(p)
False
```

## pyrfs.file_exists

```
file_exists(path: str) -> bool
```

Whether the path exists — a broken symlink counts as existing.

Uses `lexists` (the entry itself), matching fs.

Vectorized: also accepts an iterable or pandas Series of paths.

See Also:

- `pyrfs.dir_exists`: Directory-specific test (follows symlinks).
- `pyrfs.is_file`, `pyrfs.is_dir`, `pyrfs.is_link`: Type predicates.

Examples:

```
>>> _ = file_create("here.txt")
>>> file_exists(["here.txt", "gone.txt"])
[True, False]
```

## pyrfs.file_access

```
file_access(path: str, mode: str = 'exists') -> bool
```

Test access to a path for the current process.

Vectorized: also accepts an iterable or pandas Series of paths.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | The path to test. | *required* |
| `mode` | `('exists', 'read', 'write', 'execute')` | The kind of access to check. | `"exists"` |

Raises:

| Type | Description |
| --- | --- |
| `FsValueError` | If mode is not one of the four accepted values. |

Examples:

```
>>> p = file_create("data.txt")
>>> file_access(p, "read")
True
```

## pyrfs.file_size

```
file_size(path: str) -> Bytes
```

File size as a `pyrfs.Bytes` value (compares against literals).

Vectorized: also accepts an iterable or pandas Series of paths.

Returns:

| Type | Description |
| --- | --- |
| `Bytes` | The size — an int subclass that displays humanized (444.5K) and compares against strings like "10KB". |

See Also:

- `pyrfs.Bytes`: The typed scalar.
- `file_info`: Size together with the full stat row.

Examples:

```
>>> p = file_create("two-bytes.bin")
>>> with open(p, "wb") as fh:
...     _ = fh.write(b"hi")
>>> file_size(p)
Bytes(2)
>>> file_size(p) < "1KB"
True
```

## pyrfs.file_chmod

```
file_chmod(path: str, mode: int | str) -> FsPath
```

Change permissions; symbolic modes apply relative to the current mode.

Vectorized: also accepts an iterable or pandas Series of paths.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | The file to change. | *required* |
| `mode` | `int or str` | Octal string ("644"), display form ("rw-r--r--"), or raw bits — all absolute; symbolic clauses ("u+x") modify the current mode, like the chmod command. | *required* |

See Also:

- `pyrfs.Perms`: The typed permission scalar.
- `FsPath.chmod`: Fluent equivalent.

Examples:

```
>>> p = file_create("run.sh", mode="644")
>>> _ = file_chmod(p, "u+x")
>>> file_access(p, "execute")
True
```

## pyrfs.file_chown

```
file_chown(path: str, user: str | int | None = None, group: str | int | None = None) -> FsPath
```

Change owner and/or group (names or numeric ids; POSIX only).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str or PathLike` | The file to change. | *required* |
| `user` | `str or int` | New owner (name or uid). | `None` |
| `group` | `str or int` | New group (name or gid). | `None` |

Raises:

| Type | Description |
| --- | --- |
| `FsValueError` | If neither user nor group is given. |

## pyrfs.file_show

```
file_show(path: str) -> FsPath
```

Open a file in the OS default application (`open`/`xdg-open`).

Examples:

```
>>> file_show("report.pdf")
FsPath('report.pdf')
```

# FsPath

## pyrfs.FsPath

Bases: `str`

A tidy filesystem path string — the fluent pyrfs surface.

Construction normalizes the path (`/` separators, no doubled or trailing slashes). The `/` operator joins; methods chain because each returns an `FsPath`. Inherited `str` behavior is untouched — `p.split("/")`, `p.startswith(...)`, `open(p)` all work as on any string (the *split-into-components* method is `parts`, so `str.split` is never shadowed).

In a capable terminal the repr is coloured by on-disk type via `LS_COLORS`.

See Also:

- `pyrfs.path`: Functional construction with an `ext=` option.
- `as_pathlib`: Convert when you want `pathlib` semantics.
Examples:

```
>>> FsPath("src//a.txt/")   # tidied on construction
FsPath('src/a.txt')
>>> (FsPath("foo") / "bar" / "a.txt").with_ext("md")
FsPath('foo/bar/a.md')
>>> FsPath("a/b").startswith("a")   # still a str
True
```

### __truediv__

```
__truediv__(other: str | PathLike[str]) -> FsPath
```

Join with `other`: `FsPath('a') / 'b'` -> `FsPath('a/b')`.

Concatenation + tidy: an absolute right-hand side does *not* reset the path (unlike `pathlib`/`os.path.join`).

### __rtruediv__

```
__rtruediv__(other: str | PathLike[str]) -> FsPath
```

Support `'a' / FsPath('b')` joining from a plain string.

### ext

```
ext() -> str
```

Extension without the dot (`''` if none) — `pyrfs.path_ext`.

### with_ext

```
with_ext(ext: str) -> FsPath
```

Replace (or add) the extension; `''` removes it — `pyrfs.path_ext_set`.

Examples:

```
>>> (FsPath("data") / "raw.csv").with_ext("parquet")
FsPath('data/raw.parquet')
```

### dir

```
dir() -> FsPath
```

Directory part of the path (`'.'` if none) — `pyrfs.path_dir`.

### name

```
name() -> FsPath
```

File name — the last path component — `pyrfs.path_file`.

### parts

```
parts() -> list[str]
```

Path components (a leading root stays `'/'`) — `pyrfs.path_split`.

Named `parts` (as in `pathlib`) so `str.split` keeps its normal string behavior.

Examples:

```
>>> FsPath("/usr/bin").parts()
['/', 'usr', 'bin']
```

### rel_to

```
rel_to(start: str | PathLike[str]) -> FsPath
```

This path expressed relative to `start` — `pyrfs.path_rel`.

### has_parent

```
has_parent(parent: str | PathLike[str]) -> bool
```

Whether this path sits at or below `parent` — `pyrfs.path_has_parent`.

### expand

```
expand() -> FsPath
```

Expand a leading `~` to the home directory — `pyrfs.path_expand`.

### norm

```
norm() -> FsPath
```

Normalize `.` and `..` lexically — `pyrfs.path_norm`.

### abs

```
abs() -> FsPath
```

Absolute form (links unresolved) — `pyrfs.path_abs`.

### real

```
real() -> FsPath
```

Canonical form, symlinks resolved — `pyrfs.path_real`.
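The difference between lexical, absolute, and canonical forms is easiest to see next to a symlink. The sketch below uses the standard library's `os.path.normpath`, `abspath`, and `realpath` as stand-ins for the three ideas; it is an analogy under the assumption of a POSIX system where `os.symlink` is permitted, and it makes no claim that `norm`, `abs`, and `real` are built on these exact functions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # The temp root may itself sit behind a link (e.g. /var on macOS).
    tmp = os.path.realpath(tmp)
    real_dir = os.path.join(tmp, "data")
    os.mkdir(real_dir)
    link = os.path.join(tmp, "current")
    os.symlink(real_dir, link)                 # current -> data

    messy = os.path.join(link, ".", "sub", "..", "raw.csv")

    lexical = os.path.normpath(messy)          # drops '.' and 'sub/..', never touches disk
    absolute = os.path.abspath("raw.csv")      # anchors at the working directory, links kept
    canonical = os.path.realpath(messy)        # also follows the 'current' symlink

    keeps_link = lexical == os.path.join(link, "raw.csv")
    resolves_link = canonical == os.path.join(real_dir, "raw.csv")
```

Lexical normalization is pure string work and is safe on paths that do not exist yet. Resolving symlinks has to consult the filesystem, so the result can change when a link is repointed.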
### copy_to ``` copy_to(new_path: str | PathLike[str], *, overwrite: bool = False) -> FsPath ``` Copy this file to `new_path` — `pyrfs.file_copy`. Copying into an existing directory targets `new_path/basename`; an existing destination raises `FileExistsError` unless `overwrite=True`. Returns the new copy's path (chains). ### move_to ``` move_to(new_path: str | PathLike[str], *, overwrite: bool = False) -> FsPath ``` Move (rename) this file or directory — `pyrfs.file_move`. Same destination resolution and `overwrite` guard as `copy_to`. ### create ``` create(*, mode: int | str = 420) -> FsPath ``` Create this file (existing files untouched) — `pyrfs.file_create`. ### touch ``` touch() -> FsPath ``` Update timestamps, creating the file if needed — `pyrfs.file_touch`. ### delete ``` delete() -> None ``` Delete this file or symlink — `pyrfs.file_delete`. Returns `None`: a deleted path has nothing to chain onto. For directories use `rmdir`. ### exists ``` exists() -> bool ``` Whether this path exists (broken symlinks count) — `pyrfs.file_exists`. ### access ``` access(mode: str = 'exists') -> bool ``` Test `"exists"`/`"read"`/`"write"`/`"execute"` — `pyrfs.file_access`. ### size ``` size() -> Bytes ``` File size as a `pyrfs.Bytes` value — `pyrfs.file_size`. Examples: ``` >>> FsPath("notes.txt").create().size() == 0 True ``` ### chmod ``` chmod(mode: int | str) -> FsPath ``` Change permissions — `pyrfs.file_chmod`. Symbolic modes (`"u+x"`) apply to the *current* mode; octal and display forms are absolute. Returns this path (chains). ### info ``` info() -> dict[str, object] ``` Stat this path into one row of typed values — `pyrfs.file_info`. Returns a single `dict` (use the functional `pyrfs.file_info` / `pyrfs.dir_info` for tables). ### mkdir ``` mkdir(*, mode: int | str = 493, recurse: bool = True) -> FsPath ``` Create this directory (parents too when `recurse`) — `pyrfs.dir_create`. 
Examples: ``` >>> FsPath("proj").mkdir().touch_file("README.md").ls() [FsPath('proj/README.md')] ``` ### rmdir ``` rmdir() -> None ``` Delete this directory and everything below it — `pyrfs.dir_delete`. Recursive (`rm -rf` semantics), despite the `os.rmdir`-like name. Returns `None`: nothing left to chain onto. ### touch_file ``` touch_file(name: str | PathLike[str]) -> FsPath ``` Create a child file and return *this directory* (keeps chaining). Returning the directory (not the new file) lets several `touch_file` calls chain; use `(p / name).touch()` when you want the file's path back. ### ls ``` ls(*, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> list[FsPath] ``` List entries of this directory — `pyrfs.dir_ls` (same filters). ### walk ``` walk(*, all: bool = False, recurse: bool | int = True, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> Iterator[FsPath] ``` Lazily yield entries below this directory — `pyrfs.dir_walk`. Unlike the functional default, `recurse=True` here: walking a tree is the common fluent use. ### tree ``` tree(*, recurse: bool | int = True, all: bool = False) -> None ``` Print a box-drawing tree of this directory — `pyrfs.dir_tree`. ### is_file ``` is_file() -> bool ``` Whether this is a regular file (lstat; symlinks answer `False`) — `pyrfs.is_file`. ### is_dir ``` is_dir() -> bool ``` Whether this is a directory (lstat; symlinks answer `False`) — `pyrfs.is_dir`. ### is_link ``` is_link() -> bool ``` Whether this is a symlink — `pyrfs.is_link`. ### as_pathlib ``` as_pathlib() -> pathlib.Path ``` This path as a `pathlib.Path`, when you want pathlib semantics. 
Examples: ``` >>> FsPath("a/b").as_pathlib() PosixPath('a/b') ``` # Info, temp & errors `*_info` returns a typed DataFrame when pandas is installed, otherwise `list[dict]` rows carrying the same typed scalars. ## pyrfs.file_info ``` file_info(path: PathInput | Iterable[PathInput], *, follow: bool = False) -> pd.DataFrame | list[dict[str, object]] ``` Stat path(s) into a typed table. Returns a DataFrame with typed columns (`path`/`size`/`permissions` as pyrfs dtypes) when pandas is installed, else `list[dict]` rows of the same typed scalars. Parameters: | Name | Type | Description | Default | | -------- | --------------------------------------- | --------------------------------------------------------------------- | ---------- | | `path` | `str, os.PathLike, or iterable of them` | Path(s) to stat. | *required* | | `follow` | `bool` | Stat symlink targets instead of the links themselves (default False). | `False` | See Also dir_info : Stat a directory's entries. pyrfs.FsPath.info : One row, as a plain dict. Examples: ``` >>> file_info("pyproject.toml") path type size permissions ... 0 pyproject.toml file 1.7K rw-r--r-- ... ``` ## pyrfs.dir_info ``` dir_info(path: PathInput = '.', *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> pd.DataFrame | list[dict[str, object]] ``` Stat directory entries into a typed table (same filters as `dir_ls`). Returns a DataFrame with typed columns when pandas is installed, else `list[dict]` rows. This is the fs headline: with typed columns, string literals work inside `.query()`. See Also file_info : Stat explicit path(s). pyrfs.dir_ls : The underlying listing and its filter arguments. Examples: ``` >>> (dir_info("pyrfs", recurse=True) ... .query("size > '10KB' and type == 'file'") ... 
.sort_values("size", ascending=False)) ``` ## pyrfs.has_pandas ``` has_pandas() -> bool ``` Whether pandas is importable (cached; decides the `*_info` shape). Examples: ``` >>> has_pandas() in (True, False) True ``` ## pyrfs.file_temp ``` file_temp(pattern: str = 'file', tmp_dir: PathInput | None = None, ext: str = '') -> FsPath ``` Return a unique temp path (a *name* only — the file is not created). If names were queued with `file_temp_push`, the oldest queued name is returned instead — deterministic mode, fs's trick for reproducible examples, docs, and tests. Parameters: | Name | Type | Description | Default | | --------- | ----------------- | ------------------------------------------------------------- | -------- | | `pattern` | `str` | Filename prefix (default "file"). | `'file'` | | `tmp_dir` | `str or PathLike` | Directory for the name (default: the session temp directory). | `None` | | `ext` | `str` | Extension, with or without the leading dot. | `''` | See Also file_temp_push, file_temp_pop : The deterministic-name queue. pyrfs.path_temp : The temp *directory* itself. Examples: ``` >>> file_temp(ext="csv") FsPath('/tmp/file2bf36b4eb5d8.csv') >>> _ = file_temp_push("/tmp/demo.csv") >>> file_temp() # deterministic: returns the queued name FsPath('/tmp/demo.csv') ``` ## pyrfs.file_temp_push ``` file_temp_push(path: PathInput | Iterable[PathInput]) -> list[FsPath] ``` Queue deterministic path(s) for subsequent `file_temp` calls. Returns: | Type | Description | | ---------------- | ------------------------------ | | `list of FsPath` | The queued paths (FIFO order). | Examples: ``` >>> file_temp_push(["/tmp/one", "/tmp/two"]) [FsPath('/tmp/one'), FsPath('/tmp/two')] >>> file_temp(), file_temp() (FsPath('/tmp/one'), FsPath('/tmp/two')) ``` ## pyrfs.file_temp_pop ``` file_temp_pop() -> FsPath | None ``` Remove and return the oldest queued temp path (`None` if empty). 
Examples: ``` >>> _ = file_temp_push("/tmp/queued") >>> file_temp_pop() FsPath('/tmp/queued') >>> file_temp_pop() is None True ``` ## pyrfs.FsError Bases: `Exception` Base class for all pyrfs validation errors. ## pyrfs.FsValueError Bases: `FsError`, `ValueError` An argument value (or combination of arguments) is invalid. # Links — `link_*` `link_create(path, new_path)` creates `new_path` pointing *to* `path` (the fs argument order). Symbolic links are the default. ## pyrfs.link_create ``` link_create(path: str, new_path: PathInput, *, symbolic: bool = True) -> FsPath ``` Create a link at `new_path` pointing to `path`. Note the argument order (fs's): target first, link name second. Parameters: | Name | Type | Description | Default | | ---------- | ----------------- | ------------------------------------------------------------ | ---------- | | `path` | `str or PathLike` | What the link points to (need not exist for symbolic links). | *required* | | `new_path` | `str or PathLike` | Where to create the link. | *required* | | `symbolic` | `bool` | Symbolic link (default) or hard link. | `True` | Returns: | Type | Description | | -------- | -------------------- | | `FsPath` | The new link's path. | Raises: | Type | Description | | ----------------- | --------------------------- | | `FileExistsError` | If new_path already exists. | See Also link_path : Read where a symlink points. Examples: ``` >>> from pyrfs import file_touch >>> _ = file_touch("big.csv") >>> link_create("big.csv", "latest.csv") FsPath('latest.csv') >>> link_path("latest.csv") FsPath('big.csv') ``` ## pyrfs.link_path ``` link_path(path: str) -> FsPath ``` Return the target a symlink points to (`OSError` if not a symlink). Vectorized: also accepts an iterable or pandas Series of paths. See Also pyrfs.path_real : Fully resolve a path through all links. ## pyrfs.link_exists ``` link_exists(path: str) -> bool ``` Whether the path is a symlink (its target need not exist). Equivalent to `pyrfs.is_link`. 
Vectorized: also accepts an iterable or pandas Series of paths. ## pyrfs.link_copy ``` link_copy(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath ``` Copy a symlink itself (the new link points to the same target). The target is *not* copied — use `pyrfs.file_copy` to copy what the link points to. Parameters: | Name | Type | Description | Default | | ----------- | ----------------- | --------------------------------------------------------- | ---------- | | `path` | `str or PathLike` | An existing symlink. | *required* | | `new_path` | `str or PathLike` | Where to create the duplicate link. | *required* | | `overwrite` | `bool` | Allow clobbering an existing destination (default False). | `False` | Raises: | Type | Description | | ----------------- | ------------------------------------------------- | | `FileExistsError` | If the destination exists and overwrite is False. | ## pyrfs.link_delete ``` link_delete(path: str) -> FsPath ``` Delete a symlink — the target is untouched; non-links are refused. Raises: | Type | Description | | -------------- | ------------------------------------------------------------------------------------ | | `FsValueError` | If path is not a symlink (use pyrfs.file_delete or pyrfs.dir_delete for real files). | Examples: ``` >>> from pyrfs import file_exists, file_touch >>> _ = file_touch("real.txt") >>> _ = link_create("real.txt", "ln.txt") >>> _ = link_delete("ln.txt") >>> file_exists("real.txt") # target survives True ``` # Path algebra — `path_*` Pure path-string manipulation, no filesystem I/O (except the few that resolve against the running process, as documented). All functions accept a scalar, list, or pandas Series as the first argument and return tidy [`FsPath`](https://pyrfs.netlify.app/api/fspath/index.md) values. ## pyrfs.path ``` path(*parts: PathInput, ext: str = '') -> FsPath ``` Construct a tidy path from parts, optionally adding an extension. Parts are joined with `/` and tidied. 
The join is pure concatenation — an absolute later part does *not* reset the path, unlike `os.path.join`.

Parameters:

| Name     | Type              | Description                                                                                  | Default |
| -------- | ----------------- | -------------------------------------------------------------------------------------------- | ------- |
| `*parts` | `str or PathLike` | Path components to join.                                                                     | `()`    |
| `ext`    | `str`             | Extension to append, with or without the leading dot (one dot is guaranteed, never doubled). | `''`    |

Returns:

| Type     | Description            |
| -------- | ---------------------- |
| `FsPath` | The joined, tidy path. |

See Also

path_join : Join components given as a list (inverse of `path_split`).
`FsPath.__truediv__` : The fluent `/` join operator.

Examples:

```
>>> path("foo", "bar", "a", ext="txt")
FsPath('foo/bar/a.txt')
>>> path("a/", "/b")   # concatenation, not os.path.join reset
FsPath('a/b')
```

## pyrfs.path_wd

```
path_wd() -> FsPath
```

Return the current working directory as a tidy path.

See Also

path_abs : Anchor a relative path to the working directory.

## pyrfs.path_abs

```
path_abs(path: str) -> FsPath
```

Make a path absolute against the working directory (links unresolved).

A leading `~` is expanded first. Vectorized: also accepts an iterable or pandas Series of paths.

See Also

path_real : Also resolve symlinks (canonical form).
path_norm : Lexical `.`/`..` normalization only.

Examples:

```
>>> path_abs("data").startswith("/")
True
```

## pyrfs.path_real

```
path_real(path: str) -> FsPath
```

Canonicalize a path, resolving symlinks (touches the filesystem).

Vectorized: also accepts an iterable or pandas Series of paths.

See Also

path_abs : Absolute form without resolving links.

## pyrfs.path_norm

```
path_norm(path: str) -> FsPath
```

Normalize `.` and `..` components lexically (no filesystem access).

Vectorized: also accepts an iterable or pandas Series of paths.
Examples: ``` >>> path_norm("a/../b/./c") FsPath('b/c') ``` ## pyrfs.path_rel ``` path_rel(path: str, start: PathInput = '.') -> FsPath ``` Return the path expressed relative to `start`. Vectorized: also accepts an iterable or pandas Series of paths. Parameters: | Name | Type | Description | Default | | ------- | ----------------- | ------------------------------------------------------ | ---------- | | `path` | `str or PathLike` | The path to re-express. | *required* | | `start` | `str or PathLike` | The anchor directory (default: the working directory). | `'.'` | See Also path_has_parent : Test containment instead of computing the relation. FsPath.rel_to : Fluent equivalent. Examples: ``` >>> path_rel("/a/b/c", "/a") FsPath('b/c') >>> path_rel("/a/b", "/a/d") FsPath('../b') ``` ## pyrfs.path_expand ``` path_expand(path: str) -> FsPath ``` Expand a leading `~` to the user's home directory. Vectorized: also accepts an iterable or pandas Series of paths. See Also path_home : Build paths under the home directory directly. ## pyrfs.path_home ``` path_home(*parts: PathInput) -> FsPath ``` Return the user's home directory, optionally joined with `parts`. Examples: ``` >>> path_home("data").endswith("/data") True ``` ## pyrfs.path_temp ``` path_temp(*parts: PathInput) -> FsPath ``` Return the session temp directory, optionally joined with `parts`. See Also pyrfs.file_temp : A unique temp *file name* (not just the directory). ## pyrfs.path_tidy ``` path_tidy(path: str) -> FsPath ``` Tidy a path: `/` separators, no doubled or trailing slashes. Every pyrfs function already returns tidy paths; use this to normalize paths from elsewhere. Vectorized: also accepts an iterable or pandas Series of paths. Examples: ``` >>> path_tidy("src//a.txt/") FsPath('src/a.txt') >>> path_tidy("C:\\data\\x") FsPath('C:/data/x') ``` ## pyrfs.path_split ``` path_split(path: str) -> list[str] ``` Split a tidy path into components (a leading root stays `'/'`). 
Vectorized: a list of paths yields a list of component lists. See Also path_join : The inverse operation. FsPath.parts : Fluent equivalent. Examples: ``` >>> path_split("/usr/bin") ['/', 'usr', 'bin'] >>> path_split("a/b") ['a', 'b'] ``` ## pyrfs.path_join ``` path_join(parts: Iterable[PathInput | Iterable[PathInput]]) -> FsPath | list[FsPath] ``` Join split components back into path(s) — the inverse of `path_split`. Parameters: | Name | Type | Description | Default | | ------- | ---------- | -------------------------------------------------------------------------------------- | ---------- | | `parts` | `iterable` | Either one sequence of components, or a sequence of such sequences (joining each one). | *required* | See Also path : Variadic construction with an optional extension. Examples: ``` >>> path_join(["/", "usr", "bin"]) FsPath('/usr/bin') >>> path_join([["a", "b"], ["c", "d"]]) [FsPath('a/b'), FsPath('c/d')] ``` ## pyrfs.path_file ``` path_file(path: str) -> FsPath ``` Return the file name — the last path component. Vectorized: also accepts an iterable or pandas Series of paths. See Also path_dir : The complementary directory part. FsPath.name : Fluent equivalent. Examples: ``` >>> path_file("a/b/c.txt") FsPath('c.txt') ``` ## pyrfs.path_dir ``` path_dir(path: str) -> FsPath ``` Return the directory part of a path (`'.'` if there is none). Vectorized: also accepts an iterable or pandas Series of paths. See Also path_file : The complementary file-name part. FsPath.dir : Fluent equivalent. Examples: ``` >>> path_dir("a/b/c.txt") FsPath('a/b') >>> path_dir("c.txt") FsPath('.') ``` ## pyrfs.path_ext ``` path_ext(path: str) -> str ``` Return the extension without the dot (`''` if none). Dotfiles like `.gitignore` count as having no extension. Vectorized: also accepts an iterable or pandas Series of paths. 
See Also path_ext_set, path_ext_remove Examples: ``` >>> path_ext("a.tar.gz") 'gz' >>> path_ext(".gitignore") '' ``` ## pyrfs.path_ext_remove ``` path_ext_remove(path: str) -> FsPath ``` Remove the extension (dotfiles like `.gitignore` are left intact). Vectorized: also accepts an iterable or pandas Series of paths. Examples: ``` >>> path_ext_remove("d/a.tar.gz") FsPath('d/a.tar') ``` ## pyrfs.path_ext_set ``` path_ext_set(path: str, ext: str) -> FsPath ``` Replace (or add) the extension; an empty `ext` removes it. Vectorized: also accepts an iterable or pandas Series of paths. Parameters: | Name | Type | Description | Default | | ------ | ----------------- | --------------------------------------------------------------------------------- | ---------- | | `path` | `str or PathLike` | The path to modify. | *required* | | `ext` | `str` | New extension, with or without the leading dot; "" removes the current extension. | *required* | See Also FsPath.with_ext : Fluent equivalent. Examples: ``` >>> path_ext_set("report.md", "html") FsPath('report.html') >>> path_ext_set(["a.txt", "b"], "py") [FsPath('a.py'), FsPath('b.py')] ``` ## pyrfs.path_common ``` path_common(paths: Iterable[PathInput]) -> FsPath ``` Return the longest common path prefix of `paths`. Parameters: | Name | Type | Description | Default | | ------- | -------------------------------- | ------------------------------------------------ | ---------- | | `paths` | `iterable of str or os.PathLike` | At least one path; all absolute or all relative. | *required* | Raises: | Type | Description | | -------------- | ------------------------------------------------------- | | `FsValueError` | If paths is empty or mixes absolute and relative paths. 
| Examples: ``` >>> path_common(["a/b/c", "a/b/d"]) FsPath('a/b') ``` ## pyrfs.path_filter ``` path_filter(paths: Iterable[PathInput], glob: str | None = None, regexp: str | None = None, *, invert: bool = False) -> list[FsPath] ``` Filter paths by a glob or a regular expression (mutually exclusive). Parameters: | Name | Type | Description | Default | | -------- | -------------------------------- | ----------------------------------------------------------------------------------------------- | ---------- | | `paths` | `iterable of str or os.PathLike` | Paths to filter. | *required* | | `glob` | `str` | Wildcard pattern matched against the whole path (e.g. "\*.py"); mutually exclusive with regexp. | `None` | | `regexp` | `str` | Regular expression searched within the path; mutually exclusive with glob. | `None` | | `invert` | `bool` | Keep the paths that do not match. | `False` | Raises: | Type | Description | | -------------- | -------------------------------- | | `FsValueError` | If both glob and regexp are set. | See Also pyrfs.dir_ls : Directory listing with the same filter arguments. Examples: ``` >>> path_filter(["a.py", "b.txt", "src/c.py"], glob="*.py") [FsPath('a.py'), FsPath('src/c.py')] >>> path_filter(["a.py", "b.txt"], glob="*.py", invert=True) [FsPath('b.txt')] ``` ## pyrfs.path_has_parent ``` path_has_parent(path: str, parent: PathInput) -> bool ``` Return whether `path` sits at or below `parent`. Both are anchored to the working directory before comparing, so relative and absolute forms compare consistently. Vectorized: also accepts an iterable or pandas Series of paths. See Also path_rel : Compute the relative path instead of testing containment. Examples: ``` >>> path_has_parent("/x/y", "/x") True >>> path_has_parent("/xy/z", "/x") False ``` ## pyrfs.path_sanitize ``` path_sanitize(filename: str, replacement: str = '') -> str ``` Turn an untrusted string into a filename safe on all major OSes. 
Removes control characters, characters illegal in filenames (`/\?<>:*|"`), trailing dots/spaces, and Windows-reserved device names; truncates to 255 characters. Operates on a *filename*, not a path — separators are stripped, not preserved. Parameters: | Name | Type | Description | Default | | ------------- | ----- | ------------------------------------------------------------- | ---------- | | `filename` | `str` | The untrusted string. | *required* | | `replacement` | `str` | What to substitute for removed characters (default: nothing). | `''` | Examples: ``` >>> path_sanitize("rep/ort:2026*") 'report2026' >>> path_sanitize("a/b", "_") 'a_b' ``` # Predicates & ids Type predicates classify the entry itself (lstat): a symlink answers `True` only to `is_link`, matching fs. ## pyrfs.is_file ``` is_file(path: str) -> bool ``` Whether the path is a regular file (symlinks answer `False`). Classifies the entry itself (lstat), matching fs — unlike `os.path.isfile`, which follows symlinks. Vectorized: also accepts an iterable or pandas Series of paths. See Also is_link : The predicate a symlink answers `True` to. pyrfs.file_exists : Existence regardless of type. Examples: ``` >>> from pyrfs import file_touch, link_create >>> _ = file_touch("data.txt") >>> _ = link_create("data.txt", "ln.txt") >>> is_file("data.txt"), is_file("ln.txt"), is_file("missing") (True, False, False) ``` ## pyrfs.is_dir ``` is_dir(path: str) -> bool ``` Whether the path is a directory (symlinks answer `False`). Classifies the entry itself (lstat), matching fs — unlike `os.path.isdir` and `pyrfs.dir_exists`, which follow symlinks. Vectorized: also accepts an iterable or pandas Series of paths. See Also pyrfs.dir_exists : Follow-symlink directory test. ## pyrfs.is_link ``` is_link(path: str) -> bool ``` Whether the path is a symlink (its target need not exist). Vectorized: also accepts an iterable or pandas Series of paths. See Also pyrfs.link_path : Read where the link points. 
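The lstat-versus-follow distinction behind these predicates can be reproduced with the standard library. The sketch below is not pyrfs code: `entry_type` is a hypothetical helper, and the example assumes a POSIX system where creating symlinks needs no special privileges.

```python
import os
import stat
import tempfile


def entry_type(path: str) -> str:
    """Classify the entry itself via lstat, never its symlink target."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "file"
    return "other"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    link = os.path.join(tmp, "ln.txt")
    dangling = os.path.join(tmp, "broken.txt")
    open(target, "w").close()                            # a regular file
    os.symlink(target, link)                             # a link to it
    os.symlink(os.path.join(tmp, "nowhere"), dangling)   # a link whose target is absent

    missing = os.path.join(tmp, "missing")
    kinds = [entry_type(p) for p in (target, link, dangling, missing)]
    follows = [os.path.isfile(p) for p in (target, link, dangling)]
```

`kinds` comes out as `['file', 'symlink', 'symlink', 'missing']`: both links classify as links whether or not their target exists. `follows` is `[True, True, False]`, because `os.path.isfile` reports on the target instead of the entry. That is the behaviour the `is_file` notes contrast against.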
## pyrfs.is_file_empty ``` is_file_empty(path: str) -> bool ``` Whether the file exists and has size zero. Missing paths answer `False` (they are not empty files). Vectorized: also accepts an iterable or pandas Series of paths. ## pyrfs.is_dir_empty ``` is_dir_empty(path: str) -> bool ``` Whether the directory exists and has no entries (hidden included). Missing paths answer `False`. Vectorized: also accepts an iterable or pandas Series of paths. Examples: ``` >>> from pyrfs import dir_create >>> _ = dir_create("empty") >>> is_dir_empty("empty") True ``` ## pyrfs.is_absolute_path ``` is_absolute_path(path: str) -> bool ``` Whether the path is absolute (a leading `~` counts, as in fs). Pure string test — no filesystem access. Vectorized: also accepts an iterable or pandas Series of paths. Examples: ``` >>> is_absolute_path(["/usr", "~/data", "rel/path"]) [True, True, False] ``` ## pyrfs.user_ids ``` user_ids() -> list[dict[str, object]] ``` All known users as rows of `{"user_id", "user_name"}`. Returns an empty list on platforms without `pwd` (Windows). ## pyrfs.group_ids ``` group_ids() -> list[dict[str, object]] ``` All known groups as rows of `{"group_id", "group_name"}`. Returns an empty list on platforms without `grp` (Windows). # Bytes & Perms Typed scalars that subclass `int` — see the [typed values guide](https://pyrfs.netlify.app/guides/typed-values/index.md). With the `[pandas]` extra, columns of these become the `"bytes"`/`"perms"`/`"path"` ExtensionDtypes. ## pyrfs.Bytes Bases: `int` A byte count that parses and displays human-readable sizes. All units are 1024-based (`"10MB"` == `"10MiB"` == `10 * 1024**2`), matching R's fs. Examples: ``` >>> Bytes("10MB") Bytes(10485760) >>> str(Bytes(455200)) '444.5K' >>> Bytes(455200) < "1MB" True >>> str(Bytes("1MB") + "500KB") '1.49M' ``` Notes `repr` stays exact (`Bytes(455200)`); `str`/`format` humanize. 
With the `[pandas]` extra, columns of these use the `"bytes"` ExtensionDtype, so the same comparisons work in `DataFrame.query()`.

See Also

pyrfs.file_size : Returns sizes as `Bytes`.

## pyrfs.Perms

Bases: `int`

Unix permission bits that parse and display `rwxr-xr-x` style.

Construct from octal (`"644"`), symbolic (`"u+rw,go+r"`), display (`"rw-r--r--"`) strings, or raw mode bits.

Examples:

```
>>> Perms("644")
Perms('rw-r--r--')
>>> Perms("644") == "rw-r--r--"
True
>>> str(Perms("644") | "u+x")
'rwxr--r--'
```

Notes

Symbolic strings here build from a base of `0` (so `"u+rw"` == `"u=rw"`); `pyrfs.file_chmod` applies symbolic modes to the file's *current* mode instead, like the `chmod` command.

See Also

pyrfs.file_chmod : Apply permissions to files.

# Design notes

# pyrfs — UX Design

> A Pythonic port of R's [`fs`](https://fs.r-lib.org) · Status: **design draft** · Last updated: 2026-06-11
> Companion: [`pyrfs-architecture.md`](https://pyrfs.netlify.app/design/pyrfs-architecture/index.md) (how it's built)

This document defines the **feel** of pyrfs — names, return values, chaining, and the pandas workflow. The guiding goal: an ex-R user who knows `fs` should feel at home immediately, and a Python user should find it idiomatic and pipeable.

______________________________________________________________________

## 1. UX thesis

> **Every function takes path(s) in, and gives a predictable, path-carrying value back — or raises.**

The same operation is reachable three ways: as a function, as a method on a path, or as a vectorized column operation in pandas.

```
flowchart LR
    in["path(s) in<br/>str · FsPath · list · Series"] --> op["pyrfs operation"]
    op -->|success| out["typed result<br/>FsPath · Bytes · Perms · bool · DataFrame"]
    op -->|failure| err["raises (OSError / FsError)"]
    out -.->|chains into| op
```

We inherit the five `fs` promises — **consistent naming · vectorization · predictable returns · explicit failure · tidy UTF-8 paths** — and add a sixth: **three interchangeable surfaces**.

______________________________________________________________________

## 2. Naming — the four families (kept from `fs`)

Functions are grouped by the **noun** they act on, `noun_verb`, snake_case. Type `dir_` + Tab and you see every directory operation.

| Prefix  | Domain                                           | Examples                                                              |
| ------- | ------------------------------------------------ | --------------------------------------------------------------------- |
| `path_` | construct & manipulate path strings (**no I/O**) | `path()`, `path_dir()`, `path_ext_set()`, `path_rel()`, `path_norm()` |
| `file_` | operate on files                                 | `file_create()`, `file_copy()`, `file_info()`, `file_chmod()`         |
| `dir_`  | operate on directories                           | `dir_create()`, `dir_ls()`, `dir_info()`, `dir_tree()`                |
| `link_` | operate on links                                 | `link_create()`, `link_path()`, `link_copy()`                         |

Plus predicates (`is_file`, `is_dir`, `is_link`, `is_file_empty`, `is_dir_empty`, `is_absolute_path`), id helpers (`user_ids`, `group_ids`), and temp helpers (`file_temp`, `path_temp`, `file_temp_push/pop`).

The create/copy/delete/exists verbs repeat with identical shapes across `file_`/`dir_`/`link_` — a predictable matrix you learn once.

______________________________________________________________________

## 3. The three surfaces (same engine, your choice of style)

### Surface A — Functional (closest to R `fs`)

```
import pyrfs as fs

fs.path("foo", "bar", "a", ext="txt")        # FsPath('foo/bar/a.txt')
fs.dir_ls("pyrfs", recurse=True, glob="*.py")
fs.file_copy("a.txt", "b.txt")               # -> FsPath('b.txt')
fs.path_ext_set("report.md", "html")         # FsPath('report.html')
```

### Surface B — Fluent `FsPath` (Pythonic chaining)

```
from pyrfs import FsPath

(FsPath("foo") / "bar" / "a.txt")            # FsPath('foo/bar/a.txt')   <- '/' operator
(FsPath("data") / "raw.csv").with_ext("parquet").copy_to("clean/")
FsPath("project").mkdir().touch_file("README.md")
FsPath("logs").ls(glob="*.log")              # [FsPath, FsPath, ...]
```

`FsPath` **is a `str`** (subclass), so it works anywhere a path string is expected — `open(p)`, `pd.read_csv(p)`, `os.fspath(p)` — no conversion needed.

### Surface C — pandas `.fs` accessor + DataFrame returns

```
import pandas as pd
import pyrfs as fs

df = pd.DataFrame({"path": fs.dir_ls("pyrfs", recurse=True)})
df.assign(
    ext = df["path"].fs.ext(),       # vectorized over the column
    dir = df["path"].fs.dir(),
    ok  = df["path"].fs.exists(),
)
```

```
flowchart TD
    eng["pyrfs engine (one implementation)"]
    eng --> A["A. functional<br/>fs.file_copy()"]
    eng --> B["B. fluent<br/>FsPath().copy_to()"]
    eng --> C["C. pandas<br/>Series.fs.* / dir_info()"]
```

Pick per task: scripts lean A, OO code leans B, dataframe pipelines lean C. They interoperate — `dir_ls()` returns `FsPath`s you can drop straight into a DataFrame column.

______________________________________________________________________

## 4. Predictable, typed return values

Every function returns one of a small, learnable set of shapes — and it always conveys the path.

| Return              | Type                             | Produced by                                         |
| ------------------- | -------------------------------- | --------------------------------------------------- |
| a path              | `FsPath` (⊂ `str`)               | `path()`, `file_copy()`, `dir_create()`, most verbs |
| many paths          | `list[FsPath]` / `Series[path]`  | `dir_ls()`, vectorized calls                        |
| existence/type test | `bool` / `dict`/`Series` of bool | `file_exists()`, `is_dir()`                         |
| a size              | `Bytes` (⊂ `int`)                | `file_size()`                                       |
| permissions         | `Perms` (⊂ `int`)                | `file_info()["permissions"]`                        |
| a table             | `DataFrame` (or `list[dict]`)    | `file_info()`, `dir_info()`                         |

**Mutating verbs return the new path**, enabling chains and pipes:

```
(fs.file_temp()
   .pipe(... )             # any callable
)
# fluent equivalent:
FsPath(fs.file_temp()).mkdir().touch_file("a").touch_file("b")
```

______________________________________________________________________

## 5. Typed values that read like a human

The heart of `fs`'s charm — values that *know what they are* and print accordingly.
| You have       | pyrfs shows                  | And you can write                     |
| -------------- | ---------------------------- | ------------------------------------- |
| `455200` bytes | `444.5K`                     | `fs.file_size("x") > "10KB"` → `True` |
| mode `0o644`   | `rw-r--r--`                  | `perms == "u=rw,go=r"` → `True`       |
| `"src//a.txt"` | `src/a.txt` (tidy, coloured) | `FsPath("src") / "a.txt"`             |

```
from pyrfs import Bytes, Perms

Bytes("10MB")                          # Bytes(10485760)  -> displays '10M'
Bytes(455200) < "1MB"                  # True
sum([Bytes("1MB"), Bytes("500KB")])    # Bytes -> '1.49M'

Perms("644")                           # Perms -> 'rw-r--r--'
Perms("644") & "u+r"                   # Perms (bitwise), still prints rwx
Perms("644") == "rw-r--r--"            # True
```

In pandas these become **real column dtypes** (ExtensionArrays), so the R headline demo ports almost verbatim:

```
import pyrfs as fs

(fs.dir_info("pyrfs", recurse=True)    # recurse, so nested modules are listed too
   .query("size > '10KB' and type == 'file'")     # Bytes column compares to a string
   .sort_values("size", ascending=False)
   .loc[:, ["path", "permissions", "size", "modification_time"]])
#                       path permissions   size   modification_time
#   pyrfs/_engine/dirops.py   rw-r--r--  12.4K  2026-06-11 13:35:54
#   ...
```

______________________________________________________________________

## 6. The pandas pipe workflow (a first-class use case)

pyrfs is built to flow inside `.pipe()` chains because `*_info` returns a DataFrame and the `.fs` accessor vectorizes path algebra over columns.
```
import pyrfs as fs

big_modules = (
    fs.dir_info("pyrfs", recurse=True)
      .query("type == 'file'")
      .assign(
          stem=lambda d: d["path"].fs.name(),
          dir=lambda d: d["path"].fs.dir(),        # parent directory, used as the group key
      )
      .pipe(lambda d: d[d["path"].fs.ext() == "py"])
      .groupby("dir")                              # group by directory
      .agg(total=("size", "sum"), n=("path", "size"))
      .sort_values("total", ascending=False)
)
```

Reading many files into one frame — `dir_ls()` returns paths you tag by source, the pandas analogue of R's named-vector `map_df(.id=)` trick:

```
import pandas as pd
import pyrfs as fs

files = fs.dir_ls("data", glob="*.tsv")
frame = pd.concat(
    {p.name(): pd.read_csv(p, sep="\t") for p in files},
    names=["file"],
)
```

______________________________________________________________________

## 7. Safe defaults & argument conventions (learn once)

| Argument          | Meaning                                                        | Default                         | On                      |
| ----------------- | -------------------------------------------------------------- | ------------------------------- | ----------------------- |
| `overwrite`       | allow clobbering an existing target                            | `False` (safe)                  | copy/move               |
| `recurse`         | recurse fully (`True`), not (`False`), or to depth (`int`)     | `False` listing / `True` create | `dir_*`                 |
| `all`             | include hidden dotfiles                                        | `False`                         | `dir_ls`, `dir_map`     |
| `type`            | filter by entry type (`"file"`, `"directory"`, `"symlink"`, …) | `"any"`                         | `dir_ls`, `dir_info`    |
| `glob` / `regexp` | filter listings (mutually exclusive → `FsError` if both)       | `None`                          | `dir_ls`, `path_filter` |
| `fail`            | raise (`True`) vs warn (`False`) on inaccessible entries       | `True`                          | directory traversals    |

- **Destructive actions opt-in.** `overwrite=False` and bounded `recurse` mean nothing surprising gets deleted or walked.
- **Keyword-only where it aids clarity** — flags like `overwrite`, `recurse`, `all` are keyword-only (`*,`) so call sites read self-documenting: `file_copy(a, b, overwrite=True)`.

______________________________________________________________________

## 8. Explicit failure (Pythonic)

pyrfs raises rather than silently returning a falsy value:

```
import pyrfs as fs

fs.file_copy("a.txt", "b.txt")                        # raises FileExistsError if b.txt exists
fs.file_copy("a.txt", "b.txt", overwrite=True)        # ok
fs.dir_ls("nope")                                     # raises FileNotFoundError
fs.path_filter(paths, glob="*.py", regexp=r"\.py$")   # raises pyrfs.FsError: cannot set both

# soften a traversal when some entries are unreadable:
fs.dir_ls("/var", recurse=True, fail=False)           # warns + skips, returns what it could read
```

- Native `OSError` subclasses (`FileNotFoundError`, `FileExistsError`, `PermissionError`) for OS-level failures — familiar, `try/except`-able.
- `pyrfs.FsError` (with subclasses) for pyrfs validation — friendly, actionable messages.

______________________________________________________________________

## 9. R `fs` → pyrfs translation

| R `fs`                        | pyrfs functional              | pyrfs fluent                                |
| ----------------------------- | ----------------------------- | ------------------------------------------- |
| `path("a", "b", ext = "txt")` | `path("a", "b", ext="txt")`   | `FsPath("a") / "b"` then `.with_ext("txt")` |
| `dir_ls("d", recurse = TRUE)` | `dir_ls("d", recurse=True)`   | `FsPath("d").ls(recurse=True)`              |
| `dir_info("d")`               | `dir_info("d")` → DataFrame   | `FsPath("d").info()`                        |
| `file_copy("a", "b")`         | `file_copy("a", "b")`         | `FsPath("a").copy_to("b")`                  |
| `file_size("a")`              | `file_size("a")` → `Bytes`    | `FsPath("a").size()`                        |
| `path_ext_set("a.txt", "md")` | `path_ext_set("a.txt", "md")` | `FsPath("a.txt").with_ext("md")`            |
| `path_rel("a/b", "a")`        | `path_rel("a/b", "a")`        | `FsPath("a/b").rel_to("a")`                 |
| `dir_tree("d")`               | `dir_tree("d")`               | `FsPath("d").tree()`                        |
| `x %>% file_delete()`         | `df.pipe(...)` / loop         | `FsPath(x).delete()`                        |

Naming is intentionally identical on the functional surface so muscle memory transfers; the fluent surface adds Pythonic method names for OO-style chaining.

______________________________________________________________________

## 10. Small touches (ported from `fs`)

- **`dir_tree()`** prints a coloured box-drawing tree (`├──`, `└──`), like Unix `tree`.
- **`file_show()`** opens a file in the OS default app (cross-platform).
- **`path(..., ext=)`** builds extensions correctly (one dot, no doubling).
- **`path_sanitize()`** turns untrusted strings into safe filenames.
- **`path_rel()` / `path_common()`** — relative paths and longest common dir (no stdlib one-liner).
- **`file_temp_push()/pop()`** — deterministic temp names for reproducible docs/tests.
- **Colour degrades** automatically on non-TTY / `NO_COLOR`.

______________________________________________________________________

## 11. Sharp edges (honest notes)

- **Stricter than stdlib in places.** `file_copy` refuses to overwrite by default — porting loose scripts may surface `FileExistsError`. Opt in with `overwrite=True`.
- **No `dir_move`.** Directories move via `file_move` (dirs are files), matching `fs`.
- **`FsPath` is a `str`, not a `pathlib.Path`.** Great for interop and pandas; if you want `pathlib` semantics call `.as_pathlib()` (helper) — we don't pretend to be `Path`.
- **pandas-only features fail gracefully.** Without the `[pandas]` extra, `dir_info()` returns `list[dict]` and the `.fs` accessor is unavailable; the docstring says so.
- **ExtensionDtype edge cases.** Some exotic pandas ops on `Bytes`/`Perms` columns may need `.astype(int)` first in v1; comparisons, sorting, and `sum/min/max` are supported from the start.

______________________________________________________________________

## 12. Cheat-sheet

```
NOUN_VERB(path, ...)   families: path_ file_ dir_ link_ (+ is_*, *_ids, *_temp)
  ├─ path(s) in         str · FsPath · list · pandas.Series (vectorized)
  ├─ tidy FsPath out    always '/', no '//' or trailing '/', UTF-8
  ├─ typed result       FsPath · Bytes('445.2K') · Perms('rwxr-xr-x') · DataFrame
  └─ raises on failure  OSError subclasses · pyrfs.FsError ; fail=False to soften

three surfaces:  fs.file_copy(a,b) · FsPath(a).copy_to(b) · df['p'].fs.ext()
pandas pipe:     dir_info(d).query("size > '10KB'").sort_values('size')
safe defaults:   overwrite=False · recurse=False(list)/True(create) · all=False · fail=True
from R fs:       same functional names; fluent adds Pythonic methods
```

# pyrfs — Architecture

> A Pythonic port of R's [`fs`](https://fs.r-lib.org) · Status: **design draft** · Last updated: 2026-06-11

Companion: [`pyrfs-ux.md`](https://pyrfs.netlify.app/design/pyrfs-ux/index.md) (user-facing design)

______________________________________________________________________

## 1. Purpose & non-goals

**Purpose.** Give Python the same file-system *ergonomics* that R users enjoy from `fs`: consistent `noun_verb` naming families, tidy paths, predictable path-carrying return values, explicit failure, and **typed self-describing values** (human-readable sizes, `rwxr-xr-x` permissions) — while being **chainable/pipeable** and integrating natively with **pandas**.

**What pyrfs is.** A thin, ergonomic, fully-typed wrapper over the Python standard library (`pathlib`, `shutil`, `os`, `stat`, `pwd`/`grp`) plus an optional pandas integration layer.

**Non-goals.**

- *Not* a new filesystem abstraction over remote/cloud backends (that's `fsspec`/`PyFilesystem2`).
- *Not* a C/native extension. R's `fs` needed **libuv** for cross-platform syscalls; Python's stdlib already abstracts that, so **pyrfs is pure Python** — no build step, trivial install.
- *Not* a 1:1 transliteration. We keep `fs`'s *UX contract*, expressed in idiomatic Python.

______________________________________________________________________

## 2. Core principle — *one engine, three surfaces*

Every filesystem operation is implemented **once** in a pure-stdlib `_engine`. The three user-facing surfaces are thin delegations — no logic is duplicated across them.

```
flowchart TD
    subgraph surfaces["User-facing surfaces"]
        fn["Functional API<br/>file_copy(a, b)<br/>dir_ls(p) · path_ext(p)"]
        fp["Fluent FsPath<br/>FsPath(a).copy_to(b)<br/>(FsPath(p) / 'x').with_ext('md')"]
        acc["pandas .fs accessor<br/>df['path'].fs.ext()<br/>dir_info(p) -> DataFrame"]
    end
    eng["pyrfs._engine<br/>(pure stdlib, no pandas)<br/>paths · fileops · dirops · linkops · ids · temp"]
    std[("Python stdlib<br/>pathlib · shutil · os · stat · pwd/grp")]
    fn --> eng
    fp --> eng
    acc --> eng
    eng --> std
```

**Why this matters:** `fs` itself uses this idea — high-level R verbs compose from a small set of C primitives. pyrfs applies it in pure Python: the fluent object and the pandas accessor are *presentation layers*, and correctness lives in one place.

______________________________________________________________________

## 3. System context

```
flowchart LR
    user([Python user / data scientist])
    subgraph pyrfs["pyrfs"]
        core["core API + FsPath + typed values"]
        pdx["optional pandas layer"]
    end
    pandas{{"pandas (optional extra)"}}
    std[("OS filesystem via stdlib")]
    user -->|"file_*/dir_*/path_* · FsPath · Series.fs"| pyrfs
    core --> std
    core -.->|"lazily, if installed"| pdx
    pdx --> pandas
```

- **Inbound:** scripts, notebooks, and packages call pyrfs.
- **Hard dependency:** none beyond the standard library (Python ≥ 3.10).
- **Optional:** pandas — enables `*_info` DataFrames, the `.fs` Series accessor, and the ExtensionDtypes. Absent pandas, the core still works and `*_info` returns `list[dict]`.

______________________________________________________________________

## 4. Package layout (flat layout)

The importable package sits at the **top level** (`pyrfs/pyrfs/`), not under `src/`.
```
pyrfs/                      # repo root
├── pyproject.toml          # setuptools backend, [project], optional-deps, tooling
├── docs/                   # these design docs
├── pyrfs/                  # the importable package
│   ├── __init__.py         # PUBLIC re-exports (functions + FsPath/Bytes/Perms + FsError)
│   ├── py.typed            # PEP 561 marker (ships type info)
│   ├── errors.py           # FsError hierarchy (validation)
│   ├── fspath.py           # FsPath(str) — fluent, chainable [PUBLIC]
│   ├── values.py           # Bytes(int), Perms(int) — typed scalars [PUBLIC]
│   ├── display.py          # humanize bytes · perms→rwx · LS_COLORS · tidy
│   ├── _engine/            # pure-stdlib core (NEVER imports pandas)
│   │   ├── paths.py        # path_* algebra
│   │   ├── fileops.py      # file_*
│   │   ├── dirops.py       # dir_* (ls/map/walk/info/tree/create/copy/delete)
│   │   ├── linkops.py      # link_*
│   │   ├── ids.py          # user_ids/group_ids
│   │   ├── temp.py         # file_temp stack · path_temp
│   │   └── vectorize.py    # polymorphic scalar|iterable dispatch
│   └── _pandas/            # OPTIONAL integration (imported only if pandas present)
│       ├── __init__.py     # registers .fs accessor + ExtensionDtypes
│       ├── dtypes.py       # BytesDtype, PermsDtype, PathDtype
│       ├── arrays.py       # BytesArray, PermsArray, PathArray
│       ├── accessor.py     # @register_series_accessor("fs")
│       └── frames.py       # build *_info DataFrames with typed columns
└── tests/                  # pytest mirror of the package
```

### Module responsibilities

| Module                 | Responsibility                                                                                        | Depends on                          |
| ---------------------- | ----------------------------------------------------------------------------------------------------- | ----------------------------------- |
| `_engine/paths.py`     | Pure path string algebra (`path`, `path_dir`, `path_ext*`, `path_rel`, `path_norm`, …)                | `pathlib`, `os.path`                |
| `_engine/fileops.py`   | `file_create/copy/move/delete/touch/show/chmod/chown/info/size/access`                                | `shutil`, `os`, `stat`              |
| `_engine/dirops.py`    | `dir_create/copy/delete/ls/map/walk/info/tree`, recursion & filtering                                 | `os.scandir`, `pathlib`             |
| `_engine/linkops.py`   | `link_create/copy/delete/exists/path`                                                                 | `os`                                |
| `_engine/ids.py`       | `user_ids/group_ids` (POSIX; empty frames on Windows)                                                 | `pwd`, `grp`                        |
| `_engine/temp.py`      | `file_temp` deterministic stack, `path_temp`                                                          | `tempfile`                          |
| `_engine/vectorize.py` | Decorator mapping scalar funcs over iterables/Series                                                  | —                                   |
| `fspath.py`            | `FsPath(str)` fluent object; methods delegate to `_engine`                                            | `_engine`, `display`                |
| `values.py`            | `Bytes(int)`, `Perms(int)` typed scalars                                                              | `display`                           |
| `display.py`           | Formatting/parsing: `humanize_bytes`, `parse_bytes`, `perms_to_str`, `parse_perms`, `tidy`, LS_COLORS | stdlib                              |
| `_pandas/*`            | ExtensionDtypes/arrays, `.fs` accessor, DataFrame builders                                            | `pandas`, reuses `display`/`values` |

**Invariant:** `_engine` and `values`/`display` must never `import pandas`. The optional layer depends inward on them, never the reverse — a classic dependency-inversion boundary.

______________________________________________________________________

## 5. The three surfaces in detail

### 5.1 Functional API (R-`fs` faithful)

Mirrors `fs`'s families and names exactly: `path_*` (pure, no I/O), `file_*`, `dir_*`, `link_*`, predicates (`is_file`, `is_dir`, `is_link`, …), `user_ids`/`group_ids`, temp helpers.

- **Predictable returns:** verbs return `FsPath` (or a list/Series of them); predicates return `bool` or a vectorized mapping; `file_size` → `Bytes`; `*_info` → DataFrame (or `list[dict]`).
- **Safe defaults** ported verbatim: `overwrite=False`, `recurse` defaults matching `fs` (`False` for listing, `True` for `dir_create`), `all=False`, `fail=True`.
- **`recurse: bool | int`** overload — `True`/`False`/depth, exactly like `fs`.

### 5.2 Fluent `FsPath`

`FsPath` **subclasses `str`** — the same choice as R's `fs_path ⊂ character` and the `path` library. Because an `FsPath` *is* a string, it drops into any stdlib or third-party API that expects a path, and serializes cleanly into pandas.
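To make the `str`-subclass idea concrete, here is a minimal sketch of how such a type can work. `TidyPath` is a hypothetical stand-in written for this illustration, not pyrfs's actual `FsPath`; the tidy rules (forward slashes, no `//`, no trailing `/`) are approximated with `posixpath.normpath`, which is an assumption about the normalisation rather than the library's code.

```python
import posixpath


class TidyPath(str):
    """Hypothetical stand-in for FsPath: a str subclass with a `/` join."""

    def __new__(cls, value: str = ".") -> "TidyPath":
        # Assumed tidy rules: forward slashes only, collapse '//' and trailing '/'.
        tidy = posixpath.normpath(str(value).replace("\\", "/"))
        return super().__new__(cls, tidy)

    def __truediv__(self, other: str) -> "TidyPath":
        # `/` joins path segments and returns the same type, so calls chain.
        return TidyPath(posixpath.join(self, str(other)))

    def with_ext(self, ext: str) -> "TidyPath":
        # Replace the extension; accept "md" or ".md" without doubling the dot.
        root, _ = posixpath.splitext(self)
        return TidyPath(f"{root}.{ext.lstrip('.')}")

    def __repr__(self) -> str:
        return f"TidyPath({str.__repr__(self)})"


p = (TidyPath("data//raw/") / "x.csv").with_ext("parquet")
print(repr(p))             # TidyPath('data/raw/x.parquet')
print(isinstance(p, str))  # True
```

Because the result is still a `str`, it can be passed to `open()` or any other API that takes a string path without conversion, which is the interop property the section describes.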
```
classDiagram
    class str {
        <>
    }
    class FsPath {
        +__truediv__(other) FsPath
        +ext() str
        +with_ext(ext) FsPath
        +dir() FsPath
        +name() FsPath
        +abs() FsPath
        +real() FsPath
        +exists() bool
        +is_dir() bool
        +copy_to(dst) FsPath
        +move_to(dst) FsPath
        +touch() FsPath
        +delete() None
        +mkdir(recurse) FsPath
        +ls(...) list~FsPath~
        +info() DataFrame
    }
    str <|-- FsPath
    FsPath ..> _engine : delegates
```

Methods return `FsPath` (or lists thereof) so calls chain: `(FsPath("a") / "b").with_ext("txt").copy_to("c")`.

### 5.3 pandas `.fs` accessor + DataFrame returns

- A registered Series accessor gives **vectorized path algebra over a column**: `df["path"].fs.ext()`, `.dir()`, `.with_ext("md")`, `.exists()`, `.is_dir()`.
- `dir_info()`/`file_info()` return a DataFrame whose `path`/`size`/`permissions` columns use the ExtensionDtypes, so the R headline demo translates directly:

```
(dir_info("pyrfs", recurse=False)
   .query("size > '10KB' and type == 'file'")
   .sort_values("size", ascending=False))
```

______________________________________________________________________

## 6. Typed value system

Two cooperating tiers, sharing one set of parse/format functions in `display.py`.
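As a rough illustration of the scalar tier, the sketch below shows an `int` subclass that delegates all parsing and formatting to two shared functions. The function names `parse_bytes` and `humanize_bytes` come from the module table above, but their bodies here are assumptions written for this example, and `SketchBytes` is a hypothetical stand-in for `Bytes`; the real rounding and unit rules may differ.

```python
import re

# Assumed 1024-based multipliers, keyed by unit letter.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)(?:i?B)?\s*", re.IGNORECASE)


def parse_bytes(text: str) -> int:
    """Parse literals such as '10MB', '1.5GiB' or '512' into a byte count."""
    m = _SIZE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"cannot parse size literal: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2).upper()])


def humanize_bytes(n: int) -> str:
    """Format a byte count with at most one decimal place and a unit letter."""
    value = float(n)
    for unit in ("", "K", "M", "G", "T"):
        if abs(value) < 1024 or unit == "T":
            break
        value /= 1024
    if unit == "":
        return str(int(value))
    return f"{value:.1f}".rstrip("0").rstrip(".") + unit


class SketchBytes(int):
    """Hypothetical stand-in for Bytes: an int that parses and prints sizes."""

    def __new__(cls, value):
        if isinstance(value, str):
            value = parse_bytes(value)
        return super().__new__(cls, value)

    @staticmethod
    def _coerce(other):
        # A string on the right-hand side is parsed with the shared function.
        return parse_bytes(other) if isinstance(other, str) else other

    def __lt__(self, other):
        return int(self) < self._coerce(other)

    def __gt__(self, other):
        return int(self) > self._coerce(other)

    def __eq__(self, other):
        return int(self) == self._coerce(other)

    __hash__ = int.__hash__

    def __add__(self, other):
        return SketchBytes(int(self) + int(self._coerce(other)))

    __radd__ = __add__  # lets the built-in sum() keep the type

    def __str__(self):
        return humanize_bytes(self)


print(str(SketchBytes(455200)))     # 444.5K
print(SketchBytes(455200) < "1MB")  # True
```

Keeping the parse and format logic in free functions means an array type can call the same two functions element by element, which is the "single source of truth" arrangement this section describes.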
```
flowchart TD
    subgraph fmt["display.py — single source of truth"]
        hb["humanize_bytes / parse_bytes"]
        pp["perms_to_str / parse_perms"]
        ti["tidy (path normalizer)"]
    end
    subgraph scalars["values.py + fspath.py (always available)"]
        b["Bytes(int)"]
        p["Perms(int)"]
        fpath["FsPath(str)"]
    end
    subgraph arrays["_pandas/arrays.py (optional)"]
        ba["BytesArray / BytesDtype"]
        pa["PermsArray / PermsDtype"]
        pta["PathArray / PathDtype"]
    end
    hb --> b --> ba
    pp --> p --> pa
    ti --> fpath --> pta
```

### Scalar wrappers (pure stdlib, always present)

| Type     | Subclass of | Construct from                             | Displays as                      | Overloads                                             |
| -------- | ----------- | ------------------------------------------ | -------------------------------- | ----------------------------------------------------- |
| `Bytes`  | `int`       | `int`, `"10MB"`, `"1.5GiB"`                | `445.2K`                         | `<,>,==` parse string RHS; arithmetic returns `Bytes` |
| `Perms`  | `int`       | octal `"644"`, symbolic `"u+rw,go+r"`, int | `rw-r--r--`                      | `& \| ~` return `Perms`; `==` parses string RHS       |
| `FsPath` | `str`       | any path-like                              | tidy path (coloured in terminal) | `/` for join                                          |

Subclassing the builtins mirrors `fs`'s S3-over-atomic-vector design (`fs_bytes ⊂ numeric`, `fs_perms ⊂ integer`, `fs_path ⊂ character`): a value still behaves like its base type but *remembers what it is* and prints for humans.

### pandas ExtensionArrays (optional)

For each scalar there is a real `ExtensionArray`/`ExtensionDtype` so DataFrame columns are first-class typed:

- `BytesDtype` (`name="bytes"`, backing `int64`) — elements show `445.2K`; native `>`/`<`/`==` against strings inside `.query()`; `sum`/`min`/`max` reductions.
- `PermsDtype` (`name="perms"`) — elements show `rwxr-xr-x`.
- `PathDtype` (`name="path"`, backing object of `FsPath`) — tidy display, ``-style repr.
Implemented with the standard protocol (`_from_sequence`, `__getitem__`, `__len__`, `isna`, `take`, `copy`, `_concat_same_type`) plus `ExtensionScalarOpsMixin` for operators, registered via `@register_extension_dtype`. **They call the same `display.py` functions as the scalars** — no duplicated formatting logic.

______________________________________________________________________

## 7. Vectorization model

R's `fs` is vectorized end to end. Python is scalar-by-default; pyrfs bridges this with a small `@vectorized` decorator in `_engine/vectorize.py`:

```
input type                  →  output type
-------------------------------------
str | PathLike | FsPath     →  scalar (FsPath/Bytes/bool)
list | tuple | set          →  list
pandas.Series               →  pandas.Series (only if pandas importable)
```

This gives `file_exists(["a", "b"])` → `[bool, bool]` and `path_ext(series)` → `Series`, while a single path returns a single value. The `.fs` accessor is the *idiomatic* vectorized-over-column surface; the decorator makes the bare functions polymorphic too.

```
flowchart LR
    inp["caller input"] --> dec{"@vectorized<br/>dispatch on type"}
    dec -->|scalar| s["f(x) -> scalar"]
    dec -->|iterable| l["[f(x) for x] -> list"]
    dec -->|Series| ser["x.map(f) -> Series"]
```

______________________________________________________________________

## 8. Error model

`fs`'s promise is **explicit failure** (throw, never a silent `FALSE`). Python's stdlib already honors this — `os`/`shutil`/`pathlib` raise `OSError` subclasses. pyrfs's policy:

- **Reuse native exceptions** where they fit: `FileNotFoundError`, `FileExistsError`, `PermissionError` (all `OSError`). `overwrite=False` on an existing target → `FileExistsError` (matches `fs`).
- **Add `pyrfs.FsError(Exception)`** for pyrfs-level validation that has no native equivalent — e.g. `glob` and `regexp` both set, recycling length mismatch, bad permission/size literal. Subclasses (`FsValueError`, …) let callers `except` precisely, mirroring `fs`'s classed `fs_error`/`invalid_argument`.
- **`fail=False`** softens directory traversals (`dir_ls`/`dir_map`/`dir_info`) from error to warning when a single entry is inaccessible — a direct port of `fs`'s `fail` knob.

```
flowchart TD
    op["pyrfs operation"] --> k{failure?}
    k -->|"OS-level"| oserr["raise FileNotFoundError /<br/>FileExistsError / PermissionError"]
    k -->|"bad argument"| fserr["raise pyrfs.FsError subclass"]
    k -->|"traversal entry, fail=False"| warn["warnings.warn(), skip entry"]
    k -->|none| ok["return typed value (FsPath/Bytes/bool/DataFrame)"]
```

______________________________________________________________________

## 9. Optional-dependency strategy

pandas is an **extra** (`pip install pyrfs[pandas]`). The mechanism:

- `_engine` and `values`/`display` never import pandas → core is import-safe without it.
- `pyrfs/__init__.py` attempts `import pyrfs._pandas` inside a `try/except ImportError`; success registers the `.fs` accessor and the ExtensionDtypes.
- `*_info` functions check a cached `has_pandas()` flag: return a typed **DataFrame** when present, else a plain **`list[dict]`** (still useful, still typed scalars in each row).

This mirrors `fs`'s R philosophy: hard deps minimal (`Imports: methods`), rich integrations as *Suggests* (`pillar`, `vctrs`) wired up lazily in `.onLoad`.

______________________________________________________________________

## 10. Build & tooling

- **Backend:** setuptools (`[build-system] requires = ["setuptools>=68"]`).
- **Layout:** flat — `[tool.setuptools.packages.find] where = ["."]`, `include = ["pyrfs*"]`.
- **Env/locking:** `uv` (`uv sync`, `uv run …`).
- **Python:** `requires-python = ">=3.10"`.
- **Extras:** `pandas = ["pandas>=2.0"]`, optional `color`, `dev = ["pytest","ruff","mypy"]`.
- **Quality gates:** `ruff` (lint+format), `mypy --strict` (no `Any`, `py.typed` shipped), `pytest` (pandas tests guarded by `importorskip`, run with and without the extra).
- **Docstrings:** NumPy style on the public API.

______________________________________________________________________

## 11. Representative flow — `file_copy("a.txt", dest_dir)`

```
sequenceDiagram
    participant U as caller
    participant F as file_copy (functional API)
    participant V as vectorize
    participant E as _engine.fileops
    participant S as shutil/os
    participant D as display.tidy
    U->>F: file_copy("a.txt", "out/")
    F->>V: dispatch on input shape
    V->>E: _copy_one("a.txt", "out/", overwrite=False)
    E->>E: resolve dir target -> "out/a.txt"; check exists
    alt exists and not overwrite
        E-->>U: raise FileExistsError
    else
        E->>S: shutil.copy2("a.txt", "out/a.txt")
        E->>D: tidy("out/a.txt")
        D-->>F: FsPath("out/a.txt")
        F-->>U: FsPath
    end
```

The same `_engine._copy_one` backs `FsPath.copy_to` and any `.fs`-accessor copy — *one engine, three surfaces*.

______________________________________________________________________

## 12. Open questions & notes

- **Path display colour.** `FsPath.__repr__` colouring via `LS_COLORS` is deferred to a late phase (P6); it must degrade cleanly on non-TTY / `NO_COLOR`. Default plan: plain until P6.
- **ExtensionArray scope.** Full operator/reduction coverage on `BytesArray` is the heaviest piece; v1 targets comparisons + `sum/min/max`. Edge cases (groupby aggregations, `astype` round-trips) to be pinned down with tests in P5.
- **Windows specifics.** `user_ids`/`group_ids` return empty frames (no `pwd`/`grp`); symlink creation may require privilege. Tidy paths always use `/`. To be verified on a Windows runner.
- **`path_expand` semantics.** `fs` distinguishes `path_expand` vs `path_expand_r`; pyrfs maps the former to `os.path.expanduser` and will document any divergence rather than hide it.
- **`dir_move`.** Like `fs`, pyrfs intentionally has no `dir_move` — directories move via `file_move`.

# Project

# Changelog

All notable changes to **pyrfs** are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased](https://github.com/Lightbridge-KS/pyrfs/compare/v0.1.0...HEAD)

## [0.1.0](https://github.com/Lightbridge-KS/pyrfs/releases/tag/v0.1.0) - 2026-06-11

Initial release — a Pythonic port of the UX of R's [fs](https://fs.r-lib.org) package.

### Added

- **Path algebra** (`path_*`, no I/O): `path()` with `ext=`, `path_dir`/`path_file`/`path_ext*`, `path_rel`, `path_common`, `path_filter` (glob/regexp, mutually exclusive), `path_split`/`path_join`, `path_has_parent`, `path_sanitize`, `path_expand`/`path_home`/`path_temp`, `path_tidy`.
- **`FsPath`** — a tidy path that subclasses `str`: `/` join operator, chainable methods delegating to the engine, `LS_COLORS`-coloured repr (degrades on non-TTY / `NO_COLOR`), `as_pathlib()` escape hatch.
- **Typed scalars**: `Bytes ⊂ int` (parses `"10MB"`, displays `444.5K`, compares against literals, arithmetic stays typed — all units 1024-based) and `Perms ⊂ int` (octal/symbolic/`rw-r--r--` forms, mode algebra).
- **File operations** (`file_*`): create/touch/copy/move/delete/exists/access/size/chmod/chown/show/info — mutating verbs return the new path; `overwrite=False` raises `FileExistsError`; copy/move into an existing directory targets `dir/basename`; symbolic chmod applies to the current mode.
- **Directory operations** (`dir_*`): create/copy/delete/exists, lazy `dir_walk` generator with the full fs filter set (`all`, `recurse: bool | int`, `type`, `glob`/`regexp`, `invert`, `fail=False` → warn-and-skip), `dir_ls`, `dir_map`, `dir_info`, and a box-drawing, coloured `dir_tree`. No `dir_move` by design — use `file_move`.
- **Link operations** (`link_*`): symbolic (default) and hard creation, `link_path`, `link_exists`, `link_copy`, `link_delete` (refuses non-links).
- **Predicates & ids**: `is_file`/`is_dir`/`is_link` (lstat semantics — a symlink is only `is_link`), `is_file_empty`, `is_dir_empty`, `is_absolute_path`; `user_ids`/`group_ids` (POSIX).
- **Vectorization**: every path-taking function is polymorphic over a scalar, list/tuple/set, or pandas Series (without the engine importing pandas).
- **pandas layer** (optional `[pandas]` extra): `bytes`/`perms`/`path` ExtensionDtypes lifting the scalar semantics onto columns (`size > "10KB"` works in `.query()`), the `Series.fs` accessor, and `file_info`/`dir_info` returning typed DataFrames (engine rows without pandas).
- **Temp helpers**: `file_temp` with a deterministic `file_temp_push`/`pop` stack for reproducible docs and tests.
- **Errors**: native `OSError` subclasses for OS failures; `FsError`/`FsValueError` for pyrfs-level validation.
- **Docs**: MkDocs Material site with llms.txt/llms-full.txt, an executed tour notebook, and a Quarto-rendered README kept fresh by CI.