# pyrfs
> Pythonic filesystem ergonomics inspired by R's fs — tidy paths, typed values, chainable, pandas-friendly
pyrfs is a Python filesystem library porting the UX of R's fs package: consistent noun_verb naming (path_*, file_*, dir_*, link_*), tidy paths, typed self-describing values (FsPath, Bytes, Perms), explicit failure, and three interchangeable surfaces — functional, fluent FsPath chaining, and a pandas Series accessor with typed DataFrame columns.
# Start here
**Pythonic filesystem ergonomics, inspired by R's [fs](https://fs.r-lib.org).**
Tidy paths, typed self-describing values, explicit failure — chainable, and pandas-native. Pure Python ≥ 3.10, zero hard dependencies.
```
import pyrfs as fs
fs.dir_ls("src", recurse=True, glob="*.py") # [FsPath('src/app.py'), ...]
fs.file_size("data.csv") > "10MB" # True — sizes compare to literals
fs.file_copy("a.txt", "backup/") # FsPath('backup/a.txt'), refuses to clobber
```
## Install
Not yet on PyPI — install from GitHub:
```
pip install "pyrfs @ git+https://github.com/Lightbridge-KS/pyrfs"
# with the pandas integration:
pip install "pyrfs[pandas] @ git+https://github.com/Lightbridge-KS/pyrfs"
```
## One engine, three surfaces
Every operation is implemented once and reachable three ways — pick per task, mix freely:
```
import pyrfs as fs
fs.path("foo", "bar", "a", ext="txt") # FsPath('foo/bar/a.txt')
fs.dir_ls("data", glob="*.csv")
fs.file_copy("a.txt", "b.txt") # -> FsPath('b.txt')
```
The functional surface is closest to R's fs: the `noun_verb` names transfer directly. See [Coming from R's fs](https://pyrfs.netlify.app/coming-from-r/index.md).
```
from pyrfs import FsPath
(FsPath("data") / "raw.csv").with_ext("parquet").copy_to("clean/")
FsPath("project").mkdir().touch_file("README.md")
FsPath("logs").ls(glob="*.log")
```
On the fluent surface, `FsPath` **is a `str`**: it drops into `open()`, `pd.read_csv()`, and any API that takes a path.
```
import pyrfs as fs
df = fs.dir_info("src", recurse=True)  # typed DataFrame (needs the pandas extra)
(df.query("size > '10KB' and type == 'file'")  # typed columns!
   .sort_values("size", ascending=False))
df["path"].fs.ext()  # vectorized over a column
```
On the pandas surface, `size` and `permissions` are real ExtensionDtypes, so string literals work inside `.query()`.
## Where next
- **[Coming from R's fs](https://pyrfs.netlify.app/coming-from-r/index.md)** — the translation table.
- **[The three surfaces](https://pyrfs.netlify.app/guides/three-surfaces/index.md)** — when to use which.
- **[Typed values](https://pyrfs.netlify.app/guides/typed-values/index.md)** — `Bytes('444.5K')`, `Perms('rw-r--r--')`.
- **[Tour notebook](https://pyrfs.netlify.app/tour/pyrfs-tour/index.md)** — everything, runnable.
- **[API reference](https://pyrfs.netlify.app/api/paths/index.md)** — by family: `path_*`, `file_*`, `dir_*`, `link_*`.
# Coming from R's fs
pyrfs keeps fs's **UX contract** — consistent `noun_verb` naming, tidy paths, predictable typed returns, explicit failure — expressed in idiomatic Python. If you know fs, your muscle memory transfers: the functional names are identical.
## The four families
| Prefix | Domain | Examples |
| ------- | ------------------------------------------------ | ------------------------------------------------------------- |
| `path_` | construct & manipulate path strings (**no I/O**) | `path()`, `path_dir()`, `path_ext_set()`, `path_rel()` |
| `file_` | operate on files | `file_create()`, `file_copy()`, `file_info()`, `file_chmod()` |
| `dir_` | operate on directories | `dir_create()`, `dir_ls()`, `dir_info()`, `dir_tree()` |
| `link_` | operate on links | `link_create()`, `link_path()`, `link_copy()` |
Plus predicates (`is_file`, `is_dir`, …), `user_ids`/`group_ids`, and temp helpers (`file_temp`, `path_temp`, `file_temp_push/pop`) — all as in fs.
## Translation table
| R fs | pyrfs functional | pyrfs fluent |
| ----------------------------- | ----------------------------- | ------------------------------------------- |
| `path("a", "b", ext = "txt")` | `path("a", "b", ext="txt")` | `FsPath("a") / "b"` then `.with_ext("txt")` |
| `dir_ls("d", recurse = TRUE)` | `dir_ls("d", recurse=True)` | `FsPath("d").ls(recurse=True)` |
| `dir_info("d")` | `dir_info("d")` → DataFrame | — |
| `file_copy("a", "b")` | `file_copy("a", "b")` | `FsPath("a").copy_to("b")` |
| `file_size("a")` | `file_size("a")` → `Bytes` | `FsPath("a").size()` |
| `path_ext_set("a.txt", "md")` | `path_ext_set("a.txt", "md")` | `FsPath("a.txt").with_ext("md")` |
| `path_rel("a/b", "a")` | `path_rel("a/b", "a")` | `FsPath("a/b").rel_to("a")` |
| `dir_tree("d")` | `dir_tree("d")` | `FsPath("d").tree()` |
| `fs_bytes("10MB")` | `Bytes("10MB")` | — |
| `fs_perms("644")` | `Perms("644")` | — |
| `x %>% file_delete()` | loop / `df.pipe(...)` | `FsPath(x).delete()` |
## The headline demo, ported
```
# R
dir_info("src", recurse = FALSE) |>
filter(type == "file", size > "10KB") |>
arrange(desc(size))
```
```
# Python (with the pandas extra)
(fs.dir_info("src")
.query("size > '10KB' and type == 'file'")
.sort_values("size", ascending=False))
```
`size` and `permissions` are real pandas ExtensionDtypes, so comparisons against human literals work inside `.query()` — same trick as fs's `fs_bytes`/`fs_perms` tibble columns.
## Vectorization
fs is vectorized end to end; Python is scalar-by-default. pyrfs functions are **polymorphic on the first argument**:
```
fs.path_ext("a.txt") # 'txt' (scalar -> scalar)
fs.path_ext(["a.txt", "b.md"]) # ['txt', 'md'] (list -> list)
fs.path_ext(df["path"]) # pandas Series (Series -> Series)
df["path"].fs.ext() # the idiomatic column form
```
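One way to picture the scalar/list dispatch above is a type check on the first argument. This is a stdlib-only sketch, not pyrfs's implementation: `ext_of` is a hypothetical helper name, and the pandas Series branch is left out.

```python
import os


def ext_of(path):
    """Return the extension without the dot; a list in gives a list out."""
    # A single path (str or os.PathLike) is handled directly.
    if isinstance(path, (str, os.PathLike)):
        return os.path.splitext(os.fspath(path))[1].lstrip(".")
    # Anything else is treated as an iterable of paths.
    return [ext_of(p) for p in path]


scalar_result = ext_of("a.txt")
list_result = ext_of(["a.txt", "b.md"])
no_ext = ext_of("README")
```

A real Series branch would presumably return a Series with the same index; that part is an assumption and is not shown here.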
## What's different (on purpose)
- **Errors are Python-native.** `FileExistsError`/`FileNotFoundError`/`PermissionError` instead of classed `fs_error` conditions; `FsValueError` for pyrfs-level validation. `tryCatch` → `try/except`.
- **`recurse` defaults match fs** (`False` for listing, `True` for `dir_create`), and the listing functions accept an `int` depth, exactly like fs.
- **Byte units are 1024-based across the board** — `Bytes("10MB") == Bytes("10MiB")`, matching `fs_bytes`.
- **`is_file`/`is_dir` classify the entry itself** (lstat): a symlink answers `True` only to `is_link` — fs semantics, not `os.path.isdir` semantics.
- **No `dir_move()`** — directories move via `file_move()`, same as fs.
- **`FsPath` is a `str`, not a `pathlib.Path`.** Best interop and pandas round-tripping; call `.as_pathlib()` when you want pathlib semantics. The `/` join concatenates then tidies — an absolute right-hand side does *not* reset the path (unlike `os.path.join`).
- **The split method is `parts()`** — `str.split()` is left untouched so `FsPath` never surprises code that treats it as a string.
- **`dir_walk()` is a lazy generator** rather than a callback walker — the Pythonic spin; `dir_ls()`/`dir_map()` are built on it.
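The "entry itself" rule from the list above can be reproduced with the standard library. The sketch below assumes a platform with symlink support; `is_dir_entry` is a hypothetical name used only to contrast `os.lstat` with `os.path.isdir`, and it is not pyrfs code.

```python
import os
import stat
import tempfile


def is_dir_entry(path):
    """True only if the entry itself is a directory (symlinks are not followed)."""
    try:
        return stat.S_ISDIR(os.lstat(path).st_mode)
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "real")
    link = os.path.join(tmp, "link")
    os.mkdir(real)
    os.symlink(real, link)  # needs symlink support

    follows = os.path.isdir(link)        # os.path.isdir follows the link
    entry_itself = is_dir_entry(link)    # lstat looks at the link itself
    link_flag = os.path.islink(link)
    real_is_dir = is_dir_entry(real)
```

Here the symlink counts as a directory for `os.path.isdir` but not for the lstat-based check, which is the distinction the bullet describes.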
# Guides
# Safety & errors
pyrfs inherits fs's stance: **explicit failure, destructive actions opt-in**. Nothing silently returns `False`; nothing clobbers unless you ask.
## Safe defaults (learn once)
| Argument | Meaning | Default | On |
| ----------------- | ------------------------------------------------- | ------------------------------- | ----------------------- |
| `overwrite` | allow clobbering an existing target | `False` | copy/move |
| `recurse` | `True` = fully, `False` = no, `int` = to depth | `False` listing / `True` create | `dir_*` |
| `all` | include hidden dotfiles | `False` | `dir_ls`, `dir_map`, … |
| `type` | filter by entry type (`"file"`, `"directory"`, …) | `"any"` | traversals |
| `glob` / `regexp` | filter listings (mutually exclusive) | `None` | `dir_ls`, `path_filter` |
| `fail` | raise vs warn on unreadable entries | `True` | traversals |
Behavior flags are keyword-only, so call sites are self-documenting: `file_copy(a, b, overwrite=True)`.
## The error model
```
fs.file_copy("a.txt", "b.txt") # FileExistsError if b.txt exists
fs.dir_ls("nope") # FileNotFoundError
fs.path_filter(ps, glob="*.py", regexp=r"\.py$")
# FsValueError: cannot set both
```
- **OS-level failures raise native `OSError` subclasses** — `FileNotFoundError`, `FileExistsError`, `PermissionError` — familiar and `except`-able.
- **pyrfs-level validation raises `FsError`** (usually the `FsValueError` subclass): conflicting arguments, bad size/permission literals, deleting a non-symlink with `link_delete`.
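To make the second bullet concrete, here is a hedged sketch of a filter that rejects conflicting arguments before doing any work. `SketchError`, `SketchValueError` and `filter_paths` are hypothetical stand-ins; whether pyrfs's `FsValueError` also subclasses `ValueError` is not stated in this document, so that part is an assumption.

```python
import fnmatch
import re


class SketchError(Exception):
    """Stand-in for a library-level base error."""


class SketchValueError(SketchError, ValueError):
    """Stand-in for an argument-validation error."""


def filter_paths(paths, *, glob=None, regexp=None, invert=False):
    if glob is not None and regexp is not None:
        raise SketchValueError("cannot set both glob and regexp")
    if glob is not None:
        keep = re.compile(fnmatch.translate(glob)).match  # glob as anchored regex
    elif regexp is not None:
        keep = re.compile(regexp).search
    else:
        return list(paths)
    # A path is kept when "it matched" differs from "invert".
    return [p for p in paths if (keep(p) is not None) != invert]


paths = ["src/app.py", "README.md"]
by_glob = filter_paths(paths, glob="*.py")
by_regexp = filter_paths(paths, regexp=r"\.md$")
inverted = filter_paths(paths, glob="*.py", invert=True)

try:
    filter_paths(paths, glob="*.py", regexp=r"\.py$")
except SketchValueError as exc:
    validation_message = str(exc)
```

Validation failures surface before the filesystem is touched, while OS-level failures keep their native `OSError` types.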
## Softening traversals: `fail=False`
One unreadable entry shouldn't abort a whole directory walk:
```
fs.dir_ls("/var", recurse=True, fail=False)
# UserWarning: skipping unreadable directory: ...
# -> returns everything it *could* read
```
This is a direct port of fs's `fail` knob, and applies to `dir_ls`, `dir_walk`, `dir_map`, and `dir_info`.
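The raise-or-warn switch can be sketched with `warnings` and a plain `try`/`except`. This is an illustration under stated assumptions, not the library's code: `list_dirs` is a hypothetical helper, and a missing directory stands in for an unreadable one because permission failures are hard to stage portably.

```python
import os
import tempfile
import warnings


def list_dirs(dirs, *, fail=True):
    """List entries of several directories, optionally skipping failures."""
    found = []
    for d in dirs:
        try:
            names = sorted(os.listdir(d))
        except OSError:
            if fail:
                raise  # explicit failure is the default
            warnings.warn(f"skipping unreadable directory: {d}")
            continue
        found.extend(os.path.join(d, name) for name in names)
    return found


with tempfile.TemporaryDirectory() as tmp:
    ok = os.path.join(tmp, "ok")
    os.mkdir(ok)
    open(os.path.join(ok, "a.txt"), "w").close()
    missing = os.path.join(tmp, "missing")  # stand-in for an unreadable directory

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        soft = [os.path.basename(p) for p in list_dirs([missing, ok], fail=False)]

    try:
        list_dirs([missing, ok])
        hard = "no error"
    except FileNotFoundError:
        hard = "FileNotFoundError"
```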
## Destination resolution (copy/move)
Copying or moving **into an existing directory** targets `dir/basename` — shell `cp`/`mv` semantics — and the `overwrite` guard applies to that *resolved* target:
```
fs.file_copy("report.pdf", "archive/") # -> FsPath('archive/report.pdf')
fs.file_copy("report.pdf", "archive/") # FileExistsError
```
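The resolve-then-guard order described above can be sketched with `shutil`. `copy_guarded` is a hypothetical helper written for this illustration; it mirrors the documented behavior but is not taken from pyrfs.

```python
import os
import shutil
import tempfile


def copy_guarded(src, dst, *, overwrite=False):
    """Copy src to dst, resolving an existing directory to dst/basename."""
    target = os.path.join(dst, os.path.basename(src)) if os.path.isdir(dst) else dst
    # The guard checks the resolved target, not the directory that was passed.
    if os.path.lexists(target) and not overwrite:
        raise FileExistsError(f"target already exists: {target!r}")
    shutil.copy2(src, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "report.pdf")
    archive = os.path.join(tmp, "archive")
    os.mkdir(archive)
    with open(src, "w") as fh:
        fh.write("v1")

    first = os.path.relpath(copy_guarded(src, archive), tmp).replace(os.sep, "/")
    try:
        copy_guarded(src, archive)
        second = "copied"
    except FileExistsError:
        second = "FileExistsError"
    third = os.path.relpath(copy_guarded(src, archive, overwrite=True), tmp).replace(os.sep, "/")
```

The second call fails even though `archive` itself was never "overwritten": the check runs against `archive/report.pdf`.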
There is no `dir_move()`: directories are files at the OS level, so `file_move()` moves them — same deliberate choice as fs.
# The three surfaces
Every pyrfs operation is implemented **once** in a pure-stdlib engine; the three user-facing surfaces are thin delegates. They interoperate freely — `dir_ls()` returns `FsPath`s you can chain methods on or drop into a DataFrame column.
## A — Functional: scripts and R muscle memory
```
import pyrfs as fs
files = fs.dir_ls("data", recurse=True, glob="*.csv")
fs.dir_create("backup")
for f in files:
    fs.file_copy(f, "backup/", overwrite=True)
```
Names mirror R's fs exactly — see [Coming from R's fs](https://pyrfs.netlify.app/coming-from-r/index.md). Functions are polymorphic on the first argument (scalar → scalar, list → list, Series → Series).
## B — Fluent `FsPath`: OO-style chaining
```
from pyrfs import FsPath
report = (FsPath("analysis") / "draft.md").with_ext("html")
work = FsPath("project").mkdir().touch_file("README.md").touch_file("setup.py")
big_logs = [p for p in FsPath("logs").walk(glob="*.log") if p.size() > "5MB"]
```
Because `FsPath` subclasses `str`:
- `open(p)`, `pd.read_csv(p)`, `json.dump(..., open(p, "w"))` all just work;
- every `str` method behaves normally (`p.startswith("src/")`, `p.split("/")`);
- it serializes cleanly (JSON, parquet, databases) as a plain string.
Mutating verbs return the resulting path, so chains read top-to-bottom like R pipes.
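Why subclassing `str` gives this interop for free can be seen in a small sketch. `TidyPath` below is a hypothetical class for illustration only; it tidies slashes on construction and adds one chainable method, and it makes no claim about how `FsPath` is written.

```python
import json
import os
import re


class TidyPath(str):
    """A str subclass that collapses doubled and trailing slashes."""

    def __new__(cls, value=""):
        text = re.sub(r"/+", "/", os.fspath(value))
        if len(text) > 1:
            text = text.rstrip("/")
        return super().__new__(cls, text)

    def __repr__(self):
        return f"TidyPath({str.__repr__(self)})"

    def with_ext(self, ext):
        # Returning the same type is what keeps a chain going.
        root, _ = os.path.splitext(self)
        return TidyPath(f"{root}.{ext}" if ext else root)


p = TidyPath("src//a.txt/")
still_str = isinstance(p, str)
pieces = p.split("/")                       # inherited str method, untouched
round_trip = json.loads(json.dumps({"path": p}))
chained = p.with_ext("md").with_ext("html")
```

Nothing special is needed for `json` or `str` methods: the instance already is a string, so only the extra methods have to be written.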
## C — pandas: columns and frames
Requires the pandas extra: `pip install "pyrfs[pandas] @ git+https://github.com/Lightbridge-KS/pyrfs"` (pyrfs is not yet on PyPI).
```
import pandas as pd
import pyrfs as fs
# typed frame in, typed frame out
big = (fs.dir_info("src", recurse=True)
.query("size > '10KB' and type == 'file'")
.sort_values("size", ascending=False))
# vectorized path algebra over a column
df = pd.DataFrame({"path": fs.dir_ls("src", recurse=True, type="file")})
df.assign(
ext=df["path"].fs.ext(),
dir=df["path"].fs.dir(),
size=df["path"].fs.size(), # a real 'bytes'-dtype column
)
```
Without pandas installed, the core works unchanged and `*_info` returns `list[dict]` rows carrying the same typed scalars.
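Filtering such rows without pandas is ordinary list processing. The sketch below builds comparable rows with the standard library; `info_rows` is a hypothetical helper, the keys `path`, `type` and `size` follow the column names used above, and sizes are plain `int`s here rather than typed scalars. The exact row schema pyrfs returns is not specified in this document.

```python
import os
import stat
import tempfile


def info_rows(path):
    """One dict per directory entry, sorted by name (entries are not followed)."""
    rows = []
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            st = entry.stat(follow_symlinks=False)
            if stat.S_ISDIR(st.st_mode):
                kind = "directory"
            elif stat.S_ISLNK(st.st_mode):
                kind = "symlink"
            else:
                kind = "file"
            rows.append({"path": entry.path, "type": kind, "size": st.st_size})
    return rows


with tempfile.TemporaryDirectory() as tmp:
    for name, size in (("big.bin", 20 * 1024), ("mid.bin", 15 * 1024), ("small.txt", 10)):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"x" * size)
    os.mkdir(os.path.join(tmp, "sub"))

    # Same question as the .query() call: files larger than 10 KiB, biggest first.
    big = sorted(
        (r for r in info_rows(tmp) if r["type"] == "file" and r["size"] > 10 * 1024),
        key=lambda r: r["size"],
        reverse=True,
    )
    big_names = [os.path.basename(r["path"]) for r in big]
```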
## Choosing
| Situation | Reach for |
| ----------------------------------------------- | ---------------- |
| Shell-script-like automation, R habits | **A** functional |
| Building paths through transformations, OO code | **B** fluent |
| Filtering/aggregating many files as data | **C** pandas |
# Typed values
The heart of fs's charm: values that *know what they are* and print for humans. pyrfs ships three — each subclasses a builtin, so it still behaves like its base type everywhere.
## `Bytes` ⊂ `int`
```
from pyrfs import Bytes
Bytes("10MB") # Bytes(10485760)
str(Bytes(455200)) # '444.5K'
Bytes(455200) < "1MB" # True — comparisons parse literals
sum([Bytes("1MB"), Bytes("500KB")]) # Bytes -> '1.49M' (arithmetic stays typed)
```
**All units are 1024-based.** `"10MB"`, `"10MiB"` and `"10M"` all mean `10 * 1024**2`, matching R's `fs_bytes`. `repr()` stays exact (`Bytes(455200)`); `str()`/`format()` humanize.
`file_size()` returns `Bytes`, so `fs.file_size("x.bin") > "10KB"` reads like the question you're asking.
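A minimal sketch of how an `int` subclass can parse 1024-based literals and compare against them follows. `SizeSketch` and `parse_size` are hypothetical names, and the humanizing rule is a guess chosen to reproduce the `'444.5K'` and `'10M'` outputs shown in this document; pyrfs's actual formatting rules may differ.

```python
import re

_POWERS = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4}


def parse_size(text):
    """Parse '10MB', '10MiB', '10M' or '455' using 1024-based units."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)(?:i?B)?\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a size literal: {text!r}")
    return int(float(m.group(1)) * 1024 ** _POWERS[m.group(2).upper()])


class SizeSketch(int):
    def __new__(cls, value):
        if isinstance(value, str):
            value = parse_size(value)
        return super().__new__(cls, value)

    @staticmethod
    def _coerce(other):
        return parse_size(other) if isinstance(other, str) else other

    def __lt__(self, other):
        return int.__lt__(self, self._coerce(other))

    def __gt__(self, other):
        return int.__gt__(self, self._coerce(other))

    def __str__(self):
        value = float(self)
        for suffix in ("", "K", "M", "G", "T"):
            if abs(value) < 1024 or suffix == "T":
                return f"{value:.1f}".rstrip("0").rstrip(".") + suffix
            value /= 1024


ten_mb = SizeSketch("10MB")
same_unit = SizeSketch("10MiB") == SizeSketch("10M") == ten_mb
shown = str(SizeSketch(455200))
smaller = SizeSketch(455200) < "1MB"
```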
## `Perms` ⊂ `int`
```
from pyrfs import Perms
Perms("644") # Perms('rw-r--r--')
Perms("644") == "rw-r--r--" # True
Perms("644") == "u=rw,go=r" # True — symbolic forms parse too
Perms("644") | "u+x" # Perms('rwxr--r--') — mode algebra stays typed
```
`file_chmod()` accepts all the same forms, and symbolic modes apply **relative to the current mode** (chmod semantics): `fs.file_chmod("run.sh", "u+x")`.
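"Relative to the current mode" means a symbolic clause edits bits instead of replacing the whole mode. The sketch below is a simplified stdlib illustration: `apply_symbolic` is a hypothetical function, it handles only `r`/`w`/`x` with `u`/`g`/`o`/`a`, and unlike the real `chmod` command it ignores the umask when no "who" is given.

```python
import re
import stat

_WHO = {"u": 0o700, "g": 0o070, "o": 0o007, "a": 0o777}
_WHAT = {"r": 0o444, "w": 0o222, "x": 0o111}


def apply_symbolic(mode, clauses):
    """Apply clauses such as 'u+x' or 'u=rw,go=r' to an existing mode."""
    for clause in clauses.split(","):
        m = re.fullmatch(r"([ugoa]*)([+=-])([rwx]*)", clause)
        if m is None:
            raise ValueError(f"bad symbolic mode: {clause!r}")
        who, op, what = m.groups()
        who_mask = 0
        for w in who or "a":
            who_mask |= _WHO[w]
        bits = 0
        for p in what:
            bits |= _WHAT[p]
        bits &= who_mask
        if op == "+":
            mode |= bits
        elif op == "-":
            mode &= ~bits
        else:  # '=' replaces only the bits that belong to 'who'
            mode = (mode & ~who_mask) | bits
    return mode


added_exec = apply_symbolic(0o644, "u+x")
from_scratch = apply_symbolic(0, "u=rw,go=r")
rendered = stat.filemode(stat.S_IFREG | added_exec)[1:]
```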
## `FsPath` ⊂ `str`
```
from pyrfs import FsPath
FsPath("src//a.txt/") # FsPath('src/a.txt') — tidied on construction
FsPath("a") / "b" / "c.md" # FsPath('a/b/c.md')
```
Tidy form: always `/` separators, no doubled or trailing slashes. In a terminal, the repr is coloured by on-disk type via `LS_COLORS` (degrades automatically on non-TTY or `NO_COLOR`).
## In pandas columns
With the `[pandas]` extra these become real ExtensionDtypes — `"bytes"`, `"perms"`, `"path"` — so whole columns display humanized, sort correctly, compare against literals inside `.query()`, and `sum()`/`min()`/`max()` return typed scalars:
```
import pandas as pd
import pyrfs  # assumption: importing pyrfs registers the "bytes" dtype
s = pd.Series(["1K", "10MB", "455"], dtype="bytes")
s > "1K" # [False, True, False]
s.sum() # Bytes -> '10M'
```
# API reference
# Directories — `dir_*`
All traversals share the fs filter set: `all`, `recurse` (bool or depth), `type`, `glob`/`regexp` (mutually exclusive), `invert`, `fail`.
## pyrfs.dir_create
```
dir_create(path: str, *, mode: int | str = 0o755, recurse: bool = True) -> FsPath
```
Create a directory (parents too when `recurse`); existing dirs are fine.
Vectorized: also accepts an iterable or pandas Series of paths.
Parameters:
| Name | Type | Description | Default |
| --------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | The directory to create. | *required* |
| `mode` | `int or str` | Permissions for newly created directories (default 0o755); subject to the process umask. | `0o755` (`493`) |
| `recurse` | `bool` | Create missing parents too (default True, matching fs — note this differs from the recurse=False default of the listing functions). | `True` |
Returns:
| Type | Description |
| -------- | -------------------------- |
| `FsPath` | The created path (chains). |
See Also:
- `file_create`: The file counterpart.
- `FsPath.mkdir`: Fluent equivalent.
Examples:
```
>>> dir_create("out/plots")
FsPath('out/plots')
>>> dir_exists("out/plots")
True
```
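For readers who think in stdlib terms, the `recurse` switch roughly corresponds to `os.makedirs` versus `os.mkdir`. The sketch below is only that analogy: `make_dir` is a hypothetical helper, and the failure it shows for a missing parent is stdlib behavior, not a documented pyrfs guarantee.

```python
import os
import tempfile


def make_dir(path, *, mode=0o755, recurse=True):
    if recurse:
        os.makedirs(path, mode=mode, exist_ok=True)  # parents too; existing dir is fine
    elif not os.path.isdir(path):
        os.mkdir(path, mode)  # the parent must already exist
    return path


with tempfile.TemporaryDirectory() as tmp:
    nested = make_dir(os.path.join(tmp, "out", "plots"))
    created = os.path.isdir(nested)
    idempotent = make_dir(nested) == nested
    try:
        make_dir(os.path.join(tmp, "a", "b"), recurse=False)
        shallow = "created"
    except FileNotFoundError:
        shallow = "FileNotFoundError"

same_default = 0o755 == 493  # the decimal and octal spellings of the default mode
```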
## pyrfs.dir_exists
```
dir_exists(path: str) -> bool
```
Whether the path exists and is a directory (follows symlinks).
Vectorized: also accepts an iterable or pandas Series of paths.
See Also:
- `pyrfs.is_dir`: Entry-itself (lstat) semantics; a symlink to a directory answers `False` there but `True` here.
## pyrfs.dir_ls
```
dir_ls(path: PathInput = '.', *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> list[FsPath]
```
List directory entries with the full fs filter set.
The eager form of `dir_walk` — same parameters, returns a sorted list.
Parameters:
| Name | Type | Description | Default |
| --------- | ------------------------ | ----------------------------------------------------------------------------------------------- | ------- |
| `path` | `str or PathLike` | Directory to list (default: the working directory). | `'.'` |
| `all` | `bool` | Include hidden dotfiles. | `False` |
| `recurse` | `bool or int` | True = full recursion, False = this level only, an int limits depth (1 = one level below path). | `False` |
| `type` | `str or iterable of str` | Keep only these entry types ("file", "directory", "symlink", ...); "any" keeps all. | `'any'` |
| `glob` | `str` | Keep entries whose path matches this glob pattern (mutually exclusive with `regexp`). | `None` |
| `regexp` | `str` | Keep entries whose path matches this regular expression (mutually exclusive with `glob`). | `None` |
| `invert` | `bool` | Keep entries that do not match glob/regexp. | `False` |
| `fail` | `bool` | Raise on unreadable entries (True) or warn and skip (False). | `True` |
Returns:
| Type | Description |
| ---------------- | ------------------------------------------------------- |
| `list of FsPath` | Entry paths, prefixed by path, siblings sorted by name. |
Raises:
| Type | Description |
| -------------- | --------------------------------------------------------------------- |
| `FsValueError` | If both glob and regexp are set, or type names an unknown entry type. |
See Also:
- `dir_walk`: The lazy (generator) form.
- `dir_info`: The same listing as typed stat rows / DataFrame.
- `pyrfs.path_filter`: The same glob/regexp filter for in-memory lists.
Examples:
```
>>> from pyrfs import file_touch
>>> _ = dir_create("proj/sub")
>>> _ = file_touch(["proj/a.py", "proj/b.txt"])
>>> dir_ls("proj")
[FsPath('proj/a.py'), FsPath('proj/b.txt'), FsPath('proj/sub')]
>>> dir_ls("proj", glob="*.py")
[FsPath('proj/a.py')]
>>> dir_ls("proj", type="directory")
[FsPath('proj/sub')]
```
## pyrfs.dir_walk
```
dir_walk(path: PathInput = '.', *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> Iterator[FsPath]
```
Lazily yield directory entries, with the full fs filter set.
Parameters:
| Name | Type | Description | Default |
| --------- | ------------------------ | ----------------------------------------------------------------------------------------------- | ------- |
| `path` | `str or PathLike` | Directory to walk. | `'.'` |
| `all` | `bool` | Include hidden dotfiles. | `False` |
| `recurse` | `bool or int` | True = full recursion, False = this level only, an int limits depth (1 = one level below path). | `False` |
| `type` | `str or iterable of str` | Keep only these entry types ("file", "directory", "symlink", ...); "any" keeps all. | `'any'` |
| `glob` | `str` | Keep entries whose path matches this glob pattern (mutually exclusive with `regexp`). | `None` |
| `regexp` | `str` | Keep entries whose path matches this regular expression (mutually exclusive with `glob`). | `None` |
| `invert` | `bool` | Keep entries that do not match glob/regexp. | `False` |
| `fail` | `bool` | Raise on unreadable entries (True) or warn and skip (False). | `True` |
Yields:
| Type | Description |
| -------- | ------------------------------------------------------- |
| `FsPath` | Entry paths, prefixed by path, siblings sorted by name. |
Raises:
| Type | Description |
| -------------- | --------------------------------------------------------------------- |
| `FsValueError` | If both glob and regexp are set, or type names an unknown entry type. |
See Also:
- `dir_ls`: The eager (list-returning) form.
- `dir_map`: Apply a function to each entry.
Examples:
```
>>> from pyrfs import file_touch
>>> _ = dir_create("logs")
>>> _ = file_touch("logs/a.log")
>>> walker = dir_walk("logs") # nothing read yet — it's a generator
>>> next(walker)
FsPath('logs/a.log')
```
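A lazy, depth-limited walk can be sketched as a recursive generator over `os.scandir`. `walk_sketch` is a hypothetical function that reads the `recurse` rule as documented above (`False` = this level only, `1` = one level below, `True` = everything); filters and the `fail` switch are left out, and this is not pyrfs's implementation.

```python
import os
import tempfile
from collections.abc import Iterator


def walk_sketch(path=".", recurse=False, _depth=0):
    """Yield entry paths lazily, siblings sorted by name, parents before children."""
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        yield entry.path
        descend = recurse is True or (recurse is not False and _depth < recurse)
        if descend and entry.is_dir(follow_symlinks=False):
            yield from walk_sketch(entry.path, recurse, _depth + 1)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deep"))
    for rel in ("a.txt", "sub/b.txt", "sub/deep/c.txt"):
        open(os.path.join(root, *rel.split("/")), "w").close()

    def rel_names(recurse):
        return [os.path.relpath(p, root).replace(os.sep, "/")
                for p in walk_sketch(root, recurse)]

    level_only = rel_names(False)
    one_below = rel_names(1)
    everything = rel_names(True)

    walker = walk_sketch(root)           # no directory has been read yet
    is_lazy = isinstance(walker, Iterator)
    first = os.path.basename(next(walker))
```

An eager list form is then just `list(walk_sketch(...))`, which is the relationship the text describes between `dir_ls` and `dir_walk`.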
## pyrfs.dir_map
```
dir_map(path: PathInput, fn: Callable[[FsPath], object], *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> list[object]
```
Apply `fn` to each entry and collect the results.
Takes the same filter arguments as `dir_ls`.
See Also:
- `dir_walk`: Iterate lazily instead of collecting.
Examples:
```
>>> from pyrfs import file_touch
>>> _ = dir_create("d")
>>> _ = file_touch(["d/a.py", "d/b.py"])
>>> dir_map("d", lambda p: p.ext())
['py', 'py']
```
## pyrfs.dir_copy
```
dir_copy(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath
```
Copy a directory tree to `new_path` (a name, or an existing directory).
Same destination resolution and `overwrite` guard as `file_copy`: copying into an existing directory targets `new_path/basename` (shell `cp -r` semantics). With `overwrite=True` an existing destination is *replaced*, not merged. Symlinks are copied as symlinks.
Parameters:
| Name | Type | Description | Default |
| ----------- | ----------------- | ----------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | Source directory. | *required* |
| `new_path` | `str or PathLike` | Destination name, or an existing directory to copy into. | *required* |
| `overwrite` | `bool` | Replace an existing (resolved) destination (default False). | `False` |
Returns:
| Type | Description |
| -------- | ------------------------- |
| `FsPath` | The root of the new copy. |
Raises:
| Type | Description |
| -------------------- | ------------------------------------------------------------ |
| `NotADirectoryError` | If path is not a directory. |
| `FileExistsError` | If the (resolved) destination exists and overwrite is False. |
See Also:
- `file_copy`: Single files.
- `file_move`: Directories move via `file_move` (there is no `dir_move`).
Examples:
```
>>> _ = dir_create("src/sub")
>>> dir_copy("src", "backup")
FsPath('backup')
>>> dir_exists("backup/sub")
True
```
## pyrfs.dir_delete
```
dir_delete(path: str) -> FsPath
```
Delete a directory and everything below it (recursive, like `rm -rf`).
Vectorized: also accepts an iterable or pandas Series of paths.
Returns:
| Type | Description |
| -------- | ----------------- |
| `FsPath` | The deleted path. |
See Also:
- `file_delete`: Single files and symlinks.
- `FsPath.rmdir`: Fluent equivalent.
Examples:
```
>>> _ = dir_create("scratch/deep")
>>> dir_delete("scratch")
FsPath('scratch')
>>> dir_exists("scratch")
False
```
## pyrfs.dir_tree
```
dir_tree(path: PathInput = '.', *, recurse: bool | int = True, all: bool = False) -> None
```
Print a box-drawing tree of the directory, like the Unix `tree`.
Entries are coloured by type via `LS_COLORS` in a capable terminal (plain on non-TTY or `NO_COLOR`). Hidden files are skipped unless `all=True`; `recurse` limits depth as in `dir_ls`.
Examples:
```
>>> from pyrfs import file_touch
>>> _ = dir_create("proj/src")
>>> _ = file_touch("proj/README.md")
>>> dir_tree("proj")
proj
├── README.md
└── src
```
# Files — `file_*`
Mutating verbs return the (new) path so calls chain; `overwrite=False` on an existing target raises `FileExistsError`. Copy/move into an existing directory targets `dir/basename`.
## pyrfs.file_create
```
file_create(path: str, *, mode: int | str = 0o644) -> FsPath
```
Create a new file (an existing file is left unchanged).
Vectorized: also accepts an iterable or pandas Series of paths.
Parameters:
| Name | Type | Description | Default |
| ------ | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | The file to create. The parent directory must exist. | *required* |
| `mode` | `int or str` | Permissions for a newly created file: octal string ("644"), symbolic ("u=rw,go=r"), or raw bits (default 0o644); subject to the process umask. | `0o644` (`420`) |
Returns:
| Type | Description |
| -------- | -------------------------- |
| `FsPath` | The created path (chains). |
See Also:
- `file_touch`: Also update timestamps when the file exists.
- `pyrfs.dir_create`: The directory counterpart.
Examples:
```
>>> file_create("notes.txt")
FsPath('notes.txt')
```
## pyrfs.file_touch
```
file_touch(path: str) -> FsPath
```
Update access/modification times, creating the file if needed.
Vectorized: also accepts an iterable or pandas Series of paths.
See Also:
- `file_create`: Create without updating timestamps of an existing file.
Examples:
```
>>> file_touch("stamp.txt")
FsPath('stamp.txt')
```
## pyrfs.file_copy
```
file_copy(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath
```
Copy a file to `new_path` (a file name, or an existing directory).
Vectorized: copy many files into one directory with `file_copy([a, b], "dir")`.
Parameters:
| Name | Type | Description | Default |
| ----------- | ----------------- | --------------------------------------------------------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | Source file. | *required* |
| `new_path` | `str or PathLike` | Destination file name, or an existing directory to copy into (the target then becomes new_path/basename). | *required* |
| `overwrite` | `bool` | Allow clobbering an existing destination (default False). | `False` |
Returns:
| Type | Description |
| -------- | ------------------------- |
| `FsPath` | The path of the new copy. |
Raises:
| Type | Description |
| ----------------- | ------------------------------------------------------------ |
| `FileExistsError` | If the (resolved) destination exists and overwrite is False. |
See Also:
- `file_move`: Move instead of copy.
- `pyrfs.dir_copy`: Copy a directory tree.
- `FsPath.copy_to`: Fluent equivalent.
Examples:
```
>>> src = file_create("a.txt")
>>> file_copy(src, "b.txt")
FsPath('b.txt')
>>> file_copy(src, "b.txt")
Traceback (most recent call last):
...
FileExistsError: target already exists: FsPath('b.txt') (pass overwrite=True)
```
## pyrfs.file_move
```
file_move(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath
```
Move (rename) a file or a directory.
Same destination resolution and `overwrite` guard as `file_copy`. There is deliberately no `dir_move`, matching fs.
Parameters:
| Name | Type | Description | Default |
| ----------- | ----------------- | --------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | Source file or directory. | *required* |
| `new_path` | `str or PathLike` | Destination name, or an existing directory to move into. | *required* |
| `overwrite` | `bool` | Allow clobbering an existing destination (default False). | `False` |
Returns:
| Type | Description |
| -------- | ----------------- |
| `FsPath` | The new location. |
Raises:
| Type | Description |
| ----------------- | ------------------------------------------------------------ |
| `FileExistsError` | If the (resolved) destination exists and overwrite is False. |
See Also:
- `file_copy`: Copy instead of move.
- `FsPath.move_to`: Fluent equivalent.
Examples:
```
>>> _ = file_create("a.txt")
>>> file_move("a.txt", "b.txt")
FsPath('b.txt')
```
## pyrfs.file_delete
```
file_delete(path: str) -> FsPath
```
Delete a file or symlink (for directories use `dir_delete`).
Vectorized: also accepts an iterable or pandas Series of paths.
Returns:
| Type | Description |
| -------- | ----------------- |
| `FsPath` | The deleted path. |
Raises:
| Type | Description |
| ------------------- | --------------------------- |
| `FileNotFoundError` | If the file does not exist. |
See Also:
- `pyrfs.dir_delete`: Recursive directory deletion.
- `pyrfs.link_delete`: Symlink-only deletion (refuses non-links).
Examples:
```
>>> p = file_create("scrap.txt")
>>> file_delete(p)
FsPath('scrap.txt')
>>> file_exists(p)
False
```
## pyrfs.file_exists
```
file_exists(path: str) -> bool
```
Whether the path exists — a broken symlink counts as existing.
Uses `lexists` (the entry itself), matching fs. Vectorized: also accepts an iterable or pandas Series of paths.
See Also:
- `pyrfs.dir_exists`: Directory-specific test (follows symlinks).
- `pyrfs.is_file`, `pyrfs.is_dir`, `pyrfs.is_link`: Type predicates.
Examples:
```
>>> _ = file_create("here.txt")
>>> file_exists(["here.txt", "gone.txt"])
[True, False]
```
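The broken-symlink case is easy to check with the standard library, which offers both views. This sketch assumes a platform with symlink support and only demonstrates `os.path.lexists` versus `os.path.exists`; it does not call pyrfs.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "dangling")
    os.symlink(os.path.join(tmp, "missing-target"), link)  # target never created

    entry_exists = os.path.lexists(link)   # looks at the link itself
    target_exists = os.path.exists(link)   # follows the link to its target
```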
## pyrfs.file_access
```
file_access(path: str, mode: str = 'exists') -> bool
```
Test access to a path for the current process.
Vectorized: also accepts an iterable or pandas Series of paths.
Parameters:
| Name | Type | Description | Default |
| ------ | ---------------------------------------- | ---------------------------- | ---------- |
| `path` | `str or PathLike` | The path to test. | *required* |
| `mode` | `('exists', 'read', 'write', 'execute')` | The kind of access to check. | `"exists"` |
Raises:
| Type | Description |
| -------------- | ----------------------------------------------- |
| `FsValueError` | If mode is not one of the four accepted values. |
Examples:
```
>>> p = file_create("data.txt")
>>> file_access(p, "read")
True
```
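The four mode names map naturally onto the `os.access` flags. The sketch below shows that mapping under the assumption that this is roughly what such a check involves; `can_access` is a hypothetical helper, and it raises a plain `ValueError` where pyrfs documents `FsValueError`.

```python
import os
import tempfile

_ACCESS = {"exists": os.F_OK, "read": os.R_OK, "write": os.W_OK, "execute": os.X_OK}


def can_access(path, mode="exists"):
    if mode not in _ACCESS:
        raise ValueError(f"mode must be one of {sorted(_ACCESS)}, got {mode!r}")
    return os.access(path, _ACCESS[mode])


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data.txt")
    open(data, "w").close()
    readable = can_access(data, "read")
    present = can_access(data)
    absent = can_access(os.path.join(tmp, "gone.txt"))

try:
    can_access("anything", "delete")
    bad_mode = "accepted"
except ValueError:
    bad_mode = "ValueError"
```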
## pyrfs.file_size
```
file_size(path: str) -> Bytes
```
File size as a `pyrfs.Bytes` value (compares against literals).
Vectorized: also accepts an iterable or pandas Series of paths.
Returns:
| Type | Description |
| ------- | ----------------------------------------------------------------------------------------------------- |
| `Bytes` | The size — an int subclass that displays humanized (444.5K) and compares against strings like "10KB". |
See Also:
- `pyrfs.Bytes`: The typed scalar.
- `file_info`: Size together with the full stat row.
Examples:
```
>>> p = file_create("two-bytes.bin")
>>> with open(p, "wb") as fh:
... _ = fh.write(b"hi")
>>> file_size(p)
Bytes(2)
>>> file_size(p) < "1KB"
True
```
## pyrfs.file_chmod
```
file_chmod(path: str, mode: int | str) -> FsPath
```
Change permissions; symbolic modes apply relative to the current mode.
Vectorized: also accepts an iterable or pandas Series of paths.
Parameters:
| Name | Type | Description | Default |
| ------ | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | The file to change. | *required* |
| `mode` | `int or str` | Octal string ("644"), display form ("rw-r--r--"), or raw bits — all absolute; symbolic clauses ("u+x") modify the current mode, like the chmod command. | *required* |
See Also:
- `pyrfs.Perms`: The typed permission scalar.
- `FsPath.chmod`: Fluent equivalent.
Examples:
```
>>> p = file_create("run.sh", mode="644")
>>> _ = file_chmod(p, "u+x")
>>> file_access(p, "execute")
True
```
## pyrfs.file_chown
```
file_chown(path: str, user: str | int | None = None, group: str | int | None = None) -> FsPath
```
Change owner and/or group (names or numeric ids; POSIX only).
Parameters:
| Name | Type | Description | Default |
| ------- | ----------------- | ------------------------ | ---------- |
| `path` | `str or PathLike` | The file to change. | *required* |
| `user` | `str or int` | New owner (name or uid). | `None` |
| `group` | `str or int` | New group (name or gid). | `None` |
Raises:
| Type | Description |
| -------------- | ----------------------------------- |
| `FsValueError` | If neither user nor group is given. |
## pyrfs.file_show
```
file_show(path: str) -> FsPath
```
Open a file in the OS default application (`open`/`xdg-open`).
Examples:
```
>>> file_show("report.pdf")
FsPath('report.pdf')
```
# FsPath
## pyrfs.FsPath
Bases: `str`
A tidy filesystem path string — the fluent pyrfs surface.
Construction normalizes the path (`/` separators, no doubled or trailing slashes). The `/` operator joins; methods chain because each returns an `FsPath`. Inherited `str` behavior is untouched — `p.split("/")`, `p.startswith(...)`, `open(p)` all work as on any string (the *split-into-components* method is `parts`, so `str.split` is never shadowed). In a capable terminal the repr is coloured by on-disk type via `LS_COLORS`.
See Also:
- `pyrfs.path`: Functional construction with an `ext=` option.
- `as_pathlib`: Convert when you want `pathlib` semantics.
Examples:
```
>>> FsPath("src//a.txt/") # tidied on construction
FsPath('src/a.txt')
>>> (FsPath("foo") / "bar" / "a.txt").with_ext("md")
FsPath('foo/bar/a.md')
>>> FsPath("a/b").startswith("a") # still a str
True
```
### `__truediv__`
```
__truediv__(other: str | PathLike[str]) -> FsPath
```
Join with `other`: `FsPath('a') / 'b'` -> `FsPath('a/b')`.
Concatenation + tidy: an absolute right-hand side does *not* reset the path (unlike `pathlib`/`os.path.join`).
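The difference from the standard library is easiest to see side by side. In this sketch `join_then_tidy` is a hypothetical function that imitates "concatenate, then tidy"; the two stdlib calls are real and show the resetting behavior it avoids.

```python
import posixpath
import re
from pathlib import PurePosixPath


def join_then_tidy(left, right):
    """Concatenate with '/', then collapse doubled and trailing slashes."""
    tidy = re.sub(r"/+", "/", f"{left}/{right}")
    return tidy.rstrip("/") if len(tidy) > 1 else tidy


stdlib_join = posixpath.join("a", "/b")         # '/b': absolute right side resets
pathlib_join = str(PurePosixPath("a") / "/b")   # '/b': same rule in pathlib
concat_join = join_then_tidy("a", "/b")         # 'a/b': the left side is kept
```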
### `__rtruediv__`
```
__rtruediv__(other: str | PathLike[str]) -> FsPath
```
Support `'a' / FsPath('b')` joining from a plain string.
### ext
```
ext() -> str
```
Extension without the dot (`''` if none) — `pyrfs.path_ext`.
### with_ext
```
with_ext(ext: str) -> FsPath
```
Replace (or add) the extension; `''` removes it — `pyrfs.path_ext_set`.
Examples:
```
>>> (FsPath("data") / "raw.csv").with_ext("parquet")
FsPath('data/raw.parquet')
```
### dir
```
dir() -> FsPath
```
Directory part of the path (`'.'` if none) — `pyrfs.path_dir`.
### name
```
name() -> FsPath
```
File name — the last path component — `pyrfs.path_file`.
### parts
```
parts() -> list[str]
```
Path components (a leading root stays `'/'`) — `pyrfs.path_split`.
Named `parts` (as in `pathlib`) so `str.split` keeps its normal string behavior.
Examples:
```
>>> FsPath("/usr/bin").parts()
['/', 'usr', 'bin']
```
### rel_to
```
rel_to(start: str | PathLike[str]) -> FsPath
```
This path expressed relative to `start` — `pyrfs.path_rel`.
### has_parent
```
has_parent(parent: str | PathLike[str]) -> bool
```
Whether this path sits at or below `parent` — `pyrfs.path_has_parent`.
### expand
```
expand() -> FsPath
```
Expand a leading `~` to the home directory — `pyrfs.path_expand`.
### norm
```
norm() -> FsPath
```
Normalize `.` and `..` lexically — `pyrfs.path_norm`.
### abs
```
abs() -> FsPath
```
Absolute form (links unresolved) — `pyrfs.path_abs`.
### real
```
real() -> FsPath
```
Canonical form, symlinks resolved — `pyrfs.path_real`.
### copy_to
```
copy_to(new_path: str | PathLike[str], *, overwrite: bool = False) -> FsPath
```
Copy this file to `new_path` — `pyrfs.file_copy`.
Copying into an existing directory targets `new_path/basename`; an existing destination raises `FileExistsError` unless `overwrite=True`. Returns the new copy's path (chains).
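The destination rules can be sketched with `shutil`. `copy_guarded` is a hypothetical helper, not the pyrfs implementation; it only illustrates the two rules stated above (a directory destination resolves to `new_path/basename`, and an existing destination is refused unless `overwrite=True`).

```python
import os
import shutil
import tempfile


def copy_guarded(src: str, dst: str, *, overwrite: bool = False) -> str:
    """Hypothetical helper mirroring the rules described above."""
    if os.path.isdir(dst):
        # Copying into a directory targets dst/basename(src).
        dst = os.path.join(dst, os.path.basename(src))
    if os.path.lexists(dst) and not overwrite:
        raise FileExistsError(dst)
    shutil.copy2(src, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    backup = os.path.join(tmp, "backup")
    os.mkdir(backup)
    with open(src, "w") as fh:
        fh.write("hello")

    first = copy_guarded(src, backup)            # lands at backup/a.txt
    try:
        copy_guarded(src, backup)                # second copy is refused
        refused = False
    except FileExistsError:
        refused = True
    second = copy_guarded(src, backup, overwrite=True)
    print(os.path.basename(first), refused, first == second)
```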
### move_to
```
move_to(new_path: str | PathLike[str], *, overwrite: bool = False) -> FsPath
```
Move (rename) this file or directory — `pyrfs.file_move`.
Same destination resolution and `overwrite` guard as `copy_to`.
### create
```
create(*, mode: int | str = 0o644) -> FsPath
```
Create this file (existing files untouched) — `pyrfs.file_create`.
### touch
```
touch() -> FsPath
```
Update timestamps, creating the file if needed — `pyrfs.file_touch`.
### delete
```
delete() -> None
```
Delete this file or symlink — `pyrfs.file_delete`.
Returns `None`: a deleted path has nothing to chain onto. For directories use `rmdir`.
### exists
```
exists() -> bool
```
Whether this path exists (broken symlinks count) — `pyrfs.file_exists`.
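The parenthetical matters when a symlink's target has gone missing. Below is a small standard-library sketch of the two possible answers; that pyrfs answers like `os.path.lexists` is inferred from the sentence above, not from its source. It assumes a platform with unprivileged symlinks.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "dangling")
    os.symlink(os.path.join(tmp, "nowhere"), link)   # target never created

    follows_link = os.path.exists(link)    # False: the target is missing
    entry_itself = os.path.lexists(link)   # True: the link entry is there
    print(follows_link, entry_itself)
```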
### access
```
access(mode: str = 'exists') -> bool
```
Test `"exists"`/`"read"`/`"write"`/`"execute"` — `pyrfs.file_access`.
### size
```
size() -> Bytes
```
File size as a `pyrfs.Bytes` value — `pyrfs.file_size`.
Examples:
```
>>> FsPath("notes.txt").create().size() == 0
True
```
### chmod
```
chmod(mode: int | str) -> FsPath
```
Change permissions — `pyrfs.file_chmod`.
Symbolic modes (`"u+x"`) apply to the *current* mode; octal and display forms are absolute. Returns this path (chains).
### info
```
info() -> dict[str, object]
```
Stat this path into one row of typed values — `pyrfs.file_info`.
Returns a single `dict` (use the functional `pyrfs.file_info` / `pyrfs.dir_info` for tables).
### mkdir
```
mkdir(*, mode: int | str = 0o755, recurse: bool = True) -> FsPath
```
Create this directory (parents too when `recurse`) — `pyrfs.dir_create`.
Examples:
```
>>> FsPath("proj").mkdir().touch_file("README.md").ls()
[FsPath('proj/README.md')]
```
### rmdir
```
rmdir() -> None
```
Delete this directory and everything below it — `pyrfs.dir_delete`.
Recursive (`rm -rf` semantics), despite the `os.rmdir`-like name. Returns `None`: nothing left to chain onto.
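The difference from `os.rmdir` is easy to see with the standard library. This sketch assumes `shutil.rmtree` as the closest stdlib analogue of the recursive behaviour described; it does not show how pyrfs implements it.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
with open(os.path.join(root, "sub", "f.txt"), "w") as fh:
    fh.write("x")

try:
    os.rmdir(root)                 # stdlib rmdir only removes empty directories
    rmdir_failed = False
except OSError:
    rmdir_failed = True

shutil.rmtree(root)                # recursive removal, the `rm -rf` behaviour
print(rmdir_failed, os.path.exists(root))
```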
### touch_file
```
touch_file(name: str | PathLike[str]) -> FsPath
```
Create a child file and return *this directory* (keeps chaining).
Returning the directory (not the new file) lets several `touch_file` calls chain; use `(p / name).touch()` when you want the file's path back.
### ls
```
ls(*, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> list[FsPath]
```
List entries of this directory — `pyrfs.dir_ls` (same filters).
### walk
```
walk(*, all: bool = False, recurse: bool | int = True, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> Iterator[FsPath]
```
Lazily yield entries below this directory — `pyrfs.dir_walk`.
Unlike the functional default, `recurse=True` here: walking a tree is the common fluent use.
### tree
```
tree(*, recurse: bool | int = True, all: bool = False) -> None
```
Print a box-drawing tree of this directory — `pyrfs.dir_tree`.
### is_file
```
is_file() -> bool
```
Whether this is a regular file (lstat; symlinks answer `False`) — `pyrfs.is_file`.
### is_dir
```
is_dir() -> bool
```
Whether this is a directory (lstat; symlinks answer `False`) — `pyrfs.is_dir`.
### is_link
```
is_link() -> bool
```
Whether this is a symlink — `pyrfs.is_link`.
### as_pathlib
```
as_pathlib() -> pathlib.Path
```
This path as a `pathlib.Path`, when you want pathlib semantics.
Examples:
```
>>> FsPath("a/b").as_pathlib()
PosixPath('a/b')
```
# Info, temp & errors
`*_info` returns a typed DataFrame when pandas is installed, otherwise `list[dict]` rows carrying the same typed scalars.
## pyrfs.file_info
```
file_info(path: PathInput | Iterable[PathInput], *, follow: bool = False) -> pd.DataFrame | list[dict[str, object]]
```
Stat path(s) into a typed table.
Returns a DataFrame with typed columns (`path`/`size`/`permissions` as pyrfs dtypes) when pandas is installed, else `list[dict]` rows of the same typed scalars.
Parameters:
| Name | Type | Description | Default |
| -------- | --------------------------------------- | --------------------------------------------------------------------- | ---------- |
| `path` | `str, os.PathLike, or iterable of them` | Path(s) to stat. | *required* |
| `follow` | `bool` | Stat symlink targets instead of the links themselves (default False). | `False` |
See Also
dir_info : Stat a directory's entries. pyrfs.FsPath.info : One row, as a plain dict.
Examples:
```
>>> file_info("pyproject.toml")
path type size permissions ...
0 pyproject.toml file 1.7K rw-r--r-- ...
```
## pyrfs.dir_info
```
dir_info(path: PathInput = '.', *, all: bool = False, recurse: bool | int = False, type: str | Iterable[str] = 'any', glob: str | None = None, regexp: str | None = None, invert: bool = False, fail: bool = True) -> pd.DataFrame | list[dict[str, object]]
```
Stat directory entries into a typed table (same filters as `dir_ls`).
Returns a DataFrame with typed columns when pandas is installed, else `list[dict]` rows. This is the fs headline: with typed columns, string literals work inside `.query()`.
See Also
file_info : Stat explicit path(s). pyrfs.dir_ls : The underlying listing and its filter arguments.
Examples:
```
>>> (dir_info("pyrfs", recurse=True)
... .query("size > '10KB' and type == 'file'")
... .sort_values("size", ascending=False))
```
## pyrfs.has_pandas
```
has_pandas() -> bool
```
Whether pandas is importable (cached; decides the `*_info` shape).
Examples:
```
>>> has_pandas() in (True, False)
True
```
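One plausible way to build such a cached check uses `importlib.util.find_spec` and `functools.lru_cache`. `module_available` is a hypothetical name; pyrfs's actual implementation may differ.

```python
import importlib.util
from functools import lru_cache


@lru_cache(maxsize=None)
def module_available(name: str) -> bool:
    """True when `name` can be imported; the lookup result is cached."""
    return importlib.util.find_spec(name) is not None


print(module_available("json"), module_available("surely_not_a_module_xyz"))
```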
## pyrfs.file_temp
```
file_temp(pattern: str = 'file', tmp_dir: PathInput | None = None, ext: str = '') -> FsPath
```
Return a unique temp path (a *name* only — the file is not created).
If names were queued with `file_temp_push`, the oldest queued name is returned instead — deterministic mode, fs's trick for reproducible examples, docs, and tests.
Parameters:
| Name | Type | Description | Default |
| --------- | ----------------- | ------------------------------------------------------------- | -------- |
| `pattern` | `str` | Filename prefix (default "file"). | `'file'` |
| `tmp_dir` | `str or PathLike` | Directory for the name (default: the session temp directory). | `None` |
| `ext` | `str` | Extension, with or without the leading dot. | `''` |
See Also
file_temp_push, file_temp_pop : The deterministic-name queue. pyrfs.path_temp : The temp *directory* itself.
Examples:
```
>>> file_temp(ext="csv")
FsPath('/tmp/file2bf36b4eb5d8.csv')
>>> _ = file_temp_push("/tmp/demo.csv")
>>> file_temp() # deterministic: returns the queued name
FsPath('/tmp/demo.csv')
```
## pyrfs.file_temp_push
```
file_temp_push(path: PathInput | Iterable[PathInput]) -> list[FsPath]
```
Queue deterministic path(s) for subsequent `file_temp` calls.
Returns:
| Type | Description |
| ---------------- | ------------------------------ |
| `list of FsPath` | The queued paths (FIFO order). |
Examples:
```
>>> file_temp_push(["/tmp/one", "/tmp/two"])
[FsPath('/tmp/one'), FsPath('/tmp/two')]
>>> file_temp(), file_temp()
(FsPath('/tmp/one'), FsPath('/tmp/two'))
```
## pyrfs.file_temp_pop
```
file_temp_pop() -> FsPath | None
```
Remove and return the oldest queued temp path (`None` if empty).
Examples:
```
>>> _ = file_temp_push("/tmp/queued")
>>> file_temp_pop()
FsPath('/tmp/queued')
>>> file_temp_pop() is None
True
```
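The queue behaviour shared by `file_temp`, `file_temp_push`, and `file_temp_pop` can be modelled with a `collections.deque`. All names below (`temp_push`, `temp_pop`, `temp_name`, `_queue`) are hypothetical, and the random-suffix scheme is an assumption; only the FIFO logic mirrors the descriptions above.

```python
import os
import tempfile
import uuid
from collections import deque

_queue = deque()                      # hypothetical FIFO of pre-registered names


def temp_push(*paths):
    """Queue deterministic names for later temp_name() calls."""
    _queue.extend(paths)
    return list(paths)


def temp_pop():
    """Remove and return the oldest queued name, or None if empty."""
    return _queue.popleft() if _queue else None


def temp_name(pattern="file", ext=""):
    """Return a queued name if any, else a fresh unique one (never created)."""
    queued = temp_pop()
    if queued is not None:
        return queued
    suffix = "." + ext.lstrip(".") if ext else ""   # exactly one dot
    unique = uuid.uuid4().hex[:12]                  # assumed naming scheme
    return os.path.join(tempfile.gettempdir(), f"{pattern}{unique}{suffix}")


temp_push("/tmp/one", "/tmp/two")
names = [temp_name(), temp_name(), temp_name(ext=".csv")]
print(names[:2], names[2].endswith(".csv"), temp_pop())
```

Queuing names this way keeps examples and tests reproducible while leaving the default behaviour random.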
## pyrfs.FsError
Bases: `Exception`
Base class for all pyrfs validation errors.
## pyrfs.FsValueError
Bases: `FsError`, `ValueError`
An argument value (or combination of arguments) is invalid.
# Links — `link_*`
`link_create(path, new_path)` creates `new_path` pointing *to* `path` (the fs argument order). Symbolic links are the default.
## pyrfs.link_create
```
link_create(path: str, new_path: PathInput, *, symbolic: bool = True) -> FsPath
```
Create a link at `new_path` pointing to `path`.
Note the argument order (fs's): target first, link name second.
Parameters:
| Name | Type | Description | Default |
| ---------- | ----------------- | ------------------------------------------------------------ | ---------- |
| `path` | `str or PathLike` | What the link points to (need not exist for symbolic links). | *required* |
| `new_path` | `str or PathLike` | Where to create the link. | *required* |
| `symbolic` | `bool` | Symbolic link (default) or hard link. | `True` |
Returns:
| Type | Description |
| -------- | -------------------- |
| `FsPath` | The new link's path. |
Raises:
| Type | Description |
| ----------------- | --------------------------- |
| `FileExistsError` | If new_path already exists. |
See Also
link_path : Read where a symlink points.
Examples:
```
>>> from pyrfs import file_touch
>>> _ = file_touch("big.csv")
>>> link_create("big.csv", "latest.csv")
FsPath('latest.csv')
>>> link_path("latest.csv")
FsPath('big.csv')
```
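The standard library's `os.symlink` uses the same order, target first and link name second, which may help when remembering it. This is a minimal sketch assuming a platform where unprivileged symlinks are allowed (for example Linux or macOS).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "big.csv")
    link = os.path.join(tmp, "latest.csv")
    open(target, "w").close()

    os.symlink(target, link)              # target first, link name second
    points_to = os.readlink(link)
    try:
        os.symlink(target, link)          # the link name is already taken
        clobbered = True
    except FileExistsError:
        clobbered = False
    print(points_to == target, clobbered)
```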
## pyrfs.link_path
```
link_path(path: str) -> FsPath
```
Return the target a symlink points to (`OSError` if not a symlink).
Vectorized: also accepts an iterable or pandas Series of paths.
See Also
pyrfs.path_real : Fully resolve a path through all links.
## pyrfs.link_exists
```
link_exists(path: str) -> bool
```
Whether the path is a symlink (its target need not exist).
Equivalent to `pyrfs.is_link`. Vectorized: also accepts an iterable or pandas Series of paths.
## pyrfs.link_copy
```
link_copy(path: str, new_path: PathInput, *, overwrite: bool = False) -> FsPath
```
Copy a symlink itself (the new link points to the same target).
The target is *not* copied — use `pyrfs.file_copy` to copy what the link points to.
Parameters:
| Name | Type | Description | Default |
| ----------- | ----------------- | --------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | An existing symlink. | *required* |
| `new_path` | `str or PathLike` | Where to create the duplicate link. | *required* |
| `overwrite` | `bool` | Allow clobbering an existing destination (default False). | `False` |
Raises:
| Type | Description |
| ----------------- | ------------------------------------------------- |
| `FileExistsError` | If the destination exists and overwrite is False. |
## pyrfs.link_delete
```
link_delete(path: str) -> FsPath
```
Delete a symlink — the target is untouched; non-links are refused.
Raises:
| Type | Description |
| -------------- | ------------------------------------------------------------------------------------ |
| `FsValueError` | If path is not a symlink (use pyrfs.file_delete or pyrfs.dir_delete for real files). |
Examples:
```
>>> from pyrfs import file_exists, file_touch
>>> _ = file_touch("real.txt")
>>> _ = link_create("real.txt", "ln.txt")
>>> _ = link_delete("ln.txt")
>>> file_exists("real.txt") # target survives
True
```
# Path algebra — `path_*`
Pure path-string manipulation, no filesystem I/O (except the few that resolve against the running process, as documented). All functions accept a scalar, list, or pandas Series as the first argument and return tidy [`FsPath`](https://pyrfs.netlify.app/api/fspath/index.md) values.
## pyrfs.path
```
path(*parts: PathInput, ext: str = '') -> FsPath
```
Construct a tidy path from parts, optionally adding an extension.
Parts are joined with `/` and tidied. The join is pure concatenation — an absolute later part does *not* reset the path, unlike `os.path.join`.
Parameters:
| Name | Type | Description | Default |
| -------- | ----------------- | -------------------------------------------------------------------------------------------- | ------- |
| `*parts` | `str or PathLike` | Path components to join. | `()` |
| `ext` | `str` | Extension to append, with or without the leading dot (one dot is guaranteed, never doubled). | `''` |
Returns:
| Type | Description |
| -------- | ---------------------- |
| `FsPath` | The joined, tidy path. |
See Also
path_join : Join components given as a list (inverse of `path_split`). `FsPath.__truediv__` : The fluent `/` join operator.
Examples:
```
>>> path("foo", "bar", "a", ext="txt")
FsPath('foo/bar/a.txt')
>>> path("a/", "/b") # concatenation, not os.path.join reset
FsPath('a/b')
```
## pyrfs.path_wd
```
path_wd() -> FsPath
```
Return the current working directory as a tidy path.
See Also
path_abs : Anchor a relative path to the working directory.
## pyrfs.path_abs
```
path_abs(path: str) -> FsPath
```
Make a path absolute against the working directory (links unresolved).
A leading `~` is expanded first. Vectorized: also accepts an iterable or pandas Series of paths.
See Also
path_real : Also resolve symlinks (canonical form). path_norm : Lexical `.`/`..` normalization only.
Examples:
```
>>> path_abs("data").startswith("/")
True
```
## pyrfs.path_real
```
path_real(path: str) -> FsPath
```
Canonicalize a path, resolving symlinks (touches the filesystem).
Vectorized: also accepts an iterable or pandas Series of paths.
See Also
path_abs : Absolute form without resolving links.
## pyrfs.path_norm
```
path_norm(path: str) -> FsPath
```
Normalize `.` and `..` components lexically (no filesystem access).
Vectorized: also accepts an iterable or pandas Series of paths.
Examples:
```
>>> path_norm("a/../b/./c")
FsPath('b/c')
```
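Lexical normalization and real resolution can disagree once symlinks are involved, which is why `path_norm` and `path_real` are separate functions. The sketch uses `os.path.normpath` and `os.path.realpath` as stand-ins for the two behaviours and assumes a platform with unprivileged symlinks.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)                     # avoid symlinked temp roots
    os.makedirs(os.path.join(root, "real", "sub"))
    os.symlink(os.path.join(root, "real", "sub"), os.path.join(root, "link"))

    dotted = os.path.join(root, "link", "..")
    lexical = os.path.normpath(dotted)               # string-only: cancels "link/.."
    resolved = os.path.realpath(dotted)              # follows the link, then applies ".."
    print(lexical == root, resolved == os.path.join(root, "real"))
```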
## pyrfs.path_rel
```
path_rel(path: str, start: PathInput = '.') -> FsPath
```
Return the path expressed relative to `start`.
Vectorized: also accepts an iterable or pandas Series of paths.
Parameters:
| Name | Type | Description | Default |
| ------- | ----------------- | ------------------------------------------------------ | ---------- |
| `path` | `str or PathLike` | The path to re-express. | *required* |
| `start` | `str or PathLike` | The anchor directory (default: the working directory). | `'.'` |
See Also
path_has_parent : Test containment instead of computing the relation. FsPath.rel_to : Fluent equivalent.
Examples:
```
>>> path_rel("/a/b/c", "/a")
FsPath('b/c')
>>> path_rel("/a/b", "/a/d")
FsPath('../b')
```
## pyrfs.path_expand
```
path_expand(path: str) -> FsPath
```
Expand a leading `~` to the user's home directory.
Vectorized: also accepts an iterable or pandas Series of paths.
See Also
path_home : Build paths under the home directory directly.
## pyrfs.path_home
```
path_home(*parts: PathInput) -> FsPath
```
Return the user's home directory, optionally joined with `parts`.
Examples:
```
>>> path_home("data").endswith("/data")
True
```
## pyrfs.path_temp
```
path_temp(*parts: PathInput) -> FsPath
```
Return the session temp directory, optionally joined with `parts`.
See Also
pyrfs.file_temp : A unique temp *file name* (not just the directory).
## pyrfs.path_tidy
```
path_tidy(path: str) -> FsPath
```
Tidy a path: `/` separators, no doubled or trailing slashes.
Every pyrfs function already returns tidy paths; use this to normalize paths from elsewhere. Vectorized: also accepts an iterable or pandas Series of paths.
Examples:
```
>>> path_tidy("src//a.txt/")
FsPath('src/a.txt')
>>> path_tidy("C:\\data\\x")
FsPath('C:/data/x')
```
## pyrfs.path_split
```
path_split(path: str) -> list[str]
```
Split a tidy path into components (a leading root stays `'/'`).
Vectorized: a list of paths yields a list of component lists.
See Also
path_join : The inverse operation. FsPath.parts : Fluent equivalent.
Examples:
```
>>> path_split("/usr/bin")
['/', 'usr', 'bin']
>>> path_split("a/b")
['a', 'b']
```
## pyrfs.path_join
```
path_join(parts: Iterable[PathInput | Iterable[PathInput]]) -> FsPath | list[FsPath]
```
Join split components back into path(s) — the inverse of `path_split`.
Parameters:
| Name | Type | Description | Default |
| ------- | ---------- | -------------------------------------------------------------------------------------- | ---------- |
| `parts` | `iterable` | Either one sequence of components, or a sequence of such sequences (joining each one). | *required* |
See Also
path : Variadic construction with an optional extension.
Examples:
```
>>> path_join(["/", "usr", "bin"])
FsPath('/usr/bin')
>>> path_join([["a", "b"], ["c", "d"]])
[FsPath('a/b'), FsPath('c/d')]
```
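The split/join round trip, including the dispatch between one sequence and a sequence of sequences, can be sketched in plain Python. `split_parts` and `join_parts` are hypothetical helpers that assume plain `str` components; they are not the pyrfs implementation.

```python
def split_parts(path):
    """Split a tidy POSIX path; a leading root is kept as '/'."""
    parts = [p for p in path.split("/") if p]
    return ["/"] + parts if path.startswith("/") else parts


def join_parts(parts):
    """Join one component sequence, or each sequence in a sequence of them."""
    items = list(parts)
    if items and not isinstance(items[0], str):
        return [join_parts(inner) for inner in items]
    joined = "/".join(items)
    # "/".join(["/", "usr"]) gives "//usr"; keep a single leading slash.
    return "/" + joined.lstrip("/") if joined.startswith("/") else joined


print(split_parts("/usr/bin"))
print(join_parts(split_parts("/usr/bin")))
print(join_parts([["a", "b"], ["c", "d"]]))
```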
## pyrfs.path_file
```
path_file(path: str) -> FsPath
```
Return the file name — the last path component.
Vectorized: also accepts an iterable or pandas Series of paths.
See Also
path_dir : The complementary directory part. FsPath.name : Fluent equivalent.
Examples:
```
>>> path_file("a/b/c.txt")
FsPath('c.txt')
```
## pyrfs.path_dir
```
path_dir(path: str) -> FsPath
```
Return the directory part of a path (`'.'` if there is none).
Vectorized: also accepts an iterable or pandas Series of paths.
See Also
path_file : The complementary file-name part. FsPath.dir : Fluent equivalent.
Examples:
```
>>> path_dir("a/b/c.txt")
FsPath('a/b')
>>> path_dir("c.txt")
FsPath('.')
```
## pyrfs.path_ext
```
path_ext(path: str) -> str
```
Return the extension without the dot (`''` if none).
Dotfiles like `.gitignore` count as having no extension. Vectorized: also accepts an iterable or pandas Series of paths.
See Also
path_ext_set, path_ext_remove
Examples:
```
>>> path_ext("a.tar.gz")
'gz'
>>> path_ext(".gitignore")
''
```
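The standard library's `posixpath.splitext` follows the same dotfile rule, so the documented results can be reproduced without pyrfs. `ext_of` is a hypothetical helper for illustration.

```python
import posixpath


def ext_of(path):
    """Extension without the dot, '' when there is none."""
    return posixpath.splitext(path)[1][1:]


print(ext_of("a.tar.gz"), repr(ext_of(".gitignore")), repr(ext_of("README")))
```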
## pyrfs.path_ext_remove
```
path_ext_remove(path: str) -> FsPath
```
Remove the extension (dotfiles like `.gitignore` are left intact).
Vectorized: also accepts an iterable or pandas Series of paths.
Examples:
```
>>> path_ext_remove("d/a.tar.gz")
FsPath('d/a.tar')
```
## pyrfs.path_ext_set
```
path_ext_set(path: str, ext: str) -> FsPath
```
Replace (or add) the extension; an empty `ext` removes it.
Vectorized: also accepts an iterable or pandas Series of paths.
Parameters:
| Name | Type | Description | Default |
| ------ | ----------------- | --------------------------------------------------------------------------------- | ---------- |
| `path` | `str or PathLike` | The path to modify. | *required* |
| `ext` | `str` | New extension, with or without the leading dot; "" removes the current extension. | *required* |
See Also
FsPath.with_ext : Fluent equivalent.
Examples:
```
>>> path_ext_set("report.md", "html")
FsPath('report.html')
>>> path_ext_set(["a.txt", "b"], "py")
[FsPath('a.py'), FsPath('b.py')]
```
## pyrfs.path_common
```
path_common(paths: Iterable[PathInput]) -> FsPath
```
Return the longest common path prefix of `paths`.
Parameters:
| Name | Type | Description | Default |
| ------- | -------------------------------- | ------------------------------------------------ | ---------- |
| `paths` | `iterable of str or os.PathLike` | At least one path; all absolute or all relative. | *required* |
Raises:
| Type | Description |
| -------------- | ------------------------------------------------------- |
| `FsValueError` | If paths is empty or mixes absolute and relative paths. |
Examples:
```
>>> path_common(["a/b/c", "a/b/d"])
FsPath('a/b')
```
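The standard library offers a close analogue in `posixpath.commonpath`, which compares whole components and rejects the same invalid inputs, there with a plain `ValueError`. The sketch also shows why the character-wise `commonprefix` is not a substitute.

```python
import posixpath

paths = ["a/bc/x", "a/bd/y"]
by_chars = posixpath.commonprefix(paths)     # character-wise, not a real directory
by_parts = posixpath.commonpath(paths)       # compares whole components

rejected = []
for bad in ([], ["/abs", "rel"]):            # empty input, mixed absolute/relative
    try:
        posixpath.commonpath(bad)
    except ValueError:
        rejected.append(bad)

print(by_chars, by_parts, len(rejected))
```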
## pyrfs.path_filter
```
path_filter(paths: Iterable[PathInput], glob: str | None = None, regexp: str | None = None, *, invert: bool = False) -> list[FsPath]
```
Filter paths by a glob or a regular expression (mutually exclusive).
Parameters:
| Name | Type | Description | Default |
| -------- | -------------------------------- | ----------------------------------------------------------------------------------------------- | ---------- |
| `paths` | `iterable of str or os.PathLike` | Paths to filter. | *required* |
| `glob` | `str` | Wildcard pattern matched against the whole path (e.g. "\*.py"); mutually exclusive with regexp. | `None` |
| `regexp` | `str` | Regular expression searched within the path; mutually exclusive with glob. | `None` |
| `invert` | `bool` | Keep the paths that do not match. | `False` |
Raises:
| Type | Description |
| -------------- | -------------------------------- |
| `FsValueError` | If both glob and regexp are set. |
See Also
pyrfs.dir_ls : Directory listing with the same filter arguments.
Examples:
```
>>> path_filter(["a.py", "b.txt", "src/c.py"], glob="*.py")
[FsPath('a.py'), FsPath('src/c.py')]
>>> path_filter(["a.py", "b.txt"], glob="*.py", invert=True)
[FsPath('b.txt')]
```
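Because the glob is matched against the whole path, `*` also spans `/`, which is why `src/c.py` matches `*.py` above. Here is a sketch of the same rules using `fnmatch` and `re`; `filter_paths` is a hypothetical helper and raises a plain `ValueError` where pyrfs raises `FsValueError`.

```python
import fnmatch
import re


def filter_paths(paths, glob=None, regexp=None, *, invert=False):
    """Keep paths matching a glob or a regexp (never both)."""
    if glob is not None and regexp is not None:
        raise ValueError("glob and regexp are mutually exclusive")
    if glob is not None:
        test = re.compile(fnmatch.translate(glob)).match   # anchored to the whole path
    elif regexp is not None:
        test = re.compile(regexp).search                   # searched within the path
    else:
        return list(paths)
    return [p for p in paths if bool(test(p)) != invert]


sample = ["a.py", "b.txt", "src/c.py"]
print(filter_paths(sample, glob="*.py"))
print(filter_paths(sample, glob="*.py", invert=True))
print(filter_paths(sample, regexp=r"^src/"))
```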
## pyrfs.path_has_parent
```
path_has_parent(path: str, parent: PathInput) -> bool
```
Return whether `path` sits at or below `parent`.
Both are anchored to the working directory before comparing, so relative and absolute forms compare consistently. Vectorized: also accepts an iterable or pandas Series of paths.
See Also
path_rel : Compute the relative path instead of testing containment.
Examples:
```
>>> path_has_parent("/x/y", "/x")
True
>>> path_has_parent("/xy/z", "/x")
False
```
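The second example is the case a naive string-prefix test gets wrong. The sketch below uses `PurePosixPath.is_relative_to` (Python 3.9+); `sits_under` is a hypothetical helper and, unlike `path_has_parent`, does not anchor relative paths to the working directory.

```python
from pathlib import PurePosixPath


def sits_under(path: str, parent: str) -> bool:
    """Component-wise containment test for POSIX-style paths (sketch)."""
    return PurePosixPath(path).is_relative_to(parent)


prefix_says = "/xy/z".startswith("/x")     # the string test is fooled
print(prefix_says, sits_under("/xy/z", "/x"), sits_under("/x/y", "/x"), sits_under("/x", "/x"))
```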
## pyrfs.path_sanitize
```
path_sanitize(filename: str, replacement: str = '') -> str
```
Turn an untrusted string into a filename safe on all major OSes.
Removes control characters, characters illegal in filenames (`/\?<>:*|"`), trailing dots/spaces, and Windows-reserved device names; truncates to 255 characters. Operates on a *filename*, not a path — separators are stripped, not preserved.
Parameters:
| Name | Type | Description | Default |
| ------------- | ----- | ------------------------------------------------------------- | ---------- |
| `filename` | `str` | The untrusted string. | *required* |
| `replacement` | `str` | What to substitute for removed characters (default: nothing). | `''` |
Examples:
```
>>> path_sanitize("rep/ort:2026*")
'report2026'
>>> path_sanitize("a/b", "_")
'a_b'
```
# Predicates & ids
Type predicates classify the entry itself (lstat): a symlink answers `True` only to `is_link`, matching fs.
## pyrfs.is_file
```
is_file(path: str) -> bool
```
Whether the path is a regular file (symlinks answer `False`).
Classifies the entry itself (lstat), matching fs — unlike `os.path.isfile`, which follows symlinks. Vectorized: also accepts an iterable or pandas Series of paths.
See Also
is_link : The predicate a symlink answers `True` to. pyrfs.file_exists : Existence regardless of type.
Examples:
```
>>> from pyrfs import file_touch, link_create
>>> _ = file_touch("data.txt")
>>> _ = link_create("data.txt", "ln.txt")
>>> is_file("data.txt"), is_file("ln.txt"), is_file("missing")
(True, False, False)
```
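The same lstat-based classification can be reproduced with `os.lstat` and the `stat` module. `is_regular_file` is a hypothetical helper, and the sketch assumes a platform with unprivileged symlinks.

```python
import os
import stat
import tempfile


def is_regular_file(path: str) -> bool:
    """lstat-based test: the entry itself must be a regular file."""
    try:
        return stat.S_ISREG(os.lstat(path).st_mode)
    except OSError:                          # missing path
        return False


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data.txt")
    link = os.path.join(tmp, "ln.txt")
    missing = os.path.join(tmp, "missing")
    open(data, "w").close()
    os.symlink(data, link)

    lstat_view = (is_regular_file(data), is_regular_file(link), is_regular_file(missing))
    follow_view = os.path.isfile(link)       # the stdlib test follows the link
    print(lstat_view, follow_view)
```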
## pyrfs.is_dir
```
is_dir(path: str) -> bool
```
Whether the path is a directory (symlinks answer `False`).
Classifies the entry itself (lstat), matching fs — unlike `os.path.isdir` and `pyrfs.dir_exists`, which follow symlinks. Vectorized: also accepts an iterable or pandas Series of paths.
See Also
pyrfs.dir_exists : Follow-symlink directory test.
## pyrfs.is_link
```
is_link(path: str) -> bool
```
Whether the path is a symlink (its target need not exist).
Vectorized: also accepts an iterable or pandas Series of paths.
See Also
pyrfs.link_path : Read where the link points.
## pyrfs.is_file_empty
```
is_file_empty(path: str) -> bool
```
Whether the file exists and has size zero.
Missing paths answer `False` (they are not empty files). Vectorized: also accepts an iterable or pandas Series of paths.
## pyrfs.is_dir_empty
```
is_dir_empty(path: str) -> bool
```
Whether the directory exists and has no entries (hidden included).
Missing paths answer `False`. Vectorized: also accepts an iterable or pandas Series of paths.
Examples:
```
>>> from pyrfs import dir_create
>>> _ = dir_create("empty")
>>> is_dir_empty("empty")
True
```
## pyrfs.is_absolute_path
```
is_absolute_path(path: str) -> bool
```
Whether the path is absolute (a leading `~` counts, as in fs).
Pure string test — no filesystem access. Vectorized: also accepts an iterable or pandas Series of paths.
Examples:
```
>>> is_absolute_path(["/usr", "~/data", "rel/path"])
[True, True, False]
```
## pyrfs.user_ids
```
user_ids() -> list[dict[str, object]]
```
All known users as rows of `{"user_id", "user_name"}`.
Returns an empty list on platforms without `pwd` (Windows).
## pyrfs.group_ids
```
group_ids() -> list[dict[str, object]]
```
All known groups as rows of `{"group_id", "group_name"}`.
Returns an empty list on platforms without `grp` (Windows).
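Here is a sketch of how such rows could be produced from the standard `pwd` module, with the documented empty-list fallback. `list_users` is a hypothetical name, and this is not necessarily how pyrfs builds its result.

```python
try:
    import pwd
except ImportError:          # platforms without pwd, e.g. Windows
    pwd = None


def list_users():
    """Rows of {"user_id", "user_name"}; empty where pwd is unavailable."""
    if pwd is None:
        return []
    return [{"user_id": e.pw_uid, "user_name": e.pw_name} for e in pwd.getpwall()]


rows = list_users()
print(len(rows), rows[:1])
```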
# Bytes & Perms
Typed scalars that subclass `int` — see the [typed values guide](https://pyrfs.netlify.app/guides/typed-values/index.md). With the `[pandas]` extra, columns of these become the `"bytes"`/`"perms"`/`"path"` ExtensionDtypes.
## pyrfs.Bytes
Bases: `int`
A byte count that parses and displays human-readable sizes.
All units are 1024-based (`"10MB"` == `"10MiB"` == `10 * 1024**2`), matching R's fs.
Examples:
```
>>> Bytes("10MB")
Bytes(10485760)
>>> str(Bytes(455200))
'444.5K'
>>> Bytes(455200) < "1MB"
True
>>> str(Bytes("1MB") + "500KB")
'1.49M'
```
Notes
`repr` stays exact (`Bytes(455200)`); `str`/`format` humanize. With the `[pandas]` extra, columns of these use the `"bytes"` ExtensionDtype, so the same comparisons work in `DataFrame.query()`.
See Also
pyrfs.file_size : Returns sizes as `Bytes`.
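Here is a reduced sketch of the idea: an `int` subclass that parses 1024-based size strings and compares against string literals. `ByteCount` is hypothetical and omits the humanized display, arithmetic overrides, and pandas dtype that `Bytes` provides.

```python
import re

_POWERS = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4}
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)(?:i?B)?\s*", re.IGNORECASE)


class ByteCount(int):
    """Hypothetical int subclass that understands strings like '10MB'."""

    def __new__(cls, value):
        if isinstance(value, str):
            match = _SIZE_RE.fullmatch(value)
            if match is None:
                raise ValueError(f"cannot parse size: {value!r}")
            number, unit = match.groups()
            # Every unit is a power of 1024, so "MB" and "MiB" agree.
            value = float(number) * 1024 ** _POWERS[unit.upper()]
        return super().__new__(cls, int(value))

    def __lt__(self, other):
        return int(self) < int(ByteCount(other))

    def __gt__(self, other):
        return int(self) > int(ByteCount(other))


print(int(ByteCount("10MB")), ByteCount("10MiB") == ByteCount("10MB"), ByteCount(455200) < "1MB")
```

Subclassing `int` keeps plain arithmetic and sorting available, while the overridden comparisons parse the right-hand side on the fly.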
## pyrfs.Perms
Bases: `int`
Unix permission bits that parse and display `rwxr-xr-x` style.
Construct from octal (`"644"`), symbolic (`"u+rw,go+r"`), display (`"rw-r--r--"`) strings, or raw mode bits.
Examples:
```
>>> Perms("644")
Perms('rw-r--r--')
>>> Perms("644") == "rw-r--r--"
True
>>> str(Perms("644") | "u+x")
'rwxr--r--'
```
Notes
Symbolic strings here build from a base of `0` (so `"u+rw"` == `"u=rw"`); `pyrfs.file_chmod` applies symbolic modes to the file's *current* mode instead, like the `chmod` command.
See Also
pyrfs.file_chmod : Apply permissions to files.
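The difference between the two bases can be made concrete with a small symbolic-mode interpreter. `apply_symbolic` is a hypothetical, simplified helper (it handles only `u`/`g`/`o`/`a` with `r`/`w`/`x`), not pyrfs code.

```python
import stat

_WHO = {"u": 0o700, "g": 0o070, "o": 0o007, "a": 0o777}
_WHAT = {"r": 0o444, "w": 0o222, "x": 0o111}


def apply_symbolic(base: int, spec: str) -> int:
    """Apply clauses like 'u+rw,go=r' to `base` (simplified sketch)."""
    mode = base
    for clause in spec.split(","):
        op = next((c for c in clause if c in "+-="), None)
        if op is None:
            raise ValueError(f"bad clause: {clause!r}")
        who, perms = clause.split(op, 1)
        who_mask = 0
        for w in who or "a":
            who_mask |= _WHO[w]
        bits = 0
        for p in perms:
            bits |= _WHAT[p]
        bits &= who_mask                     # keep only the named classes
        if op == "+":
            mode |= bits
        elif op == "-":
            mode &= ~bits
        else:                                # "=" replaces bits for the named classes
            mode = (mode & ~who_mask) | bits
    return mode


from_zero = apply_symbolic(0, "u+rw")        # base 0: same as "u=rw"
on_current = apply_symbolic(0o644, "u+x")    # chmod-style: builds on the current mode
shown = stat.filemode(stat.S_IFREG | apply_symbolic(0, "u=rw,go=r"))[1:]
print(oct(from_zero), oct(on_current), shown)
```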
# Design notes
# pyrfs — UX Design
> A Pythonic port of R's [`fs`](https://fs.r-lib.org) · Status: **design draft** · Last updated: 2026-06-11 · Companion: [`pyrfs-architecture.md`](https://pyrfs.netlify.app/design/pyrfs-architecture/index.md) (how it's built)
This document defines the **feel** of pyrfs — names, return values, chaining, and the pandas workflow. The guiding goal: an ex-R user who knows `fs` should feel at home immediately, and a Python user should find it idiomatic and pipeable.
______________________________________________________________________
## 1. UX thesis
> **Every function takes path(s) in, and gives a predictable, path-carrying value back — or raises.** The same operation is reachable three ways: as a function, as a method on a path, or as a vectorized column operation in pandas.
```
flowchart LR
in["path(s) in
str · FsPath · list · Series"] --> op["pyrfs operation"]
op -->|success| out["typed result
FsPath · Bytes · Perms · bool · DataFrame"]
op -->|failure| err["raises (OSError / FsError)"]
out -.->|chains into| op
```
We inherit the five `fs` promises — **consistent naming · vectorization · predictable returns · explicit failure · tidy UTF-8 paths** — and add a sixth: **three interchangeable surfaces**.
______________________________________________________________________
## 2. Naming — the four families (kept from `fs`)
Functions are grouped by the **noun** they act on, `noun_verb`, snake_case. Type `dir_` + Tab and you see every directory operation.
| Prefix | Domain | Examples |
| ------- | ------------------------------------------------ | --------------------------------------------------------------------- |
| `path_` | construct & manipulate path strings (**no I/O**) | `path()`, `path_dir()`, `path_ext_set()`, `path_rel()`, `path_norm()` |
| `file_` | operate on files | `file_create()`, `file_copy()`, `file_info()`, `file_chmod()` |
| `dir_` | operate on directories | `dir_create()`, `dir_ls()`, `dir_info()`, `dir_tree()` |
| `link_` | operate on links | `link_create()`, `link_path()`, `link_copy()` |
Plus predicates (`is_file`, `is_dir`, `is_link`, `is_file_empty`, `is_dir_empty`, `is_absolute_path`), id helpers (`user_ids`, `group_ids`), and temp helpers (`file_temp`, `path_temp`, `file_temp_push/pop`).
The create/copy/delete/exists verbs repeat with identical shapes across `file_`/`dir_`/`link_` — a predictable matrix you learn once.
______________________________________________________________________
## 3. The three surfaces (same engine, your choice of style)
### Surface A — Functional (closest to R `fs`)
```
import pyrfs as fs
fs.path("foo", "bar", "a", ext="txt") # FsPath('foo/bar/a.txt')
fs.dir_ls("pyrfs", recurse=True, glob="*.py")
fs.file_copy("a.txt", "b.txt") # -> FsPath('b.txt')
fs.path_ext_set("report.md", "html") # FsPath('report.html')
```
### Surface B — Fluent `FsPath` (Pythonic chaining)
```
from pyrfs import FsPath
(FsPath("foo") / "bar" / "a.txt") # FsPath('foo/bar/a.txt') <- '/' operator
(FsPath("data") / "raw.csv").with_ext("parquet").copy_to("clean/")
FsPath("project").mkdir().touch_file("README.md")
FsPath("logs").ls(glob="*.log") # [FsPath, FsPath, ...]
```
`FsPath` **is a `str`** (subclass), so it works anywhere a path string is expected — `open(p)`, `pd.read_csv(p)`, `os.fspath(p)` — no conversion needed.
### Surface C — pandas `.fs` accessor + DataFrame returns
```
import pandas as pd
import pyrfs as fs
df = pd.DataFrame({"path": fs.dir_ls("pyrfs", recurse=True)})
df.assign(
    ext=df["path"].fs.ext(),  # vectorized over the column
    dir=df["path"].fs.dir(),
    ok=df["path"].fs.exists(),
)
```
```
flowchart TD
eng["pyrfs engine (one implementation)"]
eng --> A["A. functional
fs.file_copy()"]
eng --> B["B. fluent
FsPath().copy_to()"]
eng --> C["C. pandas
Series.fs.* / dir_info()"]
```
Pick per task: scripts lean A, OO code leans B, dataframe pipelines lean C. They interoperate — `dir_ls()` returns `FsPath`s you can drop straight into a DataFrame column.
______________________________________________________________________
## 4. Predictable, typed return values
Every function returns one of a small, learnable set of shapes — and it always conveys the path.
| Return | Type | Produced by |
| ------------------- | -------------------------------- | --------------------------------------------------- |
| a path | `FsPath` (⊂ `str`) | `path()`, `file_copy()`, `dir_create()`, most verbs |
| many paths | `list[FsPath]` / `Series[path]` | `dir_ls()`, vectorized calls |
| existence/type test | `bool` / `dict`/`Series` of bool | `file_exists()`, `is_dir()` |
| a size | `Bytes` (⊂ `int`) | `file_size()` |
| permissions | `Perms` (⊂ `int`) | `file_info()["permissions"]` |
| a table | `DataFrame` (or `list[dict]`) | `file_info()`, `dir_info()` |
**Mutating verbs return the new path**, enabling chains and pipes:
```
(fs.file_temp()
.pipe(... ) # any callable
)
# fluent equivalent:
FsPath(fs.file_temp()).mkdir().touch_file("a").touch_file("b")
```
______________________________________________________________________
## 5. Typed values that read like a human
The heart of `fs`'s charm — values that *know what they are* and print accordingly.
| You have | pyrfs shows | And you can write |
| -------------- | ---------------------------- | ------------------------------------- |
| `455200` bytes | `444.5K`                     | `fs.file_size("x") > "10KB"` → `True` |
| mode `0o644` | `rw-r--r--` | `perms == "u=rw,go=r"` → `True` |
| `"src//a.txt"` | `src/a.txt` (tidy, coloured) | `FsPath("src") / "a.txt"` |
```
from pyrfs import Bytes, Perms
Bytes("10MB") # Bytes(10485760) -> displays '10M'
Bytes(455200) < "1MB" # True
sum([Bytes("1MB"), Bytes("500KB")]) # Bytes -> '1.49M'
Perms("644") # Perms -> 'rw-r--r--'
Perms("644") & "u+r" # Perms (bitwise), still prints rwx
Perms("644") == "rw-r--r--" # True
```
In pandas these become **real column dtypes** (ExtensionArrays), so the R headline demo ports almost verbatim:
```
(fs.dir_info("pyrfs", recurse=False)
.query("size > '10KB' and type == 'file'") # Bytes column compares to a string
.sort_values("size", ascending=False)
.loc[:, ["path", "permissions", "size", "modification_time"]])
# path permissions size modification_time
# pyrfs/_engine/dirops.py rw-r--r-- 12.4K 2026-06-11 13:35:54
# ...
```
______________________________________________________________________
## 6. The pandas pipe workflow (a first-class use case)
pyrfs is built to flow inside `.pipe()` chains because `*_info` returns a DataFrame and the `.fs` accessor vectorizes path algebra over columns.
```
import pyrfs as fs
big_modules = (
fs.dir_info("pyrfs", recurse=True)
.query("type == 'file'")
.assign(stem=lambda d: d["path"].fs.name())
.pipe(lambda d: d[d["path"].fs.ext() == "py"])
.assign(dir=lambda d: d["path"].fs.dir()) # directory of each file
.groupby("dir") # group by directory
.agg(total=("size", "sum"), n=("path", "size"))
.sort_values("total", ascending=False)
)
```
Reading many files into one frame — `dir_ls()` returns paths you tag by source, the pandas analogue of R's named-vector `map_df(.id=)` trick:
```
import pandas as pd
import pyrfs as fs
files = fs.dir_ls("data", glob="*.tsv")
frame = pd.concat(
{p.name(): pd.read_csv(p, sep="\t") for p in files},
names=["file"],
)
```
______________________________________________________________________
## 7. Safe defaults & argument conventions (learn once)
| Argument | Meaning | Default | On |
| ----------------- | -------------------------------------------------------------- | ------------------------------- | ----------------------- |
| `overwrite` | allow clobbering an existing target | `False` (safe) | copy/move |
| `recurse` | recurse fully (`True`), not (`False`), or to depth (`int`) | `False` listing / `True` create | `dir_*` |
| `all` | include hidden dotfiles | `False` | `dir_ls`, `dir_map` |
| `type` | filter by entry type (`"file"`, `"directory"`, `"symlink"`, …) | `"any"` | `dir_ls`, `dir_info` |
| `glob` / `regexp` | filter listings (mutually exclusive → `FsError` if both) | `None` | `dir_ls`, `path_filter` |
| `fail` | raise (`True`) vs warn (`False`) on inaccessible entries | `True` | directory traversals |
- **Destructive actions opt-in.** `overwrite=False` and bounded `recurse` mean nothing surprising gets deleted or walked.
- **Keyword-only where it aids clarity.** Flags like `overwrite`, `recurse`, `all` are keyword-only (`*,`), so call sites are self-documenting: `file_copy(a, b, overwrite=True)`.
______________________________________________________________________
## 8. Explicit failure (Pythonic)
pyrfs raises rather than silently returning a falsy value:
```
fs.file_copy("a.txt", "b.txt") # raises FileExistsError if b.txt exists
fs.file_copy("a.txt", "b.txt", overwrite=True) # ok
fs.dir_ls("nope") # raises FileNotFoundError
fs.path_filter(paths, glob="*.py", regexp=r"\.py$") # raises pyrfs.FsError: cannot set both
# soften a traversal when some entries are unreadable:
fs.dir_ls("/var", recurse=True, fail=False) # warns + skips, returns what it could read
```
- Native `OSError` subclasses (`FileNotFoundError`, `FileExistsError`, `PermissionError`) for OS-level failures — familiar, `try/except`-able.
- `pyrfs.FsError` (with subclasses) for pyrfs validation — friendly, actionable messages.
______________________________________________________________________
## 9. R `fs` → pyrfs translation
| R `fs` | pyrfs functional | pyrfs fluent |
| ----------------------------- | ----------------------------- | ------------------------------------------- |
| `path("a", "b", ext = "txt")` | `path("a", "b", ext="txt")` | `FsPath("a") / "b"` then `.with_ext("txt")` |
| `dir_ls("d", recurse = TRUE)` | `dir_ls("d", recurse=True)` | `FsPath("d").ls(recurse=True)` |
| `dir_info("d")` | `dir_info("d")` → DataFrame | `FsPath("d").info()` |
| `file_copy("a", "b")` | `file_copy("a", "b")` | `FsPath("a").copy_to("b")` |
| `file_size("a")` | `file_size("a")` → `Bytes` | `FsPath("a").size()` |
| `path_ext_set("a.txt", "md")` | `path_ext_set("a.txt", "md")` | `FsPath("a.txt").with_ext("md")` |
| `path_rel("a/b", "a")` | `path_rel("a/b", "a")` | `FsPath("a/b").rel_to("a")` |
| `dir_tree("d")` | `dir_tree("d")` | `FsPath("d").tree()` |
| `x %>% file_delete()` | `df.pipe(...)` / loop | `FsPath(x).delete()` |
Naming is intentionally identical on the functional surface so muscle memory transfers; the fluent surface adds Pythonic method names for OO-style chaining.
______________________________________________________________________
## 10. Small touches (ported from `fs`)
- **`dir_tree()`** prints a coloured box-drawing tree (`├──`, `└──`), like Unix `tree`.
- **`file_show()`** opens a file in the OS default app (cross-platform).
- **`path(..., ext=)`** builds extensions correctly (one dot, no doubling).
- **`path_sanitize()`** turns untrusted strings into safe filenames.
- **`path_rel()` / `path_common()`** — relative paths and longest common dir (no stdlib one-liner).
- **`file_temp_push()/pop()`** — deterministic temp names for reproducible docs/tests.
- **Colour degrades** automatically on non-TTY / `NO_COLOR`.
______________________________________________________________________
## 11. Sharp edges (honest notes)
- **Stricter than stdlib in places.** `file_copy` refuses to overwrite by default — porting loose scripts may surface `FileExistsError`. Opt in with `overwrite=True`.
- **No `dir_move`.** Directories move via `file_move` (dirs are files), matching `fs`.
- **`FsPath` is a `str`, not a `pathlib.Path`.** Great for interop and pandas; if you want `pathlib` semantics call `.as_pathlib()` (helper) — we don't pretend to be `Path`.
- **pandas-only features fail gracefully.** Without the `[pandas]` extra, `dir_info()` returns `list[dict]` and the `.fs` accessor is unavailable; the docstring says so.
- **ExtensionDtype edge cases.** Some exotic pandas ops on `Bytes`/`Perms` columns may need `.astype(int)` first in v1; comparisons, sorting, and `sum/min/max` are supported from the start.
______________________________________________________________________
## 12. Cheat-sheet
```
NOUN_VERB(path, ...) families: path_ file_ dir_ link_ (+ is_*, *_ids, *_temp)
├─ path(s) in str · FsPath · list · pandas.Series (vectorized)
├─ tidy FsPath out always '/', no '//' or trailing '/', UTF-8
├─ typed result FsPath · Bytes('445.2K') · Perms('rwxr-xr-x') · DataFrame
└─ raises on failure OSError subclasses · pyrfs.FsError ; fail=False to soften
three surfaces: fs.file_copy(a,b) · FsPath(a).copy_to(b) · df['p'].fs.ext()
pandas pipe: dir_info(d).query("size > '10KB'").sort_values('size')
safe defaults: overwrite=False · recurse=False(list)/True(create) · all=False · fail=True
from R fs: same functional names; fluent adds Pythonic methods
```
# pyrfs — Architecture
> A Pythonic port of R's [`fs`](https://fs.r-lib.org) · Status: **design draft** · Last updated: 2026-06-11 · Companion: [`pyrfs-ux.md`](https://pyrfs.netlify.app/design/pyrfs-ux/index.md) (user-facing design)
______________________________________________________________________
## 1. Purpose & non-goals
**Purpose.** Give Python the same file-system *ergonomics* that R users enjoy from `fs`: consistent `noun_verb` naming families, tidy paths, predictable path-carrying return values, explicit failure, and **typed self-describing values** (human-readable sizes, `rwxr-xr-x` permissions) — while being **chainable/pipeable** and integrating natively with **pandas**.
**What pyrfs is.** A thin, ergonomic, fully-typed wrapper over the Python standard library (`pathlib`, `shutil`, `os`, `stat`, `pwd`/`grp`) plus an optional pandas integration layer.
**Non-goals.**
- *Not* a new filesystem abstraction over remote/cloud backends (that's `fsspec`/`PyFilesystem2`).
- *Not* a C/native extension. R's `fs` needed **libuv** for cross-platform syscalls; Python's stdlib already abstracts that, so **pyrfs is pure Python** — no build step, trivial install.
- *Not* a 1:1 transliteration. We keep `fs`'s *UX contract*, expressed in idiomatic Python.
______________________________________________________________________
## 2. Core principle — *one engine, three surfaces*
Every filesystem operation is implemented **once** in a pure-stdlib `_engine`. The three user-facing surfaces are thin delegations — no logic is duplicated across them.
```
flowchart TD
subgraph surfaces["User-facing surfaces"]
fn["Functional API
file_copy(a, b)
dir_ls(p) · path_ext(p)"]
fp["Fluent FsPath
FsPath(a).copy_to(b)
(FsPath(p) / 'x').with_ext('md')"]
acc["pandas .fs accessor
df['path'].fs.ext()
dir_info(p) -> DataFrame"]
end
eng["pyrfs._engine
(pure stdlib, no pandas)
paths · fileops · dirops · linkops · ids · temp"]
std[("Python stdlib
pathlib · shutil · os · stat · pwd/grp")]
fn --> eng
fp --> eng
acc --> eng
eng --> std
```
**Why this matters:** `fs` itself uses this idea — high-level R verbs compose from a small set of C primitives. pyrfs applies it in pure Python: the fluent object and the pandas accessor are *presentation layers*, and correctness lives in one place.
______________________________________________________________________
## 3. System context
```
flowchart LR
user([Python user / data scientist])
subgraph pyrfs["pyrfs"]
core["core API + FsPath + typed values"]
pdx["optional pandas layer"]
end
pandas{{"pandas (optional extra)"}}
std[("OS filesystem via stdlib")]
user -->|"file_*/dir_*/path_* · FsPath · Series.fs"| pyrfs
core --> std
core -.->|"lazily, if installed"| pdx
pdx --> pandas
```
- **Inbound:** scripts, notebooks, and packages call pyrfs.
- **Hard dependency:** none beyond the standard library (Python ≥ 3.10).
- **Optional:** pandas — enables `*_info` DataFrames, the `.fs` Series accessor, and the ExtensionDtypes. Absent pandas, the core still works and `*_info` returns `list[dict]`.
______________________________________________________________________
## 4. Package layout (flat layout)
The importable package sits at the **top level** (`pyrfs/pyrfs/`), not under `src/`.
```
pyrfs/ # repo root
├── pyproject.toml # setuptools backend, [project], optional-deps, tooling
├── docs/ # these design docs
├── pyrfs/ # the importable package
│ ├── __init__.py # PUBLIC re-exports (functions + FsPath/Bytes/Perms + FsError)
│ ├── py.typed # PEP 561 marker (ships type info)
│ ├── errors.py # FsError hierarchy (validation)
│ ├── fspath.py # FsPath(str) — fluent, chainable [PUBLIC]
│ ├── values.py # Bytes(int), Perms(int) — typed scalars [PUBLIC]
│ ├── display.py # humanize bytes · perms→rwx · LS_COLORS · tidy
│ ├── _engine/ # pure-stdlib core (NEVER imports pandas)
│ │ ├── paths.py # path_* algebra
│ │ ├── fileops.py # file_*
│ │ ├── dirops.py # dir_* (ls/map/walk/info/tree/create/copy/delete)
│ │ ├── linkops.py # link_*
│ │ ├── ids.py # user_ids/group_ids
│ │ ├── temp.py # file_temp stack · path_temp
│ │ └── vectorize.py # polymorphic scalar|iterable dispatch
│ └── _pandas/ # OPTIONAL integration (imported only if pandas present)
│ ├── __init__.py # registers .fs accessor + ExtensionDtypes
│ ├── dtypes.py # BytesDtype, PermsDtype, PathDtype
│ ├── arrays.py # BytesArray, PermsArray, PathArray
│ ├── accessor.py # @register_series_accessor("fs")
│ └── frames.py # build *_info DataFrames with typed columns
└── tests/ # pytest mirror of the package
```
### Module responsibilities
| Module | Responsibility | Depends on |
| ---------------------- | ----------------------------------------------------------------------------------------------------- | ----------------------------------- |
| `_engine/paths.py` | Pure path string algebra (`path`, `path_dir`, `path_ext*`, `path_rel`, `path_norm`, …) | `pathlib`, `os.path` |
| `_engine/fileops.py` | `file_create/copy/move/delete/touch/show/chmod/chown/info/size/access` | `shutil`, `os`, `stat` |
| `_engine/dirops.py` | `dir_create/copy/delete/ls/map/walk/info/tree`, recursion & filtering | `os.scandir`, `pathlib` |
| `_engine/linkops.py` | `link_create/copy/delete/exists/path` | `os` |
| `_engine/ids.py` | `user_ids/group_ids` (POSIX; empty frames on Windows) | `pwd`, `grp` |
| `_engine/temp.py` | `file_temp` deterministic stack, `path_temp` | `tempfile` |
| `_engine/vectorize.py` | Decorator mapping scalar funcs over iterables/Series | — |
| `fspath.py` | `FsPath(str)` fluent object; methods delegate to `_engine` | `_engine`, `display` |
| `values.py` | `Bytes(int)`, `Perms(int)` typed scalars | `display` |
| `display.py` | Formatting/parsing: `humanize_bytes`, `parse_bytes`, `perms_to_str`, `parse_perms`, `tidy`, LS_COLORS | stdlib |
| `_pandas/*` | ExtensionDtypes/arrays, `.fs` accessor, DataFrame builders | `pandas`, reuses `display`/`values` |
**Invariant:** `_engine` and `values`/`display` must never `import pandas`. The optional layer depends inward on them, never the reverse — a classic dependency-inversion boundary.
______________________________________________________________________
## 5. The three surfaces in detail
### 5.1 Functional API (R-`fs` faithful)
Mirrors `fs`'s families and names exactly: `path_*` (pure, no I/O), `file_*`, `dir_*`, `link_*`, predicates (`is_file`, `is_dir`, `is_link`, …), `user_ids`/`group_ids`, temp helpers.
- **Predictable returns:** verbs return `FsPath` (or a list/Series of them); predicates return `bool` or a vectorized mapping; `file_size` → `Bytes`; `*_info` → DataFrame (or `list[dict]`).
- **Safe defaults** ported verbatim: `overwrite=False`, `recurse` defaults matching `fs` (`False` for listing, `True` for `dir_create`), `all=False`, `fail=True`.
- **`recurse: bool | int`** overload — `True`/`False`/depth, exactly like `fs`.
### 5.2 Fluent `FsPath`
`FsPath` **subclasses `str`** — the same choice as R's `fs_path ⊂ character` and the `path` library. Because an `FsPath` *is* a string, it drops into any stdlib or third-party API that expects a path, and serializes cleanly into pandas.
```
classDiagram
class str {
<<builtin>>
}
class FsPath {
+__truediv__(other) FsPath
+ext() str
+with_ext(ext) FsPath
+dir() FsPath
+name() FsPath
+abs() FsPath
+real() FsPath
+exists() bool
+is_dir() bool
+copy_to(dst) FsPath
+move_to(dst) FsPath
+touch() FsPath
+delete() None
+mkdir(recurse) FsPath
+ls(...) list~FsPath~
+info() DataFrame
}
str <|-- FsPath
FsPath ..> _engine : delegates
```
Methods return `FsPath` (or lists thereof) so calls chain: `(FsPath("a") / "b").with_ext("txt").copy_to("c")`.
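Why subclassing `str` makes chaining cheap can be shown in a few lines. `SketchPath` and `tidy_sketch` below are hypothetical stand-ins, not the real `FsPath`; the sketch assumes tidying means forward slashes with no doubled or trailing separators, and it implements only `/`, `ext()` and `with_ext()`.

```python
import os
import posixpath


def tidy_sketch(path: str) -> str:
    """Normalise to forward slashes without doubled or trailing separators."""
    path = path.replace("\\", "/")  # assumption: backslashes are separators
    head = "/" if path.startswith("/") else ""
    parts = [part for part in path.split("/") if part]
    return head + "/".join(parts)


class SketchPath(str):
    """A str subclass whose methods return new SketchPath values."""

    def __new__(cls, value="") -> "SketchPath":
        return super().__new__(cls, tidy_sketch(os.fspath(value)))

    def __truediv__(self, other: str) -> "SketchPath":
        # Join with '/', then let __new__ tidy the result.
        return SketchPath(f"{self}/{other}")

    def ext(self) -> str:
        # Extension without the leading dot.
        return posixpath.splitext(self)[1].lstrip(".")

    def with_ext(self, ext: str) -> "SketchPath":
        stem = posixpath.splitext(self)[0]
        return SketchPath(f"{stem}.{ext.lstrip('.')}")


p = (SketchPath("data//") / "raw.csv").with_ext("parquet")
```

Since every method returns the subclass, `p` is still accepted by `os.path.basename(p)`, `open(p)` and any other API that takes a plain string.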
### 5.3 pandas `.fs` accessor + DataFrame returns
- A registered Series accessor gives **vectorized path algebra over a column**: `df["path"].fs.ext()`, `.dir()`, `.with_ext("md")`, `.exists()`, `.is_dir()`.
- `dir_info()`/`file_info()` return a DataFrame whose `path`/`size`/`permissions` columns use the ExtensionDtypes, so the R headline demo translates directly:
```
(dir_info("pyrfs", recurse=False)
.query("size > '10KB' and type == 'file'")
.sort_values("size", ascending=False))
```
______________________________________________________________________
## 6. Typed value system
Two cooperating tiers, sharing one set of parse/format functions in `display.py`.
```
flowchart TD
subgraph fmt["display.py — single source of truth"]
hb["humanize_bytes / parse_bytes"]
pp["perms_to_str / parse_perms"]
ti["tidy (path normalizer)"]
end
subgraph scalars["values.py + fspath.py (always available)"]
b["Bytes(int)"]
p["Perms(int)"]
fpath["FsPath(str)"]
end
subgraph arrays["_pandas/arrays.py (optional)"]
ba["BytesArray / BytesDtype"]
pa["PermsArray / PermsDtype"]
pta["PathArray / PathDtype"]
end
hb --> b --> ba
pp --> p --> pa
ti --> fpath --> pta
```
### Scalar wrappers (pure stdlib, always present)
| Type | Subclass of | Construct from | Displays as | Overloads |
| -------- | ----------- | ------------------------------------------ | -------------------------------- | ----------------------------------------------------- |
| `Bytes` | `int` | `int`, `"10MB"`, `"1.5GiB"` | `445.2K` | `<,>,==` parse string RHS; arithmetic returns `Bytes` |
| `Perms` | `int` | octal `"644"`, symbolic `"u+rw,go+r"`, int | `rw-r--r--` | `& \| ~` return `Perms`; `==` parses string RHS |
| `FsPath` | `str` | any path-like | tidy path (coloured in terminal) | `/` for join |
Subclassing the builtins mirrors `fs`'s S3-over-atomic-vector design (`fs_bytes ⊂ numeric`, `fs_perms ⊂ integer`, `fs_path ⊂ character`): a value still behaves like its base type but *remembers what it is* and prints for humans.
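A rough sketch of the `Bytes` row, using only the standard library. `SketchBytes`, `parse_bytes_sketch` and `humanize_bytes_sketch` are hypothetical names, and the rounding rule (two decimals below 10, otherwise one) is an assumption chosen to reproduce displays such as `444.5K`; the real formatting may differ.

```python
import re

_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)(?:I?B)?\s*", re.IGNORECASE)


def parse_bytes_sketch(text: str) -> int:
    """Parse '10MB', '1.5GiB' or '512' into a byte count (1024-based)."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a size literal: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def humanize_bytes_sketch(n: int) -> str:
    """Format a byte count with a short 1024-based unit suffix."""
    value, unit = float(n), ""
    for next_unit in ("K", "M", "G", "T"):
        if abs(value) < 1024:
            break
        value /= 1024
        unit = next_unit
    digits = 2 if unit and abs(value) < 10 else 1
    return f"{value:.{digits}f}".rstrip("0").rstrip(".") + unit


def _as_int(other) -> int:
    return parse_bytes_sketch(other) if isinstance(other, str) else int(other)


class SketchBytes(int):
    """An int that parses size literals and prints for humans."""

    def __new__(cls, value):
        return super().__new__(cls, _as_int(value))

    def __str__(self) -> str:
        return humanize_bytes_sketch(int(self))

    def __repr__(self) -> str:
        return f"SketchBytes({int(self)})"

    # Comparisons parse a string right-hand side (<=, >= omitted for brevity).
    def __lt__(self, other): return int(self) < _as_int(other)
    def __gt__(self, other): return int(self) > _as_int(other)
    def __eq__(self, other): return int(self) == _as_int(other)
    def __ne__(self, other): return not self == other
    __hash__ = int.__hash__

    # Arithmetic stays typed; __radd__ lets the builtin sum() work.
    def __add__(self, other): return SketchBytes(int(self) + _as_int(other))
    __radd__ = __add__


total = sum([SketchBytes("1MB"), SketchBytes("500KB")])
```

`sum()` starts from the plain integer `0`, so the result stays typed only because the subclass defines `__radd__`; Python tries a subclass's reflected method before the base class's `__add__`.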
### pandas ExtensionArrays (optional)
For each scalar there is a real `ExtensionArray`/`ExtensionDtype` so DataFrame columns are first-class typed:
- `BytesDtype` (`name="bytes"`, backing `int64`) — elements show `445.2K`; native `>`/`<`/`==` against strings inside `.query()`; `sum`/`min`/`max` reductions.
- `PermsDtype` (`name="perms"`) — elements show `rwxr-xr-x`.
- `PathDtype` (`name="path"`, backing object of `FsPath`) — tidy display, ``-style repr.
Implemented with the standard protocol (`_from_sequence`, `__getitem__`, `__len__`, `isna`, `take`, `copy`, `_concat_same_type`) plus `ExtensionScalarOpsMixin` for operators, registered via `@register_extension_dtype`. **They call the same `display.py` functions as the scalars** — no duplicated formatting logic.
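The "single source of truth" idea can be sketched for permissions: one parse function and one format function that both a scalar wrapper and a column formatter call. All names here (`parse_perms_sketch`, `perms_to_str_sketch`, `SketchPerms`, `format_column_sketch`) are hypothetical, and the parser covers only octal strings, nine-character `rwx` strings and `who=perms` clauses.

```python
import stat

# Permission bit for each (who, what) pair used in symbolic clauses.
_BITS = {
    "u": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "g": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "o": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def parse_perms_sketch(text: str) -> int:
    """Parse '644', 'rw-r--r--' or 'u=rw,go=r' into permission bits."""
    if text.isdigit():
        return int(text, 8)  # octal literal
    if len(text) == 9 and set(text) <= set("rwx-"):
        # Nine flags, most significant bit first (positions are not validated).
        return sum(1 << (8 - i) for i, ch in enumerate(text) if ch != "-")
    mode = 0
    for clause in text.split(","):
        who, sep, what = clause.partition("=")
        if not sep:
            raise ValueError(f"unsupported clause in this sketch: {clause!r}")
        for w in who:
            for p in what:
                mode |= _BITS[w][p]
    return mode


def perms_to_str_sketch(mode: int) -> str:
    """Format permission bits as 'rw-r--r--'."""
    # stat.filemode() prefixes a file-type character; drop it.
    return stat.filemode(mode)[1:]


def _as_mode(other) -> int:
    return parse_perms_sketch(other) if isinstance(other, str) else int(other)


class SketchPerms(int):
    """Scalar wrapper: all parsing and formatting goes through the helpers."""

    def __new__(cls, value):
        return super().__new__(cls, _as_mode(value))

    def __str__(self) -> str:
        return perms_to_str_sketch(int(self))

    def __eq__(self, other): return int(self) == _as_mode(other)
    def __ne__(self, other): return not self == other
    __hash__ = int.__hash__

    def __and__(self, other): return SketchPerms(int(self) & _as_mode(other))


def format_column_sketch(modes: list[int]) -> list[str]:
    """Column-level formatter reusing the same function as the scalar."""
    return [perms_to_str_sketch(m) for m in modes]
```

An array type built this way never needs its own formatting code: it stores plain integers and calls the shared helper when it renders.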
______________________________________________________________________
## 7. Vectorization model
R's `fs` is vectorized end to end. Python is scalar-by-default; pyrfs bridges this with a small `@vectorized` decorator in `_engine/vectorize.py`:
```
input type → output type
-------------------------------------
str | PathLike | FsPath → scalar (FsPath/Bytes/bool)
list | tuple | set → list
pandas.Series → pandas.Series (only if pandas importable)
```
This gives `file_exists(["a", "b"])` → `[bool, bool]` and `path_ext(series)` → `Series`, while a single path returns a single value. The `.fs` accessor is the *idiomatic* vectorized-over-column surface; the decorator makes the bare functions polymorphic too.
```
flowchart LR
inp["caller input"] --> dec{"@vectorized
dispatch on type"}
dec -->|scalar| s["f(x) -> scalar"]
dec -->|iterable| l["[f(x) for x] -> list"]
dec -->|Series| ser["x.map(f) -> Series"]
```
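A minimal sketch of such a dispatch decorator follows. The decorator body and `path_ext_sketch` are illustrative rather than the actual contents of `_engine/vectorize.py`, and the duck-typed Series check is an assumed way to support pandas without importing it.

```python
import functools
import os
from collections.abc import Callable
from typing import Any


def vectorized(func: Callable[..., Any]) -> Callable[..., Any]:
    """Make a scalar path function accept scalars, collections, or a Series."""

    @functools.wraps(func)
    def wrapper(path: Any, *args: Any, **kwargs: Any) -> Any:
        # Scalars first: str is itself iterable, so it must not reach the list branch.
        if isinstance(path, (str, os.PathLike)):
            return func(path, *args, **kwargs)
        # Assumption: recognise a pandas Series by module name, so this
        # module never has to import pandas.
        if type(path).__module__.startswith("pandas") and hasattr(path, "map"):
            return path.map(lambda p: func(p, *args, **kwargs))
        if isinstance(path, (list, tuple, set)):
            return [func(p, *args, **kwargs) for p in path]
        raise TypeError(f"unsupported path input: {type(path).__name__}")

    return wrapper


@vectorized
def path_ext_sketch(path: str) -> str:
    """Extension without the leading dot."""
    return os.path.splitext(path)[1].lstrip(".")
```

With this in place, `path_ext_sketch("a/b.txt")` returns a single string, while `path_ext_sketch(["a.py", "b.md"])` returns a list of the same length as its input.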
______________________________________________________________________
## 8. Error model
`fs`'s promise is **explicit failure** (throw, never a silent `FALSE`). Python's stdlib already honors this — `os`/`shutil`/`pathlib` raise `OSError` subclasses. pyrfs's policy:
- **Reuse native exceptions** where they fit: `FileNotFoundError`, `FileExistsError`, `PermissionError` (all `OSError`). `overwrite=False` on an existing target → `FileExistsError` (matches `fs`).
- **Add `pyrfs.FsError(Exception)`** for pyrfs-level validation that has no native equivalent — e.g. `glob` and `regexp` both set, recycling length mismatch, bad permission/size literal. Subclasses (`FsValueError`, …) let callers `except` precisely, mirroring `fs`'s classed `fs_error`/`invalid_argument`.
- **`fail=False`** softens directory traversals (`dir_ls`/`dir_map`/`dir_info`) from error to warning when a single entry is inaccessible — a direct port of `fs`'s `fail` knob.
```
flowchart TD
op["pyrfs operation"] --> k{failure?}
k -->|"OS-level"| oserr["raise FileNotFoundError /
FileExistsError / PermissionError"]
k -->|"bad argument"| fserr["raise pyrfs.FsError subclass"]
k -->|"traversal entry, fail=False"| warn["warnings.warn(), skip entry"]
k -->|none| ok["return typed value (FsPath/Bytes/bool/DataFrame)"]
```
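The three failure branches can be sketched with a small recursive listing function. `FsErrorSketch`, `FsValueErrorSketch` and `ls_sketch` are hypothetical stand-ins for the pyrfs names, and for simplicity the sketch treats the root directory like any other entry when `fail=False`, which pyrfs itself may not do.

```python
import fnmatch
import os
import re
import warnings


class FsErrorSketch(Exception):
    """Base class for argument-validation errors."""


class FsValueErrorSketch(FsErrorSketch):
    """Raised for invalid argument combinations."""


def ls_sketch(root, *, glob=None, regexp=None, fail=True):
    """Recursively list paths under root, filtered by glob or regexp."""
    # Bad argument: validation error, raised before touching the filesystem.
    if glob is not None and regexp is not None:
        raise FsValueErrorSketch("set either glob or regexp, not both")

    found, stack = [], [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                children = list(entries)
        except OSError as exc:
            # OS-level failure: re-raise the native exception unless softened.
            if fail:
                raise
            warnings.warn(f"skipping {current!r}: {exc}", stacklevel=2)
            continue
        for entry in children:
            if glob is not None and not fnmatch.fnmatch(entry.name, glob):
                keep = False
            elif regexp is not None and not re.search(regexp, entry.path):
                keep = False
            else:
                keep = True
            if keep:
                found.append(entry.path.replace(os.sep, "/"))
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
    return sorted(found)
```

Callers can then catch `FileNotFoundError` or `PermissionError` for filesystem problems and the validation class for misuse, without string-matching on messages.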
______________________________________________________________________
## 9. Optional-dependency strategy
pandas is an **extra** (`pip install pyrfs[pandas]`). The mechanism:
- `_engine` and `values`/`display` never import pandas → core is import-safe without it.
- `pyrfs/__init__.py` attempts `import pyrfs._pandas` inside a `try/except ImportError`; success registers the `.fs` accessor and the ExtensionDtypes.
- `*_info` functions check a cached `has_pandas()` flag: return a typed **DataFrame** when present, else a plain **`list[dict]`** (still useful, still typed scalars in each row).
This mirrors the dependency philosophy of R's `fs`: minimal hard dependencies (`Imports: methods`), with rich integrations listed as *Suggests* (`pillar`, `vctrs`) and wired up lazily in `.onLoad`.
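A sketch of the cached-flag mechanism, assuming `has_pandas()` is implemented with a guarded import and `functools.lru_cache`. `file_info_sketch` is a hypothetical, much-reduced stand-in for the `*_info` functions, reporting only path and size.

```python
import functools
import os


@functools.lru_cache(maxsize=1)
def has_pandas() -> bool:
    """Try the import once; later calls reuse the cached answer."""
    try:
        import pandas  # noqa: F401
    except ImportError:
        return False
    return True


def file_info_sketch(paths: list[str]):
    """Return a DataFrame when pandas is available, else a list of dict rows."""
    rows = [{"path": p, "size": os.stat(p).st_size} for p in paths]
    if has_pandas():
        # Deferred import: only reached when pandas is installed.
        import pandas as pd

        return pd.DataFrame(rows)
    return rows
```

The rows are built the same way in both branches, so the pandas path only adds a final wrapping step and the core logic never depends on it.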
______________________________________________________________________
## 10. Build & tooling
- **Backend:** setuptools (`[build-system] requires = ["setuptools>=68"]`).
- **Layout:** flat — `[tool.setuptools.packages.find] where = ["."]`, `include = ["pyrfs*"]`.
- **Env/locking:** `uv` (`uv sync`, `uv run …`).
- **Python:** `requires-python = ">=3.10"`.
- **Extras:** `pandas = ["pandas>=2.0"]`, optional `color`, `dev = ["pytest","ruff","mypy"]`.
- **Quality gates:** `ruff` (lint+format), `mypy --strict` (no `Any`, `py.typed` shipped), `pytest` (pandas tests guarded by `importorskip`, run with and without the extra).
- **Docstrings:** NumPy style on the public API.
______________________________________________________________________
## 11. Representative flow — `file_copy("a.txt", dest_dir)`
```
sequenceDiagram
participant U as caller
participant F as file_copy (functional API)
participant V as vectorize
participant E as _engine.fileops
participant S as shutil/os
participant D as display.tidy
U->>F: file_copy("a.txt", "out/")
F->>V: dispatch on input shape
V->>E: _copy_one("a.txt", "out/", overwrite=False)
E->>E: resolve dir target -> "out/a.txt"; check exists
alt exists and not overwrite
E-->>U: raise FileExistsError
else
E->>S: shutil.copy2("a.txt", "out/a.txt")
E->>D: tidy("out/a.txt")
D-->>F: FsPath("out/a.txt")
F-->>U: FsPath
end
```
The same `_engine._copy_one` backs `FsPath.copy_to` and any `.fs`-accessor copy — *one engine, three surfaces*.
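The engine step in this flow might look roughly like the following. `copy_one_sketch` is a hypothetical simplification of `_copy_one`: it handles a single file, resolves a directory target to `dir/basename`, and only swaps the OS separator for `/` instead of running a full tidy pass.

```python
import os
import shutil


def copy_one_sketch(src: str, dst: str, *, overwrite: bool = False) -> str:
    """Copy one file and return the path that was written."""
    # An existing directory as the target means "copy into it".
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    # Refuse to clobber unless the caller opted in (keyword-only flag).
    if os.path.exists(dst) and not overwrite:
        raise FileExistsError(f"{dst!r} exists; pass overwrite=True to replace it")
    shutil.copy2(src, dst)  # copies contents and metadata
    return dst.replace(os.sep, "/")
```

Returning the destination rather than `None` is what lets each surface chain the result into the next call.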
______________________________________________________________________
## 12. Open questions & notes
- **Path display colour.** `FsPath.__repr__` colouring via `LS_COLORS` is deferred to a late phase (P6); it must degrade cleanly on non-TTY / `NO_COLOR`. Default plan: plain until P6.
- **ExtensionArray scope.** Full operator/reduction coverage on `BytesArray` is the heaviest piece; v1 targets comparisons + `sum/min/max`. Edge cases (groupby aggregations, `astype` round-trips) to be pinned down with tests in P5.
- **Windows specifics.** `user_ids`/`group_ids` return empty frames (no `pwd`/`grp`); symlink creation may require privilege. Tidy paths always use `/`. To be verified on a Windows runner.
- **`path_expand` semantics.** `fs` distinguishes `path_expand` vs `path_expand_r`; pyrfs maps the former to `os.path.expanduser` and will document any divergence rather than hide it.
- **`dir_move`.** Like `fs`, pyrfs intentionally has no `dir_move` — directories move via `file_move`.
# Project
# Changelog
All notable changes to **pyrfs** are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased](https://github.com/Lightbridge-KS/pyrfs/compare/v0.1.0...HEAD)
## [0.1.0](https://github.com/Lightbridge-KS/pyrfs/releases/tag/v0.1.0) - 2026-06-11
Initial release — a Pythonic port of the UX of R's [fs](https://fs.r-lib.org) package.
### Added
- **Path algebra** (`path_*`, no I/O): `path()` with `ext=`, `path_dir`/ `path_file`/`path_ext*`, `path_rel`, `path_common`, `path_filter` (glob/regexp, mutually exclusive), `path_split`/`path_join`, `path_has_parent`, `path_sanitize`, `path_expand`/`path_home`/`path_temp`, `path_tidy`.
- **`FsPath`** — a tidy path that subclasses `str`: `/` join operator, chainable methods delegating to the engine, `LS_COLORS`-coloured repr (degrades on non-TTY / `NO_COLOR`), `as_pathlib()` escape hatch.
- **Typed scalars**: `Bytes ⊂ int` (parses `"10MB"`, displays `444.5K`, compares against literals, arithmetic stays typed — all units 1024-based) and `Perms ⊂ int` (octal/symbolic/`rw-r--r--` forms, mode algebra).
- **File operations** (`file_*`): create/touch/copy/move/delete/exists/ access/size/chmod/chown/show/info — mutating verbs return the new path; `overwrite=False` raises `FileExistsError`; copy/move into an existing directory targets `dir/basename`; symbolic chmod applies to the current mode.
- **Directory operations** (`dir_*`): create/copy/delete/exists, lazy `dir_walk` generator with the full fs filter set (`all`, `recurse: bool | int`, `type`, `glob`/`regexp`, `invert`, `fail=False` → warn-and-skip), `dir_ls`, `dir_map`, `dir_info`, and a box-drawing, coloured `dir_tree`. No `dir_move` by design — use `file_move`.
- **Link operations** (`link_*`): symbolic (default) and hard creation, `link_path`, `link_exists`, `link_copy`, `link_delete` (refuses non-links).
- **Predicates & ids**: `is_file`/`is_dir`/`is_link` (lstat semantics — a symlink is only `is_link`), `is_file_empty`, `is_dir_empty`, `is_absolute_path`; `user_ids`/`group_ids` (POSIX).
- **Vectorization**: every path-taking function is polymorphic over a scalar, list/tuple/set, or pandas Series (without the engine importing pandas).
- **pandas layer** (optional `[pandas]` extra): `bytes`/`perms`/`path` ExtensionDtypes lifting the scalar semantics onto columns (`size > "10KB"` works in `.query()`), the `Series.fs` accessor, and `file_info`/`dir_info` returning typed DataFrames (engine rows without pandas).
- **Temp helpers**: `file_temp` with a deterministic `file_temp_push`/`pop` stack for reproducible docs and tests.
- **Errors**: native `OSError` subclasses for OS failures; `FsError`/ `FsValueError` for pyrfs-level validation.
- **Docs**: MkDocs Material site with llms.txt/llms-full.txt, an executed tour notebook, and a Quarto-rendered README kept fresh by CI.