pyrfs tour: one engine, three surfaces
pyrfs ports the UX of R's fs package to Python: consistent noun_verb
naming, tidy paths, typed self-describing values, and explicit failure, all chainable and
pandas-native.
The same operation is reachable three ways:
| Surface | Style | Example |
|---|---|---|
| A. Functional | closest to R fs | fs.file_copy(a, b) |
| B. Fluent FsPath | Pythonic chaining | FsPath(a).copy_to(b) |
| C. pandas | columns & frames | df["path"].fs.ext() |
import pyrfs as fs
from pyrfs import Bytes, FsPath, Perms
fs.__version__
'0.1.0'
Setup: build a demo tree (fluent surface)
file_temp() returns a fresh temp name; every mutating verb returns the new path,
so calls chain.
work = FsPath(fs.file_temp(pattern="pyrfs_tour")).mkdir()
(work / "src").mkdir().touch_file("app.py").touch_file("utils.py")
(work / "data").mkdir().touch_file("raw.csv").touch_file("clean.parquet")
work.touch_file("README.md")
with open(work / "data" / "raw.csv", "w") as fh:
    fh.write("x,y\n" + "\n".join(f"{i},{i**2}" for i in range(500)))
work.tree()
/tmp/pyrfs_tourdd803e92b8aa
├── README.md
├── data
│   ├── clean.parquet
│   └── raw.csv
└── src
    ├── app.py
    └── utils.py
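For readers curious how a listing like this can be produced, here is a minimal standard-library sketch of a tree renderer. It is not pyrfs's implementation: `render_tree` is a hypothetical helper, and the sorting and symlink handling are assumptions made for the illustration.

```python
import os
import tempfile


def render_tree(root):
    """Return a box-drawing listing of root (hypothetical helper, not pyrfs)."""
    lines = [str(root)]

    def walk(directory, prefix):
        entries = sorted(os.listdir(directory))
        for i, name in enumerate(entries):
            last = i == len(entries) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            full = os.path.join(directory, name)
            # Recurse into real directories only; symlinks are listed, not followed.
            if os.path.isdir(full) and not os.path.islink(full):
                walk(full, prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)


# Build a tiny throwaway tree and render it.
with tempfile.TemporaryDirectory() as root:
    for rel in ("README.md", "data/raw.csv", "src/app.py"):
        target = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        open(target, "w").close()
    tree_lines = render_tree(root).splitlines()

print("\n".join(tree_lines))
```

The prefix passed down the recursion is what keeps the vertical bars aligned: a finished branch contributes spaces, an unfinished one contributes `│`.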
Surface A: functional (R fs muscle memory transfers)
Four families by noun: path_* (pure string algebra, no I/O), file_*, dir_*, link_*.
fs.path("foo", "bar", "a", ext="txt") # pure construction, tidy output
FsPath('foo/bar/a.txt')
fs.dir_ls(work, recurse=True, glob="*.py")
[FsPath('/tmp/pyrfs_tourdd803e92b8aa/src/app.py'),
FsPath('/tmp/pyrfs_tourdd803e92b8aa/src/utils.py')]
# path algebra is vectorized: scalar in -> scalar out, list in -> list out
fs.path_ext(["a.txt", "b.md", "c.tar.gz"])
['txt', 'md', 'gz']
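The "scalar in, scalar out; list in, list out" convention can be sketched with the standard library alone. `path_ext_sketch` below is a hypothetical stand-in written for this illustration, not the pyrfs function, and it only treats lists and tuples as vectors.

```python
import os.path


def path_ext_sketch(paths):
    """Return the last extension without its dot, for one path or a list of paths."""

    def one(path):
        return os.path.splitext(path)[1].lstrip(".")

    # A list (or tuple) yields a list; anything else is treated as a single path.
    if isinstance(paths, (list, tuple)):
        return [one(p) for p in paths]
    return one(paths)


single = path_ext_sketch("a.txt")
many = path_ext_sketch(["a.txt", "b.md", "c.tar.gz"])
print(single, many)
```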
# explicit failure: overwrite is opt-in
backup = fs.file_copy(work / "data/raw.csv", work / "data/raw_backup.csv")
try:
    fs.file_copy(work / "data/raw.csv", backup)
except FileExistsError as err:
    print("refused:", err)
refused: target already exists: FsPath('/tmp/pyrfs_tourdd803e92b8aa/data/raw_backup.csv') (pass overwrite=True)
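The opt-in overwrite behaviour is easy to mimic in plain Python, which may help when reasoning about what the refusal protects against. The sketch below uses only `os` and `shutil`; `copy_no_clobber` is a hypothetical name, and the real pyrfs implementation may differ.

```python
import os
import shutil
import tempfile


def copy_no_clobber(src, dst, overwrite=False):
    """Copy src to dst, refusing to replace an existing target unless asked."""
    if os.path.exists(dst) and not overwrite:
        raise FileExistsError(f"target already exists: {dst!r} (pass overwrite=True)")
    shutil.copy2(src, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw.csv")
    dst = os.path.join(tmp, "raw_backup.csv")
    with open(src, "w") as fh:
        fh.write("x,y\n1,1\n")

    copy_no_clobber(src, dst)  # first copy succeeds
    try:
        copy_no_clobber(src, dst)  # second copy is refused
        refused = False
    except FileExistsError:
        refused = True
    copy_no_clobber(src, dst, overwrite=True)  # explicit opt-in succeeds
    final_exists = os.path.exists(dst)

print("refused:", refused)
```

Note that the existence check and the copy are two separate steps here, so this sketch is not safe against another process creating the target in between.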
Surface B: fluent FsPath
FsPath is a str, so it drops into open(), pd.read_csv(), or anything else that takes a path string.
Path-returning methods return FsPath, so everything chains.
p = (work / "data" / "raw.csv").with_ext("parquet")
p, p.dir(), p.name(), p.ext(), isinstance(p, str)
(FsPath('/tmp/pyrfs_tourdd803e92b8aa/data/raw.parquet'),
FsPath('/tmp/pyrfs_tourdd803e92b8aa/data'),
FsPath('raw.parquet'),
'parquet',
True)
(work / "data" / "raw.csv").copy_to(work / "src", overwrite=True).size()
Bytes(4930)
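To see why subclassing str makes this chaining possible, here is a minimal sketch of the idea. `PathStr` is a hypothetical class for illustration only; it uses POSIX path rules and covers just the methods shown above, with no claim about how FsPath is actually built.

```python
import posixpath


class PathStr(str):
    """A str subclass whose path methods return PathStr, so calls chain."""

    def __truediv__(self, other):
        return type(self)(posixpath.join(self, other))

    def dir(self):
        return type(self)(posixpath.dirname(self))

    def name(self):
        return type(self)(posixpath.basename(self))

    def ext(self):
        # Plain str: an extension is a label, not a path.
        return posixpath.splitext(self)[1].lstrip(".")

    def with_ext(self, new_ext):
        stem = posixpath.splitext(self)[0]
        return type(self)(f"{stem}.{new_ext}")


p = (PathStr("/tmp/demo") / "data" / "raw.csv").with_ext("parquet")
print(p, p.dir(), p.name(), p.ext(), isinstance(p, str))
```

Because every instance is a real str, it can be passed to any API that expects a path string without conversion.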
Typed values: they know what they are
Bytes and Perms both subclass int: they are still numbers, but they parse human-readable
literals and print in human-readable form.
size = fs.file_size(work / "data/raw.csv")
print(f"{size!r} -> displays as {size}")
print("bigger than 1KB? ", size > "1KB")
print("sum stays typed: ", sum([Bytes("1MB"), Bytes("500KB")]))
Bytes(4930) -> displays as 4.81K
bigger than 1KB?  True
sum stays typed:  1.49M
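The behaviour above can be approximated with a small int subclass. The sketch below is not the pyrfs Bytes class: `ByteSize` is a hypothetical name, and it assumes binary units (1KB = 1024 bytes), which is consistent with 4930 bytes displaying as 4.81K.

```python
import re

# Assumption: binary multiples, matching the 4930 -> 4.81K output shown above.
_UNITS = {"": 1, "B": 1, "K": 1024, "KB": 1024,
          "M": 1024**2, "MB": 1024**2, "G": 1024**3, "GB": 1024**3}


class ByteSize(int):
    """An int that parses literals like '1KB' and prints in scaled units."""

    def __new__(cls, value):
        if isinstance(value, str):
            match = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]*)\s*", value)
            if match is None or match.group(2).upper() not in _UNITS:
                raise ValueError(f"cannot parse size: {value!r}")
            value = float(match.group(1)) * _UNITS[match.group(2).upper()]
        return super().__new__(cls, int(value))

    def __repr__(self):
        return f"ByteSize({int(self)})"

    def __str__(self):
        size, unit = float(int(self)), "B"
        for next_unit in ("K", "M", "G"):
            if size < 1024:
                break
            size, unit = size / 1024, next_unit
        return f"{int(size)}B" if unit == "B" else f"{size:.2f}{unit}"

    @staticmethod
    def _coerce(other):
        # Strings are parsed as sizes; everything else is compared as-is.
        return int(ByteSize(other)) if isinstance(other, str) else other

    def __gt__(self, other):
        return int(self) > self._coerce(other)

    def __lt__(self, other):
        return int(self) < self._coerce(other)

    def __add__(self, other):
        return ByteSize(int(self) + self._coerce(other))

    __radd__ = __add__  # lets sum() start from 0 and stay typed


size = ByteSize(4930)
total = sum([ByteSize("1MB"), ByteSize("500KB")])
print(f"{size!r} -> displays as {size}", size > "1KB", total)
```

Defining `__radd__` is the piece that keeps `sum()` typed: the built-in starts from the plain int 0, and the subclass gets the first chance to handle `0 + ByteSize(...)`.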
perms = Perms("644")
print(repr(perms))
print(perms == "rw-r--r--", perms == "u=rw,go=r")
print("after u+x:", perms | "u+x")
Perms('rw-r--r--')
True True
after u+x: rwxr--r--
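A similar sketch works for permissions. `Mode` below is a hypothetical int subclass, not the pyrfs Perms class: it only parses octal strings and compares against the nine-character rwx form, and it does not attempt the symbolic `u=rw,go=r` or `u+x` syntax shown above. It leans on `stat.filemode` from the standard library for the display string.

```python
import stat


class Mode(int):
    """An int holding permission bits that prints as an rwx string."""

    def __new__(cls, value):
        if isinstance(value, str):
            value = int(value, 8)  # octal literals only, e.g. "644"
        return super().__new__(cls, value)

    def __str__(self):
        # stat.filemode() prefixes a file-type character; drop it.
        return stat.filemode(int(self))[1:]

    def __repr__(self):
        return f"Mode({str(self)!r})"

    def __eq__(self, other):
        if isinstance(other, str):
            return str(self) == other
        return int(self) == other

    __hash__ = int.__hash__  # keep hashing consistent with the integer value


mode = Mode("644")
with_exec = Mode(mode | stat.S_IXUSR)  # add the owner execute bit
print(repr(mode), mode == "rw-r--r--", with_exec)
```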
Surface C: pandas (pip install pyrfs[pandas])
dir_info() returns a DataFrame whose path, size, and permissions columns are
real ExtensionDtypes, so the R fs headline demo ports almost verbatim:
string literals such as '1KB' work inside .query().
import pandas as pd
(
    fs.dir_info(work, recurse=True)
    .query("size > '1KB' and type == 'file'")
    .sort_values("size", ascending=False)
    .loc[:, ["path", "permissions", "size", "modification_time"]]
)
| | path | permissions | size | modification_time |
|---|---|---|---|---|
| 3 | /tmp/pyrfs_tourdd803e92b8aa/data/raw.csv | rw-r--r-- | 4.81K | 2026-06-11 15:59:44.333220 |
| 4 | /tmp/pyrfs_tourdd803e92b8aa/data/raw_backup.csv | rw-r--r-- | 4.81K | 2026-06-11 15:59:44.333220 |
| 7 | /tmp/pyrfs_tourdd803e92b8aa/src/raw.csv | rw-r--r-- | 4.81K | 2026-06-11 15:59:44.333220 |
# the .fs accessor vectorizes path algebra over a column
df = pd.DataFrame({"path": fs.dir_ls(work, recurse=True, type="file")})
df.assign(
    name=df["path"].fs.name(),
    ext=df["path"].fs.ext(),
    dir=df["path"].fs.rel_to(work).fs.dir(),
    size=df["path"].fs.size(),
)
| | path | name | ext | dir | size |
|---|---|---|---|---|---|
| 0 | /tmp/pyrfs_tourdd803e92b8aa/README.md | README.md | md | . | 0B |
| 1 | /tmp/pyrfs_tourdd803e92b8aa/data/clean.parquet | clean.parquet | parquet | data | 0B |
| 2 | /tmp/pyrfs_tourdd803e92b8aa/data/raw.csv | raw.csv | csv | data | 4.81K |
| 3 | /tmp/pyrfs_tourdd803e92b8aa/data/raw_backup.csv | raw_backup.csv | csv | data | 4.81K |
| 4 | /tmp/pyrfs_tourdd803e92b8aa/src/app.py | app.py | py | src | 0B |
| 5 | /tmp/pyrfs_tourdd803e92b8aa/src/raw.csv | raw.csv | csv | src | 4.81K |
| 6 | /tmp/pyrfs_tourdd803e92b8aa/src/utils.py | utils.py | py | src | 0B |
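Column accessors like this are built on a public pandas extension point, `pd.api.extensions.register_series_accessor`. The sketch below shows the mechanism with a deliberately different accessor name, `pathsketch`, so it cannot clash with the real `.fs` accessor; the class and its two methods are hypothetical and say nothing about how pyrfs implements its own.

```python
import posixpath

import pandas as pd


@pd.api.extensions.register_series_accessor("pathsketch")
class PathSketchAccessor:
    """Hypothetical accessor: element-wise path algebra over a Series of strings."""

    def __init__(self, series):
        self._series = series

    def name(self):
        return self._series.map(posixpath.basename)

    def ext(self):
        return self._series.map(lambda p: posixpath.splitext(p)[1].lstrip("."))


demo = pd.Series(["/tmp/a/raw.csv", "/tmp/b/app.py"])
names = demo.pathsketch.name().tolist()
exts = demo.pathsketch.ext().tolist()
print(names, exts)
```

Each method returns a new Series aligned to the original index, which is what lets the results slot straight into `df.assign(...)`.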
# pipe workflow: file count and largest file per directory, sorted by largest
(
    fs.dir_info(work, recurse=True, type="file")
    .assign(dir=lambda d: d["path"].fs.rel_to(work).fs.dir())
    .groupby("dir", observed=True)["size"]
    .agg(["count", "max"])
    .sort_values("max", ascending=False)
)
| dir | count | max |
|---|---|---|
| data | 3 | 4.81K |
| src | 3 | 4.81K |
| . | 1 | 0B |
Cleanup
work.rmdir()
work.exists()
False