pyrfs tour: one engine, three surfaces

pyrfs ports the UX of R's fs package to Python: consistent noun_verb naming, tidy paths, typed self-describing values, and explicit failure, in an API that is chainable and pandas-native.

The same operation is reachable three ways:

Surface	Style	Example
A. Functional	closest to R `fs`	`fs.file_copy(a, b)`
B. Fluent `FsPath`	Pythonic chaining	`FsPath(a).copy_to(b)`
C. pandas	columns & frames	`df["path"].fs.ext()`

In [1]:

import pyrfs as fs
from pyrfs import Bytes, FsPath, Perms

fs.__version__

Out[1]:

'0.1.0'

Setup: build a demo tree (fluent surface)

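file_temp() returns a fresh temp name; every mutating verb returns a path, so calls chain. In the cell below, touch_file evidently returns the directory it was called on, which is why both files in each chain land in the same directory.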

In [2]:

work = FsPath(fs.file_temp(pattern="pyrfs_tour")).mkdir()

(work / "src").mkdir().touch_file("app.py").touch_file("utils.py")
(work / "data").mkdir().touch_file("raw.csv").touch_file("clean.parquet")
work.touch_file("README.md")

with open(work / "data" / "raw.csv", "w") as fh:
    fh.write("x,y\n" + "\n".join(f"{i},{i**2}" for i in range(500)))

work.tree()

/tmp/pyrfs_tourdd803e92b8aa
├── README.md
├── data
│   ├── clean.parquet
│   └── raw.csv
└── src
    ├── app.py
    └── utils.py
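
For readers curious how a listing like the one above can be produced, here is a minimal stdlib sketch of a tree renderer. It is not pyrfs's implementation: `render_tree` is a hypothetical helper, and sorting entries by name is an assumption chosen because it matches the order shown above.

```python
import os
import tempfile


def render_tree(root: str) -> str:
    """Return a box-drawing tree of `root` (hypothetical helper, not pyrfs API)."""
    lines = [root]

    def walk(directory: str, prefix: str) -> None:
        entries = sorted(os.listdir(directory))
        for index, name in enumerate(entries):
            last = index == len(entries) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            child = os.path.join(directory, name)
            if os.path.isdir(child):
                # Below the last entry use blank padding, otherwise a vertical rail.
                walk(child, prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)


# Build a small throwaway tree and render it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data"))
    os.makedirs(os.path.join(tmp, "src"))
    for parts in (("README.md",), ("data", "raw.csv"), ("src", "app.py")):
        open(os.path.join(tmp, *parts), "w").close()
    tree_text = render_tree(tmp)

print(tree_text)
```

The recursion carries a growing prefix string, so each nesting level only has to decide between a rail and blank padding.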

Surface A: functional (R `fs` muscle memory transfers)

Four families by noun: path_* (pure string algebra, no I/O), file_*, dir_*, link_*.

In [3]:

fs.path("foo", "bar", "a", ext="txt")  # pure construction, tidy output

Out[3]:

FsPath('foo/bar/a.txt')

In [4]:

fs.dir_ls(work, recurse=True, glob="*.py")

Out[4]:

[FsPath('/tmp/pyrfs_tourdd803e92b8aa/src/app.py'),
 FsPath('/tmp/pyrfs_tourdd803e92b8aa/src/utils.py')]

In [5]:

# path algebra is vectorized: scalar in -> scalar out, list in -> list out
fs.path_ext(["a.txt", "b.md", "c.tar.gz"])

Out[5]:

['txt', 'md', 'gz']
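
The scalar-in/scalar-out, list-in/list-out contract can be illustrated with a few lines of stdlib Python. This is a sketch of the dispatch pattern only, assuming nothing about how pyrfs implements it; `path_ext_sketch` is a hypothetical name.

```python
import posixpath


def path_ext_sketch(path):
    """Scalar in -> scalar out, list in -> list out (hypothetical stand-in)."""
    if isinstance(path, (list, tuple)):
        return [path_ext_sketch(p) for p in path]
    # splitext keeps only the final suffix, so "c.tar.gz" yields ".gz".
    return posixpath.splitext(path)[1].lstrip(".")


single = path_ext_sketch("a.txt")
many = path_ext_sketch(["a.txt", "b.md", "c.tar.gz"])
print(single, many)
```

Because the function never touches the disk, it works on paths that do not exist, which is the point of keeping path algebra free of I/O.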

In [6]:

# explicit failure: overwrite is opt-in
backup = fs.file_copy(work / "data/raw.csv", work / "data/raw_backup.csv")
try:
    fs.file_copy(work / "data/raw.csv", backup)
except FileExistsError as err:
    print("refused:", err)

refused: target already exists: FsPath('/tmp/pyrfs_tourdd803e92b8aa/data/raw_backup.csv') (pass overwrite=True)
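
The opt-in overwrite behaviour is easy to reproduce with the standard library. The sketch below is an illustration of the policy, not pyrfs's code; `copy_file_sketch` and its message text are assumptions modelled on the output above.

```python
import os
import shutil
import tempfile


def copy_file_sketch(src: str, dst: str, overwrite: bool = False) -> str:
    """Copy src to dst, refusing to clobber unless overwrite=True (hypothetical)."""
    if os.path.exists(dst) and not overwrite:
        raise FileExistsError(f"target already exists: {dst!r} (pass overwrite=True)")
    shutil.copy2(src, dst)
    return dst  # returning the new path lets callers reuse or chain it


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw.csv")
    with open(src, "w") as fh:
        fh.write("x,y\n1,1\n")

    backup = copy_file_sketch(src, os.path.join(tmp, "raw_backup.csv"))
    try:
        copy_file_sketch(src, backup)  # second copy hits the existing target
        refused = False
    except FileExistsError:
        refused = True

    # Passing the flag explicitly succeeds.
    copy_file_sketch(src, backup, overwrite=True)
    with open(backup) as fh:
        backup_text = fh.read()

print("refused:", refused)
```

Note that a check followed by a copy is not atomic; a production version would need to handle another process creating the target in between.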

Surface B: fluent `FsPath`

FsPath is a str — it drops into open(), pd.read_csv(), anything. Methods return FsPath, so everything chains.

In [7]:

p = (work / "data" / "raw.csv").with_ext("parquet")
p, p.dir(), p.name(), p.ext(), isinstance(p, str)

Out[7]:

(FsPath('/tmp/pyrfs_tourdd803e92b8aa/data/raw.parquet'),
 FsPath('/tmp/pyrfs_tourdd803e92b8aa/data'),
 FsPath('raw.parquet'),
 'parquet',
 True)
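
To see why subclassing str makes a path type both chainable and accepted anywhere a string is, consider this minimal sketch. `PathStr` is a hypothetical stand-in, not the real `FsPath`; it covers only `/`, `with_ext`, and `dir`, and assumes POSIX-style separators.

```python
import posixpath


class PathStr(str):
    """A str subclass with a few path methods (hypothetical stand-in for FsPath)."""

    def __truediv__(self, other: str) -> "PathStr":
        # str has no "/" operator, so the subclass is free to define one.
        return PathStr(posixpath.join(self, other))

    def with_ext(self, ext: str) -> "PathStr":
        stem, _old_ext = posixpath.splitext(self)
        return PathStr(f"{stem}.{ext}")

    def dir(self) -> "PathStr":
        return PathStr(posixpath.dirname(self))

    def __repr__(self) -> str:
        return f"PathStr({str.__repr__(self)})"


p = (PathStr("/tmp/demo") / "data" / "raw.csv").with_ext("parquet")
print(repr(p), repr(p.dir()), isinstance(p, str))
```

Every method re-wraps its result in `PathStr`; without that step, the plain str returned by `posixpath` would end the chain.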

In [8]:

(work / "data" / "raw.csv").copy_to(work / "src", overwrite=True).size()

Out[8]:

Bytes(4930)

Typed values: they know what they are

Bytes ⊂ int and Perms ⊂ int: still numbers, but they parse human literals and print for humans.

In [9]:

size = fs.file_size(work / "data/raw.csv")
print(f"{size!r} -> displays as {size}")
print("bigger than 1KB? ", size > "1KB")
print("sum stays typed:  ", sum([Bytes("1MB"), Bytes("500KB")]))

Bytes(4930) -> displays as 4.81K
bigger than 1KB?  True
sum stays typed:   1.49M
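
A size type with this behaviour can be sketched as an int subclass. The class below, `ByteCount`, is hypothetical and much smaller than a real implementation would be; it assumes binary units (1KB = 1024 bytes), which is what the figures above imply (4930 / 1024 ≈ 4.81).

```python
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


class ByteCount(int):
    """int subclass that parses and prints human sizes (hypothetical sketch)."""

    def __new__(cls, value):
        if isinstance(value, str):
            match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)\s*", value.upper())
            if match is None:
                raise ValueError(f"cannot parse size: {value!r}")
            value = float(match.group(1)) * _UNITS[match.group(2)]
        return super().__new__(cls, int(value))

    def __str__(self) -> str:
        size = float(self)
        if size < 1024:
            return f"{int(size)}B"
        for suffix in ("K", "M"):
            size /= 1024
            if size < 1024:
                return f"{size:.2f}{suffix}"
        return f"{size / 1024:.2f}G"

    def __repr__(self) -> str:
        return f"ByteCount({int(self)})"

    def __gt__(self, other) -> bool:
        # Coerce the right-hand side so string literals like "1KB" compare.
        return int(self) > int(ByteCount(other))

    def __add__(self, other) -> "ByteCount":
        return ByteCount(int(self) + int(ByteCount(other)))

    __radd__ = __add__  # lets sum() start from the plain int 0 and stay typed


size = ByteCount(4930)
total = sum([ByteCount("1MB"), ByteCount("500KB")])
print(repr(size), str(size), size > "1KB", str(total))
```

Only `>` and `+` are overridden here; a fuller version would cover the remaining comparison and arithmetic operators in the same way.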

In [10]:

perms = Perms("644")
print(repr(perms))
print(perms == "rw-r--r--", perms == "u=rw,go=r")
print("after u+x:", perms | "u+x")

Perms('rw-r--r--')
True True
after u+x: rwxr--r--
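
The same int-subclass idea extends to permission bits. The sketch below uses a hypothetical `Mode` class and handles only the octal and nine-character symbolic spellings; the `u=rw,go=r` and `u+x` forms shown above are left out, and the symbolic parser does not validate character positions.

```python
import stat


class Mode(int):
    """int subclass for permission bits (hypothetical sketch, not pyrfs's Perms)."""

    def __new__(cls, value):
        if isinstance(value, str):
            value = cls._parse(value)
        return super().__new__(cls, value)

    @staticmethod
    def _parse(text: str) -> int:
        if text.isdigit():
            return int(text, 8)  # octal spelling such as "644"
        if len(text) == 9 and set(text) <= set("rwx-"):
            # Symbolic spelling: each non-dash character sets one bit, high to low.
            return sum(1 << (8 - i) for i, ch in enumerate(text) if ch != "-")
        raise ValueError(f"cannot parse permissions: {text!r}")

    def __str__(self) -> str:
        # stat.filemode prepends a file-type character; drop it.
        return stat.filemode(int(self))[1:]

    def __repr__(self) -> str:
        return f"Mode({str(self)!r})"

    def __eq__(self, other):
        try:
            return int(self) == int(Mode(other))
        except (TypeError, ValueError):
            return NotImplemented

    __hash__ = int.__hash__  # defining __eq__ would otherwise disable hashing


mode = Mode("644")
with_exec = Mode(mode | stat.S_IXUSR)  # plain int OR, then re-wrap
print(repr(mode), mode == "rw-r--r--", str(with_exec))
```

Because `Mode` is still an int, it can be passed straight to `os.chmod` or combined with the `stat.S_I*` constants.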

Surface C: pandas (`pip install pyrfs[pandas]`)

dir_info() returns a DataFrame whose path / size / permissions columns are real ExtensionDtypes — so the R fs headline demo ports almost verbatim: string literals work inside .query().

In [11]:

import pandas as pd

(
    fs.dir_info(work, recurse=True)
    .query("size > '1KB' and type == 'file'")
    .sort_values("size", ascending=False)
    .loc[:, ["path", "permissions", "size", "modification_time"]]
)

Out[11]:

	path	permissions	size	modification_time
3	/tmp/pyrfs_tourdd803e92b8aa/data/raw.csv	rw-r--r--	4.81K	2026-06-11 15:59:44.333220
4	/tmp/pyrfs_tourdd803e92b8aa/data/raw_backup.csv	rw-r--r--	4.81K	2026-06-11 15:59:44.333220
7	/tmp/pyrfs_tourdd803e92b8aa/src/raw.csv	rw-r--r--	4.81K	2026-06-11 15:59:44.333220

In [12]:

# the .fs accessor vectorizes path algebra over a column
df = pd.DataFrame({"path": fs.dir_ls(work, recurse=True, type="file")})
df.assign(
    name=df["path"].fs.name(),
    ext=df["path"].fs.ext(),
    dir=df["path"].fs.rel_to(work).fs.dir(),
    size=df["path"].fs.size(),
)

Out[12]:

	path	name	ext	dir	size
0	/tmp/pyrfs_tourdd803e92b8aa/README.md	README.md	md	.	0B
1	/tmp/pyrfs_tourdd803e92b8aa/data/clean.parquet	clean.parquet	parquet	data	0B
2	/tmp/pyrfs_tourdd803e92b8aa/data/raw.csv	raw.csv	csv	data	4.81K
3	/tmp/pyrfs_tourdd803e92b8aa/data/raw_backup.csv	raw_backup.csv	csv	data	4.81K
4	/tmp/pyrfs_tourdd803e92b8aa/src/app.py	app.py	py	src	0B
5	/tmp/pyrfs_tourdd803e92b8aa/src/raw.csv	raw.csv	csv	src	4.81K
6	/tmp/pyrfs_tourdd803e92b8aa/src/utils.py	utils.py	py	src	0B

In [13]:

# pipe workflow: file count and largest file per directory, sorted by largest file
(
    fs.dir_info(work, recurse=True, type="file")
    .assign(dir=lambda d: d["path"].fs.rel_to(work).fs.dir())
    .groupby("dir", observed=True)["size"]
    .agg(["count", "max"])
    .sort_values("max", ascending=False)
)

Out[13]:

	count	max
dir
data	3	4.81K
src	3	4.81K
.	1	0B
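
For comparison, the same per-directory aggregation (file count and largest file) can be written against the standard library alone. This is a rough equivalent for illustration, not a description of what pyrfs does internally; `dir_stats_sketch` is a hypothetical helper.

```python
import os
import tempfile
from collections import defaultdict


def dir_stats_sketch(root: str) -> dict:
    """Map each relative directory to (file count, largest file size in bytes)."""
    sizes = defaultdict(list)
    for current, _dirs, files in os.walk(root):
        rel = os.path.relpath(current, root)  # "." for the root itself
        for name in files:
            sizes[rel].append(os.path.getsize(os.path.join(current, name)))
    return {rel: (len(vals), max(vals)) for rel, vals in sizes.items()}


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data"))
    open(os.path.join(tmp, "README.md"), "w").close()
    with open(os.path.join(tmp, "data", "raw.csv"), "wb") as fh:
        fh.write(b"x,y\n1,1\n")
    stats = dir_stats_sketch(tmp)

# Largest file first, mirroring sort_values("max", ascending=False).
ranked = sorted(stats.items(), key=lambda item: item[1][1], reverse=True)
print(ranked)
```

The pandas version is shorter mainly because the typed size column sorts and formats itself; the grouping logic is the same.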

Cleanup

In [14]:

work.rmdir()
work.exists()

Out[14]:

False