claude-skills/my-python-senior/files-io.md
2026-03-21 19:36:11 +03:00


# Files, Streams, and Data Processing Guidelines
## Basic principles
- Prefer streaming and iterators over loading entire large files into memory.
- Be explicit about encodings; default to UTF-8 when reasonable.
- Use context managers for all resources (`with open(...) as f:`).
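
The three principles above can be combined in one small helper. This is a minimal sketch, not a prescribed implementation: the function name `iter_nonempty_lines` and its filtering rule are assumptions made for illustration.

```python
from collections.abc import Iterator
from pathlib import Path


def iter_nonempty_lines(path: Path, encoding: str = "utf-8") -> Iterator[str]:
    """Yield stripped, non-empty lines one at a time (hypothetical helper)."""
    # The encoding is passed explicitly instead of relying on the platform default.
    with path.open("r", encoding=encoding) as f:
        # Iterating over the file object reads one line at a time,
        # so memory use does not grow with file size.
        for line in f:
            stripped = line.strip()
            if stripped:
                yield stripped
```

Because this is a generator, the file stays open until the caller exhausts or closes the iterator; consuming it inside a single `for` loop keeps that window short.
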
## Large files and performance
- For large text/binary files:
  - process in chunks or line-by-line
  - consider `mmap` for specific use cases where it simplifies access patterns.
- Avoid unnecessary copies of large data structures.
- For data processing, consider columnar formats (e.g. Parquet) when appropriate.
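
As an illustration of chunked processing and of `mmap`, the sketch below hashes a file in fixed-size chunks and searches a file for a byte pattern through a read-only memory map. The function names and the 1 MiB chunk size are assumptions chosen for the example, not recommendations from this guideline.

```python
import hashlib
import mmap
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary illustrative value


def sha256_of_file(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file without holding more than one chunk in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # f.read() returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_offset(path: Path, needle: bytes) -> int:
    """Return the byte offset of the first match, or -1 if absent."""
    # An empty file cannot be memory-mapped, so handle it up front.
    if path.stat().st_size == 0:
        return -1
    with path.open("rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the map read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)
```

The memory-mapped version is convenient here because the search can span what would otherwise be chunk boundaries; for purely sequential work, plain chunked reads are usually simpler.
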
## Safety and atomicity
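- For writes that must not corrupt data:
  - write to a temporary file in the same directory as the target
  - flush and `fsync` it when the data must survive a crash or power loss
  - then atomically rename it over the target (e.g. with `os.replace`).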
- Validate paths and avoid directory traversal vulnerabilities when working with user-supplied paths.
- Handle missing directories gracefully (create them when sensible, or fail with a clear error).
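
The safety points above could be sketched as follows. Both helpers (`atomic_write_text`, `resolve_under`) are hypothetical names introduced for illustration; the sketch assumes Python 3.9+ (for `Path.is_relative_to`) and does not fsync the containing directory, which some durability requirements would also call for.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str, encoding: str = "utf-8") -> None:
    """Write text so readers see either the old or the new file, never a partial one."""
    # Create missing parent directories instead of failing on them.
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, target)  # atomic replacement of the target
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def resolve_under(base: Path, user_path: str) -> Path:
    """Resolve a user-supplied path and reject anything outside `base`."""
    base_resolved = base.resolve()
    candidate = (base_resolved / user_path).resolve()
    if not candidate.is_relative_to(base_resolved):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

Note that `tempfile.mkstemp` creates the file readable and writable only by the current user, so the final file inherits those restrictive permissions unless they are adjusted before the rename.
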
## Formats and parsing
- Prefer standard libraries (`json`, `csv`, `pathlib`) where possible.
- When using third-party libraries (e.g. `pyyaml`), use safe loading functions (e.g. `yaml.safe_load` rather than `yaml.load` with an unsafe loader).
- Clearly define schemas (via `pydantic` or dataclasses) when reading structured data.
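
One way to make a schema explicit using only the standard library is a frozen dataclass with a validating constructor. The `ServerConfig` type and its fields are invented for this example; a real project might prefer `pydantic` for the same purpose.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:  # hypothetical schema, for illustration only
    host: str
    port: int

    @classmethod
    def from_dict(cls, raw: dict) -> "ServerConfig":
        host = raw.get("host")
        port = raw.get("port")
        if not isinstance(host, str) or not host:
            raise ValueError("'host' must be a non-empty string")
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(port, bool) or not isinstance(port, int) or not 0 < port < 65536:
            raise ValueError("'port' must be an integer in 1..65535")
        return cls(host=host, port=port)


def load_config(text: str) -> ServerConfig:
    """Parse JSON text and validate it against the schema above."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("top-level JSON value must be an object")
    return ServerConfig.from_dict(raw)
```

Validating at the boundary means the rest of the code can rely on `ServerConfig` instances being well-formed instead of re-checking raw dictionaries.
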
## Cross-platform behavior
- Use `pathlib` instead of manual string path manipulation.
- Be mindful of line endings, file permissions, and case sensitivity across OSes.
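
A small sketch of the two points above: building paths with `pathlib` and pinning line endings on write. The helper names are assumptions for illustration, and the sketch assumes Python 3.9+ for the `list[str]` annotation.

```python
from pathlib import Path


def build_path(base: Path, *parts: str) -> Path:
    """Join path components without hard-coding a separator."""
    return base.joinpath(*parts)


def write_lines_lf(path: Path, lines: list[str]) -> None:
    """Write lines with "\\n" endings regardless of platform."""
    # newline="\n" disables translation of "\n" to the platform default
    # (for example "\r\n" on Windows).
    with path.open("w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")
```

Whether to force `"\n"` or accept the platform default depends on who consumes the file; the point is to make the choice explicitly.
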