# Files, Streams, and Data Processing Guidelines

## Basic principles

- Prefer streaming and iterators over loading entire large files into memory.
- Be explicit about encodings: pass `encoding=` to `open()` rather than relying on the platform default, and default to UTF-8 when reasonable.
- Use context managers for all resources (`with open(...) as f:`).
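
As a minimal sketch of these three principles together, the generator below streams a text file line by line with an explicit encoding inside a context manager. The function name `iter_nonblank_lines` and the choice to skip blank lines are illustrative assumptions, not part of these guidelines.

```python
from pathlib import Path
from typing import Iterator


def iter_nonblank_lines(path: Path, encoding: str = "utf-8") -> Iterator[str]:
    """Yield non-blank lines one at a time (hypothetical helper)."""
    # The context manager closes the file even if the consumer stops early
    # or an exception is raised while iterating.
    with open(path, encoding=encoding) as f:
        # Iterating over the file object reads one line at a time,
        # so memory use stays flat regardless of file size.
        for line in f:
            line = line.rstrip("\n")
            if line:
                yield line
```

Callers can consume it lazily, for example `for line in iter_nonblank_lines(Path("app.log")): ...`, without ever holding the whole file in memory.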

## Large files and performance

- For large text/binary files:
  - process in chunks or line-by-line
  - consider `mmap` when you need random access or searching within a large file without reading it all into memory.
- Avoid unnecessary copies of large data structures.
- For analytical data processing that reads only a subset of columns, consider columnar formats (e.g. Parquet).
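
The chunked and `mmap` approaches above might look like the following sketch. The helper names `sha256_of_file` and `find_bytes` and the 1 MiB default chunk size are assumptions chosen for illustration.

```python
import hashlib
import mmap
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks instead of reading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_bytes(path: Path, needle: bytes) -> int:
    """Return the offset of the first occurrence of needle, or -1.

    Note: mmap raises ValueError for an empty file, so callers may
    want to check the file size first.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)
```

Chunked reads suit sequential passes such as hashing or copying; `mmap` tends to pay off when access is random or when byte-level searching across chunk boundaries would otherwise need extra bookkeeping.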

## Safety and atomicity

- For writes that must not corrupt data:
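  - write to a temporary file in the same directory as the target (so the rename does not cross filesystems)
  - flush and `fsync` the file if the data must survive a crash or power loss
  - then atomically rename it over the target (e.g. with `os.replace`).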
- Validate paths and avoid directory traversal vulnerabilities when working with user-supplied paths.
- Handle missing directories gracefully (create them when sensible, or fail with a clear error).
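
A minimal sketch of the write-then-rename pattern and of a path check for user-supplied input follows. The function names `atomic_write_text` and `resolve_inside` are hypothetical, and the sketch assumes that creating missing parent directories is acceptable for the caller.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, data: str, encoding: str = "utf-8") -> None:
    """Write data to target so readers see either the old or the new file."""
    # Assumption: creating missing parent directories is the desired behavior.
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to stable storage before renaming
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temporary file on any failure, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def resolve_inside(base: Path, user_path: str) -> Path:
    """Resolve user_path under base, rejecting anything that escapes base."""
    base_resolved = base.resolve()
    candidate = (base_resolved / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under base.
        candidate.relative_to(base_resolved)
    except ValueError:
        raise ValueError(f"path escapes base directory: {user_path!r}") from None
    return candidate
```

Because `resolve()` follows symlinks and collapses `..`, both `../secret` and an absolute path such as `/etc/passwd` are rejected. For stricter durability guarantees, some applications also `fsync` the containing directory after the rename; that step is omitted here.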

## Formats and parsing

- Prefer standard libraries (`json`, `csv`, `pathlib`) where possible.
- When using third-party libraries (e.g. `pyyaml`), use their safe loading functions (e.g. `yaml.safe_load`) so that untrusted input cannot construct arbitrary objects.
- Clearly define schemas (via `pydantic` or dataclasses) when reading structured data.
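
To illustrate an explicit schema with only the standard library, the sketch below parses JSON into a dataclass and validates field types by hand. The `User` record and its fields are invented for the example; `pydantic` would replace the manual checks with declarative validation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical record type used only for this example."""
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Parse a JSON object into a User, raising ValueError on bad input."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name = data.get("name")
    age = data.get("age")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("'age' must be an integer")
    return User(name=name, age=age)
```

Validating at the boundary means the rest of the code can rely on `User` being well-formed instead of re-checking dictionary keys.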

## Cross-platform behavior

- Use `pathlib` instead of manual string path manipulation.
- Be mindful of line endings, file permissions, and case sensitivity across OSes.
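
The following sketch shows one way to handle two of these concerns: building paths with `pathlib` and pinning line endings when writing text. The helper names are hypothetical, and forcing `\n` is an assumption that suits files shared across platforms, not a universal rule.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath
from typing import Iterable


def build_path(base: Path, *parts: str) -> Path:
    """Join path components without hard-coding a separator."""
    return base.joinpath(*parts)


def write_lines_lf(path: Path, lines: Iterable[str]) -> None:
    """Write lines with LF endings regardless of the host OS."""
    # newline="\n" disables the default translation of "\n" to os.linesep.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")


def same_on_windows(a: str, b: str) -> bool:
    """Compare two paths using Windows rules (case-insensitive)."""
    return PureWindowsPath(a) == PureWindowsPath(b)


def same_on_posix(a: str, b: str) -> bool:
    """Compare two paths using POSIX rules (case-sensitive)."""
    return PurePosixPath(a) == PurePosixPath(b)
```

The pure path classes make it possible to reason about another platform's comparison rules without running on that platform, which helps when testing for case-sensitivity collisions.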