Step 1: Choose the encoding first (try reading with several encodings)
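This is the most critical step. CSV files exported in China are often not UTF-8, so the code must not assume an encoding. Instead it tries a list of candidates in order and keeps the first one that decodes without error.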
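To see why a single hard-coded encoding is not enough, here is a minimal sketch of the two failure modes the fallback list guards against. The sample text and variable names are made up for illustration and are not part of the original code.

```python
import codecs
import csv
import io

# Hypothetical sample data, used only for this demonstration.
text = "姓名,城市\n张三,北京\n"

# Case 1: bytes as a GBK-family exporter would write them.
gbk_bytes = text.encode("gb18030")
try:
    gbk_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # This is the error that should trigger a fallback to the next candidate.
    utf8_ok = False

# Case 2: Excel's "CSV UTF-8" format prepends a byte order mark (BOM).
bom_bytes = codecs.BOM_UTF8 + text.encode("utf-8")
with_bom = bom_bytes.decode("utf-8")         # BOM survives as U+FEFF
without_bom = bom_bytes.decode("utf-8-sig")  # BOM is stripped

# With plain utf-8 the BOM leaks into the first column name.
first_key = csv.DictReader(io.StringIO(with_bom)).fieldnames[0]
clean_key = csv.DictReader(io.StringIO(without_bom)).fieldnames[0]
```

Trying `utf-8-sig` before `utf-8` matters here: both decode a BOM-prefixed file without error, but only `utf-8-sig` leaves the first header clean.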
import csv
from pathlib import Path


def _load_csv_rows(path: Path) -> tuple[list[dict[str, str]], str]:
    """
    Try several encodings in turn when reading a CSV.
    Windows/China exports are commonly gb18030 or gbk; Excel's "CSV UTF-8" is utf-8-sig.
    """
    candidates = ("utf-8-sig", "utf-8", "gb18030", "gbk", "cp936")
    last_err: Exception | None = None
    for enc in candidates:
        try:
            # newline="" lets the csv module handle line endings itself.
            with path.open("r", encoding=enc, newline="") as f:
                reader = csv.DictReader(f)
                rows = list(reader)
            # First encoding that decodes the whole file wins.
            return rows, enc
        except UnicodeDecodeError as e:
            last_err = e
    raise RuntimeError(
        f"Could not decode CSV (tried