[Notes] LLM | Loading the local model Qwen-Image-Layered from ModelScope (with runnable code)

Contents

- Preface
- 1. Preparing the environment
- 2. Online loading (when the network is available)
- 3. Common errors and fixes
- 4. Results
- Conclusion

Test environment:
CPU: Intel® Xeon® Gold 6133 CPU @ 2.50GHz
GPU: NVIDIA GeForce RTX 4090
OS: Ubuntu 24.04.2 LTS

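Last verified: 2026/01/07
If the interface changes later and this method stops working, leave a comment; I may post an updated version once I see it.
This post mainly targets Linux, but Windows and macOS only need slight adjustments to the terminal commands, so it can serve as a reference there too.
Model: Qwen-Image-Layered (ModelScope model hub)

Unless stated otherwise, every command in this post is run directly in a terminal.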

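Preface

Many examples online use HuggingFaceEmbeddings, but that is how you load a text embedding model. Qwen-Image-Layered is a diffusion model for layered image generation and decomposition, so it should be loaded with the dedicated Pipeline in diffusers.
If, like me, you took a wrong turn and downloaded files one by one from the HF page, or tried to load the model with an Embedding class, you will almost certainly hit errors (for example Unrecognized model, or a peft version that is too old). Below is the most reliable workflow on Linux for "download from ModelScope to local disk, then load offline", plus mirror and Token options for restricted networks or rate limiting.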

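Performance note: this model generates slowly and runs take a long time. On an RTX Pro 6000 with 96GB VRAM, peak VRAM usage can reach 45GB and a 1024 px generation takes 120s. According to reports from some RTX 4090 users, the workflow uses almost all available VRAM. Users with less VRAM are advised to use the FP8 version to reduce memory usage.
Reference (also covers the FP8 version): Qwen-Image-Layered ComfyUI Workflow Guide | ComfyUI Wiki

If, like me, you don't know how to use the FP8 version, you can skip it and keep reading.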

1. Preparing the environment

A dedicated virtual environment is recommended (optional):

    python -m venv ~/.venvs/qwen-img
    source ~/.venvs/qwen-img/bin/activate
    python -V  # Python 3.12+ recommended

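Install the dependencies (with the key version constraints, to avoid known compatibility issues):

First install PyTorch by following this post: "[Install] PyTorch | A guide to checking and adjusting the CUDA version to fit a PyTorch install" (CSDN blog).

Once PyTorch is installed, install the following dependencies. Note that version specifiers such as `>=` must be quoted, otherwise the shell treats `>` as a redirect:

    # Step 1: install the required packages
    pip install -U pip
    pip install "transformers>=4.57.3"
    pip install git+https://github.com/huggingface/diffusers
    pip install python-pptx torch pillow
    pip install -U "accelerate>=0.26.0" \
        "diffusers>=0.30.0" "huggingface_hub>=0.23.0" "peft>=0.17.0" Pillow
    pip install psd-tools

    # Step 2: verify CUDA availability (GPU users)
    python -c "import torch; print(torch.cuda.is_available())"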

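Notes:

peft>=0.17.0 is essential; without it diffusers fails right at initialization (you may have seen the error that mentions peft==0.15.1).
If you have an NVIDIA GPU, install the PyTorch wheel that matches your CUDA version so you can run inference on the GPU with bfloat16/float16.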
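
Before loading the pipeline, it can help to confirm that the installed versions meet the minimums listed above. The following is a minimal sketch using only the standard library. The helper names (`version_tuple`, `find_outdated`, `installed_versions`) are my own, and the parser deliberately ignores pre-release suffixes such as `.dev0`, so treat it as a rough check rather than a full PEP 440 comparison.

```python
import re
from importlib import metadata

# Minimum versions taken from the install commands in this post.
MINIMUMS = {
    "transformers": "4.57.3",
    "accelerate": "0.26.0",
    "diffusers": "0.30.0",
    "huggingface_hub": "0.23.0",
    "peft": "0.17.0",
}


def version_tuple(version):
    """Turn '0.17.0' or '0.37.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric piece, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def find_outdated(installed, minimums=MINIMUMS):
    """Return {package: (found, required)} for missing or too-old packages."""
    problems = {}
    for name, required in minimums.items():
        found = installed.get(name)
        if found is None or version_tuple(found) < version_tuple(required):
            problems[name] = (found, required)
    return problems


def installed_versions(names):
    """Look up installed versions; packages that are absent map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Calling `find_outdated(installed_versions(MINIMUMS))` in the active virtual environment returns an empty dict when every constraint is met, and otherwise lists each offending package with the version found and the version required.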

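2. Online loading (when the network is available)

If you prefer to pull the model online (downloaded on first use, served from the local cache afterwards), configure a mirror and a Token to lower the chance of 429 rate limiting.

Configure the mirror and Token (optional but strongly recommended):

    # Mirror (commonly used in mainland China)
    export HF_ENDPOINT=https://hf-mirror.com
    # Token (create a Token with Read permission in your Hugging Face settings)
    export HF_TOKEN="hf_xxx_your_token_here"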

For how to obtain a Token, see this post: "Mirrors can be rate limited too? Notes from the Unsloth trenches: add a Hugging Face Token and download speed takes off" (CSDN blog).

Log in from code or pass the Token explicitly (pick one of the two):

    from huggingface_hub import login

    login(token="hf_xxx_your_token_here")

or

    pipe = QwenImageLayeredPipeline.from_pretrained(
        "Qwen/Qwen-Image-Layered",
        token="hf_xxx_your_token_here",
        cache_dir="./hf_cache",  # local cache directory
    )
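
To keep the Token out of source files, the same settings can be read from the environment in Python. This is a minimal sketch: `hub_kwargs` is a hypothetical helper name, and it relies on my understanding that `huggingface_hub` reads `HF_ENDPOINT` when it is first imported, which is why the variable is set before any Hugging Face import.

```python
import os

# Set the mirror before huggingface_hub / diffusers are imported,
# on the assumption that the endpoint is read at import time.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")


def hub_kwargs(cache_dir="./hf_cache"):
    """Build keyword arguments for from_pretrained without hard-coding the token."""
    kwargs = {"cache_dir": cache_dir}
    token = os.environ.get("HF_TOKEN")
    if token:
        # Only pass the token when one is configured; otherwise leave the
        # key out so the library falls back to its own default behaviour.
        kwargs["token"] = token
    return kwargs
```

The result can then be unpacked into the call shown above, for example `QwenImageLayeredPipeline.from_pretrained("Qwen/Qwen-Image-Layered", **hub_kwargs())`.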

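Handling the common 429 (Too Many Requests) rate limit:

Always use a Token (the quota for anonymous access is very low).
Reduce the number of concurrent downloads (for example, do not start many parallel processes or threads on the first pull).
When you hit a 429, wait for the duration given in the returned Retry-After header before retrying.
After a successful download the local cache is used, so the requests are not repeated.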
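
The Retry-After advice above can be sketched as a small wrapper. This is an illustration, not part of any library: `retry_on_429` is a hypothetical helper, and it assumes the raised exception exposes a requests-style `response` attribute with `status_code` and `headers`. If the loader wraps the HTTP error in another exception type, the check would need adjusting.

```python
import time


def retry_on_429(fetch, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch() and retry only when it fails with HTTP status 429.

    Assumption: the exception carries a `response` attribute with
    `status_code` and `headers`, as requests-style HTTP errors do.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as err:
            response = getattr(err, "response", None)
            status = getattr(response, "status_code", None)
            if status != 429 or attempt == max_attempts:
                raise  # not a rate limit, or out of attempts
            headers = getattr(response, "headers", None) or {}
            try:
                # Prefer the server-provided wait time in seconds.
                delay = float(headers.get("Retry-After"))
            except (TypeError, ValueError):
                # Header missing or not a number: exponential backoff.
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
```

A download would be wrapped as `retry_on_429(lambda: QwenImageLayeredPipeline.from_pretrained("Qwen/Qwen-Image-Layered"))`. The `sleep` parameter exists only so the waiting can be replaced in tests.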

Complete code:

    from diffusers import QwenImageLayeredPipeline
    import torch
    from PIL import Image


    def pick_best_gpu():
        # Return the index of the GPU with the most free memory
        best_i, best_free = 0, -1
        for i in range(torch.cuda.device_count()):
            torch.cuda.set_device(i)
            free, total = torch.cuda.mem_get_info()
            if free > best_free:
                best_i, best_free = i, free
        return best_i


    gpu_idx = pick_best_gpu()
    device = torch.device(f"cuda:{gpu_idx}")

    pipeline = QwenImageLayeredPipeline.from_pretrained("Qwen/Qwen-Image-Layered")
    # Move to the selected GPU (plain "cuda" would ignore the choice made above)
    pipeline = pipeline.to(device, torch.bfloat16)
    pipeline.set_progress_bar_config(disable=None)

    image = Image.open("test.jpg").convert("RGBA")
    inputs = {
        "image": image,
        "generator": torch.Generator(device=device).manual_seed(777),
        "true_cfg_scale": 4.0,
        "negative_prompt": " ",
        "num_inference_steps": 50,
        "num_images_per_prompt": 1,
        "layers": 4,
        "resolution": 640,  # Using different bucket (640, 1024) to determine the resolution. For this version, 640 is recommended
        "cfg_normalize": True,  # Whether enable cfg normalization.
        "use_en_prompt": True,  # Automatic caption language if user does not provide caption
    }

    with torch.inference_mode():
        output = pipeline(**inputs)
        output_image = output.images[0]

    # Save every returned layer as its own PNG
    for i, layer in enumerate(output_image):
        layer.save(f"{i}.png")

If you do not have enough VRAM on one card, you can enable balanced mode, which spreads the model over several GPUs so that no single card runs out of memory:

    from diffusers import QwenImageLayeredPipeline
    import torch
    from PIL import Image

    # 1. The manual GPU selection code is removed.
    # 2. device_map="balanced" distributes the model across GPUs automatically.
    #    "balanced" tries to spread memory evenly; "auto" may fill the first GPU first.
    pipeline = QwenImageLayeredPipeline.from_pretrained(
        "Qwen/Qwen-Image-Layered",
        torch_dtype=torch.bfloat16,
        device_map="balanced",  # key parameter: split the model across all available GPUs
        # If device_map is not supported for this model, try the optional optimizations in step 3
    )

    # Note: after using device_map, do not call pipeline.to("cuda"), or the placement is broken.

    # 3. Optional memory optimizations, if VRAM is still tight
    # pipeline.enable_model_cpu_offload()  # for running a large model on a single GPU
    # pipeline.enable_vae_slicing()        # lowers VRAM use during VAE decoding

    image = Image.open("test.jpg").convert("RGBA")
    inputs = {
        "image": image,
        # The generator needs no specific device index; "cuda" is enough and the pipeline handles the rest
        "generator": torch.Generator(device="cuda").manual_seed(777),
        "true_cfg_scale": 4.0,
        "negative_prompt": " ",
        "num_inference_steps": 50,
        "num_images_per_prompt": 1,
        "layers": 4,
        "resolution": 1024,
        "cfg_normalize": True,
        "use_en_prompt": True,
    }

    with torch.inference_mode():
        output = pipeline(**inputs)
        output_image = output.images[0]

    for i, img in enumerate(output_image):
        img.save(f"{i}.png")
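
The choice between the single-GPU script and balanced mode can be written down as a rule of thumb. The sketch below is only that: `choose_loading_strategy` is a hypothetical helper, and the 45 GiB default is borrowed from the peak figure quoted in the performance note (measured in a ComfyUI workflow), not from a measurement of this script. Having enough total memory across cards also does not guarantee that the components split cleanly.

```python
def choose_loading_strategy(free_gib_per_gpu, needed_gib=45.0):
    """Pick a loading mode from the free memory (GiB) on each visible GPU.

    needed_gib defaults to the 45 GB peak quoted in this post; it is a
    rough planning number, not a measured requirement.
    """
    if not free_gib_per_gpu:
        return "cpu_offload"  # no GPU information: be conservative
    if max(free_gib_per_gpu) >= needed_gib:
        return "single_gpu"   # pipeline.to(device, torch.bfloat16)
    if len(free_gib_per_gpu) > 1 and sum(free_gib_per_gpu) >= needed_gib:
        return "balanced"     # from_pretrained(..., device_map="balanced")
    return "cpu_offload"      # pipeline.enable_model_cpu_offload()
```

The input list could be built from `torch.cuda.mem_get_info(i)[0] / 2**30` for each GPU index `i`. With two cards that each report about 23.5 GiB free, the function suggests balanced mode.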

3. Common errors and fixes

Error: ImportError: peft>=0.17.0 is required … but found peft==0.15.1
Fix: upgrade peft.
```
pip install -U "peft>=0.17.0"
```
Error: 429 Client Error: Too Many Requests (especially when using hf-mirror.com)
Fix: set a Token and keep concurrency as low as possible; if necessary, wait a while before retrying.
```
export HF_ENDPOINT=https://hf-mirror.com
export HF_TOKEN="hf_xxx_your_token_here"
```
You can also pass token=… to from_pretrained or use huggingface_hub.login.
Error: Cannot load model … model is not cached locally and an error occurred while trying to fetch metadata
Fix: either use local_files_only=True in an offline setup and make sure the local directory is complete, or go online with a Token so that from_pretrained can fetch the metadata.
Error: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 56.56 MiB is free. Including non-PyTorch memory, this process has 23.45 GiB memory in use. Of the allocated memory 22.88 GiB is allocated by PyTorch, and 198.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Fix: a single 24GB card is not enough for this setup, so the direct fix is a GPU with more VRAM. As noted in the preface, peak usage can reach 45GB on an RTX Pro 6000 (120s for a 1024 px generation), and RTX 4090 users report that the workflow uses almost all VRAM. If a larger card is not an option, use the FP8 version to reduce memory usage, or use the balanced-mode code from section 2 of this post.
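Error: Could not import module 'Qwen2_5_VLForConditionalGeneration'. Are this object's requirements defined correctly?
Fix: most likely the installed PyTorch and torchvision versions do not match. Reinstall them by following this post: "[Install] PyTorch | A guide to checking and adjusting the CUDA version to fit a PyTorch install" (CSDN blog).
Unexpected result: the output is not a set of RGBA layers
Check that the input was converted with convert("RGBA"), that the correct Pipeline (QwenImageLayeredPipeline) is used, and that the model directory is complete.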
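
Several of the fixes above come down to "make sure the local model directory is complete". A minimal way to check that, assuming the usual diffusers layout in which `model_index.json` maps each component name to a `[library, class_name]` pair and each component lives in a subfolder of the same name, might look like this. `missing_components` is a hypothetical helper, and it only checks that folders exist and are non-empty, not that the weight files are intact.

```python
import json
from pathlib import Path


def missing_components(model_dir):
    """List what is missing from a diffusers-style local model directory."""
    root = Path(model_dir)
    index_path = root / "model_index.json"
    if not index_path.is_file():
        return ["model_index.json"]

    index = json.loads(index_path.read_text(encoding="utf-8"))
    missing = []
    for name, spec in index.items():
        if name.startswith("_"):
            continue  # metadata such as _class_name or _diffusers_version
        if not isinstance(spec, list) or len(spec) != 2 or None in spec:
            continue  # plain config value, or an optional component not shipped
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

An empty list means every component named in `model_index.json` has a non-empty folder; anything else names the pieces to download again.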

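4. Results

I used an image that someone sent me, so the original is not published here. It is a scrapbook-style journal page.

With the resolution set to 640 pixels, the result looks like this (I added the mosaic myself to avoid any fan disputes). The text and background are not separated very well and the image is slightly blurry. The run took about 23 minutes:

With the resolution set to 1024 pixels, the result is much sharper and the separation works well. The run took 39 minutes: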

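Conclusion

Qwen-Image-Layered is a diffusion model for image layering, and the QwenImageLayeredPipeline in diffusers is the most suitable way to load it.
The key to "loading a local model from ModelScope" is to first put the complete model directory on local disk (including model_index.json and the rest), then point from_pretrained at that directory and add local_files_only=True.
When the network is available, pull online with a mirror and HF_TOKEN and rely on the cache afterwards; when the network is poor or you need to work offline, load straight from the local directory.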
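
The local-directory route summarized above can be sketched as follows. Several parts are assumptions on my side: that the `modelscope` package is installed (`pip install modelscope`), that the model id on ModelScope is `Qwen/Qwen-Image-Layered`, and that `./models/Qwen-Image-Layered` is an acceptable target path. The helper names are hypothetical, and the heavy imports are kept inside the functions so the file can be imported on a machine without a GPU.

```python
def offline_kwargs(dtype=None):
    """Keyword arguments that keep from_pretrained away from the network."""
    kwargs = {"local_files_only": True}
    if dtype is not None:
        kwargs["torch_dtype"] = dtype
    return kwargs


def download_from_modelscope(model_id="Qwen/Qwen-Image-Layered",
                             local_dir="./models/Qwen-Image-Layered"):
    """Fetch the full model directory from ModelScope and return its path.

    Assumption: the model id and target directory are placeholders
    chosen for this sketch; adjust both to your setup.
    """
    from modelscope import snapshot_download  # optional dependency, imported lazily
    return snapshot_download(model_id, local_dir=local_dir)


def load_offline(local_dir):
    """Load the pipeline from a local directory without any network access."""
    import torch
    from diffusers import QwenImageLayeredPipeline

    return QwenImageLayeredPipeline.from_pretrained(
        local_dir, **offline_kwargs(torch.bfloat16)
    )
```

A typical session would call `download_from_modelscope()` once while online, then `load_offline("./models/Qwen-Image-Layered")` on later runs, followed by the same `pipeline.to(...)` or inference code shown in section 2.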

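If you run into other problems with different GPUs or version combinations, feel free to report them in the comments.

All posts on this account are original. Reposting is welcome; please credit the source: https://shandianchengzi.blog.csdn.net/article/details/156690977. Baidu and the various scraper sites are not reliable, so check search results carefully. Technical posts tend to go out of date, and I revise and update my posts from time to time, so please visit the source link for the latest version.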