Intel IPEX's strategy is to optimize existing optimizers rather than provide new optimizer classes

2026-01-31 08:11 #1116 by service

=== Intel IPEX optimizer path diagnostics ===
Operating system: win32
Python version: 3.11.14 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 18:30:03) [MSC v.1929 64 bit (AMD64)]
Torch version: 2.6.0+xpu
Intel IPEX version: 2.6.10+xpu

=== Checking IPEX module structure ===
1. Main ipex modules:
- FP32MathMode: <class 'enum.EnumType'>
- WarningType: <class 'enum.EnumType'>
- base_py_dll_path: <class 'str'>
- builtins: <class 'module'>
- cmake_prefix_path: <class 'str'>
- compatible_mode: <class 'function'>
- cpu: <class 'module'>
- ctypes: <class 'module'>
- disable_auto_channels_last: <class 'function'>
- distributed: <class 'module'>
- dll: <class 'str'>
- dll_path: <class 'str'>
- dll_paths: <class 'list'>
- dlls: <class 'list'>
- enable_auto_channels_last: <class 'function'>
- frontend: <class 'module'>
- fx: <class 'module'>
- get_fp32_math_mode: <class 'function'>
- glob: <class 'module'>
- has_cpu: <class 'function'>
- has_xpu: <class 'function'>
- intel_extension_for_pytorch: <class 'module'>
- ipex_version: <class 'str'>
- is_loaded: <class 'bool'>
- jit: <class 'module'>
- kernel32: <class 'ctypes.WinDLL'>
- last_error: <class 'int'>
- llm: <class 'module'>
- logger: <class 'intel_extension_for_pytorch.utils._logger._Logger'>
- matches: <class 're.Match'>
- nn: <class 'module'>
- optim: <class 'module'>
- optimize: <class 'function'>
- optimize_transformers: <class 'function'>
- os: <class 'module'>
- path_patched: <class 'bool'>
- pfiles_path: <class 'str'>
- platform: <class 'module'>
- prev_error_mode: <class 'int'>
- proxy_compute_eng: <class 'type'>
- proxy_log_level: <class 'type'>
- proxy_math_mode: <class 'type'>
- py_dll_path: <class 'str'>
- quantization: <class 'module'>
- re: <class 'module'>
- res: <class 'int'>
- set_fp32_math_mode: <class 'function'>
- sys: <class 'module'>
- th_dll_path: <class 'str'>
- torch_version: <class 'str'>
- transformers: <class 'module'>
- utils: <class 'module'>
- with_load_library_flags: <class 'bool'>
- xpu: <class 'module'>

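2. Checking optimizer-related modules:
✓ ipex.optim exists
Example attributes:

3. Checking specific optimizer classes:

4. Checking torch.xpu optimizer support:
✓ torch.xpu.optim exists
Contains:

5. Optimizer creation test:
✓ XPU available, testing on XPU
✓ Standard torch.optim.AdamW: created successfully
Optimizer type: <class 'torch.optim.adamw.AdamW'>
✗ IPEX optimizer (tried ipex.optim): unavailable
✗ IPEX optimizer (tried ipex.optimizers): unavailable

6. Checking the IPEX optimize function:
✓ ipex.optimize exists
✓ ipex.optimize applied successfully

=== Diagnostics complete ===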
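
A report like the one above can be produced with plain introspection. The following is a minimal sketch, not the script that generated this log: the helper names `list_public_attrs` and `has_attr` are hypothetical, and it relies only on `importlib`, `dir()` and `getattr()` from the standard library.

```python
import importlib


def list_public_attrs(module_name):
    """Return (name, type) pairs for a module's public attributes.

    Returns None if the module cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):  # OSError covers a failed native library load
        return None
    return [
        (name, type(getattr(module, name)))
        for name in sorted(dir(module))
        if not name.startswith("_")
    ]


def has_attr(module_name, attr_name):
    """True if module_name imports and exposes attr_name (e.g. an optimizer class)."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Probe the two locations the diagnostic tried for an IPEX AdamW class.
    for candidate in (
        "intel_extension_for_pytorch.optim",
        "intel_extension_for_pytorch.optimizers",
    ):
        print(candidate, "AdamW available:", has_attr(candidate, "AdamW"))
```

Probing with `has_attr` before instantiating anything avoids the `AttributeError` that a direct `ipex.optim.AdamW(...)` call would raise on a build that ships no such class.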

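Key point: Intel IPEX's strategy is to optimize existing optimizers rather than provide new optimizer classes. The correct approach is therefore to create the optimizer with torch first and then optimize it with IPEX.

🚀 Summary of the optimizer creation flow

Based on the diagnostic results, the correct flow for creating an XPU optimizer is: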

1. Create the optimizer with the standard torch.optim.AdamW
2. Call ipex.optimize() on the model and the optimizer
3. Return the optimized optimizer

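This is why the earlier attempt failed: the code tried to call ipex.optim.AdamW, which does not exist. In this build (IPEX 2.6.10+xpu) the ipex.optim module is present but exposes no standalone AdamW class; IPEX instead optimizes an existing optimizer through the ipex.optimize() function.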
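
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not LLaMA Factory's code: `optimize_with_ipex` and `demo` are hypothetical names, `inplace=True` is chosen here so that the optimizer keeps pointing at the caller's parameters, and the model is put in train mode first because IPEX's training examples do so before passing an optimizer. If IPEX is missing or `ipex.optimize` raises, the objects that were passed in are returned.

```python
def optimize_with_ipex(model, optimizer, dtype=None):
    """Return (model, optimizer, applied).

    applied is True only if ipex.optimize ran successfully; otherwise the
    objects that were passed in are returned.
    """
    try:
        import intel_extension_for_pytorch as ipex
    except (ImportError, OSError):  # not installed, or native library failed to load
        return model, optimizer, False
    try:
        # inplace=True (an assumption of this sketch): modify the given
        # objects instead of returning optimized copies.
        opt_model, opt_optimizer = ipex.optimize(
            model, optimizer=optimizer, dtype=dtype, inplace=True
        )
    except Exception:
        return model, optimizer, False
    return opt_model, opt_optimizer, True


def demo():
    """Run the three steps on a toy model; requires torch."""
    import torch

    use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    model = torch.nn.Linear(4, 2).to("xpu" if use_xpu else "cpu")
    model.train()
    # Step 1: a standard torch optimizer.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # Steps 2 and 3: let IPEX optimize it, then hand back the result.
    model, optimizer, applied = optimize_with_ipex(model, optimizer)
    return optimizer, applied
```

Returning the `applied` flag lets the caller log which path was taken, which is what the trainer code in the next post does with its info and warning messages.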

2026-01-31 08:13 #1117 by service

Replied by service on topic Intel IPEX's strategy is to optimize existing optimizers rather than provide new optimizer classes

Complete fix for trainer_utils.py (LLaMA Factory)

Code:

def _create_xpu_optimizer(
    model: "PreTrainedModel",
    training_args: "TrainingArguments",
    finetuning_args: "FinetuningArguments",
) -> Optional["torch.optim.Optimizer"]:
    """Create the XPU optimizer (fixed version, based on the diagnostic results)."""
    if not is_xpu_available():
        return None
    
    try:
        # Check for IPEX and import it
        try:
            import intel_extension_for_pytorch as ipex
            ipex_available = True
        except ImportError as e:
            logger.warning_rank0(f"Could not import Intel IPEX: {e}")
            ipex_available = False
        
        # Collect trainable parameters, split by whether weight decay applies
        decay_params, nodecay_params = [], []
        decay_param_names = _get_decay_parameter_names(model)
        for name, param in model.named_parameters():
            if param.requires_grad:
                if name in decay_param_names:
                    decay_params.append(param)
                else:
                    nodecay_params.append(param)

        # Build the parameter groups
        param_groups = [
            dict(params=nodecay_params, weight_decay=0.0),
            dict(params=decay_params, weight_decay=training_args.weight_decay),
        ]

        # Determine the dtype
        if training_args.fp16:
            dtype = torch.float16
        elif training_args.bf16:
            dtype = torch.bfloat16
        else:
            dtype = torch.float32

        # Log the XPU device and dtype (written for the Windows XPU setup)
        if torch.xpu.is_available() and hasattr(torch.xpu, 'device_count'):
            logger.info_rank0(f"Windows XPU environment: using XPU device, dtype {dtype}")
        
        # Create the optimizer
        if training_args.optim == "adamw_torch":
            # Use the standard torch AdamW
            optimizer = torch.optim.AdamW(
                param_groups,
                lr=training_args.learning_rate,
                betas=(training_args.adam_beta1, training_args.adam_beta2),
                eps=training_args.adam_epsilon,
                weight_decay=training_args.weight_decay,
            )

            # If IPEX is available, apply its optimizations
            if ipex_available:
                try:
                    # inplace=True so the returned optimizer keeps updating the
                    # caller's model; the default (False) works on copies
                    model, optimizer = ipex.optimize(
                        model,
                        optimizer=optimizer,
                        dtype=dtype,
                        level='O1',  # basic optimization level
                        auto_kernel_selection=True,
                        inplace=True,
                    )
                    logger.info_rank0("XPU optimizer: using torch.optim.AdamW + ipex.optimize")
                except Exception as ipex_err:
                    # Fall back to the standard optimizer
                    logger.warning_rank0(f"ipex.optimize failed, using the standard optimizer: {ipex_err}")
            else:
                logger.info_rank0("XPU optimizer: using standard torch.optim.AdamW (IPEX unavailable)")

            return optimizer
            
        else:
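            # Any other optimizer
            optim_class, optim_kwargs = Trainer.get_optimizer_cls_and_kwargs(training_args)
            optimizer = optim_class(param_groups, **optim_kwargs)

            # If IPEX is available, try to apply its optimizations
            if ipex_available:
                try:
                    # inplace=True so the returned optimizer keeps updating the
                    # caller's model; the default (False) works on copies
                    model, optimizer = ipex.optimize(
                        model,
                        optimizer=optimizer,
                        dtype=dtype,
                        inplace=True,
                    )
                    logger.info_rank0(f"XPU optimizer: using {training_args.optim} + ipex.optimize")
                except Exception as ipex_err:
                    logger.info_rank0(
                        f"XPU optimizer: using standard {training_args.optim} (IPEX optimization failed: {ipex_err})"
                    )

            return optimizer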
            
    except Exception as e:
        logger.warning_rank0(f"Failed to create XPU optimizer: {e}")
        # Fall back to a plain AdamW over all parameters
        try:
            return torch.optim.AdamW(
                model.parameters(),
                lr=training_args.learning_rate,
                betas=(training_args.adam_beta1, training_args.adam_beta2),
                eps=training_args.adam_epsilon,
                weight_decay=training_args.weight_decay,
            )
        except Exception:
            return None
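
The parameter-group step in the function above can be isolated into a small helper. This is a sketch with hypothetical names (`split_param_groups`, `decay_names`); it accepts any iterable of `(name, param)` pairs whose params expose `requires_grad`, which is what `model.named_parameters()` yields in PyTorch, so the stand-in objects below are only there to keep the example free of a torch dependency.

```python
from types import SimpleNamespace


def split_param_groups(named_params, decay_names, weight_decay):
    """Split trainable parameters into a no-decay group and a decay group."""
    decay, no_decay = [], []
    decay_names = set(decay_names)
    for name, param in named_params:
        if not param.requires_grad:
            continue  # frozen parameters are not handed to the optimizer
        (decay if name in decay_names else no_decay).append(param)
    return [
        {"params": no_decay, "weight_decay": 0.0},
        {"params": decay, "weight_decay": weight_decay},
    ]


# Stand-ins for torch parameters: only requires_grad matters here.
weight = SimpleNamespace(requires_grad=True)
bias = SimpleNamespace(requires_grad=True)
frozen = SimpleNamespace(requires_grad=False)

groups = split_param_groups(
    [("linear.weight", weight), ("linear.bias", bias), ("embed.weight", frozen)],
    decay_names=["linear.weight", "embed.weight"],
    weight_decay=0.01,
)
```

The resulting list can be passed as the first argument to `torch.optim.AdamW`, where each group's `weight_decay` overrides the constructor default.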
