SGLang fails to deploy properly


1. Possible cause: the installed torch version is incompatible with SGLang

-> Packages in the Python environment were updated (e.g. torch, deepspeed, transformers), breaking compatibility with the installed SGLang build.

Reference: "Exception: Capture CUDA graph failed: CUDA error: out of memory" (CSDN blog)

# The sglang version is pinned, but its dependencies (PyTorch, transformers, etc.) are not
pip install "sglang[all]>=0.4.6.post5" 

2. The server hangs after adding the external-network parameters (--host 0.0.0.0 --port 30000)

avail_mem=0.00 GB

Fix: add --disable-cuda-graph

References:

[Bug] ROCm6.1.2 sglang0.3.3 cuda graph coredump · Issue #1683 · sgl-project/sglang · GitHub
[Bug] Exception: Capture cuda graph failed: Could not run 'sgl_kernel::rmsnorm' with arguments from the 'CUDA' backend. · Issue #6322 · sgl-project/sglang · GitHub
[Bug] OOM for concurrent long requests · Issue #1030 · sgl-project/sglang · GitHub

Launch command with the fix applied:

python3 -m sglang.launch_server --model ./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --tp 1 --disable-cuda-graph

Failing run (launched with --host 0.0.0.0 --port 30000 and without --disable-cuda-graph)

Error output:
(myenv) ubun22:/mnt/c/Users/lms/Desktop/LLaVA-main/myserver$ python3 -m sglang.launch_server  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --tp 1 --host 0.0.0.0 --port 30000
[2025-06-01 00:01:32,639] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.def forward(ctx, input, weight, bias=None):
/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.def backward(ctx, grad_output):
[2025-06-01 00:01:55] server_args=ServerArgs(model_path='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', tokenizer_path='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization=None, quantization_param_path=None, context_length=None, device='cuda', served_model_name='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', chat_template=None, completion_template=None, is_embedding=False, enable_multimodal=None, revision=None, host='0.0.0.0', port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, tp_size=1, pp_size=1, max_micro_batch_size=None, stream_interval=1, stream_output=False, random_seed=314452693, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, show_time_cost=False, enable_metrics=False, bucket_time_to_first_token=None, bucket_e2e_request_latency=None, bucket_inter_token_latency=None, collect_tokens_histogram=False, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, api_key=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend=None, sampling_backend='flashinfer', grammar_backend='xgrammar', speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_nccl_nvls=False, enable_tokenizer_batch_encode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_ep_moe=False, enable_deepep_moe=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm=None, init_expert_location='trivial', enable_eplb=False, eplb_rebalance_num_iterations=1000, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=None, enable_expert_distribution_metrics=False, deepep_config=None, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through_selective', flashinfer_mla_disable_ragged=False, warmups=None, moe_dense_tp_size=None, n_share_experts_fusion=0, 
disable_chunked_prefix_cache=False, disable_fast_image_processor=False, mm_attention_backend=None, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, disaggregation_mode='null', disaggregation_bootstrap_port=8998, disaggregation_transfer_backend='mooncake', disaggregation_ib_device=None, pdlb_url=None)
[2025-06-01 00:03:16,258] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-06-01 00:03:16,258] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.def forward(ctx, input, weight, bias=None):
/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.def backward(ctx, grad_output):
/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.def forward(ctx, input, weight, bias=None):
/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.def backward(ctx, grad_output):
[2025-06-01 00:03:26] Attention backend not set. Use flashinfer backend by default.
[2025-06-01 00:03:26] Init torch distributed begin.
[2025-06-01 00:03:27] Init torch distributed ends. mem usage=0.00 GB
[2025-06-01 00:03:27] init_expert_location from trivial
[2025-06-01 00:04:01] Ignore import error when loading sglang.srt.models.deepseek_janus_pro. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:03] Ignore import error when loading sglang.srt.models.gemma3_causal. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:03] Ignore import error when loading sglang.srt.models.gemma3_mm. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:04] Ignore import error when loading sglang.srt.models.internvl. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:05] Ignore import error when loading sglang.srt.models.kimi_vl. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:05] Ignore import error when loading sglang.srt.models.kimi_vl_moonvit. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:06] Ignore import error when loading sglang.srt.models.llava. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:07] Ignore import error when loading sglang.srt.models.llavavid. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:07] Ignore import error when loading sglang.srt.models.minicpmo. Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:08] Ignore import error when loading sglang.srt.models.mistral. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:09] Ignore import error when loading sglang.srt.models.mllama. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:09] Ignore import error when loading sglang.srt.models.mllama4. Failed to import transformers.models.llama4.modeling_llama4 because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:10] Ignore import error when loading sglang.srt.models.pixtral. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:11] Ignore import error when loading sglang.srt.models.qwen2_5_vl. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:11] Ignore import error when loading sglang.srt.models.yivl. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/mnt/c/Users/lms/Desktop/LLaVA-main/myenv/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 00:04:15] Load weight begin. avail mem=6.92 GB
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [06:04<00:00, 364.60s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [06:04<00:00, 364.60s/it][2025-06-01 00:10:22] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=3.41 GB, mem usage=3.51 GB.
[2025-06-01 00:10:23] KV Cache is allocated. #tokens: 96589, K size: 1.29 GB, V size: 1.29 GB
[2025-06-01 00:10:23] Memory pool end. avail mem=0.00 GB
2025-06-01 00:10:26,469 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[2025-06-01 00:10:26] Capture cuda graph begin. This can take up to several minutes. avail mem=0.00 GB
[2025-06-01 00:10:27] Capture cuda graph bs [1, 2, 4, 8]
Capturing batches (avail_mem=0.00 GB):   0%|                                                                                 | 0/4 [00:00<?, ?it/s]2025-06-01 00:10:35,809 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[2025-06-01 00:10:50] Child process unexpectedly failed with an exit code 9. pid=114785
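Exit code 9 here most likely means the child process was killed (SIGKILL), commonly by the Linux/WSL out-of-memory killer: the memory pool log above shows avail mem=0.00 GB right before CUDA graph capture begins. Two hedged mitigations, both reflected in the server_args printed above — skip graph capture, or shrink the static memory pool so capture has headroom. The 0.6 value below is only illustrative, not a tested setting:

# Option 1: skip CUDA graph capture entirely
python3 -m sglang.launch_server --model ./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --tp 1 --disable-cuda-graph

# Option 2: reserve less memory for the static pool (default 0.88 per the server_args log;
# 0.6 is only an example value)
python3 -m sglang.launch_server --model ./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --tp 1 --mem-fraction-static 0.6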
Successful deployment
ubun22:/mnt/c/Users/lms/Desktop/LLaVA-main/myserver$ python3 -m sglang.launch_server --model ./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --tp 1 --disable-cuda-graph
INFO 06-01 01:24:55 [__init__.py:243] Automatically detected platform cuda.
[2025-06-01 01:24:57,256] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/ubun22/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.def forward(ctx, input, weight, bias=None):
/home/ubun22/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.def backward(ctx, grad_output):
INFO 06-01 01:24:57 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:24:57 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:24:57 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:24:58 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:24:58 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:24:58 [__init__.py:243] Automatically detected platform cuda.
[2025-06-01 01:25:01] server_args=ServerArgs(model_path='./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', tokenizer_path='./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization=None, quantization_param_path=None, context_length=None, device='cuda', served_model_name='./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', chat_template=None, completion_template=None, is_embedding=False, enable_multimodal=None, revision=None, host='127.0.0.1', port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, tp_size=1, pp_size=1, max_micro_batch_size=None, stream_interval=1, stream_output=False, random_seed=201101802, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, show_time_cost=False, enable_metrics=False, bucket_time_to_first_token=None, bucket_e2e_request_latency=None, bucket_inter_token_latency=None, collect_tokens_histogram=False, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, api_key=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend=None, sampling_backend='flashinfer', grammar_backend='xgrammar', speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_cuda_graph=True, disable_cuda_graph_padding=False, enable_nccl_nvls=False, enable_tokenizer_batch_encode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_ep_moe=False, enable_deepep_moe=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm=None, init_expert_location='trivial', enable_eplb=False, eplb_rebalance_num_iterations=1000, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=None, enable_expert_distribution_metrics=False, deepep_config=None, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through_selective', flashinfer_mla_disable_ragged=False, warmups=None, moe_dense_tp_size=None, n_share_experts_fusion=0, 
disable_chunked_prefix_cache=False, disable_fast_image_processor=False, mm_attention_backend=None, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, disaggregation_mode='null', disaggregation_bootstrap_port=8998, disaggregation_transfer_backend='mooncake', disaggregation_ib_device=None, pdlb_url=None)
INFO 06-01 01:25:06 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:06 [__init__.py:243] Automatically detected platform cuda.
[2025-06-01 01:25:07,689] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-06-01 01:25:07,689] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/ubun22/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.def forward(ctx, input, weight, bias=None):
/home/ubun22/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.def backward(ctx, grad_output):
/home/ubun22/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.def forward(ctx, input, weight, bias=None):
/home/ubun22/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.def backward(ctx, grad_output):
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
INFO 06-01 01:25:08 [__init__.py:243] Automatically detected platform cuda.
[2025-06-01 01:25:10] Attention backend not set. Use flashinfer backend by default.
[2025-06-01 01:25:10] Init torch distributed begin.
[2025-06-01 01:25:10] Init torch distributed ends. mem usage=0.00 GB
[2025-06-01 01:25:10] init_expert_location from trivial
^[[A[2025-06-01 01:25:14] Ignore import error when loading sglang.srt.models.deepseek_janus_pro. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.gemma3_causal. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.gemma3_mm. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.internvl. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.kimi_vl. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.kimi_vl_moonvit. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.llava. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.llavavid. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.minicpmo. Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.mistral. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.mllama. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:15] Ignore import error when loading sglang.srt.models.mllama4. Failed to import transformers.models.llama4.modeling_llama4 because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:16] Ignore import error when loading sglang.srt.models.pixtral. Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:16] Ignore import error when loading sglang.srt.models.qwen2_5_vl. cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:16] Ignore import error when loading sglang.srt.models.yivl. Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/home/ubun22/.local/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[2025-06-01 01:25:16] Load weight begin. avail mem=6.92 GB
INFO 06-01 01:25:16 [__init__.py:243] Automatically detected platform cuda.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [03:19<00:00, 199.58s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [03:19<00:00, 199.58s/it]INFO 06-01 01:28:36 [__init__.py:243] Automatically detected platform cuda.
[2025-06-01 01:28:36] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=3.41 GB, mem usage=3.51 GB.
[2025-06-01 01:28:37] KV Cache is allocated. #tokens: 96589, K size: 1.29 GB, V size: 1.29 GB
[2025-06-01 01:28:37] Memory pool end. avail mem=0.00 GB
2025-06-01 01:28:37,689 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[2025-06-01 01:28:38] max_total_num_tokens=96589, chunked_prefill_size=2048, max_prefill_tokens=16384, max_running_requests=2049, context_len=131072
[2025-06-01 01:28:39] INFO:     Started server process [26241]
[2025-06-01 01:28:39] INFO:     Waiting for application startup.
[2025-06-01 01:28:39] INFO:     Application startup complete.
[2025-06-01 01:28:39] INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
[2025-06-01 01:28:40] INFO:     127.0.0.1:54268 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-06-01 01:28:40] Prefill batch. #new-seq: 1, #new-token: 7, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0
2025-06-01 01:28:44,336 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
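Once Uvicorn reports the server on http://127.0.0.1:30000, a quick smoke test from another terminal confirms it is answering. The /get_model_info call matches the request visible in the log above; the /generate payload is a minimal sketch of SGLang's native generation endpoint — verify the field names against the docs of the installed sglang version:

# Check that the server is up and which model it serves (seen as GET /get_model_info above)
curl http://127.0.0.1:30000/get_model_info

# Minimal generation request against SGLang's native endpoint (sketch; adjust the
# payload schema if your sglang version differs)
curl http://127.0.0.1:30000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "The capital of France is", "sampling_params": {"max_new_tokens": 16, "temperature": 0}}'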


