Loading an ONNX model requires ONNX Runtime.

Inference code:

import torch
import onnxruntime as rt
from transformers import LlamaTokenizer

def generate_prompt(text):
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request. ### …