ibm-research/PowerMoE-3b

Tags: text-generation · transformers · safetensors · granitemoe · arxiv:2408.13359 · license: apache-2.0

A simple example of how to run inference with the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-research/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```
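The decoded text above still contains special tokens, and `generate` defaults to greedy decoding. A small variation, sketched with only standard `transformers` arguments (`do_sample`, `temperature`, `top_p`, `skip_special_tokens`) and illustrative, untuned values:

```python
# Sampled generation instead of greedy decoding; the sampling values
# here are illustrative defaults, not tuned for this model.
output = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,    # sample from the token distribution
    temperature=0.7,   # soften the logits
    top_p=0.9,         # nucleus sampling
)
# skip_special_tokens drops markers such as end-of-text from the output
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```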
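The final loop assumes a batch size of 1. For multiple prompts, inputs must be padded to a common length; the sketch below assumes the tokenizer has no pad token and reuses EOS (a common fallback), with left padding as recommended for decoder-only generation:

```python
# Batched prompts must be padded to a common length.
# Assumption: the tokenizer defines no pad token, so EOS is reused.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # left-pad for decoder-only generation

prompts = [
    "Write a code to find the maximum value in a list of numbers.",
    "Write a code to reverse a string.",
]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
output = model.generate(**batch, max_new_tokens=100)
for text in tokenizer.batch_decode(output, skip_special_tokens=True):
    print(text)
```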
