timm/mobilenetv4_conv_small_050.e3000_r224_in1k


Model card for mobilenetv4_conv_small_050.e3000_r224_in1k

A MobileNet-V4 image classification model. Trained on ImageNet-1k by Ross Wightman.

Trained with timm scripts using hyper-parameters inspired by the MobileNet-V4 paper with timm enhancements.

NOTE: So far, these are the only known MobileNet-V4 (MNV4) weights. Official weights for the TensorFlow models have not been released.

Model Details

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('mobilenetv4_conv_small_050.e3000_r224_in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
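The top-5 post-processing above can be sketched in isolation with a placeholder logits tensor (the 1000-class shape comes from ImageNet-1k; the random values stand in for real model output):

```python
import torch

# Placeholder logits for one image over ImageNet-1k's 1000 classes
logits = torch.randn(1, 1000)

# Same post-processing as above: softmax to probabilities, scaled to percent
probs = logits.softmax(dim=1) * 100
top5_prob, top5_idx = torch.topk(probs, k=5)

print(top5_prob.shape)  # torch.Size([1, 5])
print(top5_idx.shape)   # torch.Size([1, 5])
```

`torch.topk` returns the values already sorted in descending order, so `top5_prob[0][0]` is the most confident class.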

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'mobilenetv4_conv_small_050.e3000_r224_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 32, 112, 112])
    #  torch.Size([1, 16, 56, 56])
    #  torch.Size([1, 32, 28, 28])
    #  torch.Size([1, 48, 14, 14])
    #  torch.Size([1, 480, 7, 7])

    print(o.shape)
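The spatial sizes in the comments above follow directly from the per-stage output strides; a quick sketch of that arithmetic (the stride values are the usual 5-stage CNN reductions, stated here as an assumption rather than read from `model.feature_info`):

```python
img_size = 224  # default training resolution for this checkpoint

# Typical per-stage output strides for a 5-stage backbone (assumption)
reductions = [2, 4, 8, 16, 32]

spatial = [img_size // r for r in reductions]
print(spatial)  # [112, 56, 28, 14, 7]
```

These match the height/width of the five feature maps printed by the loop above.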

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'mobilenetv4_conv_small_050.e3000_r224_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 480, 7, 7) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
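A common use for these pooled embeddings is image-to-image similarity. A minimal sketch with random placeholder vectors (480 is this model's feature width, per the shapes above; real embeddings would come from the `num_classes=0` model):

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings for two images; in practice these come from
# model(transforms(img).unsqueeze(0)) with num_classes=0
emb_a = torch.randn(1, 480)
emb_b = torch.randn(1, 480)

# Cosine similarity along the feature dimension: shape (1,), values in [-1, 1]
sim = F.cosine_similarity(emb_a, emb_b)
print(sim.shape)  # torch.Size([1])
```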

Model Comparison

By Top-1

| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.99 | 97.294 | 32.59 | 544 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.772 | 97.344 | 32.59 | 480 |
| mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k | 84.64 | 97.114 | 32.59 | 448 |
| mobilenetv4_hybrid_large.ix_e600_r384_in1k | 84.356 | 96.892 | 37.76 | 448 |
| mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k | 84.314 | 97.102 | 32.59 | 384 |
| mobilenetv4_hybrid_large.e600_r384_in1k | 84.266 | 96.936 | 37.76 | 448 |
| mobilenetv4_hybrid_large.ix_e600_r384_in1k | 83.990 | 96.702 | 37.76 | 384 |
| mobilenetv4_conv_aa_large.e600_r384_in1k | 83.824 | 96.734 | 32.59 | 480 |
| mobilenetv4_hybrid_large.e600_r384_in1k | 83.800 | 96.770 | 37.76 | 384 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 83.394 | 96.760 | 11.07 | 448 |
| mobilenetv4_conv_large.e600_r384_in1k | 83.392 | 96.622 | 32.59 | 448 |
| mobilenetv4_conv_aa_large.e600_r384_in1k | 83.244 | 96.392 | 32.59 | 384 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.99 | 96.67 | 11.07 | 320 |
| mobilenetv4_hybrid_medium.ix_e550_r384_in1k | 82.968 | 96.474 | 11.07 | 384 |
| mobilenetv4_conv_large.e600_r384_in1k | 82.952 | 96.266 | 32.59 | 384 |
| mobilenetv4_conv_large.e500_r256_in1k | 82.674 | 96.31 | 32.59 | 320 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 82.492 | 96.278 | 11.07 | 320 |
| mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k | 82.364 | 96.256 | 11.07 | 256 |
| mobilenetv4_conv_large.e500_r256_in1k | 81.862 | 95.69 | 32.59 | 256 |
| resnet50d.ra4_e3600_r224_in1k | 81.838 | 95.922 | 25.58 | 288 |
| mobilenetv3_large_150d.ra4_e3600_r256_in1k | 81.806 | 95.9 | 14.62 | 320 |
| mobilenetv4_hybrid_medium.ix_e550_r256_in1k | 81.446 | 95.704 | 11.07 | 256 |
| efficientnet_b1.ra4_e3600_r240_in1k | 81.440 | 95.700 | 7.79 | 288 |
| mobilenetv4_hybrid_medium.e500_r224_in1k | 81.276 | 95.742 | 11.07 | 256 |
| resnet50d.ra4_e3600_r224_in1k | 80.952 | 95.384 | 25.58 | 224 |
| mobilenetv3_large_150d.ra4_e3600_r256_in1k | 80.944 | 95.448 | 14.62 | 256 |
| mobilenetv4_conv_medium.e500_r256_in1k | 80.858 | 95.768 | 9.72 | 320 |
| mobilenet_edgetpu_v2_m.ra4_e3600_r224_in1k | 80.680 | 95.442 | 8.46 | 256 |
| mobilenetv4_hybrid_medium.e500_r224_in1k | 80.442 | 95.38 | 11.07 | 224 |
| efficientnet_b1.ra4_e3600_r240_in1k | 80.406 | 95.152 | 7.79 | 240 |
| mobilenetv4_conv_blur_medium.e500_r224_in1k | 80.142 | 95.298 | 9.72 | 256 |
| mobilenet_edgetpu_v2_m.ra4_e3600_r224_in1k | 80.130 | 95.002 | 8.46 | 224 |
| mobilenetv4_conv_medium.e500_r256_in1k | 79.928 | 95.184 | 9.72 | 256 |
| mobilenetv4_conv_medium.e500_r224_in1k | 79.808 | 95.186 | 9.72 | 256 |
| mobilenetv4_conv_blur_medium.e500_r224_in1k | 79.438 | 94.932 | 9.72 | 224 |
| efficientnet_b0.ra4_e3600_r224_in1k | 79.364 | 94.754 | 5.29 | 256 |
| mobilenetv4_conv_medium.e500_r224_in1k | 79.094 | 94.77 | 9.72 | 224 |
| efficientnet_b0.ra4_e3600_r224_in1k | 78.584 | 94.338 | 5.29 | 224 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 77.600 | 93.804 | 6.27 | 256 |
| mobilenetv3_large_100.ra4_e3600_r224_in1k | 77.164 | 93.336 | 5.48 | 256 |
| mobilenetv1_125.ra4_e3600_r224_in1k | 76.924 | 93.234 | 6.27 | 224 |
| mobilenetv1_100h.ra4_e3600_r224_in1k | 76.596 | 93.272 | 5.28 | 256 |
| mobilenetv3_large_100.ra4_e3600_r224_in1k | 76.310 | 92.846 | 5.48 | 224 |
| mobilenetv1_100.ra4_e3600_r224_in1k | 76.094 | 93.004 | 4.23 | 256 |
| mobilenetv1_100h.ra4_e3600_r224_in1k | 75.662 | 92.504 | 5.28 | 224 |
| mobilenetv1_100.ra4_e3600_r224_in1k | 75.382 | 92.312 | 4.23 | 224 |
| mobilenetv4_conv_small.e2400_r224_in1k | 74.616 | 92.072 | 3.77 | 256 |
| mobilenetv4_conv_small.e1200_r224_in1k | 74.292 | 92.116 | 3.77 | 256 |
| mobilenetv4_conv_small.e2400_r224_in1k | 73.756 | 91.422 | 3.77 | 224 |
| mobilenetv4_conv_small.e1200_r224_in1k | 73.454 | 91.34 | 3.77 | 224 |
| mobilenetv4_conv_small_050.e3000_r224_in1k | 65.810 | 86.424 | 2.24 | 256 |
| mobilenetv4_conv_small_050.e3000_r224_in1k | 64.762 | 85.514 | 2.24 | 224 |
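The accuracy/size trade-off the table captures can be made concrete with a few rows pulled out by hand (values copied from the table; the "top-1 per million parameters" ratio is purely illustrative, not a metric from the paper):

```python
# (top1 accuracy, param_count in millions), copied from the comparison table
rows = {
    "mobilenetv4_conv_small_050.e3000_r224_in1k": (65.810, 2.24),
    "mobilenetv4_conv_small.e2400_r224_in1k":     (74.616, 3.77),
    "mobilenetv4_hybrid_large.ix_e600_r384_in1k": (84.356, 37.76),
}

# Illustrative ratio: top-1 points per million parameters
efficiency = {name: round(top1 / params, 2) for name, (top1, params) in rows.items()}

for name, eff in sorted(efficiency.items(), key=lambda kv: -kv[1]):
    print(f"{eff:6.2f}  {name}")
```

By this (crude) measure the 0.5-width small model covered by this card is the most parameter-efficient of the three, even though its absolute top-1 is the lowest.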

Citation

@article{qin2024mobilenetv4,
  title={MobileNetV4-Universal Models for the Mobile Ecosystem},
  author={Qin, Danfeng and Leichner, Chas and Delakis, Manolis and Fornoni, Marco and Luo, Shixin and Yang, Fan and Wang, Weijun and Banbury, Colby and Ye, Chengxi and Akin, Berkin and others},
  journal={arXiv preprint arXiv:2404.10518},
  year={2024}
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}