timm/convnext_tiny.in12k_ft_in1k

Model card for convnext_tiny.in12k_ft_in1k

A ConvNeXt image classification model. Pretrained in timm on ImageNet-12k (an 11,821-class subset of full ImageNet-22k) and fine-tuned on ImageNet-1k by Ross Wightman.

ImageNet-12k pretraining was done on TPUs thanks to support from the TRC program.

Fine-tuning was performed on 8x GPU Lambda Labs cloud instances.

Model Details

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnext_tiny.in12k_ft_in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
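
The class indices returned by torch.topk refer to the model's ImageNet-1k output space. As a minimal follow-up sketch (plain PyTorch, using only the variables defined above), you can pair each index with its probability:

# iterate over the top-5 (probability, class index) pairs for the single image
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f'class index {idx.item():>4}: {prob.item():.2f}%')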

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_tiny.in12k_ft_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 96, 56, 56])
    #  torch.Size([1, 192, 28, 28])
    #  torch.Size([1, 384, 14, 14])
    #  torch.Size([1, 768, 7, 7])

    print(o.shape)
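
When created with features_only=True, the model also exposes metadata about the returned maps through model.feature_info, and the returned stages can be restricted with the out_indices argument. A hedged sketch (the values noted in the comments are what this model is expected to report, based on the shapes printed above):

import timm

# build a feature extractor that returns only the last two stages
model = timm.create_model(
    'convnext_tiny.in12k_ft_in1k',
    pretrained=True,
    features_only=True,
    out_indices=(2, 3),
)

# channel counts and spatial reduction factors of the selected stages,
# e.g. [384, 768] and [16, 32] for this model
print(model.feature_info.channels())
print(model.feature_info.reduction())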

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_tiny.in12k_ft_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 768, 7, 7) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
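
Pooled embeddings like these are commonly used for image retrieval or similarity search. A minimal sketch, assuming two PIL images img1 and img2 are already loaded (hypothetical variable names, not part of the example above) and model was created with num_classes=0 as shown earlier:

import torch
import torch.nn.functional as F

# stack the two preprocessed images into a batch of 2 and embed them
batch = torch.stack([transforms(img1), transforms(img2)])
with torch.no_grad():
    embeddings = model(batch)  # (2, num_features) pooled features

# cosine similarity between the two embeddings, in [-1, 1]
similarity = F.cosine_similarity(embeddings[0:1], embeddings[1:2]).item()
print(f'cosine similarity: {similarity:.3f}')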

Model Comparison

Explore the dataset and runtime metrics of this model in timm model results.

All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 with AMP.

| model | top1 | top5 | img_size | param_count | gmacs | macts | samples_per_sec | batch_size |
|---|---|---|---|---|---|---|---|---|
| convnextv2_huge.fcmae_ft_in22k_in1k_512 | 88.848 | 98.742 | 512 | 660.29 | 600.81 | 413.07 | 28.58 | 48 |
| convnextv2_huge.fcmae_ft_in22k_in1k_384 | 88.668 | 98.738 | 384 | 660.29 | 337.96 | 232.35 | 50.56 | 64 |
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 | 122.45 | 256 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 | 196.84 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k_384 | 88.196 | 98.532 | 384 | 197.96 | 101.1 | 126.74 | 128.94 | 128 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 | 283.42 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k_384 | 87.75 | 98.556 | 384 | 350.2 | 179.2 | 168.99 | 124.85 | 192 |
| convnextv2_base.fcmae_ft_in22k_in1k_384 | 87.646 | 98.422 | 384 | 88.72 | 45.21 | 84.49 | 209.51 | 256 |
| convnext_large.fb_in22k_ft_in1k_384 | 87.476 | 98.382 | 384 | 197.77 | 101.1 | 126.74 | 194.66 | 256 |
| convnext_large_mlp.clip_laion2b_augreg_ft_in1k | 87.344 | 98.218 | 256 | 200.13 | 44.94 | 56.33 | 438.08 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k | 87.26 | 98.248 | 224 | 197.96 | 34.4 | 43.13 | 376.84 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 | 365.47 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k | 87.002 | 98.208 | 224 | 350.2 | 60.98 | 57.5 | 368.01 | 256 |
| convnext_base.fb_in22k_ft_in1k_384 | 86.796 | 98.264 | 384 | 88.59 | 45.21 | 84.49 | 366.54 | 256 |
| convnextv2_base.fcmae_ft_in22k_in1k | 86.74 | 98.022 | 224 | 88.72 | 15.38 | 28.75 | 624.23 | 256 |
| convnext_large.fb_in22k_ft_in1k | 86.636 | 98.028 | 224 | 197.77 | 34.4 | 43.13 | 581.43 | 256 |
| convnext_base.clip_laiona_augreg_ft_in1k_384 | 86.504 | 97.97 | 384 | 88.59 | 45.21 | 84.49 | 368.14 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 | 816.14 | 256 |
| convnextv2_huge.fcmae_ft_in1k | 86.256 | 97.75 | 224 | 660.29 | 115.0 | 79.07 | 154.72 | 256 |
| convnext_small.in12k_ft_in1k_384 | 86.182 | 97.92 | 384 | 50.22 | 25.58 | 63.37 | 516.19 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in1k | 86.154 | 97.68 | 256 | 88.59 | 20.09 | 37.55 | 819.86 | 256 |
| convnext_base.fb_in22k_ft_in1k | 85.822 | 97.866 | 224 | 88.59 | 15.38 | 28.75 | 1037.66 | 256 |
| convnext_small.fb_in22k_ft_in1k_384 | 85.778 | 97.886 | 384 | 50.22 | 25.58 | 63.37 | 518.95 | 256 |
| convnextv2_large.fcmae_ft_in1k | 85.742 | 97.584 | 224 | 197.96 | 34.4 | 43.13 | 375.23 | 256 |
| convnext_small.in12k_ft_in1k | 85.174 | 97.506 | 224 | 50.22 | 8.71 | 21.56 | 1474.31 | 256 |
| convnext_tiny.in12k_ft_in1k_384 | 85.118 | 97.608 | 384 | 28.59 | 13.14 | 39.48 | 856.76 | 256 |
| convnextv2_tiny.fcmae_ft_in22k_in1k_384 | 85.112 | 97.63 | 384 | 28.64 | 13.14 | 39.48 | 491.32 | 256 |
| convnextv2_base.fcmae_ft_in1k | 84.874 | 97.09 | 224 | 88.72 | 15.38 | 28.75 | 625.33 | 256 |
| convnext_small.fb_in22k_ft_in1k | 84.562 | 97.394 | 224 | 50.22 | 8.71 | 21.56 | 1478.29 | 256 |
| convnext_large.fb_in1k | 84.282 | 96.892 | 224 | 197.77 | 34.4 | 43.13 | 584.28 | 256 |
| convnext_tiny.in12k_ft_in1k | 84.186 | 97.124 | 224 | 28.59 | 4.47 | 13.44 | 2433.7 | 256 |
| convnext_tiny.fb_in22k_ft_in1k_384 | 84.084 | 97.14 | 384 | 28.59 | 13.14 | 39.48 | 862.95 | 256 |
| convnextv2_tiny.fcmae_ft_in22k_in1k | 83.894 | 96.964 | 224 | 28.64 | 4.47 | 13.44 | 1452.72 | 256 |
| convnext_base.fb_in1k | 83.82 | 96.746 | 224 | 88.59 | 15.38 | 28.75 | 1054.0 | 256 |
| convnextv2_nano.fcmae_ft_in22k_in1k_384 | 83.37 | 96.742 | 384 | 15.62 | 7.22 | 24.61 | 801.72 | 256 |
| convnext_small.fb_in1k | 83.142 | 96.434 | 224 | 50.22 | 8.71 | 21.56 | 1464.0 | 256 |
| convnextv2_tiny.fcmae_ft_in1k | 82.92 | 96.284 | 224 | 28.64 | 4.47 | 13.44 | 1425.62 | 256 |
| convnext_tiny.fb_in22k_ft_in1k | 82.898 | 96.616 | 224 | 28.59 | 4.47 | 13.44 | 2480.88 | 256 |
| convnext_nano.in12k_ft_in1k | 82.282 | 96.344 | 224 | 15.59 | 2.46 | 8.37 | 3926.52 | 256 |
| convnext_tiny_hnf.a2h_in1k | 82.216 | 95.852 | 224 | 28.59 | 4.47 | 13.44 | 2529.75 | 256 |
| convnext_tiny.fb_in1k | 82.066 | 95.854 | 224 | 28.59 | 4.47 | 13.44 | 2346.26 | 256 |
| convnextv2_nano.fcmae_ft_in22k_in1k | 82.03 | 96.166 | 224 | 15.62 | 2.46 | 8.37 | 2300.18 | 256 |
| convnextv2_nano.fcmae_ft_in1k | 81.83 | 95.738 | 224 | 15.62 | 2.46 | 8.37 | 2321.48 | 256 |
| convnext_nano_ols.d1h_in1k | 80.866 | 95.246 | 224 | 15.65 | 2.65 | 9.38 | 3523.85 | 256 |
| convnext_nano.d1h_in1k | 80.768 | 95.334 | 224 | 15.59 | 2.46 | 8.37 | 3915.58 | 256 |
| convnextv2_pico.fcmae_ft_in1k | 80.304 | 95.072 | 224 | 9.07 | 1.37 | 6.1 | 3274.57 | 256 |
| convnext_pico.d1_in1k | 79.526 | 94.558 | 224 | 9.05 | 1.37 | 6.1 | 5686.88 | 256 |
| convnext_pico_ols.d1_in1k | 79.522 | 94.692 | 224 | 9.06 | 1.43 | 6.5 | 5422.46 | 256 |
| convnextv2_femto.fcmae_ft_in1k | 78.488 | 93.98 | 224 | 5.23 | 0.79 | 4.57 | 4264.2 | 256 |
| convnext_femto_ols.d1_in1k | 77.86 | 93.83 | 224 | 5.23 | 0.82 | 4.87 | 6910.6 | 256 |
| convnext_femto.d1_in1k | 77.454 | 93.68 | 224 | 5.22 | 0.79 | 4.57 | 7189.92 | 256 |
| convnextv2_atto.fcmae_ft_in1k | 76.664 | 93.044 | 224 | 3.71 | 0.55 | 3.81 | 4728.91 | 256 |
| convnext_atto_ols.a2_in1k | 75.88 | 92.846 | 224 | 3.7 | 0.58 | 4.11 | 7963.16 | 256 |
| convnext_atto.d2_in1k | 75.664 | 92.9 | 224 | 3.7 | 0.55 | 3.81 | 8439.22 | 256 |

Citation

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
@article{liu2022convnet,
  author  = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  title   = {A ConvNet for the 2020s},
  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2022},
}