A SegFormer-B4 model fine-tuned for human parsing with 18 semantic classes, optimized for fashion and virtual try-on applications.
This model segments human images into 18 semantic categories including body parts (face, hair, arms, hands, legs, feet, torso), clothing items (top, dress, skirt, pants, belt, scarf), and accessories (bag, hat, glasses, jewelry).
```python
from transformers import pipeline

pipe = pipeline("image-segmentation", model="fashn-ai/fashn-human-parser")
result = pipe("image.jpg")
# result is a list of dicts with 'label', 'score', 'mask' for each detected class
```
The pipeline automatically manages GPU/CPU and returns per-class masks at the original image resolution.
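The pipeline returns one mask per class. If you need a single (H, W) array of class IDs, you can merge those masks. This is a minimal numpy sketch. `masks_to_label_map` is a hypothetical helper, and it assumes the checkpoint's `config.label2id` uses the label names from the class table.

```python
import numpy as np


def masks_to_label_map(results, label2id):
    """Merge per-class pipeline masks into one (H, W) array of class IDs.

    results:  list of dicts with 'label' and 'mask' (a PIL image or array that
              is nonzero where the class is present), as returned by the pipeline.
    label2id: mapping from label string to class ID (assumed to match the table).
    Pixels not covered by any mask are left as 0 (background).
    """
    if not results:
        raise ValueError("results is empty")
    height, width = np.asarray(results[0]["mask"]).shape[:2]
    label_map = np.zeros((height, width), dtype=np.uint8)
    for item in results:
        # Semantic masks do not overlap, so the write order does not matter.
        mask = np.asarray(item["mask"]) > 0
        label_map[mask] = label2id[item["label"]]
    return label_map
```

Usage, under the same assumption about the config: `label_map = masks_to_label_map(result, pipe.model.config.label2id)`.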
To call the model directly instead of going through the pipeline:

```python
from PIL import Image
import torch
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

# Load model and processor
processor = SegformerImageProcessor.from_pretrained("fashn-ai/fashn-human-parser")
model = SegformerForSemanticSegmentation.from_pretrained("fashn-ai/fashn-human-parser")

# Load and preprocess image (convert to RGB in case the file is RGBA or grayscale)
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # (1, 18, H/4, W/4), where H, W are the processor's input size

# Upsample logits to the original image size (PIL size is (W, H), so reverse it)
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
predictions = upsampled.argmax(dim=1)[0].numpy()  # (H, W) array of class IDs 0-17
```
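`predictions` is an (H, W) array of class IDs, which is hard to inspect by eye. One common way to view it is to index a color palette with it. This is a sketch: the `colorize` helper and its seeded random palette are illustrative choices, not part of the model or the package.

```python
import numpy as np


def colorize(predictions, num_classes=18, seed=0):
    """Map an (H, W) array of class IDs to an (H, W, 3) RGB uint8 image.

    The palette is arbitrary: one random color per class from a fixed seed,
    with background (ID 0) forced to black.
    """
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes, 3), dtype=np.uint8)
    palette[0] = 0
    # Fancy indexing looks up one RGB row per pixel.
    return palette[predictions]
```

For example, `Image.fromarray(colorize(predictions)).save("parsing.png")` writes a color-coded PNG.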
For maximum accuracy, use our Python package, which implements the exact preprocessing used during training:

```bash
pip install fashn-human-parser
```

```python
from fashn_human_parser import FashnHumanParser

parser = FashnHumanParser()  # auto-detects GPU
segmentation = parser.predict("image.jpg")
# segmentation is a numpy array of shape (H, W) with class IDs 0-17
```
The package resizes with `cv2.INTER_AREA` (matching training), while the Hugging Face pipeline uses PIL `LANCZOS` resampling.
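To illustrate the difference: `INTER_AREA` resamples by pixel-area relation, and for integer downscale factors this reduces to averaging each block of source pixels. `LANCZOS`, by contrast, applies a windowed-sinc filter. The numpy sketch below covers only the integer-factor case. `area_downsample` is a hypothetical helper; it is not the package's code and does not reproduce OpenCV for non-integer ratios.

```python
import numpy as np


def area_downsample(image, factor):
    """Downscale an (H, W, C) image by an integer factor by averaging each
    factor x factor block (area interpolation for integer ratios).

    H and W must be divisible by `factor` in this simplified sketch.
    """
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    # Split each spatial axis into (blocks, pixels-within-block) and average.
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))
```

Because every output pixel is a plain average of inputs, area resizing cannot overshoot the input range. A sinc-based filter such as Lanczos can do so near sharp edges, so the two methods produce slightly different inputs to the model.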
| ID | Label |
|---|---|
| 0 | background |
| 1 | face |
| 2 | hair |
| 3 | top |
| 4 | dress |
| 5 | skirt |
| 6 | pants |
| 7 | belt |
| 8 | bag |
| 9 | hat |
| 10 | scarf |
| 11 | glasses |
| 12 | arms |
| 13 | hands |
| 14 | legs |
| 15 | feet |
| 16 | torso |
| 17 | jewelry |
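Below, the table is transcribed as a Python dict, along with a small helper that reports the fraction of the image each class covers. The helper works on the (H, W) ID arrays produced by either the transformers example or the package. `ID2LABEL` and `class_coverage` are illustrative names. The checkpoint's own `config.id2label` presumably carries the same mapping, but this dict is copied from the table rather than read from the model.

```python
import numpy as np

# Class IDs and names, copied from the table above.
ID2LABEL = {
    0: "background", 1: "face", 2: "hair", 3: "top", 4: "dress",
    5: "skirt", 6: "pants", 7: "belt", 8: "bag", 9: "hat",
    10: "scarf", 11: "glasses", 12: "arms", 13: "hands", 14: "legs",
    15: "feet", 16: "torso", 17: "jewelry",
}


def class_coverage(segmentation):
    """Return {label: fraction of pixels} for every class present in an
    (H, W) array of class IDs 0-17."""
    counts = np.bincount(segmentation.ravel(), minlength=len(ID2LABEL))
    total = segmentation.size
    return {ID2LABEL[int(i)]: float(counts[i] / total) for i in np.flatnonzero(counts)}
```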
For virtual try-on applications, garment categories map to body regions and labels as follows:
| Category | Body Coverage | Relevant Labels |
|---|---|---|
| Tops | Upper body | top, dress, scarf |
| Bottoms | Lower body | skirt, pants, belt |
| One-pieces | Full body | top, dress, scarf, skirt, pants, belt |
Labels typically preserved (left unchanged) during virtual try-on: face, hair, jewelry, bag, glasses, hat.
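One way to turn these tables into masks is to take the union of a category's labels as the region to replace and the preserved labels as the region to keep. This is a sketch only. The category keys, the helper names, and the ID lists (copied from the tables above) are hypothetical, and a production try-on pipeline may build its masks differently.

```python
import numpy as np

# Garment category -> class IDs it covers (from the try-on table):
# top=3, dress=4, scarf=10, skirt=5, pants=6, belt=7.
CATEGORY_IDS = {
    "tops": [3, 4, 10],
    "bottoms": [5, 6, 7],
    "one-pieces": [3, 4, 10, 5, 6, 7],
}

# Labels typically preserved: face=1, hair=2, jewelry=17, bag=8, glasses=11, hat=9.
PRESERVED_IDS = [1, 2, 17, 8, 11, 9]


def garment_mask(segmentation, category):
    """Boolean (H, W) mask of pixels whose class belongs to the garment category."""
    return np.isin(segmentation, CATEGORY_IDS[category])


def preserve_mask(segmentation):
    """Boolean (H, W) mask of pixels whose class is typically kept unchanged."""
    return np.isin(segmentation, PRESERVED_IDS)
```

Because each pixel has exactly one class, the two masks never overlap for a given category.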
This model was fine-tuned on a proprietary dataset curated and annotated by FASHN AI, specifically designed for virtual try-on applications. The 18-class label schema was developed to capture the semantic regions most relevant for clothing transfer and human body understanding in fashion contexts.
```bibtex
@misc{fashn-human-parser,
  author    = {FASHN AI},
  title     = {FASHN Human Parser: SegFormer for Fashion Human Parsing},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/fashn-ai/fashn-human-parser}
}
```
This model inherits the NVIDIA Source Code License for SegFormer. Please review the license terms before use.