Preference-aligned 3D quality, step by step
The project builds criterion-wise supervision for real 3D assets, validates its alignment with human judgment, and distills it into an efficient 3D evaluator.
Real 3D assets need a different quality target.
Most existing 3D-QA benchmarks start from a small set of clean objects and inject synthetic degradations such as noise, compression, or downsampling. That setup measures distortion severity, but it does not fully capture naturally occurring artifacts in diverse asset repositories.
- Asset quality is shaped by geometry, texture, material, plausibility, and visible artifacts.
- Human preference matters for evaluation, curation, and downstream generation pipelines.
Direct per-asset scoring replaces reference dependence.
The task estimates a quality vector for an arbitrary 3D asset. It does not require a pristine reference, an artificial distortion operator, or pairwise A/B aggregation for every target asset.
Holistic
- Preference
- Plausibility
- Artifacts
Components
- Geometry
- Texture
- Material
Output
A quality vector that can diagnose where an asset is strong or weak.
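As a rough illustration of what such a quality vector could look like in code, the sketch below groups the six criteria listed above into one record. The class name, the 0 to 1 score range, and the convention that higher is better on every axis (including artifacts) are assumptions made for this example, not details stated by the project.

```python
from dataclasses import dataclass, asdict


# Hypothetical container: field names mirror the six criteria above.
# Assumed convention: scores lie in [0, 1] and higher is always better.
@dataclass(frozen=True)
class QualityVector:
    preference: float
    plausibility: float
    artifacts: float
    geometry: float
    texture: float
    material: float

    def weakest(self) -> str:
        """Name of the lowest-scoring criterion."""
        scores = asdict(self)
        return min(scores, key=scores.get)

    def strongest(self) -> str:
        """Name of the highest-scoring criterion."""
        scores = asdict(self)
        return max(scores, key=scores.get)


# Made-up scores for a single asset, purely for illustration.
example = QualityVector(
    preference=0.62, plausibility=0.80, artifacts=0.71,
    geometry=0.90, texture=0.35, material=0.48,
)
print(example.weakest(), example.strongest())
```

Keeping the axes as separate named fields, rather than collapsing them into one number, is what lets a consumer ask which criterion is dragging an asset down.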
Exemplar anchors ground subjective judgments.
The large-scale annotation pipeline injects human preference through stratified exemplar anchors. Each target is shown with multi-view evidence and quality-spanning references so that the judgment of the multimodal large language model (MLLM) is grounded in concrete examples.
- Visual prompting supplies criterion-specific exemplar anchors.
- Relative ranking positions the target among anchored quality levels.
- Repeated anchor sampling and ensembling reduce anchor-specific bias.
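The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the anchor pool layout, the `Judge` signature, the rank normalization, and the `toy_judge` stand-in for the MLLM are all hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Anchor = Tuple[str, int]                        # (exemplar asset id, quality level)
Judge = Callable[[str, Sequence[Anchor]], int]  # number of anchors the target outranks


def sample_anchors(pool: Dict[int, List[str]], rng: random.Random) -> List[Anchor]:
    """Draw one exemplar per quality level so the anchors span the whole range."""
    return [(rng.choice(ids), level) for level, ids in sorted(pool.items())]


def anchored_score(target: str, pool: Dict[int, List[str]], judge: Judge,
                   rounds: int = 5, seed: int = 0) -> float:
    """Average the target's normalized rank position over several anchor draws."""
    rng = random.Random(seed)
    positions = []
    for _ in range(rounds):
        anchors = sample_anchors(pool, rng)
        outranked = judge(target, anchors)      # integer in 0..len(anchors)
        positions.append(outranked / len(anchors))
    return mean(positions)


# Toy stand-in for the MLLM judge: it looks up a made-up latent quality.
LATENT = {"chair_042": 2.5}


def toy_judge(target: str, anchors: Sequence[Anchor]) -> int:
    return sum(1 for _, level in anchors if level < LATENT[target])


# Hypothetical anchor pool for one criterion: quality level -> exemplar ids.
pool = {1: ["a1", "a2"], 2: ["b1", "b2"], 3: ["c1", "c2"], 4: ["d1", "d2"]}
score = anchored_score("chair_042", pool, toy_judge)
print(score)
```

Resampling the anchors on every round means no single exemplar decides the outcome, which is the intuition behind ensembling to reduce anchor-specific bias.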
The supervision scales across assets and criteria.
3D-PAQA gathers preference-aligned annotations for over 260K Objaverse-based assets across ten semantic domains. The criteria separate overall perception from component-level weaknesses.
- Natural artifacts instead of hand-injected distortions.
- Six quality signals for richer diagnosis than one global score.
- MLLM-aligned labels designed for training and analysis.
A compact 3D model distills the quality signal.
The project trains a lightweight evaluator on 3D-PAQA annotations. A Point Transformer-v3 backbone predicts criterion-wise quality directly from sampled 3D features.
- Inputs include point cloud signals such as RGB, normals, and PBR attributes.
- The evaluator has 33M parameters, roughly 0.046% of the 72B teacher scale.
- Preference-aligned supervision remains useful after distillation.
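The size comparison above can be checked with a few lines of arithmetic. The two parameter counts come from the text; the variable names are only illustrative.

```python
student_params = 33e6   # 33M-parameter evaluator, as stated above
teacher_params = 72e9   # 72B-parameter teacher, as stated above

# Student size as a percentage of the teacher size.
ratio_percent = 100 * student_params / teacher_params
# How many times larger the teacher is than the student.
size_factor = teacher_params / student_params

print(f"{ratio_percent:.3f}% of the teacher, about {size_factor:.0f}x smaller")
```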
Criterion-wise scores become diagnostic signals.
The analysis shows that 3D-PAQA annotations align with human preferences and capture perceptual quality beyond simple geometric complexity. Component-level axes also reveal localized failures.
- Relative ranking with the 72B teacher model is adopted as the source of preference-aligned supervision.
- The compact evaluator achieves strong human alignment on overall preference.
- High geometry quality does not guarantee strong texture or material quality.
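One common way to quantify alignment between predicted scores and human ratings is a rank correlation. The sketch below implements Spearman's coefficient in plain Python as an example; the text does not say which metric the project reports, so treat the choice of Spearman and the toy numbers as assumptions.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up ratings for five assets, for illustration only.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [0.10, 0.35, 0.30, 0.70, 0.90]
rho = spearman(human, predicted)
print(round(rho, 2))
```

The same function could be applied between two criteria, for example geometry and texture scores, to see how weakly one axis predicts another.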
Applications and current limits
The quality signal can support evaluation, repository curation, and future quality-guided generation or refinement. The current work builds the supervision and evaluator foundation; full downstream validation, transfer to AI-generated assets, and application-specific model ranking remain future work.