Data Cleaning & Formatting for AI-Driven Garment Pattern Generation
1. Introduction
This section details the data cleaning and formatting process developed to transform raw garment pattern data into a machine learning (ML)–ready format. The raw dataset, provided by URBN, comprises over 10,000 garment patterns in CLO3D’s proprietary .ZPRJ format. The goal is to convert these files into formats—such as PNG, DXF, SVG, and JSON—that can be easily ingested by our ML models (ChatGPT-4 and Gemini) for subsequent training tasks.
 |
CLO3D UI displaying a ZPRJ file
|
 |
| CLO3D Python Script API UI |
2. Desired Data Format for AI Training
The end goal of this project is to produce high-quality training data pairs specifically tailored for AI fine-tuning. Each training pair consists of two key components:
Input: A JSON-structured garment description containing detailed attributes about the garment. This includes:
- Garment Category: e.g., womenswear, menswear.
- Gender: e.g., female, male, unisex.
- Style: Overall style designation.
- Description: A comprehensive textual description of the garment.
- Style Details: Specific elements such as neckline, sleeve type, fit, and other design attributes.
- Pattern List: An array of pattern identifiers included in the garment design
Each JSON must adhere to strict standardization for consistent and accurate training outcomes. Below is a complete index of pattern terminologies that can be used in the JSON:
 |
| Garment Description Terminology Diagrams |
Output: The corresponding SVG code representing the garment’s flat sewing pattern. The SVG format is chosen for its scalability and precision in representing the garment's design details. This vectorized output is optimized through cleaning and minification steps to meet the file size and token requirements of our ML models.
{
"category": "womenswear",
"gender": "female",
"style": "",
"description": "This garment features a peplum-style top with a flared silhouette that leads into wide-leg trousers, giving it a relaxed and modern look.",
"styleDetails": {
"neckline": "square",
"lining": "",
"sleeveType": "cap sleeves",
"length": "full length",
"fit": "relaxed",
"rise": "high rise",
"closures": "front tie",
"hem": "flared",
"trims": {
"pockets": "two back pockets",
"buttons": "",
"zipper": "",
"rivets": "",
"thread": "",
"drawstrings": "",
"embellishments": ""
},
"dart_placement": "",
"otherDetails": "features distressed details on the trousers"
},
"patternList": [
{
"name": "OB1310614-SGON_17"
},
{
"name": "Pattern_4953493"
},
{
"name": "Pattern_4953494"
}
]
}
 |
Visual representation of a training data pair
 | Binary representation of a training data pair
|
|
The primary aim is to leverage raw .ZPRJ files and transform them into these structured training pairs using a multi-step process that involves the CLO3D API, Python scripting, AI generation, and manual verification. This approach ensures that our final dataset is both efficient and accurate, providing a foundation for AI garment pattern generation training.
3. Data Cleaning Process
3.1 Objectives
- File Format Transformation: Convert
.ZPRJ files into PNG, DXF, SVG, and JSON formats. - Compatibility: Ensure that the generated files are in a format that the ML models can process.
- Quality vs. File Size Trade-off: Optimize for sufficient detail while maintaining file sizes within acceptable token limits for ML training.
3.2 Conversion Approaches
Approach 1: ZPRJ > DXF > SVG
- Step 1: Convert the
.ZPRJ garment pattern to DXF format, and capture the front & back views in PNG format. - Script:
zprj_to_dxf.py
Generates garment front & back view PNG images and exports the pattern file in DXF format, this script must run in the Python Script window inside of CLO3D.
| Garment Front & Back Capture in PNG format Example |
 |
Pattern DXF file Example 1
|
 |
Pattern DXF file Example 2
|
- Step 2: Convert the DXF file into an SVG format.
- Script:
dxf_to_svg.py
This script processes DXF files and converts them into optimized SVG files. It performs DXF-to-SVG conversion, SVG cleaning, optimization, and further minification before saving the final SVG output.
Observation: Although this method produced detailed SVG files, the resulting file sizes were considerably large—often exceeding the maximum token limits required by our ML models. Though additional data processing was added including, remove extra white space and XML tags such as <style> from each SVG code; remove annotations such as title and size in the SVG file; remove <path> class name; remove move only commands in the SVG code space, etc.
 |
Converted SVG File (Visual Example 1)
|
 |
| Converted SVG File (Visual Example 2) |
3.3 Additional Data Processing Steps
{
"category": "womenswear",
"gender": "female",
"style": "",
"description": "This garment features a peplum-style top with a flared silhouette that leads into wide-leg trousers, giving it a relaxed and modern look.",
"styleDetails": {
"neckline": "square",
"lining": "",
"sleeveType": "cap sleeves",
"length": "full length",
"fit": "relaxed",
"rise": "high rise",
"closures": "front tie",
"hem": "flared",
"trims": {
"pockets": "two back pockets",
"buttons": "",
"zipper": "",
"rivets": "",
"thread": "",
"drawstrings": "",
"embellishments": ""
},
"dart_placement": "",
"otherDetails": "features distressed details on the trousers"
},
"patternList": [
{
"name": "OB1310614-SGON_17"
},
{
"name": "Pattern_4953493"
},
{
"name": "Pattern_4953494"
}
]
}
Example of AI Generated Garment Description in JSON format
|
Quality Assurance:
- Script:
file_preview.py
Provides an automated UI for manual verification of the generated files (SVG, PNG, and JSON) to ensure data integrity and correct formatting.
 |
| File Preview User Interface (Example 1) |
|
 |
| File Preview User Interface (Example 2) |
3.4 Final Data Structure
Upon completion of the cleaning process, each garment design in the dataset includes the following components:
- SVG Pattern File: Contains the vector representation of the garment pattern.
- JSON File: Holds the text prompt detailing the design.
- PNG File of the Pattern: Provides a raster image of the garment pattern.
- Two PNG Files: Capture the front and back views of the garment design.
4. Preparing the JSONL Dataset for ML Training
4.1 Training Data Requirements
For effective ML model training, each example in the dataset requires:
- System Role: Define the AI assistant's task
- User Input: The textual description of the garment design (generated in JSON format).
- System Output: The corresponding SVG pattern file.
4.2 Dataset Compilation
To efficiently compile the training dataset:
- Script:
jsonl_generation.py
This script pairs the JSON text prompt (user input) with its corresponding SVG pattern file (system output) and formats them into a JSON Lines (JSONL) file. The JSONL format is ideal for large-scale training with LLMs, as it allows each training example to be processed individually.
 |
Training data in JSONL format (Example 1)
|
 |
Training data in JSONL format (Example 2)
|
Comments
Post a Comment