AI-Driven Garment Pattern Generation 02 - Data Cleaning & Formatting

February 06, 2025

AI-Driven Garment Pattern Generation 02 - Data Cleaning & Formatting

Data Cleaning & Formatting for AI-Driven Garment Pattern Generation

Project GitHub Repo: https://github.com/PINKDIAMONDVVS/ai_patternmaker_data

1. Introduction

This section details the data cleaning and formatting process developed to transform raw garment pattern data into a machine learning (ML)–ready format. The raw dataset, provided by URBN, comprises over 10,000 garment patterns in CLO3D’s proprietary .ZPRJ format. The goal is to convert these files into formats—such as PNG, DXF, SVG, and JSON—that can be easily ingested by our ML models (ChatGPT-4 and Gemini) for subsequent training tasks.

CLO3D UI displaying a ZPRJ file

CLO3D Python Script API UI

2. Desired Data Format for AI Training

The end goal of this project is to produce high-quality training data pairs specifically tailored for AI fine-tuning. Each training pair consists of two key components:

Input: A JSON-structured garment description containing detailed attributes about the garment. This includes:

Garment Category: e.g., womenswear, menswear.
Gender: e.g., female, male, unisex.
Style: Overall style designation.
Description: A comprehensive textual description of the garment.
Style Details: Specific elements such as neckline, sleeve type, fit, and other design attributes.
Pattern List: An array of pattern identifiers included in the garment design

Each JSON must adhere to strict standardization for consistent and accurate training outcomes. Below is a complete index of pattern terminologies that can be used in the JSON:

Garment Description Terminology Diagrams

Output: The corresponding SVG code representing the garment’s flat sewing pattern. The SVG format is chosen for its scalability and precision in representing the garment's design details. This vectorized output is optimized through cleaning and minification steps to meet the file size and token requirements of our ML models.

{

    "category": "womenswear",

    "gender": "female",

    "style": "",

    "description": "This garment features a peplum-style top with a flared silhouette that leads into wide-leg trousers, giving it a relaxed and modern look.",

    "styleDetails": {

        "neckline": "square",

        "lining": "",

        "sleeveType": "cap sleeves",

        "length": "full length",

        "fit": "relaxed",

        "rise": "high rise",

        "closures": "front tie",

        "hem": "flared",

        "trims": {

            "pockets": "two back pockets",

            "buttons": "",

            "zipper": "",

            "rivets": "",

            "thread": "",

            "drawstrings": "",

            "embellishments": ""

},

        "dart_placement": "",

        "otherDetails": "features distressed details on the trousers"

},

    "patternList": [

{

            "name": "OB1310614-SGON_17"

},

{

            "name": "Pattern_4953493"

},

{

            "name": "Pattern_4953494"

}

]

}

Visual representation of a training data pair

Binary representation of a training data pair

The primary aim is to leverage raw .ZPRJ files and transform them into these structured training pairs using a multi-step process that involves the CLO3D API, Python scripting, AI generation, and manual verification. This approach ensures that our final dataset is both efficient and accurate, providing a foundation for AI garment pattern generation training.

3. Data Cleaning Process

3.1 Objectives

File Format Transformation: Convert .ZPRJ files into PNG, DXF, SVG, and JSON formats.
Compatibility: Ensure that the generated files are in a format that the ML models can process.
Quality vs. File Size Trade-off: Optimize for sufficient detail while maintaining file sizes within acceptable token limits for ML training.

3.2 Conversion Approaches

Approach 1: ZPRJ > DXF > SVG

Step 1: Convert the .ZPRJ garment pattern to DXF format, and capture the front & back views in PNG format.

Script: zprj_to_dxf.py
Generates garment front & back view PNG images and exports the pattern file in DXF format, this script must run in the Python Script window inside of CLO3D.

Garment Front & Back Capture in PNG format Example

Pattern DXF file Example 1

Pattern DXF file Example 2

Step 2: Convert the DXF file into an SVG format.

Script: dxf_to_svg.py
This script processes DXF files and converts them into optimized SVG files. It performs DXF-to-SVG conversion, SVG cleaning, optimization, and further minification before saving the final SVG output.

Observation: Although this method produced detailed SVG files, the resulting file sizes were considerably large—often exceeding the maximum token limits required by our ML models. Though additional data processing was added including, remove extra white space and XML tags such as <style> from each SVG code; remove annotations such as title and size in the SVG file; remove <path> class name; remove move only commands in the SVG code space, etc.

Converted SVG File (Visual Example 1)

Converted SVG File (Visual Example 2)

3.3 Additional Data Processing Steps

Text Description Generation:

Script: png_to_json.py
Utilizes the generated front and back view PNG images to create a detailed textual description of each garment design in JSON format. This description serves as the user prompt for ML training.

{

    "category": "womenswear",

    "gender": "female",

    "style": "",

    "description": "This garment features a peplum-style top with a flared silhouette that leads into wide-leg trousers, giving it a relaxed and modern look.",

    "styleDetails": {

        "neckline": "square",

        "lining": "",

        "sleeveType": "cap sleeves",

        "length": "full length",

        "fit": "relaxed",

        "rise": "high rise",

        "closures": "front tie",

        "hem": "flared",

        "trims": {

            "pockets": "two back pockets",

            "buttons": "",

            "zipper": "",

            "rivets": "",

            "thread": "",

            "drawstrings": "",

            "embellishments": ""

},

        "dart_placement": "",

        "otherDetails": "features distressed details on the trousers"

},

    "patternList": [

{

            "name": "OB1310614-SGON_17"

},

{

            "name": "Pattern_4953493"

},

{

            "name": "Pattern_4953494"

}

]

}

Example of AI Generated Garment Description in JSON format

Quality Assurance:

Script: file_preview.py
Provides an automated UI for manual verification of the generated files (SVG, PNG, and JSON) to ensure data integrity and correct formatting.

File Preview User Interface (Example 1)

File Preview User Interface (Example 2)

3.4 Final Data Structure

Upon completion of the cleaning process, each garment design in the dataset includes the following components:

SVG Pattern File: Contains the vector representation of the garment pattern.
JSON File: Holds the text prompt detailing the design.
PNG File of the Pattern: Provides a raster image of the garment pattern.
Two PNG Files: Capture the front and back views of the garment design.

4. Preparing the JSONL Dataset for ML Training

4.1 Training Data Requirements

For effective ML model training, each example in the dataset requires:

System Role: Define the AI assistant's task
User Input: The textual description of the garment design (generated in JSON format).
System Output: The corresponding SVG pattern file.

4.2 Dataset Compilation

To efficiently compile the training dataset:

Script: jsonl_generation.py
This script pairs the JSON text prompt (user input) with its corresponding SVG pattern file (system output) and formats them into a JSON Lines (JSONL) file. The JSONL format is ideal for large-scale training with LLMs, as it allows each training example to be processed individually.

Training data in JSONL format (Example 1)

Training data in JSONL format (Example 2)

Search This Blog

A deep dive into WebGPU vs. WebGL for real-time cloth simulation