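MODIFIKASI ARSITEKTUR REAL-TIME DETECTION TRANSFORMER (RT-DETR) UNTUK INSTANCE SEGMENTATION YANG EFISIEN [Modification of the Real-Time Detection Transformer (RT-DETR) Architecture for Efficient Instance Segmentation]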

    Dwiki Fajar Kurniawan and Muhamad Nursalman and Yaya Wihardi (2025) MODIFIKASI ARSITEKTUR REAL-TIME DETECTION TRANSFORMER (RT-DETR) UNTUK INSTANCE SEGMENTATION YANG EFISIEN. S1 thesis, Universitas Pendidikan Indonesia.

    Abstract

    Real-time instance segmentation faces a fundamental trade-off between accuracy and speed. Fast object detection models such as the Real-Time Detection Transformer (RT-DETR) lack segmentation capability, while accurate segmentation models such as Mask DINO are too inefficient for real-time applications. This research modifies the RT-DETR architecture so that it can perform instance segmentation while maintaining its inference efficiency. The proposed method integrates three key components from Mask DINO into the RT-DETR architecture: a mask branch for pixel-level prediction, a hybrid matching mechanism, and unified denoising training. The resulting model, named Insta-RT-DETR, was trained and evaluated on the COCO 2017 dataset using a ResNet-50 backbone. Experimental results show that Insta-RT-DETR achieves highly competitive performance, with 42.5% Mask AP and 50.2% Box AP. Its primary advantage is efficiency: it runs at 35.6 FPS on an NVIDIA A100 GPU, significantly faster than Mask DINO (14.2 FPS), while being more accurate than other real-time models. In conclusion, Insta-RT-DETR achieves an optimal trade-off between accuracy and speed, making it a superior and well-balanced solution for real-time instance segmentation.
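The speed comparison in the abstract can be restated as per-frame latency and a relative speedup. The short Python sketch below does that arithmetic using only the two FPS figures quoted above. The helper names `latency_ms` and `speedup` are illustrative and not taken from the thesis, and treating latency as the reciprocal of throughput assumes frames are processed one at a time, which the abstract does not state.

```python
# Throughput figures quoted in the abstract (frames per second, NVIDIA A100).
FPS_INSTA_RT_DETR = 35.6
FPS_MASK_DINO = 14.2


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds, assuming one frame at a time."""
    return 1000.0 / fps


def speedup(fast_fps: float, slow_fps: float) -> float:
    """Ratio of the faster model's throughput to the slower model's."""
    return fast_fps / slow_fps


if __name__ == "__main__":
    print(f"Insta-RT-DETR: {latency_ms(FPS_INSTA_RT_DETR):.1f} ms/frame")
    print(f"Mask DINO:     {latency_ms(FPS_MASK_DINO):.1f} ms/frame")
    print(f"Speedup:       {speedup(FPS_INSTA_RT_DETR, FPS_MASK_DINO):.2f}x")
```

Under that assumption, the quoted figures correspond to roughly 28.1 ms per frame for Insta-RT-DETR against 70.4 ms for Mask DINO, a throughput ratio of about 2.5.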

    S_KOM_1903761_Title.pdf (5MB)
    S_KOM_1903761_Chapter1.pdf (2MB)
    S_KOM_1903761_Chapter2.pdf (13MB, restricted to library staff)
    S_KOM_1903761_Chapter3.pdf (3MB)
    S_KOM_1903761_Chapter4.pdf (8MB, restricted to library staff)
    S_KOM_1903761_Chapter5.pdf (909kB)
    Official URL: https://repository.upi.edu/
    Item Type: Thesis (S1)
    Additional Information: https://scholar.google.com/citations?view_op=new_profile&hl=en SINTA IDs of the thesis supervisors: Muhamad Nursalman: 6143456; Yaya Wihardi: 5994413
    Uncontrolled Keywords: Deep Learning, Instance Segmentation, Mask DINO, Object Detection, Real-Time, RT-DETR, Transformer
    Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    Divisions: Fakultas Pendidikan Matematika dan Ilmu Pengetahuan Alam > Program Studi Ilmu Komputer
    Depositing User: Dwiki Fajar Kurniawan
    Date Deposited: 09 Sep 2025 10:55
    Last Modified: 09 Sep 2025 10:55
    URI: http://repository.upi.edu/id/eprint/138327
