AlphaBit OpenML
Documentation
Android Studio Implementation
For the Low-Quality Camera Machine Learning Model (low.tflite) (support for high.tflite will be added soon)
package org.firstinspires.ftc.teamcode.drive.opmodes;

                    import android.content.res.AssetFileDescriptor;
                    import android.content.res.AssetManager;
                    import com.qualcomm.robotcore.eventloop.opmode.LinearOpMode;
                    import com.qualcomm.robotcore.eventloop.opmode.TeleOp;
                    import org.firstinspires.ftc.robotcore.external.hardware.camera.WebcamName;
                    import org.firstinspires.ftc.robotcore.internal.camera.calibration.CameraCalibration;
                    import org.firstinspires.ftc.vision.VisionPortal;
                    import org.firstinspires.ftc.vision.VisionProcessor;
                    import org.opencv.core.Core;
                    import org.opencv.core.CvType;
                    import org.opencv.core.Mat;
                    import org.opencv.imgproc.Imgproc;
                    import org.tensorflow.lite.Interpreter;

                    import android.graphics.Canvas;
                    import android.util.Size;
                    import java.io.FileInputStream;
                    import java.io.IOException;
                    import java.nio.ByteBuffer;
                    import java.nio.ByteOrder;
                    import java.nio.channels.FileChannel;
                    import java.util.ArrayList;
                    import java.util.List;

                    @TeleOp(name = "AI Vision", group = "AITesting")
                    public class AIVision extends LinearOpMode {

                        // Configuration
                        private static final String MODEL_FILE = "best_float32.tflite";
                        private static final int MODEL_SIZE = 320;
                        private static final float CONF_THRESHOLD = 0.6f;
                        private static final float IOU_THRESHOLD = 0.5f;

                        private VisionPortal visionPortal;

                        @Override
                        public void runOpMode() {
                            YoloProcessor frameProcessor = new YoloProcessor(hardwareMap.appContext.getAssets());

                            visionPortal = new VisionPortal.Builder()
                                    .setCamera(hardwareMap.get(WebcamName.class, "AICam"))
                                    .setStreamFormat(VisionPortal.StreamFormat.YUY2)
                                    .setCameraResolution(new Size(320, 240))
                                    .addProcessor(frameProcessor)
                                    .build();

                            waitForStart();

                            try {
                                while (opModeIsActive()) {
                                    List<Detection> detections = frameProcessor.getLatestDetections();
                                    telemetry.addData("Detections", detections.size());

                                    // Find the detection with the highest confidence.
                                    int bestConfidence = -1;   // index of the best detection
                                    double confCounter = 0;    // running maximum confidence
                                    for (int i = 0; i < detections.size(); i++) {
                                        if (detections.get(i).confidence > confCounter) {
                                            confCounter = detections.get(i).confidence;
                                            bestConfidence = i;
                                        }
                                    }
                                    if (bestConfidence >= 0) {
                                        Detection d = detections.get(bestConfidence);
                                        // Estimate the angle from the box aspect ratio, then apply an empirical linear correction.
                                        double angle = Math.toDegrees(Math.atan(d.height / d.width));
                                        double x = 6.4131;
                                        double y = 0.7223;
                                        double final_angle = (angle - x) / y;
                                        telemetry.addData(String.format("Obj %d", bestConfidence),
                                                "%s %.1f%% | W: %.1f H: %.1f Angle: %.1f",
                                                d.className(),
                                                d.confidence * 100,
                                                d.width,
                                                d.height,
                                                final_angle);
                                    }

                                    // The commented-out block below reports every detection; the loop above reports only the highest-confidence one.
                                    /*for (int i = 0; i < detections.size(); i++) {
                                        Detection d = detections.get(i);
                                        telemetry.addData(String.format("Obj %d", i),
                                                "%s %.1f%% | W: %.1f H: %.1f",
                                                d.className(),
                                                d.confidence * 100,
                                                d.width,
                                                d.height);
                                    }*/
                                    telemetry.update();
                                }
                            } finally {
                                visionPortal.close();
                                frameProcessor.close();
                            }
                        }

                        static class YoloProcessor implements VisionProcessor {
                            private final Interpreter tflite;
                            private final Object syncLock = new Object();
                            private List<Detection> detections = new ArrayList<>();

                            // Output buffer matching the model's [1, 7, 2100] tensor:
                            // 7 values per box (x, y, w, h + 3 class scores) for 2100 candidate boxes.
                            private final float[][][] outputBuffer = new float[1][7][2100];
                            private final ByteBuffer inputBuffer;
                            private final Mat resizedMat = new Mat();
                            private final Mat rgbMat = new Mat();

                            public YoloProcessor(AssetManager assets) {
                                try {
                                    Interpreter.Options options = new Interpreter.Options();
                                    options.setNumThreads(4);
                                    options.setUseXNNPACK(true);
                                    tflite = new Interpreter(loadModelFile(assets), options);

                                    inputBuffer = ByteBuffer.allocateDirect(MODEL_SIZE * MODEL_SIZE * 3 * 4);
                                    inputBuffer.order(ByteOrder.nativeOrder());
                                } catch (IOException e) {
                                    throw new RuntimeException("Model loading failed", e);
                                }
                            }

                            private ByteBuffer loadModelFile(AssetManager assets) throws IOException {
                                try (AssetFileDescriptor afd = assets.openFd(MODEL_FILE)) {
                                    try (FileInputStream fis = new FileInputStream(afd.getFileDescriptor())) {
                                        FileChannel channel = fis.getChannel();
                                        return channel.map(FileChannel.MapMode.READ_ONLY, afd.getStartOffset(), afd.getDeclaredLength());
                                    }
                                }
                            }

                            @Override
                            public void init(int width, int height, CameraCalibration calibration) {}

                            @Override
                            public Object processFrame(Mat frame, long captureTimeNanos) {
            
                                // Rotate 90° clockwise to compensate for the camera's mounting orientation.
                                Mat rotated = new Mat();
                                Core.rotate(frame, rotated, Core.ROTATE_90_CLOCKWISE);

                                // Resize to the model's 320x320 input, then release the temporary Mat.
                                Imgproc.resize(rotated, resizedMat, new org.opencv.core.Size(MODEL_SIZE, MODEL_SIZE));
                                rotated.release();
                                Imgproc.cvtColor(resizedMat, rgbMat, Imgproc.COLOR_BGR2RGB);
                                rgbMat.convertTo(rgbMat, CvType.CV_32FC3, 1.0 / 255.0);

                                float[] floatBuffer = new float[MODEL_SIZE * MODEL_SIZE * 3];
                                rgbMat.get(0, 0, floatBuffer);
                                inputBuffer.rewind();
                                inputBuffer.asFloatBuffer().put(floatBuffer);

            
                                // Run inference; results are written into outputBuffer ([1][7][2100]).
                                tflite.run(inputBuffer, outputBuffer);

                                List<Detection> newDetections = processOutput(frame.width(), frame.height());
                                synchronized (syncLock) {
                                    detections = newDetections;
                                }
                                return null;
                            }

                            private List<Detection> processOutput(int frameWidth, int frameHeight) {
                                List<Detection> rawDetections = new ArrayList<>();
                                final float[] xCenterArray = outputBuffer[0][0];
                                final float[] yCenterArray = outputBuffer[0][1];
                                final float[] widthArray = outputBuffer[0][2];
                                final float[] heightArray = outputBuffer[0][3];
                                final float[] class0Array = outputBuffer[0][4];
                                final float[] class1Array = outputBuffer[0][5];
                                final float[] class2Array = outputBuffer[0][6];

            
                                // Scan all 2100 candidate boxes, keeping those above the confidence threshold.
                                for (int j = 0; j < 2100; j++) {
                                    float maxScore = Math.max(class0Array[j], Math.max(class1Array[j], class2Array[j]));
                                    if (maxScore < CONF_THRESHOLD) continue;

                                    int classId = 0;
                                    if (maxScore == class1Array[j]) classId = 1;
                                    else if (maxScore == class2Array[j]) classId = 2;

                                    rawDetections.add(new Detection(
                                            xCenterArray[j] * frameWidth,
                                            yCenterArray[j] * frameHeight,
                                            widthArray[j] * frameWidth,
                                            heightArray[j] * frameHeight,
                                            maxScore,
                                            classId
                                    ));
                                }
                                return nms(rawDetections);
                            }

        
                            private List<Detection> nms(List<Detection> detections) {
                                List results = new ArrayList<>(10);
                                detections.sort((d1, d2) -> Float.compare(d2.confidence, d1.confidence));

                                while (!detections.isEmpty()) {
                                    Detection best = detections.remove(0);
                                    results.add(best);
                                    detections.removeIf(d -> iou(best, d) > IOU_THRESHOLD);
                                }
                                return results;
                            }

                            private float iou(Detection a, Detection b) {
                                float intersectionLeft = Math.max(a.x1, b.x1);
                                float intersectionTop = Math.max(a.y1, b.y1);
                                float intersectionRight = Math.min(a.x2, b.x2);
                                float intersectionBottom = Math.min(a.y2, b.y2);

                                if (intersectionRight < intersectionLeft || intersectionBottom < intersectionTop)
                                    return 0.0f;

                                float intersectionArea = (intersectionRight - intersectionLeft) * (intersectionBottom - intersectionTop);
                                float areaA = a.width * a.height;
                                float areaB = b.width * b.height;

                                return intersectionArea / (areaA + areaB - intersectionArea);
                            }

                            @Override
                            public void onDrawFrame(Canvas canvas, int width, int height, float scale, float density, Object tag) {}

                            public List<Detection> getLatestDetections() {
                                synchronized (syncLock) {
                                    return new ArrayList<>(detections);
                                }
                            }

                            public void close() {
                                tflite.close();
                                resizedMat.release();
                                rgbMat.release();
                            }
                        }

                        static class Detection {
                            final float x1, y1, x2, y2;
                            final float confidence;
                            final int classId;
                            public final float width;
                            public final float height;

                            public Detection(float cx, float cy, float w, float h, float conf, int cls) {
                                width = w;
                                height = h;
                                x1 = cx - width/2;
                                y1 = cy - height/2;
                                x2 = cx + width/2;
                                y2 = cy + height/2;
                                confidence = conf;
                                classId = cls;
                            }

                            String className() {
                                switch (classId) {
                                    case 0: return "Yellow";
                                    case 1: return "Blue";
                                    case 2: return "Red";
                                    default: return "Unknown";
                                }
                            }
                        }
                    }
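
Note: loadModelFile memory-maps the model directly from the APK's assets, which only works if the .tflite file is packaged uncompressed. If assets.openFd() fails at runtime, adding the extension to the noCompress list in the module's build.gradle (e.g., aaptOptions { noCompress "tflite" }) is the usual fix; this is a general Android packaging detail assumed here, not something taken from the original project.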
                                        
This code implements an AI-based vision system for an FTC robot, using a YOLOv8 model run on a TensorFlow Lite (TFLite) interpreter. It uses a camera to detect and classify objects, providing information about the position and dimensions of each detected object.
Workflow:
1. Camera and AI model initialization
  • A VisionPortal object is created, which initializes the 'AICam' camera.
  • The camera resolution is configured to 320x240.
  • A custom processor (YoloProcessor) is attached to handle the AI inference.
2. Image preprocessing
  • The image is rotated 90° for alignment.
  • It is resized to 320x320, the input size required by the model.
  • The image is converted to RGB format and pixel values are normalized to the [0, 1] range.
  • The data is copied into a buffer that is used for inference.
3. AI inference
  • The YOLOv8 model is run on the preprocessed image.
  • The output contains coordinates and confidence scores for the candidate detections.
  • The results are filtered to keep only objects with a confidence score above 60%.
4. Post-processing and overlap removal
  • A Non-Maximum Suppression (NMS) algorithm is applied to remove duplicate or redundant detections.
  • The best detected object is selected, i.e., the one with the highest confidence.
  • The object's angle is estimated from the height-to-width ratio using the arctangent.
  • A linear adjustment is applied to the angle using empirical calibration constants (see the worked example after this list).
5. Displaying results
  • Data about the detected objects is sent to telemetry for viewing on the Driver Station.
  • The VisionPortal and allocated resources are closed when the op mode is stopped.
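
As a sanity check for step 4, here is a worked example of the angle calculation (the bounding-box dimensions are assumed for illustration; the constants are the ones used in the op mode above):

    // Assumed detection: W = 40 px, H = 60 px
    double angle = Math.toDegrees(Math.atan(60.0 / 40.0));  // ≈ 56.31°
    double final_angle = (angle - 6.4131) / 0.7223;         // ≈ 69.08°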


Important aspects:
  • The YOLOv8 model's output tensor has shape [1, 7, 2100]: for each of the 2100 candidate boxes it returns x, y, width, height, and three class scores.
  • The Non-Maximum Suppression algorithm removes redundant detections based on an IoU threshold of 50% (a worked example follows below).
  • The detected classes are limited to three categories: 'Yellow', 'Blue', and 'Red'.
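
For intuition, a worked IoU computation on two assumed boxes (the corner coordinates are illustrative, not from a real frame):

    // Box A = (0, 0, 100, 100), Box B = (50, 50, 150, 150)
    float inter = (100 - 50) * (100 - 50);        // overlap area = 2500
    float union = 100 * 100 + 100 * 100 - inter;  // combined area = 17500
    float iou   = inter / union;                  // ≈ 0.14 → below 0.5, so both boxes survive NMS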