Vision Summary
Tags: think, control, software, and organization
Personhours: 20
Task: Reflect on our vision development
One of our priorities this season was our autonomous routine, as a perfect autonomous could score us a considerable number of points. A large portion of those points comes from sampling, so that was one of our main focuses within autonomous. Throughout the season, we developed several different approaches to sampling.
Early in the season, we began experimenting with a Convolutional Neural Network to detect the location of the gold mineral. A Convolutional Neural Network, or CNN, is a machine learning algorithm that uses multiple layers which "vote" on what the output should be based on the outputs of the previous layers. We developed a tool to label training images for use in training a CNN, publicly available at https://github.com/arjvik/MineralLabler. We then began training a CNN on the data we labeled. However, our CNN was unable to reach a high accuracy level despite the time we spent tuning it, largely because of our lack of training data. We haven't given up on it, though, and we hope to improve this approach in the coming weeks.
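As a toy illustration of that layered voting idea (this is a minimal conceptual sketch, not our actual network), each layer computes weighted sums of the previous layer's outputs, and the class with the largest final activation wins the vote:

```java
public class ToyNetwork {
    // One dense layer: out[j] = relu(sum_i in[i] * weights[i][j] + bias[j]).
    // A real CNN uses convolutional layers, but the "voting" idea is the same.
    static double[] denseLayer(double[] in, double[][] weights, double[] bias) {
        double[] out = new double[bias.length];
        for (int j = 0; j < out.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * weights[i][j];
            }
            out[j] = Math.max(0, sum); // ReLU activation
        }
        return out;
    }

    // The output class with the highest final activation "wins" the vote.
    static int argmax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) best = i;
        }
        return best;
    }
}
```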
We then turned to other alternatives. Around this time, built-in TensorFlow Object Detection support was released in the FTC SDK. We tried TensorFlow out, but we were unable to use it reliably: our testing revealed that it could not always locate the gold mineral. We attempted to modify some of its parameters; however, since FIRST provided only the pre-trained model, we were unable to increase its accuracy. We are currently investigating whether we can determine the sampling order even when only some of the sampling minerals are detected. We still have code to use TensorFlow on our robot, but it is only one of several vision backends available for selection at runtime.
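For reference, here is a sketch of how the SDK's TensorFlow detector can be queried, along with one possible way to infer the sampling order from only the two leftmost minerals. The `TensorFlowSampler` class and `GoldPosition` enum are our own illustrative names, and we assume a `VuforiaLocalizer` has already been initialized elsewhere:

```java
import java.util.List;
import org.firstinspires.ftc.robotcore.external.ClassFactory;
import org.firstinspires.ftc.robotcore.external.navigation.VuforiaLocalizer;
import org.firstinspires.ftc.robotcore.external.tfod.Recognition;
import org.firstinspires.ftc.robotcore.external.tfod.TFObjectDetector;

public class TensorFlowSampler {
    private static final String MODEL  = "RoverRuckus.tflite"; // model shipped with the SDK
    private static final String GOLD   = "Gold Mineral";
    private static final String SILVER = "Silver Mineral";

    enum GoldPosition { LEFT, CENTER, RIGHT, UNKNOWN }

    private final TFObjectDetector tfod;

    // Assumes the VuforiaLocalizer was initialized elsewhere in the OpMode.
    TensorFlowSampler(TFObjectDetector.Parameters params, VuforiaLocalizer vuforia) {
        tfod = ClassFactory.getInstance().createTFObjectDetector(params, vuforia);
        tfod.loadModelFromAsset(MODEL, GOLD, SILVER);
        tfod.activate();
    }

    // Infer the sampling order from only the two leftmost minerals: if gold is
    // one of them, its x-position tells us the order; if both are silver, the
    // gold mineral must be in the rightmost (off-screen) position.
    GoldPosition findGold() {
        List<Recognition> recognitions = tfod.getUpdatedRecognitions();
        if (recognitions == null || recognitions.size() != 2) return GoldPosition.UNKNOWN;

        Recognition a = recognitions.get(0);
        Recognition b = recognitions.get(1);
        Recognition left  = a.getLeft() < b.getLeft() ? a : b;
        Recognition right = a.getLeft() < b.getLeft() ? b : a;

        if (left.getLabel().equals(GOLD))  return GoldPosition.LEFT;
        if (right.getLabel().equals(GOLD)) return GoldPosition.CENTER;
        return GoldPosition.RIGHT; // two silvers visible, so gold is to the right
    }
}
```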
Another vision framework we tried was OpenCV. OpenCV is a collection of vision processing algorithms which can be combined into powerful pipelines. An OpenCV pipeline performs sequential transformations on its input image until it reaches a desired form, such as a set of contours outlining all minerals detected in the image. We developed an OpenCV pipeline to find the center of the gold mineral given an image of the sampling order. To create our pipeline, we used a tool called GRIP, which lets us visualize and tune it. However, we found that poor lighting conditions greatly reduce the quality of detection, so we hope to add LED lights to the top of our phone mount to get consistent lighting on the field, hopefully further improving our performance in dark field conditions.
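A simplified version of this kind of pipeline might look like the sketch below. The HSV threshold bounds shown are illustrative placeholders; the real values come from tuning in GRIP:

```java
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Point;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;
import org.opencv.imgproc.Moments;

public class GoldFinder {
    // Returns the centroid of the largest gold-colored blob, or null if none.
    public static Point findGoldCenter(Mat frame) {
        // 1. Convert to HSV, where "gold" occupies a narrow hue band.
        Mat hsv = new Mat();
        Imgproc.cvtColor(frame, hsv, Imgproc.COLOR_BGR2HSV);

        // 2. Threshold for yellow/gold (placeholder bounds; tuned in GRIP).
        Mat mask = new Mat();
        Core.inRange(hsv, new Scalar(10, 100, 100), new Scalar(30, 255, 255), mask);

        // 3. Find the contours (boundaries) of all gold-colored regions.
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(mask, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
        if (contours.isEmpty()) return null;

        // 4. Take the largest contour and compute its centroid from image moments.
        MatOfPoint largest = contours.get(0);
        for (MatOfPoint c : contours) {
            if (Imgproc.contourArea(c) > Imgproc.contourArea(largest)) largest = c;
        }
        Moments m = Imgproc.moments(largest);
        if (m.get_m00() == 0) return null;
        return new Point(m.get_m10() / m.get_m00(), m.get_m01() / m.get_m00());
    }
}
```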
Since we wanted to be able to switch easily between these vision backends, we decided to write a modular framework which allows us to swap out vision implementations with ease. We can now choose which vision backend to use during a match with a single button press. Because of this, we can also work on all of the vision backends in parallel.
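The core of such a framework is a single interface that every backend implements; the names below are illustrative rather than our exact code:

```java
// Each backend (CNN, TensorFlow, OpenCV) implements this one interface,
// so the rest of the robot code never talks to TensorFlow or OpenCV directly.
public interface VisionProvider {
    void initialize();               // set up cameras, models, or pipelines
    SampleOrder detectSampleOrder(); // which position holds the gold mineral?
    void shutdown();                 // release camera and native resources
}

enum SampleOrder { LEFT, CENTER, RIGHT, UNKNOWN }
```

With every backend behind the same interface, selecting one at runtime reduces to indexing into an array of VisionProvider instances on a button press, and adding a new backend means writing just one more implementation.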
Another abstraction we made was the ability to switch between different viewpoints, or cameras. This lets us decide at runtime which viewpoint to use: the phone's front camera, its back camera, or an external webcam. While there is no good reason to change this during competition (by then, the placement of the phone and webcam on the robot should be finalized), it is extremely useful during development, when little about our robot is finalized.
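Sketched out, the viewpoint abstraction looks something like this (again, the names are our own, not SDK classes):

```java
import org.opencv.core.Mat;

// Vision code depends only on FrameSource, so which physical camera
// supplies the frames becomes a runtime choice.
enum Viewpoint { PHONE_FRONT, PHONE_BACK, WEBCAM }

interface FrameSource {
    Mat grabFrame(); // latest frame from whichever camera is selected
}

class ViewpointSelector {
    private Viewpoint current = Viewpoint.PHONE_BACK;

    // Cycle to the next viewpoint, e.g. on a gamepad button press during init.
    Viewpoint next() {
        Viewpoint[] all = Viewpoint.values();
        current = all[(current.ordinal() + 1) % all.length];
        return current;
    }
}
```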
Summary of what we have done:
- Designed a convolutional neural network to perform sampling.
- Tested out the provided TensorFlow model for sampling.
- Developed an OpenCV pipeline to perform sampling.
- Created a framework to switch between different Vision Providers at runtime.
- Created a framework to switch between different camera viewpoints at runtime.
Next Steps
We would like to continue improving and testing our vision software so that we can sample reliably during our autonomous.