An Overview of NVIDIA NGC Pretrained Models for Computer Vision
One of the biggest complaints from data scientists, machine learning engineers, and researchers is not having enough time to actually do research. Much of their time goes into the long process of developing models from scratch, then training and tweaking them until they give the expected results.
Thankfully, rolling up your sleeves and taking the time to build your own models is no longer necessary in many cases. AI-driven companies like NVIDIA have created pretrained models for a variety of use cases, including Computer Vision, Natural Language Processing (NLP), and Speech.
You can find these pretrained models in NVIDIA's NGC™ catalog. Users can retrain NVIDIA® NGC™ catalog models with their own datasets much faster than starting from scratch, saving valuable time. In addition, these pretrained models offer high accuracy and have won MLPerf benchmarks. They can be fine-tuned on custom datasets to improve performance and accuracy for a specific use case.
What is Computer Vision?
Computer vision is a field of artificial intelligence that trains computers to understand the visual world through image and video data. This enables devices like sensors and smart cameras to acquire, process, analyze, and interpret images and videos. Most computer vision techniques begin with a model, or a mathematical algorithm, that has been trained with volumes of data to accomplish a specific task. The primary techniques used in computer vision are classification, detection, segmentation, and image synthesis.
With computer vision, devices can understand the world around us through images and videos. Common computer vision tasks include image classification, object detection and tracking, object recognition, semantic segmentation, and instance segmentation.
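To make the detection task a little more concrete: an object detector typically returns, for each object it finds, a bounding box, a category label, and a confidence score. The sketch below shows one way such output could be represented and filtered in Python. The `Detection` class, the helper functions, the 0.5 threshold, and all of the values are hypothetical and only for illustration; they are not the output format of any particular NGC model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    label: str    # category name, e.g. "person"
    score: float  # model confidence between 0 and 1
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections, threshold=0.5):
    """Drop detections whose confidence score is below the threshold."""
    return [d for d in detections if d.score >= threshold]


def count_by_label(detections):
    """Count how many objects of each category were detected."""
    return Counter(d.label for d in detections)


# Made-up detector output for a single image.
raw = [
    Detection("person", 0.92, (34, 50, 120, 310)),
    Detection("bag", 0.81, (100, 220, 150, 290)),
    Detection("face", 0.33, (60, 55, 95, 100)),
    Detection("person", 0.67, (200, 40, 290, 320)),
]

confident = keep_confident(raw)
counts = count_by_label(confident)
print(dict(counts))
```

In practice the threshold is a tuning knob: raising it trades missed objects for fewer false alarms.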
Pretrained Computer Vision Models
- License Plate Detection - LPDNet models detect one or more license plate objects from a car image and return a box around each object, along with an LPD label for each object.
- PeopleNet - PeopleNet models detect one or more physical objects from three categories within an image and return a box around each object, along with a category label for each object. The three categories of objects detected are persons, bags, and faces.
- ResNet-50 - The residual network (ResNet) architecture introduced “skip connections.” The main advantage of these models is their use of residual layers as a building block, which helps with gradient propagation during training.
- SSD - The SSD model is based on the "SSD: Single Shot MultiBox Detector" paper, which describes SSD as "a method for detecting objects in images using a single deep neural network."
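The skip connection behind ResNet models can be sketched in a few lines. A residual block computes a transformation F(x) and then adds the unchanged input x back onto the result, so the input always has a direct path to the output. The example below is a deliberately simplified, pure-Python sketch: it assumes two small dense layers in place of the convolution and normalization layers that real ResNet-50 blocks use, and all function names and values are hypothetical.

```python
def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def linear(values, weights, bias):
    """Dense layer: one output per row of the weight matrix."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two small dense layers."""
    hidden = relu(linear(x, w1, b1))
    transformed = linear(hidden, w2, b2)
    # Skip connection: add the unchanged input back onto F(x).
    return relu([t + xi for t, xi in zip(transformed, x)])


# With all-zero weights F(x) is zero, so the block simply passes x through.
x = [1.0, 2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
out = residual_block(x, zero_w, zero_b, zero_w, zero_b)
print(out)
```

Because the input is added directly to the output, the block's derivative with respect to x contains an identity term. That is one way to see why residual layers help gradients propagate through deep networks during training.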
Feel free to contact us if you have any questions, or take a look at our Deep Learning Solutions if you're interested in a workstation or server that can be delivered with NVIDIA NGC preinstalled and ready to use.