Review of computer vision libraries and platforms
Computer vision has developed rapidly in recent years and is becoming more widespread, both in industry and in everyday consumer applications.
The goal of this article is to give researchers, developers and enthusiasts working in computer vision an overview of some of the most popular and widely used software libraries and tools in the field.
Introduction to computer vision and machine learning
Before focusing on current computer vision libraries and platforms, we define computer vision and machine learning. We then clarify how the two areas differ and how they are related, because recent research and development often mentions them together without explaining their meaning or relationship.
There are many definitions of computer vision [1, 2, 3]. We define it as follows:
Definition 1. Computer vision is the ability of a computer system to gather information from an image or a video stream and understand its content. Typical results of this ability are object recognition and object detection.
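As a toy illustration of Definition 1, the sketch below treats a grayscale image as a grid of brightness values and "detects" a bright object by computing its bounding box. The sample image, the threshold of 128 and the function name `find_bright_object` are assumptions made for this example only; real detection systems use far more robust methods.

```python
def find_bright_object(image, threshold=128):
    """Return (top, left, bottom, right) of all pixels brighter than
    `threshold`, or None when no such pixel exists."""
    rows = [r for r, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [c for row in image for c, p in enumerate(row) if p > threshold]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# A hypothetical 5x6 grayscale image (0 = black, 255 = white).
image = [
    [0, 0,   0,   0, 0, 0],
    [0, 0, 200, 220, 0, 0],
    [0, 0, 210, 230, 0, 0],
    [0, 0,   0,   0, 0, 0],
    [0, 0,   0,   0, 0, 0],
]

box = find_bright_object(image)
print(box)  # (1, 2, 2, 3)
```

The program turns raw pixel values into a statement about the content of the image ("there is an object in rows 1 to 2 and columns 2 to 3"), which is the essence of the definition.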
Similarly, there are multiple definitions of machine learning [e.g. 4, 5]. At Perelik Soft we use the following one:
Definition 2. Machine learning is a technique that enables computer systems to make decisions by learning automatically from data points rather than by following explicitly programmed rules. The result of applying machine learning is a system capable of making predictions.
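The contrast in Definition 2 between hand-written rules and learning from data can be sketched with a deliberately tiny example. The task (labelling an image as "day" or "night" from its average brightness), the sample data and all function names are hypothetical and chosen only for illustration; the "learning" step here is a simple midpoint between class means, not a production algorithm.

```python
def rule_based(brightness):
    # Hand-written rule: the programmer picks the cut-off value.
    return "day" if brightness > 100 else "night"


def train_threshold(samples):
    """Learn a cut-off as the midpoint between the two class means."""
    day = [x for x, label in samples if label == "day"]
    night = [x for x, label in samples if label == "night"]
    return (sum(day) / len(day) + sum(night) / len(night)) / 2


def predict(brightness, threshold):
    # The decision uses a value derived from data, not from the programmer.
    return "day" if brightness > threshold else "night"


# Hypothetical labelled data points: (average brightness, label).
samples = [(30, "night"), (50, "night"), (40, "night"),
           (170, "day"), (190, "day"), (180, "day")]

threshold = train_threshold(samples)
print(threshold)                # 110.0
print(predict(120, threshold))  # day
```

If the data changes, for example because a different camera produces darker images, only the training samples need to be replaced; the rule-based version would have to be edited by hand.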
In modern technology, machine learning has become an integral part of computer vision systems and algorithms. Using machine learning approaches, the research community has developed algorithms that recognize a very large share of the objects in an image while keeping inference times short.
Machine learning approaches require training a model on data, which can be very resource intensive. Whereas hardware resources were a significant limitation years ago, today's powerful computing devices, e.g. GPUs, allow systems to be trained on very large amounts of data. This advance in hardware has significantly expanded the applications of computer vision combined with machine learning.
Some of the main application areas of computer vision are manufacturing, robotics, automotive, healthcare, social networks, mobile/smart technologies, space exploration, security and agriculture.
It is not always necessary to reinvent the wheel in software projects. A significant set of libraries and platforms for developing computer vision applications already exists. They make established algorithms and models available for computer vision and machine learning, individually and in combination. Below we look at some of the most popular and applicable libraries and platforms used in modern R&D in computer vision with machine learning.
Software platforms and libraries
The data and statistics provided here are as of December 2021. If you are reading this article later, some of the data, e.g. community statistics, may have changed.
- OpenCV – Official site: https://opencv.org/.
Origins: OpenCV is a set of libraries for image processing and computer vision. It was created by Intel and originally released in 2000. It is open source; versions up to 4.4 were released under the BSD license, and versions from 4.5.0 onward use the Apache 2 license.
Compatibility: It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and macOS. It is most commonly used from C++ and Python. OpenCV is also available as a Docker container.
Integration and compatibility with other libs: OpenCV's dnn module can load models trained with TensorFlow (including Keras models exported to a supported format), and OpenCV can use CUDA for GPU acceleration.
Pretrained Models and Algos: OpenCV can run state-of-the-art pre-trained neural networks, including ResNet, Inception, SqueezeNet and more, all of which can perform automatic image classification.
Community: More than 50k stars on GitHub. StackOverflow – more than 60k questions.
- PyTorch – Official site: https://pytorch.org/.
Origins: PyTorch is an optimized tensor library for deep learning on GPUs and CPUs.
Compatibility: The library can be installed on macOS, Windows and various Linux distributions. PyTorch can be programmed in Python, C++ and Java. The developer can choose the processing unit: GPU or CPU.
Integration and compatibility with other libs: compatible with CUDA for GPU processing.
Pretrained Models and Algos: Basic libraries for deep learning processes, good documentation, official tutorials.
Community: More than 53k stars on GitHub. StackOverflow – more than 15k questions.
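The choice between GPU and CPU mentioned above is usually a one-line decision in PyTorch. The sketch below assumes the `torch` package is installed; the helper names `pick_device` and `doubled_sum` are hypothetical and exist only for this example.

```python
def pick_device():
    """Return a CUDA device when one is visible to PyTorch, else the CPU."""
    import torch  # imported here so the module loads even without torch installed

    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def doubled_sum(values):
    """Place a list of numbers on the chosen device, double them and sum."""
    import torch

    device = pick_device()
    tensor = torch.tensor(values, dtype=torch.float32, device=device)
    return (tensor * 2).sum().item()  # .item() copies the scalar back to Python
```

`doubled_sum([1, 2, 3])` returns `12.0` on either device; the same code runs unchanged whether or not a GPU is present.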
- Accord.NET – Official site: http://accord-framework.net/.
Origins: developed by César Roberto de Souza and originally released in 2010. It is open source under the terms of the GNU Lesser General Public License.
Compatibility: Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile. Written in C# and compatible with .NET.
Integration and compatibility with other libs: –
Pretrained Models and Algos: a large number of image processing, machine learning and vision samples and algos.
Community: More than 4k stars on GitHub. StackOverflow – 288 questions. Available as a NuGet package.
- TensorFlow – Official site: https://www.tensorflow.org/.
Origins: TensorFlow is a platform for creating, training and experimenting with machine learning models. It was created by the Google Brain team and initially released on November 9, 2015. It is open source under the Apache License.
Integration and compatibility with other libs: Keras, CUDA, OpenCV.
Pretrained Models and Algos: large set of pre-trained ML models for CV – ResNet, RetinaNet, Mask R-CNN and more.
Community: More than 150k stars on GitHub. StackOverflow – more than 60k questions.
- Keras – a high-level API written in Python for faster and more convenient work with TensorFlow. It is used for defining and training neural networks. Keras enables fast prototyping, state-of-the-art research and production. https://keras.io/about/
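As an illustration of how compact a Keras model definition can be, the sketch below builds a small convolutional image classifier. It assumes the `tensorflow` package is installed; the input shape, layer sizes and the function name `build_classifier` are arbitrary choices for this example rather than a recommended architecture.

```python
def build_classifier(input_shape=(28, 28, 1), num_classes=10):
    """Build and compile a small convolutional image classifier."""
    # Imported here so the module loads even without tensorflow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(16, 3, activation="relu"),  # 16 filters of size 3x3
        keras.layers.MaxPooling2D(),                    # halve the spatial size
        keras.layers.Flatten(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then be a call such as `model.fit(images, labels, epochs=5)`, where `images` and `labels` stand for your own labelled dataset.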
- CUDA – Official site: https://developer.nvidia.com/cuda-zone.
Origins: CUDA is a parallel computing platform created by Nvidia and released in 2007. CUDA EULA license terms – https://docs.nvidia.com/cuda/eula/index.html.
Compatibility: supported OS – Linux and Windows; macOS was supported up to CUDA 10.2. Developers can program in various languages such as C, C++, Fortran, MATLAB and Python.
Integration and compatibility with other libs: Libraries and collections built on CUDA include GPU4Vision; OpenVIDIA, for popular computer vision algorithms on CUDA; and MinGPU, a minimum GPU library for computer vision.
Pretrained Models and Algos: supports training on some of the most popular object detection architectures, such as YOLOv3, FasterRCNN, SSD/DSSD, and RetinaNet, as well as popular classification networks such as ResNet, DarkNet, and MobileNet.
Community: StackOverflow – more than 12k questions.
There is a wide range of libraries and tools for working with computer vision, and the list above is certainly not complete. A number of other libraries and tools with similar applications exist.
Of course, there is no single best library or platform. Each of those mentioned above has its advantages and disadvantages for different tasks and projects, and different development teams may find one platform or another more convenient to work with.
If you are new to computer vision, you might want to start with OpenCV. It consists of ready-to-use fundamental routines for image processing and basic object recognition.
If you want to experiment more actively with machine learning, TensorFlow and Keras might be a good place to start.
For larger real-time applications, one might want to combine multiple libraries/platforms.
References
- Szeliski, Richard. Computer Vision: Algorithms and Applications – https://szeliski.org/Book/
- Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. “You only look once: Unified, real-time object detection.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779-788. 2016
- https://enlyft.com/tech/, https://stackshare.io/feed, https://discovery.hgdata.com/
— — —
We put a lot of effort into creating the content of our blog. We use multiple information sources, do our own analysis and always double-check what we have written. However, it is still possible that factual or other mistakes occur. If you choose to use what is written on our blog in your own business or personal activities, you do so at your own risk. Be aware that Perelik Soft Ltd. is not liable for any direct or indirect damages you may suffer regarding the use of the content of our blog.
Author: Denis Chikurtev