Resources for Computer Vision Professionals
With the ever-growing interest in computer vision, the research, applications, and commercial possibilities for this technology are immense. Discover how the world of computer vision is evolving and explore the career opportunities that are newly emerging.
Page content:
- What Is Computer Vision
- The Fundamentals of Computer Vision
- Where Is Computer Vision Headed
- Transportation & Aviation
- Security & Privacy
- Entertainment
- Agriculture
- Career Opportunities
- Computer Vision Engineers
- XR Design/Graphics Engineers
- Data Visualization Engineers
- Challenges and Limitations of Computer Vision Technology
- Ethics, Standards, Diversity, and Inclusion
- Ethics in Computer Vision
- Standards & Inclusion in XR
- Diversity in Visualization Research
- Voices from the Community
- IEEE Computer Society Fellow: Greg Welch
On this resource page you’ll learn…
- Foundations of Computer Vision: Understand the core principles of computer vision and gain insights into how these systems work.
- Market Projections: Gain insight into the anticipated growth of the computer vision market, set to exceed $20.88 billion USD by 2030, with impacts on key domains such as transportation, healthcare, security, entertainment, and agriculture.
- Opportunities in Research and Development: Learn about the increasing demand for research and development in the expanding landscape of computer vision, and discover the rising job opportunities within this dynamic field.
- Industry Impact and Challenges: Uncover the transformative effects of computer vision across various sectors, while acknowledging the existing limitations and barriers that require attention.
- Ethical Considerations: Examine the ethical concerns of computer vision, including the pressing need for transparency, fairness, accountability, privacy, and the adoption of best practices to ensure responsible deployment.
“‘Intelligent’ computers require knowledge of their environment, and the most effective means of acquiring such knowledge is by seeing. Vision opens a new realm of computer applications,” Computer magazine, May 1973.
Grounded in the principles of artificial intelligence (AI), computer vision gives machines the ability to perceive and analyze visual data such as images, graphics, and videos. Like AI in general, it aims to automate decisions, but its focus is limited to the tasks a human’s visual system would normally perform. IBM describes the contrast lucidly: “If AI enables computers to think, computer vision enables them to see, observe, and understand.”
Computer vision may seem like a modern innovation, but it is the outcome of extensive research stretching back to the 1960s. Beginning with Seymour Papert’s Summer Vision Project of 1966, the field has been in development for decades, improving steadily and creating new possibilities along the way. Though complex, the way these systems work can be broken down into four fundamental steps:
- Visual data such as images or video is fed into the computer vision system as input. Since images are made up of pixels, the system processes information at the pixel level.
- To analyze the data, algorithms and models identify distinctive features in the image, such as contours, corners, or colors.
- Through the process of identification, the computer recognizes objects such as people, as well as certain behaviors, in the visuals. With machine learning, the computer can improve this ability over time.
- Finally, the computer provides an output based on this interpretation. Put simply, this is when the computer communicates what it is seeing.
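The four steps above can be sketched in a few lines of Python. This is a toy illustration only: the 6x6 image, the function names, and the rule "at least 8 edge pixels means an object" are all hypothetical, and the recognition rule stands in for the trained model a real system would use.

```python
# Toy illustration of the four steps; all names and thresholds are hypothetical.

def load_image():
    """Step 1 (input): a 6x6 grayscale image as rows of pixel intensities (0-255)."""
    return [
        [0,   0,   0,   0,   0, 0],
        [0, 255, 255, 255, 255, 0],
        [0, 255, 255, 255, 255, 0],
        [0, 255, 255, 255, 255, 0],
        [0, 255, 255, 255, 255, 0],
        [0,   0,   0,   0,   0, 0],
    ]

def extract_edges(image, threshold=128):
    """Step 2 (features): mark pixels whose intensity differs sharply
    from the pixel to the right or the pixel below."""
    height, width = len(image), len(image[0])
    edges = set()
    for y in range(height):
        for x in range(width):
            right = abs(image[y][x] - image[y][x + 1]) if x + 1 < width else 0
            below = abs(image[y][x] - image[y + 1][x]) if y + 1 < height else 0
            if max(right, below) >= threshold:
                edges.add((x, y))
    return edges

def recognize(edges, min_edge_pixels=8):
    """Step 3 (recognition): a hand-written rule standing in for a learned model."""
    return "object" if len(edges) >= min_edge_pixels else "empty"

def describe(label, edges):
    """Step 4 (output): report what the system concluded."""
    return f"Detected: {label} ({len(edges)} edge pixels)"

image = load_image()
edges = extract_edges(image)
label = recognize(edges)
message = describe(label, edges)
print(message)
```

In practice, the hand-written rule in `recognize` would be replaced by a model trained on labeled examples, which is where the improvement over time mentioned in the third step comes from.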
Before computer vision reached its present-day applications, key pioneers led the way. For example, Ray Kurzweil of Kurzweil Computer Products, Inc. developed an optical character recognition (OCR) system in 1974. This system could recognize and process printed text, no matter the font and without manual entry. When combined with machine learning and enhanced with text-to-speech features, the technology was used to read text aloud for blind people.
This is just one pivotal example of the many applications that display the power and impact of computer vision. Thanks to waves of development and crucial research, the technology has improved several domains of human life, including transportation, healthcare, security, entertainment, and agriculture. It is therefore no surprise that the computer vision market is expected to expand in the very near future.
According to the Top Trends in Computer Vision Report, which reviews the latest trends covered at the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), the computer vision industry generated over $12.14 billion USD in 2022 and has a projected growth rate of 7%, with $20.88 billion USD expected by 2030.
Revenue is projected to increase because of the surging need for the technology in fields such as transportation, healthcare, and security. Moreover, according to PS Market Research, XR entertainment systems, which were worth $38.3 billion in 2022, are predicted to reach $394.8 billion by 2030.
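The market figures above can be cross-checked with a little arithmetic. The sketch below assumes that the quoted 7% is a compound annual growth rate and that the horizon runs from 2022 to 2030 (eight years). Neither assumption is stated in the cited reports, and the function name `implied_annual_growth` is illustrative.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted in the text, in billions of USD.
YEARS = 2030 - 2022  # assumed eight-year horizon

cv_rate = implied_annual_growth(12.14, 20.88, YEARS)   # computer vision market
xr_rate = implied_annual_growth(38.3, 394.8, YEARS)    # XR entertainment systems

print(f"Computer vision market: {cv_rate:.1%} per year")
print(f"XR entertainment systems: {xr_rate:.1%} per year")
```

Under these assumptions, the computer vision figures work out to about 7% per year, which matches the quoted growth rate. The XR figures imply a much steeper rate of roughly 34% per year, a number derived here rather than taken from the source.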
Discover the Future of Computer Vision at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- The U.S. National Highway Traffic Safety Administration (NHTSA) has reported that 94% of critical collisions are caused by human error. With the help of computer vision, advanced cameras and sensors allow vehicles to analyze their surroundings, detect objects such as pedestrians and other vehicles, and safely navigate around them. The technology is also used within the aviation sector to create flight simulators. Within these sectors, extended reality (XR) is also used to simulate flight training while reducing costs, time, and possible damage to aircraft.
- Toward Fully Autonomous and Networked Vehicles
- Autonomous Driving Technologies Special Technical Community
- Using Extended Reality in Flight Simulators: A Literature Review
Learn more about computer vision and automated vehicles by taking the IEEE course on ‘Using Machine Vision Perception to Control Automated Vehicle Maneuvering’
- Computer vision has also improved the patient experience within the healthcare system, including medical treatments and procedures. Specifically, it has transformed what can be done with medical imaging data, which allows practitioners to diagnose, monitor, or treat medical conditions. The technology also enables augmented reality (AR)-assisted surgical guidance, which can visualize human anatomy and aid practitioners when performing operations such as neurosurgical procedures.
- AR-Assisted Surgical Guidance System for Ventriculostomy
- Augmented and Virtual Reality in Surgery
- Standardizing 3D Medical Imaging
- Driven by progress in machine learning, edge computing, IoT, and AI, computer vision makes it possible to mitigate security threats in real time. For example, with the help of image processing and statistical pattern recognition, biometrics allow computers to recognize people based on physiological characteristics, such as faces or fingerprints. Computer vision also supports smart security surveillance, including cameras placed in different areas of a city to monitor and detect threatening behavior. Privacy-preserving biometrics is attracting growing attention because it may help resolve concerns related to cryptographic authentication processes.
- The Interplay of AI and Biometrics: Challenges and Opportunities
- Biometrics and Privacy-Preservation: How Do They Evolve?
- Biometrics Based Access Framework for Secure Cloud Computing
XR gaming blurs the line between virtual and physical realities, simulating new worlds and adventures in which players can be fully immersed. According to XR Today, the technology can transform social gatherings by letting users create virtual events and exhibitions anywhere at any time.
- Virtual Reality: A Journey from Vision to Commodity
- Affective Virtual Reality: How to Design Artificial Experiences Impacting Human Emotions
Learn More About Virtual Reality and its Applications at IEEE VR 2024
- According to researchers, insects affect 35% of farmland. Understanding and monitoring the role insects play in agriculture is vital for food production, but it can be very labor-intensive and at times unreliable. Computer vision can potentially improve this process by automating the monitoring. In addition, computer vision can give automated machine systems ‘eyes’, enabling them to navigate fields without manual labor.
- Towards Computer Vision and Deep Learning Facilitated Pollination Monitoring for Agriculture
- The 1st Agriculture-Vision Challenge: Methods and Results
- Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis
According to the US Bureau of Labor Statistics, the employment of professionals in the computer and information science industry is expected to increase significantly over the next decade, rising 21% by 2031. To fill these new roles, experts in computer vision, extended reality (XR), and data visualization will be needed.
- Computer vision engineers work in highly collaborative environments, usually guided by the needs of their clients. In addition to building architectures and using algorithms, their typical areas of expertise include image classification, face detection, pose estimation, and optical flow. Within this field, time is mainly spent developing models, retraining them, and creating reliable datasets.
- Skills: Developing image analysis algorithms, deep learning architectures, image processing and visualization, computer vision libraries, and data flow programming
- Salary: $160K USD (This is a salary estimation for United States employees according to talent.com. View estimates for other countries via Salary Expert.)
- Degree: Bachelor’s in mathematics, computer vision, computer science, machine learning, information systems
- IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- IEEE/CVF International Conference on Computer Vision
- Technical Community on Pattern Analysis and Machine Intelligence
- Those within the XR industry, such as XR Design/Graphics Engineers, use their knowledge of computer vision to bring creative projects to life. They research and develop technology that augments reality, re-creates real-life environments, or generates other spaces that users can interact with virtually. Working cross-functionally with creative teams, they apply their computer vision knowledge to the design, optimization, integration, and testing of XR devices and products such as video games and other entertainment systems.
- Skills: 3D visualization tools/art, programming languages such as Python, C/C++, and/or Java, linear algebra, multimedia software stacks and frameworks
- Salary: $107,000 USD (This is a salary estimation for United States employees according to circuitstream.com. View estimates for other countries via Salary Expert.)
- Degree: Bachelor’s in computer engineering, mathematics, or related fields of study. Master’s in Human Centered Design and Engineering or Interaction Design
- IEEE Virtual Reality 2024
- Technical Community on Intelligent Informatics
- The power of visualizing data helps decision makers to recognize and address patterns and mistakes in their information, allowing them to make educated choices for their organization. Data visualization engineers create visual representations of data, then build dashboards for different business departments to inspect. They play a pivotal role in the process of informed decision-making.
- Skills: Business intelligence (BI) tools, data analysis, Python-based visualizations, data visualization tools such as Tableau, Yellowfin, and Qlik Sense, and mathematics/statistics
- Salary: $96,317 USD (This is a salary estimation for United States employees according to salary.com. View estimates for other countries via Salary Expert.)
- Degree: Bachelor’s in computer science, computer information systems, software engineering, or a closely related field. Master’s in Data Analytics or Visualization
- IEEE VIS: Visualization & Visual Analytics
- Technical Community on Visualization and Graphics
While computer vision has improved significantly, challenges remain, which underlines the need for continuous research and development in the field. These include concerns related to data quality and bias. Any technology created or managed by humans is susceptible to bias, so to ensure accurate detections and optimal functionality, these systems must be developed with diverse inputs.
Moreover, the question remains: Can a computer not only perceive but truly comprehend its observations? It is crucial to build trust in these systems by ensuring that they interpret what they observe accurately and with minimal errors, which in turn supports wider adoption.
Lastly, security and privacy are major considerations for any widely adopted technology, and both still leave room for improvement. The issue is particularly pronounced in facial recognition, which calls for ongoing scrutiny.
As the use of computer vision technology grows, ethical considerations have begun to dominate the discussion. It is crucial to examine issues specific to computer vision rather than depending on the general ethics linked to AI. These conversations are taking place at conferences, in standards development and working groups, and in research projects.
The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) aims to initiate further discussion within computer vision applications and research. In 2022, researchers were encouraged to include in their papers and proposals the potential negative societal impacts of their proposed research and possible ways to mitigate them. Potential ethical concerns include the safety of living beings, privacy, environmental impact, and economic security.
The organizers prioritized transparency and stated, “Grappling with ethics is a difficult problem for the field [computer vision], and thinking about ethics is still relatively new to many authors… In certain cases, it will not be possible to draw a bright line between ethical and unethical.”
The committee of IEEE/CVF CVPR 2023 planned to continue this conversation at the next annual conference and called for papers that focus on transparency, fairness, accountability, privacy, and ethics in vision.
With regard to ethics for XR specifically, IEEE is laying the foundation through standardization. As stated in IEEE Spectrum, “… the IEEE Standards Association (IEEE SA) is working to help define, develop, and deploy the technologies, applications, and governance practices needed to help turn metaverse concepts into practical realities, and to drive new markets.”
It’s also vital to keep in mind that this cutting-edge technology should be made accessible. For instance, it needs to accommodate people who are visually impaired. The study “Toward inclusivity: Virtual Reality Museums for the Visually Impaired” examines how narration, spatialized “reference” audio, and haptic feedback can effectively replace the traditional use of vision in virtual reality. The study found that people with visual impairments could locate objects more quickly with the aid of enhanced audio and tactile feedback.
Lastly, IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG) conducted an analysis of gender representation among the attendees, organizers, and presenters at the IEEE Visualization (VIS) conference over the last 30 years. It found that the proportion of female authors increased from 9% in the first five years of the conference to 22% in the last five years.
The IEEE Computer Society urges academics and practitioners to send any ideas that may advance the dialogue to [email protected], since it is efforts such as these that have the potential to push the industry towards a brighter future.
IEEE Computer Society Fellow, computer scientist, and engineer Greg Welch is the AdventHealth Endowed Chair in Healthcare Simulation in UCF’s College of Nursing and co-director of the UCF Synthetic Reality Laboratory. In 2021, Welch reached fellowship status for contributions to tracking methods in augmented reality applications. His primary area of study is virtual reality (VR) and augmented reality (AR), collectively known as “XR,” with a focus on both hardware and software applications.
Currently, Welch spends his time researching the way humans perceive AR-related experiences when interacting with the technology. Additionally, he leads the pending NSF project “Virtual Experience Research Accelerator (VERA),” a system that will improve the process of generating VR-related research for scientists.
When asked what advice Welch had for readers with an interest in pursuing a similar path, he mentioned how beneficial ongoing exploration can be, “The field changes fast — something that is hot today might not be tomorrow. In addition, a broader perspective can enable one to see connections and opportunities.”
He recommends taking advantage of community resources and networking opportunities, “From an experiential perspective, get involved! The community [IEEE Computer Society] would not exist without volunteers, but there are so many benefits — it really is true that you get out what you put in.”
CVF Sponsored Conferences
CVF Sponsored Conferences Errata
It is the policy of the Computer Vision Foundation to maintain PDF copies of conference papers as submitted during the camera-ready paper collection. These papers are considered the final published versions of the work. We recognize the need for minor corrections after publication, and thus provide links to arXiv versions of the papers where available. If a correction must be made, it should be made as an update to the arXiv version of the paper by the authors. The CVF maintainers should then be notified of the update via email ( [email protected] ). The conference open access website will be updated periodically to indicate changes made to an arXiv version since the original conference publication date. The original camera-ready version of the paper will be maintained within the open access archive, and will not be removed or replaced by request.
Other Computer Vision Conferences and Workshops
Computer Vision
Semantic Segmentation
Tumor Segmentation
Panoptic Segmentation
3D Semantic Segmentation
Weakly-Supervised Semantic Segmentation
Classification.
Text Classification
Graph Classification
Audio Classification
Medical Image Classification
Representation learning.
Disentanglement
Graph representation learning, sentence embeddings.
Network Embedding
Object detection.
3D Object Detection
Real-Time Object Detection
RGB Salient Object Detection
Few-Shot Object Detection
Image classification.
Out of Distribution (OOD) Detection
Few-Shot Image Classification
Fine-Grained Image Classification
Semi-Supervised Image Classification
Reinforcement learning (rl), off-policy evaluation, multi-objective reinforcement learning, 3d point cloud reinforcement learning, 2d object detection.
Edge Detection
Open Vocabulary Object Detection
Semi-Supervised Object Detection
Deep hashing, table retrieval, domain adaptation.
Unsupervised Domain Adaptation
Domain Generalization
Source-Free Domain Adaptation
Universal domain adaptation, image generation.
Image-to-Image Translation
Image Inpainting
Text-to-Image Generation
Conditional Image Generation
Data augmentation.
Image Augmentation
Text Augmentation
Autonomous vehicles.
Autonomous Driving
Self-Driving Cars
Simultaneous Localization and Mapping
Autonomous Navigation
Image Denoising
Color Image Denoising
Sar Image Despeckling
Grayscale image denoising, meta-learning.
Few-Shot Learning
Sample Probing
Depth Estimation
Style Transfer
3D Reconstruction
Neural Rendering
3D Face Reconstruction
Contrastive learning.
Super-Resolution
Image Super-Resolution
Video Super-Resolution
Multi-Frame Super-Resolution
Reference-based Super-Resolution
Pose estimation.
3D Human Pose Estimation
Keypoint Detection
3D Pose Estimation
6D Pose Estimation
Text-based Image Editing
Text-guided-image-editing.
Zero-Shot Text-to-Image Generation
Concept alignment, 3d architecture, 2d semantic segmentation, image segmentation.
Text-To-SQL
Text style transfer.
Scene Parsing
Self-supervised learning.
Point Cloud Pre-training
Unsupervised video clustering, visual question answering (vqa).
Visual Question Answering
Machine Reading Comprehension
Chart Question Answering
Embodied Question Answering
Sentiment analysis.
Aspect-Based Sentiment Analysis (ABSA)
Multimodal Sentiment Analysis
Aspect Sentiment Triplet Extraction
Twitter Sentiment Analysis
Anomaly detection.
Unsupervised Anomaly Detection
One-Class Classification
Supervised anomaly detection, anomaly detection in surveillance videos.
Temporal Action Localization
Video Understanding
Video Object Segmentation
Action Classification
Video generation, activity recognition.
Action Recognition
Human Activity Recognition
Egocentric activity recognition.
Group Activity Recognition
One-Shot Learning
Few-Shot Semantic Segmentation
Cross-domain few-shot.
Unsupervised Few-Shot Learning
3d object super-resolution, medical image segmentation.
Lesion Segmentation
Brain Tumor Segmentation
Cell Segmentation
Brain Segmentation
Monocular depth estimation.
Stereo Depth Estimation
Depth and camera motion.
3D Depth Estimation
Exposure fairness, optical character recognition (ocr).
Active Learning
Handwriting Recognition
Handwritten digit recognition, irregular text recognition, facial recognition and modelling.
Face Recognition
Face Swapping
Face Detection
Face Verification
Facial Expression Recognition (FER)
Instance segmentation.
Referring Expression Segmentation
3D Instance Segmentation
Real-time Instance Segmentation
Unsupervised Object Segmentation
Object tracking.
Multi-Object Tracking
Visual Object Tracking
Multiple Object Tracking
Cell Tracking
Zero-shot learning.
Generalized Zero-Shot Learning
Compositional Zero-Shot Learning
Multi-label zero-shot learning.
Action Recognition In Videos
Self-supervised action recognition.
3D Action Recognition
Few shot action recognition, quantization, data free quantization, unet quantization, continual learning.
Class Incremental Learning
Continual named entity recognition, unsupervised class-incremental learning.
Scene Understanding
Scene Text Recognition
Scene Graph Generation
Scene Recognition
Adversarial attack.
Backdoor Attack
Adversarial Text
Adversarial attack detection, real-world adversarial attack, active object detection, image retrieval.
Sketch-Based Image Retrieval
Content-Based Image Retrieval
Composed Image Retrieval (CoIR)
Medical Image Retrieval
Dimensionality reduction.
Supervised dimensionality reduction
Online nonnegative cp decomposition.
Image Stylization
Font style transfer, style generalization, face transfer, optical flow estimation.
Video Stabilization
Monocular 3D Object Detection
3D Object Detection From Stereo Images
Multiview Detection
Robust 3d object detection, emotion recognition.
Speech Emotion Recognition
Emotion Recognition in Conversation
Multimodal Emotion Recognition
Emotion-cause pair extraction, image reconstruction.
MRI Reconstruction
Action localization.
Action Segmentation
Spatio-temporal action localization, person re-identification.
Unsupervised Person Re-Identification
Video-based person re-identification, generalizable person re-identification, cloth-changing person re-identification, image captioning.
3D dense captioning
Controllable image captioning, aesthetic image captioning.
Relational Captioning
Action detection.
Skeleton Based Action Recognition
Online Action Detection
Audio-visual active speaker detection, visual relationship detection, lighting estimation.
3D Room Layouts From A Single RGB Panorama
Road scene understanding, metric learning.
Image Restoration
Demosaicking
Spectral reconstruction, underwater image restoration.
JPEG Artifact Correction
Object recognition.
3D Object Recognition
Continuous object recognition.
Depiction Invariant Object Recognition
Monocular 3D Human Pose Estimation
Pose prediction.
3D Multi-Person Pose Estimation
3d human pose and shape estimation, image enhancement.
Low-Light Image Enhancement
Image relighting, de-aliasing, continuous control.
Steering Control
Drone controller, 3d face modelling.
Semi-Supervised Video Object Segmentation
Unsupervised Video Object Segmentation
Referring Video Object Segmentation
Video Salient Object Detection
Multi-label classification.
Extreme Multi-Label Classification
Medical code prediction, hierarchical multi-label classification, trajectory prediction.
Trajectory Forecasting
Human motion prediction.
Multivariate Time Series Imputation
Object localization.
Weakly-Supervised Object Localization
Image-based localization, unsupervised object localization, monocular 3d object localization, out-of-distribution detection, image quality assessment, no-reference image quality assessment, blind image quality assessment.
Aesthetics Quality Assessment
Stereoscopic image quality assessment.
Blind Image Deblurring
Single-image blind deblurring, video semantic segmentation.
Camera shot segmentation
Cloud removal.
Facial Inpainting
Fine-Grained Image Inpainting
Novel view synthesis.
Ground video synthesis from satellite image
Saliency detection.
Saliency Prediction
Co-Salient Object Detection
Video saliency detection, change detection.
Semi-supervised Change Detection
Image compression.
Feature Compression
Jpeg compression artifact reduction.
Lossy-Compression Artifact Reduction
Color image compression artifact reduction, explainable artificial intelligence, explainable models, explanation fidelity evaluation, fad curve analysis, salient object detection, saliency ranking, ensemble learning, visual reasoning.
Visual Commonsense Reasoning
Instruction following, visual instruction following, image registration.
2D Classification
Neural Network Compression
Music Source Separation
Cell detection.
Plant Phenotyping
Open-set classification, visual tracking.
Point Tracking
Real-time visual tracking, rgb-t tracking.
RF-based Visual Tracking
Image manipulation detection.
Generalized Zero Shot skeletal action recognition
Zero shot skeletal action recognition, motion estimation, activity prediction, motion prediction, cyber attack detection, sequential skip prediction, 3d point cloud classification.
3D Object Classification
Few-Shot 3D Point Cloud Classification
Zero-shot transfer 3d point cloud classification, prompt engineering.
Visual Prompting
Robust 3D Semantic Segmentation
Real-Time 3D Semantic Segmentation
Unsupervised 3D Semantic Segmentation
Furniture segmentation, gesture recognition.
Hand Gesture Recognition
Hand-Gesture Recognition
RF-based Gesture Recognition
Whole slide images, point cloud registration.
Image to Point Cloud Registration
Video captioning.
Dense Video Captioning
Boundary captioning, visual text correction, audio-visual video captioning, 3d point cloud interpolation, text detection, medical diagnosis.
Alzheimer's Disease Detection
Retinal OCT Disease Classification
Blood cell count, thoracic disease classification.
Hand Pose Estimation
Hand Segmentation
Gesture-to-gesture translation, visual grounding.
Person-centric Visual Grounding
Phrase Extraction and Grounding (PEG)
Visual odometry.
Face Anti-Spoofing
Monocular visual odometry, rain removal.
Single Image Deraining
Image clustering.
Online Clustering
Face Clustering
Multi-view subspace clustering, multi-modal subspace clustering, colorization.
Line Art Colorization
Point-interactive Image Colorization
Color Mismatch Correction
Robot navigation.
PointGoal Navigation
Social navigation.
Sequential Place Learning
Image Dehazing
Single Image Dehazing
Video question answering.
Zero-Shot Video Question Answer
Few-shot video question answering.
Unsupervised Image-To-Image Translation
Synthetic-to-Real Translation
Multimodal Unsupervised Image-To-Image Translation
Cross-View Image-to-Image Translation
Fundus to Angiography Generation
Image manipulation, visual localization.
Image Editing
Rolling shutter correction, shadow removal, joint deblur and frame interpolation, multimodal fashion image editing, multimodel-guided image editing, stereo matching, conformal prediction.
Crowd Counting
Visual Crowd Analysis
Group detection in crowds, human-object interaction detection.
Affordance Recognition
Visual place recognition.
Indoor Localization
3d place recognition, image matching.
Semantic correspondence
Patch matching, set matching.
Matching Disparate Images
Point cloud classification, jet tagging, few-shot point cloud classification, deepfake detection.
Synthetic Speech Detection
Human detection of deepfakes, multimodal forgery detection, image deblurring, low-light image deblurring and enhancement, object reconstruction.
3D Object Reconstruction
Document text classification, learning with noisy labels, multi-label classification of biomedical texts, political salient issue orientation detection, hyperspectral.
Hyperspectral Image Classification
Hyperspectral unmixing, hyperspectral image segmentation, classification of hyperspectral images.
Weakly Supervised Action Localization
Weakly-supervised temporal action localization.
Temporal Action Proposal Generation
Activity recognition in videos, 2d human pose estimation, action anticipation.
3D Face Animation
Semi-supervised human pose estimation, scene classification.
Referring Expression
Point cloud generation, point cloud completion, compressive sensing, video quality assessment, video alignment, temporal sentence grounding, long-video activity recognition, keyword spotting.
Small-Footprint Keyword Spotting
Visual keyword spotting, scene text detection.
Curved Text Detection
Multi-oriented scene text detection, boundary detection.
Junction Detection
Reconstruction, 3d human reconstruction.
Single-View 3D Reconstruction
4d reconstruction, single-image-based hdr reconstruction, image matting.
Semantic Image Matting
Camera calibration, superpixels, emotion classification.
Video Retrieval
Video-text retrieval, video grounding, video-adverb retrieval, replay grounding, composed video retrieval (covr), point cloud segmentation, sensor fusion.
Point cloud reconstruction
3D Semantic Scene Completion
3D Semantic Scene Completion from a single RGB image
Garment reconstruction.
Few-Shot Transfer Learning for Saliency Prediction
Aerial Video Saliency Prediction
Remote sensing.
Remote Sensing Image Classification
Change detection for remote sensing images, building change detection for remote sensing images.
Segmentation Of Remote Sensing Imagery
The Semantic Segmentation Of Remote Sensing Imagery
Cross-modal retrieval, image-text matching, multilingual cross-modal retrieval.
Zero-shot Composed Person Retrieval
Cross-modal retrieval on rsitmd, video summarization.
Unsupervised Video Summarization
Supervised video summarization, document layout analysis.
Document AI
Document understanding, human detection.
Face Generation
Talking Head Generation
Talking face generation.
Face Age Editing
Facial expression generation, kinship face generation, video instance segmentation.
Motion Synthesis
Motion Style Transfer
Temporal human motion composition, privacy preserving deep learning, membership inference attack.
Generalized Few-Shot Semantic Segmentation
Depth completion.
Video Editing
Video temporal consistency, face reconstruction, motion forecasting.
Multi-Person Pose forecasting
Multiple Object Forecasting
Object discovery, virtual try-on, carla map leaderboard, dead-reckoning prediction, 3d anomaly detection, video anomaly detection, 3d classification, scene flow estimation.
Self-supervised Scene Flow Estimation
Generalized Referring Expression Segmentation
Gaze estimation.
Texture Synthesis
Human parsing.
Multi-Human Parsing
Weakly supervised segmentation.
3D Multi-Person Pose Estimation (absolute)
3D Multi-Person Pose Estimation (root-relative)
3D Multi-Person Mesh Recovery
Facial landmark detection.
Unsupervised Facial Landmark Detection
3D Facial Landmark Localization
Pose tracking.
3D Human Pose Tracking
Activity detection, inverse rendering, gait recognition.
Multiview Gait Recognition
Gait recognition in the wild, interest point detection, homography estimation, multi-view learning, incomplete multi-view clustering, scene segmentation.
Thermal Image Segmentation
Sign language recognition.
3D Character Animation From A Single Photo
Interactive segmentation, disease prediction, disease trajectory forecasting.
Dichotomous Image Segmentation
Temporal localization.
Language-Based Temporal Localization
Temporal defect localization, scene generation, template matching, event-based vision.
Event-based Optical Flow
Event-Based Video Reconstruction
Event-based motion estimation, multi-label image classification.
Multi-label Image Recognition with Partial Labels
3D Hand Pose Estimation
Object counting, intelligent surveillance.
Vehicle Re-Identification
Relation network, visual dialog.
Image Recognition
Fine-grained image recognition, license plate recognition, motion segmentation, camera localization.
Camera Relocalization
Disparity estimation.
3D Object Tracking
3D Single Object Tracking
Lidar semantic segmentation, text to video retrieval, partially relevant video retrieval, text spotting.
Person Search
Decision making under uncertainty.
Uncertainty Visualization
Knowledge distillation.
Data-free Knowledge Distillation
Self-knowledge distillation, mixed reality, few-shot class-incremental learning, class-incremental semantic segmentation, non-exemplar-based class incremental learning, text-to-video generation, text-to-video editing, subject-driven video generation, shadow detection.
Shadow Detection And Removal
Unconstrained Lip-synchronization
Cross-corpus
Micro-expression recognition, micro-expression spotting.
3D Facial Expression Recognition
Smile Recognition
Moment retrieval.
Video Inpainting
Future prediction
Overlapped 10-1, overlapped 15-1, overlapped 15-5, disjoint 10-1, disjoint 15-1, image categorization, fine-grained visual categorization, deep attention, video enhancement.
Face Image Quality Assessment
Lightweight face recognition.
Age-Invariant Face Recognition
Synthetic face recognition, face quality assessment.
Stereo Image Super-Resolution
Burst image super-resolution, satellite image super-resolution, multispectral image super-resolution, physics-informed machine learning, soil moisture estimation, line detection, color constancy.
Few-Shot Camera-Adaptive Color Constancy
Image cropping, stereo matching hand.
Visual Recognition
Fine-Grained Visual Recognition
Human mesh recovery, zero-shot action recognition.
3D Multi-Object Tracking
Real-time multi-object tracking, multi-animal tracking with identification, grounded multiple object tracking, sign language translation.
Tone Mapping
Video reconstruction.
Zero Shot Segmentation
Surface normals estimation.
Natural Language Transduction
Transparent object detection, transparent objects, video restoration.
Analog Video Restoration
3d absolute human pose estimation.
Text-to-Face Generation
Image forensics, novel class discovery.
HDR Reconstruction
Multi-exposure image fusion, abnormal event detection in video.
Semi-supervised Anomaly Detection
Cross-domain few-shot learning, probabilistic deep learning, unsupervised few-shot image classification, generalized few-shot classification, breast cancer histology image classification.
Breast Cancer Detection
Breast cancer histology image classification (20% labels), infrared and visible image fusion.
Steganalysis
Texture classification, vision-language navigation.
Spoof Detection
Face presentation attack detection, detecting image manipulation, cross-domain iris presentation attack detection, finger dorsal image spoof detection, image animation.
Iris Recognition
Pupil dilation, pedestrian attribute recognition.
Reflection Removal
One-shot visual object segmentation
Sketch Recognition
Face Sketch Synthesis
Drawing pictures.
Photo-To-Caricature Translation
Unbiased Scene Graph Generation
Panoptic Scene Graph Generation
Action understanding, automatic post-editing.
Document Image Classification
Geometric Matching
Highlight detection, multi-view 3d reconstruction, object categorization, severity prediction, intubation support prediction, meme classification, hateful meme classification, blind face restoration.
Cloud Detection
Dense Captioning
Face reenactment.
Human action generation
Action Generation
Image outpainting.
Person Retrieval
Surgical phase recognition, online surgical phase recognition, offline surgical phase recognition, human dynamics.
3D Human Dynamics
Semantic SLAM
Object SLAM
Action quality assessment, image stitching.
Text based Person Retrieval
Text-to-image, story visualization, complex scene breaking and synthesis, object segmentation.
Camouflaged Object Segmentation
Landslide segmentation, text-line extraction, situation recognition, grounded situation recognition, image deconvolution.
Intrinsic Image Decomposition
Line segment detection, multi-target domain adaptation, image fusion, pansharpening, image to video generation.
Unconditional Video Generation
Table recognition, weakly-supervised instance segmentation, image smoothing.
Camouflaged Object Segmentation with a Single Task-generic Prompt
Image morphing, image steganography, point clouds, rotated mnist, diffusion personalization.
Diffusion Personalization Tuning Free
Efficient diffusion personalization, image shadow removal, layout design, motion detection, sports analytics, viewpoint estimation.
Fake Image Detection
GAN image forensics
Fake Image Attribution
Drone navigation, drone-view target localization, lane detection.
3D Lane Detection
License plate detection.
Multi-Object Tracking and Segmentation
Occlusion Handling
Video Panoptic Segmentation
Person identification, zero-shot transfer image classification.
Value prediction
Body mass index (bmi) prediction, contour detection.
Face Image Quality
Photo retouching.
Grasp Generation
3D Canonical Hand Pose Estimation
Shape representation of 3d point clouds, 3d point cloud reconstruction, dense pixel correspondence estimation, human part segmentation.
Image to 3D
Symmetry detection, video style transfer, motion retargeting, referring image matting.
Referring Image Matting (Expression-based)
Referring Image Matting (Keyword-based)
Referring Image Matting (RefMatte-RW100)
Referring image matting (prompt-based).
hand-object pose
Robot pose estimation, 3d point cloud linear classification, crop yield prediction, image quality estimation.
Material Recognition
Road damage detection.
Document Shadow Removal
Space-time video super-resolution, traffic sign detection, video matting.
Human Interaction Recognition
One-shot 3d action recognition, mutual gaze, affordance detection.
Hand Detection
Image similarity search.
Multiview Learning
Person recognition.
Precipitation Forecasting
Inverse tone mapping, image/document clustering, self-organized clustering, 3d shape modeling.
Action Analysis
Facial editing.
Holdout Set
Image forgery detection, image instance retrieval, amodal instance segmentation, material classification.
Open Vocabulary Attribute Detection
Referring expression generation, instance search.
Audio Fingerprint
Open-World Semi-Supervised Learning
Semi-supervised image classification (cold start).
3D Object Reconstruction From A Single Image
Art analysis, event segmentation, generic event boundary detection, food recognition.
Gaze Prediction
Image-variation, point cloud super resolution, semi-supervised instance segmentation, skills assessment.
Sensor Modeling
Video segmentation, camera shot boundary detection, open-vocabulary video segmentation, open-world video segmentation, lung nodule classification, lung nodule 3d classification, lung nodule detection, lung nodule 3d detection, video prediction, earth surface forecasting, predict future video frames, 3d scene reconstruction, handwriting generation, image retouching, motion magnification, scene change detection.
Sketch-to-Image Translation
Skills evaluation, highlight removal, 3d shape reconstruction from a single 2d image.
Shape from Texture
Handwriting verification, bangla spelling error correction, birds eye view object detection.
Zero-Shot Composed Image Retrieval (ZS-CIR)
JPEG Artifact Removal
Multispectral object detection, pose retrieval, rgb-d reconstruction, scanpath prediction, seeing beyond the visible, deception detection, deception detection in videos, constrained lip-synchronization, face dubbing.
Video Visual Relation Detection
Human-object relationship detection, 3d shape reconstruction, 3d shape representation.
3D Dense Shape Correspondence
Audio-visual synchronization, image manipulation localization, kinship verification, medical image enhancement, multiple people tracking.
Network Interpretation
Semi-supervised domain generalization, single-object discovery, training-free 3d point cloud classification, unsupervised semantic segmentation.
Unsupervised Semantic Segmentation with Language-image Pre-training
Binary classification, cancer-no cancer per image classification, llm-generated text detection, cancer-no cancer per breast classification, suspicious (birads 4,5)-no suspicious (birads 1,2,3) per image classification, cancer-no cancer per view classification.
Sequential Place Recognition
Autonomous flight (dense forest), multimodal machine translation.
Face to Face Translation
Multimodal lexical translation, multiple object tracking with transformer.
Multiple Object Track and Segmentation
10-shot image generation, bokeh effect rendering, drivable area detection, face anonymization, font recognition, horizon line estimation, image imputation.
Instance Shadow Detection
Long video retrieval (background removed), medical image denoising.
Occlusion Estimation
Open vocabulary panoptic segmentation, physiological computing.
Lake Ice Monitoring
Short-term object interaction anticipation, spatio-temporal video grounding, unsupervised 3d point cloud linear evaluation, video forensics, wireframe parsing, single-image-generation, unsupervised anomaly detection with specified settings -- 30% anomaly, root cause ranking, anomaly detection at 30% anomaly, anomaly detection at various anomaly percentages.
Unsupervised Contextual Anomaly Detection
Facial expression recognition, cross-domain facial expression recognition, zero-shot facial expression recognition, landmark tracking, muscle tendon junction identification, 2d semantic segmentation task 3 (25 classes), document enhancement, 3d scene editing, action assessment, ad-hoc video search, defocus blur detection, event data classification, generalized referring expression comprehension, image deblocking, motion disentanglement, personality trait recognition, synthetic image detection, traffic accident detection, accident anticipation, unsupervised landmark detection, visual speech recognition, lip to speech synthesis, gaze redirection, 2d pose estimation, category-agnostic pose estimation, overlapping pose estimation, weakly supervised action segmentation (transcript), weakly supervised action segmentation (action set), calving front delineation in synthetic aperture radar imagery, calving front delineation in synthetic aperture radar imagery with fixed training amount.
Handwritten Line Segmentation
Handwritten word segmentation, handwritten text recognition, handwritten document recognition, unsupervised text recognition.
General Action Video Anomaly Detection
Physical video anomaly detection, monocular cross-view road scene parsing(road), monocular cross-view road scene parsing(vehicle).
Transparent Object Depth Estimation
3d open-vocabulary instance segmentation.
4D Panoptic Segmentation
Animated gif generation, historical color image dating, stochastic human motion prediction, image retargeting, image and video forgery detection, infrared image super-resolution, motion captioning, personalized segmentation, persuasion strategies, scene-aware dialogue, spatial relation recognition, spatial token mixer, steganographics, story continuation.
Unsupervised Anomaly Detection with Specified Settings -- 0.1% anomaly
Unsupervised anomaly detection with specified settings -- 1% anomaly, unsupervised anomaly detection with specified settings -- 10% anomaly, unsupervised anomaly detection with specified settings -- 20% anomaly, vehicle speed estimation, visual social relationship recognition, zero-shot text-to-video generation, continual anomaly detection, continual semantic segmentation, overlapped 5-3, overlapped 25-25, evolving domain generalization, source-free domain generalization, micro-expression generation, micro-expression generation (megc2021), unsupervised panoptic segmentation, unsupervised zero-shot panoptic segmentation, 3d rotation estimation, 3d semantic occupancy prediction, camera auto-calibration, data ablation, defocus estimation, derendering.
Occluded Face Detection
Fingertip detection, gait identification, human-object interaction concept discovery, image comprehension, speaker-specific lip to speech synthesis, multi-person pose estimation, neural stylization.
Part-aware Panoptic Segmentation
Population Mapping
Pornography detection, raw reconstruction, semi-supervised video classification, spectrum cartography, synthetic image attribution, training-free 3d part segmentation, unsupervised image decomposition, video propagation, visual analogies, explanatory visual question answering, weakly supervised 3d point cloud segmentation, weakly-supervised panoptic segmentation, drone-based object tracking, text-guided-generation, brain visual reconstruction, brain visual reconstruction from fmri, fashion understanding, semi-supervised fashion compatibility.
intensity image denoising
Lifetime image denoising, observation completion, active observation completion, boundary grounding.
Video Narrative Grounding
3d inpainting, 4d spatio temporal semantic segmentation.
Age Estimation
Few-shot Age Estimation
Age and gender estimation, brdf estimation, camouflage segmentation, clothing attribute recognition, depth image estimation, detecting shadows, dynamic texture recognition.
Disguised Face Verification
Few shot open set object detection, generalized zero-shot learning - unseen, hd semantic map learning, human-object interaction anticipation, image deep networks, keypoint detection and image matching, manufacturing quality control, materials imaging, multi-person pose estimation and tracking.
Multi-modal image segmentation
Multi-object discovery, neural radiance caching.
Parking Space Occupancy
Partial Video Copy Detection
Multimodal Patch Matching
Perpetual view generation, prediction of occupancy grid maps, procedure learning, prompt-driven zero-shot domain adaptation, repetitive action counting, svbrdf estimation, single-shot hdr reconstruction, on-the-fly sketch based image retrieval, thermal image denoising, trademark retrieval, unsupervised instance segmentation, unsupervised zero-shot instance segmentation, vehicle key-point and orientation estimation.
Video-Adverb Retrieval (Unseen Compositions)
Video-to-image affordance grounding.
Visual Sentiment Prediction
Human-scene contact detection, localization in video forgery, 3d canonicalization.
Cube Engraving Classification
3d scene graph alignment, 3d surface generation.
Visibility Estimation from Point Cloud
Amodal layout estimation, blink estimation, camera absolute pose regression, change data generation.
Image-Guided Composition
Constrained diffeomorphic image registration, continuous affect estimation, deep feature inversion, document image skew estimation, earthquake prediction, fashion compatibility learning.
Displaced People Recognition
Finger vein recognition, flooded building segmentation.
Future Hand Prediction
Gaze target estimation, house generation, human fmri response prediction, hurricane forecasting, ifc entity classification, image declipping, image similarity detection.
One-Shot Face Stylization
Image text removal, image-to-gps verification.
Image-based Automatic Meter Reading
Dial meter reading, indoor scene reconstruction, jpeg decompression.
Kiss Detection
Laminar-turbulent flow localisation.
Landmark Recognition
Brain landmark detection, corpus video moment retrieval, mllm evaluation: aesthetics, medical image deblurring, mental workload estimation, meter reading, micro-gesture recognition, mistake detection, motion expressions guided video segmentation, natural image orientation angle detection, multi-object colocalization, multilingual text-to-image generation, video emotion detection, nwp post-processing, occluded 3d object symmetry detection, open set video captioning, open vocabulary semantic segmentation, zero-guidance segmentation, pso-convnets dynamics 1, pso-convnets dynamics 2, partial point cloud matching.
Partially View-aligned Multi-view Learning
Pedestrian Detection
Thermal Infrared Pedestrian Detection
Personality trait recognition by face, physical attribute prediction, point cloud semantic completion, point cloud classification dataset, point-of-no-return (pnr) temporal localization, pose contrastive learning, portrait generation, prostate zones segmentation, pulmonary vessel segmentation, pulmonary artery–vein classification, reference expression generation, safety perception recognition, interspecies facial keypoint transfer, specular reflection mitigation, specular segmentation, state change object detection, surface normals estimation from point clouds, transform a video into a comics, transparency separation, typeface completion.
Unbalanced Segmentation
Unsupervised Long Term Person Re-Identification
Video correspondence flow, video frame interpolation.
eXtreme-Video-Frame-Interpolation
Video individual counting.
Key-Frame-based Video Super-Resolution (K = 15)
Yield mapping in apple orchards, lidar absolute pose regression, opd: single-view 3d openable part detection, self-supervised scene text recognition, video narration captioning, spectral estimation, spectral estimation from a single rgb image, 3d prostate segmentation, aggregate xview3 metric, atomic action recognition, composite action recognition, calving front delineation from synthetic aperture radar imagery, computer vision transduction, crosslingual text-to-image generation, damaged building detection, zero-shot dense video captioning, document to image conversion, frame duplication detection, geometrical view, hyperview challenge.
Image Operation Chain Detection
Kinematic based workflow recognition, logo recognition.
MLLM Aesthetic Evaluation
Motion detection in non-stationary scenes, open-set video tagging, segmentation based workflow recognition, small object detection.
Rice Grain Disease Detection
Sperm morphology classification, video & kinematic base workflow recognition, video based workflow recognition, video, kinematic & segmentation base workflow recognition, animal pose estimation.
International Journal of Computer Vision
International Journal of Computer Vision (IJCV) details the science and engineering of this rapidly growing field. Regular articles present major technical advances of broad general interest. Survey articles offer critical reviews of the state of the art and/or tutorial presentations of pertinent topics.
Coverage includes:
- Mathematical, physical and computational aspects of computer vision: image formation, processing, analysis, and interpretation; machine learning techniques; statistical approaches; sensors.
- Applications: image-based rendering, computer graphics, robotics, photo interpretation, image retrieval, video analysis and annotation, multi-media, and more.
- Connections with human perception: computational and architectural aspects of human vision.
The journal also features book reviews, position papers, editorials by leading scientific figures, as well as additional on-line material, such as still images, video sequences, data sets, and software. Please note: the median time indicated below is computed over all the submitted manuscripts including the ones that are not put into the review pipeline at the onset of the review process. The typical time to first decision for manuscripts is approximately 96 days.
- Yasuyuki Matsushita
- Jiri Matas
- Svetlana Lazebnik
Latest issue
Volume 132, Issue 3
Latest articles
Robust Heterogeneous Model Fitting for Multi-source Image Correspondences
- Shuyuan Lin
- Feiran Huang
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
- Zhi-Song Liu
- Robin Courant
- Vicky Kalogeiton
Learning to Generalize over Subpartitions for Heterogeneity-Aware Domain Adaptive Nuclei Segmentation
- Dongnan Liu
- Weidong Cai
UniMod1K: Towards a More Universal Large-Scale Dataset and Benchmark for Multi-modal Learning
- Xue-Feng Zhu
- Tianyang Xu
- Josef Kittler
Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion
- Gongjie Zhang
- Zhipeng Luo
- Eric P. Xing
Journal updates
Special issue guidelines.
Guidelines for IJCV special issue papers and proposals
Call for Papers: Special Issue on Biometrics Security and Privacy
Guest editors: Jun Wan, Sergio Escalera, Arun Ross, Philip Torr
Submission deadline: extended to 15 September 2023
Call for Papers: Special Issue on Open-World Visual Recognition
Guest editors: Zhun Zhong, Hong Liu, Yin Cui, Shin'ichi Satoh, Nicu Sebe, Ming-Hsuan Yang
Submission deadline: extended to 15 December 2023
Call for Papers: Special Issue on Computer Vision Approaches for Animal Tracking and Modeling 2023
Guest editors: Anna Zamansky, Helge Rhodin, Silvia Zuffi, Hyun Soo Park, Sara Beery, Angjoo Kanazawa, Shohei Nobuhara
Submission deadline: 31 August 2023
Journal information
- ACM Digital Library
- Current Contents/Engineering, Computing and Technology
- EI Compendex
- Google Scholar
- Japanese Science and Technology Agency (JST)
- Norwegian Register for Scientific Journals and Series
- OCLC WorldCat Discovery Service
- Science Citation Index Expanded (SCIE)
- TD Net Discovery Service
- UGC-CARE List (India)
© Springer Science+Business Media, LLC, part of Springer Nature
Survey of Internet of Things Applications using Raspberry Pi and Computer Vision
IEEE Transactions on Games
Special Issue on Computer Vision and Games
Video games and computer vision research have long held a symbiotic relationship. On the one hand, virtual worlds in games are often used for collecting training data or as testbeds for computer vision models, since they provide greater flexibility, control and scalability in the data collection process than the real world. On the other hand, computer vision advancements have pushed the frontiers of what is possible within these artificial game worlds and have transformed the processes by which these worlds are created. However, significant research questions remain unaddressed both in the field (Computer Vision) and the domain (Games), including technical and engineering challenges.
This special issue invites research papers that aim to bridge the existing gaps between computer vision research and games engineering, with the aim of bringing together the games research community and the computer vision community, which have largely operated independently until now. We are inviting papers for two main tracks. The first track focuses on introducing novel techniques within computer vision research that can advance the field of digital games. The second track, instead, focuses on leveraging game technologies to advance state-of-the-art techniques in computer vision. The list of topics below is not exhaustive.
1) Computer Vision for Games
- CV for game-playing, game testing and player modelling.
- Data-driven CV to improve game graphics, animations, level-design, etc. as well as procedural content generation.
- HCI through visual interfaces (gestures, posture, gaze, etc.).
- Extended reality games.
- Synthetic data and media generation based on users' emotions, behaviour, etc.
- Improving real-time applicability of vision models integrated within games and game engines.
2) Games for Computer Vision
- Game worlds that aid data augmentation techniques.
- Rich game-based labelled datasets for tasks such as object detection, segmentation, or depth and flow estimation.
- Ethics of game-based data collection and inference.
- Forward modelling in and for games.
- Generalisation and robustness in vision models leveraging a plethora of existing commercial games.
- Unsupervised pre-training of image/video representations and world transition models from gameplay data.
We invite the submission of high-quality papers on the topics above in the full paper format. Authors should follow the normal IEEE Transactions on Games guidelines for their submissions, but clearly identify their papers for this special issue during the submission process. Extended versions of previously published conference or workshop papers are welcome, provided that the journal paper is a significant extension and is accompanied by a cover letter explaining the additional contribution. See the submission guidelines for author information and page length limits.
Important Dates:
- Paper submission: January 31, 2024
- First decisions: May 31, 2024
- Early access SI publication (online): August 2024
- Publication in print: End 2024
Guest Editors:
- Chintan Trivedi (University of Malta)
- Matthew Guzdial (University of Alberta)
- Konstantinos Makantasis (University of Malta)
- Julian Togelius (New York University)
- Nicu Sebe (University of Trento)
computer vision Recently Published Documents
2D Computer Vision
A Survey on Generative Adversarial Networks: Variants, Applications, and Training
Generative models have gained considerable attention in unsupervised learning through a practical framework called Generative Adversarial Networks (GANs), owing to their outstanding data generation capability. Many GAN models have been proposed, and several practical applications have emerged in various domains of computer vision and machine learning. Despite GANs' success, there are still obstacles to stable training. The problems include Nash equilibrium, internal covariate shift, mode collapse, vanishing gradient, and lack of proper evaluation metrics. Stable training is therefore a crucial issue for the success of GANs in different applications. Herein, we survey several training solutions proposed by different researchers to stabilize GAN training. We discuss (I) the original GAN model and its modified versions, (II) a detailed analysis of various GAN applications in different domains, and (III) a detailed study of the various GAN training obstacles as well as training solutions. Finally, we outline several open issues and research directions on the topic.
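For orientation, the "original GAN model" mentioned in the abstract is usually stated as a two-player minimax game. The abstract does not give the formula; the notation below (generator G, discriminator D, data distribution p_data, noise prior p_z) is the standard textbook form and is supplied here as an assumption about what the survey covers.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Read against this objective, the vanishing-gradient obstacle named above corresponds to the case where D rejects generated samples confidently, so that the second term saturates and passes little gradient back to G.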
Efficient Channel Attention Based Encoder–Decoder Approach for Image Captioning in Hindi
Image captioning refers to the process of generating a textual description of the objects and activities present in a given image. It connects two fields of artificial intelligence: computer vision and natural language processing, which deal with image understanding and language modeling, respectively. In the existing literature, most work on image captioning has been carried out for the English language. This article presents a novel method for image captioning in the Hindi language using an encoder–decoder based deep learning architecture with efficient channel attention. The key contribution of this work is the deployment of an efficient channel attention mechanism with Bahdanau attention and a gated recurrent unit for developing an image captioning model in the Hindi language. Color images usually consist of three channels, namely red, green, and blue. The channel attention mechanism focuses on an image's important channels while performing the convolution, that is, it assigns higher importance to specific channels over others. The channel attention mechanism has been shown to have great potential for improving the efficiency of deep convolutional neural networks (CNNs). The proposed encoder–decoder architecture utilizes the recently introduced ECA-NET CNN to integrate the channel attention mechanism. Hindi is the fourth most spoken language globally, widely spoken in India and South Asia; it is India's official language. A dataset for image captioning in Hindi is created manually by translating the well-known MSCOCO dataset from English to Hindi. The efficiency of the proposed method is compared with other baselines in terms of Bilingual Evaluation Understudy (BLEU) scores, and the results illustrate that the proposed method outperforms them. The proposed method has attained improvements of 0.59%, 2.51%, 4.38%, and 3.30% in terms of BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores, respectively, with respect to the state-of-the-art. The quality of the generated captions is further assessed manually in terms of adequacy and fluency to illustrate the proposed method's efficacy.
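The channel attention idea described in this abstract can be illustrated with a minimal sketch in the spirit of efficient channel attention: pool each channel to a single value, mix neighboring channel descriptors with a small 1D kernel, squash the result with a sigmoid, and rescale each channel by its weight. This is not the authors' implementation. The function names, the nested-list image layout, and the fixed uniform kernel (a stand-in for weights that would normally be learned) are all assumptions made for illustration.

```python
import math


def eca_channel_weights(channel_means, kernel):
    """Return one attention weight in (0, 1) per channel.

    channel_means: per-channel global averages (the pooled descriptors).
    kernel: odd-length 1D kernel applied across the channel axis
            (hypothetical fixed values; a real model would learn them).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    # Zero-pad so every channel has a full neighborhood.
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        # Local cross-channel interaction: weighted sum over neighbors.
        mixed = sum(k * padded[c + j] for j, k in enumerate(kernel))
        # Sigmoid maps the mixed descriptor to a weight in (0, 1).
        weights.append(1.0 / (1.0 + math.exp(-mixed)))
    return weights


def apply_channel_attention(image, kernel=(1 / 3, 1 / 3, 1 / 3)):
    """Rescale each channel of `image` by its attention weight.

    image: list of channels, each a list of rows of floats.
    Returns (rescaled_image, weights).
    """
    # Global average pooling: one scalar per channel.
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in image]
    weights = eca_channel_weights(means, kernel)
    scaled = [[[w * v for v in row] for row in ch] for ch, w in zip(image, weights)]
    return scaled, weights


if __name__ == "__main__":
    # Toy 3-channel (red, green, blue) 2x2 image with constant channels.
    rgb = [[[1.0, 1.0], [1.0, 1.0]],
           [[2.0, 2.0], [2.0, 2.0]],
           [[3.0, 3.0], [3.0, 3.0]]]
    _, channel_weights = apply_channel_attention(rgb)
    print(channel_weights)
```

In a trained network the kernel values are learned and the operation is applied to intermediate feature maps with many channels rather than to raw RGB input; the toy image only keeps the arithmetic easy to follow.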
Feature Matching-based Approaches to Improve the Robustness of Android Visual GUI Testing
In automated Visual GUI Testing (VGT) for Android devices, the available tools often suffer from low robustness to mobile fragmentation, leading to incorrect results when running the same tests on different devices. To mitigate these issues, we evaluate two feature matching-based approaches for widget detection in VGT scripts, which use, respectively, the complete full-screen snapshot of the application (Fullscreen) and the cropped images of its widgets (Cropped) as visual locators to match on emulated devices. Our analysis includes validating the portability of different feature-based visual locators over various apps and devices and evaluating their robustness in terms of cross-device portability and correctly executed interactions. We assessed our results through a comparison with two state-of-the-art tools, EyeAutomate and Sikuli. Despite a limited increase in the computational burden, our Fullscreen approach outperformed state-of-the-art tools in terms of correctly identified locators across a wide range of devices and led to a 30% increase in passing tests. Our work shows that VGT tools' dependability can be improved by bridging the testing and computer vision communities. This connection enables the design of algorithms targeted to domain-specific needs and thus inherently more usable and robust.
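As a rough illustration of what feature matching for a visual locator involves, the sketch below matches descriptor vectors from a locator image against descriptors from a device screenshot, using nearest-neighbor search with a ratio test. The abstract does not specify which feature detector or matching rule the authors use, so the ratio test, the thresholds, the function names, and the toy 2D descriptors are all assumptions made for illustration.

```python
import math


def match_descriptors(locator_desc, screen_desc, ratio=0.75):
    """Return (locator_index, screen_index) pairs that pass the ratio test.

    A match is kept only when the nearest screen descriptor is clearly
    closer than the second nearest, which filters ambiguous matches.
    """
    matches = []
    for i, d in enumerate(locator_desc):
        # Rank all screen descriptors by Euclidean distance to d.
        ranked = sorted((math.dist(d, s), j) for j, s in enumerate(screen_desc))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


def widget_found(locator_desc, screen_desc, min_matches=2, ratio=0.75):
    """Hypothetical decision rule: the widget counts as located when
    enough descriptors match unambiguously."""
    return len(match_descriptors(locator_desc, screen_desc, ratio)) >= min_matches


if __name__ == "__main__":
    # Toy 2D descriptors; real feature descriptors have many more dimensions.
    locator = [(0.0, 0.0), (10.0, 10.0)]
    screenshot = [(0.1, 0.0), (10.0, 9.9), (50.0, 50.0)]
    print(match_descriptors(locator, screenshot))
    print(widget_found(locator, screenshot))
```

Under this reading, the Fullscreen and Cropped variants would differ in which image supplies `locator_desc`: descriptors from the whole reference screen or from the cropped widget only.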
Computer vision to recognize construction waste compositions: A novel boundary-aware transformer (BAT) model
Computer vision for autonomous UAV flight safety: An overview and a vision-based safe landing pipeline example
Recent years have seen an unprecedented spread of Unmanned Aerial Vehicles (UAVs, or "drones"), which are highly useful for both civilian and military applications. Flight safety is a crucial issue in UAV navigation, which must ensure accurate compliance with recently legislated rules and regulations. The emerging use of autonomous drones and UAV swarms raises additional issues, making it necessary to build safety and regulation awareness into the relevant algorithms and architectures. Computer vision plays a pivotal role in such autonomous functionalities. Although the main aspects of autonomous UAV technologies (e.g., path planning, navigation control, landing control, mapping and localization, target detection/tracking) are already mature and well-covered, safety concerns such as flying safely in the vicinity of crowds, avoiding passing over persons, and guaranteeing emergency landing capabilities in case of malfunctions are generally treated as an afterthought when designing autonomous UAV platforms for unstructured environments. This fact is reflected in the fragmentary coverage of the above issues in current literature. This overview attempts to remedy this situation from the point of view of computer vision. It examines the field from multiple aspects, including regulations across the world and relevant current technologies. Finally, since very few attempts have been made so far towards a complete UAV flight safety and landing pipeline, an example computer vision-based UAV flight safety pipeline is introduced, taking into account all issues present in current autonomous drones. The content is relevant to any kind of autonomous drone flight (e.g., for movie/TV production, news-gathering, search and rescue, surveillance, inspection, mapping, wildlife monitoring, crowd monitoring/management), making this a topic of broad interest.
Automatic recognition and classification of microseismic waveforms based on computer vision
Promises and pitfalls of using computer vision to make inferences about landscape preferences: Evidence from an urban-proximate park system
Weight-sharing neural architecture search: A battle to shrink the optimization gap
Neural architecture search (NAS) has attracted increasing attention. In recent years, individual search methods have been replaced by weight-sharing search methods for higher search efficiency, but the latter often suffer from instability. This article provides a literature review of these methods and attributes this issue to the optimization gap. From this perspective, we group existing approaches into several categories according to their efforts in bridging the gap, and we analyze the advantages and disadvantages of these methodologies. Finally, we share our opinions on the future directions of NAS and AutoML. Due to the expertise of the authors, this article mainly focuses on the application of NAS to computer vision problems.
Assessing surface drainage conditions at the street and neighborhood scale: A computer vision and flow direction method applied to lidar data
This paper first reviews the main ideas of deep learning and presents several related, frequently used algorithms for computer vision. It then describes the current research status of the computer vision field, particularly the main applications of deep learning in it.
Focusing on deep learning, the method currently most widely used in computer vision, this paper outlines the development of deep learning models and identifies the introduction of convolutional neural networks as the inflection point in that development.
Keywords: Computer vision; Literature review. Code metadata: Permanent link to reproducible Capsule: https://doi.org/10.24433/CO.0411648.v1.
1. Introduction. Deep learning (DL), a prevailing branch of artificial intelligence (AI), has been extended with diversified network structures.
Computer Vision Based Object Detection and Recognition System for Image Searching
Abstract: Computer vision is concerned with methods for the automatic extraction, analysis, and understanding of useful information from a single image or a sequence of images.
Abstract: The author provides a general introduction to computer vision. He discusses basic techniques and computer implementations, and also indicates areas in which further research is needed. He focuses on two-dimensional object recognition, i.e., recognition of an object whose spatial orientation, relative to the viewing direction, is known.
Abstract: The field of computer vision is rapidly evolving. Pictures and videos can be obtained and processed to model, duplicate, and occasionally introduce additional visuals to complete valuable tasks. This paper outlines a method for gathering, refining, and comprehending video and images.
These CVPR 2021 papers are the Open Access versions, provided by the Computer Vision Foundation. Except for the watermark, they are identical to the accepted versions; the final published version of the proceedings is available on IEEE Xplore. This material is presented to ensure timely dissemination of scholarly and technical work.
Abstract: In order to study artificial intelligence-based computer vision imaging, computer technology was used to efficiently and accurately obtain relevant information from environmental images or videos. Things and phenomena in the objective world were analyzed, judged, and decided.
Reviews Released: 24 January 2023. Rebuttal Period: 24-31 January 2023. Final Decisions: 27 February 2023. Papers in the main technical program must describe high-quality, original research. Topics of interest include all aspects of computer vision and pattern recognition including, but not limited to: 3D from multi-view and sensors
Evolutionary computer vision (ECV) is at the intersection of two major research fields of artificial intelligence: computer vision and evolutionary computation. This special issue aims to provide an overview of state-of-the-art contributions to the latest research and development in the discipline.
Originating with Seymour Papert's Summer Vision Project of 1966, computer vision has been in development for decades, improving along the way and creating new possibilities for everyone. Though complex, the process these systems follow can be broken down into four fundamental steps:
Starting from computer vision algorithms and image processing technologies, the computer vision display system is designed, and image distortion correction algorithms are explored for reference. Published in: 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL). Date of Conference: 10-12 July 2020
These WACV 2023 papers are the Open Access versions, provided by the Computer Vision Foundation. ... {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, month = {January}, year = {2023}, pages = {4799-4808} } Self Supervised Low Dose Computed Tomography Image Denoising Using Invertible Network Exploiting ...
José-M. Acosta-Triana, David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos. Comments: Accepted at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING) Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is ...
IET Computer Vision. IET Computer Vision is a fully open access journal that introduces new horizons and sets the agenda for future avenues of research in a wide range of areas of computer vision. We are a fully open access journal that welcomes research articles reporting novel methodologies and significant results of interest.
International Journal of Computer Vision (IJCV) details the science and engineering of this rapidly growing field. Regular articles present major technical advances of broad general interest. Survey articles offer critical reviews of the state of the art and/or tutorial presentations of pertinent topics. Coverage includes:
Research in machine learning and computer vision is still evolving [1]. Computer vision is an essential part of the Internet of Things, the Industrial Internet of Things, and brain-human interfaces. Complex human activities are recognized and monitored in multimedia streams using machine learning and computer vision.
This paper surveys Internet of Things (IoT) solutions that use the Raspberry Pi (RPi) Single Board Computer (SBC) together with Computer Vision (CV) methods from the field of Artificial Intelligence (AI). Solutions for several areas of IoT applications are presented and compared. An overview of the hardware, software, CV methods, and CV algorithms used is given.
This special issue invites research papers aiming to bridge the existing gaps between computer vision research and games engineering, with the motive of bringing together the games research community and the computer vision community that have largely operated independently until now. We are inviting papers for two main tracks.
Statistics of keywords of different manufacturing stages in computer vision research papers from 1970 to 2020. ... 2 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS.