Most of the open problems related to CV lie at the intersection of AI and CV.
Weather and lighting: For cameras, weather and lighting are the obvious major issues. Many capabilities that are available during the day simply aren’t at night. Similarly, visual landmark localization just fails in snow. These are important open problems.
Unexpected situations: How to respond to something unexpected is a big open problem. What happens if an infant starts crawling in front of your car? Your object detectors have never been trained on humans in that pose, and will likely fail at recognition. The “obstacle” is too small to register as something to avoid for any depth sensor! How do you understand when construction work is happening in two lanes of the road? Can we construct abstract representations that help us deal with such unexpected situations when they inevitably occur? One example is to explicitly reason about the drivable regions ahead instead of relying entirely on traffic-participant detection — but there must be more effective representations.
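To make the drivable-region idea concrete, here is a minimal sketch. Everything in it is hypothetical: it assumes some segmentation network has already produced a per-pixel class map, and the `path_is_clear` helper and class IDs are made up for illustration. The point is that an unknown obstacle blocks the path simply by not being road, even though no detector was ever trained on it.

```python
import numpy as np

# Hypothetical segmentation output: per-pixel class IDs for one camera frame.
# Class 0 = drivable road surface; everything else is treated as non-drivable.
DRIVABLE_CLASS = 0

def drivable_mask(seg_map: np.ndarray) -> np.ndarray:
    """Binary mask of pixels the segmenter labeled as drivable road."""
    return seg_map == DRIVABLE_CLASS

def path_is_clear(seg_map: np.ndarray, path_pixels, min_drivable_ratio=0.98) -> bool:
    """Treat the planned path as blocked unless almost all of its pixels
    fall on drivable surface -- no detector class needed for the obstacle."""
    mask = drivable_mask(seg_map)
    hits = sum(mask[r, c] for r, c in path_pixels)
    return hits / max(len(path_pixels), 1) >= min_drivable_ratio

# Example: a crawling infant (or any unknown object) shows up as a patch of
# non-drivable pixels, even though no detector knows its class.
seg = np.zeros((480, 640), dtype=np.int64)   # toy frame: all drivable
seg[300:340, 310:330] = 7                    # unknown obstacle on the road
path = [(r, 320) for r in range(250, 400)]   # straight-ahead path pixels
print(path_is_clear(seg, path))              # -> False
```

The design choice is deliberate: the check never asks what the obstacle is, only whether the planned path stays on drivable surface.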
Infrastructure parsing: Since computer vision deals with more than just image processing, I include problems such as infrastructure parsing in computer vision too. At a complex intersection, which traffic light belongs to which lane? Is it okay to drive into a big texture-less wall (or, for that matter, a big bus) next to me? Where are the guardrails? Where does the road end and the sidewalk begin? How do I parse multiple complicated exit ramps on a highway?
Human understanding: Another related challenge is open-world human understanding. How does my system understand that a construction worker is controlling the flow of traffic along a patch of the road and signalling me to shift to the opposite side of the road for these 20 meters? How do I parse a traffic policeman waving me on while the (broken?) traffic light is red? How do I understand the hand or face gestures of a driver in another car allowing me to take the left turn at an intersection? Whether she will even perform those gestures when she sees no driver in my seat is another problem! And will the pedestrian standing on the sidewalk jump onto the road all of a sudden?
Aggressive behavior: This ties into human understanding: how do you infer driver intent at a roundabout or on an on-ramp to a highway?
Less controlled settings: What happens when there aren’t any lane markings? Sure, there are demos that predict lanes in such situations, but how robust are they?
Blind spot tracking: How do I keep track of a pedestrian who went out of sight behind the bus next to me, and who may or may not step in front of me all of a sudden? Even tracking fully visible objects in nice sunny weather doesn’t work robustly at the moment.
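One standard ingredient for tracking through occlusion is to keep predicting the track’s state while no detection is available and only correct it when the object reappears. Below is a minimal one-dimensional sketch using a constant-velocity Kalman filter; the class name, noise values, and the toy measurement sequence are all illustrative assumptions, not any particular system’s implementation.

```python
from typing import Optional
import numpy as np

class CoastingTrack:
    """Constant-velocity track that keeps predicting while occluded.
    Toy 1-D sketch: state = [position, velocity] along the curb."""

    def __init__(self, pos: float, vel: float):
        self.x = np.array([pos, vel], dtype=float)    # state estimate
        self.P = np.eye(2)                            # state covariance
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])   # transition, dt = 1
        self.Q = np.eye(2) * 0.01                     # process noise
        self.H = np.array([[1.0, 0.0]])               # we only measure position
        self.R = np.array([[0.5]])                    # measurement noise
        self.frames_unseen = 0

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, measured_pos: Optional[float]):
        self.predict()
        if measured_pos is None:        # occluded: coast on prediction only
            self.frames_unseen += 1
            return
        self.frames_unseen = 0
        y = measured_pos - (self.H @ self.x)[0]       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + (K * y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

# Pedestrian walks at ~1 m/frame, then disappears behind a bus for 5 frames.
track = CoastingTrack(pos=0.0, vel=1.0)
for z in [1.1, 2.0, 2.9, None, None, None, None, None]:
    track.update(z)
print(track.x[0], track.frames_unseen)   # predicted position, frames occluded
```

After five occluded frames the filter still offers a position estimate (with growing uncertainty in `P`), which is exactly what a planner needs in order to stay cautious around the bus.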
Obviously, all self-driving projects so far are predicated on highly controlled settings that at least include the concept of lanes. That concept doesn’t exist on the roads of many developing countries.
Currently, the most important computer-vision challenge for autonomous cars is running most of these algorithms in real time in complex, cluttered environments. Thanks to deep learning, building a vision model that achieves high accuracy is no longer a problem; making it run in real time, in messier environments such as Indian roads, is.
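As a rough illustration of what “real time” demands, here is a small benchmark sketch. The `dummy_model` is a placeholder (an assumption, not a real network); swap in your own inference call. Note that the check uses the 99th-percentile latency, because real-time behavior is about the worst frames, not the average.

```python
import time
import numpy as np

FRAME_BUDGET_S = 1.0 / 30.0   # ~33 ms per frame for a 30 FPS camera

def dummy_model(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a real detector/segmenter; replace with your network."""
    return frame.mean(axis=2)   # trivial placeholder work

def measure_fps(model, frames: int = 100, shape=(720, 1280, 3)) -> None:
    frame = np.random.randint(0, 255, shape, dtype=np.uint8)
    latencies = []
    for _ in range(frames):
        t0 = time.perf_counter()
        model(frame)
        latencies.append(time.perf_counter() - t0)
    p99 = float(np.percentile(latencies, 99))
    print(f"mean {np.mean(latencies)*1e3:.1f} ms, p99 {p99*1e3:.1f} ms")
    # Real time means the *worst* frames fit the budget, not the average.
    print("meets 30 FPS budget:", p99 <= FRAME_BUDGET_S)

measure_fps(dummy_model)
```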
Today, computer-vision algorithms for autonomous vehicles face direct competition from LiDAR, because LiDAR can solve many problems that are very complex for vision algorithms and, at the same time, can also help the vehicle with mapping and localization.
Vision still tends to fail in SLAM, which is why its use in autonomous vehicles is currently limited mostly to lane detection and generation, and to building low-cost ADAS systems.