A Crash Course in Computer Vision

08.03.18 · 7 min read

In 1966, Artificial Intelligence pioneer Marvin Minsky — co-founder of MIT’s Artificial Intelligence Laboratory — told one of his graduate students to ‘connect a camera to a computer and have it describe what it sees.’ In the fifty years that followed, computers learned to count and read but still weren’t able to see — until now. Today, the field of computer vision is rapidly flourishing, holding vast potential to alleviate everything from healthcare disparities to mobility limitations on a global scale.

Although computer vision is defined as ‘the scientific discipline of giving machines the ability of sight,’ in the past decade or so it has transformed into a practice that integrates and takes advantage of developments within artificial intelligence. As a result, computer vision today isn’t about ‘seeing’ as much as it is about machines being able to understand and contextualise situations by learning to identify objects at massive scale.

On a practical level, that could lead to machines managing certain aspects of healthcare in developing countries, helping to fill the shortage of trained medical professionals those areas are experiencing. Or it could lead to NGOs being able to gather evidence faster, enabling them to hold governments and corporations accountable for treating people humanely worldwide. Computer vision could even help the visually impaired regain some ability to see, by using machine learning in wearables to identify objects. So despite being in what researchers call an ‘embryonic state’, this developing capability in contemporary computer vision — the ability to combine machine vision with cognition — is considered a key stepping stone to A.I. and a potentially revolutionary development.

From healthcare to high finance

So far, computer vision is driving healthcare startups that aim to alleviate issues like human error in surgery and diagnosis. There’s Gauss Surgical, for example, who are building a real-time blood monitor to eliminate inaccurate blood loss measurement during surgery. Microsoft’s InnerEye, meanwhile, is working on a tool that analyses 3D radiological images to diagnose patients with more accuracy and agility than medical professionals. (Indeed, recent tests have found that computer vision identifies cancer cells better than humans do.) These developments have led some researchers to see computer vision as a way to significantly increase healthcare access in developing countries, where trained medical professionals are in short supply.

Meanwhile in mobility, Tesla released a semi-autonomous technology called Autopilot that can be integrated into your vehicle for $5,000; while it isn’t precise enough to let drivers completely take their hands off the wheel, computer vision and A.I. enable Autopilot to do things like tell the difference between hard and soft objects, use radar to adapt to weather conditions, and adjust its speed and switch lanes based on surrounding cars. Computer vision is even making its way into banking: Chinese startup SenseTime in particular has been setting the standard for using deep learning to enable facial recognition and payment processing (so much so that they’ve partnered with MIT for further A.I. research).

An Eye for Retail

In 2018, Amazon opened Amazon Go — a partially automated store that has replaced checkout stations and cashiers with computer vision, deep learning and sensor fusion, which work together to track what you pick up and charge you automatically once you leave the store. Perhaps most practically for smartphone owners, computer vision also underpins augmented reality — a technology that places virtual objects into real environments — which retailers such as ASOS are increasingly using to make shopping more personalised and immersive, both on your phone and in store. But the potential of augmented reality extends beyond taking a picture of that bike you passed on your walk to work to see if it’s available for purchase online: it could very well ‘reinvent the way people interact with computers and reality’ by positioning virtual objects as a natural part of the world and effectively ‘mixing’ reality.

Or that’s what elusive AR startup Magic Leap have been working on, anyway: in 2018, they revealed a mixed reality headset called Lightwear which injects life-like, moving and reactive objects and people into the wearer’s view of the real world. While we’re still a way off from everyone we know walking around with AR headsets, the technology is already taking off within popular culture (Icelandic band Sigur Rós collaborated with Magic Leap to create an ‘interactive soundscape’).

I spy with my little eye

But at the other, murkier end of the spectrum, computer vision is quietly transforming public security. For example, B2B platforms like ella are leveraging computer vision to enable surveillance cameras to recognise individual people and locations, and increasingly to analyse situations. And we’ve seen how swiftly governments have adopted these developments for potentially sinister purposes: computer vision has enabled everything from widespread facial recognition of civilians to more efficient military drone operations, leading to what The Verge calls ‘total surveillance states’.

A Visionary Future?

Indeed, many diverse industries actively benefit from developments in computer vision, but the same yet-to-be-answered questions linger around all of them. Will computer vision live up to its supposed promise of making life safer for people all over the world? Will we develop systems for alleviating privacy concerns as computer vision develops — or will the speed of advancements in the field lead to ‘surveillance states’ and the increasing erosion of individual agency? And more tangibly, will augmented reality offer us the option of playing with our daily experiences when we want to — or is it, as one researcher puts it, ‘the death of reality’?

I, Robot

Although those ethical questions hover above the field and will probably do so increasingly as computer vision becomes a clearer influence on our lives, we can scratch out the perception that we’re teaching computers to see and think so that one day they’ll be just as human as we are. Already in the 1970s and 1980s, researchers realised it’s practically impossible for a computer to replicate the human eye, visual cortex or brain — the latter being arguably the most difficult problem ever attempted.

Instead, recent years have seen the boom of a machine learning technique called convolutional neural networks. It’s true that the process is modelled after the human brain in terms of how it learns; a network of learning units called neurons learns how to convert input signals (a picture of a house, say) into corresponding output signals (the label ‘house’). But the crux lies in the conditions required for learning: before a computer can even get to identifying an object, it needs to be algorithmically trained by humans to look for what we deem important in each particular situation. In fact, Benedict Evans puts it quite aptly: computers see and think like computers, not people. They can accomplish far more in terms of analysing data than humans can, but they still do so following parameters that we define for them — at least for now.
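To make that input-to-label idea concrete, here is a minimal sketch of a convolutional network written in Python with PyTorch. The framework, the tiny two-layer architecture, the 64×64 image size and the three-label set are all assumptions made for illustration, not details from the article.

    import torch
    import torch.nn as nn

    class TinyClassifier(nn.Module):
        """A deliberately small convolutional network: image in, label scores out."""
        def __init__(self, num_classes: int = 3):
            super().__init__()
            # Convolutional layers learn filters that respond to visual patterns
            # (edges, textures, rooflines) in the input image.
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            # A final linear layer turns the pooled features into one score per label.
            self.classifier = nn.Linear(32 * 16 * 16, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.features(x)        # expects a batch of 3-channel, 64x64 images
            x = x.flatten(start_dim=1)  # flatten feature maps into one vector per image
            return self.classifier(x)   # raw scores, one per label

    model = TinyClassifier()
    image = torch.randn(1, 3, 64, 64)           # a stand-in for a photo of a house
    scores = model(image)
    labels = ['house', 'car', 'tree']           # hypothetical label set
    print(labels[scores.argmax(dim=1).item()])  # the network's best guess

Untrained, that guess is essentially random; the network only becomes useful after the human-supervised training described below.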

A house is not a home

To illustrate what that means, let’s take the example of a house. For a computer to be able to identify a picture of one, it needs to be repeatedly exposed to images of houses from diverse angles and contexts on a mass scale (think millions of images). The more images of houses it sees, the more it learns to recognise what a house is — but humans still need to keep an eye on the algorithm that lets it do so and continuously tweak it so the computer can recognise houses more accurately. More importantly, a computer might be able to pick out a house against the sky behind it in a picture — but it won’t necessarily know what a house is used for, how it differs from a car, or what else falls under the category of home.
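Continuing the hypothetical sketch above (same caveats: PyTorch, toy dimensions, stand-in data rather than millions of real photographs), the repeated exposure described here is a supervised training loop. Humans supply the labelled images and the definition of ‘correct’; the network only nudges its weights to match.

    import torch
    import torch.nn as nn

    # A deliberately tiny model: flatten the image, then one linear score per label.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 3))
    optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()  # humans decide what counts as a correct answer

    # Stand-in data: a batch of 8 random 'images', all labelled as class 0 ('house').
    images = torch.randn(8, 3, 64, 64)
    targets = torch.zeros(8, dtype=torch.long)

    for step in range(100):  # a real system sees millions of labelled examples
        optimiser.zero_grad()
        loss = loss_fn(model(images), targets)  # how far predictions are from the labels
        loss.backward()                         # work out how to nudge each weight
        optimiser.step()                        # nudge the weights toward the labels

Tweaking the ‘algorithm’, in the sense used above, amounts to changing human choices like the loss function, the labels and the architecture; the network never decides for itself what matters.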

What that means is that computers can identify objects at precisions and speeds that far surpass ours — but unlike humans, they can’t contextualise those objects. Their sensors and image processors exceed the human eye’s capabilities, but a computer still can’t achieve even toddler-like levels of recognition and cognition without absurdly large sets of data. The bottom line is that while computers learn much faster than humans do, they lack common sense. That’s why a car can’t yet be fully self-driving, and why a robotically-assisted surgery can’t be 100 percent safe. It’s also why humans are able to bias computers by teaching them what makes a good selfie, or reprogram how they prioritise information so they ignore important objects and focus on weird, psychedelic-looking stickers instead.

Better together?

Considering all this, data scientist Jeremy Howard suggests that framing computer vision as a fully autonomous technology is more sensational than it is beneficial. And although the field is currently defined by ethical dilemmas just as much as by the promise of a more captivating and efficient reality, Howard’s on to something. Up until now, humans and computers have been working together and complementing each other’s unique skills — an interdependent relationship that inherently develops machines in tandem with human goals and desires, however utopian or malicious those may be. It remains to be seen if and when that will change.

Appendix

The M Tank’s ‘A Year in Computer Vision’ Report

A deep dive into the nitty-gritty of some recent advances in computer vision.

Fei Fei Li’s ‘How We’re Teaching Computers to Understand Pictures’

A TED talk that makes computer vision digestible and focuses on its utopian possibilities.

How Snapchat’s Filters Work

Demystifying the dog filter, and more.

Dark Interactions Are Invading Our Lives

There’s tech we consciously interact with and there’s the tech we don’t really notice—even though we should.

All About Face-Controlled Apps

Unpacking the functionality behind your soon-to-be favourite method of procrastination.

The First Decade of Augmented Reality

We’ve come pretty far, and here’s why.

Ten Year Futures

A handy explanation of what advancements in machine learning and computer vision actually mean for you. Also doubles as a reassurance that no, Amazon and Facebook aren’t taking over the world.

These Psychedelic Stickers Blow AI Minds

Looks like even really smart machines can’t resist the allure of some shiny, swirly stickers.

A Look Inside Magic Leap and Their Mixed-Reality Goggles

After much secrecy, the startup finally let a journalist visit them and try out their product. Here’s why it’s crazy.

Learn More About the World With Google Lens and the Assistant

Already googly-eyed about Google? This is for you. Have your doubts? A decent read all the same.

WTF is Computer Vision?

One of the internet’s decent starting points for understanding the technology.

The Wonderful and Terrifying Implications of Computers That Can Learn

Hint: they’re mostly wonderful, according to this TED talk.

Imaging Snapchat and Mobile

Or, why you should stop looking at your phone and start looking through it.

Artificial Intelligence is Going to Supercharge Surveillance

A most definitely disconcerting look at how computer vision is being used in surveillance—and the questionable implications that bears for society.

What a Deep Neural Network Thinks About Your #Selfie

And if you’re in doubt about the insta-worthiness of your last selfie, you can still ask a neural network. Link in article.

Understanding Neural Networks Through Deep Visualization

Not for the faint of heart, but if you make it through this in-depth explanation we guarantee you’ll be able to explain the basics of how computer vision works.