Editor’s Note: Many TOPBOTS readers have asked us for advice on learning modern techniques like deep learning and getting jobs in AI. No one knows more about this subject than Rachel Thomas, deep learning researcher and co-founder of Fast.ai. Fast.ai is committed to democratizing practical AI education globally and offers popular MOOCs to get you ramped up fast. Without further ado, I’ll let Rachel share her wisdom about starting a career in AI without a machine learning PhD.
I was recently asked questions by two readers with diametrically opposed premises: one was excited that machine learning is now “automated” by services like Google Cloud, the other was concerned that machine learning takes too many years of prerequisite study, citing a popular Hacker News thread as his source.
Many people are having trouble navigating the hype and intimidating “AI is an exclusive tool for geniuses” warnings. AI is a hard topic for journalists to cover and sadly many misrepresentations are spread. Just read this extreme example by Stephen Merity detailing how DeepCoder was misconstrued in the media.
These two opposing sentiments – one believing machine learning is easy, and one considering it impossible – may seem unrelated, but both misconceptions are propagated by those working in the AI industry who have skewed incentives to either:
- Convince you to buy their general purpose machine learning API (none of which have been good for anything other than getting acqui-hired).
- Convince you that what they’re doing is so complicated, hard, and exclusive, that us mere mortals have no chance of understanding their sorcery. (This is such a common theme that recently a reddit parody of it was voted to the top of the machine learning page: A super harsh guide to machine learning)
Advancements in machine learning are coming rapidly, but for now, you need to be able to code to effectively use the technology. We’ve found from our free online course Practical Deep Learning for Coders that it takes about 70 hours of study to become an effective deep learning practitioner.
“Machine Learning As A Service” (MLaaS) Is A Disappointment In Practice
A general purpose machine learning API seems like a great idea, but the technology is simply not there yet. Existing APIs are either too narrowly specified to be widely useful, or they attempt to be very general and perform unacceptably poorly. I agree with Bradford Cross, former founder of Flightcaster and Prismatic and partner at Data Collective VC, who recently wrote about the failure of many AI companies to try to build products that customers need and would pay for: “It’s the attitude that those working in and around AI are now responsible for shepherding all human progress just because we’re working on something that matters. This haze of hubris blinds people to the fact that they are stuck in an echo chamber where everyone is talking about the tech trend rather than the customer needs and the economics of the businesses.” (emphasis mine)
Cross continues, “Machine Learning as a Service is an idea we’ve been seeing for nearly 10 years and it’s been failing the whole time. The bottom line on why it doesn’t work: the people that know what they’re doing just use open source, and the people that don’t will not get anything to work, ever, even with APIs. Many very smart friends have fallen into this tarpit. Those who’ve been gobbled up by bigcos as a way to beef up ML teams include Alchemy API by IBM, Saffron by Intel, and Metamind by Salesforce. Nevertheless, the allure of easy money from sticking an ML model up behind an API function doesn’t fail to continue attracting lost souls. Amazon, Google, and Microsoft are all trying to sell an MLaaS layer as a component of their cloud strategy. I’ve yet to see startups or big companies use these APIs in the wild, and I see a lot of AI usage in the wild so it’s doubtful that it’s due to the small sample size of my observations.”
Is Google Cloud the answer?
Google is very poorly positioned to help “democratize” the field of deep learning. Not because of bad intentions– it’s just that they have way too many servers, way too much cash, and way too much data to appreciate the challenges the majority of the world faces in how to make the most of limited GPUs, on a limited budget (those AWS bills add up quickly!), and with limited size data sets. Google Brain is so deeply technical they are out of touch with the average coder.
For instance, TensorFlow is a low-level library, but Google seemed unaware of this when they released it and in how they marketed it. The designers of TensorFlow could have used a more standard object-oriented approach (like the excellent PyTorch), but instead they kept with the fine Google tradition of inventing new conventions just for Google.
So if Google can’t even design a library that is easily usable by sophisticated data scientists, how likely is it that they can create something that regular people can use to solve their real-world problems?
Hacker News Gives Awful Advice
Why do Hacker News contributors regularly give such awful advice on machine learning? While the theory behind machine learning draws on a lot of advanced math, that is very different from the practical knowledge needed to use machine learning in practice. I have a math PhD, yet knowing the math has been less helpful than you might expect in building practical, working models.
The line of thinking espoused in that Hacker News comment is harmful for a number of reasons:
- It’s totally wrong
- Good education motivates the study of underlying concepts. To borrow an analogy from Paul Lockhart’s A Mathematician’s Lament, kids would quit music if you made them study music theory for years before they were ever allowed to sing or touch an instrument.
- Good education doesn’t overly complicate the material. If you truly understand something, you can explain it in an accessible way. In Practical Deep Learning for Coders, Jeremy Howard implemented different modern optimization techniques (often considered a complex topic) in Excel to make it clearer how they work.
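To give a sense of how little machinery an optimizer needs, here is a minimal sketch of two update rules in plain Python. It is not the Excel implementation from the course; the toy loss, the function names, and the hyperparameter values are all assumptions chosen for illustration.

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
    # The loss itself is an assumption made for this illustration.
    return 2.0 * (w - 3.0)


def gradient_descent(w, lr=0.1, steps=200):
    # Plain gradient descent: step against the gradient each iteration.
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def momentum(w, lr=0.1, beta=0.9, steps=200):
    # Gradient descent with momentum: keep a running, decayed sum of past
    # gradients (v) and step along that instead of the raw gradient.
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)
        w -= lr * v
    return w


if __name__ == "__main__":
    print("gradient descent:", gradient_descent(0.0))
    print("momentum:        ", momentum(0.0))
```

Both functions start from the same initial weight and move toward the minimum at 3. Each update is one or two lines of arithmetic, which is why it fits comfortably in spreadsheet cells.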
As I wrote a few months ago, it is “far better to take a domain expert within your organization and teach them deep learning, than it is to take a deep learning expert and throw them into your organization.” Deep learning PhD graduates are unlikely to have the wide range of relevant experiences that you value in your most effective employees. They’re also much more likely to spend time on fun engineering problems instead of keeping a razor-sharp focus on the most important problems for business and society.
In our experience across many industries and many years of applying machine learning to a range of problems, we’ve consistently seen organizations under-appreciate and underinvest in their existing in-house talent. In the days of the big data fad, this meant companies spent their money on external consultants. And in these days of the false ‘deep learning exclusivity’ meme, it means searching for those unicorn deep learning experts, often including paying vastly inflated sums for failing deep learning startups.
Cutting Through Hype When You’re Not A Researcher
Computational linguist Dan Simonson wrote a handy guide of questions to ask to evaluate NLP, ML, and AI and sniff out the snake oil:
- Is there existing training data? If not, how do they plan on getting it?
- Do they have an evaluation procedure built into their application development process?
- Does their proposed application rely on unprecedentedly high performance on specific AI components?
- Do the proposed solutions rely on attested, reliable phenomena?
- If using pre-packaged AI components, do they have a clear plan on how they will go from using those components to having meaningful application output?
As an NLP researcher, Simonson is excited about the current advances in AI, but points out that the whole field is harmed when people exploit the gap in knowledge between practitioners and the public.
Deep learning researcher Stephen Merity has an aptly titled post called “It’s ML, not magic: simple questions you should ask to help reduce AI hype.” His questions include:
- How much training data is required?
- Can this work unsupervised (= without labelling the examples)?
- Can the system predict out-of-vocabulary names? (i.e. Imagine if I said “My friend Rudinyard was mean to me” – many AI systems would never be able to answer “Who was mean to me?” as Rudinyard is out of their vocabulary)
- How much does the accuracy fall as the input story gets longer?
- How stable is the model’s performance over time?
Merity also reminds us that models are often evaluated on highly processed, contrived, or limited datasets that don’t accurately reflect the real data you are working with.
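The out-of-vocabulary question above can be made concrete with a small sketch. It assumes a simplified system that maps words to integer ids from a fixed vocabulary built at training time; the training sentences, the `<unk>` token name, and the helper functions are hypothetical and not taken from Merity’s post.

```python
def build_vocab(sentences):
    # Assign an integer id to every word seen in training.
    # Id 0 is reserved for a catch-all "unknown" token.
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab):
    # Words never seen in training all collapse to the same <unk> id.
    return [vocab.get(token, vocab["<unk>"]) for token in sentence.lower().split()]


def decode(ids, vocab):
    # Map ids back to words; the original unseen word cannot be recovered.
    inverse = {idx: token for token, idx in vocab.items()}
    return [inverse[idx] for idx in ids]


# Hypothetical training data for illustration only.
train_sentences = [
    "my friend alice was mean to me",
    "my friend bob was kind to me",
]
vocab = build_vocab(train_sentences)

if __name__ == "__main__":
    ids = encode("My friend Rudinyard was mean to me", vocab)
    print(decode(ids, vocab))
```

Once the name has been replaced by `<unk>`, nothing downstream can tell Rudinyard apart from any other unseen name, so a system built this way has no way to produce it as an answer.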
What Are Your Next Steps?
If you are an aspiring machine learning practitioner: Good news! You don’t need a PhD, you don’t need to code algorithms from scratch in CUDA or MPI. If you have a year of coding experience, we recommend that you try Practical Deep Learning for Coders, or consider my additional advice about how to become a data scientist.
If you work in tech and want to build a business that uses ML: Good news! You don’t need to hire one of those highly elusive, highly expensive AI PhDs away from OpenAI. Give your coders the resources and time they need to get up to speed. Focus on a specific domain, collaborate with domain experts, and build a product that people actually want and will use.
However, there are even more effective ways to get your name and work out there:
- Start a blog. It’s like a resume, only better. Several people I know had blog posts lead to job offers!
- Create an interesting app and put it online. Real user and industry feedback is illuminating.
- Write helpful answers to others’ questions on the learn machine learning subreddit or on the fast.ai forums. Altruism is important to me, but that’s not why I recommend helping others. Explaining something you’ve learned to someone else is a key part of solidifying your own understanding.
- Do your own experiments and share results via a blog post or GitHub. One of our students, Slav Ivanov, asked about using different optimizers for style transfer. Jeremy suggested he try it out, and Slav wrote an excellent blog post on what he found. This post was very popular on reddit and made Slav’s work more widely known.
- Contribute to open source. Here, one of our students shares his positive experience contributing to TensorFlow. With 3 lines of code, he reduced the binary size of TensorFlow on Android to less than 10MB!
To inspire you, here are some sample blog posts from students who have taken our course:
- Linear Algebra Cheat Sheet for Deep Learning
- CNNs from Different Viewpoints
- Setting up a Deep Learning Machine in a Lazy yet Quick Way
- Non-artistic Style Transfer (or How to Draw Kanye using Captain Picard’s Face)
I enjoyed all of the above blog posts and I don’t think any of them are too intimidating. They’re meant to be accessible.
You’re on the right path to building machine learning, deep learning, and data science expertise by taking MOOCs, working on side projects, connecting with online communities, and blogging. All of these strategies will gain you even more opportunities to learn and meet others who are also aspiring learners.