How AI Learned to Talk - Recurrent Neural Networks & Transformers
From David Morgan
Discover the fascinating world of AI language processing as we trace how AI learned to talk. This episode is part of our ongoing series explaining the intricacies of neural networks in an easy-to-understand way, made for viewers in every career field. We start by contrasting network types covered in earlier episodes, such as feed-forward and convolutional networks, with the ability of Recurrent Neural Networks (RNNs) to analyze sequential data.
Witness the evolution of Natural Language Processing (NLP), beginning with Michael I. Jordan's pioneering work on simple RNNs in the mid-1980s. Discover how these networks, by learning sequences, move through a state space, laying the groundwork for future advancements. Continue the journey through the 1990s with Jeffrey Elman's RNNs, which showed remarkable skill at segmenting text into words and generating text, albeit with limitations. Next comes Yoshua Bengio's groundbreaking 2003 paper, which advocated for neural networks in NLP and introduced learned word embeddings. Then leap to the transformative work of Geoffrey Hinton, Ilya Sutskever, Andrej Karpathy, and others, exploring how RNNs evolved into powerful text generators, as demonstrated by OpenAI's RNN, large for its time, that was trained on Amazon reviews and developed an emergent capability to detect sentiment. Our journey culminates in the game-changing paper "Attention Is All You Need", which introduced the Transformer. Discover how OpenAI leveraged the Transformer to create models like GPT-1, GPT-2, GPT-3, InstructGPT, ChatGPT, and GPT-4, a progression marked by emergent properties and in-context learning capabilities.
Finally, we tease our next video, promising a deeper dive into the capabilities of Large Language Models (LLMs) and other Transformer-based models in the ongoing AI gold rush.
Enjoy this video and stay tuned to explore how AI is redefining the way we understand language and communication!
A playlist of DAU-recommended AI courses is available here: https://dau.csod.com/ui/lms-learner-playlist/PlaylistDetails?playlistId=00118adb-20e1-4dc5-95a8-9ffd03ab7f70
You will need a DAU account to access these resources. If you are a DoD member and need a DAU account, you can request one here: https://www.dau.edu/faq/p/New-DAU-Account
02:22 Simple Single-Neuron RNN Example
04:26 RNNs Show Early Progress for Natural Language Processing
07:55 Pioneers Like Yoshua Bengio Help to Thaw the AI Winter
11:10 Progress Resumes for RNNs & Natural Language
14:36 Limitations of Scaling RNNs
15:00 Transformers / Attention Is All You Need
16:09 Self-Attention / Multi-Headed Self-Attention and the FFN
19:22 OpenAI's Modified Transformer and GPTs (GPT-1 through GPT-4)
26:58 Other Large Language Models and Next Video's Topics
Links to the research papers used to produce this video:
Michael I. Jordan's 1986 work on RNNs: https://cseweb.ucsd.edu/~gary/PAPER-SUGGESTIONS/Jordan-TR-8604-OCRed.pdf
Jeffrey Elman's 1990 work, "Finding Structure in Time": https://scholar.google.com/citations?user=Cxi26JcAAAAJ&hl=en
Bengio et al., 2003 paper: https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
Ilya Sutskever, Geoffrey Hinton, et al., 2011 paper: https://www.semanticscholar.org/paper/93c20e38c85b69fc2d2eb314b3c1217913f7db11
Andrej Karpathy's blog post, 2015: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Andrej Karpathy et al., 2015 paper: https://arxiv.org/abs/1506.02078
Alec Radford, Rafal Jozefowicz, Ilya Sutskever, 2017 paper: https://arxiv.org/abs/1704.01444
Attention Is All You Need, 2017: https://arxiv.org/abs/1706.03762
OpenAI papers, GPT-1 through GPT-4: https://openai.com/research
I wish I had emphasized right up front that the movie sentiment example was meant to convey how RNNs work, not to demonstrate an actual functioning network.
I should have placed the words along the top of the Excel spreadsheet and had the columns represent word embedding vectors. That would have conformed with standard statistical convention.
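As a rough illustration of the spreadsheet layout described in that last note, here is a minimal Python sketch. It assumes a made-up four-word vocabulary and 3-dimensional embeddings; the words, the numbers, and the `embed` and `embed_sentence` helpers are hypothetical and are not taken from the video's spreadsheet. Each word occupies one column, so reading down a column gives that word's embedding vector.

```python
# Hypothetical toy vocabulary; these are the words "along the top" of the sheet.
vocab = ["the", "movie", "was", "great"]

# Hypothetical embedding values, chosen only for illustration.
# One column per word; row i holds dimension i of every word's vector.
embedding_matrix = [
    [0.1, 0.7, -0.2, 0.9],   # dimension 0
    [0.3, -0.1, 0.0, 0.8],   # dimension 1
    [-0.5, 0.4, 0.2, 0.6],   # dimension 2
]

def embed(word):
    """Return a word's embedding vector by reading down its column."""
    col = vocab.index(word)
    return [row[col] for row in embedding_matrix]

def embed_sentence(words):
    """Turn a sequence of words into the sequence of vectors an RNN would consume."""
    return [embed(w) for w in words]

print(embed("great"))  # [0.9, 0.8, 0.6]
```

With this layout, feeding a sentence to an RNN amounts to looking up one column per word, in order, for example `embed_sentence(["the", "movie", "was", "great"])`.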