Andrej Karpathy
Senior Director of Artificial Intelligence at Tesla
Andrej Karpathy is a Slovak-Canadian computer scientist who has made significant contributions to the field of artificial intelligence, particularly in deep learning and computer vision.1 He was the director of AI and Autopilot Vision at Tesla.1 Karpathy is also a co-founder and former member of OpenAI.1 On July 16, 2024, Karpathy announced that he had started a new AI+Education company called Eureka Labs.1
Early Life and Education:
- Karpathy was born in Bratislava, Czechoslovakia (now Slovakia), on October 23, 1986.1
- At age 15, he moved with his family to Toronto, Canada.1
- He obtained his bachelor's degrees in Computer Science and Physics from the University of Toronto in 2009.1
- In 2011, he earned a master's degree from the University of British Columbia, where he researched physically simulated figures with advisor Michiel van de Panne.1
- He completed his PhD at Stanford University in 2015 under Fei-Fei Li, specializing in natural language processing, computer vision, and deep learning models.1
Career and Research:
- Stanford: Karpathy authored and taught Stanford's first deep learning course, CS 231n: Convolutional Neural Networks for Visual Recognition, which grew to be one of the largest classes at Stanford.1
- OpenAI: He was a founding member and research scientist at OpenAI from 2015 to 2017.1
- Tesla: In June 2017, Karpathy became Tesla's Director of Artificial Intelligence, reporting to Elon Musk.1 He led the Autopilot computer vision team.23 After a sabbatical, he left Tesla in July 2022.12
- Return to OpenAI: It was reported that Karpathy returned to OpenAI in February 2023 but left a year later.1
- Eureka Labs: Karpathy started a new AI+Education company called Eureka Labs on July 16, 2024.1 According to Eureka Labs, their first product will be the AI course, LLM101n.1
Karpathy was named one of MIT Technology Review's Innovators Under 35 for 2020.1 He likes to train deep neural nets on large datasets.3
Highlights
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.
More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a ~third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into brain. As AI improves, I think we'll see a progression that takes advantage:
- raw text (hard/effortful to read)
- markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
- HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
- ...4,5,6,...
- n) interactive neural videos/simulations
Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral https://t.co/z21CP5iQfu
There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.
TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip try ask for HTML.
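The HTML tip in the post above can be tried with a few lines of Python. This is a minimal sketch, not a reference implementation: the helper names `with_html_request` and `save_html`, the output filename `response.html`, and the placeholder response are all assumptions made for illustration. The step of actually sending the prompt to an LLM is left out, since the post does not name a specific model or API.

```python
import tempfile
import webbrowser
from pathlib import Path
from typing import Optional

# The instruction quoted in the post, appended to the end of a query.
HTML_SUFFIX = "Structure your response as HTML."


def with_html_request(query: str) -> str:
    """Return the query with the HTML formatting request appended (hypothetical helper)."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def save_html(response_text: str, directory: Optional[str] = None) -> Path:
    """Write the model's HTML response to response.html and return its absolute path."""
    out_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = (out_dir / "response.html").resolve()
    path.write_text(response_text, encoding="utf-8")
    return path


if __name__ == "__main__":
    prompt = with_html_request("Explain how attention works in transformers.")
    print(prompt)  # paste this into the LLM of your choice

    # Placeholder standing in for whatever the LLM returns for `prompt`.
    placeholder_response = (
        "<!DOCTYPE html><html><body><h1>Attention</h1>"
        "<p>Replace this with the model output.</p></body></html>"
    )
    page = save_html(placeholder_response)
    webbrowser.open(page.as_uri())  # view the generated file in the default browser
```

Swapping the placeholder string for real model output is the only change needed to view an actual response; the same pattern would work for the slideshow variant mentioned in the post, assuming the model returns a single self-contained HTML file.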





