Back to Blog

Somebody give Andrej some H100s already

OpenAI's GPT-2 sparked controversy five years ago for being "too dangerous to release." Now, with FineWeb and llm.c, a tiny GPT-2 model can be trained in 90 minutes for $20 using 8xA100 GPUs, with the full 1.6B model estimated to take 1 week and $2.5k. The project is notable for its heavy use of CUDA (75.8%) aiming to simplify the training stack. Meanwhile, a Twitter debate between Yann LeCun and Elon Musk highlighted the importance of convolutional neural networks (CNNs) in real-time image processing for autonomous driving, with LeCun emphasizing scientific research's role in technological progress. LeCun also criticized AI doomsday scenarios, arguing for cautious optimism about AI safety and regulation.

Read original post