yu3zhou4 15 hours ago [-]
The README is, in my opinion (author here), the most interesting part. I wrote it to help others build a useful mental model so they can recreate the project themselves, without even needing to read my code.
janalsncm 10 hours ago [-]
Really practical teaching approach. I clicked in to see how safetensors are loaded and just kept reading. Thanks for sharing.
tom-wal 4 hours ago [-]
I feel like I learned twice as much in 10 minutes of reading this as I did reading LLM for Dummies. Thank you
xuanlin314 10 hours ago [-]
The lesson-style README is a great approach. Breaking down LLM inference into digestible steps makes the codebase approachable even for people who haven't touched CUDA before.
GoldenJade 9 hours ago [-]
Thanks for sharing this. As someone currently researching LLMs, I'm sure I'll be referencing this quite a bit going forward.
dwa3592 14 hours ago [-]
Very nice job on the README.
>>Physically, LLM is a file which contains a lot of float numbers.
aka atoms of the LLM.
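To make the quoted line concrete, here is a minimal C++ sketch of pulling those "atoms" out of raw bytes. It is not the author's code. It assumes the safetensors layout (an 8-byte little-endian header length, then a JSON header, then raw tensor bytes), a little-endian host, and F32 tensors. The helper names `read_header_len` and `floats_from_bytes` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: float is a 32-bit IEEE-754 value and the host is little-endian,
// which matches how F32 tensors are laid out on disk.
static_assert(sizeof(float) == 4, "expected 32-bit float");

// Hypothetical helper: the first 8 bytes of a safetensors file hold the
// length of the JSON header as an unsigned little-endian 64-bit integer.
std::uint64_t read_header_len(const unsigned char* bytes) {
    std::uint64_t n = 0;
    for (int i = 7; i >= 0; --i) {
        n = (n << 8) | bytes[i];  // most significant byte is stored last
    }
    return n;
}

// Hypothetical helper: reinterpret a run of raw bytes as F32 values.
std::vector<float> floats_from_bytes(const unsigned char* bytes, std::size_t nbytes) {
    assert(nbytes % sizeof(float) == 0);
    std::vector<float> out(nbytes / sizeof(float));
    if (nbytes > 0) {
        // memcpy sidesteps alignment and strict-aliasing problems.
        std::memcpy(out.data(), bytes, nbytes);
    }
    return out;
}
```

For example, the four bytes `00 00 80 3F` decode to `1.0f` under these assumptions. A real loader would also parse the JSON header to find each tensor's dtype, shape, and byte offsets.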
cyanydeez 14 hours ago [-]
the universe is just atomic if statements
nullpoint420 4 hours ago [-]
it from bit
juancn 14 hours ago [-]
Looks interesting. It reminds me of the first llama.cpp, but better documented.
nazgulsenpai 15 hours ago [-]
I love the documentation formatted in lessons. I can't wait to read through it.
sylware 2 hours ago [-]
I am looking for a plain and simple LLM inference implementation in C, and/or in x86_64 assembly, and/or in AMD GPU RDNA assembly.
Anybody?
cookiengineer 13 hours ago [-]
Wanted to add that the author has an amazing blog with lots of interesting papers: https://jedrzej.maczan.pl/
einpoklum 14 hours ago [-]
It seems the author believes checking the return values of CUDA API calls is not "tiny" enough :-(