Introducing EasyLLMBench
TL;DR: I made a tool for benchmarking LLMs that doesn’t suck. We’re often told about vague, subjective leaps in quality – remember the hype around GPT-4.5’s supposed “subtle intelligence”? Yeah, c...
This blog post suggests a three-axis model for categorizing automation initiatives. It is meant to help the reader understand the impact of different types of automation on organizations, the space...