There's a persistent belief that open-weight models are the budget option — good for prototyping, not ready for production. This was true in 2023. It hasn't been true for over a year.
Llama 3.1 70B, running quantized on a single A6000 GPU, handles every task a typical service business needs: lead qualification, email generation, document summarization, scheduling logic, invoice extraction, and customer communication. On these specific tasks, the quality gap relative to GPT-4 is negligible.
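The single-GPU claim comes down to simple arithmetic. A rough sketch of the weight footprint alone (the KV cache and runtime overhead add more, so treat these as lower bounds):

```python
# Back-of-the-envelope VRAM estimate for a quantized 70B model.
# Approximations only: actual memory use also depends on context
# length (KV cache) and the inference runtime's overhead.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

A6000_VRAM_GB = 48  # NVIDIA RTX A6000

for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
    gb = weight_footprint_gb(70, bits)
    verdict = "fits" if gb < A6000_VRAM_GB else "does not fit"
    print(f"70B @ {label}: ~{gb:.0f} GB -> {verdict} in {A6000_VRAM_GB} GB")
```

At fp16 the weights alone are ~140 GB; at 4-bit they drop to ~35 GB, which is why quantization is what makes the single-card deployment possible at all.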
More importantly, open-weight models can be fine-tuned on your data. A plumbing company's sales agent, trained on 18 months of successful proposals, outperforms any general-purpose cloud model — because it knows your pricing, your service area, your competitive advantages, and your customers' common objections.
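Concretely, that fine-tuning starts by turning the historical record into training pairs. A minimal sketch, assuming a simple (inquiry, winning-proposal) record format and a chat-style JSONL output — the field names, system prompt, and company are hypothetical, not TabTab's actual schema:

```python
# Hypothetical sketch: converting past successful proposals into
# instruction-tuning pairs. The "inquiry"/"proposal" fields and the
# system prompt are illustrative assumptions, not a real schema.
import json

def to_training_example(record: dict) -> dict:
    """Map one historical (inquiry, winning proposal) pair to chat format."""
    return {
        "messages": [
            {"role": "system", "content": "You write proposals for Acme Plumbing."},
            {"role": "user", "content": record["inquiry"]},
            {"role": "assistant", "content": record["proposal"]},
        ]
    }

# One illustrative record; in practice this would be 18 months of history.
records = [
    {"inquiry": "Water heater died, need a replacement quote.",
     "proposal": "We can install a 50-gallon unit Thursday for $1,850..."},
]

with open("train.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(to_training_example(r)) + "\n")
```

The resulting JSONL file is the common input format for local fine-tuning toolchains, and because the step runs on the same machine as training, the raw proposals never leave it.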
The fine-tuning happens locally. Your data never leaves the machine. The resulting model weights belong to you. If you want to switch frameworks, export the weights. If you want to run the same model on a second machine, copy it.
We chose open-weight models for TabTab not because they're cheaper (though they are). We chose them because they're better for the specific use case of running a business — where domain knowledge matters more than general reasoning ability.
The frontier model race is exciting for researchers. For the HVAC company that needs to answer leads at 2 AM, Llama 3.1 has been sufficient since the day it shipped.