Jesse Kipp

Making Workers AI faster and more efficient: Performance optimization with KV cache compression and speculative decoding

2024-09-26

Birthday Week, Product News, Cloudflare Workers, Developers, Agile Developer Services, Developer Platform, LLM

With a new generation of data center accelerator hardware and using optimization techniques such as KV cache compression and speculative decoding, we’ve made large language model (LLM) inference lightning-fast on the Cloudflare Workers AI platform....

Leveling up Workers AI: general availability and more new capabilities

2024-04-02

Developer Week, Developers, Workers AI, General Availability, Developer Platform, Cloudflare Workers

Today, we’re excited to make a series of announcements, including Workers AI, Cloudflare’s inference platform, becoming GA, and support for fine-tuned models with LoRAs and one-click deploys from HuggingFace. Cloudflare Workers now supports the Python programming language, and more...