In early December 2022, while I was leading web development and user experience for big data products at Alibaba Cloud, ChatGPT's emergence left me astounded. Its rapid responses and surprisingly high-quality outputs made one thing crystal clear: AI was quietly revolutionizing our world. During that time, I was gripped by intense FOMO. As a web developer, I realized that failing to adapt could mean missing out on the unprecedented opportunities AI was bringing.
These thoughts led me to quickly leave Alibaba Cloud and join Lepton AI as part of the founding team. While our company primarily provides AI cloud services, we also developed several experimental projects to validate our cloud capabilities. These projects gave me fresh insights into AI application development.
Over the past two years, I've worked on several interesting side projects. One notable example is Lepton Search, an open-source project exploring the integration of traditional search engines with LLMs. It offers functionality similar to Perplexity and has garnered significant attention, accumulating 8,000 stars on GitHub.
Another example is our Chrome extension, elmo.chat. This practical tool helps users quickly summarize web pages, videos, PDFs, and other content while enabling interactive Q&A conversations about the material. This seemingly simple tool now has nearly 50,000 monthly active users.
While these were side projects developed outside of my main work, each provided valuable experience in implementing AI capabilities in practical, user-facing applications.
When developing AI applications, choosing between client-side and server-side processing isn't black and white. Based on our practical experience, several key factors need consideration:
In practice, these factors often intertwine. Speech recording systems illustrate this well. A mature system runs a lightweight voice activity detection (VAD) model on the client to filter out silence and other non-speech audio, sending only the segments that contain speech to the server. This keeps the experience real-time while significantly reducing server-side processing costs.
This hybrid architecture leverages the strengths of both sides: client-side processing ensures basic real-time response, while the server provides powerful speech recognition capabilities.
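The client-side filtering step can be sketched in a few lines. This is only an illustration: it swaps the VAD model for a simple energy threshold, and the function names, the 480-sample frame size, and the 0.02 threshold are assumptions chosen for the example rather than values from our project.

```javascript
// Illustrative energy-based gate standing in for a trained VAD model.
// All names and default values here are hypothetical.

/** Root-mean-square energy of one frame of PCM samples in [-1, 1]. */
function frameRms(frame) {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

/**
 * Split a Float32Array into fixed-size frames and return the
 * [start, end) sample ranges whose energy reaches the threshold.
 * Adjacent speech frames are merged into one segment.
 */
function detectSpeechSegments(samples, frameSize = 480, threshold = 0.02) {
  const segments = [];
  let current = null;
  for (let start = 0; start < samples.length; start += frameSize) {
    const end = Math.min(start + frameSize, samples.length);
    const isSpeech = frameRms(samples.subarray(start, end)) >= threshold;
    if (isSpeech) {
      if (current) current.end = end; // extend the open segment
      else current = { start, end }; // open a new segment
    } else if (current) {
      segments.push(current); // silence closes the open segment
      current = null;
    }
  }
  if (current) segments.push(current);
  return segments;
}

/** Copy out only the speech segments, i.e. the audio worth uploading. */
function extractSpeech(samples, segments) {
  return segments.map(({ start, end }) => samples.slice(start, end));
}
```

Only the arrays returned by `extractSpeech` would be sent to the server, so silent stretches never incur transcription cost. A real VAD model replaces the threshold check but leaves the surrounding gating logic much the same.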
To better illustrate how client-side and server-side AI work together in practice, we've created a demonstration project focused on voice recording. You can explore all the features of this demo.
For those interested in the technical implementation, all the source code is available on GitHub.
The initial concept stemmed from a simple need: users wanted to record important meetings without the distraction of operating devices. While straightforward in concept, implementation presented several key challenges:
After thorough analysis, we developed a typical client-server collaborative architecture:
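import * as ort from 'onnxruntime-web';

// Create the inference session (loads and initializes the model once)
const session = await ort.InferenceSession.create('./model.onnx');
// Prepare input data: a tensor needs typed data plus an explicit shape.
// The 1 x 512 shape is a placeholder; use the shape your model expects.
const inputData = new Float32Array(512);
const inputTensor = new ort.Tensor('float32', inputData, [1, 512]);
// The key must match the model's input name ('input' here)
const feeds = { input: inputTensor };
// Run inference
const results = await session.run(feeds);
// Get output; the key must match the model's output name ('output' here)
const output = results.output.data;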
During implementation, we found the original solution using OpenAI's real-time API was expensive ($0.06/minute). By adopting Whisper, we reduced costs to one-tenth ($0.006/minute). We further optimized through:
We implemented several key optimizations to enhance user experience:
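// Use the Cache API so the model file is downloaded only once
const cache = await caches.open('model-cache');
let modelResponse = await cache.match(modelUrl);
if (!modelResponse) {
  // Cache miss: download the model and store a copy for later visits
  const response = await fetch(modelUrl);
  if (!response.ok) throw new Error(`Failed to fetch model: ${response.status}`);
  await cache.put(modelUrl, response.clone());
  modelResponse = response;
}
// modelResponse now holds the model whether or not it was already cached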
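For reuse and testing, the same get-or-fetch pattern can be wrapped in a function that returns the model bytes. The sketch below is one possible shape, not the project's actual code: `loadModelBytes`, the `cacheName` parameter, and the `deps` object for injecting `caches` and `fetch` are names introduced here for illustration.

```javascript
// Sketch: get-or-fetch model bytes through the Cache API.
// `loadModelBytes`, `cacheName`, and `deps` are hypothetical names;
// `deps` exists only so the function can run outside a browser.
async function loadModelBytes(modelUrl, cacheName = 'model-cache', deps = {}) {
  const cacheStorage = deps.caches ?? globalThis.caches;
  const fetchFn = deps.fetch ?? globalThis.fetch;

  const cache = await cacheStorage.open(cacheName);
  let response = await cache.match(modelUrl);
  if (!response) {
    const fetched = await fetchFn(modelUrl);
    if (!fetched.ok) {
      throw new Error(`Model download failed with status ${fetched.status}`);
    }
    // Store a clone, because a response body can only be read once.
    await cache.put(modelUrl, fetched.clone());
    response = fetched;
  }
  return response.arrayBuffer();
}
```

In a browser, the returned buffer could then be handed to the runtime, for example `ort.InferenceSession.create(new Uint8Array(bytes))`, so repeat visits skip the network download entirely.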
Our project experience has demonstrated the immense potential of client-server collaborative architecture in AI applications. As NVIDIA CEO Jensen Huang said in 2017: "Software is eating the world, but AI is going to eat software." This prediction is becoming reality.
As web developers, we're in an exciting era. With client-side GPU capabilities constantly improving, JavaScript's role in AI applications will become increasingly important. I believe that soon:
Web developers need to embrace these changes and deeply understand client-server architecture to stay competitive in the AI era. Future opportunities belong to those who master client-server collaboration and effectively utilize AI capabilities.
Client-server architecture isn't just a technical choice – it's an art of balance. We must find the sweet spot between performance, cost, privacy, and user experience. Through continuous experimentation and improvement, we can create efficient and practical AI applications that truly benefit users.