Building Applications with Client-Side and Server-Side AI

Introduction

In early December 2022, while I was leading web development and user experience for big data products at Alibaba Cloud, ChatGPT's emergence left me astounded. Its rapid responses and surprisingly high-quality outputs made one thing crystal clear: AI was quietly revolutionizing our world. During that time, I was gripped by intense FOMO. As a web developer, I realized that failing to adapt could mean missing out on the unprecedented opportunities AI was bringing.

These thoughts led me to quickly leave Alibaba Cloud and join Lepton AI as part of the founding team. While our company primarily provides AI cloud services, we also developed several experimental projects to validate our cloud capabilities. These projects gave me fresh insights into AI application development.

Understanding AI Through Practice

Evolution of AI Applications

Over the past two years, I've worked on several interesting side projects. One notable example is Lepton Search, an open-source project exploring the integration of traditional search engines with LLMs. It offers functionality similar to Perplexity and has garnered significant attention, accumulating 8,000 stars on GitHub.

Another example is our Chrome extension, elmo.chat, a practical tool that helps users quickly summarize web pages, videos, PDFs, and other content while enabling interactive Q&A about the material. This seemingly simple tool now boasts nearly 50,000 monthly active users.

While these were side projects developed outside of my main work, each provided valuable experience in implementing AI capabilities in practical, user-facing applications.

Client vs. Server Trade-offs

When developing AI applications, choosing between client-side and server-side processing isn't black and white. Based on our practical experience, several key factors need consideration:

  1. Performance Requirements: does the feature need real-time, low-latency responses?
  2. Cost Sensitivity: how much per-request server-side inference can the product afford?
  3. Data Privacy: does user data need to stay on the device?
  4. Availability Requirements: should the feature keep working offline or on unreliable networks?

In practice, these factors often intertwine. Speech recording systems illustrate this well. A mature system deploys lightweight voice activity detection models client-side to filter invalid audio, sending only valid speech segments for server processing. This ensures real-time performance while significantly reducing server-side processing costs.

This hybrid architecture leverages the strengths of both sides: client-side processing ensures basic real-time response, while the server provides powerful speech recognition capabilities.
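As a minimal sketch of that pattern, the snippet below gates uploads on a local VAD result. The runVad() and uploadSegment() helpers and the threshold value are hypothetical stand-ins, not code from the actual project:

// VAD-gated uploading: run the lightweight model locally on each audio
// frame and only forward frames that look like speech. runVad() and
// uploadSegment() are assumed helpers; 0.5 is an illustrative threshold.
const SPEECH_THRESHOLD = 0.5;

async function handleAudioFrame(frame) {
  const speechProbability = await runVad(frame); // client-side VAD model
  if (speechProbability > SPEECH_THRESHOLD) {
    await uploadSegment(frame); // only valid speech reaches the server
  }
  // Silence and noise are dropped on the device, saving bandwidth and cost
}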

Practical Case Study: Smart Voice Recording System

To better illustrate how client-side and server-side AI work together in practice, we've created a demonstration project focused on voice recording. You can explore all the features of this demo.

For those interested in the technical implementation, all the source code is available on GitHub.

Project Background and Challenges

The initial concept stemmed from a simple need: users wanted to record important meetings without the distraction of operating devices. While straightforward in concept, implementation presented several key challenges:

  1. Cost Management: Continuously sending audio streams for server processing would be prohibitively expensive
  2. Accuracy Requirements: Needed precise voice detection while filtering environmental noise
  3. Resource Efficiency: Couldn't excessively drain device battery or bandwidth
  4. Privacy Protection: Had to balance user privacy with high-quality speech recognition

Technical Solution Design

After thorough analysis, we developed a typical client-server collaborative architecture:

  1. Client Components: audio capture plus a lightweight voice activity detection (VAD) model running in the browser, which filters out silence and invalid audio before anything leaves the device
  2. Server Components: a Whisper-based speech recognition service that transcribes only the valid speech segments forwarded by the client (a minimal sketch follows this list)
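On the server side, a minimal sketch might look like the following, using Node.js with Express, multer, and the openai SDK. The /transcribe route, field names, and file handling are illustrative assumptions rather than the project's actual API:

// A hypothetical transcription endpoint: the client posts valid speech
// segments, the server forwards them to Whisper. Route and field names
// are assumptions for illustration.
import express from 'express';
import multer from 'multer';
import fs from 'node:fs';
import OpenAI from 'openai';

const app = express();
const upload = multer({ dest: '/tmp' });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post('/transcribe', upload.single('segment'), async (req, res) => {
  // Transcribe the uploaded speech segment with Whisper
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(req.file.path),
    model: 'whisper-1',
  });
  res.json({ text: transcription.text });
});

app.listen(3000);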

Technical Implementation Details

Voice Activity Detector Implementation

In the browser, we load the VAD model with ONNX Runtime Web and run inference on each audio frame:

// Assumes onnxruntime-web is installed: npm install onnxruntime-web
import * as ort from 'onnxruntime-web';

// Create inference session for the VAD model
const session = await ort.InferenceSession.create('./model.onnx');

// Prepare input data: one mono audio frame as a Float32Array
// (input names and shapes depend on the exported model)
const audioFrame = new Float32Array(512); // frame size is illustrative
const inputTensor = new ort.Tensor('float32', audioFrame, [1, audioFrame.length]);
const feeds = { input: inputTensor };

// Run inference
const results = await session.run(feeds);

// Read the output, e.g. a speech probability compared against a threshold
const output = results.output.data;

Cost Optimization Strategies

During implementation, we found the original solution using OpenAI's real-time API was expensive ($0.06/minute). By adopting Whisper, we reduced costs to one-tenth ($0.006/minute). We further optimized through:

  1. Smart Routing: choosing the cheapest backend that meets each request's latency needs (one plausible policy is sketched after this list)
  2. Local Processing: running voice activity detection on the client so silence and background noise never reach the server
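To illustrate one plausible reading of smart routing, the sketch below reserves the expensive real-time API for requests that need live results and batches everything else through Whisper. The policy and the transcribeRealtime()/transcribeBatch() helpers are assumptions; only the per-minute prices come from the measurements above:

// Hypothetical cost-aware routing between the two transcription backends.
// transcribeRealtime() and transcribeBatch() are assumed server helpers.
async function transcribeSegment(segment, { needsLiveResults }) {
  if (needsLiveResults) {
    return transcribeRealtime(segment); // ~$0.06/minute
  }
  return transcribeBatch(segment); // ~$0.006/minute, one-tenth the cost
}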

Performance Optimization

We implemented several key optimizations to enhance user experience:

  1. Model Caching:
// Using the Cache API for model files
const cache = await caches.open('model-cache');
let modelResponse = await cache.match(modelUrl);
if (!modelResponse) {
    // Cache miss: fetch from the network and store a copy for next time
    modelResponse = await fetch(modelUrl);
    await cache.put(modelUrl, modelResponse.clone());
}
// The buffer can be passed directly to ort.InferenceSession.create()
const modelBuffer = await modelResponse.arrayBuffer();
  2. Progressive Loading: streaming the model download in the background instead of blocking the UI (sketched below)
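One way to implement progressive loading, assuming the model is served over HTTP with a Content-Length header, is to read the download stream chunk by chunk and report progress. This is a sketch of the idea, not the project's exact code:

// Stream the model download chunk by chunk, reporting progress along the
// way; the resulting buffer can then be cached and handed to the runtime.
async function loadModelWithProgress(modelUrl, onProgress) {
  const response = await fetch(modelUrl);
  const total = Number(response.headers.get('Content-Length')) || 0;
  const reader = response.body.getReader();
  const chunks = [];
  let received = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.length;
    if (total) onProgress(received / total);
  }
  // Concatenate the chunks into a single buffer for the inference runtime
  const buffer = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    buffer.set(chunk, offset);
    offset += chunk.length;
  }
  return buffer;
}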

Future Outlook

Our project experience has demonstrated the immense potential of client-server collaborative architecture in AI applications. As NVIDIA CEO Jensen Huang said in 2017: "Software is eating the world, but AI is going to eat software." This prediction is becoming reality.

As web developers, we're in an exciting era. With client-side GPU capabilities constantly improving, JavaScript's role in AI applications will become increasingly important. I believe that soon:

  1. Using AI models will become as simple as installing npm packages
  2. Client-side AI capabilities will continue strengthening
  3. Hybrid architectures will become the standard

Web developers need to embrace these changes and deeply understand client-server architecture to stay competitive in the AI era. Future opportunities belong to those who master client-server collaboration and effectively utilize AI capabilities.

Conclusion

Client-server architecture isn't just a technical choice – it's an art of balance. We must find the sweet spot between performance, cost, privacy, and user experience. Through continuous experimentation and improvement, we can create efficient and practical AI applications that truly benefit users.