Building Applications with Client-Side and Server-Side AI

Introduction

In early December 2022, while I was leading web development and user experience for big data products at Alibaba Cloud, ChatGPT's emergence left me astounded. Its rapid responses and surprisingly high-quality outputs made one thing crystal clear: AI was quietly revolutionizing our world. During that time, I was gripped by intense FOMO. As a web developer, I realized that failing to adapt could mean missing out on the unprecedented opportunities AI was bringing.

These thoughts led me to quickly leave Alibaba Cloud and join Lepton AI as part of the founding team. While our company primarily provides AI cloud services, we also developed several experimental projects to validate our cloud capabilities. These projects gave me fresh insights into AI application development.

Understanding AI Through Practice

Evolution of AI Applications

Over the past two years, I've worked on several interesting side projects. One notable example is Lepton Search, an open-source project exploring the integration of traditional search engines with LLMs. It offers functionality similar to Perplexity and has garnered significant attention, accumulating 8,000 stars on GitHub.

Another example is our Chrome extension, elmo.chat, a practical tool that helps users quickly summarize web pages, videos, PDFs, and other content while enabling interactive Q&A about the material. This seemingly simple tool now boasts nearly 50,000 monthly active users.

While these were side projects developed outside of my main work, each provided valuable experience in implementing AI capabilities in practical, user-facing applications.

Client vs. Server Trade-offs

When developing AI applications, choosing between client-side and server-side processing isn't black and white. Based on our practical experience, several key factors need consideration:

  1. Performance Requirements: does the feature need real-time, low-latency responses?
  2. Cost Sensitivity: how much per-request server-side inference can the product afford?
  3. Data Privacy: does user data need to stay on the device?
  4. Availability Requirements: should the feature keep working offline or on unreliable networks?

In practice, these factors often intertwine. Speech recording systems illustrate this well. A mature system deploys lightweight voice activity detection models client-side to filter invalid audio, sending only valid speech segments for server processing. This ensures real-time performance while significantly reducing server-side processing costs.

This hybrid architecture leverages the strengths of both sides: client-side processing ensures basic real-time response, while the server provides powerful speech recognition capabilities.
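As a minimal sketch of that pattern, the snippet below gates uploads on a local VAD result. The runVad() and uploadSegment() helpers and the threshold value are hypothetical stand-ins, not code from the actual project:

// VAD-gated uploading: run the lightweight model locally on each audio
// frame and only forward frames that look like speech. runVad() and
// uploadSegment() are assumed helpers; 0.5 is an illustrative threshold.
const SPEECH_THRESHOLD = 0.5;

async function handleAudioFrame(frame) {
  const speechProbability = await runVad(frame); // client-side VAD model
  if (speechProbability > SPEECH_THRESHOLD) {
    await uploadSegment(frame); // only valid speech reaches the server
  }
  // Silence and noise are dropped on the device, saving bandwidth and cost
}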

Practical Case Study: Smart Voice Recording System

To better illustrate how client-side and server-side AI work together in practice, we've created a demonstration project focused on voice recording. You can explore all the features of this demo.

For those interested in the technical implementation, all the source code is available on GitHub.

Project Background and Challenges

The initial concept stemmed from a simple need: users wanted to record important meetings without the distraction of operating devices. While straightforward in concept, implementation presented several key challenges:

  1. Cost Management: Continuously sending audio streams for server processing would be prohibitively expensive
  2. Accuracy Requirements: Needed precise voice detection while filtering environmental noise
  3. Resource Efficiency: Couldn't excessively drain device battery or bandwidth
  4. Privacy Protection: Had to balance user privacy with high-quality speech recognition

Technical Solution Design

After thorough analysis, we developed a typical client-server collaborative architecture:

  1. Client Components: audio capture plus a lightweight voice activity detection (VAD) model running in the browser, which filters out silence and invalid audio before anything leaves the device
  2. Server Components: a Whisper-based speech recognition service that transcribes only the valid speech segments forwarded by the client (a minimal sketch follows this list)
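On the server side, a minimal sketch might look like the following, using Node.js with Express, multer, and the openai SDK. The /transcribe route, field names, and file handling are illustrative assumptions rather than the project's actual API:

// A hypothetical transcription endpoint: the client posts valid speech
// segments, the server forwards them to Whisper. Route and field names
// are assumptions for illustration.
import express from 'express';
import multer from 'multer';
import fs from 'node:fs';
import OpenAI from 'openai';

const app = express();
const upload = multer({ dest: '/tmp' });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post('/transcribe', upload.single('segment'), async (req, res) => {
  // Transcribe the uploaded speech segment with Whisper
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(req.file.path),
    model: 'whisper-1',
  });
  res.json({ text: transcription.text });
});

app.listen(3000);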

Technical Implementation Details

Voice Activity Detector Implementation

In the browser, we load the VAD model with ONNX Runtime Web and run inference on each audio frame:

// Assumes onnxruntime-web is installed: npm install onnxruntime-web
import * as ort from 'onnxruntime-web';

// Create inference session for the VAD model
const session = await ort.InferenceSession.create('./model.onnx');

// Prepare input data: one mono audio frame as a Float32Array
// (input names and shapes depend on the exported model)
const audioFrame = new Float32Array(512); // frame size is illustrative
const inputTensor = new ort.Tensor('float32', audioFrame, [1, audioFrame.length]);
const feeds = { input: inputTensor };

// Run inference
const results = await session.run(feeds);

// Read the output, e.g. a speech probability compared against a threshold
const output = results.output.data;

Cost Optimization Strategies

During implementation, we found the original solution using OpenAI's real-time API was expensive ($0.06/minute). By adopting Whisper, we reduced costs to one-tenth ($0.006/minute). We further optimized through:

  1. Smart Routing: choosing the cheapest backend that meets each request's latency needs (one plausible policy is sketched after this list)
  2. Local Processing: running voice activity detection on the client so silence and background noise never reach the server
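To illustrate one plausible reading of smart routing, the sketch below reserves the expensive real-time API for requests that need live results and batches everything else through Whisper. The policy and the transcribeRealtime()/transcribeBatch() helpers are assumptions; only the per-minute prices come from the measurements above:

// Hypothetical cost-aware routing between the two transcription backends.
// transcribeRealtime() and transcribeBatch() are assumed server helpers.
async function transcribeSegment(segment, { needsLiveResults }) {
  if (needsLiveResults) {
    return transcribeRealtime(segment); // ~$0.06/minute
  }
  return transcribeBatch(segment); // ~$0.006/minute, one-tenth the cost
}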

Performance Optimization

We implemented several key optimizations to enhance user experience:

  1. Model Caching:
// Using the Cache API for model files
const cache = await caches.open('model-cache');
let modelResponse = await cache.match(modelUrl);
if (!modelResponse) {
    // Cache miss: fetch from the network and store a copy for next time
    modelResponse = await fetch(modelUrl);
    await cache.put(modelUrl, modelResponse.clone());
}
// The buffer can be passed directly to ort.InferenceSession.create()
const modelBuffer = await modelResponse.arrayBuffer();
  2. Progressive Loading: streaming the model download in the background instead of blocking the UI (sketched below)
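One way to implement progressive loading, assuming the model is served over HTTP with a Content-Length header, is to read the download stream chunk by chunk and report progress. This is a sketch of the idea, not the project's exact code:

// Stream the model download chunk by chunk, reporting progress along the
// way; the resulting buffer can then be cached and handed to the runtime.
async function loadModelWithProgress(modelUrl, onProgress) {
  const response = await fetch(modelUrl);
  const total = Number(response.headers.get('Content-Length')) || 0;
  const reader = response.body.getReader();
  const chunks = [];
  let received = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.length;
    if (total) onProgress(received / total);
  }
  // Concatenate the chunks into a single buffer for the inference runtime
  const buffer = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    buffer.set(chunk, offset);
    offset += chunk.length;
  }
  return buffer;
}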

Future Outlook

Our project experience has demonstrated the immense potential of client-server collaborative architecture in AI applications. As NVIDIA CEO Jensen Huang said in 2017: "Software is eating the world, but AI is going to eat software." This prediction is becoming reality.

As web developers, we're in an exciting era. With client-side GPU capabilities constantly improving, JavaScript's role in AI applications will become increasingly important. I believe that soon:

  1. Using AI models will become as simple as installing npm packages
  2. Client-side AI capabilities will continue strengthening
  3. Hybrid architectures will become the standard

Web developers need to embrace these changes and deeply understand client-server architecture to stay competitive in the AI era. Future opportunities belong to those who master client-server collaboration and effectively utilize AI capabilities.

Conclusion

Client-server architecture isn't just a technical choice – it's an art of balance. We must find the sweet spot between performance, cost, privacy, and user experience. Through continuous experimentation and improvement, we can create efficient and practical AI applications that truly benefit users.