Multilingual Instant Messaging System: Complete Guide to Building Social Chat + Voice/Video Calling

Earlier this year, I took on a social chat project in the Middle East. The client needed the system to support multiple languages, including Arabic, English, and Chinese, along with built-in voice and video calling. After about three weeks of research, I settled on this instant messaging solution, which uses a UniApp frontend. Testing took two weeks, and the system is now stable. Below, I’ve written up the full setup process as a reference for developers with similar needs.

1. System Features Overview

This multilingual instant messaging system uses a mainstream architecture with a separate frontend and backend. The frontend is built with UniApp, so a single codebase can be packaged as both iOS and Android apps. The backend is fully open-source PHP, which makes secondary development quick. The main functional modules are:

1.1 Multilingual Chat Function

Chat messages support real-time multilingual translation, which is the system’s most distinctive feature. Each message is translated in real time into the language the recipient has set, with translation latency kept under 2 seconds. In testing, it handled translation between more than 50 languages. The quality is acceptable and entirely adequate for everyday social conversation. The system also supports switching the interface language, and users can each set their own interface language without affecting anyone else.
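The guide doesn’t say how the translation service is wired in. The sketch below shows one way the fan-out to recipients might work. It assumes a hypothetical asynchronous translate(text, from, to) function backed by whatever provider you use. The translator is called at most once per distinct target language, which matters in group chats where many members share a language.

```typescript
// Hypothetical translator signature: the real system's translation
// provider and API are not documented in this guide.
type Translate = (text: string, from: string, to: string) => Promise<string>;

interface Recipient {
  userId: string;
  chatLanguage: string; // language the user wants incoming messages in
}

/**
 * Translate one outgoing message for every recipient, calling the
 * translator at most once per distinct target language.
 */
async function fanOutTranslations(
  text: string,
  senderLanguage: string,
  recipients: Recipient[],
  translate: Translate,
): Promise<Map<string, string>> {
  const byLanguage = new Map<string, Promise<string>>();

  // First pass: start one translation per distinct target language.
  for (const r of recipients) {
    if (!byLanguage.has(r.chatLanguage)) {
      byLanguage.set(
        r.chatLanguage,
        // Same language as the sender: deliver the original text untouched.
        r.chatLanguage === senderLanguage
          ? Promise.resolve(text)
          : translate(text, senderLanguage, r.chatLanguage),
      );
    }
  }

  // Second pass: collect results; the translations above run concurrently.
  const result = new Map<string, string>();
  for (const r of recipients) {
    result.set(r.userId, await byLanguage.get(r.chatLanguage)!);
  }
  return result;
}
```

For a private chat this is simply a single translation. In a large group, the number of provider calls grows with the number of languages, not the number of members.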

1.2 Voice/Video Calling

Audio and video calling is built on WebRTC and delivers good call quality on domestic networks. For reference, WeChat video calls have roughly 300-500 ms of latency, and this system performs at about the same level. Video conferences support up to 9 participants, although in practical testing video quality degrades significantly with more than 5. The calling module supports basic features such as call forwarding, do not disturb, and whitelists/blacklists. The enterprise version adds call recording and archiving.
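The guide doesn’t specify how do not disturb, whitelists, and blacklists interact. The sketch below assumes one plausible policy. Blacklisted callers are always rejected, and whitelisted callers ring even in do-not-disturb mode. Any other caller in DND mode is forwarded if a forwarding target is set, and rejected otherwise. The names CallSettings and screenIncomingCall are hypothetical.

```typescript
type CallDecision =
  | { action: "ring" }
  | { action: "reject"; reason: "blacklisted" | "do_not_disturb" }
  | { action: "forward"; to: string };

interface CallSettings {
  doNotDisturb: boolean;
  whitelist: Set<string>; // callers allowed through even in DND mode (assumed semantics)
  blacklist: Set<string>; // callers that are always rejected
  forwardTo?: string;     // forwarding target when the call cannot ring (assumed)
}

function screenIncomingCall(callerId: string, s: CallSettings): CallDecision {
  // The blacklist takes priority over every other setting.
  if (s.blacklist.has(callerId)) return { action: "reject", reason: "blacklisted" };
  // Ring normally, or ring anyway for whitelisted callers during DND.
  if (!s.doNotDisturb || s.whitelist.has(callerId)) return { action: "ring" };
  // DND is on and the caller is not whitelisted: forward if configured.
  if (s.forwardTo !== undefined) return { action: "forward", to: s.forwardTo };
  return { action: "reject", reason: "do_not_disturb" };
}
```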

1.3 Social Function Modules

In addition to basic private messaging, the system integrates the following social features:

– Moments: Post updates similar to social media stories, with images and short videos

– Nearby People: Discover nearby users through location-based services (LBS)

– Interest Groups: Groups formed based on interest tags

– Friend Requests and Verification: Supports custom verification questions

– Read Receipts: Shows message delivery and read status
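The core calculation behind Nearby People is the distance between two coordinates. The sketch below computes it with the haversine formula. The UserLocation shape and the findNearby helper are hypothetical. A real deployment would normally pre-filter candidates with a spatial index rather than scan every user.

```typescript
interface GeoPoint {
  lat: number; // degrees
  lng: number; // degrees
}

interface UserLocation extends GeoPoint {
  userId: string;
}

const EARTH_RADIUS_KM = 6371; // mean Earth radius

function toRad(deg: number): number {
  return (deg * Math.PI) / 180;
}

// Great-circle distance between two points using the haversine formula.
function haversineKm(a: GeoPoint, b: GeoPoint): number {
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  // Clamp to 1 to guard against floating-point overshoot before asin.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Users within radiusKm of `me`, nearest first.
function findNearby(
  me: GeoPoint,
  users: UserLocation[],
  radiusKm: number,
): Array<{ userId: string; distanceKm: number }> {
  return users
    .map((u) => ({ userId: u.userId, distanceKm: haversineKm(me, u) }))
    .filter((u) => u.distanceKm <= radiusKm)
    .sort((x, y) => x.distanceKm - y.distanceKm);
}
```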

1.4 Message Security and Review

The system has a built-in sensitive word filtering engine that supports custom word libraries. Image messages are scanned automatically, and explicit or violent content is blocked using the Alibaba Cloud Content Security API. End-to-end encryption is available for important chats.
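The built-in filtering engine isn’t documented in detail, so the following is an illustration only. A simple filter for a custom word library can be built from a single case-insensitive regular expression. createWordFilter is a hypothetical name. Production engines typically switch to trie-based matching (such as Aho-Corasick) once word libraries get large.

```typescript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a masking function from a custom word list. Longer words are tried
// first, so "badword" is masked whole instead of only its "bad" prefix.
function createWordFilter(words: string[]): (text: string) => string {
  const cleaned = words.map((w) => w.trim()).filter((w) => w.length > 0);
  if (cleaned.length === 0) return (text) => text;
  cleaned.sort((a, b) => b.length - a.length);
  const pattern = new RegExp(cleaned.map(escapeRegExp).join("|"), "giu");
  // Replace each match with one asterisk per character (code point).
  return (text) => text.replace(pattern, (m) => "*".repeat(Array.from(m).length));
}
```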

2. Pre-Setup Preparation

Before starting the setup, confirm the following:

Server Configuration:

At least 4 CPU cores, 8 GB of RAM, and 5 Mbps of bandwidth are recommended, because voice and video calling put heavy demands on the server. If concurrent online users exceed 10,000, use 8 cores and 16 GB of RAM or more.

Domain Name and SSL:

HTTPS is required, and the WebRTC signaling service should be served on port 443. Browsers only grant microphone and camera access to secure (HTTPS) pages, so over plain HTTP, browsers such as Safari deny microphone permission outright.
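To catch this mistake before users do, a deployment script could check page URLs against a simplified version of the browser secure-context rule. Under that rule, HTTPS pages qualify, and so do HTTP pages on localhost during local development. This is an approximation assumed for illustration, not the full specification. allowsMediaCapture is a hypothetical helper.

```typescript
// Simplified approximation of the browser "secure context" rule that gates
// microphone/camera access: HTTPS pages, plus loopback hosts over HTTP.
function allowsMediaCapture(pageUrl: string): boolean {
  const match = /^([a-z][a-z0-9+.-]*):\/\/(\[[^\]]+\]|[^\/:?#]+)/i.exec(pageUrl);
  if (match === null) return false;
  const scheme = (match[1] ?? "").toLowerCase();
  const host = (match[2] ?? "").toLowerCase();
  if (scheme === "https") return true;
  // Browsers treat loopback addresses as trustworthy even over plain HTTP.
  return scheme === "http" && (host === "localhost" || host === "127.0.0.1" || host === "[::1]");
}
```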

Third-Party Service Applications:

Apply in advance for LeanCloud or Jiguang IM services (for push notifications and offline message storage) and for the Alibaba Cloud Content Security API (for image moderation).

Technical Background:

Familiarity with UniApp, Vue.js, and PHP is required. Experience with WebRTC is a plus.

Qualification Requirements:

Voice and video calling services require an ICP license and the related value-added telecommunications business permits. The specific requirements depend on the regulations of the target market.

3. Common Issues and Pitfalls

3.1 Arabic Right-to-Left Layout Issue

Arabic is written from right to left (RTL), so the interface layout must be mirrored. RTL languages, including Arabic, Hebrew, and Persian, need to be configured separately in the system. Without that configuration, text overlaps and buttons are misaligned. Because the frontend is UniApp, the CSS direction property (direction: rtl) is the tool to use here. Flutter’s Directionality widget is the equivalent only for clients written in Flutter.
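Choosing the direction from the user’s locale can be a simple lookup. The sketch assumes locale tags such as "ar-SA" or "en_US". The language list and the textDirection name are illustrative. The result would then be applied as the direction style on each page’s root element.

```typescript
// Base language codes written right to left; extend as needed.
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur"]);

// Map a locale tag such as "ar-SA" or "en_US" to a CSS text direction.
function textDirection(locale: string): "rtl" | "ltr" {
  const base = locale.toLowerCase().split(/[-_]/)[0] ?? "";
  return RTL_LANGUAGES.has(base) ? "rtl" : "ltr";
}
```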

3.2 WebRTC Call NAT Traversal Failure

This is the most common pitfall. When users are behind strict or symmetric NAT, a STUN server alone cannot complete NAT traversal, and the call fails to connect. The solution is to deploy a TURN relay server (for example, coturn) or have users connect through a VPN.
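On the client, the TURN relay is supplied through the iceServers list passed to RTCPeerConnection. The sketch below assumes a coturn server at a hypothetical host, listening on the standard port 3478. It also assumes coturn has been configured for TLS on port 443. coturn’s default TLS port is 5349, so this must be set explicitly. Local types stand in for the DOM’s RTCIceServer so the snippet compiles on its own.

```typescript
// Local type mirroring the shape of WebRTC's RTCIceServer.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// STUN handles the easy cases. The TURN relay (e.g. coturn) covers clients
// behind symmetric NAT, over UDP, TCP, and TLS on 443 for networks that
// only let HTTPS-like traffic through.
function buildIceServers(turnHost: string, username: string, credential: string): IceServer[] {
  return [
    { urls: `stun:${turnHost}:3478` },
    {
      urls: [
        `turn:${turnHost}:3478?transport=udp`,
        `turn:${turnHost}:3478?transport=tcp`,
        `turns:${turnHost}:443?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}
```

In the app, this would be passed as new RTCPeerConnection({ iceServers: buildIceServers("turn.example.com", user, pass) }). Ideally the TURN credentials are short-lived and issued by the backend rather than hard-coded in the client.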

3.3 Offline Message Push Delivery Rate Issue

Offline push delivery rate is a core metric for social apps. In our testing, LeanCloud delivered approximately 85% of pushes, and Jiguang was slightly higher at around 90%. If you need higher delivery rates, add an in-app message synchronization mechanism so that important messages still reach all of a user’s devices.
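One common way to build this in-app sync gives each conversation a server-assigned sequence number. This is an assumption, not taken from the system’s code. The client remembers the highest number it has seen and, on launch or reconnect, pulls everything after it. A lost push then only delays a message instead of dropping it. MessageStore and SyncClient are hypothetical in-memory stand-ins.

```typescript
interface StoredMessage {
  conversationId: string;
  seq: number; // per-conversation sequence number assigned by the server
  body: string;
}

// In-memory stand-in for the server's message store (hypothetical).
class MessageStore {
  private readonly byConversation = new Map<string, StoredMessage[]>();

  append(conversationId: string, body: string): StoredMessage {
    const list = this.byConversation.get(conversationId) ?? [];
    const msg: StoredMessage = { conversationId, seq: list.length + 1, body };
    list.push(msg);
    this.byConversation.set(conversationId, list);
    return msg;
  }

  // All messages after the given sequence number, in order.
  since(conversationId: string, afterSeq: number): StoredMessage[] {
    return (this.byConversation.get(conversationId) ?? []).filter((m) => m.seq > afterSeq);
  }
}

// Client side: track the highest seq seen per conversation and pull the gap.
class SyncClient {
  private readonly store: MessageStore;
  private readonly lastSeq = new Map<string, number>();
  readonly inbox: StoredMessage[] = [];

  constructor(store: MessageStore) {
    this.store = store;
  }

  // Returns how many missed messages were fetched.
  sync(conversationId: string): number {
    const missed = this.store.since(conversationId, this.lastSeq.get(conversationId) ?? 0);
    for (const m of missed) {
      this.inbox.push(m);
      this.lastSeq.set(conversationId, m.seq);
    }
    return missed.length;
  }
}
```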

4. Customization and Expansion

The basic version is already quite complete. For special business requirements, consider the following customization directions:

– Live Streaming Integration: Add live stream push/pull modules to support live e-commerce, online education, and similar scenarios

– Chatbots: Integrate large AI models for intelligent customer service and auto-replies

– Multi-terminal Synchronization Protocol Upgrade: Synchronize messages across iOS, Android, Web, and PC clients

– Private Deployment Version: Fully private deployment for government and enterprise customers

⚠️ Important Note:

Instant messaging applications involve user privacy and data security, especially cross-border social applications. Before the formal launch, complete a cybersecurity classified protection (MLPS) assessment and a privacy policy compliance audit. Some countries and regions, such as India and Russia, impose special data localization requirements on social applications, so research these requirements in advance.

5. FAQ

Q1: What languages does the system support for real-time translation?

A1: Supports real-time translation for 50+ languages including Chinese, English, Arabic, Spanish, French, Russian, Japanese, Korean, German, Portuguese, and more.

Q2: What is the maximum number of people supported for voice/video calls?

A2: Video conferences support up to 9 participants, but in practical testing, video quality degrades significantly with more than 5. We recommend limiting each meeting to 6 people or fewer.
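To enforce this recommendation rather than just document it, a join check on the signaling server is enough. The sketch below hard-codes the limits from this FAQ. checkJoin is a hypothetical name.

```typescript
const MAX_PARTICIPANTS = 9;         // hard limit stated for the system
const RECOMMENDED_PARTICIPANTS = 6; // recommendation from testing

type JoinResult = { allowed: false } | { allowed: true; warnQuality: boolean };

// Decide whether one more participant may join a call of the given size,
// and whether to warn that quality may degrade.
function checkJoin(currentParticipants: number): JoinResult {
  const after = currentParticipants + 1;
  if (after > MAX_PARTICIPANTS) return { allowed: false };
  return { allowed: true, warnQuality: after > RECOMMENDED_PARTICIPANTS };
}
```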

Q3: What is the minimum server configuration?

A3: 2 cores and 4 GB of RAM is enough for a test environment. For production, 4 cores and 8 GB or more is recommended. If concurrent online users exceed 10,000, use 8 cores and 16 GB.
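The same sizing guidance can be expressed as a small helper for deployment scripts. The helper is hypothetical. Its thresholds are the rough figures from this guide, not benchmark results.

```typescript
interface ServerSpec {
  cpuCores: number;
  ramGb: number;
}

// Sizing tiers taken from this guide; treat them as starting points.
function recommendServer(environment: "test" | "production", concurrentUsers: number): ServerSpec {
  if (environment === "test") return { cpuCores: 2, ramGb: 4 };
  if (concurrentUsers > 10000) return { cpuCores: 8, ramGb: 16 };
  return { cpuCores: 4, ramGb: 8 };
}
```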

Q4: Where are message records stored? How to export them?

A4: Message records are stored in the server database and can be exported from the admin backend in JSON or CSV format. The enterprise version also supports chat record archiving and compliance audits.
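You might need to produce the CSV yourself, for example in a custom export job. If so, the main detail to get right is escaping. This sketch quotes fields according to RFC 4180. The MessageRecord fields are assumptions, not the system’s actual schema.

```typescript
interface MessageRecord {
  id: string;
  senderId: string;
  receiverId: string;
  sentAt: string; // ISO 8601 timestamp
  content: string;
}

// Quote a field per RFC 4180: wrap it in double quotes if it contains a
// comma, quote, CR, or LF, and double any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(records: MessageRecord[]): string {
  const header = ["id", "sender_id", "receiver_id", "sent_at", "content"];
  const rows = records.map((r) =>
    [r.id, r.senderId, r.receiverId, r.sentAt, r.content].map(csvField).join(","),
  );
  // RFC 4180 uses CRLF line endings.
  return [header.join(","), ...rows].join("\r\n");
}
```

JSON export is simply JSON.stringify(records). If the CSV will be opened in Excel, prefixing it with a UTF-8 byte order mark ("\uFEFF") usually helps Arabic and Chinese text display correctly.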

Q5: Does the system support disappearing messages?

A5: Yes. Disappearing messages mode can be enabled in chat settings. Once enabled, messages are automatically destroyed after being read by the recipient.
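The destruction step itself isn’t described. The sketch below assumes each message carries a disappearing flag and a readAt timestamp, both hypothetical field names. It drops read disappearing messages, optionally after a short grace period. The same rule would need to run on the server and on every client’s local copy.

```typescript
interface ChatMessage {
  id: string;
  disappearing: boolean; // sent while disappearing mode was on
  readAt?: number;       // epoch ms when the recipient read it
}

// Keep everything except disappearing messages that were read at least
// graceMs ago. The grace period is a hypothetical option that lets an open
// chat screen finish displaying the message before it is removed.
function purgeReadDisappearing(messages: ChatMessage[], nowMs: number, graceMs = 0): ChatMessage[] {
  return messages.filter(
    (m) => !(m.disappearing && m.readAt !== undefined && nowMs - m.readAt >= graceMs),
  );
}
```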