Mira Camera

Project during internship at Mira Video (Jun. - Aug. 2021)

This project offers an intelligent cloud solution for recording offline (in-person) meetings. The usual approach is a single global camera that records the whole scene, as in a typical lecture recording. But what if there are multiple speakers and we want to see each speaker's face while they are talking? This project automatically edits a video that switches between multiple streams and always focuses on the active speaker. The solution covers streaming upload, cloud processing, speaker detection and automatic video clipping.

The solution is delivered as a mobile application. During a meeting, each participant places a phone in front of them. When the meeting starts, one participant presses the start button in the app. Our server then notifies other nearby users to join the same session and to start recording and uploading their streams to the cloud server. The videos are saved both to the phone's album and to our storage server. Our computing server then 1) downloads and synchronizes the videos with Praat, 2) extracts subtitles with iFlytek, 3) slices the videos at sentence timestamps with FFmpeg, 4) detects faces with OpenCV, 5) scores the clips with a state-of-the-art active speaker detection algorithm (PyTorch), and 6) selects and merges clips subject to audience-experience constraints. Finally, users can preview the result within the app.
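The final selection step could look roughly like the sketch below. It is an illustration, not the project's actual code: the Segment structure, the score scale and the min_shot rule (stay on a stream until it has been on screen for a minimum time) are all assumptions standing in for the unspecified audience-experience constraints.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # sentence start time in seconds
    end: float     # sentence end time in seconds
    scores: dict   # stream id -> active-speaker score (assumed: higher is better)


def select_streams(segments, min_shot=2.0):
    """Pick one stream per sentence, avoiding shots shorter than min_shot seconds."""
    choices = []
    current = None      # stream currently on screen
    shot_length = 0.0   # how long the current stream has been on screen
    for seg in segments:
        best = max(seg.scores, key=seg.scores.get)
        # Hypothetical constraint: do not cut away from a shot that is still too short.
        too_short = current is not None and shot_length < min_shot
        if too_short and best != current and current in seg.scores:
            best = current
        if best == current:
            shot_length += seg.end - seg.start
        else:
            current = best
            shot_length = seg.end - seg.start
        choices.append(best)
    return choices


def merge_adjacent(segments, choices):
    """Collapse consecutive sentences on the same stream into (stream, start, end) shots."""
    shots = []
    for seg, stream in zip(segments, choices):
        if shots and shots[-1][0] == stream:
            shots[-1] = (stream, shots[-1][1], seg.end)
        else:
            shots.append((stream, seg.start, seg.end))
    return shots
```

Each resulting (stream, start, end) shot would then be cut from its source video and the shots concatenated in order. A real system would probably add further rules, for example a maximum shot length or a preference for streams where a face was detected.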

For the technical stack, the front end is implemented in Swift for iOS, with a corresponding Android app. The back end is implemented in Python. Tencent Cloud server APIs are used to push the live streams. A Redis queue hands out jobs to the different computing servers.
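As a rough sketch of how jobs might be handed out through a Redis list, the snippet below pushes JSON-encoded jobs with lpush and lets workers pull them with a blocking brpop, which gives first-in, first-out order. The queue name, the job fields and the helper names are hypothetical. The InMemoryClient class is only a stand-in so the example runs without a server; in practice a redis.Redis client from the redis-py package would be passed in instead.

```python
import json

QUEUE_KEY = "mira:jobs"  # hypothetical queue name


def enqueue_job(client, session_id, video_urls):
    """Push one processing job onto the left end of the Redis list."""
    job = {"session_id": session_id, "video_urls": list(video_urls)}
    client.lpush(QUEUE_KEY, json.dumps(job))
    return job


def next_job(client, timeout=5):
    """Pop the oldest job from the right end of the list, or return None on timeout."""
    item = client.brpop(QUEUE_KEY, timeout=timeout)
    if item is None:
        return None
    _key, payload = item
    return json.loads(payload)


class InMemoryClient:
    """Stand-in mimicking the lpush/brpop calls of redis.Redis, for local runs only."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        items = self.lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def brpop(self, keys, timeout=0):
        items = self.lists.get(keys, [])
        if not items:
            return None  # a real client would block for up to `timeout` seconds
        return (keys, items.pop())


client = InMemoryClient()
enqueue_job(client, "session-1", ["a.mp4", "b.mp4"])
enqueue_job(client, "session-2", ["c.mp4"])
print(next_job(client))  # the oldest job is returned first
```

Because each brpop removes the job it returns, several computing servers can run the same worker loop against one queue without processing a job twice.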

Expo Video

Demo