MediLens: Biometric AR Goggles for Accessibility
Hackathon Project
Garvish Bhutani, Alex Cho, Daniel Ho, Ted Yoo

Abstract
MediLens is an AR goggle device built to help individuals with Autism Spectrum Disorder (ASD) manage sensory overload. By transcribing live conversations into on-screen captions, translating them in real time into other languages, and streaming continuous heart-rate data, it creates a customizable sensory buffer that keeps users connected without overwhelming them.
Technologies
AssemblyAI (speech-to-text), OpenAI GPT-4 API and Google Cloud Translation API (translation), ESP-NOW, XIAO ESP32 S3 microcontrollers, SH1106 OLED display, Fusion 360 (3D-printed optics).
Inspiration
We were inspired by a close friend with moderate Autism Spectrum Disorder, and by many others on the spectrum. He frequently struggled with visual and auditory sensory overload, feeling overwhelmed by the sheer amount of input from the outside world.
This motivated us to build a device that could make a tangible difference: something that lets a person stay connected and hold conversations, while controlling how much of the external world reaches them.
What It Does
MediLens is an end-to-end wearable system built on sunglasses with noise-cancelling earbuds and a heart rate sensor.
Captions: A microphone streams audio to AssemblyAI for real-time speech-to-text transcription. Captions are displayed on an AR overlay directly on the sunglass lens, letting users follow conversations without relying on audio.
Translation: Captions are passed through the GPT-4 API for live translation into virtually any language, greatly expanding accessibility.
Heart Rate: A biometric sensor continuously monitors heart rate and displays it on the AR overlay. This matters for ASD users, as research links autism to a higher risk of cardiovascular issues.
Wireless Communication: Transcribed and translated text is transmitted wirelessly to the goggles via ESP-NOW, Espressif's low-latency, connectionless Wi-Fi protocol, between two ESP32 S3 microcontrollers.
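The translation step above amounts to a single chat-completion call per caption. A minimal sketch follows; the prompt wording, the `gpt-4` model name, and the `build_translation_messages` helper are our illustrative assumptions, not the exact code from the project.

```python
# Sketch of the live-translation step. Prompt wording and model name
# are assumptions; the real project may differ.
def build_translation_messages(caption: str, target_language: str) -> list:
    """Build the chat messages for a GPT-4 translation request."""
    return [
        {"role": "system",
         "content": f"Translate the user's text into {target_language}. "
                    "Reply with the translation only."},
        {"role": "user", "content": caption},
    ]

def translate(caption: str, target_language: str) -> str:
    # Requires OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=build_translation_messages(caption, target_language),
    )
    return response.choices[0].message.content.strip()
```

Keeping the system prompt to "translation only" matters on a wearable: any extra chatter from the model would end up on the lens.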
How We Built It
Optics & Hardware: An SH1106 1.3" OLED screen projects onto the sunglass surface via a mirror. Getting a clear, usable image required 10+ iterative 3D-printed prototypes in Fusion 360, experimenting with mirror sizes and a magnifying lens (ultimately dropped for a cleaner image).
Microcontroller: A XIAO ESP32 S3 serves as the main brain — consolidating heart rate sensor data and driving the OLED display.
Speech Pipeline: A microphone feeds audio to AssemblyAI (speech-to-text), then to the GPT-4 API (translation), and the result is sent wirelessly via ESP-NOW to the goggles.
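The laptop-side glue for this pipeline can be sketched as below. For brevity we show one-shot file transcription via the AssemblyAI Python SDK, whereas the live build used streaming; the serial port, baud rate, newline framing, and `translate_fn` hook are all our assumptions.

```python
# Laptop-side glue sketch: transcribe, translate, then forward framed
# text to the transmitter ESP32 over USB serial. Port name, baud rate,
# and newline framing are assumptions for illustration.
def frame_for_transmitter(sentence: str) -> bytes:
    """Encode one sentence with a newline delimiter so the transmitter
    firmware can split the serial stream back into messages."""
    return sentence.strip().encode("utf-8") + b"\n"

def run_pipeline(audio_path, target_language, translate_fn,
                 port="/dev/ttyUSB0"):
    import assemblyai as aai  # AssemblyAI SDK; needs aai.settings.api_key
    import serial             # pyserial link to the transmitter ESP32

    transcript = aai.Transcriber().transcribe(audio_path)
    translated = translate_fn(transcript.text, target_language)
    with serial.Serial(port, baudrate=115200, timeout=1) as link:
        link.write(frame_for_transmitter(translated))
```

Passing the translator in as `translate_fn` keeps the GPT-4 and Google Translate backends interchangeable, matching the two converter folders in the repo.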
Code Structure: `textconverter_openai` handles ChatGPT-based translation; `textconverter_google_translate` uses the Google Cloud API as an alternative. The `receiver` and `transmitter` folders contain ESP32 firmware for the wireless pipeline.
Challenges
Optics: Achieving a clear AR image was the hardest challenge. Clearance issues between the mirror, screen, and the user's face required constant prototyping. After 10+ iterations, we landed on a no-magnifying-lens design that produced the clearest and brightest result.
ESP-NOW Latency: The wireless link initially suffered from high latency because we sent text character by character, paying the per-packet overhead of an ESP-NOW frame for every single character. Batching characters into full sentences made updates feel near-instantaneous.
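The batching fix can be sketched as a small function that accumulates characters and flushes on sentence boundaries or when a packet would overflow. The 250-byte figure is ESP-NOW's documented maximum payload size; the flush rules themselves are our assumptions.

```python
# Sketch of the batching fix: one packet per sentence instead of one
# packet per character. Flush rules are illustrative assumptions.
ESP_NOW_MAX_PAYLOAD = 250  # ESP-NOW's maximum payload, in bytes

def batch_into_packets(chars, sentence_ends=".!?"):
    """Group a character stream into packet-sized sentence batches."""
    packets, current = [], ""
    for ch in chars:
        # Flush first if adding this character would overflow a packet.
        if len((current + ch).encode("utf-8")) > ESP_NOW_MAX_PAYLOAD:
            packets.append(current)
            current = ""
        current += ch
        if ch in sentence_ends:       # end of sentence: send it whole
            packets.append(current.strip())
            current = ""
    if current.strip():               # flush any trailing fragment
        packets.append(current.strip())
    return packets
```

For example, `batch_into_packets("Hello there. How are you?")` yields two packets, one per sentence, instead of twenty-five.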
Speech-to-Text APIs: Most APIs reliably transcribed only English, despite claiming broader language support. Some ran only on Mac/Linux, forcing us to dual-boot into Ubuntu, and unoptimized local models crashed our laptops. AssemblyAI proved to be the most reliable solution.
What's Next
Emotion Detection: Adding an onboard camera to detect and display the emotions of people nearby — a significant challenge for many with ASD.
Full Portability: Moving the microphone onto the ESP32 S3 and streaming audio to cloud AI inference over a cellular connection, eliminating the need for a tethered laptop and creating a truly standalone wearable.