Sometimes, I over-engineer things. I get it from my father.
My girlfriend and I recently moved into an apartment building. Oddly, while the enterphone system is advanced enough to allow calling to cellphones (rather than landlines wired into the building), it only allows a single number per apartment. Ninety percent of the time this is just fine, but if the one whose number is registered is unreachable (at work during the day, travelling, etc.), the other loses the convenience of buzzing a visitor into the building.
I was determined to find a solution!
I came across a service called Twilio, which allows interactions with a phone number to be directed to a web server and handled as you specify using their API. A lightbulb went on in my head – this could work perfectly!
I set up an account, acquired a local phone number, and got to testing. The Twilio service allows you to set up and run simple code (in TwiML, an XML-based markup language) directly from their console, but for more complex solutions you must point the inbound calls to a web server. I decided to finally give Amazon Web Services (AWS) a try. A free account and some brief setup, and I was in business!
When a visitor enters our buzzer number, the system makes an outbound call to the number we provided and allows us to talk to the front door and dial a key to unlock the door. My first strategy was to have this outbound call caught by our web server, and have it ask the visitor who they wanted to contact (“Press 1 for me, 2 for her”). Unfortunately, after testing we learned that the enterphone does not allow further key presses once the call has been made.
Strategy number 2 was to take advantage of Twilio’s speech recognition functionality. Instead of pressing a key, visitors would just say either of our names. Unfortunately, the speaker on the enterphone is very quiet and there is loud traffic nearby, so visitors were unlikely to hear the instructions, and Twilio had a hard time understanding what was being said. A more passive solution was required.
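As a rough illustration of that speech attempt (not the original code), the sketch below builds a `<Gather input="speech">` prompt and matches the transcription Twilio posts back in its `SpeechResult` parameter against a list of names. The resident names, phone numbers, prompt wording, and function names are all hypothetical placeholders.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical residents and numbers, for illustration only.
RESIDENTS = {"alex": "+15550000001", "sam": "+15550000002"}


def speech_prompt_twiml(action_url):
    """Build TwiML that asks the visitor to say a name."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        input="speech",                # listen for speech instead of key presses
        action=action_url,             # Twilio posts the transcription here
        method="POST",
        hints=", ".join(RESIDENTS),    # nudge recognition towards the expected names
        timeout="5",
    )
    say = ET.SubElement(gather, "Say")
    say.text = "Who are you here to see?"
    return ET.tostring(response, encoding="unicode")


def match_resident(speech_result):
    """Return the number of the first resident named in the transcription, or None."""
    words = re.findall(r"[a-z']+", speech_result.lower())
    for name, number in RESIDENTS.items():
        if name in words:
            return number
    return None
```

A handler at `action_url` would pass the posted `SpeechResult` value to `match_resident` and fall back to ringing everyone when it returns `None`, which is roughly where the quiet speaker and traffic noise made this approach fall apart.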
Finally I settled on using the Conference feature. When a visitor buzzes us, the call is put into a conference room on hold:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial>
    <Conference startConferenceOnEnter="false" waitMethod="GET" waitUrl="ring_loop_compressed.mp3">Buzzer conference</Conference>
  </Dial>
</Response>
At the same time, two outgoing calls are placed, one to me and one to my girlfriend. When one of us picks up, we are served the following:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather action="/accepted" method="POST" numDigits="1" timeout="5">
    <Say>Ding dong. Press 1.</Say>
  </Gather>
</Response>
Pressing 1 accepts the call, ends the parallel outgoing call to the other person (to avoid empty voicemails or dead air), and patches you in to the conference. We can then proceed as usual, finding out who is there and letting them in.
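For anyone curious how the server side of this might look, here is a minimal sketch in Python (not the original code). It assumes the two outgoing calls are created through the Calls resource of Twilio's REST API, and that the server keeps its own list of the call SIDs it started. The function names, the `prompt_url` argument, and that bookkeeping are hypothetical. The functions only build requests and TwiML; actually sending the POSTs with the account's credentials is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CONFERENCE_NAME = "Buzzer conference"  # same room name the visitor is holding in
API_BASE = "https://api.twilio.com/2010-04-01"


def outbound_call_requests(account_sid, from_number, resident_numbers, prompt_url):
    """Build one (url, form body) pair per resident for creating a call."""
    url = f"{API_BASE}/Accounts/{account_sid}/Calls.json"
    return [
        (url, urlencode({"To": to, "From": from_number, "Url": prompt_url}))
        for to in resident_numbers
    ]


def handle_accepted(digits, answering_sid, ringing_sids):
    """Return (TwiML, SIDs of calls to end) for the /accepted callback.

    digits and answering_sid correspond to the Digits and CallSid
    parameters Twilio posts to a Gather action URL.
    """
    response = ET.Element("Response")
    if digits != "1":
        # Wrong key: hang up this leg and leave the other phone ringing.
        ET.SubElement(response, "Hangup")
        to_end = []
    else:
        # Join the room; this participant starts the conference, which
        # takes the visitor off the ringing "hold music".
        dial = ET.SubElement(response, "Dial")
        conference = ET.SubElement(dial, "Conference", startConferenceOnEnter="true")
        conference.text = CONFERENCE_NAME
        to_end = [sid for sid in ringing_sids if sid != answering_sid]
    return ET.tostring(response, encoding="unicode"), to_end


def end_call_request(account_sid, call_sid):
    """Build the POST that ends a call, whether it is ringing or in progress."""
    url = f"{API_BASE}/Accounts/{account_sid}/Calls/{call_sid}.json"
    return url, urlencode({"Status": "completed"})
```

In this sketch, whichever web framework receives `/accepted` would return the TwiML string as the response body and send one `end_call_request` POST for each SID in the second return value, so the other phone stops ringing before voicemail picks up.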
The icing on the cake? The “on hold” music is the sound of a phone ringing, so they don’t even know anything fancy is going on!
And it all happens very quickly; there is about one ring’s worth of delay compared to having the enterphone dial our number directly.
There you have it – an over-engineered solution to a simple, yet frustrating problem. And in case you were wondering, my dad is proud.