I think Google's voice recognition technology uses something like a
self-organizing map. Maybe I'm being naive, maybe it's because I
think self-organizing maps are cool. If I'm totally wrong, whatever,
play along with me for a second.
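
If you haven't met a self-organizing map before, here's a toy one in Python. It's only a sketch to show the idea: the "audio" is just a list of made-up numbers, the map is a single row of four nodes, and `train_som`, `best_match` and `fuzzy_scores` are names I invented for this post. It says nothing about what Google actually runs.

```python
import math

def best_match(x, nodes):
    """Index of the node closest to x (the 'location' of x on the map)."""
    return min(range(len(nodes)), key=lambda i: abs(nodes[i] - x))

def train_som(samples, n_nodes=4, epochs=30, start_rate=0.5):
    """Train a tiny 1-D self-organizing map on a list of numbers."""
    lo, hi = min(samples), max(samples)
    # Start with the nodes spread evenly over the data range.
    nodes = [lo + (hi - lo) * i / (n_nodes - 1) for i in range(n_nodes)]
    for epoch in range(epochs):
        rate = start_rate * (1 - epoch / epochs)  # learning rate shrinks each epoch
        for x in samples:
            winner = best_match(x, nodes)
            for i in range(n_nodes):
                gap = abs(i - winner)
                if gap <= 1:
                    # The winner moves toward x; its map neighbours move half as much.
                    pull = rate * (1.0 if gap == 0 else 0.5)
                    nodes[i] += pull * (x - nodes[i])
    return nodes

def fuzzy_scores(x, nodes, width=0.25):
    """Similarity between x and every node, between 0 and 1 (1.0 is an exact hit)."""
    return [math.exp(-((x - n) / width) ** 2) for n in nodes]

# Made-up training data with two obvious clumps.
SAMPLES = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
nodes = train_som(SAMPLES)
print(nodes)
print(fuzzy_scores(0.15, nodes))
```

The scores are independent per node, so they don't have to add up to 100%. That's the "80% one word, 75% another word" kind of answer.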

With a self-organizing map, you take a bunch of training data and end
up with a map of data that organized itself.

Then you take new data, compare it to the map, and use where it lands
on the map to make a prediction. You take audio data, compare it to
your map, and know it's 80% one word, 75% another word, etcetera.

That's why they're cool: you can make fuzzy decisions, and the more
data you have, the better you can predict. They were invented by a
guy who was working on voice recognition!

Maybe Google doesn't use these, but let's assume they do. Or at
least, that at some point you have a fuzzy list of words that your
audio data might be. So, if it's kind of close between two or three
words, you need more cues to make a good decision. Google has a *ton*
of training data, so it does pretty well, but it's still pretty hard.
( Think about how unintelligible accents can sound D: )

On my phone, GOOG picks up "but do they really know" as "but today
really know". Those are all real words that sound like what I was
saying, but it's obvious that the sentence is total gibberish. Why is
it so obvious?

Google for "but today really know". Say it out loud. It makes no
fucking sense as a string of words. Total fail. It gets 0 hits on
Google's own search engine. I bet you could look really fast to see
if that string of words ever occurred in all of Project Gutenberg, or
Wikipedia.

Why does voice recognition software suck so bad? "but do they really
know" has 61k hits. That should be an easy decision for an algorithm
to make, but it really seems like nobody is doing that.
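
Here's roughly what that check could look like in Python. It's a sketch, not the real thing: `DOCUMENTS` is a tiny made-up stand-in for Wikipedia or Project Gutenberg, and `phrase_hits` and `pick_transcription` are names I invented. A real system would hit a proper search index instead of looping over strings.

```python
import re

def normalize(text):
    """Lowercase, drop punctuation, and pad with spaces so we match whole words."""
    return " " + " ".join(re.findall(r"[a-z']+", text.lower())) + " "

def phrase_hits(phrase, documents):
    """Count how many documents contain the phrase as an exact run of words."""
    needle = normalize(phrase)
    return sum(1 for doc in documents if needle in normalize(doc))

def pick_transcription(candidates, documents):
    """Keep the candidate that shows up in the most documents."""
    return max(candidates, key=lambda c: phrase_hits(c, documents))

# Made-up mini corpus, purely for illustration.
DOCUMENTS = [
    "But do they really know what they want?",
    "People say a lot of things, but do they really know?",
    "Today I really know nothing.",
]

candidates = ["but today really know", "but do they really know"]
for c in candidates:
    print(c, "->", phrase_hits(c, DOCUMENTS), "hits")
print("winner:", pick_transcription(candidates, DOCUMENTS))
```

The gibberish candidate gets zero hits even though every single word in it is real, which is the whole point.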

At the point where the code is trying to work out the syllables
"to"-"day", it needs to fuzzily guess X possible words, and not make a
final decision yet. It needs to set up a list of a few hundred or so
possible combinations, and then it needs to query every single one of
those combinations, and throw out the gibberish.

And then they need to train a map with my gmail, and favor words
I actually say!
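
The last two steps could be sketched like this. Big assumptions here: the fuzzy guesses arrive as a list of options per chunk of audio (`SLOTS`), "query every combination" is faked by counting word pairs seen in a tiny made-up `CORPUS`, and "train a map with my gmail" is simplified down to a plain set of words I use a lot (`personal_words`) that earn a small bonus. All the names are hypothetical.

```python
from collections import Counter
from itertools import product

def bigrams(words):
    """Adjacent word pairs, e.g. ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    return list(zip(words, words[1:]))

def train_bigram_counts(sentences):
    """Count every adjacent word pair in the corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(bigrams(sentence.lower().split()))
    return counts

def best_sentence(slots, counts, personal_words=frozenset(), bonus=0.5):
    """Try every combination of guesses and keep the least gibberish one."""
    best, best_score = None, float("-inf")
    for combo in product(*slots):
        words = " ".join(combo).split()  # one guess may be several words
        # One point for each word pair that has been seen before...
        score = sum(1 for pair in bigrams(words) if counts[pair] > 0)
        # ...plus a small bonus for each word this user actually says.
        score += bonus * sum(1 for w in words if w in personal_words)
        if score > best_score:
            best, best_score = " ".join(words), score
    return best

# Made-up corpus and made-up fuzzy guesses, one list per chunk of audio.
CORPUS = [
    "but do they really know",
    "do they really care",
    "i really know",
    "today is the day",
]
SLOTS = [["but"], ["do they", "today", "to day"], ["really"], ["know", "no"]]

counts = train_bigram_counts(CORPUS)
print(best_sentence(SLOTS, counts))
print(best_sentence([["call"], ["jaffe", "japhy"]], counts, personal_words={"japhy"}))
```

With no corpus evidence either way, the second call is decided purely by the personal word list, which is the "favor words I actually say" part.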
ps, would love to build this tech, eta 3 months at $3k/mo, japhy@pearachute.com


