How to drastically improve voice recognition

I think Google's voice recognition technology uses something like a
self-organizing map. Maybe I'm being naive, maybe it's because I
think self-organizing maps are cool. If I'm totally wrong, whatever,
play along with me for a second.

You take a matrix of random numbers, hit it with a bunch of training
data and end up with a map of data that organized itself.

Then you take new data, compare it to the map, and use that location
to make a prediction. You take audio data, compare it to your map,
and know it's an 80% match for one word, a 75% match for another
word, et cetera.
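
Here's a toy sketch of that train-then-compare loop, just to make it
concrete. Everything in it is an assumption for illustration: the
2-D "audio features" are made up, the map is a 1-D row of 10 nodes,
and the names (train, best_match, fuzzy_scores) are mine. It is not
a claim about how Google or anyone else actually does it.

```python
import math
import random

def distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(som, vector):
    # Index of the map node whose weights are closest to the input.
    return min(range(len(som)), key=lambda i: distance(som[i], vector))

def train(samples, nodes=10, epochs=200, seed=0):
    rng = random.Random(seed)
    dims = len(samples[0])
    # The "matrix of random numbers": one random weight vector per node.
    som = [[rng.random() for _ in range(dims)] for _ in range(nodes)]
    for epoch in range(epochs):
        # Learning rate and neighborhood radius both shrink over time.
        frac = 1.0 - epoch / float(epochs)
        rate = 0.5 * frac
        radius = max(1.0, nodes / 2.0 * frac)
        for vector in samples:
            winner = best_match(som, vector)
            for i, weights in enumerate(som):
                gap = abs(i - winner)
                if gap > radius:
                    continue
                # Nodes close to the winner on the map get pulled hardest.
                pull = rate * math.exp(-gap ** 2 / (2 * radius ** 2))
                for d in range(dims):
                    weights[d] += pull * (vector[d] - weights[d])
    return som

def fuzzy_scores(som, labeled, vector):
    # Compare where the new vector lands with where each known word
    # landed. 1.0 means the same node; it falls off with map distance.
    spot = best_match(som, vector)
    return {word: 1.0 / (1 + abs(spot - best_match(som, sample)))
            for word, sample in labeled.items()}

# Made-up 2-D "audio features" for two words (purely hypothetical).
labeled = {"know": [0.1, 0.1], "today": [0.9, 0.9]}
samples = [[0.1, 0.1], [0.15, 0.1], [0.9, 0.9], [0.85, 0.9]]
som = train(samples)
scores = fuzzy_scores(som, labeled, [0.1, 0.1])
```

scores ends up as a dict with one fuzzy number per word, which is the
"it's kind of this word, kind of that word" list the rest of this
post leans on.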

That's why they're cool: you can make fuzzy decisions, and the more
data you have, the better you can predict. They were invented by
Teuvo Kohonen, a guy who also worked on voice recognition!

Maybe Google doesn't use these, but let's assume they do. Or at
least, that at some point you have a fuzzy list of words that your
audio data might be. So, if it's kind of close between two or three
words, you need more cues to make a good decision. Google has a *ton*
of training data, so it does pretty well, but it's still pretty hard.
( Think about how unintelligible accents can sound D: )

On my phone, GOOG picks up "but do they really know" as "but today
really know". Those are all real words that sound like what I was
saying, but it's obvious that the sentence is total gibberish. Why is
it so obvious?

Google for "but today really know". Say it out loud. It makes no
fucking sense as a string of words. Total fail. It gets 0 hits on
Google's own search engine. I bet you could look really fast to see
if that string of words ever occurred in all of Project Gutenberg, or
Wikipedia.

Why does voice recognition software suck so bad? "but do they really
know" has 61k hits. That should be an easy decision for an algorithm
to make, but it really seems like nobody is doing that.
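
For what it's worth, the "has anyone ever written this string of
words" check is cheap to sketch. This assumes a tiny stand-in corpus
(the real thing would be Gutenberg, Wikipedia, or search hit counts),
and build_index / hits are names I made up for the example:

```python
from collections import Counter

def build_index(text, max_n=5):
    # Count every run of 1..max_n consecutive words in the corpus.
    words = text.lower().split()
    index = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            index[tuple(words[i:i + n])] += 1
    return index

def hits(index, phrase):
    # How many times the exact phrase appeared (0 if never).
    return index[tuple(phrase.lower().split())]

# Stand-in corpus, invented for this example.
corpus = ("they talk a lot but do they really know what they mean "
          "sure but do they really know anything")
index = build_index(corpus)

candidates = ["but today really know", "but do they really know"]
best = max(candidates, key=lambda p: hits(index, p))
```

Once the index is built, each lookup is a single dictionary hit, so
checking a candidate phrase costs basically nothing.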


At that point when the code is trying to compute the syllables
"to"-"day", it needs to fuzzily guess X possible words, and not make a
final decision yet. It needs to set up a list of a few hundredish
possible combinations, and then it needs to query every single one of
those combinations, and throw out the gibberish.
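
A rough sketch of that "keep the guesses open, then throw out the
gibberish" step. The per-chunk guesses and their scores are invented,
the corpus is a one-line stand-in, and "gibberish" here just means
"contains a word pair the corpus has never seen", which is my own
simplification:

```python
from itertools import product

# Hypothetical fuzzy guesses per chunk of audio: (words, acoustic score).
slots = [
    [("but", 0.9)],
    [("today", 0.8), ("do they", 0.75), ("to day", 0.6)],
    [("really", 0.9), ("rarely", 0.5)],
    [("know", 0.8), ("no", 0.8)],
]

# Stand-in for a big text corpus, invented for this example.
corpus = "but do they really know what they do today no they rarely know"
corpus_words = corpus.split()
seen_bigrams = set(zip(corpus_words, corpus_words[1:]))

def is_gibberish(sentence):
    # Any adjacent word pair the corpus has never seen counts as gibberish.
    w = sentence.split()
    return any(pair not in seen_bigrams for pair in zip(w, w[1:]))

def transcribe(slots):
    survivors = []
    # Try every combination of guesses instead of deciding chunk by chunk.
    for combo in product(*slots):
        sentence = " ".join(text for text, _ in combo)
        if is_gibberish(sentence):
            continue
        score = 1.0
        for _, acoustic in combo:
            score *= acoustic
        survivors.append((score, sentence))
    # Best-scoring survivor first.
    return [sentence for _, sentence in sorted(survivors, reverse=True)]

ranked = transcribe(slots)
```

Even though "today" has the best acoustic score for its chunk, every
sentence containing it gets thrown out, because "but today" never
shows up in the corpus.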

And then they need to train a map with my gmail, and favor words
I actually say!
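
The personal part could start out way simpler than a full map. This
sketch swaps the map for a plain word-frequency table built from text
the user wrote, and blends it with the acoustic score. The messages,
the 50/50 weight, and the function names are all made up:

```python
from collections import Counter

def personal_model(messages):
    # Relative word frequencies from text the user actually wrote.
    counts = Counter(w for m in messages for w in m.lower().split())
    total = float(sum(counts.values()))
    return {w: c / total for w, c in counts.items()}

def rerank(guesses, model, weight=0.5):
    # Blend the acoustic score with how often this user uses the word.
    def blended(item):
        word, acoustic = item
        return (1 - weight) * acoustic + weight * model.get(word, 0.0)
    return sorted(guesses, key=blended, reverse=True)

# Hypothetical sent mail and hypothetical acoustic guesses.
mail = ["do they really know", "i know they know"]
model = personal_model(mail)
guesses = [("no", 0.8), ("know", 0.75)]
ranked = rerank(guesses, model)
```

Here "no" sounds slightly more likely, but "know" wins after the
blend because it is a word this user actually types a lot.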


ps, would love to build this tech, eta 3 months at $3k/mo, japhy@pearachute.com

Posted
 

Got past level 1 of the Greplin programming challenge on the first try!

I feel so clever:



# Read the whole text into one string.
with open('gettysburg.txt') as f:
    foo = f.read()

def reversechk(text):
    # A palindrome reads the same forwards and backwards.
    return text == text[::-1]

# Collect every palindromic substring of 1 to 11 characters.
palindromes = set()
for i in range(len(foo)):
    for j in range(1, 12):
        if reversechk(foo[i:i+j]):
            palindromes.add(foo[i:i+j])

# Keep the longest palindrome found.
winner = ''
for word in palindromes:
    if len(word) > len(winner):
        winner = word

print(winner)

 

(http://challenge.greplin.com/)

Posted
 

Great CSS gradient generator

Here is some $GOOG juice for the fine folks at colorzilla:
http://www.colorzilla.com/gradient-editor/
Posted
 

Sketches

At first I was thinking jack o lanterns, ha


Posted
 

Rocktober 30th

Halloweenposter

The graphic is Mike Struwin's logo, mashed up with the deer head from
a bottle of Jager

Posted
 

From our stars back to our cities

Late summer to early fall, #mighetto


Posted
 

Lightbank rejection #4 ( I guess?)

It's not totally clear if they want different ideas or further development?

pitch #3 was wrapping a UI on $AMZN payments and competing with PayPal.


---------- Forwarded message ----------
From: Lightbank
Date: Fri, Oct 1, 2010 at 5:01 PM
Subject: Re: Lightbank.com Form: Japhy Bartlett / Online courses in
modern programming languages, with paid testing and difficult
certification by humans.
To: japhy@pearachute.com


Japhy,

Thank you for your submissions. It seems like your ideas need to be
developed a little further, and we encourage you to do so. As of right
now, however, your ideas are not a fit for us. Please feel free to
submit again when you feel ready.

Thank you,

The Lightbank Team

On Thu, Sep 30, 2010 at 10:53 PM, Lightbank Form wrote:
>
> What is your brilliant idea:
> Online courses in modern programming languages, with paid testing and difficult certification by humans.
>
> Why should we be interested:
> Talent is really difficult to find, and people on unemployment would love a nice job.
>
> How are we going to make money together:
> We'll charge for guided group lessons, we'll charge per test attempt, and we'll certify people by having them build a real thing that doesn't suck.
>
> People:
> Name: @japherwocky
> Bio:
> It's kind of an experiment; tell me to fuck off if I'm spamming you, or you're only looking for more polished pitches, etc.
>
>
> Contact information:
> Name: Japhy Bartlett
> Phone: 7347174596
> Location: SXSWMI
> Email: japhy@pearachute.com
> Url: http://pearachute.com

Posted