I've been seeing ads recently for Dragon Software, which purports to allow a computer user to speak words and have the software load those words into a document or something else. It made me recall my old IBM buddy, Ramon D(avid) Faulk, 'Ray' to his friends and associates.
Ray graduated from UTexas (Austin) back in IBM's heyday and got hired by IBM as a mathematician to work on a hot new project out of the Thomas J. Watson Research Center in Yorktown Heights, New York. IBM in those days thought it might be possible to build a 'universal translator' given enough computing muscle, and it set a team to work making it a reality. Ray was on that team. After several years of disappointing (or 'no') progress, IBM's management reluctantly concluded that a Universal Translator was not as feasible as they had originally thought. They cancelled the project and reshuffled all the participants into other parts of the company. Remember, this was back when IBM didn't fire people or lay them off just because there wasn't any work for them to do. It was always possible to find a spot for someone in a company as big as IBM.
Ray got transferred to IBM's Field Engineering HQ in White Plains, NY as a programmer. FEHQ's Information Systems department (FEIS) back then did lots of heavy lifting, mostly concentrated on machine service schedules, replacement parts, and publications (manuals). All of this had to be managed in an inventory sense, shipped expeditiously where and when needed, and accounted-for and reported. There was lots to do for a talented application programmer, and Ray was that.
Unfortunately, Ray also had an obsessive personality. He had been hired to create a universal translator and he felt mortified that he had failed so miserably as to get the project cancelled. Yes, there were lots of others who had failed equally miserably or more so, but Ray's failure was his and his alone. He could not let the mere cancellation of a project stop him from completing his task.
So he did his FEIS application programming tasks from nine-to-five and did other things at other times. Occasionally he would do those other things from nine-to-five as well because they were important. On the occasions that those important tasks pushed aside the tasks Ray was getting paid to do, he would come (as Maxwell Smart might say) this close to getting fired. But he worked for IBM; IBM didn't fire people for silly reasons like that.
Eventually, Ray managed to develop the theory (and the method) that allowed him to take the first giant step toward the now-forgotten Universal Translator. Here's the essential problem:
When we write, we put spaces between words and add punctuation marks to indicate to the reader where certain idea-clusters start or end. When we speak, we don't, unless we're Victor Borge demonstrating Phonetic Punctuation. (whenwespeakwedontunlesswerevictorborgedemonstratingphoneticpunctuation). How does the brain split this sound from that sound? They all run together in a continuous stream, yet virtually everyone who is fluent in the language spoken knows where those implicit blanks belong. The answer is that the human brain has the ability to guess (accurately, most times) that this sound represents a word. It guesses based on hearing that same sound thousands or millions of times in the past. When it encounters a new sound, it guesses that it's a word because all the sounds around it have been accounted-for. How in Hell do you get a machine to do that? This way:
Ray's method involved taking a stream of data and parsing it as if it were continuous data like speech. He had Barbara Argy, one of our co-workers, transcribe a chapter of a foreign-language book onto punched cards (that's how far back this was) eliminating all the blanks, commas, periods, etc. Ray's method didn't care what language the original text was in, nor was that information part of the program's input. Barbara used, I think, a Russian novel, using '@', '#', '%', and others for the Cyrillic characters our alphabet lacks. Ray's program read the string and did a statistical analysis of the frequencies of all the one-character strings, then it did the same for all two-character strings, then three, then four, then... At some point, he would start over at the head of the string and begin looking at the first character compared to the first two characters compared to the first three, four, five, six, etc., each time looking at the relative frequency of the strings. Every now and then during this process, the relative frequency of a string would drop to zero or near it. If he were looking at the prior sentence, for instance, he would notice a sudden drop when the string was 'everyn' as compared to 'every'. From this, the program would conclude that 'every' was a word and insert a blank following it.
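For the curious, the frequency-drop idea can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Ray's actual program: the function names (`ngram_counts`, `segment`), the `max_len` cap, and the `drop` threshold are all invented for the example, and the sample text is built from three made-up "words" that share no letters so the drop is easy to see. Real text would need far more data and a subtler test than a single ratio.

```python
from collections import Counter

def ngram_counts(text, max_len):
    """Count every substring of length 1..max_len in the unbroken text."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def segment(text, max_len=8, drop=0.5):
    """Cut the text wherever adding one more character makes the
    string's frequency fall below `drop` times its previous value.
    Both parameters are hypothetical tuning knobs for this sketch."""
    counts = ngram_counts(text, max_len)
    words = []
    start = 0
    while start < len(text):
        end = start + 1
        # Grow the candidate one character at a time.
        while end < len(text) and end - start < max_len:
            shorter = counts[text[start:end]]
            longer = counts[text[start:end + 1]]
            if longer < drop * shorter:
                break  # sudden drop: treat text[start:end] as a word
            end += 1
        words.append(text[start:end])
        start = end
    return words

# Ten "words" run together with no blanks, like Barbara's punched cards.
sample = "dogdogcatcathenhendoghencatdog"
print(" ".join(segment(sample)))
```

In the sample, 'dog' occurs four times but 'dogd' only once, which is the same kind of cliff as 'every' versus 'everyn'. The program never sees a dictionary; the counts alone tell it where to cut.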
Ray once compared this process to what happens in the game of "Ghost". In Ghost, players sit in a circle and one player starts by saying a letter. The next player adds a letter to it. Letters are added in turn until some player adds a letter that makes a word. The object of the game is to add a letter, one that could be part of a not-yet-fully-developed word, but not one that actually completes a word — to make the word longer and longer until finally some player is forced by "the collapse of variety" to add the one-and-only letter that can fit at the end of the existing string.
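The "collapse of variety" can be seen directly by listing which characters ever follow a given string. The snippet below is only a hedged sketch of the Ghost analogy, not anything taken from Ray's paper: the helper name `successors` and the sample text are assumptions made up for illustration.

```python
def successors(text, prefix):
    """Return the set of characters that follow `prefix` anywhere in `text`."""
    found = set()
    i = text.find(prefix)
    while i != -1:
        nxt = i + len(prefix)
        if nxt < len(text):
            found.add(text[nxt])
        i = text.find(prefix, i + 1)  # keep looking for later occurrences
    return found

# Invented sample: three made-up words run together with no blanks.
sample = "dogdogcatcathenhendoghencatdog"

for prefix in ("d", "do", "dog"):
    print(prefix, sorted(successors(sample, prefix)))
```

Inside a word there is only one way to go on, just as a Ghost player gets cornered. Once the word is complete, the choices open up again, and that widening is another hint that a boundary has been reached.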
Using this schema, Ray's program split that chapter of Russian text into its constituent words making only six errors along the way. The errors were of the sort where a prefix might be snipped off a word or a portmanteau word would be split into pieces. The program did not know that the original text was in Russian. It did not use a dictionary. It built its own vocabulary dynamically as it went along.
Well, if you can do that to Russian, you should be able to do it to Welsh or Hungarian with equal ease. You should be able to apply the same technique to spoken language, breaking the stream of sounds into packets of sound and then assigning meaning to those sounds. This is the method infants use when they are pre-verbal. This is the method we will use when we meet our first extra-terrestrial.
When that day comes, and perhaps before that, I expect linguists will be splitting streams of continuous data and developing vocabularies from them, all the while wondering why they call the process "faulking".
Ray's method was the subject of a paper he published with his collaborator, Fran Goertzel Gustavson, and can be found in The IBM Systems Journal (1990), vol. 29, no. 2. It was titled "Segmenting discrete data representing continuous speech input". You can see the abstract here.
Ray passed away a few years back. I'm sure he would have gotten a kick out of Dragon had he lived to see it.