Researchers have developed a computer program that makes it more efficient for authors to build natural language generation (NLG) systems. In other words, it is now easier for programmers to teach a computer how to write in plain English.
NLG systems are used in an enormous number of applications, from video games and online tutorials to customer-service programs (which explains a lot, when you think about it). But the process of developing these NLG systems has traditionally been very labor-intensive. The new program, developed by researchers at NC State and Georgia Tech, allows programmers to craft an NLG system twice as fast as they could previously.
“What we’ve done is provide some algorithmic support to reduce the developers’ workload,” says David L. Roberts, an assistant professor of computer science at NC State and co-author of a paper describing the research.
If you want a computer to produce natural language – to write in conversational English, for example – you need to give the computer a set of rules to follow, and basic information about whatever topic you want the computer to talk about. For example, if you want the computer to be conversant in English on the subject of baseball history, you need to give it a vocabulary, a set of grammatical rules and Babe Ruth’s stats.
NLG systems rely on templates that the computer uses as building blocks to assemble complete sentences. The researchers have created a program that automatically learns how best to use those building blocks, with minimal developer effort. (Developers retain the option of giving the program additional information.) The program builds a model from the developers' input and uses it to deploy the language templates more effectively.
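Template-based generation can be pictured as slot-filling: each template is a sentence skeleton with named slots that get filled from structured data. Here is a minimal sketch of that idea; the template strings, slot names, and baseball facts are illustrative stand-ins, not taken from the researchers' actual system.

```python
# A template is a sentence skeleton with named slots (illustrative examples).
TEMPLATES = [
    "{player} hit {home_runs} home runs in {year}.",
    "In {year}, {player} batted {average}.",
]

def realize(template, data):
    """Fill a template's slots from the data; return None if a slot is missing."""
    try:
        return template.format(**data)
    except KeyError:
        return None

# Structured facts the system can draw on (again, purely illustrative).
facts = {"player": "Babe Ruth", "year": 1923, "home_runs": 41}

# Only templates whose slots are all covered by the data produce sentences;
# the second template is skipped because "average" is not in the facts.
sentences = [s for t in TEMPLATES if (s := realize(t, facts)) is not None]
```

A real NLG system would go further, learning which template to prefer in which context, but the slot-filling core looks roughly like this.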
Once the computer has a better understanding of how to use its language templates, developers need to give it the data it needs to discuss a given topic (such as Babe Ruth's batting average, or box scores for the 1923 World Series). At this point, the new program makes suggestions to developers about what types of data could be added to the system to allow the computer to create a more robust set of phrases.
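One way to picture the suggestion step is as a comparison between the slots the templates mention and the data fields the developer has actually supplied: any slot with no matching field is a candidate suggestion. This is only a hedged sketch of that intuition, with made-up template and field names; the paper's actual method is more sophisticated.

```python
import re

TEMPLATES = [
    "{player} hit {home_runs} home runs in {year}.",
    "In {year}, {player} batted {average}.",
]

def suggest_missing_fields(templates, data):
    """Return data fields that, if added, would unlock more templates."""
    missing = set()
    for template in templates:
        slots = set(re.findall(r"\{(\w+)\}", template))  # slot names in braces
        missing |= slots - data.keys()                   # slots with no data
    return sorted(missing)

facts = {"player": "Babe Ruth", "year": 1923, "home_runs": 41}
suggestions = suggest_missing_fields(TEMPLATES, facts)
# Suggests adding "average", which would make the second template usable.
```

The payoff is the guidance itself: instead of loading in data blindly, the developer sees exactly which additions would expand what the system can say.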
This improves on the previous NLG programming approach, which involved inputting an enormous amount of data without any guidance, and then exhaustively testing the potential outcomes.
The lead author of the paper is Karthik Narayan, an undergraduate at Georgia Tech. The paper was co-authored by Roberts and Georgia Tech’s Charles Isbell. The paper will be presented in October at the Artificial Intelligence and Interactive Digital Entertainment conference in Stanford, Calif.
No word yet on whether the program can make wallflowers better conversationalists.