Classic grammar model can be used for computerised parsing

31 May 2010 University of Gothenburg

A classic Nordic grammar model can be used for computerised grammatical analysis of modern Swedish text and for technical applications built on it, according to a new thesis in language technology from the University of Gothenburg, Sweden. One such application generates, as soon as a digital text is opened, the queries that the text can answer; these queries can then be used to search for specific information in the text.

Language researcher Kenneth Wilhelmsson has developed a new method for parsing, that is, for interpreting the grammatical structure of a text with the help of a computer program. The method builds on Danish linguist Paul Diderichsen’s traditional model of sentence structure, which has been adopted for describing all the Nordic languages and is found in most modern Swedish grammar books.

“The grammatical analysis in the program is performed mostly at the main clause level, which can be seen as a big advantage, as the task is then less complex but still gives usable results,” explains Wilhelmsson at the University of Gothenburg.

Instead of performing the entire analysis in one go, the approach consists of a series of steps, each of which can be performed with a high level of accuracy. At the main clause level, it is primarily the finite verb and other single-word sentence elements that are identified. This, in turn, paves the way for identifying the complex sentence elements (subject, object/predicative and adverbial), a step that can rely on methods of exclusion and similar heuristic rules rather than on an explicit, complete grammatical description.
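The stepwise idea can be illustrated with a minimal Python sketch. It is not Wilhelmsson's program: the tag names ("VFIN", "ADV"), the MainClause structure and the analyse_main_clause function are assumptions made for this example, and the input is taken to be already tagged. The sketch relies only on the verb-second order of Swedish main clauses: it finds the finite verb first, treats everything before it as the clause's first position (the fundament in Diderichsen's schema), picks out single-word adverbials, and leaves the rest for later steps by exclusion.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy tags, assumed for illustration only:
# "VFIN" = finite verb, "ADV" = single-word adverbial.
Token = Tuple[str, str]  # (word, tag)


@dataclass
class MainClause:
    fundament: List[str]    # material before the finite verb
    finite_verb: str
    adverbials: List[str]   # single-word adverbials after the verb
    remainder: List[str]    # unresolved words, left for later steps


def analyse_main_clause(tokens: List[Token]) -> MainClause:
    # Step 1: locate the first finite verb of the clause.
    verb_index = next(
        (i for i, (_, tag) in enumerate(tokens) if tag == "VFIN"), None
    )
    if verb_index is None:
        raise ValueError("no finite verb found")

    # Step 2: in a verb-second main clause, everything before the
    # finite verb fills the first position (the fundament).
    fundament = [word for word, _ in tokens[:verb_index]]

    # Step 3: pick out single-word adverbials after the verb.
    rest = tokens[verb_index + 1:]
    adverbials = [word for word, tag in rest if tag == "ADV"]

    # Step 4: by exclusion, whatever is left must belong to the complex
    # elements (subject, object/predicative, adverbial).
    remainder = [word for word, tag in rest if tag != "ADV"]

    return MainClause(fundament, tokens[verb_index][0], adverbials, remainder)


# "Igår läste hon inte boken" (Yesterday she did not read the book)
clause = analyse_main_clause([
    ("Igår", "ADV"), ("läste", "VFIN"), ("hon", "PRON"),
    ("inte", "ADV"), ("boken", "NOUN"),
])
print(clause)
```

In this toy example the remainder holds "hon" and "boken", the candidates for subject and object. No full grammar of the noun phrase was needed to isolate them, which is the point of working by exclusion.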

Kenneth Wilhelmsson’s newly developed method can also be used by language researchers to search for instances of different grammatical phenomena, which can be described in a more refined fashion than with word and string matching.

Wilhelmsson’s work on the thesis also included the creation of various prototype applications which build on this type of analysis. One of them is a unique system for automatic generation of queries from a Swedish text.

The program has access to the Swedish Wikipedia’s article database and can be used to generate queries when a text is opened. When the user begins to type a query, the text is completed automatically, and only queries that can actually be answered may be asked.
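How the prototype implements this is not described here, but the behaviour can be sketched in Python under simple assumptions: the queries have already been generated from the text, and completion is a case-insensitive prefix lookup in a sorted list. The class name AnswerableQueries, its methods and the sample queries are all invented for the example.

```python
from bisect import bisect_left
from typing import Iterable, List


class AnswerableQueries:
    """Prefix completion over a fixed set of pre-generated queries."""

    def __init__(self, queries: Iterable[str]) -> None:
        # Sort on a case-folded key so that lookup ignores letter case.
        pairs = sorted({(q.casefold(), q) for q in queries})
        self._keys = [key for key, _ in pairs]
        self._originals = [query for _, query in pairs]

    def complete(self, prefix: str, limit: int = 5) -> List[str]:
        """Return up to `limit` stored queries that start with `prefix`."""
        key = prefix.casefold()
        start = bisect_left(self._keys, key)
        matches: List[str] = []
        for i in range(start, len(self._keys)):
            if len(matches) == limit or not self._keys[i].startswith(key):
                break
            matches.append(self._originals[i])
        return matches

    def can_ask(self, query: str) -> bool:
        """Accept only queries that were generated from the text."""
        key = query.casefold()
        i = bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key


# Invented sample queries standing in for those generated from a text.
index = AnswerableQueries([
    "Who developed the method?",
    "Who wrote the thesis?",
    "What does the program generate?",
])
print(index.complete("who "))
print(index.can_ask("Who funded the thesis?"))
```

Because the user can only pick from stored queries, every query that is asked has an answer in the text, and a differently worded query is steered towards a stored one instead of failing silently.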

“This is intended as an alternative to most other modern query programs where the user cannot know whether a query can actually be answered by the knowledge base at all, and where variations in the formulation of the query may mean that information that is there is missed,” explains Wilhelmsson.

