Semantic NLP

Unlike most current NLP systems, ALL offers deep semantic access to the content of free-text documents through its cognitive-semantic, ontology-based representation technology, which is capable of supporting all major NLP tasks (information retrieval, information extraction, etc.).

Theoretical background

Although more and more NLP systems are making use of some form of semantics, the majority of them rely only on shallow semantic analysis, as their semantic modules are limited to classifying isolated expressions into a few groups (e.g. various types of "named entities") and establishing simple semantic relationships between expressions such as synonymy and hyponymy. In contrast to these "lightweight" uses of semantics, ALL has worked out a methodology and technology for the automatic mapping of natural language texts onto deep semantic representations that take into account the syntactic relationships between expressions and express the texts' informational content in a form which is capable of supporting complex and extended reasoning.

The theoretical basis of our research is cognitive frame semantics, a branch of semantics according to which our concepts are organized into cognitive frames, which are, roughly, situation schemas in terms of which humans interpret written texts and spoken utterances. In keeping with this conception, our formal semantic representations of texts are graphs whose nodes and edges correspond to the described situations, the participating entities, and the relationships between them.
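To make the graph idea concrete, the following is a minimal sketch of how one sentence might be stored as a graph of a situation, its participants, and their relations. It is an illustration only, not ALL's actual data model: the class names (`FrameGraph`, `Edge`), the role labels, and the concept labels are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str  # id of the node the relation starts from (here: a situation)
    role: str    # relation label, e.g. a frame role such as "agent" (hypothetical)
    target: str  # id of the related node (here: a participating entity)


@dataclass
class FrameGraph:
    nodes: dict = field(default_factory=dict)  # node id -> concept label
    edges: set = field(default_factory=set)

    def add_node(self, node_id, concept):
        self.nodes[node_id] = concept

    def relate(self, source, role, target):
        # Refuse dangling edges so the graph always stays well formed.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.add(Edge(source, role, target))

    def participants(self, situation_id):
        """Map each role of a situation node to the concept that fills it."""
        return {e.role: self.nodes[e.target]
                for e in self.edges if e.source == situation_id}


# Hypothetical representation of "The patient takes aspirin."
g = FrameGraph()
g.add_node("s1", "Ingestion")  # the described situation (frame)
g.add_node("e1", "Patient")    # participating entities
g.add_node("e2", "Aspirin")
g.relate("s1", "agent", "e1")
g.relate("s1", "substance", "e2")
```

Calling `g.participants("s1")` then returns the role fillers of the situation, which is the kind of structure that later matching or reasoning steps could work on.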

The linguistic and (shallow) world knowledge used for the generation of semantic representations is stored in a semantic lexicon which contains both formal-ontological and frame-semantic lexical information. The upper level of the lexicon consists of small ontology segments ("ontology capsules") that provide incomplete, domain-dependent schemata for the representation of situations in some contexts.

This semantic methodology and technology underlies most of the solutions described below.


A large part of the information in today's databases is contained in unstructured or semi-structured natural language documents. ALL has developed semantics-based solutions that provide access to this information using its frame-based meaning representation technology.

Information Retrieval (IR) and Extraction (IE)

ALL has worked out a technology for semantic search in documents that is based on our methodology for the semantic representation of texts. In this technology the query consists of well-formed phrases in a purpose-built controlled natural language rather than a Boolean combination of terms (keywords). The search engine looks for phrases in the documents that may have the same meaning as some phrase of the query. The relevance of a result depends on the measure of fit between the meaning of the query and the meaning of the phrases found in the document.
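The text does not specify how the measure of fit is computed. As a rough sketch only, the example below assumes that the meaning of a phrase is a set of (situation, role, filler) triples and scores a document phrase by the fraction of query triples it shares. The functions `fit` and `rank`, the triple format, and the sample phrases are all hypothetical.

```python
def fit(query_triples, phrase_triples):
    """Fraction of the query's triples that also occur in the phrase (0.0 to 1.0)."""
    if not query_triples:
        return 0.0
    return len(query_triples & phrase_triples) / len(query_triples)


def rank(query_triples, phrases):
    """Return (score, text) pairs with a non-zero score, best fit first."""
    scored = [(fit(query_triples, triples), text)
              for text, triples in phrases.items()]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])


# Hypothetical meaning of the query "a drug that treats a disease".
query = {
    ("Treatment", "agent", "Drug"),
    ("Treatment", "target", "Disease"),
}

# Hypothetical meanings of three document phrases.
phrases = {
    "a composition for treating a disease": {
        ("Treatment", "agent", "Drug"),
        ("Treatment", "target", "Disease"),
    },
    "a method of packaging a composition": {
        ("Packaging", "theme", "Drug"),
    },
    "use of the compound in therapy": {
        ("Treatment", "agent", "Drug"),
    },
}

results = rank(query, phrases)
```

Here the first phrase matches both query triples, the third matches one of two, and the second shares none and is dropped. A real system would presumably use a graded notion of semantic similarity rather than exact triple overlap.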

Information Retrieval architecture

The search is language independent, since the meaning representation is language independent. So a query in English may produce results in several languages. However, the linguistic resources, first of all, the corresponding semantic lexicon, have to be developed for each language.
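A minimal sketch of why a shared meaning representation makes cross-language search possible: each language has its own lexicon, but every lexicon maps words onto the same concept identifiers. The toy lexicons, the concept names, and the `concepts` function are assumptions for illustration and say nothing about the structure of the actual semantic lexicon.

```python
# Hypothetical per-language lexicons mapping surface words to shared concept ids.
LEXICONS = {
    "en": {"patient": "Patient", "drug": "Drug"},
    "de": {"patient": "Patient", "arzneimittel": "Drug"},
}


def concepts(text, lang):
    """Map the known words of a text to language-independent concept ids."""
    lexicon = LEXICONS[lang]
    return {lexicon[w] for w in text.lower().split() if w in lexicon}


english_query = concepts("drug", "en")
german_phrase = concepts("Arzneimittel", "de")
```

Both calls yield the same concept set, so an English query can match a German document as long as a lexicon exists for each language.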

The same technology can be used for information extraction as well. Most IE systems in use rely on domain-dependent templates that are prepared in advance of the information extraction process. In contrast, our system uses user queries as templates, in which the word “something” has the role of a wildcard character.
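The idea of a query acting as an extraction template can be sketched as follows. The example assumes, purely for illustration, that extracted meanings are flat (subject, relation, object) tuples. The `extract` function and the sample facts are hypothetical, and the real system matches semantic representations rather than strings.

```python
WILDCARD = "something"


def extract(template, facts):
    """Return the fillers of the wildcard slots for every fact matching the template."""
    results = []
    for fact in facts:
        if len(fact) != len(template):
            continue
        # A slot matches if the template has the wildcard there or the values agree.
        if all(t == WILDCARD or t == f for t, f in zip(template, fact)):
            results.append(tuple(f for t, f in zip(template, fact) if t == WILDCARD))
    return results


# Hypothetical facts derived from patent claims.
facts = [
    ("composition", "contains", "retinol"),
    ("composition", "contains", "glycerin"),
    ("composition", "treats", "acne"),
]

# Query template: "the composition contains something"
fillers = extract(("composition", "contains", "something"), facts)
```

The result lists whatever fills the "something" slot in each matching fact, so no domain-specific template has to be prepared in advance.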

The first version of the system searches the claims of English-language pharmaceutical and cosmetic patents. Another version has been used in the REACTION project.

Integrated Search in Structured and Non-Structured Data

The application was developed for the health care domain (search in electronic patient records [EPR]), but can be used for any domain.

Any type of data in the database can be searched, including natural language texts. Search queries are Boolean combinations of atomic conditions. For each atomic condition, a time interval and the number of occurrences within that interval can be specified. An atomic condition can describe the bounds within which numerical or coded data are considered a match; changes in the data, or values calculated from them, can also be searched for. Several databases, including text document collections, can be searched in parallel.
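The query model described above can be sketched as composable predicates. This is an illustration under stated assumptions, not the application's actual query language: the `Record` type, the helper names (`atomic`, `all_of`, `any_of`, `negate`), the codes, and the thresholds are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    code: str     # coded data item, e.g. a lab test identifier (hypothetical)
    value: float  # measured value
    when: date    # date of the measurement


def atomic(code, low, high, start, end, min_count=1):
    """At least min_count records of `code` with low <= value <= high in [start, end]."""
    def check(records):
        hits = [r for r in records
                if r.code == code and low <= r.value <= high and start <= r.when <= end]
        return len(hits) >= min_count
    return check


def all_of(*conds):
    return lambda records: all(c(records) for c in conds)


def any_of(*conds):
    return lambda records: any(c(records) for c in conds)


def negate(cond):
    return lambda records: not cond(records)


# Hypothetical patient record.
records = [
    Record("HbA1c", 8.1, date(2012, 3, 1)),
    Record("HbA1c", 7.9, date(2012, 9, 1)),
    Record("SBP", 128, date(2012, 9, 1)),
]

year_start, year_end = date(2012, 1, 1), date(2012, 12, 31)

# "At least two HbA1c values of 7.5 or more in 2012 AND no systolic pressure of 140 or more."
query = all_of(
    atomic("HbA1c", 7.5, 20.0, year_start, year_end, min_count=2),
    negate(atomic("SBP", 140, 300, year_start, year_end)),
)

matches = query(records)
```

Because each condition is an ordinary function over the record list, conditions on changes or on calculated values could be added as further predicates and combined in the same way.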

Semantic technology in cognitive systems

Research and development of cognitive systems requires the interpretation of sign sequences received via various channels (audio, visual, etc.). ALL offers an approach to the required language-independent meaning representations that is based on a generalised version of frame semantics, in which the structure of frames corresponds to the conceptual network of the cognitive system, thereby supporting the integration of meaning representations into the world model(s) of the system. The frame semantics developed for natural language connects the different senses and case frames of words to the frames. In the unified theory, the elements of other "languages" (e.g. data structures produced by picture or video recognition) and their possible relations are likewise connected to the frames. In this manner a unified informational structure can be extracted from the different information sources.

Ontology management

In addition to using ontologies for semantic interpretation, ALL has significant experience in general ontology management (development and maintenance). We are familiar with several ontology development methods (e.g. DOLCE, OntoClean), and have developed our own ontology development methodology. We are also experienced in the automated management of ontologies; in a European R&D project we have developed an ontology management tool.

In our research on generating semantic representations of texts, we have worked out a methodology for using small ontology segments ("ontology capsules") to represent the concept structure of a theme or a complex situation. An ontology capsule is a minimal model that can serve as a query in a semantic search system, can be connected to domain-specific ontologies in order to expand the information implicit in it, and can be presented to the user visually or otherwise. Capsules derived from several pieces of text can also be collected into a large stock of models in order to discover hidden similarities and logical relations between the texts.