Meaningful Portraits of Documents and the Knowledge Base

 

A meaningful portrait of a document is a formal representation of the document's text. Meaningful portraits are built by a semantics-oriented linguistic processor. The set of meaningful portraits, together with index files, makes up the Knowledge Base (KB), which supports various types of semantic search and logical-analytical functions based on comparing and transforming knowledge structures. We are designing technology for processing a KB distributed across a network of computers.

 

Example of text:

 

12:16 27.12.2002

In the Chechen Republic one of leaders of bands the Arabian mercenary Abu-Tarik is destroyed. As have informed the Ministry of Foreign Affairs of the Chechen Republic, Chechen special militia destroy the insurgent in settlement Starye Atagi of Groznensky region. In one of the houses there were found the hiding place with three sub-machine guns.

On some data, Abu-Tarik was involved in murder of Salikhov's family in Starye Atagi in this year.

Meaningful portrait of the text:

 

DOC_( 22, 1-02-98.TXT,SUMMARY; /0+) 0-(ENG)

DATE_(DEC.,~27,12,HOUR,16,MINUTE/1+)

CRIM_GROUP(1,LEADER,OF,BAND,ARABIAN,MERCENARY/2+)

FIO("ABU - TARIK"," "," "," "/3+)

DESTROY(2-,3-/4+) 4-(22,ACT_)

PLACE_(CHECHEN,REPUBLIC/5+)

WHERE(4-,5-)

ORGANIZATION_(MINISTRY,OF,FOREIGN,AFFAIRS,OF,CHECHEN,REPUBLIC/6+)

INFORM(6-/7+) 7-(22,ACT_)

FORCE_(SPECIAL,MILITIA/8+)

DESTROY(CHECHEN,8-,INSURGENT/9+) 9-(22,ACT_)

PLACE_(SETTLEMENT,STARYE,ATAGI,OF,GROZNENSKY,REGION/10+)

WHERE(9-,10-)

WEAPON_("SUB ",MACHINE,GUN/11+)

FIND(1,HOUSE,HIDE,PLACE,3,11-/12+) 12-(22,ACT_)

PLACE_(STARYE,ATAGI/13+)

INVOLVE(3-,MURDER,SALIKHOV,FAMILY,13-,YEAR/14+) 14-(22,ACT_)

SENTENCE_(22,1-/15+) 15-(1,1,19)

SENTENCE_(22,4-/16+) 16-(1,20,114)

SENTENCE_(22,7-,9-/17+) 17-(2,115,288)

SENTENCE_(22,12-/18+) 18-(5,289,376)

SENTENCE_(22,ON,SOME,DATA,14-/19+) 19-(6,377,476)

A meaningful portrait consists of elementary fragments whose arguments are words in normal form (this is necessary for search and processing). Each elementary fragment has a unique code, written as a number with a + sign and separated from the arguments by a slash. For example, in the fragment FIO("ABU - TARIK"," "," "," "/3+), 3+ is the code of the fragment, and 3- is a reference to it. The fragments DOC_( 22, 1-02-98.TXT,SUMMARY; /0+) 0-(ENG) indicate that the meaningful portrait was built from the English-language text of document number 22 in the file 1-02-98.TXT, which was processed as a summary of incidents (the linguistic knowledge applied depends on this). The following fragments represent the date DATE_(/1+), the criminal group CRIM_GROUP(/2+), the person's surname (with first name and patronymic) FIO(/3+), and so forth. The signs 0+, 0-, 1+, 1-, 2+, 2-, 3+, 3- are fragment codes and references to them, through which connections and relations between fragments are specified.

Actions are represented by fragments such as DESTROY(2-,3-/4+) 4-(22,ACT_), which states that the criminal group (CRIM_GROUP with code 2+) and the person (FIO with code 3+) are destroyed. The accompanying fragment 4-(22,ACT_) indicates that the fragment DESTROY(.../4+) represents an action and belongs to document number 22. The fragments PLACE_(CHECHEN,REPUBLIC/5+) WHERE(4-,5-) indicate the place of this action (WHERE). The fragments ORGANIZATION_(/6+) INFORM(6-/7+) 7-(22,ACT_) represent that the organization reported the information.
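The fragment notation described above can be read mechanically. The following is a minimal sketch in Python, not the authors' linguistic processor: the names Fragment, split_args and parse_portrait are hypothetical, and the sketch assumes one defining fragment per line, quoted arguments that never contain commas, and it ignores relation fragments that have no code of their own, such as WHERE(4-,5-).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# NAME(arg,arg,.../N+): a fragment that defines code N
MAIN = re.compile(r'(\w+)\((.*?)/(\d+)\+\)')
# N-(arg,arg,...): an attribute fragment attached to code N
ATTR = re.compile(r'(\d+)-\((.*?)\)')
# An argument of the form N- is a reference to the fragment with code N
REF = re.compile(r'\d+-')


@dataclass
class Fragment:  # hypothetical container, not part of the original system
    code: int
    name: str
    args: List[str]
    refs: List[int]
    attrs: List[str] = field(default_factory=list)


def split_args(raw: str) -> List[str]:
    # Assumption: quoted arguments never contain commas.
    parts = (p.strip(' ";') for p in raw.split(','))
    return [p for p in parts if p]


def parse_portrait(text: str) -> Dict[int, Fragment]:
    fragments: Dict[int, Fragment] = {}
    for line in text.splitlines():
        rest = line
        m = MAIN.search(line)
        if m:
            code = int(m.group(3))
            args = split_args(m.group(2))
            refs = [int(a[:-1]) for a in args if REF.fullmatch(a)]
            fragments[code] = Fragment(code, m.group(1), args, refs)
            rest = line[m.end():]
        # Attach trailing attribute fragments such as 4-(22,ACT_)
        for a in ATTR.finditer(rest):
            owner = int(a.group(1))
            if owner in fragments:
                fragments[owner].attrs = split_args(a.group(2))
    return fragments


sample = '''CRIM_GROUP(1,LEADER,OF,BAND,ARABIAN,MERCENARY/2+)
FIO("ABU - TARIK"," "," "," "/3+)
DESTROY(2-,3-/4+) 4-(22,ACT_)'''

portrait = parse_portrait(sample)
print(portrait[4].name, portrait[4].refs, portrait[4].attrs)
```

Here the DESTROY fragment ends up with references to codes 2 and 3 and with the attributes 22 and ACT_, which mirrors the reading given in the text.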

A special role is played by the SENTENCE_(...) fragments, which correspond to sentences. They are filled with the words that did not enter any information object (in this example, ON, SOME, DATA in the fragment with code 19+) and with the codes of the objects themselves.

Indicators of their position in the text are added to these fragments. For example, the fragment SENTENCE_(22,7-,9-/17+) 17-(2,115,288) represents the fact that the objects with codes 7- (corresponding to the action INFORM) and 9- (corresponding to the action DESTROY) are located in the sentence that begins on the 2nd line of the document text and occupies bytes 115 through 288. These positioning data are needed by the reverse linguistic processor.
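As an illustration of how the position indicators might be used, the sketch below reads a SENTENCE_ fragment and cuts the corresponding span out of the document bytes. The function names parse_sentence and extract_span are hypothetical, and the sketch assumes that byte offsets are 1-based and inclusive, which the text does not state explicitly.

```python
import re

# SENTENCE_(doc, words and refs.../N+) N-(line, first_byte, last_byte)
SENTENCE = re.compile(
    r'SENTENCE_\((.*?)/(\d+)\+\)\s*\2-\((\d+),(\d+),(\d+)\)'
)
REF = re.compile(r'\d+-')


def parse_sentence(fragment: str) -> dict:
    m = SENTENCE.search(fragment)
    if m is None:
        raise ValueError('not a positioned SENTENCE_ fragment')
    # The first argument is the document number; the rest are words or refs.
    doc, *rest = [a.strip() for a in m.group(1).split(',')]
    return {
        'doc': int(doc),
        'refs': [int(a[:-1]) for a in rest if REF.fullmatch(a)],
        'words': [a for a in rest if not REF.fullmatch(a)],
        'line': int(m.group(3)),
        'first_byte': int(m.group(4)),
        'last_byte': int(m.group(5)),
    }


def extract_span(data: bytes, first_byte: int, last_byte: int) -> bytes:
    # Assumption: offsets are 1-based and inclusive.
    return data[first_byte - 1:last_byte]


info = parse_sentence('SENTENCE_(22,7-,9-/17+) 17-(2,115,288)')
print(info['refs'], info['line'], info['first_byte'], info['last_byte'])
```

A reverse step could then call extract_span(document_bytes, info['first_byte'], info['last_byte']) to recover the original sentence for any object found in the KB.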

The set of meaningful portraits of documents is organized into the Knowledge Base. Logical inference is provided by IF-THEN rules (productions) of the DECL language, which are the basis for solving logical-analytical tasks.
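The DECL syntax is not shown here, so the following Python sketch only illustrates the general idea of an IF-THEN production applied to portrait fragments. Everything in it is an assumption made for illustration: the tuple encoding of facts, the ARG relation, the derived SEEN_AT relation, and the simple forward-chaining loop.

```python
from typing import Callable, Iterable, Set, Tuple

Fact = Tuple  # e.g. ("FIO", 3), ("ARG", 4, 3), ("WHERE", 4, 5)


def rule_person_at_place(facts: Set[Fact]) -> Set[Fact]:
    """IF action A has person P as an argument AND WHERE(A, L)
    THEN SEEN_AT(P, L). SEEN_AT is a hypothetical derived relation."""
    derived = set()
    for f in facts:
        if f[0] != "WHERE":
            continue
        _, action, place = f
        for g in facts:
            if g[0] == "ARG" and g[1] == action and ("FIO", g[2]) in facts:
                derived.add(("SEEN_AT", g[2], place))
    return derived


def forward_chain(facts: Iterable[Fact],
                  rules: Iterable[Callable[[Set[Fact]], Set[Fact]]]) -> Set[Fact]:
    # Apply every rule repeatedly until no new facts appear.
    known = set(facts)
    rules = list(rules)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


# Facts taken from the example portrait: DESTROY(2-,3-/4+), WHERE(4-,5-)
facts = {
    ("CRIM_GROUP", 2),
    ("FIO", 3),
    ("ARG", 4, 2),
    ("ARG", 4, 3),
    ("WHERE", 4, 5),
}
result = forward_chain(facts, [rule_person_at_place])
print(("SEEN_AT", 3, 5) in result)
```

The rule links the person with code 3 to the place with code 5 through the shared action with code 4, which is the kind of comparison of knowledge structures that such productions are meant to express.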

Graph of the meaningful portrait:

 

In this graph, the upper node corresponds to the document. The central node represents the person involved, Abu-Tarik. The left node corresponds to the organized criminal group, and so on. Nodes marked with the letter A correspond to actions. The arcs represent connections and relations between named entities (NE). Arcs that connect named-entity nodes to A nodes indicate that those named entities take part in the corresponding actions.
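A graph of this kind can be derived from the fragment references. The sketch below is one possible construction in Python, under stated assumptions: the function build_graph is hypothetical, action nodes are labelled with the prefix A:, and arcs are stored as (from_code, to_code, kind) triples, where kind is ARG for a reference inside a fragment and WHERE for a place relation.

```python
import re

MAIN = re.compile(r'(\w+)\((.*?)/(\d+)\+\)')      # NAME(args/N+)
ACTION = re.compile(r'(\d+)-\(\d+,ACT_\)')        # N-(doc,ACT_)
WHERE = re.compile(r'WHERE\((\d+)-,(\d+)-\)')     # WHERE(action-,place-)
REF = re.compile(r'\d+-')


def build_graph(lines):
    labels = {}   # fragment code -> node label
    arcs = set()  # (from_code, to_code, kind)
    for line in lines:
        w = WHERE.search(line)
        if w:
            arcs.add((int(w.group(1)), int(w.group(2)), 'WHERE'))
        m = MAIN.search(line)
        if not m:
            continue
        name, raw, code = m.group(1), m.group(2), int(m.group(3))
        labels[code] = name
        # Every N- argument becomes an arc to the referenced fragment.
        for arg in raw.split(','):
            arg = arg.strip()
            if REF.fullmatch(arg):
                arcs.add((code, int(arg[:-1]), 'ARG'))
        # A trailing N-(doc,ACT_) marks the fragment as an action node.
        a = ACTION.search(line, m.end())
        if a and int(a.group(1)) == code:
            labels[code] = 'A:' + name
    return labels, arcs


lines = [
    'CRIM_GROUP(1,LEADER,OF,BAND,ARABIAN,MERCENARY/2+)',
    'FIO("ABU - TARIK"," "," "," "/3+)',
    'DESTROY(2-,3-/4+) 4-(22,ACT_)',
    'PLACE_(CHECHEN,REPUBLIC/5+)',
    'WHERE(4-,5-)',
]
labels, arcs = build_graph(lines)
print(labels)
print(sorted(arcs))
```

For the first five fragments of the example, the action node DESTROY is connected to the criminal group, the person, and the place, matching the arcs between A nodes and named entities described above.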