Computerized Spanish linguistic system - SLEI
Author: Sergio Marrero Suárez
Tutor: Francisco Javier Carreras Riudavets
This project involves the creation of an information system that holds the Spanish linguistic system, mainly its morphology, together with those static aspects of syntax and semantics that can be modeled and are of interest for natural language processing. It allows the storage and control of the following characteristics of the Spanish language: canonical forms, grammatical categories, gender and number, appreciative forms, derivation, archaic or rare usage, verbal conjugation, meanings, suffixation, prefixation, parasynthesis and others, morphological relationships, synonymy, prepositional regimes, semantic classification, and location of the source. The System reflects the relationships that exist in Spanish between these characteristics, as well as the large number of exceptions and irregularities of the language in the aspects covered.
Given the complexity of the Spanish linguistic system, a system is needed that can store information in an orderly, simple and structured way and reflect the abundance of exceptions and irregularities in the language.
The System must cover the entries of the main lexicographical repertoires of the Spanish language and, for each entry, its different meanings (minimizing redundancy, especially among synonyms), the grammatical categories to which it belongs, its semantic classifications, its heteronyms, the endings associated with it, and so on. It must also cover the word's morphology, differentiating between root and ending. The word is defined as the minimum and main unit of storage, without distinguishing morphemes. Alongside this main unit, the System covers the different relationships a word can have with other words, such as:
- The synonymic relations.
- The prepositional regimes.
- The morphological relationships, covering suffixation, prefixation and parasynthesis, with the resulting structures represented as word families.
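The relationships listed above can be sketched as a small relational schema. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the project's actual database, and every table, column and function name (word, synonymy, prepositional_regime, morphological_relation, word_family) is hypothetical. The word is the central unit and each kind of relationship gets its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE word (                       -- hypothetical central storage unit
    id             INTEGER PRIMARY KEY,
    canonical_form TEXT NOT NULL,
    category       TEXT NOT NULL,
    UNIQUE (canonical_form, category)
);
CREATE TABLE synonymy (                   -- symmetric pair, stored once
    word_id    INTEGER NOT NULL REFERENCES word(id),
    synonym_id INTEGER NOT NULL REFERENCES word(id),
    PRIMARY KEY (word_id, synonym_id),
    CHECK (word_id < synonym_id)
);
CREATE TABLE prepositional_regime (
    word_id     INTEGER NOT NULL REFERENCES word(id),
    preposition TEXT NOT NULL,
    PRIMARY KEY (word_id, preposition)
);
CREATE TABLE morphological_relation (
    base_id    INTEGER NOT NULL REFERENCES word(id),
    derived_id INTEGER NOT NULL REFERENCES word(id),
    process    TEXT NOT NULL
               CHECK (process IN ('suffixation', 'prefixation', 'parasynthesis')),
    PRIMARY KEY (base_id, derived_id)
);
""")

def add_word(form, category):
    """Insert a word and return its generated id."""
    cur = conn.execute(
        "INSERT INTO word (canonical_form, category) VALUES (?, ?)", (form, category))
    return cur.lastrowid

def add_synonym(a_id, b_id):
    """Store a synonym pair once, smaller id first."""
    conn.execute("INSERT INTO synonymy VALUES (?, ?)", (min(a_id, b_id), max(a_id, b_id)))

def relate(base_id, derived_id, process):
    """Record that derived_id is formed from base_id by the given process."""
    conn.execute("INSERT INTO morphological_relation VALUES (?, ?, ?)",
                 (base_id, derived_id, process))

def word_family(root_id):
    """Return the root and every word derived from it, directly or indirectly."""
    rows = conn.execute("""
        WITH RECURSIVE family(id) AS (
            SELECT ?
            UNION
            SELECT m.derived_id
            FROM morphological_relation m JOIN family f ON m.base_id = f.id
        )
        SELECT w.canonical_form
        FROM family f JOIN word w ON w.id = f.id
        ORDER BY w.canonical_form
    """, (root_id,)).fetchall()
    return [r[0] for r in rows]

flor = add_word("flor", "noun")
florero = add_word("florero", "noun")
florecer = add_word("florecer", "verb")
reflorecer = add_word("reflorecer", "verb")
relate(flor, florero, "suffixation")
relate(flor, florecer, "suffixation")
relate(florecer, reflorecer, "prefixation")

empezar = add_word("empezar", "verb")
comenzar = add_word("comenzar", "verb")
add_synonym(empezar, comenzar)

depender = add_word("depender", "verb")
conn.execute("INSERT INTO prepositional_regime VALUES (?, ?)", (depender, "de"))

print(word_family(flor))
```

Storing each derivation as a (base, derived) pair lets a family be recovered with a recursive query instead of being stored redundantly, and the foreign keys reject a relationship that points to a word that does not exist.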
The verb must also be covered as a particular case of a word. In addition to the data it shares with other words, the System must cover the conjugation models of the different verbs, their irregularities, their defective forms, the substitutes for those defective forms, and their participles.
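One way to picture the verb data is as a shared conjugation model plus per-verb exceptions. The following is a minimal sketch, not the project's design: the class and field names (ConjugationModel, Verb, irregular, defective, substitutes) are hypothetical, only the present indicative is modeled, and llover is treated as used only in the third person singular, which is a simplification.

```python
from dataclasses import dataclass, field

PERSONS = ("1sg", "2sg", "3sg", "1pl", "2pl", "3pl")

@dataclass(frozen=True)
class ConjugationModel:
    """Endings shared by all verbs that follow one model (present indicative only)."""
    name: str
    endings: dict  # person -> ending

@dataclass
class Verb:
    infinitive: str
    model: ConjugationModel
    irregular: dict = field(default_factory=dict)    # person -> full irregular form
    defective: set = field(default_factory=set)      # persons that have no form
    substitutes: dict = field(default_factory=dict)  # person -> form used instead

    def form(self, person):
        """Return the form for a person, or None if it is missing with no substitute."""
        if person in self.defective:
            return self.substitutes.get(person)
        if person in self.irregular:
            return self.irregular[person]
        stem = self.infinitive[:-2]  # drop -ar / -er / -ir
        return stem + self.model.endings[person]

AR = ConjugationModel("amar", dict(zip(PERSONS, ("o", "as", "a", "amos", "áis", "an"))))
ER = ConjugationModel("temer", dict(zip(PERSONS, ("o", "es", "e", "emos", "éis", "en"))))

amar = Verb("amar", AR)
estar = Verb("estar", AR,
             irregular={"1sg": "estoy", "2sg": "estás", "3sg": "está", "3pl": "están"})
llover = Verb("llover", ER,
              irregular={"3sg": "llueve"},
              defective=set(PERSONS) - {"3sg"})

for verb in (amar, estar, llover):
    print(verb.infinitive, [verb.form(p) for p in PERSONS])
```

Regular verbs then need no data beyond a reference to their model, while irregular and defective verbs store only the forms that deviate from it.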
With this system, we intend to go a step further along the difficult path of natural language processing by making it possible to organize and control the most relevant aspects of the Spanish language. This opens the possibility of using the stored information for specific purposes in future applications. For example, it could serve as the engine for the electronic dictionaries of the future.
- Define an unambiguous relational data model capable of storing all the information. The model must cover the different entities, consistency relationships, referential integrity, validations, etc.
- Design and develop a maintenance interface that enforces referential integrity and consistency between the data while remaining simple and easy to use.
- Support efficient queries and listings of the information.
- Design a security model for the control of access to information, through different user profiles.
- Plan the procedures for backing up this information, including their form and frequency.
- Provide the system with a control mechanism that uses user profiles to govern how information is accessed and manipulated.
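The profile-based access control named in the objectives above can be sketched as a mapping from profiles to permitted operations. The profile names (reader, editor, administrator) and the permissions are assumptions made for illustration; the source does not specify them. In a database such as Oracle, the same idea would normally be expressed with roles and grants rather than application code.

```python
from enum import Flag, auto

class Permission(Flag):
    QUERY = auto()         # read words, meanings, relationships
    EDIT = auto()          # insert, update or delete linguistic data
    MANAGE_USERS = auto()  # assign profiles to users

# Hypothetical profiles: each one is a set of permissions.
PROFILES = {
    "reader": Permission.QUERY,
    "editor": Permission.QUERY | Permission.EDIT,
    "administrator": Permission.QUERY | Permission.EDIT | Permission.MANAGE_USERS,
}

def is_allowed(profile, needed):
    """True if the profile grants every permission in `needed`; unknown profiles get none."""
    granted = PROFILES.get(profile, Permission(0))
    return (granted & needed) == needed

print(is_allowed("reader", Permission.QUERY))
print(is_allowed("reader", Permission.EDIT))
print(is_allowed("editor", Permission.QUERY | Permission.EDIT))
```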
The proposal is to implement the system on an Oracle database, with a relational design of the data and with as many semantic restrictions, validations and consistency checks as possible built into the core of the database. The interface can be developed with the Oracle development tools, that is, Oracle Developer (Forms and Reports). It is a client-server environment in which the data resides on a server and the interface runs on a client.
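To illustrate what placing restrictions in the core of the database means, the sketch below declares the validations as constraints on the table itself, so that any client is subject to them. It is an assumption-laden example: it uses Python's sqlite3 module instead of Oracle, and the table, the columns and the two rules (only nouns and adjectives carry gender; a verb, and only a verb, references a conjugation model) are hypothetical.

```python
import sqlite3

def make_db():
    """Create an in-memory database whose rules live in the schema, not in the client."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE word (
        id                INTEGER PRIMARY KEY,
        canonical_form    TEXT NOT NULL CHECK (length(canonical_form) > 0),
        category          TEXT NOT NULL
                          CHECK (category IN ('noun', 'adjective', 'verb', 'adverb')),
        gender            TEXT CHECK (gender IN ('m', 'f')),
        conjugation_model TEXT,
        -- hypothetical rule: only nouns and adjectives carry gender
        CHECK (gender IS NULL OR category IN ('noun', 'adjective')),
        -- hypothetical rule: a verb, and only a verb, has a conjugation model
        CHECK ((category = 'verb') = (conjugation_model IS NOT NULL))
    );
    """)
    return conn

def try_insert(conn, form, category, gender=None, model=None):
    """Attempt an insert; return False if the database rejects the row."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO word (canonical_form, category, gender, conjugation_model) "
                "VALUES (?, ?, ?, ?)",
                (form, category, gender, model))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
print(try_insert(conn, "casa", "noun", gender="f"))    # valid noun
print(try_insert(conn, "cantar", "verb", model="amar"))  # valid verb
print(try_insert(conn, "bien", "adverb", gender="m"))  # adverb with gender
print(try_insert(conn, "temer", "verb"))               # verb without a model
```

Because the checks are part of the table definition, the maintenance interface does not need to repeat them and a faulty client cannot bypass them.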
Like any information system, it must have a backup mechanism that plans the copies and their frequency, using Oracle tools if possible or, failing that, operating-system facilities.