Development of Internet-based support services for linguistic research

Principal investigator: Octavio Santana Suárez
Funded by: Ministerio de Educación y Ciencia
Reference: TIN2004-03988

This project proposes a set of remote services, usable via the Internet, that provide computational support for research into linguistic phenomena in Spanish. The aim is to make available to the research community the full potential of a set of computer tools, some already developed and others still under development, that result from the work of the Text & Information Processing (TIP) group at the University of Las Palmas de Gran Canaria. The services to be developed are: a remote morphological analysis service, a remote information service on morphological relationships, and a remote functional disambiguation service.

These services will be complemented by a number of general-purpose clients, which will make their capabilities available to users who do not need, do not want, or are unable to program their own applications. The clients to be developed are: a morphosyntactic text analysis client and a morphological information retrieval client.

Concrete proposals

The remote services and their clients constitute a new technology built on open standards, which facilitate cooperative development and take full advantage of the Internet. In line with thematic line 3.6 of the National Program of Information Technology, this project aims to develop:

  1. A remote morphological analysis service.
    It will lemmatize any Spanish word form, identifying its canonical form, its grammatical category and the inflection or derivation that produces it. For verbs, it will handle simple and compound conjugation, enclitic pronouns, inflection of the participle as a verbal adjective (gender, number) and the diminutive of the gerund. For non-verbal forms, it will handle: gender and number in nouns, adjectives, pronouns and articles; heteronymy by change of sex in nouns; superlative degree in adjectives and adverbs; adverbialization of the superlative in adjectives; appreciative derivation in nouns, adjectives and adverbs; and invariant forms such as prepositions, conjunctions, interjections, words from other languages, and locutions and phrases. Prefixation will also be taken into account where appropriate.
  2. A remote information service about morphological relationships.
    It will offer recognition, generation and manipulation of morphological relationships from any word, including recovery of all its lexicogenetic information back to a primitive, management and control of the affixes involved in those relationships, and the regularity of each established relationship. It will provide a global view of the behavior and productivity of Spanish words in the main word-formation processes (suffixation, prefixation, parasynthesis, suppression, regression, zero modification, apocope, metathesis, and other unclassifiable processes that produce alternative spellings).
  3. A remote functional disambiguation service.
    It will determine the grammatical function of each word in the context in which it appears, narrowing down the possible readings by processing both local syntactic structures and syntactic representation trees.
  4. A morphosyntactic text analysis client.
    Building on the previous services and offering a user-friendly interface, it will let users obtain, for a given text, its morphosyntactic analysis, statistical measures of its characteristics, the identification of neologisms, and the location of grammatical co-occurrences, verbal periphrases, lexical collocations and other linguistic phenomena.
  5. A morphological information retrieval client.
    Building on the above services, it will locate documents on the Internet that satisfy queries combining both specific words, affected to a greater or lesser extent by the various mechanisms by which Spanish forms words from existing ones, and grammatical characteristics or linguistic phenomena that may occur in the document.
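As an illustration only, the following is a minimal sketch of how a morphological information retrieval client (item 5) might use the output of a morphological analysis service (item 1). It assumes the service returns, for each word form, one or more readings made up of a canonical form, a grammatical category and the inflection that produces it. The names `Analysis`, `analyze`, `expand_query` and `matching_documents`, and the tiny in-memory lexicon standing in for the remote service, are hypothetical; they are not the TIP group's actual services or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One reading of a word form, as a hypothetical analysis service might return it."""
    form: str       # surface form as it appears in text
    lemma: str      # canonical form
    category: str   # grammatical category, e.g. "verb" or "noun"
    features: str   # inflection that produces the form


# Hypothetical stand-in for the remote service: a tiny hand-written lexicon.
TOY_LEXICON = [
    Analysis("canto", "cantar", "verb", "present indicative, 1st singular"),
    Analysis("canto", "canto", "noun", "masculine singular"),
    Analysis("cantaba", "cantar", "verb", "imperfect indicative, 1st/3rd singular"),
    Analysis("cantándola", "cantar", "verb", "gerund + enclitic 'la'"),
    Analysis("cantos", "canto", "noun", "masculine plural"),
]


def analyze(form):
    """Return every reading of `form`; more than one reading means ambiguity."""
    return [a for a in TOY_LEXICON if a.form == form.lower()]


def expand_query(word, category=None):
    """Collect all surface forms that share a (lemma, category) pair with `word`."""
    keys = {(a.lemma, a.category) for a in analyze(word)
            if category is None or a.category == category}
    return {a.form for a in TOY_LEXICON if (a.lemma, a.category) in keys}


def matching_documents(word, documents, category=None):
    """Names of documents containing any form related to `word`, sorted."""
    forms = expand_query(word, category)
    hits = []
    for name, text in documents.items():
        # Naive tokenization: split on whitespace, strip punctuation, lowercase.
        tokens = {t.strip(".,;:!?¡¿").lower() for t in text.split()}
        if tokens & forms:
            hits.append(name)
    return sorted(hits)


docs = {
    "a": "Yo cantaba en el coro.",
    "b": "Los cantos del mar.",
    "c": "Sigue cantándola.",
}
print(matching_documents("canto", docs, category="verb"))
print(matching_documents("canto", docs, category="noun"))
```

The query word "canto" is ambiguous between the noun canto and a form of the verb cantar, so restricting the category (the kind of decision a functional disambiguation service would make from context) changes which documents match: the verb reading retrieves "a" and "c", the noun reading retrieves "b".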