Automatic cataloging of suffixal and prefixal neologisms - CANeo TIP
CANeo is a software tool for classifying neologisms, adapted to the new Information Society. It can help information professionals such as journalists, book authors, magazine writers, and lexicographers who work with the Spanish language. The tool aims to detect the root of a neologism and reconstruct the primitive word it derives from. That primitive word is the central element of the analysis, and each candidate is verified using the Lemmatization service. This service provides the grammatical category of the primitive word, which makes it possible to estimate the grammatical categories of the neologism.
Results
Results provided by CANeo are as follows:
How it works
CANeo TIP detects neologisms formed by suffixation and prefixation that satisfy the rules of the Spanish language. From a proposed neologism it reconstructs a collection of potential primitive words that could have produced it. Classifying a neologism requires the grammatical categories of its primitive. To obtain them, CANeo TIP queries an external Lemmatization service, which returns information about the primitives, their grammatical categories, and other valuable details. By combining statistical information about affixes with the grammatical categories of the primitive, it is possible to estimate and classify the neologism.
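The final step, combining affix statistics with the categories of the primitive, could look roughly like the sketch below. It is only an illustration under stated assumptions: the `SUFFIX_STATS` table, its numbers, and the function name `estimate_categories` are invented here and are not taken from CANeo TIP itself.

```python
from collections import defaultdict

# Hypothetical affix statistics: for each suffix, the share of derivations
# that go from a primitive of one category to a derived word of another.
# The figures are invented for illustration only.
SUFFIX_STATS = {
    "ción": {("verb", "noun"): 0.95, ("noun", "noun"): 0.05},
    "able": {("verb", "adjective"): 0.90, ("noun", "adjective"): 0.10},
}


def estimate_categories(suffix, primitive_categories):
    """Score the possible categories of a neologism.

    `primitive_categories` stands for the categories that a lemmatization
    service reported for the reconstructed primitive word.
    """
    scores = defaultdict(float)
    for (source, target), weight in SUFFIX_STATS.get(suffix, {}).items():
        # Only derivations whose source category matches the primitive count.
        if source in primitive_categories:
            scores[target] += weight
    total = sum(scores.values())
    if total == 0:
        return {}
    # Normalize so that the scores of the surviving categories sum to 1.
    return {category: score / total for category, score in scores.items()}
```

With this table, a primitive reported as a verb and the suffix "able" yields a single candidate category, adjective. An unknown suffix yields an empty result, which a caller could treat as "no estimate".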
About CANeo TIP
This software is a web application written in C# and ASP.NET, using the MVC pattern on the Microsoft .NET Framework 3.5. It uses XML data sources to store the information involved in the analysis. The solution implements several design patterns and methods to speed up the analysis and to reduce the number of queries between the systems. The application is part of the TIP (Text & Information Processing) application pool and depends on the Lemmatization service to operate.
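One common way to reduce the number of queries to an external service is to cache its answers. The sketch below shows the idea in Python rather than C#, and it is an assumption about the general technique, not a description of the patterns CANeo TIP actually implements. The function `query_lemmatizer` and its tiny lexicon are hypothetical stand-ins for the real Lemmatization service.

```python
from functools import lru_cache

# Records every word that is actually sent to the stand-in service.
calls = []


def query_lemmatizer(word):
    """Hypothetical stand-in for the external Lemmatization service."""
    calls.append(word)
    fake_lexicon = {"blog": ("noun",), "tuitear": ("verb",)}
    return fake_lexicon.get(word, ())


@lru_cache(maxsize=4096)
def cached_categories(word):
    """Return the categories of `word`, querying the service at most once."""
    return query_lemmatizer(word)
```

Because many candidate primitives recur across analyses, repeated lookups of the same word are answered from memory and only the first one reaches the service.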
Methodology
This application is based on a study of seventy thousand compound words, which yielded valuable information about the most productive affixes of the Spanish language: their meanings, usage statistics, and other specifics that help classify neologisms and properly estimate their grammatical category and relevance. The software core includes a rule engine that detects, isolates, and reconstructs primitives as follows:
- Suffix analysis: Suffixation is a morphological process whereby a bound morpheme is attached to the end of a stem. Suffix analysis goes through the suffix collection looking for rules matching the criteria. If a rule can be applied to reconstruct a primitive word, then information about this process will be recorded and annotated.
- Prefix analysis: Prefixation is a morphological process whereby a bound morpheme is attached to the front of a root or stem. Prefix analysis goes through the prefix collection looking for rules matching the criteria. If a rule can be applied to reconstruct a primitive word, then information about this process will be recorded and annotated.
- Irregular roots: The system includes a collection of irregular roots. When an irregular root is detected, it needs to be replaced considering both cases.
- Orthographic rules: The system applies a wide variety of Spanish orthographic rules during root transformation, such as those governing diphthongs and hiatuses.
- Parasynthetic analysis: Parasynthesis is the formation of words by a combination of compounding and adding an affix. Certain pairs of prefixes and suffixes frequently occur together. Neologisms formed by both suffixation and prefixation need special treatment, which includes usage statistics for that particular pairing.
- Accent rules: There are no random accent marks in Spanish. Every accent mark is justified by rules, and the rule system takes those rules into account.
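The suffix and prefix analyses described above can be sketched as a small rule table that is scanned for matches. This is a minimal illustration, not the actual rule engine: the rules, the prefix list, and the names `SuffixRule` and `candidate_primitives` are hypothetical, and irregular roots, orthographic rules, and accent rules are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    suffix: str       # ending to strip from the neologism
    replacement: str  # ending to append to rebuild the primitive


# Hypothetical rules and prefixes, invented for illustration only.
SUFFIX_RULES = [
    SuffixRule("eable", "ear"),     # tuiteable -> tuitear
    SuffixRule("ización", "izar"),  # globalización -> globalizar
]
PREFIXES = ["anti", "re", "super"]


def candidate_primitives(word):
    """Return (primitive, annotation) pairs for every rule that applies."""
    candidates = []
    for rule in SUFFIX_RULES:
        if word.endswith(rule.suffix) and len(word) > len(rule.suffix):
            stem = word[: -len(rule.suffix)]
            candidates.append((stem + rule.replacement, "suffix -" + rule.suffix))
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            candidates.append((word[len(prefix):], "prefix " + prefix + "-"))
    return candidates
```

For "antiglobalización" this sketch proposes both "antiglobalizar" (suffix rule) and "globalización" (prefix rule). In the workflow the document describes, each such candidate would then be checked against the Lemmatization service.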
Copyright
CANeo TIP is the graduation thesis of Raúl Jiménez Estupiñán in Computer Engineering. The project was directed by Francisco Javier Carreras Riudavets, with Zenón Hernández Figueroa and Gustavo Rodríguez Rodríguez contributing to the development of the libraries and the syllabification.
When citing this resource, please use the following reference:
Carreras-Riudavets, F.; Jiménez-Estupiñán, R.; Hernández-Figueroa, Z.; Rodríguez-Rodríguez, G. (2012). Catalogador automático de neologismos sufijales y prefijales - CANeo TIP. Available at http://cltip.iatext.ulpgc.es