FarsiSpell: A spell-checking system for Persian using a large monolingual corpus 

Abstract In recent years, the wide availability of language resources in various forms, together with rapid advances in computer technology and programming, has led researchers in linguistics and computer science to cooperate on problems in computational linguistics and natural language processing. Building large monolingual and bilingual corpora in digital form has enabled linguists and language engineers to explore information-processing techniques automatically with computer programs, without having to collect and analyze data manually. One of the main applications of monolingual corpora is the development of automatic spell-checking systems, in which a large monolingual corpus can serve as the database in place of a monolingual dictionary. The present study attempts to demonstrate the effectiveness of a large monolingual corpus of Persian in improving the output quality of a spell-checker developed for this language. In this spelling correction system, error detection, suggestion generation, and suggestion ranking are performed in three separate stages. An experiment was carried out to evaluate the performance of the spell-checking system.
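
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the FarsiSpell implementation: the abstract does not say how each stage works, so the sketch assumes that detection is a lookup in the corpus vocabulary, that suggestions are corpus words within one edit of the unknown word, and that ranking is by corpus frequency. The toy corpus, the `ALPHABET` string, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would count tokens from a large Persian corpus.
CORPUS = "کتاب خوب است و این کتاب جدید است و کتاب من خوب است".split()
FREQ = Counter(CORPUS)

# Letters used to generate candidate edits (an assumed alphabet for this sketch).
ALPHABET = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"


def detect(tokens):
    """Stage 1 (assumed): flag tokens that never occur in the corpus."""
    return [t for t in tokens if t not in FREQ]


def suggest(word):
    """Stage 2 (assumed): corpus words reachable from `word` by one edit."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    candidates = deletes + transposes + replaces + inserts
    return {w for w in candidates if w in FREQ}


def rank(candidates):
    """Stage 3 (assumed): most frequent corpus word first, ties broken by spelling."""
    return sorted(candidates, key=lambda w: (-FREQ[w], w))


def check(text):
    """Run the three stages in order and map each unknown token to ranked suggestions."""
    return {w: rank(suggest(w)) for w in detect(text.split())}
```

Calling `check` on a whitespace-tokenized sentence returns a dictionary whose keys are the tokens absent from the corpus and whose values are the ranked suggestions. Keeping the stages as separate functions mirrors the separation described in the abstract and lets each one be replaced independently, for example by a context-sensitive ranker.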

FullText: here

 

 2- Speech Recognition Technology:

Persian Audio Dictionary 

- Spontaneous speech corpus of Persian

 

3- Bilingual Dataset: (software)

English-Persian database of idioms and expressions   

The English-Persian terminology database of computer and IT 

The English-Persian terminology database of management and economics  
