  • Working with foreign texts. How to increase understanding and interest in learning the language? Translation

    In life or at work, you sometimes have to deal with texts in a foreign language that you are still far from mastering. To read and understand what a text is about (and, in the best case, to learn a few new words), I usually use one of two options: translating the whole text in the browser, or translating each word separately with, for example, ABBYY Lingvo. Both methods have shortcomings. First, the browser translates whole sentences, so the word order can change and the translation can end up more confusing than the original. Second, the browser offers no alternative translations or synonyms, which makes learning new words harder. Alternatives and synonyms can be found by looking up each word in a translator, but that takes time, especially when there are many words. Finally, while reading a text I would like to know which words are the most common in this language, so that I can remember them and use them in my own writing and speaking.

    I thought it would be nice to have such a "translator" on hand, so I decided to write one in Python. If you are interested, read on.

    Word counting

    When writing the program, I followed this logic. First, convert the whole text to lowercase, remove unnecessary characters and symbols (., ?, !, etc.), and count how many times each word appears in the text. Inspired by the code from Google, I did this without the slightest difficulty, but I decided to store the results in a slightly different form, namely {1: [words that occur once], 2: [words that occur twice], ...}. This is useful if you want to sort within each group of words, for example to keep the words in the same order as in the text. As a result, I want a double sort: the most common words come first, and words with the same frequency are ordered as they appear in the source text. This idea is reflected in the following code.
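
    For reference, here is a minimal sketch of the double sort described above, written for Python 3.7 or later. It is an illustration rather than the original listing: the function names count_words, group_by_frequency and sorted_words are assumptions, and splitting on \w+ is just one simple way to drop punctuation (it also splits words at apostrophes and hyphens).

```python
import re
from collections import defaultdict


def count_words(text):
    """Return {word: count}, with keys in order of first appearance."""
    # Lowercase the text and keep only runs of word characters,
    # which discards punctuation such as . ? ! and commas.
    words = re.findall(r"\w+", text.lower())
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def group_by_frequency(counts):
    """Return {frequency: [words]}, each list in source-text order."""
    groups = defaultdict(list)
    # Dicts preserve insertion order (Python 3.7+), so iterating over
    # counts visits words in the order they first occurred in the text.
    for word, n in counts.items():
        groups[n].append(word)
    return dict(groups)


def sorted_words(groups):
    """Most frequent words first; ties keep source-text order."""
    result = []
    for n in sorted(groups, reverse=True):
        result.extend(groups[n])
    return result


if __name__ == "__main__":
    sample = "The cat and the dog. The dog ran!"
    groups = group_by_frequency(count_words(sample))
    print(groups)
    print(sorted_words(groups))
```

    For the sample sentence, the groups are {3: ['the'], 1: ['cat', 'and', 'ran'], 2: ['dog']}, and the final order is ['the', 'dog', 'cat', 'and', 'ran']: "the" comes first as the most frequent word, while "cat", "and" and "ran" keep the order in which they appear in the text.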