MEH with Other Languages

MEH can now handle virtually any language that your computer can properly display and interpret. To analyze texts in a language that is not listed in the Language dropdown menu, you only need to complete two steps.

1.   Using the Select Text Encoding dropdown menu (located under the Text Handling Options menu), choose the proper encoding for your text files. For example, if you want to analyze Arabic texts, you will likely want to use the UTF-8 or UTF-16 encoding option. If the language that you want to analyze is the same as your computer’s default language, it is likely that the correct encoding is already selected by default.

2.   Return to the Text Handling Options menu. In the Language dropdown menu, select Other Language (English is always selected by default). Make sure that the Use LemmaGen Lemmatization checkbox is unchecked.
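
If you are unsure which encoding your files use, you can check them outside of MEH before choosing an option in step 1. The following is a minimal Python sketch, not part of MEH itself; the function name and the candidate list are assumptions made for illustration. It reports which candidate encodings can decode a file's bytes without error. A clean decode is necessary but not sufficient: some encodings will accept bytes that were written in a different encoding and produce garbled text, so always look at the decoded text as well.

```python
from pathlib import Path

# Candidate encodings to try (illustrative; add the ones relevant to your language).
CANDIDATES = ["utf-8", "utf-16"]


def encodings_that_decode(raw, candidates=CANDIDATES):
    """Return the candidate encodings under which `raw` (bytes) decodes without error."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent these bytes
        ok.append(name)
    return ok


def check_folder(folder, pattern="*.txt"):
    """Map each matching file name in `folder` to the encodings that decode it."""
    return {
        path.name: encodings_that_decode(path.read_bytes())
        for path in sorted(Path(folder).glob(pattern))
    }
```

For example, calling `check_folder("my_texts")` (a hypothetical folder name) returns a dictionary you can scan for files that fail under the encoding you planned to select.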

That’s it! At the current time, MEH does not include default stop lists or default conversions for most of the world's languages (there are a lot of them!). Additionally, automatic lemmatization is not available for languages that are not included in the Language menu. For any such language, you will need to create your own stop list and your own conversions (conversions can also be used to lemmatize manually). I recommend ranks.nl as a good starting point for stop lists.
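
If no ready-made stop list exists for your language, one rough way to draft candidates is to list the most frequent words in your own texts and then prune that list by hand. The sketch below is an illustration only and is not how MEH builds its default lists; the function name and the simple `\w+` tokenization are assumptions, and that tokenization will not suit every script (for example, languages written without spaces between words).

```python
import re
from collections import Counter


def draft_stop_list(texts, top_n=50):
    """Return the `top_n` most frequent word tokens across `texts`, most frequent first."""
    counts = Counter()
    for text in texts:
        # In Python 3, \w matches Unicode letters and digits (plus underscore),
        # so this works for many non-English scripts.
        counts.update(re.findall(r"\w+", text.lower()))
    return [word for word, _ in counts.most_common(top_n)]
```

Treat the result as a list of candidates rather than a finished stop list: frequent words that carry meaning in your corpus should be removed before use, and you should check the stop list format that MEH expects before saving it.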

If you would like me to add a default stop list and a default conversion list for a specific language, please send me an e-mail.