Improving language support

Posted: Sat Nov 30, 2013 12:27 am
by rock5
I'm thinking of making some changes that will improve support for languages that include special characters such as umlauts. My goal is that you will be able to use special characters in profiles and waypoint files in particular without having to use codes.

I'm asking for feedback as to what you typically need to do to make things work in your language. For example, you might need to use character codes with the INV_AUTOSELL_TYPES option in your profile. I also want to know what changes you need to make to existing waypoint files you have downloaded that are supposedly multilingual but don't work until you change them.

Re: Improving language support

Posted: Mon Dec 02, 2013 1:45 am
by Eggman1414
Interesting idea. It would be pretty cool, if only to have a little bit more customization.

Re: Improving language support

Posted: Mon Dec 02, 2013 2:28 am
by rock5
My current idea that I'm exploring is to make the bot use all utf8 everywhere and just convert strings for printing. That would involve hooking into the print functions to convert before printing. Even though I believe this is a good idea and will make it the most compatible, it could still cause problems. I think whatever I decide to do will cause some initial problems.
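A minimal sketch of the "utf8 everywhere, convert only for printing" idea. It is written in Python rather than the bot's Lua purely for illustration, and everything named here is an assumption: the console code page is taken to be cp850, and `to_console_bytes` and `make_console_print` are hypothetical helpers, not existing bot functions.

```python
CONSOLE_CODEPAGE = "cp850"  # assumption: the code page the console window uses


def to_console_bytes(text, codepage=CONSOLE_CODEPAGE):
    """Encode internal text for the console; unmappable characters become '?'."""
    return text.encode(codepage, errors="replace")


def make_console_print(write_bytes, codepage=CONSOLE_CODEPAGE):
    """Return a print-like function that converts only at the output boundary."""
    def console_print(*args):
        line = " ".join(str(a) for a in args) + "\n"
        write_bytes(to_console_bytes(line, codepage))
    return console_print


# Usage: capture the bytes instead of writing to a real console.
captured = []
console_print = make_console_print(captured.append)
console_print("Jäger", 3)
```

The point of the wrapper is that the rest of the program never sees console bytes; only the last step before output does the conversion.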

Re: Improving language support

Posted: Wed Dec 04, 2013 7:51 am
by rock5
Aw, no feedback. :cry:

Ok I'll make my own list of situations that need to be considered. Don't blame me if I miss something.

In Profiles
  • INV_AUTOSELL_IGNORE
  • INV_AUTOSELL_TYPES and INV_AUTOSELL_TYPES_NOSELL
  • <friends> and <mobs>
  • PARTY_FOLLOW_NAME

Function arguments
  • RoMScript and variations
  • QuestByName functions
  • ChoiceOptionByName
  • getQuestStatus
  • inventory:findItem
  • pawn:hasBuff
  • player:findNearestNameOrId

Commandline.xml

Code snippets
  • if target.Name == "name with umlauts" then
  • if target.Name == GetIdName(id of obj with umlauts in name) then

I might have missed something but that will do for now.

The next step is to see what sort of strings each of those situations expects. For the ones that expect ascii umlauts, I need to come up with a plan to fix them while, hopefully, keeping backward compatibility.

Re: Improving language support

Posted: Wed Dec 04, 2013 10:00 am
by spyfromsiochain
Well I don't want u to just speak alone rock, but I don't think I can help u, I have english client, no umlauts here <3

Re: Improving language support

Posted: Wed Dec 04, 2013 10:06 am
by rock5
There have been a few times when I've explained things very carefully, mainly to help myself clarify what I'm doing; this is one of them. So even if no one helps, these posts are serving a purpose.

Re: Improving language support

Posted: Wed Dec 04, 2013 10:33 am
by spyfromsiochain
Agree.

Well, if there's anything I can help with, shout!

Re: Improving language support

Posted: Thu Dec 05, 2013 9:18 am
by rock5
Now, I'll list the ones that expect ascii characters and will cause problems.

<friends> and <mobs>
  • The friends and mobs lists expect ascii character codes at the moment. Seeing as these are loaded only when the profile is loaded, I can easily convert them to utf8 at load time. That would support old profiles that still use ascii codes as well as newer ones that use utf8 codes or characters.
PARTY_FOLLOW_NAME
  • Also expects ascii codes. The easy option again would be to just convert it to utf8 when loaded for backward compatibility.
player:findNearestNameOrId
  • This one's a bit more trouble. I could do a convert on any names used with it, but it's a heavily used function, so a convert could slow the bot a bit. Although probably not too much.
Commandline.xml
  • On English PCs, whatever command you try on the commandline can be copied to a file and used there. But if you use strings with umlauts, you won't get the same results at the commandline as you would in a file. That's because the characters you type are ascii. I should be able to convert the whole command to utf8 before executing it. Funnily, if the command is a print statement it will be converted back to ascii for printing. Can't be helped but shouldn't matter as long as the convert functions are fast enough.
if target.Name == "name with umlauts" then
  • Pawns and objects have always had their names converted to ascii. So if you had some code like this, you would have had to use ascii codes for any umlauts in the name. Unfortunately there is nothing I can do about this. If you have such code you will have to change the name to use utf8 characters.
if target.Name == GetIdName(id of obj with umlauts in name) then
  • I believe this never worked, as shown by the Spearmen in Yolius' Haunted minigame. I believe users have had to change it to use a string in their language (using ascii codes) to make it work. After these changes it will work. Which of course means those users who changed it will have to put it back the way it was to get it to work again.
createpath 'Add code' option
  • Should be able to take the same steps as for commandline.
So, after I've modified the bot to use only utf8 and fixed the issues above, the only time it needs to convert characters is to print them. So the last step is to add a convert to the print functions.
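A rough sketch of the backward-compatible load-time conversion described above for <friends>, <mobs> and PARTY_FOLLOW_NAME. It is Python rather than the bot's Lua, for illustration only. The legacy code page (cp850) and the helper name `normalize_name` are assumptions, not part of the bot.

```python
LEGACY_CODEPAGE = "cp850"  # assumption: what the old "ascii codes" refer to


def normalize_name(raw: bytes, legacy: str = LEGACY_CODEPAGE) -> str:
    """Accept a name given either as utf8 bytes or as legacy code-page bytes.

    Valid utf8 is taken as-is; anything else is treated as legacy bytes.
    This is a heuristic: a few legacy byte pairs also happen to be valid utf8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(legacy)


# An old profile entry written with a slash code, like "J\132ger" in Lua.
old_style = b"J\x84ger"
# A new profile entry typed directly and saved as utf8.
new_style = "Jäger".encode("utf-8")

# Both forms end up as the same name, so old and new profiles compare equal.
same = normalize_name(old_style) == normalize_name(new_style)
```

Because the conversion happens once when the profile loads, it adds nothing to the cost of later name comparisons.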

Re: Improving language support

Posted: Thu Dec 05, 2013 10:14 pm
by Bill D Cat
Probably the easiest update in all this will be getting createpath.lua to output in UTF-8 format. Though the input for Numpad-0 text will also have to be processed correctly. But generally, since the vast majority of the text that it creates is ASCII, there shouldn't be a huge impact on the performance. Once getTEXT() and GetIdName() fully support UTF-8, then those are the major steps conquered in getting it updated.

After that, it is just a matter of nudging people along in the right direction to use a text editor that supports it. Though I think Notepad, Wordpad and Notepad++ all handle it pretty well in Windows Vista and newer.

Re: Improving language support

Posted: Fri Dec 06, 2013 12:15 am
by rock5
Actually I didn't think of createpath. All the automated stuff should be alright because it will just leave all the original strings in utf8. The only thing we have to worry about is any strings entered in MM. I think that's only the 'Add Code' option. So, like commandline, we just have to convert anything typed to utf8 before saving it. The conversion functions are turning out to be a bit of a pain but I'm still working on it.

Re: Improving language support

Posted: Tue Dec 17, 2013 12:17 am
by Bill D Cat
I was thinking about some other issues that may or may not cause problems with the language conversion.

Would saving a file that was previously UTF-8 encoded as an ASCII file cause the bot to choke on it?

What I am getting at is that this conversion would not be a one-and-done thing for profiles, waypoints and userfunctions. Would the load routines have to know what the saved format of the file was when it is opened, so that they only convert to UTF-8 as needed?

I just don't know if every user would save an edited file in UTF-8 format every time, so some type of sanity check might need to be done any time the file was accessed. I guess I am just not all that familiar with the exact differences between them as far as opening and reading the files. I understand the 4-byte encoding method, but not how to initially tell if a file was saved in one format or the other.
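One way such a sanity check could look, sketched in Python for illustration (the bot itself is Lua). The function name `classify_encoding` and the four labels are made up for this example. It only inspects raw bytes: a BOM, the absence of bytes above 127, or whether the bytes form valid UTF-8.

```python
import codecs


def classify_encoding(raw: bytes) -> str:
    """Guess how a file was saved by looking only at its bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf8-bom"   # starts with the 3-byte marker EF BB BF
    if all(b < 128 for b in raw):
        return "ascii"      # identical bytes whether saved as ASCII or UTF-8
    try:
        raw.decode("utf-8")
        return "utf8"       # high bytes present and they form valid UTF-8
    except UnicodeDecodeError:
        return "legacy"     # high bytes that are not valid UTF-8


examples = {
    "plain": classify_encoding(b"<waypoints>"),
    "utf8": classify_encoding("Jäger".encode("utf-8")),
    "legacy": classify_encoding(b"J\x84ger"),
    "bom": classify_encoding(codecs.BOM_UTF8 + b"<waypoints>"),
}
```

A load routine could then convert only the files reported as "legacy" and leave the rest alone.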

Re: Improving language support

Posted: Tue Dec 17, 2013 12:57 am
by rock5
First of all any file that doesn't have any bytes above 127 will save the same if saved as ascii or utf8 without BOM (I don't think MM can handle files saved as utf8 with bom). Any file that already has bytes higher than 127 should open as utf8 regardless of how it was saved (in a code editor such as notepad++). The same goes for files with no bytes above 127. They will be opened as ascii regardless of how they were saved (excluding xml files, see note2 below).

When a file is open, switching between utf8 and ascii won't change the file. It only changes the way it displays the characters. utf8 characters will change to their individual bytes. You can even open and save a file with utf8 characters in notepad, which only supports ascii. It will just show the individual utf8 bytes. The only thing to be considered when dealing with utf8 vs ascii is when you type a special character.

When in utf8 mode, if you type a special character, eg. ä (Alt 132), that is a utf8 character and is saved as 2 utf8 bytes in the file. If you are in ascii mode and type ä, that is an ascii character which is not supported by MM because MM's ascii codes are different from Windows ascii codes. This is because Windows uses 2 character sets, one for Windows apps and one for console apps such as cmd.exe and MM.

So it has always been the case that if users wanted to use actual special characters they had to save as utf8 without bom. This hasn't changed.

Note: if you use slash codes, eg. \132, this is unaffected by what encoding you save as.
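To make the byte values concrete, here is a small Python illustration (not bot code). It assumes the console character set is cp850 and the Windows-app character set is cp1252, which is typical for Western European systems but may differ on yours.

```python
char = "ä"

utf8_bytes = char.encode("utf-8")     # two bytes: 0xC3 0xA4
console_byte = char.encode("cp850")   # one byte: 0x84 (decimal 132, as in Alt 132)
windows_byte = char.encode("cp1252")  # one byte: 0xE4 (decimal 228)

# The slash code \132 is just the single byte 132, whatever encoding the
# file is saved in, and it reads as ä in the console character set.
from_slash_code = bytes([132]).decode("cp850")
```

The same letter therefore has three different byte representations, which is why a character typed in one mode does not match one typed in another.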

Note2: profiles and waypoint files are xml files and should always start with

<?xml version="1.0" encoding="utf-8"?>
This causes editors such as Notepad++ to open and save the file as utf8 automatically, regardless of whether there are bytes higher than 127 or not.

Hope that helps.