
The ORM Foundation

Get the facts!

Translation possible of 'keywords'?

Last post 07-30-2014 23:01 by koneill. 13 replies.
Page 1 of 1 (14 items)
  • 01-26-2013 7:41

    • jacobvos
    • Top 25 Contributor
      Male
    • Joined on 01-21-2013
    • The Netherlands
    • Posts 23

    Translation possible of 'keywords'?

    Hello,

    I am investigating whether I can use NORMA as a tool to define a conceptual data model for a customer. Though we Dutch people do speak some English ;-), I cannot offer an English-language tool to the employees concerned.

    So my question is: can 'keywords' like 'each', 'is a', 'holds' and so on be translated? (This probably isn't easy, because word order can be different and verb forms can vary.)

    Kind regards,
    Jacob Vos

  • 01-26-2013 12:30 In reply to

    Re: Translation possible of 'keywords'?

    Hello Jacob,

    I'll give you the good news first:

    The vast majority of the verbalization text that is not part of your model file can be modified using XML files in your NORMA installation. The basic directions are these:

    1. Open the %PROGRAMFILES%\ORM Solutions\ORM Architect for Visual Studio 20xx\Xml\Verbalization\Core directory.
    2. Copy the _default.xml to NEWFILE.xml.
    3. Edit the first few lines of NEWFILE.xml by changing the xml:lang setting in the Language tag and the name and description attributes in the Snippets tag.
    4. In Visual Studio, go to the ORM Designer tab of the Tools/Options dialog (or whatever it is called in Dutch) and open the dropdown for Alternate Verbalization Text. You should see the name, description, and language you changed in step 3 in the 'Core ORM Verbalization' section. Select it.

    Any item you change in your file will now update the verbalization; this is a live process. Note that you can remove items that you do not want to change. The Xml\Verbalization\Core directory also contains a (generated) HTML file indicating where the different snippets are used. This file is not 100% complete, because some snippets are attached inside the verbalization engine, but this mostly occurs for the join path verbalization in the open source tool.

    At this point, you can change most of the verbalization text.
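Steps 2 and 3 above can also be scripted. The following is a minimal Python sketch, not part of NORMA, assuming the copied file contains an element named `Language` carrying `xml:lang` and an element named `Snippets` carrying `name` and `description` attributes, as step 3 describes. The function name `make_alternate_snippets` is hypothetical. It changes only the first `Language` and `Snippets` element it finds, and re-serializing through ElementTree may change whitespace and quoting, so editing the copy by hand in an XML editor works just as well.

```python
import shutil
import xml.etree.ElementTree as ET

# Key ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    """Return a tag without its '{namespace}' prefix ('' for comments)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def make_alternate_snippets(src, dst, lang, name, description):
    """Copy the snippet file (step 2), then relabel the copy (step 3).

    Assumption: the file has a 'Language' element with xml:lang and a
    'Snippets' element with name/description; only the first of each changes.
    """
    shutil.copyfile(src, dst)
    # Keep XML comments inside the root element when re-serializing (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(dst, parser=parser)
    root = tree.getroot()
    # Write a default namespace back as xmlns="..." instead of ns0: prefixes.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    language = next(e for e in root.iter() if local_name(e.tag) == "Language")
    snippets = next(e for e in root.iter() if local_name(e.tag) == "Snippets")
    language.set(XML_LANG, lang)
    snippets.set("name", name)
    snippets.set("description", description)
    tree.write(dst, encoding="utf-8", xml_declaration=True)
```

For example, `make_alternate_snippets("_default.xml", "Nederlands.xml", "nl-NL", "Nederlands", "Dutch core verbalization")` run from the Core directory produces a file that should then appear in the step 4 dropdown under the new name.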

    Now for the bad news:

    Although you'll be in better shape than you are now, getting correct Dutch is not going to be this easy. This basic replacement mechanism works great for English because we don't have to deal with things like articles that change with grammatical gender. The closest we have in English is the a/an indefinite article, which we don't currently use in verbalization (partly to avoid this issue). Getting the tool to produce grammatically correct Dutch will require more than overriding static snippets. For a full Dutch translation, I need to do the following:

    1. Enable readings and object names to be entered in multiple languages in the same model. Of course, for now, you can just enter the model in Dutch.
    2. Provide localized .NET resource files for the NORMA UI. Note that some of the verbalization text (I think 'is a' for subtypes falls in this category) comes from the resource files, not the verbalization snippets.
    3. Provide additional hooks to call a language-aware extension to provide modified quantifiers based on additional attributes of the object type names.
    4. Write per-language extensions that either add explicit extension properties (similar to the 'IsPersonal' property you can now specify on an object type to change the back reference 'that' to 'who') or hook into existing lexicons (online or local) that can provide this type of language information.

    Of course, every language is going to have slightly different needs, and I need to catalog as many of these as possible to get hooks into the verbalization engine that a 'verbalization language package' can hook into. This is actually on my radar screen because Terry has a PhD student right now who is doing a Malay verbalization, and she has similar issues. I actually speak German, which is sufficiently close to Dutch that any hooks I add for German should be sufficient for a Dutch plugin as well.

    The first thing to do, however, is to see how far you can get by just modifying the snippet files. If you can track which snippets are not sufficient for localized verbalization, then I can add extension points to allow customization of those snippets.

    -Matt

    PS If you're aware of any publicly available lexicon web services for Dutch, German, etc. please share.

  • 01-26-2013 12:48 In reply to

    Re: Translation possible of 'keywords'?

    Hi Jacob,

    Yes, I'm sure that the "keywords" can be translated, but I suspect that keyword translation alone will not solve your problem of creating an object-role model in Dutch. Using ORM is not about "keywords"; it is about stating facts in such a way that each fact stated is an "atomic proposition". NORMA combines the propositions and constraints and shows the logic that the combination represents as verbalized text.

    Here are some alternatives that you might like to consider:

    A: You make the model in English and translate the verbalizer output into Dutch.

    Consider the fact type: Publication is of PublicationType.

    NORMA generates the following verbalization in English:
    Publication is of PublicationType.
    Each Publication is of exactly one PublicationType.
    It is possible that more than one Publication is of the same PublicationType.

    Translate this into Dutch and you get:

    Publicatie is van PublicatieSoort.
    Elke publicatie is van precies een PublicatieSoort.
    Het is mogelijk dat er meer dan een Publicatie van dezelfde PublicatieSoort is.

     =============================================

    B: You enter the fact types in Dutch:

    For example: Publicatie is PublicatieSoort

    Then the verbalizer generates:

    Publicatie is PublicatieSoort.
    Each Publicatie is exactly one PublicatieSoort.
    It is possible that more than one Publicatie is the same PublicatieSoort.

    Which translates into Dutch as:

    Publicatie is PublicatieSoort.
    Elke Publicatie is precies een PublicatieSoort.
    Het is mogelijk dat meer dan een Publicatie hetzelfde PublicatieSoort is.

    ============================

    The only other alternative I can think of is to provide a Dutch language version of NORMA.
    (After making this post, I saw that Matt's response overlapped mine.)

    Ken

  • 01-28-2013 15:16 In reply to

    • jacobvos

    Re: Translation possible of 'keywords'?

    Hi Matt,

    Thank you very much for your information. I made some translations of a small model I created. In Dutch we don't have the a/an problem (it's always 'een'), but of course we have others... For example 'each' => 'ieder' or 'iedere' (solution: 'ieder(e)').

    I couldn't find out how to translate the word 'has' in the verbalization of a reference scheme (e.g. 'Partij has Partij_Nr' => 'Partij heeft Partij_Nr').

    A special case is the verbalization 'Persoon is aanspreekpunt bij Organisatie' ('Person is contact person at Organisation'). NORMA automatically gives: 'Het is mogelijk dat een Persoon is aanspreekpunt bij meer dan één organisatie' ('It is possible that some Person is contact person at more than one Organisation'). However, in Dutch we would say: 'Het is mogelijk dat een persoon aanspreekpunt is bij meer dan één organisatie'. Those are the tough points...

    Another issue is that for 'simple' business people, even a straightforward translation will not do. For example, saying that a consumer is 'an instance' of a customer (subtyping) will raise questions. So there I've chosen 'is a kind of' instead (but then in Dutch ;-)).

    So thank you for your assistance so far!

    -Jacob

  • 01-28-2013 15:19 In reply to

    • jacobvos

    Re: Translation possible of 'keywords'?

    Hello Ken,

    Thank you, and please see my reaction to Matt's post.

    Translating the verbalizer's output is too much work...

    -Jacob

  • 01-28-2013 16:24 In reply to

    Re: Translation possible of 'keywords'?

    Hi Jacob,

    I don't know what you mean by "too much work".

    All you have to do is to cut and paste into Google translate!

    Seems to me that that would be a lot faster than reprogramming NORMA.   

    Ken

  • 01-28-2013 16:30 In reply to

    • jacobvos

    Re: Translation possible of 'keywords'?

    Hello Ken,

    The output of Google Translate will be no better than stating the propositions in the current NORMA program with the snippets translated! So that's the route I'm following now.

    -Jacob

  • 01-28-2013 17:29 In reply to

    • jacobvos

    Re: Translation possible of 'keywords'?

    Hi Matt,

    Another snippet I cannot translate: 'is involved in'.

    -Jacob

  • 01-28-2013 18:33 In reply to

    Re: Translation possible of 'keywords'?

    Hi Jacob,

    The reference mode readings (has/is of) and the readings for link fact types (involves/is involved in) come from resource strings. However, these readings are really just initial values and are editable inside NORMA. To edit the reference mode readings:

    1. Open the verbalization browser and select the reference-mode identified entity type.
    2. Click the 'has' in the Reference Scheme: line. Generally, there will be no shape for this fact type, so you will jump to the ORM Model Browser tool window.
    3. Open the ORM Reading Editor tool window (Ctrl-wr will get you there quickly)
    4. Edit the readings.

    For involves/is involved in you need to use the Implied Fact Type Readings branch in the reading editor tool window. There is one set of readings for each role in an objectified fact type. Select the role, and you'll get the implied fact type readings for that role.

    Alternatively, you can select the fact type in the model browser and expand it to see each of the implied fact types, which can be selected directly. The easiest approach, however, is probably just to open the .orm file using the XML editor and look for ' is involved in ' and ' involves ' (note that the leading and trailing spaces are part of the reading). To open it as an XML file, select the file in Visual Studio (I turn on Tools/Options/Document/Show miscellaneous Files in Solution Explorer so I can easily select my recent files), right-click, and choose 'Open With'. In the File/Open dialog, use the little down arrow at the edge of the OK button to select 'Open With'.
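Before editing, it can help to list where those readings occur. This is a minimal Python sketch, assuming the .orm file is UTF-8 text with each reading on a single line; `find_link_readings` is a hypothetical helper, not a NORMA feature. It only reports line numbers and leaves the edit to the XML editor, because a blind search-and-replace could also change readings you entered yourself that happen to contain the same words.

```python
from pathlib import Path

# Default link fact type readings quoted above. The surrounding spaces are
# part of the reading text, so they are kept in the search phrases.
LINK_READINGS = (" is involved in ", " involves ")


def find_link_readings(orm_path, phrases=LINK_READINGS):
    """Return (line number, phrase) pairs for each line containing a phrase."""
    # 'utf-8-sig' also accepts a file that starts with a byte-order mark.
    text = Path(orm_path).read_text(encoding="utf-8-sig")
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for phrase in phrases:
            if phrase in line:
                hits.append((line_no, phrase))
    return hits
```

Calling `find_link_readings("Model.orm")` on your model returns the line numbers to jump to in the XML editor.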

    -Matt

  • 01-28-2013 19:19 In reply to

    Re: Translation possible of 'keywords'?

    Hi Jacob,

    The ieder/iedere situation should be relatively straightforward to fix with a good dictionary and a language extension (the core verbalizer provides an object type to quantify and a proposed snippet; the extension gives back an alternate snippet).
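A minimal Python sketch of the kind of hook described here, assuming a hypothetical interface in which the extension receives the object type name and the proposed snippet and returns a replacement. The small `ARTICLES` table stands in for a real dictionary, and `adjust_quantifier` is an illustrative name, not part of NORMA. It uses the Dutch rule that 'ieder'/'elk' go with het-words and 'iedere'/'elke' with de-words, and falls back to the neutral 'ieder(e)' when the noun is unknown.

```python
# Hypothetical lexicon: noun -> definite article ("de" or "het").
# A real extension would consult a full dictionary or lexicon service.
ARTICLES = {
    "persoon": "de",
    "organisatie": "de",
    "partij": "de",
    "kind": "het",
    "bedrijf": "het",
}

# Neutral snippet text -> the form to use with de-words and het-words.
QUANTIFIER_FORMS = {
    "ieder(e)": {"de": "iedere", "het": "ieder"},
    "elk(e)": {"de": "elke", "het": "elk"},
}


def adjust_quantifier(object_type_name, proposed_snippet):
    """Return a gender-correct quantifier, or the proposed snippet if unsure."""
    forms = QUANTIFIER_FORMS.get(proposed_snippet)
    article = ARTICLES.get(object_type_name.lower())
    if forms is None or article is None:
        return proposed_snippet
    return forms[article]
```

For example, `adjust_quantifier("Persoon", "ieder(e)")` gives 'iedere' and `adjust_quantifier("Bedrijf", "ieder(e)")` gives 'ieder'.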

    The grammatical reordering of the sentence would be much harder to do, because we would need not only a dictionary but a pretty good grammar engine to pull it off. In this case, moving 'aanspreekpunt' before 'is' is also done in German, but describing when the verb is moved is far from trivial: it involves recognizing not only the verb but also the grammatical pattern in which it is used. So we need either a grammar engine to perform this analysis (on a per-language basis), or a means of classifying the grammatical situations where the phrases can be used and letting the user provide alternate reading forms for these occurrences, plus additional metadata on the snippets to indicate the preferred reading form to use in a given location. Again, all of this data is extremely language specific.
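The second approach (user-supplied alternate reading forms plus snippet metadata, with no grammar engine) can be sketched in a few lines of Python. Everything here is hypothetical: `Reading`, `Snippet`, `verbalize` and the "main"/"subordinate" labels are illustrative names, not NORMA's object model, and the Dutch strings come from Jacob's example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    main: str                          # main-clause order, e.g. "{0} is aanspreekpunt bij {1}"
    subordinate: Optional[str] = None  # user-supplied subordinate-clause order, if any

    def form(self, clause):
        """Pick the form a snippet position asks for, falling back to main order."""
        if clause == "subordinate" and self.subordinate is not None:
            return self.subordinate
        return self.main


@dataclass
class Snippet:
    template: str  # "{reading}" marks where the filled-in reading goes
    clause: str    # metadata: "main" or "subordinate"


def verbalize(snippet, reading, *role_texts):
    """Fill the preferred reading form with role texts, then place it in the snippet."""
    sentence = reading.form(snippet.clause).format(*role_texts)
    return snippet.template.format(reading=sentence)
```

With the subordinate form supplied, a "Het is mogelijk dat {reading}." snippet marked "subordinate" yields "Het is mogelijk dat een Persoon aanspreekpunt is bij meer dan één Organisatie."; a reading without that form falls back to main-clause order, which reproduces the current behaviour Jacob reported.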

    The more of these issues we can get enumerated for different languages the easier it will be to put together a localizable verbalization engine. It looks like you're making good initial progress.

    -Matt

  • 01-29-2013 16:53 In reply to

    • jacobvos

    Re: Translation possible of 'keywords'?

    Hi Matt,

    Thank you, it's clear now how to translate things like 'has' and 'is involved in'. It's a pity that those strings cannot be translated like the other snippets, but I guess there is a reason for this.

    Jacob

  • 01-29-2013 16:54 In reply to

    • jacobvos

    Re: Translation possible of 'keywords'?

    Matt,

    Thank you also for this answer. I must say that in using NORMA this isn't the biggest issue for me. More important is reporting... I will open another post for it because it's another subject.

    Jacob

  • 04-25-2014 12:25 In reply to

    • jacobvos

    Re: Translation possible of 'keywords'?

     Hi Matt,

    Some time passed since my latest post...

    Currently I am translating the core verbalisation into Dutch. I have finished translating the words in the XML file, and now I am modelling some things to check the translation.

    I was wondering: do you have a model in which every kind of thing or pattern that can be modelled is actually modelled?

    Kind regards,

    Jacob

  • 07-30-2014 23:01 In reply to

    • koneill
    • Top 25 Contributor
      Male
    • Joined on 02-17-2012
    • The Netherlands
    • Posts 27

    Re: Translation possible of 'keywords'?

     Hi Jacob,

     If you want, I don't mind helping with some testing, as I'm fluent in Dutch.

     Regards,

    Karl

© 2008-2014 The ORM Foundation: A UK not-for-profit organisation