MemoQ inserts space before/after every tag, wrong date/time format. How to fix/run a fix?
Thread poster: Mary McKee

Mary McKee
United States
Spanish to English
May 27

I'm using MemoQ 8.3.8 to post edit .mqxliff files that a client has machine translated and sent to me. I'm having some issues with the files and wondered if you could help me figure out how to batch fix them to save a lot of time:

- Every single segment with a tag in the source has inserted a space before/after each tag in the MT output, which I must go through and manually delete. Every. Single. Tag.
- Every date is in the wrong date format (should be in format 26 May 2020, not May 20, 2020)
- Every time is in the wrong time format (client has a preferential time format that is nonstandard)

These are likely caused by settings on their end that I cannot change. I have requested that the client change their settings, but of course I'm just one linguist, and they have not replied or made any changes to the documents. I'm wasting so much time on these little fixes that I wonder whether I can fix it myself, or at least run some kind of regex on the files when I receive them to fix all the errors at once.

I have never ever used RegEx so I'm not even sure how I would do this. But there has GOT to be a way. I'm spending multiple hours every day fixing these tiny things that the computer should be able to do on its own.

Please help!


 

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Mary May 27

Mary McKee wrote:
- Every single segment with a tag in the source has inserted a space before/after each tag in the MT output, which I must go through and manually delete. Every. Single. Tag.


This sounds like something Google Translate does (and perhaps certain other machine translators, too). Google Translate doesn't understand what tags are, and so treats them like words, and words have spaces. I know this information doesn't help you.

I'm not sure if there is a setting in MemoQ that can improve this or not. I've never used MT from inside MemoQ -- does MemoQ normally have this problem when it uses MT?

It may be quicker to edit the text in MS Word. Try experimenting with the various options of bilingual review. Right-click the file, Export > Export Bilingual > Table RTF.

Are you allowed to machine translate the text yourself, and then edit that? Or must you use the client's machine translated machine translation?


[Edited at 2020-05-27 20:48 GMT]



Mary McKee
United States
Spanish to English
TOPIC STARTER
unfortunately no... May 27

"Are you allowed to machine translate the text yourself, and then edit that? Or must you use the client's machine translated machine translation?"

Thanks for weighing in. Unfortunately I'm not allowed to do my own MT :'( It would be so much faster if so. I don't know what tool they're using for the MT, they could just be using another MT tool and then requiring me to use MemoQ? I've also not used the MemoQ MT for my own purposes.

I like the idea of using Word, except that I have a TM of at least 16,000 segments that I can use to help speed up SOME of this work, and I wouldn't be able to have things auto-populate through the documents if I did it in Word.

I wish I could figure out how to run a command like:

check if source has spaces around tag
if spaces are mismatched in target, follow source spacing
run

If only I were a computer programming whiz :/
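
For anyone comfortable running a script outside MemoQ, the "follow source spacing" idea above could be sketched in Python roughly as follows. This is only an illustration under several assumptions: tags are taken to appear as literal text such as <b> or <t0/>, the source and target segments are assumed to be available as plain strings, and the function names tag_spacing and follow_source_spacing are made up for this example. It does not read or write .mqxliff files.

```python
import re

# Assumption: a tag is any run of text between "<" and ">", e.g. <b> or <t0/>.
TAG = re.compile(r"<[^<>]+>")

def tag_spacing(text):
    """Record, for each tag, whether it has whitespace before and after it."""
    spacing = {}
    for m in TAG.finditer(text):
        before = m.start() > 0 and text[m.start() - 1].isspace()
        after = m.end() < len(text) and text[m.end()].isspace()
        # If a tag occurs more than once, keep the first occurrence only.
        spacing.setdefault(m.group(), (before, after))
    return spacing

def follow_source_spacing(source, target):
    """Rewrite the spacing around each target tag to match the source."""
    wanted = tag_spacing(source)

    def fix(m):
        tag = m.group(2)
        if tag not in wanted:
            return m.group(0)  # tag is not in the source: leave it alone
        before, after = wanted[tag]
        return (" " if before else "") + tag + (" " if after else "")

    return re.sub(r"(\s*)(<[^<>]+>)(\s*)", fix, target)

source = "Haga clic en <b>Guardar</b> ahora."
target = "Click <b> Save </b> now."
print(follow_source_spacing(source, target))
```

With the sample segments above, the stray spaces inside the bold tags are removed while the spaces outside them are kept, because that is how the source is spaced.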


 

Stepan Konev
Russian Federation
English to Russian
Sad but true May 28

MT is not supposed to process tags, as Samuel mentioned above. By accepting MTPE jobs, you agree to take on all those issues.
I prefer to remove all tags (Ctrl+F8) before MTing the text. I think pressing one button to insert a tag is better than pressing several buttons to remove leading and trailing spaces or even move tags within the sentence.

As regards replacing, you can use the following regex:

Find what: (January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{1,2}),\s(\d{4})
Replace with: $2 $1 $3
*Don't forget to enable the regex mode

This will change 'May 20, 2020' to '20 May 2020'.
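
To try the same pattern outside MemoQ before running it on real segments, a minimal Python sketch could look like this. The pattern is the one given above; the function name to_day_month_year is hypothetical, and note that Python's re module writes the back-references as \2 \1 \3 rather than $2 $1 $3.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Same pattern as above: month name, day, comma, four-digit year.
DATE_US = re.compile(r"(" + MONTHS + r")\s(\d{1,2}),\s(\d{4})")

def to_day_month_year(text):
    """Rewrite 'May 20, 2020' style dates as '20 May 2020'."""
    return DATE_US.sub(r"\2 \1 \3", text)

print(to_day_month_year("Signed on May 20, 2020 in Madrid."))
```

Dates with abbreviated month names (e.g. "Sept. 3, 2020") are not matched by this pattern and would be left as they are.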


 

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@Mary May 28

Stepan Konev wrote:
MT is not supposed to process tags as Samuel mentioned above. ... I prefer to remove all tags (Ctrl+F8) before MTing the text.


Some CAT tools do take steps to ensure that the spacing around tags is correct. I myself, when I machine translate text in Word, use a macro that adds dummy characters to spaces, which then confuses Google Translate in a more predictable way, and then a macro just removes them (and unnecessary spaces) afterwards. I know that OmegaT actually compares source and target before and after sending content to the machine translator, to ensure that the spaces are dealt with before presenting the text to the translator. I was sure most other CAT tools do it, too. This is why I thought that perhaps Mary's client did not use MemoQ itself for the machine translation (i.e. they exported the file to some other format, translated that file externally, and then imported the "translation" back into MemoQ).

Mary McKee wrote:
I like the idea to use Word, except that I have at least a 16,000 segment TM that I can use...


I did not mean that you should do the translation in Word, but rather that you should export some (or all) of the yet-to-edit segments to Word so that you can remove all the spaces next to tags using a few find/replace operations, and then import them back into MemoQ. This will remove *all* spaces, but... inserting spaces is a lot quicker than deleting spaces, don't you agree?

Are you aware that you can move your cursor faster using Ctrl+arrow? Fortunately, in MemoQ, using Ctrl+arrow always gets the cursor to the start of an object, or to after a space, predictably (not all editors do that predictably). Out of interest, when you want to remove spaces manually, do you move your cursor using the keyboard keys, or do you use the mouse to click in the right places?



[Edited at 2020-05-28 08:25 GMT]


 

Stepan Konev
Russian Federation
English to Russian
Regex to remove all leading and trailing spaces around tags May 28

Samuel Murray wrote:
OmegaT actually compares source and target before and after sending content to the machine translator, to ensure that the spaces are dealt with before presenting the text to the translator.

Right. Because tags are presented as plain text in OmegaT. <t0/> etc.

@Mary McKee:
You can use this regex to remove all leading and trailing spaces around tags:

Find what: (\s*)(<.*?>)(\s*)
Replace with: $2

This will remove all whitespace before and after any tag. If you still need whitespace around some specific tags, don't use 'Replace all'.
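
To see what this pattern does before applying it to a whole document, it can be tried on a sample string in Python. This is just a sketch: it assumes the tags appear as literal <...> text, and strip_tag_spaces is a made-up name.

```python
import re

# Same pattern as above: optional whitespace, a tag, optional whitespace.
AROUND_TAG = re.compile(r"(\s*)(<.*?>)(\s*)")

def strip_tag_spaces(text):
    """Keep only the tag itself (group 2), dropping whitespace on both sides."""
    return AROUND_TAG.sub(r"\2", text)

print(strip_tag_spaces("Click <b> Save </b> now."))
```

On this sample the result is "Click<b>Save</b>now.", which shows the caveat in practice: the spaces that were wanted (before <b> and after </b>) are removed along with the unwanted ones.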

[Edited at 2020-05-28 09:11 GMT]


 

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Regex May 28

Stepan Konev wrote:
Find what: (\s*)(<.*?>)(\s*)
Replace with: $2


FWIW, "*" means "match zero or more" and "+" means "match 1 or more", so for me the regex only works if I change \s* to \s+ (although what you use will affect what happens, obviously). The "?" is supposed to make the expression lazy, but it doesn't prevent the regex from selecting multiple tags next to each other (although this isn't a problem because we retain $2 anyway).

Mary, I'm not sure how it works in your version of MemoQ, but in my case, I have to press Ctrl+H twice to bring up the "advanced" Find/Replace dialog, and select the option "Search within tags as well". Thanks, Stepan, I did not know that one could search tags or search within tags.


 

Stepan Konev
Russian Federation
English to Russian
Bingo May 28

Samuel Murray wrote:
FWIW, "*" means "match zero or more"

This is exactly what we need to process all tags that may have or may not have a whitespace character before, or after, or before and after them. If you put + instead of *, some tags may remain with spaces.
According to my understanding, Mary wants all spaces around tags gone.
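
The difference between the two quantifiers can be checked on a tag that has a space on one side only. The Python sketch below is an illustration with a made-up sample string and a hypothetical helper strip_with; behaviour inside MemoQ's own Find/Replace dialog may differ from Python's re module.

```python
import re

STAR = r"(\s*)(<.*?>)(\s*)"   # zero or more whitespace on each side
PLUS = r"(\s+)(<.*?>)(\s+)"   # at least one whitespace on BOTH sides

def strip_with(pattern, text):
    """Replace each match with the tag alone (group 2)."""
    return re.sub(pattern, r"\2", text)

one_sided = "Total:<b> 42"    # space after the tag, none before it

print(strip_with(STAR, one_sided))  # space after <b> is removed
print(strip_with(PLUS, one_sided))  # no match, so the text is unchanged
```

With \s* the one-sided space is removed; with \s+ on both sides the tag is not matched at all, so its space stays.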

[Edited at 2020-05-28 09:45 GMT]


 

