(Part of) the IATE database can now be downloaded as a massive TBX! Thread poster: Michael Beijer
Samuel Murray (Netherlands) | What Paul doesn't seem to do | Jul 14, 2014
He mentions an idea that he doesn't seem to act on himself: removing the languages you don't want before importing the file, to reduce the file size.

Michael Beijer (United Kingdom), topic starter | Interesting developments. | Jul 14, 2014
Email to Xbench support:

"Hello,

I have been trying to import the IATE database, which can now be downloaded as a TBX from the IATE site, and noticed that Xbench isn’t importing the file properly. It is missing a lot of metadata and doesn't pick up at all how the synonyms are related. Many people are currently discussing this issue, e.g., here:

• http://multifarious.filkin.com/2014/07/13/what-a-whopper/
• http://www.proz.com/forum/translator_resources/271879-part_of_the_iate_database_can_now_be_downloaded_as_a_massive_tbx.html
• https://groups.google.com/forum/?fromgroups=#!topic/cafetranslators/WEoiqacrpo0
• https://www.youtube.com/watch?v=xDv-y0p0NXs&feature=youtu.be

I think it would be great if Xbench were changed so that it could correctly import this file, and files of its kind. This would be a ‘unique selling point’ for Xbench, as there is currently no other (single) program that can do this.

Michael"

They just answered:

"Hi Michael,

Thank you for your email. We published a new build of Xbench 3.0, which handles synonyms correctly. The 64-bit version is required to load the huge IATE .tbx file. Download and install Xbench 3.0 build 1243 (64 bits). It is available at http://www.xbench.net/index.php/download.

It takes some time for Xbench to look through the file and show the languages available in the IATE .tbx file in the "Select Languages" window. I have attached a screenshot of one term (EN > ES), with synonyms and metadata. If some metadata is missing, please send us an example so that we can reproduce this issue.

Regards,
Oscar Martin
The Xbench Team"

Haven’t tried it yet as I am away from the office, but this looks promising!
[Edited at 2014-07-14 18:18 GMT]

Giovanni Guarnieri MITI, MIL | that's what I used... | Jul 14, 2014
Michael Beijer wrote: We published a new build of Xbench 3.0, which handles synonyms correctly. The 64-bit version is required to load the huge IATE .tbx file.

As I said, when I imported the TMX file into a Studio memory, only half of the entries were imported... the rest were discarded as "errors"... unfortunately, I couldn't see what these errors were about...
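A TBX concept entry can list several synonyms per language. A TMX translation unit, or a line of tabbed text, pairs one source term with one target term. That is the mismatch Michael points to further down. Below is a minimal Python sketch of flattening one entry into pairs. It assumes the common TBX nesting termEntry > langSet > tig > term. The thread doesn't show IATE's actual markup, so check this against the file. The sample entry and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

# xml:lang lives in the reserved XML namespace; ElementTree exposes it
# under this expanded attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def terms_by_language(entry):
    """Collect the term strings of one <termEntry>, grouped by language.

    Assumes termEntry > langSet > tig > term (hypothetical for IATE;
    adjust the path if the file uses ntig or another structure).
    """
    result = {}
    for lang_set in entry.findall("langSet"):
        lang = lang_set.get(XML_LANG)
        terms = [t.text for t in lang_set.findall("tig/term") if t.text]
        result.setdefault(lang, []).extend(terms)
    return result


def entry_to_pairs(entry, src, tgt):
    """Flatten one concept entry into (source, target) pairs.

    Every source synonym is paired with every target synonym, so the fact
    that they all describe one concept is no longer recorded anywhere.
    """
    terms = terms_by_language(entry)
    return list(product(terms.get(src, []), terms.get(tgt, [])))


# Hypothetical sample entry: 2 English and 3 Dutch synonyms.
SAMPLE = """<termEntry id="hypothetical-1">
  <langSet xml:lang="en">
    <tig><term>car</term></tig>
    <tig><term>motor vehicle</term></tig>
  </langSet>
  <langSet xml:lang="nl">
    <tig><term>auto</term></tig>
    <tig><term>wagen</term></tig>
    <tig><term>motorvoertuig</term></tig>
  </langSet>
</termEntry>"""

if __name__ == "__main__":
    entry = ET.fromstring(SAMPLE)
    for en, nl in entry_to_pairs(entry, "en", "nl"):
        print(f"{en}\t{nl}")  # one tab-delimited line per pair
```

In this sample, one concept becomes six pairs. Entry counts before and after such a conversion therefore aren't directly comparable.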
Michael Beijer wrote: Thank you for your email. We published a new build of Xbench 3.0, which handles synonyms correctly. The 64-bit version is required to load the huge IATE .tbx file.

That's a pretty impressive turnaround time from the Xbench team. I have only experimented with it a little myself, just enough to see that it will be easy to process.

Michael Beijer (United Kingdom), topic starter | TMX format causing the loss of entries? | Jul 14, 2014
Giovanni Guarnieri MITI, MIL wrote: As I said, when I imported the TMX file into a Studio memory, only half of the entries were imported... the rest were discarded as "errors"... unfortunately, I couldn't see what these errors were about...

Hi Giovanni,

I'm not sure, as I haven't tested the latest build of Xbench yet. However, I have a feeling that you lost the entries because of the TMX format, which wasn't designed to handle synonyms. What happens if you export to tabbed text?

Michael

SDL Support | Yes he does ;-) | Jul 14, 2014
Samuel Murray wrote: He mentions an idea that he doesn't seem to act on himself: removing the languages you don't want before importing the file, to reduce the file size.

Hi Samuel,

The process in the article walks through extracting only FIGS (French, Italian, German, Spanish) + English. Not sure how you missed that?

Regards,
Paul
I converted Michael's nl-en txt file (thank you, Michael; see above, July 11) to a TMX file and ended up with 374,724 entries out of the 401,625 entries Michael mentioned. Loading the TMX file in CafeTran (Mac, 12 GB RAM) took seconds. Pretty good. I did a quick check and found only one entry in another language. Wonderful!

Cheers,
Hans
[Edited at 2014-07-15 01:40 GMT]
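Hans doesn't say which tool he used for the conversion. For anyone who wants to do it by hand, here is a minimal Python sketch that turns a tab-delimited file into TMX 1.4. It assumes one Dutch term and one English term per line, separated by a tab, in UTF-8. The exact layout of Michael's file isn't shown in this thread. The script name, the tool name in the header and the file names are hypothetical. Lines that don't split into exactly two non-empty fields are skipped and counted, so you can see how many didn't make it.

```python
import sys
import xml.etree.ElementTree as ET

# ElementTree writes this expanded name out as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tab_lines_to_tmx(lines, src_lang="nl", tgt_lang="en"):
    """Build a TMX 1.4 tree from 'source<TAB>target' lines.

    Returns (tree, skipped); skipped counts lines that did not contain
    exactly two non-empty tab-separated fields.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tab2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    skipped = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2 or not all(f.strip() for f in fields):
            skipped += 1
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, fields[0]), (tgt_lang, fields[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text.strip()
    return ET.ElementTree(tmx), skipped


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python tab2tmx.py nl-en.txt nl-en.tmx
    with open(sys.argv[1], encoding="utf-8") as f:
        tree, skipped = tab_lines_to_tmx(f)
    tree.write(sys.argv[2], encoding="utf-8", xml_declaration=True)
    print(f"skipped {skipped} lines")
```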
Giovanni Guarnieri MITI, MIL | Don't know... | Jul 15, 2014
Michael Beijer wrote: Hi Giovanni, I'm not sure, as I haven't tested the latest build of Xbench yet. However, I have a feeling that you lost the entries because of the TMX format, which wasn't designed to handle synonyms. What happens if you export to tabbed text? Michael
I'll have to try that... I'll report back... EDIT: but then I won't be able to import it as a Studio memory? I'm not an expert on TMs and Studio, as you can tell...
[Edited at 2014-07-15 12:54 GMT]

Erik Freitag (Germany)
Giovanni Guarnieri MITI, MIL wrote: I'll have to try that... I'll report back... EDIT: but then I won't be able to import it as a Studio memory? I'm not an expert on TMs and Studio, as you can tell...

Just as an aside: I keep reading that people are trying to create a TMX file. Why would you want that? AFAIK, IATE is a termbase... Hopefully, somebody will come up with an easy way to use the available TBX with MultiTerm...
Erik Freitag wrote: Just as an aside: I keep reading that people are trying to create a TMX file. Why would you want that? AFAIK, IATE is a termbase... Hopefully, somebody will come up with an easy way to use the available TBX with MultiTerm...

Mine was an experiment, to see if I could import the big TBX... I'd like to have a usable format for MultiTerm too... BTW, I converted the TBX to a tab-delimited format, but I can't import it... it says "invalid file format - no valid signature found"... EDIT: I managed to export the TBX in Excel format... I'm now converting it using MultiTerm Convert... looks like it's going to take a few hours... I'll let you know the result...
[Edited at 2014-07-15 15:04 GMT]

Samuel Murray (Netherlands) | Regex removal of languages in Edit Pad Pro | Jul 15, 2014
SDL Support wrote: The process in the article walks through extracting only FIGS + English. Not sure how you missed that?

We may be talking about different things, but what I'm referring to is the fact that one can use regex in, e.g., Edit Pad Pro to remove the languages you don't want. That would reduce the 2.2 GB file to something smaller *before* feeding it to Glossary Converter or any other program. Yet halfway through his blog post he mentions having to split the 2.2 GB file using XMLSplit. In other words, he had not removed any languages from the file before using XMLSplit. And... if he had used regex in Edit Pad Pro, would he not have mentioned what the regex string is?

To remove languages from the TBX file using Edit Pad Pro:
Find what: <langSet xml:lang="(LANGUAGES)*?">.+?</langSet>
Replace with: [nothing]
Regex: YES (enables regex)
Dot: YES (makes the dot match newlines too)

Replace "LANGUAGES" with ro|bg|cs|da|de|el|en|es|et|fi|fr|ga|hu|it|lt|lv|nl|pl|pt|sk|sl|sv|mt|la|hr|mul, after removing from this list the languages you want to keep.

Example: to remove Czech and Hungarian, use this:
<langSet xml:lang="(cs|hu)*?">.+?</langSet>

Unfortunately, I only have 6 GB of RAM, so it takes me about 2 hours to remove all languages except FIGS+E. The FIGS+E file is 1.31 GB (50 MB zipped with 7z). Could anyone tell me if it is a useful file?
[Edited at 2014-07-15 16:07 GMT]
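Samuel's pattern lists the languages to remove. If you'd rather list the languages to keep, a negative lookahead can do that. Below is a minimal Python sketch of that variant on an in-memory string, with `re.DOTALL` doing the job of Edit Pad Pro's "Dot" option. Like Samuel's pattern, it assumes every `<langSet>` opens as exactly `<langSet xml:lang="xx">` and is never nested. The function names and the sample entry are hypothetical.

```python
import re


def keep_only_pattern(keep):
    """Regex matching every <langSet> whose xml:lang is NOT in `keep`.

    (?!(?:fr|it|...)") rejects the languages to keep; [^"]* then consumes
    whatever other language code is present.
    """
    alternatives = "|".join(re.escape(code) for code in keep)
    return re.compile(
        r'<langSet xml:lang="(?!(?:' + alternatives + r')")[^"]*">.*?</langSet>',
        re.DOTALL,  # let '.' match newlines, like Edit Pad Pro's "Dot" option
    )


def remove_languages(tbx_text, keep):
    """Delete all <langSet> elements except those for the kept languages."""
    return keep_only_pattern(keep).sub("", tbx_text)


FIGS_E = ["fr", "it", "de", "es", "en"]

# Hypothetical sample entry.
SAMPLE = """<termEntry>
<langSet xml:lang="cs"><tig><term>auto</term></tig></langSet>
<langSet xml:lang="en"><tig><term>car</term></tig></langSet>
<langSet xml:lang="de"><tig><term>Auto</term></tig></langSet>
<langSet xml:lang="hu"><tig><term>autó</term></tig></langSet>
</termEntry>"""

if __name__ == "__main__":
    print(remove_languages(SAMPLE, FIGS_E))
```

A regional code such as `en-GB` would also be removed unless it is added to the keep list.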
conversion finished... over 886,000 terms converted to XML format... now the fun bit... importing it into MultiTerm...

Erik Freitag (Germany) | Edit Pad Pro | Jul 15, 2014
Following Samuel's suggestion (thanks for that!), I've just tried twice to remove all languages except NL, EN and DE in Edit Pad Pro. Both times, the process froze after a couple of minutes (after some 4 million matches, though not always after the same number of matches).
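Whatever caused the freeze, running one regex over a 2.2 GB file held in memory can be demanding. A hedged alternative is to stream the file in chunks and apply the pattern only to text that ends at a complete `</termEntry>`. That way no `<langSet>` is ever cut in half between chunks. This is a minimal Python sketch, not something anyone in the thread tested. It assumes UTF-8 input, that every `<langSet>` sits inside a `<termEntry>`, and the same `<langSet xml:lang="xx">` layout as Samuel's pattern. The function and file names are hypothetical.

```python
import io
import re
import sys

END_TAG = "</termEntry>"


def keep_only_pattern(keep):
    """Match <langSet> elements whose xml:lang is not in `keep`."""
    alternatives = "|".join(re.escape(code) for code in keep)
    return re.compile(
        r'<langSet xml:lang="(?!(?:' + alternatives + r')")[^"]*">.*?</langSet>',
        re.DOTALL,
    )


def filter_tbx(src, dst, keep, chunk_size=1 << 20):
    """Copy TBX text from src to dst, dropping unwanted <langSet> elements.

    Only the buffer up to its last complete </termEntry> is filtered and
    written, so an element is never split across chunks; memory use stays
    roughly proportional to chunk_size instead of the file size.
    """
    pattern = keep_only_pattern(keep)
    buffer = ""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        cut = buffer.rfind(END_TAG)
        if cut == -1:
            continue  # no complete entry in the buffer yet; keep reading
        cut += len(END_TAG)
        dst.write(pattern.sub("", buffer[:cut]))
        buffer = buffer[cut:]
    dst.write(pattern.sub("", buffer))  # whatever follows the last entry


def filter_tbx_text(text, keep, chunk_size=1 << 20):
    """Convenience wrapper for in-memory text (handy for quick tests)."""
    out = io.StringIO()
    filter_tbx(io.StringIO(text), out, keep, chunk_size)
    return out.getvalue()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python filter_tbx.py IATE_export.tbx nl_en_de.tbx
    with open(sys.argv[1], encoding="utf-8", newline="") as src, \
         open(sys.argv[2], "w", encoding="utf-8", newline="") as dst:
        filter_tbx(src, dst, keep=["nl", "en", "de"])
```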
[Edited at 2014-07-15 17:32 GMT]

Erik Freitag (Germany) | No luck with Samuel's FIGS tbx | Jul 15, 2014
Samuel Murray wrote: Unfortunately, I only have 6 GB of RAM, so it takes me about 2 hours to remove all languages except FIGS+E. The FIGS+E file is 1.31 GB (50 MB zipped with 7z). Could anyone tell me if it is a useful file?

Dear Samuel,

I've downloaded your file and tried to import it with MultiTerm Convert, but no luck. Error message: '<' is an unexpected token. The expected token is '='. Line 286973, position 9. (God, how I love error messages that I can't copy and paste!)
[Edited at 2014-07-15 17:42 GMT]
[Edited at 2014-07-15 17:43 GMT]
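The message Erik quotes is an XML well-formedness error: the parser found a '<' where it expected the '=' that follows an attribute name. That points to something malformed near line 286973 of the FIGS file, though the thread doesn't establish what caused it. Before running a big TBX through MultiTerm Convert, a check with Python's standard-library parser reports the first such problem as plain text you can copy. This is a minimal sketch. The function names and file name are hypothetical, and Python's column count (which starts at 0) may not match MultiTerm's numbering.

```python
import io
import sys
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Return None if `source` is well-formed XML, else (line, column, message).

    `source` may be a file path or a binary file object. iterparse reads the
    input incrementally, and clearing each finished element keeps memory use
    far below that of loading the whole tree at once.
    """
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def first_xml_error_in_text(text):
    """Same check for an in-memory string (handy for small test files)."""
    return first_xml_error(io.BytesIO(text.encode("utf-8")))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python check_xml.py figs_e.tbx
    problem = first_xml_error(sys.argv[1])
    print("well-formed" if problem is None else "line %d, column %d: %s" % problem)
```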