Remove all lines in language X
Thread starter: Samuel Murray

Samuel Murray Netherlands Local time: 02:59 Member (2006) English to Afrikaans + ...
Hello everyone

I have a text file with lines of text in English, but unfortunately some of the lines are in Afrikaans. I want to either remove the Afrikaans lines or create a list of all the Afrikaans lines (either option is good for me). ChatGPT claims to be able to do this, but as usual, it simply creates a list of lines that looks plausible until you double-check it and discover that the bot has just made up a list that closely resembles the topic of the input lines.

Is there an AI (or technological) solution to do this?

Thanks
Samuel
Remove by language | Feb 27
You could try removing by language: select all, set 'Detect language automatically' in the Proofing Language settings, and then replace all text marked as Afrikaans, leaving the 'Replace with' field blank.
[Edited at 2024-02-27 18:22 GMT]
Neirda China Local time: 08:59 Chinese to French + ...
An alternative to AI | May 23
If you can use Python, there are a few libraries you can use to detect the language of a text and optionally do anything you want with it. What you can use ChatGPT for is to walk you through the steps of doing that; it's simpler than you think.

The catch is:
- most of these libraries will probably not be too accurate at detecting Afrikaans and might mistake it for German.
- you need a sample of at least a few dozen characters to eliminate false positives.

These libraries are not related to AI but mostly work with "n-grams" (so-called "trained data" with lots of samples of 3 to 4 letters; when you compare a text against those samples you can actually detect most languages pretty well).
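To illustrate the n-gram idea described above, here is a toy sketch using only the Python standard library. It is not how any particular library is implemented: the function names are made up for this example, the two "training" samples are single sentences borrowed from elsewhere in this thread, and cosine similarity is just one simple way to compare profiles. Real libraries ship profiles built from far larger corpora.

```python
import math
from collections import Counter

def trigrams(text: str) -> list:
    """Return the overlapping 3-character sequences of a lowercased, space-padded text."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def profile(text: str) -> Counter:
    """Count how often each trigram occurs in the text."""
    return Counter(trigrams(text))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy "trained data": one sentence per language. Far too small for real use.
SAMPLES = {
    "en": "I assume that it is more likely that the language "
          "will be identified as Dutch.",
    "af": "Ek neem aan dat dit meer waarskynlik is dat die taal "
          "as Nederlands geïdentifiseer sal word.",
}
PROFILES = {lang: profile(text) for lang, text in SAMPLES.items()}

def guess_language(line: str) -> str:
    """Return the language code whose trigram profile is closest to the line."""
    line_profile = profile(line)
    return max(PROFILES, key=lambda lang: cosine(line_profile, PROFILES[lang]))

print(guess_language("die taal is waarskynlik"))
print(guess_language("the language will be identified"))
```

With profiles this small the guesses are only meaningful for text that shares words with the samples, which is also why short lines are hard to classify reliably.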
Hans Lenting Netherlands Member (2006) German to Dutch
Neirda wrote:
- most of these libraries will probably not be too accurate with detecting Afrikaans and might mistake it with German.

I assume that it is more likely that the language will be identified as Dutch.
Ik neem aan dat het waarschijnlijker is dat de taal als Nederlands geïdentificeerd zal worden.
Ek neem aan dat dit meer waarskynlik is dat die taal as Nederlands geïdentifiseer sal word.

Since I’ve recently installed Python 3 on macOS Sonoma, I’d be grateful for a link to the Python scripts.
Neirda China Local time: 08:59 Chinese to French + ...
You have to do this yourself, or ask ChatGPT to. I don't know the Python libraries myself, as I mostly use C#, but with Python being the most popular programming language I'm sure they exist. This is what ChatGPT told me:

In Python, there are several libraries available for language detection. Some of the most popular ones include:

langdetect: This library is a port of Google's language-detection library. It's simple to use and supports many languages.

    from langdetect import detect

    text = "Bonjour tout le monde"
    language = detect(text)
    print(language)  # Output: 'fr'

langid: This library is another option for language identification. It also supports many languages and is quite straightforward to use.

    import langid

    text = "Hello world"
    language, _ = langid.classify(text)
    print(language)  # Output: 'en'

polyglot: This library offers language detection as part of a larger suite of NLP tools. It requires installing some additional dependencies.

    from polyglot.detect import Detector

    text = "Hola mundo"
    detector = Detector(text)
    print(detector.language.code)  # Output: 'es'

I'd start with that. Then you will also need to write your own routine for whatever you are trying to achieve.
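For the original question, the "routine of your own" could look something like the minimal sketch below. It is only one possible approach, and several things in it are assumptions rather than anything stated in this thread: the function names, the output file names, the 20-character cut-off for short lines, and the choice to treat a Dutch ("nl") result as Afrikaans, which only makes sense if the file really contains nothing but English and Afrikaans. The detector is passed in as a function so the splitting logic can be tried without installing anything.

```python
from typing import Callable, Iterable, List, Tuple

def split_by_language(
    lines: Iterable[str],
    detect: Callable[[str], str],
    unwanted: frozenset = frozenset({"af", "nl"}),  # assumption: "nl" means misdetected Afrikaans
    min_chars: int = 20,  # assumption: shorter lines are too short to trust
) -> Tuple[List[str], List[str], List[str]]:
    """Split lines into (kept, flagged, unsure) using a language-detection function."""
    kept, flagged, unsure = [], [], []
    for line in lines:
        text = line.strip()
        if not text:
            kept.append(line)        # blank lines stay where they are
        elif len(text) < min_chars:
            unsure.append(line)      # too short for a reliable guess
        else:
            try:
                code = detect(text)
            except Exception:
                unsure.append(line)  # detector could not decide (e.g. digits only)
                continue
            (flagged if code in unwanted else kept).append(line)
    return kept, flagged, unsure

def run_with_langdetect(input_path: str) -> None:
    """Split a UTF-8 text file into three files using the third-party langdetect package."""
    from langdetect import DetectorFactory, detect  # pip install langdetect
    DetectorFactory.seed = 0  # langdetect is randomized; a fixed seed makes runs repeatable
    with open(input_path, encoding="utf-8") as handle:
        lines = handle.readlines()
    kept, flagged, unsure = split_by_language(lines, detect)
    for suffix, group in (("kept", kept), ("flagged", flagged), ("unsure", unsure)):
        with open(f"{input_path}.{suffix}.txt", "w", encoding="utf-8") as out:
            out.writelines(group)
```

Calling `run_with_langdetect("input.txt")` (a placeholder file name) would write `input.txt.kept.txt`, `input.txt.flagged.txt` and `input.txt.unsure.txt`. The flagged and unsure files are worth checking by hand, since detection on short or mixed lines can easily be wrong.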