Successfully transcribed and proofread a wealth of information for an AI training program within a strictly limited time window (4 languages, a total of 33,000 minutes transcribed, and over 3 million Chinese characters proofread) for one of the world’s top application software development companies.

Background

DefinedCrowd Corporation develops application software. The company offers crowd-as-a-service and machine learning solutions to speed up enterprise data training and modeling.

DefinedCrowd’s platform combines machine learning and data science with crowd-sourcing to help companies easily manage their global data collection and data enrichment efforts, enabling enterprises to improve quality, scalability and time-to-market for their artificial intelligence and natural language processing applications.

With strong expertise in speech and natural language processing technologies, the company has been serving top AI companies and Fortune 500 companies since day one.

Project Summary

  • Type of Project: Transcription, Proofreading
  • Number and Names of languages: 4 (Dutch, French, German, Chinese)
  • Industry: AI training company
  • Volume: Dutch transcription - 15,000 minutes, French transcription - 8,400 minutes,
    German transcription - 9,600 minutes, Chinese proofreading - over 3 million characters
  • Deadlines: Usually 1-2 weeks
  • PoliLingua’s customer since 13.11.2019
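
The per-language transcription volumes listed above can be cross-checked against the 33,000-minute total quoted at the top of this case study. The snippet below is a small illustrative sketch: the figures come from the project summary, while the variable names are assumptions made for the example and not part of any project tooling.

```python
# Transcription volumes in minutes, taken from the project summary.
# The dictionary name is illustrative only.
transcription_minutes = {
    "Dutch": 15_000,
    "French": 8_400,
    "German": 9_600,
}

total_minutes = sum(transcription_minutes.values())
total_hours = total_minutes / 60

# Prints: Total transcribed: 33,000 minutes (550 hours)
print(f"Total transcribed: {total_minutes:,} minutes ({total_hours:.0f} hours)")
```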

Project requirements

As the decade unfolds, it becomes increasingly plausible to place learners and trainers at the heart of training that is driven by data from digital systems. This data is the indispensable basis on which artificial intelligence can take hold in training.

As a sub-domain of artificial intelligence (AI), machine learning generally aims to understand the structure of data and fit that data to models that people can understand and use.

This project required us to support the training of algorithms on examples of human-tagged input and output data. In supervised learning, the computer is given example inputs that are labeled with the desired outputs. During the project, we worked to collect a considerable amount of speech and text data that was afterward ‘fed’ to the AI so that it could better recognize the particular features of a dialect’s pronunciation and writing conventions.
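
As a rough illustration of the supervised-learning idea described above, the sketch below builds a toy classifier from labeled examples and then predicts the label of an unseen input. Everything in it is hypothetical: the feature values, the dialect labels, and the choice of a 1-nearest-neighbour method are assumptions made for brevity, and they do not represent the client’s actual models or data.

```python
# Toy supervised learning: 1-nearest-neighbour over labeled examples.
# All data and names here are hypothetical, for illustration only.
from math import dist

# Each example pairs an input (two made-up acoustic features) with the
# desired output (a dialect label supplied by a human annotator).
labeled_examples = [
    ((1.0, 1.2), "dialect_a"),
    ((0.9, 1.0), "dialect_a"),
    ((3.1, 2.9), "dialect_b"),
    ((3.0, 3.2), "dialect_b"),
]


def predict(features):
    """Return the label of the labeled example closest to `features`."""
    _, label = min(labeled_examples, key=lambda ex: dist(ex[0], features))
    return label


print(predict((1.1, 1.1)))  # closest to the dialect_a examples
print(predict((2.8, 3.0)))  # closest to the dialect_b examples
```

The point of the sketch is only that the quality of the predictions depends directly on the quality of the human-supplied labels, which is why accurate transcription and proofreading matter.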

Challenges

  • Transcribing a large volume of call center content for machine learning on a tight deadline and with specific instructions

One of the major challenges of this project was the strict deadline set by the project originator. The best tactic in this context is to maintain close coordination while adhering to the provided instructions as closely as possible. In this way, we saved precious time, strengthened ties with the ordering party, and delivered the desired result.

The project requirements prompted us to create a new dedicated team of linguists, each of whom was given a specific cluster of work. The tactic proved a success, as the patchwork of individual jobs came together into one integrated, impeccable data canvas, on the spot and on the fly.

  • Proofreading 3,659,710 Chinese characters in 2 weeks

The second part of the challenge was to proofread more than 3.6 million Chinese characters within the tight time window. The quality of the final result depended on the quality of this part of the project, as even a small number of misinterpreted characters could set the AI back and leave it lost in translation. Our new team of linguists saved the day once again, coming to the rescue by manually sifting through the text character by character and making sure no typo had slipped in.
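
To give a sense of the pace this implies, the following sketch computes the average daily throughput needed. It assumes that the two weeks mean 14 days with work on every day; the actual working schedule is not stated here, so the result is only an approximation.

```python
import math

total_characters = 3_659_710  # figure from the challenge description
days_available = 14           # assumption: 2 weeks = 14 working days

# Round up so that the daily target always covers the full volume.
per_day = math.ceil(total_characters / days_available)

# Prints: About 261,408 characters per day
print(f"About {per_day:,} characters per day")
```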

Though hard and time-constrained, the work proved both testing and rewarding. Meeting the project schedule took our self-confidence to a new level and once again proved right the well-known saying: ‘You can achieve anything by taking small, thoughtful steps.’

Project Highlights

Anyone who takes up an enormous project within a strict time limit would be naïve to think there would be no collateral issues, which are, more often than not, frustrating and time-consuming. It’s nobody’s fault; these are the stark realities of life. But there are ways to cope with them if you want to be successful in what you do. Adopting this mindset allowed us to form a clear view of the ways and means to find the right solution and to build up knowledge we will draw on in future projects.

Better still, this approach cemented our relationship with the project owner, making it not only professional but also friendly. We made it onto their list of key partners for language needs, and we keep working together on various interesting projects, making our new working relationship ever more trust-based and successful.

Solution summary

  • Learned new software that helped us better understand and meet the client’s needs
  • Built a team of 180 linguists to deliver the files on time
  • Delivered the highest quality of transcription in all three languages
  • Exhibited resourcefulness and openness to changes and adaptations of the file format and the instructions
  • Established a strong collaborative relationship with participating linguists
  • Improved time-management and interpersonal skills