US Intelligence Agencies Now Have a Tool That Allows English-Speaking Users to Search Through Swahili Text and Speech for Information Without Learning the Language
WASHINGTON – The intelligence community now has a tool that allows English-speaking users to search through foreign language text and speech for information.
The new tool was developed by Raytheon BBN Technologies in partnership with the Intelligence Advanced Research Projects Activity, an organization within the Office of the Director of National Intelligence that develops technologies to solve some of the intelligence community’s hardest problems.
Essentially, once a user enters a search query in English, the program looks through foreign language documents and recordings to find relevant results, then translates the matching passages into English before presenting them to the user. It’s an “English-in, English-out” tool, and the company claims its system allows operators to search foreign documents, find results and understand their context and meaning without having to speak the language, according to a Jan. 31 announcement.
Raytheon said it used Kazakh, Pashto, Somali, Swahili and Tagalog as the low-data foreign languages for its machine learning algorithm, which was also tested against Farsi, Bulgarian, Lithuanian and Georgian.
“The system is designed to be applied to any foreign language,” said Raytheon BBN Program Manager John Makhoul in a statement. “Low-resource languages present a particular challenge to retrieval and translation technologies because of a lack of data for training systems. Raytheon BBN met that challenge by developing techniques to overcome the issue of low data and applied them to an end-to-end system that exceeded the goals of the program.”
The solution is part of IARPA’s Machine Translation for English Retrieval of Information in Any Language, or MATERIAL, program, launched in 2017. Raytheon is one of four prime contractors developing solutions, alongside Johns Hopkins University, Columbia University and the University of Southern California Information Sciences Institute. Each contractor was given a package of training data to develop machine learning solutions. MIT Lincoln Laboratory, the University of Maryland Center for Advanced Study of Language, the National Institute of Standards and Technology and Tarragon Consulting made up the test and evaluation team that assessed performance.
“The tools and techniques developed under the program will boost our ability to find, examine and analyze foreign language content without needing to learn the language,” said IARPA MATERIAL Program Manager Carl Rubino in a statement. “For low-resource languages where expertise is minimal, these new capabilities provide a significant advantage.”
Raytheon Technologies did not disclose the value of its contract with IARPA, although it noted that it has been working with the organization on the solution for four years. In a statement at the time of the initial awards, Columbia University said its grant was $14 million.