New Delhi: Voice samples from “military sensitive regions of India”, including Jammu & Kashmir and Punjab, are being collected by a Beijing-based AI company via an Indian intermediary and then sold to agencies in China for “use and analysis”, a US-based think tank has claimed.
In a report, New Kite Data Labs, which researches how China uses data, has alleged that the Beijing-based company, Speechocean, has close links with Chinese security agencies and the People’s Liberation Army (PLA).
Data collected by this company, it is suspected, could be used by China to engage in “automated extra-territorial mass surveillance”.
Speaking to ThePrint, Christopher Balding, an academic and the founder of New Kite Labs, said that Speechocean (SO) works with a New Delhi-based subcontractor, a business process outsourcing (BPO) firm, to recruit individuals from particular regions of India, especially militarised areas, to record their voices.
“These people are paid small amounts of money to record phrases, words, sentences in their language and accent. These recordings are collected using the Speechocean app, which can be downloaded onto your phone. So, people from Kashmir, Punjab were identified and were paid money to record their voice samples, without really divulging the purpose. These samples were then sold to China,” Balding, who led the investigation, alleged.
On the implications of voice data from India being sent to China, Balding claimed that Speechocean is “known” to sell to the Chinese military.
“Speechocean’s attempts to obfuscate their activity on behalf of the Chinese security agencies raise legitimate security questions and imply this data is used to train technological tools engaged in mass surveillance outside of China,” Balding said.
On its website, Speechocean describes itself as an artificial intelligence data resource provider that is devoted to supplying “engineering data products and services to enterprises and scientific research institutions in the whole industry chain of AI”.
Balding told ThePrint that New Kite Labs has apprised the Indian security establishment of its findings.
The information is being looked into, a source in an Indian security agency confirmed.
‘Absolute proof SO worked in Kashmir, Punjab’
A database in China containing a number of Indian IP addresses led New Kite Labs researchers to Speechocean, a Shanghai Stock Exchange-listed data provider that produces datasets for algorithmic model training and development, the report says.
According to Balding, SO collects voice data from India, particularly sensitive regions, using a local intermediary.
“SO has worked in Punjab and Kashmir and we have absolute proof of that at every level,” Balding said.
“We obtained log files sent from Indian IP addresses in Punjab and Kashmir to Speechocean databases in China of voice file transfers. We traced this back to a recruitment effort where individuals recited scripts in Indian languages using the Speechocean app,” he added.
This, Balding told ThePrint, was “worrisome” due to the company’s apparent ties with the PLA and other Chinese security agencies.
“The company is known to sell to the PLA’s cyber warfare division. There is a document where Speechocean was bidding to sell Vietnamese-language data to the PLA’s cyber warfare division. Selling this data is Speechocean’s primary business model. They gather data and sell it,” he claimed.
When asked about the nature of the voice samples taken in India, Balding said that it is “unclear”.
“Since we do not have access to raw files, we are not clear about the nature of voice samples being sent to China from India,” he said.
ThePrint contacted Speechocean via email, but had not received any response when this report was published.
Links to Chinese military, security agencies
Speechocean, according to its website, was founded in 2005 by He Lin, who is currently its chairperson.
According to the report, as of September 2021, He Lin was married to Cai Huizhi, who is the founder and chairperson of a publicly listed Chinese defence company, Beijing Zhongke Haixun Digital Technology, which provides key submarine-related technology to the Chinese military.
“The company website includes a video of President Xi Jinping touring a military installation equipped with its technology. Their relationship highlights the extent of their access and integration within the Chinese state security apparatus,” the report said.
SO, the report alleged, is “deeply embedded in state security apparatus” since China’s National Computer Network and Information Center is a foundational non-founder shareholder, investor, and customer of the company.
This government agency, the report goes on to say, “is responsible for internet security and censorship in China and is an investor in SO through an investment fund and holding company”.
“Their mission is to localise technological development and advancement to make China a global tech leader and assist in the promotion and defence of national security in information management,” it adds.
The think tank also claimed to have documents indicating that the company is involved not merely in public security maintenance inside China but also “collaborates with security intelligence agencies with foreign targets”.
“We have identified public tenders on projects relating to Vietnamese speech classification for the People’s Liberation Army Strategic Support Force (SSF), better known as the cyber warfare division. Other public tender documents relate to classification projects relating to machine translation projects in English and for Chinese minority languages in northwestern China,” the report says.
Data traced back to Beijing, Hong Kong
According to the New Kite Labs report, the data collected by SO, which included voice samples containing words, phrases or conversations of specific “accents and nationality”, was traced back to three primary IP addresses: in Beijing, Hong Kong, and Germany.
“The data was tracked to Aliyun Computing in Beijing, Alicloud in Hong Kong and servers in Frankfurt, Germany registered to Alibaba Singapore,” the report says.
The report also said that beyond data collection and storage capabilities, China has deployed “major resources to create technological capabilities through software to automate behavior oversight technologies usually assisted through AI (Artificial Intelligence) /ML (Machine Learning) applications”.
(Edited by Asavari Singh)