November 09, 2020
Using Satellite Images and Machine Learning to Track Illegal Amber Mining in Ukraine
Max Bernhard
Ukraine’s northwest was once covered in lush pine forests. Now, much of the ground where these trees once stood has been turned into cratered, moon-like landscapes. The culprit: illegal miners who are digging up the earth searching for amber.
For decades, the fossilized tree sap had been of little value and people in Ukraine would only dig it up to use as cheap fuel for their ovens. Then, a few years ago, prices spiked, driven by demand from China. The collapse of the Ukrainian government following the Euromaidan revolution exacerbated the problem.
Each year more than 300 tonnes of amber are unearthed in Ukraine, but more than 90 percent of that is mined illegally, resulting in lost revenue of nearly $300 million, according to some estimates.
“It all started back in 2015, 2016 when we had some kind of gold fever in the Northern regions of Ukraine because at that time the prices for amber increased by maybe two or three times,” says Anatoliy Bondarenko, who works as a data editor and developer for the Ukrainian media outlet Texty.
“If you’re lucky, during this time, a single person could find gems worth up to $1,000 per day,” he adds. By comparison, the average annual income in Ukraine lies between $1,500 and $2,000.
Northern Ukraine is full of amber. To mine it, people dig up the trees, then pump water into the sandy earth, and the amber, which is only slightly denser than water, is flushed to the surface. While the harvest is easy, the aftermath is a landscape that looks like it was shot with a giant shotgun.
Bondarenko and his colleagues at Texty trained a machine-learning algorithm to recognize the scarred land in satellite imagery. They scanned an area of about 70,000 square kilometers, or more than 13 million football fields, and found more than 1,000 hectares where illegal amber mining had taken place.
The use of satellite imagery has become a staple of reporting, helping journalists investigate anything from war crimes to natural disasters, climate change, or smuggling. Using machine learning to sift through thousands of images, as Texty did for its investigation, is less common because of the effort and technical knowledge it requires.
Getting the data
When Bondarenko and his colleagues first started to report on the issue back in 2015, it was impossible for them to estimate the scale of the mining because they couldn’t get their hands on up-to-date satellite imagery. It wasn’t until 2018 that they noticed that some map providers, such as Google Maps, Google Earth, Microsoft’s Bing Maps, and Apple Maps, had started to update their photos of these places, he says.
“When we found that there were new photos for these areas, we started to think about how to check all those images, not manually, but with the help of some kind of machine learning model, some kind of computer program.”
Because the craters that amber mining leaves in the ground are between one and three meters in diameter, the journalists needed satellite imagery with a high enough resolution: one meter per pixel or finer.
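To make the one-meter requirement concrete: tile-based web maps in the Web Mercator projection, which Bing Maps documents for its tile system, halve the ground covered by each pixel with every additional zoom level. The Python sketch below is an illustration, not Texty's code. The latitude of 51 degrees (roughly northwestern Ukraine) and the function names are assumptions.

```python
import math

EARTH_RADIUS_M = 6378137  # WGS 84 equatorial radius used by Web Mercator
TILE_SIZE_PX = 256        # tile edge length in pixels at every zoom level


def ground_resolution(latitude_deg: float, zoom: int) -> float:
    """Meters of ground covered by one pixel at this latitude and zoom level."""
    circumference_m = 2 * math.pi * EARTH_RADIUS_M
    map_width_px = TILE_SIZE_PX * 2 ** zoom
    return math.cos(math.radians(latitude_deg)) * circumference_m / map_width_px


def min_zoom_for(latitude_deg: float, max_m_per_px: float, max_zoom: int = 23) -> int:
    """Lowest zoom level whose resolution is max_m_per_px or finer."""
    for zoom in range(1, max_zoom + 1):
        if ground_resolution(latitude_deg, zoom) <= max_m_per_px:
            return zoom
    raise ValueError("no zoom level up to max_zoom is fine enough")


LATITUDE = 51.0  # assumed latitude, roughly northwestern Ukraine
zoom = min_zoom_for(LATITUDE, max_m_per_px=1.0)
print(zoom, round(ground_resolution(LATITUDE, zoom), 2))
```

Under these assumptions, zoom level 17 is the first to clear the threshold, at about 0.75 meters per pixel. Whether a provider actually has imagery that sharp for a given area is a separate question.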
They then had to decide which map provider would be most useful and settled on Bing Maps. “It was very crucial for us to know how old the photos are,” Bondarenko says, adding that Bing “provided not only the images but also metadata for each image, including the date [of when the photo was taken].”
With the new photos available and the map provider settled, Bondarenko and his team had to figure out how to train the algorithm. To find out how patches of land with illegal mining might look on satellite images, they talked to environmentalist groups and researchers. “We [also] had a couple of photos from our own reporting and from other media reports and we started to manually look for places with known activity,” he says.
This way the group ended up with fewer than 10 examples. To reliably train their algorithm they needed more, so they organized a crowd-sourcing event where participants spent hours scouring satellite imagery for patterns that could stem from the mining. “They found about three hundred areas that were probably places with amber mining. In the end, we found that only 100 of them were indeed interesting to us,” Bondarenko says.
After splitting and simplifying the images to make them easier for graphics cards to process, Bondarenko’s team had a set of about 1,000 positive examples with which to train their machine-learning model. They then downloaded satellite photos for the regions they wanted to check for illegal amber mining, roughly half a million photos in all, and trained the model using the Python deep-learning library Keras.
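Half a million images sounds abstract until the region is laid out as a grid. The Python sketch below shows one way to list the map tiles that cover a bounding box and to label each with a quadkey, the identifier Bing Maps documents for its tiles. It is not Texty's download code: the bounding box, zoom level, and function names are hypothetical, and no download request is made.

```python
import math
from typing import Iterator, Tuple

MAX_LATITUDE = 85.05112878  # latitude limit of the Web Mercator projection


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator tile (x, y) that contains a WGS 84 coordinate."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = max(min(lat_deg, MAX_LATITUDE), -MAX_LATITUDE)
    sin_lat = math.sin(math.radians(lat))
    x = (lon_deg + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_x, tile_y


def tile_to_quadkey(tile_x: int, tile_y: int, zoom: int) -> str:
    """Interleave the bits of x and y into a base-4 string, one digit per level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def tiles_in_bbox(south: float, west: float, north: float, east: float,
                  zoom: int) -> Iterator[Tuple[int, int]]:
    """Yield every tile (x, y) that touches the bounding box, row by row."""
    x_min, y_min = latlon_to_tile(north, west, zoom)  # tile y grows southward
    x_max, y_max = latlon_to_tile(south, east, zoom)
    for tile_y in range(y_min, y_max + 1):
        for tile_x in range(x_min, x_max + 1):
            yield tile_x, tile_y


# Hypothetical bounding box and zoom level, chosen only for illustration.
tiles = list(tiles_in_bbox(south=51.0, west=26.0, north=51.1, east=26.2, zoom=17))
print(len(tiles), "tiles, first quadkey:", tile_to_quadkey(*tiles[0], 17))
```

Even a small box produces thousands of tiles at a zoom level this fine, which is how a survey of several regions can run to hundreds of thousands of images.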
Then the computer took over the work. “We had one relatively powerful computer with two, for that time decent, GPU cards, and it worked for two weeks without interruption,” Bondarenko says. When the processing was done, they finally had an estimate of how pervasive the amber mining had become. In total, it took the group roughly a year to complete their project, he says.
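Turning per-image predictions into an area estimate can be done in several ways. One simple approach, sketched here in Python under stated assumptions, is to count the tiles the model scores above a threshold and multiply by the ground area of a single tile. The scores, threshold, latitude, and zoom level are made up for illustration and do not come from Texty's project. Counting whole tiles also overstates the mined area when only part of a tile is affected.

```python
import math

EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137  # Web Mercator equatorial circumference


def tile_area_hectares(latitude_deg: float, zoom: int) -> float:
    """Approximate ground area of one Web Mercator tile, in hectares."""
    side_m = math.cos(math.radians(latitude_deg)) * EARTH_CIRCUMFERENCE_M / 2 ** zoom
    return side_m * side_m / 10_000  # 1 hectare = 10,000 square meters


def flagged_area_hectares(scores, latitude_deg: float, zoom: int,
                          threshold: float = 0.5) -> float:
    """Total area of the tiles whose model score reaches the threshold."""
    flagged = sum(1 for score in scores if score >= threshold)
    return flagged * tile_area_hectares(latitude_deg, zoom)


# Made-up scores for five tiles; real scores would come from the trained model.
scores = [0.93, 0.08, 0.71, 0.42, 0.55]
print(round(flagged_area_hectares(scores, latitude_deg=51.0, zoom=17), 1))
```

Raising the threshold trades missed mining sites for fewer false alarms, so the final figure depends on that choice as much as on the model.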
Lessons learned
When they started investigating illegal mining, this type of reporting was completely new to Bondarenko and his colleagues. They weren’t even sure if they would be able to afford the computing power necessary to run the algorithm. “We didn’t even know how or what to search for on the maps, or how to search for the images, but we also didn’t know which type of neural network to use,” he says.
While projects like this aren't completely inaccessible to most journalists, they do require some technical know-how, Bondarenko says. “It's not a case for a total novice, but I guess it's not a task for a total expert either,” he says, adding that some knowledge of web scraping, satellite image providers, image processing, and machine learning is necessary. “I guess a mid-level Python programmer with a passion and about one month of free time could do it.”
All the initial headaches aside, their investigative approach is relatively easy to copy, he says, adding that the group created a comprehensive methodology with examples for anyone interested in undertaking a similar investigation. Recently, a Brazilian journalist used their method to map solar parks in Argentina’s Patagonia region, says Bondarenko, adding that there is an abundance of potential use cases for journalists to leverage OSINT and machine learning technologies.