Governments are realizing the power of data to tackle complex problems, but raw data is not enough to inform policy decisions. Analyzing and interpreting data requires deep expertise in modelling, statistics and data science. The supply of high-level data science experts is limited and governments often struggle to compete with the private sector salaries these experts can command.
A community of innovators is now helping solve this problem by building data science competition platforms which give governments access to valuable talent and give data scientists the chance to work on important problems using valuable data sets.
Typically, an organization – a government, NGO or business – collaborates with a competition platform to create a challenge. The organization offers a reward, which can reach $80,000 or more.
These competition platforms started because their founders saw a mismatch between what governments need and the talent they can access, and decided to step in with a solution. “We created Zindi because we were seeing organizations having more data than they ever had before,” says Celina Lee, cofounder and CEO of the Africa-focused platform. “They recognize the value in this data but don’t always have access to the skills and solutions, such as machine learning and artificial intelligence, to solve their problems.”
Competitions attract thousands of entries from individuals motivated by both monetary rewards and the chance to work on interesting, important problems. “If you’re putting your time into a competition and you’re probably not going to win the prize, it helps if it’s a project that’s interesting and for a client that you care about,” says Greg Lipstein, cofounder of DrivenData, a US platform. “There are really interesting datasets and problems that governments are working on, and the competition structure means data scientists have a well-understood path to understand the data and contribute their skills in a useful way.”
Governments, for their part, get access to a community of talent they would never be able to assemble in-house and are able to use the competition to understand what skills they are looking for in the first place. “The competition set-up is tapping into a very diverse set of participants, backgrounds, skillsets and experiences,” says Lipstein. “It’s a nice way of uncovering what is going to be the best approach; for instance, you don’t know if a space physics background will be really helpful here, or a machine learning technical background might make a difference, so it allows people to try a lot of different things.”
DrivenData has run more than 50 such challenges over the past six years. One of its competitions, for the National Oceanic and Atmospheric Administration, sought solutions to predict geomagnetic storms, which result when solar winds interact with the space surrounding Earth. These rare events can interfere with GPS, satellite communication and electrical power transmission. The competition was highly complex, pushing the boundaries of predictive analytics. Interestingly, very few of the top applicants were from the core field of meteorology (and none of the top four were even from the US). Their expertise drew instead from fields like computer vision, natural language processing, data science and machine learning.
“They were taking all these techniques from different fields and applying it to these time series problems,” says Christine Chung, a data scientist at DrivenData. “When we talk about finding talent for government, the best solutions often come from people in completely different domains or expertise.”
In some cases, governments can leverage innovations from competitions sponsored by other partners to improve their own services. DrivenData worked with Yelp, the review app, on a competition to design a model using social media data (ratings, patterns and phrases) to predict potential health code violations. The city of Boston began using the top submitted algorithms and was able to improve its targeting. The case study found that around 60% of the most severe class of violations were surfaced earlier with the algorithm than before.
The competition format brings a speed that government could never achieve internally. “What we’re delivering is tangible, it’s measurable immediately, and it can be turned around in three months,” says Celina Lee at Zindi. “In putting these challenges out there, we get thousands of people working on a problem, building thousands of algorithms and refining them. Every day we get hundreds of new submissions which means code is continually being refined and improved upon, and then by the end we have something very tangible to hand over.”
Governments might also find they can command the attention of data talent on a specific problem in a competition format, which they might struggle to sustain in a conventional employment role. “A lot of the top data scientists winning these competitions and submitting the top algorithms might not necessarily be the people who are employed by governments,” says Celina Lee. “Data scientists can get bored if they’re not constantly solving problems!”
Zindi’s competition roster includes a Yoruba-to-English translation model, deepfake detection, air quality monitoring, and tourism spending prediction, with past projects on using satellite data to identify informal settlements and identifying traffic accident hotspots.
Connecting the Dots
Lee and Chung’s interest in bringing data science into governance came from their respective experiences in seeing the gap between talent and the public sector.
Example leaderboard from DrivenData’s machine learning challenge platform
“Early in my career, I worked for non-profits in the international development and education space,” recalls Chung. “I had no technical skills, I didn’t take a single computer science class, but I realized there was almost no talent with technical skills in the field. There was a lot of measurement and evaluation requirements, and we were stuck using Excel. Working like that can be slow and painful!” Her partner was a computer programmer and showed her the basics of Python programming. “I thought, why aren’t we all using this?”
She later studied at the University of Chicago’s computational analysis and public policy program. “It was really about teaching data science to people who had an interest in public service. It was a phenomenal program. I learned everything I do now in that course.”
She thought there would be a lot of government demand for data science talent but “in reality, governments haven’t quite caught up and so while there is a lot of interest in these types of positions, there hasn’t been enough positions for people to land in government, doing the types of problems that they want to be working on.”
For Lee, who also worked in international development specializing in financial inclusion, celebrating and supporting the African data science community was an essential motivator in forming Zindi. She saw a growing community of “meetup groups, AI clubs in universities, and organic self-organizing communities of data scientists across the continent yet the global artificial intelligence market is largely dominated by Silicon Valley. It was incredibly important for me that we gave a platform and a space for African data scientists to tackle African problems and global problems too.”