Web Scraping and Its Potential for Government and Beyond

With increasing focus on how local organizations contribute to USAID-funded activities, it’s important to look at new ways of applying known technologies. Web scraping is one such technology: it means extracting data from a website, collecting it, and exporting it into a spreadsheet, API, or other format suited to a particular purpose or application. On a project in Cambodia, The Cloudburst Group demonstrated the usefulness of web scraping for gathering evidence on the effectiveness of USAID capacity building to strengthen local organizations.
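
As a rough illustration of that extract, collect, and export cycle, here is a minimal Python sketch that uses only the standard library. The HTML fragment, the `org` class, the `data-sector` attribute, and the function names are all hypothetical stand-ins invented for this example; they are not taken from the project, and a real scraper would first download the page and would need to respect the site's terms of use.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<ul>
  <li class="org" data-sector="health">Example Health CSO</li>
  <li class="org" data-sector="education">Example Education CSO</li>
</ul>
"""


class OrgParser(HTMLParser):
    """Collects name/sector pairs from <li class="org"> elements."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._sector = None  # set only while inside a matching <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "org":
            self._sector = attrs.get("data-sector", "")

    def handle_data(self, data):
        # Keep text only when it sits inside a matching <li>.
        if self._sector is not None and data.strip():
            self.rows.append({"name": data.strip(), "sector": self._sector})

    def handle_endtag(self, tag):
        if tag == "li":
            self._sector = None


def scrape_to_csv(html):
    """Extract rows from the HTML and return them with a CSV rendering."""
    parser = OrgParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "sector"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows, out.getvalue()


rows, csv_text = scrape_to_csv(SAMPLE_HTML)
```

The CSV text could then be written to a file and opened in any spreadsheet tool.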

USAID’s Cambodia Small Business Applied Research (SBAR) Award: Local Organizations – Movement Towards Self-Reliance (LO-MTSR) Activity was implemented by The Cloudburst Group, PartnersGlobal, and Devlab@Penn (formerly Duke). The team designed an innovative, first-of-its-kind experimental evaluation to measure the scalability of an existing intervention aimed at increasing organizational resiliency to closing civic spaces by deepening the connectivity of civil society organizations (CSOs) to their communities, other CSOs, and potential funders. The program involved evaluating a series of capacity-building activities and trainings on topics including financial diversification, social media and marketing, well-being in the workplace, and contingency planning.

As one component of the program, CSOs throughout Cambodia working across health, education, food security, environment, and democracy, rights, and governance received coaching on how increasing social network connections, including with global and local donors, could diversify revenue and influence sustainability. Organizational-level coaching covered entrepreneurial strategies for increasing network connections by leveraging social media for promotion, marketing, and networking. To measure the impact of enhanced network connections within a social network analysis framework, the evaluation team used innovative social media data collection methods to gather and analyze information on CSOs’ web presence. The team developed Python scripts that mined a large volume of social media data, mostly from Facebook, to gauge changes in public engagement via social media (measured by the number of likes, comments, and shares).

The team cleaned and analyzed the more than 20,000 social media records mined and measured indicators of each organization’s connectedness, including with global and local donors, within a social network analysis framework. This approach allowed the team both to compile a detailed account of each organization’s social media presence throughout the three-year lifespan of the project and to gauge how that presence responded to external events (COVID-19 lockdowns, elections, major public holidays, etc.) and to the project’s social media trainings.
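
To make the engagement measure concrete, the sketch below sums likes, comments, and shares per post and compares an organization's average engagement before and after a given date, such as a training. The records, field names, dates, and function names are hypothetical and chosen only for illustration. This simple before/after comparison is not the evaluation team's method, which relied on an experimental design and a social network analysis framework.

```python
from datetime import date
from statistics import mean

# Hypothetical post records; all values are made up for illustration.
posts = [
    {"org": "CSO A", "date": date(2021, 3, 1), "likes": 10, "comments": 2, "shares": 1},
    {"org": "CSO A", "date": date(2021, 4, 1), "likes": 12, "comments": 3, "shares": 2},
    {"org": "CSO A", "date": date(2021, 6, 1), "likes": 20, "comments": 5, "shares": 5},
    {"org": "CSO A", "date": date(2021, 7, 1), "likes": 14, "comments": 4, "shares": 2},
    {"org": "CSO B", "date": date(2021, 6, 15), "likes": 3, "comments": 0, "shares": 1},
]


def engagement(post):
    """Total public engagement on one post: likes + comments + shares."""
    return post["likes"] + post["comments"] + post["shares"]


def mean_engagement_around(posts, org, event_date):
    """Return (mean before, mean on or after) event_date for one organization.

    Either value is None when the organization has no posts in that period.
    """
    own = [p for p in posts if p["org"] == org]
    before = [engagement(p) for p in own if p["date"] < event_date]
    after = [engagement(p) for p in own if p["date"] >= event_date]
    return (
        mean(before) if before else None,
        mean(after) if after else None,
    )


# Hypothetical training date used as the cut-off.
before_a, after_a = mean_engagement_around(posts, "CSO A", date(2021, 5, 1))
before_b, after_b = mean_engagement_around(posts, "CSO B", date(2021, 5, 1))
```

A raw difference like this cannot separate the effect of a training from that of external events such as lockdowns or elections, which is one reason an experimental comparison is needed.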

Though organizations learned new skills that may ultimately benefit their resiliency, findings indicate that the training in social media use did not positively or negatively impact network connectedness. These findings gave novel insights into the challenges of designing and implementing effective social media capacity-building activities. Results provided to USAID highlighted opportunities and challenges in supporting a vibrant and resilient civil society able to participate in and reap the benefits of future USAID funding.  

Web Scraping as a Tool for the Future…

There is great potential for applying this type of innovative data collection in other ways. For example, social media scraping could be used to identify patterns in the spread of dis- and malinformation. Understanding those patterns and the context around them may help USAID and others identify opportunities for programming to combat disinformation. Web scraping has also been used in the past to provide real-time information for monitoring a variety of programs, allowing those programs to adapt and respond to changing conditions. Other opportunities include identifying mechanisms to support networks of activists and human rights defenders. While social media is often used to identify and prosecute these vulnerable groups, strong social networks also contribute to knowledge sharing and improved security for them.
