The following was written by CTO Malcolm Kapuza, who explains how alternative finance data in the Capital Finder is collected, how funders are categorized, and how we use natural language processing and machine learning to enhance our database.
As I mentioned in the first part of this three-part series, the Capital Finder has many useful applications. In this second part, I go into detail about where we source our different alternative finance data points and how we turn this data into one coherent and intuitive view of developing world alternative finance.
This is a semi-technical explanation to show what’s “under the hood,” and to demonstrate what we do at AlliedCrowds that is so different from other organizations in this space.
Sourcing and maintaining the data
The key to a product like the Capital Finder is having the most current and accurate data in the industry. There are three ways that we stay on top of this to ensure that our clients are able to make strategic decisions and inform research based on the best available data out there.
Firstly, we have a team of multilingual analysts, spread throughout the developing world. Local knowledge is crucial in the sourcing process, as it minimizes language barriers and unforeseen geographic constraints. Our analysts have a deep understanding of the alternative finance space, meaning finding new capital providers is relatively easy.
Secondly, we have developed deep industry connections through our time analyzing and researching this space, which means that we are among the first to know when a new funder opens up, or an existing one expands into one of our target geographies.
Finally, we are constantly pushing the envelope when it comes to innovating with technology solutions at AlliedCrowds. We have developed programmatic processes that flag new capital providers as they emerge and alert us when information in our database has gone stale.
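To give a flavor of what a staleness alert can look like, here is a minimal sketch. It is an illustration only, not the Capital Finder's actual code: the record fields (`name`, `last_verified`), the function name `find_stale`, and the 180-day threshold are all assumptions made for this example.

```python
from datetime import date, timedelta


def find_stale(records, today, max_age_days=180):
    """Return the names of records last verified more than max_age_days ago.

    The record layout and the default threshold are hypothetical.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [r["name"] for r in records if r["last_verified"] < cutoff]


# Hypothetical sample records, for illustration only.
records = [
    {"name": "Funder A", "last_verified": date(2024, 1, 10)},
    {"name": "Funder B", "last_verified": date(2024, 11, 2)},
]

stale = find_stale(records, today=date(2024, 12, 1))
print(stale)
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check easy to test and to re-run for any past date.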
The quality and accuracy of our alternative finance data is only one aspect of what we do. The other is our ability to pair relevant projects with capital providers, as well as categorize capital providers based on continually changing criteria.
Given the sheer size of our data and the relatively small size of our team, we depend on cutting-edge technologies to make this possible. I will outline three use cases to give some idea of why this works so well and how we’re able to do it with minimal resources.
Alternative finance data collection
AlliedCrowds has streamlined and continues to improve our data collection process. Our main focus is to increase automation, while also improving accuracy and data integrity.
We gather text from thousands of websites and millions of web pages, creating a data warehouse of text. Recent developments in Natural Language Processing (NLP) have allowed us to rapidly and continuously improve our view of the space, because it takes only moments to process all of this text rather than months to revisit every website individually. This means that our insights are faster, more accurate and more scalable than if this analysis were completed manually.
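One way to see why processing stored text takes moments while revisiting sites takes months is an inverted index: each word maps to the pages that contain it, so a lookup never has to re-read the pages. The sketch below is a simplified illustration under assumed inputs. The URLs, the page text, and the `build_index` helper are hypothetical, and a real NLP pipeline would do far more than split text into words.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in set(re.findall(r"[a-z]+", text.lower())):
            index[token].add(url)
    return index


# Hypothetical stand-in for a warehouse of scraped page text.
pages = {
    "https://example.org/a": "Microfinance loans for women entrepreneurs",
    "https://example.org/b": "Equity crowdfunding for solar startups",
    "https://example.org/c": "Crowdfunding and microfinance news",
}

index = build_index(pages)
hits = sorted(index["crowdfunding"])
print(hits)
```

Once the index is built, answering "which pages mention crowdfunding?" is a single dictionary lookup, no matter how many pages were collected.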
An example of the sort of data we can showcase using the Capital Finder.
Additionally, we source data from third-party APIs in order to get a more complete and accurate view of each capital provider. These sources are invaluable, each giving us unique and actionable data, and each working as a trusted cross-check that allows us to spot anomalies. These include social media platforms, news agencies, development institutions, public records, etc.
We crowdsource information that we cannot collect programmatically and which would be prohibitively expensive to collect through analysts. We have used technology to create straightforward ways for providers from our database to deliver us valuable information. This streamlined process is simple and incentivized with increased visibility on the Capital Finder and inbound traffic, which has led to high engagement.
As our technology advances, we have begun to reduce the workload of our analyst team. For instance, analysts do not fill in country- and sector-level information, because we have found our algorithms to be more accurate, faster, and more cost-efficient than manual input. This reduces human error in our data and allows us to scale massively.
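As a rough illustration of tagging country and sector from text instead of by hand, here is a minimal keyword-based sketch. It is an assumption-laden toy, not our production algorithm: the `COUNTRIES` list, the `SECTOR_KEYWORDS` mapping, and the `tag_text` function are invented for this example, and single-word matching like this would miss multi-word country names.

```python
import re

# Hypothetical vocabularies; a real system would cover far more terms.
COUNTRIES = ["Kenya", "Nigeria", "India"]
SECTOR_KEYWORDS = {
    "agriculture": ["farm", "farmers", "agriculture", "crop"],
    "energy": ["solar", "energy", "electricity"],
}


def tag_text(text):
    """Return the countries and sectors whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    countries = [c for c in COUNTRIES if c.lower() in words]
    sectors = [
        sector
        for sector, keywords in SECTOR_KEYWORDS.items()
        if any(k in words for k in keywords)
    ]
    return {"countries": countries, "sectors": sectors}


tags = tag_text(
    "We finance solar home systems for smallholder farmers in Kenya and Nigeria."
)
print(tags)
```

Because the vocabularies live in one place, improving a keyword list immediately improves the tags for every provider, which is the kind of central management described in the next paragraph.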
Through this process, we are close to eliminating much of the decision-making from our analyst roles. The goal is for every entry to be fact-based and subsequently fact-checked, which eliminates the need for time-consuming and costly training and removes ambiguity from the data collection process. This also ensures scalability and consistency throughout the system, as well as much lower maintenance costs, as we eliminate dependence on any given analyst’s specific skill set or expertise. The system channels most of the decision-making and reasoning to the highest level and therefore allows the Capital Finder to be managed centrally. Anything that an analyst does to improve the system gets spread throughout all 138 countries, as well as all sectors and provider types.
These advances have allowed us to create a distinctly effective matching system. With our data pool, we are able to comb through millions of web pages and target specific keywords and phrases on these sites. Because we also track social followings, recent trends, public filing information, and unique provider statistics, we are able to gauge the suitability of a given project for each provider.
The goal of our matching algorithm is much like that of Google’s PageRank algorithm. When you use Google to query a phrase, Google not only returns every website that is relevant to your search, but also orders them based on which it has determined will be most relevant. Since this algorithm is so effective, you rarely look beyond the first or second search result. We are making it so that finding a capital provider is just as easy.
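The idea of returning providers in order of relevance can be sketched in a few lines. To be clear, this is neither PageRank nor our actual matching algorithm. It is a toy score, assumed for illustration, that filters by country and then ranks by the share of a project's keywords a provider mentions. The data layout and the names `suitability` and `rank_providers` are hypothetical.

```python
def suitability(project, provider):
    """Toy score in [0, 1]: share of project keywords the provider covers.

    Providers that do not operate in the project's country score 0.
    """
    if project["country"] not in provider["countries"]:
        return 0.0
    wanted = project["keywords"]
    if not wanted:
        return 0.0
    return len(wanted & provider["keywords"]) / len(wanted)


def rank_providers(project, providers):
    """Return names of providers with a positive score, best match first."""
    scored = [(suitability(project, p), p["name"]) for p in providers]
    # Sort by descending score, then by name so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]


# Hypothetical project and providers, for illustration only.
project = {"country": "Kenya", "keywords": {"solar", "equity", "startup"}}
providers = [
    {"name": "Provider B", "countries": {"Kenya"},
     "keywords": {"microfinance", "loans", "solar"}},
    {"name": "Provider A", "countries": {"Kenya", "Uganda"},
     "keywords": {"solar", "equity", "energy"}},
    {"name": "Provider C", "countries": {"India"},
     "keywords": {"solar", "equity", "startup"}},
]

ranking = rank_providers(project, providers)
print(ranking)
```

A real score would also weigh signals such as social followings, recent trends, and public filings, but the shape is the same: score every provider, then show the best matches first.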
Stay tuned for the next post, where I discuss in more technical detail the nuances of what makes the Capital Finder so effective.