Data Acquisition: All High Schools in Belgium

I once worked in a Belgium-based educational NGO that cooperated a lot with high schools all around the country. For outreach purposes, we always wished we had a list of contact information for all the educational institutions in Belgium, but such a list seemed too perfect to exist. Still, it's public information that surely exists somewhere. So let's scrape it!

Schools in Antwerp province; Screenshot from source

Background

First, a little bit of context about how the educational system in Belgium (and the country itself) works, because it's a tad complicated. As you may know, Belgium is divided into 3 semi-autonomous regions: Flanders, Wallonia and Brussels. What you may not know is that it's also divided into 3 communities, based on the language spoken. So there's a Flemish community that spans all of Flanders and parts of Brussels, where they speak Dutch (Flemish); a French community covering most of Wallonia and the remaining parts of Brussels; and a small German-speaking community comprising several towns and villages at the eastern end of Wallonia.

Data Acquisition

It quickly turned out that acquiring the contact information for schools in the Flemish and German-speaking communities was simple and straightforward. The most complicated case was the French community, which required coding a web scraper. So let me walk you through all three stories.

Flemish community

Finding the contact information for Flemish (Dutch-speaking) schools was the easiest because the Flemish community government already has some quite advanced solutions in place. First of all, the education ministry has a proper API Portal from which you can get some useful data. But more importantly, they have lists of all the schools and their contact information ready to be downloaded as CSV!

Screenshot from https://data-onderwijs.vlaanderen.be/onderwijsaanbod/lijst?n=2&hz=true&hs=311
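
Once the CSV is downloaded, turning it into a contact list takes only a few lines. The following is a minimal sketch, not the code used for this article: the semicolon delimiter and the column names `naam`, `adres` and `e-mail` are assumptions made for illustration, so check the header row of the actual file before reusing it.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded file; the real
# export may use different column names and a different delimiter.
SAMPLE_CSV = (
    "naam;adres;e-mail\n"
    "School A;Straat 1, 2000 Antwerpen;info@school-a.example\n"
    "School B;Straat 2, 9000 Gent;\n"
)


def load_schools(csv_text, delimiter=";"):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [dict(row) for row in reader]


def contacts(rows, name_col, email_col):
    """Return (name, email) pairs, skipping rows with an empty email."""
    return [(r[name_col], r[email_col]) for r in rows if r.get(email_col)]


rows = load_schools(SAMPLE_CSV)
print(contacts(rows, "naam", "e-mail"))
```

For a file on disk, the same function can be fed with the result of `open(path, encoding="utf-8").read()`.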

German-speaking community

This community is the smallest one in Belgium, comprising only 9 municipalities and approximately 78,000 inhabitants. The small size has a certain disadvantage: the information systems of the community's government aren't as advanced as, for example, those in Flanders. But it also has one major advantage: there are only a few schools there, so it's feasible to find their websites and manually copy the contact details.

Contact information for secondary schools in the German-speaking Community of Belgium

French community

Since the government of the French community isn't known for very sophisticated IT systems, I expected this part of the data acquisition to be the most complicated, and to a certain extent, I was right. For a start, taking the screenshots you see below took me hours today because the official website kept crashing every few seconds for no apparent reason.

Screenshot from source
Screenshot from http://www.enseignement.be/index.php?page=24797&etab_id=179
Screenshot from source
  1. Go to the website with the table;
  2. Find the table, knowing that its ID is ‘liste_etablissements’;
  3. Iterate through all the rows (<tr>) in the table;
  4. For each row, extract the cells (<td>);
  5. If the first cell matches the school's name and the second cell matches its address, we have found our school!
  6. In this row, find the only link (<a>) and save it.
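
The steps above can be sketched roughly as follows. This is an illustrative version rather than the original scraper: it uses Python's standard-library `html.parser` (an assumption, chosen to avoid extra dependencies), it works on an HTML string instead of fetching the page, and the sample table, school names and class/function names are all made up. Only the table ID `liste_etablissements` and the name/address column order come from the steps above.

```python
from html.parser import HTMLParser

# Invented sample standing in for the downloaded page.
SAMPLE_HTML = """
<table id="liste_etablissements">
  <tr><th>Nom</th><th>Adresse</th><th></th></tr>
  <tr><td>Ecole A</td><td>Rue 1, 1000 Bruxelles</td>
      <td><a href="index.php?etab_id=1">Fiche</a></td></tr>
  <tr><td>Ecole B</td><td>Rue 2, 5000 Namur</td>
      <td><a href="index.php?etab_id=2">Fiche</a></td></tr>
</table>
"""


class TableRowCollector(HTMLParser):
    """Collect the cell texts and link targets of each row in one table."""

    def __init__(self, table_id="liste_etablissements"):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # > 0 while inside the target table
        self.in_cell = False  # True while inside a <td>
        self.rows = []        # one {"cells": [...], "links": [...]} per <tr>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            # Step 2: enter the target table (and track nested tables).
            if self.depth or attrs.get("id") == self.table_id:
                self.depth += 1
        elif not self.depth:
            return
        elif tag == "tr":     # step 3: a new row
            self.rows.append({"cells": [], "links": []})
            self.in_cell = False
        elif tag == "td" and self.rows:   # step 4: a new cell
            self.rows[-1]["cells"].append("")
            self.in_cell = True
        elif tag == "a" and self.rows and attrs.get("href"):
            self.rows[-1]["links"].append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.depth and self.in_cell:
            self.rows[-1]["cells"][-1] += data


def find_school_link(html, name, address):
    """Return the link of the row whose first two cells match, else None."""
    parser = TableRowCollector()
    parser.feed(html)
    for row in parser.rows:
        cells = [c.strip() for c in row["cells"]]
        # Step 5: compare name and address; step 6: take the row's link.
        if len(cells) >= 2 and cells[0] == name and cells[1] == address:
            return row["links"][0] if row["links"] else None
    return None


print(find_school_link(SAMPLE_HTML, "Ecole B", "Rue 2, 5000 Namur"))
```

Matching on exact strings is fragile if the site formats names or addresses inconsistently, so a real run would likely need some normalisation (whitespace, case, accents) before comparing.
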
Screenshot from http://www.enseignement.be/index.php?page=24797&etab_id=179

Data Scientist @ Q-Park