The structure of the data made it nearly impossible to craft RegEx queries that didn’t also capture irrelevant noise.
Character recognition was inconsistent: letters were often misread as other letters or punctuation marks.
Headers were poorly detected across the text files, making it difficult to segment the content accurately.
Images didn’t consistently align with the corresponding content in the same file.
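The first two problems can be illustrated with a minimal sketch. It is not the code used in the project: the sample string is invented for illustration, and `count_references` is a hypothetical helper. It shows how a loose pattern picks up noise, while a strict whole-word pattern misses a term that the character recognition has misread.

```python
import re

# Invented sample imitating noisy OCR output (not from the real dataset).
# "Botanv" stands in for a misread "Botany".
sample = "ANATOMY, the science of dissection. See Anatomy. Botanv and botanical gardens."

def count_references(text, subject):
    """Count whole-word, case-insensitive occurrences of a subject name."""
    # \b word boundaries stop "Botany" from matching inside longer words.
    pattern = re.compile(r"\b" + re.escape(subject) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

# A loose prefix pattern over-matches: it hits both "Botanv" and "botanical".
loose_botany = len(re.findall(r"botan", sample, re.IGNORECASE))

# The strict pattern avoids the noise but also misses the misread "Botanv".
strict_botany = count_references(sample, "Botany")
strict_anatomy = count_references(sample, "Anatomy")

print(loose_botany, strict_botany, strict_anatomy)
```

Neither pattern gives a trustworthy count on its own, which is the trade-off described in the list above.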
We created a website to visualise our investigation. The platform highlights five frequently referenced subjects in the Encyclopaedia: Anatomy, Architecture, Agriculture, Botany, and Chemistry. Each field is represented by an image. Hovering over an image reveals the number of references to that subject across different editions, offering a glimpse into how interest in certain topics changed over time. In the lower-left corner, we included a comparative visualisation of topic popularity. This element was refined based on feedback from our data holder, who noted that the concepts of “popularity” and “reference count” were originally separated across pages and could lead to misinterpretation if not visually integrated.