Presenting a paper at an international conference is a great exposure and experience. For the first time in two years, the 26th International Conference on Theory and Practice of Digital Libraries (TPDL 2022) was held in person. TPDL 2022 was hosted by the University of Padua, Italy.
Abdul Aziz (ESR 8) and Dagoberto Herrera (ESR 2) combinedly attended and presented their contribution Analysing User Involvement in Open Government Data Initiatives. The full proceedings have been published in the Lecture Notes in Computer Science (LNCS).
The conference offered a diverse range of talks, covering topics such as Open Data, Open Science, Information Retrieval and Access, Knowledge Discovery and Representation in Digital Libraries and Semantic Web Technologies and Linked Data for Digital Libraries. The first day started with a Linked Archives workshop followed by a Doctoral Consortium. To kick off TPDL 2022, Roberto di Cosmo (@rdicosmo), author of the article “Should We Preserve the World’s Software History, and Can We?”, gave the opening keynote address with the title “Why we must preserve the world’s software history, and how we can do it”. During his keynote, di Cosmo emphasised that software source code is a priceless treasure that must be shared for Open Science to succeed, as well as for security and transparency.
Following the opening keynote session, the first session, on Web Archiving, began. “Fine-Tuning BERT models to extract Named Entities from Archival Finding Aids” was presented by Lus Filipe da Costa Cunha of the Department of Informatics at University of Minho. The work they have done enhances the NER model for the Portuguese language. They built an application programming interface, a website, and an automated annotation tool.
After that there was a Booster Session where many scholars presented their findings in the field of Web Archiving and Scholarly communication. The most impressive talk was “Is it possible to identify content shifts using titles on the web?” Using Web archives and linguistic similarity, Brenda Reyes Ayala of the University of Alberta conducted a talk titled “Detecting content drift on the Web”. Her proposal to use Web page names to identify content drift was supported by research showing 92.1% recall across three collections. Not only was the execution time low compared to previous content drift detection techniques, but the detection accuracy was also high.
Apart from the web archival and scholarly communication, another extensive and the most memorable talk as a second keynote address was given by the Greek researcher Georgia Koutrika from the Athena Research Center on “What if we could simply speak to our data?”. The topic of her lecture was the need of “democratising data access” where she discussed the need to make data readily available and usable by people. She started by noting the growing visibility of the advantages of data exploration and the growing significance of data in the world at large. As the title suggests, the main focus of her presentation was on how a human user of a system (termed an intelligent data assistant) may interact naturally with the data. The talk was informative and engaging, and we learned a great deal about the current state of big data research.
Following the keynote session, the first session, on FAIR and Open Data started where we Abdul Aziz (ESR 8) and Dagoberto Herrera (ESR 2) combinedly presented our contribution namely: Analysing User Involvement in Open Government Data Initiatives. Our goal is to familiarise the audience with how Open Data Initiatives rely on the traditional supplier-driven data catalogues instead of adopting a user-driven model. Moreover, we investigated how people in the EU interact with Open Data Portals. The results of our research encourage portals to adopt more user-centric strategies.
Soon after the lunch, Booster Session started followed by the session on Text Analysis and extraction. During the booster session presenters of papers accepted for the “Accelerating Innovation Papers” track gave brief 3-minute talks. At the conclusion of the day, each of these papers was represented by a poster shown at the poster session. Eight speakers presented their innovative research at this meeting.
We found an interesting talk during the Text Analysis and extraction session, which stated that there isn’t always a standard way for funding to be commended in articles. How can we automatically recognize the funders? A talk from stackOcean’s Jonas Mielck titled “Extracting funder information from scientific articles – experiences with question answering” discussed the stackOcean’s work in this area. Rule-based methods, regular expressions, and language models and machine learning are the most common techniques for dealing with such a challenge. As a result of using a question-and-answer strategy, they were able to improve accuracy to ~0.8.
The next session was on Open Science where Esteban González from the Polytechnic University of Madrid gave the opening talk titled “FAIROs: Towards FAIR assessment in Research Objects,” he shared his research findings. There were a few other researchers who presented their work. Mónica Marrero of the Europeana Foundation gave a talk titled “Implementation and Evaluation of a Multilingual Search Pilot in the Europeana Digital Library” to kick off the next session which was basically about NLP and Recommendation.
TPDL 2022 concluded after the last session on Research and CH Data. The University of Zadar, in Croatia, will host the next TPDL2023.
In conclusion, TPDL 2022 has been a remarkable event that provided a glimpse into the future of Open Science and Open Data with enhanced web archives. For us, this was our first experience at an in-person academic conference. We were able to present our findings to an attentive audience and network with peers from all around the world.
Authors:
Abdul Aziz
University of Zaragoza, Spain
Dagoberto Herrera
University of Zaragoza, Spain