Result Page - Data Readiness Lab
Bridging the Data Gap in Organizations
The Data Readiness Lab is more than a project; it is a commitment to empowering organizations to harness the full potential of their data. Whether you are just starting your AI journey or are already an experienced professional, the lab's deliverables are designed to guide you through data-intensive processes.
The Data Readiness Lab was initiated with a core vision: to equip organizations and companies with the skills, tools, frameworks, and resources they need to improve their data readiness. Because data plays a pivotal role in realizing projects, the lab's outcomes are relevant to both the public and private sectors.
1. Case Studies
A compilation of four case studies that shed light on the intricacies of acquiring and using data, emphasizing aspects such as data availability, validity, and utility. These studies draw on experiences from:
- Strängnäs Kommun
- Sveriges Kommuner och Regioner (SKR)
- Ekonomistyrningsverket (ESV)
Insights from diverse organizations at varying stages of data maturity provide valuable takeaways. Through these examples, we want to encourage reflection and discussion on best practices, contribute to better decision-making in AI applications, and raise awareness of data readiness issues, which are often overlooked.
For a holistic understanding, we recommend beginning with these case studies. Additionally, to assess the readiness of your chosen dataset, consider the data maturity analysis method that we used in our lab:
- Roots of the Data Readiness Assessment Method:
Our approach to determining data maturity traces back to a joint venture with Gavagai and AI Sweden (see the paper “We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing”). The method evaluates data from the perspectives of availability, validity, and usability. It was tested during the project and is also available in Excel format.
2. Text Annotation Handbook (in English)
A practical guide for machine learning projects. This handbook is a hands-on guide on how to approach text annotation tasks. It provides a gentle introduction to the topic, an overview of theoretical concepts as well as practical advice. The topics covered are mostly technical, but business, ethical and regulatory issues are also touched upon. Experience with annotation and knowledge of machine learning are useful but not required. The document may serve as a primer or reference book for a wide range of professionals such as team leaders, project managers, IT architects, software developers and machine learning engineers.
3. Tools for Practitioners:
nerblackbox is an open-source tool that can be installed locally and that focuses on anonymization and pseudonymization through Named Entity Recognition (NER). It uses NER to identify personal details, which can then be deleted or replaced. Link to GitHub
nerblackbox is also documented in this paper:
Felix Stollenwerk. 2023. nerblackbox: A High-level Library for Named Entity Recognition in Python. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 174–178, Singapore. Association for Computational Linguistics.
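The identify-then-replace idea behind NER-based pseudonymization can be illustrated with a minimal Python sketch. It does not use the nerblackbox API: the `Entity` structure, the `find_spans` helper (a simple string search standing in for a trained NER model), and the placeholder format such as `[PER_1]` are all assumptions made for illustration, and entity spans are assumed not to overlap.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    """A detected entity span (hypothetical structure, not nerblackbox's output format)."""
    start: int  # index of the first character of the span
    end: int    # index one past the last character of the span
    label: str  # entity type, e.g. "PER" or "LOC"


def find_spans(text: str, surface: str, label: str) -> list[Entity]:
    """Stand-in for an NER model: locate every literal occurrence of `surface`."""
    return [Entity(m.start(), m.end(), label)
            for m in re.finditer(re.escape(surface), text)]


def pseudonymize(text: str, entities: list[Entity], numbered: bool = True) -> str:
    """Replace each entity span with a placeholder.

    With numbered=True, the same surface string always maps to the same
    placeholder (e.g. [PER_1]), so references stay linkable (pseudonymization).
    With numbered=False, only the entity type is kept (e.g. [PER]).
    Assumes the spans do not overlap.
    """
    counters: dict[str, int] = {}
    placeholders: dict[tuple[str, str], str] = {}
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        if numbered:
            key = (ent.label, text[ent.start:ent.end])
            if key not in placeholders:
                counters[ent.label] = counters.get(ent.label, 0) + 1
                placeholders[key] = f"[{ent.label}_{counters[ent.label]}]"
            replacement = placeholders[key]
        else:
            replacement = f"[{ent.label}]"
        pieces.append(text[cursor:ent.start])  # untouched text before the span
        pieces.append(replacement)
        cursor = ent.end
    pieces.append(text[cursor:])  # remainder after the last span
    return "".join(pieces)


text = "Anna Svensson met Erik in Strängnäs. Anna Svensson left early."
entities = (find_spans(text, "Anna Svensson", "PER")
            + find_spans(text, "Erik", "PER")
            + find_spans(text, "Strängnäs", "LOC"))
print(pseudonymize(text, entities))
```

In a real project the spans would come from a trained NER model rather than a string search, and the quality of the result depends entirely on how many personal details that model actually finds.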
Tool for cross-annotation:
x-annotate is a project management tool for cross-annotation with popular annotation frameworks. It provides a simple command line interface (CLI) that lets you split data between annotators, merge annotations, and resolve conflicts. Link to GitHub
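The split, merge, and conflict steps of a cross-annotation workflow can be sketched in plain Python. This is not the x-annotate CLI or its internal logic: the function names, the round-robin assignment, and the dictionary layout are assumptions chosen to illustrate the idea that each item is labeled by more than one annotator and disagreements are surfaced for review.

```python
def split_with_overlap(item_ids, annotators, per_item=2):
    """Assign every item to `per_item` different annotators, round-robin."""
    if per_item > len(annotators):
        raise ValueError("per_item cannot exceed the number of annotators")
    assignments = {annotator: [] for annotator in annotators}
    n = len(annotators)
    for i, item in enumerate(item_ids):
        for k in range(per_item):
            # Consecutive annotators (wrapping around) share the same item.
            assignments[annotators[(i + k) % n]].append(item)
    return assignments


def merge_annotations(annotations):
    """Merge per-annotator labels.

    `annotations` maps annotator -> {item_id: label}.
    Returns (merged, conflicts): items where all annotators agree go to
    `merged`; items with disagreement go to `conflicts` together with
    every annotator's vote, ready for manual resolution.
    """
    votes_by_item = {}
    for annotator, labels in annotations.items():
        for item, label in labels.items():
            votes_by_item.setdefault(item, {})[annotator] = label

    merged, conflicts = {}, {}
    for item, votes in votes_by_item.items():
        if len(set(votes.values())) == 1:
            merged[item] = next(iter(votes.values()))
        else:
            conflicts[item] = votes
    return merged, conflicts


assignments = split_with_overlap(["d1", "d2", "d3"], ["ann", "bo", "cy"])
print(assignments)

annotations = {
    "ann": {"d1": "PER", "d3": "LOC"},
    "bo": {"d1": "PER", "d2": "ORG"},
    "cy": {"d2": "LOC", "d3": "LOC"},
}
merged, conflicts = merge_annotations(annotations)
print(merged)
print(conflicts)
```

Leaving conflicts unresolved in the merge step is a deliberate choice in this sketch: a person can then decide each disputed item, which is usually safer than an automatic tie-break when only two annotators see each item.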
4. Training Material on Data Readiness
This material is available on our platform, MyAI. Sign up for free to access our resources on data readiness.
The training aims to provide an overview and the opportunity to get started with your project by taking a closer look at the lessons learned and methods used in the Data Readiness Lab.
The material is divided into three parts, so you can choose what suits you and work through the parts in whichever order you prefer.