This month (19th and 20th of June 2024), Common Crawl Foundation members Thom Vaughan and Pedro Ortiz Suarez attended AI_dev: Open Source GenAI & ML Summit, a conference in Paris, France, organized by LF AI & Data (Linux Foundation).
We had the great privilege of meeting some of the brightest minds in the fields of Artificial Intelligence, Machine Learning, and Open Source Software.
The conference speakers discussed the enormous potential of Artificial Intelligence and its applications across numerous industries, with topics ranging from advances in Machine Learning algorithms and their practical applications to the ethics of AI deployment.
The conference featured workshops and technical sessions covering a range of topics, all with a focus on Open Source solutions.
Talk: “Navigating the Ethical Landscape: Responsible AI in Practice”
This panel included:
- Adrián González Sánchez (Professor at HEC Montréal and Instituto de Empresa Madrid)
- Mirko Boehm (Community Development, Linux Foundation Europe)
- Oita Coleman (Senior Advisor at Open Voice TrustMark Initiative)
- Pedro Ortiz Suarez (Senior Research Scientist at Common Crawl)
The panel moderator and presenter was Anni Lai (Head of Open Source Operations at LF AI & Data Foundation).
As a panelist on this talk, Pedro highlighted how Common Crawl’s vast repository of web data can be used responsibly to train large language models. He stressed the need for transparency, fairness, and accountability in data collection and usage, and mentioned some potential difficulties.
The panelists emphasized the importance of diverse and unbiased training data to ensure AI systems are fair and unprejudiced.
We look forward to implementing the insights we gained at the conference and continuing our work to make data more accessible and equitable for everyone.
If you have any questions or want to discuss any of these topics further, please feel free to join our discussions on Google Groups and Discord.