Staff & Volunteers

Rich Skrenta — Director

Rich Skrenta is an experienced technologist and serial entrepreneur with a background in the search and social spaces. He was founder and CEO of Blekko, a web search engine; the Open Directory Project, an innovative community-edited search platform; Topix, a news aggregator combined with a social forum; and Tobiko, a restaurant recommendation platform. Rich has also run web-scale crawling and machine learning teams for several large organizations, including Meta, IBM Watson, and Netscape/America Online.

Rich has a BA in Computer Science from Northwestern University, and is the author of a number of open-source software projects.

Greg Lindahl — Engineer

Greg Lindahl is an astrophysicist and technologist who has advised Common Crawl since 2013. He was previously the Founder and CTO of Blekko, an internet search engine. Before joining Common Crawl full-time in 2023, Greg was a member of the Event Horizon Telescope Collaboration, working at the Center for Astrophysics | Harvard & Smithsonian. He has also contributed to the Wayback Machine at the Internet Archive. Greg is an experienced entrepreneur who has founded three successful startups. He began his industry career at the quantitative trading firm D. E. Shaw & Co.

Greg has an MA in Astronomy from the University of Virginia and a BA in Physics and Math from Brandeis University.

Sebastian Nagel — Engineer

Sebastian is a programmer and computational linguist living in the south of Germany. He is responsible for running and maintaining the crawler and can help you use the data. Prior to joining Common Crawl, he worked as a software developer at Exorbyte, implementing search and data quality solutions.

Sebastian holds a PhD in computational linguistics from the University of Munich. He studied linguistics (Slavic languages) and cultural anthropology in Munich, Kazan and Prague. Sebastian is a committer of Apache Nutch and a member of the Apache Software Foundation.

 

Board of Directors

Gil Elbaz — Chairman

Gil Elbaz is an accomplished entrepreneur and investor and a pioneer of natural language technology. In 1998, Gil co-founded Applied Semantics, the original developer of AdSense. In 2003, the company was acquired by Google and Gil became Engineering Director for Google’s new LA office. In 2007, Gil founded Factual to democratize access to quality data. In 2020, Factual merged with Foursquare, and today Gil is Co-Chairman of the board of the combined entity, which generated $150m in combined revenue at the time of the merger. With David Waxman, Gil launched TenOneTen Ventures in 2013 and raised their first true fund in 2014. He joined TenOneTen on a full-time basis upon relinquishing his CEO role in 2020.

Gil has also been very active in the non-profit arena. Most notably, in 2007 he founded the Common Crawl Foundation, which provides a petabyte-scale web crawl free of cost. He also sits on the Board of Directors of the XPRIZE Foundation, which leverages the power of competition to catalyze innovation.

Gil earned his bachelor’s degree from the California Institute of Technology with a double major in Engineering & Applied Science and Economics.

Carl Malamud — Secretary and Treasurer

Carl Malamud is the President of Public.Resource.Org and has been on the Common Crawl Board of Directors since 2008. Malamud was the founder of the Internet Multicasting Service where he was responsible for creating the first Internet radio station, for putting the U.S. Securities and Exchange Commission’s EDGAR database on-line, and for creating the Internet 1996 World Exposition. Malamud is the author of eight books, was a visiting professor at the MIT Media Laboratory and was chairman of the Internet Software Consortium. He received the Berkman Award from Harvard and the Pioneer Award from the EFF.

Nova Spivack

Nova Spivack is a technology futurist and serial entrepreneur whose career has spanned more than 20 ventures and 2 IPOs. He has been on the Common Crawl Board of Directors since 2008. He is one of the leading voices on the next generation of search, semantic web technology, and social media. He works as a producer of emerging technology ventures including, most recently, Klout, Bottlenose, Live Matrix, and The Daily Dot. In 1994, he co-founded one of the first Web startups, EarthWeb, which led to a record-breaking IPO in 1998 and a second IPO of Dice. He founded Lucid Ventures in 2001 and the semantic web venture Radar Networks, which launched Twine.com. Spivack worked with Stanford Research International (SRI) to conceive and co-found their global business incubator, nVention, and participated in the DARPA CALO program, the most ambitious artificial intelligence project in US history. He has authored more than 30 pending and granted patents in the areas of search, semantics, and personalization and is considered a leading pioneer of next-generation web technology. Nova believes strongly in an open web and open standards, and supports numerous charitable causes around the world.

Advisory Board

Kurt Bollacker — Advisor

Kurt is a computer scientist with a research background in the areas of machine learning, digital libraries, semantic networks, and electro-cardiographic modeling. He received a Ph.D. in Computer Engineering from the University of Texas at Austin. He was co-creator of the CiteSeer research tool as a visiting researcher at the NEC Research Institute, the technical director of the Internet Archive, and a biomedical research engineer at the Duke University Medical Center. He was Chief Scientist at Metaweb Technologies until February 2009. He is currently pursuing research on long-term digital archiving as the Digital Research Director at the Long Now Foundation as well as serving as a consulting Data Scientist at InfoChimps. Kurt is a philanthropist active with many organizations that promote openness, transparency and preservation. As an Advisor at Common Crawl, he provides the organization with valuable advice and insight into crawl technology, big data processing, open innovation, products and collaborations.

Kevin DeBré — Legal Counsel

Kevin is a highly respected Intellectual Property (IP) attorney who has continually worked at the forefront of the evolving IP landscape. He is a Partner and the Chairman of the Intellectual Property and Technology Transactions Department at the firm Stubbs Alderton & Markiles, LLP. He is an author and frequent speaker on technology commercialization and intellectual property licensing. Kevin was selected in 2006, 2008 and 2009 as a Southern California Super Lawyer and is the founder and Chair of the Licensing Interest Group of the California State Bar Intellectual Property Section. Prior to joining Stubbs Alderton & Markiles, Kevin was a partner in leading international law firms, including Brobeck Phleger & Harrison, LLP, where he headed the firm’s technology transactions practice in Southern California. After law school, he served as a judicial law clerk for Hon. John G. Davies, United States District Court for the Central District of California. Kevin received his J.D. from Hastings College of the Law and his B.S. degree from the University of California, Davis.

Jim Hendler — Advisor

Jim Hendler is the Tetherless World Professor of Computer and Cognitive Science and the Head of the Computer Science Department at RPI. He is also a faculty affiliate of the Experimental Multimedia Performing Arts Center (EMPAC), serves as a Director of the UK’s charitable Web Science Trust and is a visiting Professor of Computer Science at De Montfort University in Leicester, UK. Hendler has authored about 200 technical papers in the areas of Semantic Web, artificial intelligence, agent-based computing and high-performance processing. One of the early innovators of the “Semantic Web,” Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a former member of the US Air Force Science Advisory Board, and is a Fellow of the American Association for Artificial Intelligence, the British Computer Society, the IEEE and the AAAS. He is also the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA) and was awarded a US Air Force Exceptional Civilian Service Medal in 2002. In 2010, Hendler was named one of the 20 most innovative professors in America by Playboy magazine and was selected as an “Internet Web Expert” by the US government, working with the data.gov project. He is the Editor-in-Chief emeritus of IEEE Intelligent Systems and was the first computer scientist to serve on the Board of Reviewing Editors for Science. In 2012, he was one of the inaugural recipients of the Strata Conference “Big Data” awards for his work on large-scale open government data, and he is a columnist and associate editor of the Big Data journal.

Peter Norvig — Advisor

Peter Norvig is Director of Research at Google and a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. From 2002 to 2005 he was Director of Search Quality, responsible for the core web search algorithms. Previously he was the head of the Computational Sciences Division at NASA Ames Research Center, making him NASA’s senior computer scientist. He has served as an assistant professor at the University of Southern California and a research faculty member at the University of California at Berkeley Computer Science Department, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He has over fifty publications in Computer Science, concentrating on Artificial Intelligence, Natural Language Processing and Software Engineering, including the books Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg PowerPoint Presentation and the world’s longest palindromic sentence.

Jennifer Pahlka — Advisor

Jennifer Pahlka is the founder, executive director and board chair of Code for America. Previously, she ran the Web 2.0 and Gov 2.0 events for TechWeb, in conjunction with O’Reilly Media, and co-chaired the successful Web 2.0 Expo. Before that, she spent eight years at CMP Media where she ran the Game Developers Conference, Game Developer magazine, and Gamasutra.com; there she also launched the Independent Games Festival and served as Executive Director of the International Game Developers Association. Jennifer’s early career was spent in the non-profit sector. She is a graduate of Yale University and lives in Oakland, California with her daughter and six chickens.

Boris Shimanovsky — Advisor

Boris took his first programming course in the seventh grade and wrote a video game as his first class project. In support of open data and the open web, he generously donates his time to advise Common Crawl on technical matters and to share his visionary insight on the digital ecosystem. Boris is currently employed as the Director of Engineering at Factual, where he’s working to make data more accessible and transparent. Prior to joining Factual, he was CTO at Xap, which he helped build from eight employees to a thriving business with millions of users and over a hundred employees. In his junior year of college, he built a site that quickly became the web’s 27th most trafficked and then optimized it enough to run from two machines in his den. He’s not retired now because he really wanted to graduate. Boris holds a BS in Physiological Science and an MS in Computer Science, both from UCLA. He put himself through school working night shifts as an ambulance driver. Boris lives in Los Angeles with his badass wife and amazing boys. He is a lousy philosopher, a mediocre poet, and will someday become a great chef.

Pete Skomoroch — Advisor

Pete Skomoroch is a Principal Data Scientist at LinkedIn in Mountain View, CA, focused on reputation systems, collaborative filtering, and building data-driven products. He leads a team of Data Scientists focused on Identity at LinkedIn and was the inventor of LinkedIn Skills. Prior to LinkedIn, he was based in Washington, DC, where he focused on mining insights from search query data as the Director of Advanced Analytics at Juice Analytics and as a Sr. Research Engineer at AOL Search. While in DC, he also founded DataWrangling.com, which provided custom data mining solutions to clients in bioinformatics, finance, and cloud computing. He spent the previous 6 years in Boston implementing biodefense pattern detection algorithms for streaming sensor data at MIT Lincoln Laboratory and constructing predictive models for large retail datasets at Profitlogic (now Oracle Retail). Pete has a B.S. in Mathematics and Physics from Brandeis University and did graduate coursework in machine learning at MIT.

Danny Sullivan — Advisor

Widely considered a leading “search engine guru,” Danny Sullivan has been helping webmasters, marketers and everyday web users understand how search engines work for 15 years. Danny’s expertise about search engines is often sought by the media, and he has been quoted in places like The Wall St. Journal, USA Today, The Los Angeles Times, Forbes, The New Yorker, Newsweek and ABC’s Nightline. Danny began covering search engines in late 1995, when he undertook a study of how they indexed web pages. The results were published online as “A Webmaster’s Guide To Search Engines,” a pioneering effort to answer the many questions site designers and Internet publicists had about search engines. Danny currently heads up Search Engine Land, which covers search marketing and search engine news. He produces the SMX: Search Marketing Expo conference series and writes a personal blog called Daggle.

Pete Warden — Advisor

Pete Warden is a British-born programmer living in San Francisco. After spending over a decade as a software engineer, including 5 years at Apple, he’s now focused on a career as a mad scientist. He is currently gathering, analyzing and visualizing the flood of web data that’s recently emerged, trying to turn it into useful information without trampling on people’s privacy. Pete is the current CTO of Jetpac, a site for sharing travel photos, tips, and guides among friends. Passionate about large-scale data processing and visualization, he writes about the topic regularly on his blog and as a contributor to O’Reilly Radar.
