Staff & Volunteers
Sara Crouse — Director
Sara is driven by the ideal of universal access to knowledge in the digital age. To apply this in practice, she promotes the development and online use of open platforms, tools, and resources that empower people everywhere to freely learn, create, collaborate, and innovate. Over the course of two decades, Sara has worked extensively with organizations in technology and education including the Wikimedia Foundation, Creative Commons, Wiki Education Foundation, and Cambridge University. Sara’s core experience is in optimizing resources and partnerships to efficiently sustain and scale technology nonprofits in a for-profit world.
Sara received her master’s degree from New York University and her bachelor’s degree from Georgetown University. She is on the board of the Global Lives Project.
Sebastian Nagel — Crawl Engineer & Data Scientist
Sebastian is a programmer and computational linguist living in the
south of Germany. He is responsible for running and maintaining the
crawler and can help you use the data. Prior to joining Common
Crawl, he worked as a software developer at Exorbyte, implementing
search and data quality solutions.
Sebastian holds a PhD in computational linguistics from the University of
Munich. He studied linguistics (Slavic languages) and cultural
anthropology in Munich, Kazan and Prague. Sebastian is a committer of
Apache Nutch and a member of the Apache Software Foundation.
Website updates are underway! Active volunteers and consultants will be listed here soon…
Board of Directors
Gil Elbaz — Chairman
Gil Elbaz is an accomplished entrepreneur and pioneer of natural language technology. In 1998, Gil co-founded Applied Semantics Inc. (ASI), which developed contextual advertising products including ASI’s AdSense. In 2003, Google acquired Applied Semantics and Gil stayed on as the Engineering Director for the Santa Monica office to continue the development of AdSense and other products. Prior to founding ASI, Gil worked in engineering roles at IBM, Sybase and SGI.
In 2007, Gil left Google to explore new ideas and passions and went on to found Factual, an open data aggregation platform. Factual launched its public beta in October 2009 in an effort to maximize data accuracy, transparency, and accessibility. In December 2010, Factual raised a $25m Series A round led by Andreessen Horowitz and Index Ventures to build upon its mission to make structured data broadly accessible to the development community.
Gil earned his bachelor’s degree from the California Institute of Technology with a double major in Engineering & Applied Science and Economics. Active in a number of non-profits, he serves on the Boards of Trustees at the California Institute of Technology as well as the X Prize Foundation. Gil is also the founder and chairman of the board of directors for the Common Crawl Foundation and is strongly committed to ensuring a truly open web.
Carl Malamud — Secretary and Treasurer
Carl Malamud is the President of Public.Resource.Org and has been on the Common Crawl Board of Directors since 2008. Malamud was the founder of the Internet Multicasting Service where he was responsible for creating the first Internet radio station, for putting the U.S. Securities and Exchange Commission’s EDGAR database on-line, and for creating the Internet 1996 World Exposition. Malamud is the author of eight books, was a visiting professor at the MIT Media Laboratory and was chairman of the Internet Software Consortium. He received the Berkman Award from Harvard and the Pioneer Award from the EFF.
Nova Spivack is a technology futurist and serial entrepreneur whose career has spanned more than 20 ventures and 2 IPOs. He has been on the Common Crawl Board of Directors since 2008. He is one of the leading voices on the next generation of search, semantic web technology, and social media. He works as a producer of emerging technology ventures including, most recently, Klout, Bottlenose, Live Matrix, and The Daily Dot. In 1994, he co-founded one of the first Web startups, EarthWeb, which led to a record-breaking IPO in 1998 and a second IPO of Dice. He founded Lucid Ventures in 2001 and the semantic web venture Radar Networks, which launched Twine.com. Spivack worked with Stanford Research International (SRI) to conceive and co-found their global business incubator, nVention, and participated in the DARPA CALO program, the most ambitious artificial intelligence project in US history. He has authored more than 30 pending and granted patents in the areas of search, semantics, and personalization and is considered a leading pioneer of next-generation web technology. Nova believes strongly in an open web and open standards, and supports numerous charitable causes around the world.
Kurt Bollacker — Advisor
Kurt is a computer scientist with a research background in the areas of machine learning, digital libraries, semantic networks, and electro-cardiographic modeling. He received a Ph.D. in Computer Engineering from The University of Texas at Austin. He was co-creator of the CiteSeer research tool as a visiting researcher at the NEC Research Institute, the technical director of the Internet Archive, and a biomedical research engineer at the Duke University Medical Center. He was Chief Scientist at Metaweb Technologies until February 2009. He is currently pursuing research on long-term digital archiving as the Digital Research Director at the Long Now Foundation as well as serving as a consulting Data Scientist at InfoChimps. Kurt is a philanthropist active with many organizations that promote openness, transparency and preservation. As an Advisor at Common Crawl, he provides the organization with valuable advice and insight into the crawl technology, big data processing, open innovation, products and collaborations.
Kevin DeBré — Legal Counsel
Kevin is a highly respected Intellectual Property (IP) attorney who has continually worked at the forefront of the evolving IP landscape. He is a Partner and the Chairman of the Intellectual Property and Technology Transactions Department at the firm Stubbs Alderton & Markiles, LLP. He is an author and frequent speaker on technology commercialization and intellectual property licensing. Kevin was selected in 2009, 2008 and 2006 as a Southern California Super Lawyer and is the founder and Chair of the Licensing Interest Group of the California State Bar Intellectual Property Section. Prior to joining Stubbs Alderton & Markiles, Kevin was a partner in leading international law firms, including Brobeck Phleger & Harrison, LLP, where he headed the firm’s technology transactions practice in Southern California. After law school, he served as a judicial law clerk for Hon. John G. Davies, United States District Court for the Central District of California. Kevin received his J.D. from Hastings College of the Law and his B.S. degree from the University of California, Davis.
Lisa Green — Advisor
Lisa is motivated by a strong belief in the power of open systems to drive innovation in education, arts and research. Over the last several years she has been active in the areas of Open Access publishing, Open Science, Open Data, copyright, digital rights and policy. Lisa was Chief of Staff at Creative Commons and served as the director of Common Crawl from 2011 to 2015. She holds a PhD in physical chemistry from the University of California, Berkeley.
Jim Hendler — Advisor
Jim Hendler is the Tetherless World Professor of Computer and Cognitive Science and the Head of the Computer Science Department at RPI. He is also a faculty affiliate of the Experimental Multimedia Performing Arts Center (EMPAC), serves as a Director of the UK’s charitable Web Science Trust and is a visiting Professor of Computer Science at De Montfort University in Leicester, UK. Hendler has authored about 200 technical papers in the areas of Semantic Web, artificial intelligence, agent-based computing and high-performance processing. One of the early innovators of the “Semantic Web,” Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a former member of the US Air Force Science Advisory Board, and is a Fellow of the American Association for Artificial Intelligence, the British Computer Society, the IEEE and the AAAS. He is also the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA) and was awarded a US Air Force Exceptional Civilian Service Medal in 2002. In 2010, Hendler was named one of the 20 most innovative professors in America by Playboy magazine and was selected as an “Internet Web Expert” by the US government working with the data.gov project. He is the Editor-in-Chief emeritus of IEEE Intelligent Systems and was the first computer scientist to serve on the Board of Reviewing Editors for Science. In 2012, he was one of the inaugural recipients of the Strata Conference “Big Data” awards for his work on large-scale open government data, and he is a columnist and associate editor of the Big Data journal.
Eva Ho — Advisor
Eva Ho is currently the VP of Marketing & Operations at Factual, an open data platform that leverages large-scale aggregation and community exchange. Eva provides Common Crawl with valuable insight and advice on all aspects of the organization. Prior to Factual, she was a Sr. Product Marketing Manager at Google and YouTube for 5 years, and was the head of marketing for Applied Semantics, a company sold to Google in 2003. She also serves on the Board of Directors of Iridescent, a science education non-profit; First Descents, a non-profit adventure outfit for cancer fighters and survivors; and Whole Child LA, a non-profit pediatric pain clinic. Eva holds an MBA from Cornell and a BA in Biology from Harvard University.
Greg Lindahl — Advisor
Greg Lindahl is currently working at the Internet Archive on adding search to the Wayback Machine web archive. He was previously Founder and CTO at blekko, a startup search engine that was acquired by IBM Watson in 2015. Greg was also Founder and Distinguished Engineer at PathScale, where he was the architect of the InfiniPath low-latency InfiniBand HCA, a descendant of which is now shipping as the Intel Omni-Path network.
Bill Michels — Advisor
Bill is currently the VP of Product Management and Partnerships at Factual. He joined in 2009 and prior to that worked at Yahoo! for about six years, during the last two of which he was the GM and Senior Director of the Yahoo! Search Platform. There he managed the product management, design, marketing and business development functions for Yahoo! BOSS, which grew from inception to over 1B queries/month. Before this he was the Director of Int’l Search Business Operations & Product Strategy, a $1B+ P&L with 500+ employees. Other experience includes Equity Research Associate at UBS on the #1 ranked Institutional Investor telecom research team. At one point he wore a white coat and worked in a solid-state physics research lab. He has an MBA from Columbia University Business School and a BA in Geology from Colby College.
Peter Norvig — Advisor
Peter Norvig is Director of Research at Google and a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. From 2002 to 2005 he was Director of Search Quality, responsible for the core web search algorithms. Previously he was the head of the Computational Sciences Division at NASA Ames Research Center, making him NASA’s senior computer scientist. He has served as an assistant professor at the University of Southern California and a research faculty member at the University of California at Berkeley Computer Science Department, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He has over fifty publications in Computer Science, concentrating on Artificial Intelligence, Natural Language Processing and Software Engineering, including the books Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg PowerPoint Presentation and the world’s longest palindromic sentence.
Jennifer Pahlka — Advisor
Jennifer Pahlka is the founder, executive director and board chair of Code for America. Previously, she ran the Web 2.0 and Gov 2.0 events for TechWeb, in conjunction with O’Reilly Media, and co-chaired the successful Web 2.0 Expo. Before that, she spent eight years at CMP Media where she ran the Game Developers Conference, Game Developer magazine, and Gamasutra.com; there she also launched the Independent Games Festival and served as Executive Director of the International Game Developers Association. Jennifer’s early career was spent in the non-profit sector. She is a graduate of Yale University and lives in Oakland, California with her daughter and six chickens.
Boris Shimanovsky — Advisor
Boris took his first programming course in the seventh grade and wrote a video game as his first class project. In support of open data and the open web, he generously donates his time to advise Common Crawl on technical matters and to share his visionary insight on the digital ecosystem. Boris is currently employed as the Director of Engineering at Factual, where he’s working to bring greater accessibility and transparency to data. Prior to joining Factual, he was CTO at Xap, which he helped build from eight employees to a thriving business with millions of users and over a hundred employees. In his junior year of college, he built a site that quickly became the web’s 27th most trafficked and then optimized it enough to run from two machines in his den. He’s not retired now because he really wanted to graduate. Boris holds a BS in Physiological Science and an MS in Computer Science, both from UCLA. He put himself through school working night shifts as an ambulance driver. Boris lives in Los Angeles with his badass wife and amazing boys. He is a lousy philosopher, a mediocre poet, and will someday become a great chef.
Pete Skomoroch — Advisor
Pete Skomoroch is a Principal Data Scientist at LinkedIn in Mountain View, CA, focused on reputation systems, collaborative filtering, and building data-driven products. He leads a team of Data Scientists focused on Identity at LinkedIn and was the inventor of LinkedIn Skills. Prior to LinkedIn, he was based in Washington, DC, where he focused on mining insights from search query data as the Director of Advanced Analytics at Juice Analytics and as a Sr. Research Engineer at AOL Search. While in DC, he also founded DataWrangling.com, which provided custom data mining solutions to clients in bioinformatics, finance, and cloud computing. He spent the previous 6 years in Boston implementing Biodefense pattern detection algorithms for streaming sensor data at MIT Lincoln Laboratory and constructing predictive models for large retail datasets at Profitlogic (now Oracle Retail). Pete has a B.S. in Mathematics and Physics from Brandeis University and did graduate coursework in machine learning at MIT.
Danny Sullivan — Advisor
Widely considered a leading “search engine guru,” Danny Sullivan has been helping webmasters, marketers and everyday web users understand how search engines work for 15 years. Danny’s expertise about search engines is often sought by the media, and he has been quoted in places like The Wall Street Journal, USA Today, The Los Angeles Times, Forbes, The New Yorker, Newsweek and ABC’s Nightline. Danny began covering search engines in late 1995, when he undertook a study of how they indexed web pages. The results were published online as “A Webmaster’s Guide To Search Engines,” a pioneering effort to answer the many questions site designers and Internet publicists had about search engines. Danny currently heads up Search Engine Land, which covers search marketing and search engine news. He produces the SMX: Search Marketing Expo conference series and writes a personal blog called Daggle.
Pete Warden — Advisor
Pete Warden is a British-born programmer living in San Francisco. After spending over a decade as a software engineer, including 5 years at Apple, he’s now focused on a career as a mad scientist. He is currently gathering, analyzing and visualizing the flood of web data that’s recently emerged, trying to turn it into useful information without trampling on people’s privacy. Pete is the current CTO of Jetpac, a site for sharing travel photos, tips, and guides among friends. Passionate about large-scale data processing and visualization, he writes regularly on the topic on his blog and as a regular contributor to O’Reilly Radar.