Gil Elbaz – Chairman
Gil Elbaz is an accomplished entrepreneur and pioneer of natural language technology. In 1998, Gil co-founded Applied Semantics Inc. (ASI), which developed contextual advertising products including ASI’s AdSense. In 2003, Google acquired Applied Semantics and Gil stayed on as the Engineering Director for the Santa Monica office to continue the development of AdSense and other products. Prior to founding ASI, Gil worked in engineering roles at IBM, Sybase and SGI.
In 2007, Gil left Google to explore new ideas and passions and went on to found Factual, an open data aggregation platform. Factual launched its public beta in October 2009 in an effort to maximize data accuracy, transparency, and accessibility. In December 2010, Factual raised a $25m Series A round led by Andreessen Horowitz and Index Ventures to build upon its mission to make structured data broadly accessible to the development community.
Gil earned his bachelor’s degree from the California Institute of Technology with a double major in Engineering & Applied Science and Economics. Active in a number of non-profits, he serves on the Boards of Trustees at the California Institute of Technology as well as the X Prize Foundation. Gil is also the founder and chairman of the board of directors for the Common Crawl Foundation and is strongly committed to ensuring a truly open web.
Carl Malamud – Secretary and Treasurer
Carl Malamud is the President of Public.Resource.Org and has been on the Common Crawl Board of Directors since 2008. Malamud was the founder of the Internet Multicasting Service where he was responsible for creating the first Internet radio station, for putting the U.S. Securities and Exchange Commission’s EDGAR database on-line, and for creating the Internet 1996 World Exposition. Malamud is the author of eight books, was a visiting professor at the MIT Media Laboratory and was chairman of the Internet Software Consortium. He received the Berkman Award from Harvard and the Pioneer Award from the EFF.
photo credit : Joe Hall
Nova Spivack
Nova Spivack is a technology futurist and serial entrepreneur whose career has spanned more than 20 ventures and 2 IPOs. He has been on the Common Crawl Board of Directors since 2008. He is one of the leading voices on the next generation of search, semantic web technology, and social media. He works as a producer of emerging technology ventures including, most recently, Klout, Bottlenose, Live Matrix, and The Daily Dot. In 1994, he co-founded one of the first Web startups, EarthWeb, which led to a record-breaking IPO in 1998 and a second IPO of Dice. He founded Lucid Ventures in 2001 and the semantic web venture Radar Networks, which launched Twine.com. Spivack worked with Stanford Research International (SRI) to conceive and co-found their global business incubator, nVention, and participated in the DARPA CALO program, the most ambitious artificial intelligence project in US history. He has authored more than 30 pending and granted patents in the areas of search, semantics, and personalization and is considered a leading pioneer of next-generation web technology. Nova believes strongly in an open web and open standards, and supports numerous charitable causes around the world.
Kurt is a computer scientist with a research background in the areas of machine learning, digital libraries, semantic networks, and electro-cardiographic modeling. He received a Ph.D. in Computer Engineering from the University of Texas at Austin. He was co-creator of the CiteSeer research tool as a visiting researcher at the NEC Research Institute, the technical director of the Internet Archive, and a biomedical research engineer at the Duke University Medical Center. He was Chief Scientist at Metaweb Technologies until February 2009. He is currently pursuing research on long-term digital archiving as the Digital Research Director at the Long Now Foundation as well as serving as a consulting Data Scientist at InfoChimps. Kurt is a philanthropist active with many organizations that promote openness, transparency and preservation. As an Advisor at Common Crawl, he provides the organization with valuable advice and insight into the crawl technology, big data processing, open innovation, products and collaborations.
Glenn Otis Brown is director of business development at Twitter in New York. He has been head of music partnerships at YouTube, product counsel at Google, and executive director of Creative Commons, where he still serves on the board of directors. Glenn has also been a fellow at the Berkman Center for Internet and Society at Harvard and a lecturer at Stanford Law School.
Kevin is a highly respected Intellectual Property (IP) attorney who has continually worked at the forefront of the evolving IP landscape. He is a Partner and the Chairman of the Intellectual Property and Technology Transactions Department at the firm Stubbs Alderton & Markiles, LLP. He is an author and frequent speaker on technology commercialization and intellectual property licensing. Kevin was selected in 2006, 2008 and 2009 as a Southern California Super Lawyer and is the founder and Chair of the Licensing Interest Group of the California State Bar Intellectual Property Section. Prior to joining Stubbs Alderton & Markiles, LLP, Kevin was a partner in leading international law firms, including Brobeck Phleger & Harrison, LLP, where he headed the firm’s technology transactions practice in Southern California. After law school, he served as a judicial law clerk for Hon. John G. Davies, United States District Court for the Central District of California. Kevin received his J.D. from Hastings College of the Law and his B.S. degree from the University of California, Davis.
Jim Hendler – Advisor
Jim Hendler is the Tetherless World Professor of Computer and Cognitive Science and the Head of the Computer Science Department at RPI. He is also a faculty affiliate of the Experimental Multimedia Performing Arts Center (EMPAC), serves as a Director of the UK’s charitable Web Science Trust and is a visiting Professor of Computer Science at De Montfort University in Leicester, UK. Hendler has authored about 200 technical papers in the areas of Semantic Web, artificial intelligence, agent-based computing and high-performance processing. One of the early innovators of the “Semantic Web,” Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a former member of the US Air Force Science Advisory Board, and is a Fellow of the American Association for Artificial Intelligence, the British Computer Society, the IEEE and the AAAS. He is also the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA) and was awarded a US Air Force Exceptional Civilian Service Medal in 2002. In 2010, Hendler was named one of the 20 most innovative professors in America by Playboy magazine and was selected as an “Internet Web Expert” by the US government working with the data.gov project. He is the Editor-in-Chief emeritus of IEEE Intelligent Systems and was the first computer scientist to serve on the Board of Reviewing Editors for Science. In 2012, he was one of the inaugural recipients of the Strata Conference “Big Data” awards for his work on large-scale open government data, and he is a columnist and associate editor of the Big Data journal.
Eva Ho is currently the VP of Marketing & Operations at Factual, an open data platform that leverages large-scale aggregation and community exchange. Eva provides Common Crawl with valuable insight and advice on all aspects of the organization. Prior to Factual, she was a Sr. Product Marketing Manager at Google and YouTube for 5 years, and was the head of marketing for Applied Semantics, a company sold to Google in 2003. She also serves on the Board of Directors of Iridescent, a science education non-profit; First Descents, a non-profit adventure outfit for cancer fighters and survivors; and Whole Child LA, a non-profit pediatric pain clinic. Eva holds an MBA from Cornell and a BA in Biology from Harvard University.
Joi Ito is currently Director of the MIT Media Lab and a leading thinker and writer on innovation, global technology policy, and the role of the Internet in transforming society in substantial and positive ways. A vocal advocate of emergent democracy, privacy, and Internet freedom, Ito is board chair (and former CEO) of Creative Commons, and sits on the boards of the Mozilla Foundation, WITNESS, Global Voices, and the John D. and Catherine T. MacArthur Foundation. In Japan, he was a founder of Digital Garage, and helped establish and later became CEO of the country’s first commercial Internet service provider. He was an early investor in more than 40 companies, including Flickr, Six Apart, Last.fm, Kongregate, Kickstarter, and Twitter. Ito’s honors include TIME magazine’s “Cyber-Elite” listing in 1997 (at age 31) and selection as one of the “Global Leaders for Tomorrow” by the World Economic Forum (2001). In 2008, BusinessWeek named him one of the “25 Most Influential People on the Web.” In 2011, he received the Lifetime Achievement Award from the Oxford Internet Institute.
Bill is currently the VP of Product Management and Partnerships at Factual. He joined in 2009 and prior to that worked at Yahoo! for about six years, for the last two of which he was the GM and Senior Director of the Yahoo! Search Platform. There he managed the product management, design, marketing and business development functions for Yahoo! BOSS, which grew from inception to over 1B queries/month. Before this he was the Director of Int’l Search Business Operations & Product Strategy, a $1B+ P&L with 500+ employees. Other experience includes Equity Research Associate at UBS on the #1 ranked Institutional Investor telecom research team. At one point he wore a white coat and worked in a solid-state physics research lab. He has an MBA from Columbia University Business School and a BA in Geology from Colby College.
Peter Norvig is Director of Research at Google and a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. From 2002 to 2005 he was Director of Search Quality, responsible for the core web search algorithms. Previously he was the head of the Computational Sciences Division at NASA Ames Research Center, making him NASA’s senior computer scientist. He has served as an assistant professor at the University of Southern California and a research faculty member at the University of California at Berkeley Computer Science Department, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He has over fifty publications in Computer Science, concentrating on Artificial Intelligence, Natural Language Processing and Software Engineering, including the books Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg PowerPoint Presentation and the world’s longest palindromic sentence.
Jennifer Pahlka is the founder, executive director and board chair of Code for America. Previously, she ran the Web 2.0 and Gov 2.0 events for TechWeb, in conjunction with O’Reilly Media, and co-chaired the successful Web 2.0 Expo. Before that, she spent eight years at CMP Media where she ran the Game Developers Conference, Game Developer magazine, and Gamasutra.com; there she also launched the Independent Games Festival and served as Executive Director of the International Game Developers Association. Jennifer’s early career was spent in the non-profit sector. She is a graduate of Yale University and lives in Oakland, California with her daughter and six chickens.
Boris took his first programming course in the seventh grade and wrote a video game as his first class project. In support of open data and the open web, he generously donates his time to advise Common Crawl on technical matters and to share his visionary insight on the digital ecosystem. Boris is currently employed as the Director of Engineering at Factual where he’s working to bring greater accessibility and transparency with and to data. Prior to joining Factual, he was CTO at Xap, which he helped build from eight employees to a thriving business with millions of users, and over a hundred employees. His junior year in college, he built a site that quickly became the web’s 27th most trafficked and then optimized it enough to run from two machines in his den. He’s not retired now because he really wanted to graduate. Boris holds a BS in Physiological Science and an MS in Computer Science, both from UCLA. He put himself through school working night shifts as an ambulance driver. Boris lives in Los Angeles with his badass wife and amazing boys. He is a lousy philosopher, a mediocre poet, and will someday become a great chef.
Pete Skomoroch is a Principal Data Scientist at LinkedIn in Mountain View, CA, focused on reputation systems, collaborative filtering, and building data-driven products. He leads a team of Data Scientists focused on Identity at LinkedIn and was the inventor of LinkedIn Skills. Prior to LinkedIn, he was based in Washington, DC, where he focused on mining insights from search query data as the Director of Advanced Analytics at Juice Analytics and as a Sr. Research Engineer at AOL Search. While in DC, he also founded DataWrangling.com, which provided custom data mining solutions to clients in bioinformatics, finance, and cloud computing. He spent the previous 6 years in Boston implementing biodefense pattern detection algorithms for streaming sensor data at MIT Lincoln Laboratory and constructing predictive models for large retail datasets at Profitlogic (now Oracle Retail). Pete has a B.S. in Mathematics and Physics from Brandeis University and did graduate coursework in machine learning at MIT.
Widely considered a leading “search engine guru,” Danny Sullivan has been helping webmasters, marketers and everyday web users understand how search engines work for 15 years. Danny’s expertise about search engines is often sought by the media, and he has been quoted in places like The Wall Street Journal, USA Today, The Los Angeles Times, Forbes, The New Yorker, Newsweek and ABC’s Nightline. Danny began covering search engines in late 1995, when he undertook a study of how they indexed web pages. The results were published online as “A Webmaster’s Guide To Search Engines,” a pioneering effort to answer the many questions site designers and Internet publicists had about search engines. Danny currently heads up Search Engine Land, which covers search marketing and search engine news. He produces the SMX: Search Marketing Expo conference series and writes a personal blog called Daggle.
Pete Warden is a British-born programmer living in San Francisco. After spending over a decade as a software engineer, including 5 years at Apple, he’s now focused on a career as a mad scientist. He is currently gathering, analyzing and visualizing the flood of web data that’s recently emerged, trying to turn it into useful information without trampling on people’s privacy. Pete is the current CTO of Jetpac, a site for sharing travel photos, tips, and guides among friends. Passionate about large-scale data processing and visualization, he writes regularly on the topic on his blog and as a regular contributor to O’Reilly Radar.
Lisa Green – Director
Lisa is motivated by a strong belief in the power of open systems to drive innovation in education, arts and research. Over the last several years she has been active in the areas of Open Access publishing, Open Science, Open Data, copyright, digital rights and policy. Immediately prior to joining Common Crawl, Lisa was Chief of Staff at Creative Commons. She holds a PhD in physical chemistry from the University of California, Berkeley.
Jordan Mendelson – Chief Technologist
Jordan is a product-focused technologist with over 18 years of experience building tech startups. Prior to joining Common Crawl, he was Chief Architect of the music startup Napster, worked on big data analytics at LinkedIn, and was CTO of the restaurant technology startup SeatMe.
photo credit : Joi Ito
Ahad Rana – Architect
Ahad has been working on Common Crawl since 2008, where he has been responsible for most of the back-end engineering involved to make the crawl happen. He has more than ten years of experience in software development and has held various engineering positions at companies such as Google and AOL/Netscape. His current day job involves building infrastructure technologies for Factual, an L.A. area startup focused on Open / Big Data.
Dave Lester – Software Engineer Intern
Dave Lester is a software engineer intern at Common Crawl and a master’s student at the UC Berkeley School of Information. This summer Dave is developing a set of resources for developers to get started using Common Crawl data, and contributing to the development of a stats platform for the web crawler. Prior to returning to school, he worked in Washington, DC at several research universities, participating in the management and development of software applications used by libraries, museums, and humanities scholars.
Steve Salevan – Code Wrangler
Steve Salevan is an engineer passionate about the possibilities of open data. He came to the Common Crawl project as an interested volunteer, working on open-sourcing the full stack and making the dataset easier for everyone to access. He currently serves as a Test Engineer for Google, and in the past served a similar role at Red Hat, having caught the open source bug there at lunch one fateful afternoon. The incredible power that we can gain to know ourselves better when we match open source with open data inspired him to help this happen, and he hopes to build a strong community around this unique, freedom-strengthening idea.
Outside of work, he maintains a 1970s synthesizer cage, much to the dismay of his girlfriend, and digs quality beer, cooking, and radio broadcasting, preferably when all of these things happen at the same time.
Jakob Homan – Time Lord
Jakob Homan is a Senior Software Engineer on the Search, Network and Analytics (SNA) team at LinkedIn, where he works on improving and expanding the Apache Hadoop Ecosystem. He is currently focused on Apache Giraph and Apache Hive. He is a committer and PMC member on the Hadoop and Giraph projects and a contributor to many other Hadoop ecosystem projects. He earned his degree in Computing and Software Systems at the University of Washington, Bothell and worked on Hadoop at Yahoo! for nearly three years before moving on to LinkedIn.
Sebastian Spiegler – Data Scientist
Sebastian is a big data, natural language processing and machine learning enthusiast who loves the idea of an enormous web corpus open to everyone. He currently leads a small team of data and software engineers at SwiftKey, an innovative London-based start-up specialised in predictive text entry. He holds a PhD in machine learning and NLP from the University of Bristol, England.
Philip Marcus – Linux Systems & Networking
Philip has been helping out Common Crawl since early 2008, setting up and maintaining the servers, software, and networks used to run the crawlers and Hadoop cluster. When he is not updating the Common Crawl cluster, Philip is probably updating Factual’s Hadoop cluster as their lead sysadmin.
Prior to joining Factual, Philip held systems and networking positions at Google, iPower (a web hosting platform), and NBCi.