Ever wanted to learn how to make interactive data visualizations?

D3.js is a JavaScript library that lets you create dynamic, interactive data visualizations for the web. It's an incredibly flexible library, handling endless types of customizable visualizations. If you want to learn the basics of D3.js, join CDSS as we welcome Woojin Kim for a workshop.

Woojin Kim is a graduate student in Data Science at Columbia University, graduating this December. He used data visualization and D3 to showcase his work and won a couple of hackathons like the ones CDSS organized. Check out his profile and projects at <http://woojink.com/>.

Tableau is one of the leading visualization software products in the business world today. If you want to learn how to make engaging, interactive visualizations with a simple click and drag interface, join CDSS as we welcome Tableau expert Adam McCann for a workshop followed by a Q&A session.

Adam McCann is a Specialist Leader with Deloitte Consulting in their Analytics and Information Management group, where he specializes in predictive analytics and data visualization. At Deloitte, he leads a team of consultants at a national intelligence agency developing predictive models and business intelligence solutions. Adam also teaches an information visualization course at Maryland Institute College of Art. His data visualization blog duelingdata.com covers topics ranging from movies to sports and politics. He is also a 2015-2017 Tableau Zen Master, a title recognizing the top Tableau practitioners in the world for their mastery of the product.

As a Columbia student, you can download Tableau for free! Just follow the instructions at the link below. Please download Tableau prior to the workshop so you can follow along!

Apache Spark is one of the most useful big data frameworks. Come learn how you can experiment with it and harness it for large-scale analytics! We’ll cover a conceptual introduction to Spark and the basics of the Python interface, PySpark.

So much of the information we encounter every day is hard to conceptualize. It’s so big and complicated that a visual rendering represents it best. Being a good data designer is crucial to being able to tell the story behind the data.

Last semester, we hosted Juan Francisco Saldarriaga, Mapping and Data Visualization Specialist. He gave us a talk about data visualization techniques, processes, and methods. This semester, Juan Francisco is back for a workshop! This time we are going to use Processing as a coding environment to master new data visualization techniques. Juan Francisco will guide us through the process of using basic programming skills and concepts to create visually compelling charts and graphs. We will use real data from Citibike to visually analyze the imbalances in the system stations. Students are required to download Processing before the workshop and to have basic knowledge of programming concepts (variables, loops, functions). No prior experience with Processing is required for the workshop.

What even is a data science interview? Do they want me to be a developer, or an analyst, or a unicorn? At this workshop we'll go over what to expect in data science interviews, focusing especially on in-person tech interviews. From the basic layout to the right answer to (almost) every "what data structure can you use to make this faster" question, you'll be ready to land the internship or job of your data-science-filled dreams.

Chris Mulligan is currently a Quantitative Researcher at Two Sigma Investments, LP. He builds models to make predictions of financial markets using untraditional data, which is a sentence he never imagined he’d say about himself. Prior to Two Sigma, Chris completed data science internships at Kickstarter, Facebook, and The New York Times, as well as 7 years in political data analysis and modeling, most recently as Director of Analytics at YouGov. Chris received BA and MA degrees in computer science and statistics from Columbia in 2015, where he was a TA for COMS3157 AP and STAT4400 StatML, and cofounded CDSS.

Mike Jaron, current QMSS student and Google Data Scientist, will give a talk about his day-to-day work, the culture of data science at Google, and his thoughts on the future trends regarding Data Science, followed by a Q&A session.

Mike works on the Human/Social Dynamics program, where he specializes in natural language processing and data analysis in Python and R. Bring any questions you have about data science at Google, Mike's career, and anything in between!

Professors Jones and Wiggins will be holding a discussion and Q+A on the past, present, and future of data in our lives. Each will speak briefly on how students, scholars, and citizens make sense of data in science, public policy, and our personal lives. We invite Columbia University students (all divisions) to RSVP and to offer questions via this form.

Discussion and student questions will guide the direction of the course "Data: Past, Present, and Future" to be taught by professors Jones and Wiggins in Spring 2017, with the support of Columbia's Collaboratory program and the Leibniz Fund.

Participants:

Professor Matt Jones, James R. Barker Professor of Contemporary Civilization, Department of History

Looking for a job? Trying to launch a business? Need guidance on what to do with your career? Networking is one of the best ways to make professional connections that can assist with employment, starting a business, finding a mentor, and so many other opportunities. But where do you start? How do you build a network? How do you network successfully?

In this interactive presentation, you will learn how to use LinkedIn to find the right types of connections and how to contact them in a way that gets you a high response rate, how to find those key individuals at companies that you definitely want to speak with, how to get and approach one-on-one meetings, how to navigate networking events, and simple body language and human interaction methods that will take your interpersonal skills to the next level.

Python is one of the most powerful, easy to learn, and flexible programming languages out there. Why not learn to do data science using Python? We'll be covering the essential tools for doing data science in Python. Hopefully you'll find the material useful leading up to our data science hackathon!

Did you enjoy our workshop on Introduction to Programming in R? (If not, you can check out the code we wrote here.) Great: come learn more about data science specifics within R! This will help you build the strong foundation of tools necessary to become a proficient data scientist. Hopefully you'll find the material useful leading up to our Data Science hackathon!

Interested in joining the Columbia Data Science Society Executive Board? Want to learn more about our events for this year or even propose an event yourself? Come meet current board members to discuss formal Executive Board recruitment for the academic year. We are also happy to chat about data science at Columbia and relevant courses being offered this semester. Hope to see you there!

Come to our table at this year's Activities Fair to learn more about us. You can meet current members, discuss recruitment, and hear about some of the events we will be organizing this year. We are also happy to chat about data science at Columbia and relevant courses being offered this semester. Hope to see you there!

So much of the information we encounter every day is hard to conceptualize. It’s so big and complicated that a visual rendering represents it best. Being a good data designer is crucial to being able to tell the story behind the data.

Come push your data visualization techniques, processes, and methods to the next level with Juan Francisco Saldarriaga.

Juan Francisco Saldarriaga is a Mapping and Data Visualization Specialist, and an Architectural Designer and Urban Planner living and working in New York. Juan Francisco has a Master of Science in Urban Planning from Columbia University and a Master of Architecture, also from Columbia. For his undergrad, Juan Francisco studied philosophy at the Université de Paris IV (Sorbonne) and at the Universidad de Los Andes, Bogotá.

Come hear from Randy Carnevale, the Director of Decision Sciences at Commonwealth Bank! He'll be speaking about a variety of data science initiatives at Commonwealth Bank including graph databases and cloud-based web scraping.

Tech in Fintech will expose Columbia University’s best and brightest engineers and data scientists to the broad area known as “fintech,” one of the fastest growing industries with its base right here in New York City. Tech in Fintech will operate in a panel format, featuring 4-5 senior-level technical professionals from leading financial technology companies, hosted by a well-regarded Columbia University professor. Panelists will be announced in the coming weeks. The panel will be followed by a networking reception in the same location with panelists and alumni.

Today, JavaScript has pretty much taken over your desktop. Many of the apps you use, from Slack to Spotify to Sunrise (alas), are written in JavaScript. It's now easy to develop powerful native desktop apps with the latest JavaScript frameworks and modern web tech. This tech talk will introduce you to new ways to build desktop applications with React, Observables, and Electron, all of which have become really popular of late. ADI presents a talk by Evan Morikawa, an engineer at Nylas. Nylas is a San Francisco-based startup building N1, an extensible email client originally forked from Atom (GitHub's hackable text editor) and now one of the most popular open-source projects on GitHub. This desktop email client is built entirely with modern web tech and allows anyone to build plugins to dramatically enhance what you can do with email. You can check it out at https://www.nylas.com/n1 and see the source code at https://github.com/nylas/N1.

Come learn about deep learning and image recognition with Matthew Zeiler, a foremost expert in machine learning and artificial intelligence. He is the CEO and Founder of Clarifai, a company that specializes in visual recognition and beats the accuracy and speed of the largest tech companies!

Are you thinking about a career in data science? Are you curious to know what kinds of jobs you can have as a data scientist in Finance, Healthcare, Retail, Media, and more? Join us for the Women in Data Science Career Panel on Thursday, March 24th at 5:45pm! We have invited an accomplished group of ladies from different industries and areas of data science to share their experiences and give us some tips on the job search, work life, and how you, too, can shape the future of data science.

know where to start? Join CDSS E-board members for a panel discussion on what courses and opportunities students should pursue in this growing field. The panelists will include a current sophomore, junior, and senior. Please feel free to submit questions for discussion on this event page. See you on Tuesday!

Big data is big... really big, and it also has lots of noise. How do we reduce the dimensionality of these massive datasets to something tractable? Join ADI and learn how to reduce the size of high-dimensional datasets using PCA, a popular technique in ML.

No prerequisites are necessary, though some linear algebra background is useful, and we'll take a glance at some Python.

What will I do?

You'll learn about dimensionality reduction, why it matters, a common technique called PCA, and how to use it in Python. Then you can apply it to all the other algorithms you've seen in the Accessible ML series.

Who should come to this event?

Anyone with an interest in machine learning is welcome; no prior experience is necessary! We'll start with basic stats and make you a dimensionality reduction pro! Impress your friends with your godly PCA abilities.

What should I bring?

PCA's heavy on concepts, so that's going to be our focus. We'll also have a code demo, but we don't expect you to follow the code as much as see the results. That said, bring a laptop if you want to run it on your own machine, in which case we recommend that you install Jupyter notebook (http://jupyter.readthedocs.org/en/latest/install.html) on it prior to the event.
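For a taste of what PCA does, here is a minimal sketch in plain Python. It is not the workshop's demo code: the five data points are made up, and the helper names (`center`, `covariance`, `first_component`, `project`) are hypothetical. It finds the first principal component by power iteration on the covariance matrix, then projects the 2-D points onto that single direction.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=50):
    """Approximate the top eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, component):
    """Reduce each row to one number: its coordinate along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in rows]

# Made-up 2-D points lying close to the line y = 2x (illustration only).
points = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]

centered = center(points)
cov = covariance(centered)
pc1 = first_component(cov)
scores = project(centered, pc1)

# Share of the total variance kept after going from 2 dimensions to 1.
variance_kept = sum(s * s for s in scores) / (len(scores) - 1)
explained = variance_kept / (cov[0][0] + cov[1][1])
print("first component:", pc1)
print("explained variance ratio:", explained)
```

Because the points sit almost on a line, one dimension keeps nearly all of the variance. On real data you would more likely reach for a library implementation than hand-rolled power iteration.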

Unsupervised learning requires us to detect underlying patterns in the data without training our models beforehand. Join ADI and CDSS and learn how to use the k-means clustering algorithm to reconstruct images from corrupted datasets! Some statistics understanding is useful, as is experience with Python. We recommend that you bring a laptop and install Jupyter notebook (http://jupyter.readthedocs.org/en/latest/install.html) so you can follow along with the code during the workshop.

What will I do?

You'll write a program in Python which runs the k-means clustering algorithm on an image. Then, you'll be able to reconstruct the image using only the clusters obtained from the data! By the end of the presentation, you'll be able to start applying k-means on a wide variety of unsupervised learning problems.

Who should come to this event?

Anyone with an interest in machine learning is welcome; no prior experience is necessary! Some stats background is helpful, but not required. The code will be written in Python.
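As a rough preview of the idea, the sketch below runs k-means on the pixel values of a tiny made-up grayscale "image" and rebuilds the image using only the cluster centroids. It is an illustration under simplifying assumptions (one value per pixel, evenly spaced starting centroids, hypothetical function names), not the code used in the workshop.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values into k groups (k >= 2); return the centroids."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centroids evenly over the value range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each value joins its nearest centroid.
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def reconstruct(image, centroids):
    """Replace every pixel with the centroid closest to it."""
    return [[min(centroids, key=lambda c: abs(p - c)) for p in row]
            for row in image]

# Made-up 3x4 grayscale image: a dark left half and a bright right half.
image = [
    [10, 12, 200, 205],
    [11, 14, 198, 210],
    [9, 13, 202, 207],
]

pixels = [p for row in image for p in row]
centroids = kmeans_1d(pixels, k=2)
restored = reconstruct(image, centroids)
print(centroids)
print(restored)
```

The reconstructed image contains only two distinct values, one per cluster. For color images the same loop would work on (R, G, B) triples with Euclidean distance in place of the absolute difference.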

Join us for a workshop on natural language processing, a field of computer science that approaches problems using textual data in a computational way. In this workshop, we'll go over some fundamental concepts and techniques used in the exciting field of Natural Language Processing!

Time series data presents its own unique challenges and insights. Join ADI and CDSS and learn how to forecast time series data! We will be modeling electricity prices using weather data. Some statistics understanding is useful, as is experience with Python. We recommend that you bring a laptop and install Jupyter notebook (http://jupyter.readthedocs.org/en/latest/install.html) so you can follow along with the code during the workshop.

What will I do?

You'll write a program in Python which runs forecasting techniques on energy and weather data. By the end of the presentation, you'll be able to start using ARIMA models on time series data.

Who should come to this event?

Anyone with an interest in machine learning is welcome; no prior experience is required! Some stats background is helpful, but not required. The code will be written in Python.
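To give a flavor of the forecasting idea, here is a minimal sketch of an AR(1) model, the simplest autoregressive building block of ARIMA, fitted by ordinary least squares in plain Python. A full ARIMA model also includes differencing and moving-average terms, which this sketch leaves out. The series is synthetic (not real electricity prices) and the function names are hypothetical.

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    x = series[:-1]          # previous values
    y = series[1:]           # next values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, c, phi, steps):
    """Roll the fitted model forward from the last observed value."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions

# Synthetic series generated exactly by y[t] = 5 + 0.5 * y[t-1], starting at 30.
prices = [30.0]
for _ in range(9):
    prices.append(5 + 0.5 * prices[-1])

c, phi = fit_ar1(prices)
next_three = forecast(prices, c, phi, steps=3)
print("c =", c, "phi =", phi)
print("forecast:", next_three)
```

Since the synthetic series has no noise, the fit recovers the generating coefficients almost exactly. Real price data would be noisy, and bringing in weather as an extra predictor would call for a regression term beyond what this sketch covers.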

Abstract: An overwhelming amount of content from real-world events is shared by individuals through social media services. This shared media represents an important part of our society, culture and history. At the same time, this social media event content is still difficult to consume and understand, fragmented across services, and hard to find. We have worked since 2008, in both research and startup settings, to tackle these (and other) challenges in making social media information about events accessible and usable. I will discuss our early research, show how it led to the startup company I co-founded, comment on what the startup (which recently pivoted away from events) did well and where it failed, and highlight open challenges and directions for future work and research in this area.

Speaker Bio: Mor Naaman is an associate professor of Information Science at the Jacobs Technion-Cornell Institute at Cornell Tech, where he is the founder of the Connective Media hub, and leads a research group focused on social technologies. His research applies multidisciplinary methods to 1) gain a better understanding of people and their use of social tech; 2) extract insights about people, technology and society from social media and other sources of social data; and 3) develop new social technologies as well as novel tools to make social data more accessible and usable in various settings. Previously, Mor was on the faculty at Rutgers SC&I, led a research team at Yahoo! Research Berkeley, received a Ph.D. in Computer Science from Stanford University, and played professional basketball for Hapoel Tel Aviv. He is a recipient of an NSF Early Faculty CAREER Award, research awards and grants from numerous corporations including AOL and Google, and multiple best paper awards. Find out more about Mor at mornaaman.com.

This talk will use the example of sentiment analysis to show that supervised machine learning has the potential to amplify the voices of the most privileged people in society. A sentiment analysis algorithm is considered ‘table stakes’ for any serious text analytics platform in social media, finance, or security. As an example of supervised machine learning, Mike will show how these systems are trained. But he'll also show that they have the unavoidable property that they are better at spotting unsubtle expressions of extreme emotion. Such crude expressions are used by a particularly privileged group of authors: men. In this way, brands that depend on sentiment analysis to 'learn what people think' inevitably pay more attention to men. The problem doesn't stop with sentiment analysis: at every step of any model building process, we make choices that can introduce bias, enhance privilege, or break the law! Mike will review these pitfalls, talk about how you can recognise them in your own work, and touch on some new academic work that aims to mitigate these harms.

Bio: Mike Williams is a research engineer at Fast Forward Labs, which develops prototypes and writes reports demonstrating innovations in machine intelligence. He has a PhD in astrophysics from Oxford, and did postdocs at the Max Planck Institute in Munich and at Columbia University.

This is a workshop about programmatically collecting and storing useful information from the web using Python. First, we will use Requests and BeautifulSoup to download and parse HTML and XML files. We will then use the Scrapy framework to write a web spider that crawls online blog entries and stores their comments in a JSON file for later processing.

Web scraping is the technique of extracting information from the web and storing it in a useful form. This is the first step in the process of discovering interesting patterns and gaining insights from big data sets. The web in particular is a vast source of information that can be systematically and programmatically accessed at very little cost.
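The parse-then-store step can be sketched with nothing but the Python standard library. This example swaps in `html.parser` as a stand-in for BeautifulSoup and uses an inline HTML string instead of a downloaded page, so it runs without installs or network access. The page content and the `comment` class name are made up for illustration.

```python
import json
from html.parser import HTMLParser

class CommentParser(HTMLParser):
    """Collect the text of every <p class="comment"> element."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self._in_comment = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "p" and ("class", "comment") in attrs:
            self._in_comment = True
            self.comments.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_comment = False

    def handle_data(self, data):
        if self._in_comment:
            self.comments[-1] += data

# Hypothetical blog page; a real scraper would download this text first.
SAMPLE_HTML = """
<html><body>
<h1>Hypothetical blog post</h1>
<p class="comment">Great post!</p>
<p>Not a comment.</p>
<p class="comment">Thanks for sharing.</p>
</body></html>
"""

parser = CommentParser()
parser.feed(SAMPLE_HTML)

# Serialize the extracted comments as JSON for later processing.
as_json = json.dumps({"comments": parser.comments}, indent=2)
print(as_json)
```

In a real scraper the HTML would come from an HTTP request rather than a string constant, and a crawling framework would take care of following links from one blog entry to the next.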

Join the History Lab and CDSS as we analyze the most talked about emails in the world! We are hosting a mini data hackathon in Studio@Butler (Butler 208B) to help get people unfamiliar with data science doing visualizations and other analyses. We will also be providing starter code for those not familiar with Python and R.

Please fill out this form if you are interested in coming: "http://goo.gl/forms/KFZOnKyrcQ". You do not need to fill this out in order to come, but we would like to gauge how many people are coming to the event. The link to the Facebook event is here: https://www.facebook.com/events/448737675312871/

Alumni from various programs, including M.A. in Statistics, will talk about their job-hunting tips and techniques at our career panel. Everyone will have the opportunity to ask questions about what worked and what didn't, and the panelists will share tips about how to secure interview calls as you launch your career.

Some of the alums speaking at the panel include:

Daniel Slotwiner: Director, Ads Research, Facebook

Chris Kakkanatt: Director/Team Leader, Data Science, Pfizer

Sharona Sankar-King: Practice Lead, MEC

Timothy Haley: Statistical Modeler, J.P. Morgan

Here is the link to the Facebook event with more information on how to register: https://www.facebook.com/profile.php?id=1658817761028041