Welcome! This is an instruction booklet that shows you how to build a text editor in C.

The text editor is antirez’s kilo, with some changes. It’s about 1000 lines of C in a single file with no dependencies, and it implements all the basic features you expect in a minimal editor, as well as syntax highlighting and a search feature.

This booklet walks you through building the editor in 184 steps. Each step, you’ll add, change, or remove a few lines of code. Most steps, you’ll be able to observe the changes you made by compiling and running the program immediately afterwards.

I explain each step along the way, sometimes in a lot of detail. Feel free to skim or skip the prose, as the main point of this is that you are going to build a text editor from scratch! Anything you learn along the way is bonus, and there’s plenty to learn just from typing in the changes to the code and observing the results.

See the appendices for more information on the tutorial itself (including what to do if you get stuck, and where to get help).

If you’re ready to begin, then go to chapter 1!
… (emphasis in original)
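For a taste of what those early steps look like: one of the first things the editor does is switch the terminal from its default line-buffered "cooked" mode into raw mode, so each keypress reaches the program immediately and is not echoed. The sketch below mirrors the flag changes in kilo's `enableRawMode()`. It assumes a POSIX system with `<termios.h>`. The helper name `make_raw` is hypothetical, and the helper only edits a `struct termios` in memory; it never touches a real terminal.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Hypothetical helper: turn a copy of the current terminal settings
 * into raw-mode settings, in the spirit of kilo's enableRawMode(). */
static void make_raw(struct termios *t) {
    assert(t != NULL);
    /* Input: no break-to-SIGINT, no CR->NL translation, no parity check,
     * no 8th-bit stripping, no Ctrl-S/Ctrl-Q flow control. */
    t->c_iflag &= ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON);
    /* Output: no "\n" -> "\r\n" post-processing. */
    t->c_oflag &= ~(OPOST);
    /* Control: 8-bit characters. */
    t->c_cflag |= CS8;
    /* Local: no echo, byte-at-a-time input, no Ctrl-V, no Ctrl-C/Ctrl-Z signals. */
    t->c_lflag &= ~(ECHO | ICANON | IEXTEN | ISIG);
    /* read() may return 0 bytes, after waiting at most 1/10 of a second. */
    t->c_cc[VMIN] = 0;
    t->c_cc[VTIME] = 1;
}
```

In a real program you would fill the struct with `tcgetattr(STDIN_FILENO, &orig)`, pass a copy to `make_raw()`, and apply it with `tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw)`. Restore `orig` before exiting so the shell is not left in raw mode.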

I mention this tutorial because:

It’s an opportunity to see editor issues “from the other side.”

It’s practice in reading and understanding C.

I like the “make changes, see the results” approach.

Of the three, the “make changes, see the results” approach is probably the most important.

Examples that “just work” are great and I look for them all the time. 😉

But imagine examples that take you down false leads and into traps, allowing you to observe the cryptic error messages from XQuery, for example. You do work your way to a solution, but you are not given one out of the box.

“Cryptic” is probably overly generous with regard to XQuery error messages. Suggestions for a better one-word term for them, usable in mixed company?

Anyone following education news on Twitter between 2013 and 2016 would have been hard-pressed to ignore the gradual curdling of Americans’ attitudes toward the Common Core State Standards. Once seen as an innocuous effort to lift performance in classrooms, they slowly came to be denounced as “Dirty Commie agenda trash” and a “Liberal/Islam indoctrination curriculum.”

After years of social media attacks, the damage is impressive to behold: In 2013, 83 percent of respondents in Education Next’s annual poll of Americans’ education attitudes felt favorably about the Common Core, including 82 percent of Republicans. But by the summer of 2016, support had eroded, with those numbers measuring only 50 percent and 39 percent, respectively. The uproar reached such heights, and so quickly, that it seemed to reflect a spontaneous populist rebellion against the most visible education reform in a decade.

Not so, say researchers with the University of Pennsylvania’s Consortium for Policy Research in Education. Last week, they released the #commoncore project, a study that suggests that public animosity toward Common Core was manipulated — and exaggerated — by organized online communities using cutting-edge social media strategies.

As the project’s authors write, the effect of these strategies was “the illusion of a vociferous Twitter conversation waged by a spontaneous mass of disconnected peers, whereas in actuality the peers are the unified proxy voice of a single viewpoint.”

Translation: A small circle of Common Core critics were able to create and then conduct their own echo chambers, skewing the Twitter debate in the process.

The most successful of these coordinated campaigns originated with the Patriot Journalist Network, a for-profit group that can be tied to almost one-quarter of all Twitter activity around the issue; on certain days, its PJNET hashtag has appeared in 69 percent of Common Core–related tweets.

The team of authors tracked nearly a million tweets sent during four half-year spans between September 2013 and April 2016, studying both how the online conversation about the standards grew (more than 50 percent between the first phase, September 2013 through February 2014, and the third, May 2015 through October 2015) and how its interlocutors changed over time.
…

Mahnken talks as though creating a ‘botnet’ to defeat adoption of the Common Core State Standards is a bad thing.

Let’s assume you want to build a championship high school baseball team. Toward that goal, various officious intermeddlers, who have no experience with baseball, fund the creation of the Common Core Baseball Standards.

Every three years, every child is tested against the Common Core Baseball Standards and their performance recorded. No funds are allocated for additional training for gifted performers, equipment, baseball fields, etc.

By the time these students reach high school, will you have the basis for a championship team? Perhaps, but if you do, it will be due to random chance and not the Common Core Baseball Standards.

If you want a championship high school baseball team, you fund training, equipment, and baseball fields, in addition to spending money on the best facilities for your hoped-for championship team. You spend money consistently and over time.

The key to better education results isn’t testing, but funding based on the education results you hope to achieve.

I do commend the #commoncore project website for being an impressive presentation of Twitter data, even though it is clearly a propaganda machine for pro-Common Core advocates.

The challenge here is to work backwards from what the project observed to both the principles and the tactics that made #stopcommoncore so successful. That is, we know it succeeded, at least to some degree, but how do we replicate that success on other issues?

Replication is how science demonstrates the reliability of a technique.

Seymour Papert’s Mindstorms was published by Basic Books in 1980, and outlines his vision of children using computers as instruments for learning. A second edition, with new Forewords by John Sculley and Carol Sperry, was published in 1993. The book remains as relevant now as when first published almost forty years ago.

The Media Lab is grateful to Seymour Papert’s family for allowing us to post the text here. We invite you to add your comments and reflections.

From the introduction:

…I believe that certain uses of very powerful computational technology and computational ideas can provide children with new possibilities for learning, thinking, and growing emotionally as well as cognitively….

As a data scientist, you don’t want to waste your time installing software. Our goal is to provide a virtual environment that will enable you to start doing data science in a matter of minutes.

As a teacher, author, or organization, making sure that your students, readers, or members have the same software installed is not straightforward. This open source project will enable you to easily create custom software and data bundles for the Data Science Toolbox.

A virtual environment for data science

The Data Science Toolbox is a virtual environment based on Ubuntu Linux that is specifically suited for doing data science. Its purpose is to get you started in a matter of minutes. You can run the Data Science Toolbox either locally (using VirtualBox and Vagrant) or in the cloud (using Amazon Web Services).

We aim to offer a virtual environment that contains the software that is most commonly used for data science while keeping it as lean as possible. After a fresh install, the Data Science Toolbox contains the following software:

A century of research shows that traditional grammar lessons—those hours spent diagramming sentences and memorizing parts of speech—don’t help and may even hinder students’ efforts to become better writers. Yes, they need to learn grammar, but the old-fashioned way does not work.

This finding—confirmed in 1984, 2007, and 2012 through reviews of over 250 studies—is consistent among students of all ages, from elementary school through college. For example, one well-regarded study followed three groups of students from 9th to 11th grade where one group had traditional rule-bound lessons, a second received an alternative approach to grammar instruction, and a third received no grammar lessons at all, just more literature and creative writing. The result: No significant differences among the three groups—except that both grammar groups emerged with a strong antipathy to English.

There is a real cost to ignoring such findings. In my work with adults who dropped out of school before earning a college degree, I have found over and over again that they over-edit themselves from the moment they sit down to write. They report thoughts like “Is this right? Is that right?” and “Oh my god, if I write a contraction, I’m going to flunk.” Focused on being correct, they never give themselves a chance to explore their ideas or ways of expressing those ideas. Significantly, this sometimes-debilitating focus on “the rules” can be found in students who attended elite private institutions as well as those from resource-strapped public schools.
…
(Three out of five links here are pay-per-view. Sorry.)

It’s only a century of research. Don’t want to rush into anything. 😉

How would you adapt this finding to teaching programming and/or hacking?

This paper discusses an approach to representing and reasoning about constraints over strings. We discuss how many string domains can often be concisely represented using regular languages, and how constraints over strings, and domain operations on sets of strings, can be carried out using this representation.

Each regex clue you add constrains all of the cells it crosses. Your first regex clue is unbounded, but every clue after that adds a constraint. Wait, that’s not right! Constraints arise only where cells governed by different regexes intersect.
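The intersection point can be made concrete with a brute-force sketch of a hypothetical 2x2 regex crossword. The sketch uses POSIX extended regular expressions from `<regex.h>` as a simple stand-in for the regular-language representation the paper discusses; it is not the paper's method. The names `full_match` and `solve_2x2` are illustrative. Each cell lies on exactly one row clue and one column clue, and a letter survives only if both regexes accept the strings it appears in.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Whole-string match: wrap the clue as ^(clue)$ and test with POSIX ERE. */
static int full_match(const char *clue, const char *s) {
    char pattern[256];
    regex_t re;
    int ok;
    snprintf(pattern, sizeof pattern, "^(%s)$", clue);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0; /* treat an invalid clue as unsatisfiable */
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Brute-force a 2x2 grid: cell (r,c) must satisfy row clue r AND column
 * clue c at the same time. Returns the number of fillings over `alphabet`
 * that satisfy all four clues; the first one found is copied into grid. */
static int solve_2x2(const char *rows[2], const char *cols[2],
                     const char *alphabet, char grid[2][3]) {
    size_t n = strlen(alphabet);
    int found = 0;
    for (size_t a = 0; a < n; a++) {
        for (size_t b = 0; b < n; b++) {
            for (size_t c = 0; c < n; c++) {
                for (size_t d = 0; d < n; d++) {
                    char r0[3] = {alphabet[a], alphabet[b], '\0'};
                    char r1[3] = {alphabet[c], alphabet[d], '\0'};
                    char c0[3] = {alphabet[a], alphabet[c], '\0'};
                    char c1[3] = {alphabet[b], alphabet[d], '\0'};
                    if (full_match(rows[0], r0) && full_match(rows[1], r1) &&
                        full_match(cols[0], c0) && full_match(cols[1], c1)) {
                        if (found == 0) {
                            memcpy(grid[0], r0, 3);
                            memcpy(grid[1], r1, 3);
                        }
                        found++;
                    }
                }
            }
        }
    }
    return found;
}
```

With rows `A[BC]` and `[AB]B` and columns `AB` and `C.` over the alphabet `ABC`, the only surviving grid is `AC` / `BB`. Setting all four clues to `..` accepts all 81 fillings. Brute force grows exponentially with the number of cells and recompiles each regex per test, so this sketch only illustrates the idea.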

I’m Renee Teate, the host of the Becoming a Data Scientist Podcast, and I started this club so data science learners can work on projects together. Please browse the activities and see what we’re up to!

What is the Data Science Learning Club?

This learning club was created as part of the Becoming a Data Scientist Podcast [coming soon!]. Each episode, there is a “learning activity” announced. Anyone can come here to the club forum to get details and resources, participate in the activity, and share their results.

Participants can use any technology and any programming language to do the activities, though I expect most will use Python or R. No one is “teaching” how to do the activity, we’ll just share resources and all do the activity during the same time period so we can help each other out if needed.

If you’re joining in a “live” activity during the 2 weeks after a podcast episode airs (the original “assignment” period listed in the forum description), then you can expect others to be doing the activity at the same time and helping each other out. If you’re working through the activities from the beginning after the original assignment period is over, you can browse the existing posts for help and you can still post your results. If you have trouble, feel free to post a question, but you may not get a timely response if the activity isn’t the current one.

If you are brand new to data science, you may want to start at activity 00 and work your way through each activity with the help of the information in posts by people that did it before you. I plan to make them increase in difficulty as we go along, and they may build on one another. You may be able to skip some activities without missing out on much, and also if you finish more than 1 activity every 2 weeks, you will be going faster than new activities are posted and will catch up.

If you know enough to have done most of the prior activities on your own, you don’t have to start from the beginning. Join the current activity (latest one posted) with the “live” group and participate in the activity along with us.

If you are more advanced, please join in anyway! You can work through activities for practice and help out anyone that is struggling. Show off what you can do and write tutorials to share!

If you have challenges during the activity and overcome them on your own, please post about it and share what you did in case others come across the same challenges. Once you have success, please post about your experience and share your good results! If you write a post or tutorial on your own blog, write a brief summary and post a link to it, and I’ll check it out and promote the most helpful ones.

The only “dues” for being a member of the club are to participate in as many activities as possible, share as much of your work as you can, give constructive feedback to others, and help each other out as needed!

I look forward to this series of learning activities, and I’ll be participating along with you!

Renee’s Data Science Learning Club is due to go live on December 14, 2015!

With the various free courses, Stack Overflow and similar resources, it will be interesting to see how this develops.

Hopefully, recurrent questions will develop into tutorials culled from discussions. That hasn’t happened with Stack Overflow, at least not that I am aware of, but perhaps it will happen here.

Stanford math education professor Jo Boaler spends a lot of time worrying about how math education in the United States traumatizes kids. Recently, a colleague’s 7-year-old came home from school and announced he didn’t like math anymore. His mom asked why and he said, “math is too much answering and not enough learning.”

This story demonstrates how clearly kids understand that unlike their other courses, math is a performative subject, where their job is to come up with answers quickly. Boaler says that if this approach doesn’t change, the U.S. will always have weak math education.

“There’s a widespread myth that some people are math people and some people are not,” Boaler told a group of parents and educators gathered at the 2015 Innovative Learning Conference. “But it turns out there’s no such thing as a math brain.” Unfortunately, many parents, teachers and students believe this myth and it holds them up every day in their math learning.
…

Intriguing article that suggests the solution to the lack of students in computer science and mathematics may well be to work on changing the attitudes of students…about themselves as computer science or mathematics students.

Something to remember when users are having a hard time grasping your explanation of semantics and/or topic maps.

Oh, another high point in the article, our brains physically swell and shrink:

Neuroscientists now know that the brain has the ability to grow and shrink. This was demonstrated in a study of taxi drivers in London who must memorize all the streets and landmarks in downtown London to earn a license. On average it takes people 12 tries to pass the test. Researchers found that the hippocampus of drivers studying for the test grew tremendously. But when those drivers retired, the brain shrank. Before this, no one knew the brain could grow and shrink like that.

It is only year two of the Human Brain Project, and we now know that one neuron can have thousands of synapses and that the infrastructure of the brain grows and shrinks. That information wasn’t available at the project’s start.

How do you succeed when the basic structure to be modeled keeps changing?

Perhaps that is why the Human Brain Project has no defined measure of “success”, other than spending all the allotted funds over a ten year period. That I am sure they will accomplish.

Summary: Would you like to optimize your learning of Clojure? Would you like to focus on learning only the most useful parts of the language first? Take this lesson from second language learning: learn the expressions in order of frequency of use.

When I was learning Spanish, I liked to use Anki to drill new vocabulary. It’s a flashcard program. I found that someone had made a set of cards from an analysis of thousands of newspapers. They read in all of the words from the newspapers, counted them up, and figured out what the most common words were. The top 1000 made it into the deck.

It turns out that this is a very good strategy for learning words. Word frequency follows a hockey stick distribution. The most common words are used so much more than the less common words. For instance, the 100 most common English words make up more than 50% of text. If you’ve got limited time, you should learn those most common words first.

People who are trying to learn Clojure have been asking me “how do I learn all of this stuff? There’s so much!” It’s a valid question and I haven’t had a good answer. I remembered the Spanish newspaper analysis and I thought I’d try to do a similar analysis of Clojure expressions.
…
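The “learn the most frequent items first” argument rests on a simple measurement. Count how often each word (or Clojure expression) occurs, rank the counts, and see what share of all occurrences the top k cover. The sketch below shows that measurement in C. It assumes plain ASCII text, treats a word as a run of letters, and compares words case-insensitively. `top_k_coverage`, `MAX_WORDS`, and `MAX_LEN` are illustrative names and limits, not anything from Eric’s analysis.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1024 /* assumed vocabulary limit for this sketch */
#define MAX_LEN   32   /* longer words are truncated to MAX_LEN - 1 chars */

struct entry {
    char word[MAX_LEN];
    size_t count;
};

/* qsort comparator: higher counts first. */
static int by_count_desc(const void *a, const void *b) {
    const struct entry *x = a, *y = b;
    return (x->count < y->count) - (x->count > y->count);
}

/* Split text into lowercase runs of letters, count each distinct word,
 * and return the share of all word occurrences covered by the k most
 * frequent words (0.0 for text containing no words). */
static double top_k_coverage(const char *text, size_t k) {
    struct entry table[MAX_WORDS];
    size_t distinct = 0, total = 0, covered = 0;
    const char *p = text;

    for (;;) {
        char buf[MAX_LEN];
        size_t len = 0, i = 0;

        while (*p && !isalpha((unsigned char)*p)) p++; /* skip separators */
        while (*p && isalpha((unsigned char)*p)) {      /* read one word */
            if (len < MAX_LEN - 1) buf[len++] = (char)tolower((unsigned char)*p);
            p++;
        }
        if (len == 0) break;                            /* end of text */
        buf[len] = '\0';
        total++;

        /* Linear search keeps the sketch short; a real corpus wants a hash table. */
        while (i < distinct && strcmp(table[i].word, buf) != 0) i++;
        if (i == distinct) {
            if (distinct == MAX_WORDS) continue; /* table full: counted in total only */
            strcpy(table[distinct].word, buf);
            table[distinct].count = 0;
            distinct++;
        }
        table[i].count++;
    }

    if (total == 0) return 0.0;
    qsort(table, distinct, sizeof table[0], by_count_desc);
    for (size_t i = 0; i < k && i < distinct; i++) covered += table[i].count;
    return (double)covered / (double)total;
}
```

On the toy text `"The cat and the dog, and THE bird."` there are 8 words, and “the” alone covers 3 of them (0.375). Run over a large corpus, `top_k_coverage(text, 100)` is the kind of number behind the claim that the 100 most common words make up more than half of all text.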

Is Eric seriously suggesting using lessons learned in another field? 😉

Of course, at a CS conference, a talk on using the 100 most common Clojure expressions would have a title similar to:

Use of High Frequency Terminology Repetition: A Small Group Study (maybe 12 participants)

You could, of course, skip waiting for a conference presentation with a title like that one, followed by peer-reviewed paper(s), more conference presentations, and its final appearance in a collection of potential ways to improve CS instruction.

…
I was first introduced to the idea of problem-solution ordering issues by Richard Lemarchand, one of my game design professors. The idea stuck with me, mostly because it provided a satisfying explanation for a certain confusing pattern of player behavior that I’d witnessed many times in the past.

Here’s the pattern. A new player jumps into your game and starts bouncing around your carefully crafted tutorial level. The level funnels them to the key, which they collect, and then on to the corresponding locked door, which they successfully open. Then, somewhere down the road, they encounter a second locked door… and are completely stumped. They’ve solved this problem once before – why are they having such a hard time solving it again?

What we have here is a problem-solution ordering issue. Because the player got the key in the first level before encountering the locked door, they never really formed an understanding of the causal link between “get key” and “open door”. They got the key, and then some other stuff happened, and then they reached the door, and were able to open it; but “acquiring the key” and “opening the door” were stored as two separate, disconnected events in the player’s mind.

If the player had encountered the locked door first, tried to open it, been unable to, and then found the key and used it to open the door, the causal link would be unmistakable. You use the key to open the locked door, because you can’t open the locked door without the key.

This problem becomes a lot more obvious when you don’t call the key a key, or when the door doesn’t look like a locked door. The “key/door” metaphor is widely understood and frequently used in video games, so many players will assume that you use a key to open a locked door even if your own game doesn’t do a great job of teaching them this fact. But if the “key” is really a thermal detonator and the “door” is really a power generator, a lot of players are going to wind up trying to destroy the second generator they encounter by whacking it ineffectually with a sword.
…

Max goes on to apply problem-solution ordering to teaching both math and monads.

I don’t recall seeing or writing any topic map materials that started with concrete problems that would be of interest to the average user.

Make no mistake, there were always lots of references to where semantic confusion was problematic but that isn’t the same as starting with problems a user is likely to encounter.

The examples and literature Max points to make me interested in starting with concrete problems that topic maps are good at solving, and then introducing topic map concepts as necessary.

There are lots of puzzle programming tutorials currently in fashion: Code.org, Gidget and Parsons programming puzzles. But do we really know whether they work? There is work [1] that shows that completion exercises do work well, but what about puzzles? That is what Kyle wants to find out.
…

Before you question the results based on the sample size (27 students), realize that is 27 more test subjects than a database project to replace all the outward services for 5K+ users ever had. Fortunately, very fortunately, a group was able to convince management to tank the entire project. Quite a nightmare, and a slur on “agile development.”

The lesson here is that puzzles are useful and some test subjects are better than no test subjects at all.

Another fifty courses have been added and I discovered a course in Hittite!

The same problem of collating content across resources that I mentioned for data science books obtains here, as you take courses in the same discipline or read primary and secondary literature.

What if I find references that are helpful in the Hittite course in the image PDFs of the Chicago Assyrian Dictionary? How do I combine that with the information from the Hittite course so if you take Hittite, you don’t have to duplicate my search?

That’s the ticket, isn’t it? Not having different users perform the same task over and over again? One user finds the answer and for all other users, it is simply “there.”

Quite a different view of the world of information than the repetitive, non-productive, ad-laden and often irrelevant results from the typical search engine.

The Law Library, Davis Library and the Sonja Haynes Stone Center have just purchased rich digital collections of NAACP, federal government and other organization documents. The collections illuminate the African American struggle to attain equal rights after Reconstruction. Collections span the 1870s to the 1980s. The collections are:

Black Freedom Struggle in the 20th Century: Federal Government Records

They supplement current UNC collections of NAACP documents and complement another new collection documenting earlier struggles, Slavery & the Law, and the existing Southern Life and African American History, 1715-1915, Plantation Records. Slavery and the Law features petitions on race, slavery, and free blacks that were submitted to state legislatures and county courthouses between 1775 and 1867.

The collections are in ProQuest’s History Vault Collection. For more information, contact a law librarian at 919-962-1194.

I rather doubt that the UNC Law Library purchased these collections outright. More likely, it secured access to these materials for the members of its faculty and student body. Hence the access via the ProQuest History Vault Collection.

Like any good massa, ProQuest is going to make a return on its investment, even if that excludes black Americans, indeed, all Americans, from learning the history of race in America from primary sources. Or at least those members of the population who don’t have institutional access to the ProQuest History Vault Collection.

What makes this particularly galling in this case is that the materials represent a history of struggling for freedom, a story that should be widely told. A story that is being suppressed, as it were, in the name of our current IP model in the United States.

If we are confined to the artifices of commercial exploitation currently in place, why doesn’t Congress, which has wasted billions of dollars on aircraft that exhibit spontaneous combustion (long rumored about people but confirmed in the F-35), site license this resource for everyone in the United States?

That would eliminate the paperwork for every institution that wants to access this material, eliminate the paperwork for all those ProQuest contracts, and make the original sources of our racial history available to every person located in the United States. So where is the downside?

While we work on changing the pernicious and exploitative IP regime of the present day, let’s change the rules on site licensing and let the greed of ProQuest lead it into doing the right thing. I care nothing for their motives, so long as universal access is the result.

Open Data is invaluable to support researchers, but we contend that open datasets used as Open Educational Resources (OER) can also be an invaluable asset for teaching and learning. The use of real datasets can enable a series of opportunities for students to collaborate across disciplines, to apply quantitative and qualitative methods, to understand good practices in data retrieval, collection and analysis, to participate in research-based learning activities which develop independent research, teamwork, critical and citizenship skills. (For more detail please see: http://education.okfn.org/the-21st-centurys-raw-material-using-open-data-as-open-educational-resources)

The Call:

We are inviting individuals and teams to submit case studies describing experiences in the use of open data as open educational resources. Proposals are open to everyone who would like to promote good practices in pedagogical uses of open data in an educational context. The selected case studies will be published in an open e-book (CC_BY_NC_SA) hosted by the Open Knowledge Foundation Open Education Group http://education.okfn.org by mid-September 2015.

Participation in the call requires the submission of a short proposal (of around 500 words) describing the case study. All proposals must be written in English; however, the selected authors will have the opportunity to submit the case both in English and in another language, as our aim is to support the adoption of good practices in the use of open data in different countries.

Use of open data implies a readiness to further the use of open data. One way to honor that implied obligation is to share with others your successes and just as importantly, any failures in the use of open data in an educational context.

All too often we hear only a steady stream of success stories, and we wonder where others found such perfect students, assistants, and clean data to underlie their success. We never realize that their students, assistants, and data are no better and no worse than ours. The regular missteps, false starts, and outright wrong paths are omitted in the storytelling. For time’s sake, no doubt.

If you can, do participate in this effort, even if you only have a success story to relate. 😉

Treadstone 71 continues to act as an unpaid (so far as I know) advertising agent for Sharif University.

From the university homepage:

Sharif University of Technology is one of the largest engineering schools in the Islamic Republic of Iran. It was established in 1966 under the name of Aryarmehr University of Technology and, at that time, there were 54 faculty members and a total of 412 students who were selected by national examination. In 1980, the university was renamed Sharif University of Technology. SUT now has a total of 300 full-time faculty members, approximately 430 part-time faculty members and a student body of about 12,000.

There are many documents available on honeypot detection. Not too many are found as a Master’s course at University levels. Sharif University as part of the Iranian institutionalized efforts to build a cyber warfare capability for the government in conjunction with AmnPardaz, Ashiyane, and shadowy groups such as Ajax and the Iranian Cyber Army is highly focused on such an endeavor. With funding coming from the IRGC, infiltration of classes and as members of academia with Basij members, Sharif University is the main driver of information security and cyber operations in Iran. Below is another of many such examples. Honeypots and how to detect them is available for your review.

It is difficult to find a Master’s degree in CS that doesn’t include coursework on network security in general and honeypots in particular. I spot-checked some of the degrees offered by schools listed at Best Online Master’s Degrees in Computer Science and found no shortage of information on honeypots.

I recognize the domestic (U.S.) political hysteria surrounding Iran, but security decisions based on rumor and unfounded fears aren’t the best ones.

Build a modern computer system, starting from first principles. The course consists of six weekly hands-on projects that take you from constructing elementary logic gates all the way to building a fully functioning general purpose computer. In the process, you will learn — in the most direct and intimate way — how computers work, and how they are designed.

This course is a fascinating 7-week voyage of discovery in which you will go all the way from Boolean algebra and elementary logic gates to building a central processing unit, a memory system, and a hardware platform, leading up to a general-purpose computer that can run any program that you fancy. In the process of building this computer you will become familiar with many important hardware abstractions, and you will implement them, hands on. But most of all, you will enjoy the tremendous thrill of building a complex and useful system from the ground up.

You will build all the hardware modules on your home computer, using a Hardware Description Language (HDL), learned in the course, and a hardware simulator, supplied by us. A hardware simulator is a software system that enables building and simulating gates and chips before actually committing them to silicon. This is exactly what hardware engineers do in practice: they build and test computers in simulation, using HDL and hardware simulators.
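Courses like this commonly start from NAND as the single primitive gate, which makes “from elementary logic gates up” concrete. The sketch below is in C rather than the course’s HDL. It assumes bits are the ints 0 and 1, and every gate calls only `nand()` or gates already built from it. The trailing underscores in `not_`, `and_`, `or_`, and `xor_` only avoid the macro names from C’s `<iso646.h>`. The half and full adders are illustrative next steps toward an ALU.

```c
#include <assert.h>

/* The one primitive: everything else is wired from nand(). */
static int nand(int a, int b) { return !(a && b); }

static int not_(int a)        { return nand(a, a); }
static int and_(int a, int b) { return not_(nand(a, b)); }
static int or_(int a, int b)  { return nand(not_(a), not_(b)); } /* De Morgan */

/* XOR from four NAND gates. */
static int xor_(int a, int b) {
    int n = nand(a, b);
    return nand(nand(a, n), nand(b, n));
}

/* Half adder: sum = a XOR b, carry = a AND b. */
static void half_adder(int a, int b, int *sum, int *carry) {
    *sum = xor_(a, b);
    *carry = and_(a, b);
}

/* Full adder: two half adders plus an OR for the carries. */
static void full_adder(int a, int b, int c, int *sum, int *carry) {
    int s1, c1, c2;
    half_adder(a, b, &s1, &c1);
    half_adder(s1, c, sum, &c2);
    *carry = or_(c1, c2);
}
```

Chaining 16 full adders, with each carry fed into the next, gives a 16-bit ripple-carry adder. This is the same bottom-up layering the course describes, one abstraction at a time.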

So why would I trust computers? We know computers are as faithful as a napkin at a party and have no history of being secure, for anyone.

Necessity seems like a weak answer doesn’t it? Trusting computers to be insecure seems like a better answer.

Not that everyone wants or needs to delve into computers at the level of silicon, but exposure to the topic doesn’t hurt.

It might even help when you hear of hardware hacks like rowhammer. You don’t really think that is the last of the hardware hacks, do you? Seriously?

BTW, I first read about this course in the Clojure Gazette, which is a great read, whether you are a Clojure programmer or not. Take a look and consider subscribing. Another reason to subscribe is that it lists a snail-mail address in New Orleans, Louisiana.

Even the fast food places have good food in New Orleans. The non-fast food has to be experienced. Words are not enough. It would be like trying to describe sex to someone who has only read about it. Just not the same. Every conference should be in New Orleans every two or three years.

In the words of Alex Szalay, these sorts of researchers must be “Pi-shaped” as opposed to the more traditional “T-shaped” researcher. In Szalay’s view, a classic PhD program generates T-shaped researchers: scientists with wide-but-shallow general knowledge, but deep skill and expertise in one particular area. The new breed of scientific researchers, the data scientists, must be Pi-shaped: that is, they maintain the same wide breadth, but push deeper both in their own subject area and in the statistical or computational methods that help drive modern research:

Perhaps neither of these labels or descriptions is quite right. Another school of thought on data science is Jim Gray’s idea of the “Fourth Paradigm” of scientific discovery: First came the observational insights of empirical science; second were the mathematically-driven insights of theoretical science; third were the simulation-driven insights of computational science. The fourth paradigm involves primarily data-driven insights of modern scientific research. Perhaps just as the scientific method morphed and grew through each of the previous paradigmatic transitions, so should the scientific method across all disciplines be modified again for this new data-driven realm of knowledge.

…

Neither of the labels in the graphic is correct, in part because this is a classic light-versus-dark dualism, along the lines of Middle Age scholars referring to the dark ages. You could not have asked anyone living between the 6th and 13th centuries what it felt like to live in the “dark ages.” That name was invented later, in the “Middle Ages,” to distinguish the “dark ages.” The term “Middle Ages” was, of course, coined during the Renaissance.

Every age thinks it is superior to those that came before, and the same is true for changes in the humanities and sciences. Fear not: someday your descendants will wonder how we fed ourselves, hobbled as we were with such vastly inferior software and hardware.

I mention this because the “Pi-shaped” graphic is making the rounds on Twitter. It is only one of any number of new “distinctions” springing up in academia and elsewhere, none of which will be of interest, or perhaps even intelligible, in another twenty years.

Rather than focusing on creating ephemeral labels for ourselves and others, how about we focus on research and results, whatever label has been attached to someone? Yes?

As our capacity to study ever-expanding domains of our science has increased (including the time domain, non-electromagnetic phenomena, magnetized plasmas, and numerous sky surveys in multiple wavebands with broad spatial coverage and unprecedented depths), so have the horizons of our understanding of the Universe been similarly expanding. This expansion is coupled to the exponential data deluge from multiple sky surveys, which have grown from gigabytes into terabytes during the past decade, and will grow from terabytes into Petabytes (even hundreds of Petabytes) in the next decade. With this increased vastness of information, there is a growing gap between our awareness of that information and our understanding of it. Training the next generation in the fine art of deriving intelligent understanding from data is needed for the success of sciences, communities, projects, agencies, businesses, and economies. This is true for both specialists (scientists) and non-specialists (everyone else: the public, educators and students, workforce). Specialists must learn and apply new data science research techniques in order to advance our understanding of the Universe. Non-specialists require information literacy skills as productive members of the 21st century workforce, integrating foundational skills for lifelong learning in a world increasingly dominated by data. We address the impact of the emerging discipline of data science on astronomy education within two contexts: formal education and lifelong learners.

Kirk Borne posted a tweet today about this paper with the following graphic:

I deeply admire the work that Kirk has done, is doing, and hopefully will continue to do, but is the answer really that simple? That is, all we need to do is provide people with “…great tools written by data scientists?”

As an example of what drives my uncertainty, a number of years ago I saw a presentation in biblical studies that involved statistical analysis. When the speaker was asked why a particular result was significant, the response was that the manual said it was. Ouch!

On the other hand, it may be that, as with automobiles, we have to accept a certain level of accidents/injuries/deaths as a cost of making such tools widely available.

Should we acknowledge up front that a certain level of misuse, poor use, or inappropriate use of “great tools written by data scientists” is a cost of making data and data tools available?

PS: I am leaving to one side cases where tools have been deliberately fashioned to reach false or incorrect results. Detecting those cases might challenge seasoned data scientists.

Singularity University (SU), the technology-focused education institute and global business accelerator, has announced a new multi-million dollar agreement with Google aimed at breaking down barriers to technology innovation by creating opportunities for a more diverse group of entrepreneurs from around the world.

Through the agreement, Google will provide $1.5 million annually for the next two years to help fund qualified and selected candidates to SU’s flagship Graduate Studies Program (GSP) – a 10-week immersive experience that educates and empowers the best minds to use exponential technologies to solve the world’s greatest challenges. While SU’s sponsored Global Impact Competitions (GIC) winners will continue to comprise a substantial portion of the GSP class, the new Google funding will enable SU to also make the remaining seats in the program available free of charge to direct applicants. GSP participants are engaged in twelve tracks of exponential technology development and mentored by leaders and investors in the technology sector with the focus of abating poverty and creating innovative solutions in the areas of clean energy, water, education, security, and healthcare.

This is a marked contrast to state-supported colleges and universities, where tuition continues to rise faster than inflation. Not to mention that educational loans, which are made at no risk to lenders, continue to burden students for years after graduation.

What does the “free market” know about the return on education that the “public sector” seems to have forgotten?

Rather than investing $trillions in the pursuit of terrorist bogeymen, paying off all student debt and making higher education free for everyone would be a much better investment.

In an effort to make Hadoop training for developers, analysts and administrators more accessible, Hadoop distribution specialist MapR Technologies Tuesday unveiled a free on-demand training program. Another track for HBase developers will be added later this quarter.

“This represents a $50 million, in-kind contribution to the Hadoop community,” says Jack Norris, CMO of MapR. “The focus is overcoming what many people consider the major obstacle to the adoption of big data, particularly Hadoop.”

…

The developer track is about building big data applications in Hadoop. The topics range from the basics of Hadoop and related technologies to advanced topics like designing and developing MapReduce and HBase applications with hands-on labs. The courses include:

Hadoop Essentials. This course, which is immediately available, provides an introduction to Hadoop, the ecosystem, common solutions and use cases.

Developing Hadoop Applications. This course is also immediately available and focuses on designing and writing effective Hadoop applications with MapReduce and YARN.

HBase Schema Design and Modeling. This course will become available in February and will focus on architecture, schema design and data modeling on HBase.

Developing HBase Applications. This course will also debut in February and focuses on real-world application design in HBase (Time Series and Social Application examples).

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.

While you were eating turkey, we were busy rummaging around the internet and adding new courses to our big list of Free Online Courses, which now features 1,100 courses from top universities. Let’s give you the quick overview: The list lets you download audio & video lectures from schools like Stanford, Yale, MIT, Oxford and Harvard. Generally, the courses can be accessed via YouTube, iTunes or university web sites, and you can listen to the lectures anytime, anywhere, on your computer or smart phone. We didn’t do a precise calculation, but there’s probably about 33,000 hours of free audio & video lectures here. Enough to keep you busy for a very long time.

The Georgia Institute of Technology, Udacity and AT&T have teamed up to offer the first accredited Master of Science in Computer Science that students can earn exclusively through the Massive Open Online Course (MOOC) delivery format and for a fraction of the cost of traditional, on-campus programs.

This collaboration—informally dubbed “OMS CS” to account for the new delivery method—brings together leaders in education, MOOCs and industry to apply the disruptive power of massively open online teaching to widen the pipeline of high-quality, educated talent needed in computer science fields.

Whether you are a current or prospective computing student, a working professional or simply someone who wants to learn more about the revolutionary program, we encourage you to explore the Georgia Tech OMS CS: the best computing education in the world, now available to the world.

A little more than a year old, the Georgia Tech OMS CS program continues to grow. In One Down, Many to Go, Carl Straumsheim reports high marks for the program from both students and administrators, who are still feeling their way along in this exercise in delivering education.

At an estimated cost of less than $7,000 for a Master of Science in Computer Science, this program has the potential to change the complexion of higher education, in computer science at least.

How many years (decades?) it will take for this delivery model to trickle down to the humanities is uncertain. J.J. O’Donnell made waves in 2004 by teaching Augustine: the Seminar to a global audience, yet there has been no rush of humanities scholars to follow his example.