Code review management/July 2011 training

Training and discussion of code review. Stepped through documents like the Security for developers document and other training material for reviewers. One goal: to spin off improvements in those documents, and create a training rubric that volunteers and staff can use to train at hackathons.

scaptrap — this might cause issues with updating on Wikimedia sites (configuration changes, schema changes, parser caches, dependencies). There will usually be a code comment that explains the problem. Before you deploy any revisions, check for scaptrap tags!

Tags are free form and can be added as needed (also: if you remove the last usage of a tag, it disappears from use)

Follow-up revisions: association is automatically detected if you mention the revision with rXXXXX in the commit summary, if you use the same bug number (in the format bug X), or if you manually add an association

You get an email if anybody makes a follow-up revision, or if somebody comments on a follow up revision.

Sign-offs: You can add that you've inspected or tested a revision. Contrary to the name, it is NOT a "sign-off" on the revision that marks the version as OK (taking responsibility for the revision and nobody else looks at it). It's designed to lower the bar to participation in code review.

You can search by path, author, status, tags with various interface elements.

Link your (or get a "Coder" to do it) account! See the new committer guide for how to do so.

MySQL query performance! People often write queries that are not indexed and/or scan a lot of rows. You need to understand what MySQL is doing when it executes my query

Example: namespace filter boxes all over the place. In a lot of cases the WHERE page_namespace=N is not indexed. MySQL might have to scan a lot of rows to satisfy the LIMIT, potentially scanning 100k rows to get 50

Unit tests are like penguins. Integration tests are like parties. They're both like mexican fighting fish. Einstein invented gravity.

Keep them separate, do not mix unit testing with integration testing.

Use the right and specific assertions (which are like cooks) for the right test. Don't use assertTrue( foo() == 5 ) Otherwise the test report will give useless data like "test foo didn't return true but false", wheareas you want to have "test foo didn't return 5, but 12".

Use sentence case descriptions. There is no problem with having long descriptions.

Inventing:

Write tests:, write a test too which you will live up. It will fail but it gives you a nice view of what needs to be done and how you imagine it will work.

Write code.

Run tests.

Refactoring:

Before you refactor, change the tests ("break them"). Then you're going to perform the actual changes and when you're done run the tests.

There is a design doc/style guide out there, has been for about a month, generated conversation, we can make changes.

Way forward: Timo is writing an extension, where he is modifying the HTMLForm class to generate forms in style-guide format once it's done, we'll backport it into trunk, so now anything that uses HTML form will magically pick all this up.

Timo: a class in MW called HTMLForm makes it easy to standardizedly generate forms. Right now uses browser's native styling

this ext - lots of work - so many forms! basically creating a new class that will replace html class? replace output automatically?

Andrew Garrett: "I wrote HTML form and want to help"

jorm: going forward, how do we enforce? we have volunteers & they may not wanna do it this way. I'd feel hypocritical if we said from now on, do it using htmlform or we will revert you.

In future, take forms &start porting them into html form... login form as a good example, legacy things that would not be that big a deal

Aaron: hashar added a RecentChanges checkbox.... recommendations.... is that ?
jorm: I'd wanna look at that. tag it as design.

Is there a tag already?
No, but we should have one

We have to in the long run develop ... a practice? lots of people have to have this practice

a small review team will be a bottleneck. styleguide & HTMLform will help train lots of nerds in UI review, in the longrun

did not catch that?

Google - you get certified to write in certain langs before you can commit

Brandon likes the idea of having a measure of trust. It is a complicated thing, but we could try to make it easy as possible.

[did not catch that response]

We ought to start having design review BEFORE code commit. "we're gonna redo this, this is how it will behave," get design documents.

Aaron: community-driven stuff like PendingChanges... hard to resist what community wants!

PendingChanges is special.
Design by committee leads to poor design.
2500 people voting on design: should not happen! 3-4 people in a room pondering data & making decision: that is what we want.

Erik: it's common that, if you open the possibility of having a feature/option, you'll get some people who say I'd like that! So an important piece: maybe not a cabal, but we gotta scrutinize -- "we have too many preferences!" in pendingchanges, the entire spec evolved by community. We'd like -- community suggests, via Bugzilla or something, "we'd like this feature," & then we talk, build an interactive prototype? we don't want mix of spec, rough UI, voting. Bad process. Have to push back.
We do have 300 user testing -- usertesting.com accounts? tests? get a quick UX test by real human beings! you get a video back. Did this extensively with Upload Wizard, very helpful
and it's something we can use to show community how real people interact with something they want.
proof of problems surfacing.

An evil feature of PHP, deprecated for years.
As of PHP 5.4 removed entirely. \o/

Anything you submit in query string would be set as global var within PHP script. Awful design decision. Basically: NEVER use an uninitialised global. ALWAYS explicitly set it at the beginning of your application to some default value. You can't just check if it's initialised or not, because a malicious user could set it themselves in a query string.
Easy to protect against it, but dangerous if you don't. Many 3rd party users on shared hosts suffer from this being left on.

Easy to protect by always defining variables. Never just check for it to be set & set it conditionally if it is not set. Pass it through query parameter.... will end up use whatever is passed through query, skip the check

Tim already talked about this.
Mainly: best thing, always use our db wrappers
we have methods for building queries, insert, delete, etc. We probably have or should have a wrapper for everything. Ensure input to db is always escaped as it should.

Sanitize data, input, output.

A last minor thing: dynamic code generation. Please, never use eval or create_function or the /e modifier to preg_replace. They execute whatever you give them as code, and you may not have sanitized as well as you thought you did

there's the same sorts of things in parameters for shelling out. Escape parameters for that.

always do that if you need to shell out to an external binary

Questions?

RobLa: XSS: in particular, this is easy to underestimate the impact.
If you consider: once someone can put arbitrary JS into a page, they can do things on behalf of that user, incl malicious things. XSS is not necessarily a prob on its own, depending on severity, but is an entry vector to other things.
Ghost stories requested of Tim
Tim: CentralNotice extension didn't have CSRF protection. Allowed arb HTML inserted onto every page on every wiki! Worst-case scenario: use this to insert JS into every page, & within 5 min, all stewards visiting would have viewed it, run injected script. Attacker could use it, running as privileged users, to de-sysop every user on every wiki, vandalize all websites. No ability to revert it.

Tim: "Stored XSS" -- use vuln to insert malicious code into db, where it persists.
"As nasty as you can imagine." Blocking all users... use everyone's bandwidth to attack site.

XSS allows account takeover more generally.

Tim: XSS vulns -> lets you get passwords. You can rewrite all page HTML, so you could put in a "you've been logged out" page, gather users, passwords, emails, etc via phishing.

RobLa: moral of the story: one can downplay XSS attacks. Let's not.
Also: often ignored, worth repeating: validate your input & escape output. Maybe you can skip first part, never skip the second.
Reason for this: from a code review perspective, sometimes hard to check 1st part, easy to check escaping. Not good enough for code to be secure; has to be obvious that it's secure. Otherwise reviewer might think "surely someone has considered this"

methods, classes, protection for developers. sanitizer class, html & xml classes, all db methods. They are there. Use them & do not try to do it yourself

Arthur:
last year, during fundraiser, all fundraiser code coming through codereview marked deferred
still doing that
let's stop doing that, ensure fundraiser code gets reviewed
we're trying to be more selfsufficient & build capacity for code review, but will need help

also: we have set up a separate code repo for non-MW code
we request review help from people who know security for Drupal, Drupal modules, & CiviCRM

advice from someone: you should do cross-reviews as much as possible
PayPal Pro Gateway, for example

Tim: I've reviewed payflow stuff
not that hard
to someone who is familiar with MW, Drupal is PHP, will feel mostly familiar

codereview extension: supports multiple repositories?
how should we work this? many unreviewed revisions, but some are separate from MW...

Inside the Wikimedia repo, CiviCRM, other non-MW fundraising stuff. Also integrated with codeReview tool.
on the other hand, CentralNotice & DonationInterface are MW things the fundraising team works on *in* MediaWiki