Wednesday, 30 March 2016

G'day:
We take our TDD very seriously on our London-based team: all (PHP) code is developed using TDD, which means we spend a lot of time working with PHPUnit. And given the nature of testing, the test code can get pretty bloody repetitive at times. We also leverage dependency injection for all our dependencies, and as we take a Clean Code approach, we sometimes end up with a whole bunch of dependency methods to mock-out when we're testing the class depending on the... ah... dependency (I'll submit that one for "Clumsiest Sentence of the Day").

Furthermore, it seemed to me that our test classes were getting cluttered with a very high percentage of PHPUnit boiler plate code facilitating the mocking. It seemed to me that PHPUnit's fluent interface is just a mechanism for clutter.

This is all very contrived (in case you had not noticed from the class / method naming), but here we have SomeService which has three methods which depend on a repository to handle the actual data read/writes. When testing SomeService, we want to mock SomeRepository out completely as it's not being tested.
It's the getMockedSomeRepository() method which is my focus here. Every method needs the same boilerplate code to mock it out: how many times it's called, what its name is:

As an aside, don't ask why PHPUnit does it in that order, but that's the recommendation. I guess it's a metaphor for "expect the foo method to be called with these arguments and it will return this value". That reads OK when it's English, but when looking at the code, it's a bit like Yoda wrote it. Still: shrug.

Also there's the same set of variations on mocking the class itself: one either disables the constructor or passes it some arguments, and it needs to have the methods specified at this point (as well as individually when mocking the methods).

Back to considering the method mocking: I've demonstrated that it's not always the same set of expectations being set: on the mock for setTheThing() I do not check the return value, cos it doesn't have one. Similarly sometimes one doesn't want to test how many times a method is called (no expects()) or what arguments it receives (no with()); or sometimes willReturn might actually need to be a will() call (which allows for more complicated results, like using a callback which contrives the result from the arguments, and other good stuff).

But it's always the same variation on the same theme. And I did not like having to type ->expects($something), ->with($somethingElse) over and over. The "sensible" approach seemed to be abstract out the boilerplate code, and just pass the values into some helper which then had the boilerplate once.

The conceit is that - as I mentioned above - the data defines the behaviour here, not a bunch of repetitive boilerplate code. I just state the class I want to mock, and - in one fell swoop - a data structure of the methods I want to mock and their mocked behaviour. I don't need to worry about how the mocking takes place, this is all done within mockObject(). Lovely. And meself and me colleague started to use this on our team's work.

Come the Friday PHP Team meeting, I mentioned it in passing to the rest of the bods on the broader PHP team (we were only using this in one of our sub-teams), and they were appropriately hesitant of code they didn't have a hand in, but seemed to agree with the pain of repeated boilerplate needing a solution, but seemed ambivalent about this particular solution. Then after some reflection our team lead made a couple of very salient points (my paraphrasing of his original sentiment):

If we hire a new dev, there's a chance they might already know PHPUnit, but what they definitely won't know is how to use this approach. Is this a good thing?

Where's the actual gain here? Is it actually less code? Have you solved the problem you were trying to solve?

Point 1 is relevant, but in reality in my experience bugger-all devs know the first thing about unit testing, so it's slightly optimistic to think this will be an issue. In reaction to point 2 I was like "well of course it is! It must be!". Then I checked.

The former is 24 lines of code, and the latter 22. Not exactly a win there. If we were pedantic and went down to byte level, it's 756 vs 702. Until one thinks that one needs to import the class that the helper is in, so adding in that - in my instance, with my namespace - brings the helper version to 752 bytes.

Now I'm not interested in the number of bytes... that's daft. But I am interested in how many lines of code we're saving... and I'm sorry but 22 vs 24 doesn't cut it for me. It might be 8% or so, but... this is not the sort of code that's ever gonna be reflective of hundreds or even dozens of lines of code saved in a test class file. Over all it might save hundreds of lines of code in our code base, but it doesn't matter when it's spread through hundreds of files. It only really matters when looking at the one method in front of one at that point in time.

In the end I am rather disappointed that I don't think this solution is a good one, and I have reverted to using PHPUnit's mock builder approach.

Another of my colleagues pointed out another flaw - and this was a flaw in my approach to using PHPUnit's mock builder in general - one doesn't necessarily want to mock the object and the methods in one fell swoop anyhow. One might - and probably does - want to mock the dependency object itself in setup(), but then it's on a test-by-test basis that one needs to provide the method-mocking details. It's annoying that one needs to call the initial setMethods() at object level, and can't simply have that inferred from calling something like mockMethod() on the mocked object, but that's generally OK.

From a personal perspective: one thing I am taking away from all this is that my work - which I was quite pleased with - was challenged, and I didn't get all defensive about it, which is an easy trap to fall into. I went "hmmm... yeah, best I look at that...", and... I turned out to be wrong. And it actually felt good to find out I was wrong (disappointing, but good). Also: in the discovery process I now know a bit more about how PHPUnit works, and I am a better dev for it. Also it gave me a good topic to write about.

I know @tom starts his answer by saying "it doesn't matter", but this cannot be stressed enough, so I'm going to elaborate further in a separate answer.

@piotr-gajdowski: just don't worry about this sort of thing. If one needs to loop thousands of times to see a difference, this implies it's a case of micro-optimisation. The differences here will be lost in the background noise of any other consideration going on at the same time, for example:

what else the server is doing at the time;

network latency variation;

how busy external dependencies like DBs are at the time.

So... it simply doesn't matter.

Always go for the code that you find clearest, and - more importantly - that other people will find the clearest. This generally means the simplest.

For your specific situation, given you given us "fake" code, it's impossible even to say given your example what that might be:

how complex are the two conditions?

Are they really just single variables?

Do the variables have explanatory names?

If they conditions are actually boolean expressions rather than variables... consider putting them in well-named intermediary variables.

These are the things you should be worried about.

Write your code as clearly as possible (this doesn't not necessarily mean as tersely as possible, btw). If at some point you find you have performance issues, then (and only then) start looking for optimisations. But don't start looking for this sort of optimisation. Start looking for things that will make a difference.

I think the best thing you could do with your code example would be to submit the real world code to Code Review, and then we can help you come up with the clearest approach to coding it. That's the question you should be asking here.

If all we had to go on was the example provided, then it's difficult to say what the clearest approach would be here. The variable names are nonsense in the posted code, so one cannot infer much from them.

If say we are talking about two variables, and they're clearly named, this would be a clear option:

If the logic to derive the second boolean expression is somehow predicated on the first condition condition being true, and short circuiting won't work, then I'd consider taking an early exit over a nested if:

All things being equal, I attempt to keep my code as close to the left margin as possible. If I have to scan for the end of a logic block, rather than just see it at the same time as the top of the logic block, I've probably got a chance to simplify the logic. I am always a bit wary of of having a negative expression in a decision statement, so if poss I'd try to reverse the name of an intermediary variable:

function canAttendMeeting(){
// other code redacted
if (doesNotHaveFreeTime) {
return false;
}
// other code only possible if the person has free time
return isInOffice;
}

And there will be other variations which are more or less viable given the circumstances. But generally the consideration is code clarity, not micro-optimising a situation not yet known to need it. And in that case, one needs to consider things a bit more clearly than how to order if statements.

Monday, 21 March 2016

G'day:
This one stemmed from a challenge we had with some code at work, and the thought exercise as to how it should have been resolved. Unfortunately the code in question is in a library our team needs to use, but didn't have any say in the design, plus it's out the door now so we're kinda stuck with it. However hopefully we can reduce the chance of this sort of thing happening again.

We use a variation of the Repository Pattern to interface between our application and whatever mechanism provides the data for it. Our philosophy is that an application's repositories with take data from [something] and model it into the domain objects our application wants to deal with. No storage considerations, or storage schema requirements should leak into the application: no table or column references, JSON (we use a bunch of web services for a lot of our data fetching), or the like. So our repositories are adapters. I dunno how closely this sticks to the letter of the pattern as per Microsoft's intent, but... shrug... it makes sense to us.

We'd typically have this sort of flow:

a request is matched to a route;

a route is handled by a controller;

a controller calls one or more services to do whatever's necessary to fulfil the request;

data is passed to a Twig template which creates the response mark-up.

Typical MVC sort of stuff. Sometimes we go straight from controller to repo if it's a situation that there's no real business logic, but this has proven dangerous because it's very rare there's no business logic, so even if our services just bridge between controller and repo: so be it. Note that a lot of our business logic is done by the remote services we call, as we are just one client application using these services; there are many others, so we keep the logic as close to the centre of shared operations as possible.

Also note as I alluded to above: our repos do not perform business logic. They simply transform data from the application-neutral format (or stoage-specific format) to the our-application-specific format.

The repo is initialised with a connection to the storage mechanism, and all it does it ask the storage mechanism for some data, then rehydrate the returned raw data into a domain object for the application to use. This means the only part of the application which knows about how the data is exposed to the application is the repository. If we change our storage from being web-serviced based to being a direct connection to an SQL data store; we simply change the repository. The rest of our app stays the same. Nice.

A while back I encountered code similar to this (below). I hasten to add all the code in this article is purpose-written for this article, and any resemblance to our actual code is only so that I can demonstrate the point. Anyway, here's a user repository:

'ullo 'ullo 'ullo. What's going on here then? Now as well as going about its business as a user repository, our repo also seems to know about logging and caching. Logging and caching having nothing to do with the repo's job, yet now close to 50% of its code is dealing with these. And we've tightly coupled this functionality into the repo itself (at least it's external services doing the actual work). Really now the UserRepository should be called a LoggedCachedUserRepository.

It also means that to have a working UserRepository, I also need to have working caching, and working logging Whether I actually want them or not. Soon after this code was introduced we had another situation to use the UserRepository, but it was in a a different context, and the logging and caching services weren't available (nor in this case needed or wanted). We could still negate them kinda by creating a null logger and a null cacher, along these lines:

That's "fine", and that "works", except it's even more code to maintain to actually do less stuff. All cos we tightly coupled caching and logging to the repo in the first place.

Also... logging and caching are business logic, so kinda violate our general usage pattern of repositories anyhow.

But in some situations we do actually want the caching tier, and we do actually want the logging tier. It was legit for those operations to be being run. It's just not legit for the code to be in the repository like that. So what ought we have done?

As long as it's a UserRepositoryInterface, it doesn't care how it goes about implementing it. Next we add a decorator to handle the logging, but to only handler the logging. It hands the user-getting task to a dependent UserRepository that knows how to do that. And of course we already have one of those! Here's the code:

Also note it still fulfills the UserRepositoryInterface interface, so can be passed to the UserService no problem. The UserService just sees a UserRepository. It doesn't care what other things it does... it just knows it can use it to call getById() and get a user back.

This one knows about caching, and knows how to call the repo its decorating to do the actual "repo-ing". Again it implements the appropriate interface. Also note that we can pass this constructor a LoggedUserRepository (which itself is passed a UserRepository), and we end up with a repository which handles caching, logging, and... erm... I need a better word... repo-ing. But now all the concerns are nicely separated.

Obviously this takes a bit of wiring up, however it's very easy with a DI container. Here's the relevant bits:

Or, hey, another requirement might come along at some point... dunno what it might me... an audit process separate to the logging process or something. Then it's just a matter of creating a new AuditedUserRepository, and all it has to do is audit stuff, then use its dependent UserRepositoryInterface object to do the actual repository stuff (via a logging and caching tier, completely unbeknownst to it).

It also goes without saying (hopefully) that the testing of all this can be far more focused. Testing of each component is completely discrete from the the other components, and they don't need to worry about one another. They're irrelevant. It keeps the code simple and focused, and on-point.

My take-away from this is that if I start piling code or more dependencies into an existing class, I need to stop and think about whether that code is actually anything to do with what the name of the class suggests it should be busying itself with, and if not... I'm perhaps putting the code in the wrong place.

Thursday, 17 March 2016

G'day:
This came up in our team meeting the other day, We write a reasonable wodge of JavaScript, and its of varying degrees of veneration, quality and uniformity (if that last one isn't an oxymoron). One of the quality-variations we have is how we handle our event handlers specifically in the context of how they access the parent-object's this reference. Now I haven't read-up thoroughly on how this works in JS, but I know enough to get by. Until I don't. I think many people are like this. Anyway, within an object, this refers to the object itself. Within an event handler, it refers to the object which caused the event to occur, and the handler to run. If the event handler method is in an object, and it needs to access both the object firing the event and the object the handler is in, one cannot use this to reference both of them.

That's be easier to visualise in code - and I will - but I'll finish some framing first. In our code base we use a mixture of a coupla different approaches:

use something like that = this to make a second reference to the object's this so the handler can reference this to refer to the object firing the event, and that to refer to its parent object. Bleah. That just reeks of "hack"

use JQuery's proxy() method to re-task the function's this to be its parent object's one, rather than the event-firing object.

Personally I prefer the latter version. It seems less hacky.

Anyhow, the meeting discussion got snarled up with the idea that if we use the proxy to re-purpose this, we then can't use this to reference the object firing the event, which we often have to do at the same time. This made us go "hmmm...", and then we ran out of time to come up with a satisfactory answer, and moved on.

I kept thinking about this and then kicked myself cos it was obvious. this in the context of the event handler is pretty much just syntactical sugar as far as I can tell, and it's all a bit magical. As well has having that event object exposed via this, it's also obviously exposed in the event argument that's passed to the handler. We don't need to use this to reference the event-firing object. Indeed IMO it's cleaner if we simply don't do that.

I had to get this clear in my head, so jotted some code down. Which is here:

Page defines a sorta class (as much as JS can), and within that there's a method to bind some event handlers to those buttons, and then the handlers themselves. Each bind and handler uses a different technique to access the event-firing object or the page object.

In clickHandlerWithoutProxy() we don't try to handle anything special... it just demonstrates that e.target refers to the object that the event was fired by, and also that this points to the same thing.

On the other hand clickHandlerWithProxyRef() exposes both the Page object's this (via a proxy variable self (we variously use that or self or probably _this too), and then the event handler's this still points to the event-firing object. Note how here I'm binding a method call to the event handler, and that method returns the actual handler, having first created that proxy self variable. This is an example of closure at work.

JQuery rescues us in the third option - clickHandlerUsingJQueryProxy() - $.proxy() let's one specify a different context for the event handler's this. This is all very cool and easy to do. It's predicated on using JQuery though. Which we do. And to access the event-firing object, we use the event argument passed into the handler, which has a target property which is that object. So we are accessing the handler's own object via this, and the event's object via target. Cool. And also doesn't have any surprises in the code: this always means the same thing, as is usage of e.target.

The last option - clickHandlerUsingBind - eschews JQuery and does the same thing using native JavaScript: the bind() method of the function object effects the same thing $.proxy() does. This was my research this morning, when I decided to find out how $.proxy() works. I think there is some browser compat issues with this... I read something about in IE8 needing to use a different approach. I still need to read up on that.

I'd err towards using the last approach, just using JS, but I think perhaps the JQuery approach is perhaps the safest here.

As I will say when I write about JavaScript, I am no expert in it at all. I know just enough to get by - this annoys me but it's the truth - so if I've got anything horribly wrong here, please do let me know!

Tuesday, 15 March 2016

G'day:
People whose opinion I trust have urged me to stop wasting so much of my time with the CFML community, especially given I myself no longer use CFML (other than on this blog). I don't even use it as my "quick scripting language" any more: I use either PHP or JavaScript, or try to force myself to use something else.

A while back I decided to stop paying attention to CFML once ColdFusion 122016 and Lucee 5 were out the door. I'd have a look at those, offer my thoughts, then move on. Well CF2016 is out and it seems Adobe have even less interest in ColdFusion than I am trying to have these days, and I can't see any evidence that Lucee 5 is ever gonna be released, so I'm not waiting around for that. What a disappointment. Also setting a future milestone to do something is a motivational cop-out like New Year's resolutions are. If you want to do something: just do it. Don't schedule it in for later. Deciding to do something then starting out by not doing it seems daft to me.

So today I unsubbed from my filter that emails me CFML / ColdFusion / Lucee / Railo issues on various Stack Exchange forums. I now only "follow" PHP7 there. I want to broaden that to some focused areas of the PHP & dev community, but yet to decide where to focus there.

I also deleted my column in my Twitter client that trackes various CFML-ish hash tags. I'll be ceasing "following" (always sounds creepy to me) people who I follow purely because they're in the CFML community, and I don't otherwise have dealings with.

I've unsubbed from the Lucee Google group, but I can't work out how to unsubscribe from the Lucee Lang Forum on Discourse. This does not surprise me of the Discourse UI/UX. Still: unsubscribing from that is like ignoring a one-handed person clapping anyhow.

I've maintained my subscription to the CFML Slack channel as I'm an admin there, but I'll ditch that shortly too.

This is not one of these rage-quit "I'm Leaving The Community!" blog articles. Just notification that I will not be paying attention to a bunch of the channels via which people used to get my attention. If I notice something that really takes my fancy: I'll have a look at it. I'm just going to try to not notice from now on.

I guess it's more the exercise of a newly-reformed alcoholic going around the house and tipping out the booze from all the bottles hidden around the place.

Tuesday, 8 March 2016

G'day:
I almost wrote an article about this six or so months ago as it's a half-way interesting topic. And now I realise I should have because someone asked the very same question on Stack Overflow ("php equivalent for coldfusion cfsavecontent"). The accepted my answer, so I'm gonna do another sneaky cop-out and replicate my answer as an article here as well. Hey... I wrote it, I should be allowed to do what I like.

For the uninitiated, <cfsavecontent> (or savecontent()) allows one to enclose some code in a block, and any output generated within the block is assigned to a variable instead of being output. This is handy for capturing large strings which need logic to build: much nicer than just string concatenation.

Honestly Adobe, WTH were you thinking? Passing the variable name as a string? How was this not just an assignment, like anyone with a glimmer of language design skills might do? And, yes, I have mentioned this before: How about this for savecontent?

Anyway, that's CFML. In PHP one doesn't have tags, but equally one doesn't have weirdo assign-by-string-name constructs either. The equivalent PHP code is:

(I'm running this from the shell, hence PHP_EOL instead of <br> for the line breaks).

That's pretty straight forward. What I actually like about this is there's no requirement to have a block structure at all: one simply says "start output buffering", and then later one does something with what's been buffered: in this case returning it, so I can output it.

There's a whole bunch of other functionality in the Output Buffering "package", and now it's piqued my interest so I will go have a look at it.

G'day:
This is a follow-on from my earlier article: "PHPUnit: trap for dumb players regarding mocking". In that article I described how I fell foul of trying to mock a method that had been declared final, and - basically - wasting time from a mix of me not thinking to RTFM, validating my assumptions, and PHPUnit not being as declarative with its feedback as it could be.

Type-checking can be unhelpful

1) me\adamcameron\mocking\test\service\MyServiceTest::testDoesStuff
Argument 1 passed to me\adamcameron\mocking\service\MyService::__construct() must be an instance of me\adamcameron\mocking\service\LoggingService, instance of Mock_StubbedLoggingService_e4bcfaaf given

For web application code, I would simply never bother type checking arguments. PHP is a dynamic, run-time compiled language, so type-checking really makes little sense in most situations. In this situation it's one of our API libraries, so it kinda makes sense. Normally I'd go "an API should typecheck as much as possible", but to me that only applies to APIs which will be used as a third-party module. For internal libraries - even shared ones - it's just boilerplate for the sake of it, IMO.

This is important, so I'm gonna make it bold:

That aside, in a situation you know the library is going to be used as a dependency of other code - especially code you know is going to be using TDD and dependency injection - do not type check on an implementation. Use an interface.If we'd used an interface here, I could still use my stub, just by saying it implements the interface (which it does already!).

I could not get TestBox working on my ColdFusion 2016 install (since fixed), so I needed to use CF11 for this, hence using the function version of encodeForHTMLAttribute(). The method version is new to 2016.

we could probably argue over the best pattern to use for the regex all day. I specifically wanted to take a different approach to Dom's one, for the sake of comparison. I'm not suggesting mine is better. Just different. The key point we both demonstrate is don't use a raw regex pattern, always give it a meaningful name.

looking at the "single-expression-solution" I have here, the code is quite dense, and I can't help but think Dom's approach to separating them out has merit.

Code review notes:

your original function doesn't work. It specifies variables.myKey when it should be local.myKey. It's clear you're not testing your original code, let alone using TDD during the refactoring process. You must have test coverage before refactoring.

the function name is unhelpfully non-descriptive, as demonstrated by Dom not getting what it was doing. I don't think my function name is ideal, but it's an improvement. I guess if we knew *why

you were doing this, the function name could be improved to reflect that.

don't have compound if conditions like that. It makes the code hard to read. Even if the condition couldn't be simplified back to one function call and you still needed multiple subconditions: extract them out into meaningful variable names, eg: isDataAttribute || isOnAttribute || isNgAttribute

key and value are unhelpful argument names

slightly controversial: but unless it's an API intended to be used by third-parties: lose the type checking. It's clutter in one's own code.

Use them. NB: the includes in beforeAll() simply contain each version of the function.

One thing I didn't put as I don't think it's relevant to this code review is that one should be careful with inline function expressions. They're not themselves testable, so if they get more than half a dozen statements or have any branching or conditional logic: extract them into their own functions, and test them separately.

Oh, and in case yer wondering... yes I did write the tests before I wrote any of my own code. And I found a coupla bugs with my planned solution (and assumptions about the requirement) in doing so. Always TDD. Always.

That was impressive enough (and quite pleased he read my blog article!), but I arrive at work this morning and he's only gone and fixed it already! And by the looks of the pull req: it was no small amount of work, too.

Thursday, 3 March 2016

G'day:
This is just a demonstration of me being a git. But who doesn't like that?

We use PHPUnit extensively to test our codebase. And along with that we use its mocking facilities extensively to mock-out our dependencies in these tests. It's more of a hassle to use than MockBox was on CFML, but this is down to PHP being more rigid with how it goes about things, and is not the fault of PHPUnit (or indeed anyone's "fault" at all: it's just "a thing").

Last week I was flummoxed for a lot longer than I ought to have been, because for some reason a method simply would not mock. After a number of hours I asked the other bods on the team to put their eyes on the code and my mate Amar spotted the problem instantly. The issue was down to me making assumptions about one of our code bases that I was not familiar with, and not bothering to check my assumptions. And also some user-unfriendliness in PHPUnit.

Here's a repro of the code I was running. Well: it bears absolutely no relation to the code I was running at all, but it demonstrates the issue.

Firstly I have a simple service class which has a method I want to test:

This method uses a method from one external dependency, plus it also calls one of its own helper methods which does "risky stuff", so we actually want to mock-out both of those methods.

First things first I already know that one cannot mock private methods with PHPUnit, so we need to make that method protected. This is not ideal, but it's better than not being able to test safely, so it's a burden we are prepared to wear.

Having made that change, we need to partially mock the class we need to test:

There's not much to that, and indeed I have not even implemented it yet. This is cool... we can just mock it out and test that it receives what it's supposed to be passed by our doesStuff() method. That completes the coverage of doesStuff(): our test of doesStuff() does not need to care whether logSomething() actually works.

Search This Blog

About

I've been a web developer since 2001. I have spent 13yrs as a CFML developer, and until Sept 2014, that was the main subject matter here. I've now been re-tasked as a PHP developer, so learning PHP will become the focus of this blog. But I also mess around with other languages too.

I tend to be a bit "forthright" in my opinions, I am indelicate, and I tend to swear too much. This will come out occasionally here: I make no apology for it.

Everything said here is my own opinion. Feel free to disagree with me :-)