Tuesday, 27 November 2018

We kick off quite a few experimental projects. In most cases they never really live up to the original vision or no-one's interested.

This is different. It's already working so beautifully, and proving so indispensable here, that I'm convinced it will be even more important than Integrity and Scrutiny.

So what is it?

It monitors a whole website (or part of a website, or a single page) and reports changes.

You may want to monitor a page of your own, or a competitor's or a supplier's, and be alerted to changes. You may want to simply use it as a 'time machine' for your own website and have a record of all changes. There are probably use-cases that we haven't thought of.

You can easily schedule an hourly, daily, weekly or monthly scan so that you don't have to remember to do it. The app doesn't even need to be running; it'll start up at the scheduled time.

Other services like this exist. But this is a desktop app that you own and are in control of. It goes very deep. It can scan your entire site, with loads of scanning options just like Integrity and Scrutiny, plus blacklisting and whitelisting of partial urls. It doesn't just take a screenshot; it keeps its own record of every change to every resource used by every page. It can display a page at any point in time - not just a screenshot but a 'living' version of the historic page using the javascript & css as it was at the time.

It allows you to switch between different versions of the page and spot changes. It'll run a comparison and highlight the changes in the code or the visible text or resources.
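As a rough illustration of that kind of comparison (a sketch only, not how the app itself does it), Python's standard difflib can pick out what changed between two stored versions of a page's visible text. The two snapshots and the function name are made up for the example.

```python
import difflib

def changed_lines(old_text, new_text):
    """Return only the removed ('-') and added ('+') lines between two versions."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="earlier", tofile="later", lineterm="",
    )
    # The first two lines are the '--- earlier' / '+++ later' headers.
    body = list(diff)[2:]
    # Keep changed lines; context lines start with a space, hunk markers with '@@'.
    return [line for line in body if line.startswith(("+", "-"))]

# Hypothetical snapshots of the visible text of one page on two days.
monday = "Widget\nPrice: 10.00\nIn stock"
tuesday = "Widget\nPrice: 12.50\nIn stock"

for line in changed_lines(monday, tuesday):
    print(line)
```

The same idea works on the page source or on a list of resource urls; only the text being compared changes.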

It stores the website in a web archive. You can export any version of any page at any point in time as a screenshot image or a collection of all of the files (html, js, css, images etc) involved in that version of that page.

The plan was to release this in beta in the New Year. But it's already at the stage where all of the fundamental functionality is in place and we're using it for real.

Monday, 5 November 2018

If you're searching for help using WebScraper for MacOS then the chances are that the job involves pagination, because this situation provides some challenges.

Right off, I'll say that there is another approach to extracting data in cases like this from certain sites. It uses a different tool which we haven't made publicly available, but contact me if you're interested.

Here's the problem: the search results are paginated (page 1, 2, 3 etc). In this case, all of the information we want is right there on the search results pages, but it may be that you want WebScraper to follow the pagination, and then follow the links through to the actual product pages (let's call them 'detail pages') and extract the data from those.

1. We obviously want to start WebScraper at the first page of search results. It's easy to grab that url and give it to WebScraper:

2. We aren't interested in WebScraper following any links other than those pagination links (we'll come to detail pages later). In this case it's easy to 'whitelist' those pagination pages.

3. The pagination may stop after a certain number of pages. But in this case it seems to go on for ever. One way to limit our crawl is to use these options:

A more precise way to stop the crawl at a certain point in the pagination is to set up more rules:

4. At this point, running the scan proves that WebScraper will follow the search results pages we're interested in, and stop when we want.

5. In this particular case, all of the information we want is right there in the search results lists. So we can use WebScraper's class and regex helpers to set up the output columns.

Detail pages

In the example above, all of the information we want is there on the search result pages, so the job is done. But what if we have to follow the 'read more' link and then scrape the information from the detail page?

There are a few approaches to this, and a different approach that I alluded to at the start. The best way will depend on the site.

1. Two-step process

This method involves using the technique above to crawl the pagination, and collect *only* the urls of the detail pages in a single column of the output file. Then as a separate project, use that list as your starting point (File > Open list of links) so that WebScraper scrapes data from the pages at those urls, ie your detail pages. This is a good clean method, but it does involve a little more work to run it all. With the two projects set up properly and saved as project files, you can open the first project, run it, export the results, open the second project, run it and then export your final results.

2. Set up the rules necessary to crawl through to the detail pages and scrape the information from only those.

Here are the rules for a recent successful project

"?cat=259&sort=price_asc&set_page_size=12&page=" is the rule which allows us to crawl the paginated pages.
"?productid=" is the one which identifies our product page.

Notice here that the two rules appear to contradict each other. But when using 'Only follow', the two rules are 'OR'd. The 'ignore' rules that we used in the first case study are 'AND'ed, which results in no results if you have more than one 'ignore urls that don't contain'.

So here we're following pages which are search results pages, or product detail pages.
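The difference between the OR'd and AND'ed rules can be sketched in a few lines of Python. This is only an illustration of the logic described above, not WebScraper's own code; the function names and the example.com urls are made up.

```python
def only_follow(url, terms):
    """'Only follow' rules are OR'd: matching any one term is enough."""
    return any(term in url for term in terms)

def survives_ignore_rules(url, terms):
    """'Ignore urls that don't contain' rules are AND'ed: every term must be present."""
    return all(term in url for term in terms)

rules = ["?cat=259&sort=price_asc&set_page_size=12&page=", "?productid="]

# Hypothetical urls for illustration.
listing = "https://example.com/search?cat=259&sort=price_asc&set_page_size=12&page=3"
product = "https://example.com/item?productid=1234"
other = "https://example.com/about"

for url in (listing, product, other):
    print(only_follow(url, rules), survives_ignore_rules(url, rules), url)
```

With 'Only follow', the listing and product pages both get through. With two 'ignore urls that don't contain' rules, no url contains both terms, so nothing survives, which is the 'no results' situation mentioned above.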

The third rule is necessary because the product page (in this case) contains links to 'related products' which aren't part of our search but do fit our other rules. We need to ignore those, otherwise we'll end up crawling all products on the entire site.

That would probably work fine, but we'd get irrelevant lines in our output because WebScraper will try to scrape data from the search results pages as well as the detail pages. This is where the Output filter comes into play.

The important one is "scrape data from pages where... URL does contain ?productid". The other rule probably isn't needed (because we're ignoring those pages during the crawl) but I added it to be doubly sure that we don't get any data from 'related product' pages.

Whichever of those methods you try, the next thing is to set up the columns in the output file (ie what data you want to scrape.) That's beyond the scope of this article, and the 'helpers' are much improved in recent WebScraper versions. There's a separate article about using regex to extract the information you want here.

Wednesday, 26 September 2018

This is the answer to a question that I was asked yesterday. I thought that the discussion was such an interesting one that I'd post the reply publicly here.

A common perception is that a request for a web page is simply a request. Why might a server give different responses to different clients? To be specific, why might Integrity / Scrutiny receive one response when testing a url, yet a browser sees something different? What are the differences?

user-agent string

This is sent with a request to identify "who's asking". Abuses of the user-agent string by servers range from sending a legitimate-looking response to search engine bots and dodgy content to browsers, through to refusing to respond to requests that don't appear to come from browsers. Integrity and Scrutiny are good citizens and by default have their own standards-compliant user-agent string. If it's necessary for testing purposes, this can be changed to that of a browser or even a search engine bot.

header fields

A request contains a bunch of header fields. These are specifically designed to allow a server to tailor its content to the client. There are loads of possible ones, and you can invent custom ones; some are mandatory, many optional. By default, Scrutiny includes the ones that the common browsers include, with similar settings. If your own site requires a particular unusual or custom header field / value to be present, you can add them (in Scrutiny's 'Advanced settings').
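To make the idea concrete, here is a minimal Python sketch of a request carrying its own user-agent string plus an extra header field. It uses the standard urllib.request module; the user-agent value, the custom field name and the url are all made up, and this says nothing about how Scrutiny builds its own requests.

```python
import urllib.request

def build_request(url, user_agent, extra_headers=None):
    """Create a request object with a chosen user-agent and optional extra fields."""
    headers = {"User-Agent": user_agent}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)

req = build_request(
    "https://example.com/",
    user_agent="ExampleCrawler/1.0",  # made-up user-agent string
    extra_headers={
        "Accept-Language": "en-GB",
        "X-Example-Token": "abc",  # hypothetical custom field
    },
)

# urllib stores field names with only the first letter capitalised.
print(req.header_items())
```

Nothing is sent until the request is passed to urllib.request.urlopen, so this is safe to experiment with.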

cookies and javascript

Browsers have these things enabled by default. They're just part of our online lives now (though accessibility standards say that sites should be usable without them) but they're options in Scrutiny and deliberately both off by default. I'm discovering more and more sites which will test for cookies being enabled in the browser (with a handshake-type thing) and refuse to serve if not. There are a few sites which refuse to work properly without javascript being enabled in the browser. This is a terrible practice but it does happen, thankfully rarely. Switch cookies on in Scrutiny if you need to. But always leave the javascript option *off* unless your site does this when you switch js off in your browser:

GET and HEAD

There are a couple of other things under Scrutiny's Preferences > Links > Advanced (and Integrity's Preferences > Advanced). 'Use GET for all connections' and 'Load data for all connections'. Both will probably be off by default.

A browser will generally use GET when making a request (unless you're sending a form) and it will probably load all of the data that is returned. For efficiency, a webcrawler can use the HEAD method when testing external links (because it doesn't need the actual content of the page, only the status code). If it does use GET (for internal connections where it does want the content, or if you have 'always use GET' switched on) and if it doesn't need the page content, it can cancel a request after getting the status code. This very rarely causes a problem, but I have had one or two cases where a large number of cancelled requests to the same server can cause problems.

'Use GET for all connections' is unlikely to make any visible difference when scanning a site. Using the HEAD method (which by all standards should work) may not always work, but if a link returns any kind of error after using the HEAD method, Integrity / Scrutiny tests the same url again using GET.
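The 'HEAD first, then GET if that fails' behaviour can be sketched like this in Python. It's a simplified illustration under my own assumptions (any status of 400 or above counts as an error, network failures are not handled), not the apps' actual code, and the function names are invented.

```python
import urllib.error
import urllib.request

def fetch_status(method, url, timeout=10):
    """Make one request and return its HTTP status code without reading the body."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx / 5xx responses arrive as exceptions; the code is what we want.
        return err.code

def check_link(url, fetch=fetch_status):
    """Try HEAD first; if it gives any error status, test the same url with GET."""
    status = fetch("HEAD", url)
    if status >= 400:
        status = fetch("GET", url)
    return status
```

Passing the fetch function in as a parameter makes it easy to try the logic with a stand-in that returns, say, 405 for HEAD and 200 for GET, without touching the network.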

Other considerations

Outside of the particulars of the http request itself are a couple of things that may also cause different responses to be returned to a webcrawler and a browser.

One is the frequency of the requests. Integrity and Scrutiny will send many more requests in a given space of time than a browser, probably many at the same time (depending on your settings). This is one of the factors involved in LinkedIn's infamous 999 response code.

The other is authentication. A frequently-asked question is why a link to a social media site returns a response code such as 'forbidden' when the link works fine in a browser. Having cookies switched on (see above) may resolve this but we forget that when we visit social media sites we have logged in at some point in the past and our browser remembers who we are. It may be necessary to be authenticated as a genuine user of a site when viewing a page that may appear 'public'. Scrutiny and WebScraper allow authentication; the Integrity family doesn't.

Friday, 21 September 2018

Vocabagility is more than a flashcard system, it's a method. Cards are selected and shuffled, one side is shown. Give an answer, did you get it right? Move on. As quick and easy as using a pack of real cards in your pocket.

The system also encourages you to invent an amusing mental image linking the question and answer (Visualization and Association).

Cards that you're not certain about have a greater probability of being shown.
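One simple way to get that kind of weighting is a weighted random choice. The sketch below is only an illustration: the 'confidence' score from 0.0 to 1.0 and the weighting formula are assumptions made for the example, not Vocabagility's actual method.

```python
import random

def pick_card(cards, rng=random):
    """Pick one card; the less certain you are about it, the likelier it is shown."""
    # Assumed scale: confidence 0.0 means 'no idea', 1.0 means 'certain'.
    # Weights then run from 5.0 (no idea) down to 1.0 (certain).
    weights = [1.0 + (1.0 - card["confidence"]) * 4.0 for card in cards]
    return rng.choices(cards, weights=weights, k=1)[0]

# Made-up cards for illustration.
cards = [
    {"front": "chien", "back": "dog", "confidence": 1.0},
    {"front": "chat", "back": "cat", "confidence": 0.0},
]

card = pick_card(cards)
print(card["front"], "->", card["back"])
```

Every card keeps some chance of appearing, so words you know well still come round occasionally.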

This is an effective system for learning vocabulary / phrases for any language but could be used for learning other things too.

Sunday, 16 September 2018

In the last post I gave a preview of a new direction for ScreenSleeves and now it's ready to go.

Changes in MacOS Mojave have made it impossible to continue with ScreenSleeves as a true screensaver. Apple have not made it possible (as far as I know at the time of writing) to grant a screensaver plugin the necessary permission to communicate with or control other apps.

Making ScreenSleeves run as a full app (in its own window) has several benefits:

Resize the window from tiny to large, and put it into full-screen mode.

Choose to keep the window on top of others when it's small, or allow others to move on top of it

The new version gives you the option to automate certain things, emulating a screensaver:

Switch to full-screen mode with a keypress (cmd-ctrl-F) or after a configurable period of inactivity

Switch back from full-screen to the floating window with a wiggle of the mouse or keypress

Block system screensaver, screen sleep or computer sleep while in full-screen mode and as long as music is playing

As mentioned, Mojave has much tighter security. The first time you run this app, you'll be asked to allow ScreenSleeves access to several other things. It won't ask for permission for anything which isn't necessary for it to function as intended. You should only be troubled once for each thing that ScreenSleeves needs to communicate with.

The new standalone version (6.0.0) is available for download. It runs for free for a trial period, then there's a small price to continue using it. (Previously, the screensaver came in free and 'pro' versions, with extras in the paid version.)

Friday, 7 September 2018

ScreenSleeves has been a popular screensaver for a number of years, but the security changes in the new Mojave OS may make its functionality impossible.

Over the years people have suggested that it could be a free-standing app rather than a screensaver. This comes with some advantages - eg you can keep it minimised and floating above other windows in the corner of the screen when it's not in full screen mode.

This may be the only way to keep the screensaver alive. I've been experimenting with the idea, ironing out some issues related to the change, and using it. I have to say that I like it very much.

Monday, 27 August 2018

There are reasons why you might want to start using the web download version of Integrity Plus or Integrity Pro after buying the App Store version.

(We're happy to provide a key, with evidence of the App Store purchase as long as it's for the same user).

The App Store version is necessarily 'sandboxed', a security measure imposed by Apple for apps sold on their Store. However, this kills certain features, such as the ability to crawl a site stored as local files. So the web download version remains un-sandboxed (it pre-dates sandboxing).

The sandboxed and un-sandboxed apps store their data in different places. When switching from the web download version to the app store version, the migration is taken care of by the system (this is the way Apple want you to go and so they make this easy. Invisible in fact).

The app doesn't (yet) detect and automatically handle the reverse situation. But it's possible to do this manually.

This requires you to export while you have the app store version installed, and import after you've replaced it with the web download version.

Option 2. Use these Terminal commands.

They check for and remove any preference file which will be present if you've already run the web download version. Then copy the data from the sandbox 'container' to the location used by the web download version.

(This first set of instructions is for Integrity Plus. For Integrity Pro, scroll down)

Important: now log out of the system and log back in. The system does some wicked things with caching these files. It's sometimes possible to make our change 'stick' using another Terminal command, but I've not found that as reliable for these purposes as logging out / in.

Now start the web download Integrity Plus and see whether your data appears.

1. If you’ve not already done so, make sure your Bridge and some bulbs are switched on and start Hue-topia. The first time that you start the app it will try to find your bridge and attempt to log in. Finding the bridge requires an internet connection.

The only thing that you should need to do is to press the button on the bridge when instructed, to pair it with the Hue-topia app. If there are any problems at this stage, see Troubleshooting in the Hue-ser manual.

Make and try two presets

2. Turn the brightness and the whiteness of all of your lamps all the way up and make sure all are on.

3. Click the [+] button (Save preset) and type ‘All white’ for the name of the new preset. OK that.

4. Turn the brightness and also the whiteness of all of your lamps to three quarters of the way up.

5. Click the [+] button (Save preset) and type ‘All warm’ for the name of the new preset. OK that.

6. You now have two presets and can use these from the Presets button in the toolbar and also from the status bar. Try this.

Make a preset that affects only certain lamps

7. Go to 'Manage presets...' from the Presets toolbar button or the Lamps menu.

8. Choose your preset from the window that appears, and press 'Lamps affected'. You'll now see a checkbox alongside each lamp in the main control window. Uncheck some of the lamps, press 'OK'. Your preset will now only affect the lamps that remained checked.

Set your lamps to turn on and off on schedule

9. Press the Schedules button or ‘Show schedules’ from the View menu (command-2 also shows this window).

10. Press the [+] button at the bottom-left of the Schedules window.

11. Type ‘Daily’ for the name, select ‘On & Off’, select ‘group: all’, type 17:00 for on and 23:00 for off. Leave all days selected. Click somewhere outside of the small window to save and close those settings.

All lamps are now set to switch on at 5pm and off at 11pm. Note that this will work even when your computer and Hue-topia aren't running, because Hue-topia copies its schedules to the bridge.
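For the curious, a schedule stored on the bridge is just a small piece of JSON. The Python sketch below builds two schedule bodies in the style of the Hue bridge's version 1 REST API as I understand it (a recurring 'localtime' with a day-of-week mask, and a command aimed at group 0, the built-in group of all lamps). How Hue-topia itself writes its schedules isn't covered here, so treat the field values, the function name and the '<username>' placeholder as assumptions.

```python
import json

def daily_schedule(name, time_hhmm, turn_on, username="<username>"):
    """Build one recurring schedule body (assumed Hue v1 API layout)."""
    return {
        "name": name,
        "command": {
            # Group 0 is the bridge's built-in group containing every lamp.
            "address": f"/api/{username}/groups/0/action",
            "method": "PUT",
            "body": {"on": turn_on},
        },
        # W127 selects all seven days (a 7-bit day mask), then the local time.
        "localtime": f"W127/T{time_hhmm}:00",
    }

on_at_5pm = daily_schedule("Daily on", "17:00", True)
off_at_11pm = daily_schedule("Daily off", "23:00", False)

print(json.dumps([on_at_5pm, off_at_11pm], indent=2))
```

Because the bridge holds and runs these itself, nothing else needs to be switched on at 5pm.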

13. Type the name 'Pastels', and press the [+] below the timeline strip a couple of times to add a couple more nodes. Space them out equally.

14. Click inside the colour swatch of the first node and choose a nice pastel colour. Do the same for the other two. Adjust the cycle time to a value that you like and make sure 'Loop' is selected. The preview swatch should show the effect animating. When that's working as you like, OK the sheet.

15. Return to the main window. Choose a light or group that you want to apply your effect to. Look for the little 'effect' icon in the control strip (ringed below). Click that and a menu of your effects will pop up. Choose your new Pastels effect and Hue-topia should start animating that effect for the chosen bulb or group. While the effect is running, the little icon will rotate.

Wednesday, 1 August 2018

I'm really chuffed with this, in fact I'm increasingly pleased with dark mode, it's really easy on the eye and Apple have done a great job.

If any of my applications really need dark mode, it's my lighting controllers Hue-topia and LIFXstyle. The last thing you want when you're adjusting mood lighting is a bright laptop screen illuminating the room.

The new versions with dark mode enabled will be ready before too long.

Tuesday, 24 July 2018

I have to admit that I really love dark mode. It's very easy on the eye and it jars a little when you have to look at a web page with a white background.

(Does anyone know whether it's possible for a website to detect a mac's dark mode setting and display a dark version using an appropriate css? Let me know.)

I did naively expect that the OS would simply draw all windows and controls in the dark colours. But it's up to each developer to build their apps under the new SDK, and to carefully check for hard-coded colours and unsuitable images within the app.

We're just about there with Integrity / Integrity Plus / Integrity Pro / Scrutiny and it's been a pleasure to do. The 'dark mode enabled' version of all of this family of apps will be 8.1.5 (a minor point release, as there are very few functional changes).

[update 26 July 18 and again 8 Aug 18] Scrutiny 8.1.8 is available and looks great under dark mode. Obviously it will only look dark on 10.14 with dark mode selected.

Saturday, 14 July 2018

This is a well-overdue move. Google have been offering small carrots for a long time, but at the end of this month, they'll be adding a stick as well. They're switching from informing users when a connection is secure, to warning users if a connection is insecure. Google Chrome is making this move but other browsers are expected to follow suit.

Well-informed web users will know whether they really need a connection to be secure or not, but I suspect that even for those users, when this change really takes hold, the red unlocked padlock will start to become an indicator of an amateur or untrustworthy site.

Once the certificate is installed (which I won't go into) then you must weed out links to your http:// pages and pages that have 'mixed' or 'insecure' content, ie references to images, css, js and other files which are http://.
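To show what 'weeding out' means in practice, here is a small Python sketch that lists the http:// references in one page's source, using the standard html.parser module. It only looks at href and src attributes and the sample markup is invented, so it's a much cruder check than a proper crawl.

```python
from html.parser import HTMLParser

class InsecureRefFinder(HTMLParser):
    """Collect (tag, url) pairs for href / src attributes that start with http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))

def find_insecure(html):
    """Return every insecure reference found in one page of html."""
    finder = InsecureRefFinder()
    finder.feed(html)
    return finder.insecure

# Invented sample markup for illustration.
page = """
<link rel="stylesheet" href="http://example.com/style.css">
<img src="https://example.com/logo.png">
<script src="http://example.com/app.js"></script>
<a href="/contact">Contact</a>
"""

for tag, url in find_insecure(page):
    print(tag, url)
```

Here the stylesheet and the script are flagged; the https image and the relative link are fine.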

Scrutiny can also generate the xml sitemap for you, listing your new pages (and images and pdf files too if you want).

Apparently Google treats the https:// version of your site as a separate 'property' in its Search Console (was Google Webmaster Tools). So you'll have to add the https:// site as a new property and upload the new sitemap.

[update 15 Jul] I uploaded my sitemap on Jul 13, it was processed on Jul 14.

4. Redirect

As part of the migration process, Google recommends that you then "Redirect your users and search engines to the HTTPS page or resource with server-side 301 HTTP redirects" (full article here)

Monday, 9 July 2018

If not, visit your System Preferences > Sound > Input and make sure that the correct device is chosen for input. If an input volume control is available there, make sure that it's set to a suitable level (not quite reaching the peak at the loudest parts of the music).

If you need it, display the graphic equalizer.
Enjoy the music.

The Pop filter button will switch the pop filtering on or off.

To capture the processed sound, press the Record button. When you've finished, the same Record button or switching the Online button off will take you into Editing mode.

You can select a region, and use the usual scrolling gestures - left and right to scroll the playhead back and forth, and scroll up and down to zoom in and out on sections of the music (centered on the playhead).

When you save, if a region is selected, the selected region will be saved. Otherwise, all of the sound in the view will be saved. Visit Preferences to choose lossless (.aiff) or AAC (.m4a)

Friday, 6 July 2018

Since last week's announcement of Vinyl Shine, the app seemed to approach the point where it was usable. But it still seemed a long way from beta because it was lacking functionality.

At its core is the pop filter. That had taken a large amount of the development time and was becoming pretty effective.

But the app surrounding it wasn't so functional. Opening a file, running the filter and saving the clean file quickly gets tedious. Even if you run the filter over a whole side of an LP at a time, you first need to run something else to record the sound and then something else to split and add track names.

So it seemed best to build more supporting functionality. We returned to the original aim of 'real time' filtering. Much coffee later and we have recording and real-time-pop-filtering done. Also splitting and saving individual tracks. It's becoming much more useful.

Scenario 1. Plug your record player into the computer. (USB or line-in). Simply use Vinyl Shine as a player, with the pop-filter running in real-time*.

There's a 10-band graphic equaliser applied to the playthrough / recording with the RIAA curve as a preset. Make your own default settings too.
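For reference, the standard RIAA playback curve can be worked out from its three time constants (3180, 318 and 75 microseconds). The Python sketch below prints the gain at ten octave-spaced frequencies, relative to 1 kHz. It assumes the preset means the standard playback (de-emphasis) curve, and the band centre frequencies are my own guess at a typical 10-band layout, not necessarily the ones Vinyl Shine uses.

```python
import math

# Standard RIAA time constants, in seconds.
T1, T2, T3 = 3180e-6, 318e-6, 75e-6

def riaa_playback_db(freq_hz):
    """Gain of the RIAA playback curve in dB, relative to the gain at 1 kHz."""
    def magnitude_db(f):
        w = 2 * math.pi * f
        num = math.sqrt(1 + (w * T2) ** 2)
        den = math.sqrt(1 + (w * T1) ** 2) * math.sqrt(1 + (w * T3) ** 2)
        return 20 * math.log10(num / den)
    return magnitude_db(freq_hz) - magnitude_db(1000.0)

# Assumed octave-spaced centre frequencies for a 10-band equaliser.
BANDS_HZ = [31.25, 62.5, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]

for f in BANDS_HZ:
    print(f"{f:>8} Hz  {riaa_playback_db(f):+6.2f} dB")
```

The curve boosts the bass and cuts the treble, roughly 19 dB each way at 20 Hz and 20 kHz, which undoes the equalisation applied when the record was cut.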

Scenario 2. With the pop filter on, press record and put the needle down. Record a single track or a whole side, then switch to editing mode to normalise and select and save individual cleaned and normalised tracks.

I'm also pleased with how 'cool' it's running. With many efficiencies still to be made, its cpu use is fine.

[update 8 Jul 18] After a weekend of working on refinements and efficiencies, this is how things are running with pop filtering switched on (testing here using a 2012 Mac mini).

* ok, it's running with a buffer. It's slightly odd putting the stylus down on the record and hearing that 'needle down' sound through the speakers a fraction of a second later, but otherwise the delay isn't really noticeable. [update] there's now an option in preferences for the amount of buffering, this can help avoid glitches in the playback caused by buffer underrun. This is only for listening pleasure. Even if the underrun happens, the glitch won't appear in the recording.

Wednesday, 27 June 2018

This is probably the most satisfying new release ever from NPD at PeacockMedia Towers. It represents a new direction of exploration for us.

Work on this was prompted by me failing to find a plugin component that could be used by my favourite sound recorder as a filter to remove pops and crackles, or a plugin for a player to filter the pops in real time during playback.

The work has involved a steep learning curve with Apple's AVAudioEngine. The documentation is far from thorough and there's very little sample code around. That has not been fun but learning more about sound processing (DSP) and getting hands-on with some C++ really has been enjoyable.

We now have a standalone app that will open a sound file and apply pop filtering (pretty quickly - currently <6s for a 4 minute track, but I hope to significantly improve this).

The pop filter is applied, along with EQ and normalisation. For some reason, LPs seem to vary quite widely in sound (I think this is a different thing from playback equalization curves for early LPs and 78s.) Either way we intend that the graphic EQ in Vinyl Shine be flexible enough to handle all of this.

Options are minimal. A commercial app I currently use for pop filtering works ok but the result isn't perfect and it offers a shedload of options. What do those options mean? What am I supposed to change? Surely a pop is a pop, identify and remove it. Fewer options is the Mac way.

Vinyl Shine allows you to listen to the result and toggle between original and cleaned audio. And save the result as a new file when you're happy. (The final offline render is very quick, ~3s for a 4m song).

For the time being it exists as a standalone app and is working pretty well, making crackly recordings much more listenable. It'll shortly be available as a free beta.

pops, clicks and crackles are identified, highlighted visually and repaired
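To give a feel for 'identify and repair', here is a deliberately naive Python sketch: it flags any sample that sits far from the median of its neighbours, then patches it by drawing a straight line between the nearest good samples. This is a textbook-style toy, not the filter used in Vinyl Shine; the window size, threshold and sample values are arbitrary choices for the example.

```python
from statistics import median

def find_pops(samples, window=5, threshold=0.5):
    """Return indices of samples that differ from their local median by more than threshold."""
    half = window // 2
    pops = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        if abs(samples[i] - median(samples[lo:hi])) > threshold:
            pops.append(i)
    return pops

def repair(samples, pops):
    """Replace flagged samples by linear interpolation between the nearest unflagged ones."""
    bad = set(pops)
    out = list(samples)
    for i in pops:
        left = i - 1
        while left >= 0 and left in bad:
            left -= 1
        right = i + 1
        while right < len(samples) and right in bad:
            right += 1
        if left < 0 and right >= len(samples):
            continue  # every sample is flagged, nothing to interpolate from
        if left < 0:
            out[i] = samples[right]
        elif right >= len(samples):
            out[i] = samples[left]
        else:
            t = (i - left) / (right - left)
            out[i] = samples[left] + t * (samples[right] - samples[left])
    return out

# A gentle ramp with one spike where 0.3 should be.
audio = [0.0, 0.1, 0.2, 1.0, 0.4, 0.5, 0.6]
pops = find_pops(audio)
print(pops, repair(audio, pops))
```

Real audio needs something far more careful than this (a loud drum hit also looks like an outlier), which is where the development time goes.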

In this case, the target of a link was secure, but that link was redirecting to an insecure page. This particular situation had gone unreported in the insecure content table simply because of the order that Scrutiny was doing things.

Wednesday, 6 June 2018

For the record, I think Craig is a superstar, and I'm genuinely into the dark mode and dynamic desktops.

But Monday's WWDC left many questions unanswered, like why were the guys on stage all wearing those shoes with the white soles? Was it a tribute to Patrick McGoohan's The Prisoner, where the purpose of 'The Village' was to break Number 6's will to be an individual? Surely not.

Seriously though, MacOS Mojave (which sounds close enough to Mojito to raise my pulse) excited me more than any new OS since Mavericks. The cartoon's more about the fact that the latest, most breathtaking technology is carried around in people's pockets for the most trivial purposes (singing poop emoji, anyone?) More of that another time.

Don't judge the code - this is a tool that was written many many years ago, and it's only a simple thing for personal use. So it was only ever developed to the "it just about works" standard and the project has been copied from computer to computer ever since, receiving the minimum updates to keep it running.

With the beta of 10.14 'Mojave' (which sounds enough like 'Mojito' to raise my pulse) installed on a mac, I unsurprisingly started to notice a few things not working as before.

I love finding and fixing problems, so the regular round of fixes with each release of OSX / MacOS is no hardship. It's particularly fun when you have a "how did that ever work?" moment.

The resulting list is displayed in a tableview and has always appeared in alphabetical order.

So not only is directoryContentsAtPath: still apparently working after being deprecated such a long time ago, apparently it used to return the directory listing sorted in alphabetical order, and no longer does.

It was easy to add [collection sortUsingSelector:@selector(caseInsensitiveCompare:)];
to restore the list to alphabetical order (collection being an NSMutableArray containing NSStrings) but I'm just surprised that it wasn't necessary before.

To bring this up to scratch, the suggested alternative to directoryContentsAtPath: is contentsOfDirectoryAtPath:error: so getting rid of that warning is really easy, just declare an NSError pointer and pass in its address. And then report the NSError's 'localizedDescription' if the call returns nil. Or simply pass nil as the error: parameter if you feel lazy or don't care about the success of the operation.

Monday, 4 June 2018

Yes, but when the w3c validator's 'nu' engine came online, it broke Scrutiny's ability to test every page. The 'nu' engine no longer returned the number of errors and warnings in the response header, which Scrutiny had used as a fast way to get basic stats for each page. It also stopped responding after a limited number of requests (some Scrutiny users have large websites).

Alternative solutions

After exploring some other options (notably html tidy, which is installed on every mac) it appears that the W3C service now offers a web service which is responding well and we haven't seen it clam up after a large number of fast requests (even when using a large number of threads).

The work in progress is called Tidiness (obviously a reference to tidy, which we've been experimenting with).

It contains a newer version of tidy than the one installed on your Mac. However, the html validation results are useful but not as definitive as the ones from the W3C service.

So Tidiness as it stands is a bit of a hybrid. It crawls your website, passing each page to the W3C service (as a web service). If you like you can switch to tidy for the validation, which makes things much quicker as everything is then running locally. If you like, you can simultaneously make accessibility checks at level 1, 2 or 3, with all of the results presented together.
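As a rough idea of what 'passing a page to the W3C service' involves, here is a Python sketch that prepares a POST to the public Nu checker asking for JSON, and counts the errors and warnings in a response. It assumes the checker's JSON output format (a 'messages' list where warnings are 'info' messages with a 'warning' subtype); the user-agent name and the sample response are invented, and this is not Tidiness's own code.

```python
import json
import urllib.request

NU_ENDPOINT = "https://validator.w3.org/nu/?out=json"

def build_validation_request(html_text):
    """Prepare a POST of the page source to the Nu checker, asking for JSON back."""
    return urllib.request.Request(
        NU_ENDPOINT,
        data=html_text.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "ExampleValidatorClient/1.0",  # made-up name
        },
        method="POST",
    )

def count_messages(json_text):
    """Return (errors, warnings) counted from a Nu checker JSON response."""
    errors = warnings = 0
    for msg in json.loads(json_text).get("messages", []):
        if msg.get("type") == "error":
            errors += 1
        elif msg.get("type") == "info" and msg.get("subType") == "warning":
            warnings += 1
    return errors, warnings

# A hand-written sample in the assumed response shape (not real checker output).
sample = json.dumps({"messages": [
    {"type": "error", "message": "sample error"},
    {"type": "info", "subType": "warning", "message": "sample warning"},
]})

print(count_messages(sample))
```

To actually send the request, pass it to urllib.request.urlopen and feed the decoded response body to count_messages; please be gentle with the public service if you try this on a large site.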

Saturday, 2 June 2018

Seen here looking more like a desktop icon*, is the first release of OSX. OSX has now been around for longer than the classic Mac OS (up to 9) was.

I remember the sense of awe at the non-jagged icons, transparency, more realistic-looking shadows and the new traffic-light 'sweeties' in the top-left corner of windows.

It really doesn't look *that* dated. That Mail icon has hardly changed, just a bit more kiddie-coloured, and the magnification effect is still there. Glassy-looking buttons were cool at the turn of the century, but by 10.6, the aqua look was looking a bit unnecessarily clumsy. The stripey background of windows and sheets didn't last so long. We had a weird dual-look with the windows. OS9 was already experimenting with the brushed aluminium look. Very different from the more plasticky look of the regular window borders and backgrounds. From memory, I think the human interface guidelines said that the aluminium look was appropriate where the window was to mimic a control panel.

I really lament the passing of the 3D hyper-real-looking buttons and controls. I regularly use a couple of Macs on Snow Leopard and Mavericks. I get the very 'clean' concept but when you can't immediately see whether some text on a plain white background is a button, input field or just some text, that's just plain unhelpful, however beautiful it looks.

Thursday, 31 May 2018

HTMLtoMD was a side project, put together using various elements developed for other apps, the website crawling engine and the HTML to Markdown converter.

Markdown is a very simple and transportable format - it's efficient for storage, and perhaps a great format to use when migrating one website to another.

HTMLtoMD has had a small but enthusiastic following, but over the years it has come to need an update. (MacOS / OSX has never been great for backward compatibility. Things stop working with each new version of the OS.)

It's had the time needed to bring it up to speed. Version 2 has the most up-to-date version of the 'Integrity v8 engine', a bunch of things are fixed or improved, and a bunch of extra options have been added. I think it's looking good and working well. For the time being, it remains free and unrestricted.

Integrity / Scrutiny

Integrity (and Integrity Plus, Pro and Scrutiny) has long had an 'archive' option. It can save the html as it scans, originally with no frills at all. Recently I+, Pro and Scrutiny have received enhancements which mean that they can process the information a little to create a browsable archive.

It stops short of being a full 'SiteSucker' - it doesn't save images, for example, or download the style sheets etc. (It makes sure that all links and references are absolute, so that the site still appears as it should.) It was always intended as a snapshot of the site, automatically collected as you link-check, for the purposes of reference or evidence.
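
To show what 'making links absolute' means in practice, here's a minimal sketch in Python. It isn't the code Integrity uses; `absolutise` is a name invented for the example, and a simple regular expression stands in for a proper html parser, so treat it as an illustration of the idea only.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." written with double or single quotes.
ATTR = re.compile(r'''\b(href|src)=(["'])(.*?)\2''', re.IGNORECASE)


def absolutise(html, page_url):
    """Rewrite relative href/src values against the url the page came from."""

    def fix(match):
        name, quote, value = match.groups()
        # urljoin leaves already-absolute urls alone and resolves the rest.
        return f"{name}={quote}{urljoin(page_url, value)}{quote}"

    return ATTR.sub(fix, html)
```

With the references rewritten like this, a saved page still pulls its images and style sheets from the live site, which is why the archive displays properly without storing those files itself.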

WebScraper

WebScraper for Mac has loads of options and therefore it's not just 'enter a homepage url and press Go' like the other apps mentioned here. So it does allow you to do much more. You have much more control over what information you want in your output file, what format you want that in, and whether you want the content converted to plain text / markdown / html.

HTMLtoMD

HTMLtoMD was a side project built using various functionality we'd developed in other apps. It scans a whole site and archives the content as Markdown. Once working, we released it for free and put it on the back burner.

Recently it's received more development. It's now up-to-date with the Integrity v8 engine, and has received some improvements to the markdown conversion via WebScraper. It can now save images and has more options for saving the information.

Again, it's not a SiteSucker. If you need to download a website for saving or browsing offline then use SiteSucker ($4.99); it's pointless us trying to reinvent that wheel.

But markdown has its advantages. It's a much more efficient way to store your content. It's just text with a little bit of markup (headings, lists etc). That also means that it's very transportable.
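
To give a feel for how little markup is involved, here's a toy html-to-markdown converter in Python. It's a sketch only and not HTMLtoMD's converter: the names `MarkdownSketch` and `html_to_markdown` are invented for the example, and it handles just headings, paragraphs and list items, dropping inline formatting such as bold or links.

```python
from html.parser import HTMLParser

# Block-level tags we understand, and the markdown prefix each one gets.
BLOCKS = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}


class MarkdownSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lines = []      # finished markdown lines
        self.prefix = None   # None means we're outside any known block
        self.buffer = []     # text collected for the current block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            self.prefix = BLOCKS[tag]
            self.buffer = []

    def handle_data(self, data):
        if self.prefix is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCKS and self.prefix is not None:
            # Collapse runs of whitespace so each block becomes one line.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
            self.prefix = None


def html_to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Feeding it a heading, a paragraph and a two-item list gives back four short lines of text, which is the whole appeal: the content survives and almost all of the markup falls away.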

Saturday, 19 May 2018

One person's pro is another's con. I've seen Notebook lose stars because it doesn't allow you to set the colour of text. But I've long been looking for a sticky note alternative that *doesn't* do rich text, i.e. allows me to copy / paste in and out as plain text. The very long-standing Stickies app is unbelievably still here in High Sierra, unchanged for decades. But those have long worked using rich text.

(OK I can usually cmd-alt-shift-V for paste as plain text, but that doesn't work everywhere.)

Initially it appeared that Notebook's text cards are plain text, but after some use I found that they're not plain-text - but not rich-text either. More of that later.

Learning curve

There's some nomenclature and a few concepts to learn, as with any app. The thing I think of as a sticky note is a 'text card' and collections of various types of note are called 'Notebooks'. Nothing too taxing there. When you start up there is a quick tour (later accessible from the Help menu - "Onboarding" - what horrible business-speak!) but frankly I find that kind of thing annoying. A little like peas; I know they're good for me but I want to go straight for the meat. There's a bit of emphasis on gestures, which a. don't seem to work with my magic mouse, and b. is a swipe or a pinch-out really easier than a double-click? I found myself single-clicking things and expecting them to open, which they don't and I don't see why they couldn't. All-in-all, up and running in moments with minimal hold-ups and bafflement. 9.5/10

Look and feel

This is the top-billed selling point - "the most beautiful note-taking app across devices". It certainly looks polished, if very white. It has a non-standard 'modern web app' feel to it, with lots of sliding and whitespace. Non-standard in MacOS terms, that is. I guess the white background and weird white title bar (lacking a standard customisable toolbar) is for consistency with the iOS version of the app (losing a star here). If the text cards themselves don't behave exactly like sticky notes, they are very analogous. They have a random coloured background and have a sticky-note coloured-square look before you open them up. Notebooks (collections of cards) have customisable covers. These visual cues are very important as your brain gets older and less agile! 9/10

What does it offer over and above the tools that come with the OS?

Bizarrely I can almost create the look and functionality that I want using TextEdit - it allows me to create square plain text documents and set the background colour of each. Unfortunately it doesn't remember those background colours after quit and restart.

MacOS's Notes app scores highly for its syncing across devices (with some frequent annoying duplication). It's the app I have been using when I need to paste something or note down a thought when my phone is in my hand, and later retrieve on the computer. It allows you to organise things in a pretty basic way.

Reminders is as per Notes but for lists. Slightly annoying having to open both those system apps to have text and lists at my fingertips.

Notebook still doesn't give me plain text paste in/out, so a full mark lost here, because it's something I'm specifically looking for. But it's not rich text either. It carries text size and weight, but not colour. It doesn't carry pasted-in font; each note does have its own font which is carried when copying and pasting out. If Zoho are reading, I'd LOVE the option for the text cards to look and work exactly as they do, but to have only plain text copied to the clipboard whenever I copy from a text card. 8/10

Notebook does the syncing across devices, but only by creating an account with Zoho. iCloud-enabling would make a 9/10 into full marks for this part.

Main third-party competition

Without a doubt Evernote. This is made clear by the File > Import from Evernote option. Personally I didn't get past EN's start screen because it didn't seem to want to let me continue without creating an account.

Unexpected neat features

Notebook's 'quicknotes' feature is neat, allowing a quick paste of something from the status bar. The ability to combine what I've been doing in Notes and in Reminders within one app. The ability to take voice memos and paste in pictures (as a separate card, not just because the cards are rich text - an important distinction). 10/10

Cost / if free - how are you paying?

This is the most remarkable thing about Notebook. The website clearly says that the app is free, subsidised by their business apps. They don't sell your data, there are no ads, there is no premium version, there are no crippled features. If there are any hidden costs, I haven't found them. Solid 10/10 here.

In conclusion

Not quite meeting my 'plain text pasting out' requirements, and syncing probably good if you're willing to create an account with the makers. Combining functionality that I've been using various other apps for, and with neat fast access via the status bar. Thanks to Zoho for an incredibly functional and incredibly free app. It all adds up to: