Finding and Reminding, tech issues, 3.0 and beyond

From: Owen Taylor <otaylor redhat com>

To: gnome-shell-list gnome org, desktop-devel-list gnome org

Subject: Finding and Reminding, tech issues, 3.0 and beyond

Date: Fri, 09 Apr 2010 18:09:40 -0400

I've attempted below to extract out some of the technical bits from
http://live.gnome.org/GnomeShell/Design/Whiteboards/FindingAndReminding
and see how they line up with our current technology. This is just
notes, not yet a concrete plan.
- Owen
File management ideas and technology
====================================
"Things can safely fall off the desktop"
The desktop is reconceptualized as "what you are working on",
"the most relevant items". Getting something off the desktop then
shouldn't require an explicit filing decision by the user. The user
should be able to let items "expire" with attention, or they should be
able to "archive" an item to remove it from the desktop.
There are two basic approaches here - one is to avoid storing
things on the Desktop. Instead of seeing the Desktop as a separate
location in the file selector, you'd have a checkbox:
[ ] Pin to Desktop
(or whatever the designers come up with), and that would create
a symlink to the desktop.
The other approach is when expiring or archiving to move files
from ~/Desktop to an archival location like ~/Documents.
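To make the difference between the two approaches concrete, here is a minimal Python sketch. The function names and the throwaway directory standing in for $HOME are illustrative assumptions, not an existing GNOME API.

```python
import shutil
import tempfile
from pathlib import Path


def pin_to_desktop(target: Path, desktop: Path) -> Path:
    """Approach 1: the file stays where it is; the desktop only holds a symlink."""
    desktop.mkdir(parents=True, exist_ok=True)
    link = desktop / target.name
    link.symlink_to(target.resolve())
    return link


def archive_from_desktop(item: Path, archive: Path) -> Path:
    """Approach 2: the file lives on the desktop and is moved away when archived."""
    archive.mkdir(parents=True, exist_ok=True)
    destination = archive / item.name
    if destination.exists():
        # Refuse to overwrite; a real implementation would need a naming policy.
        raise FileExistsError(f"{destination} already exists")
    shutil.move(str(item), str(destination))
    return destination


def demo() -> dict:
    # Use a throwaway directory as a stand-in for $HOME.
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        desktop, documents = home / "Desktop", home / "Documents"
        desktop.mkdir()
        documents.mkdir()

        report = documents / "report.odt"
        report.write_text("draft")
        link = pin_to_desktop(report, desktop)

        note = desktop / "note.txt"
        note.write_text("todo")
        archived = archive_from_desktop(note, documents)

        return {
            "link_is_symlink": link.is_symlink(),
            "link_reads": link.read_text(),
            "note_left_desktop": not note.exists(),
            "archived_name": archived.name,
        }
```

With the symlink approach, "unpinning" is just deleting the link and never touches the file. With the move approach, archiving has to deal with name collisions in the archival location.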
"Be able to treat non-local information the same as Places"
Right now, the user has a few ways of organizing files, based on
directories on the file system: "Music", "Documents", "Downloads".
We want to be able to present other options for narrowing your items
that might not correspond to the directory structure. This could
include "Frequent", "From Email", "Spreadsheets", and so forth.
In general, this type of thing requires searching over all files
to find the subset of files that share some meta-data property.
This is a core operation for Tracker (and for other search engines
like Beagle, though Tracker seems to have the most interest at
the moment.)
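As a rough illustration of what "narrowing by a meta-data property" means, here is a small Python sketch in which each view is a predicate over metadata rather than a directory. The `Item` fields, the view definitions and the threshold for "Frequent" are all assumptions made for the example. A real implementation would query an index such as Tracker instead of scanning a list.

```python
from dataclasses import dataclass

ODS = "application/vnd.oasis.opendocument.spreadsheet"


@dataclass(frozen=True)
class Item:
    path: str
    mime_type: str
    source: str = "local"   # hypothetical: where the file came from
    open_count: int = 0     # hypothetical usage counter


# Each virtual "place" is a predicate over metadata, not a directory.
VIEWS = {
    "Spreadsheets": lambda i: i.mime_type == ODS,
    "From Email": lambda i: i.source == "email",
    "Frequent": lambda i: i.open_count >= 5,  # threshold is arbitrary
}


def narrow(items, view):
    """Return the paths of the items that match the named view."""
    matches = VIEWS[view]
    return [i.path for i in items if matches(i)]


ITEMS = [
    Item("~/Documents/budget.ods", ODS, open_count=7),
    Item("~/Downloads/invoice.pdf", "application/pdf", source="email"),
    Item("~/Downloads/prices.ods", ODS, source="email"),
]
```

Note that `narrow(ITEMS, "Spreadsheets")` returns files from both ~/Documents and ~/Downloads: the view cuts across the directory structure.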
"User defined tags"
A completely flat view of all documents doesn't handle all users
or use cases. "Frequent filers" will want to be able to identify
projects and other subsets of files.
There's not a detailed plan for the user interface right now, but
technically this could be done a couple of ways.
We could use the traditional method of grouping by using
folders; and just make that look somewhat tag-like in the
UI. (Make selecting a folder show all the files in that folder
and all sub-folders. Allow creating a folder of files without
worrying about where it goes, automatically creating it in
~/Documents.)
Or we could use a real tag-based approach with tags stored in
metadata. (multiple tags per file, tags orthogonal to folders.)
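The two options behave differently in ways a short sketch can show. The following Python is illustrative only: `files_under` models the folder-as-tag idea, and `TagStore` is a hypothetical in-memory stand-in for tags kept in metadata.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def files_under(folder, paths):
    """Folder-as-tag: selecting a folder shows files in it and all sub-folders."""
    root = PurePosixPath(folder)
    return [p for p in paths if root in PurePosixPath(p).parents]


class TagStore:
    """Real tags: many tags per file, independent of where the file lives."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, *tags):
        for t in tags:
            self._by_tag[t].add(path)

    def files(self, tag):
        return sorted(self._by_tag.get(tag, set()))


PATHS = [
    "/home/u/Documents/thesis/ch1.odt",
    "/home/u/Documents/thesis/figures/plot.png",
    "/home/u/Downloads/paper.pdf",
]
```

In the folder model a file belongs to exactly one chain of folders. In the tag model "/home/u/Downloads/paper.pdf" can carry both "thesis" and "to-read" without moving, but the tags then exist only in the metadata store.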
"Timeline view of files"
For items that aren't on the desktop (the "slip") the default view
is a chronological one with "yesterday", "last week", and so
forth. So we need to be able to organize the user's files this way.
One approach is to keep track of user accesses and edits via
Zeitgeist (or in simplified form by ~/.recently-used.xbel).
The other approach would be to treat access/edit time as a
metadata property, and to use Tracker to search over these
properties.
(Note that the timeline here only includes each item once,
not once for each usage - I use "timeline" somewhat differently
below)
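Whichever source the timestamps come from, the grouping itself is simple. Here is a minimal Python sketch; the bucket boundaries and the labels beyond "yesterday" and "last week" are my assumptions, not something taken from the mockups.

```python
from datetime import datetime, timedelta


def bucket(last_used: datetime, now: datetime) -> str:
    """Map a timestamp to a coarse, human-readable timeline label."""
    days = (now.date() - last_used.date()).days
    if days <= 0:
        return "today"
    if days == 1:
        return "yesterday"
    if days <= 7:
        return "last week"
    if days <= 31:
        return "last month"
    return "older"


def timeline(events, now):
    """Group items by label, listing each item once under its most recent use.

    `events` is an iterable of (path, timestamp) pairs; a path may repeat.
    """
    latest = {}
    for path, when in events:
        if path not in latest or when > latest[path]:
            latest[path] = when
    groups = {}
    # Newest first, so items within each label are ordered by recency.
    for path, when in sorted(latest.items(), key=lambda kv: kv[1], reverse=True):
        groups.setdefault(bucket(when, now), []).append(path)
    return groups
```

The first loop is what makes each item appear only once: repeated uses of the same file collapse to the most recent one before bucketing.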
"Search"
We want to be able to search - over the names of all
documents, but also over extracted metadata such as
document titles, and maybe over full text. This is definitely
best supported by something like Tracker.
"Adding non-files to Desktop"
Files won't be the primary interesting thing for all people;
we probably want to provision for at least putting web
bookmarks into the desktop area. (This is also interesting
for people who want to have a GNOME desktop for their users
configured in some particular way.)
Probably the existing way we do web bookmarks for ~/Desktop will work.
Tracker
=======
In some testing, Tracker 0.8 seems enormously better behaved
than Tracker 0.6. It has very significant optimizations in how
it stores the tracker database on disk, and also, by default,
only indexes defined subdirs of $HOME. So, as of right now,
system-impact of Tracker isn't a big concern of mine, as it
would be for 0.6.
Possible concerns and considerations with Tracker:
* RDF + SPARQL + a large collection of ontologies does present
a significant new barrier to someone coming to the GNOME
platform. While the basic concepts of RDF are quite simple,
RDF serialization formats and SPARQL are new things people
will have to learn, and there are some intimidating terms
like "ontology".
RDF is also popularly (and perhaps unfairly) seen as
yesterday's fad.
* There is a large abstraction barrier between the application
and the underlying data storage. It's very hard to decipher
or influence how storing data in RDF and running SPARQL queries
maps into low-level database operations.
* Indexing only a subset of the filesystem, while it does
avoid performance traps like indexing into large Git
repositories, could result in odd behavior from a user's
point of view. If you edit a file in an unindexed part
of your home directory, is it invisible when looking at
your history?
This may be partly satisfied by feeding accessed files
into the Tracker indexed set file-by-file, either directly
or via Zeitgeist.
* Even when limiting Tracker to a subset of the home directory,
it's likely still possible to run the system out of inotify
handles.
* Using Tracker to extract and index metadata from files is
pretty uncontroversial. Using Tracker as the primary store
of information (such as tags) is more controversial - suddenly
the user's data is dependent on the use of Tracker.
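On the inotify point above: inotify is not recursive, so watching a tree costs one watch per directory, and Linux caps the number of watches per user. A rough Python sketch for estimating whether a set of indexed roots fits in that budget follows. The helper names are made up for this example, and it says nothing about how Tracker itself allocates watches.

```python
import os
import tempfile


def count_directories(root):
    """Count root plus every directory below it (symlinks are not followed)."""
    return sum(1 for _ in os.walk(root))


def watch_limit(default=None):
    """Read the per-user inotify watch limit on Linux, or return `default`."""
    try:
        with open("/proc/sys/fs/inotify/max_user_watches") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default


def fits_in_watch_budget(roots, limit):
    """True if watching every directory under `roots` needs at most `limit` watches."""
    return sum(count_directories(r) for r in roots) <= limit


def demo():
    # Build a throwaway tree: the root plus three sub-directories.
    with tempfile.TemporaryDirectory() as root:
        for name in ("Documents", "Music", os.path.join("Music", "albums")):
            os.makedirs(os.path.join(root, name), exist_ok=True)
        needed = count_directories(root)
        return needed, fits_in_watch_budget([root], 4), fits_in_watch_budget([root], 3)
```

In practice the budget is shared with every other program the user runs that uses inotify, so the real headroom is smaller than the raw limit.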
Zeitgeist
=========
The "properties of files" approach of Tracker works for a lot
of things. However, it is pretty much unsuitable for storing
time-based histories of actions. We can store the last time
a file was edited as a Tracker property. It's slightly harder
to store all the times the file was edited. It's considerably
harder to store all the times the file was edited including
the editing application for each access.
(Of course, anything can be stored in RDF; it's a perfectly
general format; however, the more that we have to create
anonymous nodes, the more different structures that we are
storing in the Tracker triple store, the harder it is going
to be to optimize, and the less suitable a straightforward
implementation of the triple-store backed by a sqlite database
is.)
My understanding is that the Tracker people have disclaimed
the log storage problem. The role of log-storage for projects
like "GNOME Activity Journal" is taken over by the Zeitgeist
daemon.
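To illustrate the difference between a per-file property and an event log, here is a small sketch using Python's sqlite3 module. The table layout and function names are assumptions for the example. This is not Zeitgeist's actual schema.

```python
import sqlite3


def open_log():
    """Create an in-memory event log: one row per action, not one row per file."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE event ("
        " uri TEXT NOT NULL,"      # which file
        " action TEXT NOT NULL,"   # e.g. 'edit' or 'open'
        " app TEXT NOT NULL,"      # which application did it
        " day INTEGER NOT NULL)"   # when, as a day number for simplicity
    )
    return db


def record(db, uri, action, app, day):
    db.execute("INSERT INTO event VALUES (?, ?, ?, ?)", (uri, action, app, day))


def last_edited(db, uri):
    """The single value a per-file property would hold."""
    row = db.execute(
        "SELECT MAX(day) FROM event WHERE uri = ? AND action = 'edit'", (uri,)
    ).fetchone()
    return row[0]


def most_frequent(db, since_day, limit=3):
    """Needs the full history: count uses per file inside a time window."""
    rows = db.execute(
        "SELECT uri, COUNT(*) AS n FROM event WHERE day >= ?"
        " GROUP BY uri ORDER BY n DESC, uri LIMIT ?",
        (since_day, limit),
    )
    return rows.fetchall()
```

`last_edited` could be answered from a single stored property, while `most_frequent` cannot be computed unless every event is kept.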
Concerns and thoughts concerning Zeitgeist:
* There are two things where the event-logging approach
of Zeitgeist really shines - first showing timelines
(what was I doing two weeks ago on Thursday) and second
doing sophisticated computations over the past actions
of the user (what documents were typically edited at the
same time as this document.) Although these are
interesting areas to explore, neither is central to
current file management ideas in GNOME Shell for 3.0.
The only thing I can think of in the current mockups
that requires a Zeitgeist-like approach is the
"Frequent" selector. Without a longitudinal view
of usage, it's hard to answer "what are the most frequently
used documents in the last 30 days".
* To a much greater extent than Tracker, Zeitgeist is
designed to require applications to be modified to
push events to it.
* Zeitgeist is designed to be standalone and independent
from Tracker, but also used in conjunction. This, at
times, makes things not as good as they could be. For
example, Tracker has a pretty sophisticated system to
assign a UID to each file and track files as they
move around the file system, but Zeitgeist, which
identifies files by file path, will lose a file as
soon as it is moved - it doesn't piggyback off the
work that Tracker is doing.
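The last point, about identifying files by path versus by a stable ID, can be shown with a toy model. Both classes below are hypothetical and only sketch the general idea. They do not describe how Tracker or Zeitgeist are implemented.

```python
import itertools


class PathKeyedLog:
    """Events keyed by path: a rename orphans the earlier history."""

    def __init__(self):
        self.events = {}

    def record(self, path, action):
        self.events.setdefault(path, []).append(action)

    def history(self, path):
        return self.events.get(path, [])


class IdKeyedLog:
    """Events keyed by a stable id, with a separate path -> id index."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.id_for_path = {}
        self.events = {}

    def record(self, path, action):
        if path not in self.id_for_path:
            self.id_for_path[path] = next(self._ids)
        self.events.setdefault(self.id_for_path[path], []).append(action)

    def moved(self, old_path, new_path):
        # Only the index changes; the history stays attached to the id.
        self.id_for_path[new_path] = self.id_for_path.pop(old_path)

    def history(self, path):
        return self.events.get(self.id_for_path.get(path), [])
```

The id-keyed version only works if something tells it about moves, which is exactly the bookkeeping that would have to be shared between the two systems.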
Nautilus
========
The more we hide the hierarchical filesystem as a primary
way of looking at your files, the harder job Nautilus has
to explain what is going on. If in gnome-shell we transparently
merge together things in ~/Documents and things in ~/Downloads,
then the user doesn't have a mental model that there are
two separate places and some types of things are found in
one place and some types of things in the other place.
But we can't just consider Nautilus to be the backdoor to
the filesystem - the thing you use when you need to do something
low-level - because the overview is a place you go to find
things, to switch to them and get out. It's not meant to be
a place for spending lots of time manipulating things.
So for explicit manipulation of files (cleaning up,
filing, etc) the user would probably still be using Nautilus.
Major modifications here are not going to happen
for 3.0, but as much as possible there needs to be alignment
so that things feel familiar between the two places.
Conclusions?
============
Not much yet - I think it will definitely be hard to implement
our ideas without something that looks a lot like Tracker, and
since we have Tracker, something that looks a lot like Tracker
is most likely Tracker :-) Zeitgeist seems less centrally crucial,
but there is a role for event logging here.
Further UI design is definitely needed to figure out what we
can do short-term for Nautilus/GtkFileChooser, etc.