Activities and Tasks in Emails
Friday February 03rd 2006, 4:45 pm
Filed under: email,information delivery,language technology,research,science,technology
Posted by: Andrew Lampert

So I was busy at the International Conference on Intelligent User Interfaces (IUI) earlier this week, and it was a hugely motivating and thought-provoking experience. A great bunch of really switched-on people doing all kinds of interesting things.

One presentation that particularly caught my interest was from Tessa Lau at IBM Almaden Research lab – not surprising really, given that I’ve read about some of Tessa’s previous work in email management. The work she presented at IUI was on IBM’s Unified Activity Management project (UAM). In that context, one of her points that really rang true for me was about the need to move away from being focused on tools to focus more on the activities people perform when dealing with information management. This should, of course, lead to the development of software applications that do a better job of supporting users, who are (and should be!) more concerned about their tasks and activities than about which tool they used to do what, and how they can integrate work that happens to have been performed using different software tools.

As a simple example, rather than grouping email messages for a given activity in an email client and the Excel documents in a separate folder on the filesystem, can we instead cluster all relevant information together based on the activity which ties the various artifacts together, rather than the tool that happened to have been used to create them? Accordingly, a major part of the UAM project is focussed on integrating email content into an overall activity management system that is under development. To do so requires an ability to associate email content with new or existing activities. Obviously, this requires a light-weight and simple way of creating new activities from email, and of displaying email in the context of existing activities.

When trying to associate incoming email messages with new and existing activities, the IBM team seems to have been inspired by the information retrieval community in using recommendation rather than all-or-nothing mapping of incoming messages to activities. This is a clever way of reducing the likelihood of frustrating users with incorrect categorisations, and is indeed the approach we took in earlier email categorisation work I was involved with a few years ago at CSIRO.
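
To make the recommendation idea concrete, here is a minimal Python sketch. It is not IBM's method or our CSIRO system: the bag-of-words cosine score, the function names, the threshold and the sample activities are all assumptions chosen for illustration. The point is only that the system offers a ranked shortlist (possibly empty) and leaves the final choice to the user.

```python
from collections import Counter
import math


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_activities(message, activities, top_k=3, min_score=0.1):
    """Return up to top_k (activity, score) pairs, best first.

    An empty list means no activity scored well enough, which is the
    cue to offer "create a new activity" instead of forcing a match.
    """
    msg_vec = Counter(tokenize(message))
    scored = [(name, cosine(msg_vec, Counter(tokenize(text))))
              for name, text in activities.items()]
    scored = [(name, score) for name, score in scored if score >= min_score]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical activities, each described by a few keywords.
activities = {
    "IUI trip": "conference travel flights hotel registration",
    "Budget review": "budget spreadsheet quarterly costs forecast",
}
suggestions = recommend_activities(
    "Please confirm the hotel and flights for the conference", activities)
print(suggestions)
```

A wrong suggestion in a shortlist costs the user a glance; a wrong automatic filing costs them a lost message, which is why the ranked list is the gentler design.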

Tessa also referred to email signatures as ‘noise’ that, by implication, needs to be removed to recover the communication signal conveyed by email – a very simple and logical description of the nature of email signatures (and often quoted material) in the context of automatic processing of emails.
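
As a rough illustration of what removing that noise can look like, the Python sketch below drops quoted lines and cuts the message at the conventional "-- " signature delimiter. This is an assumption about one simple heuristic, not a description of what the UAM system does; plenty of real signatures carry no delimiter at all and need something smarter.

```python
def strip_noise(body):
    """Drop quoted lines and everything after a signature delimiter."""
    kept = []
    for line in body.splitlines():
        if line.rstrip() == "--":
            break  # "-- " (or a bare "--") conventionally starts the signature
        if line.lstrip().startswith(">"):
            continue  # quoted material from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()


body = "Hi Tessa,\n> old text\nSee you Friday.\n-- \nAndrew\nCSIRO"
cleaned = strip_noise(body)
print(cleaned)
```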

Some weaknesses of the work presented included an implicit assumption that a single email message should be associated with at most one activity. Clearly this suffers from a multiple-inheritance style problem – in practice a single email message can often contain content that is relevant to many different activities. In the present system it is not possible to apply multiple activity labels to a single message. This, of course, sounds a lot like the folder vs. labelling problem that has been all the rage since GMail appeared on the radar.
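
The labelling alternative amounts to a many-to-many mapping between messages and activities. A rough Python sketch follows; the class and method names are made up for illustration and have nothing to do with UAM's actual design.

```python
from collections import defaultdict


class ActivityIndex:
    """Hypothetical many-to-many index: one message can carry several activity labels."""

    def __init__(self):
        self._by_message = defaultdict(set)
        self._by_activity = defaultdict(set)

    def label(self, message_id, activity):
        """Attach an activity label to a message, keeping both directions in sync."""
        self._by_message[message_id].add(activity)
        self._by_activity[activity].add(message_id)

    def activities_for(self, message_id):
        return sorted(self._by_message.get(message_id, ()))

    def messages_for(self, activity):
        return sorted(self._by_activity.get(activity, ()))


index = ActivityIndex()
index.label("msg-1", "IUI trip")
index.label("msg-1", "Budget review")  # same message, second activity
index.label("msg-2", "IUI trip")
print(index.activities_for("msg-1"))
print(index.messages_for("IUI trip"))
```

A folder model is the special case where `activities_for` can never return more than one entry.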

Another interesting question is whether classifying email messages into activities is different from the classification of emails into folders (which is a well studied text categorisation problem). There certainly seem to be many similarities between both problems. Perhaps there is a difference of focus (folder classification generally being for archiving, and activity classification more for current work), but this is purely speculation.

Of particular interest for me was that Tessa identified speech act detection in email as a future direction for their research. This is motivating, given that smart people see some similar value in the kinds of ideas I’m playing with, but it is also rather intimidating to think who my competitors out there in the research world include! I think I’d better get a move on with my own research!



R&D Software Engineer Wanted

Ok, so if you’re a software engineer looking for new challenges in 2006, here’s a great opportunity for you. My research team within the CSIRO ICT Centre (the Information Delivery team) is seeking to recruit a highly competent, motivated, and energetic software engineer to our Sydney laboratory.

You will contribute to software engineering, R&D and commercialisation activities within our small but highly productive team carrying out leading-edge research in the area of information engineering and the development of advanced search and delivery technology. This role will have a particular focus on mobile phone and PDA technology.

A degree in Software Engineering or a related discipline is essential; an honours degree or higher qualification would be an advantage, but not essential.

We need you to demonstrate excellent programming expertise in at least Java (preferably other languages too), familiarity with Web services, and preferably have exposure to mobile phone or PDA software development platforms. The development projects underway need you to work on both research prototypes and on commercial products. Your willingness to provide technical support, an ability to write high quality documentation, and a capacity to talk to customers are important.

Finally, you should enjoy working in teams, be honest, trustworthy, and ethical, with an ability to contribute creative ideas to our projects.

Reference Number: 2006/63
Position Title: Software Engineer – Information Delivery
Division: CSIRO ICT Centre
Location: North Ryde, NSW
Classification: CSOF4 to CSOF5
Salary Range: $58k – $72k + superannuation
Tenure: 12 month term
Applicants: International Applicants Welcome
Relocation Assistance: May be offered to the successful applicant.
Applications Close: 27 Jan 2006
Job Category: Computer Software/Scientific Research

For further details, selection criteria and to apply for this position, please visit: http://recruitment.csiro.au/asp/job_details.asp?RefNo=2006/63

If you have any questions about this position, please post a comment here, or feel free to email me (Andrew.Lampert@csiro.au).



How to fill (and use) a 40 petabyte iPod?
Wednesday August 31st 2005, 3:08 pm
Filed under: information delivery,research,search,technology
Posted by: Andrew Lampert

I attended an interesting seminar from Professor Rodney Brooks, director of the Computer Science and AI Lab at MIT, yesterday afternoon. He’s visiting Australia as a keynote speaker for the ICT Outlook Forum (which I attended last year in Canberra), and stopped by to spend the day at the CSIRO ICT Centre. The main gist of his talk was about exponentials, and how exponential patterns are the most influential indicators of future technology trends. Of course, the most well known of such trends is Moore’s Law, and predictably, Brooks hammered on this quite a bit. He did, however, also come up with some interesting statistics: there are approximately 10^16 ants in the world, and 3×10^16 grains of rice grown per year; in comparison, in 2003 we produced 10^18 silicon transistors – i.e. 100 transistors for every ant in the world, and 33 for every grain of rice. This was his way of illustrating the significance of exponential growth, and I think it’s fair to say he made his point well.
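
The ratios are easy to check. A few lines of Python, using only the figures quoted above (the variable names are mine):

```python
ants = 10**16             # approximate number of ants in the world
rice_grains = 3 * 10**16  # grains of rice grown per year
transistors = 10**18      # silicon transistors produced in 2003

per_ant = transistors // ants           # whole transistors per ant
per_grain = transistors // rice_grains  # whole transistors per grain of rice
print(per_ant, per_grain)
```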

Other potential exponentials Brooks identified (i.e. trends that look suspiciously like the beginning of exponential curves) included multi-core processors, such as the CELL processors. In the research labs, we’re already seeing 64 cores on a single chip – is this the beginning of a new exponential? If so, IMHO it will almost certainly change the way we develop software quite dramatically. I think it’s more than fair to say that at this point, most software is not designed with high levels of parallelism in mind. Given this, and the different caching model in processors like CELL, there is certainly a requirement for further research and development into compilers that can both hide the complexity of multi-core processors from software developers and deal with a substantially different caching model to current processors. More interestingly, what kinds of things will be possible with the additional computing power of such chips? What can you do when your wrist-watch or your microwave is as powerful as a current-day supercomputer? Of course, this is nothing more than Moore’s law continued, but with greater parallelism perhaps substituting for raw computing power in a single CPU core.

The other big emerging exponential that Brooks seems fixated on is that of personal storage. The iPod is seemingly the poster child of this exponential for Brooks: he cites the fact that (US) $400 bought you a 10GB iPod in 2003, a 20GB iPod in 2004, and the equivalent of about a 40GB iPod in 2005. If this trend of doubling every year continues, we’ll see 40 petabyte iPods within 20 years. Of course, this then raises the very obvious question: what the heck would we do with such storage capacity? It’s pretty much the analogue of asking what the heck we can do with 10 gigabit network connections that’s appealing to a wide range of people (sure, we can always think of some special cases that already require massive amounts of bandwidth or storage, but what about mass-market applications?).
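
The 40 petabyte figure follows directly from doubling once a year for 20 years, starting from 40 gigabytes in 2005. A small Python check (the function name and the one-year doubling period are my reading of the iPod figures above, not something Brooks spelled out this way as far as I recall):

```python
def projected_capacity_gb(start_gb, years, doubling_period_years=1.0):
    """Capacity after `years`, assuming it doubles every `doubling_period_years`."""
    return start_gb * 2 ** (years / doubling_period_years)


# Sanity check against the quoted history: 10 in 2003 -> 40 in 2005.
capacity_2005 = projected_capacity_gb(10, 2)

# Project 20 years on from the 2005 figure.
capacity_gb = projected_capacity_gb(40, 20)
capacity_pb = capacity_gb / 1024**2  # binary units: 1 petabyte = 1024**2 gigabytes
print(capacity_2005, capacity_pb)
```

Twenty doublings is a factor of 2^20, which is exactly the binary step from giga to peta, so 40 of one becomes 40 of the other.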

Brooks posed the magnitude of such storage space in terms of books, photos and videos – by 2009, we’ll theoretically be able to walk around with 1 million books loaded (as text) on an iPod in our pocket. What the heck does this really mean? None of us can possibly read 1 million books in our lifetime: even if we live to 100, that makes about 30 books *per day*, every day of our life. So, what does it mean to be able to carry around the text of 1 million books in our pocket? Or the whole Library of Congress by about 2013 (for some reason, Americans are obsessed with this as an example)? For starters, it seems to me to be almost at odds with the ‘always connected’ vision of the future. Who needs to be always connected if you can cache so much data? Sure, there will always be some data you want real-time access to, but it would seem that a lot of the existing web-style data could easily be stored and sync’ed at regular intervals on some massive personal storage device. Blogs, news, code, reference information, even DNS data: all this stuff is cacheable. I guess the whole podcasting phenomenon is an example of this mode of interacting with data.
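
For the curious, here is the back-of-envelope arithmetic in Python. The reading rate uses only the numbers above; the per-book storage budget additionally assumes the yearly doubling from a 40 gigabyte device in 2005 and decimal gigabytes, which is my extrapolation rather than a figure from the talk.

```python
books = 1_000_000
lifetime_days = 100 * 365            # a 100-year life, ignoring leap days

books_per_day = books / lifetime_days  # roughly 27, i.e. "about 30"

# Assumed: 40 GB in 2005, doubling yearly, so four doublings by 2009.
capacity_2009_bytes = 40 * 2**4 * 10**9
bytes_per_book = capacity_2009_bytes / books

print(round(books_per_day, 1), bytes_per_book)
```

Under those assumptions each book gets a budget of 640 kilobytes, which is in the right range for a novel stored as plain text.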

And what about beyond books? Well, Brooks’ calculations have it that we’ll be able to store something akin to every movie ever made (or close to it) on our portable personal storage device by about 2025. Again, it’s a trite example, since we could never hope to (or want to) watch so many bad movies. But again, it hammers home the question of what we can imagine using such space for. Will we walk around with the ID of every RFID tag ever assigned? What about a complete copy of the global DNS and a big chunk of the Googlzon search index? Funnily enough, it’ll be a very long time before we’re able to keep track of each silicon transistor (all 10^18 of them per year) on personal storage, but then I can’t say I can imagine anyone being overly disappointed about that! Maybe we’ll be sending multimedia or holographic messages to each other, and we’ll be able to keep a copy of everything we’ve ever communicated. What about a recording or transcription of everything we’ve ever done or said, image and video data of every place we’ve ever been, the face and voice of every person we’ve ever met? Well, that’s just starting to sound like Gordon Bell’s MyLifeBits project. If indeed we do head down this path (which seems plausible though not inevitable) a more important question that arises is: how the heck do we deal with such an overload of data and information with our limited cognitive abilities? What kind of tools do we need to develop to help people wade through the swamp of every piece of information they’ve ever interacted with? Certainly, we’ll need to be able to tailor the information that is retrieved and the way it is presented according to the current context if people are to have any hope of using any of that mountain of data in their pocket. Unsurprisingly, that’s exactly where my team’s research is focussed.

Assuming we can make some progress on the retrieval and delivery of information from your personal data swamp, how can you imagine using personal, portable storage that is for most intents and purposes limitless?



PC World | CSIRO launches flying datacentre
Monday August 22nd 2005, 11:27 pm
Filed under: information delivery,java,language technology,research,technology
Posted by: Andrew Lampert

PC World has published an article (PC World | CSIRO launches flying datacentre) on our recently completed 3-year research project with Boeing (USA) around developing new technology for the RAAF Wedgetail airborne early warning and control (AWACS/AEW&C) aircraft.

With typical journalistic flair, the story has been blown up a little: I’m not sure I’d quite agree that the technology we developed “has also been commercialized for sale to appropriate customers”, nor have there been 20 scientists and engineers working on it (well, maybe close to that number contributed, but there certainly weren’t anywhere near that number of people working full-time for 3 years, as might be inferred from the article) but the important parts are there.

The focus of my team’s contribution has been in intelligent information delivery: how do we prevent air surveillance operators from being overloaded with information, while still ensuring that they aren’t deprived of and don’t overlook any important information? Initial investigations by Robert Tot of current air surveillance operators at the RAAF Williamtown airbase allowed us to observe operators in action to understand the information they use to perform their work tasks. Interviews and observations also highlighted several issues: 1) Operators have to manually integrate information from a number of different sources to perform their job. This can include having to physically move to a different computer terminal (e.g. to access civilian flight plan information). 2) Displaying all of the available information all of the time is infeasible because the display becomes too cluttered.

Our approach to alleviating these and related problems was to create an adaptive graphical user interface that tailors the information displayed at any point in time and how that information is presented according to the operator’s current task and role. Based on this context, the relevant information required by the operator is planned, gathered and delivered using Myriad, our Java-based platform for contextualised information delivery.

At the core of Myriad is our Virtual Document Planner (VDP), a goal-decomposition planning engine that, when configured using a set of plans, produces structured representations of content to be delivered that is specific to the current interaction context (which includes who the information is being delivered to, what task they are currently trying to perform, what environment they will view the information in, what information they have previously been presented with etc.).

The structured representation of content produced by the VDP explicitly models the role of each fragment of information to be delivered, through making explicit the rhetorical relations between each piece of information. We can then reason about the content, based on its structure, in deciding which information should be presented and how, based on whatever constraints might apply (e.g. temporal or screen-space constraints).
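
To give a flavour of reasoning over such a structure, here is a toy Python sketch. It is not the VDP or any part of Myriad: the `Node` class, the relation names and their priority order are all hypothetical, and a plain fragment count stands in for a real screen-space constraint. Each fragment records its rhetorical relation to its parent, and the least essential relations are dropped first until the content fits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    relation: str = "nucleus"  # hypothetical relation of this fragment to its parent
    children: List["Node"] = field(default_factory=list)


# Hypothetical ordering: lower number = more essential to the message.
PRIORITY = {"nucleus": 0, "evidence": 1, "elaboration": 2, "background": 3}


def render(node, max_priority):
    """Collect fragments, skipping any subtree whose relation is too low-priority."""
    if PRIORITY[node.relation] > max_priority:
        return []
    fragments = [node.text]
    for child in node.children:
        fragments += render(child, max_priority)
    return fragments


def fit(root, max_fragments):
    """Drop the least essential relations first until the content fits."""
    for level in sorted(set(PRIORITY.values()), reverse=True):
        fragments = render(root, level)
        if len(fragments) <= max_fragments:
            return fragments
    return [root.text]  # last resort: the core fragment only


root = Node("Track 42 is unidentified", children=[
    Node("No flight plan matches its route", "evidence"),
    Node("It entered the area at 10:42", "elaboration"),
    Node("Civil traffic is heavy today", "background"),
])
print(fit(root, 4))
print(fit(root, 2))
```

Because the structure says why each fragment is there, the cut is principled: background goes before evidence, rather than simply truncating whatever happens to come last.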

In order to infer the current (and future) tasks being performed by an operator, our Operator GUI provides a constant stream of user actions to a Task Parsing module which, based on a grammatical model of the operator’s possible tasks, makes statistical predictions of what task is currently being performed, what task is likely to be performed next, and what information the operator requires to perform these tasks. This information allows Myriad to plan the delivery of information proactively, meaning that operators shouldn’t need to request information; instead they should find that information is discreetly made available to them as they require it.
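
As a much-simplified stand-in for that prediction step, the Python sketch below uses a first-order transition (bigram) model over task names instead of a grammar-based parser over raw user actions. The class name, the task labels and the training sequences are invented for illustration; it only shows the shape of "given the current task, rank the likely next ones".

```python
from collections import Counter, defaultdict


class NextTaskPredictor:
    """Hypothetical bigram model: counts which task tends to follow which."""

    def __init__(self):
        self._transitions = defaultdict(Counter)

    def observe(self, task_sequence):
        """Record every adjacent (current, following) pair in one observed session."""
        for current, following in zip(task_sequence, task_sequence[1:]):
            self._transitions[current][following] += 1

    def predict(self, current_task):
        """Return (next_task, probability) pairs, most likely first."""
        counts = self._transitions.get(current_task)
        if not counts:
            return []
        total = sum(counts.values())
        return [(task, n / total) for task, n in counts.most_common()]


predictor = NextTaskPredictor()
predictor.observe(["detect", "identify", "report"])
predictor.observe(["detect", "identify", "track"])
predictor.observe(["detect", "track"])
prediction = predictor.predict("detect")
print(prediction)
```

A grammar buys more than this: it can recognise a task from a partial sequence of low-level actions, which simple transition counts cannot.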

Of course, the proactive delivery of information risks overloading or distracting the operator, who may be deeply engaged in other current tasks. To avoid unwanted distraction or disorientation, we were very careful to provide new information by discreetly displaying a notification of information availability, rather than immediately providing the information itself on screen. In this way, we leave the human operator in control to choose if and when the information is required in order to complete their tasks.

The project has required me to develop our Myriad platform to support the delivery of textual, graphical and spatial information. In addition, I have been responsible for the development of the intelligent, adaptive Graphical User Interface. The GUI is based around the excellent OpenMap framework from BBN for the display of spatial information. In addition, a desktop-like workspace has been created where more verbose information could be displayed (either linked to objects visible on the map, or provided as non-spatial data). To allow the GUI to be completely controlled from Myriad, I created a flexible and extensible command-line API (using the BeanShell Java source interpreter), through which information can be added, displayed, hidden, modified, highlighted etc. on both the map and workspace displays with commands sent to specific GUI channel listeners over message-passing middleware.
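
The real thing is Java and BeanShell, but the general shape of text commands routed to named channel listeners can be sketched in a few lines of Python. Everything here (the `GuiChannel` class, the `show` and `hide` verbs, the `deliver` function) is hypothetical and is not the actual Myriad or BeanShell API; an in-process dictionary also stands in for the message-passing middleware.

```python
class GuiChannel:
    """A hypothetical named channel that dispatches text commands to handlers."""

    def __init__(self, name):
        self.name = name
        self._handlers = {}

    def register(self, verb, handler):
        self._handlers[verb] = handler

    def handle(self, command):
        """Split 'verb arg1 arg2 ...' and call the handler registered for the verb."""
        verb, *args = command.split()
        if verb not in self._handlers:
            raise ValueError(f"{self.name}: unknown command {verb!r}")
        return self._handlers[verb](*args)


visible = set()  # stand-in for the set of objects currently drawn on the map

map_channel = GuiChannel("map")
map_channel.register("show", lambda obj: visible.add(obj))
map_channel.register("hide", lambda obj: visible.discard(obj))
channels = {"map": map_channel}


def deliver(channel_name, command):
    """Stand-in for the middleware: route a command to the named channel."""
    return channels[channel_name].handle(command)


deliver("map", "show track42")
deliver("map", "show track7")
deliver("map", "hide track7")
print(visible)
```

Keeping the GUI behind a small command vocabulary like this is what lets the planner drive the display without knowing anything about the widgets underneath.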



Travel Funding for PhD Students Available
Friday August 19th 2005, 1:31 pm
Filed under: email,information delivery,language technology,search,technology,uni,usability
Posted by: Andrew Lampert

I’ve just heard from HCSNet that ten travel bursaries of $500 towards travel and accommodation costs are available to PhD students from outside metropolitan Sydney who wish to attend the NICTA-HCSNet Multimodal User Interaction Workshop, to be held at the Australian Technology Park, Redfern, Sydney, on September 13-14th, 2005.

The workshop includes two invited talks from internationally-recognised researchers in multimodality: Professor Sharon Oviatt from the Oregon Health and Science University, and Professor Francis Quek from the Virginia Polytechnic Institute and State University.

In the interests of information sharing, it is a condition of receipt of a travel bursary that the student should provide a poster describing their current research project.

The closing date for applications for bursaries is Friday August 26th 2005. Those interested should send an email to the HCSNet Convenor (Professor Robert Dale) at rdale at ics.mq.edu.au.



Interested in speech, language or sonics?
Tuesday August 16th 2005, 5:23 pm
Filed under: csiro,email,information delivery,language technology,search,usability
Posted by: Andrew Lampert

If you’re interested in speech, language or sonics you should consider joining HCSNet, the Australian Research Council Research Network in Human Communication Science.

HCSNet aims to bring together researchers and students through workshops, conferences, and a variety of collaboration schemes in order to explore the boundaries of disciplines that encompass human communication. As a guide, this includes fields as diverse as psychology, computing, linguistics, engineering, philosophy and music.

Being a participant gives you access to an increasing number of funding programs (including funding for running interdisciplinary workshops and seminars) and events that are run under the auspices of the network. There’s some really good stuff coming up, including the NICTA/HCSNet Multimodal User Interaction Workshop (free registration thanks to HCSNet funding!), so join up and you’ll get the weekly HCSNet newsletter that will keep you informed …



Software Engineering Job Available!
Monday August 01st 2005, 7:06 pm
Filed under: csiro,information delivery,java,language technology,search,technology,usability
Posted by: Andrew Lampert

A fantastic opportunity for an experienced Java Developer. We’re seeking a new software engineer to join our small team of engineers and scientists and be responsible for implementing world-leading research ideas in software.

You can find out more about our work or about the CSIRO ICT Centre here.

Interested? Check out the position description for more information about the position, and to apply.



Entrepreneurial Energy!
Tuesday July 19th 2005, 10:16 pm
Filed under: csiro,information delivery,language technology,search,technology
Posted by: Andrew Lampert

Another excellent HAIL Seminar today, this time from Liesl Capper, founder and until recently CEO of Sydney-based web-search company Mooter Search.

Liesl’s talk covered a lot of ground, from definitions of both artificial and human intelligence, through to why personalisation is really important for web searching, and which user traits make the most reliable features for reasoning about how to personalise information.

Over lunch, we found out more about Liesl’s background and uncovered her great passion and enthusiasm for, well, just about everything! She’s already successfully started at least 3 businesses (including Mooter), and now that she’s no longer in a hands-on role at Mooter, is considering her options for starting her next business.

I’m hopeful that our paths may cross again in the future, as Liesl expressed interest in perhaps working in a few areas that are certainly on the radar at CSIRO. Similarly, I know that Liesl has a few contacts who would be very interested in our knowledge and expertise in information delivery (including one largish search company who shall for the moment remain nameless).

I think Liesl is one of those energetic people who could really inspire herself (and others) to great things. Even within the few hours she spent with us today, it was clear to me that she has something of a tireless energy about her and that she’s very positive and sure about achieving whatever goals she sets for herself. I’m quite sure that’s exactly the kind of drive that has helped Liesl to business success.



On-line Evidence-based decision support systems
Tuesday July 05th 2005, 1:16 pm
Filed under: csiro,information delivery,language technology,search,usability
Posted by: Andrew Lampert

Just attended a very interesting seminar given by Professor Enrico Coiera from the Centre for Health Informatics at UNSW.

(more…)