Proposed Research. Development of conversational information seeking systems requires new research on a broad range of topics related to information elicitation, user modeling, precision-oriented search, exploratory search, generated information objects (Section 7), description of retrieval results, session-based search, dialog systems capable of sustained multi-turn conversations, and evaluation. The IR community is well-positioned to work on these issues due to its deep roots in studying elicitation, information seeking, information organization, and what makes search difficult. Meaningful progress is likely to require partnering with colleagues in research areas such as NLP, dialog, speech, HCI, and information science that have complementary skills, thus broadening and enriching the field. Several promising research directions are described briefly below, to give a sense of what this topic entails.

User Models. User modeling in conversational information seeking systems involves inferring, representing, and updating information about a person (from general information about their tastes and conversational style to their current cognitive and emotional state), their state of knowledge surrounding the current topic, their current goal(s), and their previous interactions with the system. The user model informs predictive tasks. For example, based on the user model, the system can decide what information to elicit from the user, how to elicit it, and what information to provide. We note that elicitation is one key difference from traditional search engines: it allows the system to proactively focus on resolving uncertainties in a person's information need, both for the system and for the user. It also allows a person to explicitly refer to previous conversations with the system as a form of grounding or disambiguation.
Important research questions involve knowing when to take the initiative; inferring satisfaction; understanding which attributes of conversational interactions influence outcomes related to engagement and/or mental workload; and knowing when the information seeking session has concluded.

Finding Information. Conversational information seeking systems will require distinct search strategies for different conversational states, for example, precision-oriented search when the information need is specific or focused, and diverse recall-oriented search when the information need is uncertain or exploratory. Natural conversational delays create opportu...
Proposed Research. We propose an agenda driven by the ideal of incorporating social and ethical values into core information retrieval research and the development of algorithms, systems, and interfaces. This necessitates a community effort and a multi-disciplinary approach. We focus on fairness, accountability, confidentiality, and transparency in IR:

• Fair IR – How to avoid "unfair" conclusions even if they appear true?
  – For instance, in the case of people search, how do we make sure that results do not suffer from some groups being underrepresented in the training data?
  – Avoid "discrimination" even when attributes such as gender, nationality, or age are removed, and even when the vox populi dictates a certain ranking. Avoid selection bias and ensure diversity.
  – To what extent is the assortment of information objects presented to us representative of all such objects 'out there'?
  – How can we measure and quantify fairness of an IR system?
  – Evaluation of fairness vs. fair evaluation. How can we measure 'harm', and variations in 'harm' across verticals?
• Accountable IR – How can we avoid guesswork and produce answers and search results with a guaranteed level of accuracy?
  – Would providing such a guaranteed level of accuracy help or harm? When and why?
  – Attach meaningful confidence levels to results. Handle veracity issues in data. When to roll out hot-fixes? Rankings with solid guarantees on the reliability of the displayed answers and results.
  – How might the assortment of information objects presented to us impact our perceptions of reality and of ourselves?
• Confidential IR – How to produce results without revealing secrets?
  – Personalization without unintended leakage of information (e.g., filter bubbles) by randomization, aggregation, avoiding overfitting, etc.
• Transparent IR – How to clarify results such that they become trustworthy?
  – Automatically explaining decisions made by the system (e.g., retrieved search results, answers, etc.), allowing users to understand "Why am I seeing this?"
  – Traceability of results (e.g., link to raw data underlying entity panels).
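One concrete way to approach "measuring and quantifying fairness" of an IR system is an exposure audit of a ranking: compare the share of user attention each group of results receives. A minimal sketch, assuming a logarithmic position-discount model of exposure and known group labels (both are illustrative choices, not a prescribed standard):

```python
import math

def exposure(rank):
    # Assumed position model: attention decays logarithmically with rank.
    return 1.0 / math.log2(rank + 1)

def group_exposure_shares(ranking, groups):
    """ranking: list of doc ids, best first; groups: doc id -> group label.
    Returns each group's share of the total exposure in the ranking."""
    totals, total = {}, 0.0
    for i, doc in enumerate(ranking, start=1):
        e = exposure(i)
        totals[groups[doc]] = totals.get(groups[doc], 0.0) + e
        total += e
    return {g: v / total for g, v in totals.items()}

# Hypothetical four-result ranking with two groups A and B.
shares = group_exposure_shares(["d1", "d2", "d3", "d4"],
                               {"d1": "A", "d2": "A", "d3": "B", "d4": "B"})
```

Comparing the resulting shares against each group's share of relevant items is one possible disparity measure; how to aggregate such disparities across queries and verticals is exactly the open question raised above.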
Proposed Research. The proposed research consists of several main threads: (1) understanding cognitive aspects of users that are relevant to their information seeking, (2) investigating ways that search systems can provide information (beyond ranked lists and underlying documents) that will aid searchers in evaluating and contextualizing search results, (3) exploring ways that search systems can help users move through a learning or decision-making process, and (4) overcoming challenges in evaluating how well systems support users in learning and decision making.
(a) How do cognitive models and processes affect searching and vice-versa? What cognitive biases make content more difficult to absorb?
(b) How do people assess content (e.g., Is this information true/factual versus opinion/biased? How does this information relate to other content I've seen before?)
(c) How do we detect and represent users' knowledge and knowledge states, cognitive processes, and the effort and difficulty of processing information?
(d) How do we represent different information facets for users to support meta-cognition?

The second area focuses on investigating ways that search systems can represent and provide information so as to aid searchers in evaluating and contextualizing search results. Research questions in this area include: (a) what information or sources of information can be provided to help users overcome their cognitive biases (e.g., teenage moms might trust other teenage moms);
Proposed Research. To illustrate the range of possibilities of this broad agenda, we list the following suggested projects:
(1) Counterfactual analysis lies at the junction of online and offline evaluation. It is a tool from causal reasoning that allows the study of what users would do if the retrieval system they interact with were changed. Drawing on a system interaction log, one can (offline) "re-play" the log, re-weighting interactions according to their likelihood of being recorded under the changed system. From the re-played interactions, an unbiased estimator of the "value" of the changed system can be calculated. Value metrics are typically based on user interactions (e.g., clicks, dwell time, scrolling) but can incorporate editorial judgments of relevance or other factors. Because the user/information need sample is the same in every experiment, variance due to those factors can be controlled more tightly than in open-ended interaction studies. Counterfactual analysis relies on a rich log that captures a wide range of interactions. Typically some fraction of users must be shown results that have been perturbed in a systematic way, but may not be optimal for them. The main challenge is balancing the counterfactual need for perturbed results against the need to show users optimal results. There is extensive opportunity for research on means to minimize both the degree of perturbation of system results and the amount of log data required to produce low-variance, unbiased estimates.
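The re-weighting step described here is commonly realized as inverse propensity scoring. A minimal sketch, assuming a simplified log format in which each record carries the logged display probability (propensity) and a click-based reward; both are simplifying assumptions for illustration:

```python
def ips_estimate(log, new_policy):
    """log: list of (context, action, propensity, reward) records from the
    deployed system, where propensity is the probability the logged system
    showed that action. new_policy(context, action) -> probability the
    changed system would show it. Returns an unbiased estimate of the
    changed system's average reward."""
    total = 0.0
    for context, action, propensity, reward in log:
        # Re-weight each logged interaction by how much more (or less)
        # likely it is under the changed system.
        total += reward * new_policy(context, action) / propensity
    return total / len(log)

# Toy log: uniform logging over two results per query, reward = click (0/1).
log = [("q1", "a", 0.5, 1), ("q1", "b", 0.5, 0),
       ("q2", "a", 0.5, 0), ("q2", "b", 0.5, 1)]
# A hypothetical changed system that always shows result "a".
always_a = lambda ctx, act: 1.0 if act == "a" else 0.0
est = ips_estimate(log, always_a)  # estimated click rate of always showing "a"
```

The variance-reduction challenge noted above shows up directly here: when a propensity is small, the ratio becomes large, so minimizing perturbation while keeping propensities well away from zero is the core trade-off.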
(2) Define the axiometrics of online evaluation metrics. In the 2012 SWIRL report, determining the axioms of offline metrics was proposed, and soon after the meeting two SWIRL colleagues were granted a Google Faculty Award to explore this research idea further. We propose that axioms for online metrics be determined. Some axioms of such measures have already been defined (e.g., directionality, sensitivity), but this work is clearly not yet complete.
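Such axioms lend themselves to mechanical property tests. A sketch, assuming a simple session click-rate metric and one possible formalization of a directionality axiom (both are illustrative choices, not established definitions): turning a non-click into a click should never lower the metric.

```python
def session_ctr(sessions):
    # sessions: list of (impressions, clicks) pairs; metric = overall click rate.
    imps = sum(i for i, _ in sessions)
    clicks = sum(c for _, c in sessions)
    return clicks / imps if imps else 0.0

def satisfies_directionality(metric, sessions):
    """Directionality (as formalized here): converting one non-click into a
    click in any session must not decrease the metric's value."""
    base = metric(sessions)
    for idx, (imps, clicks) in enumerate(sessions):
        if clicks < imps:
            improved = list(sessions)
            improved[idx] = (imps, clicks + 1)
            if metric(improved) < base:
                return False
    return True

ok = satisfies_directionality(session_ctr, [(10, 2), (5, 0), (8, 8)])
```

Checks like this do not prove an axiom holds in general, but they make candidate axioms testable against proposed metrics and logged data.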
(3) New online metrics from new online interactions. Current online metrics mainly draw on naïve user interactions. There is a growing concern that determining value from such interactions misses important information from users, producing systems that optimize short-term benefits rather than long-term goals. Additionally, new modes of interaction, such as conversational systems as well as smaller interface forms such as smart watches, won't capture clicks or scrolls. It is necessary to move to more sophisticated interaction logging and unde...
Proposed Research. We need a more insightful and richer explanation of IR system performance: one that not only allows us to account for why we observe a given performance (e.g., failure analysis), but also decomposes a performance score into the contributions of the different components of an IR system, showing how the components interact and how factors external to the system also impact overall performance.
Proposed Research.
• Identifying criteria and metrics that can/should be used to evaluate:
  – Support by the system toward accomplishing that which led the person to engage in information seeking, i.e., evaluation of the success of the search session as a whole.
  – Support by the system with respect to what the person is trying to accomplish at each stage in the information searching process (search intentions).
  – Contribution of the activity of each stage of the information searching process to the ultimate success of the search session as a whole.
• Creating metrics that are sensitive to different types of motivating goals/tasks, and to different types of search intentions; we need to learn about the types, and the desired outcomes for each type.
• Investigating how to apply those criteria and measures through user studies and test collections that are aligned, so that researchers can benefit from both.

There is also ample opportunity to incorporate these more detailed investigations of users into online evaluation.
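One starting point for metrics of this kind is to aggregate per-stage support scores, weighted by each stage's contribution to the session goal. A minimal sketch; the stage names, scores, and weights below are illustrative placeholders rather than proposed definitions:

```python
def session_success(stage_scores, stage_weights):
    """stage_scores: stage name -> how well the system supported that
    search intention (0..1). stage_weights: stage name -> that stage's
    contribution to the overall session goal (weights sum to 1)."""
    return sum(stage_weights[s] * stage_scores[s] for s in stage_scores)

# Hypothetical three-stage session.
score = session_success(
    {"explore": 0.9, "refine": 0.6, "verify": 0.8},
    {"explore": 0.2, "refine": 0.5, "verify": 0.3},
)
```

The hard research questions above are precisely what this sketch assumes away: how to identify the stages, estimate the per-stage support scores, and learn contribution weights for different goal/task types.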
Proposed Research. The proposed research can be divided into six areas: data efficiency, core ranking, representation learning, reinforcement learning, reasoning, and interpretability. We anticipate these advances complementing, rather than replacing, current approaches to information retrieval.

Data Efficiency. Limited data access has constrained investigators' ability to study deep learning approaches to information retrieval. Unfortunately, although this data exists in industry, distributing it to the academic community would incur substantial risks to intellectual property and user privacy. As a result, the community needs to conduct research into:
• training robust, accurate models using small collections,
• developing new techniques to expand current labeled datasets (such attempts have been made, e.g., with weak supervision),
• dealing with incomplete and noisy data,
• simulating user behavior (e.g., using reinforcement learning),
• developing robust global models effective for data-poor domains, and
• reusing trained models for new tasks (e.g., for domain adaptation); current approaches include progressive neural networks and transfer learning.

Advanced retrieval and ranking models. One of the core information retrieval problems involves the representation of documents and queries and comparing these representations to produce a ranking based on estimated relevance. Neural information retrieval models have the potential to improve all aspects of this task by offering new methods of representing text at different levels of granularity (sentences, passages, documents), new methods of representing information needs and queries, and new architectures to support the inference process involved in comparing queries and text to find answers that depend on more than topical relevance.
For example, hybrid models combining different structures such as CNNs and LSTMs can capture different linguistic and topical structures, attention mechanisms can capture relative term importance, and XXXx may be able to lead to ranking models that require less training for a new corpus. It is not yet known which architectures are the most effective for a range of information retrieval tasks, but their potential is driving an increasing amount of research. As new models are developed, it will be critical that they are accompanied by in-depth analysis of how different aspects of the models lead to success or failure. Models that work with existing feature-based approaches, such as learning to rank, will have a criti...
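As a toy instance of such a representation-based ranker: encode queries and documents as vectors and score by cosine similarity. The random embeddings and averaging encoder below are stand-ins for learned components, not any particular published model:

```python
import math
import random

random.seed(0)
vocab = ["neural", "ranking", "models", "retrieval", "cats"]
DIM = 16
# Stand-in for learned vectors: a random embedding per vocabulary word.
emb = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in vocab}

def encode(text):
    # Average word embeddings: the simplest text encoder; real models
    # replace this with CNNs, LSTMs, or attention mechanisms.
    vecs = [emb[w] for w in text.split() if w in emb]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(q, d):
    dot = sum(a * b for a, b in zip(q, d))
    nq = math.sqrt(sum(a * a for a in q))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nq * nd)

def score(query, doc):
    return cosine(encode(query), encode(doc))

docs = ["neural ranking models", "retrieval models", "cats"]
ranked = sorted(docs, key=lambda d: score("neural ranking", d), reverse=True)
```

Everything interesting in the research agenda above lives in the parts this sketch stubs out: what the encoder architecture is, how representations are trained, and how the matching function goes beyond a single similarity score.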
Proposed Research. The framework is intended to support a critical discussion about the state-of-the-art regarding:
Proposed Research. We organize the proposed work into four major streams of research:
(1) Efficiency and MSSs: Search engines have been using highly complex rankers for quite some time, but the efficiency community has been slow to adapt. The last few years have seen some initial attempts to address this, but there are many remaining opportunities. Future work should evaluate new and existing end-to-end performance optimizations in the context of MSSs. We need automatic ways to create optimized index structures and to optimize query execution for a given MSS. We need new measures and objectives to optimize for in the early stages of cascading systems, and efficient ways to index and extract features for use in the cascades. Finally, we need to look at the impact of search results diversification and personalization in such systems.
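A minimal sketch of such a cascading multi-stage system, with toy stand-ins for the cheap early-stage and expensive late-stage scorers; the optimization questions above concern exactly these stages, their index structures and features, and the cut-off k:

```python
def cheap_score(query, doc):
    # First stage: fast term overlap, the kind of signal computable
    # directly from a simple inverted index.
    q, d = set(query.split()), set(doc.split())
    return len(q & d)

def expensive_score(query, doc):
    # Later stage: stand-in for a costly learned ranker (e.g., a neural
    # model); here, query-term hits normalized by document length.
    q, words = set(query.split()), doc.split()
    return sum(w in q for w in words) / len(words)

def cascade(query, docs, k):
    # Stage 1 prunes the collection to k candidates...
    candidates = sorted(docs, key=lambda d: cheap_score(query, d),
                        reverse=True)[:k]
    # ...stage 2 applies the expensive model only to the survivors.
    return sorted(candidates, key=lambda d: expensive_score(query, d),
                  reverse=True)

docs = ["fast ranking", "fast ranking with long extra words here",
        "ranking", "unrelated text"]
top = cascade("fast ranking", docs, k=3)
```

The efficiency/effectiveness trade-off is visible even here: a smaller k saves expensive-stage work but risks pruning documents the late stage would have ranked highly, which is why early-stage objectives and measures need research of their own.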
(2) ML for efficiency: Researchers are increasingly using machine learning and data mining to improve algorithms and systems. Examples include learning of index structures for particular datasets and ranking functions, modeling of query distributions to build small index structures for likely queries, learning of query routing and scheduling policies to satisfy service level agreements (SLAs) on latency and quality while maximizing throughput, or prediction of query costs and the best execution strategy for particular queries. One major challenge is how to formalize and guarantee performance bounds on such machine-learned components, which will enable reasoning about guarantees for the overall system. In short, ML and data mining techniques are popping up everywhere in the search engine architecture, and will drive future performance improvements, sometimes in unexpected ways. Conversely, IR efficiency researchers should also use their skills to make machine learning tools more efficient, as training and evaluation currently require huge amounts of resources, e.g., for deep neural nets.
(3) Challenging the current setup: The ready availability of alternative architectures such as vector processors (SSE, AVX instructions, and the like) and FPGAs provides opportunities to examine IR efficiency from a new angle — research on these devices as well as GPUs is in its infancy. The introduction of General Purpose Graphics Processing Units (GPGPUs) and Tensor Processing Units (TPUs) alongside general purpose CPUs will provide entirely unexplored avenues for research. These hardware architectures will soon be available on all users' clients (from phone to tabl...
Proposed Research. There are at least four broad research questions which we need to address. First, how can I find stuff that I've seen/interacted with before, or should see, efficiently, effectively, and while preserving privacy? Second, are there abstract representations of content and access patterns which we can share, without violating privacy, to help design systems, to train machine learners, or to distribute computation? How can we safely generalize what we learn from one person to another? Third, if we have a rich model of a person, based on personal data and interactions, how can we use this to personalize content or presentation? When should we? What should we consider? And finally, how can we search private information resources owned by others, as distinct from searching our own information in other collections?