Research Challenges. Conversational IR systems can be seen as a federation of agents or subsystems, but they will also be inherently complex systems, with models that reach beyond the boundaries of individual components. This raises challenges such as how to bootstrap such systems with reasonable effort, how to ensure they are responsive as a whole, how to perform component-wise diagnosis, and at what level to consider their robustness. Ethical challenges arising across the field of information retrieval, such as trust in information, biases, and transparency, will likely be exacerbated by the inherent narrowing of the communication channel between these systems and their users.
Research Challenges. These general research questions manifest themselves along the entire information retrieval “stack” and motivate a broad range of concrete research directions to be investigated: Does the desire to present fair answers to users necessitate different content acquisition methods? If traceability is essential, how can we make sure that basic normalization steps (such as content filtering, named entity normalization, etc.) do not obfuscate this? How can we give assurances about fairness for novel retrieval paradigms (e.g., neural retrieval models being trained and evaluated on historic relevance labels obtained from pooling mainly exact term-matching systems)? How should we design an information retrieval system’s logging and experimental environment so that it guarantees fair, confidential, and accurate offline and online evaluation and learning? Can exploration policies be designed such that they comply with guarantees on performance? How can system changes learned online be made explainable? Indexing structures and practices need to be designed or revisited in terms of their ability to accommodate downstream fairness and transparency operations. This may pose novel requirements on compression and sharding schemes as fair retrieval systems begin requesting aggregate statistics that go beyond what is currently required for ranking purposes. Interface design faces the challenge of presenting the newly generated types of information (such as provenance, explanations or audit material) in a useful manner while retaining effectiveness towards their original purpose. Retrieval models are becoming more complex (e.g., deep neural networks for IR) and will require more sophisticated mechanisms for explainability and traceability. Models, especially in conversational interaction contexts, will need to be “interrogable”, i.e., make effective use of users’ queries about explainability (e.g., “why is this search result returned?”). Recommender systems have a historic demand for explainability geared towards boosting adoption and conversion rates of recommendations. In addition to these primarily economic considerations, transparent and accountable recommender systems need to advance further and ensure fair and auditable recommendations that are robust to changes in product portfolio or user context. Such interventions may take a considerably different shape than those designed for explaining the results of ranked retrieval syst...
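To make one of the aggregate statistics mentioned above concrete, the sketch below (an illustration, not a method prescribed in the text) computes position-discounted group exposure for a single ranked list, the kind of fairness-oriented quantity an index or logging layer might be asked to serve; the group labels, discount model, and document ids are assumed for the example.

```python
# Minimal sketch (assumed example, not a prescribed standard): an
# exposure-based group-fairness check for one ranked list, using a
# logarithmic position discount.
import math
from collections import defaultdict

def group_exposure(ranking, group_of):
    """Sum position-discounted exposure per group for one ranking.

    ranking  : list of document ids, best first
    group_of : dict mapping doc id -> group label (e.g. content provider)
    """
    exposure = defaultdict(float)
    for rank, doc in enumerate(ranking, start=1):
        exposure[group_of[doc]] += 1.0 / math.log2(rank + 1)
    return dict(exposure)

def exposure_ratio(ranking, group_of, a, b):
    """Ratio of exposure received by group a vs. group b (1.0 = parity)."""
    e = group_exposure(ranking, group_of)
    return e.get(a, 0.0) / max(e.get(b, 0.0), 1e-9)

if __name__ == "__main__":
    docs = ["d1", "d2", "d3", "d4"]                     # ranked best first
    groups = {"d1": "A", "d2": "B", "d3": "A", "d4": "B"}
    print(group_exposure(docs, groups))
    print(exposure_ratio(docs, groups, "A", "B"))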
Research Challenges. Challenges are faced in each of the areas that the proposed research covers. The proposed research touches on the collection over which the search engine operates, the user’s interaction with the search system, the user’s cognitive processes, and the evaluation of the changes to the user’s knowledge state or performance on tasks. At the level of the collection, we are concerned with the mix of information that is available. For large-scale collections, such as the web, it is very difficult to understand the amount of material on a given topic, and thus it is hard to know what biases exist in the collection. For example, we might be interested in measuring the accuracy of decisions that users make after using a search engine. Collections of interest will contain a mix of correct and incorrect information, but the scale of the collection will make it difficult to understand the amount of correct and incorrect information in the collection prior to the user’s search session. The field of IR is still in its infancy with respect to understanding user interaction and user cognitive processes. For us to be able to design systems that lead users to their desired knowledge state or decision, we will need to better understand how their cognitive processes affect their interaction with the system and how the stream of information that they consume changes their mental state. A challenge here will be a lack of expertise in cognitive science and psychology (how people learn, how people make decisions, biases). Progress in this area will likely require collaboration outside of IR and require input from and engagement of other communities, including: cognitive science, human-computer interaction, psychology, behavioural economics, and application/domain-specific communities (e.g., intelligence community, clinical community). The envisioned systems may require radical changes to aspects of user interfaces. Uptake of new UI solutions, however, is often difficult and places an extra onus on users, thus creating a high barrier to entry for the proposed new systems. Finally, evaluation ranges from the simple to the complex. We are interested both in simple measures such as decision accuracy, and complex measures such as increases in curiosity. Evaluation is envisioned to embrace larger aspects of the user-system interaction than just the information seeking phase, e.g., evaluation of decisions users take given the information systems provided. Given that almost a...
Research Challenges. Some may think that online evaluation is off limits to academia because of a need to ‘get’ live users. However, TREC, NTCIR, and CLEF have explored ways of making such a provision. In addition, smaller-scale evaluation in laboratory or live-laboratory settings, or in situ, could lead to advances in evaluation that take account of rich contextual and individual data. We believe that it may also be possible to simulate user bases with recordings of user interaction in conjunction with counterfactual logging. Such collections may include logs, crowd-sourced labels, and user engagement observations. Such data may be collected by means of user-as-a-service components that can provide IR systems with on-demand users who can interact with the system (e.g., given a persona description) to generate logs and the context in which online evaluations can be carried out.
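As a concrete illustration of the counterfactual-logging idea mentioned above, the following sketch (an assumption, not a protocol from the text) shows a basic inverse propensity scoring (IPS) estimate of a new policy's expected reward from historical interaction logs; the field names and the toy policy are invented for the example.

```python
# Minimal sketch: IPS offline estimate of a new policy's reward from logs.
# Each log record is assumed to hold the context, the action shown, the
# logging policy's propensity for that action, and an observed reward.
def ips_estimate(logs, new_policy):
    """logs: iterable of dicts with keys 'context', 'action',
             'propensity' (P(action | context) under the logging policy),
             and 'reward' (e.g. click = 1.0, no click = 0.0).
       new_policy(context, action) -> probability the new policy would
             have shown that action in that context."""
    total = 0.0
    n = 0
    for rec in logs:
        weight = new_policy(rec["context"], rec["action"]) / rec["propensity"]
        total += weight * rec["reward"]
        n += 1
    return total / max(n, 1)

if __name__ == "__main__":
    logs = [
        {"context": "q1", "action": "docA", "propensity": 0.5, "reward": 1.0},
        {"context": "q1", "action": "docB", "propensity": 0.5, "reward": 0.0},
    ]
    # A deterministic policy that always shows docA for q1.
    policy = lambda ctx, act: 1.0 if act == "docA" else 0.0
    print(ips_estimate(logs, policy))  # -> 1.0
```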
Research Challenges. There have been past initial attempts to build explanatory models of performance based on linear models validated through ANOVA, but they are still far from satisfactory. Past approaches typically relied on the generation of all the possible combinations of components under examination, leading to an explosion in the number of cases to consider. Therefore, we need to develop greedy approaches to avoid such a combinatorial explosion. Moreover, the assumptions underlying IR models and methods, datasets, tasks, and metrics should be identified and explicitly formulated, in order to determine how much we are departing from them in a specific application and to leverage this knowledge to more precisely explain observed performance. We need a better understanding of evaluation metrics. Not all metrics may be equally good at detecting the effect of different components, and we need to be able to predict which metric fits components and interaction better. Sets of more specialized metrics representing different user standpoints should be employed, and the relationships between system-oriented and user-/task-oriented evaluation measures (e.g. satisfaction, usefulness) should be determined. A related research challenge is how to exploit richer explanations of performance to design better and more re-usable experimental collections where the influence and bias of undesired and confounding factors is kept under control. Most importantly, we need to determine the features of datasets, systems, contexts, and tasks that affect the performance of a system. These features, together with the developed explanatory performance models, can eventually be exploited to train predictive models able to anticipate the performance of IR systems in new and different operational conditions.
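One way such a greedy approach could look in practice is sketched below; this is only an illustration, with hypothetical component slots and a toy `evaluate` function standing in for a full experimental run, showing how coordinate-wise selection avoids enumerating every combination of pipeline components.

```python
# Minimal sketch (an assumption, not the approach from past work): greedily
# pick, one pipeline slot at a time, the alternative that most improves a
# held-out metric, instead of evaluating every combination of components.
from itertools import product

def exhaustive(slots, evaluate):
    """Baseline: O(prod(|options|)) pipeline evaluations."""
    best = max(product(*slots.values()),
               key=lambda combo: evaluate(dict(zip(slots, combo))))
    return dict(zip(slots, best))

def greedy(slots, evaluate):
    """Greedy coordinate ascent: O(sum(|options|)) evaluations."""
    config = {name: options[0] for name, options in slots.items()}
    for name, options in slots.items():
        config[name] = max(options,
                           key=lambda opt: evaluate({**config, name: opt}))
    return config

if __name__ == "__main__":
    slots = {"stemmer": ["none", "porter"],
             "model": ["bm25", "lm"],
             "expansion": ["off", "rm3"]}
    # toy metric: pretend porter + bm25 + rm3 is the best pipeline
    target = {"stemmer": "porter", "model": "bm25", "expansion": "rm3"}
    evaluate = lambda cfg: sum(cfg[k] == v for k, v in target.items())
    print(greedy(slots, evaluate))      # 6 evaluations instead of 8
```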
Research Challenges. Existing high baselines: Over the long history of IR, we have developed models and approaches for ad-hoc and other types of search. These models are based on human understanding of the search tasks, the languages, and the ways that users formulate queries. The models have been fine-tuned using test collections. The area has a set of models that work fairly well across different types of collections, search tasks and queries. Compared to other areas such as image understanding, information retrieval has very high baselines. A key challenge in developing new models is to be able to produce competitive or superior performance with respect to the baselines. In the learning setting, a great challenge is to use machine learning methods to automatically capture important features in representations, which have been manually engineered in traditional models. While great potential has been demonstrated in other areas such as computer vision, the advantage of automatically learned representations for information retrieval has yet to be confirmed in practice. Current representation learning methods offer a great opportunity for information retrieval systems to create representations for documents, queries, users, etc. in an end-to-end manner. The resulting representations are built to fit a specific task. Potentially, they could be better adapted to the search task than a manually designed representation. However, training such representations will require a large amount of training data. Low data resources: representation learning, and supervised machine learning in general, is based heavily on labeled training data. This poses an important challenge for using this family of techniques for IR: How can we obtain a sufficient amount of training data to train an information retrieval model? Large amounts of training data usually exist only in large search engine companies, and the obstacle to making the data available to the whole research community seems difficult to overcome, at least in the short term. A grand challenge for the community is to find ways to create proxy data that can be used for representation learning for IR. Examples include the use of anchor texts, and weak supervision by a traditional model. Data-hungry learning methods have inherent limitations in many practical application areas such as IR. A related challenge is to design learning methods that require less training data. This goal has much in common with that of the machine learning ...
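The weak-supervision idea mentioned above (training labels derived from a traditional model) might look roughly like the following sketch; the term-overlap scorer is only a stand-in for BM25, and every name and threshold in it is an illustrative assumption.

```python
# Minimal sketch: use an existing unsupervised ranker (a toy term-overlap
# scorer standing in for BM25) to label query-document pairs, then derive
# pairwise preferences that a learned ranker could be trained on.
def teacher_score(query, doc):
    """Stand-in for a traditional model such as BM25."""
    q, d = set(query.lower().split()), doc.lower().split()
    return sum(1 for t in d if t in q) / (len(d) or 1)

def weak_pairwise_data(query, docs, margin=0.05):
    """Yield (preferred_doc, other_doc) pairs from teacher scores."""
    scored = sorted(docs, key=lambda d: teacher_score(query, d), reverse=True)
    pairs = []
    for i, better in enumerate(scored):
        for worse in scored[i + 1:]:
            if teacher_score(query, better) - teacher_score(query, worse) > margin:
                pairs.append((better, worse))
    return pairs

if __name__ == "__main__":
    q = "neural ranking models"
    docs = ["neural models for ranking documents",
            "a survey of indexing structures",
            "ranking with neural networks"]
    for better, worse in weak_pairwise_data(q, docs):
        print("prefer:", better, "| over:", worse)
```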
Research Challenges. Knowledge Graph Representation in GIOs. The goal is to represent open domain information for any information need. Current knowledge graph schemas impose limitations on the kinds of information that can be preserved. Xxxxxxxxxxx et al. found that many KG schemas are inappropriate for open information needs. OpenIE does not limit the schema, but only low-level (sub-sentence) information is extracted. In contrast, semi-structured knowledge graphs such as DBpedia offer a large amount of untyped relation information which is currently not utilizable. A challenging question is how to best construct and represent knowledge graphs so that they are maximally useful for open domain information retrieval tasks. This requires new approaches for representation of knowledge graphs, acquisition of knowledge graphs from raw sources, and alignment of knowledge graph elements and text. This new representation in turn requires new approaches for indexing and retrieval of relevant knowledge graph elements. Adversarial GIOs. Not all GIOs are derived from trustworthy information. Some information ecosystem actors try to manipulate the economics or attention within the ecosystem. It is impossible to identify “fake” information in objects without good provenance. To gain the user’s trust, it is important to avoid bias in the representation, which can come from bias in the underlying resources or in the generation algorithm itself. To accommodate the former, the GIO framework enables provenance tracing to raw sources. Additionally, contradictions of information units with respect to a larger knowledge base of accepted facts need to be identified. Such a knowledge base needs to be organized according to a range of political and religious beliefs, which may otherwise lead to contradictions. The research question is how to organize such a knowledge base, and how to align it with harvested information units. Finally, approaches for reasoning within a knowledge base of contradicting beliefs need to be developed. Equally important is to quantify bias originating from machine learning algorithms, which may amplify existing bias. Merging of Heterogeneous GIOs. To present pleasant responses, it is important to detect redundancy and merge units of information such as sentences, images, paragraphs, and knowledge graph items. For example, this includes detecting when two sentences are stating the same message (i.e., entailment). For example, “the prime minister visited Paris” from a document ab...
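As an illustration of redundancy detection with provenance preserved, the sketch below merges near-duplicate information units by token overlap while keeping every source; the data model and threshold are assumptions made for the example, not the GIO framework itself.

```python
# Minimal sketch: merge redundant information units while preserving
# provenance. Units whose token Jaccard overlap exceeds a threshold are
# merged into one unit that keeps every source.
from dataclasses import dataclass, field

@dataclass
class InfoUnit:
    text: str
    sources: set = field(default_factory=set)

def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def merge_units(units, threshold=0.7):
    merged = []
    for unit in units:
        for kept in merged:
            if jaccard(unit.text, kept.text) >= threshold:
                kept.sources |= unit.sources   # keep provenance of both units
                break
        else:
            merged.append(InfoUnit(unit.text, set(unit.sources)))
    return merged

if __name__ == "__main__":
    units = [InfoUnit("the prime minister visited Paris", {"doc1"}),
             InfoUnit("the prime minister visited Paris yesterday", {"doc2"}),
             InfoUnit("the budget vote was postponed", {"doc3"})]
    for u in merge_units(units):
        print(u.text, "<-", sorted(u.sources))
```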
Research Challenges. Several interesting research challenges continue to exist when building traditional efficient and effective IR systems (such as compression, first stage query resolution, and so on). In multi-stage retrieval systems the complexity is substantially higher and new areas need addressing. For example, at present we do not even know where and why these systems are slow. As mentioned above, exciting new challenges exist in the areas of conversational IR and learned data structures. While the notion of combining learning with efficient indexing is not an entirely new idea, recent advances in neural IR models have shown that learned data structures can in fact be faster, smaller, and as effective as their exact solution counterparts. However, enforcing performance guarantees in learned data structures is still a research problem requiring work. Likewise, as search becomes even more interactive, new opportunities for efficient indexing and ranking are emerging. For example, virtual assistants can leverage iterations on complex information in order to improve both effectiveness and efficiency in the interaction. But how to evaluate iterative changes for interactive search tasks is a significant challenge, and very few collections currently exist to test new approaches, let alone to test the end-to-end efficiency performance of such systems.
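To ground the learned-data-structure idea, the following sketch shows a one-segment learned index: a linear model predicts a key's position in a sorted array, and a bounded local search corrects the prediction. It is illustrative only; practical learned indexes use piecewise models and tighter error guarantees than this toy version.

```python
# Minimal sketch of a learned index over a sorted key array. The model's
# worst-case training error bounds the search window, so lookups stay
# correct even when the linear prediction is off.
from bisect import bisect_left

class LearnedIndex:
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        xs, ys = self.keys, range(n)
        mx = sum(xs) / n
        my = (n - 1) / 2
        var = sum((x - mx) ** 2 for x in xs) or 1.0
        self.slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
        self.intercept = my - self.slope * mx
        # worst-case prediction error over the training keys
        self.err = max(abs(self._predict(x) - y) for x, y in zip(xs, ys))

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

if __name__ == "__main__":
    idx = LearnedIndex(range(0, 1000, 3))    # keys 0, 3, 6, ...
    print(idx.lookup(297), idx.lookup(298))  # -> 99 None
```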
Research Challenges. Question answering is defined as a field where the main challenge consists of providing automatic answers to questions posed by humans in natural language. In order to develop a system capable of performing such actions, the system has to first understand the logic behind reasoning, understanding and extracting information from text. Open-domain factoid question answering consists of questions regarding well-known and concise facts. Consider the question “When was Xxxxxx Xxxxx born?”, for which the answer is August 4th, 1961. Current systems are able to provide answers to such questions using already existing knowledge graphs. Such extraction is a rather straightforward process, in which the relation is extracted first and then matched against the structure of the graph. On the other hand, a large number of the questions asked today on search engines, approximately 70% of them1, still require users to perform a manual search through the provided search results. Consider the question “Who led the Polish army in the Siege of Warsaw?”, for which the answer is Xxxxxxxx Xxxxx. The information that supports this query can be directly extracted from one of the Wikipedia pages2: The siege lasted until September 28, when the Polish xxxxxxxx, commanded under General Xxxxxxxx Xxxxx, officially capitulated. Unfortunately, lexicon-based approaches would likely fail in locating the correct sentence among the ones from the Wikipedia page. The entire article is highly correlated with the words from the query, so more advanced syntactic and semantic analysis is needed. Solving more complex factoid questions is a great challenge that is often approached by designing and training statistical models. Unfortunately, the more advanced the model is, the more data it requires to be trained. The source of such data could be search engines with their click data. Sadly, this type of information is almost impossible for researchers to access. Therefore, manual creation processes have to be developed to generate diverse, challenging and realistic datasets for machine learning models. Question answering is one of the major unsolved problems in natural language processing and has already attracted a large number of researchers. 1 xxxxx://xxx.xxx-xx.xxx/blog/how-and-why-to-set-your-site-up-for-googles-rich-answers/ 2 xxxxx://xx.xxxxxxxxx.xxx/wiki/Siege_of_Warsaw_(1939) In recent times, several corpora for various tasks have been published. It is reasonable to look at how th...
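The knowledge-graph lookup flow described above (extract the relation first, then match it against the structure of the graph) can be illustrated with the toy sketch below; the patterns, relation names, triple store, and the placeholder entity name are all assumptions made for the example, not the system discussed in the text.

```python
# Minimal sketch of KG-based factoid QA: extract a (subject, relation)
# pair from the question with a hand-written pattern, then look the
# answer up in a tiny triple store. Purely illustrative.
import re

TRIPLES = {
    ("alan_smith", "birth_date"): "August 4th, 1961",   # placeholder entity
}

PATTERNS = [
    (re.compile(r"when was (.+) born", re.I), "birth_date"),
]

def answer(question):
    for pattern, relation in PATTERNS:
        m = pattern.search(question)
        if m:
            subject = m.group(1).strip().lower().replace(" ", "_")
            return TRIPLES.get((subject, relation))
    return None

if __name__ == "__main__":
    print(answer("When was Alan Smith born?"))  # -> August 4th, 1961
```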
Research Challenges. Looking at the graph, only the research challenges “Participatory Sensing” and “Identity Management” have not been identified in the four cases and therefore are not connected to the graph in the same manner as all other research challenges. However, during discussion with experts and reviewers, it was decided to link “Participatory Sensing” to “Big Data” as it obviously sits on top of it and the two are very closely related. This edge is therefore coloured black in order to show the difference from the relationships that have been extracted from the analysis. On the other hand, “Identity Management” remains disconnected, though this does not mean it should be removed from the Roadmap or that it is not an important element of it (see Roadmap Recommendation #2 for further comments on this).