dc.contributor.advisor: López de Lacalle Lecuona, Oier
dc.contributor.advisor: Soroa Echave, Aitor
dc.contributor.author: Etxaniz Aragoneses, Julen
dc.date.accessioned: 2023-06-30T15:07:02Z
dc.date.available: 2023-06-30T15:07:02Z
dc.date.issued: 2023-06-30
dc.identifier.uri: http://hdl.handle.net/10810/61827
dc.description.abstract: Humans can learn to understand and process the distribution of space, and one of the earliest tasks of Artificial Intelligence has been to teach machines the relationships between space and the objects that appear in it. Humans naturally combine visual and textual information to acquire compositional and spatial relationships among objects, and when reading a text we can mentally depict the spatial relationships it describes. Thus, the visual difference between images depicting "a person sits and a dog stands" and "a person stands and a dog sits" is obvious to humans, but still unclear to automatic systems. In this project, we propose to evaluate grounded neural language models that can perform compositional and spatial reasoning. Neural language models (LMs) have shown impressive capabilities on many NLP tasks but, despite their success, they have been criticized for their lack of meaning. Vision-and-language models (VLMs), trained jointly on text and image data, have been offered as a response to such criticisms, but recent work has shown that these models struggle to ground spatial concepts properly. In this project, we evaluate state-of-the-art pre-trained and fine-tuned VLMs to understand their level of grounding in compositional and spatial reasoning. We also propose a variety of methods for creating synthetic datasets specifically focused on compositional reasoning. We accomplished all the objectives of this work. First, we improved the state of the art in compositional reasoning. Next, we performed zero-shot experiments on spatial reasoning. Finally, we explored three alternatives for synthetic dataset creation: text-to-image generation, image captioning, and image retrieval. Code is released at https://github.com/juletx/spatial-reasoning and models are released at https://huggingface.co/juletxara.
dc.language.iso: eng
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: artificial intelligence
dc.subject: deep learning
dc.subject: natural language processing
dc.subject: computer vision
dc.subject: grounding
dc.subject: visual reasoning
dc.subject: compositional reasoning
dc.subject: spatial reasoning
dc.title: Grounding Language Models for Compositional and Spatial Reasoning
dc.type: info:eu-repo/semantics/masterThesis
dc.date.updated: 2022-10-17T08:00:16Z
dc.language.rfc3066: es
dc.rights.holder: © 2022, the author
dc.contributor.degree: Máster Universitario en Análisis y Procesamiento del Lenguaje
dc.contributor.degree: Hizkuntzaren Azterketa eta Prozesamendua Unibertsitate Masterra
dc.identifier.gaurregister: 128439-870161-01
dc.identifier.gaurassign: 141662-870161
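
The abstract above describes two techniques that lend themselves to short illustrations: zero-shot evaluation of pre-trained vision-and-language models on contrastive caption pairs, and synthetic dataset creation via text-to-image generation. The sketches below are illustrative only and are not taken from the released repository; the model checkpoints and file paths are assumptions. The first scores one image against the two contrastive captions quoted in the abstract, using CLIP through the HuggingFace transformers library:

    # Minimal sketch: zero-shot image-text matching with CLIP.
    # Not the thesis code; checkpoint name and image path are assumptions.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")  # hypothetical local image
    captions = [
        "a person sits and a dog stands",
        "a person stands and a dog sits",
    ]

    # Encode the image together with both captions and compare similarities.
    inputs = processor(text=captions, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_captions)
    probs = logits.softmax(dim=-1)[0]

    for caption, p in zip(captions, probs):
        print(f"{p.item():.3f}  {caption}")

A well-grounded model should assign a clearly higher probability to the caption that matches the image; as the abstract notes, recent work has shown that current VLMs often struggle with exactly such pairs. The second sketch generates a synthetic image for a contrastive caption with Stable Diffusion through the diffusers library, corresponding to the text-to-image alternative among the three dataset-creation methods the abstract lists (the model id is likewise an assumption):

    # Minimal sketch: synthesizing an image from a contrastive caption
    # with Stable Diffusion. Not the thesis code; model id is an assumption.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    image = pipe("a person stands and a dog sits").images[0]
    image.save("generated.png")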

