Publication
Benchmarking Table Extraction
| Abstract: | This paper compares two approaches to table extraction from images: deep learning computer vision and Multimodal Large Language Models (MLLMs). Computer vision models for table extraction, such as the Table Transformer (TATR), have improved the extraction of complex table layouts by combining deep learning for structure recognition with traditional Optical Character Recognition (OCR). Conversely, MLLMs, which process both text and image inputs, offer a novel approach that may bypass the limitations of TATR-plus-OCR pipelines altogether. Models such as GPT-4o, Phi-3 Vision, and Granite Vision 3.2 demonstrate the potential of MLLMs to analyze and interpret table images directly, offering enhanced accuracy and robust extraction. Both approaches were evaluated with Grid Table Similarity (GriTS), a state-of-the-art metric that gives separate insight into structure and text content quality. Using the PubTables-1M dataset, a comprehensive and widely used benchmark in the field, this study highlights the strengths and limitations of each approach and sets the stage for future work on table extraction. Deep learning computer vision techniques still have a slight edge in extracting table structure, but MLLMs are far better at extracting cell text content. |
|---|---|
| Main author: | Nunes, Guilherme |
| Other authors: | Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L. |
| Subject: | SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
Related records
The impact of online advertisement personalization and transparency on individual defensive responses and engagement
by: Al Helaly, Yasser
Published: (2023)
Optimized Audit with Risk-based Sampling [poster]
by: Dias, Isabel
Published: (2025)
Design and Implementation of a Data Repository for Collaborative Research: A Pilot Platform for Dataset Management and Continuity: Supporting Sustainable Research Through Structured Data Access and Collaboration
by: Jannah, Ilyass
Published: (2025)
Assessing the Level of Data Governance Implementation in a Portuguese Insurance Company using DAMA DMBOK Framework
by: Baptista, Henrique de Oliveira
Published: (2025)
The impact of PwC Portugal’s IT Audit work to improve Client Information Systems
by: Silva, Maria Helena Figueiredo
Published: (2024)
Levering social media practices for effective crisis management in organizations
by: Puente, Alisson Madelein Tapia
Published: (2024)
E-Government Service Delivery in Rwanda: Overcoming Infrastructure Limitations (A UN EGDI-Based Study for LDCs)
by: Sharamanzi, Remy
Published: (2024)
Empathy's Impact on Patient Satisfaction: A Comparison of Human and Chatbot Interactions in Healthcare
by: Gomes, Ana Margarida Gouveia
Published: (2025)
Compliance with Data Aggregation and Reporting Principles: The Impact of Monitoring it in a Bank
by: Galhote, Inês Nobre da Veiga
Published: (2024)
Improving the Effectiveness of Non-Judicial Mechanisms Under the OECD National Contact Points: Issues of Legitimacy and Accessibility
by: Íñigo Álvarez, Laura
Published: (2025)
Comparing Multimodal LLMS and Traditional Neural Networks for Table Extraction From PDFs and Images: An Evaluation of Structure and Content Extraction from Table in Images
by: Nunes, Guilherme Guerra Marques
Published: (2025)
Sustainable Development Goals Identification in Academic Documents: A Supervised Learning Approach
by: Oliveira, Sebastião Conde e Silva Alves de
Published: (2025)
Assessing artificial intelligence readiness in EU e-government: insights from factor and cluster analysis
by: Amaral, Eduardo Xavier Pinto e Silva Nogueira do
Published: (2025)
Factors influencing organizational culture to change and transform towards the perceptions of telework in Portugal in a post-pandemic scenario from a managerial perspective
by: Varela, Diogo Alexandre Ramos
Published: (2023)
Pricing Multi Barrier Reverse Convertibles: An empirical investigation using Swiss Market data
by: Amaro, Luís Filipe Marta de Paiva
Published: (2025)
Investigating the Gen-Z perception regarding data privacy in interactions with brands in an Artificial Intelligence context
by: Gonçalves, Felipe
Published: (2024)
The Dark Side of Artificial Intelligence: AI-Driven Cyber Attacks
by: Rego, Igor Marcelo de Sá
Published: (2024)
Understanding the relationship between impact investment firms and market volatility: An analysis of European impact firms amidst COVID-19 and ongoing conflict
by: Sousa, Fábio Alexandre Alves de
Published: (2024)
Demographic, clinical and pathological characterisation of patients with colorectal and anal cancer followed between 2013 and 2016 at Maputo Central Hospital, Mozambique
by: Selemane, Carlos
Published: (2021)
Intelligent Management System for Cultural and Creative Organizations [poster]
by: Ferraz, Catarina Oliveira
Published: (2025)
AI and Governance: A Pathway for Tunisia’s Public Sector Modernization
by: Abda, Maryem Ben
Published: (2025)
Empowering Ethical Insights among Data Scientists
by: Costa, Mariana Silva
Published: (2025)
O nexo corrupção-migração - que desafios à proteção dos direitos humanos?
by: Oliveira, Emellin de
Published: (2022)
Two urgent actions related to international health emergencies amid the escalating conflict in Gaza
by: Correia, Tiago
Published: (2024)
Understanding the effects of Culture in the intention to use AI in Education
by: Rosa, Nelson Daniel Monteiro
Published: (2024)
Modeling Collaborative Behaviors in Energy Ecosystems
by: Adu-Kankam, Kankam Okatakyie
Published: (2023)
The Influence Of Social Media Disinformation On Voter Behaviour And Electoral Outcomes In Nigeria
by: Mbagwu, Ozichi
Published: (2024)
Enhancing Local E-Government Services in Portugal: A Revised Evaluation of the Local Online Service Index
by: Vassaramo, Bhavini Hasmuclal
Published: (2024)
A Study on Youth´s Political Satisfaction: The Case of Portugal
by: Martins, Marta Isabel Mendes
Published: (2023)
Public Procurement during COVID-19: In-depth analysis of public contracts during the pandemic era
by: Machado, Carolina Magee Arvelos
Published: (2024)
AI Ethics Guidelines: Is regulation a value of trust?
by: Almeida, Ana Patrícia Poças Pires Bulha
Published: (2025)
Applications, Challenges, and Ethical Implications of Generative AI: A Systematic Review
by: Esteves, Rodrigo Manuel Abreu
Published: (2024)
Best Practices to Present Key Risk Indicators for Banking: A case study at Bank of Portugal
by: Figueiredo, Guilherme Alberto dos Santos
Published: (2024)
Rethinking cultural democratization projects - Proposal of an Inclusive Framework for Underdeveloped Environments
by: Ferraz, Catarina Oliveira
Published: (2023)
Data Governance for Effective Sports Companies: A Strategy to Enhance Performance and Compliance in the Age of Sports Information
by: Filipe, Gonçalo Fontelas
Published: (2024)
Extraction and Exploration of Morality on Social Networks: Analyzing Twitter discussions with the Moral Foundations Theory
by: Custódio, Naomi Ferreras
Published: (2024)
The Consumer Benchmark, Vulnerability, and the Contract Terms Transparency
by: Esposito, Fabrizio
Published: (2022)
After the success of DevOps introduce DataOps in enterprise culture
by: Silva, Nuno Filipe Paulo da
Published: (2023)
Nova IMS Contribution for SDGs achievement: A Business Intelligence Approach
by: Chande, Iara Belinda Bugalho
Published: (2021)
Building Trust in Artificial Intelligence: A Master Thesis Exploring Public Sector Adoption and Perception
by: Martinho, Diogo Rafael Pissarra
Published: (2024)