Publication

Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR


Bibliographic details
Abstract: This paper compares two approaches for table extraction from images: deep learning computer vision and Multimodal Large Language Models (MLLMs). Computer vision models for table extraction, such as the Table Transformer model (TATR), have enhanced the extraction of complex table structural layouts by leveraging deep learning for precise structural recognition combined with traditional Optical Character Recognition (OCR). Conversely, MLLMs, which process both text and image inputs, present a novel approach by potentially bypassing the limitations of TATR plus OCR methods altogether. Models such as GPT-4o, Phi-3 Vision, and Granite Vision 3.2 demonstrate the potential of MLLMs to analyze and interpret table images directly, offering enhanced accuracy and robust extraction capabilities. These methodologies were evaluated with Grid Table Similarity (GriTS), a state-of-the-art metric, providing nuanced insights into structural and text content effectiveness. Using the PubTables-1M dataset, a comprehensive and widely used benchmark in the field, this study highlights the strengths and limitations of each approach, setting the stage for future innovations in table extraction technologies. Deep learning computer vision techniques still have a slight edge when extracting table structural layout, but in terms of text cell content, MLLMs are far better.
Main author: Nunes, Guilherme
Other authors: Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.
Subject: SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
Year: 2025
Country: Portugal
Document type: conference object
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL
_version_ 1868415345324195840
author Nunes, Guilherme
author2 Rolla, Vitor
Pereira, Duarte
Alves, Vasco
Carreiro, André V.
Baptista, Márcia L.
author2_role author
author
author
author
author
author_facet Nunes, Guilherme
Rolla, Vitor
Pereira, Duarte
Alves, Vasco
Carreiro, André V.
Baptista, Márcia L.
author_role author
contributor_name_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
country_str PT
creators_json_txt [{"Person.name":"Nunes, Guilherme"},{"Person.name":"Rolla, Vitor"},{"Person.name":"Pereira, Duarte"},{"Person.name":"Alves, Vasco"},{"Person.name":"Carreiro, André V."},{"Person.name":"Baptista, Márcia L."}]
datacite.contributors.contributor.contributorName.fl_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
datacite.creators.creator.creatorName.fl_str_mv Nunes, Guilherme
Rolla, Vitor
Pereira, Duarte
Alves, Vasco
Carreiro, André V.
Baptista, Márcia L.
datacite.date.Accepted.fl_str_mv 2025-08-01T00:00:00Z
datacite.date.available.fl_str_mv 2025-07-29T21:28:21Z
datacite.date.embargoed.fl_str_mv 2025-07-29T21:28:21Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv SDG 4 - Quality Education
SDG 9 - Industry, Innovation, and Infrastructure
SDG 16 - Peace, Justice and Strong Institutions
SDG 17 - Partnerships for the Goals
datacite.titles.title.fl_str_mv Benchmarking Table Extraction
Multimodal LLMs vs Traditional OCR
dc.contributor.none.fl_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
dc.creator.none.fl_str_mv Nunes, Guilherme
Rolla, Vitor
Pereira, Duarte
Alves, Vasco
Carreiro, André V.
Baptista, Márcia L.
dc.date.Accepted.fl_str_mv 2025-08-01T00:00:00Z
dc.date.available.fl_str_mv 2025-07-29T21:28:21Z
dc.date.embargoed.fl_str_mv 2025-07-29T21:28:21Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10362/185720
dc.language.none.fl_str_mv eng
dc.publisher.none.fl_str_mv Association for Computational Linguistics (ACL)
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv SDG 4 - Quality Education
SDG 9 - Industry, Innovation, and Infrastructure
SDG 16 - Peace, Justice and Strong Institutions
SDG 17 - Partnerships for the Goals
dc.title.fl_str_mv Benchmarking Table Extraction
Multimodal LLMs vs Traditional OCR
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_c94f
dirty 0
eu_rights_str_mv openAccess
format conferenceObject
fulltext.url.fl_str_mv https://run.unl.pt/bitstreams/c08913ad-7188-414b-942b-e79665eb5445/download
id run_72f45ddb58bcdfd21fbcc350e994fb86
identifier.url.fl_str_mv http://hdl.handle.net/10362/185720
instacron_str unl
institution Universidade Nova de Lisboa
instname_str Universidade Nova de Lisboa
language eng
network_acronym_str run
network_name_str Repositório Institucional da UNL
oai_identifier_str oai:run.unl.pt:10362/185720
organization_str_mv urn:organizationAcronym:unl
person_str_mv Nunes, Guilherme
Rolla, Vitor
Pereira, Duarte
Alves, Vasco
Carreiro, André V.
Baptista, Márcia L.
publishDate 2025
publisher.none.fl_str_mv Association for Computational Linguistics (ACL)
reponame_str Repositório Institucional da UNL
repository_id_str urn:repositoryAcronym:run
service_str_mv urn:repositoryAcronym:run
spelling ISBN (IsPartOf): 979-8-89176-286-2; PURE (IsPartOf): 123207811; PURE UUID (IsPartOf): f4b8c1d3-b54b-4f99-ae35-98da25f08642; hosting institution: RUN (mailto:run@unl.pt); issued: 2025-08; file size: 824111 bytes; resource type: other research product, conference object; rights: open access
status SINGLETON
title Benchmarking Table Extraction
topic SDG 4 - Quality Education
SDG 9 - Industry, Innovation, and Infrastructure
SDG 16 - Peace, Justice and Strong Institutions
SDG 17 - Partnerships for the Goals
url http://hdl.handle.net/10362/185720
visible 1