Publication

Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR

Bibliographic details
Abstract:	This paper compares two approaches for table extraction from images: deep learning computer vision and Multimodal Large Language Models (MLLMs). Computer vision models for table extraction, such as the Table Transformer model (TATR), have enhanced the extraction of complex table structural layouts by leveraging deep learning for precise structural recognition combined with traditional Optical Character Recognition (OCR). Conversely, MLLMs, which process both text and image inputs, present a novel approach by potentially bypassing the limitations of TATR plus OCR methods altogether. Models such as GPT-4o, Phi-3 Vision, and Granite Vision 3.2 demonstrate the potential of MLLMs to analyze and interpret table images directly, offering enhanced accuracy and robust extraction capabilities. Both approaches are evaluated with Grid Table Similarity (GriTS), a state-of-the-art metric that provides nuanced insights into structural and text content effectiveness. Using the PubTables-1M dataset, a comprehensive and widely used benchmark in the field, this study highlights the strengths and limitations of each approach, setting the stage for future innovations in table extraction technologies. Deep learning computer vision techniques still have a slight edge in extracting table structural layout, but MLLMs are far better at extracting cell text content.
Main authors:	Nunes, Guilherme
Other authors:	Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.
Subject:	SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
Year:	2025
Country:	Portugal
Document type:	conference paper
Access type:	open access
Affiliated institution:	Universidade Nova de Lisboa
Language:	English
Source:	Repositório Institucional da UNL

_version_	1868415345324195840
author	Nunes, Guilherme
author2	Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.
author2_role	author author author author author
author_facet	Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.
author_role	author
contributor_name_str_mv	NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN
country_str	PT
creators_json_txt	[{"Person.name":"Nunes, Guilherme"},{"Person.name":"Rolla, Vitor"},{"Person.name":"Pereira, Duarte"},{"Person.name":"Alves, Vasco"},{"Person.name":"Carreiro, André V."},{"Person.name":"Baptista, Márcia L."}]
datacite.contributors.contributor.contributorName.fl_str_mv	NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN
datacite.creators.creator.creatorName.fl_str_mv	Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.
datacite.date.Accepted.fl_str_mv	2025-08-01T00:00:00Z
datacite.date.available.fl_str_mv	2025-07-29T21:28:21Z
datacite.date.embargoed.fl_str_mv	2025-07-29T21:28:21Z
datacite.rights.fl_str_mv	http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv	SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
datacite.titles.title.fl_str_mv	Benchmarking Table Extraction Multimodal LLMs vs Traditional OCR
dc.contributor.none.fl_str_mv	NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN
dc.creator.none.fl_str_mv	Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.
dc.date.Accepted.fl_str_mv	2025-08-01T00:00:00Z
dc.date.available.fl_str_mv	2025-07-29T21:28:21Z
dc.date.embargoed.fl_str_mv	2025-07-29T21:28:21Z
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	http://hdl.handle.net/10362/185720
dc.language.none.fl_str_mv	eng
dc.publisher.none.fl_str_mv	Association for Computational Linguistics (ACL)
dc.rights.none.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv	SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
dc.title.fl_str_mv	Benchmarking Table Extraction Multimodal LLMs vs Traditional OCR
dc.type.none.fl_str_mv	http://purl.org/coar/resource_type/c_c94f
dirty	0
eu_rights_str_mv	openAccess
format	conferenceObject
fulltext.url.fl_str_mv	https://run.unl.pt/bitstreams/c08913ad-7188-414b-942b-e79665eb5445/download
id	run_72f45ddb58bcdfd21fbcc350e994fb86
identifier.url.fl_str_mv	http://hdl.handle.net/10362/185720
instacron_str	unl
institution	Universidade Nova de Lisboa
instname_str	Universidade Nova de Lisboa
language	eng
network_acronym_str	run
network_name_str	Repositório Institucional da UNL
oai_identifier_str	oai:run.unl.pt:10362/185720
organization_str_mv	urn:organizationAcronym:unl
person_str_mv	Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.
publishDate	2025
publisher.none.fl_str_mv	Association for Computational Linguistics (ACL)
reponame_str	Repositório Institucional da UNL
repository_id_str	urn:repositoryAcronym:run
service_str_mv	urn:repositoryAcronym:run
spelling	eng; Association for Computational Linguistics (ACL); application/pdf; Benchmarking Table Extraction; Subtitle: Multimodal LLMs vs Traditional OCR; Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.; NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; HostingInstitution (Organizational): RUN; e-mail: run@unl.pt; ISBN (IsPartOf): 979-8-89176-286-2; URN (IsPartOf): PURE: 123207811; URN (IsPartOf): PURE UUID: f4b8c1d3-b54b-4f99-ae35-98da25f08642; 2025-07-29T21:28:21Z; 2025-08; 2025-08-01T00:00:00Z; Handle: http://hdl.handle.net/10362/185720; http://purl.org/coar/access_right/c_abf2 (open access); SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals; 824111 bytes; other research product; http://purl.org/coar/resource_type/c_c94f (conference object); application/pdf; fulltext: https://run.unl.pt/bitstreams/c08913ad-7188-414b-942b-e79665eb5445/download
spellingShingle	Benchmarking Table Extraction; Nunes, Guilherme; SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
status	SINGLETON
subject.fl_str_mv	SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
title	Benchmarking Table Extraction
title_full	Benchmarking Table Extraction
title_fullStr	Benchmarking Table Extraction
title_full_unstemmed	Benchmarking Table Extraction
title_short	Benchmarking Table Extraction
title_sort	Benchmarking Table Extraction
topic	SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
topic_facet	SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals
url	http://hdl.handle.net/10362/185720
visible	1
