Publication
Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR
| Abstract: | This paper compares two approaches for table extraction from images: deep learning computer vision and Multimodal Large Language Models (MLLMs). Computer vision models for table extraction, such as the Table Transformer model (TATR), have improved the extraction of complex table layouts by combining deep learning for precise structure recognition with traditional Optical Character Recognition (OCR). Conversely, MLLMs, which process both text and image inputs, offer a novel approach that could bypass the limitations of TATR plus OCR methods altogether. Models such as GPT-4o, Phi-3 Vision, and Granite Vision 3.2 demonstrate the potential of MLLMs to analyze and interpret table images directly, offering enhanced accuracy and robust extraction capabilities. Both approaches are evaluated with Grid Table Similarity (GriTS), a state-of-the-art metric that gives nuanced insight into how well each recovers table structure and cell text content. Using the PubTables-1M dataset, a comprehensive and widely used benchmark in the field, this study highlights the strengths and limitations of each approach, setting the stage for future innovations in table extraction technologies. Deep learning computer vision techniques still have a slight edge when extracting table structural layout, but in terms of cell text content, MLLMs perform far better. |
|---|---|
| Main author: | Nunes, Guilherme |
| Other authors: | Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L. |
| Subject: | SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
| _version_ | 1868415345324195840 |
|---|---|
| author | Nunes, Guilherme |
| author2 | Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L. |
| author2_role | author; author; author; author; author |
| author_facet | Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L. |
| author_role | author |
| contributor_name_str_mv | NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN |
| country_str | PT |
| creators_json_txt | [{"Person.name":"Nunes, Guilherme"},{"Person.name":"Rolla, Vitor"},{"Person.name":"Pereira, Duarte"},{"Person.name":"Alves, Vasco"},{"Person.name":"Carreiro, André V."},{"Person.name":"Baptista, Márcia L."}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN |
| datacite.creators.creator.creatorName.fl_str_mv | Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L. |
| datacite.date.Accepted.fl_str_mv | 2025-08-01T00:00:00Z |
| datacite.date.available.fl_str_mv | 2025-07-29T21:28:21Z |
| datacite.date.embargoed.fl_str_mv | 2025-07-29T21:28:21Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| datacite.subjects.subject.fl_str_mv | SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| datacite.titles.title.fl_str_mv | Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR |
| dc.contributor.none.fl_str_mv | NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN |
| dc.creator.none.fl_str_mv | Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L. |
| dc.date.Accepted.fl_str_mv | 2025-08-01T00:00:00Z |
| dc.date.available.fl_str_mv | 2025-07-29T21:28:21Z |
| dc.date.embargoed.fl_str_mv | 2025-07-29T21:28:21Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10362/185720 |
| dc.language.none.fl_str_mv | eng |
| dc.publisher.none.fl_str_mv | Association for Computational Linguistics (ACL) |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.subject.none.fl_str_mv | SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| dc.title.fl_str_mv | Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_c94f |
| dirty | 0 |
| eu_rights_str_mv | openAccess |
| format | conferenceObject |
| fulltext.url.fl_str_mv | https://run.unl.pt/bitstreams/c08913ad-7188-414b-942b-e79665eb5445/download |
| id | run_72f45ddb58bcdfd21fbcc350e994fb86 |
| identifier.url.fl_str_mv | http://hdl.handle.net/10362/185720 |
| instacron_str | unl |
| institution | Universidade Nova de Lisboa |
| instname_str | Universidade Nova de Lisboa |
| language | eng |
| network_acronym_str | run |
| network_name_str | Repositório Institucional da UNL |
| oai_identifier_str | oai:run.unl.pt:10362/185720 |
| organization_str_mv | urn:organizationAcronym:unl |
| person_str_mv | Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L. |
| publishDate | 2025 |
| publisher.none.fl_str_mv | Association for Computational Linguistics (ACL) |
| reponame_str | Repositório Institucional da UNL |
| repository_id_str | urn:repositoryAcronym:run |
| service_str_mv | urn:repositoryAcronym:run |
| spelling | eng; Association for Computational Linguistics (ACL); application/pdf; Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR; Nunes, Guilherme; Rolla, Vitor; Pereira, Duarte; Alves, Vasco; Carreiro, André V.; Baptista, Márcia L.; NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN (HostingInstitution, Organizational; e-mail: run@unl.pt); ISBN (IsPartOf): 979-8-89176-286-2; URN (IsPartOf): PURE: 123207811; URN (IsPartOf): PURE UUID: f4b8c1d3-b54b-4f99-ae35-98da25f08642; 2025-07-29T21:28:21Z; 2025-08; 2025-08-01T00:00:00Z; Handle: http://hdl.handle.net/10362/185720; http://purl.org/coar/access_right/c_abf2 (open access); SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals; 824111 bytes; other research product; http://purl.org/coar/resource_type/c_c94f (conference object); application/pdf; fulltext: https://run.unl.pt/bitstreams/c08913ad-7188-414b-942b-e79665eb5445/download |
| spellingShingle | Benchmarking Table Extraction; Nunes, Guilherme; SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| status | SINGLETON |
| subject.fl_str_mv | SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| title | Benchmarking Table Extraction |
| title_full | Benchmarking Table Extraction |
| title_fullStr | Benchmarking Table Extraction |
| title_full_unstemmed | Benchmarking Table Extraction |
| title_short | Benchmarking Table Extraction |
| title_sort | Benchmarking Table Extraction |
| topic | SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| topic_facet | SDG 4 - Quality Education; SDG 9 - Industry, Innovation, and Infrastructure; SDG 16 - Peace, Justice and Strong Institutions; SDG 17 - Partnerships for the Goals |
| url | http://hdl.handle.net/10362/185720 |
| visible | 1 |