Extended I/O Framework: Readers/Writers for Parquet#2229

Closed
sayedkeika wants to merge 1 commit into apache:main from sayedkeika:main

@sayedkeika

Overview

This pull request adds support for the Parquet file format. The implementation includes readers and writers that process Parquet files either sequentially (single-threaded) or in parallel (multi-threaded).

Details

  • Single-threaded reader and writer for smaller datasets
  • Multi-threaded reader and writer for larger datasets
  • Component tests for both sequential and parallel operations that cover different schemas
  • Detailed documentation

@sayedkeika changed the title from "LDE Project - Extended I/O Framework: Readers/Writers for Parquet" to "Extended I/O Framework: Readers/Writers for Parquet" on Feb 16, 2025
@mboehm7 closed this in d19f505 on Apr 18, 2025
@github-project-automation (bot) moved this from In Progress to Done in SystemDS PR Queue on Apr 18, 2025
@mboehm7
Contributor

mboehm7 commented Apr 18, 2025

LGTM - thanks for the patch @sayedkeika. During the merge I resolved the merge conflicts, fixed the warnings and formatting (tabs over spaces), added additional tests (sparse data), and left a FIXME (for removing the ExampleParquetWriter). Additionally, I fixed the parallel write task to call the sequential write instead of invoking the parallel writer again (the original code only worked because the small data size resulted in a single part file).
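
For readers following the last point, here is a minimal sketch of the pattern described: a parallel writer that splits the rows into part files and has each task call the sequential write for its own slice, rather than calling the parallel writer again. This is not the merged SystemDS code. The class `PartFileWriteSketch`, the methods `writeSequential` and `writeParallel`, the `part-<i>` file naming, and the use of plain text lines in place of Parquet are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: plain text files stand in for Parquet part files.
class PartFileWriteSketch {

	// Hypothetical sequential write: one slice of rows goes to one file.
	static void writeSequential(Path file, List<String> rows) throws IOException {
		Files.write(file, rows);
	}

	// Hypothetical parallel write: one task per part file (numParts must be > 0).
	static List<Path> writeParallel(Path dir, List<String> rows, int numParts) throws Exception {
		Files.createDirectories(dir);
		ExecutorService pool = Executors.newFixedThreadPool(numParts);
		try {
			// Ceiling division so that every row lands in exactly one part.
			int chunk = (rows.size() + numParts - 1) / numParts;
			List<Future<Path>> futures = new ArrayList<>();
			for (int i = 0; i < numParts; i++) {
				int from = Math.min(i * chunk, rows.size());
				int to = Math.min(from + chunk, rows.size());
				List<String> slice = rows.subList(from, to);
				Path part = dir.resolve("part-" + i);
				futures.add(pool.submit(() -> {
					// Each task delegates to the sequential write for its slice.
					// Calling writeParallel here instead would re-split the slice.
					writeSequential(part, slice);
					return part;
				}));
			}
			List<Path> parts = new ArrayList<>();
			for (Future<Path> f : futures)
				parts.add(f.get()); // rethrows task failures
			return parts;
		}
		finally {
			pool.shutdown();
		}
	}
}
```

With a single part file the two variants can look equivalent from the outside, which is consistent with the remark that the issue did not surface on small inputs; tests that force several part files are one way to exercise the task path.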
