How to Convert PDF to Excel: Tables, Bank Statements, and Best-Effort Extraction

Bank statements, invoice line items, GST returns: anything tabular in a PDF that you need to analyse in Excel. The right tool depends on whether the file is a clean text PDF or a scanned image. Below: five methods, when each works, and when to give up and re-key by hand.

Converting PDF tables to Excel is the most thankless task in office work. Anyone who's tried it on a bank statement knows: the numbers come out shifted by a column, dates parse as text, totals don't add up. The reason is fundamental — PDF doesn't store “tables”; it stores text positioned at X/Y coordinates. Any converter has to guess which positions form a row and which form a column. Some PDFs make this easy. Others, impossible.

This guide covers the five methods that actually work, notes which PDF types each one handles cleanly, and ends with a manual fallback for when none of them do.

Why PDF tables don't convert cleanly

A PDF table that looks like a spreadsheet is actually:

A grid of text fragments at specific (x, y) positions
No semantic info that says “this is column 2, row 5”
Sometimes thin lines drawn separately as graphics, sometimes no lines at all

A converter clusters fragments by Y position to find rows, then by X position to find columns. This works perfectly when columns are uniformly aligned (clean spreadsheet exports, modern bank statements) and falls apart when widths vary, cells span multiple lines, or column headers wrap.
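
The clustering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular converter: the fragment shape `(x, y, text)`, the function names, and the tolerance values are all assumptions made for the example, and Y is assumed to increase down the page.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), with y measured from the top of the page.
Fragment = Tuple[float, float, str]


def cluster(values: List[float], tolerance: float) -> List[float]:
    """Collapse nearby coordinates into one representative value per cluster."""
    centers: List[float] = []
    for v in sorted(values):
        if centers and v - centers[-1] <= tolerance:
            continue  # close enough to the current cluster's first value
        centers.append(v)
    return centers


def nearest(centers: List[float], v: float) -> int:
    """Index of the cluster representative closest to v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))


def fragments_to_grid(
    fragments: List[Fragment], row_tol: float = 3.0, col_tol: float = 10.0
) -> List[List[str]]:
    """Group fragments into rows by Y, then into columns by X."""
    rows = cluster([y for _, y, _ in fragments], row_tol)
    cols = cluster([x for x, _, _ in fragments], col_tol)
    grid = [["" for _ in cols] for _ in rows]
    for x, y, text in fragments:
        r, c = nearest(rows, y), nearest(cols, x)
        grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid


# Made-up fragments resembling three lines of a bank statement.
fragments = [
    (50, 100, "Date"), (150, 100, "Description"), (400, 100, "Amount"),
    (50, 115.4, "01/04/2024"), (150, 115, "ATM withdrawal"), (401, 115.2, "2,000.00"),
    (50, 130, "02/04/2024"), (150, 130, "UPI transfer"), (399.5, 130, "450.00"),
]
grid = fragments_to_grid(fragments)
for row in grid:
    print(row)
```

With these aligned fragments the sketch yields a clean 3×3 grid. Add a description that wraps onto a second line (say, a fragment at y = 122) and it becomes its own row with only one cell filled, which is the kind of failure described above.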

Method 1: Browser-side table extraction (best for privacy)

Use our free PDF to Excel tool. Drop the PDF, click Convert. Each page becomes a sheet in the output XLSX. Each detected row maps to one row in Excel.

Best on: bank statements, invoice line items, simple data tables, PDFs originally exported from Excel/Google Sheets. Won't work well on: tables with merged cells, multi-row headers, scanned PDFs (no text layer).

Privacy: the file is never uploaded; conversion happens in your browser. For sensitive financial data (bank statements, salary slips, GSTR-1 returns), this is the right choice.

Method 2: Excel's built-in “From PDF” (Power Query)

Excel 2021 and Microsoft 365 include a native importer: Data → Get Data → From File → From PDF. It opens the Power Query Navigator, which lists every detected table; you pick the ones you want.

Quality is good for clean tabular PDFs, and the Power Query integration lets you clean the data (remove blank rows, fix data types, merge headers) before loading it into the workbook. Limitation: requires Excel 2021+ or Microsoft 365 (₹659/month). Older Excel doesn't have it.

Method 3: Tabula (free desktop app, gold standard)

Tabula is a free open-source tool specifically designed for extracting tables from PDFs. Available for Mac, Windows, Linux. Drop a PDF, draw a rectangle around the table you want, click Extract. Output is CSV or TSV.

Quality is consistently the best for difficult PDFs because you tell it where the table is, so there is no guessing. Good for one-off complex extractions where browser tools fail. Trade-off: manual selection per table, so it doesn't scale for batch jobs. Free.

Method 4: Adobe Acrobat Pro (paid, high fidelity)

File → Export to → Spreadsheet → Microsoft Excel Workbook. Acrobat's table detection is excellent — handles merged cells and complex layouts better than most. ₹1,475/month subscription. Worth it for finance teams processing PDF reports daily; overkill for occasional use.

Method 5: Bank statements specifically

Bank statements deserve special mention because they're both the most-asked use case AND the most consistent format. Major Indian banks (SBI, HDFC, ICICI, Axis, Kotak) all generate column-aligned PDF statements that browser tools handle well.

The catch: many banks password-protect their statements (the password is often the last 6 digits of the account number, your date of birth, or similar). If yours is encrypted, unlock it first using our PDF Unlock tool, then run the unlocked file through PDF to Excel.

Method 6: When all else fails — re-key by hand

If the PDF is a scan, has merged cells everywhere, or uses a layout that confuses every tool — sometimes the fastest path is just typing it. For a 50-row table, manual entry takes 20-30 minutes. Beats spending 2 hours fighting a converter that produces 80%-correct output you have to verify anyway.

Quality comparison

Method	Privacy	Quality	Cost
Pyrelo (browser)	High	Best-effort	Free
Excel built-in (Power Query)	High	Good	Excel 2021 or M365 license
Tabula (desktop)	High	Excellent (manual)	Free
Adobe Acrobat Pro	High	Excellent	₹1,475/mo
Online uploaders	Cloud upload	Variable	Freemium

Tips for cleaner output, regardless of tool

Unlock encrypted PDFs first using PDF Unlock. Most extractors fail on password-protected files.
If a table has multiple sections per page, split the PDF first using Pyrelo's Split PDF tool, then extract each section separately.
For scanned bank statements, OCR first using Image to Text (OCR). Tesseract handles English bank statements well; for Hindi/regional language statements, accuracy drops.
Always verify totals. Sum a few columns in Excel and cross-check against the PDF. A 5-minute audit catches column-shift errors that would take an hour to debug downstream.
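
The totals check in the last tip can also be scripted once the sheet is exported to CSV or read into a list of rows. The sketch below is one way to do it with the standard library only. The column order (Date, Description, Debit, Credit, Balance), the function names, and the sample rows are assumptions for illustration; real statements differ by bank.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def parse_amount(cell: str) -> Optional[Decimal]:
    """Turn '1,23,456.78' into Decimal('123456.78'); None for blanks or junk."""
    cleaned = cell.replace(",", "").strip()
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def cell(row: List[str], index: int) -> str:
    """Read a cell, treating a too-short row as having empty cells."""
    return row[index] if index < len(row) else ""


def column_total(rows: List[List[str]], index: int) -> Decimal:
    """Sum every parseable amount in one column."""
    total = Decimal("0")
    for row in rows:
        value = parse_amount(cell(row, index))
        if value is not None:
            total += value
    return total


def find_balance_breaks(
    rows: List[List[str]], debit: int = 2, credit: int = 3, balance: int = 4
) -> List[int]:
    """Indexes of rows whose balance does not follow from the previous row."""
    breaks: List[int] = []
    previous: Optional[Decimal] = None
    for i, row in enumerate(rows):
        current = parse_amount(cell(row, balance))
        out = parse_amount(cell(row, debit)) or Decimal("0")
        into = parse_amount(cell(row, credit)) or Decimal("0")
        if current is None:
            breaks.append(i)  # balance cell missing or unparseable
        elif previous is not None and previous - out + into != current:
            breaks.append(i)  # arithmetic does not chain from the prior row
        if current is not None:
            previous = current
    return breaks


# Made-up rows; in the last one a debit has landed in the credit column.
rows = [
    ["01/04/2024", "Opening balance", "", "", "10,000.00"],
    ["02/04/2024", "ATM withdrawal", "2,000.00", "", "8,000.00"],
    ["03/04/2024", "Salary", "", "50,000.00", "58,000.00"],
    ["04/04/2024", "UPI transfer", "", "450.00", "57,550.00"],
]
print("Debit total:", column_total(rows, 2))
print("Rows to inspect:", find_balance_breaks(rows))
```

Checking that each balance follows from the previous one catches a shifted cell at the exact row where it happens, whereas a column total only tells you that something is wrong somewhere. `Decimal` is used instead of `float` so that paise-level comparisons are exact.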

Frequently asked

Will the conversion preserve formulas? No. PDFs don't store formulas, only the rendered values. The output is values-only; you'll need to re-create any formulas in Excel.

Does the order of rows / columns get preserved? Generally yes, in tools that group by Y/X coordinate. Browser-based tools may occasionally interleave rows if the source PDF has overlapping text positions (unusual). Tabula and Acrobat are most reliable here.

Can I convert multiple PDFs in one go? Browser tool: one at a time. Tabula: yes via batch script. Power Query: yes by repeating the import. For large-scale automation, write a Python script using pdfplumber or tabula-py.
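
For the scripted route, a batch job with pdfplumber might look like the sketch below. It assumes pdfplumber is installed (`pip install pdfplumber`) and that the PDFs have a text layer; the folder names and the helper functions `clean_table`, `extract_tables`, and `convert_folder` are hypothetical names chosen for this example. It writes one CSV per detected table, which Excel opens directly, rather than building an XLSX.

```python
import csv
from pathlib import Path
from typing import List, Optional

Table = List[List[Optional[str]]]


def clean_table(table: Table) -> List[List[str]]:
    """Replace None cells with '' and drop rows that are entirely empty."""
    cleaned: List[List[str]] = []
    for row in table:
        cells = [(c or "").strip() for c in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def extract_tables(pdf_path: Path) -> List[Table]:
    """Collect every table pdfplumber detects, across all pages of one PDF."""
    import pdfplumber  # third-party; imported here so the helpers work without it

    tables: List[Table] = []
    with pdfplumber.open(str(pdf_path)) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


def convert_folder(src: Path, dest: Path) -> List[Path]:
    """Write one CSV per detected table for every PDF in src."""
    dest.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for pdf_path in sorted(src.glob("*.pdf")):
        for n, table in enumerate(extract_tables(pdf_path), start=1):
            out = dest / f"{pdf_path.stem}_table{n}.csv"
            with out.open("w", newline="", encoding="utf-8") as fh:
                csv.writer(fh).writerows(clean_table(table))
            written.append(out)
    return written


# Usage (hypothetical folder names):
#   convert_folder(Path("statements"), Path("csv_out"))
```

The same caveats apply as with the other tools: detection is best-effort, encrypted files need unlocking first, and the output should be checked against the PDF totals before anyone relies on it.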