Adding a Bank Parser

To add a new bank to Ledger, implement a parser class. Each parser converts a bank-specific statement format (PDF or CSV) into a list of RawTransaction objects.

Steps

1. Create the parser file

Add a new file in etl/parsers/. Name it <bank>_<format>.py:

etl/parsers/my_bank_csv.py

2. Implement the BaseParser

Every parser must extend BaseParser from etl/parsers/base.py:

from abc import ABC, abstractmethod
from pathlib import Path

from etl.models import RawTransaction


class BaseParser(ABC):
    @abstractmethod
    def parse(self, file_path: Path) -> list[RawTransaction]:
        ...

    @property
    @abstractmethod
    def source_type(self) -> str:
        ...
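One detail of this base class is easy to miss: a subclass can satisfy the abstract source_type property with a plain class attribute, which is what the example parser in this guide does. The sketch below illustrates that behaviour with a stand-in class (SketchBaseParser and the two subclasses are hypothetical names, not part of etl/parsers/base.py) and shows that a subclass which omits parse() cannot be instantiated.

```python
from abc import ABC, abstractmethod


class SketchBaseParser(ABC):
    """Stand-in with the same shape as BaseParser, for illustration only."""

    @abstractmethod
    def parse(self, file_path):
        ...

    @property
    @abstractmethod
    def source_type(self) -> str:
        ...


class CompleteParser(SketchBaseParser):
    source_type = "demo"  # a plain class attribute overrides the abstract property

    def parse(self, file_path):
        return []


class IncompleteParser(SketchBaseParser):
    source_type = "demo"  # parse() is missing, so the class stays abstract


parser = CompleteParser()  # instantiates normally

try:
    IncompleteParser()
    instantiation_error = None
except TypeError as exc:
    instantiation_error = exc  # abstract classes cannot be instantiated
```

If a new parser fails at startup with a TypeError about abstract methods, a missing parse() or source_type is the likely cause.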

Here is a minimal CSV parser:

import csv
from pathlib import Path

from etl.models import RawTransaction
from etl.parsers.base import BaseParser


class MyBankCSVParser(BaseParser):
    source_type = "mybank"

    def parse(self, file_path: Path) -> list[RawTransaction]:
        transactions = []
        with open(file_path, newline="", encoding="utf-8-sig") as f:
            reader = csv.DictReader(f)
            for row in reader:
                txn = self._build_transaction(row, file_path)
                if txn:
                    transactions.append(txn)
        return transactions

    def _build_transaction(self, row: dict, file_path: Path) -> RawTransaction | None:
        # Parse date -- convert to YYYY-MM-DD format
        date = self._parse_date(row.get("Date", ""))
        if not date:
            return None

        description = row.get("Description", "").strip()
        if not description:
            return None

        # Parse amount -- positive = income, negative = expense
        amount = float(row.get("Amount", "0").replace(",", ""))

        return RawTransaction(
            date=date,
            description=description,
            amount=amount,
            currency="AUD",
            source_type=self.source_type,
            source_file=str(file_path),
            raw_data=dict(row),  # Store the full row for auditing
        )

    def _parse_date(self, s: str) -> str:
        """Convert DD/MM/YYYY to YYYY-MM-DD."""
        s = s.strip()
        if not s:
            return ""
        parts = s.split("/")
        if len(parts) == 3:
            day, month, year = parts
            return f"{year}-{month.zfill(2)}-{day.zfill(2)}"
        return ""
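The amount parsing in the minimal parser calls float() directly, so a blank or oddly formatted Amount cell raises ValueError. Bank exports vary, and a more defensive helper may be worth having. The sketch below is one possible approach: parse_amount is a hypothetical helper rather than part of the codebase, and the formats it handles (thousands separators, a dollar sign, accounting-style parentheses) are assumptions about what a bank might emit.

```python
from typing import Optional


def parse_amount(raw: Optional[str]) -> Optional[float]:
    """Parse a signed amount string, or return None if it is blank or malformed."""
    if raw is None:
        return None
    text = raw.strip().replace(",", "").replace("$", "")
    if not text:
        return None
    # Accounting notation: "(12.50)" means -12.50
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text)
    except ValueError:
        return None
    return -value if negative else value
```

A parser could call this from _build_transaction and skip the row (or log it) when the result is None, instead of aborting the whole file.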

3. The RawTransaction model

Every parser produces RawTransaction objects (defined in etl/models.py):

from dataclasses import dataclass, field


@dataclass
class RawTransaction:
    date: str                                     # YYYY-MM-DD (required)
    description: str                              # Transaction description (required)
    amount: float                                 # Positive = income, negative = expense (required)
    currency: str = "AUD"
    original_amount: float | None = None          # For foreign currency transactions
    original_currency: str | None = None
    fee: float = 0.0                              # Transaction fees (e.g. PayPal fees)
    reference_id: str | None = None               # Unique ID from the source (used for dedup)
    source_type: str = ""                         # Must match your parser's source_type
    source_file: str = ""                         # File path (set automatically)
    raw_data: dict = field(default_factory=dict)  # Original row data for auditing

Key points:

  • date must be in YYYY-MM-DD format
  • amount should be positive for income, negative for expenses
  • raw_data should contain the original row/record — it is stored in the raw_imports table for auditing and balance extraction
  • reference_id is used for dedup hashing if set (important for sources like PayPal that have unique transaction IDs)
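Because date is stored as a plain string, nothing in the model itself enforces the YYYY-MM-DD requirement. Note that the _parse_date helper in the minimal parser only rearranges the three parts, so an input such as 31/13/2024 would come out as 2024-13-31. A small check like the following could catch that before a transaction is returned; is_iso_date is a hypothetical helper shown as a sketch.

```python
from datetime import datetime


def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date in YYYY-MM-DD form."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime accepts unpadded fields such as "2024-3-5", so compare the round trip
    return parsed.strftime("%Y-%m-%d") == value
```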

4. Register the parser in cli.py

Add your parser to the PARSERS dict in etl/cli.py:

from etl.parsers.my_bank_csv import MyBankCSVParser

PARSERS = {
    # ... existing parsers ...
    "mybank": (MyBankCSVParser, "mybank", "*.csv"),
}

The tuple is (ParserClass, staging_subdirectory, glob_pattern).
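To make the role of the three tuple fields concrete, here is a rough sketch of how such an entry could be used to locate statement files. How etl/cli.py actually consumes the tuple is not shown in this guide, so treat this as an assumption; DemoParser, DEMO_PARSERS, and find_statement_files are hypothetical names.

```python
from pathlib import Path


class DemoParser:
    """Placeholder standing in for a real parser class."""

    def parse(self, file_path: Path) -> list:
        return []


# Same shape as a PARSERS entry: (ParserClass, staging_subdirectory, glob_pattern)
DEMO_PARSERS = {"mybank": (DemoParser, "mybank", "*.csv")}


def find_statement_files(source: str, staging_root: Path) -> list[Path]:
    """List the files under staging_root/<subdirectory> that match the glob."""
    _parser_cls, subdirectory, pattern = DEMO_PARSERS[source]
    return sorted((staging_root / subdirectory).glob(pattern))
```

Under this reading, a glob pattern of "*.csv" means a PDF dropped into the same staging directory would simply be ignored.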

Add a default account name:

ACCOUNT_NAMES = {
    # ... existing names ...
    "mybank": "My Bank",
}

5. Add the account to config

Add an entry to config/accounts.yaml:

- name: "My Bank"
  source_type: mybank
  currency: AUD
  account_type: checking

6. Create the staging directory

mkdir -p staging/mybank

7. Test

Drop a statement file into staging/mybank/ and run:

ledger ingest --source mybank --dry-run

Check that transactions are parsed correctly, then run without --dry-run.

Tips for PDF parsers

For PDF statements, use pdfplumber (already a dependency):

from pathlib import Path

import pdfplumber

# file_path is the Path passed to your parser's parse() method
with pdfplumber.open(file_path) as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        # Parse lines from text...

        # Or extract tables:
        tables = page.extract_tables()

PDF parsing is trickier than CSV because:

  • Statement layouts vary between banks and even between statement periods
  • You need to handle multi-line descriptions, page breaks, and headers
  • Balance columns help verify you have parsed amounts correctly
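The balance check mentioned in the last point can be sketched as follows. This assumes the statement lists transactions oldest first and that amounts and balances have already been extracted as strings; find_balance_mismatches is a hypothetical helper, and Decimal is used here to avoid float rounding noise in the comparison.

```python
from decimal import Decimal


def find_balance_mismatches(opening_balance, rows):
    """Return indexes of rows whose balance does not follow from the previous one.

    rows is a list of (amount, balance) string pairs in statement order.
    """
    mismatches = []
    previous = Decimal(opening_balance)
    for index, (amount, balance) in enumerate(rows):
        expected = previous + Decimal(amount)
        if expected != Decimal(balance):
            mismatches.append(index)
        # Continue from the printed balance so one bad row is reported once
        previous = Decimal(balance)
    return mismatches


rows = [
    ("-5.00", "95.00"),
    ("20.00", "115.00"),
    ("-10.00", "150.00"),  # does not follow from 115.00
    ("-1.00", "149.00"),
]
mismatches = find_balance_mismatches("100.00", rows)
```

A mismatch usually points to a misread amount, a flipped sign, or a transaction that was dropped at a page break.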

Look at etl/parsers/ing_pdf.py, etl/parsers/hsbc_pdf.py, or etl/parsers/coles_pdf.py for real-world examples.
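As a rough illustration of the multi-line description problem noted above, one common approach is to treat any line that starts with a date as a new transaction and append other lines to the previous one. The sketch below assumes DD/MM/YYYY dates at the start of each transaction line, which will not hold for every bank; merge_continuation_lines is a hypothetical helper, and the sample lines are made up.

```python
import re

# Assumes each transaction line starts with a DD/MM/YYYY date
DATE_PREFIX = re.compile(r"^\d{2}/\d{2}/\d{4}\b")


def merge_continuation_lines(lines):
    """Join lines that do not start with a date onto the preceding transaction line."""
    merged = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if DATE_PREFIX.match(stripped):
            merged.append(stripped)
        elif merged:
            merged[-1] = f"{merged[-1]} {stripped}"
        # Lines before the first dated line (e.g. column headers) are ignored
    return merged


sample = [
    "Date Description Amount",
    "01/02/2024 CARD PURCHASE 12.50",
    "SYDNEY AU",
    "",
    "03/02/2024 SALARY 2000.00",
]
merged = merge_continuation_lines(sample)
```

Page footers and repeated headers that appear after the first transaction would still be appended to the preceding line, so a real parser needs to filter those out first.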

Tips for dedup hashing

The default dedup hash uses date|description|amount. This works for most banks but can collide if you have two identical transactions on the same day (e.g. two $5.00 coffees at the same cafe).

If your bank provides a unique transaction ID, set reference_id on the RawTransaction and the normalizer will use it for hashing instead.

For banks with multiple accounts in the same format (like ING), include the source filename in the hash — see compute_dedup_hash() in etl/normalizer.py.
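The behaviour described in this section can be sketched as below. This is not the code in etl/normalizer.py: the function name, the use of SHA-256, the two-decimal amount formatting, and the way the filename is mixed in are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


def sketch_dedup_hash(date: str, description: str, amount: float,
                      reference_id: Optional[str] = None,
                      source_file: Optional[str] = None) -> str:
    """Hash a transaction's identifying fields (illustrative, not the real normalizer)."""
    if reference_id:
        key = reference_id  # a source-provided ID replaces the default key
    else:
        key = f"{date}|{description}|{amount:.2f}"
    if source_file:
        key = f"{source_file}|{key}"  # separates accounts that share a format
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Two identical coffees on the same day collide under the default key
coffee_a = sketch_dedup_hash("2024-03-01", "CAFE", -5.0)
coffee_b = sketch_dedup_hash("2024-03-01", "CAFE", -5.0)
```

With distinct reference_id values the two coffees hash differently, which is why setting reference_id is worthwhile whenever the bank supplies one.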

Adding category rules

After adding a parser, you will likely need to add regex rules in config/categories.yaml for the merchant names that appear in that bank’s statements. Different banks format merchant names differently, so the same purchase might need multiple patterns.
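The schema of config/categories.yaml is not shown in this guide, so the following only illustrates the matching idea in Python: several regex patterns map differently formatted merchant strings to one category. The rules, merchant strings, and category names are hypothetical examples.

```python
import re

# Hypothetical rules: (regex, category) pairs, checked in order
RULES = [
    (re.compile(r"WOOLWORTHS|WOOLIES", re.IGNORECASE), "Groceries"),
    (re.compile(r"\bUBER\s*\*?\s*EATS\b", re.IGNORECASE), "Dining"),
]


def categorize(description: str) -> str:
    """Return the category of the first matching rule, or a fallback."""
    for pattern, category in RULES:
        if pattern.search(description):
            return category
    return "Uncategorized"
```

Running a dry-run ingest and scanning for uncategorized descriptions is a quick way to find which patterns a new bank needs.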