# Adding a Bank Parser

Ledger supports adding new banks by implementing a parser class. Each parser converts a bank-specific statement format (PDF or CSV) into a list of `RawTransaction` objects.
## Steps

### 1. Create the parser file

Add a new file in `etl/parsers/`. Name it `<bank>_<format>.py`:

```
etl/parsers/my_bank_csv.py
```

### 2. Implement the `BaseParser`
Every parser must extend `BaseParser` from `etl/parsers/base.py`:
```python
from abc import ABC, abstractmethod
from pathlib import Path

from etl.models import RawTransaction


class BaseParser(ABC):
    @abstractmethod
    def parse(self, file_path: Path) -> list[RawTransaction]: ...

    @property
    @abstractmethod
    def source_type(self) -> str: ...
```

Here is a minimal CSV parser:
```python
import csv
from pathlib import Path

from etl.models import RawTransaction
from etl.parsers.base import BaseParser


class MyBankCSVParser(BaseParser):
    source_type = "mybank"

    def parse(self, file_path: Path) -> list[RawTransaction]:
        transactions = []
        with open(file_path, newline="", encoding="utf-8-sig") as f:
            reader = csv.DictReader(f)
            for row in reader:
                txn = self._build_transaction(row, file_path)
                if txn:
                    transactions.append(txn)
        return transactions

    def _build_transaction(self, row: dict, file_path: Path) -> RawTransaction | None:
        # Parse date -- convert to YYYY-MM-DD format
        date = self._parse_date(row.get("Date", ""))
        if not date:
            return None

        description = row.get("Description", "").strip()
        if not description:
            return None

        # Parse amount -- positive = income, negative = expense
        amount = float(row.get("Amount", "0").replace(",", ""))

        return RawTransaction(
            date=date,
            description=description,
            amount=amount,
            currency="AUD",
            source_type=self.source_type,
            source_file=str(file_path),
            raw_data=dict(row),  # Store the full row for auditing
        )

    def _parse_date(self, s: str) -> str:
        """Convert DD/MM/YYYY to YYYY-MM-DD."""
        s = s.strip()
        if not s:
            return ""
        parts = s.split("/")
        if len(parts) == 3:
            day, month, year = parts
            return f"{year}-{month.zfill(2)}-{day.zfill(2)}"
        return ""
```

### 3. The `RawTransaction` model
Every parser produces `RawTransaction` objects (defined in `etl/models.py`):
```python
@dataclass
class RawTransaction:
    date: str            # YYYY-MM-DD (required)
    description: str     # Transaction description (required)
    amount: float        # Positive = income, negative = expense (required)
    currency: str = "AUD"
    original_amount: float | None = None  # For foreign currency transactions
    original_currency: str | None = None
    fee: float = 0.0                      # Transaction fees (e.g. PayPal fees)
    reference_id: str | None = None       # Unique ID from the source (used for dedup)
    source_type: str = ""                 # Must match your parser's source_type
    source_file: str = ""                 # File path (set automatically)
    raw_data: dict = field(default_factory=dict)  # Original row data for auditing
```

Key points:

- `date` must be in `YYYY-MM-DD` format
- `amount` should be positive for income, negative for expenses
- `raw_data` should contain the original row/record; it is stored in the `raw_imports` table for auditing and balance extraction
- `reference_id` is used for dedup hashing if set (important for sources like PayPal that have unique transaction IDs)
### 4. Register the parser in `cli.py`

Add your parser to the `PARSERS` dict in `etl/cli.py`:
```python
from etl.parsers.my_bank_csv import MyBankCSVParser

PARSERS = {
    # ... existing parsers ...
    "mybank": (MyBankCSVParser, "mybank", "*.csv"),
}
```

The tuple is `(ParserClass, staging_subdirectory, glob_pattern)`.
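Roughly, the CLI uses the subdirectory and glob pattern to find input files for a source. The sketch below is illustrative only (the real loop lives in `etl/cli.py`, and the function name and staging-root handling here are assumptions):

```python
from pathlib import Path

# Illustrative registry entry; the real PARSERS dict maps to a parser class
PARSERS = {
    "mybank": ("MyBankCSVParser", "mybank", "*.csv"),
}


def files_for_source(source: str, staging_root: Path) -> list[Path]:
    """Hypothetical helper: resolve a source's staging files via its
    registered subdirectory and glob pattern."""
    parser_cls, subdir, pattern = PARSERS[source]
    return sorted((staging_root / subdir).glob(pattern))
```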
Add a default account name:
```python
ACCOUNT_NAMES = {
    # ... existing names ...
    "mybank": "My Bank",
}
```

### 5. Add the account to config

Add an entry to `config/accounts.yaml`:

```yaml
- name: "My Bank"
  source_type: mybank
  currency: AUD
  account_type: checking
```

### 6. Create the staging directory

```bash
mkdir -p staging/mybank
```

### 7. Test
Drop a statement file into `staging/mybank/` and run:

```bash
ledger ingest --source mybank --dry-run
```

Check that transactions are parsed correctly, then run without `--dry-run`.
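Before relying on a dry run alone, it helps to unit-test the fiddly pieces, date and amount parsing, in isolation. A sketch using standalone copies of the helper logic from the minimal parser above (the function names here are local to the test, not part of the project API):

```python
def parse_date(s: str) -> str:
    """Same DD/MM/YYYY -> YYYY-MM-DD logic as _parse_date above."""
    s = s.strip()
    if not s:
        return ""
    parts = s.split("/")
    if len(parts) == 3:
        day, month, year = parts
        return f"{year}-{month.zfill(2)}-{day.zfill(2)}"
    return ""


def parse_amount(s: str) -> float:
    """Strip thousands separators before converting, e.g. '1,234.56'."""
    return float(s.replace(",", ""))


# Exercise the edge cases a bank is likely to throw at you
assert parse_date("5/3/2024") == "2024-03-05"   # single-digit day/month
assert parse_date("not a date") == ""           # malformed rows are skipped
assert parse_amount("-1,234.56") == -1234.56    # negative with separator
```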
## Tips for PDF parsers

For PDF statements, use `pdfplumber` (already a dependency):
```python
import pdfplumber
from pathlib import Path

with pdfplumber.open(file_path) as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        # Parse lines from text...

        # Or extract tables:
        tables = page.extract_tables()
```

PDF parsing is trickier than CSV because:
- Statement layouts vary between banks and even between statement periods
- You need to handle multi-line descriptions, page breaks, and headers
- Balance columns help verify you have parsed amounts correctly
Look at `etl/parsers/ing_pdf.py`, `etl/parsers/hsbc_pdf.py`, or `etl/parsers/coles_pdf.py` for real-world examples.
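Once `extract_text()` gives you page text, a common pattern is to match transaction lines with a regex and fold unmatched lines into the previous transaction's description. A minimal sketch, assuming a hypothetical date/description/amount column layout (real statements need their own pattern):

```python
import re

# Matches lines like "05/03/2024 WOOLWORTHS METRO -23.50".
# The layout is hypothetical; adapt the pattern to your bank's statement.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def parse_statement_text(text: str) -> list[dict]:
    """Parse extracted page text into transaction dicts; lines that don't
    match are treated as continuations of the previous description."""
    transactions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            date, desc, amount = m.groups()
            transactions.append({
                "date": date,
                "description": desc,
                "amount": float(amount.replace(",", "")),
            })
        elif transactions:
            # Multi-line description: append to the previous transaction
            transactions[-1]["description"] += " " + line
    return transactions
```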
## Tips for dedup hashing
The default dedup hash uses `date|description|amount`. This works for most banks but can collide if you have two identical transactions on the same day (e.g. two $5.00 coffees at the same cafe).

If your bank provides a unique transaction ID, set `reference_id` on the `RawTransaction` and the normalizer will use it for hashing instead.

For banks with multiple accounts in the same format (like ING), include the source filename in the hash; see `compute_dedup_hash()` in `etl/normalizer.py`.
## Adding category rules

After adding a parser, you will likely need to add regex rules in `config/categories.yaml` for the merchant names that appear in that bank's statements. Different banks format merchant names differently, so the same purchase might need multiple patterns.
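For example, the same merchant can render differently across banks, so one category often accumulates several patterns. A quick Python check of the idea (the merchant strings and pattern are illustrative, not entries from the real `config/categories.yaml`):

```python
import re

# Three hypothetical renderings of the same merchant across banks
descriptions = [
    "MCDONALDS 1234 SYDNEY",      # bank A: no apostrophe, store number
    "MCDONALD'S AUS SYDNEY NS",   # bank B: apostrophe, truncated suffix
    "PAYPAL *MCDONALDS",          # bank C: payment-processor prefix
]

# A single pattern with an optional apostrophe covers all three;
# messier merchants may need several patterns per category.
patterns = [r"MCDONALD'?S"]

assert all(
    any(re.search(p, d) for p in patterns) for d in descriptions
)
```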