Residential Cleaning Company
Smart Data Entry & Classification
Result: Built an intelligent classifier that auto-categorizes 95%+ of transactions — learning from every correction
November 2025
The Problem
Here’s a task most business owners know too well: you download your bank statement, open a spreadsheet, and start labeling things.
“COSTCO WHOLESALE #482” — that’s cleaning supplies. “SQ *RALEIGH EQUIPMENT” — that’s equipment rental. “PAYPAL *SOMETHING” — okay, now you have to go look that one up.
Multiply this across every ledger the business tracks (deposits, expenses, two bank accounts, a credit card, and payroll) and you’ve got a person spending half a day every week doing work that a machine should be handling.
The worst part? It’s not just slow. It’s error-prone. One wrong category and your P&L is off. One duplicate entry from a re-downloaded CSV and your numbers don’t reconcile. One pending transaction that changes its description when it posts and suddenly you’ve got a phantom charge.
What I Built
A three-tier classification engine that gets smarter over time.
Tier 1 — Overrides. Some transactions are weird. That $847 charge on March 3rd that looks like equipment but was actually a refund? Override. Exact match on amount + date + description. No ambiguity.
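A minimal sketch of what an exact-match override lookup could look like. The key fields (date, amount, description) come from the description above. The table name `OVERRIDES`, the function `apply_override`, the year, the merchant string, and the "Refund" label are all made up for illustration.

```python
from datetime import date
from decimal import Decimal

# Hypothetical override table: (date, amount, description) -> category.
# The single entry below is illustrative, not real data.
OVERRIDES = {
    (date(2025, 3, 3), Decimal("847.00"), "RALEIGH EQUIPMENT CO"): "Refund",
}

def apply_override(txn_date, amount, description):
    """Return the override category for an exact match, or None."""
    # Decimal avoids float rounding surprises when amounts are compared exactly.
    cents = Decimal(str(amount)).quantize(Decimal("0.01"))
    return OVERRIDES.get((txn_date, cents, description.strip()))

print(apply_override(date(2025, 3, 3), 847, "RALEIGH EQUIPMENT CO"))
print(apply_override(date(2025, 3, 4), 847, "RALEIGH EQUIPMENT CO"))
```

Because all three fields must match, a same-sized charge on a different day falls through to the next tier.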
Tier 2 — Learned Maps. The system builds a dictionary of every merchant it’s ever seen. “COSTCO WHOLESALE” → Cleaning Supplies. “VERIZON WIRELESS” → Phone/Internet. 500+ patterns and growing. When a transaction matches a known signature, it’s categorized instantly.
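One way a learned map like this might work, sketched under assumptions: the raw bank description is reduced to a merchant signature, and the signature is the dictionary key. The normalization rules in `signature` (uppercase, strip store numbers and punctuation) are a guess at what such a step would need, not the project's actual logic.

```python
import re

# Seed entries taken from the examples in the text.
LEARNED = {
    "COSTCO WHOLESALE": "Cleaning Supplies",
    "VERIZON WIRELESS": "Phone/Internet",
}

def signature(description):
    """Reduce a raw bank description to a stable merchant signature (assumed rules)."""
    text = description.upper()
    text = re.sub(r"#?\d+", " ", text)      # drop store numbers and other digit runs
    text = re.sub(r"[^A-Z* ]+", " ", text)  # drop remaining punctuation
    return " ".join(text.split())           # collapse whitespace

def lookup_learned(description):
    """Return the remembered category for this merchant, or None."""
    return LEARNED.get(signature(description))

def learn(description, category):
    """Remember a hand-applied label for next time."""
    LEARNED[signature(description)] = category

print(lookup_learned("COSTCO WHOLESALE #482"))
```

Stripping the store number is what lets one entry cover every Costco location.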
Tier 3 — Pattern Rules. For anything new, regex-based pattern matching catches the common structures. Anything with “PAYROLL” in the description → Payroll. Anything starting with “SQ *” → Square payment, with the amount deciding the category.
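The two rules named above could be expressed as an ordered list of regex and handler pairs. This is a sketch only: the $200 threshold and the categories returned by `square_category` are placeholders, since the text does not say how the amount decides the category.

```python
import re

def square_category(amount):
    # Placeholder rule: the real amount thresholds are not given in the source.
    return "Equipment Rental" if abs(amount) >= 200 else "Cleaning Supplies"

# Ordered rules: the first pattern that matches wins.
RULES = [
    (re.compile(r"PAYROLL", re.IGNORECASE), lambda amount: "Payroll"),
    (re.compile(r"^SQ \*"), square_category),
]

def match_rules(description, amount):
    """Return a category from the first matching rule, or None."""
    for pattern, categorize in RULES:
        if pattern.search(description):
            return categorize(amount)
    return None

print(match_rules("SQ *RALEIGH EQUIPMENT", -350.00))
```

Keeping the handler as a function lets a rule look at the amount without special-casing it in the matcher.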
If all three tiers fail — and at this point, that’s rare — the system creates an unresolved.xlsx file with just the mystery charges. You label them once. The system learns. Next time, it handles them automatically.
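The fall-through and the learning loop can be sketched as follows. The tiers here are tiny stand-ins, the "Software" label is invented, and the unresolved rows are kept in a list so the example runs without dependencies. In a pandas-based version, writing them to unresolved.xlsx could be done with `DataFrame.to_excel`.

```python
def classify_all(transactions, tiers):
    """Try each tier in order; collect the rows that no tier could label."""
    labeled, unresolved = [], []
    for txn in transactions:
        category = None
        for tier in tiers:
            category = tier(txn)
            if category is not None:
                break
        if category is None:
            unresolved.append(txn)
        else:
            labeled.append({**txn, "category": category})
    return labeled, unresolved

def absorb_labels(manual_rows, learned):
    """Fold hand-labeled rows back into the learned map."""
    for row in manual_rows:
        learned[row["description"]] = row["category"]

learned = {"COSTCO WHOLESALE": "Cleaning Supplies"}
tiers = [
    lambda t: None,                                                  # stand-in for overrides
    lambda t: learned.get(t["description"]),                         # learned map
    lambda t: "Payroll" if "PAYROLL" in t["description"] else None,  # pattern rules
]
txns = [
    {"description": "COSTCO WHOLESALE", "amount": -82.10},
    {"description": "PAYPAL *SOMETHING", "amount": -19.99},
]

labeled, unresolved = classify_all(txns, tiers)          # first run: one mystery charge
absorb_labels([{**unresolved[0], "category": "Software"}], learned)
labeled_again, unresolved_again = classify_all(txns, tiers)  # second run: nothing left over
```

The second pass is the point of the design: a label applied once by hand becomes a Tier 2 hit on every later run.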
The Details That Matter
Deduplication. Every transaction gets a SHA256 hash. Re-download the same export next week? No duplicates. This sounds simple until you realize that banks change pending transaction descriptions when they post. The system handles that too.
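A hedged sketch of hash-based deduplication using Python's `hashlib`. Which fields feed the hash is an assumption (account, date, amount, normalized description), and the function names are illustrative. This covers only the re-download case; how the system reconciles pending descriptions that change on posting is not described in the text, so it is not shown here.

```python
import hashlib

def txn_hash(account, txn_date, amount, description):
    """SHA-256 fingerprint of one transaction (field choice is an assumption)."""
    normalized = " ".join(description.upper().split())
    payload = "|".join([account, txn_date, f"{amount:.2f}", normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def drop_duplicates(rows, seen):
    """Keep rows whose hash is not yet in `seen`; `seen` is updated in place."""
    fresh = []
    for row in rows:
        digest = txn_hash(row["account"], row["date"], row["amount"], row["description"])
        if digest not in seen:
            seen.add(digest)
            fresh.append(row)
    return fresh

export = [
    {"account": "checking", "date": "2025-11-03", "amount": -82.10,
     "description": "COSTCO WHOLESALE #482"},
    {"account": "checking", "date": "2025-11-04", "amount": -19.99,
     "description": "PAYPAL *SOMETHING"},
]
seen = set()
first_import = drop_duplicates(export, seen)   # both rows are new
second_import = drop_duplicates(export, seen)  # same file again: nothing new
```

One caveat with this field choice: two genuinely separate charges with the same account, date, amount, and description would hash identically, so a real version would likely need a tiebreaker such as a bank-provided transaction ID or an occurrence counter.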
Per-employee role mapping. Payroll isn’t just “payroll.” Cleaner wages go to Cost of Goods Sold. Office staff goes to Operating Expenses. Owner draws get flagged separately. The system knows who’s who and routes accordingly.
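The routing described above amounts to two lookups: employee to role, then role to category. A minimal sketch, with hypothetical employee names and an assumed "Unresolved" fallback for anyone not yet on the roster:

```python
# Role -> accounting category, as described in the text.
ROLE_TO_CATEGORY = {
    "cleaner": "Cost of Goods Sold",
    "office": "Operating Expenses",
    "owner": "Owner Draw (flagged)",
}

# Hypothetical roster; real names would come from the payroll export.
EMPLOYEE_ROLES = {
    "Employee A": "cleaner",
    "Employee B": "office",
    "Owner C": "owner",
}

def payroll_category(employee):
    """Route a payroll line by the employee's role; unknown names go to review."""
    role = EMPLOYEE_ROLES.get(employee)
    if role is None:
        return "Unresolved"
    return ROLE_TO_CATEGORY[role]

print(payroll_category("Employee A"))
```

Splitting the roster from the role table means a new hire is a one-line change, and a role's accounting treatment can change without touching the roster.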
Credit card payment detection. When you pay your Capital One bill from your checking account, that’s a transfer between accounts, not a new expense. The system catches these cross-account transfers so they don’t inflate your numbers.
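One plausible way to catch these transfers is to pair a checking outflow with a card credit of the same size posted within a few days. The sketch below assumes outflows are negative and card payments received are positive. The three-day window, the function name, and the sample rows are illustrative, not taken from the project.

```python
from datetime import date, timedelta

def find_card_payments(checking, card, window_days=3):
    """Pair checking outflows with same-sized card credits posted close together."""
    pairs, used = [], set()
    for out in checking:
        if out["amount"] >= 0:
            continue  # only outflows can be card payments
        for i, credit in enumerate(card):
            if i in used or credit["amount"] <= 0:
                continue  # each card credit is matched at most once
            same_amount = round(-out["amount"], 2) == round(credit["amount"], 2)
            close = abs(credit["date"] - out["date"]) <= timedelta(days=window_days)
            if same_amount and close:
                pairs.append((out, credit))
                used.add(i)
                break
    return pairs

checking = [
    {"date": date(2025, 11, 3), "amount": -1250.00, "description": "CAPITAL ONE PAYMENT"},
    {"date": date(2025, 11, 4), "amount": -82.10, "description": "COSTCO WHOLESALE #482"},
]
card = [
    {"date": date(2025, 11, 4), "amount": 1250.00, "description": "PAYMENT RECEIVED"},
]
pairs = find_card_payments(checking, card)
```

Matched pairs can then be labeled as transfers on both sides, which keeps the card's individual purchases as the only expense records.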
The Result
What used to take 2-3 hours a week now takes about 10 minutes — and most of that is reviewing the output, not doing the work.
The classification accuracy sits above 95%. The remainder, under 5%, is genuinely new merchants the system hasn’t seen before. It asks once and remembers from then on.
But the real win is trust. When every transaction has been categorized by the same logic every time — no Monday morning brain fog, no “was that supplies or equipment?” guessing — the numbers downstream are reliable. The P&L is reliable. The tax prep is reliable.
It’s not a flashy project. Nobody’s going to put “automated bank transaction categorization” on a conference slide. But for the person who was spending half a day every week on it? It’s the best money they ever spent.
Stack: Python · pandas · openpyxl · SHA256 hashing · regex pattern engine
How it gets built
Understand the bottleneck, the data, and what success looks like.
Design the simplest solution that fully solves the problem.
Build iteratively, with working previews at each stage.
Hand off with documentation, training, and a 30-day support window.
Want to eliminate your most time-consuming task?
Tell me exactly what you're doing manually. I'll price a solution and explain exactly how it works — no commitment.