Production AI Resume Search with HeadHunter: architecture, matching, and anti-bias pipeline
A practical breakdown of AI recruiting on hh.ru: from ingestion and resume normalization to explainable scoring and an HR-ready shortlist.
TL;DR
Production AI recruiting is not just a list of applicants for a vacancy.
A working system includes HeadHunter API ingestion, resume normalization, structured matching, and explainable scoring.
The current workflow includes authorization, vacancy updates, resume snapshots, batch AI analysis, and transparent candidate rating.
Key challenges: noisy resumes, inconsistent formats, outdated profiles, and bias risks in decision-making.
Core principle: AI accelerates recruiter decisions but does not replace human review.
What the system solves
Goal: find relevant candidates quickly and with clear reasoning across a large response pool.
Automated search
The system aggregates candidates by vacancy and prepares a single evaluation set.
Ranking
Candidates are sorted by a composite match score, not by any single signal.
Explainability
It shows why a candidate matches: aligned skills, experience, gaps, and recommendations.
Scale
Handles thousands of resumes with batch operations and saved datasets.
Real scenario: HR searches for “Senior Python + RAG + NLP”. The system retrieves relevant resumes, ranks them, explains the match, and highlights gaps.
Architecture Pipeline
Vacancy / Query
↓
Query Normalization
↓
Resume Ingestion (HeadHunter API)
↓
Resume Parsing
↓
Structured Extraction
↓
Candidate Filtering & Selection
↓
Scoring Engine
↓
LLM Explanation
↓
Ranked Candidates
This pipeline makes recruiting reproducible: each stage can be debugged and improved independently.
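The idea of independently debuggable stages can be illustrated with a minimal sketch. The stage functions, the `State` dictionary layout, and the `+`-separated query format are assumptions made for illustration, not the system's actual code.

```python
from typing import Any, Callable

State = dict[str, Any]
Stage = Callable[[State], State]


def normalize_query(state: State) -> State:
    # Hypothetical normalization: split "A + B + C" into lowercase terms.
    terms = [t.strip().lower() for t in state["query"].split("+")]
    return {**state, "terms": terms}


def filter_candidates(state: State) -> State:
    # Hypothetical filter: keep resumes mentioning at least one term.
    kept = [
        r for r in state["resumes"]
        if any(t in r["text"].lower() for t in state["terms"])
    ]
    return {**state, "candidates": kept}


def run_pipeline(state: State, stages: list[Stage]) -> tuple[State, list[str]]:
    """Run stages in order and record which ones ran, for debugging."""
    trace: list[str] = []
    for stage in stages:
        state = stage(state)
        trace.append(stage.__name__)
    return state, trace
```

Because every stage takes and returns the same state shape, a single stage can be replaced or tested in isolation without touching the rest of the chain.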
How the current implementation works
The system follows a step-by-step HR workflow:
Employer authorization
Connects a HeadHunter account via OAuth for secure access to vacancies and responses.
Vacancy refresh
Pulls current vacancies and key parameters for candidate matching.
Resume loading and snapshots
Resumes are loaded page by page and saved for reproducible analysis.
AI analysis and rating
Each candidate gets a score, strengths/weaknesses, skill match, and interview questions.
HeadHunter integration
The HeadHunter API is integrated through OAuth and API calls for vacancies and resumes. The implementation handles:
OAuth authorization
Access token, refresh token, and user context handling.
Pagination
Resumes are fetched page by page (up to 50 per page) with staged saves.
Caching and snapshots
Data is stored as per-user JSON snapshots for repeatable analysis.
Data refresh
Vacancies and resumes can be refreshed and re-analyzed without a manual rebuild.
Common source issues: incomplete profiles, mixed resume templates, outdated information, and no single skill standard.
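The pagination and snapshot steps above can be sketched as follows. This is a hedged illustration: `fetch_page` is a hypothetical callable that wraps the authorized API request, and the `items` / `pages` keys in its return value are assumptions about the response shape that should be checked against the HeadHunter API documentation.

```python
import json
from pathlib import Path
from typing import Callable

PER_PAGE = 50  # page size mentioned above


def load_resumes(
    fetch_page: Callable[[int, int], dict],
    snapshot_path: Path,
) -> list[dict]:
    """Fetch all pages and rewrite the snapshot after each page (staged save)."""
    items: list[dict] = []
    page = 0
    while True:
        data = fetch_page(page, PER_PAGE)  # hypothetical API wrapper
        items.extend(data.get("items", []))
        # Staged save: a failure on a later page keeps earlier pages on disk.
        snapshot_path.write_text(
            json.dumps(items, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        page += 1
        if page >= data.get("pages", 0):
            break
    return items
```

Injecting `fetch_page` keeps token handling and HTTP details out of the loading logic, so the loop can be tested against a fake data source.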
Resume parsing and structured extraction
A critical stage is normalization of unstructured resume data into a unified candidate profile.
Challenges include free text, uneven detail, mixed formats, and partially filled sections.
Resume text vs structured fields
Free-form resume text alone is not enough for precise hiring decisions.
Production recruiting depends on structured fields: months of experience, expected salary, location, job-search status, skill stack, and role relevance.
Practical approach: first structure the candidate profile, then score and explain the result for HR.
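A unified candidate profile could look like the sketch below. The field names in `raw` (`experience_months`, `salary`, `status`, and so on) are hypothetical and do not claim to mirror the HeadHunter resume schema; the point is only to show tolerant mapping of incomplete data into a fixed structure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CandidateProfile:
    resume_id: str
    experience_months: int
    expected_salary: Optional[int]
    location: str
    job_search_status: str
    skills: list[str] = field(default_factory=list)


def build_profile(raw: dict) -> CandidateProfile:
    """Map a raw resume dict to a profile, tolerating missing fields."""
    # Deduplicate skills case-insensitively and drop empty entries.
    skills = sorted({s.strip().lower() for s in raw.get("skills", []) if s.strip()})
    return CandidateProfile(
        resume_id=str(raw["id"]),
        experience_months=int(raw.get("experience_months") or 0),
        expected_salary=raw.get("salary"),  # None when not stated
        location=(raw.get("location") or "").strip(),
        job_search_status=raw.get("status") or "unknown",
        skills=skills,
    )
```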
Hybrid Retrieval
In this HH workflow, candidate selection combines several signals:
Vacancy source
Candidates are selected against a specific vacancy and its parameters from HeadHunter.
Structured filters
Experience, job-search status, location, salary expectations, and skills are considered.
AI interpretation
An LLM analyzes the profile and produces an explainable fit assessment.
Final shortlist
Output is a ranked candidate list with reasons and interview recommendations.
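The structured-filter signal might be implemented along these lines. This is a sketch under assumptions: the vacancy keys (`min_experience_months`, `max_salary`, `required_skills`) and the choice to treat every check as a hard filter are illustrative, not taken from the described system.

```python
def passes_filters(candidate: dict, vacancy: dict) -> tuple[bool, list[str]]:
    """Return (passed, reasons) so that rejections stay explainable."""
    reasons: list[str] = []

    if candidate.get("experience_months", 0) < vacancy.get("min_experience_months", 0):
        reasons.append("experience below minimum")

    salary = candidate.get("expected_salary")
    max_salary = vacancy.get("max_salary")
    # A missing salary on either side is not treated as a mismatch.
    if salary is not None and max_salary is not None and salary > max_salary:
        reasons.append("salary expectation above budget")

    required = {s.lower() for s in vacancy.get("required_skills", [])}
    have = {s.lower() for s in candidate.get("skills", [])}
    missing = sorted(required - have)
    if missing:
        reasons.append("missing skills: " + ", ".join(missing))

    return (not reasons, reasons)
```

Returning the reasons alongside the boolean keeps the filter stage consistent with the explainability goal: a recruiter can see why a candidate was excluded before any LLM is involved.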
Scoring / Matching Engine
The key differentiator is explainable scoring, not a black box.
Skill match: 0.8
Experience match: 0.6
Role match: 0.9
Education match: 0.3
Final score = w1*Skill + w2*Experience + w3*Role + w4*Education
Explainable output example: “Candidate fits because of 5 years of Python, proven NLP experience, and RAG practice; gap: limited MLOps evidence in production.”
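The weighted formula can be worked through with the example signal values. The weights below are illustrative assumptions (the article does not state w1..w4); with them the final score comes out to roughly 0.71.

```python
# Illustrative weights; the real w1..w4 are not given in the article.
WEIGHTS = {"skill": 0.4, "experience": 0.3, "role": 0.2, "education": 0.1}


def final_score(signals: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-signal scores, normalized by the weight sum."""
    total = sum(weights.values())
    return sum(weights[k] * signals[k] for k in weights) / total


def contributions(signals: dict[str, float], weights: dict[str, float] = WEIGHTS) -> list[tuple[str, float]]:
    """Per-signal weighted contributions, largest first, for the explanation."""
    parts = [(k, weights[k] * signals[k]) for k in weights]
    return sorted(parts, key=lambda kv: kv[1], reverse=True)


signals = {"skill": 0.8, "experience": 0.6, "role": 0.9, "education": 0.3}
score = final_score(signals)  # 0.32 + 0.18 + 0.18 + 0.03, about 0.71
```

Exposing the per-signal contributions is what lets the explanation name concrete strengths and gaps instead of a bare number.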
LLM layer
The LLM is used as an interpretation layer:
1) match and ranking explanation;
2) skill and term normalization;
3) query rewrite for better search precision;
4) candidate summary generation for HR.
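One way to keep the LLM in an interpretation-only role is to pass it the already computed scores and accept only a validated JSON reply. In the sketch below, `llm` is a hypothetical callable that takes a prompt string and returns the model's text; the prompt wording and the `strengths` / `gaps` / `summary` keys are assumptions for illustration.

```python
import json
from typing import Callable


def explain_match(
    vacancy: dict,
    profile: dict,
    scores: dict,
    llm: Callable[[str], str],
) -> dict:
    """Ask the LLM to explain an existing score and parse its JSON reply."""
    prompt = (
        "You are assisting a recruiter. Explain the match as JSON with keys "
        '"strengths", "gaps", "summary". Do not change the scores.\n'
        f"Vacancy: {json.dumps(vacancy, ensure_ascii=False)}\n"
        f"Candidate: {json.dumps(profile, ensure_ascii=False)}\n"
        f"Scores: {json.dumps(scores)}\n"
    )
    reply = llm(prompt)
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Keep the batch running; the numeric score is still usable.
        return {"strengths": [], "gaps": [], "summary": "", "error": "invalid LLM output"}
    return {
        "strengths": list(data.get("strengths", [])),
        "gaps": list(data.get("gaps", [])),
        "summary": str(data.get("summary", "")),
    }
```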
Anti-bias and limitations
AI recruiting carries bias risks related to gender, age, company background, and education path.
Risk mitigation practices: exclude sensitive attributes from scoring, keep reasoning transparent, enforce human-in-the-loop review, and run regular audits of recommendations.
Important: the system assists hiring decisions; it should not automatically hire people.
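Two of these practices, excluding sensitive attributes and auditing recommendations, can be sketched in a few lines. The field names in `SENSITIVE_FIELDS` and the record layout used by `selection_rates` are assumptions; for the audit, the group attribute would have to be stored separately from the data that reaches scoring.

```python
from collections import defaultdict

# Illustrative list; the real set depends on the data and local regulation.
SENSITIVE_FIELDS = {"gender", "age", "birth_date", "photo"}


def redact(profile: dict) -> tuple[dict, list[str]]:
    """Drop sensitive fields before scoring and report which were removed."""
    removed = sorted(k for k in profile if k in SENSITIVE_FIELDS)
    clean = {k: v for k, v in profile.items() if k not in SENSITIVE_FIELDS}
    return clean, removed


def selection_rates(records: list[dict], group_key: str) -> dict[str, float]:
    """Share of shortlisted candidates per group, for periodic audits."""
    totals: dict[str, int] = defaultdict(int)
    picked: dict[str, int] = defaultdict(int)
    for r in records:
        group = r[group_key]
        totals[group] += 1
        if r["shortlisted"]:
            picked[group] += 1
    return {g: picked[g] / totals[g] for g in totals}
```

A large gap between group rates does not prove bias on its own, but it is a signal for a human reviewer to inspect the underlying recommendations.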
Docker / production environment
Recommended production setup:
HeadHunter API integration, recruiting business logic layer, snapshot storage, batch processing, and recruiter action audit.
Service separation supports stability, batch-analysis scaling, and SLA control.
Common implementation mistakes
1) Comparing candidates only by general impression without structured scoring.
2) Ignoring experience, seniority, and salary constraints.
3) No normalization for skills and role naming.
4) No explainability in the HR decision flow.
5) Ignoring resume format diversity.
6) No feedback loop from real hiring outcomes.
Quality metrics
precision@k
Share of relevant candidates in top shortlist positions.
time-to-hire
How much the system shortens the time needed to fill a vacancy.
match accuracy
How well final ranking correlates with expert HR/hiring-manager evaluation.
HR acceptance rate
Percentage of AI recommendations accepted to the next hiring stage.
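Three of these metrics are simple enough to compute directly. The sketch below assumes that relevance labels and expert rankings are available from recruiter feedback, and it uses Spearman rank correlation (without ties) as one possible reading of "match accuracy"; the article does not specify which correlation measure is used.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Share of relevant candidates among the top-k ranked ones."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for rid in ranked_ids[:k] if rid in relevant_ids)
    return hits / k


def acceptance_rate(accepted: int, recommended: int) -> float:
    """Share of AI recommendations that HR moved to the next stage."""
    return accepted / recommended if recommended else 0.0


def spearman(system_rank: list[str], expert_rank: list[str]) -> float:
    """Spearman correlation between two rankings of the same ids (no ties)."""
    n = len(system_rank)
    if n < 2 or set(system_rank) != set(expert_rank):
        raise ValueError("rankings must cover the same two or more ids")
    expert_pos = {rid: i for i, rid in enumerate(expert_rank)}
    d2 = sum((i - expert_pos[rid]) ** 2 for i, rid in enumerate(system_rank))
    return 1 - 6 * d2 / (n * (n * n - 1))
```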
JSON examples (API-ready)
Candidate analysis request
GET /dashboard/hh/?action=analyze_candidate&vacancy_id=131173772&resume_id=abc123
Explainable response fragment
{
  "overall_rating": 78,
  "skill_match": {
    "match_percentage": 82,
    "matched_skills": ["Python", "NLP", "SQL"],
    "missing_skills": ["Production RAG", "MLOps"]
  },
  "conclusion": "Candidate is a good fit with caveats",
  "interview_questions": [
    "Describe a production quality-monitoring case",
    "How did you reduce matching errors?"
  ]
}
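A client for this request and response could be sketched as follows. The base URL is a placeholder, the 0 to 100 range check on `overall_rating` is an assumption inferred from the example values, and the HTTP call itself is left out so that the sketch depends only on the standard library.

```python
import json
from urllib.parse import urlencode


def analyze_url(base: str, vacancy_id: str, resume_id: str) -> str:
    """Build the analysis request URL shown above."""
    query = urlencode({
        "action": "analyze_candidate",
        "vacancy_id": vacancy_id,
        "resume_id": resume_id,
    })
    return f"{base}/dashboard/hh/?{query}"


def parse_analysis(text: str) -> dict:
    """Extract the fields a recruiter view needs from the response JSON."""
    data = json.loads(text)
    rating = data["overall_rating"]
    if not 0 <= rating <= 100:  # assumed scale
        raise ValueError(f"overall_rating out of range: {rating}")
    skill = data.get("skill_match", {})
    return {
        "rating": rating,
        "matched": skill.get("matched_skills", []),
        "missing": skill.get("missing_skills", []),
        "questions": data.get("interview_questions", []),
    }
```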
FAQ
How does the system find candidates?
Using HeadHunter vacancy data, structured filtering, and AI scoring with explicit explanation.
Can it fully replace HR?
No. It speeds up and structures selection, but final decisions remain with people.
What matters most for matching quality?
The quality of HH source data, well-chosen filters, and transparent, well-reasoned scoring.
Can it run locally?
Yes, with local infrastructure and controlled access to candidate data.
How can bias be reduced?
Use explainable scoring, remove sensitive features, audit outcomes, and keep a human in the loop.
Key Takeaways
1) AI recruiting = HH data + structured matching + explainable scoring.
2) Structured data and skill normalization are critical for accurate ranking.
3) Automation speeds up hiring but does not replace HR expertise.
4) Data quality is the main risk and the main lever, not the model alone.
Who this is for
HR teams, recruiting agencies, enterprise companies, and high-volume hiring startups that need faster hiring and transparent selection quality.