Rychagov S.
🎯 AI Recruiting · HeadHunter · Matching

Production AI Resume Search with HeadHunter: architecture, matching, and anti-bias pipeline

A practical breakdown of AI recruiting on hh.ru: from ingestion and resume normalization to explainable scoring and an HR-ready shortlist.

TL;DR

Production AI recruiting is not just a list of applicants for a vacancy.

A working system includes HeadHunter API ingestion, resume normalization, structured matching, and explainable scoring.

The current workflow includes authorization, vacancy updates, resume snapshots, batch AI analysis, and transparent candidate rating.

Key challenges: noisy resumes, inconsistent formats, outdated profiles, and bias risks in decision-making.

Core principle: AI accelerates recruiter decisions but does not replace human review.

What the system solves

Goal: find relevant candidates in a large response pool quickly, with clear reasoning for each match.

Automated search

The system aggregates candidates by vacancy and prepares a single evaluation set.

Ranking

Candidates are sorted by a composite match score, not by one signal only.

Explainability

It shows why a candidate matches: aligned skills, experience, gaps, and recommendations.

Scale

Handles thousands of resumes with batch operations and saved datasets.

Real scenario: HR searches for “Senior Python + RAG + NLP”. The system retrieves relevant resumes, ranks them, explains the match, and highlights gaps.

Architecture Pipeline

Vacancy / Query
    ↓
Query Normalization
    ↓
Resume Ingestion (HeadHunter API)
    ↓
Resume Parsing
    ↓
Structured Extraction
    ↓
Candidate Filtering & Selection
    ↓
Scoring Engine
    ↓
LLM Explanation
    ↓
Ranked Candidates

This pipeline makes recruiting reproducible: each stage can be debugged and improved independently.
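One way to keep stages independently debuggable is to model each one as a plain function over a shared state and record which stages ran. The sketch below is illustrative only: `run_pipeline`, `normalize_query`, and `filter_candidates` are hypothetical names, and the two stages are simplified stand-ins for the real ones in the diagram.

```python
from typing import Any, Callable

# A stage takes the pipeline state and returns the updated state.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(state: dict[str, Any], stages: list[tuple[str, Stage]]) -> dict[str, Any]:
    """Run stages in order and record their names so each can be inspected alone."""
    trace: list[str] = []
    for name, stage in stages:
        # Shallow copy: a stage can rebind top-level keys without touching the caller's dict.
        state = stage(dict(state))
        trace.append(name)
    return {**state, "trace": trace}


# Hypothetical, simplified stand-ins for two stages from the diagram.
def normalize_query(state: dict[str, Any]) -> dict[str, Any]:
    state["query"] = " ".join(state["query"].lower().split())
    return state


def filter_candidates(state: dict[str, Any]) -> dict[str, Any]:
    terms = set(state["query"].split())
    state["candidates"] = [
        c for c in state["candidates"] if terms & {s.lower() for s in c["skills"]}
    ]
    return state


result = run_pipeline(
    {
        "query": "  Python   NLP ",
        "candidates": [
            {"id": "a", "skills": ["Python", "SQL"]},
            {"id": "b", "skills": ["Java"]},
        ],
    },
    [("query_normalization", normalize_query), ("filtering", filter_candidates)],
)
```

Because every stage has the same signature, a single stage can be replayed against a saved state when a ranking looks wrong.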

How the current implementation works

The system follows a step-by-step HR workflow:

1) Employer authorization

Connects a HeadHunter account via OAuth for secure access to vacancies and responses.

2) Vacancy refresh

Pulls current vacancies and key parameters for candidate matching.

3) Resume loading and snapshots

Resumes are loaded page by page and saved for reproducible analysis.

4) AI analysis and rating

Each candidate gets a score, strengths/weaknesses, skill match, and interview questions.

HeadHunter integration

The HeadHunter API is integrated through OAuth, with API calls for vacancies and resumes. The implementation handles:

1) OAuth authorization

Access token, refresh token, and user context handling.

2) Pagination

Resumes are fetched by pages (up to 50 per page) with staged saves.

3) Caching and snapshots

Data is stored as per-user JSON snapshots for repeatable analysis.

4) Data refresh

Vacancies and resumes can be refreshed and re-analyzed without manual rebuild.
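The pagination and staged-save steps can be sketched as a loop that rewrites a per-user JSON snapshot after every page. This is a minimal sketch, not the actual integration: `fetch_page` is a hypothetical callable that wraps the authorized HTTP request, and the response shape it returns (`items`, `pages`) is an assumption made for the example.

```python
import json
from pathlib import Path
from typing import Callable

PER_PAGE = 50  # page size mentioned in the text

# fetch_page(page, per_page) -> {"items": [...], "pages": <total page count>}
# This response shape is an assumption of the sketch, not a confirmed schema.
FetchPage = Callable[[int, int], dict]


def load_resumes(fetch_page: FetchPage, snapshot_dir: Path, user_id: str) -> list[dict]:
    """Fetch all pages and rewrite the per-user JSON snapshot after each page."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    snapshot_path = snapshot_dir / f"{user_id}.json"
    resumes: list[dict] = []
    page = 0
    while True:
        data = fetch_page(page, PER_PAGE)
        resumes.extend(data["items"])
        # Staged save: if a later page fails, everything fetched so far is on disk.
        snapshot_path.write_text(
            json.dumps(resumes, ensure_ascii=False), encoding="utf-8"
        )
        page += 1
        if page >= data["pages"]:
            break
    return resumes
```

Injecting `fetch_page` keeps token handling out of the loop and lets the same code run against a saved fixture for repeatable analysis.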

Common source issues: incomplete profiles, mixed resume templates, outdated information, and no single skill standard.

Resume parsing and structured extraction

A critical stage is normalization of unstructured resume data into a unified candidate profile.

Challenges include free text, uneven detail, mixed formats, and partially filled sections.

Field        Example
Skills       Python, SQL, RAG, NLP
Experience   5 years (60 months)
Roles        Backend Developer, ML Engineer
Companies    Company X, Company Y
Education    MSU, Applied Mathematics
Level        Middle / Senior
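The unified profile from the table can be represented as a small typed record. The sketch below is an assumption about shape only: `CandidateProfile`, `profile_from_raw`, and the raw dictionary keys are hypothetical names, chosen to mirror the fields listed above and to tolerate partially filled resumes.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateProfile:
    skills: list[str] = field(default_factory=list)
    experience_months: int = 0
    roles: list[str] = field(default_factory=list)
    companies: list[str] = field(default_factory=list)
    education: str = ""
    level: str = ""

    @property
    def experience_years(self) -> float:
        return self.experience_months / 12


def profile_from_raw(raw: dict) -> CandidateProfile:
    """Build a profile from a partially filled dict; missing sections get defaults."""
    return CandidateProfile(
        # Drop empty entries and surrounding whitespace from the skill list.
        skills=[s.strip() for s in raw.get("skills", []) if s.strip()],
        experience_months=int(raw.get("experience_months") or 0),
        roles=list(raw.get("roles", [])),
        companies=list(raw.get("companies", [])),
        education=raw.get("education") or "",
        level=raw.get("level") or "",
    )


profile = profile_from_raw({"skills": ["Python", " SQL ", ""], "experience_months": 60})
```

Storing experience in months and deriving years avoids rounding differences between resumes that report "5 years" and those that list exact dates.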

Resume text vs structured fields

Free-form resume text alone is not enough for precise hiring decisions.

Production recruiting depends on structured fields: months of experience, expected salary, location, job-search status, skill stack, and role relevance.

Practical approach: first structure the candidate profile, then score and explain the result for HR.

Hybrid Retrieval

In this HH workflow, candidate selection combines several signals:

Vacancy source

Candidates are selected against a specific vacancy and its parameters from HeadHunter.

Structured filters

Experience, job-search status, location, salary expectations, and skills are considered.

AI interpretation

The LLM analyzes the profile and produces an explainable fit assessment.

Final shortlist

Output is a ranked candidate list with reasons and interview recommendations.
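The combination of structured filters and ranking can be sketched as "hard filters first, then sort by fit score". Everything here is illustrative: the `Candidate` fields, the `shortlist` function, and the choice to keep candidates with no stated salary are assumptions, not the system's confirmed rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    resume_id: str
    experience_months: int
    location: str
    expected_salary: Optional[int]  # None when the resume does not state it
    actively_looking: bool
    fit_score: float  # assumed to come from the scoring stage


def shortlist(candidates, *, min_months, location, max_salary, top_n=10):
    """Apply hard structured filters, then rank the remaining candidates by fit score."""
    eligible = [
        c for c in candidates
        if c.actively_looking
        and c.experience_months >= min_months
        and c.location == location
        # An unstated salary is kept for human review rather than silently dropped.
        and (c.expected_salary is None or c.expected_salary <= max_salary)
    ]
    return sorted(eligible, key=lambda c: c.fit_score, reverse=True)[:top_n]


pool = [
    Candidate("a", 60, "Moscow", 300_000, True, 0.71),
    Candidate("b", 24, "Moscow", 200_000, True, 0.90),
    Candidate("c", 72, "Moscow", None, True, 0.80),
    Candidate("d", 80, "Kazan", 250_000, True, 0.95),
]
top = shortlist(pool, min_months=36, location="Moscow", max_salary=350_000)
```

Filtering before scoring also limits how many profiles reach the more expensive LLM step.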

Scoring / Matching Engine

The key differentiator is explainable scoring, not a black box.

Skill match:      0.8
Experience match: 0.6
Role match:       0.9
Education match:  0.3

Final score = w1*Skill + w2*Experience + w3*Role + w4*Education

Explainable output example: “Candidate fits because of 5 years of Python, proven NLP experience, and RAG practice; gap: limited MLOps evidence in production.”
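As a worked example of the weighted sum, the sketch below applies hypothetical weights (0.4, 0.3, 0.2, 0.1; the source does not state the actual values) to the component scores listed above. With these weights the result is 0.4*0.8 + 0.3*0.6 + 0.2*0.9 + 0.1*0.3 ≈ 0.71. Returning the per-component contributions alongside the total gives the explanation layer something concrete to cite.

```python
# Hypothetical weights for illustration; they must sum to 1 so the score stays in [0, 1].
WEIGHTS = {"skill": 0.4, "experience": 0.3, "role": 0.2, "education": 0.1}


def final_score(matches: dict[str, float], weights: dict[str, float] = WEIGHTS):
    """Return the weighted total and each component's contribution to it."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    contributions = {name: weights[name] * matches[name] for name in weights}
    return sum(contributions.values()), contributions


score, parts = final_score(
    {"skill": 0.8, "experience": 0.6, "role": 0.9, "education": 0.3}
)
```

Here `parts` shows that skill match contributes the most (about 0.32) and education the least (about 0.03), which is the kind of breakdown a recruiter can sanity-check.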

LLM layer

The LLM is used as an interpretation layer:

1) match and ranking explanation;

2) skill and term normalization;

3) query rewrite for better search precision;

4) candidate summary generation for HR.
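Skill and term normalization does not have to rely on the LLM alone. A deterministic alias table can handle the common cases cheaply and leave only unknown terms for the model. The sketch below shows that stand-in technique, not the LLM-based step itself; the `ALIASES` table and `normalize_skills` are hypothetical.

```python
# Hypothetical alias table: lowercase variant -> canonical skill name.
ALIASES = {
    "py": "Python",
    "python3": "Python",
    "postgres": "PostgreSQL",
    "natural language processing": "NLP",
    "nlp": "NLP",
    "retrieval-augmented generation": "RAG",
    "rag": "RAG",
}


def normalize_skills(raw_skills: list[str]) -> list[str]:
    """Map known variants to canonical names and drop case-insensitive duplicates."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in raw_skills:
        key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
        canonical = ALIASES.get(key, raw.strip())  # unknown terms pass through as-is
        if canonical.lower() not in seen:
            seen.add(canonical.lower())
            result.append(canonical)
    return result


skills = normalize_skills(["python3", " Py ", "Natural  Language Processing", "Kafka"])
```

Terms that pass through unchanged (here "Kafka") are the natural candidates to send to the LLM for normalization.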

Anti-bias and limitations

AI recruiting carries bias risks related to gender, age, company background, and education path.

Risk mitigation practices: exclude sensitive attributes from scoring, keep reasoning transparent, enforce human-in-the-loop review, and run regular recommendation audits.

Important: the system assists hiring decisions; it should not automatically hire people.
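Two of these practices are easy to sketch: removing sensitive fields before a profile reaches scoring, and auditing shortlist rates per group afterwards. The field names in `SENSITIVE_FIELDS` and both helper functions are hypothetical; which attributes count as sensitive is a policy decision that the code only enforces.

```python
from collections import defaultdict

# Hypothetical field names; the real list is a policy decision.
SENSITIVE_FIELDS = frozenset({"gender", "age", "birth_date", "photo"})


def scoring_view(profile: dict) -> dict:
    """Return a copy of the profile without sensitive fields, for use in scoring."""
    return {k: v for k, v in profile.items() if k not in SENSITIVE_FIELDS}


def shortlist_rate_by_group(records: list[dict], group_field: str) -> dict[str, float]:
    """Audit helper: share of shortlisted candidates for each value of group_field."""
    totals: dict[str, int] = defaultdict(int)
    shortlisted: dict[str, int] = defaultdict(int)
    for record in records:
        group = record.get(group_field, "unknown")
        totals[group] += 1
        if record["shortlisted"]:
            shortlisted[group] += 1
    return {group: shortlisted[group] / totals[group] for group in totals}


clean = scoring_view({"skills": ["Python"], "age": 41, "gender": "f"})
rates = shortlist_rate_by_group(
    [
        {"gender": "f", "shortlisted": True},
        {"gender": "f", "shortlisted": False},
        {"gender": "m", "shortlisted": True},
    ],
    "gender",
)
```

The audit needs the sensitive attribute that scoring must not see, so the two views of the data should be kept separate. A large gap between group rates is a prompt for human investigation, not proof of bias on its own.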

Docker / Production setup

The recommended production setup separates the following services:

HeadHunter API integration, a recruiting business-logic layer, snapshot storage, batch processing, and a recruiter action audit log.

Service separation supports stability, batch-analysis scaling, and SLA control.

Common implementation mistakes

1) Comparing candidates only by general impression without structured scoring.

2) Ignoring experience, seniority, and salary constraints.

3) No normalization for skills and role naming.

4) No explainability for HR decision flow.

5) Ignoring resume format diversity.

6) No feedback loop from real hiring outcomes.

Quality metrics

precision@k

Share of relevant candidates in top shortlist positions.

time-to-hire

How much the system shortens the time needed to close a vacancy.

match accuracy

How well final ranking correlates with expert HR/hiring-manager evaluation.

HR acceptance rate

Percentage of AI recommendations accepted to the next hiring stage.
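Two of these metrics reduce to simple set arithmetic. The sketch below is a minimal illustration with hypothetical function names: `precision_at_k` uses the standard definition (relevant items in the top k, divided by k), and `acceptance_rate` measures how many recommended candidates moved to the next stage.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Share of the top-k ranked candidates that reviewers marked as relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for resume_id in ranked_ids[:k] if resume_id in relevant_ids)
    return hits / k


def acceptance_rate(recommended_ids: list[str], accepted_ids: set[str]) -> float:
    """Share of AI-recommended candidates accepted to the next hiring stage."""
    recommended = set(recommended_ids)
    if not recommended:
        return 0.0
    return len(recommended & accepted_ids) / len(recommended)


p_at_3 = precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=3)
accepted = acceptance_rate(["a", "b", "c", "d"], {"a", "d"})
```

In this example two of the top three candidates are relevant, and two of the four recommendations were accepted.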

JSON examples (API-ready)

Candidate analysis request

GET /dashboard/hh/?action=analyze_candidate&vacancy_id=131173772&resume_id=abc123

Explainable response fragment

{
  "overall_rating": 78,
  "skill_match": {
    "match_percentage": 82,
    "matched_skills": ["Python", "NLP", "SQL"],
    "missing_skills": ["Production RAG", "MLOps"]
  },
  "conclusion": "Candidate is a good fit with caveats",
  "interview_questions": [
    "Describe a production quality-monitoring case",
    "How did you reduce matching errors?"
  ]
}
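A client for this endpoint mostly needs to build the query string and read a few fields from the response. The sketch below assumes only what the examples above show (the path, the three query parameters, and the `skill_match.missing_skills` field); the function names are hypothetical and no HTTP call is made.

```python
import json
from urllib.parse import urlencode


def analyze_candidate_url(vacancy_id: int, resume_id: str, base: str = "/dashboard/hh/") -> str:
    """Build the request path with properly encoded query parameters."""
    query = urlencode(
        {"action": "analyze_candidate", "vacancy_id": vacancy_id, "resume_id": resume_id}
    )
    return f"{base}?{query}"


def missing_skills(response_text: str) -> list[str]:
    """Extract the skill gaps from a response, tolerating absent fields."""
    data = json.loads(response_text)
    return data.get("skill_match", {}).get("missing_skills", [])


url = analyze_candidate_url(131173772, "abc123")
gaps = missing_skills(
    '{"overall_rating": 78, "skill_match": {"missing_skills": ["Production RAG", "MLOps"]}}'
)
```

Using `urlencode` instead of string concatenation keeps resume identifiers with special characters from breaking the request.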

FAQ

How does the system find candidates?

Using HeadHunter vacancy data, structured filtering, and AI scoring with explicit explanation.

Can it fully replace HR?

No. It speeds up and structures selection, but final decisions remain with people.

What matters most for matching quality?

Quality of the HH source data, proper filters, and transparent, reasoned scoring.

Can it run locally?

Yes, with local infrastructure and controlled access to candidate data.

How to reduce bias?

Use explainable scoring, remove sensitive features, audit outcomes, and keep a human in the loop.

Key Takeaways

1) AI recruiting = HH data + structured matching + explainable scoring.

2) Structured data and skill normalization are critical for accurate ranking.

3) Automation speeds up hiring but does not replace HR expertise.

4) Data quality is the main risk and the main lever, not the model alone.

Who this is for

HR teams, recruiting agencies, enterprise companies, and high-volume hiring startups that need faster hiring and transparent selection quality.
