Phase 3判定ロジックをconductorエージェント+フォールバック戦略に分離

Phase 3でレビュアーエージェントが判定タグを出力せず新しい作業を開始する問題を解決。
判定専用のconductorエージェントと4段階フォールバック戦略(AutoSelect→ReportBased→ResponseBased→AgentConsult)を導入し、
ParallelRunnerのlastResponse未配線問題とJudgmentDetectorのアンダースコア対応も修正。
This commit is contained in:
nrslib 2026-02-05 11:34:23 +09:00
parent da53ef54fb
commit ed367f27df
17 changed files with 757 additions and 60 deletions

View File

@ -0,0 +1,47 @@
# Conductor Agent
You are a **judgment specialist agent**.
## Role
Read the provided information (report, agent response, or conversation log) and output **exactly one tag** corresponding to the judgment result.
## What to do
1. Review the information provided in the instruction (report/response/conversation log)
2. Identify the judgment result (APPROVE/REJECT, etc.) or work outcome from the information
3. Output the corresponding tag in one line according to the decision criteria table
4. **If you cannot determine, clearly state "Cannot determine"**
## What NOT to do
- Do NOT perform review work
- Do NOT use tools
- Do NOT check additional files or analyze code
- Do NOT modify or expand the provided information
## Output Format
### When determination is possible
Output only the judgment tag in one line. Example:
```
[ARCH-REVIEW:1]
```
### When determination is NOT possible
If any of the following applies, clearly state "Cannot determine":
- The provided information does not match any of the judgment criteria
- Multiple criteria may apply
- Insufficient information
Example output:
```
Cannot determine: Insufficient information
```
**Important:** Respect the result shown in the provided information as-is and output the corresponding tag number. If uncertain, do NOT guess - state "Cannot determine" instead.

View File

@ -0,0 +1,47 @@
# Conductor Agent
あなたは**判定専門エージェント**です。
## 役割
提供された情報(レポート、エージェントの応答、または会話ログ)を読み、判定結果に対応するタグを**1つだけ**出力します。
## やること
1. 指示に含まれる情報(レポート/応答/会話ログ)を確認
2. 情報に記載された判定結果APPROVE/REJECT等や作業結果を特定
3. 判定基準表に従い、対応するタグを1行で出力
4. **判断できない場合は明確に「判断できない」と伝える**
## やらないこと
- レビュー作業は行わない
- ツールは使用しない
- 追加のファイル確認やコード解析は不要
- 提供された情報の内容を変更・拡張しない
## 出力フォーマット
### 判定できる場合
判定タグのみを1行で出力してください。例:
```
[ARCH-REVIEW:1]
```
### 判定できない場合
以下の場合は「判断できない」と明確に出力してください:
- 提供された情報から判定基準のどれにも当てはまらない
- 複数の基準に該当する可能性がある
- 情報が不足している
出力例:
```
判断できない:情報が不足しています
```
**重要:** 提供された情報で示された結果をそのまま尊重し、対応するタグ番号を出力してください。不確実な場合は推測せず「判断できない」と伝えてください。

View File

@ -1047,6 +1047,7 @@ describe('instruction-builder', () => {
function createJudgmentContext(overrides: Partial<StatusJudgmentContext> = {}): StatusJudgmentContext { function createJudgmentContext(overrides: Partial<StatusJudgmentContext> = {}): StatusJudgmentContext {
return { return {
language: 'en', language: 'en',
reportContent: '# Test Report\n\nReport content for testing.',
...overrides, ...overrides,
}; };
} }
@ -1062,8 +1063,8 @@ describe('instruction-builder', () => {
const result = buildStatusJudgmentInstruction(step, ctx); const result = buildStatusJudgmentInstruction(step, ctx);
expect(result).toContain('Review your work results and determine the status'); expect(result).toContain('Review is already complete');
expect(result).toContain('Do NOT perform any additional work'); expect(result).toContain('Output exactly one tag corresponding to the judgment result');
}); });
it('should include header instruction (ja)', () => { it('should include header instruction (ja)', () => {
@ -1077,8 +1078,8 @@ describe('instruction-builder', () => {
const result = buildStatusJudgmentInstruction(step, ctx); const result = buildStatusJudgmentInstruction(step, ctx);
expect(result).toContain('作業結果を振り返り、ステータスを判定してください'); expect(result).toContain('既にレビューは完了しています');
expect(result).toContain('追加の作業は行わないでください'); expect(result).toContain('レポートで示された判定結果に対応するタグを1つだけ出力してください');
}); });
it('should include criteria table with tags', () => { it('should include criteria table with tags', () => {
@ -1132,11 +1133,11 @@ describe('instruction-builder', () => {
const step = createMinimalStep('Do work'); const step = createMinimalStep('Do work');
step.name = 'test'; step.name = 'test';
step.rules = [{ condition: 'Done', next: 'COMPLETE' }]; step.rules = [{ condition: 'Done', next: 'COMPLETE' }];
const ctx: StatusJudgmentContext = {}; const ctx: StatusJudgmentContext = { reportContent: 'Test report content' };
const result = buildStatusJudgmentInstruction(step, ctx); const result = buildStatusJudgmentInstruction(step, ctx);
expect(result).toContain('Review your work results'); expect(result).toContain('Review is already complete');
expect(result).toContain('## Decision Criteria'); expect(result).toContain('## Decision Criteria');
}); });

View File

@ -322,7 +322,7 @@ describe('Instruction Builder IT: buildStatusJudgmentInstruction', () => {
], ],
}); });
const result = buildStatusJudgmentInstruction(step, { language: 'en' }); const result = buildStatusJudgmentInstruction(step, { language: 'en', reportContent: 'Test report content' });
expect(result).toContain('[PLAN:'); expect(result).toContain('[PLAN:');
expect(result).toContain('Clear'); expect(result).toContain('Clear');

View File

@ -0,0 +1,70 @@
/**
* Test for JudgmentDetector
*/
import { describe, it, expect } from 'vitest';
import { JudgmentDetector } from '../core/piece/judgment/JudgmentDetector.js';
describe('JudgmentDetector', () => {
describe('detect', () => {
it('should detect tag in simple response', () => {
const result = JudgmentDetector.detect('[ARCH-REVIEW:1]');
expect(result.success).toBe(true);
expect(result.tag).toBe('[ARCH-REVIEW:1]');
});
it('should detect tag with surrounding text', () => {
const result = JudgmentDetector.detect('Based on the review, I choose [MOVEMENT:2] because...');
expect(result.success).toBe(true);
expect(result.tag).toBe('[MOVEMENT:2]');
});
it('should detect tag with hyphenated movement name', () => {
const result = JudgmentDetector.detect('[AI-ANTIPATTERN-REVIEW:1]');
expect(result.success).toBe(true);
expect(result.tag).toBe('[AI-ANTIPATTERN-REVIEW:1]');
});
it('should detect tag with underscored movement name', () => {
const result = JudgmentDetector.detect('[AI_REVIEW:1]');
expect(result.success).toBe(true);
expect(result.tag).toBe('[AI_REVIEW:1]');
});
it('should detect "判断できない" (Japanese)', () => {
const result = JudgmentDetector.detect('判断できない:情報が不足しています');
expect(result.success).toBe(false);
expect(result.reason).toBe('Conductor explicitly stated it cannot judge');
});
it('should detect "Cannot determine" (English)', () => {
const result = JudgmentDetector.detect('Cannot determine: Insufficient information');
expect(result.success).toBe(false);
expect(result.reason).toBe('Conductor explicitly stated it cannot judge');
});
it('should detect "unable to judge"', () => {
const result = JudgmentDetector.detect('I am unable to judge based on the provided information.');
expect(result.success).toBe(false);
expect(result.reason).toBe('Conductor explicitly stated it cannot judge');
});
it('should fail when no tag and no explicit "cannot judge"', () => {
const result = JudgmentDetector.detect('This is a response without a tag or explicit statement.');
expect(result.success).toBe(false);
expect(result.reason).toBe('No tag found and no explicit "cannot judge" statement');
});
it('should fail on empty response', () => {
const result = JudgmentDetector.detect('');
expect(result.success).toBe(false);
expect(result.reason).toBe('No tag found and no explicit "cannot judge" statement');
});
it('should detect first tag when multiple tags exist', () => {
const result = JudgmentDetector.detect('[MOVEMENT:1] or [MOVEMENT:2]');
expect(result.success).toBe(true);
expect(result.tag).toBe('[MOVEMENT:1]');
});
});
});

View File

@ -0,0 +1,137 @@
/**
* Test for Fallback Strategies
*/
import { describe, it, expect, vi, beforeEach } from 'vitest';
import type { PieceMovement } from '../core/models/types.js';
import type { JudgmentContext } from '../core/piece/judgment/FallbackStrategy.js';
import {
AutoSelectStrategy,
ReportBasedStrategy,
ResponseBasedStrategy,
AgentConsultStrategy,
JudgmentStrategyFactory,
} from '../core/piece/judgment/FallbackStrategy.js';
// Mock runAgent
vi.mock('../agents/runner.js', () => ({
runAgent: vi.fn(),
}));
describe('JudgmentStrategies', () => {
const mockStep: PieceMovement = {
name: 'test-movement',
agent: 'test-agent',
rules: [
{ description: 'Rule 1', condition: 'approved' },
{ description: 'Rule 2', condition: 'rejected' },
],
};
const mockContext: JudgmentContext = {
step: mockStep,
cwd: '/test/cwd',
language: 'en',
reportDir: '/test/reports',
lastResponse: 'Last response content',
sessionId: 'session-123',
};
beforeEach(() => {
vi.clearAllMocks();
});
describe('AutoSelectStrategy', () => {
it('should apply when step has only one rule', () => {
const singleRuleStep: PieceMovement = {
name: 'single-rule',
rules: [{ description: 'Only rule', condition: 'always' }],
};
const strategy = new AutoSelectStrategy();
expect(strategy.canApply({ ...mockContext, step: singleRuleStep })).toBe(true);
});
it('should not apply when step has multiple rules', () => {
const strategy = new AutoSelectStrategy();
expect(strategy.canApply(mockContext)).toBe(false);
});
it('should return auto-selected tag', async () => {
const singleRuleStep: PieceMovement = {
name: 'single-rule',
rules: [{ description: 'Only rule', condition: 'always' }],
};
const strategy = new AutoSelectStrategy();
const result = await strategy.execute({ ...mockContext, step: singleRuleStep });
expect(result.success).toBe(true);
expect(result.tag).toBe('[SINGLE-RULE:1]');
});
});
describe('ReportBasedStrategy', () => {
it('should apply when reportDir and report files are configured', () => {
const strategy = new ReportBasedStrategy();
const stepWithReport: PieceMovement = {
...mockStep,
report: 'review-report.md',
};
expect(strategy.canApply({ ...mockContext, step: stepWithReport })).toBe(true);
});
it('should not apply when reportDir is missing', () => {
const strategy = new ReportBasedStrategy();
expect(strategy.canApply({ ...mockContext, reportDir: undefined })).toBe(false);
});
it('should not apply when step has no report files configured', () => {
const strategy = new ReportBasedStrategy();
// mockStep has no report field → getReportFiles returns []
expect(strategy.canApply(mockContext)).toBe(false);
});
});
describe('ResponseBasedStrategy', () => {
it('should apply when lastResponse is provided', () => {
const strategy = new ResponseBasedStrategy();
expect(strategy.canApply(mockContext)).toBe(true);
});
it('should not apply when lastResponse is missing', () => {
const strategy = new ResponseBasedStrategy();
expect(strategy.canApply({ ...mockContext, lastResponse: undefined })).toBe(false);
});
it('should not apply when lastResponse is empty', () => {
const strategy = new ResponseBasedStrategy();
expect(strategy.canApply({ ...mockContext, lastResponse: '' })).toBe(false);
});
});
describe('AgentConsultStrategy', () => {
it('should apply when sessionId is provided', () => {
const strategy = new AgentConsultStrategy();
expect(strategy.canApply(mockContext)).toBe(true);
});
it('should not apply when sessionId is missing', () => {
const strategy = new AgentConsultStrategy();
expect(strategy.canApply({ ...mockContext, sessionId: undefined })).toBe(false);
});
it('should not apply when sessionId is empty', () => {
const strategy = new AgentConsultStrategy();
expect(strategy.canApply({ ...mockContext, sessionId: '' })).toBe(false);
});
});
describe('JudgmentStrategyFactory', () => {
it('should create strategies in correct order', () => {
const strategies = JudgmentStrategyFactory.createStrategies();
expect(strategies).toHaveLength(4);
expect(strategies[0]).toBeInstanceOf(AutoSelectStrategy);
expect(strategies[1]).toBeInstanceOf(ReportBasedStrategy);
expect(strategies[2]).toBeInstanceOf(ResponseBasedStrategy);
expect(strategies[3]).toBeInstanceOf(AgentConsultStrategy);
});
});
});

View File

@ -112,7 +112,7 @@ export class MovementExecutor {
updateAgentSession(sessionKey, response.sessionId); updateAgentSession(sessionKey, response.sessionId);
this.deps.onPhaseComplete?.(step, 1, 'execute', response.content, response.status, response.error); this.deps.onPhaseComplete?.(step, 1, 'execute', response.content, response.status, response.error);
const phaseCtx = this.deps.optionsBuilder.buildPhaseRunnerContext(state, updateAgentSession, this.deps.onPhaseStart, this.deps.onPhaseComplete); const phaseCtx = this.deps.optionsBuilder.buildPhaseRunnerContext(state, response.content, updateAgentSession, this.deps.onPhaseStart, this.deps.onPhaseComplete);
// Phase 2: report output (resume same session, Write only) // Phase 2: report output (resume same session, Write only)
if (step.report) { if (step.report) {

View File

@ -89,6 +89,7 @@ export class OptionsBuilder {
/** Build PhaseRunnerContext for Phase 2/3 execution */ /** Build PhaseRunnerContext for Phase 2/3 execution */
buildPhaseRunnerContext( buildPhaseRunnerContext(
state: PieceState, state: PieceState,
lastResponse: string | undefined,
updateAgentSession: (agent: string, sessionId: string | undefined) => void, updateAgentSession: (agent: string, sessionId: string | undefined) => void,
onPhaseStart?: (step: PieceMovement, phase: 1 | 2 | 3, phaseName: PhaseName, instruction: string) => void, onPhaseStart?: (step: PieceMovement, phase: 1 | 2 | 3, phaseName: PhaseName, instruction: string) => void,
onPhaseComplete?: (step: PieceMovement, phase: 1 | 2 | 3, phaseName: PhaseName, content: string, status: string, error?: string) => void, onPhaseComplete?: (step: PieceMovement, phase: 1 | 2 | 3, phaseName: PhaseName, content: string, status: string, error?: string) => void,
@ -98,6 +99,7 @@ export class OptionsBuilder {
reportDir: join(this.getProjectCwd(), this.getReportDir()), reportDir: join(this.getProjectCwd(), this.getReportDir()),
language: this.getLanguage(), language: this.getLanguage(),
interactive: this.engineOptions.interactive, interactive: this.engineOptions.interactive,
lastResponse,
getSessionId: (agent: string) => state.agentSessions.get(agent), getSessionId: (agent: string) => state.agentSessions.get(agent),
buildResumeOptions: this.buildResumeOptions.bind(this), buildResumeOptions: this.buildResumeOptions.bind(this),
updateAgentSession, updateAgentSession,

View File

@ -74,7 +74,6 @@ export class ParallelRunner {
}) })
: undefined; : undefined;
const phaseCtx = this.deps.optionsBuilder.buildPhaseRunnerContext(state, updateAgentSession, this.deps.onPhaseStart, this.deps.onPhaseComplete);
const ruleCtx = { const ruleCtx = {
state, state,
cwd: this.deps.getCwd(), cwd: this.deps.getCwd(),
@ -103,6 +102,9 @@ export class ParallelRunner {
updateAgentSession(subSessionKey, subResponse.sessionId); updateAgentSession(subSessionKey, subResponse.sessionId);
this.deps.onPhaseComplete?.(subMovement, 1, 'execute', subResponse.content, subResponse.status, subResponse.error); this.deps.onPhaseComplete?.(subMovement, 1, 'execute', subResponse.content, subResponse.status, subResponse.error);
// Build phase context for this sub-movement with its lastResponse
const phaseCtx = this.deps.optionsBuilder.buildPhaseRunnerContext(state, subResponse.content, updateAgentSession, this.deps.onPhaseStart, this.deps.onPhaseComplete);
// Phase 2: report output for sub-movement // Phase 2: report output for sub-movement
if (subMovement.report) { if (subMovement.report) {
await runReportPhase(subMovement, subIteration, phaseCtx); await runReportPhase(subMovement, subIteration, phaseCtx);

View File

@ -3,6 +3,7 @@
*/ */
import type { PieceMovement } from '../../models/types.js'; import type { PieceMovement } from '../../models/types.js';
import { isReportObjectConfig } from '../instruction/InstructionBuilder.js';
/** /**
* Check whether a movement has tag-based rules (i.e., rules that require * Check whether a movement has tag-based rules (i.e., rules that require
@ -16,3 +17,33 @@ export function hasTagBasedRules(step: PieceMovement): boolean {
const allNonTagConditions = step.rules.every((r) => r.isAiCondition || r.isAggregateCondition); const allNonTagConditions = step.rules.every((r) => r.isAiCondition || r.isAggregateCondition);
return !allNonTagConditions; return !allNonTagConditions;
} }
/**
* Check if a movement has only one branch (automatic selection possible).
* Returns true when rules.length === 1, meaning no actual choice is needed.
*/
export function hasOnlyOneBranch(step: PieceMovement): boolean {
return step.rules !== undefined && step.rules.length === 1;
}
/**
* Get the auto-selected tag when there's only one branch.
* Returns the tag for the first rule (e.g., "[MOVEMENT:1]").
*/
export function getAutoSelectedTag(step: PieceMovement): string {
if (!hasOnlyOneBranch(step)) {
throw new Error('Cannot auto-select tag when multiple branches exist');
}
return `[${step.name.toUpperCase()}:1]`;
}
/**
* Get report file names from a movement's report configuration.
* Handles all three report config formats: string, ReportObjectConfig, and ReportConfig[].
*/
export function getReportFiles(report: PieceMovement['report']): string[] {
if (!report) return [];
if (typeof report === 'string') return [report];
if (isReportObjectConfig(report)) return [report.name];
return report.map((rc) => rc.path);
}

View File

@ -1,8 +1,9 @@
/** /**
* Phase 3 instruction builder (status judgment) * Phase 3 instruction builder (status judgment)
* *
* Resumes the agent session and asks it to evaluate its work * Builds instructions for the conductor agent to evaluate work results
* and output the appropriate status tag. No tools are allowed. * and output the appropriate status tag. Supports report-based and
* response-based input sources.
* *
* Renders a single complete template combining the judgment header * Renders a single complete template combining the judgment header
* and status rules (criteria table + output format). * and status rules (criteria table + output format).
@ -20,6 +21,12 @@ export interface StatusJudgmentContext {
language?: Language; language?: Language;
/** Whether interactive-only rules are enabled */ /** Whether interactive-only rules are enabled */
interactive?: boolean; interactive?: boolean;
/** Pre-read report content (from gatherInput) */
reportContent?: string;
/** Last response from Phase 1 (from gatherInput) */
lastResponse?: string;
/** Input source type for fallback strategies */
inputSource?: 'report' | 'response';
} }
/** /**
@ -47,11 +54,42 @@ export class StatusJudgmentBuilder {
{ interactive: this.context.interactive }, { interactive: this.context.interactive },
); );
// 情報源に応じた内容を構築
const inputSource = this.context.inputSource || 'report';
let contentToJudge = '';
if (inputSource === 'report') {
contentToJudge = this.buildFromReport();
} else if (inputSource === 'response') {
contentToJudge = this.buildFromResponse();
}
return loadTemplate('perform_phase3_message', language, { return loadTemplate('perform_phase3_message', language, {
reportContent: contentToJudge,
criteriaTable: components.criteriaTable, criteriaTable: components.criteriaTable,
outputList: components.outputList, outputList: components.outputList,
hasAppendix: components.hasAppendix, hasAppendix: components.hasAppendix,
appendixContent: components.appendixContent, appendixContent: components.appendixContent,
}); });
} }
/**
* Build judgment content from pre-read report content.
*/
private buildFromReport(): string {
if (!this.context.reportContent) {
throw new Error('reportContent is required for report-based judgment');
}
return this.context.reportContent;
}
/**
* Build judgment content from last response.
*/
private buildFromResponse(): string {
if (!this.context.lastResponse) {
throw new Error('lastResponse is required for response-based judgment');
}
return `\n## Agent Response\n\n${this.context.lastResponse}`;
}
} }

View File

@ -0,0 +1,255 @@
/**
* Fallback strategies for Phase 3 judgment.
*
* Implements Chain of Responsibility pattern to try multiple judgment methods
* when conductor cannot determine the status from report alone.
*/
import { readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import type { PieceMovement, Language } from '../../models/types.js';
import { runAgent } from '../../../agents/runner.js';
import { StatusJudgmentBuilder } from '../instruction/StatusJudgmentBuilder.js';
import { JudgmentDetector, type JudgmentResult } from './JudgmentDetector.js';
import { hasOnlyOneBranch, getAutoSelectedTag, getReportFiles } from '../evaluation/rule-utils.js';
import { createLogger } from '../../../shared/utils/index.js';
const log = createLogger('fallback-strategy');
export interface JudgmentContext {
step: PieceMovement;
cwd: string;
language?: Language;
reportDir?: string;
lastResponse?: string; // Phase 1の最終応答
sessionId?: string;
}
export interface JudgmentStrategy {
readonly name: string;
canApply(context: JudgmentContext): boolean;
execute(context: JudgmentContext): Promise<JudgmentResult>;
}
/**
* Base class for judgment strategies using Template Method Pattern.
*/
abstract class JudgmentStrategyBase implements JudgmentStrategy {
abstract readonly name: string;
abstract canApply(context: JudgmentContext): boolean;
async execute(context: JudgmentContext): Promise<JudgmentResult> {
try {
// 1. 情報収集(サブクラスで実装)
const input = await this.gatherInput(context);
// 2. 指示生成(サブクラスで実装)
const instruction = this.buildInstruction(input, context);
// 3. conductor実行共通
const response = await this.runConductor(instruction, context);
// 4. 結果検出(共通)
return JudgmentDetector.detect(response);
} catch (error) {
const errorMsg = error instanceof Error ? error.message : String(error);
log.debug(`Strategy ${this.name} threw error`, { error: errorMsg });
return {
success: false,
reason: `Strategy failed with error: ${errorMsg}`,
};
}
}
protected abstract gatherInput(context: JudgmentContext): Promise<string>;
protected abstract buildInstruction(input: string, context: JudgmentContext): string;
protected async runConductor(instruction: string, context: JudgmentContext): Promise<string> {
const response = await runAgent('conductor', instruction, {
cwd: context.cwd,
allowedTools: [],
maxTurns: 3,
language: context.language,
});
if (response.status !== 'done') {
throw new Error(`Conductor failed: ${response.error || response.content || 'Unknown error'}`);
}
return response.content;
}
}
/**
* Strategy 1: Auto-select when there's only one branch.
* This strategy doesn't use conductor - just returns the single tag.
*/
export class AutoSelectStrategy implements JudgmentStrategy {
readonly name = 'AutoSelect';
canApply(context: JudgmentContext): boolean {
return hasOnlyOneBranch(context.step);
}
async execute(context: JudgmentContext): Promise<JudgmentResult> {
const tag = getAutoSelectedTag(context.step);
log.debug('Auto-selected tag (single branch)', { tag });
return {
success: true,
tag,
};
}
}
/**
* Strategy 2: Report-based judgment.
* Read report files and ask conductor to judge.
*/
export class ReportBasedStrategy extends JudgmentStrategyBase {
readonly name = 'ReportBased';
canApply(context: JudgmentContext): boolean {
return context.reportDir !== undefined && getReportFiles(context.step.report).length > 0;
}
protected async gatherInput(context: JudgmentContext): Promise<string> {
if (!context.reportDir) {
throw new Error('Report directory not provided');
}
const reportFiles = getReportFiles(context.step.report);
if (reportFiles.length === 0) {
throw new Error('No report files configured');
}
const reportContents: string[] = [];
for (const fileName of reportFiles) {
const filePath = resolve(context.reportDir, fileName);
try {
const content = readFileSync(filePath, 'utf-8');
reportContents.push(`# ${fileName}\n\n${content}`);
} catch (error) {
const errorMsg = error instanceof Error ? error.message : String(error);
throw new Error(`Failed to read report file ${fileName}: ${errorMsg}`);
}
}
return reportContents.join('\n\n---\n\n');
}
protected buildInstruction(input: string, context: JudgmentContext): string {
return new StatusJudgmentBuilder(context.step, {
language: context.language,
reportContent: input,
inputSource: 'report',
}).build();
}
}
/**
* Strategy 3: Response-based judgment.
* Use the last response from Phase 1 to judge.
*/
export class ResponseBasedStrategy extends JudgmentStrategyBase {
readonly name = 'ResponseBased';
canApply(context: JudgmentContext): boolean {
return context.lastResponse !== undefined && context.lastResponse.length > 0;
}
protected async gatherInput(context: JudgmentContext): Promise<string> {
if (!context.lastResponse) {
throw new Error('Last response not provided');
}
return context.lastResponse;
}
protected buildInstruction(input: string, context: JudgmentContext): string {
return new StatusJudgmentBuilder(context.step, {
language: context.language,
lastResponse: input,
inputSource: 'response',
}).build();
}
}
/**
* Strategy 4: Agent consult.
* Resume the Phase 1 agent session and ask which tag is appropriate.
*/
export class AgentConsultStrategy implements JudgmentStrategy {
readonly name = 'AgentConsult';
canApply(context: JudgmentContext): boolean {
return context.sessionId !== undefined && context.sessionId.length > 0;
}
async execute(context: JudgmentContext): Promise<JudgmentResult> {
if (!context.sessionId) {
return {
success: false,
reason: 'Session ID not provided',
};
}
try {
const question = this.buildQuestion(context);
const response = await runAgent(context.step.agent ?? context.step.name, question, {
cwd: context.cwd,
sessionId: context.sessionId,
maxTurns: 3,
language: context.language,
});
if (response.status !== 'done') {
return {
success: false,
reason: `Agent consultation failed: ${response.error || 'Unknown error'}`,
};
}
return JudgmentDetector.detect(response.content);
} catch (error) {
const errorMsg = error instanceof Error ? error.message : String(error);
log.debug('Agent consult strategy failed', { error: errorMsg });
return {
success: false,
reason: `Agent consultation error: ${errorMsg}`,
};
}
}
private buildQuestion(context: JudgmentContext): string {
const rules = context.step.rules || [];
const ruleDescriptions = rules.map((rule, idx) => {
const tag = `[${context.step.name.toUpperCase()}:${idx + 1}]`;
const desc = rule.condition || `Rule ${idx + 1}`;
return `- ${tag}: ${desc}`;
}).join('\n');
const lang = context.language || 'en';
if (lang === 'ja') {
return `あなたの作業結果に基づいて、以下の判定タグのうちどれが適切か教えてください:\n\n${ruleDescriptions}\n\n該当するタグを1つだけ出力してください例: [${context.step.name.toUpperCase()}:1])。`;
} else {
return `Based on your work, which of the following judgment tags is appropriate?\n\n${ruleDescriptions}\n\nPlease output only one tag (e.g., [${context.step.name.toUpperCase()}:1]).`;
}
}
}
/**
* Factory for creating judgment strategies in order of priority.
*/
export class JudgmentStrategyFactory {
static createStrategies(): JudgmentStrategy[] {
return [
new AutoSelectStrategy(),
new ReportBasedStrategy(),
new ResponseBasedStrategy(),
new AgentConsultStrategy(),
];
}
}

View File

@ -0,0 +1,45 @@
/**
* Detect judgment result from conductor's response.
*/
export interface JudgmentResult {
success: boolean;
tag?: string; // e.g., "[ARCH-REVIEW:1]"
reason?: string;
}
export class JudgmentDetector {
private static readonly TAG_PATTERN = /\[([A-Z_-]+):(\d+)\]/;
private static readonly CANNOT_JUDGE_PATTERNS = [
/判断できない/i,
/cannot\s+determine/i,
/unable\s+to\s+judge/i,
/insufficient\s+information/i,
];
static detect(response: string): JudgmentResult {
// 1. タグ検出
const tagMatch = response.match(this.TAG_PATTERN);
if (tagMatch) {
return {
success: true,
tag: tagMatch[0], // e.g., "[ARCH-REVIEW:1]"
};
}
// 2. 「判断できない」検出
for (const pattern of this.CANNOT_JUDGE_PATTERNS) {
if (pattern.test(response)) {
return {
success: false,
reason: 'Conductor explicitly stated it cannot judge',
};
}
}
// 3. タグも「判断できない」もない → 失敗
return {
success: false,
reason: 'No tag found and no explicit "cannot judge" statement',
};
}
}

View File

@ -0,0 +1,18 @@
/**
* Judgment module exports
*/
export {
JudgmentDetector,
type JudgmentResult,
} from './JudgmentDetector.js';
export {
AutoSelectStrategy,
ReportBasedStrategy,
ResponseBasedStrategy,
AgentConsultStrategy,
JudgmentStrategyFactory,
type JudgmentContext,
type JudgmentStrategy,
} from './FallbackStrategy.js';

View File

@ -11,9 +11,8 @@ import type { PieceMovement, Language } from '../models/types.js';
import type { PhaseName } from './types.js'; import type { PhaseName } from './types.js';
import { runAgent, type RunAgentOptions } from '../../agents/runner.js'; import { runAgent, type RunAgentOptions } from '../../agents/runner.js';
import { ReportInstructionBuilder } from './instruction/ReportInstructionBuilder.js'; import { ReportInstructionBuilder } from './instruction/ReportInstructionBuilder.js';
import { StatusJudgmentBuilder } from './instruction/StatusJudgmentBuilder.js'; import { hasTagBasedRules, getReportFiles } from './evaluation/rule-utils.js';
import { hasTagBasedRules } from './evaluation/rule-utils.js'; import { JudgmentStrategyFactory, type JudgmentContext } from './judgment/index.js';
import { isReportObjectConfig } from './instruction/InstructionBuilder.js';
import { createLogger } from '../../shared/utils/index.js'; import { createLogger } from '../../shared/utils/index.js';
const log = createLogger('phase-runner'); const log = createLogger('phase-runner');
@ -27,6 +26,8 @@ export interface PhaseRunnerContext {
language?: Language; language?: Language;
/** Whether interactive-only rules are enabled */ /** Whether interactive-only rules are enabled */
interactive?: boolean; interactive?: boolean;
/** Last response from Phase 1 */
lastResponse?: string;
/** Get agent session ID */ /** Get agent session ID */
getSessionId: (agent: string) => string | undefined; getSessionId: (agent: string) => string | undefined;
/** Build resume options for a movement */ /** Build resume options for a movement */
@ -47,12 +48,6 @@ export function needsStatusJudgmentPhase(step: PieceMovement): boolean {
return hasTagBasedRules(step); return hasTagBasedRules(step);
} }
function getReportFiles(report: PieceMovement['report']): string[] {
if (!report) return [];
if (typeof report === 'string') return [report];
if (isReportObjectConfig(report)) return [report.name];
return report.map((rc) => rc.path);
}
function writeReportFile(reportDir: string, fileName: string, content: string): void { function writeReportFile(reportDir: string, fileName: string, content: string): void {
const baseDir = resolve(reportDir); const baseDir = resolve(reportDir);
@ -152,53 +147,54 @@ export async function runReportPhase(
/** /**
* Phase 3: Status judgment. * Phase 3: Status judgment.
* Resumes the agent session with no tools to ask the agent to output a status tag. * Uses the 'conductor' agent in a new session to output a status tag.
* Implements multi-stage fallback logic to ensure judgment succeeds.
* Returns the Phase 3 response content (containing the status tag). * Returns the Phase 3 response content (containing the status tag).
*/ */
export async function runStatusJudgmentPhase( export async function runStatusJudgmentPhase(
step: PieceMovement, step: PieceMovement,
ctx: PhaseRunnerContext, ctx: PhaseRunnerContext,
): Promise<string> { ): Promise<string> {
log.debug('Running status judgment phase', { movement: step.name });
// フォールバック戦略を順次試行AutoSelectStrategy含む
const strategies = JudgmentStrategyFactory.createStrategies();
const sessionKey = step.agent ?? step.name; const sessionKey = step.agent ?? step.name;
const sessionId = ctx.getSessionId(sessionKey); const judgmentContext: JudgmentContext = {
if (!sessionId) { step,
throw new Error(`Status judgment phase requires a session to resume, but no sessionId found for agent "${sessionKey}" in movement "${step.name}"`); cwd: ctx.cwd,
language: ctx.language,
reportDir: ctx.reportDir,
lastResponse: ctx.lastResponse,
sessionId: ctx.getSessionId(sessionKey),
};
for (const strategy of strategies) {
if (!strategy.canApply(judgmentContext)) {
log.debug(`Strategy ${strategy.name} not applicable, skipping`);
continue;
} }
log.debug('Running status judgment phase', { movement: step.name, sessionId }); log.debug(`Trying strategy: ${strategy.name}`);
ctx.onPhaseStart?.(step, 3, 'judge', `Strategy: ${strategy.name}`);
const judgmentInstruction = new StatusJudgmentBuilder(step, {
language: ctx.language,
interactive: ctx.interactive,
}).build();
ctx.onPhaseStart?.(step, 3, 'judge', judgmentInstruction);
const judgmentOptions = ctx.buildResumeOptions(step, sessionId, {
allowedTools: [],
maxTurns: 3,
});
let judgmentResponse;
try { try {
judgmentResponse = await runAgent(step.agent, judgmentInstruction, judgmentOptions); const result = await strategy.execute(judgmentContext);
if (result.success) {
log.debug(`Strategy ${strategy.name} succeeded`, { tag: result.tag });
ctx.onPhaseComplete?.(step, 3, 'judge', result.tag!, 'done');
return result.tag!;
}
log.debug(`Strategy ${strategy.name} failed`, { reason: result.reason });
} catch (error) { } catch (error) {
const errorMsg = error instanceof Error ? error.message : String(error); const errorMsg = error instanceof Error ? error.message : String(error);
log.debug(`Strategy ${strategy.name} threw error`, { error: errorMsg });
}
}
// 全戦略失敗
const errorMsg = 'All judgment strategies failed';
ctx.onPhaseComplete?.(step, 3, 'judge', '', 'error', errorMsg); ctx.onPhaseComplete?.(step, 3, 'judge', '', 'error', errorMsg);
throw error; throw new Error(errorMsg);
}
// Check for errors in status judgment phase
if (judgmentResponse.status !== 'done') {
const errorMsg = judgmentResponse.error || judgmentResponse.content || 'Unknown error';
ctx.onPhaseComplete?.(step, 3, 'judge', judgmentResponse.content, judgmentResponse.status, errorMsg);
throw new Error(`Status judgment phase failed: ${errorMsg}`);
}
// Update session (phase 3 may update it)
ctx.updateAgentSession(sessionKey, judgmentResponse.sessionId);
ctx.onPhaseComplete?.(step, 3, 'judge', judgmentResponse.content, judgmentResponse.status);
log.debug('Status judgment phase complete', { movement: step.name, status: judgmentResponse.status });
return judgmentResponse.content;
} }

View File

@ -1,10 +1,12 @@
<!-- <!--
template: perform_phase3_message template: perform_phase3_message
phase: 3 (status judgment) phase: 3 (status judgment)
vars: criteriaTable, outputList, hasAppendix, appendixContent vars: reportContent, criteriaTable, outputList, hasAppendix, appendixContent
builder: StatusJudgmentBuilder builder: StatusJudgmentBuilder
--> -->
Review your work results and determine the status. Do NOT perform any additional work. **Review is already complete. Output exactly one tag corresponding to the judgment result shown in the report below.**
{{reportContent}}
## Decision Criteria ## Decision Criteria
@ -12,6 +14,8 @@ Review your work results and determine the status. Do NOT perform any additional
## Output Format ## Output Format
**Output the tag corresponding to the judgment shown in the report in one line:**
{{outputList}} {{outputList}}
{{#if hasAppendix}} {{#if hasAppendix}}

View File

@ -1,10 +1,12 @@
<!-- <!--
template: perform_phase3_message template: perform_phase3_message
phase: 3 (status judgment) phase: 3 (status judgment)
vars: criteriaTable, outputList, hasAppendix, appendixContent vars: reportContent, criteriaTable, outputList, hasAppendix, appendixContent
builder: StatusJudgmentBuilder builder: StatusJudgmentBuilder
--> -->
作業結果を振り返り、ステータスを判定してください。追加の作業は行わないでください。 **既にレビューは完了しています。以下のレポートで示された判定結果に対応するタグを1つだけ出力してください。**
{{reportContent}}
## 判定基準 ## 判定基準
@ -12,6 +14,8 @@
## 出力フォーマット ## 出力フォーマット
**レポートで示した判定に対応するタグを1行で出力してください**
{{outputList}} {{outputList}}
{{#if hasAppendix}} {{#if hasAppendix}}