Learn LlamaIndex: Agents and Tools

Background

LlamaIndex is an exceptional open-source project that offers a wide range of capabilities. It serves as a valuable resource for practitioners and developers seeking to enhance their Large Language Models (LLMs) with versatile tools and data. Designed to foster seamless integration between custom data sources and LLM applications, LlamaIndex is a simple yet highly flexible data framework. Driven by the rapidly growing demand for sophisticated language processing, it aims to empower users by making data retrieval and utilization effortless, so developers can streamline their workflows and enrich their language models with a diverse range of data sources.

However, the LlamaIndex codebase is extensive and diverse, and it can be challenging to grasp the entire project at once. Therefore, I am interested in learning LlamaIndex through its source code.

In this blog series, I will document my learning experience with LlamaIndex. Today I will focus on Tools and Agents.

Tools

Having proper tool abstractions is crucial when it comes to building effective data agents. In the context of agents, tools are designed as a set of abstractions similar to API interfaces, but intended for agent use rather than for humans. These tools let an agent perform various functions and interact with external systems in a structured manner.

A Tool is implemented by defining a special `call` method, which acts as the entry point for the tool and performs the relevant functionality. A tool also exposes basic metadata such as its name, description, and function schema. This metadata helps agents understand the purpose and capabilities of each tool.

However, defining a single tool may not always be sufficient for complex tasks. That’s where Tool Specs come into play. A Tool Spec allows users to define a complete API specification for a service, which can then be converted into a list of individual tools. By encapsulating multiple tools under a common specification, Tool Specs provide a more structured and organized approach to interacting with complex agent functionalities.

With a Tool Spec, users can easily access and utilize a range of tools that collectively serve a specific purpose or provide a comprehensive set of functionalities. This ensures flexibility and convenience when working with diverse agent capabilities or complex data tasks.

Tool Example


from llama_index.tools import FunctionTool


def add_numbers(x: int, y: int) -> int:
    """
    Adds the two numbers together and returns the result.
    """
    return x + y


function_tool = FunctionTool.from_defaults(fn=add_numbers)
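
Once wrapped, the tool can be invoked directly; the wrapper infers the name, description, and schema from the function. A quick check (the prints are mine, assuming the default metadata inference):

# Calling the tool returns a ToolOutput wrapping the function's result.
output = function_tool(x=1, y=2)
print(output.content)               # "3"
print(function_tool.metadata.name)  # "add_numbers"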

ToolSpec Example

import subprocess
import sys

from llama_index.tools.tool_spec.base import BaseToolSpec


class CodeInterpreterToolSpec(BaseToolSpec):
    """Code Interpreter tool spec.

    WARNING: This tool provides the Agent access to the `subprocess.run` command.
    Arbitrary code execution is possible on the machine running this tool.
    This tool is not recommended to be used in a production setting, and would require heavy sandboxing or virtual machines

    """

    spec_functions = ["code_interpreter"]

    def code_interpreter(self, code: str):
        """
        A function to execute python code, and return the stdout and stderr

        You should import any libraries that you wish to use. You have access to any libraries the user has installed.

        The code passed to this function is executed in isolation. It should be complete at the time it is passed to this function.

        You should interpret the output and errors returned from this function, and attempt to fix any problems.
        If you cannot fix the error, show the code to the user and ask for help

        It is not possible to return graphics or other complicated data from this function. If the user cannot see the output, save it to a file and tell the user.
        """
        result = subprocess.run(
            [sys.executable, "-c", code], stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        return f"StdOut:\n{result.stdout}\nStdErr:\n{result.stderr}"

In this example, we have a tool spec that defines a set of spec functions. The `spec_functions` list declares which methods are exposed to agents. It is crucial for agents to understand the functionality and required arguments of each function, which is why the docstring plays a critical role: it describes what the function does and the arguments it takes.

How to use Tool and ToolSpec

LlamaIndex parses tool specs and generates tool metadata for you, so the resulting tools can be used directly in an agent.

from llama_index.agent import OpenAIAgent

code_spec = CodeInterpreterToolSpec()

agent = OpenAIAgent.from_tools(code_spec.to_tool_list())
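
To see what was generated, you can inspect the tools produced by the spec and then chat with the agent. A small sketch (the example prompt and the prints are mine, not from the library):

# Each spec function becomes a FunctionTool with generated metadata.
tools = code_spec.to_tool_list()
for tool in tools:
    print(tool.metadata.name)         # e.g. "code_interpreter"
    print(tool.metadata.description)  # name + signature + docstring

# The agent can now decide to call the tool while answering.
response = agent.chat("Use Python to print the first 10 square numbers.")
print(response)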

Source code understanding

In the source code, we can see that the tool spec is parsed into a list of tools, which is then used to initialize the agent.

import asyncio
from inspect import signature
from typing import Dict, List, Optional, Tuple, Type, Union

from pydantic import BaseModel

from llama_index.tools.function_tool import FunctionTool
from llama_index.tools.types import ToolMetadata
from llama_index.tools.utils import create_schema_from_function

class BaseToolSpec:
    """Base tool spec class."""

    # list of functions that you'd want to convert to spec
    spec_functions: List[Union[str, Tuple[str, str]]]

    def get_fn_schema_from_fn_name(self, fn_name: str) -> Optional[Type[BaseModel]]:
        """Return map from function name.

        Return type is Optional, meaning that the schema can be None.
        In this case, it's up to the downstream tool implementation to infer the schema.

        """
        for fn in self.spec_functions:
            if fn == fn_name:
                return create_schema_from_function(fn_name, getattr(self, fn_name))

        raise ValueError(f"Invalid function name: {fn_name}")

    def get_metadata_from_fn_name(self, fn_name: str) -> Optional[ToolMetadata]:
        """Return map from function name.

        Return type is Optional, meaning that the schema can be None.
        In this case, it's up to the downstream tool implementation to infer the schema.

        """
        try:
            func = getattr(self, fn_name)
        except AttributeError:
            return None
        name = fn_name
        docstring = func.__doc__ or ""
        description = f"{name}{signature(func)}\n{docstring}"
        fn_schema = self.get_fn_schema_from_fn_name(fn_name)
        return ToolMetadata(name=name, description=description, fn_schema=fn_schema)

    def to_tool_list(
        self,
        func_to_metadata_mapping: Optional[Dict[str, ToolMetadata]] = None,
    ) -> List[FunctionTool]:
        """Convert tool spec to list of tools."""
        func_to_metadata_mapping = func_to_metadata_mapping or {}
        tool_list = []
        for func_spec in self.spec_functions:
            func_sync = None
            func_async = None
            if isinstance(func_spec, str):
                func = getattr(self, func_spec)
                if asyncio.iscoroutinefunction(func):
                    func_async = func
                else:
                    func_sync = func
                metadata = func_to_metadata_mapping.get(func_spec, None)
                if metadata is None:
                    metadata = self.get_metadata_from_fn_name(func_spec)
            ...

            ...

            tool = FunctionTool.from_defaults(
                fn=func_sync,
                async_fn=func_async,
                tool_metadata=metadata,
            )
            tool_list.append(tool)
        return tool_list

Here `fn_schema` is a Pydantic model that defines the function's inputs. The Pydantic documentation explains what a Pydantic model is; in short, it validates arguments and provides an easy way to dump a JSON schema for the LLM.

The logic of `create_schema_from_function` lives in `llama_index/tools/utils.py`.
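
The core idea is straightforward: inspect the function's signature and build a Pydantic model from it. Below is a simplified sketch of that logic, not the library's exact implementation (which also handles `*args`/`**kwargs` and more field details):

from inspect import signature
from typing import Any, Type

from pydantic import BaseModel, create_model


def simple_schema_from_function(name: str, func) -> Type[BaseModel]:
    """Build a Pydantic model whose fields mirror the function's parameters."""
    fields = {}
    for param in signature(func).parameters.values():
        if param.name == "self":
            continue  # skip the bound-method receiver for spec methods
        annotation = param.annotation if param.annotation is not param.empty else Any
        default = param.default if param.default is not param.empty else ...
        fields[param.name] = (annotation, default)
    return create_model(name, **fields)


# The generated schema can be dumped as JSON for the LLM to read.
schema = simple_schema_from_function("add_numbers", add_numbers)
print(schema.schema_json())  # pydantic v1; pydantic v2 uses model_json_schema()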

Agents

In the realm of advanced language models, one crucial component that drives efficient and accurate results is the “agent.” Agents serve as the backbone of automated reasoning and decision-making, bridging user queries and the execution of tasks. Through their ability to break down complex questions, select external tools and parameters, plan out tasks, and utilize memory modules, agents streamline the interaction between users and language models. In this blog post, we will delve deeper into the key components of agents and explore how they work in LlamaIndex.

Choosing an external tool and defining its parameters is another vital responsibility of agents. Based on the requirements of a particular task, an agent evaluates a range of tools and selects the one that will yield the best results. Agents are also equipped to derive optimal parameters for these tools, fine-tuning their usage for each situation. This decision-making is critical to the effectiveness and efficiency of the chosen tools.

Agents excel in their ability to strategize and plan out a sequence of tasks necessary to accomplish a user’s query. Drawing upon their knowledge and understanding of various tools and methodologies, agents devise a plan of action that will yield the best results. This planning process is crucial in ensuring that the agent can execute the task in the most efficient manner possible.

AgentWorker and AgentRunner

LlamaIndex “agents” are composed of AgentRunner objects that interact with AgentWorkers:

  • AgentRunners are orchestrators that store state (including conversational memory), create and maintain tasks, run steps through each task, and offer the user-facing, high-level interface for users to interact with.
  • AgentWorkers control the step-wise execution of a Task. Given an input step, an agent worker is responsible for generating the next step. They can be initialized with parameters and act upon state passed down from the Task/TaskStep objects, but do not inherently store state themselves. The outer AgentRunner is responsible for calling an AgentWorker and collecting/aggregating the results.
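
In code, this division of labor shows up as a simple driver loop: the runner creates a task, repeatedly asks the worker for the next step, and finalizes the response when the last step reports it is done. The sketch below follows the public AgentRunner API (which OpenAIAgent inherits); treat it as illustrative rather than the library's exact control flow:

# Illustrative driver loop over the low-level agent API (a sketch).
task = agent.create_task("What is 123 + 456?")
step_output = agent.run_step(task.task_id)
while not step_output.is_last:
    step_output = agent.run_step(task.task_id)
response = agent.finalize_response(task.task_id)
print(response)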

Task and Step

Here LlamaIndex follows the agent protocol.

  • Task: high-level task, takes in a user query + passes along other info like memory
  • TaskStep: represents a single step. Feed it as input to an AgentWorker and get back a TaskStepOutput. Completing a Task can involve multiple TaskSteps.
  • TaskStepOutput: Output from a given step execution. Outputs whether or not a task is done.

from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class TaskStep(BaseModel):
    task_id: str
    step_id: str
    input: Optional[str] = None
    step_state: Dict[str, Any] = {}
    next_steps: Dict[str, "TaskStep"] = {}
    prev_steps: Dict[str, "TaskStep"] = {}
    is_ready: bool = True

    def get_next_step(self, step_id: str, input: Optional[str] = None, step_state: Optional[Dict[str, Any]] = None) -> "TaskStep":
        return TaskStep(task_id=self.task_id, step_id=step_id, input=input, step_state=step_state or self.step_state)

    def link_step(self, next_step: "TaskStep") -> None:
        self.next_steps[next_step.step_id] = next_step
        next_step.prev_steps[self.step_id] = self

class TaskStepOutput(BaseModel):
    output: Any
    task_step: TaskStep
    next_steps: List[TaskStep]
    is_last: bool = False

class Task(BaseModel):
    task_id: str
    input: str
    memory: Any
    callback_manager: Any
    extra_state: Dict[str, Any] = {}

TaskStep forms a linked structure: each step holds `next_steps` and `prev_steps`, which represent the flow of the task.
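
For example, linking two steps wires up both directions (the step ids here are made up for illustration):

# Hypothetical step ids, purely for illustration.
step1 = TaskStep(task_id="task-1", step_id="step-1", input="first step")
step2 = step1.get_next_step(step_id="step-2", input="second step")
step1.link_step(step2)

assert step1.next_steps["step-2"] is step2
assert step2.prev_steps["step-1"] is step1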

AgentWorker

The core of the worker interface:

from abc import abstractmethod

from llama_index.prompts.mixin import PromptMixin


class BaseAgentWorker(PromptMixin):
    """Base agent worker."""

    @abstractmethod
    def initialize_step(self, task: Task, **kwargs: Any) -> TaskStep:
        """Initialize step from task."""

    @abstractmethod
    def run_step(self, step: TaskStep, task: Task, **kwargs: Any) -> TaskStepOutput:
        """Run step."""

    @abstractmethod
    def finalize_task(self, task: Task, **kwargs: Any) -> None:
        """Finalize task, after all the steps are completed."""

AgentRunner


from abc import ABC, abstractmethod


class BaseAgentRunner(BaseAgent, ABC):
    @abstractmethod
    def create_task(self, input: str, **kwargs: Any) -> Task:
        pass

    @abstractmethod
    def delete_task(self, task_id: str) -> None:
        pass

    @abstractmethod
    def list_tasks(self, **kwargs: Any) -> List[Task]:
        pass

    @abstractmethod
    def get_task(self, task_id: str, **kwargs: Any) -> Task:
        pass

    @abstractmethod
    def get_upcoming_steps(self, task_id: str, **kwargs: Any) -> List[TaskStep]:
        pass

    @abstractmethod
    def get_completed_steps(self, task_id: str, **kwargs: Any) -> List[TaskStepOutput]:
        pass

    def get_completed_step(self, task_id: str, step_id: str, **kwargs: Any) -> TaskStepOutput:
        completed_steps = self.get_completed_steps(task_id, **kwargs)
        for step_output in completed_steps:
            if step_output.task_step.step_id == step_id:
                return step_output
        raise ValueError(f"Could not find step_id: {step_id}")

    @abstractmethod
    def run_step(self, task_id: str, input: Optional[str] = None, step: Optional[TaskStep] = None, **kwargs: Any) -> TaskStepOutput:
        """Run a step: implementations call the LLM to execute the step and determine the next one."""
    @abstractmethod
    def finalize_response(self, task_id: str, step_output: Optional[TaskStepOutput] = None) -> Any:
        pass

    @abstractmethod
    def undo_step(self, task_id: str) -> None:
        pass

from collections import deque
from typing import Deque

from pydantic import Field


class TaskState(BaseModel):
    """Task state."""

    task: Task = Field(..., description="Task.")
    step_queue: Deque[TaskStep] = Field(
        default_factory=deque, description="Task step queue."
    )
    completed_steps: List[TaskStepOutput] = Field(
        default_factory=list, description="Completed step outputs."
    )


class AgentState(BaseModel):
    """Agent state."""

    task_dict: Dict[str, TaskState] = Field(
        default_factory=dict, description="Task dictionary."
    )

class AgentRunner(BaseAgentRunner):
    """Agent runner.

    Top-level agent orchestrator that can create tasks, run each step in a task,
    or run a task e2e. Stores state and keeps track of tasks.

    Args:
        agent_worker (BaseAgentWorker): step executor
        chat_history (Optional[List[ChatMessage]], optional): chat history. Defaults to None.
        state (Optional[AgentState], optional): agent state. Defaults to None.
        memory (Optional[BaseMemory], optional): memory. Defaults to None.
        llm (Optional[LLM], optional): LLM. Defaults to None.
        callback_manager (Optional[CallbackManager], optional): callback manager. Defaults to None.
        init_task_state_kwargs (Optional[dict], optional): init task state kwargs. Defaults to None.

    """
    def __init__(
        self,
        agent_worker: BaseAgentWorker,
        chat_history: Optional[List[ChatMessage]] = None,
        state: Optional[AgentState] = None,
        memory: Optional[BaseMemory] = None,
        llm: Optional[LLM] = None,
        callback_manager: Optional[CallbackManager] = None,
        init_task_state_kwargs: Optional[dict] = None,
        delete_task_on_finish: bool = False,
        default_tool_choice: str = "auto",
        verbose: bool = False,
    ) -> None:
        """Initialize."""
        self.agent_worker = agent_worker
        self.state = state or AgentState()
        self.memory = memory or ChatMemoryBuffer.from_defaults(chat_history, llm=llm)
        self.llm = llm
        ...

    def create_task(self, input: str, **kwargs: Any) -> Task:
        """Create task."""
        if not self.init_task_state_kwargs:
            extra_state = kwargs.pop("extra_state", {})
        else:
            if "extra_state" in kwargs:
                raise ValueError(
                    "Cannot specify both `extra_state` and `init_task_state_kwargs`"
                )
            else:
                extra_state = self.init_task_state_kwargs

        callback_manager = kwargs.pop("callback_manager", self.callback_manager)
        task = Task(
            input=input,
            memory=self.memory,
            extra_state=extra_state,
            callback_manager=callback_manager,
            **kwargs,
        )
        
        # get initial step from task, and put it in the step queue
        initial_step = self.agent_worker.initialize_step(task)
        task_state = TaskState(
            task=task,
            step_queue=deque([initial_step]),
        )
        # add it to state
        self.state.task_dict[task.task_id] = task_state

        return task


    def _run_step(
        self,
        task_id: str,
        step: Optional[TaskStep] = None,
        input: Optional[str] = None,
        mode: ChatResponseMode = ChatResponseMode.WAIT,
        **kwargs: Any,
    ) -> TaskStepOutput:
        """Execute step."""
        task = self.state.get_task(task_id)
        step_queue = self.state.get_step_queue(task_id)
        step = step or step_queue.popleft()
        if input is not None:
            step.input = input

        # execute the step with the agent worker
        cur_step_output = self.agent_worker.run_step(step, task, **kwargs)

        # append cur_step_output next steps to queue
        next_steps = cur_step_output.next_steps
        step_queue.extend(next_steps)

        # add cur_step_output to completed steps
        completed_steps = self.state.get_completed_steps(task_id)
        completed_steps.append(cur_step_output)

        return cur_step_output
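
Putting the pieces together, the toy worker from earlier can be driven end to end through AgentRunner. Again a sketch against the trimmed interfaces quoted in this post; the real runner threads callbacks, memory, and chat modes through these calls:

# Wire the toy worker into the runner and execute one task.
runner = AgentRunner(agent_worker=EchoAgentWorker())
task = runner.create_task("hello agents")
step_output = runner.run_step(task.task_id)
print(step_output.is_last)  # True: our toy worker finishes in one step
print(step_output.output)   # "echo: hello agents"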

Conclusion

In this blog post, we have explored the key components of LlamaIndex agents and tools. We have seen how tools and agents work together, from FunctionTool and ToolSpec to AgentRunner and AgentWorker. By understanding the inner workings of these components, we can gain a deeper appreciation for what LlamaIndex can do and how it can be leveraged to enhance language models. In the next post, I will continue exploring LlamaIndex and delve into more advanced topics. Stay tuned for more insights and learning experiences!
