Parse PDFs with LLMs in Pipeline Builder
Palantir Developers


Use Pipeline Builder and LLMs to transform PDFs into datasets ready for use in your Ontology. Extract text from PDFs, then use LLMs to parse that text into structured data such as total cost, type of expense, and order summary.

Overview

This example shows you how to use Pipeline Builder and LLMs to transform purchase order PDFs into datasets ready for use in your Ontology. You’ll learn to extract text from PDFs and to use LLMs to parse that text into structured data elements like total cost, type of expense, and order summaries.
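Before parsed values land in an Ontology-ready dataset, it helps to validate whatever the LLM returns. The Python sketch below assumes the LLM has been asked to reply with a JSON object using the hypothetical keys `total_cost`, `expense_type`, and `order_summary`. Those key names, the `PurchaseOrderRow` type, and `parse_llm_response` are illustrative assumptions, not part of Pipeline Builder. The sketch only shows the kind of checks (valid JSON, required keys, a numeric total) that keep malformed replies out of the output dataset.

```python
import json
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class PurchaseOrderRow:
    """One structured row per purchase order (hypothetical schema)."""
    total_cost: Decimal
    expense_type: str
    order_summary: str


# Hypothetical key names the LLM is assumed to be prompted to return.
REQUIRED_KEYS = ("total_cost", "expense_type", "order_summary")


def parse_llm_response(raw: str) -> PurchaseOrderRow:
    """Validate an LLM's JSON reply and convert it into a typed row.

    Raises ValueError if the reply is not a JSON object, lacks a required
    key, or has a total_cost that is not a finite number.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")

    # Go through str() so JSON floats like 1234.5 become Decimal("1234.5")
    # exactly; drop thousands separators and a leading dollar sign.
    cleaned = str(data["total_cost"]).strip().replace(",", "").lstrip("$")
    try:
        total = Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"total_cost is not numeric: {data['total_cost']!r}") from exc
    if not total.is_finite():
        raise ValueError(f"total_cost is not finite: {data['total_cost']!r}")

    return PurchaseOrderRow(
        total_cost=total,
        expense_type=str(data["expense_type"]).strip().lower(),
        order_summary=str(data["order_summary"]).strip(),
    )


row = parse_llm_response(
    '{"total_cost": "$1,234.50", "expense_type": "Office Supplies",'
    ' "order_summary": "Printer paper and toner."}'
)
print(row.total_cost, row.expense_type)  # 1234.50 office supplies
```

Storing the total as a `Decimal` rather than a `float` keeps currency values exact, which matters once totals are compared against thresholds or summed for trend analysis.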

Example Uses

  • Accounting workflows
  • Supply chain management
  • Auditing
  • Digitization efforts
  • Trend analysis

Feature Highlights

  • Pipeline Builder Text Extraction Board: extracts text from PDFs using either raw text extraction, which reads the text already embedded in the file, or optical character recognition (OCR), which is needed for scanned documents whose pages are images.
  • Pipeline Builder LLM Node: offers five guided prompts (classification, summarization, translation, sentiment analysis, and entity extraction) and also lets you write a custom prompt from scratch.
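A custom prompt can combine several of the guided tasks in one request: classification for the expense type, entity extraction for the total cost, and summarization for the order summary. The Python sketch below only assembles such a prompt as a string to show its structure. The label set, the wording, and the `build_extraction_prompt` helper are hypothetical, and in Pipeline Builder the prompt itself would be entered in the LLM node rather than built in code.

```python
from typing import Sequence

# Hypothetical label set; replace with your organization's expense categories.
EXPENSE_TYPES = ("office supplies", "equipment", "travel", "services", "other")

# Illustrative template. It contains no literal braces, so str.format only
# fills the {labels} and {document_text} placeholders.
PROMPT_TEMPLATE = """You are reading the extracted text of a purchase order.
Respond with only a JSON object containing these keys:
  "total_cost": the order's grand total as a number, without currency symbols
  "expense_type": exactly one of {labels}
  "order_summary": one or two sentences describing what was ordered

Purchase order text:
{document_text}
"""


def build_extraction_prompt(
    document_text: str, labels: Sequence[str] = EXPENSE_TYPES
) -> str:
    """Fill the template with a fixed label set (the classification part)
    and the extracted PDF text (input for extraction and summarization)."""
    quoted_labels = ", ".join(f'"{label}"' for label in labels)
    return PROMPT_TEMPLATE.format(
        labels=quoted_labels, document_text=document_text.strip()
    )


print(build_extraction_prompt("PO #1042\nToner cartridges x4\nTotal: $312.00"))
```

Constraining `expense_type` to a closed list keeps the classification output consistent across documents, so downstream grouping and alerting do not have to reconcile near-duplicate labels.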

Next Steps

  • Test additional extractions: Apply what you’ve learned here to new data extraction, summarization, and classification use cases.
  • Configure downstream alerts: Build alerts on top of the structured data you’ve extracted (e.g., alerts that notify teams when expenses exceed a certain threshold).
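At its core, a threshold alert like this is a filter over the extracted rows. The Python sketch below shows only that filtering logic and does not use any Foundry alerting feature. The `ExpenseRow` type, its fields, the `rows_over_threshold` function, the sample rows, and the 5,000 threshold are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ExpenseRow:
    """Hypothetical shape of one row in the extracted purchase order dataset."""
    order_id: str
    expense_type: str
    total_cost: Decimal


def rows_over_threshold(
    rows: Iterable[ExpenseRow],
    threshold: Decimal,
    expense_type: Optional[str] = None,
) -> List[ExpenseRow]:
    """Return rows whose total_cost strictly exceeds threshold, largest first.

    If expense_type is given, only rows of that type are considered.
    """
    flagged = [
        row
        for row in rows
        if row.total_cost > threshold
        and (expense_type is None or row.expense_type == expense_type)
    ]
    return sorted(flagged, key=lambda row: row.total_cost, reverse=True)


# Made-up sample rows for illustration only.
sample = [
    ExpenseRow("PO-1", "equipment", Decimal("12500.00")),
    ExpenseRow("PO-2", "office supplies", Decimal("230.10")),
    ExpenseRow("PO-3", "travel", Decimal("5400.00")),
]
for row in rows_over_threshold(sample, Decimal("5000")):
    print(f"ALERT {row.order_id}: {row.expense_type} total {row.total_cost}")
```

The optional `expense_type` filter reflects that different teams may care about different categories, for example a travel team watching only travel expenses.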

Implementation

Pipeline
Transform raw data to produce high-quality, curated datasets that can be used in the Ontology or serve as the foundation for machine learning and analytical workflows.
Parse receipts with LLMs in Pipeline Builder
Actual results and experiences may vary. Substitute notional data with organizational data to deploy an operational workflow.