# API Reference — Indirect Prompt Injection Detection ## LLM Guard Install: `pip install llm-guard` | API | Description | |-----|-------------| | `from llm_guard.input_scanners import PromptInjection` | Import the injection scanner | | `PromptInjection(threshold=0.5, match_type=MatchType.FULL)` | Construct scanner (FULL or SENTENCE match) | | `scanner.scan(text)` | Returns `(sanitized_text, is_valid, risk_score)` | | `from llm_guard import scan_prompt` | Run multiple scanners over a prompt | `is_valid == False` indicates an injection was detected. ## Transformers detector models Install: `pip install transformers torch` | API | Description | |-----|-------------| | `pipeline("text-classification", model=...)` | Load a classifier pipeline | | `protectai/deberta-v3-base-prompt-injection-v2` | Open prompt-injection classifier (labels: SAFE / INJECTION) | | `meta-llama/Llama-Prompt-Guard-2-86M` | Meta jailbreak/injection classifier (gated license) | ## Content extraction | API | Description | |-----|-------------| | `BeautifulSoup(html, "html.parser")` | Parse HTML | | `soup.find_all(string=lambda t: isinstance(t, Comment))` | Extract HTML comments | | `pypdf.PdfReader(path).pages[i].extract_text()` | Extract PDF text | | `pytesseract.image_to_string(Image.open(path))` | OCR text from an image | | `PIL.Image.open(path)._getexif()` | Read EXIF metadata | ## Normalization helpers | Technique | Method | |-----------|--------| | Strip zero-width chars | `str.translate` over U+200B..U+FEFF | | Strip Unicode tag chars | filter ord in range 0xE0000-0xE007F | | Canonicalize | `unicodedata.normalize("NFKC", text)` | | Decode Base64 | `base64.b64decode(token)` | | Decode ROT13 | `codecs.decode(text, "rot_13")` | ## External References - LLM Guard PromptInjection docs: https://llm-guard.com/input_scanners/prompt_injection/ - Hugging Face transformers pipelines: https://huggingface.co/docs/transformers/main_classes/pipelines - pytesseract: https://github.com/madmaze/pytesseract