[알고리즘 특강] 8. Trie

November 14, 2024 2 분 소요

1. 트라이의 정의와 필요성

1.1 트라이란?

정의: 문자열을 저장하고 검색하는데 사용되는 트리 자료구조
특징:
- 각 노드는 하나의 문자를 저장
- 루트에서 리프까지의 경로가 하나의 문자열을 나타냄
- 문자열 검색에 O(m) 시간 복잡도 (m은 문자열 길이)

1.2 트라이의 장단점

장점:
- 빠른 탐색 시간 O(N) (N=문자열 길이)
- Hash와 같은 시간 복잡도
- BST 동작을 응용해서 더 많은 동작 가능
단점:
- 큰 메모리 요구 O(M) (M=삽입된 문자열의 총 길이)
- O(포인터크기 * 포인터배열개수 * 노드개수)
- 예: 1000자 문자열 500개 = 50만개 노드 -> 약 100MB 메모리

2. 트라이 구조

2.1 노드 및 간선

노드 구조:
- 각 노드는 <Key, Value> Map을 가짐
- Key는 알파벳
- Value는 이어지는 자식노드
- 노드 자체보다는 노드에서 이어진 간선이 문자를 가짐
특징:
- root에서 leaf node까지 닿지 못하면, 원래 문자열 집합을 전부 뽑아낼 수 없음
- 코딩상에서 bool 변수 isLastChar 필요
- 부모가 같은 자식 노드들은 공통 접두어를 가짐
- 같은 노드에서 여러 간선이 나올 수 있으나, 서로 다른 라벨(문자)
- 두 문자열의 접두사가 같다면 그 것만큼 트리에서 공유

3. 트라이 구현

3.1 C++ 구현

class TrieNode {
public:
    TrieNode* children[26];  // 알파벳 소문자만 가정
    bool isEndOfWord;
    
    TrieNode() {
        for (int i = 0; i < 26; i++) {
            children[i] = nullptr;
        }
        isEndOfWord = false;
    }
};

class Trie {
private:
    TrieNode* root;
    
public:
    Trie() {
        root = new TrieNode();
    }
    
    void insert(const string& word) {
        TrieNode* current = root;
        for (char c : word) {
            int index = c - 'a';
            if (current->children[index] == nullptr) {
                current->children[index] = new TrieNode();
            }
            current = current->children[index];
        }
        current->isEndOfWord = true;
    }
    
    bool search(const string& word) {
        TrieNode* current = root;
        for (char c : word) {
            int index = c - 'a';
            if (current->children[index] == nullptr) {
                return false;
            }
            current = current->children[index];
        }
        return current->isEndOfWord;
    }
    
    bool startsWith(const string& prefix) {
        TrieNode* current = root;
        for (char c : prefix) {
            int index = c - 'a';
            if (current->children[index] == nullptr) {
                return false;
            }
            current = current->children[index];
        }
        return true;
    }
};

3.2 Java 구현

import java.util.HashMap;
import java.util.Map;

public class TrieNode {
    private Map<Character, TrieNode> childNodes = new HashMap<>();
    private boolean isLastChar;
    
    Map<Character, TrieNode> getChildNodes() {
        return this.childNodes;
    }
    
    boolean isLastChar() {
        return this.isLastChar;
    }
    
    void setIsLastChar(boolean isLastChar) {
        this.isLastChar = isLastChar;
    }
}

class Trie {
    private TrieNode rootNode;
    
    Trie() {
        rootNode = new TrieNode();
    }
    
    void insert(String word) {
        TrieNode thisNode = this.rootNode;
        for (int i = 0; i < word.length(); i++) {
            thisNode = thisNode.getChildNodes().computeIfAbsent(word.charAt(i),
                    c -> new TrieNode());
        }
        thisNode.setIsLastChar(true);
    }
    
    boolean contains(String word) {
        TrieNode thisNode = this.rootNode;
        for (int i = 0; i < word.length(); i++) {
            char character = word.charAt(i);
            TrieNode node = thisNode.getChildNodes().get(character);
            if (node == null)
                return false;
            thisNode = node;
        }
        return thisNode.isLastChar();
    }
    
    void delete(String word) {
        delete(this.rootNode, word, 0);
    }
    
    private void delete(TrieNode thisNode, String word, int index) {
        char character = word.charAt(index);
        if(!thisNode.getChildNodes().containsKey(character))
            throw new Error("There is no [" + word + "] in this Trie.");

        TrieNode childNode = thisNode.getChildNodes().get(character);
        index++;

        if(index == word.length()) {
            if (!childNode.isLastChar())
                throw new Error("There is no [" + word + "] in this Trie.");
            childNode.setIsLastChar(false);

            if (childNode.getChildNodes().isEmpty())
                thisNode.getChildNodes().remove(character);
        } else {
            delete(childNode, word, index);
            if(!childNode.isLastChar() && childNode.getChildNodes().isEmpty())
                thisNode.getChildNodes().remove(character);
        }
    }
}

4. 트라이의 응용

4.1 자동완성

구현 방식:
- 트라이에 단어들을 저장
- 접두어로 검색하여 일치하는 모든 단어 찾기
장점:
- 빠른 검색 속도
- 메모리 효율적

4.2 철자 검사기

구현 방식:
- 사전의 단어들을 트라이에 저장
- 입력된 단어가 사전에 있는지 검사
장점:
- 실시간 검사 가능
- 오타 수정 제안 가능

Twitter Facebook LinkedIn

Baymax

[알고리즘 특강] 8. Trie

1. 트라이의 정의와 필요성

1.1 트라이란?

1.2 트라이의 장단점

2. 트라이 구조

2.1 노드 및 간선

3. 트라이 구현

3.1 C++ 구현

3.2 Java 구현

4. 트라이의 응용

4.1 자동완성

4.2 철자 검사기

공유하기

댓글남기기

참고

[LLM] 2. LLM의 핵심 구성 요소

[LLM] 1. LLM의 기본 개념 이해

[LeetCode] 리트코드 - Single Number (C++)

[LeetCode] 리트코드 - Rotate Array (C++)