Example: importing and using a llama-family model that is provided as gguf

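How to download a llama-family model that is kindly provided in gguf format, and how to add it to ollama.

A while ago I watched a YouTube lecture on registering gguf files and took notes. On the cold winter night of December 31, 2024, I am backing up that memory here, based on those notes.

I removed the version I had written by hand and updated the post with formatting cleaned up by Copilot.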

Adding the Korean model Bllossom/llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M

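Background:

While chatting with MS Copilot, I came across a Korean-trained model based on llama 3.2.

Model currently in use:

EEVE-Korean-10.8B (about 7.7 GB).

Introduction to the llama 3.2-based Korean model:

Introduction to the Korean model based on deepseek-ai/DeepSeek-R1-Distill-Qwen-7B:

Model configuration file (Modelfile):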

FROM llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M.gguf

PARAMETER temperature 0.6
PARAMETER top_p 0.9

TEMPLATE """<|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

{{ if .System }}{{ .System }}
{{- end }}
{{- if .Tools }}When you receive a tool call response, use the output to format an answer to the orginal user question.

You are a helpful assistant with tool calling capabilities.
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ range $.Tools }}
{{- . }}
{{ end }}
{{ .Content }}<|eot_id|>
{{- else }}

{{ .Content }}<|eot_id|>
{{- end }}{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}
{{ range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}
{{- end }}{{ if not $last }}<|eot_id|>{{ end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}"""

SYSTEM """You are a helpful AI assistant. Please answer the user's questions kindly. 당신은 유능한 AI 어시스턴트 입니다. 사용자의 질문에 대해 친절하게 답변해주세요."""
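The FROM, PARAMETER, and SYSTEM parts of the Modelfile above can also be generated from a script, which may help when several gguf files are registered with the same settings. This is a minimal Python sketch and not part of the original notes: the helper name `render_modelfile` is hypothetical, and the TEMPLATE block is deliberately left out, so it would still have to be pasted in by hand for the llama 3.2 chat format.

```python
def render_modelfile(gguf_file, parameters, system_prompt):
    """Return Modelfile text with FROM, PARAMETER, and SYSTEM instructions.

    Hypothetical helper for illustration; the TEMPLATE block is not generated.
    """
    lines = [f"FROM {gguf_file}", ""]
    for name, value in parameters.items():
        # One PARAMETER line per sampling option, e.g. "PARAMETER top_p 0.9".
        lines.append(f"PARAMETER {name} {value}")
    lines.append("")
    lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


text = render_modelfile(
    "llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M.gguf",
    {"temperature": 0.6, "top_p": 0.9},
    "You are a helpful AI assistant. Please answer the user's questions kindly.",
)
print(text)
```

The returned text could then be written to a file named Modelfile next to the gguf file.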

How to add the model:

Download the gguf file

Download it with a browser or with the Hugging Face CLI.

bash

huggingface-cli download Bllossom/llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M --local-dir m:\temp\llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M --local-dir-use-symlinks False

Place the gguf file and the Modelfile in the same folder
- gguf file: M:\temp\llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M\
- Modelfile: M:\temp\llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M\
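The download and placement steps above can also be sketched in Python. This assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `model_dir` and `download_model` are hypothetical and not from the original notes. The target folder is derived from the repository name so that it matches the folder listed above.

```python
from pathlib import PureWindowsPath

REPO_ID = "Bllossom/llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M"


def model_dir(repo_id, base_dir=r"M:\temp"):
    """Return the folder that will hold both the gguf file and the Modelfile."""
    # Use the part after the "/" in the repository id as the folder name.
    return str(PureWindowsPath(base_dir) / repo_id.split("/")[-1])


def download_model(repo_id=REPO_ID):
    """Download every file of the repository into model_dir(repo_id)."""
    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=model_dir(repo_id))
```

Calling `download_model()` performs the actual download; `model_dir(REPO_ID)` only computes the path and needs no network access.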

Register the model and check that it is listed:

bash

ollama create llama-3.2-Korean-Bllossom-3B -f llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M/Modelfile
ollama list
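If the registration needs to be repeated for several models, the same two commands can be driven from Python. The sketch below only wraps the `ollama create` and `ollama list` commands shown above with the standard `subprocess` module; the function names are hypothetical, and it assumes `ollama` is on the PATH and that the script runs from the folder that contains the model directory.

```python
import subprocess


def create_command(model_name, modelfile_path):
    """Build the argument list for `ollama create <name> -f <Modelfile>`."""
    return ["ollama", "create", model_name, "-f", modelfile_path]


def register_model(model_name, modelfile_path):
    """Run `ollama create`, then return the text printed by `ollama list`."""
    # check=True raises CalledProcessError if ollama exits with an error.
    subprocess.run(create_command(model_name, modelfile_path), check=True)
    listing = subprocess.run(
        ["ollama", "list"], check=True, capture_output=True, text=True
    )
    return listing.stdout
```

For the model in this post, the call would be `register_model("llama-3.2-Korean-Bllossom-3B", "llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M/Modelfile")`.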

Check that the model runs:

bash

ollama run llama-3.2-Korean-Bllossom-3B
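Besides the interactive `ollama run` session, the registered model can probably also be checked from a script through Ollama's local REST API. The sketch below is an assumption-laden illustration, not something from the original notes: it assumes the Ollama server is listening on its default address `http://localhost:11434`, uses the `/api/generate` endpoint with streaming turned off, and the helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a POST request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask(model, prompt):
    """Send the prompt to the local Ollama server and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `ask("llama-3.2-Korean-Bllossom-3B", "your question")["response"]` should hold the generated text.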

Conclusion:

2024 brought many absurd events and major disasters, but I am looking forward to a peaceful 2025.

Comment from SilverBullet…

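Performance comparison: 3B vs 10.8B

On the same hardware, the 3B model responded much faster. The prompts and answers in the transcripts below were originally in Korean.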
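The speed difference was judged by feel here. One way to put a number on it, sketched below, is to compute tokens per second from the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint includes in a non-streaming reply (the duration is in nanoseconds). The function name is hypothetical and the sample values are made up for illustration; they are not measurements from this comparison.

```python
def tokens_per_second(reply):
    """Compute generation speed from an Ollama /api/generate reply.

    eval_count is the number of generated tokens and eval_duration is the
    generation time in nanoseconds.
    """
    seconds = reply["eval_duration"] / 1e9
    return reply["eval_count"] / seconds


# Made-up numbers for illustration only, not measurements from this post.
example_reply = {"eval_count": 100, "eval_duration": 2_000_000_000}
print(tokens_per_second(example_reply))
```

Running the same prompt against both models and comparing the two results would give a rough throughput comparison on a given machine.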
----
M:\temp>ollama run llama-3.2-Korean-Bllossom-3B
>>>
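>>> Cheolsu had 20 pencils. Younghee took half of them and Minsu took the remaining 5. How many pencils
...
... does Cheolsu have left?
If Younghee takes half of Cheolsu's 20 pencils, the number of pencils Cheolsu has becomes 20/2 = 10.

Now, since Minsu took the remaining 5, the number of pencils Cheolsu has left is 10 - 5 = 5.

Therefore, the number of pencils Cheolsu has left is 5.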

>>> /bye

M:\temp>ollama list
NAME                                   ID              SIZE      MODIFIED
llama-3.2-Korean-Bllossom-3B:latest    964e2040fa94    2.0 GB    30 minutes ago
llama3.1:latest                        46e0c10c039e    4.9 GB    5 days ago
llama3.3:latest                        a6eb4748fd29    42 GB     5 days ago
llama3.2-vision:latest                 38107a0cd119    7.9 GB    4 weeks ago
llama3.2:latest                        a80c4f17acd5    2.0 GB    4 weeks ago
mistral:latest                         f974a74358d6    4.1 GB    4 weeks ago
nomic-embed-text:latest                0a109f422b47    274 MB    4 weeks ago
llama3:latest                          365c0bd3c000    4.7 GB    6 weeks ago
phi3:latest                            4f2222927938    2.2 GB    6 weeks ago
llama3-instruct-8b:latest              1d0e661eaade    8.5 GB    3 months ago
EEVE-Korean-10.8B:latest               78776a54c25d    7.7 GB    3 months ago

M:\temp>ollama run EEVE-Korean-10.8B
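>>> Cheolsu had 20 pencils. Younghee took half of them and Minsu took the remaining 5. How many pencils
...
... does Cheolsu have left?
Cheolsu initially had 20 pencils, and Younghee took half of them, 10 pencils, so Cheolsu now has the following left:

20 - 10 = 10 pencils

Next, Minsu took the remaining 5 pencils, so Cheolsu has the following left:

10 - 5 = 5 pencils

Therefore, Cheolsu now has 5 pencils left.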

12/31/2024 3:50 AM

Comment from SilverBullet…

1. Download the gguf file
2. Create the Modelfile
3. Register it with ollama
ollama create ko-r1-7b-v2.0.3-Q6_K -f ko-r1-7b-v2.0.3-Q6_K/Modelfile
4. Check the added model
ollama show ko-r1-7b-v2.0.3-Q6_K
  Model
    architecture        qwen2
    parameters          7.6B
    context length      131072
    embedding length    3584
    quantization        Q6_K

3/06/2025 7:39 AM
