env vars and starting work on repo_manager

armistace 2025-05-23 15:47:25 +10:00
parent 8a64d9c959
commit c606f72d90
9 changed files with 195 additions and 162 deletions

.gitignore (vendored)

@@ -3,3 +3,5 @@ __pycache__
 .venv
 .aider*
 .vscode
+.zed
+pyproject.toml

README

@@ -3,10 +3,19 @@
 This creator requires you to use a working Trilium Instance and create a .env file with the following
 ```
-TRILIUM_HOST
-TRILIUM_PORT
-TRILIUM_PROTOCOL
-TRILIUM_PASS
+TRILIUM_HOST=
+TRILIUM_PORT=
+TRILIUM_PROTOCOL=
+TRILIUM_PASS=
+TRILIUM_TOKEN=
+OLLAMA_PROTOCOL=
+OLLAMA_HOST=
+OLLAMA_PORT=11434
+EMBEDDING_MODEL=
+EDITOR_MODEL=
+# This is expected in python list format example `[phi4-mini:latest, qwen3:1.7b, gemma3:latest]`
+CONTENT_CREATOR_MODELS=
+CHROMA_SERVER=<IP_ADDRESS>
 ```
 This container is going to be what I use to trigger a blog creation event
@@ -29,7 +38,7 @@ To do this we will
 4. cd /src/content
 5. take the information from the trilium note and prepare a 500 word blog post, insert the following at the top
 ```
 Title: <title>
@@ -42,7 +51,7 @@ Authors: <model name>.ai
 Summary: <have ai write a 10 word summary of the post>
 ```
 6. write it to `<title>.md`
 7. `git checkout -b <title>`
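For reference, a minimal sketch of how this .env might be consumed on the Python side, assuming python-dotenv; the variable names mirror the README, everything else is illustrative. Note that the generator reads CONTENT_CREATOR_MODELS with json.loads, so the list only parses if the model names are quoted:

```python
# Sketch: consuming the .env described above (assumes python-dotenv is installed).
import json
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

ollama_url = f"{os.environ['OLLAMA_PROTOCOL']}://{os.environ['OLLAMA_HOST']}:{os.environ['OLLAMA_PORT']}"

# json.loads needs quoted names, e.g.
# CONTENT_CREATOR_MODELS=["phi4-mini:latest", "qwen3:1.7b", "gemma3:latest"]
creator_models = json.loads(os.environ["CONTENT_CREATOR_MODELS"])
print(ollama_url, creator_models)
```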

docker-compose.yml

@ -1,53 +1,44 @@
networks: networks:
net: net:
driver: bridge driver: bridge
services: services:
blog_creator: blog_creator:
build: build:
context: . context: .
dockerfile: Dockerfile dockerfile: Dockerfile
container_name: blog_creator container_name: blog_creator
env_file: env_file:
- .env - .env
volumes: volumes:
- ./generated_files/:/blog_creator/generated_files - ./generated_files/:/blog_creator/generated_files
networks: networks:
- net - net
chroma: chroma:
image: chromadb/chroma image: chromadb/chroma
container_name: chroma container_name: chroma
volumes: volumes:
# Be aware that indexed data are located in "/chroma/chroma/" # Be aware that indexed data are located in "/chroma/chroma/"
# Default configuration for persist_directory in chromadb/config.py # Default configuration for persist_directory in chromadb/config.py
# Read more about deployments: https://docs.trychroma.com/deployment # Read more about deployments: https://docs.trychroma.com/deployment
- chroma-data:/chroma/chroma - chroma-data:/chroma/chroma
command: "--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30" #command: "--host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
environment: environment:
- IS_PERSISTENT=TRUE - IS_PERSISTENT=TRUE
- CHROMA_SERVER_AUTHN_PROVIDER=${CHROMA_SERVER_AUTHN_PROVIDER} restart: unless-stopped # possible values are: "no", always", "on-failure", "unless-stopped"
- CHROMA_SERVER_AUTHN_CREDENTIALS_FILE=${CHROMA_SERVER_AUTHN_CREDENTIALS_FILE} ports:
- CHROMA_SERVER_AUTHN_CREDENTIALS=${CHROMA_SERVER_AUTHN_CREDENTIALS} - "8000:8000"
- CHROMA_AUTH_TOKEN_TRANSPORT_HEADER=${CHROMA_AUTH_TOKEN_TRANSPORT_HEADER} healthcheck:
- PERSIST_DIRECTORY=${PERSIST_DIRECTORY:-/chroma/chroma} # Adjust below to match your container port
- CHROMA_OTEL_EXPORTER_ENDPOINT=${CHROMA_OTEL_EXPORTER_ENDPOINT} test:
- CHROMA_OTEL_EXPORTER_HEADERS=${CHROMA_OTEL_EXPORTER_HEADERS} ["CMD", "curl", "-f", "http://localhost:8000/api/v2/heartbeat"]
- CHROMA_OTEL_SERVICE_NAME=${CHROMA_OTEL_SERVICE_NAME} interval: 30s
- CHROMA_OTEL_GRANULARITY=${CHROMA_OTEL_GRANULARITY} timeout: 10s
- CHROMA_SERVER_NOFILE=${CHROMA_SERVER_NOFILE} retries: 3
restart: unless-stopped # possible values are: "no", always", "on-failure", "unless-stopped" networks:
ports: - net
- "8000:8000"
healthcheck:
# Adjust below to match your container port
test: [ "CMD", "curl", "-f", "http://localhost:8000/api/v2/heartbeat" ]
interval: 30s
timeout: 10s
retries: 3
networks:
- net
volumes: volumes:
chroma-data: chroma-data:
driver: local driver: local
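The healthcheck above can also be reproduced from the application side. A quick connectivity sketch; CHROMA_SERVER is the env var from the README, assumed here to hold the service's address:

```python
# Sketch: verify the chroma service is reachable before using it.
# Mirrors the compose healthcheck's /api/v2/heartbeat endpoint.
import os

import requests

chroma_host = os.environ.get("CHROMA_SERVER", "localhost")
resp = requests.get(f"http://{chroma_host}:8000/api/v2/heartbeat", timeout=10)
resp.raise_for_status()
print("Chroma heartbeat:", resp.json())
```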

Generated blog draft (new file in generated_files/)

@@ -0,0 +1,53 @@
+# When Should You Use AI?
+Right off the bat? Well, let's talk about when *not* using an LLM is actually pretty much like trying to build that perfect pavlova with a robot: sure, they might have all these instructions and ingredients laid out for them (or so it seems), but can you really trust this machine to understand those subtle nuances of temperature or timing? No. And let's be real here: if we're talking about tasks requiring precise logic like financial calculations or scientific modeling - well, that sounds more suited to the human brain.
+But where does AI actually shine bright and come in handy?
+* **Pattern Recognition:** Spotting trends within data is one of those areas LLMs are pretty darn good at. Whether it's identifying patterns across a dataset for insights (or even generating creative ideas based on existing information), they can do that with speed and efficiency - not to mention accuracy.
+**And when shouldn't you use AI?**
+* **Tasks Requiring Precise Logic:** If your job is something needing absolute precision, like crunching numbers or modeling scientific data where a miscalculation could mean millions in losses for the company... well, maybe hold off on letting an LLM take over.
+* **Situations Demanding Critical Thinking:** Let's be honest: if you need to make judgment calls based upon complex factors that even humans can struggle with, then it might not just do a poor job, but rather fall short entirely.
+LLMs are great at mimicking intelligence. But they don't actually understand things the way we human beings comprehend them.
+* **Processes Where Errors Have Serious Consequences:** If your work involves tasks where errors can have serious consequences, then you probably want to keep it in human hands.
+**The Bottom Line**
+AI is a powerful tool. But as any good chef knows, even the best kitchen appliances can't replace their own skills and experience when making that perfect pavlova (or, for us humans: delivering results). It's about finding the balance between leveraging AI capabilities and relying on our critical thinking - and human intuition.
+Don't get me wrong here; I'm not anti-AI. But let's be sensible: use it where it's truly helpful, but don't forget to keep the rest in the hands of your fellow humans.
+---
+**Note for Editors:** This draft is designed with ease of editing and clarity as a priority, so feel free to adjust any sections that need further refinement or expansion. I aimed this piece at an audience who appreciates humor-infused insights into the world of AI while also acknowledging its limitations in certain scenarios.

OllamaGenerator module

@@ -1,11 +1,11 @@
-import os, re
+import os, re, json, random, time
 from ollama import Client
-import chromadb, time
+import chromadb
 from langchain_ollama import ChatOllama

 class OllamaGenerator:

-    def __init__(self, title: str, content: str, model: str, inner_title: str):
+    def __init__(self, title: str, content: str, inner_title: str):
         self.title = title
         self.inner_title = inner_title
         self.content = content
@@ -13,15 +13,15 @@ class OllamaGenerator:
         self.chroma = chromadb.HttpClient(host="172.18.0.2", port=8000)
         ollama_url = f"{os.environ["OLLAMA_PROTOCOL"]}://{os.environ["OLLAMA_HOST"]}:{os.environ["OLLAMA_PORT"]}"
         self.ollama_client = Client(host=ollama_url)
-        self.ollama_model = model
-        self.embed_model = "snowflake-arctic-embed2:latest"
-        self.agent_models = ["openthinker:7b", "deepseek-r1:7b", "qwen2.5:7b", "gemma3:latest"]
-        self.llm = ChatOllama(model=self.ollama_model, temperature=0.7)
+        self.ollama_model = os.environ["EDITOR_MODEL"]
+        self.embed_model = os.environ["EMBEDDING_MODEL"]
+        self.agent_models = json.loads(os.environ["CONTENT_CREATOR_MODELS"])
+        self.llm = ChatOllama(model=self.ollama_model, temperature=0.6, top_p=0.5) # This is the level head in the room
         self.prompt_inject = f"""
         You are a journalist, Software Developer and DevOps expert
         writing a 1000 word draft blog for other tech enthusiasts.
         You like to use almost no code examples and prefer to talk
         in a light comedic tone. You are also Australian
         As this person write this blog as a markdown document.
         The title for the blog is {self.inner_title}.
         Do not output the title in the markdown.
@@ -50,16 +50,24 @@ class OllamaGenerator:
             chunks.append(' '.join(current_chunk))
         return chunks

     def generate_draft(self, model) -> str:
         '''Generate a draft blog post using the specified model'''
         try:
-            agent_llm = ChatOllama(model=model, temperature=0.8)
+            # The idea behind this is to make the "creativity" random amongst the content creators.
+            # Controlling temperature will cause the output to allow more "random" connections in sentences.
+            # Controlling top_p will tighten or loosen the embedding connections made.
+            # The result should be varied levels of "creativity" in the writing of the drafts.
+            # For more see https://python.langchain.com/v0.2/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html
+            temp = random.uniform(0.5, 1.0)
+            top_p = random.uniform(0.4, 0.8)
+            top_k = int(random.uniform(30, 80))
+            agent_llm = ChatOllama(model=model, temperature=temp, top_p=top_p, top_k=top_k)
             messages = [
                 ("system", self.prompt_inject),
                 ("human", "make the blog post in a format to be edited easily")
             ]
-            self.response = agent_llm.invoke(messages)
+            response = agent_llm.invoke(messages)
             # self.response = self.ollama_client.chat(model=model,
             #                                        messages=[
             #                                          {
@@ -67,11 +75,13 @@ class OllamaGenerator:
             #                                            'content': f'{self.prompt_inject}',
             #                                          },
             #                                        ])
-            return self.response.text()#['message']['content']
+            #print ("draft")
+            #print (response)
+            return response.text()#['message']['content']
         except Exception as e:
             raise Exception(f"Failed to generate blog draft: {e}")

     def get_draft_embeddings(self, draft_chunks):
         '''Get embeddings for the draft chunks'''
         embeds = self.ollama_client.embed(model=self.embed_model, input=draft_chunks)
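To see what that randomisation does in practice, here is a standalone illustration of the sampler spread each content-creator model gets per run. A sketch only; the model names are just the README's examples:

```python
# Illustration: each draft agent gets its own randomised sampler settings,
# so drafts vary in "creativity" from run to run.
import random

for model in ["phi4-mini:latest", "qwen3:1.7b", "gemma3:latest"]:  # example names
    temp = random.uniform(0.5, 1.0)   # higher -> looser sentence connections
    top_p = random.uniform(0.4, 0.8)  # nucleus cut-off: tighter or looser token pool
    top_k = random.randint(30, 80)    # hard cap on candidate tokens
    print(f"{model}: temperature={temp:.2f}, top_p={top_p:.2f}, top_k={top_k}")
```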
@@ -96,16 +106,16 @@ class OllamaGenerator:
         collection.add(documents=draft_chunks, embeddings=embeds, ids=ids, metadatas=metadata)
         return collection

     def generate_markdown(self) -> str:

         prompt_system = f"""
         You are an editor taking information from {len(self.agent_models)} Software
         Developers and Data experts
         writing a 3000 word blog for other tech enthusiasts.
         You like when they use almost no code examples and the
         voice is in a light comedic tone. You are also Australian
         As this person produce an amalgamation of this blog as a markdown document.
         The title for the blog is {self.inner_title}.
         Do not output the title in the markdown. Avoid repeated sentences
@@ -118,6 +128,7 @@ class OllamaGenerator:
             collection_query = collection.query(query_embeddings=query_embed, n_results=100)
             print("Showing pertinent info from drafts used in final edited edition")
             pertinent_draft_info = '\n\n'.join(collection.query(query_embeddings=query_embed, n_results=100)['documents'][0])
+            #print(pertinent_draft_info)
             prompt_human = f"Generate the final document using this information from the drafts: {pertinent_draft_info} - ONLY OUTPUT THE MARKDOWN"
             print("Generating final document")
             messages = [("system", prompt_system), ("human", prompt_human),]
@@ -129,6 +140,8 @@ class OllamaGenerator:
             #                                          'content': f'{prompt_enhanced}',
             #                                          },
             #                                        ])
+            #print ("Markdown Generated")
+            #print (self.response)
             return self.response#['message']['content']
         except Exception as e:
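Putting the class changes together, usage now looks roughly like this. A sketch: the module path is assumed, and the EDITOR_MODEL, EMBEDDING_MODEL and CONTENT_CREATOR_MODELS env vars (plus the Ollama and Chroma services) must be in place:

```python
# Hypothetical usage of the updated OllamaGenerator: no model argument,
# everything model-related now comes from the environment.
import ai_generators.ollama_md_generator as omg  # assumed module path

ai_gen = omg.OllamaGenerator("my_post", "raw note content", "My Post")
ai_gen.save_to_file("/blog_creator/generated_files/my_post.md")
```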

Main script

@@ -18,10 +18,9 @@ for note in tril_notes:
     print(tril_notes[note]['title'])
     # print(tril_notes[note]['content'])
     print("Generating Document")

     os_friendly_title = convert_to_lowercase_with_underscores(tril_notes[note]['title'])
     ai_gen = omg.OllamaGenerator(os_friendly_title,
                                  tril_notes[note]['content'],
-                                 "gemma3:latest",
                                  tril_notes[note]['title'])
     ai_gen.save_to_file(f"/blog_creator/generated_files/{os_friendly_title}.md")
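convert_to_lowercase_with_underscores is not shown in this diff; a plausible one-liner with the same contract (hypothetical, the real helper may differ):

```python
# Hypothetical stand-in for the helper used above: lowercases the title and
# collapses anything non-alphanumeric into single underscores.
import re

def convert_to_lowercase_with_underscores(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")

print(convert_to_lowercase_with_underscores("When Should You Use AI?"))  # when_should_you_use_ai
```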

push_markdown.py (deleted)

@@ -1,48 +0,0 @@
-import os
-import sys
-from git import Repo
-
-# Set these variables accordingly
-REPO_OWNER = "your_repo_owner"
-REPO_NAME = "your_repo_name"
-
-def clone_repo(repo_url, branch="main"):
-    Repo.clone_from(repo_url, ".", branch=branch)
-
-def create_markdown_file(file_name, content):
-    with open(f"{file_name}.md", "w") as f:
-        f.write(content)
-
-def commit_and_push(file_name, message):
-    repo = Repo(".")
-    repo.index.add([f"{file_name}.md"])
-    repo.index.commit(message)
-    repo.remote().push()
-
-def create_new_branch(branch_name):
-    repo = Repo(".")
-    repo.create_head(branch_name).checkout()
-    repo.head.reference.set_tracking_url(f"https://your_git_server/{REPO_OWNER}/{REPO_NAME}.git/{branch_name}")
-    repo.remote().push()
-
-if __name__ == "__main__":
-    if len(sys.argv) < 3:
-        print("Usage: python push_markdown.py <repo_url> <markdown_file_name>")
-        sys.exit(1)
-
-    repo_url = sys.argv[1]
-    file_name = sys.argv[2]
-
-    # Clone the repository
-    clone_repo(repo_url)
-
-    # Create a new Markdown file with content
-    create_markdown_file(file_name, "Hello, World!\n")
-
-    # Commit and push changes to the main branch
-    commit_and_push(file_name, f"Add {file_name}.md")
-
-    # Create a new branch named after the Markdown file
-    create_new_branch(file_name)
-
-    print(f"Successfully created '{file_name}' branch with '{file_name}.md'.")

repo_manager (GitRepository)

@@ -1,39 +1,52 @@
-import os
-from git import Git
-from git.repo import BaseRepository
-from git.exc import InvalidGitRepositoryError
-from git.remote import RemoteAction
-
-def try_something(test):
-
-    # Set the path to your blog repo here
-    blog_repo = "/path/to/your/blog/repo"
-
-    # Checkout a new branch and create a new file for our blog post
-    branch_name = "new-post"
-    try:
-        repo = Git(blog_repo)
-        repo.checkout("-b", branch_name, "origin/main")
-        with open("my-blog-post.md", "w") as f:
-            f.write(content)
-    except InvalidGitRepositoryError:
-        # Handle repository errors gracefully
-        pass
-
-    # Add and commit the changes to Git
-    repo.add("my-blog-post.md")
-    repo.commit("-m", "Added new blog post about DevOps best practices.")
-
-    # Push the changes to Git and create a PR
-    repo.remote().push("refs/heads/{0}:refs/for/main".format(branch_name), "--set-upstream")
-    base_branch = "origin/main"
-    target_branch = "main"
-    pr_title = "DevOps best practices"
-    try:
-        repo.create_head("{0}-{1}", base=base_branch, message="{}".format(pr_title))
-    except RemoteAction.GitExitStatus as e:
-        # Handle Git exit status errors gracefully
-        pass
+import os, shutil
+from git import Repo
+from git.exc import GitCommandError
+
+class GitRepository:
+    # This is designed to be transitory: it will destructively create the repo at repo_path.
+    # If you have uncommitted changes you can kiss them goodbye!
+    # Don't use the repo created by this class for dev -> it's a tool!
+    # It is expected that when used you will add, commit, push, delete
+    def __init__(self, repo_path, username=None, password=None):
+        git_protocol = os.environ["GIT_PROTOCOL"]
+        git_remote = os.environ["GIT_REMOTE"]
+        remote = f"{git_protocol}://{username}:{password}@{git_remote}"
+        if os.path.exists(repo_path):
+            shutil.rmtree(repo_path)
+        Repo.clone_from(remote, repo_path)
+        self.repo = Repo(repo_path)
+        self.username = username
+        self.password = password
+
+    def clone(self, remote_url, destination_path):
+        """Clone a Git repository with authentication"""
+        try:
+            self.repo.clone(remote_url, destination_path)
+            return True
+        except GitCommandError as e:
+            print(f"Cloning failed: {e}")
+            return False
+
+    def fetch(self, remote_name='origin', ref_name='main'):
+        """Fetch updates from a remote repository with authentication"""
+        try:
+            self.repo.remotes[remote_name].fetch(ref_name=ref_name)
+            return True
+        except GitCommandError as e:
+            print(f"Fetching failed: {e}")
+            return False
+
+    def pull(self, remote_name='origin', ref_name='main'):
+        """Pull updates from a remote repository with authentication"""
+        try:
+            self.repo.remotes[remote_name].pull(ref_name=ref_name)
+            return True
+        except GitCommandError as e:
+            print(f"Pulling failed: {e}")
+            return False
+
+    def get_branches(self):
+        """List all branches in the repository"""
+        return [branch.name for branch in self.repo.branches]
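A hedged usage sketch of the new class; GIT_PROTOCOL and GIT_REMOTE must be set, and the remote below is a placeholder. One caveat worth flagging: GitPython's Remote.fetch and Remote.pull expect a refspec argument, so the ref_name keyword above may need renaming before this works end to end:

```python
# Sketch only: remember __init__ destructively re-clones repo_path.
import os

os.environ.setdefault("GIT_PROTOCOL", "https")
os.environ.setdefault("GIT_REMOTE", "git.example.com/owner/blog.git")  # placeholder remote

repo = GitRepository("/tmp/blog_repo", username="bot", password="app-password")
if repo.pull():  # fetch and merge from origin
    print(repo.get_branches())
```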

TrilumNotes (Trilium ETAPI client)

@@ -11,16 +11,16 @@ class TrilumNotes:
         self.token = os.environ.get('TRILIUM_TOKEN')
         if not all([self.protocol, self.host, self.port, self.tril_pass]):
             print("One or more required environment variables not found. Have you set a .env?")

         self.server_url = f'{self.protocol}://{self.host}:{self.port}'

         if not self.token:
             print("Please run get_token and set your token")
         else:
             self.ea = ETAPI(self.server_url, self.token)
         self.new_notes = None
         self.note_content = None

     def get_token(self):
         ea = ETAPI(self.server_url)
         if self.tril_pass == None:
@@ -44,10 +44,11 @@ class TrilumNotes:
     def get_notes_content(self):
         content_dict = {}
+        if self.new_notes is None:
+            raise ValueError("How did you do this? new_notes is None!")
         for note in self.new_notes['results']:
             content_dict[note['noteId']] = {"title" : f"{note['title']}",
                                             "content" : f"{self._get_content(note['noteId'])}"
                                             }
         self.note_content = content_dict
         return content_dict
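The new guard means get_notes_content now fails fast instead of crashing on a None subscript. A minimal demonstration, assuming the TRILIUM_* env vars are set so the constructor succeeds:

```python
# Sketch: calling get_notes_content before any notes have been fetched
# now raises a clear ValueError rather than a TypeError on None.
notes = TrilumNotes()
try:
    notes.get_notes_content()
except ValueError as e:
    print(e)  # How did you do this? new_notes is None!
```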