Compare commits

...

51 Commits

Author SHA1 Message Date
bce439921f Generate tags around context
All checks were successful
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Successful in 27m22s
2025-06-16 10:35:21 +10:00
2de2d0fe3a Merge pull request 'prompt enhancement' (#16) from prompt_fix into master
All checks were successful
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Successful in 11m44s
Reviewed-on: #16
2025-06-06 12:04:44 +10:00
cf795bbc35 prompt enhancement 2025-06-06 12:04:19 +10:00
a6ed20451a Merge pull request 'pipeline_creation' (#15) from pipeline_creation into master
All checks were successful
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Successful in 25m13s
Reviewed-on: #15
2025-06-05 09:22:50 +10:00
7fd32b3024 improve notification prompt 2025-06-05 09:22:28 +10:00
a88d233c6b remove tail and improve notification prompt 2025-06-05 09:22:19 +10:00
2abc39e3ac Merge pull request 'pipeline_creation' (#14) from pipeline_creation into master
All checks were successful
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Successful in 30m33s
Reviewed-on: #14
2025-06-05 08:43:23 +10:00
f430998137 typo 2025-06-05 08:42:45 +10:00
8dceb79d91 remove repo reference 2025-06-05 08:41:32 +10:00
6c5b0f778d remove trailing slash 2025-06-05 08:41:32 +10:00
37ed8fd0f9 fix git for pipeline" 2025-06-05 08:40:59 +10:00
0594ea54aa remove repo reference
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 27m5s
2025-06-05 01:02:42 +10:00
60f7473297 remove trailing slash
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Has been cancelled
2025-06-05 01:00:58 +10:00
ec69e8e4f7 Merge pull request 'do it right' (#13) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 8m58s
Reviewed-on: #13
2025-06-05 00:47:05 +10:00
62b1175aeb do it right 2025-06-05 00:46:45 +10:00
41f804a1eb Merge pull request 'pipeline_creation' (#12) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Has been cancelled
Reviewed-on: #12
2025-06-05 00:45:53 +10:00
f50d076164 cleanup 2025-06-05 00:45:39 +10:00
fc4f9c5053 dealing with pipeline weirdness 2025-06-05 00:44:57 +10:00
e3262cd366 Merge pull request 'weird trailing newline"' (#11) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 10m26s
Reviewed-on: #11
2025-06-05 00:12:04 +10:00
341f3d8623 weird trailing newline"
"
2025-06-05 00:11:44 +10:00
e2c29204fa Merge pull request 'pipeline_creation' (#10) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Has been cancelled
Reviewed-on: #10
2025-06-04 23:48:10 +10:00
f0e6a0cb52 load_dotenv work different? 2025-06-04 23:47:53 +10:00
7f0b0376d1 load_dotenv work different? 2025-06-04 23:47:05 +10:00
44b5ea6a68 Merge pull request 'load_dotenv work different?' (#9) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 9m17s
Reviewed-on: #9
2025-06-04 22:55:26 +10:00
a49457094d load_dotenv work different? 2025-06-04 22:54:09 +10:00
9296fda390 Merge pull request 'tail the .env so we can see it in pipelin' (#8) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 9m5s
Reviewed-on: #8
2025-06-04 22:44:14 +10:00
bb0d9090f3 tail the .env so we can see it in pipelin 2025-06-04 22:43:53 +10:00
703a2384e7 Merge pull request 'sigh stray U' (#7) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 9m2s
Reviewed-on: #7
2025-06-04 22:30:36 +10:00
4b3f00c325 Merge branch 'master' into pipeline_creation 2025-06-04 22:29:42 +10:00
38dfe404d1 sigh stray U 2025-06-04 22:29:12 +10:00
347ac63f86 Merge pull request 'helps to install virtualenv' (#6) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 8m59s
Reviewed-on: #6
2025-06-04 22:17:46 +10:00
506758f67d helps to install virtualenv 2025-06-04 22:17:11 +10:00
f0572ba9fb Merge pull request 'pipeline_creation' (#5) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 7m5s
Reviewed-on: #5
2025-06-04 22:08:28 +10:00
4686f3fae0 y in right place 2025-06-04 22:07:47 +10:00
ea1c8cfb13 add y to apt call 2025-06-04 22:07:15 +10:00
9ca7578d28 Merge pull request 'pipeline_creation' (#4) from pipeline_creation into master
Some checks failed
Create Blog Article if new notes exist / prepare_blog_drafts_and_push (push) Failing after 4m28s
Reviewed-on: #4
2025-06-04 22:02:00 +10:00
64b466c4ac load dotenv in main.py 2025-06-04 22:01:15 +10:00
49174de9ff correct pipeline titles 2025-06-04 21:59:33 +10:00
59f9f01c69 first cut at pipeline 2025-06-04 21:48:59 +10:00
a7eae4b09f Merge pull request 'matrix_notifications' (#3) from matrix_notifications into master
Reviewed-on: #3
2025-06-04 21:34:12 +10:00
c466b04a25 matrix notifications and config driven chroma 2025-06-04 21:32:51 +10:00
431e5c63aa first pass at docker run 2025-06-04 16:56:08 +10:00
6e117e3ce9 language cleanup for integration testing 2025-06-02 12:32:21 +10:00
9a9228bc07 Merge pull request 'repo_work_fix' (#2) from repo_work_fix into master
Reviewed-on: #2
2025-05-30 17:47:31 +10:00
2dd371408f trying for the hard fix 2025-05-30 17:25:13 +10:00
0005ad1fd3 hard reset for the repo work 2025-05-30 17:20:58 +10:00
446978704d further directory cleanup 2025-01-24 04:51:50 +00:00
f24bd5b361 cleanup directory 2025-01-24 04:44:23 +00:00
4d5c27cfaa clean up 2025-01-24 04:42:04 +00:00
d45f0be314 env set up for remote 2025-01-24 04:41:14 +00:00
e1a24aff20 get rid of think tags 2025-01-24 02:17:05 +00:00
16 changed files with 495 additions and 258 deletions

View File

@ -0,0 +1,56 @@
name: Create Blog Article if new notes exist
on:
schedule:
- cron: "15 3 * * *"
push:
branches:
- master
jobs:
prepare_blog_drafts_and_push:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install dependencies
shell: bash
run: |
apt update && apt upgrade -y
apt install rustc cargo python-is-python3 pip python3-venv python3-virtualenv libmagic-dev git -y
virtualenv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
git config --global user.name "Blog Creator"
git config --global user.email "ridgway.infrastructure@gmail.com"
git config --global push.autoSetupRemote true
- name: Create .env
shell: bash
run: |
echo "TRILIUM_HOST=${{ vars.TRILIUM_HOST }}" > .env
echo "TRILIUM_PORT='${{ vars.TRILIUM_PORT }}'" >> .env
echo "TRILIUM_PROTOCOL='${{ vars.TRILIUM_PROTOCOL }}'" >> .env
echo "TRILIUM_PASS='${{ secrets.TRILIUM_PASS }}'" >> .env
echo "TRILIUM_TOKEN='${{ secrets.TRILIUM_TOKEN }}'" >> .env
echo "OLLAMA_PROTOCOL='${{ vars.OLLAMA_PROTOCOL }}'" >> .env
echo "OLLAMA_HOST='${{ vars.OLLAMA_HOST }}'" >> .env
echo "OLLAMA_PORT='${{ vars.OLLAMA_PORT }}'" >> .env
echo "EMBEDDING_MODEL='${{ vars.EMBEDDING_MODEL }}'" >> .env
echo "EDITOR_MODEL='${{ vars.EDITOR_MODEL }}'" >> .env
export PURE='["${{ vars.CONTENT_CREATOR_MODELS_1 }}", "${{ vars.CONTENT_CREATOR_MODELS_2 }}", "${{ vars.CONTENT_CREATOR_MODELS_3 }}", "${{ vars.CONTENT_CREATOR_MODELS_4 }}"]'
echo "CONTENT_CREATOR_MODELS='$PURE'" >> .env
echo "GIT_PROTOCOL='${{ vars.GIT_PROTOCOL }}'" >> .env
echo "GIT_REMOTE='${{ vars.GIT_REMOTE }}'" >> .env
echo "GIT_USER='${{ vars.GIT_USER }}'" >> .env
echo "GIT_PASS='${{ secrets.GIT_PASS }}'" >> .env
echo "N8N_SECRET='${{ secrets.N8N_SECRET }}'" >> .env
echo "N8N_WEBHOOK_URL='${{ vars.N8N_WEBHOOK_URL }}'" >> .env
echo "CHROMA_HOST='${{ vars.CHROMA_HOST }}'" >> .env
echo "CHROMA_PORT='${{ vars.CHROMA_PORT }}'" >> .env
- name: Create Blogs
shell: bash
run: |
source .venv/bin/activate
python src/main.py

6
.gitignore vendored
View File

@ -2,3 +2,9 @@
__pycache__ __pycache__
.venv .venv
.aider* .aider*
.vscode
.zed
pyproject.toml
.ropeproject
generated_files/*
pyright*

View File

@ -7,8 +7,12 @@ ENV PYTHONUNBUFFERED 1
ADD src/ /blog_creator ADD src/ /blog_creator
RUN apt-get update && apt-get install -y rustc cargo python-is-python3 pip python3.12-venv libmagic-dev RUN apt-get update && apt-get install -y rustc cargo python-is-python3 pip python3-venv libmagic-dev git
# Need to set up git here or we get funky errors
RUN git config --global user.name "Blog Creator"
RUN git config --global user.email "ridgway.infrastructure@gmail.com"
RUN git config --global push.autoSetupRemote true
#Get a python venv going as well cause safety
RUN python -m venv /opt/venv RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH" ENV PATH="/opt/venv/bin:$PATH"

View File

@ -3,10 +3,19 @@
This creator requires you to use a working Trilium Instance and create a .env file with the following This creator requires you to use a working Trilium Instance and create a .env file with the following
``` ```
TRILIUM_HOST TRILIUM_HOST=
TRILIUM_PORT TRILIUM_PORT=
TRILIUM_PROTOCOL TRILIUM_PROTOCOL=
TRILIUM_PASS TRILIUM_PASS=
TRILIUM_TOKEN=
OLLAMA_PROTOCOL=
OLLAMA_HOST=
OLLAMA_PORT=11434
EMBEDDING_MODEL=
EDITOR_MODEL=
# This is expected in python list format example `[phi4-mini:latest, qwen3:1.7b, gemma3:latest]`
CONTENT_CREATOR_MODELS=
CHROMA_SERVER=<IP_ADDRESS>
``` ```
This container is going to be what I use to trigger a blog creation event This container is going to be what I use to trigger a blog creation event
@ -29,7 +38,7 @@ To do this we will
4. cd /src/content 4. cd /src/content
5. take the information from the trillium note and prepare a 500 word blog post, insert the following at the top 5. take the information from the trillium note and prepare a 500 word blog post, insert the following at the top
``` ```
Title: <title> Title: <title>
@ -42,7 +51,7 @@ Authors: <model name>.ai
Summary: <have ai write a 10 word summary of the post Summary: <have ai write a 10 word summary of the post
``` ```
6. write it to `<title>.md` 6. write it to `<title>.md`
7. `git checkout -b <title>` 7. `git checkout -b <title>`

View File

@ -1,11 +1,44 @@
services: networks:
blog_creator: net:
build: driver: bridge
context: .
dockerfile: Dockerfile
container_name: blog_creator
env_file:
- .env
volumes:
- ./generated_files/:/blog_creator/generated_files
services:
blog_creator:
build:
context: .
dockerfile: Dockerfile
container_name: blog_creator
env_file:
- .env
volumes:
- ./generated_files/:/blog_creator/generated_files
networks:
- net
chroma:
image: chromadb/chroma
container_name: chroma
volumes:
# Be aware that indexed data are located in "/chroma/chroma/"
# Default configuration for persist_directory in chromadb/config.py
# Read more about deployments: https://docs.trychroma.com/deployment
- chroma-data:/chroma/chroma
#command: "--host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
environment:
- IS_PERSISTENT=TRUE
restart: unless-stopped # possible values are: "no", always", "on-failure", "unless-stopped"
ports:
- "8000:8000"
healthcheck:
# Adjust below to match your container port
test:
["CMD", "curl", "-f", "http://localhost:8000/api/v2/heartbeat"]
interval: 30s
timeout: 10s
retries: 3
networks:
- net
volumes:
chroma-data:
driver: local

2
generated_files/.gitignore vendored Normal file
View File

@ -0,0 +1,2 @@
*
!.gitignore

View File

@ -1,83 +0,0 @@
<think>
Alright, I've got this query from someone who wants to create an Ollama Blog Writer using Python. Let me break down what they're asking for.
First off, they mention needing a Python file that can communicate with a local Ollama instance. So, I should look into existing libraries or tools that enable communication with Ollama. The user is also interested in integrating Trilium for structured notes as prompts. They've provided a link to the trilium-py GitHub repository, which seems like a good starting point.
Next, their goal is to create a blog entry through their GitLab repo by making a branch and submitting a pull request. They want the PR content approved before proceeding further. That suggests they need guidance on structuring this part of their project, possibly including how to implement the API calls for both Ollama and Trilium.
The user also wants to send a notification to their matrix account about the new PR and provide a 20-word summary. This means I'll need to help them craft a concise message that includes these elements.
Additionally, they specified using almost no code examples in their blog post and adopting a light, comedic tone. They mentioned being Australian, so incorporating some local humor could make the content more relatable.
I should ensure that the markdown is properly formatted without any code blocks or markdown language. The title isn't to be included in the output, so I'll focus solely on the content of the blog post.
Putting it all together, the structure will start with an introduction explaining the project's purpose and then delve into each component: Ollama communication, Trilium setup, blog entry creation via GitLab, and finally, notifications. Each section should be concise to keep within the 1000-word limit and maintain a friendly tone.
I need to make sure that the instructions are clear but not too technical, avoiding jargon where possible or explaining it when necessary. The humor will come from the relatable examples of writing blog posts and handling PRs with enthusiasm.
</think>
# Creating an Ollama Blog Writer
Alright, fellow tech enthusiasts! Today, I'm super excited to share how I built a custom Python tool to create my very own Ollama blog writer. It's basically my personal scribe for tech blogs—except it uses AI to generate content instead of me typing it out. Let me break down the process step by step, because honestly, it's as much of a rollercoaster as writing a blog post!
## Step 1: Communicating with Ollama
First things first, I needed to connect my Python script to a running Ollama instance. Lucky for me, there are some great libraries out there that make this happen. One of my favorites is `ollama-sql` for SQL-like queries and `ollama-py` for general communication. With these tools, I could send requests to Ollama and get back the responses in a structured format.
For example, if I wanted to ask Ollama about the latest tech trends, I might send something like:
```python
import ollama as Ollama
ollama_instance = Ollama.init()
response = ollama_instance.query("What are the top AI developments this year?")
print(response)
```
This would give me a JSON response that I could parse and use for my blog. Easy peasy!
## Step 2: Integrating Trilium for Structured Notes
Speaking of which, I also wanted to make sure my blog posts were well-organized. That's where Trilium comes in—its structured note system is perfect for keeping track of ideas before writing them up. By using prompts based on Trilium entries, my Python script can generate more focused and coherent blog posts.
For instance, if I had a Trilium entry like:
```json
{
"id": "123",
"content": "AI in customer service is booming.",
"type": "thought"
}
```
I could use that as a prompt to generate something like:
*"In the rapidly evolving landscape of AI applications, customer service has taken a quantum leap with AI-powered platforms...."*
Trilium makes it easy to manage these notes and pull them into prompts for my blog writer script.
## Step 3: Creating Blog Entries in My GitLab Repo
Now, here's where things get interesting (and slightly nerve-wracking). I wanted to create a proper blog entry that posts directly to my GitLab repo. So, I forked the [aridgwayweb/blog](https://git.aridgwayweb.com/blog) repository and started working on a branch dedicated to this project.
In my `create_blog_entry.py` script, I used GitLab's API to create a new entry. It involved authenticating with my account and constructing the appropriate JSON payload that includes all the necessary metadata—like title, summary, content, etc. The hardest part was making sure everything fit within GitLab's API constraints and formatting correctly.
Here's an excerpt of what I sent:
```python
import gitlab
gl = gitlab.Gitlab('gitlab.com', 'your_api_key')
entry = gl.entries.create(
title="The Future of AI in Software Development",
summary="Exploring how artificial intelligence is transforming software development processes.",
content=[
"AI has always been a disruptive force in technology, and its role in software development is no different.",
"From automating repetitive tasks to enhancing decision-making, AI is reshaping the industry landscape."
]
)
```
And then I notified myself that it was done!
## Step 4: Sending Notifications via Matrix
Finally, after everything was up and running, I sent a quick notification to my matrix account about the new pull request. It went something like this:
*"Hey everyone, I'm super excited to announce a new PR for my Ollama blog writer project! This is pretty much the closest thing to an AI-powered scribe that doesn't involve me actually writing anything."*
Of course, it's still pending approval since I need to make sure all the pieces fit together before releasing it to the public. But hey, at least I've got a solid foundation to build on!
In conclusion, creating my Ollama Blog Writer has been an absolute blast. It combines my love for tech with Python and AI in ways I never imagined. Now, if only I could find a way to automate writing blog *reviews*

View File

@ -1,46 +0,0 @@
<think>
Okay, so I'm trying to wrap my head around this PowerBI experience for a data product. Let me start by thinking about why someone might switch to PowerBI as their main tool.
First, the blog title says it's about API performance. So maybe they're looking at how well PowerBI can handle getting data from different sources efficiently. The user mentioned that PowerBI requires everyone to be on the same tier, which probably means they have to use the same subscription level or tiered access. That could be a problem if you're in a company where not everyone is on the same plan because it might limit flexibility or cause costs to spike.
Next, pulling data with PowerBI seems limited. They say it's only useful for small tables. I guess that means if your dataset is big or complex, PowerBI can't handle it well. Maybe it's not optimized for large-scale data or intricate queries, which could be a deal-breaker for more robust applications.
Then there's the issue of being part of the Microsoft ecosystem. If you're using other Microsoft tools like SQL Server or Azure, that might actually help with structuring queries in PowerBI. But if you're outside this ecosystem, it fails. Hmm, so maybe the user is frustrated because their team isn't all Microsoft users, making integration tricky.
Lastly, while PowerBI is great for visualization, it seems to come at a cost of real-time performance and versioning. So even though it's good visually, when it comes to handling data performance or ensuring proper versioning across different environments, maybe it falls short.
Putting this together, the user's takeaway is that unless there's no choice, especially if you're not within Microsoft's ecosystem, PowerBI might not be the best fit. It could lead to failures and hurt their product's performance. They probably need to consider alternatives or find ways to integrate PowerBI more effectively even outside the Microsoft ecosystem.
</think>
# The Curious Case of PowerBI in Data Product Development
Alright, let me spill the beans on my latest adventure with PowerBI—spoiler alert: it wasn't all smooth sailing. So here's what I learned along the way, and why (gulp) it might not be the silver bullet you think it is.
## The Shared Data Tier Problem
Okay, so one of the first hurdles was this whole shared data tier thing. Let me tell ya, it felt like a non-starter for most companies out there. Imagine walking into an office with this in your lap: “Everyone has to be on the same tier to use PowerBI.” Yeah, sounds like a lot of bureaucracy just to get some data flowing. But then I started thinking—what if they're not? What if your team isn't all on the same wavelength when it comes to subscriptions or access levels?
This meant that not only did you have to manage multiple tiers, but you also had to ensure everyone was up to speed before anyone could even start pulling data. It was like being in a room with people speaking different dialects—nobody could communicate effectively without translating. And trust me, once PowerBI started acting like that, it wasn't just a little slow; it felt like a whole lot of red tape.
## Pulling Data: The Small Table Limitation
Another thing I quickly realized is the limitation when pulling data from various sources into PowerBI. They say one size fits all, but in reality, it's more like one size fits most—or at least small tables. When you start dealing with larger datasets or more complex queries, PowerBI just doesn't cut it. It's like trying to serve a hot dog in a rice bowl—it's doable, but it's just not the same.
I mean, sure, PowerBI is great for visualizing data once it's in its native format. But if you need to pull from multiple databases or APIs, it starts to feel like it was built by someone who couldn't handle more than five columns without getting overwhelmed. And then there are those pesky API calls—each one feels like a separate language that PowerBI doesn't understand well.
## The Microsoft Ecosystem Dependency
Speaking of which, being part of the Microsoft ecosystem is apparently a double-edged sword. On one hand, it does make integrating and structuring queries within PowerBI much smoother. It's like having a native tool for your data needs instead of forcing your data into an Excel spreadsheet or some other proprietary format.
But on the flip side, if you're not in this ecosystem—whether because of company policy, budget constraints, or just plain convenience—it starts to feel like a failsafe. Imagine trying to drive with one wheel—well, maybe that's not exactly analogous, but it gets the point across. Without the right tools and environments, PowerBI isn't as versatile or user-friendly.
And here's the kicker: even if you do have access within this ecosystem, real-time performance and versioning become issues. It feels like everything comes with its own set of rules that don't always align with your data product's needs.
## The Visualization vs. Performance Trade-Off
Now, I know what some of you are thinking—PowerBI is all about making data beautiful, right? And it does a fantastic job at that. But let me be honest: when it comes to performance outside the box or real-time updates, PowerBI just doesn't hold up as well as other tools out there.
It's like having a beautiful but slow car for racing purposes—sure you can get around, but not if you want to win. Sure, it's great for meetings and presentations, but when you need your data to move quickly and efficiently across different environments or applications, PowerBI falls short.
## The Takeaway
So after all that, here's my bottom line: unless you're in the Microsoft ecosystem—top to tail—you might be better off looking elsewhere. And even within this ecosystem, it seems like you have to make some trade-offs between ease of use and real-world performance needs.
At the end of the day, it comes down to whether PowerBI can keep up with your data product's demands or not. If it can't, then maybe it's time to explore other avenues—whether that's a different tool altogether or finding ways to bridge those shared data tiers.
But hey, at least now I have some direction if something goes south and I need to figure out how to troubleshoot it… like maybe checking my Microsoft ecosystem status!

View File

@ -2,3 +2,7 @@ ollama
trilium-py trilium-py
gitpython gitpython
PyGithub PyGithub
chromadb
langchain-ollama
PyJWT
dotenv

View File

@ -1,40 +1,160 @@
import os import os, re, json, random, time, string
from ollama import Client from ollama import Client
import chromadb
from langchain_ollama import ChatOllama
class OllamaGenerator: class OllamaGenerator:
def __init__(self, title: str, content: str, model: str): def __init__(self, title: str, content: str, inner_title: str):
self.title = title self.title = title
self.inner_title = inner_title
self.content = content self.content = content
self.response = None
print("In Class")
print(os.environ["CONTENT_CREATOR_MODELS"])
try:
chroma_port = int(os.environ['CHROMA_PORT'])
except ValueError as e:
raise Exception(f"CHROMA_PORT is not an integer: {e}")
self.chroma = chromadb.HttpClient(host=os.environ['CHROMA_HOST'], port=chroma_port)
ollama_url = f"{os.environ["OLLAMA_PROTOCOL"]}://{os.environ["OLLAMA_HOST"]}:{os.environ["OLLAMA_PORT"]}" ollama_url = f"{os.environ["OLLAMA_PROTOCOL"]}://{os.environ["OLLAMA_HOST"]}:{os.environ["OLLAMA_PORT"]}"
self.ollama_client = Client(host=ollama_url) self.ollama_client = Client(host=ollama_url)
self.ollama_model = model self.ollama_model = os.environ["EDITOR_MODEL"]
self.embed_model = os.environ["EMBEDDING_MODEL"]
self.agent_models = json.loads(os.environ["CONTENT_CREATOR_MODELS"])
self.llm = ChatOllama(model=self.ollama_model, temperature=0.6, top_p=0.5) #This is the level head in the room
self.prompt_inject = f"""
You are a journalist, Software Developer and DevOps expert
writing a 3000 word draft blog article for other tech enthusiasts.
You like to use almost no code examples and prefer to talk
in a light comedic tone. You are also Australian
As this person write this blog as a markdown document.
The title for the blog is {self.inner_title}.
Do not output the title in the markdown.
The basis for the content of the blog is:
<blog>{self.content}</blog>
"""
def split_into_chunks(self, text, chunk_size=100):
'''Split text into chunks of size chunk_size'''
words = re.findall(r'\S+', text)
chunks = []
current_chunk = []
word_count = 0
for word in words:
current_chunk.append(word)
word_count += 1
if word_count >= chunk_size:
chunks.append(' '.join(current_chunk))
current_chunk = []
word_count = 0
if current_chunk:
chunks.append(' '.join(current_chunk))
return chunks
def generate_draft(self, model) -> str:
'''Generate a draft blog post using the specified model'''
try:
# the idea behind this is to make the "creativity" random amongst the content creators
# contorlling temperature will allow cause the output to allow more "random" connections in sentences
# Controlling top_p will tighten or loosen the embedding connections made
# The result should be varied levels of "creativity" in the writing of the drafts
# for more see https://python.langchain.com/v0.2/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html
temp = random.uniform(0.5, 1.0)
top_p = random.uniform(0.4, 0.8)
top_k = int(random.uniform(30, 80))
agent_llm = ChatOllama(model=model, temperature=temp, top_p=top_p, top_k=top_k)
messages = [
("system", self.prompt_inject),
("human", "make the blog post in a format to be edited easily" )
]
response = agent_llm.invoke(messages)
# self.response = self.ollama_client.chat(model=model,
# messages=[
# {
# 'role': 'user',
# 'content': f'{self.prompt_inject}',
# },
# ])
#print ("draft")
#print (response)
return response.text()#['message']['content']
except Exception as e:
raise Exception(f"Failed to generate blog draft: {e}")
def get_draft_embeddings(self, draft_chunks):
'''Get embeddings for the draft chunks'''
embeds = self.ollama_client.embed(model=self.embed_model, input=draft_chunks)
return embeds.get('embeddings', [])
def id_generator(self, size=6, chars=string.ascii_uppercase + string.digits):
return ''.join(random.choice(chars) for _ in range(size))
def load_to_vector_db(self):
'''Load the generated blog drafts into a vector database'''
collection_name = f"blog_{self.title.lower().replace(" ", "_")}_{self.id_generator()}"
collection = self.chroma.get_or_create_collection(name=collection_name)#, metadata={"hnsw:space": "cosine"})
#if any(collection.name == collectionname for collectionname in self.chroma.list_collections()):
# self.chroma.delete_collection("blog_creator")
for model in self.agent_models:
print (f"Generating draft from {model} for load into vector database")
draft_chunks = self.split_into_chunks(self.generate_draft(model))
print(f"generating embeds")
embeds = self.get_draft_embeddings(draft_chunks)
ids = [model + str(i) for i in range(len(draft_chunks))]
chunknumber = list(range(len(draft_chunks)))
metadata = [{"model_agent": model} for index in chunknumber]
print(f'loading into collection')
collection.add(documents=draft_chunks, embeddings=embeds, ids=ids, metadatas=metadata)
return collection
def generate_markdown(self) -> str: def generate_markdown(self) -> str:
prompt = f""" prompt_system = f"""
You are a Software Developer and DevOps expert You are an editor taking information from {len(self.agent_models)} Software
who has transistioned in Developer Relations Developers and Data experts
writing a 1000 word blog for other tech enthusiast. writing a 3000 word blog article. You like when they use almost no code examples.
You like to use almost no code examples and prefer to talk You are also Australian. The content may have light comedic elements,
in a light comedic tone. You are also Australian you are more professional and will attempt to tone these down
As this person write this blog as a markdown document. As this person produce the final version of this blog as a markdown document
The title for the blog is {self.title}. keeping in mind the context provided by the previous drafts.
Do not output the title in the markdown. The title for the blog is {self.inner_title}.
Do not output the title in the markdown. Avoid repeated sentences
The basis for the content of the blog is: The basis for the content of the blog is:
{self.content} <blog>{self.content}</blog>
Only output markdown DO NOT GENERATE AN EXPLANATION
""" """
try: try:
self.response = self.ollama_client.chat(model=self.ollama_model, query_embed = self.ollama_client.embed(model=self.embed_model, input=prompt_system)['embeddings']
messages=[ collection = self.load_to_vector_db()
{ collection_query = collection.query(query_embeddings=query_embed, n_results=100)
'role': 'user', print("Showing pertinent info from drafts used in final edited edition")
'content': f'{prompt}', pertinent_draft_info = '\n\n'.join(collection.query(query_embeddings=query_embed, n_results=100)['documents'][0])
}, #print(pertinent_draft_info)
]) prompt_human = f"""Generate the final, 3000 word, draft of the blog using this information from the drafts: <context>{pertinent_draft_info}</context>
return self.response['message']['content'] - Only output in markdown, do not wrap in markdown tags, Only provide the draft not a commentary on the drafts in the context
"""
print("Generating final document")
messages = [("system", prompt_system), ("human", prompt_human),]
self.response = self.llm.invoke(messages).text()
# self.response = self.ollama_client.chat(model=self.ollama_model,
# messages=[
# {
# 'role': 'user',
# 'content': f'{prompt_enhanced}',
# },
# ])
#print ("Markdown Generated")
#print (self.response)
return self.response#['message']['content']
except Exception as e: except Exception as e:
raise Exception(f"Failed to generate markdown: {e}") raise Exception(f"Failed to generate markdown: {e}")
@ -42,3 +162,8 @@ class OllamaGenerator:
def save_to_file(self, filename: str) -> None: def save_to_file(self, filename: str) -> None:
with open(filename, "w") as f: with open(filename, "w") as f:
f.write(self.generate_markdown()) f.write(self.generate_markdown())
def generate_system_message(self, prompt_system, prompt_human):
    """Run a one-shot system+human chat through the configured LLM.

    Returns the model's reply as plain text.
    """
    chat_turns = [("system", prompt_system), ("human", prompt_human)]
    reply = self.llm.invoke(chat_turns)
    return reply.text()

View File

@ -1,5 +1,13 @@
import ai_generators.ollama_md_generator as omg import ai_generators.ollama_md_generator as omg
import trilium.notes as tn import trilium.notes as tn
import repo_management.repo_manager as git_repo
from notifications.n8n import N8NWebhookJwt
import string,os
from datetime import datetime
from dotenv import load_dotenv
load_dotenv()
print(os.environ["CONTENT_CREATOR_MODELS"])
tril = tn.TrilumNotes() tril = tn.TrilumNotes()
@ -7,16 +15,66 @@ tril.get_new_notes()
tril_notes = tril.get_notes_content() tril_notes = tril.get_notes_content()
def convert_to_lowercase_with_underscores(s):
    """Slug-ify a note title: keep only ASCII letters/digits/spaces,
    lowercase the result, and turn spaces into underscores.

    Used for both the generated filename and the git branch name.
    """
    allowed = set(string.ascii_letters + string.digits + ' ')
    kept = [ch for ch in s if ch in allowed]
    return ''.join(kept).lower().replace(' ', '_')
# Main pipeline: for every new Trilium note, draft a blog post, push it
# to its own branch in the blog repo, then notify the editor via n8n.
for note in tril_notes:
    print(tril_notes[note]['title'])
    # print(tril_notes[note]['content'])
    print("Generating Document")

    os_friendly_title = convert_to_lowercase_with_underscores(tril_notes[note]['title'])
    ai_gen = omg.OllamaGenerator(os_friendly_title,
                                 tril_notes[note]['content'],
                                 tril_notes[note]['title'])
    blog_path = f"generated_files/{os_friendly_title}.md"
    ai_gen.save_to_file(blog_path)

    # Generate commit messages and push to repo
    print("Generating Commit Message")
    git_system_prompt = "You are a blog creator commiting a piece of content to a central git repo"
    git_human_prompt = f"Generate a 5 word git commit message describing {ai_gen.response}. ONLY OUTPUT THE RESPONSE"
    commit_message = ai_gen.generate_system_message(git_system_prompt, git_human_prompt)
    # Credentials come from the environment (.env loaded at import time).
    repo_manager = git_repo.GitRepository("blog/", os.environ["GIT_USER"], os.environ["GIT_PASS"])
    print("Pushing to Repo")
    repo_manager.create_copy_commit_push(blog_path, os_friendly_title, commit_message)

    # Generate notification for Matrix
    print("Generating Notification Message")
    git_branch_url = f'https://git.aridgwayweb.com/armistace/blog/src/branch/{os_friendly_title}/src/content/{os_friendly_title}.md'
    n8n_system_prompt = f"You are a blog creator notifiying the final editor of the final creation of blog available at {git_branch_url}"
    n8n_prompt_human = f"""
    Generate an informal 100 word
    summary describing {ai_gen.response}.
    Don't address it or use names. ONLY OUTPUT THE RESPONSE.
    ONLY OUTPUT IN PLAINTEXT STRIP ALL MARKDOWN
    """
    notification_message = ai_gen.generate_system_message(n8n_system_prompt, n8n_prompt_human)
    # HTML snippet rendered by the n8n workflow in the notification channel.
    notification_string = f"""
    <h2>{tril_notes[note]['title']}</h2>
    <h3>Summary</h3>
    <p>{notification_message}</p>
    <h3>Branch</h3>
    <p>{os_friendly_title}</p>
    <p><a href="{git_branch_url}">Link to Branch</a></p>
    """
    payload = {
        "message": f"{notification_string}",
        "timestamp": datetime.now().isoformat()
    }
    webhook_client = N8NWebhookJwt(os.environ['N8N_SECRET'], os.environ['N8N_WEBHOOK_URL'])
    print("Notifying")
    n8n_result = webhook_client.send_webhook(payload)
    print(f"N8N response: {n8n_result['status']}")

View File

45
src/notifications/n8n.py Normal file
View File

@ -0,0 +1,45 @@
from datetime import datetime, timedelta
import jwt
import requests
from typing import Dict, Optional
class N8NWebhookJwt:
    """Send JWT-authenticated webhook notifications to an n8n endpoint.

    Args:
        secret_key: shared HS256 secret used to sign the token.
        webhook_url: full URL of the n8n webhook to POST to.
    """

    def __init__(self, secret_key: str, webhook_url: str):
        self.secret_key = secret_key
        self.webhook_url = webhook_url
        # Default expiry horizon; refreshed per token in _generate_jwt_token.
        self.token_expiration = datetime.now() + timedelta(hours=1)

    def _generate_jwt_token(self, payload: Dict) -> str:
        """Return an HS256 JWT whose claims are a copy of *payload* plus 'exp'."""
        # BUG FIX: refresh the expiry each time a token is minted. The old
        # code froze it at __init__, so any token generated more than an
        # hour after construction was already expired.
        self.token_expiration = datetime.now() + timedelta(hours=1)
        # BUG FIX: sign a copy of the payload. The old code mutated the
        # caller's dict, which also leaked the 'exp' claim into the JSON
        # body that send_webhook posts.
        claims = dict(payload)
        claims["exp"] = self.token_expiration.timestamp()
        encoded_jwt = jwt.encode(
            claims,
            self.secret_key,
            algorithm="HS256",
        )
        return encoded_jwt

    def send_webhook(self, payload: Dict) -> Dict:
        """POST *payload* as JSON with a Bearer JWT; return a status dict.

        Returns {"status": "success", "response": <json>} on HTTP 200, else
        {"status": "error", "response": <code>, "message": <text>}.
        """
        token = self._generate_jwt_token(payload)
        headers = {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"
        }
        response = requests.post(
            self.webhook_url,
            json=payload,
            headers=headers
        )
        # NOTE(review): only HTTP 200 counts as success; other 2xx codes are
        # reported as errors -- confirm this matches the n8n endpoint.
        if response.status_code == 200:
            return {"status": "success", "response": response.json()}
        else:
            return {"status": "error", "response": response.status_code, "message": response.text}

View File

@ -1,48 +0,0 @@
import os
import sys
from git import Repo
# Set these variables accordingly
REPO_OWNER = "your_repo_owner"
REPO_NAME = "your_repo_name"
def clone_repo(repo_url, branch="main"):
    """Clone *repo_url* (checking out *branch*) into the current directory.

    NOTE(review): clones into "." -- presumably the script is run from an
    empty working directory; confirm before reuse.
    """
    Repo.clone_from(repo_url, ".", branch=branch)
def create_markdown_file(file_name, content):
    """Write *content* to '<file_name>.md' in the current directory."""
    markdown_path = f"{file_name}.md"
    with open(markdown_path, "w") as handle:
        handle.write(content)
def commit_and_push(file_name, message):
    """Stage '<file_name>.md', commit it with *message*, and push to the
    default remote's currently tracked branch.

    Assumes the CWD is a valid git work tree (set up by clone_repo).
    """
    repo = Repo(".")
    repo.index.add([f"{file_name}.md"])
    repo.index.commit(message)
    repo.remote().push()
def create_new_branch(branch_name):
    """Create *branch_name* from the current HEAD, switch to it, and push
    it to the default remote with upstream tracking configured.
    """
    repo = Repo(".")
    # Create and check out the new branch in one step.
    repo.create_head(branch_name).checkout()
    # BUG FIX: GitPython's Head has no set_tracking_url() method -- the old
    # call raised AttributeError. Pushing with set_upstream=True both
    # creates the remote branch and records it as the tracking branch.
    repo.remote().push(refspec=f"{branch_name}:{branch_name}", set_upstream=True)
if __name__ == "__main__":
    # CLI entry point: requires <repo_url> and <markdown_file_name>.
    if len(sys.argv) < 3:
        print("Usage: python push_markdown.py <repo_url> <markdown_file_name>")
        sys.exit(1)
    repo_url = sys.argv[1]
    file_name = sys.argv[2]
    # Clone the repository
    clone_repo(repo_url)
    # Create a new Markdown file with content
    create_markdown_file(file_name, "Hello, World!\n")
    # Commit and push changes to the main branch
    commit_and_push(file_name, f"Add {file_name}.md")
    # Create a new branch named after the Markdown file
    create_new_branch(file_name)
    print(f"Successfully created '{file_name}' branch with '{file_name}.md'.")

View File

@ -1,35 +1,102 @@
import os import os, shutil
from git import Git from urllib.parse import quote
from git.repo import BaseRepository from git import Repo
from git.exc import InvalidGitRepositoryError from git.exc import GitCommandError
from git.remote import RemoteAction
class GitRepository:
    """Thin GitPython wrapper used by the blog-publishing pipeline.

    This is designed to be transitory: it DESTRUCTIVELY recreates the
    checkout at *repo_path* (any uncommitted changes there are lost).
    Don't develop inside the clone it makes -- it's a tool. The expected
    lifecycle is: add, commit, push, delete.
    """

    def __init__(self, repo_path, username=None, password=None):
        git_protocol = os.environ["GIT_PROTOCOL"]
        git_remote = os.environ["GIT_REMOTE"]
        # Without credentials the remote URL carries no auth component.
        if username is None or password is None:
            remote = f"{git_protocol}://{git_remote}"
        else:
            # URL-escape the credentials so they embed safely in the URL.
            git_user = quote(username)
            git_password = quote(password)
            remote = f"{git_protocol}://{git_user}:{git_password}@{git_remote}"
        # Destructive: wipe any existing checkout and clone fresh.
        if os.path.exists(repo_path):
            shutil.rmtree(repo_path)
        self.repo_path = repo_path
        print("Cloning Repo")
        Repo.clone_from(remote, repo_path)
        self.repo = Repo(repo_path)
        self.username = username
        self.password = password

    def clone(self, remote_url, destination_path):
        """Clone a Git repository with authentication; True on success."""
        try:
            self.repo.clone(remote_url, destination_path)
            return True
        except GitCommandError as e:
            print(f"Cloning failed: {e}")
            return False

    def fetch(self, remote_name='origin', ref_name='main'):
        """Fetch *ref_name* from *remote_name*; True on success."""
        try:
            self.repo.remotes[remote_name].fetch(ref_name=ref_name)
            return True
        except GitCommandError as e:
            print(f"Fetching failed: {e}")
            return False

    def pull(self, remote_name='origin', ref_name='main'):
        """Pull *ref_name* from *remote_name*; True on success."""
        print("Pulling Latest Updates (if any)")
        try:
            self.repo.remotes[remote_name].pull(ref_name)
            return True
        except GitCommandError as e:
            print(f"Pulling failed: {e}")
            return False

    def get_branches(self):
        """Return the names of all local branches."""
        return [branch.name for branch in self.repo.branches]

    def create_and_switch_branch(self, branch_name, remote_name='origin', ref_name='main'):
        """Create *branch_name* (if new) and check it out."""
        try:
            print(f"Creating Branch {branch_name}")
            self.repo.git.branch(branch_name)
        except GitCommandError:
            # Branch creation fails when it already exists; just switch.
            print("Branch already exists switching")
        # ensure remote commits are pulled into local
        self.repo.git.checkout(branch_name)

    def add_and_commit(self, message=None):
        """Stage everything and commit; True on success.

        Falls back to a default commit message when *message* is None.
        """
        try:
            print("Commiting latest draft")
            self.repo.git.add(all=True)
            commit_message = "Added and committed new content" if message is None else message
            self.repo.git.commit(message=commit_message)
            return True
        except GitCommandError as e:
            print(f"Commit failed: {e}")
            return False

    def create_copy_commit_push(self, file_path, title, commit_messge):
        """Branch as *title*, copy the draft into the site tree, commit, push."""
        self.create_and_switch_branch(title)
        self.pull(ref_name=title)
        # NOTE(review): relies on repo_path ending in '/' -- the path is
        # built by plain concatenation.
        shutil.copy(f"{file_path}", f"{self.repo_path}src/content/")
        self.add_and_commit(f"'{commit_messge}'")
        self.repo.git.push()

View File

@ -11,16 +11,20 @@ class TrilumNotes:
self.token = os.environ.get('TRILIUM_TOKEN') self.token = os.environ.get('TRILIUM_TOKEN')
if not all([self.protocol, self.host, self.port, self.tril_pass]): if not all([self.protocol, self.host, self.port, self.tril_pass]):
print("One or more required environment variables not found. Have you set a .env?") print("One or more required environment variables not found. Have you set a .env?")
self.server_url = f'{self.protocol}://{self.host}:{self.port}' self.server_url = f'{self.protocol}://{self.host}:{self.port}'
if not self.token: if not self.token:
print("Please run get_token and set your token") print("Please run get_token and set your token")
else: else:
self.ea = ETAPI(self.server_url, self.token) self.ea = ETAPI(self.server_url, self.token)
self.new_notes = None
self.note_content = None
def get_token(self): def get_token(self):
ea = ETAPI(self.server_url) ea = ETAPI(self.server_url)
if self.tril_pass == None:
raise ValueError("Trillium password can not be none")
token = ea.login(self.tril_pass) token = ea.login(self.tril_pass)
print(token) print(token)
print("I would recomend you update the env file with this tootsweet!") print("I would recomend you update the env file with this tootsweet!")
@ -40,10 +44,11 @@ class TrilumNotes:
def get_notes_content(self):
    """Collect title/content for every note found by the last scan.

    Returns (and caches on self.note_content) a mapping of
    noteId -> {"title": ..., "content": ...}.

    Raises:
        ValueError: if called before new_notes has been populated.
    """
    if self.new_notes is None:
        raise ValueError("How did you do this? new_notes is None!")
    content_dict = {}
    for note in self.new_notes['results']:
        note_id = note['noteId']
        content_dict[note_id] = {
            "title": f"{note['title']}",
            "content": f"{self._get_content(note_id)}",
        }
    self.note_content = content_dict
    return content_dict