Compare commits
13 Commits
master ... rag_inclus
4119b2ec41
01b7f1cd78
c606f72d90
8a64d9c959
0c090c8489
e0b2c80bc9
44141ab545
e57d6eb6b6
c80f692cb0
bc2f8a8bca
e7f7a79d86
9b11fea0e7
6320571528
.gitignore (vendored): 3 changes
@@ -2,3 +2,6 @@
 __pycache__
 .venv
 .aider*
+.vscode
+.zed
+pyproject.toml
Dockerfile
@@ -7,7 +7,7 @@ ENV PYTHONUNBUFFERED 1

 ADD src/ /blog_creator

-RUN apt-get update && apt-get install -y rustc cargo python-is-python3 pip python3.12-venv libmagic-dev
+RUN apt-get update && apt-get install -y rustc cargo python-is-python3 pip python3-venv libmagic-dev git

 RUN python -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
README.md: 17 changes
@@ -3,10 +3,19 @@
 This creator requires you to use a working Trilium Instance and create a .env file with the following

 ```
-TRILIUM_HOST
-TRILIUM_PORT
-TRILIUM_PROTOCOL
-TRILIUM_PASS
+TRILIUM_HOST=
+TRILIUM_PORT=
+TRILIUM_PROTOCOL=
+TRILIUM_PASS=
+TRILIUM_TOKEN=
+OLLAMA_PROTOCOL=
+OLLAMA_HOST=
+OLLAMA_PORT=11434
+EMBEDDING_MODEL=
+EDITOR_MODEL=
+# This is expected in python list format example `[phi4-mini:latest, qwen3:1.7b, gemma3:latest]`
+CONTENT_CREATOR_MODELS=
+CHROMA_SERVER=<IP_ADDRESS>
 ```

 This container is going to be what I use to trigger a blog creation event
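One gotcha worth flagging: the generator below parses `CONTENT_CREATOR_MODELS` with `json.loads`, so strictly the value needs JSON quoting, unlike the unquoted example in the comment above. A minimal sketch of the parse step (the model names are placeholders):

```python
import json
import os

# Quoted form; json.loads raises an error on the unquoted [phi4-mini:latest, ...] style
os.environ["CONTENT_CREATOR_MODELS"] = '["phi4-mini:latest", "qwen3:1.7b", "gemma3:latest"]'
models = json.loads(os.environ["CONTENT_CREATOR_MODELS"])
print(models)  # ['phi4-mini:latest', 'qwen3:1.7b', 'gemma3:latest']
```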
docker-compose.yml
@@ -1,3 +1,7 @@
+networks:
+  net:
+    driver: bridge
+
 services:
   blog_creator:
     build:
@@ -8,4 +12,33 @@ services:
       - .env
     volumes:
       - ./generated_files/:/blog_creator/generated_files
+    networks:
+      - net
+
+  chroma:
+    image: chromadb/chroma
+    container_name: chroma
+    volumes:
+      # Be aware that indexed data are located in "/chroma/chroma/"
+      # Default configuration for persist_directory in chromadb/config.py
+      # Read more about deployments: https://docs.trychroma.com/deployment
+      - chroma-data:/chroma/chroma
+    #command: "--host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
+    environment:
+      - IS_PERSISTENT=TRUE
+    restart: unless-stopped # possible values are: "no", "always", "on-failure", "unless-stopped"
+    ports:
+      - "8000:8000"
+    healthcheck:
+      # Adjust below to match your container port
+      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v2/heartbeat"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    networks:
+      - net
+
+volumes:
+  chroma-data:
+    driver: local
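To confirm the Chroma service is reachable before the generator tries to use it, the same heartbeat endpoint the healthcheck polls can be hit from Python; a minimal sketch using the chromadb client already in requirements.txt (host assumed to be the machine running compose):

```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())  # returns a nanosecond timestamp while the server is healthy
```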
@@ -1,83 +1,29 @@
-<think>
-Alright, I've got this query from someone who wants to create an Ollama Blog Writer using Python. Let me break down what they're asking for.
+```markdown
+# Creating an Ollama Blog Writer: A Hilariously Tedious Adventure
-First off, they mention needing a Python file that can communicate with a local Ollama instance. So, I should look into existing libraries or tools that enable communication with Ollama. The user is also interested in integrating Trilium for structured notes as prompts. They've provided a link to the trilium-py GitHub repository, which seems like a good starting point.
+Hey tech enthusiasts! 👋 I’m back with another installment of my tech journey, but this time it’s personal. I decided to create a Python script that not only writes blogs for me (please don’t tell my boss), but also uses Ollama for some AI-assisted content creation and connects with Trilium for structured note-taking. Let’s dive into the details!

-Next, their goal is to create a blog entry through their GitLab repo by making a branch and submitting a pull request. They want the PR content approved before proceeding further. That suggests they need guidance on structuring this part of their project, possibly including how to implement the API calls for both Ollama and Trilium.
+### Step 1: Get Your Ollama On

-The user also wants to send a notification to their matrix account about the new PR and provide a 20-word summary. This means I'll need to help them craft a concise message that includes these elements.
+First things first, I needed a Python file that could talk to my local Ollama instance. If you haven't heard of Ollama, it's like a tiny llama in your terminal that helps with text generation. It took me a while to figure out how to configure the `.env` file and set up the connection properly. But once I did, I was off to a running start!
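A minimal sketch of that connection, using the same `ollama` Client and `.env` variables the repo's code relies on (the model name and prompt here are placeholders):

```python
import os
from ollama import Client

# Endpoint assembled the same way src/ai_generators/ollama_md_generator.py does it
ollama_url = f"{os.environ['OLLAMA_PROTOCOL']}://{os.environ['OLLAMA_HOST']}:{os.environ['OLLAMA_PORT']}"
client = Client(host=ollama_url)

# One round-trip chat call; the reply is a dict with message/content keys
reply = client.chat(model="gemma3:latest",
                    messages=[{"role": "user", "content": "G'day, got any blog ideas?"}])
print(reply["message"]["content"])
```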
-Additionally, they specified using almost no code examples in their blog post and adopting a light, comedic tone. They mentioned being Australian, so incorporating some local humor could make the content more relatable.
+### Step 2: Connecting Trilium for Structured Notes

-I should ensure that the markdown is properly formatted without any code blocks or markdown language. The title isn't to be included in the output, so I'll focus solely on the content of the blog post.
+For this part, I used a Python library called `trilium-py` (because why not?). It's like having a brain that can store and retrieve information in an organized way. To make sure my notes are super structured, I had to find the right prompts and ensure they were fed into Ollama correctly. This part was more about figuring out how to structure the data than actually coding—but hey, it’s all part of the fun!
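A rough sketch of pulling notes with `trilium-py`, mirroring what src/trilium/notes.py does with ETAPI (the `#blog` label is an assumed search filter, not something the repo defines):

```python
import os
from trilium_py.client import ETAPI

# TRILIUM_* values come from the .env described in the README
server_url = f"{os.environ['TRILIUM_PROTOCOL']}://{os.environ['TRILIUM_HOST']}:{os.environ['TRILIUM_PORT']}"
ea = ETAPI(server_url, os.environ["TRILIUM_TOKEN"])

# Search uses Trilium's own search DSL; matches come back under 'results'
notes = ea.search_note(search="#blog")
for note in notes["results"]:
    print(note["noteId"], note["title"])
```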
-Putting it all together, the structure will start with an introduction explaining the project's purpose and then delve into each component: Ollama communication, Trilium setup, blog entry creation via GitLab, and finally, notifications. Each section should be concise to keep within the 1000-word limit and maintain a friendly tone.
+### Step 3: Automating the Blog Creation

-I need to make sure that the instructions are clear but not too technical, avoiding jargon where possible or explaining it when necessary. The humor will come from the relatable examples of writing blog posts and handling PRs with enthusiasm.
-</think>
+Now that I have my notes and AI-generated content sorted, it was time to automate the blog creation process. Here’s where things got a bit Git-y (yes, I made up that word). I wrote a script that would create a new branch in our company's blog repo, push the changes, and voilà—a PR! Just like that, my humble contributions were ready for review by the big boss.
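The branch-and-push dance reduces to a handful of GitPython calls; a stripped-down sketch (paths and branch name are illustrative, the real logic lives in src/repo_management/repo_manager.py):

```python
from git import Repo

repo = Repo("blog/")                       # an existing clone of the blog repo
repo.git.checkout("-b", "my_new_post")     # one branch per generated post
repo.git.add(all=True)                     # stage the copied markdown
repo.index.commit("Add generated blog post draft")
repo.git.push("origin", "my_new_post")     # push the branch up for review
```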
-# Creating an Ollama Blog Writer
+### Step 4: Sending Notifications to Matrix

-Alright, fellow tech enthusiasts! Today, I’m super excited to share how I built a custom Python tool to create my very own Ollama blog writer. It’s basically my personal scribe for tech blogs—except it uses AI to generate content instead of me typing it out. Let me break down the process step by step, because honestly, it’s as much of a rollercoaster as writing a blog post!
+Finally, as any good DevRel should do, I sent out a notification to our internal Matrix channel. It’s like Slack but with more tech talk and less memes about dogs in hats. The message was short and sweet—just a summary of the blog changes and a request for feedback. Hey, if Elon can tweet at Tesla shareholders, why not send a quick matrix message?
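The post doesn't name a Matrix library, so purely as an assumption, here is what that notification could look like with matrix-nio (homeserver, credentials, and room ID are all hypothetical):

```python
import asyncio
from nio import AsyncClient  # pip install matrix-nio

async def notify(summary: str) -> None:
    client = AsyncClient("https://matrix.example.org", "@blog-bot:example.org")
    await client.login("hunter2")  # hypothetical bot password
    await client.room_send(
        room_id="!team-room:example.org",
        message_type="m.room.message",
        content={"msgtype": "m.text", "body": summary},
    )
    await client.close()

asyncio.run(notify("New PR: Ollama Blog Writer draft is up for review"))
```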
-## Step 1: Communicating with Ollama
+### Wrapping Up

-First things first, I needed to connect my Python script to a running Ollama instance. Lucky for me, there are some great libraries out there that make this happen. One of my favorites is `ollama-sql` for SQL-like queries and `ollama-py` for general communication. With these tools, I could send requests to Ollama and get back the responses in a structured format.
+Creating this Ollama Blog Writer wasn’t just about writing better blogs (though that would be nice). It was about embracing the joy of automation and the occasional struggle to get things working right. I learned a lot about Python libraries, local server configurations, and how to communicate effectively with my team via Matrix.

-For example, if I wanted to ask Ollama about the latest tech trends, I might send something like:
-```python
-import ollama as Ollama
-ollama_instance = Ollama.init()
-response = ollama_instance.query("What are the top AI developments this year?")
-print(response)
-```
+So there you have it—a step-by-step guide on how not to write blogs but definitely how to automate the process. If you’re into tech, automation, or just want to laugh at someone else’s coding mishaps, this blog is for you!

+Keep on hacking (and automating), [Your Name]
+```
-This would give me a JSON response that I could parse and use for my blog. Easy peasy!
-
-## Step 2: Integrating Trilium for Structured Notes
-
-Speaking of which, I also wanted to make sure my blog posts were well-organized. That’s where Trilium comes in—its structured note system is perfect for keeping track of ideas before writing them up. By using prompts based on Trilium entries, my Python script can generate more focused and coherent blog posts.
-
-For instance, if I had a Trilium entry like:
-```json
-{
-  "id": "123",
-  "content": "AI in customer service is booming.",
-  "type": "thought"
-}
-```
-I could use that as a prompt to generate something like:
-*"In the rapidly evolving landscape of AI applications, customer service has taken a quantum leap with AI-powered platforms...."*
-
-Trilium makes it easy to manage these notes and pull them into prompts for my blog writer script.
-
-## Step 3: Creating Blog Entries in My GitLab Repo
-
-Now, here’s where things get interesting (and slightly nerve-wracking). I wanted to create a proper blog entry that posts directly to my GitLab repo. So, I forked the [aridgwayweb/blog](https://git.aridgwayweb.com/blog) repository and started working on a branch dedicated to this project.
-
-In my `create_blog_entry.py` script, I used GitLab’s API to create a new entry. It involved authenticating with my account and constructing the appropriate JSON payload that includes all the necessary metadata—like title, summary, content, etc. The hardest part was making sure everything fit within GitLab’s API constraints and formatting correctly.
-
-Here’s an excerpt of what I sent:
-```python
-import gitlab
-gl = gitlab.Gitlab('gitlab.com', 'your_api_key')
-entry = gl.entries.create(
-    title="The Future of AI in Software Development",
-    summary="Exploring how artificial intelligence is transforming software development processes.",
-    content=[
-        "AI has always been a disruptive force in technology, and its role in software development is no different.",
-        "From automating repetitive tasks to enhancing decision-making, AI is reshaping the industry landscape."
-    ]
-)
-```
-
-And then I notified myself that it was done!
-
-## Step 4: Sending Notifications via Matrix
-
-Finally, after everything was up and running, I sent a quick notification to my matrix account about the new pull request. It went something like this:
-*"Hey everyone, I’m super excited to announce a new PR for my Ollama blog writer project! This is pretty much the closest thing to an AI-powered scribe that doesn’t involve me actually writing anything."*
-
-Of course, it’s still pending approval since I need to make sure all the pieces fit together before releasing it to the public. But hey, at least I’ve got a solid foundation to build on!
-
-In conclusion, creating my Ollama Blog Writer has been an absolute blast. It combines my love for tech with Python and AI in ways I never imagined. Now, if only I could find a way to automate writing blog *reviews*…
@@ -1,46 +1,23 @@
-<think>
-Okay, so I'm trying to wrap my head around this PowerBI experience for a data product. Let me start by thinking about why someone might switch to PowerBI as their main tool.
+Title: When Data Visualization Meets Frustration: A Comic Take on PowerBI's API Woes

-First, the blog title says it's about API performance. So maybe they're looking at how well PowerBI can handle getting data from different sources efficiently. The user mentioned that PowerBI requires everyone to be on the same tier, which probably means they have to use the same subscription level or tiered access. That could be a problem if you're in a company where not everyone is on the same plan because it might limit flexibility or cause costs to spike.
+---

-Next, pulling data with PowerBI seems limited. They say it's only useful for small tables. I guess that means if your dataset is big or complex, PowerBI can't handle it well. Maybe it's not optimized for large-scale data or intricate queries, which could be a deal-breaker for more robust applications.
+In the ever-evolving world of data and tech, few tools hold as much promise—or frustration—as Microsoft's PowerBI. Its sleek interface, intuitive visuals, and promise to simplify data into digestible insights have made it a favorite among many. But beneath its polished surface lies a storm of challenges that can leave even the most seasoned developers in its dust.

-Then there's the issue of being part of the Microsoft ecosystem. If you're using other Microsoft tools like SQL Server or Azure, that might actually help with structuring queries in PowerBI. But if you're outside this ecosystem, it fails. Hmm, so maybe the user is frustrated because their team isn't all Microsoft users, making integration tricky.
+Imagine this: you've spent hours refining your data model, only to find that your team's hierarchy resists your attempt to share sensitive information without breaking hearts. "We're all on different tiers," you mutter, your frustration evident. But here's the kicker—PowerBI won't even let everyone in your company join the party if they're not up to tier 5. And guess what? Most companies operate at tier 3 at best, in reality. So, step one: API calls to PowerBI. You'd think pulling data would be straightforward, but oh, how it pulls you into a tailspin.

-Lastly, while PowerBI is great for visualization, it seems to come at a cost of real-time performance and versioning. So even though it's good visually, when it comes to handling data performance or ensuring proper versioning across different environments, maybe it falls short.
+Here's where things get interesting: PowerBI APIs are mostly limited to small tables. It's like trying to fit furniture through a door that's slightly too narrow—it just doesn't work unless you have a magic wand (or in this case, an API upgrade). Imagine needing to fetch data from three different on-premises databases seamlessly; PowerBI might just give you the finger.

-Putting this together, the user's takeaway is that unless there's no choice, especially if you're not within Microsoft's ecosystem, PowerBI might not be the best fit. It could lead to failures and hurt their product's performance. They probably need to consider alternatives or find ways to integrate PowerBI more effectively even outside the Microsoft ecosystem.
-</think>
+Now, if your company happens to be in the Microsoft ecosystem—like the Azure universe—then maybe things are a bit easier. But here's the kicker: it's not being top-to-bottom within that ecosystem that counts as success. If even one part is outside, you're facing performance issues akin to driving through a snowstorm without an umbrella. You get the picture.

-# The Curious Case of PowerBI in Data Product Development
+So what does this mean for the average user? Unless you've got no choice but to use PowerBI... well, let's just say it might not be your friend in such scenarios. It's like having a GPS that only works if you're willing to drive on a dirt road and expect it to guide you through with zero warnings—sounds great until you end up stranded.

-Alright, let me spill the beans on my latest adventure with PowerBI—spoiler alert: it wasn’t all smooth sailing. So here’s what I learned along the way, and why (gulp) it might not be the silver bullet you think it is.
+But wait, maybe there's a silver lining. Other tools have learned the hard lessons PowerBI has taught us. They allow APIs beyond just small tables and handle ecosystems with ease, making them more versatile for real-world applications. It's like upgrading your car's GPS to one that not only knows all the roads but also can navigate through different weather conditions without complaints.
-## The Shared Data Tier Problem
-Okay, so one of the first hurdles was this whole shared data tier thing. Let me tell ya, it felt like a non-starter for most companies out there. Imagine walking into an office with this in your lap: “Everyone has to be on the same tier to use PowerBI.” Yeah, sounds like a lot of bureaucracy just to get some data flowing. But then I started thinking—what if they’re not? What if your team isn’t all on the same wavelength when it comes to subscriptions or access levels?
+In conclusion, while PowerBI is undeniably a powerful tool when used correctly—like driving in calm weather on perfectly paved roads—it has its limitations. Its API restrictions and ecosystem integration issues make it less than ideal for many real-world scenarios. So unless you're in a controlled environment where these issues don't arise, maybe it's time to explore other options that can handle the data journey with more grace.

-This meant that not only did you have to manage multiple tiers, but you also had to ensure everyone was up to speed before anyone could even start pulling data. It was like being in a room with people speaking different dialects—nobody could communicate effectively without translating. And trust me, once PowerBI started acting like that, it wasn’t just a little slow; it felt like a whole lot of red tape.
+After all, Data Overload isn't just a Star Trek term—it could be your reality if you're not careful with PowerBI.

-## Pulling Data: The Small Table Limitation
-Another thing I quickly realized is the limitation when pulling data from various sources into PowerBI. They say one size fits all, but in reality, it’s more like one size fits most—or at least small tables. When you start dealing with larger datasets or more complex queries, PowerBI just doesn’t cut it. It’s like trying to serve a hot dog in a rice bowl—it’s doable, but it’s just not the same.
+---

-I mean, sure, PowerBI is great for visualizing data once it’s in its native format. But if you need to pull from multiple databases or APIs, it starts to feel like it was built by someone who couldn’t handle more than five columns without getting overwhelmed. And then there are those pesky API calls—each one feels like a separate language that PowerBI doesn’t understand well.
-
-## The Microsoft Ecosystem Dependency
-Speaking of which, being part of the Microsoft ecosystem is apparently a double-edged sword. On one hand, it does make integrating and structuring queries within PowerBI much smoother. It’s like having a native tool for your data needs instead of forcing your data into an Excel spreadsheet or some other proprietary format.
-
-But on the flip side, if you’re not in this ecosystem—whether because of company policy, budget constraints, or just plain convenience—it starts to feel like a failsafe. Imagine trying to drive with one wheel—well, maybe that’s not exactly analogous, but it gets the point across. Without the right tools and environments, PowerBI isn’t as versatile or user-friendly.
-
-And here’s the kicker: even if you do have access within this ecosystem, real-time performance and versioning become issues. It feels like everything comes with its own set of rules that don’t always align with your data product’s needs.
-
-## The Visualization vs. Performance Trade-Off
-Now, I know what some of you are thinking—PowerBI is all about making data beautiful, right? And it does a fantastic job at that. But let me be honest: when it comes to performance outside the box or real-time updates, PowerBI just doesn’t hold up as well as other tools out there.
-
-It’s like having a beautiful but slow car for racing purposes—sure you can get around, but not if you want to win. Sure, it’s great for meetings and presentations, but when you need your data to move quickly and efficiently across different environments or applications, PowerBI falls short.
-
-## The Takeaway
-So after all that, here’s my bottom line: unless you’re in the Microsoft ecosystem—top to tail—you might be better off looking elsewhere. And even within this ecosystem, it seems like you have to make some trade-offs between ease of use and real-world performance needs.
-
-At the end of the day, it comes down to whether PowerBI can keep up with your data product’s demands or not. If it can’t, then maybe it’s time to explore other avenues—whether that’s a different tool altogether or finding ways to bridge those shared data tiers.
-
-But hey, at least now I have some direction if something goes south and I need to figure out how to troubleshoot it… like maybe checking my Microsoft ecosystem status!
+*So, is PowerBI still your best friend in this complex tech world? Or are there better tools out there waiting to be discovered? Share your thoughts and experiences below!*
generated_files/the_melding_of_data_engineering_and_ai.md (new file): 35 lines
@@ -0,0 +1,35 @@
+# Wrangling Data: A Reality Check
+
+Okay, let’s be honest. Data wrangling isn't glamorous. It’s not a sleek, automated process of magically transforming chaos into insights. It’s a messy, frustrating, and surprisingly human endeavor. Let’s break down the usual suspects – the steps we take to get even a vaguely useful dataset, and why they’re often a monumental task.
+
+**Phase 1: The Hunt**
+
+First, you’re handed a dataset. Let’s call it “Customer_Data_v2”. It’s… somewhere. Maybe a CSV file, maybe a database table, maybe a collection of spreadsheets that haven’t been updated since 2008. Finding it is half the battle. It's like searching for a decent cup of coffee in Melbourne – you know it’s out there, but it’s often hidden behind a wall of bureaucracy.
+
+**Phase 2: Deciphering the Ancient Texts**
+
+Once you *find* it, you start learning what it *means*. This is where things get… interesting. You’re trying to understand what fields represent, what units of measurement are used, and why certain columns have bizarre names (seriously, “Customer_ID_v3”?). It takes x amount of time (depends on the industry, right?). One week for a small bakery, six months for a multinational insurance company. It’s a wild ride.
+
+You’ll spend a lot of time trying to understand the business context. "CRMs" for Customer Relationship Management? Seriously? It’s a constant stream of jargon and acronyms that make your head spin.
+
+**Phase 3: The Schema Struggle**
+
+Then there’s the schema. Oh, the schema. It takes a couple of weeks to learn the schema. It’s like deciphering ancient hieroglyphics, except instead of predicting the rise and fall of empires, you’re trying to understand why a field called “Customer_ID_v3” exists. It’s a puzzle, and a frustrating one at that.
+
+**Phase 4: The Tooling Tango**
+
+You’ll wrestle with the tools. SQL interpreters, data transformation software – they’re all there, but they’re often clunky, outdated, and require a surprising amount of manual effort.
+
+**Phase 5: The Reporting Revelation (and Despair)**
+
+Finally, you get to the reporting tool. And cry. Seriously, who actually *likes* this part? It’s a soul-crushing exercise in formatting and filtering, and the output is usually something that nobody actually reads.
+
+**The AI Factor – A Realistic Perspective**
+
+Now, everyone’s talking about AI. And, look, I’m not saying AI is a bad thing. It’s got potential. But let’s be realistic. This will for quite some time be the point where we need people. AI can automate the process of extracting data from a spreadsheet. But it can’t understand *why* that spreadsheet was created in the first place. It can’t understand the context, the assumptions, the biases. It can’t tell you if the data is actually useful.
+
+We can use tools like DataHub to capture some of this business knowledge, but those tools are only as good as the people who use them. We need to make sure AI is used for those uniform parts – schema discovery, finding the tools, ugh, reporting. But where the rubber hits the road… that’s where we need people, making sure there is a person interpreting not only what goes out… but what goes in.
+
+**The Bottom Line**
+
+It’s a bit like trying to build a great BBQ. You can buy the fanciest gadgets and the most expensive wood, but if you don’t know how to cook, you’re going to end up with a burnt mess. So, let’s not get carried away with the hype. Let’s focus on building a data culture that values human intelligence, critical thinking, and a good dose of common sense. And let’s keep wrangling. Because, let’s be honest, someone’s gotta do it.
generated_files/when_to_use_ai.md (new, empty file)
requirements.txt
@@ -2,3 +2,5 @@ ollama
 trilium-py
 gitpython
 PyGithub
+chromadb
+langchain-ollama
src/ai_generators/ollama_md_generator.py
@@ -1,40 +1,148 @@
-import os
+import os, re, json, random, time
 from ollama import Client
+import chromadb
+from langchain_ollama import ChatOllama

 class OllamaGenerator:

-    def __init__(self, title: str, content: str, model: str):
+    def __init__(self, title: str, content: str, inner_title: str):
         self.title = title
+        self.inner_title = inner_title
         self.content = content
         self.response = None
+        self.chroma = chromadb.HttpClient(host="172.19.0.2", port=8000)
         ollama_url = f"{os.environ['OLLAMA_PROTOCOL']}://{os.environ['OLLAMA_HOST']}:{os.environ['OLLAMA_PORT']}"
         self.ollama_client = Client(host=ollama_url)
-        self.ollama_model = model
-
-    def generate_markdown(self) -> str:
-
-        prompt = f"""
-        You are a Software Developer and DevOps expert
-        who has transistioned in Developer Relations
-        writing a 1000 word blog for other tech enthusiast.
+        self.ollama_model = os.environ["EDITOR_MODEL"]
+        self.embed_model = os.environ["EMBEDDING_MODEL"]
+        self.agent_models = json.loads(os.environ["CONTENT_CREATOR_MODELS"])
+        self.llm = ChatOllama(model=self.ollama_model, temperature=0.6, top_p=0.5) # This is the level head in the room
+        self.prompt_inject = f"""
+        You are a journalist, Software Developer and DevOps expert
+        writing a 1000 word draft blog for other tech enthusiasts.
         You like to use almost no code examples and prefer to talk
         in a light comedic tone. You are also Australian
         As this person write this blog as a markdown document.
-        The title for the blog is {self.title}.
+        The title for the blog is {self.inner_title}.
         Do not output the title in the markdown.
         The basis for the content of the blog is:
         {self.content}
         Only output markdown DO NOT GENERATE AN EXPLANATION
         """
+    def split_into_chunks(self, text, chunk_size=100):
+        '''Split text into chunks of size chunk_size'''
+        words = re.findall(r'\S+', text)
+
+        chunks = []
+        current_chunk = []
+        word_count = 0
+
+        for word in words:
+            current_chunk.append(word)
+            word_count += 1
+
+            if word_count >= chunk_size:
+                chunks.append(' '.join(current_chunk))
+                current_chunk = []
+                word_count = 0
+
+        if current_chunk:
+            chunks.append(' '.join(current_chunk))
+
+        return chunks
+
+    def generate_draft(self, model) -> str:
+        '''Generate a draft blog post using the specified model'''
+        try:
+            # the idea behind this is to make the "creativity" random amongst the content creators
+            # controlling temperature will allow the output to make more "random" connections in sentences
+            # controlling top_p will tighten or loosen the embedding connections made
+            # the result should be varied levels of "creativity" in the writing of the drafts
+            # for more see https://python.langchain.com/v0.2/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html
+            temp = random.uniform(0.5, 1.0)
+            top_p = random.uniform(0.4, 0.8)
+            top_k = int(random.uniform(30, 80))
+            agent_llm = ChatOllama(model=model, temperature=temp, top_p=top_p, top_k=top_k)
+            messages = [
+                ("system", self.prompt_inject),
+                ("human", "make the blog post in a format to be edited easily")
+            ]
+            response = agent_llm.invoke(messages)
+            return response.text()
+
+        except Exception as e:
+            raise Exception(f"Failed to generate blog draft: {e}")
+
+    def get_draft_embeddings(self, draft_chunks):
+        '''Get embeddings for the draft chunks'''
+        embeds = self.ollama_client.embed(model=self.embed_model, input=draft_chunks)
+        return embeds.get('embeddings', [])
+
+    def load_to_vector_db(self):
+        '''Load the generated blog drafts into a vector database'''
+        collection_name = f"blog_{self.title.lower().replace(' ', '_')}"
+        collection = self.chroma.get_or_create_collection(name=collection_name, metadata={"hnsw:space": "cosine"})
+        for model in self.agent_models:
+            print(f"Generating draft from {model} for load into vector database")
+            draft_chunks = self.split_into_chunks(self.generate_draft(model))
+            print("generating embeds")
+            embeds = self.get_draft_embeddings(draft_chunks)
+            ids = [model + str(i) for i in range(len(draft_chunks))]
+            chunknumber = list(range(len(draft_chunks)))
+            metadata = [{"model_agent": model} for _ in chunknumber]
+            print('loading into collection')
+            collection.add(documents=draft_chunks, embeddings=embeds, ids=ids, metadatas=metadata)
+
+        return collection
+
+    def generate_markdown(self) -> str:
+
+        prompt_system = f"""
+        You are an editor taking information from {len(self.agent_models)} Software
+        Developers and Data experts
+        writing a 3000 word blog for other tech enthusiasts.
+        You like when they use almost no code examples and the
+        voice is in a light comedic tone. You are also Australian
+        As this person produce an amalgamation of this blog as a markdown document.
+        The title for the blog is {self.inner_title}.
+        Do not output the title in the markdown. Avoid repeated sentences
+        The basis for the content of the blog is:
+        {self.content}
+        """
         try:
-            self.response = self.ollama_client.chat(model=self.ollama_model,
-                messages=[
-                    {
-                        'role': 'user',
-                        'content': f'{prompt}',
-                    },
-                ])
-            return self.response['message']['content']
+            query_embed = self.ollama_client.embed(model=self.embed_model, input=prompt_system)['embeddings']
+            collection = self.load_to_vector_db()
+            collection_query = collection.query(query_embeddings=query_embed, n_results=100)
+            print("Showing pertinent info from drafts used in final edited edition")
+            pertinent_draft_info = '\n\n'.join(collection_query['documents'][0])
+            prompt_human = f"Generate the final document using this information from the drafts: {pertinent_draft_info} - ONLY OUTPUT THE MARKDOWN"
+            print("Generating final document")
+            messages = [("system", prompt_system), ("human", prompt_human),]
+            self.response = self.llm.invoke(messages).text()
+            return self.response

         except Exception as e:
             raise Exception(f"Failed to generate markdown: {e}")
@@ -42,3 +150,10 @@ class OllamaGenerator:
     def save_to_file(self, filename: str) -> None:
         with open(filename, "w") as f:
             f.write(self.generate_markdown())
+
+    def generate_commit_message(self):
+        prompt_system = "You are a blog creator committing a piece of content to a central git repo"
+        prompt_human = f"Generate a 10 word git commit message describing {self.response}"
+        messages = [("system", prompt_system), ("human", prompt_human),]
+        commit_message = self.llm.invoke(messages).text()
+        return commit_message
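Taken together, the class is driven with two calls, mirroring src/main.py below; the arguments are the filesystem-safe title, the raw Trilium note content, and the display title (values here are placeholders):

```python
import ai_generators.ollama_md_generator as omg

ai_gen = omg.OllamaGenerator("my_blog_title", "raw note content from Trilium", "My Blog Title")
ai_gen.save_to_file("/blog_creator/generated_files/my_blog_title.md")  # drafts -> Chroma -> edited markdown
print(ai_gen.generate_commit_message())
```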
src/main.py: 24 changes
@@ -1,5 +1,7 @@
 import ai_generators.ollama_md_generator as omg
 import trilium.notes as tn
+import repo_management.repo_manager as git_repo
+import string,os

 tril = tn.TrilumNotes()

@@ -7,16 +9,26 @@ tril.get_new_notes()
 tril_notes = tril.get_notes_content()


-def convert_to_lowercase_with_underscores(string):
-    return string.lower().replace(" ", "_")
+def convert_to_lowercase_with_underscores(s):
+    allowed = set(string.ascii_letters + string.digits + ' ')
+    filtered_string = ''.join(c for c in s if c in allowed)
+    return filtered_string.lower().replace(" ", "_")


 for note in tril_notes:
     print(tril_notes[note]['title'])
     # print(tril_notes[note]['content'])
     print("Generating Document")
-    ai_gen = omg.OllamaGenerator(tril_notes[note]['title'],
-                                 tril_notes[note]['content'],
-                                 "deepseek-r1:7b")
     os_friendly_title = convert_to_lowercase_with_underscores(tril_notes[note]['title'])
-    ai_gen.save_to_file(f"/blog_creator/generated_files/{os_friendly_title}.md")
+    ai_gen = omg.OllamaGenerator(os_friendly_title,
+                                 tril_notes[note]['content'],
+                                 tril_notes[note]['title'])
+    blog_path = f"/blog_creator/generated_files/{os_friendly_title}.md"
+    ai_gen.save_to_file(blog_path)
+    # Generate commit messages and push to repo
+    commit_message = ai_gen.generate_commit_message()
+    git_user = os.environ["GIT_USER"]
+    git_pass = os.environ["GIT_PASS"]
+    repo_manager = git_repo.GitRepository("blog/", git_user, git_pass)
+    repo_manager.create_copy_commit_push(blog_path, os_friendly_title, commit_message)
push_markdown.py (deleted file)
@@ -1,48 +0,0 @@
-import os
-import sys
-from git import Repo
-
-# Set these variables accordingly
-REPO_OWNER = "your_repo_owner"
-REPO_NAME = "your_repo_name"
-
-def clone_repo(repo_url, branch="main"):
-    Repo.clone_from(repo_url, ".", branch=branch)
-
-def create_markdown_file(file_name, content):
-    with open(f"{file_name}.md", "w") as f:
-        f.write(content)
-
-def commit_and_push(file_name, message):
-    repo = Repo(".")
-    repo.index.add([f"{file_name}.md"])
-    repo.index.commit(message)
-    repo.remote().push()
-
-def create_new_branch(branch_name):
-    repo = Repo(".")
-    repo.create_head(branch_name).checkout()
-    repo.head.reference.set_tracking_url(f"https://your_git_server/{REPO_OWNER}/{REPO_NAME}.git/{branch_name}")
-    repo.remote().push()
-
-if __name__ == "__main__":
-    if len(sys.argv) < 3:
-        print("Usage: python push_markdown.py <repo_url> <markdown_file_name>")
-        sys.exit(1)
-
-    repo_url = sys.argv[1]
-    file_name = sys.argv[2]
-
-    # Clone the repository
-    clone_repo(repo_url)
-
-    # Create a new Markdown file with content
-    create_markdown_file(file_name, "Hello, World!\n")
-
-    # Commit and push changes to the main branch
-    commit_and_push(file_name, f"Add {file_name}.md")
-
-    # Create a new branch named after the Markdown file
-    create_new_branch(file_name)
-
-    print(f"Successfully created '{file_name}' branch with '{file_name}.md'.")
src/repo_management/repo_manager.py
@@ -1,35 +1,91 @@
-import os
-from git import Git
-from git.repo import BaseRepository
-from git.exc import InvalidGitRepositoryError
-from git.remote import RemoteAction
+import os, shutil
+from git import Repo
+from git.exc import GitCommandError

-# Set the path to your blog repo here
-blog_repo = "/path/to/your/blog/repo"
+class GitRepository:
+    # This is designed to be transitory: it will destructively create the repo at repo_path.
+    # If you have uncommitted changes you can kiss them goodbye!
+    # Don't use the repo created by this function for dev -> its a tool!
+    # It is expected that when used you will add, commit, push, delete
+    def __init__(self, repo_path, username=None, password=None):
+        git_protocol = os.environ["GIT_PROTOCOL"]
+        git_remote = os.environ["GIT_REMOTE"]
+        remote = f"{git_protocol}://{username}:{password}@{git_remote}"

-# Checkout a new branch and create a new file for our blog post
-branch_name = "new-post"
+        if os.path.exists(repo_path):
+            shutil.rmtree(repo_path)
+        self.repo_path = repo_path
+        Repo.clone_from(remote, repo_path)
+        self.repo = Repo(repo_path)
+        self.username = username
+        self.password = password

+    def clone(self, remote_url, destination_path):
+        """Clone a Git repository with authentication"""
         try:
-            repo = Git(blog_repo)
-            repo.checkout("-b", branch_name, "origin/main")
-            with open("my-blog-post.md", "w") as f:
-                f.write(content)
-        except InvalidGitRepositoryError:
-            # Handle repository errors gracefully
-            pass
+            Repo.clone_from(remote_url, destination_path)
+            return True
+        except GitCommandError as e:
+            print(f"Cloning failed: {e}")
+            return False

-        # Add and commit the changes to Git
-        repo.add("my-blog-post.md")
-        repo.commit("-m", "Added new blog post about DevOps best practices.")
-
-        # Push the changes to Git and create a PR
-        repo.remote().push("refs/heads/{0}:refs/for/main".format(branch_name), "--set-upstream")
-        base_branch = "origin/main"
-        target_branch = "main"
-        pr_title = "DevOps best practices"
+    def fetch(self, remote_name='origin', ref_name='main'):
+        """Fetch updates from a remote repository with authentication"""
         try:
-            repo.create_head("{0}-{1}", base=base_branch, message="{}".format(pr_title))
-        except RemoteAction.GitExitStatus as e:
-            # Handle Git exit status errors gracefully
-            pass
+            self.repo.remotes[remote_name].fetch(refspec=ref_name)
+            return True
+        except GitCommandError as e:
+            print(f"Fetching failed: {e}")
+            return False
+
+    def pull(self, remote_name='origin', ref_name='main'):
+        """Pull updates from a remote repository with authentication"""
+        try:
+            self.repo.remotes[remote_name].pull(refspec=ref_name)
+            return True
+        except GitCommandError as e:
+            print(f"Pulling failed: {e}")
+            return False
+
+    def get_branches(self):
+        """List all branches in the repository"""
+        return [branch.name for branch in self.repo.branches]
+
+    def create_branch(self, branch_name, remote_name='origin', ref_name='main'):
+        """Create a new branch in the repository with authentication."""
+        try:
+            # Create and switch to the new branch
+            self.repo.git.checkout('-b', branch_name)
+            return True
+        except GitCommandError as e:
+            print(f"Failed to create branch: {e}")
+            return False
+
+    def add_and_commit(self, message=None):
+        """Add and commit changes to the repository."""
+        try:
+            # Add all changes
+            self.repo.git.add(all=True)
+            # Commit with the provided message or a default
+            if message is None:
+                commit_message = "Added and committed new content"
+            else:
+                commit_message = message
+            self.repo.git.commit('-m', commit_message)
+            return True
+        except GitCommandError as e:
+            print(f"Commit failed: {e}")
+            return False
+
+    def create_copy_commit_push(self, file_path, title, commit_message):
+        self.create_branch(title)
+
+        shutil.copy(f"{file_path}", f"{self.repo_path}src/content/")
+
+        self.add_and_commit(commit_message)
+
+        self.repo.git.push('origin', title, force=True)
+
+    def remove_repo(self):
+        shutil.rmtree(self.repo_path)
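Since the class is deliberately destructive, the intended lifecycle is short-lived: construct (fresh clone), branch-copy-commit-push, delete. A sketch under the assumption that GIT_PROTOCOL and GIT_REMOTE are set and the credentials are throwaway:

```python
from repo_management.repo_manager import GitRepository

repo = GitRepository("blog/", username="blog-bot", password="app-token")  # hypothetical credentials
repo.create_copy_commit_push("/blog_creator/generated_files/my_post.md", "my_post", "Add my_post draft")
repo.remove_repo()  # clean up the transient clone
```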
src/trilium/notes.py
@@ -18,9 +18,13 @@ class TrilumNotes:
             print("Please run get_token and set your token")
         else:
             self.ea = ETAPI(self.server_url, self.token)
+        self.new_notes = None
+        self.note_content = None

     def get_token(self):
         ea = ETAPI(self.server_url)
+        if self.tril_pass == None:
+            raise ValueError("Trillium password can not be none")
         token = ea.login(self.tril_pass)
         print(token)
         print("I would recommend you update the env file with this tootsweet!")

@@ -40,10 +44,11 @@ class TrilumNotes:

     def get_notes_content(self):
         content_dict = {}
+        if self.new_notes is None:
+            raise ValueError("How did you do this? new_notes is None!")
         for note in self.new_notes['results']:
             content_dict[note['noteId']] = {"title" : f"{note['title']}",
                                             "content" : f"{self._get_content(note['noteId'])}"
                                             }
         self.note_content = content_dict
         return content_dict