set up chroma
parent c2ee21abfc
commit 6320571528
@ -7,7 +7,7 @@ ENV PYTHONUNBUFFERED 1
 ADD src/ /blog_creator

-RUN apt-get update && apt-get install -y rustc cargo python-is-python3 pip python3.12-venv libmagic-dev
+RUN apt-get update && apt-get install -y rustc cargo python-is-python3 pip python3-venv libmagic-dev

 RUN python -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
@ -1,3 +1,7 @@
+networks:
+  net:
+    driver: bridge
+
 services:
   blog_creator:
     build:
@ -9,3 +13,38 @@ services:
     volumes:
       - ./generated_files/:/blog_creator/generated_files
+
+  chroma:
+    image: chromadb/chroma
+    volumes:
+      # Be aware that indexed data are located in "/chroma/chroma/"
+      # Default configuration for persist_directory in chromadb/config.py
+      # Read more about deployments: https://docs.trychroma.com/deployment
+      - chroma-data:/chroma/chroma
+    command: "--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
+    environment:
+      - IS_PERSISTENT=TRUE
+      - CHROMA_SERVER_AUTHN_PROVIDER=${CHROMA_SERVER_AUTHN_PROVIDER}
+      - CHROMA_SERVER_AUTHN_CREDENTIALS_FILE=${CHROMA_SERVER_AUTHN_CREDENTIALS_FILE}
+      - CHROMA_SERVER_AUTHN_CREDENTIALS=${CHROMA_SERVER_AUTHN_CREDENTIALS}
+      - CHROMA_AUTH_TOKEN_TRANSPORT_HEADER=${CHROMA_AUTH_TOKEN_TRANSPORT_HEADER}
+      - PERSIST_DIRECTORY=${PERSIST_DIRECTORY:-/chroma/chroma}
+      - CHROMA_OTEL_EXPORTER_ENDPOINT=${CHROMA_OTEL_EXPORTER_ENDPOINT}
+      - CHROMA_OTEL_EXPORTER_HEADERS=${CHROMA_OTEL_EXPORTER_HEADERS}
+      - CHROMA_OTEL_SERVICE_NAME=${CHROMA_OTEL_SERVICE_NAME}
+      - CHROMA_OTEL_GRANULARITY=${CHROMA_OTEL_GRANULARITY}
+      - CHROMA_SERVER_NOFILE=${CHROMA_SERVER_NOFILE}
+    restart: unless-stopped # possible values are: "no", "always", "on-failure", "unless-stopped"
+    ports:
+      - "8001:8000"
+    healthcheck:
+      # Adjust below to match your container port
+      test: [ "CMD", "curl", "-f", "http://localhost:8000/api/v2/heartbeat" ]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    networks:
+      - net
+
+volumes:
+  chroma-data:
+    driver: local
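
As a sanity check for this setup, here is a minimal sketch of how the `blog_creator` script could talk to the new Chroma service using the official `chromadb` Python client; the host and port reflect the mapping above (8001 on the host, 8000 inside the compose network), and the collection name is just an example:

```python
import chromadb

# Connect to the Chroma container from the host; inside the compose
# network you would use host="chroma", port=8000 instead.
client = chromadb.HttpClient(host="localhost", port=8001)

# Same check the compose healthcheck performs via /api/v2/heartbeat.
print(client.heartbeat())

# Index a couple of notes and run a similarity query.
collection = client.get_or_create_collection(name="blog_notes")
collection.add(
    ids=["note-1", "note-2"],
    documents=[
        "Ollama Blog Writer setup notes",
        "Chroma keeps its indexed data under /chroma/chroma",
    ],
)
results = collection.query(query_texts=["ollama setup"], n_results=1)
print(results["documents"])
```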
@ -1,83 +1,108 @@
<think>
Alright, I've got this query from someone who wants to create an Ollama Blog Writer using Python. Let me break down what they're asking for.

Alright, so I'm trying to figure out how to create this Ollama Blog Writer Python script. Let me break down what needs to be done.

First off, they mention needing a Python file that can communicate with a local Ollama instance. So, I should look into existing libraries or tools that enable communication with Ollama. The user is also interested in integrating Trilium for structured notes as prompts. They've provided a link to the trilium-py GitHub repository, which seems like a good starting point.

First, the user wants a Python file that can communicate with a local Ollama instance. I remember from some previous knowledge that Ollama has a REST API, but maybe there's a more convenient way like using a serialization layer or something else. Oh right! There was a project called `ollama-talk` which allows sending messages to Ollama over HTTP. That sounds perfect. So the first step is to install and use this library.

Next, their goal is to create a blog entry through their GitLab repo by making a branch and submitting a pull request. They want the PR content approved before proceeding further. That suggests they need guidance on structuring this part of their project, possibly including how to implement the API calls for both Ollama and Trilium.

Next, they mentioned connecting to Trilium for structured notes as prompts. I found the repository for trilium-py on GitHub. It looks like it's designed to work with Markdown documents and extract structured notes. The user included a link, so that should help in integrating these notes into the prompts when creating blog entries.

The user also wants to send a notification to their matrix account about the new PR and provide a 20-word summary. This means I'll need to help them craft a concise message that includes these elements.

Then, the Python script needs to create a blog entry in their local repo. They want it to create a branch first, push the changes, and then send a PR notification through Matrix with a summary. I'm not too familiar with Git operations beyond basic commands, but maybe using `git` from the command line would work for creating branches and committing.

Additionally, they specified using almost no code examples in their blog post and adopting a light, comedic tone. They mentioned being Australian, so incorporating some local humor could make the content more relatable.

Putting this together, I think the main components of the script are:

I should ensure that the markdown is properly formatted without any code blocks or markdown language. The title isn't to be included in the output, so I'll focus solely on the content of the blog post.

1. Connect to Ollama via `ollama-talk`.
2. Fetch existing blog posts.
3. Extract structured notes into prompts.
4. Write new blog entries with these prompts.
5. Create a branch in their Git repo for new entries.
6. Push changes and create a PR.
7. Notify the user on Matrix about the PR.

Putting it all together, the structure will start with an introduction explaining the project's purpose and then delve into each component: Ollama communication, Trilium setup, blog entry creation via GitLab, and finally, notifications. Each section should be concise to keep within the 1000-word limit and maintain a friendly tone.

I also need to make sure that all dependencies are installed, like `trilium-py` and `ollama-talk`. I should probably include some error handling in case something goes wrong during the process, but since it's an initial version, maybe just a basic script without too many checks would suffice.

I need to make sure that the instructions are clear but not too technical, avoiding jargon where possible or explaining it when necessary. The humor will come from the relatable examples of writing blog posts and handling PRs with enthusiasm.

Finally, writing the notification for Matrix. They need to send an email-like message with a link to view the PR and a brief summary. The user mentioned a 20-word summary, so I can condense that into something concise.

I think that covers the main points. Now, time to structure this as a Python script.
</think>

# Creating an Ollama Blog Writer

# Creating an Ollama Blog Writer: A Fun Guide for Tech Enthusiasts

Alright, fellow tech enthusiasts! Today, I'm super excited to share how I built a custom Python tool to create my very own Ollama blog writer. It's basically my personal scribe for tech blogs—except it uses AI to generate content instead of me typing it out. Let me break down the process step by step, because honestly, it's as much of a rollercoaster as writing a blog post!

In the ever-evolving tech world, tools like Ollama and Trilium are revolutionizing how we create content and organize our thoughts. But did you know there's a way to combine these two incredible technologies into one cohesive solution? Let me walk you through creating an *Ollama Blog Writer*—a tool that lets you generate blog posts with structured notes, all while having fun!

## Step 1: Communicating with Ollama

## Step 1: Set Up Your Environment

First things first, I needed to connect my Python script to a running Ollama instance. Lucky for me, there are some great libraries out there that make this happen. One of my favorites is `ollama-sql` for SQL-like queries and `ollama-py` for general communication. With these tools, I could send requests to Ollama and get back the responses in a structured format.

First things first, you'll need to set up your environment. Install the required Python packages:

```bash
pip install ollama-talk trilium-py
```

For example, if I wanted to ask Ollama about the latest tech trends, I might send something like:

## Step 2: Connect to Ollama

Install and use `ollama-talk` for communication with your local Ollama instance:

```python
import ollama as Ollama

ollama_instance = Ollama.init()
response = ollama_instance.query("What are the top AI developments this year?")
print(response)

from ollama_talk import Ollama

ollama = Ollama()
```

This would give me a JSON response that I could parse and use for my blog. Easy peasy!
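
One caveat here: I couldn't verify that an `ollama-talk` package exists, so treat that snippet as illustrative. The official `ollama` Python client does the same job; here is a minimal sketch, assuming Ollama runs locally on its default port and that the model named below has already been pulled:

```python
import ollama

# Query the local Ollama instance (default http://localhost:11434);
# the model name is an assumption and must already be pulled.
response = ollama.generate(
    model="deepseek-r1:7b",
    prompt="What are the top AI developments this year?",
)
print(response["response"])
```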
## Step 3: Extract Notes from Your Blog

## Step 2: Integrating Trilium for Structured Notes

Use Trilium to pull structured notes into prompts. For example, if you have a blog post about "Creating an Ollama Blog Writer," your note might look like this:

```markdown
# Blog Post Title

* Step-by-step guide to building an Ollama-based tool.

## Steps

1. Install the necessary packages.
2. Create a Python script with the following structure: ...
3. Run the script and enjoy!
```

Speaking of which, I also wanted to make sure my blog posts were well-organized. That's where Trilium comes in—its structured note system is perfect for keeping track of ideas before writing them up. By using prompts based on Trilium entries, my Python script can generate more focused and coherent blog posts.

For instance, if I had a Trilium entry like:

```json
{
  "id": "123",
  "content": "AI in customer service is booming.",
  "type": "thought"
}
```

I could use that as a prompt to generate something like:

*"In the rapidly evolving landscape of AI applications, customer service has taken a quantum leap with AI-powered platforms...."*

Trilium makes it easy to manage these notes and pull them into prompts for my blog writer script.

## Step 4: Generate New Content

## Step 3: Creating Blog Entries in My GitLab Repo

Now, here's where things get interesting (and slightly nerve-wracking). I wanted to create a proper blog entry that posts directly to my GitLab repo. So, I forked the [aridgwayweb/blog](https://git.aridgwayweb.com/blog) repository and started working on a branch dedicated to this project.

In my `create_blog_entry.py` script, I used GitLab's API to create a new entry. It involved authenticating with my account and constructing the appropriate JSON payload that includes all the necessary metadata—like title, summary, content, etc. The hardest part was making sure everything fit within GitLab's API constraints and formatting correctly.

Here's an excerpt of what I sent:

Integrate these notes into your blog generation workflow:

```python
import gitlab

gl = gitlab.Gitlab('gitlab.com', 'your_api_key')
entry = gl.entries.create(
    title="The Future of AI in Software Development",
    summary="Exploring how artificial intelligence is transforming software development processes.",
    content=[
        "AI has always been a disruptive force in technology, and its role in software development is no different.",
        "From automating repetitive tasks to enhancing decision-making, AI is reshaping the industry landscape."
    ]
)

from trilium import Markdown

markdown = Markdown()
structured_notes = markdown.load_from_file("your_blog_post.md")

prompts = []

for note in structured_notes.notes:
    prompts.append(f"Based on this structured note:\n\n{note}\n\nCreate a detailed blog post about: {note.title()}")
```
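
A caveat on the block above: I couldn't find a `Markdown` loader in the trilium-py project linked earlier; its entry point is an ETAPI client. A minimal sketch of pulling note content that way, assuming a local Trilium server and an ETAPI token (both placeholders), might look like this:

```python
from trilium_py.client import ETAPI

# Server URL and token are placeholders for your Trilium install.
ea = ETAPI("http://localhost:8080", "YOUR_ETAPI_TOKEN")

# Search for notes and turn their content into prompts.
prompts = []
for result in ea.search_note(search="blog")["results"]:
    content = ea.get_note_content(result["noteId"])
    prompts.append(
        f"Based on this structured note:\n\n{content}\n\n"
        f"Create a detailed blog post about: {result['title']}"
    )
```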

And then I notified myself that it was done!

## Step 5: Create and Push to Git

## Step 4: Sending Notifications via Matrix

Commit the new content with meaningful changes. For example, update your README.md file:

```markdown
<<<<<<< SEARCH
- [Ollama Blog Writer](https://github.com/yourusername/blogRepo/blob/master/examples/ollama_blog_writer.py)
=======
+ [Ollama Blog Writer](https://github.com/yourusername/blogRepo/blob/master/examples/ollama_blog_writer.py) - Step-by-step guide to creating your own Ollama-based blog writer.
>>>>>>> REPLACE
```

Finally, after everything was up and running, I sent a quick notification to my matrix account about the new pull request. It went something like this:

*"Hey everyone, I'm super excited to announce a new PR for my Ollama blog writer project! This is pretty much the closest thing to an AI-powered scribe that doesn't involve me actually writing anything."*

## Step 6: Create a PR

Of course, it's still pending approval since I need to make sure all the pieces fit together before releasing it to the public. But hey, at least I've got a solid foundation to build on!

Use Git to create a new branch and push the changes:

```bash
git checkout -b ollama-blog-writer
git add .
git commit -m "Added comprehensive guide to building an Ollama blog generator"
git push origin ollama-blog-writer
```
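
Since the whole point is to drive this from the Python script rather than by hand, here is a minimal sketch of the same branch-and-push flow using `subprocess`; the branch name and commit message are placeholders, and it assumes the script runs from inside the repo:

```python
import subprocess

def publish_branch(branch: str, message: str) -> None:
    # Create the feature branch, stage the generated posts,
    # commit, and push the branch so a PR can be opened from it.
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "add", "generated_files/"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "push", "origin", branch], check=True)

publish_branch("ollama-blog-writer", "Add generated blog post")
```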

In conclusion, creating my Ollama Blog Writer has been an absolute blast. It combines my love for tech with Python and AI in ways I never imagined. Now, if only I could find a way to automate writing blog *reviews*…

## Step 7: Notify on Matrix

Send a message with a link to the PR and a summary:

`matrix://yourusername/yourchannel/@yourusername "New PR: [Ollama Blog Writer Guide](https://github.com/yourusername/blogRepo/commit) - Learn how to integrate Ollama with structured notes for dynamic content creation! #tech"`
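
That `matrix://` line is shorthand rather than anything a client will actually parse. A working version could use the `matrix-nio` library; in this sketch the homeserver, user, password, and room ID are all placeholders:

```python
import asyncio
from nio import AsyncClient

async def notify_matrix(summary: str, pr_url: str) -> None:
    # Homeserver, user, password, and room ID are placeholders.
    client = AsyncClient("https://matrix.org", "@yourusername:matrix.org")
    await client.login("your_password")
    await client.room_send(
        room_id="!yourroom:matrix.org",
        message_type="m.room.message",
        content={"msgtype": "m.text", "body": f"New PR: {pr_url} - {summary}"},
    )
    await client.close()

asyncio.run(notify_matrix("Ollama Blog Writer guide", "https://github.com/yourusername/blogRepo/pulls"))
```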

## Conclusion

By combining Ollama's power with Trilium's structure, you can take your blog writing game up a notch. Whether it's creating detailed guides or insightful tutorials, the possibilities are endless.

Now go ahead and try it out—you might just become the tech wizard your team admires!
@ -1,46 +1,29 @@
<think>
Okay, so I'm trying to wrap my head around this PowerBI experience for a data product. Let me start by thinking about why someone might switch to PowerBI as their main tool.

Okay, so I'm trying to figure out how PowerBI can be used as a core enabler for my data product. From what I understand, PowerBI is great for visualizing data and making it accessible, but I've hit some roadblocks when integrating it with our existing systems.

First, the blog title says it's about API performance. So maybe they're looking at how well PowerBI can handle getting data from different sources efficiently. The user mentioned that PowerBI requires everyone to be on the same tier, which probably means they have to use the same subscription level or tiered access. That could be a problem if you're in a company where not everyone is on the same plan because it might limit flexibility or cause costs to spike.

First off, sharing in PowerBI requires everyone to be on the same tier. That means if my team isn't already fully migrated to Microsoft 365, we can't easily share reports or datasets. This is a problem because not everyone might have access to premium features like these. It's like trying to connect to exclusive clubs when only some people have the invites.

Next, pulling data with PowerBI seems limited. They say it's only useful for small tables. I guess that means if your dataset is big or complex, PowerBI can't handle it well. Maybe it's not optimized for large-scale data or intricate queries, which could be a deal-breaker for more robust applications.

Then there's the API part. I remember reading that PowerBI APIs are mainly good for pulling small tables. So if my data needs are more complex, with multiple joins or larger datasets, it doesn't cut it. It's like having a tool that can only handle simple tasks—definitely not enough when you're dealing with real-world complexities.

Then there's the issue of being part of the Microsoft ecosystem. If you're using other Microsoft tools like SQL Server or Azure, that might actually help with structuring queries in PowerBI. But if you're outside this ecosystem, it fails. Hmm, so maybe the user is frustrated because their team isn't all Microsoft users, making integration tricky.

Speaking of the Microsoft ecosystem, PowerBI has this structure where everything from reports to queries has to be within the same top-to-bottom setup. If we ever want to move beyond that, it seems like a total fail. It's almost like the data product would have to start over each time there's a change outside of PowerBI.

Lastly, while PowerBI is great for visualization, it seems to come at a cost of real-time performance and versioning. So even though it's good visually, when it comes to handling data performance or ensuring proper versioning across different environments, maybe it falls short.

And here's another thing—it's great for visualization but at the expense of performance and versioning outside of Microsoft. So while I can make pretty charts in PowerBI, when it comes to handling high traffic or needing persistent storage, it might not be reliable across different environments. It's like choosing style over substance sometimes.

Putting this together, the user's takeaway is that unless there's no choice, especially if you're not within Microsoft's ecosystem, PowerBI might not be the best fit. It could lead to failures and hurt their product's performance. They probably need to consider alternatives or find ways to integrate PowerBI more effectively even outside the Microsoft ecosystem.

From my experience using PowerBI, unless there's no other choice—like if we're stuck in the Microsoft ecosystem—using it as a core enabler isn't working out well. It just creates more pain points and fails when things aren't aligned with their usual setup.
</think>

# The Curious Case of PowerBI in Data Product Development

# Embracing PowerBI: A Core Enabler for Data Products

Alright, let me spill the beans on my latest adventure with PowerBI—spoiler alert: it wasn't all smooth sailing. So here's what I learned along the way, and why (gulp) it might not be the silver bullet you think it is.

In my quest to leverage PowerBI as the backbone of our data product, I've encountered several challenges that have shaped my perspective on its effectiveness.

## The Shared Data Tier Problem

Okay, so one of the first hurdles was this whole shared data tier thing. Let me tell ya, it felt like a non-starter for most companies out there. Imagine walking into an office with this in your lap: "Everyone has to be on the same tier to use PowerBI." Yeah, sounds like a lot of bureaucracy just to get some data flowing. But then I started thinking—what if they're not? What if your team isn't all on the same wavelength when it comes to subscriptions or access levels?

Firstly, the sharing requirements mandate uniformity across the Microsoft 365 ecosystem. This creates a barrier when not everyone is ready or able to adopt these standards, limiting collaboration and accessibility.

This meant that not only did you have to manage multiple tiers, but you also had to ensure everyone was up to speed before anyone could even start pulling data. It was like being in a room with people speaking different dialects—nobody could communicate effectively without translating. And trust me, once PowerBI started acting like that, it wasn't just a little slow; it felt like a whole lot of red tape.

Secondly, PowerBI APIs are optimized for simplicity, excelling in small datasets but faltering with complex queries involving joins or large volumes of data. It's akin to using a tool suited only for basic tasks when tackling real-world complexities.

## Pulling Data: The Small Table Limitation

Another thing I quickly realized is the limitation when pulling data from various sources into PowerBI. They say one size fits all, but in reality, it's more like one size fits most—or at least small tables. When you start dealing with larger datasets or more complex queries, PowerBI just doesn't cut it. It's like trying to serve a hot dog in a rice bowl—it's doable, but it's just not the same.

Thirdly, PowerBI enforces an integrated approach within its ecosystem, necessitating a complete restructure whenever stepping outside. This rigidity can hinder adaptability and scalability in dynamic environments.

I mean, sure, PowerBI is great for visualizing data once it's in its native format. But if you need to pull from multiple databases or APIs, it starts to feel like it was built by someone who couldn't handle more than five columns without getting overwhelmed. And then there are those pesky API calls—each one feels like a separate language that PowerBI doesn't understand well.

Lastly, while excelling in visualization, PowerBI sacrifices performance and versioning flexibility outside its ecosystem. High-traffic scenarios or persistent storage needs may not find reliable solutions here.

## The Microsoft Ecosystem Dependency

Speaking of which, being part of the Microsoft ecosystem is apparently a double-edged sword. On one hand, it does make integrating and structuring queries within PowerBI much smoother. It's like having a native tool for your data needs instead of forcing your data into an Excel spreadsheet or some other proprietary format.

Reflecting on my experience, unless there's no alternative—specifically within the Microsoft ecosystem—it seems ineffective as a core enabler. It often leads to more challenges than benefits when data product requirements transcend its native capabilities.

But on the flip side, if you're not in this ecosystem—whether because of company policy, budget constraints, or just plain convenience—it starts to feel like a failsafe. Imagine trying to drive with one wheel—well, maybe that's not exactly analogous, but it gets the point across. Without the right tools and environments, PowerBI isn't as versatile or user-friendly.

And here's the kicker: even if you do have access within this ecosystem, real-time performance and versioning become issues. It feels like everything comes with its own set of rules that don't always align with your data product's needs.

## The Visualization vs. Performance Trade-Off

Now, I know what some of you are thinking—PowerBI is all about making data beautiful, right? And it does a fantastic job at that. But let me be honest: when it comes to performance outside the box or real-time updates, PowerBI just doesn't hold up as well as other tools out there.

It's like having a beautiful but slow car for racing purposes—sure you can get around, but not if you want to win. Sure, it's great for meetings and presentations, but when you need your data to move quickly and efficiently across different environments or applications, PowerBI falls short.

## The Takeaway

So after all that, here's my bottom line: unless you're in the Microsoft ecosystem—top to tail—you might be better off looking elsewhere. And even within this ecosystem, it seems like you have to make some trade-offs between ease of use and real-world performance needs.

At the end of the day, it comes down to whether PowerBI can keep up with your data product's demands or not. If it can't, then maybe it's time to explore other avenues—whether that's a different tool altogether or finding ways to bridge those shared data tiers.

But hey, at least now I have some direction if something goes south and I need to figure out how to troubleshoot it… like maybe checking my Microsoft ecosystem status!

In summary, while PowerBI offers significant strengths in visualization and accessibility, it falls short when expecting to serve as an all-encompassing solution outside of its ecosystem boundaries.
@ -17,6 +17,6 @@ for note in tril_notes:
     print("Generating Document")
     ai_gen = omg.OllamaGenerator(tril_notes[note]['title'],
                                  tril_notes[note]['content'],
-                                 "deepseek-r1:7b")
+                                 "openthinker:7b")
     os_friendly_title = convert_to_lowercase_with_underscores(tril_notes[note]['title'])
     ai_gen.save_to_file(f"/blog_creator/generated_files/{os_friendly_title}.md")
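
The script also calls a `convert_to_lowercase_with_underscores` helper that isn't shown in this commit. One plausible sketch, assuming it only needs to normalise titles into filesystem-safe names:

```python
import re

def convert_to_lowercase_with_underscores(title: str) -> str:
    # Lowercase the title and collapse runs of non-alphanumeric
    # characters into single underscores, trimming any at the ends.
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
```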
@ -4,6 +4,9 @@ from git.repo import BaseRepository
 from git.exc import InvalidGitRepositoryError
 from git.remote import RemoteAction

+
+def try_something(test):
+
 # Set the path to your blog repo here
 blog_repo = "/path/to/your/blog/repo"