
Source Code for the Python Jogja x Berijalan Techno Center Member of ASTRA event

by Ekky Armandi • 23 Sep 2024

Source Code for downloading online articles to Markdown

Below is the source code for downloading an online article and saving it as a Markdown (.md) file using Newspaper3k. GitHub Repo.

  • Dependencies to install
    newspaper3k==0.2.8
  • How to run it
    python app1.py
    or
    python3 app1.py
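
To give a rough idea of the approach, here is a minimal sketch of an article-to-Markdown download, assuming Newspaper3k's `Article` API (`download()`, `parse()`, and the `title`, `text`, `authors`, `publish_date` attributes). The function names `to_markdown` and `download_to_markdown`, the default output filename, and the Markdown layout are illustrative assumptions, not the code from the repo.

```python
from pathlib import Path


def to_markdown(title, text, authors=None, publish_date=None, url=None):
    """Render article fields as a simple Markdown document.

    The layout (H1 title, metadata lines, body) is an assumption
    made for this sketch.
    """
    lines = [f"# {title}", ""]
    if authors:
        lines.append(f"*By {', '.join(authors)}*")
    if publish_date:
        lines.append(f"*Published: {publish_date}*")
    if url:
        lines.append(f"[Source]({url})")
    if len(lines) > 2:
        # Blank line between the metadata and the body.
        lines.append("")
    lines.append(text.strip())
    return "\n".join(lines) + "\n"


def download_to_markdown(url, out_path="article.md"):
    """Download one article with Newspaper3k and save it as a .md file."""
    # Imported here so to_markdown() stays usable without the dependency.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the HTML
    article.parse()     # extract title, authors, text, publish date

    markdown = to_markdown(
        article.title,
        article.text,
        authors=article.authors,
        publish_date=article.publish_date,
        url=url,
    )
    Path(out_path).write_text(markdown, encoding="utf-8")
    return out_path
```

Calling `download_to_markdown()` with an article URL would write `article.md` to the current directory. Keeping the formatting in its own function makes it easy to change the Markdown layout without touching the download step.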

Source Code for downloading online articles to Spreadsheets

Below is the source code for downloading online articles and saving them as a spreadsheet (.csv) using Newspaper3k and Pandas. To save them as an Excel file (.xlsx) instead, don't forget to install the openpyxl library. GitHub Repo.

  • Dependencies to install
    rich==13.8.1
    bs4==0.0.2
    scrapy==2.11.2
    pandas==2.2.2
    requests==2.32.3
  • How to run it
    python app2.py
    or
    python3 app2.py
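
The spreadsheet step can be sketched as follows, assuming the articles have already been scraped into a list of dictionaries. The function names `build_rows` and `save_articles` and the column names (`title`, `url`, `published`, `text`) are hypothetical choices for this sketch, not taken from the repo; only `DataFrame.to_csv` and `DataFrame.to_excel` are actual Pandas calls.

```python
def build_rows(articles):
    """Keep only the columns we want, one dict per article.

    Missing fields become empty strings so every row has the same keys.
    The column names are assumptions for this sketch.
    """
    columns = ("title", "url", "published", "text")
    return [{col: article.get(col, "") for col in columns} for article in articles]


def save_articles(articles, path="articles.csv"):
    """Write articles to .csv, or to .xlsx when the path ends with .xlsx."""
    # Imported lazily so build_rows() works without pandas installed.
    import pandas as pd

    df = pd.DataFrame(build_rows(articles))
    if str(path).endswith(".xlsx"):
        df.to_excel(path, index=False)  # requires openpyxl
    else:
        df.to_csv(path, index=False, encoding="utf-8")
    return path
```

For example, `save_articles(articles, "articles.xlsx")` would produce an Excel file, provided openpyxl is installed.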

Source Code for a Data Scraping pipeline to a Vector DB for an LLM RAG system

![[images/pyjo-2/data-scraping-to-chatbot.png|Data scraping to vector db diagram flow]]

Last but not least, below is the source code for the data scraping pipeline that feeds an LLM RAG chatbot. GitHub Repo.

  • Dependencies to install
    langchain==0.3.0
    langchain-chroma==0.1.4
    langchain-huggingface==0.1.0
    langchain-openai==0.2.0
    python-decouple==3.8
    requests==2.32.3
    streamlit==1.38.0
    scrapy==2.11.2
  • Install all dependencies with pip install -r
    pip install -r requirements.txt
  • Environment variable (.env)
    OPENAI_API_KEY = ""
  • How to run Scrapy
    scrapy crawl mojok_co
  • How to run the Streamlit chatbot
    streamlit run chatbot.py
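
The flow from scraped articles to chatbot answers might look roughly like the sketch below. It assumes the scraped articles are dictionaries with `title`, `url`, and `text` keys, uses a simple character-based chunker as a stand-in for whatever splitter the repo uses, and picks an embedding model (`sentence-transformers/all-MiniLM-L6-v2`), a chat model (`gpt-4o-mini`), and a `chroma_db` directory purely for illustration. All function names here are hypothetical.

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping character chunks (a simple stand-in splitter)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = (text[i:i + size] for i in range(0, len(text), step))
    return [chunk for chunk in chunks if chunk.strip()]


def build_vector_db(articles, persist_directory="chroma_db"):
    """Embed article chunks and store them in a local Chroma collection."""
    from langchain_chroma import Chroma
    from langchain_huggingface import HuggingFaceEmbeddings

    texts, metadatas = [], []
    for article in articles:
        for chunk in chunk_text(article["text"]):
            texts.append(chunk)
            metadatas.append({"url": article["url"], "title": article["title"]})

    # Embedding model name is an assumption for this sketch.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    return Chroma.from_texts(
        texts,
        embeddings,
        metadatas=metadatas,
        persist_directory=persist_directory,
    )


def build_prompt(question, contexts):
    """Combine retrieved chunks and the user question into one prompt."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


def answer(question, vector_db, k=3):
    """Retrieve the k most similar chunks and ask the LLM to answer from them."""
    from decouple import config
    from langchain_openai import ChatOpenAI

    docs = vector_db.similarity_search(question, k=k)
    prompt = build_prompt(question, [doc.page_content for doc in docs])

    # OPENAI_API_KEY is read from the .env file via python-decouple.
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=config("OPENAI_API_KEY"))
    return llm.invoke(prompt).content
```

In a Streamlit app, `answer()` would typically be called with the text the user types into the chat input, and the returned string displayed as the chatbot's reply.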

Note: a video tutorial will follow ✌🏼. You can also read the story about this event here.
