LEDE Table of Packages: Good or bad?

quick update: the script works, although I still need to optimize it a bit as it is kinda slow.

For those that want to boast a bit: LEDE/OpenWRT has nearly 1000 packages.

I don't care about speed. Gimme data! :smiley:

One question that comes to my mind: How do we identify unmaintained packages? By Maintainer Name = empty?
Is this then set automatically via the script?

[quote="tmomas, post:23, topic:123, full:true"]
One question that comes to my mind: How do we identify unmaintained packages? By Maintainer Name = empty?
Is this then set automatically via the script?
[/quote]afaik all packages without a maintainer are in LEDE base, and are core stuff like zlib, so they aren't really "unmaintained".

I can tell the script to write a "LEDE team" for such packages. I can tell the script to add many things based on filters I add in it.

afaik all packages without a maintainer outside of LEDE base are purged from the repos, and their source ends up in the "abandoned packages repository" on github https://github.com/openwrt/packages-abandoned
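A minimal sketch of the filter described above, in case it helps the discussion. The function name `maintainer_label` and the two label strings are made up for illustration; nothing here is in the script yet.

```bash
# maintainer_label REPO MAINTAINER
# Prints the maintainer value to store for a package.
# "LEDE team" and "unmaintained" are assumed labels, not agreed field values.
maintainer_label() {
    repo="$1"
    maintainer="$2"
    if [ -n "$maintainer" ]; then
        # a maintainer is listed, keep it as it is
        printf '%s\n' "$maintainer"
    elif [ "$repo" = "base" ]; then
        # core packages without a named maintainer
        printf '%s\n' "LEDE team"
    else
        # anything else without a maintainer is an outlier worth flagging
        printf '%s\n' "unmaintained"
    fi
}

maintainer_label base ""
maintainer_label packages ""
maintainer_label packages "Jane Doe"
```

The three calls at the end cover the three cases: a base package without maintainer, a non-base package without maintainer, and a package with a (made-up) maintainer name.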

I'm sure we will find some weird outlier here and there.

I don't care about speed. Gimme data!

I'm multithreading it because, to get data about multiple archs, I need to run it over all package lists (all archs). Running it in a single thread would take many, many hours, which is embarrassing.
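For reference, this is roughly what "one background job per arch" can look like when the number of simultaneous jobs is capped. It is only a sketch: `process_arch` is a hypothetical placeholder for the real per-architecture work, and the batch size of 4 is an arbitrary assumption.

```bash
# Hypothetical stand-in for the real per-architecture work:
# it only writes a marker file so the effect is visible.
process_arch() {
    arch="$1"
    outdir="$2"
    printf 'done %s\n' "$arch" > "$outdir/$arch.log"
}

# run_all_archs OUTDIR ARCH...
# Starts one background job per architecture, in batches of max_jobs.
run_all_archs() {
    outdir="$1"
    shift
    max_jobs=4      # assumed batch size, tune to the number of cores
    running=0
    for arch in "$@"; do
        process_arch "$arch" "$outdir" &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait    # block until the whole batch has finished
            running=0
        fi
    done
    wait            # pick up the last, possibly incomplete, batch
}
```

Waiting for a whole batch is cruder than keeping exactly 4 jobs busy at all times, but it needs nothing beyond plain `wait`.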

This is the script with basic multithreading, but extracting data from the community packages repo list is kinda slow (it's a 1.2 MB text file), I'm still implementing better multithreading in that part.
Feel free to suggest a better approach or bash me for some stupid logic mistake or something.
My comments are verbose, I know, they are mostly for me.

#!/bin/bash

# Copyright 2016 Alberto Bursi alberto.bursi@outlook.it

# This is free software, licensed under the GNU General Public License v2.



# This script's job is to load the package lists shown in the LEDE wiki.
# It starts by downloading package lists then merges them to get the arch list
# in packages that aren't available to all
# then it downloads LEDE source and OpenWRT package feeds to extract other info.

# This script requires basic coreutils (awk/sed/grep), wget and git.
# The script assumes you have installed these tools already.


#COMMON VARIABLES (not global, but used throughout the script nonetheless)

newpackages_folder="./newpackages_folder"

#package list generic link
package_list_main_URL="https://downloads.lede-project.org/snapshots/packages/"

#types of repos there are currently
LEDE_repos="base luci routing telephony packages"

#the only release we have currently
package_release="trunk"

# package_lists_base=https://downloads.lede-project.org/snapshots/packages/x86_64/base/
# package_lists_luci=https://downloads.lede-project.org/snapshots/packages/x86_64/luci/
# package_lists_routing=https://downloads.lede-project.org/snapshots/packages/x86_64/routing/
# package_lists_telephony=https://downloads.lede-project.org/snapshots/packages/x86_64/telephony/

#cleaning work dir
rm -rf "$newpackages_folder"
mkdir -p "$newpackages_folder"

write_to_file(){
#writing the package file


#generating the correct repository name and bugreport link

case $repository in

base)
    repository_name="base"
    package_bugreport_link="https://bugs.lede-project.org/"
;;

luci)
    repository_name="" # (not needed here, the package section is already luci)
    package_bugreport_link="https://github.com/openwrt/luci/issues"
;;

routing)
    repository_name="routing"
    package_bugreport_link="https://github.com/openwrt-routing/packages/issues"
;;

telephony)
    repository_name="telephony"
    package_bugreport_link="https://github.com/openwrt/telephony/issues"
;;

packages)
    repository_name="community-packages"
    package_bugreport_link="https://github.com/openwrt/packages/issues"
;;

esac

    if [ "$already_listed" = "0" ]; then

    cat <<ENDofINPUT > /tmp/temporaryfile-"$architecture"-"$repository"-"$package_name"
---- dataentry packages ----
Name_pkg-page                    : $package_name # Name of the package
Version                          : $package_version # Version of the package
Description                      : $package_description # Description, max. 1kB
Installed size kilobytes (jffs2) : $package_installed_size # size occupied on jffs2 or ubifs when installed, without dependencies
Dependenciess                    : $package_dependencies # Dependencies of this package
Categories_pkg-category          : $repository_name $package_section # Select multiple categories
Architectures_pkg-arch           : $package_architecture # Select multiple architectures
LEDE releases_lede-release       : $package_release # Select multiple releases
File name                        : $package_filename # File name of the package
File size (kilobytes)            : $package_filesize # .ipk size in kbytes, without dependencies
License                          : $package_license # License type.
Maintainer_pkg-maintainer        : $package_maintainer # Maintainer name
Bug report                       : $package_bugreport_link # bugreport link
----
ENDofINPUT

#these are package-specific
# MD5Sum                       : $package_md5sum # Md5sum if any.
# SHA256sum                    : $package_sha256sum # SHA256sum if any.

    #for some reason I can't generate a file with a variable in it, so I must use this trick
    mv /tmp/temporaryfile-"$architecture"-"$repository"-"$package_name" "$newpackages_folder/$package_name" &

else

    # if there is already a text file we need to load the package's architecture from the main loop variable
    package_architecture=$architecture

    #loading current architectures
    additional_package_architecture=$(sed "s/#.*//" "$newpackages_folder/$package_name" | sed -n '8p' | awk -F":" '{print $2}')

    #echo $additional_package_architecture

    sed -i '8s/.*/Architectures_pkg-arch           :'"$additional_package_architecture $package_architecture # Select multiple architectures"' /' "$newpackages_folder/$package_name"

    already_listed=0

fi

}


extract_package(){

file_source=$1

#setting a flag
already_listed=0

# number=$2   #1
#
# maxlines=$3   #"$(wc -l  "$file_source" | awk  '{print $1}' )"
#
# #if already listed read only the line 8 then skip the stuff below
#
# while [ "$number" != "$maxlines" ] ; do

while IFS= read -r line; do

#reading first line, printing only first block, we must know what this line is for.
line_type=$( printf '%s\n' "$line" | awk '{print $1}')

#printf '%s\n' "$line"

case $line_type in

    Package:)

        #first checking if we have a package name already loaded, if so we need to print down info first.
        if [ -n "$package_name" ]; then

                #removing doublespaces
        package_description=$( echo "$package_description" | sed 's/^..............//' | sed 's/  \+/ /g' )

        #writing all to file now
        write_to_file

        #cleaning all variables
        package_version="" # Version of the package
        package_description="" # Description, max. 1kB
        package_installed_size="" # raw .ipk size in kbytes, without dependencies
        package_dependencies="" # List dependencies of this package
        package_section="" # Select multiple categories
        package_architecture="" # Select multiple architectures
        package_filename="" # File name of the package
        package_filesize="" # .ipk size in kbytes, without dependencies
        package_md5sum="" # Md5sum if any.
        package_sha256sum="" # SHA256sum if any.
        package_license="" # License type.
        package_maintainer="" # Maintainer name
        package_bugreport_link="" # bugreport link


        fi

        #loading package name, reading first line with sed and then printing what is after the ":" with awk
        #then removing a space before the name
        package_name=$(printf '%s\n' "$line" | awk -F":" '{print $2}' |  sed 's/^.//')

        #deleting the line
        #sed -i '1d' "$file_source"


        echo "$package_name $architecture"

        # if package's text file exists we are NOT reading a new package,
        if [ -f "$newpackages_folder/$package_name"  ] ; then

            #we set the flag for the writer system later
            already_listed=1

        else

            #we set the flag for the writer system later
            already_listed=0

        fi



    ;;

    Version:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_version=$(printf '%s\n' "$line" | awk -F":" '{print $2}' |  sed 's/^.//')
        fi

    ;;
    Depends:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_dependencies=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')
        fi

    ;;
    Source:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_source_folder=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')
        fi

    ;;
    License:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_license=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')
        fi

    ;;
    Section:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_section=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')
        fi

    ;;
    Maintainer:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_maintainer=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' | sed "s/<.*//" |  sed 's/^.//') # removing email
        fi

    ;;
    Architecture:)
        package_architecture="$architecture"     #$(sed -n "$number"p "$file_source" | awk -F":" '{print $2}' |  sed 's/^.//')


    ;;
    Installed-Size:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_installed_size=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')

        package_installed_size=$(( $package_installed_size / 1024  ))

        fi

    ;;
    Filename:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
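        #replacing only the "_all.ipk" suffix, or names containing "all" (like firewall) get mangled
        package_filename=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//' | sed "s/$architecture/<architecture>/" | sed 's/_all\.ipk$/_<architecture>.ipk/')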

        fi

    ;;
    Size:)
        #loading only if needed
        if [ "$already_listed" = "0" ]; then
        package_filesize=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')

        package_filesize=$(( $package_filesize / 1024  ))
        fi

    ;;
#     MD5Sum:)
#         #loading only if needed
#         if [ "$already_listed" = "0" ]; then
#         package_md5sum=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')
#         fi
#
#     ;;
#     SHA256sum:)
#         #loading only if needed
#         if [ "$already_listed" = "0" ]; then
#         package_sha256sum=$(printf '%s\n' "$line"  | awk -F":" '{print $2}' |  sed 's/^.//')
#         fi
#
#     ;;
    Description:)
        #description can be longer, I need a loop to load all until I get to an empty line
        #Also checking if the file is empty or it locks up


        package_description="$(printf '%s\n' "$line" )"



    ;;

    *)
        #empty space or stuff we don't care about
        package_description="$package_description $(printf '%s\n' "$line" )"

    ;;

esac

#done

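done < "$file_source"

#the last package of the list has no "Package:" line after it, so we write it here
if [ -n "$package_name" ]; then
    package_description=$( echo "$package_description" | sed 's/^..............//' | sed 's/  \+/ /g' )
    write_to_file
    package_name=""
fi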

}

load_packages_from_repo(){

repository=$1
#workfolder
#one folder per repository, or the parallel jobs delete each other's downloads
workfolder="/tmp/listpackages-folder-$repository"
rm -fr "$workfolder"
mkdir -p "$workfolder"

package_list="$workfolder/package-list-$architecture-$repository"

        wget -O "$package_list" "$package_list_main_URL/$architecture/$repository/Packages"

        extract_package "$package_list" #1 "$(wc -l  "$package_list" | awk  '{print $1}' )"

        rm -f "$package_list"

        echo "loaded all packages of $repository"

}


load_packages_of_all_archs(){

for architecture in $LEDE_archs ; do

    load_packages_from_repo $1


done
}

#generating automatically the list of archs
#downloading the indexpage and piping the pagesource instead of saving it
#removing all lines that don't start with "<a" with sed
#extracting the arch name with awk (it is the second "block" if the awk separator is set as " )
#removing the / at the end of it with sed
LEDE_archs=$(wget -q -O- "$package_list_main_URL" | sed '/^\(<a\)/!d' | awk -F\" '{print $2}' | sed s'/.$//')

echo $LEDE_archs

#downloading now all package lists of all archs $LEDE_archs

#LEDE_repos="base luci routing telephony packages"

   load_packages_of_all_archs base &
   load_packages_of_all_archs luci &
   load_packages_of_all_archs routing &
   load_packages_of_all_archs telephony &

   load_packages_of_all_archs packages




#waiting for the background jobs started above before reporting completion
wait

echo "loaded all packages of all architectures"

Btw, I still need to see if I can extract data from makefiles to make this faster (maybe they have all archs listed in one file, need to check) and to get better categories (makefiles contain SUBMENU category used in menuconfig that is missing in the package lists)

EDIT: for better understanding of that, I recommend copy-pasting into a decent text editor with syntax highlighting. I'm using Kwrite as it does also have basic IDE functions like shrinking functions/loops and autocomplete.
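A rough sketch of the Makefile idea mentioned above, nothing more. It assumes the submenu is declared on its own `SUBMENU:=...` line inside the package definition; `extract_submenu` is a made-up helper name and the sample Makefile is invented for the example.

```bash
# extract_submenu MAKEFILE
# Prints the value of the first SUBMENU:= line found in the Makefile,
# or nothing if the Makefile does not declare one.
extract_submenu() {
    sed -n 's/^[[:space:]]*SUBMENU:=//p' "$1" | head -n 1
}

# Invented sample, only to show the call.
sample_makefile=$(mktemp)
cat > "$sample_makefile" <<'EOF'
define Package/example-pkg
  SECTION:=utils
  CATEGORY:=Utilities
  SUBMENU:=Editors
  TITLE:=Example package
endef
EOF

extract_submenu "$sample_makefile"
rm -f "$sample_makefile"
```

A Makefile that defines several packages can carry several `SUBMENU` lines; this sketch only returns the first one, so matching each value to its package would need more work.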

Please without (jffs2) and (kilobytes).
The first column represents the naming of the field, and you would have to carry (jffs2) and (kilobytes) in the naming each and everywhere you use this field in datatables, making columns wider than necessary. Any explanations on a field should go into the Comment.

BTW: How fast/slow is your script? How long does it take to process the approx. 1000 packages?

[quote="tmomas, post:25, topic:123, full:true"]
Please without (jffs2) and (kilobytes).
The first column represents the naming of the field, and you would have to carry (jffs2) and (kilobytes) in the naming each and everywhere you use this field in datatables, making columns wider than necessary. Any explanations on a field should go into the Comment.[/quote]Ok, removed.

I also dropped MD5 and SHA256sum as they are architecture-specific (so each package would have like 30 different MD5/SHA256).

[quote]BTW: How fast/slow is your script? How long does it take to process the approx. 1000 packages?[/quote]There are actually around 4300 packages, I said 1000 because 3/4 of the packages are dumb shovelware like luci language modules, or php/python/ruby/whatever specific libraries.

Anyway, it reads/converts most of the 4300 packages (+ rescan for each architecture) in like a couple hours, but most of the time is spent on community package lists in single thread, if I can solve that I can shrink that to like 30 minutes probably.
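One thing that might help with the single-threaded part, offered as a guess rather than a measured result: starting `printf`, `awk` and `sed` for every line of a 1.2 MB list means a very large number of short-lived processes, while a single awk process can walk the whole list once. The sketch below only pulls out four fields and prints them tab-separated; `parse_packages` is a hypothetical name, and the field selection is just an example.

```bash
# parse_packages FILE
# Prints one line per package: name, version, maintainer, license,
# separated by tabs. The e-mail part of the maintainer is dropped.
parse_packages() {
    awk '
        function flush() {
            if (name != "")
                printf "%s\t%s\t%s\t%s\n", name, version, maintainer, license
            name = ""; version = ""; maintainer = ""; license = ""
        }
        /^Package: /    { flush(); name = substr($0, 10); next }
        /^Version: /    { version = substr($0, 10); next }
        /^License: /    { license = substr($0, 10); next }
        /^Maintainer: / { maintainer = substr($0, 13)
                          sub(/ *<.*/, "", maintainer); next }
        END             { flush() }
    ' "$1"
}
```

Called as `parse_packages /path/to/Packages`, it emits a record whenever the next `Package:` line shows up and once more at the end of the file, so the last package is not lost.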

And I'm using pipes profusely, reading the file without using shell loops or other inefficient stuff.

Also, the more heavily multithreaded part of the script (the first 10 minutes) loads my Xeon E3-1275 V2 to 100% (Ivy Bridge, slightly more powerful than the best non-enthusiast Ivy Bridge i7).

Anyway, I added also the github makefile reading (no performance impact as it is forking! yay!), and I'm currently running the script to generate the package txt files.

I've looked at wiki media manager, I can upload multiple files, it's great and all, but only in "media" folder. Can you grant me permission to upload files in the wiki proper so I can upload these in /pkgdata ?

Or I can upload a zip on dropbox for you to load in there (I'd rather be able to load data on my own though).

as a comparison, Debian has around 57000 packages in Jessie (current stable).
(number of lines in this https://packages.debian.org/stable/allpackages?format=txt.gz )
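For anyone who wants to redo the count, this is roughly how the line count can be taken once the file has been downloaded. `count_lines_gz` is a made-up helper name, and any header or blank lines in the file are counted too, so the result is approximate.

```bash
# count_lines_gz FILE.gz
# Prints the number of lines in a gzip-compressed text file.
count_lines_gz() {
    gzip -dc "$1" | awk 'END { print NR }'
}
```

Usage would be something like `count_lines_gz allpackages.txt.gz`, with the file name depending on how the download was saved.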

Btw, this is a link to the zip file with all packages converted, if you want to take a look.

Good idea. Yes, please!

"abandoned" or "unmaintained" would be good.
What I've got in mind: Show unmaintained packages on a wiki page:

< rough idea>
WANTED: Package maintainers
If you feel skilled enough and want to contribute to LEDE: [[linkto how-to-become-a-package-maintainer|Become a package maintainer]]! We've got lots of work to offer, combined with no payment, but lots of thankful people all over this planet.

---- datatable ----
cols   : ....
filter : Maintainer name=unmaintained
----

< /rough idea>

You can now upload + delete files in /pkgdata/
You can also provide a zip which I will then extract to /pkgdata in one go. Whatever creates the least work for you.

[quote="tmomas, post:28, topic:123, full:true"]Good idea. Yes, please![/quote] Will do tomorrow.

[quote]"abandoned" or "unmaintained" would be good.
What I've got in mind: Show unmaintained packages on a wiki page:[/quote]Lol, that's cool but I'd say it is secondary. If you see the abandoned package repository https://github.com/openwrt/packages-abandoned there are like 30 packages tops.

Most of the packages worthy of adding to LEDE are in random public git repos, see what a github search yields https://github.com/search?utf8=✓&q=openwrt+packages&type=Repositories&ref=searchresults

So if you want to make a page like that it will have to link to that search.

I also dropped MD5 and SHA256sum as they are architecture-specific (so each package would have like 30 different MD5/SHA256).

do the checksum on the source, not the binaries.

David Lang

[quote="dlang, post:31, topic:123, full:true"]

I also dropped MD5 and SHA256sum as they are architecture-specific (so each package would have like 30 different MD5/SHA256).

do the checksum on the source, not the binaries.

David Lang
[/quote]Who will use the checksum of the source? the checksum of the packages is used to ensure that the download was successful.

The download mechanism should ensure that the download is complete (and
everything except ftp does this today)

David Lang

[quote="dlang, post:33, topic:123, full:true"]The download mechanism should ensure that the download is complete (and everything except ftp does this today)

David Lang
[/quote]opkg (package manager in LEDE) checks package checksum (Chaos Calmer uses MD5, LEDE trunk uses SHA256), as you can't rely on the network stack alone.
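To make the point concrete, here is a minimal sketch of that kind of check done by hand. It is not how opkg implements it; `verify_sha256` is a hypothetical helper, and it assumes the `sha256sum` tool from GNU coreutils is available.

```bash
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 if the SHA256 of FILE matches EXPECTED_HEX, non-zero otherwise.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<hex digest>  <file name>", we keep only the digest
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}
```

The expected value would come from the `SHA256sum:` field of the package list for that architecture, for example `verify_sha256 some-package.ipk "$expected_sum" || echo "download is corrupted"`.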

[quote="tmomas, post:29, topic:123, full:true"]You can now upload + delete files in /pkgdata/
[/quote]Hmm, closer but still nope. I created a dummy start page to get the namespace to show up https://wiki.lede-project.org/pkgdata/start
But it seems I cannot upload plain text files, it fails with "Upload denied. This file extension is forbidden!"
I tried uploading pdf files (actually a renamed text file) and it works (but of course it's then treated like a file, not as wiki pages)

It does not seem to like text files without extension, nor text files with .txt extension.

I think this needs some tweaking again. Btw, it isn't an issue for me to rename all files with a specific extension. :relaxed:
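The rename itself is a short loop. This is a sketch with a made-up function name (`add_txt_extension`), and it assumes .txt is the extension that ends up being allowed.

```bash
# add_txt_extension DIR
# Appends ".txt" to every regular file in DIR that does not have it yet.
add_txt_extension() {
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue     # skip directories and an unmatched glob
        case "$f" in
            *.txt) ;;               # already has the extension, leave it
            *) mv -- "$f" "$f.txt" ;;
        esac
    done
}
```

Running it twice is harmless, since files that already end in .txt are skipped.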

Try again, I allowed txt upload now.
Please let me know when you are finished, since this possibility must not exist longer than needed.

Only now I realized that there are some multiselects, which need to be changed slightly:
Categories_pkg-category -> Categories_pkg-categorys
Architectures_pkg-arch -> Architectures_pkg-archs
LEDE releases_lede-release -> LEDE releases_lede-releases

Obviously without the ** marking.

Sidenote1: That's one more strange thing about this forum editor: You can not format substrings (e.g. bold).
Substring -> the part between the ** should be bold, but it isn't (at least in the preview).
Yes, S ubstr ing works, but is looking odd.

Sidenote2: The reason for the dataentry update I ran this morning was exactly this: I forgot to add an 's' at the end of "Flash MB", which led to this field showing up only as dropdown, not a multiselect. D'oh! :slight_smile:

...and if you want to show up "Bug report" as link, then you need to name it "Bug report_url"
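If the files are already generated, the field renames requested above could be applied with sed instead of rerunning everything. A sketch only: `rename_fields` is a hypothetical helper that works as a filter on stdin, and it relies on each field name being followed by at least one space, as in the generated files.

```bash
# rename_fields
# Reads a dataentry block on stdin and prints it with the multiselect
# fields pluralised and the bug report field turned into a url field.
rename_fields() {
    sed -e 's/^Categories_pkg-category /Categories_pkg-categorys /' \
        -e 's/^Architectures_pkg-arch /Architectures_pkg-archs /' \
        -e 's/^LEDE releases_lede-release /LEDE releases_lede-releases /' \
        -e 's/^Bug report /Bug report_url /'
}
```

For a single file that would be `rename_fields < zlib > zlib.new`. The script would need the same renames in its template and in the line that rewrites the architectures field, otherwise the next run brings the old names back.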

Try putting two backticks before the first ** as workaround:

sub``**sub**string = sub``substring

You are certainly going to write that down somewhere, so others can look it up later? :wink: