GitHub - alexomics/read-paf: Scripts for reading minimap2 PAF files

alexomics/read-paf



readpaf


readpaf is a fast parser for minimap2 PAF (Pairwise mApping Format) files. It is written in pure Python and has no required dependencies; pandas is only needed if you want the output as a DataFrame.

Installation

Minimal install:

pip install readpaf

With optional pandas dependency:

pip install readpaf[pandas]

Direct download

As readpaf is a self-contained module, it can be installed by downloading just the module. The latest version is available from:

https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py

or a specific version can be downloaded from a release/tag like so:

https://raw.githubusercontent.com/alexomics/read-paf/v0.0.5/readpaf.py

PyPI is the recommended install method.

Usage

readpaf has only one user function, parse_paf, which accepts a file-like object; this is any object in Python that has a file-oriented API (sys.stdin, stdout from subprocess, io.StringIO, open files from gzip or open).
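To make "file-like" concrete, here is a minimal sketch that does not depend on readpaf itself. The function query_and_target is a hypothetical stand-in for a line-based parser, and the PAF line is made up for illustration. It assumes the parser expects text (str) lines, which is why the gzip file is opened in "rt" mode.

```python
import gzip
import io
import os
import tempfile

# One made-up PAF line: 12 tab-separated columns plus a single tag.
PAF_LINE = "read1\t1000\t10\t990\t+\tchr1\t50000\t200\t1180\t950\t980\t60\ttp:A:P\n"


def query_and_target(handle):
    """Hypothetical stand-in for a line-based parser: yield (query, target)."""
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[5]


# In-memory text behaves like an open file.
from_memory = list(query_and_target(io.StringIO(PAF_LINE)))

# A gzip-compressed file, opened in text mode so lines are str, not bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.paf.gz")
    with gzip.open(path, "wt") as out:
        out.write(PAF_LINE)
    with gzip.open(path, "rt") as handle:
        from_gzip = list(query_and_target(handle))

print(from_memory, from_gzip)
```

Both handles can be iterated line by line in the same way, so the same kind of handle should be usable wherever a file-like object is expected.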

The following script demonstrates how minimap2 output can be piped into readpaf:

from readpaf import parse_paf
from sys import stdin

for record in parse_paf(stdin):
    print(record.query_name, record.target_name)

readpaf can also generate a pandas DataFrame:

from readpaf import parse_paf

with open("test.paf", "r") as handle:
    df = parse_paf(handle, dataframe=True)

Functions

readpaf has a single user function:

parse_paf

parse_paf(file_like=file_handle, fields=list, na_values=list, na_rep=numeric, dataframe=bool)

Parameters:

  • file_like: A file-like object, such as sys.stdin, a file handle from open, or an io.StringIO object
  • fields: A list of 13 field names to use for the PAF file, default:
    "query_name", "query_length", "query_start", "query_end", "strand",
    "target_name", "target_length", "target_start", "target_end",
    "residue_matches", "alignment_block_length", "mapping_quality", "tags"
    These are based on the PAF specification.
  • na_values: A list of values to interpret as NaN. This is only applied to numeric fields, default: ["*"]
  • na_rep: Value to use when a NaN value specified in na_values is found. This should ideally be 0 to match minimap2's output, default: 0
  • dataframe: bool, if True, return a pandas.DataFrame with the tags expanded into separate Series
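The interaction between na_values and na_rep can be sketched as follows. This is an illustration of the behaviour described above, not readpaf's actual code; the helper name to_int is hypothetical.

```python
def to_int(value, na_values=("*",), na_rep=0):
    """Hypothetical helper: convert one numeric PAF column to int.

    Any string listed in na_values is replaced by na_rep instead of
    being passed to int().
    """
    if value in na_values:
        return na_rep
    return int(value)


# A normal numeric column is converted as usual.
matches = to_int("950")

# A placeholder column falls back to na_rep (0 by default).
missing = to_int("*")

# A different replacement value can be chosen explicitly.
flagged = to_int("*", na_rep=-1)

print(matches, missing, flagged)
```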

If used as an iterator, each object returned is a named tuple representing a single line in the PAF file. Each named tuple has the field names specified by the fields parameter. The SAM-like tags are converted into their specified types and stored in a dictionary keyed by tag name, where each value is a named tuple with the fields name, type, and value. When print or str is called on a PAF record (named tuple), a formatted PAF string is returned, which is useful for writing records to a file. The PAF record also has a blast_identity method, which calculates the BLAST identity for that record.
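The tag dictionary and the identity calculation described above can be sketched in plain Python. This is a rough illustration under stated assumptions, not readpaf's implementation: the names Tag, parse_tags and CONVERTERS are hypothetical, only the i and f type codes are converted here, and BLAST identity is taken to be residue matches divided by alignment block length, following the definition the README links to.

```python
from collections import namedtuple

# Hypothetical record type mirroring the name/type/value fields described above.
Tag = namedtuple("Tag", ["name", "type", "value"])

# Assumed converters for two common SAM type codes; others stay as strings.
CONVERTERS = {"i": int, "f": float}


def parse_tags(raw_tags):
    """Turn strings such as 'NM:i:30' into a dict of tag name -> Tag."""
    tags = {}
    for raw in raw_tags:
        # Split at most twice so values that contain ':' stay intact.
        name, type_code, value = raw.split(":", 2)
        convert = CONVERTERS.get(type_code, str)
        tags[name] = Tag(name, type_code, convert(value))
    return tags


def blast_identity(residue_matches, alignment_block_length):
    """Matching residues divided by the number of alignment columns."""
    return residue_matches / alignment_block_length


tags = parse_tags(["tp:A:P", "NM:i:30", "de:f:0.0306"])
identity = blast_identity(950, 980)

print(tags["NM"].value, tags["tp"].value, round(identity, 4))
```

Keying the dictionary by tag name allows direct lookups such as tags["NM"].value, while the named tuple keeps the original type code available alongside the converted value.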

If used to generate a pandas DataFrame, then each row represents a line in the PAF file and the SAM-like tags are expanded into individual series.
