Generating a PHP Project Dependency Graph with Python and JavaScript

This post generates a dependency graph for my Weixin Official Account project, which has a CLI API back end, an articles module and an editor module, and serves as a blog-like website.

Preparation

PHP Scripts Handling

Best Practice for Including Scripts

As a widely agreed rule, scripts should be included using absolute paths, because a script may itself be included by a script located anywhere else. This is also why include statements in the format require_once dirname(__FILE__).'/included.php'; are so widely adopted.

My Old Practice

One style of include statement that I used previously is as follows.

require_once __DIR__."/config.php"; // define('APP_ASSETS_PATH', __DIR__."/assets/");
require_once APP_ASSETS_PATH."php/dbhelper.php";

APP_ASSETS_PATH above is one of a series of path constants defined in the config.php file with statements such as define('APP_ASSETS_PATH', __DIR__."/assets/");.

The advantages of this style are as follows:

  1. using one config.php to define the path constants makes it easy to change the location of the assets folder; after every change, only the constant definitions need to be modified;
  2. the include statements are, on average, much shorter.

My Current Practice

However, I abandoned the above style today, because I need to generate the dependency relationship among my PHP scripts. After Googling for a while, I found no out-of-the-box solution, but I did find a good base Python script for generating the relationship. So, in order for a Python script to generate the dependency relationship by reading PHP source code, I had to switch to an include style that Python can understand and that is still neat.

As a result, I adopted the following include style.

require_once __DIR__.DIRECTORY_SEPARATOR.'php/dbhelper.php';

Its advantages are:

  1. the relative path sits inside single quotes;
  2. in effect an absolute path is still used, thanks to the constant prefix __DIR__.DIRECTORY_SEPARATOR.

Thanks to these two points, I can use a Python script to generate the dependency relationship among the PHP scripts by reading their source code: Python cannot understand PHP constants and variables (at least not directly, without an external dictionary), but it can understand a literal path.
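
To illustrate why the new style is easy for Python to read, here is a minimal sketch. The pattern INCLUDE_RE and the helper extract_relative_path are hypothetical names made up for this example, not part of the base script mentioned above, and the pattern assumes the exact __DIR__.DIRECTORY_SEPARATOR.'...' form.

```python
import re

# Hypothetical pattern: matches include/require statements written as
# __DIR__.DIRECTORY_SEPARATOR.'relative/path.php' and captures the path.
INCLUDE_RE = re.compile(
    r"\b(?:include|require)(?:_once)?\s*\(?\s*"
    r"__DIR__\s*\.\s*DIRECTORY_SEPARATOR\s*\.\s*'([^']+)'"
)


def extract_relative_path(statement):
    """Return the single-quoted relative path, or None if the style differs."""
    match = INCLUDE_RE.search(statement)
    return match.group(1) if match else None


new_style = "require_once __DIR__.DIRECTORY_SEPARATOR.'php/dbhelper.php';"
old_style = 'require_once APP_ASSETS_PATH."php/dbhelper.php";'

print(extract_relative_path(new_style))  # php/dbhelper.php
print(extract_relative_path(old_style))  # None
```

The old style returns None because resolving APP_ASSETS_PATH would require Python to know what config.php defines.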

More Considerations

  1. Single quotes vs. double quotes: PHP interpolates variables inside double quotes but not inside single quotes. For include statements it is better to use single quotes, because a path that contains no variables is one that Python can understand. For example, if require_once "$basePath./config.php"; is used in PHP, Python has trouble interpreting the path, since it knows nothing about $basePath unless extra effort is made.
  2. include, include_once, require or require_once: see http://stackoverflow.com/questions/3546160/include-include-once-require-or-require-once
  3. Whether to put a slash / at the beginning of the quoted path: my answer is no, since DIRECTORY_SEPARATOR already supplies the separator.
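
Point 1 can be turned into a crude check. The sketch below flags include statements whose argument contains a PHP variable or a double-quoted string; the name find_unresolvable_includes and the heuristic itself are assumptions for illustration, and the check works line by line rather than parsing PHP properly.

```python
import re

# Matches a line that starts with an include/require keyword and captures
# everything between the keyword and the last semicolon on that line.
INCLUDE_LINE_RE = re.compile(
    r"^\s*(?:include|require)(?:_once)?\b(.*);", re.MULTILINE
)


def find_unresolvable_includes(php_source):
    """Return include arguments that a static reader cannot resolve safely."""
    flagged = []
    for match in INCLUDE_LINE_RE.finditer(php_source):
        argument = match.group(1).strip()
        # A '$' means a variable; a '"' means PHP may interpolate one.
        if "$" in argument or '"' in argument:
            flagged.append(argument)
    return flagged
```

Running it over a file before generating the relationship shows which statements still need to be rewritten in the single-quoted style.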

Python Environment Check

import sys
print(sys.executable)
print(sys.version_info)  # sys.version gives the same information as a string

Generating the Relationship

Generally, there are two ways to get a job done: manually or programmatically. I regard myself as a programmer, so if a script can do the job without much more effort than doing it by hand, I write the script, because it lowers the cost when a similar job comes up again.

One way to generate the relationship is, of course, by hand, as the following early draft shows.

Manual Draft of PHP Dependency Relationship Graph

Since my project is not really small, I quickly realized that doing this job by hand is exhausting. And since I was using pen and paper, it was hard to change the graph style, and the graph easily became a mess as more dependency relationships were added. So I gave up doing it by hand.

Of course, generating the relationship and generating the relationship graph are not the same job. Still, generating the relationship manually in Excel would be no easier: done by hand, it is the same process as drawing the graph, and drawing the graph could even be easier since no strict typing is required.

In this section, I will cover only the process of generating the relationship, not the graph.

To generate the relationship, the following steps are needed:

  1. for a given folder, get the list of PHP files;
  2. read the PHP files one by one and, from the regex matches for include, include_once, require and require_once in the current file, build its list of dependent files and a list of tuples in the format (<current_file>, <dependent_file_i>);
  3. combine the tuple lists of all files into the final list;
  4. depending on your preference, output this final list in different formats, such as JSON, CSV or a DOT file.
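
Steps 1 to 3 above can be sketched in Python roughly as follows. This is a minimal illustration, not the base script mentioned earlier: the names list_php_files, dependencies_of, build_edge_list and INCLUDE_RE are hypothetical, the pattern assumes every include uses the __DIR__.DIRECTORY_SEPARATOR.'...' style, and resolving each path against the folder of the including file is an assumption about how the paths should be normalized.

```python
import os
import re

# Assumed include style: __DIR__.DIRECTORY_SEPARATOR.'relative/path.php'
INCLUDE_RE = re.compile(
    r"\b(?:include|require)(?:_once)?\s*\(?\s*"
    r"__DIR__\s*\.\s*DIRECTORY_SEPARATOR\s*\.\s*'([^']+)'"
)


def list_php_files(root):
    """Step 1: collect every .php file under root, relative to root."""
    found = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".php"):
                full = os.path.join(folder, name)
                found.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(found)


def dependencies_of(root, php_file):
    """Step 2: return (current_file, dependent_file) tuples for one file."""
    path = os.path.join(root, php_file)
    with open(path, encoding="utf-8", errors="replace") as handle:
        source = handle.read()
    base = os.path.dirname(php_file)
    edges = []
    for included in INCLUDE_RE.findall(source):
        # __DIR__ is the folder of the including file, so resolve against it.
        target = os.path.normpath(os.path.join(base, included))
        edges.append((php_file, target.replace(os.sep, "/")))
    return edges


def build_edge_list(root):
    """Step 3: concatenate the per-file tuple lists into one list."""
    edges = []
    for php_file in list_php_files(root):
        edges.extend(dependencies_of(root, php_file))
    return edges
```

Calling build_edge_list on the project folder would return a list of tuples in the (<current_file>, <dependent_file_i>) format, ready for step 4.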

In my case, the output in 4 different formats is as follows.

Format 1 - raw Python list (truncated here to save space)

[('api/_debug.php', 'api/index.php'), ('api/apiinvoker.php', 'firebase/firebaseInterface.php'), ('api/apiinvoker.php', 'firebase/firebaseStub.php'), ('api/apiinvoker.php', 'firebase/firebaseLib.php'), ('api/games.php', 'api/config.php'),  ('editor/index.php', 'php/dbhelper.php'), ('editor/index.php', 'php/login.php')]

Format 2 - JSON file (truncated here to save space)

{
    "multigraph": false,
    "directed": true,
    "links": [
        {
            "target": 8,
            "source": 1
        },
        {
            "target": 39,
            "source": 1
        },
        {
            "target": 17,
            "source": 3
        },
        {
            "target": 1,
            "source": 3
        }
    ],
    "graph": {},
    "nodes": [
        {
            "id": "php/httputility.php"
        },
        {
            "id": "php/qrcodepage.php"
        },
        {
            "id": "php/dbhelper.php"
        },
        {
            "id": "articles/article-dbversion.php"
        }
    ]
}
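
A node-link structure with the same keys as the sample above can be built from the tuple list with the standard library alone. This is a sketch under assumptions: the function name to_node_link is hypothetical, nodes are numbered in sorted order (the original numbering scheme is not stated), and the two edges are taken from the raw list shown earlier.

```python
import json


def to_node_link(edges):
    """Build a node-link dict in which links refer to nodes by list index."""
    names = sorted({name for edge in edges for name in edge})
    index = {name: position for position, name in enumerate(names)}
    return {
        "multigraph": False,
        "directed": True,
        "links": [
            {"target": index[target], "source": index[source]}
            for source, target in edges
        ],
        "graph": {},
        "nodes": [{"id": name} for name in names],
    }


edges = [
    ("api/games.php", "api/config.php"),
    ("editor/index.php", "php/dbhelper.php"),
]
print(json.dumps(to_node_link(edges), indent=4))
```

Because each link stores positions in the nodes list, the order of the nodes array matters when the file is read back.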

Format 3 - CSV file (truncated here to save space)

source,target
api/_debug.php,api/index.php
api/apiinvoker.php,firebase/firebaseInterface.php
api/apiinvoker.php,firebase/firebaseStub.php
api/apiinvoker.php,firebase/firebaseLib.php
api/games.php,api/config.php
api/index.php,api/config.php
api/index.php,php/dbhelper.php
api/index.php,php/httputility.php
api/index.php,php/wxclisdk.php
api/index.php,php/dbconnect.php
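
Writing the tuple list in this CSV layout takes only the standard csv module. The helper name edges_to_csv is hypothetical, and returning a string instead of writing a file is a choice made to keep the sketch short.

```python
import csv
import io


def edges_to_csv(edges):
    """Serialize (source, target) tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buffer.getvalue()


print(edges_to_csv([("api/_debug.php", "api/index.php")]))
```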

Format 4 - DOT file (truncated here to save space)

digraph {
    "api/_debug.php" -> "api/index.php";
    "api/apiinvoker.php" -> "assets/php/firebase/firebaseInterface.php";
    "api/apiinvoker.php" -> "assets/php/firebase/firebaseStub.php";
    "api/apiinvoker.php" -> "assets/php/firebase/firebaseLib.php";
    "api/games.php" -> "api/config.php";
    "api/index.php" -> "api/config.php";
    "api/index.php" -> "assets/php/dbhelper.php";
    "api/index.php" -> "assets/php/httputility.php";
    "api/index.php" -> "assets/php/wxclisdk.php";
    "api/index.php" -> "assets/php/dbconnect.php";
    "api/index.php" -> "api/apiinvoker.php";
}
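
The DOT layout above is simple enough to produce with string formatting. In this sketch the helper name edges_to_dot is hypothetical, and it assumes the file paths contain no characters beyond a double quote that would need escaping.

```python
def edges_to_dot(edges):
    """Render (source, target) tuples as a Graphviz digraph."""
    lines = ["digraph {"]
    for source, target in edges:
        # Escape embedded double quotes so each name stays one quoted ID.
        source = source.replace('"', '\\"')
        target = target.replace('"', '\\"')
        lines.append('    "{}" -> "{}";'.format(source, target))
    lines.append("}")
    return "\n".join(lines) + "\n"


print(edges_to_dot([("api/_debug.php", "api/index.php")]))
```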

Generating the Graph

As you may notice from the above, I output a d3-friendly JSON file and also a Graphviz-friendly DOT file.

So in this step, I generate the relationship graph based on the relationship data output above.

Since output format 1 is the raw Python list, I am not going to visualize it. Although Python is able to do visualization, for me it is not the best way.

For format 2, the JSON data, I used d3.js to generate the following force-directed graph.

Force Directed Graph for Dependency Relationship

A PDF version of the above graph can be accessed via this link: Force Directed Graph

For format 3, the CSV data, I used d3.js to generate the following chord diagram.

Chord Diagram for Dependency Relationship

A PDF version of the above graph can be accessed via this link: Chord Diagram

For format 4, the DOT data, I used Graphviz to generate the following hierarchical graph.
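
d3's chord layout works from a square matrix of flows between items, so the source,target rows have to be turned into such a matrix at some point. How the page behind the diagram above does this is not shown; the sketch below only illustrates the transformation, and the name adjacency_matrix is hypothetical.

```python
def adjacency_matrix(edges):
    """Return (names, matrix) where matrix[i][j] counts edges names[i] -> names[j]."""
    names = sorted({name for edge in edges for name in edge})
    index = {name: position for position, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target in edges:
        matrix[index[source]][index[target]] += 1
    return names, matrix


names, matrix = adjacency_matrix([
    ("api/index.php", "api/config.php"),
    ("api/games.php", "api/config.php"),
])
print(names)
print(matrix)
```

Each row of the matrix corresponds to an including file and each column to an included file.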

Hierarchical Graph for Dependency Relationship

An SVG version of the above graph can be accessed via this link: Hierarchical Graph
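
Rendering a DOT file to SVG can be done with the Graphviz dot command, which accepts -Tsvg for the output type and -o for the output file. The sketch below calls it from Python; the helper names and the file names deps.dot and deps.svg are hypothetical, and it assumes Graphviz is installed and on the PATH.

```python
import shutil
import subprocess


def dot_command(dot_path, svg_path):
    """Build the Graphviz command line that renders dot_path as SVG."""
    return ["dot", "-Tsvg", dot_path, "-o", svg_path]


def render_svg(dot_path, svg_path):
    """Run Graphviz; return False when the dot executable is not available."""
    if shutil.which("dot") is None:
        return False
    subprocess.run(dot_command(dot_path, svg_path), check=True)
    return True


print(" ".join(dot_command("deps.dot", "deps.svg")))
```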

Afterword

In fact, the core part of this PHP project was implemented in the first half of 2016. Since I did not go out during Christmas, I spent some time organizing this old project. And since I will go back to Shanghai in January 2017 and will have to find a new job then, I hope this project gives me some advantage in finding a good one.

I am interested in Artificial Intelligence, especially Computer Vision. If I am not lucky enough to find a job in the Artificial Intelligence industry, a data analysis job is also an option. Since I have a relatively good background and enjoy innovating in algorithm programming, I would really like a job where innovative algorithms need to be developed.

If anyone would like to recommend a job to me, a million thanks.
