Storing GitHub Traffic with Azure Functions

Frustrated by GitHub Insights

As an owner of a handful of large public repos for my job, one of the things that I care about is knowing the traffic to the repos to get an understanding of visibility. GitHub has this amazing feature, called GitHub Insights, which allows you to see some of this data; however, it is capped at a date range of 14 days in the past. I am not sure about the long-term goals of Insights, but if that cap stays, it will frustrate folks, some of them very well-known folks.

Storing Traffic Data Over Time

I have seen a few ways people are getting around this date cap, written in all sorts of languages on all sorts of platforms. They all had one thing in common: they all use the GitHub Api Traffic Endpoint to get this data (a quick sketch of hitting that endpoint directly follows the list below). This Api has all the info I need in it, so, thinking like a lazy developer, here are my options:

  • Go to the Traffic Pages for all my repos twice a month and collect the data (yea…. no)
  • Write some tool that will “poll” the GitHub Api and store the results of the Api in some data store
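
All of those approaches ultimately hit the same REST endpoint (GET /repos/{owner}/{repo}/traffic/views, which requires push access to the repo). Just for reference, here is a minimal sketch of calling it directly with HttpClient; the owner/repo values and the GitHubToken variable name are placeholders, and the actual code later in this post uses Octokit instead:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class TrafficCheck
{
    static async Task Main()
    {
        // Personal access token with push access to the repo (placeholder variable name).
        var token = Environment.GetEnvironmentVariable("GitHubToken");

        using var http = new HttpClient();
        // GitHub's API rejects requests without a User-Agent header.
        http.DefaultRequestHeaders.Add("User-Agent", "TrackRepos");
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("token", token);

        // "owner" and "repo" are placeholders; per=day asks for daily buckets.
        var json = await http.GetStringAsync("https://api.github.com/repos/owner/repo/traffic/views?per=day");

        // Raw JSON with up to 14 days of view counts and unique visitors.
        Console.WriteLine(json);
    }
}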

Azure Functions Saving Me Again

It seems like whenever I blog, the topic is usually some silly task that I automate with Azure Functions, and this is a silly task, so time to automate! The first thing I needed to do was spin up Visual Studio (VS Code has great tooling for Functions as well) and create a new Azure Functions project. After that, it was time to plan out what I needed to do, which I concluded was:

  • Hit the GitHub Api to get the Traffic for some repos
  • Store that data somewhere I can easily build a Power BI report with

I wanted to get a working prototype as soon as possible, so I just picked a random repo out of the many I have and wired up the call to get the traffic data, which looked like this:

// Authenticate the Octokit client with a personal access token and
// request daily view counts for a single repo.
var client = new GitHubClient(new ProductHeaderValue("TrackRepos"));
var basicAuth = new Credentials(Environment.GetEnvironmentVariable("GitHubToken"));
client.Credentials = basicAuth;
var data = await client.Repository.Traffic.GetViews("owner", "repo", new RepositoryTrafficRequest(TrafficDayOrWeek.Day));

I found out very quickly that in order to get Traffic data for a repo, I need push access, so I needed a token. After getting a token from GitHub, I was seeing data come in.
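
To see what "data coming in" looks like, a quick console dump is enough. This is just a hedged sketch continuing from the snippet above (data is the summary returned by GetViews) and is not part of the final function:

// Quick sanity check of the Octokit response: data.Views is a collection of
// per-day buckets, each with a timestamp, a view count, and a unique-visitor count.
foreach (var day in data.Views)
{
    Console.WriteLine($"{day.Timestamp.UtcDateTime:yyyy-MM-dd}: {day.Count} views, {day.Uniques} unique visitors");
}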

Now that I had some data, I started to think about the quickest way to get that data into storage. I could put it in a SQL table, but that seemed like overkill for what I was doing (I would have had to create a server as well), and I realized that I needed an Azure Storage account to store the metadata for Azure Functions anyway, which also has a Table feature that would allow me to store data (Cosmos DB is an option here, but a little overkill as well). So the next thing I needed to do was structure my data and wire up all the bits to insert and update records in there. Good thing I could “borrow” all this code from the Table Api Docs, so that was solved pretty quickly. At this point, I wired it all up and had a function that hits the GitHub Api and stores the results in Azure Table Storage.
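
That “borrowed” boilerplate boils down to a table entity plus a small helper. Here is a minimal sketch of what they might look like, assuming the Microsoft.Azure.Cosmos.Table SDK (which matches the CloudTable calls later in the post); the real code lives in the repo:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table;

// One row per repo per day: PartitionKey groups by campaign + repo,
// RowKey is the slash-stripped date, so re-running a day merges instead of duplicating.
public class RepoStats : TableEntity
{
    public RepoStats() { }

    public RepoStats(string partitionKey, string rowKey)
    {
        PartitionKey = partitionKey;
        RowKey = rowKey;
    }

    public string OrgName { get; set; }
    public string CampaignName { get; set; }
    public string RepoName { get; set; }
    public string Date { get; set; }
    public int Views { get; set; }
    public int UniqueUsers { get; set; }
}

public static class TableStorageHelper
{
    // Creates (or gets) the table in the same storage account the Functions runtime uses.
    public static async Task<CloudTable> CreateTableAsync(string tableName)
    {
        var account = CloudStorageAccount.Parse(Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        var table = account.CreateCloudTableClient().GetTableReference(tableName);
        await table.CreateIfNotExistsAsync();
        return table;
    }

    // Upserts a row so running the function twice for the same day is harmless.
    public static async Task<RepoStats> InsertOrMergeEntityAsync(CloudTable table, RepoStats entity)
    {
        var result = await table.ExecuteAsync(TableOperation.InsertOrMerge(entity));
        return result.Result as RepoStats;
    }
}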

Scaling for More Repos

Now that I had something working, it was time to build some code that I could use to collect data for all my repos. I came to the idea that if I had some JSON representation of all my repos, I could easily alter the list and the function would just pick up the changes. I knew I had a hierarchy of sorts in my case:

  • Campaign Name (Collection of Repos)
  • Organization Name (Also known as the “owner”)
    • Repository Name (The Repo itself)

So I created a class structure and JSON schema that represents that.

[
  {
    "CampaignName": "",
    "OrgName": "",
    "Repos": [ { "RepoName": "" } ]
  },
  {
    "CampaignName": "",
    "OrgName": "",
    "Repos": [
      { "RepoName": "" }
    ]
  }
]
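
The matching class structure is just a pair of POCOs; here is a minimal sketch, with property names taken from the schema above and the deserialization code later in the post:

using System.Collections.Generic;

// POCOs matching the campaign JSON; Newtonsoft.Json binds the properties by name.
public class Campaign
{
    public string CampaignName { get; set; }
    public string OrgName { get; set; }
    public List<Repo> Repos { get; set; }
}

public class Repo
{
    public string RepoName { get; set; }
}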

Now, with a schema, I will be able to effectively store and group all my traffic data. I decided to store this JSON file that houses my campaigns as a blob in the same Azure Storage account as the data store; reuse is awesome! So the flow of my code looks like this:

  • Read the JSON campaigns from Blob Storage
  • Loop through all the campaigns and repos in campaigns
  • Call GitHub Api to get traffic data
  • Build POCO that represents a row in Azure Table Storage
  • Store said data

A list is cool, but what about the code?

// Authenticate the Octokit client with a personal access token.
var client = new GitHubClient(new ProductHeaderValue("TrackRepos"));

var basicAuth = new Credentials(Environment.GetEnvironmentVariable("GitHubToken"));
client.Credentials = basicAuth;

// Read the campaign list (Repos.json) from the same storage account the Function app uses.
string storageConnectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
BlobServiceClient blobServiceClient = new BlobServiceClient(storageConnectionString);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("repos");
BlobClient blobClient = containerClient.GetBlobClient("Repos.json");
BlobDownloadInfo download = await blobClient.DownloadAsync();

List<Campaign> campaigns = JsonConvert.DeserializeObject<List<Campaign>>(new StreamReader(download.Content).ReadToEnd());

List<RepoStats> views = new List<RepoStats>();

// For every repo in every campaign, pull the daily traffic views from GitHub.
foreach (Campaign campaign in campaigns)
{
    if (!string.IsNullOrEmpty(campaign.CampaignName) && !string.IsNullOrEmpty(campaign.OrgName))
        foreach (Repo repo in campaign.Repos)
        {
            if (!string.IsNullOrEmpty(repo.RepoName))
            {
                var data = await client.Repository.Traffic.GetViews(campaign.OrgName, repo.RepoName, new RepositoryTrafficRequest(TrafficDayOrWeek.Day));
                foreach (var item in data.Views)
                {
                    // PartitionKey = campaign + repo, RowKey = date, so re-running a day just merges.
                    var stat = new RepoStats($"{campaign.CampaignName}{repo.RepoName}", item.Timestamp.UtcDateTime.ToShortDateString().Replace("/", ""))
                    {
                        OrgName = campaign.OrgName,
                        CampaignName = campaign.CampaignName,
                        RepoName = repo.RepoName,
                        Date = item.Timestamp.UtcDateTime.ToShortDateString(),
                        Views = item.Count,
                        UniqueUsers = item.Uniques
                    };
                    views.Add(stat);
                }
                // Be polite to the GitHub Api between repos.
                Thread.Sleep(3000);
            }
        }
}

// Upsert every collected row into the RepoStats table.
string tableName = "RepoStats";
CloudTable table = await TableStorageHelper.CreateTableAsync(tableName);

foreach (var view in views)
{
    Console.WriteLine("Insert an Entity.");
    await TableStorageHelper.InsertOrMergeEntityAsync(table, view);
}

I am glossing over a bit here, but most of the code you don’t see is boilerplate, and you can look at the repo if you are curious. Now all I needed to do was hook it up to a function that runs on a CRON timer once a day, and reap the rewards.
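
The trigger itself isn’t shown above, but a daily timer-triggered function is just an attribute on the entry point. Here is a minimal sketch, assuming the in-process Microsoft.Azure.WebJobs model and an arbitrary 06:00 UTC schedule:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class CollectTrafficFunction
{
    [FunctionName("CollectTraffic")]
    public static async Task Run(
        [TimerTrigger("0 0 6 * * *")] TimerInfo timer, // six-field CRON: daily at 06:00 UTC
        ILogger log)
    {
        log.LogInformation($"Collecting repo traffic at {DateTime.UtcNow:O}");

        // ...the collection code shown above goes here: read the campaign JSON from
        // blob storage, call the GitHub Api per repo, and upsert into Table Storage...
        await Task.CompletedTask;
    }
}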

Hope you enjoy!

I had fun with this little project, and it solved a need, which is exactly what I love. From idea to deployed it took a little less than an hour; man, Azure Functions are awesome. As always, feel free to let me know your thoughts. TTFN!

