Commercial antivirus solutions are generally unable to deal with Git's internal data structures, so simply scanning the repository on disk is usually not sufficient. You could check out each commit, going back through history, and scan the working directory at each step. That would likely be a rather time-consuming process, but it is technically possible.
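For illustration, here's a minimal sketch of that commit-by-commit approach, assuming a local clone and ClamAV's clamscan command (any command-line scanner could be substituted):

```bash
#!/usr/bin/env bash
# Sketch: scan every commit in a repository's history with a CLI scanner.
# Assumes ClamAV's clamscan is installed and the repository is cloned locally.
set -euo pipefail

REPO_DIR="$1"            # path to a local clone
SCAN_DIR="$(mktemp -d)"  # scratch directory for each commit snapshot

cd "$REPO_DIR"
for sha in $(git rev-list --all); do
    rm -rf "${SCAN_DIR:?}"/*
    # Export the commit's tree without touching the working directory
    git archive "$sha" | tar -x -C "$SCAN_DIR"
    # clamscan exits non-zero when an infected file is found
    clamscan --infected --recursive "$SCAN_DIR" || echo "Hit in commit $sha"
done
rm -rf "$SCAN_DIR"
```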
There are two recommendations which can help mitigate the threat of spreading viruses through Git repositories:
Client-side virus scanning. If an infected file is contained in a repository and pulled onto a client desktop, it should be detected at that time, and a security response would be initiated based on your client-side protections.
External virus scanning as part of a continuous integration process. This would be similar to any other testing process performed on pushes to your repositories, where code or files containing known viruses would receive failed test statuses.
Regarding GitHub Enterprise Server specifically: in order to provide the best support experience and most stable system, we discourage the installation of an antivirus solution, or any other extra software, on a GitHub Enterprise Server instance itself. If you have any additional questions regarding your specific GitHub Enterprise environment, please reach out via our Enterprise Support Portal.
Cheers, Ryan

Understanding your graphs part 7 - Pre-Receive Hooks, Git caching, and Cluster (HA) ping
In part 6 of our 'Understanding your graphs' mini-series, we talked about GitHub Enterprise System services. In part 7 - our final article in this mini-series - we'll dive into GitHub Enterprise Pre-receive hooks, Git caching, and Cluster (HA) ping graphs.
Custom hooks
Graphs related to pre-receive hook execution.
Pre-Receive Hooks
Execution time of pre-receive hooks, in milliseconds.
Pre-receive hooks have a non-configurable 5-second timeout. Longer or more in-depth checks should be performed via CI and reported as a required Status Check on the relevant Pull Request instead.
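For a sense of what fits comfortably within that window, here is a hypothetical pre-receive hook; Git feeds it one "<old-sha> <new-sha> <ref-name>" line per updated ref on stdin, and any non-zero exit rejects the push:

```bash
#!/usr/bin/env bash
# Hypothetical pre-receive hook: block direct pushes to a protected branch.
# The branch name here is only an example.
while read -r oldrev newrev refname; do
    if [ "$refname" = "refs/heads/release" ]; then
        echo "Direct pushes to release are blocked; open a pull request instead."
        exit 1   # non-zero exit rejects the entire push
    fi
done
exit 0
```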
Git fetch caching
GitHub Enterprise will attempt to cache intensive operations, such as `git pack-objects`, when multiple identical requests arrive in quick succession.
Cached Requests
Git client requests which GitHub Enterprise cached.
High sustained rates of git requests being cached can be a result of clients polling for changes.
Served Requests
Git client requests that GitHub Enterprise was able to serve from cache.
Indicates that a detected "Thundering Herd" of identical requests was served from cache.
Ignored Requests
Requests ignored by the caching system, as they were not good candidates for caching.
High sustained rates of ignored requests may also indicate polling for which the built-in caching was unable to provide any benefit. These requests may have lower performance overall.
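If you control the polling clients, one way to lighten this load (a sketch, assuming a cron-driven script tracking a main branch) is to compare refs with git ls-remote before performing a full fetch:

```bash
# Only fetch when the remote branch has actually moved
remote_head=$(git ls-remote origin refs/heads/main | cut -f1)
local_head=$(git rev-parse refs/remotes/origin/main)
if [ "$remote_head" != "$local_head" ]; then
    git fetch origin
fi
```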
Cluster
Graphs related to GitHub Enterprise High Availability or Clustering.
Cluster ping
High ping response times between HA and Cluster nodes may impact replication performance.
In Geo-replication environments, ping times between replicas may reach upwards of several hundred milliseconds. Overall Git push speeds in Geo-replication environments will also be impacted by this latency.
Continue the conversation
This concludes our "Understanding your graphs" mini-series. Thanks so much for following along! If you'd like to read back on or reference all the articles from this mini-series, just subscribe to the "Understanding your graphs" tag (link below). Please let us know if you have any questions in the comments.

Understanding your graphs part 6 - System services
In part 5 of our 'Understanding your graphs' mini-series, we talked about GitHub Enterprise Network and Storage graphs. In part 6, we're going to talk about GitHub Enterprise system service graphs.
System services graphs contain data related to the major databases on GitHub Enterprise: the MySQL and Elasticsearch persistent databases, as well as Redis and Memcached, which hold ephemeral data.
Memcached
Memcached provides a layer of in-memory caching for web and API operations. Memcached helps to provide quicker response times for users and integrations interacting with the system.
Memcached usage
Displays the remaining free memory that Memcached can consume if required.
On systems with over 64 GB of RAM, it is possible for this to reach 0, indicating that the daemon has cached the maximum amount of data for the instance.
Memcached operations
Abnormal spikes or plateaus in operations per second could indicate polling or other busy activity periods.
API and Web UI requests will often have portions of the response cached in memcache to speed up future requests for the same data.
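To see the raw counters behind these graphs, Memcached's stats command can be queried directly (this assumes local shell access and the default Memcached port, 11211):

```bash
printf 'stats\nquit\n' | nc localhost 11211 | grep -E 'get_hits|get_misses|bytes '
```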
MySQL
MySQL is the primary database in GitHub Enterprise. User, issue, and other non-git or search related metadata is stored within MySQL.
MySQL usage
Reflects InnoDB buffer pool memory used.
Memory used by the buffer pool is typically very constant after the system has warmed up with a few hours of normal usage.
MySQL threads
Running and cached threads may fluctuate based on activity.
Connected will reflect a constant value in most cases.
MySQL operations
Select activity is most commonly the highest value on this graph, and reflects user and API integration activity most closely as data is read from the database.
Insert and update trends will be influenced by API POST activity, or other operations which write to the database, such as issue comments.
MySQL rows
The rows read from the database is most commonly the highest value on this graph.
Redis
The Redis database mainly contains the background job queues, as well as session state information.
Redis usage
Memory used by Redis is typically constant after the system has warmed up and is running with normal usage patterns.
Redis operations
Large spikes in Redis operations may indicate an issue with background job processing.
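When investigating a spike, the underlying counters can be checked from the administrative shell (a sketch, assuming local redis-cli access on the default port):

```bash
redis-cli info stats | grep -E 'instantaneous_ops_per_sec|total_commands_processed'
```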
Elasticsearch
Elasticsearch powers the built-in search features in GitHub Enterprise.
Elasticsearch index
The number of documents and bytes may increase over time, as data is indexed through normal operation of the GitHub Enterprise appliance.
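Document counts and index sizes can also be inspected directly (this assumes Elasticsearch is listening on its default local port, 9200):

```bash
curl -s 'http://localhost:9200/_cat/indices?v'
```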
Elasticsearch operations
Fetch and query operation trends will fluctuate based on usage of GitHub Enterprise Search, and the Search API.
Elasticsearch memory
Non-heap typically remains constant, while the heap memory fluctuates as background garbage collection and reindexing occurs.
Elasticsearch garbage collection
Elasticsearch is a Java-based service, and Java is a garbage-collected language. It is normal for these operations to occur regularly on a GitHub Enterprise system.
Abnormal trends of garbage collection could indicate a performance problem with the Elasticsearch service.
Continue the conversation
There's more to come in the "Understanding your graphs" mini-series. If you'd like to follow along, just subscribe to the "Understanding your graphs" label (link below). Please let us know if you have any questions in the comments.

Understanding your graphs part 5 - Network and Storage
In part 4 of our 'Understanding your graphs' mini-series, we talked about GitHub Enterprise Application servers and Background job graphs. In part 5, we're going to talk about GitHub Enterprise Network and Storage graphs.
Network
The network interface graphs can be useful in profiling user activity and the throughput of traffic in and out of the GitHub Enterprise appliance.
Clients
Breaks down the number of clients per TCP port, which is useful for examining how users are interacting with GitHub Enterprise.
Sockets
Further details on the TCP connection state, which can be useful in troubleshooting network or Load Balancer issues in some cases.
Interface Throughput
The amount of data transferred inbound and outbound from the GitHub Enterprise appliance.
TX (outbound) traffic is most commonly higher than RX (inbound), especially when many systems are "polling" the API or Git repositories for changes.
Plateaus in this graph can be an indication of link saturation or reaching the maximum possible link throughput.
Interface Errors
The presence of any errors may indicate a problem with the physical or virtual network card, or cables connected to the Hypervisor host system.
Replication Throughput
The amount of data sent to, and received by replica instances over the internal OpenVPN interface.
Replication Interface Errors
Errors may occur here due to saturation or MTU problems on the physical link; however, these are generally not critical errors.
Storage
GitHub Enterprise repository performance is very dependent on the underlying storage system. Low-latency, local SSD disks provide the highest performance. For more information on the GitHub Enterprise storage architecture, please see the System Overview guide on our documentation site.
Disk usage (Root Device)
Disk space in bytes available for root volume storage.
Growth on this volume is generally due to logging, which is on a 24-hour rotation schedule.
The root volume reaching 100% usage can cause a system outage, or indicate a service issue which is causing extreme log growth.
Disk usage (Data Device dm-0)
Disk space in bytes available for the user data volume.
All user profile data, pull request and issue metadata, repositories, and release assets are stored on this device.
The data volume reaching 85% usage will cause problems with the built-in search functionality of GitHub Enterprise. It is recommended to increase the storage capacity of the data volume before reaching 85% usage.
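A quick way to keep an eye on that threshold from the administrative shell (a sketch; the dm-0 label mirrors the graph title, and mount points vary by environment):

```bash
df -h | awk 'NR==1 || /dm-0/'
```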
Disk latency (Root Device & Data Device dm-0)
For best I/O performance, average latency values below 10 ms are recommended.
Large spikes may be an indication of storage system saturation.
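Latency can be observed directly with iostat from the sysstat package; the await column shows the average time, in milliseconds, for I/O requests to be served (the 10-second interval below matches the graphs' sampling rate):

```bash
iostat -x 10
```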
Disk operations (Root Device)
A sudden increase in time spent on root volume I/O may indicate a logging issue, or a general storage problem.
Disk operations (Data Device dm-0)
A sudden increase in time spent on data volume I/O may indicate a repository maintenance issue, or a general storage problem.
The read trend on this graph generally follows the pattern of Git fetch and clone traffic on the system.
Disk pending operations (Root Device)
Pending disk operations on the root device may indicate storage system saturation for the root volume.
Disk pending operations (Data Device dm-0)
Pending disk operations on the data device may indicate storage system saturation for the data volume.
Disk traffic (Root Device)
Write traffic on the root volume is mostly due to logging and collectd graph data collection.
Read traffic on the root volume is typically very low; however, support bundle generation may cause temporary spikes.
Disk traffic (Data Device dm-0)
Read and write trends depend on user and integration activity.
Plateaus in this graph may indicate storage system saturation.
Continue the conversation
There's more to come in the "Understanding your graphs" mini-series. If you'd like to follow along, just subscribe to the "Understanding your graphs" label (link below). Please let us know if you have any questions in the comments.

Understanding your graphs part 4 - App servers and background jobs
In part 3 of our 'Understanding your graphs' mini-series, we talked about GitHub Enterprise Authentication graphs. In part 4, we're going to talk about GitHub Enterprise Application server and background job graphs.
App servers
The application servers section provides insight into the activity of GitHub Enterprise services which provide data to users, or integrations.
Sessions
Profile of active sessions connected to GitHub Enterprise backend services. This graph provides a summary of the volume and type of activity from users.
Web unicorn sessions are often the largest portion of this graph, as users interact via the Web UI and API.
Errors
High error rates may indicate a problem with a service, or potential saturation due to request volume.
Please reach out to GitHub Business Support if you regularly encounter errors on this graph.
Active Workers
Service workers which are currently serving a request.
User and integration daily activity trends are very visible in this graph.
Plateaus for extended periods in this graph indicate worker saturation, and should be investigated for any request queueing.
Worker counts automatically scale with system memory size at boot.
Queued Requests
Values in this graph indicate that requests had to wait for a worker process to become available before they could be processed and served.
If requests are constantly queuing, users will notice delays in responsiveness, as well as encounter errors or timeouts more frequently.
Regularly occurring queued requests are a major indicator that the appliance is undersized for its volume of incoming requests.
App request/response
The Application request/response section looks at the rate of requests, how quickly those requests are responded to, and with what status they were returned.
Throughput
Per minute request counts, broken down by type.
API is typically the highest on systems with many integrations or active CI and project management tools.
Response time
Reflects the speed of web requests at the 90th percentile in milliseconds.
Response times over a few seconds can indicate a poor user experience, due to long browser load times or slow API responses.
CPU Time
Time spent in Ruby garbage collection within the GitHub Enterprise web application.
Plateaus for extended periods of GC time may indicate a problem with the GitHub Enterprise application itself.
I/O Time
Time spent accessing disk IO by data services which GitHub Enterprise depends on.
Plateaus for extended periods of time may indicate system resource saturation.
Response Code
The number of responses per HTTP status code.
2xx successful status codes will normally be the largest.
401 Unauthorized codes will also be present in environments with API and Git-over-HTTP traffic, as initial requests from clients may not provide authentication headers.
500 statuses indicate a potential issue with the GitHub Enterprise application, and should be investigated with support.
Errors
Represents the number of application exceptions generated per minute.
High rates of errors may indicate an issue impacting the GitHub Enterprise application.
Background jobs
Number of tasks queued for background processing on the GitHub Enterprise appliance.
Resque
Many user and application actions trigger jobs which run asynchronously on GitHub Enterprise, and are queued to be processed by resqued.
Workers which process the maint_git-serv queues are paused during GitHub Enterprise Backup Utilities snapshot runs. It is normal to see the number for this queue increase while a snapshot is in progress. The queue should then drain rather quickly once the snapshot run is complete.
As there are a finite number of resque worker processes, queues which never drain to 0 may indicate resource saturation or, in some cases, jobs which have become stuck, requiring manual intervention to clear.
Many queues simultaneously having hundreds or thousands of jobs pending can indicate resource saturation. Queue length can also be inspected from the SSH admin console by running `ghe-resque-info`.
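For example, from a workstation with administrative SSH access (the admin shell listens on port 122; the hostname below is a placeholder):

```bash
ssh -p 122 admin@github.example.com -- ghe-resque-info
```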
E-mail
When E-mail notifications are enabled, this graph displays the length of the onboard Postfix mail queues.
High numbers of deferred E-mail messages may indicate a problem with the configured SMTP server, or failures in mail delivery to specific user E-mail addresses.
Continue the conversation
There's more to come in the "Understanding your graphs" mini-series. If you'd like to follow along, just subscribe to the "Understanding your graphs" label (link below). Please let us know if you have any questions in the comments.

Understanding your graphs part 3 - Authentication
In part 2 of our 'Understanding your graphs' mini-series, we talked about GitHub Enterprise Process graphs. In part 3, we'll dive into GitHub Enterprise Authentication graphs.
Authentication
The authentication graphs break down the rates at which users and applications are authenticating to the GitHub Enterprise appliance. We also track the protocol or service type such as Git or API for the authentications, which is useful in identifying broad user activity trends. The authentication graphs can help find interesting trends or timeframes to look at when diving deeper into authentication and API request logs.
Authentication Totals
Displays which methods users are authenticating with, and if they are successful in those attempts.
Large numbers of failures usually indicate misconfigured clients which are failing repeatedly.
Authentication Rate
Large numbers of authentications per second can cause authentication worker saturation.
Automated requests or "polling" can be identified by a flat baseline, or intervals of authentications which occur regularly, even during off-peak times such as weekends or holidays.
Human user authentication trends typically follow a bell curve, more closely matching your organization's daily business hours.
LDAP
LDAP graphs will only display data if LDAP Authentication is enabled on the GitHub Enterprise appliance. These graphs can help to identify slow responses from your LDAP server, as well as the overall volume of LDAP password based authentications.
LDAP authentications
If any timeouts appear in the graph, GitHub Enterprise was unable to communicate with the LDAP server in time for an authentication request to take place.
Failures indicate that users or clients are attempting to authenticate with an invalid LDAP username or password.
Using Personal Access Token authentication instead of username and password can help reduce the number and frequency of requests which rely on the LDAP server.
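As a hypothetical illustration (the token value and hostname are placeholders), a token is presented directly instead of triggering an LDAP bind on every request:

```bash
# Git over HTTPS, with a Personal Access Token in place of a password
git clone https://TOKEN@github.example.com/some-org/some-repo.git

# REST API call against a GitHub Enterprise instance
curl -H "Authorization: token TOKEN" https://github.example.com/api/v3/user
```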
LDAP authentication response time
Useful for tracking LDAP server performance trends, from the perspective of the GitHub Enterprise appliance.
LDAP responses which take longer than 10 seconds will result in a timeout for the authentication request.
LDAP Sync Totals
Reflects the number of `user`, `team`, and net `new_members` records which were synchronized via the LDAP Synchronization feature, when the feature is enabled.
LDAP Sync Runtime
If the runtime of team or user sync cycles exceeds the current LDAP Sync interval, the interval should be increased to allow completion before the next cycle.
Long run times may indicate poor LDAP server performance, or suboptimal configuration of Domain Bases and restricted groups.
Continue the conversation
There's more to come in the "Understanding your graphs" mini-series. If you'd like to follow along, just subscribe to the "Understanding your graphs" label (link below). Please let us know if you have any questions in the comments.

Understanding your graphs part 2 - Processes
In part 1 of our 'Understanding your graphs' mini-series, we talked about GitHub Enterprise System Health graphs. We'll now look at GitHub Enterprise Processes graphs.
Processes
The processes graph section looks deeper into the major individual services which make up the GitHub Enterprise appliance. Looking at these services individually can show how usage trends impact system resources over time.
Processes
Process counts will fluctuate with usage trends.
The longpoll process will often have the highest average, with more peaks and valleys reflecting daily web UI usage trends.
Memory
The `unicorn` and `resque` process groups normally consume the highest amount of memory, followed by `memcached`, `mysql`, and `elasticsearch`.
The `resqued`, `babeld`, and `git-daemon` processes are most influenced by user activity, such as the size of the repositories being interacted with and the frequency of requests. Because of this, these process graphs can have peaks and valleys.
CPU (Kernel)
CPU time consumed by processes running in kernel mode, accessing hardware via trusted lower-level operating system functions.
The unicorn process normally consumes the most CPU time.
Often has lower values than the following CPU (Application) graph.
CPU (Application)
CPU time consumed by processes running in user space.
The majority of GitHub Enterprise service CPU time occurs here.
On busy systems, `unicorn` consumes the most CPU time, followed by `babeld`, `git`, `git-daemon`, and `resque`.
I/O operations (Read IOPS)
`git-daemon` and `babeld` read Input/Output Operations Per Second (IOPS) values are influenced by Git fetch and pull activity.
unicorn read IOPS are influenced by web application or API GET requests.
resque read IOPS are from background jobs, such as regular repository maintenance and repacking.
I/O operations (Write IOPS)
`git-daemon` write IOPS reflect Git push activity, and this is most often the largest consumer of write IOPS.
`resque` background jobs, such as search indexing and repository repacking, can also be large consumers of write IOPS.
Storage traffic (Read)
Read throughput trends are a counterpart to Read IOPS. These values used together can help determine if storage system read performance is as expected.
User activity such as fetching, and retrieving API data will result in read activity.
Storage traffic (Written)
Write throughput trends are a counterpart to Write IOPS. These values used together can help determine if storage system write performance is as expected.
Pushes, background repacks, and API POST operations are often the largest influencers of this graph.
Continue the conversation
There's more to come in the "Understanding your graphs" mini-series. If you'd like to follow along, subscribe to the "Understanding your graphs" label. Please let us know if you have any questions in the comments!

Understanding your graphs part 1 - System Health
A GitHub Enterprise virtual appliance consists of individual services, configured to run on a customized Linux operating system. Monitoring the system resources such as CPU and Memory (RAM), along with GitHub application and system service metrics can help GitHub Enterprise administrators to identify performance bottlenecks, or unusual activity trends.
The GitHub Enterprise Management Console includes a Monitor dashboard located at http(s)://[hostname]/setup/monitor. This dashboard displays graphs created with data gathered by the built-in collectd service. Data used in the graphs is sampled every 10 seconds.
Each graph has an informational tooltip describing the graph, which is accessible by hovering over or clicking on the i in the upper left corner of each graph.
Graph data can also be forwarded to an external receiver by enabling collectd forwarding within the GitHub Enterprise Management Console. This allows you to build customized dashboards and alerts from your GitHub Enterprise graph data.
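On the receiving side, a stock collectd install can accept the forwarded stream with its network plugin (a sketch; 25826 is collectd's default network port, and config paths vary by distribution):

```bash
cat >> /etc/collectd/collectd.conf <<'EOF'
LoadPlugin network
<Plugin network>
  # Accept metrics forwarded from the GitHub Enterprise appliance
  Listen "0.0.0.0" "25826"
</Plugin>
EOF
systemctl restart collectd
```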
This article series will explore what each of the dashboard sections cover and what specific graph trends to watch out for. As each GitHub Enterprise system is unique in user patterns and integrations, we encourage administrators to reach out to the GitHub Enterprise support team to assist with interpreting your specific instance's monitor graphs if questions arise. The graph data is included within appliance Support Bundles which can be shared with our support team.
System Health
The system health graphs provide a general overview of services and system resource utilization. CPU, Memory, and Load Average graphs are useful for identifying trends or times where provisioned resource saturation has occurred.
CPU
Abnormally high CPU utilization, or prolonged spikes can mean your instance is under-provisioned.
In the above example, the CPU was nearly 100% consumed by user for a period of time.
Presence of CPU "steal" time on the CPU graph can be an indication that other virtual machines running on the same host system are saturating the underlying resources, causing the GitHub Enterprise system to wait for CPU cycles.
User and System are generally the largest consumers of CPU time.
Memory
The Linux kernel provides a layer of in-memory disk caching, which is represented by "cached" on the graph. It is perfectly normal, and recommended, to have at least a few GB of cache overhead. The system will attempt to cache as much as possible, but applications can take this memory on demand. Because of this, we consider the total amount of available memory to be the sum of the "cached" and "free" values.
Running out of available free + cached memory can lead to out of memory (OOM) events, causing services to terminate and unexpected application behavior.
Load
System Load Average is a measurement showing the running task demand on the system.
We recommend monitoring the fifteen-minute (long-term) system load average for values nearing or exceeding the number of CPU cores allocated to the virtual machine.
When the load average rises above the number of CPU cores, it generally means that tasks must wait for resources before they can run.
Assuming the above example graph is a GitHub Enterprise system with 2 CPU cores, we can determine that processes are often waiting for resources.
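That comparison is easy to script against /proc/loadavg, whose third field is the fifteen-minute average:

```bash
cores=$(nproc)
load15=$(awk '{print $3}' /proc/loadavg)
awk -v l="$load15" -v c="$cores" \
    'BEGIN { printf "15-min load %s on %s cores\n", l, c;
             if (l + 0 > c + 0) print "Tasks are likely waiting for CPU time." }'
```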
Processes
By clicking on running in the legend at the bottom of the graph, we can isolate different process states. In the above example we have selected running processes.
The running process count will fluctuate with system activity. Sharp changes or drops could be expected depending on usage trends.
Large or consistent numbers of blocked or zombie processes may indicate a service problem.
It is expected to have processes in the sleeping state during normal operation.
Files
This graph represents the max number of open files, as well as the current number of used open files.
On a healthy system, the number of used files should never reach the max value. Reaching the max can indicate problems with a GitHub Enterprise service.
Limiting maximum open files is a protection built into Linux to prevent runaway processes from impacting other services on the system.
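The counters behind this graph can be read straight from the kernel, which reports three fields: allocated file handles, allocated-but-unused handles, and the system maximum:

```bash
cat /proc/sys/fs/file-nr
```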
Forks
The fork_rate trend greatly depends on system activity, and will reach values upwards of 1000-2000 on busy systems.
Large spikes beyond the observed averages should be investigated.
Continue the conversation
There's more to come in the "Understanding your graphs" mini-series. If you'd like to follow along, just subscribe to the "Understanding your graphs" label (link below). Please let us know if you have any questions in the comments.

Hey Robert, I noticed that your repo doesn't have a `gh-pages` branch. Have you set a branch as the source in the repo settings, located at https://github.com/rbhamill/roberthamill.design/settings ?