# Econometric models for count data with an application to the patents-R&D relationship

### Applying Fixed Effects Panel Count Model to Examine Road Accident Occurrence

The relationship between patent applications and R&D expenditure is a prominent application of panel count data models, and has been examined by estimating three different panel count data models. In these models, the d_i are firm-specific dummies and the a_i (= exp(d_i)) are the individual-specific effects. The foundational reference is Hausman J., Hall B. and Griliches Z., Econometric models for count data with an application to the patents-R&D relationship, Econometrica.

Following the development of the panel count model by Hausman et al., recent contributions have applied it in a variety of settings (Montalvo; Cincera).

Hsiao, and Karlaftis and Tarko, highlighted the advantages of using a panel count model over analysis based on cross section or time series data alone. In addition, this method can account for the possible existence of heterogeneity in count data. Some of the studies reviewed here provide guidance on relevant explanatory variables for road accidents. Fridstrom and Ingebrigtsen applied a generalized Poisson regression model to pooled time series and cross section data on monthly personal injury road accidents and their severity for 18 Norwegian counties. The explanatory variables in their study included road use (exposure), weather, daylight, traffic density, road investment and maintenance expenditure, accident reporting routines, vehicle inspection, law enforcement, seat belt usage, the proportion of inexperienced drivers and alcohol sales.

Their results showed that the expected number of casualties was positively associated with the amount of rainfall, while a more congested road network in a given county was associated with fewer casualties. Karlaftis and Tarko addressed the importance of heterogeneity in panel data analysis when developing models to estimate crashes on urban and suburban arterials in 92 counties in Indiana. Fixed effects models, both Poisson and negative binomial, were adopted to develop the panel model.

Their findings indicated that increases in the number of accidents were associated with higher vehicle miles travelled, population, the proportion of city mileage and the proportion of urban roads in total vehicle miles travelled. Noland applied fixed effects negative binomial regression models to analyze the impact of road infrastructure variables on traffic-related fatalities and injuries using panel data for 50 US states. To account for heterogeneity, the researcher used the fixed effects overdispersion model, conditioning the joint probability of the counts for each group upon the sum of the counts for the group.

The results of this study rejected the hypothesis that the infrastructure improvements had been effective in reducing total fatalities and injuries. Chin and Quddus examined the relationship between traffic accident frequencies and the geometric, traffic and regulatory control characteristics of a total of 52 four-legged signalized intersections.

They developed a random effects negative binomial (RENB) panel model to account for location-specific effects and serial correlation over time in the accident counts. The findings indicated that total approach volumes, right-turn volumes, an uncontrolled left-turn lane, median width above 2 m, the presence of a bus stop, intersection sight distance, the presence of a surveillance camera and the number of phases per cycle were strongly associated with higher total accident occurrence.

Kweon and Kockelman investigated the safety effects of a speed limit increase on crash counts in Washington State, using panel regression methods under both fixed-effects and random-effects assumptions.

To develop models for the number of fatalities, injuries, crashes, fatal crashes, injury crashes, and property-damage-only (PDO) crashes, standard Poisson models for pooled count data were used initially, followed by more complex fixed-effects and random-effects Poisson and negative binomial panel specifications.

The estimated model was based on the conditional likelihood estimation process as described in Hausman et al. The findings showed that the fixed-effects negative binomial model performed best for injury count while for fatalities, injuries, PDO crashes and total crashes, the pooled negative binomial model was a better choice.

Noland and Quddus presented an analysis of a panel of regional pedestrian and bicycle road accidents covering 20 years and 11 regions of the United Kingdom. They used the fixed and random effects negative binomial models derived by Hausman et al. Their results revealed that higher numbers of pedestrian and bicycle road accidents were associated with lower income areas, increases in the road network, increases in alcohol expenditure and total population.

Kumara and Chin found that road network per capita, gross national product, population and number of registered vehicles were positively associated with fatal accident occurrence. They analyzed accident data across 41 countries in the Asia Pacific region.


By using the fixed effects negative binomial panel model, they found that socioeconomic and infrastructure factors had significant effects on fatal accidents. Recently, Law et al. applied the fixed effects negative binomial model to a panel of 25 countries. Their findings revealed that the implementation of road safety regulation, improvements in the quality of political institutions, and developments in medical care and technology had significant effects in reducing motorcycle deaths.

A simple pooled Poisson model was first employed in this study, given the non-negative discrete nature of accident data. The model rests on the basic Poisson probability specification. However, its assumption of independence over time leaves it open to serial correlation of the residuals, a possible weakness that needs further investigation. Nevertheless, this model is used as a benchmark. To account for differences among cross-sectional units, the negative binomial model can take two basic approaches: fixed effects or random effects.
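For reference, the pooled Poisson benchmark is conventionally written as below. This is the textbook form rather than a reproduction of the study's own equation; the symbols $\lambda_{it}$ (expected count for state $i$ in month $t$), $x_{it}$ (explanatory variables) and $\beta$ (coefficients) are notation assumed here.

$$
P(Y_{it} = y_{it}) = \frac{\exp(-\lambda_{it})\,\lambda_{it}^{y_{it}}}{y_{it}!},
\qquad \ln \lambda_{it} = x_{it}'\beta,
\qquad y_{it} = 0, 1, 2, \ldots
$$

Under this specification the mean and variance are equal, $E(y_{it}) = \operatorname{Var}(y_{it}) = \lambda_{it}$, which is the restriction that the negative binomial models relax.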

The random effects model assumes that the individual effects are independent, identically distributed and uncorrelated with the observed explanatory variables. This method produces an inconsistent estimator if the unobserved individual-specific heterogeneity is correlated with the explanatory variables.

In that case, the fixed effects estimator is more appropriate. Fixed effects Poisson model: suppose that we have panel count data for i states, each state observed a total of T_i times (Hu). The conditional maximum likelihood method is used to estimate the model parameters (Hausman et al.). As y_it follows the Poisson distribution, the sum of accidents for each state also follows a Poisson distribution. Following Hausman et al., the parameters are estimated from the joint distribution of y_i1, ..., y_iT_i conditional on that sum.
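The conditioning step can be sketched in code. Conditional on a unit's total count, independent Poisson counts follow a multinomial distribution whose probabilities are `exp(x_it'beta)` divided by the sum over that unit's periods, so the unit-specific effect cancels. The function below is a minimal illustration of that conditional log-likelihood using only the standard library; the function name, the `panels` data layout and the example numbers are assumptions made for illustration, not the study's implementation.

```python
import math


def fe_poisson_conditional_loglik(beta, panels):
    """Conditional log-likelihood of a fixed-effects Poisson model.

    beta:   list of coefficients.
    panels: list of units; each unit is a list of (y, x) pairs, where y is a
            non-negative integer count and x is a list of covariates.
    """
    total = 0.0
    for unit in panels:
        # Linear index x_it' beta for every period of this unit.
        eta = [sum(b * xk for b, xk in zip(beta, x)) for _, x in unit]
        # log(sum_s exp(eta_s)), computed stably by subtracting the maximum.
        m = max(eta)
        log_denom = m + math.log(sum(math.exp(e - m) for e in eta))
        # Multinomial coefficient: log(n_i!) - sum_t log(y_it!).
        n_i = sum(y for y, _ in unit)
        total += math.lgamma(n_i + 1)
        for (y, _), e in zip(unit, eta):
            total += y * (e - log_denom) - math.lgamma(y + 1)
    return total


# Hypothetical data: two units observed for three periods, one covariate.
example_panels = [
    [(2, [0.1]), (3, [0.5]), (1, [0.9])],
    [(0, [0.2]), (4, [0.4]), (5, [0.8])],
]
print(fe_poisson_conditional_loglik([0.7], example_panels))
```

Because only within-unit differences in the linear index enter the likelihood, adding a constant to every covariate value of one unit leaves the result unchanged, which is why time-invariant unit effects drop out. In practice the function would be passed to a numerical optimizer to estimate `beta`.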

In this study, the numbers of road accidents are monthly figures for individual states with defined socioeconomic and development characteristics. Thus, when cross-sectional heterogeneity exists, the fixed effects model is more appropriate. The fixed effects negative binomial formulation of Hausman et al. is used here. This model allows the variance to be greater than the mean.
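The overdispersion property can be made concrete with the gamma mixture commonly used to derive the negative binomial model. The sketch below assumes that the Poisson rate $\lambda_{it}$ follows a gamma distribution with shape $\gamma_{it} = \exp(x_{it}'\beta)$ and a state-specific rate $\delta_i$; this notation is an assumption and may differ from the study's.

$$
E(y_{it}) = \frac{\gamma_{it}}{\delta_i},
\qquad
\operatorname{Var}(y_{it}) = E(\lambda_{it}) + \operatorname{Var}(\lambda_{it})
= \frac{\gamma_{it}}{\delta_i} + \frac{\gamma_{it}}{\delta_i^{2}}
= \frac{\gamma_{it}\,(1 + \delta_i)}{\delta_i^{2}},
\qquad
\frac{\operatorname{Var}(y_{it})}{E(y_{it})} = \frac{1 + \delta_i}{\delta_i} > 1 .
$$

The variance-to-mean ratio exceeds one for every positive $\delta_i$, whereas the Poisson model fixes it at one.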

The parameter vector is estimated by maximum likelihood, that is, by maximizing the log-likelihood function. In the development of statistical models, it is important to decide whether one model is significantly better than another when an explanatory variable is added to or excluded from the model.

The goodness of fit between the fitted and the observed values y_i was measured using various statistics.
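One widely used statistic for comparing fitted count models is the Akaike information criterion, AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L is the maximized log-likelihood; smaller values are preferred. The snippet below is a minimal sketch of such a comparison. The model names and log-likelihood values are hypothetical placeholders, not results from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the name of the candidate with the smallest AIC.

    candidates: dict mapping a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical fitted models: (maximized log-likelihood, number of parameters).
fits = {
    "pooled_poisson": (-1250.0, 5),
    "fe_poisson": (-1100.0, 5),
    "fe_negative_binomial": (-1040.0, 6),
}
for name, (loglik, k) in fits.items():
    print(name, aic(loglik, k))
print("preferred:", best_by_aic(fits))
```

AIC penalizes each additional parameter by two units, so a richer model is preferred only when its log-likelihood improves by more than one unit per extra parameter.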


To confirm the suitability of the fitted model, the Akaike information criterion (AIC) is first applied to identify the best model among the various models examined. The data used in this study comprised monthly Malaysian road accident data from January to December. The number referred to the total number of vehicles (private and public) registered with the Department of Road Transport.

It included all vehicles using either petrol or diesel: motorcycles, motorcars, buses, taxis and hired cars, goods vehicles and other vehicles. However, the figure did not include army vehicles. Data for the climatic variables were collected from the Malaysian Meteorological Department.


One climate factor considered was the amount of rainfall, in millimeters, for a particular month.

The data were captured from 14 selected weather stations, one in each of the 14 states. This factor has been commonly used in the road safety modeling literature (Fridstrom et al.). Another climate factor considered was the number of rainy days in a particular month. These data were captured from the same 14 weather stations. The relationship between the number of rainy days and road accidents was also considered in previous studies (Fridstrom et al.).

Graphical analysis was done to observe the relationship between the monthly number of road accidents and vehicle volume for the 14 states in each of twelve years. Figure 1 provides a plot of the original data on the number of accidents against vehicle volume, using data for all states in all years. Inspection of the relationship between the number of accidents and vehicle volume suggests a non-linear relationship.

In our study of trial recruitment, we were interested in intervention effects on the population as a whole, rather than the recruitment performance of individual MDTs.

An additional advantage of marginal modeling, especially in contrast to models with subject-specific conditional explanatory variable effects, is that it has been shown to offer some degree of robustness for the estimation of regression parameters when departure from the underlying assumed random effect structure occurs (Heagerty and Kurland). One way of employing a marginal model would be to make use of generalized estimating equation (GEE) methods (Solis-Trapala and Farewell). However, GEE methods do not provide direct information on sources of variation, and there is a cost in efficiency compared with parametric maximum-likelihood estimation.

Bearing this in mind, alternatives were investigated, and an analysis based on the Dirichlet negative multinomial distribution appeared to be potentially useful. The Dirichlet negative multinomial distribution is a discrete multivariate distribution having support on the non-negative integers, and has been well characterized by Mosimann, Leeds and Gelfand, and Johnson and others.

However, the typical formulation, in terms of a negative multinomial distribution compounded by a Dirichlet distribution, is unappealing in a regression context; we provide a more natural random effects description.

A parameterization for this adaptation is given in the following sections and an illustrative analysis of the motivating data is provided subsequently. We also discuss the results of a small simulation study that explores model robustness and finite-sample behavior.

The resulting distribution of y1,…,ym is then called Dirichlet (more generically, compound) negative multinomial. As shown by Mosimann, the joint probability mass function of y1,…,ym so distributed is given in 2.
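The compounding can be illustrated by simulation. The sketch below assumes the classical construction, in which a probability vector is drawn from a Dirichlet distribution and the counts are then drawn from a negative multinomial distribution with those probabilities. The parameter names `x0` (number of stopping events, taken to be a positive integer here) and `alphas` (Dirichlet parameters, with the first entry belonging to the stopping category) are assumptions for illustration and do not follow the paper's regression parameterization.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw a probability vector from Dirichlet(alphas) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_negative_multinomial(x0, probs, rng):
    """Count outcomes in categories 1..m until category 0 has occurred x0 times."""
    m = len(probs) - 1
    counts = [0] * m
    categories = list(range(m + 1))
    stops = 0
    while stops < x0:
        k = rng.choices(categories, weights=probs)[0]
        if k == 0:
            stops += 1
        else:
            counts[k - 1] += 1
    return counts


def sample_dirichlet_negative_multinomial(x0, alphas, rng):
    """Compound draw: Dirichlet probabilities, then negative multinomial counts."""
    probs = sample_dirichlet(alphas, rng)
    return sample_negative_multinomial(x0, probs, rng)


rng = random.Random(2024)
# Hypothetical parameters: stopping category plus m = 3 correlated counts.
for _ in range(5):
    print(sample_dirichlet_negative_multinomial(3, [5.0, 1.0, 2.0, 1.5], rng))
```

Because all counts within a draw share the same random probability vector, they are correlated with one another and overdispersed relative to independent Poisson counts, which is the feature that makes the distribution a candidate for repeated count data.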

## Dirichlet negative multinomial regression for overdispersed correlated count data

Heuristically, these constraints can be thought of as placing upper bounds on the variance of the underlying Dirichlet distribution.

An informative alternative to the foregoing, traditional, derivation of the Dirichlet negative multinomial distribution proceeds as follows. This three-level formulation results in the same Dirichlet negative multinomial distribution as in 2.

### Regression formulation

Mosimann reports the mean of the Dirichlet negative multinomial distribution, but omits its derivation.

Our alternative, random effects formulation of the Dirichlet negative multinomial provides a direct route to this mean. Mosimann also reports variances and covariances of the Dirichlet negative multinomial distribution, and these can similarly be derived from the random effects formulation.

Additionally, we should like the remaining two parameters to convey something meaningful about the variance structures in the model. A Dirichlet negative multinomial distribution with these parameters is therefore a candidate regression model for correlated count data. Under the regression parameterization, the covariance of two observations yj and yk within the same unit takes the form given in 3.

For observations on multiple individuals, the full likelihood function will be defined in the usual way as a product over all Dirichlet negative multinomial observations of terms of the form 2.

Expressions for likelihood derivatives are provided in supplemental electronic material available at Biostatistics online, together with R code to fit the foregoing regression model.

### Interpretation

The Dirichlet negative multinomial distribution has been used most extensively when modeling the allocation of an unknown number of items (such as product purchases) to a known set of categories (such as individual brands), as for example in Goodhardt and others. In such contexts, formulating a model in terms of a negative binomial total with Dirichlet multinomial category probabilities is sensible and interpretable.

However, when used as a model for longitudinal (more generically, repeated measures) count data, the concept of category probabilities has no substantive meaning. Neither is it natural to extend the notion of categories to regression on other variables, and Goodhardt and others provide no such extension.